Re: Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair: Add lag based placement)

2024-01-31 Thread Tobias Huschle
On Sun, Jan 21, 2024 at 01:44:32PM -0500, Michael S. Tsirkin wrote:
> On Mon, Jan 08, 2024 at 02:13:25PM +0100, Tobias Huschle wrote:
> > On Thu, Dec 14, 2023 at 02:14:59AM -0500, Michael S. Tsirkin wrote:
> > - Along with the wakeup of the kworker, need_resched needs to
> >   be set, such that cond_resched() triggers a reschedule.
> 
> Let's try this? Does not look like discussing vhost itself will
> draw attention from scheduler guys but posting a scheduling
> patch probably will? Can you post a patch?

As a baseline, I verified that the following two options fix
the regression:

- replacing the cond_resched in the vhost_worker function with a hard
  schedule
- setting the need_resched flag using set_tsk_need_resched(current)
  right before calling cond_resched (sketched below)
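
For illustration, a minimal sketch of the second option (paraphrased,
not the exact vhost_worker loop; pick_next_work() is a placeholder for
the existing work item handling):

	for (;;) {
		struct vhost_work *work = pick_next_work(worker);

		if (!work)
			break;
		work->fn(work);

		/* Make cond_resched() actually reschedule by marking
		 * the current task before calling it. */
		set_tsk_need_resched(current);
		cond_resched();
	}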

I then tried to find a better spot to put the set_tsk_need_resched
call.

One approach I found to be working is setting the need_resched flag
at the end of handle_tx and handle_rx (see the sketch below).
This would be after data has actually been passed to the socket, so
the originally blocked kworker has something to do and will profit
from the reschedule.
It might be possible to go deeper and place the set_tsk_need_resched
call right after actually passing the data, but this might leave us
sprinkling that call in multiple places and might be too intrusive.
Furthermore, it might be possible to check if an error occurred when
preparing the transmission and then skip setting the flag.
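
A sketch of where this would land (the handler body is elided;
handle_rx would get the same treatment, and the error-check variant
would simply guard the call):

	static void handle_tx(struct vhost_net *net)
	{
		/* ... existing transmit path, data passed to the socket ... */

		/* Incentivise the scheduler to pick up the now-runnable
		 * kworker despite its negative lag. */
		set_tsk_need_resched(current);
	}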

This would require a conceptual decision on the vhost side.
This solution would not touch the scheduler, only incentivise it to
do the right thing for this particular regression.

Another idea could be to find the counterpart that initiates the
actual data transfer, which I assume wakes up the kworker. From
what I gather it seems to be an eventfd notification that ends up
somewhere in the qemu code. Not sure if that context would allow
setting the need_resched flag, nor whether this would be a good idea.

> 
> > - On cond_resched(), verify if the consumed runtime of the caller
> >   is outweighing the negative lag of another process (e.g. the 
> >   kworker) and schedule the other process. Introduces overhead
> >   to cond_resched.
> 
> Or this last one.

On cond_resched itself, this will probably only be possible in a very
hacky way. That is because currently there is no immediate access to
the necessary data, which would make it necessary to bloat up the
cond_resched function quite a bit, with a probably non-negligible
amount of overhead.

Changing other aspects in the scheduler might get us in trouble as
they all would probably resolve back to the question "What is the magic
value that determines whether a small task not being scheduled justifies
setting the need_resched flag for a currently running task or adjusting 
its lag?". As this would then also have to work for all non-vhost related
cases, this looks like a dangerous path to me on second thought.


Summary

In my opinion (I have no vhost experience), the way to go would be
either replacing the cond_resched with a hard schedule or setting the
need_resched flag within vhost if a data transfer was successfully
initiated. It will be necessary to check if this causes problems with
other workloads/benchmarks.



[v1] trace/hwlat: stop worker if !is_percpu_thread due to hotplug event

2024-01-31 Thread Andy Chiu
If the task happens to run after cpu hot-plug offline, then it would not
be running in a percpu_thread. Instead, it would be re-queued into an
UNBOUND workqueue. This would trigger a warning if we enable kernel
preemption.

Signed-off-by: Andy Chiu 
---
 kernel/trace/trace_hwlat.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index b791524a6536..87258ddc2141 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -511,7 +511,16 @@ static int start_cpu_kthread(unsigned int cpu)
 static void hwlat_hotplug_workfn(struct work_struct *dummy)
 {
struct trace_array *tr = hwlat_trace;
-   unsigned int cpu = smp_processor_id();
+   unsigned int cpu;
+
+	/*
+	 * If the work is scheduled after CPU hotplug offline has been
+	 * invoked, then it would be queued into an UNBOUND workqueue
+	 */
+	if (!is_percpu_thread())
+		return;
+
+   cpu = smp_processor_id();
 
	mutex_lock(&trace_types_lock);
	mutex_lock(&hwlat_data.lock);
-- 
2.43.0




[v1] trace/osnoise: prevent osnoise hotplug worker running in UNBOUND workqueue

2024-01-31 Thread Andy Chiu
smp_processor_id() should be called with migration disabled. This means
we may safely call smp_processor_id() in a percpu thread. However, this
is not the case if the work is (re-)queued into an unbound workqueue
during cpu-hotplug. So, detect and return early if this work happens to
run on an unbound wq.

Signed-off-by: Andy Chiu 
---
 kernel/trace/trace_osnoise.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index bd0d01d00fb9..cf7f716d3f35 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2068,7 +2068,12 @@ static int start_per_cpu_kthreads(void)
 #ifdef CONFIG_HOTPLUG_CPU
 static void osnoise_hotplug_workfn(struct work_struct *dummy)
 {
-   unsigned int cpu = smp_processor_id();
+   unsigned int cpu;
+
+   if (!is_percpu_thread())
+   return;
+
+   cpu = smp_processor_id();
 
	mutex_lock(&trace_types_lock);
 
-- 
2.43.0




Re: [PATCH RFC v3 13/35] mm: memory: Introduce fault-on-access mechanism for pages

2024-01-31 Thread Anshuman Khandual
On 1/25/24 22:12, Alexandru Elisei wrote:
> Introduce a mechanism that allows an architecture to trigger a page fault,
> and add the infrastructure to handle that fault accordingly. To make use
> of this, an arch is expected to mark the table entry as PAGE_NONE (which
> will cause a fault next time it is accessed) and to implement an
> arch-specific method (like a software bit) for recognizing that the fault
> needs to be handled by the arch code.
> 
> arm64 will use this approach to reserve tag storage for pages which are
> mapped in an MTE enabled VMA, but the storage needed to store tags isn't
> reserved (for example, because of an mprotect(PROT_MTE) call on a VMA with
> existing pages).

Just to summarize -

So the platform will create NUMA-balancing-like page faults - by marking
existing mappings with the PAGE_NONE permission - then, when the
subsequent fault happens, identify such cases via a software bit in the
page table entry and route the fault to the platform code itself for
special-purpose page fault handling, where the page might come from some
reserved area instead.

Some questions

- How often PAGE_NONE is to be marked for applicable MTE VMA based mappings 

- Is it periodic like NUMA balancing or just one time for tag storage

- How this is going to interact with NUMA balancing given both use PAGE_NONE

- How to differentiate these mappings from standard pte_protnone()

> 
> Signed-off-by: Alexandru Elisei 
> ---
> 
> Changes since rfc v2:
> 
> * New patch. Split from patch #19 ("mm: mprotect: Introduce 
> PAGE_FAULT_ON_ACCESS
> for mprotect(PROT_MTE)") (David Hildenbrand).
> 
>  include/linux/huge_mm.h |  4 ++--
>  include/linux/pgtable.h | 47 +++--
>  mm/Kconfig  |  3 +++
>  mm/huge_memory.c| 36 +
>  mm/memory.c | 51 ++---
>  5 files changed, 109 insertions(+), 32 deletions(-)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 5adb86af35fc..4678a0a5e6a8 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -346,7 +346,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct 
> *vma, unsigned long addr,
>  struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long 
> addr,
>   pud_t *pud, int flags, struct dev_pagemap **pgmap);
>  
> -vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
> +vm_fault_t handle_huge_pmd_protnone(struct vm_fault *vmf);
>  
>  extern struct page *huge_zero_page;
>  extern unsigned long huge_zero_pfn;
> @@ -476,7 +476,7 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
>   return NULL;
>  }
>  
> -static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
> +static inline vm_fault_t handle_huge_pmd_protnone(struct vm_fault *vmf)
>  {
>   return 0;
>  }
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 2d0f04042f62..81a21be855a2 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1455,7 +1455,7 @@ static inline int pud_trans_unstable(pud_t *pud)
>   return 0;
>  }
>  
> -#ifndef CONFIG_NUMA_BALANCING
> +#if !defined(CONFIG_NUMA_BALANCING) && 
> !defined(CONFIG_ARCH_HAS_FAULT_ON_ACCESS)
>  /*
>   * In an inaccessible (PROT_NONE) VMA, pte_protnone() may indicate "yes". It 
> is
>   * perfectly valid to indicate "no" in that case, which is why our default
> @@ -1477,7 +1477,50 @@ static inline int pmd_protnone(pmd_t pmd)
>  {
>   return 0;
>  }
> -#endif /* CONFIG_NUMA_BALANCING */
> +#endif /* !CONFIG_NUMA_BALANCING && !CONFIG_ARCH_HAS_FAULT_ON_ACCESS */
> +
> +#ifndef CONFIG_ARCH_HAS_FAULT_ON_ACCESS
> +static inline bool arch_fault_on_access_pte(pte_t pte)
> +{
> + return false;
> +}
> +
> +static inline bool arch_fault_on_access_pmd(pmd_t pmd)
> +{
> + return false;
> +}
> +
> +/*
> + * The function is called with the fault lock held and an elevated reference 
> on
> + * the folio.
> + *
> + * Rules that an arch implementation of the function must follow:
> + *
> + * 1. The function must return with the elevated reference dropped.
> + *
> + * 2. If the return value contains VM_FAULT_RETRY or VM_FAULT_COMPLETED then:
> + *
> + * - if FAULT_FLAG_RETRY_NOWAIT is not set, the function must return with the
> + *   correct fault lock released, which can be accomplished with
> + *   release_fault_lock(vmf). Note that release_fault_lock() doesn't check if
> + *   FAULT_FLAG_RETRY_NOWAIT is set before releasing the mmap_lock.
> + *
> + * - if FAULT_FLAG_RETRY_NOWAIT is set, then the function must not release 
> the
> + *   mmap_lock. The flag should be set only if the mmap_lock is held.
> + *
> + * 3. If the return value contains neither of the above, the function must 
> not
> + * release the fault lock; the generic fault handler will take care of 
> releasing
> + * the correct lock.
> + */
> +static inline vm_fault_t arch_handle_folio_fault_on_access(struct folio 
> 

Re: [PATCH 0/4] apply page shift to PFN instead of VA in pfn_to_virt

2024-01-31 Thread Arnd Bergmann
On Thu, Feb 1, 2024, at 01:01, Yan Zhao wrote:
> On Wed, Jan 31, 2024 at 12:48:38PM +0100, Arnd Bergmann wrote:
>> On Wed, Jan 31, 2024, at 06:51, Yan Zhao wrote:
>> 
>> How exactly did you notice the function being wrong,
>> did you try to add a user somewhere, or just read through
>> the code?
> I came across them when I was debugging an unexpected kernel page fault
> on x86, and I was not sure whether page_to_virt() was compiled in
> asm-generic/page.h or linux/mm.h.
> Though finally it turned out that the one in linux/mm.h was used, which
> yielded the right result, and the unexpected kernel page fault in my case
> was not related to page_to_virt(). Still, it led me to notice that the
> pfn_to_virt() in asm-generic/page.h and the other 3 archs did not look right.
>
> Yes, unlike virt_to_pfn() which still has a caller in openrisc (among
> csky, Hexagon, openrisc), pfn_to_virt() now does not have a caller in
> the 3 archs. Though both virt_to_pfn() and pfn_to_virt() are referenced
> in asm-generic/page.h, I also not sure if we need to remove the
> asm-generic/page.h which may serve as a template to future archs ?
>
> So, either way looks good to me :)

I think it's fair to assume we won't need asm-generic/page.h any
more, as we likely won't be adding new NOMMU architectures.
I can have a look myself at removing any such unused headers in
include/asm-generic/, it's probably not the only one.

Can you just send a patch to remove the unused pfn_to_virt()
functions?

 Arnd
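
For reference, the shape of the bug under discussion: pfn_to_virt()
applied the page shift to the already-translated virtual address rather
than to the PFN. Conceptually (paraphrased, not the exact tree
contents):

	/* broken: shift applied after the __va() translation */
	return __va(pfn) << PAGE_SHIFT;

	/* fixed: shift the PFN into a physical address, then translate */
	return __va(pfn << PAGE_SHIFT);

With no callers in the affected archs, as noted above, the broken form
went unnoticed.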



Re: [PATCH v4] virtio_net: Support RX hash XDP hint

2024-01-31 Thread Jason Wang
On Wed, Jan 31, 2024 at 11:55 AM Liang Chen  wrote:
>
> The RSS hash report is a feature that's part of the virtio specification.
> Currently, virtio backends like qemu, vdpa (mlx5), and potentially vhost
> (still a work in progress as per [1]) support this feature. While the
> capability to obtain the RSS hash has been enabled in the normal path,
> it's currently missing in the XDP path. Therefore, we are introducing
> XDP hints through kfuncs to allow XDP programs to access the RSS hash.
>
> 1.
> https://lore.kernel.org/all/20231015141644.260646-1-akihiko.od...@daynix.com/#r
>
> Signed-off-by: Liang Chen 
> Reviewed-by: Xuan Zhuo 

Acked-by: Jason Wang 

Thanks




Re: [PATCH v1] module.h: define __symbol_get_gpl() as a regular __symbol_get()

2024-01-31 Thread Christoph Hellwig
On Wed, Jan 31, 2024 at 10:02:52PM +0300, Andrew Kanner wrote:
> Prototype for __symbol_get_gpl() was introduced in the initial git
> commit 1da177e4c3f4 ("Linux-2.6.12-rc2"), but was not used after that.
> 
> In commit 9011e49d54dc ("modules: only allow symbol_get of
> EXPORT_SYMBOL_GPL modules") Christoph Hellwig switched __symbol_get()
> to process GPL symbols only, most likely this is what
> __symbol_get_gpl() was designed to do.
> 
> We might either define __symbol_get_gpl() as __symbol_get() or remove
> it completely as suggested by Mauro Carvalho Chehab.

Just remove it, there is no need to keep unused functionality around.

Btw, where did the discussion start?  I hope you're not trying to
add new symbol_get users?




Re: [PATCH] lib/test_kmod: fix kernel-doc warnings

2024-01-31 Thread Randy Dunlap
Hi,

Any comments on this patch?
Thanks.


On 11/3/23 21:20, Randy Dunlap wrote:
> Fix all kernel-doc warnings in test_kmod.c:
> - Mark some enum values as private so that kernel-doc is not needed
>   for them
> - s/thread_mutex/thread_lock/ in a struct's kernel-doc comments
> - add kernel-doc info for @task_sync
> 
> test_kmod.c:67: warning: Enum value '__TEST_KMOD_INVALID' not described in 
> enum 'kmod_test_case'
> test_kmod.c:67: warning: Enum value '__TEST_KMOD_MAX' not described in enum 
> 'kmod_test_case'
> test_kmod.c:100: warning: Function parameter or member 'task_sync' not 
> described in 'kmod_test_device_info'
> test_kmod.c:134: warning: Function parameter or member 'thread_mutex' not 
> described in 'kmod_test_device'
> 
> Signed-off-by: Randy Dunlap 
> Cc: Luis Chamberlain 
> Cc: linux-modu...@vger.kernel.org
> ---
>  lib/test_kmod.c |6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff -- a/lib/test_kmod.c b/lib/test_kmod.c
> --- a/lib/test_kmod.c
> +++ b/lib/test_kmod.c
> @@ -58,11 +58,14 @@ static int num_test_devs;
>   * @need_mod_put for your tests case.
>   */
>  enum kmod_test_case {
> + /* private: */
>   __TEST_KMOD_INVALID = 0,
> + /* public: */
>  
>   TEST_KMOD_DRIVER,
>   TEST_KMOD_FS_TYPE,
>  
> + /* private: */
>   __TEST_KMOD_MAX,
>  };
>  
> @@ -82,6 +85,7 @@ struct kmod_test_device;
>   * @ret_sync: return value if request_module() is used, sync request for
>   *   @TEST_KMOD_DRIVER
>   * @fs_sync: return value of get_fs_type() for @TEST_KMOD_FS_TYPE
> + * @task_sync: kthread's task_struct or %NULL if not running
>   * @thread_idx: thread ID
>   * @test_dev: test device test is being performed under
>   * @need_mod_put: Some tests (get_fs_type() is one) requires putting the 
> module
> @@ -108,7 +112,7 @@ struct kmod_test_device_info {
>   * @dev: pointer to misc_dev's own struct device
>   * @config_mutex: protects configuration of test
>   * @trigger_mutex: the test trigger can only be fired once at a time
> - * @thread_lock: protects @done count, and the @info per each thread
> + * @thread_mutex: protects @done count, and the @info per each thread
>   * @done: number of threads which have completed or failed
>   * @test_is_oom: when we run out of memory, use this to halt moving forward
>   * @kthreads_done: completion used to signal when all work is done

-- 
#Randy



[PATCH v3] tracefs: dentry lookup crapectomy

2024-01-31 Thread Steven Rostedt
From: Linus Torvalds 

The dentry lookup for eventfs files was very broken, and had lots of
signs of the old situation where the filesystem names were all created
statically in the dentry tree, rather than being looked up dynamically
based on the eventfs data structures.

You could see it in the naming - how it claimed to "create" dentries
rather than just look up the dentries that were given it.

You could see it in various nonsensical and very incorrect operations,
like using "simple_lookup()" on the dentries that were passed in, which
only results in those dentries becoming negative dentries.  Which meant
that any other lookup would possibly return ENOENT if it saw that
negative dentry before the data was then later filled in.

You could see it in the immense amount of nonsensical code that didn't
actually just do lookups.

Link: 
https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.s...@intel.com/

Cc: sta...@vger.kernel.org
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v2: 
https://lore.kernel.org/linux-trace-kernel/20240131185512.799813...@goodmis.org

- Removed returning ERR_PTR(-ENOENT) and just return NULL. Got rid of the
  noent: label, as everything can now just jump to out:
  (Suggested by Al Viro)

 fs/tracefs/event_inode.c | 275 +++
 fs/tracefs/inode.c   |  69 --
 fs/tracefs/internal.h|   3 -
 3 files changed, 50 insertions(+), 297 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index e9819d719d2a..04c2ab90f93e 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -230,7 +230,6 @@ static struct eventfs_inode *eventfs_find_events(struct 
dentry *dentry)
 {
struct eventfs_inode *ei;
 
-	mutex_lock(&eventfs_mutex);
do {
// The parent is stable because we do not do renames
dentry = dentry->d_parent;
@@ -247,7 +246,6 @@ static struct eventfs_inode *eventfs_find_events(struct 
dentry *dentry)
}
// Walk upwards until you find the events inode
} while (!ei->is_events);
-	mutex_unlock(&eventfs_mutex);
 
update_top_events_attr(ei, dentry->d_sb);
 
@@ -280,11 +278,10 @@ static void update_inode_attr(struct dentry *dentry, 
struct inode *inode,
 }
 
 /**
- * create_file - create a file in the tracefs filesystem
- * @name: the name of the file to create.
+ * lookup_file - look up a file in the tracefs filesystem
+ * @dentry: the dentry to look up
  * @mode: the permission that the file should have.
  * @attr: saved attributes changed by user
- * @parent: parent dentry for this file.
  * @data: something that the caller will want to get to later on.
  * @fop: struct file_operations that should be used for this file.
  *
@@ -292,13 +289,13 @@ static void update_inode_attr(struct dentry *dentry, 
struct inode *inode,
  * directory. The inode.i_private pointer will point to @data in the open()
  * call.
  */
-static struct dentry *create_file(const char *name, umode_t mode,
+static struct dentry *lookup_file(struct dentry *dentry,
+ umode_t mode,
  struct eventfs_attr *attr,
- struct dentry *parent, void *data,
+ void *data,
  const struct file_operations *fop)
 {
struct tracefs_inode *ti;
-   struct dentry *dentry;
struct inode *inode;
 
if (!(mode & S_IFMT))
@@ -307,15 +304,9 @@ static struct dentry *create_file(const char *name, 
umode_t mode,
if (WARN_ON_ONCE(!S_ISREG(mode)))
return NULL;
 
-   WARN_ON_ONCE(!parent);
-   dentry = eventfs_start_creating(name, parent);
-
-   if (IS_ERR(dentry))
-   return dentry;
-
inode = tracefs_get_inode(dentry->d_sb);
if (unlikely(!inode))
-   return eventfs_failed_creating(dentry);
+   return ERR_PTR(-ENOMEM);
 
/* If the user updated the directory's attributes, use them */
update_inode_attr(dentry, inode, attr, mode);
@@ -329,32 +320,29 @@ static struct dentry *create_file(const char *name, 
umode_t mode,
 
ti = get_tracefs(inode);
ti->flags |= TRACEFS_EVENT_INODE;
-   d_instantiate(dentry, inode);
+
+   d_add(dentry, inode);
fsnotify_create(dentry->d_parent->d_inode, dentry);
-   return eventfs_end_creating(dentry);
+   return dentry;
 };
 
 /**
- * create_dir - create a dir in the tracefs filesystem
+ * lookup_dir_entry - look up a dir in the tracefs filesystem
+ * @dentry: the directory to look up
  * @ei: the eventfs_inode that represents the directory to create
- * @parent: parent dentry for this file.
  *
- * This function will create a dentry for a directory represented by
+ * This function will look up a dentry 

Re: [PATCH v2 4/7] tracefs: dentry lookup crapectomy

2024-01-31 Thread Steven Rostedt
On Wed, 31 Jan 2024 22:21:27 -0500
Steven Rostedt  wrote:

> We (Linus and I) got it wrong. It originally had:
> 
>   d_add(dentry, NULL);
>   [..]
>   return NULL;

OK, so I changed that function to this:

static struct dentry *eventfs_root_lookup(struct inode *dir,
  struct dentry *dentry,
  unsigned int flags)
{
struct eventfs_inode *ei_child;
struct tracefs_inode *ti;
struct eventfs_inode *ei;
const char *name = dentry->d_name.name;

ti = get_tracefs(dir);
if (!(ti->flags & TRACEFS_EVENT_INODE))
return ERR_PTR(-EIO);

	mutex_lock(&eventfs_mutex);

ei = ti->private;
if (!ei || ei->is_freed)
goto out;

	list_for_each_entry(ei_child, &ei->children, list) {
if (strcmp(ei_child->name, name) != 0)
continue;
if (ei_child->is_freed)
goto out;
lookup_dir_entry(dentry, ei, ei_child);
goto out;
}

for (int i = 0; i < ei->nr_entries; i++) {
void *data;
umode_t mode;
const struct file_operations *fops;
		const struct eventfs_entry *entry = &ei->entries[i];

if (strcmp(name, entry->name) != 0)
continue;

data = ei->data;
		if (entry->callback(name, &mode, &data, &fops) <= 0)
goto out;

lookup_file_dentry(dentry, ei, i, mode, data, fops);
goto out;
}
 out:
	mutex_unlock(&eventfs_mutex);
return NULL;
}

And it passes the make kprobe test. I'll send out a v3 of this patch, and
remove the inc_nlink(dentry->d_parent->d_inode) and the fsnotify as
separate patches as that code was there before Linus touched it.

Thanks,

-- Steve



Re: [PATCH RFC v3 12/35] mm: Call arch_swap_prepare_to_restore() before arch_swap_restore()

2024-01-31 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> arm64 uses arch_swap_restore() to restore saved tags before the page is
> swapped in and it's called in atomic context (with the ptl lock held).
> 
> Introduce arch_swap_prepare_to_restore() that will allow an architecture to
> perform extra work during swap in and outside of a critical section.
> This will be used by arm64 to allocate a buffer in memory where to
> temporarily save tags if tag storage is not available for the page being
> swapped in.

Just wondering - will tag storage always be unavailable for tagged pages
being swapped in? Or are there cases where the allocation might not even
be required? And does this prepare phase need to be outside the critical
section only because there might be memory allocations (as in the
generic sketch below)?
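
For context, the constraint being probed here is the usual two-phase
pattern: a sleeping allocation must happen before the ptl is taken, and
the atomic restore step may only consume what was pre-allocated. A
generic sketch (all names hypothetical, not the code of this series):

	void *tag_buf;

	/* prepare: outside the critical section, may sleep */
	tag_buf = kmalloc(TAG_BUF_SIZE, GFP_KERNEL);
	if (!tag_buf)
		return VM_FAULT_OOM;

	/* restore: under the ptl, atomic - no allocations allowed */
	spin_lock(ptl);
	arch_swap_restore(entry, folio);	/* would consume tag_buf */
	spin_unlock(ptl);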



Re: [PATCH v2 4/7] tracefs: dentry lookup crapectomy

2024-01-31 Thread Steven Rostedt
On Thu, 1 Feb 2024 03:02:05 +
Al Viro  wrote:

> > We had a problem here with just returning NULL. It leaves the negative
> > dentry around and doesn't get refreshed.  
> 
> Why would that dentry stick around?  And how would anyone find
> it, anyway, when it's not hashed?

We (Linus and I) got it wrong. It originally had:

d_add(dentry, NULL);
[..]
return NULL;

and it caused the:


  # ls events/kprobes/sched/
ls: cannot access 'events/kprobes/sched/': No such file or directory

  # echo 'p:sched schedule' >> /sys/kernel/tracing/kprobe_events 
  # ls events/kprobes/sched/
ls: cannot access 'events/kprobes/sched/': No such file or directory

I just changed the code to simply return NULL, and it had no issues:

  # ls events/kprobes/sched/
ls: cannot access 'events/kprobes/sched/': No such file or directory

  # echo 'p:sched schedule' >> /sys/kernel/tracing/kprobe_events 
  # ls events/kprobes/sched/
enable  filter  format  hist  hist_debug  id  inject  trigger

But then I added the: d_add(dentry, NULL); that we originally had, and then
it caused the issue again.

So it wasn't the returning NULL that was causing a problem, it was calling
the d_add(dentry, NULL); that was.

I'll update the patch.

-- Steve



[PATCH v4 6/6] LoongArch: Add pv ipi support on LoongArch system

2024-01-31 Thread Bibo Mao
On LoongArch systems, the ipi hw uses iocsr registers: there is one
iocsr register access on ipi sending, and two iocsr accesses in the ipi
interrupt handler on receiving. In VM mode, every iocsr register access
causes the VM to trap into the hypervisor. So one hw ipi notification
costs three traps in total.

This patch adds pv ipi support for the VM: the hypercall instruction is
used by the ipi sender, and the hypervisor injects an SWI into the VM.
In the SWI interrupt handler, only the estat CSR register is written to
clear the irq, and estat CSR register access does not trap into the
hypervisor. So with pv ipi supported, the pv ipi sender traps into the
hypervisor once and the pv ipi receiver does not trap at all - only one
trap in total.

Also this patch adds ipi multicast support; the method is similar to
x86. With ipi multicast support, an ipi notification can be sent to at
most 128 vcpus at one time, which greatly reduces the number of traps
into the hypervisor.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/include/asm/hardirq.h   |   1 +
 arch/loongarch/include/asm/kvm_host.h  |   1 +
 arch/loongarch/include/asm/kvm_para.h  | 124 +
 arch/loongarch/include/asm/loongarch.h |   1 +
 arch/loongarch/kernel/irq.c|   2 +-
 arch/loongarch/kernel/paravirt.c   | 113 ++
 arch/loongarch/kernel/smp.c|   2 +-
 arch/loongarch/kvm/exit.c  |  73 ++-
 arch/loongarch/kvm/vcpu.c  |   1 +
 9 files changed, 314 insertions(+), 4 deletions(-)

diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 9f0038e19c7f..8a611843c1f0 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -21,6 +21,7 @@ enum ipi_msg_type {
 typedef struct {
unsigned int ipi_irqs[NR_IPI];
unsigned int __softirq_pending;
+	atomic_t messages ____cacheline_aligned_in_smp;
 } ____cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/loongarch/include/asm/kvm_host.h 
b/arch/loongarch/include/asm/kvm_host.h
index 57399d7cf8b7..1bf927e2bfac 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -43,6 +43,7 @@ struct kvm_vcpu_stat {
u64 idle_exits;
u64 cpucfg_exits;
u64 signal_exits;
+   u64 hvcl_exits;
 };
 
 #define KVM_MEM_HUGEPAGE_CAPABLE   (1UL << 0)
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 41200e922a82..a25a84e372b9 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -9,6 +9,10 @@
 #define HYPERVISOR_VENDOR_SHIFT8
 #define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
 
+#define KVM_HC_CODE_SERVICE0
+#define KVM_HC_SERVICE HYPERCALL_CODE(HYPERVISOR_KVM, 
KVM_HC_CODE_SERVICE)
+#define  KVM_HC_FUNC_IPI   1
+
 /*
  * LoongArch hypcall return code
  */
@@ -16,6 +20,126 @@
 #define KVM_HC_INVALID_CODE-1UL
 #define KVM_HC_INVALID_PARAMETER   -2UL
 
+/*
+ * Hypercalls interface for KVM hypervisor
+ *
+ * a0: function identifier
+ * a1-a6: args
+ * Return value will be placed in v0.
+ * Up to 6 arguments are passed in a1, a2, a3, a4, a5, a6.
+ */
+static __always_inline long kvm_hypercall(u64 fid)
+{
+   register long ret asm("v0");
+   register unsigned long fun asm("a0") = fid;
+
+   __asm__ __volatile__(
+   "hvcl "__stringify(KVM_HC_SERVICE)
+   : "=r" (ret)
+   : "r" (fun)
+   : "memory"
+   );
+
+   return ret;
+}
+
+static __always_inline long kvm_hypercall1(u64 fid, unsigned long arg0)
+{
+   register long ret asm("v0");
+   register unsigned long fun asm("a0") = fid;
+   register unsigned long a1  asm("a1") = arg0;
+
+   __asm__ __volatile__(
+   "hvcl "__stringify(KVM_HC_SERVICE)
+   : "=r" (ret)
+   : "r" (fun), "r" (a1)
+   : "memory"
+   );
+
+   return ret;
+}
+
+static __always_inline long kvm_hypercall2(u64 fid,
+   unsigned long arg0, unsigned long arg1)
+{
+   register long ret asm("v0");
+   register unsigned long fun asm("a0") = fid;
+   register unsigned long a1  asm("a1") = arg0;
+   register unsigned long a2  asm("a2") = arg1;
+
+   __asm__ __volatile__(
+   "hvcl "__stringify(KVM_HC_SERVICE)
+   : "=r" (ret)
+   : "r" (fun), "r" (a1), "r" (a2)
+   : "memory"
+   );
+
+   return ret;
+}
+
+static __always_inline long kvm_hypercall3(u64 fid,
+   unsigned long arg0, unsigned long arg1, unsigned long arg2)
+{
+   register long ret asm("v0");
+   register unsigned long fun asm("a0") = fid;
+   register unsigned long a1  asm("a1") = arg0;
+   
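
The diff is truncated here. For orientation, a hedged sketch of the
multicast send path the changelog describes - up to 128 destination
vcpus encoded as two 64-bit bitmaps relative to the lowest cpu, sent
with a single hypercall. Helper names are illustrative; the
logical-to-physical cpuid mapping and the per-cpu message buffer that
carries the action are glossed over:

	static void pv_send_ipi_mask(const struct cpumask *mask, unsigned int action)
	{
		unsigned int cpu, low = UINT_MAX;
		u64 bitmap[2] = { 0, 0 };

		if (cpumask_empty(mask))
			return;

		/* encode cpus relative to the lowest one in the mask */
		for_each_cpu(cpu, mask)
			low = min(low, cpu);

		for_each_cpu(cpu, mask) {
			if (cpu - low < 128)	/* else: fall back to single IPIs */
				__set_bit(cpu - low, (unsigned long *)bitmap);
		}

		/* one trap for the whole mask instead of one per vcpu */
		kvm_hypercall3(KVM_HC_FUNC_IPI, bitmap[0], bitmap[1], low);
	}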

[PATCH v4 5/6] LoongArch: KVM: Add vcpu search support from physical cpuid

2024-01-31 Thread Bibo Mao
The physical cpuid is used for interrupt routing by irqchips such as
the ipi/msi/extioi interrupt controllers. The physical cpuid is stored
in the CSR register LOONGARCH_CSR_CPUID; it cannot be changed once the
vcpu is created, and two vcpus cannot share the same physical cpuid.
Since different irqchips declare different physical cpuid sizes, KVM
uses the smallest one, from the extioi irqchip, and the max cpuid size
is defined as 256.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/include/asm/kvm_host.h | 26 
 arch/loongarch/include/asm/kvm_vcpu.h |  1 +
 arch/loongarch/kvm/vcpu.c | 93 ++-
 arch/loongarch/kvm/vm.c   | 11 
 4 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/include/asm/kvm_host.h 
b/arch/loongarch/include/asm/kvm_host.h
index 2d62f7b0d377..57399d7cf8b7 100644
--- a/arch/loongarch/include/asm/kvm_host.h
+++ b/arch/loongarch/include/asm/kvm_host.h
@@ -64,6 +64,30 @@ struct kvm_world_switch {
 
 #define MAX_PGTABLE_LEVELS 4
 
+/*
+ * Physical cpu id is used for interrupt routing, there are different
+ * definitions about physical cpuid on different hardwares.
+ *  For LOONGARCH_CSR_CPUID register, max cpuid size is 512
+ *  For IPI HW, max dest CPUID size 1024
+ *  For extioi interrupt controller, max dest CPUID size is 256
+ *  For MSI interrupt controller, max supported CPUID size is 65536
+ *
+ * Currently max CPUID is defined as 256 for KVM hypervisor, in future
+ * it will be expanded to 4096, including 16 packages at most. And every
+ * package supports at most 256 vcpus
+ */
+#define KVM_MAX_PHYID  256
+
+struct kvm_phyid_info {
+   struct kvm_vcpu *vcpu;
+   boolenabled;
+};
+
+struct kvm_phyid_map {
+   int max_phyid;
+   struct kvm_phyid_info phys_map[KVM_MAX_PHYID];
+};
+
 struct kvm_arch {
/* Guest physical mm */
kvm_pte_t *pgd;
@@ -71,6 +95,8 @@ struct kvm_arch {
unsigned long invalid_ptes[MAX_PGTABLE_LEVELS];
unsigned int  pte_shifts[MAX_PGTABLE_LEVELS];
unsigned int  root_level;
+   struct mutex  phyid_map_lock;
+   struct kvm_phyid_map  *phyid_map;
 
s64 time_offset;
struct kvm_context __percpu *vmcs;
diff --git a/arch/loongarch/include/asm/kvm_vcpu.h 
b/arch/loongarch/include/asm/kvm_vcpu.h
index 0cb4fdb8a9b5..9f53950959da 100644
--- a/arch/loongarch/include/asm/kvm_vcpu.h
+++ b/arch/loongarch/include/asm/kvm_vcpu.h
@@ -81,6 +81,7 @@ void kvm_save_timer(struct kvm_vcpu *vcpu);
 void kvm_restore_timer(struct kvm_vcpu *vcpu);
 
 int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
+struct kvm_vcpu *kvm_get_vcpu_by_cpuid(struct kvm *kvm, int cpuid);
 
 /*
  * Loongarch KVM guest interrupt handling
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 27701991886d..97ca9c7160e6 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -274,6 +274,95 @@ static int _kvm_getcsr(struct kvm_vcpu *vcpu, unsigned int 
id, u64 *val)
return 0;
 }
 
+static inline int kvm_set_cpuid(struct kvm_vcpu *vcpu, u64 val)
+{
+   int cpuid;
+   struct loongarch_csrs *csr = vcpu->arch.csr;
+   struct kvm_phyid_map  *map;
+
+   if (val >= KVM_MAX_PHYID)
+   return -EINVAL;
+
+	cpuid = kvm_read_sw_gcsr(csr, LOONGARCH_CSR_CPUID);
+   map = vcpu->kvm->arch.phyid_map;
+	mutex_lock(&vcpu->kvm->arch.phyid_map_lock);
+   if (map->phys_map[cpuid].enabled) {
+   /*
+* Cpuid is already set before
+* Forbid changing different cpuid at runtime
+*/
+   if (cpuid != val) {
+   /*
+* Cpuid 0 is initial value for vcpu, maybe invalid
+* unset value for vcpu
+*/
+   if (cpuid) {
+			mutex_unlock(&vcpu->kvm->arch.phyid_map_lock);
+   return -EINVAL;
+   }
+	} else {
+		/* Discard duplicated cpuid set */
+		mutex_unlock(&vcpu->kvm->arch.phyid_map_lock);
+		return 0;
+	}
+   }
+
+   if (map->phys_map[val].enabled) {
+   /*
+* New cpuid is already set with other vcpu
+* Forbid sharing the same cpuid between different vcpus
+*/
+   if (map->phys_map[val].vcpu != vcpu) {
+			mutex_unlock(&vcpu->kvm->arch.phyid_map_lock);
+   return -EINVAL;
+   }
+
+   /* Discard duplicated cpuid set operation*/
+	mutex_unlock(&vcpu->kvm->arch.phyid_map_lock);
+   return 0;
+   }
+
+   kvm_write_sw_gcsr(csr, LOONGARCH_CSR_CPUID, val);
+   map->phys_map[val].enabled  = true;
+   map->phys_map[val].vcpu = vcpu;
+   if 

[PATCH v4 4/6] LoongArch: Add paravirt interface for guest kernel

2024-01-31 Thread Bibo Mao
This patch adds a paravirt interface for the guest kernel. The function
pv_guest_init() first checks whether the system runs in VM mode. If the
kernel runs in VM mode, it calls kvm_para_available() to detect whether
the current VMM is the KVM hypervisor. The paravirt functions can work
only if the current VMM is the KVM hypervisor, since KVM is the only
hypervisor supported on LoongArch now.

This patch only adds the paravirt interface for the guest kernel; no
effective pv functions are added yet.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/Kconfig|  9 
 arch/loongarch/include/asm/kvm_para.h |  7 
 arch/loongarch/include/asm/paravirt.h | 27 
 .../include/asm/paravirt_api_clock.h  |  1 +
 arch/loongarch/kernel/Makefile|  1 +
 arch/loongarch/kernel/paravirt.c  | 41 +++
 arch/loongarch/kernel/setup.c |  2 +
 7 files changed, 88 insertions(+)
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 create mode 100644 arch/loongarch/kernel/paravirt.c

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6c3583..817a56dff80f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -585,6 +585,15 @@ config CPU_HAS_PREFETCH
bool
default y
 
+config PARAVIRT
+	bool "Enable paravirtualization code"
+	depends on AS_HAS_LVZ_EXTENSION
+	help
+	  This changes the kernel so it can modify itself when it is run
+	  under a hypervisor, potentially improving performance significantly
+	  over full virtualization.  However, when run without a hypervisor
+	  the kernel is theoretically slower and slightly larger.
+
 config ARCH_SUPPORTS_KEXEC
def_bool y
 
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
index 9425d3b7e486..41200e922a82 100644
--- a/arch/loongarch/include/asm/kvm_para.h
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -2,6 +2,13 @@
 #ifndef _ASM_LOONGARCH_KVM_PARA_H
 #define _ASM_LOONGARCH_KVM_PARA_H
 
+/*
+ * Hypcall code field
+ */
+#define HYPERVISOR_KVM 1
+#define HYPERVISOR_VENDOR_SHIFT8
+#define HYPERCALL_CODE(vendor, code)   ((vendor << HYPERVISOR_VENDOR_SHIFT) + 
code)
+
 /*
  * LoongArch hypcall return code
  */
diff --git a/arch/loongarch/include/asm/paravirt.h 
b/arch/loongarch/include/asm/paravirt.h
new file mode 100644
index ..b64813592ba0
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_PARAVIRT_H
+#define _ASM_LOONGARCH_PARAVIRT_H
+
+#ifdef CONFIG_PARAVIRT
+#include 
+struct static_key;
+extern struct static_key paravirt_steal_enabled;
+extern struct static_key paravirt_steal_rq_enabled;
+
+u64 dummy_steal_clock(int cpu);
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+
+static inline u64 paravirt_steal_clock(int cpu)
+{
+   return static_call(pv_steal_clock)(cpu);
+}
+
+int pv_guest_init(void);
+#else
+static inline int pv_guest_init(void)
+{
+   return 0;
+}
+
+#endif // CONFIG_PARAVIRT
+#endif
diff --git a/arch/loongarch/include/asm/paravirt_api_clock.h 
b/arch/loongarch/include/asm/paravirt_api_clock.h
new file mode 100644
index ..65ac7cee0dad
--- /dev/null
+++ b/arch/loongarch/include/asm/paravirt_api_clock.h
@@ -0,0 +1 @@
+#include 
diff --git a/arch/loongarch/kernel/Makefile b/arch/loongarch/kernel/Makefile
index 3c808c680370..662e6e9de12d 100644
--- a/arch/loongarch/kernel/Makefile
+++ b/arch/loongarch/kernel/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_MODULES) += module.o module-sections.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 
 obj-$(CONFIG_PROC_FS)  += proc.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 
 obj-$(CONFIG_SMP)  += smp.o
 
diff --git a/arch/loongarch/kernel/paravirt.c b/arch/loongarch/kernel/paravirt.c
new file mode 100644
index ..21d01d05791a
--- /dev/null
+++ b/arch/loongarch/kernel/paravirt.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct static_key paravirt_steal_enabled;
+struct static_key paravirt_steal_rq_enabled;
+
+static u64 native_steal_clock(int cpu)
+{
+   return 0;
+}
+
+DEFINE_STATIC_CALL(pv_steal_clock, native_steal_clock);
+
+static bool kvm_para_available(void)
+{
+   static int hypervisor_type;
+   int config;
+
+   if (!hypervisor_type) {
+   config = read_cpucfg(CPUCFG_KVM_SIG);
+	if (!memcmp(&config, KVM_SIGNATURE, 4))
+   hypervisor_type = HYPERVISOR_KVM;
+   }
+
+   return hypervisor_type == HYPERVISOR_KVM;
+}
+
+int __init pv_guest_init(void)
+{
+   if (!cpu_has_hypervisor)
+   return 0;
+   if (!kvm_para_available())
+   return 0;
+
+ 

[PATCH v4 0/6] LoongArch: Add pv ipi support on LoongArch VM

2024-01-31 Thread Bibo Mao
This patchset adds pv ipi support for VM. On a physical machine, the
ipi HW uses IOCSR registers, but in VM mode every vcpu access to an
IOCSR register traps into the hypervisor. SWI is an interrupt mechanism
like SGI on ARM: software can send an interrupt to a CPU, except that
on LoongArch SWI can currently only be sent to the local CPU. So SWI
cannot be used for IPI on real HW systems, but it can be used on a VM
when combined with the hypercall method. This patchset uses the SWI
interrupt for the IPI mechanism, with SWI injection done via hypercall.
So there is one trap on IPI sending and no trap on IPI receiving; with
the IOCSR HW ipi method there would be two traps into the hypervisor on
ipi receiving.

Also this patchset adds IPI multicast support for VM; the idea comes
from x86 pv ipi. An IPI can be sent to 128 vcpus at one time.

Here is the microbenchmark data for the perf bench futex wake case on a
3C5000 single-way machine. There are 16 cpus on a 3C5000 single-way
machine, and the VM has 16 vcpus as well. The benchmark data is the
time in ms to wake up 16 threads; smaller is better.

perf bench futex wake, Wokeup 16 of 16 threads in ms
--physical machine--   --VM original--   --VM with pv ipi patch--
  0.0176 ms   0.1140 ms0.0481 ms

---
Change in V4:
  1. Modify pv ipi hook function names call_func_ipi() and
call_func_single_ipi() to send_ipi_mask()/send_ipi_single(), since pv
ipi is used for both remote function call and reschedule notification.
  2. Refresh changelog.

Change in V3:
  1. Add 128 vcpu ipi multicast support like x86
  2. Change cpucfg base address from 0x10000000 to 0x40000000, in order
to avoid conflicts with future hw usage
  3. Adjust patch order in this patchset, move patch
Refine-ipi-ops-on-LoongArch-platform to the first one.

Change in V2:
  1. Add hw cpuid map support since ipi routing uses hw cpuid
  2. Refine changelog description
  3. Add hypercall statistic support for vcpu
  4. Set percpu pv ipi message buffer aligned with cacheline
  5. Refine pv ipi send logic, do not send ipi message with if there is
pending ipi message.
---

Bibo Mao (6):
  LoongArch/smp: Refine ipi ops on LoongArch platform
  LoongArch: KVM: Add hypercall instruction emulation support
  LoongArch: KVM: Add cpucfg area for kvm hypervisor
  LoongArch: Add paravirt interface for guest kernel
  LoongArch: KVM: Add vcpu search support from physical cpuid
  LoongArch: Add pv ipi support on LoongArch system

 arch/loongarch/Kconfig|   9 +
 arch/loongarch/include/asm/Kbuild |   1 -
 arch/loongarch/include/asm/hardirq.h  |   5 +
 arch/loongarch/include/asm/inst.h |   1 +
 arch/loongarch/include/asm/irq.h  |  10 +-
 arch/loongarch/include/asm/kvm_host.h |  27 +++
 arch/loongarch/include/asm/kvm_para.h | 157 ++
 arch/loongarch/include/asm/kvm_vcpu.h |   1 +
 arch/loongarch/include/asm/loongarch.h|  11 ++
 arch/loongarch/include/asm/paravirt.h |  27 +++
 .../include/asm/paravirt_api_clock.h  |   1 +
 arch/loongarch/include/asm/smp.h  |  31 ++--
 arch/loongarch/include/uapi/asm/Kbuild|   2 -
 arch/loongarch/kernel/Makefile|   1 +
 arch/loongarch/kernel/irq.c   |  24 +--
 arch/loongarch/kernel/paravirt.c  | 154 +
 arch/loongarch/kernel/perf_event.c|  14 +-
 arch/loongarch/kernel/setup.c |   2 +
 arch/loongarch/kernel/smp.c   |  60 ---
 arch/loongarch/kernel/time.c  |  12 +-
 arch/loongarch/kvm/exit.c | 125 --
 arch/loongarch/kvm/vcpu.c |  94 ++-
 arch/loongarch/kvm/vm.c   |  11 ++
 23 files changed, 678 insertions(+), 102 deletions(-)
 create mode 100644 arch/loongarch/include/asm/kvm_para.h
 create mode 100644 arch/loongarch/include/asm/paravirt.h
 create mode 100644 arch/loongarch/include/asm/paravirt_api_clock.h
 delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild
 create mode 100644 arch/loongarch/kernel/paravirt.c


base-commit: 1bbb19b6eb1b8685ab1c268a401ea64380b8bbcb
-- 
2.39.3




[PATCH v4 2/6] LoongArch: KVM: Add hypercall instruction emulation support

2024-01-31 Thread Bibo Mao
On LoongArch systems, the hypercall instruction is supported when the
system runs in VM mode. This patch adds a dummy function for hypercall
instruction emulation, rather than injecting an EXCCODE_INE invalid
instruction exception.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/include/asm/Kbuild  |  1 -
 arch/loongarch/include/asm/kvm_para.h  | 26 ++
 arch/loongarch/include/uapi/asm/Kbuild |  2 --
 arch/loongarch/kvm/exit.c  | 10 ++
 4 files changed, 36 insertions(+), 3 deletions(-)
 create mode 100644 arch/loongarch/include/asm/kvm_para.h
 delete mode 100644 arch/loongarch/include/uapi/asm/Kbuild

diff --git a/arch/loongarch/include/asm/Kbuild 
b/arch/loongarch/include/asm/Kbuild
index 93783fa24f6e..22991a6f0e2b 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -23,4 +23,3 @@ generic-y += poll.h
 generic-y += param.h
 generic-y += posix_types.h
 generic-y += resource.h
-generic-y += kvm_para.h
diff --git a/arch/loongarch/include/asm/kvm_para.h 
b/arch/loongarch/include/asm/kvm_para.h
new file mode 100644
index ..9425d3b7e486
--- /dev/null
+++ b/arch/loongarch/include/asm/kvm_para.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_LOONGARCH_KVM_PARA_H
+#define _ASM_LOONGARCH_KVM_PARA_H
+
+/*
+ * LoongArch hypcall return code
+ */
+#define KVM_HC_STATUS_SUCCESS  0
+#define KVM_HC_INVALID_CODE-1UL
+#define KVM_HC_INVALID_PARAMETER   -2UL
+
+static inline unsigned int kvm_arch_para_features(void)
+{
+   return 0;
+}
+
+static inline unsigned int kvm_arch_para_hints(void)
+{
+   return 0;
+}
+
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+   return false;
+}
+#endif /* _ASM_LOONGARCH_KVM_PARA_H */
diff --git a/arch/loongarch/include/uapi/asm/Kbuild 
b/arch/loongarch/include/uapi/asm/Kbuild
deleted file mode 100644
index 4aa680ca2e5f..
--- a/arch/loongarch/include/uapi/asm/Kbuild
+++ /dev/null
@@ -1,2 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-generic-y += kvm_para.h
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index ed1d89d53e2e..d15c71320a11 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -685,6 +685,15 @@ static int kvm_handle_lasx_disabled(struct kvm_vcpu *vcpu)
return RESUME_GUEST;
 }
 
+static int kvm_handle_hypcall(struct kvm_vcpu *vcpu)
+{
+   update_pc(>arch);
+
+	/* Treat it as a noop instruction, only set return value */
+   vcpu->arch.gprs[LOONGARCH_GPR_A0] = KVM_HC_INVALID_CODE;
+   return RESUME_GUEST;
+}
+
 /*
  * LoongArch KVM callback handling for unimplemented guest exiting
  */
@@ -716,6 +725,7 @@ static exit_handle_fn kvm_fault_tables[EXCCODE_INT_START] = 
{
[EXCCODE_LSXDIS]= kvm_handle_lsx_disabled,
[EXCCODE_LASXDIS]   = kvm_handle_lasx_disabled,
[EXCCODE_GSPR]  = kvm_handle_gspr,
+   [EXCCODE_HVC]   = kvm_handle_hypcall,
 };
 
 int kvm_handle_fault(struct kvm_vcpu *vcpu, int fault)
-- 
2.39.3




[PATCH v4 3/6] LoongArch: KVM: Add cpucfg area for kvm hypervisor

2024-01-31 Thread Bibo Mao
VM will trap into the hypervisor when executing the cpucfg instruction.
Hardware only uses the area 0 - 20 for actual usage now; here one
specified area 0x40000000 -- 0x400000ff is used for the KVM hypervisor,
and the area can be extended for use by other hypervisors in the future.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/include/asm/inst.h  |  1 +
 arch/loongarch/include/asm/loongarch.h | 10 ++
 arch/loongarch/kvm/exit.c  | 46 +-
 3 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/arch/loongarch/include/asm/inst.h 
b/arch/loongarch/include/asm/inst.h
index d8f637f9e400..ad120f924905 100644
--- a/arch/loongarch/include/asm/inst.h
+++ b/arch/loongarch/include/asm/inst.h
@@ -67,6 +67,7 @@ enum reg2_op {
revhd_op= 0x11,
extwh_op= 0x16,
extwb_op= 0x17,
+   cpucfg_op   = 0x1b,
iocsrrdb_op = 0x19200,
iocsrrdh_op = 0x19201,
iocsrrdw_op = 0x19202,
diff --git a/arch/loongarch/include/asm/loongarch.h 
b/arch/loongarch/include/asm/loongarch.h
index 46366e783c84..a1d22e8b6f94 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -158,6 +158,16 @@
 #define  CPUCFG48_VFPU_CG  BIT(2)
 #define  CPUCFG48_RAM_CG   BIT(3)
 
+/*
+ * cpucfg index area: 0x40000000 -- 0x400000ff
+ * SW emulation for KVM hypervisor
+ */
+#define CPUCFG_KVM_BASE			0x40000000UL
+#define CPUCFG_KVM_SIZE			0x100
+#define CPUCFG_KVM_SIG			CPUCFG_KVM_BASE
+#define  KVM_SIGNATURE			"KVM\0"
+#define CPUCFG_KVM_FEATURE		(CPUCFG_KVM_BASE + 4)
+
 #ifndef __ASSEMBLY__
 
 /* CSR */
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index d15c71320a11..f4e4df05f578 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -206,10 +206,37 @@ int kvm_emu_idle(struct kvm_vcpu *vcpu)
return EMULATE_DONE;
 }
 
-static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
 {
int rd, rj;
unsigned int index;
+
+   rd = inst.reg2_format.rd;
+   rj = inst.reg2_format.rj;
+   ++vcpu->stat.cpucfg_exits;
+   index = vcpu->arch.gprs[rj];
+
+   /*
+* By LoongArch Reference Manual 2.2.10.5
+* Return value is 0 for undefined cpucfg index
+*/
+   switch (index) {
+   case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
+   vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
+   break;
+   case CPUCFG_KVM_SIG:
+   vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
+   break;
+   default:
+   vcpu->arch.gprs[rd] = 0;
+   break;
+   }
+
+   return EMULATE_DONE;
+}
+
+static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+{
unsigned long curr_pc;
larch_inst inst;
enum emulation_result er = EMULATE_DONE;
@@ -224,21 +251,8 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
er = EMULATE_FAIL;
switch (((inst.word >> 24) & 0xff)) {
case 0x0: /* CPUCFG GSPR */
-   if (inst.reg2_format.opcode == 0x1B) {
-   rd = inst.reg2_format.rd;
-   rj = inst.reg2_format.rj;
-   ++vcpu->stat.cpucfg_exits;
-   index = vcpu->arch.gprs[rj];
-   er = EMULATE_DONE;
-   /*
-* By LoongArch Reference Manual 2.2.10.5
-* return value is 0 for undefined cpucfg index
-*/
-   if (index < KVM_MAX_CPUCFG_REGS)
-   vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
-   else
-   vcpu->arch.gprs[rd] = 0;
-   }
+   if (inst.reg2_format.opcode == cpucfg_op)
+   er = kvm_emu_cpucfg(vcpu, inst);
break;
case 0x4: /* CSR{RD,WR,XCHG} GSPR */
er = kvm_handle_csr(vcpu, inst);
-- 
2.39.3




[PATCH v4 1/6] LoongArch/smp: Refine ipi ops on LoongArch platform

2024-01-31 Thread Bibo Mao
This patch refines ipi handling on the LoongArch platform; there are
three changes in this patch.
1. Add a generic get_percpu_irq() api and replace percpu irq functions
such as get_ipi_irq()/get_pmc_irq()/get_timer_irq() with
get_percpu_irq().

2. Change the definition of the action parameter of
loongson_send_ipi_single() and loongson_send_ipi_mask(). Normal decimal
encoding is used rather than binary bitmap encoding for the ipi action:
the ipi hw sender uses the decimal action code, the ipi receiver gets
the binary bitmap encoding, and the ipi hw converts it into a bitmap in
the ipi message buffer.

3. Add a struct smp_ops on the LoongArch platform so that pv ipi can be
used later.

Signed-off-by: Bibo Mao 
---
 arch/loongarch/include/asm/hardirq.h |  4 ++
 arch/loongarch/include/asm/irq.h | 10 -
 arch/loongarch/include/asm/smp.h | 31 +++
 arch/loongarch/kernel/irq.c  | 22 +--
 arch/loongarch/kernel/perf_event.c   | 14 +--
 arch/loongarch/kernel/smp.c  | 58 +++-
 arch/loongarch/kernel/time.c | 12 +-
 7 files changed, 71 insertions(+), 80 deletions(-)

diff --git a/arch/loongarch/include/asm/hardirq.h 
b/arch/loongarch/include/asm/hardirq.h
index 0ef3b18f8980..9f0038e19c7f 100644
--- a/arch/loongarch/include/asm/hardirq.h
+++ b/arch/loongarch/include/asm/hardirq.h
@@ -12,6 +12,10 @@
 extern void ack_bad_irq(unsigned int irq);
 #define ack_bad_irq ack_bad_irq
 
+enum ipi_msg_type {
+   IPI_RESCHEDULE,
+   IPI_CALL_FUNCTION,
+};
 #define NR_IPI 2
 
 typedef struct {
diff --git a/arch/loongarch/include/asm/irq.h b/arch/loongarch/include/asm/irq.h
index 218b4da0ea90..00101b6d601e 100644
--- a/arch/loongarch/include/asm/irq.h
+++ b/arch/loongarch/include/asm/irq.h
@@ -117,8 +117,16 @@ extern struct fwnode_handle *liointc_handle;
 extern struct fwnode_handle *pch_lpc_handle;
 extern struct fwnode_handle *pch_pic_handle[MAX_IO_PICS];
 
-extern irqreturn_t loongson_ipi_interrupt(int irq, void *dev);
+static inline int get_percpu_irq(int vector)
+{
+   struct irq_domain *d;
+
+   d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
+   if (d)
+   return irq_create_mapping(d, vector);
 
+   return -EINVAL;
+}
 #include 
 
 #endif /* _ASM_IRQ_H */
diff --git a/arch/loongarch/include/asm/smp.h b/arch/loongarch/include/asm/smp.h
index f81e5f01d619..8a42632b038a 100644
--- a/arch/loongarch/include/asm/smp.h
+++ b/arch/loongarch/include/asm/smp.h
@@ -12,6 +12,13 @@
 #include 
 #include 
 
+struct smp_ops {
+   void (*init_ipi)(void);
+   void (*send_ipi_mask)(const struct cpumask *mask, unsigned int action);
+   void (*send_ipi_single)(int cpu, unsigned int action);
+};
+
+extern struct smp_ops smp_ops;
 extern int smp_num_siblings;
 extern int num_processors;
 extern int disabled_cpus;
@@ -24,8 +31,6 @@ void loongson_prepare_cpus(unsigned int max_cpus);
 void loongson_boot_secondary(int cpu, struct task_struct *idle);
 void loongson_init_secondary(void);
 void loongson_smp_finish(void);
-void loongson_send_ipi_single(int cpu, unsigned int action);
-void loongson_send_ipi_mask(const struct cpumask *mask, unsigned int action);
 #ifdef CONFIG_HOTPLUG_CPU
 int loongson_cpu_disable(void);
 void loongson_cpu_die(unsigned int cpu);
@@ -59,9 +64,12 @@ extern int __cpu_logical_map[NR_CPUS];
 
 #define cpu_physical_id(cpu)   cpu_logical_map(cpu)
 
-#define SMP_BOOT_CPU   0x1
-#define SMP_RESCHEDULE 0x2
-#define SMP_CALL_FUNCTION  0x4
+#define ACTTION_BOOT_CPU   0
+#define ACTTION_RESCHEDULE 1
+#define ACTTION_CALL_FUNCTION  2
+#define SMP_BOOT_CPU   BIT(ACTTION_BOOT_CPU)
+#define SMP_RESCHEDULE BIT(ACTTION_RESCHEDULE)
+#define SMP_CALL_FUNCTION  BIT(ACTTION_CALL_FUNCTION)
 
 struct secondary_data {
unsigned long stack;
@@ -71,7 +79,8 @@ extern struct secondary_data cpuboot_data;
 
 extern asmlinkage void smpboot_entry(void);
 extern asmlinkage void start_secondary(void);
-
+extern void arch_send_call_function_single_ipi(int cpu);
+extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
 extern void calculate_cpu_foreign_map(void);
 
 /*
@@ -79,16 +88,6 @@ extern void calculate_cpu_foreign_map(void);
  */
 extern void show_ipi_list(struct seq_file *p, int prec);
 
-static inline void arch_send_call_function_single_ipi(int cpu)
-{
-   loongson_send_ipi_single(cpu, SMP_CALL_FUNCTION);
-}
-
-static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
-{
-   loongson_send_ipi_mask(mask, SMP_CALL_FUNCTION);
-}
-
 #ifdef CONFIG_HOTPLUG_CPU
 static inline int __cpu_disable(void)
 {
diff --git a/arch/loongarch/kernel/irq.c b/arch/loongarch/kernel/irq.c
index 883e5066ae44..ce36897d1e5a 100644
--- a/arch/loongarch/kernel/irq.c
+++ b/arch/loongarch/kernel/irq.c
@@ -87,23 +87,9 @@ static void __init init_vec_parent_group(void)
acpi_table_parse(ACPI_SIG_MCFG, early_pci_mcfg_parse);
 }
 
-static int __init 

Re: [PATCH v2 4/7] tracefs: dentry lookup crapectomy

2024-01-31 Thread Al Viro
On Wed, Jan 31, 2024 at 09:26:42PM -0500, Steven Rostedt wrote:

> > Huh?  Just return NULL and be done with that - you'll get an
> > unhashed negative dentry and let the caller turn that into
> > -ENOENT...
> 
> We had a problem here with just returning NULL. It leaves the negative
> dentry around and doesn't get refreshed.

Why would that dentry stick around?  And how would anyone find
it, anyway, when it's not hashed?

> I did this:
> 
>  # cd /sys/kernel/tracing
>  # ls events/kprobes/sched/
> ls: cannot access 'events/kprobes/sched/': No such file or directory
>  # echo 'p:sched schedule' >> kprobe_events
>  # ls events/kprobes/sched/
> ls: cannot access 'events/kprobes/sched/': No such file or directory
> 
> When it should have been:
> 
>  # ls events/kprobes/sched/
> enable  filter  format  hist  hist_debug  id  inject  trigger
> 
> Leaving the negative dentry there will have it fail when the directory
> exists the next time.

Then you have something very deeply fucked up.  NULL or ERR_PTR(-ENOENT)
from ->lookup() in the last component of open() would do exactly the
same thing: dput() whatever had been passed to ->lookup() and fail
open(2) with -ENOENT.



Re: [PATCH v2 4/7] tracefs: dentry lookup crapectomy

2024-01-31 Thread Steven Rostedt
On Thu, 1 Feb 2024 00:27:19 +
Al Viro  wrote:

> On Wed, Jan 31, 2024 at 01:49:22PM -0500, Steven Rostedt wrote:
> 
> > @@ -329,32 +320,29 @@ static struct dentry *create_file(const char *name, 
> > umode_t mode,
> >  
> > ti = get_tracefs(inode);
> > ti->flags |= TRACEFS_EVENT_INODE;
> > -   d_instantiate(dentry, inode);
> > +
> > +   d_add(dentry, inode);
> > fsnotify_create(dentry->d_parent->d_inode, dentry);  
> 
> Seriously?  stat(2), have it go away from dcache on memory pressure,
> lather, rinse, repeat...  Won't *snotify get confused by the stream
> of creations of the same thing, with not a removal in sight?
> 

That looks to be cut and paste from the old create in tracefs. I don't know
of a real use case for that. I think we could possibly delete it without
anyone noticing.


> > -   return eventfs_end_creating(dentry);
> > +   return dentry;
> >  };
> >  


> > @@ -371,11 +359,14 @@ static struct dentry *create_dir(struct eventfs_inode 
> > *ei, struct dentry *parent
> > /* Only directories have ti->private set to an ei, not files */
> > ti->private = ei;
> >  
> > +   dentry->d_fsdata = ei;
> > +ei->dentry = dentry;   // Remove me!
> > +
> > inc_nlink(inode);
> > -   d_instantiate(dentry, inode);
> > +   d_add(dentry, inode);
> > inc_nlink(dentry->d_parent->d_inode);  
> 
> What will happen when that thing gets evicted from dcache,
> gets looked up again, and again, and...?
> 
> > fsnotify_mkdir(dentry->d_parent->d_inode, dentry);  
> 
> Same re snotify confusion...

Yeah, again, I think it's useless. Doing that is more useless than
tarring the tracefs directory ;-)

> 
> > -   return eventfs_end_creating(dentry);
> > +   return dentry;
> >  }
> >  
> >  static void free_ei(struct eventfs_inode *ei)
> > @@ -425,7 +416,7 @@ void eventfs_set_ei_status_free(struct tracefs_inode 
> > *ti, struct dentry *dentry)
> >  }
> >  


> > @@ -607,79 +462,55 @@ static struct dentry *eventfs_root_lookup(struct 
> > inode *dir,
> >   struct dentry *dentry,
> >   unsigned int flags)
> >  {
> > -   const struct file_operations *fops;
> > -   const struct eventfs_entry *entry;
> > struct eventfs_inode *ei_child;
> > struct tracefs_inode *ti;
> > struct eventfs_inode *ei;
> > -   struct dentry *ei_dentry = NULL;
> > -   struct dentry *ret = NULL;
> > -   struct dentry *d;
> > const char *name = dentry->d_name.name;
> > -   umode_t mode;
> > -   void *data;
> > -   int idx;
> > -   int i;
> > -   int r;
> > +   struct dentry *result = NULL;
> >  
> > ti = get_tracefs(dir);
> > if (!(ti->flags & TRACEFS_EVENT_INODE))  
> 
>   Can that ever happen?  I mean, why set ->i_op to something that
> has this for ->lookup() on a directory without TRACEFS_EVENT_INODE in
> its inode?  It's not as if you ever removed that flag...

That's been there mostly as paranoia. Should probably be switched to:

if (WARN_ON_ONCE(!(ti->flags & TRACEFS_EVENT_INODE)))


> 
> > -   return NULL;
> > -
> > -   /* Grab srcu to prevent the ei from going away */
> > -   idx = srcu_read_lock(&eventfs_srcu);
> > +   return ERR_PTR(-EIO);
> >  
> > -   /*
> > -* Grab the eventfs_mutex to consistent value from ti->private.
> > -* This s
> > -*/
> > mutex_lock(&eventfs_mutex);
> > -   ei = READ_ONCE(ti->private);
> > -   if (ei && !ei->is_freed)
> > -   ei_dentry = READ_ONCE(ei->dentry);
> > -   mutex_unlock(&eventfs_mutex);
> > -
> > -   if (!ei || !ei_dentry)
> > -   goto out;
> >  
> > -   data = ei->data;
> > +   ei = ti->private;
> > +   if (!ei || ei->is_freed)
> > +   goto enoent;
> >  
> > -   list_for_each_entry_srcu(ei_child, &ei->children, list,
> > -srcu_read_lock_held(&eventfs_srcu)) {
> > +   list_for_each_entry(ei_child, &ei->children, list) {
> > if (strcmp(ei_child->name, name) != 0)
> > continue;
> > -   ret = simple_lookup(dir, dentry, flags);
> > -   if (IS_ERR(ret))
> > -   goto out;
> > -   d = create_dir_dentry(ei, ei_child, ei_dentry);
> > -   dput(d);
> > +   if (ei_child->is_freed)
> > +   goto enoent;  
> 
> Out of curiosity - can that happen now?  You've got exclusion with
> eventfs_remove_rec(), so you shouldn't be able to catch the moment
> between setting ->is_freed and removal from the list...

Yeah, that's from when we just used SRCU. If anything, it too should just
add a WARN_ON_ONCE() to it.

> 
> > +   lookup_dir_entry(dentry, ei, ei_child);
> > goto out;
> > }
> >  
> > -   for (i = 0; i < ei->nr_entries; i++) {
> > -   entry = &ei->entries[i];
> > -   if (strcmp(name, entry->name) == 0) {
> > -   void *cdata = data;
> > -   mutex_lock(&eventfs_mutex);
> > -   /* If ei->is_freed, then the event itself may be too */
> > -   if 

[PATCH] remoteproc: zynqmp: fix lockstep mode memory region

2024-01-31 Thread Tanmay Shah
In lockstep mode, r5 core0 uses TCM of R5 core1. Following is lockstep
mode memory region as per hardware reference manual.

|  *TCM*             |   *R5 View* | *Linux view* |
| R5_0 ATCM (128 KB) | 0x0000_0000 | 0xFFE0_0000  |
| R5_0 BTCM (128 KB) | 0x0002_0000 | 0xFFE2_0000  |

However, the driver shouldn't model it as above, because R5 core0 TCM and
core1 TCM have different power-domains mapped to them.
Hence, the TCM address space in lockstep mode should be modeled as 64KB
regions only, where each region has its own power-domain, as follows:

|  *TCM*             |   *R5 View* | *Linux view* |
| R5_0 ATCM0 (64 KB) | 0x0000_0000 | 0xFFE0_0000  |
| R5_0 BTCM0 (64 KB) | 0x0002_0000 | 0xFFE2_0000  |
| R5_0 ATCM1 (64 KB) | 0x0001_0000 | 0xFFE1_0000  |
| R5_0 BTCM1 (64 KB) | 0x0003_0000 | 0xFFE3_0000  |

This fix makes the driver easier to maintain and makes the design robust
for future platforms as well.

Fixes: 9af45bbdcbbb ("remoteproc: zynqmp: fix TCM carveouts in lockstep mode")
Signed-off-by: Tanmay Shah 
---
 drivers/remoteproc/xlnx_r5_remoteproc.c | 145 ++--
 1 file changed, 12 insertions(+), 133 deletions(-)

diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c 
b/drivers/remoteproc/xlnx_r5_remoteproc.c
index 4395edea9a64..42b0384d34f2 100644
--- a/drivers/remoteproc/xlnx_r5_remoteproc.c
+++ b/drivers/remoteproc/xlnx_r5_remoteproc.c
@@ -84,12 +84,12 @@ static const struct mem_bank_data zynqmp_tcm_banks_split[] 
= {
{0xffeb0000UL, 0x20000, 0x10000UL, PD_R5_1_BTCM, "btcm1"},
 };
 
-/* In lockstep mode cluster combines each 64KB TCM and makes 128KB TCM */
+/* In lockstep mode cluster uses each 64KB TCM from second core as well */
 static const struct mem_bank_data zynqmp_tcm_banks_lockstep[] = {
-   {0xffe00000UL, 0x0, 0x20000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 128KB each */
-   {0xffe20000UL, 0x20000, 0x20000UL, PD_R5_0_BTCM, "btcm0"},
-   {0, 0, 0, PD_R5_1_ATCM, ""},
-   {0, 0, 0, PD_R5_1_BTCM, ""},
+   {0xffe00000UL, 0x0, 0x10000UL, PD_R5_0_ATCM, "atcm0"}, /* TCM 64KB each */
+   {0xffe20000UL, 0x20000, 0x10000UL, PD_R5_0_BTCM, "btcm0"},
+   {0xffe10000UL, 0x10000, 0x10000UL, PD_R5_1_ATCM, "atcm1"},
+   {0xffe30000UL, 0x30000, 0x10000UL, PD_R5_1_BTCM, "btcm1"},
 };
 
 /**
@@ -540,14 +540,14 @@ static int tcm_mem_map(struct rproc *rproc,
 }
 
 /*
- * add_tcm_carveout_split_mode()
+ * add_tcm_banks()
  * @rproc: single R5 core's corresponding rproc instance
  *
- * allocate and add remoteproc carveout for TCM memory in split mode
+ * allocate and add remoteproc carveout for TCM memory
  *
  * return 0 on success, otherwise non-zero value on failure
  */
-static int add_tcm_carveout_split_mode(struct rproc *rproc)
+static int add_tcm_banks(struct rproc *rproc)
 {
struct rproc_mem_entry *rproc_mem;
struct zynqmp_r5_core *r5_core;
@@ -580,10 +580,10 @@ static int add_tcm_carveout_split_mode(struct rproc 
*rproc)
 ZYNQMP_PM_REQUEST_ACK_BLOCKING);
if (ret < 0) {
dev_err(dev, "failed to turn on TCM 0x%x", 
pm_domain_id);
-   goto release_tcm_split;
+   goto release_tcm;
}
 
-   dev_dbg(dev, "TCM carveout split mode %s addr=%llx, da=0x%x, 
size=0x%lx",
+   dev_dbg(dev, "TCM carveout %s addr=%llx, da=0x%x, size=0x%lx",
bank_name, bank_addr, da, bank_size);
 
rproc_mem = rproc_mem_entry_init(dev, NULL, bank_addr,
@@ -593,7 +593,7 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc)
if (!rproc_mem) {
ret = -ENOMEM;
zynqmp_pm_release_node(pm_domain_id);
-   goto release_tcm_split;
+   goto release_tcm;
}
 
rproc_add_carveout(rproc, rproc_mem);
@@ -601,7 +601,7 @@ static int add_tcm_carveout_split_mode(struct rproc *rproc)
 
return 0;
 
-release_tcm_split:
+release_tcm:
/* If failed, Turn off all TCM banks turned on before */
for (i--; i >= 0; i--) {
pm_domain_id = r5_core->tcm_banks[i]->pm_domain_id;
@@ -610,127 +610,6 @@ static int add_tcm_carveout_split_mode(struct rproc 
*rproc)
return ret;
 }
 
-/*
- * add_tcm_carveout_lockstep_mode()
- * @rproc: single R5 core's corresponding rproc instance
- *
- * allocate and add remoteproc carveout for TCM memory in lockstep mode
- *
- * return 0 on success, otherwise non-zero value on failure
- */
-static int add_tcm_carveout_lockstep_mode(struct rproc *rproc)
-{
-   struct rproc_mem_entry *rproc_mem;
-   struct zynqmp_r5_core *r5_core;
-   int i, num_banks, ret;
-   phys_addr_t bank_addr;
-   size_t bank_size = 0;
-   struct device *dev;
-   u32 pm_domain_id;
-   char *bank_name;
-   u32 da;
-
-   r5_core = rproc->priv;
-   dev = r5_core->dev;
-
-   

Re: [PATCH 0/4] apply page shift to PFN instead of VA in pfn_to_virt

2024-01-31 Thread Yan Zhao
On Wed, Jan 31, 2024 at 12:48:38PM +0100, Arnd Bergmann wrote:
> On Wed, Jan 31, 2024, at 06:51, Yan Zhao wrote:
> > This is a tiny fix to pfn_to_virt() for some platforms.
> >
> > The original implementation of pfn_to_virt() takes PFN instead of PA as the
> > input to macro __va, with PAGE_SHIFT applying to the converted VA, which
> > is not right under most conditions, especially when there's an offset in
> > __va.
> >
> >
> > Yan Zhao (4):
> >   asm-generic/page.h: apply page shift to PFN instead of VA in
> > pfn_to_virt
> >   csky: apply page shift to PFN instead of VA in pfn_to_virt
> >   Hexagon: apply page shift to PFN instead of VA in pfn_to_virt
> >   openrisc: apply page shift to PFN instead of VA in pfn_to_virt
> 
> Nice catch, this is clearly a correct fix, and I can take
> the series through the asm-generic tree if we want to take
> this approach.
> 
> I made a couple of interesting observations looking at this patch
> though:
> 
> - this function is only used in architecture specific code on
>   m68k, riscv and s390, though a couple of other architectures
>   have the same definition.
> 
> - There is another function that does the same thing called
>   pfn_to_kaddr(), which is defined on arm, arm64, csky,
>   loongarch, mips, nios2, powerpc, s390, sh, sparc and x86,
>   as well as yet another pfn_va() on parisc.
> 
> - the asm-generic/page.h file used to be included by h8300, c6x
>   and blackfin, all of which are now gone. It has no users left
>   and can just as well get removed, unless we find a new use
>   for it.
> 
> Since it looks like the four broken functions you fix
> don't have a single caller, maybe it would be better to
> just remove them all?
> 
> How exactly did you notice the function being wrong,
> did you try to add a user somewhere, or just read through
> the code?
I came across them when I was debugging an unexpected kernel page fault
on x86, and I was not sure whether page_to_virt() was compiled from
asm-generic/page.h or linux/mm.h.
It finally turned out that the one in linux/mm.h was used, which yielded
the right result, and the unexpected kernel page fault in my case was not
related to page_to_virt(). Still, it led me to notice that the
pfn_to_virt() in asm-generic/page.h and the other 3 archs did not look right.

Yes, unlike virt_to_pfn(), which still has a caller in openrisc (among
csky, Hexagon and openrisc), pfn_to_virt() no longer has a caller in
those 3 archs. Though both virt_to_pfn() and pfn_to_virt() are referenced
in asm-generic/page.h, I am also not sure whether we should remove
asm-generic/page.h, which may serve as a template for future archs?

So, either way looks good to me :)
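
(To make the fix concrete, a minimal sketch mirroring the asm-generic
definition; the _broken/_fixed suffixes are illustrative only:)

	/* Before: PAGE_SHIFT is applied to the converted VA -- wrong
	 * whenever __va() adds an offset; never caught because these
	 * definitions have no callers. */
	static inline void *pfn_to_virt_broken(unsigned long pfn)
	{
		return __va(pfn) << PAGE_SHIFT;
	}

	/* After: shift the PFN into a physical address first, then
	 * hand that to __va(). */
	static inline void *pfn_to_virt_fixed(unsigned long pfn)
	{
		return __va(pfn << PAGE_SHIFT);
	}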




Re: [PATCH v2 4/7] tracefs: dentry lookup crapectomy

2024-01-31 Thread Al Viro
On Wed, Jan 31, 2024 at 01:49:22PM -0500, Steven Rostedt wrote:

> @@ -329,32 +320,29 @@ static struct dentry *create_file(const char *name, 
> umode_t mode,
>  
>   ti = get_tracefs(inode);
>   ti->flags |= TRACEFS_EVENT_INODE;
> - d_instantiate(dentry, inode);
> +
> + d_add(dentry, inode);
>   fsnotify_create(dentry->d_parent->d_inode, dentry);

Seriously?  stat(2), have it go away from dcache on memory pressure,
lather, rinse, repeat...  Won't *snotify get confused by the stream
of creations of the same thing, with not a removal in sight?

> - return eventfs_end_creating(dentry);
> + return dentry;
>  };
>  
>  /**
> - * create_dir - create a dir in the tracefs filesystem
> + * lookup_dir_entry - look up a dir in the tracefs filesystem
> + * @dentry: the directory to look up
>   * @ei: the eventfs_inode that represents the directory to create
> - * @parent: parent dentry for this file.
>   *
> - * This function will create a dentry for a directory represented by
> + * This function will look up a dentry for a directory represented by
>   * a eventfs_inode.
>   */
> -static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry 
> *parent)
> +static struct dentry *lookup_dir_entry(struct dentry *dentry,
> + struct eventfs_inode *pei, struct eventfs_inode *ei)
>  {
>   struct tracefs_inode *ti;
> - struct dentry *dentry;
>   struct inode *inode;
>  
> - dentry = eventfs_start_creating(ei->name, parent);
> - if (IS_ERR(dentry))
> - return dentry;
> -
>   inode = tracefs_get_inode(dentry->d_sb);
>   if (unlikely(!inode))
> - return eventfs_failed_creating(dentry);
> + return ERR_PTR(-ENOMEM);
>  
>   /* If the user updated the directory's attributes, use them */
>   update_inode_attr(dentry, inode, &ei->attr,
> @@ -371,11 +359,14 @@ static struct dentry *create_dir(struct eventfs_inode 
> *ei, struct dentry *parent
>   /* Only directories have ti->private set to an ei, not files */
>   ti->private = ei;
>  
> + dentry->d_fsdata = ei;
> +ei->dentry = dentry; // Remove me!
> +
>   inc_nlink(inode);
> - d_instantiate(dentry, inode);
> + d_add(dentry, inode);
>   inc_nlink(dentry->d_parent->d_inode);

What will happen when that thing gets evicted from dcache,
gets looked up again, and again, and...?

>   fsnotify_mkdir(dentry->d_parent->d_inode, dentry);

Same re snotify confusion...

> - return eventfs_end_creating(dentry);
> + return dentry;
>  }
>  
>  static void free_ei(struct eventfs_inode *ei)
> @@ -425,7 +416,7 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, 
> struct dentry *dentry)
>  }
>  
>  /**
> - * create_file_dentry - create a dentry for a file of an eventfs_inode
> + * lookup_file_dentry - create a dentry for a file of an eventfs_inode
>   * @ei: the eventfs_inode that the file will be created under
>   * @idx: the index into the d_children[] of the @ei
>   * @parent: The parent dentry of the created file.
> @@ -438,157 +429,21 @@ void eventfs_set_ei_status_free(struct tracefs_inode 
> *ti, struct dentry *dentry)
>   * address located at @e_dentry.
>   */
>  static struct dentry *
> -create_file_dentry(struct eventfs_inode *ei, int idx,
> -struct dentry *parent, const char *name, umode_t mode, void 
> *data,
> +lookup_file_dentry(struct dentry *dentry,
> +struct eventfs_inode *ei, int idx,
> +umode_t mode, void *data,
>  const struct file_operations *fops)
>  {
>   struct eventfs_attr *attr = NULL;
>   struct dentry **e_dentry = &ei->d_children[idx];
> - struct dentry *dentry;
> -
> - WARN_ON_ONCE(!inode_is_locked(parent->d_inode));
>  
> - mutex_lock(&eventfs_mutex);
> - if (ei->is_freed) {
> - mutex_unlock(&eventfs_mutex);
> - return NULL;
> - }
> - /* If the e_dentry already has a dentry, use it */
> - if (*e_dentry) {
> - dget(*e_dentry);
> - mutex_unlock(&eventfs_mutex);
> - return *e_dentry;
> - }
> -
> - /* ei->entry_attrs are protected by SRCU */
>   if (ei->entry_attrs)
>   attr = &ei->entry_attrs[idx];
>  
> - mutex_unlock(&eventfs_mutex);
> -
> - dentry = create_file(name, mode, attr, parent, data, fops);
> -
> - mutex_lock(&eventfs_mutex);
> -
> - if (IS_ERR_OR_NULL(dentry)) {
> - /*
> -  * When the mutex was released, something else could have
> -  * created the dentry for this e_dentry. In which case
> -  * use that one.
> -  *
> -  * If ei->is_freed is set, the e_dentry is currently on its
> -  * way to being freed, don't return it. If e_dentry is NULL
> -  * it means it was already freed.
> -  */
> - if (ei->is_freed) {
> - dentry = NULL;
> - } else {
> - dentry = *e_dentry;
> -  

[PATCH v3 46/47] filelock: remove temporary compatibility macros

2024-01-31 Thread Jeff Layton
Everything has been converted to access fl_core fields directly, so we
can now drop these.
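
(Illustrative only, with do_something() as a stand-in: where the macros
let old code keep the legacy field names, callers now spell out the
embedded core, matching the conversions earlier in the series:)

	/* before, via the temporary macros (fl_flags -> c.flc_flags) */
	if (fl->fl_flags & FL_POSIX)
		do_something(fl->fl_pid);

	/* after, reaching into struct file_lock_core directly */
	if (fl->c.flc_flags & FL_POSIX)
		do_something(fl->c.flc_pid);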

Signed-off-by: Jeff Layton 
---
 include/linux/filelock.h | 16 
 1 file changed, 16 deletions(-)

diff --git a/include/linux/filelock.h b/include/linux/filelock.h
index fdec838a3ca7..ceadd979e110 100644
--- a/include/linux/filelock.h
+++ b/include/linux/filelock.h
@@ -131,22 +131,6 @@ struct file_lock {
} fl_u;
 } __randomize_layout;
 
-/* Temporary macros to allow building during coccinelle conversion */
-#ifdef _NEED_FILE_LOCK_FIELD_MACROS
-#define fl_list c.flc_list
-#define fl_blocker c.flc_blocker
-#define fl_link c.flc_link
-#define fl_blocked_requests c.flc_blocked_requests
-#define fl_blocked_member c.flc_blocked_member
-#define fl_owner c.flc_owner
-#define fl_flags c.flc_flags
-#define fl_type c.flc_type
-#define fl_pid c.flc_pid
-#define fl_link_cpu c.flc_link_cpu
-#define fl_wait c.flc_wait
-#define fl_file c.flc_file
-#endif
-
 struct file_lock_context {
spinlock_t  flc_lock;
struct list_headflc_flock;

-- 
2.43.0




[PATCH v3 38/47] gfs2: adapt to breakup of struct file_lock

2024-01-31 Thread Jeff Layton
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.

Signed-off-by: Jeff Layton 
---
 fs/gfs2/file.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index d06488de1b3b..4c42ada60ae7 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -15,7 +15,6 @@
 #include 
 #include 
 #include 
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 
@@ -1441,7 +1440,7 @@ static int gfs2_lock(struct file *file, int cmd, struct 
file_lock *fl)
struct gfs2_sbd *sdp = GFS2_SB(file->f_mapping->host);
struct lm_lockstruct *ls = &sdp->sd_lockstruct;
 
-   if (!(fl->fl_flags & FL_POSIX))
+   if (!(fl->c.flc_flags & FL_POSIX))
return -ENOLCK;
if (gfs2_withdrawing_or_withdrawn(sdp)) {
if (lock_is_unlock(fl))
@@ -1484,7 +1483,7 @@ static int do_flock(struct file *file, int cmd, struct 
file_lock *fl)
int error = 0;
int sleeptime;
 
-   state = (lock_is_write(fl)) ? LM_ST_EXCLUSIVE : LM_ST_SHARED;
+   state = lock_is_write(fl) ? LM_ST_EXCLUSIVE : LM_ST_SHARED;
flags = GL_EXACT | GL_NOPID;
if (!IS_SETLKW(cmd))
flags |= LM_FLAG_TRY_1CB;
@@ -1496,8 +1495,8 @@ static int do_flock(struct file *file, int cmd, struct 
file_lock *fl)
if (fl_gh->gh_state == state)
goto out;
locks_init_lock(&request);
-   request.fl_type = F_UNLCK;
-   request.fl_flags = FL_FLOCK;
+   request.c.flc_type = F_UNLCK;
+   request.c.flc_flags = FL_FLOCK;
locks_lock_file_wait(file, &request);
gfs2_glock_dq(fl_gh);
gfs2_holder_reinit(state, flags, fl_gh);
@@ -1558,7 +1557,7 @@ static void do_unflock(struct file *file, struct 
file_lock *fl)
 
 static int gfs2_flock(struct file *file, int cmd, struct file_lock *fl)
 {
-   if (!(fl->fl_flags & FL_FLOCK))
+   if (!(fl->c.flc_flags & FL_FLOCK))
return -ENOLCK;
 
if (lock_is_unlock(fl)) {

-- 
2.43.0




[PATCH v3 16/47] filelock: drop the IS_* macros

2024-01-31 Thread Jeff Layton
These don't add a lot of value over just open-coding the flag check.

Suggested-by: NeilBrown 
Signed-off-by: Jeff Layton 
---
 fs/locks.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 149070fd3b66..d685c3fdbea5 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -70,12 +70,6 @@
 
 #include 
 
-#define IS_POSIX(fl)   (fl->fl_flags & FL_POSIX)
-#define IS_FLOCK(fl)   (fl->fl_flags & FL_FLOCK)
-#define IS_LEASE(fl)   (fl->fl_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
-#define IS_OFDLCK(fl)  (fl->fl_flags & FL_OFDLCK)
-#define IS_REMOTELCK(fl)   (fl->fl_pid <= 0)
-
 static bool lease_breaking(struct file_lock *fl)
 {
return fl->fl_flags & (FL_UNLOCK_PENDING | FL_DOWNGRADE_PENDING);
@@ -767,7 +761,7 @@ static void __locks_insert_block(struct file_lock *blocker,
}
waiter->fl_blocker = blocker;
list_add_tail(&waiter->fl_blocked_member, 
&blocker->fl_blocked_requests);
-   if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
+   if ((blocker->fl_flags & (FL_POSIX|FL_OFDLCK)) == FL_POSIX)
locks_insert_global_blocked(waiter);
 
/* The requests in waiter->fl_blocked are known to conflict with
@@ -999,7 +993,7 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
 * This deadlock detector can't reasonably detect deadlocks with
 * FL_OFDLCK locks, since they aren't owned by a process, per-se.
 */
-   if (IS_OFDLCK(caller_fl))
+   if (caller_fl->fl_flags & FL_OFDLCK)
return 0;
 
while ((block_fl = what_owner_is_waiting_for(block_fl))) {
@@ -2150,10 +2144,13 @@ static pid_t locks_translate_pid(struct file_lock *fl, 
struct pid_namespace *ns)
pid_t vnr;
struct pid *pid;
 
-   if (IS_OFDLCK(fl))
+   if (fl->fl_flags & FL_OFDLCK)
return -1;
-   if (IS_REMOTELCK(fl))
+
+   /* Remote locks report a negative pid value */
+   if (fl->fl_pid <= 0)
return fl->fl_pid;
+
/*
 * If the flock owner process is dead and its pid has been already
 * freed, the translation below won't work, but we still want to show
@@ -2697,7 +2694,7 @@ static void lock_get_status(struct seq_file *f, struct 
file_lock *fl,
struct inode *inode = NULL;
unsigned int pid;
struct pid_namespace *proc_pidns = 
proc_pid_ns(file_inode(f->file)->i_sb);
-   int type;
+   int type = fl->fl_type;
 
pid = locks_translate_pid(fl, proc_pidns);
/*
@@ -2714,19 +2711,21 @@ static void lock_get_status(struct seq_file *f, struct 
file_lock *fl,
if (repeat)
seq_printf(f, "%*s", repeat - 1 + (int)strlen(pfx), pfx);
 
-   if (IS_POSIX(fl)) {
+   if (fl->fl_flags & FL_POSIX) {
if (fl->fl_flags & FL_ACCESS)
seq_puts(f, "ACCESS");
-   else if (IS_OFDLCK(fl))
+   else if (fl->fl_flags & FL_OFDLCK)
seq_puts(f, "OFDLCK");
else
seq_puts(f, "POSIX ");
 
seq_printf(f, " %s ",
 (inode == NULL) ? "*NOINODE*" : "ADVISORY ");
-   } else if (IS_FLOCK(fl)) {
+   } else if (fl->fl_flags & FL_FLOCK) {
seq_puts(f, "FLOCK  ADVISORY  ");
-   } else if (IS_LEASE(fl)) {
+   } else if (fl->fl_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT)) {
+   type = target_leasetype(fl);
+
if (fl->fl_flags & FL_DELEG)
seq_puts(f, "DELEG  ");
else
@@ -2741,7 +2740,6 @@ static void lock_get_status(struct seq_file *f, struct 
file_lock *fl,
} else {
seq_puts(f, "UNKNOWN UNKNOWN  ");
}
-   type = IS_LEASE(fl) ? target_leasetype(fl) : fl->fl_type;
 
seq_printf(f, "%s ", (type == F_WRLCK) ? "WRITE" :
 (type == F_RDLCK) ? "READ" : "UNLCK");
@@ -2753,7 +2751,7 @@ static void lock_get_status(struct seq_file *f, struct 
file_lock *fl,
} else {
seq_printf(f, "%d :0 ", pid);
}
-   if (IS_POSIX(fl)) {
+   if (fl->fl_flags & FL_POSIX) {
if (fl->fl_end == OFFSET_MAX)
seq_printf(f, "%Ld EOF\n", fl->fl_start);
else

-- 
2.43.0




Re: [PATCH 6/6] eventfs: clean up dentry ops and add revalidate function

2024-01-31 Thread Steven Rostedt
On Thu, 1 Feb 2024 08:07:15 +0900
Masami Hiramatsu (Google)  wrote:

> > Then tracefs could be nicely converted over to kernfs, and eventfs would be
> > its own entity.  
> 
> If so, maybe we can just make symlinks to the 'id' and 'format' from events
> under tracefs? :)

I don't think that will save anything. The files currently do not allocate
any memory. If we make symlinks, we need to allocate a path to them. I
think that would be rather difficult to do. Not to mention, that could
cause a lot of breakage. What do you do if the other filesystem isn't
mounted?

I could possibly make a lightweight handle to pass back to the callbacks.

struct trace_event_light {
unsigned long   flags;
struct trace_event_call *event_call;
};

struct trace_event_file {
struct trace_event_light    call;
[..]
// Remove the flags and event_call fields and have them above
};

if the callback data has:

 callback(..., void **data)
 {
struct trace_event_light *call = *data;
struct trace_event_file *file;

if (strcmp(name, "id") == 0 || strcmp(name, "format") == 0) {
*data = call->event_call;
return 1;
}

/* Return if this is just a light data entry */
if (!(call->flags & TRACE_EVENT_FULL))
return 0;

file = container_of(call, struct trace_event_file, call);

// continue processing the full data
}

This way the lonely eventfs could still share a lot of the code.

-- Steve



[PATCH v3 17/47] filelock: split common fields into struct file_lock_core

2024-01-31 Thread Jeff Layton
In a future patch, we're going to split file leases into their own
structure. Since a lot of the underlying machinery uses the same fields
move those into a new file_lock_core, and embed that inside struct
file_lock.

For now, add some macros to ensure that we can continue to build while
the conversion is in progress.
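
(A condensed sketch of the resulting layout -- the field list is
abridged, but the embedding and the container_of() accessor follow
the series:)

	struct file_lock_core {
		struct list_head	flc_list;
		fl_owner_t		flc_owner;
		unsigned int		flc_flags;
		unsigned char		flc_type;
		pid_t			flc_pid;
		struct file		*flc_file;
		wait_queue_head_t	flc_wait;
		/* ... */
	};

	struct file_lock {
		struct file_lock_core	c;	/* embedded core */
		loff_t			fl_start;
		loff_t			fl_end;
		/* ... fl_ops, fl_lmops, fl_u ... */
	};

	static inline struct file_lock *file_lock(struct file_lock_core *flc)
	{
		return container_of(flc, struct file_lock, c);
	}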

Signed-off-by: Jeff Layton 
---
 fs/9p/vfs_file.c  |  1 +
 fs/afs/internal.h |  1 +
 fs/ceph/locks.c   |  1 +
 fs/dlm/plock.c|  1 +
 fs/fuse/file.c|  1 +
 fs/gfs2/file.c|  1 +
 fs/lockd/clntproc.c   |  1 +
 fs/locks.c|  1 +
 fs/nfs/file.c |  1 +
 fs/nfs/nfs4_fs.h  |  1 +
 fs/nfs/write.c|  1 +
 fs/nfsd/netns.h   |  1 +
 fs/ocfs2/locks.c  |  1 +
 fs/ocfs2/stack_user.c |  1 +
 fs/open.c |  2 +-
 fs/posix_acl.c|  4 ++--
 fs/smb/client/cifsglob.h  |  1 +
 fs/smb/client/cifssmb.c   |  1 +
 fs/smb/client/file.c  |  3 ++-
 fs/smb/client/smb2file.c  |  1 +
 fs/smb/server/smb2pdu.c   |  1 +
 fs/smb/server/vfs.c   |  1 +
 include/linux/filelock.h  | 57 ---
 include/linux/lockd/xdr.h |  3 ++-
 24 files changed, 65 insertions(+), 23 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 3df8aa1b5996..a1dabcf73380 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 9c03fcf7ffaa..f5dd428e40f4 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index 80ebe1d6c67d..ce773e9c0b79 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -7,6 +7,7 @@
 
 #include "super.h"
 #include "mds_client.h"
+#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 42c596b900d4..fdcddbb96d40 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -4,6 +4,7 @@
  */
 
 #include 
+#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 148a71b8b4d0..2757870ee6ac 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 6c25aea30f1b..d06488de1b3b 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 
diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index cc596748e359..1f71260603b7 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 
diff --git a/fs/locks.c b/fs/locks.c
index d685c3fdbea5..097254ab35d3 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -48,6 +48,7 @@
  * children.
  *
  */
+#define _NEED_FILE_LOCK_FIELD_MACROS
 
 #include 
 #include 
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 1a7a76d6055b..0b6691e64d27 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -31,6 +31,7 @@
 #include 
 
 #include 
+#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 
 #include "delegation.h"
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 581698f1b7b2..752224a48f1c 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -23,6 +23,7 @@
 #define NFS4_MAX_LOOP_ON_RECOVER (10)
 
 #include 
+#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 
 struct idmap;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index d16f2b9d1765..13f2e10167ac 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 
 #include 
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 74b4360779a1..fd91125208be 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -10,6 +10,7 @@
 
 #include 
 #include 
+#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 
diff --git a/fs/ocfs2/locks.c b/fs/ocfs2/locks.c
index ef4fd91b586e..84ad403b5998 100644
--- a/fs/ocfs2/locks.c
+++ b/fs/ocfs2/locks.c
@@ -8,6 +8,7 @@
  */
 
 #include 
+#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 
diff --git a/fs/ocfs2/stack_user.c b/fs/ocfs2/stack_user.c
index c11406cd87a8..39b7e47a8618 100644
--- a/fs/ocfs2/stack_user.c
+++ b/fs/ocfs2/stack_user.c
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 
diff --git a/fs/open.c b/fs/open.c
index a84d21e55c39..0a73afe04d34 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1364,7 +1364,7 @@ struct file *filp_open(const char *filename, int flags, 
umode_t mode)
 {
struct filename *name = 

Re: [PATCH 6/6] eventfs: clean up dentry ops and add revalidate function

2024-01-31 Thread Google
On Wed, 31 Jan 2024 13:00:39 -0500
Steven Rostedt  wrote:

> On Tue, 30 Jan 2024 11:03:55 -0800
> Linus Torvalds  wrote:
> 
> > It would probably be cleaner to make eventfs its own filesystem, or at
> > least set its own dentry ops when looking up eventfs files.  But as it
> > is, only eventfs dentries use d_fsdata, so we don't really need to split
> > these things up by use.
> 
> BTW, I have been thinking about making eventfs a completely separate file
> system that could be mounted outside of tracefs. One that is readonly that
> only contains the "id" and "format" files and nothing more.
> 
> Why? Because perf and powertop both use those files to know how to parse
> the raw event formats. I don't think there's anything in there that
> requires root privileges to read. They should not be exposing any internal
> kernel information besides the event format layouts, and it would be nice
> to have a /sys/kernel/events directory that only had that.

That's a good idea! So maybe we can allow perf to read it without root privileges.

> 
> Making eventfs a separate file system where, when added to tracefs, has the
> control files for the specific trace_array, but for the /sys/kernel
> directory, only cares about the trace format files.
> 
> Then tracefs could be nicely converted over to kernfs, and eventfs would be
> its own entity.

If so, maybe we can just make symlinks to the 'id' and 'format' from events
under tracefs? :)

Thank you,

> 
> -- Steve


-- 
Masami Hiramatsu (Google) 



[PATCH v3 47/47] filelock: split leases out of struct file_lock

2024-01-31 Thread Jeff Layton
Add a new struct file_lease and move the lease-specific fields from
struct file_lock to it. Convert the appropriate API calls to take
struct file_lease instead, and convert the callers to use them.

There is zero overlap between the lock manager operations for file
locks and the ones for file leases, so split the lease-related
operations off into a new lease_manager_operations struct.
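
(An illustrative usage sketch only, assuming filp is an open struct
file; leases now have their own type and allocator but still embed
file_lock_core, so the flc_* setup looks just like a lock's:)

	struct file_lease *fl = locks_alloc_lease();

	if (!fl)
		return -ENOMEM;

	fl->c.flc_flags = FL_LEASE;
	fl->c.flc_type  = F_RDLCK;
	fl->c.flc_file  = filp;
	fl->c.flc_owner = filp;

	/* ... hand off to the lease machinery; on error: */
	locks_free_lease(fl);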

Signed-off-by: Jeff Layton 
---
 fs/libfs.c  |   2 +-
 fs/locks.c  | 123 ++--
 fs/nfs/nfs4_fs.h|   2 +-
 fs/nfs/nfs4file.c   |   2 +-
 fs/nfs/nfs4proc.c   |   4 +-
 fs/nfsd/nfs4layouts.c   |  17 +++---
 fs/nfsd/nfs4state.c |  27 -
 fs/smb/client/cifsfs.c  |   2 +-
 include/linux/filelock.h|  49 ++--
 include/linux/fs.h  |   5 +-
 include/trace/events/filelock.h |  18 +++---
 11 files changed, 153 insertions(+), 98 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index eec6031b0155..8b67cb4655d5 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1580,7 +1580,7 @@ EXPORT_SYMBOL(alloc_anon_inode);
  * All arguments are ignored and it just returns -EINVAL.
  */
 int
-simple_nosetlease(struct file *filp, int arg, struct file_lock **flp,
+simple_nosetlease(struct file *filp, int arg, struct file_lease **flp,
  void **priv)
 {
return -EINVAL;
diff --git a/fs/locks.c b/fs/locks.c
index 1a4b01203d3d..33c7f4a8c729 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -74,12 +74,17 @@ static struct file_lock *file_lock(struct file_lock_core 
*flc)
return container_of(flc, struct file_lock, c);
 }
 
-static bool lease_breaking(struct file_lock *fl)
+static struct file_lease *file_lease(struct file_lock_core *flc)
+{
+   return container_of(flc, struct file_lease, c);
+}
+
+static bool lease_breaking(struct file_lease *fl)
 {
return fl->c.flc_flags & (FL_UNLOCK_PENDING | FL_DOWNGRADE_PENDING);
 }
 
-static int target_leasetype(struct file_lock *fl)
+static int target_leasetype(struct file_lease *fl)
 {
if (fl->c.flc_flags & FL_UNLOCK_PENDING)
return F_UNLCK;
@@ -166,6 +171,7 @@ static DEFINE_SPINLOCK(blocked_lock_lock);
 
 static struct kmem_cache *flctx_cache __ro_after_init;
 static struct kmem_cache *filelock_cache __ro_after_init;
+static struct kmem_cache *filelease_cache __ro_after_init;
 
 static struct file_lock_context *
 locks_get_lock_context(struct inode *inode, int type)
@@ -275,6 +281,18 @@ struct file_lock *locks_alloc_lock(void)
 }
 EXPORT_SYMBOL_GPL(locks_alloc_lock);
 
+/* Allocate an empty lock structure. */
+struct file_lease *locks_alloc_lease(void)
+{
+   struct file_lease *fl = kmem_cache_zalloc(filelease_cache, GFP_KERNEL);
+
+   if (fl)
+   locks_init_lock_heads(&fl->c);
+
+   return fl;
+}
+EXPORT_SYMBOL_GPL(locks_alloc_lease);
+
 void locks_release_private(struct file_lock *fl)
 {
struct file_lock_core *flc = &fl->c;
@@ -336,15 +354,25 @@ void locks_free_lock(struct file_lock *fl)
 }
 EXPORT_SYMBOL(locks_free_lock);
 
+/* Free a lease which is not in use. */
+void locks_free_lease(struct file_lease *fl)
+{
+   kmem_cache_free(filelease_cache, fl);
+}
+EXPORT_SYMBOL(locks_free_lease);
+
 static void
 locks_dispose_list(struct list_head *dispose)
 {
-   struct file_lock *fl;
+   struct file_lock_core *flc;
 
while (!list_empty(dispose)) {
-   fl = list_first_entry(dispose, struct file_lock, c.flc_list);
-   list_del_init(&fl->c.flc_list);
-   locks_free_lock(fl);
+   flc = list_first_entry(dispose, struct file_lock_core, 
flc_list);
+   list_del_init(&flc->flc_list);
+   if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
+   locks_free_lease(file_lease(flc));
+   else
+   locks_free_lock(file_lock(flc));
}
 }
 
@@ -355,6 +383,13 @@ void locks_init_lock(struct file_lock *fl)
 }
 EXPORT_SYMBOL(locks_init_lock);
 
+void locks_init_lease(struct file_lease *fl)
+{
+   memset(fl, 0, sizeof(*fl));
+   locks_init_lock_heads(&fl->c);
+}
+EXPORT_SYMBOL(locks_init_lease);
+
 /*
  * Initialize a new lock from an existing file_lock structure.
  */
@@ -518,14 +553,14 @@ static int flock_to_posix_lock(struct file *filp, struct 
file_lock *fl,
 
 /* default lease lock manager operations */
 static bool
-lease_break_callback(struct file_lock *fl)
+lease_break_callback(struct file_lease *fl)
 {
kill_fasync(&fl->fl_fasync, SIGIO, POLL_MSG);
return false;
 }
 
 static void
-lease_setup(struct file_lock *fl, void **priv)
+lease_setup(struct file_lease *fl, void **priv)
 {
struct file *filp = fl->c.flc_file;
struct fasync_struct *fa = *priv;
@@ -541,7 +576,7 @@ lease_setup(struct file_lock *fl, void **priv)
__f_setown(filp, task_pid(current), PIDTYPE_TGID, 0);
 }
 
-static const struct 

[PATCH v3 43/47] ocfs2: adapt to breakup of struct file_lock

2024-01-31 Thread Jeff Layton
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.

Signed-off-by: Jeff Layton 
---
 fs/ocfs2/locks.c  | 9 -
 fs/ocfs2/stack_user.c | 1 -
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/ocfs2/locks.c b/fs/ocfs2/locks.c
index 84ad403b5998..6de944818c56 100644
--- a/fs/ocfs2/locks.c
+++ b/fs/ocfs2/locks.c
@@ -8,7 +8,6 @@
  */
 
 #include 
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 
@@ -54,8 +53,8 @@ static int ocfs2_do_flock(struct file *file, struct inode 
*inode,
 */
 
locks_init_lock(&request);
-   request.fl_type = F_UNLCK;
-   request.fl_flags = FL_FLOCK;
+   request.c.flc_type = F_UNLCK;
+   request.c.flc_flags = FL_FLOCK;
locks_lock_file_wait(file, &request);
 
ocfs2_file_unlock(file);
@@ -101,7 +100,7 @@ int ocfs2_flock(struct file *file, int cmd, struct 
file_lock *fl)
struct inode *inode = file->f_mapping->host;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 
-   if (!(fl->fl_flags & FL_FLOCK))
+   if (!(fl->c.flc_flags & FL_FLOCK))
return -ENOLCK;
 
if ((osb->s_mount_opt & OCFS2_MOUNT_LOCALFLOCKS) ||
@@ -119,7 +118,7 @@ int ocfs2_lock(struct file *file, int cmd, struct file_lock 
*fl)
struct inode *inode = file->f_mapping->host;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 
-   if (!(fl->fl_flags & FL_POSIX))
+   if (!(fl->c.flc_flags & FL_POSIX))
return -ENOLCK;
 
return ocfs2_plock(osb->cconn, OCFS2_I(inode)->ip_blkno, file, cmd, fl);
diff --git a/fs/ocfs2/stack_user.c b/fs/ocfs2/stack_user.c
index 39b7e47a8618..c11406cd87a8 100644
--- a/fs/ocfs2/stack_user.c
+++ b/fs/ocfs2/stack_user.c
@@ -9,7 +9,6 @@
 
 #include 
 #include 
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 

-- 
2.43.0




[PATCH v3 45/47] smb/server: adapt to breakup of struct file_lock

2024-01-31 Thread Jeff Layton
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.

Signed-off-by: Jeff Layton 
---
 fs/smb/server/smb2pdu.c | 39 +++
 fs/smb/server/vfs.c |  9 -
 2 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/fs/smb/server/smb2pdu.c b/fs/smb/server/smb2pdu.c
index 11cc28719582..bec0a846a8d5 100644
--- a/fs/smb/server/smb2pdu.c
+++ b/fs/smb/server/smb2pdu.c
@@ -12,7 +12,6 @@
 #include 
 #include 
 #include 
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 
 #include "glob.h"
@@ -6761,10 +6760,10 @@ struct file_lock *smb_flock_init(struct file *f)
 
locks_init_lock(fl);
 
-   fl->fl_owner = f;
-   fl->fl_pid = current->tgid;
-   fl->fl_file = f;
-   fl->fl_flags = FL_POSIX;
+   fl->c.flc_owner = f;
+   fl->c.flc_pid = current->tgid;
+   fl->c.flc_file = f;
+   fl->c.flc_flags = FL_POSIX;
fl->fl_ops = NULL;
fl->fl_lmops = NULL;
 
@@ -6781,30 +6780,30 @@ static int smb2_set_flock_flags(struct file_lock 
*flock, int flags)
case SMB2_LOCKFLAG_SHARED:
ksmbd_debug(SMB, "received shared request\n");
cmd = F_SETLKW;
-   flock->fl_type = F_RDLCK;
-   flock->fl_flags |= FL_SLEEP;
+   flock->c.flc_type = F_RDLCK;
+   flock->c.flc_flags |= FL_SLEEP;
break;
case SMB2_LOCKFLAG_EXCLUSIVE:
ksmbd_debug(SMB, "received exclusive request\n");
cmd = F_SETLKW;
-   flock->fl_type = F_WRLCK;
-   flock->fl_flags |= FL_SLEEP;
+   flock->c.flc_type = F_WRLCK;
+   flock->c.flc_flags |= FL_SLEEP;
break;
case SMB2_LOCKFLAG_SHARED | SMB2_LOCKFLAG_FAIL_IMMEDIATELY:
ksmbd_debug(SMB,
"received shared & fail immediately request\n");
cmd = F_SETLK;
-   flock->fl_type = F_RDLCK;
+   flock->c.flc_type = F_RDLCK;
break;
case SMB2_LOCKFLAG_EXCLUSIVE | SMB2_LOCKFLAG_FAIL_IMMEDIATELY:
ksmbd_debug(SMB,
"received exclusive & fail immediately request\n");
cmd = F_SETLK;
-   flock->fl_type = F_WRLCK;
+   flock->c.flc_type = F_WRLCK;
break;
case SMB2_LOCKFLAG_UNLOCK:
ksmbd_debug(SMB, "received unlock request\n");
-   flock->fl_type = F_UNLCK;
+   flock->c.flc_type = F_UNLCK;
cmd = F_SETLK;
break;
}
@@ -6848,7 +6847,7 @@ static void smb2_remove_blocked_lock(void **argv)
 static inline bool lock_defer_pending(struct file_lock *fl)
 {
/* check pending lock waiters */
-   return waitqueue_active(&fl->fl_wait);
+   return waitqueue_active(&fl->c.flc_wait);
 }
 
 /**
@@ -6939,8 +6938,8 @@ int smb2_lock(struct ksmbd_work *work)
list_for_each_entry(cmp_lock, &lock_list, llist) {
if (cmp_lock->fl->fl_start <= flock->fl_start &&
cmp_lock->fl->fl_end >= flock->fl_end) {
-   if (cmp_lock->fl->fl_type != F_UNLCK &&
-   flock->fl_type != F_UNLCK) {
+   if (cmp_lock->fl->c.flc_type != F_UNLCK &&
+   flock->c.flc_type != F_UNLCK) {
pr_err("conflict two locks in one 
request\n");
err = -EINVAL;
locks_free_lock(flock);
@@ -6988,12 +6987,12 @@ int smb2_lock(struct ksmbd_work *work)
list_for_each_entry(conn, &conn_list, conns_list) {
spin_lock(&conn->llist_lock);
list_for_each_entry_safe(cmp_lock, tmp2, 
&conn->lock_list, clist) {
-   if (file_inode(cmp_lock->fl->fl_file) !=
-   file_inode(smb_lock->fl->fl_file))
+   if (file_inode(cmp_lock->fl->c.flc_file) !=
+   file_inode(smb_lock->fl->c.flc_file))
continue;
 
if (lock_is_unlock(smb_lock->fl)) {
-   if (cmp_lock->fl->fl_file == 
smb_lock->fl->fl_file &&
+   if (cmp_lock->fl->c.flc_file == 
smb_lock->fl->c.flc_file &&
cmp_lock->start == smb_lock->start 
&&
cmp_lock->end == smb_lock->end &&
!lock_defer_pending(cmp_lock->fl)) {
@@ -7010,7 +7009,7 @@ int smb2_lock(struct ksmbd_work *work)
continue;
}
 
-   

[PATCH v3 44/47] smb/client: adapt to breakup of struct file_lock

2024-01-31 Thread Jeff Layton
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.

Signed-off-by: Jeff Layton 
---
 fs/smb/client/cifsglob.h |  1 -
 fs/smb/client/cifssmb.c  |  9 +++
 fs/smb/client/file.c | 67 +---
 fs/smb/client/smb2file.c |  3 +--
 4 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 78a994caadaf..16befff4cbb4 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -26,7 +26,6 @@
 #include 
 #include "../common/smb2pdu.h"
 #include "smb2pdu.h"
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 
 #define SMB_PATH_MAX 260
diff --git a/fs/smb/client/cifssmb.c b/fs/smb/client/cifssmb.c
index e19ecf692c20..5eb83bafc7fd 100644
--- a/fs/smb/client/cifssmb.c
+++ b/fs/smb/client/cifssmb.c
@@ -15,7 +15,6 @@
  /* want to reuse a stale file handle and only the caller knows the file info 
*/
 
 #include 
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 
@@ -2067,20 +2066,20 @@ CIFSSMBPosixLock(const unsigned int xid, struct 
cifs_tcon *tcon,
parm_data = (struct cifs_posix_lock *)
((char *)&pSMBr->hdr.Protocol + data_offset);
if (parm_data->lock_type == cpu_to_le16(CIFS_UNLCK))
-   pLockData->fl_type = F_UNLCK;
+   pLockData->c.flc_type = F_UNLCK;
else {
if (parm_data->lock_type ==
cpu_to_le16(CIFS_RDLCK))
-   pLockData->fl_type = F_RDLCK;
+   pLockData->c.flc_type = F_RDLCK;
else if (parm_data->lock_type ==
cpu_to_le16(CIFS_WRLCK))
-   pLockData->fl_type = F_WRLCK;
+   pLockData->c.flc_type = F_WRLCK;
 
pLockData->fl_start = le64_to_cpu(parm_data->start);
pLockData->fl_end = pLockData->fl_start +
(le64_to_cpu(parm_data->length) ?
 le64_to_cpu(parm_data->length) - 1 : 0);
-   pLockData->fl_pid = -le32_to_cpu(parm_data->pid);
+   pLockData->c.flc_pid = -le32_to_cpu(parm_data->pid);
}
}
 
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 32d3a27236fc..6c4df0d2b641 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -9,7 +9,6 @@
  *
  */
 #include 
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 
@@ -1313,20 +1312,20 @@ cifs_lock_test(struct cifsFileInfo *cfile, __u64 
offset, __u64 length,
down_read(&cinode->lock_sem);
 
exist = cifs_find_lock_conflict(cfile, offset, length, type,
-   flock->fl_flags, &conf_lock,
+   flock->c.flc_flags, &conf_lock,
CIFS_LOCK_OP);
if (exist) {
flock->fl_start = conf_lock->offset;
flock->fl_end = conf_lock->offset + conf_lock->length - 1;
-   flock->fl_pid = conf_lock->pid;
+   flock->c.flc_pid = conf_lock->pid;
if (conf_lock->type & server->vals->shared_lock_type)
-   flock->fl_type = F_RDLCK;
+   flock->c.flc_type = F_RDLCK;
else
-   flock->fl_type = F_WRLCK;
+   flock->c.flc_type = F_WRLCK;
} else if (!cinode->can_cache_brlcks)
rc = 1;
else
-   flock->fl_type = F_UNLCK;
+   flock->c.flc_type = F_UNLCK;
 
up_read(&cinode->lock_sem);
return rc;
@@ -1402,16 +1401,16 @@ cifs_posix_lock_test(struct file *file, struct 
file_lock *flock)
 {
int rc = 0;
struct cifsInodeInfo *cinode = CIFS_I(file_inode(file));
-   unsigned char saved_type = flock->fl_type;
+   unsigned char saved_type = flock->c.flc_type;
 
-   if ((flock->fl_flags & FL_POSIX) == 0)
+   if ((flock->c.flc_flags & FL_POSIX) == 0)
return 1;
 
down_read(&cinode->lock_sem);
posix_test_lock(file, flock);
 
if (lock_is_unlock(flock) && !cinode->can_cache_brlcks) {
-   flock->fl_type = saved_type;
+   flock->c.flc_type = saved_type;
rc = 1;
}
 
@@ -1432,7 +1431,7 @@ cifs_posix_lock_set(struct file *file, struct file_lock 
*flock)
struct cifsInodeInfo *cinode = CIFS_I(file_inode(file));
int rc = FILE_LOCK_DEFERRED + 1;
 
-   if ((flock->fl_flags & FL_POSIX) == 0)
+   if ((flock->c.flc_flags & FL_POSIX) == 0)
return rc;
 
cifs_down_write(&cinode->lock_sem);
@@ -1583,6 +1582,8 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
el = 

[PATCH v3 15/47] smb/server: convert to using new filelock helpers

2024-01-31 Thread Jeff Layton
Convert to using the new file locking helper functions.
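
(For context, roughly what the helpers look like at this point in the
series, before the later file_lock_core split renames the fields;
simplified:)

	static inline bool lock_is_unlock(struct file_lock *fl)
	{
		return fl->fl_type == F_UNLCK;
	}

	static inline bool lock_is_read(struct file_lock *fl)
	{
		return fl->fl_type == F_RDLCK;
	}

	static inline bool lock_is_write(struct file_lock *fl)
	{
		return fl->fl_type == F_WRLCK;
	}

	static inline void locks_wake_up(struct file_lock *fl)
	{
		wake_up(&fl->fl_wait);
	}

	#define for_each_file_lock(_fl, _head) \
		list_for_each_entry(_fl, _head, fl_list)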

Signed-off-by: Jeff Layton 
---
 fs/smb/server/smb2pdu.c | 6 +++---
 fs/smb/server/vfs.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/smb/server/smb2pdu.c b/fs/smb/server/smb2pdu.c
index ba7a72a6a4f4..e170b96d5ac0 100644
--- a/fs/smb/server/smb2pdu.c
+++ b/fs/smb/server/smb2pdu.c
@@ -6841,7 +6841,7 @@ static void smb2_remove_blocked_lock(void **argv)
struct file_lock *flock = (struct file_lock *)argv[0];
 
ksmbd_vfs_posix_lock_unblock(flock);
-   wake_up(&flock->fl_wait);
+   locks_wake_up(flock);
 }
 
 static inline bool lock_defer_pending(struct file_lock *fl)
@@ -6991,7 +6991,7 @@ int smb2_lock(struct ksmbd_work *work)
file_inode(smb_lock->fl->fl_file))
continue;
 
-   if (smb_lock->fl->fl_type == F_UNLCK) {
+   if (lock_is_unlock(smb_lock->fl)) {
if (cmp_lock->fl->fl_file == 
smb_lock->fl->fl_file &&
cmp_lock->start == smb_lock->start 
&&
cmp_lock->end == smb_lock->end &&
@@ -7051,7 +7051,7 @@ int smb2_lock(struct ksmbd_work *work)
}
up_read(_list_lock);
 out_check_cl:
-   if (smb_lock->fl->fl_type == F_UNLCK && nolock) {
+   if (lock_is_unlock(smb_lock->fl) && nolock) {
pr_err("Try to unlock nolocked range\n");
rsp->hdr.Status = STATUS_RANGE_NOT_LOCKED;
goto out;
diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index a6961bfe3e13..449cfa9ed31c 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -337,16 +337,16 @@ static int check_lock_range(struct file *filp, loff_t 
start, loff_t end,
return 0;
 
spin_lock(&ctx->flc_lock);
-   list_for_each_entry(flock, &ctx->flc_posix, fl_list) {
+   for_each_file_lock(flock, &ctx->flc_posix) {
/* check conflict locks */
if (flock->fl_end >= start && end >= flock->fl_start) {
-   if (flock->fl_type == F_RDLCK) {
+   if (lock_is_read(flock)) {
if (type == WRITE) {
pr_err("not allow write by shared 
lock\n");
error = 1;
goto out;
}
-   } else if (flock->fl_type == F_WRLCK) {
+   } else if (lock_is_write(flock)) {
/* check owner in lock */
if (flock->fl_file != filp) {
error = 1;

-- 
2.43.0




[PATCH v3 42/47] nfsd: adapt to breakup of struct file_lock

2024-01-31 Thread Jeff Layton
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.

Signed-off-by: Jeff Layton 
---
 fs/nfsd/filecache.c|  4 +--
 fs/nfsd/netns.h|  1 -
 fs/nfsd/nfs4callback.c |  2 +-
 fs/nfsd/nfs4layouts.c  | 15 ++-
 fs/nfsd/nfs4state.c| 69 +-
 5 files changed, 46 insertions(+), 45 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 9cb7f0c33df5..b86d8494052c 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -662,8 +662,8 @@ nfsd_file_lease_notifier_call(struct notifier_block *nb, 
unsigned long arg,
struct file_lock *fl = data;
 
/* Only close files for F_SETLEASE leases */
-   if (fl->fl_flags & FL_LEASE)
-   nfsd_file_close_inode(file_inode(fl->fl_file));
+   if (fl->c.flc_flags & FL_LEASE)
+   nfsd_file_close_inode(file_inode(fl->c.flc_file));
return 0;
 }
 
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index fd91125208be..74b4360779a1 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -10,7 +10,6 @@
 
 #include 
 #include 
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 926c29879c6a..32d23ef3e5de 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -674,7 +674,7 @@ static void nfs4_xdr_enc_cb_notify_lock(struct rpc_rqst 
*req,
const struct nfsd4_callback *cb = data;
const struct nfsd4_blocked_lock *nbl =
container_of(cb, struct nfsd4_blocked_lock, nbl_cb);
-   struct nfs4_lockowner *lo = (struct nfs4_lockowner 
*)nbl->nbl_lock.fl_owner;
+   struct nfs4_lockowner *lo = (struct nfs4_lockowner 
*)nbl->nbl_lock.c.flc_owner;
struct nfs4_cb_compound_hdr hdr = {
.ident = 0,
.minorversion = cb->cb_clp->cl_minorversion,
diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 5e8096bc5eaa..daae68e526e0 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -193,14 +193,15 @@ nfsd4_layout_setlease(struct nfs4_layout_stateid *ls)
return -ENOMEM;
locks_init_lock(fl);
fl->fl_lmops = _layouts_lm_ops;
-   fl->fl_flags = FL_LAYOUT;
-   fl->fl_type = F_RDLCK;
+   fl->c.flc_flags = FL_LAYOUT;
+   fl->c.flc_type = F_RDLCK;
fl->fl_end = OFFSET_MAX;
-   fl->fl_owner = ls;
-   fl->fl_pid = current->tgid;
-   fl->fl_file = ls->ls_file->nf_file;
+   fl->c.flc_owner = ls;
+   fl->c.flc_pid = current->tgid;
+   fl->c.flc_file = ls->ls_file->nf_file;
 
-   status = vfs_setlease(fl->fl_file, fl->fl_type, &fl, NULL);
+   status = vfs_setlease(fl->c.flc_file, fl->c.flc_type, &fl,
+ NULL);
if (status) {
locks_free_lock(fl);
return status;
@@ -731,7 +732,7 @@ nfsd4_layout_lm_break(struct file_lock *fl)
 * in time:
 */
fl->fl_break_time = 0;
-   nfsd4_recall_file_layout(fl->fl_owner);
+   nfsd4_recall_file_layout(fl->c.flc_owner);
return false;
 }
 
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 83d605ecdcdc..4a1d462209cd 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4924,7 +4924,7 @@ static void nfsd_break_one_deleg(struct nfs4_delegation 
*dp)
 static bool
 nfsd_break_deleg_cb(struct file_lock *fl)
 {
-   struct nfs4_delegation *dp = (struct nfs4_delegation *)fl->fl_owner;
+   struct nfs4_delegation *dp = (struct nfs4_delegation *) fl->c.flc_owner;
struct nfs4_file *fp = dp->dl_stid.sc_file;
struct nfs4_client *clp = dp->dl_stid.sc_client;
struct nfsd_net *nn;
@@ -4962,7 +4962,7 @@ nfsd_break_deleg_cb(struct file_lock *fl)
  */
 static bool nfsd_breaker_owns_lease(struct file_lock *fl)
 {
-   struct nfs4_delegation *dl = fl->fl_owner;
+   struct nfs4_delegation *dl = fl->c.flc_owner;
struct svc_rqst *rqst;
struct nfs4_client *clp;
 
@@ -4980,7 +4980,7 @@ static int
 nfsd_change_deleg_cb(struct file_lock *onlist, int arg,
 struct list_head *dispose)
 {
-   struct nfs4_delegation *dp = (struct nfs4_delegation *)onlist->fl_owner;
+   struct nfs4_delegation *dp = (struct nfs4_delegation *) 
onlist->c.flc_owner;
struct nfs4_client *clp = dp->dl_stid.sc_client;
 
if (arg & F_UNLCK) {
@@ -5340,12 +5340,12 @@ static struct file_lock *nfs4_alloc_init_lease(struct 
nfs4_delegation *dp,
if (!fl)
return NULL;
fl->fl_lmops = _lease_mng_ops;
-   fl->fl_flags = FL_DELEG;
-   fl->fl_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
+   fl->c.flc_flags = FL_DELEG;
+   fl->c.flc_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
fl->fl_end = OFFSET_MAX;
-   fl->fl_owner = (fl_owner_t)dp;
-   fl->fl_pid = 

[PATCH v3 41/47] nfs: adapt to breakup of struct file_lock

2024-01-31 Thread Jeff Layton
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.

Signed-off-by: Jeff Layton 
---
 fs/nfs/delegation.c |  2 +-
 fs/nfs/file.c   | 19 +--
 fs/nfs/nfs3proc.c   |  2 +-
 fs/nfs/nfs4_fs.h|  1 -
 fs/nfs/nfs4proc.c   | 33 ++---
 fs/nfs/nfs4state.c  |  4 ++--
 fs/nfs/nfs4trace.h  |  4 ++--
 fs/nfs/nfs4xdr.c|  6 +++---
 fs/nfs/write.c  |  5 ++---
 9 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index ca6985001466..d4a42ce0c7e3 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -157,7 +157,7 @@ static int nfs_delegation_claim_locks(struct nfs4_state 
*state, const nfs4_state
spin_lock(&flctx->flc_lock);
 restart:
for_each_file_lock(fl, list) {
-   if (nfs_file_open_context(fl->fl_file)->state != state)
+   if (nfs_file_open_context(fl->c.flc_file)->state != state)
continue;
spin_unlock(&flctx->flc_lock);
status = nfs4_lock_delegation_recall(fl, state, stateid);
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 0b6691e64d27..407c6e15afe2 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -31,7 +31,6 @@
 #include 
 
 #include 
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 
 #include "delegation.h"
@@ -721,15 +720,15 @@ do_getlk(struct file *filp, int cmd, struct file_lock 
*fl, int is_local)
 {
struct inode *inode = filp->f_mapping->host;
int status = 0;
-   unsigned int saved_type = fl->fl_type;
+   unsigned int saved_type = fl->c.flc_type;
 
/* Try local locking first */
posix_test_lock(filp, fl);
-   if (fl->fl_type != F_UNLCK) {
+   if (fl->c.flc_type != F_UNLCK) {
/* found a conflict */
goto out;
}
-   fl->fl_type = saved_type;
+   fl->c.flc_type = saved_type;
 
if (NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
goto out_noconflict;
@@ -741,7 +740,7 @@ do_getlk(struct file *filp, int cmd, struct file_lock *fl, 
int is_local)
 out:
return status;
 out_noconflict:
-   fl->fl_type = F_UNLCK;
+   fl->c.flc_type = F_UNLCK;
goto out;
 }
 
@@ -766,7 +765,7 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, 
int is_local)
 *  If we're signalled while cleaning up locks on process 
exit, we
 *  still need to complete the unlock.
 */
-   if (status < 0 && !(fl->fl_flags & FL_CLOSE))
+   if (status < 0 && !(fl->c.flc_flags & FL_CLOSE))
return status;
}
 
@@ -833,12 +832,12 @@ int nfs_lock(struct file *filp, int cmd, struct file_lock 
*fl)
int is_local = 0;
 
dprintk("NFS: lock(%pD2, t=%x, fl=%x, r=%lld:%lld)\n",
-   filp, fl->fl_type, fl->fl_flags,
+   filp, fl->c.flc_type, fl->c.flc_flags,
(long long)fl->fl_start, (long long)fl->fl_end);
 
nfs_inc_stats(inode, NFSIOS_VFSLOCK);
 
-   if (fl->fl_flags & FL_RECLAIM)
+   if (fl->c.flc_flags & FL_RECLAIM)
return -ENOGRACE;
 
if (NFS_SERVER(inode)->flags & NFS_MOUNT_LOCAL_FCNTL)
@@ -870,9 +869,9 @@ int nfs_flock(struct file *filp, int cmd, struct file_lock 
*fl)
int is_local = 0;
 
dprintk("NFS: flock(%pD2, t=%x, fl=%x)\n",
-   filp, fl->fl_type, fl->fl_flags);
+   filp, fl->c.flc_type, fl->c.flc_flags);
 
-   if (!(fl->fl_flags & FL_FLOCK))
+   if (!(fl->c.flc_flags & FL_FLOCK))
return -ENOLCK;
 
if (NFS_SERVER(inode)->flags & NFS_MOUNT_LOCAL_FLOCK)
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 2de66e4e8280..cbbe3f0193b8 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -963,7 +963,7 @@ nfs3_proc_lock(struct file *filp, int cmd, struct file_lock 
*fl)
struct nfs_open_context *ctx = nfs_file_open_context(filp);
int status;
 
-   if (fl->fl_flags & FL_CLOSE) {
+   if (fl->c.flc_flags & FL_CLOSE) {
l_ctx = nfs_get_lock_context(ctx);
if (IS_ERR(l_ctx))
l_ctx = NULL;
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 752224a48f1c..581698f1b7b2 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -23,7 +23,6 @@
 #define NFS4_MAX_LOOP_ON_RECOVER (10)
 
 #include 
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 
 struct idmap;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index df54fcd0fa08..91dddcd79004 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6800,7 +6800,7 @@ static int _nfs4_proc_getlk(struct nfs4_state *state, int 
cmd, struct file_lock
status = nfs4_call_sync(server->client, server, &msg, &arg.seq_args, 
&res.seq_res, 1);
switch (status) {
case 0:
-   

[PATCH v3 40/47] lockd: adapt to breakup of struct file_lock

2024-01-31 Thread Jeff Layton
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.

Signed-off-by: Jeff Layton 
---
 fs/lockd/clnt4xdr.c | 14 +-
 fs/lockd/clntlock.c |  2 +-
 fs/lockd/clntproc.c | 62 +++
 fs/lockd/clntxdr.c  | 14 +-
 fs/lockd/svc4proc.c | 10 +++
 fs/lockd/svclock.c  | 64 +++--
 fs/lockd/svcproc.c  | 10 +++
 fs/lockd/svcsubs.c  | 20 +++---
 fs/lockd/xdr.c  | 14 +-
 fs/lockd/xdr4.c | 14 +-
 include/linux/lockd/lockd.h |  8 +++---
 include/linux/lockd/xdr.h   |  1 -
 12 files changed, 119 insertions(+), 114 deletions(-)

diff --git a/fs/lockd/clnt4xdr.c b/fs/lockd/clnt4xdr.c
index 8161667c976f..527458db4525 100644
--- a/fs/lockd/clnt4xdr.c
+++ b/fs/lockd/clnt4xdr.c
@@ -243,7 +243,7 @@ static void encode_nlm4_holder(struct xdr_stream *xdr,
u64 l_offset, l_len;
__be32 *p;
 
-   encode_bool(xdr, lock->fl.fl_type == F_RDLCK);
+   encode_bool(xdr, lock->fl.c.flc_type == F_RDLCK);
encode_int32(xdr, lock->svid);
encode_netobj(xdr, lock->oh.data, lock->oh.len);
 
@@ -270,7 +270,7 @@ static int decode_nlm4_holder(struct xdr_stream *xdr, 
struct nlm_res *result)
goto out_overflow;
exclusive = be32_to_cpup(p++);
lock->svid = be32_to_cpup(p);
-   fl->fl_pid = (pid_t)lock->svid;
+   fl->c.flc_pid = (pid_t)lock->svid;
 
error = decode_netobj(xdr, &lock->oh);
if (unlikely(error))
@@ -280,8 +280,8 @@ static int decode_nlm4_holder(struct xdr_stream *xdr, 
struct nlm_res *result)
if (unlikely(p == NULL))
goto out_overflow;
 
-   fl->fl_flags = FL_POSIX;
-   fl->fl_type  = exclusive != 0 ? F_WRLCK : F_RDLCK;
+   fl->c.flc_flags = FL_POSIX;
+   fl->c.flc_type  = exclusive != 0 ? F_WRLCK : F_RDLCK;
p = xdr_decode_hyper(p, &l_offset);
xdr_decode_hyper(p, &l_len);
nlm4svc_set_file_lock_range(fl, l_offset, l_len);
@@ -357,7 +357,7 @@ static void nlm4_xdr_enc_testargs(struct rpc_rqst *req,
const struct nlm_lock *lock = &args->lock;
 
encode_cookie(xdr, &args->cookie);
-   encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+   encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
encode_nlm4_lock(xdr, lock);
 }
 
@@ -380,7 +380,7 @@ static void nlm4_xdr_enc_lockargs(struct rpc_rqst *req,
 
encode_cookie(xdr, &args->cookie);
encode_bool(xdr, args->block);
-   encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+   encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
encode_nlm4_lock(xdr, lock);
encode_bool(xdr, args->reclaim);
encode_int32(xdr, args->state);
@@ -403,7 +403,7 @@ static void nlm4_xdr_enc_cancargs(struct rpc_rqst *req,
 
encode_cookie(xdr, &args->cookie);
encode_bool(xdr, args->block);
-   encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+   encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
encode_nlm4_lock(xdr, lock);
 }
 
diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
index 5d85715be763..a7e0519ec024 100644
--- a/fs/lockd/clntlock.c
+++ b/fs/lockd/clntlock.c
@@ -185,7 +185,7 @@ __be32 nlmclnt_grant(const struct sockaddr *addr, const 
struct nlm_lock *lock)
continue;
if (!rpc_cmp_addr(nlm_addr(block->b_host), addr))
continue;
-   if (nfs_compare_fh(NFS_FH(file_inode(fl_blocked->fl_file)), fh) 
!= 0)
+   if (nfs_compare_fh(NFS_FH(file_inode(fl_blocked->c.flc_file)), 
fh) != 0)
continue;
/* Alright, we found a lock. Set the return status
 * and wake up the caller
diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 1f71260603b7..cebcc283b7ce 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -12,7 +12,6 @@
 #include 
 #include 
 #include 
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 
@@ -134,7 +133,8 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, 
struct file_lock *fl)
char *nodename = req->a_host->h_rpcclnt->cl_nodename;
 
nlmclnt_next_cookie(&argp->cookie);
-   memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct 
nfs_fh));
+   memcpy(&lock->fh, NFS_FH(file_inode(fl->c.flc_file)),
+  sizeof(struct nfs_fh));
lock->caller  = nodename;
lock->oh.data = req->a_owner;
lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
@@ -143,7 +143,7 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, 
struct file_lock *fl)
lock->svid = fl->fl_u.nfs_fl.owner->pid;
lock->fl.fl_start = fl->fl_start;
lock->fl.fl_end = fl->fl_end;
-   lock->fl.fl_type = fl->fl_type;
+   lock->fl.c.flc_type = fl->c.flc_type;
 }
 
 static void 

[PATCH v3 39/47] fuse: adapt to breakup of struct file_lock

2024-01-31 Thread Jeff Layton
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.

Signed-off-by: Jeff Layton 
---
 fs/fuse/file.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 2757870ee6ac..c007b0f0c3a7 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -18,7 +18,6 @@
 #include 
 #include 
 #include 
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 
@@ -2510,14 +2509,14 @@ static int convert_fuse_file_lock(struct fuse_conn *fc,
 * translate it into the caller's pid namespace.
 */
rcu_read_lock();
-   fl->fl_pid = pid_nr_ns(find_pid_ns(ffl->pid, fc->pid_ns), 
&init_pid_ns);
+   fl->c.flc_pid = pid_nr_ns(find_pid_ns(ffl->pid, fc->pid_ns), 
&init_pid_ns);
rcu_read_unlock();
break;
 
default:
return -EIO;
}
-   fl->fl_type = ffl->type;
+   fl->c.flc_type = ffl->type;
return 0;
 }
 
@@ -2531,10 +2530,10 @@ static void fuse_lk_fill(struct fuse_args *args, struct 
file *file,
 
memset(inarg, 0, sizeof(*inarg));
inarg->fh = ff->fh;
-   inarg->owner = fuse_lock_owner_id(fc, fl->fl_owner);
+   inarg->owner = fuse_lock_owner_id(fc, fl->c.flc_owner);
inarg->lk.start = fl->fl_start;
inarg->lk.end = fl->fl_end;
-   inarg->lk.type = fl->fl_type;
+   inarg->lk.type = fl->c.flc_type;
inarg->lk.pid = pid;
if (flock)
inarg->lk_flags |= FUSE_LK_FLOCK;
@@ -2571,8 +2570,8 @@ static int fuse_setlk(struct file *file, struct file_lock 
*fl, int flock)
struct fuse_mount *fm = get_fuse_mount(inode);
FUSE_ARGS(args);
struct fuse_lk_in inarg;
-   int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
-   struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL;
+   int opcode = (fl->c.flc_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
+   struct pid *pid = fl->c.flc_type != F_UNLCK ? task_tgid(current) : NULL;
pid_t pid_nr = pid_nr_ns(pid, fm->fc->pid_ns);
int err;
 
@@ -2582,7 +2581,7 @@ static int fuse_setlk(struct file *file, struct file_lock 
*fl, int flock)
}
 
/* Unlock on close is handled by the flush method */
-   if ((fl->fl_flags & FL_CLOSE_POSIX) == FL_CLOSE_POSIX)
+   if ((fl->c.flc_flags & FL_CLOSE_POSIX) == FL_CLOSE_POSIX)
return 0;
 
fuse_lk_fill(&args, file, fl, opcode, pid_nr, flock, &inarg);

-- 
2.43.0




[PATCH v3 37/47] dlm: adapt to breakup of struct file_lock

2024-01-31 Thread Jeff Layton
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.
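
Besides the field renames, the unlock path below relies on a pattern
that is easy to miss: FL_EXISTS is set temporarily so the VFS reports
-ENOENT when there was no lock to remove, and the caller's flags are
restored afterwards, which now means saving and restoring c.flc_flags.
A reduced user-space sketch of that dance (the FL_EXISTS value and
vfs_unlock() are stand-ins, not the kernel definitions):

#include <errno.h>

#define FL_EXISTS 0x08	/* hypothetical value for the sketch */

struct file_lock_core { unsigned int flc_flags; };
struct file_lock { struct file_lock_core c; };

/* stand-in for locks_lock_file_wait(); pretend no lock was found */
static int vfs_unlock(struct file_lock *fl)
{
	(void)fl;
	return -ENOENT;
}

static int dlm_style_unlock(struct file_lock *fl)
{
	unsigned int saved_flags = fl->c.flc_flags;
	int rv;

	fl->c.flc_flags |= FL_EXISTS;	/* ask the VFS to report ENOENT */
	rv = vfs_unlock(fl);
	if (rv == -ENOENT)
		rv = 0;			/* nothing to unlock is fine here */
	fl->c.flc_flags = saved_flags;
	return rv;
}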

Signed-off-by: Jeff Layton 
---
 fs/dlm/plock.c | 45 ++---
 1 file changed, 22 insertions(+), 23 deletions(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index fdcddbb96d40..9ca83ef70ed1 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -4,7 +4,6 @@
  */
 
 #include 
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 #include 
@@ -139,14 +138,14 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 
number, struct file *file,
}
 
op->info.optype = DLM_PLOCK_OP_LOCK;
-   op->info.pid= fl->fl_pid;
-   op->info.ex = (lock_is_write(fl));
-   op->info.wait   = !!(fl->fl_flags & FL_SLEEP);
+   op->info.pid= fl->c.flc_pid;
+   op->info.ex = lock_is_write(fl);
+   op->info.wait   = !!(fl->c.flc_flags & FL_SLEEP);
op->info.fsid   = ls->ls_global_id;
op->info.number = number;
op->info.start  = fl->fl_start;
op->info.end= fl->fl_end;
-   op->info.owner = (__u64)(long)fl->fl_owner;
+   op->info.owner = (__u64)(long) fl->c.flc_owner;
/* async handling */
if (fl->fl_lmops && fl->fl_lmops->lm_grant) {
op_data = kzalloc(sizeof(*op_data), GFP_NOFS);
@@ -259,7 +258,7 @@ static int dlm_plock_callback(struct plock_op *op)
}
 
/* got fs lock; bookkeep locally as well: */
-   flc->fl_flags &= ~FL_SLEEP;
+   flc->c.flc_flags &= ~FL_SLEEP;
if (posix_lock_file(file, flc, NULL)) {
/*
 * This can only happen in the case of kmalloc() failure.
@@ -292,7 +291,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 
number, struct file *file,
struct dlm_ls *ls;
struct plock_op *op;
int rv;
-   unsigned char saved_flags = fl->fl_flags;
+   unsigned char saved_flags = fl->c.flc_flags;
 
ls = dlm_find_lockspace_local(lockspace);
if (!ls)
@@ -305,7 +304,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 
number, struct file *file,
}
 
/* cause the vfs unlock to return ENOENT if lock is not found */
-   fl->fl_flags |= FL_EXISTS;
+   fl->c.flc_flags |= FL_EXISTS;
 
rv = locks_lock_file_wait(file, fl);
if (rv == -ENOENT) {
@@ -318,14 +317,14 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 
number, struct file *file,
}
 
op->info.optype = DLM_PLOCK_OP_UNLOCK;
-   op->info.pid= fl->fl_pid;
+   op->info.pid= fl->c.flc_pid;
op->info.fsid   = ls->ls_global_id;
op->info.number = number;
op->info.start  = fl->fl_start;
op->info.end= fl->fl_end;
-   op->info.owner = (__u64)(long)fl->fl_owner;
+   op->info.owner = (__u64)(long) fl->c.flc_owner;
 
-   if (fl->fl_flags & FL_CLOSE) {
+   if (fl->c.flc_flags & FL_CLOSE) {
op->info.flags |= DLM_PLOCK_FL_CLOSE;
send_op(op);
rv = 0;
@@ -346,7 +345,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 
number, struct file *file,
dlm_release_plock_op(op);
 out:
dlm_put_lockspace(ls);
-   fl->fl_flags = saved_flags;
+   fl->c.flc_flags = saved_flags;
return rv;
 }
 EXPORT_SYMBOL_GPL(dlm_posix_unlock);
@@ -376,14 +375,14 @@ int dlm_posix_cancel(dlm_lockspace_t *lockspace, u64 
number, struct file *file,
return -EINVAL;
 
memset(&info, 0, sizeof(info));
-   info.pid = fl->fl_pid;
-   info.ex = (lock_is_write(fl));
+   info.pid = fl->c.flc_pid;
+   info.ex = lock_is_write(fl);
info.fsid = ls->ls_global_id;
dlm_put_lockspace(ls);
info.number = number;
info.start = fl->fl_start;
info.end = fl->fl_end;
-   info.owner = (__u64)(long)fl->fl_owner;
+   info.owner = (__u64)(long) fl->c.flc_owner;
 
rv = do_lock_cancel(&info);
switch (rv) {
@@ -438,13 +437,13 @@ int dlm_posix_get(dlm_lockspace_t *lockspace, u64 number, 
struct file *file,
}
 
op->info.optype = DLM_PLOCK_OP_GET;
-   op->info.pid= fl->fl_pid;
-   op->info.ex = (lock_is_write(fl));
+   op->info.pid= fl->c.flc_pid;
+   op->info.ex = lock_is_write(fl);
op->info.fsid   = ls->ls_global_id;
op->info.number = number;
op->info.start  = fl->fl_start;
op->info.end= fl->fl_end;
-   op->info.owner = (__u64)(long)fl->fl_owner;
+   op->info.owner = (__u64)(long) fl->c.flc_owner;
 
send_op(op);
wait_event(recv_wq, (op->done != 0));
@@ -456,16 +455,16 @@ int dlm_posix_get(dlm_lockspace_t *lockspace, 

[PATCH v3 36/47] ceph: adapt to breakup of struct file_lock

2024-01-31 Thread Jeff Layton
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.

Signed-off-by: Jeff Layton 
---
 fs/ceph/locks.c | 51 ++-
 1 file changed, 26 insertions(+), 25 deletions(-)

diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index ce773e9c0b79..ebf4ac0055dd 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -7,7 +7,6 @@
 
 #include "super.h"
 #include "mds_client.h"
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include <linux/filelock.h>
 #include 
 
@@ -34,7 +33,7 @@ void __init ceph_flock_init(void)
 
 static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
 {
-   struct inode *inode = file_inode(dst->fl_file);
+   struct inode *inode = file_inode(dst->c.flc_file);
atomic_inc(&ceph_inode(inode)->i_filelock_ref);
dst->fl_u.ceph.inode = igrab(inode);
 }
@@ -111,17 +110,18 @@ static int ceph_lock_message(u8 lock_type, u16 operation, 
struct inode *inode,
else
length = fl->fl_end - fl->fl_start + 1;
 
-   owner = secure_addr(fl->fl_owner);
+   owner = secure_addr(fl->c.flc_owner);
 
doutc(cl, "rule: %d, op: %d, owner: %llx, pid: %llu, "
"start: %llu, length: %llu, wait: %d, type: %d\n",
-   (int)lock_type, (int)operation, owner, (u64)fl->fl_pid,
-   fl->fl_start, length, wait, fl->fl_type);
+   (int)lock_type, (int)operation, owner,
+   (u64) fl->c.flc_pid,
+   fl->fl_start, length, wait, fl->c.flc_type);
 
req->r_args.filelock_change.rule = lock_type;
req->r_args.filelock_change.type = cmd;
req->r_args.filelock_change.owner = cpu_to_le64(owner);
-   req->r_args.filelock_change.pid = cpu_to_le64((u64)fl->fl_pid);
+   req->r_args.filelock_change.pid = cpu_to_le64((u64) fl->c.flc_pid);
req->r_args.filelock_change.start = cpu_to_le64(fl->fl_start);
req->r_args.filelock_change.length = cpu_to_le64(length);
req->r_args.filelock_change.wait = wait;
@@ -131,13 +131,13 @@ static int ceph_lock_message(u8 lock_type, u16 operation, 
struct inode *inode,
err = ceph_mdsc_wait_request(mdsc, req, wait ?
ceph_lock_wait_for_completion : NULL);
if (!err && operation == CEPH_MDS_OP_GETFILELOCK) {
-   fl->fl_pid = -le64_to_cpu(req->r_reply_info.filelock_reply->pid);
+   fl->c.flc_pid = -le64_to_cpu(req->r_reply_info.filelock_reply->pid);
if (CEPH_LOCK_SHARED == req->r_reply_info.filelock_reply->type)
-   fl->fl_type = F_RDLCK;
+   fl->c.flc_type = F_RDLCK;
else if (CEPH_LOCK_EXCL == 
req->r_reply_info.filelock_reply->type)
-   fl->fl_type = F_WRLCK;
+   fl->c.flc_type = F_WRLCK;
else
-   fl->fl_type = F_UNLCK;
+   fl->c.flc_type = F_UNLCK;
 
fl->fl_start = 
le64_to_cpu(req->r_reply_info.filelock_reply->start);
length = le64_to_cpu(req->r_reply_info.filelock_reply->start) +
@@ -151,8 +151,8 @@ static int ceph_lock_message(u8 lock_type, u16 operation, 
struct inode *inode,
ceph_mdsc_put_request(req);
doutc(cl, "rule: %d, op: %d, pid: %llu, start: %llu, "
  "length: %llu, wait: %d, type: %d, err code %d\n",
- (int)lock_type, (int)operation, (u64)fl->fl_pid,
- fl->fl_start, length, wait, fl->fl_type, err);
+ (int)lock_type, (int)operation, (u64) fl->c.flc_pid,
+ fl->fl_start, length, wait, fl->c.flc_type, err);
return err;
 }
 
@@ -228,10 +228,10 @@ static int ceph_lock_wait_for_completion(struct 
ceph_mds_client *mdsc,
 static int try_unlock_file(struct file *file, struct file_lock *fl)
 {
int err;
-   unsigned int orig_flags = fl->fl_flags;
-   fl->fl_flags |= FL_EXISTS;
+   unsigned int orig_flags = fl->c.flc_flags;
+   fl->c.flc_flags |= FL_EXISTS;
err = locks_lock_file_wait(file, fl);
-   fl->fl_flags = orig_flags;
+   fl->c.flc_flags = orig_flags;
if (err == -ENOENT) {
if (!(orig_flags & FL_EXISTS))
err = 0;
@@ -254,13 +254,13 @@ int ceph_lock(struct file *file, int cmd, struct 
file_lock *fl)
u8 wait = 0;
u8 lock_cmd;
 
-   if (!(fl->fl_flags & FL_POSIX))
+   if (!(fl->c.flc_flags & FL_POSIX))
return -ENOLCK;
 
if (ceph_inode_is_shutdown(inode))
return -ESTALE;
 
-   doutc(cl, "fl_owner: %p\n", fl->fl_owner);
+   doutc(cl, "fl_owner: %p\n", fl->c.flc_owner);
 
/* set wait bit as appropriate, then make command as Ceph expects it*/
if (IS_GETLK(cmd))
@@ -294,7 +294,7 @@ int ceph_lock(struct file *file, int cmd, struct file_lock 
*fl)
 
err = 

[PATCH v3 35/47] afs: adapt to breakup of struct file_lock

2024-01-31 Thread Jeff Layton
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.
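
The afs hunks also show the waiter side of the new layout: the wait
queue moved into the core (fl->c.flc_wait), and wakers go through the
locks_wake_up() helper, presumably a wake_up() on that queue. A rough
pthread analogue of the pending-lock handoff below, with every name
hypothetical:

#include <pthread.h>

#define AFS_LOCK_PENDING 0
#define AFS_LOCK_GRANTED 1

struct sketch_lock {
	pthread_mutex_t mu;
	pthread_cond_t  wait;	/* plays the role of c.flc_wait */
	int state;		/* fl_u.afs.state in the real code */
};

/* waiter: wait_event_interruptible(fl->c.flc_wait, state != PENDING) */
static void wait_for_grant(struct sketch_lock *l)
{
	pthread_mutex_lock(&l->mu);
	while (l->state == AFS_LOCK_PENDING)
		pthread_cond_wait(&l->wait, &l->mu);
	pthread_mutex_unlock(&l->mu);
}

/* waker: afs_next_locker() setting the state, then locks_wake_up() */
static void grant(struct sketch_lock *l)
{
	pthread_mutex_lock(&l->mu);
	l->state = AFS_LOCK_GRANTED;
	pthread_cond_broadcast(&l->wait);
	pthread_mutex_unlock(&l->mu);
}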

Signed-off-by: Jeff Layton 
---
 fs/afs/flock.c | 38 +++---
 fs/afs/internal.h  |  1 -
 include/trace/events/afs.h |  4 ++--
 3 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/fs/afs/flock.c b/fs/afs/flock.c
index 4eee3d1ca5ad..f0e96a35093f 100644
--- a/fs/afs/flock.c
+++ b/fs/afs/flock.c
@@ -121,16 +121,15 @@ static void afs_next_locker(struct afs_vnode *vnode, int 
error)
 
list_for_each_entry_safe(p, _p, &vnode->pending_locks, fl_u.afs.link) {
if (error &&
-   p->fl_type == type &&
-   afs_file_key(p->fl_file) == key) {
+   p->c.flc_type == type &&
+   afs_file_key(p->c.flc_file) == key) {
list_del_init(&p->fl_u.afs.link);
p->fl_u.afs.state = error;
locks_wake_up(p);
}
 
/* Select the next locker to hand off to. */
-   if (next &&
-   (lock_is_write(next) || lock_is_read(p)))
+   if (next && (lock_is_write(next) || lock_is_read(p)))
continue;
next = p;
}
@@ -464,7 +463,7 @@ static int afs_do_setlk(struct file *file, struct file_lock 
*fl)
 
_enter("{%llx:%llu},%llu-%llu,%u,%u",
   vnode->fid.vid, vnode->fid.vnode,
-  fl->fl_start, fl->fl_end, fl->fl_type, mode);
+  fl->fl_start, fl->fl_end, fl->c.flc_type, mode);
 
fl->fl_ops = &afs_lock_ops;
INIT_LIST_HEAD(&fl->fl_u.afs.link);
@@ -524,7 +523,7 @@ static int afs_do_setlk(struct file *file, struct file_lock 
*fl)
}
 
if (vnode->lock_state == AFS_VNODE_LOCK_NONE &&
-   !(fl->fl_flags & FL_SLEEP)) {
+   !(fl->c.flc_flags & FL_SLEEP)) {
ret = -EAGAIN;
if (type == AFS_LOCK_READ) {
if (vnode->status.lock_count == -1)
@@ -621,7 +620,7 @@ static int afs_do_setlk(struct file *file, struct file_lock 
*fl)
return 0;
 
 lock_is_contended:
-   if (!(fl->fl_flags & FL_SLEEP)) {
+   if (!(fl->c.flc_flags & FL_SLEEP)) {
list_del_init(&fl->fl_u.afs.link);
afs_next_locker(vnode, 0);
ret = -EAGAIN;
@@ -641,7 +640,7 @@ static int afs_do_setlk(struct file *file, struct file_lock 
*fl)
spin_unlock(&vnode->lock);
 
trace_afs_flock_ev(vnode, fl, afs_flock_waiting, 0);
-   ret = wait_event_interruptible(fl->fl_wait,
+   ret = wait_event_interruptible(fl->c.flc_wait,
   fl->fl_u.afs.state != AFS_LOCK_PENDING);
trace_afs_flock_ev(vnode, fl, afs_flock_waited, ret);
 
@@ -704,7 +703,8 @@ static int afs_do_unlk(struct file *file, struct file_lock 
*fl)
struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
int ret;
 
-   _enter("{%llx:%llu},%u", vnode->fid.vid, vnode->fid.vnode, fl->fl_type);
+   _enter("{%llx:%llu},%u", vnode->fid.vid, vnode->fid.vnode,
+  fl->c.flc_type);
 
trace_afs_flock_op(vnode, fl, afs_flock_op_unlock);
 
@@ -730,7 +730,7 @@ static int afs_do_getlk(struct file *file, struct file_lock 
*fl)
if (vnode->lock_state == AFS_VNODE_LOCK_DELETED)
return -ENOENT;
 
-   fl->fl_type = F_UNLCK;
+   fl->c.flc_type = F_UNLCK;
 
/* check local lock records first */
posix_test_lock(file, fl);
@@ -743,18 +743,18 @@ static int afs_do_getlk(struct file *file, struct 
file_lock *fl)
lock_count = READ_ONCE(vnode->status.lock_count);
if (lock_count != 0) {
if (lock_count > 0)
-   fl->fl_type = F_RDLCK;
+   fl->c.flc_type = F_RDLCK;
else
-   fl->fl_type = F_WRLCK;
+   fl->c.flc_type = F_WRLCK;
fl->fl_start = 0;
fl->fl_end = OFFSET_MAX;
-   fl->fl_pid = 0;
+   fl->c.flc_pid = 0;
}
}
 
ret = 0;
 error:
-   _leave(" = %d [%hd]", ret, fl->fl_type);
+   _leave(" = %d [%hd]", ret, fl->c.flc_type);
return ret;
 }
 
@@ -769,7 +769,7 @@ int afs_lock(struct file *file, int cmd, struct file_lock 
*fl)
 
_enter("{%llx:%llu},%d,{t=%x,fl=%x,r=%Ld:%Ld}",
   vnode->fid.vid, vnode->fid.vnode, cmd,
-  fl->fl_type, fl->fl_flags,
+  fl->c.flc_type, fl->c.flc_flags,
   (long long) fl->fl_start, (long long) fl->fl_end);
 
if (IS_GETLK(cmd))
@@ -804,7 +804,7 @@ int afs_flock(struct file *file, int cmd, struct file_lock 
*fl)
 
_enter("{%llx:%llu},%d,{t=%x,fl=%x}",
   

[PATCH v3 34/47] 9p: adapt to breakup of struct file_lock

2024-01-31 Thread Jeff Layton
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.
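
One idiom in v9fs_file_do_lock() below deserves a note: the lock is
taken locally first, and if the server then refuses, it is rolled back
by flipping c.flc_type to F_UNLCK, re-running the VFS wait path, and
restoring the type, while the server's error is kept. A reduced sketch
of that rollback (vfs_apply() and server_lock() are stand-ins for
locks_lock_file_wait() and the 9p round trip):

#include <fcntl.h>

struct file_lock_core { unsigned char flc_type; };
struct file_lock { struct file_lock_core c; };

static int vfs_apply(struct file_lock *fl)	/* local bookkeeping */
{
	(void)fl;
	return 0;
}

static int server_lock(struct file_lock *fl)	/* remote decision */
{
	(void)fl;
	return -1;	/* pretend the server refused */
}

static int do_lock_sketch(struct file_lock *fl)
{
	int res = vfs_apply(fl);	/* optimistic local lock */

	if (res < 0)
		return res;
	res = server_lock(fl);
	if (res < 0 && fl->c.flc_type != F_UNLCK) {
		/* undo the local lock but return the remote error */
		unsigned char type = fl->c.flc_type;

		fl->c.flc_type = F_UNLCK;
		vfs_apply(fl);
		fl->c.flc_type = type;
	}
	return res;
}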

Signed-off-by: Jeff Layton 
---
 fs/9p/vfs_file.c | 39 +++
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index a1dabcf73380..abdbbaee5184 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -9,7 +9,6 @@
 #include 
 #include 
 #include 
-#define _NEED_FILE_LOCK_FIELD_MACROS
 #include 
 #include 
 #include 
@@ -108,7 +107,7 @@ static int v9fs_file_lock(struct file *filp, int cmd, 
struct file_lock *fl)
 
p9_debug(P9_DEBUG_VFS, "filp: %p lock: %p\n", filp, fl);
 
-   if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) {
+   if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->c.flc_type != F_UNLCK) {
filemap_write_and_wait(inode->i_mapping);
invalidate_mapping_pages(&inode->i_data, 0, -1);
}
@@ -127,7 +126,7 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, 
struct file_lock *fl)
fid = filp->private_data;
BUG_ON(fid == NULL);
 
-   BUG_ON((fl->fl_flags & FL_POSIX) != FL_POSIX);
+   BUG_ON((fl->c.flc_flags & FL_POSIX) != FL_POSIX);
 
res = locks_lock_file_wait(filp, fl);
if (res < 0)
@@ -136,7 +135,7 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, 
struct file_lock *fl)
/* convert posix lock to p9 tlock args */
memset(&flock, 0, sizeof(flock));
/* map the lock type */
-   switch (fl->fl_type) {
+   switch (fl->c.flc_type) {
case F_RDLCK:
flock.type = P9_LOCK_TYPE_RDLCK;
break;
@@ -152,7 +151,7 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, 
struct file_lock *fl)
flock.length = 0;
else
flock.length = fl->fl_end - fl->fl_start + 1;
-   flock.proc_id = fl->fl_pid;
+   flock.proc_id = fl->c.flc_pid;
flock.client_id = fid->clnt->name;
if (IS_SETLKW(cmd))
flock.flags = P9_LOCK_FLAGS_BLOCK;
@@ -207,13 +206,13 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, 
struct file_lock *fl)
 * incase server returned error for lock request, revert
 * it locally
 */
-   if (res < 0 && fl->fl_type != F_UNLCK) {
-   unsigned char type = fl->fl_type;
+   if (res < 0 && fl->c.flc_type != F_UNLCK) {
+   unsigned char type = fl->c.flc_type;
 
-   fl->fl_type = F_UNLCK;
+   fl->c.flc_type = F_UNLCK;
/* Even if this fails we want to return the remote error */
locks_lock_file_wait(filp, fl);
-   fl->fl_type = type;
+   fl->c.flc_type = type;
}
if (flock.client_id != fid->clnt->name)
kfree(flock.client_id);
@@ -235,7 +234,7 @@ static int v9fs_file_getlock(struct file *filp, struct 
file_lock *fl)
 * if we have a conflicting lock locally, no need to validate
 * with server
 */
-   if (fl->fl_type != F_UNLCK)
+   if (fl->c.flc_type != F_UNLCK)
return res;
 
/* convert posix lock to p9 tgetlock args */
@@ -246,7 +245,7 @@ static int v9fs_file_getlock(struct file *filp, struct 
file_lock *fl)
glock.length = 0;
else
glock.length = fl->fl_end - fl->fl_start + 1;
-   glock.proc_id = fl->fl_pid;
+   glock.proc_id = fl->c.flc_pid;
glock.client_id = fid->clnt->name;
 
res = p9_client_getlock_dotl(fid, &glock);
@@ -255,13 +254,13 @@ static int v9fs_file_getlock(struct file *filp, struct 
file_lock *fl)
/* map 9p lock type to os lock type */
switch (glock.type) {
case P9_LOCK_TYPE_RDLCK:
-   fl->fl_type = F_RDLCK;
+   fl->c.flc_type = F_RDLCK;
break;
case P9_LOCK_TYPE_WRLCK:
-   fl->fl_type = F_WRLCK;
+   fl->c.flc_type = F_WRLCK;
break;
case P9_LOCK_TYPE_UNLCK:
-   fl->fl_type = F_UNLCK;
+   fl->c.flc_type = F_UNLCK;
break;
}
if (glock.type != P9_LOCK_TYPE_UNLCK) {
@@ -270,7 +269,7 @@ static int v9fs_file_getlock(struct file *filp, struct 
file_lock *fl)
fl->fl_end = OFFSET_MAX;
else
fl->fl_end = glock.start + glock.length - 1;
-   fl->fl_pid = -glock.proc_id;
+   fl->c.flc_pid = -glock.proc_id;
}
 out:
if (glock.client_id != fid->clnt->name)
@@ -294,7 +293,7 @@ static int v9fs_file_lock_dotl(struct file *filp, int cmd, 
struct file_lock *fl)
p9_debug(P9_DEBUG_VFS, "filp: %p cmd:%d lock: %p name: %pD\n",
 filp, cmd, fl, filp);
 
-   if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) {
+   if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && 

[PATCH v3 33/47] filelock: convert seqfile handling to use file_lock_core

2024-01-31 Thread Jeff Layton
Reduce some pointer manipulation by just using file_lock_core where we
can and only translate to a file_lock when needed.
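
The translation back to a full file_lock uses the file_lock() helper
introduced later in this series, which is just container_of() on the
embedded member. A self-contained sketch of the idiom, with the structs
reduced to the essentials:

#include <stddef.h>

struct file_lock_core { int flc_type; };

struct file_lock {
	struct file_lock_core c;	/* embedded core */
	long fl_start, fl_end;		/* byte-range lives outside the core */
};

/* same shape as file_lock(): subtract the offset of the member */
static struct file_lock *file_lock(struct file_lock_core *flc)
{
	return (struct file_lock *)((char *)flc -
				    offsetof(struct file_lock, c));
}

lock_get_status() below only pays that cost when it actually needs
fl_start/fl_end or lease state; everything else stays on the core.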

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 72 +++---
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 97f6e9163130..1a4b01203d3d 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2718,52 +2718,53 @@ struct locks_iterator {
loff_t  li_pos;
 };
 
-static void lock_get_status(struct seq_file *f, struct file_lock *fl,
+static void lock_get_status(struct seq_file *f, struct file_lock_core *flc,
loff_t id, char *pfx, int repeat)
 {
struct inode *inode = NULL;
unsigned int pid;
struct pid_namespace *proc_pidns = 
proc_pid_ns(file_inode(f->file)->i_sb);
-   int type = fl->c.flc_type;
+   int type = flc->flc_type;
+   struct file_lock *fl = file_lock(flc);
+
+   pid = locks_translate_pid(flc, proc_pidns);
 
-   pid = locks_translate_pid(&fl->c, proc_pidns);
/*
 * If lock owner is dead (and pid is freed) or not visible in current
 * pidns, zero is shown as a pid value. Check lock info from
 * init_pid_ns to get saved lock pid value.
 */
-
-   if (fl->c.flc_file != NULL)
-   inode = file_inode(fl->c.flc_file);
+   if (flc->flc_file != NULL)
+   inode = file_inode(flc->flc_file);
 
seq_printf(f, "%lld: ", id);
 
if (repeat)
seq_printf(f, "%*s", repeat - 1 + (int)strlen(pfx), pfx);
 
-   if (fl->c.flc_flags & FL_POSIX) {
-   if (fl->c.flc_flags & FL_ACCESS)
+   if (flc->flc_flags & FL_POSIX) {
+   if (flc->flc_flags & FL_ACCESS)
seq_puts(f, "ACCESS");
-   else if (fl->c.flc_flags & FL_OFDLCK)
+   else if (flc->flc_flags & FL_OFDLCK)
seq_puts(f, "OFDLCK");
else
seq_puts(f, "POSIX ");
 
seq_printf(f, " %s ",
 (inode == NULL) ? "*NOINODE*" : "ADVISORY ");
-   } else if (fl->c.flc_flags & FL_FLOCK) {
+   } else if (flc->flc_flags & FL_FLOCK) {
seq_puts(f, "FLOCK  ADVISORY  ");
-   } else if (fl->c.flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT)) {
+   } else if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT)) {
type = target_leasetype(fl);
 
-   if (fl->c.flc_flags & FL_DELEG)
+   if (flc->flc_flags & FL_DELEG)
seq_puts(f, "DELEG  ");
else
seq_puts(f, "LEASE  ");
 
if (lease_breaking(fl))
seq_puts(f, "BREAKING  ");
-   else if (fl->c.flc_file)
+   else if (flc->flc_file)
seq_puts(f, "ACTIVE");
else
seq_puts(f, "BREAKER   ");
@@ -2781,7 +2782,7 @@ static void lock_get_status(struct seq_file *f, struct 
file_lock *fl,
} else {
seq_printf(f, "%d :0 ", pid);
}
-   if (fl->c.flc_flags & FL_POSIX) {
+   if (flc->flc_flags & FL_POSIX) {
if (fl->fl_end == OFFSET_MAX)
seq_printf(f, "%Ld EOF\n", fl->fl_start);
else
@@ -2791,18 +2792,18 @@ static void lock_get_status(struct seq_file *f, struct 
file_lock *fl,
}
 }
 
-static struct file_lock *get_next_blocked_member(struct file_lock *node)
+static struct file_lock_core *get_next_blocked_member(struct file_lock_core 
*node)
 {
-   struct file_lock *tmp;
+   struct file_lock_core *tmp;
 
/* NULL node or root node */
-   if (node == NULL || node->c.flc_blocker == NULL)
+   if (node == NULL || node->flc_blocker == NULL)
return NULL;
 
/* Next member in the linked list could be itself */
-   tmp = list_next_entry(node, c.flc_blocked_member);
-   if (list_entry_is_head(tmp, &node->c.flc_blocker->flc_blocked_requests,
-  c.flc_blocked_member)
+   tmp = list_next_entry(node, flc_blocked_member);
+   if (list_entry_is_head(tmp, &node->flc_blocker->flc_blocked_requests,
+  flc_blocked_member)
|| tmp == node) {
return NULL;
}
@@ -2813,18 +2814,18 @@ static struct file_lock *get_next_blocked_member(struct 
file_lock *node)
 static int locks_show(struct seq_file *f, void *v)
 {
struct locks_iterator *iter = f->private;
-   struct file_lock *cur, *tmp;
+   struct file_lock_core *cur, *tmp;
struct pid_namespace *proc_pidns = 
proc_pid_ns(file_inode(f->file)->i_sb);
int level = 0;
 
-   cur = hlist_entry(v, struct file_lock, c.flc_link);
+   cur = hlist_entry(v, struct file_lock_core, flc_link);
 
-   if (locks_translate_pid(&cur->c, proc_pidns) == 0)
+   if 

[PATCH v3 32/47] filelock: convert locks_translate_pid to take file_lock_core

2024-01-31 Thread Jeff Layton
locks_translate_pid is used on both locks and leases, so have that take
struct file_lock_core.

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 50d02a53ca75..97f6e9163130 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2169,17 +2169,17 @@ EXPORT_SYMBOL_GPL(vfs_test_lock);
  *
  * Used to translate a fl_pid into a namespace virtual pid number
  */
-static pid_t locks_translate_pid(struct file_lock *fl, struct pid_namespace 
*ns)
+static pid_t locks_translate_pid(struct file_lock_core *fl, struct 
pid_namespace *ns)
 {
pid_t vnr;
struct pid *pid;
 
-   if (fl->c.flc_flags & FL_OFDLCK)
+   if (fl->flc_flags & FL_OFDLCK)
return -1;
 
/* Remote locks report a negative pid value */
-   if (fl->c.flc_pid <= 0)
-   return fl->c.flc_pid;
+   if (fl->flc_pid <= 0)
+   return fl->flc_pid;
 
/*
 * If the flock owner process is dead and its pid has been already
@@ -2187,10 +2187,10 @@ static pid_t locks_translate_pid(struct file_lock *fl, 
struct pid_namespace *ns)
 * flock owner pid number in init pidns.
 */
if (ns == &init_pid_ns)
-   return (pid_t) fl->c.flc_pid;
+   return (pid_t) fl->flc_pid;
 
rcu_read_lock();
-   pid = find_pid_ns(fl->c.flc_pid, _pid_ns);
+   pid = find_pid_ns(fl->flc_pid, _pid_ns);
vnr = pid_nr_ns(pid, ns);
rcu_read_unlock();
return vnr;
@@ -2198,7 +2198,7 @@ static pid_t locks_translate_pid(struct file_lock *fl, 
struct pid_namespace *ns)
 
 static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
 {
-   flock->l_pid = locks_translate_pid(fl, task_active_pid_ns(current));
+   flock->l_pid = locks_translate_pid(&fl->c, task_active_pid_ns(current));
 #if BITS_PER_LONG == 32
/*
 * Make sure we can represent the posix lock via
@@ -2220,7 +2220,7 @@ static int posix_lock_to_flock(struct flock *flock, 
struct file_lock *fl)
 #if BITS_PER_LONG == 32
 static void posix_lock_to_flock64(struct flock64 *flock, struct file_lock *fl)
 {
-   flock->l_pid = locks_translate_pid(fl, task_active_pid_ns(current));
+   flock->l_pid = locks_translate_pid(&fl->c, task_active_pid_ns(current));
flock->l_start = fl->fl_start;
flock->l_len = fl->fl_end == OFFSET_MAX ? 0 :
fl->fl_end - fl->fl_start + 1;
@@ -2726,7 +2726,7 @@ static void lock_get_status(struct seq_file *f, struct 
file_lock *fl,
struct pid_namespace *proc_pidns = 
proc_pid_ns(file_inode(f->file)->i_sb);
int type = fl->c.flc_type;
 
-   pid = locks_translate_pid(fl, proc_pidns);
+   pid = locks_translate_pid(&fl->c, proc_pidns);
/*
 * If lock owner is dead (and pid is freed) or not visible in current
 * pidns, zero is shown as a pid value. Check lock info from
@@ -2819,7 +2819,7 @@ static int locks_show(struct seq_file *f, void *v)
 
cur = hlist_entry(v, struct file_lock, c.flc_link);
 
-   if (locks_translate_pid(cur, proc_pidns) == 0)
+   if (locks_translate_pid(&cur->c, proc_pidns) == 0)
return 0;
 
/* View this crossed linked list as a binary tree, the first member of 
fl_blocked_requests

-- 
2.43.0




[PATCH v3 31/47] filelock: convert locks_insert_lock_ctx and locks_delete_lock_ctx

2024-01-31 Thread Jeff Layton
Have these functions take a file_lock_core pointer instead of a
file_lock.

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 44 ++--
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 9f3670ba0880..50d02a53ca75 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -824,28 +824,28 @@ static void locks_wake_up_blocks(struct file_lock_core 
*blocker)
 }
 
 static void
-locks_insert_lock_ctx(struct file_lock *fl, struct list_head *before)
+locks_insert_lock_ctx(struct file_lock_core *fl, struct list_head *before)
 {
-   list_add_tail(&fl->c.flc_list, before);
-   locks_insert_global_locks(&fl->c);
+   list_add_tail(&fl->flc_list, before);
+   locks_insert_global_locks(fl);
 }
 
 static void
-locks_unlink_lock_ctx(struct file_lock *fl)
+locks_unlink_lock_ctx(struct file_lock_core *fl)
 {
-   locks_delete_global_locks(&fl->c);
-   list_del_init(&fl->c.flc_list);
-   locks_wake_up_blocks(&fl->c);
+   locks_delete_global_locks(fl);
+   list_del_init(&fl->flc_list);
+   locks_wake_up_blocks(fl);
 }
 
 static void
-locks_delete_lock_ctx(struct file_lock *fl, struct list_head *dispose)
+locks_delete_lock_ctx(struct file_lock_core *fl, struct list_head *dispose)
 {
locks_unlink_lock_ctx(fl);
if (dispose)
-   list_add(&fl->c.flc_list, dispose);
+   list_add(&fl->flc_list, dispose);
else
-   locks_free_lock(fl);
+   locks_free_lock(file_lock(fl));
 }
 
 /* Determine if lock sys_fl blocks lock caller_fl. Common functionality
@@ -1072,7 +1072,7 @@ static int flock_lock_inode(struct inode *inode, struct 
file_lock *request)
if (request->c.flc_type == fl->c.flc_type)
goto out;
found = true;
-   locks_delete_lock_ctx(fl, &dispose);
+   locks_delete_lock_ctx(&fl->c, &dispose);
break;
}
 
@@ -1097,7 +1097,7 @@ static int flock_lock_inode(struct inode *inode, struct 
file_lock *request)
goto out;
locks_copy_lock(new_fl, request);
locks_move_blocks(new_fl, request);
-   locks_insert_lock_ctx(new_fl, &ctx->flc_flock);
+   locks_insert_lock_ctx(&new_fl->c, &ctx->flc_flock);
new_fl = NULL;
error = 0;
 
@@ -1236,7 +1236,7 @@ static int posix_lock_inode(struct inode *inode, struct 
file_lock *request,
else
request->fl_end = fl->fl_end;
if (added) {
-   locks_delete_lock_ctx(fl, &dispose);
+   locks_delete_lock_ctx(&fl->c, &dispose);
continue;
}
request = fl;
@@ -1265,7 +1265,7 @@ static int posix_lock_inode(struct inode *inode, struct 
file_lock *request,
 * one (This may happen several times).
 */
if (added) {
-   locks_delete_lock_ctx(fl, &dispose);
+   locks_delete_lock_ctx(&fl->c, &dispose);
continue;
}
/*
@@ -1282,9 +1282,9 @@ static int posix_lock_inode(struct inode *inode, struct 
file_lock *request,
locks_move_blocks(new_fl, request);
request = new_fl;
new_fl = NULL;
-   locks_insert_lock_ctx(request,
+   locks_insert_lock_ctx(>c,
  &fl->c.flc_list);
-   locks_delete_lock_ctx(fl, &dispose);
+   locks_delete_lock_ctx(&fl->c, &dispose);
added = true;
}
}
@@ -1313,7 +1313,7 @@ static int posix_lock_inode(struct inode *inode, struct 
file_lock *request,
}
locks_copy_lock(new_fl, request);
locks_move_blocks(new_fl, request);
-   locks_insert_lock_ctx(new_fl, &fl->c.flc_list);
+   locks_insert_lock_ctx(&new_fl->c, &fl->c.flc_list);
fl = new_fl;
new_fl = NULL;
}
@@ -1325,7 +1325,7 @@ static int posix_lock_inode(struct inode *inode, struct 
file_lock *request,
left = new_fl2;
new_fl2 = NULL;
locks_copy_lock(left, right);
-   locks_insert_lock_ctx(left, &fl->c.flc_list);
+   locks_insert_lock_ctx(&left->c, &fl->c.flc_list);
}
right->fl_start = request->fl_end + 1;
locks_wake_up_blocks(&right->c);
@@ -1425,7 +1425,7 @@ int lease_modify(struct file_lock *fl, int arg, struct 
list_head *dispose)
printk(KERN_ERR "locks_delete_lock: fasync == %p\n", 
fl->fl_fasync);
  

[PATCH v3 30/47] filelock: convert locks_wake_up_blocks to take a file_lock_core pointer

2024-01-31 Thread Jeff Layton
Have locks_wake_up_blocks take a file_lock_core pointer, and fix up the
callers to pass one in.

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 6892511ed89b..9f3670ba0880 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -806,7 +806,7 @@ static void locks_insert_block(struct file_lock_core 
*blocker,
  *
  * Must be called with the inode->flc_lock held!
  */
-static void locks_wake_up_blocks(struct file_lock *blocker)
+static void locks_wake_up_blocks(struct file_lock_core *blocker)
 {
/*
 * Avoid taking global lock if list is empty. This is safe since new
@@ -815,11 +815,11 @@ static void locks_wake_up_blocks(struct file_lock 
*blocker)
 * fl_blocked_requests list does not require the flc_lock, so we must
 * recheck list_empty() after acquiring the blocked_lock_lock.
 */
-   if (list_empty(&blocker->c.flc_blocked_requests))
+   if (list_empty(&blocker->flc_blocked_requests))
return;
 
spin_lock(&blocked_lock_lock);
-   __locks_wake_up_blocks(&blocker->c);
+   __locks_wake_up_blocks(blocker);
spin_unlock(&blocked_lock_lock);
 }
 
@@ -835,7 +835,7 @@ locks_unlink_lock_ctx(struct file_lock *fl)
 {
locks_delete_global_locks(&fl->c);
list_del_init(&fl->c.flc_list);
-   locks_wake_up_blocks(fl);
+   locks_wake_up_blocks(&fl->c);
 }
 
 static void
@@ -1328,11 +1328,11 @@ static int posix_lock_inode(struct inode *inode, struct 
file_lock *request,
locks_insert_lock_ctx(left, &fl->c.flc_list);
}
right->fl_start = request->fl_end + 1;
-   locks_wake_up_blocks(right);
+   locks_wake_up_blocks(&right->c);
}
if (left) {
left->fl_end = request->fl_start - 1;
-   locks_wake_up_blocks(left);
+   locks_wake_up_blocks(&left->c);
}
  out:
spin_unlock(&ctx->flc_lock);
@@ -1414,7 +1414,7 @@ int lease_modify(struct file_lock *fl, int arg, struct 
list_head *dispose)
if (error)
return error;
lease_clear_pending(fl, arg);
-   locks_wake_up_blocks(fl);
+   locks_wake_up_blocks(&fl->c);
if (arg == F_UNLCK) {
struct file *filp = fl->c.flc_file;
 

-- 
2.43.0




[PATCH v3 29/47] filelock: make assign_type helper take a file_lock_core pointer

2024-01-31 Thread Jeff Layton
Have assign_type take struct file_lock_core instead of file_lock.

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index c8fd2964dd98..6892511ed89b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -439,13 +439,13 @@ static void flock_make_lock(struct file *filp, struct 
file_lock *fl, int type)
fl->fl_end = OFFSET_MAX;
 }
 
-static int assign_type(struct file_lock *fl, int type)
+static int assign_type(struct file_lock_core *flc, int type)
 {
switch (type) {
case F_RDLCK:
case F_WRLCK:
case F_UNLCK:
-   fl->c.flc_type = type;
+   flc->flc_type = type;
break;
default:
return -EINVAL;
@@ -497,7 +497,7 @@ static int flock64_to_posix_lock(struct file *filp, struct 
file_lock *fl,
fl->fl_ops = NULL;
fl->fl_lmops = NULL;
 
-   return assign_type(fl, l->l_type);
+   return assign_type(&fl->c, l->l_type);
 }
 
 /* Verify a "struct flock" and copy it to a "struct file_lock" as a POSIX
@@ -552,7 +552,7 @@ static const struct lock_manager_operations 
lease_manager_ops = {
  */
 static int lease_init(struct file *filp, int type, struct file_lock *fl)
 {
-   if (assign_type(fl, type) != 0)
+   if (assign_type(&fl->c, type) != 0)
return -EINVAL;
 
fl->c.flc_owner = filp;
@@ -1409,7 +1409,7 @@ static void lease_clear_pending(struct file_lock *fl, int 
arg)
 /* We already had a lease on this file; just change its type */
 int lease_modify(struct file_lock *fl, int arg, struct list_head *dispose)
 {
-   int error = assign_type(fl, arg);
+   int error = assign_type(&fl->c, arg);
 
if (error)
return error;

-- 
2.43.0




[PATCH v3 28/47] filelock: reorganize locks_delete_block and __locks_insert_block

2024-01-31 Thread Jeff Layton
Rename the old __locks_delete_block to __locks_unlink_block. Change the
old locks_delete_block function to __locks_delete_block and have it
take a file_lock_core. Make locks_delete_block a simple wrapper around
__locks_delete_block.

Also, change __locks_insert_block to take struct file_lock_core, and
fix up its callers.

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 42 ++
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index a2be1e0b5a94..c8fd2964dd98 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -659,7 +659,7 @@ static void locks_delete_global_blocked(struct 
file_lock_core *waiter)
  *
  * Must be called with blocked_lock_lock held.
  */
-static void __locks_delete_block(struct file_lock_core *waiter)
+static void __locks_unlink_block(struct file_lock_core *waiter)
 {
locks_delete_global_blocked(waiter);
list_del_init(&waiter->flc_blocked_member);
@@ -675,7 +675,7 @@ static void __locks_wake_up_blocks(struct file_lock_core 
*blocker)
  struct file_lock_core, 
flc_blocked_member);
 
fl = file_lock(waiter);
-   __locks_delete_block(waiter);
+   __locks_unlink_block(waiter);
if ((waiter->flc_flags & (FL_POSIX | FL_FLOCK)) &&
fl->fl_lmops && fl->fl_lmops->lm_notify)
fl->fl_lmops->lm_notify(fl);
@@ -691,16 +691,9 @@ static void __locks_wake_up_blocks(struct file_lock_core 
*blocker)
}
 }
 
-/**
- * locks_delete_block - stop waiting for a file lock
- * @waiter: the lock which was waiting
- *
- * lockd/nfsd need to disconnect the lock while working on it.
- */
-int locks_delete_block(struct file_lock *waiter_fl)
+static int __locks_delete_block(struct file_lock_core *waiter)
 {
int status = -ENOENT;
-   struct file_lock_core *waiter = &waiter_fl->c;
 
/*
 * If fl_blocker is NULL, it won't be set again as this thread "owns"
@@ -731,7 +724,7 @@ int locks_delete_block(struct file_lock *waiter_fl)
if (waiter->flc_blocker)
status = 0;
__locks_wake_up_blocks(waiter);
-   __locks_delete_block(waiter);
+   __locks_unlink_block(waiter);
 
/*
 * The setting of fl_blocker to NULL marks the "done" point in deleting
@@ -741,6 +734,17 @@ int locks_delete_block(struct file_lock *waiter_fl)
spin_unlock(&blocked_lock_lock);
return status;
 }
+
+/**
+ * locks_delete_block - stop waiting for a file lock
+ * @waiter: the lock which was waiting
+ *
+ * lockd/nfsd need to disconnect the lock while working on it.
+ */
+int locks_delete_block(struct file_lock *waiter)
+{
+   return __locks_delete_block(&waiter->c);
+}
 EXPORT_SYMBOL(locks_delete_block);
 
 /* Insert waiter into blocker's block list.
@@ -758,13 +762,11 @@ EXPORT_SYMBOL(locks_delete_block);
  * waiters, and add beneath any waiter that blocks the new waiter.
  * Thus wakeups don't happen until needed.
  */
-static void __locks_insert_block(struct file_lock *blocker_fl,
-struct file_lock *waiter_fl,
+static void __locks_insert_block(struct file_lock_core *blocker,
+struct file_lock_core *waiter,
 bool conflict(struct file_lock_core *,
   struct file_lock_core *))
 {
-   struct file_lock_core *blocker = &blocker_fl->c;
-   struct file_lock_core *waiter = &waiter_fl->c;
   struct file_lock_core *flc;
 
BUG_ON(!list_empty(&waiter->flc_blocked_member));
@@ -789,8 +791,8 @@ static void __locks_insert_block(struct file_lock 
*blocker_fl,
 }
 
 /* Must be called with flc_lock held. */
-static void locks_insert_block(struct file_lock *blocker,
-  struct file_lock *waiter,
+static void locks_insert_block(struct file_lock_core *blocker,
+  struct file_lock_core *waiter,
   bool conflict(struct file_lock_core *,
 struct file_lock_core *))
 {
@@ -1088,7 +1090,7 @@ static int flock_lock_inode(struct inode *inode, struct 
file_lock *request)
if (!(request->c.flc_flags & FL_SLEEP))
goto out;
error = FILE_LOCK_DEFERRED;
-   locks_insert_block(fl, request, flock_locks_conflict);
+   locks_insert_block(&fl->c, &request->c, flock_locks_conflict);
goto out;
}
if (request->c.flc_flags & FL_ACCESS)
@@ -1182,7 +1184,7 @@ static int posix_lock_inode(struct inode *inode, struct 
file_lock *request,
__locks_wake_up_blocks(&request->c);
if (likely(!posix_locks_deadlock(request, fl))) {
error = FILE_LOCK_DEFERRED;
-   __locks_insert_block(fl, request,
+   __locks_insert_block(&fl->c, 

[PATCH v3 27/47] filelock: clean up locks_delete_block internals

2024-01-31 Thread Jeff Layton
Rework the internals of locks_delete_block to use struct file_lock_core
(mostly just for clarity's sake). The prototype is not changed.

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 0aa1c94671cd..a2be1e0b5a94 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -697,9 +697,10 @@ static void __locks_wake_up_blocks(struct file_lock_core 
*blocker)
  *
  * lockd/nfsd need to disconnect the lock while working on it.
  */
-int locks_delete_block(struct file_lock *waiter)
+int locks_delete_block(struct file_lock *waiter_fl)
 {
int status = -ENOENT;
+   struct file_lock_core *waiter = &waiter_fl->c;
 
/*
 * If fl_blocker is NULL, it won't be set again as this thread "owns"
@@ -722,21 +723,21 @@ int locks_delete_block(struct file_lock *waiter)
 * no new locks can be inserted into its fl_blocked_requests list, and
 * can avoid doing anything further if the list is empty.
 */
-   if (!smp_load_acquire(&waiter->c.flc_blocker) &&
-   list_empty(&waiter->c.flc_blocked_requests))
+   if (!smp_load_acquire(&waiter->flc_blocker) &&
+   list_empty(&waiter->flc_blocked_requests))
return status;
 
spin_lock(&blocked_lock_lock);
-   if (waiter->c.flc_blocker)
+   if (waiter->flc_blocker)
status = 0;
-   __locks_wake_up_blocks(&waiter->c);
-   __locks_delete_block(&waiter->c);
+   __locks_wake_up_blocks(waiter);
+   __locks_delete_block(waiter);
 
/*
 * The setting of fl_blocker to NULL marks the "done" point in deleting
 * a block. Paired with acquire at the top of this function.
 */
-   smp_store_release(&waiter->c.flc_blocker, NULL);
+   smp_store_release(&waiter->flc_blocker, NULL);
spin_unlock(&blocked_lock_lock);
return status;
 }

-- 
2.43.0




[PATCH v3 26/47] filelock: convert fl_blocker to file_lock_core

2024-01-31 Thread Jeff Layton
Both locks and leases deal with fl_blocker. Switch the fl_blocker
pointer in struct file_lock_core to point to the file_lock_core of the
blocker instead of a file_lock structure.
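
The practical effect is that blocker-chain walks (see
what_owner_is_waiting_for() below) now stay entirely in core-land, and
only code that needs the full lock converts back via file_lock(). A
minimal sketch of the chain walk under the new pointer type:

/* flc_blocker is now core-to-core, so no container_of is needed */
struct file_lock_core {
	struct file_lock_core *flc_blocker;
};

static struct file_lock_core *chain_tail(struct file_lock_core *flc)
{
	while (flc->flc_blocker)
		flc = flc->flc_blocker;
	return flc;
}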

Signed-off-by: Jeff Layton 
---
 fs/locks.c  | 16 
 include/linux/filelock.h|  2 +-
 include/trace/events/filelock.h |  4 ++--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 0dc1c9da858c..0aa1c94671cd 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -400,7 +400,7 @@ static void locks_move_blocks(struct file_lock *new, struct 
file_lock *fl)
 
/*
 * As ctx->flc_lock is held, new requests cannot be added to
-* ->fl_blocked_requests, so we don't need a lock to check if it
+* ->flc_blocked_requests, so we don't need a lock to check if it
 * is empty.
 */
if (list_empty(&fl->c.flc_blocked_requests))
@@ -410,7 +410,7 @@ static void locks_move_blocks(struct file_lock *new, struct 
file_lock *fl)
 &new->c.flc_blocked_requests);
list_for_each_entry(f, &new->c.flc_blocked_requests,
c.flc_blocked_member)
-   f->c.flc_blocker = new;
+   f->c.flc_blocker = &new->c;
spin_unlock(&blocked_lock_lock);
 }
 
@@ -773,7 +773,7 @@ static void __locks_insert_block(struct file_lock 
*blocker_fl,
blocker =  flc;
goto new_blocker;
}
-   waiter->flc_blocker = file_lock(blocker);
+   waiter->flc_blocker = blocker;
list_add_tail(&waiter->flc_blocked_member,
  &blocker->flc_blocked_requests);
 
@@ -996,7 +996,7 @@ static struct file_lock_core 
*what_owner_is_waiting_for(struct file_lock_core *b
hash_for_each_possible(blocked_hash, flc, flc_link, 
posix_owner_key(blocker)) {
if (posix_same_owner(flc, blocker)) {
while (flc->flc_blocker)
-   flc = &flc->flc_blocker->c;
+   flc = flc->flc_blocker;
return flc;
}
}
@@ -2798,9 +2798,9 @@ static struct file_lock *get_next_blocked_member(struct 
file_lock *node)
 
/* Next member in the linked list could be itself */
tmp = list_next_entry(node, c.flc_blocked_member);
-   if (list_entry_is_head(tmp, 
&node->c.flc_blocker->c.flc_blocked_requests,
-   c.flc_blocked_member)
-   || tmp == node) {
+   if (list_entry_is_head(tmp, &node->c.flc_blocker->flc_blocked_requests,
+  c.flc_blocked_member)
+   || tmp == node) {
return NULL;
}
 
@@ -2841,7 +2841,7 @@ static int locks_show(struct seq_file *f, void *v)
tmp = get_next_blocked_member(cur);
/* Fall back to parent node */
while (tmp == NULL && cur->c.flc_blocker != NULL) {
-   cur = cur->c.flc_blocker;
+   cur = file_lock(cur->c.flc_blocker);
level--;
tmp = get_next_blocked_member(cur);
}
diff --git a/include/linux/filelock.h b/include/linux/filelock.h
index 4dab73bb34b9..fdec838a3ca7 100644
--- a/include/linux/filelock.h
+++ b/include/linux/filelock.h
@@ -87,7 +87,7 @@ bool opens_in_grace(struct net *);
  */
 
 struct file_lock_core {
-   struct file_lock *flc_blocker;  /* The lock that is blocking us */
+   struct file_lock_core *flc_blocker; /* The lock that is blocking us 
*/
struct list_head flc_list;  /* link into file_lock_context */
struct hlist_node flc_link; /* node in global lists */
struct list_head flc_blocked_requests;  /* list of requests with
diff --git a/include/trace/events/filelock.h b/include/trace/events/filelock.h
index 4be341b5ead0..c778061c6249 100644
--- a/include/trace/events/filelock.h
+++ b/include/trace/events/filelock.h
@@ -68,7 +68,7 @@ DECLARE_EVENT_CLASS(filelock_lock,
__field(struct file_lock *, fl)
__field(unsigned long, i_ino)
__field(dev_t, s_dev)
-   __field(struct file_lock *, blocker)
+   __field(struct file_lock_core *, blocker)
__field(fl_owner_t, owner)
__field(unsigned int, pid)
__field(unsigned int, flags)
@@ -125,7 +125,7 @@ DECLARE_EVENT_CLASS(filelock_lease,
__field(struct file_lock *, fl)
__field(unsigned long, i_ino)
__field(dev_t, s_dev)
-   __field(struct file_lock *, blocker)
+   __field(struct file_lock_core *, blocker)
__field(fl_owner_t, owner)
__field(unsigned int, flags)
__field(unsigned char, type)

-- 
2.43.0




[PATCH v3 25/47] filelock: convert __locks_insert_block, conflict and deadlock checks to use file_lock_core

2024-01-31 Thread Jeff Layton
Have both __locks_insert_block and the deadlock and conflict checking
functions take a struct file_lock_core pointer instead of a struct
file_lock one. Also, change posix_locks_deadlock to return bool.
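
The callback shape after this patch: both sides of a conflict check are
generic cores. A compile-checked sketch mirroring locks_conflict() in
the hunk below (any writer conflicts), with the struct trimmed to the
fields the check needs:

#include <fcntl.h>
#include <stdbool.h>

struct file_lock_core {
	unsigned char	 flc_type;
	void		*flc_owner;
};

static bool locks_conflict_sketch(struct file_lock_core *caller_flc,
				  struct file_lock_core *sys_flc)
{
	return sys_flc->flc_type == F_WRLCK ||
	       caller_flc->flc_type == F_WRLCK;
}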

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 132 +
 1 file changed, 72 insertions(+), 60 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 1e8b943bd7f9..0dc1c9da858c 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -757,39 +757,41 @@ EXPORT_SYMBOL(locks_delete_block);
  * waiters, and add beneath any waiter that blocks the new waiter.
  * Thus wakeups don't happen until needed.
  */
-static void __locks_insert_block(struct file_lock *blocker,
-struct file_lock *waiter,
-bool conflict(struct file_lock *,
-  struct file_lock *))
+static void __locks_insert_block(struct file_lock *blocker_fl,
+struct file_lock *waiter_fl,
+bool conflict(struct file_lock_core *,
+  struct file_lock_core *))
 {
-   struct file_lock *fl;
-   BUG_ON(!list_empty(&waiter->c.flc_blocked_member));
+   struct file_lock_core *blocker = &blocker_fl->c;
+   struct file_lock_core *waiter = &waiter_fl->c;
+   struct file_lock_core *flc;
 
+   BUG_ON(!list_empty(&waiter->flc_blocked_member));
 new_blocker:
-   list_for_each_entry(fl, &blocker->c.flc_blocked_requests,
-   c.flc_blocked_member)
-   if (conflict(fl, waiter)) {
-   blocker =  fl;
+   list_for_each_entry(flc, &blocker->flc_blocked_requests, flc_blocked_member)
+   if (conflict(flc, waiter)) {
+   blocker =  flc;
goto new_blocker;
}
-   waiter->c.flc_blocker = blocker;
-   list_add_tail(&waiter->c.flc_blocked_member,
- &blocker->c.flc_blocked_requests);
-   if ((blocker->c.flc_flags & (FL_POSIX|FL_OFDLCK)) == FL_POSIX)
-   locks_insert_global_blocked(&waiter->c);
+   waiter->flc_blocker = file_lock(blocker);
+   list_add_tail(&waiter->flc_blocked_member,
+ &blocker->flc_blocked_requests);
 
-   /* The requests in waiter->fl_blocked are known to conflict with
+   if ((blocker->flc_flags & (FL_POSIX|FL_OFDLCK)) == FL_POSIX)
+   locks_insert_global_blocked(waiter);
+
+   /* The requests in waiter->flc_blocked are known to conflict with
 * waiter, but might not conflict with blocker, or the requests
 * and lock which block it.  So they all need to be woken.
 */
-   __locks_wake_up_blocks(&waiter->c);
+   __locks_wake_up_blocks(waiter);
 }
 
 /* Must be called with flc_lock held. */
 static void locks_insert_block(struct file_lock *blocker,
   struct file_lock *waiter,
-  bool conflict(struct file_lock *,
-struct file_lock *))
+  bool conflict(struct file_lock_core *,
+struct file_lock_core *))
 {
spin_lock(&blocked_lock_lock);
__locks_insert_block(blocker, waiter, conflict);
@@ -846,12 +848,12 @@ locks_delete_lock_ctx(struct file_lock *fl, struct 
list_head *dispose)
 /* Determine if lock sys_fl blocks lock caller_fl. Common functionality
  * checks for shared/exclusive status of overlapping locks.
  */
-static bool locks_conflict(struct file_lock *caller_fl,
-  struct file_lock *sys_fl)
+static bool locks_conflict(struct file_lock_core *caller_flc,
+  struct file_lock_core *sys_flc)
 {
-   if (lock_is_write(sys_fl))
+   if (sys_flc->flc_type == F_WRLCK)
return true;
-   if (lock_is_write(caller_fl))
+   if (caller_flc->flc_type == F_WRLCK)
return true;
return false;
 }
@@ -859,20 +861,23 @@ static bool locks_conflict(struct file_lock *caller_fl,
 /* Determine if lock sys_fl blocks lock caller_fl. POSIX specific
  * checking before calling the locks_conflict().
  */
-static bool posix_locks_conflict(struct file_lock *caller_fl,
-struct file_lock *sys_fl)
+static bool posix_locks_conflict(struct file_lock_core *caller_flc,
+struct file_lock_core *sys_flc)
 {
+   struct file_lock *caller_fl = file_lock(caller_flc);
+   struct file_lock *sys_fl = file_lock(sys_flc);
+
/* POSIX locks owned by the same process do not conflict with
 * each other.
 */
-   if (posix_same_owner(&caller_fl->c, &sys_fl->c))
+   if (posix_same_owner(caller_flc, sys_flc))
return false;
 
/* Check whether they overlap */
if (!locks_overlap(caller_fl, sys_fl))
return false;
 
-   return locks_conflict(caller_fl, sys_fl);
+ 

[PATCH v3 24/47] filelock: make __locks_delete_block and __locks_wake_up_blocks take file_lock_core

2024-01-31 Thread Jeff Layton
Convert __locks_delete_block and __locks_wake_up_blocks to take a struct
file_lock_core pointer.

While we could do this in another way, we're going to need to add a
file_lock() helper function later anyway, so introduce and use it now.

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 45 +++--
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index ef67a5a7bae8..1e8b943bd7f9 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -69,6 +69,11 @@
 
 #include 
 
+static struct file_lock *file_lock(struct file_lock_core *flc)
+{
+   return container_of(flc, struct file_lock, c);
+}
+
 static bool lease_breaking(struct file_lock *fl)
 {
return fl->c.flc_flags & (FL_UNLOCK_PENDING | FL_DOWNGRADE_PENDING);
@@ -654,31 +659,35 @@ static void locks_delete_global_blocked(struct 
file_lock_core *waiter)
  *
  * Must be called with blocked_lock_lock held.
  */
-static void __locks_delete_block(struct file_lock *waiter)
+static void __locks_delete_block(struct file_lock_core *waiter)
 {
-   locks_delete_global_blocked(&waiter->c);
-   list_del_init(&waiter->c.flc_blocked_member);
+   locks_delete_global_blocked(waiter);
+   list_del_init(&waiter->flc_blocked_member);
 }
 
-static void __locks_wake_up_blocks(struct file_lock *blocker)
+static void __locks_wake_up_blocks(struct file_lock_core *blocker)
 {
-   while (!list_empty(&blocker->c.flc_blocked_requests)) {
-   struct file_lock *waiter;
+   while (!list_empty(&blocker->flc_blocked_requests)) {
+   struct file_lock_core *waiter;
+   struct file_lock *fl;
+
+   waiter = list_first_entry(&blocker->flc_blocked_requests,
+ struct file_lock_core, 
flc_blocked_member);
 
-   waiter = list_first_entry(&blocker->c.flc_blocked_requests,
- struct file_lock, 
c.flc_blocked_member);
+   fl = file_lock(waiter);
__locks_delete_block(waiter);
-   if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
-   waiter->fl_lmops->lm_notify(waiter);
+   if ((waiter->flc_flags & (FL_POSIX | FL_FLOCK)) &&
+   fl->fl_lmops && fl->fl_lmops->lm_notify)
+   fl->fl_lmops->lm_notify(fl);
else
-   locks_wake_up(waiter);
+   locks_wake_up(fl);
 
/*
-* The setting of fl_blocker to NULL marks the "done"
+* The setting of flc_blocker to NULL marks the "done"
 * point in deleting a block. Paired with acquire at the top
 * of locks_delete_block().
 */
-   smp_store_release(&waiter->c.flc_blocker, NULL);
+   smp_store_release(&waiter->flc_blocker, NULL);
}
 }
 
@@ -720,8 +729,8 @@ int locks_delete_block(struct file_lock *waiter)
spin_lock(&blocked_lock_lock);
if (waiter->c.flc_blocker)
status = 0;
-   __locks_wake_up_blocks(waiter);
-   __locks_delete_block(waiter);
+   __locks_wake_up_blocks(&waiter->c);
+   __locks_delete_block(&waiter->c);
 
/*
 * The setting of fl_blocker to NULL marks the "done" point in deleting
@@ -773,7 +782,7 @@ static void __locks_insert_block(struct file_lock *blocker,
 * waiter, but might not conflict with blocker, or the requests
 * and lock which block it.  So they all need to be woken.
 */
-   __locks_wake_up_blocks(waiter);
+   __locks_wake_up_blocks(&waiter->c);
 }
 
 /* Must be called with flc_lock held. */
@@ -805,7 +814,7 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
return;
 
spin_lock(&blocked_lock_lock);
-   __locks_wake_up_blocks(blocker);
+   __locks_wake_up_blocks(&blocker->c);
spin_unlock(&blocked_lock_lock);
 }
 
@@ -1159,7 +1168,7 @@ static int posix_lock_inode(struct inode *inode, struct 
file_lock *request,
 * Ensure that we don't find any locks blocked on this
 * request during deadlock detection.
 */
-   __locks_wake_up_blocks(request);
+   __locks_wake_up_blocks(&request->c);
if (likely(!posix_locks_deadlock(request, fl))) {
error = FILE_LOCK_DEFERRED;
__locks_insert_block(fl, request,

-- 
2.43.0




[PATCH v3 23/47] filelock: convert locks_{insert,delete}_global_blocked

2024-01-31 Thread Jeff Layton
Have locks_insert_global_blocked and locks_delete_global_blocked take a
struct file_lock_core pointer.

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index fa9b2beed0d7..ef67a5a7bae8 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -635,19 +635,18 @@ posix_owner_key(struct file_lock_core *flc)
return (unsigned long) flc->flc_owner;
 }
 
-static void locks_insert_global_blocked(struct file_lock *waiter)
+static void locks_insert_global_blocked(struct file_lock_core *waiter)
 {
lockdep_assert_held(_lock_lock);
 
-   hash_add(blocked_hash, &waiter->c.flc_link,
-posix_owner_key(&waiter->c));
+   hash_add(blocked_hash, &waiter->flc_link, posix_owner_key(waiter));
 }
 
-static void locks_delete_global_blocked(struct file_lock *waiter)
+static void locks_delete_global_blocked(struct file_lock_core *waiter)
 {
lockdep_assert_held(_lock_lock);
 
-   hash_del(&waiter->c.flc_link);
+   hash_del(&waiter->flc_link);
 }
 
 /* Remove waiter from blocker's block list.
@@ -657,7 +656,7 @@ static void locks_delete_global_blocked(struct file_lock 
*waiter)
  */
 static void __locks_delete_block(struct file_lock *waiter)
 {
-   locks_delete_global_blocked(waiter);
+   locks_delete_global_blocked(&waiter->c);
list_del_init(&waiter->c.flc_blocked_member);
 }
 
@@ -768,7 +767,7 @@ static void __locks_insert_block(struct file_lock *blocker,
list_add_tail(&waiter->c.flc_blocked_member,
  &blocker->c.flc_blocked_requests);
if ((blocker->c.flc_flags & (FL_POSIX|FL_OFDLCK)) == FL_POSIX)
-   locks_insert_global_blocked(waiter);
+   locks_insert_global_blocked(&waiter->c);
 
/* The requests in waiter->fl_blocked are known to conflict with
 * waiter, but might not conflict with blocker, or the requests

-- 
2.43.0




[PATCH v3 22/47] filelock: make locks_{insert,delete}_global_locks take file_lock_core arg

2024-01-31 Thread Jeff Layton
Convert these functions to take a file_lock_core instead of a file_lock.
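
These helpers carry the per-CPU global list trick through the
conversion: flc_link_cpu records which CPU's list the lock was inserted
on, so that deletion, which may run on another CPU, can find the right
list again. A lockless user-space caricature of that bookkeeping (the
real code uses per-CPU hlists guarded by spinlocks):

#define NR_CPUS 4	/* toy value for the sketch */

struct flc_node {
	int flc_link_cpu;		/* list we were inserted on */
	struct flc_node *next;
};

static struct flc_node *file_lock_list[NR_CPUS];

/* insert on the local CPU's list and remember which one we used */
static void insert_global(struct flc_node *n, int this_cpu)
{
	n->flc_link_cpu = this_cpu;
	n->next = file_lock_list[this_cpu];
	file_lock_list[this_cpu] = n;
}

/* deletion may run elsewhere: replay the recorded CPU id */
static void delete_global(struct flc_node *n)
{
	struct flc_node **pp = &file_lock_list[n->flc_link_cpu];

	while (*pp && *pp != n)
		pp = &(*pp)->next;
	if (*pp)
		*pp = n->next;
}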

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 1cfd02562e9f..fa9b2beed0d7 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -596,20 +596,20 @@ static int posix_same_owner(struct file_lock_core *fl1, 
struct file_lock_core *f
 }
 
 /* Must be called with the flc_lock held! */
-static void locks_insert_global_locks(struct file_lock *fl)
+static void locks_insert_global_locks(struct file_lock_core *flc)
 {
struct file_lock_list_struct *fll = this_cpu_ptr(&file_lock_list);
 
percpu_rwsem_assert_held(&file_rwsem);
 
spin_lock(&fll->lock);
-   fl->c.flc_link_cpu = smp_processor_id();
-   hlist_add_head(&fl->c.flc_link, &fll->hlist);
+   flc->flc_link_cpu = smp_processor_id();
+   hlist_add_head(&flc->flc_link, &fll->hlist);
spin_unlock(&fll->lock);
 }
 
 /* Must be called with the flc_lock held! */
-static void locks_delete_global_locks(struct file_lock *fl)
+static void locks_delete_global_locks(struct file_lock_core *flc)
 {
struct file_lock_list_struct *fll;
 
@@ -620,12 +620,12 @@ static void locks_delete_global_locks(struct file_lock 
*fl)
 * is done while holding the flc_lock, and new insertions into the list
 * also require that it be held.
 */
-   if (hlist_unhashed(&fl->c.flc_link))
+   if (hlist_unhashed(&flc->flc_link))
return;
 
-   fll = per_cpu_ptr(&file_lock_list, fl->c.flc_link_cpu);
+   fll = per_cpu_ptr(&file_lock_list, flc->flc_link_cpu);
spin_lock(&fll->lock);
-   hlist_del_init(&fl->c.flc_link);
+   hlist_del_init(&flc->flc_link);
spin_unlock(&fll->lock);
 }
 
@@ -814,13 +814,13 @@ static void
 locks_insert_lock_ctx(struct file_lock *fl, struct list_head *before)
 {
list_add_tail(&fl->c.flc_list, before);
-   locks_insert_global_locks(fl);
+   locks_insert_global_locks(&fl->c);
 }
 
 static void
 locks_unlink_lock_ctx(struct file_lock *fl)
 {
-   locks_delete_global_locks(fl);
+   locks_delete_global_locks(&fl->c);
list_del_init(&fl->c.flc_list);
locks_wake_up_blocks(fl);
 }

-- 
2.43.0




[PATCH v3 21/47] filelock: convert posix_owner_key to take file_lock_core arg

2024-01-31 Thread Jeff Layton
Convert posix_owner_key to take struct file_lock_core pointer, and fix
up the callers to pass one in.

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 9ff331b55b7a..1cfd02562e9f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -630,9 +630,9 @@ static void locks_delete_global_locks(struct file_lock *fl)
 }
 
 static unsigned long
-posix_owner_key(struct file_lock *fl)
+posix_owner_key(struct file_lock_core *flc)
 {
-   return (unsigned long) fl->c.flc_owner;
+   return (unsigned long) flc->flc_owner;
 }
 
 static void locks_insert_global_blocked(struct file_lock *waiter)
@@ -640,7 +640,7 @@ static void locks_insert_global_blocked(struct file_lock *waiter)
	lockdep_assert_held(&blocked_lock_lock);
 
	hash_add(blocked_hash, &waiter->c.flc_link,
-		 posix_owner_key(waiter));
+		 posix_owner_key(&waiter->c));
 }
 
 static void locks_delete_global_blocked(struct file_lock *waiter)
@@ -977,7 +977,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
 {
	struct file_lock *fl;
 
-	hash_for_each_possible(blocked_hash, fl, c.flc_link, posix_owner_key(block_fl)) {
+	hash_for_each_possible(blocked_hash, fl, c.flc_link, posix_owner_key(&block_fl->c)) {
		if (posix_same_owner(&fl->c, &block_fl->c)) {
while (fl->c.flc_blocker)
fl = fl->c.flc_blocker;

-- 
2.43.0




[PATCH v3 20/47] filelock: make posix_same_owner take file_lock_core pointers

2024-01-31 Thread Jeff Layton
Change posix_same_owner to take struct file_lock_core pointers, and
convert the callers to pass those in.

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 5d25a3f53c9d..9ff331b55b7a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -590,9 +590,9 @@ static inline int locks_overlap(struct file_lock *fl1, struct file_lock *fl2)
 /*
  * Check whether two locks have the same owner.
  */
-static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
+static int posix_same_owner(struct file_lock_core *fl1, struct file_lock_core *fl2)
 {
-   return fl1->c.flc_owner == fl2->c.flc_owner;
+   return fl1->flc_owner == fl2->flc_owner;
 }
 
 /* Must be called with the flc_lock held! */
@@ -857,7 +857,7 @@ static bool posix_locks_conflict(struct file_lock *caller_fl,
/* POSIX locks owned by the same process do not conflict with
 * each other.
 */
-   if (posix_same_owner(caller_fl, sys_fl))
+	if (posix_same_owner(&caller_fl->c, &sys_fl->c))
return false;
 
/* Check whether they overlap */
@@ -875,7 +875,7 @@ static bool posix_test_locks_conflict(struct file_lock *caller_fl,
 {
/* F_UNLCK checks any locks on the same fd. */
if (lock_is_unlock(caller_fl)) {
-   if (!posix_same_owner(caller_fl, sys_fl))
+		if (!posix_same_owner(&caller_fl->c, &sys_fl->c))
return false;
return locks_overlap(caller_fl, sys_fl);
}
@@ -978,7 +978,7 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
struct file_lock *fl;
 
	hash_for_each_possible(blocked_hash, fl, c.flc_link, posix_owner_key(block_fl)) {
-		if (posix_same_owner(fl, block_fl)) {
+		if (posix_same_owner(&fl->c, &block_fl->c)) {
while (fl->c.flc_blocker)
fl = fl->c.flc_blocker;
return fl;
@@ -1005,7 +1005,7 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
while ((block_fl = what_owner_is_waiting_for(block_fl))) {
if (i++ > MAX_DEADLK_ITERATIONS)
return 0;
-   if (posix_same_owner(caller_fl, block_fl))
+		if (posix_same_owner(&caller_fl->c, &block_fl->c))
return 1;
}
return 0;
@@ -1178,13 +1178,13 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 
/* Find the first old lock with the same owner as the new lock */
	list_for_each_entry(fl, &ctx->flc_posix, c.flc_list) {
-		if (posix_same_owner(request, fl))
+		if (posix_same_owner(&request->c, &fl->c))
break;
}
 
/* Process locks with this owner. */
	list_for_each_entry_safe_from(fl, tmp, &ctx->flc_posix, c.flc_list) {
-		if (!posix_same_owner(request, fl))
+		if (!posix_same_owner(&request->c, &fl->c))
break;
 
/* Detect adjacent or overlapping regions (if same lock type) */

-- 
2.43.0




[PATCH v3 19/47] filelock: convert more internal functions to use file_lock_core

2024-01-31 Thread Jeff Layton
Convert more internal fs/locks.c functions to take and deal with struct
file_lock_core instead of struct file_lock:

- locks_dump_ctx_list
- locks_check_ctx_file_list
- locks_release_private
- locks_owner_has_blockers

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 51 +++++++++++++++++++++++++--------------------------
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index f418c6e31219..5d25a3f53c9d 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -197,13 +197,12 @@ locks_get_lock_context(struct inode *inode, int type)
 static void
 locks_dump_ctx_list(struct list_head *list, char *list_type)
 {
-   struct file_lock *fl;
+   struct file_lock_core *flc;
 
-   list_for_each_entry(fl, list, c.flc_list) {
-		pr_warn("%s: fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n", list_type,
-			fl->c.flc_owner, fl->c.flc_flags,
-			fl->c.flc_type, fl->c.flc_pid);
-	}
+	list_for_each_entry(flc, list, flc_list)
+		pr_warn("%s: fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n",
+			list_type, flc->flc_owner, flc->flc_flags,
+			flc->flc_type, flc->flc_pid);
 }
 
 static void
@@ -224,20 +223,19 @@ locks_check_ctx_lists(struct inode *inode)
 }
 
 static void
-locks_check_ctx_file_list(struct file *filp, struct list_head *list,
-   char *list_type)
+locks_check_ctx_file_list(struct file *filp, struct list_head *list, char *list_type)
 {
-   struct file_lock *fl;
+   struct file_lock_core *flc;
struct inode *inode = file_inode(filp);
 
-   list_for_each_entry(fl, list, c.flc_list)
-   if (fl->c.flc_file == filp)
+   list_for_each_entry(flc, list, flc_list)
+   if (flc->flc_file == filp)
pr_warn("Leaked %s lock on dev=0x%x:0x%x ino=0x%lx "
" fl_owner=%p fl_flags=0x%x fl_type=0x%x 
fl_pid=%u\n",
list_type, MAJOR(inode->i_sb->s_dev),
MINOR(inode->i_sb->s_dev), inode->i_ino,
-   fl->c.flc_owner, fl->c.flc_flags,
-   fl->c.flc_type, fl->c.flc_pid);
+   flc->flc_owner, flc->flc_flags,
+   flc->flc_type, flc->flc_pid);
 }
 
 void
@@ -274,11 +272,13 @@ EXPORT_SYMBOL_GPL(locks_alloc_lock);
 
 void locks_release_private(struct file_lock *fl)
 {
-	BUG_ON(waitqueue_active(&fl->c.flc_wait));
-	BUG_ON(!list_empty(&fl->c.flc_list));
-	BUG_ON(!list_empty(&fl->c.flc_blocked_requests));
-	BUG_ON(!list_empty(&fl->c.flc_blocked_member));
-	BUG_ON(!hlist_unhashed(&fl->c.flc_link));
+	struct file_lock_core *flc = &fl->c;
+
+	BUG_ON(waitqueue_active(&flc->flc_wait));
+	BUG_ON(!list_empty(&flc->flc_list));
+	BUG_ON(!list_empty(&flc->flc_blocked_requests));
+	BUG_ON(!list_empty(&flc->flc_blocked_member));
+	BUG_ON(!hlist_unhashed(&flc->flc_link));
 
if (fl->fl_ops) {
if (fl->fl_ops->fl_release_private)
@@ -288,8 +288,8 @@ void locks_release_private(struct file_lock *fl)
 
if (fl->fl_lmops) {
if (fl->fl_lmops->lm_put_owner) {
-   fl->fl_lmops->lm_put_owner(fl->c.flc_owner);
-   fl->c.flc_owner = NULL;
+   fl->fl_lmops->lm_put_owner(flc->flc_owner);
+   flc->flc_owner = NULL;
}
fl->fl_lmops = NULL;
}
@@ -305,16 +305,15 @@ EXPORT_SYMBOL_GPL(locks_release_private);
  *   %true: @owner has at least one blocker
  *   %false: @owner has no blockers
  */
-bool locks_owner_has_blockers(struct file_lock_context *flctx,
-   fl_owner_t owner)
+bool locks_owner_has_blockers(struct file_lock_context *flctx, fl_owner_t owner)
 {
-   struct file_lock *fl;
+   struct file_lock_core *flc;
 
	spin_lock(&flctx->flc_lock);
-	list_for_each_entry(fl, &flctx->flc_posix, c.flc_list) {
-		if (fl->c.flc_owner != owner)
+	list_for_each_entry(flc, &flctx->flc_posix, flc_list) {
+		if (flc->flc_owner != owner)
			continue;
-		if (!list_empty(&fl->c.flc_blocked_requests)) {
+		if (!list_empty(&flc->flc_blocked_requests)) {
			spin_unlock(&flctx->flc_lock);
return true;
}

-- 
2.43.0




[PATCH v3 18/47] filelock: have fs/locks.c deal with file_lock_core directly

2024-01-31 Thread Jeff Layton
Convert fs/locks.c to access fl_core fields directly rather than using
the backward-compatibility macros. Most of this was done with
coccinelle, with a few by-hand fixups.
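
The conversion itself is mechanical. With the _NEED_FILE_LOCK_FIELD_MACROS
compatibility macros gone, every access goes through the embedded core
explicitly, e.g. (an illustrative before/after, not lines from this diff):

	/* before: via the compat macros (fl_type expands to c.flc_type) */
	if (fl->fl_type == F_WRLCK && fl->fl_owner == owner)
		return true;

	/* after: direct access through the embedded file_lock_core */
	if (fl->c.flc_type == F_WRLCK && fl->c.flc_owner == owner)
		return true;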

Signed-off-by: Jeff Layton 
---
 fs/locks.c  | 467 
 include/trace/events/filelock.h |  32 +--
 2 files changed, 254 insertions(+), 245 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 097254ab35d3..f418c6e31219 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -48,8 +48,6 @@
  * children.
  *
  */
-#define _NEED_FILE_LOCK_FIELD_MACROS
-
 #include <linux/capability.h>
 #include <linux/file.h>
 #include <linux/fdtable.h>
@@ -73,16 +71,16 @@
 
 static bool lease_breaking(struct file_lock *fl)
 {
-   return fl->fl_flags & (FL_UNLOCK_PENDING | FL_DOWNGRADE_PENDING);
+   return fl->c.flc_flags & (FL_UNLOCK_PENDING | FL_DOWNGRADE_PENDING);
 }
 
 static int target_leasetype(struct file_lock *fl)
 {
-   if (fl->fl_flags & FL_UNLOCK_PENDING)
+   if (fl->c.flc_flags & FL_UNLOCK_PENDING)
return F_UNLCK;
-   if (fl->fl_flags & FL_DOWNGRADE_PENDING)
+   if (fl->c.flc_flags & FL_DOWNGRADE_PENDING)
return F_RDLCK;
-   return fl->fl_type;
+   return fl->c.flc_type;
 }
 
 static int leases_enable = 1;
@@ -201,8 +199,10 @@ locks_dump_ctx_list(struct list_head *list, char *list_type)
 {
struct file_lock *fl;
 
-	list_for_each_entry(fl, list, fl_list) {
-		pr_warn("%s: fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n", list_type, fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
+	list_for_each_entry(fl, list, c.flc_list) {
+		pr_warn("%s: fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n", list_type,
+			fl->c.flc_owner, fl->c.flc_flags,
+			fl->c.flc_type, fl->c.flc_pid);
}
 }
 
@@ -230,13 +230,14 @@ locks_check_ctx_file_list(struct file *filp, struct list_head *list,
struct file_lock *fl;
struct inode *inode = file_inode(filp);
 
-   list_for_each_entry(fl, list, fl_list)
-   if (fl->fl_file == filp)
+   list_for_each_entry(fl, list, c.flc_list)
+   if (fl->c.flc_file == filp)
pr_warn("Leaked %s lock on dev=0x%x:0x%x ino=0x%lx "
" fl_owner=%p fl_flags=0x%x fl_type=0x%x 
fl_pid=%u\n",
list_type, MAJOR(inode->i_sb->s_dev),
MINOR(inode->i_sb->s_dev), inode->i_ino,
-   fl->fl_owner, fl->fl_flags, fl->fl_type, 
fl->fl_pid);
+   fl->c.flc_owner, fl->c.flc_flags,
+   fl->c.flc_type, fl->c.flc_pid);
 }
 
 void
@@ -250,13 +251,13 @@ locks_free_lock_context(struct inode *inode)
}
 }
 
-static void locks_init_lock_heads(struct file_lock *fl)
+static void locks_init_lock_heads(struct file_lock_core *flc)
 {
-	INIT_HLIST_NODE(&fl->fl_link);
-	INIT_LIST_HEAD(&fl->fl_list);
-	INIT_LIST_HEAD(&fl->fl_blocked_requests);
-	INIT_LIST_HEAD(&fl->fl_blocked_member);
-	init_waitqueue_head(&fl->fl_wait);
+	INIT_HLIST_NODE(&flc->flc_link);
+	INIT_LIST_HEAD(&flc->flc_list);
+	INIT_LIST_HEAD(&flc->flc_blocked_requests);
+	INIT_LIST_HEAD(&flc->flc_blocked_member);
+	init_waitqueue_head(&flc->flc_wait);
 }
 
 /* Allocate an empty lock structure. */
@@ -265,7 +266,7 @@ struct file_lock *locks_alloc_lock(void)
struct file_lock *fl = kmem_cache_zalloc(filelock_cache, GFP_KERNEL);
 
if (fl)
-   locks_init_lock_heads(fl);
+		locks_init_lock_heads(&fl->c);
 
return fl;
 }
@@ -273,11 +274,11 @@ EXPORT_SYMBOL_GPL(locks_alloc_lock);
 
 void locks_release_private(struct file_lock *fl)
 {
-	BUG_ON(waitqueue_active(&fl->fl_wait));
-	BUG_ON(!list_empty(&fl->fl_list));
-	BUG_ON(!list_empty(&fl->fl_blocked_requests));
-	BUG_ON(!list_empty(&fl->fl_blocked_member));
-	BUG_ON(!hlist_unhashed(&fl->fl_link));
+	BUG_ON(waitqueue_active(&fl->c.flc_wait));
+	BUG_ON(!list_empty(&fl->c.flc_list));
+	BUG_ON(!list_empty(&fl->c.flc_blocked_requests));
+	BUG_ON(!list_empty(&fl->c.flc_blocked_member));
+	BUG_ON(!hlist_unhashed(&fl->c.flc_link));
 
if (fl->fl_ops) {
if (fl->fl_ops->fl_release_private)
@@ -287,8 +288,8 @@ void locks_release_private(struct file_lock *fl)
 
if (fl->fl_lmops) {
if (fl->fl_lmops->lm_put_owner) {
-   fl->fl_lmops->lm_put_owner(fl->fl_owner);
-   fl->fl_owner = NULL;
+   fl->fl_lmops->lm_put_owner(fl->c.flc_owner);
+   fl->c.flc_owner = NULL;
}
fl->fl_lmops = NULL;
}
@@ -310,10 +311,10 @@ bool locks_owner_has_blockers(struct file_lock_context *flctx,
	struct file_lock *fl;
 
	spin_lock(&flctx->flc_lock);
-	list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
-		if (fl->fl_owner != owner)
+

[PATCH v3 14/47] smb/client: convert to using new filelock helpers

2024-01-31 Thread Jeff Layton
Convert to using the new file locking helper functions.

Signed-off-by: Jeff Layton 
---
 fs/smb/client/file.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index b75282c204da..27f9ef4e69a8 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -1409,7 +1409,7 @@ cifs_posix_lock_test(struct file *file, struct file_lock *flock)
	down_read(&cinode->lock_sem);
posix_test_lock(file, flock);
 
-   if (flock->fl_type == F_UNLCK && !cinode->can_cache_brlcks) {
+   if (lock_is_unlock(flock) && !cinode->can_cache_brlcks) {
flock->fl_type = saved_type;
rc = 1;
}
@@ -1581,7 +1581,7 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
 
el = locks_to_send.next;
	spin_lock(&flctx->flc_lock);
-	list_for_each_entry(flock, &flctx->flc_posix, fl_list) {
+	for_each_file_lock(flock, &flctx->flc_posix) {
		if (el == &locks_to_send) {
/*
 * The list ended. We don't have enough allocated
@@ -1591,7 +1591,7 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
break;
}
length = cifs_flock_len(flock);
-   if (flock->fl_type == F_RDLCK || flock->fl_type == F_SHLCK)
+   if (lock_is_read(flock) || flock->fl_type == F_SHLCK)
type = CIFS_RDLCK;
else
type = CIFS_WRLCK;
@@ -1681,16 +1681,16 @@ cifs_read_flock(struct file_lock *flock, __u32 *type, int *lock, int *unlock,
cifs_dbg(FYI, "Unknown lock flags 0x%x\n", flock->fl_flags);
 
*type = server->vals->large_lock_type;
-   if (flock->fl_type == F_WRLCK) {
+   if (lock_is_write(flock)) {
cifs_dbg(FYI, "F_WRLCK\n");
*type |= server->vals->exclusive_lock_type;
*lock = 1;
-   } else if (flock->fl_type == F_UNLCK) {
+   } else if (lock_is_unlock(flock)) {
cifs_dbg(FYI, "F_UNLCK\n");
*type |= server->vals->unlock_lock_type;
*unlock = 1;
/* Check if unlock includes more than one lock range */
-   } else if (flock->fl_type == F_RDLCK) {
+   } else if (lock_is_read(flock)) {
cifs_dbg(FYI, "F_RDLCK\n");
*type |= server->vals->shared_lock_type;
*lock = 1;

-- 
2.43.0




[PATCH v3 11/47] nfs: convert to using new filelock helpers

2024-01-31 Thread Jeff Layton
Convert to using the new file locking helper functions. Also, in later
patches we're going to introduce some temporary macros with names that
clash with the variable name in nfs4_proc_unlck. Rename it.

Signed-off-by: Jeff Layton 
---
 fs/nfs/delegation.c |  2 +-
 fs/nfs/file.c   |  4 ++--
 fs/nfs/nfs4proc.c   | 12 ++++++------
 fs/nfs/nfs4state.c  | 18 +++++++++---------
 fs/nfs/nfs4xdr.c|  2 +-
 fs/nfs/write.c  |  4 ++--
 6 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index fa1a14def45c..ca6985001466 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -156,7 +156,7 @@ static int nfs_delegation_claim_locks(struct nfs4_state *state, const nfs4_state
	list = &flctx->flc_posix;
	spin_lock(&flctx->flc_lock);
 restart:
-   list_for_each_entry(fl, list, fl_list) {
+   for_each_file_lock(fl, list) {
if (nfs_file_open_context(fl->fl_file)->state != state)
continue;
		spin_unlock(&flctx->flc_lock);
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 8577ccf621f5..1a7a76d6055b 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -851,7 +851,7 @@ int nfs_lock(struct file *filp, int cmd, struct file_lock *fl)
 
if (IS_GETLK(cmd))
ret = do_getlk(filp, cmd, fl, is_local);
-   else if (fl->fl_type == F_UNLCK)
+   else if (lock_is_unlock(fl))
ret = do_unlk(filp, cmd, fl, is_local);
else
ret = do_setlk(filp, cmd, fl, is_local);
@@ -878,7 +878,7 @@ int nfs_flock(struct file *filp, int cmd, struct file_lock *fl)
is_local = 1;
 
/* We're simulating flock() locks using posix locks on the server */
-   if (fl->fl_type == F_UNLCK)
+   if (lock_is_unlock(fl))
return do_unlk(filp, cmd, fl, is_local);
return do_setlk(filp, cmd, fl, is_local);
 }
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 23819a756508..df54fcd0fa08 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7045,7 +7045,7 @@ static int nfs4_proc_unlck(struct nfs4_state *state, int cmd, struct file_lock *
struct rpc_task *task;
struct nfs_seqid *(*alloc_seqid)(struct nfs_seqid_counter *, gfp_t);
int status = 0;
-   unsigned char fl_flags = request->fl_flags;
+   unsigned char saved_flags = request->fl_flags;
 
status = nfs4_set_lock_state(state, request);
/* Unlock _before_ we do the RPC call */
@@ -7080,7 +7080,7 @@ static int nfs4_proc_unlck(struct nfs4_state *state, int cmd, struct file_lock *
status = rpc_wait_for_completion_task(task);
rpc_put_task(task);
 out:
-   request->fl_flags = fl_flags;
+   request->fl_flags = saved_flags;
trace_nfs4_unlock(request, state, F_SETLK, status);
return status;
 }
@@ -7398,7 +7398,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
 {
struct nfs_inode *nfsi = NFS_I(state->inode);
struct nfs4_state_owner *sp = state->owner;
-   unsigned char fl_flags = request->fl_flags;
+   unsigned char flags = request->fl_flags;
int status;
 
request->fl_flags |= FL_ACCESS;
@@ -7410,7 +7410,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
	if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
/* Yes: cache locks! */
/* ...but avoid races with delegation recall... */
-   request->fl_flags = fl_flags & ~FL_SLEEP;
+   request->fl_flags = flags & ~FL_SLEEP;
status = locks_lock_inode_wait(state->inode, request);
		up_read(&nfsi->rwsem);
		mutex_unlock(&sp->so_delegreturn_mutex);
@@ -7420,7 +7420,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
	mutex_unlock(&sp->so_delegreturn_mutex);
status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
 out:
-   request->fl_flags = fl_flags;
+   request->fl_flags = flags;
return status;
 }
 
@@ -7562,7 +7562,7 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
if (!(IS_SETLK(cmd) || IS_SETLKW(cmd)))
return -EINVAL;
 
-   if (request->fl_type == F_UNLCK) {
+   if (lock_is_unlock(request)) {
if (state != NULL)
return nfs4_proc_unlck(state, cmd, request);
return 0;
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 9a5d911a7edc..16b57735e26a 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -847,15 +847,15 @@ void nfs4_close_sync(struct nfs4_state *state, fmode_t fmode)
  */
 static struct nfs4_lock_state *
 __nfs4_find_lock_state(struct nfs4_state *state,
-  fl_owner_t fl_owner, fl_owner_t fl_owner2)
+  fl_owner_t owner, fl_owner_t owner2)
 {
struct nfs4_lock_state *pos, *ret = NULL;

[PATCH v3 13/47] ocfs2: convert to using new filelock helpers

2024-01-31 Thread Jeff Layton
Convert to using the new file locking helper functions.

Signed-off-by: Jeff Layton 
---
 fs/ocfs2/locks.c  | 4 ++--
 fs/ocfs2/stack_user.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ocfs2/locks.c b/fs/ocfs2/locks.c
index f37174e79fad..ef4fd91b586e 100644
--- a/fs/ocfs2/locks.c
+++ b/fs/ocfs2/locks.c
@@ -27,7 +27,7 @@ static int ocfs2_do_flock(struct file *file, struct inode *inode,
	struct ocfs2_file_private *fp = file->private_data;
	struct ocfs2_lock_res *lockres = &fp->fp_flock;
 
-   if (fl->fl_type == F_WRLCK)
+   if (lock_is_write(fl))
level = 1;
if (!IS_SETLKW(cmd))
trylock = 1;
@@ -107,7 +107,7 @@ int ocfs2_flock(struct file *file, int cmd, struct file_lock *fl)
ocfs2_mount_local(osb))
return locks_lock_file_wait(file, fl);
 
-   if (fl->fl_type == F_UNLCK)
+   if (lock_is_unlock(fl))
return ocfs2_do_funlock(file, cmd, fl);
else
return ocfs2_do_flock(file, inode, cmd, fl);
diff --git a/fs/ocfs2/stack_user.c b/fs/ocfs2/stack_user.c
index 9b76ee66aeb2..c11406cd87a8 100644
--- a/fs/ocfs2/stack_user.c
+++ b/fs/ocfs2/stack_user.c
@@ -744,7 +744,7 @@ static int user_plock(struct ocfs2_cluster_connection *conn,
return dlm_posix_cancel(conn->cc_lockspace, ino, file, fl);
else if (IS_GETLK(cmd))
return dlm_posix_get(conn->cc_lockspace, ino, file, fl);
-   else if (fl->fl_type == F_UNLCK)
+   else if (lock_is_unlock(fl))
return dlm_posix_unlock(conn->cc_lockspace, ino, file, fl);
else
return dlm_posix_lock(conn->cc_lockspace, ino, file, cmd, fl);

-- 
2.43.0




[PATCH v3 12/47] nfsd: convert to using new filelock helpers

2024-01-31 Thread Jeff Layton
Convert to using the new file locking helper functions. Also, in later
patches we're going to introduce some macros with names that clash with
the variable names in nfsd4_lock. Rename them.

Signed-off-by: Jeff Layton 
---
 fs/nfsd/nfs4state.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 6dc6340e2852..83d605ecdcdc 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7493,8 +7493,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
int lkflg;
int err;
bool new = false;
-   unsigned char fl_type;
-   unsigned int fl_flags = FL_POSIX;
+   unsigned char type;
+   unsigned int flags = FL_POSIX;
struct net *net = SVC_NET(rqstp);
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
@@ -7557,14 +7557,14 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
goto out;
 
if (lock->lk_reclaim)
-   fl_flags |= FL_RECLAIM;
+   flags |= FL_RECLAIM;
 
fp = lock_stp->st_stid.sc_file;
switch (lock->lk_type) {
case NFS4_READW_LT:
if (nfsd4_has_session(cstate) ||
exportfs_lock_op_is_async(sb->s_export_op))
-   fl_flags |= FL_SLEEP;
+   flags |= FL_SLEEP;
fallthrough;
case NFS4_READ_LT:
		spin_lock(&fp->fi_lock);
@@ -7572,12 +7572,12 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
		if (nf)
			get_lock_access(lock_stp, NFS4_SHARE_ACCESS_READ);
		spin_unlock(&fp->fi_lock);
-   fl_type = F_RDLCK;
+   type = F_RDLCK;
break;
case NFS4_WRITEW_LT:
if (nfsd4_has_session(cstate) ||
exportfs_lock_op_is_async(sb->s_export_op))
-   fl_flags |= FL_SLEEP;
+   flags |= FL_SLEEP;
fallthrough;
case NFS4_WRITE_LT:
		spin_lock(&fp->fi_lock);
@@ -7585,7 +7585,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
		if (nf)
			get_lock_access(lock_stp, NFS4_SHARE_ACCESS_WRITE);
		spin_unlock(&fp->fi_lock);
-   fl_type = F_WRLCK;
+   type = F_WRLCK;
break;
default:
status = nfserr_inval;
@@ -7605,7 +7605,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 * on those filesystems:
 */
if (!exportfs_lock_op_is_async(sb->s_export_op))
-   fl_flags &= ~FL_SLEEP;
+   flags &= ~FL_SLEEP;
 
	nbl = find_or_allocate_block(lock_sop, &fp->fi_fhandle, nn);
	if (!nbl) {
@@ -7615,11 +7615,11 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
}
 
	file_lock = &nbl->nbl_lock;
-	file_lock->fl_type = fl_type;
+	file_lock->fl_type = type;
	file_lock->fl_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(&lock_sop->lo_owner));
file_lock->fl_pid = current->tgid;
file_lock->fl_file = nf->nf_file;
-   file_lock->fl_flags = fl_flags;
+   file_lock->fl_flags = flags;
	file_lock->fl_lmops = &nfsd_posix_mng_ops;
file_lock->fl_start = lock->lk_offset;
file_lock->fl_end = last_byte_offset(lock->lk_offset, lock->lk_length);
@@ -7632,7 +7632,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
goto out;
}
 
-   if (fl_flags & FL_SLEEP) {
+   if (flags & FL_SLEEP) {
nbl->nbl_time = ktime_get_boottime_seconds();
		spin_lock(&nn->blocked_locks_lock);
		list_add_tail(&nbl->nbl_list, &lock_sop->lo_blocked);
@@ -7669,7 +7669,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 out:
if (nbl) {
/* dequeue it if we queued it before */
-   if (fl_flags & FL_SLEEP) {
+   if (flags & FL_SLEEP) {
			spin_lock(&nn->blocked_locks_lock);
			if (!list_empty(&nbl->nbl_list) &&
			    !list_empty(&nbl->nbl_lru)) {
@@ -7928,7 +7928,7 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
 
	if (flctx && !list_empty_careful(&flctx->flc_posix)) {
		spin_lock(&flctx->flc_lock);
-		list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
+		for_each_file_lock(fl, &flctx->flc_posix) {
if (fl->fl_owner == (fl_owner_t)lowner) {
status = true;
   

[PATCH v3 10/47] lockd: convert to using new filelock helpers

2024-01-31 Thread Jeff Layton
Convert to using the new file locking helper functions. Also in later
patches we're going to introduce some macros with names that clash with
the variable names in nlmclnt_lock. Rename them.

Signed-off-by: Jeff Layton 
---
 fs/lockd/clntproc.c | 20 ++++++++++----------
 fs/lockd/svcsubs.c  |  6 +++---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index fba6c7fa7474..cc596748e359 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -522,8 +522,8 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
struct nlm_host *host = req->a_host;
	struct nlm_res	*resp = &req->a_res;
struct nlm_wait block;
-   unsigned char fl_flags = fl->fl_flags;
-   unsigned char fl_type;
+   unsigned char flags = fl->fl_flags;
+   unsigned char type;
__be32 b_status;
int status = -ENOLCK;
 
@@ -533,7 +533,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
 
fl->fl_flags |= FL_ACCESS;
status = do_vfs_lock(fl);
-   fl->fl_flags = fl_flags;
+   fl->fl_flags = flags;
if (status < 0)
goto out;
 
@@ -595,7 +595,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
if (do_vfs_lock(fl) < 0)
			printk(KERN_WARNING "%s: VFS is out of sync with lock manager!\n", __func__);
		up_read(&host->h_rwsem);
-   fl->fl_flags = fl_flags;
+   fl->fl_flags = flags;
status = 0;
}
if (status < 0)
@@ -605,7 +605,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
 * cases NLM_LCK_DENIED is returned for a permanent error.  So
 * turn it into an ENOLCK.
 */
-   if (resp->status == nlm_lck_denied && (fl_flags & FL_SLEEP))
+   if (resp->status == nlm_lck_denied && (flags & FL_SLEEP))
status = -ENOLCK;
else
status = nlm_stat_to_errno(resp->status);
@@ -622,13 +622,13 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
   req->a_host->h_addrlen, req->a_res.status);
dprintk("lockd: lock attempt ended in fatal error.\n"
"   Attempting to unlock.\n");
-   fl_type = fl->fl_type;
+   type = fl->fl_type;
fl->fl_type = F_UNLCK;
	down_read(&host->h_rwsem);
	do_vfs_lock(fl);
	up_read(&host->h_rwsem);
-   fl->fl_type = fl_type;
-   fl->fl_flags = fl_flags;
+   fl->fl_type = type;
+   fl->fl_flags = flags;
	nlmclnt_async_call(cred, req, NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
return status;
 }
@@ -683,7 +683,7 @@ nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
struct nlm_host *host = req->a_host;
	struct nlm_res	*resp = &req->a_res;
int status;
-   unsigned char fl_flags = fl->fl_flags;
+   unsigned char flags = fl->fl_flags;
 
/*
 * Note: the server is supposed to either grant us the unlock
@@ -694,7 +694,7 @@ nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
	down_read(&host->h_rwsem);
	status = do_vfs_lock(fl);
	up_read(&host->h_rwsem);
-   fl->fl_flags = fl_flags;
+   fl->fl_flags = flags;
if (status == -ENOENT) {
status = 0;
goto out;
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
index e3b6229e7ae5..2f33c187b876 100644
--- a/fs/lockd/svcsubs.c
+++ b/fs/lockd/svcsubs.c
@@ -73,7 +73,7 @@ static inline unsigned int file_hash(struct nfs_fh *f)
 
 int lock_to_openmode(struct file_lock *lock)
 {
-   return (lock->fl_type == F_WRLCK) ? O_WRONLY : O_RDONLY;
+   return (lock_is_write(lock)) ? O_WRONLY : O_RDONLY;
 }
 
 /*
@@ -218,7 +218,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file,
 again:
file->f_locks = 0;
	spin_lock(&flctx->flc_lock);
-	list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
+	for_each_file_lock(fl, &flctx->flc_posix) {
		if (fl->fl_lmops != &nlmsvc_lock_operations)
continue;
 
@@ -272,7 +272,7 @@ nlm_file_inuse(struct nlm_file *file)
 
	if (flctx && !list_empty_careful(&flctx->flc_posix)) {
		spin_lock(&flctx->flc_lock);
-		list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
+		for_each_file_lock(fl, &flctx->flc_posix) {
			if (fl->fl_lmops == &nlmsvc_lock_operations) {
				spin_unlock(&flctx->flc_lock);
return 1;

-- 
2.43.0




[PATCH v3 09/47] gfs2: convert to using new filelock helpers

2024-01-31 Thread Jeff Layton
Convert to using the new file locking helper functions.

Signed-off-by: Jeff Layton 
---
 fs/gfs2/file.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 992ca4effb50..6c25aea30f1b 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1443,7 +1443,7 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl)
if (!(fl->fl_flags & FL_POSIX))
return -ENOLCK;
if (gfs2_withdrawing_or_withdrawn(sdp)) {
-   if (fl->fl_type == F_UNLCK)
+   if (lock_is_unlock(fl))
locks_lock_file_wait(file, fl);
return -EIO;
}
@@ -1451,7 +1451,7 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl)
return dlm_posix_cancel(ls->ls_dlm, ip->i_no_addr, file, fl);
else if (IS_GETLK(cmd))
return dlm_posix_get(ls->ls_dlm, ip->i_no_addr, file, fl);
-   else if (fl->fl_type == F_UNLCK)
+   else if (lock_is_unlock(fl))
return dlm_posix_unlock(ls->ls_dlm, ip->i_no_addr, file, fl);
else
return dlm_posix_lock(ls->ls_dlm, ip->i_no_addr, file, cmd, fl);
@@ -1483,7 +1483,7 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl)
int error = 0;
int sleeptime;
 
-   state = (fl->fl_type == F_WRLCK) ? LM_ST_EXCLUSIVE : LM_ST_SHARED;
+   state = (lock_is_write(fl)) ? LM_ST_EXCLUSIVE : LM_ST_SHARED;
flags = GL_EXACT | GL_NOPID;
if (!IS_SETLKW(cmd))
flags |= LM_FLAG_TRY_1CB;
@@ -1560,7 +1560,7 @@ static int gfs2_flock(struct file *file, int cmd, struct file_lock *fl)
if (!(fl->fl_flags & FL_FLOCK))
return -ENOLCK;
 
-   if (fl->fl_type == F_UNLCK) {
+   if (lock_is_unlock(fl)) {
do_unflock(file, fl);
return 0;
} else {

-- 
2.43.0




[PATCH v3 08/47] dlm: convert to using new filelock helpers

2024-01-31 Thread Jeff Layton
Convert to using the new file locking helper functions. Also, in later
patches we're going to introduce some temporary macros with names that
clash with the variable name in dlm_posix_unlock. Rename it.

Signed-off-by: Jeff Layton 
---
 fs/dlm/plock.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index d814c5121367..42c596b900d4 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -139,7 +139,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 
op->info.optype = DLM_PLOCK_OP_LOCK;
op->info.pid= fl->fl_pid;
-   op->info.ex = (fl->fl_type == F_WRLCK);
+   op->info.ex = (lock_is_write(fl));
op->info.wait   = !!(fl->fl_flags & FL_SLEEP);
op->info.fsid   = ls->ls_global_id;
op->info.number = number;
@@ -291,7 +291,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
struct dlm_ls *ls;
struct plock_op *op;
int rv;
-   unsigned char fl_flags = fl->fl_flags;
+   unsigned char saved_flags = fl->fl_flags;
 
ls = dlm_find_lockspace_local(lockspace);
if (!ls)
@@ -345,7 +345,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
dlm_release_plock_op(op);
 out:
dlm_put_lockspace(ls);
-   fl->fl_flags = fl_flags;
+   fl->fl_flags = saved_flags;
return rv;
 }
 EXPORT_SYMBOL_GPL(dlm_posix_unlock);
@@ -376,7 +376,7 @@ int dlm_posix_cancel(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 
	memset(&info, 0, sizeof(info));
info.pid = fl->fl_pid;
-   info.ex = (fl->fl_type == F_WRLCK);
+   info.ex = (lock_is_write(fl));
info.fsid = ls->ls_global_id;
dlm_put_lockspace(ls);
info.number = number;
@@ -438,7 +438,7 @@ int dlm_posix_get(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 
op->info.optype = DLM_PLOCK_OP_GET;
op->info.pid= fl->fl_pid;
-   op->info.ex = (fl->fl_type == F_WRLCK);
+   op->info.ex = (lock_is_write(fl));
op->info.fsid   = ls->ls_global_id;
op->info.number = number;
op->info.start  = fl->fl_start;

-- 
2.43.0




[PATCH v3 06/47] afs: convert to using new filelock helpers

2024-01-31 Thread Jeff Layton
Convert to using the new file locking helper functions. Also, in later
patches we're going to introduce macros that conflict with the variable
name in afs_next_locker. Rename it.

Signed-off-by: Jeff Layton 
---
 fs/afs/flock.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/afs/flock.c b/fs/afs/flock.c
index 9c6dea3139f5..4eee3d1ca5ad 100644
--- a/fs/afs/flock.c
+++ b/fs/afs/flock.c
@@ -93,13 +93,13 @@ static void afs_grant_locks(struct afs_vnode *vnode)
bool exclusive = (vnode->lock_type == AFS_LOCK_WRITE);
 
	list_for_each_entry_safe(p, _p, &vnode->pending_locks, fl_u.afs.link) {
-   if (!exclusive && p->fl_type == F_WRLCK)
+   if (!exclusive && lock_is_write(p))
continue;
 
		list_move_tail(&p->fl_u.afs.link, &vnode->granted_locks);
p->fl_u.afs.state = AFS_LOCK_GRANTED;
trace_afs_flock_op(vnode, p, afs_flock_op_grant);
-		wake_up(&p->fl_wait);
+   locks_wake_up(p);
}
 }
 
@@ -112,25 +112,25 @@ static void afs_next_locker(struct afs_vnode *vnode, int error)
 {
struct file_lock *p, *_p, *next = NULL;
struct key *key = vnode->lock_key;
-   unsigned int fl_type = F_RDLCK;
+   unsigned int type = F_RDLCK;
 
_enter("");
 
if (vnode->lock_type == AFS_LOCK_WRITE)
-   fl_type = F_WRLCK;
+   type = F_WRLCK;
 
	list_for_each_entry_safe(p, _p, &vnode->pending_locks, fl_u.afs.link) {
if (error &&
-   p->fl_type == fl_type &&
+   p->fl_type == type &&
afs_file_key(p->fl_file) == key) {
			list_del_init(&p->fl_u.afs.link);
			p->fl_u.afs.state = error;
-			wake_up(&p->fl_wait);
+   locks_wake_up(p);
}
 
/* Select the next locker to hand off to. */
if (next &&
-   (next->fl_type == F_WRLCK || p->fl_type == F_RDLCK))
+   (lock_is_write(next) || lock_is_read(p)))
continue;
next = p;
}
@@ -142,7 +142,7 @@ static void afs_next_locker(struct afs_vnode *vnode, int error)
afs_set_lock_state(vnode, AFS_VNODE_LOCK_SETTING);
next->fl_u.afs.state = AFS_LOCK_YOUR_TRY;
trace_afs_flock_op(vnode, next, afs_flock_op_wake);
-		wake_up(&next->fl_wait);
+   locks_wake_up(next);
} else {
afs_set_lock_state(vnode, AFS_VNODE_LOCK_NONE);
trace_afs_flock_ev(vnode, NULL, afs_flock_no_lockers, 0);
@@ -166,7 +166,7 @@ static void afs_kill_lockers_enoent(struct afs_vnode *vnode)
   struct file_lock, fl_u.afs.link);
		list_del_init(&p->fl_u.afs.link);
		p->fl_u.afs.state = -ENOENT;
-		wake_up(&p->fl_wait);
+   locks_wake_up(p);
}
 
key_put(vnode->lock_key);
@@ -471,7 +471,7 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
fl->fl_u.afs.state = AFS_LOCK_PENDING;
 
partial = (fl->fl_start != 0 || fl->fl_end != OFFSET_MAX);
-   type = (fl->fl_type == F_RDLCK) ? AFS_LOCK_READ : AFS_LOCK_WRITE;
+   type = lock_is_read(fl) ? AFS_LOCK_READ : AFS_LOCK_WRITE;
if (mode == afs_flock_mode_write && partial)
type = AFS_LOCK_WRITE;
 
@@ -734,7 +734,7 @@ static int afs_do_getlk(struct file *file, struct file_lock *fl)
 
/* check local lock records first */
posix_test_lock(file, fl);
-   if (fl->fl_type == F_UNLCK) {
+   if (lock_is_unlock(fl)) {
/* no local locks; consult the server */
ret = afs_fetch_status(vnode, key, false, NULL);
if (ret < 0)
@@ -778,7 +778,7 @@ int afs_lock(struct file *file, int cmd, struct file_lock *fl)
	fl->fl_u.afs.debug_id = atomic_inc_return(&afs_file_lock_debug_id);
trace_afs_flock_op(vnode, fl, afs_flock_op_lock);
 
-   if (fl->fl_type == F_UNLCK)
+   if (lock_is_unlock(fl))
ret = afs_do_unlk(file, fl);
else
ret = afs_do_setlk(file, fl);
@@ -820,7 +820,7 @@ int afs_flock(struct file *file, int cmd, struct file_lock *fl)
trace_afs_flock_op(vnode, fl, afs_flock_op_flock);
 
/* we're simulating flock() locks using posix locks on the server */
-   if (fl->fl_type == F_UNLCK)
+   if (lock_is_unlock(fl))
ret = afs_do_unlk(file, fl);
else
ret = afs_do_setlk(file, fl);

-- 
2.43.0




[PATCH v3 07/47] ceph: convert to using new filelock helpers

2024-01-31 Thread Jeff Layton
Convert to using the new file locking helper functions.

Signed-off-by: Jeff Layton 
---
 fs/ceph/locks.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index e07ad29ff8b9..80ebe1d6c67d 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -273,19 +273,19 @@ int ceph_lock(struct file *file, int cmd, struct file_lock *fl)
	}
	spin_unlock(&ci->i_ceph_lock);
if (err < 0) {
-   if (op == CEPH_MDS_OP_SETFILELOCK && F_UNLCK == fl->fl_type)
+   if (op == CEPH_MDS_OP_SETFILELOCK && lock_is_unlock(fl))
posix_lock_file(file, fl, NULL);
return err;
}
 
-   if (F_RDLCK == fl->fl_type)
+   if (lock_is_read(fl))
lock_cmd = CEPH_LOCK_SHARED;
-   else if (F_WRLCK == fl->fl_type)
+   else if (lock_is_write(fl))
lock_cmd = CEPH_LOCK_EXCL;
else
lock_cmd = CEPH_LOCK_UNLOCK;
 
-   if (op == CEPH_MDS_OP_SETFILELOCK && F_UNLCK == fl->fl_type) {
+   if (op == CEPH_MDS_OP_SETFILELOCK && lock_is_unlock(fl)) {
err = try_unlock_file(file, fl);
if (err <= 0)
return err;
@@ -333,7 +333,7 @@ int ceph_flock(struct file *file, int cmd, struct file_lock *fl)
	}
	spin_unlock(&ci->i_ceph_lock);
if (err < 0) {
-   if (F_UNLCK == fl->fl_type)
+   if (lock_is_unlock(fl))
locks_lock_file_wait(file, fl);
return err;
}
@@ -341,14 +341,14 @@ int ceph_flock(struct file *file, int cmd, struct file_lock *fl)
if (IS_SETLKW(cmd))
wait = 1;
 
-   if (F_RDLCK == fl->fl_type)
+   if (lock_is_read(fl))
lock_cmd = CEPH_LOCK_SHARED;
-   else if (F_WRLCK == fl->fl_type)
+   else if (lock_is_write(fl))
lock_cmd = CEPH_LOCK_EXCL;
else
lock_cmd = CEPH_LOCK_UNLOCK;
 
-   if (F_UNLCK == fl->fl_type) {
+   if (lock_is_unlock(fl)) {
err = try_unlock_file(file, fl);
if (err <= 0)
return err;
@@ -385,9 +385,9 @@ void ceph_count_locks(struct inode *inode, int *fcntl_count, int *flock_count)
	ctx = locks_inode_context(inode);
	if (ctx) {
		spin_lock(&ctx->flc_lock);
-		list_for_each_entry(lock, &ctx->flc_posix, fl_list)
+		for_each_file_lock(lock, &ctx->flc_posix)
			++(*fcntl_count);
-		list_for_each_entry(lock, &ctx->flc_flock, fl_list)
+		for_each_file_lock(lock, &ctx->flc_flock)
			++(*flock_count);
		spin_unlock(&ctx->flc_lock);
}
@@ -453,7 +453,7 @@ int ceph_encode_locks_to_buffer(struct inode *inode,
return 0;
 
	spin_lock(&ctx->flc_lock);
-	list_for_each_entry(lock, &ctx->flc_posix, fl_list) {
+	for_each_file_lock(lock, &ctx->flc_posix) {
++seen_fcntl;
if (seen_fcntl > num_fcntl_locks) {
err = -ENOSPC;
@@ -464,7 +464,7 @@ int ceph_encode_locks_to_buffer(struct inode *inode,
goto fail;
++l;
}
-	list_for_each_entry(lock, &ctx->flc_flock, fl_list) {
+	for_each_file_lock(lock, &ctx->flc_flock) {
++seen_flock;
if (seen_flock > num_flock_locks) {
err = -ENOSPC;

-- 
2.43.0




[PATCH v3 05/47] 9p: rename fl_type variable in v9fs_file_do_lock

2024-01-31 Thread Jeff Layton
In later patches, we're going to introduce some macros that conflict
with the variable name here. Rename it.
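
The clash being avoided is the usual macro-vs-identifier problem, e.g.
(an illustrative sketch, not code from this patch):

	/* a later patch adds compat macros along the lines of: */
	#define fl_type c.flc_type

	/* ...after which a local variable named fl_type no longer compiles: */
	unsigned char fl_type;	/* expands to "unsigned char c.flc_type;" */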

Signed-off-by: Jeff Layton 
---
 fs/9p/vfs_file.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index bae330c2f0cf..3df8aa1b5996 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -121,7 +121,6 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, struct file_lock *fl)
struct p9_fid *fid;
uint8_t status = P9_LOCK_ERROR;
int res = 0;
-   unsigned char fl_type;
struct v9fs_session_info *v9ses;
 
fid = filp->private_data;
@@ -208,11 +207,12 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, struct file_lock *fl)
 * it locally
 */
if (res < 0 && fl->fl_type != F_UNLCK) {
-   fl_type = fl->fl_type;
+   unsigned char type = fl->fl_type;
+
fl->fl_type = F_UNLCK;
/* Even if this fails we want to return the remote error */
locks_lock_file_wait(filp, fl);
-   fl->fl_type = fl_type;
+   fl->fl_type = type;
}
if (flock.client_id != fid->clnt->name)
kfree(flock.client_id);

-- 
2.43.0




[PATCH v3 04/47] filelock: add some new helper functions

2024-01-31 Thread Jeff Layton
In later patches we're going to embed some common fields into a new
structure inside struct file_lock. Smooth the transition by adding some
new helper functions, and converting the core file locking code to use
them.

Signed-off-by: Jeff Layton 
---
 fs/locks.c               | 18 +++++++++---------
 include/linux/filelock.h | 23 +++++++++++++++++++++++
 2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 1eceaa56e47f..149070fd3b66 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -674,7 +674,7 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
waiter->fl_lmops->lm_notify(waiter);
else
-		wake_up(&waiter->fl_wait);
+   locks_wake_up(waiter);
 
/*
 * The setting of fl_blocker to NULL marks the "done"
@@ -841,9 +841,9 @@ locks_delete_lock_ctx(struct file_lock *fl, struct list_head *dispose)
 static bool locks_conflict(struct file_lock *caller_fl,
   struct file_lock *sys_fl)
 {
-   if (sys_fl->fl_type == F_WRLCK)
+   if (lock_is_write(sys_fl))
return true;
-   if (caller_fl->fl_type == F_WRLCK)
+   if (lock_is_write(caller_fl))
return true;
return false;
 }
@@ -874,7 +874,7 @@ static bool posix_test_locks_conflict(struct file_lock *caller_fl,
  struct file_lock *sys_fl)
 {
/* F_UNLCK checks any locks on the same fd. */
-   if (caller_fl->fl_type == F_UNLCK) {
+   if (lock_is_unlock(caller_fl)) {
if (!posix_same_owner(caller_fl, sys_fl))
return false;
return locks_overlap(caller_fl, sys_fl);
@@ -1055,7 +1055,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
break;
}
 
-   if (request->fl_type == F_UNLCK) {
+   if (lock_is_unlock(request)) {
if ((request->fl_flags & FL_EXISTS) && !found)
error = -ENOENT;
goto out;
@@ -1107,7 +1107,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 
ctx = locks_get_lock_context(inode, request->fl_type);
if (!ctx)
-   return (request->fl_type == F_UNLCK) ? 0 : -ENOMEM;
+   return lock_is_unlock(request) ? 0 : -ENOMEM;
 
/*
 * We may need two file_lock structures for this operation,
@@ -1228,7 +1228,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
continue;
if (fl->fl_start > request->fl_end)
break;
-   if (request->fl_type == F_UNLCK)
+   if (lock_is_unlock(request))
added = true;
if (fl->fl_start < request->fl_start)
left = fl;
@@ -1279,7 +1279,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 
error = 0;
if (!added) {
-   if (request->fl_type == F_UNLCK) {
+   if (lock_is_unlock(request)) {
if (request->fl_flags & FL_EXISTS)
error = -ENOENT;
goto out;
@@ -1608,7 +1608,7 @@ void lease_get_mtime(struct inode *inode, struct timespec64 *time)
		spin_lock(&ctx->flc_lock);
		fl = list_first_entry_or_null(&ctx->flc_lease,
					      struct file_lock, fl_list);
-   if (fl && (fl->fl_type == F_WRLCK))
+   if (fl && lock_is_write(fl))
has_lease = true;
		spin_unlock(&ctx->flc_lock);
}
diff --git a/include/linux/filelock.h b/include/linux/filelock.h
index 085ff6ba0653..a814664b1053 100644
--- a/include/linux/filelock.h
+++ b/include/linux/filelock.h
@@ -147,6 +147,29 @@ int fcntl_setlk64(unsigned int, struct file *, unsigned int,
 int fcntl_setlease(unsigned int fd, struct file *filp, int arg);
 int fcntl_getlease(struct file *filp);
 
+static inline bool lock_is_unlock(struct file_lock *fl)
+{
+   return fl->fl_type == F_UNLCK;
+}
+
+static inline bool lock_is_read(struct file_lock *fl)
+{
+   return fl->fl_type == F_RDLCK;
+}
+
+static inline bool lock_is_write(struct file_lock *fl)
+{
+   return fl->fl_type == F_WRLCK;
+}
+
+static inline void locks_wake_up(struct file_lock *fl)
+{
+	wake_up(&fl->fl_wait);
+}
+
+/* for walking lists of file_locks linked by fl_list */
+#define for_each_file_lock(_fl, _head) list_for_each_entry(_fl, _head, fl_list)
+
 /* fs/locks.c */
 void locks_free_lock_context(struct inode *inode);
 void locks_free_lock(struct file_lock *fl);

-- 
2.43.0
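
For context, this is how a filesystem's lock handler reads once these
helpers are in use (an illustrative sketch; the example_* names are
hypothetical and not part of this series):

	static int example_lock(struct file *file, int cmd, struct file_lock *fl)
	{
		/* dispatch on the request type with the new predicates */
		if (lock_is_unlock(fl))
			return example_do_unlock(file, fl);
		if (lock_is_read(fl) || lock_is_write(fl))
			return example_do_setlk(file, cmd, fl);
		return -EINVAL;
	}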




[PATCH v3 03/47] filelock: rename fl_pid variable in lock_get_status

2024-01-31 Thread Jeff Layton
In later patches we're going to introduce some macros that will clash
with the variable name here. Rename it.

Signed-off-by: Jeff Layton 
---
 fs/locks.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index cc7c117ee192..1eceaa56e47f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2695,11 +2695,11 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
loff_t id, char *pfx, int repeat)
 {
struct inode *inode = NULL;
-   unsigned int fl_pid;
+   unsigned int pid;
	struct pid_namespace *proc_pidns = proc_pid_ns(file_inode(f->file)->i_sb);
int type;
 
-   fl_pid = locks_translate_pid(fl, proc_pidns);
+   pid = locks_translate_pid(fl, proc_pidns);
/*
 * If lock owner is dead (and pid is freed) or not visible in current
 * pidns, zero is shown as a pid value. Check lock info from
@@ -2747,11 +2747,11 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
 (type == F_RDLCK) ? "READ" : "UNLCK");
if (inode) {
/* userspace relies on this representation of dev_t */
-   seq_printf(f, "%d %02x:%02x:%lu ", fl_pid,
+   seq_printf(f, "%d %02x:%02x:%lu ", pid,
MAJOR(inode->i_sb->s_dev),
MINOR(inode->i_sb->s_dev), inode->i_ino);
} else {
-   seq_printf(f, "%d :0 ", fl_pid);
+   seq_printf(f, "%d :0 ", pid);
}
if (IS_POSIX(fl)) {
if (fl->fl_end == OFFSET_MAX)

-- 
2.43.0




[PATCH v3 02/47] filelock: rename some fields in tracepoints

2024-01-31 Thread Jeff Layton
In later patches we're going to introduce some macros with names that
clash with fields here. To prevent problems building, just rename the
fields in the trace entry structures.

Signed-off-by: Jeff Layton 
---
 include/trace/events/filelock.h | 76 ++++++++++++++++++++++++++++++++++++++--------------------------------------
 1 file changed, 38 insertions(+), 38 deletions(-)

diff --git a/include/trace/events/filelock.h b/include/trace/events/filelock.h
index 1646dadd7f37..8fb1d41b1c67 100644
--- a/include/trace/events/filelock.h
+++ b/include/trace/events/filelock.h
@@ -68,11 +68,11 @@ DECLARE_EVENT_CLASS(filelock_lock,
__field(struct file_lock *, fl)
__field(unsigned long, i_ino)
__field(dev_t, s_dev)
-   __field(struct file_lock *, fl_blocker)
-   __field(fl_owner_t, fl_owner)
-   __field(unsigned int, fl_pid)
-   __field(unsigned int, fl_flags)
-   __field(unsigned char, fl_type)
+   __field(struct file_lock *, blocker)
+   __field(fl_owner_t, owner)
+   __field(unsigned int, pid)
+   __field(unsigned int, flags)
+   __field(unsigned char, type)
__field(loff_t, fl_start)
__field(loff_t, fl_end)
__field(int, ret)
@@ -82,11 +82,11 @@ DECLARE_EVENT_CLASS(filelock_lock,
__entry->fl = fl ? fl : NULL;
__entry->s_dev = inode->i_sb->s_dev;
__entry->i_ino = inode->i_ino;
-   __entry->fl_blocker = fl ? fl->fl_blocker : NULL;
-   __entry->fl_owner = fl ? fl->fl_owner : NULL;
-   __entry->fl_pid = fl ? fl->fl_pid : 0;
-   __entry->fl_flags = fl ? fl->fl_flags : 0;
-   __entry->fl_type = fl ? fl->fl_type : 0;
+   __entry->blocker = fl ? fl->fl_blocker : NULL;
+   __entry->owner = fl ? fl->fl_owner : NULL;
+   __entry->pid = fl ? fl->fl_pid : 0;
+   __entry->flags = fl ? fl->fl_flags : 0;
+   __entry->type = fl ? fl->fl_type : 0;
__entry->fl_start = fl ? fl->fl_start : 0;
__entry->fl_end = fl ? fl->fl_end : 0;
__entry->ret = ret;
@@ -94,9 +94,9 @@ DECLARE_EVENT_CLASS(filelock_lock,
 
	TP_printk("fl=%p dev=0x%x:0x%x ino=0x%lx fl_blocker=%p fl_owner=%p fl_pid=%u fl_flags=%s fl_type=%s fl_start=%lld fl_end=%lld ret=%d",
__entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
-   __entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
-   __entry->fl_pid, show_fl_flags(__entry->fl_flags),
-   show_fl_type(__entry->fl_type),
+   __entry->i_ino, __entry->blocker, __entry->owner,
+   __entry->pid, show_fl_flags(__entry->flags),
+   show_fl_type(__entry->type),
__entry->fl_start, __entry->fl_end, __entry->ret)
 );
 
@@ -125,32 +125,32 @@ DECLARE_EVENT_CLASS(filelock_lease,
__field(struct file_lock *, fl)
__field(unsigned long, i_ino)
__field(dev_t, s_dev)
-   __field(struct file_lock *, fl_blocker)
-   __field(fl_owner_t, fl_owner)
-   __field(unsigned int, fl_flags)
-   __field(unsigned char, fl_type)
-   __field(unsigned long, fl_break_time)
-   __field(unsigned long, fl_downgrade_time)
+   __field(struct file_lock *, blocker)
+   __field(fl_owner_t, owner)
+   __field(unsigned int, flags)
+   __field(unsigned char, type)
+   __field(unsigned long, break_time)
+   __field(unsigned long, downgrade_time)
),
 
TP_fast_assign(
__entry->fl = fl ? fl : NULL;
__entry->s_dev = inode->i_sb->s_dev;
__entry->i_ino = inode->i_ino;
-   __entry->fl_blocker = fl ? fl->fl_blocker : NULL;
-   __entry->fl_owner = fl ? fl->fl_owner : NULL;
-   __entry->fl_flags = fl ? fl->fl_flags : 0;
-   __entry->fl_type = fl ? fl->fl_type : 0;
-   __entry->fl_break_time = fl ? fl->fl_break_time : 0;
-   __entry->fl_downgrade_time = fl ? fl->fl_downgrade_time : 0;
+   __entry->blocker = fl ? fl->fl_blocker : NULL;
+   __entry->owner = fl ? fl->fl_owner : NULL;
+   __entry->flags = fl ? fl->fl_flags : 0;
+   __entry->type = fl ? fl->fl_type : 0;
+   __entry->break_time = fl ? fl->fl_break_time : 0;
+   __entry->downgrade_time = fl ? fl->fl_downgrade_time : 0;
),
 
	TP_printk("fl=%p dev=0x%x:0x%x ino=0x%lx fl_blocker=%p fl_owner=%p fl_flags=%s fl_type=%s fl_break_time=%lu fl_downgrade_time=%lu",
__entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
-   __entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
-   

[PATCH v3 01/47] filelock: fl_pid field should be signed int

2024-01-31 Thread Jeff Layton
This field has been unsigned for a very long time, but most users of the
struct file_lock and the file locking internals themselves treat it as a
signed value. Change it to be pid_t (which is a signed int).
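
The practical hazard with the old unsigned field is silent sign
conversion, e.g. (illustrative, not from this patch):

	/* with an unsigned fl_pid this test can never be true, so a
	 * negative pid sentinel would slip through unnoticed: */
	if (fl->fl_pid < 0)
		return -EINVAL;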

Signed-off-by: Jeff Layton 
---
 include/linux/filelock.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/filelock.h b/include/linux/filelock.h
index 95e868e09e29..085ff6ba0653 100644
--- a/include/linux/filelock.h
+++ b/include/linux/filelock.h
@@ -98,7 +98,7 @@ struct file_lock {
fl_owner_t fl_owner;
unsigned int fl_flags;
unsigned char fl_type;
-   unsigned int fl_pid;
+   pid_t fl_pid;
int fl_link_cpu;/* what cpu's list is this on? */
wait_queue_head_t fl_wait;
struct file *fl_file;

-- 
2.43.0




[PATCH v3 00/47] filelock: split file leases out of struct file_lock

2024-01-31 Thread Jeff Layton
I'm not sure this is much prettier than the last, but contracting
"fl_core" to "c", as Neil suggested is a bit easier on the eyes.

I also added a few small helpers and converted several users over to
them. That reduces the size of the per-fs conversion patches later in
the series. I played with some others too, but they were too awkward
or not frequently used enough to make it worthwhile.
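
For orientation, the end state of the split looks roughly like this (an
abridged sketch pieced together from the diffs in this series; field
order and the elided members are not exact):

	/* fields common to struct file_lock and struct file_lease */
	struct file_lock_core {
		struct file_lock_core *flc_blocker;
		struct list_head flc_list;
		struct hlist_node flc_link;
		struct list_head flc_blocked_requests;
		struct list_head flc_blocked_member;
		fl_owner_t flc_owner;
		unsigned int flc_flags;
		unsigned char flc_type;
		pid_t flc_pid;
		int flc_link_cpu;
		wait_queue_head_t flc_wait;
		struct file *flc_file;
	};

	struct file_lock {
		struct file_lock_core c;	/* the contracted "fl_core" */
		loff_t fl_start;
		loff_t fl_end;
		/* ... byte-range-lock-specific fields and ops ... */
	};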

Many thanks to Chuck and Neil for the earlier R-b's and comments. I've
dropped those for now since this set is a bit different from the last.

I'd like to get this into linux-next soon and we can see about merging
it for v6.9, unless anyone has major objections.

Thanks!

Signed-off-by: Jeff Layton 
---
Changes in v3:
- Rename "flc_core" fields in file_lock and file_lease to "c"
- new helpers: locks_wake_up, for_each_file_lock, and 
lock_is_{unlock,read,write}
- Link to v2: 
https://lore.kernel.org/r/20240125-flsplit-v2-0-7485322b6...@kernel.org

Changes in v2:
- renamed file_lock_core fields to have "flc_" prefix
- used macros to more easily do the change piecemeal
- broke up patches into per-subsystem ones
- Link to v1: 
https://lore.kernel.org/r/20240116-flsplit-v1-0-c9d0f4370...@kernel.org

---
Jeff Layton (47):
  filelock: fl_pid field should be signed int
  filelock: rename some fields in tracepoints
  filelock: rename fl_pid variable in lock_get_status
  filelock: add some new helper functions
  9p: rename fl_type variable in v9fs_file_do_lock
  afs: convert to using new filelock helpers
  ceph: convert to using new filelock helpers
  dlm: convert to using new filelock helpers
  gfs2: convert to using new filelock helpers
  lockd: convert to using new filelock helpers
  nfs: convert to using new filelock helpers
  nfsd: convert to using new filelock helpers
  ocfs2: convert to using new filelock helpers
  smb/client: convert to using new filelock helpers
  smb/server: convert to using new filelock helpers
  filelock: drop the IS_* macros
  filelock: split common fields into struct file_lock_core
  filelock: have fs/locks.c deal with file_lock_core directly
  filelock: convert more internal functions to use file_lock_core
  filelock: make posix_same_owner take file_lock_core pointers
  filelock: convert posix_owner_key to take file_lock_core arg
  filelock: make locks_{insert,delete}_global_locks take file_lock_core arg
  filelock: convert locks_{insert,delete}_global_blocked
  filelock: make __locks_delete_block and __locks_wake_up_blocks take 
file_lock_core
  filelock: convert __locks_insert_block, conflict and deadlock checks to 
use file_lock_core
  filelock: convert fl_blocker to file_lock_core
  filelock: clean up locks_delete_block internals
  filelock: reorganize locks_delete_block and __locks_insert_block
  filelock: make assign_type helper take a file_lock_core pointer
  filelock: convert locks_wake_up_blocks to take a file_lock_core pointer
  filelock: convert locks_insert_lock_ctx and locks_delete_lock_ctx
  filelock: convert locks_translate_pid to take file_lock_core
  filelock: convert seqfile handling to use file_lock_core
  9p: adapt to breakup of struct file_lock
  afs: adapt to breakup of struct file_lock
  ceph: adapt to breakup of struct file_lock
  dlm: adapt to breakup of struct file_lock
  gfs2: adapt to breakup of struct file_lock
  fuse: adapt to breakup of struct file_lock
  lockd: adapt to breakup of struct file_lock
  nfs: adapt to breakup of struct file_lock
  nfsd: adapt to breakup of struct file_lock
  ocfs2: adapt to breakup of struct file_lock
  smb/client: adapt to breakup of struct file_lock
  smb/server: adapt to breakup of struct file_lock
  filelock: remove temporary compatibility macros
  filelock: split leases out of struct file_lock

 fs/9p/vfs_file.c|  40 +-
 fs/afs/flock.c  |  60 +--
 fs/ceph/locks.c |  74 ++--
 fs/dlm/plock.c  |  44 +--
 fs/fuse/file.c  |  14 +-
 fs/gfs2/file.c  |  16 +-
 fs/libfs.c  |   2 +-
 fs/lockd/clnt4xdr.c |  14 +-
 fs/lockd/clntlock.c |   2 +-
 fs/lockd/clntproc.c |  65 +--
 fs/lockd/clntxdr.c  |  14 +-
 fs/lockd/svc4proc.c |  10 +-
 fs/lockd/svclock.c  |  64 +--
 fs/lockd/svcproc.c  |  10 +-
 fs/lockd/svcsubs.c  |  24 +-
 fs/lockd/xdr.c  |  14 +-
 fs/lockd/xdr4.c |  14 +-
 fs/locks.c  | 851 ++--
 fs/nfs/delegation.c |   4 +-
 fs/nfs/file.c   |  22 +-
 fs/nfs/nfs3proc.c   |   2 +-
 fs/nfs/nfs4_fs.h|   2 +-
 fs/nfs/nfs4file.c   |   2 +-
 fs/nfs/nfs4proc.c   |  39 +-
 fs/nfs/nfs4state.c

Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time

2024-01-31 Thread Luis Chamberlain
On Wed, Jan 31, 2024 at 06:53:13AM +, Christophe Leroy wrote:
> The problem being identified in commit 677bfb9db8a3 ("module: Don't 
> ignore errors from set_memory_XX()"), you can keep/re-apply the series 
> [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time.

Sure, queued that up into modules-testing before I push to modules-next.

  Luis



Re: [PATCH v2 0/3] modules: few of alignment fixes

2024-01-31 Thread Luis Chamberlain
On Mon, Jan 29, 2024 at 11:26:39AM -0800, Luis Chamberlain wrote:
> Masahiro, if there no issues feel free to take this or I can take them in
> too via the modules-next tree. Lemme know!

I've queued this onto modules-testing to get wider testing [0]

[0] 
https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-testing

  Luis



[PATCH] eventfs: Create eventfs_root_inode to store dentry

2024-01-31 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Only the root "events" directory stores a dentry. There's no reason to
hold a dentry pointer for every eventfs_inode as it is never set except
for the root "events" eventfs_inode.

Create a eventfs_root_inode structure that holds the events_dir dentry.
The "events" eventfs_inode *is* special, let it have its own descriptor.

Signed-off-by: Steven Rostedt (Google) 
---
 fs/tracefs/event_inode.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----------
 fs/tracefs/internal.h    |  2 --
 2 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 463920295237..126cbe62f142 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -35,6 +35,17 @@ static DEFINE_MUTEX(eventfs_mutex);
 /* Choose something "unique" ;-) */
 #define EVENTFS_FILE_INODE_INO 0x12c4e37
 
+struct eventfs_root_inode {
+   struct eventfs_inodeei;
+   struct dentry   *events_dir;
+};
+
+static struct eventfs_root_inode *get_root_inode(struct eventfs_inode *ei)
+{
+   WARN_ON_ONCE(!ei->is_events);
+   return container_of(ei, struct eventfs_root_inode, ei);
+}
+
 /* Just try to make something consistent and unique */
 static int eventfs_dir_ino(struct eventfs_inode *ei)
 {
@@ -73,12 +84,18 @@ enum {
 static void release_ei(struct kref *ref)
 {
	struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, kref);
+   struct eventfs_root_inode *rei;
 
WARN_ON_ONCE(!ei->is_freed);
 
kfree(ei->entry_attrs);
kfree_const(ei->name);
-   kfree_rcu(ei, rcu);
+   if (ei->is_events) {
+   rei = get_root_inode(ei);
+   kfree_rcu(rei, ei.rcu);
+   } else {
+   kfree_rcu(ei, rcu);
+   }
 }
 
 static inline void put_ei(struct eventfs_inode *ei)
@@ -412,19 +429,43 @@ static struct dentry *lookup_dir_entry(struct dentry *dentry,
return NULL;
 }
 
+static inline struct eventfs_inode *init_ei(struct eventfs_inode *ei, const 
char *name)
+{
+   ei->name = kstrdup_const(name, GFP_KERNEL);
+   if (!ei->name)
+   return NULL;
+   kref_init(&ei->kref);
+   return ei;
+}
+
 static inline struct eventfs_inode *alloc_ei(const char *name)
 {
struct eventfs_inode *ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+   struct eventfs_inode *result;
 
if (!ei)
return NULL;
 
-   ei->name = kstrdup_const(name, GFP_KERNEL);
-   if (!ei->name) {
+   result = init_ei(ei, name);
+   if (!result)
kfree(ei);
+
+   return result;
+}
+
+static inline struct eventfs_inode *alloc_root_ei(const char *name)
+{
+   struct eventfs_root_inode *rei = kzalloc(sizeof(*rei), GFP_KERNEL);
+   struct eventfs_inode *ei;
+
+   if (!rei)
return NULL;
-   }
-   kref_init(&ei->kref);
+
+   rei->ei.is_events = 1;
+   ei = init_ei(&rei->ei, name);
+   if (!ei)
+   kfree(rei);
+
return ei;
 }
 
@@ -718,6 +759,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char 
*name, struct dentry
int size, void *data)
 {
struct dentry *dentry = tracefs_start_creating(name, parent);
+   struct eventfs_root_inode *rei;
struct eventfs_inode *ei;
struct tracefs_inode *ti;
struct inode *inode;
@@ -730,7 +772,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char 
*name, struct dentry
if (IS_ERR(dentry))
return ERR_CAST(dentry);
 
-   ei = alloc_ei(name);
+   ei = alloc_root_ei(name);
if (!ei)
goto fail;
 
@@ -739,10 +781,11 @@ struct eventfs_inode *eventfs_create_events_dir(const 
char *name, struct dentry
goto fail;
 
// Note: we have a ref to the dentry from tracefs_start_creating()
-   ei->events_dir = dentry;
+   rei = get_root_inode(ei);
+   rei->events_dir = dentry;
+
ei->entries = entries;
ei->nr_entries = size;
-   ei->is_events = 1;
ei->data = data;
 
/* Save the ownership of this directory */
@@ -845,13 +888,15 @@ void eventfs_remove_dir(struct eventfs_inode *ei)
  */
 void eventfs_remove_events_dir(struct eventfs_inode *ei)
 {
+   struct eventfs_root_inode *rei;
struct dentry *dentry;
 
-   dentry = ei->events_dir;
+   rei = get_root_inode(ei);
+   dentry = rei->events_dir;
if (!dentry)
return;
 
-   ei->events_dir = NULL;
+   rei->events_dir = NULL;
eventfs_remove_dir(ei);
 
/*
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index beb3dcd0e434..15c26f9aaad4 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -36,7 +36,6 @@ struct eventfs_attr {
  * @children:  link list into the child eventfs_inode
  * @entries:   the array of entries representing the files in the directory
  * @name:  the 

Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time

2024-01-31 Thread Marek Szyprowski
Hi Christophe,

On 31.01.2024 21:07, Christophe Leroy wrote:
> On 31/01/2024 at 16:17, Marek Szyprowski wrote:
>>
>> On 31.01.2024 12:58, Christophe Leroy wrote:
>>> On 30/01/2024 at 18:48, Marek Szyprowski wrote:

 On 30.01.2024 12:03, Christophe Leroy wrote:
> On 30/01/2024 at 10:16, Chen-Yu Tsai wrote:
>>
>> On Mon, Jan 29, 2024 at 12:09:50PM -0800, Luis Chamberlain wrote:
>>> On Thu, Dec 21, 2023 at 10:02:46AM +0100, Christophe Leroy wrote:
 Declaring rodata_enabled and mark_rodata_ro() at all time
 helps removing related #ifdefery in C files.

 Signed-off-by: Christophe Leroy 
>>> Very nice cleanup, thanks!, applied and pushed
>>>
>>> Luis
>> On next-20240130, which has your modules-next branch, and thus this
>> series and the other "module: Use set_memory_rox()" series applied,
>> my kernel crashes in some very weird way. Reverting your branch
>> makes the crash go away.
>>
>> I thought I'd report it right away. Maybe you folks would know what's
>> happening here? This is on arm64.
> That's strange, it seems to bug in module_bug_finalize() which is
> _before_ calls to module_enable_ro() and such.
>
> Can you try to revert the 6 patches one by one to see which one
> introduces the problem ?
>
> In reality, only patch 677bfb9db8a3 really changes things. The other ones
> are more or less only cleanup.
 I've also run into this issue with today's (20240130) linux-next on my
 test farm. The issue is not fully reproducible, so it was a bit hard to
 bisect it automatically. I've spent some time on manual testing and it
 looks that reverting the following 2 commits on top of linux-next fixes
 the problem:

 65929884f868 ("modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around
 rodata_enabled")
 677bfb9db8a3 ("module: Don't ignore errors from set_memory_XX()")

 This in fact means that commit 677bfb9db8a3 is responsible for this
 regression, as 65929884f868 has to be reverted only because the latter
 depends on it. Let me know what I can do to help debugging this issue.

>>> Thanks for the bisect. I suspect you hit one of the errors and something
>>> goes wrong in the error path.
>>>
>>> To confirm this assumption, could you try with the following change on
>>> top of everything ?
>>
>> Yes, this is the problem. I've added printing a mod->name to the log.
>> Here is a log from kernel build from next-20240130 (sometimes it even
>> boots to shell):
>>
>> # dmesg | grep module_set_memory
>> [    8.061525] module_set_memory(6, , 0) name ipv6
>> returned -22
>> [    8.067543] WARNING: CPU: 3 PID: 1 at kernel/module/strict_rwx.c:22
>> module_set_memory+0x9c/0xb8
> Would be good if you could show the backtrace too so that we know who is
> the caller. I guess what you show here is what you get on the screen?
> The backtrace should be available through 'dmesg'.

Here are relevant parts of the boot log:

[    8.096850] [ cut here ]
[    8.096939] module_set_memory(6, , 0) name ipv6 
returned -22
[    8.102947] WARNING: CPU: 4 PID: 1 at kernel/module/strict_rwx.c:22 
module_set_memory+0x9c/0xb8
[    8.111561] Modules linked in:
[    8.114596] CPU: 4 PID: 1 Comm: systemd Not tainted 
6.8.0-rc2-next-20240130-dirty #14429
[    8.122654] Hardware name: Khadas VIM3 (DT)
[    8.126815] pstate: 6005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS 
BTYPE=--)
[    8.133747] pc : module_set_memory+0x9c/0xb8
[    8.137994] lr : module_set_memory+0x9c/0xb8
[    8.142240] sp : 800083fcba80
[    8.145534] x29: 800083fcba80 x28: 0001 x27: 
80007c024448
[    8.152640] x26: 800083fcbc10 x25: 80007c007958 x24: 
80007c024450
[    8.159747] x23: 800083f2a090 x22: 80007c007940 x21: 
0006
[    8.166854] x20: ffea x19: 80007c007af0 x18: 
0030
[    8.173960] x17:  x16: 5932 x15: 

[    8.181067] x14: 800082ea5658 x13: 03d5 x12: 
0147
[    8.188174] x11: 6920656d616e2029 x10: 800082efd658 x9 : 
f000
[    8.195280] x8 : 800082ea5658 x7 : 800082efd658 x6 : 

[    8.202387] x5 : bff4 x4 :  x3 : 

[    8.209494] x2 :  x1 :  x0 : 

[PATCH] eventfs: Warn if an eventfs_inode is freed without is_freed being set

2024-01-31 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

There should never be a case where an eventfs_inode is being freed without
is_freed being set. Add a WARN_ON_ONCE() if it ever happens. That would
mean there was one too many put_ei()s.

As put_ei() is also called on failure cases to free the ei, add a free_ei()
helper that sets the is_freed and then calls put_ei().

Signed-off-by: Steven Rostedt (Google) 
---
 fs/tracefs/event_inode.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 31cbe38739fa..463920295237 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -73,6 +73,9 @@ enum {
 static void release_ei(struct kref *ref)
 {
struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, 
kref);
+
+   WARN_ON_ONCE(!ei->is_freed);
+
kfree(ei->entry_attrs);
kfree_const(ei->name);
kfree_rcu(ei, rcu);
@@ -84,6 +87,14 @@ static inline void put_ei(struct eventfs_inode *ei)
    kref_put(&ei->kref, release_ei);
 }
 
+static inline void free_ei(struct eventfs_inode *ei)
+{
+   if (ei) {
+   ei->is_freed = 1;
+   put_ei(ei);
+   }
+}
+
 static inline struct eventfs_inode *get_ei(struct eventfs_inode *ei)
 {
if (ei)
@@ -684,7 +695,7 @@ struct eventfs_inode *eventfs_create_dir(const char *name, 
struct eventfs_inode
 
/* Was the parent freed? */
    if (list_empty(&ei->list)) {
-   put_ei(ei);
+   free_ei(ei);
ei = NULL;
}
return ei;
@@ -775,7 +786,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char 
*name, struct dentry
return ei;
 
  fail:
-   put_ei(ei);
+   free_ei(ei);
tracefs_failed_creating(dentry);
return ERR_PTR(-ENOMEM);
 }
@@ -806,9 +817,8 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, 
int level)
    list_for_each_entry(ei_child, &ei->children, list)
eventfs_remove_rec(ei_child, level + 1);
 
-   ei->is_freed = 1;
    list_del(&ei->list);
-   put_ei(ei);
+   free_ei(ei);
 }
 
 /**
-- 
2.43.0




Re: [PATCH RFC 0/4] Introduce uts_release

2024-01-31 Thread Greg KH
On Wed, Jan 31, 2024 at 05:16:09PM +, John Garry wrote:
> On 31/01/2024 16:22, Greg KH wrote:
> > > before:
> > > real0m53.591s
> > > user1m1.842s
> > > sys 0m9.161s
> > > 
> > > after:
> > > real0m37.481s
> > > user0m46.461s
> > > sys 0m7.199s
> > > 
> > > Sending as an RFC as I need to test more of the conversions and I would
> > > like to also convert more UTS_RELEASE users to prove this is proper
> > > approach.
> > I like it, I also think that v4l2 includes this as well as all of those
> > drivers seem to rebuild when this changes, does that not happen for you
> > too?
> 
> I didn't see that. Were you building for arm64? I can see some v4l2
> configs enabled there for the vanilla defconfig (but none for x86-64).

Building for x86, maybe it's one of the other LINUX_VERSION type defines
we have, sorry, can't remember, it's been a long time since I looked
into it.

thanks,

greg k-h



Re: [RFC PATCH v2 7/8] Introduce dcache_is_aliasing() across all architectures

2024-01-31 Thread Dave Chinner
On Wed, Jan 31, 2024 at 09:58:21AM -0500, Mathieu Desnoyers wrote:
> On 2024-01-30 21:48, Dave Chinner wrote:
> > On Tue, Jan 30, 2024 at 11:52:54AM -0500, Mathieu Desnoyers wrote:
> > > Introduce a generic way to query whether the dcache is virtually aliased
> > > on all architectures. Its purpose is to ensure that subsystems which
> > > are incompatible with virtually aliased data caches (e.g. FS_DAX) can
> > > reliably query this.
> > > 
> > > For dcache aliasing, there are three scenarios depending on the
> > > architecture. Here is a breakdown based on my understanding:
> > > 
> > > A) The dcache is always aliasing:
> > > 
> > > * arc
> > > * csky
> > > * m68k (note: shared memory mappings are incoherent ? SHMLBA is missing 
> > > there.)
> > > * sh
> > > * parisc
> > 
> > /me wonders why the dentry cache aliasing has problems on these
> > systems.
> > 
> > Oh, dcache != fs/dcache.c (the VFS dentry cache).
> > 
> > Can you please rename this function appropriately so us dumb
> > filesystem people don't confuse cpu data cache configurations with
> > the VFS dentry cache aliasing when we read this code? Something like
> > cpu_dcache_is_aliased(), perhaps?
> 
> Good point, will do. I'm planning to rename as follows for v3 to
> eliminate confusion with dentry cache (and with "page cache" in
> general):
> 
> ARCH_HAS_CACHE_ALIASING -> ARCH_HAS_CPU_CACHE_ALIASING
> dcache_is_aliasing() -> cpu_dcache_is_aliasing()
> 
> I noticed that you suggested "aliased" rather than "aliasing",
> but I followed what arm64 did for icache_is_aliasing(). Do you
> have a strong preference one way or another ?

Not really.

-Dave.
-- 
Dave Chinner
da...@fromorbit.com
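
For readers following along, a rough sketch of what the renamed v3 helper
could look like, assuming the ARCH_HAS_CPU_CACHE_ALIASING naming agreed
above (illustrative only, not the actual v3 patch):

#ifdef CONFIG_ARCH_HAS_CPU_CACHE_ALIASING
#include <asm/cachetype.h>	/* arch provides cpu_dcache_is_aliasing() */
#else
static inline bool cpu_dcache_is_aliasing(void)
{
	return false;	/* no virtually aliased data cache */
}
#endif

A subsystem like FS_DAX could then simply refuse to engage when
cpu_dcache_is_aliasing() returns true.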



Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time

2024-01-31 Thread Christophe Leroy


On 31/01/2024 at 16:17, Marek Szyprowski wrote:
> 
> Hi Christophe,
> 
> On 31.01.2024 12:58, Christophe Leroy wrote:
>> On 30/01/2024 at 18:48, Marek Szyprowski wrote:
>>>
>>> On 30.01.2024 12:03, Christophe Leroy wrote:
 On 30/01/2024 at 10:16, Chen-Yu Tsai wrote:
>
> On Mon, Jan 29, 2024 at 12:09:50PM -0800, Luis Chamberlain wrote:
>> On Thu, Dec 21, 2023 at 10:02:46AM +0100, Christophe Leroy wrote:
>>> Declaring rodata_enabled and mark_rodata_ro() at all time
>>> helps removing related #ifdefery in C files.
>>>
>>> Signed-off-by: Christophe Leroy 
>> Very nice cleanup, thanks!, applied and pushed
>>
>>Luis
> On next-20240130, which has your modules-next branch, and thus this
> series and the other "module: Use set_memory_rox()" series applied,
> my kernel crashes in some very weird way. Reverting your branch
> makes the crash go away.
>
> I thought I'd report it right away. Maybe you folks would know what's
> happening here? This is on arm64.
 That's strange, it seems to bug in module_bug_finalize() which is
 _before_ calls to module_enable_ro() and such.

 Can you try to revert the 6 patches one by one to see which one
 introduces the problem ?

 In reality, only patch 677bfb9db8a3 really changes things. The other ones
 are more or less only cleanup.
>>> I've also run into this issue with today's (20240130) linux-next on my
>>> test farm. The issue is not fully reproducible, so it was a bit hard to
>>> bisect it automatically. I've spent some time on manual testing and it
>>> looks that reverting the following 2 commits on top of linux-next fixes
>>> the problem:
>>>
>>> 65929884f868 ("modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around
>>> rodata_enabled")
>>> 677bfb9db8a3 ("module: Don't ignore errors from set_memory_XX()")
>>>
>>> This in fact means that commit 677bfb9db8a3 is responsible for this
>>> regression, as 65929884f868 has to be reverted only because the latter
>>> depends on it. Let me know what I can do to help debugging this issue.
>>>
>> Thanks for the bisect. I suspect you hit one of the errors and something
>> goes wrong in the error path.
>>
>> To confirm this assumption, could you try with the following change on
>> top of everything ?
> 
> 
> Yes, this is the problem. I've added printing a mod->name to the log.
> Here is a log from kernel build from next-20240130 (sometimes it even
> boots to shell):
> 
> # dmesg | grep module_set_memory
> [    8.061525] module_set_memory(6, , 0) name ipv6
> returned -22
> [    8.067543] WARNING: CPU: 3 PID: 1 at kernel/module/strict_rwx.c:22
> module_set_memory+0x9c/0xb8

Would be good if you could show the backtrace too so that we know who is 
the caller. I guess what you show here is what you get on the screen? 
The backtrace should be available through 'dmesg'.

I guess we will now seek help from ARM64 people to understand why 
module_set_memory_something() fails with -EINVAL when loading modules.


> [    8.097821] pc : module_set_memory+0x9c/0xb8
> [    8.102068] lr : module_set_memory+0x9c/0xb8
> [    8.183101]  module_set_memory+0x9c/0xb8
> [    8.472862] module_set_memory(6, , 0) name x_tables
> returned -22
> [    8.479215] WARNING: CPU: 2 PID: 1 at kernel/module/strict_rwx.c:22
> module_set_memory+0x9c/0xb8
> [    8.510978] pc : module_set_memory+0x9c/0xb8
> [    8.515225] lr : module_set_memory+0x9c/0xb8
> [    8.596259]  module_set_memory+0x9c/0xb8
> [   10.529879] module_set_memory(6, , 0) name dm_mod
> returned -22
> [   10.536087] WARNING: CPU: 3 PID: 127 at kernel/module/strict_rwx.c:22
> module_set_memory+0x9c/0xb8
> [   10.568254] pc : module_set_memory+0x9c/0xb8
> [   10.572501] lr : module_set_memory+0x9c/0xb8
> [   10.653535]  module_set_memory+0x9c/0xb8
> [   10.853177] module_set_memory(6, , 0) name fuse
> returned -22
> [   10.859196] WARNING: CPU: 5 PID: 130 at kernel/module/strict_rwx.c:22
> module_set_memory+0x9c/0xb8
> [   10.891382] pc : module_set_memory+0x9c/0xb8
> [   10.895629] lr : module_set_memory+0x9c/0xb8
> [   10.976663]  module_set_memory+0x9c/0xb8
> 
> 
> 
>> diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c
>> index a14df9655dbe..fdf8484154dd 100644
>> --- a/kernel/module/strict_rwx.c
>> +++ b/kernel/module/strict_rwx.c
>> @@ -15,9 +15,12 @@ static int module_set_memory(const struct module
>> *mod, 
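
The quoted debug diff is cut off above; a hedged reconstruction of the kind
of diagnostic it adds, matching the WARN output seen in the logs (the exact
patch may differ):

static int module_set_memory(const struct module *mod, enum mod_mem_type type,
			     int (*set_memory)(unsigned long start, int num_pages))
{
	const struct module_memory *mod_mem = &mod->mem[type];
	int err;

	set_vm_flush_reset_perms(mod_mem->base);
	err = set_memory((unsigned long)mod_mem->base,
			 mod_mem->size >> PAGE_SHIFT);
	/* Reconstructed diagnostic: dump the failing range and module name */
	WARN(err, "module_set_memory(%d, %px, %x) name %s returned %d\n",
	     type, mod_mem->base, mod_mem->size, mod->name, err);
	return err;
}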

[PATCH] eventfs: Restructure eventfs_inode structure to be more condensed

2024-01-31 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

The eventfs_inode structure has some holes in it. Rework the structure
to be a bit more condensed, and also remove the no longer used llist
field.

Link: 
https://lore.kernel.org/linux-trace-kernel/CAHk-=wgh0otaSyV0MNrQpwFDTjT3=twv94wit2euupdh2kd...@mail.gmail.com/

Suggested-by: Linus Torvalds 
Signed-off-by: Steven Rostedt (Google) 
---
 fs/tracefs/internal.h | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 1886f1826cd8..beb3dcd0e434 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -32,40 +32,37 @@ struct eventfs_attr {
 /*
  * struct eventfs_inode - hold the properties of the eventfs directories.
  * @list:  link list into the parent directory
+ * @rcu:   Union with @list for freeing
+ * @children:  link list into the child eventfs_inode
  * @entries:   the array of entries representing the files in the directory
  * @name:  the name of the directory to create
- * @children:  link list into the child eventfs_inode
  * @events_dir: the dentry of the events directory
  * @entry_attrs: Saved mode and ownership of the @d_children
- * @attr:  Saved mode and ownership of eventfs_inode itself
  * @data:  The private data to pass to the callbacks
+ * @attr:  Saved mode and ownership of eventfs_inode itself
  * @is_freed:  Flag set if the eventfs is on its way to be freed
  *Note if is_freed is set, then dentry is corrupted.
+ * @is_events: Flag set for only the top level "events" directory
  * @nr_entries: The number of items in @entries
+ * @ino:   The saved inode number
  */
 struct eventfs_inode {
-   struct kref kref;
-   struct list_head    list;
+   union {
+   struct list_head    list;
+   struct rcu_head     rcu;
+   };
+   struct list_head    children;
 const struct eventfs_entry  *entries;
 const char  *name;
-   struct list_head    children;
 struct dentry   *events_dir;
 struct eventfs_attr *entry_attrs;
-   struct eventfs_attr attr;
 void    *data;
+   struct eventfs_attr attr;
+   struct kref kref;
 unsigned int    is_freed:1;
 unsigned int    is_events:1;
 unsigned int    nr_entries:30;
 unsigned int    ino;
-   /*
-* Union - used for deletion
-* @llist:  for calling dput() if needed after RCU
-* @rcu:eventfs_inode to delete in RCU
-*/
-   union {
-   struct llist_node   llist;
-   struct rcu_head rcu;
-   };
 };
 
 static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
-- 
2.43.0




Re: [linus:master] [eventfs] 852e46e239: BUG:unable_to_handle_page_fault_for_address

2024-01-31 Thread Steven Rostedt
On Wed, 31 Jan 2024 11:35:18 -0800
Linus Torvalds  wrote:

> On Wed, 31 Jan 2024 at 07:58, Steven Rostedt  wrote:
> >
> > BTW, I ran my full test suite on your patches with the below updates and it
> > all passed.  
> 
> Those patch updates all look sane to me.
> 
> > I can break up and clean up the patches so that they are bisectable, and if
> > that passes the bisectable portion of my tests, I can still send them to
> > you for 6.8.  
> 
> Ack. That series you posted looks fine. I didn't do any actual testing
> or applying the patches, just looking at them.
> 
> The one thing I noticed is that the 'llist' removal still needs to be
> done. The logical point is that "[PATCH v2 7/7]" where the
> eventfs_workfn stuff is ripped out.
> 
> And the 'rcu' head should now be a union with something that is no
> longer used after the last kref. The only thing that *is* used after
> the last kref is the "is_freed" bit, so there's lots of choice. Using
> the 'struct list_head list' that is used for the child list would
> seem to be the obvious choice, but it could be anything (including all
> of the beginning of that eventfs_inode, but then you would need to
> group that as another nested unnamed struct, so picking a "big enough"
> entry like 'list' makes it syntactically simpler.

Yeah, that was what I was talking about in my cover letter with:

  Note, there's more clean ups that can happen. One being cleaning up the
  eventfs_inode structure. But that's not critical now and can be added
  later.

I just want to get the majority of the broken parts done. The clean up of
the eventfs_inode is something that I'd add a separate patch. Not sure that
falls in your "fixes" category for 6.8.

-- Steve



Re: [linus:master] [eventfs] 852e46e239: BUG:unable_to_handle_page_fault_for_address

2024-01-31 Thread Linus Torvalds
On Wed, 31 Jan 2024 at 07:58, Steven Rostedt  wrote:
>
> BTW, I ran my full test suite on your patches with the below updates and it
> all passed.

Those patch updates all look sane to me.

> I can break up and clean up the patches so that they are bisectable, and if
> that passes the bisectable portion of my tests, I can still send them to
> you for 6.8.

Ack. That series you posted looks fine. I didn't do any actual testing
or applying the patches, just looking at them.

The one thing I noticed is that the 'llist' removal still needs to be
done. The logical point is that "[PATCH v2 7/7]" where the
eventfs_workfn stuff is ripped out.

And the 'rcu' head should now be a union with something that is no
longer used after the last kref. The only thing that *is* used after
the last kref is the "is_freed" bit, so there's lots of choice. Using
the 'struct list_head list' that is used for the child list would
seem to be the obvious choice, but it could be anything (including all
of the beginning of that eventfs_inode, but then you would need to
group that as another nested unnamed struct, so picking a "big enough"
entry like 'list' makes it syntactically simpler.

   Linus



Re: [PATCH RFC 2/4] tracing: Use uts_release

2024-01-31 Thread Steven Rostedt
On Wed, 31 Jan 2024 10:48:49 +
John Garry  wrote:

> Instead of using UTS_RELEASE, use uts_release, which means that we don't
> need to rebuild the code just for the git head commit changing.
> 
> Signed-off-by: John Garry 

Acked-by: Steven Rostedt (Google) 

-- Steve

> ---
>  kernel/trace/trace.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 2a7c6fd934e9..68513924beb4 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -13,7 +13,7 @@
>   *  Copyright (C) 2004 Nadia Yvette Chambers
>   */
>  #include 
> -#include <generated/utsrelease.h>
> +#include <linux/utsname.h>
>  #include 
>  #include 
>  #include 
> @@ -4354,7 +4354,7 @@ print_trace_header(struct seq_file *m, struct 
> trace_iterator *iter)
>   get_total_entries(buf, &total, &entries);
>  
>   seq_printf(m, "# %s latency trace v1.1.5 on %s\n",
> -name, UTS_RELEASE);
> +name, uts_release);
>   seq_puts(m, "# ---"
>"-\n");
>   seq_printf(m, "# latency: %lu us, #%lu/%lu, CPU#%d |"




Re: [PATCH RFC 3/4] net: ethtool: Use uts_release

2024-01-31 Thread Jakub Kicinski
On Wed, 31 Jan 2024 10:48:50 + John Garry wrote:
> Instead of using UTS_RELEASE, use uts_release, which means that we don't
> need to rebuild the code just for the git head commit changing.
> 
> Signed-off-by: John Garry 

Yes, please!

Acked-by: Jakub Kicinski 



Re: [PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy)

2024-01-31 Thread Steven Rostedt
On Wed, 31 Jan 2024 13:49:18 -0500
Steven Rostedt  wrote:

> I would like to have this entire series go all the way back to 6.6 (after it
> is accepted in mainline of course) and replace everything since the creation
> of the eventfs code.  That is, stable releases may need to add all the
> patches that are in fs/tracefs to make that happen. The reason being that
> this rewrite likely fixed a lot of hidden bugs and I honestly believe it's
> more stable than the code that currently exists.

If there is no more issues found here, and Linus pulls it into 6.8, I'll
make the backport series for both 6.7 and 6.6.

-- Steve



[PATCH v1] module.h: define __symbol_get_gpl() as a regular __symbol_get()

2024-01-31 Thread Andrew Kanner
Prototype for __symbol_get_gpl() was introduced in the initial git
commit 1da177e4c3f4 ("Linux-2.6.12-rc2"), but was not used after that.

In commit 9011e49d54dc ("modules: only allow symbol_get of
EXPORT_SYMBOL_GPL modules") Christoph Hellwig switched __symbol_get()
to process GPL symbols only, most likely this is what
__symbol_get_gpl() was designed to do.

We might either define __symbol_get_gpl() as __symbol_get() or remove
it completely as suggested by Mauro Carvalho Chehab.

Link: 
https://lore.kernel.org/lkml/5f001015990a76c0da35a4c3cf08e457ec353ab2.1652113087.git.mche...@kernel.org/
Signed-off-by: Andrew Kanner 
---
 include/linux/module.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 96bc462872c0..8a660c81ac3d 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -299,7 +299,7 @@ struct notifier_block;
 extern int modules_disabled; /* for sysctl */
 /* Get/put a kernel symbol (calls must be symmetric) */
 void *__symbol_get(const char *symbol);
-void *__symbol_get_gpl(const char *symbol);
+#define __symbol_get_gpl(x) (__symbol_get(x))
 #define symbol_get(x) ((typeof(&x))(__symbol_get(__stringify(x))))
 
 /* modules using other modules: kdb wants to see this. */
-- 
2.39.3
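
As a usage reminder, the get/put calls must stay symmetric; a hypothetical
caller (my_gpl_func is made up for illustration, and with the change above
__symbol_get_gpl() behaves exactly like __symbol_get()):

extern int my_gpl_func(void);	/* hypothetical EXPORT_SYMBOL_GPL symbol */

static void demo(void)
{
	int (*fn)(void) = symbol_get(my_gpl_func);

	if (fn) {
		fn();
		symbol_put(my_gpl_func);	/* drop the module reference */
	}
}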




[PATCH v2 7/7] eventfs: Get rid of dentry pointers without refcounts

2024-01-31 Thread Steven Rostedt
From: Linus Torvalds 

The eventfs inode had pointers to dentries (and child dentries) without
actually holding a refcount on said pointer.  That is fundamentally
broken, and while eventfs tried to then maintain coherence with dentries
going away by hooking into the '.d_iput' callback, that doesn't actually
work since it's not ordered wrt lookups.

There were two reasons why eventfs tried to keep a pointer to a dentry:

 - the creation of a 'events' directory would actually have a stable
   dentry pointer that it created with tracefs_start_creating().

   And it needed that dentry when tearing it all down again in
   eventfs_remove_events_dir().

   This use is actually ok, because the special top-level events
   directory dentries are actually stable, not just a temporary cache of
   the eventfs data structures.

 - the 'eventfs_inode' (aka ei) needs to stay around as long as there
   are dentries that refer to it.

   It then used these dentry pointers as a replacement for doing
   reference counting: it would try to make sure that there was only
   ever one dentry associated with an event_inode, and keep a child
   dentry array around to see which dentries might still refer to the
   parent ei.

This gets rid of the invalid dentry pointer use, and renames the one
valid case to a different name to make it clear that it's not just any
random dentry.

The magic child dentry array that is kind of a "reverse reference list"
is simply replaced by having child dentries take a ref to the ei.  As
do the directory dentries.  That makes the broken use case go away.

Link: 
https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.s...@intel.com/

Cc: sta...@vger.kernel.org
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v1: 
https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-5-torva...@linux-foundation.org

- Put back the kstrdup_const()

- use kfree_rcu(ei, rcu);

- Replace simple_recursive_removal() with d_invalidate().

 fs/tracefs/event_inode.c | 247 ---
 fs/tracefs/internal.h|   7 +-
 2 files changed, 77 insertions(+), 177 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 0213a3375d53..31cbe38739fa 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -62,6 +62,35 @@ enum {
 
 #define EVENTFS_MODE_MASK  (EVENTFS_SAVE_MODE - 1)
 
+/*
+ * eventfs_inode reference count management.
+ *
+ * NOTE! We count only references from dentries, in the
+ * form 'dentry->d_fsdata'. There are also references from
+ * directory inodes ('ti->private'), but the dentry reference
+ * count is always a superset of the inode reference count.
+ */
+static void release_ei(struct kref *ref)
+{
+   struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, 
kref);
+   kfree(ei->entry_attrs);
+   kfree_const(ei->name);
+   kfree_rcu(ei, rcu);
+}
+
+static inline void put_ei(struct eventfs_inode *ei)
+{
+   if (ei)
+   kref_put(&ei->kref, release_ei);
+}
+
+static inline struct eventfs_inode *get_ei(struct eventfs_inode *ei)
+{
+   if (ei)
+   kref_get(&ei->kref);
+   return ei;
+}
+
 static struct dentry *eventfs_root_lookup(struct inode *dir,
  struct dentry *dentry,
  unsigned int flags);
@@ -289,7 +318,8 @@ static void update_inode_attr(struct dentry *dentry, struct 
inode *inode,
  * directory. The inode.i_private pointer will point to @data in the open()
  * call.
  */
-static struct dentry *lookup_file(struct dentry *dentry,
+static struct dentry *lookup_file(struct eventfs_inode *parent_ei,
+ struct dentry *dentry,
  umode_t mode,
  struct eventfs_attr *attr,
  void *data,
@@ -302,7 +332,7 @@ static struct dentry *lookup_file(struct dentry *dentry,
mode |= S_IFREG;
 
if (WARN_ON_ONCE(!S_ISREG(mode)))
-   return NULL;
+   return ERR_PTR(-EIO);
 
inode = tracefs_get_inode(dentry->d_sb);
if (unlikely(!inode))
@@ -321,9 +351,12 @@ static struct dentry *lookup_file(struct dentry *dentry,
ti = get_tracefs(inode);
ti->flags |= TRACEFS_EVENT_INODE;
 
+   // Files have their parent's ei as their fsdata
+   dentry->d_fsdata = get_ei(parent_ei);
+
d_add(dentry, inode);
fsnotify_create(dentry->d_parent->d_inode, dentry);
-   return dentry;
+   return NULL;
 };
 
 /**
@@ -359,22 +392,29 @@ static struct dentry *lookup_dir_entry(struct dentry 
*dentry,
/* Only directories have ti->private set to an ei, not files */
ti->private = ei;
 
-   dentry->d_fsdata = ei;
-ei->dentry = dentry;   // Remove me!
+   

[PATCH v2 5/7] eventfs: Remove unused d_parent pointer field

2024-01-31 Thread Steven Rostedt
From: Linus Torvalds 

It's never used

Link: 
https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.s...@intel.com/

Cc: sta...@vger.kernel.org
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds 
Signed-off-by: Steven Rostedt (Google) 
---
Original-patch: 
https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-4-torva...@linux-foundation.org

 fs/tracefs/event_inode.c | 4 +---
 fs/tracefs/internal.h| 2 --
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 4878f4d578be..0289ec787367 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -686,10 +686,8 @@ struct eventfs_inode *eventfs_create_dir(const char *name, 
struct eventfs_inode
INIT_LIST_HEAD(>list);
 
    mutex_lock(&eventfs_mutex);
-   if (!parent->is_freed) {
+   if (!parent->is_freed)
    list_add_tail(&ei->list, &parent->children);
-   ei->d_parent = parent->dentry;
-   }
    mutex_unlock(&eventfs_mutex);
 
/* Was the parent freed? */
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 09037e2c173d..932733a2696a 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -36,7 +36,6 @@ struct eventfs_attr {
  * @name:  the name of the directory to create
  * @children:  link list into the child eventfs_inode
  * @dentry: the dentry of the directory
- * @d_parent:   pointer to the parent's dentry
  * @d_children: The array of dentries to represent the files when created
  * @entry_attrs: Saved mode and ownership of the @d_children
  * @attr:  Saved mode and ownership of eventfs_inode itself
@@ -51,7 +50,6 @@ struct eventfs_inode {
const char  *name;
struct list_headchildren;
struct dentry   *dentry; /* Check is_freed to access */
-   struct dentry   *d_parent;
struct dentry   **d_children;
struct eventfs_attr *entry_attrs;
struct eventfs_attr attr;
-- 
2.43.0





[PATCH v2 4/7] tracefs: dentry lookup crapectomy

2024-01-31 Thread Steven Rostedt
From: Linus Torvalds 

The dentry lookup for eventfs files was very broken, and had lots of
signs of the old situation where the filesystem names were all created
statically in the dentry tree, rather than being looked up dynamically
based on the eventfs data structures.

You could see it in the naming - how it claimed to "create" dentries
rather than just look up the dentries that were given it.

You could see it in various nonsensical and very incorrect operations,
like using "simple_lookup()" on the dentries that were passed in, which
only results in those dentries becoming negative dentries.  Which meant
that any other lookup would possibly return ENOENT if it saw that
negative dentry before the data was later filled in.

You could see it in the immense amount of nonsensical code that didn't
actually just do lookups.

Link: 
https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.s...@intel.com/

Cc: sta...@vger.kernel.org
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v1: 
https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-3-torva...@linux-foundation.org

- Fixed the lookup case of not found dentry, to return an error.
  This was added in a later patch when it should have been in this one.

- Removed the calls to eventfs_{start,end,failed}_creating()

 fs/tracefs/event_inode.c | 285 ---
 fs/tracefs/inode.c   |  69 --
 fs/tracefs/internal.h|   3 -
 3 files changed, 58 insertions(+), 299 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index e9819d719d2a..4878f4d578be 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -230,7 +230,6 @@ static struct eventfs_inode *eventfs_find_events(struct 
dentry *dentry)
 {
struct eventfs_inode *ei;
 
-   mutex_lock(&eventfs_mutex);
do {
// The parent is stable because we do not do renames
dentry = dentry->d_parent;
@@ -247,7 +246,6 @@ static struct eventfs_inode *eventfs_find_events(struct 
dentry *dentry)
}
// Walk upwards until you find the events inode
} while (!ei->is_events);
-   mutex_unlock(&eventfs_mutex);
 
update_top_events_attr(ei, dentry->d_sb);
 
@@ -280,11 +278,10 @@ static void update_inode_attr(struct dentry *dentry, 
struct inode *inode,
 }
 
 /**
- * create_file - create a file in the tracefs filesystem
- * @name: the name of the file to create.
+ * lookup_file - look up a file in the tracefs filesystem
+ * @dentry: the dentry to look up
  * @mode: the permission that the file should have.
  * @attr: saved attributes changed by user
- * @parent: parent dentry for this file.
  * @data: something that the caller will want to get to later on.
  * @fop: struct file_operations that should be used for this file.
  *
@@ -292,13 +289,13 @@ static void update_inode_attr(struct dentry *dentry, 
struct inode *inode,
  * directory. The inode.i_private pointer will point to @data in the open()
  * call.
  */
-static struct dentry *create_file(const char *name, umode_t mode,
+static struct dentry *lookup_file(struct dentry *dentry,
+ umode_t mode,
  struct eventfs_attr *attr,
- struct dentry *parent, void *data,
+ void *data,
  const struct file_operations *fop)
 {
struct tracefs_inode *ti;
-   struct dentry *dentry;
struct inode *inode;
 
if (!(mode & S_IFMT))
@@ -307,15 +304,9 @@ static struct dentry *create_file(const char *name, 
umode_t mode,
if (WARN_ON_ONCE(!S_ISREG(mode)))
return NULL;
 
-   WARN_ON_ONCE(!parent);
-   dentry = eventfs_start_creating(name, parent);
-
-   if (IS_ERR(dentry))
-   return dentry;
-
inode = tracefs_get_inode(dentry->d_sb);
if (unlikely(!inode))
-   return eventfs_failed_creating(dentry);
+   return ERR_PTR(-ENOMEM);
 
/* If the user updated the directory's attributes, use them */
update_inode_attr(dentry, inode, attr, mode);
@@ -329,32 +320,29 @@ static struct dentry *create_file(const char *name, 
umode_t mode,
 
ti = get_tracefs(inode);
ti->flags |= TRACEFS_EVENT_INODE;
-   d_instantiate(dentry, inode);
+
+   d_add(dentry, inode);
fsnotify_create(dentry->d_parent->d_inode, dentry);
-   return eventfs_end_creating(dentry);
+   return dentry;
 };
 
 /**
- * create_dir - create a dir in the tracefs filesystem
+ * lookup_dir_entry - look up a dir in the tracefs filesystem
+ * @dentry: the directory to look up
  * @ei: the eventfs_inode that represents the directory to create
- * @parent: parent dentry for this file.
  *
- * This function will create a dentry for a 

[PATCH v2 6/7] eventfs: Clean up dentry ops and add revalidate function

2024-01-31 Thread Steven Rostedt
From: Linus Torvalds 

In order for the dentries to stay up-to-date with the eventfs changes,
just add a 'd_revalidate' function that checks the 'is_freed' bit.

Also, clean up the dentry release to actually use d_release() rather
than the slightly odd d_iput() function.  We don't care about the inode,
all we want to do is to get rid of the refcount to the eventfs data
added by dentry->d_fsdata.

It would probably be cleaner to make eventfs its own filesystem, or at
least set its own dentry ops when looking up eventfs files.  But as it
is, only eventfs dentries use d_fsdata, so we don't really need to split
these things up by use.

Another thing that might be worth doing is to make all eventfs lookups
mark their dentries as not worth caching.  We could do that with
d_delete(), but the DCACHE_DONTCACHE flag would likely be even better.

As it is, the dentries are all freeable, but they only tend to get freed
at memory pressure rather than more proactively.  But that's a separate
issue.
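
For illustration, a minimal sketch of the d_delete() alternative mentioned
above (not part of this patch; it reuses the ops added below):

/* Returning 1 tells the VFS to drop the dentry on the final dput()
 * instead of keeping it cached until memory pressure. */
static int eventfs_d_delete(const struct dentry *dentry)
{
	return 1;
}

static const struct dentry_operations eventfs_dentry_ops_sketch = {
	.d_revalidate	= tracefs_d_revalidate,
	.d_release	= tracefs_d_release,
	.d_delete	= eventfs_d_delete,
};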

Link: 
https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.s...@intel.com/

Cc: sta...@vger.kernel.org
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds 
Signed-off-by: Steven Rostedt (Google) 
---
Original-patch: 
https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-6-torva...@linux-foundation.org

 fs/tracefs/event_inode.c |  5 ++---
 fs/tracefs/inode.c   | 27 ++-
 fs/tracefs/internal.h|  3 ++-
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 0289ec787367..0213a3375d53 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -378,13 +378,12 @@ static void free_ei(struct eventfs_inode *ei)
 }
 
 /**
- * eventfs_set_ei_status_free - remove the dentry reference from an 
eventfs_inode
- * @ti: the tracefs_inode of the dentry
+ * eventfs_d_release - dentry is going away
  * @dentry: dentry which has the reference to remove.
  *
  * Remove the association between a dentry from an eventfs_inode.
  */
-void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry 
*dentry)
+void eventfs_d_release(struct dentry *dentry)
 {
struct eventfs_inode *ei;
int i;
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 5c84460feeeb..d65ffad4c327 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -377,21 +377,30 @@ static const struct super_operations 
tracefs_super_operations = {
.show_options   = tracefs_show_options,
 };
 
-static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
+/*
+ * It would be cleaner if eventfs had its own dentry ops.
+ *
+ * Note that d_revalidate is called potentially under RCU,
+ * so it can't take the eventfs mutex etc. It's fine - if
+ * we open a file just as it's marked dead, things will
+ * still work just fine, and just see the old stale case.
+ */
+static void tracefs_d_release(struct dentry *dentry)
 {
-   struct tracefs_inode *ti;
+   if (dentry->d_fsdata)
+   eventfs_d_release(dentry);
+}
 
-   if (!dentry || !inode)
-   return;
+static int tracefs_d_revalidate(struct dentry *dentry, unsigned int flags)
+{
+   struct eventfs_inode *ei = dentry->d_fsdata;
 
-   ti = get_tracefs(inode);
-   if (ti && ti->flags & TRACEFS_EVENT_INODE)
-   eventfs_set_ei_status_free(ti, dentry);
-   iput(inode);
+   return !(ei && ei->is_freed);
 }
 
 static const struct dentry_operations tracefs_dentry_operations = {
-   .d_iput = tracefs_dentry_iput,
+   .d_revalidate = tracefs_d_revalidate,
+   .d_release = tracefs_d_release,
 };
 
 static int trace_fill_super(struct super_block *sb, void *data, int silent)
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 932733a2696a..4b50a0668055 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -78,6 +78,7 @@ struct dentry *tracefs_start_creating(const char *name, 
struct dentry *parent);
 struct dentry *tracefs_end_creating(struct dentry *dentry);
 struct dentry *tracefs_failed_creating(struct dentry *dentry);
 struct inode *tracefs_get_inode(struct super_block *sb);
-void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry 
*dentry);
+
+void eventfs_d_release(struct dentry *dentry);
 
 #endif /* _TRACEFS_INTERNAL_H */
-- 
2.43.0





[PATCH v2 3/7] tracefs: Avoid using the ei->dentry pointer unnecessarily

2024-01-31 Thread Steven Rostedt
From: Linus Torvalds 

The eventfs_find_events() code tries to walk up the tree to find the
event directory that a dentry belongs to, in order to then find the
eventfs inode that is associated with that event directory.

However, it uses an odd combination of walking the dentry parent,
looking up the eventfs inode associated with that, and then looking up
the dentry from there.  Repeat.

But the code shouldn't have back-pointers to dentries in the first
place, and it should just walk the dentry parenthood chain directly.

Similarly, 'set_top_events_ownership()' looks up the dentry from the
eventfs inode, but the only reason it wants a dentry is to look up the
superblock in order to look up the root dentry.

But it already has the real filesystem inode, which has that same
superblock pointer.  So just pass in the superblock pointer using the
information that's already there, instead of looking up extraneous data
that is irrelevant.

Link: 
https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.s...@intel.com/

Cc: sta...@vger.kernel.org
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds 
Signed-off-by: Steven Rostedt (Google) 
---
Original patch: 
https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-1-torva...@linux-foundation.org

 fs/tracefs/event_inode.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 824b1811e342..e9819d719d2a 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -156,33 +156,30 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, 
struct dentry *dentry,
return ret;
 }
 
-static void update_top_events_attr(struct eventfs_inode *ei, struct dentry 
*dentry)
+static void update_top_events_attr(struct eventfs_inode *ei, struct 
super_block *sb)
 {
-   struct inode *inode;
+   struct inode *root;
 
/* Only update if the "events" was on the top level */
if (!ei || !(ei->attr.mode & EVENTFS_TOPLEVEL))
return;
 
/* Get the tracefs root inode. */
-   inode = d_inode(dentry->d_sb->s_root);
-   ei->attr.uid = inode->i_uid;
-   ei->attr.gid = inode->i_gid;
+   root = d_inode(sb->s_root);
+   ei->attr.uid = root->i_uid;
+   ei->attr.gid = root->i_gid;
 }
 
 static void set_top_events_ownership(struct inode *inode)
 {
struct tracefs_inode *ti = get_tracefs(inode);
struct eventfs_inode *ei = ti->private;
-   struct dentry *dentry;
 
/* The top events directory doesn't get automatically updated */
if (!ei || !ei->is_events || !(ei->attr.mode & EVENTFS_TOPLEVEL))
return;
 
-   dentry = ei->dentry;
-
-   update_top_events_attr(ei, dentry);
+   update_top_events_attr(ei, inode->i_sb);
 
if (!(ei->attr.mode & EVENTFS_SAVE_UID))
inode->i_uid = ei->attr.uid;
@@ -235,8 +232,10 @@ static struct eventfs_inode *eventfs_find_events(struct 
dentry *dentry)
 
    mutex_lock(&eventfs_mutex);
do {
-   /* The parent always has an ei, except for events itself */
-   ei = dentry->d_parent->d_fsdata;
+   // The parent is stable because we do not do renames
+   dentry = dentry->d_parent;
+   // ... and directories always have d_fsdata
+   ei = dentry->d_fsdata;
 
/*
 * If the ei is being freed, the ownership of the children
@@ -246,12 +245,11 @@ static struct eventfs_inode *eventfs_find_events(struct 
dentry *dentry)
ei = NULL;
break;
}
-
-   dentry = ei->dentry;
+   // Walk upwards until you find the events inode
} while (!ei->is_events);
    mutex_unlock(&eventfs_mutex);
 
-   update_top_events_attr(ei, dentry);
+   update_top_events_attr(ei, dentry->d_sb);
 
return ei;
 }
-- 
2.43.0





[PATCH v2 2/7] eventfs: Initialize the tracefs inode properly

2024-01-31 Thread Steven Rostedt
From: Linus Torvalds 

The tracefs-specific fields in the inode were not initialized before the
inode was exposed to others through the dentry with 'd_instantiate()'.

Move the field initializations up to before the d_instantiate.

Cc: sta...@vger.kernel.org
Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.s...@intel.com
Signed-off-by: Linus Torvalds 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since v1: 
https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-2-torva...@linux-foundation.org

-  Since another patch zeroed out the entire tracefs_inode, there's no need
   to initialize any of its fields to NULL.

 fs/tracefs/event_inode.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 1c3dd0ad4660..824b1811e342 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -370,6 +370,8 @@ static struct dentry *create_dir(struct eventfs_inode *ei, 
struct dentry *parent
 
ti = get_tracefs(inode);
ti->flags |= TRACEFS_EVENT_INODE;
+   /* Only directories have ti->private set to an ei, not files */
+   ti->private = ei;
 
inc_nlink(inode);
d_instantiate(dentry, inode);
@@ -515,7 +517,6 @@ create_file_dentry(struct eventfs_inode *ei, int idx,
 static void eventfs_post_create_dir(struct eventfs_inode *ei)
 {
struct eventfs_inode *ei_child;
-   struct tracefs_inode *ti;
 
lockdep_assert_held(_mutex);
 
@@ -525,9 +526,6 @@ static void eventfs_post_create_dir(struct eventfs_inode 
*ei)
 srcu_read_lock_held(&eventfs_srcu)) {
ei_child->d_parent = ei->dentry;
}
-
-   ti = get_tracefs(ei->dentry->d_inode);
-   ti->private = ei;
 }
 
 /**
-- 
2.43.0





[PATCH v2 0/7] eventfs: Rewrite to simplify the code (aka: crapectomy)

2024-01-31 Thread Steven Rostedt


Linus took the time to massively clean up the eventfs logic.
I took his code and made tweaks to represent some of the feedback
from Al Viro and also fix issues that came up in testing.

The diff between v1 and this can be found here:
  
https://lore.kernel.org/linux-trace-kernel/20240131105847.3e9af...@gandalf.local.home/
 
  Although the first patch I changed to use memset_after() since
  that update.

I would like to have this entire series go all the way back to 6.6 (after it
is accepted in mainline of course) and replace everything since the creation
of the eventfs code.  That is, stable releases may need to add all the
patches that are in fs/tracefs to make that happen. The reason being that
this rewrite likely fixed a lot of hidden bugs and I honestly believe it's
more stable than the code that currently exists.

Note, there's more clean ups that can happen. One being cleaning up
the eventfs_inode structure. But that's not critical now and can be
added later.

This made it through one round of my testing. I'm going to run it
again but with the part of testing that also runs some tests on
each patch in the series to make sure it doesn't break bisection.

In Linus's first version, patch 5 broke some of the tests but was fixed
in patch 6. I swapped the order and moved patch 6 before patch 5
and it appears to work. I still need to run this through all
my testing again.

Version 1 is at: 
https://lore.kernel.org/linux-trace-kernel/20240130190355.11486-1-torva...@linux-foundation.org/



Linus Torvalds (6):
  eventfs: Initialize the tracefs inode properly
  tracefs: Avoid using the ei->dentry pointer unnecessarily
  tracefs: dentry lookup crapectomy
  eventfs: Remove unused 'd_parent' pointer field
  eventfs: Clean up dentry ops and add revalidate function
  eventfs: Get rid of dentry pointers without refcounts

Steven Rostedt (Google) (1):
  tracefs: Zero out the tracefs_inode when allocating it


 fs/tracefs/event_inode.c | 551 ---
 fs/tracefs/inode.c   | 102 ++---
 fs/tracefs/internal.h|  18 +-
 3 files changed, 167 insertions(+), 504 deletions(-)



[PATCH v2 1/7] tracefs: Zero out the tracefs_inode when allocating it

2024-01-31 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

eventfs uses the tracefs_inode and assumes that it's already initialized
to zero. That is, it doesn't set fields to zero (like ti->private) after
getting its tracefs_inode. This causes bugs due to stale values.

Just initialize the entire structure to zero on allocation so there aren't
any more surprises.

This is a partial fix for the access to ti->private. The assignment still needs
to be made before the dentry is instantiated.

Cc: sta...@vger.kernel.org
Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.s...@intel.com
Suggested-by: Linus Torvalds 
Signed-off-by: Steven Rostedt (Google) 
---
Changes since last version: 
https://lore.kernel.org/all/20240130230612.377a1...@gandalf.local.home/

- Moved vfs_inode to top of tracefs_inode structure so that the rest can
  be initialized with memset_after() as the vfs_inode portion is already
  cleared with a memset() itself in inode_init_once().
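
For readers unfamiliar with the helper: per its include/linux/string.h
definition, memset_after(ti, 0, vfs_inode) roughly expands to

	memset((u8 *)ti + offsetofend(struct tracefs_inode, vfs_inode), 0,
	       sizeof(*ti) - offsetofend(struct tracefs_inode, vfs_inode));

which is why vfs_inode must be the first member.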

 fs/tracefs/inode.c| 6 --
 fs/tracefs/internal.h | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index e1b172c0e091..888e42087847 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -38,8 +38,6 @@ static struct inode *tracefs_alloc_inode(struct super_block 
*sb)
if (!ti)
return NULL;
 
-   ti->flags = 0;
-
    return &ti->vfs_inode;
 }
 
@@ -779,7 +777,11 @@ static void init_once(void *foo)
 {
struct tracefs_inode *ti = (struct tracefs_inode *) foo;
 
+   /* inode_init_once() calls memset() on the vfs_inode portion */
    inode_init_once(&ti->vfs_inode);
+
+   /* Zero out the rest */
+   memset_after(ti, 0, vfs_inode);
 }
 
 static int __init tracefs_init(void)
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 91c2bf0b91d9..7d84349ade87 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -11,9 +11,10 @@ enum {
 };
 
 struct tracefs_inode {
+   struct inode    vfs_inode;
+   /* The below gets initialized with memset_after(ti, 0, vfs_inode) */
 unsigned long   flags;
 void    *private;
-   struct inode    vfs_inode;
 };
 
 /*
-- 
2.43.0





Re: [PATCH v2 4/4] remoteproc: stm32: Add support of an OP-TEE TA to load the firmware

2024-01-31 Thread Mathieu Poirier
On Tue, Jan 30, 2024 at 10:13:48AM +0100, Arnaud POULIQUEN wrote:
> 
> 
> On 1/26/24 18:11, Mathieu Poirier wrote:
> > On Thu, Jan 18, 2024 at 11:04:33AM +0100, Arnaud Pouliquen wrote:
> >> The new TEE remoteproc device is used to manage remote firmware in a
> >> secure, trusted context. The 'st,stm32mp1-m4-tee' compatibility is
> >> introduced to delegate the loading of the firmware to the trusted
> >> execution context. In such cases, the firmware should be signed and
> >> adhere to the image format defined by the TEE.
> >>
> >> Signed-off-by: Arnaud Pouliquen 
> >> ---
> >> V1 to V2 update:
> >> - remove the select "TEE_REMOTEPROC" in STM32_RPROC config as detected by
> >>   the kernel test robot:
> >>  WARNING: unmet direct dependencies detected for TEE_REMOTEPROC
> >>  Depends on [n]: REMOTEPROC [=y] && OPTEE [=n]
> >>  Selected by [y]:
> >>  - STM32_RPROC [=y] && (ARCH_STM32 || COMPILE_TEST [=y]) && REMOTEPROC 
> >> [=y]
> >> - Fix initialized trproc variable in  stm32_rproc_probe
> >> ---
> >>  drivers/remoteproc/stm32_rproc.c | 149 +--
> >>  1 file changed, 144 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/remoteproc/stm32_rproc.c 
> >> b/drivers/remoteproc/stm32_rproc.c
> >> index fcc0001e2657..cf6a21bac945 100644
> >> --- a/drivers/remoteproc/stm32_rproc.c
> >> +++ b/drivers/remoteproc/stm32_rproc.c
> >> @@ -20,6 +20,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>  #include 
> >>  
> >>  #include "remoteproc_internal.h"
> >> @@ -49,6 +50,9 @@
> >>  #define M4_STATE_STANDBY  4
> >>  #define M4_STATE_CRASH5
> >>  
> >> +/* Remote processor unique identifier aligned with the Trusted Execution 
> >> Environment definitions */
> >> +#define STM32_MP1_M4_PROC_ID0
> >> +
> >>  struct stm32_syscon {
> >>struct regmap *map;
> >>u32 reg;
> >> @@ -90,6 +94,8 @@ struct stm32_rproc {
> >>struct stm32_mbox mb[MBOX_NB_MBX];
> >>struct workqueue_struct *workqueue;
> >>bool hold_boot_smc;
> >> +  bool fw_loaded;
> >> +  struct tee_rproc *trproc;
> >>void __iomem *rsc_va;
> >>  };
> >>  
> >> @@ -257,6 +263,91 @@ static int stm32_rproc_release(struct rproc *rproc)
> >>return err;
> >>  }
> >>  
> >> +static int stm32_rproc_tee_elf_sanity_check(struct rproc *rproc,
> >> +  const struct firmware *fw)
> >> +{
> >> +  struct stm32_rproc *ddata = rproc->priv;
> >> +  unsigned int ret = 0;
> >> +
> >> +  if (rproc->state == RPROC_DETACHED)
> >> +  return 0;
> >> +
> >> +  ret = tee_rproc_load_fw(ddata->trproc, fw);
> >> +  if (!ret)
> >> +  ddata->fw_loaded = true;
> >> +
> >> +  return ret;
> >> +}
> >> +
> >> +static int stm32_rproc_tee_elf_load(struct rproc *rproc,
> >> +  const struct firmware *fw)
> >> +{
> >> +  struct stm32_rproc *ddata = rproc->priv;
> >> +  unsigned int ret;
> >> +
> >> +  /*
> >> +   * This function can be called by remote proc for recovery
> >> +   * without the sanity check. In this case we need to load the firmware
> >> +   * else nothing done here as the firmware has been preloaded for the
> >> +   * sanity check to be able to parse it for the resource table.
> >> +   */
> > 
> > This comment is very confusing - please consider refactoring.  
> > 
> >> +  if (ddata->fw_loaded)
> >> +  return 0;
> >> +
> > 
> > I'm not sure about keeping a flag to indicate the status of the loaded 
> > firmware.
> > It is not done for the non-secure method, I don't see why it would be 
> > needed for
> > the secure one.
> > 
> 
> The difference is on the sanity check.
> - in rproc_elf_sanity_check we parse the elf file to verify that it is
> valid.
> - in stm32_rproc_tee_elf_sanity_check we have to do the same, that means to
> authenticate it. The authentication is done during the load.
> 
> So this flag is used to avoid reloading it twice.
> Refactoring the comment should help to understand this flag.
> 
> 
> An alternative would be to bypass the sanity check. But this leads to the same
> limitation.
> Before loading the firmware in remoteproc_core, we call rproc_parse_fw() that 
> is
> used to get the resource table address. To get it from tee we need to
> authenticate the firmware so load it...
>

I spent a long time thinking about this patchset.  Looking at the code as it
is now, request_firmware() in rproc_boot() is called even when the TEE is
responsible for loading the firmware.  There should be some conditional code
that calls either request_firmware() or tee_rproc_load_fw().  The latter should
also be renamed to tee_rproc_request_firmware() to avoid confusion.
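
A sketch of that conditional (illustrative only: tee_rproc_request_firmware()
is the rename proposed above and its signature here is assumed):

static int stm32_rproc_request_fw(struct rproc *rproc,
				  const struct firmware **fw)
{
	struct stm32_rproc *ddata = rproc->priv;

	/* Let the TEE fetch and authenticate the image when it owns loading */
	if (ddata->trproc)
		return tee_rproc_request_firmware(ddata->trproc,
						  rproc->firmware, fw);

	return request_firmware(fw, rproc->firmware, &rproc->dev);
}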

I touched on that before but please rename rproc_tee_get_rsc_table() to
rproc_tee_elf_load_rsc_table().  I also suggest to introduce a new function,
rproc_tee_get_loaded_rsc_table() that would be called from
rproc_tee_elf_load_rsc_table().  That way we don't need trproc->rsc_va.  

I also think tee_rproc should be renamed to 

Re: [PATCH 5/6] eventfs: get rid of dentry pointers without refcounts

2024-01-31 Thread Steven Rostedt
On Wed, 31 Jan 2024 10:08:37 -0800
Linus Torvalds  wrote:

> On Wed, 31 Jan 2024 at 05:14, Steven Rostedt  wrote:
> >
> > If you also notice, tracefs only allows mkdir/rmdir to be assigned to
> > one directory. Once it is assigned, no other directories can have
> > mkdir/rmdir functionality.  
> 
> I think that does limit the damage, but it's not clear that it is actually 
> safe.
> 
> Because you don't hold the inode lock, somebody could come in and do a
> mkdir inside the other one that is being removed, ie something like
> 
>  - thread 1 took the inode lock, called ->rmdir
> 
>  - it then drops the inode lock (both parent and the directory that is
> getting removed) and gets the event lock
> 
>  - but thread 2 can come in between that inode lock drop and event lock
> 
> Notice: thread 2 comes in where there is *no* locking. Nada. Zilch.
> 
> This is what worries me.

Yep, and that was my biggest concern too, which is why I have stress tests
that try to hit the above scenario: rmdir racing with other accesses,
including other mkdirs of the same name.

As my knowledge of inode lifetimes is still limited, my main concern was
just corruption of the dentries/inodes themselves. But the first one to get
the event_mutex determines the state of the file system.

If thread 1 is doing rmdir, what would thread 2 do that can harm it?

The rmdir calls trace_remove(), which basically tries to remove the
directory again, and hopefully has the proper locking, just like the kprobe
trace event removal that deletes directories. The directory can still have
references on it.

Now if something were to get a reference and a valid dentry, then as soon
as the open function is called, the tracing logic will see that the
trace_array no longer exists and return an error.

All the open functions for files that are created in an instance (and that
includes eventfs files) have a check to see if the inode->i_private data is
still valid. The trace_array structure represents the directory, and
there's a linked list of all the trace_array structures that is protected
by the trace_types_lock. The check grabs that lock and iterates the list to
see if the passed-in trace_array is on it. If it is, it ups the ref count
(preventing a rmdir from succeeding) and returns it. If it is not, it
returns NULL and the open call fails as if it had opened a nonexistent
file.
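
A minimal sketch of that lookup, modeled on trace_array_get() in
kernel/trace/trace.c (simplified here for illustration, not the exact
upstream code):

        static int trace_array_get(struct trace_array *this_tr)
        {
                struct trace_array *tr;
                int ret = -ENODEV;

                mutex_lock(&trace_types_lock);
                list_for_each_entry(tr, &ftrace_trace_arrays, list) {
                        if (tr == this_tr) {
                                tr->ref++;      /* blocks a later rmdir */
                                ret = 0;
                                break;
                        }
                }
                mutex_unlock(&trace_types_lock);

                return ret;
        }

If the lookup fails, the open returns an error instead of touching a
trace_array that rmdir may already be tearing down.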

> 
> But it does help that it's all only in *one* directory.  At least
> another mkdir cannot happen *inside* the one that is going away while
> the locks are not held. So the good news is that it does mean that
> there's only one point that is being protected.
> 
> But I do worry about things like this (in vfs_rmdir()):
> 
>         inode_lock(dentry->d_inode);
>
>         error = -EBUSY;
>         if (is_local_mountpoint(dentry) ||
>             (dentry->d_inode->i_flags & S_KERNEL_FILE))
>                 goto out;
>
>         error = security_inode_rmdir(dir, dentry);
>         if (error)
>                 goto out;
>
>         error = dir->i_op->rmdir(dir, dentry);
>         if (error)
>                 goto out;
> 
> notice how it does that "is this a mount point" test atomically wrt
> the rmdir before it is allowed to proceed.

You mean if someone did:

 # mkdir instances/foo
 # rmdir instances/foo

and at the same time, someone else did

 # mount -t ext4 /dev/sda instances/foo

?

OK, I never thought of that use case. Although, I think if someone is
trying to mount anything in tracefs, they can keep the pieces. ;-)

> 
> And I do think that the inode lock is what also protects it from
> concurrent mounts. So now what happens when that "thread 2" above
> comes in while there is *no* locking, and mounts something there?
> 
> Now, I'm not saying this is a huge problem. But it's very much an
> example of a thing that *could* be a problem. Dropping locks in the
> middle is just very scary.

No arguments from me. I really didn't like the dropping of the locks, and
tried hard to avoid it. If switching over to kernfs can solve that, I'd let
that conversion happen.

I'm all for someone switching tracefs over to kernfs if it solves all these
unknown bugs, as long as it doesn't hurt the memory savings of eventfs. But
again, I'm also willing to make eventfs its own file system (although I
don't have the time yet to do that) where tracefs isn't burdened by it.

-- Steve



Re: [ANNOUNCE] 5.10.204-rt100

2024-01-31 Thread Pavel Machek
Hi!

> > We (as in the CIP project) are trying to do -cip-rt releases
> > once a month. Are there any plans for a 5.10-rt release any time soon?
> > That would help us ;-).
> 
> I already pushed v5.10-rt-next (containing v5.10.209-rt101-rc1) to
> kernel.org and kernelci should pick that up for comprehensive testing
> within the next hour. As soon as the testing is done I will perform the
> release dance.
> 
> My vacation started (abruptly) a few days earlier than I had planned, and
> that led to some delays. People volunteered to run the builds if anything
> critical popped up, but that was not the case.
> 
> Sorry for the inconvenience, I do hope a release tomorrow or Friday does
> not disrupt your workflow too much.

No problem, thanks for the information, and looking forward to the
release.

Best regards,
Pavel
-- 
DENX Software Engineering GmbH, Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

