Re: [PATCH v14 4/6] tracing: Allow user-space mapping of the ring-buffer

2024-02-06 Thread kernel test robot
Hi Vincent,

kernel test robot noticed the following build warnings:

[auto build test WARNING on e2412e51fdea837b50ce31fea8e5dfc885237f3a]

url:
https://github.com/intel-lab-lkp/linux/commits/Vincent-Donnefort/ring-buffer-Zero-ring-buffer-sub-buffers/20240206-004413
base:   e2412e51fdea837b50ce31fea8e5dfc885237f3a
patch link:
https://lore.kernel.org/r/20240205163410.2296552-5-vdonnefort%40google.com
patch subject: [PATCH v14 4/6] tracing: Allow user-space mapping of the ring-buffer
config: arc-defconfig (https://download.01.org/0day-ci/archive/20240206/202402061809.t2cv9j8w-...@intel.com/config)
compiler: arc-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240206/202402061809.t2cv9j8w-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202402061809.t2cv9j8w-...@intel.com/

All warnings (new ones prefixed by >>):

   kernel/trace/trace.c: In function 'tracing_buffers_mmap_close':
>> kernel/trace/trace.c:8722:29: warning: unused variable 'tr' [-Wunused-variable]
    8722 |         struct trace_array *tr = iter->tr;
         |                             ^~
   kernel/trace/trace.c: In function 'tracing_buffers_mmap':
   kernel/trace/trace.c:8752:29: warning: unused variable 'tr' [-Wunused-variable]
    8752 |         struct trace_array *tr = iter->tr;
         |                             ^~


vim +/tr +8722 kernel/trace/trace.c

  8717  
  8718  static void tracing_buffers_mmap_close(struct vm_area_struct *vma)
  8719  {
  8720  struct ftrace_buffer_info *info = vma->vm_file->private_data;
  8721  struct trace_iterator *iter = &info->iter;
> 8722  struct trace_array *tr = iter->tr;
  8723  
  8724  ring_buffer_unmap(iter->array_buffer->buffer, iter->cpu_file);
  8725  
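A minimal sketch of one way to silence the warning, assuming 'tr' is only
dereferenced by code under CONFIG_TRACER_MAX_TRACE in the full function (the
actual fix in a later revision may differ):

  static void tracing_buffers_mmap_close(struct vm_area_struct *vma)
  {
          struct ftrace_buffer_info *info = vma->vm_file->private_data;
          struct trace_iterator *iter = &info->iter;

          ring_buffer_unmap(iter->array_buffer->buffer, iter->cpu_file);

  #ifdef CONFIG_TRACER_MAX_TRACE
          {
                  /* Scope 'tr' to its only user, so configurations that
                   * compile this block out never see the variable. */
                  struct trace_array *tr = iter->tr;

                  /* ... snapshot accounting that uses 'tr' ... */
                  (void)tr;
          }
  #endif
  }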

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Re: [PATCH v14 4/6] tracing: Allow user-space mapping of the ring-buffer

2024-02-05 Thread Vincent Donnefort
On Mon, Feb 05, 2024 at 01:44:47PM -0500, Mathieu Desnoyers wrote:
> On 2024-02-05 13:34, Vincent Donnefort wrote:
> > On Mon, Feb 05, 2024 at 11:55:08AM -0500, Mathieu Desnoyers wrote:
> [...]
> 
> > > 
> > > How are the kernel linear mapping and the userspace mapping made coherent
> > > on architectures with virtually aliasing data caches?
> > > 
> > > Ref. 
> > > https://lore.kernel.org/lkml/20240202210019.88022-1-mathieu.desnoy...@efficios.com/T/#t
> > 
> > Hi Mathieu,
> > 
> > Thanks for the pointer.
> > 
> > We have the exact same problem as DAX. We do modify the data through the
> > kernel linear mapping while user-space can read it through its own. I should
> > probably return an error when used on any of the archs ARM || SPARC || MIPS,
> > until cpu_dcache_is_aliasing() introduces a fine-grained differentiation.
> 
> You might want to use LTTng's ring buffer approach instead. See
> 
> https://github.com/lttng/lttng-modules/blob/master/src/lib/ringbuffer/ring_buffer_frontend.c#L1202
> 
> lib_ring_buffer_flush_read_subbuf_dcache()

Thanks!

> 
> Basically, whenever user-space grabs a sub-buffer for reading (through
> lttng-modules's LTTNG_KERNEL_ABI_RING_BUFFER_GET_SUBBUF ioctl), lttng
> calls flush_dcache_page() on all pages of this subbuffer (I should
> > really change this to a call to flush_dcache_folio(), which would be
> more efficient).
> 
> Note that doing this is not very efficient on architectures which have
> coherent data caches and incoherent dcache vs icache: in that case,
> we issue the flush_dcache_page uselessly. I plan on using the new
> cpu_dcache_is_aliasing() check once/if it makes its way upstream to
> remove those useless flushes on architectures which define
> ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, but do not virtually alias the
> data cache.

I believe the aim is to use the mapping by default in libtracefs and fall back
to splice whenever it is not available...  But for those archs, I guess that
might be a mistake. I wonder if it isn't just better to return ENOTSUPP then?
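A rough sketch of such a gate, purely as an assumption of what it could look
like before cpu_dcache_is_aliasing() exists (the real check may end up in a
different spot, e.g. inside ring_buffer_map()):

  static int tracing_buffers_mmap(struct file *filp, struct vm_area_struct *vma)
  {
          /*
           * Hypothetical guard: on virtually aliasing data caches the user
           * mapping can see stale lines for data written through the kernel
           * linear mapping, so refuse the mmap() entirely.
           */
          if (IS_ENABLED(CONFIG_ARM) || IS_ENABLED(CONFIG_SPARC) ||
              IS_ENABLED(CONFIG_MIPS))
                  return -ENOTSUPP;

          /* ... set up the mapping as in the patch ... */
          return 0;
  }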

> 
> The equivalent of LTTng's "get subbuf" operation would be
> the new TRACE_MMAP_IOCTL_GET_READER ioctl in ftrace AFAIU.

That is correct!

> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
> 



Re: [PATCH v14 4/6] tracing: Allow user-space mapping of the ring-buffer

2024-02-05 Thread Mathieu Desnoyers

On 2024-02-05 13:34, Vincent Donnefort wrote:

On Mon, Feb 05, 2024 at 11:55:08AM -0500, Mathieu Desnoyers wrote:

[...]



How are the kernel linear mapping and the userspace mapping made coherent
on architectures with virtually aliasing data caches?

Ref. 
https://lore.kernel.org/lkml/20240202210019.88022-1-mathieu.desnoy...@efficios.com/T/#t


Hi Mathieu,

Thanks for the pointer.

We have the exact same problem as DAX. We do modify the data through the
kernel linear mapping while user-space can read it through its own. I should
probably return an error when used on any of the archs ARM || SPARC || MIPS,
until cpu_dcache_is_aliasing() introduces a fine-grained differentiation.


You might want to use LTTng's ring buffer approach instead. See

https://github.com/lttng/lttng-modules/blob/master/src/lib/ringbuffer/ring_buffer_frontend.c#L1202

lib_ring_buffer_flush_read_subbuf_dcache()

Basically, whenever user-space grabs a sub-buffer for reading (through
lttng-modules's LTTNG_KERNEL_ABI_RING_BUFFER_GET_SUBBUF ioctl), lttng
calls flush_dcache_page() on all pages of this subbuffer (I should
really change this to a call to flush_dcache_folio(), which would be
more efficient).

Note that doing this is not very efficient on architectures which have
coherent data caches and incoherent dcache vs icache: in that case,
we issue the flush_dcache_page uselessly. I plan on using the new
cpu_dcache_is_aliasing() check once/if it makes its way upstream to
remove those useless flushes on architectures which define
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, but do not virtually alias the
data cache.
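A minimal sketch of that flush, transposed to a plain page array (the helper
name here is hypothetical; LTTng's actual implementation is the
lib_ring_buffer_flush_read_subbuf_dcache() linked above):

  /*
   * Flush the kernel-side dcache footprint of a sub-buffer before user
   * space reads it through its own, possibly aliasing, mapping.
   */
  static void flush_subbuf_dcache(struct page **pages, int nr_pages)
  {
          int i;

          for (i = 0; i < nr_pages; i++)
                  flush_dcache_page(pages[i]);
  }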

The equivalent of LTTng's "get subbuf" operation would be
the new TRACE_MMAP_IOCTL_GET_READER ioctl in ftrace AFAIU.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com




Re: [PATCH v14 4/6] tracing: Allow user-space mapping of the ring-buffer

2024-02-05 Thread Vincent Donnefort
On Mon, Feb 05, 2024 at 11:55:08AM -0500, Mathieu Desnoyers wrote:
> On 2024-02-05 11:34, Vincent Donnefort wrote:
> > Currently, user-space extracts data from the ring-buffer via splice,
> > which is handy for storage or network sharing. However, due to splice
> > limitations, it is impossible to do real-time analysis without a copy.
> > 
> > A solution for that problem is to let user-space map the ring-buffer
> > directly.
> > 
> > The mapping is exposed via the per-CPU file trace_pipe_raw. The first
> > element of the mapping is the meta-page. It is followed by each
> > subbuffer constituting the ring-buffer, ordered by their unique page ID:
> > 
> >    * Meta-page -- see include/uapi/linux/trace_mmap.h for a description
> >* Subbuf ID 0
> >* Subbuf ID 1
> >   ...
> > 
> > It is therefore easy to translate a subbuf ID into an offset in the
> > mapping:
> > 
> >reader_id = meta->reader->id;
> >reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;
> > 
> > When new data is available, the mapper must call a newly introduced ioctl:
> > TRACE_MMAP_IOCTL_GET_READER. This will update the Meta-page reader ID to
> > point to the next reader containing unread data.
> > 
> > Mapping will prevent snapshot and buffer size modifications.
> 
> How are the kernel linear mapping and the userspace mapping made coherent
> on architectures with virtually aliasing data caches?
> 
> Ref. 
> https://lore.kernel.org/lkml/20240202210019.88022-1-mathieu.desnoy...@efficios.com/T/#t

Hi Mathieu,

Thanks for the pointer.

We have the exact same problem as DAX. We do modify the data through the
kernel linear mapping while user-space can read it through its own. I should
probably return an error when used on any of the archs ARM || SPARC || MIPS,
until cpu_dcache_is_aliasing() introduces a fine-grained differentiation.

> 
> Thanks,
> 
> Mathieu
> 
> > 
> > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> > 
> > diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
> > index 182e05a3004a..7330249257e7 100644
> > --- a/include/uapi/linux/trace_mmap.h
> > +++ b/include/uapi/linux/trace_mmap.h
> > @@ -43,4 +43,6 @@ struct trace_buffer_meta {
> > __u64   Reserved2;
> >   };
> > +#define TRACE_MMAP_IOCTL_GET_READER		_IO('T', 0x1)
> > +
> >   #endif /* _TRACE_MMAP_H_ */
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 4ebf4d0bd14c..36b62cf2fb3f 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -1175,6 +1175,12 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
> > return;
> > }
> > +   if (tr->mapped) {
> > +   trace_array_puts(tr, "*** BUFFER MEMORY MAPPED ***\n");
> > +   trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
> > +   return;
> > +   }
> > +
> > local_irq_save(flags);
> > update_max_tr(tr, current, smp_processor_id(), cond_data);
> > local_irq_restore(flags);
> > @@ -1307,7 +1313,7 @@ static int tracing_arm_snapshot_locked(struct trace_array *tr)
> > lockdep_assert_held(&trace_types_lock);
> > spin_lock(&tr->snapshot_trigger_lock);
> > -   if (tr->snapshot == UINT_MAX) {
> > +   if (tr->snapshot == UINT_MAX || tr->mapped) {
> > spin_unlock(&tr->snapshot_trigger_lock);
> > return -EBUSY;
> > }
> > @@ -6533,7 +6539,7 @@ static void tracing_set_nop(struct trace_array *tr)
> >   {
> > if (tr->current_trace == &nop_trace)
> > return;
> > -   
> > +
> > tr->current_trace->enabled--;
> > if (tr->current_trace->reset)
> > @@ -8652,15 +8658,31 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
> > return ret;
> >   }
> > -/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */
> >   static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> >   {
> > struct ftrace_buffer_info *info = file->private_data;
> > struct trace_iterator *iter = &info->iter;
> > +   int err;
> > +
> > +   if (cmd == TRACE_MMAP_IOCTL_GET_READER) {
> > +   if (!(file->f_flags & O_NONBLOCK)) {
> > +   err = ring_buffer_wait(iter->array_buffer->buffer,
> > +  iter->cpu_file,
> > +  iter->tr->buffer_percent);
> > +   if (err)
> > +   return err;
> > +   }
> > -   if (cmd)
> > -   return -ENOIOCTLCMD;
> > +   return ring_buffer_map_get_reader(iter->array_buffer->buffer,
> > + iter->cpu_file);
> > +   } else if (cmd) {
> > +   return -ENOTTY;
> > +   }
> > +   /*
> > +* An ioctl call with cmd 0 to the ring buffer file will wake up all
> > +* waiters
> > +*/
> > mutex_lock(&trace_types_lock);
> > iter->wait_index++;
> > @@ -8673,6 +8695,97 @@ static long tracing_buffers_ioctl(struct file *file, 
> 

Re: [PATCH v14 4/6] tracing: Allow user-space mapping of the ring-buffer

2024-02-05 Thread Mathieu Desnoyers

On 2024-02-05 11:34, Vincent Donnefort wrote:

Currently, user-space extracts data from the ring-buffer via splice,
which is handy for storage or network sharing. However, due to splice
limitations, it is impossible to do real-time analysis without a copy.

A solution for that problem is to let user-space map the ring-buffer
directly.

The mapping is exposed via the per-CPU file trace_pipe_raw. The first
element of the mapping is the meta-page. It is followed by each
subbuffer constituting the ring-buffer, ordered by their unique page ID:

   * Meta-page -- see include/uapi/linux/trace_mmap.h for a description
   * Subbuf ID 0
   * Subbuf ID 1
  ...

It is therefore easy to translate a subbuf ID into an offset in the
mapping:

   reader_id = meta->reader->id;
   reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;

When new data is available, the mapper must call a newly introduced ioctl:
TRACE_MMAP_IOCTL_GET_READER. This will update the Meta-page reader ID to
point to the next reader containing unread data.

Mapping will prevent snapshot and buffer size modifications.


How are the kernel linear mapping and the userspace mapping made coherent
on architectures with virtually aliasing data caches?

Ref. 
https://lore.kernel.org/lkml/20240202210019.88022-1-mathieu.desnoy...@efficios.com/T/#t

Thanks,

Mathieu



Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
index 182e05a3004a..7330249257e7 100644
--- a/include/uapi/linux/trace_mmap.h
+++ b/include/uapi/linux/trace_mmap.h
@@ -43,4 +43,6 @@ struct trace_buffer_meta {
__u64   Reserved2;
  };
  
+#define TRACE_MMAP_IOCTL_GET_READER		_IO('T', 0x1)

+
  #endif /* _TRACE_MMAP_H_ */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4ebf4d0bd14c..36b62cf2fb3f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1175,6 +1175,12 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
return;
}
  
+	if (tr->mapped) {

+   trace_array_puts(tr, "*** BUFFER MEMORY MAPPED ***\n");
+   trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
+   return;
+   }
+
local_irq_save(flags);
update_max_tr(tr, current, smp_processor_id(), cond_data);
local_irq_restore(flags);
@@ -1307,7 +1313,7 @@ static int tracing_arm_snapshot_locked(struct trace_array *tr)
lockdep_assert_held(&trace_types_lock);
  
  	spin_lock(&tr->snapshot_trigger_lock);

-   if (tr->snapshot == UINT_MAX) {
+   if (tr->snapshot == UINT_MAX || tr->mapped) {
spin_unlock(&tr->snapshot_trigger_lock);
return -EBUSY;
}
@@ -6533,7 +6539,7 @@ static void tracing_set_nop(struct trace_array *tr)
  {
if (tr->current_trace == &nop_trace)
return;
-   
+
tr->current_trace->enabled--;
  
  	if (tr->current_trace->reset)

@@ -8652,15 +8658,31 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
return ret;
  }
  
-/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */

  static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
  {
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
+   int err;
+
+   if (cmd == TRACE_MMAP_IOCTL_GET_READER) {
+   if (!(file->f_flags & O_NONBLOCK)) {
+   err = ring_buffer_wait(iter->array_buffer->buffer,
+  iter->cpu_file,
+  iter->tr->buffer_percent);
+   if (err)
+   return err;
+   }
  
-	if (cmd)

-   return -ENOIOCTLCMD;
+   return ring_buffer_map_get_reader(iter->array_buffer->buffer,
+ iter->cpu_file);
+   } else if (cmd) {
+   return -ENOTTY;
+   }
  
+	/*

+* An ioctl call with cmd 0 to the ring buffer file will wake up all
+* waiters
+*/
mutex_lock(&trace_types_lock);
  
  	iter->wait_index++;

@@ -8673,6 +8695,97 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
return 0;
  }
  
+static vm_fault_t tracing_buffers_mmap_fault(struct vm_fault *vmf)

+{
+   struct ftrace_buffer_info *info = vmf->vma->vm_file->private_data;
+   struct trace_iterator *iter = &info->iter;
+   vm_fault_t ret = VM_FAULT_SIGBUS;
+   struct page *page;
+
+   page = ring_buffer_map_fault(iter->array_buffer->buffer, iter->cpu_file,
+vmf->pgoff);
+   if (!page)
+   return ret;
+
+   get_page(page);
+   vmf->page = page;
+   vmf->page->mapping = vmf->vma->vm_file->f_mapping;
+   vmf->page->index = vmf->pgoff;
+
+   return 0;
+}
+
+static void 

[PATCH v14 4/6] tracing: Allow user-space mapping of the ring-buffer

2024-02-05 Thread Vincent Donnefort
Currently, user-space extracts data from the ring-buffer via splice,
which is handy for storage or network sharing. However, due to splice
limitations, it is impossible to do real-time analysis without a copy.

A solution for that problem is to let user-space map the ring-buffer
directly.

The mapping is exposed via the per-CPU file trace_pipe_raw. The first
element of the mapping is the meta-page. It is followed by each
subbuffer constituting the ring-buffer, ordered by their unique page ID:

  * Meta-page -- see include/uapi/linux/trace_mmap.h for a description
  * Subbuf ID 0
  * Subbuf ID 1
 ...

It is therefore easy to translate a subbuf ID into an offset in the
mapping:

  reader_id = meta->reader->id;
  reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;

When new data is available, the mapper must call a newly introduced ioctl:
TRACE_MMAP_IOCTL_GET_READER. This will update the Meta-page reader ID to
point to the next reader containing unread data.

Mapping will prevent snapshot and buffer size modifications.
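For illustration, a minimal user-space reader following the steps above (a
sketch only: field names follow the commit message and the UAPI header, error
handling is elided, and a real reader would size the full mapping from the
meta-page rather than mapping a single page):

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <linux/trace_mmap.h>

  int main(void)
  {
          int fd = open("/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw",
                        O_RDONLY | O_NONBLOCK);
          struct trace_buffer_meta *meta;
          unsigned long reader_offset;

          /* Map the meta-page; a full reader would then remap
           * meta_page_size + nr_subbufs * subbuf_size. */
          meta = mmap(NULL, getpagesize(), PROT_READ, MAP_SHARED, fd, 0);

          /* Move the reader to the next sub-buffer with unread data,
           * then translate its ID into an offset in the mapping. */
          ioctl(fd, TRACE_MMAP_IOCTL_GET_READER);
          reader_offset = meta->meta_page_size +
                          meta->reader.id * meta->subbuf_size;
          printf("reader sub-buffer at offset %lu\n", reader_offset);

          munmap(meta, getpagesize());
          close(fd);
          return 0;
  }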

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
index 182e05a3004a..7330249257e7 100644
--- a/include/uapi/linux/trace_mmap.h
+++ b/include/uapi/linux/trace_mmap.h
@@ -43,4 +43,6 @@ struct trace_buffer_meta {
__u64   Reserved2;
 };
 
+#define TRACE_MMAP_IOCTL_GET_READER		_IO('T', 0x1)
+
 #endif /* _TRACE_MMAP_H_ */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4ebf4d0bd14c..36b62cf2fb3f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1175,6 +1175,12 @@ static void tracing_snapshot_instance_cond(struct trace_array *tr,
return;
}
 
+   if (tr->mapped) {
+   trace_array_puts(tr, "*** BUFFER MEMORY MAPPED ***\n");
+   trace_array_puts(tr, "*** Can not use snapshot (sorry) ***\n");
+   return;
+   }
+
local_irq_save(flags);
update_max_tr(tr, current, smp_processor_id(), cond_data);
local_irq_restore(flags);
@@ -1307,7 +1313,7 @@ static int tracing_arm_snapshot_locked(struct trace_array *tr)
lockdep_assert_held(&trace_types_lock);
 
spin_lock(&tr->snapshot_trigger_lock);
-   if (tr->snapshot == UINT_MAX) {
+   if (tr->snapshot == UINT_MAX || tr->mapped) {
spin_unlock(&tr->snapshot_trigger_lock);
return -EBUSY;
}
@@ -6533,7 +6539,7 @@ static void tracing_set_nop(struct trace_array *tr)
 {
if (tr->current_trace == &nop_trace)
return;
-   
+
tr->current_trace->enabled--;
 
if (tr->current_trace->reset)
@@ -8652,15 +8658,31 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
return ret;
 }
 
-/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */
 static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
+   int err;
+
+   if (cmd == TRACE_MMAP_IOCTL_GET_READER) {
+   if (!(file->f_flags & O_NONBLOCK)) {
+   err = ring_buffer_wait(iter->array_buffer->buffer,
+  iter->cpu_file,
+  iter->tr->buffer_percent);
+   if (err)
+   return err;
+   }
 
-   if (cmd)
-   return -ENOIOCTLCMD;
+   return ring_buffer_map_get_reader(iter->array_buffer->buffer,
+ iter->cpu_file);
+   } else if (cmd) {
+   return -ENOTTY;
+   }
 
+   /*
+* An ioctl call with cmd 0 to the ring buffer file will wake up all
+* waiters
+*/
mutex_lock(&trace_types_lock);
 
iter->wait_index++;
@@ -8673,6 +8695,97 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
return 0;
 }
 
+static vm_fault_t tracing_buffers_mmap_fault(struct vm_fault *vmf)
+{
+   struct ftrace_buffer_info *info = vmf->vma->vm_file->private_data;
+   struct trace_iterator *iter = &info->iter;
+   vm_fault_t ret = VM_FAULT_SIGBUS;
+   struct page *page;
+
+   page = ring_buffer_map_fault(iter->array_buffer->buffer, iter->cpu_file,
+vmf->pgoff);
+   if (!page)
+   return ret;
+
+   get_page(page);
+   vmf->page = page;
+   vmf->page->mapping = vmf->vma->vm_file->f_mapping;
+   vmf->page->index = vmf->pgoff;
+
+   return 0;
+}
+
+static void tracing_buffers_mmap_close(struct vm_area_struct *vma)
+{
+   struct ftrace_buffer_info *info = vma->vm_file->private_data;
+   struct trace_iterator *iter = &info->iter;
+   struct trace_array *tr = iter->tr;
+
+   ring_buffer_unmap(iter->array_buffer->buffer, iter->cpu_file);
+