Re: [PATCH v20 0/5] Introducing trace buffer mapping by user-space

2024-04-16 Thread Mike Rapoport
(added linux-mm)

On Wed, Apr 10, 2024 at 01:56:12PM -0400, Steven Rostedt wrote:
> 
> Hi Andrew, et.al.
> 
> Linus said it's a hard requirement that this code gets an Acked-by (or
> Reviewed-by) from the memory sub-maintainers before he will accept it.
> He was upset that we faulted in pages one at a time instead of mapping it
> in one go:
> 
>   
> https://lore.kernel.org/all/CAHk-=wh5wWeib7+kVHpBVtUn7kx7GGadWqb5mW5FYTdewEfL=w...@mail.gmail.com/
> 
> Could you take a look at patches 1-3 to make sure they look sane from a
> memory management point of view?
> 
> I really want this applied in the next merge window.
> 
> Thanks!
> 
> -- Steve
> 
> 
> On Sat,  6 Apr 2024 18:36:44 +0100
> Vincent Donnefort  wrote:
> 
> > The tracing ring-buffers can be stored on disk or sent to network
> > without any copy via splice. However the later doesn't allow real time
> > processing of the traces. A solution is to give userspace direct access
> > to the ring-buffer pages via a mapping. An application can now become a
> > consumer of the ring-buffer, in a similar fashion to what trace_pipe
> > offers.
> > 
> > Support for this new feature can already be found in libtracefs from
> > version 1.8, when built with EXTRA_CFLAGS=-DFORCE_MMAP_ENABLE.
> > 
> > Vincent
> > 
> > v19 -> v20:
> >   * Fix typos in documentation.
> >   * Remove useless mmap open and fault callbacks.
> >   * add mm.h include for vm_insert_pages
> > 
> > v18 -> v19:
> >   * Use VM_PFNMAP and vm_insert_pages
> >   * Allocate ring-buffer subbufs with __GFP_COMP
> >   * Pad the meta-page with the zero-page to align on the subbuf_order
> >   * Extend the ring-buffer test with mmap() dedicated suite
> > 
> > v17 -> v18:
> >   * Fix lockdep_assert_held
> >   * Fix spin_lock_init typo
> >   * Fix CONFIG_TRACER_MAX_TRACE typo
> > 
> > v16 -> v17:
> >   * Documentation and comments improvements.
> >   * Create get/put_snapshot_map() for clearer code.
> >   * Replace kzalloc with kcalloc.
> >   * Fix -ENOMEM handling in rb_alloc_meta_page().
> >   * Move flush(cpu_buffer->reader_page) behind the reader lock.
> >   * Move all inc/dec of cpu_buffer->mapped behind reader lock and buffer
> > mutex. (removes READ_ONCE/WRITE_ONCE accesses).
> > 
> > v15 -> v16:
> >   * Add comment for the dcache flush.
> >   * Remove now unnecessary WRITE_ONCE for the meta-page.
> > 
> > v14 -> v15:
> >   * Add meta-page and reader-page flush. Intends to fix the mapping
> > for VIVT and aliasing-VIPT data caches.
> >   * -EPERM on VM_EXEC.
> >   * Fix build warning !CONFIG_TRACER_MAX_TRACE.
> > 
> > v13 -> v14:
> >   * All cpu_buffer->mapped readers use READ_ONCE (except for swap_cpu)
> >   * on unmap, sync meta-page teardown with the reader_lock instead of
> > the synchronize_rcu.
> >   * Add a dedicated spinlock for trace_array ->snapshot and ->mapped.
> > (intends to fix a lockdep issue)
> >   * Add kerneldoc for flags and Reserved fields.
> >   * Add kselftest for snapshot/map mutual exclusion.
> > 
> > v12 -> v13:
> >   * Swap subbufs_{touched,lost} for Reserved fields.
> >   * Add a flag field in the meta-page.
> >   * Fix CONFIG_TRACER_MAX_TRACE.
> >   * Rebase on top of trace/urgent.
> >   * Add a comment for try_unregister_trigger()
> > 
> > v11 -> v12:
> >   * Fix code sample mmap bug.
> >   * Add logging in sample code.
> >   * Reset tracer in selftest.
> >   * Add a refcount for the snapshot users.
> >   * Prevent mapping when there are snapshot users and vice versa.
> >   * Refine the meta-page.
> >   * Fix types in the meta-page.
> >   * Collect Reviewed-by.
> > 
> > v10 -> v11:
> >   * Add Documentation and code sample.
> >   * Add a selftest.
> >   * Move all the update to the meta-page into a single
> > rb_update_meta_page().
> >   * rb_update_meta_page() is now called from
> > ring_buffer_map_get_reader() to fix NOBLOCK callers.
> >   * kerneldoc for struct trace_meta_page.
> >   * Add a patch to zero all the ring-buffer allocations.
> > 
> > v9 -> v10:
> >   * Refactor rb_update_meta_page()
> >   * In-loop declaration for foreach_subbuf_page()
> >   * Check for cpu_buffer->mapped overflow
> > 
> > v8 -> v9:
> >   * Fix the unlock path in ring_buffer_map()
> >   * Fix cpu_buffer cast with rb_work_rq->is_cpu_buffer
> >   * Rebase on linux-trace/for-next 
> > (3cb3091138ca0921c4569bcf7ffa062519639b6a)
> > 
> > v7 -> v8:
> >   * Drop the subbufs renaming into bpages
> >   * Use subbuf as a name when relevant
> > 
> > v6 -> v7:
> >   * Rebase onto lore.kernel.org/lkml/20231215175502.106587...@goodmis.org/
> >   * Support for subbufs
> >   * Rename subbufs into bpages
> > 
> > v5 -> v6:
> >   * Rebase on next-20230802.
> >   * (unsigned long) -> (void *) cast for virt_to_page().
> >   * Add a wait for the GET_READER_PAGE ioctl.
> >   * Move writer fields update (overrun/pages_lost/entries/pages_touched)
> > in the irq_work.
> >   * Rearrange id in struct buffer_page.
> >   * Rearrange the meta-page.
> >   * ring_buffer_meta_page -> 

Re: [PATCH v20 0/5] Introducing trace buffer mapping by user-space

2024-04-10 Thread Steven Rostedt


Hi Andrew, et.al.

Linus said it's a hard requirement that this code gets an Acked-by (or
Reviewed-by) from the memory sub-maintainers before he will accept it.
He was upset that we faulted in pages one at a time instead of mapping it
in one go:

  
https://lore.kernel.org/all/CAHk-=wh5wWeib7+kVHpBVtUn7kx7GGadWqb5mW5FYTdewEfL=w...@mail.gmail.com/

Could you take a look at patches 1-3 to make sure they look sane from a
memory management point of view?

I really want this applied in the next merge window.

Thanks!

-- Steve


On Sat,  6 Apr 2024 18:36:44 +0100
Vincent Donnefort  wrote:

> The tracing ring-buffers can be stored on disk or sent to network
> without any copy via splice. However the later doesn't allow real time
> processing of the traces. A solution is to give userspace direct access
> to the ring-buffer pages via a mapping. An application can now become a
> consumer of the ring-buffer, in a similar fashion to what trace_pipe
> offers.
> 
> Support for this new feature can already be found in libtracefs from
> version 1.8, when built with EXTRA_CFLAGS=-DFORCE_MMAP_ENABLE.
> 
> Vincent
> 
> v19 -> v20:
>   * Fix typos in documentation.
>   * Remove useless mmap open and fault callbacks.
>   * add mm.h include for vm_insert_pages
> 
> v18 -> v19:
>   * Use VM_PFNMAP and vm_insert_pages
>   * Allocate ring-buffer subbufs with __GFP_COMP
>   * Pad the meta-page with the zero-page to align on the subbuf_order
>   * Extend the ring-buffer test with mmap() dedicated suite
> 
> v17 -> v18:
>   * Fix lockdep_assert_held
>   * Fix spin_lock_init typo
>   * Fix CONFIG_TRACER_MAX_TRACE typo
> 
> v16 -> v17:
>   * Documentation and comments improvements.
>   * Create get/put_snapshot_map() for clearer code.
>   * Replace kzalloc with kcalloc.
>   * Fix -ENOMEM handling in rb_alloc_meta_page().
>   * Move flush(cpu_buffer->reader_page) behind the reader lock.
>   * Move all inc/dec of cpu_buffer->mapped behind reader lock and buffer
> mutex. (removes READ_ONCE/WRITE_ONCE accesses).
> 
> v15 -> v16:
>   * Add comment for the dcache flush.
>   * Remove now unnecessary WRITE_ONCE for the meta-page.
> 
> v14 -> v15:
>   * Add meta-page and reader-page flush. Intends to fix the mapping
> for VIVT and aliasing-VIPT data caches.
>   * -EPERM on VM_EXEC.
>   * Fix build warning !CONFIG_TRACER_MAX_TRACE.
> 
> v13 -> v14:
>   * All cpu_buffer->mapped readers use READ_ONCE (except for swap_cpu)
>   * on unmap, sync meta-page teardown with the reader_lock instead of
> the synchronize_rcu.
>   * Add a dedicated spinlock for trace_array ->snapshot and ->mapped.
> (intends to fix a lockdep issue)
>   * Add kerneldoc for flags and Reserved fields.
>   * Add kselftest for snapshot/map mutual exclusion.
> 
> v12 -> v13:
>   * Swap subbufs_{touched,lost} for Reserved fields.
>   * Add a flag field in the meta-page.
>   * Fix CONFIG_TRACER_MAX_TRACE.
>   * Rebase on top of trace/urgent.
>   * Add a comment for try_unregister_trigger()
> 
> v11 -> v12:
>   * Fix code sample mmap bug.
>   * Add logging in sample code.
>   * Reset tracer in selftest.
>   * Add a refcount for the snapshot users.
>   * Prevent mapping when there are snapshot users and vice versa.
>   * Refine the meta-page.
>   * Fix types in the meta-page.
>   * Collect Reviewed-by.
> 
> v10 -> v11:
>   * Add Documentation and code sample.
>   * Add a selftest.
>   * Move all the update to the meta-page into a single
> rb_update_meta_page().
>   * rb_update_meta_page() is now called from
> ring_buffer_map_get_reader() to fix NOBLOCK callers.
>   * kerneldoc for struct trace_meta_page.
>   * Add a patch to zero all the ring-buffer allocations.
> 
> v9 -> v10:
>   * Refactor rb_update_meta_page()
>   * In-loop declaration for foreach_subbuf_page()
>   * Check for cpu_buffer->mapped overflow
> 
> v8 -> v9:
>   * Fix the unlock path in ring_buffer_map()
>   * Fix cpu_buffer cast with rb_work_rq->is_cpu_buffer
>   * Rebase on linux-trace/for-next (3cb3091138ca0921c4569bcf7ffa062519639b6a)
> 
> v7 -> v8:
>   * Drop the subbufs renaming into bpages
>   * Use subbuf as a name when relevant
> 
> v6 -> v7:
>   * Rebase onto lore.kernel.org/lkml/20231215175502.106587...@goodmis.org/
>   * Support for subbufs
>   * Rename subbufs into bpages
> 
> v5 -> v6:
>   * Rebase on next-20230802.
>   * (unsigned long) -> (void *) cast for virt_to_page().
>   * Add a wait for the GET_READER_PAGE ioctl.
>   * Move writer fields update (overrun/pages_lost/entries/pages_touched)
> in the irq_work.
>   * Rearrange id in struct buffer_page.
>   * Rearrange the meta-page.
>   * ring_buffer_meta_page -> trace_buffer_meta_page.
>   * Add meta_struct_len into the meta-page.
> 
> v4 -> v5:
>   * Trivial rebase onto 6.5-rc3 (previously 6.4-rc3)
> 
> v3 -> v4:
>   * Add to the meta-page:
>- pages_lost / pages_read (allow to compute how full is the
>ring-buffer)
>- read (allow to compute how many entries can be read)
>- A reader_page 

[PATCH v20 0/5] Introducing trace buffer mapping by user-space

2024-04-06 Thread Vincent Donnefort
The tracing ring-buffers can be stored on disk or sent to network
without any copy via splice. However the later doesn't allow real time
processing of the traces. A solution is to give userspace direct access
to the ring-buffer pages via a mapping. An application can now become a
consumer of the ring-buffer, in a similar fashion to what trace_pipe
offers.

Support for this new feature can already be found in libtracefs from
version 1.8, when built with EXTRA_CFLAGS=-DFORCE_MMAP_ENABLE.

Vincent

v19 -> v20:
  * Fix typos in documentation.
  * Remove useless mmap open and fault callbacks.
  * add mm.h include for vm_insert_pages

v18 -> v19:
  * Use VM_PFNMAP and vm_insert_pages
  * Allocate ring-buffer subbufs with __GFP_COMP
  * Pad the meta-page with the zero-page to align on the subbuf_order
  * Extend the ring-buffer test with mmap() dedicated suite

v17 -> v18:
  * Fix lockdep_assert_held
  * Fix spin_lock_init typo
  * Fix CONFIG_TRACER_MAX_TRACE typo

v16 -> v17:
  * Documentation and comments improvements.
  * Create get/put_snapshot_map() for clearer code.
  * Replace kzalloc with kcalloc.
  * Fix -ENOMEM handling in rb_alloc_meta_page().
  * Move flush(cpu_buffer->reader_page) behind the reader lock.
  * Move all inc/dec of cpu_buffer->mapped behind reader lock and buffer
mutex. (removes READ_ONCE/WRITE_ONCE accesses).

v15 -> v16:
  * Add comment for the dcache flush.
  * Remove now unnecessary WRITE_ONCE for the meta-page.

v14 -> v15:
  * Add meta-page and reader-page flush. Intends to fix the mapping
for VIVT and aliasing-VIPT data caches.
  * -EPERM on VM_EXEC.
  * Fix build warning !CONFIG_TRACER_MAX_TRACE.

v13 -> v14:
  * All cpu_buffer->mapped readers use READ_ONCE (except for swap_cpu)
  * on unmap, sync meta-page teardown with the reader_lock instead of
the synchronize_rcu.
  * Add a dedicated spinlock for trace_array ->snapshot and ->mapped.
(intends to fix a lockdep issue)
  * Add kerneldoc for flags and Reserved fields.
  * Add kselftest for snapshot/map mutual exclusion.

v12 -> v13:
  * Swap subbufs_{touched,lost} for Reserved fields.
  * Add a flag field in the meta-page.
  * Fix CONFIG_TRACER_MAX_TRACE.
  * Rebase on top of trace/urgent.
  * Add a comment for try_unregister_trigger()

v11 -> v12:
  * Fix code sample mmap bug.
  * Add logging in sample code.
  * Reset tracer in selftest.
  * Add a refcount for the snapshot users.
  * Prevent mapping when there are snapshot users and vice versa.
  * Refine the meta-page.
  * Fix types in the meta-page.
  * Collect Reviewed-by.

v10 -> v11:
  * Add Documentation and code sample.
  * Add a selftest.
  * Move all the update to the meta-page into a single
rb_update_meta_page().
  * rb_update_meta_page() is now called from
ring_buffer_map_get_reader() to fix NOBLOCK callers.
  * kerneldoc for struct trace_meta_page.
  * Add a patch to zero all the ring-buffer allocations.

v9 -> v10:
  * Refactor rb_update_meta_page()
  * In-loop declaration for foreach_subbuf_page()
  * Check for cpu_buffer->mapped overflow

v8 -> v9:
  * Fix the unlock path in ring_buffer_map()
  * Fix cpu_buffer cast with rb_work_rq->is_cpu_buffer
  * Rebase on linux-trace/for-next (3cb3091138ca0921c4569bcf7ffa062519639b6a)

v7 -> v8:
  * Drop the subbufs renaming into bpages
  * Use subbuf as a name when relevant

v6 -> v7:
  * Rebase onto lore.kernel.org/lkml/20231215175502.106587...@goodmis.org/
  * Support for subbufs
  * Rename subbufs into bpages

v5 -> v6:
  * Rebase on next-20230802.
  * (unsigned long) -> (void *) cast for virt_to_page().
  * Add a wait for the GET_READER_PAGE ioctl.
  * Move writer fields update (overrun/pages_lost/entries/pages_touched)
in the irq_work.
  * Rearrange id in struct buffer_page.
  * Rearrange the meta-page.
  * ring_buffer_meta_page -> trace_buffer_meta_page.
  * Add meta_struct_len into the meta-page.

v4 -> v5:
  * Trivial rebase onto 6.5-rc3 (previously 6.4-rc3)

v3 -> v4:
  * Add to the meta-page:
   - pages_lost / pages_read (allow to compute how full is the
 ring-buffer)
   - read (allow to compute how many entries can be read)
   - A reader_page struct.
  * Rename ring_buffer_meta_header -> ring_buffer_meta
  * Rename ring_buffer_get_reader_page -> ring_buffer_map_get_reader_page
  * Properly consume events on ring_buffer_map_get_reader_page() with
rb_advance_reader().

v2 -> v3:
  * Remove data page list (for non-consuming read)
** Implies removing order > 0 meta-page
  * Add a new meta page field ->read
  * Rename ring_buffer_meta_page_header into ring_buffer_meta_header

v1 -> v2:
  * Hide data_pages from the userspace struct
  * Fix META_PAGE_MAX_PAGES
  * Support for order > 0 meta-page
  * Add missing page->mapping.

Vincent Donnefort (5):
  ring-buffer: allocate sub-buffers with __GFP_COMP
  ring-buffer: Introducing ring-buffer mapping functions
  tracing: Allow user-space mapping of the ring-buffer
  Documentation: tracing: Add ring-buffer mapping