On Thu, 18 Apr 2024 09:55:55 +0300
Mike Rapoport <r...@kernel.org> wrote:

Hi Mike,

Thanks for doing this review!

> > +/**
> > + * struct trace_buffer_meta - Ring-buffer Meta-page description
> > + * @meta_page_size:        Size of this meta-page.
> > + * @meta_struct_len:       Size of this structure.
> > + * @subbuf_size:   Size of each sub-buffer.
> > + * @nr_subbufs:            Number of subbufs in the ring-buffer, including
> > the reader.
> > + * @reader.lost_events:    Number of events lost at the time of the reader 
> > swap.
> > + * @reader.id:             subbuf ID of the current reader. ID range [0 : 
> > @nr_subbufs - 1]
> > + * @reader.read:   Number of bytes read on the reader subbuf.
> > + * @flags:         Placeholder for now, 0 until new features are supported.
> > + * @entries:               Number of entries in the ring-buffer.
> > + * @overrun:               Number of entries lost in the ring-buffer.
> > + * @read:          Number of entries that have been read.
> > + * @Reserved1:             Reserved for future use.
> > + * @Reserved2:             Reserved for future use.
> > + */
> > +struct trace_buffer_meta {
> > +   __u32           meta_page_size;
> > +   __u32           meta_struct_len;
> > +
> > +   __u32           subbuf_size;
> > +   __u32           nr_subbufs;
> > +
> > +   struct {
> > +           __u64   lost_events;
> > +           __u32   id;
> > +           __u32   read;
> > +   } reader;
> > +
> > +   __u64   flags;
> > +
> > +   __u64   entries;
> > +   __u64   overrun;
> > +   __u64   read;
> > +
> > +   __u64   Reserved1;
> > +   __u64   Reserved2;  
> 
> Why do you need reserved fields? This structure always resides in the
> beginning of a page and the rest of the page is essentially "reserved".

So this code is also going to be used in arm's pKVM hypervisor code,
which will make use of these fields. Since we want to keep the same
interface between the two, we reserve them here but leave them unused
by this interface.

We should probably add a comment explaining that.

> 
> > +};
> > +
> > +#endif /* _TRACE_MMAP_H_ */
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index cc9ebe593571..793ecc454039 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c  
> 
> ... 
> 
> > +static void rb_setup_ids_meta_page(struct ring_buffer_per_cpu *cpu_buffer,
> > +                              unsigned long *subbuf_ids)
> > +{
> > +   struct trace_buffer_meta *meta = cpu_buffer->meta_page;
> > +   unsigned int nr_subbufs = cpu_buffer->nr_pages + 1;
> > +   struct buffer_page *first_subbuf, *subbuf;
> > +   int id = 0;
> > +
> > +   subbuf_ids[id] = (unsigned long)cpu_buffer->reader_page->page;
> > +   cpu_buffer->reader_page->id = id++;
> > +
> > +   first_subbuf = subbuf = rb_set_head_page(cpu_buffer);
> > +   do {
> > +           if (WARN_ON(id >= nr_subbufs))
> > +                   break;
> > +
> > +           subbuf_ids[id] = (unsigned long)subbuf->page;
> > +           subbuf->id = id;
> > +
> > +           rb_inc_page(&subbuf);
> > +           id++;
> > +   } while (subbuf != first_subbuf);
> > +
> > +   /* install subbuf ID to kern VA translation */
> > +   cpu_buffer->subbuf_ids = subbuf_ids;
> > +
> > +   /* __rb_map_vma() pads the meta-page to align it with the sub-buffers */
> > +   meta->meta_page_size = PAGE_SIZE << cpu_buffer->buffer->subbuf_order;  
> 
> Isn't this a single page?

One thing we are doing is making sure that the sub-buffers are aligned
to their size. If a sub-buffer spans multiple pages, it should start on
a boundary of that many pages. This was something that Linus suggested.

> 
> > +   meta->meta_struct_len = sizeof(*meta);
> > +   meta->nr_subbufs = nr_subbufs;
> > +   meta->subbuf_size = cpu_buffer->buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
> > +
> > +   rb_update_meta_page(cpu_buffer);
> > +}  
> 
> ...
> 
> > +#define subbuf_page(off, start) \
> > +   virt_to_page((void *)((start) + ((off) << PAGE_SHIFT)))
> > +
> > +#define foreach_subbuf_page(sub_order, start, page)                \  
> 
> Nit: usually iterators in kernel use for_each

Ah, good catch. Yeah, that should be changed. But then ...

> 
> > +   page = subbuf_page(0, (start));                         \
> > +   for (int __off = 0; __off < (1 << (sub_order));         \
> > +        __off++, page = subbuf_page(__off, (start)))  
> 
> The pages are allocated with alloc_pages_node(.. subbuf_order) are
> physically contiguous and struct pages for them are also contiguous, so
> inside a subbuf_order allocation you can just do page++.
> 

I'm wondering if we should just nuke the macro. It was there because
the previous implementation did things twice. But now it's only done
once, here:

+       while (s < nr_subbufs && p < nr_pages) {
+               struct page *page;
+
+               foreach_subbuf_page(subbuf_order, cpu_buffer->subbuf_ids[s], page) {
+                       if (p >= nr_pages)
+                               break;
+
+                       pages[p++] = page;
+               }
+               s++;
+       }

Perhaps that should just be:

        while (s < nr_subbufs && p < nr_pages) {
                struct page *page;
                int off;

                page = subbuf_page(0, cpu_buffer->subbuf_ids[s]);
                for (off = 0; off < (1 << subbuf_order); off++, page++, p++) {
                        if (p >= nr_pages)
                                break;

                        pages[p] = page;
                }
                s++;
        }

 ?

-- Steve
