Re: [PATCH 6/6] eventfs: clean up dentry ops and add revalidate function

2024-01-31 Thread Steven Rostedt
On Thu, 1 Feb 2024 08:07:15 +0900
Masami Hiramatsu (Google)  wrote:

> > Then tracefs could be nicely converted over to kernfs, and eventfs would be
> > its own entity.  
> 
> If so, maybe we can just make symlinks to the 'id' and 'format' from events
> under tracefs? :)

I don't think that will save anything. The files currently do not allocate
any memory. If we make symlinks, we need to allocate a path, to them. I
think that would be rather difficult to do. Not to mention, that could
cause a lot of breakage. What do you do if the other filesystem isn't
mounted?

I could possibly make a light way handle to pass back to the callbacks.

struct trace_event_light {
unsigned long   flags
struct trace_event_call *event_call;
};

struct trace_event_file {
struct trace_event_lightcall;
[..]
// Remove he flags and event_call and have it above
};

if the callback data has:

 callback(..., void **data)
 {
struct trace_event_light *call = *data;
struct trace_event_file *file;

If (strcmp(name, "id") == 0 || strcmp(name, "format") == 0) {
*data = call->event_call;
return 1;
}

/* Return if this is just a light data entry */
if (!(data->flags & TRACE_EVENT_FULL))
return 0;

file = container_of(data, struct trace_event_file, call);

// continue processing the full data
}

This way the lonely eventfs could still share a lot of the code.

-- Steve



Re: [PATCH 6/6] eventfs: clean up dentry ops and add revalidate function

2024-01-31 Thread Google
On Wed, 31 Jan 2024 13:00:39 -0500
Steven Rostedt  wrote:

> On Tue, 30 Jan 2024 11:03:55 -0800
> Linus Torvalds  wrote:
> 
> > It would probably be cleaner to make eventfs its own filesystem, or at
> > least set its own dentry ops when looking up eventfs files.  But as it
> > is, only eventfs dentries use d_fsdata, so we don't really need to split
> > these things up by use.
> 
> BTW, I have been thinking about making eventfs a completely separate file
> system that could be mounted outside of tracefs. One that is readonly that
> only contains the "id" and "format" files and nothing more.
> 
> Why? Because perf and powertop both use those files to know how to parse
> the raw event formats. I don't think there's anything in there that
> requires root privileges to read. They should not be exposing any internal
> kernel information besides the event format layouts, and it would be nice
> to have a /sys/kernel/events directory that only had that.

That's a good idea! So maybe we can allow perf to read it without root user.

> 
> Making eventfs a separate file system where, when added to tracefs, has the
> control files for the specific trace_array, but for the /sys/kernel
> directory, only cares about the trace format files.
> 
> Then tracefs could be nicely converted over to kernfs, and eventfs would be
> its own entity.

If so, maybe we can just make symlinks to the 'id' and 'format' from events
under tracefs? :)

Thank you,

> 
> -- Steve


-- 
Masami Hiramatsu (Google) 



Re: [PATCH 6/6] eventfs: clean up dentry ops and add revalidate function

2024-01-31 Thread Steven Rostedt
On Tue, 30 Jan 2024 11:03:55 -0800
Linus Torvalds  wrote:

> It would probably be cleaner to make eventfs its own filesystem, or at
> least set its own dentry ops when looking up eventfs files.  But as it
> is, only eventfs dentries use d_fsdata, so we don't really need to split
> these things up by use.

BTW, I have been thinking about making eventfs a completely separate file
system that could be mounted outside of tracefs. One that is readonly that
only contains the "id" and "format" files and nothing more.

Why? Because perf and powertop both use those files to know how to parse
the raw event formats. I don't think there's anything in there that
requires root privileges to read. They should not be exposing any internal
kernel information besides the event format layouts, and it would be nice
to have a /sys/kernel/events directory that only had that.

Making eventfs a separate file system where, when added to tracefs, has the
control files for the specific trace_array, but for the /sys/kernel
directory, only cares about the trace format files.

Then tracefs could be nicely converted over to kernfs, and eventfs would be
its own entity.

-- Steve



Re: [PATCH 6/6] eventfs: clean up dentry ops and add revalidate function

2024-01-30 Thread Al Viro
On Tue, Jan 30, 2024 at 07:39:55PM -0800, Linus Torvalds wrote:

> Why don't we, btw? It would be so much better if we did the
> d_release() from __d_free(). We have all that smarts in fs/dcache.c to
> decide if we need to RCU-delay it or not, and then we don't let
> filesystems use it.

Because we want to give filesystems a chance to do something useful
in it?  Something that might need access to the parent.  Or superblock,
for that matter...  By the time we are past the RCU delay there's
very little left; if you want to look at some per-superblock data,
you would have to do something like putting it into a refcounted
structure, with each dentry holding a counting reference, with obvious
effects...

We could add a separate method just for freeing stuff, but... we already
have 4 of them related to that path (->d_delete(), ->d_prune(), ->d_iput(),
->d_release()) and the things are pretty confusing as it is, without
throwing another one into the mix.

I'll look through the RCU pathwalk fixes (will repost the rebased set in
a couple of days, need to finish the re-audit of that stuff) and see how
much would such a method simplify, but I don't think it would buy us
a lot.

> So I guess the "make low-level filesystems do their own kfree_rcu() is
> what we're doing.
> 
> In this case it's as simple as doing that
> 
> -   kfree(ei);
> +   kfree_rcu(ei, rcu);
> 
> and we'd just make the rcu entry a union with something that isn't
> that 'is_freed' field so that it doesn't take more space.



Re: [PATCH 6/6] eventfs: clean up dentry ops and add revalidate function

2024-01-30 Thread Linus Torvalds
On Tue, 30 Jan 2024 at 18:46, Al Viro  wrote:
>
> What's to stop ->d_revalidate() from being called in parallel with
> __dentry_kill()?

Oh, you're right.

For some reason I thought we did the d_release() _after_ the RCU grace
period, but we don't.

Why don't we, btw? It would be so much better if we did the
d_release() from __d_free(). We have all that smarts in fs/dcache.c to
decide if we need to RCU-delay it or not, and then we don't let
filesystems use it.

I assume the reason is that some 'd_delete' cases might want to sleep
etc. Still, for things like this that just want to release memory, it
would be *much* better to have d_release called when the dentry is
actually released.

Hmm. Not very many d_delete cases, but I did see a couple that
definitely want process context (dma_buf_release goes to things that
do vfree() etc).

So I guess the "make low-level filesystems do their own kfree_rcu() is
what we're doing.

In this case it's as simple as doing that

-   kfree(ei);
+   kfree_rcu(ei, rcu);

and we'd just make the rcu entry a union with something that isn't
that 'is_freed' field so that it doesn't take more space.

 Linus



Re: [PATCH 6/6] eventfs: clean up dentry ops and add revalidate function

2024-01-30 Thread Al Viro
On Tue, Jan 30, 2024 at 06:37:49PM -0800, Linus Torvalds wrote:
> On Tue, 30 Jan 2024 at 17:12, Al Viro  wrote:
> >
> > > + *
> > > + * Note that d_revalidate is called potentially under RCU,
> > > + * so it can't take the eventfs mutex etc. It's fine - if
> > > + * we open a file just as it's marked dead, things will
> > > + * still work just fine, and just see the old stale case.
> >
> > Looks like use after free, unless freeing ei is RCU-delayed...
> 
> We hold the ref to the ei in the very dentry that is doing d_revalidate().

What's to stop ->d_revalidate() from being called in parallel with
__dentry_kill()?



Re: [PATCH 6/6] eventfs: clean up dentry ops and add revalidate function

2024-01-30 Thread Linus Torvalds
On Tue, 30 Jan 2024 at 17:12, Al Viro  wrote:
>
> > + *
> > + * Note that d_revalidate is called potentially under RCU,
> > + * so it can't take the eventfs mutex etc. It's fine - if
> > + * we open a file just as it's marked dead, things will
> > + * still work just fine, and just see the old stale case.
>
> Looks like use after free, unless freeing ei is RCU-delayed...

We hold the ref to the ei in the very dentry that is doing d_revalidate().

So it should be fine. The race is with eventfs marking the ei
'is_freed' (under the mutex that we don't hold here), but when that
happens and we end up still using the dentry, the ei is still there,
all the operations are just going to fail.

 Linus



Re: [PATCH 6/6] eventfs: clean up dentry ops and add revalidate function

2024-01-30 Thread Al Viro
On Tue, Jan 30, 2024 at 11:03:55AM -0800, Linus Torvalds wrote:

> +void eventfs_d_release(struct dentry *dentry)
>  {
> - struct eventfs_inode *ei;
> -
> - mutex_lock(_mutex);
> - ei = dentry->d_fsdata;
> - if (ei) {
> - dentry->d_fsdata = NULL;
> - put_ei(ei);
> - }
> - mutex_unlock(_mutex);
> + put_ei(dentry->d_fsdata);
>  }

I'd rather pass ->d_fsdata to that sucker (or exposed put_ei(),
for that matter).

> @@ -857,6 +847,5 @@ void eventfs_remove_events_dir(struct eventfs_inode *ei)
>* sticks around while the other ei->dentry are created
>* and destroyed dynamically.
>*/
> - simple_recursive_removal(dentry, NULL);

That also needs to move earlier in the series - bisect hazard.

> + *
> + * Note that d_revalidate is called potentially under RCU,
> + * so it can't take the eventfs mutex etc. It's fine - if
> + * we open a file just as it's marked dead, things will
> + * still work just fine, and just see the old stale case.

Looks like use after free, unless freeing ei is RCU-delayed...

> + return !(ei && ei->is_freed);



Re: [PATCH 6/6] eventfs: clean up dentry ops and add revalidate function

2024-01-30 Thread Linus Torvalds
On Tue, 30 Jan 2024 at 13:25, Steven Rostedt  wrote:
>
> I actually find the dentry's sticking around when their ref counts go to
> zero a feature. Tracing applications will open and close the eventfs files
> many times. If the dentry were to be deleted on every close, it would need
> to be create over again in short spurts.

Sure. I think this is literally a tuning thing, and we'll see if
anybody ever says "the dentry cache grows too much with tracefs".

   Linus



Re: [PATCH 6/6] eventfs: clean up dentry ops and add revalidate function

2024-01-30 Thread Steven Rostedt
On Tue, 30 Jan 2024 11:03:55 -0800
Linus Torvalds  wrote:

> Another thing that might be worth doing is to make all eventfs lookups
> mark their dentries as not worth caching.  We could do that with
> d_delete(), but the DCACHE_DONTCACHE flag would likely be even better.
> 
> As it is, the dentries are all freeable, but they only tend to get freed
> at memory pressure rather than more proactively.  But that's a separate
> issue.

I actually find the dentry's sticking around when their ref counts go to
zero a feature. Tracing applications will open and close the eventfs files
many times. If the dentry were to be deleted on every close, it would need
to be create over again in short spurts.

There's some operations that will traverse the tree many times. One
operation is on reset, as there's some dependencies in the order of
disabling triggers. If there's a dependency, and a trigger is to be
disabled out of order, the disabling will fail with -EBUSY. Thus the tools
will iterate the trigger files several times until it there's no more
changes. It's not a critical path, but I rather not add more overhead to it.

-- Steve