[PATCH 13/14] tracefs: Convert to new uid/gid option parsing helpers

2024-06-27 Thread Eric Sandeen
Convert to new uid/gid option parsing helpers

Signed-off-by: Eric Sandeen 
---
 fs/tracefs/inode.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 7c29f4afc23d..1028ab6d9a74 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -296,9 +296,9 @@ enum {
 };
 
 static const struct fs_parameter_spec tracefs_param_specs[] = {
-   fsparam_u32 ("gid", Opt_gid),
+   fsparam_gid ("gid", Opt_gid),
fsparam_u32oct  ("mode",Opt_mode),
-   fsparam_u32 ("uid", Opt_uid),
+   fsparam_uid ("uid", Opt_uid),
{}
 };
 
@@ -306,8 +306,6 @@ static int tracefs_parse_param(struct fs_context *fc, struct fs_parameter *param)
 {
struct tracefs_fs_info *opts = fc->s_fs_info;
struct fs_parse_result result;
-   kuid_t uid;
-   kgid_t gid;
int opt;
 
	opt = fs_parse(fc, tracefs_param_specs, param, &result);
@@ -316,16 +314,10 @@ static int tracefs_parse_param(struct fs_context *fc, struct fs_parameter *param)
 
switch (opt) {
case Opt_uid:
-   uid = make_kuid(current_user_ns(), result.uint_32);
-   if (!uid_valid(uid))
-   return invalf(fc, "Unknown uid");
-   opts->uid = uid;
+   opts->uid = result.uid;
break;
case Opt_gid:
-   gid = make_kgid(current_user_ns(), result.uint_32);
-   if (!gid_valid(gid))
-   return invalf(fc, "Unknown gid");
-   opts->gid = gid;
+   opts->gid = result.gid;
break;
case Opt_mode:
opts->mode = result.uint_32 & S_IALLUGO;
-- 
2.45.2
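
For context, this is roughly what the new helper centralizes. Below is a sketch
of an fs_param_is_uid()-style parser (the name follows the existing fsparam_*
convention; the exact fs_parse.c implementation added by this series may
differ), so each filesystem no longer open-codes the make_kuid()/uid_valid()
dance in its ->parse_param():

int fs_param_is_uid(struct p_log *log, const struct fs_parameter_spec *p,
		    struct fs_parameter *param, struct fs_parse_result *result)
{
	kuid_t uid;

	/* reuse the existing u32 parser for the raw numeric value */
	if (fs_param_is_u32(log, p, param, result))
		return -EINVAL;

	uid = make_kuid(current_user_ns(), result->uint_32);
	if (!uid_valid(uid))
		return inval_plog(log, "Invalid uid '%s'", param->string);

	/* result->uid is the new fs_parse_result field the conversion relies on */
	result->uid = uid;
	return 0;
}

With that in place, a filesystem's ->parse_param() shrinks to the single
assignment shown in the diff above, and invalid ids are rejected consistently
across filesystems.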





Re: [PATCH 06/12] uprobes: add batch uprobe register/unregister APIs

2024-06-27 Thread Andrii Nakryiko
On Thu, Jun 27, 2024 at 6:04 AM Masami Hiramatsu  wrote:
>
> On Mon, 24 Jun 2024 17:21:38 -0700
> Andrii Nakryiko  wrote:
>
> > -static int __uprobe_register(struct inode *inode, loff_t offset,
> > -			     loff_t ref_ctr_offset, struct uprobe_consumer *uc)
> > +int uprobe_register_batch(struct inode *inode, int cnt,
> > +   uprobe_consumer_fn get_uprobe_consumer, void *ctx)
>
> Is this interface just for avoiding memory allocation? Can't we just
> allocate a temporary array of *uprobe_consumer instead?

Yes, exactly, to avoid the need for allocating another array that
would just contain pointers to uprobe_consumer. Consumers would never
just have an array of `struct uprobe_consumer *`, because
uprobe_consumer struct is embedded in some other struct, so the array
interface isn't the most convenient.

If you feel strongly, I can do an array, but this necessitates
allocating an extra array *and keeping it* for the entire duration of
BPF multi-uprobe link (attachment) existence, so it feels like a
waste. This is because we don't want to do anything that can fail in
the detachment logic (so no temporary array allocation there).

Anyways, let me know how you feel about keeping this callback.
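
As a concrete illustration (hypothetical names; the uprobe_consumer_fn
signature is assumed here to be struct uprobe_consumer *(*)(size_t idx,
void *ctx)): the consumer struct is embedded in a caller-owned wrapper, so
the callback simply returns the i-th embedded consumer and no extra pointer
array has to be allocated or kept alive for the lifetime of the attachment:

#include <linux/uprobes.h>

struct bpf_uprobe {				/* hypothetical wrapper */
	struct uprobe_consumer consumer;	/* embedded, not a pointer */
	loff_t offset;
};

struct bpf_uprobe_link {			/* hypothetical attachment object */
	struct bpf_uprobe *uprobes;		/* cnt elements, owned by the link */
	int cnt;
};

static struct uprobe_consumer *link_uprobe_consumer(size_t idx, void *ctx)
{
	struct bpf_uprobe_link *link = ctx;

	return &link->uprobes[idx].consumer;
}

/* attach path: uprobe_register_batch(inode, link->cnt, link_uprobe_consumer, link); */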

>
> Thank you,
>
> --
> Masami Hiramatsu (Google) 



Re: [PATCH 04/12] uprobes: revamp uprobe refcounting and lifetime management

2024-06-27 Thread Andrii Nakryiko
On Wed, Jun 26, 2024 at 7:30 PM Masami Hiramatsu  wrote:
>
> On Mon, 24 Jun 2024 17:21:36 -0700
> Andrii Nakryiko  wrote:
>
> > Anyways, under exclusive writer lock, we double-check that refcount
> > didn't change and is still zero. If it is, we proceed with destruction,
> > because at that point we have a guarantee that find_active_uprobe()
> > can't successfully look up this uprobe instance, as it's going to be
> > removed in destructor under writer lock. If, on the other hand,
> > find_active_uprobe() managed to bump refcount from zero to one in
> > between put_uprobe()'s atomic_dec_and_test(&uprobe->ref) and
> > write_lock(&uprobes_treelock), we'll deterministically detect this with
> > extra atomic_read(&uprobe->ref) check, and if it doesn't hold, we
> > pretend like atomic_dec_and_test() never returned true. There is no
> > resource freeing or any other irreversible action taken up till this
> > point, so we just exit early.
> >
> > One tricky part in the above is actually two CPUs racing and dropping
> > refcnt to zero, and then attempting to free resources. This can happen
> > as follows:
> >   - CPU #0 drops refcnt from 1 to 0, and proceeds to grab uprobes_treelock;
> >   - before CPU #0 grabs a lock, CPU #1 updates refcnt as 0 -> 1 -> 0, at
> > which point it decides that it needs to free uprobe as well.
> >
> > At this point both CPU #0 and CPU #1 will believe they need to destroy
> > uprobe, which is obviously wrong. To prevent this situation, we augment
> > refcount with epoch counter, which is always incremented by 1 on either
> > get or put operation. This allows those two CPUs above to disambiguate
> > who should actually free uprobe (it's the CPU #1, because it has
> > up-to-date epoch). See comments in the code and note the specific values
> > of UPROBE_REFCNT_GET and UPROBE_REFCNT_PUT constants. Keep in mind that
> > a single atomic64_t is actually two sort-of-independent 32-bit counters
> > that are incremented/decremented with a single atomic64_add_return()
> > operation. Note also a small and extremely rare (and thus having no
> > effect on performance) need to clear the highest bit every 2 billion
> > get/put operations to prevent high 32-bit counter from "bleeding over"
> > into lower 32-bit counter.
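
To make the packing concrete, here is one way such a two-halves-in-one-atomic64
scheme could look; the layout (refcount in the high 32 bits, epoch in the low 32
bits), the constant values, and the helper names below are assumptions for
illustration, not copied from the patch:

#define UPROBE_REFCNT_GET	((1LL << 32) + 1)	/* ref += 1, epoch += 1 */
#define UPROBE_REFCNT_PUT	(1LL - (1LL << 32))	/* ref -= 1, epoch += 1 */

static void put_uprobe_sketch(struct uprobe *uprobe)	/* uprobe->ref: atomic64_t */
{
	s64 v = atomic64_add_return(UPROBE_REFCNT_PUT, &uprobe->ref);

	if ((u32)(v >> 32))		/* refcount half still non-zero */
		return;

	write_lock(&uprobes_treelock);
	/*
	 * Re-check under the lock: any racing get/put bumped the epoch half,
	 * so a stale CPU sees a different 64-bit value and backs off instead
	 * of freeing; only the CPU holding the newest epoch proceeds.
	 * (Periodically clearing the epoch's top bit keeps it from ever
	 * carrying into the refcount half, per the changelog above.)
	 */
	if (atomic64_read(&uprobe->ref) == v)
		__uprobe_destroy(uprobe);	/* hypothetical destructor */
	write_unlock(&uprobes_treelock);
}

Note that, as sketched, nothing keeps the memory itself alive between the
add_return and the re-read under the lock, which is exactly the gap raised
below and the motivation for the RCU-delayed free discussed further down.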
>
> I have a question here.
> Is there any chance for CPU#1 to put the uprobe before CPU#0 gets
> the uprobes_treelock, and to free the uprobe before CPU#0 validates
> uprobe->ref again? e.g.
>
> CPU#0                                  CPU#1
>
> put_uprobe() {
>     atomic64_add_return()
>                                        __get_uprobe();
>                                        put_uprobe() {
>                                            kfree(uprobe)
>                                        }
>     write_lock(&uprobes_treelock);
>     atomic64_read(&uprobe->ref);
> }
>
> I think it is very rare case, but I could not find any code to prevent
> this scenario.
>

Yes, I think you are right, great catch!

I concentrated on preventing double kfree() in this situation, and
somehow convinced myself that eager kfree() is fine. But I think I'll
need to delay freeing, probably with RCU. The problem is that we can't
use rcu_read_lock()/rcu_read_unlock() because we take locks, so it has
to be a sleepable variant of RCU. I'm thinking of using
rcu_read_lock_trace(), the same variant of RCU we use for sleepable
BPF programs (including sleepable uprobes). srcu might be too heavy
for this.

I'll try a few variants over the next few days and see how the
performance looks.
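
A minimal sketch of what the delayed free could look like with RCU Tasks
Trace, assuming struct uprobe grows a struct rcu_head member named 'rcu'
(this is one possible shape, not the eventual patch):

#include <linux/rcupdate_trace.h>
#include <linux/slab.h>

static void uprobe_free_rcu(struct rcu_head *rcu)
{
	struct uprobe *uprobe = container_of(rcu, struct uprobe, rcu);

	kfree(uprobe);
}

static void uprobe_destroy_sketch(struct uprobe *uprobe)
{
	/* tree removal etc. still happens under uprobes_treelock ... */

	/* ... but the memory outlives any rcu_read_lock_trace() reader */
	call_rcu_tasks_trace(&uprobe->rcu, uprobe_free_rcu);
}

Lookups such as find_active_uprobe() would then run inside
rcu_read_lock_trace()/rcu_read_unlock_trace(), so the racy re-read in
put_uprobe() can no longer touch freed memory.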

> Thank you,
>
>
> --
> Masami Hiramatsu (Google) 
>



Re: [PATCH v1 4/4] trace: platform/x86/intel/ifs: Add SBAF trace support

2024-06-27 Thread Kuppuswamy Sathyanarayanan


On 6/27/24 6:56 AM, Steven Rostedt wrote:
> On Thu, 27 Jun 2024 02:35:16 +
> Kuppuswamy Sathyanarayanan  wrote:
>
>> From: Jithu Joseph 
>>
>> Add tracing support for the SBAF IFS tests, which may be useful for
>> debugging systems that fail these tests. Log details like test content
>> batch number, SBAF bundle ID, program index and the exact errors or
>> warnings encountered by each HT thread during the test.
>>
>> Reviewed-by: Ashok Raj 
>> Reviewed-by: Tony Luck 
>> Signed-off-by: Jithu Joseph 
>> Signed-off-by: Kuppuswamy Sathyanarayanan 
>> 
>> ---
>>  include/trace/events/intel_ifs.h | 27 +++++++++++++++++++++++++++
>>  drivers/platform/x86/intel/ifs/runtest.c |  1 +
>>  2 files changed, 28 insertions(+)
>>
>> diff --git a/include/trace/events/intel_ifs.h b/include/trace/events/intel_ifs.h
>> index 0d88ebf2c980..9c7413de432b 100644
>> --- a/include/trace/events/intel_ifs.h
>> +++ b/include/trace/events/intel_ifs.h
>> @@ -35,6 +35,33 @@ TRACE_EVENT(ifs_status,
>>  __entry->status)
>>  );
>>  
>> +TRACE_EVENT(ifs_sbaf,
>> +
>> +TP_PROTO(int batch, union ifs_sbaf activate, union ifs_sbaf_status status),
>> +
>> +TP_ARGS(batch, activate, status),
>> +
>> +TP_STRUCT__entry(
>> +__field(int,batch   )
>> +__field(u64,status  )
> Please put the 64 bit status field before the 32 bit batch field,
> otherwise this will likely create a 4 byte hole between the two fields.
> Space on the ring buffer is expensive real-estate.

Agree. I will fix this in next version.
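
Presumably the v2 layout will look something like this, with the u64 first and
the int plus the two u16s packing into the following 8 bytes with no hole (a
sketch of the reordering, not the posted patch):

	TP_STRUCT__entry(
		__field(	u64,	status	)
		__field(	int,	batch	)
		__field(	u16,	bundle	)
		__field(	u16,	pgm	)
	),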

>
> -- Steve
>
>> +__field(u16,bundle  )
>> +__field(u16,pgm )
>> +),
>> +
>> +TP_fast_assign(
>> +__entry->batch  = batch;
>> +__entry->bundle = activate.bundle_idx;
>> +__entry->pgm= activate.pgm_idx;
>> +__entry->status = status.data;
>> +),
>> +
>> +TP_printk("batch: 0x%.2x, bundle_idx: 0x%.4x, pgm_idx: 0x%.4x, status: 
>> 0x%.16llx",
>> +__entry->batch,
>> +__entry->bundle,
>> +__entry->pgm,
>> +__entry->status)
>> +);
>> +
>>  #endif /* _TRACE_IFS_H */
>>  
>>  /* This part must be outside protection */
>> diff --git a/drivers/platform/x86/intel/ifs/runtest.c b/drivers/platform/x86/intel/ifs/runtest.c
>> index bdb31b2f45b4..69ee0eb72025 100644
>> --- a/drivers/platform/x86/intel/ifs/runtest.c
>> +++ b/drivers/platform/x86/intel/ifs/runtest.c
>> @@ -530,6 +530,7 @@ static int dosbaf(void *data)
>>   */
>>  wrmsrl(MSR_ACTIVATE_SBAF, run_params->activate->data);
>>  rdmsrl(MSR_SBAF_STATUS, status.data);
>> +trace_ifs_sbaf(ifsd->cur_batch, *run_params->activate, status);
>>  
>>  /* Pass back the result of the test */
>>  if (cpu == first)

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer




Re: [PATCH v1 4/4] trace: platform/x86/intel/ifs: Add SBAF trace support

2024-06-27 Thread Steven Rostedt
On Thu, 27 Jun 2024 02:35:16 +
Kuppuswamy Sathyanarayanan  wrote:

> From: Jithu Joseph 
> 
> Add tracing support for the SBAF IFS tests, which may be useful for
> debugging systems that fail these tests. Log details like test content
> batch number, SBAF bundle ID, program index and the exact errors or
> warnings encountered by each HT thread during the test.
> 
> Reviewed-by: Ashok Raj 
> Reviewed-by: Tony Luck 
> Signed-off-by: Jithu Joseph 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 
> ---
>  include/trace/events/intel_ifs.h | 27 +++++++++++++++++++++++++++
>  drivers/platform/x86/intel/ifs/runtest.c |  1 +
>  2 files changed, 28 insertions(+)
> 
> diff --git a/include/trace/events/intel_ifs.h b/include/trace/events/intel_ifs.h
> index 0d88ebf2c980..9c7413de432b 100644
> --- a/include/trace/events/intel_ifs.h
> +++ b/include/trace/events/intel_ifs.h
> @@ -35,6 +35,33 @@ TRACE_EVENT(ifs_status,
>   __entry->status)
>  );
>  
> +TRACE_EVENT(ifs_sbaf,
> +
> + TP_PROTO(int batch, union ifs_sbaf activate, union ifs_sbaf_status status),
> +
> + TP_ARGS(batch, activate, status),
> +
> + TP_STRUCT__entry(
> + __field(int,batch   )
> + __field(u64,status  )

Please put the 64 bit status field before the 32 bit batch field,
otherwise this will likely create a 4 byte hole between the two fields.
Space on the ring buffer is expensive real-estate.

-- Steve

> + __field(u16,bundle  )
> + __field(u16,pgm )
> + ),
> +
> + TP_fast_assign(
> + __entry->batch  = batch;
> + __entry->bundle = activate.bundle_idx;
> + __entry->pgm= activate.pgm_idx;
> + __entry->status = status.data;
> + ),
> +
> + TP_printk("batch: 0x%.2x, bundle_idx: 0x%.4x, pgm_idx: 0x%.4x, status: 
> 0x%.16llx",
> + __entry->batch,
> + __entry->bundle,
> + __entry->pgm,
> + __entry->status)
> +);
> +
>  #endif /* _TRACE_IFS_H */
>  
>  /* This part must be outside protection */
> diff --git a/drivers/platform/x86/intel/ifs/runtest.c b/drivers/platform/x86/intel/ifs/runtest.c
> index bdb31b2f45b4..69ee0eb72025 100644
> --- a/drivers/platform/x86/intel/ifs/runtest.c
> +++ b/drivers/platform/x86/intel/ifs/runtest.c
> @@ -530,6 +530,7 @@ static int dosbaf(void *data)
>*/
>   wrmsrl(MSR_ACTIVATE_SBAF, run_params->activate->data);
>   rdmsrl(MSR_SBAF_STATUS, status.data);
> + trace_ifs_sbaf(ifsd->cur_batch, *run_params->activate, status);
>  
>   /* Pass back the result of the test */
>   if (cpu == first)




Re: [PATCH 06/12] uprobes: add batch uprobe register/unregister APIs

2024-06-27 Thread Google
On Mon, 24 Jun 2024 17:21:38 -0700
Andrii Nakryiko  wrote:

> -static int __uprobe_register(struct inode *inode, loff_t offset,
> -  loff_t ref_ctr_offset, struct uprobe_consumer *uc)
> +int uprobe_register_batch(struct inode *inode, int cnt,
> +   uprobe_consumer_fn get_uprobe_consumer, void *ctx)

Is this interface just for avoiding memory allocation? Can't we just
allocate a temporary array of *uprobe_consumer instead? 

Thank you,

-- 
Masami Hiramatsu (Google)