On Thu, Dec 29, 2022 at 7:38 PM Stanislav Fomichev <s...@google.com> wrote:
>
> On Thu, Dec 29, 2022 at 7:10 PM Alexei Starovoitov
> <alexei.starovoi...@gmail.com> wrote:
> >
> > On Thu, Dec 29, 2022 at 6:13 PM Stanislav Fomichev <s...@google.com> wrote:
> > >
> > > On Tue, Dec 27, 2022 at 8:40 AM Paul Moore <p...@paul-moore.com> wrote:
> > > >
> > > > On December 26, 2022 10:35:49 PM Stanislav Fomichev 
> > > > <stfomic...@yandex.ru>
> > > > wrote:
> > > > >> On Fri, Dec 23, 2022 at 5:49 PM Stanislav Fomichev <s...@google.com> 
> > > > >> wrote:
> > > > >> get_func_ip() */
> > > > >>>> -                               tstamp_type_access:1; /* Accessed
> > > > >>>> __sk_buff->tstamp_type */
> > > > >>>> +                               tstamp_type_access:1, /* Accessed
> > > > >>>> __sk_buff->tstamp_type */
> > > > >>>> +                               valid_id:1; /* Is 
> > > > >>>> bpf_prog::aux::__id valid? */
> > > > >>>>    enum bpf_prog_type      type;           /* Type of BPF program 
> > > > >>>> */
> > > > >>>>    enum bpf_attach_type    expected_attach_type; /* For some prog 
> > > > >>>> types */
> > > > >>>>    u32                     len;            /* Number of filter 
> > > > >>>> blocks */
> > > > >>>> @@ -1688,6 +1689,12 @@ void bpf_prog_inc(struct bpf_prog *prog);
> > > > >>>> struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct 
> > > > >>>> bpf_prog *prog);
> > > > >>>> void bpf_prog_put(struct bpf_prog *prog);
> > > > >>>>
> > > > >>>> +static inline u32 bpf_prog_get_id(const struct bpf_prog *prog)
> > > > >>>> +{
> > > > >>>> +       if (WARN(!prog->valid_id, "Attempting to use an invalid 
> > > > >>>> eBPF program"))
> > > > >>>> +               return 0;
> > > > >>>> +       return prog->aux->__id;
> > > > >>>> +}
> > > > >>>
> > > > >>> I'm still missing why we need to have this WARN and have a check at 
> > > > >>> all.
> > > > >>> IIUC, we're actually too eager in resetting the id to 0, and need to
> > > > >>> keep that stale id around at least for perf/audit.
> > > > >>> Why not have a flag only to protect against double-idr_remove
> > > > >>> bpf_prog_free_id and keep the rest as is?
> > > > >>> Which places are we concerned about that used to report id=0 but now
> > > > >>> would report stale id?
> > > > >>
> > > > >> What double-idr_remove are you concerned about?
> > > > >> bpf_prog_by_id() is doing bpf_prog_inc_not_zero
> > > > >> while __bpf_prog_put just dropped it to zero.
> > > > >
> > > > > (traveling, sending from an untested setup, hope it reaches everyone)
> > > > >
> > > > > There is a call to bpf_prog_free_id from __bpf_prog_offload_destroy 
> > > > > which
> > > > > tries to make offloaded program disappear from the idr when the netdev
> > > > > goes offline. So I'm assuming that '!prog->aux->id' check in 
> > > > > bpf_prog_free_id
> > > > > is to handle that case where we do bpf_prog_free_id much earlier than 
> > > > > the
> > > > > rest of the __bpf_prog_put stuff.
> > > > >
> > > > >> Maybe just move bpf_prog_free_id() into bpf_prog_put_deferred()
> > > > >> after perf_event_bpf_event and bpf_audit_prog ?
> > > > >> Probably can remove the obsolete do_idr_lock bool flag as
> > > > >> separate patch?
> > > > >
> > > > > +1 on removing do_idr_lock separately.
> > > > >
> > > > >> Much simpler fix and no code churn.
> > > > >> Both valid_id and saved_id approaches have flaws.
> > > > >
> > > > > Given the __bpf_prog_offload_destroy path above, we still probably 
> > > > > need
> > > > > some flag to indicate that the id has been already removed from the 
> > > > > idr?
> > > >
> > > > So what do you guys want in a patch?  Is there a consensus on what you
> > > > would merge to fix this bug/regression?
> > >
> > > Can we try the following?
> > >
> > > 1. Remove calls to bpf_prog_free_id (and bpf_map_free_id?) from
> > > kernel/bpf/offload.c; that should make it easier to reason about those
> > > '!id' checks
> >
> > calls? you mean a single call, right?
>
> Right, there is a single call to bpf_prog_free_id. But there is also
> another single call to bpf_map_free_id with the same "remove it from
> idr so it can't be found if GET_NEXT_ID" reasoning.

map offloading is different from prog offload.
Like:
        if (bpf_map_is_dev_bound(map))
                return bpf_map_offload_lookup_elem(map, key, value);

gotta be much more careful with them and offload.

> It's probably worth it to look into whether we can remove it as well
> to have consistent id management for progs and maps?

I'd rather not at this point.
Consistency sounds nice, but requires a ton more work.

> > > 2. Move bpf_prog_free_id (and bpf_map_free_id?) to happen after
> > > audit/perf in kernel/bpf/syscall.c (there are comments that say "must
> > > be called first", but I don't see why; seems like GET_FD_BY_ID would
> > > correctly return -ENOENT; maybe Martin can chime in, CC'ed him
> > > explicitly)
> >
> > The comment says that it should be removed from idr
> > before __bpf_prog_put_noref will proceed to clean up.
>
> Which one? I was trying to see if there is any reasoning in the
> original commit 34ad5580f8f9 ("bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID
> command"), but couldn't find anything useful :-(

Maybe back then we didn't have atomic_inc_not_zero(prog/map->refcnt) ?
I don't really recall what race we were worried about.

> > > 3. (optionally) Remove do_idr_lock arguments (all callers are passing 
> > > 'true')
> >
> > yes. please.

--
Linux-audit mailing list
Linux-audit@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-audit

Reply via email to