Hi Bertrand,

On Mon, Apr 29, 2024 at 9:20 AM Bertrand Marquis
<bertrand.marq...@arm.com> wrote:
[...]
> >> +static void notif_irq_handler(int irq, void *data)
> >> +{
> >> +    const struct arm_smccc_1_2_regs arg = {
> >> +        .a0 = FFA_NOTIFICATION_INFO_GET_64,
> >> +    };
> >> +    struct arm_smccc_1_2_regs resp;
> >> +    unsigned int id_pos;
> >> +    unsigned int list_count;
> >> +    uint64_t ids_count;
> >> +    unsigned int n;
> >> +    int32_t res;
> >> +
> >> +    do {
> >> +        arm_smccc_1_2_smc(&arg, &resp);
> >> +        res = ffa_get_ret_code(&resp);
> >> +        if ( res )
> >> +        {
> >> +            if ( res != FFA_RET_NO_DATA )
> >> +                printk(XENLOG_ERR "ffa: notification info get failed: 
> >> error %d\n",
> >> +                       res);
> >> +            return;
> >> +        }
> >> +
> >> +        ids_count = resp.a2 >> FFA_NOTIF_INFO_GET_ID_LIST_SHIFT;
> >> +        list_count = ( resp.a2 >> FFA_NOTIF_INFO_GET_ID_COUNT_SHIFT ) &
> >> +                     FFA_NOTIF_INFO_GET_ID_COUNT_MASK;
> >> +
> >> +        id_pos = 0;
> >> +        for ( n = 0; n < list_count; n++ )
> >> +        {
> >> +            unsigned int count = ((ids_count >> 2 * n) & 0x3) + 1;
> >> +            struct domain *d;
> >> +
> >> +            d = ffa_get_domain_by_vm_id(get_id_from_resp(&resp, id_pos));
> >
> > Thinking a bit more about the question from Bertrand about get_domain_id() 
> > vs rcu_lock_domain_by_id(). I am actually not sure whether either are ok 
> > here.
> >
> > If I am not mistaken, d->arch.tee will be freed as part of the domain 
> > teardown process. This means you can have the following scenario:
> >
> > CPU0: ffa_get_domain_by_vm_id() (return the domain as it is alive)
> >
> > CPU1: call domain_kill()
> > CPU1: teardown is called, free d->arch.tee (the pointer is not set to NULL)
> >
> > d->arch.tee is now a dangling pointer
> >
> > CPU0: access d->arch.tee
> >
> > This implies you may need to gain a global lock (I don't have a better idea 
> > so far) to protect the IRQ handler against domains teardown.
> >
> > Did I miss anything?
>
> I think you are right which is why I was thinking to use 
> rcu_lock_live_remote_domain_by_id to only get a reference
> to the domain if it is not dying.
>
> From the comment in sched.h:
> /*
>  * rcu_lock_domain_by_id() is more efficient than get_domain_by_id().
>  * This is the preferred function if the returned domain reference
>  * is short lived,  but it cannot be used if the domain reference needs
>  * to be kept beyond the current scope (e.g., across a softirq).
>  * The returned domain reference must be discarded using rcu_unlock_domain().
>  */
>
> Now the question of short lived should be challenged but I do not think we can
> consider the current code as "long lived".
>
> It would be a good idea to start a mailing list thread discussion with other
> maintainers on the subject to confirm.
> @Jens: as i will be off for some time, would you mind doing it ?

Sure, first I'll send out a new patch set with the current comments
addressed. I'll update to use rcu_lock_live_remote_domain_by_id() both
because it makes more sense than the current code, and also to have a
good base for the discussion.

Thanks,
Jens

Reply via email to