On 08.05.2024 09:10, Jens Wiklander wrote:
> On Fri, May 3, 2024 at 12:32 PM Jan Beulich <jbeul...@suse.com> wrote:
>> Furthermore, is it guaranteed that the IRQ handler won't interrupt code
>> fiddling with the domain list? I don't think it is, since
>> domlist_update_lock isn't acquired in an IRQ-safe manner. Looks like
>> you need to defer the operation on the domain until softirq or tasklet
>> context.
> 
> Thanks for the suggestion, I'm testing it as:
> static DECLARE_TASKLET(notif_sri_tasklet, notif_sri_action, NULL);
> 
> static void notif_irq_handler(int irq, void *data)
> {
>     tasklet_schedule(&notif_sri_tasklet);
> }
> 
> Where notif_sri_action() does what notif_irq_handler() did before
> (using rcu_lock_domain_by_id()).
> 
> I have one more question regarding this.
> 
> Even with the RCU lock if I understand it correctly, it's possible for
> domain_kill() to tear down the domain. Or as Julien explained it in
> another thread [3]:
>> CPU0: ffa_get_domain_by_vm_id() (return the domain as it is alive)
>>
>> CPU1: call domain_kill()
>> CPU1: teardown is called, free d->arch.tee (the pointer is not set to NULL)
>>
>> d->arch.tee is now a dangling pointer
>>
>> CPU0: access d->arch.tee
>>
>> This implies you may need to gain a global lock (I don't have a better
>> idea so far) to protect the IRQ handler against domains teardown.
> 
> I'm trying to address that (now in a tasklet) with:
>     /*
>      * domain_kill() calls ffa_domain_teardown() which will free
>      * d->arch.tee, but not set it to NULL. This can happen while holding
>      * the RCU lock.
>      *
>      * domain_lock() will stop rspin_barrier() in domain_kill(), unless
>      * we're already past rspin_barrier(), but then will d->is_dying be
>      * non-zero.
>      */
>     domain_lock(d);
>     if ( !d->is_dying )
>     {
>         struct ffa_ctx *ctx = d->arch.tee;
> 
>         ACCESS_ONCE(ctx->notif.secure_pending) = true;
>     }
>     domain_unlock(d);
> 
> It seems to work, but I'm worried I'm missing something or abusing
> domain_lock().

Well. Yes, this is one way of dealing with the issue. Yet as you suspect it
feels like an abuse of domain_lock(); that function would better be avoided
whenever possible. (It had some very unhelpful uses long ago.)

Another approach would generally be to do respective cleanup not from
underneath domain_kill(), but complete_domain_destroy(). It's not really
clear to me which of the two approaches is better in this case.

Jan

Reply via email to