RE: hexagon: modeling a shared lock state

Brian Cain Thu, 25 Jan 2024 08:30:12 -0800


> -----Original Message-----
> From: Richard Henderson <richard.hender...@linaro.org>
> Sent: Wednesday, January 24, 2024 6:22 PM
> To: Brian Cain <bc...@quicinc.com>; Philippe Mathieu-Daudé
> <phi...@linaro.org>
> Cc: qemu-devel@nongnu.org; Sid Manning <sidn...@quicinc.com>; Marco
> Liebel <mlie...@qti.qualcomm.com>; Matheus Bernardino
> <mathb...@qti.qualcomm.com>
> Subject: Re: hexagon: modeling a shared lock state
> 
> WARNING: This email originated from outside of Qualcomm. Please be wary of
> any links or attachments, and do not enable macros.
> 
> On 1/25/24 01:07, Brian Cain wrote:
> > Philippe,
> >
> > For hexagon sysemu, while internally reviewing patches to be upstreamed we
> noticed that
> > our design for a lock instruction is not quite suitable.  The k0lock 
> > instruction
> will halt
> > if some other hexagon hardware CPU has already claimed the kernel lock,
> only to continue
> > executing once some CPU executes the unlock instruction.  We modeled this
> using a lock
> > state enumeration member { OWNER, WAITING, UNLOCKED } in **each**
> vCPU and atomically
> > transitioning the lock required us to have vCPU[n] write the updated lock
> state to vCPU[m]
> > when unlocking.
> >
> > In context:
> >
> https://github.com/quic/qemu/blob/hexagon_sysemu_20_dec_2023/target/he
> xagon/op_helper.c#L1790-L1821
> <https://github.com/quic/qemu/blob/hexagon_sysemu_20_dec_2023/target/h
> exagon/op_helper.c#L1790-L1821>
> >
> > So instead, maybe it makes more sense to have a system device hold a single
> representation
> > of the lock’s state and each vCPU can do some kind of access via load/store
> and/or
> > interrupts to/from the device?  I was thinking of raising an interrupt on 
> > the
> lock device
> > at the vCPU’s k0lock instruction to indicate demand for the lock, and then 
> > the
> device
> > could raise an interrupt to one of the vCPUs when it’s granted the lock.  
> > One
> drawback for
> > this is that this might add significant latency to the uncontested lock 
> > case.  So
> I also
> > considered doing a load of some part of the lock device’s memory that could
> indicate
> > whether the lock was available, and if available it would claim it with a 
> > store
> (both
> > ld/st while holding BQL).  Only if unavailable it would halt and raise the
> interrupt.  Is
> > it possible to create an address space that’s independent of the true system
> memory map
> > for this use case or would we need to carve out some memory from the
> existing memory map
> > for this synthetic device?  Or – do you have a suggestion for a better
> approach overall?
> 
> I think you are over-thinking this.  A system device would just get in the 
> way.  I


Ok, great - our existing design is ~roughly like this.  But we can explore this 
-- thanks for writing this example!

> think
> you want something like
> 
>    struct CPUHexagonState {
>      ...
>      bool hwlock_pending;
>    }
> 
>    hexagon_cpu_has_work() {
>      if (cpu->hwlock_pending) {
>        return false;
>      }
>    }
> 
>    static void do_hwlock(CPUHexagonState *env, bool *lock)
>    {
>      bql_lock();
> 
>      if (*lock) {
>        env->hwlock_pending = true;
>        cs->halted = true;
>        cs->exception_index = EXCP_HALTED;
>        bql_unlock();
>        cpu_loop_exit(cs);

In place of the above - we have cpu_interrupt(cs, CPU_INTERRUPT_HALT) -- but is 
that equivalent?  Is there one idiom that's preferred over another?  Somehow it 
seems simpler if we don't need to longjmp and IIRC some of these patterns do.

>      } else {
>        *lock = true;
>        bql_unlock();
>      }
>    }
> 
>    static void do_hwunlock(CPUHexagonState *env, bool *lock)
>    {
>      CPUState *cs;
>      BQL_LOCK_GUARD();
> 
>      *lock = false;
> 
>      CPU_FOREACH(cs) {
>        if (cs->hwlock_pending) {
>          cs->hwlock_pending = false;
>          qemu_cpu_kick(cs);
>        }
>      }
>    }
> 
>    static bool k0lock, tlblock;
> 
>    void HELPER(k0lock)(CPUHexagonState *env)
>    void HELPER(tlblock)(CPUHexagonState *env)
>    void HELPER(k0unlock)(CPUHexagonState *env)
>    void HELPER(tlbunlock)(CPUHexagonState *env)
> 
> Open questions are:
> 
> (1) Do interrupts cancel lock wait?  Either way, placement in
> hexagon_cpu_has_work is key.

Yeah - they do, we will double check the interaction w has_work.

> (2) I assume that the pc will not be advanced, so that after the kick we will 
> re-
> execute
> the hwlock instruction.  There will be a thundering herd racing to grab the 
> lock
> again,
> but it saves queuing logic, which might be complicated if interrupts are
> involved.

Yes that's right, too.  Thanks!

RE: hexagon: modeling a shared lock state

Reply via email to