On Fri, 2023-02-17 at 03:12 +0000, Teres Alexis, Alan Previn wrote:
> On Tue, 2023-02-14 at 13:38 -0800, Teres Alexis, Alan Previn wrote:
> > Add MTL's function for ARB session creation using PXP firmware
> > version 4.3 ABI structure format.
> 
> alan:snip
> 
> This isn't part of this patch today, but a new modification is required that
> would end up going into this patch --->
> 
> From the internal testing we are doing on MTL, I have noticed that the first
> time the GSC firmware is requested to init the arb session (right after a
> cold boot or a driver reload after FLR), it takes much longer. This has
> resulted in the following problematic event flow:
> 
> 1. An app or IGT calls gem-context-create to create a protected context
>    (after a fresh boot or driver reload).
> 2. intel_pxp_start begins the global teardown and recreation, where:
>       2-a: the first part (i.e. session teardown) is skipped (since the arb
>            session wasn't created before this)
>       2-b: the second part (i.e. arb session init commands via the GSC
>            firmware) does happen and takes a long time (the first time)
> 3. Step 2 is queued through a worker while the main call into
>    intel_pxp_start continues to wait for the arb session to start, and
>    finally bails out with a timeout (back up through gem-context-create).
> 4. The app retries, and now we get a second call that repeats step 1 while
>    2-b is still wrapping up. So, depending on the race between this step 4
>    (the step-1 recall) and the completion of step 2-b, we could end up
>    getting a second teardown (i.e. step 2-a going in) right after the first
>    arb-session creation completed ... even though in both cases the app just
>    wants the creation.
> 
> The simplest fix (with minimal code changes) would be to add a complementary
> "is_arb_creation_pending" flag alongside the is_arb_valid flag, with both
> remaining protected by the arb-mutex. That said, I'll respin rev6 with this
> fix along with the other mutex fix on Patch4.
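
A minimal sketch of the flag pairing proposed in the quoted text (later dropped in favor of a longer timeout), modelled in userspace C rather than kernel code: a pthread mutex stands in for the arb-mutex, and only the is_arb_valid / is_arb_creation_pending names come from the thread. The struct, enum and functions (arb_model, arb_request_start, arb_creation_done) are hypothetical and are not the i915 implementation. The point it illustrates is that a retrying caller which finds creation already pending joins the wait instead of queuing a second teardown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical model of the arb session bookkeeping. Only the
 * is_arb_valid / is_arb_creation_pending pair and the arb mutex come
 * from the discussion; all other names are illustrative.
 */
struct arb_model {
	pthread_mutex_t arb_mutex;    /* protects both flags below */
	bool is_arb_valid;            /* session created and usable */
	bool is_arb_creation_pending; /* teardown+create queued, not finished */
	int works_queued;             /* how many teardown/create works were queued */
};

enum arb_action {
	ARB_ALREADY_VALID,   /* nothing to do */
	ARB_JOIN_PENDING,    /* creation already in flight: just wait for it */
	ARB_QUEUED_CREATION, /* this caller queued the teardown + create work */
};

static void arb_model_init(struct arb_model *m)
{
	pthread_mutex_init(&m->arb_mutex, NULL);
	m->is_arb_valid = false;
	m->is_arb_creation_pending = false;
	m->works_queued = 0;
}

/* Start path: a retrying caller must not queue a second teardown
 * while the first creation is still running in the GSC firmware. */
static enum arb_action arb_request_start(struct arb_model *m)
{
	enum arb_action action;

	pthread_mutex_lock(&m->arb_mutex);
	if (m->is_arb_valid) {
		action = ARB_ALREADY_VALID;
	} else if (m->is_arb_creation_pending) {
		action = ARB_JOIN_PENDING;
	} else {
		m->is_arb_creation_pending = true;
		m->works_queued++; /* stands in for queuing the worker */
		action = ARB_QUEUED_CREATION;
	}
	pthread_mutex_unlock(&m->arb_mutex);
	return action;
}

/* Worker completion: clear the pending flag and publish the result. */
static void arb_creation_done(struct arb_model *m, bool success)
{
	pthread_mutex_lock(&m->arb_mutex);
	assert(m->is_arb_creation_pending); /* only a queued creation completes */
	m->is_arb_creation_pending = false;
	m->is_arb_valid = success;
	pthread_mutex_unlock(&m->arb_mutex);
}
```

In this model the step-4 retry sees ARB_JOIN_PENDING, so the worker is queued only once and no second teardown can land after the first creation.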

After additional offline discussions with Daniele, we've decided against adding
more complexity. Instead, we'll get the official timeout spec from the
gsc-firmware and bump up the arb-session creation timeout across the
call-stack to ensure it's sufficient; if it still fails, we return -ENODEV,
indicating that we do not have PXP support. Although this will block gem
protected-context creation, it will not block other apps, only the ones
creating protected contexts, which would end up waiting somewhere for the gsc
fw (from an e2e system level) no matter what design we employ. So bumping our
timeout, with a hard -ENODEV on failure, seems like the most straightforward
approach.
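
A minimal sketch of the chosen approach, again in userspace C under stated assumptions: wait for arb-session creation with a bounded timeout and map both a timeout and a firmware-reported failure to -ENODEV. The timeout is left as a parameter because the thread does not give the official GSC firmware value. arb_waiter, arb_waiter_complete and arb_wait_for_session are hypothetical names, and a pthread condition variable stands in for whatever kernel wait primitive the driver actually uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical waiter for the arb-session creation result. */
struct arb_waiter {
	pthread_mutex_t lock;
	pthread_cond_t done_cv;
	bool done;         /* creation attempt finished (success or not) */
	bool is_arb_valid; /* result reported by the (modelled) firmware */
};

static void arb_waiter_init(struct arb_waiter *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->done_cv, NULL);
	w->done = false;
	w->is_arb_valid = false;
}

/* Called by the (modelled) worker once the GSC fw call returns. */
static void arb_waiter_complete(struct arb_waiter *w, bool valid)
{
	pthread_mutex_lock(&w->lock);
	w->done = true;
	w->is_arb_valid = valid;
	pthread_cond_broadcast(&w->done_cv);
	pthread_mutex_unlock(&w->lock);
}

/* Wait up to timeout_ms; 0 on success, -ENODEV on timeout or failure. */
static int arb_wait_for_session(struct arb_waiter *w, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	int ret;

	assert(timeout_ms >= 0);
	/* Default condvar attributes measure the deadline on CLOCK_REALTIME. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&w->lock);
	/* Loop to absorb spurious wakeups; stop on completion or ETIMEDOUT. */
	while (!w->done && rc == 0)
		rc = pthread_cond_timedwait(&w->done_cv, &w->lock, &deadline);
	/* Timeout and fw failure are both reported as "no PXP support". */
	ret = (w->done && w->is_arb_valid) ? 0 : -ENODEV;
	pthread_mutex_unlock(&w->lock);
	return ret;
}
```

In this sketch a caller on the protected-context creation path would propagate the -ENODEV unchanged, so only protected-context creation sees the failure; other apps never enter this wait.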
