Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-31 Thread Tvrtko Ursulin



On 30/03/2023 20:44, Teres Alexis, Alan Previn wrote:

On Thu, 2023-03-30 at 13:25 +0100, Tvrtko Ursulin wrote:

On 30/03/2023 01:10, Teres Alexis, Alan Previn wrote:

On Wed, 2023-03-29 at 08:43 +0100, Tvrtko Ursulin wrote:

On 28/03/2023 18:52, Rodrigo Vivi wrote:

On Tue, Mar 28, 2023 at 05:01:36PM +, Teres Alexis, Alan Previn wrote:

On Mon, 2023-03-27 at 17:15 +0100, Tvrtko Ursulin wrote:

alan:snip (excuse my snips - my evolution keeps inserting CRs - still looking 
for solution)

But intuitively I thought that what Mesa wants is a no-cost getparam
which would somewhat reliably tell it if the feature is supposed to be
there and context create at a later stage, with the protected flag set,
is supposed to work. AFAIU it can still fail at that point or probably
block until the required setup is done.

Yes - that's right - i had another round of discussions with Daniele about a 
cleaner approach - below..
alan:snip

Even 200ms is possibly not good enough since boot time targets (to UI
AFAIR) are pretty tight. Don't know... Maybe I'd need a timeline diagram
showing the involved components to understand this properly.

Absolutely, my experiences in i915 on embedded products even had PORs of 
<1000milisec to first-fully-renderered-display from cold-boot so yes, we need 
to work with this requirement
in mind and do testing on real customer stack.

I spoke to Daniele and we have another idea - but would also impact mesa, for 
the better:

1. Introduce get-param (is_PXP_avail)
- will return a simple yes or no
- yes means : i915-device-info supports it, kernel configs 
supports it and required-firmwares were found (not necessarily loaded/init yet).
(NOTE: this would be made to hook up to pxp helpers 
such as intel_pxp_is_supported)
2. Gem-pxp-context-creation continues blocking like today with minor tweak:
(same)- success = all dependencies are in place, all firmware init 
completed, pxp arb session successfully completed.
(same)- non-success -ENODEV = if any dependency wasnt available or fw 
failed to create arb-session due to fw-init-failure/BIOS/platform config.
(tweak)- non-success -ENXIO (or some other -E'FOO') if 
component-driver-init or firmware-init is still pending after brief timeout.
- on timeout - TBD - need testing/debug on real world stack.
- UAPI spec needs update but pxp implementation currently uses 
-ENXIO for similiar reason inheritted first merge.

Thus, with this: Get param would always be immediate. Pxp-context-creation 
would only block when all dependencies are in place and we attempt to create 
the pxp arb session.
(firmware can take up to 200-milisecs, according to MTL spec, so I'd say ~210 
given other overheads between i915 and fw and back).
We would need to change MESA-get-caps to use get-param (and not 
pxp-context-creation) as it would always return immediately with kernel side 
support.
And if application explicitly requires PXP support, then it needs to call 
pxp-context-creation that may block or require retry.


The above sounds good to me.

I am only not 100% clear on the ENODEV option from context create, does 
it include even things which can be detected without any timeouts at 
probe time, or just failures which take time to learn about.



WRT to fast-boot-to-first-frame, I am hoping real customer stack doesn't 
require PXP on the compositor and first mesa instance works fine without PXP 
caps.
And when customer apps that needs PXP starts, it would create pxp context which 
would block but the app would not have a choice.


Yeah that sounds like an unlikely use case and one that we cannot 
improve on the kernel or uapi side.


(I can imagine resuming directly into a full screen video playback post 
suspend, but a cold boot into it is a stretch.)


Regards,

Tvrtko


Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-30 Thread Teres Alexis, Alan Previn
On Thu, 2023-03-30 at 13:25 +0100, Tvrtko Ursulin wrote:
> On 30/03/2023 01:10, Teres Alexis, Alan Previn wrote:
> > On Wed, 2023-03-29 at 08:43 +0100, Tvrtko Ursulin wrote:
> > > On 28/03/2023 18:52, Rodrigo Vivi wrote:
> > > > On Tue, Mar 28, 2023 at 05:01:36PM +, Teres Alexis, Alan Previn 
> > > > wrote:
> > > > > On Mon, 2023-03-27 at 17:15 +0100, Tvrtko Ursulin wrote:
alan:snip (excuse my snips - my evolution keeps inserting CRs - still looking 
for solution)
> But intuitively I thought that what Mesa wants is a no-cost getparam 
> which would somewhat reliably tell it if the feature is supposed to be 
> there and context create at a later stage, with the protected flag set, 
> is supposed to work. AFAIU it can still fail at that point or probably 
> block until the required setup is done.
Yes - that's right - i had another round of discussions with Daniele about a 
cleaner approach - below.. 
alan:snip
> Even 200ms is possibly not good enough since boot time targets (to UI 
> AFAIR) are pretty tight. Don't know... Maybe I'd need a timeline diagram 
> showing the involved components to understand this properly.
Absolutely, my experiences in i915 on embedded products even had PORs of 
<1000milisec to first-fully-renderered-display from cold-boot so yes, we need 
to work with this requirement
in mind and do testing on real customer stack.

I spoke to Daniele and we have another idea - but would also impact mesa, for 
the better:

1. Introduce get-param (is_PXP_avail)
- will return a simple yes or no
- yes means : i915-device-info supports it, kernel configs 
supports it and required-firmwares were found (not necessarily loaded/init yet).
(NOTE: this would be made to hook up to pxp helpers 
such as intel_pxp_is_supported)
2. Gem-pxp-context-creation continues blocking like today with minor tweak:
(same)- success = all dependencies are in place, all firmware init 
completed, pxp arb session successfully completed.
(same)- non-success -ENODEV = if any dependency wasnt available or fw 
failed to create arb-session due to fw-init-failure/BIOS/platform config.
(tweak)- non-success -ENXIO (or some other -E'FOO') if 
component-driver-init or firmware-init is still pending after brief timeout.
- on timeout - TBD - need testing/debug on real world stack.
- UAPI spec needs update but pxp implementation currently uses 
-ENXIO for similiar reason inheritted first merge.

Thus, with this: Get param would always be immediate. Pxp-context-creation 
would only block when all dependencies are in place and we attempt to create 
the pxp arb session.
(firmware can take up to 200-milisecs, according to MTL spec, so I'd say ~210 
given other overheads between i915 and fw and back).
We would need to change MESA-get-caps to use get-param (and not 
pxp-context-creation) as it would always return immediately with kernel side 
support.
And if application explicitly requires PXP support, then it needs to call 
pxp-context-creation that may block or require retry.
WRT to fast-boot-to-first-frame, I am hoping real customer stack doesn't 
require PXP on the compositor and first mesa instance works fine without PXP 
caps.
And when customer apps that needs PXP starts, it would create pxp context which 
would block but the app would not have a choice.

..alan


Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-30 Thread Tvrtko Ursulin



On 30/03/2023 01:10, Teres Alexis, Alan Previn wrote:


On Wed, 2023-03-29 at 08:43 +0100, Tvrtko Ursulin wrote:

On 28/03/2023 18:52, Rodrigo Vivi wrote:

On Tue, Mar 28, 2023 at 05:01:36PM +, Teres Alexis, Alan Previn wrote:

On Mon, 2023-03-27 at 17:15 +0100, Tvrtko Ursulin wrote:


alan:snip

How will the context create path look like on those platforms:

1. Block, then potentially error out if the full initialization failed.
2. Error out "in progress" while initializing, error out "something
else" if initialization failed.


alan:i was thinking of taking a page from huc-authentication's get-param where 
we could return different values based on startup progress - in all cases we 
never block:


If context create never blocks then the only added value of new getparam 
is just granularity of reported statuses, versus potential overload of 
errnos from context create?



  1. we dont support it in hw/kernel (i.e. not pxp in device-info or not enough 
CONFIG_foo - reusing intel_pxp_is_supported?)
  2. we support it in kernel but internal dependencies are still in progress 
(i.e. we have not yet completed huc-loading/huc-authen/proxy-init - UAPI spec 
should include how many
max seconds delay per platform)
  3. we support it in kernel but internal dependencies failed (i.e. we know 
huc-load/authent. failed ... or we know proxy-init failed).
  4. we support it in kernel but platform has no support (at this stage we 
actually attempt to create a PXP context or create the arb-session from inside 
i915-get-param but we ended
up a PXP fw error indicating select list of failures such as fusing / 
BIOS-config / wrong-version.
  0. we support it completely i.e. step 4's attempt to create active PXP 
session succeeded

I want to differentiate 3 and 4 (as opposed to return x-means-ENODEV) because i 
have am sure it will save debug time when facing customer issues.
Ofc we will have to optimize the checking sequences above but at #4, we would 
be creating a session which might take up to ~200 milisecs for the round trip 
response from fw.


Not sure I get this. If getparam is trying to set this up, which is 
possibly questionable in itself, where it needs to block for 200ms 
(max?), and nothing else blocks, then where is the missing max delay 
mentioned as a starting point for the discussion? Is it expected to 
elapse while userspace is repeatedly calling getparam and it is getting 
some "in progress" value?



We could store a flag in i915-pxp-internal-struct to indicate if we ever did 
succeed a pxp creation after a fresh boot ... intel_pxp_is_ready_for_active()?
... true only if we ever did allocate a slot successfully at least once since 
boot.
This also ensure mesa init will return almost immediately except at the first 
time when hitting #4.


Even 200ms is possibly not good enough since boot time targets (to UI 
AFAIR) are pretty tight. Don't know... Maybe I'd need a timeline diagram 
showing the involved components to understand this properly.


But intuitively I thought that what Mesa wants is a no-cost getparam 
which would somewhat reliably tell it if the feature is supposed to be 
there and context create at a later stage, with the protected flag set, 
is supposed to work. AFAIU it can still fail at that point or probably 
block until the required setup is done.


Regards,

Tvrtko


Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-29 Thread Teres Alexis, Alan Previn

On Wed, 2023-03-29 at 08:43 +0100, Tvrtko Ursulin wrote:
> On 28/03/2023 18:52, Rodrigo Vivi wrote:
> > On Tue, Mar 28, 2023 at 05:01:36PM +, Teres Alexis, Alan Previn wrote:
> > > On Mon, 2023-03-27 at 17:15 +0100, Tvrtko Ursulin wrote:
> > > 
alan:snip
> How will the context create path look like on those platforms:
> 
> 1. Block, then potentially error out if the full initialization failed.
> 2. Error out "in progress" while initializing, error out "something 
> else" if initialization failed.
> 
alan:i was thinking of taking a page from huc-authentication's get-param where 
we could return different values based on startup progress - in all cases we 
never block:
 1. we dont support it in hw/kernel (i.e. not pxp in device-info or not enough 
CONFIG_foo - reusing intel_pxp_is_supported?)
 2. we support it in kernel but internal dependencies are still in progress 
(i.e. we have not yet completed huc-loading/huc-authen/proxy-init - UAPI spec 
should include how many
max seconds delay per platform)
 3. we support it in kernel but internal dependencies failed (i.e. we know 
huc-load/authent. failed ... or we know proxy-init failed).
 4. we support it in kernel but platform has no support (at this stage we 
actually attempt to create a PXP context or create the arb-session from inside 
i915-get-param but we ended
up a PXP fw error indicating select list of failures such as fusing / 
BIOS-config / wrong-version.
 0. we support it completely i.e. step 4's attempt to create active PXP session 
succeeded

I want to differentiate 3 and 4 (as opposed to return x-means-ENODEV) because i 
have am sure it will save debug time when facing customer issues.
Ofc we will have to optimize the checking sequences above but at #4, we would 
be creating a session which might take up to ~200 milisecs for the round trip 
response from fw.
We could store a flag in i915-pxp-internal-struct to indicate if we ever did 
succeed a pxp creation after a fresh boot ... intel_pxp_is_ready_for_active()? 
... true only if we ever did allocate a slot successfully at least once since 
boot.
This also ensure mesa init will return almost immediately except at the first 
time when hitting #4.


Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-29 Thread Tvrtko Ursulin



On 28/03/2023 18:52, Rodrigo Vivi wrote:

On Tue, Mar 28, 2023 at 05:01:36PM +, Teres Alexis, Alan Previn wrote:

On Mon, 2023-03-27 at 17:15 +0100, Tvrtko Ursulin wrote:

These two:
e6177ec586d1 ("drm/i915/huc: stall media submission until HuC is loaded")
b76c14c8fb2a ("drm/i915/huc: better define HuC status getparam possible return 
values.")
They do not help here? It is not possible to use or extend the refined 
I915_PARAM_HUC_STATUS return values combined with huc load fence for this all 
to keep working?

Checking is-huc-loaded won't reflect is-pxp-available (in case fw/fusing isn't 
allowing it). But the connection between them is hw-internal so i915 asking 
PXP-fw to attempt a PXP
session depends on HuC (and the 3 other things i mentioned). However, Tvrtko's 
point on using fences-or-equivalent is the same thing John Harrison brought up 
offline with Daniele
as the proper kernel way to do this type of dependency checking. However, any 
form of dependency-checking won't improve UMD's experience. We still need to 
decide if i915-PXP should
wait-in-kernel or return some-new-spec-error. A useful data point: we are 
debugging a related issue on actual customer stack. The compositor using mesa 
is hitting this code path
very early.. even before gsc-proxy component is loaded and we are trying to 
figure out why delaying inside intel_pxp_start is not helping (more delays 
causes the gsc-proxy
component to also get delayed) but that is a different conversation. I'm only 
mentioning this because we have a strict requirement to get the desktop login 
window up within 1-2
seconds of bootloader->kernel handoff. That said, if use -EAGAIN, I'm not sure 
if that would work as it would delay the compositor startup beyond the typical end 
user experience
unless MESA has a timeout to give up on a cap-testing when seeing repeated 
-EAGAIN (doubt mesa folks like this?). Perhaps we could just immediately return 
with a different error
(instead of current PXP-UAPI spec of -EINVAL or -ENODEV)... perhaps use -ENXIO 
which apparently is already part of the original pxp code but was never 
mentioned in UAPI - but we
return this immediately and document it in UAPI as "hw/fw insfratructure is not 
yet ready to create pxp arb session, user space can retry but may need delays of up 
to x-seconds on
ADl/TGL or y-seconds on MTL, before getting a SUCCESS or one of the other 
errors).


fair enough. It looks like we need a new get_param! :)


To check I understood this - new parameter would be like "is pxp support 
present"?


And then later it can still fail to initialize due some parameters not 
easily detectable during boot/probe?


How will the context create path look like on those platforms:

1. Block, then potentially error out if the full initialization failed.

2. Error out "in progress" while initializing, error out "something 
else" if initialization failed.


?

Regards,

Tvrtko


Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-28 Thread Rodrigo Vivi
On Tue, Mar 28, 2023 at 05:01:36PM +, Teres Alexis, Alan Previn wrote:
> On Mon, 2023-03-27 at 17:15 +0100, Tvrtko Ursulin wrote:
> > These two:
> > e6177ec586d1 ("drm/i915/huc: stall media submission until HuC is loaded")
> > b76c14c8fb2a ("drm/i915/huc: better define HuC status getparam possible 
> > return values.")
> > They do not help here? It is not possible to use or extend the refined 
> > I915_PARAM_HUC_STATUS return values combined with huc load fence for this 
> > all to keep working?
> Checking is-huc-loaded won't reflect is-pxp-available (in case fw/fusing 
> isn't allowing it). But the connection between them is hw-internal so i915 
> asking PXP-fw to attempt a PXP
> session depends on HuC (and the 3 other things i mentioned). However, 
> Tvrtko's point on using fences-or-equivalent is the same thing John Harrison 
> brought up offline with Daniele
> as the proper kernel way to do this type of dependency checking. However, any 
> form of dependency-checking won't improve UMD's experience. We still need to 
> decide if i915-PXP should
> wait-in-kernel or return some-new-spec-error. A useful data point: we are 
> debugging a related issue on actual customer stack. The compositor using mesa 
> is hitting this code path
> very early.. even before gsc-proxy component is loaded and we are trying to 
> figure out why delaying inside intel_pxp_start is not helping (more delays 
> causes the gsc-proxy
> component to also get delayed) but that is a different conversation. I'm only 
> mentioning this because we have a strict requirement to get the desktop login 
> window up within 1-2
> seconds of bootloader->kernel handoff. That said, if use -EAGAIN, I'm not 
> sure if that would work as it would delay the compositor startup beyond the 
> typical end user experience
> unless MESA has a timeout to give up on a cap-testing when seeing repeated 
> -EAGAIN (doubt mesa folks like this?). Perhaps we could just immediately 
> return with a different error
> (instead of current PXP-UAPI spec of -EINVAL or -ENODEV)... perhaps use 
> -ENXIO which apparently is already part of the original pxp code but was 
> never mentioned in UAPI - but we
> return this immediately and document it in UAPI as "hw/fw insfratructure is 
> not yet ready to create pxp arb session, user space can retry but may need 
> delays of up to x-seconds on
> ADl/TGL or y-seconds on MTL, before getting a SUCCESS or one of the other 
> errors).

fair enough. It looks like we need a new get_param! :)


Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-28 Thread Teres Alexis, Alan Previn
On Mon, 2023-03-27 at 17:15 +0100, Tvrtko Ursulin wrote:
> These two:
> e6177ec586d1 ("drm/i915/huc: stall media submission until HuC is loaded")
> b76c14c8fb2a ("drm/i915/huc: better define HuC status getparam possible 
> return values.")
> They do not help here? It is not possible to use or extend the refined 
> I915_PARAM_HUC_STATUS return values combined with huc load fence for this all 
> to keep working?
Checking is-huc-loaded won't reflect is-pxp-available (in case fw/fusing isn't 
allowing it). But the connection between them is hw-internal so i915 asking 
PXP-fw to attempt a PXP
session depends on HuC (and the 3 other things i mentioned). However, Tvrtko's 
point on using fences-or-equivalent is the same thing John Harrison brought up 
offline with Daniele
as the proper kernel way to do this type of dependency checking. However, any 
form of dependency-checking won't improve UMD's experience. We still need to 
decide if i915-PXP should
wait-in-kernel or return some-new-spec-error. A useful data point: we are 
debugging a related issue on actual customer stack. The compositor using mesa 
is hitting this code path
very early.. even before gsc-proxy component is loaded and we are trying to 
figure out why delaying inside intel_pxp_start is not helping (more delays 
causes the gsc-proxy
component to also get delayed) but that is a different conversation. I'm only 
mentioning this because we have a strict requirement to get the desktop login 
window up within 1-2
seconds of bootloader->kernel handoff. That said, if use -EAGAIN, I'm not sure 
if that would work as it would delay the compositor startup beyond the typical 
end user experience
unless MESA has a timeout to give up on a cap-testing when seeing repeated 
-EAGAIN (doubt mesa folks like this?). Perhaps we could just immediately return 
with a different error
(instead of current PXP-UAPI spec of -EINVAL or -ENODEV)... perhaps use -ENXIO 
which apparently is already part of the original pxp code but was never 
mentioned in UAPI - but we
return this immediately and document it in UAPI as "hw/fw insfratructure is not 
yet ready to create pxp arb session, user space can retry but may need delays 
of up to x-seconds on
ADl/TGL or y-seconds on MTL, before getting a SUCCESS or one of the other 
errors).


Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-27 Thread Tvrtko Ursulin



On 27/03/2023 08:07, Lionel Landwerlin wrote:

On 26/03/2023 14:18, Rodrigo Vivi wrote:
On Sat, Mar 25, 2023 at 02:19:21AM -0400, Teres Alexis, Alan Previn 
wrote:

alan:snip

@@ -353,8 +367,20 @@ int intel_pxp_start(struct intel_pxp *pxp)
alan:snip

+    if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
+    /*
+ * GSC-fw loading, GSC-proxy init (requiring an mei 
component driver) and
+ * HuC-fw loading must all occur first before we start 
requesting for PXP
+ * sessions. Checking HuC authentication (the last 
dependency)  will suffice.
+ * Let's use a much larger 8 second timeout considering 
all the types of

+ * dependencies prior to that.
+ */
+    if 
(wait_for(intel_huc_is_authenticated(>ctrl_gt->uc.huc), 8000))
This big timeout needs an ack from userspace drivers, as 
intel_pxp_start

is called during context creation and the current way to query if the
feature is supported is to create a protected context. 
Unfortunately, we
do need to wait to confirm that PXP is available (although in most 
cases

it shouldn't take even close to 8 secs), because until everything is
setup we're not sure if things will work as expected. I see 2 potential
mitigations in case the timeout doesn't work as-is:

1) we return -EAGAIN (or another dedicated error code) to userspace if
the prerequisite steps aren't done yet. This would indicate that the
feature is there, but that we haven't completed the setup yet. The
caller can then decide if they want to retry immediately or later. Pro:
more flexibility for userspace; Cons: new interface return code.

2) we add a getparam to say if PXP is supported in HW and the 
support is
compiled in i915. Userspace can query this as a way to check the 
feature

support and only create the context if they actually need it for PXP
operations. Pro: simpler kernel implementation; Cons: new getparam, 
plus

even if the getparam returns true the pxp_start could later fail, so
userspace needs to handle that case.


These two:

e6177ec586d1 ("drm/i915/huc: stall media submission until HuC is loaded")
b76c14c8fb2a ("drm/i915/huc: better define HuC status getparam possible return 
values.")

They do not help here? It is not possible to use or extend the refined 
I915_PARAM_HUC_STATUS return values combined with huc load fence for this all 
to keep working?

Regards,

Tvrtko

alan: I've cc'd Rodrigo, Joonas and Lionel. Folks - what are your 
thoughts on above issue?
Recap: On MTL, only when creating a GEM Protected (PXP) context for 
the very first time after
a driver load, it will be dependent on (1) loading the GSC firmware, 
(2) GuC loading the HuC
firmware and (3) GSC authenticating the HuC fw. But step 3 also 
depends on additional
GSC-proxy-init steps that depend on a new mei-gsc-proxy component 
driver. I'd used the
8 second number based on offline conversations with Daniele but that 
is a worse-case.
Alternatively, should we change UAPI instead to return -EAGAIN as per 
Daniele's proposal?
I believe we've had the get-param conversation offline recently and 
the direction was to
stick with attempting to create the context as it is normal in 3D UMD 
when it comes to

testing capabilities for other features too.

Thoughts?
I like the option 1 more. This extra return handling won't break 
compatibility.



I like option 2 better because we have to report support as fast as we 
can when enumerating devices on the system for example.


If I understand correctly, with the get param, most apps won't ever be 
blocking on any PXP stuff if they don't use it.


Only the ones that require protected support might block.


-Lionel





Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-27 Thread Lionel Landwerlin

On 26/03/2023 14:18, Rodrigo Vivi wrote:

On Sat, Mar 25, 2023 at 02:19:21AM -0400, Teres Alexis, Alan Previn wrote:

alan:snip

@@ -353,8 +367,20 @@ int intel_pxp_start(struct intel_pxp *pxp)
alan:snip

+   if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
+   /*
+* GSC-fw loading, GSC-proxy init (requiring an mei component 
driver) and
+* HuC-fw loading must all occur first before we start 
requesting for PXP
+* sessions. Checking HuC authentication (the last dependency)  
will suffice.
+* Let's use a much larger 8 second timeout considering all the 
types of
+* dependencies prior to that.
+*/
+   if (wait_for(intel_huc_is_authenticated(>ctrl_gt->uc.huc), 
8000))

This big timeout needs an ack from userspace drivers, as intel_pxp_start
is called during context creation and the current way to query if the
feature is supported is to create a protected context. Unfortunately, we
do need to wait to confirm that PXP is available (although in most cases
it shouldn't take even close to 8 secs), because until everything is
setup we're not sure if things will work as expected. I see 2 potential
mitigations in case the timeout doesn't work as-is:

1) we return -EAGAIN (or another dedicated error code) to userspace if
the prerequisite steps aren't done yet. This would indicate that the
feature is there, but that we haven't completed the setup yet. The
caller can then decide if they want to retry immediately or later. Pro:
more flexibility for userspace; Cons: new interface return code.

2) we add a getparam to say if PXP is supported in HW and the support is
compiled in i915. Userspace can query this as a way to check the feature
support and only create the context if they actually need it for PXP
operations. Pro: simpler kernel implementation; Cons: new getparam, plus
even if the getparam returns true the pxp_start could later fail, so
userspace needs to handle that case.


alan: I've cc'd Rodrigo, Joonas and Lionel. Folks - what are your thoughts on 
above issue?
Recap: On MTL, only when creating a GEM Protected (PXP) context for the very 
first time after
a driver load, it will be dependent on (1) loading the GSC firmware, (2) GuC 
loading the HuC
firmware and (3) GSC authenticating the HuC fw. But step 3 also depends on 
additional
GSC-proxy-init steps that depend on a new mei-gsc-proxy component driver. I'd 
used the
8 second number based on offline conversations with Daniele but that is a 
worse-case.
Alternatively, should we change UAPI instead to return -EAGAIN as per Daniele's 
proposal?
I believe we've had the get-param conversation offline recently and the 
direction was to
stick with attempting to create the context as it is normal in 3D UMD when it 
comes to
testing capabilities for other features too.

Thoughts?

I like the option 1 more. This extra return handling won't break compatibility.



I like option 2 better because we have to report support as fast as we 
can when enumerating devices on the system for example.


If I understand correctly, with the get param, most apps won't ever be 
blocking on any PXP stuff if they don't use it.


Only the ones that require protected support might block.


-Lionel





Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-26 Thread Rodrigo Vivi
On Sat, Mar 25, 2023 at 02:19:21AM -0400, Teres Alexis, Alan Previn wrote:
> alan:snip
> 
> > 
> @@ -353,8 +367,20 @@ int intel_pxp_start(struct intel_pxp *pxp)
> alan:snip
> > > + if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
> > > + /*
> > > +  * GSC-fw loading, GSC-proxy init (requiring an mei component 
> > > driver) and
> > > +  * HuC-fw loading must all occur first before we start 
> > > requesting for PXP
> > > +  * sessions. Checking HuC authentication (the last dependency)  
> > > will suffice.
> > > +  * Let's use a much larger 8 second timeout considering all the 
> > > types of
> > > +  * dependencies prior to that.
> > > +  */
> > > + if (wait_for(intel_huc_is_authenticated(>ctrl_gt->uc.huc), 
> > > 8000))
> > 
> > This big timeout needs an ack from userspace drivers, as intel_pxp_start 
> > is called during context creation and the current way to query if the 
> > feature is supported is to create a protected context. Unfortunately, we 
> > do need to wait to confirm that PXP is available (although in most cases 
> > it shouldn't take even close to 8 secs), because until everything is 
> > setup we're not sure if things will work as expected. I see 2 potential 
> > mitigations in case the timeout doesn't work as-is:
> > 
> > 1) we return -EAGAIN (or another dedicated error code) to userspace if 
> > the prerequisite steps aren't done yet. This would indicate that the 
> > feature is there, but that we haven't completed the setup yet. The 
> > caller can then decide if they want to retry immediately or later. Pro: 
> > more flexibility for userspace; Cons: new interface return code.
> > 
> > 2) we add a getparam to say if PXP is supported in HW and the support is 
> > compiled in i915. Userspace can query this as a way to check the feature 
> > support and only create the context if they actually need it for PXP 
> > operations. Pro: simpler kernel implementation; Cons: new getparam, plus 
> > even if the getparam returns true the pxp_start could later fail, so 
> > userspace needs to handle that case.
> > 
> 
> alan: I've cc'd Rodrigo, Joonas and Lionel. Folks - what are your thoughts on 
> above issue?
> Recap: On MTL, only when creating a GEM Protected (PXP) context for the very 
> first time after
> a driver load, it will be dependent on (1) loading the GSC firmware, (2) GuC 
> loading the HuC
> firmware and (3) GSC authenticating the HuC fw. But step 3 also depends on 
> additional
> GSC-proxy-init steps that depend on a new mei-gsc-proxy component driver. I'd 
> used the
> 8 second number based on offline conversations with Daniele but that is a 
> worse-case.
> Alternatively, should we change UAPI instead to return -EAGAIN as per 
> Daniele's proposal?
> I believe we've had the get-param conversation offline recently and the 
> direction was to
> stick with attempting to create the context as it is normal in 3D UMD when it 
> comes to
> testing capabilities for other features too.
> 
> Thoughts?

I like the option 1 more. This extra return handling won't break compatibility.


Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-25 Thread Teres Alexis, Alan Previn
alan:snip

> 
@@ -353,8 +367,20 @@ int intel_pxp_start(struct intel_pxp *pxp)
alan:snip
> > +   if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
> > +   /*
> > +* GSC-fw loading, GSC-proxy init (requiring an mei component 
> > driver) and
> > +* HuC-fw loading must all occur first before we start 
> > requesting for PXP
> > +* sessions. Checking HuC authentication (the last dependency)  
> > will suffice.
> > +* Let's use a much larger 8 second timeout considering all the 
> > types of
> > +* dependencies prior to that.
> > +*/
> > +   if (wait_for(intel_huc_is_authenticated(>ctrl_gt->uc.huc), 
> > 8000))
> 
> This big timeout needs an ack from userspace drivers, as intel_pxp_start 
> is called during context creation and the current way to query if the 
> feature is supported is to create a protected context. Unfortunately, we 
> do need to wait to confirm that PXP is available (although in most cases 
> it shouldn't take even close to 8 secs), because until everything is 
> setup we're not sure if things will work as expected. I see 2 potential 
> mitigations in case the timeout doesn't work as-is:
> 
> 1) we return -EAGAIN (or another dedicated error code) to userspace if 
> the prerequisite steps aren't done yet. This would indicate that the 
> feature is there, but that we haven't completed the setup yet. The 
> caller can then decide if they want to retry immediately or later. Pro: 
> more flexibility for userspace; Cons: new interface return code.
> 
> 2) we add a getparam to say if PXP is supported in HW and the support is 
> compiled in i915. Userspace can query this as a way to check the feature 
> support and only create the context if they actually need it for PXP 
> operations. Pro: simpler kernel implementation; Cons: new getparam, plus 
> even if the getparam returns true the pxp_start could later fail, so 
> userspace needs to handle that case.
> 

alan: I've cc'd Rodrigo, Joonas and Lionel. Folks - what are your thoughts on 
above issue?
Recap: On MTL, only when creating a GEM Protected (PXP) context for the very 
first time after
a driver load, it will be dependent on (1) loading the GSC firmware, (2) GuC 
loading the HuC
firmware and (3) GSC authenticating the HuC fw. But step 3 also depends on 
additional
GSC-proxy-init steps that depend on a new mei-gsc-proxy component driver. I'd 
used the
8 second number based on offline conversations with Daniele but that is a 
worse-case.
Alternatively, should we change UAPI instead to return -EAGAIN as per Daniele's 
proposal?
I believe we've had the get-param conversation offline recently and the 
direction was to
stick with attempting to create the context as it is normal in 3D UMD when it 
comes to
testing capabilities for other features too.

Thoughts?


Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-25 Thread Teres Alexis, Alan Previn
On Fri, 2023-03-03 at 17:34 -0800, Ceraolo Spurio, Daniele wrote:
> 
> On 2/27/2023 6:21 PM, Alan Previn wrote:
> > Add MTL's function for ARB session creation using PXP firmware
> > version 4.3 ABI structure format.
> 
alan:snip

> > +   ret = gsccs_send_message_retry_complete(pxp,
> > +   _in, sizeof(msg_in),
> > +   _out, sizeof(msg_out), 
> > NULL);
> > +   if (ret)
> > +   drm_warn(>drm, "Failed to send gsccs msg for 
> > creating-session-%d: ret=[%d]\n",
> > +arb_session_id, ret);
> > +   else if (msg_out.header.status != 0x0)
> > +   drm_warn(>drm, "PXP firmware failed on 
> > creating-session-%d: status=0x%08x\n",
> > +arb_session_id, msg_out.header.status);
> 
> Should this follow the same log style as 
> https://patchwork.freedesktop.org/patch/521431/? same for the function 
> below.
alan: yeah - i was planning to update once the other got an Rb which is just 
did.
In the coming rev7, I'll probably duplicate some of the fw-err-to-string 
function
for both GSCCS and TEE in the event new or existing error conditions are 
differently
determined as platform persisting issue or runtime issues. 

alan:snip
> > +void intel_pxp_gsccs_end_arb_fw_session(struct intel_pxp *pxp, u32 
> > session_id)
> > +{
> > +   struct drm_i915_private *i915 = pxp->ctrl_gt->i915;
> > +   struct pxp42_inv_stream_key_in msg_in = {0};
> > +   struct pxp42_inv_stream_key_out msg_out = {0};
> > +   int ret = 0;
> > +
> > +   memset(_in, 0, sizeof(msg_in));
> > +   memset(_out, 0, sizeof(msg_out));
> 
> You're already initializing the structs to zero with "= {0}"
> 
alan: oops - copy+paste error - will fix.



Re: [Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-03 Thread Ceraolo Spurio, Daniele




On 2/27/2023 6:21 PM, Alan Previn wrote:

Add MTL's function for ARB session creation using PXP firmware
version 4.3 ABI structure format.

Also add MTL's function for ARB session invalidation but this
reuses PXP firmware version 4.2 ABI structure format.

Before checking the return status, look at the GSC-CS-Mem-Header's
pending-bit which means the GSC firmware is busy and we should
resubmit.

Signed-off-by: Alan Previn 
---
  drivers/gpu/drm/i915/pxp/intel_pxp.c  | 34 --
  .../drm/i915/pxp/intel_pxp_cmd_interface_43.h | 21 +++
  drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c| 62 +++
  drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h|  4 ++
  drivers/gpu/drm/i915/pxp/intel_pxp_session.c  | 11 +++-
  5 files changed, 126 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp.c
index aecc65b5da70..61041277be24 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c
@@ -290,6 +290,8 @@ static bool pxp_component_bound(struct intel_pxp *pxp)
  
  static int __pxp_global_teardown_final(struct intel_pxp *pxp)

  {
+   int timeout;
+
if (!pxp->arb_is_valid)
return 0;
/*
@@ -299,7 +301,12 @@ static int __pxp_global_teardown_final(struct intel_pxp 
*pxp)
intel_pxp_mark_termination_in_progress(pxp);
intel_pxp_terminate(pxp, false);
  
-	if (!wait_for_completion_timeout(>termination, msecs_to_jiffies(250)))

+   if (HAS_ENGINE(pxp->ctrl_gt, GSC0))
+   timeout = GSC_PENDING_RETRY_LATENCY_MS;
+   else
+   timeout = 250;
+
+   if (!wait_for_completion_timeout(>termination, 
msecs_to_jiffies(timeout)))
return -ETIMEDOUT;
  
  	return 0;

@@ -307,6 +314,8 @@ static int __pxp_global_teardown_final(struct intel_pxp 
*pxp)
  
  static int __pxp_global_teardown_restart(struct intel_pxp *pxp)

  {
+   int timeout;
+
if (pxp->arb_is_valid)
return 0;
/*
@@ -315,7 +324,12 @@ static int __pxp_global_teardown_restart(struct intel_pxp 
*pxp)
 */
pxp_queue_termination(pxp);
  
-	if (!wait_for_completion_timeout(>termination, msecs_to_jiffies(250)))

+   if (HAS_ENGINE(pxp->ctrl_gt, GSC0))
+   timeout = GSC_PENDING_RETRY_LATENCY_MS;
+   else
+   timeout = 250;
+
+   if (!wait_for_completion_timeout(>termination, 
msecs_to_jiffies(timeout)))
return -ETIMEDOUT;
  
  	return 0;

@@ -353,8 +367,20 @@ int intel_pxp_start(struct intel_pxp *pxp)
if (!intel_pxp_is_enabled(pxp))
return -ENODEV;
  
-	if (wait_for(pxp_component_bound(pxp), 250))

-   return -ENXIO;
+   if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
+   /*
+* GSC-fw loading, GSC-proxy init (requiring an mei component 
driver) and
+* HuC-fw loading must all occur first before we start 
requesting for PXP
+* sessions. Checking HuC authentication (the last dependency)  
will suffice.
+* Let's use a much larger 8 second timeout considering all the 
types of
+* dependencies prior to that.
+*/
+   if (wait_for(intel_huc_is_authenticated(>ctrl_gt->uc.huc), 
8000))


This big timeout needs an ack from userspace drivers, as intel_pxp_start 
is called during context creation and the current way to query if the 
feature is supported is to create a protected context. Unfortunately, we 
do need to wait to confirm that PXP is available (although in most cases 
it shouldn't take even close to 8 secs), because until everything is 
setup we're not sure if things will work as expected. I see 2 potential 
mitigations in case the timeout doesn't work as-is:


1) we return -EAGAIN (or another dedicated error code) to userspace if 
the prerequisite steps aren't done yet. This would indicate that the 
feature is there, but that we haven't completed the setup yet. The 
caller can then decide if they want to retry immediately or later. Pro: 
more flexibility for userspace; Cons: new interface return code.


2) we add a getparam to say if PXP is supported in HW and the support is 
compiled in i915. Userspace can query this as a way to check the feature 
support and only create the context if they actually need it for PXP 
operations. Pro: simpler kernel implementation; Cons: new getparam, plus 
even if the getparam returns true the pxp_start could later fail, so 
userspace needs to handle that case.



+   return -ENXIO;
+   } else {
+   if (wait_for(pxp_component_bound(pxp), 250))
+   return -ENXIO;
+   }
  
  	mutex_lock(>arb_mutex);
  
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h

index b2523d6918c7..9089e02a8c2d 100644
--- 

[Intel-gfx] [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-02-27 Thread Alan Previn
Add MTL's function for ARB session creation using PXP firmware
version 4.3 ABI structure format.

Also add MTL's function for ARB session invalidation but this
reuses PXP firmware version 4.2 ABI structure format.

Before checking the return status, look at the GSC-CS-Mem-Header's
pending-bit which means the GSC firmware is busy and we should
resubmit.

Signed-off-by: Alan Previn 
---
 drivers/gpu/drm/i915/pxp/intel_pxp.c  | 34 --
 .../drm/i915/pxp/intel_pxp_cmd_interface_43.h | 21 +++
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c| 62 +++
 drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.h|  4 ++
 drivers/gpu/drm/i915/pxp/intel_pxp_session.c  | 11 +++-
 5 files changed, 126 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp.c
index aecc65b5da70..61041277be24 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp.c
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp.c
@@ -290,6 +290,8 @@ static bool pxp_component_bound(struct intel_pxp *pxp)
 
 static int __pxp_global_teardown_final(struct intel_pxp *pxp)
 {
+   int timeout;
+
if (!pxp->arb_is_valid)
return 0;
/*
@@ -299,7 +301,12 @@ static int __pxp_global_teardown_final(struct intel_pxp 
*pxp)
intel_pxp_mark_termination_in_progress(pxp);
intel_pxp_terminate(pxp, false);
 
-   if (!wait_for_completion_timeout(>termination, 
msecs_to_jiffies(250)))
+   if (HAS_ENGINE(pxp->ctrl_gt, GSC0))
+   timeout = GSC_PENDING_RETRY_LATENCY_MS;
+   else
+   timeout = 250;
+
+   if (!wait_for_completion_timeout(>termination, 
msecs_to_jiffies(timeout)))
return -ETIMEDOUT;
 
return 0;
@@ -307,6 +314,8 @@ static int __pxp_global_teardown_final(struct intel_pxp 
*pxp)
 
 static int __pxp_global_teardown_restart(struct intel_pxp *pxp)
 {
+   int timeout;
+
if (pxp->arb_is_valid)
return 0;
/*
@@ -315,7 +324,12 @@ static int __pxp_global_teardown_restart(struct intel_pxp 
*pxp)
 */
pxp_queue_termination(pxp);
 
-   if (!wait_for_completion_timeout(>termination, 
msecs_to_jiffies(250)))
+   if (HAS_ENGINE(pxp->ctrl_gt, GSC0))
+   timeout = GSC_PENDING_RETRY_LATENCY_MS;
+   else
+   timeout = 250;
+
+   if (!wait_for_completion_timeout(>termination, 
msecs_to_jiffies(timeout)))
return -ETIMEDOUT;
 
return 0;
@@ -353,8 +367,20 @@ int intel_pxp_start(struct intel_pxp *pxp)
if (!intel_pxp_is_enabled(pxp))
return -ENODEV;
 
-   if (wait_for(pxp_component_bound(pxp), 250))
-   return -ENXIO;
+   if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
+   /*
+* GSC-fw loading, GSC-proxy init (requiring an mei component 
driver) and
+* HuC-fw loading must all occur first before we start 
requesting for PXP
+* sessions. Checking HuC authentication (the last dependency)  
will suffice.
+* Let's use a much larger 8 second timeout considering all the 
types of
+* dependencies prior to that.
+*/
+   if (wait_for(intel_huc_is_authenticated(>ctrl_gt->uc.huc), 
8000))
+   return -ENXIO;
+   } else {
+   if (wait_for(pxp_component_bound(pxp), 250))
+   return -ENXIO;
+   }
 
mutex_lock(>arb_mutex);
 
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h 
b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
index b2523d6918c7..9089e02a8c2d 100644
--- a/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
+++ b/drivers/gpu/drm/i915/pxp/intel_pxp_cmd_interface_43.h
@@ -11,6 +11,7 @@
 
 /* PXP-Cmd-Op definitions */
 #define PXP43_CMDID_START_HUC_AUTH 0x003A
+#define PXP43_CMDID_INIT_SESSION 0x0036
 
 /* PXP-Packet sizes for MTL's GSCCS-HECI instruction */
 #define PXP43_MAX_HECI_IN_SIZE (SZ_32K)
@@ -27,4 +28,24 @@ struct pxp43_start_huc_auth_out {
struct pxp_cmd_header header;
 } __packed;
 
+/* PXP-Input-Packet: Init PXP session */
+struct pxp43_create_arb_in {
+   struct pxp_cmd_header header;
+   /* header.stream_id fields for vesion 4.3 of Init PXP session: 
*/
+   #define PXP43_INIT_SESSION_VALID BIT(0)
+   #define PXP43_INIT_SESSION_APPTYPE BIT(1)
+   #define PXP43_INIT_SESSION_APPID GENMASK(17, 2)
+   u32 protection_mode;
+   #define PXP43_INIT_SESSION_PROTECTION_ARB 0x2
+   u32 sub_session_id;
+   u32 init_flags;
+   u32 rsvd[12];
+} __packed;
+
+/* PXP-Input-Packet: Init PXP session */
+struct pxp43_create_arb_out {
+   struct pxp_cmd_header header;
+   u32 rsvd[8];
+} __packed;
+
 #endif /* __INTEL_PXP_FW_INTERFACE_43_H__ */
diff --git a/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c 
b/drivers/gpu/drm/i915/pxp/intel_pxp_gsccs.c
index