Re: [PATCH v6 6/8] drm/i915/pxp: MTL-KCR interrupt ctrl's are in GT-0

2023-04-05 Thread Teres Alexis, Alan Previn
> 
alan:snip

> >   void intel_pxp_irq_enable(struct intel_pxp *pxp)
> >   {
> > -   struct intel_gt *gt = pxp->ctrl_gt;
> > +   struct intel_gt *gt = intel_pxp_get_irq_gt(pxp);
> 
> in this function we use the gt for:
> 
> 1 - the lock: see above about this
> 
> 2 - gen11_gt_reset_one_iir(): this should work with the media GT (we use 
> it for media GuC)
> 
> 3 - writes to the GEN11_CRYPTO_* regs: those should also work with the 
> media GT uncore as these regs are in the same range as the GuC scratch 
> regs and we use the media uncore for those accesses.
> 
alan:snip
> > @@ -83,7 +101,7 @@ void intel_pxp_irq_enable(struct intel_pxp *pxp)
> >   
> >   void intel_pxp_irq_disable(struct intel_pxp *pxp)
> >   {
> > -   struct intel_gt *gt = pxp->ctrl_gt;
> > +   struct intel_gt *gt = intel_pxp_get_irq_gt(pxp);
> >   
> 
> AFAICS this functions uses the same 3 cases as above.
> 
> Overall, I am not sure this patch is required. Am I missing something?
> 
alan: context: during development of my initial few revs, i needed to
explicitly do that switch-over to the other gt in order to even get the IRQs
(i.e. as if the forcewake didn't wake up the range)... but upon recent
retesting it seems to work fine. I guess there must have been a bug
somewhere else in my branch.
So yes, i believe this means we can totally remove this patch.



Re: [PATCH v30 0/7] Add MediaTek SoC DRM (vdosys1) support for mt8195

2023-04-05 Thread Chen-Yu Tsai
On Mon, Apr 3, 2023 at 5:47 PM Krzysztof Kozlowski
 wrote:
>
> On 03/04/2023 05:30, Chun-Kuang Hu wrote:
> > Hi, Chen-yu:
> >
> > Chen-Yu Tsai  wrote on Thursday, March 30, 2023 at 7:05 PM:
> >>
> >> On Mon, Mar 27, 2023 at 11:17 PM Chun-Kuang Hu  
> >> wrote:
> >>>
> >>> Hi, Angelo:
> >>>
> >>> AngeloGioacchino Del Regno  wrote on Friday, March 24, 2023 at 4:38 PM:
> 
 On 24/03/23 00:25, Chun-Kuang Hu wrote:
> > Hi, Angelo:
> >
> > AngeloGioacchino Del Regno  wrote on Thursday, March 23, 2023 at 4:58 PM:
> >>
> >> On 21/03/23 13:18, Nancy.Lin wrote:
> >>> The hardware path of vdosys1 with DPTx output needs to go through
> >>> several modules, such as OVL_ADAPTOR and MERGE.
> >>>
> >>> Add DRM and these modules support by the patches below:
> >>>
> >>
> >> I've tested v30 again on MT8173, MT8192 and MT8195 based Chromebooks.
> >> Green light from me.
> >
> > I'm curious about how you build code and test on Chromebooks. Do you
> > build in the cros environment or pure Linux
> > (https://archlinuxarm.org/platforms/armv8/mediatek/acer-chromebook-r13)?
> > I have an MT8183-based Chromebook (HP 11a) and I've tried to run an
> > upstream kernel on it. cros is too heavy for me and I doubt I could
> > use it. I've tried pure Linux and could boot up with a console, but
> > the display does not work. If you use the pure Linux environment, could
> > you share how it works?
> >
> 
>  I haven't tested MT8183 (I don't actually have any 8183 machine in my 
>  hands)... but
>  yes, I can share my test environment.
> 
>  I have one MicroSD that I use either in the MicroSD slot of the target 
>  machine, or
>  in a USB reader; this *single* system is what I boot on *all* 
>  Chromebooks that I
>  have: one kernel, multiple devicetrees, same Debian-based userspace.
> 
>  What we have to prepare this bootable media can be found at [1], but 
>  beware that
>  it currently uses an outdated kernel, so, what I have locally is a 
>  symlink to my
>  kernel tree.
>  You can change/add/remove the devicetree blobs that will get added to 
>  the image
>  by modifying `chromebook-setup.sh`; before tampering with kernel tree 
>  symlink,
>  please run that script for the first time, as it will download a 
>  cross-compiler,
>  a kernel tree (that you will replace for sure) and the (very old) Debian 
>  rootfs
>  that you can update with `apt-get dist-upgrade` after booting the 
>  Chromebook.
> 
>  If you want to check about possible kernel configuration differences, 
>  what I use
>  is at [2], so that you can compare.
> >>>
> >>> Thanks for the information, I would try to compare the kernel config 
> >>> first.
> >>
> >> Hi CK,
> >>
> >> Would you consider adding your repo to linux-next? That would let everyone
> >> do integration testing, especially the automated kind, earlier, before you
> >> send your PRs to the drm maintainers.
> >>
> >> You can do so by sending an email to Stephen Rothwell.
> >
> > I don't understand what this process is. Does it mean that I directly
> > upstream patches into linux-next? I prefer that my patches go through
> > the drm maintainers' tree. Is there any document describing this process?
>
> All maintainers and sub-maintainers trees are supposed to be fed into
> linux-next.
>
> https://lore.kernel.org/linux-next/20230327124805.3ca4f...@canb.auug.org.au/T/#md226a8e714cc731c2ab4ba5ee7eb43fe21a55009
>
> Documentation/process/howto.rst
> Documentation/process/2.Process.rst

As Krzysztof mentioned, the purpose of linux-next is integration testing.
Your queued-up patches still go through the DRM tree for eventual merging
into Linus's tree. Getting included in linux-next means that your branch
gets automatically merged together with other branches, gets build- and
boot-tested by the various CI platforms and bots out there, and provides
a common base for other developers.

I don't think there is any downside to having your branch integrated into
linux-next.

ChenYu


Re: [PATCH v6 7/8] drm/i915/pxp: On MTL, KCR enabling doesn't wait on tee component

2023-04-05 Thread Teres Alexis, Alan Previn
> 
alan:snip
> > @@ -140,10 +141,15 @@ static void pxp_init_full(struct intel_pxp *pxp)
> > if (ret)
> > return;
> >   
> > -   if (HAS_ENGINE(pxp->ctrl_gt, GSC0))
> > +   if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
> > ret = intel_pxp_gsccs_init(pxp);
> > -   else
> > +   if (!ret) {
> > +   with_intel_runtime_pm(&pxp->ctrl_gt->i915->runtime_pm, wakeref)
> > +   intel_pxp_init_hw(pxp);
> 
> personal preference: I'd move this (and the matching call in fini) 
> inside intel_pxp_gsccs_init/fini. That way we can see this as more 
> back-end specific: the gsccs initialize everything immediately, while 
> the tee back-end follows a 2-step approach with the component.
> Not a blocker since it is a personal preference, so with or without the 
> change:
> 
> Reviewed-by: Daniele Ceraolo Spurio 
> 
> Daniele

alan: will make that change too - thanks for the R-b.
alan:snip


Re: [Intel-gfx] [PATCH] drm/atomic-helper: Don't set deadline for modesets

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 06:50:22AM -0700, Rob Clark wrote:
> On Wed, Apr 5, 2023 at 6:31 AM Daniel Vetter  wrote:
> >
> > If the crtc is being switched on or off then the semantics of
> > computing the timestamp of the next vblank is somewhat ill-defined.
> > And indeed, the code splats with a warning in the timestamp
> > computation code. Specifically it hits the check to make sure that
> > atomic drivers have fully set up the timing constants in the drm_vblank
> > structure, and that's just not the case before the crtc is actually
> > on.
> >
> > For robustness it seems best to just not set deadlines for modesets.
> >
> > v2: Also skip on inactive crtc (Ville)
> >
> > Link: 
> > https://lore.kernel.org/dri-devel/dfc21f18-7e1e-48f0-c05a-d659b9c90...@linaro.org/
> > Fixes: d39e48ca80c0 ("drm/atomic-helper: Set fence deadline for vblank")
> > Cc: Ville Syrjälä 
> > Cc: Rob Clark 
> > Cc: Daniel Vetter 
> > Cc: Maarten Lankhorst 
> > Cc: Maxime Ripard 
> > Cc: Thomas Zimmermann 
> > Reported-by: Dmitry Baryshkov 
> > Tested-by: Dmitry Baryshkov  # test patch only
> > Cc: Dmitry Baryshkov 
> > Signed-off-by: Daniel Vetter 
> 
> Reviewed-by: Rob Clark 

Merged to drm-misc-next, thanks for review.

> 
> > ---
> >  drivers/gpu/drm/drm_atomic_helper.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> > b/drivers/gpu/drm/drm_atomic_helper.c
> > index f21b5a74176c..d44fb9b87ef8 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1528,6 +1528,12 @@ static void set_fence_deadline(struct drm_device 
> > *dev,
> > for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
> > ktime_t v;
> >
> > +   if (drm_atomic_crtc_needs_modeset(new_crtc_state))
> > +   continue;
> > +
> > +   if (!new_crtc_state->active)
> > +   continue;
> > +
> > if (drm_crtc_next_vblank_start(crtc, &v))
> > continue;
> >
> > --
> > 2.40.0
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH RFC 18/18] drm/asahi: Add the Asahi driver for Apple AGX GPUs

2023-04-05 Thread Asahi Lina

Same as the prior email, this was supposed to go to Daniel...

On 06/04/2023 14.02, Asahi Lina wrote:

On 05/04/2023 23.44, Daniel Vetter wrote:

On Tue, Mar 07, 2023 at 11:25:43PM +0900, Asahi Lina wrote:

+/// Look up a GEM object handle for a `File` and return an `ObjectRef` for it.
+pub(crate) fn lookup_handle(file: &DrmFile, handle: u32) -> Result<ObjectRef> {
+Ok(ObjectRef::new(shmem::Object::lookup_handle(file, handle)?))
+}


So maybe my expectations for rust typing are a bit too much, but I kinda
expected this to be fully generic:

- trait Driver (drm_driver) knows the driver's object type
- a generic create_handle function on the drm_file (which is always for a
specific drm_device and hence Driver) can ensure at the type level that
you only put the right objects into the drm_file
- a generic lookup_handle function on the drm_file knows the Driver trait
and so can give you back the right type right away.

Why the wrapping, what do I miss?


Sigh, so this is one of the many ways I'm trying to work around the
"Rust doesn't do subclasses" problem (so we can figure out what the best
one is ^^).

The generic shmem::Object::lookup_handle() call *is* fully generic and
will get you back a driver-specific object. But since Rust doesn't do
subclassing, what you get back isn't a driver-specific type T, but
rather a (reference to a) shmem::Object<T>. T represents the inner
driver-specific data/functionality (only), and the outer
shmem::Object<T> includes the actual drm_gem_shmem_object plus a T. This
is backwards from C, where you expect the opposite situation where T
contains a shmem object, but that just doesn't work with Rust because
there's no way to build a safe API around that model as far as I know.
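
(For illustration, a hedged C sketch of the container_of() idiom referred
to above; "my_obj" and its fields are hypothetical, not actual driver code:)

#include <linux/container_of.h>
#include <drm/drm_gem_shmem_helper.h>

/* C-style "subclassing": the driver type embeds the generic GEM shmem
 * object and recovers its own type from a generic pointer. */
struct my_obj {
	struct drm_gem_shmem_object base;	/* generic part embedded first */
	u64 driver_private;			/* the driver-specific "T" */
};

static struct my_obj *to_my_obj(struct drm_gem_shmem_object *shmem)
{
	return container_of(shmem, struct my_obj, base);
}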

Now the problem is from the higher layers I want object operations that
interact with the shmem::Object<T> (that is, they call generic GEM
functions on the object). Options so far:

1. Add an outer wrapper and put that functionality there.
2. Just have the functions on T as helpers, so you need to call
T::foo(obj) instead of obj.foo().
3. Use the undocumented method receiver trait thing to make
shmem::Object<T> a valid `self` type, plus add auto-Deref to
shmem::Object<T>. Then obj.foo() works.

#1 is what I use here. #2 is how the driver-specific File ioctl
callbacks are implemented, and also sched::Job. #3 is used for fence
callbacks (FenceObject). None of them are great, and I'd love to hear
what people think of the various options...

There are other unexplored options, like in this GEM case it could be
covered with a driver-internal auxiliary trait impl'd on
shmem::Object<T>, buuut that doesn't work when you actually need
callbacks on T itself to circle back to shmem::Object<T>, as is the case
with File/Job/FenceObject.

~~ Lina



~~ Lina



Re: [PATCH RFC 18/18] drm/asahi: Add the Asahi driver for Apple AGX GPUs

2023-04-05 Thread Asahi Lina
Argh. This (and my other reply) was supposed to go to Daniel, but 
Thunderbird... just dropped that recipient? And then my silly brain saw 
all the Cc:s go to To: and figured it was some weird consolidation and 
so I moved everything to Cc: except the only name that started with "Da" 
and... yeah, that wasn't the same person.


Sorry for the confusion... I have no idea why Thunderbird hates Daniel...

On 06/04/2023 13.44, Asahi Lina wrote:

On 05/04/2023 23.37, Daniel Vetter wrote:

On Tue, Mar 07, 2023 at 11:25:43PM +0900, Asahi Lina wrote:

+/// A generic monotonically incrementing ID used to uniquely identify object 
instances within the
+/// driver.
+pub(crate) struct ID(AtomicU64);
+
+impl ID {
+/// Create a new ID counter with a given value.
+fn new(val: u64) -> ID {
+ID(AtomicU64::new(val))
+}
+
+/// Fetch the next unique ID.
+pub(crate) fn next(&self) -> u64 {
+self.0.fetch_add(1, Ordering::Relaxed)
+}
+}


Continuing the theme of me commenting on individual things, I stumbled
over this because I noticed that there's a lot of id based lookups where I
don't expect them, and started chasing.

- For ids use xarray, not atomic counters. Yes I know dma_fence timelines
gets this wrong, this goes back to an innocent time where we didn't
allocate more than one timeline per engine, and no one fixed it since
then. Yes u64 should be big enough for everyone :-/

- Attaching ID spaces to drm_device is also not great. drm is full of
these mistakes. Much better if they're per drm_file and so private to each
client.

- They shouldn't be used for anything else than uapi id -> kernel object
lookup at the beginning of ioctl code, and nowhere else. At least from
skimming it seems like these are used all over the driver codebase,
which does freak me out. At least on the C side that's a clear indicator
for a refcount/lockin/data structure model that's not thought out at
all.

What's going on here, what do I miss?


These aren't UAPI IDs, they are driver-internal IDs (the UAPI IDs do use
xarray and are per-File). Most of them are just for debugging, so that
when I enable full debug spam I have some way to correlate different
things that are happening together (this subset of interleaved log lines
relate to the same submission). Basically just object names that are
easier to read (and less of a security leak) than pointers and
guaranteed not to repeat. You could get rid of most of them and it
wouldn't affect the driver design, it just makes it very hard to see
what's going on with debug logs ^^;
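
(For reference, a hedged C sketch of the per-file xarray pattern mentioned
above; all names here are illustrative, not from any driver:)

#include <linux/xarray.h>

struct demo_file {
	struct xarray handles;			/* uapi handle -> kernel object */
};

static void demo_file_init(struct demo_file *f)
{
	xa_init_flags(&f->handles, XA_FLAGS_ALLOC);	/* enables xa_alloc() */
}

static int demo_handle_create(struct demo_file *f, void *obj, u32 *handle)
{
	/* allocates the next free id and stores obj under it */
	return xa_alloc(&f->handles, handle, obj, xa_limit_32b, GFP_KERNEL);
}

static void *demo_handle_lookup(struct demo_file *f, u32 handle)
{
	/* uapi id -> kernel object, at the start of ioctl code */
	return xa_load(&f->handles, handle);
}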

There are only two that are ever used for non-debugging purposes: the VM
ID, and the File ID. Both are per-device global IDs attached to the VMs
(not the UAPI VM objects, but rather the underlying MMU address space
managers they represent, including the kernel-internal ones) and to
Files themselves. They are used for destroying GEM objects: since the
objects are also device-global across multiple clients, I need a way to
do things like "clean up all mappings for this File" or "clean up all
mappings for this VM". There's an annoying circular reference between
GEM objects and their mappings, which is why this is explicitly coded
out in destroy paths instead of naturally happening via Drop semantics
(without that cleanup code, the circular reference leaks it).

So e.g. when a File does a GEM close or explicitly asks for all mappings
of an object to be removed, it goes out to the (possibly shared) GEM
object and tells it to drop all mappings marked as owned by that unique
File ID. When an explicit "unmap all in VM" op happens, it asks the GEM
object to drop all mappings for that underlying VM ID. Similarly, when a
UAPI VM object is dropped (in the Drop impl, so both explicitly and when
the whole File/xarray is dropped and such), that does an explicit unmap
of a special dummy object it owns which would otherwise leak since it is
not tracked as a GEM object owned by that File and therefore not handled
by GEM closing. And again along the same lines, the allocators in
alloc.rs explicitly destroy the mappings for their backing GEM objects
on Drop. All this is due to that annoying circular reference between VMs
and GEM objects that I'm not sure how to fix.
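
(A hedged C sketch of the ID-keyed cleanup described above; hypothetical
names and layout, since the actual driver is Rust:)

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/slab.h>

/* Each mapping records the unique IDs of its owners. */
struct demo_mapping {
	struct list_head node;
	u64 owner_file_id;
	u64 vm_id;
};

struct demo_gem_object {
	struct mutex lock;
	struct list_head mappings;	/* mappings across all clients */
};

/* "Clean up all mappings for this File": drop every mapping on the
 * (possibly shared) GEM object owned by the given File ID, explicitly
 * breaking the GEM <-> VM reference cycle. */
static void demo_gem_drop_file_mappings(struct demo_gem_object *obj,
					u64 file_id)
{
	struct demo_mapping *m, *tmp;

	mutex_lock(&obj->lock);
	list_for_each_entry_safe(m, tmp, &obj->mappings, node) {
		if (m->owner_file_id == file_id) {
			list_del(&m->node);
			kfree(m);
		}
	}
	mutex_unlock(&obj->lock);
}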

Note that if I *don't* do this (or forget to do it somewhere) the
consequence is just that we leak memory, and if you try to destroy the
wrong IDs somehow the worst that can happen is you unmap things you
shouldn't and fault the GPU (or, in the kernel or kernel-managed user VM
cases, potentially the firmware). Rust safety guarantees still keep
things from going entirely off the rails within the kernel, since
everything that matters is reference counted (which is why these
reference cycles are possible at all).

This all started when I was looking at the panfrost driver for
reference. It does the same thing except it uses actual pointers to the
owning entities instead of IDs, and pointer comparison (see panfrost_gem_close).

Re: [Intel-gfx] [PATCH] i915/guc/slpc: Provide sysfs for efficient freq

2023-04-05 Thread Dixit, Ashutosh
On Wed, 05 Apr 2023 13:12:29 -0700, Rodrigo Vivi wrote:
>
> On Wed, Apr 05, 2023 at 12:42:30PM -0700, Dixit, Ashutosh wrote:
> > On Wed, 05 Apr 2023 06:57:42 -0700, Rodrigo Vivi wrote:
> > >

Hi Rodrigo,

> >
> > > On Fri, Mar 31, 2023 at 08:11:29PM -0700, Dixit, Ashutosh wrote:
> > > > On Fri, 31 Mar 2023 19:00:49 -0700, Vinay Belgaumkar wrote:
> > > > >
> > > >
> > > > Hi Vinay,
> > > >
> > > > > @@ -478,20 +507,15 @@ int intel_guc_slpc_set_min_freq(struct 
> > > > > intel_guc_slpc *slpc, u32 val)
> > > > >   val > slpc->max_freq_softlimit)
> > > > >   return -EINVAL;
> > > > >
> > > > > + /* Ignore efficient freq if lower min freq is requested */
> > > > > + ret = intel_guc_slpc_set_ignore_eff_freq(slpc, val < 
> > > > > slpc->rp1_freq);
> > > > > + if (ret)
> > > > > + goto out;
> > > > > +
> > > >
> > > > I don't agree with this. If we are now providing an interface
> > > > explicitly to ignore RPe, that should be the /only/ way to ignore RPe.
> > > > There should be no other "under the hood" ignoring of RPe. In other
> > > > words, ignoring RPe should be minimized unless explicitly requested.
> > > >
> > > > I don't clearly understand why this was done previously, but it makes
> > > > even less sense to me now after this patch.
> > >
> > > well, I had suggested this previously. And just because without this we
> > > would be breaking API expectations.
> > >
> > > When the user selects a minimal frequency they expect it to stick. But
> > > with efficient freq enabled in GuC, if the minimal is less than the
> > > efficient one, this request is likely ignored.
> > >
> > > Well, even worse is that we are actually caching the request in the soft
> > > values. So we show a minimal, but the hardware without any workload is
> > > operating at efficient.
> > >
> > > So, the thought process was: 'if the user requested a very low minimal,
> > > we give them the minimal requested, even if that means disabling the
> > > efficient freq.'
> >
> > Hmm, I understand this even less now :)
> >
> > * Why is RPe ignored when min < RPe? Since the freq can be between min and
> >   max? Shouldn't the condition be min > RPe, that is, turn RPe off if a min
> >   higher than RPe is requested?
>
> that is not how guc efficient freq selection works (unless my memory is
> tricking me right now).
>
> So, if we select a min that is between RPe and RP0, guc will respect and
> use the selected min. So we don't need to disable guc's selection of the
> efficient freq.
>
> This is not true when we select a very low min like RPn. If we select RPn
> as min and guc efficient freq selection is enabled, guc will simply ignore
> our request. So the only way to give the user what is asked is to also
> disable guc's efficient freq selection. (I probably confused you in the
> previous email because I used 'RP0' when I meant 'RPn'. I hope it is
> clear now.)
>
> >
> > * Also isn't RPe dynamic, so we can't say RPe == rp1 when using in KMD?
>
> Oh... yeap, this is an issue indeed. Especially with i915 where we have
> the soft values cached instead of asking guc every time.
>
> That's a good point. The variance is not big, but we will hit corner cases.
> One way is to keep checking and updating every time a sysfs file is touched.

This I believe is not possible in all cases. Say the freqs are set through
sysfs first and the workload starts later. In this case RPe will probably
start changing after the workload starts, not when the freqs are set in sysfs.

> The other way is to do what you are suggesting and just accept and deal
> with the reality, which is: "we cannot guarantee a min freq selection if the
> user doesn't disable the efficient freq selection".
>
> >
> > * Finally, we know that enabling RPe broke the kernel freq API because RPe
> >   could go over max_freq. So it is actually the max freq which is not
> >   obeyed after RPe is enabled.
>
> Oh! So it was my bad memory indeed and everything is the other way around?
> But I just looked at the Xe code, my most recent memory, and I just needed
> to toggle the efficient freq off in the case that I mentioned, when the min
> selection is below the efficient one. With that, all the API expectations
> that I coded in IGT work neatly.

From what I saw the following bugs:

https://gitlab.freedesktop.org/drm/intel/-/issues/6806
https://gitlab.freedesktop.org/drm/intel/-/issues/6786

and the following patches in response to these bugs (as well as the
discussion on these patches):

https://patchwork.freedesktop.org/series/111282/
https://patchwork.freedesktop.org/series/110574/

were all due to the fact that the max freq is not obeyed after RPe is
enabled.

These patches were never merged but I will now try to submit them again
after this ignore efficient freq patch gets reviewed and merged.

Thanks.
--
Ashutosh

> >
> > So we ignore RPe in some select cases (which also I don't understand as
> > mentioned above but maybe I am missing something) to 

Re: [PATCH RFC 18/18] drm/asahi: Add the Asahi driver for Apple AGX GPUs

2023-04-05 Thread Asahi Lina

On 05/04/2023 23.44, Daniel Vetter wrote:

On Tue, Mar 07, 2023 at 11:25:43PM +0900, Asahi Lina wrote:

+/// Look up a GEM object handle for a `File` and return an `ObjectRef` for it.
+pub(crate) fn lookup_handle(file: &DrmFile, handle: u32) -> Result<ObjectRef> {
+Ok(ObjectRef::new(shmem::Object::lookup_handle(file, handle)?))
+}


So maybe my expectations for rust typing are a bit too much, but I kinda
expected this to be fully generic:

- trait Driver (drm_driver) knows the driver's object type
- a generic create_handle function on the drm_file (which is always for a
   specific drm_device and hence Driver) can ensure at the type level that
   you only put the right objects into the drm_file
- a generic lookup_handle function on the drm_file knows the Driver trait
   and so can give you back the right type right away.

Why the wrapping, what do I miss?


Sigh, so this is one of the many ways I'm trying to work around the 
"Rust doesn't do subclasses" problem (so we can figure out what the best 
one is ^^).


The generic shmem::Object::lookup_handle() call *is* fully generic and
will get you back a driver-specific object. But since Rust doesn't do
subclassing, what you get back isn't a driver-specific type T, but
rather a (reference to a) shmem::Object<T>. T represents the inner
driver-specific data/functionality (only), and the outer
shmem::Object<T> includes the actual drm_gem_shmem_object plus a T. This
is backwards from C, where you expect the opposite situation where T
contains a shmem object, but that just doesn't work with Rust because
there's no way to build a safe API around that model as far as I know.


Now the problem is from the higher layers I want object operations that 
interact with the shmem::Object<T> (that is, they call generic GEM
functions on the object). Options so far:


1. Add an outer wrapper and put that functionality there.
2. Just have the functions on T as helpers, so you need to call 
T::foo(obj) instead of obj.foo().
3. Use the undocumented method receiver trait thing to make 
shmem::Object<T> a valid `self` type, plus add auto-Deref to
shmem::Object<T>. Then obj.foo() works.


#1 is what I use here. #2 is how the driver-specific File ioctl 
callbacks are implemented, and also sched::Job. #3 is used for fence 
callbacks (FenceObject). None of them are great, and I'd love to hear 
what people think of the various options...


There are other unexplored options, like in this GEM case it could be 
covered with a driver-internal auxiliary trait impl'd on 
shmem::Object<T>, buuut that doesn't work when you actually need
callbacks on T itself to circle back to shmem::Object<T>, as is the case
with File/Job/FenceObject.


~~ Lina



Re: [Intel-gfx] [PATCH] drm/i915/guc: Disable PL1 power limit when loading GuC firmware

2023-04-05 Thread Dixit, Ashutosh
On Tue, 28 Mar 2023 02:14:42 -0700, Tvrtko Ursulin wrote:
>

Hi Tvrtko,

> On 27/03/2023 18:47, Rodrigo Vivi wrote:
> >
> > +Daniel
> >
> > On Mon, Mar 27, 2023 at 09:58:52AM -0700, Dixit, Ashutosh wrote:
> >> On Sun, 26 Mar 2023 04:52:59 -0700, Rodrigo Vivi wrote:
> >>>
> >>
> >> Hi Rodrigo,
> >>
> >>> On Fri, Mar 24, 2023 at 04:31:22PM -0700, Dixit, Ashutosh wrote:
>  On Fri, 24 Mar 2023 11:15:02 -0700, Belgaumkar, Vinay wrote:
> >
> 
>  Hi Vinay,
> 
>  Thanks for the review. Comments inline below.
> 
> > On 3/15/2023 8:59 PM, Ashutosh Dixit wrote:
> >> On dGfx, the PL1 power limit being enabled and set to a low value 
> >> results
> >> in a low GPU operating freq. It also negates the freq raise operation 
> >> which
> >> is done before GuC firmware load. As a result GuC firmware load can 
> >> time
> >> out. Such timeouts were seen in the GL #8062 bug below (where the PL1 
> >> power
> >> limit was enabled and set to a low value). Therefore disable the PL1 
> >> power
> >> limit when allowed by HW when loading GuC firmware.
> > v3 label missing in subject.
> >>
> >> v2:
> >>- Take mutex (to disallow writes to power1_max) across GuC reset/fw 
> >> load
> >>- Add hwm_power_max_restore to error return code path
> >>
> >> v3 (Jani N):
> >>- Add/remove explanatory comments
> >>- Function renames
> >>- Type corrections
> >>- Locking annotation
> >>
> >> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
> >> Signed-off-by: Ashutosh Dixit 
> >> ---
> >>drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9 +++
> >>drivers/gpu/drm/i915/i915_hwmon.c | 39 
> >> +++
> >>drivers/gpu/drm/i915/i915_hwmon.h |  7 +
> >>3 files changed, 55 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
> >> b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> >> index 4ccb4be4c9cba..aa8e35a5636a0 100644
> >> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> >> @@ -18,6 +18,7 @@
> >>#include "intel_uc.h"
> >>  #include "i915_drv.h"
> >> +#include "i915_hwmon.h"
> >>  static const struct intel_uc_ops uc_ops_off;
> >>static const struct intel_uc_ops uc_ops_on;
> >> @@ -461,6 +462,7 @@ static int __uc_init_hw(struct intel_uc *uc)
> >>struct intel_guc *guc = &uc->guc;
> >>struct intel_huc *huc = &uc->huc;
> >>int ret, attempts;
> >> +  bool pl1en;
> >
> > Init to 'false' here
> 
>  See next comment.
> 
> >
> >
> >>GEM_BUG_ON(!intel_uc_supports_guc(uc));
> >>GEM_BUG_ON(!intel_uc_wants_guc(uc));
> >> @@ -491,6 +493,9 @@ static int __uc_init_hw(struct intel_uc *uc)
> >>else
> >>attempts = 1;
> >>+   /* Disable a potentially low PL1 power limit to allow freq to be
> >> raised */
> >> +  i915_hwmon_power_max_disable(gt->i915, &pl1en);
> >> +
> >>intel_rps_raise_unslice(&uc_to_gt(uc)->rps);
> >>while (attempts--) {
> >> @@ -547,6 +552,8 @@ static int __uc_init_hw(struct intel_uc *uc)
> >>intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
> >>}
> >>+   i915_hwmon_power_max_restore(gt->i915, pl1en);
> >> +
> >>guc_info(guc, "submission %s\n", 
> >> str_enabled_disabled(intel_uc_uses_guc_submission(uc)));
> >>guc_info(guc, "SLPC %s\n", 
> >> str_enabled_disabled(intel_uc_uses_guc_slpc(uc)));
> >>@@ -563,6 +570,8 @@ static int __uc_init_hw(struct intel_uc *uc)
> >>/* Return GT back to RPn */
> >>intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
> >>+   i915_hwmon_power_max_restore(gt->i915, pl1en);
> >
> > if (pl1en)
> >
> >      i915_hwmon_power_max_enable().
> 
>  IMO it's better not to have checks in the main __uc_init_hw() function 
>  (if
>  we do this we'll need to add 2 checks in __uc_init_hw()). If you really
>  want we could do something like this inside
>  i915_hwmon_power_max_disable/i915_hwmon_power_max_restore. But for now I
>  am not making any changes.
> 
>  (I can send a patch with the changes if you want to take a look but IMO 
>  it
>  will add more logic/code but without real benefits (it will save a rmw if
>  the limit was already disabled, but IMO this code is called so 
>  infrequently
>  (only during GuC resets) as to not have any significant impact)).
> 
> >
> >> +
> >>__uc_sanitize(uc);
> >>if (!ret) {
> >> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c 
> >> b/drivers/gpu/drm/i915/i915_hwmon.c
> >> index ee63a8fd88fc1..769b5bda4d53f 100644
> >> --- 

Re: [Intel-gfx] [PATCH] drm/i915/guc: Disable PL1 power limit when loading GuC firmware

2023-04-05 Thread Dixit, Ashutosh
On Mon, 27 Mar 2023 10:47:25 -0700, Rodrigo Vivi wrote:
>

Hi Rodrigo,

Sorry for the delay, I got pulled away into a couple of other things and
could only now get back to this.

>
> +Daniel
>
> On Mon, Mar 27, 2023 at 09:58:52AM -0700, Dixit, Ashutosh wrote:
> > On Sun, 26 Mar 2023 04:52:59 -0700, Rodrigo Vivi wrote:
> > >
> >
> > Hi Rodrigo,
> >
> > > On Fri, Mar 24, 2023 at 04:31:22PM -0700, Dixit, Ashutosh wrote:
> > > > On Fri, 24 Mar 2023 11:15:02 -0700, Belgaumkar, Vinay wrote:
> > > > >
> > > >
> > > > Hi Vinay,
> > > >
> > > > Thanks for the review. Comments inline below.
> > > >
> > > > > On 3/15/2023 8:59 PM, Ashutosh Dixit wrote:
> > > > > > On dGfx, the PL1 power limit being enabled and set to a low value 
> > > > > > results
> > > > > > in a low GPU operating freq. It also negates the freq raise 
> > > > > > operation which
> > > > > > is done before GuC firmware load. As a result GuC firmware load can 
> > > > > > time
> > > > > > out. Such timeouts were seen in the GL #8062 bug below (where the 
> > > > > > PL1 power
> > > > > > limit was enabled and set to a low value). Therefore disable the 
> > > > > > PL1 power
> > > > > > limit when allowed by HW when loading GuC firmware.
> > > > > v3 label missing in subject.
> > > > > >
> > > > > > v2:
> > > > > >   - Take mutex (to disallow writes to power1_max) across GuC 
> > > > > > reset/fw load
> > > > > >   - Add hwm_power_max_restore to error return code path
> > > > > >
> > > > > > v3 (Jani N):
> > > > > >   - Add/remove explanatory comments
> > > > > >   - Function renames
> > > > > >   - Type corrections
> > > > > >   - Locking annotation
> > > > > >
> > > > > > Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
> > > > > > Signed-off-by: Ashutosh Dixit 
> > > > > > ---
> > > > > >   drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9 +++
> > > > > >   drivers/gpu/drm/i915/i915_hwmon.c | 39 
> > > > > > +++
> > > > > >   drivers/gpu/drm/i915/i915_hwmon.h |  7 +
> > > > > >   3 files changed, 55 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
> > > > > > b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > > > > index 4ccb4be4c9cba..aa8e35a5636a0 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > > > > @@ -18,6 +18,7 @@
> > > > > >   #include "intel_uc.h"
> > > > > > #include "i915_drv.h"
> > > > > > +#include "i915_hwmon.h"
> > > > > > static const struct intel_uc_ops uc_ops_off;
> > > > > >   static const struct intel_uc_ops uc_ops_on;
> > > > > > @@ -461,6 +462,7 @@ static int __uc_init_hw(struct intel_uc *uc)
> > > > > > struct intel_guc *guc = &uc->guc;
> > > > > > struct intel_huc *huc = &uc->huc;
> > > > > > int ret, attempts;
> > > > > > +   bool pl1en;
> > > > >
> > > > > Init to 'false' here
> > > >
> > > > See next comment.
> > > >
> > > > >
> > > > >
> > > > > > GEM_BUG_ON(!intel_uc_supports_guc(uc));
> > > > > > GEM_BUG_ON(!intel_uc_wants_guc(uc));
> > > > > > @@ -491,6 +493,9 @@ static int __uc_init_hw(struct intel_uc *uc)
> > > > > > else
> > > > > > attempts = 1;
> > > > > >   + /* Disable a potentially low PL1 power limit to allow freq to be
> > > > > > raised */
> > > > > > +   i915_hwmon_power_max_disable(gt->i915, &pl1en);
> > > > > > +
> > > > > > intel_rps_raise_unslice(&uc_to_gt(uc)->rps);
> > > > > > while (attempts--) {
> > > > > > @@ -547,6 +552,8 @@ static int __uc_init_hw(struct intel_uc *uc)
> > > > > > intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
> > > > > > }
> > > > > >   + i915_hwmon_power_max_restore(gt->i915, pl1en);
> > > > > > +
> > > > > > guc_info(guc, "submission %s\n", 
> > > > > > str_enabled_disabled(intel_uc_uses_guc_submission(uc)));
> > > > > > guc_info(guc, "SLPC %s\n", 
> > > > > > str_enabled_disabled(intel_uc_uses_guc_slpc(uc)));
> > > > > >   @@ -563,6 +570,8 @@ static int __uc_init_hw(struct intel_uc *uc)
> > > > > > /* Return GT back to RPn */
> > > > > > intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
> > > > > >   + i915_hwmon_power_max_restore(gt->i915, pl1en);
> > > > >
> > > > > if (pl1en)
> > > > >
> > > > >     i915_hwmon_power_max_enable().
> > > >
> > > > IMO it's better not to have checks in the main __uc_init_hw() function 
> > > > (if
> > > > we do this we'll need to add 2 checks in __uc_init_hw()). If you really
> > > > want we could do something like this inside
> > > > i915_hwmon_power_max_disable/i915_hwmon_power_max_restore. But for now I
> > > > am not making any changes.
> > > >
> > > > (I can send a patch with the changes if you want to take a look but IMO 
> > > > it
> > > > will add more logic/code but without real benefits (it will save a rmw 
> > > > if
> > > > the limit was already disabled, but IMO this code is called so 
> > > > infrequently
> > > > (only during GuC resets) as to not have any significant impact)).
> > > >
> > > > >
> > > > 

[PATCH 2/3] drm/i915/guc: Disable PL1 power limit when loading GuC firmware

2023-04-05 Thread Ashutosh Dixit
On dGfx, the PL1 power limit being enabled and set to a low value results
in a low GPU operating freq. It also negates the freq raise operation which
is done before GuC firmware load. As a result GuC firmware load can time
out. Such timeouts were seen in the GL #8062 bug below (where the PL1 power
limit was enabled and set to a low value). Therefore disable the PL1 power
limit when allowed by HW when loading GuC firmware.

v2:
 - Take mutex (to disallow writes to power1_max) across GuC reset/fw load
 - Add hwm_power_max_restore to error return code path

v3 (Jani N):
 - Add/remove explanatory comments
 - Function renames
 - Type corrections
 - Locking annotation

v4:
 - Don't hold the lock across GuC reset (Rodrigo)
 - New locking scheme (suggested by Rodrigo)
 - Eliminate rpm_get in power_max_disable/restore, not needed (Tvrtko)

Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
Signed-off-by: Ashutosh Dixit 
---
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9 ++
 drivers/gpu/drm/i915/i915_hwmon.c | 40 +++
 drivers/gpu/drm/i915/i915_hwmon.h |  7 +
 3 files changed, 56 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 4ccb4be4c9cba..aa8e35a5636a0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -18,6 +18,7 @@
 #include "intel_uc.h"
 
 #include "i915_drv.h"
+#include "i915_hwmon.h"
 
 static const struct intel_uc_ops uc_ops_off;
 static const struct intel_uc_ops uc_ops_on;
@@ -461,6 +462,7 @@ static int __uc_init_hw(struct intel_uc *uc)
struct intel_guc *guc = &uc->guc;
struct intel_huc *huc = &uc->huc;
int ret, attempts;
+   bool pl1en;
 
GEM_BUG_ON(!intel_uc_supports_guc(uc));
GEM_BUG_ON(!intel_uc_wants_guc(uc));
@@ -491,6 +493,9 @@ static int __uc_init_hw(struct intel_uc *uc)
else
attempts = 1;
 
+   /* Disable a potentially low PL1 power limit to allow freq to be raised */
+   i915_hwmon_power_max_disable(gt->i915, &pl1en);
+
intel_rps_raise_unslice(&uc_to_gt(uc)->rps);
 
while (attempts--) {
@@ -547,6 +552,8 @@ static int __uc_init_hw(struct intel_uc *uc)
intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
}
 
+   i915_hwmon_power_max_restore(gt->i915, pl1en);
+
guc_info(guc, "submission %s\n", 
str_enabled_disabled(intel_uc_uses_guc_submission(uc)));
guc_info(guc, "SLPC %s\n", 
str_enabled_disabled(intel_uc_uses_guc_slpc(uc)));
 
@@ -563,6 +570,8 @@ static int __uc_init_hw(struct intel_uc *uc)
/* Return GT back to RPn */
intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
 
+   i915_hwmon_power_max_restore(gt->i915, pl1en);
+
__uc_sanitize(uc);
 
if (!ret) {
diff --git a/drivers/gpu/drm/i915/i915_hwmon.c 
b/drivers/gpu/drm/i915/i915_hwmon.c
index 7f44e809ca155..9ab8971679fe3 100644
--- a/drivers/gpu/drm/i915/i915_hwmon.c
+++ b/drivers/gpu/drm/i915/i915_hwmon.c
@@ -50,6 +50,7 @@ struct hwm_drvdata {
struct hwm_energy_info ei;  /*  Energy info for 
energy1_input */
char name[12];
int gt_n;
+   bool reset_in_progress;
 };
 
 struct i915_hwmon {
@@ -400,6 +401,10 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
u32 nval;
 
mutex_lock(&hwmon->hwmon_lock);
+   if (hwmon->ddat.reset_in_progress) {
+   ret = -EAGAIN;
+   goto unlock;
+   }
wakeref = intel_runtime_pm_get(ddat->uncore->rpm);
 
/* Disable PL1 limit and verify, because the limit cannot be disabled 
on all platforms */
@@ -421,6 +426,7 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
 PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1, nval);
 exit:
intel_runtime_pm_put(ddat->uncore->rpm, wakeref);
+unlock:
mutex_unlock(&hwmon->hwmon_lock);
return ret;
 }
@@ -472,6 +478,40 @@ hwm_power_write(struct hwm_drvdata *ddat, u32 attr, int 
chan, long val)
}
 }
 
+void i915_hwmon_power_max_disable(struct drm_i915_private *i915, bool *old)
+{
+   struct i915_hwmon *hwmon = i915->hwmon;
+   u32 r;
+
+   if (!hwmon || !i915_mmio_reg_valid(hwmon->rg.pkg_rapl_limit))
+   return;
+
+   mutex_lock(&hwmon->hwmon_lock);
+
+   hwmon->ddat.reset_in_progress = true;
+   r = intel_uncore_rmw(hwmon->ddat.uncore, hwmon->rg.pkg_rapl_limit,
+PKG_PWR_LIM_1_EN, 0);
+   *old = !!(r & PKG_PWR_LIM_1_EN);
+
+   mutex_unlock(&hwmon->hwmon_lock);
+}
+
+void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old)
+{
+   struct i915_hwmon *hwmon = i915->hwmon;
+
+   if (!hwmon || !i915_mmio_reg_valid(hwmon->rg.pkg_rapl_limit))
+   return;
+
+   mutex_lock(&hwmon->hwmon_lock);
+
+   intel_uncore_rmw(hwmon->ddat.uncore, hwmon->rg.pkg_rapl_limit,
+PKG_PWR_LIM_1_EN, old ? PKG_PWR_LIM_1_EN : 0);
+   hwmon->ddat.reset_in_progress = false;
+
+   mutex_unlock(&hwmon->hwmon_lock);
+}

[PATCH 1/3] drm/i915/hwmon: Get mutex and rpm ref just once in hwm_power_max_write

2023-04-05 Thread Ashutosh Dixit
In preparation for follow-on patches, refactor hwm_power_max_write to take
hwmon_lock and runtime pm wakeref at start of the function and release them
at the end, therefore acquiring these just once each.

Signed-off-by: Ashutosh Dixit 
---
 drivers/gpu/drm/i915/i915_hwmon.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_hwmon.c 
b/drivers/gpu/drm/i915/i915_hwmon.c
index 8e7dccc8d3a0e..7f44e809ca155 100644
--- a/drivers/gpu/drm/i915/i915_hwmon.c
+++ b/drivers/gpu/drm/i915/i915_hwmon.c
@@ -396,31 +396,33 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
 {
struct i915_hwmon *hwmon = ddat->hwmon;
intel_wakeref_t wakeref;
+   int ret = 0;
u32 nval;
 
+   mutex_lock(&hwmon->hwmon_lock);
+   wakeref = intel_runtime_pm_get(ddat->uncore->rpm);
+
/* Disable PL1 limit and verify, because the limit cannot be disabled 
on all platforms */
if (val == PL1_DISABLE) {
-   mutex_lock(&hwmon->hwmon_lock);
-   with_intel_runtime_pm(ddat->uncore->rpm, wakeref) {
-   intel_uncore_rmw(ddat->uncore, hwmon->rg.pkg_rapl_limit,
-PKG_PWR_LIM_1_EN, 0);
-   nval = intel_uncore_read(ddat->uncore, 
hwmon->rg.pkg_rapl_limit);
-   }
-   mutex_unlock(&hwmon->hwmon_lock);
+   intel_uncore_rmw(ddat->uncore, hwmon->rg.pkg_rapl_limit,
+PKG_PWR_LIM_1_EN, 0);
+   nval = intel_uncore_read(ddat->uncore, 
hwmon->rg.pkg_rapl_limit);
 
if (nval & PKG_PWR_LIM_1_EN)
-   return -ENODEV;
-   return 0;
+   ret = -ENODEV;
+   goto exit;
}
 
/* Computation in 64-bits to avoid overflow. Round to nearest. */
nval = DIV_ROUND_CLOSEST_ULL((u64)val << hwmon->scl_shift_power, 
SF_POWER);
nval = PKG_PWR_LIM_1_EN | REG_FIELD_PREP(PKG_PWR_LIM_1, nval);
 
-   hwm_locked_with_pm_intel_uncore_rmw(ddat, hwmon->rg.pkg_rapl_limit,
-   PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1,
-   nval);
-   return 0;
+   intel_uncore_rmw(ddat->uncore, hwmon->rg.pkg_rapl_limit,
+PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1, nval);
+exit:
+   intel_runtime_pm_put(ddat->uncore->rpm, wakeref);
+   mutex_unlock(&hwmon->hwmon_lock);
+   return ret;
 }
 
 static int
-- 
2.38.0



[PATCH 3/3] drm/i915/hwmon: Block waiting for GuC reset to complete

2023-04-05 Thread Ashutosh Dixit
Instead of erroring out when GuC reset is in progress, block waiting for
the GuC reset to complete, which is more reasonable uapi behavior.

Signed-off-by: Ashutosh Dixit 
---
 drivers/gpu/drm/i915/i915_hwmon.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_hwmon.c 
b/drivers/gpu/drm/i915/i915_hwmon.c
index 9ab8971679fe3..4343efb48e61b 100644
--- a/drivers/gpu/drm/i915/i915_hwmon.c
+++ b/drivers/gpu/drm/i915/i915_hwmon.c
@@ -51,6 +51,7 @@ struct hwm_drvdata {
char name[12];
int gt_n;
bool reset_in_progress;
+   wait_queue_head_t wqh;
 };
 
 struct i915_hwmon {
@@ -400,10 +401,15 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
int ret = 0;
u32 nval;
 
+retry:
mutex_lock(&hwmon->hwmon_lock);
if (hwmon->ddat.reset_in_progress) {
-   ret = -EAGAIN;
-   goto unlock;
+   mutex_unlock(&hwmon->hwmon_lock);
+   ret = wait_event_interruptible(ddat->wqh,
+  !hwmon->ddat.reset_in_progress);
+   if (ret)
+   return ret;
+   goto retry;
}
wakeref = intel_runtime_pm_get(ddat->uncore->rpm);
 
@@ -426,7 +432,6 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
 PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1, nval);
 exit:
intel_runtime_pm_put(ddat->uncore->rpm, wakeref);
-unlock:
mutex_unlock(&hwmon->hwmon_lock);
return ret;
 }
@@ -508,6 +513,7 @@ void i915_hwmon_power_max_restore(struct drm_i915_private 
*i915, bool old)
intel_uncore_rmw(hwmon->ddat.uncore, hwmon->rg.pkg_rapl_limit,
 PKG_PWR_LIM_1_EN, old ? PKG_PWR_LIM_1_EN : 0);
hwmon->ddat.reset_in_progress = false;
+   wake_up_all(&hwmon->ddat.wqh);
 
mutex_unlock(>hwmon_lock);
 }
@@ -784,6 +790,7 @@ void i915_hwmon_register(struct drm_i915_private *i915)
ddat->uncore = &i915->uncore;
snprintf(ddat->name, sizeof(ddat->name), "i915");
ddat->gt_n = -1;
+   init_waitqueue_head(&ddat->wqh);
 
for_each_gt(gt, i915, i) {
ddat_gt = hwmon->ddat_gt + i;
-- 
2.38.0



[PATCH v4 0/3] drm/i915/guc: Disable PL1 power limit when loading GuC firmware

2023-04-05 Thread Ashutosh Dixit
Split the v3 patch into 3 patches for easier review, can squash later if needed.

Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 

Ashutosh Dixit (3):
  drm/i915/hwmon: Get mutex and rpm ref just once in hwm_power_max_write
  drm/i915/guc: Disable PL1 power limit when loading GuC firmware
  drm/i915/hwmon: Block waiting for GuC reset to complete

 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9 
 drivers/gpu/drm/i915/i915_hwmon.c | 75 ++-
 drivers/gpu/drm/i915/i915_hwmon.h |  7 +++
 3 files changed, 78 insertions(+), 13 deletions(-)

-- 
2.38.0



Re: [PATCH RFC 18/18] drm/asahi: Add the Asahi driver for Apple AGX GPUs

2023-04-05 Thread Asahi Lina

On 05/04/2023 23.37, Daniel Vetter wrote:

On Tue, Mar 07, 2023 at 11:25:43PM +0900, Asahi Lina wrote:

+/// A generic monotonically incrementing ID used to uniquely identify object 
instances within the
+/// driver.
+pub(crate) struct ID(AtomicU64);
+
+impl ID {
+/// Create a new ID counter with a given value.
+fn new(val: u64) -> ID {
+ID(AtomicU64::new(val))
+}
+
+/// Fetch the next unique ID.
+pub(crate) fn next(&self) -> u64 {
+self.0.fetch_add(1, Ordering::Relaxed)
+}
+}


Continuing the theme of me commenting on individual things, I stumbled
over this because I noticed that there's a lot of id based lookups where I
don't expect them, and started chasing.

- For ids use xarray, not atomic counters. Yes I know dma_fence timelines
   gets this wrong, this goes back to an innocent time where we didn't
   allocate more than one timeline per engine, and no one fixed it since
   then. Yes u64 should be big enough for everyone :-/

- Attaching ID spaces to drm_device is also not great. drm is full of
   these mistakes. Much better if they're per drm_file and so private to each
   client.

- They shouldn't be used for anything else than uapi id -> kernel object
   lookup at the beginning of ioctl code, and nowhere else. At least from
   skimming it seems like these are used all over the driver codebase,
   which does freak me out. At least on the C side that's a clear indicator
   for a refcount/lockin/data structure model that's not thought out at
   all.

What's going on here, what do I miss?


These aren't UAPI IDs, they are driver-internal IDs (the UAPI IDs do use 
xarray and are per-File). Most of them are just for debugging, so that 
when I enable full debug spam I have some way to correlate different 
things that are happening together (this subset of interleaved log lines 
relate to the same submission). Basically just object names that are 
easier to read (and less of a security leak) than pointers and 
guaranteed not to repeat. You could get rid of most of them and it 
wouldn't affect the driver design, it just makes it very hard to see 
what's going on with debug logs ^^;


There are only two that are ever used for non-debugging purposes: the VM 
ID, and the File ID. Both are per-device global IDs attached to the VMs 
(not the UAPI VM objects, but rather the underlying MMU address space
managers they represent, including the kernel-internal ones) and to 
Files themselves. They are used for destroying GEM objects: since the 
objects are also device-global across multiple clients, I need a way to 
do things like "clean up all mappings for this File" or "clean up all 
mappings for this VM". There's an annoying circular reference between 
GEM objects and their mappings, which is why this is explicitly coded 
out in destroy paths instead of naturally happening via Drop semantics 
(without that cleanup code, the circular reference leaks it).


So e.g. when a File does a GEM close or explicitly asks for all mappings 
of an object to be removed, it goes out to the (possibly shared) GEM 
object and tells it to drop all mappings marked as owned by that unique 
File ID. When an explicit "unmap all in VM" op happens, it asks the GEM 
object to drop all mappings for that underlying VM ID. Similarly, when a 
UAPI VM object is dropped (in the Drop impl, so both explicitly and when 
the whole File/xarray is dropped and such), that does an explicit unmap 
of a special dummy object it owns which would otherwise leak since it is 
not tracked as a GEM object owned by that File and therefore not handled 
by GEM closing. And again along the same lines, the allocators in 
alloc.rs explicitly destroy the mappings for their backing GEM objects 
on Drop. All this is due to that annoying circular reference between VMs 
and GEM objects that I'm not sure how to fix.


Note that if I *don't* do this (or forget to do it somewhere) the 
consequence is just that we leak memory, and if you try to destroy the 
wrong IDs somehow the worst that can happen is you unmap things you 
shouldn't and fault the GPU (or, in the kernel or kernel-managed user VM 
cases, potentially the firmware). Rust safety guarantees still keep 
things from going entirely off the rails within the kernel, since 
everything that matters is reference counted (which is why these 
reference cycles are possible at all).


This all started when I was looking at the panfrost driver for 
reference. It does the same thing except it uses actual pointers to the 
owning entities instead of IDs, and pointer comparison (see 
panfrost_gem_close). Of course you could try to do that in Rust too
(literally storing and comparing raw pointers that aren't owned 
references), but then you're introducing a Pin<> requirement on those 
objects to make their addresses stable and it feels way more icky and 
error-prone than unique IDs (since addresses can be reused). panfrost 
only has a single mmu (what I call the raw VM) per File while I have an 

Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans

2023-04-05 Thread Matthew Brost
On Wed, Apr 05, 2023 at 12:12:27PM +0200, Daniel Vetter wrote:
> On Wed, 5 Apr 2023 at 11:57, Christian König  wrote:
> >
> > On 05.04.23 at 11:07, Daniel Vetter wrote:
> > > [SNIP]
> > >> I would approach it from the complete other side. This component here is 
> > >> a
> > >> tool to decide what job should run next.
> > >>
> > >> How that is then signaled and run should not be part of the scheduler, 
> > >> but
> > >> another more higher level component.
> > >>
> > >> This way you also don't have a problem with not using DMA-fences as
> > >> dependencies as well as constrains for running more jobs.
> > > I think we're talking about two things here and mixing them up.
> > >
> > > For the dependencies I agree with you, and imo that higher level tool
> > > should probably just be an on-demand submit thread in userspace for the
> > > rare case where the kernel would need to sort out a dependency otherwise
> > > (due to running out of ringspace in the per-ctx ringbuffer).
> > >
> > > The other thing is the message passing stuff, and this is what I was
> > > talking about above. This has nothing to do with handling dependencies,
> > > but with talking to the gpu fw. Here the intel design issue is that the fw
> > > only provides a single queue, and it's in-order. Which means it
> > > fundamentally has the stalling issue you describe as a point against a
> > > message passing design. And fundamentally we need to be able to talk to
> > > the fw in the scheduler ->run_job callback.
> > >
> > > The proposal here for the message passing part is that since it has the
> > > stalling issue already anyway, and the scheduler needs to be involved
> > > anyway, it makes sense to integrated this (as an optional thing, only for
> > > drivers which have this kind of fw interface) into the scheduler.
> > > Otherwise you just end up with two layers for no reason and more ping-pong
> > > delay because the ->run_job needs to kick off the subordinate driver layer
> > > first. Note that for this case the optional message passing support in the
> > > drm/scheduler actually makes things better, because it allows you to cut
> > > out one layer.
> > >
> > > Of course if a driver with better fw interface uses this message passing
> > > support, then that's bad. Hence the big warning in the kerneldoc.
> >
> > Well what I wanted to say is that if you design the dependency handling
> > / scheduler properly you don't need the message passing through it.
> >
> > For example, if the GPU scheduler component uses a work item to do its
> > handling instead of a kthread, you could also let the driver specify the
> > work queue this work item is executed on.
> >
> > When you design it like this, the driver specifies the thread context of
> > execution for its job. In other words, it can specify a single-threaded
> > firmware work queue as well.
> >
> > When you then have other messages which needs to be passed to the
> > firmware you can also use the same single threaded workqueue for this.
> >
> > Drivers which have a different firmware interface would just use one of
> > the system work queues instead.
> >
> > This approach basically decouples the GPU scheduler component from the
> > message passing functionality.
> 
> Hm I guess we've been talking past each other big time, because
> that's really what I thought was under discussion? Essentially the
> current rfc, but implemented with some polish.
>

I think Daniel pretty much nailed it here (thanks), to recap:

1. I want the messages in the same worker so that run_job / free_job /
process_msg execution is mutually exclusive, and also so that during reset
paths, if the worker is stopped, none of the entry points can be entered
(a sketch of this single-worker arrangement follows the list below).

If this is a NAK, then another worker is fine I guess. A lock between
run_job / free_job + process_msg should solve the exclusion issue, and the
reset paths can stop this new worker too. That being said, I'd rather
leave this as is, but will not fight this point.

2. process_msg is just used to communicate with the firmware using the
same queue as submission. Waiting for space in this queue is the only
place this function can block (same as submission); well, actually we
have the concept of a preempt time slice, but that sleeps for 10 ms by
default. Also, preempt is only used in LR entities, so I don't think it is
relevant in this case either.

3. Agree this is in the dma-fence signaling path (if process_msg is in
the submission worker), so we can't block indefinitely or for an
unreasonable period of time (i.e. we must obey dma-fence rules).

4. Agree the documentation for the usage of the messaging interface
needs to be clear.

5. Agree that my code could always use polishing.
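
(A hedged sketch of the single-worker arrangement from point #1, assuming an
ordered workqueue; the names are illustrative, not the actual RFC code:)

/*
 * An ordered workqueue executes at most one work item at a time, so
 * submission handling (run_job/free_job) and message handling queued on
 * it can never run concurrently, and draining/stopping the queue closes
 * all entry points during reset paths.
 */
struct fw_queue {
	struct workqueue_struct *wq;
	struct work_struct submit_work;	/* drives run_job for pending jobs */
	struct work_struct msg_work;	/* passes one message to the fw queue */
};

static int fw_queue_init(struct fw_queue *q)
{
	q->wq = alloc_ordered_workqueue("fw-submit", 0);
	return q->wq ? 0 : -ENOMEM;
}

static void fw_queue_send_msg(struct fw_queue *q)
{
	/* serialized against submission by the same ordered workqueue */
	queue_work(q->wq, &q->msg_work);
}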

Let's close on #1; then can I get a general Ack on this part of the RFC
and apply the polish in the full review process?

Matt

> iow I agree with you (I think at least).
> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [PATCH v2 2/2] drm: bridge: cdns-mhdp8546: Add support for no-hpd

2023-04-05 Thread Laurent Pinchart
Hi Jayesh,

Thank you for the patch.

On Wed, Apr 05, 2023 at 07:54:40PM +0530, Jayesh Choudhary wrote:
> From: Rahul T R 
> 
> In J721S2 EVMs DP0 hpd is not connected to correct
> hpd pin on SOC, to handle such cases, Add support for
> "no-hpd" property in the device tree node to disable
> hpd

s/hpd/hpd./

You can also reflow the commit message to 72 columns.

> Also change the log level for dpcd read failuers to
> debug, since framework retries 32 times for each read

s/read/read./

Doesn't this apply to writes as well ?

> Signed-off-by: Rahul T R 
> Signed-off-by: Jayesh Choudhary 
> ---
>  .../drm/bridge/cadence/cdns-mhdp8546-core.c   | 37 ---
>  .../drm/bridge/cadence/cdns-mhdp8546-core.h   |  1 +
>  2 files changed, 33 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c 
> b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> index f6822dfa3805..e177794b069d 100644
> --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> @@ -54,6 +54,8 @@
>  #include "cdns-mhdp8546-hdcp.h"
>  #include "cdns-mhdp8546-j721e.h"
>  
> +static int cdns_mhdp_update_link_status(struct cdns_mhdp_device *mhdp);
> +
>  static int cdns_mhdp_mailbox_read(struct cdns_mhdp_device *mhdp)
>  {
>   int ret, empty;
> @@ -749,7 +751,7 @@ static int cdns_mhdp_fw_activate(const struct firmware 
> *fw,
>* MHDP_HW_STOPPED happens only due to driver removal when
>* bridge should already be detached.
>*/
> - if (mhdp->bridge_attached)
> + if (mhdp->bridge_attached && !mhdp->no_hpd)
>   writel(~(u32)CDNS_APB_INT_MASK_SW_EVENT_INT,
>  mhdp->regs + CDNS_APB_INT_MASK);
>  
> @@ -845,7 +847,7 @@ static ssize_t cdns_mhdp_transfer(struct drm_dp_aux *aux,
>   ret = cdns_mhdp_dpcd_read(mhdp, msg->address,
> msg->buffer, msg->size);
>   if (ret) {
> - dev_err(mhdp->dev,
> + dev_dbg(mhdp->dev,
>   "Failed to read DPCD addr %u\n",
>   msg->address);
>  
> @@ -1738,6 +1740,19 @@ static int cdns_mhdp_attach(struct drm_bridge *bridge,
>  
>   spin_unlock(&mhdp->start_lock);
>  
> + if (mhdp->no_hpd) {
> + ret = wait_event_timeout(mhdp->fw_load_wq,
> +  mhdp->hw_state == MHDP_HW_READY,
> +  msecs_to_jiffies(100));
> + if (ret == 0) {
> + dev_err(mhdp->dev, "%s: Timeout waiting for fw 
> loading\n",
> + __func__);
> + return -ETIMEDOUT;
> + }
> +
> + cdns_mhdp_update_link_status(mhdp);
> + return 0;
> + }

Missing blank line.

It's not clear to me why you need to wait for the state to change to
MHDP_HW_READY in the no_hpd case. This should be explained in the commit
message.

>   /* Enable SW event interrupts */
>   if (hw_ready)
>   writel(~(u32)CDNS_APB_INT_MASK_SW_EVENT_INT,
> @@ -2256,7 +2271,16 @@ static int cdns_mhdp_update_link_status(struct 
> cdns_mhdp_device *mhdp)
>  
>   mutex_lock(&mhdp->link_mutex);
>  
> - mhdp->plugged = cdns_mhdp_detect_hpd(mhdp, &hpd_pulse);
> + if (mhdp->no_hpd) {
> + ret = drm_dp_dpcd_read_link_status(&mhdp->aux, status);
> + hpd_pulse = false;
> + if (ret < 0)
> + mhdp->plugged = false;
> + else
> + mhdp->plugged = true;

I think there's an issue with how the driver uses mhdp->plugged. In the
no_hpd case, you try to detect if a display is connected by reading the
link status at attach time, and then never update mhdp->plugged. This
means that if no display is connected at that point, functions like
cdns_mhdp_get_edid() will always fail, even if a display gets plugged
later. As the goal of this series is (as far as I understand) to support
systems where the HPD signal could be connected to a SoC GPIO instead of
the bridge, I don't think this is good enough.

> + } else {
> + mhdp->plugged = cdns_mhdp_detect_hpd(mhdp, &hpd_pulse);
> + }
>  
>   if (!mhdp->plugged) {
>   cdns_mhdp_link_down(mhdp);
> @@ -2451,6 +2475,8 @@ static int cdns_mhdp_probe(struct platform_device *pdev)
>   mhdp->aux.dev = dev;
>   mhdp->aux.transfer = cdns_mhdp_transfer;
>  
> + mhdp->no_hpd = of_property_read_bool(dev->of_node, "cdns,no-hpd");
> +
>   mhdp->regs = devm_platform_ioremap_resource(pdev, 0);
>   if (IS_ERR(mhdp->regs)) {
>   dev_err(dev, "Failed to get memory resource\n");
> @@ -2526,8 +2552,9 @@ static int cdns_mhdp_probe(struct platform_device *pdev)
>  
>   mhdp->bridge.of_node = pdev->dev.of_node;
>   mhdp->bridge.funcs = &cdns_mhdp_bridge_funcs;
> - mhdp->bridge.ops = DRM_BRIDGE_OP_DETECT | 

Re: [PATCH v2 1/2] dt-bindings: drm/bridge: Add no-hpd property

2023-04-05 Thread Laurent Pinchart
Hi Jayesh,

Thank you for the patch.

On Wed, Apr 05, 2023 at 07:54:39PM +0530, Jayesh Choudhary wrote:
> From: Rahul T R 
> 
> The mhdp bridge can work without its HPD pin hooked up to the connector,
> but the current bridge driver throws an error when hpd line is not
> connected to the connector. For such cases, we need an indication for
> no-hpd, using which we can bypass the hpd detection and instead use the
> auxiliary channels connected to the DP connector to confirm the
> connection.
> So add no-hpd property to the bindings, to disable hpd when not
> connected or unusable.
> 
> Signed-off-by: Rahul T R 
> Signed-off-by: Jayesh Choudhary 
> ---
>  .../devicetree/bindings/display/bridge/cdns,mhdp8546.yaml   | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git 
> a/Documentation/devicetree/bindings/display/bridge/cdns,mhdp8546.yaml 
> b/Documentation/devicetree/bindings/display/bridge/cdns,mhdp8546.yaml
> index c2b369456e4e..3a6c6d837593 100644
> --- a/Documentation/devicetree/bindings/display/bridge/cdns,mhdp8546.yaml
> +++ b/Documentation/devicetree/bindings/display/bridge/cdns,mhdp8546.yaml
> @@ -57,6 +57,12 @@ properties:
>interrupts:
>  maxItems: 1
>  
> +  cdns,no-hpd:
> +type: boolean
> +description:
> +  Set if the HPD line on the bridge isn't hooked up to anything or is
> +  otherwise unusable.

I'm fine with the non connected part, but concerned with "otherwise
unusable". It's very vague and could thus be abused. Do you have
particular use cases in mind for this ? If so, restricting this to those
use cases, or at least giving examples, would help.

> +
>ports:
>  $ref: /schemas/graph.yaml#/properties/ports
>  

-- 
Regards,

Laurent Pinchart


[Bug 215631] Some Desktop oriented mode setting drivers are missing DRM PRIME support

2023-04-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215631

--- Comment #1 from bluescreen_aven...@verizon.net ---
I have been informed that 41068c8b28e16f1c2c26c854271520e1f3afaa22 in
drm-misc-next should fix it for all the mentioned drivers in the list, except
for gma500, ...and I have tested it, and it does work for bochs (in a QEMU
card)

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences

2023-04-05 Thread Matthew Brost
On Wed, Apr 05, 2023 at 03:09:08PM +0200, Daniel Vetter wrote:
> On Tue, Apr 04, 2023 at 07:48:27PM +, Matthew Brost wrote:
> > On Tue, Apr 04, 2023 at 09:25:52PM +0200, Daniel Vetter wrote:
> > > On Tue, Apr 04, 2023 at 07:02:23PM +, Matthew Brost wrote:
> > > > On Tue, Apr 04, 2023 at 08:14:01PM +0200, Thomas Hellström (Intel) 
> > > > wrote:
> > > > > 
> > > > > On 4/4/23 15:10, Christian König wrote:
> > > > > > Am 04.04.23 um 14:54 schrieb Thomas Hellström:
> > > > > > > Hi, Christian,
> > > > > > > 
> > > > > > > On 4/4/23 11:09, Christian König wrote:
> > > > > > > > Am 04.04.23 um 02:22 schrieb Matthew Brost:
> > > > > > > > > From: Thomas Hellström 
> > > > > > > > > 
> > > > > > > > > For long-running workloads, drivers either need to open-code
> > > > > > > > > completion
> > > > > > > > > waits, invent their own synchronization primitives or 
> > > > > > > > > internally use
> > > > > > > > > dma-fences that do not obey the cross-driver dma-fence 
> > > > > > > > > protocol, but
> > > > > > > > > without any lockdep annotation all these approaches are error 
> > > > > > > > > prone.
> > > > > > > > > 
> > > > > > > > > So since for example the drm scheduler uses dma-fences it is
> > > > > > > > > desirable for
> > > > > > > > > a driver to be able to use it for throttling and error
> > > > > > > > > handling also with
> > > > > > > > > internal dma-fences tha do not obey the cros-driver
> > > > > > > > > dma-fence protocol.
> > > > > > > > > 
> > > > > > > > > Introduce long-running completion fences in form of
> > > > > > > > > dma-fences, and add
> > > > > > > > > lockdep annotation for them. In particular:
> > > > > > > > > 
> > > > > > > > > * Do not allow waiting under any memory management locks.
> > > > > > > > > * Do not allow to attach them to a dma-resv object.
> > > > > > > > > * Introduce a new interface for adding callbacks making the
> > > > > > > > > helper adding
> > > > > > > > >    a callback sign off on that it is aware that the dma-fence 
> > > > > > > > > may not
> > > > > > > > >    complete anytime soon. Typically this will be the
> > > > > > > > > scheduler chaining
> > > > > > > > >    a new long-running fence on another one.
> > > > > > > > 
> > > > > > > > Well that's pretty much what I tried before:
> > > > > > > > https://lwn.net/Articles/893704/
> > > > > > > > 
> > > > 
> > > > I don't think this is quite the same; this explicitly enforces that we don't
> > > > break the dma-fence rules (in the path of memory allocations, exported in
> > > > any way). Essentially this is just a SW sync point reusing the dma-fence
> > > > infrastructure for signaling / callbacks. I believe your series tried to
> > > > export these fences to user space (admittedly I haven't fully read your
> > > > series).
> > > > 
> > > > In this use case we essentially just want to flow control the ring via
> > > > the dma-scheduler + maintain a list of pending jobs so the TDR can be
> > > > used for cleanup if LR entity encounters an error. To me this seems
> > > > perfectly reasonable but I know dma-fence rules are akin to a holy war.
> > > > 
> > > > If we return NULL in run_job, now we have to be able to sink all jobs
> > > > in the backend regardless on ring space, maintain a list of jobs pending
> > > > for cleanup after errors, and write a different cleanup path as now the
> > > > TDR doesn't work. Seems very, very silly to duplicate all of this code
> > > > when the DRM scheduler provides all of this for us. Also if we go this
> > > > route, now all drivers are going to invent ways to handle LR jobs /w the
> > > > DRM scheduler.
> > > > 
> > > > This solution is pretty clear, mark the scheduler as LR, and don't
> > > > export any fences from the scheduler. If you try to export these fences
> > > > a blow up happens.
> > > 
> > > The problem is if you mix things up. Like for resets you need all the
> > > schedulers on an engine/set-of-engines to quiesce or things get
> > > potentially hilarious. If you now have a scheduler in forever limbo, the
> > > dma_fence guarantees are right out the window.
> > > 
> > 
> > Right, a GT reset on Xe is:
> > 
> > Stop all schedulers
> > Do a reset
> > Ban any schedulers which we think caused the GT reset
> > Resubmit all schedulers which we think were good
> > Restart all schedulers
> > 
> > None of this flow depends on LR dma-fences, all of this uses the DRM
> > sched infrastructure and work very well compared to the i915. Rewriting
> > all this with a driver specific implementation is what we are trying to
> > avoid.
> > 
> > Similarly if LR entity hangs on its own (not a GT reset, rather the
> > firmware does the reset for us) we use all the DRM scheduler
> > infrastructure to handle this. Again this works rather well...
> 
> Yeah this is why I don't think duplicating everything that long-running
> jobs need makes any sense. iow I agree with you.
> 

Glad we agree.

> > > But the issue you're having is fairly specific if it's just about
> > > ringspace. 

Re: [PATCH v4 1/6] drm/msm: Add MSM-specific DSC helper methods

2023-04-05 Thread Dmitry Baryshkov

On 06/04/2023 02:34, Abhinav Kumar wrote:



On 4/5/2023 12:26 PM, Dmitry Baryshkov wrote:

On 05/04/2023 03:41, Jessica Zhang wrote:

Introduce MSM-specific DSC helper methods, as some calculations are
common between DP and DSC.

Changes in v2:
- Moved files up to msm/ directory
- Dropped get_comp_ratio() helper
- Used drm_int2fixp() to convert to integers to fp
- Style changes to improve readability
- Dropped unused bpp variable in msm_dsc_get_dce_bytes_per_line()
- Changed msm_dsc_get_slice_per_intf() to a static inline method
- Dropped last division step of msm_dsc_get_pclk_per_line() and changed
   method name accordingly
- Changed DSC_BPP macro to drm_dsc_get_bpp_int() helper method
- Fixed some math issues caused by passing in incorrect types to
   drm_fixed methods in get_bytes_per_soft_slice()

Changes in v3:
- Dropped src_bpp parameter from all methods -- src_bpp can be
   calculated as dsc->bits_per_component * 3
- Dropped intf_width parameter from get_bytes_per_soft_slice()
- Moved dsc->bits_per_component to numerator calculation in
   get_bytes_per_soft_slice()
- Renamed msm_dsc_get_uncompressed_pclk_per_line to
   *_get_uncompressed_pclk_per_intf()
- Removed dsc->slice_width check from
   msm_dsc_get_uncompressed_pclk_per_intf()
- Made get_bytes_per_soft_slice() a public method (this will be called
   later to help calculate DP pclk params)
- Added documentation in comments
- Moved extra_eol_bytes math out of msm_dsc_get_eol_byte_num() and
   renamed msm_dsc_get_eol_byte_num to *_get_bytes_per_intf.

Changes in v4:
- Changed msm_dsc_get_uncompressed_pclk_per_intf to
   msm_dsc_get_pclk_per_intf

Signed-off-by: Jessica Zhang 
---
  drivers/gpu/drm/msm/Makefile |  1 +
  drivers/gpu/drm/msm/msm_dsc_helper.c | 47 
  drivers/gpu/drm/msm/msm_dsc_helper.h | 70 


  3 files changed, 118 insertions(+)

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 7274c41228ed..b814fc80e2d5 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -94,6 +94,7 @@ msm-y += \
  msm_atomic_tracepoints.o \
  msm_debugfs.o \
  msm_drv.o \
+    msm_dsc_helper.o \
  msm_fb.o \
  msm_fence.o \
  msm_gem.o \
diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.c 
b/drivers/gpu/drm/msm/msm_dsc_helper.c

new file mode 100644
index ..0539221eb09d
--- /dev/null
+++ b/drivers/gpu/drm/msm/msm_dsc_helper.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights 
reserved

+ */
+
+#include 
+#include 
+#include 
+
+#include "msm_drv.h"
+#include "msm_dsc_helper.h"
+
+s64 get_bytes_per_soft_slice(struct drm_dsc_config *dsc)
+{
+    int bpp = msm_dsc_get_bpp_int(dsc);
+    s64 numerator_fp, denominator_fp;
+    s64 comp_ratio_fp = 
drm_fixp_from_fraction(dsc->bits_per_component * 3, bpp);

+
+    numerator_fp = drm_int2fixp(dsc->slice_width * 3 * 
dsc->bits_per_component);

+    denominator_fp = drm_fixp_mul(comp_ratio_fp, drm_int2fixp(8));
+
+    return drm_fixp_div(numerator_fp, denominator_fp);


If we remove 3 * dsc->bits_per_components from both numerator and 
denominator, this whole function seems to be as simple as 
DIV_ROUND_UP(dsc->slice_width * bpp, 8)


Or, if you prefer FP math, drm_fixp_from_fraction(dsc->slice_width * 
bpp, 8).




True, and that's not really surprising, because bytes_per_soft_slice
eventually reduces to (dsc->slice_width * bpp) / 8


I also thought about it after you mentioned that they will cancel out,
and wondered why the downstream code and our programming guide were
doing it this way.


So I thought about it a bit more, and the reason, from what I can see,
is that it shows the compression ratio was factored into the math while
calculating it. If we just go with (dsc->slice_width * bpp) / 8, it does
not really show that this is actually compressed bytes per slice, which
led to the earlier confusion that it was the uncompressed pclk when it
actually was compressed.


I'd say we can name that dsc_bpp or compressed_bpp instead of just
bpp, this way showing that it is a compressed pclk. But please don't
complicate the math.
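
For reference, the cancellation works out to a one-liner (just a sketch
reusing msm_dsc_get_bpp_int() from this patch, not a replacement for it):

/*
 * comp_ratio           = (3 * bits_per_component) / bpp
 * bytes_per_soft_slice = (slice_width * 3 * bits_per_component) /
 *                        (comp_ratio * 8)
 *                      = (slice_width * bpp) / 8
 */
static inline u32 bytes_per_soft_slice_sketch(struct drm_dsc_config *dsc)
{
	int bpp = msm_dsc_get_bpp_int(dsc);

	return DIV_ROUND_UP(dsc->slice_width * bpp, 8);
}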





+}
+
+u32 msm_dsc_get_bytes_per_intf(struct drm_dsc_config *dsc, int 
intf_width)

+{
+    u32 bytes_per_soft_slice, bytes_per_intf;
+    s64 bytes_per_soft_slice_fp;
+    int slice_per_intf = msm_dsc_get_slice_per_intf(dsc, intf_width);
+
+    bytes_per_soft_slice_fp = get_bytes_per_soft_slice(dsc);
+    bytes_per_soft_slice = drm_fixp2int_ceil(bytes_per_soft_slice_fp);
+
+    bytes_per_intf = bytes_per_soft_slice * slice_per_intf;
+
+    return bytes_per_intf;
+}
+
+int msm_dsc_get_pclk_per_intf(struct drm_dsc_config *dsc)
+{
+    s64 data_width;
+
+    data_width = drm_fixp_mul(drm_int2fixp(dsc->slice_count),
+    get_bytes_per_soft_slice(dsc));


And this is then DIV_ROUND_UP(dsc->slice_width * dsc->slice_count * 
bpp, 8)



+
+    return drm_fixp2int_ceil(data_width);
+}
diff --git 

Re: [PATCH v4 1/6] drm/msm: Add MSM-specific DSC helper methods

2023-04-05 Thread Abhinav Kumar




On 4/5/2023 12:26 PM, Dmitry Baryshkov wrote:

On 05/04/2023 03:41, Jessica Zhang wrote:

Introduce MSM-specific DSC helper methods, as some calculations are
common between DP and DSC.

Changes in v2:
- Moved files up to msm/ directory
- Dropped get_comp_ratio() helper
- Used drm_int2fixp() to convert to integers to fp
- Style changes to improve readability
- Dropped unused bpp variable in msm_dsc_get_dce_bytes_per_line()
- Changed msm_dsc_get_slice_per_intf() to a static inline method
- Dropped last division step of msm_dsc_get_pclk_per_line() and changed
   method name accordingly
- Changed DSC_BPP macro to drm_dsc_get_bpp_int() helper method
- Fixed some math issues caused by passing in incorrect types to
   drm_fixed methods in get_bytes_per_soft_slice()

Changes in v3:
- Dropped src_bpp parameter from all methods -- src_bpp can be
   calculated as dsc->bits_per_component * 3
- Dropped intf_width parameter from get_bytes_per_soft_slice()
- Moved dsc->bits_per_component to numerator calculation in
   get_bytes_per_soft_slice()
- Renamed msm_dsc_get_uncompressed_pclk_per_line to
   *_get_uncompressed_pclk_per_intf()
- Removed dsc->slice_width check from
   msm_dsc_get_uncompressed_pclk_per_intf()
- Made get_bytes_per_soft_slice() a public method (this will be called
   later to help calculate DP pclk params)
- Added documentation in comments
- Moved extra_eol_bytes math out of msm_dsc_get_eol_byte_num() and
   renamed msm_dsc_get_eol_byte_num to *_get_bytes_per_intf.

Changes in v4:
- Changed msm_dsc_get_uncompressed_pclk_per_intf to
   msm_dsc_get_pclk_per_intf

Signed-off-by: Jessica Zhang 
---
  drivers/gpu/drm/msm/Makefile |  1 +
  drivers/gpu/drm/msm/msm_dsc_helper.c | 47 
  drivers/gpu/drm/msm/msm_dsc_helper.h | 70 


  3 files changed, 118 insertions(+)

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 7274c41228ed..b814fc80e2d5 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -94,6 +94,7 @@ msm-y += \
  msm_atomic_tracepoints.o \
  msm_debugfs.o \
  msm_drv.o \
+    msm_dsc_helper.o \
  msm_fb.o \
  msm_fence.o \
  msm_gem.o \
diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.c 
b/drivers/gpu/drm/msm/msm_dsc_helper.c

new file mode 100644
index ..0539221eb09d
--- /dev/null
+++ b/drivers/gpu/drm/msm/msm_dsc_helper.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights 
reserved

+ */
+
+#include 
+#include 
+#include 
+
+#include "msm_drv.h"
+#include "msm_dsc_helper.h"
+
+s64 get_bytes_per_soft_slice(struct drm_dsc_config *dsc)
+{
+    int bpp = msm_dsc_get_bpp_int(dsc);
+    s64 numerator_fp, denominator_fp;
+    s64 comp_ratio_fp = 
drm_fixp_from_fraction(dsc->bits_per_component * 3, bpp);

+
+    numerator_fp = drm_int2fixp(dsc->slice_width * 3 * 
dsc->bits_per_component);

+    denominator_fp = drm_fixp_mul(comp_ratio_fp, drm_int2fixp(8));
+
+    return drm_fixp_div(numerator_fp, denominator_fp);


If we remove 3 * dsc->bits_per_components from both numerator and 
denominator, this whole function seems to be as simple as 
DIV_ROUND_UP(dsc->slice_width * bpp, 8)


Or, if you prefer FP math, drm_fixp_from_fraction(dsc->slice_width * 
bpp, 8).




True, and that's not really surprising, because bytes_per_soft_slice
eventually reduces to (dsc->slice_width * bpp) / 8


I also thought about it after you mentioned that they will cancel out,
and wondered why the downstream code and our programming guide were
doing it this way.


So I thought about it a bit more, and the reason, from what I can see,
is that it shows the compression ratio was factored into the math while
calculating it. If we just go with (dsc->slice_width * bpp) / 8, it does
not really show that this is actually compressed bytes per slice, which
led to the earlier confusion that it was the uncompressed pclk when it
actually was compressed.



+}
+
+u32 msm_dsc_get_bytes_per_intf(struct drm_dsc_config *dsc, int 
intf_width)

+{
+    u32 bytes_per_soft_slice, bytes_per_intf;
+    s64 bytes_per_soft_slice_fp;
+    int slice_per_intf = msm_dsc_get_slice_per_intf(dsc, intf_width);
+
+    bytes_per_soft_slice_fp = get_bytes_per_soft_slice(dsc);
+    bytes_per_soft_slice = drm_fixp2int_ceil(bytes_per_soft_slice_fp);
+
+    bytes_per_intf = bytes_per_soft_slice * slice_per_intf;
+
+    return bytes_per_intf;
+}
+
+int msm_dsc_get_pclk_per_intf(struct drm_dsc_config *dsc)
+{
+    s64 data_width;
+
+    data_width = drm_fixp_mul(drm_int2fixp(dsc->slice_count),
+    get_bytes_per_soft_slice(dsc));


And this is then DIV_ROUND_UP(dsc->slice_width * dsc->slice_count * bpp, 8)


+
+    return drm_fixp2int_ceil(data_width);
+}
diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.h 
b/drivers/gpu/drm/msm/msm_dsc_helper.h

new file mode 100644
index ..31116a31090f
--- /dev/null
+++ b/drivers/gpu/drm/msm/msm_dsc_helper.h
@@ -0,0 +1,70 @@
+/* 

RE: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation

2023-04-05 Thread Yang, Fei
>Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO 
>creation
>
>On 04/04/2023 19:04, Yang, Fei wrote:
>>> Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set
>>> cache at BO creation
>>>
>>> On 01/04/2023 09:38, fei.y...@intel.com wrote:
 From: Fei Yang 

 To comply with the design that buffer objects shall have immutable
 cache setting throughout their life cycle, {set, get}_caching ioctl's
 are no longer supported from MTL onward. With that change caching
 policy can only be set at object creation time. The current code
 applies a default (platform dependent) cache setting for all objects.
 However this is not optimal for performance tuning. The patch
 extends the existing gem_create uAPI to let user set PAT index for
 the object at creation time.
 The new extension is platform independent, so UMD's can switch to
 using this extension for older platforms as well, while {set,
 get}_caching are still supported on these legacy platforms for
 compatibility reason.

 Cc: Chris Wilson 
 Cc: Matt Roper 
 Signed-off-by: Fei Yang 
 Reviewed-by: Andi Shyti 
>>>
>>> Just like the protected content uAPI, there is no way for userspace
>>> to tell this feature is available other than trying to use it.
>>>
>>> Given the issues with protected content, is it not something we would want to 
>>> add?
>> Sorry I'm not aware of the issues with protected content, could you 
>> elaborate?
>> There was a long discussion on teams uAPI channel, could you comment
>> there if any concerns?
>>
>> https://teams.microsoft.com/l/message/19:f1767bda6734476ba0a9c7d147b92
>> 8d1@thread.skype/1675860924675?tenantId=46c98d88-e344-4ed4-8496-4ed771
>> 2e255d=379f3ae1-d138-4205-bb65-d4c7d38cb481=16
>> 75860924675=GSE%20OSGC=i915%20uAPI%20changes
>> tedTime=1675860924675=false
>>
>> Thanks,
>> -Fei
>
>
> We wanted to have a getparam to detect protected support and were told
> to detect it by trying to create a context with it.
>
> Now it appears trying to create a protected context can block for several
> seconds.
>
> Since we have to report capabilities to the user even before it creates
> protected contexts, any app is at risk of blocking.

Can we detect this capability by creating a buffer object? This extension is
not blocking, it just provides a way to set caching policy, and should complete
very fast. There is an IGT test I created for this extension (not merged yet),
please take a look at http://intel-gfx-pw.fi.intel.com/series/19149/

I'm not familiar with getparam, will take a look there as well. But I think it
would be easier to just create an object.
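
For example, something along these lines from userspace (hypothetical
probe sketch; it assumes the extension name and struct added by this
series, and the handle should be closed again if creation succeeds):

	struct drm_i915_gem_create_ext_set_pat set_pat = {
		.base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
		.pat_index = 0,
	};
	struct drm_i915_gem_create_ext create = {
		.size = 4096,
		.extensions = (uintptr_t)&set_pat,
	};
	/* If the ioctl fails, assume the extension is not supported. */
	bool supported = drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT,
				  &create) == 0;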

-Fei

>-Lionel
>
>
>>
>>> Thanks,
>>>
>>> -Lionel
>>>
>>>
 ---
drivers/gpu/drm/i915/gem/i915_gem_create.c | 33 
include/uapi/drm/i915_drm.h| 36 ++
tools/include/uapi/drm/i915_drm.h  | 36 ++
3 files changed, 105 insertions(+)



Re: [Freedreno] [PATCH v4 1/6] drm/msm: Add MSM-specific DSC helper methods

2023-04-05 Thread Dmitry Baryshkov

On 06/04/2023 01:02, Jessica Zhang wrote:



On 4/5/2023 12:26 PM, Dmitry Baryshkov wrote:

On 05/04/2023 03:41, Jessica Zhang wrote:

Introduce MSM-specific DSC helper methods, as some calculations are
common between DP and DSC.

Changes in v2:
- Moved files up to msm/ directory
- Dropped get_comp_ratio() helper
- Used drm_int2fixp() to convert to integers to fp
- Style changes to improve readability
- Dropped unused bpp variable in msm_dsc_get_dce_bytes_per_line()
- Changed msm_dsc_get_slice_per_intf() to a static inline method
- Dropped last division step of msm_dsc_get_pclk_per_line() and changed
   method name accordingly
- Changed DSC_BPP macro to drm_dsc_get_bpp_int() helper method
- Fixed some math issues caused by passing in incorrect types to
   drm_fixed methods in get_bytes_per_soft_slice()

Changes in v3:
- Dropped src_bpp parameter from all methods -- src_bpp can be
   calculated as dsc->bits_per_component * 3
- Dropped intf_width parameter from get_bytes_per_soft_slice()
- Moved dsc->bits_per_component to numerator calculation in
   get_bytes_per_soft_slice()
- Renamed msm_dsc_get_uncompressed_pclk_per_line to
   *_get_uncompressed_pclk_per_intf()
- Removed dsc->slice_width check from
   msm_dsc_get_uncompressed_pclk_per_intf()
- Made get_bytes_per_soft_slice() a public method (this will be called
   later to help calculate DP pclk params)
- Added documentation in comments
- Moved extra_eol_bytes math out of msm_dsc_get_eol_byte_num() and
   renamed msm_dsc_get_eol_byte_num to *_get_bytes_per_intf.

Changes in v4:
- Changed msm_dsc_get_uncompressed_pclk_per_intf to
   msm_dsc_get_pclk_per_intf

Signed-off-by: Jessica Zhang 
---
  drivers/gpu/drm/msm/Makefile |  1 +
  drivers/gpu/drm/msm/msm_dsc_helper.c | 47 
  drivers/gpu/drm/msm/msm_dsc_helper.h | 70 


  3 files changed, 118 insertions(+)

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 7274c41228ed..b814fc80e2d5 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -94,6 +94,7 @@ msm-y += \
  msm_atomic_tracepoints.o \
  msm_debugfs.o \
  msm_drv.o \
+    msm_dsc_helper.o \
  msm_fb.o \
  msm_fence.o \
  msm_gem.o \
diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.c 
b/drivers/gpu/drm/msm/msm_dsc_helper.c

new file mode 100644
index ..0539221eb09d
--- /dev/null
+++ b/drivers/gpu/drm/msm/msm_dsc_helper.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights 
reserved

+ */
+
+#include 
+#include 
+#include 
+
+#include "msm_drv.h"
+#include "msm_dsc_helper.h"
+
+s64 get_bytes_per_soft_slice(struct drm_dsc_config *dsc)
+{
+    int bpp = msm_dsc_get_bpp_int(dsc);
+    s64 numerator_fp, denominator_fp;
+    s64 comp_ratio_fp = 
drm_fixp_from_fraction(dsc->bits_per_component * 3, bpp);

+
+    numerator_fp = drm_int2fixp(dsc->slice_width * 3 * 
dsc->bits_per_component);

+    denominator_fp = drm_fixp_mul(comp_ratio_fp, drm_int2fixp(8));
+
+    return drm_fixp_div(numerator_fp, denominator_fp);


If we remove 3 * dsc->bits_per_components from both numerator and 
denominator, this whole function seems to be as simple as 
DIV_ROUND_UP(dsc->slice_width * bpp, 8)


Or, if you prefer FP math, drm_fixp_from_fraction(dsc->slice_width * 
bpp, 8).


Hi Dmitry,

Sounds good.




+}
+
+u32 msm_dsc_get_bytes_per_intf(struct drm_dsc_config *dsc, int 
intf_width)

+{
+    u32 bytes_per_soft_slice, bytes_per_intf;
+    s64 bytes_per_soft_slice_fp;
+    int slice_per_intf = msm_dsc_get_slice_per_intf(dsc, intf_width);
+
+    bytes_per_soft_slice_fp = get_bytes_per_soft_slice(dsc);
+    bytes_per_soft_slice = drm_fixp2int_ceil(bytes_per_soft_slice_fp);
+
+    bytes_per_intf = bytes_per_soft_slice * slice_per_intf;
+
+    return bytes_per_intf;
+}
+
+int msm_dsc_get_pclk_per_intf(struct drm_dsc_config *dsc)
+{
+    s64 data_width;
+
+    data_width = drm_fixp_mul(drm_int2fixp(dsc->slice_count),
+    get_bytes_per_soft_slice(dsc));


And this is then DIV_ROUND_UP(dsc->slice_width * dsc->slice_count * 
bpp, 8)


I would prefer to keep the FP math/get_bytes_per_soft_slice() call here 
and leave the ceil() until the end.


In your code, you are calling ceil right after drm_fixp_mul. So, there
is no difference.







+
+    return drm_fixp2int_ceil(data_width);
+}
diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.h 
b/drivers/gpu/drm/msm/msm_dsc_helper.h

new file mode 100644
index ..31116a31090f
--- /dev/null
+++ b/drivers/gpu/drm/msm/msm_dsc_helper.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights 
reserved

+ */
+
+#ifndef MSM_DSC_HELPER_H_
+#define MSM_DSC_HELPER_H_
+
+#include 
+#include 
+
+/*
+ * Helper methods for MSM specific DSC calculations that are common 
between timing engine,

+ * DSI, and DP.
+ */
+
+/**
+ * 

Re: [Freedreno] [PATCH v4 1/6] drm/msm: Add MSM-specific DSC helper methods

2023-04-05 Thread Jessica Zhang




On 4/5/2023 12:26 PM, Dmitry Baryshkov wrote:

On 05/04/2023 03:41, Jessica Zhang wrote:

Introduce MSM-specific DSC helper methods, as some calculations are
common between DP and DSC.

Changes in v2:
- Moved files up to msm/ directory
- Dropped get_comp_ratio() helper
- Used drm_int2fixp() to convert to integers to fp
- Style changes to improve readability
- Dropped unused bpp variable in msm_dsc_get_dce_bytes_per_line()
- Changed msm_dsc_get_slice_per_intf() to a static inline method
- Dropped last division step of msm_dsc_get_pclk_per_line() and changed
   method name accordingly
- Changed DSC_BPP macro to drm_dsc_get_bpp_int() helper method
- Fixed some math issues caused by passing in incorrect types to
   drm_fixed methods in get_bytes_per_soft_slice()

Changes in v3:
- Dropped src_bpp parameter from all methods -- src_bpp can be
   calculated as dsc->bits_per_component * 3
- Dropped intf_width parameter from get_bytes_per_soft_slice()
- Moved dsc->bits_per_component to numerator calculation in
   get_bytes_per_soft_slice()
- Renamed msm_dsc_get_uncompressed_pclk_per_line to
   *_get_uncompressed_pclk_per_intf()
- Removed dsc->slice_width check from
   msm_dsc_get_uncompressed_pclk_per_intf()
- Made get_bytes_per_soft_slice() a public method (this will be called
   later to help calculate DP pclk params)
- Added documentation in comments
- Moved extra_eol_bytes math out of msm_dsc_get_eol_byte_num() and
   renamed msm_dsc_get_eol_byte_num to *_get_bytes_per_intf.

Changes in v4:
- Changed msm_dsc_get_uncompressed_pclk_per_intf to
   msm_dsc_get_pclk_per_intf

Signed-off-by: Jessica Zhang 
---
  drivers/gpu/drm/msm/Makefile |  1 +
  drivers/gpu/drm/msm/msm_dsc_helper.c | 47 
  drivers/gpu/drm/msm/msm_dsc_helper.h | 70 


  3 files changed, 118 insertions(+)

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 7274c41228ed..b814fc80e2d5 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -94,6 +94,7 @@ msm-y += \
  msm_atomic_tracepoints.o \
  msm_debugfs.o \
  msm_drv.o \
+    msm_dsc_helper.o \
  msm_fb.o \
  msm_fence.o \
  msm_gem.o \
diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.c 
b/drivers/gpu/drm/msm/msm_dsc_helper.c

new file mode 100644
index ..0539221eb09d
--- /dev/null
+++ b/drivers/gpu/drm/msm/msm_dsc_helper.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights 
reserved

+ */
+
+#include 
+#include 
+#include 
+
+#include "msm_drv.h"
+#include "msm_dsc_helper.h"
+
+s64 get_bytes_per_soft_slice(struct drm_dsc_config *dsc)
+{
+    int bpp = msm_dsc_get_bpp_int(dsc);
+    s64 numerator_fp, denominator_fp;
+    s64 comp_ratio_fp = 
drm_fixp_from_fraction(dsc->bits_per_component * 3, bpp);

+
+    numerator_fp = drm_int2fixp(dsc->slice_width * 3 * 
dsc->bits_per_component);

+    denominator_fp = drm_fixp_mul(comp_ratio_fp, drm_int2fixp(8));
+
+    return drm_fixp_div(numerator_fp, denominator_fp);


If we remove 3 * dsc->bits_per_components from both numerator and 
denominator, this whole function seems to be as simple as 
DIV_ROUND_UP(dsc->slice_width * bpp, 8)


Or, if you prefer FP math, drm_fixp_from_fraction(dsc->slice_width * 
bpp, 8).


Hi Dmitry,

Sounds good.




+}
+
+u32 msm_dsc_get_bytes_per_intf(struct drm_dsc_config *dsc, int 
intf_width)

+{
+    u32 bytes_per_soft_slice, bytes_per_intf;
+    s64 bytes_per_soft_slice_fp;
+    int slice_per_intf = msm_dsc_get_slice_per_intf(dsc, intf_width);
+
+    bytes_per_soft_slice_fp = get_bytes_per_soft_slice(dsc);
+    bytes_per_soft_slice = drm_fixp2int_ceil(bytes_per_soft_slice_fp);
+
+    bytes_per_intf = bytes_per_soft_slice * slice_per_intf;
+
+    return bytes_per_intf;
+}
+
+int msm_dsc_get_pclk_per_intf(struct drm_dsc_config *dsc)
+{
+    s64 data_width;
+
+    data_width = drm_fixp_mul(drm_int2fixp(dsc->slice_count),
+    get_bytes_per_soft_slice(dsc));


And this is then DIV_ROUND_UP(dsc->slice_width * dsc->slice_count * bpp, 8)


I would prefer to keep the FP math/get_bytes_per_soft_slice() call here 
and leave the ceil() until the end.





+
+    return drm_fixp2int_ceil(data_width);
+}
diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.h 
b/drivers/gpu/drm/msm/msm_dsc_helper.h

new file mode 100644
index ..31116a31090f
--- /dev/null
+++ b/drivers/gpu/drm/msm/msm_dsc_helper.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights 
reserved

+ */
+
+#ifndef MSM_DSC_HELPER_H_
+#define MSM_DSC_HELPER_H_
+
+#include 
+#include 
+
+/*
+ * Helper methods for MSM specific DSC calculations that are common 
between timing engine,

+ * DSI, and DP.
+ */
+
+/**
+ * msm_dsc_get_bpp_int - get bits per pixel integer value
+ * @dsc: Pointer to drm dsc config struct
+ */
+static inline int 

Re: [PATCH] drm/i915/guc: Don't capture Gen8 regs on Gen12 devices

2023-04-05 Thread Matt Roper
On Wed, Apr 05, 2023 at 02:13:31PM -0700, John Harrison wrote:
> On 4/3/2023 17:34, Matt Roper wrote:
> > On Mon, Apr 03, 2023 at 02:33:34PM -0700, john.c.harri...@intel.com wrote:
> > > From: John Harrison 
> > > 
> > > A pair of pre-Gen12 registers were being included in the Gen12 capture
> > > list. GuC was rejecting those as being invalid and logging errors
> > > about them. So, stop doing it.
> > Looks like these registers existed from gen8-gen11.  With this change,
> > it looks like they also won't be included in the GuC error capture for
> > gen11 (ICL and EHL/JSL) since those platforms return xe_lpd_lists [1]
> > rather than default_lists; do we care about that?  I assume not (since
> > those platforms don't use GuC submission unless you force it with the
> > enable_guc modparam and taint your kernel), but I figured I should point
> > it out.
> Yeah, I think the code is treating Gen11 as Gen12 rather than Gen9 or its
> own thing. I hadn't spotted that before. It certainly seems incorrect.
> 
> > 
> > Reviewed-by: Matt Roper 
> > 
> > 
> > [1] Why is the main list we use called xe_lpd (i.e., the name of ADL-P's
> >  display IP)?  It doesn't seem like we're doing anything with display
> >  registers here so using display IP naming seems really confusing.
> I think because no-one has a clue what name refers to what hardware any more
> :(.
> 
> What are the official names for IP_VER 9, 11, 12.00, 12.50 and 12.55?

Yeah, the naming is a real mess.  :-(  For graphics IP, the official
terms are supposed to be:

12.00 = Xe_LP
12.10 = Xe_LP+ (basically the same as Xe_LP except for interrupts)
12.50 = Xe_HP
12.55 = Xe_HPG (it's nearly identical to Xe_HP)
12.7x = Xe_LPG

There are separate names for media, although we didn't really start
using them anywhere in the i915 until the separation of IPs started
becoming more important with MTL:

12.00 = Xe_M (or Xe_M+ for DG1, but we treat it the same in the KMD)
12.50 = Xe_XPM
12.55 = Xe_HPM
12.60 = Xe_XPM+
13.00 = Xe_LPM+

and display:

12.00 = Xe_D
13.00 = Xe_LPD (ADL-P) or Xe_HPD (DG2)
14.00 = Xe_LPD+


The pre-12 stuff predates the fancy new marketing-mandated names.  Even
though we're not using "gen" terminology going forward, those old ones
are grandfathered in, so it's still okay to refer to them as gen9,
gen11, etc.


Matt

> 
> John.
> 
> > 
> > 
> > Matt
> > 
> > > Signed-off-by: John Harrison 
> > > Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for GuC error 
> > > state capture.")
> > > Cc: Alan Previn 
> > > Cc: Umesh Nerlige Ramappa 
> > > Cc: Lucas De Marchi 
> > > Cc: John Harrison 
> > > Cc: Jani Nikula 
> > > Cc: Matt Roper 
> > > Cc: Balasubramani Vivekanandan 
> > > Cc: Daniele Ceraolo Spurio 
> > > ---
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 7 +--
> > >   1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c 
> > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > index cf49188db6a6e..e0e793167d61b 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> > > @@ -31,12 +31,14 @@
> > >   { FORCEWAKE_MT, 0,  0, "FORCEWAKE" }
> > >   #define COMMON_GEN9BASE_GLOBAL \
> > > - { GEN8_FAULT_TLB_DATA0, 0,  0, "GEN8_FAULT_TLB_DATA0" }, \
> > > - { GEN8_FAULT_TLB_DATA1, 0,  0, "GEN8_FAULT_TLB_DATA1" }, \
> > >   { ERROR_GEN6,   0,  0, "ERROR_GEN6" }, \
> > >   { DONE_REG, 0,  0, "DONE_REG" }, \
> > >   { HSW_GTT_CACHE_EN, 0,  0, "HSW_GTT_CACHE_EN" }
> > > +#define GEN9_GLOBAL \
> > > + { GEN8_FAULT_TLB_DATA0, 0,  0, "GEN8_FAULT_TLB_DATA0" }, \
> > > + { GEN8_FAULT_TLB_DATA1, 0,  0, "GEN8_FAULT_TLB_DATA1" }
> > > +
> > >   #define COMMON_GEN12BASE_GLOBAL \
> > >   { GEN12_FAULT_TLB_DATA0,0,  0, "GEN12_FAULT_TLB_DATA0" 
> > > }, \
> > >   { GEN12_FAULT_TLB_DATA1,0,  0, "GEN12_FAULT_TLB_DATA1" 
> > > }, \
> > > @@ -142,6 +144,7 @@ static const struct __guc_mmio_reg_descr 
> > > xe_lpd_gsc_inst_regs[] = {
> > >   static const struct __guc_mmio_reg_descr default_global_regs[] = {
> > >   COMMON_BASE_GLOBAL,
> > >   COMMON_GEN9BASE_GLOBAL,
> > > + GEN9_GLOBAL,
> > >   };
> > >   static const struct __guc_mmio_reg_descr default_rc_class_regs[] = {
> > > -- 
> > > 2.39.1
> > > 
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


Re: [PATCH] drm/i915/guc: Don't capture Gen8 regs on Gen12 devices

2023-04-05 Thread John Harrison

On 4/3/2023 17:34, Matt Roper wrote:

On Mon, Apr 03, 2023 at 02:33:34PM -0700, john.c.harri...@intel.com wrote:

From: John Harrison 

A pair of pre-Gen12 registers were being included in the Gen12 capture
list. GuC was rejecting those as being invalid and logging errors
about them. So, stop doing it.

Looks like these registers existed from gen8-gen11.  With this change,
it looks like they also won't be included in the GuC error capture for
gen11 (ICL and EHL/JSL) since those platforms return xe_lpd_lists [1]
rather than default_lists; do we care about that?  I assume not (since
those platforms don't use GuC submission unless you force it with the
enable_guc modparam and taint your kernel), but I figured I should point
it out.
Yeah, I think the code is treating Gen11 as Gen12 rather than Gen9 or 
its own thing. I hadn't spotted that before. It certainly seems incorrect.




Reviewed-by: Matt Roper 


[1] Why is the main list we use called xe_lpd (i.e., the name of ADL-P's
 display IP)?  It doesn't seem like we're doing anything with display
 registers here so using display IP naming seems really confusing.
I think because no-one has a clue what name refers to what hardware any 
more :(.


What are the official names for IP_VER 9, 11, 12.00, 12.50 and 12.55?

John.




Matt


Signed-off-by: John Harrison 
Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for GuC error state 
capture.")
Cc: Alan Previn 
Cc: Umesh Nerlige Ramappa 
Cc: Lucas De Marchi 
Cc: John Harrison 
Cc: Jani Nikula 
Cc: Matt Roper 
Cc: Balasubramani Vivekanandan 
Cc: Daniele Ceraolo Spurio 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index cf49188db6a6e..e0e793167d61b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -31,12 +31,14 @@
{ FORCEWAKE_MT, 0,  0, "FORCEWAKE" }
  
  #define COMMON_GEN9BASE_GLOBAL \

-   { GEN8_FAULT_TLB_DATA0, 0,  0, "GEN8_FAULT_TLB_DATA0" }, \
-   { GEN8_FAULT_TLB_DATA1, 0,  0, "GEN8_FAULT_TLB_DATA1" }, \
{ ERROR_GEN6,   0,  0, "ERROR_GEN6" }, \
{ DONE_REG, 0,  0, "DONE_REG" }, \
{ HSW_GTT_CACHE_EN, 0,  0, "HSW_GTT_CACHE_EN" }
  
+#define GEN9_GLOBAL \

+   { GEN8_FAULT_TLB_DATA0, 0,  0, "GEN8_FAULT_TLB_DATA0" }, \
+   { GEN8_FAULT_TLB_DATA1, 0,  0, "GEN8_FAULT_TLB_DATA1" }
+
  #define COMMON_GEN12BASE_GLOBAL \
{ GEN12_FAULT_TLB_DATA0,0,  0, "GEN12_FAULT_TLB_DATA0" }, \
{ GEN12_FAULT_TLB_DATA1,0,  0, "GEN12_FAULT_TLB_DATA1" }, \
@@ -142,6 +144,7 @@ static const struct __guc_mmio_reg_descr 
xe_lpd_gsc_inst_regs[] = {
  static const struct __guc_mmio_reg_descr default_global_regs[] = {
COMMON_BASE_GLOBAL,
COMMON_GEN9BASE_GLOBAL,
+   GEN9_GLOBAL,
  };
  
  static const struct __guc_mmio_reg_descr default_rc_class_regs[] = {

--
2.39.1





Re: [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var

2023-04-05 Thread Javier Martinez Canillas
Daniel Vetter  writes:

> On Wed, Apr 05, 2023 at 07:42:08PM +0200, Javier Martinez Canillas wrote:

[...]

>> >> Ah, your patch adds it after that indeed. Please ignore my comment then.
>> >
>> > So rb: you?
>> >
>> 
>> Yes, I already provided it in my previous email and has been picked by
>> patchwork. I could do again but probably will confuse dim when applying.
>
> Yeah just wanted to confirm I cleared up all your questions. Merged the
> entire series to drm-misc-next, thanks for the review.
>

You are welcome.

>> The only patch from your series that is missing an {r,a}b is #1 right now:
>> 
>> https://patchwork.kernel.org/project/dri-devel/list/?series=736966=both
>
> That's a different one :-)
>

Oh, sorry about that. Somehow I switched threads in my head in the middle
of the response.

> I'll respin with your comments and then let you duke it out about
> patch 1.
> -Daniel
>

Perfect, thanks! It would be good to finally have that issue fixed.

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [Regression] drm/scheduler: track GPU active time per entity

2023-04-05 Thread Luben Tuikov
On 2023-04-05 13:44, Lucas Stach wrote:
> Hi Luben,
> 
> Am Dienstag, dem 04.04.2023 um 00:31 -0400 schrieb Luben Tuikov:
>> On 2023-03-28 04:54, Lucas Stach wrote:
>>> Hi Danilo,
>>>
>>> Am Dienstag, dem 28.03.2023 um 02:57 +0200 schrieb Danilo Krummrich:
 Hi all,

 Commit df622729ddbf ("drm/scheduler: track GPU active time per entity") 
 tries to track the accumulated time that a job was active on the GPU 
 writing it to the entity through which the job was deployed to the 
 scheduler originally. This is done within drm_sched_get_cleanup_job() 
 which fetches a job from the schedulers pending_list.

 Doing this can result in a race condition where the entity is already 
 freed, but the entity's newly added elapsed_ns field is still accessed 
 once the job is fetched from the pending_list.

 After drm_sched_entity_destroy() being called it should be safe to free 
 the structure that embeds the entity. However, a job originally handed 
 over to the scheduler by this entity might still reside in the 
 schedulers pending_list for cleanup after drm_sched_entity_destroy() 
 already being called and the entity being freed. Hence, we can run into 
 a UAF.

>>> Sorry about that, I clearly didn't properly consider this case.
>>>
 In my case it happened that a job, as explained above, was just picked 
 from the schedulers pending_list after the entity was freed due to the 
 client application exiting. Meanwhile this freed up memory was already 
 allocated for a subsequent client applications job structure again. 
 Hence, the new jobs memory got corrupted. Luckily, I was able to 
 reproduce the same corruption over and over again by just using 
 deqp-runner to run a specific set of VK test cases in parallel.

 Fixing this issue doesn't seem to be very straightforward though (unless 
 I miss something), which is why I'm writing this mail instead of sending 
 a fix directly.

 Spontaneously, I see three options to fix it:

 1. Rather than embedding the entity into driver specific structures 
 (e.g. tied to file_priv) we could allocate the entity separately and 
 reference count it, such that it's only freed up once all jobs that were 
 deployed through this entity are fetched from the schedulers pending list.

>>> My vote is on this or something in similar vain for the long term. I
>>> have some hope to be able to add a GPU scheduling algorithm with a bit
>>> more fairness than the current one sometime in the future, which
>>> requires execution time tracking on the entities.
>>
>> Danilo,
>>
>> Using kref is preferable, i.e. option 1 above.
>>
>> Lucas, can you shed some light on,
>>
>> 1. In what way the current FIFO scheduling is unfair, and
>> 2. shed some details on this "scheduling algorithm with a bit
>> more fairness than the current one"? 
> 
> I don't have a specific implementation in mind yet. However the current
> FIFO algorithm can be very unfair if you have a sparse workload compete
> with one that generates a lot of jobs without any throttling aside from
> the entity queue length.

Ah, that's interesting, let's see, a "sparse workload compete with one that
generates a lot of jobs", so basically we have a sparse workload competing
with a dense workload. So we can represent this with two entities, A and B,
whose jobs we'll also label A and B.
So if we let A be the sparse workload and B the dense workload,
we have this, wlog,

  First/oldest job, .., latter/new jobs.
Subm: A, B, B, B, B, B, A, B, B, B, B, B, A, B, B, B, B, B, A, ...
Time: t0,t1,t2,t3,t4,t5,t6,t7,t8,t9, .

The current FIFO algorithm would prefer to execute those jobs
in order of submission, i.e. oldest-ready-first job. Assume
that all jobs are ready. Then we'll execute them in order.
This is desirable and fair. We want to execute the jobs
in the order they were submitted, given also that they are
ready to be executed. So perhaps we want to execute them like this:

  First/oldest job, .., latter/new jobs.
Subm: A, B, B, B, B, B, A, B, B, B, B, B, A, B, B, B, B, B, A, ...
Time: t0,t1,t2,t3,t4,t5,t6,t7,t8,t9, 
Exec:  A, B, B, B, B, B, A, B, B, B, B, B, A, B, B, B, B, B, A, ...   

Any other ordering would starve either A or B. If we executed the 2nd A
job at t6 or t7, then that would starve the 3rd/4th job in B, since the 2nd A 
job
arrives at the same time as that of the 3rd B job, at time t6.
The time t3-t0 is some delta > 0, some initial scheduler start-up time.

IOW, we don't want to delay a job any more than it should wait: the oldest
job, which is also ready, should execute next, so that we're fair in how
it executes in real time. We cannot boot B at t6 so that we execute A
just because A is sparse but just arrived.
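
In pseudocode, the pick policy described above is simply oldest-ready-first
(an illustrative sketch, not the actual drm_sched code):

struct job {
	u64  submit_ts; /* submission time */
	bool ready;
};

static struct job *pick_next(struct job *jobs, int n)
{
	struct job *oldest = NULL;
	int i;

	for (i = 0; i < n; i++) {
		if (!jobs[i].ready)
			continue;
		if (!oldest || jobs[i].submit_ts < oldest->submit_ts)
			oldest = &jobs[i];
	}

	/* The oldest ready job runs next, no matter which entity owns it. */
	return oldest;
}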

From A's point of view, it shouldn't expect its job execution time 

Re: [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 07:42:08PM +0200, Javier Martinez Canillas wrote:
> Daniel Vetter  writes:
> 
> > On Wed, Apr 05, 2023 at 06:27:17PM +0200, Javier Martinez Canillas wrote:
> >> Daniel Vetter  writes:
> 
> [...]
> 
> >> >
> >> > The __fill_var is after this. I'm honestly not sure what the exact
> >> 
> >> Ah, your patch adds it after that indeed. Please ignore my comment then.
> >
> > So rb: you?
> >
> 
> Yes, I already provided it in my previous email and has been picked by
> patchwork. I could do again but probably will confuse dim when applying.

Yeah just wanted to confirm I cleared up all your questions. Merged the
entire series to drm-misc-next, thanks for the review.

> The only patch from your series that is missing an {r,a}b is #1 right now:
> 
> https://patchwork.kernel.org/project/dri-devel/list/?series=736966=both

That's a different one :-)

I'll respin with your comments and then let you duke it out about
patch 1.
-Daniel

> 
> [...]
> 
> >> > What I'm wondering now is whether too small x/yres won't lead to problems
> >> > of some sorts ... For multi-screen we set the virtual size to be big
> >> > enough for all crtc, and then just set x/yres to be the smallest output.
> >> > That way fbcon knows to only draw as much as is visible on all screens.
> >> > But if you then pan that too much, the bigger screens might not have a 
> >> > big
> >> > enough buffer anymore and things fail (but shouldn't).
> >> >
> >> > Not sure how to fix that tbh.
> >> 
> >> Would this be a problem in practice?
> >
> > I'm frankly not sure. You'd get a black screen for fbcon/fbdev across all
> > outputs, but only if you have userspace doing this intentionally.
> >
> > In a way it's just another artifact of the drm fbdev emulation not using
> > ATOMIC_TEST_ONLY in the various places where it should, and so doesn't
> > really know whether a configuration change will work out.
> >
> > We already have this in obscure mulit-monitor cases where adding another
> > screen kills fbcon, because the display hw is running out of fifo or
> > clocks or whatever, and because the drm fbdev code doesn't check but just
> > blindly commits the entire thing as an atomic commit, the overall commit
> > fails.
> >
> > This worked "better" with legacy kms because there we commit per-crtc, so
> > if any specific crtc runs into a limit check, only that one fails to light
> > up.
> >
> > Imo given that no one cared enough yet to write up atomic TEST_ONLY
> > support for fbdev emulation I think we can continue to just ignore this
> > problem.
> >
> 
> Agreed. If that ends being a problem for people in practice then I guess
> someone can type atomic TEST_ONLY support for the fbdev emulation layer.
> 
> > What should not happen is that fbcon code blows up drawing out of bounds
> > or something like that, resulting in a kernel crash. So from that pov I
> > think it's "safe" :-)
> 
> Great. Thanks a lot for your explanations.
> 
> > -Daniel
> 
> -- 
> Best regards,
> 
> Javier Martinez Canillas
> Core Platforms
> Red Hat
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v4 4/6] drm/msm/dsi: Use MSM and DRM DSC helper methods

2023-04-05 Thread Jessica Zhang




On 4/5/2023 12:27 PM, Dmitry Baryshkov wrote:

On 05/04/2023 03:41, Jessica Zhang wrote:

Use MSM and DRM DSC helper methods to configure DSC for DSI.

Changes in V2:
- *_calculate_initial_scale_value --> *_set_initial_scale_value
- Split pkt_per_line and eol_byte_num changes to a separate patch
- Moved pclk_per_line calculation to hdisplay adjustment in `if (dsc)`
   block of dsi_update_dsc_timing()

Changes in v3:
- Split pclk_per_intf calculation into a separate patch
- Added slice_width check to dsi_timing_setup
- Used MSM DSC helper to calculate total_bytes_per_intf

Signed-off-by: Jessica Zhang 
---
  drivers/gpu/drm/msm/dsi/dsi_host.c | 13 ++---
  1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c 
b/drivers/gpu/drm/msm/dsi/dsi_host.c

index 74d38f90398a..6a6218a9655f 100644
--- a/drivers/gpu/drm/msm/dsi/dsi_host.c
+++ b/drivers/gpu/drm/msm/dsi/dsi_host.c
@@ -28,6 +28,7 @@
  #include "dsi.xml.h"
  #include "sfpb.xml.h"
  #include "dsi_cfg.h"
+#include "msm_dsc_helper.h"
  #include "msm_kms.h"
  #include "msm_gem.h"
  #include "phy/dsi_phy.h"
@@ -848,7 +849,7 @@ static void dsi_update_dsc_timing(struct 
msm_dsi_host *msm_host, bool is_cmd_mod

  /* first calculate dsc parameters and then program
   * compress mode registers
   */
-    slice_per_intf = DIV_ROUND_UP(hdisplay, dsc->slice_width);
+    slice_per_intf = msm_dsc_get_slice_per_intf(dsc, hdisplay);
  /*
   * If slice_count is greater than slice_per_intf
@@ -858,7 +859,7 @@ static void dsi_update_dsc_timing(struct 
msm_dsi_host *msm_host, bool is_cmd_mod

  if (dsc->slice_count > slice_per_intf)
  dsc->slice_count = 1;
-    total_bytes_per_intf = dsc->slice_chunk_size * slice_per_intf;
+    total_bytes_per_intf = msm_dsc_get_bytes_per_intf(dsc, hdisplay);
  eol_byte_num = total_bytes_per_intf % 3;
  pkt_per_line = slice_per_intf / dsc->slice_count;
@@ -936,6 +937,12 @@ static void dsi_timing_setup(struct msm_dsi_host 
*msm_host, bool is_bonded_dsi)

  return;
  }
+    if (!dsc->slice_width || (mode->hdisplay < dsc->slice_width)) {
+    pr_err("DSI: invalid slice width %d (pic_width: %d)\n",
+   dsc->slice_width, mode->hdisplay);
+    return;
+    }


This is not the "use of MSM and DRM DSC helper methods" and thus should 
be moved to a separate patch.


Hi Dmitry,

Acked.

Thanks,

Jessica Zhang




+
  dsc->pic_width = mode->hdisplay;
  dsc->pic_height = mode->vdisplay;
  DBG("Mode %dx%d\n", dsc->pic_width, dsc->pic_height);
@@ -1759,7 +1766,7 @@ static int dsi_populate_dsc_params(struct 
msm_dsi_host *msm_host, struct drm_dsc

  return ret;
  }
-    dsc->initial_scale_value = 32;
+    drm_dsc_set_initial_scale_value(dsc);
  dsc->line_buf_depth = dsc->bits_per_component + 1;
  return drm_dsc_compute_rc_parameters(dsc);



--
With best wishes
Dmitry



Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation

2023-04-05 Thread Jordan Justen
On 2023-04-05 00:45:24, Lionel Landwerlin wrote:
> On 04/04/2023 19:04, Yang, Fei wrote:
> >> Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at 
> >> BO creation
> >>
> >> Just like the protected content uAPI, there is no way for userspace to tell
> >> this feature is available other than trying using it.
> >>
> >> Given the issues with protected content, is it not thing we could want to 
> >> add?
> > Sorry I'm not aware of the issues with protected content, could you 
> > elaborate?
> > There was a long discussion on teams uAPI channel, could you comment there 
> > if
> > any concerns?
> >
> 
> We wanted to have a getparam to detect protected support and were told 
> to detect it by trying to create a context with it.
> 

An extension system where the detection mechanism is "just try it",
assuming it's not supported if it fails. ??

This seem likely to get more and more problematic as a detection
mechanism as more extensions are added.

> 
> Now it appears trying to create a protected context can block for 
> several seconds.
> 
> Since we have to report capabilities to the user even before it creates 
> protected contexts, any app is at risk of blocking.
> 

This failure path is not causing any re-thinking about using this as
the extension detection mechanism?

Doesn't the ioctl# + input-struct-size + u64-extension# identify the
extension such that the kernel could indicate if it is supported or
not? (Or, perhaps return an array of the supported extensions so the
umd doesn't have to potentially make many ioctls for each extension of
interest.)
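
Something like the following is what I have in mind; it is purely
hypothetical, none of these names exist today:

struct drm_i915_query_supported_extensions {
	__u32 ioctl_nr;   /* in: ioctl number being queried */
	__u32 count;      /* in: capacity of the array; out: entries written */
	__u64 extensions; /* in: userptr to a __u64 array of extension names */
};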

-Jordan


Re: [Intel-gfx] [PATCH] i915/guc/slpc: Provide sysfs for efficient freq

2023-04-05 Thread Rodrigo Vivi
On Wed, Apr 05, 2023 at 12:42:30PM -0700, Dixit, Ashutosh wrote:
> On Wed, 05 Apr 2023 06:57:42 -0700, Rodrigo Vivi wrote:
> >
> 
> Hi Rodrigo,
> 
> > On Fri, Mar 31, 2023 at 08:11:29PM -0700, Dixit, Ashutosh wrote:
> > > On Fri, 31 Mar 2023 19:00:49 -0700, Vinay Belgaumkar wrote:
> > > >
> > >
> > > Hi Vinay,
> > >
> > > > @@ -478,20 +507,15 @@ int intel_guc_slpc_set_min_freq(struct 
> > > > intel_guc_slpc *slpc, u32 val)
> > > > val > slpc->max_freq_softlimit)
> > > > return -EINVAL;
> > > >
> > > > +   /* Ignore efficient freq if lower min freq is requested */
> > > > +   ret = intel_guc_slpc_set_ignore_eff_freq(slpc, val < 
> > > > slpc->rp1_freq);
> > > > +   if (ret)
> > > > +   goto out;
> > > > +
> > >
> > > I don't agree with this. If we are now providing an interface explicitly 
> > > to
> > > ignore RPe, that should be the /only/ way to ignore RPe. There should be no
> > > other "under the hood" ignoring of RPe. In other words, ignoring RPe 
> > > should
> > > be minimized unless explicitly requested.
> > >
> > > I don't clearly understand why this was done previously but it makes even
> > > less sense to me now after this patch.
> >
> > well, I had suggested this previously. And just because without this we 
> > would
> > be breaking API expectations.
> >
> > When user selects a minimal frequency it expect that to stick. But with the
> > efficient freq enabled in guc if minimal is less than the efficient one,
> > this request is likely ignored.
> >
> > Well, even worse is that we are actually caching the request in the soft 
> > values.
> > So we show a minimal, but the hardware without any workload is operating at
> > efficient.
> >
> > So, the thought process was: 'if user requested a very low minimal, we give 
> > them
> > the minimal requested, even if that means to disable the efficient freq.'
> 
> Hmm, I understand this even less now :)
> 
> * Why is RPe ignored when min < RPe? Since the freq can be between min and
>   max? Shouldn't the condition be min > RPe, that is turn RPe off if min
>   higher than RPe is requested?

that is not how guc efficient freq selection works. (unless my memory is
tricking me right now.)

So, if we select a min that is between RPe and RP0, guc will respect and
use the selected min. So we don't need to disable guc's selection of the
efficient frequency.

This is not true when we select a very low min like RPn. If we select RPn
as min and guc efficient freq selection is enabled, guc will simply ignore
our request. So the only way to give the user what is asked is to also
disable guc's efficient freq selection. (I probably confused you in the
previous email because I used 'RP0' when I meant 'RPn'. I hope it's
clear now.)
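
Roughly, the logic I'm describing (a sketch only, reusing the helper from
Vinay's patch; the final write of the soft min to GuC is elided):

static int slpc_min_freq_sketch(struct intel_guc_slpc *slpc, u32 val)
{
	int ret;

	/*
	 * RPe <= val <= RP0: GuC honors the requested min by itself.
	 * val < RPe (e.g. RPn): GuC silently clamps to RPe, so the
	 * efficient-frequency selection must be disabled for val to stick.
	 */
	ret = intel_guc_slpc_set_ignore_eff_freq(slpc, val < slpc->rp1_freq);
	if (ret)
		return ret;

	/* ... then hand val to GuC as the new soft min ... */
	return 0;
}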

> 
> * Also isn't RPe dynamic, so we can't say RPe == rp1 when using in KMD?

Oh... yeap, this is an issue indeed. Especially with i915, where we have
the soft values cached instead of asking guc every time.

That's a good point. The variance is not big, but we will hit corner cases.
One way is to keep checking and updating everytime a sysfs is touched.
Other way is do what you are suggesting and let's just accept and deal
with the reality that is: "we cannot guarantee a min freq selection if user
doesn't disable the efficient freq selection".

> 
> * Finally, we know that enabling RPe broke the kernel freq API because RPe
>   could go over max_freq. So it is actually the max freq which is not
>   obeyed after RPe is enabled.

Oh! so it was my bad memory indeed and everything is the other way around?
But I just looked at the Xe code, my most recent memory, and I just needed
to toggle the efficient freq off on the case that I mentioned, when min
selection is below the efficient one. With that all the API expectation
that I coded in IGT works neatly.

> 
> So we ignore RPe in some select cases (which also I don't understand as
> mentioned above but maybe I am missing something) to claim that we are
> obeying the freq API, but let the freq API stay broken in other cases?

In what cases does it stay broken? This is why we need the IGT tests for all the
API behavior in place.

> 
> > So, that was introduced to avoid API breakage. Removing it now would mean
> > breaking API. (Not sure if the IGT tests for the API got merged already,
> > but think that as the API contract).
> 
> I think we should take this patch as an opportunity to fix this and give
> the user a clean interface to ignore RPe and remove this other implicit way
> to ignore RPe. All IGT changes are unmerged at present.

Yeap, the IGT needs to come with whatever we conclude here and we need to
stick with that afterwards, so let's think with care.

Vinay, Ashutosh's strongest argument is the variable RPe. Do you have thoughts
on that?

> 
> Thanks.
> --
> Ashutosh
> 
> 
> 
> >
> > But I do agree with you that having something selected from multiple places
> > also has the potential to cause some mismatched expectations. So I was
> > thinking about multiple event orders where the user 

[PATCH] drm/doc/rfc: Introduce the merge plan for the Xe driver.

2023-04-05 Thread Rodrigo Vivi
Let’s establish a merge plan for Xe, by writing down clear pre-merge goals, in
order to avoid unnecessary delays.

This initial document starts with a TODO list containing items with clear and
measurable key results. Xe’s initial pull request should only be sent to
dri-devel after all the items are clearly resolved.

Since many of them involve some level of a community consensus, in many cases,
the consensus will be reached in follow-up patches to this document with more
details of the API or helpers that will be developed or modified.

Cc: Dave Airlie 
Cc: Daniel Vetter 
Cc: Oded Gabbay 
Signed-off-by: Rodrigo Vivi 
Signed-off-by: Francois Dugast 
Signed-off-by: Luis Strano 
Signed-off-by: Matthew Brost 
Signed-off-by: Thomas Hellström 
---
 Documentation/gpu/rfc/index.rst |   4 +
 Documentation/gpu/rfc/xe.rst| 216 
 2 files changed, 220 insertions(+)
 create mode 100644 Documentation/gpu/rfc/xe.rst

diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst
index 476719771eef..e4f7b005138d 100644
--- a/Documentation/gpu/rfc/index.rst
+++ b/Documentation/gpu/rfc/index.rst
@@ -31,3 +31,7 @@ host such documentation:
 .. toctree::
 
 i915_vm_bind.rst
+
+.. toctree::
+
+   xe.rst
diff --git a/Documentation/gpu/rfc/xe.rst b/Documentation/gpu/rfc/xe.rst
new file mode 100644
index ..1e3e7e9c67c3
--- /dev/null
+++ b/Documentation/gpu/rfc/xe.rst
@@ -0,0 +1,216 @@
+==========================
+Xe – Merge Acceptance Plan
+==========================
+Xe is a new driver for Intel GPUs that supports both integrated and
+discrete platforms starting with Tiger Lake (first Intel Xe Architecture).
+
+This document aims to establish a merge plan for the Xe driver, by writing down
+clear pre-merge goals, in order to avoid unnecessary delays.
+
+Xe – Overview
+=============
+The main motivation of Xe is to have a fresh base to work from that is
+unencumbered by older platforms, whilst also taking the opportunity to
+rearchitect our driver to increase sharing across the drm subsystem, both
+leveraging and allowing us to contribute more towards other shared components
+like TTM and drm/scheduler.
+
+This is also an opportunity to start from the beginning with a clean uAPI that
+is extensible by design and already aligned with the modern userspace needs.
+For this reason, the memory model is solely based on GPU Virtual Address space
+bind/unbind (‘VM_BIND’) of GEM buffer objects (BOs) and execution only
+supporting explicit synchronization. With persistent mapping across the
+execution, the userspace does not need to provide a list of all required
+mappings during each submission.
+
+The new driver leverages a lot from i915. As for display, the intent is to
+share the display code with the i915 driver so that there is maximum reuse
+there.
+
+As for the power management area, the goal is to have a much-simplified support
+for the system suspend states (S-states), PCI device suspend states (D-states),
+GPU/Render suspend states (R-states) and frequency management. It should
+leverage as much as possible all the existing PCI-subsystem infrastructure (pm
+and runtime_pm) and underlying firmware components such as PCODE and GuC for
+the power states and frequency decisions.
+
+Repository:
+
+https://gitlab.freedesktop.org/drm/xe/kernel (branch drm-xe-next)
+
+Xe – Platforms
+==============
+Currently, Xe is already functional and has experimental support for multiple
+platforms starting from Tiger Lake, with initial support in userspace
+implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan drivers), as well
+as in NEO (for OpenCL and Level0).
+
+During a transition period, platforms will be supported by both Xe and i915.
+However, the force_probe mechanism existing in both drivers will allow only one
+official and by-default probe at a given time.
+
+For instance, in order to probe a DG2 whose PCI ID is 0x5690 by Xe instead of
+i915, the following set of parameters needs to be used:
+
+```
+i915.force_probe=!5690 xe.force_probe=5690
+```
+
+In both drivers, the ‘.require_force_probe’ protection forces the user to use
+the force_probe parameter while the driver is under development. This
+protection is only removed when the support for the platform and the uAPI are
+stable, stability which needs to be demonstrated by CI results.
+
+In order to avoid user space regressions, i915 will continue to support all the
+current platforms that are already out of this protection. Xe support will be
+forever experimental and dependent on the usage of force_probe for these
+platforms.
+
+When the time comes for Xe, the protection will be lifted on Xe and kept in
+i915.
+
+Xe driver will be protected with both STAGING Kconfig and force_probe. Changes
+in the uAPI are expected while the driver is behind these protections. STAGING
+will be removed when the driver uAPI gets to a mature state where we can
+guarantee the ‘no regression’ rule. Then force_probe will be lifted 

Re: [Intel-gfx] [PATCH] i915/guc/slpc: Provide sysfs for efficient freq

2023-04-05 Thread Dixit, Ashutosh
On Wed, 05 Apr 2023 06:57:42 -0700, Rodrigo Vivi wrote:
>

Hi Rodrigo,

> On Fri, Mar 31, 2023 at 08:11:29PM -0700, Dixit, Ashutosh wrote:
> > On Fri, 31 Mar 2023 19:00:49 -0700, Vinay Belgaumkar wrote:
> > >
> >
> > Hi Vinay,
> >
> > > @@ -478,20 +507,15 @@ int intel_guc_slpc_set_min_freq(struct intel_guc_slpc *slpc, u32 val)
> > >   val > slpc->max_freq_softlimit)
> > >   return -EINVAL;
> > >
> > > + /* Ignore efficient freq if lower min freq is requested */
> > > + ret = intel_guc_slpc_set_ignore_eff_freq(slpc, val < slpc->rp1_freq);
> > > + if (ret)
> > > + goto out;
> > > +
> >
> > I don't agree with this. If we are now providing an interface explicitly to
> > ignore RPe, that should be the /only/ way to ignore RPe. There should be no
> > other "under the hood" ignoring of RPe. In other words, ignoring RPe should
> > be minimized unless explicitly requested.
> >
> > I don't clearly understand why this was done previously but it makes even
> > less sense to me now after this patch.
>
> well, I had suggested this previously. And just because without this we would
> be breaking API expectations.
>
> When the user selects a minimal frequency they expect it to stick. But with the
> efficient freq enabled in guc if minimal is less than the efficient one,
> this request is likely ignored.
>
> Well, even worse is that we are actually caching the request in the soft
> values. So we show a minimal, but the hardware without any workload is
> operating at the efficient one.
>
> So, the thought process was: 'if the user requested a very low minimal, we
> give them the minimal requested, even if that means disabling the efficient
> freq.'

Hmm, I understand this even less now :)

* Why is RPe ignored when min < RPe? Since the freq can be between min and
  max? Shouldn't the condition be min > RPe, that is, turn RPe off if a min
  higher than RPe is requested?

* Also isn't RPe dynamic, so we can't say RPe == rp1 when using it in the KMD?

* Finally, we know that enabling RPe broke the kernel freq API because RPe
  could go over max_freq. So it is actually the max freq which is not
  obeyed after RPe is enabled.

So we ignore RPe in some select cases (which also I don't understand as
mentioned above but maybe I am missing something) to claim that we are
obeying the freq API, but let the freq API stay broken in other cases?

> So, that was introduced to avoid API breakage. Removing it now would mean
> breaking API. (Not sure if the IGT tests for the API got merged already,
> but think of that as the API contract).

I think we should take this patch as an opportunity to fix this and give
the user a clean interface to ignore RPe and remove this other implicit way
to ignore RPe. All IGT changes are unmerged at present.
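
For illustration, such an explicit knob could look roughly like this (a
sketch only; kobj_to_slpc() is a hypothetical lookup, the real patch plumbs
this differently):

static ssize_t ignore_eff_freq_store(struct kobject *kobj,
				     struct kobj_attribute *attr,
				     const char *buf, size_t count)
{
	struct intel_guc_slpc *slpc = kobj_to_slpc(kobj); /* hypothetical */
	bool val;
	int ret;

	ret = kstrtobool(buf, &val);
	if (ret)
		return ret;

	/* the one and only place that toggles efficient-freq selection */
	ret = intel_guc_slpc_set_ignore_eff_freq(slpc, val);
	return ret ?: count;
}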

Thanks.
--
Ashutosh



>
> But I do agree with you that having something selected from multiple places
> also has the potential to cause some mismatched expectations. So I was thinking
> about multiple event orders where the user selects RP0 as minimal, then
> enables the efficient freq or vice versa, but I couldn't think of a bad case.
> Or at least not as bad as the user asking to get RP0 as minimal and only
> getting RPe back.
>
> With this in mind, and having checked the code:
>
> Reviewed-by: Rodrigo Vivi 
>
> But I won't push this immediately because I'm still open to hear another
> side/angle.
>
> >
> > Thanks.
> > --
> > Ashutosh
> >
> >
> > >   /* Need a lock now since waitboost can be modifying min as well */
> > >   mutex_lock(&slpc->lock);
> > >   wakeref = intel_runtime_pm_get(&i915->runtime_pm);
> > >
> > > - /* Ignore efficient freq if lower min freq is requested */
> > > - ret = slpc_set_param(slpc,
> > > -  SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY,
> > > -  val < slpc->rp1_freq);
> > > - if (ret) {
> > > - guc_probe_error(slpc_to_guc(slpc), "Failed to toggle efficient freq: %pe\n",
> > > - ERR_PTR(ret));
> > > - goto out;
> > > - }
> > > -
> > >   ret = slpc_set_param(slpc,
> > >SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
> > >val);


Re: [PATCH RFC 12/18] rust: drm: sched: Add GPU scheduler abstraction

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 05:43:01PM +0200, Daniel Vetter wrote:
> On Tue, Mar 07, 2023 at 11:25:37PM +0900, Asahi Lina wrote:
> > +/// An armed DRM scheduler job (not yet submitted)
> > +pub struct ArmedJob<'a, T: JobImpl>(Box<Job<T>>, PhantomData<&'a T>);
> > +
> > +impl<'a, T: JobImpl> ArmedJob<'a, T> {
> > +    /// Returns the job fences
> > +    pub fn fences(&self) -> JobFences<'_> {
> > +        JobFences(unsafe { &mut *self.0.job.s_fence })
> > +    }
> > +
> > +    /// Push the job for execution into the scheduler
> > +    pub fn push(self) {
> > +        // After this point, the job is submitted and owned by the scheduler
> > +        let ptr = match self {
> > +            ArmedJob(job, _) => Box::<Job<T>>::into_raw(job),
> > +        };
> 
> If I get this all right then this all makes sure that drivers can't use
> the job after push and they don't forget to call arm.
> 
> What I'm not seeing is how we force drivers to call push once they've
> called arm? I haven't checked what the code does, but from the docs it
> sounds like if you don't call push then drop will get called. Which wrecks
> the book-keeping on an armed job. Or is there something that prevents
> ArmedJob from having the Drop trait and so the only way to not go boom
> is by pushing it?
> 
> Googling for "rust undroppable" seems to indicate that this isn't a thing
> rust can do?

Another thing that I just realized: The driver must ensure that the
arm->push sequence on a given drm_sched_entity isn't interrupted by another
thread doing the same, i.e. you need to wrap it all in a lock, and it
always needs to be the same lock for a given entity.

I have no idea how to guarantee that, but I guess somehow we should?
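
As a rough sketch of the required driver-side discipline (illustrative only;
ctx->submit_lock is a hypothetical driver-owned per-entity lock):

	/* arm + push must not interleave with another submit on this entity,
	 * so both happen under one lock, always the same one for the entity */
	mutex_lock(&ctx->submit_lock);
	drm_sched_job_arm(&job->base);
	fence = dma_fence_get(&job->base.s_fence->finished);
	drm_sched_entity_push_job(&job->base);
	mutex_unlock(&ctx->submit_lock);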
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v4 6/6] drm/msm/dsi: Fix calculations pkt_per_line

2023-04-05 Thread Dmitry Baryshkov

On 05/04/2023 03:41, Jessica Zhang wrote:

Currently, pkt_per_line is calculated by dividing slice_per_intf by
slice_count. This is incorrect, as slice_per_intf should be divided by
slice_per_pkt, which is not always equivalent to slice_count as it is
possible for there to be multiple soft slices per interface even though
a panel only specifies one slice per packet.
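
In other words, the corrected computation is roughly (a sketch; slice_per_pkt
comes from the panel and is not a drm_dsc_config field):

	slice_per_intf = msm_dsc_get_slice_per_intf(dsc, hdisplay);
	/* was: pkt_per_line = slice_per_intf / dsc->slice_count; */
	pkt_per_line = slice_per_intf / slice_per_pkt;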

Fixes: 08802f515c3c ("drm/msm/dsi: Add support for DSC configuration")
Signed-off-by: Jessica Zhang 
Reviewed-by: Dmitry Baryshkov 
---
  drivers/gpu/drm/msm/dsi/dsi_host.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)


Reviewed-by: Dmitry Baryshkov 

--
With best wishes
Dmitry



Re: [PATCH v4 5/6] drm/msm/dsi: update hdisplay calculation for dsi_timing_setup

2023-04-05 Thread Dmitry Baryshkov

On 05/04/2023 03:41, Jessica Zhang wrote:

hdisplay for compressed images should be calculated as bytes_per_slice *
slice_count. Thus, use MSM DSC helper to calculate hdisplay for
dsi_timing_setup instead of directly using mode->hdisplay.
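
Roughly (a sketch only; bytes_per_slice stands for the per-line compressed
slice size described above, not a literal field name):

	if (dsc)	/* compressed: the line is sized in bytes, not pixels */
		hdisplay = bytes_per_slice * dsc->slice_count;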

Changes in v3:
- Split from previous patch
- Initialized hdisplay as uncompressed pclk per line at the beginning of
   dsi_timing_setup as to not break dual DSI calculations

Changes in v4:
- Moved pclk_per_intf calculations to DSC hdisplay adjustments

Signed-off-by: Jessica Zhang 
---
  drivers/gpu/drm/msm/dsi/dsi_host.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Dmitry Baryshkov 

--
With best wishes
Dmitry



Re: [PATCH v4 4/6] drm/msm/dsi: Use MSM and DRM DSC helper methods

2023-04-05 Thread Dmitry Baryshkov

On 05/04/2023 03:41, Jessica Zhang wrote:

Use MSM and DRM DSC helper methods to configure DSC for DSI.

Changes in V2:
- *_calculate_initial_scale_value --> *_set_initial_scale_value
- Split pkt_per_line and eol_byte_num changes to a separate patch
- Moved pclk_per_line calculation to hdisplay adjustment in `if (dsc)`
   block of dsi_update_dsc_timing()

Changes in v3:
- Split pclk_per_intf calculation into a separate patch
- Added slice_width check to dsi_timing_setup
- Used MSM DSC helper to calculate total_bytes_per_intf

Signed-off-by: Jessica Zhang 
---
  drivers/gpu/drm/msm/dsi/dsi_host.c | 13 ++---
  1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c
index 74d38f90398a..6a6218a9655f 100644
--- a/drivers/gpu/drm/msm/dsi/dsi_host.c
+++ b/drivers/gpu/drm/msm/dsi/dsi_host.c
@@ -28,6 +28,7 @@
  #include "dsi.xml.h"
  #include "sfpb.xml.h"
  #include "dsi_cfg.h"
+#include "msm_dsc_helper.h"
  #include "msm_kms.h"
  #include "msm_gem.h"
  #include "phy/dsi_phy.h"
@@ -848,7 +849,7 @@ static void dsi_update_dsc_timing(struct msm_dsi_host *msm_host, bool is_cmd_mod
/* first calculate dsc parameters and then program
 * compress mode registers
 */
-   slice_per_intf = DIV_ROUND_UP(hdisplay, dsc->slice_width);
+   slice_per_intf = msm_dsc_get_slice_per_intf(dsc, hdisplay);
  
 	/*
 	 * If slice_count is greater than slice_per_intf
@@ -858,7 +859,7 @@ static void dsi_update_dsc_timing(struct msm_dsi_host *msm_host, bool is_cmd_mod
if (dsc->slice_count > slice_per_intf)
dsc->slice_count = 1;
  
-	total_bytes_per_intf = dsc->slice_chunk_size * slice_per_intf;
+	total_bytes_per_intf = msm_dsc_get_bytes_per_intf(dsc, hdisplay);
  
 	eol_byte_num = total_bytes_per_intf % 3;
 	pkt_per_line = slice_per_intf / dsc->slice_count;
@@ -936,6 +937,12 @@ static void dsi_timing_setup(struct msm_dsi_host *msm_host, bool is_bonded_dsi)
return;
}
  
+		if (!dsc->slice_width || (mode->hdisplay < dsc->slice_width)) {
+			pr_err("DSI: invalid slice width %d (pic_width: %d)\n",
+			       dsc->slice_width, mode->hdisplay);
+			return;
+		}


This is not the "use of MSM and DRM DSC helper methods" and thus should 
be moved to a separate patch.



+
dsc->pic_width = mode->hdisplay;
dsc->pic_height = mode->vdisplay;
DBG("Mode %dx%d\n", dsc->pic_width, dsc->pic_height);
@@ -1759,7 +1766,7 @@ static int dsi_populate_dsc_params(struct msm_dsi_host *msm_host, struct drm_dsc
return ret;
}
  
-	dsc->initial_scale_value = 32;
+	drm_dsc_set_initial_scale_value(dsc);
dsc->line_buf_depth = dsc->bits_per_component + 1;
  
  	return drm_dsc_compute_rc_parameters(dsc);




--
With best wishes
Dmitry



Re: [PATCH v4 3/6] drm/msm/dpu: Fix slice_last_group_size calculation

2023-04-05 Thread Dmitry Baryshkov

On 05/04/2023 03:41, Jessica Zhang wrote:

Correct the math for slice_last_group_size so that it matches the
calculations downstream.

Changes in v3:
- Reworded slice_last_group_size calculation to
   `(dsc->slice_width + 2) % 3`
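
In C, the reworded calculation from the changelog is simply:

	slice_last_group_size = (dsc->slice_width + 2) % 3;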

Fixes: c110cfd1753e ("drm/msm/disp/dpu1: Add support for DSC")
Signed-off-by: Jessica Zhang 
---
  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)


Reviewed-by: Dmitry Baryshkov 

--
With best wishes
Dmitry



Re: [PATCH v4 1/6] drm/msm: Add MSM-specific DSC helper methods

2023-04-05 Thread Dmitry Baryshkov

On 05/04/2023 03:41, Jessica Zhang wrote:

Introduce MSM-specific DSC helper methods, as some calculations are
common between DP and DSC.

Changes in v2:
- Moved files up to msm/ directory
- Dropped get_comp_ratio() helper
- Used drm_int2fixp() to convert to integers to fp
- Style changes to improve readability
- Dropped unused bpp variable in msm_dsc_get_dce_bytes_per_line()
- Changed msm_dsc_get_slice_per_intf() to a static inline method
- Dropped last division step of msm_dsc_get_pclk_per_line() and changed
   method name accordingly
- Changed DSC_BPP macro to drm_dsc_get_bpp_int() helper method
- Fixed some math issues caused by passing in incorrect types to
   drm_fixed methods in get_bytes_per_soft_slice()

Changes in v3:
- Dropped src_bpp parameter from all methods -- src_bpp can be
   calculated as dsc->bits_per_component * 3
- Dropped intf_width parameter from get_bytes_per_soft_slice()
- Moved dsc->bits_per_component to numerator calculation in
   get_bytes_per_soft_slice()
- Renamed msm_dsc_get_uncompressed_pclk_per_line to
   *_get_uncompressed_pclk_per_intf()
- Removed dsc->slice_width check from
   msm_dsc_get_uncompressed_pclk_per_intf()
- Made get_bytes_per_soft_slice() a public method (this will be called
   later to help calculate DP pclk params)
- Added documentation in comments
- Moved extra_eol_bytes math out of msm_dsc_get_eol_byte_num() and
   renamed msm_dsc_get_eol_byte_num to *_get_bytes_per_intf.

Changes in v4:
- Changed msm_dsc_get_uncompressed_pclk_per_intf to
   msm_dsc_get_pclk_per_intf

Signed-off-by: Jessica Zhang 
---
  drivers/gpu/drm/msm/Makefile |  1 +
  drivers/gpu/drm/msm/msm_dsc_helper.c | 47 
  drivers/gpu/drm/msm/msm_dsc_helper.h | 70 
  3 files changed, 118 insertions(+)

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 7274c41228ed..b814fc80e2d5 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -94,6 +94,7 @@ msm-y += \
msm_atomic_tracepoints.o \
msm_debugfs.o \
msm_drv.o \
+   msm_dsc_helper.o \
msm_fb.o \
msm_fence.o \
msm_gem.o \
diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.c b/drivers/gpu/drm/msm/msm_dsc_helper.c
new file mode 100644
index ..0539221eb09d
--- /dev/null
+++ b/drivers/gpu/drm/msm/msm_dsc_helper.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved
+ */
+
+#include 
+#include 
+#include 
+
+#include "msm_drv.h"
+#include "msm_dsc_helper.h"
+
+s64 get_bytes_per_soft_slice(struct drm_dsc_config *dsc)
+{
+   int bpp = msm_dsc_get_bpp_int(dsc);
+   s64 numerator_fp, denominator_fp;
+   s64 comp_ratio_fp = drm_fixp_from_fraction(dsc->bits_per_component * 3, bpp);
+
+   numerator_fp = drm_int2fixp(dsc->slice_width * 3 * dsc->bits_per_component);
+   denominator_fp = drm_fixp_mul(comp_ratio_fp, drm_int2fixp(8));
+
+   return drm_fixp_div(numerator_fp, denominator_fp);


If we remove 3 * dsc->bits_per_component from both the numerator and the
denominator, this whole function seems to be as simple as
DIV_ROUND_UP(dsc->slice_width * bpp, 8).


Or, if you prefer FP math, drm_fixp_from_fraction(dsc->slice_width * bpp, 8).



+}
+
+u32 msm_dsc_get_bytes_per_intf(struct drm_dsc_config *dsc, int intf_width)
+{
+   u32 bytes_per_soft_slice, bytes_per_intf;
+   s64 bytes_per_soft_slice_fp;
+   int slice_per_intf = msm_dsc_get_slice_per_intf(dsc, intf_width);
+
+   bytes_per_soft_slice_fp = get_bytes_per_soft_slice(dsc);
+   bytes_per_soft_slice = drm_fixp2int_ceil(bytes_per_soft_slice_fp);
+
+   bytes_per_intf = bytes_per_soft_slice * slice_per_intf;
+
+   return bytes_per_intf;
+}
+
+int msm_dsc_get_pclk_per_intf(struct drm_dsc_config *dsc)
+{
+   s64 data_width;
+
+   data_width = drm_fixp_mul(drm_int2fixp(dsc->slice_count),
+   get_bytes_per_soft_slice(dsc));


And this is then DIV_ROUND_UP(dsc->slice_width * dsc->slice_count * bpp, 8)
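
Spelled out, the suggested integer-math versions would look something like
this (a sketch with hypothetical names, mirroring the two formulas above):

static inline u32 bytes_per_soft_slice(const struct drm_dsc_config *dsc)
{
	int bpp = dsc->bits_per_pixel >> 4;	/* integer bpp, cf. msm_dsc_get_bpp_int() */

	return DIV_ROUND_UP(dsc->slice_width * bpp, 8);
}

static inline u32 pclk_per_intf(const struct drm_dsc_config *dsc)
{
	int bpp = dsc->bits_per_pixel >> 4;

	return DIV_ROUND_UP(dsc->slice_width * dsc->slice_count * bpp, 8);
}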


+
+   return drm_fixp2int_ceil(data_width);
+}
diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.h b/drivers/gpu/drm/msm/msm_dsc_helper.h
new file mode 100644
index ..31116a31090f
--- /dev/null
+++ b/drivers/gpu/drm/msm/msm_dsc_helper.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved
+ */
+
+#ifndef MSM_DSC_HELPER_H_
+#define MSM_DSC_HELPER_H_
+
+#include 
+#include 
+
+/*
+ * Helper methods for MSM specific DSC calculations that are common between
+ * timing engine, DSI, and DP.
+ */
+
+/**
+ * msm_dsc_get_bpp_int - get bits per pixel integer value
+ * @dsc: Pointer to drm dsc config struct
+ */
+static inline int msm_dsc_get_bpp_int(struct drm_dsc_config *dsc)
+{
+   WARN_ON_ONCE(dsc->bits_per_pixel & 0xf);
+   return dsc->bits_per_pixel >> 4;
+}
+
+/**

Re: [PULL] drm-misc-fixes

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 08:28:55PM +0200, Thomas Zimmermann wrote:
> Hi Dave and Daniel,
> 
> here's this week's PR for drm-misc-fixes. As requested, it comes
> a day earlier than usual due to Easter holidays.
> 
> Best regards
> Thomas
> 
> drm-misc-fixes-2023-04-05:
> Short summary of fixes pull:
> 
>  * ivpu: DMA fence and suspend fixes
>  * nouveau: Color-depth fixes
>  * panfrost: Fix mmap error handling
> The following changes since commit 25bbe844ef5c4fb4d7d8dcaa0080f922b7cd3a16:
> 
>   drm: test: Fix 32-bit issue in drm_buddy_test (2023-03-29 17:14:15 +0200)
> 
> are available in the Git repository at:
> 
>   git://anongit.freedesktop.org/drm/drm-misc tags/drm-misc-fixes-2023-04-05
> 
> for you to fetch changes up to 0ec8671837a61d841462179686c5819d951d3b10:
> 
>   accel/ivpu: Fix S3 system suspend when not idle (2023-04-05 09:07:26 +0200)

Pulled, thanks.

> 
> 
> Short summary of fixes pull:
> 
>  * ivpu: DMA fence and suspend fixes
>  * nouveau: Color-depth fixes
>  * panfrost: Fix mmap error handling
> 
> 
> Boris Brezillon (1):
>   drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
> 
> Jacek Lawrynowicz (1):
>   accel/ivpu: Fix S3 system suspend when not idle
> 
> Karol Herbst (1):
>   drm/nouveau/disp: Support more modes by checking with lower bpc
> 
> Karol Wachowski (1):
>   accel/ivpu: Add dma fence to command buffers only
> 
>  drivers/accel/ivpu/ivpu_job.c   | 18 +++---
>  drivers/accel/ivpu/ivpu_pm.c| 26 +++---
>  drivers/gpu/drm/nouveau/dispnv50/disp.c | 32 
>  drivers/gpu/drm/nouveau/nouveau_dp.c|  8 +---
>  drivers/gpu/drm/panfrost/panfrost_mmu.c |  1 +
>  5 files changed, 56 insertions(+), 29 deletions(-)
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans

2023-04-05 Thread Matthew Brost
On Wed, Apr 05, 2023 at 12:06:53PM -0600, Zeng, Oak wrote:
> Hi,
> 
> Using dma-fence for completion/dependency tracking for long-run workloads (more
> precisely, on-demand paging/page-fault-enabled workloads) can cause deadlock.
> This seems to be the significant issue here. Other issues, such as the drm
> scheduler completion order implication etc., are minor ones which can be solved
> inside the framework of the drm scheduler. We need to evaluate the paths below:
> 
>   1) still use the drm scheduler for job submission, and use dma-fence for
> job completion waiting/dependency tracking. This is the solution proposed in
> this series. Annotate dma-fence for long-run workloads: the user can still
> wait on a dma-fence for job completion but can't wait on a dma-fence while
> holding any memory management locks. We still use dma-fence for dependency
> tracking. But it just very easily runs into deadlock when on-demand paging is
> in the picture. The annotation helps us to detect deadlocks but not solve
> deadlock problems. Seems *not* a complete solution: it is almost impossible
> to completely avoid dependency deadlocks in a complex runtime environment.
>

No one can wait on an LR fence, so it is impossible to deadlock. The
annotations enforce this. Literally this is only for flow controlling the
ring / holding pending jobs in the DRM scheduler list.

>   2) Still use the drm scheduler but not use dma-fence for completion
> signaling and dependency tracking. This way we still get some free functions
> (reset, err handling, ring flow control as Matt said) from the drm scheduler,
> but push the dependency/completion tracking completely to user space using
> techniques such as user space fences. User space doesn't have a chance to
> wait on a fence while holding a kernel memory management lock, thus the
> dma-fence deadlock issue is solved.
>

We use user space fence for syncs.

>   3) Completely discard drm scheduler and dma-fence for long-run 
> workload. Use user queue/doorbell for super fast submission, directly 
> interact with fw scheduler. Use user fence for completion/dependency tracking.
> 

This is a hard no from me, I want 1 submission path in Xe. Either we use
the DRM scheduler or we don't.

Matt

> Thanks,
> Oak
> 
> > -Original Message-
> > From: Christian König 
> > Sent: April 5, 2023 3:30 AM
> > To: Brost, Matthew ; Zeng, Oak
> > 
> > Cc: dri-devel@lists.freedesktop.org; intel...@lists.freedesktop.org;
> > robdcl...@chromium.org; thomas.hellst...@linux.intel.com; airl...@linux.ie;
> > l...@asahilina.net; boris.brezil...@collabora.com; 
> > faith.ekstr...@collabora.com
> > Subject: Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload
> > plans
> > 
> > On 04.04.23 20:08, Matthew Brost wrote:
> > > On Tue, Apr 04, 2023 at 12:02:03PM -0600, Zeng, Oak wrote:
> > >> Hi Matt, Thomas,
> > >>
> > >> Some very bold out of box thinking in this area:
> > >>
> > >> 1. so you want to use drm scheduler and dma-fence for long running 
> > >> workload.
> > Why do you want to do this in the first place? What is the benefit? Drm 
> > scheduler is
> > pretty much a software scheduler. Modern gpu has scheduler built at fw/hw
> > level, as you said below for intel this is Guc. Can xe driver just directly 
> > submit job
> > to Guc, bypassing drm scheduler?
> > >>
> > > If we did that now we have 2 paths for dependency tracking, flow controlling
> > > the ring, resets / error handling / backend submission implementations.
> > > We don't want this.
> > 
> > Well exactly that's the point: Why?
> > 
> > As far as I can see those are two completely distinct use cases, so you
> > absolutely do want two completely distinct implementations for this.
> > 
> > >> 2. using dma-fence for long run workload: I am well aware that page 
> > >> fault (and
> > the consequent memory allocation/lock acquiring to fix the fault) can cause
> > deadlock for a dma-fence wait. But I am not convinced that dma-fence can't 
> > be
> > used purely because the nature of the workload that it runs very long 
> > (indefinite).
> > I did a math: the dma_fence_wait_timeout function's third param is the 
> > timeout
> > which is a signed long type. If HZ is 1000, this is about 23 days. If 23 
> > days is not long
> > enough, can we just change the timeout parameter to signed 64 bits so it is 
> > much
> > longer than our life time...
> > >>
> > >> So I mainly argue we can't use dma-fence for long-run workload is not
> > because the workload runs very long, rather because of the fact that we use
> > page fault for long-run workload. If we enable page fault for short-run 
> > workload,
> > we can't use dma-fence either. Page fault is the key thing here.
> > >>
> > >> Now since we use page fault which is *fundamentally* controversial with
> > dma-fence design, why not just introduce an independent concept such as user-
> > fence instead of extending existing dma-fence?
> > >>
> > >> I like unified design. If drm scheduler, dma-fence can be extended to 
> > >> work for
> > 

[linux-next:master] BUILD REGRESSION 8417c8f5007bf4567ccffda850a3157c7d905f67

2023-04-05 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 8417c8f5007bf4567ccffda850a3157c7d905f67  Add linux-next specific 
files for 20230405

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202303161521.jbgbafjj-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:351:13: 
warning: variable 'bw_needed' set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:352:25: 
warning: variable 'link' set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/tegra/fb.c:276:60: error: 'VM_MAP' undeclared (first use in 
this function); did you mean 'VM_MTE'?
drivers/net/wireless/legacy/ray_cs.c:628:17: warning: 'strncpy' specified bound 
32 equals destination size [-Wstringop-truncation]
ld.lld: error: undefined symbol: find_kallsyms_symbol_value

Unverified Error/Warning (likely false positive, please contact us if 
interested):

drivers/acpi/property.c:985 acpi_data_prop_read_single() error: potentially 
dereferencing uninitialized 'obj'.
include/linux/gpio/consumer.h: linux/err.h is included more than once.
include/linux/gpio/driver.h: asm/bug.h is included more than once.
io_uring/io_uring.c:432 io_prep_async_work() error: we previously assumed 
'req->file' could be null (see line 425)
io_uring/kbuf.c:221 __io_remove_buffers() warn: variable dereferenced before 
check 'bl->buf_ring' (see line 219)

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|   `-- 
drivers-net-wireless-legacy-ray_cs.c:warning:strncpy-specified-bound-equals-destination-size
|-- alpha-randconfig-r036-20230403
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- arc-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- arm-allmodconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- arm-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- arm64-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- i386-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- ia64-allmodconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|   `-- 
drivers-net-wireless-legacy-ray_cs.c:warning:strncpy-specified-bound-equals-destination-size
|-- loongarch-allmodconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- loongarch-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- loongarch-defconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- mips-allmodconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|   `-- 
drivers-gpu-drm-tegra-fb.c:error:VM_MAP-undeclared-(first-use-in-this-function)
|-- mips-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-d

Re: [PATCH 01/18] fbdev: Prepare generic architecture helpers

2023-04-05 Thread Thomas Zimmermann

Hi

On 05.04.23 17:53, Arnd Bergmann wrote:

On Wed, Apr 5, 2023, at 17:05, Thomas Zimmermann wrote:

Generic implementations of fb_pgprotect() and fb_is_primary_device()
have been in the source code for a long time. Prepare the header file
to make use of them.

Improve the code by using an inline function for fb_pgprotect() and
by removing include statements.

Symbols are protected by preprocessor guards. Architectures that
provide a symbol need to define a preprocessor token of the same
name and value. Otherwise the header file will provide a generic
implementation. This pattern has been taken from .

Signed-off-by: Thomas Zimmermann 


Moving this into generic code is good, but I'm not sure
about the default for fb_pgprotect():


+
+#ifndef fb_pgprotect
+#define fb_pgprotect fb_pgprotect
+static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
+   unsigned long off)
+{ }
+#endif


I think most architectures will want the version we have on
arc, arm, arm64, loongarch, and sh already:

static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
 unsigned long off)
{
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
}

so I'd suggest making that version the default, and treating the
empty ones (m68knommu, sparc32) as architecture specific
workarounds.
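
Combined with the guard pattern from the patch, the generic default would
then read roughly like this (a sketch):

#ifndef fb_pgprotect
#define fb_pgprotect fb_pgprotect
static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
				unsigned long off)
{
	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
}
#endif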


Make sense, thanks for the feedback. I'll send out an update soon.

Best regards
Thomas



I see that sparc64 and parisc use pgprot_uncached here, but as
they don't define a custom pgprot_writecombine, this ends up being
the same, and they can use the above definition as well.

mips defines pgprot_writecombine but uses pgprot_noncached
in fb_pgprotect(), which is probably a mistake and should have
been updated as part of commit 4b050ba7a66c ("MIPS: pgtable.h:
Implement the pgprot_writecombine function for MIPS").

 Arnd


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev




Re: [RFC PATCH 02/10] drm/sched: Move schedule policy to scheduler / entity

2023-04-05 Thread Matthew Brost
On Wed, Apr 05, 2023 at 01:37:22PM -0400, Luben Tuikov wrote:
> Hi,
> 
> Inlined:
> 

Thanks for the feedback.

> On 2023-04-03 20:22, Matthew Brost wrote:
> > Rather than a global modparam for scheduling policy, move the scheduling
> > policy to scheduler / entity so user can control each scheduler / entity
> > policy.
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
> >  drivers/gpu/drm/etnaviv/etnaviv_sched.c|  3 ++-
> >  drivers/gpu/drm/lima/lima_sched.c  |  3 ++-
> >  drivers/gpu/drm/msm/msm_ringbuffer.c   |  3 ++-
> >  drivers/gpu/drm/panfrost/panfrost_job.c|  3 ++-
> >  drivers/gpu/drm/scheduler/sched_entity.c   | 25 ++
> >  drivers/gpu/drm/scheduler/sched_main.c | 21 +-
> >  drivers/gpu/drm/v3d/v3d_sched.c| 15 -
> >  include/drm/gpu_scheduler.h| 23 ++--
> >  9 files changed, 73 insertions(+), 24 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 00c9c03c8f94..4df0fca5a74c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2368,6 +2368,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> >ring->num_hw_submission, 
> > amdgpu_job_hang_limit,
> >timeout, adev->reset_domain->wq,
> >ring->sched_score, ring->name,
> > +  DRM_SCHED_POLICY_DEFAULT,
> >adev->dev);
> > if (r) {
> > DRM_ERROR("Failed to create scheduler on ring %s.\n",
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > index 8486a2923f1b..61204a3f8b0b 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> > @@ -136,7 +136,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
> > ret = drm_sched_init(>sched, _sched_ops, NULL,
> >  etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
> >  msecs_to_jiffies(500), NULL, NULL,
> > -dev_name(gpu->dev), gpu->dev);
> > +dev_name(gpu->dev), DRM_SCHED_POLICY_DEFAULT,
> > +gpu->dev);
> > if (ret)
> > return ret;
> >  
> > diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> > index 54f53bece27c..33042ba6ae93 100644
> > --- a/drivers/gpu/drm/lima/lima_sched.c
> > +++ b/drivers/gpu/drm/lima/lima_sched.c
> > @@ -491,7 +491,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
> > return drm_sched_init(>base, _sched_ops, NULL, 1,
> >   lima_job_hang_limit,
> >   msecs_to_jiffies(timeout), NULL,
> > - NULL, name, pipe->ldev->dev);
> > + NULL, name, DRM_SCHED_POLICY_DEFAULT,
> > + pipe->ldev->dev);
> >  }
> >  
> >  void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
> > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > index 5879fc262047..f408a9097315 100644
> > --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> > +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> > @@ -97,7 +97,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id,
> >  
> > ret = drm_sched_init(>sched, _sched_ops, NULL,
> > num_hw_submissions, 0, sched_timeout,
> > -   NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
> > +   NULL, NULL, to_msm_bo(ring->bo)->name,
> > +   DRM_SCHED_POLICY_DEFAULT, gpu->dev->dev);
> > if (ret) {
> > goto fail;
> > }
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> > index f48b07056a16..effa48b33dce 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> > @@ -819,7 +819,8 @@ int panfrost_job_init(struct panfrost_device *pfdev)
> >  nentries, 0,
> >  msecs_to_jiffies(JOB_TIMEOUT_MS),
> >  pfdev->reset.wq,
> > -NULL, "pan_js", pfdev->dev);
> > +NULL, "pan_js", DRM_SCHED_POLICY_DEFAULT,
> > +pfdev->dev);
> > if (ret) {
> > dev_err(pfdev->dev, "Failed to create scheduler: %d.", 
> > ret);
> > goto err_sched;
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index 

[PULL] drm-misc-fixes

2023-04-05 Thread Thomas Zimmermann
Hi Dave and Daniel,

here's this week's PR for drm-misc-fixes. As requested, it comes
a day earlier than usual due to Easter holidays.

Best regards
Thomas

drm-misc-fixes-2023-04-05:
Short summary of fixes pull:

 * ivpu: DMA fence and suspend fixes
 * nouveau: Color-depth fixes
 * panfrost: Fix mmap error handling
The following changes since commit 25bbe844ef5c4fb4d7d8dcaa0080f922b7cd3a16:

  drm: test: Fix 32-bit issue in drm_buddy_test (2023-03-29 17:14:15 +0200)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm-misc tags/drm-misc-fixes-2023-04-05

for you to fetch changes up to 0ec8671837a61d841462179686c5819d951d3b10:

  accel/ivpu: Fix S3 system suspend when not idle (2023-04-05 09:07:26 +0200)


Short summary of fixes pull:

 * ivpu: DMA fence and suspend fixes
 * nouveau: Color-depth fixes
 * panfrost: Fix mmap error handling


Boris Brezillon (1):
  drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path

Jacek Lawrynowicz (1):
  accel/ivpu: Fix S3 system suspend when not idle

Karol Herbst (1):
  drm/nouveau/disp: Support more modes by checking with lower bpc

Karol Wachowski (1):
  accel/ivpu: Add dma fence to command buffers only

 drivers/accel/ivpu/ivpu_job.c   | 18 +++---
 drivers/accel/ivpu/ivpu_pm.c| 26 +++---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 32 
 drivers/gpu/drm/nouveau/nouveau_dp.c|  8 +---
 drivers/gpu/drm/panfrost/panfrost_mmu.c |  1 +
 5 files changed, 56 insertions(+), 29 deletions(-)

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


Re: [PATCH] drm/amd/display: Fix potential null dereference

2023-04-05 Thread Hamza Mahfooz

On 4/3/23 09:10, Igor Artemiev wrote:

The adev->dm.dc pointer can be NULL and dereferenced in amdgpu_dm_fini()
without checking.

Add a NULL pointer check before calling dc_dmub_srv_destroy().

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 9a71c7d31734 ("drm/amd/display: Register DMUB service with DC")
Signed-off-by: Igor Artemiev 


Applied, thanks!


---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index a01fd41643fc..27f7a554874e 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1854,7 +1854,8 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
dc_deinit_callbacks(adev->dm.dc);
  #endif
  
-	dc_dmub_srv_destroy(&adev->dm.dc->ctx->dmub_srv);
+	if (adev->dm.dc)
+		dc_dmub_srv_destroy(&adev->dm.dc->ctx->dmub_srv);
  
 	if (dc_enable_dmub_notifications(adev->dm.dc)) {
 		kfree(adev->dm.dmub_notify);

--
Hamza



Re: [PATCH] drm/scheduler: Fix UAF race in drm_sched_entity_push_job()

2023-04-05 Thread Luben Tuikov
On 2023-04-05 12:37, Asahi Lina wrote:
> After a job is pushed into the queue, it is owned by the scheduler core
> and may be freed at any time, so we can't write nor read the submit
> timestamp after that point.
> 
> Fixes oopses observed with the drm/asahi driver, found with kASAN.
> 
> Signed-off-by: Asahi Lina 
> ---

That's a good fix! Thanks for fixing this. I'll push
this through drm-misc-next and drm-misc-fixes, if there
are no objections.

Reviewed-by: Luben Tuikov 

Regards,
Luben

>  drivers/gpu/drm/scheduler/sched_entity.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 15d04a0ec623..e0a8890a62e2 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -507,12 +507,19 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>  {
>   struct drm_sched_entity *entity = sched_job->entity;
>   bool first;
> + ktime_t submit_ts;
>  
>   trace_drm_sched_job(sched_job, entity);
>   atomic_inc(entity->rq->sched->score);
>   WRITE_ONCE(entity->last_user, current->group_leader);
> +
> + /*
> +  * After the sched_job is pushed into the entity queue, it may be
> +  * completed and freed up at any time. We can no longer access it.
> +  * Make sure to set the submit_ts first, to avoid a race.
> +  */
> +	sched_job->submit_ts = submit_ts = ktime_get();
> 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
> - sched_job->submit_ts = ktime_get();
>  
>   /* first job wakes up scheduler */
>   if (first) {
> @@ -529,7 +536,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> 	spin_unlock(&entity->rq_lock);
>  
>   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> - drm_sched_rq_update_fifo(entity, sched_job->submit_ts);
> + drm_sched_rq_update_fifo(entity, submit_ts);
>  
>   drm_sched_wakeup(entity->rq->sched);
>   }
> 
> ---
> base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
> change-id: 20230406-scheduler-uaf-2-44cf8faed245
> 
> Thank you,
> ~~ Lina
> 



Re: [PATCH] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

2023-04-05 Thread Luben Tuikov
On 2023-04-05 12:34, Asahi Lina wrote:
> A signaled scheduler fence can outlive its scheduler, since fences are
> independently reference counted. Therefore, we can't reference the
> scheduler in the get_timeline_name() implementation.
> 
> Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
> dma-bufs reference fences from GPU schedulers that no longer exist.
> 
> Signed-off-by: Asahi Lina 
> ---

Thanks for fixing this. If there's no objections, I'll add my tag
and I'll push this tomorrow. I think this should go in through
drm-misc-fixes and drm-misc-next.

Reviewed-by: Luben Tuikov 

Regards,
Luben

>  drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-
>  drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
>  include/drm/gpu_scheduler.h  | 5 +
>  3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 15d04a0ec623..8b3b949b2ce8 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -368,7 +368,12 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>  
>   /*
>* Fence is from the same scheduler, only need to wait for
> -  * it to be scheduled
> +  * it to be scheduled.
> +  *
> +  * Note: s_fence->sched could have been freed and reallocated
> +  * as another scheduler. This false positive case is okay, as if
> +  * the old scheduler was freed all of its jobs must have
> +  * signaled their completion fences.
>*/
> 	fence = dma_fence_get(&s_fence->scheduled);
>   dma_fence_put(entity->dependency);
> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> index 7fd869520ef2..33b145dfa38c 100644
> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> @@ -66,7 +66,7 @@ static const char *drm_sched_fence_get_driver_name(struct dma_fence *fence)
>  static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
>  {
>   struct drm_sched_fence *fence = to_drm_sched_fence(f);
> - return (const char *)fence->sched->name;
> + return (const char *)fence->sched_name;
>  }
>  
>  static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
> @@ -168,6 +168,8 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
>   unsigned seq;
>  
>   fence->sched = entity->rq->sched;
> + strlcpy(fence->sched_name, entity->rq->sched->name,
> + sizeof(fence->sched_name));
> 	seq = atomic_inc_return(&entity->fence_seq);
> 	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
> 		       &fence->lock, entity->fence_context, seq);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 9db9e5e504ee..49f019731891 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -295,6 +295,11 @@ struct drm_sched_fence {
>   * @lock: the lock used by the scheduled and the finished fences.
>   */
>   spinlock_t  lock;
> +/**
> + * @sched_name: the name of the scheduler that owns this fence. We
> + * keep a copy here since fences can outlive their scheduler.
> + */
> + char sched_name[16];
>  /**
>   * @owner: job owner for debugging
>   */
> 
> ---
> base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
> change-id: 20230406-scheduler-uaf-1-994ec34cac93
> 
> Thank you,
> ~~ Lina
> 



RE: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans

2023-04-05 Thread Zeng, Oak
Hi,

Using dma-fence for completion/dependency tracking for long-run workloads (more
precisely, on-demand paging/page-fault-enabled workloads) can cause deadlock.
This seems to be the significant issue here. Other issues, such as the drm
scheduler completion order implication etc., are minor ones which can be solved
inside the framework of the drm scheduler. We need to evaluate the paths below:

	1) still use the drm scheduler for job submission, and use dma-fence for
job completion waiting/dependency tracking. This is the solution proposed in
this series. Annotate dma-fence for long-run workloads: the user can still wait
on a dma-fence for job completion but can't wait on a dma-fence while holding
any memory management locks. We still use dma-fence for dependency tracking.
But it just very easily runs into deadlock when on-demand paging is in the
picture. The annotation helps us to detect deadlocks but not solve deadlock
problems. Seems *not* a complete solution: it is almost impossible to
completely avoid dependency deadlocks in a complex runtime environment.

	2) Still use the drm scheduler but not use dma-fence for completion
signaling and dependency tracking. This way we still get some free functions
(reset, err handling, ring flow control as Matt said) from the drm scheduler,
but push the dependency/completion tracking completely to user space using
techniques such as user space fences. User space doesn't have a chance to wait
on a fence while holding a kernel memory management lock, thus the dma-fence
deadlock issue is solved.

3) Completely discard drm scheduler and dma-fence for long-run 
workload. Use user queue/doorbell for super fast submission, directly interact 
with fw scheduler. Use user fence for completion/dependency tracking.

Thanks,
Oak

> -Original Message-
> From: Christian König 
> Sent: April 5, 2023 3:30 AM
> To: Brost, Matthew ; Zeng, Oak
> 
> Cc: dri-devel@lists.freedesktop.org; intel...@lists.freedesktop.org;
> robdcl...@chromium.org; thomas.hellst...@linux.intel.com; airl...@linux.ie;
> l...@asahilina.net; boris.brezil...@collabora.com; 
> faith.ekstr...@collabora.com
> Subject: Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload
> plans
> 
> On 04.04.23 20:08, Matthew Brost wrote:
> > On Tue, Apr 04, 2023 at 12:02:03PM -0600, Zeng, Oak wrote:
> >> Hi Matt, Thomas,
> >>
> >> Some very bold out of box thinking in this area:
> >>
> >> 1. so you want to use drm scheduler and dma-fence for long running 
> >> workload.
> Why do you want to do this in the first place? What is the benefit? Drm 
> scheduler is
> pretty much a software scheduler. Modern gpu has scheduler built at fw/hw
> level, as you said below for intel this is Guc. Can xe driver just directly 
> submit job
> to Guc, bypassing drm scheduler?
> >>
> > If we did that now we have 2 paths for dependency tracking, flow controlling
> > the ring, resets / error handling / backend submission implementations.
> > We don't want this.
> 
> Well exactly that's the point: Why?
> 
> As far as I can see those are two completely distinct use cases, so you
> absolutely do want two completely distinct implementations for this.
> 
> >> 2. using dma-fence for long run workload: I am well aware that page fault 
> >> (and
> the consequent memory allocation/lock acquiring to fix the fault) can cause
> deadlock for a dma-fence wait. But I am not convinced that dma-fence can't be
> used purely because the nature of the workload that it runs very long 
> (indefinite).
> I did a math: the dma_fence_wait_timeout function's third param is the timeout
> which is a signed long type. If HZ is 1000, this is about 23 days. If 23 days 
> is not long
> enough, can we just change the timeout parameter to signed 64 bits so it is 
> much
> longer than our life time...
> >>
> >> So I mainly argue we can't use dma-fence for long-run workload is not
> because the workload runs very long, rather because of the fact that we use
> page fault for long-run workload. If we enable page fault for short-run 
> workload,
> we can't use dma-fence either. Page fault is the key thing here.
> >>
> >> Now since we use page fault which is *fundamentally* controversial with
> dma-fence design, why not just introduce an independent concept such as user-
> fence instead of extending existing dma-fence?
> >>
> >> I like unified design. If drm scheduler, dma-fence can be extended to work 
> >> for
> everything, it is beautiful. But seems we have some fundamental problem here.
> >>
> > Thomas's patches turn a dma-fence into KMD sync point (e.g. we just use
> > the signal / CB infrastructure) and enforce we don't use use these
> > dma-fences from the scheduler in memory reclaim paths or export these to
> > user space or other drivers. Think of this mode as SW only fence.
> 
> Yeah and I truly think this is an really bad idea.
> 
> The signal/CB infrastructure in the dma_fence turned out to be the
> absolute nightmare I initially predicted. Sorry to say that, but in
> this 

Re: [Intel-gfx] [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers

2023-04-05 Thread Patrik Jakobsson
On Wed, Apr 5, 2023 at 7:15 PM Daniel Vetter  wrote:
>
> On Wed, 5 Apr 2023 at 18:54, Javier Martinez Canillas
>  wrote:
> >
> > Daniel Vetter  writes:
> >
> > > On Wed, Apr 05, 2023 at 04:32:19PM +0200, Thomas Zimmermann wrote:
> >
> > [...]
> >
> > >> > > >/*
> > >> > > > * WARNING: Apparently we must kick fbdev drivers before 
> > >> > > > vgacon,
> > >> > > > * otherwise the vga fbdev driver falls over.
> > >> > > > */
> > >> > > >ret = vga_remove_vgacon(pdev);
> > >> >
> > >> > This isn't enough, we also nuke stuff that's mapping the vga fb range.
> >
> > Ah, also need aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE) 
> > then.
> >
> > [...]
> >
> > >> int aperture_remove_legacy_vga_devices(struct pci_dev *pdev)
> > >> {
> > >>  aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE);
> > >>
> > >>  return vga_remove_vgacon(pdev);
> > >> }
> > >>
> > >> And that can be called from gma500 and the pci aperture helper.
> > >
> > > But you still pass a pci_dev to that helper. Which just doesn't make any
> > > sense to me (assuming your entire point is that this isn't just a normal
> > > pci device but some special legacy vga thing), but if we go with (void)
> > > then there's more refactoring to do because the vga_remove_vgacon also
> > > wants a pdev.
> > >
> > > All so that we don't call aperture_detach_devices() on a bunch of pci
> > > bars, which apparently is not a problem for any other driver, but absolutely
> > > is a huge problem for gma500 somehow.
> > >
> > > I don't understand why.
> > >
> >
> > Yeah, agreed that if vga_remove_vgacon() isn't enough and another helper
> > is needed then starts to get a little silly. Maybe one option is to add a
> > 3rd param to aperture_remove_conflicting_pci_devices() and skip the logic
> > to iterate over PCI bars and call aperture_remove_conflicting_devices() ?
>
> The thing I don't get: Why does this matter for gma500 and not any of
> the other pci devices? Look at your gpu, realize there's a lot more
> than the one pci bar for vram or stolen memory, realize that we're
> nuking bars that cannot possibly contain the framebuffer for everyone
> else too. Like the entire "gpus have a lot of bars" thing is the
> reason why I pulled the sysfb_disable one level up, because we've been
> doing that quite a few times before this patch (yes it's not the main
> thing, but the side-effect cleanup is why I've gone down this rabbit
> hole and wrote the entire series here):
>
> https://lore.kernel.org/dri-devel/20230404201842.567344-7-daniel.vet...@ffwll.ch/
>
> But somehow for gma500 it's a problem, while for everyone else it's
> fine. That's the part I don't get, or Thomas and I have been talking past
> each other and there's another issue that I'm missing.
> -Daniel

I'm also getting confused here.

AFAIK the stolen memory works the same for gma500 hardware as other
Intel GPUs. Are you saying that there is a difference in how gma500
hardware works? I always assumed that i915 got away with not dealing
much with stolen memory because it simply doesn't use it for
allocations. In gma500 we use it for fbdev and cursors. The actual
pages reserved by the BIOS can be accessed through a pci bar if you
map it that way (which IIRC we do), but I suppose that doesn't help
with identifying it as a range reserved by other drivers.

The reason I've kept the stolen allocation logic is that some
gma500 systems don't have a lot of memory. But that is mostly the old
Poulsbo systems. Perhaps it is time to ditch the stolen allocation
code?

-Patrik

>
> > > Consider this me throwing in the towel. If you are convinced this
> > > makes sense please type it up and merge it, but I'm not going to type
> > > something that just doesn't make sense to me.
> >
> > Honestly, I would just go with the double drm_aperture_remove_*() helper
> > calls (your original patch) unless that causes real issues. There is no
> > point in blocking all your series just for this IMO.
> >
> > Then later, if Thomas has strong opinions, he can send a follow-up patch for
> > the gma500 driver and the aperture helpers.
> >
> > > -Daniel
> > >
> >
> > --
> > Best regards,
> >
> > Javier Martinez Canillas
> > Core Platforms
> > Red Hat
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [Regression] drm/scheduler: track GPU active time per entity

2023-04-05 Thread Lucas Stach
Hi Luben,

On Tuesday, 2023-04-04 at 00:31 -0400, Luben Tuikov wrote:
> On 2023-03-28 04:54, Lucas Stach wrote:
> > Hi Danilo,
> > 
> > On Tuesday, 2023-03-28 at 02:57 +0200, Danilo Krummrich wrote:
> > > Hi all,
> > > 
> > > Commit df622729ddbf ("drm/scheduler: track GPU active time per entity") 
> > > tries to track the accumulated time that a job was active on the GPU 
> > > writing it to the entity through which the job was deployed to the 
> > > scheduler originally. This is done within drm_sched_get_cleanup_job() 
> > > which fetches a job from the scheduler's pending_list.
> > > 
> > > Doing this can result in a race condition where the entity is already 
> > > freed, but the entity's newly added elapsed_ns field is still accessed 
> > > once the job is fetched from the pending_list.
> > > 
> > > After drm_sched_entity_destroy() being called it should be safe to free 
> > > the structure that embeds the entity. However, a job originally handed 
> > > over to the scheduler by this entity might still reside in the 
> > > scheduler's pending_list for cleanup after drm_sched_entity_destroy() 
> > > already being called and the entity being freed. Hence, we can run into 
> > > a UAF.
> > > 
> > Sorry about that, I clearly didn't properly consider this case.
> > 
> > > In my case it happened that a job, as explained above, was just picked 
> > > from the scheduler's pending_list after the entity was freed due to the 
> > > client application exiting. Meanwhile this freed up memory was already 
> > > allocated for a subsequent client application's job structure again. 
> > > Hence, the new job's memory got corrupted. Luckily, I was able to 
> > > reproduce the same corruption over and over again by just using 
> > > deqp-runner to run a specific set of VK test cases in parallel.
> > > 
> > > Fixing this issue doesn't seem to be very straightforward though (unless 
> > > I miss something), which is why I'm writing this mail instead of sending 
> > > a fix directly.
> > > 
> > > Spontaneously, I see three options to fix it:
> > > 
> > > 1. Rather than embedding the entity into driver specific structures 
> > > (e.g. tied to file_priv) we could allocate the entity separately and 
> > > reference count it, such that it's only freed up once all jobs that were 
> > > deployed through this entity are fetched from the scheduler's pending list.
> > > 
> > My vote is on this or something in similar vain for the long term. I
> > have some hope to be able to add a GPU scheduling algorithm with a bit
> > more fairness than the current one sometime in the future, which
> > requires execution time tracking on the entities.
> 
> Danilo,
> 
> Using kref is preferable, i.e. option 1 above.
> 
> Lucas, can you shed some light on,
> 
> 1. In what way the current FIFO scheduling is unfair, and
> 2. shed some details on this "scheduling algorithm with a bit
> more fairness than the current one"? 

I don't have a specific implementation in mind yet. However the current
FIFO algorithm can be very unfair if you have a sparse workload competing
with one that generates a lot of jobs without any throttling aside from
the entity queue length. By tracking the actual GPU time consumed by
the entities we could implement something with a bit more fairness like
deficit round robin (don't pin me on the specific algorithm, as I
haven't given it much thought yet).
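
To illustrate the idea (a toy userspace sketch, all names made up; this
is not drm/sched code): each runnable entity earns a fixed time quantum
per round, and only gets to run once its earned credit covers the GPU
time its next job is expected to consume.

#include <stdint.h>
#include <stddef.h>

struct entity {
        int64_t deficit_ns;   /* earned but not yet spent GPU time */
        int64_t next_cost_ns; /* estimated cost of the entity's next job */
        int     has_job;
};

#define QUANTUM_NS 2000000LL  /* 2 ms of credit per round */

/* Credit each runnable entity one quantum and run the first one whose
 * credit covers its next job, charging the job's cost to the credit.
 * A heavy submitter burns its credit quickly and has to wait, while a
 * sparse workload accumulates credit and gets through promptly. */
static struct entity *drr_pick(struct entity *e, size_t n)
{
        for (size_t i = 0; i < n; i++) {
                if (!e[i].has_job)
                        continue;
                e[i].deficit_ns += QUANTUM_NS;
                if (e[i].deficit_ns >= e[i].next_cost_ns) {
                        e[i].deficit_ns -= e[i].next_cost_ns;
                        return &e[i];
                }
        }
        return NULL;  /* nobody has earned enough credit yet */
}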

Regards,
Lucas


Re: [Regression] drm/scheduler: track GPU active time per entity

2023-04-05 Thread Danilo Krummrich

On 4/5/23 18:09, Luben Tuikov wrote:

On 2023-04-05 10:05, Danilo Krummrich wrote:

On 4/4/23 06:31, Luben Tuikov wrote:

On 2023-03-28 04:54, Lucas Stach wrote:

Hi Danilo,

On Tuesday, 2023-03-28 at 02:57 +0200, Danilo Krummrich wrote:

Hi all,

Commit df622729ddbf ("drm/scheduler: track GPU active time per entity")
tries to track the accumulated time that a job was active on the GPU
writing it to the entity through which the job was deployed to the
scheduler originally. This is done within drm_sched_get_cleanup_job()
which fetches a job from the scheduler's pending_list.

Doing this can result in a race condition where the entity is already
freed, but the entity's newly added elapsed_ns field is still accessed
once the job is fetched from the pending_list.

After drm_sched_entity_destroy() being called it should be safe to free
the structure that embeds the entity. However, a job originally handed
over to the scheduler by this entity might still reside in the
scheduler's pending_list for cleanup after drm_sched_entity_destroy()
already being called and the entity being freed. Hence, we can run into
a UAF.


Sorry about that, I clearly didn't properly consider this case.


In my case it happened that a job, as explained above, was just picked
from the scheduler's pending_list after the entity was freed due to the
client application exiting. Meanwhile this freed up memory was already
allocated for a subsequent client application's job structure again.
Hence, the new job's memory got corrupted. Luckily, I was able to
reproduce the same corruption over and over again by just using
deqp-runner to run a specific set of VK test cases in parallel.

Fixing this issue doesn't seem to be very straightforward though (unless
I miss something), which is why I'm writing this mail instead of sending
a fix directly.

Spontaneously, I see three options to fix it:

1. Rather than embedding the entity into driver specific structures
(e.g. tied to file_priv) we could allocate the entity separately and
reference count it, such that it's only freed up once all jobs that were
deployed through this entity are fetched from the scheduler's pending list.


My vote is on this or something in similar vain for the long term. I
have some hope to be able to add a GPU scheduling algorithm with a bit
more fairness than the current one sometime in the future, which
requires execution time tracking on the entities.


Danilo,

Using kref is preferable, i.e. option 1 above.


I think the only real motivation for doing that would be for generically
tracking job statistics within the entity a job was deployed through. If
we all agree on tracking job statistics this way I am happy to prepare a
patch for this option and drop this one:
https://lore.kernel.org/all/20230331000622.4156-1-d...@redhat.com/T/#u


Hmm, I never thought about "job statistics" when I preferred using kref above.
The reason kref is attractive is because one doesn't need to worry about
it--when the last user drops the kref, the release is called to do
housekeeping. If this never happens, we know that we have a bug to debug.
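
To spell out option 1 as I read it (just a sketch; the driver type and
helpers are made up): the entity is allocated on its own and
kref-counted, every job deployed through it holds a reference, and the
embedding structure is only freed once the last pending job has been
fetched.

struct my_entity {
        struct drm_sched_entity base;
        struct kref refcount;
};

static void my_entity_release(struct kref *kref)
{
        kfree(container_of(kref, struct my_entity, refcount));
}

/* on job submission */
kref_get(&me->refcount);

/* once the job is fetched from the pending_list and freed */
kref_put(&me->refcount, my_entity_release);

/* on file close: tear down the entity, then drop the initial reference */
drm_sched_entity_destroy(&me->base);
kref_put(&me->refcount, my_entity_release);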



I tied it to the use case of (accumulated) job statistics since I think 
using kref might only make sense if we have a reason why a job needs to 
access the entity it was deployed through. And the only real reason to 
keep this connection seems to be (accumulated) job statistics.


If it is only about allowing drivers to derive a driver specific entity 
structure from a job I think it might depend on whether this is a 
"typical application" nearly every driver does (which it seems not to 
be) or if it is an exception.


In the latter case the driver could still store the corresponding 
pointers in its driver specific structures and take care of not freeing 
the structure the entity is embedded in until it is safe to do so.


Since it was already tried twice to generically get (accumulated) job 
statistics into entities personally I think doing so plus using kref 
would be quite nice. However, I'd like to be sure this fits for 
everyone's use cases.



Regarding the patch above--I did look around the code, and it seems safe
as per your analysis; I didn't see any reference to the entity after job
submission, but I'll comment on that thread as well for the record.

Regards,
Luben



Christian mentioned amdgpu tried something similar to what Lucas tried,
ran into similar trouble, backed off and implemented it in another
way - a driver-specific way I guess?



Lucas, can you shed some light on,

1. In what way the current FIFO scheduling is unfair, and
2. shed some details on this "scheduling algorithm with a bit
more fairness than the current one"?

Regards,
Luben




2. Somehow make sure drm_sched_entity_destroy() does block until all
jobs deployed through this entity were fetched from the schedulers
pending list. Though, I'm pretty sure that this is not really desirable.

3. Just revert the change and let drivers implement tracking of GPU
active times 

Re: [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers

2023-04-05 Thread Javier Martinez Canillas
Daniel Vetter  writes:

> On Wed, 5 Apr 2023 at 18:54, Javier Martinez Canillas
>  wrote:
>>
>> Daniel Vetter  writes:

[...]

>>
>> Yeah, agreed that if vga_remove_vgacon() isn't enough and another helper
>> is needed then starts to get a little silly. Maybe one option is to add a
>> 3rd param to aperture_remove_conflicting_pci_devices() and skip the logic
>> to iterate over PCI bars and call aperture_remove_conflicting_devices() ?
>
> The thing I don't get: Why does this matter for gma500 and not any of
> the other pci devices? Look at your gpu, realize there's a lot more

Yes, I don't know why gma500 is special in that sense but I'm not familiar
enough with that hardware to have an opinion on this.

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var

2023-04-05 Thread Javier Martinez Canillas
Daniel Vetter  writes:

> On Wed, Apr 05, 2023 at 06:27:17PM +0200, Javier Martinez Canillas wrote:
>> Daniel Vetter  writes:

[...]

>> >
>> > The __fill_var is after this. I'm honestly not sure what the exact
>> 
>> Ah, your patch adds it after that indeed. Please ignore my comment then.
>
> So rb: you?
>

Yes, I already provided it in my previous email and it has been picked up
by patchwork. I could do it again but it would probably confuse dim when applying.

The only patch from your series that is missing an {r,a}b is #1 right now:

https://patchwork.kernel.org/project/dri-devel/list/?series=736966=both

[...]

>> > What I'm wondering now is whether too small x/yres won't lead to problems
>> > of some sorts ... For multi-screen we set the virtual size to be big
>> > enough for all crtc, and then just set x/yres to be the smallest output.
>> > That way fbcon knows to only draw as much as is visible on all screens.
>> > But if you then pan that too much, the bigger screens might not have a big
>> > enough buffer anymore and things fail (but shouldn't).
>> >
>> > Not sure how to fix that tbh.
>> 
>> Would this be a problem in practice?
>
> I'm frankly not sure. You'd get a black screen for fbcon/fbdev across all
> outputs, but only if you have userspace doing this intentionally.
>
> In a way it's just another artifact of the drm fbdev emulation not using
> ATOMIC_TEST_ONLY in the various places where it should, and so doesn't
> really know whether a configuration change will work out.
>
> We already have this in obscure multi-monitor cases where adding another
> screen kills fbcon, because the display hw is running out of fifo or
> clocks or whatever, and because the drm fbdev code doesn't check but just
> blindly commits the entire thing as an atomic commit, the overall commit
> fails.
>
> This worked "better" with legacy kms because there we commit per-crtc, so
> if any specific crtc runs into a limit check, only that one fails to light
> up.
>
> Imo given that no one cared enough yet to write up atomic TEST_ONLY
> support for fbdev emulation I think we can continue to just ignore this
> problem.
>

Agreed. If that ends being a problem for people in practice then I guess
someone can type atomic TEST_ONLY support for the fbdev emulation layer.
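
Roughly, that would mean checking the configuration before applying it,
something like this in the commit path (hypothetical sketch, nobody has
typed this yet):

/* build the drm_atomic_state for all pipes as today, then: */
ret = drm_atomic_check_only(state);
if (ret)
        return ret;  /* reject the requested var instead of a dead fbcon */

ret = drm_atomic_commit(state);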

> What should not happen is that fbcon code blows up drawing out of bounds
> or something like that, resulting in a kernel crash. So from that pov I
> think it's "safe" :-)

Great. Thanks a lot for your explanations.

> -Daniel

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH] drm/scheduler: set entity to NULL in drm_sched_entity_pop_job()

2023-04-05 Thread Luben Tuikov
On 2023-03-31 01:59, Christian König wrote:
> On 2023-03-31 at 02:06, Danilo Krummrich wrote:
>> It already happened a few times that patches slipped through which
>> implemented access to an entity through a job that was already removed
>> from the entities queue. Since jobs and entities might have different
>> lifecycles, this can potentially cause UAF bugs.
>>
>> In order to make it obvious that a job's entity pointer shouldn't be
>> accessed after drm_sched_entity_pop_job() was called successfully, set
>> the job's entity pointer to NULL once the job is removed from the entity
>> queue.
>>
>> Moreover, debugging a potential NULL pointer dereference is way easier
>> than potentially corrupted memory through a UAF.
>>
>> Signed-off-by: Danilo Krummrich 
> 
> In general "YES PLEASE!", but I fear that this will break amdgpu's reset 
> sequence.
> 
> On the other hand when amdgpu still relies on that pointer it's clearly 
> a bug (which I pointed out tons of times before).
> 
> Luben any opinion on that? Could you drive cleaning that up as well?

I didn't find any references to scheduling entity after the job
is submitted to the hardware. (I commented the same in the other
thread, we just need to decide which way to go.)

Regards,
Luben

> 
> Thanks,
> Christian.
> 
>> ---
>> I'm aware that drivers could already use job->entity in arbitrary places, 
>> since
>> they are in control of when the entity is actually freed. A quick grep didn't 
>> give
>> me any results where this would actually be the case, however maybe I also 
>> just
>> didn't catch it.
>>
>> If, therefore, we don't want to set job->entity to NULL I think we should at
>> least add a comment somewhere.
>> ---
>>
>>   drivers/gpu/drm/scheduler/sched_entity.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
>> b/drivers/gpu/drm/scheduler/sched_entity.c
>> index 15d04a0ec623..a9c6118e534b 100644
>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>> @@ -448,6 +448,12 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
>> drm_sched_entity *entity)
>>  drm_sched_rq_update_fifo(entity, next->submit_ts);
>>  }
>>   
>> +/* Jobs and entities might have different lifecycles. Since we're
>> + * removing the job from the entities queue, set the jobs entity pointer
>> + * to NULL to prevent any future access of the entity through this job.
>> + */
>> +sched_job->entity = NULL;
>> +
>>  return sched_job;
>>   }
>>   
> 



Re: [RFC PATCH 02/10] drm/sched: Move schedule policy to scheduler / entity

2023-04-05 Thread Luben Tuikov
Hi,

Inlined:

On 2023-04-03 20:22, Matthew Brost wrote:
> Rather than a global modparam for scheduling policy, move the scheduling
> policy to scheduler / entity so user can control each scheduler / entity
> policy.
> 
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
>  drivers/gpu/drm/etnaviv/etnaviv_sched.c|  3 ++-
>  drivers/gpu/drm/lima/lima_sched.c  |  3 ++-
>  drivers/gpu/drm/msm/msm_ringbuffer.c   |  3 ++-
>  drivers/gpu/drm/panfrost/panfrost_job.c|  3 ++-
>  drivers/gpu/drm/scheduler/sched_entity.c   | 25 ++
>  drivers/gpu/drm/scheduler/sched_main.c | 21 +-
>  drivers/gpu/drm/v3d/v3d_sched.c| 15 -
>  include/drm/gpu_scheduler.h| 23 ++--
>  9 files changed, 73 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 00c9c03c8f94..4df0fca5a74c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2368,6 +2368,7 @@ static int amdgpu_device_init_schedulers(struct 
> amdgpu_device *adev)
>  ring->num_hw_submission, 
> amdgpu_job_hang_limit,
>  timeout, adev->reset_domain->wq,
>  ring->sched_score, ring->name,
> +DRM_SCHED_POLICY_DEFAULT,
>  adev->dev);
>   if (r) {
>   DRM_ERROR("Failed to create scheduler on ring %s.\n",
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index 8486a2923f1b..61204a3f8b0b 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -136,7 +136,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>   ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL,
>etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>msecs_to_jiffies(500), NULL, NULL,
> -  dev_name(gpu->dev), gpu->dev);
> +  dev_name(gpu->dev), DRM_SCHED_POLICY_DEFAULT,
> +  gpu->dev);
>   if (ret)
>   return ret;
>  
> diff --git a/drivers/gpu/drm/lima/lima_sched.c 
> b/drivers/gpu/drm/lima/lima_sched.c
> index 54f53bece27c..33042ba6ae93 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -491,7 +491,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, 
> const char *name)
>   return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1,
> lima_job_hang_limit,
> msecs_to_jiffies(timeout), NULL,
> -   NULL, name, pipe->ldev->dev);
> +   NULL, name, DRM_SCHED_POLICY_DEFAULT,
> +   pipe->ldev->dev);
>  }
>  
>  void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c 
> b/drivers/gpu/drm/msm/msm_ringbuffer.c
> index 5879fc262047..f408a9097315 100644
> --- a/drivers/gpu/drm/msm/msm_ringbuffer.c
> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
> @@ -97,7 +97,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu 
> *gpu, int id,
>  
>   ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL,
>   num_hw_submissions, 0, sched_timeout,
> - NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev);
> + NULL, NULL, to_msm_bo(ring->bo)->name,
> + DRM_SCHED_POLICY_DEFAULT, gpu->dev->dev);
>   if (ret) {
>   goto fail;
>   }
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
> b/drivers/gpu/drm/panfrost/panfrost_job.c
> index f48b07056a16..effa48b33dce 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -819,7 +819,8 @@ int panfrost_job_init(struct panfrost_device *pfdev)
>nentries, 0,
>msecs_to_jiffies(JOB_TIMEOUT_MS),
>pfdev->reset.wq,
> -  NULL, "pan_js", pfdev->dev);
> +  NULL, "pan_js", DRM_SCHED_POLICY_DEFAULT,
> +  pfdev->dev);
>   if (ret) {
>   dev_err(pfdev->dev, "Failed to create scheduler: %d.", 
> ret);
>   goto err_sched;
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index 15d04a0ec623..f1299e51860b 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -33,6 +33,20 @@
>  #define to_drm_sched_job(sched_job)  \
>   container_of((sched_job), 

Re: [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 06:27:17PM +0200, Javier Martinez Canillas wrote:
> Daniel Vetter  writes:
> 
> [...]
> 
> >> 
> >> but only the 'var->xres > fb->width || var->yres > fb->height' from the
> >> conditions checked could be false after your __fill_var() call above.
> >> 
> >> You should drop the 'var->bits_per_pixel > bpp', 'var->xres_virtual >
> >> fb->width' and 'var->yres_virtual > fb->height' checks I believe since
> >> those will always be true.
> >
> > The __fill_var is after this. I'm honestly not sure what the exact
> 
> Ah, your patch adds it after that indeed. Please ignore my comment then.

So rb: you?

> > semantics are supposed to be, but essentially if userspace asks for too
> > big virtual size, we reject it. And for anything else we then tell it
> > (with __fill_var) how big the actually available space is.
> >
> > What I'm wondering now is whether too small x/yres won't lead to problems
> > of some sorts ... For multi-screen we set the virtual size to be big
> > enough for all crtc, and then just set x/yres to be the smallest output.
> > That way fbcon knows to only draw as much as is visible on all screens.
> > But if you then pan that too much, the bigger screens might not have a big
> > enough buffer anymore and things fail (but shouldn't).
> >
> > Not sure how to fix that tbh.
> 
> Would this be a problem in practice?

I'm frankly not sure. You'd get a black screen for fbcon/fbdev across all
outputs, but only if you have userspace doing this intentionally.

In a way it's just another artifact of the drm fbdev emulation not using
ATOMIC_TEST_ONLY in the various places where it should, and so doesn't
really know whether a configuration change will work out.

> We already have this in obscure multi-monitor cases where adding another
screen kills fbcon, because the display hw is running out of fifo or
clocks or whatever, and because the drm fbdev code doesn't check but just
blindly commits the entire thing as an atomic commit, the overall commit
fails.

This worked "better" with legacy kms because there we commit per-crtc, so
if any specific crtc runs into a limit check, only that one fails to light
up.

Imo given that no one cared enough yet to write up atomic TEST_ONLY
support for fbdev emulation I think we can continue to just ignore this
problem.

What should not happen is that fbcon code blows up drawing out of bounds
or something like that, resulting in a kernel crash. So from that pov I
think it's "safe" :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers

2023-04-05 Thread Daniel Vetter
On Wed, 5 Apr 2023 at 18:54, Javier Martinez Canillas
 wrote:
>
> Daniel Vetter  writes:
>
> > On Wed, Apr 05, 2023 at 04:32:19PM +0200, Thomas Zimmermann wrote:
>
> [...]
>
> >> > > >/*
> >> > > > * WARNING: Apparently we must kick fbdev drivers before 
> >> > > > vgacon,
> >> > > > * otherwise the vga fbdev driver falls over.
> >> > > > */
> >> > > >ret = vga_remove_vgacon(pdev);
> >> >
> >> > This isn't enough, we also nuke stuff that's mapping the vga fb range.
>
> Ah, also need aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE) 
> then.
>
> [...]
>
> >> int aperture_remove_legacy_vga_devices(struct pci_dev *pdev)
> >> {
> >>  aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE);
> >>
> >>  return vga_remove_vgacon(pdev);
> >> }
> >>
> >> And that can be called from gma500 and the pci aperture helper.
> >
> > But you still pass a pci_dev to that helper. Which just doesn't make any
> > sense to me (assuming your entire point is that this isn't just a normal
> > pci device but some special legacy vga thing), but if we go with (void)
> > then there's more refactoring to do because the vga_remove_vgacon also
> > wants a pdev.
> >
> > All so that we don't call aperture_detach_devices() on a bunch of pci
> > bars, which apparently is not a problem for any other driver, but absolutely
> > is a huge problem for gma500 somehow.
> >
> > I don't understand why.
> >
>
> Yeah, agreed that if vga_remove_vgacon() isn't enough and another helper
> is needed then starts to get a little silly. Maybe one option is to add a
> 3rd param to aperture_remove_conflicting_pci_devices() and skip the logic
> to iterate over PCI bars and call aperture_remove_conflicting_devices() ?

The thing I don't get: Why does this matter for gma500 and not any of
the other pci devices? Look at your gpu, realize there's a lot more
than the one pci bar for vram or stolen memory, realize that we're
> nuking bars that cannot possibly contain the framebuffer for everyone
else too. Like the entire "gpus have a lot of bars" thing is the
reason why I pulled the sysfb_disable one level up, because we've been
doing that quite a few times before this patch (yes it's not the main
thing, but the side-effect cleanup is why I've gone down this rabbit
hole and wrote the entire series here):

https://lore.kernel.org/dri-devel/20230404201842.567344-7-daniel.vet...@ffwll.ch/

But somehow for gma500 it's a problem, while for everyone else it's
> fine. That's the part I don't get, or Thomas and I have been talking past
> each other and there's another issue that I'm missing.
-Daniel

> > Consider this me throwing in the towel. If you are convinced this
> > makes sense please type it up and merge it, but I'm not going to type
> > something that just doesn't make sense to me.
>
> Honestly, I would just go with the double drm_aperture_remove_*() helper
> calls (your original patch) unless that causes real issues. There is no
> point in blocking all your series just for this IMO.
>
> Then later, if Thomas has strong opinions, he can send a follow-up patch for
> the gma500 driver and the aperture helpers.
>
> > -Daniel
> >
>
> --
> Best regards,
>
> Javier Martinez Canillas
> Core Platforms
> Red Hat
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH] drm/nouveau: Add a jump label in nouveau_gem_ioctl_pushbuf()

2023-04-05 Thread Markus Elfring
Date: Wed, 5 Apr 2023 18:38:54 +0200

The label “out_prevalid” was used to jump to another pointer check even
though, in one case, the implementation of the function
“nouveau_gem_ioctl_pushbuf” had already determined that the corresponding
variable contained an error pointer because of a failed call of the
function “u_memcpya”.

Thus use an additional label.

This issue was detected by using the Coccinelle software.

Fixes: 2be65641642ef423f82162c3a5f28c754d1637d2 ("drm/nouveau: fix relocations 
applying logic and a double-free")
Signed-off-by: Markus Elfring 
---
 drivers/gpu/drm/nouveau/nouveau_gem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index f77e44958037..d87e1cb2c933 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -814,7 +814,7 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void 
*data,
reloc = u_memcpya(req->relocs, req->nr_relocs, 
sizeof(*reloc));
if (IS_ERR(reloc)) {
ret = PTR_ERR(reloc);
-   goto out_prevalid;
+   goto out_free_bo;
}

goto revalidate;
@@ -929,6 +929,7 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void 
*data,
 out_prevalid:
if (!IS_ERR(reloc))
u_free(reloc);
+out_free_bo:
u_free(bo);
u_free(push);

--
2.40.0



Re: [PATCH RFC 02/18] rust: drm: Add Device and Driver abstractions

2023-04-05 Thread Daniel Vetter
On Tue, Mar 07, 2023 at 11:25:27PM +0900, Asahi Lina wrote:
> Add the initial abstractions for DRM drivers and devices. These go
> together in one commit since they are fairly tightly coupled types.
> 
> A few things have been stubbed out, to be implemented as further bits of
> the DRM subsystem are introduced.
> 
> Signed-off-by: Asahi Lina 

Ok so this is fairly fundamental lifetime fun and might be fairly
orthogonal to most of the things you actually want to do with a drm driver
(like implement gem or whatever). So separate mail.

So upfront short intro. There's 3 different lifetimes involved in building
a drm driver:

- struct drm_device. It's refcounted because it's fundamentally an uapi
  interface thing, and all the various uapi interfaces that build on top
  of this (drm_file, dma_buf, ...) need to hold references on it. It's
  supposed to survive for as long as userspace needs it or the underlying
  driver is bound, whichever is longer.

- struct device. Refcounted and good for dmesg printing, nothing else.
  Yes, because ...

- ... the actual hardware resource, in many places also represented by struct
  device. Not refcounted, instead it's limited by hotunplug or more
  precisely, how long your driver is bound to the struct device. You
  could make a case that in C this is represented by the bus specific type
  (e.g. platform_device), and the bus-specific hooks delineate the
  lifetime (for platform devices that's from ->probe to ->remove).
  Since there's no C type for this I'll call this hwdevice.

I think for rust it would be good if we model this a bit more precisely.
It might be possible to use the bus-specific types as the hwdevice,
but that's not entirely right either because each bus device is both
representing the hwdevice and the refcounted struct device.

Now onto lifetimes, or at least how this is usually handled.

- struct device should be obvious, the important part really is that the
  rust wrappers should not allow anything to be done with that which is
  tied to the hwdevice lifetime. Which is almost everything you want to do
  with a struct (platform_)device (aside from pure sw stuff like dmesg
  printing).

- the hwdevice is _not_ refcounted. I think in rust this maps to borrow
  semantics, to make sure that the reference only stays valid during a
  driver callback. The driver core/bus driver ensure that all the various
  callbacks (pm_ops, platform_driver, ...) finish before the ->remove
  callback starts.

- usually the link from hwdevice to drm_device is done with a
  refcounted drm_device stored with dev_set_drvdata. For rust it'd be nice
  if that's the Driver and fully typesafe and automatically cleaned up.

- which brings us to how hwdevice cleanup works in C: That's done with all
  the devm_ helpers for practically anything you might want to set up for
  hw access: mappings, interrupts, ... Note that there are also
  devm_*malloc functions; when drivers use them that's almost always a bug
  because generally allocations should stick around with the drm_device
  and not go down with the non-refcounted hwdevice lifetime.

  For added fun the bus/driver core also uses devm_ to manage things tied
  to the refcounted struct device, which works because devm_ nests and
  ->probe opens up a new devm_ bucket which is torn down at ->remove time.
  But luckily drivers should never deal with that, so for them (on the C
  side at least) devm_ is the right lifetime model for things tied to the
  hwdevice lifetime.

  For rust this means that we really should try to tie all the hw related
  things into devm or equivalent, and make both sure it's automatically
  cleaned up at that point, but also no later (because if you clean up hw
  stuff after ->remove you have a driver bug).

- Similarly on the drm_device side we have drmm_. You can have some
  refcounted things within the lifetime of drm_device (like dma_buf), but
  if things the driver creates survive past the point of drm_device, then
  probably something went really wrong. Either a leak or you'll blow up in
  the C code.

  So again for rust I think we should try to model this, and make sure
  (with borrow semantics and avoiding full refcounting like the plague in
  driver code) that driver structs and other sw things can't outlive the
  drm_device, but also don't hold it alive unduly.

- Since the point of a drm_device is to drive hardware, you need to be
  able to safely dereference the drm_device->dev pointer and know whether
  it's still a hwdevice (i.e. useful) or just a struct device because the
  hw is gone. That's done with drm_dev_enter/exit and making sure that
  ->remove calls drm_dev_unplug as the very first thing, before it starts
  tearing down hw resources like mappings, interrupts, ...

  On the C side we entirely rely on review for this, and it just doesn't
  work. Unless exhaustively tested, hotunplug just dies, and I think for
  more complex drivers this is something where Rust type enforcement 
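
For reference, the C-side pattern in question looks roughly like this
(a sketch; everything except drm_dev_enter()/drm_dev_exit() is made up):

void my_driver_poke_hw(struct my_device *mdev)
{
        int idx;

        if (!drm_dev_enter(&mdev->drm, &idx))
                return;  /* hw is unplugged, only the sw shell remains */

        writel(0x1, mdev->mmio + MY_REG);  /* hypothetical hw access */

        drm_dev_exit(idx);
}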

Re: [PATCH v10 2/2] drm: add kms driver for loongson display controller

2023-04-05 Thread Sui Jingfeng



On 2023/4/6 00:40, Sui Jingfeng wrote:

Hi,

On 2023/4/4 22:10, Emil Velikov wrote:

Greetings Sui Jingfeng,

I haven't been around drm-land for a while and this is the first
driver I skim through in a few years. So take the following
suggestions with a healthy pinch of salt.

Hope that helps o/

Emil, we love your reviews,
On Mon, 3 Apr 2023 at 18:13, Sui Jingfeng  
wrote:



   v7 -> v8:
    1) Zero compile warnings on 32-bit platform, compiling with W=1
    2) Revise lsdc_bo_gpu_offset() and minor cleanup
   3) Pageflip tested on the virtual terminal with the following commands

   modetest -M loongson -s 32:1920x1080 -v
   modetest -M loongson -s 34:1920x1080 -v -F tiles


I could be wrong, but my understanding is that new drivers should be
capable of running under Xorg and/or Wayland compositor. There is also
the IGT test suite, which can help verify and validate the driver's
behaviour:

https://www.kernel.org/doc/html/latest/gpu/drm-uapi.html


It may take some more time to give a full answer to all of your concerns.

Currently, the drm/loongson driver does work under Xorg (the X server);

link [1] is a short video which shows that the driver actually
works very well.


Note that it uses the generic modesetting driver in userspace.

We could provide more videos if necessary.


We are carrying out the IGT test suite; we will report the results once
it finishes on our platform. Thanks for providing such valuable
information.


[1] 
https://raw.githubusercontent.com/loongson-gfx/loongson_boards/main/videos/drm_loongson_under_xserver.mp4



I was not familiar with it before; previously we only focused on the basic
unit tests that come with libdrm.



I will answer the rest of the questions later; please wait a moment.




+static void lsdc_crtc_reset(struct drm_crtc *crtc)
+{
+   struct lsdc_display_pipe *dispipe = crtc_to_display_pipe(crtc);
+   struct drm_device *ddev = crtc->dev;
+   struct lsdc_device *ldev = to_lsdc(ddev);
+   struct lsdc_crtc_state *priv_crtc_state;
+   unsigned int index = dispipe->index;
+   u32 val;
+
+   val = LSDC_PF_XRGB | CFG_RESET_N;
+   if (ldev->descp->chip == CHIP_LS7A2000)
+   val |= LSDC_DMA_STEP_64_BYTES;
+
+   lsdc_crtc_wreg32(ldev, LSDC_CRTC0_CFG_REG, index, val);
+
+   if (ldev->descp->chip == CHIP_LS7A2000) {
+   val = PHY_CLOCK_EN | PHY_DATA_EN;
+   lsdc_crtc_wreg32(ldev, LSDC_CRTC0_PANEL_CONF_REG, 
index, val);

+   }
+

AFAICT no other driver touches the HW in their reset callback. Should
the above be moved to another callback?




+static void lsdc_crtc_atomic_enable(struct drm_crtc *crtc,
+   struct drm_atomic_state *state)
+{
+   val = lsdc_crtc_rreg32(ldev, LSDC_CRTC0_CFG_REG, index);
+   /* clear old dma step settings */
+   val &= ~CFG_DMA_STEP_MASK;
+
+   if (descp->chip == CHIP_LS7A2000) {
+   /*
+    * Using large dma step as much as possible,
+    * to improve hardware DMA efficiency.
+    */
+   if (width_in_bytes % 256 == 0)
+   val |= LSDC_DMA_STEP_256_BYTES;
+   else if (width_in_bytes % 128 == 0)
+   val |= LSDC_DMA_STEP_128_BYTES;
+   else if (width_in_bytes % 64 == 0)
+   val |= LSDC_DMA_STEP_64_BYTES;
+   else  /* width_in_bytes % 32 == 0 */
+   val |= LSDC_DMA_STEP_32_BYTES;
+   }
+
+   clk_func->update(pixpll, &priv_state->pparms);
+

Without knowing the hardware, the clk_func abstraction seems quite
arbitrary and unnecessary. It should be introduced when there is a
use-case for it.


+   lsdc_crtc_wreg32(ldev, LSDC_CRTC0_CFG_REG, index, val | 
CFG_OUTPUT_EN);

+
+   drm_crtc_vblank_on(crtc);
+}
+



--- /dev/null
+++ b/drivers/gpu/drm/loongson/lsdc_debugfs.c
+void lsdc_debugfs_init(struct drm_minor *minor)
+{
+#ifdef CONFIG_DEBUG_FS
+   drm_debugfs_create_files(lsdc_debugfs_list,
+ ARRAY_SIZE(lsdc_debugfs_list),
+    minor->debugfs_root,
+    minor);
+#endif
+}

Should probably build the file when debugfs is enabled and provide
no-op stub in the header. See nouveau for an example.



--- /dev/null
+++ b/drivers/gpu/drm/loongson/lsdc_drv.c
+static const struct lsdc_desc dc_in_ls7a1000 = {
+   .chip = CHIP_LS7A1000,
+   .num_of_crtc = LSDC_NUM_CRTC,
+   .max_pixel_clk = 20,
+   .max_width = 2048,
+   .max_height = 2048,
+   .num_of_hw_cursor = 1,
+   .hw_cursor_w = 32,
+   .hw_cursor_h = 32,
+   .pitch_align = 256,
+   .mc_bits = 40,
+   .has_vblank_counter = false,
+   .has_scan_pos = true,
+   .has_builtin_i2c = true,
+   .has_vram = true,
+   .has_hpd_reg = false,
+   .is_soc = false,
+};
+
+static const struct lsdc_desc dc_in_ls7a2000 = {
+

Re: [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers

2023-04-05 Thread Javier Martinez Canillas
Daniel Vetter  writes:

> On Wed, Apr 05, 2023 at 04:32:19PM +0200, Thomas Zimmermann wrote:

[...]

>> > > >/*
>> > > > * WARNING: Apparently we must kick fbdev drivers before vgacon,
>> > > > * otherwise the vga fbdev driver falls over.
>> > > > */
>> > > >ret = vga_remove_vgacon(pdev);
>> > 
>> > This isn't enough, we also nuke stuff that's mapping the vga fb range.

Ah, also need aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE) then.

[...]

>> int aperture_remove_legacy_vga_devices(struct pci_dev *pdev)
>> {
>>  aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE);
>> 
>>  return vga_remove_vgacon(pdev);
>> }
>> 
>> And that can be called from gma500 and the pci aperture helper.
>
> But you still pass a pci_dev to that helper. Which just doesn't make any
> sense to me (assuming your entire point is that this isn't just a normal
> pci device but some special legacy vga thing), but if we go with (void)
> then there's more refactoring to do because the vga_remove_vgacon also
> wants a pdev.
>
> All so that we don't call aperture_detach_devices() on a bunch of pci
> bars, which apparently is not a problem for any other driver, but absolutely
> is a huge problem for gma500 somehow.
>
> I don't understand why.
>

Yeah, agreed that if vga_remove_vgacon() isn't enough and another helper
is needed then starts to get a little silly. Maybe one option is to add a
3rd param to aperture_remove_conflicting_pci_devices() and skip the logic
to iterate over PCI bars and call aperture_remove_conflicting_devices() ?

> Consider this me throwing in the towel. If you are convinced this
> makes sense please type it up and merge it, but I'm not going to type
> something that just doesn't make sense to me.

Honestly, I would just go with the double drm_aperture_remove_*() helper
calls (your original patch) unless that causes real issues. There is no
point in blocking all your series just for this IMO.

Then later, if Thomas has strong opinions, he can send a follow-up patch for
the gma500 driver and the aperture helpers.
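
For clarity, the double helper call would look something like this in
gma500's probe path (a sketch under this thread's assumptions; the exact
signatures may shift as the series lands):

/* nuke everything on the legacy VGA range first ... */
ret = drm_aperture_remove_conflicting_framebuffers(VGA_FB_PHYS_BASE,
                                                   VGA_FB_PHYS_SIZE,
                                                   &driver);
if (ret)
        return ret;

/* ... then everything on the device's own PCI bars */
ret = drm_aperture_remove_conflicting_pci_framebuffers(pdev, &driver);
if (ret)
        return ret;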

> -Daniel
>

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH 01/19] drm/i915/i915_scatterlist: Fix kerneldoc formatting issue - missing '@'

2023-04-05 Thread Lee Jones
On Wed, 05 Apr 2023, Jani Nikula wrote:

> On Wed, 05 Apr 2023, Lee Jones  wrote:
> > On Tue, 04 Apr 2023, Jani Nikula wrote:
> >
> >> On Mon, 03 Apr 2023, Lee Jones  wrote:
> >> > On Mon, 03 Apr 2023, Jani Nikula wrote:
> >> >
> >> >> On Fri, 31 Mar 2023, Lee Jones  wrote:
> >> >> > Fixes the following W=1 kernel build warning(s):
> >> >> >
> >> >> >  drivers/gpu/drm/i915/i915_scatterlist.c:62: warning: Function 
> >> >> > parameter or member 'size' not described in 'i915_refct_sgt_init'
> >> >> >
> >> >> > Cc: Jani Nikula 
> >> >> > Cc: Joonas Lahtinen 
> >> >> > Cc: Rodrigo Vivi 
> >> >> > Cc: Tvrtko Ursulin 
> >> >> > Cc: David Airlie 
> >> >> > Cc: Daniel Vetter 
> >> >> > Cc: intel-...@lists.freedesktop.org
> >> >> > Cc: dri-devel@lists.freedesktop.org
> >> >> > Signed-off-by: Lee Jones 
> >> >>
> >> >> Thanks for the patches!
> >> >>
> >> >> Applied all but one of the drm/i915 patches to drm-intel-next or
> >> >> drm-intel-gt-next depending on the area. There were a couple of issues
> >> >> that I fixed while applying. There was a conflict with patch 5/19
> >> >> against drm-intel-gt-next so I left that one out.
> >> >
> >> > Thanks Jani.  I'll rebase and see what's left.
> >>
> >> We also took notice and aim to track this more aggressively [1].
> >
> > Thanks.
> >
> > I did clean up all of the GPU warnings already a couple of years ago,
> > but they seem to have crept back over time.  It would be great if we
> > could put some extra checks in place to prevent them in the future.
>
> We are pretty zealous about warnings in general in i915. We have a bunch
> of extra warnings in our local Makefile and use -Werror in
> development. Inspired by this series, we added kernel-doc check to the
> build, and hope to add kernel-doc -Werror too once we're done.

Sounds good that you're on it.  At least in your part of GPU.

kernel-doc warnings are surfaced by enabling W=1.
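For example, either of these should surface the warning above:

$ make W=1 drivers/gpu/drm/i915/
$ ./scripts/kernel-doc -none drivers/gpu/drm/i915/i915_scatterlist.c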

> > My aim, albeit ambitious, is to clean up all of the W=1 warnings in the
> > kernel, then have them promoted to W=0, so they warn more loudly during
> > development, thus keeping them from reappearing.
>
> I wish it was easier to do the equivalent of W=1 on a driver or Makefile
> basis. I like to keep i915 clean, but I don't like to use W=1 because
> there are just so many warnings currently.

Well that's what I hope to improve (again). :)

> The other alternative is fixing and moving extra warnings from W=1 to
> W=0 one by one.

Right, that's where I'd like to end up eventually.

--
Lee Jones [李琼斯]


Re: [PATCH v10 2/2] drm: add kms driver for loongson display controller

2023-04-05 Thread Sui Jingfeng

Hi,

On 2023/4/4 22:10, Emil Velikov wrote:

Greetings Sui Jingfeng,

I haven't been around drm-land for a while and this is the first
driver I skim through in a few years. So take the following
suggestions with a healthy pinch of salt.

Hope that helps o/

Emil, we love your reviews,

On Mon, 3 Apr 2023 at 18:13, Sui Jingfeng  wrote:


   v7 -> v8:
1) Zero compile warnings on 32-bit platform, compiling with W=1
2) Revise lsdc_bo_gpu_offset() and minor cleanup
3) Pageflip tested on the virtual terminal with the following commands

   modetest -M loongson -s 32:1920x1080 -v
   modetest -M loongson -s 34:1920x1080 -v -F tiles


I could be wrong, but my understanding is that new drivers should be
capable of running under Xorg and/or Wayland compositor. There is also
the IGT test suite, which can help verify and validate the driver's
behaviour:

https://www.kernel.org/doc/html/latest/gpu/drm-uapi.html


It may take some more time to give a full answer to all of your concerns.

Currently, the drm/loongson driver does work under Xorg (the X server);

link [1] is a short video which shows that the driver actually works
very well.


Note that it uses the generic modesetting driver in userspace.

We could provide more videos if necessary.


We are carrying out the IGT test suite; we will report the test results once it
finishes on our platform.


I was not familiar with it before; previously we only focused on the basic
unit tests that come with libdrm.



I will answer the rest of the questions later; please wait a moment.




+static void lsdc_crtc_reset(struct drm_crtc *crtc)
+{
+   struct lsdc_display_pipe *dispipe = crtc_to_display_pipe(crtc);
+   struct drm_device *ddev = crtc->dev;
+   struct lsdc_device *ldev = to_lsdc(ddev);
+   struct lsdc_crtc_state *priv_crtc_state;
+   unsigned int index = dispipe->index;
+   u32 val;
+
+   val = LSDC_PF_XRGB | CFG_RESET_N;
+   if (ldev->descp->chip == CHIP_LS7A2000)
+   val |= LSDC_DMA_STEP_64_BYTES;
+
+   lsdc_crtc_wreg32(ldev, LSDC_CRTC0_CFG_REG, index, val);
+
+   if (ldev->descp->chip == CHIP_LS7A2000) {
+   val = PHY_CLOCK_EN | PHY_DATA_EN;
+   lsdc_crtc_wreg32(ldev, LSDC_CRTC0_PANEL_CONF_REG, index, val);
+   }
+

AFAICT no other driver touches the HW in their reset callback. Should
the above be moved to another callback?




+static void lsdc_crtc_atomic_enable(struct drm_crtc *crtc,
+   struct drm_atomic_state *state)
+{
+   val = lsdc_crtc_rreg32(ldev, LSDC_CRTC0_CFG_REG, index);
+   /* clear old dma step settings */
+   val &= ~CFG_DMA_STEP_MASK;
+
+   if (descp->chip == CHIP_LS7A2000) {
+   /*
+* Using large dma step as much as possible,
+* to improve hardware DMA efficiency.
+*/
+   if (width_in_bytes % 256 == 0)
+   val |= LSDC_DMA_STEP_256_BYTES;
+   else if (width_in_bytes % 128 == 0)
+   val |= LSDC_DMA_STEP_128_BYTES;
+   else if (width_in_bytes % 64 == 0)
+   val |= LSDC_DMA_STEP_64_BYTES;
+   else  /* width_in_bytes % 32 == 0 */
+   val |= LSDC_DMA_STEP_32_BYTES;
+   }
+
+   clk_func->update(pixpll, &priv_state->pparms);
+

Without knowing the hardware, the clk_func abstraction seems quite
arbitrary and unnecessary. It should be introduced when there is a
use-case for it.



+   lsdc_crtc_wreg32(ldev, LSDC_CRTC0_CFG_REG, index, val | CFG_OUTPUT_EN);
+
+   drm_crtc_vblank_on(crtc);
+}
+



--- /dev/null
+++ b/drivers/gpu/drm/loongson/lsdc_debugfs.c
+void lsdc_debugfs_init(struct drm_minor *minor)
+{
+#ifdef CONFIG_DEBUG_FS
+   drm_debugfs_create_files(lsdc_debugfs_list,
+ARRAY_SIZE(lsdc_debugfs_list),
+minor->debugfs_root,
+minor);
+#endif
+}

Should probably build the file when debugfs is enabled and provide
no-op stub in the header. See nouveau for an example.



--- /dev/null
+++ b/drivers/gpu/drm/loongson/lsdc_drv.c
+static const struct lsdc_desc dc_in_ls7a1000 = {
+   .chip = CHIP_LS7A1000,
+   .num_of_crtc = LSDC_NUM_CRTC,
+   .max_pixel_clk = 20,
+   .max_width = 2048,
+   .max_height = 2048,
+   .num_of_hw_cursor = 1,
+   .hw_cursor_w = 32,
+   .hw_cursor_h = 32,
+   .pitch_align = 256,
+   .mc_bits = 40,
+   .has_vblank_counter = false,
+   .has_scan_pos = true,
+   .has_builtin_i2c = true,
+   .has_vram = true,
+   .has_hpd_reg = false,
+   .is_soc = false,
+};
+
+static const struct lsdc_desc dc_in_ls7a2000 = {
+   .chip = CHIP_LS7A2000,
+   .num_of_crtc = LSDC_NUM_CRTC,
+   .max_pixel_clk = 35,
+   .max_width = 4096,
+   .max_height = 4096,
+   .num_of_hw_cursor = 2,
+   .hw_cursor_w = 64,
+   .hw_cursor_h = 64,
+ 

[PATCH] drm/scheduler: Fix UAF race in drm_sched_entity_push_job()

2023-04-05 Thread Asahi Lina
After a job is pushed into the queue, it is owned by the scheduler core
and may be freed at any time, so we can't write nor read the submit
timestamp after that point.

Fixes oopses observed with the drm/asahi driver, found with kASAN.

Signed-off-by: Asahi Lina 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 15d04a0ec623..e0a8890a62e2 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -507,12 +507,19 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
 {
struct drm_sched_entity *entity = sched_job->entity;
bool first;
+   ktime_t submit_ts;
 
trace_drm_sched_job(sched_job, entity);
atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
+
+   /*
+* After the sched_job is pushed into the entity queue, it may be
+* completed and freed up at any time. We can no longer access it.
+* Make sure to set the submit_ts first, to avoid a race.
+*/
+   sched_job->submit_ts = submit_ts = ktime_get();
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
-   sched_job->submit_ts = ktime_get();
 
/* first job wakes up scheduler */
if (first) {
@@ -529,7 +536,7 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
spin_unlock(&entity->rq_lock);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-   drm_sched_rq_update_fifo(entity, sched_job->submit_ts);
+   drm_sched_rq_update_fifo(entity, submit_ts);
 
drm_sched_wakeup(entity->rq->sched);
}

---
base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
change-id: 20230406-scheduler-uaf-2-44cf8faed245

Thank you,
~~ Lina



[PATCH] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name

2023-04-05 Thread Asahi Lina
A signaled scheduler fence can outlive its scheduler, since fences are
independently reference counted. Therefore, we can't reference the
scheduler in the get_timeline_name() implementation.

Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
dma-bufs reference fences from GPU schedulers that no longer exist.

Signed-off-by: Asahi Lina 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
 drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
 include/drm/gpu_scheduler.h              | 5 +++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 15d04a0ec623..8b3b949b2ce8 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -368,7 +368,12 @@ static bool drm_sched_entity_add_dependency_cb(struct 
drm_sched_entity *entity)
 
/*
 * Fence is from the same scheduler, only need to wait for
-* it to be scheduled
+* it to be scheduled.
+*
+* Note: s_fence->sched could have been freed and reallocated
+* as another scheduler. This false positive case is okay, as if
+* the old scheduler was freed all of its jobs must have
+* signaled their completion fences.
 */
fence = dma_fence_get(&s_fence->scheduled);
dma_fence_put(entity->dependency);
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c 
b/drivers/gpu/drm/scheduler/sched_fence.c
index 7fd869520ef2..33b145dfa38c 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -66,7 +66,7 @@ static const char *drm_sched_fence_get_driver_name(struct 
dma_fence *fence)
 static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
 {
struct drm_sched_fence *fence = to_drm_sched_fence(f);
-   return (const char *)fence->sched->name;
+   return (const char *)fence->sched_name;
 }
 
 static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
@@ -168,6 +168,8 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
unsigned seq;
 
fence->sched = entity->rq->sched;
+   strlcpy(fence->sched_name, entity->rq->sched->name,
+   sizeof(fence->sched_name));
seq = atomic_inc_return(&entity->fence_seq);
dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
   &fence->lock, entity->fence_context, seq);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 9db9e5e504ee..49f019731891 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -295,6 +295,11 @@ struct drm_sched_fence {
  * @lock: the lock used by the scheduled and the finished fences.
  */
spinlock_t  lock;
+/**
+ * @sched_name: the name of the scheduler that owns this fence. We
+ * keep a copy here since fences can outlive their scheduler.
+ */
+   char sched_name[16];
 /**
  * @owner: job owner for debugging
  */

---
base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
change-id: 20230406-scheduler-uaf-1-994ec34cac93

Thank you,
~~ Lina



Re: [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var

2023-04-05 Thread Javier Martinez Canillas
Daniel Vetter  writes:

[...]

>> 
>> but only the 'var->xres > fb->width || var->yres > fb->height' from the
>> conditions checked could be false after your __fill_var() call above.
>> 
>> You should drop the 'var->bits_per_pixel > bpp', 'var->xres_virtual >
>> fb->width' and 'var->yres_virtual > fb->height' checks I believe since
>> those will always be true.
>
> The __fill_var is after this. I'm honestly not sure what the exact

Ah, your patch adds it after that indeed. Please ignore my comment then.

> semantics are supposed to be, but essentially if userspace asks for too
> big virtual size, we reject it. And for anything else we then tell it
> (with __fill_var) how big the actually available space is.
>
> What I'm wondering now is whether too small x/yres won't lead to problems
> of some sorts ... For multi-screen we set the virtual size to be big
> enough for all crtc, and then just set x/yres to be the smallest output.
> That way fbcon knows to only draw as much as is visible on all screens.
> But if you then pan that too much, the bigger screens might not have a big
> enough buffer anymore and things fail (but shouldn't).
>
> Not sure how to fix that tbh.

Would this be a problem in practice?

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [Regression] drm/scheduler: track GPU active time per entity

2023-04-05 Thread Luben Tuikov
On 2023-04-05 10:05, Danilo Krummrich wrote:
> On 4/4/23 06:31, Luben Tuikov wrote:
>> On 2023-03-28 04:54, Lucas Stach wrote:
>>> Hi Danilo,
>>>
 On Tuesday, 2023-03-28 at 02:57 +0200, Danilo Krummrich wrote:
 Hi all,

 Commit df622729ddbf ("drm/scheduler: track GPU active time per entity")
 tries to track the accumulated time that a job was active on the GPU
 writing it to the entity through which the job was deployed to the
 scheduler originally. This is done within drm_sched_get_cleanup_job()
 which fetches a job from the scheduler's pending_list.

 Doing this can result in a race condition where the entity is already
 freed, but the entity's newly added elapsed_ns field is still accessed
 once the job is fetched from the pending_list.

 After drm_sched_entity_destroy() being called it should be safe to free
 the structure that embeds the entity. However, a job originally handed
 over to the scheduler by this entity might still reside in the
 scheduler's pending_list for cleanup after drm_sched_entity_destroy()
 already being called and the entity being freed. Hence, we can run into
 a UAF.

>>> Sorry about that, I clearly didn't properly consider this case.
>>>
 In my case it happened that a job, as explained above, was just picked
 from the scheduler's pending_list after the entity was freed due to the
 client application exiting. Meanwhile this freed up memory was already
 allocated for a subsequent client application's job structure again.
 Hence, the new job's memory got corrupted. Luckily, I was able to
 reproduce the same corruption over and over again by just using
 deqp-runner to run a specific set of VK test cases in parallel.

 Fixing this issue doesn't seem to be very straightforward though (unless
 I miss something), which is why I'm writing this mail instead of sending
 a fix directly.

 Spontaneously, I see three options to fix it:

 1. Rather than embedding the entity into driver specific structures
 (e.g. tied to file_priv) we could allocate the entity separately and
 reference count it, such that it's only freed up once all jobs that were
 deployed through this entity are fetched from the scheduler's pending list.

>>> My vote is on this or something in similar vain for the long term. I
>>> have some hope to be able to add a GPU scheduling algorithm with a bit
>>> more fairness than the current one sometime in the future, which
>>> requires execution time tracking on the entities.
>>
>> Danilo,
>>
>> Using kref is preferable, i.e. option 1 above.
> 
> I think the only real motivation for doing that would be for generically 
> tracking job statistics within the entity a job was deployed through. If 
> we all agree on tracking job statistics this way I am happy to prepare a 
> patch for this option and drop this one: 
> https://lore.kernel.org/all/20230331000622.4156-1-d...@redhat.com/T/#u

Hmm, I never thought about "job statistics" when I preferred using kref above.
The reason kref is attractive is because one doesn't need to worry about
it--when the last user drops the kref, the release is called to do
housekeeping. If this never happens, we know that we have a bug to debug.

Regarding the patch above--I did look around the code, and it seems safe
as per your analysis; I didn't see any reference to the entity after job
submission, but I'll comment on that thread as well for the record.

Regards,
Luben

> 
> Christian mentioned amdgpu tried something similar to what Lucas tried, 
> ran into similar trouble, backed off and implemented it in another 
> way - a driver-specific way I guess?
> 
>>
>> Lucas, can you shed some light on,
>>
>> 1. In what way the current FIFO scheduling is unfair, and
>> 2. shed some details on this "scheduling algorithm with a bit
>> more fairness than the current one"?
>>
>> Regards,
>> Luben
>>
>>>
 2. Somehow make sure drm_sched_entity_destroy() does block until all
 jobs deployed through this entity were fetched from the schedulers
 pending list. Though, I'm pretty sure that this is not really desirable.

 3. Just revert the change and let drivers implement tracking of GPU
 active times themselves.

>>> Given that we are already pretty late in the release cycle and etnaviv
>>> being the only driver so far making use of the scheduler elapsed time
>>> tracking, I think the right short-term solution is to either move the
>>> tracking into etnaviv or just revert the change for now. I'll have a
>>> look at this.
>>>
>>> Regards,
>>> Lucas
>>>
 In the case of just reverting the change I'd propose to also set a job's
 entity pointer to NULL once the job was taken from the entity, such
 that in case of a future issue we fail where the actual issue resides
 and to make it more obvious that the field shouldn't be used anymore
 after the job was taken from the entity.

Re: [PATCH v3 54/65] clk: da8xx: clk48: Switch to determine_rate

2023-04-05 Thread David Lechner

On 4/5/23 10:22 AM, Maxime Ripard wrote:

Hi David,

On Wed, Apr 05, 2023 at 10:03:24AM -0500, David Lechner wrote:

On 4/4/23 5:11 AM, Maxime Ripard wrote:

The TI DA8xx USB0 clk48 clock implements a mux with a set_parent
hook, but doesn't provide a determine_rate implementation.

This is a bit odd, since set_parent() is there to, as its name implies,
change the parent of a clock. However, the most likely candidate to
trigger that parent change is a call to clk_set_rate(), with
determine_rate() figuring out which parent is the best suited for a
given rate.



As mentioned in my previous review, parent is selected by device
tree and should never be changed after init.


Great minds think alike then, because the driver implements exactly
that, either before or after that patch.

That patch makes the current behaviour explicit but doesn't change it in
any way.

So I guess that means that I can add your Acked-by on the three patches
you reviewed with the same message?

Maxime


Yes, preferably with a simplified commit message.

Acked-by: David Lechner 



Re: [PATCH 01/18] fbdev: Prepare generic architecture helpers

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 05:53:03PM +0200, Arnd Bergmann wrote:
> On Wed, Apr 5, 2023, at 17:05, Thomas Zimmermann wrote:
> > Generic implementations of fb_pgprotect() and fb_is_primary_device()
> > have been in the source code for a long time. Prepare the header file
> > to make use of them.
> >
> > Improve the code by using an inline function for fb_pgprotect() and
> > by removing include statements.
> >
> > Symbols are protected by preprocessor guards. Architectures that
> > provide a symbol need to define a preprocessor token of the same
> > name and value. Otherwise the header file will provide a generic
> > implementation. This pattern has been taken from .
> >
> > Signed-off-by: Thomas Zimmermann 
> 
> Moving this into generic code is good, but I'm not sure
> about the default for fb_pgprotect():
> 
> > +
> > +#ifndef fb_pgprotect
> > +#define fb_pgprotect fb_pgprotect
> > +static inline void fb_pgprotect(struct file *file, struct vm_area_struct 
> > *vma,
> > +   unsigned long off)
> > +{ }
> > +#endif
> 
> I think most architectures will want the version we have on
> arc, arm, arm64, loongarch, and sh already:
> 
> static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
> unsigned long off)
> {
>vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> }
> 
> so I'd suggest making that version the default, and treating the
> empty ones (m68knommu, sparc32) as architecture specific
> workarounds.

Yeah I was about to type the same suggestion :-)
-Daniel

 
> I see that sparc64 and parisc use pgprot_uncached here, but as
> they don't define a custom pgprot_writecombine, this ends up being
> the same, and they can use the above definition as well.
> 
> mips defines pgprot_writecombine but uses pgprot_noncached
> in fb_pgprotect(), which is probably a mistake and should have
> been updated as part of commit 4b050ba7a66c ("MIPS: pgtable.h:
> Implement the pgprot_writecombine function for MIPS").
> 
> Arnd

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 01/19] drm/i915/i915_scatterlist: Fix kerneldoc formatting issue - missing '@'

2023-04-05 Thread Jani Nikula
On Wed, 05 Apr 2023, Lee Jones  wrote:
> On Tue, 04 Apr 2023, Jani Nikula wrote:
>
>> On Mon, 03 Apr 2023, Lee Jones  wrote:
>> > On Mon, 03 Apr 2023, Jani Nikula wrote:
>> >
>> >> On Fri, 31 Mar 2023, Lee Jones  wrote:
>> >> > Fixes the following W=1 kernel build warning(s):
>> >> >
>> >> >  drivers/gpu/drm/i915/i915_scatterlist.c:62: warning: Function 
>> >> > parameter or member 'size' not described in 'i915_refct_sgt_init'
>> >> >
>> >> > Cc: Jani Nikula 
>> >> > Cc: Joonas Lahtinen 
>> >> > Cc: Rodrigo Vivi 
>> >> > Cc: Tvrtko Ursulin 
>> >> > Cc: David Airlie 
>> >> > Cc: Daniel Vetter 
>> >> > Cc: intel-...@lists.freedesktop.org
>> >> > Cc: dri-devel@lists.freedesktop.org
>> >> > Signed-off-by: Lee Jones 
>> >>
>> >> Thanks for the patches!
>> >>
>> >> Applied all but one of the drm/i915 patches to drm-intel-next or
>> >> drm-intel-gt-next depending on the area. There were a couple of issues
>> >> that I fixed while applying. There was a conflict with patch 5/19
>> >> against drm-intel-gt-next so I left that one out.
>> >
>> > Thanks Jani.  I'll rebase and see what's left.
>>
>> We also took notice and aim to track this more aggressively [1].
>
> Thanks.
>
> I did clean up all of the GPU warnings already a couple of years ago,
> but they seem to have crept back over time.  It would be great if we
> could put some extra checks in place to prevent them in the future.

We are pretty zealous about warnings in general in i915. We have a bunch
of extra warnings in our local Makefile and use -Werror in
development. Inspired by this series, we added kernel-doc check to the
build, and hope to add kernel-doc -Werror too once we're done.
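
For reference, fixes in this class are one-liners; the warning quoted at
the top of the thread is resolved by documenting the missing parameter
with an @ line, presumably something like this (exact wording assumed):

/**
 * i915_refct_sgt_init - Initialize a refcounted sg-table
 * @rsgt: The struct i915_refct_sgt to initialize
 * @size: Size in bytes of the underlying memory buffer
 */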

> My aim, albeit ambitious, is to clean up all of the W=1 warnings in the
> kernel, then have them promoted to W=0, so they warn more loudly during
> development, thus keeping them from reappearing.

I wish it was easier to do the equivalent of W=1 on a driver or Makefile
basis. I like to keep i915 clean, but I don't like to use W=1 because
there are just so many warnings currently.

The other alternative is fixing and moving extra warnings from W=1 to
W=0 one by one.


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center


Re: [PATCH] drm/msm: Check for the GPU IOMMU during bind

2023-04-05 Thread Jordan Crouse
On Fri, Mar 10, 2023 at 01:05:36AM +0200, Dmitry Baryshkov wrote:
> 
> On 10/03/2023 00:20, Jordan Crouse wrote:
> > While booting with amd,imageon on a headless target the GPU probe was
> > failing with -ENOSPC in get_pages() from msm_gem.c.
> > 
> > Investigation showed that the driver was using the default 16MB VRAM
> > carveout because msm_use_mmu() was returning false since headless devices
> > use a dummy parent device. Avoid this by extending the existing is_a2xx
> > priv member to check the GPU IOMMU state on all platforms and use that
> > check in msm_use_mmu().
> 
> I wonder if we can fix this by setting 'dummy_dev'->of_node to adreno's
> of_node. Did you check that possibility?

I said I would check and then never looped back around. This will stick
on my todo list for now and I'll check on the next cycle. If anybody
else wants to jump in the meantime then please go for it.

Jordan


Re: [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers

2023-04-05 Thread Daniel Vetter
On Wed, Apr 05, 2023 at 04:32:19PM +0200, Thomas Zimmermann wrote:
> Hi
> 
> On 05.04.23 at 15:18, Daniel Vetter wrote:
> > On Wed, Apr 05, 2023 at 01:16:27PM +0200, Javier Martinez Canillas wrote:
> > > Thomas Zimmermann  writes:
> > > 
> > > [...]
> > > 
> > > > 
> > > > Your comment says that it calls a PCI function to clean up to vgacon.
> > > > That comment explains what is happening, not why. And how the PCI and
> > > > vgacon code work together is non-obvious.
> > 
> > Would a better comment help then:
> > 
> > /*
> >  * gma500 is a strange hybrid device, which both acts as a pci
> >  * device (for legacy vga functionality) and behaves more like an
> >  * integrated display on a SoC, where the framebuffer simply
> >  * resides in main memory and not in a special pci bar (which
> >  * internally redirects to a stolen range of main memory) like all
> >  * other integrated pci display devices have.
> >  *
> >  * To catch all cases we need to both remove conflicting fw
> >  * drivers for the pci device and main memory.
> >  */
> 
> Together with the existing comment, this should be the comment to describe
> gma_remove_conflicting_framebuffers().
> 
> > > > 
> > > > Again, here's my proposal for gma500:
> > > > 
> > > > // call this from psb_pci_probe()
> > > > int gma_remove_conflicting_framebuffers(struct pci_dev *pdev, const
> > > > struct drm_driver *req_driver)
> > > > {
> > > > resource_size_t base = 0;
> > > > resource_size_t size = (resource_size_t)-1;
> > > > const char *name = req_driver->name;
> > > > int ret;
> > > > 
> > > > /*
> > > >  * We cannot yet easily find the framebuffer's location in
> > > >  * memory. So remove all framebuffers here.
> > > >  *
> > > >  * TODO: Refactor psb_driver_load() to map vdc_reg earlier. Then
> > > >  *   we might be able to read the framebuffer range from the
> > > >  *   device.
> > > >  */
> > > > ret = aperture_remove_conflicting_devices(base, size, name);
> > 
> > Why can't this be a call to drm_aperture_remove_framebuffers? At least as
> > long as we don't implement the "read out actual fb base and size" code,
> > which also none of the other soc drivers bother with?
> 
> It can. Feel free to use it.
> 
> But I have to say that those DRM helpers are somewhat empty and obsolete
> after the aperture code has been moved to drivers/video/. They exist mostly
> for convenience. As with other DRM helpers, if a driver needs something
> special, it can ignore them.
> 
> > 
> > > > if (ret)
> > > > return ret;
> > > > 
> > > > /*
> > > >  * WARNING: Apparently we must kick fbdev drivers before vgacon,
> > > >  * otherwise the vga fbdev driver falls over.
> > > >  */
> > > > ret = vga_remove_vgacon(pdev);
> > 
> > This isn't enough, we also nuke stuff that's mapping the vga fb range.
> > Which is really the reason I don't want to open code random stuff, pci is
> > self-describing, if it's decoding legacy vga it can figure this out and we
> > only have to implement the "how do I nuke legacy vga fw drivers from a pci
> > driver" once.
> 
> Sure, but it's really just one additional line:
> 
>   aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE);
> 
> as you mention below, this and vgacon can be exported in a single VGA
> aperture helper.
> 
> > 
> > Not twice like this would result in, with the gma500 version being only
> > half the thing.
> > 
> > If it absolutely has to be a separate function for the gma500 pci legacy
> > vga (I still don't get why, it's just a pci vga device, there's absolutely
> > nothing special about that part at all) then I think it needs to be at
> > least a common "nuke a legacy vga device for me pls" function, which
> > shares the implementation with the pci one.
> 
> Sure
> 
> /**
>  * kerneldoc goes here
>  *
>  * WARNING: Apparently we must remove graphics drivers before calling
>  *  this helper. Otherwise the vga fbdev driver falls over if
>  *  we have vgacon configured.
>  */
> int aperture_remove_legacy_vga_devices(struct pci_dev *pdev)
> {
>   aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE);
> 
>   return vga_remove_vgacon(pdev);
> }
> 
> And that can be called from gma500 and the pci aperture helper.

But you still pass a pci_dev to that helper. Which just doesn't make any
sense to me (assuming your entire point is that this isn't just a normal
pci device but some special legacy vga thing), but if we go with (void)
then there's more refactoring to do because the vga_remove_vgacon also
wants a pdev.

All so that we don't call aperture_detach_devices() on a bunch of pci
bars, which apparently is not a problem for any other driver, but
absolutely is a huge problem for gma500 somehow.

I don't understand why.

Consider this me throwing in the towel.

Re: [PATCH 01/18] fbdev: Prepare generic architecture helpers

2023-04-05 Thread Arnd Bergmann
On Wed, Apr 5, 2023, at 17:05, Thomas Zimmermann wrote:
> Generic implementations of fb_pgprotect() and fb_is_primary_device()
> have been in the source code for a long time. Prepare the header file
> to make use of them.
>
> Improve the code by using an inline function for fb_pgprotect() and
> by removing include statements.
>
> Symbols are protected by preprocessor guards. Architectures that
> provide a symbol need to define a preprocessor token of the same
> name and value. Otherwise the header file will provide a generic
> implementation. This pattern has been taken from .
>
> Signed-off-by: Thomas Zimmermann 

Moving this into generic code is good, but I'm not sure
about the default for fb_pgprotect():

> +
> +#ifndef fb_pgprotect
> +#define fb_pgprotect fb_pgprotect
> +static inline void fb_pgprotect(struct file *file, struct vm_area_struct 
> *vma,
> + unsigned long off)
> +{ }
> +#endif

I think most architectures will want the version we have on
arc, arm, arm64, loongarch, and sh already:

static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
unsigned long off)
{
   vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
}

so I'd suggest making that version the default, and treating the
empty ones (m68knommu, sparc32) as architecture specific
workarounds.
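
Concretely, combined with the guard pattern from the patch, the generic
header would then look like this (a sketch, assuming the override
mechanism stays as proposed):

#ifndef fb_pgprotect
#define fb_pgprotect fb_pgprotect
static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
				unsigned long off)
{
	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
}
#endif

/* m68knommu and sparc32 would then define their own empty
 * fb_pgprotect() in <asm/fb.h> before including the generic header,
 * keeping today's no-op behaviour as an explicit workaround.
 */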

I see that sparc64 and parisc use pgprot_uncached here, but as
they don't define a custom pgprot_writecombine, this ends up being
the same, and they can use the above definition as well.

mips defines pgprot_writecombine but uses pgprot_noncached
in fb_pgprotect(), which is probably a mistake and should have
been updated as part of commit 4b050ba7a66c ("MIPS: pgtable.h:
Implement the pgprot_writecombine function for MIPS").

Arnd


Re: [PATCH RFC 12/18] rust: drm: sched: Add GPU scheduler abstraction

2023-04-05 Thread Daniel Vetter
On Tue, Mar 07, 2023 at 11:25:37PM +0900, Asahi Lina wrote:
> The GPU scheduler manages scheduling GPU jobs and dependencies between
> them. This Rust abstraction allows Rust DRM drivers to use this
> functionality.
> 
> Signed-off-by: Asahi Lina 

Overall (with my limited rust knowledge) I really like this, it nicely
encodes the state transitions of jobs and anything else I looked into.
Some thoughts/questions below.

> ---
>  drivers/gpu/drm/Kconfig |   5 +
>  rust/bindings/bindings_helper.h |   1 +
>  rust/helpers.c  |   6 +
>  rust/kernel/drm/mod.rs  |   2 +
>  rust/kernel/drm/sched.rs| 358 
> 
>  5 files changed, 372 insertions(+)
> 
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 70a983a17ac2..8b5ad6aee126 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -39,6 +39,11 @@ config RUST_DRM_GEM_SHMEM_HELPER
>   depends on RUST_DRM
>   select DRM_GEM_SHMEM_HELPER
>  
> +config RUST_DRM_SCHED
> + bool
> + depends on RUST_DRM
> + select DRM_SCHED
> +
>  config DRM_MIPI_DBI
>   tristate
>   depends on DRM
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index b6696011f3a4..dc01be08676e 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/rust/helpers.c b/rust/helpers.c
> index 11965b1e2f4e..1b33ed602090 100644
> --- a/rust/helpers.c
> +++ b/rust/helpers.c
> @@ -408,6 +408,12 @@ void rust_helper___spin_lock_init(spinlock_t *lock, 
> const char *name,
>  }
>  EXPORT_SYMBOL_GPL(rust_helper___spin_lock_init);
>  
> +unsigned long rust_helper_msecs_to_jiffies(const unsigned int m)
> +{
> + return msecs_to_jiffies(m);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_msecs_to_jiffies);
> +
>  #ifdef CONFIG_DMA_SHARED_BUFFER
>  
>  void rust_helper_dma_fence_get(struct dma_fence *fence)
> diff --git a/rust/kernel/drm/mod.rs b/rust/kernel/drm/mod.rs
> index dae98826edfd..3ddf7712aab3 100644
> --- a/rust/kernel/drm/mod.rs
> +++ b/rust/kernel/drm/mod.rs
> @@ -8,4 +8,6 @@ pub mod file;
>  pub mod gem;
>  pub mod ioctl;
>  pub mod mm;
> +#[cfg(CONFIG_RUST_DRM_SCHED)]
> +pub mod sched;
>  pub mod syncobj;
> diff --git a/rust/kernel/drm/sched.rs b/rust/kernel/drm/sched.rs
> new file mode 100644
> index ..a5275cc16179
> --- /dev/null
> +++ b/rust/kernel/drm/sched.rs
> @@ -0,0 +1,358 @@
> +// SPDX-License-Identifier: GPL-2.0 OR MIT
> +
> +//! DRM Scheduler
> +//!
> +//! C header: 
> [`include/linux/drm/gpu_scheduler.h`](../../../../include/linux/drm/gpu_scheduler.h)
> +
> +use crate::{
> +bindings, device,
> +dma_fence::*,
> +error::{to_result, Result},
> +prelude::*,
> +sync::{Arc, UniqueArc},
> +};
> +use alloc::boxed::Box;
> +use core::marker::PhantomData;
> +use core::mem::MaybeUninit;
> +use core::ops::{Deref, DerefMut};
> +use core::ptr::addr_of_mut;
> +
> +/// Scheduler status after timeout recovery
> +#[repr(u32)]
> +pub enum Status {
> +/// Device recovered from the timeout and can execute jobs again
> +Nominal = bindings::drm_gpu_sched_stat_DRM_GPU_SCHED_STAT_NOMINAL,
> +/// Device is no longer available
> +NoDevice = bindings::drm_gpu_sched_stat_DRM_GPU_SCHED_STAT_ENODEV,
> +}
> +
> +/// Scheduler priorities
> +#[repr(i32)]
> +pub enum Priority {
> +/// Low userspace priority
> +Min = bindings::drm_sched_priority_DRM_SCHED_PRIORITY_MIN,
> +/// Normal userspace priority
> +Normal = bindings::drm_sched_priority_DRM_SCHED_PRIORITY_NORMAL,
> +/// High userspace priority
> +High = bindings::drm_sched_priority_DRM_SCHED_PRIORITY_HIGH,
> +/// Kernel priority (highest)
> +Kernel = bindings::drm_sched_priority_DRM_SCHED_PRIORITY_KERNEL,
> +}
> +
> +/// Trait to be implemented by driver job objects.
> +pub trait JobImpl: Sized {
> +/// Called when the scheduler is considering scheduling this job next, 
> to get another Fence
> +/// for this job to block on. Once it returns None, run() may be called.
> +fn prepare(_job: &mut Job<Self>) -> Option<Fence> {

So if I get this all right then Job allows us to nicely parametrize the
job with the driver structure itself, but not really anything else. I do
wonder whether this needs a bit more with a type both for the job and
entity and the drm/sched code + rust wrapper guaranteeing that the
lifetimes of these make sense. With just the job parametrized, drivers need
to make sure they refcount anything else hanging off of it properly, which
means if they get some detail wrong there might be an unintentional leak.
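
For comparison, the C-side shape being alluded to is roughly this (a
sketch with hypothetical names): the job merely borrows the entity, and
nothing in the API enforces that the entity outlives it.

struct my_file_priv {
	struct drm_sched_entity entity;	/* freed when the file closes */
};

struct my_job {
	struct drm_sched_job base;	/* base.entity points into my_file_priv */
	/* the driver must refcount anything else hanging off the job */
};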

If we instead also give a parametrized entity where the driver can stuff
anything necessary and sched code guarantees that it'll clean up any
mess on teardown and guarantee that the entity survives, I think a lot of
drivers could benefit from that and it would 

Re: [Regression] drm/scheduler: track GPU active time per entity

2023-04-05 Thread Danilo Krummrich

On 4/4/23 06:31, Luben Tuikov wrote:

On 2023-03-28 04:54, Lucas Stach wrote:

Hi Danilo,

On Tuesday, 28.03.2023 at 02:57 +0200, Danilo Krummrich wrote:

Hi all,

Commit df622729ddbf ("drm/scheduler: track GPU active time per entity")
tries to track the accumulated time that a job was active on the GPU,
writing it to the entity through which the job was deployed to the
scheduler originally. This is done within drm_sched_get_cleanup_job()
which fetches a job from the scheduler's pending_list.

Doing this can result in a race condition where the entity is already
freed, but the entity's newly added elapsed_ns field is still accessed
once the job is fetched from the pending_list.

After drm_sched_entity_destroy() has been called it should be safe to free
the structure that embeds the entity. However, a job originally handed
over to the scheduler by this entity might still reside in the
scheduler's pending_list for cleanup after drm_sched_entity_destroy()
has already been called and the entity has been freed. Hence, we can run
into a UAF.

Sorry about that, I clearly didn't properly consider this case.


In my case it happened that a job, as explained above, was just picked
from the scheduler's pending_list after the entity was freed due to the
client application exiting. Meanwhile this freed-up memory was already
allocated for a subsequent client application's job structure again.
Hence, the new job's memory got corrupted. Luckily, I was able to
reproduce the same corruption over and over again by just using
deqp-runner to run a specific set of VK test cases in parallel.

Fixing this issue doesn't seem to be very straightforward though (unless
I miss something), which is why I'm writing this mail instead of sending
a fix directly.

Spontaneously, I see three options to fix it:

1. Rather than embedding the entity into driver-specific structures
(e.g. tied to file_priv) we could allocate the entity separately and
reference count it, such that it's only freed up once all jobs that were
deployed through this entity are fetched from the scheduler's pending list.


My vote is on this or something in a similar vein for the long term. I
have some hope to be able to add a GPU scheduling algorithm with a bit
more fairness than the current one sometime in the future, which
requires execution time tracking on the entities.


Danilo,

Using kref is preferable, i.e. option 1 above.


I think the only real motivation for doing that would be for generically 
tracking job statistics within the entity a job was deployed through. If 
we all agree on tracking job statistics this way I am happy to prepare a 
patch for this option and drop this one: 
https://lore.kernel.org/all/20230331000622.4156-1-d...@redhat.com/T/#u


Christian mentioned amdgpu tried something similar to what Lucas did,
ran into similar trouble, backed off and implemented it in another
way - a driver-specific way, I guess?




Lucas, can you shed some light on,

1. In what way the current FIFO scheduling is unfair, and
2. shed some details on this "scheduling algorithm with a bit
more fairness than the current one"?

Regards,
Luben




2. Somehow make sure drm_sched_entity_destroy() does block until all
jobs deployed through this entity have been fetched from the scheduler's
pending list. Though, I'm pretty sure that this is not really desirable.

3. Just revert the change and let drivers implement tracking of GPU
active times themselves.


Given that we are already pretty late in the release cycle and etnaviv
being the only driver so far making use of the scheduler elapsed time
tracking, I think the right short-term solution is to either move the
tracking into etnaviv or just revert the change for now. I'll have a
look at this.

Regards,
Lucas


In the case of just reverting the change I'd propose to also set a job's
entity pointer to NULL once the job was taken from the entity, such
that in case of a future issue we fail where the actual issue resides
and to make it more obvious that the field shouldn't be used anymore
after the job was taken from the entity.
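
Something as small as this at the hand-over point would already help
(the entity field is real in struct drm_sched_job; the exact spot for
the assignment, e.g. drm_sched_entity_pop_job(), is an assumption):

	/* once the job has left the entity, poison the back-pointer so
	 * any later access fails where the actual issue resides
	 */
	sched_job->entity = NULL;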

I'm happy to implement the solution we agree on. However, it might also
make sense to revert the change until we have a solution in place. I'm
also happy to send a revert with a proper description of the problem.
Please let me know what you think.

- Danilo









Re: [PATCH 1/1] drm/bridge: ti-sn65dsi86: Allow GPIO operations to sleep

2023-04-05 Thread Doug Anderson
Hi,

On Wed, Apr 5, 2023 at 6:51 AM Alexander Stein
 wrote:
>
> There is no need to require non-sleeping GPIO access. Silence the
> WARN_ON() if the GPIO is backed by e.g. I2C expanders.
>
> Signed-off-by: Alexander Stein 
> ---
> This is the TI SN65DSI86 equivalent to 805245071240 ("drm/bridge:
> ti-sn65dsi83: Allow GPIO operations to sleep")
>
>  drivers/gpu/drm/bridge/ti-sn65dsi86.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Douglas Anderson 

Trivial / straightforward change. Sending it to drm-misc-next.

77d08a2de6a4 drm/bridge: ti-sn65dsi86: Allow GPIO operations to sleep


Re: [PATCH v3 43/65] ASoC: tlv320aic32x4: Add a determine_rate hook

2023-04-05 Thread Mark Brown
On Wed, Apr 05, 2023 at 05:17:21PM +0200, Maxime Ripard wrote:
> On Tue, Apr 04, 2023 at 04:26:18PM +0100, Mark Brown wrote:

> > To be honest it's surprising that we'd have to manually specify this, I
> > would expect to be able to reparent.  I suspect it'd be better to go the
> > other way here and allow reparenting.

> Yeah, I think I'd prefer to allow reparenting too, but as can be seen
> from the other reviewers in that thread, it seems like we have a very
> split community here, so that doesn't sound very realistic without some
> major pushback :)

For these ASoC drivers I think we should just do the reparenting,
they're very much at the leaf of the tree so the considerations that
make it a problem sometimes are unlikely to apply.




Re: [PATCH v2 56/65] clk: ingenic: cgu: Switch to determine_rate

2023-04-05 Thread Paul Cercueil
On Wednesday, 5 April 2023 at 16:50 +0200, Maxime Ripard wrote:
> On Wed, Apr 05, 2023 at 02:57:26PM +0200, Paul Cercueil wrote:
> > On Monday, 27 March 2023 at 21:24 +0200, Maxime Ripard wrote:
> > > On Fri, Mar 24, 2023 at 08:58:48PM +, Aidan MacDonald wrote:
> > > > > > My suggestion: add a per-clock bitmap to keep track of
> > > > > > which
> > > > > > parents
> > > > > > are allowed. Any operation that would select a parent clock
> > > > > > not
> > > > > > on the
> > > > > > whitelist should fail. Automatic reparenting should only
> > > > > > select
> > > > > > from
> > > > > > clocks on the whitelist. And we need new DT bindings for
> > > > > > controlling
> > > > > > the whitelist, for example:
> > > > > > 
> > > > > >     clock-parents-0 = <&clk1>, <&pll_c>;
> > > > > >     clock-parents-1 = <&clk2>, <&pll_a>, <&pll_b>;
> > > > > > 
> > > > > > This means that clk1 can only have pll_c as a parent, while
> > > > > > clk2 can
> > > > > > have pll_a or pll_b as parents. By default every clock will
> > > > > > be
> > > > > > able
> > > > > > to use any parent, so a list is only needed if the machine
> > > > > > needs a
> > > > > > more restrictive policy.
> > > > > > 
> > > > > > assigned-clock-parents should disable automatic
> > > > > > reparenting,
> > > > > > but allow
> > > > > > explicit clk_set_parent(). This will allow clock drivers to
> > > > > > start doing
> > > > > > reparenting without breaking old DTs.
> > > > > 
> > > > > I'm generally not a fan of putting all these policies in the
> > > > > device
> > > > > tree. Do you have an example where it wouldn't be possible to
> > > > > do
> > > > > exactly
> > > > > this from the driver itself?
> > > > 
> > > > I'm confused. What's implicit in the example is clk1 and clk2
> > > > might
> > > > have *other* possible choices of parent clock and the device
> > > > tree
> > > > is
> > > > limiting what the OS is allowed to choose.
> > > > 
> > > > Why would you put such arbitrary limitations into the driver?
> > > 
> > > Why would we put such arbitrary limitations in the firmware? As
> > > this
> > > entire thread can attest, people are already using the device
> > > tree to
> > > work around the limitations of the Linux driver, or reduce the
> > > features of Linux because they can rely on the device tree.
> > > Either
> > > way, it's linked to the state of the Linux driver, and any other
> > > OS
> > > or
> > > Linux version could very well implement something more dynamic.
> > 
> > Probably because if we have to choose between setting policy in the
> > kernel or in the firmware, it is arguably better to set it in the
> > firmware.
> 
> I have a very different view on this I guess. Firmware is (most of
> the
> time) hard to update, and the policy depend on the state of support
> of a
> given OS so it's likely to evolve. The kernel is the best place to me
> to
> put that kind of policy. Why do you think differently?

Because the clocks configuration can be board-specific. And you don't
really want board-specific stuff in the driver.

If we take the Ingenic JZ4770 SoC as an example, on one board we parent
everything we can to the PLL1 clock and set it to 432 MHz (the least
common multiple). Then the PLL0 (which drives the DDR and CPU clocks)
can be updated to overclock/underclock the CPU and RAM.

So should that be hardcoded in the driver? Well, for a different board,
for which overclocking is not really needed, it's better to parent
everything to PLL0 so that PLL1 can be shut down to save power. So what
policy should be hardcoded in the driver?

> 
> > Especially when talking about clocks, as the firmware is already
> > the
> > one programming the boot clocks.
> 
> I'm not sure what your point is there. I don't think I ever saw a
> firmware getting the clocks right for every possible scenario on a
> given
> platform. And if that was indeed the case, then we wouldn't even need a
> kernel driver.

My point is that there is already policy in how the firmware sets up
the hardware; and more often than not, the kernel driver won't change
that (e.g. you're probably not going to touch the DDR clock). It's
better to have all policy in the firmware then, instead of having some
in the firmware, and some in the kernel driver.

Cheers,
-Paul


Re: [PATCH] drm/panel-edp: Add AUO NE135FBM-N41 v8.1 panel entry

2023-04-05 Thread Doug Anderson
Hi,

On Wed, Apr 5, 2023 at 3:05 AM AngeloGioacchino Del Regno
 wrote:
>
> Add a panel entry with delay_200_500_e50 for the AUO NE135FBM-N41
> version 8.1, found on a number of ACER laptops, including the
> Swift 3 (SF313-52, SF313-53), Chromebook Spin 513 (CP513-2H) and
> others.
>
> Signed-off-by: AngeloGioacchino Del Regno 
> 
> ---
>  drivers/gpu/drm/panel/panel-edp.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Douglas Anderson 

There's not lots of reason to delay landing patches like this, so
pushing to drm-misc-next right away.

a80c882183e3 drm/panel-edp: Add AUO NE135FBM-N41 v8.1 panel entry


Re: [PATCH v3 54/65] clk: da8xx: clk48: Switch to determine_rate

2023-04-05 Thread David Lechner

On 4/4/23 5:11 AM, Maxime Ripard wrote:

The TI DA8xx USB0 clk48 clock implements a mux with a set_parent
hook, but doesn't provide a determine_rate implementation.

This is a bit odd, since set_parent() is there to, as its name implies,
change the parent of a clock. However, the most likely candidate to
trigger that parent change is a call to clk_set_rate(), with
determine_rate() figuring out which parent is the best suited for a
given rate.



As mentioned in my previous review, parent is selected by device
tree and should never be changed after init.



Re: [PATCH v3 21/65] clk: davinci: da8xx-cfgchip: Add a determine_rate hook

2023-04-05 Thread David Lechner

On 4/4/23 5:11 AM, Maxime Ripard wrote:

The Davinci DA8xxx cfgchip mux clock implements a mux with a set_parent
hook, but doesn't provide a determine_rate implementation.

This is a bit odd, since set_parent() is there to, as its name implies,
change the parent of a clock. However, the most likely candidate to
trigger that parent change is a call to clk_set_rate(), with
determine_rate() figuring out which parent is the best suited for a
given rate.



As mentioned in my previous review, parent is selected by device
tree and should never be changed after init.




Re: [PATCH v3 22/65] clk: davinci: da8xx-cfgchip: Add a determine_rate hook

2023-04-05 Thread David Lechner

On 4/4/23 5:11 AM, Maxime Ripard wrote:

The Davinci DA8xxx cfgchip "clk48" clock implements a mux with a
set_parent hook, but doesn't provide a determine_rate implementation.

This is a bit odd, since set_parent() is there to, as its name implies,
change the parent of a clock. However, the most likely candidate to
trigger that parent change is a call to clk_set_rate(), with
determine_rate() figuring out which parent is the best suited for a
given rate.



As mentioned in my previous review, parent is selected by device
tree and should never be changed after init.




Re: [PATCH 0/4] log2: make is_power_of_2() more generic

2023-04-05 Thread Steven Price
On 31/03/2023 09:31, Jani Nikula wrote:
> On Thu, 30 Mar 2023, Andrew Morton  wrote:
>> On Thu, 30 Mar 2023 21:53:03 + David Laight  
>> wrote:
>>
 But wouldn't all these issues be addressed by simply doing

 #define is_power_of_2(n) (n != 0 && ((n & (n - 1)) == 0))

 ?

 (With suitable tweaks to avoid evaluating `n' more than once)
>>>
>>> I think you need to use the 'horrid tricks' from min() to get
>>> a constant expression from constant inputs.
>>
>> This
>>
>> --- a/include/linux/log2.h~a
>> +++ a/include/linux/log2.h
>> @@ -41,11 +41,11 @@ int __ilog2_u64(u64 n)
>>   * *not* considered a power of two.
>>   * Return: true if @n is a power of 2, otherwise false.
>>   */
>> -static inline __attribute__((const))
>> -bool is_power_of_2(unsigned long n)
>> -{
>> -return (n != 0 && ((n & (n - 1)) == 0));
>> -}
>> +#define is_power_of_2(_n)   \
>> +({  \
>> +typeof(_n) n = (_n);\
>> +n != 0 && ((n & (n - 1)) == 0); \
>> +})
>>  
>>  /**
>>   * __roundup_pow_of_two() - round up to nearest power of two
>> _
>>
>> worked for me in a simple test.
>>
>> --- a/fs/open.c~b
>> +++ a/fs/open.c
>> @@ -1564,3 +1564,10 @@ int stream_open(struct inode *inode, str
>>  }
>>  
>>  EXPORT_SYMBOL(stream_open);
>> +
>> +#include 
>> +
>> +int foo(void)
>> +{
>> +return is_power_of_2(43);
>> +}
>> _
>>
>>
>> foo:
>> # fs/open.c:1573: }
>>  xorl%eax, %eax  #
>>  ret 
>>
>>
>> Is there some more tricky situation where it breaks?
> 
> It doesn't work with BUILD_BUG_ON_ZERO().

Like most programming problems, you just need another layer of
indirection! The below works for me in all the cases I could think of
(including __uint128_t).


#define __IS_POWER_OF_2(n) (n != 0 && ((n & (n - 1)) == 0))

#define _IS_POWER_OF_2(n, unique_n) \
({  \
typeof(n) unique_n = (n);   \
__IS_POWER_OF_2(unique_n);  \
})

#define is_power_of_2(n)\
__builtin_choose_expr(__is_constexpr((n)),  \
  __IS_POWER_OF_2((n)), \
  _IS_POWER_OF_2(n, __UNIQUE_ID(_n)))


Although Jani's original might be easier to understand.
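
Either way, the practical payoff is that both kinds of use now compile
(a quick sketch on top of the macros above, assuming <linux/build_bug.h>
for BUILD_BUG_ON_ZERO()):

#define RING_SIZE 64

/* constant input: folds to an integer constant expression, so the
 * BUILD_BUG_ON_ZERO() trick works even in a file-scope array size
 */
static char ring[RING_SIZE + BUILD_BUG_ON_ZERO(!is_power_of_2(RING_SIZE))];

/* variable input: evaluated at run time, with 'n' evaluated only once */
static bool ring_size_ok(unsigned int n)
{
	return is_power_of_2(n);
}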

Steve


Re: [PATCH v3 54/65] clk: da8xx: clk48: Switch to determine_rate

2023-04-05 Thread Maxime Ripard
Hi David,

On Wed, Apr 05, 2023 at 10:03:24AM -0500, David Lechner wrote:
> On 4/4/23 5:11 AM, Maxime Ripard wrote:
> > The TI DA8xx USB0 clk48 clock implements a mux with a set_parent
> > hook, but doesn't provide a determine_rate implementation.
> > 
> > This is a bit odd, since set_parent() is there to, as its name implies,
> > change the parent of a clock. However, the most likely candidate to
> > trigger that parent change is a call to clk_set_rate(), with
> > determine_rate() figuring out which parent is the best suited for a
> > given rate.
> > 
> 
> As mentioned in my previous review, parent is selected by device
> tree and should never be changed after init.

Great minds think alike then, because the driver implements exactly
that, either before or after that patch.

That patch makes the current behaviour explicit but doesn't change it in
any way.

So I guess that means that I can add your Acked-by on the three patches
you reviewed with the same message?

Maxime




Re: [PATCH v3 56/65] clk: ingenic: cgu: Switch to determine_rate

2023-04-05 Thread Maxime Ripard
Hi Paul,

On Wed, Apr 05, 2023 at 03:04:05PM +0200, Paul Cercueil wrote:
> On Tuesday, 4 April 2023 at 12:11 +0200, Maxime Ripard wrote:
> > The Ingenic CGU clocks implements a mux with a set_parent hook, but
> > doesn't provide a determine_rate implementation.
> > 
> > This is a bit odd, since set_parent() is there to, as its name
> > implies,
> > change the parent of a clock. However, the most likely candidate to
> > trigger that parent change is a call to clk_set_rate(), with
> > determine_rate() figuring out which parent is the best suited for a
> > given rate.
> > 
> > The other trigger would be a call to clk_set_parent(), but it's far
> > less
> > used, and it doesn't look like there's any obvious user for that
> > clock.
> > 
> > So, the set_parent hook is effectively unused, possibly because of an
> > oversight. However, it could also be an explicit decision by the
> > original author to avoid any reparenting but through an explicit call
> > to
> > clk_set_parent().
> 
> As I said in the v2 (IIRC), clk_set_parent() is used when re-parenting
> from the device tree.

Yep, it's indeed an oversight in the commit log.

> > The driver does implement round_rate() though, which means that we
> > can
> > change the rate of the clock, but we will never get to change the
> > parent.
> > 
> > However, it's hard to tell whether it's been done on purpose or not.
> > 
> > Since we'll start mandating a determine_rate() implementation, let's
> > convert the round_rate() implementation to a determine_rate(), which
> > will also make the current behavior explicit. And if it was an
> > oversight, the clock behaviour can be adjusted later on.
> 
> So just to be sure, this patch won't make clk_set_rate() automatically
> switch parents, right?
> 
> Allowing automatic re-parenting sounds like a huge can of worms...

The behaviour is strictly equivalent before that patch and after: the
driver will not reparent on a rate change. It just makes it explicit.
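
For reference, the conversion itself is mechanical. Its shape is roughly
the following (hypothetical function names, following the clk_ops
signatures):

static int ingenic_clk_determine_rate(struct clk_hw *hw,
				      struct clk_rate_request *req)
{
	long rate = ingenic_clk_round_rate(hw, req->rate,
					   &req->best_parent_rate);

	if (rate < 0)
		return rate;

	req->rate = rate;
	return 0;	/* the parent is left untouched, as before */
}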

Maxime




Re: [PATCH v3 43/65] ASoC: tlv320aic32x4: Add a determine_rate hook

2023-04-05 Thread Maxime Ripard
Hi Mark,

On Tue, Apr 04, 2023 at 04:26:18PM +0100, Mark Brown wrote:
> On Tue, Apr 04, 2023 at 12:11:33PM +0200, Maxime Ripard wrote:
> > The tlv320aic32x4 clkin clock implements a mux with a set_parent hook,
> > but doesn't provide a determine_rate implementation.
> 
> > This is a bit odd, since set_parent() is there to, as its name implies,
> > change the parent of a clock. However, the most likely candidate to
> > trigger that parent change is a call to clk_set_rate(), with
> > determine_rate() figuring out which parent is the best suited for a
> > given rate.
> 
> > The other trigger would be a call to clk_set_parent(), but it's far less
> > used, and it doesn't look like there's any obvious user for that clock.
> 
> It could be configured from device tree as well couldn't it?

Yep, indeed.

> > So, the set_parent hook is effectively unused, possibly because of an
> > oversight. However, it could also be an explicit decision by the
> > original author to avoid any reparenting but through an explicit call to
> > clk_set_parent().
> 
> Historically clk_set_rate() wouldn't reparent IIRC.
> 
> > The latter case would be equivalent to setting the flag
> > CLK_SET_RATE_NO_REPARENT, together with setting our determine_rate hook
> > to __clk_mux_determine_rate(). Indeed, if no determine_rate
> > implementation is provided, clk_round_rate() (through
> > clk_core_round_rate_nolock()) will call itself on the parent if
> > CLK_SET_RATE_PARENT is set, and will not change the clock rate
> > otherwise. __clk_mux_determine_rate() has the exact same behavior when
> > CLK_SET_RATE_NO_REPARENT is set.
> 
> > And if it was an oversight, then we are at least explicit about our
> > behavior now and it can be further refined down the line.
> 
> To be honest it's surprising that we'd have to manually specify this, I
> would expect to be able to reparent.  I suspect it'd be better to go the
> other way here and allow reparenting.

Yeah, I think I'd prefer to allow reparenting too, but as can be seen
from the other reviewers in that thread, it seems like we have a very
split community here, so that doesn't sound very realistic without some
major pushback :)
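
To spell out the equivalence described in the commit log, the explicit
form is just this (a sketch; the ops struct and the other callbacks are
hypothetical):

static const struct clk_ops clkin_mux_ops = {
	.set_parent	= clkin_set_parent,
	.get_parent	= clkin_get_parent,
	.determine_rate	= __clk_mux_determine_rate,
};

/* registered with init.flags |= CLK_SET_RATE_NO_REPARENT, so
 * clk_set_rate() keeps the current parent; dropping that flag is all
 * it would take to allow the reparenting discussed here.
 */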

Maxime




Re: [PATCH v3 64/65] ASoC: tlv320aic32x4: div: Switch to determine_rate

2023-04-05 Thread Mark Brown
On Tue, Apr 04, 2023 at 12:11:54PM +0200, Maxime Ripard wrote:

> The driver does implement round_rate() though, which means that we can
> change the rate of the clock, but we will never get to change the
> parent.

> However, it's hard to tell whether it's been done on purpose or not.

> Since we'll start mandating a determine_rate() implementation, let's
> convert the round_rate() implementation to a determine_rate(), which
> will also make the current behavior explicit. And if it was an
> oversight, the clock behaviour can be adjusted later on.

Similar comments to the other patch, I'm pretty sure this is just
surprising design on the part of the clock API and we should just allow
reparenting.




Re: [PATCH v3 63/65] ASoC: tlv320aic32x4: pll: Switch to determine_rate

2023-04-05 Thread Mark Brown
On Tue, Apr 04, 2023 at 12:11:53PM +0200, Maxime Ripard wrote:

> The driver does implement round_rate() though, which means that we can
> change the rate of the clock, but we will never get to change the
> parent.

> However, it's hard to tell whether it's been done on purpose or not.

> Since we'll start mandating a determine_rate() implementation, let's
> convert the round_rate() implementation to a determine_rate(), which
> will also make the current behavior explicit. And if it was an
> oversight, the clock behaviour can be adjusted later on.

Similar comments to the other patch, I'm pretty sure this is just
surprising design on the part of the clock API and we should just allow
reparenting.




Re: [PATCH 0/7] drm/tegra: Convert fbdev to DRM client

2023-04-05 Thread Thomas Zimmermann



On 05.04.23 at 16:55, Thierry Reding wrote:

On Thu, Mar 30, 2023 at 10:36:00AM +0200, Thomas Zimmermann wrote:

Convert tegra's fbdev code to struct drm_client. Replaces the current
ad-hoc integration. The conversion includes a number of cleanups. As
with most other drivers' fbdev emulation, fbdev in tegra is now just
another DRM client that runs after the DRM device has been registered.

Once all drivers' fbdev emulation has been converted to struct drm_client,
we can attempt to add additional in-kernel clients. A DRM-based dmesg
log or a bootsplash are commonly mentioned. DRM can then switch easily
among the existing clients if/when required.

I did the conversion from similar experience with other drivers. But I
don't have the hardware to test this. Any testing is welcome.

Thomas Zimmermann (7):
   drm/tegra: Include 
   drm/tegra: Include 
   drm/tegra: Removed fb from struct tegra_fbdev
   drm/tegra: Remove struct tegra_fbdev
   drm/tegra: Hide fbdev support behind config option
   drm/tegra: Initialize fbdev DRM client
   drm/tegra: Implement fbdev emulation as in-kernel client

  drivers/gpu/drm/tegra/Makefile |   2 +
  drivers/gpu/drm/tegra/drm.c|  23 +---
  drivers/gpu/drm/tegra/drm.h|  27 ++--
  drivers/gpu/drm/tegra/fb.c | 242 +
  drivers/gpu/drm/tegra/fbdev.c  | 240 
  drivers/gpu/drm/tegra/output.c |   3 +
  drivers/gpu/drm/tegra/rgb.c|   1 +
  7 files changed, 265 insertions(+), 273 deletions(-)
  create mode 100644 drivers/gpu/drm/tegra/fbdev.c


Seems to be working just fine. Applied, thanks.


Thanks a lot!



Thierry


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev




[PATCH 16/18] arch/sparc: Implement fb_is_primary_device() in source file

2023-04-05 Thread Thomas Zimmermann
Other architectures implement fb_is_primary_device() in a source
file. Do the same on sparc. No functional changes, but this allows
removing several include statements from .

Signed-off-by: Thomas Zimmermann 
Cc: "David S. Miller" 
---
 arch/sparc/Makefile |  1 +
 arch/sparc/include/asm/fb.h | 22 +-
 arch/sparc/video/Makefile   |  3 +++
 arch/sparc/video/fbdev.c| 24 
 4 files changed, 33 insertions(+), 17 deletions(-)
 create mode 100644 arch/sparc/video/Makefile
 create mode 100644 arch/sparc/video/fbdev.c

diff --git a/arch/sparc/Makefile b/arch/sparc/Makefile
index a4ea5b05f288..95a9211e48e3 100644
--- a/arch/sparc/Makefile
+++ b/arch/sparc/Makefile
@@ -60,6 +60,7 @@ libs-y += arch/sparc/prom/
 libs-y += arch/sparc/lib/
 
 drivers-$(CONFIG_PM) += arch/sparc/power/
+drivers-$(CONFIG_FB) += arch/sparc/video/
 
 boot := arch/sparc/boot
 
diff --git a/arch/sparc/include/asm/fb.h b/arch/sparc/include/asm/fb.h
index f699962e9ddf..e4ef1955b2b6 100644
--- a/arch/sparc/include/asm/fb.h
+++ b/arch/sparc/include/asm/fb.h
@@ -1,12 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _SPARC_FB_H_
 #define _SPARC_FB_H_
-#include 
-#include 
+
 #include 
+
 #include 
 #include 
 
+struct fb_info;
+
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
unsigned long off)
 {
@@ -15,20 +17,6 @@ static inline void fb_pgprotect(struct file *file, struct 
vm_area_struct *vma,
 #endif
 }
 
-static inline int fb_is_primary_device(struct fb_info *info)
-{
-   struct device *dev = info->device;
-   struct device_node *node;
-
-   if (console_set_on_cmdline)
-   return 0;
-
-   node = dev->of_node;
-   if (node &&
-   node == of_console_device)
-   return 1;
-
-   return 0;
-}
+int fb_is_primary_device(struct fb_info *info);
 
 #endif /* _SPARC_FB_H_ */
diff --git a/arch/sparc/video/Makefile b/arch/sparc/video/Makefile
new file mode 100644
index ..6baddbd58e4d
--- /dev/null
+++ b/arch/sparc/video/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_FB) += fbdev.o
diff --git a/arch/sparc/video/fbdev.c b/arch/sparc/video/fbdev.c
new file mode 100644
index ..dadd5799fbb3
--- /dev/null
+++ b/arch/sparc/video/fbdev.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+int fb_is_primary_device(struct fb_info *info)
+{
+   struct device *dev = info->device;
+   struct device_node *node;
+
+   if (console_set_on_cmdline)
+   return 0;
+
+   node = dev->of_node;
+   if (node && node == of_console_device)
+   return 1;
+
+   return 0;
+}
+EXPORT_SYMBOL(fb_is_primary_device);
-- 
2.40.0



[PATCH 14/18] arch/powerpc: Implement with generic helpers

2023-04-05 Thread Thomas Zimmermann
Replace the architecture's fb_is_primary_device() with the generic
one from . No functional changes.

Signed-off-by: Thomas Zimmermann 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
---
 arch/powerpc/include/asm/fb.h | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/fb.h b/arch/powerpc/include/asm/fb.h
index 6541ab77c5b9..5f1a2e5f7654 100644
--- a/arch/powerpc/include/asm/fb.h
+++ b/arch/powerpc/include/asm/fb.h
@@ -2,8 +2,8 @@
 #ifndef _ASM_FB_H_
 #define _ASM_FB_H_
 
-#include 
 #include 
+
 #include 
 
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
@@ -13,10 +13,8 @@ static inline void fb_pgprotect(struct file *file, struct 
vm_area_struct *vma,
 vma->vm_end - vma->vm_start,
 vma->vm_page_prot);
 }
+#define fb_pgprotect fb_pgprotect
 
-static inline int fb_is_primary_device(struct fb_info *info)
-{
-   return 0;
-}
+#include 
 
 #endif /* _ASM_FB_H_ */
-- 
2.40.0



[PATCH 12/18] arch/parisc: Implement fb_is_primary_device() under arch/parisc

2023-04-05 Thread Thomas Zimmermann
Move PARISC's implementation of fb_is_primary_device() into the
architecture directory. This is the place of the declaration and
where other architectures implement this function. No functional
changes.

Signed-off-by: Thomas Zimmermann 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
---
 arch/parisc/Makefile |  2 ++
 arch/parisc/include/asm/fb.h |  2 +-
 arch/parisc/video/Makefile   |  3 +++
 arch/parisc/video/fbdev.c| 27 +++
 drivers/video/sticore.c  | 19 ---
 include/video/sticore.h  |  2 ++
 6 files changed, 35 insertions(+), 20 deletions(-)
 create mode 100644 arch/parisc/video/Makefile
 create mode 100644 arch/parisc/video/fbdev.c

diff --git a/arch/parisc/Makefile b/arch/parisc/Makefile
index 0d049a6f6a60..968ebe17494c 100644
--- a/arch/parisc/Makefile
+++ b/arch/parisc/Makefile
@@ -119,6 +119,8 @@ export LIBGCC
 
 libs-y += arch/parisc/lib/ $(LIBGCC)
 
+drivers-y += arch/parisc/video/
+
 boot   := arch/parisc/boot
 
 PALO := $(shell if (which palo 2>&1); then : ; \
diff --git a/arch/parisc/include/asm/fb.h b/arch/parisc/include/asm/fb.h
index 55d29c4f716e..0b9a38ced5c8 100644
--- a/arch/parisc/include/asm/fb.h
+++ b/arch/parisc/include/asm/fb.h
@@ -12,7 +12,7 @@ static inline void fb_pgprotect(struct file *file, struct 
vm_area_struct *vma,
pgprot_val(vma->vm_page_prot) |= _PAGE_NO_CACHE;
 }
 
-#if defined(CONFIG_FB_STI)
+#if defined(CONFIG_STI_CORE)
 int fb_is_primary_device(struct fb_info *info);
 #else
 static inline int fb_is_primary_device(struct fb_info *info)
diff --git a/arch/parisc/video/Makefile b/arch/parisc/video/Makefile
new file mode 100644
index ..16a73cce4661
--- /dev/null
+++ b/arch/parisc/video/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_STI_CORE) += fbdev.o
diff --git a/arch/parisc/video/fbdev.c b/arch/parisc/video/fbdev.c
new file mode 100644
index ..4a0ae08fc75b
--- /dev/null
+++ b/arch/parisc/video/fbdev.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2000 Philipp Rumpf 
+ * Copyright (C) 2001-2020 Helge Deller 
+ * Copyright (C) 2001-2002 Thomas Bogendoerfer 
+ */
+
+#include 
+
+#include 
+
+#include 
+
+int fb_is_primary_device(struct fb_info *info)
+{
+   struct sti_struct *sti;
+
+   sti = sti_get_rom(0);
+
+   /* if no built-in graphics card found, allow any fb driver as default */
+   if (!sti)
+   return true;
+
+   /* return true if it's the default built-in framebuffer driver */
+   return (sti->info == info);
+}
+EXPORT_SYMBOL(fb_is_primary_device);
diff --git a/drivers/video/sticore.c b/drivers/video/sticore.c
index f8aaedea437d..7eb925f2ba9c 100644
--- a/drivers/video/sticore.c
+++ b/drivers/video/sticore.c
@@ -30,7 +30,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 
@@ -1148,24 +1147,6 @@ int sti_call(const struct sti_struct *sti, unsigned long 
func,
return ret;
 }
 
-#if defined(CONFIG_FB_STI)
-/* check if given fb_info is the primary device */
-int fb_is_primary_device(struct fb_info *info)
-{
-   struct sti_struct *sti;
-
-   sti = sti_get_rom(0);
-
-   /* if no built-in graphics card found, allow any fb driver as default */
-   if (!sti)
-   return true;
-
-   /* return true if it's the default built-in framebuffer driver */
-   return (sti->info == info);
-}
-EXPORT_SYMBOL(fb_is_primary_device);
-#endif
-
 MODULE_AUTHOR("Philipp Rumpf, Helge Deller, Thomas Bogendoerfer");
 MODULE_DESCRIPTION("Core STI driver for HP's NGLE series graphics cards in HP 
PARISC machines");
 MODULE_LICENSE("GPL v2");
diff --git a/include/video/sticore.h b/include/video/sticore.h
index c0879352cde4..fbb78d7e7565 100644
--- a/include/video/sticore.h
+++ b/include/video/sticore.h
@@ -2,6 +2,8 @@
 #ifndef STICORE_H
 #define STICORE_H
 
+struct fb_info;
+
 /* generic STI structures & functions */
 
 #define MAX_STI_ROMS 4 /* max no. of ROMs which this driver handles */
-- 
2.40.0



[PATCH 04/18] arch/arm64: Implement with generic helpers

2023-04-05 Thread Thomas Zimmermann
Replace the architecture's fb_is_primary_device() with the generic
one from . No functional changes.

Signed-off-by: Thomas Zimmermann 
Cc: Catalin Marinas 
Cc: Will Deacon 
---
 arch/arm64/include/asm/fb.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/fb.h b/arch/arm64/include/asm/fb.h
index bdc735ee1f67..fc31a5d1f48a 100644
--- a/arch/arm64/include/asm/fb.h
+++ b/arch/arm64/include/asm/fb.h
@@ -5,19 +5,17 @@
 #ifndef __ASM_FB_H_
 #define __ASM_FB_H_
 
-#include 
-#include 
 #include 
 
+struct file;
+
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
unsigned long off)
 {
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 }
+#define fb_pgprotect fb_pgprotect
 
-static inline int fb_is_primary_device(struct fb_info *info)
-{
-   return 0;
-}
+#include 
 
 #endif /* __ASM_FB_H_ */
-- 
2.40.0



[PATCH 15/18] arch/sh: Implement with generic helpers

2023-04-05 Thread Thomas Zimmermann
Replace the architecture's fb_is_primary_device() with the generic
one from . No functional changes.

Signed-off-by: Thomas Zimmermann 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: John Paul Adrian Glaubitz 
---
 arch/sh/include/asm/fb.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/sh/include/asm/fb.h b/arch/sh/include/asm/fb.h
index 9a0bca2686fd..1e7b1cfd5b5e 100644
--- a/arch/sh/include/asm/fb.h
+++ b/arch/sh/include/asm/fb.h
@@ -2,19 +2,17 @@
 #ifndef _ASM_FB_H_
 #define _ASM_FB_H_
 
-#include 
-#include 
 #include 
 
+struct file;
+
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
unsigned long off)
 {
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 }
+#define fb_pgprotect fb_pgprotect
 
-static inline int fb_is_primary_device(struct fb_info *info)
-{
-   return 0;
-}
+#include 
 
 #endif /* _ASM_FB_H_ */
-- 
2.40.0



[PATCH 18/18] arch/x86: Implement with generic helpers

2023-04-05 Thread Thomas Zimmermann
Include  and set the required preprocessor tokens
correctly. x86 now implements its own set of fb helpers, but still
follows the overall pattern.

Signed-off-by: Thomas Zimmermann 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
---
 arch/x86/include/asm/fb.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fb.h b/arch/x86/include/asm/fb.h
index ab4c960146e3..a3fb801f12f1 100644
--- a/arch/x86/include/asm/fb.h
+++ b/arch/x86/include/asm/fb.h
@@ -2,10 +2,11 @@
 #ifndef _ASM_X86_FB_H
 #define _ASM_X86_FB_H
 
-#include 
-#include 
 #include 
 
+struct fb_info;
+struct file;
+
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
unsigned long off)
 {
@@ -16,7 +17,11 @@ static inline void fb_pgprotect(struct file *file, struct 
vm_area_struct *vma,
pgprot_val(vma->vm_page_prot) =
prot | cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS);
 }
+#define fb_pgprotect fb_pgprotect
+
+int fb_is_primary_device(struct fb_info *info);
+#define fb_is_primary_device fb_is_primary_device
 
-extern int fb_is_primary_device(struct fb_info *info);
+#include 
 
 #endif /* _ASM_X86_FB_H */
-- 
2.40.0



[PATCH 06/18] arch/loongarch: Implement with generic helpers

2023-04-05 Thread Thomas Zimmermann
Replace the architecture's fb_is_primary_device() with the generic
one from . No functional changes.

Signed-off-by: Thomas Zimmermann 
Cc: Huacai Chen 
Cc: WANG Xuerui 
---
 arch/loongarch/include/asm/fb.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/loongarch/include/asm/fb.h b/arch/loongarch/include/asm/fb.h
index 3116bde8772d..d1c9dd1c6e2e 100644
--- a/arch/loongarch/include/asm/fb.h
+++ b/arch/loongarch/include/asm/fb.h
@@ -5,19 +5,17 @@
 #ifndef _ASM_FB_H_
 #define _ASM_FB_H_
 
-#include 
-#include 
 #include 
 
+struct file;
+
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
unsigned long off)
 {
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 }
+#define fb_pgprotect fb_pgprotect
 
-static inline int fb_is_primary_device(struct fb_info *info)
-{
-   return 0;
-}
+#include 
 
 #endif /* _ASM_FB_H_ */
-- 
2.40.0



[PATCH 07/18] arch/m68k: Implement with generic helpers

2023-04-05 Thread Thomas Zimmermann
Replace the architecture's fb_is_primary_device() with the generic
one from . No functional changes. Also use the
generic helper for fb_pgprotect() on systems without MMU.

Signed-off-by: Thomas Zimmermann 
Cc: Geert Uytterhoeven 
---
 arch/m68k/include/asm/fb.h | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/m68k/include/asm/fb.h b/arch/m68k/include/asm/fb.h
index b86c6e2e26dd..f15a14e36826 100644
--- a/arch/m68k/include/asm/fb.h
+++ b/arch/m68k/include/asm/fb.h
@@ -2,8 +2,8 @@
 #ifndef _ASM_FB_H_
 #define _ASM_FB_H_
 
-#include 
 #include 
+
 #include 
 #include 
 
@@ -27,13 +27,9 @@ static inline void fb_pgprotect(struct file *file, struct 
vm_area_struct *vma,
}
 }
 #endif /* CONFIG_SUN3 */
-#else
-#define fb_pgprotect(...) do {} while (0)
+#define fb_pgprotect fb_pgprotect
 #endif /* CONFIG_MMU */
 
-static inline int fb_is_primary_device(struct fb_info *info)
-{
-   return 0;
-}
+#include 
 
 #endif /* _ASM_FB_H_ */
-- 
2.40.0



  1   2   3   >