Re: [PATCH v6 6/8] drm/i915/pxp: MTL-KCR interrupt ctrl's are in GT-0
> alan:snip
> > void intel_pxp_irq_enable(struct intel_pxp *pxp)
> > {
> > -	struct intel_gt *gt = pxp->ctrl_gt;
> > +	struct intel_gt *gt = intel_pxp_get_irq_gt(pxp);
>
> in this function we use the gt for:
>
> 1 - the lock: see above about this
> 2 - gen11_gt_reset_one_iir(): this should work with the media GT (we use
>     it for media GuC)
> 3 - writes to the GEN11_CRYPTO_* regs: those should also work with the
>     media GT uncore as these regs are in the same range as the GuC scratch
>     regs and we use the media uncore for those accesses.
>
> alan:snip
> > @@ -83,7 +101,7 @@ void intel_pxp_irq_enable(struct intel_pxp *pxp)
> >
> >  void intel_pxp_irq_disable(struct intel_pxp *pxp)
> > {
> > -	struct intel_gt *gt = pxp->ctrl_gt;
> > +	struct intel_gt *gt = intel_pxp_get_irq_gt(pxp);
>
> AFAICS this function uses the same 3 cases as above.
>
> Overall, I am not sure this patch is required. Am I missing something?

alan: Context: during development of my initial few revs, I needed to explicitly do that switch-over to the other GT in order to even get the IRQs (i.e. as if the forcewake didn't wake up the range)... but upon recent retesting it seems to work fine. I guess there must have been a bug somewhere else in my branch. So yes, I believe this means we can totally remove this patch.
Re: [PATCH v30 0/7] Add MediaTek SoC DRM (vdosys1) support for mt8195
On Mon, Apr 3, 2023 at 5:47 PM Krzysztof Kozlowski wrote:
>
> On 03/04/2023 05:30, Chun-Kuang Hu wrote:
> > Hi, Chen-yu:
> >
> > Chen-Yu Tsai wrote on Thursday, March 30, 2023 at 7:05 PM:
> >>
> >> On Mon, Mar 27, 2023 at 11:17 PM Chun-Kuang Hu wrote:
> >>>
> >>> Hi, Angelo:
> >>>
> >>> AngeloGioacchino Del Regno wrote on Friday, March 24, 2023 at 4:38 PM:
> On 24/03/23 00:25, Chun-Kuang Hu wrote:
> > Hi, Angelo:
> >
> > AngeloGioacchino Del Regno wrote on Thursday, March 23, 2023 at 4:58 PM:
> >>
> >> On 21/03/23 13:18, Nancy.Lin wrote:
> >>> The hardware path of vdosys1 with DPTx output needs to go through
> >>> several modules, such as OVL_ADAPTOR and MERGE.
> >>>
> >>> Add DRM and these modules support by the patches below:
> >>
> >> I've tested v30 again on MT8173, MT8192 and MT8195 based Chromebooks.
> >> Green light from me.
> >
> > I'm curious about how you build code and test on Chromebooks. Do you
> > build in the cros environment or pure Linux
> > (https://archlinuxarm.org/platforms/armv8/mediatek/acer-chromebook-r13)?
> > I have a MT8183 based Chromebook (HP 11a) and I've tried to run an
> > upstream kernel on it. cros is too heavy for me and I doubt I could
> > use it. I've tried the pure Linux and could boot up with console, but
> > display does not work. If you use the pure Linux environment, could
> > you share how it works?
>
> I haven't tested MT8183 (I don't actually have any 8183 machine in my
> hands)... but yes, I can share my test environment.
>
> I have one MicroSD that I use either in the MicroSD slot of the target
> machine, or in a USB reader; this *single* system is what I boot on *all*
> Chromebooks that I have: one kernel, multiple devicetrees, same
> Debian-based userspace.
>
> What we have to prepare this bootable media can be found at [1], but
> beware that it currently uses an outdated kernel, so, what I have locally
> is a symlink to my kernel tree.
>
> You can change/add/remove the devicetree blobs that will get added to
> the image by modifying `chromebook-setup.sh`; before tampering with the
> kernel tree symlink, please run that script for the first time, as it
> will download a cross-compiler, a kernel tree (that you will replace for
> sure) and the (very old) Debian rootfs that you can update with
> `apt-get dist-upgrade` after booting the Chromebook.
>
> If you want to check about possible kernel configuration differences,
> what I use is at [2], so that you can compare.
>
> >>> Thanks for the information, I would try to compare the kernel config
> >>> first.
> >>
> >> Hi CK,
> >>
> >> Would you consider adding your repo to linux-next? That would let everyone
> >> do integration testing, especially automated ones, earlier, before you send
> >> your PRs to drm maintainers.
> >>
> >> You can do so by sending an email to Stephen Rothwell to do so.
> >
> > I don't understand what this process is. Does it mean that I directly
> > upstream patches into linux-next? I prefer that my patches go through
> > drm maintainers' tree. Does any document introduce this process?
>
> All maintainers' and sub-maintainers' trees are supposed to be fed into
> linux-next.
>
> https://lore.kernel.org/linux-next/20230327124805.3ca4f...@canb.auug.org.au/T/#md226a8e714cc731c2ab4ba5ee7eb43fe21a55009
>
> Documentation/process/howto.rst
> Documentation/process/2.Process.rst

As Krzysztof mentioned, the purpose of linux-next is for integration
testing. Your queued up patches still go through the DRM tree for eventual
merging into Linus's tree. Getting included into linux-next means that your
branch gets automatically merged together with other branches, gets build
and boot tested by the various CI platforms and bots out there, and
provides a common base for other developers.

I don't think there is any downside to having your branch integrated into
linux-next.

ChenYu
Re: [PATCH v6 7/8] drm/i915/pxp: On MTL, KCR enabling doesn't wait on tee component
> alan:snip
> > @@ -140,10 +141,15 @@ static void pxp_init_full(struct intel_pxp *pxp)
> >  	if (ret)
> >  		return;
> >
> > -	if (HAS_ENGINE(pxp->ctrl_gt, GSC0))
> > +	if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
> >  		ret = intel_pxp_gsccs_init(pxp);
> > -	else
> > +		if (!ret) {
> > +			with_intel_runtime_pm(&pxp->ctrl_gt->i915->runtime_pm, wakeref)
> > +				intel_pxp_init_hw(pxp);
>
> personal preference: I'd move this (and the matching call in fini)
> inside intel_pxp_gsccs_init/fini. That way we can see this as more
> back-end specific: the gsccs initializes everything immediately, while
> the tee back-end follows a 2-step approach with the component.
> Not a blocker since it is a personal preference, so with or without the
> change:
>
> Reviewed-by: Daniele Ceraolo Spurio
>
> Daniele

alan: will make that change too - thanks for the R-b.
alan:snip
Re: [Intel-gfx] [PATCH] drm/atomic-helper: Don't set deadline for modesets
On Wed, Apr 05, 2023 at 06:50:22AM -0700, Rob Clark wrote:
> On Wed, Apr 5, 2023 at 6:31 AM Daniel Vetter wrote:
> >
> > If the crtc is being switched on or off then the semantics of
> > computing the timestamp of the next vblank is somewhat ill-defined.
> > And indeed, the code splats with a warning in the timestamp
> > computation code. Specifically it hits the check to make sure that
> > atomic drivers have fully set up the timing constants in the drm_vblank
> > structure, and that's just not the case before the crtc is actually
> > on.
> >
> > For robustness it seems best to just not set deadlines for modesets.
> >
> > v2: Also skip on inactive crtc (Ville)
> >
> > Link: https://lore.kernel.org/dri-devel/dfc21f18-7e1e-48f0-c05a-d659b9c90...@linaro.org/
> > Fixes: d39e48ca80c0 ("drm/atomic-helper: Set fence deadline for vblank")
> > Cc: Ville Syrjälä
> > Cc: Rob Clark
> > Cc: Daniel Vetter
> > Cc: Maarten Lankhorst
> > Cc: Maxime Ripard
> > Cc: Thomas Zimmermann
> > Reported-by: Dmitry Baryshkov
> > Tested-by: Dmitry Baryshkov # test patch only
> > Cc: Dmitry Baryshkov
> > Signed-off-by: Daniel Vetter
>
> Reviewed-by: Rob Clark

Merged to drm-misc-next, thanks for review.

> > ---
> >  drivers/gpu/drm/drm_atomic_helper.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index f21b5a74176c..d44fb9b87ef8 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1528,6 +1528,12 @@ static void set_fence_deadline(struct drm_device *dev,
> >  	for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
> >  		ktime_t v;
> >
> > +		if (drm_atomic_crtc_needs_modeset(new_crtc_state))
> > +			continue;
> > +
> > +		if (!new_crtc_state->active)
> > +			continue;
> > +
> >  		if (drm_crtc_next_vblank_start(crtc, &v))
> > 			continue;
> >
> > --
> > 2.40.0

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH RFC 18/18] drm/asahi: Add the Asahi driver for Apple AGX GPUs
Same as the prior email, this was supposed to go to Daniel...

On 06/04/2023 14.02, Asahi Lina wrote:

On 05/04/2023 23.44, Daniel Vetter wrote:

On Tue, Mar 07, 2023 at 11:25:43PM +0900, Asahi Lina wrote:

+/// Look up a GEM object handle for a `File` and return an `ObjectRef` for it.
+pub(crate) fn lookup_handle(file: &DrmFile, handle: u32) -> Result<ObjectRef> {
+    Ok(ObjectRef::new(shmem::Object::lookup_handle(file, handle)?))
+}

So maybe my expectations for rust typing is a bit too much, but I kinda expected this to be fully generic:

- trait Driver (drm_driver) knows the driver's object type
- a generic create_handle function could ensure that for drm_file (which is always for a specific drm_device and hence Driver) can ensure at the type level that you only put the right objects into the drm_file
- a generic lookup_handle function on the drm_file knows the Driver trait and so can give you back the right type right away.

Why the wrapping, what do I miss?

Sigh, so this is one of the many ways I'm trying to work around the "Rust doesn't do subclasses" problem (so we can figure out what the best one is ^^).

The generic shmem::Object::lookup_handle() call *is* fully generic and will get you back a driver-specific object. But since Rust doesn't do subclassing, what you get back isn't a driver-specific type T, but rather a (reference to a) shmem::Object<T>. T represents the inner driver-specific data/functionality (only), and the outer shmem::Object<T> includes the actual drm_gem_shmem_object plus a T. This is backwards from C, where you expect the opposite situation where T contains a shmem object, but that just doesn't work with Rust because there's no way to build a safe API around that model as far as I know.

Now the problem is from the higher layers I want object operations that interact with the shmem::Object<T> (that is, they call generic GEM functions on the object). Options so far:

1. Add an outer wrapper and put that functionality there.
2. Just have the functions on T as helpers, so you need to call T::foo(obj) instead of obj.foo().
3. Use the undocumented method receiver trait thing to make shmem::Object<T> a valid `self` type, plus add auto-Deref to shmem::Object<T>. Then obj.foo() works.

#1 is what I use here. #2 is how the driver-specific File ioctl callbacks are implemented, and also sched::Job. #3 is used for fence callbacks (FenceObject).

None of them are great, and I'd love to hear what people think of the various options...

There are other unexplored options, like in this GEM case it could be covered with a driver-internal auxiliary trait impl'd on shmem::Object<T>, buuut that doesn't work when you actually need callbacks on T itself to circle back to shmem::Object<T>, as is the case with File/Job/FenceObject.

~~ Lina

~~ Lina
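A minimal, self-contained sketch of options 1 and 2 above, using plain Rust (not the real kernel bindings — `ShmemObject<T>`, `DriverData`, and `ObjectRef` are illustrative stand-ins): the generic outer object owns the driver data `T`, a newtype wrapper carries the driver-level methods, and `Deref` lets the wrapper still expose the generic GEM-level API.

```rust
use std::ops::Deref;

/// Stand-in for shmem::Object<T>: generic GEM state plus driver data T.
struct ShmemObject<T> {
    size: usize, // stands in for the drm_gem_shmem_object state
    inner: T,    // driver-specific data
}

impl<T> ShmemObject<T> {
    /// A "generic GEM function" available for any driver's T.
    fn size(&self) -> usize {
        self.size
    }
}

/// Stand-in for the driver-specific inner type.
struct DriverData {
    vm_id: u64,
}

/// Option 1: outer wrapper; driver-level operations live here.
struct ObjectRef(ShmemObject<DriverData>);

impl ObjectRef {
    fn map_size(&self) -> usize {
        self.0.size() // calls through to the generic GEM API
    }
}

/// Deref so the wrapper still exposes the generic object's methods.
impl Deref for ObjectRef {
    type Target = ShmemObject<DriverData>;
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

/// Option 2: helpers on T that take the generic object as an argument,
/// i.e. T::foo(obj) instead of obj.foo().
impl DriverData {
    fn vm_of(obj: &ShmemObject<DriverData>) -> u64 {
        obj.inner.vm_id
    }
}

fn main() {
    let r = ObjectRef(ShmemObject { size: 4096, inner: DriverData { vm_id: 7 } });
    assert_eq!(r.map_size(), 4096); // option 1: obj.foo() on the wrapper
    assert_eq!(DriverData::vm_of(&r), 7); // option 2: T::foo(obj), via deref coercion
}
```

The `&ObjectRef` argument coerces to `&ShmemObject<DriverData>` through the `Deref` impl, which is what makes the wrapper mostly transparent; option 3 in the email avoids the wrapper entirely by making the generic object a valid method receiver.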
Re: [PATCH RFC 18/18] drm/asahi: Add the Asahi driver for Apple AGX GPUs
Argh. This (and my other reply) was supposed to go to Daniel, but Thunderbird... just dropped that recipient? And then my silly brain saw all the Cc:s go to To: and figured it was some weird consolidation, and so I moved everything to Cc: except the only name that started with "Da" and... yeah, that wasn't the same person. Sorry for the confusion... I have no idea why Thunderbird hates Daniel...

On 06/04/2023 13.44, Asahi Lina wrote:

On 05/04/2023 23.37, Daniel Vetter wrote:

On Tue, Mar 07, 2023 at 11:25:43PM +0900, Asahi Lina wrote:

+/// A generic monotonically incrementing ID used to uniquely identify object instances within the
+/// driver.
+pub(crate) struct ID(AtomicU64);
+
+impl ID {
+    /// Create a new ID counter with a given value.
+    fn new(val: u64) -> ID {
+        ID(AtomicU64::new(val))
+    }
+
+    /// Fetch the next unique ID.
+    pub(crate) fn next(&self) -> u64 {
+        self.0.fetch_add(1, Ordering::Relaxed)
+    }
+}

Continuing the theme of me commenting on individual things, I stumbled over this because I noticed that there's a lot of id based lookups where I don't expect them, and started chasing.

- For ids use xarray, not atomic counters. Yes I know dma_fence timelines gets this wrong, this goes back to an innocent time where we didn't allocate more than one timeline per engine, and no one fixed it since then. Yes u64 should be big enough for everyone :-/
- Attaching ID spaces to drm_device is also not great. drm is full of these mistakes. Much better if they're per drm_file and so private to each client.
- They shouldn't be used for anything else than uapi id -> kernel object lookup at the beginning of ioctl code, and nowhere else. At least from skimming it seems like these are used all over the driver codebase, which does freak me out. At least on the C side that's a clear indicator for a refcount/locking/data structure model that's not thought out at all.

What's going on here, what do I miss?
These aren't UAPI IDs, they are driver-internal IDs (the UAPI IDs do use xarray and are per-File). Most of them are just for debugging, so that when I enable full debug spam I have some way to correlate different things that are happening together (this subset of interleaved log lines relates to the same submission). Basically just object names that are easier to read (and less of a security leak) than pointers and guaranteed not to repeat. You could get rid of most of them and it wouldn't affect the driver design, it just makes it very hard to see what's going on with debug logs ^^;

There are only two that are ever used for non-debugging purposes: the VM ID, and the File ID. Both are per-device global IDs attached to the VMs (not the UAPI VM objects, but rather the underlying MMU address space managers they represent, including the kernel-internal ones) and to Files themselves. They are used for destroying GEM objects: since the objects are also device-global across multiple clients, I need a way to do things like "clean up all mappings for this File" or "clean up all mappings for this VM". There's an annoying circular reference between GEM objects and their mappings, which is why this is explicitly coded out in destroy paths instead of naturally happening via Drop semantics (without that cleanup code, the circular reference leaks it).

So e.g. when a File does a GEM close or explicitly asks for all mappings of an object to be removed, it goes out to the (possibly shared) GEM object and tells it to drop all mappings marked as owned by that unique File ID. When an explicit "unmap all in VM" op happens, it asks the GEM object to drop all mappings for that underlying VM ID.
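The two pieces described above — a never-repeating counter and cleanup-by-owner-ID on a shared object — can be sketched in plain Rust (all names here are illustrative stand-ins, not the actual driver types):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Mirrors the quoted ID type: a monotonically incrementing, never-repeating counter.
struct Id(AtomicU64);

impl Id {
    fn new(val: u64) -> Id {
        Id(AtomicU64::new(val))
    }
    fn next(&self) -> u64 {
        self.0.fetch_add(1, Ordering::Relaxed)
    }
}

/// Toy shared GEM object: mappings keyed by the owning File/VM ID, so that
/// "clean up all mappings for this File/VM" is a single keyed removal.
struct GemObject {
    mappings: HashMap<u64, Vec<usize>>, // owner id -> mapped addresses
}

impl GemObject {
    fn drop_mappings_for(&mut self, owner: u64) {
        self.mappings.remove(&owner);
    }
}

fn main() {
    let ids = Id::new(1);
    let (file_a, file_b) = (ids.next(), ids.next());
    assert!(file_b > file_a); // unique, monotonically increasing

    let mut obj = GemObject { mappings: HashMap::new() };
    obj.mappings.insert(file_a, vec![0x1000]);
    obj.mappings.insert(file_b, vec![0x2000]);

    // "clean up all mappings for this File" without touching other owners
    obj.drop_mappings_for(file_a);
    assert!(!obj.mappings.contains_key(&file_a));
    assert!(obj.mappings.contains_key(&file_b));
}
```

This is only the ownership-tagging idea; the real driver additionally has the reference cycle between VMs and GEM objects that forces this cleanup to be explicit rather than Drop-driven.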
Similarly, when a UAPI VM object is dropped (in the Drop impl, so both explicitly and when the whole File/xarray is dropped and such), that does an explicit unmap of a special dummy object it owns which would otherwise leak, since it is not tracked as a GEM object owned by that File and therefore not handled by GEM closing. And again along the same lines, the allocators in alloc.rs explicitly destroy the mappings for their backing GEM objects on Drop. All this is due to that annoying circular reference between VMs and GEM objects that I'm not sure how to fix.

Note that if I *don't* do this (or forget to do it somewhere) the consequence is just that we leak memory, and if you try to destroy the wrong IDs somehow the worst that can happen is you unmap things you shouldn't and fault the GPU (or, in the kernel or kernel-managed user VM cases, potentially the firmware). Rust safety guarantees still keep things from going entirely off the rails within the kernel, since everything that matters is reference counted (which is why these reference cycles are possible at all).

This all started when I was looking at the panfrost driver for reference. It does the same thing except it uses actual pointers to the owning entities instead of IDs, and pointer comparison (see
Re: [Intel-gfx] [PATCH] i915/guc/slpc: Provide sysfs for efficient freq
On Wed, 05 Apr 2023 13:12:29 -0700, Rodrigo Vivi wrote:
>
> On Wed, Apr 05, 2023 at 12:42:30PM -0700, Dixit, Ashutosh wrote:
> > On Wed, 05 Apr 2023 06:57:42 -0700, Rodrigo Vivi wrote:

Hi Rodrigo,

> > > On Fri, Mar 31, 2023 at 08:11:29PM -0700, Dixit, Ashutosh wrote:
> > > > On Fri, 31 Mar 2023 19:00:49 -0700, Vinay Belgaumkar wrote:
> > > >
> > > > Hi Vinay,
> > > >
> > > > > @@ -478,20 +507,15 @@ int intel_guc_slpc_set_min_freq(struct intel_guc_slpc *slpc, u32 val)
> > > > >  		val > slpc->max_freq_softlimit)
> > > > >  		return -EINVAL;
> > > > >
> > > > > +	/* Ignore efficient freq if lower min freq is requested */
> > > > > +	ret = intel_guc_slpc_set_ignore_eff_freq(slpc, val < slpc->rp1_freq);
> > > > > +	if (ret)
> > > > > +		goto out;
> > > > > +
> > > >
> > > > I don't agree with this. If we are now providing an interface explicitly to
> > > > ignore RPe, that should be the /only/ way to ignore RPe. There should be no
> > > > other "under the hood" ignoring of RPe. In other words, ignoring RPe should
> > > > be minimized unless explicitly requested.
> > > >
> > > > I don't clearly understand why this was done previously but it makes even
> > > > less sense to me now after this patch.
> > >
> > > well, I had suggested this previously. And just because without this we
> > > would be breaking API expectations.
> > >
> > > When the user selects a minimal frequency they expect that to stick. But
> > > with the efficient freq enabled in guc, if minimal is less than the
> > > efficient one, this request is likely ignored.
> > >
> > > Well, even worse is that we are actually caching the request in the soft
> > > values. So we show a minimal, but the hardware without any workload is
> > > operating at efficient.
> > >
> > > So, the thought process was: 'if user requested a very low minimal, we
> > > give them the minimal requested, even if that means to disable the
> > > efficient freq.'
> >
> > Hmm, I understand this even less now :)
> >
> > * Why is RPe ignored when min < RPe? Since the freq can be between min and
> >   max? Shouldn't the condition be min > RPe, that is turn RPe off if a min
> >   higher than RPe is requested?
>
> that is not how guc efficient freq selection works. (unless my memory is
> tricking me right now.)
>
> So, if we select a min that is between RPe and RP0, guc will respect and
> use the selected min. So we don't need to disable guc selection of the
> efficient.
>
> This is not true when we select a very low min like RPn. If we select RPn
> as min and guc efficient freq selection is enabled, guc will simply ignore
> our request. So the only way to give the user what is asked is to also
> disable guc's efficient freq selection. (I probably confused you in the
> previous email because I used 'RP0' when I meant 'RPn'. I hope it gets
> clear now.)
>
> > * Also isn't RPe dynamic, so we can't say RPe == rp1 when using in KMD?
>
> Oh... yeap, this is an issue indeed. Specially with i915 where we have
> the soft values cached instead of asking guc everytime.
>
> That's a good point. The variance is not big, but we will hit corner cases.
> One way is to keep checking and updating everytime a sysfs is touched.

This I believe is not possible in all cases. Say the freqs are set through
sysfs first and the workload starts later. In this case RPe will probably
start changing after the workload starts, not when the freqs are set in
sysfs.

> Other way is do what you are suggesting and let's just accept and deal
> with the reality that is: "we cannot guarantee a min freq selection if the
> user doesn't disable the efficient freq selection".
>
> > * Finally, we know that enabling RPe broke the kernel freq API because RPe
> >   could go over max_freq. So it is actually the max freq which is not
> >   obeyed after RPe is enabled.
>
> Oh! so it was my bad memory indeed and everything is the other way around?
>
> But I just looked at the Xe code, my most recent memory, and I just needed
> to toggle the efficient freq off in the case that I mentioned, when the min
> selection is below the efficient one. With that, all the API expectations
> that I coded in IGT work neatly.

From what I saw the following bugs:

https://gitlab.freedesktop.org/drm/intel/-/issues/6806
https://gitlab.freedesktop.org/drm/intel/-/issues/6786

and the following patches in response to these bugs (as well as the
discussion on these patches):

https://patchwork.freedesktop.org/series/111282/
https://patchwork.freedesktop.org/series/110574/

were all due to the fact that the max freq is not obeyed after RPe is
enabled. These patches were never merged but I will now try to submit them
again after this ignore efficient freq patch gets reviewed and merged.

Thanks.
--
Ashutosh

> >
> > So we ignore RPe in some select cases (which also I don't understand as
> > mentioned above but maybe I am missing something) to
Re: [PATCH RFC 18/18] drm/asahi: Add the Asahi driver for Apple AGX GPUs
On 05/04/2023 23.44, Daniel Vetter wrote: On Tue, Mar 07, 2023 at 11:25:43PM +0900, Asahi Lina wrote: +/// Look up a GEM object handle for a `File` and return an `ObjectRef` for it. +pub(crate) fn lookup_handle(file: , handle: u32) -> Result { +Ok(ObjectRef::new(shmem::Object::lookup_handle(file, handle)?)) +} So maybe my expectations for rust typing is a bit too much, but I kinda expected this to be fully generic: - trait Driver (drm_driver) knows the driver's object type - a generic create_handle function could ensure that for drm_file (which is always for a specific drm_device and hence Driver) can ensure at the type level that you only put the right objects into the drm_file - a generic lookup_handle function on the drm_file knows the Driver trait and so can give you back the right type right away. Why the wrapping, what do I miss? Sigh, so this is one of the many ways I'm trying to work around the "Rust doesn't do subclasses" problem (so we can figure out what the best one is ^^). The generic shmem::Object::lookup_handle() call *is* fully generic and will get you back a driver-specific object. But since Rust doesn't do subclassing, what you get back isn't a driver-specific type T, but rather a (reference to a) shmem::Object. T represents the inner driver-specific data/functionality (only), and the outer shmem::Object includes the actual drm_gem_shmem_object plus a T. This is backwards from C, where you expect the opposite situation where T contains a shmem object, but that just doesn't work with Rust because there's no way to build a safe API around that model as far as I know. Now the problem is from the higher layers I want object operations that interact with the shmem::Object (that is, they call generic GEM functions on the object). Options so far: 1. Add an outer wrapper and put that functionality there. 2. Just have the functions on T as helpers, so you need to call T::foo(obj) instead of obj.foo(). 3. 
Use the undocumented method receiver trait thing to make shmem::Object a valid `self` type, plus add auto-Deref to shmem::Object. Then obj.foo() works. #1 is what I use here. #2 is how the driver-specific File ioctl callbacks are implemented, and also sched::Job. #3 is used for fence callbacks (FenceObject). None of them are great, and I'd love to hear what people think of the various options... There are other unexplored options, like in this GEM case it could be covered with a driver-internal auxiliary trait impl'd on shmem::Object buuut that doesn't work when you actually need callbacks on T itself to circle back to shmem::Object, as is the case with File/Job/FenceObject. ~~ Lina
Re: [Intel-gfx] [PATCH] drm/i915/guc: Disable PL1 power limit when loading GuC firmware
On Tue, 28 Mar 2023 02:14:42 -0700, Tvrtko Ursulin wrote:

Hi Tvrtko,

> On 27/03/2023 18:47, Rodrigo Vivi wrote:
> >
> > +Daniel
> >
> > On Mon, Mar 27, 2023 at 09:58:52AM -0700, Dixit, Ashutosh wrote:
> >> On Sun, 26 Mar 2023 04:52:59 -0700, Rodrigo Vivi wrote:
> >>
> >> Hi Rodrigo,
> >>
> >>> On Fri, Mar 24, 2023 at 04:31:22PM -0700, Dixit, Ashutosh wrote:
> >>>> On Fri, 24 Mar 2023 11:15:02 -0700, Belgaumkar, Vinay wrote:
> >>>>
> >>>> Hi Vinay,
> >>>>
> >>>> Thanks for the review. Comments inline below.
> >>>>
> >>>>> On 3/15/2023 8:59 PM, Ashutosh Dixit wrote:
> >>>>>> On dGfx, the PL1 power limit being enabled and set to a low value results
> >>>>>> in a low GPU operating freq. It also negates the freq raise operation which
> >>>>>> is done before GuC firmware load. As a result GuC firmware load can time
> >>>>>> out. Such timeouts were seen in the GL #8062 bug below (where the PL1 power
> >>>>>> limit was enabled and set to a low value). Therefore disable the PL1 power
> >>>>>> limit when allowed by HW when loading GuC firmware.
> >>>>>
> >>>>> v3 label missing in subject.
> >>>>>
> >>>>>> v2:
> >>>>>>   - Take mutex (to disallow writes to power1_max) across GuC reset/fw load
> >>>>>>   - Add hwm_power_max_restore to error return code path
> >>>>>>
> >>>>>> v3 (Jani N):
> >>>>>>   - Add/remove explanatory comments
> >>>>>>   - Function renames
> >>>>>>   - Type corrections
> >>>>>>   - Locking annotation
> >>>>>>
> >>>>>> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
> >>>>>> Signed-off-by: Ashutosh Dixit
> >>>>>> ---
> >>>>>>   drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9 +++
> >>>>>>   drivers/gpu/drm/i915/i915_hwmon.c     | 39 +++
> >>>>>>   drivers/gpu/drm/i915/i915_hwmon.h     |  7 +
> >>>>>>   3 files changed, 55 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> >>>>>> index 4ccb4be4c9cba..aa8e35a5636a0 100644
> >>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> >>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> >>>>>> @@ -18,6 +18,7 @@
> >>>>>>   #include "intel_uc.h"
> >>>>>>   #include "i915_drv.h"
> >>>>>> +#include "i915_hwmon.h"
> >>>>>>   static const struct intel_uc_ops uc_ops_off;
> >>>>>>   static const struct intel_uc_ops uc_ops_on;
> >>>>>> @@ -461,6 +462,7 @@ static int __uc_init_hw(struct intel_uc *uc)
> >>>>>>   	struct intel_guc *guc = &uc->guc;
> >>>>>>   	struct intel_huc *huc = &uc->huc;
> >>>>>>   	int ret, attempts;
> >>>>>> +	bool pl1en;
> >>>>>
> >>>>> Init to 'false' here
> >>>>
> >>>> See next comment.
> >>>>
> >>>>>>   	GEM_BUG_ON(!intel_uc_supports_guc(uc));
> >>>>>>   	GEM_BUG_ON(!intel_uc_wants_guc(uc));
> >>>>>> @@ -491,6 +493,9 @@ static int __uc_init_hw(struct intel_uc *uc)
> >>>>>>   	else
> >>>>>>   		attempts = 1;
> >>>>>> +	/* Disable a potentially low PL1 power limit to allow freq to be raised */
> >>>>>> +	i915_hwmon_power_max_disable(gt->i915, &pl1en);
> >>>>>> +
> >>>>>>   	intel_rps_raise_unslice(&uc_to_gt(uc)->rps);
> >>>>>>   	while (attempts--) {
> >>>>>> @@ -547,6 +552,8 @@ static int __uc_init_hw(struct intel_uc *uc)
> >>>>>>   		intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
> >>>>>>   	}
> >>>>>> +	i915_hwmon_power_max_restore(gt->i915, pl1en);
> >>>>>> +
> >>>>>>   	guc_info(guc, "submission %s\n", str_enabled_disabled(intel_uc_uses_guc_submission(uc)));
> >>>>>>   	guc_info(guc, "SLPC %s\n", str_enabled_disabled(intel_uc_uses_guc_slpc(uc)));
> >>>>>> @@ -563,6 +570,8 @@ static int __uc_init_hw(struct intel_uc *uc)
> >>>>>>   	/* Return GT back to RPn */
> >>>>>>   	intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
> >>>>>> +	i915_hwmon_power_max_restore(gt->i915, pl1en);
> >>>>>
> >>>>> if (pl1en)
> >>>>>     i915_hwmon_power_max_enable().
> >>>>
> >>>> IMO it's better not to have checks in the main __uc_init_hw() function (if
> >>>> we do this we'll need to add 2 checks in __uc_init_hw()). If you really
> >>>> want we could do something like this inside
> >>>> i915_hwmon_power_max_disable/i915_hwmon_power_max_restore. But for now I
> >>>> am not making any changes.
> >>>>
> >>>> (I can send a patch with the changes if you want to take a look but IMO it
> >>>> will add more logic/code but without real benefits (it will save a rmw if
> >>>> the limit was already disabled, but IMO this code is called so infrequently
> >>>> (only during GuC resets) as to not have any significant impact)).
> >>>>
> >>>>>> +
> >>>>>>   	__uc_sanitize(uc);
> >>>>>>   	if (!ret) {
> >>>>>> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
> >>>>>> index ee63a8fd88fc1..769b5bda4d53f 100644
> >>>>>> ---
Re: [Intel-gfx] [PATCH] drm/i915/guc: Disable PL1 power limit when loading GuC firmware
On Mon, 27 Mar 2023 10:47:25 -0700, Rodrigo Vivi wrote:

Hi Rodrigo,

Sorry for the delay, I got pulled away into a couple of other things and
could only now get back to this.

>
> +Daniel
>
> On Mon, Mar 27, 2023 at 09:58:52AM -0700, Dixit, Ashutosh wrote:
> > On Sun, 26 Mar 2023 04:52:59 -0700, Rodrigo Vivi wrote:
> >
> > Hi Rodrigo,
> >
> > > On Fri, Mar 24, 2023 at 04:31:22PM -0700, Dixit, Ashutosh wrote:
> > > > On Fri, 24 Mar 2023 11:15:02 -0700, Belgaumkar, Vinay wrote:
> > > >
> > > > Hi Vinay,
> > > >
> > > > Thanks for the review. Comments inline below.
> > > >
> > > > > On 3/15/2023 8:59 PM, Ashutosh Dixit wrote:
> > > > > > On dGfx, the PL1 power limit being enabled and set to a low value results
> > > > > > in a low GPU operating freq. It also negates the freq raise operation which
> > > > > > is done before GuC firmware load. As a result GuC firmware load can time
> > > > > > out. Such timeouts were seen in the GL #8062 bug below (where the PL1 power
> > > > > > limit was enabled and set to a low value). Therefore disable the PL1 power
> > > > > > limit when allowed by HW when loading GuC firmware.
> > > > >
> > > > > v3 label missing in subject.
> > > > >
> > > > > > v2:
> > > > > >   - Take mutex (to disallow writes to power1_max) across GuC reset/fw load
> > > > > >   - Add hwm_power_max_restore to error return code path
> > > > > >
> > > > > > v3 (Jani N):
> > > > > >   - Add/remove explanatory comments
> > > > > >   - Function renames
> > > > > >   - Type corrections
> > > > > >   - Locking annotation
> > > > > >
> > > > > > Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
> > > > > > Signed-off-by: Ashutosh Dixit
> > > > > > ---
> > > > > >   drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9 +++
> > > > > >   drivers/gpu/drm/i915/i915_hwmon.c     | 39 +++
> > > > > >   drivers/gpu/drm/i915/i915_hwmon.h     |  7 +
> > > > > >   3 files changed, 55 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > > > > index 4ccb4be4c9cba..aa8e35a5636a0 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > > > > @@ -18,6 +18,7 @@
> > > > > >   #include "intel_uc.h"
> > > > > >   #include "i915_drv.h"
> > > > > > +#include "i915_hwmon.h"
> > > > > >   static const struct intel_uc_ops uc_ops_off;
> > > > > >   static const struct intel_uc_ops uc_ops_on;
> > > > > > @@ -461,6 +462,7 @@ static int __uc_init_hw(struct intel_uc *uc)
> > > > > >   	struct intel_guc *guc = &uc->guc;
> > > > > >   	struct intel_huc *huc = &uc->huc;
> > > > > >   	int ret, attempts;
> > > > > > +	bool pl1en;
> > > > >
> > > > > Init to 'false' here
> > > >
> > > > See next comment.
> > > >
> > > > > >   	GEM_BUG_ON(!intel_uc_supports_guc(uc));
> > > > > >   	GEM_BUG_ON(!intel_uc_wants_guc(uc));
> > > > > > @@ -491,6 +493,9 @@ static int __uc_init_hw(struct intel_uc *uc)
> > > > > >   	else
> > > > > >   		attempts = 1;
> > > > > > +	/* Disable a potentially low PL1 power limit to allow freq to be raised */
> > > > > > +	i915_hwmon_power_max_disable(gt->i915, &pl1en);
> > > > > > +
> > > > > >   	intel_rps_raise_unslice(&uc_to_gt(uc)->rps);
> > > > > >   	while (attempts--) {
> > > > > > @@ -547,6 +552,8 @@ static int __uc_init_hw(struct intel_uc *uc)
> > > > > >   		intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
> > > > > >   	}
> > > > > > +	i915_hwmon_power_max_restore(gt->i915, pl1en);
> > > > > > +
> > > > > >   	guc_info(guc, "submission %s\n", str_enabled_disabled(intel_uc_uses_guc_submission(uc)));
> > > > > >   	guc_info(guc, "SLPC %s\n", str_enabled_disabled(intel_uc_uses_guc_slpc(uc)));
> > > > > > @@ -563,6 +570,8 @@ static int __uc_init_hw(struct intel_uc *uc)
> > > > > >   	/* Return GT back to RPn */
> > > > > >   	intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
> > > > > > +	i915_hwmon_power_max_restore(gt->i915, pl1en);
> > > > >
> > > > > if (pl1en)
> > > > >     i915_hwmon_power_max_enable().
> > > >
> > > > IMO it's better not to have checks in the main __uc_init_hw() function (if
> > > > we do this we'll need to add 2 checks in __uc_init_hw()). If you really
> > > > want we could do something like this inside
> > > > i915_hwmon_power_max_disable/i915_hwmon_power_max_restore. But for now I
> > > > am not making any changes.
> > > >
> > > > (I can send a patch with the changes if you want to take a look but IMO it
> > > > will add more logic/code but without real benefits (it will save a rmw if
> > > > the limit was already disabled, but IMO this code is called so infrequently
> > > > (only during GuC resets) as to not have any significant impact)).
> > > >
[PATCH 2/3] drm/i915/guc: Disable PL1 power limit when loading GuC firmware
On dGfx, the PL1 power limit being enabled and set to a low value results
in a low GPU operating freq. It also negates the freq raise operation which
is done before GuC firmware load. As a result GuC firmware load can time
out. Such timeouts were seen in the GL #8062 bug below (where the PL1 power
limit was enabled and set to a low value). Therefore disable the PL1 power
limit when allowed by HW when loading GuC firmware.

v2:
- Take mutex (to disallow writes to power1_max) across GuC reset/fw load
- Add hwm_power_max_restore to error return code path

v3 (Jani N):
- Add/remove explanatory comments
- Function renames
- Type corrections
- Locking annotation

v4:
- Don't hold the lock across GuC reset (Rodrigo)
- New locking scheme (suggested by Rodrigo)
- Eliminate rpm_get in power_max_disable/restore, not needed (Tvrtko)

Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
Signed-off-by: Ashutosh Dixit
---
 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9 ++
 drivers/gpu/drm/i915/i915_hwmon.c     | 40 +++
 drivers/gpu/drm/i915/i915_hwmon.h     |  7 +
 3 files changed, 56 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 4ccb4be4c9cba..aa8e35a5636a0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -18,6 +18,7 @@
 #include "intel_uc.h"
 #include "i915_drv.h"
+#include "i915_hwmon.h"

 static const struct intel_uc_ops uc_ops_off;
 static const struct intel_uc_ops uc_ops_on;
@@ -461,6 +462,7 @@ static int __uc_init_hw(struct intel_uc *uc)
 	struct intel_guc *guc = &uc->guc;
 	struct intel_huc *huc = &uc->huc;
 	int ret, attempts;
+	bool pl1en;

 	GEM_BUG_ON(!intel_uc_supports_guc(uc));
 	GEM_BUG_ON(!intel_uc_wants_guc(uc));
@@ -491,6 +493,9 @@ static int __uc_init_hw(struct intel_uc *uc)
 	else
 		attempts = 1;

+	/* Disable a potentially low PL1 power limit to allow freq to be raised */
+	i915_hwmon_power_max_disable(gt->i915, &pl1en);
+
 	intel_rps_raise_unslice(&uc_to_gt(uc)->rps);

 	while (attempts--) {
@@ -547,6 +552,8 @@ static int __uc_init_hw(struct intel_uc *uc)
 		intel_rps_lower_unslice(&uc_to_gt(uc)->rps);
 	}

+	i915_hwmon_power_max_restore(gt->i915, pl1en);
+
 	guc_info(guc, "submission %s\n",
 		 str_enabled_disabled(intel_uc_uses_guc_submission(uc)));
 	guc_info(guc, "SLPC %s\n",
 		 str_enabled_disabled(intel_uc_uses_guc_slpc(uc)));
@@ -563,6 +570,8 @@ static int __uc_init_hw(struct intel_uc *uc)
 	/* Return GT back to RPn */
 	intel_rps_lower_unslice(&uc_to_gt(uc)->rps);

+	i915_hwmon_power_max_restore(gt->i915, pl1en);
+
 	__uc_sanitize(uc);

 	if (!ret) {
diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
index 7f44e809ca155..9ab8971679fe3 100644
--- a/drivers/gpu/drm/i915/i915_hwmon.c
+++ b/drivers/gpu/drm/i915/i915_hwmon.c
@@ -50,6 +50,7 @@ struct hwm_drvdata {
 	struct hwm_energy_info ei;	/* Energy info for energy1_input */
 	char name[12];
 	int gt_n;
+	bool reset_in_progress;
 };

 struct i915_hwmon {
@@ -400,6 +401,10 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
 	u32 nval;

 	mutex_lock(&hwmon->hwmon_lock);
+	if (hwmon->ddat.reset_in_progress) {
+		ret = -EAGAIN;
+		goto unlock;
+	}
 	wakeref = intel_runtime_pm_get(ddat->uncore->rpm);

 	/* Disable PL1 limit and verify, because the limit cannot be disabled on all platforms */
@@ -421,6 +426,7 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
 			 PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1, nval);
 exit:
 	intel_runtime_pm_put(ddat->uncore->rpm, wakeref);
+unlock:
 	mutex_unlock(&hwmon->hwmon_lock);
 	return ret;
 }
@@ -472,6 +478,40 @@ hwm_power_write(struct hwm_drvdata *ddat, u32 attr, int chan, long val)
 	}
 }

+void i915_hwmon_power_max_disable(struct drm_i915_private *i915, bool *old)
+{
+	struct i915_hwmon *hwmon = i915->hwmon;
+	u32 r;
+
+	if (!hwmon || !i915_mmio_reg_valid(hwmon->rg.pkg_rapl_limit))
+		return;
+
+	mutex_lock(&hwmon->hwmon_lock);
+
+	hwmon->ddat.reset_in_progress = true;
+	r = intel_uncore_rmw(hwmon->ddat.uncore, hwmon->rg.pkg_rapl_limit,
+			     PKG_PWR_LIM_1_EN, 0);
+	*old = !!(r & PKG_PWR_LIM_1_EN);
+
+	mutex_unlock(&hwmon->hwmon_lock);
+}
+
+void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old)
+{
+	struct i915_hwmon *hwmon = i915->hwmon;
+
+	if (!hwmon || !i915_mmio_reg_valid(hwmon->rg.pkg_rapl_limit))
+		return;
+
+	mutex_lock(&hwmon->hwmon_lock);
+
+	intel_uncore_rmw(hwmon->ddat.uncore, hwmon->rg.pkg_rapl_limit,
+			 PKG_PWR_LIM_1_EN, old ? PKG_PWR_LIM_1_EN : 0);
+	hwmon->ddat.reset_in_progress = false;
[PATCH 1/3] drm/i915/hwmon: Get mutex and rpm ref just once in hwm_power_max_write
In preparation for follow-on patches, refactor hwm_power_max_write to take
hwmon_lock and runtime pm wakeref at start of the function and release them
at the end, therefore acquiring these just once each.

Signed-off-by: Ashutosh Dixit
---
 drivers/gpu/drm/i915/i915_hwmon.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
index 8e7dccc8d3a0e..7f44e809ca155 100644
--- a/drivers/gpu/drm/i915/i915_hwmon.c
+++ b/drivers/gpu/drm/i915/i915_hwmon.c
@@ -396,31 +396,33 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
 {
 	struct i915_hwmon *hwmon = ddat->hwmon;
 	intel_wakeref_t wakeref;
+	int ret = 0;
 	u32 nval;

+	mutex_lock(&hwmon->hwmon_lock);
+	wakeref = intel_runtime_pm_get(ddat->uncore->rpm);
+
 	/* Disable PL1 limit and verify, because the limit cannot be disabled on all platforms */
 	if (val == PL1_DISABLE) {
-		mutex_lock(&hwmon->hwmon_lock);
-		with_intel_runtime_pm(ddat->uncore->rpm, wakeref) {
-			intel_uncore_rmw(ddat->uncore, hwmon->rg.pkg_rapl_limit,
-					 PKG_PWR_LIM_1_EN, 0);
-			nval = intel_uncore_read(ddat->uncore, hwmon->rg.pkg_rapl_limit);
-		}
-		mutex_unlock(&hwmon->hwmon_lock);
+		intel_uncore_rmw(ddat->uncore, hwmon->rg.pkg_rapl_limit,
+				 PKG_PWR_LIM_1_EN, 0);
+		nval = intel_uncore_read(ddat->uncore, hwmon->rg.pkg_rapl_limit);

 		if (nval & PKG_PWR_LIM_1_EN)
-			return -ENODEV;
-		return 0;
+			ret = -ENODEV;
+		goto exit;
 	}

 	/* Computation in 64-bits to avoid overflow. Round to nearest. */
 	nval = DIV_ROUND_CLOSEST_ULL((u64)val << hwmon->scl_shift_power, SF_POWER);
 	nval = PKG_PWR_LIM_1_EN | REG_FIELD_PREP(PKG_PWR_LIM_1, nval);

-	hwm_locked_with_pm_intel_uncore_rmw(ddat, hwmon->rg.pkg_rapl_limit,
-					    PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1,
-					    nval);
-	return 0;
+	intel_uncore_rmw(ddat->uncore, hwmon->rg.pkg_rapl_limit,
+			 PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1, nval);
+exit:
+	intel_runtime_pm_put(ddat->uncore->rpm, wakeref);
+	mutex_unlock(&hwmon->hwmon_lock);
+	return ret;
 }

 static int
-- 
2.38.0
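[Editor's note: the acquire-once/release-once shape of the patch above can be sketched outside the kernel. This is a toy userspace model, not the i915 code: the names are made up, and simple counters stand in for hwmon_lock and the runtime-pm wakeref. The point is the single exit label through which both the error and success paths release the resources.]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace sketch of the refactor: take the "lock" and "wakeref" once
 * at entry, funnel every exit through one label, report errors via
 * 'ret' instead of returning from inside the critical section.
 */
#define PL1_DISABLE (-1L)
#define PL1_EN 0x80000000u

static int lock_depth;           /* mutex_lock/unlock balance */
static int rpm_refs;             /* intel_runtime_pm_get/put balance */
static unsigned int limit_reg;   /* stands in for the PL1 register */

static int power_max_write(long val)
{
	int ret = 0;

	lock_depth++;                    /* mutex_lock(&hwmon->hwmon_lock) */
	rpm_refs++;                      /* intel_runtime_pm_get() */

	if (val == PL1_DISABLE) {
		limit_reg &= ~PL1_EN;    /* rmw: clear the enable bit */
		if (limit_reg & PL1_EN)  /* verify: not all HW allows it */
			ret = -1;
		goto exit;               /* single exit path */
	}

	limit_reg = PL1_EN | (unsigned int)val;
exit:
	rpm_refs--;                      /* intel_runtime_pm_put() */
	lock_depth--;                    /* mutex_unlock() */
	return ret;
}
```

Whatever path is taken, the lock and wakeref counts end balanced, which is exactly what the refactor buys: follow-on patches can add early-out branches without pairing a new unlock/put at each one.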
[PATCH 3/3] drm/i915/hwmon: Block waiting for GuC reset to complete
Instead of erroring out when GuC reset is in progress, block waiting for
GuC reset to complete which is a more reasonable uapi behavior.

Signed-off-by: Ashutosh Dixit
---
 drivers/gpu/drm/i915/i915_hwmon.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
index 9ab8971679fe3..4343efb48e61b 100644
--- a/drivers/gpu/drm/i915/i915_hwmon.c
+++ b/drivers/gpu/drm/i915/i915_hwmon.c
@@ -51,6 +51,7 @@ struct hwm_drvdata {
 	char name[12];
 	int gt_n;
 	bool reset_in_progress;
+	wait_queue_head_t wqh;
 };

 struct i915_hwmon {
@@ -400,10 +401,15 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
 	int ret = 0;
 	u32 nval;

+retry:
 	mutex_lock(&hwmon->hwmon_lock);
 	if (hwmon->ddat.reset_in_progress) {
-		ret = -EAGAIN;
-		goto unlock;
+		mutex_unlock(&hwmon->hwmon_lock);
+		ret = wait_event_interruptible(ddat->wqh,
+					       !hwmon->ddat.reset_in_progress);
+		if (ret)
+			return ret;
+		goto retry;
 	}
 	wakeref = intel_runtime_pm_get(ddat->uncore->rpm);

@@ -426,7 +432,6 @@ hwm_power_max_write(struct hwm_drvdata *ddat, long val)
 			 PKG_PWR_LIM_1_EN | PKG_PWR_LIM_1, nval);
 exit:
 	intel_runtime_pm_put(ddat->uncore->rpm, wakeref);
-unlock:
 	mutex_unlock(&hwmon->hwmon_lock);
 	return ret;
 }
@@ -508,6 +513,7 @@ void i915_hwmon_power_max_restore(struct drm_i915_private *i915, bool old)
 	intel_uncore_rmw(hwmon->ddat.uncore, hwmon->rg.pkg_rapl_limit,
 			 PKG_PWR_LIM_1_EN, old ? PKG_PWR_LIM_1_EN : 0);
 	hwmon->ddat.reset_in_progress = false;
+	wake_up_all(&hwmon->ddat.wqh);

 	mutex_unlock(&hwmon->hwmon_lock);
 }
@@ -784,6 +790,7 @@ void i915_hwmon_register(struct drm_i915_private *i915)
 	ddat->uncore = &i915->uncore;
 	snprintf(ddat->name, sizeof(ddat->name), "i915");
 	ddat->gt_n = -1;
+	init_waitqueue_head(&ddat->wqh);

 	for_each_gt(gt, i915, i) {
 		ddat_gt = hwmon->ddat_gt + i;
-- 
2.38.0
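[Editor's note: the drop-lock/sleep/retry loop in this patch can be modeled in a few lines of userspace C. This is a hypothetical single-threaded sketch, not the i915 code: the wait queue is stubbed with a caller-provided hook so the "wakeup" can be simulated without spawning threads.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of the retry loop: instead of failing with -EAGAIN, a writer
 * that sees reset_in_progress drops the lock, sleeps until the restore
 * path clears the flag and wakes it, then retries from the top.
 */
static bool reset_in_progress;
static int writes_done;
static void (*wait_event_hook)(void); /* stands in for wait_event_interruptible() */

static void power_max_restore(void)
{
	reset_in_progress = false;   /* done under hwmon_lock in the real code */
	/* wake_up_all(&ddat->wqh) would run here */
}

static int power_max_write(long val)
{
retry:
	/* mutex_lock(&hwmon->hwmon_lock) */
	if (reset_in_progress) {
		/* mutex_unlock(); wait_event_interruptible(wqh, !reset_in_progress); */
		wait_event_hook();
		if (reset_in_progress)   /* interrupted / still resetting */
			return -1;
		goto retry;              /* re-take the lock and re-check */
	}
	writes_done++;                   /* the register write */
	(void)val;
	/* mutex_unlock() */
	return 0;
}
```

The key design point the patch relies on is re-checking the flag under the lock after waking, since another reset could have started between the wakeup and the retry.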
[PATCH v4 0/3] drm/i915/guc: Disable PL1 power limit when loading GuC firmware
Split the v3 patch into 3 patches for easier review, can squash later if
needed.

Cc: Rodrigo Vivi
Cc: Tvrtko Ursulin

Ashutosh Dixit (3):
  drm/i915/hwmon: Get mutex and rpm ref just once in hwm_power_max_write
  drm/i915/guc: Disable PL1 power limit when loading GuC firmware
  drm/i915/hwmon: Block waiting for GuC reset to complete

 drivers/gpu/drm/i915/gt/uc/intel_uc.c |  9
 drivers/gpu/drm/i915/i915_hwmon.c     | 75 ++-
 drivers/gpu/drm/i915/i915_hwmon.h     |  7 +++
 3 files changed, 78 insertions(+), 13 deletions(-)

-- 
2.38.0
Re: [PATCH RFC 18/18] drm/asahi: Add the Asahi driver for Apple AGX GPUs
On 05/04/2023 23.37, Daniel Vetter wrote:
> On Tue, Mar 07, 2023 at 11:25:43PM +0900, Asahi Lina wrote:
>> +/// A generic monotonically incrementing ID used to uniquely identify object instances within the
>> +/// driver.
>> +pub(crate) struct ID(AtomicU64);
>> +
>> +impl ID {
>> +    /// Create a new ID counter with a given value.
>> +    fn new(val: u64) -> ID {
>> +        ID(AtomicU64::new(val))
>> +    }
>> +
>> +    /// Fetch the next unique ID.
>> +    pub(crate) fn next(&self) -> u64 {
>> +        self.0.fetch_add(1, Ordering::Relaxed)
>> +    }
>> +}
>
> Continuing the theme of me commenting on individual things, I stumbled
> over this because I noticed that there's a lot of id based lookups where
> I don't expect them, and started chasing.
>
> - For ids use xarray, not atomic counters. Yes I know dma_fence timelines
>   gets this wrong, this goes back to an innocent time where we didn't
>   allocate more than one timeline per engine, and no one fixed it since
>   then. Yes u64 should be big enough for everyone :-/
>
> - Attaching ID spaces to drm_device is also not great. drm is full of
>   these mistakes. Much better if they're per drm_file and so private to
>   each client.
>
> - They shouldn't be used for anything else than uapi id -> kernel object
>   lookup at the beginning of ioctl code, and nowhere else. At least from
>   skimming it seems like these are used all over the driver codebase,
>   which does freak me out. At least on the C side that's a clear
>   indicator for a refcount/locking/data structure model that's not
>   thought out at all.
>
> What's going on here, what do I miss?

These aren't UAPI IDs, they are driver-internal IDs (the UAPI IDs do use
xarray and are per-File). Most of them are just for debugging, so that
when I enable full debug spam I have some way to correlate different
things that are happening together (this subset of interleaved log lines
relate to the same submission). Basically just object names that are
easier to read (and less of a security leak) than pointers and guaranteed
not to repeat.
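[Editor's note: for readers coming from the C side of this debate, the quoted Rust `ID` helper maps directly onto C11 atomics. A sketch, using hypothetical names; the relaxed ordering is justified by the same reasoning as `Ordering::Relaxed` above, i.e. the value is only a name, never a synchronization point.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Monotonically increasing 64-bit counter whose values never repeat
 * (for any practical lifetime of a u64).  fetch_add with relaxed
 * ordering is sufficient because callers only use the result as a
 * unique name, not to order other memory accesses.
 */
struct id_counter {
	atomic_uint_least64_t next;
};

static void id_init(struct id_counter *id, uint64_t start)
{
	atomic_init(&id->next, start);
}

static uint64_t id_next(struct id_counter *id)
{
	return atomic_fetch_add_explicit(&id->next, 1,
					 memory_order_relaxed);
}
```

Note this illustrates the counter itself, not Daniel's preferred design: for uapi-id-to-object lookup the advice above is an xarray per drm_file, where the data structure both allocates the ID and owns the lookup.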
You could get rid of most of them and it wouldn't affect the driver
design, it just makes it very hard to see what's going on with debug logs
^^;

There are only two that are ever used for non-debugging purposes: the VM
ID, and the File ID. Both are per-device global IDs attached to the VMs
(not the UAPI VM objects, but rather the underlying MMU address space
managers they represent, including the kernel-internal ones) and to Files
themselves. They are used for destroying GEM objects: since the objects
are also device-global across multiple clients, I need a way to do things
like "clean up all mappings for this File" or "clean up all mappings for
this VM".

There's an annoying circular reference between GEM objects and their
mappings, which is why this is explicitly coded out in destroy paths
instead of naturally happening via Drop semantics (without that cleanup
code, the circular reference leaks it).

So e.g. when a File does a GEM close or explicitly asks for all mappings
of an object to be removed, it goes out to the (possibly shared) GEM
object and tells it to drop all mappings marked as owned by that unique
File ID. When an explicit "unmap all in VM" op happens, it asks the GEM
object to drop all mappings for that underlying VM ID. Similarly, when a
UAPI VM object is dropped (in the Drop impl, so both explicitly and when
the whole File/xarray is dropped and such), that does an explicit unmap
of a special dummy object it owns which would otherwise leak since it is
not tracked as a GEM object owned by that File and therefore not handled
by GEM closing. And again along the same lines, the allocators in
alloc.rs explicitly destroy the mappings for their backing GEM objects on
Drop. All this is due to that annoying circular reference between VMs and
GEM objects that I'm not sure how to fix.
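[Editor's note: the ownership-ID cleanup scheme described above can be sketched in C. Every name here is made up for illustration and none of it is the Asahi driver's actual code: each mapping on a shared object is tagged with the unique ID of the File or VM that created it, and teardown drops every mapping carrying that ID rather than chasing back-pointers, which is what sidesteps the GEM-object/mapping reference cycle.]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* A mapping tagged with the unique ID of its owning File or VM. */
struct mapping {
	uint64_t owner_id;
	struct mapping *next;
};

/* A device-global object shared across clients. */
struct shared_obj {
	struct mapping *mappings;
};

static void obj_add_mapping(struct shared_obj *obj, uint64_t owner_id)
{
	struct mapping *m = malloc(sizeof(*m));

	m->owner_id = owner_id;
	m->next = obj->mappings;
	obj->mappings = m;
}

/* "Clean up all mappings for this File/VM": drop every match. */
static int obj_drop_mappings(struct shared_obj *obj, uint64_t owner_id)
{
	struct mapping **p = &obj->mappings;
	int dropped = 0;

	while (*p) {
		if ((*p)->owner_id == owner_id) {
			struct mapping *dead = *p;

			*p = dead->next;  /* unlink, then free */
			free(dead);
			dropped++;
		} else {
			p = &(*p)->next;
		}
	}
	return dropped;
}
```

Because IDs never repeat, a stale ID matches nothing and the worst case of a wrong ID is an unintended unmap, which matches the failure-mode argument made below.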
Note that if I *don't* do this (or forget to do it somewhere) the
consequence is just that we leak memory, and if you try to destroy the
wrong IDs somehow the worst that can happen is you unmap things you
shouldn't and fault the GPU (or, in the kernel or kernel-managed user VM
cases, potentially the firmware). Rust safety guarantees still keep
things from going entirely off the rails within the kernel, since
everything that matters is reference counted (which is why these
reference cycles are possible at all).

This all started when I was looking at the panfrost driver for reference.
It does the same thing except it uses actual pointers to the owning
entities instead of IDs, and pointer comparison (see panfrost_gem_close).
Of course you could try to do that in Rust too (literally storing and
comparing raw pointers that aren't owned references), but then you're
introducing a Pin<> requirement on those objects to make their addresses
stable and it feels way more icky and error-prone than unique IDs (since
addresses can be reused). panfrost only has a single mmu (what I call the
raw VM) per File while I have an
Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans
On Wed, Apr 05, 2023 at 12:12:27PM +0200, Daniel Vetter wrote:
> On Wed, 5 Apr 2023 at 11:57, Christian König wrote:
> >
> > Am 05.04.23 um 11:07 schrieb Daniel Vetter:
> > > [SNIP]
> > >> I would approach it from the complete other side. This component here is a
> > >> tool to decide what job should run next.
> > >>
> > >> How that is then signaled and run should not be part of the scheduler, but
> > >> another more higher level component.
> > >>
> > >> This way you also don't have a problem with not using DMA-fences as
> > >> dependencies as well as constraints for running more jobs.
> > > I think we're talking about two things here and mixing them up.
> > >
> > > For the dependencies I agree with you, and imo that higher level tool
> > > should probably just be an on-demand submit thread in userspace for the
> > > rare case where the kernel would need to sort out a dependency otherwise
> > > (due to running out of ringspace in the per-ctx ringbuffer).
> > >
> > > The other thing is the message passing stuff, and this is what I was
> > > talking about above. This has nothing to do with handling dependencies,
> > > but with talking to the gpu fw. Here the intel design issue is that the fw
> > > only provides a single queue, and it's in-order. Which means it
> > > fundamentally has the stalling issue you describe as a point against a
> > > message passing design. And fundamentally we need to be able to talk to
> > > the fw in the scheduler ->run_job callback.
> > >
> > > The proposal here for the message passing part is that since it has the
> > > stalling issue already anyway, and the scheduler needs to be involved
> > > anyway, it makes sense to integrate this (as an optional thing, only for
> > > drivers which have this kind of fw interface) into the scheduler.
> > > Otherwise you just end up with two layers for no reason and more ping-pong
> > > delay because the ->run_job needs to kick off the subordinate driver layer
> > > first.
> > > Note that for this case the optional message passing support in the
> > > drm/scheduler actually makes things better, because it allows you to
> > > cut out one layer.
> > >
> > > Of course if a driver with a better fw interface uses this message
> > > passing support, then that's bad. Hence the big warning in the
> > > kerneldoc.
> >
> > Well what I wanted to say is that if you design the dependency handling
> > / scheduler properly you don't need the message passing through it.
> >
> > For example if the GPU scheduler component uses a work item to do its
> > handling instead of a kthread you could also let the driver specify the
> > work queue where this work item is executed on.
> >
> > When you design it like this the driver specifies the thread context of
> > execution for its job. In other words it can specify a single threaded
> > firmware work queue as well.
> >
> > When you then have other messages which need to be passed to the
> > firmware you can also use the same single threaded workqueue for this.
> >
> > Drivers which have a different firmware interface would just use one of
> > the system work queues instead.
> >
> > This approach basically decouples the GPU scheduler component from the
> > message passing functionality.
>
> Hm I guess we've been talking past each other big time, because
> that's really what I thought was under discussion? Essentially the
> current rfc, but implemented with some polish.
>

I think Daniel pretty much nailed it here (thanks), to recap:

1. I want the messages in the same worker so run_job / free_job /
process_msg execution is mutually exclusive and also so during reset
paths if the worker is stopped all the entry points can't be entered.

If this is a NAK, then another worker is fine I guess. A lock between
run_job / free_job + process_msg should solve the exclusion issue and the
reset paths can also stop this new worker too. That being said I'd rather
leave this as is but will not fight this point.

2.
process_msg is just used to communicate with the firmware using the same
queue as submission. Waiting for space in this queue is the only place
this function can block (same as submission). Well, actually we have the
concept of a preempt time slice, but that sleeps for 10 ms by default.
Also preempt is only used in LR entities so I don't think it is relevant
in this case either.

3. Agree this is in the dma-fence signaling path (if process_msg is in
the submission worker) so we can't block indefinitely or for an
unreasonable period of time (i.e. we must obey dma-fence rules).

4. Agree the documentation for the usage of the messaging interface needs
to be clear.

5. Agree that my code could always use polishing.

Let's close on #1, then can I get a general Ack on this part of the RFC
and apply the polish in the full review process?

Matt

> iow I agree with you (I think at least).
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
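[Editor's note: point #1 above, i.e. mutual exclusion of run_job/process_msg by construction plus a single stop point for reset paths, can be modeled in a few lines. This is a toy sketch with invented names, not DRM scheduler code: one ordered queue drained by one worker means the two handlers can never run concurrently, and stopping the worker stops every entry point at once.]

```c
#include <assert.h>
#include <stddef.h>

/* Work items: either a submission or a firmware message. */
enum item_kind { ITEM_JOB, ITEM_MSG };

struct item {
	enum item_kind kind;
	struct item *next;
};

struct sched {
	struct item *head, **tail;  /* FIFO, preserving submission order */
	int stopped;                /* set by reset paths */
	int jobs_run, msgs_run, in_handler;
};

static void sched_init(struct sched *s)
{
	s->head = NULL;
	s->tail = &s->head;
	s->stopped = 0;
	s->jobs_run = s->msgs_run = s->in_handler = 0;
}

static void sched_queue(struct sched *s, struct item *it)
{
	it->next = NULL;
	*s->tail = it;
	s->tail = &it->next;
}

/* The single worker: drains the queue in order unless stopped. */
static void sched_work(struct sched *s)
{
	while (!s->stopped && s->head) {
		struct item *it = s->head;

		s->head = it->next;
		if (!s->head)
			s->tail = &s->head;

		assert(s->in_handler++ == 0);  /* mutual exclusion holds */
		if (it->kind == ITEM_JOB)
			s->jobs_run++;         /* run_job() */
		else
			s->msgs_run++;         /* process_msg() */
		s->in_handler--;
	}
}
```

The alternative discussed above (a separate worker plus a lock shared by run_job/free_job/process_msg) gives the same exclusion, at the cost of one more moving part for reset paths to stop.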
Re: [PATCH v2 2/2] drm: bridge: cdns-mhdp8546: Add support for no-hpd
Hi Jayesh,

Thank you for the patch.

On Wed, Apr 05, 2023 at 07:54:40PM +0530, Jayesh Choudhary wrote:
> From: Rahul T R
>
> In J721S2 EVMs DP0 hpd is not connected to correct
> hpd pin on SOC, to handle such cases, Add support for
> "no-hpd" property in the device tree node to disable
> hpd

s/hpd/hpd./

You can also reflow the commit message to 72 columns.

> Also change the log level for dpcd read failures to
> debug, since framework retries 32 times for each read

s/read/read./

Doesn't this apply to writes as well ?

> Signed-off-by: Rahul T R
> Signed-off-by: Jayesh Choudhary
> ---
>  .../drm/bridge/cadence/cdns-mhdp8546-core.c | 37 ---
>  .../drm/bridge/cadence/cdns-mhdp8546-core.h |  1 +
>  2 files changed, 33 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> index f6822dfa3805..e177794b069d 100644
> --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> @@ -54,6 +54,8 @@
>  #include "cdns-mhdp8546-hdcp.h"
>  #include "cdns-mhdp8546-j721e.h"
>
> +static int cdns_mhdp_update_link_status(struct cdns_mhdp_device *mhdp);
> +
>  static int cdns_mhdp_mailbox_read(struct cdns_mhdp_device *mhdp)
>  {
>  	int ret, empty;
> @@ -749,7 +751,7 @@ static int cdns_mhdp_fw_activate(const struct firmware *fw,
>  	 * MHDP_HW_STOPPED happens only due to driver removal when
>  	 * bridge should already be detached.
>  	 */
> -	if (mhdp->bridge_attached)
> +	if (mhdp->bridge_attached && !mhdp->no_hpd)
>  		writel(~(u32)CDNS_APB_INT_MASK_SW_EVENT_INT,
>  		       mhdp->regs + CDNS_APB_INT_MASK);
>
> @@ -845,7 +847,7 @@ static ssize_t cdns_mhdp_transfer(struct drm_dp_aux *aux,
>  		ret = cdns_mhdp_dpcd_read(mhdp, msg->address,
>  					  msg->buffer, msg->size);
>  		if (ret) {
> -			dev_err(mhdp->dev,
> +			dev_dbg(mhdp->dev,
>  				"Failed to read DPCD addr %u\n",
>  				msg->address);
>
> @@ -1738,6 +1740,19 @@ static int cdns_mhdp_attach(struct drm_bridge *bridge,
>
>  	spin_unlock(&mhdp->start_lock);
>
> +	if (mhdp->no_hpd) {
> +		ret = wait_event_timeout(mhdp->fw_load_wq,
> +					 mhdp->hw_state == MHDP_HW_READY,
> +					 msecs_to_jiffies(100));
> +		if (ret == 0) {
> +			dev_err(mhdp->dev, "%s: Timeout waiting for fw loading\n",
> +				__func__);
> +			return -ETIMEDOUT;
> +		}
> +
> +		cdns_mhdp_update_link_status(mhdp);
> +		return 0;
> +	}

Missing blank line.

It's not clear to me why you need to wait for the state to change to
MHDP_HW_READY in the no_hpd case. This should be explained in the commit
message.

>  	/* Enable SW event interrupts */
>  	if (hw_ready)
>  		writel(~(u32)CDNS_APB_INT_MASK_SW_EVENT_INT,
> @@ -2256,7 +2271,16 @@ static int cdns_mhdp_update_link_status(struct cdns_mhdp_device *mhdp)
>
>  	mutex_lock(&mhdp->link_mutex);
>
> -	mhdp->plugged = cdns_mhdp_detect_hpd(mhdp, &hpd_pulse);
> +	if (mhdp->no_hpd) {
> +		ret = drm_dp_dpcd_read_link_status(&mhdp->aux, status);
> +		hpd_pulse = false;
> +		if (ret < 0)
> +			mhdp->plugged = false;
> +		else
> +			mhdp->plugged = true;

I think there's an issue with how the driver uses mhdp->plugged. In the
no_hpd case, you try to detect if a display is connected by reading the
link status at attach time, and then never update mhdp->plugged. This
means that if no display is connected at that point, functions like
cdns_mhdp_get_edid() will always fail, even if a display gets plugged
later.

As the goal of this series is (as far as I understand) to support
systems where the HPD signal could be connected to a SoC GPIO instead of
the bridge, I don't think this is good enough.

> +	} else {
> +		mhdp->plugged = cdns_mhdp_detect_hpd(mhdp, &hpd_pulse);
> +	}
>
>  	if (!mhdp->plugged) {
>  		cdns_mhdp_link_down(mhdp);
> @@ -2451,6 +2475,8 @@ static int cdns_mhdp_probe(struct platform_device *pdev)
>  	mhdp->aux.dev = dev;
>  	mhdp->aux.transfer = cdns_mhdp_transfer;
>
> +	mhdp->no_hpd = of_property_read_bool(dev->of_node, "cdns,no-hpd");
> +
>  	mhdp->regs = devm_platform_ioremap_resource(pdev, 0);
>  	if (IS_ERR(mhdp->regs)) {
>  		dev_err(dev, "Failed to get memory resource\n");
> @@ -2526,8 +2552,9 @@ static int cdns_mhdp_probe(struct platform_device *pdev)
>
>  	mhdp->bridge.of_node = pdev->dev.of_node;
>  	mhdp->bridge.funcs = &cdns_mhdp_bridge_funcs;
> -	mhdp->bridge.ops = DRM_BRIDGE_OP_DETECT |
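[Editor's note: the staleness problem the reviewer describes can be reduced to a two-function model. This is a hypothetical sketch, not the cdns-mhdp driver: `dpcd_link_ok` stands in for `drm_dp_dpcd_read_link_status()` succeeding over the AUX channel.]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * If "plugged" is sampled only once at attach time, a display connected
 * later is never seen; re-reading the DPCD link status on every
 * .detect() call avoids the stale state.
 */
static bool dpcd_link_ok;        /* a sink answers on the AUX channel */
static bool plugged_at_attach;   /* sampled once, as in the patch */

static void attach(void)
{
	plugged_at_attach = dpcd_link_ok;
}

static bool detect_attach_only(void)
{
	return plugged_at_attach;    /* stale after a later hotplug */
}

static bool detect_poll_dpcd(void)
{
	return dpcd_link_ok;         /* re-read on every detect */
}
```

This is only the shape of the objection; an actual fix in the driver would more likely poll the DPCD (or use the SoC GPIO the series alludes to) inside the connector detect path so `mhdp->plugged` tracks reality.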
Re: [PATCH v2 1/2] dt-bindings: drm/bridge: Add no-hpd property
Hi Jayesh,

Thank you for the patch.

On Wed, Apr 05, 2023 at 07:54:39PM +0530, Jayesh Choudhary wrote:
> From: Rahul T R
>
> The mhdp bridge can work without its HPD pin hooked up to the connector,
> but the current bridge driver throws an error when hpd line is not
> connected to the connector. For such cases, we need an indication for
> no-hpd, using which we can bypass the hpd detection and instead use the
> auxiliary channels connected to the DP connector to confirm the
> connection.
>
> So add no-hpd property to the bindings, to disable hpd when not
> connected or unusable.
>
> Signed-off-by: Rahul T R
> Signed-off-by: Jayesh Choudhary
> ---
>  .../devicetree/bindings/display/bridge/cdns,mhdp8546.yaml | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/display/bridge/cdns,mhdp8546.yaml b/Documentation/devicetree/bindings/display/bridge/cdns,mhdp8546.yaml
> index c2b369456e4e..3a6c6d837593 100644
> --- a/Documentation/devicetree/bindings/display/bridge/cdns,mhdp8546.yaml
> +++ b/Documentation/devicetree/bindings/display/bridge/cdns,mhdp8546.yaml
> @@ -57,6 +57,12 @@ properties:
>    interrupts:
>      maxItems: 1
>
> +  cdns,no-hpd:
> +    type: boolean
> +    description:
> +      Set if the HPD line on the bridge isn't hooked up to anything or is
> +      otherwise unusable.
> +
>    ports:
>      $ref: /schemas/graph.yaml#/properties/ports

I'm fine with the non connected part, but concerned with "otherwise
unusable". It's very vague and could thus be abused. Do you have
particular use cases in mind for this ? If so, restricting this to those
use cases, or at least giving examples, would help.

-- 
Regards,

Laurent Pinchart
[Bug 215631] Some Desktop oriented mode setting drivers are missing DRM PRIME support
https://bugzilla.kernel.org/show_bug.cgi?id=215631 --- Comment #1 from bluescreen_aven...@verizon.net --- I have been informed that 41068c8b28e16f1c2c26c854271520e1f3afaa22 in drm-misc-next should fix it for all the mentioned drivers in the list, except for gma500, ...and I have tested it, and it does work for bochs (in a QEMU card) -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
Re: [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences
On Wed, Apr 05, 2023 at 03:09:08PM +0200, Daniel Vetter wrote:
> On Tue, Apr 04, 2023 at 07:48:27PM +0000, Matthew Brost wrote:
> > On Tue, Apr 04, 2023 at 09:25:52PM +0200, Daniel Vetter wrote:
> > > On Tue, Apr 04, 2023 at 07:02:23PM +0000, Matthew Brost wrote:
> > > > On Tue, Apr 04, 2023 at 08:14:01PM +0200, Thomas Hellström (Intel) wrote:
> > > > >
> > > > > On 4/4/23 15:10, Christian König wrote:
> > > > > > Am 04.04.23 um 14:54 schrieb Thomas Hellström:
> > > > > > > Hi, Christian,
> > > > > > >
> > > > > > > On 4/4/23 11:09, Christian König wrote:
> > > > > > > > Am 04.04.23 um 02:22 schrieb Matthew Brost:
> > > > > > > > > From: Thomas Hellström
> > > > > > > > >
> > > > > > > > > For long-running workloads, drivers either need to open-code completion
> > > > > > > > > waits, invent their own synchronization primitives or internally use
> > > > > > > > > dma-fences that do not obey the cross-driver dma-fence protocol, but
> > > > > > > > > without any lockdep annotation all these approaches are error prone.
> > > > > > > > >
> > > > > > > > > So since for example the drm scheduler uses dma-fences it is desirable for
> > > > > > > > > a driver to be able to use it for throttling and error handling also with
> > > > > > > > > internal dma-fences that do not obey the cross-driver dma-fence protocol.
> > > > > > > > >
> > > > > > > > > Introduce long-running completion fences in form of dma-fences, and add
> > > > > > > > > lockdep annotation for them. In particular:
> > > > > > > > >
> > > > > > > > > * Do not allow waiting under any memory management locks.
> > > > > > > > > * Do not allow to attach them to a dma-resv object.
> > > > > > > > > * Introduce a new interface for adding callbacks making the helper adding
> > > > > > > > >   a callback sign off on that it is aware that the dma-fence may not
> > > > > > > > >   complete anytime soon. Typically this will be the scheduler chaining
> > > > > > > > >   a new long-running fence on another one.
> > > > > > > >
> > > > > > > > Well that's pretty much what I tried before:
> > > > > > > > https://lwn.net/Articles/893704/
> > > > > > > >
> > > >
> > > > I don't think this is quite the same, this explicitly enforces that we
> > > > don't break the dma-fence rules (in path of memory allocations, exported
> > > > in any way), essentially this just is a SW sync point reusing the
> > > > dma-fence infrastructure for signaling / callbacks. I believe your
> > > > series tried to export these fences to user space (admittedly I haven't
> > > > fully read your series).
> > > >
> > > > In this use case we essentially just want to flow control the ring via
> > > > the dma-scheduler + maintain a list of pending jobs so the TDR can be
> > > > used for cleanup if an LR entity encounters an error. To me this seems
> > > > perfectly reasonable but I know dma-fence rules are akin to a holy war.
> > > >
> > > > If we return NULL in run_job, now we have to be able to sink all jobs
> > > > in the backend regardless of ring space, maintain a list of jobs pending
> > > > for cleanup after errors, and write a different cleanup path as now the
> > > > TDR doesn't work. Seems very, very silly to duplicate all of this code
> > > > when the DRM scheduler provides all of this for us. Also if we go this
> > > > route, now all drivers are going to invent ways to handle LR jobs with
> > > > the DRM scheduler.
> > > >
> > > > This solution is pretty clear, mark the scheduler as LR, and don't
> > > > export any fences from the scheduler. If you try to export these fences
> > > > a blow up happens.
> > >
> > > The problem is if you mix things up. Like for resets you need all the
> > > schedulers on an engine/set-of-engines to quiesce or things get
> > > potentially hilarious. If you now have a scheduler in forever limbo, the
> > > dma_fence guarantees are right out the window.
> > >
> >
> > Right, a GT reset on Xe is:
> >
> > Stop all schedulers
> > Do a reset
> > Ban any schedulers which we think caused the GT reset
> > Resubmit all schedulers which we think were good
> > Restart all schedulers
> >
> > None of this flow depends on LR dma-fences, all of this uses the DRM
> > sched infrastructure and works very well compared to the i915. Rewriting
> > all this with a driver specific implementation is what we are trying to
> > avoid.
> >
> > Similarly if an LR entity hangs on its own (not a GT reset, rather the
> > firmware does the reset for us) we use all the DRM scheduler
> > infrastructure to handle this. Again this works rather well...
>
> Yeah this is why I don't think duplicating everything that long-running
> jobs need makes any sense. iow I agree with you.
>

Glad we agree.

> > > But the issue you're having is fairly specific if it's just about
> > > ringspace.
Re: [PATCH v4 1/6] drm/msm: Add MSM-specific DSC helper methods
On 06/04/2023 02:34, Abhinav Kumar wrote:
> On 4/5/2023 12:26 PM, Dmitry Baryshkov wrote:
>> On 05/04/2023 03:41, Jessica Zhang wrote:
>>> Introduce MSM-specific DSC helper methods, as some calculations are
>>> common between DP and DSC.
>>>
>>> Changes in v2:
>>> - Moved files up to msm/ directory
>>> - Dropped get_comp_ratio() helper
>>> - Used drm_int2fixp() to convert integers to fp
>>> - Style changes to improve readability
>>> - Dropped unused bpp variable in msm_dsc_get_dce_bytes_per_line()
>>> - Changed msm_dsc_get_slice_per_intf() to a static inline method
>>> - Dropped last division step of msm_dsc_get_pclk_per_line() and
>>>   changed method name accordingly
>>> - Changed DSC_BPP macro to drm_dsc_get_bpp_int() helper method
>>> - Fixed some math issues caused by passing in incorrect types to
>>>   drm_fixed methods in get_bytes_per_soft_slice()
>>>
>>> Changes in v3:
>>> - Dropped src_bpp parameter from all methods -- src_bpp can be
>>>   calculated as dsc->bits_per_component * 3
>>> - Dropped intf_width parameter from get_bytes_per_soft_slice()
>>> - Moved dsc->bits_per_component to numerator calculation in
>>>   get_bytes_per_soft_slice()
>>> - Renamed msm_dsc_get_uncompressed_pclk_per_line to
>>>   *_get_uncompressed_pclk_per_intf()
>>> - Removed dsc->slice_width check from
>>>   msm_dsc_get_uncompressed_pclk_per_intf()
>>> - Made get_bytes_per_soft_slice() a public method (this will be called
>>>   later to help calculate DP pclk params)
>>> - Added documentation in comments
>>> - Moved extra_eol_bytes math out of msm_dsc_get_eol_byte_num() and
>>>   renamed msm_dsc_get_eol_byte_num to *_get_bytes_per_intf.
Changes in v4: - Changed msm_dsc_get_uncompressed_pclk_per_intf to msm_dsc_get_pclk_per_intf Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/Makefile | 1 + drivers/gpu/drm/msm/msm_dsc_helper.c | 47 drivers/gpu/drm/msm/msm_dsc_helper.h | 70 3 files changed, 118 insertions(+) diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile index 7274c41228ed..b814fc80e2d5 100644 --- a/drivers/gpu/drm/msm/Makefile +++ b/drivers/gpu/drm/msm/Makefile @@ -94,6 +94,7 @@ msm-y += \ msm_atomic_tracepoints.o \ msm_debugfs.o \ msm_drv.o \ + msm_dsc_helper.o \ msm_fb.o \ msm_fence.o \ msm_gem.o \ diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.c b/drivers/gpu/drm/msm/msm_dsc_helper.c new file mode 100644 index ..0539221eb09d --- /dev/null +++ b/drivers/gpu/drm/msm/msm_dsc_helper.c @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved + */ + +#include +#include +#include + +#include "msm_drv.h" +#include "msm_dsc_helper.h" + +s64 get_bytes_per_soft_slice(struct drm_dsc_config *dsc) +{ + int bpp = msm_dsc_get_bpp_int(dsc); + s64 numerator_fp, denominator_fp; + s64 comp_ratio_fp = drm_fixp_from_fraction(dsc->bits_per_component * 3, bpp); + + numerator_fp = drm_int2fixp(dsc->slice_width * 3 * dsc->bits_per_component); + denominator_fp = drm_fixp_mul(comp_ratio_fp, drm_int2fixp(8)); + + return drm_fixp_div(numerator_fp, denominator_fp); If we remove 3 * dsc->bits_per_components from both numerator and denominator, this whole function seems to be as simple as DIV_ROUND_UP(dsc->slice_width * bpp, 8) Or, if you prefer FP math, drm_fixp_from_fraction(dsc->slice_width * bpp, 8). True, and that's not really surprising, because bytes_per_soft_slice is eventually (dsc->slice_width * bpp, 8) I also thought about it after you mentioned that they will cancel out; then why was downstream code and our programming guide doing it this way.
So I thought a bit more, and the reason from what I can see is that it's showing that the compression ratio was factored into the math while calculating it. If we just go with (dsc->slice_width * bpp, 8), it does not really show that this was actually compressed bytes per slice, leading to the earlier confusion that it was uncompressed pclk while it actually was compressed. I'd say we can name that dsc_bpp or compressed_bpp instead of just bpp, this way showing that it is a compressed pclk. But please don't complicate the math. +} + +u32 msm_dsc_get_bytes_per_intf(struct drm_dsc_config *dsc, int intf_width) +{ + u32 bytes_per_soft_slice, bytes_per_intf; + s64 bytes_per_soft_slice_fp; + int slice_per_intf = msm_dsc_get_slice_per_intf(dsc, intf_width); + + bytes_per_soft_slice_fp = get_bytes_per_soft_slice(dsc); + bytes_per_soft_slice = drm_fixp2int_ceil(bytes_per_soft_slice_fp); + + bytes_per_intf = bytes_per_soft_slice * slice_per_intf; + + return bytes_per_intf; +} + +int msm_dsc_get_pclk_per_intf(struct drm_dsc_config *dsc) +{ + s64 data_width; + + data_width = drm_fixp_mul(drm_int2fixp(dsc->slice_count), + get_bytes_per_soft_slice(dsc)); And this is then DIV_ROUND_UP(dsc->slice_width * dsc->slice_count * bpp, 8) + + return drm_fixp2int_ceil(data_width); +} diff --git
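Dmitry's simplification is easy to sanity-check in userspace. The helpers below are minimal stand-ins for the drm_fixed.h s31.32 routines (the real kernel helpers saturate and handle signs differently), and `dsc_bpp` follows the naming suggested above to make clear this is the compressed rate:

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Minimal userspace stand-ins for the drm_fixed.h s31.32 helpers. */
static int64_t fixp_from_fraction(int64_t a, int64_t b)
{
	return (a << 32) / b;	/* ignores the rounding/saturation of the real helper */
}

static int64_t fixp2int_ceil(int64_t a)
{
	return (a + ((1LL << 32) - 1)) >> 32;
}

/* Compressed bytes per soft slice: ceil(slice_width * dsc_bpp / 8). */
static uint32_t bytes_per_soft_slice(int slice_width, int dsc_bpp)
{
	return (uint32_t)fixp2int_ceil(
		fixp_from_fraction((int64_t)slice_width * dsc_bpp, 8));
}
```

Because the denominator 8 divides the 2^32 scale factor exactly, the fixed-point path and the plain `DIV_ROUND_UP(slice_width * dsc_bpp, 8)` agree for any slice width, which is why the cancellation is safe.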
RE: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation
>Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO >creation > >On 04/04/2023 19:04, Yang, Fei wrote: >>> Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set >>> cache at BO creation >>> >>> On 01/04/2023 09:38, fei.y...@intel.com wrote: From: Fei Yang To comply with the design that buffer objects shall have an immutable cache setting throughout their life cycle, {set, get}_caching ioctl's are no longer supported from MTL onward. With that change the caching policy can only be set at object creation time. The current code applies a default (platform dependent) cache setting for all objects. However this is not optimal for performance tuning. The patch extends the existing gem_create uAPI to let the user set the PAT index for the object at creation time. The new extension is platform independent, so UMDs can switch to using this extension for older platforms as well, while {set, get}_caching are still supported on these legacy platforms for compatibility reasons. Cc: Chris Wilson Cc: Matt Roper Signed-off-by: Fei Yang Reviewed-by: Andi Shyti >>> >>> Just like the protected content uAPI, there is no way for userspace >>> to tell this feature is available other than trying to use it. >>> >>> Given the issues with protected content, is this not something we would want to >>> add? >> Sorry, I'm not aware of the issues with protected content, could you >> elaborate? >> There was a long discussion on the teams uAPI channel, could you comment >> there if you have any concerns? >> >> https://teams.microsoft.com/l/message/19:f1767bda6734476ba0a9c7d147b92 >> 8d1@thread.skype/1675860924675?tenantId=46c98d88-e344-4ed4-8496-4ed771 >> 2e255d=379f3ae1-d138-4205-bb65-d4c7d38cb481=16 >> 75860924675=GSE%20OSGC=i915%20uAPI%20changes >> tedTime=1675860924675=false >> >> Thanks, >> -Fei > > > We wanted to have a getparam to detect protected support and were told > to detect it by trying to create a context with it.
> > Now it appears trying to create a protected context can block for several > seconds. > > Since we have to report capabilities to the user even before it creates > protected contexts, any app is at risk of blocking. Can we detect this capability by creating a buffer object? This extension is not blocking, it just provides a way to set the caching policy, and should complete very fast. There is an IGT test I created for this extension (not merged yet), please take a look at http://intel-gfx-pw.fi.intel.com/series/19149/ I'm not familiar with getparam, will take a look there as well. But I think it would be easier to just create an object. -Fei >-Lionel > > >> >>> Thanks, >>> >>> -Lionel >>> >>> --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 33 include/uapi/drm/i915_drm.h| 36 ++ tools/include/uapi/drm/i915_drm.h | 36 ++ 3 files changed, 105 insertions(+)
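Probing by attempting creation would look roughly like the sketch below from userspace. The struct layouts are local mirrors written for illustration only — the SET_PAT extension is still under review in this series, so its final name, id, and fields are assumptions and may differ from what lands in i915_drm.h:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the uAPI structs under discussion, for illustration. */
struct i915_user_extension {
	uint64_t next_extension;
	uint32_t name;
	uint32_t flags;
};

struct drm_i915_gem_create_ext {
	uint64_t size;
	uint32_t handle;
	uint32_t flags;
	uint64_t extensions;	/* chain of i915_user_extension */
};

#define I915_GEM_CREATE_EXT_SET_PAT 2	/* assumed extension id */

/* Assumed layout of the PAT extension proposed in this series. */
struct drm_i915_gem_create_ext_set_pat {
	struct i915_user_extension base;
	uint32_t pat_index;
	uint32_t rsvd;
};

/* Build the ioctl argument with the PAT extension chained in. */
static void prep_create_with_pat(struct drm_i915_gem_create_ext *arg,
				 struct drm_i915_gem_create_ext_set_pat *ext,
				 uint64_t size, uint32_t pat_index)
{
	memset(ext, 0, sizeof(*ext));
	ext->base.name = I915_GEM_CREATE_EXT_SET_PAT;
	ext->pat_index = pat_index;

	memset(arg, 0, sizeof(*arg));
	arg->size = size;
	arg->extensions = (uint64_t)(uintptr_t)ext;
}
```

Issuing the create ioctl with this argument on a kernel without the extension would fail (typically with -EINVAL) and should return quickly either way, which is the cheap capability probe Fei describes — unlike the protected-context probe, no multi-second blocking is involved.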
Re: [Freedreno] [PATCH v4 1/6] drm/msm: Add MSM-specific DSC helper methods
On 06/04/2023 01:02, Jessica Zhang wrote: On 4/5/2023 12:26 PM, Dmitry Baryshkov wrote: On 05/04/2023 03:41, Jessica Zhang wrote: Introduce MSM-specific DSC helper methods, as some calculations are common between DP and DSC. Changes in v2: - Moved files up to msm/ directory - Dropped get_comp_ratio() helper - Used drm_int2fixp() to convert to integers to fp - Style changes to improve readability - Dropped unused bpp variable in msm_dsc_get_dce_bytes_per_line() - Changed msm_dsc_get_slice_per_intf() to a static inline method - Dropped last division step of msm_dsc_get_pclk_per_line() and changed method name accordingly - Changed DSC_BPP macro to drm_dsc_get_bpp_int() helper method - Fixed some math issues caused by passing in incorrect types to drm_fixed methods in get_bytes_per_soft_slice() Changes in v3: - Dropped src_bpp parameter from all methods -- src_bpp can be calculated as dsc->bits_per_component * 3 - Dropped intf_width parameter from get_bytes_per_soft_slice() - Moved dsc->bits_per_component to numerator calculation in get_bytes_per_soft_slice() - Renamed msm_dsc_get_uncompressed_pclk_per_line to *_get_uncompressed_pclk_per_intf() - Removed dsc->slice_width check from msm_dsc_get_uncompressed_pclk_per_intf() - Made get_bytes_per_soft_slice() a public method (this will be called later to help calculate DP pclk params) - Added documentation in comments - Moved extra_eol_bytes math out of msm_dsc_get_eol_byte_num() and renamed msm_dsc_get_eol_byte_num to *_get_bytes_per_intf. 
Changes in v4: - Changed msm_dsc_get_uncompressed_pclk_per_intf to msm_dsc_get_pclk_per_intf Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/Makefile | 1 + drivers/gpu/drm/msm/msm_dsc_helper.c | 47 drivers/gpu/drm/msm/msm_dsc_helper.h | 70 3 files changed, 118 insertions(+) diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile index 7274c41228ed..b814fc80e2d5 100644 --- a/drivers/gpu/drm/msm/Makefile +++ b/drivers/gpu/drm/msm/Makefile @@ -94,6 +94,7 @@ msm-y += \ msm_atomic_tracepoints.o \ msm_debugfs.o \ msm_drv.o \ + msm_dsc_helper.o \ msm_fb.o \ msm_fence.o \ msm_gem.o \ diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.c b/drivers/gpu/drm/msm/msm_dsc_helper.c new file mode 100644 index ..0539221eb09d --- /dev/null +++ b/drivers/gpu/drm/msm/msm_dsc_helper.c @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved + */ + +#include +#include +#include + +#include "msm_drv.h" +#include "msm_dsc_helper.h" + +s64 get_bytes_per_soft_slice(struct drm_dsc_config *dsc) +{ + int bpp = msm_dsc_get_bpp_int(dsc); + s64 numerator_fp, denominator_fp; + s64 comp_ratio_fp = drm_fixp_from_fraction(dsc->bits_per_component * 3, bpp); + + numerator_fp = drm_int2fixp(dsc->slice_width * 3 * dsc->bits_per_component); + denominator_fp = drm_fixp_mul(comp_ratio_fp, drm_int2fixp(8)); + + return drm_fixp_div(numerator_fp, denominator_fp); If we remove 3 * dsc->bits_per_components from both numerator and denominator, this whole function seems to be as simple as DIV_ROUND_UP(dsc->slice_width * bpp, 8) Or, if you prefer FP math, drm_fixp_from_fraction(dsc->slice_width * bpp, 8). Hi Dmitry, Sounds good. 
+} + +u32 msm_dsc_get_bytes_per_intf(struct drm_dsc_config *dsc, int intf_width) +{ + u32 bytes_per_soft_slice, bytes_per_intf; + s64 bytes_per_soft_slice_fp; + int slice_per_intf = msm_dsc_get_slice_per_intf(dsc, intf_width); + + bytes_per_soft_slice_fp = get_bytes_per_soft_slice(dsc); + bytes_per_soft_slice = drm_fixp2int_ceil(bytes_per_soft_slice_fp); + + bytes_per_intf = bytes_per_soft_slice * slice_per_intf; + + return bytes_per_intf; +} + +int msm_dsc_get_pclk_per_intf(struct drm_dsc_config *dsc) +{ + s64 data_width; + + data_width = drm_fixp_mul(drm_int2fixp(dsc->slice_count), + get_bytes_per_soft_slice(dsc)); And this is then DIV_ROUND_UP(dsc->slice_width * dsc->slice_count * bpp, 8) I would prefer to keep the FP math/get_bytes_per_soft_slice() call here and leave the ceil() until the end. It is the same: you are calling ceil right after drm_fixp_mul. So, there is no difference. + + return drm_fixp2int_ceil(data_width); +} diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.h b/drivers/gpu/drm/msm/msm_dsc_helper.h new file mode 100644 index ..31116a31090f --- /dev/null +++ b/drivers/gpu/drm/msm/msm_dsc_helper.h @@ -0,0 +1,70 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved + */ + +#ifndef MSM_DSC_HELPER_H_ +#define MSM_DSC_HELPER_H_ + +#include +#include + +/* + * Helper methods for MSM specific DSC calculations that are common between timing engine, + * DSI, and DP. + */ + +/** + *
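Dmitry's point — that taking the ceiling right after drm_fixp_mul collapses to the one-step integer expression — can be checked numerically. The helpers below are userspace reimplementations whose names mirror drm_fixed.h (the real ones saturate; `__int128` is a GCC/Clang extension used only for the 64x64 multiply):

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Userspace stand-ins for the drm_fixed.h s31.32 helpers. */
static int64_t fixp_from_fraction(int64_t a, int64_t b)
{
	return (a << 32) / b;
}

static int64_t int2fixp(int a)
{
	return (int64_t)a << 32;
}

static int64_t fixp_mul(int64_t a, int64_t b)
{
	return (int64_t)(((__int128)a * b) >> 32);	/* wide multiply, then rescale */
}

static int64_t fixp2int_ceil(int64_t a)
{
	return (a + ((1LL << 32) - 1)) >> 32;
}

/* FP version of the pclk math: ceil(slice_count * bytes_per_soft_slice). */
static int64_t pclk_per_intf_fp(int slice_count, int slice_width, int dsc_bpp)
{
	int64_t per_slice = fixp_from_fraction((int64_t)slice_width * dsc_bpp, 8);

	return fixp2int_ceil(fixp_mul(int2fixp(slice_count), per_slice));
}
```

Since the /8 is exact in s31.32 and multiplying by an integer fixed-point value loses no bits, the result matches `DIV_ROUND_UP(slice_width * slice_count * dsc_bpp, 8)`; whether the ceil() sits after the multiply or at the very end makes no difference here, as noted in the reply.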
Re: [PATCH] drm/i915/guc: Don't capture Gen8 regs on Gen12 devices
On Wed, Apr 05, 2023 at 02:13:31PM -0700, John Harrison wrote: > On 4/3/2023 17:34, Matt Roper wrote: > > On Mon, Apr 03, 2023 at 02:33:34PM -0700, john.c.harri...@intel.com wrote: > > > From: John Harrison > > > > > > A pair of pre-Gen12 registers were being included in the Gen12 capture > > > list. GuC was rejecting those as being invalid and logging errors > > > about them. So, stop doing it. > > Looks like these registers existed from gen8-gen11. With this change, > > it looks like they also won't be included in the GuC error capture for > > gen11 (ICL and EHL/JSL) since those platforms return xe_lpd_lists [1] > > rather than default_lists; do we care about that? I assume not (since > > those platforms don't use GuC submission unless you force it with the > > enable_guc modparam and taint your kernel), but I figured I should point > > it out. > Yeah, I think the code is treating Gen11 as Gen12 rather than Gen9 or its > own thing. I hadn't spotted that before. It certainly seems incorrect. > > > > > Reviewed-by: Matt Roper > > > > > > [1] Why is the main list we use called xe_lpd (i.e., the name of ADL-P's > > display IP)? It doesn't seem like we're doing anything with display > > registers here so using display IP naming seems really confusing. > I think because no-one has a clue what name refers to what hardware any more > :(. > > What are the official names for IP_VER 9, 11, 12.00, 12.50 and 12.55? Yeah, the naming is a real mess.
:-(

For graphics IP, the official terms are supposed to be:

12.00 = Xe_LP
12.10 = Xe_LP+ (basically the same as Xe_LP except for interrupts)
12.50 = Xe_HP
12.55 = Xe_HPG (it's nearly identical to Xe_HP)
12.7x = Xe_LPG

There are separate names for media, although we didn't really start using them anywhere in the i915 until the separation of IPs started becoming more important with MTL:

12.00 = Xe_M (or Xe_M+ for DG1, but we treat it the same in the KMD)
12.50 = Xe_XPM
12.55 = Xe_HPM
12.60 = Xe_XPM+
13.00 = Xe_LPM+

and display:

12.00 = Xe_D
13.00 = Xe_LPD (ADL-P) or Xe_HPD (DG2)
14.00 = Xe_LPD+

The pre-12 stuff predates the fancy new marketing-mandated names. Even though we're not using "gen" terminology going forward, those old ones are grandfathered in, so it's still okay to refer to them as gen9, gen11, etc.

Matt

> > John. > > > > > > > Matt > > > > > Signed-off-by: John Harrison > > > Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers for GuC error > > > state capture.") > > > Cc: Alan Previn > > > Cc: Umesh Nerlige Ramappa > > > Cc: Lucas De Marchi > > > Cc: John Harrison > > > Cc: Jani Nikula > > > Cc: Matt Roper > > > Cc: Balasubramani Vivekanandan > > > Cc: Daniele Ceraolo Spurio > > > --- > > > drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 7 +-- > > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c > > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c > > > index cf49188db6a6e..e0e793167d61b 100644 > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c > > > @@ -31,12 +31,14 @@ > > > { FORCEWAKE_MT, 0, 0, "FORCEWAKE" } > > > #define COMMON_GEN9BASE_GLOBAL \ > > > - { GEN8_FAULT_TLB_DATA0, 0, 0, "GEN8_FAULT_TLB_DATA0" }, \ > > > - { GEN8_FAULT_TLB_DATA1, 0, 0, "GEN8_FAULT_TLB_DATA1" }, \ > > > { ERROR_GEN6, 0, 0, "ERROR_GEN6" }, \ > > > { DONE_REG, 0, 0, "DONE_REG" }, \ > > > { HSW_GTT_CACHE_EN, 0, 0,
"HSW_GTT_CACHE_EN" } > > > +#define GEN9_GLOBAL \ > > > + { GEN8_FAULT_TLB_DATA0, 0, 0, "GEN8_FAULT_TLB_DATA0" }, \ > > > + { GEN8_FAULT_TLB_DATA1, 0, 0, "GEN8_FAULT_TLB_DATA1" } > > > + > > > #define COMMON_GEN12BASE_GLOBAL \ > > > { GEN12_FAULT_TLB_DATA0,0, 0, "GEN12_FAULT_TLB_DATA0" > > > }, \ > > > { GEN12_FAULT_TLB_DATA1,0, 0, "GEN12_FAULT_TLB_DATA1" > > > }, \ > > > @@ -142,6 +144,7 @@ static const struct __guc_mmio_reg_descr > > > xe_lpd_gsc_inst_regs[] = { > > > static const struct __guc_mmio_reg_descr default_global_regs[] = { > > > COMMON_BASE_GLOBAL, > > > COMMON_GEN9BASE_GLOBAL, > > > + GEN9_GLOBAL, > > > }; > > > static const struct __guc_mmio_reg_descr default_rc_class_regs[] = { > > > -- > > > 2.39.1 > > > > -- Matt Roper Graphics Software Engineer Linux GPU Platform Enablement Intel Corporation
Re: [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var
Daniel Vetter writes: > On Wed, Apr 05, 2023 at 07:42:08PM +0200, Javier Martinez Canillas wrote: [...] >> >> Ah, your patch adds it after that indeed. Please ignore my comment then. >> > >> > So rb: you? >> > >> >> Yes, I already provided it in my previous email and has been picked by >> patchwork. I could do again but probably will confuse dim when applying. > > Yeah just wanted to confirm I cleared up all your questions. Merged the > entire series to drm-misc-next, thanks for the review. > You are welcome. >> The only patch from your series that is missing an {r,a}b is #1 right now: >> >> https://patchwork.kernel.org/project/dri-devel/list/?series=736966=both > > That's a different one :-) > Oh, sorry about that. Somehow I switched threads in my head in the middle of the response. > I'll respin with your comments and then let you duke it out about > patch 1. > -Daniel > Perfect, thanks! It would be good to finally have that issue fixed. -- Best regards, Javier Martinez Canillas Core Platforms Red Hat
Re: [Regression] drm/scheduler: track GPU active time per entity
On 2023-04-05 13:44, Lucas Stach wrote: > Hi Luben, > > Am Dienstag, dem 04.04.2023 um 00:31 -0400 schrieb Luben Tuikov: >> On 2023-03-28 04:54, Lucas Stach wrote: >>> Hi Danilo, >>> >>> Am Dienstag, dem 28.03.2023 um 02:57 +0200 schrieb Danilo Krummrich: Hi all, Commit df622729ddbf ("drm/scheduler: track GPU active time per entity") tries to track the accumulated time that a job was active on the GPU, writing it to the entity through which the job was deployed to the scheduler originally. This is done within drm_sched_get_cleanup_job() which fetches a job from the scheduler's pending_list. Doing this can result in a race condition where the entity is already freed, but the entity's newly added elapsed_ns field is still accessed once the job is fetched from the pending_list. After drm_sched_entity_destroy() being called it should be safe to free the structure that embeds the entity. However, a job originally handed over to the scheduler by this entity might still reside in the scheduler's pending_list for cleanup after drm_sched_entity_destroy() already being called and the entity being freed. Hence, we can run into a UAF. >>> Sorry about that, I clearly didn't properly consider this case. >>> In my case it happened that a job, as explained above, was just picked from the scheduler's pending_list after the entity was freed due to the client application exiting. Meanwhile this freed up memory was already allocated for a subsequent client application's job structure again. Hence, the new job's memory got corrupted. Luckily, I was able to reproduce the same corruption over and over again by just using deqp-runner to run a specific set of VK test cases in parallel. Fixing this issue doesn't seem to be very straightforward though (unless I miss something), which is why I'm writing this mail instead of sending a fix directly. Spontaneously, I see three options to fix it: 1. Rather than embedding the entity into driver specific structures (e.g.
tied to file_priv) we could allocate the entity separately and reference count it, such that it's only freed up once all jobs that were deployed through this entity are fetched from the scheduler's pending list. >>> My vote is on this or something in a similar vein for the long term. I >>> have some hope to be able to add a GPU scheduling algorithm with a bit >>> more fairness than the current one sometime in the future, which >>> requires execution time tracking on the entities. >> >> Danilo, >> >> Using kref is preferable, i.e. option 1 above. >> >> Lucas, can you shed some light on, >> >> 1. In what way the current FIFO scheduling is unfair, and >> 2. shed some details on this "scheduling algorithm with a bit >> more fairness than the current one"? > > I don't have a specific implementation in mind yet. However the current > FIFO algorithm can be very unfair if you have a sparse workload compete > with one that generates a lot of jobs without any throttling aside from > the entity queue length. Ah, that's interesting, let's see, a "sparse workload compete with one that generates a lot of jobs", so basically we have a sparse workload compete with a dense workload. So we can represent this with two entities, A, B, whose jobs we're going to represent by the entities' names A and B. So if we let A be the sparse workload and B the dense workload, we have this, wlog,

First/oldest job, .., latter/new jobs.
Subm: A, B, B, B, B, B, A, B, B, B, B, B, A, B, B, B, B, B, A, ...
Time: t0,t1,t2,t3,t4,t5,t6,t7,t8,t9, ...

The current FIFO algorithm would prefer to execute those jobs in order of submission, i.e. oldest-ready-first job. Assume that all jobs are ready. Then we'll execute them in order. This is desirable and fair. We want to execute the jobs in the order they were submitted, given also that they are ready to be executed. So perhaps we want to execute them like this:

First/oldest job, .., latter/new jobs.
Subm: A, B, B, B, B, B, A, B, B, B, B, B, A, B, B, B, B, B, A, ...
Time: t0,t1,t2,t3,t4,t5,t6,t7,t8,t9, Exec: A, B, B, B, B, B, A, B, B, B, B, B, A, B, B, B, B, B, A, ... Any other ordering would starve either A or B. If we executed the 2nd A job at t6 or t7, then that would starve the 3rd/4th job in B, since the 2nd A job arrives at the same time as that of the 3rd B job, at time t6. The time t3-t0 is some delta > 0, some initial scheduler start-up time. IOW, we don't want to delay a job any more than it should wait--the oldest job, which is also ready, should execute next, so that we're fair in how it executes in real time. We cannot boot B out at t6, so that we execute A, just because it is sparse but just arrived. From A's point of view, it shouldn't expect its job execution time
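The oldest-ready-first selection described above can be sketched as a toy model (this is illustrative C, not the actual drm_sched code): two per-entity FIFO queues of submission timestamps, with the scheduler always picking the entity whose head-of-queue job is oldest.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of oldest-ready-first selection between a sparse entity A
 * and a dense entity B. Each queue holds submission timestamps of jobs
 * that are all assumed ready, oldest first. */
struct queue {
    const int *times; /* submission timestamps, oldest first */
    size_t len;
    size_t head;
};

/* Pick the entity whose head-of-queue job is oldest; returns 0 for A,
 * 1 for B, or -1 if both queues are drained. Ties go to A here; the
 * real scheduler breaks ties by insertion order. */
static int pick_next(struct queue *a, struct queue *b)
{
    int a_ready = a->head < a->len;
    int b_ready = b->head < b->len;

    if (!a_ready && !b_ready)
        return -1;
    if (a_ready && (!b_ready || a->times[a->head] <= b->times[b->head])) {
        a->head++;
        return 0;
    }
    b->head++;
    return 1;
}
```

Running this over the Subm row above reproduces the Exec row: the 2nd A job, submitted at t6, runs exactly in its submission slot, neither starving B nor being starved by it.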
Re: [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var
On Wed, Apr 05, 2023 at 07:42:08PM +0200, Javier Martinez Canillas wrote: > Daniel Vetter writes: > > > On Wed, Apr 05, 2023 at 06:27:17PM +0200, Javier Martinez Canillas wrote: > >> Daniel Vetter writes: > > [...] > > >> > > >> > The __fill_var is after this. I'm honestly not sure what the exact > >> > >> Ah, your patch adds it after that indeed. Please ignore my comment then. > > > > So rb: you? > > > > Yes, I already provided it in my previous email and has been picked by > patchwork. I could do again but probably will confuse dim when applying. Yeah just wanted to confirm I cleared up all your questions. Merged the entire series to drm-misc-next, thanks for the review. > The only patch from your series that is missing an {r,a}b is #1 right now: > > https://patchwork.kernel.org/project/dri-devel/list/?series=736966=both That's a different one :-) I'll respin with your comments and then let you duke it out about patch 1. -Daniel > > [...] > > >> > What I'm wondering now is whether too small x/yres won't lead to problems > >> > of some sorts ... For multi-screen we set the virtual size to be big > >> > enough for all crtc, and then just set x/yres to be the smallest output. > >> > That way fbcon knows to only draw as much as is visible on all screens. > >> > But if you then pan that too much, the bigger screens might not have a > >> > big > >> > enough buffer anymore and things fail (but shouldn't). > >> > > >> > Not sure how to fix that tbh. > >> > >> Would this be a problem in practice? > > > > I'm frankly not sure. You'd get a black screen for fbcon/fbdev across all > > outputs, but only if you have userspace doing this intentionally. > > > > In a way it's just another artifact of the drm fbdev emulation not using > > ATOMIC_TEST_ONLY in the various places where it should, and so doesn't > > really know whether a configuration change will work out. 
> > > > We already have this in obscure multi-monitor cases where adding another > > screen kills fbcon, because the display hw is running out of fifo or > > clocks or whatever, and because the drm fbdev code doesn't check but just > > blindly commits the entire thing as an atomic commit, the overall commit > > fails. > > > > This worked "better" with legacy kms because there we commit per-crtc, so > > if any specific crtc runs into a limit check, only that one fails to light > > up. > > > > Imo given that no one cared enough yet to write up atomic TEST_ONLY > > support for fbdev emulation I think we can continue to just ignore this > > problem. > > > > Agreed. If that ends up being a problem for people in practice then I guess > someone can type up atomic TEST_ONLY support for the fbdev emulation layer. > > > What should not happen is that fbcon code blows up drawing out of bounds > > or something like that, resulting in a kernel crash. So from that pov I > > think it's "safe" :-) > > Great. Thanks a lot for your explanations. > > > -Daniel > > -- > Best regards, > > Javier Martinez Canillas > Core Platforms > Red Hat > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
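The atomic-vs-legacy difference discussed in this thread can be illustrated with a toy model (not real KMS code — `check()` stands in for TEST_ONLY-style validation, and the clock limit is an invented stand-in for any per-CRTC hardware constraint): one all-or-nothing commit spanning every CRTC, versus legacy per-CRTC commits where a failure only loses the offending CRTC.

```c
#include <assert.h>
#include <stddef.h>

/* Toy CRTC with one invented hardware limit. */
struct crtc {
    int required_clock;
    int max_clock;
    int lit; /* did this CRTC light up? */
};

static int check(const struct crtc *c)
{
    return c->required_clock <= c->max_clock;
}

/* Atomic path: one commit for everything, no partial success. */
static int commit_atomic(struct crtc *crtcs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!check(&crtcs[i]))
            return -1; /* whole commit fails, nothing lights up */
    for (size_t i = 0; i < n; i++)
        crtcs[i].lit = 1;
    return 0;
}

/* Legacy path: commit per CRTC, failures are isolated. */
static int commit_legacy(struct crtc *crtcs, size_t n)
{
    int failures = 0;

    for (size_t i = 0; i < n; i++) {
        if (check(&crtcs[i]))
            crtcs[i].lit = 1;
        else
            failures++;
    }
    return failures;
}
```

With one over-limit CRTC, the atomic path blanks every output while the legacy path still lights the valid one — which is exactly why the overall commit failing "kills fbcon" across all screens.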
Re: [PATCH v4 4/6] drm/msm/dsi: Use MSM and DRM DSC helper methods
On 4/5/2023 12:27 PM, Dmitry Baryshkov wrote: On 05/04/2023 03:41, Jessica Zhang wrote: Use MSM and DRM DSC helper methods to configure DSC for DSI. Changes in V2: - *_calculate_initial_scale_value --> *_set_initial_scale_value - Split pkt_per_line and eol_byte_num changes to a separate patch - Moved pclk_per_line calculation to hdisplay adjustment in `if (dsc)` block of dsi_update_dsc_timing() Changes in v3: - Split pclk_per_intf calculation into a separate patch - Added slice_width check to dsi_timing_setup - Used MSM DSC helper to calculate total_bytes_per_intf Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/dsi/dsi_host.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c index 74d38f90398a..6a6218a9655f 100644 --- a/drivers/gpu/drm/msm/dsi/dsi_host.c +++ b/drivers/gpu/drm/msm/dsi/dsi_host.c @@ -28,6 +28,7 @@ #include "dsi.xml.h" #include "sfpb.xml.h" #include "dsi_cfg.h" +#include "msm_dsc_helper.h" #include "msm_kms.h" #include "msm_gem.h" #include "phy/dsi_phy.h" @@ -848,7 +849,7 @@ static void dsi_update_dsc_timing(struct msm_dsi_host *msm_host, bool is_cmd_mod /* first calculate dsc parameters and then program * compress mode registers */ - slice_per_intf = DIV_ROUND_UP(hdisplay, dsc->slice_width); + slice_per_intf = msm_dsc_get_slice_per_intf(dsc, hdisplay); /* * If slice_count is greater than slice_per_intf @@ -858,7 +859,7 @@ static void dsi_update_dsc_timing(struct msm_dsi_host *msm_host, bool is_cmd_mod if (dsc->slice_count > slice_per_intf) dsc->slice_count = 1; - total_bytes_per_intf = dsc->slice_chunk_size * slice_per_intf; + total_bytes_per_intf = msm_dsc_get_bytes_per_intf(dsc, hdisplay); eol_byte_num = total_bytes_per_intf % 3; pkt_per_line = slice_per_intf / dsc->slice_count; @@ -936,6 +937,12 @@ static void dsi_timing_setup(struct msm_dsi_host *msm_host, bool is_bonded_dsi) return; } + if (!dsc->slice_width || (mode->hdisplay < dsc->slice_width)) 
{ + pr_err("DSI: invalid slice width %d (pic_width: %d)\n", + dsc->slice_width, mode->hdisplay); + return; + } This is not the "use of MSM and DRM DSC helper methods" and thus should be moved to a separate patch. Hi Dmitry, Acked. Thanks, Jessica Zhang + dsc->pic_width = mode->hdisplay; dsc->pic_height = mode->vdisplay; DBG("Mode %dx%d\n", dsc->pic_width, dsc->pic_height); @@ -1759,7 +1766,7 @@ static int dsi_populate_dsc_params(struct msm_dsi_host *msm_host, struct drm_dsc return ret; } - dsc->initial_scale_value = 32; + drm_dsc_set_initial_scale_value(dsc); dsc->line_buf_depth = dsc->bits_per_component + 1; return drm_dsc_compute_rc_parameters(dsc); -- With best wishes Dmitry
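The hunk quoted above pairs a slice-width sanity check with the round-up slice-per-interface math. A standalone sketch of both calculations (the names are illustrative, not the exact `msm_dsc_*` helper signatures):

```c
#include <assert.h>

/* Kernel-style round-up division, as used for slice_per_intf. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Slices per interface: a partial trailing slice still counts. */
static int slice_per_intf(int hdisplay, int slice_width)
{
    return DIV_ROUND_UP(hdisplay, slice_width);
}

/* Mirrors the added validity check: slice_width must be non-zero and
 * no wider than the picture. Returns 0 on success, -1 otherwise. */
static int validate_slice_width(int slice_width, int hdisplay)
{
    if (!slice_width || hdisplay < slice_width)
        return -1;
    return 0;
}
```

The round-up matters when `hdisplay` is not an exact multiple of `slice_width`, and the validity check rejects both a zero width (division by zero) and a slice wider than the mode.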
Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation
On 2023-04-05 00:45:24, Lionel Landwerlin wrote: > On 04/04/2023 19:04, Yang, Fei wrote: > >> Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at > >> BO creation > >> > >> Just like the protected content uAPI, there is no way for userspace to tell > >> this feature is available other than trying to use it. > >> > >> Given the issues with protected content, is it not something we could want to > >> add? > > Sorry I'm not aware of the issues with protected content, could you > > elaborate? > > There was a long discussion on teams uAPI channel, could you comment there > > if > > any concerns? > > > > We wanted to have a getparam to detect protected support and were told > to detect it by trying to create a context with it. > An extensions system where the detection mechanism is "just try it", and assume it's not supported if it fails. ?? This seems likely to get more and more problematic as a detection mechanism as more extensions are added. > > Now it appears trying to create a protected context can block for > several seconds. > > Since we have to report capabilities to the user even before it creates > protected contexts, any app is at risk of blocking. > This failure path is not causing any re-thinking about using this as the extension detection mechanism? Doesn't the ioctl# + input-struct-size + u64-extension# identify the extension such that the kernel could indicate if it is supported or not? (Or, perhaps return an array of the supported extensions so the umd doesn't have to potentially make many ioctls for each extension of interest.) -Jordan
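The alternative Jordan sketches — keying supported extensions by (ioctl number, u64 extension id) and letting userspace query up front — could look roughly like the following. Every name here is hypothetical; no such table or query exists in i915 today, and the ids are invented:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel-side table of supported extensions, keyed by
 * ioctl number and extension id. Purely illustrative. */
struct ext_key {
    uint32_t ioctl_nr;
    uint64_t ext_id;
};

static const struct ext_key supported[] = {
    { 0x2b, 1 }, /* e.g. create-ext: protected content (invented ids) */
    { 0x2b, 2 }, /* e.g. create-ext: cache policy */
};

/* Point query: is this one extension supported for this ioctl? */
static int ext_supported(uint32_t ioctl_nr, uint64_t ext_id)
{
    for (size_t i = 0; i < sizeof(supported) / sizeof(supported[0]); i++)
        if (supported[i].ioctl_nr == ioctl_nr &&
            supported[i].ext_id == ext_id)
            return 1;
    return 0;
}

/* Batch query: fill out[] with every extension id supported for one
 * ioctl, so the umd needs a single call instead of one "try it" per
 * extension of interest. Returns the number of ids written. */
static size_t ext_list(uint32_t ioctl_nr, uint64_t *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < sizeof(supported) / sizeof(supported[0]); i++)
        if (supported[i].ioctl_nr == ioctl_nr && n < max)
            out[n++] = supported[i].ext_id;
    return n;
}
```

The point of the batch variant is exactly the parenthetical above: capability discovery becomes a cheap table lookup rather than an attempted (and possibly multi-second) object creation per extension.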
Re: [Intel-gfx] [PATCH] i915/guc/slpc: Provide sysfs for efficient freq
On Wed, Apr 05, 2023 at 12:42:30PM -0700, Dixit, Ashutosh wrote: > On Wed, 05 Apr 2023 06:57:42 -0700, Rodrigo Vivi wrote: > > > > Hi Rodrigo, > > > On Fri, Mar 31, 2023 at 08:11:29PM -0700, Dixit, Ashutosh wrote: > > > On Fri, 31 Mar 2023 19:00:49 -0700, Vinay Belgaumkar wrote: > > > > > > > > > > Hi Vinay, > > > > > > > @@ -478,20 +507,15 @@ int intel_guc_slpc_set_min_freq(struct > > > > intel_guc_slpc *slpc, u32 val) > > > > val > slpc->max_freq_softlimit) > > > > return -EINVAL; > > > > > > > > + /* Ignore efficient freq if lower min freq is requested */ > > > > + ret = intel_guc_slpc_set_ignore_eff_freq(slpc, val < > > > > slpc->rp1_freq); > > > > + if (ret) > > > > + goto out; > > > > + > > > > > > I don't agree with this. If we are now providing an interface explicitly > > > to > > > ignore RPe, that should be /only/ way to ignore RPe. There should be no > > > other "under the hood" ignoring of RPe. In other words, ignoring RPe > > > should > > > be minimized unless explicitly requested. > > > > > > I don't clearly understand why this was done previously but it makes even > > > less sense to me now after this patch. > > > > well, I had suggested this previously. And just because without this we > > would > > be breaking API expectations. > > > > When user selects a minimal frequency it expect that to stick. But with the > > efficient freq enabled in guc if minimal is less than the efficient one, > > this request is likely ignored. > > > > Well, even worse is that we are actually caching the request in the soft > > values. > > So we show a minimal, but the hardware without any workload is operating at > > efficient. > > > > So, the thought process was: 'if user requested a very low minimal, we give > > them > > the minimal requested, even if that means to disable the efficient freq.' > > Hmm, I understand this even less now :) > > * Why is RPe ignored when min < RPe? Since the freq can be between min and > max? 
Shouldn't the condition be min > RPe, that is turn RPe off if min higher than RPe is requested? that is not how guc efficient freq selection works. (unless my memory is tricking me right now.) So, if we select a min that is between RPe and RP0, guc will respect and use the selected min. So we don't need to disable guc's selection of the efficient freq. This is not true when we select a very low min like RPn. If we select RPn as min and guc efficient freq selection is enabled, guc will simply ignore our request. So the only way to give the user what is asked is to also disable guc's efficient freq selection. (I probably confused you in the previous email because I used 'RP0' when I meant 'RPn'. I hope it's clear now). > > * Also isn't RPe dynamic, so we can't say RPe == rp1 when using in KMD? Oh... yeap, this is an issue indeed. Especially with i915 where we have the soft values cached instead of asking guc every time. That's a good point. The variance is not big, but we will hit corner cases. One way is to keep checking and updating every time a sysfs is touched. The other way is to do what you are suggesting and just accept and deal with the reality that is: "we cannot guarantee a min freq selection if the user doesn't disable the efficient freq selection". > > * Finally, we know that enabling RPe broke the kernel freq API because RPe > could go over max_freq. So it is actually the max freq which is not > obeyed after RPe is enabled. Oh! so it was my bad memory indeed and everything is the other way around? But I just looked at the Xe code, my most recent memory, and I just needed to toggle the efficient freq off in the case that I mentioned, when the min selection is below the efficient one. With that, all the API expectations that I coded in IGT work neatly. > > So we ignore RPe in some select cases (which also I don't understand as > mentioned above but maybe I am missing something) to claim that we are > obeying the freq API, but let the freq API stay broken in other cases?
In what cases does it stay broken? This is why we need the IGT tests for all the API behavior in place. > > > So, that was introduced to avoid API breakage. Removing it now would mean > > breaking API. (Not sure if the IGT tests for the API got merged already, > > but think that as the API contract). > > I think we should take this patch as an opportunity to fix this and give > the user a clean interface to ignore RPe and remove this other implicit way > to ignore RPe. All IGT changes are unmerged at present. Yeap, the IGT needs to come with whatever we concluded here and we need to stick with that afterwards, so let's think with care. Vinay, Ashutosh's strongest argument is the variable RPe. Do you have thoughts on that? > > Thanks. > -- > Ashutosh > > > > > > > But I do agree with you that having something selected from multiple places > > also has the potential to cause some mismatched expectations. So I was thinking > > about multiple event orders where the user
[PATCH] drm/doc/rfc: Introduce the merge plan for the Xe driver.
Let’s establish a merge plan for Xe, by writing down clear pre-merge goals, in order to avoid unnecessary delays. This initial document starts with a TODO list containing items with clear and measurable key results. Xe’s initial pull request should only be sent to dri-devel after all the items are clearly resolved. Since many of them involve some level of a community consensus, in many cases, the consensus will be reached in follow-up patches to this document with more details of the API or helpers that will be developed or modified. Cc: Dave Airlie Cc: Daniel Vetter Cc: Oded Gabbay Signed-off-by: Rodrigo Vivi Signed-off-by: Francois Dugast Signed-off-by: Luis Strano Signed-off-by: Matthew Brost Signed-off-by: Thomas Hellström --- Documentation/gpu/rfc/index.rst | 4 + Documentation/gpu/rfc/xe.rst| 216 2 files changed, 220 insertions(+) create mode 100644 Documentation/gpu/rfc/xe.rst diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst index 476719771eef..e4f7b005138d 100644 --- a/Documentation/gpu/rfc/index.rst +++ b/Documentation/gpu/rfc/index.rst @@ -31,3 +31,7 @@ host such documentation: .. toctree:: i915_vm_bind.rst + +.. toctree:: + + xe.rst diff --git a/Documentation/gpu/rfc/xe.rst b/Documentation/gpu/rfc/xe.rst new file mode 100644 index ..1e3e7e9c67c3 --- /dev/null +++ b/Documentation/gpu/rfc/xe.rst @@ -0,0 +1,216 @@ +== +Xe – Merge Acceptance Plan +== +Xe is a new driver for Intel GPUs that supports both integrated and +discrete platforms starting with Tiger Lake (first Intel Xe Architecture). + +This document aims to establish a merge plan for the Xe, by writing down clear +pre-merge goals, in order to avoid unnecessary delays. 
+ +Xe – Overview += +The main motivation of Xe is to have a fresh base to work from that is +unencumbered by older platforms, whilst also taking the opportunity to +rearchitect our driver to increase sharing across the drm subsystem, both +leveraging and allowing us to contribute more towards other shared components +like TTM and drm/scheduler. + +This is also an opportunity to start from the beginning with a clean uAPI that is +extensible by design and already aligned with the modern userspace needs. For +this reason, the memory model is solely based on GPU Virtual Address space +bind/unbind (‘VM_BIND’) of GEM buffer objects (BOs) and execution only supporting +explicit synchronization. With persistent mapping across the execution, the +userspace does not need to provide a list of all required mappings during each +submission. + +The new driver leverages a lot from i915. As for display, the intent is to share +the display code with the i915 driver so that there is maximum reuse there. + +As for the power management area, the goal is to have a much-simplified support +for the system suspend states (S-states), PCI device suspend states (D-states), +GPU/Render suspend states (R-states) and frequency management. It should leverage +as much as possible all the existing PCI-subsystem infrastructure (pm and +runtime_pm) and underlying firmware components such as PCODE and GuC for the power +states and frequency decisions. + +Repository: + +https://gitlab.freedesktop.org/drm/xe/kernel (branch drm-xe-next) + +Xe – Platforms +== +Currently, Xe is already functional and has experimental support for multiple +platforms starting from Tiger Lake, with initial support in userspace implemented +in Mesa (for Iris and Anv, our OpenGL and Vulkan drivers), as well as in NEO +(for OpenCL and Level0). + +During a transition period, platforms will be supported by both Xe and i915.
+However, the force_probe mechanism existing in both drivers will allow only one +official and by-default probe at a given time. + +For instance, in order to probe a DG2 whose PCI ID is 0x5690 by Xe instead of +i915, the following set of parameters needs to be used: + +``` +i915.force_probe=!5690 xe.force_probe=5690 +``` + +In both drivers, the ‘.require_force_probe’ protection forces the user to use the +force_probe parameter while the driver is under development. This protection is +only removed when the support for the platform and the uAPI are stable, stability +which needs to be demonstrated by CI results. + +In order to avoid user space regressions, i915 will continue to support all the +current platforms that are already out of this protection. Xe support will be +forever experimental and dependent on the usage of force_probe for these +platforms. + +When the time comes for Xe, the protection will be lifted on Xe and kept in i915. + +Xe driver will be protected with both STAGING Kconfig and force_probe. Changes in +the uAPI are expected while the driver is behind these protections. STAGING will +be removed when the driver uAPI gets to a mature state where we can guarantee the +‘no regression’ rule. Then force_probe will be lifted
Re: [Intel-gfx] [PATCH] i915/guc/slpc: Provide sysfs for efficient freq
On Wed, 05 Apr 2023 06:57:42 -0700, Rodrigo Vivi wrote: > Hi Rodrigo, > On Fri, Mar 31, 2023 at 08:11:29PM -0700, Dixit, Ashutosh wrote: > > On Fri, 31 Mar 2023 19:00:49 -0700, Vinay Belgaumkar wrote: > > > > > > > Hi Vinay, > > > > > @@ -478,20 +507,15 @@ int intel_guc_slpc_set_min_freq(struct > > > intel_guc_slpc *slpc, u32 val) > > > val > slpc->max_freq_softlimit) > > > return -EINVAL; > > > > > > + /* Ignore efficient freq if lower min freq is requested */ > > > + ret = intel_guc_slpc_set_ignore_eff_freq(slpc, val < slpc->rp1_freq); > > > + if (ret) > > > + goto out; > > > + > > > > I don't agree with this. If we are now providing an interface explicitly to > > ignore RPe, that should be the /only/ way to ignore RPe. There should be no > > other "under the hood" ignoring of RPe. In other words, ignoring RPe should > > be minimized unless explicitly requested. > > > > I don't clearly understand why this was done previously but it makes even > > less sense to me now after this patch. > > well, I had suggested this previously. And just because without this we would > be breaking API expectations. > > When user selects a minimal frequency it expects that to stick. But with the > efficient freq enabled in guc if minimal is less than the efficient one, > this request is likely ignored. > > Well, even worse is that we are actually caching the request in the soft > values. > So we show a minimal, but the hardware without any workload is operating at > efficient. > > So, the thought process was: 'if user requested a very low minimal, we give > them > the minimal requested, even if that means to disable the efficient freq.' Hmm, I understand this even less now :) * Why is RPe ignored when min < RPe? Since the freq can be between min and max? Shouldn't the condition be min > RPe, that is turn RPe off if min higher than RPe is requested? * Also isn't RPe dynamic, so we can't say RPe == rp1 when using in KMD?
* Finally, we know that enabling RPe broke the kernel freq API because RPe could go over max_freq. So it is actually the max freq which is not obeyed after RPe is enabled. So we ignore RPe in some select cases (which also I don't understand as mentioned above but maybe I am missing something) to claim that we are obeying the freq API, but let the freq API stay broken in other cases? > So, that was introduced to avoid API breakage. Removing it now would mean > breaking API. (Not sure if the IGT tests for the API got merged already, > but think that as the API contract). I think we should take this patch as an opportunity to fix this and give the user a clean interface to ignore RPe and remove this other implicit way to ignore RPe. All IGT changes are unmerged at present. Thanks. -- Ashutosh > > But I do agree with you that having something selected from multiple places > also has the potential to cause some mismatched expectations. So I was thinking > about multiple event orders where the user selects RP0 as minimal, then > enables the efficient or vice versa, but I couldn't think of a bad case. > Or at least not as bad as the user asking to get RP0 as minimal and only > getting RPe back. > > With this in mind, and having checked the code: > > Reviewed-by: Rodrigo Vivi > > But I won't push this immediately because I'm still open to hearing another > side/angle. > > > > > Thanks.
> > -- > > Ashutosh > > > > > > > /* Need a lock now since waitboost can be modifying min as well */ > > > mutex_lock(>lock); > > > wakeref = intel_runtime_pm_get(>runtime_pm); > > > > > > - /* Ignore efficient freq if lower min freq is requested */ > > > - ret = slpc_set_param(slpc, > > > - SLPC_PARAM_IGNORE_EFFICIENT_FREQUENCY, > > > - val < slpc->rp1_freq); > > > - if (ret) { > > > - guc_probe_error(slpc_to_guc(slpc), "Failed to toggle efficient > > > freq: %pe\n", > > > - ERR_PTR(ret)); > > > - goto out; > > > - } > > > - > > > ret = slpc_set_param(slpc, > > >SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ, > > >val);
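The behavior being debated in this thread can be summarized with a toy model (this is an illustration of the described semantics, not GuC/SLPC code, and the frequency values below are invented): with efficient-frequency selection enabled, a requested min below RPe is effectively bumped to RPe, so honoring a very low min such as RPn requires also disabling efficient-freq selection — which is what the `val < slpc->rp1_freq` condition approximates.

```c
#include <assert.h>

/* Toy model: what min frequency actually takes effect, given the
 * requested min, the (here static) efficient frequency RPe, and
 * whether efficient-freq selection is ignored. */
static int effective_min(int requested_min, int rpe, int ignore_eff)
{
    if (ignore_eff)
        return requested_min;
    /* With efficient-freq selection on, requests below RPe are
     * effectively ignored and the hardware idles at RPe. */
    return requested_min < rpe ? rpe : requested_min;
}
```

Note this model treats RPe as a constant, which is exactly the weakness Ashutosh points out: the real RPe is dynamic, so a cached `rp1_freq` comparison can miss corner cases.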
Re: [PATCH RFC 12/18] rust: drm: sched: Add GPU scheduler abstraction
On Wed, Apr 05, 2023 at 05:43:01PM +0200, Daniel Vetter wrote: > On Tue, Mar 07, 2023 at 11:25:37PM +0900, Asahi Lina wrote: > > +/// An armed DRM scheduler job (not yet submitted) > > +pub struct ArmedJob<'a, T: JobImpl>(Box>, PhantomData<&'a T>); > > + > > +impl<'a, T: JobImpl> ArmedJob<'a, T> { > > +/// Returns the job fences > > +pub fn fences() -> JobFences<'_> { > > +JobFences(unsafe { *self.0.job.s_fence }) > > +} > > + > > +/// Push the job for execution into the scheduler > > +pub fn push(self) { > > +// After this point, the job is submitted and owned by the > > scheduler > > +let ptr = match self { > > +ArmedJob(job, _) => Box::>::into_raw(job), > > +}; > > If I get this all right then this all makes sure that drivers can't use > the job after push and they don't forget to call arm. > > What I'm not seeing is how we force drivers to call push once they've > called arm? I haven't checked what the code does, but from the docs it > sounds like if you don't call push then drop will get called. Which wrecks > the book-keeping on an armed job. Or is there something that prevents > ArmedJob from having the Drop trait and so the only way to not go boom > is by pushing it? > > Googling for "rust undroppable" seems to indicate that this isn't a thing > rust can do? Another thing that I just realized: The driver must ensure that the arm->push sequence on a given drm_sched_entity isn't interrupted by another thread doing the same, i.e. you need to wrap it all in a lock, and it always needs to be the same lock for a given entity. I have no idea how to guarantee that, but I guess somehow we should? -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
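The invariant Daniel is asking about — that arm→push on one entity must be a critical section — can be modeled in a few lines. This is a deliberately single-threaded sketch (a flag stands in for the per-entity mutex, and none of these names are real drm_sched API): arm() acquires the entity, and only push() releases it, so a second arm before push is refused.

```c
#include <assert.h>

/* Toy entity: `armed` stands in for "the per-entity lock is held by an
 * in-flight arm->push sequence". A real implementation would use a
 * mutex held across the whole sequence. */
struct entity {
    int armed;
};

/* Begin an arm->push sequence. Fails if one is already in flight,
 * modeling the interleaving that must not happen. */
static int arm(struct entity *e)
{
    if (e->armed)
        return -1; /* another arm->push sequence owns the entity */
    e->armed = 1;
    return 0;
}

/* Complete the sequence: the job is handed to the scheduler and the
 * entity becomes available for the next arm. */
static void push(struct entity *e)
{
    e->armed = 0;
}
```

The open question in the thread is how to encode this protocol in the type system (e.g. an armed-job object that borrows the lock guard) rather than relying on the driver author to remember the lock.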
Re: [PATCH v4 6/6] drm/msm/dsi: Fix calculations pkt_per_line
On 05/04/2023 03:41, Jessica Zhang wrote: Currently, pkt_per_line is calculated by dividing slice_per_intf by slice_count. This is incorrect, as slice_per_intf should be divided by slice_per_pkt, which is not always equivalent to slice_count as it is possible for there to be multiple soft slices per interface even though a panel only specifies one slice per packet. Fixes: 08802f515c3c ("drm/msm/dsi: Add support for DSC configuration") Signed-off-by: Jessica Zhang Reviewed-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/dsi/dsi_host.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
Re: [PATCH v4 5/6] drm/msm/dsi: update hdisplay calculation for dsi_timing_setup
On 05/04/2023 03:41, Jessica Zhang wrote: hdisplay for compressed images should be calculated as bytes_per_slice * slice_count. Thus, use MSM DSC helper to calculate hdisplay for dsi_timing_setup instead of directly using mode->hdisplay. Changes in v3: - Split from previous patch - Initialized hdisplay as uncompressed pclk per line at the beginning of dsi_timing_setup as to not break dual DSI calculations Changes in v4: - Moved pclk_per_intf calculations to DSC hdisplay adjustments Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/dsi/dsi_host.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
Re: [PATCH v4 4/6] drm/msm/dsi: Use MSM and DRM DSC helper methods
On 05/04/2023 03:41, Jessica Zhang wrote: Use MSM and DRM DSC helper methods to configure DSC for DSI. Changes in V2: - *_calculate_initial_scale_value --> *_set_initial_scale_value - Split pkt_per_line and eol_byte_num changes to a separate patch - Moved pclk_per_line calculation to hdisplay adjustment in `if (dsc)` block of dsi_update_dsc_timing() Changes in v3: - Split pclk_per_intf calculation into a separate patch - Added slice_width check to dsi_timing_setup - Used MSM DSC helper to calculate total_bytes_per_intf Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/dsi/dsi_host.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c index 74d38f90398a..6a6218a9655f 100644 --- a/drivers/gpu/drm/msm/dsi/dsi_host.c +++ b/drivers/gpu/drm/msm/dsi/dsi_host.c @@ -28,6 +28,7 @@ #include "dsi.xml.h" #include "sfpb.xml.h" #include "dsi_cfg.h" +#include "msm_dsc_helper.h" #include "msm_kms.h" #include "msm_gem.h" #include "phy/dsi_phy.h" @@ -848,7 +849,7 @@ static void dsi_update_dsc_timing(struct msm_dsi_host *msm_host, bool is_cmd_mod /* first calculate dsc parameters and then program * compress mode registers */ - slice_per_intf = DIV_ROUND_UP(hdisplay, dsc->slice_width); + slice_per_intf = msm_dsc_get_slice_per_intf(dsc, hdisplay); /* * If slice_count is greater than slice_per_intf @@ -858,7 +859,7 @@ static void dsi_update_dsc_timing(struct msm_dsi_host *msm_host, bool is_cmd_mod if (dsc->slice_count > slice_per_intf) dsc->slice_count = 1; - total_bytes_per_intf = dsc->slice_chunk_size * slice_per_intf; + total_bytes_per_intf = msm_dsc_get_bytes_per_intf(dsc, hdisplay); eol_byte_num = total_bytes_per_intf % 3; pkt_per_line = slice_per_intf / dsc->slice_count; @@ -936,6 +937,12 @@ static void dsi_timing_setup(struct msm_dsi_host *msm_host, bool is_bonded_dsi) return; } + if (!dsc->slice_width || (mode->hdisplay < dsc->slice_width)) { + pr_err("DSI: invalid slice width %d 
(pic_width: %d)\n", + dsc->slice_width, mode->hdisplay); + return; + } This is not the "use of MSM and DRM DSC helper methods" and thus should be moved to a separate patch. + dsc->pic_width = mode->hdisplay; dsc->pic_height = mode->vdisplay; DBG("Mode %dx%d\n", dsc->pic_width, dsc->pic_height); @@ -1759,7 +1766,7 @@ static int dsi_populate_dsc_params(struct msm_dsi_host *msm_host, struct drm_dsc return ret; } - dsc->initial_scale_value = 32; + drm_dsc_set_initial_scale_value(dsc); dsc->line_buf_depth = dsc->bits_per_component + 1; return drm_dsc_compute_rc_parameters(dsc); -- With best wishes Dmitry
Re: [PATCH v4 3/6] drm/msm/dpu: Fix slice_last_group_size calculation
On 05/04/2023 03:41, Jessica Zhang wrote: Correct the math for slice_last_group_size so that it matches the calculations downstream. Changes in v3: - Reworded slice_last_group_size calculation to `(dsc->slice_width + 2) % 3` Fixes: c110cfd1753e ("drm/msm/disp/dpu1: Add support for DSC") Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_dsc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
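For reference, the reworked v3 expression from the patch above, as a standalone helper. The interpretation in the comment (pixels in the last 3-pixel group, minus one) is my reading of the downstream math, not something the patch states:

```c
#include <assert.h>

/* slice_last_group_size, as reworded in v3 of the patch:
 * (slice_width + 2) % 3. This appears to encode the number of pixels
 * in the last 3-pixel group of a slice, minus one: a width that is a
 * multiple of 3 yields 2 (a full group of 3 pixels). */
static int slice_last_group_size(int slice_width)
{
    return (slice_width + 2) % 3;
}
```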
Re: [PATCH v4 1/6] drm/msm: Add MSM-specific DSC helper methods
On 05/04/2023 03:41, Jessica Zhang wrote: Introduce MSM-specific DSC helper methods, as some calculations are common between DP and DSC. Changes in v2: - Moved files up to msm/ directory - Dropped get_comp_ratio() helper - Used drm_int2fixp() to convert to integers to fp - Style changes to improve readability - Dropped unused bpp variable in msm_dsc_get_dce_bytes_per_line() - Changed msm_dsc_get_slice_per_intf() to a static inline method - Dropped last division step of msm_dsc_get_pclk_per_line() and changed method name accordingly - Changed DSC_BPP macro to drm_dsc_get_bpp_int() helper method - Fixed some math issues caused by passing in incorrect types to drm_fixed methods in get_bytes_per_soft_slice() Changes in v3: - Dropped src_bpp parameter from all methods -- src_bpp can be calculated as dsc->bits_per_component * 3 - Dropped intf_width parameter from get_bytes_per_soft_slice() - Moved dsc->bits_per_component to numerator calculation in get_bytes_per_soft_slice() - Renamed msm_dsc_get_uncompressed_pclk_per_line to *_get_uncompressed_pclk_per_intf() - Removed dsc->slice_width check from msm_dsc_get_uncompressed_pclk_per_intf() - Made get_bytes_per_soft_slice() a public method (this will be called later to help calculate DP pclk params) - Added documentation in comments - Moved extra_eol_bytes math out of msm_dsc_get_eol_byte_num() and renamed msm_dsc_get_eol_byte_num to *_get_bytes_per_intf. 
Changes in v4: - Changed msm_dsc_get_uncompressed_pclk_per_intf to msm_dsc_get_pclk_per_intf Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/Makefile | 1 + drivers/gpu/drm/msm/msm_dsc_helper.c | 47 drivers/gpu/drm/msm/msm_dsc_helper.h | 70 3 files changed, 118 insertions(+) diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile index 7274c41228ed..b814fc80e2d5 100644 --- a/drivers/gpu/drm/msm/Makefile +++ b/drivers/gpu/drm/msm/Makefile @@ -94,6 +94,7 @@ msm-y += \ msm_atomic_tracepoints.o \ msm_debugfs.o \ msm_drv.o \ + msm_dsc_helper.o \ msm_fb.o \ msm_fence.o \ msm_gem.o \ diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.c b/drivers/gpu/drm/msm/msm_dsc_helper.c new file mode 100644 index ..0539221eb09d --- /dev/null +++ b/drivers/gpu/drm/msm/msm_dsc_helper.c @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved + */ + +#include +#include +#include + +#include "msm_drv.h" +#include "msm_dsc_helper.h" + +s64 get_bytes_per_soft_slice(struct drm_dsc_config *dsc) +{ + int bpp = msm_dsc_get_bpp_int(dsc); + s64 numerator_fp, denominator_fp; + s64 comp_ratio_fp = drm_fixp_from_fraction(dsc->bits_per_component * 3, bpp); + + numerator_fp = drm_int2fixp(dsc->slice_width * 3 * dsc->bits_per_component); + denominator_fp = drm_fixp_mul(comp_ratio_fp, drm_int2fixp(8)); + + return drm_fixp_div(numerator_fp, denominator_fp); If we remove 3 * dsc->bits_per_components from both numerator and denominator, this whole function seems to be as simple as DIV_ROUND_UP(dsc->slice_width * bpp, 8) Or, if you prefer FP math, drm_fixp_from_fraction(dsc->slice_width * bpp, 8). 
+} + +u32 msm_dsc_get_bytes_per_intf(struct drm_dsc_config *dsc, int intf_width) +{ + u32 bytes_per_soft_slice, bytes_per_intf; + s64 bytes_per_soft_slice_fp; + int slice_per_intf = msm_dsc_get_slice_per_intf(dsc, intf_width); + + bytes_per_soft_slice_fp = get_bytes_per_soft_slice(dsc); + bytes_per_soft_slice = drm_fixp2int_ceil(bytes_per_soft_slice_fp); + + bytes_per_intf = bytes_per_soft_slice * slice_per_intf; + + return bytes_per_intf; +} + +int msm_dsc_get_pclk_per_intf(struct drm_dsc_config *dsc) +{ + s64 data_width; + + data_width = drm_fixp_mul(drm_int2fixp(dsc->slice_count), + get_bytes_per_soft_slice(dsc)); And this is then DIV_ROUND_UP(dsc->slice_width * dsc->slice_count * bpp, 8) + + return drm_fixp2int_ceil(data_width); +} diff --git a/drivers/gpu/drm/msm/msm_dsc_helper.h b/drivers/gpu/drm/msm/msm_dsc_helper.h new file mode 100644 index ..31116a31090f --- /dev/null +++ b/drivers/gpu/drm/msm/msm_dsc_helper.h @@ -0,0 +1,70 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved + */ + +#ifndef MSM_DSC_HELPER_H_ +#define MSM_DSC_HELPER_H_ + +#include +#include + +/* + * Helper methods for MSM specific DSC calculations that are common between timing engine, + * DSI, and DP. + */ + +/** + * msm_dsc_get_bpp_int - get bits per pixel integer value + * @dsc: Pointer to drm dsc config struct + */ +static inline int msm_dsc_get_bpp_int(struct drm_dsc_config *dsc) +{ + WARN_ON_ONCE(dsc->bits_per_pixel & 0xf); + return dsc->bits_per_pixel >> 4; +} + +/**
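Dmitry's simplification above can be checked directly: with comp_ratio = (3 * bits_per_component) / bpp, the fixed-point expression in get_bytes_per_soft_slice() is slice_width * 3 * bpc / (comp_ratio * 8), and the 3 * bits_per_component factor cancels, leaving a plain integer round-up. A sketch of the reduced form (not the proposed final helper, just the arithmetic):

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Reduced form of get_bytes_per_soft_slice(): after cancelling
 * 3 * bits_per_component from numerator and denominator, only
 * slice_width * bpp / 8 (rounded up, matching drm_fixp2int_ceil)
 * remains. bpp here is the integer bits-per-pixel value. */
static int bytes_per_soft_slice(int slice_width, int bpp)
{
    return DIV_ROUND_UP(slice_width * bpp, 8);
}

/* Likewise, msm_dsc_get_pclk_per_intf() reduces to the per-slice
 * bytes scaled by slice_count. */
static int pclk_per_intf(int slice_width, int slice_count, int bpp)
{
    return DIV_ROUND_UP(slice_width * slice_count * bpp, 8);
}
```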
Re: [PULL] drm-misc-fixes
On Wed, Apr 05, 2023 at 08:28:55PM +0200, Thomas Zimmermann wrote: > Hi Dave and Daniel, > > here's this week's PR for drm-misc-fixes. As requested, it comes > a day earlier than usual due to Easter holidays. > > Best regards > Thomas > > drm-misc-fixes-2023-04-05: > Short summary of fixes pull: > > * ivpu: DMA fence and suspend fixes > * nouveau: Color-depth fixes > * panfrost: Fix mmap error handling > The following changes since commit 25bbe844ef5c4fb4d7d8dcaa0080f922b7cd3a16: > > drm: test: Fix 32-bit issue in drm_buddy_test (2023-03-29 17:14:15 +0200) > > are available in the Git repository at: > > git://anongit.freedesktop.org/drm/drm-misc tags/drm-misc-fixes-2023-04-05 > > for you to fetch changes up to 0ec8671837a61d841462179686c5819d951d3b10: > > accel/ivpu: Fix S3 system suspend when not idle (2023-04-05 09:07:26 +0200) Pulled, thanks. > > > Short summary of fixes pull: > > * ivpu: DMA fence and suspend fixes > * nouveau: Color-depth fixes > * panfrost: Fix mmap error handling > > > Boris Brezillon (1): > drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path > > Jacek Lawrynowicz (1): > accel/ivpu: Fix S3 system suspend when not idle > > Karol Herbst (1): > drm/nouveau/disp: Support more modes by checking with lower bpc > > Karol Wachowski (1): > accel/ivpu: Add dma fence to command buffers only > > drivers/accel/ivpu/ivpu_job.c | 18 +++--- > drivers/accel/ivpu/ivpu_pm.c| 26 +++--- > drivers/gpu/drm/nouveau/dispnv50/disp.c | 32 > drivers/gpu/drm/nouveau/nouveau_dp.c| 8 +--- > drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + > 5 files changed, 56 insertions(+), 29 deletions(-) > > -- > Thomas Zimmermann > Graphics Driver Developer > SUSE Software Solutions Germany GmbH > Maxfeldstr. 5, 90409 Nürnberg, Germany > (HRB 36809, AG Nürnberg) > Geschäftsführer: Felix Imendörffer -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans
On Wed, Apr 05, 2023 at 12:06:53PM -0600, Zeng, Oak wrote:
> Hi,
>
> Using dma-fence for completion/dependency tracking for long-run workload (more
> precisely on-demand paging/page fault enabled workload) can cause deadlock.
> This seems the significant issue here. Other issues such as the drm scheduler
> completion order implication etc are minors which can be solve inside the
> framework of drm scheduler. We need to evaluate below paths:
>
> 	1) still use drm scheduler for job submission, and use dma-fence for
> job completion waiting/dependency tracking. This is solution proposed in this
> series. Annotate dma-fence for long-run workload: user can still wait
> dma-fence for job completion but can't wait dma-fence while holding any
> memory management locks. We still use dma-fence for dependency tracking. But
> it is just very easily run into deadlock when on-demand paging is in the
> picture. The annotation helps us to detect deadlock but not solve deadlock
> problems. Seems *not* a complete solution: It is almost impossible to
> completely avoid dependency deadlock in complex runtime environment
>

No one can wait on an LR fence, so it is impossible to deadlock. The
annotations enforce this. Literally this is only for flow-controlling the
ring / holding pending jobs in the DRM scheduler list.

> 	2) Still use drm scheduler but not use dma-fence for completion
> signaling and dependency tracking. This way we still get some free functions
> (reset, err handling, ring flow control as Matt said) from drm scheduler, but
> push the dependency/completion tracking completely to user space using
> techniques such as user space fence. User space doesn't have chance to wait
> fence while holding a kernel memory management lock, thus the dma-fence
> deadlock issue is solved.
>

We use user space fences for syncs.

> 	3) Completely discard drm scheduler and dma-fence for long-run
> workload. Use user queue/doorbell for super fast submission, directly
> interact with fw scheduler.
Use user fence for completion/dependency tracking. > This is a hard no from me, I want 1 submission path in Xe. Either we use the DRM scheduler or we don't. Matt > Thanks, > Oak > > > -Original Message- > > From: Christian König > > Sent: April 5, 2023 3:30 AM > > To: Brost, Matthew ; Zeng, Oak > > > > Cc: dri-devel@lists.freedesktop.org; intel...@lists.freedesktop.org; > > robdcl...@chromium.org; thomas.hellst...@linux.intel.com; airl...@linux.ie; > > l...@asahilina.net; boris.brezil...@collabora.com; > > faith.ekstr...@collabora.com > > Subject: Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload > > plans > > > > Am 04.04.23 um 20:08 schrieb Matthew Brost: > > > On Tue, Apr 04, 2023 at 12:02:03PM -0600, Zeng, Oak wrote: > > >> Hi Matt, Thomas, > > >> > > >> Some very bold out of box thinking in this area: > > >> > > >> 1. so you want to use drm scheduler and dma-fence for long running > > >> workload. > > Why you want to do this in the first place? What is the benefit? Drm > > scheduler is > > pretty much a software scheduler. Modern gpu has scheduler built at fw/hw > > level, as you said below for intel this is Guc. Can xe driver just directly > > submit job > > to Guc, bypassing drm scheduler? > > >> > > > If we did that now we have 2 paths for dependency track, flow controling > > > the ring, resets / error handling / backend submission implementations. > > > We don't want this. > > > > Well exactly that's the point: Why? > > > > As far as I can see that are two completely distinct use cases, so you > > absolutely do want two completely distinct implementations for this. > > > > >> 2. using dma-fence for long run workload: I am well aware that page > > >> fault (and > > the consequent memory allocation/lock acquiring to fix the fault) can cause > > deadlock for a dma-fence wait. But I am not convinced that dma-fence can't > > be > > used purely because the nature of the workload that it runs very long > > (indefinite). 
> > I did a math: the dma_fence_wait_timeout function's third param is the > > timeout > > which is a signed long type. If HZ is 1000, this is about 23 days. If 23 > > days is not long > > enough, can we just change the timeout parameter to signed 64 bits so it is > > much > > longer than our life time... > > >> > > >> So I mainly argue we can't use dma-fence for long-run workload is not > > because the workload runs very long, rather because of the fact that we use > > page fault for long-run workload. If we enable page fault for short-run > > workload, > > we can't use dma-fence either. Page fault is the key thing here. > > >> > > >> Now since we use page fault which is *fundamentally* controversial with > > dma-fence design, why now just introduce a independent concept such as user- > > fence instead of extending existing dma-fence? > > >> > > >> I like unified design. If drm scheduler, dma-fence can be extended to > > >> work for > >
[linux-next:master] BUILD REGRESSION 8417c8f5007bf4567ccffda850a3157c7d905f67
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: 8417c8f5007bf4567ccffda850a3157c7d905f67 Add linux-next specific files for 20230405 Error/Warning reports: https://lore.kernel.org/oe-kbuild-all/202303161521.jbgbafjj-...@intel.com Error/Warning: (recently discovered and may have been fixed) drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:351:13: warning: variable 'bw_needed' set but not used [-Wunused-but-set-variable] drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:352:25: warning: variable 'link' set but not used [-Wunused-but-set-variable] drivers/gpu/drm/tegra/fb.c:276:60: error: 'VM_MAP' undeclared (first use in this function); did you mean 'VM_MTE'? drivers/net/wireless/legacy/ray_cs.c:628:17: warning: 'strncpy' specified bound 32 equals destination size [-Wstringop-truncation] ld.lld: error: undefined symbol: find_kallsyms_symbol_value Unverified Error/Warning (likely false positive, please contact us if interested): drivers/acpi/property.c:985 acpi_data_prop_read_single() error: potentially dereferencing uninitialized 'obj'. include/linux/gpio/consumer.h: linux/err.h is included more than once. include/linux/gpio/driver.h: asm/bug.h is included more than once. 
io_uring/io_uring.c:432 io_prep_async_work() error: we previously assumed 'req->file' could be null (see line 425) io_uring/kbuf.c:221 __io_remove_buffers() warn: variable dereferenced before check 'bl->buf_ring' (see line 219) Error/Warning ids grouped by kconfigs: gcc_recent_errors |-- alpha-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used | `-- drivers-net-wireless-legacy-ray_cs.c:warning:strncpy-specified-bound-equals-destination-size |-- alpha-randconfig-r036-20230403 | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used |-- arc-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used |-- arm-allmodconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used |-- arm-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used |-- arm64-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used |-- i386-allyesconfig | |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used |-- ia64-allmodconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used | `-- drivers-net-wireless-legacy-ray_cs.c:warning:strncpy-specified-bound-equals-destination-size |-- loongarch-allmodconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used |-- loongarch-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used |-- loongarch-defconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used | `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used |-- mips-allmodconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used | |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used | `-- drivers-gpu-drm-tegra-fb.c:error:VM_MAP-undeclared-(first-use-in-this-function) |-- mips-allyesconfig | |-- drivers-gpu-drm-amd-amdgpu-..-display-d
Re: [PATCH 01/18] fbdev: Prepare generic architecture helpers
Hi

Am 05.04.23 um 17:53 schrieb Arnd Bergmann:
> On Wed, Apr 5, 2023, at 17:05, Thomas Zimmermann wrote:
>> Generic implementations of fb_pgprotect() and fb_is_primary_device() have
>> been in the source code for a long time. Prepare the header file to make
>> use of them.
>>
>> Improve the code by using an inline function for fb_pgprotect() and by
>> removing include statements. Symbols are protected by preprocessor guards.
>> Architectures that provide a symbol need to define a preprocessor token of
>> the same name and value. Otherwise the header file will provide a generic
>> implementation. This pattern has been taken from .
>>
>> Signed-off-by: Thomas Zimmermann
>
> Moving this into generic code is good, but I'm not sure about the default
> for fb_pgprotect():
>
>> +#ifndef fb_pgprotect
>> +#define fb_pgprotect fb_pgprotect
>> +static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
>> +				unsigned long off)
>> +{ }
>> +#endif
>
> I think most architectures will want the version we have on arc, arm,
> arm64, loongarch, and sh already:
>
> static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
> 				unsigned long off)
> {
> 	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> }
>
> so I'd suggest making that version the default, and treating the empty ones
> (m68knommu, sparc32) as architecture-specific workarounds.

Makes sense, thanks for the feedback. I'll send out an update soon.

Best regards
Thomas

> I see that sparc64 and parisc use pgprot_uncached here, but as they don't
> define a custom pgprot_writecombine, this ends up being the same, and they
> can use the above definition as well.
>
> mips defines pgprot_writecombine but uses pgprot_noncached in
> fb_pgprotect(), which is probably a mistake and should have been updated as
> part of commit 4b050ba7a66c ("MIPS: pgtable.h: Implement the
> pgprot_writecombine function for MIPS").
>
>     Arnd

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr.
5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev
Re: [RFC PATCH 02/10] drm/sched: Move schedule policy to scheduler / entity
On Wed, Apr 05, 2023 at 01:37:22PM -0400, Luben Tuikov wrote: > Hi, > > Inlined: > Thanks for the feedback. > On 2023-04-03 20:22, Matthew Brost wrote: > > Rather than a global modparam for scheduling policy, move the scheduling > > policy to scheduler / entity so user can control each scheduler / entity > > policy. > > > > Signed-off-by: Matthew Brost > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + > > drivers/gpu/drm/etnaviv/etnaviv_sched.c| 3 ++- > > drivers/gpu/drm/lima/lima_sched.c | 3 ++- > > drivers/gpu/drm/msm/msm_ringbuffer.c | 3 ++- > > drivers/gpu/drm/panfrost/panfrost_job.c| 3 ++- > > drivers/gpu/drm/scheduler/sched_entity.c | 25 ++ > > drivers/gpu/drm/scheduler/sched_main.c | 21 +- > > drivers/gpu/drm/v3d/v3d_sched.c| 15 - > > include/drm/gpu_scheduler.h| 23 ++-- > > 9 files changed, 73 insertions(+), 24 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > index 00c9c03c8f94..4df0fca5a74c 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > @@ -2368,6 +2368,7 @@ static int amdgpu_device_init_schedulers(struct > > amdgpu_device *adev) > >ring->num_hw_submission, > > amdgpu_job_hang_limit, > >timeout, adev->reset_domain->wq, > >ring->sched_score, ring->name, > > + DRM_SCHED_POLICY_DEFAULT, > >adev->dev); > > if (r) { > > DRM_ERROR("Failed to create scheduler on ring %s.\n", > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > > b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > > index 8486a2923f1b..61204a3f8b0b 100644 > > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > > @@ -136,7 +136,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu) > > ret = drm_sched_init(>sched, _sched_ops, NULL, > > etnaviv_hw_jobs_limit, etnaviv_job_hang_limit, > > msecs_to_jiffies(500), NULL, NULL, > > -dev_name(gpu->dev), gpu->dev); > > +dev_name(gpu->dev), 
DRM_SCHED_POLICY_DEFAULT, > > +gpu->dev); > > if (ret) > > return ret; > > > > diff --git a/drivers/gpu/drm/lima/lima_sched.c > > b/drivers/gpu/drm/lima/lima_sched.c > > index 54f53bece27c..33042ba6ae93 100644 > > --- a/drivers/gpu/drm/lima/lima_sched.c > > +++ b/drivers/gpu/drm/lima/lima_sched.c > > @@ -491,7 +491,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, > > const char *name) > > return drm_sched_init(>base, _sched_ops, NULL, 1, > > lima_job_hang_limit, > > msecs_to_jiffies(timeout), NULL, > > - NULL, name, pipe->ldev->dev); > > + NULL, name, DRM_SCHED_POLICY_DEFAULT, > > + pipe->ldev->dev); > > } > > > > void lima_sched_pipe_fini(struct lima_sched_pipe *pipe) > > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c > > b/drivers/gpu/drm/msm/msm_ringbuffer.c > > index 5879fc262047..f408a9097315 100644 > > --- a/drivers/gpu/drm/msm/msm_ringbuffer.c > > +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c > > @@ -97,7 +97,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu > > *gpu, int id, > > > > ret = drm_sched_init(>sched, _sched_ops, NULL, > > num_hw_submissions, 0, sched_timeout, > > - NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev); > > + NULL, NULL, to_msm_bo(ring->bo)->name, > > + DRM_SCHED_POLICY_DEFAULT, gpu->dev->dev); > > if (ret) { > > goto fail; > > } > > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c > > b/drivers/gpu/drm/panfrost/panfrost_job.c > > index f48b07056a16..effa48b33dce 100644 > > --- a/drivers/gpu/drm/panfrost/panfrost_job.c > > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c > > @@ -819,7 +819,8 @@ int panfrost_job_init(struct panfrost_device *pfdev) > > nentries, 0, > > msecs_to_jiffies(JOB_TIMEOUT_MS), > > pfdev->reset.wq, > > -NULL, "pan_js", pfdev->dev); > > +NULL, "pan_js", DRM_SCHED_POLICY_DEFAULT, > > +pfdev->dev); > > if (ret) { > > dev_err(pfdev->dev, "Failed to create scheduler: %d.", > > ret); > > goto err_sched; > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c > > 
b/drivers/gpu/drm/scheduler/sched_entity.c > > index
[PULL] drm-misc-fixes
Hi Dave and Daniel,

here's this week's PR for drm-misc-fixes. As requested, it comes
a day earlier than usual due to Easter holidays.

Best regards
Thomas

drm-misc-fixes-2023-04-05:
Short summary of fixes pull:

 * ivpu: DMA fence and suspend fixes
 * nouveau: Color-depth fixes
 * panfrost: Fix mmap error handling

The following changes since commit 25bbe844ef5c4fb4d7d8dcaa0080f922b7cd3a16:

  drm: test: Fix 32-bit issue in drm_buddy_test (2023-03-29 17:14:15 +0200)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm-misc tags/drm-misc-fixes-2023-04-05

for you to fetch changes up to 0ec8671837a61d841462179686c5819d951d3b10:

  accel/ivpu: Fix S3 system suspend when not idle (2023-04-05 09:07:26 +0200)

Short summary of fixes pull:

 * ivpu: DMA fence and suspend fixes
 * nouveau: Color-depth fixes
 * panfrost: Fix mmap error handling

Boris Brezillon (1):
      drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path

Jacek Lawrynowicz (1):
      accel/ivpu: Fix S3 system suspend when not idle

Karol Herbst (1):
      drm/nouveau/disp: Support more modes by checking with lower bpc

Karol Wachowski (1):
      accel/ivpu: Add dma fence to command buffers only

 drivers/accel/ivpu/ivpu_job.c           | 18 +++---
 drivers/accel/ivpu/ivpu_pm.c            | 26 +++---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 32
 drivers/gpu/drm/nouveau/nouveau_dp.c    |  8 +---
 drivers/gpu/drm/panfrost/panfrost_mmu.c |  1 +
 5 files changed, 56 insertions(+), 29 deletions(-)

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
Re: [PATCH] drm/amd/display: Fix potential null dereference
On 4/3/23 09:10, Igor Artemiev wrote:
> The adev->dm.dc pointer can be NULL and dereferenced in amdgpu_dm_fini()
> without checking.
>
> Add a NULL pointer check before calling dc_dmub_srv_destroy().
>
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>
> Fixes: 9a71c7d31734 ("drm/amd/display: Register DMUB service with DC")
> Signed-off-by: Igor Artemiev

Applied, thanks!

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index a01fd41643fc..27f7a554874e 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -1854,7 +1854,8 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
>  	dc_deinit_callbacks(adev->dm.dc);
>  #endif
>
> -	dc_dmub_srv_destroy(&adev->dm.dc->ctx->dmub_srv);
> +	if (adev->dm.dc)
> +		dc_dmub_srv_destroy(&adev->dm.dc->ctx->dmub_srv);
>
>  	if (dc_enable_dmub_notifications(adev->dm.dc)) {
>  		kfree(adev->dm.dmub_notify);

--
Hamza
Re: [PATCH] drm/scheduler: Fix UAF race in drm_sched_entity_push_job()
On 2023-04-05 12:37, Asahi Lina wrote: > After a job is pushed into the queue, it is owned by the scheduler core > and may be freed at any time, so we can't write nor read the submit > timestamp after that point. > > Fixes oopses observed with the drm/asahi driver, found with kASAN. > > Signed-off-by: Asahi Lina > --- That's a good fix! Thanks for fixing this. I'll push this through drm-misc-next and drm-misc-fixes, if there are no objections. Reviewed-by: Luben Tuikov Regards, Luben > drivers/gpu/drm/scheduler/sched_entity.c | 11 +-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c > b/drivers/gpu/drm/scheduler/sched_entity.c > index 15d04a0ec623..e0a8890a62e2 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -507,12 +507,19 @@ void drm_sched_entity_push_job(struct drm_sched_job > *sched_job) > { > struct drm_sched_entity *entity = sched_job->entity; > bool first; > + ktime_t submit_ts; > > trace_drm_sched_job(sched_job, entity); > atomic_inc(entity->rq->sched->score); > WRITE_ONCE(entity->last_user, current->group_leader); > + > + /* > + * After the sched_job is pushed into the entity queue, it may be > + * completed and freed up at any time. We can no longer access it. > + * Make sure to set the submit_ts first, to avoid a race. 
> + */ > + sched_job->submit_ts = submit_ts = ktime_get(); > first = spsc_queue_push(>job_queue, _job->queue_node); > - sched_job->submit_ts = ktime_get(); > > /* first job wakes up scheduler */ > if (first) { > @@ -529,7 +536,7 @@ void drm_sched_entity_push_job(struct drm_sched_job > *sched_job) > spin_unlock(>rq_lock); > > if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) > - drm_sched_rq_update_fifo(entity, sched_job->submit_ts); > + drm_sched_rq_update_fifo(entity, submit_ts); > > drm_sched_wakeup(entity->rq->sched); > } > > --- > base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6 > change-id: 20230406-scheduler-uaf-2-44cf8faed245 > > Thank you, > ~~ Lina >
Re: [PATCH] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name
On 2023-04-05 12:34, Asahi Lina wrote: > A signaled scheduler fence can outlive its scheduler, since fences are > independently reference counted. Therefore, we can't reference the > scheduler in the get_timeline_name() implementation. > > Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared > dma-bufs reference fences from GPU schedulers that no longer exist. > > Signed-off-by: Asahi Lina > --- Thanks for fixing this. If there's no objections, I'll add my tag and I'll push this tomorrow. I think this should go in through drm-misc-fixes and drm-misc-next. Reviewed-by: Luben Tuikov Regards, Luben > drivers/gpu/drm/scheduler/sched_entity.c | 7 ++- > drivers/gpu/drm/scheduler/sched_fence.c | 4 +++- > include/drm/gpu_scheduler.h | 5 + > 3 files changed, 14 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c > b/drivers/gpu/drm/scheduler/sched_entity.c > index 15d04a0ec623..8b3b949b2ce8 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -368,7 +368,12 @@ static bool drm_sched_entity_add_dependency_cb(struct > drm_sched_entity *entity) > > /* >* Fence is from the same scheduler, only need to wait for > - * it to be scheduled > + * it to be scheduled. > + * > + * Note: s_fence->sched could have been freed and reallocated > + * as another scheduler. This false positive case is okay, as if > + * the old scheduler was freed all of its jobs must have > + * signaled their completion fences. 
>*/ > fence = dma_fence_get(_fence->scheduled); > dma_fence_put(entity->dependency); > diff --git a/drivers/gpu/drm/scheduler/sched_fence.c > b/drivers/gpu/drm/scheduler/sched_fence.c > index 7fd869520ef2..33b145dfa38c 100644 > --- a/drivers/gpu/drm/scheduler/sched_fence.c > +++ b/drivers/gpu/drm/scheduler/sched_fence.c > @@ -66,7 +66,7 @@ static const char *drm_sched_fence_get_driver_name(struct > dma_fence *fence) > static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f) > { > struct drm_sched_fence *fence = to_drm_sched_fence(f); > - return (const char *)fence->sched->name; > + return (const char *)fence->sched_name; > } > > static void drm_sched_fence_free_rcu(struct rcu_head *rcu) > @@ -168,6 +168,8 @@ void drm_sched_fence_init(struct drm_sched_fence *fence, > unsigned seq; > > fence->sched = entity->rq->sched; > + strlcpy(fence->sched_name, entity->rq->sched->name, > + sizeof(fence->sched_name)); > seq = atomic_inc_return(>fence_seq); > dma_fence_init(>scheduled, _sched_fence_ops_scheduled, > >lock, entity->fence_context, seq); > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h > index 9db9e5e504ee..49f019731891 100644 > --- a/include/drm/gpu_scheduler.h > +++ b/include/drm/gpu_scheduler.h > @@ -295,6 +295,11 @@ struct drm_sched_fence { > * @lock: the lock used by the scheduled and the finished fences. > */ > spinlock_t lock; > +/** > + * @sched_name: the name of the scheduler that owns this fence. We > + * keep a copy here since fences can outlive their scheduler. > + */ > + char sched_name[16]; > /** > * @owner: job owner for debugging > */ > > --- > base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6 > change-id: 20230406-scheduler-uaf-1-994ec34cac93 > > Thank you, > ~~ Lina >
RE: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans
Hi, Using dma-fence for completion/dependency tracking for long-run workload(more precisely on-demand paging/page fault enabled workload) can cause deadlock. This seems the significant issue here. Other issues such as the drm scheduler completion order implication etc are minors which can be solve inside the framework of drm scheduler. We need to evaluate below paths: 1) still use drm scheduler for job submission, and use dma-fence for job completion waiting/dependency tracking. This is solution proposed in this series. Annotate dma-fence for long-run workload: user can still wait dma-fence for job completion but can't wait dma-fence while holding any memory management locks. We still use dma-fence for dependency tracking. But it is just very easily run into deadlock when on-demand paging is in the picture. The annotation helps us to detect deadlock but not solve deadlock problems. Seems *not* a complete solution: It is almost impossible to completely avoid dependency deadlock in complex runtime environment 2) Still use drm scheduler but not use dma-fence for completion signaling and dependency tracking. This way we still get some free functions (reset, err handling ring flow control as Matt said)from drm scheduler, but push the dependency/completion tracking completely to user space using techniques such as user space fence. User space doesn't have chance to wait fence while holding a kernel memory management lock, thus the dma-fence deadlock issue is solved. 3) Completely discard drm scheduler and dma-fence for long-run workload. Use user queue/doorbell for super fast submission, directly interact with fw scheduler. Use user fence for completion/dependency tracking. 
Thanks, Oak > -Original Message- > From: Christian König > Sent: April 5, 2023 3:30 AM > To: Brost, Matthew ; Zeng, Oak > > Cc: dri-devel@lists.freedesktop.org; intel...@lists.freedesktop.org; > robdcl...@chromium.org; thomas.hellst...@linux.intel.com; airl...@linux.ie; > l...@asahilina.net; boris.brezil...@collabora.com; > faith.ekstr...@collabora.com > Subject: Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload > plans > > Am 04.04.23 um 20:08 schrieb Matthew Brost: > > On Tue, Apr 04, 2023 at 12:02:03PM -0600, Zeng, Oak wrote: > >> Hi Matt, Thomas, > >> > >> Some very bold out of box thinking in this area: > >> > >> 1. so you want to use drm scheduler and dma-fence for long running > >> workload. > Why you want to do this in the first place? What is the benefit? Drm > scheduler is > pretty much a software scheduler. Modern gpu has scheduler built at fw/hw > level, as you said below for intel this is Guc. Can xe driver just directly > submit job > to Guc, bypassing drm scheduler? > >> > > If we did that now we have 2 paths for dependency track, flow controling > > the ring, resets / error handling / backend submission implementations. > > We don't want this. > > Well exactly that's the point: Why? > > As far as I can see that are two completely distinct use cases, so you > absolutely do want two completely distinct implementations for this. > > >> 2. using dma-fence for long run workload: I am well aware that page fault > >> (and > the consequent memory allocation/lock acquiring to fix the fault) can cause > deadlock for a dma-fence wait. But I am not convinced that dma-fence can't be > used purely because the nature of the workload that it runs very long > (indefinite). > I did a math: the dma_fence_wait_timeout function's third param is the timeout > which is a signed long type. If HZ is 1000, this is about 23 days. 
If 23 days > is not long > enough, can we just change the timeout parameter to signed 64 bits so it is > much > longer than our life time... > >> > >> So I mainly argue we can't use dma-fence for long-run workload is not > because the workload runs very long, rather because of the fact that we use > page fault for long-run workload. If we enable page fault for short-run > workload, > we can't use dma-fence either. Page fault is the key thing here. > >> > >> Now since we use page fault which is *fundamentally* controversial with > dma-fence design, why now just introduce a independent concept such as user- > fence instead of extending existing dma-fence? > >> > >> I like unified design. If drm scheduler, dma-fence can be extended to work > >> for > everything, it is beautiful. But seems we have some fundamental problem here. > >> > > Thomas's patches turn a dma-fence into KMD sync point (e.g. we just use > > the signal / CB infrastructure) and enforce we don't use use these > > dma-fences from the scheduler in memory reclaim paths or export these to > > user space or other drivers. Think of this mode as SW only fence. > > Yeah and I truly think this is an really bad idea. > > The signal/CB infrastructure in the dma_fence turned out to be the > absolutely nightmare I initially predicted. Sorry to say that, but in > this
Re: [Intel-gfx] [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers
On Wed, Apr 5, 2023 at 7:15 PM Daniel Vetter wrote: > > On Wed, 5 Apr 2023 at 18:54, Javier Martinez Canillas > wrote: > > > > Daniel Vetter writes: > > > > > On Wed, Apr 05, 2023 at 04:32:19PM +0200, Thomas Zimmermann wrote: > > > > [...] > > > > >> > > >/* > > >> > > > * WARNING: Apparently we must kick fbdev drivers before > > >> > > > vgacon, > > >> > > > * otherwise the vga fbdev driver falls over. > > >> > > > */ > > >> > > >ret = vga_remove_vgacon(pdev); > > >> > > > >> > This isn't enough, we also nuke stuff that's mapping the vga fb range. > > > > Ah, also need aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE) > > then. > > > > [...] > > > > >> int aperture_remove_legacy_vga_devices(struct pci_dev *pdev) > > >> { > > >> aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE); > > >> > > >> return vga_remove_vgacon(pdev); > > >> } > > >> > > >> And that can be called from gma500 and the pci aperture helper. > > > > > > But you still pass a pci_dev to that helper. Which just doesn't make any > > > sense to me (assuming your entire point is that this isn't just a normal > > > pci device but some special legacy vga thing), but if we go with (void) > > > then there's more refactoring to do because the vga_remove_vgacon also > > > wants a pdev. > > > > > > All so that we don't call aperture_detach_devices() on a bunch of pci > > > bars, which apparently is not problem for any other driver, but absolutely > > > is a huge problem for gma500 somehow. > > > > > > I don't understand why. > > > > > > > Yeah, agreed that if vga_remove_vgacon() isn't enough and another helper > > is needed then starts to get a little silly. Maybe one option is to add a > > 3rd param to aperture_remove_conflicting_pci_devices() and skip the logic > > to iterate over PCI bars and call aperture_remove_conflicting_devices() ? > > The thing I don't get: Why does this matter for gma500 and not any of > the other pci devices? 
Look at your gpu, realize there's a lot more > than the one pci bar for vram or stolen memory, realize that we're > nuking bars that cannot possibly contain the framebuffer for everyone > else too. Like the entire "gpus have a lot of bars" thing is the > reason why I pulled the sysfb_disable one level up, because we've been > doing that quite a few times before this patch (yes it's not the main > thing, but the side-effect cleanup is why I've gone down this rabbit > hole and wrote the entire series here): > > https://lore.kernel.org/dri-devel/20230404201842.567344-7-daniel.vet...@ffwll.ch/ > > But somehow for gma500 it's a problem, while for everyone else it's > fine. That's the part I don't get, or Thomas and I have been talking past > each other and there's another issue that I'm missing. > -Daniel I'm also getting confused here. AFAIK the stolen memory works the same for gma500 hardware as other Intel GPUs. Are you saying that there is a difference in how gma500 hardware works? I always assumed that i915 got away with not dealing much with stolen memory because it simply doesn't use it for allocations. In gma500 we use it for fbdev and cursors. The actual pages reserved by the bios can be accessed through a pci bar if you map it so (which IIRC we do) but I suppose that doesn't help identifying it as a range reserved by other drivers. The reason I've kept the stolen allocation logic is because some gma500 systems don't have a lot of memory. But that is mostly the old Poulsbo systems. Perhaps it is time to ditch the stolen allocation code? -Patrik > > > > Consider this me throwing in the towel. If you are convinced this > > > makes sense please type it up and merge it, but I'm not going to type > > > something that just doesn't make sense to me. > > > > Honestly, I would just go with the double drm_aperture_remove_*() helper > > calls (your original patch) unless that causes real issues. There is no > > point in blocking all your series just for this IMO. 
> > > > Then later, if Thomas has strong opinions, he can send a follow-up patch for > > the gma500 driver and the aperture helpers. > > > > > -Daniel > > > > > > > -- > > Best regards, > > > > Javier Martinez Canillas > > Core Platforms > > Red Hat > > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
Re: [Regression] drm/scheduler: track GPU active time per entity
Hi Luben, Am Dienstag, dem 04.04.2023 um 00:31 -0400 schrieb Luben Tuikov: > On 2023-03-28 04:54, Lucas Stach wrote: > > Hi Danilo, > > > > Am Dienstag, dem 28.03.2023 um 02:57 +0200 schrieb Danilo Krummrich: > > > Hi all, > > > > > > Commit df622729ddbf ("drm/scheduler: track GPU active time per entity") > > > tries to track the accumulated time that a job was active on the GPU > > > writing it to the entity through which the job was deployed to the > > > scheduler originally. This is done within drm_sched_get_cleanup_job() > > > which fetches a job from the schedulers pending_list. > > > > > > Doing this can result in a race condition where the entity is already > > > freed, but the entity's newly added elapsed_ns field is still accessed > > > once the job is fetched from the pending_list. > > > > > > After drm_sched_entity_destroy() being called it should be safe to free > > > the structure that embeds the entity. However, a job originally handed > > > over to the scheduler by this entity might still reside in the > > > schedulers pending_list for cleanup after drm_sched_entity_destroy() > > > already being called and the entity being freed. Hence, we can run into > > > a UAF. > > > > > Sorry about that, I clearly didn't properly consider this case. > > > > > In my case it happened that a job, as explained above, was just picked > > > from the schedulers pending_list after the entity was freed due to the > > > client application exiting. Meanwhile this freed up memory was already > > > allocated for a subsequent client applications job structure again. > > > Hence, the new jobs memory got corrupted. Luckily, I was able to > > > reproduce the same corruption over and over again by just using > > > deqp-runner to run a specific set of VK test cases in parallel. > > > > > > Fixing this issue doesn't seem to be very straightforward though (unless > > > I miss something), which is why I'm writing this mail instead of sending > > > a fix directly. 
> > > > > > Spontaneously, I see three options to fix it: > > > > > > 1. Rather than embedding the entity into driver specific structures > > > (e.g. tied to file_priv) we could allocate the entity separately and > > > reference count it, such that it's only freed up once all jobs that were > > > deployed through this entity are fetched from the schedulers pending list. > > > > > My vote is on this or something in a similar vein for the long term. I > > have some hope to be able to add a GPU scheduling algorithm with a bit > > more fairness than the current one sometime in the future, which > > requires execution time tracking on the entities. > > Danilo, > > Using kref is preferable, i.e. option 1 above. > > Lucas, can you shed some light on, > > 1. In what way the current FIFO scheduling is unfair, and > 2. shed some details on this "scheduling algorithm with a bit > more fairness than the current one"? I don't have a specific implementation in mind yet. However the current FIFO algorithm can be very unfair if you have a sparse workload compete with one that generates a lot of jobs without any throttling aside from the entity queue length. By tracking the actual GPU time consumed by the entities we could implement something with a bit more fairness like deficit round robin (don't pin me on the specific algorithm, as I haven't given it much thought yet). Regards, Lucas
Re: [Regression] drm/scheduler: track GPU active time per entity
On 4/5/23 18:09, Luben Tuikov wrote: On 2023-04-05 10:05, Danilo Krummrich wrote: On 4/4/23 06:31, Luben Tuikov wrote: On 2023-03-28 04:54, Lucas Stach wrote: Hi Danilo, Am Dienstag, dem 28.03.2023 um 02:57 +0200 schrieb Danilo Krummrich: Hi all, Commit df622729ddbf ("drm/scheduler: track GPU active time per entity") tries to track the accumulated time that a job was active on the GPU writing it to the entity through which the job was deployed to the scheduler originally. This is done within drm_sched_get_cleanup_job() which fetches a job from the schedulers pending_list. Doing this can result in a race condition where the entity is already freed, but the entity's newly added elapsed_ns field is still accessed once the job is fetched from the pending_list. After drm_sched_entity_destroy() being called it should be safe to free the structure that embeds the entity. However, a job originally handed over to the scheduler by this entity might still reside in the schedulers pending_list for cleanup after drm_sched_entity_destroy() already being called and the entity being freed. Hence, we can run into a UAF. Sorry about that, I clearly didn't properly consider this case. In my case it happened that a job, as explained above, was just picked from the schedulers pending_list after the entity was freed due to the client application exiting. Meanwhile this freed up memory was already allocated for a subsequent client applications job structure again. Hence, the new jobs memory got corrupted. Luckily, I was able to reproduce the same corruption over and over again by just using deqp-runner to run a specific set of VK test cases in parallel. Fixing this issue doesn't seem to be very straightforward though (unless I miss something), which is why I'm writing this mail instead of sending a fix directly. Spontaneously, I see three options to fix it: 1. Rather than embedding the entity into driver specific structures (e.g. 
tied to file_priv) we could allocate the entity separately and reference count it, such that it's only freed up once all jobs that were deployed through this entity are fetched from the schedulers pending list. My vote is on this or something in a similar vein for the long term. I have some hope to be able to add a GPU scheduling algorithm with a bit more fairness than the current one sometime in the future, which requires execution time tracking on the entities. Danilo, Using kref is preferable, i.e. option 1 above. I think the only real motivation for doing that would be for generically tracking job statistics within the entity a job was deployed through. If we all agree on tracking job statistics this way I am happy to prepare a patch for this option and drop this one: https://lore.kernel.org/all/20230331000622.4156-1-d...@redhat.com/T/#u Hmm, I never thought about "job statistics" when I preferred using kref above. The reason kref is attractive is because one doesn't need to worry about it--when the last user drops the kref, the release is called to do housekeeping. If this never happens, we know that we have a bug to debug. I tied it to the use case of (accumulated) job statistics since I think using kref might only make sense if we have a reason why a job needs to access the entity it was deployed through. And the only real reason to keep this connection seems to be (accumulated) job statistics. If it is only about allowing drivers to derive a driver specific entity structure from a job I think it might depend on whether this is a "typical application" nearly every driver does (which it seems not to be) or if it is an exception. In the latter case the driver could still store the corresponding pointers in its driver specific structures and take care of not freeing the structure the entity is embedded in until it is safe to do so. 
Since it was already tried twice to generically get (accumulated) job statistics into entities personally I think doing so plus using kref would be quite nice. However, I'd like to be sure this fits for everyone's use cases. Regarding the patch above--I did look around the code, and it seems safe, as per your analysis, I didn't see any reference to entity after job submission, but I'll comment on that thread as well for the record. Regards, Luben Christian mentioned amdgpu tried something similar to what Lucas tried running into similar trouble, backed off and implemented it in another way - a driver specific way I guess? Lucas, can you shed some light on, 1. In what way the current FIFO scheduling is unfair, and 2. shed some details on this "scheduling algorithm with a bit more fairness than the current one"? Regards, Luben 2. Somehow make sure drm_sched_entity_destroy() does block until all jobs deployed through this entity were fetched from the schedulers pending list. Though, I'm pretty sure that this is not really desirable. 3. Just revert the change and let drivers implement tracking of GPU active times
Re: [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers
Daniel Vetter writes: > On Wed, 5 Apr 2023 at 18:54, Javier Martinez Canillas > wrote: >> >> Daniel Vetter writes: [...] >> >> Yeah, agreed that if vga_remove_vgacon() isn't enough and another helper >> is needed then it starts to get a little silly. Maybe one option is to add a >> 3rd param to aperture_remove_conflicting_pci_devices() and skip the logic >> to iterate over PCI bars and call aperture_remove_conflicting_devices() ? > > The thing I don't get: Why does this matter for gma500 and not any of > the other pci devices? Look at your gpu, realize there's a lot more Yes, I don't know why gma500 is special in that sense but I'm not familiar enough with that hardware to have an opinion on this. -- Best regards, Javier Martinez Canillas Core Platforms Red Hat
Re: [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var
Daniel Vetter writes: > On Wed, Apr 05, 2023 at 06:27:17PM +0200, Javier Martinez Canillas wrote: >> Daniel Vetter writes: [...] >> > >> > The __fill_var is after this. I'm honestly not sure what the exact >> >> Ah, your patch adds it after that indeed. Please ignore my comment then. > > So rb: you? > Yes, I already provided it in my previous email and it has been picked up by patchwork. I could do it again but it probably will confuse dim when applying. The only patch from your series that is missing an {r,a}b is #1 right now: https://patchwork.kernel.org/project/dri-devel/list/?series=736966=both [...] >> > What I'm wondering now is whether too small x/yres won't lead to problems >> > of some sorts ... For multi-screen we set the virtual size to be big >> > enough for all crtc, and then just set x/yres to be the smallest output. >> > That way fbcon knows to only draw as much as is visible on all screens. >> > But if you then pan that too much, the bigger screens might not have a big >> > enough buffer anymore and things fail (but shouldn't). >> > >> > Not sure how to fix that tbh. >> >> Would this be a problem in practice? > > I'm frankly not sure. You'd get a black screen for fbcon/fbdev across all > outputs, but only if you have userspace doing this intentionally. > > In a way it's just another artifact of the drm fbdev emulation not using > ATOMIC_TEST_ONLY in the various places where it should, and so doesn't > really know whether a configuration change will work out. > > We already have this in obscure multi-monitor cases where adding another > screen kills fbcon, because the display hw is running out of fifo or > clocks or whatever, and because the drm fbdev code doesn't check but just > blindly commits the entire thing as an atomic commit, the overall commit > fails. > > This worked "better" with legacy kms because there we commit per-crtc, so > if any specific crtc runs into a limit check, only that one fails to light > up. 
> > Imo given that no one cared enough yet to write up atomic TEST_ONLY > support for fbdev emulation I think we can continue to just ignore this > problem. > Agreed. If that ends up being a problem for people in practice then I guess someone can type atomic TEST_ONLY support for the fbdev emulation layer. > What should not happen is that fbcon code blows up drawing out of bounds > or something like that, resulting in a kernel crash. So from that pov I > think it's "safe" :-) Great. Thanks a lot for your explanations. > -Daniel -- Best regards, Javier Martinez Canillas Core Platforms Red Hat
Re: [PATCH] drm/scheduler: set entity to NULL in drm_sched_entity_pop_job()
On 2023-03-31 01:59, Christian König wrote: > Am 31.03.23 um 02:06 schrieb Danilo Krummrich: >> It already happened a few times that patches slipped through which >> implemented access to an entity through a job that was already removed >> from the entities queue. Since jobs and entities might have different >> lifecycles, this can potentially cause UAF bugs. >> >> In order to make it obvious that a jobs entity pointer shouldn't be >> accessed after drm_sched_entity_pop_job() was called successfully, set >> the jobs entity pointer to NULL once the job is removed from the entity >> queue. >> >> Moreover, debugging a potential NULL pointer dereference is way easier >> than potentially corrupted memory through a UAF. >> >> Signed-off-by: Danilo Krummrich > > In general "YES PLEASE!", but I fear that this will break amdgpus reset > sequence. > > On the other hand when amdgpu still relies on that pointer it's clearly > a bug (which I pointed out tons of times before). > > Luben, any opinion on that? Could you drive cleaning that up as well? I didn't find any references to scheduling entity after the job is submitted to the hardware. (I commented the same in the other thread, we just need to decide which way to go.) Regards, Luben > > Thanks, > Christian. > >> --- >> I'm aware that drivers could already use job->entity in arbitrary places, >> since >> they are in control of when the entity is actually freed. A quick grep didn't >> give >> me any results where this would actually be the case, however maybe I also >> just >> didn't catch it. >> >> If, therefore, we don't want to set job->entity to NULL I think we should at >> least add a comment somewhere. 
>> --- >> >> drivers/gpu/drm/scheduler/sched_entity.c | 6 ++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c >> b/drivers/gpu/drm/scheduler/sched_entity.c >> index 15d04a0ec623..a9c6118e534b 100644 >> --- a/drivers/gpu/drm/scheduler/sched_entity.c >> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >> @@ -448,6 +448,12 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct >> drm_sched_entity *entity) >> drm_sched_rq_update_fifo(entity, next->submit_ts); >> } >> >> +/* Jobs and entities might have different lifecycles. Since we're >> + * removing the job from the entities queue, set the jobs entity pointer >> + * to NULL to prevent any future access of the entity through this job. >> + */ >> +sched_job->entity = NULL; >> + >> return sched_job; >> } >> >
Re: [RFC PATCH 02/10] drm/sched: Move schedule policy to scheduler / entity
Hi, Inlined: On 2023-04-03 20:22, Matthew Brost wrote: > Rather than a global modparam for scheduling policy, move the scheduling > policy to scheduler / entity so user can control each scheduler / entity > policy. > > Signed-off-by: Matthew Brost > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + > drivers/gpu/drm/etnaviv/etnaviv_sched.c | 3 ++- > drivers/gpu/drm/lima/lima_sched.c | 3 ++- > drivers/gpu/drm/msm/msm_ringbuffer.c | 3 ++- > drivers/gpu/drm/panfrost/panfrost_job.c | 3 ++- > drivers/gpu/drm/scheduler/sched_entity.c | 25 ++ > drivers/gpu/drm/scheduler/sched_main.c | 21 +- > drivers/gpu/drm/v3d/v3d_sched.c | 15 - > include/drm/gpu_scheduler.h | 23 ++-- > 9 files changed, 73 insertions(+), 24 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 00c9c03c8f94..4df0fca5a74c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -2368,6 +2368,7 @@ static int amdgpu_device_init_schedulers(struct > amdgpu_device *adev) > ring->num_hw_submission, > amdgpu_job_hang_limit, > timeout, adev->reset_domain->wq, > ring->sched_score, ring->name, > + DRM_SCHED_POLICY_DEFAULT, > adev->dev); > if (r) { > DRM_ERROR("Failed to create scheduler on ring %s.\n", > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > index 8486a2923f1b..61204a3f8b0b 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > @@ -136,7 +136,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu) > ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL, > etnaviv_hw_jobs_limit, etnaviv_job_hang_limit, > msecs_to_jiffies(500), NULL, NULL, > - dev_name(gpu->dev), gpu->dev); > + dev_name(gpu->dev), DRM_SCHED_POLICY_DEFAULT, > + gpu->dev); > if (ret) > return ret; > > diff --git a/drivers/gpu/drm/lima/lima_sched.c > b/drivers/gpu/drm/lima/lima_sched.c > index 54f53bece27c..33042ba6ae93 100644 > --- a/drivers/gpu/drm/lima/lima_sched.c > +++ b/drivers/gpu/drm/lima/lima_sched.c > @@ -491,7 +491,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, > const char *name) > return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1, > lima_job_hang_limit, > msecs_to_jiffies(timeout), NULL, > - NULL, name, pipe->ldev->dev); > + NULL, name, DRM_SCHED_POLICY_DEFAULT, > + pipe->ldev->dev); > } > > void lima_sched_pipe_fini(struct lima_sched_pipe *pipe) > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c > b/drivers/gpu/drm/msm/msm_ringbuffer.c > index 5879fc262047..f408a9097315 100644 > --- a/drivers/gpu/drm/msm/msm_ringbuffer.c > +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c > @@ -97,7 +97,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu > *gpu, int id, > > ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL, > num_hw_submissions, 0, sched_timeout, > - NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev); > + NULL, NULL, to_msm_bo(ring->bo)->name, > + DRM_SCHED_POLICY_DEFAULT, gpu->dev->dev); > if (ret) { > goto fail; > } > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c > b/drivers/gpu/drm/panfrost/panfrost_job.c > index f48b07056a16..effa48b33dce 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_job.c > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c > @@ -819,7 +819,8 @@ int panfrost_job_init(struct panfrost_device *pfdev) > nentries, 0, > msecs_to_jiffies(JOB_TIMEOUT_MS), > pfdev->reset.wq, > - NULL, "pan_js", pfdev->dev); > + NULL, "pan_js", DRM_SCHED_POLICY_DEFAULT, > + pfdev->dev); > if (ret) { > dev_err(pfdev->dev, "Failed to create scheduler: %d.", > ret); > goto err_sched; > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c > b/drivers/gpu/drm/scheduler/sched_entity.c > index 15d04a0ec623..f1299e51860b 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -33,6 +33,20 @@ > #define to_drm_sched_job(sched_job) \ > container_of((sched_job),
Re: [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var
On Wed, Apr 05, 2023 at 06:27:17PM +0200, Javier Martinez Canillas wrote: > Daniel Vetter writes: > > [...] > > >> > >> but only the 'var->xres > fb->width || var->yres > fb->height' from the > >> conditions checked could be false after your __fill_var() call above. > >> > >> You should drop the 'var->bits_per_pixel > bpp', 'var->xres_virtual > > >> fb->width' and 'var->yres_virtual > fb->height' checks I believe since > >> those will always be true. > > > > The __fill_var is after this. I'm honestly not sure what the exact > > Ah, your patch adds it after that indeed. Please ignore my comment then. So rb: you? > > semantics are supposed to be, but essentially if userspace asks for too > > big virtual size, we reject it. And for anything else we then tell it > > (with __fill_var) how big the actually available space is. > > > > What I'm wondering now is whether too small x/yres won't lead to problems > > of some sorts ... For multi-screen we set the virtual size to be big > > enough for all crtc, and then just set x/yres to be the smallest output. > > That way fbcon knows to only draw as much as is visible on all screens. > > But if you then pan that too much, the bigger screens might not have a big > > enough buffer anymore and things fail (but shouldn't). > > > > Not sure how to fix that tbh. > > Would this be a problem in practice? I'm frankly not sure. You'd get a black screen for fbcon/fbdev across all outputs, but only if you have userspace doing this intentionally. In a way it's just another artifact of the drm fbdev emulation not using ATOMIC_TEST_ONLY in the various places where it should, and so doesn't really know whether a configuration change will work out. We already have this in obscure multi-monitor cases where adding another screen kills fbcon, because the display hw is running out of fifo or clocks or whatever, and because the drm fbdev code doesn't check but just blindly commits the entire thing as an atomic commit, the overall commit fails. 
This worked "better" with legacy kms because there we commit per-crtc, so if any specific crtc runs into a limit check, only that one fails to light up. Imo given that no one cared enough yet to write up atomic TEST_ONLY support for fbdev emulation I think we can continue to just ignore this problem. What should not happen is that fbcon code blows up drawing out of bounds or something like that, resulting in a kernel crash. So from that pov I think it's "safe" :-) -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers
On Wed, 5 Apr 2023 at 18:54, Javier Martinez Canillas wrote: > > Daniel Vetter writes: > > > On Wed, Apr 05, 2023 at 04:32:19PM +0200, Thomas Zimmermann wrote: > > [...] > > >> > > > /* > >> > > > * WARNING: Apparently we must kick fbdev drivers before > >> > > > vgacon, > >> > > > * otherwise the vga fbdev driver falls over. > >> > > > */ > >> > > > ret = vga_remove_vgacon(pdev); > >> > > >> > This isn't enough, we also nuke stuff that's mapping the vga fb range. > > Ah, also need aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE) > then. > > [...] > > >> int aperture_remove_legacy_vga_devices(struct pci_dev *pdev) > >> { > >> aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE); > >> > >> return vga_remove_vgacon(pdev); > >> } > >> > >> And that can be called from gma500 and the pci aperture helper. > > > > But you still pass a pci_dev to that helper. Which just doesn't make any > > sense to me (assuming your entire point is that this isn't just a normal > > pci device but some special legacy vga thing), but if we go with (void) > > then there's more refactoring to do because the vga_remove_vgacon also > > wants a pdev. > > > > All so that we don't call aperture_detach_devices() on a bunch of pci > > bars, which apparently is not a problem for any other driver, but absolutely > > is a huge problem for gma500 somehow. > > > > I don't understand why. > > > > Yeah, agreed that if vga_remove_vgacon() isn't enough and another helper > is needed then it starts to get a little silly. Maybe one option is to add a > 3rd param to aperture_remove_conflicting_pci_devices() and skip the logic > to iterate over PCI bars and call aperture_remove_conflicting_devices() ? The thing I don't get: Why does this matter for gma500 and not any of the other pci devices? Look at your gpu, realize there's a lot more than the one pci bar for vram or stolen memory, realize that we're nuking bars that cannot possibly contain the framebuffer for everyone else too. 
Like the entire "gpus have a lot of bars" thing is the reason why I pulled the sysfb_disable one level up, because we've been doing that quite a few times before this patch (yes it's not the main thing, but the side-effect cleanup is why I've gone down this rabbit hole and wrote the entire series here): https://lore.kernel.org/dri-devel/20230404201842.567344-7-daniel.vet...@ffwll.ch/ But somehow for gma500 it's a problem, while for everyone else it's fine. That's the part I don't get, or Thomas and I have been talking past each other and there's another issue that I'm missing. -Daniel > > Consider this me throwing in the towel. If you are convinced this > > makes sense please type it up and merge it, but I'm not going to type > > something that just doesn't make sense to me. > > Honestly, I would just go with the double drm_aperture_remove_*() helper > calls (your original patch) unless that causes real issues. There is no > point in blocking all your series just for this IMO. > > Then later, if Thomas has strong opinions, he can send a follow-up patch for > the gma500 driver and the aperture helpers. > > > -Daniel > > > > -- > Best regards, > > Javier Martinez Canillas > Core Platforms > Red Hat > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
[PATCH] drm/nouveau: Add a jump label in nouveau_gem_ioctl_pushbuf()
Date: Wed, 5 Apr 2023 18:38:54 +0200 The label “out_prevalid” was used to jump to another pointer check, despite the fact that in this case the corresponding variable was already known to contain an error pointer because a call of the function “u_memcpya” had failed. Thus use an additional label. This issue was detected by using the Coccinelle software. Fixes: 2be65641642ef423f82162c3a5f28c754d1637d2 ("drm/nouveau: fix relocations applying logic and a double-free") Signed-off-by: Markus Elfring --- drivers/gpu/drm/nouveau/nouveau_gem.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index f77e44958037..d87e1cb2c933 100644 --- a/drivers/gpu/drm/nouveau/nouveau_gem.c +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c @@ -814,7 +814,7 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, reloc = u_memcpya(req->relocs, req->nr_relocs, sizeof(*reloc)); if (IS_ERR(reloc)) { ret = PTR_ERR(reloc); - goto out_prevalid; + goto out_free_bo; } goto revalidate; @@ -929,6 +929,7 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data, out_prevalid: if (!IS_ERR(reloc)) u_free(reloc); +out_free_bo: u_free(bo); u_free(push); -- 2.40.0
Re: [PATCH RFC 02/18] rust: drm: Add Device and Driver abstractions
On Tue, Mar 07, 2023 at 11:25:27PM +0900, Asahi Lina wrote: > Add the initial abstractions for DRM drivers and devices. These go > together in one commit since they are fairly tightly coupled types. > > A few things have been stubbed out, to be implemented as further bits of > the DRM subsystem are introduced. > > Signed-off-by: Asahi Lina Ok so this is fairly fundamental lifetime fun and might be fairly orthogonal to most of the things you actually want to do with a drm driver (like implement gem or whatever). So separate mail. So upfront short intro. There's 3 different lifetimes involved in building a drm driver: - struct drm_device. It's refcounted because it's fundamentally an uapi interface thing, and all the various uapi interfaces that build on top of this (drm_file, dma_buf, ...) need to hold references on it. It's supposed to survive for as long as userspace needs it or the underlying driver is bound, whichever is longer. - struct device. Refcounted and good for dmesg printing, nothing else. Yes, because ... - ... the actual hardware resource, in many places also represented by struct device. Not refcounted, instead it's limited by hotunplug or more precisely, how long your driver is bound to the struct device. You could make a case that in C this is represented by the bus specific type (e.g. platform_device), and the bus-specific hooks delineate the lifetime (for platform devices that's from ->probe to ->remove). Since there's no C type for this I'll call this hwdevice. I think for rust it would be good if we model this a bit more precisely. It might be possible to use the bus-specific types as the hwdevice, but that's not entirely right either because each bus device is both representing the hwdevice and the refcounted struct device. Now onto lifetimes, or at least how this is usually handled. 
- struct device should be obvious, the important part really is that the rust wrappers should not allow anything to be done with that which is tied to the hwdevice lifetime. Which is almost everything you want to do with a struct (platform_)device (aside from pure sw stuff like dmesg printing). - the hwdevice is _not_ refcounted. I think in rust this maps to borrow semantics, to make sure that the reference only stays valid during a driver callback. The driver core/bus driver ensure that all the various callbacks (pm_ops, platform_driver, ...) finish before the ->remove callback starts. - usually the link from hwdevice to drm_device is done with a refcounted drm_device stored with dev_set_drvdata. For rust it'd be nice if that's the Driver and fully typesafe and automatically cleaned up. - which brings us to how hwdevice cleanup works in C: That's done with all the devm_ helpers for practically anything you might want to set up for hw access: mappings, interrupts, ... Note that there's also devm_*malloc functions; when drivers use them that's almost always a bug because generally allocations should stick around with the drm_device and not go down with the non-refcounted hwdevice lifetime. For added fun the bus/driver core also uses devm_ to manage things tied to the refcounted struct device, which works because devm_ nests and ->probe opens up a new devm_ bucket which is torn down at ->remove time. But luckily drivers should never deal with that, so for them (on the C side at least) devm_ is the right lifetime model for things tied to the hwdevice lifetime. For rust this means that we really should try to tie all the hw related things into devm or equivalent, and both make sure it's automatically cleaned up at that point, but also no later (because if you clean up hw stuff after ->remove you have a driver bug). - Similarly on the drm_device side we have drmm_. 
You can have some refcounted things within the lifetime of drm_device (like dma_buf), but if things the driver creates survive past the point of drm_device, then probably something went really wrong. Either a leak or you'll blow up in the C code. So again for rust I think we should try to model this, and make sure (with borrow semantics and avoiding full refcounting like the plague in driver code) that driver structs and other sw things can't outlive the drm_device, but also don't hold it alive unduly. - Since the point of a drm_device is to drive hardware, you need to be able to safely dereference the drm_device->dev pointer and know whether it's still a hwdevice (i.e. useful) or just a struct device because the hw is gone. That's done with drm_dev_enter/exit and making sure that ->remove calls drm_dev_unplug as the very first thing, before it starts tearing down hw resources like mappings, interrupts, ... On the C side we entirely rely on review for this, and it just doesn't work. Unless exhaustively tested, hotunplug just dies, and I think for more complex drivers this is something where Rust type enforcement
Re: [PATCH v10 2/2] drm: add kms driver for loongson display controller
On 2023/4/6 00:40, Sui Jingfeng wrote:

Hi,

On 2023/4/4 22:10, Emil Velikov wrote:

Greetings Sui Jingfeng,

I haven't been around drm-land for a while and this is the first driver I skim through in a few years. So take the following suggestions with a healthy pinch of salt. Hope that helps o/

Emil, we love your reviews,

On Mon, 3 Apr 2023 at 18:13, Sui Jingfeng wrote:

v7 -> v8:
1) Zero compile warnings on 32-bit platforms, compile with W=1
2) Revise lsdc_bo_gpu_offset() and minor cleanup
3) Pageflip tested on the virtual terminal with the following commands:
   modetest -M loongson -s 32:1920x1080 -v
   modetest -M loongson -s 34:1920x1080 -v -F tiles

I could be wrong, but my understanding is that new drivers should be capable of running under Xorg and/or a Wayland compositor. There is also the IGT test suite, which can help verify and validate the driver's behaviour: https://www.kernel.org/doc/html/latest/gpu/drm-uapi.html

Yet it may take more time to give a full answer to all of your concerns. Currently, the drm/loongson driver does work under Xorg (X server); link [1] is a short video which shows that the driver actually works very well. Note that it uses the generic modesetting driver in userspace. We could provide more videos if necessary.

We are carrying on with the IGT test suite and will feed back the results once we finish the IGT tests on our platform; thanks for providing such valuable information.

[1] https://raw.githubusercontent.com/loongson-gfx/loongson_boards/main/videos/drm_loongson_under_xserver.mp4

I'm not familiar with it; previously we only focused on the basic unit tests that came with libdrm. I will answer the rest of the questions later; please wait a moment.
+static void lsdc_crtc_reset(struct drm_crtc *crtc)
+{
+	struct lsdc_display_pipe *dispipe = crtc_to_display_pipe(crtc);
+	struct drm_device *ddev = crtc->dev;
+	struct lsdc_device *ldev = to_lsdc(ddev);
+	struct lsdc_crtc_state *priv_crtc_state;
+	unsigned int index = dispipe->index;
+	u32 val;
+
+	val = LSDC_PF_XRGB | CFG_RESET_N;
+	if (ldev->descp->chip == CHIP_LS7A2000)
+		val |= LSDC_DMA_STEP_64_BYTES;
+
+	lsdc_crtc_wreg32(ldev, LSDC_CRTC0_CFG_REG, index, val);
+
+	if (ldev->descp->chip == CHIP_LS7A2000) {
+		val = PHY_CLOCK_EN | PHY_DATA_EN;
+		lsdc_crtc_wreg32(ldev, LSDC_CRTC0_PANEL_CONF_REG, index, val);
+	}
+

AFAICT no other driver touches the HW in their reset callback. Should the above be moved to another callback?

+static void lsdc_crtc_atomic_enable(struct drm_crtc *crtc,
+				    struct drm_atomic_state *state)
+{
+	val = lsdc_crtc_rreg32(ldev, LSDC_CRTC0_CFG_REG, index);
+	/* clear old dma step settings */
+	val &= ~CFG_DMA_STEP_MASK;
+
+	if (descp->chip == CHIP_LS7A2000) {
+		/*
+		 * Using large dma step as much as possible,
+		 * for improve hardware DMA efficiency.
+		 */
+		if (width_in_bytes % 256 == 0)
+			val |= LSDC_DMA_STEP_256_BYTES;
+		else if (width_in_bytes % 128 == 0)
+			val |= LSDC_DMA_STEP_128_BYTES;
+		else if (width_in_bytes % 64 == 0)
+			val |= LSDC_DMA_STEP_64_BYTES;
+		else /* width_in_bytes % 32 == 0 */
+			val |= LSDC_DMA_STEP_32_BYTES;
+	}
+
+	clk_func->update(pixpll, _state->pparms);
+

Without knowing the hardware, the clk_func abstraction seems quite arbitrary and unnecessary. It should be introduced when there is a use-case for it.

+	lsdc_crtc_wreg32(ldev, LSDC_CRTC0_CFG_REG, index, val | CFG_OUTPUT_EN);
+
+	drm_crtc_vblank_on(crtc);
+}
+

--- /dev/null
+++ b/drivers/gpu/drm/loongson/lsdc_debugfs.c

+void lsdc_debugfs_init(struct drm_minor *minor)
+{
+#ifdef CONFIG_DEBUG_FS
+	drm_debugfs_create_files(lsdc_debugfs_list,
+				 ARRAY_SIZE(lsdc_debugfs_list),
+				 minor->debugfs_root,
+				 minor);
+#endif
+}

Should probably build the file when debugfs is enabled and provide no-op stub in the header. See nouveau for an example.

--- /dev/null
+++ b/drivers/gpu/drm/loongson/lsdc_drv.c

+static const struct lsdc_desc dc_in_ls7a1000 = {
+	.chip = CHIP_LS7A1000,
+	.num_of_crtc = LSDC_NUM_CRTC,
+	.max_pixel_clk = 20,
+	.max_width = 2048,
+	.max_height = 2048,
+	.num_of_hw_cursor = 1,
+	.hw_cursor_w = 32,
+	.hw_cursor_h = 32,
+	.pitch_align = 256,
+	.mc_bits = 40,
+	.has_vblank_counter = false,
+	.has_scan_pos = true,
+	.has_builtin_i2c = true,
+	.has_vram = true,
+	.has_hpd_reg = false,
+	.is_soc = false,
+};
+
+static const struct lsdc_desc dc_in_ls7a2000 = {
+
Re: [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers
Daniel Vetter writes:

> On Wed, Apr 05, 2023 at 04:32:19PM +0200, Thomas Zimmermann wrote:

[...]

>> > > >	/*
>> > > >	 * WARNING: Apparently we must kick fbdev drivers before vgacon,
>> > > >	 * otherwise the vga fbdev driver falls over.
>> > > >	 */
>> > > >	ret = vga_remove_vgacon(pdev);
>> >
>> > This isn't enough, we also nuke stuff that's mapping the vga fb range.

Ah, also need aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE) then.

[...]

>> int aperture_remove_legacy_vga_devices(struct pci_dev *pdev)
>> {
>>	aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE);
>>
>>	return vga_remove_vgacon(pdev);
>> }
>>
>> And that can be called from gma500 and the pci aperture helper.
>
> But you still pass a pci_dev to that helper. Which just doesn't make any
> sense to me (assuming your entire point is that this isn't just a normal
> pci device but some special legacy vga thing), but if we go with (void)
> then there's more refactoring to do because the vga_remove_vgacon also
> wants a pdev.
>
> All so that we don't call aperture_detach_devices() on a bunch of pci
> bars, which apparently is not a problem for any other driver, but
> absolutely is a huge problem for gma500 somehow.
>
> I don't understand why.

Yeah, agreed that if vga_remove_vgacon() isn't enough and another helper is needed then it starts to get a little silly.

Maybe one option is to add a 3rd param to aperture_remove_conflicting_pci_devices() and skip the logic to iterate over PCI bars and call aperture_remove_conflicting_devices()?

> Consider this me throwing in the towel. If you are convinced this
> makes sense please type it up and merge it, but I'm not going to type
> something that just doesn't make sense to me.

Honestly, I would just go with the double drm_aperture_remove_*() helper calls (your original patch) unless that causes real issues. There is no point in blocking all your series just for this, IMO.
Then later, if Thomas has strong opinions, he can send a follow-up patch for the gma500 driver and the aperture helpers.

> -Daniel

--
Best regards,
Javier Martinez Canillas
Core Platforms
Red Hat
Re: [PATCH 01/19] drm/i915/i915_scatterlist: Fix kerneldoc formatting issue - missing '@'
On Wed, 05 Apr 2023, Jani Nikula wrote: > On Wed, 05 Apr 2023, Lee Jones wrote: > > On Tue, 04 Apr 2023, Jani Nikula wrote: > > > >> On Mon, 03 Apr 2023, Lee Jones wrote: > >> > On Mon, 03 Apr 2023, Jani Nikula wrote: > >> > > >> >> On Fri, 31 Mar 2023, Lee Jones wrote: > >> >> > Fixes the following W=1 kernel build warning(s): > >> >> > > >> >> > drivers/gpu/drm/i915/i915_scatterlist.c:62: warning: Function > >> >> > parameter or member 'size' not described in 'i915_refct_sgt_init' > >> >> > > >> >> > Cc: Jani Nikula > >> >> > Cc: Joonas Lahtinen > >> >> > Cc: Rodrigo Vivi > >> >> > Cc: Tvrtko Ursulin > >> >> > Cc: David Airlie > >> >> > Cc: Daniel Vetter > >> >> > Cc: intel-...@lists.freedesktop.org > >> >> > Cc: dri-devel@lists.freedesktop.org > >> >> > Signed-off-by: Lee Jones > >> >> > >> >> Thanks for the patches! > >> >> > >> >> Applied all but one of the drm/i915 patches to drm-intel-next or > >> >> drm-intel-gt-next depending on the area. There were a couple of issues > >> >> that I fixed while applying. There was a conflict with patch 5/19 > >> >> against drm-intel-gt-next so I left that one out. > >> > > >> > Thanks Jani. I'll rebase and see what's left. > >> > >> We also took notice and aim to track this more aggressively [1]. > > > > Thanks. > > > > I did clean-up all of the GPU warnings already a couple of years ago, > > but they seem to have crept back over time. It would be great if we > > could put some extra checks in place to prevent them in the future. > > We are pretty zealous about warnings in general in i915. We have a bunch > of extra warnings in our local Makefile and use -Werror in > development. Inspired by this series, we added kernel-doc check to the > build, and hope to add kernel-doc -Werror too once we're done. Sounds good that you're on it. At least in your part of GPU. kernel-doc warnings are surfaced by enabling W=1. 
> > My aim, albeit ambitious, is to clean-up all of the W=1 warnings in the > > kernel, then have them promoted to W=0, so they warn more loudly during > > development, thus keeping them from reappearing. > > I wish it was easier to do the equivalent of W=1 on a driver or Makefile > basis. I like to keep i915 clean, but I don't like to use W=1 because > there are just so many warnings currently. Well that's what I hope to improve (again). :) > The other alternative is fixing and moving extra warnings from W=1 to > W=0 one by one. Right, that's where I'd like to end up eventually. -- Lee Jones [李琼斯]
Re: [PATCH v10 2/2] drm: add kms driver for loongson display controller
Hi,

On 2023/4/4 22:10, Emil Velikov wrote:

Greetings Sui Jingfeng,

I haven't been around drm-land for a while and this is the first driver I skim through in a few years. So take the following suggestions with a healthy pinch of salt. Hope that helps o/

Emil, we love your reviews,

On Mon, 3 Apr 2023 at 18:13, Sui Jingfeng wrote:

v7 -> v8:
1) Zero compile warnings on 32-bit platforms, compile with W=1
2) Revise lsdc_bo_gpu_offset() and minor cleanup
3) Pageflip tested on the virtual terminal with the following commands:
   modetest -M loongson -s 32:1920x1080 -v
   modetest -M loongson -s 34:1920x1080 -v -F tiles

I could be wrong, but my understanding is that new drivers should be capable of running under Xorg and/or a Wayland compositor. There is also the IGT test suite, which can help verify and validate the driver's behaviour: https://www.kernel.org/doc/html/latest/gpu/drm-uapi.html

Yet it may take more time to give a full answer to all of your concerns. Currently, the drm/loongson driver does work under Xorg (X server); link [1] is a short video which shows that the driver actually works very well. Note that it uses the generic modesetting driver in userspace. We could provide more videos if necessary.

We are carrying on with the IGT test suite and will feed back the test results once they finish on our platform.

I'm not familiar with it; previously we only focused on the basic unit tests that came with libdrm. I will answer the rest of the questions later; please wait a moment.
+static void lsdc_crtc_reset(struct drm_crtc *crtc)
+{
+	struct lsdc_display_pipe *dispipe = crtc_to_display_pipe(crtc);
+	struct drm_device *ddev = crtc->dev;
+	struct lsdc_device *ldev = to_lsdc(ddev);
+	struct lsdc_crtc_state *priv_crtc_state;
+	unsigned int index = dispipe->index;
+	u32 val;
+
+	val = LSDC_PF_XRGB | CFG_RESET_N;
+	if (ldev->descp->chip == CHIP_LS7A2000)
+		val |= LSDC_DMA_STEP_64_BYTES;
+
+	lsdc_crtc_wreg32(ldev, LSDC_CRTC0_CFG_REG, index, val);
+
+	if (ldev->descp->chip == CHIP_LS7A2000) {
+		val = PHY_CLOCK_EN | PHY_DATA_EN;
+		lsdc_crtc_wreg32(ldev, LSDC_CRTC0_PANEL_CONF_REG, index, val);
+	}
+

AFAICT no other driver touches the HW in their reset callback. Should the above be moved to another callback?

+static void lsdc_crtc_atomic_enable(struct drm_crtc *crtc,
+				    struct drm_atomic_state *state)
+{
+	val = lsdc_crtc_rreg32(ldev, LSDC_CRTC0_CFG_REG, index);
+	/* clear old dma step settings */
+	val &= ~CFG_DMA_STEP_MASK;
+
+	if (descp->chip == CHIP_LS7A2000) {
+		/*
+		 * Using large dma step as much as possible,
+		 * for improve hardware DMA efficiency.
+		 */
+		if (width_in_bytes % 256 == 0)
+			val |= LSDC_DMA_STEP_256_BYTES;
+		else if (width_in_bytes % 128 == 0)
+			val |= LSDC_DMA_STEP_128_BYTES;
+		else if (width_in_bytes % 64 == 0)
+			val |= LSDC_DMA_STEP_64_BYTES;
+		else /* width_in_bytes % 32 == 0 */
+			val |= LSDC_DMA_STEP_32_BYTES;
+	}
+
+	clk_func->update(pixpll, _state->pparms);
+

Without knowing the hardware, the clk_func abstraction seems quite arbitrary and unnecessary. It should be introduced when there is a use-case for it.

+	lsdc_crtc_wreg32(ldev, LSDC_CRTC0_CFG_REG, index, val | CFG_OUTPUT_EN);
+
+	drm_crtc_vblank_on(crtc);
+}
+

--- /dev/null
+++ b/drivers/gpu/drm/loongson/lsdc_debugfs.c

+void lsdc_debugfs_init(struct drm_minor *minor)
+{
+#ifdef CONFIG_DEBUG_FS
+	drm_debugfs_create_files(lsdc_debugfs_list,
+				 ARRAY_SIZE(lsdc_debugfs_list),
+				 minor->debugfs_root,
+				 minor);
+#endif
+}

Should probably build the file when debugfs is enabled and provide no-op stub in the header. See nouveau for an example.

--- /dev/null
+++ b/drivers/gpu/drm/loongson/lsdc_drv.c

+static const struct lsdc_desc dc_in_ls7a1000 = {
+	.chip = CHIP_LS7A1000,
+	.num_of_crtc = LSDC_NUM_CRTC,
+	.max_pixel_clk = 20,
+	.max_width = 2048,
+	.max_height = 2048,
+	.num_of_hw_cursor = 1,
+	.hw_cursor_w = 32,
+	.hw_cursor_h = 32,
+	.pitch_align = 256,
+	.mc_bits = 40,
+	.has_vblank_counter = false,
+	.has_scan_pos = true,
+	.has_builtin_i2c = true,
+	.has_vram = true,
+	.has_hpd_reg = false,
+	.is_soc = false,
+};
+
+static const struct lsdc_desc dc_in_ls7a2000 = {
+	.chip = CHIP_LS7A2000,
+	.num_of_crtc = LSDC_NUM_CRTC,
+	.max_pixel_clk = 35,
+	.max_width = 4096,
+	.max_height = 4096,
+	.num_of_hw_cursor = 2,
+	.hw_cursor_w = 64,
+	.hw_cursor_h = 64,
+
[PATCH] drm/scheduler: Fix UAF race in drm_sched_entity_push_job()
After a job is pushed into the queue, it is owned by the scheduler core and may be freed at any time, so we can't write nor read the submit timestamp after that point.

Fixes oopses observed with the drm/asahi driver, found with kASAN.

Signed-off-by: Asahi Lina
---
 drivers/gpu/drm/scheduler/sched_entity.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 15d04a0ec623..e0a8890a62e2 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -507,12 +507,19 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 {
 	struct drm_sched_entity *entity = sched_job->entity;
 	bool first;
+	ktime_t submit_ts;

 	trace_drm_sched_job(sched_job, entity);
 	atomic_inc(entity->rq->sched->score);
 	WRITE_ONCE(entity->last_user, current->group_leader);
+
+	/*
+	 * After the sched_job is pushed into the entity queue, it may be
+	 * completed and freed up at any time. We can no longer access it.
+	 * Make sure to set the submit_ts first, to avoid a race.
+	 */
+	sched_job->submit_ts = submit_ts = ktime_get();
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
-	sched_job->submit_ts = ktime_get();

 	/* first job wakes up scheduler */
 	if (first) {
@@ -529,7 +536,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 		spin_unlock(&entity->rq_lock);

 		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-			drm_sched_rq_update_fifo(entity, sched_job->submit_ts);
+			drm_sched_rq_update_fifo(entity, submit_ts);

 		drm_sched_wakeup(entity->rq->sched);
 	}

---
base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
change-id: 20230406-scheduler-uaf-2-44cf8faed245

Thank you,
~~ Lina
[PATCH] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name
A signaled scheduler fence can outlive its scheduler, since fences are independently reference counted. Therefore, we can't reference the scheduler in the get_timeline_name() implementation.

Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared dma-bufs reference fences from GPU schedulers that no longer exist.

Signed-off-by: Asahi Lina
---
 drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
 drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
 include/drm/gpu_scheduler.h              | 5 +++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 15d04a0ec623..8b3b949b2ce8 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -368,7 +368,12 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)

 		/*
 		 * Fence is from the same scheduler, only need to wait for
-		 * it to be scheduled
+		 * it to be scheduled.
+		 *
+		 * Note: s_fence->sched could have been freed and reallocated
+		 * as another scheduler. This false positive case is okay, as if
+		 * the old scheduler was freed all of its jobs must have
+		 * signaled their completion fences.
 		 */
 		fence = dma_fence_get(&s_fence->scheduled);
 		dma_fence_put(entity->dependency);

diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 7fd869520ef2..33b145dfa38c 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -66,7 +66,7 @@ static const char *drm_sched_fence_get_driver_name(struct dma_fence *fence)

 static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
 {
 	struct drm_sched_fence *fence = to_drm_sched_fence(f);
-	return (const char *)fence->sched->name;
+	return (const char *)fence->sched_name;
 }

 static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
@@ -168,6 +168,8 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
 	unsigned seq;

 	fence->sched = entity->rq->sched;
+	strlcpy(fence->sched_name, entity->rq->sched->name,
+		sizeof(fence->sched_name));
 	seq = atomic_inc_return(&entity->fence_seq);
 	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
 		       &fence->lock, entity->fence_context, seq);

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 9db9e5e504ee..49f019731891 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -295,6 +295,11 @@ struct drm_sched_fence {
 	 * @lock: the lock used by the scheduled and the finished fences.
 	 */
 	spinlock_t lock;
+	/**
+	 * @sched_name: the name of the scheduler that owns this fence. We
+	 * keep a copy here since fences can outlive their scheduler.
+	 */
+	char sched_name[16];
 	/**
 	 * @owner: job owner for debugging
 	 */

---
base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
change-id: 20230406-scheduler-uaf-1-994ec34cac93

Thank you,
~~ Lina
Re: [PATCH 3/3] drm/fb-helper: fix input validation gaps in check_var
Daniel Vetter writes: [...] >> >> but only the 'var->xres > fb->width || var->yres > fb->height' from the >> conditions checked could be false after your __fill_var() call above. >> >> You should drop the 'var->bits_per_pixel > bpp', 'var->xres_virtual > >> fb->width' and 'var->yres_virtual > fb->height' checks I believe since >> those will always be true. > > The __fill_var is after this. I'm honestly not sure what the exact Ah, your patch adds it after that indeed. Please ignore my comment then. > semantics are supposed to be, but essentially if userspace asks for too > big virtual size, we reject it. And for anything else we then tell it > (with __fill_var) how big the actually available space is. > > What I'm wondering now is whether too small x/yres won't lead to problems > of some sorts ... For multi-screen we set the virtual size to be big > enough for all crtc, and then just set x/yres to be the smallest output. > That way fbcon knows to only draw as much as is visible on all screens. > But if you then pan that too much, the bigger screens might not have a big > enough buffer anymore and things fail (but shouldn't). > > Not sure how to fix that tbh. Would this be a problem in practice? -- Best regards, Javier Martinez Canillas Core Platforms Red Hat
Re: [Regression] drm/scheduler: track GPU active time per entity
On 2023-04-05 10:05, Danilo Krummrich wrote: > On 4/4/23 06:31, Luben Tuikov wrote: >> On 2023-03-28 04:54, Lucas Stach wrote: >>> Hi Danilo, >>> >>> Am Dienstag, dem 28.03.2023 um 02:57 +0200 schrieb Danilo Krummrich: Hi all, Commit df622729ddbf ("drm/scheduler: track GPU active time per entity") tries to track the accumulated time that a job was active on the GPU writing it to the entity through which the job was deployed to the scheduler originally. This is done within drm_sched_get_cleanup_job() which fetches a job from the schedulers pending_list. Doing this can result in a race condition where the entity is already freed, but the entity's newly added elapsed_ns field is still accessed once the job is fetched from the pending_list. After drm_sched_entity_destroy() being called it should be safe to free the structure that embeds the entity. However, a job originally handed over to the scheduler by this entity might still reside in the schedulers pending_list for cleanup after drm_sched_entity_destroy() already being called and the entity being freed. Hence, we can run into a UAF. >>> Sorry about that, I clearly didn't properly consider this case. >>> In my case it happened that a job, as explained above, was just picked from the schedulers pending_list after the entity was freed due to the client application exiting. Meanwhile this freed up memory was already allocated for a subsequent client applications job structure again. Hence, the new jobs memory got corrupted. Luckily, I was able to reproduce the same corruption over and over again by just using deqp-runner to run a specific set of VK test cases in parallel. Fixing this issue doesn't seem to be very straightforward though (unless I miss something), which is why I'm writing this mail instead of sending a fix directly. Spontaneously, I see three options to fix it: 1. Rather than embedding the entity into driver specific structures (e.g. 
tied to file_priv) we could allocate the entity separately and reference count it, such that it's only freed up once all jobs that were deployed through this entity are fetched from the schedulers pending list. >>> My vote is on this or something in similar vain for the long term. I >>> have some hope to be able to add a GPU scheduling algorithm with a bit >>> more fairness than the current one sometime in the future, which >>> requires execution time tracking on the entities. >> >> Danilo, >> >> Using kref is preferable, i.e. option 1 above. > > I think the only real motivation for doing that would be for generically > tracking job statistics within the entity a job was deployed through. If > we all agree on tracking job statistics this way I am happy to prepare a > patch for this option and drop this one: > https://lore.kernel.org/all/20230331000622.4156-1-d...@redhat.com/T/#u Hmm, I never thought about "job statistics" when I preferred using kref above. The reason kref is attractive is because one doesn't need to worry about it--when the last user drops the kref, the release is called to do housekeeping. If this never happens, we know that we have a bug to debug. Regarding the patch above--I did look around the code, and it seems safe, as per your analysis, I didn't see any reference to entity after job submission, but I'll comment on that thread as well for the record. Regards, Luben > > Christian mentioned amdgpu tried something similar to what Lucas tried > running into similar trouble, backed off and implemented it in another > way - a driver specific way I guess? > >> >> Lucas, can you shed some light on, >> >> 1. In what way the current FIFO scheduling is unfair, and >> 2. shed some details on this "scheduling algorithm with a bit >> more fairness than the current one"? >> >> Regards, >> Luben >> >>> 2. Somehow make sure drm_sched_entity_destroy() does block until all jobs deployed through this entity were fetched from the schedulers pending list. 
Though, I'm pretty sure that this is not really desirable. 3. Just revert the change and let drivers implement tracking of GPU active times themselves. >>> Given that we are already pretty late in the release cycle and etnaviv >>> being the only driver so far making use of the scheduler elapsed time >>> tracking I think the right short term solution is to either move the >>> tracking into etnaviv or just revert the change for now. I'll have a >>> look at this. >>> >>> Regards, >>> Lucas >>> In the case of just reverting the change I'd propose to also set a jobs entity pointer to NULL once the job was taken from the entity, such that in case of a future issue we fail where the actual issue resides and to make it more obvious that the field shouldn't be used anymore after the job was taken from the
Re: [PATCH v3 54/65] clk: da8xx: clk48: Switch to determine_rate
On 4/5/23 10:22 AM, Maxime Ripard wrote: Hi David, On Wed, Apr 05, 2023 at 10:03:24AM -0500, David Lechner wrote: On 4/4/23 5:11 AM, Maxime Ripard wrote: The TI DA8xx USB0 clk48 clocks implements a mux with a set_parent hook, but doesn't provide a determine_rate implementation. This is a bit odd, since set_parent() is there to, as its name implies, change the parent of a clock. However, the most likely candidate to trigger that parent change is a call to clk_set_rate(), with determine_rate() figuring out which parent is the best suited for a given rate. As mentioned in my previous review, parent is selected by device tree and should never be changed after init. Great minds think alike then, because the driver implements exactly that, either before or after that patch. That patch makes the current behaviour explicit but doesn't change it in any way. So I guess that means that I can add your Acked-by on the three patches you reviewed with the same message? Maxime Yes, preferably with a simplified commit message. Acked-by: David Lechner
Re: [PATCH 01/18] fbdev: Prepare generic architecture helpers
On Wed, Apr 05, 2023 at 05:53:03PM +0200, Arnd Bergmann wrote: > On Wed, Apr 5, 2023, at 17:05, Thomas Zimmermann wrote: > > Generic implementations of fb_pgprotect() and fb_is_primary_device() > > have been in the source code for a long time. Prepare the header file > > to make use of them. > > > > Improve the code by using an inline function for fb_pgprotect() and > > by removing include statements. > > > > Symbols are protected by preprocessor guards. Architectures that > > provide a symbol need to define a preprocessor token of the same > > name and value. Otherwise the header file will provide a generic > > implementation. This pattern has been taken from . > > > > Signed-off-by: Thomas Zimmermann > > Moving this into generic code is good, but I'm not sure > about the default for fb_pgprotect(): > > > + > > +#ifndef fb_pgprotect > > +#define fb_pgprotect fb_pgprotect > > +static inline void fb_pgprotect(struct file *file, struct vm_area_struct > > *vma, > > + unsigned long off) > > +{ } > > +#endif > > I think most architectures will want the version we have on > arc, arm, arm64, loongarch, and sh already: > > static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma, > unsigned long off) > { >vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); > } > > so I'd suggest making that version the default, and treating the > empty ones (m68knommu, sparc32) as architecture specific > workarounds. Yeah I was about to type the same suggestion :-) -Daniel > I see that sparc64 and parisc use pgprot_uncached here, but as > they don't define a custom pgprot_writecombine, this ends up being > the same, and they can use the above definition as well. > > mips defines pgprot_writecombine but uses pgprot_noncached > in fb_pgprotect(), which is probably a mistake and should have > been updated as part of commit 4b050ba7a66c ("MIPS: pgtable.h: > Implement the pgprot_writecombine function for MIPS"). 
> > Arnd -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH 01/19] drm/i915/i915_scatterlist: Fix kerneldoc formatting issue - missing '@'
On Wed, 05 Apr 2023, Lee Jones wrote: > On Tue, 04 Apr 2023, Jani Nikula wrote: > >> On Mon, 03 Apr 2023, Lee Jones wrote: >> > On Mon, 03 Apr 2023, Jani Nikula wrote: >> > >> >> On Fri, 31 Mar 2023, Lee Jones wrote: >> >> > Fixes the following W=1 kernel build warning(s): >> >> > >> >> > drivers/gpu/drm/i915/i915_scatterlist.c:62: warning: Function >> >> > parameter or member 'size' not described in 'i915_refct_sgt_init' >> >> > >> >> > Cc: Jani Nikula >> >> > Cc: Joonas Lahtinen >> >> > Cc: Rodrigo Vivi >> >> > Cc: Tvrtko Ursulin >> >> > Cc: David Airlie >> >> > Cc: Daniel Vetter >> >> > Cc: intel-...@lists.freedesktop.org >> >> > Cc: dri-devel@lists.freedesktop.org >> >> > Signed-off-by: Lee Jones >> >> >> >> Thanks for the patches! >> >> >> >> Applied all but one of the drm/i915 patches to drm-intel-next or >> >> drm-intel-gt-next depending on the area. There were a couple of issues >> >> that I fixed while applying. There was a conflict with patch 5/19 >> >> against drm-intel-gt-next so I left that one out. >> > >> > Thanks Jani. I'll rebase and see what's left. >> >> We also took notice and aim to track this more aggressively [1]. > > Thanks. > > I did clean-up all of the GPU warnings already a couple of years ago, > but they seem to have crept back over time. It would be great if we > could put some extra checks in place to prevent them in the future. We are pretty zealous about warnings in general in i915. We have a bunch of extra warnings in our local Makefile and use -Werror in development. Inspired by this series, we added kernel-doc check to the build, and hope to add kernel-doc -Werror too once we're done. > My aim, albeit ambitious, is to clean-up all of the W=1 warnings in the > kernel, then have them promoted to W=0, so they warn more loudly during > development, thus keeping them from reappearing. I wish it was easier to do the equivalent of W=1 on a driver or Makefile basis. 
I like to keep i915 clean, but I don't like to use W=1 because there are just so many warnings currently. The other alternative is fixing and moving extra warnings from W=1 to W=0 one by one. BR, Jani. -- Jani Nikula, Intel Open Source Graphics Center
Re: [PATCH] drm/msm: Check for the GPU IOMMU during bind
On Fri, Mar 10, 2023 at 01:05:36AM +0200, Dmitry Baryshkov wrote: > On 10/03/2023 00:20, Jordan Crouse wrote: > > While booting with amd,imageon on a headless target the GPU probe was > > failing with -ENOSPC in get_pages() from msm_gem.c. > > > > Investigation showed that the driver was using the default 16MB VRAM > > carveout because msm_use_mmu() was returning false since headless devices > > use a dummy parent device. Avoid this by extending the existing is_a2xx > > priv member to check the GPU IOMMU state on all platforms and use that > > check in msm_use_mmu(). > > I wonder if we can fix this by setting 'dummy_dev'->of_node to adreno's > of_node. Did you check that possibility? I said I would check and then never looped back around. This will stick on my todo list for now and I'll check on the next cycle. If anybody else wants to jump in the meantime then please go for it. Jordan
Re: [PATCH 1/8] drm/gma500: Use drm_aperture_remove_conflicting_pci_framebuffers
On Wed, Apr 05, 2023 at 04:32:19PM +0200, Thomas Zimmermann wrote: > Hi > > Am 05.04.23 um 15:18 schrieb Daniel Vetter: > > On Wed, Apr 05, 2023 at 01:16:27PM +0200, Javier Martinez Canillas wrote: > > > Thomas Zimmermann writes: > > > > > > [...] > > > > > > > > > > > Your comment says that it calls a PCI function to clean up to vgacon. > > > > That comment explains what is happening, not why. And how the PCI and > > > > vgacon code work together is non-obvious. > > > > Would a better comment help then: > > > > /* > > * gma500 is a strange hybrid device, which both acts as a pci > > * device (for legacy vga functionality) but also more like an > > * integrated display on a SoC where the framebuffer simply > > * resides in main memory and not in a special pci bar (that > > * internally redirects to a stolen range of main memory) like all > > * other integrated pci display devices have. > > * > > * To catch all cases we need to both remove conflicting fw > > * drivers for the pci device and main memory. > > */ > > Together with the existing comment, this should be the comment to describe > gma_remove_conflicting_framebuffers(). > > > > > > > > > Again, here's my proposal for gma500: > > > > > > > > // call this from psb_pci_probe() > > > > int gma_remove_conflicting_framebuffers(struct pci_dev *pdev, const > > > > struct drm_driver *req_driver) > > > > { > > > > resource_size_t base = 0; > > > > resource_size_t size = (resource_size_t)-1; > > > > const char *name = req_driver->name; > > > > int ret; > > > > > > > > /* > > > > * We cannot yet easily find the framebuffer's location in > > > > * memory. So remove all framebuffers here. > > > > * > > > > * TODO: Refactor psb_driver_load() to map vdc_reg earlier. Then > > > > * we might be able to read the framebuffer range from the > > > > * device. > > > > */ > > > > ret = aperture_remove_conflicting_devices(base, size, name); > > > > Why can't this be a call to drm_aperture_remove_framebuffers? 
At least as > > long as we don't implement the "read out actual fb base and size" code, > > which also none of the other soc drivers bother with? > > It can. Feel free to use it. > > But I have to say that those DRM helpers are somewhat empty and obsolete > after the aperture code has been moved to drivers/video/. They exist mostly > for convenience. As with other DRM helpers, if a driver needs something > special, it can ignore them. > > > > > > > if (ret) > > > > return ret; > > > > > > > > /* > > > > * WARNING: Apparently we must kick fbdev drivers before vgacon, > > > > * otherwise the vga fbdev driver falls over. > > > > */ > > > > ret = vga_remove_vgacon(pdev); > > > > This isn't enough, we also nuke stuff that's mapping the vga fb range. > > Which is really the reason I don't want to open code random stuff, pci is > > self-describing, if it's decoding legacy vga it can figure this out and we > > only have to implement the "how do I nuke legacy vga fw drivers from a pci > > driver" once. > > Sure, but it's really just one additional line: > > aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE); > > as you mention below, this and vgacon can be exported in a single VGA > aperture helper. > > > > > Not twice like this would result in, with the gma500 version being only > > half the thing. > > > > If it absolutely has to be a separate function for the gma500 pci legacy > > vga (I still don't get why, it's just a pci vga device, there's absolutely > > nothing special about that part at all) then I think it needs to be at > > least a common "nuke a legacy vga device for me pls" function, which > > shares the implementation with the pci one. > > Sure > > /** > * kerneldoc goes here > * > * WARNING: Apparently we must remove graphics drivers before calling > * this helper. Otherwise the vga fbdev driver falls over if > * we have vgacon configured. 
> */ > int aperture_remove_legacy_vga_devices(struct pci_dev *pdev) > { > aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE); > > return vga_remove_vgacon(pdev); > } > > And that can be called from gma500 and the pci aperture helper. But you still pass a pci_dev to that helper. Which just doesn't make any sense to me (assuming your entire point is that this isn't just a normal pci device but some special legacy vga thing), but if we go with (void) then there's more refactoring to do because vga_remove_vgacon also wants a pdev. All so that we don't call aperture_detach_devices() on a bunch of pci bars, which apparently is not a problem for any other driver, but absolutely is a huge problem for gma500 somehow. I don't understand why. Consider this me throwing in the towel.
Re: [PATCH 01/18] fbdev: Prepare generic architecture helpers
On Wed, Apr 5, 2023, at 17:05, Thomas Zimmermann wrote: > Generic implementations of fb_pgprotect() and fb_is_primary_device() > have been in the source code for a long time. Prepare the header file > to make use of them. > > Improve the code by using an inline function for fb_pgprotect() and > by removing include statements. > > Symbols are protected by preprocessor guards. Architectures that > provide a symbol need to define a preprocessor token of the same > name and value. Otherwise the header file will provide a generic > implementation. This pattern has been taken from . > > Signed-off-by: Thomas Zimmermann Moving this into generic code is good, but I'm not sure about the default for fb_pgprotect(): > + > +#ifndef fb_pgprotect > +#define fb_pgprotect fb_pgprotect > +static inline void fb_pgprotect(struct file *file, struct vm_area_struct > *vma, > + unsigned long off) > +{ } > +#endif I think most architectures will want the version we have on arc, arm, arm64, loongarch, and sh already: static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma, unsigned long off) { vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); } so I'd suggest making that version the default, and treating the empty ones (m68knommu, sparc32) as architecture specific workarounds. I see that sparc64 and parisc use pgprot_uncached here, but as they don't define a custom pgprot_writecombine, this ends up being the same, and they can use the above definition as well. mips defines pgprot_writecombine but uses pgprot_noncached in fb_pgprotect(), which is probably a mistake and should have been updated as part of commit 4b050ba7a66c ("MIPS: pgtable.h: Implement the pgprot_writecombine function for MIPS"). Arnd
Re: [PATCH RFC 12/18] rust: drm: sched: Add GPU scheduler abstraction
On Tue, Mar 07, 2023 at 11:25:37PM +0900, Asahi Lina wrote: > The GPU scheduler manages scheduling GPU jobs and dependencies between > them. This Rust abstraction allows Rust DRM drivers to use this > functionality. > > Signed-off-by: Asahi Lina Overall (with my limited rust knowledge) I really like this, it nicely encodes the state transitions of jobs and anything else I looked into. Some thoughts/questions below. > --- > drivers/gpu/drm/Kconfig | 5 + > rust/bindings/bindings_helper.h | 1 + > rust/helpers.c | 6 + > rust/kernel/drm/mod.rs | 2 + > rust/kernel/drm/sched.rs| 358 > > 5 files changed, 372 insertions(+) > > diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig > index 70a983a17ac2..8b5ad6aee126 100644 > --- a/drivers/gpu/drm/Kconfig > +++ b/drivers/gpu/drm/Kconfig > @@ -39,6 +39,11 @@ config RUST_DRM_GEM_SHMEM_HELPER > depends on RUST_DRM > select DRM_GEM_SHMEM_HELPER > > +config RUST_DRM_SCHED > + bool > + depends on RUST_DRM > + select DRM_SCHED > + > config DRM_MIPI_DBI > tristate > depends on DRM > diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h > index b6696011f3a4..dc01be08676e 100644 > --- a/rust/bindings/bindings_helper.h > +++ b/rust/bindings/bindings_helper.h > @@ -13,6 +13,7 @@ > #include > #include > #include > +#include > #include > #include > #include > diff --git a/rust/helpers.c b/rust/helpers.c > index 11965b1e2f4e..1b33ed602090 100644 > --- a/rust/helpers.c > +++ b/rust/helpers.c > @@ -408,6 +408,12 @@ void rust_helper___spin_lock_init(spinlock_t *lock, > const char *name, > } > EXPORT_SYMBOL_GPL(rust_helper___spin_lock_init); > > +unsigned long rust_helper_msecs_to_jiffies(const unsigned int m) > +{ > + return msecs_to_jiffies(m); > +} > +EXPORT_SYMBOL_GPL(rust_helper_msecs_to_jiffies); > + > #ifdef CONFIG_DMA_SHARED_BUFFER > > void rust_helper_dma_fence_get(struct dma_fence *fence) > diff --git a/rust/kernel/drm/mod.rs b/rust/kernel/drm/mod.rs > index dae98826edfd..3ddf7712aab3 100644 > --- 
a/rust/kernel/drm/mod.rs > +++ b/rust/kernel/drm/mod.rs > @@ -8,4 +8,6 @@ pub mod file; > pub mod gem; > pub mod ioctl; > pub mod mm; > +#[cfg(CONFIG_RUST_DRM_SCHED)] > +pub mod sched; > pub mod syncobj; > diff --git a/rust/kernel/drm/sched.rs b/rust/kernel/drm/sched.rs > new file mode 100644 > index ..a5275cc16179 > --- /dev/null > +++ b/rust/kernel/drm/sched.rs > @@ -0,0 +1,358 @@ > +// SPDX-License-Identifier: GPL-2.0 OR MIT > + > +//! DRM Scheduler > +//! > +//! C header: > [`include/linux/drm/gpu_scheduler.h`](../../../../include/linux/drm/gpu_scheduler.h) > + > +use crate::{ > +bindings, device, > +dma_fence::*, > +error::{to_result, Result}, > +prelude::*, > +sync::{Arc, UniqueArc}, > +}; > +use alloc::boxed::Box; > +use core::marker::PhantomData; > +use core::mem::MaybeUninit; > +use core::ops::{Deref, DerefMut}; > +use core::ptr::addr_of_mut; > + > +/// Scheduler status after timeout recovery > +#[repr(u32)] > +pub enum Status { > +/// Device recovered from the timeout and can execute jobs again > +Nominal = bindings::drm_gpu_sched_stat_DRM_GPU_SCHED_STAT_NOMINAL, > +/// Device is no longer available > +NoDevice = bindings::drm_gpu_sched_stat_DRM_GPU_SCHED_STAT_ENODEV, > +} > + > +/// Scheduler priorities > +#[repr(i32)] > +pub enum Priority { > +/// Low userspace priority > +Min = bindings::drm_sched_priority_DRM_SCHED_PRIORITY_MIN, > +/// Normal userspace priority > +Normal = bindings::drm_sched_priority_DRM_SCHED_PRIORITY_NORMAL, > +/// High userspace priority > +High = bindings::drm_sched_priority_DRM_SCHED_PRIORITY_HIGH, > +/// Kernel priority (highest) > +Kernel = bindings::drm_sched_priority_DRM_SCHED_PRIORITY_KERNEL, > +} > + > +/// Trait to be implemented by driver job objects. > +pub trait JobImpl: Sized { > +/// Called when the scheduler is considering scheduling this job next, > to get another Fence > +/// for this job to block on. Once it returns None, run() may be called. 
> +fn prepare(_job: Job) -> Option { So if I get this all right then Job allows us to nicely parametrize the job with the driver structure itself, but not really anything else. I do wonder whether this needs a bit more with a type both for the job and entity and the drm/sched code + rust wrapper guaranteeing that the lifetimes of these make sense. With just the job parametrized drivers need to make sure they refcount anything else hanging of that properly which means if they get some detail wrong there might be an unintentional leak. If we instead also give a parametrized entity where the driver can stuff anything necessary and sched code guarantees that it'll clean up the any mess on teardown and guarantee that the entity survives, I think a lot of drivers could benefit from that and it would
Re: [Regression] drm/scheduler: track GPU active time per entity
On 4/4/23 06:31, Luben Tuikov wrote: On 2023-03-28 04:54, Lucas Stach wrote: Hi Danilo, Am Dienstag, dem 28.03.2023 um 02:57 +0200 schrieb Danilo Krummrich: Hi all, Commit df622729ddbf ("drm/scheduler: track GPU active time per entity") tries to track the accumulated time that a job was active on the GPU writing it to the entity through which the job was deployed to the scheduler originally. This is done within drm_sched_get_cleanup_job() which fetches a job from the schedulers pending_list. Doing this can result in a race condition where the entity is already freed, but the entity's newly added elapsed_ns field is still accessed once the job is fetched from the pending_list. After drm_sched_entity_destroy() being called it should be safe to free the structure that embeds the entity. However, a job originally handed over to the scheduler by this entity might still reside in the schedulers pending_list for cleanup after drm_sched_entity_destroy() already being called and the entity being freed. Hence, we can run into a UAF. Sorry about that, I clearly didn't properly consider this case. In my case it happened that a job, as explained above, was just picked from the schedulers pending_list after the entity was freed due to the client application exiting. Meanwhile this freed up memory was already allocated for a subsequent client applications job structure again. Hence, the new jobs memory got corrupted. Luckily, I was able to reproduce the same corruption over and over again by just using deqp-runner to run a specific set of VK test cases in parallel. Fixing this issue doesn't seem to be very straightforward though (unless I miss something), which is why I'm writing this mail instead of sending a fix directly. Spontaneously, I see three options to fix it: 1. Rather than embedding the entity into driver specific structures (e.g. 
tied to file_priv) we could allocate the entity separately and reference count it, such that it's only freed up once all jobs that were deployed through this entity are fetched from the scheduler's pending list. My vote is on this or something in a similar vein for the long term. I have some hope to be able to add a GPU scheduling algorithm with a bit more fairness than the current one sometime in the future, which requires execution time tracking on the entities. Danilo, Using kref is preferable, i.e. option 1 above. I think the only real motivation for doing that would be for generically tracking job statistics within the entity a job was deployed through. If we all agree on tracking job statistics this way I am happy to prepare a patch for this option and drop this one: https://lore.kernel.org/all/20230331000622.4156-1-d...@redhat.com/T/#u Christian mentioned amdgpu tried something similar to what Lucas tried, running into similar trouble, backed off and implemented it in another way - a driver-specific way I guess? Lucas, can you shed some light on, 1. In what way the current FIFO scheduling is unfair, and 2. shed some details on this "scheduling algorithm with a bit more fairness than the current one"? Regards, Luben 2. Somehow make sure drm_sched_entity_destroy() does block until all jobs deployed through this entity were fetched from the scheduler's pending list. Though, I'm pretty sure that this is not really desirable. 3. Just revert the change and let drivers implement tracking of GPU active times themselves. Given that we are already pretty late in the release cycle and etnaviv being the only driver so far making use of the scheduler elapsed time tracking I think the right short term solution is to either move the tracking into etnaviv or just revert the change for now. I'll have a look at this. 
Regards, Lucas In the case of just reverting the change I'd propose to also set a job's entity pointer to NULL once the job was taken from the entity, such that in case of a future issue we fail where the actual issue resides and to make it more obvious that the field shouldn't be used anymore after the job was taken from the entity. I'm happy to implement the solution we agree on. However, it might also make sense to revert the change until we have a solution in place. I'm also happy to send a revert with a proper description of the problem. Please let me know what you think. - Danilo
Re: [PATCH 1/1] drm/bridge: ti-sn65dsi86: Allow GPIO operations to sleep
Hi, On Wed, Apr 5, 2023 at 6:51 AM Alexander Stein wrote: > > There is no need to require non-sleeping GPIO access. Silence the > WARN_ON() if GPIO is using e.g. I2C expanders. > > Signed-off-by: Alexander Stein > --- > This is the TI SN65DSI86 equivalent to 805245071240 ("drm/bridge: > ti-sn65dsi83: Allow GPIO operations to sleep") > > drivers/gpu/drm/bridge/ti-sn65dsi86.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) Reviewed-by: Douglas Anderson Trivial / straightforward change. Sending it to drm-misc-next. 77d08a2de6a4 drm/bridge: ti-sn65dsi86: Allow GPIO operations to sleep
Re: [PATCH v3 43/65] ASoC: tlv320aic32x4: Add a determine_rate hook
On Wed, Apr 05, 2023 at 05:17:21PM +0200, Maxime Ripard wrote: > On Tue, Apr 04, 2023 at 04:26:18PM +0100, Mark Brown wrote: > > To be honest it's surprising that we'd have to manually specify this, I > > would expect to be able to reparent. I suspect it'd be better to go the > > other way here and allow reparenting. > Yeah, I think I'd prefer to allow reparenting too, but as can be seen > from the other reviewers in that thread, it seems like we have a very > split community here, so that doesn't sound very realistic without some > major pushback :) For these ASoC drivers I think we should just do the reparenting, they're very much at the leaf of the tree so the considerations that make it a problem sometimes are unlikely to apply.
Re: [PATCH v2 56/65] clk: ingenic: cgu: Switch to determine_rate
Le mercredi 05 avril 2023 à 16:50 +0200, Maxime Ripard a écrit : > On Wed, Apr 05, 2023 at 02:57:26PM +0200, Paul Cercueil wrote: > > Le lundi 27 mars 2023 à 21:24 +0200, Maxime Ripard a écrit : > > > On Fri, Mar 24, 2023 at 08:58:48PM +, Aidan MacDonald wrote: > > > > > > My suggestion: add a per-clock bitmap to keep track of > > > > > > which > > > > > > parents > > > > > > are allowed. Any operation that would select a parent clock > > > > > > not > > > > > > on the > > > > > > whitelist should fail. Automatic reparenting should only > > > > > > select > > > > > > from > > > > > > clocks on the whitelist. And we need new DT bindings for > > > > > > controlling > > > > > > the whitelist, for example: > > > > > > > > > > > > clock-parents-0 = <>, <_c>; > > > > > > clock-parents-1 = <>, <_a>, <_b>; > > > > > > > > > > > > This means that clk1 can only have pll_c as a parent, while > > > > > > clk2 can > > > > > > have pll_a or pll_b as parents. By default every clock will > > > > > > be > > > > > > able > > > > > > to use any parent, so a list is only needed if the machine > > > > > > needs a > > > > > > more restrictive policy. > > > > > > > > > > > > assigned-clock-parents should disable automatic > > > > > > reparenting, > > > > > > but allow > > > > > > explicit clk_set_parent(). This will allow clock drivers to > > > > > > start doing > > > > > > reparenting without breaking old DTs. > > > > > > > > > > I'm generally not a fan of putting all these policies in the > > > > > device > > > > > tree. Do you have an example where it wouldn't be possible to > > > > > do > > > > > exactly > > > > > this from the driver itself? > > > > > > > > I'm confused. What's implicit in the example is clk1 and clk2 > > > > might > > > > have *other* possible choices of parent clock and the device > > > > tree > > > > is > > > > limiting what the OS is allowed to choose. > > > > > > > > Why would you put such arbitrary limitations into the driver? 
> > > > > > Why would we put such arbitrary limitations in the firmware? As > > > this > > > entire thread can attest, people are already using the device > > > tree to > > > work around the limitations of the Linux driver, or reduce the > > > features of Linux because they can rely on the device tree. > > > Either > > > way, it's linked to the state of the Linux driver, and any other > > > OS > > > or > > > Linux version could very well implement something more dynamic. > > > > Probably because if we have to choose between setting policy in the > > kernel or in the firmware, it is arguably better to set it in the > > firmware. > > I have a very different view on this I guess. Firmware is (most of > the > time) hard to update, and the policy depends on the state of support > of a > given OS so it's likely to evolve. The kernel is the best place to me > to > put that kind of policy. Why do you think differently? Because the clocks configuration can be board-specific. And you don't really want board-specific stuff in the driver. If we take the Ingenic JZ4770 SoC as an example, on one board we parent everything we can to the PLL1 clock and set it to 432 MHz (the least common multiple). Then the PLL0 (which drives the DDR and CPU clocks) can be updated to overclock/underclock the CPU and RAM. So should that be hardcoded in the driver? Well, for a different board, for which overclock is not really needed, it's better to parent everything to PLL0 so that PLL1 can be shut down to save power. So what policy should be hardcoded in the driver? > > > Especially when talking about clocks, as the firmware is already > > the > > one programming the boot clocks. > > I'm not sure what your point is there. I don't think I ever saw a > firmware getting the clocks right for every possible scenario on a > given > platform. And if it was indeed the case, then we wouldn't even need a > kernel > driver. 
My point is that there is already policy in how the firmware sets up the hardware; and more often than not, the kernel driver won't change that (e.g. you're probably not going to touch the DDR clock). It's better to have all policy in the firmware then, instead of having some in the firmware, and some in the kernel driver. Cheers, -Paul
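For reference, the generic clock bindings already give a board's devicetree a way to express the kind of parent/rate policy Paul describes without hardcoding anything in the driver: `assigned-clocks`, `assigned-clock-parents` and `assigned-clock-rates`. A hypothetical board fragment (the `&cgu` label and `CLK_*` indices are made up for illustration, not taken from a real JZ4770 board file) might look like:

```dts
&cgu {
	/* Board policy: run PLL1 at 432 MHz and feed the LCD clock from
	 * it, leaving PLL0 free for CPU/DDR scaling. A 0 entry means
	 * "don't touch the parent/rate of this clock". */
	assigned-clocks = <&cgu CLK_PLL1>, <&cgu CLK_LCD>;
	assigned-clock-parents = <0>, <&cgu CLK_PLL1>;
	assigned-clock-rates = <432000000>, <0>;
};
```

This only sets an initial configuration at probe time; it does not restrict later reparenting, which is exactly the gap the proposed `clock-parents-*` whitelist binding in this thread is trying to fill.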
Re: [PATCH] drm/panel-edp: Add AUO NE135FBM-N41 v8.1 panel entry
Hi, On Wed, Apr 5, 2023 at 3:05 AM AngeloGioacchino Del Regno wrote: > > Add a panel entry with delay_200_500_e50 for the AUO NE135FBM-N41 > version 8.1, found on a number of ACER laptops, including the > Swift 3 (SF313-52, SF313-53), Chromebook Spin 513 (CP513-2H) and > others. > > Signed-off-by: AngeloGioacchino Del Regno > > --- > drivers/gpu/drm/panel/panel-edp.c | 1 + > 1 file changed, 1 insertion(+) Reviewed-by: Douglas Anderson There's not lots of reason to delay landing patches like this, so pushing to drm-misc-next right away. a80c882183e3 drm/panel-edp: Add AUO NE135FBM-N41 v8.1 panel entry
Re: [PATCH v3 54/65] clk: da8xx: clk48: Switch to determine_rate
On 4/4/23 5:11 AM, Maxime Ripard wrote: The TI DA8xx USB0 clk48 clocks implements a mux with a set_parent hook, but doesn't provide a determine_rate implementation. This is a bit odd, since set_parent() is there to, as its name implies, change the parent of a clock. However, the most likely candidate to trigger that parent change is a call to clk_set_rate(), with determine_rate() figuring out which parent is the best suited for a given rate. As mentioned in my previous review, parent is selected by device tree and should never be changed after init.
Re: [PATCH v3 21/65] clk: davinci: da8xx-cfgchip: Add a determine_rate hook
On 4/4/23 5:11 AM, Maxime Ripard wrote: The Davinci DA8xxx cfgchip mux clock implements a mux with a set_parent hook, but doesn't provide a determine_rate implementation. This is a bit odd, since set_parent() is there to, as its name implies, change the parent of a clock. However, the most likely candidate to trigger that parent change is a call to clk_set_rate(), with determine_rate() figuring out which parent is the best suited for a given rate. As mentioned in my previous review, parent is selected by device tree and should never be changed after init.
Re: [PATCH v3 22/65] clk: davinci: da8xx-cfgchip: Add a determine_rate hook
On 4/4/23 5:11 AM, Maxime Ripard wrote: The Davinci DA8xxx cfgchip "clk48" clock implements a mux with a set_parent hook, but doesn't provide a determine_rate implementation. This is a bit odd, since set_parent() is there to, as its name implies, change the parent of a clock. However, the most likely candidate to trigger that parent change is a call to clk_set_rate(), with determine_rate() figuring out which parent is the best suited for a given rate. As mentioned in my previous review, parent is selected by device tree and should never be changed after init.
Re: [PATCH 0/4] log2: make is_power_of_2() more generic
On 31/03/2023 09:31, Jani Nikula wrote: > On Thu, 30 Mar 2023, Andrew Morton wrote: >> On Thu, 30 Mar 2023 21:53:03 + David Laight >> wrote: >> But wouldn't all these issues be addressed by simply doing #define is_power_of_2(n) (n != 0 && ((n & (n - 1)) == 0)) ? (With suitable tweaks to avoid evaluating `n' more than once) >>> >>> I think you need to use the 'horrid tricks' from min() to get >>> a constant expression from constant inputs. >> >> This >> >> --- a/include/linux/log2.h~a >> +++ a/include/linux/log2.h >> @@ -41,11 +41,11 @@ int __ilog2_u64(u64 n) >> * *not* considered a power of two. >> * Return: true if @n is a power of 2, otherwise false. >> */ >> -static inline __attribute__((const)) >> -bool is_power_of_2(unsigned long n) >> -{ >> -return (n != 0 && ((n & (n - 1)) == 0)); >> -} >> +#define is_power_of_2(_n) \ >> +({ \ >> +typeof(_n) n = (_n);\ >> +n != 0 && ((n & (n - 1)) == 0); \ >> +}) >> >> /** >> * __roundup_pow_of_two() - round up to nearest power of two >> _ >> >> worked for me in a simple test. >> >> --- a/fs/open.c~b >> +++ a/fs/open.c >> @@ -1564,3 +1564,10 @@ int stream_open(struct inode *inode, str >> } >> >> EXPORT_SYMBOL(stream_open); >> + >> +#include >> + >> +int foo(void) >> +{ >> +return is_power_of_2(43); >> +} >> _ >> >> >> foo: >> # fs/open.c:1573: } >> xorl%eax, %eax # >> ret >> >> >> Is there some more tricky situation where it breaks? > > It doesn't work with BUILD_BUG_ON_ZERO(). Like most programming problems, you just need another layer of indirection! The below works for me in all the cases I could think of (including __uint128_t). #define __IS_POWER_OF_2(n) (n != 0 && ((n & (n - 1)) == 0)) #define _IS_POWER_OF_2(n, unique_n) \ ({ \ typeof(n) unique_n = (n); \ __IS_POWER_OF_2(unique_n); \ }) #define is_power_of_2(n)\ __builtin_choose_expr(__is_constexpr((n)), \ __IS_POWER_OF_2((n)), \ _IS_POWER_OF_2(n, __UNIQUE_ID(_n))) Although Jani's original might be easier to understand. Steve
Re: [PATCH v3 54/65] clk: da8xx: clk48: Switch to determine_rate
Hi David, On Wed, Apr 05, 2023 at 10:03:24AM -0500, David Lechner wrote: > On 4/4/23 5:11 AM, Maxime Ripard wrote: > > The TI DA8xx USB0 clk48 clocks implements a mux with a set_parent > > hook, but doesn't provide a determine_rate implementation. > > > > This is a bit odd, since set_parent() is there to, as its name implies, > > change the parent of a clock. However, the most likely candidate to > > trigger that parent change is a call to clk_set_rate(), with > > determine_rate() figuring out which parent is the best suited for a > > given rate. > > > > As mentioned in my previous review, parent is selected by device > tree and should never be changed after init. Great minds think alike then, because the driver implements exactly that, either before or after that patch. That patch makes the current behaviour explicit but doesn't change it in any way. So I guess that means that I can add your Acked-by on the three patches you reviewed with the same message? Maxime
Re: [PATCH v3 56/65] clk: ingenic: cgu: Switch to determine_rate
Hi Paul, On Wed, Apr 05, 2023 at 03:04:05PM +0200, Paul Cercueil wrote: > Le mardi 04 avril 2023 à 12:11 +0200, Maxime Ripard a écrit : > > The Ingenic CGU clocks implements a mux with a set_parent hook, but > > doesn't provide a determine_rate implementation. > > > > This is a bit odd, since set_parent() is there to, as its name > > implies, > > change the parent of a clock. However, the most likely candidate to > > trigger that parent change is a call to clk_set_rate(), with > > determine_rate() figuring out which parent is the best suited for a > > given rate. > > > > The other trigger would be a call to clk_set_parent(), but it's far > > less > > used, and it doesn't look like there's any obvious user for that > > clock. > > > > So, the set_parent hook is effectively unused, possibly because of an > > oversight. However, it could also be an explicit decision by the > > original author to avoid any reparenting but through an explicit call > > to > > clk_set_parent(). > > As I said in the v2 (IIRC), clk_set_parent() is used when re-parenting > from the device tree. Yep, it's indeed an oversight in the commit log. > > The driver does implement round_rate() though, which means that we > > can > > change the rate of the clock, but we will never get to change the > > parent. > > > > However, It's hard to tell whether it's been done on purpose or not. > > > > Since we'll start mandating a determine_rate() implementation, let's > > convert the round_rate() implementation to a determine_rate(), which > > will also make the current behavior explicit. And if it was an > > oversight, the clock behaviour can be adjusted later on. > > So just to be sure, this patch won't make clk_set_rate() automatically > switch parents, right? > > Allowing automatic re-parenting sounds like a huge can of worms... The behaviour is strictly equivalent before that patch and after: the driver will not reparent on a rate change. It just makes it explicit. 
Maxime
Re: [PATCH v3 43/65] ASoC: tlv320aic32x4: Add a determine_rate hook
Hi Mark, On Tue, Apr 04, 2023 at 04:26:18PM +0100, Mark Brown wrote: > On Tue, Apr 04, 2023 at 12:11:33PM +0200, Maxime Ripard wrote: > > The tlv320aic32x4 clkin clock implements a mux with a set_parent hook, > > but doesn't provide a determine_rate implementation. > > > This is a bit odd, since set_parent() is there to, as its name implies, > > change the parent of a clock. However, the most likely candidate to > > trigger that parent change is a call to clk_set_rate(), with > > determine_rate() figuring out which parent is the best suited for a > > given rate. > > > The other trigger would be a call to clk_set_parent(), but it's far less > > used, and it doesn't look like there's any obvious user for that clock. > > It could be configured from device tree as well couldn't it? Yep, indeed. > > So, the set_parent hook is effectively unused, possibly because of an > > oversight. However, it could also be an explicit decision by the > > original author to avoid any reparenting but through an explicit call to > > clk_set_parent(). > > Historically clk_set_rate() wouldn't reparent IIRC. > > > The latter case would be equivalent to setting the flag > > CLK_SET_RATE_NO_REPARENT, together with setting our determine_rate hook > > to __clk_mux_determine_rate(). Indeed, if no determine_rate > > implementation is provided, clk_round_rate() (through > > clk_core_round_rate_nolock()) will call itself on the parent if > > CLK_SET_RATE_PARENT is set, and will not change the clock rate > > otherwise. __clk_mux_determine_rate() has the exact same behavior when > > CLK_SET_RATE_NO_REPARENT is set. > > > And if it was an oversight, then we are at least explicit about our > > behavior now and it can be further refined down the line. > > To be honest it's surprising that we'd have to manually specify this, I > would expect to be able to reparent. I suspect it'd be better to go the > other way here and allow reparenting. 
Yeah, I think I'd prefer to allow reparenting too, but as can be seen from the other reviewers in that thread, it seems like we have a very split community here, so that doesn't sound very realistic without some major pushback :) Maxime
Re: [PATCH v3 64/65] ASoC: tlv320aic32x4: div: Switch to determine_rate
On Tue, Apr 04, 2023 at 12:11:54PM +0200, Maxime Ripard wrote: > The driver does implement round_rate() though, which means that we can > change the rate of the clock, but we will never get to change the > parent. > However, it's hard to tell whether it's been done on purpose or not. > Since we'll start mandating a determine_rate() implementation, let's > convert the round_rate() implementation to a determine_rate(), which > will also make the current behavior explicit. And if it was an > oversight, the clock behaviour can be adjusted later on. Similar comments to the other patch: I'm pretty sure this is just surprising design on the part of the clock API and we should just allow reparenting.
Re: [PATCH v3 63/65] ASoC: tlv320aic32x4: pll: Switch to determine_rate
On Tue, Apr 04, 2023 at 12:11:53PM +0200, Maxime Ripard wrote: > The driver does implement round_rate() though, which means that we can > change the rate of the clock, but we will never get to change the > parent. > However, it's hard to tell whether it's been done on purpose or not. > Since we'll start mandating a determine_rate() implementation, let's > convert the round_rate() implementation to a determine_rate(), which > will also make the current behavior explicit. And if it was an > oversight, the clock behaviour can be adjusted later on. Similar comments to the other patch: I'm pretty sure this is just surprising design on the part of the clock API and we should just allow reparenting.
Re: [PATCH 0/7] drm/tegra: Convert fbdev to DRM client
Am 05.04.23 um 16:55 schrieb Thierry Reding: On Thu, Mar 30, 2023 at 10:36:00AM +0200, Thomas Zimmermann wrote: Convert tegra's fbdev code to struct drm_client. Replaces the current ad-hoc integration. The conversion includes a number of cleanups. As with most other drivers' fbdev emulation, fbdev in tegra is now just another DRM client that runs after the DRM device has been registered. Once all drivers' fbdev emulation has been converted to struct drm_client, we can attempt to add additional in-kernel clients. A DRM-based dmesg log or a bootsplash are commonly mentioned. DRM can then switch easily among the existing clients if/when required. I did the conversion from similar experience with other drivers. But I don't have the hardware to test this. Any testing is welcome. Thomas Zimmermann (7): drm/tegra: Include drm/tegra: Include drm/tegra: Removed fb from struct tegra_fbdev drm/tegra: Remove struct tegra_fbdev drm/tegra: Hide fbdev support behind config option drm/tegra: Initialize fbdev DRM client drm/tegra: Implement fbdev emulation as in-kernel client drivers/gpu/drm/tegra/Makefile | 2 + drivers/gpu/drm/tegra/drm.c| 23 +--- drivers/gpu/drm/tegra/drm.h| 27 ++-- drivers/gpu/drm/tegra/fb.c | 242 + drivers/gpu/drm/tegra/fbdev.c | 240 drivers/gpu/drm/tegra/output.c | 3 + drivers/gpu/drm/tegra/rgb.c| 1 + 7 files changed, 265 insertions(+), 273 deletions(-) create mode 100644 drivers/gpu/drm/tegra/fbdev.c Seems to be working just fine. Applied, thanks. Thanks a lot! Thierry -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev
[PATCH 16/18] arch/sparc: Implement fb_is_primary_device() in source file
Other architectures implement fb_is_primary_device() in a source file. Do the same on sparc. No functional changes, but allows us to remove several include statements from .

Signed-off-by: Thomas Zimmermann
Cc: "David S. Miller"
---
 arch/sparc/Makefile         |  1 +
 arch/sparc/include/asm/fb.h | 22 +-
 arch/sparc/video/Makefile   |  3 +++
 arch/sparc/video/fbdev.c    | 24
 4 files changed, 33 insertions(+), 17 deletions(-)
 create mode 100644 arch/sparc/video/Makefile
 create mode 100644 arch/sparc/video/fbdev.c

diff --git a/arch/sparc/Makefile b/arch/sparc/Makefile
index a4ea5b05f288..95a9211e48e3 100644
--- a/arch/sparc/Makefile
+++ b/arch/sparc/Makefile
@@ -60,6 +60,7 @@ libs-y += arch/sparc/prom/
 libs-y += arch/sparc/lib/
 
 drivers-$(CONFIG_PM) += arch/sparc/power/
+drivers-$(CONFIG_FB) += arch/sparc/video/
 
 boot := arch/sparc/boot
diff --git a/arch/sparc/include/asm/fb.h b/arch/sparc/include/asm/fb.h
index f699962e9ddf..e4ef1955b2b6 100644
--- a/arch/sparc/include/asm/fb.h
+++ b/arch/sparc/include/asm/fb.h
@@ -1,12 +1,14 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _SPARC_FB_H_
 #define _SPARC_FB_H_
-#include 
-#include 
+
 #include 
+
 #include 
 #include 
 
+struct fb_info;
+
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
@@ -15,20 +17,6 @@ static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
 #endif
 }
 
-static inline int fb_is_primary_device(struct fb_info *info)
-{
-	struct device *dev = info->device;
-	struct device_node *node;
-
-	if (console_set_on_cmdline)
-		return 0;
-
-	node = dev->of_node;
-	if (node &&
-	    node == of_console_device)
-		return 1;
-
-	return 0;
-}
+int fb_is_primary_device(struct fb_info *info);
 
 #endif /* _SPARC_FB_H_ */
diff --git a/arch/sparc/video/Makefile b/arch/sparc/video/Makefile
new file mode 100644
index ..6baddbd58e4d
--- /dev/null
+++ b/arch/sparc/video/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_FB) += fbdev.o
diff --git a/arch/sparc/video/fbdev.c b/arch/sparc/video/fbdev.c
new file mode 100644
index ..dadd5799fbb3
--- /dev/null
+++ b/arch/sparc/video/fbdev.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+int fb_is_primary_device(struct fb_info *info)
+{
+	struct device *dev = info->device;
+	struct device_node *node;
+
+	if (console_set_on_cmdline)
+		return 0;
+
+	node = dev->of_node;
+	if (node && node == of_console_device)
+		return 1;
+
+	return 0;
+}
+EXPORT_SYMBOL(fb_is_primary_device);
-- 
2.40.0
[PATCH 14/18] arch/powerpc: Implement with generic helpers
Replace the architecture's fb_is_primary_device() with the generic one from . No functional changes.

Signed-off-by: Thomas Zimmermann
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Christophe Leroy
---
 arch/powerpc/include/asm/fb.h | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/fb.h b/arch/powerpc/include/asm/fb.h
index 6541ab77c5b9..5f1a2e5f7654 100644
--- a/arch/powerpc/include/asm/fb.h
+++ b/arch/powerpc/include/asm/fb.h
@@ -2,8 +2,8 @@
 #ifndef _ASM_FB_H_
 #define _ASM_FB_H_
 
-#include 
 #include 
+
 #include 
 
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
@@ -13,10 +13,8 @@ static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
 			vma->vm_end - vma->vm_start,
 			vma->vm_page_prot);
 }
+#define fb_pgprotect fb_pgprotect
 
-static inline int fb_is_primary_device(struct fb_info *info)
-{
-	return 0;
-}
+#include 
 
 #endif /* _ASM_FB_H_ */
-- 
2.40.0
[PATCH 12/18] arch/parisc: Implement fb_is_primary_device() under arch/parisc
Move PARISC's implementation of fb_is_primary_device() into the architecture directory. This is the place of the declaration and where other architectures implement this function. No functional changes.

Signed-off-by: Thomas Zimmermann
Cc: "James E.J. Bottomley"
Cc: Helge Deller
---
 arch/parisc/Makefile         |  2 ++
 arch/parisc/include/asm/fb.h |  2 +-
 arch/parisc/video/Makefile   |  3 +++
 arch/parisc/video/fbdev.c    | 27 +++
 drivers/video/sticore.c      | 19 ---
 include/video/sticore.h      |  2 ++
 6 files changed, 35 insertions(+), 20 deletions(-)
 create mode 100644 arch/parisc/video/Makefile
 create mode 100644 arch/parisc/video/fbdev.c

diff --git a/arch/parisc/Makefile b/arch/parisc/Makefile
index 0d049a6f6a60..968ebe17494c 100644
--- a/arch/parisc/Makefile
+++ b/arch/parisc/Makefile
@@ -119,6 +119,8 @@ export LIBGCC
 
 libs-y += arch/parisc/lib/ $(LIBGCC)
 
+drivers-y += arch/parisc/video/
+
 boot := arch/parisc/boot
 
 PALO := $(shell if (which palo 2>&1); then : ; \
diff --git a/arch/parisc/include/asm/fb.h b/arch/parisc/include/asm/fb.h
index 55d29c4f716e..0b9a38ced5c8 100644
--- a/arch/parisc/include/asm/fb.h
+++ b/arch/parisc/include/asm/fb.h
@@ -12,7 +12,7 @@ static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
 	pgprot_val(vma->vm_page_prot) |= _PAGE_NO_CACHE;
 }
 
-#if defined(CONFIG_FB_STI)
+#if defined(CONFIG_STI_CORE)
 int fb_is_primary_device(struct fb_info *info);
 #else
 static inline int fb_is_primary_device(struct fb_info *info)
diff --git a/arch/parisc/video/Makefile b/arch/parisc/video/Makefile
new file mode 100644
index ..16a73cce4661
--- /dev/null
+++ b/arch/parisc/video/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_STI_CORE) += fbdev.o
diff --git a/arch/parisc/video/fbdev.c b/arch/parisc/video/fbdev.c
new file mode 100644
index ..4a0ae08fc75b
--- /dev/null
+++ b/arch/parisc/video/fbdev.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2000 Philipp Rumpf
+ * Copyright (C) 2001-2020 Helge Deller
+ * Copyright (C) 2001-2002 Thomas Bogendoerfer
+ */
+
+#include 
+
+#include 
+
+#include 
+
+int fb_is_primary_device(struct fb_info *info)
+{
+	struct sti_struct *sti;
+
+	sti = sti_get_rom(0);
+
+	/* if no built-in graphics card found, allow any fb driver as default */
+	if (!sti)
+		return true;
+
+	/* return true if it's the default built-in framebuffer driver */
+	return (sti->info == info);
+}
+EXPORT_SYMBOL(fb_is_primary_device);
diff --git a/drivers/video/sticore.c b/drivers/video/sticore.c
index f8aaedea437d..7eb925f2ba9c 100644
--- a/drivers/video/sticore.c
+++ b/drivers/video/sticore.c
@@ -30,7 +30,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
@@ -1148,24 +1147,6 @@ int sti_call(const struct sti_struct *sti, unsigned long func,
 	return ret;
 }
 
-#if defined(CONFIG_FB_STI)
-/* check if given fb_info is the primary device */
-int fb_is_primary_device(struct fb_info *info)
-{
-	struct sti_struct *sti;
-
-	sti = sti_get_rom(0);
-
-	/* if no built-in graphics card found, allow any fb driver as default */
-	if (!sti)
-		return true;
-
-	/* return true if it's the default built-in framebuffer driver */
-	return (sti->info == info);
-}
-EXPORT_SYMBOL(fb_is_primary_device);
-#endif
-
 MODULE_AUTHOR("Philipp Rumpf, Helge Deller, Thomas Bogendoerfer");
 MODULE_DESCRIPTION("Core STI driver for HP's NGLE series graphics cards in HP PARISC machines");
 MODULE_LICENSE("GPL v2");
diff --git a/include/video/sticore.h b/include/video/sticore.h
index c0879352cde4..fbb78d7e7565 100644
--- a/include/video/sticore.h
+++ b/include/video/sticore.h
@@ -2,6 +2,8 @@
 #ifndef STICORE_H
 #define STICORE_H
 
+struct fb_info;
+
 /* generic STI structures & functions */
 
 #define MAX_STI_ROMS 4 /* max no. of ROMs which this driver handles */
-- 
2.40.0
[PATCH 04/18] arch/arm64: Implement with generic helpers
Replace the architecture's fb_is_primary_device() with the generic one from . No functional changes.

Signed-off-by: Thomas Zimmermann
Cc: Catalin Marinas
Cc: Will Deacon
---
 arch/arm64/include/asm/fb.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/fb.h b/arch/arm64/include/asm/fb.h
index bdc735ee1f67..fc31a5d1f48a 100644
--- a/arch/arm64/include/asm/fb.h
+++ b/arch/arm64/include/asm/fb.h
@@ -5,19 +5,17 @@
 #ifndef __ASM_FB_H_
 #define __ASM_FB_H_
 
-#include 
-#include 
 #include 
 
+struct file;
+
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
 				unsigned long off)
 {
 	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 }
+#define fb_pgprotect fb_pgprotect
 
-static inline int fb_is_primary_device(struct fb_info *info)
-{
-	return 0;
-}
+#include 
 
 #endif /* __ASM_FB_H_ */
-- 
2.40.0
[PATCH 15/18] arch/sh: Implement with generic helpers
Replace the architecture's fb_is_primary_device() with the generic one from . No functional changes.

Signed-off-by: Thomas Zimmermann
Cc: Yoshinori Sato
Cc: Rich Felker
Cc: John Paul Adrian Glaubitz
---
 arch/sh/include/asm/fb.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/sh/include/asm/fb.h b/arch/sh/include/asm/fb.h
index 9a0bca2686fd..1e7b1cfd5b5e 100644
--- a/arch/sh/include/asm/fb.h
+++ b/arch/sh/include/asm/fb.h
@@ -2,19 +2,17 @@
 #ifndef _ASM_FB_H_
 #define _ASM_FB_H_
 
-#include 
-#include 
 #include 
 
+struct file;
+
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
 				unsigned long off)
 {
 	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 }
+#define fb_pgprotect fb_pgprotect
 
-static inline int fb_is_primary_device(struct fb_info *info)
-{
-	return 0;
-}
+#include 
 
 #endif /* _ASM_FB_H_ */
-- 
2.40.0
[PATCH 18/18] arch/x86: Implement with generic helpers
Include and set the required preprocessor tokens correctly. x86 now implements its own set of fb helpers, but still follows the overall pattern.

Signed-off-by: Thomas Zimmermann
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
---
 arch/x86/include/asm/fb.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fb.h b/arch/x86/include/asm/fb.h
index ab4c960146e3..a3fb801f12f1 100644
--- a/arch/x86/include/asm/fb.h
+++ b/arch/x86/include/asm/fb.h
@@ -2,10 +2,11 @@
 #ifndef _ASM_X86_FB_H
 #define _ASM_X86_FB_H
 
-#include 
-#include 
 #include 
 
+struct fb_info;
+struct file;
+
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
 				unsigned long off)
 {
@@ -16,7 +17,11 @@ static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
 	pgprot_val(vma->vm_page_prot) =
 		prot | cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS);
 }
+#define fb_pgprotect fb_pgprotect
+
+int fb_is_primary_device(struct fb_info *info);
+#define fb_is_primary_device fb_is_primary_device
 
-extern int fb_is_primary_device(struct fb_info *info);
+#include 
 
 #endif /* _ASM_X86_FB_H */
-- 
2.40.0
[PATCH 06/18] arch/loongarch: Implement with generic helpers
Replace the architecture's fb_is_primary_device() with the generic one from . No functional changes.

Signed-off-by: Thomas Zimmermann
Cc: Huacai Chen
Cc: WANG Xuerui
---
 arch/loongarch/include/asm/fb.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/loongarch/include/asm/fb.h b/arch/loongarch/include/asm/fb.h
index 3116bde8772d..d1c9dd1c6e2e 100644
--- a/arch/loongarch/include/asm/fb.h
+++ b/arch/loongarch/include/asm/fb.h
@@ -5,19 +5,17 @@
 #ifndef _ASM_FB_H_
 #define _ASM_FB_H_
 
-#include 
-#include 
 #include 
 
+struct file;
+
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
 				unsigned long off)
 {
 	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
 }
+#define fb_pgprotect fb_pgprotect
 
-static inline int fb_is_primary_device(struct fb_info *info)
-{
-	return 0;
-}
+#include 
 
 #endif /* _ASM_FB_H_ */
-- 
2.40.0
[PATCH 07/18] arch/m68k: Implement with generic helpers
Replace the architecture's fb_is_primary_device() with the generic one from . No functional changes. Also use the generic helper for fb_pgprotect() on systems without MMU.

Signed-off-by: Thomas Zimmermann
Cc: Geert Uytterhoeven
---
 arch/m68k/include/asm/fb.h | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/m68k/include/asm/fb.h b/arch/m68k/include/asm/fb.h
index b86c6e2e26dd..f15a14e36826 100644
--- a/arch/m68k/include/asm/fb.h
+++ b/arch/m68k/include/asm/fb.h
@@ -2,8 +2,8 @@
 #ifndef _ASM_FB_H_
 #define _ASM_FB_H_
 
-#include 
 #include 
+
 #include 
 #include 
 
@@ -27,13 +27,9 @@
 	}
 }
 #endif /* CONFIG_SUN3 */
-#else
-#define fb_pgprotect(...) do {} while (0)
+#define fb_pgprotect fb_pgprotect
 #endif /* CONFIG_MMU */
 
-static inline int fb_is_primary_device(struct fb_info *info)
-{
-	return 0;
-}
+#include 
 
 #endif /* _ASM_FB_H_ */
-- 
2.40.0