Re: drm/bridge: anx7625: Set downstream sink into normal status
Reviewed-by: Pin-Yen Lin

On Wed, Mar 2, 2022 at 8:09 PM Xin Ji wrote:
>
> As the downstream sink is put into standby mode when the bridge is
> disabled, this patch sets the downstream sink back to normal status
> when the bridge is enabled.
>
> Signed-off-by: Xin Ji
> ---
>  drivers/gpu/drm/bridge/analogix/anx7625.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.c b/drivers/gpu/drm/bridge/analogix/anx7625.c
> index 9aab879a8851..963eaf73ecab 100644
> --- a/drivers/gpu/drm/bridge/analogix/anx7625.c
> +++ b/drivers/gpu/drm/bridge/analogix/anx7625.c
> @@ -919,12 +919,20 @@ static void anx7625_dp_start(struct anx7625_data *ctx)
>  {
>  	int ret;
>  	struct device *dev = &ctx->client->dev;
> +	u8 data;
>
>  	if (!ctx->display_timing_valid) {
>  		DRM_DEV_ERROR(dev, "mipi not set display timing yet.\n");
>  		return;
>  	}
>
> +	dev_info(dev, "set downstream sink into normal\n");
> +	/* Downstream sink enter into normal mode */
> +	data = 1;
> +	ret = anx7625_aux_trans(ctx, DP_AUX_NATIVE_WRITE, 0x000600, 1, &data);
> +	if (ret < 0)
> +		dev_err(dev, "IO error : set sink into normal mode fail\n");
> +
>  	/* Disable HDCP */
>  	anx7625_write_and(ctx, ctx->i2c.rx_p1_client, 0xee, 0x9f);
[PULL] drm-intel-gt-next
Hi Dave & Daniel, Here is the last feature PR for v5.18. For new platforms we have got more DG2 enabling: small BAR foundations, 64K page support and accelerated migration. For XeHP SDV we've got flat CCS detection and compute command streamer being added. Disabling i915 build on PREEMPT_RT for now due to deadlocks and warnings. Fixes to GuC data structure accesses on ARM platforms. A couple of other GuC init and SLPC fixes. Then the usual bits of cleanup and smaller fixes. Regards, Joonas *** drm-intel-gt-next-2022-03-03: Cross-subsystem Changes: - drm-next backmerge for buddy allocator changes Driver Changes: - Skip i915_perf init for DG2 as it is not yet enabled (Ram) - Add missing workarounds for DG2 (Clint) - Add 64K page/align support for platforms like DG2 that require it (Matt A, Ram, Bob) - Add accelerated migration support for DG2 (Matt A) - Add flat CCS support for XeHP SDV (Abdiel, Ram) - Add Compute Command Streamer (CCS) engine support for XeHP SDV (Michel, Daniele, Aravind, Matt R) - Don't support parallel submission on compute / render (Matt B, Matt R) - Disable i915 build on PREEMPT_RT until RT behaviour fixed (Sebastian) - Remove RPS interrupt support for TGL+ (Jose) - Fix S/R with PM_EARLY for non-GTT mappable objects on DG2 (Matt, Lucas) - Skip stolen memory init if it is fully reserved (Jose) - Use iosys_map for GuC data structures that may be in LMEM BAR or SMEM (Lucas) - Do not complain about stale GuC reset notifications for banned contexts (John) - Move context descriptor fields to intel_lrc.h (Matt R) - Start adding support for small BAR (Matt A) - Clarify vma lifetime (Thomas) - Simplify subplatform detection on TGL (Jose) - Correct the param count for unset GuC SLPC param (Vinay, Umesh) - Read RP_STATE_CAP correctly on Gen12 with GuC SLPC (Vinay) - Initialize GuC submission locks and queues early (Daniele) - Fix GuC flag query helper function to not modify state (John) - Drop fake lmem support now we have real hardware available (Lucas) 
- Move misplaced W/A to their correct locations (Srinivasan) - Use get_reset_domain() helper (Tejas) - Move context descriptor fields to intel_lrc.h (Matt R) - Selftest improvements (Matt A) The following changes since commit 54f43c17d681f6d9523fcfaeefc9df77993802e1: Merge tag 'drm-misc-next-2022-02-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-next (2022-02-25 05:50:18 +1000) are available in the Git repository at: git://anongit.freedesktop.org/drm/drm-intel tags/drm-intel-gt-next-2022-03-03 for you to fetch changes up to b2006061ae28fe7e84af6c9757ee89c4e505e92b: drm/i915/xehpsdv: Move render/compute engine reset domains related workarounds (2022-03-02 06:52:42 -0800) Cross-subsystem Changes: - drm-next backmerge for buddy allocator changes Driver Changes: - Skip i915_perf init for DG2 as it is not yet enabled (Ram) - Add missing workarounds for DG2 (Clint) - Add 64K page/align support for platforms like DG2 that require it (Matt A, Ram, Bob) - Add accelerated migration support for DG2 (Matt A) - Add flat CCS support for XeHP SDV (Abdiel, Ram) - Add Compute Command Streamer (CCS) engine support for XeHP SDV (Michel, Daniele, Aravind, Matt R) - Don't support parallel submission on compute / render (Matt B, Matt R) - Disable i915 build on PREEMPT_RT until RT behaviour fixed (Sebastian) - Remove RPS interrupt support for TGL+ (Jose) - Fix S/R with PM_EARLY for non-GTT mappable objects on DG2 (Matt, Lucas) - Skip stolen memory init if it is fully reserved (Jose) - Use iosys_map for GuC data structures that may be in LMEM BAR or SMEM (Lucas) - Do not complain about stale GuC reset notifications for banned contexts (John) - Move context descriptor fields to intel_lrc.h - Start adding support for small BAR (Matt A) - Clarify vma lifetime (Thomas) - Simplify subplatform detection on TGL (Jose) - Correct the param count for unset GuC SLPC param (Vinay, Umesh) - Read RP_STATE_CAP correctly on Gen12 with GuC SLPC (Vinay) - Initialize GuC submission locks and 
queues early (Daniele) - Fix GuC flag query helper function to not modify state (John) - Drop fake lmem support now we have real hardware available (Lucas) - Move misplaced W/A to their correct locations (Srinivasan) - Use get_reset_domain() helper (Tejas) - Move context descriptor fields to intel_lrc.h (Matt R) - Selftest improvements (Matt A) Abdiel Janulgue (1): drm/i915/lmem: Enable lmem for platforms with Flat CCS CQ Tang (1): drm/i915/xehpsdv: Add has_flat_ccs to device info Clint Taylor (1): drm/i915/dg2: add Wa_14014947963 Daniele Ceraolo Spurio (4): drm/i915/guc: Initialize GuC submission locks and queues early drm/i915/xehp: compute engine pipe_control drm/i915/xehp/guc: enable compute engine inside GuC drm/i915/xehp: handle fused off CCS engines John Harrison (2):
Re: [PATCH v2 1/4] drm/i915/gt: Clear compress metadata for Xe_HP platforms
On Wed, 2022-03-02 at 03:23 +0530, Ramalingam C wrote: > From: Ayaz A Siddiqui > > Xe-HP and latest devices support Flat CCS which reserved a portion of > the device memory to store compression metadata, during the clearing > of > device memory buffer object we also need to clear the associated > CCS buffer. > > Flat CCS memory can not be directly accessed by S/W. > Address of CCS buffer associated main BO is automatically calculated > by device itself. KMD/UMD can only access this buffer indirectly > using > XY_CTRL_SURF_COPY_BLT cmd via the address of device memory buffer. > > v2: Fixed issues with platform naming [Lucas] > v3: Rebased [Ram] > Used the round_up funcs [Bob] > v4: Fixed ccs blk calculation [Ram] > Added Kdoc on flat-ccs. > v5: GENMASK is used [Matt] > mocs fix [Matt] > Comments Fix [Matt] > Flush address programming [Ram] > v6: FLUSH_DW is fixed > Few coding style fix > > Signed-off-by: Ayaz A Siddiqui > Signed-off-by: Ramalingam C > --- > drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 15 ++ > drivers/gpu/drm/i915/gt/intel_migrate.c | 143 > ++- > 2 files changed, 154 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h > b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h > index f8253012d166..237c1baccc64 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h > +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h > @@ -203,6 +203,21 @@ > #define GFX_OP_DRAWRECT_INFO > ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3)) > #define GFX_OP_DRAWRECT_INFO_I965 ((0x7900<<16)|0x2) > > +#define XY_CTRL_SURF_INSTR_SIZE5 > +#define MI_FLUSH_DW_SIZE 3 > +#define XY_CTRL_SURF_COPY_BLT ((2 << 29) | (0x48 << 22) | > 3) > +#define SRC_ACCESS_TYPE_SHIFT21 > +#define DST_ACCESS_TYPE_SHIFT20 > +#define CCS_SIZE_MASKGENMASK(17, 8) > +#define XY_CTRL_SURF_MOCS_MASK GENMASK(31, 25) > +#define NUM_CCS_BYTES_PER_BLOCK 256 > +#define NUM_BYTES_PER_CCS_BYTE 256 > +#define NUM_CCS_BLKS_PER_XFER1024 > +#define INDIRECT_ACCESS 0 > +#define 
DIRECT_ACCESS1 > +#define MI_FLUSH_LLC BIT(9) > +#define MI_FLUSH_CCS BIT(16) > + > #define COLOR_BLT_CMD (2 << 29 | 0x40 << 22 | (5 - > 2)) > #define XY_COLOR_BLT_CMD (2 << 29 | 0x50 << 22) > #define SRC_COPY_BLT_CMD (2 << 29 | 0x43 << 22) > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c > b/drivers/gpu/drm/i915/gt/intel_migrate.c > index 20444d6ceb3c..330fcdc3e0cf 100644 > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c > @@ -16,6 +16,8 @@ struct insert_pte_data { > }; > > #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */ > +#define GET_CCS_BYTES(i915, size) (HAS_FLAT_CCS(i915) ? \ > + DIV_ROUND_UP(size, > NUM_BYTES_PER_CCS_BYTE) : 0) > > static bool engine_supports_migration(struct intel_engine_cs > *engine) > { > @@ -467,6 +469,110 @@ static bool wa_1209644611_applies(int ver, u32 > size) > return height % 4 == 3 && height <= 8; > } > > +/** > + * DOC: Flat-CCS - Memory compression for Local memory > + * > + * On Xe-HP and later devices, we use dedicated compression control > state (CCS) > + * stored in local memory for each surface, to support the 3D and > media > + * compression formats. > + * > + * The memory required for the CCS of the entire local memory is > 1/256 of the > + * local memory size. So before the kernel boot, the required memory > is reserved > + * for the CCS data and a secure register will be programmed with > the CCS base > + * address. > + * > + * Flat CCS data needs to be cleared when a lmem object is > allocated. > + * And CCS data can be copied in and out of CCS region through > + * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly. > + * > + * When we exhaust the lmem, if the object's placements support > smem, then we can > + * directly decompress the compressed lmem object into smem and > start using it > + * from smem itself. 
> + * > + * But when we need to swapout the compressed lmem object into a > smem region > + * though objects' placement doesn't support smem, then we copy the > lmem content > + * as it is into smem region along with ccs data (using > XY_CTRL_SURF_COPY_BLT). > + * When the object is referred, lmem content will be swaped in along > with > + * restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at > corresponding > + * location. > + */ > + > +static inline u32 *i915_flush_dw(u32 *cmd, u32 flags) > +{ > + *cmd++ = MI_FLUSH_DW | flags; > + *cmd++ = 0; > + *cmd++ = 0; > + > + return cmd; > +} > + > +static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, > int size) > +{ > + u32 num_cmds, num_blks, total_s
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
> On 3. Mar 2022, at 05:58, David Laight wrote:
>
> From: Xiaomeng Tong
>> Sent: 03 March 2022 02:27
>>
>> On Wed, 2 Mar 2022 14:04:06 +, David Laight wrote:
>>> I think that it would be better to make any alternate loop macro
>>> just set the variable to NULL on the loop exit.
>>> That is easier to code for and the compiler might be persuaded to
>>> not redo the test.
>>
>> No, that would lead to a NULL dereference.
>
> Why? It would make it the same as the 'easy to use':
> 	for (item = head; item; item = item->next) {
> 		...
> 		if (...)
> 			break;
> 		...
> 	}
> 	if (!item)
> 		return;
>
>> The problem is the mis-use of the iterator outside the loop on exit, and
>> the iterator will be the HEAD's container_of pointer, which points
>> to a type-confused struct. Sidenote: the *mis-use* here refers to
>> mistakenly accessing other members of the struct, instead of the
>> list_head member which actually is the valid HEAD.
>
> The problem is that the HEAD's container_of pointer should never
> be calculated at all.
> This is what is fundamentally broken about the current definition.
>
>> IOW, you would dereference a (NULL + offset_of_member) address here.
>
> Where?
>
>> Please remind me if I missed something, thanks.
>>
>> Can you share your "alternative definitions" details? Thanks!
>
> The loop should probably use an extra variable that points
> to the 'list node' in the next structure.
> Something like:
> 	for (xxx *iter = head->next;
> 	     iter == &head ? ((item = NULL), 0) : ((item = list_item(iter), 1));
> 	     iter = item->member->next) {
> 		...
> With a bit of casting you can use 'item' to hold 'iter'.

I think this would make sense, it would mean you only assign the containing
element on valid elements. I was thinking something along the lines of:

#define list_for_each_entry(pos, head, member)					\
	for (struct list_head *list = head->next, typeof(pos) pos;		\
	     list == head ? 0 : ((pos = list_entry(list, typeof(*pos), member)), 1); \
	     list = list->next)

Although the initialization block of the for loop is not valid C, I'm not
sure there is any way to declare two variables of a different type in the
initialization part of the loop.

I believe all this does is get rid of the &pos->member == (head) check to
terminate the list. It alone will not fix any of the other issues that
using the iterator variable after the loop currently has.

AFAIK Adrián Moreno is working on doing something along those lines for
the list iterator in openvswitch (that was similar to the kernel one
before) [1]. I *think* they don't declare 'pos' within the loop, which we
*do* want, to avoid any uses of it after the loop. (If pos is not declared
in the initialization block, shadowing the *outer* pos, it would just
point to the last element of the list or stay uninitialized if the list
is empty.)

[1] https://www.mail-archive.com/ovs-dev@openvswitch.org/msg63497.html

>
>>
>>> OTOH there may be alternative definitions that can be used to get
>>> the compiler (or other compiler-like tools) to detect broken code.
>>> Even if the definition can't possibly generate a working kernel.
>>
>> The "list_for_each_entry_inside(pos, type, head, member)" way makes
>> the iterator invisible outside the loop, and would be caught by the
>> compiler if use-after-loop things happened.
>
> It is also a complete PITA for anything doing a search.
>
> 	David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> -

Jakob
Re: linux-next: build failure after merge of the kspp tree
Hi all,

OK, today I used the current kspp tree and added three patches as merge fixups:

https://lore.kernel.org/all/20220228081421.1504213-1-hsi...@chromium.org/
https://cgit.freedesktop.org/drm/drm-misc/commit/?id=d4da1f27396fb1dde079447a3612f4f512caed07
https://cgit.freedesktop.org/drm/drm-misc/commit/?id=a2151490cc6c57b368d7974ffd447a8b36ade639

(the second one required a bit of fuzzing)

I will remove them as they appear in the drm trees (hopefully).

Kees, you just need to remember that you have a dependency on the DRM
patches being in Linus' tree before you send your pull request.

--
Cheers,
Stephen Rothwell
Re: [PATCH -next] drm/amdgpu: clean up some inconsistent indenting
On 03.03.22 at 02:52, Yang Li wrote:

Eliminate the following smatch warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c:413 amdgpu_get_xgmi_hive()
warn: inconsistent indenting

Reported-by: Abaci Robot
Signed-off-by: Yang Li

Reviewed-by: Christian König

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 91817a31f3e1..4ff6e06babca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -410,14 +410,14 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
 	 */
 	if (adev->reset_domain->type != XGMI_HIVE) {
 		hive->reset_domain = amdgpu_reset_create_reset_domain(XGMI_HIVE, "amdgpu-reset-hive");
-			if (!hive->reset_domain) {
-				dev_err(adev->dev, "XGMI: failed initializing reset domain for xgmi hive\n");
-				ret = -ENOMEM;
-				kobject_put(&hive->kobj);
-				kfree(hive);
-				hive = NULL;
-				goto pro_end;
-			}
+		if (!hive->reset_domain) {
+			dev_err(adev->dev, "XGMI: failed initializing reset domain for xgmi hive\n");
+			ret = -ENOMEM;
+			kobject_put(&hive->kobj);
+			kfree(hive);
+			hive = NULL;
+			goto pro_end;
+		}
 	} else {
 		amdgpu_reset_get_reset_domain(adev->reset_domain);
 		hive->reset_domain = adev->reset_domain;
Re: [PATCH] drm/msm/gpu: Fix crash on devices without devfreq support
On Wed, 23 Feb 2022 at 18:46, Rob Clark wrote: > > On Tue, Feb 22, 2022 at 7:11 PM Dmitry Baryshkov > wrote: > > > > On 19/02/2022 21:33, Rob Clark wrote: > > > From: Rob Clark > > > > > > Avoid going down devfreq paths on devices where devfreq is not > > > initialized. > > > > > > Reported-by: Linux Kernel Functional Testing > > > Reported-by: Anders Roxell > > > Signed-off-by: Rob Clark > > > --- > > > drivers/gpu/drm/msm/msm_gpu_devfreq.c | 31 +-- > > > 1 file changed, 25 insertions(+), 6 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c > > > b/drivers/gpu/drm/msm/msm_gpu_devfreq.c > > > index 9bf319be11f6..26a3669a97b3 100644 > > > --- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c > > > +++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c > > > @@ -83,12 +83,17 @@ static struct devfreq_dev_profile msm_devfreq_profile > > > = { > > > static void msm_devfreq_boost_work(struct kthread_work *work); > > > static void msm_devfreq_idle_work(struct kthread_work *work); > > > > > > +static bool has_devfreq(struct msm_gpu *gpu) > > > +{ > > > + return !!gpu->funcs->gpu_busy; > > > > I see that devfreq init will be skipped if gpu_busy is NULL. > > Can we use gpu->devfreq instead of this condition? > > We could, but then we couldn't also use the same has_devfreq() helper > in msm_devfreq_init(). I thought it was clearer to use the same > helper everywhere. Well... It is not clear at first glance how gpu_busy is related to devfreq. On the other hand, if gpu->devfreq is NULL, it's obvious that devfreq is not initialized. I'd propose to use if (gpu->funcs->gpu_busy) to check if we can init devfreq and after that to check (gpu->devfreq) as a way to know whether the devfreq is available. > > > I noticed that you have replaced some of gpu->devfreq checks with > > has_devreq() calls. Is there any difference? > > It amounts to the same thing because if you don't have gpu_busy, then > devfreq is never initialized. 
I just thought it clearer to use the > same check in all places. See my comment above. > > BR, > -R > > > > +} > > > + > > > void msm_devfreq_init(struct msm_gpu *gpu) > > > { > > > struct msm_gpu_devfreq *df = &gpu->devfreq; > > > > > > /* We need target support to do devfreq */ > > > - if (!gpu->funcs->gpu_busy) > > > + if (!has_devfreq(gpu)) > > > return; > > > > > > dev_pm_qos_add_request(&gpu->pdev->dev, &df->idle_freq, > > > @@ -149,6 +154,9 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu) > > > { > > > struct msm_gpu_devfreq *df = &gpu->devfreq; > > > > > > + if (!has_devfreq(gpu)) > > > + return; > > > + > > > devfreq_cooling_unregister(gpu->cooling); > > > dev_pm_qos_remove_request(&df->boost_freq); > > > dev_pm_qos_remove_request(&df->idle_freq); > > > @@ -156,16 +164,24 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu) > > > > > > void msm_devfreq_resume(struct msm_gpu *gpu) > > > { > > > - gpu->devfreq.busy_cycles = 0; > > > - gpu->devfreq.time = ktime_get(); > > > + struct msm_gpu_devfreq *df = &gpu->devfreq; > > > > > > - devfreq_resume_device(gpu->devfreq.devfreq); > > > + if (!has_devfreq(gpu)) > > > + return; > > > + > > > + df->busy_cycles = 0; > > > + df->time = ktime_get(); > > > + > > > + devfreq_resume_device(df->devfreq); > > > } > > > > > > void msm_devfreq_suspend(struct msm_gpu *gpu) > > > { > > > struct msm_gpu_devfreq *df = &gpu->devfreq; > > > > > > + if (!has_devfreq(gpu)) > > > + return; > > > + > > > devfreq_suspend_device(df->devfreq); > > > > > > cancel_idle_work(df); > > > @@ -185,6 +201,9 @@ void msm_devfreq_boost(struct msm_gpu *gpu, unsigned > > > factor) > > > struct msm_gpu_devfreq *df = &gpu->devfreq; > > > uint64_t freq; > > > > > > + if (!has_devfreq(gpu)) > > > + return; > > > + > > > freq = get_freq(gpu); > > > freq *= factor; > > > > > > @@ -207,7 +226,7 @@ void msm_devfreq_active(struct msm_gpu *gpu) > > > struct devfreq_dev_status status; > > > unsigned int idle_time; > > > > > > - if (!df->devfreq) > > > + if 
(!has_devfreq(gpu)) > > > return; > > > > > > /* > > > @@ -253,7 +272,7 @@ void msm_devfreq_idle(struct msm_gpu *gpu) > > > { > > > struct msm_gpu_devfreq *df = &gpu->devfreq; > > > > > > - if (!df->devfreq) > > > + if (!has_devfreq(gpu)) > > > return; > > > > > > msm_hrtimer_queue_work(&df->idle_work, ms_to_ktime(1), > > > > > > -- > > With best wishes > > Dmitry -- With best wishes Dmitry
Re: [PATCH v2 2/3] drm/i915: Remove the vma refcount
On Wed, 2022-03-02 at 14:01 -0800, Niranjana Vishwanathapura wrote: > On Wed, Mar 02, 2022 at 11:21:59AM +0100, Thomas Hellström wrote: > > Now that i915_vma_parked() is taking the object lock on vma > > destruction, > > and the only user of the vma refcount, i915_gem_object_unbind() > > also takes the object lock, remove the vma refcount. > > > > Signed-off-by: Thomas Hellström > > --- > > drivers/gpu/drm/i915/i915_gem.c | 17 + > > drivers/gpu/drm/i915/i915_vma.c | 14 +++--- > > drivers/gpu/drm/i915/i915_vma.h | 14 -- > > drivers/gpu/drm/i915/i915_vma_types.h | 1 - > > 4 files changed, 16 insertions(+), 30 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c > > b/drivers/gpu/drm/i915/i915_gem.c > > index dd84ebabb50f..c26110abcc0b 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -151,14 +151,25 @@ int i915_gem_object_unbind(struct > > drm_i915_gem_object *obj, > > break; > > } > > > > + /* > > + * Requiring the vm destructor to take the object > > lock > > + * before destroying a vma would help us eliminate > > the > > + * i915_vm_tryget() here, AND thus also the barrier > > stuff > > + * at the end. That's an easy fix, but sleeping > > locks in > > + * a kthread should generally be avoided. > > + */ > > ret = -EAGAIN; > > if (!i915_vm_tryget(vma->vm)) > > break; > > > > - /* Prevent vma being freed by i915_vma_parked as we > > unbind */ > > - vma = __i915_vma_get(vma); > > spin_unlock(&obj->vma.lock); > > > > + /* > > + * Since i915_vma_parked() takes the object lock > > + * before vma destruction, it won't race us here, > > + * and destroy the vma from under us. 
> > + */ > > + > > if (vma) { > > bool vm_trylock = !!(flags & > > I915_GEM_OBJECT_UNBIND_VM_TRYLOCK); > > ret = -EBUSY; > > @@ -180,8 +191,6 @@ int i915_gem_object_unbind(struct > > drm_i915_gem_object *obj, > > ret = i915_vma_unbind(vma); > > } > > } > > - > > - __i915_vma_put(vma); > > } > > > > i915_vm_put(vma->vm); > > One issue still left in i915_gem_object_unbind is that it temporarily > removes > vmas from the obj->vma.list and adds back later as vma needs to be > unbind outside > the obj->vma.lock spinlock. This is an issue for other places where > we iterate > over the obj->vma.list. i915_debugfs_describe_obj is one such case > (upcoming > vm_bind will be another) that iterates over this list. > What is the plan here? Do we need to take object lock while iterating > over the > list? Yeah, I guess that's an option if that's at all possible (we might need to iterate over the list in the mmu notifier, for example). The other option is to *) get rid of the GGTT / PPGTT sorting of vmas in the list, *) being able to determine per vma *before we unlock* if we need to unlock the list spinlock to take action, *) re-add all vmas we've previously iterated over at the *tail* of the list before unlocking the list lock. Then a termination criterion for iterating would be that we reached the end of the list without unlocking. Otherwise we need to restart iteration after unlocking. This would typically give us O(2N) complexity for the iteration. If we re-add at the *head* of the list, we'd see O(N²), but to be able to re- add previous vmas to the tail of the list requires us to get rid of the sorting. > > But this just something I noticed and not related to this patch. > This patch looks good to me. > Reviewed-by: Niranjana Vishwanathapura > > Thanks for reviewing. I noticed there is some documentation needing updating as well, so I'll send out a v3 without functional changes. 
/Thomas > > > diff --git a/drivers/gpu/drm/i915/i915_vma.c > > b/drivers/gpu/drm/i915/i915_vma.c > > index 91538bc38110..6fd25b39748f 100644 > > --- a/drivers/gpu/drm/i915/i915_vma.c > > +++ b/drivers/gpu/drm/i915/i915_vma.c > > @@ -122,7 +122,6 @@ vma_create(struct drm_i915_gem_object *obj, > > if (vma == NULL) > > return ERR_PTR(-ENOMEM); > > > > - kref_init(&vma->ref); > > vma->ops = &vm->vma_ops; > > vma->obj = obj; > > vma->size = obj->base.size; > > @@ -1628,15 +1627,6 @@ void i915_vma_reopen(struct i915_vma *vma) > > __i915_vma_remove_closed(vma); > > } > > > > -void i915_vma_release(struct kref *ref) > > -{ > > - struct i915_vma *vma = container_of(ref, typeof(*vma), > > ref); > > - > > - i915_active_fini(&vma->active); > > -
Re: [PATCH] drm: Fix no previous prototype error in drm_nomodeset.c
On Wed, Mar 02, 2022 at 10:39:02PM +0530, Aashish Sharma wrote:
> Fix this kernel test robot error:
>
> drivers/gpu/drm/drm_nomodeset.c:8:6: error:
> no previous prototype for 'drm_firmware_drivers_only'
>
> Include drm_drv.h, which contains the declaration of
> drm_firmware_drivers_only(), in drm_nomodeset.c.
>
> Signed-off-by: Aashish Sharma

Reviewed-by: Guenter Roeck

> ---
>  drivers/gpu/drm/drm_nomodeset.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/drm_nomodeset.c b/drivers/gpu/drm/drm_nomodeset.c
> index f3978d5bd3a1..9402deb4985f 100644
> --- a/drivers/gpu/drm/drm_nomodeset.c
> +++ b/drivers/gpu/drm/drm_nomodeset.c
> @@ -2,6 +2,7 @@
>
>  #include
>  #include
> +#include <drm/drm_drv.h>
>
>  static bool drm_nomodeset;
>
> --
> 2.35.1.574.g5d30c73bfb-goog
>
Re: [PATCH 15/15] drm/i915/gt: Clear compress metadata for Xe_HP platforms
On Sun, Feb 27, 2022 at 10:22:20PM +0530, Ramalingam C wrote: > Matt, > > Thanks for the review. > > On 2022-02-18 at 17:47:22 -0800, Matt Roper wrote: > > On Sat, Feb 19, 2022 at 12:17:52AM +0530, Ramalingam C wrote: > > > From: Ayaz A Siddiqui > > > > > > Xe-HP and latest devices support Flat CCS which reserved a portion of > > > the device memory to store compression metadata, during the clearing of > > > device memory buffer object we also need to clear the associated > > > CCS buffer. > > > > > > Flat CCS memory can not be directly accessed by S/W. > > > Address of CCS buffer associated main BO is automatically calculated > > > by device itself. KMD/UMD can only access this buffer indirectly using > > > XY_CTRL_SURF_COPY_BLT cmd via the address of device memory buffer. > > > > > > v2: Fixed issues with platform naming [Lucas] > > > v3: Rebased [Ram] > > > Used the round_up funcs [Bob] > > > v4: Fixed ccs blk calculation [Ram] > > > Added Kdoc on flat-ccs. > > > > > > Cc: CQ Tang > > > Signed-off-by: Ayaz A Siddiqui > > > Signed-off-by: Ramalingam C > > > --- > > > drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 15 ++ > > > drivers/gpu/drm/i915/gt/intel_migrate.c | 145 ++- > > > 2 files changed, 156 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h > > > b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h > > > index f8253012d166..166de5436c4a 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h > > > +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h > > > @@ -203,6 +203,21 @@ > > > #define GFX_OP_DRAWRECT_INFO ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3)) > > > #define GFX_OP_DRAWRECT_INFO_I965 ((0x7900<<16)|0x2) > > > > > > +#define XY_CTRL_SURF_INSTR_SIZE 5 > > > +#define MI_FLUSH_DW_SIZE 3 > > > +#define XY_CTRL_SURF_COPY_BLT((2 << 29) | (0x48 << 22) | 3) > > > +#define SRC_ACCESS_TYPE_SHIFT 21 > > > +#define DST_ACCESS_TYPE_SHIFT 20 > > > +#define CCS_SIZE_SHIFT 8 > > > > Rather than using a shift, it might 
be better to just define the > > bitfield. E.g., > > > > #define CCS_SIZEGENMASK(17, 8) > > > > and then later > > > > FIELD_PREP(CCS_SIZE, i - 1) > > > > to refer to the proper value. > > > > > +#define XY_CTRL_SURF_MOCS_SHIFT25 > > > > Same here; we can use GENMASK(31, 25) to define the field. > > Adapting to the GENMASK and FIELD_PREP for these two macros > > > > > +#define NUM_CCS_BYTES_PER_BLOCK256 > > > +#define NUM_BYTES_PER_CCS_BYTE 256 > > > +#define NUM_CCS_BLKS_PER_XFER 1024 > > > +#define INDIRECT_ACCESS0 > > > +#define DIRECT_ACCESS 1 > > > +#define MI_FLUSH_LLCBIT(9) > > > +#define MI_FLUSH_CCSBIT(16) > > > + > > > #define COLOR_BLT_CMD(2 << 29 | 0x40 << 22 | (5 - 2)) > > > #define XY_COLOR_BLT_CMD (2 << 29 | 0x50 << 22) > > > #define SRC_COPY_BLT_CMD (2 << 29 | 0x43 << 22) > > > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c > > > b/drivers/gpu/drm/i915/gt/intel_migrate.c > > > index 20444d6ceb3c..9f9cd2649377 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c > > > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c > > > @@ -16,6 +16,8 @@ struct insert_pte_data { > > > }; > > > > > > #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */ > > > +#define GET_CCS_BYTES(i915, size)(HAS_FLAT_CCS(i915) ? \ > > > + DIV_ROUND_UP(size, > > > NUM_BYTES_PER_CCS_BYTE) : 0) > > > > > > static bool engine_supports_migration(struct intel_engine_cs *engine) > > > { > > > @@ -467,6 +469,113 @@ static bool wa_1209644611_applies(int ver, u32 size) > > > return height % 4 == 3 && height <= 8; > > > } > > > > > > +/** > > > + * DOC: Flat-CCS - Memory compression for Local memory > > > + * > > > + * On Xe-HP and later devices, we use dedicated compression control > > > state (CCS) > > > + * stored in local memory for each surface, to support the 3D and media > > > + * compression formats. > > > + * > > > + * The memory required for the CCS of the entire local memory is 1/256 > > > of the > > > + * local memory size. 
So before the kernel boot, the required memory is > > > reserved > > > + * for the CCS data and a secure register will be programmed with the > > > CCS base > > > + * address. > > > + * > > > + * Flat CCS data needs to be cleared when a lmem object is allocated. > > > + * And CCS data can be copied in and out of CCS region through > > > + * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly. > > > + * > > > + * When we exaust the lmem, if the object's placements support smem, > > > then we can > > > > Typo: exhaust > > > > > + * directly decompress the compressed lmem object into smem and start > > > using it > > > + * from smem itself. > > > + * > > > + * But when
Re: Report 2 in ext4 and journal based on v5.17-rc1
Ted wrote:
> On Thu, Mar 03, 2022 at 10:00:33AM +0900, Byungchul Park wrote:
> >
> > Unfortunately, it's neither perfect nor safe without another wakeup
> > source - a rescue wakeup source.
> >
> >	consumer			producer
> >
> >					lock L
> >					(too much work queued == true)
> >					unlock L
> >					--- preempted
> >	lock L
> >	unlock L
> >	do work
> >	lock L
> >	unlock L
> >	do work
> >	...
> >	(no work == true)
> >	sleep
> >					--- scheduled in
> >					sleep
>
> That's not how things work in ext4. It's **way** more complicated

You seem to have gotten it wrong. This example is what Jan Kara gave me.
I just tried to explain things based on Jan Kara's example, so I am
leaving all the statements that Jan Kara wrote. Plus, the example was
very helpful. Thanks, Jan Kara.

> than that. We have multiple wait channels: one wakes up the consumer
> (read: the commit thread), and one wakes up any processes
> waiting for the commit thread to have made forward progress. We also have
> two spin-lock protected sequence numbers, one which indicates the
> current committed transaction #, and one indicating the transaction #
> that needs to be committed.
>
> On the commit thread, it will sleep on j_wait_commit, and when it is
> woken up, it will check to see if there is work to be done
> (j_commit_sequence != j_commit_request), and if so, do the work, and
> then wake up processes waiting on the wait queue j_wait_done_commit.
> (Again, all of this uses the pattern: "prepare to wait", then check to
> see if we should sleep; if we do need to sleep, unlock j_state_lock,
> then sleep. So this prevents any races leading to lost wakeups.)
> > On the start_this_handle() thread, if the current transaction is too > full, we set j_commit_request to its transaction id to indicate that > we want the current transaction to be committed, and then we wake up > the j_wait_commit wait queue and then we enter a loop where we do a > prepare_to_wait on j_wait_done_commit, check to see if > j_commit_sequence == the transaction id that we want to be completed, > and if it's not done yet, we unlock the j_state_lock spinlock, and go > to sleep. Again, because of the prepare_to_wait, there is no chance > of a lost wakeup. The above explanation gives me a clear view of how the journal synchronizes things. I appreciate it. > So there really is no "consumer" and "producer" here. If you really > insist on using this model, which really doesn't apply, for one Dept does not assume a "consumer" and "producer" model at all; Dept works with general waits and events. *That model is just one of them.* > thread, it's the consumer with respect to one wait queue, and the > producer with respect to the *other* wait queue. For the other > thread, the consumer and producer roles are reversed. > > And of course, this is a highly simplified model, since we also have a > wait queue used by the commit thread to wait for the number of active > handles on a particular transaction to go to zero, and > stop_this_handle() will wake up the commit thread via this wait queue when > the last active handle on a particular transaction is retired. (And > yes, that parameter is also protected by a different spin lock which > is per-transaction). This one also gives me a clear view. Thanks a lot. > So it seems to me that a fundamental flaw in DEPT's model is assuming > that the only waiting paradigm that can be used is consumer/producer, No, Dept does not. > and that's simply not true. The fact that you use the term "lock" is > also going to lead to a misleading line of reasoning, because properly "lock/unlock L" comes from Jan Kara's example.
It has almost nothing to do with the explanation. I just left "lock/unlock L" as a statement that comes from Jan Kara's example. > speaking, they aren't really locks. We are simply using wait channels I totally agree with you. *They aren't really locks; they're just waits and wakeups.* That's exactly why I decided to develop Dept. Dept is not interested in locks, unlike Lockdep, but focuses on waits and wakeup sources themselves. I think you've misunderstood Dept in several ways. Please ask me if there is anything about Dept you doubt. Thanks, Byungchul
RE: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
From: Xiaomeng Tong > Sent: 03 March 2022 02:27 > > On Wed, 2 Mar 2022 14:04:06 +, David Laight > wrote: > > I think that it would be better to make any alternate loop macro > > just set the variable to NULL on the loop exit. > > That is easier to code for and the compiler might be persuaded to > > not redo the test. > > No, that would lead to a NULL dereference. Why? It would make it the same as the 'easy to use': for (item = head; item; item = item->next) { ... if (...) break; ... } if (!item) return; > The problem is the mis-use of the iterator outside the loop on exit, and > the iterator will be the HEAD's container_of pointer which points > to a type-confused struct. Sidenote: The *mis-use* here refers to > mistaken access to other members of the struct, instead of the > list_head member which actually is the valid HEAD. The problem is that the HEAD's container_of pointer should never be calculated at all. This is what is fundamentally broken about the current definition. > IOW, you would dereference a (NULL + offset_of_member) address here. Where? > Please remind me if I missed something, thanks. > > Can you share your "alternative definitions" details? thanks! The loop should probably use an extra variable that points to the 'list node' in the next structure. Something like: for (xxx *iter = head->next; iter == head ? ((item = NULL),0) : ((item = list_item(iter),1)); iter = item->member->next) { ... With a bit of casting you can use 'item' to hold 'iter'. > > > OTOH there may be alternative definitions that can be used to get > > the compiler (or other compiler-like tools) to detect broken code. > > Even if the definition can't possibly generate a working kernel. > > The "list_for_each_entry_inside(pos, type, head, member)" way makes > the iterator invisible outside the loop, and would be caught by the > compiler if use-after-loop things happened. It is also a complete PITA for anything doing a search.
David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
[pull] amdgpu drm-fixes-5.17
Hi Dave, Daniel, Just one small fix for 5.17. The following changes since commit e7c470a4b543375d50d88a4c5abd4b9e0f5adcea: Merge tag 'exynos-drm-fixes-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes (2022-02-28 14:05:44 +1000) are available in the Git repository at: https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-fixes-5.17-2022-03-02 for you to fetch changes up to f1ef17011c765495c876fa75435e59eecfdc1ee4: drm/amdgpu: fix suspend/resume hang regression (2022-03-02 18:36:43 -0500) amd-drm-fixes-5.17-2022-03-02: amdgpu: - Suspend regression fix Qiang Yu (1): drm/amdgpu: fix suspend/resume hang regression drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
Re: [PATCH 1/9] dt-bindings: mxsfb: Add compatible for i.MX8MP
On 3/3/22 03:54, Liu Ying wrote: On Wed, 2022-03-02 at 12:57 +0100, Lucas Stach wrote: On Wednesday, 2022-03-02 at 17:41 +0800, Liu Ying wrote: On Wed, 2022-03-02 at 10:23 +0100, Lucas Stach wrote: On Wednesday, 2022-03-02 at 03:54 +0100, Marek Vasut wrote: On 3/1/22 14:18, Lucas Stach wrote: On Tuesday, 2022-03-01 at 07:03 -0600, Adam Ford wrote: On Tue, Mar 1, 2022 at 5:05 AM Lucas Stach wrote: On Tuesday, 2022-03-01 at 11:19 +0100, Marek Vasut wrote: On 3/1/22 11:04, Lucas Stach wrote: Hi, [...] Given the two totally different IPs, I don't see that bugs in the IP control logic would need to be fixed in both drivers. Naturally, the two would diverge due to different HWs. Looking at Patch 9/9, it basically squashes code to control LCDIFv3 into the mxsfb drm driver with 'if/else' checks (barely any common control code), which is hard to maintain and cannot achieve good scalability for both 'LCDIFv3' and 'LCDIF'. I tend to agree with Liu here. Writing a DRM driver isn't that much boilerplate anymore with all the helpers we have available in the framework today. I did write a separate driver for this IP before I spent time merging them into a single driver; that's when I realized a single driver is much better and discarded the separate driver idea. The IP is so different from the currently supported LCDIF controllers that I think trying to support this one in the existing driver actually increases the chances to break something when modifying the driver in the future. Not everyone is able to test all LCDIF versions. My vote is on having a separate driver for the i.MX8MP LCDIF. If you look at both controllers, it is clear it is still the LCDIF behind, even the CSC that is bolted on would suggest that. Yes, but from a driver PoV what you care about is not really the hardware blocks used to implement something, but the programming model, i.e. the register interface exposed to software.
I am also not happy when I look at the amount of duplication a separate driver would create; it will be some 50% of the code that would be just duplicated. Yea, the duplicated code is still significant, as the HW itself is so simple. However, if you find yourself in the situation where basically every actual register access in the driver ends up being in an "if (some HW rev) ..." clause, I still think it would be better to have a separate driver, as the programming interface is just different. I tend to agree with Marek on this one. We have an instance where the blk-ctrl and the GPC driver between 8m, mini, nano, plus are close, but different enough that each SoC has its own set of tables and some checks. Lucas created the framework, and others adapted it for various SoCs. If there really is nearly 50% common code for the LCDIF, why not either leave the driver as one or split the common code into its own driver like lcdif-common and then have smaller drivers that handle their specific variations? I don't know exactly what the standalone driver looks like, but I guess the overlap is not really in any real HW specific parts, but the common DRM boilerplate, so there isn't much point in creating a common lcdif driver. The mxsfb currently has 1280 LoC as of patch 8/9 of this series. Of that, there is some 400 LoC which are specific to old LCDIF and this patch adds 380 LoC for the new LCDIF. So that's 800 LoC or ~60% of shared boilerplate that would be duplicated. That is probably ignoring the fact that the 8MP LCDIF does not support any overlays, so it could use the drm_simple_display_pipe infrastructure to reduce the needed boilerplate. The drm_simple_display_pipe infrastructure is probably too simple for i.MX8MP LCDIF, since it uses only one CRTC for one drm device. i.MX8MP embeds *three* LCDIF instances to support MIPI DSI, LVDS and HDMI outputs respectively. To use that infrastructure means there would be three DRI cards in all.
However, the three LCDIF instances can be wrapped by the one drm device, which is not something the boilerplate code in the current mxsfb driver can handle. While that may make things a little simpler for userspace, I'm not sure if this is the right thing to do. It complicates the driver a lot, especially if you want to get things like independent power management, etc. right. It also creates a fake view for userspace, where it looks like there might be some shared resources between the different display paths, while in reality they are fully independent. A trade-off has to be made between one drm device and three. My first impression of using the drm_simple_display_pipe infrastructure is that it's too simple and less flexible/scalable, because SoC designers are likely to add muxes between CRTCs and encoders/bridges or overlay plane(s) in next generations of SoCs (SW developers don't seem to have good reasons to suggest not to do that). Another concern is whether userspace can use the three drm devices well or not. A few more points: 1) With one drm dev
RE: [intel-gfx] [PATCH] drm/i915: avoid concurrent writes to aux_inv (rev3)
Hi Chris, for some reason I didn't receive the review email, so I copied your comments from patchwork and faked this email. >> static void execlists_dequeue(struct intel_engine_cs *engine) >> { >> struct intel_engine_execlists * const execlists = &engine->execlists; >> @@ -1538,6 +1566,16 @@ static void execlists_dequeue(struct intel_engine_cs >> *engine) >> } >> >> if (__i915_request_submit(rq)) { >> + /* hsdes: 1809175790 */ >> + if ((GRAPHICS_VER(engine->i915) == 12) && >> + rq->vd_ve_aux_inv && >> + (engine->class == VIDEO_DECODE_CLASS || >> +engine->class == >> VIDEO_ENHANCEMENT_CLASS)) { > We do not need the extra checks here; we just do as we are told. We only > tell ourselves to apply the fixup when required. Without checking GRAPHICS_VER, I'm seeing a lot of regressions on older platforms in the CI result. This workaround was only implemented for gen12 (gen12_emit_flush_xcs). Without checking engine->class, I'm seeing boot issues due to GEM_BUG_ON() in aux_inv_reg(). Any rq will go through this code regardless of engine class and gen version, so the checking seems to be necessary. >> + *rq->vd_ve_aux_inv = >> i915_mmio_reg_offset > Likewise, vd_ve is overspecific, aux_inv_fixup or aux_inv_wa (or > wa_aux_iv, fixup_aux_inv). I wanted to be specific because the workaround was only implemented for vd/ve engines. But I'm ok with your proposal. >> + (aux_inv_reg(engine)); >> + rq->vd_ve_aux_inv = NULL; > Move this to i915_request initialisation so that we only set aux_inv > when required, which probably explains the extra defence. The pointer is currently initialized with 0x5a5a. I set it to NULL in gen12_emit_flush_xcs, otherwise the rq will enter that if-statement with an invalid pointer. I'm not familiar with the code, there seems to be multiple functions allocating the structure. I agree it's better to set it to NULL at initialization, but need some guidance on where is the best place to do so. 
>> + rq->execution_mask = engine->mask; >> + } >> if (!merge) { >> *port++ = i915_request_get(last); >> last = NULL; >> diff --git a/drivers/gpu/drm/i915/i915_request.h >> b/drivers/gpu/drm/i915/i915_request.h >> index 28b1f9db5487..69de32e5e15d 100644 >> --- a/drivers/gpu/drm/i915/i915_request.h >> +++ b/drivers/gpu/drm/i915/i915_request.h >> @@ -350,6 +350,8 @@ struct i915_request { >> struct list_head link; >> unsigned long delay; >> } mock;) >> + >> + u32 *vd_ve_aux_inv; > Not at the end of the struct; that's where we put things in the dungeon. > The selftest struct should be last; I do hope no one has been putting > things at random places in the struct without considering the layout and > semantics. From the flow, this is akin to batch, capture_list; before > emitted_jiffies would be a good spot. Got it, will change. I thought adding at the end would be safer, thanks for the explanation. > -Chris
Re: [EXT] Re: [PATCH 1/9] dt-bindings: mxsfb: Add compatible for i.MX8MP
On 3/2/22 14:14, Robby Cai wrote: Hi [...] LCDIF on i.MX8MP is a different IP which is borrowed from a non-iMX series, although it's also called 'LCDIF'. We prefer not to mix these two series of IPs in one driver for ease of maintenance and extension. Where does the MX8MP LCDIF come from then, SGTL maybe? AFAIK, it's RT1170. You can check the RM [1]. Interestingly, this SoC has both eLCDIF and LCDIFv2, the two IPs we are talking about. [1] https://www.nxp.com/webapp/Download?colCode=IMXRT1170RM That's interesting, I wasn't aware of this one. So that's MX8MP LCDIF + overlays. Did the LCDIF v2 in i.MXRT1170 have any predecessor too?
Re: [PATCH 1/9] dt-bindings: mxsfb: Add compatible for i.MX8MP
On 3/2/22 10:23, Lucas Stach wrote: [...] I tend to agree with Marek on this one. We have an instance where the blk-ctrl and the GPC driver between 8m, mini, nano, plus are close, but different enough that each SoC has its own set of tables and some checks. Lucas created the framework, and others adapted it for various SoCs. If there really is nearly 50% common code for the LCDIF, why not either leave the driver as one or split the common code into its own driver like lcdif-common and then have smaller drivers that handle their specific variations? I don't know exactly what the standalone driver looks like, but I guess the overlap is not really in any real HW specific parts, but the common DRM boilerplate, so there isn't much point in creating a common lcdif driver. The mxsfb currently has 1280 LoC as of patch 8/9 of this series. Of that, there is some 400 LoC which are specific to old LCDIF and this patch adds 380 LoC for the new LCDIF. So that's 800 LoC or ~60% of shared boilerplate that would be duplicated. That is probably ignoring the fact that the 8MP LCDIF does not support any overlays, so it could use the drm_simple_display_pipe infrastructure to reduce the needed boilerplate. It seems the IMXRT1170 LCDIF v2 (heh ...) does support overlays, so no, the mxsfb and hypothetical lcdif drivers would look really very similar. As you brought up the blk-ctrl as an example: I'm all for supporting slightly different hardware in the same driver, as long as the HW interface is close enough. But then I also opted for a separate 8MP blk-ctrl driver for those blk-ctrls that differ significantly from the others, as I think it would make the common driver unmaintainable trying to support all the different variants in one driver. But then you also need to maintain two sets of boilerplate, they diverge, and that is not good.
I don't think that there is much chance for bugs going unfixed due to divergence in the boilerplate, especially if you use the simple pipe framework to handle most of that stuff for you, which gives you a lot of code sharing with other simple DRM drivers. But I cannot use the simple pipe because of overlays, see i.MXRT1170. [...] We can always split the drivers later if this becomes unmaintainable too, no?
Re: [PATCH 1/9] dt-bindings: mxsfb: Add compatible for i.MX8MP
On Wed, 2022-03-02 at 12:57 +0100, Lucas Stach wrote: > Am Mittwoch, dem 02.03.2022 um 17:41 +0800 schrieb Liu Ying: > > On Wed, 2022-03-02 at 10:23 +0100, Lucas Stach wrote: > > > Am Mittwoch, dem 02.03.2022 um 03:54 +0100 schrieb Marek Vasut: > > > > On 3/1/22 14:18, Lucas Stach wrote: > > > > > Am Dienstag, dem 01.03.2022 um 07:03 -0600 schrieb Adam Ford: > > > > > > On Tue, Mar 1, 2022 at 5:05 AM Lucas Stach > > > > > > wrote: > > > > > > > Am Dienstag, dem 01.03.2022 um 11:19 +0100 schrieb Marek Vasut: > > > > > > > > On 3/1/22 11:04, Lucas Stach wrote: > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > Given the two totally different IPs, I don't see bugs of IP > > > > > > > > > > control > > > > > > > > > > logics should be fixed for both drivers. Naturally, the two > > > > > > > > > > would > > > > > > > > > > diverge due to different HWs. Looking at Patch 9/9, it > > > > > > > > > > basically > > > > > > > > > > squashes code to control LCDIFv3 into the mxsfb drm driver > > > > > > > > > > with > > > > > > > > > > 'if/else' checks(barely no common control code), which is > > > > > > > > > > hard to > > > > > > > > > > maintain and not able to achieve good scalability for both > > > > > > > > > > 'LCDIFv3' > > > > > > > > > > and 'LCDIF'. > > > > > > > > > > > > > > > > > > I tend to agree with Liu here. Writing a DRM driver isn't > > > > > > > > > that much > > > > > > > > > boilerplate anymore with all the helpers we have available in > > > > > > > > > the > > > > > > > > > framework today. > > > > > > > > > > > > > > > > I did write a separate driver for this IP before I spent time > > > > > > > > merging > > > > > > > > them into single driver, that's when I realized a single driver > > > > > > > > is much > > > > > > > > better and discarded the separate driver idea. 
> > > > > > > > > > > > > > > > > The IP is so different from the currently supported LCDIF > > > > > > > > > controllers > > > > > > > > > that I think trying to support this one in the existing > > > > > > > > > driver actually > > > > > > > > > increases the chances to break something when modifying the > > > > > > > > > driver in > > > > > > > > > the future. Not everyone is able to test all LCDIF versions. > > > > > > > > > My vote is > > > > > > > > > on having a separate driver for the i.MX8MP LCDIF. > > > > > > > > > > > > > > > > If you look at both controllers, it is clear it is still the > > > > > > > > LCDIF > > > > > > > > behind, even the CSC that is bolted on would suggest that. > > > > > > > > > > > > > > Yes, but from a driver PoV what you care about is not really the > > > > > > > hardware blocks used to implement something, but the programming > > > > > > > model, > > > > > > > i.e. the register interface exposed to software. > > > > > > > > > > > > > > > I am also not happy when I look at the amount of duplication a > > > > > > > > separate > > > > > > > > driver would create, it will be some 50% of the code that would > > > > > > > > be just > > > > > > > > duplicated. > > > > > > > > > > > > > > > Yea, the duplicated code is still significant, as the HW itself > > > > > > > is so > > > > > > > simple. However, if you find yourself in the situation where > > > > > > > basically > > > > > > > every actual register access in the driver ends up being in a "if > > > > > > > (some > > > > > > > HW rev) ... " clause, i still think it would be better to have a > > > > > > > separate driver, as the programming interface is just different. > > > > > > > > > > > > I tend to agree with Marek on this one. We have an instance where > > > > > > the > > > > > > blk-ctrl and the GPC driver between 8m, mini, nano, plus are close, > > > > > > but different enough where each SoC has it's own set of tables and > > > > > > some checks. 
Lucas created the framework, and others adapted it > > > > > > for > > > > > > various SoC's. If there really is nearly 50% common code for the > > > > > > LCDIF, why not either leave the driver as one or split the common > > > > > > code > > > > > > into its own driver like lcdif-common and then have smaller drivers > > > > > > that handle their specific variations. > > > > > > > > > > I don't know exactly how the standalone driver looks like, but I guess > > > > > the overlap is not really in any real HW specific parts, but the > > > > > common > > > > > DRM boilerplate, so there isn't much point in creating a common lcdif > > > > > driver. > > > > > > > > The mxsfb currently has 1280 LoC as of patch 8/9 of this series. Of > > > > that, there is some 400 LoC which are specific to old LCDIF and this > > > > patch adds 380 LoC for the new LCDIF. So that's 800 LoC or ~60% of > > > > shared boilerplate that would be duplicated . > > > > > > That is probably ignoring the fact that the 8MP LCDIF does not support > > > any overlays, so it could use the drm_simple_display_pipe > > > infrastructure to
[Bug 215652] kernel 5.17-rc fail to load radeon DRM "modprobe: ERROR: could not insert 'radeon': Unknown symbol in module, or unknown parameter (see dmesg)"
https://bugzilla.kernel.org/show_bug.cgi?id=215652 Alex Deucher (alexdeuc...@gmail.com) changed: What|Removed |Added CC||alexdeuc...@gmail.com --- Comment #3 from Alex Deucher (alexdeuc...@gmail.com) --- If you are using an initrd, the firmware must be present on the initrd for the driver to find it when it loads. Please make sure the firmware is available in your initrd. -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
Re: Report 2 in ext4 and journal based on v5.17-rc1
On Thu, Mar 03, 2022 at 10:00:33AM +0900, Byungchul Park wrote: > > Unfortunately, it's neither perfect nor safe without another wakeup > source - rescue wakeup source. > >consumer producer > > lock L > (too much work queued == true) > unlock L > --- preempted >lock L >unlock L >do work >lock L >unlock L >do work >... >(no work == true) >sleep > --- scheduled in > sleep That's not how things work in ext4. It's **way** more complicated than that. We have multiple wait channels, one which wakes up the consumer (read: the commit thread), and one which wakes up any processes waiting for the commit thread to have made forward progress. We also have two spin-lock protected sequence numbers, one which indicates the currently committed transaction #, and one indicating the transaction # that needs to be committed. On the commit thread, it will sleep on j_wait_commit, and when it is woken up, it will check to see if there is work to be done (j_commit_sequence != j_commit_request), and if so, do the work, and then wake up processes waiting on the wait_queue j_wait_done_commit. (Again, all of this uses the pattern, "prepare to wait", then check to see if we should sleep, if we do need to sleep, unlock j_state_lock, then sleep.) So this prevents any races leading to lost wakeups. On the start_this_handle() thread, if the current transaction is too full, we set j_commit_request to its transaction id to indicate that we want the current transaction to be committed, and then we wake up the j_wait_commit wait queue and then we enter a loop where we do a prepare_to_wait on j_wait_done_commit, check to see if j_commit_sequence == the transaction id that we want to be completed, and if it's not done yet, we unlock the j_state_lock spinlock, and go to sleep. Again, because of the prepare_to_wait, there is no chance of a lost wakeup. So there really is no "consumer" and "producer" here.
If you really insist on using this model, which really doesn't apply, for one thread, it's the consumer with respect to one wait queue, and the producer with respect to the *other* wait queue. For the other thread, the consumer and producer roles are reversed. And of course, this is a highly simplified model, since we also have a wait queue used by the commit thread to wait for the number of active handles on a particular transaction to go to zero, and stop_this_handle() will wake up the commit thread via this wait queue when the last active handle on a particular transaction is retired. (And yes, that parameter is also protected by a different spin lock which is per-transaction). So it seems to me that a fundamental flaw in DEPT's model is assuming that the only waiting paradigm that can be used is consumer/producer, and that's simply not true. The fact that you use the term "lock" is also going to lead to a misleading line of reasoning, because properly speaking, they aren't really locks. We are simply using wait channels to wake up processes as necessary, and then they will check other variables to decide whether or not they need to sleep, and we have an invariant that when these variables change indicating forward progress, the associated wait channel will be woken up. Cheers, - Ted P.S. This model is also highly simplified since there are other reasons why the commit thread can be woken up, some of which might be via a timeout, and some of which are via the j_wait_commit wait channel but not because j_commit_request has been changed, but because the file system is being unmounted, or the file system is being frozen in preparation for a snapshot, etc. These are *not* necessary to prevent a deadlock, because under normal circumstances the two wake channels are sufficient by themselves. So please don't think of them as "rescue wakeup sources"; again, that's highly misleading and the wrong way to think of them.
And to make things even more complicated, we have more than 2 wait channel --- we have *five*: /** * @j_wait_transaction_locked: * * Wait queue for waiting for a locked transaction to start committing, * or for a barrier lock to be released. */ wait_queue_head_t j_wait_transaction_locked; /** * @j_wait_done_commit: Wait queue for waiting for commit to complete. */ wait_queue_head_t j_wait_done_commit; /** * @j_wait_commit: Wait queue to trigger commit. */ wait_queue_head_t j_wait_commit; /** * @j_wait_updates: Wait queue to wait for updates to complete. */ wait_queue_head_t j_wait_updates; /** * @j_wait_reserved: * * Wait queue to wait f
[PATCH -next] drm/fsl-dcu: Remove unnecessary print function dev_err()
The print function dev_err() is redundant because platform_get_irq() already prints an error. Eliminate the following coccicheck warning: ./drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c:277:2-9: line 277 is redundant because platform_get_irq() already prints an error Reported-by: Abaci Robot Signed-off-by: Yang Li --- drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c index 7a503bf08d0f..20895ea79739 100644 --- a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c +++ b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c @@ -273,10 +273,8 @@ static int fsl_dcu_drm_probe(struct platform_device *pdev) } fsl_dev->irq = platform_get_irq(pdev, 0); - if (fsl_dev->irq < 0) { - dev_err(dev, "failed to get irq\n"); + if (fsl_dev->irq < 0) return fsl_dev->irq; - } fsl_dev->regmap = devm_regmap_init_mmio(dev, base, &fsl_dcu_regmap_config); -- 2.20.1.7.g153144c
Re: [PATCH v3 00/21] DEPT(Dependency Tracker)
On Wed, Mar 02, 2022 at 04:36:38AM +, Hyeonggon Yoo wrote: > On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote: > > I didn't want to bother you so I was planning to send the next spin > > after making more progress. However, PATCH v2 reports too many false > > positives because Dept tracked the bit_wait_table[] the wrong way - I > > apologize for that. So I decided to send PATCH v3 first before going > > further for those who want to run Dept for now. > > > > There might still be some false positives but not overwhelming. > > > > Hello Byungchul, I'm running DEPT v3 on my system > and I see the report below. > > Looking at the kmemleak code and comment, I think > kmemleak tried to avoid a lockdep recursive warning, > but it was detected by DEPT? > > === > DEPT: Circular dependency has been detected. > 5.17.0-rc1+ #1 Tainted: GW > --- > summary > --- > *** AA DEADLOCK *** > > context A > [S] __raw_spin_lock_irqsave(&object->lock:0) > [W] _raw_spin_lock_nested(&object->lock:0) > [E] spin_unlock(&object->lock:0) > > [S]: start of the event context > [W]: the wait blocked > [E]: the event not reachable > --- > context A's detail > --- > context A > [S] __raw_spin_lock_irqsave(&object->lock:0) > [W] _raw_spin_lock_nested(&object->lock:0) > [E] spin_unlock(&object->lock:0) > > [S] __raw_spin_lock_irqsave(&object->lock:0): > [] scan_gray_list+0x84/0x13c > stacktrace: > dept_ecxt_enter+0x88/0xf4 > _raw_spin_lock_irqsave+0xf0/0x1c4 > scan_gray_list+0x84/0x13c > kmemleak_scan+0x2d8/0x54c > kmemleak_scan_thread+0xac/0xd4 > kthread+0xd4/0xe4 > ret_from_fork+0x10/0x20 [W]'s stack trace is missing. But I guess this is the same issue as the one you reported following this one. We can discuss it on the other report's thread.
Thanks, Byungchul > [E] spin_unlock(&object->lock:0): > [] scan_block+0x60/0x128 > --- > information that might be helpful > --- > CPU: 1 PID: 38 Comm: kmemleak Tainted: GW 5.17.0-rc1+ #1 > Hardware name: linux,dummy-virt (DT) > Call trace: > dump_backtrace.part.0+0x9c/0xc4 > show_stack+0x14/0x28 > dump_stack_lvl+0x9c/0xcc > dump_stack+0x14/0x2c > print_circle+0x2d4/0x438 > cb_check_dl+0x44/0x70 > bfs+0x60/0x168 > add_dep+0x88/0x11c > add_wait+0x2d0/0x2dc > __dept_wait+0x8c/0xa4 > dept_wait+0x6c/0x88 > _raw_spin_lock_nested+0xa8/0x1b0 > scan_block+0xb4/0x128 > scan_gray_list+0xc4/0x13c > kmemleak_scan+0x2d8/0x54c > kmemleak_scan_thread+0xac/0xd4 > kthread+0xd4/0xe4 > ret_from_fork+0x10/0x20 > > > --- > > > > Hi Linus and folks, > > > > I've been developing a tool for detecting deadlock possibilities by > > tracking wait/event rather than lock(?) acquisition order to try to > > cover all synchronization mechanisms. It's done on the v5.17-rc1 tag. > > > > https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1 > > > [...] > > Benefit: > > > > 0. Works with all lock primitives. > > 1. Works with wait_for_completion()/complete(). > > 2. Works with 'wait' on PG_locked. > > 3. Works with 'wait' on PG_writeback. > > 4. Works with swait/wakeup. > > 5. Works with waitqueue. > > 6. Multiple reports are allowed. > > 7. Deduplication control on multiple reports. > > 8. Withstands false positives thanks to 6. > > 9. Easy to tag any wait/event. > > > > Future work: > > > > 0. To make it more stable. > > 1. To separate Dept from Lockdep. > > 2. To improve performance in terms of time and space. > > 3. To use Dept as a dependency engine for Lockdep. > > 4. To add any missing tags of wait/event in the kernel. > > 5. To deduplicate stack traces. > > > > How to interpret reports: > > > > 1. E(event) in each context cannot be triggered because of the > >W(wait) that cannot be woken. > > 2.
The stack trace helping find the problematic code is located > >in each context's detail. > > > > Thanks, > > Byungchul > > > > --- > > > > Changes from v2: > > > > 1. Disable Dept on bit_wait_table[] in sched/wait_bit.c > >reporting a lot of false positives, which is my fault. > >Wait/event for bit_wait_table[] should've been tagged in a > >higher layer for better work, which is a future work. > >(feedback from Jan Kara) > > 2. Disable Dept on crypto_larval's completion to prevent a false > >positive. > > > > Changes from v1: > > > > 1. Fix coding style and typo. (feedback from Steven) > > 2. Distinguish each work context from another in workqueue. > > 3. Skip checking lock acquisition with nest_lock, whic
[PATCH -next] drm/amdgpu: clean up some inconsistent indenting
Eliminate the following smatch warning: drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c:413 amdgpu_get_xgmi_hive() warn: inconsistent indenting Reported-by: Abaci Robot Signed-off-by: Yang Li --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c index 91817a31f3e1..4ff6e06babca 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c @@ -410,14 +410,14 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev) */ if (adev->reset_domain->type != XGMI_HIVE) { hive->reset_domain = amdgpu_reset_create_reset_domain(XGMI_HIVE, "amdgpu-reset-hive"); - if (!hive->reset_domain) { - dev_err(adev->dev, "XGMI: failed initializing reset domain for xgmi hive\n"); - ret = -ENOMEM; - kobject_put(&hive->kobj); - kfree(hive); - hive = NULL; - goto pro_end; - } + if (!hive->reset_domain) { + dev_err(adev->dev, "XGMI: failed initializing reset domain for xgmi hive\n"); + ret = -ENOMEM; + kobject_put(&hive->kobj); + kfree(hive); + hive = NULL; + goto pro_end; + } } else { amdgpu_reset_get_reset_domain(adev->reset_domain); hive->reset_domain = adev->reset_domain; -- 2.20.1.7.g153144c
Re: Report 2 in ext4 and journal based on v5.17-rc1
On Mon, Feb 28, 2022 at 04:25:04PM -0500, Theodore Ts'o wrote: > On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote: > > > case 1. Code with an actual circular dependency, but not deadlock. > > > > > >A circular dependency can be broken by a rescue wakeup source e.g. > >timeout. It's not a deadlock. If it's okay that the contexts > >participating in the circular dependency and others waiting for the > >events in the circle are stuck until it gets broken. Otherwise, say, > >if it's not meant, then it's anyway problematic. > > > > > >1-1. What if we judge this code is problematic? > >1-2. What if we judge this code is good? > > > > > > I've been wondering if the kernel guys esp. Linus considers code with > > > any circular dependency is problematic or not, even if it won't lead to > > > a deadlock, say, case 1. Even though I designed Dept based on what I > > > believe is right, of course, I'm willing to change the design according > > > to the majority opinion. > > > > > > However, I would never allow case 1 if I were the owner of the kernel > > > for better stability, even though the code works anyway okay for now. > > Note, I used the example of the timeout as the most obvious way of > explaining that a deadlock is not possible. There is also the much > more complex explanation which Jan was trying to give, which is what > leads to the circular dependency. It can happen that when trying to > start a handle, if either (a) there is not enough space in the journal > for new handles, or (b) the current transaction is so large that if we > don't close the transaction and start a new one, we will end up > running out of space in the future, and so in that case, > start_this_handle() will block starting any more handles, and then > wake up the commit thread. The commit thread then waits for the > currently running threads to complete, before it allows new handles to > start, and then it will complete the commit. 
In the case of (a) we > then need to do a journal checkpoint, which is more work to release > space in the journal, and only then, can we allow new handles to start. Thank you for the full explanation of how journal things work. > The bottom line is (a) it works, (b) there aren't significant delays, > and for DEPT to complain that this is somehow wrong and we need to > completely rearchitect perfectly working code because it doesn't > conform to DEPT's idea of what is "correct" is not acceptable. Thanks to you and Jan Kara, I realized it's not a real dependency in the consumer and producer scenario but again *ONLY IF* there is a rescue wakeup source. Dept should track the rescue wakeup source instead in that case. I won't ask you to rearchitect the working code. The code looks sane. Thanks a lot. Thanks, Byungchul > > We have a queue of work to do Q protected by lock L. Consumer process has > > code like: > > > > while (1) { > > lock L > > prepare_to_wait(work_queued); > > if (no work) { > > unlock L > > sleep > > } else { > > unlock L > > do work > > wake_up(work_done) > > } > > } > > > > AFAIU Dept will create dependency here that 'wakeup work_done' is after > > 'wait for work_queued'. Producer has code like: > > > > while (1) { > > lock L > > prepare_to_wait(work_done) > > if (too much work queued) { > > unlock L > > sleep > > } else { > > queue work > > unlock L > > wake_up(work_queued) > > } > > } > > > > And Dept will create dependency here that 'wakeup work_queued' is after > > 'wait for work_done'. And thus we have a trivial cycle in the dependencies > > despite the code being perfectly valid and safe. > > Cheers, > > - Ted
Re: [PATCH] video: fbdev: sm712fb: Fix crash in smtcfb_write()
Hi, On Thu, Mar 3, 2022 at 12:49 AM Helge Deller wrote: > > On 3/2/22 15:33, Zheyu Ma wrote: > > When the sm712fb driver writes three bytes to the framebuffer, the > > driver will crash: > > > > BUG: unable to handle page fault for address: c90001ff > > RIP: 0010:smtcfb_write+0x454/0x5b0 > > Call Trace: > > vfs_write+0x291/0xd60 > > ? do_sys_openat2+0x27d/0x350 > > ? __fget_light+0x54/0x340 > > ksys_write+0xce/0x190 > > do_syscall_64+0x43/0x90 > > entry_SYSCALL_64_after_hwframe+0x44/0xae > > > > Fix it by removing the open-coded endianness fixup-code. > > > > Signed-off-by: Zheyu Ma > > Thanks... it's already in the fbdev git tree and queued up for v5.18... > https://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev.git/commit/?h=for-next&id=bd771cf5c4254511cc4abb88f3dab3bd58bdf8e8 This patch fixes the crash in smtcfb_read(), but there is a similar bug in smtcfb_write(), and I mocked up your patch and wrote a new patch for it. So we should fix two bugs with two patches. Regards, Zheyu Ma
Re: Report 2 in ext4 and journal based on v5.17-rc1
On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote: > On Mon 28-02-22 18:28:26, Byungchul Park wrote: > > case 1. Code with an actual circular dependency, but not deadlock. > > > >A circular dependency can be broken by a rescue wakeup source e.g. > >timeout. It's not a deadlock. If it's okay that the contexts > >participating in the circular dependency and others waiting for the > >events in the circle are stuck until it gets broken. Otherwise, say, > >if it's not meant, then it's anyway problematic. > > > >1-1. What if we judge this code is problematic? > >1-2. What if we judge this code is good? > > > > case 2. Code with an actual circular dependency, and deadlock. > > > >There's no other wakeup source than those within the circular > >dependency. Literally deadlock. It's problematic and critical. > > > >2-1. What if we judge this code is problematic? > >2-2. What if we judge this code is good? > > > > case 3. Code with no actual circular dependency, and not deadlock. > > > >Must be good. > > > >3-1. What if we judge this code is problematic? > >3-2. What if we judge this code is good? > > > > --- > > > > I call only 3-1 "false positive" circular dependency. And you call 1-1 > > and 3-1 "false positive" deadlock. > > > > I've been wondering if the kernel guys esp. Linus considers code with > > any circular dependency is problematic or not, even if it won't lead to > > a deadlock, say, case 1. Even though I designed Dept based on what I > > believe is right, of course, I'm willing to change the design according > > to the majority opinion. > > > > However, I would never allow case 1 if I were the owner of the kernel > > for better stability, even though the code works anyway okay for now. > > So yes, I call a report for the situation "There is circular dependency but > deadlock is not possible." a false positive. And that is because in my > opinion your definition of circular dependency includes schemes that are > useful and used in the kernel. 
> > Your example in case 1 is kind of borderline (I personally would consider > that bug as well) but there are other more valid schemes with multiple > wakeup sources like: > > We have a queue of work to do Q protected by lock L. Consumer process has > code like: > > while (1) { > lock L > prepare_to_wait(work_queued); > if (no work) { > unlock L > sleep > } else { > unlock L > do work > wake_up(work_done) > } > } > > AFAIU Dept will create dependency here that 'wakeup work_done' is after > 'wait for work_queued'. Producer has code like: First of all, thank you for this good example. > while (1) { > lock L > prepare_to_wait(work_done) > if (too much work queued) { > unlock L > sleep > } else { > queue work > unlock L > wake_up(work_queued) > } > } > > And Dept will create dependency here that 'wakeup work_queued' is after > 'wait for work_done'. And thus we have a trivial cycle in the dependencies > despite the code being perfectly valid and safe. Unfortunately, it's neither perfect nor safe without another wakeup source - a rescue wakeup source. consumer producer lock L (too much work queued == true) unlock L --- preempted lock L unlock L do work lock L unlock L do work ... (no work == true) sleep --- scheduled in sleep This code leads to a deadlock without another wakeup source, that is, it is not safe. But yes. I also think this code should be allowed if it anyway runs alongside another wakeup source. For that case, Dept should instead track the rescue wakeup source that prevents the actual deadlock. I will correct the code to make Dept track its rescue wakeup source whenever it finds this case. Lastly, just for your information, I need to explain how Dept works a little more so that Dept is not misunderstood. Assuming the consumer and producer guarantee not to lead to a deadlock like the following, Dept won't report it as a problem: consumer producer sleep wakeup work_done queue work sleep wakeup work_queued do work sleep wakeup work_done queue work sleep wakeup work_queued do work sleep ... 
... Dept does not consider all waits preceding an event but only waits that might lead to a deadlock. In this case, Dept works with each region independently. consumer pr
Re: [PATCH 7/8] drm/vmwgfx: Implement MSI/MSI-X support for IRQs
Hi Zack, I love your patch! Perhaps something to improve: [auto build test WARNING on drm/drm-next] [also build test WARNING on drm-intel/for-linux-next drm-tip/drm-tip drm-exynos/exynos-drm-next tegra-drm/drm/tegra/for-next v5.17-rc6 next-20220302] [cannot apply to airlied/drm-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Zack-Rusin/drm-vmwgfx-3D-on-arm64-and-large-cursor-support/20220302-232641 base: git://anongit.freedesktop.org/drm/drm drm-next config: i386-randconfig-a004 (https://download.01.org/0day-ci/archive/20220303/202203030802.itiytjh1-...@intel.com/config) compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d271fc04d5b97b12e6b797c6067d3c96a8d7470e) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/63d27e1cb11562966a6f07bf0e667b0a54dc2412 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Zack-Rusin/drm-vmwgfx-3D-on-arm64-and-large-cursor-support/20220302-232641 git checkout 63d27e1cb11562966a6f07bf0e667b0a54dc2412 # save the config file to linux build tree mkdir build_dir COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/gpu/drm/vmwgfx/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): >> drivers/gpu/drm/vmwgfx/vmwgfx_irq.c:340:6: warning: variable 'ret' is used >> uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (nvec <= 0) { ^ drivers/gpu/drm/vmwgfx/vmwgfx_irq.c:369:9: note: uninitialized use occurs here return ret; ^~~ drivers/gpu/drm/vmwgfx/vmwgfx_irq.c:340:2: note: remove the 'if' if its 
condition is always false if (nvec <= 0) { ^~~~ drivers/gpu/drm/vmwgfx/vmwgfx_irq.c:330:9: note: initialize the variable 'ret' to silence this warning int ret; ^ = 0 1 warning generated. vim +340 drivers/gpu/drm/vmwgfx/vmwgfx_irq.c 319 320 /** 321 * vmw_irq_install - Install the irq handlers 322 * 323 * @dev_priv: Pointer to the vmw_private device. 324 * Return: Zero if successful. Negative number otherwise. 325 */ 326 int vmw_irq_install(struct vmw_private *dev_priv) 327 { 328 struct pci_dev *pdev = to_pci_dev(dev_priv->drm.dev); 329 struct drm_device *dev = &dev_priv->drm; 330 int ret; 331 int nvec; 332 int i = 0; 333 334 BUILD_BUG_ON((SVGA_IRQFLAG_MAX >> VMWGFX_MAX_NUM_IRQS) != 1); 335 BUG_ON(VMWGFX_MAX_NUM_IRQS != get_count_order(SVGA_IRQFLAG_MAX)); 336 337 nvec = pci_alloc_irq_vectors(pdev, 1, VMWGFX_MAX_NUM_IRQS, 338 PCI_IRQ_ALL_TYPES); 339 > 340 if (nvec <= 0) { --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
Re: [PATCH v3 00/21] DEPT(Dependency Tracker)
On Wed, Mar 02, 2022 at 04:53:41AM +, Hyeonggon Yoo wrote: > On Wed, Mar 02, 2022 at 04:36:38AM +, Hyeonggon Yoo wrote: > > On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote: > > > I didn't want to bother you so I was planning to send the next spin > > > after making more progress. However, PATCH v2 reports too many false > > > positives because Dept tracked the bit_wait_table[] wrong way - I > > > apologize for that. So I decided to send PATCH v3 first before going > > > further for those who want to run Dept for now. > > > > > > There might still be some false positives but not overwhelming. > > > > > > > Hello Byungchul, I'm running DEPT v3 on my system > > and I see report below. > > > > Looking at the kmemleak code and comment, I think > > kmemleak tried to avoid lockdep recursive warning > > but detected by DEPT? > > > > Forgot to include another warning caused by DEPT. > > And comment below might be useful for debugging: > > in kmemleak.c: > 43 * Locks and mutexes are acquired/nested in the following order: > 44 * > 45 * scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock > (SINGLE_DEPTH_NESTING) > 46 * > 47 * No kmemleak_lock and object->lock nesting is allowed outside > scan_mutex > 48 * regions. > > === > DEPT: Circular dependency has been detected. 
> 5.17.0-rc1+ #1 Tainted: GW > --- > summary > --- > *** DEADLOCK *** > > context A > [S] __raw_spin_lock_irqsave(&object->lock:0) > [W] __raw_spin_lock_irqsave(kmemleak_lock:0) > [E] spin_unlock(&object->lock:0) > > context B > [S] __raw_spin_lock_irqsave(kmemleak_lock:0) > [W] _raw_spin_lock_nested(&object->lock:0) > [E] spin_unlock(kmemleak_lock:0) > > [S]: start of the event context > [W]: the wait blocked > [E]: the event not reachable Hi Hyeonggon, Dept also allows the following scenario when a user guarantees that each lock instance is different from another at a different depth: lock A0 with depth lock A1 with depth + 1 lock A2 with depth + 2 lock A3 with depth + 3 (and so on) .. unlock A3 unlock A2 unlock A1 unlock A0 However, Dept does not allow the following scenario where another lock class cuts into the dependency chain: lock A0 with depth lock B lock A1 with depth + 1 lock A2 with depth + 2 lock A3 with depth + 3 (and so on) .. unlock A3 unlock A2 unlock A1 unlock B unlock A0 This scenario is clearly problematic. What do you think is going to happen with another context running the following? lock A1 with depth lock B lock A2 with depth + 1 lock A3 with depth + 2 (and so on) .. unlock A3 unlock A2 unlock B unlock A1 It's a deadlock. That's why Dept reports this case as a problem. Or am I missing something? 
Thanks, Byungchul > --- > context A's detail > --- > context A > [S] __raw_spin_lock_irqsave(&object->lock:0) > [W] __raw_spin_lock_irqsave(kmemleak_lock:0) > [E] spin_unlock(&object->lock:0) > > [S] __raw_spin_lock_irqsave(&object->lock:0): > [] scan_gray_list+0x84/0x13c > stacktrace: > dept_ecxt_enter+0x88/0xf4 > _raw_spin_lock_irqsave+0xf0/0x1c4 > scan_gray_list+0x84/0x13c > kmemleak_scan+0x2d8/0x54c > kmemleak_scan_thread+0xac/0xd4 > kthread+0xd4/0xe4 > ret_from_fork+0x10/0x20 > > [W] __raw_spin_lock_irqsave(kmemleak_lock:0): > [] scan_block+0x3c/0x128 > stacktrace: > __dept_wait+0x8c/0xa4 > dept_wait+0x6c/0x88 > _raw_spin_lock_irqsave+0xb8/0x1c4 > scan_block+0x3c/0x128 > scan_gray_list+0xc4/0x13c > kmemleak_scan+0x2d8/0x54c > kmemleak_scan_thread+0xac/0xd4 > kthread+0xd4/0xe4 > ret_from_fork+0x10/0x20 > > [E] spin_unlock(&object->lock:0): > [] scan_block+0x60/0x128 > > --- > context B's detail > --- > context B > [S] __raw_spin_lock_irqsave(kmemleak_lock:0) > [W] _raw_spin_lock_nested(&object->lock:0) > [E] spin_unlock(kmemleak_lock:0) > > [S] __raw_spin_lock_irqsave(kmemleak_lock:0): > [] scan_block+0x3c/0x128 > stacktrace: > dept_ecxt_enter+0x88/0xf4 > _raw_spin_lock_irqsave+0xf0/0x1c4 > scan_block+0x3c/0x128 > kmemleak_scan+0x19c/0x54c > kmemleak_scan_thread+0xac/0xd4 > kthread+0xd4/0xe4 > ret_from_fork+0x10/0x20 > > [W] _raw_spin_lock_nested(&object->lock:0): > [] scan_block+0xb4/0x128 > stacktrace: > dept_wait+0x74/0x88 > _raw_spin_lock_nested+0xa8/0x1b0 > scan_block+0xb4/0x128 > kmemleak_scan+0x19c/0x54c > kmemleak_scan_thread+0xac/0xd4 > kthread+0xd4/0xe4 > ret_from_fork+0x10/0x
[PATCH] drm/selftest: plane_helper: Put test structures in static storage
Clang warns on certain 32-bit architectures: drivers/gpu/drm/selftests/test-drm_plane_helper.c:76:5: warning: stack frame size (1064) exceeds limit (1024) in 'igt_check_plane_state' [-Wframe-larger-than] int igt_check_plane_state(void *ignored) ^ 1 warning generated. The structures in igt_check_plane_state() total 1008 bytes, so any small amount of inlining will cause the stack frame to exceed the 32-bit limit of 1024, triggering the warning. Move these structures to static storage, which dramatically reduces the amount of stack space in igt_check_plane_state(). There is no testing impact, as igt_check_plane_state() is only called once in the driver. Fixes: 943e6a8beeac ("mock a drm_plane in igt_check_plane_state to make the test more robust") Link: https://github.com/ClangBuiltLinux/linux/issues/1600 Reported-by: kernel test robot Suggested-by: Nick Desaulniers Signed-off-by: Nathan Chancellor --- drivers/gpu/drm/selftests/test-drm_plane_helper.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/selftests/test-drm_plane_helper.c b/drivers/gpu/drm/selftests/test-drm_plane_helper.c index ceebeede55ea..b61273e9c403 100644 --- a/drivers/gpu/drm/selftests/test-drm_plane_helper.c +++ b/drivers/gpu/drm/selftests/test-drm_plane_helper.c @@ -77,7 +77,7 @@ int igt_check_plane_state(void *ignored) { int ret; - const struct drm_crtc_state crtc_state = { + static const struct drm_crtc_state crtc_state = { .crtc = ZERO_SIZE_PTR, .enable = true, .active = true, @@ -87,14 +87,14 @@ int igt_check_plane_state(void *ignored) DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_NVSYNC) }, }; - struct drm_plane plane = { + static struct drm_plane plane = { .dev = NULL }; - struct drm_framebuffer fb = { + static struct drm_framebuffer fb = { .width = 2048, .height = 2048 }; - struct drm_plane_state plane_state = { + static struct drm_plane_state plane_state = { .plane = &plane, .crtc = ZERO_SIZE_PTR, .fb = &fb, base-commit: 9ae2ac4d31a85ce59cc560d514a31b95f4ace154 -- 
2.35.1
[Bug 215652] kernel 5.17-rc fail to load radeon DRM "modprobe: ERROR: could not insert 'radeon': Unknown symbol in module, or unknown parameter (see dmesg)"
https://bugzilla.kernel.org/show_bug.cgi?id=215652 --- Comment #2 from Erhard F. (erhar...@mailbox.org) --- Created attachment 300522 --> https://bugzilla.kernel.org/attachment.cgi?id=300522&action=edit kernel .config (kernel 5.17-rc5, CONFIG_DRM_RADEON=m, Talos II) -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
[Bug 215652] kernel 5.17-rc fail to load radeon DRM "modprobe: ERROR: could not insert 'radeon': Unknown symbol in module, or unknown parameter (see dmesg)"
https://bugzilla.kernel.org/show_bug.cgi?id=215652 --- Comment #1 from Erhard F. (erhar...@mailbox.org) --- Created attachment 300521 --> https://bugzilla.kernel.org/attachment.cgi?id=300521&action=edit kernel dmesg (kernel 5.17-rc5, CONFIG_DRM_RADEON=y, Talos II)
[Bug 215652] New: kernel 5.17-rc fail to load radeon DRM "modprobe: ERROR: could not insert 'radeon': Unknown symbol in module, or unknown parameter (see dmesg)"
https://bugzilla.kernel.org/show_bug.cgi?id=215652 Bug ID: 215652 Summary: kernel 5.17-rc fail to load radeon DRM "modprobe: ERROR: could not insert 'radeon': Unknown symbol in module, or unknown parameter (see dmesg)" Product: Drivers Version: 2.5 Kernel Version: 5.17-rc5 Hardware: PPC-64 OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Video(DRI - non Intel) Assignee: drivers_video-...@kernel-bugs.osdl.org Reporter: erhar...@mailbox.org CC: platform_ppc...@kernel-bugs.osdl.org Regression: No Created attachment 300520 --> https://bugzilla.kernel.org/attachment.cgi?id=300520&action=edit kernel dmesg (kernel 5.17-rc5, CONFIG_DRM_RADEON=m, Talos II) Kernel 5.17-rc5 has problems loading the radeon KMS module on my Talos II: # modprobe -v radeon insmod /lib/modules/5.17.0-rc5-P9+/kernel/drivers/gpu/drm/drm_kms_helper.ko modprobe: ERROR: could not insert 'radeon': Unknown symbol in module, or unknown parameter (see dmesg) dmesg show no further output then. When I build a kernel with CONFIG_DRM_RADEON=y I get this in dmesg: [...] ATOM BIOS: X1550 [drm] Generation 2 PCI interface, using max accessible memory radeon :01:00.0: VRAM: 512M 0x - 0x1FFF (512M used) radeon :01:00.0: GTT: 512M 0x2000 - 0x3FFF [drm] Detected VRAM RAM=512M, BAR=256M [drm] RAM width 128bits DDR radeon :01:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfff [drm] radeon: 512M of VRAM memory ready [drm] radeon: 512M of GTT memory ready. [drm] GART: num cpu pages 131072, num gpu pages 131072 [drm] radeon: 1 quad pipes, 1 z pipes initialized. [drm] PCIE GART of 512M enabled (table at 0x0004). radeon :01:00.0: WB enabled radeon :01:00.0: fence driver on ring 0 use gpu addr 0x2000 radeon :01:00.0: radeon: MSI limited to 32-bit [drm] radeon: irq initialized. 
[drm] Loading R500 Microcode radeon :01:00.0: Direct firmware load for radeon/R520_cp.bin failed with error -2 radeon_cp: Failed to load firmware "radeon/R520_cp.bin" [drm:.r100_cp_init] *ERROR* Failed to load firmware! radeon :01:00.0: failed initializing CP (-2). radeon :01:00.0: Disabling GPU acceleration So it complains about not finding the relevant firmware. But the firmware being located in everything should be ok. This used to work on kernels 5.16.x and before. # ls -al /lib/firmware/radeon/R520_cp.bin -rw-r--r-- 1 root root 2048 2. Mär 20:43 /lib/firmware/radeon/R520_cp.bin Some data about the machine: # lspci :00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4) :01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV516 [Radeon X1300/X1550 Series] :01:00.1 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] RV516 [Radeon X1300/X1550 Series] (Secondary) 0001:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4) 0001:01:00.0 Non-Volatile memory controller: Phison Electronics Corporation Device 5008 (rev 01) 0002:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4) 0003:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4) 0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02) 0004:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4) 0004:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01) 0004:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01) 0005:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4) 0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04) 0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. 
ASPEED Graphics Family (rev 41) 0030:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4) 0031:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4) 0032:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4) 0033:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4) # lspci -s :01:00.0 -vv :01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV516 [Radeon X1300/X1550 Series] (prog-if 00 [VGA controller]) Subsystem: PC Partner Limited / Sapphire Technology RV516 [Radeon X1300/X1550 Series] Device tree node: /sys/firmware/devicetree/base/pciex@600c3c000/pci@0/vga@0 Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- [disabled] Expansion ROM at 600c0 [disabled] [size=128K] Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Re: [PATCH v16 4/4] drm/bridge: dw-hdmi: fix bus formats negotiation for 8 bit modes
Hi Neil, > Am 02.03.2022 um 15:34 schrieb Neil Armstrong : > > Hi, > >> (cross-checked: RGB mode still works if I force hdmi->sink_is_hdmi = false) > > I don't understand what's wrong, can you try to make the logic select > MEDIA_BUS_FMT_YUV8_1X24 instead of DRM_COLOR_FORMAT_YCBCR422 ? I have forced hdmi->sink_is_hdmi = false and replaced /* Default 8bit RGB fallback */ - output_fmts[i++] = MEDIA_BUS_FMT_RGB888_1X24; + output_fmts[i++] = MEDIA_BUS_FMT_YUV8_1X24; And then the screen remains black. MEDIA_BUS_FMT_RGB888_1X24 works. (MEDIA_BUS_FMT_VUY8_1X24 doesn't work either). So this indicates that YUV conversion is not working properly. Maybe it is missing some special setup. What I have to test is if it works on a different monitor, in case this specific panel (a 7 inch Waveshare touch with HDMI input) is buggy and reports YUV capabilities but does not handle them... On the other hand this panel works on RasPi and OMAP5 (where I admit I do not know in which mode). > If your CSC is broken, we'll need to disable it on your platform. Indeed. So it seems as if we need a mechanism to override dw_hdmi_bridge_atomic_get_output_bus_fmts() in our ingenic-dw-hdmi platform specialization [1] to always return MEDIA_BUS_FMT_RGB888_1X24. Or alternatively set sink_is_hdmi = false there (unfortunately there is no direct access to struct dw_hdmi in a specialization driver). Is this already possible or how can it be done? BR and thanks, Nikolaus [1]: https://lore.kernel.org/all/24a27226a22adf5f5573f013e5d7d89b0ec73664.1645895582.git@goldelico.com/
Re: [next] arm64: db410c: Internal error: Oops: 96000004 - msm_gpu_pm_suspend
Should be fixed by: https://patchwork.freedesktop.org/patch/475238/?series=100448&rev=1 t-b's welcome On Wed, Mar 2, 2022 at 12:35 PM Naresh Kamboju wrote: > > [Please ignore this email if it is already reported] > > Linux next-20220302 running on Qcom db410c the following kernel crash > reported [1]. > > metadata: > git_ref: master > git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next > git_sha: adaedcf826dccf01b69d9a1f1997c9446c6b2c54 > git_describe: next-20220302 > kernel-config: > https://builds.tuxbuild.com/25pJv2XjzFav5peWxwfhaU3LFEN/config > > > Kernel crash: > > Failed to start Entropy Daemon based on the HAVEGE algorithm > [ 12.104662] Unable to handle kernel NULL pointer dereference at > virtual address 0010 > [ 12.121151] ESR = 0x9604 > [ 12.121211] EC = 0x25: DABT (current EL), IL = 32 bits > [ 12.123137] SET = 0, FnV = 0 > [ 12.128687] EA = 0, S1PTW = 0 > [ 12.131464] FSC = 0x04: level 0 translation fault > [ 12.134572] Data abort info: > [ 12.139457] ISV = 0, ISS = 0x0004 > [ 12.142566] CM = 0, WnR = 0 > [ 12.146165] user pgtable: 4k pages, 48-bit VAs, pgdp=8235d000 > [ 12.149360] [0010] pgd=, p4d= > [ 12.156339] Internal error: Oops: 9604 [#1] PREEMPT SMP > [ 12.162370] Modules linked in: videobuf2_dma_contig adv7511(+) > crct10dif_ce qcom_wcnss_pil cec qrtr qcom_q6v5_mss qcom_camss > snd_soc_lpass_apq8016 qcom_pil_info snd_soc_lpass_cpu rtc_pm8xxx > videobuf2_dma_sg qcom_spmi_vadc snd_soc_msm8916_analog qcom_q6v5 > snd_soc_msm8916_digital snd_soc_apq8016_sbc snd_soc_lpass_platform > qcom_pon v4l2_fwnode snd_soc_qcom_common qcom_spmi_temp_alarm > qcom_sysmon qcom_vadc_common venus_core msm qcom_common v4l2_async > qcom_glink_smem qmi_helpers v4l2_mem2mem videobuf2_memops mdt_loader > qnoc_msm8916 gpu_sched icc_smd_rpm display_connector drm_dp_helper > videobuf2_v4l2 drm_kms_helper qcom_stats videobuf2_common qcom_rng > i2c_qcom_cci rpmsg_char drm socinfo rmtfs_mem fuse > [ 12.207393] CPU: 0 PID: 66 Comm: kworker/0:4 Not tainted > 
5.17.0-rc6-next-20220302 #1 > [ 12.207407] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > [ 12.207415] Workqueue: pm pm_runtime_work > [ 12.243952] pstate: 6005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 12.247862] pc : hrtimer_active+0x14/0x80 > [ 12.254628] lr : hrtimer_cancel+0x28/0x70 > [ 12.258795] sp : 8b423b70 > [ 12.262786] x29: 8b423b70 x28: x27: > > [ 12.266092] x26: 8ad5d2e0 x25: 0002d138d917 x24: > 0d8dbb80 > [ 12.273210] x23: 0326c010 x22: 0f6b2020 x21: > 0f6b2000 > [ 12.280328] x20: 0f6b20f8 x19: 0f6b2318 x18: > > [ 12.287447] x17: 800035bf3000 x16: 88004000 x15: > 4000 > [ 12.294564] x14: x13: x12: > 8a8b7000 > [ 12.301682] x11: 087facb61180 x10: 0bc0 x9 : > 881d3a78 > [ 12.308800] x8 : 03c68000 x7 : 0018 x6 : > 1483ced5 > [ 12.315918] x5 : 00ff x4 : x3 : > > [ 12.323036] x2 : x1 : x0 : > 0f6b2318 > [ 12.330156] Call trace: > [ 12.337263] hrtimer_active+0x14/0x80 > [ 12.339524] msm_devfreq_suspend+0x30/0x60 [msm] > [ 12.343348] msm_gpu_pm_suspend+0x44/0x144 [msm] > [ 12.348035] adreno_suspend+0x6c/0x174 [msm] > [ 12.352634] pm_generic_runtime_suspend+0x38/0x50 > [ 12.356885] genpd_runtime_suspend+0xb4/0x314 > [ 12.361487] __rpm_callback+0x50/0x180 > [ 12.365824] rpm_callback+0x74/0x80 > [ 12.369470] rpm_suspend+0x110/0x634 > [ 12.372856] pm_runtime_work+0xd0/0xd4 > [ 12.376676] process_one_work+0x1dc/0x450 > [ 12.380235] worker_thread+0x154/0x450 > [ 12.384314] kthread+0x100/0x110 > [ 12.387959] ret_from_fork+0x10/0x20 > [ 12.391351] Code: aa1e03e9 d503201f d503233f f9401802 (b9401041) > [ 12.394913] ---[ end trace ]--- > > Reported-by: Linux Kernel Functional Testing > > -- > Linaro LKFT > https://lkft.linaro.org > [1] https://lkft.validation.linaro.org/scheduler/job/4643232#L2396
Re: [PATCH v2 2/3] drm/i915: Remove the vma refcount
On Wed, Mar 02, 2022 at 11:21:59AM +0100, Thomas Hellström wrote: Now that i915_vma_parked() is taking the object lock on vma destruction, and the only user of the vma refcount, i915_gem_object_unbind() also takes the object lock, remove the vma refcount. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/i915_gem.c | 17 + drivers/gpu/drm/i915/i915_vma.c | 14 +++--- drivers/gpu/drm/i915/i915_vma.h | 14 -- drivers/gpu/drm/i915/i915_vma_types.h | 1 - 4 files changed, 16 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index dd84ebabb50f..c26110abcc0b 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -151,14 +151,25 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj, break; } + /* +* Requiring the vm destructor to take the object lock +* before destroying a vma would help us eliminate the +* i915_vm_tryget() here, AND thus also the barrier stuff +* at the end. That's an easy fix, but sleeping locks in +* a kthread should generally be avoided. +*/ ret = -EAGAIN; if (!i915_vm_tryget(vma->vm)) break; - /* Prevent vma being freed by i915_vma_parked as we unbind */ - vma = __i915_vma_get(vma); spin_unlock(&obj->vma.lock); + /* +* Since i915_vma_parked() takes the object lock +* before vma destruction, it won't race us here, +* and destroy the vma from under us. +*/ + if (vma) { bool vm_trylock = !!(flags & I915_GEM_OBJECT_UNBIND_VM_TRYLOCK); ret = -EBUSY; @@ -180,8 +191,6 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj, ret = i915_vma_unbind(vma); } } - - __i915_vma_put(vma); } i915_vm_put(vma->vm); One issue still left in i915_gem_object_unbind is that it temporarily removes vmas from the obj->vma.list and adds back later as vma needs to be unbind outside the obj->vma.lock spinlock. This is an issue for other places where we iterate over the obj->vma.list. 
i915_debugfs_describe_obj is one such case (upcoming vm_bind will be another) that iterates over this list. What is the plan here? Do we need to take object lock while iterating over the list? But this just something I noticed and not related to this patch. This patch looks good to me. Reviewed-by: Niranjana Vishwanathapura diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 91538bc38110..6fd25b39748f 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -122,7 +122,6 @@ vma_create(struct drm_i915_gem_object *obj, if (vma == NULL) return ERR_PTR(-ENOMEM); - kref_init(&vma->ref); vma->ops = &vm->vma_ops; vma->obj = obj; vma->size = obj->base.size; @@ -1628,15 +1627,6 @@ void i915_vma_reopen(struct i915_vma *vma) __i915_vma_remove_closed(vma); } -void i915_vma_release(struct kref *ref) -{ - struct i915_vma *vma = container_of(ref, typeof(*vma), ref); - - i915_active_fini(&vma->active); - GEM_WARN_ON(vma->resource); - i915_vma_free(vma); -} - static void force_unbind(struct i915_vma *vma) { if (!drm_mm_node_allocated(&vma->node)) @@ -1665,7 +1655,9 @@ static void release_references(struct i915_vma *vma, bool vm_ddestroy) if (vm_ddestroy) i915_vm_resv_put(vma->vm); - __i915_vma_put(vma); + i915_active_fini(&vma->active); + GEM_WARN_ON(vma->resource); + i915_vma_free(vma); } /** diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 67ae7341c7e0..6034991d89fe 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -222,20 +222,6 @@ void i915_vma_unlink_ctx(struct i915_vma *vma); void i915_vma_close(struct i915_vma *vma); void i915_vma_reopen(struct i915_vma *vma); -static inline struct i915_vma *__i915_vma_get(struct i915_vma *vma) -{ - if (kref_get_unless_zero(&vma->ref)) - return vma; - - return NULL; -} - -void i915_vma_release(struct kref *ref); -static inline void __i915_vma_put(struct i915_vma *vma) -{ - kref_put(&vma->ref, 
i915_vma_release); -} - void i915_vma_destroy_locked(struct i915_vma *vma); void i915_vma_destroy(struct i915_vma *vma); diff --git a/drivers/gpu/drm/i915/i915_vma_types.h b/drivers/gpu/drm/i915/i915_vma_types.h index eac36be184e5..be6e028c3b57 100644 --- a/drivers/gpu/drm/i915/i915_vma_types.h +++ b/d
Re: [Intel-gfx] [PATCH] drm/i915: avoid concurrent writes to aux_inv
Quoting fei.y...@intel.com (2022-03-02 18:26:57) > From: Fei Yang > > GPU hangs have been observed when multiple engines write to the > same aux_inv register at the same time. To avoid this each engine > should only invalidate its own auxiliary table. The function > gen12_emit_flush_xcs() currently invalidate the auxiliary table for > all engines because the rq->engine is not necessarily the engine > eventually carrying out the request, and potentially the engine > could even be a virtual one (with engine->instance being -1). > With this patch, auxiliary table invalidation is done only for the > engine executing the request. And the mmio address for the aux_inv > register is set after the engine instance becomes certain. > > Signed-off-by: Chris Wilson > Signed-off-by: Fei Yang > --- > drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 41 --- > .../drm/i915/gt/intel_execlists_submission.c | 38 + > drivers/gpu/drm/i915/i915_request.h | 2 + > 3 files changed, 47 insertions(+), 34 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c > b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c > index b1b9c3fd7bf9..af62e2bc2c9b 100644 > --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c > +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c > @@ -165,30 +165,6 @@ static u32 preparser_disable(bool state) > return MI_ARB_CHECK | 1 << 8 | state; > } > > -static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine) > -{ > - static const i915_reg_t vd[] = { > - GEN12_VD0_AUX_NV, > - GEN12_VD1_AUX_NV, > - GEN12_VD2_AUX_NV, > - GEN12_VD3_AUX_NV, > - }; > - > - static const i915_reg_t ve[] = { > - GEN12_VE0_AUX_NV, > - GEN12_VE1_AUX_NV, > - }; > - > - if (engine->class == VIDEO_DECODE_CLASS) > - return vd[engine->instance]; > - > - if (engine->class == VIDEO_ENHANCEMENT_CLASS) > - return ve[engine->instance]; > - > - GEM_BUG_ON("unknown aux_inv reg\n"); > - return INVALID_MMIO_REG; > -} > - > static u32 *gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs) > { > *cs++ = 
MI_LOAD_REGISTER_IMM(1); > @@ -288,7 +264,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 > mode) > if (mode & EMIT_INVALIDATE) > aux_inv = rq->engine->mask & ~BIT(BCS0); > if (aux_inv) > - cmd += 2 * hweight32(aux_inv) + 2; > + cmd += 4; > > cs = intel_ring_begin(rq, cmd); > if (IS_ERR(cs)) > @@ -319,16 +295,13 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 > mode) > *cs++ = 0; /* value */ > > if (aux_inv) { /* hsdes: 1809175790 */ > - struct intel_engine_cs *engine; > - unsigned int tmp; > - > - *cs++ = MI_LOAD_REGISTER_IMM(hweight32(aux_inv)); > - for_each_engine_masked(engine, rq->engine->gt, aux_inv, tmp) { > - *cs++ = i915_mmio_reg_offset(aux_inv_reg(engine)); > - *cs++ = AUX_INV; > - } > + *cs++ = MI_LOAD_REGISTER_IMM(1); > + rq->vd_ve_aux_inv = cs; > + *cs++ = 0; /* address to be set at submission to HW */ > + *cs++ = AUX_INV; > *cs++ = MI_NOOP; > - } > + } else > + rq->vd_ve_aux_inv = NULL; > > if (mode & EMIT_INVALIDATE) > *cs++ = preparser_disable(false); > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > index 1c602d4ae297..a018de6dcac5 100644 > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > @@ -1258,6 +1258,34 @@ static bool completed(const struct i915_request *rq) > return __i915_request_is_complete(rq); > } > > +static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine) > +{ > + static const i915_reg_t vd[] = { > + GEN12_VD0_AUX_NV, > + GEN12_VD1_AUX_NV, > + GEN12_VD2_AUX_NV, > + GEN12_VD3_AUX_NV, > + }; > + > + static const i915_reg_t ve[] = { > + GEN12_VE0_AUX_NV, > + GEN12_VE1_AUX_NV, > + }; > + > + if (engine->class == VIDEO_DECODE_CLASS) { > + GEM_BUG_ON(engine->instance >= ARRAY_SIZE(vd)); > + return vd[engine->instance]; > + } > + > + if (engine->class == VIDEO_ENHANCEMENT_CLASS) { > + GEM_BUG_ON(engine->instance >= ARRAY_SIZE(ve)); > + return ve[engine->instance]; 
> + } > + > + GEM_BUG_ON("unknown aux_inv reg\n"); > + return INVALID_MMIO_REG; > +} > + > static void execlists_dequeue(struct intel_engine_cs *engine) > { > struct intel_engine_execlists * const execlists = &
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Wed, Mar 02, 2022 at 12:18:45PM -0800, Linus Torvalds wrote: > On Wed, Mar 2, 2022 at 12:07 PM Kees Cook wrote: > > > > I've long wanted to change kfree() to explicitly set pointers to NULL on > > free. https://github.com/KSPP/linux/issues/87 > > We've had this discussion with the gcc people in the past, and gcc > actually has some support for it, but it's sadly tied to the actual > function name (ie gcc has some special-casing for "free()") > > See > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94527 > > for some of that discussion. > > Oh, and I see some patch actually got merged since I looked there last > so that you can mark "deallocator" functions, but I think it's only > for the context matching, not for actually killing accesses to the > pointer afterwards. Ah! I missed that getting added in GCC 11. But yes, there it is: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-malloc-function-attribute Hah, now we may need to split __malloc from __alloc_size. ;) I'd still like the NULL assignment behavior, though, since some things can easily avoid static analysis. -- Kees Cook
[next] arm64: db410c: Internal error: Oops: 96000004 - msm_gpu_pm_suspend
[Please ignore this email if it is already reported] Linux next-20220302 running on Qcom db410c the following kernel crash reported [1]. metadata: git_ref: master git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next git_sha: adaedcf826dccf01b69d9a1f1997c9446c6b2c54 git_describe: next-20220302 kernel-config: https://builds.tuxbuild.com/25pJv2XjzFav5peWxwfhaU3LFEN/config Kernel crash: Failed to start Entropy Daemon based on the HAVEGE algorithm [ 12.104662] Unable to handle kernel NULL pointer dereference at virtual address 0010 [ 12.121151] ESR = 0x9604 [ 12.121211] EC = 0x25: DABT (current EL), IL = 32 bits [ 12.123137] SET = 0, FnV = 0 [ 12.128687] EA = 0, S1PTW = 0 [ 12.131464] FSC = 0x04: level 0 translation fault [ 12.134572] Data abort info: [ 12.139457] ISV = 0, ISS = 0x0004 [ 12.142566] CM = 0, WnR = 0 [ 12.146165] user pgtable: 4k pages, 48-bit VAs, pgdp=8235d000 [ 12.149360] [0010] pgd=, p4d= [ 12.156339] Internal error: Oops: 9604 [#1] PREEMPT SMP [ 12.162370] Modules linked in: videobuf2_dma_contig adv7511(+) crct10dif_ce qcom_wcnss_pil cec qrtr qcom_q6v5_mss qcom_camss snd_soc_lpass_apq8016 qcom_pil_info snd_soc_lpass_cpu rtc_pm8xxx videobuf2_dma_sg qcom_spmi_vadc snd_soc_msm8916_analog qcom_q6v5 snd_soc_msm8916_digital snd_soc_apq8016_sbc snd_soc_lpass_platform qcom_pon v4l2_fwnode snd_soc_qcom_common qcom_spmi_temp_alarm qcom_sysmon qcom_vadc_common venus_core msm qcom_common v4l2_async qcom_glink_smem qmi_helpers v4l2_mem2mem videobuf2_memops mdt_loader qnoc_msm8916 gpu_sched icc_smd_rpm display_connector drm_dp_helper videobuf2_v4l2 drm_kms_helper qcom_stats videobuf2_common qcom_rng i2c_qcom_cci rpmsg_char drm socinfo rmtfs_mem fuse [ 12.207393] CPU: 0 PID: 66 Comm: kworker/0:4 Not tainted 5.17.0-rc6-next-20220302 #1 [ 12.207407] Hardware name: Qualcomm Technologies, Inc. 
APQ 8016 SBC (DT) [ 12.207415] Workqueue: pm pm_runtime_work [ 12.243952] pstate: 6005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 12.247862] pc : hrtimer_active+0x14/0x80 [ 12.254628] lr : hrtimer_cancel+0x28/0x70 [ 12.258795] sp : 8b423b70 [ 12.262786] x29: 8b423b70 x28: x27: [ 12.266092] x26: 8ad5d2e0 x25: 0002d138d917 x24: 0d8dbb80 [ 12.273210] x23: 0326c010 x22: 0f6b2020 x21: 0f6b2000 [ 12.280328] x20: 0f6b20f8 x19: 0f6b2318 x18: [ 12.287447] x17: 800035bf3000 x16: 88004000 x15: 4000 [ 12.294564] x14: x13: x12: 8a8b7000 [ 12.301682] x11: 087facb61180 x10: 0bc0 x9 : 881d3a78 [ 12.308800] x8 : 03c68000 x7 : 0018 x6 : 1483ced5 [ 12.315918] x5 : 00ff x4 : x3 : [ 12.323036] x2 : x1 : x0 : 0f6b2318 [ 12.330156] Call trace: [ 12.337263] hrtimer_active+0x14/0x80 [ 12.339524] msm_devfreq_suspend+0x30/0x60 [msm] [ 12.343348] msm_gpu_pm_suspend+0x44/0x144 [msm] [ 12.348035] adreno_suspend+0x6c/0x174 [msm] [ 12.352634] pm_generic_runtime_suspend+0x38/0x50 [ 12.356885] genpd_runtime_suspend+0xb4/0x314 [ 12.361487] __rpm_callback+0x50/0x180 [ 12.365824] rpm_callback+0x74/0x80 [ 12.369470] rpm_suspend+0x110/0x634 [ 12.372856] pm_runtime_work+0xd0/0xd4 [ 12.376676] process_one_work+0x1dc/0x450 [ 12.380235] worker_thread+0x154/0x450 [ 12.384314] kthread+0x100/0x110 [ 12.387959] ret_from_fork+0x10/0x20 [ 12.391351] Code: aa1e03e9 d503201f d503233f f9401802 (b9401041) [ 12.394913] ---[ end trace ]--- Reported-by: Linux Kernel Functional Testing -- Linaro LKFT https://lkft.linaro.org [1] https://lkft.validation.linaro.org/scheduler/job/4643232#L2396
Re: [PATCH v2 1/3] drm/i915: Remove the vm open count
On Wed, Mar 02, 2022 at 11:21:58AM +0100, Thomas Hellström wrote: vms are not getting properly closed. Rather than fixing that, remove the vm open count and instead rely on the vm refcount. The vm open count existed solely to break the strong references the vmas had on the vms. Now instead make those references weak and ensure vmas are destroyed when the vm is destroyed. Unfortunately, if the vm destructor and the object destructor both want to destroy a vma, that may lead to a race in that the vm destructor just unbinds the vma and leaves the actual vma destruction to the object destructor. However, in order for the object destructor to ensure the vma is unbound it needs to grab the vm mutex. In order to keep the vm mutex alive until the object destructor is done with it, somewhat hackishly grab a vm_resv refcount that is released late in the vma destruction process, when the vm mutex is no longer needed. v2: Address review comments from Niranjana - Clarify that the struct i915_address_space::skip_pte_rewrite is a hack and should ideally be replaced in an upcoming patch. - Remove an unneeded continue in clear_vm_list and update comment. 
Co-developed-by: Niranjana Vishwanathapura Signed-off-by: Niranjana Vishwanathapura Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/display/intel_dpt.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_context.c | 29 ++- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 6 ++ .../gpu/drm/i915/gem/selftests/mock_context.c | 5 +- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 2 +- drivers/gpu/drm/i915/gt/intel_ggtt.c | 30 +++ drivers/gpu/drm/i915/gt/intel_gtt.c | 54 drivers/gpu/drm/i915/gt/intel_gtt.h | 56 drivers/gpu/drm/i915/gt/selftest_execlists.c | 86 +-- drivers/gpu/drm/i915/i915_gem.c | 6 +- drivers/gpu/drm/i915/i915_vma.c | 55 drivers/gpu/drm/i915/i915_vma_resource.c | 2 +- drivers/gpu/drm/i915/i915_vma_resource.h | 6 ++ drivers/gpu/drm/i915/i915_vma_types.h | 7 ++ drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 4 +- 15 files changed, 186 insertions(+), 164 deletions(-) Looks good to me. Reviewed-by: Niranjana Vishwanathapura
Re: [PATCH v2 3/3] drm/i915/gem: Remove some unnecessary code
On Wed, Mar 02, 2022 at 11:22:00AM +0100, Thomas Hellström wrote: The test for vma should always return true, and when assigning -EBUSY to ret, the variable should already have that value. Signed-off-by: Thomas Hellström --- drivers/gpu/drm/i915/i915_gem.c | 32 ++-- 1 file changed, 14 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index c26110abcc0b..9747924cc57b 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -118,6 +118,7 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj, unsigned long flags) { struct intel_runtime_pm *rpm = &to_i915(obj->base.dev)->runtime_pm; + bool vm_trylock = !!(flags & I915_GEM_OBJECT_UNBIND_VM_TRYLOCK); LIST_HEAD(still_in_list); intel_wakeref_t wakeref; struct i915_vma *vma; @@ -170,26 +171,21 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj, * and destroy the vma from under us. */ - if (vma) { - bool vm_trylock = !!(flags & I915_GEM_OBJECT_UNBIND_VM_TRYLOCK); - ret = -EBUSY; - if (flags & I915_GEM_OBJECT_UNBIND_ASYNC) { - assert_object_held(vma->obj); - ret = i915_vma_unbind_async(vma, vm_trylock); - } + ret = -EBUSY; + if (flags & I915_GEM_OBJECT_UNBIND_ASYNC) { + assert_object_held(vma->obj); + ret = i915_vma_unbind_async(vma, vm_trylock); + } - if (ret == -EBUSY && (flags & I915_GEM_OBJECT_UNBIND_ACTIVE || - !i915_vma_is_active(vma))) { - if (vm_trylock) { - if (mutex_trylock(&vma->vm->mutex)) { - ret = __i915_vma_unbind(vma); - mutex_unlock(&vma->vm->mutex); - } else { - ret = -EBUSY; - } - } else { - ret = i915_vma_unbind(vma); + if (ret == -EBUSY && (flags & I915_GEM_OBJECT_UNBIND_ACTIVE || + !i915_vma_is_active(vma))) { + if (vm_trylock) { + if (mutex_trylock(&vma->vm->mutex)) { + ret = __i915_vma_unbind(vma); + mutex_unlock(&vma->vm->mutex); } + } else { + ret = i915_vma_unbind(vma); } } Looks good to me. Reviewed-by: Niranjana Vishwanathapura -- 2.34.1
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Wed, Mar 2, 2022 at 12:07 PM Kees Cook wrote: > > I've long wanted to change kfree() to explicitly set pointers to NULL on > free. https://github.com/KSPP/linux/issues/87 We've had this discussion with the gcc people in the past, and gcc actually has some support for it, but it's sadly tied to the actual function name (ie gcc has some special-casing for "free()") See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94527 for some of that discussion. Oh, and I see some patch actually got merged since I looked there last so that you can mark "deallocator" functions, but I think it's only for the context matching, not for actually killing accesses to the pointer afterwards. Linus
Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Wed, Mar 02, 2022 at 10:29:31AM +0100, Rasmus Villemoes wrote: > This won't help the current issue (because it doesn't exist and might > never), but just in case some compiler people are listening, I'd like to > have some sort of way to tell the compiler "treat this variable as > uninitialized from here on". So one could do > > #define kfree(p) do { __kfree(p); __magic_uninit(p); } while (0) > > with __magic_uninit being a magic no-op that doesn't affect the > semantics of the code, but could be used by the compiler's "[is/may be] > used uninitialized" machinery to flag e.g. double frees on some odd > error path etc. It would probably only work for local automatic > variables, but it should be possible to just ignore the hint if p is > some expression like foo->bar or has side effects. If we had that, the > end-of-loop test could include that to "uninitialize" the iterator. I've long wanted to change kfree() to explicitly set pointers to NULL on free. https://github.com/KSPP/linux/issues/87 The thing stopping a trivial transformation of kfree() is: kfree(get_some_pointer()); I would argue, though, that the above is poor form: the thing holding the pointer should be the thing freeing it, so these cases should be refactored and kfree() could do the NULLing by default. Quoting myself in the above issue: Without doing massive tree-wide changes, I think we need compiler support. If we had something like __builtin_is_lvalue(), we could distinguish function returns from lvalues. 
For example, right now a common case are things like: kfree(get_some_ptr()); But if we could at least gain coverage of the lvalue cases, and detect them statically at compile-time, we could do: #define __kfree_and_null(x) do { __kfree(*x); *x = NULL; } while (0) #define kfree(x) __builtin_choose_expr(__builtin_is_lvalue(x), __kfree_and_null(&(x)), __kfree(x)) Alternatively, we could do a tree-wide change of the former case (findable with Coccinelle) and change them into something like kfree_no_null() and redefine kfree() itself: #define kfree_no_null(x) do { void *__ptr = (x); __kfree(__ptr); } while (0) #define kfree(x) do { __kfree(x); x = NULL; } while (0) -- Kees Cook
Re: [PATCH] simpldrm: Enable boot time VESA graphic mode selection.
Hello, On 3/2/22 20:38, Michal Suchánek wrote: > Hello, > > On Wed, Mar 02, 2022 at 08:31:25PM +0100, Thomas Zimmermann wrote: >> Hi, >> >> is this ready to be merged? > > The objections raised so far have been addressed in v4. > > I think this is good to merge. > The v4 patches look good to me, and I have provided my Reviewed-by to all of them. > Thanks > > Michal > >> -- Best regards, Javier Martinez Canillas Linux Engineering Red Hat
Re: [PATCH] simpldrm: Enable boot time VESA graphic mode selection.
Hello, On Wed, Mar 02, 2022 at 08:31:25PM +0100, Thomas Zimmermann wrote: > Hi, > > is this ready to be merged? The objections raised so far have been addressed in v4. I think this is good to merge. Thanks Michal > > Best regards > Thomas > > Am 18.02.22 um 10:33 schrieb Michal Suchanek: > > Since switch to simpledrm VESA graphic modes are no longer available > > with legacy BIOS. > > > > The x86 realmode boot code enables the VESA graphic modes when option > > FB_BOOT_VESA_SUPPORT is enabled. > > > > To enable use of VESA modes with simpledrm in legacy BIOS boot mode drop > > dependency of BOOT_VESA_SUPPORT on FB, also drop the FB_ prefix, and > > select the option when simpledrm is built-in on x86. > > > > Fixes: 11e8f5fd223b ("drm: Add simpledrm driver") > > Signed-off-by: Michal Suchanek > > --- > > arch/x86/boot/video-vesa.c | 4 ++-- > > drivers/gpu/drm/tiny/Kconfig | 1 + > > drivers/video/fbdev/Kconfig | 9 - > > 3 files changed, 7 insertions(+), 7 deletions(-) > > > > diff --git a/arch/x86/boot/video-vesa.c b/arch/x86/boot/video-vesa.c > > index 7e185977a984..c2c6d35e3a43 100644 > > --- a/arch/x86/boot/video-vesa.c > > +++ b/arch/x86/boot/video-vesa.c > > @@ -83,7 +83,7 @@ static int vesa_probe(void) > >(vminfo.memory_layout == 4 || > > vminfo.memory_layout == 6) && > >vminfo.memory_planes == 1) { > > -#ifdef CONFIG_FB_BOOT_VESA_SUPPORT > > +#ifdef CONFIG_BOOT_VESA_SUPPORT > > /* Graphics mode, color, linear frame buffer > >supported. 
Only register the mode if > >if framebuffer is configured, however, > > @@ -121,7 +121,7 @@ static int vesa_set_mode(struct mode_info *mode) > > if ((vminfo.mode_attr & 0x15) == 0x05) { > > /* It's a supported text mode */ > > is_graphic = 0; > > -#ifdef CONFIG_FB_BOOT_VESA_SUPPORT > > +#ifdef CONFIG_BOOT_VESA_SUPPORT > > } else if ((vminfo.mode_attr & 0x99) == 0x99) { > > /* It's a graphics mode with linear frame buffer */ > > is_graphic = 1; > > diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig > > index 712e0004e96e..1bc30c64ed15 100644 > > --- a/drivers/gpu/drm/tiny/Kconfig > > +++ b/drivers/gpu/drm/tiny/Kconfig > > @@ -54,6 +54,7 @@ config DRM_GM12U320 > > config DRM_SIMPLEDRM > > tristate "Simple framebuffer driver" > > depends on DRM && MMU > > + select BOOT_VESA_SUPPORT if X86 && DRM_SIMPLEDRM = y > > select DRM_GEM_SHMEM_HELPER > > select DRM_KMS_HELPER > > help > > diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig > > index 6ed5e608dd04..4f3be9b7a520 100644 > > --- a/drivers/video/fbdev/Kconfig > > +++ b/drivers/video/fbdev/Kconfig > > @@ -66,9 +66,8 @@ config FB_DDC > > select I2C_ALGOBIT > > select I2C > > -config FB_BOOT_VESA_SUPPORT > > +config BOOT_VESA_SUPPORT > > bool > > - depends on FB > > help > > If true, at least one selected framebuffer driver can take advantage > > of VESA video modes set at an early boot stage via the vga= parameter. > > @@ -627,7 +626,7 @@ config FB_VESA > > select FB_CFB_FILLRECT > > select FB_CFB_COPYAREA > > select FB_CFB_IMAGEBLIT > > - select FB_BOOT_VESA_SUPPORT > > + select BOOT_VESA_SUPPORT > > help > > This is the frame buffer device driver for generic VESA 2.0 > > compliant graphic cards. The older VESA 1.2 cards are not supported. 
> > @@ -1051,7 +1050,7 @@ config FB_INTEL > > select FB_CFB_FILLRECT > > select FB_CFB_COPYAREA > > select FB_CFB_IMAGEBLIT > > - select FB_BOOT_VESA_SUPPORT if FB_INTEL = y > > + select BOOT_VESA_SUPPORT if FB_INTEL = y > > depends on !DRM_I915 > > help > > This driver supports the on-board graphics built in to the Intel > > @@ -1378,7 +1377,7 @@ config FB_SIS > > select FB_CFB_FILLRECT > > select FB_CFB_COPYAREA > > select FB_CFB_IMAGEBLIT > > - select FB_BOOT_VESA_SUPPORT if FB_SIS = y > > + select BOOT_VESA_SUPPORT if FB_SIS = y > > select FB_SIS_300 if !FB_SIS_315 > > help > > This is the frame buffer device driver for the SiS 300, 315, 330 > > -- > Thomas Zimmermann > Graphics Driver Developer > SUSE Software Solutions Germany GmbH > Maxfeldstr. 5, 90409 Nürnberg, Germany > (HRB 36809, AG Nürnberg) > Geschäftsführer: Ivo Totev
Re: [PATCH] simpldrm: Enable boot time VESA graphic mode selection.
Hi, is this ready to be merged? Best regards Thomas Am 18.02.22 um 10:33 schrieb Michal Suchanek: Since switch to simpledrm VESA graphic modes are no longer available with legacy BIOS. The x86 realmode boot code enables the VESA graphic modes when option FB_BOOT_VESA_SUPPORT is enabled. To enable use of VESA modes with simpledrm in legacy BIOS boot mode drop dependency of BOOT_VESA_SUPPORT on FB, also drop the FB_ prefix, and select the option when simpledrm is built-in on x86. Fixes: 11e8f5fd223b ("drm: Add simpledrm driver") Signed-off-by: Michal Suchanek --- arch/x86/boot/video-vesa.c | 4 ++-- drivers/gpu/drm/tiny/Kconfig | 1 + drivers/video/fbdev/Kconfig | 9 - 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/x86/boot/video-vesa.c b/arch/x86/boot/video-vesa.c index 7e185977a984..c2c6d35e3a43 100644 --- a/arch/x86/boot/video-vesa.c +++ b/arch/x86/boot/video-vesa.c @@ -83,7 +83,7 @@ static int vesa_probe(void) (vminfo.memory_layout == 4 || vminfo.memory_layout == 6) && vminfo.memory_planes == 1) { -#ifdef CONFIG_FB_BOOT_VESA_SUPPORT +#ifdef CONFIG_BOOT_VESA_SUPPORT /* Graphics mode, color, linear frame buffer supported. 
Only register the mode if if framebuffer is configured, however, @@ -121,7 +121,7 @@ static int vesa_set_mode(struct mode_info *mode) if ((vminfo.mode_attr & 0x15) == 0x05) { /* It's a supported text mode */ is_graphic = 0; -#ifdef CONFIG_FB_BOOT_VESA_SUPPORT +#ifdef CONFIG_BOOT_VESA_SUPPORT } else if ((vminfo.mode_attr & 0x99) == 0x99) { /* It's a graphics mode with linear frame buffer */ is_graphic = 1; diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig index 712e0004e96e..1bc30c64ed15 100644 --- a/drivers/gpu/drm/tiny/Kconfig +++ b/drivers/gpu/drm/tiny/Kconfig @@ -54,6 +54,7 @@ config DRM_GM12U320 config DRM_SIMPLEDRM tristate "Simple framebuffer driver" depends on DRM && MMU + select BOOT_VESA_SUPPORT if X86 && DRM_SIMPLEDRM = y select DRM_GEM_SHMEM_HELPER select DRM_KMS_HELPER help diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig index 6ed5e608dd04..4f3be9b7a520 100644 --- a/drivers/video/fbdev/Kconfig +++ b/drivers/video/fbdev/Kconfig @@ -66,9 +66,8 @@ config FB_DDC select I2C_ALGOBIT select I2C -config FB_BOOT_VESA_SUPPORT +config BOOT_VESA_SUPPORT bool - depends on FB help If true, at least one selected framebuffer driver can take advantage of VESA video modes set at an early boot stage via the vga= parameter. @@ -627,7 +626,7 @@ config FB_VESA select FB_CFB_FILLRECT select FB_CFB_COPYAREA select FB_CFB_IMAGEBLIT - select FB_BOOT_VESA_SUPPORT + select BOOT_VESA_SUPPORT help This is the frame buffer device driver for generic VESA 2.0 compliant graphic cards. The older VESA 1.2 cards are not supported. 
@@ -1051,7 +1050,7 @@ config FB_INTEL select FB_CFB_FILLRECT select FB_CFB_COPYAREA select FB_CFB_IMAGEBLIT - select FB_BOOT_VESA_SUPPORT if FB_INTEL = y + select BOOT_VESA_SUPPORT if FB_INTEL = y depends on !DRM_I915 help This driver supports the on-board graphics built in to the Intel @@ -1378,7 +1377,7 @@ config FB_SIS select FB_CFB_FILLRECT select FB_CFB_COPYAREA select FB_CFB_IMAGEBLIT - select FB_BOOT_VESA_SUPPORT if FB_SIS = y + select BOOT_VESA_SUPPORT if FB_SIS = y select FB_SIS_300 if !FB_SIS_315 help This is the frame buffer device driver for the SiS 300, 315, 330 -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH v3 0/5] fbdev: Improve performance of fbdev console
Hi, merged with fixes for the typos in the final patch. Thanks for reviewing. Best regards Thomas On 23.02.22 at 20:37, Thomas Zimmermann wrote: Optimize performance of the fbdev console for the common case of software-based clearing and image blitting. The commit description of each patch contains results of a simple microbenchmark. I also tested the full patchset's effect on the console output by printing directory listings (i7-4790, FullHD, simpledrm, kernel with debugging). > time find /usr/share/doc -type f In the unoptimized case: real 0m6.173s user 0m0.044s sys 0m6.107s With optimizations applied: real 0m4.754s user 0m0.044s sys 0m4.698s In the optimized case, printing the directory listing is ~25% faster than before. In v2 of the patchset, after implementing Sam's suggestion to update cfb_imageblit() as well, it turns out that the compiled code in sys_imageblit() is still significantly slower than the CFB version. A fix is probably a larger task and would include architecture-specific changes. A new TODO item suggests investigating the performance of the various helpers and format-conversion functions in DRM and fbdev. v3: * fix description of cfb_imageblit() patch (Pekka) v2: * improve readability for sys_imageblit() (Gerd, Sam) * new TODO item for further optimization Thomas Zimmermann (5): fbdev: Improve performance of sys_fillrect() fbdev: Improve performance of sys_imageblit() fbdev: Remove trailing whitespaces from cfbimgblt.c fbdev: Improve performance of cfb_imageblit() drm: Add TODO item for optimizing format helpers Documentation/gpu/todo.rst | 22 + drivers/video/fbdev/core/cfbimgblt.c | 107 - drivers/video/fbdev/core/sysfillrect.c | 16 +--- drivers/video/fbdev/core/sysimgblt.c | 49 --- 4 files changed, 133 insertions(+), 61 deletions(-) -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 
5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range
On 2022-03-02 15:55, Michael Cheng wrote: Thanks for the feedback Robin! Sorry my choices of word weren't that great, but what I meant is to understand how ARM flushes a range of dcache for device drivers, and not an equal to x86 clflush. I believe the concern is if the CPU writes an update, that update might only be sitting in the CPU cache and never make it to device memory where the device can see it; there are specific places that we are supposed to flush the CPU caches to make sure our updates are visible to the hardware. Ah, OK, if it's more about ordering, and it's actually write buffers rather than caches that you care about flushing, then we might be a lot safer, phew! For a very simple overview, in a case where the device itself needs to observe memory writes in the correct order, e.g.: data_descriptor.valid = 1; clflush(&data_descriptor); command_descriptor.data = &data_descriptor writel(/* control register to read command to then read data */) then dma_wmb() between the first two writes should be the right tool to ensure that the command does not observe the command update while the data update is still sat somewhere in a CPU write buffer. If you want a slightly stronger notion that, at a given point, all prior writes have actually been issued and should now be visible (rather than just that they won't become visible in the wrong order whenever they do), then wmb() should suffice on arm64. Note that wioth arm64 memory types, a Non-Cacheable mapping of DRAM for a non-coherent DMA mapping, or of VRAM in a prefetchable BAR, can still be write-buffered, so barriers still matter even when actual cache maintenance ops don't (and as before if you're trying to perform cache maintenance outside the DMA API then you've already lost anyway). 
MMIO registers should be mapped as Device memory via ioremap(), which is not bufferable, hence the barrier implicit in writel() effectively pushes out any prior buffered writes ahead of a register write, which is why we don't need to worry about this most of the time. This is only a very rough overview, though, and I'm not familiar enough with x86 semantics, your hardware, or the exact use-case to be able to say whether barriers alone are anywhere near the right answer or not. Robin. +Matt Roper Matt, Lucas, any feed back here? On 2022-03-02 4:49 a.m., Robin Murphy wrote: On 2022-02-25 19:27, Michael Cheng wrote: Hi Robin, [ +arm64 maintainers for their awareness, which would have been a good thing to do from the start ] * Thanks for adding the arm64 maintainer and sorry I didn't rope them in sooner. Why does i915 need to ensure the CPU's instruction cache is coherent with its data cache? Is it a self-modifying driver? * Also thanks for pointing this out. Initially I was using dcache_clean_inval_poc, which seem to be the equivalently to what x86 is doing for dcache flushing, but it was giving me build errors since its not on the global list of kernel symbols. And after revisiting the documentation for caches_clean_inval_pou, it won't fly for what we are trying to do. Moving forward, what would you (or someone in the ARM community) suggest we do? Could it be possible to export dcache_clean_inval_poc as a global symbol? Unlikely, unless something with a legitimate need for CPU-centric cache maintenance like kexec or CPU hotplug ever becomes modular. In the case of a device driver, it's not even the basic issues of assuming to find direct equivalents to x86 semantics in other CPU architectures, or effectively reinventing parts of the DMA API, it's even bigger than that. Once you move from being integrated in a single vendor's system architecture to being on a discrete card, you fundamentally *no longer have any control over cache coherency*. 
Whether the host CPU architecture happens to be AArch64, RISC-V, or whatever doesn't really matter, you're at the mercy of 3rd-party PCIe and interconnect IP vendors, and SoC integrators. You'll find yourself in systems where PCIe simply cannot snoop any caches, where you'd better have the correct DMA API calls in place to have any hope of even the most basic functionality working properly; you'll find yourself in systems where even if the PCIe root complex claims to support No Snoop, your uncached traffic will still end up snooping stale data that got prefetched back into caches you thought you'd invalidated; you'll find yourself in systems where your memory attributes may or may not get forcibly rewritten by an IOMMU depending on the kernel config and/or command line. It's not about simply finding a substitute for clflush, it's that the reasons you have for using clflush in the first place can no longer be assumed to be valid. Robin.
Re: [PATCH 0/2] DSI host and peripheral initialisation ordering
Hi, On Wed, Mar 2, 2022 at 9:20 AM Dave Stevenson wrote: > > Hi Doug > > On Wed, 2 Mar 2022 at 00:13, Doug Anderson wrote: > > > > Hi, > > > > On Wed, Feb 16, 2022 at 9:00 AM Dave Stevenson > > wrote: > > > > > > Hi All > > > > > > Hopefully I've cc'ed all those that have bashed this problem around > > > previously, > > > or are otherwise linked to DRM bridges. > > > > > > There have been numerous discussions around how DSI support is currently > > > broken > > > as it doesn't support initialising the PHY to LP-11 and potentially the > > > clock > > > lane to HS prior to configuring the DSI peripheral. There is no op where > > > the > > > interface is initialised but HS video isn't also being sent. > > > Currently you have: > > > - peripheral pre_enable (host not initialised yet) > > > - host pre_enable > > > - encoder enable > > > - host enable > > > - peripheral enable (video already running) > > > > > > vc4 and exynos currently implement the DSI host as an encoder, and split > > > the > > > bridge_chain. This fails if you want to switch to being a bridge and/or > > > use > > > atomic calls as the state of all the elements split off are not added by > > > drm_atomic_add_encoder_bridges. > > > > > > dw-mipi-dsi[1] and now msm[2] use the mode_set hook to initialise the > > > PHY, so > > > the bridge/panel pre_enable can send commands. In their post_disable they > > > then > > > call the downstream bridge/panel post_disable op manually so that shutdown > > > commands can be sent before shutting down the PHY. Nothing handles that > > > fact, > > > so the framework then continues down the bridge chain and calls the > > > post_disable > > > again, so we get unbalanced panel prepare/unprepare calls being reported > > > [3]. > > > > > > There have been patches[4] proposing reversing the entire direction of > > > pre_enable and post_disable, but that risks driving voltage into devices > > > that > > > have yet to be powered up. 
> > > There have been discussions about adding either a pre_pre_enable, or > > > adding a > > > DSI host_op to initialise the host[5]. Both require significant reworking > > > to all > > > existing drivers in moving initialisation phases. > > > We have patches that look like they may well be addressing race > > > conditions in > > > starting up a DSI peripheral[6]. > > > > In general I'm happy to let the more senior people in DRM set the > > direction here so I probably won't do lots of review, but I will point > > out that I did have another proposal that sorta got lost in the noise > > of the whole "reversing the entire direction". That's basically: > > > > https://lists.freedesktop.org/archives/dri-devel/2021-October/328934.html > > > > I have no idea if something like that would work for your use case, > > but after analyzing it it felt like a surprisingly clean proposal even > > if my first instinct when I thought about it was that it was a hack. > > ;-) I suspect (but haven't analyzed your code) that it might be > > equivalent to your proposal of using a flag but maybe easier to wrap > > ones head around? > > If I'm reading that right, then you're proposing adding > after_pre_enable and before_post_disable hooks. > That's almost the same as the power_up() and power_down() ops that > Dmitry suggested earlier, or pre_pre_enable / post_post_disable that > had also been considered. > > Neither of those options handles the case of a longer chain in which > two non-consecutive links want their upstream bridge enabled first. > As per the clarification in patch 1/2, considering the chain > - Panel > - Bridge 1 > - Bridge 2 DRM_BRIDGE_OP_UPSTREAM_FIRST > - Bridge 3 > - Bridge 4 DRM_BRIDGE_OP_UPSTREAM_FIRST > - Bridge 5 > - Encoder > With the flag option we call pre_enables as Panel, Bridge 1, Bridge 3, > Bridge 2, Bridge 5, Bridge 4, Encoder. 
> If adding after_pre_enable, then we end up with Panel, Bridge 1, > Bridge 3, Bridge 5, Bridge 4 (after_pre_enable), Bridge 2 > (after_pre_enable), Encoder. > (power_on / pre_pre_enable from encoder to connector would end up with > Bridge 5 (power_on), Bridge 3 (power_on), Bridge 1 (power_on), Panel, > Bridge 2, Bridge 4, Encoder). > Those potentially work, but it seems a less logical order compared to > using a flag to swap only the bridges of interest. I think power_on / > pre_pre_enable covers DSI better than after_pre_enable. > > Adding the extra ops requires the source bridge (eg DSI host) to know > the behaviour the sink bridge/panel wants. So do all DSI hosts have to > implement power_on to power up and enter LP-11. Some DSI peripherals > may be quite happy or even prefer to have the bus totally idle / > powered down at pre_enable, but there would be no way of implementing > that. Ah, that makes it super clear, thanks! :-) If the local swap of just two components that you're doing is more useful to you than the two stage approach and everyone likes it then I have no objections. > You seem to be looking at DP, which I h
[PATCH] drm/i915: avoid concurrent writes to aux_inv
From: Fei Yang GPU hangs have been observed when multiple engines write to the same aux_inv register at the same time. To avoid this each engine should only invalidate its own auxiliary table. The function gen12_emit_flush_xcs() currently invalidate the auxiliary table for all engines because the rq->engine is not necessarily the engine eventually carrying out the request, and potentially the engine could even be a virtual one (with engine->instance being -1). With this patch, auxiliary table invalidation is done only for the engine executing the request. And the mmio address for the aux_inv register is set after the engine instance becomes certain. Signed-off-by: Chris Wilson Signed-off-by: Fei Yang --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 41 --- .../drm/i915/gt/intel_execlists_submission.c | 38 + drivers/gpu/drm/i915/i915_request.h | 2 + 3 files changed, 47 insertions(+), 34 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c index b1b9c3fd7bf9..af62e2bc2c9b 100644 --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -165,30 +165,6 @@ static u32 preparser_disable(bool state) return MI_ARB_CHECK | 1 << 8 | state; } -static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine) -{ - static const i915_reg_t vd[] = { - GEN12_VD0_AUX_NV, - GEN12_VD1_AUX_NV, - GEN12_VD2_AUX_NV, - GEN12_VD3_AUX_NV, - }; - - static const i915_reg_t ve[] = { - GEN12_VE0_AUX_NV, - GEN12_VE1_AUX_NV, - }; - - if (engine->class == VIDEO_DECODE_CLASS) - return vd[engine->instance]; - - if (engine->class == VIDEO_ENHANCEMENT_CLASS) - return ve[engine->instance]; - - GEM_BUG_ON("unknown aux_inv reg\n"); - return INVALID_MMIO_REG; -} - static u32 *gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs) { *cs++ = MI_LOAD_REGISTER_IMM(1); @@ -288,7 +264,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 mode) if (mode & EMIT_INVALIDATE) aux_inv = rq->engine->mask & 
~BIT(BCS0); if (aux_inv) - cmd += 2 * hweight32(aux_inv) + 2; + cmd += 4; cs = intel_ring_begin(rq, cmd); if (IS_ERR(cs)) @@ -319,16 +295,13 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 mode) *cs++ = 0; /* value */ if (aux_inv) { /* hsdes: 1809175790 */ - struct intel_engine_cs *engine; - unsigned int tmp; - - *cs++ = MI_LOAD_REGISTER_IMM(hweight32(aux_inv)); - for_each_engine_masked(engine, rq->engine->gt, aux_inv, tmp) { - *cs++ = i915_mmio_reg_offset(aux_inv_reg(engine)); - *cs++ = AUX_INV; - } + *cs++ = MI_LOAD_REGISTER_IMM(1); + rq->vd_ve_aux_inv = cs; + *cs++ = 0; /* address to be set at submission to HW */ + *cs++ = AUX_INV; *cs++ = MI_NOOP; - } + } else + rq->vd_ve_aux_inv = NULL; if (mode & EMIT_INVALIDATE) *cs++ = preparser_disable(false); diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 1c602d4ae297..a018de6dcac5 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -1258,6 +1258,34 @@ static bool completed(const struct i915_request *rq) return __i915_request_is_complete(rq); } +static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine) +{ + static const i915_reg_t vd[] = { + GEN12_VD0_AUX_NV, + GEN12_VD1_AUX_NV, + GEN12_VD2_AUX_NV, + GEN12_VD3_AUX_NV, + }; + + static const i915_reg_t ve[] = { + GEN12_VE0_AUX_NV, + GEN12_VE1_AUX_NV, + }; + + if (engine->class == VIDEO_DECODE_CLASS) { + GEM_BUG_ON(engine->instance >= ARRAY_SIZE(vd)); + return vd[engine->instance]; + } + + if (engine->class == VIDEO_ENHANCEMENT_CLASS) { + GEM_BUG_ON(engine->instance >= ARRAY_SIZE(ve)); + return ve[engine->instance]; + } + + GEM_BUG_ON("unknown aux_inv reg\n"); + return INVALID_MMIO_REG; +} + static void execlists_dequeue(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; @@ -1538,6 +1566,16 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine) } if (__i915_request_submit(rq)) { + /* hsdes: 1809175790 */ + if ((GRAPHICS_VER(engine->i915) ==
Re: [PATCH 5/6] drm/rcar_du: changes to rcar-du driver resulting from drm_writeback_connector structure changes
Hi Abhinav, On Wed, Mar 02, 2022 at 10:28:03AM -0800, Abhinav Kumar wrote: > On 2/28/2022 5:42 AM, Laurent Pinchart wrote: > > On Mon, Feb 28, 2022 at 02:28:27PM +0200, Laurent Pinchart wrote: > >> On Mon, Feb 28, 2022 at 02:09:15PM +0200, Jani Nikula wrote: > >>> On Mon, 28 Feb 2022, Laurent Pinchart wrote: > On Sat, Feb 26, 2022 at 10:27:59AM -0800, Rob Clark wrote: > > On Wed, Feb 2, 2022 at 7:41 AM Jani Nikula wrote: > >> On Wed, 02 Feb 2022, Laurent Pinchart wrote: > >>> On Wed, Feb 02, 2022 at 03:15:03PM +0200, Jani Nikula wrote: > On Wed, 02 Feb 2022, Laurent Pinchart wrote: > > On Wed, Feb 02, 2022 at 02:24:28PM +0530, Kandpal Suraj wrote: > >> Changing rcar_du driver to accomadate the change of > >> drm_writeback_connector.base and drm_writeback_connector.encoder > >> to a pointer the reason for which is explained in the > >> Patch(drm: add writeback pointers to drm_connector). > >> > >> Signed-off-by: Kandpal Suraj > >> --- > >> drivers/gpu/drm/rcar-du/rcar_du_crtc.h | 2 ++ > >> drivers/gpu/drm/rcar-du/rcar_du_writeback.c | 8 +--- > >> 2 files changed, 7 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/rcar-du/rcar_du_crtc.h > >> b/drivers/gpu/drm/rcar-du/rcar_du_crtc.h > >> index 66e8839db708..68f387a04502 100644 > >> --- a/drivers/gpu/drm/rcar-du/rcar_du_crtc.h > >> +++ b/drivers/gpu/drm/rcar-du/rcar_du_crtc.h > >> @@ -72,6 +72,8 @@ struct rcar_du_crtc { > >>const char *const *sources; > >>unsigned int sources_count; > >> > >> + struct drm_connector connector; > >> + struct drm_encoder encoder; > > > > Those fields are, at best, poorly named. Furthermore, there's no > > need in > > this driver or in other drivers using drm_writeback_connector to > > create > > an encoder or connector manually. Let's not polute all drivers > > because > > i915 doesn't have its abstractions right. > > i915 uses the quite common model for struct inheritance: > > struct intel_connector { > struct drm_connector base; > /* ... 
*/ > } > > Same with at least amd, ast, fsl-dcu, hisilicon, mga200, msm, > nouveau, > radeon, tilcdc, and vboxvideo. > > We could argue about the relative merits of that abstraction, but I > think the bottom line is that it's popular and the drivers using it > are > not going to be persuaded to move away from it. > >>> > >>> Nobody said inheritance is bad. > >>> > It's no coincidence that the drivers who've implemented writeback so > far > (komeda, mali, rcar-du, vc4, and vkms) do not use the abstraction, > because the drm_writeback_connector midlayer does, forcing the issue. > >>> > >>> Are you sure it's not a coincidence ? :-) > >>> > >>> The encoder and especially connector created by > >>> drm_writeback_connector > >>> are there only because KMS requires a drm_encoder and a drm_connector > >>> to > >>> be exposed to userspace (and I could argue that using a connector for > >>> writeback is a hack, but that won't change). The connector is > >>> "virtual", > >>> I still fail to see why i915 or any other driver would need to wrap it > >>> into something else. The whole point of the drm_writeback_connector > >>> abstraction is that drivers do not have to manage the writeback > >>> drm_connector manually, they shouldn't touch it at all. > >> > >> The thing is, drm_writeback_connector_init() calling > >> drm_connector_init() on the drm_connector embedded in > >> drm_writeback_connector leads to that connector being added to the > >> drm_device's list of connectors. Ditto for the encoder. > >> > >> All the driver code that handles drm_connectors would need to take into > >> account they might not be embedded in intel_connector. Throughout the > >> driver. Ditto for the encoders. > > > > The assumption that a connector is embedded in intel_connector doesn't > > really play that well with how bridge and panel connectors work.. so > > in general this seems like a good thing to unwind. 
> > > > But as a point of practicality, i915 is a large driver covering a lot > > of generations of hw with a lot of users. So I can understand > > changing this design isn't something that can happen quickly or > > easily. IMO we should allow i915 to create it's own connector for > > writeback, and just document clearly that this isn't the approach new > > drivers should take. I mean, I understand idealism, but sometimes a > > d
Re: [PATCH][next] drm: ssd130x: remove redundant initialization of pointer mode
Hello Colin, Thanks for the patch. On Wed, Mar 2, 2022 at 6:53 PM Colin Ian King wrote: > > Pointer mode is being assigned a value that is never read, it is > being re-assigned later with a new value. The initialization is > redundant and can be removed. > Indeed. Acked-by: Javier Martinez Canillas Best regards, Javier
Re: [PATCH V5 3/6] dt-bindings: display: mediatek: revise enum to const
On Tue, Mar 01, 2022 at 04:01:02PM +0800, Rex-BC Chen wrote: > There won't be more than 1 fallback for these bindings, so we modify > them to use const instead of enum. > > Signed-off-by: Rex-BC Chen > --- > .../devicetree/bindings/display/mediatek/mediatek,aal.yaml| 2 +- > .../devicetree/bindings/display/mediatek/mediatek,ccorr.yaml | 2 +- > .../devicetree/bindings/display/mediatek/mediatek,color.yaml | 4 ++-- > .../devicetree/bindings/display/mediatek/mediatek,dither.yaml | 2 +- > .../devicetree/bindings/display/mediatek/mediatek,gamma.yaml | 2 +- > .../devicetree/bindings/display/mediatek/mediatek,ovl.yaml| 4 ++-- > .../devicetree/bindings/display/mediatek/mediatek,rdma.yaml | 4 ++-- > 7 files changed, 10 insertions(+), 10 deletions(-) > > diff --git > a/Documentation/devicetree/bindings/display/mediatek/mediatek,aal.yaml > b/Documentation/devicetree/bindings/display/mediatek/mediatek,aal.yaml > index 191b56e16bee..bc1c70d089ba 100644 > --- a/Documentation/devicetree/bindings/display/mediatek/mediatek,aal.yaml > +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,aal.yaml > @@ -31,7 +31,7 @@ properties: >- mediatek,mt8183-disp-aal >- mediatek,mt8192-disp-aal >- mediatek,mt8195-disp-aal > - - enum: > + - const: >- mediatek,mt8173-disp-aal Run 'make dt_binding_check'. This will fail. 
> >reg: > diff --git > a/Documentation/devicetree/bindings/display/mediatek/mediatek,ccorr.yaml > b/Documentation/devicetree/bindings/display/mediatek/mediatek,ccorr.yaml > index 6894b6999412..fc40e2981eaa 100644 > --- a/Documentation/devicetree/bindings/display/mediatek/mediatek,ccorr.yaml > +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,ccorr.yaml > @@ -28,7 +28,7 @@ properties: >- items: >- enum: >- mediatek,mt8195-disp-ccorr > - - enum: > + - const: >- mediatek,mt8192-disp-ccorr > >reg: > diff --git > a/Documentation/devicetree/bindings/display/mediatek/mediatek,color.yaml > b/Documentation/devicetree/bindings/display/mediatek/mediatek,color.yaml > index bc83155b3b4c..22ed5d368cbe 100644 > --- a/Documentation/devicetree/bindings/display/mediatek/mediatek,color.yaml > +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,color.yaml > @@ -32,14 +32,14 @@ properties: >- enum: >- mediatek,mt7623-disp-color >- mediatek,mt2712-disp-color > - - enum: > + - const: >- mediatek,mt2701-disp-color >- items: >- enum: >- mediatek,mt8183-disp-color >- mediatek,mt8192-disp-color >- mediatek,mt8195-disp-color > - - enum: > + - const: >- mediatek,mt8173-disp-color >reg: > maxItems: 1 > diff --git > a/Documentation/devicetree/bindings/display/mediatek/mediatek,dither.yaml > b/Documentation/devicetree/bindings/display/mediatek/mediatek,dither.yaml > index 9d89297f5f1d..2a82792b2625 100644 > --- a/Documentation/devicetree/bindings/display/mediatek/mediatek,dither.yaml > +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,dither.yaml > @@ -28,7 +28,7 @@ properties: >- enum: >- mediatek,mt8192-disp-dither >- mediatek,mt8195-disp-dither > - - enum: > + - const: >- mediatek,mt8183-disp-dither > >reg: > diff --git > a/Documentation/devicetree/bindings/display/mediatek/mediatek,gamma.yaml > b/Documentation/devicetree/bindings/display/mediatek/mediatek,gamma.yaml > index 247baad147b3..45f88bdf07bb 100644 > --- 
a/Documentation/devicetree/bindings/display/mediatek/mediatek,gamma.yaml > +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,gamma.yaml > @@ -29,7 +29,7 @@ properties: >- enum: >- mediatek,mt8192-disp-gamma >- mediatek,mt8195-disp-gamma > - - enum: > + - const: >- mediatek,mt8183-disp-gamma > >reg: > diff --git > a/Documentation/devicetree/bindings/display/mediatek/mediatek,ovl.yaml > b/Documentation/devicetree/bindings/display/mediatek/mediatek,ovl.yaml > index e71f79bc2dee..88ba3f27b0e4 100644 > --- a/Documentation/devicetree/bindings/display/mediatek/mediatek,ovl.yaml > +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,ovl.yaml > @@ -33,12 +33,12 @@ properties: >- enum: >- mediatek,mt7623-disp-ovl >- mediatek,mt2712-disp-ovl > - - enum: > + - const: >- mediatek,mt2701-disp-ovl >- items: >- enum: >- mediatek,mt8195-disp-ovl > - - enum: > + - const: >- mediatek,mt8183-disp-ovl > >reg: > diff --git > a/Documentation/devicetree/bindings/display/mediatek/m
Re: [PATCH 5/6] drm/rcar_du: changes to rcar-du driver resulting from drm_writeback_connector structure changes
On 2/28/2022 5:42 AM, Laurent Pinchart wrote: Hello, On Mon, Feb 28, 2022 at 02:28:27PM +0200, Laurent Pinchart wrote: On Mon, Feb 28, 2022 at 02:09:15PM +0200, Jani Nikula wrote: On Mon, 28 Feb 2022, Laurent Pinchart wrote: On Sat, Feb 26, 2022 at 10:27:59AM -0800, Rob Clark wrote: On Wed, Feb 2, 2022 at 7:41 AM Jani Nikula wrote: On Wed, 02 Feb 2022, Laurent Pinchart wrote: On Wed, Feb 02, 2022 at 03:15:03PM +0200, Jani Nikula wrote: On Wed, 02 Feb 2022, Laurent Pinchart wrote: On Wed, Feb 02, 2022 at 02:24:28PM +0530, Kandpal Suraj wrote: Changing rcar_du driver to accomadate the change of drm_writeback_connector.base and drm_writeback_connector.encoder to a pointer the reason for which is explained in the Patch(drm: add writeback pointers to drm_connector). Signed-off-by: Kandpal Suraj --- drivers/gpu/drm/rcar-du/rcar_du_crtc.h | 2 ++ drivers/gpu/drm/rcar-du/rcar_du_writeback.c | 8 +--- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/rcar-du/rcar_du_crtc.h b/drivers/gpu/drm/rcar-du/rcar_du_crtc.h index 66e8839db708..68f387a04502 100644 --- a/drivers/gpu/drm/rcar-du/rcar_du_crtc.h +++ b/drivers/gpu/drm/rcar-du/rcar_du_crtc.h @@ -72,6 +72,8 @@ struct rcar_du_crtc { const char *const *sources; unsigned int sources_count; + struct drm_connector connector; + struct drm_encoder encoder; Those fields are, at best, poorly named. Furthermore, there's no need in this driver or in other drivers using drm_writeback_connector to create an encoder or connector manually. Let's not polute all drivers because i915 doesn't have its abstractions right. i915 uses the quite common model for struct inheritance: struct intel_connector { struct drm_connector base; /* ... */ } Same with at least amd, ast, fsl-dcu, hisilicon, mga200, msm, nouveau, radeon, tilcdc, and vboxvideo. 
We could argue about the relative merits of that abstraction, but I think the bottom line is that it's popular and the drivers using it are not going to be persuaded to move away from it. Nobody said inheritance is bad. It's no coincidence that the drivers who've implemented writeback so far (komeda, mali, rcar-du, vc4, and vkms) do not use the abstraction, because the drm_writeback_connector midlayer does, forcing the issue. Are you sure it's not a coincidence ? :-) The encoder and especially connector created by drm_writeback_connector are there only because KMS requires a drm_encoder and a drm_connector to be exposed to userspace (and I could argue that using a connector for writeback is a hack, but that won't change). The connector is "virtual", I still fail to see why i915 or any other driver would need to wrap it into something else. The whole point of the drm_writeback_connector abstraction is that drivers do not have to manage the writeback drm_connector manually, they shouldn't touch it at all. The thing is, drm_writeback_connector_init() calling drm_connector_init() on the drm_connector embedded in drm_writeback_connector leads to that connector being added to the drm_device's list of connectors. Ditto for the encoder. All the driver code that handles drm_connectors would need to take into account they might not be embedded in intel_connector. Throughout the driver. Ditto for the encoders. The assumption that a connector is embedded in intel_connector doesn't really play that well with how bridge and panel connectors work.. so in general this seems like a good thing to unwind. But as a point of practicality, i915 is a large driver covering a lot of generations of hw with a lot of users. So I can understand changing this design isn't something that can happen quickly or easily. IMO we should allow i915 to create it's own connector for writeback, and just document clearly that this isn't the approach new drivers should take. 
I mean, I understand idealism, but sometimes a dose of pragmatism is needed. :-) i915 is big, but so is Intel. It's not fair to treat everybody else as second-class citizens and let Intel get away without doing its homework. Laurent, as you accuse us of not doing our homework, I'll point out that we've been embedding drm crtc, encoder and connector ever since modesetting support was added to i915 in 2008, since before *any* of the things you now use as a rationale for asking us to do a massive rewrite of the driver existed. It's been OK to embed those structures for well over ten years. It's a common pattern, basically throughout the kernel. Other drivers do it too, not just i915. There hasn't been the slightest hint this should not be done until this very conversation. I want to see this refactoring effort moving forward in i915 (and moving to drm_bridge would then be a good idea too). If writeback support in i915 is urgent, then we can discuss *temporary* pragmatic stopgap measures, but not without a real effort to fix the core issue. I think the onus is on you to first convince everyo
Re: [Intel-gfx] [PATCH v2 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow
On 3/2/2022 01:49, Tvrtko Ursulin wrote: On 25/02/2022 20:41, john.c.harri...@intel.com wrote: From: John Harrison GuC converts the pre-emption timeout and timeslice quantum values into clock ticks internally. That significantly reduces the point of 32bit overflow. On current platforms, worst case scenario is approximately 110 seconds. Rather than allowing the user to set higher values and then get confused by early timeouts, add limits when setting these values. v2: Add helper functins for clamping (review feedback from Tvrtko). Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio (v1) --- drivers/gpu/drm/i915/gt/intel_engine.h | 6 ++ drivers/gpu/drm/i915/gt/intel_engine_cs.c | 69 + drivers/gpu/drm/i915/gt/sysfs_engines.c | 25 +--- drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 9 +++ 4 files changed, 99 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index be4b1e65442f..5a9186f784c4 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -349,4 +349,10 @@ intel_engine_get_hung_context(struct intel_engine_cs *engine) return engine->hung_ce; } +u64 intel_clamp_heartbeat_interval_ms(struct intel_engine_cs *engine, u64 value); +u64 intel_clamp_max_busywait_duration_ns(struct intel_engine_cs *engine, u64 value); +u64 intel_clamp_preempt_timeout_ms(struct intel_engine_cs *engine, u64 value); +u64 intel_clamp_stop_timeout_ms(struct intel_engine_cs *engine, u64 value); +u64 intel_clamp_timeslice_duration_ms(struct intel_engine_cs *engine, u64 value); + #endif /* _INTEL_RINGBUFFER_H_ */ diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index e855c801ba28..7ad9e6006656 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -399,6 +399,26 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id, if (GRAPHICS_VER(i915) == 12 && 
engine->class == RENDER_CLASS) engine->props.preempt_timeout_ms = 0; + /* Cap properties according to any system limits */ +#define CLAMP_PROP(field) \ + do { \ + u64 clamp = intel_clamp_##field(engine, engine->props.field); \ + if (clamp != engine->props.field) { \ + drm_notice(&engine->i915->drm, \ + "Warning, clamping %s to %lld to prevent overflow\n", \ + #field, clamp); \ + engine->props.field = clamp; \ + } \ + } while (0) + + CLAMP_PROP(heartbeat_interval_ms); + CLAMP_PROP(max_busywait_duration_ns); + CLAMP_PROP(preempt_timeout_ms); + CLAMP_PROP(stop_timeout_ms); + CLAMP_PROP(timeslice_duration_ms); + +#undef CLAMP_PROP + engine->defaults = engine->props; /* never to change again */ engine->context_size = intel_engine_context_size(gt, engine->class); @@ -421,6 +441,55 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id, return 0; } +u64 intel_clamp_heartbeat_interval_ms(struct intel_engine_cs *engine, u64 value) +{ + value = min_t(u64, value, jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)); + + return value; +} + +u64 intel_clamp_max_busywait_duration_ns(struct intel_engine_cs *engine, u64 value) +{ + value = min(value, jiffies_to_nsecs(2)); + + return value; +} + +u64 intel_clamp_preempt_timeout_ms(struct intel_engine_cs *engine, u64 value) +{ + /* + * NB: The GuC API only supports 32bit values. However, the limit is further + * reduced due to internal calculations which would otherwise overflow. + */ + if (intel_guc_submission_is_wanted(&engine->gt->uc.guc)) + value = min_t(u64, value, GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS); + + value = min_t(u64, value, jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)); + + return value; +} + +u64 intel_clamp_stop_timeout_ms(struct intel_engine_cs *engine, u64 value) +{ + value = min_t(u64, value, jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)); + + return value; +} + +u64 intel_clamp_timeslice_duration_ms(struct intel_engine_cs *engine, u64 value) +{ + /* + * NB: The GuC API only supports 32bit values. 
However, the limit is further + * reduced due to internal calculations which would otherwise overflow. + */ + if (intel_guc_submission_is_wanted(&engine->gt->uc.guc)) + value = min_t(u64, value, GUC_POLICY_MAX_EXEC_QUANTUM_MS); + + value = min_t(u64, value, jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT)); + + return value; +} + static void __setup_engine_capabilities(struct intel_engine_cs *engine) { struct drm_i915_private *i915 = engine->i915; diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c b/drivers/gpu/drm/i915/gt/sysfs_engines.c index 967031056202..f2d9858d827c 100644 --- a/drivers/gpu/drm/i915/gt/sysfs_engines.c +++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c @@ -144,7
Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow
On 3/2/2022 01:20, Tvrtko Ursulin wrote: On 01/03/2022 19:57, John Harrison wrote: On 3/1/2022 02:50, Tvrtko Ursulin wrote: On 28/02/2022 18:32, John Harrison wrote: On 2/28/2022 08:11, Tvrtko Ursulin wrote: On 25/02/2022 17:39, John Harrison wrote: On 2/25/2022 09:06, Tvrtko Ursulin wrote: On 24/02/2022 19:19, John Harrison wrote: [snip] ./gt/uc/intel_guc_fwif.h: u32 execution_quantum; ./gt/uc/intel_guc_submission.c: desc->execution_quantum = engine->props.timeslice_duration_ms * 1000; ./gt/intel_engine_types.h: unsigned long timeslice_duration_ms; timeslice_store/preempt_timeout_store: err = kstrtoull(buf, 0, &duration); So both kconfig and sysfs can already overflow GuC, not only because of tick conversion internally but because at backend level nothing was done for assigning 64-bit into 32-bit. Or I failed to find where it is handled. That's why I'm adding this range check to make sure we don't allow overflows. Yes and no, this fixes it, but the first bug was not only due to GuC internal tick conversion. It was present ever since the u64 from i915 was shoved into the u32 sent to GuC. So even if GuC used the value without additional multiplication, the bug would still be there. My point being when the GuC backend was added, timeout_ms values should have been limited/clamped to U32_MAX. The tick discovery is an additional limit on top. I'm not disagreeing. I'm just saying that the truncation wasn't noticed until I actually tried using very long timeouts to debug a particular problem. Now that it is noticed, we need some method of range checking and this simple clamp solves all the truncation problems. Agreed in principle, just please mention in the commit message all aspects of the problem. I think we can get away without a Fixes: tag since it requires user fiddling to break things in unexpected ways. I would though put in the code a clamping which expresses both, something like min(u32, ..GUC LIMIT..). So the full story is documented forever. Or "if > u32 || > ..GUC LIMIT..)
return -EINVAL". Just in case GuC limit one day changes but u32 stays. Perhaps internal ticks go away or anything and we are left with plain 1:1 millisecond relationship. Can certainly add a comment along the lines of "GuC API only takes a 32bit field but that is further reduced to GUC_LIMIT due to internal calculations which would otherwise overflow". But if the GuC limit is > u32 then, by definition, that means the GuC API has changed to take a u64 instead of a u32. So there will no u32 truncation any more. So I'm not seeing a need to explicitly test the integer size when the value check covers that. Hmm I was thinking if the internal conversion in the GuC fw changes so that GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS goes above u32, then to be extra safe by documenting in code there is the additional limit of the data structure field. Say the field was changed to take some unit larger than a millisecond. Then the check against the GuC MAX limit define would not be enough, unless that would account both for internal implementation and u32 in the protocol. Maybe that is overdefensive but I don't see that it harms. 50-50, but it's do it once and forget so I'd do it. Huh? How can the limit be greater than a u32 if the interface only takes a u32? By definition the limit would be clamped to u32 size. If you mean that the GuC policy is in different units and those units might not overflow but ms units do, then actually that is already the case. The GuC works in us not ms. That's part of why the wrap around is so low, we have to multiply by 1000 before sending to GuC. However, that is actually irrelevant because the comparison is being done on the i915 side in i915's units. We have to scale the GuC limit to match what i915 is using. And the i915 side is u64 so if the scaling to i915 numbers overflows a u32 then who cares because that comparison can be done at 64 bits wide. 
If the units change then that is a backwards breaking API change that will require a manual driver code update. You can't just recompile with a new header and magically get an ms to us or ms to s conversion in your a = b assignment. The code will need to be changed to do the new unit conversion (note we already convert from ms to us, the GuC API is all expressed in us). And that code change will mean having to revisit any and all scaling, type conversions, etc. I.e. any pre-existing checks will not necessarily be valid and will need to be re-visted anyway. But as above, any scaling to GuC units has to be incorporated into the limit already because otherwise the limit would not fit in the GuC's own API. Yes I get that, I was just worried that u32 field in the protocol and GUC_POLICY_MAX_EXEC_QUANTUM_MS defines are separate in the source code and then how to protect against forgetting to update both in sync. Like if the protocol was changed to take nanoseconds, and firmware implementation changed to support t
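The range check being discussed could look something like this minimal userspace sketch (the helper name `guc_check_timeout_ms` and the `GUC_MAX_TIMEOUT_MS` value are hypothetical stand-ins, not the actual i915 code):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS: the
 * largest millisecond value whose internal tick conversion still fits
 * the firmware's arithmetic.  The real define lives in the i915 GuC
 * backend and its value is firmware-specific. */
#define GUC_MAX_TIMEOUT_MS 100000ull

/* Range-check a user-supplied 64-bit millisecond timeout before it is
 * assigned into the 32-bit field of the policy descriptor.  Both the
 * u32 protocol width and the firmware limit are tested explicitly, so
 * the check keeps working even if one of the two bounds changes. */
static int guc_check_timeout_ms(uint64_t duration_ms, uint32_t *out)
{
	if (duration_ms > UINT32_MAX || duration_ms > GUC_MAX_TIMEOUT_MS)
		return -1;	/* would be -EINVAL in the kernel */

	*out = (uint32_t)duration_ms;	/* now provably lossless */
	return 0;
}
```

The explicit two-condition test mirrors the "if > u32 || > ..GUC LIMIT.." suggestion above; a min()-style clamp would silently cap the value instead of rejecting it.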
Re: [RESEND PATCH] dt-bindings: display/msm: add missing brace in dpu-qcm2290.yaml
On Wed, Mar 02, 2022 at 03:14:10AM +0300, Dmitry Baryshkov wrote: > Add missing brace in dpu-qcm2290.yaml. While we are at it, also fix > indentation for another brace, so it matches the corresponding line. > > Reported-by: Rob Herring > Cc: Loic Poulain > Reviewed-by: Bjorn Andersson > Signed-off-by: Dmitry Baryshkov > --- > Didn't include freedreno@ in the first email, so resending. > --- > Documentation/devicetree/bindings/display/msm/dpu-qcm2290.yaml | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) Thanks for fixing. Acked-by: Rob Herring
Re: [PATCH v2 1/2] dt-bindings: display: msm: Add optional resets
On Tue, Mar 01, 2022 at 05:29:30PM -0800, Bjorn Andersson wrote: > Add an optional reference to the MDSS_CORE reset, which when specified > can be used by the implementation to reset the hardware blocks. > > Signed-off-by: Bjorn Andersson > --- > > Changes since v1: > - New approach/patch > > .../devicetree/bindings/display/msm/dpu-qcm2290.yaml | 4 > Documentation/devicetree/bindings/display/msm/dpu-sc7180.yaml | 4 > Documentation/devicetree/bindings/display/msm/dpu-sc7280.yaml | 4 > Documentation/devicetree/bindings/display/msm/dpu-sdm845.yaml | 4 > 4 files changed, 16 insertions(+) Acked-by: Rob Herring
Re: [PATCH] dt-bindings: gpu: Convert aspeed-gfx bindings to yaml
On Wed, Mar 02, 2022 at 03:40:56PM +1030, Joel Stanley wrote: > Convert the bindings to yaml and add the ast2600 compatible string. > > Signed-off-by: Joel Stanley > --- > .../devicetree/bindings/gpu/aspeed,gfx.yaml | 69 +++ > .../devicetree/bindings/gpu/aspeed-gfx.txt| 41 --- > 2 files changed, 69 insertions(+), 41 deletions(-) > create mode 100644 Documentation/devicetree/bindings/gpu/aspeed,gfx.yaml > delete mode 100644 Documentation/devicetree/bindings/gpu/aspeed-gfx.txt Applied, thanks.
Re: [Intel-gfx] [PATCH 2/3] drm/i915/gt: Make the heartbeat play nice with long pre-emption timeouts
On 3/2/2022 03:07, Tvrtko Ursulin wrote: On 01/03/2022 20:59, John Harrison wrote: On 3/1/2022 04:09, Tvrtko Ursulin wrote: I'll trim it a bit again.. On 28/02/2022 18:55, John Harrison wrote: On 2/28/2022 09:12, Tvrtko Ursulin wrote: On 25/02/2022 18:48, John Harrison wrote: On 2/25/2022 10:14, Tvrtko Ursulin wrote: [snip]

Your only objection is that it ends up with too long total time before reset? Or something else as well?

An unnecessarily long total heartbeat timeout is the main objection. (2.5 + 12) * 5 = 72.5 seconds. That is a massive change from the current 12.5s. If we are happy with that huge increase then fine. But I'm pretty sure you are going to get a lot more bug reports about hung systems not recovering. 10-20s is just about long enough for someone to wait before leaning on the power button of their machine. Over a minute is not. That kind of delay is going to cause support issues.

Sorry I wrote 12s, while you actually said tP * 12, so 7.68s, chosen just so it is longer than tH * 3? And how do you keep coming up with factor of five? Isn't it four periods before "heartbeat stopped"? (Prio normal, heartbeat, barrier and then reset.)

Prio starts at low not normal.

Right, slipped my mind since I only keep seeing that one priority ladder block in intel_engine_heartbeat.c/heartbeat().. From the point of view of user experience I agree reasonable responsiveness is needed before user "reaches for the power button". In your proposal we are talking about 3 * 2.5s + 2 * 7.5s, so 22.5s. Question of workloads.. what is the actual preempt timeout compute is happy with? And I don't mean compute setups with disabled hangcheck, which you say they want anyway, but if we target defaults for end users. Do we have some numbers on what they are likely to run?

Not that I have ever seen. This is all just finger in the air stuff. I don't recall if we invented the number and the compute people agreed with it or if they proposed the number to us.

Yeah me neither. And found nothing in my email archives. :( Thinking about it today I don't see that disabled timeout is a practical default. With it, if users have something un-preemptable to run (assuming prio normal), it would get killed after ~13s (5 * 2.5). If we go for my scheme it gets killed in ~17.5s (3 * (2.5 + 2.5) + 2.5 (third pulse triggers preempt timeout)). And if we go for your scheme it gets killed in ~22.5s (4 * 2.5 + 2 * 3 * 2.5).

Erm, that is not an apples to apples comparison. Your 17.5 is for an engine reset tripped by the pre-emption timeout, but your 22.5s is for a GT reset tripped by the heartbeat reaching the end and nuking the universe.

Right, in your scheme I did get it wrong. It would wait for GuC to reset the engine at the end, rather than hit the fake "heartbeat stopped" in that case, full reset path. 4 * 2.5 to trigger a max prio pulse, then 3 * 2.5 preempt timeout for GuC to reset (last heartbeat delay extended so it does not trigger). So 17.5 as well.

Again, apples or oranges? I was using your tP(RCS) == 2.5s assumption in all the above calculations given that the discussion was about the heartbeat algorithm, not the choice of pre-emption timeout. In which case the last heartbeat is max(tP * 2, tH) == 2 * 2.5s. If you are saying that the first pulse at sufficient priority (third being normal prio) is what causes the reset because the system is working as expected and the pre-emption timeout trips the reset. In that case, you have two periods to get to normal prio plus one pre-emption timeout to trip the reset. I.e. (tH * 2) + tP.

Your scheme is then tH(actual) = tH(user) + tP, yes?
So pre-emption based reset is after ((tH(user) + tP) * 2) + tP => (3 * tP) + (2 * tH)
And GT based reset is after (tH(user) + tP) * 5 => (5 * tP) + (5 * tH)

My scheme is tH(actual) = tH(user) for first four, then max(tH(user), tP) for fifth.
So pre-emption based reset is after tH(user) * 2 + tP => tP + (2 * tH)
And GT based reset is after (tH(user) * 4) + (max(tH(user), tP) * 1) => greater of ((4 * tH) + tP) or (5 * tH)

Either way your scheme is longer. With tH(user) = 2.5s, tP(RCS) = 7.5s, we get 27.5s for engine and 50s for GT versus my 12.5s for engine and 17.5s for GT. With tP(RCS) = 2.5s, yours is 12.5s for engine and 25s for GT versus my 7.5s for engine and 12.5s for GT.

Plus, not sure why your calculations above are using 2.5 for tP? Are you still arguing that 7.5s is too long? That is a separate issue and not related to the heartbeat algorithms. tP must be long enough to allow 'out of box OpenCL workloads to complete'. That doesn't just mean not being killed by the heartbeat, it also means not being killed by running two of them concurrently (or one plus desktop OpenGL rendering) and not having it killed by basic time slicing between the two contexts. The heartbeat is not involved in that process. That is purely the pre-emption timeout. And that is the fun
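For reference, the two reset deadlines derived in this exchange can be written out directly as arithmetic (a sketch of the formulas quoted above; the function names are mine — "tvrtko" stretches every heartbeat period by tP, "john" stretches only the final one):

```c
#include <assert.h>

static double max2(double a, double b) { return a > b ? a : b; }

/* Every heartbeat period extended by tP: tH(actual) = tH(user) + tP. */
static double tvrtko_engine_reset(double tH, double tP) { return (3 * tP) + (2 * tH); }
static double tvrtko_gt_reset(double tH, double tP)     { return 5 * (tH + tP); }

/* Only the fifth period stretched: tH(actual) = tH(user) for the first
 * four periods, then max(tH(user), tP) for the last one. */
static double john_engine_reset(double tH, double tP)   { return tP + (2 * tH); }
static double john_gt_reset(double tH, double tP)       { return max2((4 * tH) + tP, 5 * tH); }
```

With tH = 2.5s and tP = 7.5s these reproduce the 27.5s/50s versus 12.5s/17.5s figures in the thread.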
[PATCH][next] drm: ssd130x: remove redundant initialization of pointer mode
Pointer mode is being assigned a value that is never read; it is re-assigned later with a new value. The initialization is redundant and can be removed.

Cleans up clang scan build warning:

drivers/gpu/drm/solomon/ssd130x.c:582:27: warning: Value stored to 'mode' during its initialization is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King
---
 drivers/gpu/drm/solomon/ssd130x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/solomon/ssd130x.c b/drivers/gpu/drm/solomon/ssd130x.c
index 92c1902f53e4..ce4dc20412e0 100644
--- a/drivers/gpu/drm/solomon/ssd130x.c
+++ b/drivers/gpu/drm/solomon/ssd130x.c
@@ -579,7 +579,7 @@ static const struct drm_simple_display_pipe_funcs ssd130x_pipe_funcs = {
 static int ssd130x_connector_get_modes(struct drm_connector *connector)
 {
 	struct ssd130x_device *ssd130x = drm_to_ssd130x(connector->dev);
-	struct drm_display_mode *mode = &ssd130x->mode;
+	struct drm_display_mode *mode;
 	struct device *dev = ssd130x->dev;
 
 	mode = drm_mode_duplicate(connector->dev, &ssd130x->mode);
-- 
2.34.1
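The class of warning being fixed is easy to reproduce; here is a minimal, self-contained illustration (not the driver code) of a value stored at initialization and then overwritten before any read:

```c
#include <assert.h>
#include <stddef.h>

static int table[4] = { 10, 20, 30, 40 };

static int lookup(size_t i)
{
	/* Dead store: this initializer is never read, because 'p' is
	 * unconditionally re-assigned below.  clang's static analyzer
	 * flags it as deadcode.DeadStores; declaring 'p' uninitialized
	 * (as the patch does for 'mode') removes the noise. */
	int *p = &table[0];

	p = &table[i % 4];
	return *p;
}
```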
Re: [Intel-gfx] [PATCH 0/3] Improve anti-pre-emption w/a for compute workloads
On 3/2/2022 03:21, Tvrtko Ursulin wrote: On 28/02/2022 19:17, John Harrison wrote: On 2/28/2022 07:32, Tvrtko Ursulin wrote: On 25/02/2022 19:03, John Harrison wrote: On 2/25/2022 10:29, Tvrtko Ursulin wrote: On 25/02/2022 18:01, John Harrison wrote: On 2/25/2022 09:39, Tvrtko Ursulin wrote: On 25/02/2022 17:11, John Harrison wrote: On 2/25/2022 08:36, Tvrtko Ursulin wrote: On 24/02/2022 20:02, John Harrison wrote: On 2/23/2022 04:00, Tvrtko Ursulin wrote: On 23/02/2022 02:22, John Harrison wrote: On 2/22/2022 01:53, Tvrtko Ursulin wrote: On 18/02/2022 21:33, john.c.harri...@intel.com wrote: From: John Harrison

Compute workloads are inherently not pre-emptible on current hardware. Thus the pre-emption timeout was disabled as a workaround to prevent unwanted resets. Instead, the hang detection was left to the heartbeat and its (longer) timeout. This is undesirable with GuC submission as the heartbeat is a full GT reset rather than a per engine reset and so is much more destructive. Instead, just bump the pre-emption timeout

Can we have a feature request to allow asking GuC for an engine reset?

For what purpose?

To allow "stopped heartbeat" to reset the engine, however..

GuC manages the scheduling of contexts across engines. With virtual engines, the KMD has no knowledge of which engine a context might be executing on. Even without virtual engines, the KMD still has no knowledge of which context is currently executing on any given engine at any given time. There is a reason why hang detection should be left to the entity that is doing the scheduling. Any other entity is second guessing at best. The reason for keeping the heartbeat around even when GuC submission is enabled is for the case where the KMD/GuC have got out of sync with each other somehow or GuC itself has just crashed. I.e. when no submission at all is working and we need to reset the GuC itself and start over.

.. I wasn't really up to speed to know/remember heartbeats are nerfed already in GuC mode.

Not sure what you mean by that claim. Engine resets are handled by GuC because GuC handles the scheduling. You can't do the former if you aren't doing the latter. However, the heartbeat is still present and is still the watchdog by which engine resets are triggered. As per the rest of the submission process, the hang detection and recovery is split between i915 and GuC.

I meant that "stopped heartbeat on engine XXX" can only do a full GPU reset on GuC.

I mean that there is no 'stopped heartbeat on engine XXX' when i915 is not handling the recovery part of the process.

Huh?

static void
reset_engine(struct intel_engine_cs *engine, struct i915_request *rq)
{
	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
		show_heartbeat(rq, engine);

	if (intel_engine_uses_guc(engine))
		/*
		 * GuC itself is toast or GuC's hang detection
		 * is disabled. Either way, need to find the
		 * hang culprit manually.
		 */
		intel_guc_find_hung_context(engine);

	intel_gt_handle_error(engine->gt, engine->mask,
			      I915_ERROR_CAPTURE,
			      "stopped heartbeat on %s",
			      engine->name);
}

How there is no "stopped heartbeat" in guc mode? From this code it certainly looks there is.

Only when the GuC is toast and it is no longer an engine reset but a full GT reset that is required. So technically, it is not a 'stopped heartbeat on engine XXX' it is 'stopped heartbeat on GT#'.

You say below heartbeats are going in GuC mode. Now I totally don't understand how they are going but there is allegedly no "stopped heartbeat".

Because if GuC is handling the detection and recovery then i915 will not reach that point. GuC will do the engine reset and start scheduling the next context before the heartbeat period expires. So the notification will be a G2H about a specific context being reset rather than the i915 notification about a stopped heartbeat.

intel_gt_handle_error(engine->gt, engine->mask,
		      I915_ERROR_CAPTURE,
		      "stopped heartbeat on %s",
		      engine->name);

intel_gt_handle_error:

	/*
	 * Try engine reset when available. We fall back to full reset if
	 * single reset fails.
	 */
	if (!intel_uc_uses_guc_submission(&gt->uc) &&
	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
		local_bh_disable();
		for_each_engine_masked(engine, gt, engine_mask, tmp) {

You said "However, the heartbeat is still present and is still the watchdog by which engine resets are triggered", now I don't know what you meant by this. It actually triggers a single engine reset in GuC mode? Where in code does that happen if this block above shows it not taking the engine reset path?

i915 sends down the per engine pulse.
GuC schedules the pulse.
GuC attempts to pre-empt the currently active context.
GuC detects the pre-emption timeout.
GuC resets th
[PATCH v1 06/10] drm/msm/a6xx: Propagate OOB set error
Propagate OOB set error to higher level so that a coredump is captured followed by recovery sequence. Signed-off-by: Akhil P Oommen --- drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 33 - drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 19 --- drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 2 +- 3 files changed, 37 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c index f121d798..66ae509 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c @@ -133,7 +133,7 @@ bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu) A6XX_GMU_SPTPRAC_PWR_CLK_STATUS_GX_HM_CLK_OFF)); } -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp) +int a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp) { struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu); @@ -145,7 +145,7 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp) gpu_freq = dev_pm_opp_get_freq(opp); if (gpu_freq == gmu->freq) - return; + return 0; for (perf_index = 0; perf_index < gmu->nr_gpu_freqs - 1; perf_index++) if (gpu_freq == gmu->gpu_freqs[perf_index]) @@ -161,13 +161,13 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp) * bring up the power if it isn't already active */ if (pm_runtime_get_if_in_use(gmu->dev) == 0) - return; + return 0; if (!gmu->legacy) { - a6xx_hfi_set_freq(gmu, perf_index); + ret = a6xx_hfi_set_freq(gmu, perf_index); dev_pm_opp_set_opp(&gpu->pdev->dev, opp); pm_runtime_put(gmu->dev); - return; + return ret; } gmu_write(gmu, REG_A6XX_GMU_DCVS_ACK_OPTION, 0); @@ -182,15 +182,17 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp) gmu_write(gmu, REG_A6XX_GMU_DCVS_BW_SETTING, 0xff); /* Set and clear the OOB for DCVS to trigger the GMU */ - a6xx_gmu_set_oob(gmu, GMU_OOB_DCVS_SET); + ret = a6xx_gmu_set_oob(gmu, GMU_OOB_DCVS_SET); a6xx_gmu_clear_oob(gmu, GMU_OOB_DCVS_SET); - ret = gmu_read(gmu, 
REG_A6XX_GMU_DCVS_RETURN); - if (ret) + if (!ret && gmu_read(gmu, REG_A6XX_GMU_DCVS_RETURN)) { dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret); + ret = -EINVAL; + } dev_pm_opp_set_opp(&gpu->pdev->dev, opp); pm_runtime_put(gmu->dev); + return ret; } unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu) @@ -353,11 +355,13 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state) ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_GMU2HOST_INTR_INFO, val, val & (1 << ack), 100, 1); - if (ret) + if (ret) { DRM_DEV_ERROR(gmu->dev, "Timeout waiting for GMU OOB set %s: 0x%x\n", a6xx_gmu_oob_bits[state].name, gmu_read(gmu, REG_A6XX_GMU_GMU2HOST_INTR_INFO)); + return -ETIMEDOUT; + } /* Clear the acknowledge interrupt */ gmu_write(gmu, REG_A6XX_GMU_GMU2HOST_INTR_CLR, 1 << ack); @@ -922,18 +926,21 @@ static void a6xx_gmu_force_off(struct a6xx_gmu *gmu) a6xx_gmu_rpmh_off(gmu); } -static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu) +static int a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu) { struct dev_pm_opp *gpu_opp; unsigned long gpu_freq = gmu->gpu_freqs[gmu->current_perf_index]; + int ret; gpu_opp = dev_pm_opp_find_freq_exact(&gpu->pdev->dev, gpu_freq, true); if (IS_ERR(gpu_opp)) - return; + return PTR_ERR(gpu_opp); gmu->freq = 0; /* so a6xx_gmu_set_freq() doesn't exit early */ - a6xx_gmu_set_freq(gpu, gpu_opp); + ret = a6xx_gmu_set_freq(gpu, gpu_opp); dev_pm_opp_put(gpu_opp); + + return ret; } static void a6xx_gmu_set_initial_bw(struct msm_gpu *gpu, struct a6xx_gmu *gmu) @@ -1018,7 +1025,7 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu) enable_irq(gmu->hfi_irq); /* Set the GPU to the current freq */ - a6xx_gmu_set_initial_freq(gpu, gmu); + ret = a6xx_gmu_set_initial_freq(gpu, gmu); out: /* On failure, shut down the GMU to leave it in a good state */ diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index 8c3cb31..fdfc5c4 100644 --- 
a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -890,7 +890,7 @@ static int hw_init(struct msm_gpu *gpu) int ret; /* Make sure the GMU keeps the GPU on while we set it up *
[PATCH v1 10/10] drm/msm/a6xx: Free gmu_debug crashstate bo
Free gmu_debug bo while destroying the gpu crashstate.

Signed-off-by: Akhil P Oommen
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index 4d4588a..09bb993 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -1054,6 +1054,9 @@ static void a6xx_gpu_state_destroy(struct kref *kref)
 	if (a6xx_state->gmu_hfi)
 		kvfree(a6xx_state->gmu_hfi->data);
 
+	if (a6xx_state->gmu_debug)
+		kvfree(a6xx_state->gmu_debug->data);
+
 	list_for_each_entry_safe(obj, tmp, &a6xx_state->objs, node)
 		kfree(obj);
-- 
2.7.4
[PATCH v1 09/10] drm/msm: Remove pm_runtime_get() from msm_job_run()
We do pm_runtime_get() within msm_gpu_submit(). So remove the redundant pm_runtime_get/put from msm_job_run().

Signed-off-by: Akhil P Oommen
---
 drivers/gpu/drm/msm/msm_ringbuffer.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 3bbf574..43fb04e 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -18,8 +18,6 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
 
 	submit->hw_fence = msm_fence_alloc(submit->ring->fctx);
 
-	pm_runtime_get_sync(&gpu->pdev->dev);
-
 	/* TODO move submit path over to using a per-ring lock.. */
 	mutex_lock(&gpu->lock);
 
@@ -27,8 +25,6 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
 
 	mutex_unlock(&gpu->lock);
 
-	pm_runtime_put(&gpu->pdev->dev);
-
 	return dma_fence_get(submit->hw_fence);
 }
-- 
2.7.4
[PATCH v1 08/10] drm/msm/a6xx: Remove clk votes on failure
Remove vote on clks on gpu resume failure.

Signed-off-by: Akhil P Oommen
---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 66ae509..e90359f 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -1033,6 +1033,7 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
 		disable_irq(gmu->gmu_irq);
 		a6xx_gmu_inline_coredump(gmu);
 		a6xx_rpmh_stop(gmu);
+		clk_bulk_disable_unprepare(gmu->nr_clocks, gmu->clocks);
 		pm_runtime_put(gmu->gxpd);
 		pm_runtime_put(gmu->dev);
 	}
-- 
2.7.4
[PATCH v1 07/10] drm/msm/adreno: Retry on gpu resume failure
Retry infinitely on resume failure because there is nothing much we can do if GPU is not ON. Also, this helps us to avoid checking for the return value of pm_runtime_get() to see if GPU is ON.

Signed-off-by: Akhil P Oommen
---
 drivers/gpu/drm/msm/adreno/adreno_device.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 89cfd84..abcc553 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -603,8 +603,16 @@ static const struct of_device_id dt_match[] = {
 static int adreno_resume(struct device *dev)
 {
 	struct msm_gpu *gpu = dev_to_gpu(dev);
+	int ret;
+
+	/* What hope do we have for the future if we can't turn ON gpu */
+	while (true) {
+		ret = gpu->funcs->pm_resume(gpu);
+		if (!ret)
+			break;
+	}
 
-	return gpu->funcs->pm_resume(gpu);
+	return 0;
 }
 
 static int active_submits(struct msm_gpu *gpu)
-- 
2.7.4
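The retry policy above reduces to a tiny loop; a generic sketch follows (the `flaky_resume` test double is hypothetical, only there to exercise the loop):

```c
#include <assert.h>

typedef int (*resume_fn)(void *ctx);

/* Keep invoking the resume callback until it succeeds, mirroring the
 * unbounded while (true) loop in adreno_resume() above.  Like the
 * patch, it always reports success once the loop exits. */
static int resume_retry(resume_fn fn, void *ctx)
{
	while (fn(ctx))
		;	/* nothing to do but try again */
	return 0;
}

/* Hypothetical test double: fails twice, then succeeds. */
static int attempts;
static int flaky_resume(void *ctx)
{
	(void)ctx;
	return ++attempts < 3 ? -1 : 0;
}
```

An obvious review question for this design is whether a bounded retry count or a delay between attempts would be safer than looping forever when the hardware never comes back.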
[PATCH v1 05/10] drm/msm: Do recovery on hw_init failure
Schedule the recover worker when there is hw init failure in msm_gpu_submit(). The recover worker will take care of capturing coredump, gpu recovery and resubmission of pending IBs.

Signed-off-by: Akhil P Oommen
---
 drivers/gpu/drm/msm/msm_gpu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index e8a442a..4d24fa1 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -757,12 +757,15 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	struct msm_drm_private *priv = dev->dev_private;
 	struct msm_ringbuffer *ring = submit->ring;
 	unsigned long flags;
+	int ret;
 
 	WARN_ON(!mutex_is_locked(&gpu->lock));
 
 	pm_runtime_get_sync(&gpu->pdev->dev);
 
-	msm_gpu_hw_init(gpu);
+	ret = msm_gpu_hw_init(gpu);
+	if (ret)
+		kthread_queue_work(gpu->worker, &gpu->recover_work);
 
 	submit->seqno = ++ring->seqno;
-- 
2.7.4
[PATCH v1 03/10] drm/msm/a6xx: Avoid gmu lock in pm ops
We don't really need gmu lock in runtime pm ops because these operations are serialized anyway and also with other paths where we take this lock. This patch will help to simplify the locking order when we introduce crashstate_lock in the upcoming patch.

Signed-off-by: Akhil P Oommen
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 3faf551..8c3cb31 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1530,9 +1530,7 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
 
 	trace_msm_gpu_resume(0);
 
-	mutex_lock(&a6xx_gpu->gmu.lock);
 	ret = a6xx_gmu_resume(a6xx_gpu);
-	mutex_unlock(&a6xx_gpu->gmu.lock);
 
 	if (ret)
 		return ret;
@@ -1555,9 +1553,7 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
 
 	msm_devfreq_suspend(gpu);
 
-	mutex_lock(&a6xx_gpu->gmu.lock);
 	ret = a6xx_gmu_stop(a6xx_gpu);
-	mutex_unlock(&a6xx_gpu->gmu.lock);
 
 	if (ret)
 		return ret;
-- 
2.7.4
[PATCH v1 04/10] drm/msm/a6xx: Enhance debugging of gmu faults
Add support for inline capture of gmu coredump in gmu resume/suspend path to help debug gmu error/faults. This is sort of a lite version of gpu coredump with just gmu states. And we can't use recover_worker in these scenarios because gmu is collapsed after a failure in this path. Hence we need to capture the gmu states inline before it is collapsed. Signed-off-by: Akhil P Oommen --- drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 18 +++-- drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 + drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 60 + drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 +++-- drivers/gpu/drm/msm/adreno/adreno_gpu.h | 2 + drivers/gpu/drm/msm/msm_gpu.c | 23 ++- drivers/gpu/drm/msm/msm_gpu.h | 11 -- 7 files changed, 105 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c index f208a81..f121d798 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c @@ -1024,6 +1024,7 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu) /* On failure, shut down the GMU to leave it in a good state */ if (ret) { disable_irq(gmu->gmu_irq); + a6xx_gmu_inline_coredump(gmu); a6xx_rpmh_stop(gmu); pm_runtime_put(gmu->gxpd); pm_runtime_put(gmu->dev); @@ -1082,6 +1083,7 @@ static void a6xx_gmu_shutdown(struct a6xx_gmu *gmu) { struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu); struct adreno_gpu *adreno_gpu = &a6xx_gpu->base; + int ret = 0; u32 val; /* @@ -1091,10 +1093,11 @@ static void a6xx_gmu_shutdown(struct a6xx_gmu *gmu) val = gmu_read(gmu, REG_A6XX_GPU_GMU_CX_GMU_RPMH_POWER_STATE); if (val != 0xf) { - int ret = a6xx_gmu_wait_for_idle(gmu); + ret = a6xx_gmu_wait_for_idle(gmu); /* If the GMU isn't responding assume it is hung */ if (ret) { + a6xx_gmu_inline_coredump(gmu); a6xx_gmu_force_off(gmu); return; } @@ -1102,7 +1105,9 @@ static void a6xx_gmu_shutdown(struct a6xx_gmu *gmu) a6xx_bus_clear_pending_transactions(adreno_gpu); /* tell the GMU we want to slumber */ - 
a6xx_gmu_notify_slumber(gmu); + ret = a6xx_gmu_notify_slumber(gmu); + if (ret) + goto out; ret = gmu_poll_timeout(gmu, REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_STATUS, val, @@ -1123,6 +1128,10 @@ static void a6xx_gmu_shutdown(struct a6xx_gmu *gmu) REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_STATUS2)); } +out: + if (ret) + a6xx_gmu_inline_coredump(gmu); + /* Turn off HFI */ a6xx_hfi_stop(gmu); @@ -1146,9 +1155,10 @@ int a6xx_gmu_stop(struct a6xx_gpu *a6xx_gpu) * Force the GMU off if we detected a hang, otherwise try to shut it * down gracefully */ - if (gmu->hung) + if (gmu->hung) { + a6xx_gmu_inline_coredump(gmu); a6xx_gmu_force_off(gmu); - else + } else a6xx_gmu_shutdown(gmu); /* Remove the bus vote */ diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h index 675aef0..2599443 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h @@ -86,5 +86,6 @@ void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state, struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu); int a6xx_gpu_state_put(struct msm_gpu_state *state); bool a6xx_is_smmu_stalled(struct msm_gpu *gpu); +void a6xx_gmu_inline_coredump(struct a6xx_gmu *gmu); #endif /* __A6XX_GPU_H__ */ diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c index 09b2ff0..4d4588a 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c @@ -3,6 +3,7 @@ #include #include "msm_gem.h" +#include "msm_gpu.h" #include "a6xx_gpu.h" #include "a6xx_gmu.h" #include "a6xx_gpu_state.h" @@ -970,10 +971,19 @@ void a6xx_get_gmu_state(struct msm_gpu *gpu, struct a6xx_gpu_state *a6xx_state) struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu); struct a6xx_gmu *gmu = &a6xx_gpu->gmu; - if (gmu->hung) + if (gmu->hung) { + mutex_lock(&gmu->lock); a6xx_gmu_send_nmi(gmu); + mutex_unlock(&gmu->lock); + } a6xx_get_gmu_registers(gpu, a6xx_state); + + a6xx_state->gmu_log = 
a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log); + a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi); + a6xx_state->gmu_d
[PATCH v1 02/10] drm/msm/a6xx: Send NMI to gmu when it is hung
While capturing gmu state, first send an NMI to gmu when it is hung. This helps to move gmu to a safe state. Signed-off-by: Akhil P Oommen --- drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 37 + drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 1 + drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 14 ++- 3 files changed, 51 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c index 3e325e2..f208a81 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c @@ -14,6 +14,37 @@ #include "msm_gpu_trace.h" #include "msm_mmu.h" +void a6xx_gmu_send_nmi(struct a6xx_gmu *gmu) +{ + struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu); + struct adreno_gpu *adreno_gpu = &a6xx_gpu->base; + struct msm_gpu *gpu = &adreno_gpu->base; + u32 val; + + if (a6xx_gmu_gx_is_on(gmu) && a6xx_is_smmu_stalled(gpu)) { + DRM_DEV_ERROR(gmu->dev, + "Skipping GMU NMI since SMMU is stalled\n"); + } + + /* Don't retrigger NMI if gmu reset is already active */ + val = gmu_read(gmu, REG_A6XX_GMU_CM3_FW_INIT_RESULT); + if (val & 0xE00) + return; + + /* Mask all interrupts from GMU first */ + gmu_write(gmu, REG_A6XX_GMU_GMU2HOST_INTR_MASK, 0x); + + /* Trigger NMI to make gmu save it's internal state to ddr */ + val = gmu_read(gmu, REG_A6XX_GMU_CM3_CFG); + gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, val | BIT(9)); + + /* Barrier to ensure write is posted before we proceed */ + wmb(); + + /* Small delay to ensure state copy is ddr is complete at GMU */ + udelay(200); +} + static void a6xx_gmu_fault(struct a6xx_gmu *gmu) { struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu); @@ -790,6 +821,12 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, unsigned int state) gmu_write(gmu, REG_A6XX_GMU_CM3_FW_INIT_RESULT, 0); gmu_write(gmu, REG_A6XX_GMU_CM3_BOOT_CONFIG, 0x02); + /* +* Make sure that the NMI bit is cleared by configuring the reset value +* here +*/ + gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, 0x4052); 
+ /* Write the iova of the HFI table */ gmu_write(gmu, REG_A6XX_GMU_HFI_QTBL_ADDR, gmu->hfi.iova); gmu_write(gmu, REG_A6XX_GMU_HFI_QTBL_INFO, 1); diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h index 84bd516..4228ec1 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h @@ -186,5 +186,6 @@ int a6xx_hfi_set_freq(struct a6xx_gmu *gmu, int index); bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu); bool a6xx_gmu_sptprac_is_on(struct a6xx_gmu *gmu); +void a6xx_gmu_send_nmi(struct a6xx_gmu *gmu); #endif diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c index 7de9d2f..09b2ff0 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c @@ -964,6 +964,18 @@ static void a6xx_get_indexed_registers(struct msm_gpu *gpu, a6xx_state->nr_indexed_regs = count; } +void a6xx_get_gmu_state(struct msm_gpu *gpu, struct a6xx_gpu_state *a6xx_state) +{ + struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); + struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu); + struct a6xx_gmu *gmu = &a6xx_gpu->gmu; + + if (gmu->hung) + a6xx_gmu_send_nmi(gmu); + + a6xx_get_gmu_registers(gpu, a6xx_state); +} + struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu) { struct a6xx_crashdumper _dumper = { 0 }, *dumper = NULL; @@ -980,7 +992,7 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu) /* Get the generic state from the adreno core */ adreno_gpu_state_get(gpu, &a6xx_state->base); - a6xx_get_gmu_registers(gpu, a6xx_state); + a6xx_get_gmu_state(gpu, a6xx_state); a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.log); a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, &a6xx_gpu->gmu.hfi); -- 2.7.4
[PATCH v1 01/10] drm/msm/a6xx: Add helper to check smmu is stalled
Add a helper function to check for stalled smmu and also avoid reading RBBM_STATUS3 register which is in GX domain before ensuring GX is ON. Signed-off-by: Akhil P Oommen --- drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 8 +++- drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 1 + drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 4 +--- 3 files changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index 7d23c74..3faf551 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -14,6 +14,12 @@ #define GPU_PAS_ID 13 +bool a6xx_is_smmu_stalled(struct msm_gpu *gpu) +{ + return !!(gpu_read(gpu, REG_A6XX_RBBM_STATUS3) & + A6XX_RBBM_STATUS3_SMMU_STALLED_ON_FAULT); +} + static inline bool _a6xx_check_idle(struct msm_gpu *gpu) { struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu); @@ -1346,7 +1352,7 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu) * to otherwise resume normally rather than killing the submit, so * just bail. 
*/ - if (gpu_read(gpu, REG_A6XX_RBBM_STATUS3) & A6XX_RBBM_STATUS3_SMMU_STALLED_ON_FAULT) + if (a6xx_is_smmu_stalled(gpu)) return; /* diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h index 86e0a7c..675aef0 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h @@ -85,5 +85,6 @@ void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state, struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu); int a6xx_gpu_state_put(struct msm_gpu_state *state); +bool a6xx_is_smmu_stalled(struct msm_gpu *gpu); #endif /* __A6XX_GPU_H__ */ diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c index 55f4433..7de9d2f 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c @@ -971,8 +971,6 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu) struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu); struct a6xx_gpu_state *a6xx_state = kzalloc(sizeof(*a6xx_state), GFP_KERNEL); - bool stalled = !!(gpu_read(gpu, REG_A6XX_RBBM_STATUS3) & - A6XX_RBBM_STATUS3_SMMU_STALLED_ON_FAULT); if (!a6xx_state) return ERR_PTR(-ENOMEM); @@ -1003,7 +1001,7 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu) * write out GPU state, so we need to skip this when the SMMU is * stalled in response to an iova fault */ - if (!stalled && !gpu->needs_hw_init && + if (!a6xx_is_smmu_stalled(gpu) && !a6xx_crashdumper_init(gpu, &_dumper)) { dumper = &_dumper; } -- 2.7.4
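The new a6xx_is_smmu_stalled() helper uses the !!(reg & bit) idiom to collapse a masked register read into a clean boolean. A standalone sketch of that idiom (the bit position below is a stand-in for illustration, not taken from the real a6xx register headers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit position: the real A6XX_RBBM_STATUS3_SMMU_STALLED_ON_FAULT
 * definition lives in the driver's generated register headers. */
#define RBBM_STATUS3_SMMU_STALLED_ON_FAULT (1u << 24)

/* Mirrors the shape of a6xx_is_smmu_stalled(): mask the status word, then
 * normalize with !! so callers get 0/1 rather than a raw masked value. */
static inline bool is_smmu_stalled(uint32_t status3)
{
    return !!(status3 & RBBM_STATUS3_SMMU_STALLED_ON_FAULT);
}
```

The double negation matters because the masked value (e.g. 0x01000000) could otherwise be silently truncated to zero if assigned to a narrower type.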
[PATCH v1 00/10] Support for GMU coredump and some related improvements
Major enhancement in this series is the support for a minimal gmu coredump which can be captured inline instead of through our usual recover worker. It is helpful in the case of gmu errors during gpu wake-up/suspend path and helps to capture a snapshot of gmu before we do a suspend. I had to introduce a lock to synchronize the crashstate because the runtime-suspend can happen from an asynchronous RPM thread. Apart from this, there are some improvements to gracefully handle the gmu errors by propagating the error back to parent or by retrying. Also, a few patches to fix some trivial bugs in the related code. Akhil P Oommen (10): drm/msm/a6xx: Add helper to check smmu is stalled drm/msm/a6xx: Send NMI to gmu when it is hung drm/msm/a6xx: Avoid gmu lock in pm ops drm/msm/a6xx: Enhance debugging of gmu faults drm/msm: Do recovery on hw_init failure drm/msm/a6xx: Propagate OOB set error drm/msm/adreno: Retry on gpu resume failure drm/msm/a6xx: Remove clk votes on failure drm/msm: Remove pm_runtime_get() from msm_job_run() drm/msm/a6xx: Free gmu_debug crashstate bo drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 89 +++-- drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 1 + drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 31 +++--- drivers/gpu/drm/msm/adreno/a6xx_gpu.h | 4 +- drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 79 + drivers/gpu/drm/msm/adreno/adreno_device.c | 10 +++- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 +++- drivers/gpu/drm/msm/adreno/adreno_gpu.h | 2 + drivers/gpu/drm/msm/msm_gpu.c | 28 - drivers/gpu/drm/msm/msm_gpu.h | 11 ++-- drivers/gpu/drm/msm/msm_ringbuffer.c| 4 -- 11 files changed, 218 insertions(+), 51 deletions(-) -- 2.7.4
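The cover letter mentions introducing a lock to synchronize the crashstate because runtime-suspend can run from an asynchronous RPM thread. A hypothetical userspace model of that idea, with a pthread mutex standing in for the kernel lock (none of these names come from the series itself):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: several capture paths (recover worker, the async RPM
 * suspend path) race to fill a single crashstate slot.  The lock makes the
 * capture atomic, and the first snapshot is kept until it is consumed. */
struct gpu {
    pthread_mutex_t crashstate_lock;
    void *crashstate;               /* NULL when no snapshot is held */
};

static int capture_crashstate(struct gpu *gpu, void *snapshot)
{
    int captured = 0;

    pthread_mutex_lock(&gpu->crashstate_lock);
    if (!gpu->crashstate) {         /* keep the earliest crash, drop later ones */
        gpu->crashstate = snapshot;
        captured = 1;
    }
    pthread_mutex_unlock(&gpu->crashstate_lock);
    return captured;
}

static void *consume_crashstate(struct gpu *gpu)
{
    void *s;

    pthread_mutex_lock(&gpu->crashstate_lock);
    s = gpu->crashstate;
    gpu->crashstate = NULL;
    pthread_mutex_unlock(&gpu->crashstate_lock);
    return s;
}

/* Single-threaded self-check of the keep-first/drop-later policy. */
static int crashstate_model_works(void)
{
    struct gpu g = { PTHREAD_MUTEX_INITIALIZER, NULL };
    int a, b;

    if (capture_crashstate(&g, &a) != 1)        /* first crash is kept */
        return 0;
    if (capture_crashstate(&g, &b) != 0)        /* later crash is dropped */
        return 0;
    if (consume_crashstate(&g) != (void *)&a)
        return 0;
    return consume_crashstate(&g) == NULL;
}
```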
Re: [PATCH 0/2] DSI host and peripheral initialisation ordering
Hi Doug On Wed, 2 Mar 2022 at 00:13, Doug Anderson wrote: > > Hi, > > On Wed, Feb 16, 2022 at 9:00 AM Dave Stevenson > wrote: > > > > Hi All > > > > Hopefully I've cc'ed all those that have bashed this problem around > > previously, > > or are otherwise linked to DRM bridges. > > > > There have been numerous discussions around how DSI support is currently > > broken > > as it doesn't support initialising the PHY to LP-11 and potentially the > > clock > > lane to HS prior to configuring the DSI peripheral. There is no op where the > > interface is initialised but HS video isn't also being sent. > > Currently you have: > > - peripheral pre_enable (host not initialised yet) > > - host pre_enable > > - encoder enable > > - host enable > > - peripheral enable (video already running) > > > > vc4 and exynos currently implement the DSI host as an encoder, and split the > > bridge_chain. This fails if you want to switch to being a bridge and/or use > > atomic calls as the state of all the elements split off are not added by > > drm_atomic_add_encoder_bridges. > > > > dw-mipi-dsi[1] and now msm[2] use the mode_set hook to initialise the PHY, > > so > > the bridge/panel pre_enable can send commands. In their post_disable they > > then > > call the downstream bridge/panel post_disable op manually so that shutdown > > commands can be sent before shutting down the PHY. Nothing handles that > > fact, > > so the framework then continues down the bridge chain and calls the > > post_disable > > again, so we get unbalanced panel prepare/unprepare calls being reported > > [3]. > > > > There have been patches[4] proposing reversing the entire direction of > > pre_enable and post_disable, but that risks driving voltage into devices > > that > > have yet to be powered up. > > There have been discussions about adding either a pre_pre_enable, or adding > > a > > DSI host_op to initialise the host[5]. 
Both require significant reworking > > to all > > existing drivers in moving initialisation phases. > > We have patches that look like they may well be addressing race conditions > > in > > starting up a DSI peripheral[6]. > > In general I'm happy to let the more senior people in DRM set the > direction here so I probably won't do lots of review, but I will point > out that I did have another proposal that sorta got lost in the noise > of the whole "reversing the entire direction". That's basically: > > https://lists.freedesktop.org/archives/dri-devel/2021-October/328934.html > > I have no idea if something like that would work for your use case, > but after analyzing it it felt like a surprisingly clean proposal even > if my first instinct when I thought about it was that it was a hack. > ;-) I suspect (but haven't analyzed your code) that it might be > equivalent to your proposal of using a flag but maybe easier to wrap > ones head around? If I'm reading that right, then you're proposing adding after_pre_enable and before_post_disable hooks. That's almost the same as the power_up() and power_down() ops that Dmitry suggested earlier, or pre_pre_enable / post_post_disable that had also been considered. Neither of those options handles the case of a longer chain in which two non-consecutive links want their upstream bridge enabled first. As per the clarification in patch 1/2, considering the chain - Panel - Bridge 1 - Bridge 2 DRM_BRIDGE_OP_UPSTREAM_FIRST - Bridge 3 - Bridge 4 DRM_BRIDGE_OP_UPSTREAM_FIRST - Bridge 5 - Encoder With the flag option we call pre_enables as Panel, Bridge 1, Bridge 3, Bridge 2, Bridge 5, Bridge 4, Encoder. If adding after_pre_enable, then we end up with Panel, Bridge 1, Bridge 3, Bridge 5, Bridge 4 (after_pre_enable), Bridge 2 (after_pre_enable), Encoder. (power_on / pre_pre_enable from encoder to connector would end up with Bridge 5 (power_on), Bridge 3 (power_on), Bridge 1 (power_on), Panel, Bridge 2, Bridge 4, Encoder). 
Those potentially work, but it seems a less logical order compared to using a flag to swap only the bridges of interest. I think power_on / pre_pre_enable covers DSI better than after_pre_enable. Adding the extra ops requires the source bridge (eg DSI host) to know the behaviour the sink bridge/panel wants. So do all DSI hosts have to implement power_on to power up and enter LP-11. Some DSI peripherals may be quite happy or even prefer to have the bus totally idle / powered down at pre_enable, but there would be no way of implementing that. You seem to be looking at DP, which I have very little knowledge of, and I don't quite understand your comments about the AUX bus and how ordering should be configured. If your panel isn't a generic driver, couldn't it request that the upstream bridge is pre_enabled first? Dave
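The flag-based ordering Dave describes can be modeled directly. The sketch below is a simplified simulation, not the real drm_bridge API: it walks the chain from the panel towards the encoder and defers any element flagged DRM_BRIDGE_OP_UPSTREAM_FIRST until its upstream neighbour has run, reproducing the pre_enable order quoted in the mail:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified model of the proposed per-bridge flag; the chain is ordered
 * from the panel towards the encoder. */
struct element {
    const char *name;
    bool upstream_first;
};

/* Defer a flagged element until the element upstream of it has been
 * pre_enabled.  Assumes no two consecutive elements are flagged and that
 * the encoder itself is never flagged. */
static int pre_enable_order(const struct element *chain, int n,
                            const char **order)
{
    int out = 0;

    for (int i = 0; i < n; i++) {
        if (chain[i].upstream_first)
            continue;                           /* defer until after i+1 */
        order[out++] = chain[i].name;
        if (i > 0 && chain[i - 1].upstream_first)
            order[out++] = chain[i - 1].name;
    }
    return out;
}

/* Check against the exact sequence given in the thread:
 * Panel, Bridge 1, Bridge 3, Bridge 2, Bridge 5, Bridge 4, Encoder. */
static int matches_thread_example(void)
{
    const struct element chain[] = {
        { "Panel", false },    { "Bridge 1", false }, { "Bridge 2", true },
        { "Bridge 3", false }, { "Bridge 4", true },  { "Bridge 5", false },
        { "Encoder", false },
    };
    const char *expected[] = { "Panel", "Bridge 1", "Bridge 3", "Bridge 2",
                               "Bridge 5", "Bridge 4", "Encoder" };
    const char *order[7];

    if (pre_enable_order(chain, 7, order) != 7)
        return 0;
    for (int i = 0; i < 7; i++)
        if (strcmp(order[i], expected[i]) != 0)
            return 0;
    return 1;
}
```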
Re: [Intel-gfx] [PATCH 6/6] treewide: remove check of list iterator against head past the loop body
On 28/02/2022 11:08, Jakob Koschel wrote: When list_for_each_entry() completes the iteration over the whole list without breaking the loop, the iterator value will be a bogus pointer computed based on the head element. While it is safe to use the pointer to determine if it was computed based on the head element, either with list_entry_is_head() or &pos->member == head, using the iterator variable after the loop should be avoided. In preparation to limiting the scope of a list iterator to the list traversal loop, use a dedicated pointer to point to the found element. Signed-off-by: Jakob Koschel [snip until i915 parts] drivers/gpu/drm/i915/gem/i915_gem_context.c | 14 +++--- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 15 --- drivers/gpu/drm/i915/gt/intel_ring.c | 15 --- [snip] diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 00327b750fbb..80c79028901a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -107,25 +107,27 @@ static void lut_close(struct i915_gem_context *ctx) radix_tree_for_each_slot(slot, &ctx->handles_vma, &iter, 0) { struct i915_vma *vma = rcu_dereference_raw(*slot); struct drm_i915_gem_object *obj = vma->obj; - struct i915_lut_handle *lut; + struct i915_lut_handle *lut = NULL; + struct i915_lut_handle *tmp; if (!kref_get_unless_zero(&obj->base.refcount)) continue; spin_lock(&obj->lut_lock); - list_for_each_entry(lut, &obj->lut_list, obj_link) { - if (lut->ctx != ctx) + list_for_each_entry(tmp, &obj->lut_list, obj_link) { + if (tmp->ctx != ctx) continue; - if (lut->handle != iter.index) + if (tmp->handle != iter.index) continue; - list_del(&lut->obj_link); + list_del(&tmp->obj_link); + lut = tmp; break; } spin_unlock(&obj->lut_lock); - if (&lut->obj_link != &obj->lut_list) { + if (lut) { i915_lut_handle_free(lut); radix_tree_iter_delete(&ctx->handles_vma, &iter, slot); Looks okay although personally I would have left lut as is for a 
smaller diff and introduced a new local like 'found' or 'unlinked'. i915_vma_close(vma); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 1736efa43339..fda9e3685ad2 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -2444,7 +2444,8 @@ static struct i915_request *eb_throttle(struct i915_execbuffer *eb, struct intel { struct intel_ring *ring = ce->ring; struct intel_timeline *tl = ce->timeline; - struct i915_request *rq; + struct i915_request *rq = NULL; + struct i915_request *tmp; /* * Completely unscientific finger-in-the-air estimates for suitable @@ -2460,15 +2461,17 @@ static struct i915_request *eb_throttle(struct i915_execbuffer *eb, struct intel * claiming our resources, but not so long that the ring completely * drains before we can submit our next request. */ - list_for_each_entry(rq, &tl->requests, link) { - if (rq->ring != ring) + list_for_each_entry(tmp, &tl->requests, link) { + if (tmp->ring != ring) continue; - if (__intel_ring_space(rq->postfix, - ring->emit, ring->size) > ring->size / 2) + if (__intel_ring_space(tmp->postfix, + ring->emit, ring->size) > ring->size / 2) { + rq = tmp; break; + } } - if (&rq->link == &tl->requests) + if (!rq) return NULL; /* weird, we will check again later for real */ Alternatively, instead of break could simply do "return i915_request_get(rq);" and replace the end of the function after the loop with "return NULL;". A bit smaller diff, or at least less "spread out" over the function, so might be easier to backport stuff touching this area in the future. But looks correct as is. 
return i915_request_get(rq); diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c index 2fdd52b62092..4881c4e0c407 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring.c +++ b/drivers/gpu/drm/i915/gt/intel_ring.c @@ -191,24 +191,27 @@ wait_for_space(struct intel_ring *ring, struct intel_timeline *tl,
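The conversion pattern in these hunks can be reproduced in plain userspace C with a minimal rendition of the kernel's list machinery (the element type is passed explicitly here instead of using typeof; illustrative only):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, kernel style. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_for_each_entry(pos, head, type, member)                 \
    for (pos = list_entry((head)->next, type, member);               \
         &pos->member != (head);                                     \
         pos = list_entry(pos->member.next, type, member))

struct item { int key; struct list_head link; };

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev; n->next = h;
    h->prev->next = n; h->prev = n;
}

/* The post-series idiom: 'found' stays NULL unless the loop matched, so the
 * bogus head-based iterator value never escapes the loop. */
static struct item *find_item(struct list_head *head, int key)
{
    struct item *tmp, *found = NULL;

    list_for_each_entry(tmp, head, struct item, link) {
        if (tmp->key == key) {
            found = tmp;
            break;
        }
    }
    return found;
}

static int list_pattern_works(void)
{
    struct list_head head = { &head, &head };
    struct item a = { .key = 1 }, b = { .key = 2 };

    list_add_tail(&a.link, &head);
    list_add_tail(&b.link, &head);
    return find_item(&head, 2) == &b && find_item(&head, 3) == NULL;
}
```

On a full traversal with no match, the iterator ends up pointing at memory computed from the head, which is exactly why testing it after the loop is fragile and why the series records the match in a dedicated pointer instead.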
Re: [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range
On Wed, Mar 2, 2022 at 10:55 AM Michael Cheng wrote: > > Thanks for the feedback Robin! > > Sorry my choice of words wasn't that great, but what I meant is to > understand how ARM flushes a range of dcache for device drivers, and not > an equal to x86 clflush. > > I believe the concern is if the CPU writes an update, that update might > only be sitting in the CPU cache and never make it to device memory > where the device can see it; there are specific places that we are > supposed to flush the CPU caches to make sure our updates are visible to > the hardware. > > +Matt Roper > > Matt, Lucas, any feedback here? MMIO (e.g., PCI BARs, etc.) should be mapped uncached. If it's not you'll have a lot of problems using a GPU on that architecture. One thing that you may want to check is if your device has its own caches or write queues on the BAR aperture. You may have to flush them after CPU access to the BAR to make sure CPU updates land in device memory. For system memory, PCI, per the spec, should be cache coherent with the CPU. If it's not, you'll have a lot of trouble using a GPU on that platform. Alex > > On 2022-03-02 4:49 a.m., Robin Murphy wrote: > > On 2022-02-25 19:27, Michael Cheng wrote: > >> Hi Robin, > >> > >> [ +arm64 maintainers for their awareness, which would have been a > >> good thing to do from the start ] > >> > >> * Thanks for adding the arm64 maintainer and sorry I didn't rope them > >> in sooner. > >> > >> Why does i915 need to ensure the CPU's instruction cache is coherent > >> with its data cache? Is it a self-modifying driver? > >> > >> * Also thanks for pointing this out. Initially I was using > >> dcache_clean_inval_poc, which seems to be equivalent to what > >> x86 is doing for dcache flushing, but it was giving me build errors > >> since it's not on the global list of kernel symbols. And after > >> revisiting the documentation for caches_clean_inval_pou, it won't > >> fly for what we are trying to do. 
Moving forward, what would you (or > >> someone in the ARM community) suggest we do? Could it be possible to > >> export dcache_clean_inval_poc as a global symbol? > > > > Unlikely, unless something with a legitimate need for CPU-centric > > cache maintenance like kexec or CPU hotplug ever becomes modular. > > > > In the case of a device driver, it's not even the basic issues of > > assuming to find direct equivalents to x86 semantics in other CPU > > architectures, or effectively reinventing parts of the DMA API, it's > > even bigger than that. Once you move from being integrated in a single > > vendor's system architecture to being on a discrete card, you > > fundamentally *no longer have any control over cache coherency*. > > Whether the host CPU architecture happens to be AArch64, RISC-V, or > > whatever doesn't really matter, you're at the mercy of 3rd-party PCIe > > and interconnect IP vendors, and SoC integrators. You'll find yourself > > in systems where PCIe simply cannot snoop any caches, where you'd > > better have the correct DMA API calls in place to have any hope of > > even the most basic functionality working properly; you'll find > > yourself in systems where even if the PCIe root complex claims to > > support No Snoop, your uncached traffic will still end up snooping > > stale data that got prefetched back into caches you thought you'd > > invalidated; you'll find yourself in systems where your memory > > attributes may or may not get forcibly rewritten by an IOMMU depending > > on the kernel config and/or command line. > > > > It's not about simply finding a substitute for clflush, it's that the > > reasons you have for using clflush in the first place can no longer be > > assumed to be valid. > > > > Robin. 
> > > >> On 2022-02-25 10:24 a.m., Robin Murphy wrote: > >>> [ +arm64 maintainers for their awareness, which would have been a > >>> good thing to do from the start ] > >>> > >>> On 2022-02-25 03:24, Michael Cheng wrote: > Add arm64 support for drm_clflush_virt_range. caches_clean_inval_pou > performs a flush by first performing a clean, follow by an > invalidation > operation. > > v2 (Michael Cheng): Use correct macro for cleaning and invalidation > the > dcache. Thanks Tvrtko for the suggestion. > > v3 (Michael Cheng): Replace asm/cacheflush.h with linux/cacheflush.h > > v4 (Michael Cheng): Arm64 does not export dcache_clean_inval_poc as a > symbol that could be use by other modules, thus use > caches_clean_inval_pou instead. Also this version > removes include for cacheflush, since its already > included base on architecture type. > > Signed-off-by: Michael Cheng > Reviewed-by: Matt Roper > --- > drivers/gpu/drm/drm_cache.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/g
Re: [PATCH] ASoC: qcom: Fix error code in lpass_platform_copy()
On Tue, 1 Mar 2022 11:11:04 +0300, Dan Carpenter wrote: > The copy_to/from_user() functions return the number of bytes remaining > to be copied. This function needs to return negative error codes > because snd_soc_pcm_component_copy_user() treats positive returns as > success in soc_component_ret(). > > Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/1] ASoC: qcom: Fix error code in lpass_platform_copy() commit: d5dd781bcc81aa31b62310927f25cfa2574450f1 All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
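The convention the fix relies on can be sketched in userspace with a stub standing in for copy_to_user(); the names and the fault-injection parameter below are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stub with the kernel convention: return the number of bytes that could
 * NOT be copied ('faulting_tail' simulates an unmapped tail of the user
 * buffer). */
static size_t fake_copy_to_user(void *to, const void *from, size_t n,
                                size_t faulting_tail)
{
    size_t copyable = n > faulting_tail ? n - faulting_tail : 0;

    memcpy(to, from, copyable);
    return n - copyable;            /* bytes remaining, as the kernel does */
}

/* The fix: translate a nonzero "bytes remaining" into -EFAULT, because the
 * caller treats positive return values as success. */
static long component_copy(void *to, const void *from, size_t n,
                           size_t faulting_tail)
{
    if (fake_copy_to_user(to, from, n, faulting_tail))
        return -EFAULT;
    return 0;
}

static int copy_model_works(void)
{
    char dst[8];
    const char src[8] = "abcdefg";

    if (component_copy(dst, src, sizeof(dst), 0) != 0)
        return 0;
    return component_copy(dst, src, sizeof(dst), 3) == -EFAULT;
}
```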
Re: [PATCH 3/3] dt-bindings: display: bridge: renesas, lvds: Document r8a77961 bindings
Hi Laurent, On Wed, Dec 29, 2021 at 5:47 PM Laurent Pinchart wrote: > On Fri, Dec 24, 2021 at 08:23:09AM +0300, Nikita Yushchenko wrote: > > Document the R-Car M3-W+ (R8A77961) SoC in the R-Car LVDS encoder > > bindings. > > > > Signed-off-by: Nikita Yushchenko > > --- > > .../devicetree/bindings/display/bridge/renesas,lvds.yaml | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git > > a/Documentation/devicetree/bindings/display/bridge/renesas,lvds.yaml > > b/Documentation/devicetree/bindings/display/bridge/renesas,lvds.yaml > > index acfc327f70a7..ca5443e5c2e3 100644 > > --- a/Documentation/devicetree/bindings/display/bridge/renesas,lvds.yaml > > +++ b/Documentation/devicetree/bindings/display/bridge/renesas,lvds.yaml > > @@ -27,6 +27,7 @@ properties: > >- renesas,r8a7791-lvds # for R-Car M2-W compatible LVDS encoders > >- renesas,r8a7793-lvds # for R-Car M2-N compatible LVDS encoders > >- renesas,r8a7795-lvds # for R-Car H3 compatible LVDS encoders > > + - renesas,r8a77961-lvds # for R-Car M3-W+ compatible LVDS encoders > > I'll move this line after the next to keep them sorted. No need to > resubmit. Any chance this will happen soon? Same for patch 1/3 . Patch 2/3 is already queued in soc/for-next. Thank you! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH v5 4/7] drm/i915/gt: create per-tile sysfs interface
On 17.02.2022 15:41, Andi Shyti wrote: Now that we have tiles we want each of them to have its own interface. A directory "gt/" is created under "cardN/" that will contain as many directories as the tiles. In the coming patches tile-related interfaces will be added. For now the sysfs gt structure simply has an id interface related to the current tile count. The directory structure will follow this scheme: /sys/.../card0 └── gt ├── gt0 │ └── id : : └── gtN └── id This new set of interfaces will be a basic tool for system managers and administrators when using i915. Signed-off-by: Andi Shyti Cc: Matt Roper Cc: Sujaritha Sundaresan Cc: Tvrtko Ursulin Reviewed-by: Sujaritha Sundaresan --- drivers/gpu/drm/i915/Makefile| 1 + drivers/gpu/drm/i915/gt/intel_gt.c | 2 + drivers/gpu/drm/i915/gt/intel_gt_sysfs.c | 118 +++ drivers/gpu/drm/i915/gt/intel_gt_sysfs.h | 34 +++ drivers/gpu/drm/i915/i915_drv.h | 2 + drivers/gpu/drm/i915/i915_sysfs.c| 12 ++- drivers/gpu/drm/i915/i915_sysfs.h| 3 + 7 files changed, 171 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_sysfs.c create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_sysfs.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 9d588d936e3d..277064b00afd 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -105,6 +105,7 @@ gt-y += \ gt/intel_gt_pm_debugfs.o \ gt/intel_gt_pm_irq.o \ gt/intel_gt_requests.o \ + gt/intel_gt_sysfs.o \ gt/intel_gtt.o \ gt/intel_llc.o \ gt/intel_lrc.o \ diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index 8c64b81e9ec9..0f080bbad043 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -26,6 +26,7 @@ #include "intel_rc6.h" #include "intel_renderstate.h" #include "intel_rps.h" +#include "intel_gt_sysfs.h" #include "intel_uncore.h" #include "shmem_utils.h" @@ -458,6 +459,7 @@ void intel_gt_driver_register(struct intel_gt *gt) 
intel_rps_driver_register(>->rps); intel_gt_debugfs_register(gt); + intel_gt_sysfs_register(gt); } static int intel_gt_init_scratch(struct intel_gt *gt, unsigned int size) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c new file mode 100644 index ..0206e9aa4867 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c @@ -0,0 +1,118 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2022 Intel Corporation + */ + +#include +#include +#include +#include +#include + +#include "i915_drv.h" +#include "i915_sysfs.h" +#include "intel_gt.h" +#include "intel_gt_sysfs.h" +#include "intel_gt_types.h" +#include "intel_rc6.h" + +bool is_object_gt(struct kobject *kobj) +{ + return !strncmp(kobj->name, "gt", 2); +} It looks quite fragile, at the moment I do not have better idea:) maybe after reviewing the rest of the patches. + +static struct intel_gt *kobj_to_gt(struct kobject *kobj) +{ + return container_of(kobj, struct kobj_gt, base)->gt; +} + +struct intel_gt *intel_gt_sysfs_get_drvdata(struct device *dev, + const char *name) +{ + struct kobject *kobj = &dev->kobj; + + /* +* We are interested at knowing from where the interface +* has been called, whether it's called from gt/ or from +* the parent directory. +* From the interface position it depends also the value of +* the private data. +* If the interface is called from gt/ then private data is +* of the "struct intel_gt *" type, otherwise it's * a +* "struct drm_i915_private *" type. +*/ + if (!is_object_gt(kobj)) { + struct drm_i915_private *i915 = kdev_minor_to_i915(dev); + + pr_devel_ratelimited(DEPRECATED + "%s (pid %d) is accessing deprecated %s " + "sysfs control, please use gt/gt/%s instead\n", + current->comm, task_pid_nr(current), name, name); + return to_gt(i915); + } + + return kobj_to_gt(kobj); It took some time for me to understand what is going on here. 
We have a dev argument which can point either to a "struct device" or to a "struct kobj_gt", but its type suggests otherwise, which is quite ugly. I wonder if it wouldn't be better to use __ATTR instead of DEVICE_ATTR*, as in the case of intel_engines_add_sysfs. This way the abstractions would look better, hopefully. +} + +static ssize_t id_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct intel_
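A quick illustration of why the reviewer calls the two-byte prefix test fragile: the helper matches any kobject whose name merely starts with "gt", not just the gt/gtN directories (the extra names below are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Mirrors is_object_gt() from the patch: a 2-character prefix test on the
 * kobject name. */
static bool is_object_gt(const char *name)
{
    return !strncmp(name, "gt", 2);
}
```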
Re: [PATCH] video: fbdev: sm712fb: Fix crash in smtcfb_write()
On 3/2/22 15:33, Zheyu Ma wrote: > When the sm712fb driver writes three bytes to the framebuffer, the > driver will crash: > > BUG: unable to handle page fault for address: c90001ff > RIP: 0010:smtcfb_write+0x454/0x5b0 > Call Trace: > vfs_write+0x291/0xd60 > ? do_sys_openat2+0x27d/0x350 > ? __fget_light+0x54/0x340 > ksys_write+0xce/0x190 > do_syscall_64+0x43/0x90 > entry_SYSCALL_64_after_hwframe+0x44/0xae > > Fix it by removing the open-coded endianness fixup-code. > > Signed-off-by: Zheyu Ma Thanks... it's already in the fbdev git tree and queued up for v5.18... https://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev.git/commit/?h=for-next&id=bd771cf5c4254511cc4abb88f3dab3bd58bdf8e8 Helge > --- > drivers/video/fbdev/sm712fb.c | 21 - > 1 file changed, 4 insertions(+), 17 deletions(-) > > diff --git a/drivers/video/fbdev/sm712fb.c b/drivers/video/fbdev/sm712fb.c > index 0dbc6bf8268a..e355089ac7d6 100644 > --- a/drivers/video/fbdev/sm712fb.c > +++ b/drivers/video/fbdev/sm712fb.c > @@ -1130,7 +1130,7 @@ static ssize_t smtcfb_write(struct fb_info *info, const > char __user *buf, > count = total_size - p; > } > > - buffer = kmalloc((count > PAGE_SIZE) ? PAGE_SIZE : count, GFP_KERNEL); > + buffer = kmalloc(PAGE_SIZE, GFP_KERNEL); > if (!buffer) > return -ENOMEM; > > @@ -1148,24 +1148,11 @@ static ssize_t smtcfb_write(struct fb_info *info, > const char __user *buf, > break; > } > > - for (i = c >> 2; i--;) { > - fb_writel(big_swap(*src), dst++); > + for (i = (c + 3) >> 2; i--;) { > + fb_writel(big_swap(*src), dst); > + dst++; > src++; > } > - if (c & 3) { > - u8 *src8 = (u8 *)src; > - u8 __iomem *dst8 = (u8 __iomem *)dst; > - > - for (i = c & 3; i--;) { > - if (i & 1) { > - fb_writeb(*src8++, ++dst8); > - } else { > - fb_writeb(*src8++, --dst8); > - dst8 += 2; > - } > - } > - dst = (u32 __iomem *)dst8; > - } > > *ppos += c; > buf += c;
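The fixed loop writes ceil(c / 4) words instead of special-casing the 1-3 byte tail. The round-up arithmetic from the patch, plus a 32-bit byte swap standing in for the driver's big_swap() (a sketch, not necessarily the driver's exact definition):

```c
#include <assert.h>
#include <stdint.h>

/* (c + 3) >> 2 is the classic round-up-to-32-bit-words expression used in
 * the fixed loop: a 3-byte write becomes one full word write. */
static unsigned int words_for_bytes(unsigned int c)
{
    return (c + 3) >> 2;
}

/* Full 32-bit byte swap, shown as a stand-in for sm712fb's big_swap(). */
static uint32_t big_swap(uint32_t val)
{
    return (val >> 24) | ((val >> 8) & 0x0000ff00u) |
           ((val << 8) & 0x00ff0000u) | (val << 24);
}
```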
Re: [RFC PATCH] drm/panel: simple: panel-dpi: use bus-format to set bpc and bus_format
On 3/2/22 15:21, Maxime Ripard wrote: Hi, Hi, Please try to avoid top posting On Wed, Feb 23, 2022 at 04:25:19PM +0100, Max Krummenacher wrote: The goal here is to set the element bus_format in the struct panel_desc. This is an enum with the possible values defined in include/uapi/linux/media-bus-format.h. The enum values are not constructed in a way that you could calculate the value from color channel width/shift/mapping/whatever. You rather would have to check if the combination of color channel width/shift/mapping/whatever maps to an existing value and otherwise EINVAL out. I don't see the value in having yet another way of how this information can be specified and then having to write a more complicated parser which maps the dt data to bus_format. Generally speaking, sending an RFC without explicitly stating what you want a comment on isn't very efficient. Isn't that what RFC stands for -- Request For Comments? That being said, what I (and I can only assume Marek) don't like is the string encoding. Especially when the similar bus-type property uses an integer with the various available bus options we have. Right, the string encoding isn't good. Having an integer, with a set of defines that you would map to the proper MEDIA_BUS_* would be more efficient and more elegant. That being said, the first question that needs to be answered is why does this have to be in the DT in the first place? Because with panel-simple / panel-dpi, you may need to specify the bus format between the last bridge and the panel.
Re: [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range
Thanks for the feedback Robin! Sorry my choice of words wasn't that great, but what I meant is to understand how ARM flushes a range of dcache for device drivers, and not an equal to x86 clflush. I believe the concern is if the CPU writes an update, that update might only be sitting in the CPU cache and never make it to device memory where the device can see it; there are specific places that we are supposed to flush the CPU caches to make sure our updates are visible to the hardware. +Matt Roper Matt, Lucas, any feedback here? On 2022-03-02 4:49 a.m., Robin Murphy wrote: On 2022-02-25 19:27, Michael Cheng wrote: Hi Robin, [ +arm64 maintainers for their awareness, which would have been a good thing to do from the start ] * Thanks for adding the arm64 maintainer and sorry I didn't rope them in sooner. Why does i915 need to ensure the CPU's instruction cache is coherent with its data cache? Is it a self-modifying driver? * Also thanks for pointing this out. Initially I was using dcache_clean_inval_poc, which seems to be equivalent to what x86 is doing for dcache flushing, but it was giving me build errors since it's not on the global list of kernel symbols. And after revisiting the documentation for caches_clean_inval_pou, it won't fly for what we are trying to do. Moving forward, what would you (or someone in the ARM community) suggest we do? Could it be possible to export dcache_clean_inval_poc as a global symbol? Unlikely, unless something with a legitimate need for CPU-centric cache maintenance like kexec or CPU hotplug ever becomes modular. In the case of a device driver, it's not even the basic issues of assuming to find direct equivalents to x86 semantics in other CPU architectures, or effectively reinventing parts of the DMA API, it's even bigger than that. Once you move from being integrated in a single vendor's system architecture to being on a discrete card, you fundamentally *no longer have any control over cache coherency*. 
Whether the host CPU architecture happens to be AArch64, RISC-V, or whatever doesn't really matter, you're at the mercy of 3rd-party PCIe and interconnect IP vendors, and SoC integrators. You'll find yourself in systems where PCIe simply cannot snoop any caches, where you'd better have the correct DMA API calls in place to have any hope of even the most basic functionality working properly; you'll find yourself in systems where even if the PCIe root complex claims to support No Snoop, your uncached traffic will still end up snooping stale data that got prefetched back into caches you thought you'd invalidated; you'll find yourself in systems where your memory attributes may or may not get forcibly rewritten by an IOMMU depending on the kernel config and/or command line. It's not about simply finding a substitute for clflush, it's that the reasons you have for using clflush in the first place can no longer be assumed to be valid. Robin. On 2022-02-25 10:24 a.m., Robin Murphy wrote: [ +arm64 maintainers for their awareness, which would have been a good thing to do from the start ] On 2022-02-25 03:24, Michael Cheng wrote: Add arm64 support for drm_clflush_virt_range. caches_clean_inval_pou performs a flush by first performing a clean, follow by an invalidation operation. v2 (Michael Cheng): Use correct macro for cleaning and invalidation the dcache. Thanks Tvrtko for the suggestion. v3 (Michael Cheng): Replace asm/cacheflush.h with linux/cacheflush.h v4 (Michael Cheng): Arm64 does not export dcache_clean_inval_poc as a symbol that could be use by other modules, thus use caches_clean_inval_pou instead. Also this version removes include for cacheflush, since its already included base on architecture type. 
Signed-off-by: Michael Cheng Reviewed-by: Matt Roper --- drivers/gpu/drm/drm_cache.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c index c3e6e615bf09..81c28714f930 100644 --- a/drivers/gpu/drm/drm_cache.c +++ b/drivers/gpu/drm/drm_cache.c @@ -174,6 +174,11 @@ drm_clflush_virt_range(void *addr, unsigned long length) if (wbinvd_on_all_cpus()) pr_err("Timed out waiting for cache flush\n"); + +#elif defined(CONFIG_ARM64) + void *end = addr + length; + caches_clean_inval_pou((unsigned long)addr, (unsigned long)end); Why does i915 need to ensure the CPU's instruction cache is coherent with its data cache? Is it a self-modifying driver? Robin. (Note that the above is somewhat of a loaded question, and I do actually have half an idea of what you're trying to do here and why it won't fly, but I'd like to at least assume you've read the documentation of the function you decided was OK to use) + #else WARN_ONCE(1, "Architecture has no drm_cache.c support\n"); #endif
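For context, cache maintenance over a virtual range, whatever the architecture-specific primitive ends up being, boils down to walking the cache lines that cover [addr, addr + length). A generic sketch of that range arithmetic (illustration only: it assumes a power-of-two line size and does not address any of the coherency caveats Robin raises above):

```c
#include <assert.h>
#include <stdint.h>

/* Number of cache lines covering [addr, addr + length): from the line
 * containing the first byte through the line containing the last byte. */
static uint64_t lines_to_flush(uint64_t addr, uint64_t length,
                               uint64_t line_size)
{
    uint64_t first, last;

    if (!length)
        return 0;
    first = addr & ~(line_size - 1);                 /* align down */
    last = (addr + length - 1) & ~(line_size - 1);
    return (last - first) / line_size + 1;
}
```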
[PATCH 8/8] drm/vmwgfx: Stop using surface dma commands on most configurations
From: Zack Rusin Initial version of guest backed objects in the host had some performance issues that made using surface-dma's instead of direct copies faster. Surface dma's force a migration to vram which at best is slow and at worst is impossible (e.g. on svga3 where there's not enough vram to migrate fb's to it). Slowly migrate away from surface dma's to direct copies by limiting their usage to systems with more than 32MB of vram. Signed-off-by: Zack Rusin Reviewed-by: Maaz Mombasawala Reviewed-by: Martin Krastev --- drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c index 42b5ecb0c5e9..eb014b97d156 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c @@ -138,6 +138,11 @@ static void vmw_stdu_destroy(struct vmw_screen_target_display_unit *stdu); * Screen Target Display Unit CRTC Functions */ +static bool vmw_stdu_use_cpu_blit(const struct vmw_private *vmw) +{ + return !(vmw->capabilities & SVGA_CAP_3D) || vmw->vram_size < (32 * 1024 * 1024); +} + /** * vmw_stdu_crtc_destroy - cleans up the STDU @@ -689,7 +694,7 @@ int vmw_kms_stdu_dma(struct vmw_private *dev_priv, container_of(vfb, struct vmw_framebuffer_bo, base)->buffer; struct vmw_stdu_dirty ddirty; int ret; - bool cpu_blit = !(dev_priv->capabilities & SVGA_CAP_3D); + bool cpu_blit = vmw_stdu_use_cpu_blit(dev_priv); DECLARE_VAL_CONTEXT(val_ctx, NULL, 0); /* @@ -1164,7 +1169,7 @@ vmw_stdu_primary_plane_prepare_fb(struct drm_plane *plane, * so cache these mappings */ if (vps->content_fb_type == SEPARATE_BO && - !(dev_priv->capabilities & SVGA_CAP_3D)) + vmw_stdu_use_cpu_blit(dev_priv)) vps->cpp = new_fb->pitches[0] / new_fb->width; return 0; @@ -1368,7 +1373,7 @@ static int vmw_stdu_plane_update_bo(struct vmw_private *dev_priv, bo_update.base.vfb = vfb; bo_update.base.out_fence = out_fence; bo_update.base.mutex = NULL; - 
bo_update.base.cpu_blit = !(dev_priv->capabilities & SVGA_CAP_3D); + bo_update.base.cpu_blit = vmw_stdu_use_cpu_blit(dev_priv); bo_update.base.intr = false; /* -- 2.32.0
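The heuristic this patch centralizes in vmw_stdu_use_cpu_blit() can be read as a standalone predicate: fall back to CPU blits when the device lacks 3D support or has less than 32 MiB of VRAM (so a surface-dma migration to VRAM would be slow or impossible). A sketch; the capability bit value is an illustrative placeholder, not the real SVGA_CAP_3D define:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative placeholder for the real SVGA_CAP_3D bit. */
#define EXAMPLE_CAP_3D (1u << 5)

/* Use CPU blits when 3D is absent or VRAM is under 32 MiB. */
static bool use_cpu_blit(uint32_t capabilities, uint64_t vram_size)
{
	return !(capabilities & EXAMPLE_CAP_3D) ||
	       vram_size < (32ull * 1024 * 1024);
}
```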
[PATCH 6/8] drm/vmwgfx: Initialize drm_mode_fb_cmd2
From: Zack Rusin Transition to drm_mode_fb_cmd2 from drm_mode_fb_cmd left the structure uninitialized. drm_mode_fb_cmd2 adds a few additional members, e.g. flags and modifiers, which were never initialized. Garbage in those members can cause random failures during the bringup of the fbcon. Initializing the structure fixes random blank screens after bootup due to flags/modifiers mismatches during the fbcon bring up. Fixes: dabdcdc9822a ("drm/vmwgfx: Switch to mode_cmd2") Signed-off-by: Zack Rusin Cc: Daniel Vetter Cc: # v4.10+ Reviewed-by: Martin Krastev Reviewed-by: Maaz Mombasawala --- drivers/gpu/drm/vmwgfx/vmwgfx_fb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c index 8ee34576c7d0..adf17c740656 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c @@ -483,7 +483,7 @@ static int vmw_fb_kms_detach(struct vmw_fb_par *par, static int vmw_fb_kms_framebuffer(struct fb_info *info) { - struct drm_mode_fb_cmd2 mode_cmd; + struct drm_mode_fb_cmd2 mode_cmd = {0}; struct vmw_fb_par *par = info->par; struct fb_var_screeninfo *var = &info->var; struct drm_framebuffer *cur_fb; -- 2.32.0
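The one-character fix relies on a basic C guarantee worth spelling out: brace initialization (`= {0}`) zero-initializes every member, including members that get added to the struct after the calling code was written, which is exactly the flags/modifiers situation described above. A cut-down stand-in, not the real drm_mode_fb_cmd2 uapi layout:

```c
#include <stdint.h>

/* Stand-in struct for illustration: the last two members play the
 * role of the flags/modifier fields drm_mode_fb_cmd2 added on top
 * of the old drm_mode_fb_cmd. With `= {0}` they are guaranteed to
 * be zero; without it they hold stack garbage. */
struct example_fb_cmd {
	uint32_t width;
	uint32_t height;
	uint32_t pixel_format;
	uint32_t flags;          /* "newly added" member */
	uint64_t modifier[4];    /* "newly added" member */
};
```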
[PATCH 5/8] drm/vmwgfx: Allow querying of the SVGA PCI id from the userspace
From: Zack Rusin Mesa3D loaders require knowledge of the device's PCI ID. SVGAv2 and v3 have different PCI IDs, but the same driver is used to handle them both. To allow the Mesa3D svga driver to be loaded automatically for both SVGAv2 and SVGAv3, make the kernel return the PCI ID of the currently running device. Signed-off-by: Zack Rusin Reviewed-by: Martin Krastev Reviewed-by: Maaz Mombasawala --- drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c | 27 +++ include/uapi/drm/vmwgfx_drm.h | 9 - 2 files changed, 23 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c index 471da2b4c177..a1da5678c731 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 OR MIT /** * - * Copyright 2009-2015 VMware, Inc., Palo Alto, CA., USA + * Copyright 2009-2022 VMware, Inc., Palo Alto, CA., USA * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the @@ -27,9 +27,11 @@ #include "vmwgfx_drv.h" #include "vmwgfx_devcaps.h" -#include #include "vmwgfx_kms.h" +#include +#include + int vmw_getparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { @@ -62,17 +64,15 @@ int vmw_getparam_ioctl(struct drm_device *dev, void *data, break; case DRM_VMW_PARAM_FIFO_HW_VERSION: { - if ((dev_priv->capabilities & SVGA_CAP_GBOBJECTS)) { + if ((dev_priv->capabilities & SVGA_CAP_GBOBJECTS)) param->value = SVGA3D_HWVERSION_WS8_B1; - break; - } - - param->value = - vmw_fifo_mem_read(dev_priv, - ((vmw_fifo_caps(dev_priv) & - SVGA_FIFO_CAP_3D_HWVERSION_REVISED) ? - SVGA_FIFO_3D_HWVERSION_REVISED : - SVGA_FIFO_3D_HWVERSION)); + else + param->value = vmw_fifo_mem_read( + dev_priv, + ((vmw_fifo_caps(dev_priv) & + SVGA_FIFO_CAP_3D_HWVERSION_REVISED) ?
+ SVGA_FIFO_3D_HWVERSION_REVISED : + SVGA_FIFO_3D_HWVERSION)); break; } case DRM_VMW_PARAM_MAX_SURF_MEMORY: @@ -108,6 +108,9 @@ int vmw_getparam_ioctl(struct drm_device *dev, void *data, case DRM_VMW_PARAM_GL43: param->value = has_gl43_context(dev_priv); break; + case DRM_VMW_PARAM_DEVICE_ID: + param->value = to_pci_dev(dev_priv->drm.dev)->device; + break; default: return -EINVAL; } diff --git a/include/uapi/drm/vmwgfx_drm.h b/include/uapi/drm/vmwgfx_drm.h index 8277644c1144..26549c86a91f 100644 --- a/include/uapi/drm/vmwgfx_drm.h +++ b/include/uapi/drm/vmwgfx_drm.h @@ -1,6 +1,6 @@ /** * - * Copyright © 2009-2015 VMware, Inc., Palo Alto, CA., USA + * Copyright © 2009-2022 VMware, Inc., Palo Alto, CA., USA * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining a @@ -92,6 +92,12 @@ extern "C" { * * DRM_VMW_PARAM_SM5 * SM5 support is enabled. + * + * DRM_VMW_PARAM_GL43 + * SM5.1+GL4.3 support is enabled. + * + * DRM_VMW_PARAM_DEVICE_ID + * PCI ID of the underlying SVGA device. */ #define DRM_VMW_PARAM_NUM_STREAMS 0 @@ -111,6 +117,7 @@ extern "C" { #define DRM_VMW_PARAM_SM4_114 #define DRM_VMW_PARAM_SM5 15 #define DRM_VMW_PARAM_GL43 16 +#define DRM_VMW_PARAM_DEVICE_ID17 /** * enum drm_vmw_handle_type - handle type for ref ioctls -- 2.32.0
[PATCH 7/8] drm/vmwgfx: Implement MSI/MSI-X support for IRQs
From: Zack Rusin SVGAv3 deprecates legacy interrupts and adds support for MSI/MSI-X. With MSI the driver visible side remains largely unchanged but with MSI-X each interrupt gets delivered on its own vector. Add support for MSI/MSI-X while preserving the old functionality for SVGAv2. Code between the SVGAv2 and SVGAv3 is exactly the same, only the number of available vectors changes, in particular between legacy and MSI-X interrupts. Signed-off-by: Zack Rusin Reviewed-by: Martin Krastev Reviewed-by: Maaz Mombasawala --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 9 - drivers/gpu/drm/vmwgfx/vmwgfx_irq.c | 54 + 3 files changed, 57 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index f43afd56915e..791f9a5f3868 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -980,7 +980,7 @@ static int vmw_driver_load(struct vmw_private *dev_priv, u32 pci_id) } if (dev_priv->capabilities & SVGA_CAP_IRQMASK) { - ret = vmw_irq_install(&dev_priv->drm, pdev->irq); + ret = vmw_irq_install(dev_priv); if (ret != 0) { drm_err(&dev_priv->drm, "Failed installing irq: %d\n", ret); diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h index 12eb4de41036..be19aa6e1f13 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h @@ -65,6 +65,11 @@ #define VMWGFX_PCI_ID_SVGA2 0x0405 #define VMWGFX_PCI_ID_SVGA3 0x0406 +/* + * This has to match get_count_order(SVGA_IRQFLAG_MAX) + */ +#define VMWGFX_MAX_NUM_IRQS 6 + /* * Perhaps we should have sysfs entries for these. 
*/ @@ -532,6 +537,8 @@ struct vmw_private { bool has_mob; spinlock_t hw_lock; bool assume_16bpp; + u32 irqs[VMWGFX_MAX_NUM_IRQS]; + u32 num_irq_vectors; enum vmw_sm_type sm_type; @@ -1158,7 +1165,7 @@ bool vmw_cmd_describe(const void *buf, u32 *size, char const **cmd); * IRQs and wating - vmwgfx_irq.c */ -extern int vmw_irq_install(struct drm_device *dev, int irq); +extern int vmw_irq_install(struct vmw_private *dev_priv); extern void vmw_irq_uninstall(struct drm_device *dev); extern bool vmw_seqno_passed(struct vmw_private *dev_priv, uint32_t seqno); diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_irq.c b/drivers/gpu/drm/vmwgfx/vmwgfx_irq.c index fe4732bf2c9d..482989b1f211 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_irq.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_irq.c @@ -300,6 +300,7 @@ void vmw_irq_uninstall(struct drm_device *dev) struct vmw_private *dev_priv = vmw_priv(dev); struct pci_dev *pdev = to_pci_dev(dev->dev); uint32_t status; + u32 i; if (!(dev_priv->capabilities & SVGA_CAP_IRQMASK)) return; @@ -309,20 +310,61 @@ void vmw_irq_uninstall(struct drm_device *dev) status = vmw_irq_status_read(dev_priv); vmw_irq_status_write(dev_priv, status); - free_irq(pdev->irq, dev); + for (i = 0; i < dev_priv->num_irq_vectors; ++i) + free_irq(dev_priv->irqs[i], dev); + + pci_free_irq_vectors(pdev); + dev_priv->num_irq_vectors = 0; } /** * vmw_irq_install - Install the irq handlers * - * @dev: Pointer to the drm device. - * @irq: The irq number. + * @dev_priv: Pointer to the vmw_private device. * Return: Zero if successful. Negative number otherwise. 
*/ -int vmw_irq_install(struct drm_device *dev, int irq) +int vmw_irq_install(struct vmw_private *dev_priv) { + struct pci_dev *pdev = to_pci_dev(dev_priv->drm.dev); + struct drm_device *dev = &dev_priv->drm; + int ret; + int nvec; + int i = 0; + + BUILD_BUG_ON((SVGA_IRQFLAG_MAX >> VMWGFX_MAX_NUM_IRQS) != 1); + BUG_ON(VMWGFX_MAX_NUM_IRQS != get_count_order(SVGA_IRQFLAG_MAX)); + + nvec = pci_alloc_irq_vectors(pdev, 1, VMWGFX_MAX_NUM_IRQS, +PCI_IRQ_ALL_TYPES); + + if (nvec <= 0) { + drm_err(&dev_priv->drm, + "IRQ's are unavailable, nvec: %d\n", nvec); + goto done; + } + vmw_irq_preinstall(dev); - return request_threaded_irq(irq, vmw_irq_handler, vmw_thread_fn, - IRQF_SHARED, VMWGFX_DRIVER_NAME, dev); + for (i = 0; i < nvec; ++i) { + ret = pci_irq_vector(pdev, i); + if (ret < 0) { + drm_err(&dev_priv->drm, + "failed getting irq vector: %d\n", ret); + goto done; + } + dev_priv->irqs[i] = ret; + + ret = request_threaded_irq(dev_priv->irqs[i], vmw_irq_handler, vmw_thread_fn, +
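The BUILD_BUG_ON/BUG_ON pair added to vmw_irq_install() encodes the invariant the VMWGFX_MAX_NUM_IRQS comment describes: the vector count must equal get_count_order(SVGA_IRQFLAG_MAX), i.e. the flag must be exactly 1 << VMWGFX_MAX_NUM_IRQS. A userspace sketch of that relationship; the flag value and the count_order() stand-in are illustrative, not the kernel's definitions:

```c
/* Illustrative stand-ins for SVGA_IRQFLAG_MAX / VMWGFX_MAX_NUM_IRQS. */
#define EXAMPLE_IRQFLAG_MAX  (1u << 6)
#define EXAMPLE_MAX_NUM_IRQS 6

/* Userspace stand-in for the kernel's get_count_order():
 * smallest x with (1 << x) >= count, for count >= 1. */
static int count_order(unsigned int count)
{
	int order = 0;

	while ((1u << order) < count)
		order++;
	return order;
}
```

The compile-time check `(SVGA_IRQFLAG_MAX >> VMWGFX_MAX_NUM_IRQS) != 1` is just this equality expressed as a shift.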
[PATCH 1/8] drm/vmwgfx: Add support for CursorMob and CursorBypass 4
From: Martin Krastev * Add support for CursorMob * Add support for CursorBypass 4 * Refactor vmw_du_cursor_plane_atomic_update to be kms-helper-atomic -- move BO mappings to vmw_du_cursor_plane_prepare_fb -- move BO unmappings to vmw_du_cursor_plane_cleanup_fb Cursor mobs are a new svga feature which enables support for large cursors, e.g. large accessibility cursor on platforms with vmwgfx. It also cleans up the cursor code and makes it more uniform with the rest of modern guest backed objects support. Signed-off-by: Martin Krastev Reviewed-by: Zack Rusin Reviewed-by: Maaz Mombasawala Signed-off-by: Zack Rusin --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 6 +- drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 399 ++- drivers/gpu/drm/vmwgfx/vmwgfx_kms.h | 28 +- drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c | 18 +- drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c | 17 +- drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c | 16 +- 7 files changed, 382 insertions(+), 104 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index 26eb5478394a..621231c66fd4 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 OR MIT /** * - * Copyright 2009-2016 VMware, Inc., Palo Alto, CA., USA + * Copyright 2009-2022 VMware, Inc., Palo Alto, CA., USA * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h index ea3ecdda561d..780da6147cf9 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 OR MIT */ /** * - * Copyright 2009-2021 VMware, Inc., Palo Alto, CA., USA + * Copyright 2009-2022 VMware, Inc., Palo Alto, CA., USA * * Permission is hereby granted, free of charge, to any person obtaining a * copy of 
this software and associated documentation files (the @@ -101,6 +101,7 @@ struct vmw_fpriv { * struct vmw_buffer_object - TTM buffer object with vmwgfx additions * @base: The TTM buffer object * @res_tree: RB tree of resources using this buffer object as a backing MOB + * @base_mapped_count: ttm BO mapping count; used by KMS atomic helpers. * @cpu_writers: Number of synccpu write grabs. Protected by reservation when * increased. May be decreased without reservation. * @dx_query_ctx: DX context if this buffer object is used as a DX query MOB @@ -111,6 +112,9 @@ struct vmw_fpriv { struct vmw_buffer_object { struct ttm_buffer_object base; struct rb_root res_tree; + /* For KMS atomic helpers: ttm bo mapping count */ + atomic_t base_mapped_count; + atomic_t cpu_writers; /* Not ref-counted. Protected by binding_mutex */ struct vmw_resource *dx_query_ctx; diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c index bbd2f4ec08ec..9d82a7b49aed 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c @@ -1,7 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 OR MIT /** * - * Copyright 2009-2015 VMware, Inc., Palo Alto, CA., USA + * Copyright 2009-2022 VMware, Inc., Palo Alto, CA., USA * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the @@ -41,7 +41,7 @@ void vmw_du_cleanup(struct vmw_display_unit *du) struct vmw_private *dev_priv = vmw_priv(du->primary.dev); drm_plane_cleanup(&du->primary); if (vmw_cmd_supported(dev_priv)) - drm_plane_cleanup(&du->cursor); + drm_plane_cleanup(&du->cursor.base); drm_connector_unregister(&du->connector); drm_crtc_cleanup(&du->crtc); @@ -53,23 +53,43 @@ void vmw_du_cleanup(struct vmw_display_unit *du) * Display Unit Cursor functions */ -static int vmw_cursor_update_image(struct vmw_private *dev_priv, - u32 *image, u32 width, u32 height, - u32 hotspotX, u32 hotspotY) +static void 
vmw_cursor_update_mob(struct vmw_private *dev_priv, + struct ttm_buffer_object *bo, + struct ttm_bo_kmap_obj *map, + u32 *image, u32 width, u32 height, + u32 hotspotX, u32 hotspotY); + +struct vmw_svga_fifo_cmd_define_cursor { + u32 cmd; + SVGAFifoCmdDefineAlphaCursor cursor; +}; + +static void vmw_cursor_update_image(struct vmw_private *dev_priv, +
[PATCH 4/8] drm/vmwgfx: Fix fencing on SVGAv3
From: Zack Rusin Port of the vmwgfx to SVGAv3 lacked support for fencing. SVGAv3 removed FIFO's and replaced them with command buffers and extra registers. The initial version of SVGAv3 lacked support for most advanced features (e.g. 3D) which made fences unnecessary. That is no longer the case, especially as 3D support is being turned on. Switch from FIFO commands and capabilities to command buffers and extra registers to enable fences on SVGAv3. Fixes: 2cd80dbd3551 ("drm/vmwgfx: Add basic support for SVGA3") Signed-off-by: Zack Rusin Reviewed-by: Martin Krastev Reviewed-by: Maaz Mombasawala --- drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 8 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 28 --- drivers/gpu/drm/vmwgfx/vmwgfx_irq.c | 26 + drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 8 +--- 5 files changed, 53 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c b/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c index a3bfbb6c3e14..bf1b394753da 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c @@ -528,7 +528,7 @@ int vmw_cmd_send_fence(struct vmw_private *dev_priv, uint32_t *seqno) *seqno = atomic_add_return(1, &dev_priv->marker_seq); } while (*seqno == 0); - if (!(vmw_fifo_caps(dev_priv) & SVGA_FIFO_CAP_FENCE)) { + if (!vmw_has_fences(dev_priv)) { /* * Don't request hardware to send a fence. 
The diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h index 780da6147cf9..12eb4de41036 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h @@ -1683,4 +1683,12 @@ static inline void vmw_irq_status_write(struct vmw_private *vmw, outl(status, vmw->io_start + SVGA_IRQSTATUS_PORT); } +static inline bool vmw_has_fences(struct vmw_private *vmw) +{ + if ((vmw->capabilities & (SVGA_CAP_COMMAND_BUFFERS | + SVGA_CAP_CMD_BUFFERS_2)) != 0) + return true; + return (vmw_fifo_caps(vmw) & SVGA_FIFO_CAP_FENCE) != 0; +} + #endif diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c index 59d6a2dd4c2e..66cc35dc223e 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c @@ -82,6 +82,22 @@ fman_from_fence(struct vmw_fence_obj *fence) return container_of(fence->base.lock, struct vmw_fence_manager, lock); } +static u32 vmw_fence_goal_read(struct vmw_private *vmw) +{ + if ((vmw->capabilities2 & SVGA_CAP2_EXTRA_REGS) != 0) + return vmw_read(vmw, SVGA_REG_FENCE_GOAL); + else + return vmw_fifo_mem_read(vmw, SVGA_FIFO_FENCE_GOAL); +} + +static void vmw_fence_goal_write(struct vmw_private *vmw, u32 value) +{ + if ((vmw->capabilities2 & SVGA_CAP2_EXTRA_REGS) != 0) + vmw_write(vmw, SVGA_REG_FENCE_GOAL, value); + else + vmw_fifo_mem_write(vmw, SVGA_FIFO_FENCE_GOAL, value); +} + /* * Note on fencing subsystem usage of irqs: * Typically the vmw_fences_update function is called @@ -392,7 +408,7 @@ static bool vmw_fence_goal_new_locked(struct vmw_fence_manager *fman, if (likely(!fman->seqno_valid)) return false; - goal_seqno = vmw_fifo_mem_read(fman->dev_priv, SVGA_FIFO_FENCE_GOAL); + goal_seqno = vmw_fence_goal_read(fman->dev_priv); if (likely(passed_seqno - goal_seqno >= VMW_FENCE_WRAP)) return false; @@ -400,9 +416,8 @@ static bool vmw_fence_goal_new_locked(struct vmw_fence_manager *fman, list_for_each_entry(fence, &fman->fence_list, head) { if 
(!list_empty(&fence->seq_passed_actions)) { fman->seqno_valid = true; - vmw_fifo_mem_write(fman->dev_priv, - SVGA_FIFO_FENCE_GOAL, - fence->base.seqno); + vmw_fence_goal_write(fman->dev_priv, +fence->base.seqno); break; } } @@ -434,13 +449,12 @@ static bool vmw_fence_goal_check_locked(struct vmw_fence_obj *fence) if (dma_fence_is_signaled_locked(&fence->base)) return false; - goal_seqno = vmw_fifo_mem_read(fman->dev_priv, SVGA_FIFO_FENCE_GOAL); + goal_seqno = vmw_fence_goal_read(fman->dev_priv); if (likely(fman->seqno_valid && goal_seqno - fence->base.seqno < VMW_FENCE_WRAP)) return false; - vmw_fifo_mem_write(fman->dev_priv, SVGA_FIFO_FENCE_GOAL, - fence->base.seqno); + vmw_fence_goal_write(fman->dev_priv, fence->base.seqno); fman->seqno_valid = true; return true; diff --git a/drivers/gpu/drm/
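The vmw_has_fences() helper added in this patch reads as: command-buffer capable devices (the SVGAv3 path) always have fences, and legacy devices fall back to the FIFO fence capability. A standalone sketch of that logic with placeholder bit values, not the real SVGA capability defines:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative placeholders for the real SVGA capability bits. */
#define EX_CAP_COMMAND_BUFFERS (1u << 0)
#define EX_CAP_CMD_BUFFERS_2   (1u << 1)
#define EX_FIFO_CAP_FENCE      (1u << 2)

/* Fences are available via command buffers (SVGAv3) or, failing
 * that, via the legacy FIFO fence capability (SVGAv2). */
static bool has_fences(uint32_t caps, uint32_t fifo_caps)
{
	if (caps & (EX_CAP_COMMAND_BUFFERS | EX_CAP_CMD_BUFFERS_2))
		return true;
	return (fifo_caps & EX_FIFO_CAP_FENCE) != 0;
}
```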
[PATCH 2/8] drm/vmwgfx: Cleanup multimon initialization code
From: Zack Rusin The results of the legacy display unit initialization were being silently ignored. Unifying the selection of number of display units based on whether the underlying device supports multimon makes it easier to add error checking to all paths. This makes the driver report the errors in ldu initialization paths and try to recover from them. Signed-off-by: Zack Rusin Reviewed-by: Martin Krastev Reviewed-by: Maaz Mombasawala --- drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c index 643c1608ddfd..e4347faccee0 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c @@ -492,6 +492,8 @@ int vmw_kms_ldu_init_display(struct vmw_private *dev_priv) { struct drm_device *dev = &dev_priv->drm; int i, ret; + int num_display_units = (dev_priv->capabilities & SVGA_CAP_MULTIMON) ? + VMWGFX_NUM_DISPLAY_UNITS : 1; if (unlikely(dev_priv->ldu_priv)) { return -EINVAL; @@ -506,21 +508,17 @@ int vmw_kms_ldu_init_display(struct vmw_private *dev_priv) dev_priv->ldu_priv->last_num_active = 0; dev_priv->ldu_priv->fb = NULL; - /* for old hardware without multimon only enable one display */ - if (dev_priv->capabilities & SVGA_CAP_MULTIMON) - ret = drm_vblank_init(dev, VMWGFX_NUM_DISPLAY_UNITS); - else - ret = drm_vblank_init(dev, 1); + ret = drm_vblank_init(dev, num_display_units); if (ret != 0) goto err_free; vmw_kms_create_implicit_placement_property(dev_priv); - if (dev_priv->capabilities & SVGA_CAP_MULTIMON) - for (i = 0; i < VMWGFX_NUM_DISPLAY_UNITS; ++i) - vmw_ldu_init(dev_priv, i); - else - vmw_ldu_init(dev_priv, 0); + for (i = 0; i < num_display_units; ++i) { + ret = vmw_ldu_init(dev_priv, i); + if (ret != 0) + goto err_free; + } dev_priv->active_display_unit = vmw_du_legacy; -- 2.32.0
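The cleanup boils down to computing the display-unit count once and reusing it for both drm_vblank_init() and the ldu loop, so every path gets the same error checking. A sketch of that selection; the bit value and unit count are illustrative (the real SVGA_CAP_MULTIMON and VMWGFX_NUM_DISPLAY_UNITS may differ):

```c
#include <stdint.h>

/* Illustrative placeholders for SVGA_CAP_MULTIMON and
 * VMWGFX_NUM_DISPLAY_UNITS. */
#define EX_CAP_MULTIMON      (1u << 4)
#define EX_NUM_DISPLAY_UNITS 8

/* Old hardware without multimon only gets one display. */
static int num_display_units(uint32_t capabilities)
{
	return (capabilities & EX_CAP_MULTIMON) ? EX_NUM_DISPLAY_UNITS : 1;
}
```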
[PATCH 3/8] drm/vmwgfx: Print capabilities early during the initialization
From: Zack Rusin Capabilities were logged at the end of initialization, so any early errors would make them not appear in the logs, which is also when they're needed the most. Print the capabilities right after fetching them, before the init code starts using them, to make sure they always show up in the logs. Signed-off-by: Zack Rusin Reviewed-by: Martin Krastev Reviewed-by: Maaz Mombasawala --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 16 ++-- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index 621231c66fd4..f43afd56915e 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -848,12 +848,16 @@ static int vmw_driver_load(struct vmw_private *dev_priv, u32 pci_id) dev_priv->capabilities = vmw_read(dev_priv, SVGA_REG_CAPABILITIES); - + vmw_print_bitmap(&dev_priv->drm, "Capabilities", +dev_priv->capabilities, +cap1_names, ARRAY_SIZE(cap1_names)); if (dev_priv->capabilities & SVGA_CAP_CAP2_REGISTER) { dev_priv->capabilities2 = vmw_read(dev_priv, SVGA_REG_CAP2); + vmw_print_bitmap(&dev_priv->drm, "Capabilities2", +dev_priv->capabilities2, +cap2_names, ARRAY_SIZE(cap2_names)); } - ret = vmw_dma_select_mode(dev_priv); if (unlikely(ret != 0)) { drm_info(&dev_priv->drm, @@ -939,14 +943,6 @@ static int vmw_driver_load(struct vmw_private *dev_priv, u32 pci_id) "MOB limits: max mob size = %u kB, max mob pages = %u\n", dev_priv->max_mob_size / 1024, dev_priv->max_mob_pages); - vmw_print_bitmap(&dev_priv->drm, "Capabilities", -dev_priv->capabilities, -cap1_names, ARRAY_SIZE(cap1_names)); - if (dev_priv->capabilities & SVGA_CAP_CAP2_REGISTER) - vmw_print_bitmap(&dev_priv->drm, "Capabilities2", -dev_priv->capabilities2, -cap2_names, ARRAY_SIZE(cap2_names)); - ret = vmw_dma_masks(dev_priv); if (unlikely(ret != 0)) goto out_err0; -- 2.32.0
[PATCH 0/8] drm/vmwgfx: 3D on arm64 and large cursor support
From: Zack Rusin This series finishes 3D support on arm64 with vmwgfx. With this and the changes that add svga3 PCI IDs to Mesa3D, OpenGL 4.3 and GLES 3.1 work smoothly on arm64. Most changes are not svga3 specific but rather could be classified as generic improvements. That's in particular true for the support for cursor mobs, which enables large cursor support on both svga2 and svga3, and the fixed initialization of the drm_mode_fb_cmd2 struct. Martin Krastev (1): drm/vmwgfx: Add support for CursorMob and CursorBypass 4 Zack Rusin (7): drm/vmwgfx: Cleanup multimon initialization code drm/vmwgfx: Print capabilities early during the initialization drm/vmwgfx: Fix fencing on SVGAv3 drm/vmwgfx: Allow querying of the SVGA PCI id from the userspace drm/vmwgfx: Initialize drm_mode_fb_cmd2 drm/vmwgfx: Implement MSI/MSI-X support for IRQs drm/vmwgfx: Stop using surface dma commands on most configurations drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 20 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 23 +- drivers/gpu/drm/vmwgfx/vmwgfx_fb.c| 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 28 +- drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c | 27 +- drivers/gpu/drm/vmwgfx/vmwgfx_irq.c | 80 - drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 407 +- drivers/gpu/drm/vmwgfx/vmwgfx_kms.h | 28 +- drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c | 36 ++- drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c | 17 +- drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c | 27 +- include/uapi/drm/vmwgfx_drm.h | 9 +- 13 files changed, 538 insertions(+), 168 deletions(-) -- 2.32.0
Re: [PATCH V2 04/12] drm: bridge: icn6211: Add DSI lane count DT property parsing
On 3/2/22 11:01, Maxime Ripard wrote: On Thu, Feb 17, 2022 at 01:25:22AM +0100, Marek Vasut wrote: The driver currently hard-codes DSI lane count to two, however the chip is capable of operating in 1..4 DSI lanes mode. Parse 'data-lanes' DT property and program the result into DSI_CTRL register. Signed-off-by: Marek Vasut Cc: Jagan Teki Cc: Maxime Ripard Cc: Robert Foss Cc: Sam Ravnborg Cc: Thomas Zimmermann To: dri-devel@lists.freedesktop.org --- V2: Rebase on next-20220214 --- drivers/gpu/drm/bridge/chipone-icn6211.c | 21 - 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/bridge/chipone-icn6211.c b/drivers/gpu/drm/bridge/chipone-icn6211.c index 2ac8eb7e25f52..7c013a08c7b00 100644 --- a/drivers/gpu/drm/bridge/chipone-icn6211.c +++ b/drivers/gpu/drm/bridge/chipone-icn6211.c @@ -136,10 +136,12 @@ struct chipone { struct drm_bridge bridge; struct drm_display_mode mode; struct drm_bridge *panel_bridge; + struct device_node *host_node; struct gpio_desc *enable_gpio; struct regulator *vdd1; struct regulator *vdd2; struct regulator *vdd3; + int dsi_lanes; }; static inline struct chipone *bridge_to_chipone(struct drm_bridge *bridge) @@ -212,6 +214,11 @@ static void chipone_atomic_enable(struct drm_bridge *bridge, /* dsi specific sequence */ ICN6211_DSI(icn, SYNC_EVENT_DLY, 0x80); ICN6211_DSI(icn, HFP_MIN, hfp & 0xff); + + /* DSI data lane count */ + ICN6211_DSI(icn, DSI_CTRL, + DSI_CTRL_UNKNOWN | DSI_CTRL_DSI_LANES(icn->dsi_lanes - 1)); + ICN6211_DSI(icn, MIPI_PD_CK_LANE, 0xa0); ICN6211_DSI(icn, PLL_CTRL(12), 0xff); @@ -314,6 +321,7 @@ static const struct drm_bridge_funcs chipone_bridge_funcs = { static int chipone_parse_dt(struct chipone *icn) { struct device *dev = icn->dev; + struct device_node *endpoint; struct drm_panel *panel; int ret; @@ -350,6 +358,16 @@ static int chipone_parse_dt(struct chipone *icn) return PTR_ERR(icn->enable_gpio); } + endpoint = of_graph_get_endpoint_by_regs(dev->of_node, 0, 0); + icn->dsi_lanes = 
of_property_count_u32_elems(endpoint, "data-lanes"); The binding must be amended to allow for the usage of data-lanes, and you need to keep the previous value as default for older device trees Regarding the default value -- there are no in-tree users of this driver yet (per git grep in current linux-next), do we really care about backward compatibility in this case?
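The lane-count handling being discussed reduces to validating the parsed data-lanes count and programming lanes - 1 into the register field. A sketch of that mapping; returning the raw field value is an assumption here, the driver itself wraps it in its DSI_CTRL_DSI_LANES() macro:

```c
/* The ICN6211 supports 1..4 DSI lanes; the register field encodes
 * (lanes - 1). Returns -1 for an invalid data-lanes count, which a
 * driver would turn into an -EINVAL at probe/parse time. */
static int dsi_ctrl_lanes_field(int lanes)
{
	if (lanes < 1 || lanes > 4)
		return -1;
	return lanes - 1;
}
```

This also shows where a default would slot in for older device trees: if data-lanes is absent, the parse code would fall back to the previously hard-coded two lanes rather than failing.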
Re: [PATCH V2 03/12] drm: bridge: icn6211: Add HS/VS/DE polarity handling
On 3/2/22 10:59, Maxime Ripard wrote: On Thu, Feb 17, 2022 at 01:25:21AM +0100, Marek Vasut wrote: The driver currently hard-codes HS/VS polarity to active-low and DE to active-high, which is not correct for a lot of supported DPI panels. Add the missing mode flag handling for HS/VS/DE polarity. Signed-off-by: Marek Vasut Cc: Jagan Teki Cc: Maxime Ripard Cc: Robert Foss Cc: Sam Ravnborg Cc: Thomas Zimmermann To: dri-devel@lists.freedesktop.org --- V2: Rebase on next-20220214 --- drivers/gpu/drm/bridge/chipone-icn6211.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/bridge/chipone-icn6211.c b/drivers/gpu/drm/bridge/chipone-icn6211.c index e29e6a84c39a6..2ac8eb7e25f52 100644 --- a/drivers/gpu/drm/bridge/chipone-icn6211.c +++ b/drivers/gpu/drm/bridge/chipone-icn6211.c @@ -165,8 +165,16 @@ static void chipone_atomic_enable(struct drm_bridge *bridge, struct drm_bridge_state *old_bridge_state) { struct chipone *icn = bridge_to_chipone(bridge); + struct drm_atomic_state *state = old_bridge_state->base.state; struct drm_display_mode *mode = &icn->mode; + const struct drm_bridge_state *bridge_state; u16 hfp, hbp, hsync; + u32 bus_flags; + u8 pol; + + /* Get the DPI flags from the bridge state. */ + bridge_state = drm_atomic_get_new_bridge_state(state, bridge); + bus_flags = bridge_state->output_bus_cfg.flags; ICN6211_DSI(icn, MIPI_CFG_PW, MIPI_CFG_PW_CONFIG_DSI); @@ -206,7 +214,13 @@ static void chipone_atomic_enable(struct drm_bridge *bridge, ICN6211_DSI(icn, HFP_MIN, hfp & 0xff); ICN6211_DSI(icn, MIPI_PD_CK_LANE, 0xa0); ICN6211_DSI(icn, PLL_CTRL(12), 0xff); - ICN6211_DSI(icn, BIST_POL, BIST_POL_DE_POL); + + /* DPI HS/VS/DE polarity */ + pol = ((mode->flags & DRM_MODE_FLAG_PHSYNC) ? BIST_POL_HSYNC_POL : 0) | + ((mode->flags & DRM_MODE_FLAG_PVSYNC) ? BIST_POL_VSYNC_POL : 0) | + ((bus_flags & DRM_BUS_FLAG_DE_HIGH) ? BIST_POL_DE_POL : 0); Is there a reason you didn't use bus_flags for all the polarities there? 
Because there is no separate HS/VS bus flag, that's why HS/VS is pulled from mode flags.
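The split Marek describes can be sketched as a pure function: HS/VS polarity comes from the mode flags (there are no dedicated HS/VS bus flags), while DE polarity comes from the bridge-state bus flags. The bit values below are placeholders, not the real BIST_POL or DRM flag definitions:

```c
#include <stdint.h>

/* Illustrative placeholders for DRM_MODE_FLAG_P{H,V}SYNC,
 * DRM_BUS_FLAG_DE_HIGH and the BIST_POL_* register bits. */
#define EX_FLAG_PHSYNC (1u << 0)
#define EX_FLAG_PVSYNC (1u << 1)
#define EX_BUS_DE_HIGH (1u << 2)

#define EX_POL_HSYNC (1u << 0)
#define EX_POL_VSYNC (1u << 1)
#define EX_POL_DE    (1u << 2)

/* HS/VS polarity from mode flags, DE polarity from bus flags. */
static uint8_t bist_pol(uint32_t mode_flags, uint32_t bus_flags)
{
	return ((mode_flags & EX_FLAG_PHSYNC) ? EX_POL_HSYNC : 0) |
	       ((mode_flags & EX_FLAG_PVSYNC) ? EX_POL_VSYNC : 0) |
	       ((bus_flags & EX_BUS_DE_HIGH) ? EX_POL_DE : 0);
}
```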
Re: [PATCH v16 4/4] drm/bridge: dw-hdmi: fix bus formats negotiation for 8 bit modes
Hi, On 02/03/2022 12:15, H. Nikolaus Schaller wrote: Hi Neil, Am 02.03.2022 um 11:25 schrieb Neil Armstrong : I added a printk for hdmi->sink_is_hdmi. This returns 1. Which IMHO is to be expected since I am using an HDMI connector and panel... So your patch will still add the UYVY formats. Either the synopsys module inside the jz4780 or the panel does not understand them. By selecting the UYVY formats, the driver will enable the colorspace converters in the dw-hdmi IP, I don't see why it doesn't work here... There is a bit called `Support Color Space Converter` in config0_id: bit | Name| R/W | Desc 2 | csc | R | Indicates if Color Space Conversion block is present Could you dump all the config0 bits: ===><= diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c index 54d8fdad395f..547731482da8 100644 --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c @@ -3431,6 +3431,7 @@ struct dw_hdmi *dw_hdmi_probe(struct platform_device *pdev, pdevinfo.id = PLATFORM_DEVID_AUTO; config0 = hdmi_readb(hdmi, HDMI_CONFIG0_ID); + dev_info(dev, "config0: %x\n", config0); config3 = hdmi_readb(hdmi, HDMI_CONFIG3_ID); if (iores && config3 & HDMI_CONFIG3_AHBAUDDMA) { ===><= If this bit is missing, this would explain the black screen. [9.291011] dw-hdmi-ingenic 1018.hdmi: config0: bf Hm. Or is the color-space conversion of the dw-hdmi module inside the jz4780 broken or not configured properly? (cross-checked: RGB mode still works if I force hdmi->sink_is_hdmi = false) I don't understand what's wrong, can you try to make the logic select MEDIA_BUS_FMT_YUV8_1X24 instead of DRM_COLOR_FORMAT_YCBCR422 ? If your CSC is broken, we'll need to disable it on your platform. Thanks, Neil BR and thanks, Nikolaus
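For reference, the config0 decode the thread hinges on: bit 2 is the csc presence bit quoted above, and the reported value 0xbf has it set, so the hardware does advertise a CSC block. A trivial sketch of that check:

```c
#include <stdbool.h>
#include <stdint.h>

/* config0_id bit 2 ("csc") indicates whether the Color Space
 * Conversion block is present, per the register table quoted in
 * the thread. */
static bool csc_present(uint8_t config0)
{
	return (config0 >> 2) & 1;
}
```

With the dumped value: csc_present(0xbf) is true, which is why the discussion moves on to the CSC being present but possibly misconfigured rather than missing.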
[PATCH] video: fbdev: sm712fb: Fix crash in smtcfb_write()
When the sm712fb driver writes three bytes to the framebuffer, the driver will crash: BUG: unable to handle page fault for address: c90001ff RIP: 0010:smtcfb_write+0x454/0x5b0 Call Trace: vfs_write+0x291/0xd60 ? do_sys_openat2+0x27d/0x350 ? __fget_light+0x54/0x340 ksys_write+0xce/0x190 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fix it by removing the open-coded endianness fixup-code. Signed-off-by: Zheyu Ma --- drivers/video/fbdev/sm712fb.c | 21 - 1 file changed, 4 insertions(+), 17 deletions(-) diff --git a/drivers/video/fbdev/sm712fb.c b/drivers/video/fbdev/sm712fb.c index 0dbc6bf8268a..e355089ac7d6 100644 --- a/drivers/video/fbdev/sm712fb.c +++ b/drivers/video/fbdev/sm712fb.c @@ -1130,7 +1130,7 @@ static ssize_t smtcfb_write(struct fb_info *info, const char __user *buf, count = total_size - p; } - buffer = kmalloc((count > PAGE_SIZE) ? PAGE_SIZE : count, GFP_KERNEL); + buffer = kmalloc(PAGE_SIZE, GFP_KERNEL); if (!buffer) return -ENOMEM; @@ -1148,24 +1148,11 @@ static ssize_t smtcfb_write(struct fb_info *info, const char __user *buf, break; } - for (i = c >> 2; i--;) { - fb_writel(big_swap(*src), dst++); + for (i = (c + 3) >> 2; i--;) { + fb_writel(big_swap(*src), dst); + dst++; src++; } - if (c & 3) { - u8 *src8 = (u8 *)src; - u8 __iomem *dst8 = (u8 __iomem *)dst; - - for (i = c & 3; i--;) { - if (i & 1) { - fb_writeb(*src8++, ++dst8); - } else { - fb_writeb(*src8++, --dst8); - dst8 += 2; - } - } - dst = (u32 __iomem *)dst8; - } *ppos += c; buf += c; -- 2.25.1
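The fix replaces the byte-granular tail handling with whole-word writes: the number of 32-bit framebuffer writes is rounded up with (c + 3) >> 2, and each word goes through big_swap() before being written. A userspace sketch of both pieces; the full 32-bit byte swap shown here is illustrative, and the exact byte ordering of the driver's big_swap() may differ:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative 32-bit byte swap standing in for the driver's
 * big_swap() helper. */
static uint32_t big_swap(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8)  |
	       ((v & 0x00ff0000u) >> 8)  |
	       ((v & 0xff000000u) >> 24);
}

/* Number of 32-bit writes needed to cover c bytes, rounded up —
 * this is the (c + 3) >> 2 from the patch, which removes the
 * crashing per-byte tail loop. */
static size_t words_for_bytes(size_t c)
{
	return (c + 3) >> 2;
}
```

Writing 3 bytes thus becomes a single word write instead of the open-coded byte loop that walked `dst8` past the mapping.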
Re: [RFC PATCH] drm/panel: simple: panel-dpi: use bus-format to set bpc and bus_format
Hi,

Please try to avoid top posting.

On Wed, Feb 23, 2022 at 04:25:19PM +0100, Max Krummenacher wrote:
> The goal here is to set the element bus_format in the struct
> panel_desc. This is an enum with the possible values defined in
> include/uapi/linux/media-bus-format.h.
>
> The enum values are not constructed in a way that you could calculate
> the value from color channel width/shift/mapping/whatever. You rather
> would have to check if the combination of color channel
> width/shift/mapping/whatever maps to an existing value and otherwise
> EINVAL out.
>
> I don't see the value in having yet another way of how this
> information can be specified and then having to write a more
> complicated parser which maps the dt data to bus_format.

Generally speaking, sending an RFC without explicitly stating what you want a
comment on isn't very efficient.

That being said, what I (and I can only assume Marek) don't like is the
string encoding, especially when the similar bus-type property uses an
integer with the various available bus options we have. Having an integer,
with a set of defines that you would map to the proper MEDIA_BUS_* values,
would be more efficient and more elegant.

That being said, the first question that needs to be answered is: why does
this have to be in the DT in the first place?

Maxime
[Bug 215648] amdgpu: Changing monitor configuration (plug/unplug/wake from DPMS) causes kernel panic
https://bugzilla.kernel.org/show_bug.cgi?id=215648 --- Comment #5 from Philipp Riederer (pr_ker...@tum.fail) --- Hey, Thank you for working on this! I added the dmesg as attachment to the bug. Please note that this is from the working kernel (commit 94ba5b0fb52d6dbf1f200351876a839afb74aedd) as that is the one I have running now. If it helps, I can also provide the messages from a non-working kernel later. Cheers, Philipp -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
[Bug 215648] amdgpu: Changing monitor configuration (plug/unplug/wake from DPMS) causes kernel panic
https://bugzilla.kernel.org/show_bug.cgi?id=215648 --- Comment #4 from Philipp Riederer (pr_ker...@tum.fail) --- Created attachment 300517 --> https://bugzilla.kernel.org/attachment.cgi?id=300517&action=edit dmesg from boot using kernel 5.15.12@94ba5b0fb52d6dbf1f200351876a839afb74aedd
[Bug 215648] amdgpu: Changing monitor configuration (plug/unplug/wake from DPMS) causes kernel panic
https://bugzilla.kernel.org/show_bug.cgi?id=215648 --- Comment #3 from Alex Deucher (alexdeuc...@gmail.com) --- Thanks. Can you get the dmesg output from boot prior to the hang?
RE: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
From: Xiaomeng Tong
> Sent: 02 March 2022 09:31
>
> On Mon, 28 Feb 2022 16:41:04 -0800, Linus Torvalds wrote:
> >
> > But basically to _me_, the important part is that the end result is
> > maintainable longer-term.
>
> I couldn't agree more. And because of that, I stick with the following
> approach because it's maintainable longer-term than the "type(pos) pos" one:
> implement a new macro for each list_for_each_entry* with an _inside suffix:
> #define list_for_each_entry_inside(pos, type, head, member)

I think that it would be better to make any alternate loop macro just set the
variable to NULL on the loop exit. That is easier to code for, and the
compiler might be persuaded to not redo the test. It also doesn't need an
extra variable defined in the for() statement, so it can be back-ported to
older kernels without requiring declarations in the middle of blocks.

OTOH there may be alternative definitions that can be used to get the
compiler (or other compiler-like tools) to detect broken code, even if the
definition can't possibly generate a working kernel.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
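David's "NULL on loop exit" idea can be modelled outside the kernel like this. Note this is a simplified singly-linked sketch, not the kernel's circular list.h machinery: the real list_for_each_entry() leaves pos pointing at a bogus head-derived entry when the loop completes, whereas the variant proposed here leaves it NULL, so a post-loop test needs no extra flag variable (macro and function names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int val;
	struct node *next;
};

/* Iterator that leaves pos NULL when the loop runs to completion, so
 * "if (!pos)" after the loop cleanly distinguishes found / not found. */
#define for_each_node_or_null(pos, head) \
	for ((pos) = (head); (pos) != NULL; (pos) = (pos)->next)

static struct node *find_val(struct node *head, int val)
{
	struct node *pos;

	for_each_node_or_null(pos, head)
		if (pos->val == val)
			break;

	return pos; /* NULL when the value was not found, never a stale pointer */
}

/* Tiny fixed list 1 -> 2 -> 3 for demonstration. */
static struct node demo[3] = {
	{ 1, &demo[1] }, { 2, &demo[2] }, { 3, NULL },
};
```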
Re: [PATCH 1/2] drm: Introduce DRM_BRIDGE_OP_UPSTREAM_FIRST to alter bridge init order
Hi Andrzej

On Mon, 28 Feb 2022 at 15:36, Andrzej Hajda wrote:
>
> On 22.02.2022 09:43, Dave Stevenson wrote:
> > Hi Laurent.
> >
> > Thanks for the review.
> >
> > On Tue, 22 Feb 2022 at 06:34, Laurent Pinchart wrote:
> >> Hi Dave,
> >>
> >> Thank you for the patch.
> >>
> >> On Wed, Feb 16, 2022 at 04:59:43PM +, Dave Stevenson wrote:
> >>> DSI sink devices typically want the DSI host powered up and configured
> >>> before they are powered up. pre_enable is the place this would normally
> >>> happen, but they are called in reverse order from panel/connector towards
> >>> the encoder, which is the "wrong" order.
> >>>
> >>> Add a new flag DRM_BRIDGE_OP_UPSTREAM_FIRST that any bridge can set
> >>> to swap the order of pre_enable (and post_disable) so that any upstream
> >>> bridges are called first to create the desired state.
> >>>
> >>> eg:
> >>> - Panel
> >>> - Bridge 1
> >>> - Bridge 2 DRM_BRIDGE_OP_UPSTREAM_FIRST
> >>> - Bridge 3
> >>> - Encoder
> >>> Would result in pre_enable's being called as Panel, Bridge 1, Bridge 3,
> >>> Bridge 2.
> >>
> >> If there was a Bridge 4 between Bridge 3 and Encoder, would it be
> >>
> >> Panel, Bridge 1, Bridge 3, Bridge 4, Bridge 2
> >>
> >> ? I'd capture that here, to be explicit.
> >
> > No.
> > - Panel
> > - Bridge 1
> > - Bridge 2 DRM_BRIDGE_OP_UPSTREAM_FIRST
> > - Bridge 3
> > - Bridge 4
> > - Encoder
> > Would result in pre_enable's being called as Panel, Bridge 1,
> > Bridge 3, Bridge 2, Bridge 4, Encoder.
> > ie it only swaps the order of bridges 2 & 3.
> >
> > - Panel
> > - Bridge 1
> > - Bridge 2 DRM_BRIDGE_OP_UPSTREAM_FIRST
> > - Bridge 3 DRM_BRIDGE_OP_UPSTREAM_FIRST
> > - Bridge 4
> > - Encoder
> > Would result in pre_enable's being called as Panel, Bridge 1,
> > Bridge 4, Bridge 3, Bridge 2, Encoder.
> > (Bridge 2&3 have asked for upstream to be enabled first, which means
> > bridge 4. Bridge 2 wants upstream enabled first, which means bridge 3).
> >
> > - Panel
> > - Bridge 1
> > - Bridge 2 DRM_BRIDGE_OP_UPSTREAM_FIRST
> > - Bridge 3
> > - Bridge 4 DRM_BRIDGE_OP_UPSTREAM_FIRST
> > - Bridge 5
> > - Encoder
> > Would result in Panel, Bridge 1, Bridge 3, Bridge 2, Bridge 5,
> > Bridge 4, Encoder.
> >
> > So we only reverse the order whilst the bridges request that they want
> > upstream enabled first, but we can do that multiple times within the
> > chain. I hope that makes sense.
> >
> >>> Signed-off-by: Dave Stevenson
> >>> ---
> >>>  drivers/gpu/drm/drm_bridge.c | 197 +--
> >>>  include/drm/drm_bridge.h     |   8 ++
> >>>  2 files changed, 180 insertions(+), 25 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
> >>> index c96847fc0ebc..7c24e8340efa 100644
> >>> --- a/drivers/gpu/drm/drm_bridge.c
> >>> +++ b/drivers/gpu/drm/drm_bridge.c
> >>> @@ -522,21 +522,58 @@ EXPORT_SYMBOL(drm_bridge_chain_disable);
> >>>   * Calls &drm_bridge_funcs.post_disable op for all the bridges in the
> >>>   * encoder chain, starting from the first bridge to the last. These are called
> >>>   * after completing the encoder's prepare op.
> >>
> >> Missing blank line, as well as in three locations below.
> >>
> >>> + * If a bridge sets the DRM_BRIDGE_OP_UPSTREAM_FIRST, then the post_disable for
> >>> + * that bridge will be called before the previous one to reverse the pre_enable
> >>> + * calling direction.
> >>>   *
> >>>   * Note: the bridge passed should be the one closest to the encoder
> >>>   */
> >>>  void drm_bridge_chain_post_disable(struct drm_bridge *bridge)
> >>>  {
> >>>  	struct drm_encoder *encoder;
> >>> +	struct drm_bridge *next, *limit;
> >>>
> >>>  	if (!bridge)
> >>>  		return;
> >>>
> >>>  	encoder = bridge->encoder;
> >>>  	list_for_each_entry_from(bridge, &encoder->bridge_chain, chain_node) {
> >>> +		limit = NULL;
> >>> +
> >>> +		if (!list_is_last(&bridge->chain_node, &encoder->bridge_chain)) {
> >>> +			next = list_next_entry(bridge, chain_node);
> >>> +
> >>> +			if (next->ops & DRM_BRIDGE_OP_UPSTREAM_FIRST) {
> >>> +				limit = next;
> >>> +
> >>> +				list_for_each_entry_from(next, &encoder->bridge_chain,
> >>> +							 chain_node) {
> >>> +					if (!(next->ops &
> >>> +					      DRM_BRIDGE_OP_UPSTREAM_FIRST)) {
> >>> +						next = list_prev_entry(next, chain_node);
> >>> +						limit = next;
> >>> +						break;
> >>> +					}
> >>> +				}
> >>> +
> >>> +				list_for_each_entry_from_reverse(next, >
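Dave's worked examples above follow one rule: walking the chain from panel towards encoder, each maximal run of bridges flagged DRM_BRIDGE_OP_UPSTREAM_FIRST is reversed together with the next unflagged element, so a flagged bridge sees its upstream neighbour enabled first. That rule can be modelled with a plain-array sketch (this is for illustration only, not the list.h-based implementation in the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CHAIN 8

/* Compute the pre_enable call order: each maximal run of
 * UPSTREAM_FIRST bridges is emitted reversed, together with the next
 * non-flagged element, so flagged bridges come after their upstream
 * neighbour. */
static void pre_enable_order(const char *chain[], const bool upstream_first[],
			     int n, const char *out[])
{
	int i = 0, k = 0;

	while (i < n) {
		if (!upstream_first[i]) {
			out[k++] = chain[i++];
			continue;
		}
		/* Find end of the flagged run, plus one follower if present. */
		int j = i;
		while (j < n && upstream_first[j])
			j++;
		int end = (j < n) ? j : j - 1;
		for (int m = end; m >= i; m--)
			out[k++] = chain[m];
		i = end + 1;
	}
}

/* Check one 6-element scenario from the thread against its expected order. */
static bool order_matches(const bool flags[], const char *expect[])
{
	const char *chain[MAX_CHAIN] = { "Panel", "Bridge 1", "Bridge 2",
					 "Bridge 3", "Bridge 4", "Encoder" };
	const char *out[MAX_CHAIN];

	pre_enable_order(chain, flags, 6, out);
	for (int i = 0; i < 6; i++)
		if (strcmp(out[i], expect[i]) != 0)
			return false;
	return true;
}

/* Bridge 2 flagged -> Panel, Bridge 1, Bridge 3, Bridge 2, Bridge 4, Encoder */
static bool example_one(void)
{
	const bool flags[] = { false, false, true, false, false, false };
	const char *expect[] = { "Panel", "Bridge 1", "Bridge 3",
				 "Bridge 2", "Bridge 4", "Encoder" };
	return order_matches(flags, expect);
}

/* Bridges 2 and 3 flagged -> Panel, Bridge 1, Bridge 4, Bridge 3, Bridge 2, Encoder */
static bool example_two(void)
{
	const bool flags[] = { false, false, true, true, false, false };
	const char *expect[] = { "Panel", "Bridge 1", "Bridge 4",
				 "Bridge 3", "Bridge 2", "Encoder" };
	return order_matches(flags, expect);
}
```

Both of Dave's six-element examples reproduce under this model, which suggests the "reverse each flagged run plus its follower" reading of the patch is the intended semantics.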