Re: drm/bridge: anx7625: Set downstream sink into normal status

2022-03-02 Thread Pin-yen Lin
Reviewed-by: Pin-Yen Lin 


On Wed, Mar 2, 2022 at 8:09 PM Xin Ji  wrote:
>
> As the downstream sink is put into standby mode when the bridge is
> disabled, this patch sets the downstream sink back into normal status
> when the bridge is enabled.
>
> Signed-off-by: Xin Ji 
> ---
>  drivers/gpu/drm/bridge/analogix/anx7625.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.c 
> b/drivers/gpu/drm/bridge/analogix/anx7625.c
> index 9aab879a8851..963eaf73ecab 100644
> --- a/drivers/gpu/drm/bridge/analogix/anx7625.c
> +++ b/drivers/gpu/drm/bridge/analogix/anx7625.c
> @@ -919,12 +919,20 @@ static void anx7625_dp_start(struct anx7625_data *ctx)
>  {
> int ret;
> struct device *dev = &ctx->client->dev;
> +   u8 data;
>
> if (!ctx->display_timing_valid) {
> DRM_DEV_ERROR(dev, "mipi not set display timing yet.\n");
> return;
> }
>
> +   dev_info(dev, "set downstream sink into normal\n");
> +   /* Downstream sink enter into normal mode */
> +   data = 1;
> +   ret = anx7625_aux_trans(ctx, DP_AUX_NATIVE_WRITE, 0x000600, 1, &data);
> +   if (ret < 0)
> +   dev_err(dev, "IO error : set sink into normal mode fail\n");
> +
> /* Disable HDCP */
> anx7625_write_and(ctx, ctx->i2c.rx_p1_client, 0xee, 0x9f);
>
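
For reference, the raw numbers in the hunk above map onto standard DisplayPort
DPCD definitions: offset 0x600 is the DP_SET_POWER register and the value 1 is
DP_SET_POWER_D0 (sink fully powered). A purely illustrative rewrite using the
drm_dp_helper constants (the header path varies between kernel versions) could
look like:

	/* Illustrative only -- not part of the submitted patch. */
	data = DP_SET_POWER_D0;			/* 0x1: wake the sink from standby */
	ret = anx7625_aux_trans(ctx, DP_AUX_NATIVE_WRITE,
				DP_SET_POWER /* DPCD 0x600 */, 1, &data);
	if (ret < 0)
		dev_err(dev, "failed to set sink to D0: %d\n", ret);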


[PULL] drm-intel-gt-next

2022-03-02 Thread Joonas Lahtinen
Hi Dave & Daniel,

Here is the last feature PR for v5.18.

For new platforms we have got more DG2 enabling: small BAR foundations,
64K page support and accelerated migration. For XeHP SDV we've got flat
CCS detection and compute command streamer being added.

Disabling i915 build on PREEMPT_RT for now due to deadlocks and
warnings. Fixes to GuC data structure accesses on ARM platforms.
A couple of other GuC init and SLPC fixes.

Then the usual bits of cleanup and smaller fixes.

Regards, Joonas

***

drm-intel-gt-next-2022-03-03:

Cross-subsystem Changes:

- drm-next backmerge for buddy allocator changes

Driver Changes:

- Skip i915_perf init for DG2 as it is not yet enabled (Ram)
- Add missing workarounds for DG2 (Clint)
- Add 64K page/align support for platforms like DG2 that require it (Matt A, 
Ram, Bob)
- Add accelerated migration support for DG2 (Matt A)
- Add flat CCS support for XeHP SDV (Abdiel, Ram)
- Add Compute Command Streamer (CCS) engine support for XeHP SDV (Michel,
  Daniele, Aravind, Matt R)
- Don't support parallel submission on compute / render (Matt B, Matt R)

- Disable i915 build on PREEMPT_RT until RT behaviour fixed (Sebastian)
- Remove RPS interrupt support for TGL+ (Jose)
- Fix S/R with PM_EARLY for non-GTT mappable objects on DG2 (Matt, Lucas)
- Skip stolen memory init if it is fully reserved (Jose)
- Use iosys_map for GuC data structures that may be in LMEM BAR or SMEM (Lucas)
- Do not complain about stale GuC reset notifications for banned contexts (John)

- Move context descriptor fields to intel_lrc.h (Matt R)
- Start adding support for small BAR (Matt A)
- Clarify vma lifetime (Thomas)
- Simplify subplatform detection on TGL (Jose)
- Correct the param count for unset GuC SLPC param (Vinay, Umesh)
- Read RP_STATE_CAP correctly on Gen12 with GuC SLPC (Vinay)
- Initialize GuC submission locks and queues early (Daniele)
- Fix GuC flag query helper function to not modify state (John)

- Drop fake lmem support now we have real hardware available (Lucas)
- Move misplaced W/A to their correct locations (Srinivasan)
- Use get_reset_domain() helper (Tejas)
- Move context descriptor fields to intel_lrc.h (Matt R)
- Selftest improvements (Matt A)

The following changes since commit 54f43c17d681f6d9523fcfaeefc9df77993802e1:

  Merge tag 'drm-misc-next-2022-02-23' of 
git://anongit.freedesktop.org/drm/drm-misc into drm-next (2022-02-25 05:50:18 
+1000)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm-intel tags/drm-intel-gt-next-2022-03-03

for you to fetch changes up to b2006061ae28fe7e84af6c9757ee89c4e505e92b:

  drm/i915/xehpsdv: Move render/compute engine reset domains related 
workarounds (2022-03-02 06:52:42 -0800)


Cross-subsystem Changes:

- drm-next backmerge for buddy allocator changes

Driver Changes:

- Skip i915_perf init for DG2 as it is not yet enabled (Ram)
- Add missing workarounds for DG2 (Clint)
- Add 64K page/align support for platforms like DG2 that require it (Matt A, 
Ram, Bob)
- Add accelerated migration support for DG2 (Matt A)
- Add flat CCS support for XeHP SDV (Abdiel, Ram)
- Add Compute Command Streamer (CCS) engine support for XeHP SDV (Michel,
  Daniele, Aravind, Matt R)
- Don't support parallel submission on compute / render (Matt B, Matt R)

- Disable i915 build on PREEMPT_RT until RT behaviour fixed (Sebastian)
- Remove RPS interrupt support for TGL+ (Jose)
- Fix S/R with PM_EARLY for non-GTT mappable objects on DG2 (Matt, Lucas)
- Skip stolen memory init if it is fully reserved (Jose)
- Use iosys_map for GuC data structures that may be in LMEM BAR or SMEM (Lucas)
- Do not complain about stale GuC reset notifications for banned contexts (John)

- Move context descriptor fields to intel_lrc.h
- Start adding support for small BAR (Matt A)
- Clarify vma lifetime (Thomas)
- Simplify subplatform detection on TGL (Jose)
- Correct the param count for unset GuC SLPC param (Vinay, Umesh)
- Read RP_STATE_CAP correctly on Gen12 with GuC SLPC (Vinay)
- Initialize GuC submission locks and queues early (Daniele)
- Fix GuC flag query helper function to not modify state (John)

- Drop fake lmem support now we have real hardware available (Lucas)
- Move misplaced W/A to their correct locations (Srinivasan)
- Use get_reset_domain() helper (Tejas)
- Move context descriptor fields to intel_lrc.h (Matt R)
- Selftest improvements (Matt A)


Abdiel Janulgue (1):
  drm/i915/lmem: Enable lmem for platforms with Flat CCS

CQ Tang (1):
  drm/i915/xehpsdv: Add has_flat_ccs to device info

Clint Taylor (1):
  drm/i915/dg2: add Wa_14014947963

Daniele Ceraolo Spurio (4):
  drm/i915/guc: Initialize GuC submission locks and queues early
  drm/i915/xehp: compute engine pipe_control
  drm/i915/xehp/guc: enable compute engine inside GuC
  drm/i915/xehp: handle fused off CCS engines

John Harrison (2):

Re: [PATCH v2 1/4] drm/i915/gt: Clear compress metadata for Xe_HP platforms

2022-03-02 Thread Hellstrom, Thomas
On Wed, 2022-03-02 at 03:23 +0530, Ramalingam C wrote:
> From: Ayaz A Siddiqui 
> 
> Xe-HP and latest devices support Flat CCS which reserved a portion of
> the device memory to store compression metadata, during the clearing
> of
> device memory buffer object we also need to clear the associated
> CCS buffer.
> 
> Flat CCS memory can not be directly accessed by S/W.
> Address of CCS buffer associated main BO is automatically calculated
> by device itself. KMD/UMD can only access this buffer indirectly
> using
> XY_CTRL_SURF_COPY_BLT cmd via the address of device memory buffer.
> 
> v2: Fixed issues with platform naming [Lucas]
> v3: Rebased [Ram]
>     Used the round_up funcs [Bob]
> v4: Fixed ccs blk calculation [Ram]
>     Added Kdoc on flat-ccs.
> v5: GENMASK is used [Matt]
>     mocs fix [Matt]
>     Comments Fix [Matt]
>     Flush address programming [Ram]
> v6: FLUSH_DW is fixed
>     Few coding style fix
> 
> Signed-off-by: Ayaz A Siddiqui 
> Signed-off-by: Ramalingam C 
> ---
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  15 ++
>  drivers/gpu/drm/i915/gt/intel_migrate.c  | 143
> ++-
>  2 files changed, 154 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index f8253012d166..237c1baccc64 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -203,6 +203,21 @@
>  #define GFX_OP_DRAWRECT_INFO       ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
>  #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
>  
> +#define XY_CTRL_SURF_INSTR_SIZE    5
> +#define MI_FLUSH_DW_SIZE           3
> +#define XY_CTRL_SURF_COPY_BLT      ((2 << 29) | (0x48 << 22) | 3)
> +#define   SRC_ACCESS_TYPE_SHIFT    21
> +#define   DST_ACCESS_TYPE_SHIFT    20
> +#define   CCS_SIZE_MASK            GENMASK(17, 8)
> +#define   XY_CTRL_SURF_MOCS_MASK   GENMASK(31, 25)
> +#define   NUM_CCS_BYTES_PER_BLOCK  256
> +#define   NUM_BYTES_PER_CCS_BYTE   256
> +#define   NUM_CCS_BLKS_PER_XFER    1024
> +#define   INDIRECT_ACCESS          0
> +#define   DIRECT_ACCESS            1
> +#define  MI_FLUSH_LLC              BIT(9)
> +#define  MI_FLUSH_CCS              BIT(16)
> +
>  #define COLOR_BLT_CMD              (2 << 29 | 0x40 << 22 | (5 - 2))
>  #define XY_COLOR_BLT_CMD           (2 << 29 | 0x50 << 22)
>  #define SRC_COPY_BLT_CMD           (2 << 29 | 0x43 << 22)
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c
> b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 20444d6ceb3c..330fcdc3e0cf 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -16,6 +16,8 @@ struct insert_pte_data {
>  };
>  
>  #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
> +#define GET_CCS_BYTES(i915, size)  (HAS_FLAT_CCS(i915) ? \
> +				    DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0)
>  
>  static bool engine_supports_migration(struct intel_engine_cs
> *engine)
>  {
> @@ -467,6 +469,110 @@ static bool wa_1209644611_applies(int ver, u32 size)
> return height % 4 == 3 && height <= 8;
>  }
>  
> +/**
> + * DOC: Flat-CCS - Memory compression for Local memory
> + *
> + * On Xe-HP and later devices, we use dedicated compression control state (CCS)
> + * stored in local memory for each surface, to support the 3D and media
> + * compression formats.
> + *
> + * The memory required for the CCS of the entire local memory is 1/256 of the
> + * local memory size. So before the kernel boot, the required memory is reserved
> + * for the CCS data and a secure register will be programmed with the CCS base
> + * address.
> + *
> + * Flat CCS data needs to be cleared when a lmem object is allocated.
> + * And CCS data can be copied in and out of CCS region through
> + * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly.
> + *
> + * When we exhaust the lmem, if the object's placements support smem, then we can
> + * directly decompress the compressed lmem object into smem and start using it
> + * from smem itself.
> + *
> + * But when we need to swapout the compressed lmem object into a smem region
> + * though objects' placement doesn't support smem, then we copy the lmem content
> + * as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT).
> + * When the object is referred, lmem content will be swapped in along with
> + * restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding
> + * location.
> + */
> +
> +static inline u32 *i915_flush_dw(u32 *cmd, u32 flags)
> +{
> +   *cmd++ = MI_FLUSH_DW | flags;
> +   *cmd++ = 0;
> +   *cmd++ = 0;
> +
> +   return cmd;
> +}
> +
> +static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
> +{
> +   u32 num_cmds, num_blks, total_s

Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-02 Thread Jakob Koschel



> On 3. Mar 2022, at 05:58, David Laight  wrote:
> 
> From: Xiaomeng Tong
>> Sent: 03 March 2022 02:27
>> 
>> On Wed, 2 Mar 2022 14:04:06 +, David Laight
>>  wrote:
>>> I think that it would be better to make any alternate loop macro
>>> just set the variable to NULL on the loop exit.
>>> That is easier to code for and the compiler might be persuaded to
>>> not redo the test.
>> 
>> No, that would lead to a NULL dereference.
> 
> Why, it would make it be the same as the 'easy to use':
>   for (item = head; item; item = item->next) {
>   ...
>   if (...)
>   break;
>   ...
>   }
>   if (!item)
>   return;
> 
>> The problem is the mis-use of the iterator outside the loop on exit,
>> where the iterator will be the HEAD's container_of pointer, which points
>> to a type-confused struct. Sidenote: the *mis-use* here refers to
>> mistakenly accessing other members of the struct, instead of the
>> list_head member which actually is the valid HEAD.
> 
> The problem is that the HEAD's container_of pointer should never
> be calculated at all.
> This is what is fundamentally broken about the current definition.
> 
>> IOW, you would dereference a (NULL + offset_of_member) address here.
> 
> Where?
> 
>> Please remind me if i missed something, thanks.
>> 
>> Can you share your "alternative definitions" details? thanks!
> 
> The loop should probably use an extra variable that points
> to the 'list node' in the next structure.
> Something like:
>   for (xxx *iter = head->next;
>        iter == &head ? ((item = NULL), 0) : ((item = list_item(iter), 1));
>        iter = item->member->next) {
>           ...
> With a bit of casting you can use 'item' to hold 'iter'.

I think this would make sense, it would mean you only assign the containing
element on valid elements.

I was thinking something along the lines of:

#define list_for_each_entry(pos, head, member)				\
	for (struct list_head *list = head->next, typeof(pos) pos;	\
	     list == head ? 0 : ((pos = list_entry(pos, list, member), 1)); \
	     list = list->next)

Although the initialization block of the for loop is not valid C, I'm
not sure there is any way to declare two variables of a different type
in the initialization part of the loop.

I believe all this does is get rid of the &pos->member == (head) check
to terminate the list.
It alone will not fix any of the other issues that using the iterator
variable after the loop currently has.
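
For illustration, an open-coded version of the same idea (the struct, member
and match() names here are made up for the example; this is not a proposed
kernel macro):

	/* Illustrative only: the containing element is only ever assigned
	 * for valid list entries, never for the bare head. */
	struct foo *find_item(struct list_head *head)
	{
		struct foo *item = NULL;
		struct list_head *iter;

		for (iter = head->next; iter != head; iter = iter->next) {
			struct foo *pos = list_entry(iter, struct foo, list);

			if (match(pos)) {
				item = pos;	/* only set to a valid element */
				break;
			}
		}

		return item;	/* NULL when nothing matched; no bogus head pointer */
	}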


AFAIK Adrián Moreno is working on doing something along those lines
for the list iterator in openvswitch (that was similar to the kernel
one before) [1].

I *think* they don't declare 'pos' within the loop which we *do want*
to avoid any uses of it after the loop.
(If pos is not declared in the initialization block, shadowing the
*outer* pos, it would just point to the last element of the list or stay
uninitialized if the list is empty).


[1] https://www.mail-archive.com/ovs-dev@openvswitch.org/msg63497.html


> 
>> 
>>> OTOH there may be alternative definitions that can be used to get
>>> the compiler (or other compiler-like tools) to detect broken code.
>>> Even if the definition can't possibly generate a working kernel.
>> 
>> The "list_for_each_entry_inside(pos, type, head, member)" way makes
>> the iterator invisiable outside the loop, and would be catched by
>> compiler if use-after-loop things happened.
> 
> It is also a complete PITA for anything doing a search.
> 
>   David
> 
> 

- Jakob

Re: linux-next: build failure after merge of the kspp tree

2022-03-02 Thread Stephen Rothwell
Hi all,

ok, today I used the current kspp tree and added three patches as merge fixups:

https://lore.kernel.org/all/20220228081421.1504213-1-hsi...@chromium.org/
https://cgit.freedesktop.org/drm/drm-misc/commit/?id=d4da1f27396fb1dde079447a3612f4f512caed07
https://cgit.freedesktop.org/drm/drm-misc/commit/?id=a2151490cc6c57b368d7974ffd447a8b36ade639

(the second one required a bit of fuzzing)

I will remove them as they appear in the drm trees (hopefully).

Kees, you just need to remember that you have a dependency on the DRM
patches being in Linus' tree before you send your pull request.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH -next] drm/amdgpu: clean up some inconsistent indenting

2022-03-02 Thread Christian König

Am 03.03.22 um 02:52 schrieb Yang Li:

Eliminate the following smatch warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c:413 amdgpu_get_xgmi_hive()
warn: inconsistent indenting

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 16 
  1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 91817a31f3e1..4ff6e06babca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -410,14 +410,14 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct 
amdgpu_device *adev)
 */
if (adev->reset_domain->type != XGMI_HIVE) {
hive->reset_domain = amdgpu_reset_create_reset_domain(XGMI_HIVE, 
"amdgpu-reset-hive");
-   if (!hive->reset_domain) {
-   dev_err(adev->dev, "XGMI: failed initializing reset 
domain for xgmi hive\n");
-   ret = -ENOMEM;
-   kobject_put(&hive->kobj);
-   kfree(hive);
-   hive = NULL;
-   goto pro_end;
-   }
+   if (!hive->reset_domain) {
+   dev_err(adev->dev, "XGMI: failed initializing reset domain 
for xgmi hive\n");
+   ret = -ENOMEM;
+   kobject_put(&hive->kobj);
+   kfree(hive);
+   hive = NULL;
+   goto pro_end;
+   }
} else {
amdgpu_reset_get_reset_domain(adev->reset_domain);
hive->reset_domain = adev->reset_domain;




Re: [PATCH] drm/msm/gpu: Fix crash on devices without devfreq support

2022-03-02 Thread Dmitry Baryshkov
On Wed, 23 Feb 2022 at 18:46, Rob Clark  wrote:
>
> On Tue, Feb 22, 2022 at 7:11 PM Dmitry Baryshkov
>  wrote:
> >
> > On 19/02/2022 21:33, Rob Clark wrote:
> > > From: Rob Clark 
> > >
> > > Avoid going down devfreq paths on devices where devfreq is not
> > > initialized.
> > >
> > > Reported-by: Linux Kernel Functional Testing 
> > > Reported-by: Anders Roxell 
> > > Signed-off-by: Rob Clark 
> > > ---
> > >   drivers/gpu/drm/msm/msm_gpu_devfreq.c | 31 +--
> > >   1 file changed, 25 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c 
> > > b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> > > index 9bf319be11f6..26a3669a97b3 100644
> > > --- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> > > +++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> > > @@ -83,12 +83,17 @@ static struct devfreq_dev_profile msm_devfreq_profile 
> > > = {
> > >   static void msm_devfreq_boost_work(struct kthread_work *work);
> > >   static void msm_devfreq_idle_work(struct kthread_work *work);
> > >
> > > +static bool has_devfreq(struct msm_gpu *gpu)
> > > +{
> > > + return !!gpu->funcs->gpu_busy;
> >
> > I see that devfreq init will be skipped if gpu_busy is NULL.
> > Can we use gpu->devfreq instead of this condition?
>
> We could, but then we couldn't also use the same has_devfreq() helper
> in msm_devfreq_init().  I thought it was clearer to use the same
> helper everywhere.

Well... It is not clear at first glance how gpu_busy is related to
devfreq. On the other hand, if gpu->devfreq is NULL, it's obvious that
devfreq is not initialized.

I'd propose to use if (gpu->funcs->gpu_busy) to check if we can init
devfreq and after that to check (gpu->devfreq) as a way to know
whether the devfreq is available.
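
In code terms, the split proposed here would look roughly like this
(illustrative sketch only, not the actual patch):

	/* Illustrative sketch of the proposed checks -- not the actual patch. */
	void msm_devfreq_init(struct msm_gpu *gpu)
	{
		/* Capability check: we need target support to do devfreq */
		if (!gpu->funcs->gpu_busy)
			return;
		/* ... sets up gpu->devfreq.devfreq ... */
	}

	void msm_devfreq_idle(struct msm_gpu *gpu)
	{
		struct msm_gpu_devfreq *df = &gpu->devfreq;

		/* Availability check: init above may have been skipped */
		if (!df->devfreq)
			return;
		/* ... */
	}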

>
> > I noticed that you have replaced some of gpu->devfreq checks with
> > has_devreq() calls. Is there any difference?
>
> It amounts to the same thing because if you don't have gpu_busy, then
> devfreq is never initialized.  I just thought it clearer to use the
> same check in all places.

See my comment above.

>
> BR,
> -R
>
> > > +}
> > > +
> > >   void msm_devfreq_init(struct msm_gpu *gpu)
> > >   {
> > >   struct msm_gpu_devfreq *df = &gpu->devfreq;
> > >
> > >   /* We need target support to do devfreq */
> > > - if (!gpu->funcs->gpu_busy)
> > > + if (!has_devfreq(gpu))
> > >   return;
> > >
> > >   dev_pm_qos_add_request(&gpu->pdev->dev, &df->idle_freq,
> > > @@ -149,6 +154,9 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu)
> > >   {
> > >   struct msm_gpu_devfreq *df = &gpu->devfreq;
> > >
> > > + if (!has_devfreq(gpu))
> > > + return;
> > > +
> > >   devfreq_cooling_unregister(gpu->cooling);
> > >   dev_pm_qos_remove_request(&df->boost_freq);
> > >   dev_pm_qos_remove_request(&df->idle_freq);
> > > @@ -156,16 +164,24 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu)
> > >
> > >   void msm_devfreq_resume(struct msm_gpu *gpu)
> > >   {
> > > - gpu->devfreq.busy_cycles = 0;
> > > - gpu->devfreq.time = ktime_get();
> > > + struct msm_gpu_devfreq *df = &gpu->devfreq;
> > >
> > > - devfreq_resume_device(gpu->devfreq.devfreq);
> > > + if (!has_devfreq(gpu))
> > > + return;
> > > +
> > > + df->busy_cycles = 0;
> > > + df->time = ktime_get();
> > > +
> > > + devfreq_resume_device(df->devfreq);
> > >   }
> > >
> > >   void msm_devfreq_suspend(struct msm_gpu *gpu)
> > >   {
> > >   struct msm_gpu_devfreq *df = &gpu->devfreq;
> > >
> > > + if (!has_devfreq(gpu))
> > > + return;
> > > +
> > >   devfreq_suspend_device(df->devfreq);
> > >
> > >   cancel_idle_work(df);
> > > @@ -185,6 +201,9 @@ void msm_devfreq_boost(struct msm_gpu *gpu, unsigned 
> > > factor)
> > >   struct msm_gpu_devfreq *df = &gpu->devfreq;
> > >   uint64_t freq;
> > >
> > > + if (!has_devfreq(gpu))
> > > + return;
> > > +
> > >   freq = get_freq(gpu);
> > >   freq *= factor;
> > >
> > > @@ -207,7 +226,7 @@ void msm_devfreq_active(struct msm_gpu *gpu)
> > >   struct devfreq_dev_status status;
> > >   unsigned int idle_time;
> > >
> > > - if (!df->devfreq)
> > > + if (!has_devfreq(gpu))
> > >   return;
> > >
> > >   /*
> > > @@ -253,7 +272,7 @@ void msm_devfreq_idle(struct msm_gpu *gpu)
> > >   {
> > >   struct msm_gpu_devfreq *df = &gpu->devfreq;
> > >
> > > - if (!df->devfreq)
> > > + if (!has_devfreq(gpu))
> > >   return;
> > >
> > >   msm_hrtimer_queue_work(&df->idle_work, ms_to_ktime(1),
> >
> >
> > --
> > With best wishes
> > Dmitry



-- 
With best wishes
Dmitry


Re: [PATCH v2 2/3] drm/i915: Remove the vma refcount

2022-03-02 Thread Thomas Hellström
On Wed, 2022-03-02 at 14:01 -0800, Niranjana Vishwanathapura wrote:
> On Wed, Mar 02, 2022 at 11:21:59AM +0100, Thomas Hellström wrote:
> > Now that i915_vma_parked() is taking the object lock on vma
> > destruction,
> > and the only user of the vma refcount, i915_gem_object_unbind()
> > also takes the object lock, remove the vma refcount.
> > 
> > Signed-off-by: Thomas Hellström 
> > ---
> > drivers/gpu/drm/i915/i915_gem.c   | 17 +
> > drivers/gpu/drm/i915/i915_vma.c   | 14 +++---
> > drivers/gpu/drm/i915/i915_vma.h   | 14 --
> > drivers/gpu/drm/i915/i915_vma_types.h |  1 -
> > 4 files changed, 16 insertions(+), 30 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index dd84ebabb50f..c26110abcc0b 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -151,14 +151,25 @@ int i915_gem_object_unbind(struct
> > drm_i915_gem_object *obj,
> > break;
> > }
> > 
> > +   /*
> > +    * Requiring the vm destructor to take the object lock
> > +    * before destroying a vma would help us eliminate the
> > +    * i915_vm_tryget() here, AND thus also the barrier stuff
> > +    * at the end. That's an easy fix, but sleeping locks in
> > +    * a kthread should generally be avoided.
> > +    */
> > ret = -EAGAIN;
> > if (!i915_vm_tryget(vma->vm))
> > break;
> > 
> > -   /* Prevent vma being freed by i915_vma_parked as we
> > unbind */
> > -   vma = __i915_vma_get(vma);
> > spin_unlock(&obj->vma.lock);
> > 
> > +   /*
> > +    * Since i915_vma_parked() takes the object lock
> > +    * before vma destruction, it won't race us here,
> > +    * and destroy the vma from under us.
> > +    */
> > +
> > if (vma) {
> > bool vm_trylock = !!(flags &
> > I915_GEM_OBJECT_UNBIND_VM_TRYLOCK);
> > ret = -EBUSY;
> > @@ -180,8 +191,6 @@ int i915_gem_object_unbind(struct
> > drm_i915_gem_object *obj,
> > ret = i915_vma_unbind(vma);
> > }
> > }
> > -
> > -   __i915_vma_put(vma);
> > }
> > 
> > i915_vm_put(vma->vm);
> 
> One issue still left in i915_gem_object_unbind is that it temporarily
> removes vmas from the obj->vma.list and adds them back later, as the vma
> needs to be unbound outside the obj->vma.lock spinlock. This is an issue
> for other places where we iterate over the obj->vma.list.
> i915_debugfs_describe_obj is one such case (upcoming vm_bind will be
> another) that iterates over this list.
> What is the plan here? Do we need to take the object lock while iterating
> over the list?

Yeah, I guess that's an option if that's at all possible (we might need
to iterate over the list in the mmu notifier, for example).

The other option is to
*) get rid of the GGTT / PPGTT sorting of vmas in the list,
*) be able to determine per vma, *before we unlock*, if we need to
unlock the list spinlock to take action,
*) re-add all vmas we've previously iterated over at the *tail* of the
list before unlocking the list lock.

Then a termination criterion for iterating would be that we reached the
end of the list without unlocking. Otherwise we need to restart
iteration after unlocking.

This would typically give us O(2N) complexity for the iteration. If we
re-add at the *head* of the list, we'd see O(N²), but to be able to re-
add previous vmas to the tail of the list requires us to get rid of the
sorting.
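
For context, the existing pattern Niranjana refers to (temporarily parking
vmas off obj->vma.list while the lock is dropped) looks roughly like the
following simplified, illustrative sketch (not the actual i915 code):

	/* Simplified, illustrative sketch only. */
	LIST_HEAD(still_in_list);
	struct i915_vma *vma;

	spin_lock(&obj->vma.lock);
	while (!list_empty(&obj->vma.list)) {
		vma = list_first_entry(&obj->vma.list, struct i915_vma, obj_link);

		/* Park the entry so we keep our place while the lock is dropped;
		 * anyone else walking obj->vma.list will not see it meanwhile. */
		list_move_tail(&vma->obj_link, &still_in_list);
		spin_unlock(&obj->vma.lock);

		/* ... unbind/process the vma without holding the spinlock ... */

		spin_lock(&obj->vma.lock);
	}
	/* Put everything back so later iterations see the full list again. */
	list_splice_tail(&still_in_list, &obj->vma.list);
	spin_unlock(&obj->vma.lock);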

> 
> But this is just something I noticed and is not related to this patch.
> This patch looks good to me.
> Reviewed-by: Niranjana Vishwanathapura
> 
>  

Thanks for reviewing. I noticed there is some documentation needing
updating as well, so I'll send out a v3 without functional changes.

/Thomas


> 
> > diff --git a/drivers/gpu/drm/i915/i915_vma.c
> > b/drivers/gpu/drm/i915/i915_vma.c
> > index 91538bc38110..6fd25b39748f 100644
> > --- a/drivers/gpu/drm/i915/i915_vma.c
> > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > @@ -122,7 +122,6 @@ vma_create(struct drm_i915_gem_object *obj,
> > if (vma == NULL)
> > return ERR_PTR(-ENOMEM);
> > 
> > -   kref_init(&vma->ref);
> > vma->ops = &vm->vma_ops;
> > vma->obj = obj;
> > vma->size = obj->base.size;
> > @@ -1628,15 +1627,6 @@ void i915_vma_reopen(struct i915_vma *vma)
> > __i915_vma_remove_closed(vma);
> > }
> > 
> > -void i915_vma_release(struct kref *ref)
> > -{
> > -   struct i915_vma *vma = container_of(ref, typeof(*vma),
> > ref);
> > -
> > -   i915_active_fini(&vma->active);
> > -

Re: [PATCH] drm: Fix no previous prototype error in drm_nomodeset.c

2022-03-02 Thread Guenter Roeck
On Wed, Mar 02, 2022 at 10:39:02PM +0530, Aashish Sharma wrote:
> Fix this kernel test robot error:
> 
> drivers/gpu/drm/drm_nomodeset.c:8:6: error:
> no previous prototype for 'drm_firmware_drivers_only'
> 
> Including drm_drv.h in drm_nomodeset.c which contains
> drm_firmware_drivers_only's declaration.
> 
> Signed-off-by: Aashish Sharma 

Reviewed-by: Guenter Roeck 

> ---
>  drivers/gpu/drm/drm_nomodeset.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/drm_nomodeset.c b/drivers/gpu/drm/drm_nomodeset.c
> index f3978d5bd3a1..9402deb4985f 100644
> --- a/drivers/gpu/drm/drm_nomodeset.c
> +++ b/drivers/gpu/drm/drm_nomodeset.c
> @@ -2,6 +2,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  static bool drm_nomodeset;
>  
> -- 
> 2.35.1.574.g5d30c73bfb-goog
> 


Re: [PATCH 15/15] drm/i915/gt: Clear compress metadata for Xe_HP platforms

2022-03-02 Thread Matt Roper
On Sun, Feb 27, 2022 at 10:22:20PM +0530, Ramalingam C wrote:
> Matt,
> 
> Thanks for the review.
> 
> On 2022-02-18 at 17:47:22 -0800, Matt Roper wrote:
> > On Sat, Feb 19, 2022 at 12:17:52AM +0530, Ramalingam C wrote:
> > > From: Ayaz A Siddiqui 
> > > 
> > > Xe-HP and latest devices support Flat CCS which reserved a portion of
> > > the device memory to store compression metadata, during the clearing of
> > > device memory buffer object we also need to clear the associated
> > > CCS buffer.
> > > 
> > > Flat CCS memory can not be directly accessed by S/W.
> > > Address of CCS buffer associated main BO is automatically calculated
> > > by device itself. KMD/UMD can only access this buffer indirectly using
> > > XY_CTRL_SURF_COPY_BLT cmd via the address of device memory buffer.
> > > 
> > > v2: Fixed issues with platform naming [Lucas]
> > > v3: Rebased [Ram]
> > > Used the round_up funcs [Bob]
> > > v4: Fixed ccs blk calculation [Ram]
> > > Added Kdoc on flat-ccs.
> > > 
> > > Cc: CQ Tang 
> > > Signed-off-by: Ayaz A Siddiqui 
> > > Signed-off-by: Ramalingam C 
> > > ---
> > >  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  15 ++
> > >  drivers/gpu/drm/i915/gt/intel_migrate.c  | 145 ++-
> > >  2 files changed, 156 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h 
> > > b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > > index f8253012d166..166de5436c4a 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > > @@ -203,6 +203,21 @@
> > >  #define GFX_OP_DRAWRECT_INFO ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
> > >  #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
> > >  
> > > +#define XY_CTRL_SURF_INSTR_SIZE  5
> > > +#define MI_FLUSH_DW_SIZE 3
> > > +#define XY_CTRL_SURF_COPY_BLT      ((2 << 29) | (0x48 << 22) | 3)
> > > +#define   SRC_ACCESS_TYPE_SHIFT  21
> > > +#define   DST_ACCESS_TYPE_SHIFT  20
> > > +#define   CCS_SIZE_SHIFT 8
> > 
> > Rather than using a shift, it might be better to just define the
> > bitfield.  E.g.,
> > 
> > #define CCS_SIZEGENMASK(17, 8)
> > 
> > and then later
> > 
> > FIELD_PREP(CCS_SIZE, i - 1)
> > 
> > to refer to the proper value.
> > 
> > > +#define   XY_CTRL_SURF_MOCS_SHIFT25
> > 
> > Same here; we can use GENMASK(31, 25) to define the field.
> 
> Adapting to the GENMASK and FIELD_PREP for these two macros
> > 
> > > +#define   NUM_CCS_BYTES_PER_BLOCK  256
> > > +#define   NUM_BYTES_PER_CCS_BYTE   256
> > > +#define   NUM_CCS_BLKS_PER_XFER    1024
> > > +#define   INDIRECT_ACCESS          0
> > > +#define   DIRECT_ACCESS            1
> > > +#define  MI_FLUSH_LLC              BIT(9)
> > > +#define  MI_FLUSH_CCS              BIT(16)
> > > +
> > >  #define COLOR_BLT_CMD              (2 << 29 | 0x40 << 22 | (5 - 2))
> > >  #define XY_COLOR_BLT_CMD           (2 << 29 | 0x50 << 22)
> > >  #define SRC_COPY_BLT_CMD           (2 << 29 | 0x43 << 22)
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c 
> > > b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > > index 20444d6ceb3c..9f9cd2649377 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > > @@ -16,6 +16,8 @@ struct insert_pte_data {
> > >  };
> > >  
> > >  #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
> > > +#define GET_CCS_BYTES(i915, size)  (HAS_FLAT_CCS(i915) ? \
> > > +				       DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0)
> > >  
> > >  static bool engine_supports_migration(struct intel_engine_cs *engine)
> > >  {
> > > @@ -467,6 +469,113 @@ static bool wa_1209644611_applies(int ver, u32 size)
> > >   return height % 4 == 3 && height <= 8;
> > >  }
> > >  
> > > +/**
> > > + * DOC: Flat-CCS - Memory compression for Local memory
> > > + *
> > > + * On Xe-HP and later devices, we use dedicated compression control state (CCS)
> > > + * stored in local memory for each surface, to support the 3D and media
> > > + * compression formats.
> > > + *
> > > + * The memory required for the CCS of the entire local memory is 1/256 of the
> > > + * local memory size. So before the kernel boot, the required memory is reserved
> > > + * for the CCS data and a secure register will be programmed with the CCS base
> > > + * address.
> > > + *
> > > + * Flat CCS data needs to be cleared when a lmem object is allocated.
> > > + * And CCS data can be copied in and out of CCS region through
> > > + * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly.
> > > + *
> > > + * When we exaust the lmem, if the object's placements support smem, then we can
> > 
> > Typo: exhaust
> > 
> > > + * directly decompress the compressed lmem object into smem and start using it
> > > + * from smem itself.
> > > + *
> > > + * But when

Re: Report 2 in ext4 and journal based on v5.17-rc1

2022-03-02 Thread Byungchul Park
Ted wrote:
> On Thu, Mar 03, 2022 at 10:00:33AM +0900, Byungchul Park wrote:
> > 
> > Unfortunately, it's neither perfect nor safe without another wakeup
> > source - rescue wakeup source.
> > 
> >    consumer                         producer
> >
> >    lock L
> >    (too much work queued == true)
> >    unlock L
> >    --- preempted
> >                                     lock L
> >                                     unlock L
> >                                     do work
> >                                     lock L
> >                                     unlock L
> >                                     do work
> >                                     ...
> >                                     (no work == true)
> >                                     sleep
> >    --- scheduled in
> >    sleep
> 
> That's not how things work in ext4.  It's **way** more complicated

You seem to have gotten this wrong. This example is one that Jan Kara gave me.
I just tried to explain things based on Jan Kara's example, so I kept all the
statements that Jan Kara wrote. Besides, the example was very helpful. Thanks,
Jan Kara.

> than that.  We have multiple wait channels, one to wake up the consumer
> (read: the commit thread), and one which wakes up any processes
> waiting for the commit thread to have made forward progress.  We also have
> two spin-lock protected sequence numbers, one which indicates the
> current committed transaction #, and one indicating the transaction #
> that needs to be committed.
> 
> On the commit thread, it will sleep on j_wait_commit, and when it is
> woken up, it will check to see if there is work to be done
> (j_commit_sequence != j_commit_request), and if so, do the work, and
> then wake up processes waiting on the wait_queue j_wait_done_commit.
> (Again, all of this uses the pattern "prepare to wait", then check to
> see if we should sleep; if we do need to sleep, unlock j_state_lock,
> then sleep.)  So this prevents any races leading to lost wakeups.
> 
> On the start_this_handle() thread, if the current transaction is too
> full, we set j_commit_request to its transaction id to indicate that
> we want the current transaction to be committed, and then we wake up
> the j_wait_commit wait queue and then we enter a loop where do a
> prepare_to_wait in j_wait_done_commit, check to see if
> j_commit_sequence == the transaction id that we want to be completed,
> and if it's not done yet, we unlock the j_state_lock spinlock, and go
> to sleep.  Again, because of the prepare_to_wait, there is no chance
> of a lost wakeup.

The above explanation gives me a clear view of how the journal
synchronization works. I appreciate it.

> So there really is no "consumer" and "producer" here.  If you really
> insist on using this model, which really doesn't apply, for one

Dept does not assume a "consumer"/"producer" model at all; it works with
general waits and events. *That model is just one of them.*

> thread, it's the consumer with respect to one wait queue, and the
> producer with respect to the *other* wait queue.  For the other
> thread, the consumer and producer roles are reversed.
> 
> And of course, this is a highly simplified model, since we also have a
> wait queue used by the commit thread to wait for the number of active
> handles on a particular transaction to go to zero, and
> stop_this_handle() will wake up commit thread via this wait queue when
> the last active handle on a particular transaction is retired.  (And
> yes, that parameter is also protected by a different spin lock which
> is per-transaction).

This one also gives me a clear view. Thanks a lot.

> So it seems to me that a fundamental flaw in DEPT's model is assuming
> that the only waiting paradigm that can be used is consumer/producer,

No, Dept does not.

> and that's simply not true.  The fact that you use the term "lock" is
> also going to lead a misleading line of reasoning, because properly

"lock/unlock L" comes from the Jan Kara's example. It has almost nothing
to do with the explanation. I just left "lock/unlock L" as a statement
that comes from the Jan Kara's example.

> speaking, they aren't really locks.  We are simply using wait channels

I totally agree with you. *They aren't really locks, just waits
and wakeups.* That's exactly why I decided to develop Dept. Unlike Lockdep,
Dept is not interested in locks, but focuses on the waits and wakeup
sources themselves. I think you have misunderstood Dept in several ways.
Please ask me more if there are things about Dept that you doubt.

Thanks,
Byungchul


RE: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-02 Thread David Laight
From: Xiaomeng Tong
> Sent: 03 March 2022 02:27
> 
> On Wed, 2 Mar 2022 14:04:06 +, David Laight
>  wrote:
> > I think that it would be better to make any alternate loop macro
> > just set the variable to NULL on the loop exit.
> > That is easier to code for and the compiler might be persuaded to
> > not redo the test.
> 
> No, that would lead to a NULL dereference.

Why, it would make it be the same as the 'easy to use':
for (item = head; item; item = item->next) {
...
if (...)
break;
...
}
if (!item)
return;
 
> The problem is the mis-use of the iterator outside the loop on exit,
> where the iterator will be the HEAD's container_of pointer, which points
> to a type-confused struct. Sidenote: the *mis-use* here refers to
> mistakenly accessing other members of the struct, instead of the
> list_head member which actually is the valid HEAD.

The problem is that the HEAD's container_of pointer should never
be calculated at all.
This is what is fundamentally broken about the current definition.

> IOW, you would dereference a (NULL + offset_of_member) address here.

Where?

> Please remind me if i missed something, thanks.
>
> Can you share your "alternative definitions" details? thanks!

The loop should probably use an extra variable that points
to the 'list node' in the next structure.
Something like:
	for (xxx *iter = head->next;
	     iter == &head ? ((item = NULL), 0) : ((item = list_item(iter), 1));
	     iter = item->member->next) {
		...
With a bit of casting you can use 'item' to hold 'iter'.

> 
> > OTOH there may be alternative definitions that can be used to get
> > the compiler (or other compiler-like tools) to detect broken code.
> > Even if the definition can't possibly generate a working kernel.
> 
> The "list_for_each_entry_inside(pos, type, head, member)" way makes
> the iterator invisiable outside the loop, and would be catched by
> compiler if use-after-loop things happened.

It is also a complete PITA for anything doing a search.

David
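
To make the failure mode concrete, a minimal, illustrative example (made-up
struct and member names) of why the head's container_of pointer is bogus:

	/* Illustrative only: "struct foo" and its "node" member are example names. */
	struct foo {
		int data;
		struct list_head node;
	};

	LIST_HEAD(head);
	struct foo *pos;

	list_for_each_entry(pos, &head, node) {
		/* never entered: the list is empty */
	}
	/* Here pos == container_of(&head, struct foo, node): a pointer computed
	 * from the bare list_head.  Reading pos->data would access memory that
	 * is not a struct foo at all. */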




[pull] amdgpu drm-fixes-5.17

2022-03-02 Thread Alex Deucher
Hi Dave, Daniel,

Just one small fix for 5.17.

The following changes since commit e7c470a4b543375d50d88a4c5abd4b9e0f5adcea:

  Merge tag 'exynos-drm-fixes-v5.17-rc6' of 
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes 
(2022-02-28 14:05:44 +1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-fixes-5.17-2022-03-02

for you to fetch changes up to f1ef17011c765495c876fa75435e59eecfdc1ee4:

  drm/amdgpu: fix suspend/resume hang regression (2022-03-02 18:36:43 -0500)


amd-drm-fixes-5.17-2022-03-02:

amdgpu:
- Suspend regression fix


Qiang Yu (1):
  drm/amdgpu: fix suspend/resume hang regression

 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


Re: [PATCH 1/9] dt-bindings: mxsfb: Add compatible for i.MX8MP

2022-03-02 Thread Marek Vasut

On 3/3/22 03:54, Liu Ying wrote:

On Wed, 2022-03-02 at 12:57 +0100, Lucas Stach wrote:

Am Mittwoch, dem 02.03.2022 um 17:41 +0800 schrieb Liu Ying:

On Wed, 2022-03-02 at 10:23 +0100, Lucas Stach wrote:

Am Mittwoch, dem 02.03.2022 um 03:54 +0100 schrieb Marek Vasut:

On 3/1/22 14:18, Lucas Stach wrote:

Am Dienstag, dem 01.03.2022 um 07:03 -0600 schrieb Adam Ford:

On Tue, Mar 1, 2022 at 5:05 AM Lucas Stach  wrote:

Am Dienstag, dem 01.03.2022 um 11:19 +0100 schrieb Marek Vasut:

On 3/1/22 11:04, Lucas Stach wrote:

Hi,

[...]


Given the two totally different IPs, I don't see that bugs in the IP control
logic would need to be fixed in both drivers. Naturally, the two would
diverge due to the different HW. Looking at Patch 9/9, it basically
squashes code to control LCDIFv3 into the mxsfb drm driver with
'if/else' checks (barely any common control code), which is hard to
maintain and cannot achieve good scalability for both 'LCDIFv3'
and 'LCDIF'.


I tend to agree with Liu here. Writing a DRM driver isn't that much
boilerplate anymore with all the helpers we have available in the
framework today.


I did write a separate driver for this IP before I spent time merging
them into single driver, that's when I realized a single driver is much
better and discarded the separate driver idea.


The IP is so different from the currently supported LCDIF controllers
that I think trying to support this one in the existing driver actually
increases the chances to break something when modifying the driver in
the future. Not everyone is able to test all LCDIF versions. My vote is
on having a separate driver for the i.MX8MP LCDIF.


If you look at both controllers, it is clear it is still the LCDIF
behind, even the CSC that is bolted on would suggest that.


Yes, but from a driver PoV what you care about is not really the
hardware blocks used to implement something, but the programming model,
i.e. the register interface exposed to software.


I am also not happy when I look at the amount of duplication a separate
driver would create, it will be some 50% of the code that would be just
duplicated.


Yea, the duplicated code is still significant, as the HW itself is so
simple. However, if you find yourself in the situation where basically
every actual register access in the driver ends up being in an "if (some
HW rev) ... " clause, I still think it would be better to have a
separate driver, as the programming interface is just different.


I tend to agree with Marek on this one.  We have an instance where the
blk-ctrl and the GPC driver between 8m, mini, nano, plus are close,
but different enough where each SoC has it's own set of tables and
some checks.   Lucas created the framework, and others adapted it for
various SoC's.  If there really is nearly 50% common code for the
LCDIF, why not either leave the driver as one or split the common code
into its own driver like lcdif-common and then have smaller drivers
that handle their specific variations.


I don't know exactly how the standalone driver looks like, but I guess
the overlap is not really in any real HW specific parts, but the common
DRM boilerplate, so there isn't much point in creating a common lcdif
driver.


The mxsfb currently has 1280 LoC as of patch 8/9 of this series. Of
that, there is some 400 LoC which are specific to old LCDIF and this
patch adds 380 LoC for the new LCDIF. So that's 800 LoC or ~60% of
shared boilerplate that would be duplicated .


That is probably ignoring the fact that the 8MP LCDIF does not support
any overlays, so it could use the drm_simple_display_pipe
infrastructure to reduce the needed boilerplate.


The drm_simple_display_pipe infrastructure is probably too simple for
the i.MX8MP LCDIF, since it uses only one crtc per drm device. i.MX8MP
embeds *three* LCDIF instances to support MIPI DSI, LVDS and HDMI
outputs respectively. Using that infrastructure would mean there are
three dri cards in all. However, the three LCDIF instances can be
wrapped by one drm device, which is not something the boilerplate code in the
current mxsfb driver can handle.


While that may make things a little simpler for userspace, I'm not sure
if this is the right thing to do. It complicates the driver a lot,
especially if you want to get things like independent power management,
etc. right. It also creates a fake view for userspace, where is looks
like there might be some shared resources between the different display
paths, while in reality they are fully independent.


A trade-off will have to be made between one drm device and three. My first
impression of the drm_simple_display_pipe infrastructure is that
it's too simple and less flexible/scalable, because SoC designers are
likely to add muxes between CRTCs and encoders/bridges, or overlay
plane(s), in next generations of SoCs (SW developers don't seem to have good
reasons to suggest not doing that).  Another concern is whether
userspace can make good use of the three drm devices.

A few more points:
1) With one drm dev

RE: [intel-gfx] [PATCH] drm/i915: avoid concurrent writes to aux_inv (rev3)

2022-03-02 Thread Yang, Fei
Hi Chris, for some reason I didn't receive the review email, so I copied your 
comments from patchwork and faked this email.

>>  static void execlists_dequeue(struct intel_engine_cs *engine)
>>  {
>> struct intel_engine_execlists * const execlists = &engine->execlists;
>> @@ -1538,6 +1566,16 @@ static void execlists_dequeue(struct intel_engine_cs 
>> *engine)
>> }
>>
>> if (__i915_request_submit(rq)) {
>> +   /* hsdes: 1809175790 */
>> +   if ((GRAPHICS_VER(engine->i915) == 12) &&
>> +   rq->vd_ve_aux_inv &&
>> +   (engine->class == VIDEO_DECODE_CLASS ||
>> +engine->class == 
>> VIDEO_ENHANCEMENT_CLASS)) {

> We do not need the extra checks here; we just do as we are told. We only
> tell ourselves to apply the fixup when required.

Without checking GRAPHICS_VER, I'm seeing a lot of regressions on older 
platforms in the CI result.
This workaround was only implemented for gen12 (gen12_emit_flush_xcs).
Without checking engine->class, I'm seeing boot issues due to GEM_BUG_ON() in 
aux_inv_reg().
Any rq will go through this code regardless of engine class and gen version, so 
the checking seems to be
necessary.

>> +   *rq->vd_ve_aux_inv = 
>> i915_mmio_reg_offset

> Likewise, vd_ve is overspecific, aux_inv_fixup or aux_inv_wa (or
> wa_aux_iv, fixup_aux_inv).

I wanted to be specific because the workaround was only implemented for vd/ve 
engines.
But I'm ok with your proposal.

>> +   (aux_inv_reg(engine));
>> +   rq->vd_ve_aux_inv = NULL;

> Move this to i915_request initialisation so that we only set aux_inv
> when required, which probably explains the extra defence.

The pointer is currently initialized with 0x5a5a. I set it to NULL in 
gen12_emit_flush_xcs, otherwise the rq will
enter that if-statement with an invalid pointer.
I'm not familiar with the code; there seem to be multiple functions allocating
the structure. I agree it's better to set it to NULL at initialization, but I
need some guidance on where the best place to do so would be.

>> +   rq->execution_mask = engine->mask;
>> +   }
>> if (!merge) {
>> *port++ = i915_request_get(last);
>> last = NULL;
>> diff --git a/drivers/gpu/drm/i915/i915_request.h 
>> b/drivers/gpu/drm/i915/i915_request.h
>> index 28b1f9db5487..69de32e5e15d 100644
>> --- a/drivers/gpu/drm/i915/i915_request.h
>> +++ b/drivers/gpu/drm/i915/i915_request.h
>> @@ -350,6 +350,8 @@ struct i915_request {
>> struct list_head link;
>> unsigned long delay;
>> } mock;)
>> +
>> +   u32 *vd_ve_aux_inv;

> Not at the end of the struct; that's where we put things in the dungeon.
> The selftest struct should be last; I do hope no one has been putting
> things at random places in the struct without considering the layout and
> semantics. From the flow, this is akin to batch, capture_list; before
> emitted_jiffies would be a good spot.

Got it, will change. I thought adding at the end would be safer, thanks for the 
explanation.

> -Chris



Re: [EXT] Re: [PATCH 1/9] dt-bindings: mxsfb: Add compatible for i.MX8MP

2022-03-02 Thread Marek Vasut

On 3/2/22 14:14, Robby Cai wrote:

Hi

[...]


LCDIF on i.MX8MP is a different IP which is borrowed from a non-i.MX series,
although it's also called 'LCDIF'.

We prefer not to mix these two series of IPs in one driver, for ease of
maintenance and extension.

Where does the MX8MP LCDIF come from then, SGTL maybe ?


AFAIK, it's RT1170. You may check the RM [1]. Interestingly, this SoC has
both eLCDIF and LCDIFv2, the two IPs we are talking about.

[1] https://www.nxp.com/webapp/Download?colCode=IMXRT1170RM


That's interesting, I wasn't aware of this one. So that's MX8MP LCDIF +
overlays. Did the LCDIF v2 in i.MXRT1170 have any predecessor too?


Re: [PATCH 1/9] dt-bindings: mxsfb: Add compatible for i.MX8MP

2022-03-02 Thread Marek Vasut

On 3/2/22 10:23, Lucas Stach wrote:

[...]


I tend to agree with Marek on this one.  We have an instance where the
blk-ctrl and the GPC driver between 8m, mini, nano, plus are close,
but different enough where each SoC has it's own set of tables and
some checks.   Lucas created the framework, and others adapted it for
various SoC's.  If there really is nearly 50% common code for the
LCDIF, why not either leave the driver as one or split the common code
into its own driver like lcdif-common and then have smaller drivers
that handle their specific variations.


I don't know exactly how the standalone driver looks like, but I guess
the overlap is not really in any real HW specific parts, but the common
DRM boilerplate, so there isn't much point in creating a common lcdif
driver.


The mxsfb currently has 1280 LoC as of patch 8/9 of this series. Of
that, there is some 400 LoC which are specific to old LCDIF and this
patch adds 380 LoC for the new LCDIF. So that's 800 LoC or ~60% of
shared boilerplate that would be duplicated .


That is probably ignoring the fact that the 8MP LCDIF does not support
any overlays, so it could use the drm_simple_display_pipe
infrastructure to reduce the needed boilerplate.


It seems the i.MXRT1170 LCDIF v2 (heh ...) does support overlays, so no,
the mxsfb and hypothetical lcdif drivers would look really very similar.



As you brought up the blk-ctrl as an example: I'm all for supporting
slightly different hardware in the same driver, as long as the HW
interface is close enough. But then I also opted for a separate 8MP
blk-ctrl driver for those blk-ctrls that differ significantly from the
others, as I think it would make the common driver unmaintainable
trying to support all the different variants in one driver.


But then you also need to maintain two sets of boilerplate, they
diverge, and that is not good.


I don't think that there is much chance for bugs going unfixed due to
divergence in the boilerplate, especially if you use the simple pipe
framework to handle most of that stuff for you, which gives you a lot
of code sharing with other simple DRM drivers.


But I cannot use the simple pipe because of overlays, see i.MXRT1170.

[...]

We can always split the drivers later if this becomes unmaintainable 
too, no ?


Re: [PATCH 1/9] dt-bindings: mxsfb: Add compatible for i.MX8MP

2022-03-02 Thread Liu Ying
On Wed, 2022-03-02 at 12:57 +0100, Lucas Stach wrote:
> Am Mittwoch, dem 02.03.2022 um 17:41 +0800 schrieb Liu Ying:
> > On Wed, 2022-03-02 at 10:23 +0100, Lucas Stach wrote:
> > > Am Mittwoch, dem 02.03.2022 um 03:54 +0100 schrieb Marek Vasut:
> > > > On 3/1/22 14:18, Lucas Stach wrote:
> > > > > Am Dienstag, dem 01.03.2022 um 07:03 -0600 schrieb Adam Ford:
> > > > > > On Tue, Mar 1, 2022 at 5:05 AM Lucas Stach  
> > > > > > wrote:
> > > > > > > Am Dienstag, dem 01.03.2022 um 11:19 +0100 schrieb Marek Vasut:
> > > > > > > > On 3/1/22 11:04, Lucas Stach wrote:
> > > > > > > > 
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > [...]
> > > > > > > > 
> > > > > > > > > > Given the two totally different IPs, I don't see bugs of IP 
> > > > > > > > > > control
> > > > > > > > > > logics should be fixed for both drivers. Naturally, the two 
> > > > > > > > > > would
> > > > > > > > > > diverge due to different HWs. Looking at Patch 9/9, it 
> > > > > > > > > > basically
> > > > > > > > > > squashes code to control LCDIFv3 into the mxsfb drm driver 
> > > > > > > > > > with
> > > > > > > > > > 'if/else' checks(barely no common control code), which is 
> > > > > > > > > > hard to
> > > > > > > > > > maintain and not able to achieve good scalability for both 
> > > > > > > > > > 'LCDIFv3'
> > > > > > > > > > and 'LCDIF'.
> > > > > > > > > 
> > > > > > > > > I tend to agree with Liu here. Writing a DRM driver isn't 
> > > > > > > > > that much
> > > > > > > > > boilerplate anymore with all the helpers we have available in 
> > > > > > > > > the
> > > > > > > > > framework today.
> > > > > > > > 
> > > > > > > > I did write a separate driver for this IP before I spent time 
> > > > > > > > merging
> > > > > > > > them into single driver, that's when I realized a single driver 
> > > > > > > > is much
> > > > > > > > better and discarded the separate driver idea.
> > > > > > > > 
> > > > > > > > > The IP is so different from the currently supported LCDIF 
> > > > > > > > > controllers
> > > > > > > > > that I think trying to support this one in the existing 
> > > > > > > > > driver actually
> > > > > > > > > increases the chances to break something when modifying the 
> > > > > > > > > driver in
> > > > > > > > > the future. Not everyone is able to test all LCDIF versions. 
> > > > > > > > > My vote is
> > > > > > > > > on having a separate driver for the i.MX8MP LCDIF.
> > > > > > > > 
> > > > > > > > If you look at both controllers, it is clear it is still the 
> > > > > > > > LCDIF
> > > > > > > > behind, even the CSC that is bolted on would suggest that.
> > > > > > > 
> > > > > > > Yes, but from a driver PoV what you care about is not really the
> > > > > > > hardware blocks used to implement something, but the programming 
> > > > > > > model,
> > > > > > > i.e. the register interface exposed to software.
> > > > > > > 
> > > > > > > > I am also not happy when I look at the amount of duplication a 
> > > > > > > > separate
> > > > > > > > driver would create, it will be some 50% of the code that would 
> > > > > > > > be just
> > > > > > > > duplicated.
> > > > > > > > 
> > > > > > > Yea, the duplicated code is still significant, as the HW itself 
> > > > > > > is so
> > > > > > > simple. However, if you find yourself in the situation where 
> > > > > > > basically
> > > > > > > every actual register access in the driver ends up being in a "if 
> > > > > > > (some
> > > > > > > HW rev) ... " clause, i still think it would be better to have a
> > > > > > > separate driver, as the programming interface is just different.
> > > > > > 
> > > > > > I tend to agree with Marek on this one.  We have an instance where 
> > > > > > the
> > > > > > blk-ctrl and the GPC driver between 8m, mini, nano, plus are close,
> > > > > > but different enough where each SoC has it's own set of tables and
> > > > > > some checks.   Lucas created the framework, and others adapted it 
> > > > > > for
> > > > > > various SoC's.  If there really is nearly 50% common code for the
> > > > > > LCDIF, why not either leave the driver as one or split the common 
> > > > > > code
> > > > > > into its own driver like lcdif-common and then have smaller drivers
> > > > > > that handle their specific variations.
> > > > > 
> > > > > I don't know exactly how the standalone driver looks like, but I guess
> > > > > the overlap is not really in any real HW specific parts, but the 
> > > > > common
> > > > > DRM boilerplate, so there isn't much point in creating a common lcdif
> > > > > driver.
> > > > 
> > > > The mxsfb currently has 1280 LoC as of patch 8/9 of this series. Of 
> > > > that, there is some 400 LoC which are specific to old LCDIF and this 
> > > > patch adds 380 LoC for the new LCDIF. So that's 800 LoC or ~60% of 
> > > > shared boilerplate that would be duplicated .
> > > 
> > > That is probably ignoring the fact that the 8MP LCDIF does not support
> > > any overlays, so it could use the drm_simple_display_pipe
> > > infrastructure to

[Bug 215652] kernel 5.17-rc fail to load radeon DRM "modprobe: ERROR: could not insert 'radeon': Unknown symbol in module, or unknown parameter (see dmesg)"

2022-03-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215652

Alex Deucher (alexdeuc...@gmail.com) changed:

   What|Removed |Added

 CC||alexdeuc...@gmail.com

--- Comment #3 from Alex Deucher (alexdeuc...@gmail.com) ---
If you are using an initrd, the firmware must be present on the initrd for the
driver to find it when it loads.  Please make sure the firmware is available in
your initrd.


Re: Report 2 in ext4 and journal based on v5.17-rc1

2022-03-02 Thread Theodore Ts'o
On Thu, Mar 03, 2022 at 10:00:33AM +0900, Byungchul Park wrote:
> 
> Unfortunately, it's neither perfect nor safe without another wakeup
> source - rescue wakeup source.
> 
>    consumer                         producer
>
>    lock L
>    (too much work queued == true)
>    unlock L
>    --- preempted
>                                     lock L
>                                     unlock L
>                                     do work
>                                     lock L
>                                     unlock L
>                                     do work
>                                     ...
>                                     (no work == true)
>                                     sleep
>    --- scheduled in
>    sleep

That's not how things work in ext4.  It's **way** more complicated
than that.  We have multiple wait channels, one to wake up the consumer
(read: the commit thread), and one which wakes up any processes
waiting for the commit thread to have made forward progress.  We also have
two spin-lock protected sequence numbers, one which indicates the
current committed transaction #, and one indicating the transaction #
that needs to be committed.

On the commit thread, it will sleep on j_wait_commit, and when it is
woken up, it will check to see if there is work to be done
(j_commit_sequence != j_commit_request), and if so, do the work, and
then wake up processes waiting on the wait_queue j_wait_done_commit.
(Again, all of this uses the pattern: "prepare to wait", then check to
see if we should sleep; if we do need to sleep, unlock j_state_lock,
then sleep.)  This prevents any races leading to lost wakeups.

On the start_this_handle() thread, if the current transaction is too
full, we set j_commit_request to its transaction id to indicate that
we want the current transaction to be committed, and then we wake up
the j_wait_commit wait queue and then we enter a loop where we do a
prepare_to_wait on j_wait_done_commit, check to see if
j_commit_sequence == the transaction id that we want to be completed,
and if it's not done yet, we unlock the j_state_lock spinlock, and go
to sleep.  Again, because of the prepare_to_wait, there is no chance
of a lost wakeup.
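
To make the pattern concrete, here is a minimal, generic sketch of the
prepare_to_wait() idiom described above (illustrative only, not the
actual jbd2 code; my_wq, my_lock and my_cond are made-up names):

	static DECLARE_WAIT_QUEUE_HEAD(my_wq);
	static DEFINE_SPINLOCK(my_lock);
	static bool my_cond;

	/* ... inside the waiting thread ... */
	DEFINE_WAIT(wait);

	spin_lock(&my_lock);
	while (!my_cond) {
		/* get on the wait queue *before* dropping the lock */
		prepare_to_wait(&my_wq, &wait, TASK_UNINTERRUPTIBLE);
		spin_unlock(&my_lock);
		/*
		 * A waker which sets my_cond under my_lock and then calls
		 * wake_up(&my_wq) cannot slip in between our check and our
		 * sleep: we were already on the queue when we dropped the
		 * lock, so the wakeup is not lost.
		 */
		schedule();
		spin_lock(&my_lock);
	}
	finish_wait(&my_wq, &wait);
	spin_unlock(&my_lock);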

So there really is no "consumer" and "producer" here.  If you really
insist on using this model, which really doesn't apply, for one
thread, it's the consumer with respect to one wait queue, and the
producer with respect to the *other* wait queue.  For the other
thread, the consumer and producer roles are reversed.

And of course, this is a highly simplified model, since we also have a
wait queue used by the commit thread to wait for the number of active
handles on a particular transaction to go to zero, and
stop_this_handle() will wake up the commit thread via this wait queue when
the last active handle on a particular transaction is retired.  (And
yes, that parameter is also protected by a different spin lock which
is per-transaction).

So it seems to me that a fundamental flaw in DEPT's model is assuming
that the only waiting paradigm that can be used is consumer/producer,
and that's simply not true.  The fact that you use the term "lock" is
also going to lead a misleading line of reasoning, because properly
speaking, they aren't really locks.  We are simply using wait channels
to wake up processes as necessary, and then they will check other
variables to decide whether or not they need to sleep or not, and we
have an invariant that when these variables change indicating forward
progress, the associated wait channel will be woken up.

Cheers,

- Ted


P.S.  This model is also highly simplified since there are other
reasons why the commit thread can be woken up, some of which might be
via a timeout, and some of which are via the j_wait_commit wait channel
but not because j_commit_request has been changed, but because the file
system is being unmounted, or the file system is being frozen in
preparation for a snapshot, etc.  These are *not* necessary to prevent
a deadlock,
because under normal circumstances the two wake channels are
sufficient of themselves.  So please don't think of them as "rescue
wakeup sources"; again, that's highly misleading and the wrong way to
think of them.

And to make things even more complicated, we have more than 2 wait
channels --- we have *five*:

/**
 * @j_wait_transaction_locked:
 *
 * Wait queue for waiting for a locked transaction to start committing,
 * or for a barrier lock to be released.
 */
wait_queue_head_t   j_wait_transaction_locked;

/**
 * @j_wait_done_commit: Wait queue for waiting for commit to complete.
 */
wait_queue_head_t   j_wait_done_commit;

/**
 * @j_wait_commit: Wait queue to trigger commit.
 */
wait_queue_head_t   j_wait_commit;

/**
 * @j_wait_updates: Wait queue to wait for updates to complete.
 */
wait_queue_head_t   j_wait_updates;

/**
 * @j_wait_reserved:
 *
 * Wait queue to wait f

[PATCH -next] drm/fsl-dcu: Remove unnecessary print function dev_err()

2022-03-02 Thread Yang Li
The print function dev_err() is redundant because platform_get_irq()
already prints an error.

Eliminate the following coccicheck warning:
./drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c:277:2-9: line 277 is
redundant because platform_get_irq() already prints an error

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---
 drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c 
b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c
index 7a503bf08d0f..20895ea79739 100644
--- a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c
+++ b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c
@@ -273,10 +273,8 @@ static int fsl_dcu_drm_probe(struct platform_device *pdev)
}
 
fsl_dev->irq = platform_get_irq(pdev, 0);
-   if (fsl_dev->irq < 0) {
-   dev_err(dev, "failed to get irq\n");
+   if (fsl_dev->irq < 0)
return fsl_dev->irq;
-   }
 
fsl_dev->regmap = devm_regmap_init_mmio(dev, base,
&fsl_dcu_regmap_config);
-- 
2.20.1.7.g153144c



Re: [PATCH v3 00/21] DEPT(Dependency Tracker)

2022-03-02 Thread Byungchul Park
On Wed, Mar 02, 2022 at 04:36:38AM +, Hyeonggon Yoo wrote:
> On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote:
> > I didn't want to bother you so I was planning to send the next spin
> > after making more progress. However, PATCH v2 reports too many false
> > positives because Dept tracked the bit_wait_table[] wrong way - I
> > apologize for that. So I decided to send PATCH v3 first before going
> > further for those who want to run Dept for now.
> > 
> > There might still be some false positives but not overwhelming.
> >
> 
> Hello Byungchul, I'm running DEPT v3 on my system
> and I see report below.
> 
> Looking at the kmemleak code and comment, I think
> kmemleak tried to avoid lockdep recursive warning
> but detected by DEPT?
> 
> ===
> DEPT: Circular dependency has been detected.
> 5.17.0-rc1+ #1 Tainted: GW
> ---
> summary
> ---
> *** AA DEADLOCK ***
> 
> context A
> [S] __raw_spin_lock_irqsave(&object->lock:0)
> [W] _raw_spin_lock_nested(&object->lock:0)
> [E] spin_unlock(&object->lock:0)
> 
> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable
> ---
> context A's detail
> ---
> context A
> [S] __raw_spin_lock_irqsave(&object->lock:0)
> [W] _raw_spin_lock_nested(&object->lock:0)
> [E] spin_unlock(&object->lock:0)
> 
> [S] __raw_spin_lock_irqsave(&object->lock:0):
> [] scan_gray_list+0x84/0x13c
> stacktrace:
>   dept_ecxt_enter+0x88/0xf4
>   _raw_spin_lock_irqsave+0xf0/0x1c4
>   scan_gray_list+0x84/0x13c
>   kmemleak_scan+0x2d8/0x54c
>   kmemleak_scan_thread+0xac/0xd4
>   kthread+0xd4/0xe4
>   ret_from_fork+0x10/0x20

[W]'s stack trace is missing. But I guess this issue is the same issue
as the one you reported following this one. We can discuss this issue
on the other report's thread.

Thanks,
Byungchul

> [E] spin_unlock(&object->lock:0):
> [] scan_block+0x60/0x128
> ---
> information that might be helpful
> ---
> CPU: 1 PID: 38 Comm: kmemleak Tainted: GW 5.17.0-rc1+ #1
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace.part.0+0x9c/0xc4
>  show_stack+0x14/0x28
>  dump_stack_lvl+0x9c/0xcc
>  dump_stack+0x14/0x2c
>  print_circle+0x2d4/0x438
>  cb_check_dl+0x44/0x70
>  bfs+0x60/0x168
>  add_dep+0x88/0x11c
>  add_wait+0x2d0/0x2dc
>  __dept_wait+0x8c/0xa4
>  dept_wait+0x6c/0x88
>  _raw_spin_lock_nested+0xa8/0x1b0
>  scan_block+0xb4/0x128
>  scan_gray_list+0xc4/0x13c
>  kmemleak_scan+0x2d8/0x54c
>  kmemleak_scan_thread+0xac/0xd4
>  kthread+0xd4/0xe4
>  ret_from_fork+0x10/0x20
> 
> > ---
> > 
> > Hi Linus and folks,
> > 
> > I've been developing a tool for detecting deadlock possibilities by
> > tracking wait/event rather than lock(?) acquisition order to try to
> > cover all synchronization mechanisms. It's done on v5.17-rc1 tag.
> > 
> > https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
> >
> [...]
> > Benefit:
> > 
> > 0. Works with all lock primitives.
> > 1. Works with wait_for_completion()/complete().
> > 2. Works with 'wait' on PG_locked.
> > 3. Works with 'wait' on PG_writeback.
> > 4. Works with swait/wakeup.
> > 5. Works with waitqueue.
> > 6. Multiple reports are allowed.
> > 7. Deduplication control on multiple reports.
> > 8. Withstand false positives thanks to 6.
> > 9. Easy to tag any wait/event.
> > 
> > Future work:
> > 
> > 0. To make it more stable.
> > 1. To separates Dept from Lockdep.
> > 2. To improves performance in terms of time and space.
> > 3. To use Dept as a dependency engine for Lockdep.
> > 4. To add any missing tags of wait/event in the kernel.
> > 5. To deduplicate stack trace.
> > 
> > How to interpret reports:
> > 
> > 1. E(event) in each context cannot be triggered because of the
> >W(wait) that cannot be woken.
> > 2. The stack trace helping find the problematic code is located
> >    in each context's detail.
> > 
> > Thanks,
> > Byungchul
> > 
> > ---
> > 
> > Changes from v2:
> > 
> > 1. Disable Dept on bit_wait_table[] in sched/wait_bit.c
> >reporting a lot of false positives, which is my fault.
> >Wait/event for bit_wait_table[] should've been tagged in a
> >higher layer for better work, which is a future work.
> >(feedback from Jan Kara)
> > 2. Disable Dept on crypto_larval's completion to prevent a false
> >positive.
> > 
> > Changes from v1:
> > 
> > 1. Fix coding style and typo. (feedback from Steven)
> > 2. Distinguish each work context from another in workqueue.
> > 3. Skip checking lock acquisition with nest_lock, whic

[PATCH -next] drm/amdgpu: clean up some inconsistent indenting

2022-03-02 Thread Yang Li
Eliminate the following smatch warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c:413 amdgpu_get_xgmi_hive()
warn: inconsistent indenting

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 91817a31f3e1..4ff6e06babca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -410,14 +410,14 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct 
amdgpu_device *adev)
 */
if (adev->reset_domain->type != XGMI_HIVE) {
hive->reset_domain = 
amdgpu_reset_create_reset_domain(XGMI_HIVE, "amdgpu-reset-hive");
-   if (!hive->reset_domain) {
-   dev_err(adev->dev, "XGMI: failed initializing 
reset domain for xgmi hive\n");
-   ret = -ENOMEM;
-   kobject_put(&hive->kobj);
-   kfree(hive);
-   hive = NULL;
-   goto pro_end;
-   }
+   if (!hive->reset_domain) {
+   dev_err(adev->dev, "XGMI: failed initializing reset 
domain for xgmi hive\n");
+   ret = -ENOMEM;
+   kobject_put(&hive->kobj);
+   kfree(hive);
+   hive = NULL;
+   goto pro_end;
+   }
} else {
amdgpu_reset_get_reset_domain(adev->reset_domain);
hive->reset_domain = adev->reset_domain;
-- 
2.20.1.7.g153144c



Re: Report 2 in ext4 and journal based on v5.17-rc1

2022-03-02 Thread Byungchul Park
On Mon, Feb 28, 2022 at 04:25:04PM -0500, Theodore Ts'o wrote:
> On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote:
> > > case 1. Code with an actual circular dependency, but not deadlock.
> > > 
> > >A circular dependency can be broken by a rescue wakeup source e.g.
> > >timeout. It's not a deadlock. If it's okay that the contexts
> > >participating in the circular dependency and others waiting for the
> > >events in the circle are stuck until it gets broken. Otherwise, say,
> > >if it's not meant, then it's anyway problematic.
> > > 
> > >1-1. What if we judge this code is problematic?
> > >1-2. What if we judge this code is good?
> > > 
> > > I've been wondering if the kernel guys esp. Linus considers code with
> > > any circular dependency is problematic or not, even if it won't lead to
> > > a deadlock, say, case 1. Even though I designed Dept based on what I
> > > believe is right, of course, I'm willing to change the design according
> > > to the majority opinion.
> > > 
> > > However, I would never allow case 1 if I were the owner of the kernel
> > > for better stability, even though the code works anyway okay for now.
> 
> Note, I used the example of the timeout as the most obvious way of
> explaining that a deadlock is not possible.  There is also the much
> more complex explanation which Jan was trying to give, which is what
> leads to the circular dependency.  It can happen that when trying to
> start a handle, if either (a) there is not enough space in the journal
> for new handles, or (b) the current transaction is so large that if we
> don't close the transaction and start a new one, we will end up
> running out of space in the future, and so in that case,
> start_this_handle() will block starting any more handles, and then
> wake up the commit thread.  The commit thread then waits for the
> currently running threads to complete, before it allows new handles to
> start, and then it will complete the commit.  In the case of (a) we
> then need to do a journal checkpoint, which is more work to release
> space in the journal, and only then, can we allow new handles to start.

Thank you for the full explanation of how journal things work.

> The bottom line is (a) it works, (b) there aren't significant delays,
> and for DEPT to complain that this is somehow wrong and we need to
> completely rearchitect perfectly working code because it doesn't
> confirm to DEPT's idea of what is "correct" is not acceptable.

Thanks to you and Jan Kara, I realized it's not a real dependency in the
consumer and producer scenario but again *ONLY IF* there is a rescue
wakeup source. Dept should track the rescue wakeup source instead in
that case.

I won't ask you to rearchitect the working code. The code looks sane.

Thanks a lot.

Thanks,
Byungchul

> > We have a queue of work to do Q protected by lock L. Consumer process has
> > code like:
> > 
> > while (1) {
> > lock L
> > prepare_to_wait(work_queued);
> > if (no work) {
> > unlock L
> > sleep
> > } else {
> > unlock L
> > do work
> > wake_up(work_done)
> > }
> > }
> > 
> > AFAIU Dept will create dependency here that 'wakeup work_done' is after
> > 'wait for work_queued'. Producer has code like:
> > 
> > while (1) {
> > lock L
> > prepare_to_wait(work_done)
> > if (too much work queued) {
> > unlock L
> > sleep
> > } else {
> > queue work
> > unlock L
> > wake_up(work_queued)
> > }
> > }
> > 
> > And Dept will create dependency here that 'wakeup work_queued' is after
> > 'wait for work_done'. And thus we have a trivial cycle in the dependencies
> > despite the code being perfectly valid and safe.
> 
> Cheers,
> 
>   - Ted


Re: [PATCH] video: fbdev: sm712fb: Fix crash in smtcfb_write()

2022-03-02 Thread Zheyu Ma
Hi,

On Thu, Mar 3, 2022 at 12:49 AM Helge Deller  wrote:
>
> On 3/2/22 15:33, Zheyu Ma wrote:
> > When the sm712fb driver writes three bytes to the framebuffer, the
> > driver will crash:
> >
> > BUG: unable to handle page fault for address: c90001ff
> > RIP: 0010:smtcfb_write+0x454/0x5b0
> > Call Trace:
> >  vfs_write+0x291/0xd60
> >  ? do_sys_openat2+0x27d/0x350
> >  ? __fget_light+0x54/0x340
> >  ksys_write+0xce/0x190
> >  do_syscall_64+0x43/0x90
> >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > Fix it by removing the open-coded endianness fixup-code.
> >
> > Signed-off-by: Zheyu Ma 
>
> Thanks... it's already in the fbdev git tree and queued up for v5.18...
> https://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev.git/commit/?h=for-next&id=bd771cf5c4254511cc4abb88f3dab3bd58bdf8e8

This patch fixes the crash in smtcfb_read(), but there is a similar
bug in smtcfb_write(), and I mocked up your patch and wrote a new patch
for it.
So we should fix two bugs with two patches.

Regards,
Zheyu Ma


Re: Report 2 in ext4 and journal based on v5.17-rc1

2022-03-02 Thread Byungchul Park
On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote:
> On Mon 28-02-22 18:28:26, Byungchul Park wrote:
> > case 1. Code with an actual circular dependency, but not deadlock.
> > 
> >A circular dependency can be broken by a rescue wakeup source e.g.
> >timeout. It's not a deadlock. If it's okay that the contexts
> >participating in the circular dependency and others waiting for the
> >events in the circle are stuck until it gets broken. Otherwise, say,
> >if it's not meant, then it's anyway problematic.
> > 
> >1-1. What if we judge this code is problematic?
> >1-2. What if we judge this code is good?
> > 
> > case 2. Code with an actual circular dependency, and deadlock.
> > 
> >There's no other wakeup source than those within the circular
> >dependency. Literally deadlock. It's problematic and critical.
> > 
> >2-1. What if we judge this code is problematic?
> >2-2. What if we judge this code is good?
> > 
> > case 3. Code with no actual circular dependency, and not deadlock.
> > 
> >Must be good.
> > 
> >3-1. What if we judge this code is problematic?
> >3-2. What if we judge this code is good?
> > 
> > ---
> > 
> > I call only 3-1 "false positive" circular dependency. And you call 1-1
> > and 3-1 "false positive" deadlock.
> > 
> > I've been wondering if the kernel guys esp. Linus considers code with
> > any circular dependency is problematic or not, even if it won't lead to
> > a deadlock, say, case 1. Even though I designed Dept based on what I
> > believe is right, of course, I'm willing to change the design according
> > to the majority opinion.
> > 
> > However, I would never allow case 1 if I were the owner of the kernel
> > for better stability, even though the code works anyway okay for now.
> 
> So yes, I call a report for the situation "There is circular dependency but
> deadlock is not possible." a false positive. And that is because in my
> opinion your definition of circular dependency includes schemes that are
> useful and used in the kernel.
> 
> Your example in case 1 is kind of borderline (I personally would consider
> that bug as well) but there are other more valid schemes with multiple
> wakeup sources like:
> 
> We have a queue of work to do Q protected by lock L. Consumer process has
> code like:
> 
> while (1) {
>   lock L
>   prepare_to_wait(work_queued);
>   if (no work) {
>   unlock L
>   sleep
>   } else {
>   unlock L
>   do work
>   wake_up(work_done)
>   }
> }
> 
> AFAIU Dept will create dependency here that 'wakeup work_done' is after
> 'wait for work_queued'. Producer has code like:

First of all, thank you for this good example.

> while (1) {
>   lock L
>   prepare_to_wait(work_done)
>   if (too much work queued) {
>   unlock L
>   sleep
>   } else {
>   queue work
>   unlock L
>   wake_up(work_queued)
>   }
> }
> 
> And Dept will create dependency here that 'wakeup work_queued' is after
> 'wait for work_done'. And thus we have a trivial cycle in the dependencies
> despite the code being perfectly valid and safe.

Unfortunately, it's neither perfect nor safe without another wakeup
source - a rescue wakeup source.

   consumer producer

lock L
(too much work queued == true)
unlock L
--- preempted
   lock L
   unlock L
   do work
   lock L
   unlock L
   do work
   ...
   (no work == true)
   sleep
--- scheduled in
sleep

This code leads to a deadlock without another wakeup source, that is, it is not safe.

But yes, I also think this code should be allowed if it anyway runs
alongside another wakeup source. For that case, Dept should track the
rescue wakeup source instead, whose absence is what leads to an actual
deadlock.

I will correct the code to make Dept track the rescue wakeup source
whenever it finds this case.
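
For reference, the kind of rescue wakeup source being discussed here is
e.g. a bounded sleep on the waiting side. A minimal sketch only, with
made-up names (work_queued_wq, work_available()), not code from any real
subsystem:

	/*
	 * Even if the producer's wakeup is lost or never sent, the
	 * timeout guarantees this waiter eventually runs again and
	 * re-checks the condition, so the wait alone cannot deadlock.
	 */
	wait_event_timeout(work_queued_wq, work_available(), HZ);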

Lastly, just for your information, I need to explain how Dept works a
little more so that you do not misunderstand Dept.

Assuming the consumer and producer guarantee not to lead to a deadlock,
as in the following, Dept won't report it as a problem:

   consumer producer

sleep
   wakeup work_done
queue work
   sleep
wakeup work_queued
   do work
sleep
   wakeup work_done
queue work
   sleep
wakeup work_queued
   do work
sleep
   ...  ...

Dept does not consider all waits preceding an event but only waits that
might lead to a deadlock. In this case, Dept works with each region
independently.

   consumer pr

Re: [PATCH 7/8] drm/vmwgfx: Implement MSI/MSI-X support for IRQs

2022-03-02 Thread kernel test robot
Hi Zack,

I love your patch! Perhaps something to improve:

[auto build test WARNING on drm/drm-next]
[also build test WARNING on drm-intel/for-linux-next drm-tip/drm-tip 
drm-exynos/exynos-drm-next tegra-drm/drm/tegra/for-next v5.17-rc6 next-20220302]
[cannot apply to airlied/drm-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Zack-Rusin/drm-vmwgfx-3D-on-arm64-and-large-cursor-support/20220302-232641
base:   git://anongit.freedesktop.org/drm/drm drm-next
config: i386-randconfig-a004 
(https://download.01.org/0day-ci/archive/20220303/202203030802.itiytjh1-...@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 
d271fc04d5b97b12e6b797c6067d3c96a8d7470e)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/63d27e1cb11562966a6f07bf0e667b0a54dc2412
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Zack-Rusin/drm-vmwgfx-3D-on-arm64-and-large-cursor-support/20220302-232641
git checkout 63d27e1cb11562966a6f07bf0e667b0a54dc2412
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
O=build_dir ARCH=i386 SHELL=/bin/bash drivers/gpu/drm/vmwgfx/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/vmwgfx/vmwgfx_irq.c:340:6: warning: variable 'ret' is used 
>> uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
   if (nvec <= 0) {
   ^
   drivers/gpu/drm/vmwgfx/vmwgfx_irq.c:369:9: note: uninitialized use occurs 
here
   return ret;
  ^~~
   drivers/gpu/drm/vmwgfx/vmwgfx_irq.c:340:2: note: remove the 'if' if its 
condition is always false
   if (nvec <= 0) {
   ^~~~
   drivers/gpu/drm/vmwgfx/vmwgfx_irq.c:330:9: note: initialize the variable 
'ret' to silence this warning
   int ret;
  ^
   = 0
   1 warning generated.


vim +340 drivers/gpu/drm/vmwgfx/vmwgfx_irq.c

   319  
   320  /**
   321   * vmw_irq_install - Install the irq handlers
   322   *
   323   * @dev_priv:  Pointer to the vmw_private device.
   324   * Return:  Zero if successful. Negative number otherwise.
   325   */
   326  int vmw_irq_install(struct vmw_private *dev_priv)
   327  {
   328  struct pci_dev *pdev = to_pci_dev(dev_priv->drm.dev);
   329  struct drm_device *dev = &dev_priv->drm;
   330  int ret;
   331  int nvec;
   332  int i = 0;
   333  
   334  BUILD_BUG_ON((SVGA_IRQFLAG_MAX >> VMWGFX_MAX_NUM_IRQS) != 1);
   335  BUG_ON(VMWGFX_MAX_NUM_IRQS != 
get_count_order(SVGA_IRQFLAG_MAX));
   336  
   337  nvec = pci_alloc_irq_vectors(pdev, 1, VMWGFX_MAX_NUM_IRQS,
   338   PCI_IRQ_ALL_TYPES);
   339  
 > 340  if (nvec <= 0) {
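
For what it's worth, a minimal illustrative way to address this - not
necessarily the fix the author will choose - is to propagate the failure
from that branch explicitly, e.g.:

	if (nvec <= 0)
		return nvec ? nvec : -EINVAL;

so that the final "return ret" can never be reached with ret
uninitialized.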

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH v3 00/21] DEPT(Dependency Tracker)

2022-03-02 Thread Byungchul Park
On Wed, Mar 02, 2022 at 04:53:41AM +, Hyeonggon Yoo wrote:
> On Wed, Mar 02, 2022 at 04:36:38AM +, Hyeonggon Yoo wrote:
> > On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote:
> > > I didn't want to bother you so I was planning to send the next spin
> > > after making more progress. However, PATCH v2 reports too many false
> > > positives because Dept tracked the bit_wait_table[] wrong way - I
> > > apologize for that. So I decided to send PATCH v3 first before going
> > > further for those who want to run Dept for now.
> > > 
> > > There might still be some false positives but not overwhelming.
> > >
> > 
> > Hello Byungchul, I'm running DEPT v3 on my system
> > and I see report below.
> > 
> > Looking at the kmemleak code and comment, I think
> > kmemleak tried to avoid lockdep recursive warning
> > but detected by DEPT?
> >
> 
> Forgot to include another warning caused by DEPT.
> 
> And comment below might be useful for debugging:
> 
> in kmemleak.c:
>   43  * Locks and mutexes are acquired/nested in the following order:
>   44  *
>   45  *   scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock 
> (SINGLE_DEPTH_NESTING)
>   46  *
>   47  * No kmemleak_lock and object->lock nesting is allowed outside 
> scan_mutex
>   48  * regions.
> 
> ===
> DEPT: Circular dependency has been detected.
> 5.17.0-rc1+ #1 Tainted: GW
> ---
> summary
> ---
> *** DEADLOCK ***
> 
> context A
> [S] __raw_spin_lock_irqsave(&object->lock:0)
> [W] __raw_spin_lock_irqsave(kmemleak_lock:0)
> [E] spin_unlock(&object->lock:0)
> 
> context B
> [S] __raw_spin_lock_irqsave(kmemleak_lock:0)
> [W] _raw_spin_lock_nested(&object->lock:0)
> [E] spin_unlock(kmemleak_lock:0)
> 
> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable

Hi Hyeonggon,

Dept also allows the following scenario when a user guarantees that
each lock instance at a given depth is different from the instances at
the other depths:

   lock A0 with depth
   lock A1 with depth + 1
   lock A2 with depth + 2
   lock A3 with depth + 3
   (and so on)
   ..
   unlock A3
   unlock A2
   unlock A1
   unlock A0

However, Dept does not allow the following scenario where another lock
class cuts in the dependency chain:

   lock A0 with depth
   lock B
   lock A1 with depth + 1
   lock A2 with depth + 2
   lock A3 with depth + 3
   (and so on)
   ..
   unlock A3
   unlock A2
   unlock A1
   unlock B
   unlock A0

This scenario is clearly problematic. What do you think is going to
happen with another context running the following?

   lock A1 with depth
   lock B
   lock A2 with depth + 1
   lock A3 with depth + 2
   (and so on)
   ..
   unlock A3
   unlock A2
   unlock B
   unlock A1

It's a deadlock. That's why Dept reports this case as a problem. Or am I
missing something?
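
For reference, the nesting annotation that the scenarios above map onto
looks roughly like this. This is an illustrative sketch only, not the
kmemleak code, and object_a/object_b are made-up names:

	unsigned long flags;

	spin_lock_irqsave(&object_a->lock, flags);		  /* depth 0 */
	spin_lock_nested(&object_b->lock, SINGLE_DEPTH_NESTING); /* depth 1 */
	/* ... access both objects ... */
	spin_unlock(&object_b->lock);
	spin_unlock_irqrestore(&object_a->lock, flags);

The annotation only tells the checker that the two instances are known
to be distinct; it is still the caller's job to guarantee a globally
consistent locking order, which is exactly what the scenarios above are
about.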

Thanks,
Byungchul

> ---
> context A's detail
> ---
> context A
> [S] __raw_spin_lock_irqsave(&object->lock:0)
> [W] __raw_spin_lock_irqsave(kmemleak_lock:0)
> [E] spin_unlock(&object->lock:0)
> 
> [S] __raw_spin_lock_irqsave(&object->lock:0):
> [] scan_gray_list+0x84/0x13c
> stacktrace:
>   dept_ecxt_enter+0x88/0xf4
>   _raw_spin_lock_irqsave+0xf0/0x1c4
>   scan_gray_list+0x84/0x13c
>   kmemleak_scan+0x2d8/0x54c
>   kmemleak_scan_thread+0xac/0xd4
>   kthread+0xd4/0xe4
>   ret_from_fork+0x10/0x20
> 
> [W] __raw_spin_lock_irqsave(kmemleak_lock:0):
> [] scan_block+0x3c/0x128
> stacktrace:
>   __dept_wait+0x8c/0xa4
>   dept_wait+0x6c/0x88
>   _raw_spin_lock_irqsave+0xb8/0x1c4
>   scan_block+0x3c/0x128
>   scan_gray_list+0xc4/0x13c
>   kmemleak_scan+0x2d8/0x54c
>   kmemleak_scan_thread+0xac/0xd4
>   kthread+0xd4/0xe4
>   ret_from_fork+0x10/0x20
> 
> [E] spin_unlock(&object->lock:0):
> [] scan_block+0x60/0x128
> 
> ---
> context B's detail
> ---
> context B
> [S] __raw_spin_lock_irqsave(kmemleak_lock:0)
> [W] _raw_spin_lock_nested(&object->lock:0)
> [E] spin_unlock(kmemleak_lock:0)
> 
> [S] __raw_spin_lock_irqsave(kmemleak_lock:0):
> [] scan_block+0x3c/0x128
> stacktrace:
>   dept_ecxt_enter+0x88/0xf4
>   _raw_spin_lock_irqsave+0xf0/0x1c4
>   scan_block+0x3c/0x128
>   kmemleak_scan+0x19c/0x54c
>   kmemleak_scan_thread+0xac/0xd4
>   kthread+0xd4/0xe4
>   ret_from_fork+0x10/0x20
> 
> [W] _raw_spin_lock_nested(&object->lock:0):
> [] scan_block+0xb4/0x128
> stacktrace:
>   dept_wait+0x74/0x88
>   _raw_spin_lock_nested+0xa8/0x1b0
>   scan_block+0xb4/0x128
>   kmemleak_scan+0x19c/0x54c
>   kmemleak_scan_thread+0xac/0xd4
>   kthread+0xd4/0xe4
>   ret_from_fork+0x10/0x

[PATCH] drm/selftest: plane_helper: Put test structures in static storage

2022-03-02 Thread Nathan Chancellor
Clang warns on certain 32-bit architectures:

  drivers/gpu/drm/selftests/test-drm_plane_helper.c:76:5: warning: stack frame 
size (1064) exceeds limit (1024) in 'igt_check_plane_state' 
[-Wframe-larger-than]
  int igt_check_plane_state(void *ignored)
  ^
  1 warning generated.

The structures in igt_check_plane_state() total 1008 bytes, so any small
amount of inlining will cause the stack frame to exceed the 32-bit limit
of 1024, triggering the warning.

Move these structures to static storage, which dramatically reduces the
amount of stack space in igt_check_plane_state(). There is no testing
impact, as igt_check_plane_state() is only called once in the driver.

Fixes: 943e6a8beeac ("mock a drm_plane in igt_check_plane_state to make the 
test more robust")
Link: https://github.com/ClangBuiltLinux/linux/issues/1600
Reported-by: kernel test robot 
Suggested-by: Nick Desaulniers 
Signed-off-by: Nathan Chancellor 
---
 drivers/gpu/drm/selftests/test-drm_plane_helper.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/selftests/test-drm_plane_helper.c 
b/drivers/gpu/drm/selftests/test-drm_plane_helper.c
index ceebeede55ea..b61273e9c403 100644
--- a/drivers/gpu/drm/selftests/test-drm_plane_helper.c
+++ b/drivers/gpu/drm/selftests/test-drm_plane_helper.c
@@ -77,7 +77,7 @@ int igt_check_plane_state(void *ignored)
 {
int ret;
 
-   const struct drm_crtc_state crtc_state = {
+   static const struct drm_crtc_state crtc_state = {
.crtc = ZERO_SIZE_PTR,
.enable = true,
.active = true,
@@ -87,14 +87,14 @@ int igt_check_plane_state(void *ignored)
DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_NVSYNC)
},
};
-   struct drm_plane plane = {
+   static struct drm_plane plane = {
.dev = NULL
};
-   struct drm_framebuffer fb = {
+   static struct drm_framebuffer fb = {
.width = 2048,
.height = 2048
};
-   struct drm_plane_state plane_state = {
+   static struct drm_plane_state plane_state = {
.plane = &plane,
.crtc = ZERO_SIZE_PTR,
.fb = &fb,

base-commit: 9ae2ac4d31a85ce59cc560d514a31b95f4ace154
-- 
2.35.1



[Bug 215652] kernel 5.17-rc fail to load radeon DRM "modprobe: ERROR: could not insert 'radeon': Unknown symbol in module, or unknown parameter (see dmesg)"

2022-03-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215652

--- Comment #2 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 300522
  --> https://bugzilla.kernel.org/attachment.cgi?id=300522&action=edit
kernel .config (kernel 5.17-rc5, CONFIG_DRM_RADEON=m, Talos II)


[Bug 215652] kernel 5.17-rc fail to load radeon DRM "modprobe: ERROR: could not insert 'radeon': Unknown symbol in module, or unknown parameter (see dmesg)"

2022-03-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215652

--- Comment #1 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 300521
  --> https://bugzilla.kernel.org/attachment.cgi?id=300521&action=edit
kernel dmesg (kernel 5.17-rc5, CONFIG_DRM_RADEON=y, Talos II)


[Bug 215652] New: kernel 5.17-rc fail to load radeon DRM "modprobe: ERROR: could not insert 'radeon': Unknown symbol in module, or unknown parameter (see dmesg)"

2022-03-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215652

Bug ID: 215652
   Summary: kernel 5.17-rc fail to load radeon DRM "modprobe:
ERROR: could not insert 'radeon': Unknown symbol in
module, or unknown parameter (see dmesg)"
   Product: Drivers
   Version: 2.5
Kernel Version: 5.17-rc5
  Hardware: PPC-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Video(DRI - non Intel)
  Assignee: drivers_video-...@kernel-bugs.osdl.org
  Reporter: erhar...@mailbox.org
CC: platform_ppc...@kernel-bugs.osdl.org
Regression: No

Created attachment 300520
  --> https://bugzilla.kernel.org/attachment.cgi?id=300520&action=edit
kernel dmesg (kernel 5.17-rc5, CONFIG_DRM_RADEON=m, Talos II)

Kernel 5.17-rc5 has problems loading the radeon KMS module on my Talos II:

 # modprobe -v radeon
insmod /lib/modules/5.17.0-rc5-P9+/kernel/drivers/gpu/drm/drm_kms_helper.ko 
modprobe: ERROR: could not insert 'radeon': Unknown symbol in module, or
unknown parameter (see dmesg)

dmesg show no further output then.

When I build a kernel with CONFIG_DRM_RADEON=y I get this in dmesg:
[...]
ATOM BIOS: X1550
[drm] Generation 2 PCI interface, using max accessible memory
radeon :01:00.0: VRAM: 512M 0x - 0x1FFF (512M
used)
radeon :01:00.0: GTT: 512M 0x2000 - 0x3FFF
[drm] Detected VRAM RAM=512M, BAR=256M
[drm] RAM width 128bits DDR
radeon :01:00.0: dma_iommu_get_required_mask: returning bypass mask
0xfff
[drm] radeon: 512M of VRAM memory ready
[drm] radeon: 512M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] radeon: 1 quad pipes, 1 z pipes initialized.
[drm] PCIE GART of 512M enabled (table at 0x0004).
radeon :01:00.0: WB enabled
radeon :01:00.0: fence driver on ring 0 use gpu addr 0x2000
radeon :01:00.0: radeon: MSI limited to 32-bit
[drm] radeon: irq initialized.
[drm] Loading R500 Microcode
radeon :01:00.0: Direct firmware load for radeon/R520_cp.bin failed with
error -2
radeon_cp: Failed to load firmware "radeon/R520_cp.bin"
[drm:.r100_cp_init] *ERROR* Failed to load firmware!
radeon :01:00.0: failed initializing CP (-2).
radeon :01:00.0: Disabling GPU acceleration

So it complains about not finding the relevant firmware. But the firmware is
located where it should be, so everything should be ok. This used to work on
kernels 5.16.x and before.

 # ls -al /lib/firmware/radeon/R520_cp.bin
-rw-r--r-- 1 root root 2048  2. Mär 20:43 /lib/firmware/radeon/R520_cp.bin


Some data about the machine:
 # lspci 
:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
RV516 [Radeon X1300/X1550 Series]
:01:00.1 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] RV516
[Radeon X1300/X1550 Series] (Secondary)
0001:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0001:01:00.0 Non-Volatile memory controller: Phison Electronics Corporation
Device 5008 (rev 01)
0002:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI
Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0004:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme
BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme
BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev
04)
0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics
Family (rev 41)
0030:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0031:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0032:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0033:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)

 # lspci -s :01:00.0 -vv
:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
RV516 [Radeon X1300/X1550 Series] (prog-if 00 [VGA controller])
Subsystem: PC Partner Limited / Sapphire Technology RV516 [Radeon
X1300/X1550 Series]
Device tree node:
/sys/firmware/devicetree/base/pciex@600c3c000/pci@0/vga@0
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-  [disabled]
Expansion ROM at 600c0 [disabled] [size=128K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
  

Re: [PATCH v16 4/4] drm/bridge: dw-hdmi: fix bus formats negotiation for 8 bit modes

2022-03-02 Thread H. Nikolaus Schaller
Hi Neil,

> Am 02.03.2022 um 15:34 schrieb Neil Armstrong :
> 
> Hi,
> 
>> (cross-checked: RGB mode still works if I force hdmi->sink_is_hdmi = false)
> 
> I don't understand what's wrong, can you try to make the logic select 
> MEDIA_BUS_FMT_YUV8_1X24 instead of DRM_COLOR_FORMAT_YCBCR422 ?

I have forced hdmi->sink_is_hdmi = false and replaced

/* Default 8bit RGB fallback */
-   output_fmts[i++] = MEDIA_BUS_FMT_RGB888_1X24;
+   output_fmts[i++] = MEDIA_BUS_FMT_YUV8_1X24;

And then screen remains black. MEDIA_BUS_FMT_RGB888_1X24 works.
(MEDIA_BUS_FMT_VUY8_1X24 doesn't work either).

So this indicates that YUV conversion is not working properly. Maybe missing 
some special
setup.

What I still have to test is whether it works on a different monitor. It may
be that this specific panel (a 7 inch Waveshare touch with HDMI input) is buggy
and reports YUV capabilities but does not handle them...

On the other hand this panel works on RasPi and OMAP5 (where I admit I do not 
know in
which mode).

> If your CSC is broken, we'll need to disable it on your platform.

Indeed.

So it seems as if we need a mechanism to override
dw_hdmi_bridge_atomic_get_output_bus_fmts()
in our ingenic-dw-hdmi platform specialization [1] to always return
MEDIA_BUS_FMT_RGB888_1X24.

Or alternatively set sink_is_hdmi = false there (unfortunately there is no
direct access to
struct dw_hdmi in the specialization drivers).

Is this already possible or how can it be done?

BR and thanks,
Nikolaus

[1]: 
https://lore.kernel.org/all/24a27226a22adf5f5573f013e5d7d89b0ec73664.1645895582.git@goldelico.com/

Re: [next] arm64: db410c: Internal error: Oops: 96000004 - msm_gpu_pm_suspend

2022-03-02 Thread Rob Clark
Should be fixed by:

https://patchwork.freedesktop.org/patch/475238/?series=100448&rev=1

t-b's welcome

On Wed, Mar 2, 2022 at 12:35 PM Naresh Kamboju
 wrote:
>
> [Please ignore this email if it is already reported]
>
> Linux next-20220302 running on Qcom db410c the following kernel crash
> reported [1].
>
> metadata:
>   git_ref: master
>   git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
>   git_sha: adaedcf826dccf01b69d9a1f1997c9446c6b2c54
>   git_describe: next-20220302
>   kernel-config: 
> https://builds.tuxbuild.com/25pJv2XjzFav5peWxwfhaU3LFEN/config
>
>
> Kernel crash:
>
>  Failed to start Entropy Daemon based on the HAVEGE algorithm
> [   12.104662] Unable to handle kernel NULL pointer dereference at
> virtual address 0010
> [   12.121151]   ESR = 0x9604
> [   12.121211]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   12.123137]   SET = 0, FnV = 0
> [   12.128687]   EA = 0, S1PTW = 0
> [   12.131464]   FSC = 0x04: level 0 translation fault
> [   12.134572] Data abort info:
> [   12.139457]   ISV = 0, ISS = 0x0004
> [   12.142566]   CM = 0, WnR = 0
> [   12.146165] user pgtable: 4k pages, 48-bit VAs, pgdp=8235d000
> [   12.149360] [0010] pgd=, p4d=
> [   12.156339] Internal error: Oops: 9604 [#1] PREEMPT SMP
> [   12.162370] Modules linked in: videobuf2_dma_contig adv7511(+)
> crct10dif_ce qcom_wcnss_pil cec qrtr qcom_q6v5_mss qcom_camss
> snd_soc_lpass_apq8016 qcom_pil_info snd_soc_lpass_cpu rtc_pm8xxx
> videobuf2_dma_sg qcom_spmi_vadc snd_soc_msm8916_analog qcom_q6v5
> snd_soc_msm8916_digital snd_soc_apq8016_sbc snd_soc_lpass_platform
> qcom_pon v4l2_fwnode snd_soc_qcom_common qcom_spmi_temp_alarm
> qcom_sysmon qcom_vadc_common venus_core msm qcom_common v4l2_async
> qcom_glink_smem qmi_helpers v4l2_mem2mem videobuf2_memops mdt_loader
> qnoc_msm8916 gpu_sched icc_smd_rpm display_connector drm_dp_helper
> videobuf2_v4l2 drm_kms_helper qcom_stats videobuf2_common qcom_rng
> i2c_qcom_cci rpmsg_char drm socinfo rmtfs_mem fuse
> [   12.207393] CPU: 0 PID: 66 Comm: kworker/0:4 Not tainted
> 5.17.0-rc6-next-20220302 #1
> [   12.207407] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> [   12.207415] Workqueue: pm pm_runtime_work
> [   12.243952] pstate: 6005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   12.247862] pc : hrtimer_active+0x14/0x80
> [   12.254628] lr : hrtimer_cancel+0x28/0x70
> [   12.258795] sp : 8b423b70
> [   12.262786] x29: 8b423b70 x28:  x27: 
> 
> [   12.266092] x26: 8ad5d2e0 x25: 0002d138d917 x24: 
> 0d8dbb80
> [   12.273210] x23: 0326c010 x22: 0f6b2020 x21: 
> 0f6b2000
> [   12.280328] x20: 0f6b20f8 x19: 0f6b2318 x18: 
> 
> [   12.287447] x17: 800035bf3000 x16: 88004000 x15: 
> 4000
> [   12.294564] x14:  x13:  x12: 
> 8a8b7000
> [   12.301682] x11: 087facb61180 x10: 0bc0 x9 : 
> 881d3a78
> [   12.308800] x8 : 03c68000 x7 : 0018 x6 : 
> 1483ced5
> [   12.315918] x5 : 00ff x4 :  x3 : 
> 
> [   12.323036] x2 :  x1 :  x0 : 
> 0f6b2318
> [   12.330156] Call trace:
> [   12.337263]  hrtimer_active+0x14/0x80
> [   12.339524]  msm_devfreq_suspend+0x30/0x60 [msm]
> [   12.343348]  msm_gpu_pm_suspend+0x44/0x144 [msm]
> [   12.348035]  adreno_suspend+0x6c/0x174 [msm]
> [   12.352634]  pm_generic_runtime_suspend+0x38/0x50
> [   12.356885]  genpd_runtime_suspend+0xb4/0x314
> [   12.361487]  __rpm_callback+0x50/0x180
> [   12.365824]  rpm_callback+0x74/0x80
> [   12.369470]  rpm_suspend+0x110/0x634
> [   12.372856]  pm_runtime_work+0xd0/0xd4
> [   12.376676]  process_one_work+0x1dc/0x450
> [   12.380235]  worker_thread+0x154/0x450
> [   12.384314]  kthread+0x100/0x110
> [   12.387959]  ret_from_fork+0x10/0x20
> [   12.391351] Code: aa1e03e9 d503201f d503233f f9401802 (b9401041)
> [   12.394913] ---[ end trace  ]---
>
> Reported-by: Linux Kernel Functional Testing 
>
> --
> Linaro LKFT
> https://lkft.linaro.org
> [1] https://lkft.validation.linaro.org/scheduler/job/4643232#L2396


Re: [PATCH v2 2/3] drm/i915: Remove the vma refcount

2022-03-02 Thread Niranjana Vishwanathapura

On Wed, Mar 02, 2022 at 11:21:59AM +0100, Thomas Hellström wrote:

Now that i915_vma_parked() is taking the object lock on vma destruction,
and the only user of the vma refcount, i915_gem_object_unbind()
also takes the object lock, remove the vma refcount.

Signed-off-by: Thomas Hellström 
---
drivers/gpu/drm/i915/i915_gem.c   | 17 +
drivers/gpu/drm/i915/i915_vma.c   | 14 +++---
drivers/gpu/drm/i915/i915_vma.h   | 14 --
drivers/gpu/drm/i915/i915_vma_types.h |  1 -
4 files changed, 16 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dd84ebabb50f..c26110abcc0b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -151,14 +151,25 @@ int i915_gem_object_unbind(struct drm_i915_gem_object 
*obj,
break;
}

+   /*
+* Requiring the vm destructor to take the object lock
+* before destroying a vma would help us eliminate the
+* i915_vm_tryget() here, AND thus also the barrier stuff
+* at the end. That's an easy fix, but sleeping locks in
+* a kthread should generally be avoided.
+*/
ret = -EAGAIN;
if (!i915_vm_tryget(vma->vm))
break;

-   /* Prevent vma being freed by i915_vma_parked as we unbind */
-   vma = __i915_vma_get(vma);
spin_unlock(&obj->vma.lock);

+   /*
+* Since i915_vma_parked() takes the object lock
+* before vma destruction, it won't race us here,
+* and destroy the vma from under us.
+*/
+
if (vma) {
bool vm_trylock = !!(flags & 
I915_GEM_OBJECT_UNBIND_VM_TRYLOCK);
ret = -EBUSY;
@@ -180,8 +191,6 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj,
ret = i915_vma_unbind(vma);
}
}
-
-   __i915_vma_put(vma);
}

i915_vm_put(vma->vm);


One issue still left in i915_gem_object_unbind is that it temporarily removes
vmas from the obj->vma.list and adds them back later, as the vma needs to be
unbound outside the obj->vma.lock spinlock.
This is an issue for other places where we iterate
over the obj->vma.list. i915_debugfs_describe_obj is one such case (upcoming
vm_bind will be another) that iterates over this list.
What is the plan here? Do we need to take object lock while iterating over the
list?

But this is just something I noticed, and it is not related to this patch.
This patch looks good to me.
Reviewed-by: Niranjana Vishwanathapura 



diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 91538bc38110..6fd25b39748f 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -122,7 +122,6 @@ vma_create(struct drm_i915_gem_object *obj,
if (vma == NULL)
return ERR_PTR(-ENOMEM);

-   kref_init(&vma->ref);
vma->ops = &vm->vma_ops;
vma->obj = obj;
vma->size = obj->base.size;
@@ -1628,15 +1627,6 @@ void i915_vma_reopen(struct i915_vma *vma)
__i915_vma_remove_closed(vma);
}

-void i915_vma_release(struct kref *ref)
-{
-   struct i915_vma *vma = container_of(ref, typeof(*vma), ref);
-
-   i915_active_fini(&vma->active);
-   GEM_WARN_ON(vma->resource);
-   i915_vma_free(vma);
-}
-
static void force_unbind(struct i915_vma *vma)
{
if (!drm_mm_node_allocated(&vma->node))
@@ -1665,7 +1655,9 @@ static void release_references(struct i915_vma *vma, bool 
vm_ddestroy)
if (vm_ddestroy)
i915_vm_resv_put(vma->vm);

-   __i915_vma_put(vma);
+   i915_active_fini(&vma->active);
+   GEM_WARN_ON(vma->resource);
+   i915_vma_free(vma);
}

/**
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 67ae7341c7e0..6034991d89fe 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -222,20 +222,6 @@ void i915_vma_unlink_ctx(struct i915_vma *vma);
void i915_vma_close(struct i915_vma *vma);
void i915_vma_reopen(struct i915_vma *vma);

-static inline struct i915_vma *__i915_vma_get(struct i915_vma *vma)
-{
-   if (kref_get_unless_zero(&vma->ref))
-   return vma;
-
-   return NULL;
-}
-
-void i915_vma_release(struct kref *ref);
-static inline void __i915_vma_put(struct i915_vma *vma)
-{
-   kref_put(&vma->ref, i915_vma_release);
-}
-
void i915_vma_destroy_locked(struct i915_vma *vma);
void i915_vma_destroy(struct i915_vma *vma);

diff --git a/drivers/gpu/drm/i915/i915_vma_types.h 
b/drivers/gpu/drm/i915/i915_vma_types.h
index eac36be184e5..be6e028c3b57 100644
--- a/drivers/gpu/drm/i915/i915_vma_types.h
+++ b/d

Re: [Intel-gfx] [PATCH] drm/i915: avoid concurrent writes to aux_inv

2022-03-02 Thread Chris Wilson
Quoting fei.y...@intel.com (2022-03-02 18:26:57)
> From: Fei Yang 
> 
> GPU hangs have been observed when multiple engines write to the
> same aux_inv register at the same time. To avoid this each engine
> should only invalidate its own auxiliary table. The function
> gen12_emit_flush_xcs() currently invalidate the auxiliary table for
> all engines because the rq->engine is not necessarily the engine
> eventually carrying out the request, and potentially the engine
> could even be a virtual one (with engine->instance being -1).
> With this patch, auxiliary table invalidation is done only for the
> engine executing the request. And the mmio address for the aux_inv
> register is set after the engine instance becomes certain.
> 
> Signed-off-by: Chris Wilson 
> Signed-off-by: Fei Yang 
> ---
>  drivers/gpu/drm/i915/gt/gen8_engine_cs.c  | 41 ---
>  .../drm/i915/gt/intel_execlists_submission.c  | 38 +
>  drivers/gpu/drm/i915/i915_request.h   |  2 +
>  3 files changed, 47 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> index b1b9c3fd7bf9..af62e2bc2c9b 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
> @@ -165,30 +165,6 @@ static u32 preparser_disable(bool state)
> return MI_ARB_CHECK | 1 << 8 | state;
>  }
>  
> -static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine)
> -{
> -   static const i915_reg_t vd[] = {
> -   GEN12_VD0_AUX_NV,
> -   GEN12_VD1_AUX_NV,
> -   GEN12_VD2_AUX_NV,
> -   GEN12_VD3_AUX_NV,
> -   };
> -
> -   static const i915_reg_t ve[] = {
> -   GEN12_VE0_AUX_NV,
> -   GEN12_VE1_AUX_NV,
> -   };
> -
> -   if (engine->class == VIDEO_DECODE_CLASS)
> -   return vd[engine->instance];
> -
> -   if (engine->class == VIDEO_ENHANCEMENT_CLASS)
> -   return ve[engine->instance];
> -
> -   GEM_BUG_ON("unknown aux_inv reg\n");
> -   return INVALID_MMIO_REG;
> -}
> -
>  static u32 *gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs)
>  {
> *cs++ = MI_LOAD_REGISTER_IMM(1);
> @@ -288,7 +264,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
> mode)
> if (mode & EMIT_INVALIDATE)
> aux_inv = rq->engine->mask & ~BIT(BCS0);
> if (aux_inv)
> -   cmd += 2 * hweight32(aux_inv) + 2;
> +   cmd += 4;
>  
> cs = intel_ring_begin(rq, cmd);
> if (IS_ERR(cs))
> @@ -319,16 +295,13 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
> mode)
> *cs++ = 0; /* value */
>  
> if (aux_inv) { /* hsdes: 1809175790 */
> -   struct intel_engine_cs *engine;
> -   unsigned int tmp;
> -
> -   *cs++ = MI_LOAD_REGISTER_IMM(hweight32(aux_inv));
> -   for_each_engine_masked(engine, rq->engine->gt, aux_inv, tmp) {
> -   *cs++ = i915_mmio_reg_offset(aux_inv_reg(engine));
> -   *cs++ = AUX_INV;
> -   }
> +   *cs++ = MI_LOAD_REGISTER_IMM(1);
> +   rq->vd_ve_aux_inv = cs;
> +   *cs++ = 0; /* address to be set at submission to HW */
> +   *cs++ = AUX_INV;
> *cs++ = MI_NOOP;
> -   }
> +   } else
> +   rq->vd_ve_aux_inv = NULL;
>  
> if (mode & EMIT_INVALIDATE)
> *cs++ = preparser_disable(false);
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 1c602d4ae297..a018de6dcac5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -1258,6 +1258,34 @@ static bool completed(const struct i915_request *rq)
> return __i915_request_is_complete(rq);
>  }
>  
> +static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine)
> +{
> +   static const i915_reg_t vd[] = {
> +   GEN12_VD0_AUX_NV,
> +   GEN12_VD1_AUX_NV,
> +   GEN12_VD2_AUX_NV,
> +   GEN12_VD3_AUX_NV,
> +   };
> +
> +   static const i915_reg_t ve[] = {
> +   GEN12_VE0_AUX_NV,
> +   GEN12_VE1_AUX_NV,
> +   };
> +
> +   if (engine->class == VIDEO_DECODE_CLASS) {
> +   GEM_BUG_ON(engine->instance >= ARRAY_SIZE(vd));
> +   return vd[engine->instance];
> +   }
> +
> +   if (engine->class == VIDEO_ENHANCEMENT_CLASS) {
> +   GEM_BUG_ON(engine->instance >= ARRAY_SIZE(ve));
> +   return ve[engine->instance];
> +   }
> +
> +   GEM_BUG_ON("unknown aux_inv reg\n");
> +   return INVALID_MMIO_REG;
> +}
> +
>  static void execlists_dequeue(struct intel_engine_cs *engine)
>  {
> struct intel_engine_execlists * const execlists = &

Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-02 Thread Kees Cook
On Wed, Mar 02, 2022 at 12:18:45PM -0800, Linus Torvalds wrote:
> On Wed, Mar 2, 2022 at 12:07 PM Kees Cook  wrote:
> >
> > I've long wanted to change kfree() to explicitly set pointers to NULL on
> > free. https://github.com/KSPP/linux/issues/87
> 
> We've had this discussion with the gcc people in the past, and gcc
> actually has some support for it, but it's sadly tied to the actual
> function name (ie gcc has some special-casing for "free()")
> 
> See
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94527
> 
> for some of that discussion.
> 
> Oh, and I see some patch actually got merged since I looked there last
> so that you can mark "deallocator" functions, but I think it's only
> for the context matching, not for actually killing accesses to the
> pointer afterwards.

Ah! I missed that getting added in GCC 11. But yes, there it is:

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-malloc-function-attribute

Hah, now we may need to split __malloc from __alloc_size. ;)

I'd still like the NULL assignment behavior, though, since some things
can easily avoid static analysis.
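
For reference, the GCC 11 annotation mentioned above looks roughly like
this. An illustrative sketch with made-up names (my_alloc/my_free); it
only enables the mismatched-deallocator / use-after-free style
diagnostics and does not clear the pointer:

	#include <stddef.h>

	void  my_free(void *ptr);
	void *my_alloc(size_t size) __attribute__((malloc, malloc(my_free, 1)));

The NULL-on-free behavior would still need something on the caller side,
e.g. a wrapper macro along these lines (again just a sketch):

	#define kfree_and_null(p) do { kfree(p); (p) = NULL; } while (0)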

-- 
Kees Cook


[next] arm64: db410c: Internal error: Oops: 96000004 - msm_gpu_pm_suspend

2022-03-02 Thread Naresh Kamboju
[Please ignore this email if it is already reported]

Linux next-20220302 running on Qcom db410c the following kernel crash
reported [1].

metadata:
  git_ref: master
  git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
  git_sha: adaedcf826dccf01b69d9a1f1997c9446c6b2c54
  git_describe: next-20220302
  kernel-config: https://builds.tuxbuild.com/25pJv2XjzFav5peWxwfhaU3LFEN/config


Kernel crash:

 Failed to start Entropy Daemon based on the HAVEGE algorithm
[   12.104662] Unable to handle kernel NULL pointer dereference at
virtual address 0010
[   12.121151]   ESR = 0x9604
[   12.121211]   EC = 0x25: DABT (current EL), IL = 32 bits
[   12.123137]   SET = 0, FnV = 0
[   12.128687]   EA = 0, S1PTW = 0
[   12.131464]   FSC = 0x04: level 0 translation fault
[   12.134572] Data abort info:
[   12.139457]   ISV = 0, ISS = 0x0004
[   12.142566]   CM = 0, WnR = 0
[   12.146165] user pgtable: 4k pages, 48-bit VAs, pgdp=8235d000
[   12.149360] [0010] pgd=, p4d=
[   12.156339] Internal error: Oops: 9604 [#1] PREEMPT SMP
[   12.162370] Modules linked in: videobuf2_dma_contig adv7511(+)
crct10dif_ce qcom_wcnss_pil cec qrtr qcom_q6v5_mss qcom_camss
snd_soc_lpass_apq8016 qcom_pil_info snd_soc_lpass_cpu rtc_pm8xxx
videobuf2_dma_sg qcom_spmi_vadc snd_soc_msm8916_analog qcom_q6v5
snd_soc_msm8916_digital snd_soc_apq8016_sbc snd_soc_lpass_platform
qcom_pon v4l2_fwnode snd_soc_qcom_common qcom_spmi_temp_alarm
qcom_sysmon qcom_vadc_common venus_core msm qcom_common v4l2_async
qcom_glink_smem qmi_helpers v4l2_mem2mem videobuf2_memops mdt_loader
qnoc_msm8916 gpu_sched icc_smd_rpm display_connector drm_dp_helper
videobuf2_v4l2 drm_kms_helper qcom_stats videobuf2_common qcom_rng
i2c_qcom_cci rpmsg_char drm socinfo rmtfs_mem fuse
[   12.207393] CPU: 0 PID: 66 Comm: kworker/0:4 Not tainted
5.17.0-rc6-next-20220302 #1
[   12.207407] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[   12.207415] Workqueue: pm pm_runtime_work
[   12.243952] pstate: 6005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   12.247862] pc : hrtimer_active+0x14/0x80
[   12.254628] lr : hrtimer_cancel+0x28/0x70
[   12.258795] sp : 8b423b70
[   12.262786] x29: 8b423b70 x28:  x27: 
[   12.266092] x26: 8ad5d2e0 x25: 0002d138d917 x24: 0d8dbb80
[   12.273210] x23: 0326c010 x22: 0f6b2020 x21: 0f6b2000
[   12.280328] x20: 0f6b20f8 x19: 0f6b2318 x18: 
[   12.287447] x17: 800035bf3000 x16: 88004000 x15: 4000
[   12.294564] x14:  x13:  x12: 8a8b7000
[   12.301682] x11: 087facb61180 x10: 0bc0 x9 : 881d3a78
[   12.308800] x8 : 03c68000 x7 : 0018 x6 : 1483ced5
[   12.315918] x5 : 00ff x4 :  x3 : 
[   12.323036] x2 :  x1 :  x0 : 0f6b2318
[   12.330156] Call trace:
[   12.337263]  hrtimer_active+0x14/0x80
[   12.339524]  msm_devfreq_suspend+0x30/0x60 [msm]
[   12.343348]  msm_gpu_pm_suspend+0x44/0x144 [msm]
[   12.348035]  adreno_suspend+0x6c/0x174 [msm]
[   12.352634]  pm_generic_runtime_suspend+0x38/0x50
[   12.356885]  genpd_runtime_suspend+0xb4/0x314
[   12.361487]  __rpm_callback+0x50/0x180
[   12.365824]  rpm_callback+0x74/0x80
[   12.369470]  rpm_suspend+0x110/0x634
[   12.372856]  pm_runtime_work+0xd0/0xd4
[   12.376676]  process_one_work+0x1dc/0x450
[   12.380235]  worker_thread+0x154/0x450
[   12.384314]  kthread+0x100/0x110
[   12.387959]  ret_from_fork+0x10/0x20
[   12.391351] Code: aa1e03e9 d503201f d503233f f9401802 (b9401041)
[   12.394913] ---[ end trace  ]---

Reported-by: Linux Kernel Functional Testing 

--
Linaro LKFT
https://lkft.linaro.org
[1] https://lkft.validation.linaro.org/scheduler/job/4643232#L2396


Re: [PATCH v2 1/3] drm/i915: Remove the vm open count

2022-03-02 Thread Niranjana Vishwanathapura

On Wed, Mar 02, 2022 at 11:21:58AM +0100, Thomas Hellström wrote:

vms are not getting properly closed. Rather than fixing that,
remove the vm open count and instead rely on the vm refcount.

The vm open count existed solely to break the strong references the
vmas had on the vms. Now instead make those references weak and
ensure vmas are destroyed when the vm is destroyed.

Unfortunately if the vm destructor and the object destructor both
wants to destroy a vma, that may lead to a race in that the vm
destructor just unbinds the vma and leaves the actual vma destruction
to the object destructor. However in order for the object destructor
to ensure the vma is unbound it needs to grab the vm mutex. In order
to keep the vm mutex alive until the object destructor is done with
it, somewhat hackishly grab a vm_resv refcount that is released late
in the vma destruction process, when the vm mutex is no longer needed.

v2: Address review-comments from Niranjana
- Clarify that the struct i915_address_space::skip_pte_rewrite is a hack and
 should ideally be replaced in an upcoming patch.
- Remove an unneeded continue in clear_vm_list and update comment.

Co-developed-by: Niranjana Vishwanathapura 
Signed-off-by: Niranjana Vishwanathapura 
Signed-off-by: Thomas Hellström 
---
drivers/gpu/drm/i915/display/intel_dpt.c  |  2 +-
drivers/gpu/drm/i915/gem/i915_gem_context.c   | 29 ++-
.../gpu/drm/i915/gem/i915_gem_execbuffer.c|  6 ++
.../gpu/drm/i915/gem/selftests/mock_context.c |  5 +-
drivers/gpu/drm/i915/gt/gen6_ppgtt.c  |  2 +-
drivers/gpu/drm/i915/gt/intel_ggtt.c  | 30 +++
drivers/gpu/drm/i915/gt/intel_gtt.c   | 54 
drivers/gpu/drm/i915/gt/intel_gtt.h   | 56 
drivers/gpu/drm/i915/gt/selftest_execlists.c  | 86 +--
drivers/gpu/drm/i915/i915_gem.c   |  6 +-
drivers/gpu/drm/i915/i915_vma.c   | 55 
drivers/gpu/drm/i915/i915_vma_resource.c  |  2 +-
drivers/gpu/drm/i915/i915_vma_resource.h  |  6 ++
drivers/gpu/drm/i915/i915_vma_types.h |  7 ++
drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  4 +-
15 files changed, 186 insertions(+), 164 deletions(-)



Looks good to me.
Reviewed-by: Niranjana Vishwanathapura 



Re: [PATCH v2 3/3] drm/i915/gem: Remove some unnecessary code

2022-03-02 Thread Niranjana Vishwanathapura

On Wed, Mar 02, 2022 at 11:22:00AM +0100, Thomas Hellström wrote:

The test for vma should always return true, and when assigning -EBUSY
to ret, the variable should already have that value.

Signed-off-by: Thomas Hellström 
---
drivers/gpu/drm/i915/i915_gem.c | 32 ++--
1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c26110abcc0b..9747924cc57b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -118,6 +118,7 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj,
   unsigned long flags)
{
struct intel_runtime_pm *rpm = &to_i915(obj->base.dev)->runtime_pm;
+   bool vm_trylock = !!(flags & I915_GEM_OBJECT_UNBIND_VM_TRYLOCK);
LIST_HEAD(still_in_list);
intel_wakeref_t wakeref;
struct i915_vma *vma;
@@ -170,26 +171,21 @@ int i915_gem_object_unbind(struct drm_i915_gem_object 
*obj,
 * and destroy the vma from under us.
 */

-   if (vma) {
-   bool vm_trylock = !!(flags & 
I915_GEM_OBJECT_UNBIND_VM_TRYLOCK);
-   ret = -EBUSY;
-   if (flags & I915_GEM_OBJECT_UNBIND_ASYNC) {
-   assert_object_held(vma->obj);
-   ret = i915_vma_unbind_async(vma, vm_trylock);
-   }
+   ret = -EBUSY;
+   if (flags & I915_GEM_OBJECT_UNBIND_ASYNC) {
+   assert_object_held(vma->obj);
+   ret = i915_vma_unbind_async(vma, vm_trylock);
+   }

-   if (ret == -EBUSY && (flags & 
I915_GEM_OBJECT_UNBIND_ACTIVE ||
- !i915_vma_is_active(vma))) {
-   if (vm_trylock) {
-   if (mutex_trylock(&vma->vm->mutex)) {
-   ret = __i915_vma_unbind(vma);
-   mutex_unlock(&vma->vm->mutex);
-   } else {
-   ret = -EBUSY;
-   }
-   } else {
-   ret = i915_vma_unbind(vma);
+   if (ret == -EBUSY && (flags & I915_GEM_OBJECT_UNBIND_ACTIVE ||
+ !i915_vma_is_active(vma))) {
+   if (vm_trylock) {
+   if (mutex_trylock(&vma->vm->mutex)) {
+   ret = __i915_vma_unbind(vma);
+   mutex_unlock(&vma->vm->mutex);
}
+   } else {
+   ret = i915_vma_unbind(vma);
}
}


Looks good to me.
Reviewed-by: Niranjana Vishwanathapura 



--
2.34.1



Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-02 Thread Linus Torvalds
On Wed, Mar 2, 2022 at 12:07 PM Kees Cook  wrote:
>
> I've long wanted to change kfree() to explicitly set pointers to NULL on
> free. https://github.com/KSPP/linux/issues/87

We've had this discussion with the gcc people in the past, and gcc
actually has some support for it, but it's sadly tied to the actual
function name (ie gcc has some special-casing for "free()")

See

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94527

for some of that discussion.

Oh, and I see some patch actually got merged since I looked there last
so that you can mark "deallocator" functions, but I think it's only
for the context matching, not for actually killing accesses to the
pointer afterwards.

   Linus
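
For reference, a minimal userspace sketch of the annotation being referred
to (the allocator/deallocator pairing added in recent GCC; as noted above,
it feeds -Wmismatched-dealloc and the analyzer's free tracking rather than
poisoning later accesses to the pointer):

#include <stdlib.h>

void my_free(void *p);
/* Pair the allocator with its deallocator so the compiler knows what a
 * correct free of the returned pointer looks like. */
void *my_alloc(size_t n) __attribute__((malloc, malloc(my_free, 1)));

int main(void)
{
	void *p = my_alloc(16);

	free(p);	/* wrong deallocator: -Wmismatched-dealloc */
	my_free(p);	/* double free: -fanalyzer can flag this */
	return 0;
}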


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-02 Thread Kees Cook
On Wed, Mar 02, 2022 at 10:29:31AM +0100, Rasmus Villemoes wrote:
> This won't help the current issue (because it doesn't exist and might
> never), but just in case some compiler people are listening, I'd like to
> have some sort of way to tell the compiler "treat this variable as
> uninitialized from here on". So one could do
> 
> #define kfree(p) do { __kfree(p); __magic_uninit(p); } while (0)
> 
> with __magic_uninit being a magic no-op that doesn't affect the
> semantics of the code, but could be used by the compiler's "[is/may be]
> used uninitialized" machinery to flag e.g. double frees on some odd
> error path etc. It would probably only work for local automatic
> variables, but it should be possible to just ignore the hint if p is
> some expression like foo->bar or has side effects. If we had that, the
> end-of-loop test could include that to "uninitialize" the iterator.

I've long wanted to change kfree() to explicitly set pointers to NULL on
free. https://github.com/KSPP/linux/issues/87

The thing stopping a trivial transformation of kfree() is:

kfree(get_some_pointer());

I would argue, though, that the above is poor form: the thing holding
the pointer should be the thing freeing it, so these cases should be
refactored and kfree() could do the NULLing by default.

Quoting myself in the above issue:


Without doing massive tree-wide changes, I think we need compiler
support. If we had something like __builtin_is_lvalue(), we could
distinguish function returns from lvalues. For example, right now a
common case are things like:

kfree(get_some_ptr());

But if we could at least gain coverage of the lvalue cases, and detect
them statically at compile-time, we could do:

#define __kfree_and_null(x) do { __kfree(*x); *x = NULL; } while (0)
#define kfree(x) __builtin_choose_expr(__builtin_is_lvalue(x),
__kfree_and_null(&(x)), __kfree(x))

Alternatively, we could do a tree-wide change of the former case (findable
with Coccinelle) and change them into something like kfree_no_null()
and redefine kfree() itself:

#define kfree_no_null(x) do { void *__ptr = (x); __kfree(__ptr); } while (0)
#define kfree(x) do { __kfree(x); x = NULL; } while (0)
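
A minimal sketch of what the proposed semantics would look like at a call
site (hypothetical: assumes the kfree()/kfree_no_null() redefinitions above
and a __kfree() backing them; this is not current kernel behaviour):

#include <linux/slab.h>

struct foo { int x; };

static void demo(void)
{
	struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);

	if (!f)
		return;

	kfree(f);	/* frees and sets f = NULL under the proposed macro */
	kfree(f);	/* now a harmless kfree(NULL), not a double free */

	/* rvalue case has to stay explicit, as discussed above */
	kfree_no_null(kmalloc(8, GFP_KERNEL));
}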

-- 
Kees Cook


Re: [PATCH] simpldrm: Enable boot time VESA graphic mode selection.

2022-03-02 Thread Javier Martinez Canillas
Hello,

On 3/2/22 20:38, Michal Suchánek wrote:
> Hello,
> 
> On Wed, Mar 02, 2022 at 08:31:25PM +0100, Thomas Zimmermann wrote:
>> Hi,
>>
>> is this ready to be merged?
> 
> The objections raised so far have been addressed in v4.
>
> I think this is good to merge.
>

The v4 patches looks good to me and have provided my Reviewed-by to all of them.
 
> Thanks
> 
> Michal
> 
>>
-- 
Best regards,

Javier Martinez Canillas
Linux Engineering
Red Hat



Re: [PATCH] simpldrm: Enable boot time VESA graphic mode selection.

2022-03-02 Thread Michal Suchánek
Hello,

On Wed, Mar 02, 2022 at 08:31:25PM +0100, Thomas Zimmermann wrote:
> Hi,
> 
> is this ready to be merged?

The objections raised so far have been addressed in v4.

I think this is good to merge.

Thanks

Michal

> 
> Best regards
> Thomas
> 
> Am 18.02.22 um 10:33 schrieb Michal Suchanek:
> > Since the switch to simpledrm, VESA graphic modes are no longer available
> > with legacy BIOS.
> > 
> > The x86 realmode boot code enables the VESA graphic modes when option
> > FB_BOOT_VESA_SUPPORT is enabled.
> > 
> > To enable use of VESA modes with simpledrm in legacy BIOS boot mode drop
> > dependency of BOOT_VESA_SUPPORT on FB, also drop the FB_ prefix, and
> > select the option when simpledrm is built-in on x86.
> > 
> > Fixes: 11e8f5fd223b ("drm: Add simpledrm driver")
> > Signed-off-by: Michal Suchanek 
> > ---
> >   arch/x86/boot/video-vesa.c   | 4 ++--
> >   drivers/gpu/drm/tiny/Kconfig | 1 +
> >   drivers/video/fbdev/Kconfig  | 9 -
> >   3 files changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/boot/video-vesa.c b/arch/x86/boot/video-vesa.c
> > index 7e185977a984..c2c6d35e3a43 100644
> > --- a/arch/x86/boot/video-vesa.c
> > +++ b/arch/x86/boot/video-vesa.c
> > @@ -83,7 +83,7 @@ static int vesa_probe(void)
> >(vminfo.memory_layout == 4 ||
> > vminfo.memory_layout == 6) &&
> >vminfo.memory_planes == 1) {
> > -#ifdef CONFIG_FB_BOOT_VESA_SUPPORT
> > +#ifdef CONFIG_BOOT_VESA_SUPPORT
> > /* Graphics mode, color, linear frame buffer
> >supported.  Only register the mode if
> >if framebuffer is configured, however,
> > @@ -121,7 +121,7 @@ static int vesa_set_mode(struct mode_info *mode)
> > if ((vminfo.mode_attr & 0x15) == 0x05) {
> > /* It's a supported text mode */
> > is_graphic = 0;
> > -#ifdef CONFIG_FB_BOOT_VESA_SUPPORT
> > +#ifdef CONFIG_BOOT_VESA_SUPPORT
> > } else if ((vminfo.mode_attr & 0x99) == 0x99) {
> > /* It's a graphics mode with linear frame buffer */
> > is_graphic = 1;
> > diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig
> > index 712e0004e96e..1bc30c64ed15 100644
> > --- a/drivers/gpu/drm/tiny/Kconfig
> > +++ b/drivers/gpu/drm/tiny/Kconfig
> > @@ -54,6 +54,7 @@ config DRM_GM12U320
> >   config DRM_SIMPLEDRM
> > tristate "Simple framebuffer driver"
> > depends on DRM && MMU
> > +   select BOOT_VESA_SUPPORT if X86 && DRM_SIMPLEDRM = y
> > select DRM_GEM_SHMEM_HELPER
> > select DRM_KMS_HELPER
> > help
> > diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig
> > index 6ed5e608dd04..4f3be9b7a520 100644
> > --- a/drivers/video/fbdev/Kconfig
> > +++ b/drivers/video/fbdev/Kconfig
> > @@ -66,9 +66,8 @@ config FB_DDC
> > select I2C_ALGOBIT
> > select I2C
> > -config FB_BOOT_VESA_SUPPORT
> > +config BOOT_VESA_SUPPORT
> > bool
> > -   depends on FB
> > help
> >   If true, at least one selected framebuffer driver can take advantage
> >   of VESA video modes set at an early boot stage via the vga= parameter.
> > @@ -627,7 +626,7 @@ config FB_VESA
> > select FB_CFB_FILLRECT
> > select FB_CFB_COPYAREA
> > select FB_CFB_IMAGEBLIT
> > -   select FB_BOOT_VESA_SUPPORT
> > +   select BOOT_VESA_SUPPORT
> > help
> >   This is the frame buffer device driver for generic VESA 2.0
> >   compliant graphic cards. The older VESA 1.2 cards are not supported.
> > @@ -1051,7 +1050,7 @@ config FB_INTEL
> > select FB_CFB_FILLRECT
> > select FB_CFB_COPYAREA
> > select FB_CFB_IMAGEBLIT
> > -   select FB_BOOT_VESA_SUPPORT if FB_INTEL = y
> > +   select BOOT_VESA_SUPPORT if FB_INTEL = y
> > depends on !DRM_I915
> > help
> >   This driver supports the on-board graphics built in to the Intel
> > @@ -1378,7 +1377,7 @@ config FB_SIS
> > select FB_CFB_FILLRECT
> > select FB_CFB_COPYAREA
> > select FB_CFB_IMAGEBLIT
> > -   select FB_BOOT_VESA_SUPPORT if FB_SIS = y
> > +   select BOOT_VESA_SUPPORT if FB_SIS = y
> > select FB_SIS_300 if !FB_SIS_315
> > help
> >   This is the frame buffer device driver for the SiS 300, 315, 330
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev





Re: [PATCH] simpldrm: Enable boot time VESA graphic mode selection.

2022-03-02 Thread Thomas Zimmermann

Hi,

is this ready to be merged?

Best regards
Thomas

Am 18.02.22 um 10:33 schrieb Michal Suchanek:

Since the switch to simpledrm, VESA graphic modes are no longer available
with legacy BIOS.

The x86 realmode boot code enables the VESA graphic modes when option
FB_BOOT_VESA_SUPPORT is enabled.

To enable use of VESA modes with simpledrm in legacy BIOS boot mode drop
dependency of BOOT_VESA_SUPPORT on FB, also drop the FB_ prefix, and
select the option when simpledrm is built-in on x86.

Fixes: 11e8f5fd223b ("drm: Add simpledrm driver")
Signed-off-by: Michal Suchanek 
---
  arch/x86/boot/video-vesa.c   | 4 ++--
  drivers/gpu/drm/tiny/Kconfig | 1 +
  drivers/video/fbdev/Kconfig  | 9 -
  3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/boot/video-vesa.c b/arch/x86/boot/video-vesa.c
index 7e185977a984..c2c6d35e3a43 100644
--- a/arch/x86/boot/video-vesa.c
+++ b/arch/x86/boot/video-vesa.c
@@ -83,7 +83,7 @@ static int vesa_probe(void)
   (vminfo.memory_layout == 4 ||
vminfo.memory_layout == 6) &&
   vminfo.memory_planes == 1) {
-#ifdef CONFIG_FB_BOOT_VESA_SUPPORT
+#ifdef CONFIG_BOOT_VESA_SUPPORT
/* Graphics mode, color, linear frame buffer
   supported.  Only register the mode if
   if framebuffer is configured, however,
@@ -121,7 +121,7 @@ static int vesa_set_mode(struct mode_info *mode)
if ((vminfo.mode_attr & 0x15) == 0x05) {
/* It's a supported text mode */
is_graphic = 0;
-#ifdef CONFIG_FB_BOOT_VESA_SUPPORT
+#ifdef CONFIG_BOOT_VESA_SUPPORT
} else if ((vminfo.mode_attr & 0x99) == 0x99) {
/* It's a graphics mode with linear frame buffer */
is_graphic = 1;
diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig
index 712e0004e96e..1bc30c64ed15 100644
--- a/drivers/gpu/drm/tiny/Kconfig
+++ b/drivers/gpu/drm/tiny/Kconfig
@@ -54,6 +54,7 @@ config DRM_GM12U320
  config DRM_SIMPLEDRM
tristate "Simple framebuffer driver"
depends on DRM && MMU
+   select BOOT_VESA_SUPPORT if X86 && DRM_SIMPLEDRM = y
select DRM_GEM_SHMEM_HELPER
select DRM_KMS_HELPER
help
diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig
index 6ed5e608dd04..4f3be9b7a520 100644
--- a/drivers/video/fbdev/Kconfig
+++ b/drivers/video/fbdev/Kconfig
@@ -66,9 +66,8 @@ config FB_DDC
select I2C_ALGOBIT
select I2C
  
-config FB_BOOT_VESA_SUPPORT

+config BOOT_VESA_SUPPORT
bool
-   depends on FB
help
  If true, at least one selected framebuffer driver can take advantage
  of VESA video modes set at an early boot stage via the vga= parameter.
@@ -627,7 +626,7 @@ config FB_VESA
select FB_CFB_FILLRECT
select FB_CFB_COPYAREA
select FB_CFB_IMAGEBLIT
-   select FB_BOOT_VESA_SUPPORT
+   select BOOT_VESA_SUPPORT
help
  This is the frame buffer device driver for generic VESA 2.0
  compliant graphic cards. The older VESA 1.2 cards are not supported.
@@ -1051,7 +1050,7 @@ config FB_INTEL
select FB_CFB_FILLRECT
select FB_CFB_COPYAREA
select FB_CFB_IMAGEBLIT
-   select FB_BOOT_VESA_SUPPORT if FB_INTEL = y
+   select BOOT_VESA_SUPPORT if FB_INTEL = y
depends on !DRM_I915
help
  This driver supports the on-board graphics built in to the Intel
@@ -1378,7 +1377,7 @@ config FB_SIS
select FB_CFB_FILLRECT
select FB_CFB_COPYAREA
select FB_CFB_IMAGEBLIT
-   select FB_BOOT_VESA_SUPPORT if FB_SIS = y
+   select BOOT_VESA_SUPPORT if FB_SIS = y
select FB_SIS_300 if !FB_SIS_315
help
  This is the frame buffer device driver for the SiS 300, 315, 330


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev




Re: [PATCH v3 0/5] fbdev: Improve performance of fbdev console

2022-03-02 Thread Thomas Zimmermann

Hi,

merged with fixes for the typoes in the final patch. Thanks for reviewing.

Best regards
Thomas

Am 23.02.22 um 20:37 schrieb Thomas Zimmermann:

Optimize performance of the fbdev console for the common case of
software-based clearing and image blitting.

The commit description of each patch contains results of a simple
microbenchmark. I also tested the full patchset's effect on the
console output by printing directory listings (i7-4790, FullHD,
simpledrm, kernel with debugging).

   > time find /usr/share/doc -type f

In the unoptimized case:

   real0m6.173s
   user0m0.044s
   sys 0m6.107s

With optimizations applied:

   real0m4.754s
   user0m0.044s
   sys 0m4.698s

In the optimized case, printing the directory listing is ~25% faster
than before.

In v2 of the patchset, after implementing Sam's suggestion to update
cfb_imageblit() as well, it turns out that the compiled code in
sys_imageblit() is still significantly slower than the CFB version. A
fix is probably a larger task and would include architecture-specific
changes. A new TODO item suggests to investigate the performance of the
various helpers and format-conversion functions in DRM and fbdev.

v3:
* fix description of cfb_imageblit() patch (Pekka)
v2:
* improve readability for sys_imageblit() (Gerd, Sam)
* new TODO item for further optimization

Thomas Zimmermann (5):
   fbdev: Improve performance of sys_fillrect()
   fbdev: Improve performance of sys_imageblit()
   fbdev: Remove trailing whitespaces from cfbimgblt.c
   fbdev: Improve performance of cfb_imageblit()
   drm: Add TODO item for optimizing format helpers

  Documentation/gpu/todo.rst |  22 +
  drivers/video/fbdev/core/cfbimgblt.c   | 107 -
  drivers/video/fbdev/core/sysfillrect.c |  16 +---
  drivers/video/fbdev/core/sysimgblt.c   |  49 ---
  4 files changed, 133 insertions(+), 61 deletions(-)



--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev




Re: [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range

2022-03-02 Thread Robin Murphy

On 2022-03-02 15:55, Michael Cheng wrote:

Thanks for the feedback Robin!

Sorry, my choice of words wasn't that great, but what I meant is to 
understand how ARM flushes a range of dcache for device drivers, and not 
an equivalent of x86 clflush.


I believe the concern is if the CPU writes an update, that update might 
only be sitting in the CPU cache and never make it to device memory 
where the device can see it; there are specific places that we are 
supposed to flush the CPU caches to make sure our updates are visible to 
the hardware.


Ah, OK, if it's more about ordering, and it's actually write buffers 
rather than caches that you care about flushing, then we might be a lot 
safer, phew!


For a very simple overview, in a case where the device itself needs to 
observe memory writes in the correct order, e.g.:


data_descriptor.valid = 1;

clflush(&data_descriptor);

command_descriptor.data = &data_descriptor

writel(/* control register to read command to then read data */)

then dma_wmb() between the first two writes should be the right tool to 
ensure that the device does not observe the command update while the 
data update is still sat somewhere in a CPU write buffer.
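
A bare-bones sketch of that sequence in kernel C (hypothetical descriptor
layout and doorbell; assumes the descriptors live in coherent DMA memory):

#include <linux/io.h>
#include <linux/dma-mapping.h>

struct data_desc {
	u32 valid;
	u32 payload;
};

/* cmd_slot is the command descriptor's data pointer as the device sees
 * it (e.g. in a coherent ring); db is the MMIO doorbell register. */
static void post_cmd(struct data_desc *data, dma_addr_t data_dma,
		     __le64 *cmd_slot, void __iomem *db)
{
	data->payload = 0xcafe;
	data->valid = 1;

	/* Order the data descriptor update before the command write that
	 * points at it, as observed by the device. */
	dma_wmb();

	*cmd_slot = cpu_to_le64(data_dma);

	/* writel() carries the barrier needed before the register write,
	 * so the command is visible before the doorbell rings. */
	writel(1, db);
}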


If you want a slightly stronger notion that, at a given point, all prior 
writes have actually been issued and should now be visible (rather than 
just that they won't become visible in the wrong order whenever they 
do), then wmb() should suffice on arm64.


Note that with arm64 memory types, a Non-Cacheable mapping of DRAM for 
a non-coherent DMA mapping, or of VRAM in a prefetchable BAR, can still 
be write-buffered, so barriers still matter even when actual cache 
maintenance ops don't (and as before if you're trying to perform cache 
maintenance outside the DMA API then you've already lost anyway). MMIO 
registers should be mapped as Device memory via ioremap(), which is not 
bufferable, hence the barrier implicit in writel() effectively pushes 
out any prior buffered writes ahead of a register write, which is why we 
don't need to worry about this most of the time.


This is only a very rough overview, though, and I'm not familiar enough 
with x86 semantics, your hardware, or the exact use-case to be able to 
say whether barriers alone are anywhere near the right answer or not.


Robin.



+Matt Roper

Matt, Lucas, any feed back here?

On 2022-03-02 4:49 a.m., Robin Murphy wrote:

On 2022-02-25 19:27, Michael Cheng wrote:

Hi Robin,

[ +arm64 maintainers for their awareness, which would have been a 
good thing to do from the start ]


  * Thanks for adding the arm64 maintainer and sorry I didn't rope them
    in sooner.

Why does i915 need to ensure the CPU's instruction cache is coherent 
with its data cache? Is it a self-modifying driver?


  * Also thanks for pointing this out. Initially I was using
    dcache_clean_inval_poc, which seems to be equivalent to what
    x86 is doing for dcache flushing, but it was giving me build errors
    since it's not on the global list of kernel symbols. And after
    revisiting the documentation for caches_clean_inval_pou, it won't
    fly for what we are trying to do. Moving forward, what would you (or
    someone in the ARM community) suggest we do? Could it be possible to
    export dcache_clean_inval_poc as a global symbol?


Unlikely, unless something with a legitimate need for CPU-centric 
cache maintenance like kexec or CPU hotplug ever becomes modular.


In the case of a device driver, it's not even the basic issues of 
assuming to find direct equivalents to x86 semantics in other CPU 
architectures, or effectively reinventing parts of the DMA API, it's 
even bigger than that. Once you move from being integrated in a single 
vendor's system architecture to being on a discrete card, you 
fundamentally *no longer have any control over cache coherency*. 
Whether the host CPU architecture happens to be AArch64, RISC-V, or 
whatever doesn't really matter, you're at the mercy of 3rd-party PCIe 
and interconnect IP vendors, and SoC integrators. You'll find yourself 
in systems where PCIe simply cannot snoop any caches, where you'd 
better have the correct DMA API calls in place to have any hope of 
even the most basic functionality working properly; you'll find 
yourself in systems where even if the PCIe root complex claims to 
support No Snoop, your uncached traffic will still end up snooping 
stale data that got prefetched back into caches you thought you'd 
invalidated; you'll find yourself in systems where your memory 
attributes may or may not get forcibly rewritten by an IOMMU depending 
on the kernel config and/or command line.


It's not about simply finding a substitute for clflush, it's that the 
reasons you have for using clflush in the first place can no longer be 
assumed to be valid.


Robin.


On 2022-02-25 10:24 a.m., Robin Murphy wrote:
[ +arm64 maintainers for their awareness, which would have been a 
good thing to do from the star

Re: [PATCH 0/2] DSI host and peripheral initialisation ordering

2022-03-02 Thread Doug Anderson
Hi,

On Wed, Mar 2, 2022 at 9:20 AM Dave Stevenson
 wrote:
>
> Hi Doug
>
> On Wed, 2 Mar 2022 at 00:13, Doug Anderson  wrote:
> >
> > Hi,
> >
> > On Wed, Feb 16, 2022 at 9:00 AM Dave Stevenson
> >  wrote:
> > >
> > > Hi All
> > >
> > > Hopefully I've cc'ed all those that have bashed this problem around 
> > > previously,
> > > or are otherwise linked to DRM bridges.
> > >
> > > There have been numerous discussions around how DSI support is currently 
> > > broken
> > > as it doesn't support initialising the PHY to LP-11 and potentially the 
> > > clock
> > > lane to HS prior to configuring the DSI peripheral. There is no op where 
> > > the
> > > interface is initialised but HS video isn't also being sent.
> > > Currently you have:
> > > - peripheral pre_enable (host not initialised yet)
> > > - host pre_enable
> > > - encoder enable
> > > - host enable
> > > - peripheral enable (video already running)
> > >
> > > vc4 and exynos currently implement the DSI host as an encoder, and split 
> > > the
> > > bridge_chain. This fails if you want to switch to being a bridge and/or 
> > > use
> > > atomic calls as the state of all the elements split off are not added by
> > > drm_atomic_add_encoder_bridges.
> > >
> > > dw-mipi-dsi[1] and now msm[2] use the mode_set hook to initialise the 
> > > PHY, so
> > > the bridge/panel pre_enable can send commands. In their post_disable they 
> > > then
> > > call the downstream bridge/panel post_disable op manually so that shutdown
> > > commands can be sent before shutting down the PHY. Nothing handles that 
> > > fact,
> > > so the framework then continues down the bridge chain and calls the 
> > > post_disable
> > > again, so we get unbalanced panel prepare/unprepare calls being reported 
> > > [3].
> > >
> > > There have been patches[4] proposing reversing the entire direction of
> > > pre_enable and post_disable, but that risks driving voltage into devices 
> > > that
> > > have yet to be powered up.
> > > There have been discussions about adding either a pre_pre_enable, or 
> > > adding a
> > > DSI host_op to initialise the host[5]. Both require significant reworking 
> > > to all
> > > existing drivers in moving initialisation phases.
> > > We have patches that look like they may well be addressing race 
> > > conditions in
> > > starting up a DSI peripheral[6].
> >
> > In general I'm happy to let the more senior people in DRM set the
> > direction here so I probably won't do lots of review, but I will point
> > out that I did have another proposal that sorta got lost in the noise
> > of the whole "reversing the entire direction". That's basically:
> >
> > https://lists.freedesktop.org/archives/dri-devel/2021-October/328934.html
> >
> > I have no idea if something like that would work for your use case,
> > but after analyzing it it felt like a surprisingly clean proposal even
> > if my first instinct when I thought about it was that it was a hack.
> > ;-) I suspect (but haven't analyzed your code) that it might be
> > equivalent to your proposal of using a flag but maybe easier to wrap
> > ones head around?
>
> If I'm reading that right, then you're proposing adding
> after_pre_enable and before_post_disable hooks.
> That's almost the same as the power_up() and power_down() ops that
> Dmitry suggested earlier, or pre_pre_enable / post_post_disable that
> had also been considered.
>
> Neither of those options handles the case of a longer chain in which
> two non-consecutive links want their upstream bridge enabled first.
> As per the clarification in patch 1/2, considering the chain
> - Panel
>  - Bridge 1
>  - Bridge 2 DRM_BRIDGE_OP_UPSTREAM_FIRST
>  - Bridge 3
>  - Bridge 4 DRM_BRIDGE_OP_UPSTREAM_FIRST
>  - Bridge 5
>  - Encoder
> With the flag option we call pre_enables as Panel, Bridge 1, Bridge 3,
> Bridge 2, Bridge 5, Bridge 4, Encoder.
> If adding after_pre_enable, then we end up with Panel, Bridge 1,
> Bridge 3, Bridge 5, Bridge 4 (after_pre_enable), Bridge 2
> (after_pre_enable), Encoder.
> (power_on / pre_pre_enable from encoder to connector would end up with
> Bridge 5 (power_on), Bridge 3 (power_on), Bridge 1 (power_on), Panel,
> Bridge 2, Bridge 4, Encoder).
> Those potentially work, but it seems a less logical order compared to
> using a flag to swap only the bridges of interest. I think power_on /
> pre_pre_enable covers DSI better than after_pre_enable.
>
> Adding the extra ops requires the source bridge (eg DSI host) to know
> the behaviour the sink bridge/panel wants. So do all DSI hosts have to
> implement power_on to power up and enter LP-11. Some DSI peripherals
> may be quite happy or even prefer to have the bus totally idle /
> powered down at pre_enable, but there would be no way of implementing
> that.

Ah, that makes it super clear, thanks! :-) If the local swap of just
two components that you're doing is more useful to you than the two
stage approach and everyone likes it then I have no objections.


> You seem to be looking at DP, which I h

[PATCH] drm/i915: avoid concurrent writes to aux_inv

2022-03-02 Thread fei . yang
From: Fei Yang 

GPU hangs have been observed when multiple engines write to the
same aux_inv register at the same time. To avoid this each engine
should only invalidate its own auxiliary table. The function
gen12_emit_flush_xcs() currently invalidates the auxiliary table for
all engines because the rq->engine is not necessarily the engine
eventually carrying out the request, and potentially the engine
could even be a virtual one (with engine->instance being -1).
With this patch, auxiliary table invalidation is done only for the
engine executing the request. And the mmio address for the aux_inv
register is set after the engine instance becomes certain.

Signed-off-by: Chris Wilson 
Signed-off-by: Fei Yang 
---
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c  | 41 ---
 .../drm/i915/gt/intel_execlists_submission.c  | 38 +
 drivers/gpu/drm/i915/i915_request.h   |  2 +
 3 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c 
b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
index b1b9c3fd7bf9..af62e2bc2c9b 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -165,30 +165,6 @@ static u32 preparser_disable(bool state)
return MI_ARB_CHECK | 1 << 8 | state;
 }
 
-static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine)
-{
-   static const i915_reg_t vd[] = {
-   GEN12_VD0_AUX_NV,
-   GEN12_VD1_AUX_NV,
-   GEN12_VD2_AUX_NV,
-   GEN12_VD3_AUX_NV,
-   };
-
-   static const i915_reg_t ve[] = {
-   GEN12_VE0_AUX_NV,
-   GEN12_VE1_AUX_NV,
-   };
-
-   if (engine->class == VIDEO_DECODE_CLASS)
-   return vd[engine->instance];
-
-   if (engine->class == VIDEO_ENHANCEMENT_CLASS)
-   return ve[engine->instance];
-
-   GEM_BUG_ON("unknown aux_inv reg\n");
-   return INVALID_MMIO_REG;
-}
-
 static u32 *gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs)
 {
*cs++ = MI_LOAD_REGISTER_IMM(1);
@@ -288,7 +264,7 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 mode)
if (mode & EMIT_INVALIDATE)
aux_inv = rq->engine->mask & ~BIT(BCS0);
if (aux_inv)
-   cmd += 2 * hweight32(aux_inv) + 2;
+   cmd += 4;
 
cs = intel_ring_begin(rq, cmd);
if (IS_ERR(cs))
@@ -319,16 +295,13 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 
mode)
*cs++ = 0; /* value */
 
if (aux_inv) { /* hsdes: 1809175790 */
-   struct intel_engine_cs *engine;
-   unsigned int tmp;
-
-   *cs++ = MI_LOAD_REGISTER_IMM(hweight32(aux_inv));
-   for_each_engine_masked(engine, rq->engine->gt, aux_inv, tmp) {
-   *cs++ = i915_mmio_reg_offset(aux_inv_reg(engine));
-   *cs++ = AUX_INV;
-   }
+   *cs++ = MI_LOAD_REGISTER_IMM(1);
+   rq->vd_ve_aux_inv = cs;
+   *cs++ = 0; /* address to be set at submission to HW */
+   *cs++ = AUX_INV;
*cs++ = MI_NOOP;
-   }
+   } else
+   rq->vd_ve_aux_inv = NULL;
 
if (mode & EMIT_INVALIDATE)
*cs++ = preparser_disable(false);
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 1c602d4ae297..a018de6dcac5 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1258,6 +1258,34 @@ static bool completed(const struct i915_request *rq)
return __i915_request_is_complete(rq);
 }
 
+static i915_reg_t aux_inv_reg(const struct intel_engine_cs *engine)
+{
+   static const i915_reg_t vd[] = {
+   GEN12_VD0_AUX_NV,
+   GEN12_VD1_AUX_NV,
+   GEN12_VD2_AUX_NV,
+   GEN12_VD3_AUX_NV,
+   };
+
+   static const i915_reg_t ve[] = {
+   GEN12_VE0_AUX_NV,
+   GEN12_VE1_AUX_NV,
+   };
+
+   if (engine->class == VIDEO_DECODE_CLASS) {
+   GEM_BUG_ON(engine->instance >= ARRAY_SIZE(vd));
+   return vd[engine->instance];
+   }
+
+   if (engine->class == VIDEO_ENHANCEMENT_CLASS) {
+   GEM_BUG_ON(engine->instance >= ARRAY_SIZE(ve));
+   return ve[engine->instance];
+   }
+
+   GEM_BUG_ON("unknown aux_inv reg\n");
+   return INVALID_MMIO_REG;
+}
+
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -1538,6 +1566,16 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
}
 
if (__i915_request_submit(rq)) {
+   /* hsdes: 1809175790 */
+   if ((GRAPHICS_VER(engine->i915) ==
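
The remainder of this hunk is cut off above. Purely as an illustration of
the approach the commit message describes (the actual patch differs and
also gates this on graphics version and engine class), the submission-time
fixup would look roughly like:

/* hsdes: 1809175790 -- once the physical engine is known, patch the
 * reserved dword with that engine's own AUX_NV register offset. */
static void fixup_vd_ve_aux_inv(struct i915_request *rq,
				struct intel_engine_cs *engine)
{
	if (!rq->vd_ve_aux_inv)
		return;

	*rq->vd_ve_aux_inv = i915_mmio_reg_offset(aux_inv_reg(engine));
}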

Re: [PATCH 5/6] drm/rcar_du: changes to rcar-du driver resulting from drm_writeback_connector structure changes

2022-03-02 Thread Laurent Pinchart
Hi Abhinav,

On Wed, Mar 02, 2022 at 10:28:03AM -0800, Abhinav Kumar wrote:
> On 2/28/2022 5:42 AM, Laurent Pinchart wrote:
> > On Mon, Feb 28, 2022 at 02:28:27PM +0200, Laurent Pinchart wrote:
> >> On Mon, Feb 28, 2022 at 02:09:15PM +0200, Jani Nikula wrote:
> >>> On Mon, 28 Feb 2022, Laurent Pinchart wrote:
>  On Sat, Feb 26, 2022 at 10:27:59AM -0800, Rob Clark wrote:
> > On Wed, Feb 2, 2022 at 7:41 AM Jani Nikula wrote:
> >> On Wed, 02 Feb 2022, Laurent Pinchart wrote:
> >>> On Wed, Feb 02, 2022 at 03:15:03PM +0200, Jani Nikula wrote:
>  On Wed, 02 Feb 2022, Laurent Pinchart wrote:
> > On Wed, Feb 02, 2022 at 02:24:28PM +0530, Kandpal Suraj wrote:
> >> Changing rcar_du driver to accommodate the change of
> >> drm_writeback_connector.base and drm_writeback_connector.encoder
> >> to a pointer the reason for which is explained in the
> >> Patch(drm: add writeback pointers to drm_connector).
> >>
> >> Signed-off-by: Kandpal Suraj 
> >> ---
> >>   drivers/gpu/drm/rcar-du/rcar_du_crtc.h  | 2 ++
> >>   drivers/gpu/drm/rcar-du/rcar_du_writeback.c | 8 +---
> >>   2 files changed, 7 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/rcar-du/rcar_du_crtc.h 
> >> b/drivers/gpu/drm/rcar-du/rcar_du_crtc.h
> >> index 66e8839db708..68f387a04502 100644
> >> --- a/drivers/gpu/drm/rcar-du/rcar_du_crtc.h
> >> +++ b/drivers/gpu/drm/rcar-du/rcar_du_crtc.h
> >> @@ -72,6 +72,8 @@ struct rcar_du_crtc {
> >>const char *const *sources;
> >>unsigned int sources_count;
> >>
> >> + struct drm_connector connector;
> >> + struct drm_encoder encoder;
> >
> > Those fields are, at best, poorly named. Furthermore, there's no 
> > need in
> > this driver or in other drivers using drm_writeback_connector to 
> > create
> > an encoder or connector manually. Let's not pollute all drivers 
> > because
> > i915 doesn't have its abstractions right.
> 
>  i915 uses the quite common model for struct inheritance:
> 
>    struct intel_connector {
>    struct drm_connector base;
>    /* ... */
>    }
> 
>  Same with at least amd, ast, fsl-dcu, hisilicon, mga200, msm, 
>  nouveau,
>  radeon, tilcdc, and vboxvideo.
> 
>  We could argue about the relative merits of that abstraction, but I
>  think the bottom line is that it's popular and the drivers using it 
>  are
>  not going to be persuaded to move away from it.
> >>>
> >>> Nobody said inheritance is bad.
> >>>
>  It's no coincidence that the drivers who've implemented writeback so 
>  far
>  (komeda, mali, rcar-du, vc4, and vkms) do not use the abstraction,
>  because the drm_writeback_connector midlayer does, forcing the issue.
> >>>
> >>> Are you sure it's not a coincidence ? :-)
> >>>
> >>> The encoder and especially connector created by 
> >>> drm_writeback_connector
> >>> are there only because KMS requires a drm_encoder and a drm_connector 
> >>> to
> >>> be exposed to userspace (and I could argue that using a connector for
> >>> writeback is a hack, but that won't change). The connector is 
> >>> "virtual",
> >>> I still fail to see why i915 or any other driver would need to wrap it
> >>> into something else. The whole point of the drm_writeback_connector
> >>> abstraction is that drivers do not have to manage the writeback
> >>> drm_connector manually, they shouldn't touch it at all.
> >>
> >> The thing is, drm_writeback_connector_init() calling
> >> drm_connector_init() on the drm_connector embedded in
> >> drm_writeback_connector leads to that connector being added to the
> >> drm_device's list of connectors. Ditto for the encoder.
> >>
> >> All the driver code that handles drm_connectors would need to take into
> >> account they might not be embedded in intel_connector. Throughout the
> >> driver. Ditto for the encoders.
> >
> > The assumption that a connector is embedded in intel_connector doesn't
> > really play that well with how bridge and panel connectors work.. so
> > in general this seems like a good thing to unwind.
> >
> > But as a point of practicality, i915 is a large driver covering a lot
> > of generations of hw with a lot of users.  So I can understand
> > changing this design isn't something that can happen quickly or
> > easily.  IMO we should allow i915 to create it's own connector for
> > writeback, and just document clearly that this isn't the approach new
> > drivers should take.  I mean, I understand idealism, but sometimes a
> > d

Re: [PATCH][next] drm: ssd130x: remove redundant initialization of pointer mode

2022-03-02 Thread Javier Martinez Canillas
Hello Colin,

Thanks for the patch.

On Wed, Mar 2, 2022 at 6:53 PM Colin Ian King  wrote:
>
> Pointer mode is being assigned a value that is never read; it is
> being re-assigned later with a new value. The initialization is
> redundant and can be removed.
>

Indeed.

Acked-by: Javier Martinez Canillas 

Best regards,
Javier



Re: [PATCH V5 3/6] dt-bindings: display: mediatek: revise enum to const

2022-03-02 Thread Rob Herring
On Tue, Mar 01, 2022 at 04:01:02PM +0800, Rex-BC Chen wrote:
> There won't be more than 1 fallback for these bindings, so we modify
> them to use const instead of enum.
> 
> Signed-off-by: Rex-BC Chen 
> ---
>  .../devicetree/bindings/display/mediatek/mediatek,aal.yaml| 2 +-
>  .../devicetree/bindings/display/mediatek/mediatek,ccorr.yaml  | 2 +-
>  .../devicetree/bindings/display/mediatek/mediatek,color.yaml  | 4 ++--
>  .../devicetree/bindings/display/mediatek/mediatek,dither.yaml | 2 +-
>  .../devicetree/bindings/display/mediatek/mediatek,gamma.yaml  | 2 +-
>  .../devicetree/bindings/display/mediatek/mediatek,ovl.yaml| 4 ++--
>  .../devicetree/bindings/display/mediatek/mediatek,rdma.yaml   | 4 ++--
>  7 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git 
> a/Documentation/devicetree/bindings/display/mediatek/mediatek,aal.yaml 
> b/Documentation/devicetree/bindings/display/mediatek/mediatek,aal.yaml
> index 191b56e16bee..bc1c70d089ba 100644
> --- a/Documentation/devicetree/bindings/display/mediatek/mediatek,aal.yaml
> +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,aal.yaml
> @@ -31,7 +31,7 @@ properties:
>- mediatek,mt8183-disp-aal
>- mediatek,mt8192-disp-aal
>- mediatek,mt8195-disp-aal
> -  - enum:
> +  - const:
>- mediatek,mt8173-disp-aal

Run 'make dt_binding_check'. This will fail.

>  
>reg:
> diff --git 
> a/Documentation/devicetree/bindings/display/mediatek/mediatek,ccorr.yaml 
> b/Documentation/devicetree/bindings/display/mediatek/mediatek,ccorr.yaml
> index 6894b6999412..fc40e2981eaa 100644
> --- a/Documentation/devicetree/bindings/display/mediatek/mediatek,ccorr.yaml
> +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,ccorr.yaml
> @@ -28,7 +28,7 @@ properties:
>- items:
>- enum:
>- mediatek,mt8195-disp-ccorr
> -  - enum:
> +  - const:
>- mediatek,mt8192-disp-ccorr
>  
>reg:
> diff --git 
> a/Documentation/devicetree/bindings/display/mediatek/mediatek,color.yaml 
> b/Documentation/devicetree/bindings/display/mediatek/mediatek,color.yaml
> index bc83155b3b4c..22ed5d368cbe 100644
> --- a/Documentation/devicetree/bindings/display/mediatek/mediatek,color.yaml
> +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,color.yaml
> @@ -32,14 +32,14 @@ properties:
>- enum:
>- mediatek,mt7623-disp-color
>- mediatek,mt2712-disp-color
> -  - enum:
> +  - const:
>- mediatek,mt2701-disp-color
>- items:
>- enum:
>- mediatek,mt8183-disp-color
>- mediatek,mt8192-disp-color
>- mediatek,mt8195-disp-color
> -  - enum:
> +  - const:
>- mediatek,mt8173-disp-color
>reg:
>  maxItems: 1
> diff --git 
> a/Documentation/devicetree/bindings/display/mediatek/mediatek,dither.yaml 
> b/Documentation/devicetree/bindings/display/mediatek/mediatek,dither.yaml
> index 9d89297f5f1d..2a82792b2625 100644
> --- a/Documentation/devicetree/bindings/display/mediatek/mediatek,dither.yaml
> +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,dither.yaml
> @@ -28,7 +28,7 @@ properties:
>- enum:
>- mediatek,mt8192-disp-dither
>- mediatek,mt8195-disp-dither
> -  - enum:
> +  - const:
>- mediatek,mt8183-disp-dither
>  
>reg:
> diff --git 
> a/Documentation/devicetree/bindings/display/mediatek/mediatek,gamma.yaml 
> b/Documentation/devicetree/bindings/display/mediatek/mediatek,gamma.yaml
> index 247baad147b3..45f88bdf07bb 100644
> --- a/Documentation/devicetree/bindings/display/mediatek/mediatek,gamma.yaml
> +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,gamma.yaml
> @@ -29,7 +29,7 @@ properties:
>- enum:
>- mediatek,mt8192-disp-gamma
>- mediatek,mt8195-disp-gamma
> -  - enum:
> +  - const:
>- mediatek,mt8183-disp-gamma
>  
>reg:
> diff --git 
> a/Documentation/devicetree/bindings/display/mediatek/mediatek,ovl.yaml 
> b/Documentation/devicetree/bindings/display/mediatek/mediatek,ovl.yaml
> index e71f79bc2dee..88ba3f27b0e4 100644
> --- a/Documentation/devicetree/bindings/display/mediatek/mediatek,ovl.yaml
> +++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,ovl.yaml
> @@ -33,12 +33,12 @@ properties:
>- enum:
>- mediatek,mt7623-disp-ovl
>- mediatek,mt2712-disp-ovl
> -  - enum:
> +  - const:
>- mediatek,mt2701-disp-ovl
>- items:
>- enum:
>- mediatek,mt8195-disp-ovl
> -  - enum:
> +  - const:
>- mediatek,mt8183-disp-ovl
>  
>reg:
> diff --git 
> a/Documentation/devicetree/bindings/display/mediatek/m

Re: [PATCH 5/6] drm/rcar_du: changes to rcar-du driver resulting from drm_writeback_connector structure changes

2022-03-02 Thread Abhinav Kumar




On 2/28/2022 5:42 AM, Laurent Pinchart wrote:

Hello,

On Mon, Feb 28, 2022 at 02:28:27PM +0200, Laurent Pinchart wrote:

On Mon, Feb 28, 2022 at 02:09:15PM +0200, Jani Nikula wrote:

On Mon, 28 Feb 2022, Laurent Pinchart wrote:

On Sat, Feb 26, 2022 at 10:27:59AM -0800, Rob Clark wrote:

On Wed, Feb 2, 2022 at 7:41 AM Jani Nikula wrote:

On Wed, 02 Feb 2022, Laurent Pinchart wrote:

On Wed, Feb 02, 2022 at 03:15:03PM +0200, Jani Nikula wrote:

On Wed, 02 Feb 2022, Laurent Pinchart wrote:

On Wed, Feb 02, 2022 at 02:24:28PM +0530, Kandpal Suraj wrote:

Changing rcar_du driver to accommodate the change of
drm_writeback_connector.base and drm_writeback_connector.encoder
to a pointer the reason for which is explained in the
Patch(drm: add writeback pointers to drm_connector).

Signed-off-by: Kandpal Suraj 
---
  drivers/gpu/drm/rcar-du/rcar_du_crtc.h  | 2 ++
  drivers/gpu/drm/rcar-du/rcar_du_writeback.c | 8 +---
  2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/rcar-du/rcar_du_crtc.h 
b/drivers/gpu/drm/rcar-du/rcar_du_crtc.h
index 66e8839db708..68f387a04502 100644
--- a/drivers/gpu/drm/rcar-du/rcar_du_crtc.h
+++ b/drivers/gpu/drm/rcar-du/rcar_du_crtc.h
@@ -72,6 +72,8 @@ struct rcar_du_crtc {
   const char *const *sources;
   unsigned int sources_count;

+ struct drm_connector connector;
+ struct drm_encoder encoder;


Those fields are, at best, poorly named. Furthermore, there's no need in
this driver or in other drivers using drm_writeback_connector to create
an encoder or connector manually. Let's not pollute all drivers because
i915 doesn't have its abstractions right.


i915 uses the quite common model for struct inheritance:

  struct intel_connector {
  struct drm_connector base;
  /* ... */
  }

Same with at least amd, ast, fsl-dcu, hisilicon, mga200, msm, nouveau,
radeon, tilcdc, and vboxvideo.
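
The usual companion to that embedding is a container_of() downcast back to
the wrapping structure; a minimal sketch along i915's lines (helper name and
header assumed, and only valid for connectors that really are embedded this
way):

#include <linux/container_of.h>

static inline struct intel_connector *
to_intel_connector(struct drm_connector *connector)
{
	/* Recover the wrapper from the embedded base member. */
	return container_of(connector, struct intel_connector, base);
}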

We could argue about the relative merits of that abstraction, but I
think the bottom line is that it's popular and the drivers using it are
not going to be persuaded to move away from it.


Nobody said inheritance is bad.


It's no coincidence that the drivers who've implemented writeback so far
(komeda, mali, rcar-du, vc4, and vkms) do not use the abstraction,
because the drm_writeback_connector midlayer does, forcing the issue.


Are you sure it's not a coincidence ? :-)

The encoder and especially connector created by drm_writeback_connector
are there only because KMS requires a drm_encoder and a drm_connector to
be exposed to userspace (and I could argue that using a connector for
writeback is a hack, but that won't change). The connector is "virtual",
I still fail to see why i915 or any other driver would need to wrap it
into something else. The whole point of the drm_writeback_connector
abstraction is that drivers do not have to manage the writeback
drm_connector manually, they shouldn't touch it at all.


The thing is, drm_writeback_connector_init() calling
drm_connector_init() on the drm_connector embedded in
drm_writeback_connector leads to that connector being added to the
drm_device's list of connectors. Ditto for the encoder.

All the driver code that handles drm_connectors would need to take into
account they might not be embedded in intel_connector. Throughout the
driver. Ditto for the encoders.


The assumption that a connector is embedded in intel_connector doesn't
really play that well with how bridge and panel connectors work.. so
in general this seems like a good thing to unwind.

But as a point of practicality, i915 is a large driver covering a lot
of generations of hw with a lot of users.  So I can understand
changing this design isn't something that can happen quickly or
easily.  IMO we should allow i915 to create it's own connector for
writeback, and just document clearly that this isn't the approach new
drivers should take.  I mean, I understand idealism, but sometimes a
dose of pragmatism is needed. :-)


i915 is big, but so is Intel. It's not fair to treat everybody else as a
second class citizen and let Intel get away without doing its homework.


Laurent, as you accuse us of not doing our homework, I'll point out that
we've been embedding drm crtc, encoder and connector ever since
modesetting support was added to i915 in 2008, since before *any* of the
things you now use as a rationale for asking us to do a massive rewrite
of the driver existed.

It's been ok to embed those structures for well over ten years. It's a
common pattern, basically throughout the kernel. Other drivers do it
too, not just i915. There hasn't been the slightest hint this should not
be done until this very conversation.


I want to see this refactoring effort moving forward in i915 (and moving
to drm_bridge would then be a good idea too). If writeback support in
i915 urgent, then we can discuss *temporary* pragmatic stopgap measures,
but not without a real effort to fix the core issue.


I think the onus is on you to first convince everyo

Re: [Intel-gfx] [PATCH v2 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow

2022-03-02 Thread John Harrison

On 3/2/2022 01:49, Tvrtko Ursulin wrote:

On 25/02/2022 20:41, john.c.harri...@intel.com wrote:

From: John Harrison 

GuC converts the pre-emption timeout and timeslice quantum values into
clock ticks internally. That significantly reduces the point of 32bit
overflow. On current platforms, worst case scenario is approximately
110 seconds. Rather than allowing the user to set higher values and
then get confused by early timeouts, add limits when setting these
values.

v2: Add helper functions for clamping (review feedback from Tvrtko).

Signed-off-by: John Harrison 
Reviewed-by: Daniele Ceraolo Spurio  
(v1)

---
  drivers/gpu/drm/i915/gt/intel_engine.h  |  6 ++
  drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 69 +
  drivers/gpu/drm/i915/gt/sysfs_engines.c | 25 +---
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h |  9 +++
  4 files changed, 99 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
b/drivers/gpu/drm/i915/gt/intel_engine.h

index be4b1e65442f..5a9186f784c4 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -349,4 +349,10 @@ intel_engine_get_hung_context(struct 
intel_engine_cs *engine)

  return engine->hung_ce;
  }
  +u64 intel_clamp_heartbeat_interval_ms(struct intel_engine_cs 
*engine, u64 value);
+u64 intel_clamp_max_busywait_duration_ns(struct intel_engine_cs 
*engine, u64 value);
+u64 intel_clamp_preempt_timeout_ms(struct intel_engine_cs *engine, 
u64 value);
+u64 intel_clamp_stop_timeout_ms(struct intel_engine_cs *engine, u64 
value);
+u64 intel_clamp_timeslice_duration_ms(struct intel_engine_cs 
*engine, u64 value);

+
  #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c

index e855c801ba28..7ad9e6006656 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -399,6 +399,26 @@ static int intel_engine_setup(struct intel_gt 
*gt, enum intel_engine_id id,

  if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS)
  engine->props.preempt_timeout_ms = 0;
  +    /* Cap properties according to any system limits */
+#define CLAMP_PROP(field) \
+    do { \
+    u64 clamp = intel_clamp_##field(engine, engine->props.field); \
+    if (clamp != engine->props.field) { \
+    drm_notice(&engine->i915->drm, \
+   "Warning, clamping %s to %lld to prevent 
overflow\n", \

+   #field, clamp); \
+    engine->props.field = clamp; \
+    } \
+    } while (0)
+
+    CLAMP_PROP(heartbeat_interval_ms);
+    CLAMP_PROP(max_busywait_duration_ns);
+    CLAMP_PROP(preempt_timeout_ms);
+    CLAMP_PROP(stop_timeout_ms);
+    CLAMP_PROP(timeslice_duration_ms);
+
+#undef CLAMP_PROP
+
  engine->defaults = engine->props; /* never to change again */
    engine->context_size = intel_engine_context_size(gt, 
engine->class);
@@ -421,6 +441,55 @@ static int intel_engine_setup(struct intel_gt 
*gt, enum intel_engine_id id,

  return 0;
  }
  +u64 intel_clamp_heartbeat_interval_ms(struct intel_engine_cs 
*engine, u64 value)

+{
+    value = min_t(u64, value, jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT));
+
+    return value;
+}
+
+u64 intel_clamp_max_busywait_duration_ns(struct intel_engine_cs 
*engine, u64 value)

+{
+    value = min(value, jiffies_to_nsecs(2));
+
+    return value;
+}
+
+u64 intel_clamp_preempt_timeout_ms(struct intel_engine_cs *engine, 
u64 value)

+{
+    /*
+ * NB: The GuC API only supports 32bit values. However, the 
limit is further
+ * reduced due to internal calculations which would otherwise 
overflow.

+ */
+    if (intel_guc_submission_is_wanted(&engine->gt->uc.guc))
+    value = min_t(u64, value, GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS);
+
+    value = min_t(u64, value, jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT));
+
+    return value;
+}
+
+u64 intel_clamp_stop_timeout_ms(struct intel_engine_cs *engine, u64 
value)

+{
+    value = min_t(u64, value, jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT));
+
+    return value;
+}
+
+u64 intel_clamp_timeslice_duration_ms(struct intel_engine_cs 
*engine, u64 value)

+{
+    /*
+ * NB: The GuC API only supports 32bit values. However, the 
limit is further
+ * reduced due to internal calculations which would otherwise 
overflow.

+ */
+    if (intel_guc_submission_is_wanted(&engine->gt->uc.guc))
+    value = min_t(u64, value, GUC_POLICY_MAX_EXEC_QUANTUM_MS);
+
+    value = min_t(u64, value, jiffies_to_msecs(MAX_SCHEDULE_TIMEOUT));
+
+    return value;
+}
+
  static void __setup_engine_capabilities(struct intel_engine_cs 
*engine)

  {
  struct drm_i915_private *i915 = engine->i915;
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c

index 967031056202..f2d9858d827c 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -144,7

Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow

2022-03-02 Thread John Harrison

On 3/2/2022 01:20, Tvrtko Ursulin wrote:

On 01/03/2022 19:57, John Harrison wrote:

On 3/1/2022 02:50, Tvrtko Ursulin wrote:

On 28/02/2022 18:32, John Harrison wrote:

On 2/28/2022 08:11, Tvrtko Ursulin wrote:

On 25/02/2022 17:39, John Harrison wrote:

On 2/25/2022 09:06, Tvrtko Ursulin wrote:


On 24/02/2022 19:19, John Harrison wrote:

[snip]


./gt/uc/intel_guc_fwif.h: u32 execution_quantum;

./gt/uc/intel_guc_submission.c: desc->execution_quantum = 
engine->props.timeslice_duration_ms * 1000;


./gt/intel_engine_types.h: unsigned long 
timeslice_duration_ms;


timeslice_store/preempt_timeout_store:
err = kstrtoull(buf, 0, &duration);

So both kconfig and sysfs can already overflow GuC, not 
only because of tick conversion internally but because at 
backend level nothing was done for assigning 64-bit into 
32-bit. Or I failed to find where it is handled.
That's why I'm adding this range check to make sure we 
don't allow overflows.


Yes and no, this fixes it, but the first bug was not only 
due to GuC internal tick conversion. It was present ever since 
the u64 from i915 was shoved into u32 sent to GuC. So even 
if GuC used the value without additional multiplication, the bug 
would be there. My point being that when the GuC backend was added, 
timeout_ms values should have been limited/clamped to 
U32_MAX. The tick discovery is an additional limit on top.
I'm not disagreeing. I'm just saying that the truncation 
wasn't noticed until I actually tried using very long 
timeouts to debug a particular problem. Now that it is 
noticed, we need some method of range checking and this 
simple clamp solves all the truncation problems.


Agreed in principle, just please mention in the commit message 
all aspects of the problem.


I think we can get away without a Fixes: tag since it requires 
user fiddling to break things in unexpected ways.


I would though put in a code a clamping which expresses both, 
something like min(u32, ..GUC LIMIT..). So the full story is 
documented forever. Or "if > u32 || > ..GUC LIMIT..) return 
-EINVAL". Just in case GuC limit one day changes but u32 
stays. Perhaps internal ticks go away or anything and we are 
left with plain 1:1 millisecond relationship.
Can certainly add a comment along the lines of "GuC API only 
takes a 32bit field but that is further reduced to GUC_LIMIT 
due to internal calculations which would otherwise overflow".


But if the GuC limit is > u32 then, by definition, that means 
the GuC API has changed to take a u64 instead of a u32. So 
there will be no u32 truncation any more. So I'm not seeing a need 
to explicitly test the integer size when the value check covers 
that.


Hmm I was thinking if the internal conversion in the GuC fw 
changes so that GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS goes above 
u32, then to be extra safe by documenting in code there is the 
additional limit of the data structure field. Say the field was 
changed to take some unit larger than a millisecond. Then the 
check against the GuC MAX limit define would not be enough, 
unless that would account both for internal implementation and 
u32 in the protocol. Maybe that is overdefensive but I don't see 
that it harms. 50-50, but it's do it once and forget so I'd do it.

Huh?

How can the limit be greater than a u32 if the interface only 
takes a u32? By definition the limit would be clamped to u32 size.


If you mean that the GuC policy is in different units and those 
units might not overflow but ms units do, then actually that is 
already the case. The GuC works in us not ms. That's part of why 
the wrap around is so low, we have to multiply by 1000 before 
sending to GuC. However, that is actually irrelevant because the 
comparison is being done on the i915 side in i915's units. We 
have to scale the GuC limit to match what i915 is using. And the 
i915 side is u64 so if the scaling to i915 numbers overflows a 
u32 then who cares because that comparison can be done at 64 bits 
wide.


If the units change then that is a backwards breaking API change 
that will require a manual driver code update. You can't just 
recompile with a new header and magically get an ms to us or ms 
to s conversion in your a = b assignment. The code will need to 
be changed to do the new unit conversion (note we already convert 
from ms to us, the GuC API is all expressed in us). And that code 
change will mean having to revisit any and all scaling, type 
conversions, etc. I.e. any pre-existing checks will not 
necessarily be valid and will need to be re-visited anyway. But as 
above, any scaling to GuC units has to be incorporated into the 
limit already because otherwise the limit would not fit in the 
GuC's own API.


Yes I get that, I was just worried that u32 field in the protocol 
and GUC_POLICY_MAX_EXEC_QUANTUM_MS defines are separate in the 
source code and then how to protect against forgetting to update 
both in sync.


Like if the protocol was changed to take nanoseconds, and firmware 
implementation changed to support t

Re: [RESEND PATCH] dt-bindings: display/msm: add missing brace in dpu-qcm2290.yaml

2022-03-02 Thread Rob Herring
On Wed, Mar 02, 2022 at 03:14:10AM +0300, Dmitry Baryshkov wrote:
> Add missing brace in dpu-qcm2290.yaml. While we are at it, also fix
> indentation for another brace, so it matches the corresponding line.
> 
> Reported-by: Rob Herring 
> Cc: Loic Poulain 
> Reviewed-by: Bjorn Andersson 
> Signed-off-by: Dmitry Baryshkov 
> ---
> Didn't include freedreno@ in the first email, so resending.
> ---
>  Documentation/devicetree/bindings/display/msm/dpu-qcm2290.yaml | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Thanks for fixing.

Acked-by: Rob Herring 


Re: [PATCH v2 1/2] dt-bindings: display: msm: Add optional resets

2022-03-02 Thread Rob Herring
On Tue, Mar 01, 2022 at 05:29:30PM -0800, Bjorn Andersson wrote:
> Add an optional reference to the MDSS_CORE reset, which when specified
> can be used by the implementation to reset the hardware blocks.
> 
> Signed-off-by: Bjorn Andersson 
> ---
> 
> Changes since v1:
> - New approach/patch
> 
>  .../devicetree/bindings/display/msm/dpu-qcm2290.yaml  | 4 
>  Documentation/devicetree/bindings/display/msm/dpu-sc7180.yaml | 4 
>  Documentation/devicetree/bindings/display/msm/dpu-sc7280.yaml | 4 
>  Documentation/devicetree/bindings/display/msm/dpu-sdm845.yaml | 4 
>  4 files changed, 16 insertions(+)

Acked-by: Rob Herring 



Re: [PATCH] dt-bindings: gpu: Convert aspeed-gfx bindings to yaml

2022-03-02 Thread Rob Herring
On Wed, Mar 02, 2022 at 03:40:56PM +1030, Joel Stanley wrote:
> Convert the bindings to yaml and add the ast2600 compatible string.
> 
> Signed-off-by: Joel Stanley 
> ---
>  .../devicetree/bindings/gpu/aspeed,gfx.yaml   | 69 +++
>  .../devicetree/bindings/gpu/aspeed-gfx.txt| 41 ---
>  2 files changed, 69 insertions(+), 41 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/gpu/aspeed,gfx.yaml
>  delete mode 100644 Documentation/devicetree/bindings/gpu/aspeed-gfx.txt

Applied, thanks.


Re: [Intel-gfx] [PATCH 2/3] drm/i915/gt: Make the heartbeat play nice with long pre-emption timeouts

2022-03-02 Thread John Harrison

On 3/2/2022 03:07, Tvrtko Ursulin wrote:

On 01/03/2022 20:59, John Harrison wrote:

On 3/1/2022 04:09, Tvrtko Ursulin wrote:


I'll trim it a bit again..

On 28/02/2022 18:55, John Harrison wrote:

On 2/28/2022 09:12, Tvrtko Ursulin wrote:

On 25/02/2022 18:48, John Harrison wrote:

On 2/25/2022 10:14, Tvrtko Ursulin wrote:


[snip]

Your only objection is that ends up with too long total time 
before reset? Or something else as well?
An unnecessarily long total heartbeat timeout is the main 
objection. (2.5 + 12) * 5 = 72.5 seconds. That is a massive 
change from the current 12.5s.


If we are happy with that huge increase then fine. But I'm pretty 
sure you are going to get a lot more bug reports about hung 
systems not recovering. 10-20s is just about long enough for 
someone to wait before leaning on the power button of their 
machine. Over a minute is not. That kind of delay is going to 
cause support issues.


Sorry I wrote 12s, while you actually said tP * 12, so 7.68s, 
chosen just so it is longer than tH * 3?


And how do you keep coming up with factor of five? Isn't it four 
periods before "heartbeat stopped"? (Prio normal, heartbeat, 
barrier and then reset.)

Prio starts at low not normal.


Right, slipped my mind since I only keep seeing that one priority 
ladder block in intel_engine_heartbeat.c/heartbeat()..


From the point of view of user experience I agree reasonable 
responsiveness is needed before user "reaches for the power button".


In your proposal we are talking about 3 * 2.5s + 2 * 7.5s, so 22.5s.

Question of workloads... what is the actual preempt timeout that compute 
is happy with? And I don't mean compute setups with disabled 
hangcheck, which you say they want anyway, but if we target 
defaults for end users. Do we have some numbers on what they are 
likely to run?
Not that I have ever seen. This is all just finger in the air 
stuff. I don't recall if we invented the number and the compute 
people agreed with it or if they proposed the number to us.


Yeah me neither. And found nothing in my email archives. :(

Thinking about it today I don't see that disabled timeout is a 
practical default.


With it, if users have something un-preemptable to run (assuming 
prio normal), it would get killed after ~13s (5 * 2.5).


If we go for my scheme it gets killed in ~17.5s (3 * (2.5 + 2.5) + 
2.5 (third pulse triggers preempt timeout)).


And if we go for your scheme it gets killed in ~22.5s (4 * 2.5 + 2 * 
3 * 2.5).
Erm, that is not an apples to apples comparison. Your 17.5 is for an 
engine reset tripped by the pre-emption timeout, but your 22.5s is 
for a GT reset tripped by the heartbeat reaching the end and nuking 
the universe.


Right, in your scheme I did get it wrong. It would wait for GuC to 
reset the engine at the end, rather than hit the fake "heartbeat 
stopped" full reset path in that case.


4 * 2.5 to trigger a max prio pulse, then 3 * 2.5 preempt timeout for 
GuC to reset (last heartbeat delay extended so it does not trigger). So 
17.5 as well.


Again, apples or oranges? I was using your tP(RCS) == 2.5s assumption in 
all the above calculations given that the discussion was about the 
heartbeat algorithm, not the choice of pre-emption timeout. In which 
case the last heartbeat is max(tP * 2, tH) == 2 * 2.5s.




If you are saying that the first pulse at sufficient priority (third 
being normal prio) is what causes the reset because the system is 
working as expected and the pre-emption timeout trips the reset. In 
that case, you have two periods to get to normal prio plus one 
pre-emption timeout to trip the reset. I.e. (tH * 2) + tP.


Your scheme is then tH(actual) = tH(user) + tP, yes?
So pre-emption based reset is after ((tH(user) + tP) * 2) + tP => (3 
* tP) + (2 * tH)

And GT based reset is after (tH(user) + tP) * 5 => (5 * tP) + (5 * tH)

My scheme is tH(actual) = tH(user) for first four, then max(tH(user), 
tP) for fifth.

So pre-emption based reset is after tH(user) * 2 + tP = > tP + (2 * tH);
And GT based reset is after (tH(user) * 4) + (max(tH(user), tP) * 1) 
=> greater of ((4 * tH) + tP) or (5 * tH)


Either way your scheme is longer. With tH(user) = 2.5s, tP(RCS) = 
7.5s, we get 27.5s for engine and 50s for GT versus my 12.5s for 
engine and 17.5s for GT. With tP(RCS) = 2.5s, yours is 12.5s for 
engine and 25s for GT versus my 7.5s for engine and 12.5s for GT.
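
For anyone trying to follow the arithmetic, the four formulas can be 
dropped into a small throwaway program; this only reproduces the quoted 
figures and is not driver code (scheme A extends every heartbeat pulse 
by tP, scheme B extends only the final one):

	#include <stdio.h>

	/* All times in seconds. "engine" is the pre-emption based reset,
	 * "GT" the full reset after the heartbeat gives up. */
	static void compare(double th, double tp)
	{
		double a_engine = 3 * tp + 2 * th;	/* (3 * tP) + (2 * tH) */
		double a_gt = 5 * tp + 5 * th;		/* (5 * tP) + (5 * tH) */
		double b_engine = tp + 2 * th;		/* tP + (2 * tH) */
		double b_gt = 4 * th + tp;		/* (4 * tH) + tP ... */

		if (5 * th > b_gt)			/* ... or (5 * tH) */
			b_gt = 5 * th;

		printf("tH=%.1f tP=%.1f: engine %.1f vs %.1f, GT %.1f vs %.1f\n",
		       th, tp, a_engine, b_engine, a_gt, b_gt);
	}

	int main(void)
	{
		compare(2.5, 7.5);	/* 27.5 vs 12.5 engine, 50.0 vs 17.5 GT */
		compare(2.5, 2.5);	/* 12.5 vs 7.5 engine, 25.0 vs 12.5 GT */
		return 0;
	}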


Plus, not sure why your calculations above are using 2.5 for tP? Are 
you still arguing that 7.5s is too long? That is a separate issue and 
not related to the heartbeat algorithms. tP must be long enough to 
allow 'out of box OpenCL workloads to complete'. That doesn't just 
mean not being killed by the heartbeat, it also means not being 
killed by running two of them concurrently (or one plus desktop 
OpenGL rendering) and not having it killed by basic time slicing 
between the two contexts. The heartbeat is not involved in that 
process. That is purely the pre-emption timeout. And that is the 
fun

[PATCH][next] drm: ssd130x: remove redundant initialization of pointer mode

2022-03-02 Thread Colin Ian King
Pointer mode is being assigned a value that is never read; it is
re-assigned later with a new value. The initialization is
redundant and can be removed.

Cleans up clang scan build warning:
drivers/gpu/drm/solomon/ssd130x.c:582:27: warning: Value stored
to 'mode' during its initialization is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/solomon/ssd130x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/solomon/ssd130x.c 
b/drivers/gpu/drm/solomon/ssd130x.c
index 92c1902f53e4..ce4dc20412e0 100644
--- a/drivers/gpu/drm/solomon/ssd130x.c
+++ b/drivers/gpu/drm/solomon/ssd130x.c
@@ -579,7 +579,7 @@ static const struct drm_simple_display_pipe_funcs 
ssd130x_pipe_funcs = {
 static int ssd130x_connector_get_modes(struct drm_connector *connector)
 {
struct ssd130x_device *ssd130x = drm_to_ssd130x(connector->dev);
-   struct drm_display_mode *mode = &ssd130x->mode;
+   struct drm_display_mode *mode;
struct device *dev = ssd130x->dev;
 
mode = drm_mode_duplicate(connector->dev, &ssd130x->mode);
-- 
2.34.1



Re: [Intel-gfx] [PATCH 0/3] Improve anti-pre-emption w/a for compute workloads

2022-03-02 Thread John Harrison

On 3/2/2022 03:21, Tvrtko Ursulin wrote:

On 28/02/2022 19:17, John Harrison wrote:

On 2/28/2022 07:32, Tvrtko Ursulin wrote:

On 25/02/2022 19:03, John Harrison wrote:

On 2/25/2022 10:29, Tvrtko Ursulin wrote:

On 25/02/2022 18:01, John Harrison wrote:

On 2/25/2022 09:39, Tvrtko Ursulin wrote:

On 25/02/2022 17:11, John Harrison wrote:

On 2/25/2022 08:36, Tvrtko Ursulin wrote:

On 24/02/2022 20:02, John Harrison wrote:

On 2/23/2022 04:00, Tvrtko Ursulin wrote:

On 23/02/2022 02:22, John Harrison wrote:

On 2/22/2022 01:53, Tvrtko Ursulin wrote:

On 18/02/2022 21:33, john.c.harri...@intel.com wrote:

From: John Harrison 

Compute workloads are inherently not pre-emptible on 
current hardware.
Thus the pre-emption timeout was disabled as a workaround 
to prevent
unwanted resets. Instead, the hang detection was left to 
the heartbeat
and its (longer) timeout. This is undesirable with GuC 
submission as
the heartbeat is a full GT reset rather than a per engine 
reset and so
is much more destructive. Instead, just bump the 
pre-emption timeout


Can we have a feature request to allow asking GuC for an 
engine reset?

For what purpose?


To allow "stopped heartbeat" to reset the engine, however..

GuC manages the scheduling of contexts across engines. With 
virtual engines, the KMD has no knowledge of which engine a 
context might be executing on. Even without virtual 
engines, the KMD still has no knowledge of which context is 
currently executing on any given engine at any given time.


There is a reason why hang detection should be left to the 
entity that is doing the scheduling. Any other entity is 
second guessing at best.


The reason for keeping the heartbeat around even when GuC 
submission is enabled is for the case where the KMD/GuC 
have got out of sync with either other somehow or GuC 
itself has just crashed. I.e. when no submission at all is 
working and we need to reset the GuC itself and start over.


.. I wasn't really up to speed to know/remember heartbeats 
are nerfed already in GuC mode.
Not sure what you mean by that claim. Engine resets are 
handled by GuC because GuC handles the scheduling. You can't 
do the former if you aren't doing the latter. However, the 
heartbeat is still present and is still the watchdog by which 
engine resets are triggered. As per the rest of the 
submission process, the hang detection and recovery is split 
between i915 and GuC.


I meant that "stopped heartbeat on engine XXX" can only do a 
full GPU reset on GuC.
I mean that there is no 'stopped heartbeat on engine XXX' when 
i915 is not handling the recovery part of the process.


H?

static void
reset_engine(struct intel_engine_cs *engine, struct i915_request 
*rq)

{
if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
    show_heartbeat(rq, engine);

if (intel_engine_uses_guc(engine))
    /*
 * GuC itself is toast or GuC's hang detection
 * is disabled. Either way, need to find the
 * hang culprit manually.
 */
    intel_guc_find_hung_context(engine);

intel_gt_handle_error(engine->gt, engine->mask,
  I915_ERROR_CAPTURE,
  "stopped heartbeat on %s",
  engine->name);
}

How is there no "stopped heartbeat" in GuC mode? From this code 
it certainly looks like there is.
Only when the GuC is toast and it is no longer an engine reset 
but a full GT reset that is required. So technically, it is not a 
'stopped heartbeat on engine XXX' it is 'stopped heartbeat on GT#'.




You say below heartbeats are going in GuC mode. Now I totally 
don't understand how they are going but there is allegedly no 
"stopped hearbeat".
Because if GuC is handling the detection and recovery then i915 
will not reach that point. GuC will do the engine reset and start 
scheduling the next context before the heartbeat period expires. 
So the notification will be a G2H about a specific context being 
reset rather than the i915 notification about a stopped heartbeat.






intel_gt_handle_error(engine->gt, engine->mask,
  I915_ERROR_CAPTURE,
  "stopped heartbeat on %s",
  engine->name);

intel_gt_handle_error:

/*
 * Try engine reset when available. We fall back to full 
reset if

 * single reset fails.
 */
if (!intel_uc_uses_guc_submission(>->uc) &&
    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
    local_bh_disable();
    for_each_engine_masked(engine, gt, engine_mask, tmp) {

You said "However, the heartbeat is still present and is still 
the watchdog by which engine resets are triggered", now I 
don't know what you meant by this. It actually triggers a 
single engine reset in GuC mode? Where in code does that 
happen if this block above shows it not taking the engine 
reset path?

i915 sends down the per engine pulse.
GuC schedules the pulse
GuC attempts to pre-empt the currently active context
GuC detects the pre-emption timeout
GuC resets th

[PATCH v1 06/10] drm/msm/a6xx: Propagate OOB set error

2022-03-02 Thread Akhil P Oommen
Propagate OOB set error to higher level so that a coredump is captured
followed by recovery sequence.

Signed-off-by: Akhil P Oommen 
---

 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 33 -
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 19 ---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  2 +-
 3 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index f121d798..66ae509 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -133,7 +133,7 @@ bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu)
A6XX_GMU_SPTPRAC_PWR_CLK_STATUS_GX_HM_CLK_OFF));
 }
 
-void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
+int a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
 {
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
@@ -145,7 +145,7 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct 
dev_pm_opp *opp)
gpu_freq = dev_pm_opp_get_freq(opp);
 
if (gpu_freq == gmu->freq)
-   return;
+   return 0;
 
for (perf_index = 0; perf_index < gmu->nr_gpu_freqs - 1; perf_index++)
if (gpu_freq == gmu->gpu_freqs[perf_index])
@@ -161,13 +161,13 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct 
dev_pm_opp *opp)
 * bring up the power if it isn't already active
 */
if (pm_runtime_get_if_in_use(gmu->dev) == 0)
-   return;
+   return 0;
 
if (!gmu->legacy) {
-   a6xx_hfi_set_freq(gmu, perf_index);
+   ret = a6xx_hfi_set_freq(gmu, perf_index);
dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
pm_runtime_put(gmu->dev);
-   return;
+   return ret;
}
 
gmu_write(gmu, REG_A6XX_GMU_DCVS_ACK_OPTION, 0);
@@ -182,15 +182,17 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct 
dev_pm_opp *opp)
gmu_write(gmu, REG_A6XX_GMU_DCVS_BW_SETTING, 0xff);
 
/* Set and clear the OOB for DCVS to trigger the GMU */
-   a6xx_gmu_set_oob(gmu, GMU_OOB_DCVS_SET);
+   ret = a6xx_gmu_set_oob(gmu, GMU_OOB_DCVS_SET);
a6xx_gmu_clear_oob(gmu, GMU_OOB_DCVS_SET);
 
-   ret = gmu_read(gmu, REG_A6XX_GMU_DCVS_RETURN);
-   if (ret)
+   if (!ret && gmu_read(gmu, REG_A6XX_GMU_DCVS_RETURN)) {
dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);
+   ret = -EINVAL;
+   }
 
dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
pm_runtime_put(gmu->dev);
+   return ret;
 }
 
 unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu)
@@ -353,11 +355,13 @@ int a6xx_gmu_set_oob(struct a6xx_gmu *gmu, enum 
a6xx_gmu_oob_state state)
ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_GMU2HOST_INTR_INFO, val,
val & (1 << ack), 100, 1);
 
-   if (ret)
+   if (ret) {
DRM_DEV_ERROR(gmu->dev,
"Timeout waiting for GMU OOB set %s: 0x%x\n",
a6xx_gmu_oob_bits[state].name,
gmu_read(gmu, REG_A6XX_GMU_GMU2HOST_INTR_INFO));
+   return -ETIMEDOUT;
+   }
 
/* Clear the acknowledge interrupt */
gmu_write(gmu, REG_A6XX_GMU_GMU2HOST_INTR_CLR, 1 << ack);
@@ -922,18 +926,21 @@ static void a6xx_gmu_force_off(struct a6xx_gmu *gmu)
a6xx_gmu_rpmh_off(gmu);
 }
 
-static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu 
*gmu)
+static int a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
 {
struct dev_pm_opp *gpu_opp;
unsigned long gpu_freq = gmu->gpu_freqs[gmu->current_perf_index];
+   int ret;
 
gpu_opp = dev_pm_opp_find_freq_exact(&gpu->pdev->dev, gpu_freq, true);
if (IS_ERR(gpu_opp))
-   return;
+   return PTR_ERR(gpu_opp);
 
gmu->freq = 0; /* so a6xx_gmu_set_freq() doesn't exit early */
-   a6xx_gmu_set_freq(gpu, gpu_opp);
+   ret = a6xx_gmu_set_freq(gpu, gpu_opp);
dev_pm_opp_put(gpu_opp);
+
+   return ret;
 }
 
 static void a6xx_gmu_set_initial_bw(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
@@ -1018,7 +1025,7 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
enable_irq(gmu->hfi_irq);
 
/* Set the GPU to the current freq */
-   a6xx_gmu_set_initial_freq(gpu, gmu);
+   ret = a6xx_gmu_set_initial_freq(gpu, gmu);
 
 out:
/* On failure, shut down the GMU to leave it in a good state */
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 8c3cb31..fdfc5c4 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -890,7 +890,7 @@ static int hw_init(struct msm_gpu *gpu)
int ret;
 
/* Make sure the GMU keeps the GPU on while we set it up *

[PATCH v1 10/10] drm/msm/a6xx: Free gmu_debug crashstate bo

2022-03-02 Thread Akhil P Oommen
Free gmu_debug bo while destroying the gpu crashstate.

Signed-off-by: Akhil P Oommen 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index 4d4588a..09bb993 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -1054,6 +1054,9 @@ static void a6xx_gpu_state_destroy(struct kref *kref)
if (a6xx_state->gmu_hfi)
kvfree(a6xx_state->gmu_hfi->data);
 
+   if (a6xx_state->gmu_debug)
+   kvfree(a6xx_state->gmu_debug->data);
+
list_for_each_entry_safe(obj, tmp, &a6xx_state->objs, node)
kfree(obj);
 
-- 
2.7.4



[PATCH v1 09/10] drm/msm: Remove pm_runtime_get() from msm_job_run()

2022-03-02 Thread Akhil P Oommen
We do pm_runtime_get() within msm_gpu_submit(). So remove the redundant
pm_runtime_get/put from msm_job_run().

Signed-off-by: Akhil P Oommen 
---

 drivers/gpu/drm/msm/msm_ringbuffer.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c 
b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 3bbf574..43fb04e 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -18,8 +18,6 @@ static struct dma_fence *msm_job_run(struct drm_sched_job 
*job)
 
submit->hw_fence = msm_fence_alloc(submit->ring->fctx);
 
-   pm_runtime_get_sync(&gpu->pdev->dev);
-
/* TODO move submit path over to using a per-ring lock.. */
mutex_lock(&gpu->lock);
 
@@ -27,8 +25,6 @@ static struct dma_fence *msm_job_run(struct drm_sched_job 
*job)
 
mutex_unlock(&gpu->lock);
 
-   pm_runtime_put(&gpu->pdev->dev);
-
return dma_fence_get(submit->hw_fence);
 }
 
-- 
2.7.4



[PATCH v1 08/10] drm/msm/a6xx: Remove clk votes on failure

2022-03-02 Thread Akhil P Oommen
Remove vote on clks on gpu resume failure.

Signed-off-by: Akhil P Oommen 
---

 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 66ae509..e90359f 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -1033,6 +1033,7 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
disable_irq(gmu->gmu_irq);
a6xx_gmu_inline_coredump(gmu);
a6xx_rpmh_stop(gmu);
+   clk_bulk_disable_unprepare(gmu->nr_clocks, gmu->clocks);
pm_runtime_put(gmu->gxpd);
pm_runtime_put(gmu->dev);
}
-- 
2.7.4



[PATCH v1 07/10] drm/msm/adreno: Retry on gpu resume failure

2022-03-02 Thread Akhil P Oommen
Retry infinitely on resume failure because there is nothing much we can
do if GPU is not ON. Also, this helps us to avoid checking for the
return value of pm_runtime_get() to see if GPU is ON.

Signed-off-by: Akhil P Oommen 
---

 drivers/gpu/drm/msm/adreno/adreno_device.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 89cfd84..abcc553 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -603,8 +603,16 @@ static const struct of_device_id dt_match[] = {
 static int adreno_resume(struct device *dev)
 {
struct msm_gpu *gpu = dev_to_gpu(dev);
+   int ret;
+
+   /* What hope do we have for the future if we can't turn ON gpu */
+   while (true) {
+   ret = gpu->funcs->pm_resume(gpu);
+   if (!ret)
+   break;
+   }
 
-   return gpu->funcs->pm_resume(gpu);
+   return 0;
 }
 
 static int active_submits(struct msm_gpu *gpu)
-- 
2.7.4



[PATCH v1 05/10] drm/msm: Do recovery on hw_init failure

2022-03-02 Thread Akhil P Oommen
Schedule the recover worker when there is hw init failure in
msm_gpu_submit(). The recover worker will take care of capturing
coredump, gpu recovery and resubmission of pending IBs.

Signed-off-by: Akhil P Oommen 
---

 drivers/gpu/drm/msm/msm_gpu.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index e8a442a..4d24fa1 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -757,12 +757,15 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit)
struct msm_drm_private *priv = dev->dev_private;
struct msm_ringbuffer *ring = submit->ring;
unsigned long flags;
+   int ret;
 
WARN_ON(!mutex_is_locked(&gpu->lock));
 
pm_runtime_get_sync(&gpu->pdev->dev);
 
-   msm_gpu_hw_init(gpu);
+   ret = msm_gpu_hw_init(gpu);
+   if (ret)
+   kthread_queue_work(gpu->worker, &gpu->recover_work);
 
submit->seqno = ++ring->seqno;
 
-- 
2.7.4



[PATCH v1 03/10] drm/msm/a6xx: Avoid gmu lock in pm ops

2022-03-02 Thread Akhil P Oommen
We don't really need gmu lock in runtime pm ops because these operations
are serialized anyway and also with other paths where we take this lock.
This patch will help to simplify the locking order when we introduce
crashstate_lock in the upcoming patch.

Signed-off-by: Akhil P Oommen 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 3faf551..8c3cb31 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1530,9 +1530,7 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
 
trace_msm_gpu_resume(0);
 
-   mutex_lock(&a6xx_gpu->gmu.lock);
ret = a6xx_gmu_resume(a6xx_gpu);
-   mutex_unlock(&a6xx_gpu->gmu.lock);
if (ret)
return ret;
 
@@ -1555,9 +1553,7 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
 
msm_devfreq_suspend(gpu);
 
-   mutex_lock(&a6xx_gpu->gmu.lock);
ret = a6xx_gmu_stop(a6xx_gpu);
-   mutex_unlock(&a6xx_gpu->gmu.lock);
if (ret)
return ret;
 
-- 
2.7.4



[PATCH v1 04/10] drm/msm/a6xx: Enhance debugging of gmu faults

2022-03-02 Thread Akhil P Oommen
Add support for inline capture of gmu coredump in gmu resume/suspend
path to help debug gmu error/faults. This is sort of a lite version of
gpu coredump with just gmu states. And we can't use recover_worker
in these scenarios because gmu is collapsed after a failure in this
path. Hence we need to capture the gmu states inline before it is
collapsed.

Signed-off-by: Akhil P Oommen 
---

 drivers/gpu/drm/msm/adreno/a6xx_gmu.c   | 18 +++--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 60 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 +++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +
 drivers/gpu/drm/msm/msm_gpu.c   | 23 ++-
 drivers/gpu/drm/msm/msm_gpu.h   | 11 --
 7 files changed, 105 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index f208a81..f121d798 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -1024,6 +1024,7 @@ int a6xx_gmu_resume(struct a6xx_gpu *a6xx_gpu)
/* On failure, shut down the GMU to leave it in a good state */
if (ret) {
disable_irq(gmu->gmu_irq);
+   a6xx_gmu_inline_coredump(gmu);
a6xx_rpmh_stop(gmu);
pm_runtime_put(gmu->gxpd);
pm_runtime_put(gmu->dev);
@@ -1082,6 +1083,7 @@ static void a6xx_gmu_shutdown(struct a6xx_gmu *gmu)
 {
struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+   int ret = 0;
u32 val;
 
/*
@@ -1091,10 +1093,11 @@ static void a6xx_gmu_shutdown(struct a6xx_gmu *gmu)
val = gmu_read(gmu, REG_A6XX_GPU_GMU_CX_GMU_RPMH_POWER_STATE);
 
if (val != 0xf) {
-   int ret = a6xx_gmu_wait_for_idle(gmu);
+   ret = a6xx_gmu_wait_for_idle(gmu);
 
/* If the GMU isn't responding assume it is hung */
if (ret) {
+   a6xx_gmu_inline_coredump(gmu);
a6xx_gmu_force_off(gmu);
return;
}
@@ -1102,7 +1105,9 @@ static void a6xx_gmu_shutdown(struct a6xx_gmu *gmu)
a6xx_bus_clear_pending_transactions(adreno_gpu);
 
/* tell the GMU we want to slumber */
-   a6xx_gmu_notify_slumber(gmu);
+   ret = a6xx_gmu_notify_slumber(gmu);
+   if (ret)
+   goto out;
 
ret = gmu_poll_timeout(gmu,
REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_STATUS, val,
@@ -1123,6 +1128,10 @@ static void a6xx_gmu_shutdown(struct a6xx_gmu *gmu)

REG_A6XX_GPU_GMU_AO_GPU_CX_BUSY_STATUS2));
}
 
+out:
+   if (ret)
+   a6xx_gmu_inline_coredump(gmu);
+
/* Turn off HFI */
a6xx_hfi_stop(gmu);
 
@@ -1146,9 +1155,10 @@ int a6xx_gmu_stop(struct a6xx_gpu *a6xx_gpu)
 * Force the GMU off if we detected a hang, otherwise try to shut it
 * down gracefully
 */
-   if (gmu->hung)
+   if (gmu->hung) {
+   a6xx_gmu_inline_coredump(gmu);
a6xx_gmu_force_off(gmu);
-   else
+   } else
a6xx_gmu_shutdown(gmu);
 
/* Remove the bus vote */
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 675aef0..2599443 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -86,5 +86,6 @@ void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state 
*state,
 struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu);
 int a6xx_gpu_state_put(struct msm_gpu_state *state);
 bool a6xx_is_smmu_stalled(struct msm_gpu *gpu);
+void a6xx_gmu_inline_coredump(struct a6xx_gmu *gmu);
 
 #endif /* __A6XX_GPU_H__ */
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index 09b2ff0..4d4588a 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -3,6 +3,7 @@
 
 #include 
 #include "msm_gem.h"
+#include "msm_gpu.h"
 #include "a6xx_gpu.h"
 #include "a6xx_gmu.h"
 #include "a6xx_gpu_state.h"
@@ -970,10 +971,19 @@ void a6xx_get_gmu_state(struct msm_gpu *gpu, struct 
a6xx_gpu_state *a6xx_state)
struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
 
-   if (gmu->hung)
+   if (gmu->hung) {
+   mutex_lock(&gmu->lock);
a6xx_gmu_send_nmi(gmu);
+   mutex_unlock(&gmu->lock);
+   }
 
a6xx_get_gmu_registers(gpu, a6xx_state);
+
+   a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, 
&a6xx_gpu->gmu.log);
+   a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, 
&a6xx_gpu->gmu.hfi);
+   a6xx_state->gmu_d

[PATCH v1 02/10] drm/msm/a6xx: Send NMI to gmu when it is hung

2022-03-02 Thread Akhil P Oommen
While capturing gmu state, first send an NMI to gmu when it is hung.
This helps to move gmu to a safe state.

Signed-off-by: Akhil P Oommen 
---

 drivers/gpu/drm/msm/adreno/a6xx_gmu.c   | 37 +
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h   |  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 14 ++-
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 3e325e2..f208a81 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -14,6 +14,37 @@
 #include "msm_gpu_trace.h"
 #include "msm_mmu.h"
 
+void a6xx_gmu_send_nmi(struct a6xx_gmu *gmu)
+{
+   struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
+   struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+   struct msm_gpu *gpu = &adreno_gpu->base;
+   u32 val;
+
+   if (a6xx_gmu_gx_is_on(gmu) && a6xx_is_smmu_stalled(gpu)) {
+   DRM_DEV_ERROR(gmu->dev,
+   "Skipping GMU NMI since SMMU is stalled\n");
+   }
+
+   /* Don't retrigger NMI if gmu reset is already active */
+   val = gmu_read(gmu, REG_A6XX_GMU_CM3_FW_INIT_RESULT);
+   if (val & 0xE00)
+   return;
+
+   /* Mask all interrupts from GMU first */
+   gmu_write(gmu, REG_A6XX_GMU_GMU2HOST_INTR_MASK, 0x);
+
+   /* Trigger NMI to make gmu save it's internal state to ddr */
+   val = gmu_read(gmu, REG_A6XX_GMU_CM3_CFG);
+   gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, val | BIT(9));
+
+   /* Barrier to ensure write is posted before we proceed */
+   wmb();
+
+   /* Small delay to ensure state copy is ddr is complete at GMU */
+   udelay(200);
+}
+
 static void a6xx_gmu_fault(struct a6xx_gmu *gmu)
 {
struct a6xx_gpu *a6xx_gpu = container_of(gmu, struct a6xx_gpu, gmu);
@@ -790,6 +821,12 @@ static int a6xx_gmu_fw_start(struct a6xx_gmu *gmu, 
unsigned int state)
gmu_write(gmu, REG_A6XX_GMU_CM3_FW_INIT_RESULT, 0);
gmu_write(gmu, REG_A6XX_GMU_CM3_BOOT_CONFIG, 0x02);
 
+   /*
+* Make sure that the NMI bit is cleared by configuring the reset value
+* here
+*/
+   gmu_write(gmu, REG_A6XX_GMU_CM3_CFG, 0x4052);
+
/* Write the iova of the HFI table */
gmu_write(gmu, REG_A6XX_GMU_HFI_QTBL_ADDR, gmu->hfi.iova);
gmu_write(gmu, REG_A6XX_GMU_HFI_QTBL_INFO, 1);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h 
b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
index 84bd516..4228ec1 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.h
@@ -186,5 +186,6 @@ int a6xx_hfi_set_freq(struct a6xx_gmu *gmu, int index);
 
 bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu);
 bool a6xx_gmu_sptprac_is_on(struct a6xx_gmu *gmu);
+void a6xx_gmu_send_nmi(struct a6xx_gmu *gmu);
 
 #endif
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index 7de9d2f..09b2ff0 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -964,6 +964,18 @@ static void a6xx_get_indexed_registers(struct msm_gpu *gpu,
a6xx_state->nr_indexed_regs = count;
 }
 
+void a6xx_get_gmu_state(struct msm_gpu *gpu, struct a6xx_gpu_state *a6xx_state)
+{
+   struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+   struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+   struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
+
+   if (gmu->hung)
+   a6xx_gmu_send_nmi(gmu);
+
+   a6xx_get_gmu_registers(gpu, a6xx_state);
+}
+
 struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu)
 {
struct a6xx_crashdumper _dumper = { 0 }, *dumper = NULL;
@@ -980,7 +992,7 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu 
*gpu)
/* Get the generic state from the adreno core */
adreno_gpu_state_get(gpu, &a6xx_state->base);
 
-   a6xx_get_gmu_registers(gpu, a6xx_state);
+   a6xx_get_gmu_state(gpu, a6xx_state);
 
a6xx_state->gmu_log = a6xx_snapshot_gmu_bo(a6xx_state, 
&a6xx_gpu->gmu.log);
a6xx_state->gmu_hfi = a6xx_snapshot_gmu_bo(a6xx_state, 
&a6xx_gpu->gmu.hfi);
-- 
2.7.4



[PATCH v1 01/10] drm/msm/a6xx: Add helper to check smmu is stalled

2022-03-02 Thread Akhil P Oommen
Add a helper function to check for stalled smmu and also avoid reading
RBBM_STATUS3 register which is in GX domain before ensuring GX is
ON.

Signed-off-by: Akhil P Oommen 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 8 +++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h   | 1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 4 +---
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 7d23c74..3faf551 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -14,6 +14,12 @@
 
 #define GPU_PAS_ID 13
 
+bool a6xx_is_smmu_stalled(struct msm_gpu *gpu)
+{
+   return !!(gpu_read(gpu, REG_A6XX_RBBM_STATUS3) &
+   A6XX_RBBM_STATUS3_SMMU_STALLED_ON_FAULT);
+}
+
 static inline bool _a6xx_check_idle(struct msm_gpu *gpu)
 {
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -1346,7 +1352,7 @@ static void a6xx_fault_detect_irq(struct msm_gpu *gpu)
 * to otherwise resume normally rather than killing the submit, so
 * just bail.
 */
-   if (gpu_read(gpu, REG_A6XX_RBBM_STATUS3) & 
A6XX_RBBM_STATUS3_SMMU_STALLED_ON_FAULT)
+   if (a6xx_is_smmu_stalled(gpu))
return;
 
/*
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 86e0a7c..675aef0 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -85,5 +85,6 @@ void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state 
*state,
 
 struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu *gpu);
 int a6xx_gpu_state_put(struct msm_gpu_state *state);
+bool a6xx_is_smmu_stalled(struct msm_gpu *gpu);
 
 #endif /* __A6XX_GPU_H__ */
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
index 55f4433..7de9d2f 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c
@@ -971,8 +971,6 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu 
*gpu)
struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
struct a6xx_gpu_state *a6xx_state = kzalloc(sizeof(*a6xx_state),
GFP_KERNEL);
-   bool stalled = !!(gpu_read(gpu, REG_A6XX_RBBM_STATUS3) &
-   A6XX_RBBM_STATUS3_SMMU_STALLED_ON_FAULT);
 
if (!a6xx_state)
return ERR_PTR(-ENOMEM);
@@ -1003,7 +1001,7 @@ struct msm_gpu_state *a6xx_gpu_state_get(struct msm_gpu 
*gpu)
 * write out GPU state, so we need to skip this when the SMMU is
 * stalled in response to an iova fault
 */
-   if (!stalled && !gpu->needs_hw_init &&
+   if (!a6xx_is_smmu_stalled(gpu) &&
!a6xx_crashdumper_init(gpu, &_dumper)) {
dumper = &_dumper;
}
-- 
2.7.4



[PATCH v1 00/10] Support for GMU coredump and some related improvements

2022-03-02 Thread Akhil P Oommen
The major enhancement in this series is support for a minimal gmu coredump
which can be captured inline instead of through our usual recover worker. It
is helpful in the case of gmu errors during the gpu wake-up/suspend path and
lets us capture a snapshot of the gmu before we do a suspend. I had to
introduce a lock to synchronize the crashstate because the runtime-suspend
can happen from an asynchronous RPM thread.

Apart from this, there are some improvements to gracefully handle the
gmu errors by propagating the error back to parent or by retrying. Also, a
few patches to fix some trivial bugs in the related code.


Akhil P Oommen (10):
  drm/msm/a6xx: Add helper to check smmu is stalled
  drm/msm/a6xx: Send NMI to gmu when it is hung
  drm/msm/a6xx: Avoid gmu lock in pm ops
  drm/msm/a6xx: Enhance debugging of gmu faults
  drm/msm: Do recovery on hw_init failure
  drm/msm/a6xx: Propagate OOB set error
  drm/msm/adreno: Retry on gpu resume failure
  drm/msm/a6xx: Remove clk votes on failure
  drm/msm: Remove pm_runtime_get() from msm_job_run()
  drm/msm/a6xx: Free gmu_debug crashstate bo

 drivers/gpu/drm/msm/adreno/a6xx_gmu.c   | 89 +++--
 drivers/gpu/drm/msm/adreno/a6xx_gmu.h   |  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 31 +++---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |  4 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 79 +
 drivers/gpu/drm/msm/adreno/adreno_device.c  | 10 +++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 10 +++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  2 +
 drivers/gpu/drm/msm/msm_gpu.c   | 28 -
 drivers/gpu/drm/msm/msm_gpu.h   | 11 ++--
 drivers/gpu/drm/msm/msm_ringbuffer.c|  4 --
 11 files changed, 218 insertions(+), 51 deletions(-)

-- 
2.7.4



Re: [PATCH 0/2] DSI host and peripheral initialisation ordering

2022-03-02 Thread Dave Stevenson
Hi Doug

On Wed, 2 Mar 2022 at 00:13, Doug Anderson  wrote:
>
> Hi,
>
> On Wed, Feb 16, 2022 at 9:00 AM Dave Stevenson
>  wrote:
> >
> > Hi All
> >
> > Hopefully I've cc'ed all those that have bashed this problem around 
> > previously,
> > or are otherwise linked to DRM bridges.
> >
> > There have been numerous discussions around how DSI support is currently 
> > broken
> > as it doesn't support initialising the PHY to LP-11 and potentially the 
> > clock
> > lane to HS prior to configuring the DSI peripheral. There is no op where the
> > interface is initialised but HS video isn't also being sent.
> > Currently you have:
> > - peripheral pre_enable (host not initialised yet)
> > - host pre_enable
> > - encoder enable
> > - host enable
> > - peripheral enable (video already running)
> >
> > vc4 and exynos currently implement the DSI host as an encoder, and split the
> > bridge_chain. This fails if you want to switch to being a bridge and/or use
> > atomic calls as the state of all the elements split off are not added by
> > drm_atomic_add_encoder_bridges.
> >
> > dw-mipi-dsi[1] and now msm[2] use the mode_set hook to initialise the PHY, 
> > so
> > the bridge/panel pre_enable can send commands. In their post_disable they 
> > then
> > call the downstream bridge/panel post_disable op manually so that shutdown
> > commands can be sent before shutting down the PHY. Nothing handles that 
> > fact,
> > so the framework then continues down the bridge chain and calls the 
> > post_disable
> > again, so we get unbalanced panel prepare/unprepare calls being reported 
> > [3].
> >
> > There have been patches[4] proposing reversing the entire direction of
> > pre_enable and post_disable, but that risks driving voltage into devices 
> > that
> > have yet to be powered up.
> > There have been discussions about adding either a pre_pre_enable, or adding 
> > a
> > DSI host_op to initialise the host[5]. Both require significant reworking 
> > to all
> > existing drivers in moving initialisation phases.
> > We have patches that look like they may well be addressing race conditions 
> > in
> > starting up a DSI peripheral[6].
>
> In general I'm happy to let the more senior people in DRM set the
> direction here so I probably won't do lots of review, but I will point
> out that I did have another proposal that sorta got lost in the noise
> of the whole "reversing the entire direction". That's basically:
>
> https://lists.freedesktop.org/archives/dri-devel/2021-October/328934.html
>
> I have no idea if something like that would work for your use case,
> but after analyzing it it felt like a surprisingly clean proposal even
> if my first instinct when I thought about it was that it was a hack.
> ;-) I suspect (but haven't analyzed your code) that it might be
> equivalent to your proposal of using a flag but maybe easier to wrap
> ones head around?

If I'm reading that right, then you're proposing adding
after_pre_enable and before_post_disable hooks.
That's almost the same as the power_up() and power_down() ops that
Dmitry suggested earlier, or pre_pre_enable / post_post_disable that
had also been considered.

Neither of those options handles the case of a longer chain in which
two non-consecutive links want their upstream bridge enabled first.
As per the clarification in patch 1/2, considering the chain
- Panel
 - Bridge 1
 - Bridge 2 DRM_BRIDGE_OP_UPSTREAM_FIRST
 - Bridge 3
 - Bridge 4 DRM_BRIDGE_OP_UPSTREAM_FIRST
 - Bridge 5
 - Encoder
With the flag option we call pre_enables as Panel, Bridge 1, Bridge 3,
Bridge 2, Bridge 5, Bridge 4, Encoder.
If adding after_pre_enable, then we end up with Panel, Bridge 1,
Bridge 3, Bridge 5, Bridge 4 (after_pre_enable), Bridge 2
(after_pre_enable), Encoder.
(power_on / pre_pre_enable from encoder to connector would end up with
Bridge 5 (power_on), Bridge 3 (power_on), Bridge 1 (power_on), Panel,
Bridge 2, Bridge 4, Encoder).
Those potentially work, but it seems a less logical order compared to
using a flag to swap only the bridges of interest. I think power_on /
pre_pre_enable covers DSI better than after_pre_enable.
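
To make the flag option concrete, here is a rough sketch of the
traversal it implies; the struct and helper below are invented purely
for illustration and are not the actual drm_bridge code:

	struct br {
		const char *name;
		bool upstream_first;	/* DRM_BRIDGE_OP_UPSTREAM_FIRST */
		void (*pre_enable)(struct br *br);
	};

	/* chain[0] is the panel end, chain[n - 1] the encoder end */
	static void pre_enable_chain(struct br **chain, int n)
	{
		int i;

		for (i = 0; i < n; i++) {
			if (chain[i]->upstream_first && i + 1 < n) {
				/* let the upstream neighbour go first */
				chain[i + 1]->pre_enable(chain[i + 1]);
				chain[i]->pre_enable(chain[i]);
				i++;	/* neighbour already handled */
			} else {
				chain[i]->pre_enable(chain[i]);
			}
		}
	}

Walking the example chain with this gives Panel, Bridge 1, Bridge 3,
Bridge 2, Bridge 5, Bridge 4, Encoder, i.e. the flag-option order above.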

Adding the extra ops requires the source bridge (eg DSI host) to know
the behaviour the sink bridge/panel wants. So would all DSI hosts have
to implement power_on to power up and enter LP-11? Some DSI peripherals
may be quite happy or even prefer to have the bus totally idle /
powered down at pre_enable, but there would be no way of implementing
that.

You seem to be looking at DP, which I have very little knowledge of,
and I don't quite understand your comments about the AUX bus and how
ordering should be configured. If your panel isn't a generic driver,
couldn't it request that the upstream bridge is pre_enabled first?

  Dave


Re: [Intel-gfx] [PATCH 6/6] treewide: remove check of list iterator against head past the loop body

2022-03-02 Thread Tvrtko Ursulin



On 28/02/2022 11:08, Jakob Koschel wrote:

When list_for_each_entry() completes the iteration over the whole list
without breaking the loop, the iterator value will be a bogus pointer
computed based on the head element.

While it is safe to use the pointer to determine if it was computed
based on the head element, either with list_entry_is_head() or
&pos->member == head, using the iterator variable after the loop should
be avoided.

In preparation to limiting the scope of a list iterator to the list
traversal loop, use a dedicated pointer to point to the found element.

Signed-off-by: Jakob Koschel 


[snip until i915 parts]


  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 14 +++---
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 15 ---
  drivers/gpu/drm/i915/gt/intel_ring.c  | 15 ---


[snip]


diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 00327b750fbb..80c79028901a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -107,25 +107,27 @@ static void lut_close(struct i915_gem_context *ctx)
radix_tree_for_each_slot(slot, &ctx->handles_vma, &iter, 0) {
struct i915_vma *vma = rcu_dereference_raw(*slot);
struct drm_i915_gem_object *obj = vma->obj;
-   struct i915_lut_handle *lut;
+   struct i915_lut_handle *lut = NULL;
+   struct i915_lut_handle *tmp;

if (!kref_get_unless_zero(&obj->base.refcount))
continue;

spin_lock(&obj->lut_lock);
-   list_for_each_entry(lut, &obj->lut_list, obj_link) {
-   if (lut->ctx != ctx)
+   list_for_each_entry(tmp, &obj->lut_list, obj_link) {
+   if (tmp->ctx != ctx)
continue;

-   if (lut->handle != iter.index)
+   if (tmp->handle != iter.index)
continue;

-   list_del(&lut->obj_link);
+   list_del(&tmp->obj_link);
+   lut = tmp;
break;
}
spin_unlock(&obj->lut_lock);

-   if (&lut->obj_link != &obj->lut_list) {
+   if (lut) {
i915_lut_handle_free(lut);
radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);


Looks okay although personally I would have left lut as is for a smaller 
diff and introduced a new local like 'found' or 'unlinked'.
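
Roughly, that alternative shape would be (same semantics as the posted 
diff, just keeping 'lut' as the iterator; sketch only):

	struct i915_lut_handle *lut, *found = NULL;

	spin_lock(&obj->lut_lock);
	list_for_each_entry(lut, &obj->lut_list, obj_link) {
		if (lut->ctx != ctx)
			continue;

		if (lut->handle != iter.index)
			continue;

		list_del(&lut->obj_link);
		found = lut;
		break;
	}
	spin_unlock(&obj->lut_lock);

	if (found) {
		i915_lut_handle_free(found);
		radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
	}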



i915_vma_close(vma);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 1736efa43339..fda9e3685ad2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2444,7 +2444,8 @@ static struct i915_request *eb_throttle(struct 
i915_execbuffer *eb, struct intel
  {
struct intel_ring *ring = ce->ring;
struct intel_timeline *tl = ce->timeline;
-   struct i915_request *rq;
+   struct i915_request *rq = NULL;
+   struct i915_request *tmp;

/*
 * Completely unscientific finger-in-the-air estimates for suitable
@@ -2460,15 +2461,17 @@ static struct i915_request *eb_throttle(struct 
i915_execbuffer *eb, struct intel
 * claiming our resources, but not so long that the ring completely
 * drains before we can submit our next request.
 */
-   list_for_each_entry(rq, &tl->requests, link) {
-   if (rq->ring != ring)
+   list_for_each_entry(tmp, &tl->requests, link) {
+   if (tmp->ring != ring)
continue;

-   if (__intel_ring_space(rq->postfix,
-  ring->emit, ring->size) > ring->size / 2)
+   if (__intel_ring_space(tmp->postfix,
+  ring->emit, ring->size) > ring->size / 
2) {
+   rq = tmp;
break;
+   }
}
-   if (&rq->link == &tl->requests)
+   if (!rq)
return NULL; /* weird, we will check again later for real */


Alternatively, instead of break it could simply do "return 
i915_request_get(rq);" and replace the end of the function after the 
loop with "return NULL;". A bit smaller diff, or at least less "spread 
out" over the function, so it might be easier to backport stuff touching 
this area in the future. But it looks correct as is.
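
For reference, the suggested shape would be roughly (a sketch of that 
review suggestion, not a tested diff):

	list_for_each_entry(rq, &tl->requests, link) {
		if (rq->ring != ring)
			continue;

		if (__intel_ring_space(rq->postfix,
				       ring->emit, ring->size) > ring->size / 2)
			return i915_request_get(rq);
	}

	return NULL; /* weird, we will check again later for real */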




return i915_request_get(rq);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c 
b/drivers/gpu/drm/i915/gt/intel_ring.c
index 2fdd52b62092..4881c4e0c407 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -191,24 +191,27 @@ wait_for_space(struct intel_ring *ring,
   struct intel_timeline *tl,
 

Re: [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range

2022-03-02 Thread Alex Deucher
On Wed, Mar 2, 2022 at 10:55 AM Michael Cheng  wrote:
>
> Thanks for the feedback Robin!
>
> Sorry, my choice of words wasn't great, but what I meant is to
> understand how ARM flushes a range of dcache for device drivers, and not
> an exact equivalent of the x86 clflush.
>
> I believe the concern is if the CPU writes an update, that update might
> only be sitting in the CPU cache and never make it to device memory
> where the device can see it; there are specific places that we are
> supposed to flush the CPU caches to make sure our updates are visible to
> the hardware.
>
> +Matt Roper
>
> Matt, Lucas, any feed back here?

MMIO (e.g., PCI BARs, etc.) should be mapped uncached.  If it's not
you'll have a lot of problems using a GPU on that architecture.  One
thing that you may want to check is if your device has its own caches
or write queues on the BAR aperture.  You may have to flush them after
CPU access to the BAR to make sure CPU updates land in device memory.
For system memory, PCI, per the spec, should be cache coherent with
the CPU.  If it's not, you'll have a lot of trouble using a GPU on
that platform.

Alex

>
> On 2022-03-02 4:49 a.m., Robin Murphy wrote:
> > On 2022-02-25 19:27, Michael Cheng wrote:
> >> Hi Robin,
> >>
> >> [ +arm64 maintainers for their awareness, which would have been a
> >> good thing to do from the start ]
> >>
> >>   * Thanks for adding the arm64 maintainer and sorry I didn't rope them
> >> in sooner.
> >>
> >> Why does i915 need to ensure the CPU's instruction cache is coherent
> >> with its data cache? Is it a self-modifying driver?
> >>
> >>   * Also thanks for pointing this out. Initially I was using
> >> dcache_clean_inval_poc, which seem to be the equivalently to what
> >> x86 is doing for dcache flushing, but it was giving me build errors
> >> since its not on the global list of kernel symbols. And after
> >> revisiting the documentation for caches_clean_inval_pou, it won't
> >> fly for what we are trying to do. Moving forward, what would you (or
> >> someone in the ARM community) suggest we do? Could it be possible to
> >> export dcache_clean_inval_poc as a global symbol?
> >
> > Unlikely, unless something with a legitimate need for CPU-centric
> > cache maintenance like kexec or CPU hotplug ever becomes modular.
> >
> > In the case of a device driver, it's not even the basic issues of
> > assuming to find direct equivalents to x86 semantics in other CPU
> > architectures, or effectively reinventing parts of the DMA API, it's
> > even bigger than that. Once you move from being integrated in a single
> > vendor's system architecture to being on a discrete card, you
> > fundamentally *no longer have any control over cache coherency*.
> > Whether the host CPU architecture happens to be AArch64, RISC-V, or
> > whatever doesn't really matter, you're at the mercy of 3rd-party PCIe
> > and interconnect IP vendors, and SoC integrators. You'll find yourself
> > in systems where PCIe simply cannot snoop any caches, where you'd
> > better have the correct DMA API calls in place to have any hope of
> > even the most basic functionality working properly; you'll find
> > yourself in systems where even if the PCIe root complex claims to
> > support No Snoop, your uncached traffic will still end up snooping
> > stale data that got prefetched back into caches you thought you'd
> > invalidated; you'll find yourself in systems where your memory
> > attributes may or may not get forcibly rewritten by an IOMMU depending
> > on the kernel config and/or command line.
> >
> > It's not about simply finding a substitute for clflush, it's that the
> > reasons you have for using clflush in the first place can no longer be
> > assumed to be valid.
> >
> > Robin.
> >
> >> On 2022-02-25 10:24 a.m., Robin Murphy wrote:
> >>> [ +arm64 maintainers for their awareness, which would have been a
> >>> good thing to do from the start ]
> >>>
> >>> On 2022-02-25 03:24, Michael Cheng wrote:
>  Add arm64 support for drm_clflush_virt_range. caches_clean_inval_pou
>  performs a flush by first performing a clean, follow by an
>  invalidation
>  operation.
> 
>  v2 (Michael Cheng): Use correct macro for cleaning and invalidation
>  the
>  dcache. Thanks Tvrtko for the suggestion.
> 
>  v3 (Michael Cheng): Replace asm/cacheflush.h with linux/cacheflush.h
> 
>  v4 (Michael Cheng): Arm64 does not export dcache_clean_inval_poc as a
>  symbol that could be use by other modules, thus use
>  caches_clean_inval_pou instead. Also this version
>  removes include for cacheflush, since its already
>  included base on architecture type.
> 
>  Signed-off-by: Michael Cheng 
>  Reviewed-by: Matt Roper 
>  ---
>    drivers/gpu/drm/drm_cache.c | 5 +
>    1 file changed, 5 insertions(+)
> 
>  diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/g

Re: [PATCH] ASoC: qcom: Fix error code in lpass_platform_copy()

2022-03-02 Thread Mark Brown
On Tue, 1 Mar 2022 11:11:04 +0300, Dan Carpenter wrote:
> The copy_to/from_user() functions return the number of bytes remaining
> to be copied.  This function needs to return negative error codes
> because snd_soc_pcm_component_copy_user() treats positive returns as
> success in soc_component_ret().
> 
> 
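
For readers unfamiliar with the failure mode being described, the shape
of the fix is roughly the following; an illustrative sketch, not the
actual lpass_platform_copy() code, which is not quoted here:

	/* needs linux/uaccess.h */
	static int example_copy_from_user(void *dst, const void __user *buf,
					  size_t count)
	{
		/*
		 * copy_from_user() returns the number of bytes it could NOT
		 * copy. Returning that count directly hands the caller a
		 * positive value, which soc_component_ret() treats as
		 * success, so it has to be turned into a negative errno.
		 */
		if (copy_from_user(dst, buf, count))
			return -EFAULT;

		return 0;
	}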

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: qcom: Fix error code in lpass_platform_copy()
  commit: d5dd781bcc81aa31b62310927f25cfa2574450f1

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH 3/3] dt-bindings: display: bridge: renesas, lvds: Document r8a77961 bindings

2022-03-02 Thread Geert Uytterhoeven
Hi Laurent,

On Wed, Dec 29, 2021 at 5:47 PM Laurent Pinchart
 wrote:
> On Fri, Dec 24, 2021 at 08:23:09AM +0300, Nikita Yushchenko wrote:
> > Document the R-Car M3-W+ (R8A77961) SoC in the R-Car LVDS encoder
> > bindings.
> >
> > Signed-off-by: Nikita Yushchenko 
> > ---
> >  .../devicetree/bindings/display/bridge/renesas,lvds.yaml | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git 
> > a/Documentation/devicetree/bindings/display/bridge/renesas,lvds.yaml 
> > b/Documentation/devicetree/bindings/display/bridge/renesas,lvds.yaml
> > index acfc327f70a7..ca5443e5c2e3 100644
> > --- a/Documentation/devicetree/bindings/display/bridge/renesas,lvds.yaml
> > +++ b/Documentation/devicetree/bindings/display/bridge/renesas,lvds.yaml
> > @@ -27,6 +27,7 @@ properties:
> >- renesas,r8a7791-lvds # for R-Car M2-W compatible LVDS encoders
> >- renesas,r8a7793-lvds # for R-Car M2-N compatible LVDS encoders
> >- renesas,r8a7795-lvds # for R-Car H3 compatible LVDS encoders
> > +  - renesas,r8a77961-lvds # for R-Car M3-W+ compatible LVDS encoders
>
> I'll move this line after the next to keep them sorted. No need to
> resubmit.

Any chance this will happen soon? Same for patch 1/3 .
Patch 2/3 is already queued in soc/for-next.

Thank you!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH v5 4/7] drm/i915/gt: create per-tile sysfs interface

2022-03-02 Thread Andrzej Hajda




On 17.02.2022 15:41, Andi Shyti wrote:

Now that we have tiles we want each of them to have its own
interface. A directory "gt/" is created under "cardN/" that will
contain as many directories as there are tiles.

In the coming patches tile related interfaces will be added. For
now the sysfs gt structure simply has an id interface related
to the current tile count.

The directory structure will follow this scheme:

 /sys/.../card0
  └── gt
      ├── gt0
      │   └── id
      :
      :
      └── gtN
          └── id

This new set of interfaces will be a basic tool for system
managers and administrators when using i915.

Signed-off-by: Andi Shyti 
Cc: Matt Roper 
Cc: Sujaritha Sundaresan 
Cc: Tvrtko Ursulin 
Reviewed-by: Sujaritha Sundaresan 
---
  drivers/gpu/drm/i915/Makefile|   1 +
  drivers/gpu/drm/i915/gt/intel_gt.c   |   2 +
  drivers/gpu/drm/i915/gt/intel_gt_sysfs.c | 118 +++
  drivers/gpu/drm/i915/gt/intel_gt_sysfs.h |  34 +++
  drivers/gpu/drm/i915/i915_drv.h  |   2 +
  drivers/gpu/drm/i915/i915_sysfs.c|  12 ++-
  drivers/gpu/drm/i915/i915_sysfs.h|   3 +
  7 files changed, 171 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
  create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_sysfs.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 9d588d936e3d..277064b00afd 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -105,6 +105,7 @@ gt-y += \
gt/intel_gt_pm_debugfs.o \
gt/intel_gt_pm_irq.o \
gt/intel_gt_requests.o \
+   gt/intel_gt_sysfs.o \
gt/intel_gtt.o \
gt/intel_llc.o \
gt/intel_lrc.o \
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index 8c64b81e9ec9..0f080bbad043 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -26,6 +26,7 @@
  #include "intel_rc6.h"
  #include "intel_renderstate.h"
  #include "intel_rps.h"
+#include "intel_gt_sysfs.h"
  #include "intel_uncore.h"
  #include "shmem_utils.h"
  
@@ -458,6 +459,7 @@ void intel_gt_driver_register(struct intel_gt *gt)

intel_rps_driver_register(>->rps);
  
  	intel_gt_debugfs_register(gt);

+   intel_gt_sysfs_register(gt);
  }
  
  static int intel_gt_init_scratch(struct intel_gt *gt, unsigned int size)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c 
b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
new file mode 100644
index ..0206e9aa4867
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "i915_drv.h"
+#include "i915_sysfs.h"
+#include "intel_gt.h"
+#include "intel_gt_sysfs.h"
+#include "intel_gt_types.h"
+#include "intel_rc6.h"
+
+bool is_object_gt(struct kobject *kobj)
+{
+   return !strncmp(kobj->name, "gt", 2);
+}


It looks quite fragile, but at the moment I do not have a better 
idea :) Maybe I will after reviewing the rest of the patches.



+
+static struct intel_gt *kobj_to_gt(struct kobject *kobj)
+{
+   return container_of(kobj, struct kobj_gt, base)->gt;
+}
+
+struct intel_gt *intel_gt_sysfs_get_drvdata(struct device *dev,
+   const char *name)
+{
+   struct kobject *kobj = &dev->kobj;
+
+   /*
+* We are interested at knowing from where the interface
+* has been called, whether it's called from gt/ or from
+* the parent directory.
+* From the interface position it depends also the value of
+* the private data.
+* If the interface is called from gt/ then private data is
+* of the "struct intel_gt *" type, otherwise it's * a
+* "struct drm_i915_private *" type.
+*/
+   if (!is_object_gt(kobj)) {
+   struct drm_i915_private *i915 = kdev_minor_to_i915(dev);
+
+   pr_devel_ratelimited(DEPRECATED
+   "%s (pid %d) is accessing deprecated %s "
+   "sysfs control, please use gt/gt/%s instead\n",
+   current->comm, task_pid_nr(current), name, name);
+   return to_gt(i915);
+   }
+
+   return kobj_to_gt(kobj);


It took some time for me to understand what is going on here.
We have a dev argument which can sometimes point to a "struct device" 
and sometimes to a "struct kobj_gt", but its type suggests otherwise, 
which is quite ugly.
I wonder if it wouldn't be better to use __ATTR instead of DEVICE_ATTR*, 
as in the case of intel_engines_add_sysfs. That way the abstractions 
would look better, hopefully.



+}
+
+static ssize_t id_show(struct device *dev,
+  struct device_attribute *attr,
+  char *buf)
+{
+   struct intel_gt *gt = intel_

Re: [PATCH] video: fbdev: sm712fb: Fix crash in smtcfb_write()

2022-03-02 Thread Helge Deller
On 3/2/22 15:33, Zheyu Ma wrote:
> When the sm712fb driver writes three bytes to the framebuffer, the
> driver will crash:
>
> BUG: unable to handle page fault for address: c90001ff
> RIP: 0010:smtcfb_write+0x454/0x5b0
> Call Trace:
>  vfs_write+0x291/0xd60
>  ? do_sys_openat2+0x27d/0x350
>  ? __fget_light+0x54/0x340
>  ksys_write+0xce/0x190
>  do_syscall_64+0x43/0x90
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Fix it by removing the open-coded endianness fixup-code.
>
> Signed-off-by: Zheyu Ma 

Thanks... it's already in the fbdev git tree and queued up for v5.18...
https://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev.git/commit/?h=for-next&id=bd771cf5c4254511cc4abb88f3dab3bd58bdf8e8

Helge


> ---
>  drivers/video/fbdev/sm712fb.c | 21 -
>  1 file changed, 4 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/video/fbdev/sm712fb.c b/drivers/video/fbdev/sm712fb.c
> index 0dbc6bf8268a..e355089ac7d6 100644
> --- a/drivers/video/fbdev/sm712fb.c
> +++ b/drivers/video/fbdev/sm712fb.c
> @@ -1130,7 +1130,7 @@ static ssize_t smtcfb_write(struct fb_info *info, const 
> char __user *buf,
>   count = total_size - p;
>   }
>
> - buffer = kmalloc((count > PAGE_SIZE) ? PAGE_SIZE : count, GFP_KERNEL);
> + buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
>   if (!buffer)
>   return -ENOMEM;
>
> @@ -1148,24 +1148,11 @@ static ssize_t smtcfb_write(struct fb_info *info, 
> const char __user *buf,
>   break;
>   }
>
> - for (i = c >> 2; i--;) {
> - fb_writel(big_swap(*src), dst++);
> + for (i = (c + 3) >> 2; i--;) {
> + fb_writel(big_swap(*src), dst);
> + dst++;
>   src++;
>   }
> - if (c & 3) {
> - u8 *src8 = (u8 *)src;
> - u8 __iomem *dst8 = (u8 __iomem *)dst;
> -
> - for (i = c & 3; i--;) {
> - if (i & 1) {
> - fb_writeb(*src8++, ++dst8);
> - } else {
> - fb_writeb(*src8++, --dst8);
> - dst8 += 2;
> - }
> - }
> - dst = (u32 __iomem *)dst8;
> - }
>
>   *ppos += c;
>   buf += c;



Re: [RFC PATCH] drm/panel: simple: panel-dpi: use bus-format to set bpc and bus_format

2022-03-02 Thread Marek Vasut

On 3/2/22 15:21, Maxime Ripard wrote:

Hi,


Hi,


Please try to avoid top posting

On Wed, Feb 23, 2022 at 04:25:19PM +0100, Max Krummenacher wrote:

The goal here is to set the element bus_format in the struct
panel_desc. This is an enum with the possible values defined in
include/uapi/linux/media-bus-format.h.

The enum values are not constructed in a way that you could calculate
the value from color channel width/shift/mapping/whatever. You rather
would have to check if the combination of color channel
width/shift/mapping/whatever maps to an existing value and otherwise
EINVAL out.

I don't see the value in having yet another way of how this
information can be specified and then having to write a more
complicated parser which maps the dt data to bus_format.


Generally speaking, sending an RFC without explicitly stating what you
want a comment on isn't very efficient.


Isn't that what RFC stands for -- Request For Comment ?


That being said, what I (and I can only assume Marek) don't like is the
string encoding. Especially when the similar bus-type property uses a
integer with the various available bus options we have.


Right, the string encoding isn't good.


Having an integer, with a set of defines that you would map to the
proper MEDIA_BUS_* would be more efficient and more elegant.

That being said, the first question that needs to be answered is why
does this have to be in the DT in the first place?


Because with panel-simple / panel-dpi, you may need to specify the bus format
between the last bridge and the panel.


Re: [PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range

2022-03-02 Thread Michael Cheng

Thanks for the feedback Robin!

Sorry, my choice of words wasn't great; what I meant was to understand how
ARM flushes a range of dcache for device drivers, not to find an exact
equivalent of x86 clflush.


I believe the concern is that if the CPU writes an update, that update might
only be sitting in the CPU cache and never make it to device memory
where the device can see it; there are specific places where we are
supposed to flush the CPU caches to make sure our updates are visible to
the hardware.
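For reference, a rough sketch of the portable way to make CPU writes visible
to a device without architecture-specific flush helpers, using the generic
DMA API (the function name and parameters below are placeholders, not code
from this series):

#include <linux/dma-mapping.h>

/*
 * Illustrative only: fill a buffer on the CPU, then map it for DMA so the
 * DMA API performs whatever cache maintenance the platform needs before
 * the device reads it.
 */
static int push_buffer_to_device(struct device *dev, void *vaddr, size_t size)
{
        dma_addr_t dma;

        /* ... CPU fills the buffer at vaddr ... */

        dma = dma_map_single(dev, vaddr, size, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, dma))
                return -ENOMEM;

        /* ... point the hardware at 'dma' and start the transfer ... */

        dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
        return 0;
}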


+Matt Roper

Matt, Lucas, any feed back here?

On 2022-03-02 4:49 a.m., Robin Murphy wrote:

On 2022-02-25 19:27, Michael Cheng wrote:

Hi Robin,

[ +arm64 maintainers for their awareness, which would have been a 
good thing to do from the start ]


  * Thanks for adding the arm64 maintainer and sorry I didn't rope them
    in sooner.

Why does i915 need to ensure the CPU's instruction cache is coherent 
with its data cache? Is it a self-modifying driver?


  * Also thanks for pointing this out. Initially I was using
    dcache_clean_inval_poc, which seems to be equivalent to what
    x86 is doing for dcache flushing, but it was giving me build errors
    since it's not on the global list of kernel symbols. And after
    revisiting the documentation for caches_clean_inval_pou, it won't
    fly for what we are trying to do. Moving forward, what would you (or
    someone in the ARM community) suggest we do? Would it be possible to
    export dcache_clean_inval_poc as a global symbol?


Unlikely, unless something with a legitimate need for CPU-centric 
cache maintenance like kexec or CPU hotplug ever becomes modular.


In the case of a device driver, it's not even the basic issues of 
assuming to find direct equivalents to x86 semantics in other CPU 
architectures, or effectively reinventing parts of the DMA API, it's 
even bigger than that. Once you move from being integrated in a single 
vendor's system architecture to being on a discrete card, you 
fundamentally *no longer have any control over cache coherency*. 
Whether the host CPU architecture happens to be AArch64, RISC-V, or 
whatever doesn't really matter, you're at the mercy of 3rd-party PCIe 
and interconnect IP vendors, and SoC integrators. You'll find yourself 
in systems where PCIe simply cannot snoop any caches, where you'd 
better have the correct DMA API calls in place to have any hope of 
even the most basic functionality working properly; you'll find 
yourself in systems where even if the PCIe root complex claims to 
support No Snoop, your uncached traffic will still end up snooping 
stale data that got prefetched back into caches you thought you'd 
invalidated; you'll find yourself in systems where your memory 
attributes may or may not get forcibly rewritten by an IOMMU depending 
on the kernel config and/or command line.


It's not about simply finding a substitute for clflush, it's that the 
reasons you have for using clflush in the first place can no longer be 
assumed to be valid.


Robin.


On 2022-02-25 10:24 a.m., Robin Murphy wrote:
[ +arm64 maintainers for their awareness, which would have been a 
good thing to do from the start ]


On 2022-02-25 03:24, Michael Cheng wrote:

Add arm64 support for drm_clflush_virt_range. caches_clean_inval_pou
performs a flush by first performing a clean, follow by an 
invalidation

operation.

v2 (Michael Cheng): Use correct macro for cleaning and invalidation 
the

    dcache. Thanks Tvrtko for the suggestion.

v3 (Michael Cheng): Replace asm/cacheflush.h with linux/cacheflush.h

v4 (Michael Cheng): Arm64 does not export dcache_clean_inval_poc as a
    symbol that could be use by other modules, thus use
    caches_clean_inval_pou instead. Also this version
    removes include for cacheflush, since its already
    included base on architecture type.

Signed-off-by: Michael Cheng 
Reviewed-by: Matt Roper 
---
  drivers/gpu/drm/drm_cache.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index c3e6e615bf09..81c28714f930 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -174,6 +174,11 @@ drm_clflush_virt_range(void *addr, unsigned 
long length)

    if (wbinvd_on_all_cpus())
  pr_err("Timed out waiting for cache flush\n");
+
+#elif defined(CONFIG_ARM64)
+    void *end = addr + length;
+    caches_clean_inval_pou((unsigned long)addr, (unsigned long)end);


Why does i915 need to ensure the CPU's instruction cache is coherent 
with its data cache? Is it a self-modifying driver?


Robin.

(Note that the above is somewhat of a loaded question, and I do 
actually have half an idea of what you're trying to do here and why 
it won't fly, but I'd like to at least assume you've read the 
documentation of the function you decided was OK to use)



+
  #else
  WARN_ONCE(1, "Architecture has no drm_cache.c support\n");
  #endif


[PATCH 8/8] drm/vmwgfx: Stop using surface dma commands on most configurations

2022-03-02 Thread Zack Rusin
From: Zack Rusin 

The initial version of guest backed objects in the host had performance
issues that made surface DMAs faster than direct copies. Surface DMAs
force a migration to vram, which at best is slow and at worst is
impossible (e.g. on svga3, where there's not enough vram to migrate
framebuffers to it).

Slowly migrate away from surface DMAs to direct copies by limiting
their usage to systems with more than 32MB of vram.

Signed-off-by: Zack Rusin 
Reviewed-by: Maaz Mombasawala 
Reviewed-by: Martin Krastev 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c
index 42b5ecb0c5e9..eb014b97d156 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c
@@ -138,6 +138,11 @@ static void vmw_stdu_destroy(struct 
vmw_screen_target_display_unit *stdu);
  * Screen Target Display Unit CRTC Functions
  */
 
+static bool vmw_stdu_use_cpu_blit(const struct vmw_private *vmw)
+{
+   return !(vmw->capabilities & SVGA_CAP_3D) || vmw->vram_size < (32 * 
1024 * 1024);
+}
+
 
 /**
  * vmw_stdu_crtc_destroy - cleans up the STDU
@@ -689,7 +694,7 @@ int vmw_kms_stdu_dma(struct vmw_private *dev_priv,
container_of(vfb, struct vmw_framebuffer_bo, base)->buffer;
struct vmw_stdu_dirty ddirty;
int ret;
-   bool cpu_blit = !(dev_priv->capabilities & SVGA_CAP_3D);
+   bool cpu_blit = vmw_stdu_use_cpu_blit(dev_priv);
DECLARE_VAL_CONTEXT(val_ctx, NULL, 0);
 
/*
@@ -1164,7 +1169,7 @@ vmw_stdu_primary_plane_prepare_fb(struct drm_plane *plane,
 * so cache these mappings
 */
if (vps->content_fb_type == SEPARATE_BO &&
-   !(dev_priv->capabilities & SVGA_CAP_3D))
+   vmw_stdu_use_cpu_blit(dev_priv))
vps->cpp = new_fb->pitches[0] / new_fb->width;
 
return 0;
@@ -1368,7 +1373,7 @@ static int vmw_stdu_plane_update_bo(struct vmw_private 
*dev_priv,
bo_update.base.vfb = vfb;
bo_update.base.out_fence = out_fence;
bo_update.base.mutex = NULL;
-   bo_update.base.cpu_blit = !(dev_priv->capabilities & SVGA_CAP_3D);
+   bo_update.base.cpu_blit = vmw_stdu_use_cpu_blit(dev_priv);
bo_update.base.intr = false;
 
/*
-- 
2.32.0



[PATCH 6/8] drm/vmwgfx: Initialize drm_mode_fb_cmd2

2022-03-02 Thread Zack Rusin
From: Zack Rusin 

The transition from drm_mode_fb_cmd to drm_mode_fb_cmd2 left the structure
uninitialized. drm_mode_fb_cmd2 adds a few additional members, e.g. flags
and modifiers, which were never initialized. Garbage in those members
can cause random failures during the bringup of the fbcon.

Initializing the structure fixes random blank screens after bootup due
to flags/modifiers mismatches during the fbcon bring up.

Fixes: dabdcdc9822a ("drm/vmwgfx: Switch to mode_cmd2")
Signed-off-by: Zack Rusin 
Cc: Daniel Vetter 
Cc:  # v4.10+
Reviewed-by: Martin Krastev 
Reviewed-by: Maaz Mombasawala 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_fb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
index 8ee34576c7d0..adf17c740656 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
@@ -483,7 +483,7 @@ static int vmw_fb_kms_detach(struct vmw_fb_par *par,
 
 static int vmw_fb_kms_framebuffer(struct fb_info *info)
 {
-   struct drm_mode_fb_cmd2 mode_cmd;
+   struct drm_mode_fb_cmd2 mode_cmd = {0};
struct vmw_fb_par *par = info->par;
struct fb_var_screeninfo *var = &info->var;
struct drm_framebuffer *cur_fb;
-- 
2.32.0



[PATCH 5/8] drm/vmwgfx: Allow querying of the SVGA PCI id from the userspace

2022-03-02 Thread Zack Rusin
From: Zack Rusin 

Mesa3D loaders require knowledge of the device's PCI id. SVGAv2 and v3
have different PCI ids, but the same driver is used to handle them both.
To allow the Mesa3D svga driver to be loaded automatically for both SVGAv2
and SVGAv3, make the kernel return the PCI id of the currently running
device.

Signed-off-by: Zack Rusin 
Reviewed-by: Martin Krastev 
Reviewed-by: Maaz Mombasawala 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c | 27 +++
 include/uapi/drm/vmwgfx_drm.h |  9 -
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
index 471da2b4c177..a1da5678c731 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0 OR MIT
 /**
  *
- * Copyright 2009-2015 VMware, Inc., Palo Alto, CA., USA
+ * Copyright 2009-2022 VMware, Inc., Palo Alto, CA., USA
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the
@@ -27,9 +27,11 @@
 
 #include "vmwgfx_drv.h"
 #include "vmwgfx_devcaps.h"
-#include 
 #include "vmwgfx_kms.h"
 
+#include 
+#include 
+
 int vmw_getparam_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_priv)
 {
@@ -62,17 +64,15 @@ int vmw_getparam_ioctl(struct drm_device *dev, void *data,
break;
case DRM_VMW_PARAM_FIFO_HW_VERSION:
{
-   if ((dev_priv->capabilities & SVGA_CAP_GBOBJECTS)) {
+   if ((dev_priv->capabilities & SVGA_CAP_GBOBJECTS))
param->value = SVGA3D_HWVERSION_WS8_B1;
-   break;
-   }
-
-   param->value =
-   vmw_fifo_mem_read(dev_priv,
- ((vmw_fifo_caps(dev_priv) &
-   SVGA_FIFO_CAP_3D_HWVERSION_REVISED) 
?
-  
SVGA_FIFO_3D_HWVERSION_REVISED :
-  SVGA_FIFO_3D_HWVERSION));
+   else
+   param->value = vmw_fifo_mem_read(
+  dev_priv,
+  ((vmw_fifo_caps(dev_priv) &
+
SVGA_FIFO_CAP_3D_HWVERSION_REVISED) ?
+   
SVGA_FIFO_3D_HWVERSION_REVISED :
+   
SVGA_FIFO_3D_HWVERSION));
break;
}
case DRM_VMW_PARAM_MAX_SURF_MEMORY:
@@ -108,6 +108,9 @@ int vmw_getparam_ioctl(struct drm_device *dev, void *data,
case DRM_VMW_PARAM_GL43:
param->value = has_gl43_context(dev_priv);
break;
+   case DRM_VMW_PARAM_DEVICE_ID:
+   param->value = to_pci_dev(dev_priv->drm.dev)->device;
+   break;
default:
return -EINVAL;
}
diff --git a/include/uapi/drm/vmwgfx_drm.h b/include/uapi/drm/vmwgfx_drm.h
index 8277644c1144..26549c86a91f 100644
--- a/include/uapi/drm/vmwgfx_drm.h
+++ b/include/uapi/drm/vmwgfx_drm.h
@@ -1,6 +1,6 @@
 /**
  *
- * Copyright © 2009-2015 VMware, Inc., Palo Alto, CA., USA
+ * Copyright © 2009-2022 VMware, Inc., Palo Alto, CA., USA
  * All Rights Reserved.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
@@ -92,6 +92,12 @@ extern "C" {
  *
  * DRM_VMW_PARAM_SM5
  * SM5 support is enabled.
+ *
+ * DRM_VMW_PARAM_GL43
+ * SM5.1+GL4.3 support is enabled.
+ *
+ * DRM_VMW_PARAM_DEVICE_ID
+ * PCI ID of the underlying SVGA device.
  */
 
 #define DRM_VMW_PARAM_NUM_STREAMS  0
@@ -111,6 +117,7 @@ extern "C" {
 #define DRM_VMW_PARAM_SM4_114
 #define DRM_VMW_PARAM_SM5  15
 #define DRM_VMW_PARAM_GL43 16
+#define DRM_VMW_PARAM_DEVICE_ID17
 
 /**
  * enum drm_vmw_handle_type - handle type for ref ioctls
-- 
2.32.0



[PATCH 7/8] drm/vmwgfx: Implement MSI/MSI-X support for IRQs

2022-03-02 Thread Zack Rusin
From: Zack Rusin 

SVGAv3 deprecates legacy interrupts and adds support for MSI/MSI-X. With
MSI the driver-visible side remains largely unchanged, but with MSI-X
each interrupt gets delivered on its own vector.

Add support for MSI/MSI-X while preserving the old functionality for
SVGAv2. The code is exactly the same between SVGAv2 and SVGAv3; only
the number of available vectors changes, in particular between legacy
and MSI-X interrupts.

Signed-off-by: Zack Rusin 
Reviewed-by: Martin Krastev 
Reviewed-by: Maaz Mombasawala 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h |  9 -
 drivers/gpu/drm/vmwgfx/vmwgfx_irq.c | 54 +
 3 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index f43afd56915e..791f9a5f3868 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -980,7 +980,7 @@ static int vmw_driver_load(struct vmw_private *dev_priv, 
u32 pci_id)
}
 
if (dev_priv->capabilities & SVGA_CAP_IRQMASK) {
-   ret = vmw_irq_install(&dev_priv->drm, pdev->irq);
+   ret = vmw_irq_install(dev_priv);
if (ret != 0) {
drm_err(&dev_priv->drm,
"Failed installing irq: %d\n", ret);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 12eb4de41036..be19aa6e1f13 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -65,6 +65,11 @@
 #define VMWGFX_PCI_ID_SVGA2  0x0405
 #define VMWGFX_PCI_ID_SVGA3  0x0406
 
+/*
+ * This has to match get_count_order(SVGA_IRQFLAG_MAX)
+ */
+#define VMWGFX_MAX_NUM_IRQS 6
+
 /*
  * Perhaps we should have sysfs entries for these.
  */
@@ -532,6 +537,8 @@ struct vmw_private {
bool has_mob;
spinlock_t hw_lock;
bool assume_16bpp;
+   u32 irqs[VMWGFX_MAX_NUM_IRQS];
+   u32 num_irq_vectors;
 
enum vmw_sm_type sm_type;
 
@@ -1158,7 +1165,7 @@ bool vmw_cmd_describe(const void *buf, u32 *size, char 
const **cmd);
  * IRQs and wating - vmwgfx_irq.c
  */
 
-extern int vmw_irq_install(struct drm_device *dev, int irq);
+extern int vmw_irq_install(struct vmw_private *dev_priv);
 extern void vmw_irq_uninstall(struct drm_device *dev);
 extern bool vmw_seqno_passed(struct vmw_private *dev_priv,
uint32_t seqno);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_irq.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_irq.c
index fe4732bf2c9d..482989b1f211 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_irq.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_irq.c
@@ -300,6 +300,7 @@ void vmw_irq_uninstall(struct drm_device *dev)
struct vmw_private *dev_priv = vmw_priv(dev);
struct pci_dev *pdev = to_pci_dev(dev->dev);
uint32_t status;
+   u32 i;
 
if (!(dev_priv->capabilities & SVGA_CAP_IRQMASK))
return;
@@ -309,20 +310,61 @@ void vmw_irq_uninstall(struct drm_device *dev)
status = vmw_irq_status_read(dev_priv);
vmw_irq_status_write(dev_priv, status);
 
-   free_irq(pdev->irq, dev);
+   for (i = 0; i < dev_priv->num_irq_vectors; ++i)
+   free_irq(dev_priv->irqs[i], dev);
+
+   pci_free_irq_vectors(pdev);
+   dev_priv->num_irq_vectors = 0;
 }
 
 /**
  * vmw_irq_install - Install the irq handlers
  *
- * @dev:  Pointer to the drm device.
- * @irq:  The irq number.
+ * @dev_priv:  Pointer to the vmw_private device.
  * Return:  Zero if successful. Negative number otherwise.
  */
-int vmw_irq_install(struct drm_device *dev, int irq)
+int vmw_irq_install(struct vmw_private *dev_priv)
 {
+   struct pci_dev *pdev = to_pci_dev(dev_priv->drm.dev);
+   struct drm_device *dev = &dev_priv->drm;
+   int ret;
+   int nvec;
+   int i = 0;
+
+   BUILD_BUG_ON((SVGA_IRQFLAG_MAX >> VMWGFX_MAX_NUM_IRQS) != 1);
+   BUG_ON(VMWGFX_MAX_NUM_IRQS != get_count_order(SVGA_IRQFLAG_MAX));
+
+   nvec = pci_alloc_irq_vectors(pdev, 1, VMWGFX_MAX_NUM_IRQS,
+PCI_IRQ_ALL_TYPES);
+
+   if (nvec <= 0) {
+   drm_err(&dev_priv->drm,
+   "IRQ's are unavailable, nvec: %d\n", nvec);
+   goto done;
+   }
+
vmw_irq_preinstall(dev);
 
-   return request_threaded_irq(irq, vmw_irq_handler, vmw_thread_fn,
-   IRQF_SHARED, VMWGFX_DRIVER_NAME, dev);
+   for (i = 0; i < nvec; ++i) {
+   ret = pci_irq_vector(pdev, i);
+   if (ret < 0) {
+   drm_err(&dev_priv->drm,
+   "failed getting irq vector: %d\n", ret);
+   goto done;
+   }
+   dev_priv->irqs[i] = ret;
+
+   ret = request_threaded_irq(dev_priv->irqs[i], vmw_irq_handler, 
vmw_thread_fn,
+  

[PATCH 1/8] drm/vmwgfx: Add support for CursorMob and CursorBypass 4

2022-03-02 Thread Zack Rusin
From: Martin Krastev 

* Add support for CursorMob
* Add support for CursorBypass 4
* Refactor vmw_du_cursor_plane_atomic_update to be kms-helper-atomic
  -- move BO mappings to vmw_du_cursor_plane_prepare_fb
  -- move BO unmappings to vmw_du_cursor_plane_cleanup_fb

Cursor mobs are a new svga feature which enables support for large
cursors, e.g. a large accessibility cursor on platforms with vmwgfx. The
change also cleans up the cursor code and makes it more uniform with the
rest of the modern guest backed objects support.

Signed-off-by: Martin Krastev 
Reviewed-by: Zack Rusin 
Reviewed-by: Maaz Mombasawala 
Signed-off-by: Zack Rusin 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h  |   6 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c  | 399 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h  |  28 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c  |  18 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c |  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c |  16 +-
 7 files changed, 382 insertions(+), 104 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 26eb5478394a..621231c66fd4 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0 OR MIT
 /**
  *
- * Copyright 2009-2016 VMware, Inc., Palo Alto, CA., USA
+ * Copyright 2009-2022 VMware, Inc., Palo Alto, CA., USA
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index ea3ecdda561d..780da6147cf9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR MIT */
 /**
  *
- * Copyright 2009-2021 VMware, Inc., Palo Alto, CA., USA
+ * Copyright 2009-2022 VMware, Inc., Palo Alto, CA., USA
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the
@@ -101,6 +101,7 @@ struct vmw_fpriv {
  * struct vmw_buffer_object - TTM buffer object with vmwgfx additions
  * @base: The TTM buffer object
  * @res_tree: RB tree of resources using this buffer object as a backing MOB
+ * @base_mapped_count: ttm BO mapping count; used by KMS atomic helpers.
  * @cpu_writers: Number of synccpu write grabs. Protected by reservation when
  * increased. May be decreased without reservation.
  * @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -111,6 +112,9 @@ struct vmw_fpriv {
 struct vmw_buffer_object {
struct ttm_buffer_object base;
struct rb_root res_tree;
+   /* For KMS atomic helpers: ttm bo mapping count */
+   atomic_t base_mapped_count;
+
atomic_t cpu_writers;
/* Not ref-counted.  Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index bbd2f4ec08ec..9d82a7b49aed 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0 OR MIT
 /**
  *
- * Copyright 2009-2015 VMware, Inc., Palo Alto, CA., USA
+ * Copyright 2009-2022 VMware, Inc., Palo Alto, CA., USA
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the
@@ -41,7 +41,7 @@ void vmw_du_cleanup(struct vmw_display_unit *du)
struct vmw_private *dev_priv = vmw_priv(du->primary.dev);
drm_plane_cleanup(&du->primary);
if (vmw_cmd_supported(dev_priv))
-   drm_plane_cleanup(&du->cursor);
+   drm_plane_cleanup(&du->cursor.base);
 
drm_connector_unregister(&du->connector);
drm_crtc_cleanup(&du->crtc);
@@ -53,23 +53,43 @@ void vmw_du_cleanup(struct vmw_display_unit *du)
  * Display Unit Cursor functions
  */
 
-static int vmw_cursor_update_image(struct vmw_private *dev_priv,
-  u32 *image, u32 width, u32 height,
-  u32 hotspotX, u32 hotspotY)
+static void vmw_cursor_update_mob(struct vmw_private *dev_priv,
+ struct ttm_buffer_object *bo,
+ struct ttm_bo_kmap_obj *map,
+ u32 *image, u32 width, u32 height,
+ u32 hotspotX, u32 hotspotY);
+
+struct vmw_svga_fifo_cmd_define_cursor {
+   u32 cmd;
+   SVGAFifoCmdDefineAlphaCursor cursor;
+};
+
+static void vmw_cursor_update_image(struct vmw_private *dev_priv,
+

[PATCH 4/8] drm/vmwgfx: Fix fencing on SVGAv3

2022-03-02 Thread Zack Rusin
From: Zack Rusin 

The port of vmwgfx to SVGAv3 lacked support for fencing. SVGAv3 removed
FIFOs and replaced them with command buffers and extra registers.
The initial version of SVGAv3 lacked support for most advanced features
(e.g. 3D), which made fences unnecessary. That is no longer the case,
especially as 3D support is being turned on.

Switch from FIFO commands and capabilities to command buffers and extra
registers to enable fences on SVGAv3.

Fixes: 2cd80dbd3551 ("drm/vmwgfx: Add basic support for SVGA3")
Signed-off-by: Zack Rusin 
Reviewed-by: Martin Krastev 
Reviewed-by: Maaz Mombasawala 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c   |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h   |  8 
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 28 ---
 drivers/gpu/drm/vmwgfx/vmwgfx_irq.c   | 26 +
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c   |  8 +---
 5 files changed, 53 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c
index a3bfbb6c3e14..bf1b394753da 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c
@@ -528,7 +528,7 @@ int vmw_cmd_send_fence(struct vmw_private *dev_priv, 
uint32_t *seqno)
*seqno = atomic_add_return(1, &dev_priv->marker_seq);
} while (*seqno == 0);
 
-   if (!(vmw_fifo_caps(dev_priv) & SVGA_FIFO_CAP_FENCE)) {
+   if (!vmw_has_fences(dev_priv)) {
 
/*
 * Don't request hardware to send a fence. The
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 780da6147cf9..12eb4de41036 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -1683,4 +1683,12 @@ static inline void vmw_irq_status_write(struct 
vmw_private *vmw,
outl(status, vmw->io_start + SVGA_IRQSTATUS_PORT);
 }
 
+static inline bool vmw_has_fences(struct vmw_private *vmw)
+{
+   if ((vmw->capabilities & (SVGA_CAP_COMMAND_BUFFERS |
+ SVGA_CAP_CMD_BUFFERS_2)) != 0)
+   return true;
+   return (vmw_fifo_caps(vmw) & SVGA_FIFO_CAP_FENCE) != 0;
+}
+
 #endif
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 59d6a2dd4c2e..66cc35dc223e 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -82,6 +82,22 @@ fman_from_fence(struct vmw_fence_obj *fence)
return container_of(fence->base.lock, struct vmw_fence_manager, lock);
 }
 
+static u32 vmw_fence_goal_read(struct vmw_private *vmw)
+{
+   if ((vmw->capabilities2 & SVGA_CAP2_EXTRA_REGS) != 0)
+   return vmw_read(vmw, SVGA_REG_FENCE_GOAL);
+   else
+   return vmw_fifo_mem_read(vmw, SVGA_FIFO_FENCE_GOAL);
+}
+
+static void vmw_fence_goal_write(struct vmw_private *vmw, u32 value)
+{
+   if ((vmw->capabilities2 & SVGA_CAP2_EXTRA_REGS) != 0)
+   vmw_write(vmw, SVGA_REG_FENCE_GOAL, value);
+   else
+   vmw_fifo_mem_write(vmw, SVGA_FIFO_FENCE_GOAL, value);
+}
+
 /*
  * Note on fencing subsystem usage of irqs:
  * Typically the vmw_fences_update function is called
@@ -392,7 +408,7 @@ static bool vmw_fence_goal_new_locked(struct 
vmw_fence_manager *fman,
if (likely(!fman->seqno_valid))
return false;
 
-   goal_seqno = vmw_fifo_mem_read(fman->dev_priv, SVGA_FIFO_FENCE_GOAL);
+   goal_seqno = vmw_fence_goal_read(fman->dev_priv);
if (likely(passed_seqno - goal_seqno >= VMW_FENCE_WRAP))
return false;
 
@@ -400,9 +416,8 @@ static bool vmw_fence_goal_new_locked(struct 
vmw_fence_manager *fman,
list_for_each_entry(fence, &fman->fence_list, head) {
if (!list_empty(&fence->seq_passed_actions)) {
fman->seqno_valid = true;
-   vmw_fifo_mem_write(fman->dev_priv,
-  SVGA_FIFO_FENCE_GOAL,
-  fence->base.seqno);
+   vmw_fence_goal_write(fman->dev_priv,
+fence->base.seqno);
break;
}
}
@@ -434,13 +449,12 @@ static bool vmw_fence_goal_check_locked(struct 
vmw_fence_obj *fence)
if (dma_fence_is_signaled_locked(&fence->base))
return false;
 
-   goal_seqno = vmw_fifo_mem_read(fman->dev_priv, SVGA_FIFO_FENCE_GOAL);
+   goal_seqno = vmw_fence_goal_read(fman->dev_priv);
if (likely(fman->seqno_valid &&
   goal_seqno - fence->base.seqno < VMW_FENCE_WRAP))
return false;
 
-   vmw_fifo_mem_write(fman->dev_priv, SVGA_FIFO_FENCE_GOAL,
-  fence->base.seqno);
+   vmw_fence_goal_write(fman->dev_priv, fence->base.seqno);
fman->seqno_valid = true;
 
return true;
diff --git a/drivers/gpu/drm/

[PATCH 2/8] drm/vmwgfx: Cleanup multimon initialization code

2022-03-02 Thread Zack Rusin
From: Zack Rusin 

The results of the legacy display unit initialization were being silently
ignored. Unifying the selection of the number of display units based
on whether the underlying device supports multimon makes it easier
to add error checking to all paths.

This makes the driver report errors in the ldu initialization paths
and try to recover from them.

Signed-off-by: Zack Rusin 
Reviewed-by: Martin Krastev 
Reviewed-by: Maaz Mombasawala 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
index 643c1608ddfd..e4347faccee0 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
@@ -492,6 +492,8 @@ int vmw_kms_ldu_init_display(struct vmw_private *dev_priv)
 {
struct drm_device *dev = &dev_priv->drm;
int i, ret;
+   int num_display_units = (dev_priv->capabilities & SVGA_CAP_MULTIMON) ?
+   VMWGFX_NUM_DISPLAY_UNITS : 1;
 
if (unlikely(dev_priv->ldu_priv)) {
return -EINVAL;
@@ -506,21 +508,17 @@ int vmw_kms_ldu_init_display(struct vmw_private *dev_priv)
dev_priv->ldu_priv->last_num_active = 0;
dev_priv->ldu_priv->fb = NULL;
 
-   /* for old hardware without multimon only enable one display */
-   if (dev_priv->capabilities & SVGA_CAP_MULTIMON)
-   ret = drm_vblank_init(dev, VMWGFX_NUM_DISPLAY_UNITS);
-   else
-   ret = drm_vblank_init(dev, 1);
+   ret = drm_vblank_init(dev, num_display_units);
if (ret != 0)
goto err_free;
 
vmw_kms_create_implicit_placement_property(dev_priv);
 
-   if (dev_priv->capabilities & SVGA_CAP_MULTIMON)
-   for (i = 0; i < VMWGFX_NUM_DISPLAY_UNITS; ++i)
-   vmw_ldu_init(dev_priv, i);
-   else
-   vmw_ldu_init(dev_priv, 0);
+   for (i = 0; i < num_display_units; ++i) {
+   ret = vmw_ldu_init(dev_priv, i);
+   if (ret != 0)
+   goto err_free;
+   }
 
dev_priv->active_display_unit = vmw_du_legacy;
 
-- 
2.32.0



[PATCH 3/8] drm/vmwgfx: Print capabilities early during the initialization

2022-03-02 Thread Zack Rusin
From: Zack Rusin 

Capabilities were logged at the end of initialization, so any early error
would keep them out of the logs, which is exactly when they're needed
the most.
Print the capabilities right after fetching them, before the init
code starts using them, to make sure they always show up in the logs.

Signed-off-by: Zack Rusin 
Reviewed-by: Martin Krastev 
Reviewed-by: Maaz Mombasawala 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 621231c66fd4..f43afd56915e 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -848,12 +848,16 @@ static int vmw_driver_load(struct vmw_private *dev_priv, 
u32 pci_id)
 
 
dev_priv->capabilities = vmw_read(dev_priv, SVGA_REG_CAPABILITIES);
-
+   vmw_print_bitmap(&dev_priv->drm, "Capabilities",
+dev_priv->capabilities,
+cap1_names, ARRAY_SIZE(cap1_names));
if (dev_priv->capabilities & SVGA_CAP_CAP2_REGISTER) {
dev_priv->capabilities2 = vmw_read(dev_priv, SVGA_REG_CAP2);
+   vmw_print_bitmap(&dev_priv->drm, "Capabilities2",
+dev_priv->capabilities2,
+cap2_names, ARRAY_SIZE(cap2_names));
}
 
-
ret = vmw_dma_select_mode(dev_priv);
if (unlikely(ret != 0)) {
drm_info(&dev_priv->drm,
@@ -939,14 +943,6 @@ static int vmw_driver_load(struct vmw_private *dev_priv, 
u32 pci_id)
 "MOB limits: max mob size = %u kB, max mob pages = %u\n",
 dev_priv->max_mob_size / 1024, dev_priv->max_mob_pages);
 
-   vmw_print_bitmap(&dev_priv->drm, "Capabilities",
-dev_priv->capabilities,
-cap1_names, ARRAY_SIZE(cap1_names));
-   if (dev_priv->capabilities & SVGA_CAP_CAP2_REGISTER)
-   vmw_print_bitmap(&dev_priv->drm, "Capabilities2",
-dev_priv->capabilities2,
-cap2_names, ARRAY_SIZE(cap2_names));
-
ret = vmw_dma_masks(dev_priv);
if (unlikely(ret != 0))
goto out_err0;
-- 
2.32.0



[PATCH 0/8] drm/vmwgfx: 3D on arm64 and large cursor support

2022-03-02 Thread Zack Rusin
From: Zack Rusin 

This series finishes 3D support on arm64 with vmwgfx. With this, and with
changes that add the svga3 pci ids to Mesa3D, OpenGL 4.3 and GLES 3.1 work
smoothly on arm64.

Most changes are not svga3 specific but rather could be classified as
generic improvements. That's particularly true of the support for cursor
mobs, which enables large cursors on both svga2 and svga3, and of the fix
to the initialization of the drm_mode_fb_cmd2 struct.

Martin Krastev (1):
  drm/vmwgfx: Add support for CursorMob and CursorBypass 4

Zack Rusin (7):
  drm/vmwgfx: Cleanup multimon initialization code
  drm/vmwgfx: Print capabilities early during the initialization
  drm/vmwgfx: Fix fencing on SVGAv3
  drm/vmwgfx: Allow querying of the SVGA PCI id from the userspace
  drm/vmwgfx: Initialize drm_mode_fb_cmd2
  drm/vmwgfx: Implement MSI/MSI-X support for IRQs
  drm/vmwgfx: Stop using surface dma commands on most configurations

 drivers/gpu/drm/vmwgfx/vmwgfx_cmd.c   |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c   |  20 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h   |  23 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fb.c|   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c |  28 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c |  27 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_irq.c   |  80 -
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c   | 407 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.h   |  28 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c   |  36 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c  |  17 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c  |  27 +-
 include/uapi/drm/vmwgfx_drm.h |   9 +-
 13 files changed, 538 insertions(+), 168 deletions(-)

-- 
2.32.0



Re: [PATCH V2 04/12] drm: bridge: icn6211: Add DSI lane count DT property parsing

2022-03-02 Thread Marek Vasut

On 3/2/22 11:01, Maxime Ripard wrote:

On Thu, Feb 17, 2022 at 01:25:22AM +0100, Marek Vasut wrote:

The driver currently hard-codes the DSI lane count to two; however, the chip
is capable of operating in 1..4 DSI lane mode. Parse the 'data-lanes' DT
property and program the result into the DSI_CTRL register.

Signed-off-by: Marek Vasut 
Cc: Jagan Teki 
Cc: Maxime Ripard 
Cc: Robert Foss 
Cc: Sam Ravnborg 
Cc: Thomas Zimmermann 
To: dri-devel@lists.freedesktop.org
---
V2: Rebase on next-20220214
---
  drivers/gpu/drm/bridge/chipone-icn6211.c | 21 -
  1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bridge/chipone-icn6211.c 
b/drivers/gpu/drm/bridge/chipone-icn6211.c
index 2ac8eb7e25f52..7c013a08c7b00 100644
--- a/drivers/gpu/drm/bridge/chipone-icn6211.c
+++ b/drivers/gpu/drm/bridge/chipone-icn6211.c
@@ -136,10 +136,12 @@ struct chipone {
struct drm_bridge bridge;
struct drm_display_mode mode;
struct drm_bridge *panel_bridge;
+   struct device_node *host_node;
struct gpio_desc *enable_gpio;
struct regulator *vdd1;
struct regulator *vdd2;
struct regulator *vdd3;
+   int dsi_lanes;
  };
  
  static inline struct chipone *bridge_to_chipone(struct drm_bridge *bridge)

@@ -212,6 +214,11 @@ static void chipone_atomic_enable(struct drm_bridge 
*bridge,
/* dsi specific sequence */
ICN6211_DSI(icn, SYNC_EVENT_DLY, 0x80);
ICN6211_DSI(icn, HFP_MIN, hfp & 0xff);
+
+   /* DSI data lane count */
+   ICN6211_DSI(icn, DSI_CTRL,
+   DSI_CTRL_UNKNOWN | DSI_CTRL_DSI_LANES(icn->dsi_lanes - 1));
+
ICN6211_DSI(icn, MIPI_PD_CK_LANE, 0xa0);
ICN6211_DSI(icn, PLL_CTRL(12), 0xff);
  
@@ -314,6 +321,7 @@ static const struct drm_bridge_funcs chipone_bridge_funcs = {

  static int chipone_parse_dt(struct chipone *icn)
  {
struct device *dev = icn->dev;
+   struct device_node *endpoint;
struct drm_panel *panel;
int ret;
  
@@ -350,6 +358,16 @@ static int chipone_parse_dt(struct chipone *icn)

return PTR_ERR(icn->enable_gpio);
}
  
+	endpoint = of_graph_get_endpoint_by_regs(dev->of_node, 0, 0);

+   icn->dsi_lanes = of_property_count_u32_elems(endpoint, "data-lanes");


The binding must be amended to allow for the usage of data-lanes, and
you need to keep the previous value as default for older device trees


Regarding the default value -- there are no in-tree users of this driver 
yet (per git grep in current linux-next), do we really care about 
backward compatibility in this case?
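
As an illustration of Maxime's point about keeping a default for older
device trees, a minimal sketch could look like the following (illustrative
only; the fallback of two lanes is assumed from the previously hard-coded
value mentioned in the commit message):

	/*
	 * Illustrative sketch: if backward compatibility is wanted, fall
	 * back to the previously hard-coded two lanes when the endpoint
	 * has no 'data-lanes' property.
	 */
	ret = of_property_count_u32_elems(endpoint, "data-lanes");
	icn->dsi_lanes = (ret > 0) ? ret : 2;
	if (icn->dsi_lanes < 1 || icn->dsi_lanes > 4)
		return -EINVAL;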


Re: [PATCH V2 03/12] drm: bridge: icn6211: Add HS/VS/DE polarity handling

2022-03-02 Thread Marek Vasut

On 3/2/22 10:59, Maxime Ripard wrote:

On Thu, Feb 17, 2022 at 01:25:21AM +0100, Marek Vasut wrote:

The driver currently hard-codes HS/VS polarity to active-low and DE to
active-high, which is not correct for a lot of supported DPI panels.
Add the missing mode flag handling for HS/VS/DE polarity.

Signed-off-by: Marek Vasut 
Cc: Jagan Teki 
Cc: Maxime Ripard 
Cc: Robert Foss 
Cc: Sam Ravnborg 
Cc: Thomas Zimmermann 
To: dri-devel@lists.freedesktop.org
---
V2: Rebase on next-20220214
---
  drivers/gpu/drm/bridge/chipone-icn6211.c | 16 +++-
  1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bridge/chipone-icn6211.c 
b/drivers/gpu/drm/bridge/chipone-icn6211.c
index e29e6a84c39a6..2ac8eb7e25f52 100644
--- a/drivers/gpu/drm/bridge/chipone-icn6211.c
+++ b/drivers/gpu/drm/bridge/chipone-icn6211.c
@@ -165,8 +165,16 @@ static void chipone_atomic_enable(struct drm_bridge 
*bridge,
  struct drm_bridge_state *old_bridge_state)
  {
struct chipone *icn = bridge_to_chipone(bridge);
+   struct drm_atomic_state *state = old_bridge_state->base.state;
struct drm_display_mode *mode = &icn->mode;
+   const struct drm_bridge_state *bridge_state;
u16 hfp, hbp, hsync;
+   u32 bus_flags;
+   u8 pol;
+
+   /* Get the DPI flags from the bridge state. */
+   bridge_state = drm_atomic_get_new_bridge_state(state, bridge);
+   bus_flags = bridge_state->output_bus_cfg.flags;
  
  	ICN6211_DSI(icn, MIPI_CFG_PW, MIPI_CFG_PW_CONFIG_DSI);
  
@@ -206,7 +214,13 @@ static void chipone_atomic_enable(struct drm_bridge *bridge,

ICN6211_DSI(icn, HFP_MIN, hfp & 0xff);
ICN6211_DSI(icn, MIPI_PD_CK_LANE, 0xa0);
ICN6211_DSI(icn, PLL_CTRL(12), 0xff);
-   ICN6211_DSI(icn, BIST_POL, BIST_POL_DE_POL);
+
+   /* DPI HS/VS/DE polarity */
+   pol = ((mode->flags & DRM_MODE_FLAG_PHSYNC) ? BIST_POL_HSYNC_POL : 0) |
+ ((mode->flags & DRM_MODE_FLAG_PVSYNC) ? BIST_POL_VSYNC_POL : 0) |
+ ((bus_flags & DRM_BUS_FLAG_DE_HIGH) ? BIST_POL_DE_POL : 0);


Is there a reason you didn't use bus_flags for all the polarities there?


Because there is no separate HS/VS bus flag, the HS/VS polarity is pulled
from the mode flags.


Re: [PATCH v16 4/4] drm/bridge: dw-hdmi: fix bus formats negotiation for 8 bit modes

2022-03-02 Thread Neil Armstrong

Hi,

On 02/03/2022 12:15, H. Nikolaus Schaller wrote:

Hi Neil,


On 02.03.2022 at 11:25, Neil Armstrong wrote:


I added a printk for hdmi->sink_is_hdmi. This returns 1, which IMHO is to be
expected since I am using an HDMI connector and panel... So your patch will
still add the UYVY formats.
Either the Synopsys module inside the jz4780 or the panel does not understand
them.


By selecting the UYVY formats, the driver will enable the colorspace converters
in the dw-hdmi IP; I don't see why it doesn't work here...

There is a bit called `Support Color Space Converter` in config0_id:
bit |   Name|   R/W |   Desc
2   |   csc |   R   |   Indicates if Color Space 
Conversion block is present

Could you dump all the config0 bits:

===><=
diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c 
b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
index 54d8fdad395f..547731482da8 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
@@ -3431,6 +3431,7 @@ struct dw_hdmi *dw_hdmi_probe(struct platform_device 
*pdev,
pdevinfo.id = PLATFORM_DEVID_AUTO;

config0 = hdmi_readb(hdmi, HDMI_CONFIG0_ID);
+   dev_info(dev, "config0: %x\n", config0);
config3 = hdmi_readb(hdmi, HDMI_CONFIG3_ID);

if (iores && config3 & HDMI_CONFIG3_AHBAUDDMA) {
===><=

If this bit is missing, this would explain the black screen.


[9.291011] dw-hdmi-ingenic 1018.hdmi: config0: bf

Hm. Or is the color-space conversion of the dw-hdmi module inside the jz4780
broken or not configured properly?

(cross-checked: RGB mode still works if I force hdmi->sink_is_hdmi = false)


I don't understand what's wrong; can you try to make the logic select
MEDIA_BUS_FMT_YUV8_1X24 instead of DRM_COLOR_FORMAT_YCBCR422?

If your CSC is broken, we'll need to disable it on your platform.

Thanks,
Neil



BR and thanks,
Nikolaus





[PATCH] video: fbdev: sm712fb: Fix crash in smtcfb_write()

2022-03-02 Thread Zheyu Ma
When the sm712fb driver writes three bytes to the framebuffer, the
driver will crash:

BUG: unable to handle page fault for address: c90001ff
RIP: 0010:smtcfb_write+0x454/0x5b0
Call Trace:
 vfs_write+0x291/0xd60
 ? do_sys_openat2+0x27d/0x350
 ? __fget_light+0x54/0x340
 ksys_write+0xce/0x190
 do_syscall_64+0x43/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix it by removing the open-coded endianness fixup-code.

Signed-off-by: Zheyu Ma 
---
 drivers/video/fbdev/sm712fb.c | 21 -
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/drivers/video/fbdev/sm712fb.c b/drivers/video/fbdev/sm712fb.c
index 0dbc6bf8268a..e355089ac7d6 100644
--- a/drivers/video/fbdev/sm712fb.c
+++ b/drivers/video/fbdev/sm712fb.c
@@ -1130,7 +1130,7 @@ static ssize_t smtcfb_write(struct fb_info *info, const 
char __user *buf,
count = total_size - p;
}
 
-   buffer = kmalloc((count > PAGE_SIZE) ? PAGE_SIZE : count, GFP_KERNEL);
+   buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!buffer)
return -ENOMEM;
 
@@ -1148,24 +1148,11 @@ static ssize_t smtcfb_write(struct fb_info *info, const 
char __user *buf,
break;
}
 
-   for (i = c >> 2; i--;) {
-   fb_writel(big_swap(*src), dst++);
+   for (i = (c + 3) >> 2; i--;) {
+   fb_writel(big_swap(*src), dst);
+   dst++;
src++;
}
-   if (c & 3) {
-   u8 *src8 = (u8 *)src;
-   u8 __iomem *dst8 = (u8 __iomem *)dst;
-
-   for (i = c & 3; i--;) {
-   if (i & 1) {
-   fb_writeb(*src8++, ++dst8);
-   } else {
-   fb_writeb(*src8++, --dst8);
-   dst8 += 2;
-   }
-   }
-   dst = (u32 __iomem *)dst8;
-   }
 
*ppos += c;
buf += c;
-- 
2.25.1



Re: [RFC PATCH] drm/panel: simple: panel-dpi: use bus-format to set bpc and bus_format

2022-03-02 Thread Maxime Ripard
Hi,

Please try to avoid top posting

On Wed, Feb 23, 2022 at 04:25:19PM +0100, Max Krummenacher wrote:
> The goal here is to set the element bus_format in the struct
> panel_desc. This is an enum with the possible values defined in
> include/uapi/linux/media-bus-format.h.
> 
> The enum values are not constructed in a way that you could calculate
> the value from color channel width/shift/mapping/whatever. You rather
> would have to check if the combination of color channel
> width/shift/mapping/whatever maps to an existing value and otherwise
> EINVAL out.
> 
> I don't see the value in having yet another way of how this
> information can be specified and then having to write a more
> complicated parser which maps the dt data to bus_format.

Generally speaking, sending an RFC without explicitly stating what you
want a comment on isn't very efficient.

That being said, what I (and I can only assume Marek) don't like is the
string encoding, especially when the similar bus-type property uses an
integer with the various available bus options we have.

Having an integer, with a set of defines that you would map to the
proper MEDIA_BUS_*, would be more efficient and more elegant.
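
As an illustration, a minimal sketch of what the integer-based mapping could
look like on the driver side (the property name and constant values are
hypothetical, not an existing binding):

static int panel_dpi_parse_bus_format(struct device_node *np, u32 *fmt)
{
	u32 val;
	int ret;

	/* Hypothetical integer "bus-format" property from a dt-bindings header */
	ret = of_property_read_u32(np, "bus-format", &val);
	if (ret)
		return ret;

	switch (val) {
	case 1: /* e.g. PANEL_BUS_FORMAT_RGB888_1X24 (made-up constant) */
		*fmt = MEDIA_BUS_FMT_RGB888_1X24;
		return 0;
	case 2: /* e.g. PANEL_BUS_FORMAT_RGB666_1X18 (made-up constant) */
		*fmt = MEDIA_BUS_FMT_RGB666_1X18;
		return 0;
	default:
		return -EINVAL;
	}
}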

That being said, the first question that needs to be answered is why
does this have to be in the DT in the first place?

Maxime




[Bug 215648] amdgpu: Changing monitor configuration (plug/unplug/wake from DPMS) causes kernel panic

2022-03-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215648

--- Comment #5 from Philipp Riederer (pr_ker...@tum.fail) ---
Hey,

Thank you for working on this!

I added the dmesg as an attachment to the bug. Please note that this is from the
working kernel (commit 94ba5b0fb52d6dbf1f200351876a839afb74aedd), as that is the
one I have running now. If it helps, I can also provide the messages from a
non-working kernel later.

Cheers,
Philipp

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 215648] amdgpu: Changing monitor configuration (plug/unplug/wake from DPMS) causes kernel panic

2022-03-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215648

--- Comment #4 from Philipp Riederer (pr_ker...@tum.fail) ---
Created attachment 300517
  --> https://bugzilla.kernel.org/attachment.cgi?id=300517&action=edit
dmesg from boot using kernel 5.15.12@94ba5b0fb52d6dbf1f200351876a839afb74aedd

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 215648] amdgpu: Changing monitor configuration (plug/unplug/wake from DPMS) causes kernel panic

2022-03-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215648

--- Comment #3 from Alex Deucher (alexdeuc...@gmail.com) ---
Thanks.  Can you get the dmesg output from boot prior to the hang?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

RE: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-02 Thread David Laight
From: Xiaomeng Tong
> Sent: 02 March 2022 09:31
> 
> On Mon, 28 Feb 2022 16:41:04 -0800, Linus Torvalds
>  wrote:
> >
> > But basically to _me_, the important part is that the end result is
> > maintainable longer-term.
> 
> I couldn't agree more. And because of that, I stick with the following
> approach because it's maintainable longer-term than "type(pos) pos" one:
>  Implements a new macro for each list_for_each_entry* with _inside suffix.
>   #define list_for_each_entry_inside(pos, type, head, member)

I think that it would be better to make any alternate loop macro
just set the variable to NULL on the loop exit.
That is easier to code for and the compiler might be persuaded to
not redo the test.
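
A rough sketch of that idea (hypothetical macro name, illustrative only, not
a proposed patch):

/*
 * Like list_for_each_entry(), but leave 'pos' NULL when the loop runs to
 * completion, so code after the loop can simply test 'pos'.  A 'break'
 * still leaves 'pos' pointing at the entry that was found.
 */
#define list_for_each_entry_or_null(pos, head, member)			\
	for (pos = list_first_entry(head, typeof(*pos), member);	\
	     &pos->member != (head) || ((pos = NULL), false);		\
	     pos = list_next_entry(pos, member))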

It also doesn't need an extra variable defined in the for() statement,
so it can be back-ported to older kernels without requiring a declaration
in the middle of blocks.

OTOH there may be alternative definitions that can be used to get
the compiler (or other compiler-like tools) to detect broken code,
even if the definition can't possibly generate a working kernel.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



Re: [PATCH 1/2] drm: Introduce DRM_BRIDGE_OP_UPSTREAM_FIRST to alter bridge init order

2022-03-02 Thread Dave Stevenson
Hi Andrzej

On Mon, 28 Feb 2022 at 15:36, Andrzej Hajda  wrote:
>
>
>
> On 22.02.2022 09:43, Dave Stevenson wrote:
> > Hi Laurent.
> >
> > Thanks for the review.
> >
> > On Tue, 22 Feb 2022 at 06:34, Laurent Pinchart
> >  wrote:
> >> Hi Dave,
> >>
> >> Thank you for the patch.
> >>
> >> On Wed, Feb 16, 2022 at 04:59:43PM +, Dave Stevenson wrote:
> >>> DSI sink devices typically want the DSI host powered up and configured
> >>> before they are powered up. pre_enable is the place this would normally
> >>> happen, but they are called in reverse order from panel/connector towards
> >>> the encoder, which is the "wrong" order.
> >>>
> >>> Add a new flag DRM_BRIDGE_OP_UPSTREAM_FIRST that any bridge can set
> >>> to swap the order of pre_enable (and post_disable) so that any upstream
> >>> bridges are called first to create the desired state.
> >>>
> >>> eg:
> >>> - Panel
> >>> - Bridge 1
> >>> - Bridge 2 DRM_BRIDGE_OP_UPSTREAM_FIRST
> >>> - Bridge 3
> >>> - Encoder
> >>> Would result in pre_enable's being called as Panel, Bridge 1, Bridge 3,
> >>> Bridge 2.
> >> If there was a Bridge 4 between Bridge 3 and Encoder, would it be
> >>
> >> Panel, Bridge 1, Bridge 3, Bridge 4, Bridge 2
> >>
> >> ? I'd capture that here, to be explicit.
> > No.
> >   - Panel
> >   - Bridge 1
> >   - Bridge 2 DRM_BRIDGE_OP_UPSTREAM_FIRST
> >   - Bridge 3
> >   - Bridge 4
> >- Encoder
> > Would result in pre_enable's being called as Panel, Bridge 1, Bridge
> > 3, Bridge 2, Bridge 4, Encoder.
> > ie it only swaps the order of bridges 2 & 3.
> >
> >   - Panel
> >   - Bridge 1
> >   - Bridge 2 DRM_BRIDGE_OP_UPSTREAM_FIRST
> >   - Bridge 3 DRM_BRIDGE_OP_UPSTREAM_FIRST
> >   - Bridge 4
> >   - Encoder
> > Would result in pre_enable's being called as Panel, Bridge 1, Bridge
> > 4, Bridge 3, Bridge 2, Encoder.
> > (Bridge 2&3 have asked for upstream to be enabled first, which means
> > bridge 4. Bridge 2 wants upstream enabled first, which means bridge
> > 3).
> >
> >   - Panel
> >   - Bridge 1
> >   - Bridge 2 DRM_BRIDGE_OP_UPSTREAM_FIRST
> >   - Bridge 3
> >   - Bridge 4 DRM_BRIDGE_OP_UPSTREAM_FIRST
> >   - Bridge 5
> >   - Encoder
> > Would result in Panel, Bridge 1, Bridge 3, Bridge 2, Bridge 5, Bridge
> > 4, Encoder.
> >
> > So we only reverse the order whilst the bridges request that they want
> > upstream enabled first, but we can do that multiple times within the
> > chain. I hope that makes sense.
> >
> >>> Signed-off-by: Dave Stevenson 
> >>> ---
> >>>   drivers/gpu/drm/drm_bridge.c | 197 
> >>> +--
> >>>   include/drm/drm_bridge.h |   8 ++
> >>>   2 files changed, 180 insertions(+), 25 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
> >>> index c96847fc0ebc..7c24e8340efa 100644
> >>> --- a/drivers/gpu/drm/drm_bridge.c
> >>> +++ b/drivers/gpu/drm/drm_bridge.c
> >>> @@ -522,21 +522,58 @@ EXPORT_SYMBOL(drm_bridge_chain_disable);
> >>>* Calls &drm_bridge_funcs.post_disable op for all the bridges in the
> >>>* encoder chain, starting from the first bridge to the last. These are 
> >>> called
> >>>* after completing the encoder's prepare op.
> >> Missing blank line, as well as in three locations below.
> >>
> >>> + * If a bridge sets the DRM_BRIDGE_OP_UPSTREAM_FIRST, then the 
> >>> post_disable for
> >>> + * that bridge will be called before the previous one to reverse the 
> >>> pre_enable
> >>> + * calling direction.
> >>>*
> >>>* Note: the bridge passed should be the one closest to the encoder
> >>>*/
> >>>   void drm_bridge_chain_post_disable(struct drm_bridge *bridge)
> >>>   {
> >>>struct drm_encoder *encoder;
> >>> + struct drm_bridge *next, *limit;
> >>>
> >>>if (!bridge)
> >>>return;
> >>>
> >>>encoder = bridge->encoder;
> >>>list_for_each_entry_from(bridge, &encoder->bridge_chain, 
> >>> chain_node) {
> >>> + limit = NULL;
> >>> +
> >>> + if (!list_is_last(&bridge->chain_node, 
> >>> &encoder->bridge_chain)) {
> >>> + next = list_next_entry(bridge, chain_node);
> >>> +
> >>> + if (next->ops & DRM_BRIDGE_OP_UPSTREAM_FIRST) {
> >>> + limit = next;
> >>> +
> >>> + list_for_each_entry_from(next, 
> >>> &encoder->bridge_chain,
> >>> +  chain_node) {
> >>> + if (!(next->ops &
> >>> + 
> >>> DRM_BRIDGE_OP_UPSTREAM_FIRST)) {
> >>> + next = 
> >>> list_prev_entry(next, chain_node);
> >>> + limit = next;
> >>> + break;
> >>> + }
> >>> + }
> >>> +
> >>> + list_for_each_entry_from_reverse(next, 
>
