date:20240502

On Fri, May 03, 2024 at 01:14:45AM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:
> 
> > But anyway, there needs to be a general "oops I hit 0"-aware form of
> > get_file(), and it seems like it should just be get_file() itself...
> 
> ... which brings back the question of what's the sane damage mitigation
> for that.  Adding arseloads of never-exercised failure exits is generally
> a bad idea - it's asking for bitrot and making the thing harder to review
> in future.

Linus seems to prefer best-effort error recovery to sprinkling BUG()s
around.  But if that's really the solution, then how about get_file()
switching to to use inc_not_zero and BUG on 0?

-- 
Kees Cook

Re: [PATCH 5/5] fs: Convert struct file::f_count to refcount_long_t

On Thu, May 02, 2024 at 05:10:18PM -0700, Kees Cook wrote:

> But anyway, there needs to be a general "oops I hit 0"-aware form of
> get_file(), and it seems like it should just be get_file() itself...

... which brings back the question of what's the sane damage mitigation
for that.  Adding arseloads of never-exercised failure exits is generally
a bad idea - it's asking for bitrot and making the thing harder to review
in future.

Re: [PATCH 5/5] fs: Convert struct file::f_count to refcount_long_t

On Fri, May 03, 2024 at 12:41:52AM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 04:21:13PM -0700, Kees Cook wrote:
> > On Fri, May 03, 2024 at 12:12:28AM +0100, Al Viro wrote:
> > > On Thu, May 02, 2024 at 03:52:21PM -0700, Kees Cook wrote:
> > > 
> > > > As for semantics, what do you mean? Detecting dec-below-zero means we
> > > > catch underflow, and detected inc-from-zero means we catch resurrection
> > > > attempts. In both cases we avoid double-free, but we have already lost
> > > > to a potential dangling reference to a freed struct file. But just
> > > > letting f_count go bad seems dangerous.
> > > 
> > > Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
> > > in the middle of getting closed.  And it's more subtle than that, 
> > > actually,
> > > thanks to SLAB_TYPESAFE_BY_RCU for struct file.
> > 
> > But isn't that already handled by __get_file_rcu()? i.e. shouldn't it be
> > impossible for a simple get_file() to ever see a 0 f_count under normal
> > conditions?
> 
> For get_file() it is impossible.  The comment about semantics had been
> about the sane ways to recover if such crap gets detected.
> 
> __get_file_rcu() is a separate story - consider the comment in there: 
>   * atomic_long_inc_not_zero() above provided a full memory
>   * barrier when we acquired a reference.
>  *
>  * This is paired with the write barrier from assigning to the
>  * __rcu protected file pointer so that if that pointer still
>  * matches the current file, we know we have successfully
>  * acquired a reference to the right file.
> 
> and IIRC, refcount_t is weaker wrt barriers.

I think that was also fixed for refcount_t. I'll need to go dig out the
commit...

But anyway, there needs to be a general "oops I hit 0"-aware form of
get_file(), and it seems like it should just be get_file() itself...

-- 
Kees Cook

Re: [PATCH] drm/panel-edp: Add ID for KD KD116N09-30NH-A016

2024-05-02 Thread Hsin-Yi Wang

On Thu, May 2, 2024 at 4:48 PM Douglas Anderson  wrote:
>
> As evidenced by in-field reports, this panel shipped on pompom but we
> never added the ID and thus we're stuck w/ conservative timings. The
> panel was part of early patches but somehow got left off in the
> end. :( Add it in now.
>
> For future reference, EDID from this panel is:
> 00002c821212
> 321e0104951a0e780ae511965e55932c
> 1950540001010101010101010101
> 010101010101a41f5686500084302820
> 55901018
> 
> 00fe
> 004b443131364e3039333041313600f6
>
> We use the ASCII string from decoding the EDID ("KD116N0930A16") as
> the panel name.
>
> Signed-off-by: Douglas Anderson 

Reviewed-by: Hsin-Yi Wang 

> ---
>
>  drivers/gpu/drm/panel/panel-edp.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/panel/panel-edp.c 
> b/drivers/gpu/drm/panel/panel-edp.c
> index 6db277efcbb7..9cfa05c7d193 100644
> --- a/drivers/gpu/drm/panel/panel-edp.c
> +++ b/drivers/gpu/drm/panel/panel-edp.c
> @@ -2094,6 +2094,7 @@ static const struct edp_panel_entry edp_panels[] = {
> EDP_PANEL_ENTRY('K', 'D', 'B', 0x0624, 
> _kd116n21_30nv_a010.delay, "116N21-30NV-A010"),
> EDP_PANEL_ENTRY('K', 'D', 'B', 0x1118, _200_500_e50, 
> "KD116N29-30NK-A005"),
> EDP_PANEL_ENTRY('K', 'D', 'B', 0x1120, _200_500_e80_d50, 
> "116N29-30NK-C007"),
> +   EDP_PANEL_ENTRY('K', 'D', 'B', 0x1212, _200_500_e50, 
> "KD116N0930A16"),
>
> EDP_PANEL_ENTRY('K', 'D', 'C', 0x044f, _200_500_e50, 
> "KD116N9-30NH-F3"),
> EDP_PANEL_ENTRY('K', 'D', 'C', 0x05f1, _200_500_e80_d50, 
> "KD116N5-30NV-G7"),
> --
> 2.45.0.rc1.225.g2a3ae87e7f-goog
>

Re: [PATCH] drm/fourcc: Add RGB161616 and BGR161616 formats

2024-05-02 Thread Laurent Pinchart

Hi Jacopo,

On Thu, May 02, 2024 at 11:02:27AM +0200, Jacopo Mondi wrote:
> Hello
>which tree should this patch be collected from now that it has
> been reviewed ?

I think this can go through drm-misc. I'm not sure what the rule is for
patches that touch core code like these, can then be pushed by anyone
with commit access, or do they need to be collected by a drm-misc
maintainer ?

> On Mon, Feb 26, 2024 at 02:25:43PM GMT, Jacopo Mondi wrote:
> > Add FourCC definitions for the 48-bit RGB/BGR formats to the
> > DRM/KMS uapi.
> >
> > The format will be used by the Raspberry Pi PiSP Back End,
> > supported by a V4L2 driver in kernel space and by libcamera in
> > userspace, which uses the DRM FourCC identifiers.
> >
> > Signed-off-by: Jacopo Mondi 
> > ---
> >  drivers/gpu/drm/drm_fourcc.c  | 8 
> >  include/uapi/drm/drm_fourcc.h | 4 
> >  2 files changed, 12 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_fourcc.c b/drivers/gpu/drm/drm_fourcc.c
> > index 193cf8ed7912..908f20b96fd5 100644
> > --- a/drivers/gpu/drm/drm_fourcc.c
> > +++ b/drivers/gpu/drm/drm_fourcc.c
> > @@ -210,6 +210,14 @@ const struct drm_format_info *__drm_format_info(u32 
> > format)
> > { .format = DRM_FORMAT_ABGR2101010, .depth = 30, 
> > .num_planes = 1, .cpp = { 4, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = 
> > true },
> > { .format = DRM_FORMAT_RGBA1010102, .depth = 30, 
> > .num_planes = 1, .cpp = { 4, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = 
> > true },
> > { .format = DRM_FORMAT_BGRA1010102, .depth = 30, 
> > .num_planes = 1, .cpp = { 4, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = 
> > true },
> > +   { .format = DRM_FORMAT_RGB161616,   .depth = 0,
> > + .num_planes = 1, .char_per_block = { 6, 0, 0 },
> > + .block_w = { 1, 0, 0 }, .block_h = { 1, 0, 0 },
> > + .hsub = 1, .vsub = 1, .has_alpha = false },
> > +   { .format = DRM_FORMAT_BGR161616,   .depth = 0,
> > + .num_planes = 1, .char_per_block = { 6, 0, 0 },
> > + .block_w = { 1, 0, 0 }, .block_h = { 1, 0, 0 },
> > + .hsub = 1, .vsub = 1, .has_alpha = false },
> > { .format = DRM_FORMAT_ARGB,.depth = 32, 
> > .num_planes = 1, .cpp = { 4, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = 
> > true },
> > { .format = DRM_FORMAT_ABGR,.depth = 32, 
> > .num_planes = 1, .cpp = { 4, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = 
> > true },
> > { .format = DRM_FORMAT_RGBA,.depth = 32, 
> > .num_planes = 1, .cpp = { 4, 0, 0 }, .hsub = 1, .vsub = 1, .has_alpha = 
> > true },
> > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> > index 84d502e42961..00db00083175 100644
> > --- a/include/uapi/drm/drm_fourcc.h
> > +++ b/include/uapi/drm/drm_fourcc.h
> > @@ -210,6 +210,10 @@ extern "C" {
> >  #define DRM_FORMAT_RGBA1010102 fourcc_code('R', 'A', '3', '0') /* 
> > [31:0] R:G:B:A 10:10:10:2 little endian */
> >  #define DRM_FORMAT_BGRA1010102 fourcc_code('B', 'A', '3', '0') /* 
> > [31:0] B:G:R:A 10:10:10:2 little endian */
> >
> > +/* 48 bpp RGB */
> > +#define DRM_FORMAT_RGB161616 fourcc_code('R', 'G', '4', '8') /* [47:0] 
> > R:G:B 16:16:16 little endian */
> > +#define DRM_FORMAT_BGR161616 fourcc_code('B', 'G', '4', '8') /* [47:0] 
> > B:G:R 16:16:16 little endian */
> > +
> >  /* 64 bpp RGB */
> >  #define DRM_FORMAT_XRGB16161616fourcc_code('X', 'R', '4', '8') /* 
> > [63:0] x:R:G:B 16:16:16:16 little endian */
> >  #define DRM_FORMAT_XBGR16161616fourcc_code('X', 'B', '4', '8') /* 
> > [63:0] x:B:G:R 16:16:16:16 little endian */

-- 
Regards,

Laurent Pinchart

[PATCH] drm/panel-edp: Add ID for KD KD116N09-30NH-A016

2024-05-02 Thread Douglas Anderson

As evidenced by in-field reports, this panel shipped on pompom but we
never added the ID and thus we're stuck w/ conservative timings. The
panel was part of early patches but somehow got left off in the
end. :( Add it in now.

For future reference, EDID from this panel is:
00002c821212
321e0104951a0e780ae511965e55932c
1950540001010101010101010101
010101010101a41f5686500084302820
55901018

00fe
004b443131364e3039333041313600f6

We use the ASCII string from decoding the EDID ("KD116N0930A16") as
the panel name.

Signed-off-by: Douglas Anderson 
---

 drivers/gpu/drm/panel/panel-edp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/panel/panel-edp.c 
b/drivers/gpu/drm/panel/panel-edp.c
index 6db277efcbb7..9cfa05c7d193 100644
--- a/drivers/gpu/drm/panel/panel-edp.c
+++ b/drivers/gpu/drm/panel/panel-edp.c
@@ -2094,6 +2094,7 @@ static const struct edp_panel_entry edp_panels[] = {
EDP_PANEL_ENTRY('K', 'D', 'B', 0x0624, 
_kd116n21_30nv_a010.delay, "116N21-30NV-A010"),
EDP_PANEL_ENTRY('K', 'D', 'B', 0x1118, _200_500_e50, 
"KD116N29-30NK-A005"),
EDP_PANEL_ENTRY('K', 'D', 'B', 0x1120, _200_500_e80_d50, 
"116N29-30NK-C007"),
+   EDP_PANEL_ENTRY('K', 'D', 'B', 0x1212, _200_500_e50, 
"KD116N0930A16"),
 
EDP_PANEL_ENTRY('K', 'D', 'C', 0x044f, _200_500_e50, 
"KD116N9-30NH-F3"),
EDP_PANEL_ENTRY('K', 'D', 'C', 0x05f1, _200_500_e80_d50, 
"KD116N5-30NV-G7"),
-- 
2.45.0.rc1.225.g2a3ae87e7f-goog

Re: [PATCH 5/5] fs: Convert struct file::f_count to refcount_long_t

On Thu, May 02, 2024 at 04:21:13PM -0700, Kees Cook wrote:
> On Fri, May 03, 2024 at 12:12:28AM +0100, Al Viro wrote:
> > On Thu, May 02, 2024 at 03:52:21PM -0700, Kees Cook wrote:
> > 
> > > As for semantics, what do you mean? Detecting dec-below-zero means we
> > > catch underflow, and detected inc-from-zero means we catch resurrection
> > > attempts. In both cases we avoid double-free, but we have already lost
> > > to a potential dangling reference to a freed struct file. But just
> > > letting f_count go bad seems dangerous.
> > 
> > Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
> > in the middle of getting closed.  And it's more subtle than that, actually,
> > thanks to SLAB_TYPESAFE_BY_RCU for struct file.
> 
> But isn't that already handled by __get_file_rcu()? i.e. shouldn't it be
> impossible for a simple get_file() to ever see a 0 f_count under normal
> conditions?

For get_file() it is impossible.  The comment about semantics had been
about the sane ways to recover if such crap gets detected.

__get_file_rcu() is a separate story - consider the comment in there: 
* atomic_long_inc_not_zero() above provided a full memory
* barrier when we acquired a reference.
 *
 * This is paired with the write barrier from assigning to the
 * __rcu protected file pointer so that if that pointer still
 * matches the current file, we know we have successfully
 * acquired a reference to the right file.

and IIRC, refcount_t is weaker wrt barriers.

Re: [PATCH 5/5] fs: Convert struct file::f_count to refcount_long_t

On Fri, May 03, 2024 at 12:12:28AM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 03:52:21PM -0700, Kees Cook wrote:
> 
> > As for semantics, what do you mean? Detecting dec-below-zero means we
> > catch underflow, and detected inc-from-zero means we catch resurrection
> > attempts. In both cases we avoid double-free, but we have already lost
> > to a potential dangling reference to a freed struct file. But just
> > letting f_count go bad seems dangerous.
> 
> Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
> in the middle of getting closed.  And it's more subtle than that, actually,
> thanks to SLAB_TYPESAFE_BY_RCU for struct file.

But isn't that already handled by __get_file_rcu()? i.e. shouldn't it be
impossible for a simple get_file() to ever see a 0 f_count under normal
conditions?

-- 
Kees Cook

Re: [PATCH 5/5] fs: Convert struct file::f_count to refcount_long_t

On Thu, May 02, 2024 at 03:52:21PM -0700, Kees Cook wrote:

> As for semantics, what do you mean? Detecting dec-below-zero means we
> catch underflow, and detected inc-from-zero means we catch resurrection
> attempts. In both cases we avoid double-free, but we have already lost
> to a potential dangling reference to a freed struct file. But just
> letting f_count go bad seems dangerous.

Detected inc-from-zero can also mean an RCU lookup detecting a descriptor
in the middle of getting closed.  And it's more subtle than that, actually,
thanks to SLAB_TYPESAFE_BY_RCU for struct file.

Re: [PATCH 1/5] fs: Do not allow get_file() to resurrect 0 f_count

On Fri, May 03, 2024 at 12:53:56AM +0200, Jann Horn wrote:
> On Fri, May 3, 2024 at 12:34 AM Kees Cook  wrote:
> > If f_count reaches 0, calling get_file() should be a failure. Adjust to
> > use atomic_long_inc_not_zero() and return NULL on failure. In the future
> > get_file() can be annotated with __must_check, though that is not
> > currently possible.
> [...]
> >  static inline struct file *get_file(struct file *f)
> >  {
> > -   atomic_long_inc(>f_count);
> > +   if (unlikely(!atomic_long_inc_not_zero(>f_count)))
> > +   return NULL;
> > return f;
> >  }
> 
> Oh, I really don't like this...
> 
> In most code, if you call get_file() on a file and see refcount zero,
> that basically means you're in a UAF write situation, or that you
> could be in such a situation if you had raced differently. It's
> basically just like refcount_inc() in that regard.

Shouldn't the system attempt to not make things worse if it encounters
an inc-from-0 condition? Yes, we've already lost the race for a UaF
condition, but maybe don't continue on.

> And get_file() has semantics just like refcount_inc(): The caller
> guarantees that it is already holding a reference to the file; and if

Yes, but if that guarantee is violated, we should do something about it.

> the caller is wrong about that, their subsequent attempt to clean up
> the reference that they think they were already holding will likely
> lead to UAF too. If get_file() sees a zero refcount, there is no safe
> way to continue. And all existing callers of get_file() expect the
> return value to be the same as the non-NULL pointer they passed in, so
> they'll either ignore the result of this check and barrel on, or oops
> with a NULL deref.
> 
> For callers that want to actually try incrementing file refcounts that
> could be zero, which is only possible under specific circumstances, we
> have helpers like get_file_rcu() and get_file_active().

So what's going on in here:
https://lore.kernel.org/linux-hardening/20240502223341.1835070-2-keesc...@chromium.org/

> Can't you throw a CHECK_DATA_CORRUPTION() or something like that in
> there instead?

I'm open to suggestions, but given what's happening with struct dma_buf
above, it seems like this is a state worth checking for?

-- 
Kees Cook

Re: [PATCH 1/5] fs: Do not allow get_file() to resurrect 0 f_count

2024-05-02 Thread Jann Horn

On Fri, May 3, 2024 at 12:34 AM Kees Cook  wrote:
> If f_count reaches 0, calling get_file() should be a failure. Adjust to
> use atomic_long_inc_not_zero() and return NULL on failure. In the future
> get_file() can be annotated with __must_check, though that is not
> currently possible.
[...]
>  static inline struct file *get_file(struct file *f)
>  {
> -   atomic_long_inc(>f_count);
> +   if (unlikely(!atomic_long_inc_not_zero(>f_count)))
> +   return NULL;
> return f;
>  }

Oh, I really don't like this...

In most code, if you call get_file() on a file and see refcount zero,
that basically means you're in a UAF write situation, or that you
could be in such a situation if you had raced differently. It's
basically just like refcount_inc() in that regard.

And get_file() has semantics just like refcount_inc(): The caller
guarantees that it is already holding a reference to the file; and if
the caller is wrong about that, their subsequent attempt to clean up
the reference that they think they were already holding will likely
lead to UAF too. If get_file() sees a zero refcount, there is no safe
way to continue. And all existing callers of get_file() expect the
return value to be the same as the non-NULL pointer they passed in, so
they'll either ignore the result of this check and barrel on, or oops
with a NULL deref.

For callers that want to actually try incrementing file refcounts that
could be zero, which is only possible under specific circumstances, we
have helpers like get_file_rcu() and get_file_active().

Can't you throw a CHECK_DATA_CORRUPTION() or something like that in
there instead?

Re: [PATCH 5/5] fs: Convert struct file::f_count to refcount_long_t

On Thu, May 02, 2024 at 11:42:50PM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 03:33:40PM -0700, Kees Cook wrote:
> > Underflow of f_count needs to be more carefully detected than it
> > currently is. The results of get_file() should be checked, but the
> > first step is detection. Redefine f_count from atomic_long_t to
> > refcount_long_t.
> 
>   It is used on fairly hot paths.  What's more, it's not
> at all obvious what the hell would right semantics be.

I think we've put performance concerns between refcount_t and atomic_t
to rest long ago. If there is a real workload where it's a problem,
let's find it! :)

As for semantics, what do you mean? Detecting dec-below-zero means we
catch underflow, and detected inc-from-zero means we catch resurrection
attempts. In both cases we avoid double-free, but we have already lost
to a potential dangling reference to a freed struct file. But just
letting f_count go bad seems dangerous.

-- 
Kees Cook

Re: [PATCH 5/5] fs: Convert struct file::f_count to refcount_long_t

On Thu, May 02, 2024 at 03:33:40PM -0700, Kees Cook wrote:
> Underflow of f_count needs to be more carefully detected than it
> currently is. The results of get_file() should be checked, but the
> first step is detection. Redefine f_count from atomic_long_t to
> refcount_long_t.

It is used on fairly hot paths.  What's more, it's not
at all obvious what the hell would right semantics be.

NAKed-by: Al Viro

Re: [PATCH] drm/connector: Add \n to message about demoting connector force-probes

2024-05-02 Thread Abhinav Kumar





On 5/2/2024 3:32 PM, Douglas Anderson wrote:

The debug print clearly lacks a \n at the end. Add it.

Fixes: 8f86c82aba8b ("drm/connector: demote connector force-probes for non-master 
clients")
Signed-off-by: Douglas Anderson 
---

  drivers/gpu/drm/drm_connector.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



Reviewed-by: Abhinav Kumar

[PATCH 4/5] refcount: Introduce refcount_long_t and APIs

Duplicate the refcount_t types and APIs gain refcount_long_t. This is
needed for larger counters that while currently very unlikely to overflow,
still want to detect and mitigate underflow.

Generate refcount-long.h via direct string replacements. Doing macros
like compat_binfmt_elf doesn't appear to work well.

Signed-off-by: Kees Cook 
---
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: Boqun Feng 
Cc: Mark Rutland 
Cc: Kent Overstreet 
Cc: Masahiro Yamada 
Cc: Nathan Chancellor 
Cc: Nicolas Schier 
Cc: linux-kbu...@vger.kernel.org
---
 MAINTAINERS|   2 +-
 Makefile   |  11 +-
 include/linux/refcount-impl.h  | 344 +
 include/linux/refcount.h   | 341 +---
 include/linux/refcount_types.h |  12 ++
 lib/refcount.c |  17 +-
 6 files changed, 385 insertions(+), 342 deletions(-)
 create mode 100644 include/linux/refcount-impl.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 7c121493f43d..2e6c8eaab194 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3360,7 +3360,7 @@ S:Maintained
 F: Documentation/atomic_*.txt
 F: arch/*/include/asm/atomic*.h
 F: include/*/atomic*.h
-F: include/linux/refcount.h
+F: include/linux/refcount*.h
 F: scripts/atomic/
 
 ATTO EXPRESSSAS SAS/SATA RAID SCSI DRIVER
diff --git a/Makefile b/Makefile
index 4bef6323c47d..a4bdcd34f323 100644
--- a/Makefile
+++ b/Makefile
@@ -1190,7 +1190,9 @@ PHONY += prepare archprepare
 
 archprepare: outputmakefile archheaders archscripts scripts 
include/config/kernel.release \
asm-generic $(version_h) include/generated/utsrelease.h \
-   include/generated/compile.h include/generated/autoconf.h 
remove-stale-files
+   include/generated/compile.h include/generated/autoconf.h \
+   include/generated/refcount-long.h \
+   remove-stale-files
 
 prepare0: archprepare
$(Q)$(MAKE) $(build)=scripts/mod
@@ -1262,6 +1264,13 @@ filechk_compile.h = $(srctree)/scripts/mkcompile_h \
 include/generated/compile.h: FORCE
$(call filechk,compile.h)
 
+include/generated/refcount-long.h: $(srctree)/include/linux/refcount-impl.h
+   $(Q)$(PERL) -pe 's/\b(atomic|(__)?refcount)_/\1_long_/g;\
+s/ATOMIC_/ATOMIC_LONG_/g;  \
+s/(REFCOUNT)_(IMPL|INIT|MAX|SAT)/\1_LONG_\2/g; \
+s/\b(U?)INT_/\1LONG_/g;\
+s/\bint\b/long/g;' $< >$@
+
 PHONY += headerdep
 headerdep:
$(Q)find $(srctree)/include/ -name '*.h' | xargs --max-args 1 \
diff --git a/include/linux/refcount-impl.h b/include/linux/refcount-impl.h
new file mode 100644
index ..f5c73a0f46a4
--- /dev/null
+++ b/include/linux/refcount-impl.h
@@ -0,0 +1,344 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Variant of atomic_t specialized for reference counts.
+ *
+ * The interface matches the atomic_t interface (to aid in porting) but only
+ * provides the few functions one should use for reference counting.
+ *
+ * Saturation semantics
+ * 
+ *
+ * refcount_t differs from atomic_t in that the counter saturates at
+ * REFCOUNT_SATURATED and will not move once there. This avoids wrapping the
+ * counter and causing 'spurious' use-after-free issues. In order to avoid the
+ * cost associated with introducing cmpxchg() loops into all of the saturating
+ * operations, we temporarily allow the counter to take on an unchecked value
+ * and then explicitly set it to REFCOUNT_SATURATED on detecting that underflow
+ * or overflow has occurred. Although this is racy when multiple threads
+ * access the refcount concurrently, by placing REFCOUNT_SATURATED roughly
+ * equidistant from 0 and INT_MAX we minimise the scope for error:
+ *
+ *INT_MAX REFCOUNT_SATURATED   UINT_MAX
+ *   0  (0x7fff_)(0xc000_)(0x_)
+ *   ++++
+ * <-- bad value! -->
+ *
+ * (in a signed view of the world, the "bad value" range corresponds to
+ * a negative counter value).
+ *
+ * As an example, consider a refcount_inc() operation that causes the counter
+ * to overflow:
+ *
+ * int old = atomic_fetch_add_relaxed(r);
+ * // old is INT_MAX, refcount now INT_MIN (0x8000_)
+ * if (old < 0)
+ * atomic_set(r, REFCOUNT_SATURATED);
+ *
+ * If another thread also performs a refcount_inc() operation between the two
+ * atomic operations, then the count will continue to edge closer to 0. If it
+ * reaches a value of 1 before /any/ of the threads reset it to the saturated
+ * value, then a concurrent refcount_dec_and_test() may erroneously free the
+ * underlying object.
+ * Linux limits the maximum number of tasks to PID_MAX_LIMIT, which is 
currently
+ * 0x40 (and can't easily be raised in the

[PATCH 5/5] fs: Convert struct file::f_count to refcount_long_t

Underflow of f_count needs to be more carefully detected than it
currently is. The results of get_file() should be checked, but the
first step is detection. Redefine f_count from atomic_long_t to
refcount_long_t.

Signed-off-by: Kees Cook 
---
Cc: Christian Brauner 
Cc: Alexander Viro 
Cc: Jan Kara 
Cc: linux-fsde...@vger.kernel.org
---
 fs/file.c  | 4 ++--
 fs/file_table.c| 6 +++---
 include/linux/fs.h | 6 +++---
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 3b683b9101d8..570424dd634b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -865,7 +865,7 @@ static struct file *__get_file_rcu(struct file __rcu **f)
if (!file)
return NULL;
 
-   if (unlikely(!atomic_long_inc_not_zero(>f_count)))
+   if (unlikely(!refcount_long_inc_not_zero(>f_count)))
return ERR_PTR(-EAGAIN);
 
file_reloaded = rcu_dereference_raw(*f);
@@ -987,7 +987,7 @@ static inline struct file *__fget_files_rcu(struct 
files_struct *files,
 * barrier. We only really need an 'acquire' one to
 * protect the loads below, but we don't have that.
 */
-   if (unlikely(!atomic_long_inc_not_zero(>f_count)))
+   if (unlikely(!refcount_long_inc_not_zero(>f_count)))
continue;
 
/*
diff --git a/fs/file_table.c b/fs/file_table.c
index 4f03beed4737..f29e7b94bca1 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -167,7 +167,7 @@ static int init_file(struct file *f, int flags, const 
struct cred *cred)
 * fget-rcu pattern users need to be able to handle spurious
 * refcount bumps we should reinitialize the reused file first.
 */
-   atomic_long_set(>f_count, 1);
+   refcount_long_set(>f_count, 1);
return 0;
 }
 
@@ -470,7 +470,7 @@ static DECLARE_DELAYED_WORK(delayed_fput_work, 
delayed_fput);
 
 void fput(struct file *file)
 {
-   if (atomic_long_dec_and_test(>f_count)) {
+   if (refcount_long_dec_and_test(>f_count)) {
struct task_struct *task = current;
 
if (unlikely(!(file->f_mode & (FMODE_BACKING | FMODE_OPENED 
{
@@ -503,7 +503,7 @@ void fput(struct file *file)
  */
 void __fput_sync(struct file *file)
 {
-   if (atomic_long_dec_and_test(>f_count))
+   if (refcount_long_dec_and_test(>f_count))
__fput(file);
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 210bbbfe9b83..b8f6cce7c39d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1001,7 +1001,7 @@ struct file {
 */
spinlock_t  f_lock;
fmode_t f_mode;
-   atomic_long_t   f_count;
+   refcount_long_t f_count;
struct mutexf_pos_lock;
loff_t  f_pos;
unsigned intf_flags;
@@ -1038,7 +1038,7 @@ struct file_handle {
 
 static inline struct file *get_file(struct file *f)
 {
-   if (unlikely(!atomic_long_inc_not_zero(>f_count)))
+   if (unlikely(!refcount_long_inc_not_zero(>f_count)))
return NULL;
return f;
 }
@@ -1046,7 +1046,7 @@ static inline struct file *get_file(struct file *f)
 struct file *get_file_rcu(struct file __rcu **f);
 struct file *get_file_active(struct file **f);
 
-#define file_count(x)  atomic_long_read(&(x)->f_count)
+#define file_count(x)  refcount_long_read(&(x)->f_count)
 
 #defineMAX_NON_LFS ((1UL<<31) - 1)
 
-- 
2.34.1

[PATCH 3/5] drm/i915: Do not directly manipulate file->f_count

The correct helper for taking an f_count reference is get_file(). Use it
and check results.

Signed-off-by: Kees Cook 
---
Cc: Jani Nikula 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Andi Shyti 
Cc: Lucas De Marchi 
Cc: Matt Atwood 
Cc: Matthew Auld 
Cc: Nirmoy Das 
Cc: Jonathan Cavitt 
Cc: intel-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/i915/gt/shmem_utils.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c 
b/drivers/gpu/drm/i915/gt/shmem_utils.c
index bccc3a1200bc..dc25e6dc884b 100644
--- a/drivers/gpu/drm/i915/gt/shmem_utils.c
+++ b/drivers/gpu/drm/i915/gt/shmem_utils.c
@@ -38,8 +38,9 @@ struct file *shmem_create_from_object(struct 
drm_i915_gem_object *obj)
void *ptr;
 
if (i915_gem_object_is_shmem(obj)) {
-   file = obj->base.filp;
-   atomic_long_inc(>f_count);
+   file = get_file(obj->base.filp);
+   if (!file)
+   return ERR_PTR(-ESRCH);
return file;
}
 
-- 
2.34.1

[PATCH 1/5] fs: Do not allow get_file() to resurrect 0 f_count

If f_count reaches 0, calling get_file() should be a failure. Adjust to
use atomic_long_inc_not_zero() and return NULL on failure. In the future
get_file() can be annotated with __must_check, though that is not
currently possible.

Signed-off-by: Kees Cook 
---
Cc: Christian Brauner 
Cc: Alexander Viro 
Cc: Jan Kara 
Cc: linux-fsde...@vger.kernel.org
---
 include/linux/fs.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 00fc429b0af0..210bbbfe9b83 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1038,7 +1038,8 @@ struct file_handle {
 
 static inline struct file *get_file(struct file *f)
 {
-   atomic_long_inc(>f_count);
+   if (unlikely(!atomic_long_inc_not_zero(>f_count)))
+   return NULL;
return f;
 }
 
-- 
2.34.1

[PATCH 2/5] drm/vmwgfx: Do not directly manipulate file->f_count

The correct helper for taking an f_count reference is get_file().
Now that it checks for 0 counts, use it and check results.

Signed-off-by: Kees Cook 
---
Cc: Zack Rusin 
Cc: Broadcom internal kernel review list 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Thomas Zimmermann 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/vmwgfx/ttm_object.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/ttm_object.c 
b/drivers/gpu/drm/vmwgfx/ttm_object.c
index 6806c05e57f6..68d8ee3020b1 100644
--- a/drivers/gpu/drm/vmwgfx/ttm_object.c
+++ b/drivers/gpu/drm/vmwgfx/ttm_object.c
@@ -475,7 +475,7 @@ void ttm_object_device_release(struct ttm_object_device 
**p_tdev)
  */
 static bool __must_check get_dma_buf_unless_doomed(struct dma_buf *dmabuf)
 {
-   return atomic_long_inc_not_zero(>file->f_count) != 0L;
+   return get_file(dmabuf->file) != NULL;
 }
 
 /**
-- 
2.34.1

[PATCH 0/5] fs: Do not allow get_file() to resurrect 0 f_count

Hi,

Failure with f_count reference counting are better contained by
an actual reference counting type, like refcount_t. The first step
is for get_file() to use inc_not_zero to avoid resurrection. I also
found a couple open-coded modifications of f_count that should be using
get_file(). Since long ago, f_count was switched to atomic_long_t, so to
get proper reference count checking, I've added a refcount_long_t API,
and then converted f_count to refcount_long_t.

Now if there are underflows (or somehow an overflow), we'll see them
reported.

-Kees

Kees Cook (5):
  fs: Do not allow get_file() to resurrect 0 f_count
  drm/vmwgfx: Do not directly manipulate file->f_count
  drm/i915: Do not directly manipulate file->f_count
  refcount: Introduce refcount_long_t and APIs
  fs: Convert struct file::f_count to refcount_long_t

 MAINTAINERS   |   2 +-
 Makefile  |  11 +-
 drivers/gpu/drm/i915/gt/shmem_utils.c |   5 +-
 drivers/gpu/drm/vmwgfx/ttm_object.c   |   2 +-
 fs/file.c |   4 +-
 fs/file_table.c   |   6 +-
 include/linux/fs.h|   7 +-
 include/linux/refcount-impl.h | 344 ++
 include/linux/refcount.h  | 341 +
 include/linux/refcount_types.h|  12 +
 lib/refcount.c|  17 +-
 11 files changed, 398 insertions(+), 353 deletions(-)
 create mode 100644 include/linux/refcount-impl.h

-- 
2.34.1

[PATCH] drm/connector: Add \n to message about demoting connector force-probes

2024-05-02 Thread Douglas Anderson

The debug print clearly lacks a \n at the end. Add it.

Fixes: 8f86c82aba8b ("drm/connector: demote connector force-probes for 
non-master clients")
Signed-off-by: Douglas Anderson 
---

 drivers/gpu/drm/drm_connector.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
index b0516505f7ae..4d2df7f64dc5 100644
--- a/drivers/gpu/drm/drm_connector.c
+++ b/drivers/gpu/drm/drm_connector.c
@@ -2940,7 +2940,7 @@ int drm_mode_getconnector(struct drm_device *dev, void 
*data,
 dev->mode_config.max_width,
 
dev->mode_config.max_height);
else
-   drm_dbg_kms(dev, "User-space requested a forced probe 
on [CONNECTOR:%d:%s] but is not the DRM master, demoting to read-only probe",
+   drm_dbg_kms(dev, "User-space requested a forced probe 
on [CONNECTOR:%d:%s] but is not the DRM master, demoting to read-only probe\n",
connector->base.id, connector->name);
}
 
-- 
2.45.0.rc1.225.g2a3ae87e7f-goog

Re: [PATCH v1 12/12] fbdev/viafb: Make I2C terminology more inclusive

2024-05-02 Thread Easwar Hariharan

On 5/2/2024 3:46 AM, Thomas Zimmermann wrote:
> 
> 
> Am 30.04.24 um 19:38 schrieb Easwar Hariharan:
>> I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave"
>> with more appropriate terms. Inspired by and following on to Wolfram's
>> series to fix drivers/i2c/[1], fix the terminology for users of
>> I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists
>> in the specification.
>>
>> Compile tested, no functionality changes intended
>>
>> [1]: 
>> https://lore.kernel.org/all/20240322132619.6389-1-wsa+rene...@sang-engineering.com/
>>
>> Signed-off-by: Easwar Hariharan 
> 
> Acked-by: Thomas Zimmermann 
> 

Thanks for the ack! I had been addressing feedback as I got it on the v0 
series, and it seems
I missed out on updating viafb and smscufx to spec-compliant controller/target 
terminology like
the v0->v1 changelog calls out before posting v1.

For smscufx, I feel phrasing the following line (as an example)

> -/* sets up I2C Controller for 100 Kbps, std. speed, 7-bit addr, host, 
> +/* sets up I2C Controller for 100 Kbps, std. speed, 7-bit addr, 
> *controller*, 

would actually impact readability negatively, so I propose to leave smscufx as 
is.

For viafb, I propose making it compliant with the spec using the 
controller/target terminology and
posting a v2 respin (which I can send out as soon as you say) and ask you to 
review again.

What do you think?

Thanks,
Easwar

>> ---
>>   drivers/video/fbdev/via/chip.h    |  8 
>>   drivers/video/fbdev/via/dvi.c | 24 
>>   drivers/video/fbdev/via/lcd.c |  6 +++---
>>   drivers/video/fbdev/via/via_aux.h |  2 +-
>>   drivers/video/fbdev/via/via_i2c.c | 12 ++--
>>   drivers/video/fbdev/via/vt1636.c  |  6 +++---
>>   6 files changed, 29 insertions(+), 29 deletions(-)
>>

[PATCH 1/1] drm: arm: display: komeda: komeda_crtc: use 'time_left' variable with wait_for_completion_timeout()

2024-05-02 Thread Wolfram Sang

There is a confusing pattern in the kernel to use a variable named 'timeout' to
store the result of wait_for_completion_timeout() causing patterns like:

timeout = wait_for_completion_timeout(...)
if (!timeout) return -ETIMEDOUT;

with all kinds of permutations. Use 'time_left' as a variable to make the code
self explaining.

Fix to the proper variable type 'unsigned long' while here.

Signed-off-by: Wolfram Sang 
---
 drivers/gpu/drm/arm/display/komeda/komeda_crtc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/arm/display/komeda/komeda_crtc.c 
b/drivers/gpu/drm/arm/display/komeda/komeda_crtc.c
index 2c661f28410e..c867acb737d6 100644
--- a/drivers/gpu/drm/arm/display/komeda/komeda_crtc.c
+++ b/drivers/gpu/drm/arm/display/komeda/komeda_crtc.c
@@ -294,7 +294,7 @@ komeda_crtc_flush_and_wait_for_flip_done(struct komeda_crtc 
*kcrtc,
struct komeda_dev *mdev = kcrtc->master->mdev;
struct completion *flip_done;
struct completion temp;
-   int timeout;
+   unsigned long time_left;
 
/* if caller doesn't send a flip_done, use a private flip_done */
if (input_flip_done) {
@@ -308,8 +308,8 @@ komeda_crtc_flush_and_wait_for_flip_done(struct komeda_crtc 
*kcrtc,
mdev->funcs->flush(mdev, kcrtc->master->id, 0);
 
/* wait the flip take affect.*/
-   timeout = wait_for_completion_timeout(flip_done, HZ);
-   if (timeout == 0) {
+   time_left = wait_for_completion_timeout(flip_done, HZ);
+   if (time_left == 0) {
DRM_ERROR("wait pipe%d flip done timeout\n", kcrtc->master->id);
if (!input_flip_done) {
unsigned long flags;
-- 
2.43.0

[linux-next:master] BUILD REGRESSION 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a

2024-05-02 Thread kernel test robot

tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a  Add linux-next specific 
files for 20240502

Unverified Error/Warning (likely false positive, please contact us if 
interested):

drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu14/smu_v14_0.c:80:52: error: '%s' 
directive output may be truncated writing up to 29 bytes into a region of size 
23 [-Werror=format-truncation=]
{standard input}:898: Warning: overflow in branch to .L152; converted into 
longer instruction sequence

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-all_hub-not-described-in-gmc_v12_0_flush_gpu_tlb_pasid
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-flush_type-not-described-in-gmc_v12_0_flush_gpu_tlb
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-flush_type-not-described-in-gmc_v12_0_flush_gpu_tlb_pasid
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-inst-not-described-in-gmc_v12_0_flush_gpu_tlb_pasid
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-vmhub-not-described-in-gmc_v12_0_flush_gpu_tlb
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-addr-description-in-sdma_v7_0_vm_write_pte
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-fence-description-in-sdma_v7_0_ring_emit_fence
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-flags-description-in-sdma_v7_0_vm_write_pte
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-ib-description-in-sdma_v7_0_ring_emit_mem_sync
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-job-description-in-sdma_v7_0_ring_emit_mem_sync
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-ring-description-in-sdma_v7_0_emit_copy_buffer
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-ring-description-in-sdma_v7_0_emit_fill_buffer
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Excess-function-parameter-vm-description-in-sdma_v7_0_ring_emit_vm_flush
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-addr-not-described-in-sdma_v7_0_ring_emit_fence
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-flags-not-described-in-sdma_v7_0_ring_emit_fence
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-flags-not-described-in-sdma_v7_0_ring_emit_ib
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-ib-not-described-in-sdma_v7_0_emit_copy_buffer
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-ib-not-described-in-sdma_v7_0_emit_fill_buffer
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-job-not-described-in-sdma_v7_0_ring_emit_ib
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-pd_addr-not-described-in-sdma_v7_0_ring_emit_vm_flush
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-ring-not-described-in-sdma_v7_0_ring_pad_ib
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-seq-not-described-in-sdma_v7_0_ring_emit_fence
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-timeout-not-described-in-sdma_v7_0_ring_test_ib
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-value-not-described-in-sdma_v7_0_vm_write_pte
|   |-- 
drivers-gpu-drm-amd-amdgpu-sdma_v7_0.c:warning:Function-parameter-or-struct-member-vmid-not-described-in-sdma_v7_0_ring_emit_vm_flush
|   |-- 
drivers-gpu-drm-imx-ipuv3-imx-ldb.c:error:_sel-directive-output-may-be-truncated-writing-bytes-into-a-region-of-size-between-and
|   |-- 
drivers-gpu-drm-nouveau-nouveau_backlight.c:error:d-directive-output-may-be-truncated-writing-between-and-bytes-into-a-region-of-size
|   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
|-- arc-allmodconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-all_hub-not-described-in-gmc_v12_0_flush_gpu_tlb_pasid
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-flush_type-not-described-in-gmc_v12_0_flush_gpu_tlb
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-flush_type-not-described-in-gmc_v12_0_flush_gpu_tlb_pasid
|   |-- 
drivers-gpu-drm-amd-amdgpu-gmc_v12_0.c:warning:Function-parameter-or-struct-member-inst

RE: [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range

2024-05-02 Thread Zeng, Oak

Hi Jason,

I tried to understand how you supposed us to use hmm range fault... it seems 
you want us to call hmm range fault two times on each gpu page fault:

1.
Call Hmm_range_fault first time, pfn of the fault address is set with 
HMM_PFN_REQ_FAULT
Other pfns in the PREFAULT_SIZE range will be set as 0
Hmm_range fault returns:
Pfn with 0 flag or HMM_PFN_VALID flag, means a valid pfn
Pfn with HMM_PFN_ERROR flag, means invalid pfn

2.  
Then call hmm_range_fault a second time
Setting the hmm_range start/end only to cover valid pfns
With all valid pfns, set the REQ_FAULT flag

Basically use hmm_range_fault to figure out the valid address range in the 
first round; then really fault (e.g., trigger cpu fault to allocate system 
pages) in the second call the hmm range fault.

Do I understand it correctly?

This is strange to me. We should already know the valid address range before we 
call hmm range fault, because the migration codes need to look up cpu vma 
anyway. what is the point of the first hmm_range fault?

Oak

> -Original Message-
> From: Thomas Hellström 
> Sent: Thursday, May 2, 2024 11:02 AM
> To: Jason Gunthorpe 
> Cc: Daniel Vetter ; Zeng, Oak ; dri-
> de...@lists.freedesktop.org; intel...@lists.freedesktop.org; Brost,
> Matthew ; Welty, Brian
> ; Ghimiray, Himal Prasad
> ; Bommu, Krishnaiah
> ; Vishwanathapura, Niranjana
> ; Leon Romanovsky
> 
> Subject: Re: [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table
> from hmm range
> 
> On Thu, 2024-05-02 at 09:46 -0300, Jason Gunthorpe wrote:
> > On Thu, May 02, 2024 at 11:11:04AM +0200, Thomas Hellström wrote:
> >
> > > It's true the cpu vma lookup is a remnant from amdkfd. The idea
> > > here is
> > > to replace that with fixed prefaulting ranges of tunable size. So
> > > far,
> > > as you mention, the prefaulting range has been determined by the
> > > CPU
> > > vma size. Given previous feedback, this is going to change.
> >
> > Perhaps limiting prefault to a VMA barrier is a reasonable thing to
> > do, but the implementation should be pushed into hmm_range_fault and
> > not open coded in the driver.
> >
> > > Still the prefaulting range needs to be restricted to avoid -EFAULT
> > > failures in hmm_range_fault(). That can ofc be done by calling it
> > > without HMM_PFN_REQ_FAULT for the range and interpret the returned
> > > pnfs.
> >
> > Yes, this is exactly what that feature is for, you mark your prefetch
> > differently from the fault critical page(s).
> >
> > > There is a performance concern of this approach as compared to
> > > peeking at the CPU vmas directly, since hmm_range_fault() would
> > > need to
> > > be called twice. Any guidelines ideas here?
> >
> > If there is something wrong with hmm_range_fault() then please fix
> > it. I'm not sure why you'd call it twice, the HMM_PFN_REQ_FAULT is
> > per
> > PFN?
> 
> Ah, yes you're right. I somehow thought it was per range. Makes sense
> now.
> 
> Thanks,
> Thomas
> 
> 
> 
> >
> > Jason

[PULL] drm-misc-fixes

2024-05-02 Thread Thomas Zimmermann

Hi Dave, Sima,

here's the PR for drm-misc-fixes for this week.

Best regards
Thomas

drm-misc-fixes-2024-05-02:
Short summary of fixes pull:

imagination:
- fix page-count macro

nouveau:
- avoid page-table allocation failures
- fix firmware memory allocation

panel:
- ili9341: avoid OF for device properties; respect deferred probe; fix
usage of errno codes

ttm:
- fix status output

vmwgfx:
- fix legacy display unit
- fix read length in fence signalling
The following changes since commit 78d9161d2bcd442d93d917339297ffa057dbee8c:

  fbdev: fix incorrect address computation in deferred IO (2024-04-24 15:03:37 
+0200)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/misc/kernel.git 
tags/drm-misc-fixes-2024-05-02

for you to fetch changes up to da85f0aaa9f21999753b01d45c0343f885a8f905:

  drm/panel: ili9341: Use predefined error codes (2024-05-02 09:41:27 +0200)


Short summary of fixes pull:

imagination:
- fix page-count macro

nouveau:
- avoid page-table allocation failures
- fix firmware memory allocation

panel:
- ili9341: avoid OF for device properties; respect deferred probe; fix
usage of errno codes

ttm:
- fix status output

vmwgfx:
- fix legacy display unit
- fix read length in fence signalling


Andy Shevchenko (3):
  drm/panel: ili9341: Correct use of device property APIs
  drm/panel: ili9341: Respect deferred probe
  drm/panel: ili9341: Use predefined error codes

Ian Forbes (1):
  drm/vmwgfx: Fix Legacy Display Unit

Lyude Paul (2):
  drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()
  drm/nouveau/gsp: Use the sg allocator for level 2 of radix3

Matt Coster (1):
  drm/imagination: Ensure PVR_MIPS_PT_PAGE_COUNT is never zero

Zack Rusin (2):
  drm/ttm: Print the memory decryption status just once
  drm/vmwgfx: Fix invalid reads in fence signaled events

 drivers/gpu/drm/imagination/pvr_fw_mips.h |  5 +-
 drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h |  4 +-
 drivers/gpu/drm/nouveau/nvkm/core/firmware.c  | 19 +++---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c| 77 +++
 drivers/gpu/drm/panel/Kconfig |  2 +-
 drivers/gpu/drm/panel/panel-ilitek-ili9341.c  | 13 ++--
 drivers/gpu/drm/ttm/ttm_tt.c  |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c|  1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c |  2 +-
 9 files changed, 80 insertions(+), 45 deletions(-)

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)

Re: [PATCH] drm/panthor: Fix the FW reset logic

On Tue, 30 Apr 2024 13:37:27 +0200
Boris Brezillon  wrote:

> In the post_reset function, if the fast reset didn't succeed, we
> are not clearing the fast_reset flag, which prevents firmware
> sections from being reloaded. While at it, use panthor_fw_stop()
> instead of manually writing DISABLE to the MCU_CONTROL register.
> 
> Fixes: 2718d91816ee ("drm/panthor: Add the FW logical block")
> Signed-off-by: Boris Brezillon 

Queued to drm-misc-next-fixes.

> ---
>  drivers/gpu/drm/panthor/panthor_fw.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c 
> b/drivers/gpu/drm/panthor/panthor_fw.c
> index 181395e2859a..fedf9627453f 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1083,10 +1083,11 @@ int panthor_fw_post_reset(struct panthor_device 
> *ptdev)
>   if (!ret)
>   goto out;
>  
> - /* Force a disable, so we get a fresh boot on the next
> -  * panthor_fw_start() call.
> + /* Forcibly reset the MCU and force a slow reset, so we get a
> +  * fresh boot on the next panthor_fw_start() call.
>*/
> - gpu_write(ptdev, MCU_CONTROL, MCU_CONTROL_DISABLE);
> + panthor_fw_stop(ptdev);
> + ptdev->fw->fast_reset = false;
>   drm_err(>base, "FW fast reset failed, trying a slow 
> reset");
>   }
>

[PATCH 4/4] drm/panthor: Call panthor_sched_post_reset() even if the reset failed

We need to undo what was done in panthor_sched_pre_reset() even if the
reset failed. We just flag all previously running groups as terminated
when that happens to unblock things.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panthor/panthor_device.c |  7 +--
 drivers/gpu/drm/panthor/panthor_sched.c  | 19 ++-
 drivers/gpu/drm/panthor/panthor_sched.h  |  2 +-
 3 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.c 
b/drivers/gpu/drm/panthor/panthor_device.c
index 4c5b54e7abb7..4082c8f2951d 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -129,13 +129,8 @@ static void panthor_device_reset_work(struct work_struct 
*work)
panthor_gpu_l2_power_on(ptdev);
panthor_mmu_post_reset(ptdev);
ret = panthor_fw_post_reset(ptdev);
-   if (ret)
-   goto out_dev_exit;
-
atomic_set(>reset.pending, 0);
-   panthor_sched_post_reset(ptdev);
-
-out_dev_exit:
+   panthor_sched_post_reset(ptdev, ret != 0);
drm_dev_exit(cookie);
 
if (ret) {
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c 
b/drivers/gpu/drm/panthor/panthor_sched.c
index 6ea094b00cf9..fc43ff62c77d 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -2728,15 +2728,22 @@ void panthor_sched_pre_reset(struct panthor_device 
*ptdev)
mutex_unlock(>reset.lock);
 }
 
-void panthor_sched_post_reset(struct panthor_device *ptdev)
+void panthor_sched_post_reset(struct panthor_device *ptdev, bool reset_failed)
 {
struct panthor_scheduler *sched = ptdev->scheduler;
struct panthor_group *group, *group_tmp;
 
mutex_lock(>reset.lock);
 
-   list_for_each_entry_safe(group, group_tmp, 
>reset.stopped_groups, run_node)
+   list_for_each_entry_safe(group, group_tmp, 
>reset.stopped_groups, run_node) {
+   /* Consider all previously running group as terminated if the
+* reset failed.
+*/
+   if (reset_failed)
+   group->state = PANTHOR_CS_GROUP_TERMINATED;
+
panthor_group_start(group);
+   }
 
/* We're done resetting the GPU, clear the reset.in_progress bit so we 
can
 * kick the scheduler.
@@ -2744,9 +2751,11 @@ void panthor_sched_post_reset(struct panthor_device 
*ptdev)
atomic_set(>reset.in_progress, false);
mutex_unlock(>reset.lock);
 
-   sched_queue_delayed_work(sched, tick, 0);
-
-   sched_queue_work(sched, sync_upd);
+   /* No need to queue a tick and update syncs if the reset failed. */
+   if (!reset_failed) {
+   sched_queue_delayed_work(sched, tick, 0);
+   sched_queue_work(sched, sync_upd);
+   }
 }
 
 static void group_sync_upd_work(struct work_struct *work)
diff --git a/drivers/gpu/drm/panthor/panthor_sched.h 
b/drivers/gpu/drm/panthor/panthor_sched.h
index 66438b1f331f..3a30d2328b30 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.h
+++ b/drivers/gpu/drm/panthor/panthor_sched.h
@@ -40,7 +40,7 @@ void panthor_group_pool_destroy(struct panthor_file *pfile);
 int panthor_sched_init(struct panthor_device *ptdev);
 void panthor_sched_unplug(struct panthor_device *ptdev);
 void panthor_sched_pre_reset(struct panthor_device *ptdev);
-void panthor_sched_post_reset(struct panthor_device *ptdev);
+void panthor_sched_post_reset(struct panthor_device *ptdev, bool reset_failed);
 void panthor_sched_suspend(struct panthor_device *ptdev);
 void panthor_sched_resume(struct panthor_device *ptdev);
 
-- 
2.44.0

[PATCH 2/4] drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level

Avoids use-after-free situations when panthor_fw_unplug() is called
and the kernel BO was mapped to the FW VM.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panthor/panthor_fw.c|  4 ++--
 drivers/gpu/drm/panthor/panthor_gem.c   |  8 +---
 drivers/gpu/drm/panthor/panthor_gem.h   |  8 ++--
 drivers/gpu/drm/panthor/panthor_heap.c  |  8 
 drivers/gpu/drm/panthor/panthor_sched.c | 11 +--
 5 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_fw.c 
b/drivers/gpu/drm/panthor/panthor_fw.c
index 181395e2859a..b41685304a83 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -453,7 +453,7 @@ panthor_fw_alloc_queue_iface_mem(struct panthor_device 
*ptdev,
 
ret = panthor_kernel_bo_vmap(mem);
if (ret) {
-   panthor_kernel_bo_destroy(panthor_fw_vm(ptdev), mem);
+   panthor_kernel_bo_destroy(mem);
return ERR_PTR(ret);
}
 
@@ -1133,7 +1133,7 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
panthor_fw_stop(ptdev);
 
list_for_each_entry(section, >fw->sections, node)
-   panthor_kernel_bo_destroy(panthor_fw_vm(ptdev), section->mem);
+   panthor_kernel_bo_destroy(section->mem);
 
/* We intentionally don't call panthor_vm_idle() and let
 * panthor_mmu_unplug() release the AS we acquired with
diff --git a/drivers/gpu/drm/panthor/panthor_gem.c 
b/drivers/gpu/drm/panthor/panthor_gem.c
index d6483266d0c2..38f560864879 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -26,18 +26,18 @@ static void panthor_gem_free_object(struct drm_gem_object 
*obj)
 
 /**
  * panthor_kernel_bo_destroy() - Destroy a kernel buffer object
- * @vm: The VM this BO was mapped to.
  * @bo: Kernel buffer object to destroy. If NULL or an ERR_PTR(), the 
destruction
  * is skipped.
  */
-void panthor_kernel_bo_destroy(struct panthor_vm *vm,
-  struct panthor_kernel_bo *bo)
+void panthor_kernel_bo_destroy(struct panthor_kernel_bo *bo)
 {
+   struct panthor_vm *vm;
int ret;
 
if (IS_ERR_OR_NULL(bo))
return;
 
+   vm = bo->vm;
panthor_kernel_bo_vunmap(bo);
 
if (drm_WARN_ON(bo->obj->dev,
@@ -53,6 +53,7 @@ void panthor_kernel_bo_destroy(struct panthor_vm *vm,
drm_gem_object_put(bo->obj);
 
 out_free_bo:
+   panthor_vm_put(vm);
kfree(bo);
 }
 
@@ -106,6 +107,7 @@ panthor_kernel_bo_create(struct panthor_device *ptdev, 
struct panthor_vm *vm,
if (ret)
goto err_free_va;
 
+   kbo->vm = panthor_vm_get(vm);
bo->exclusive_vm_root_gem = panthor_vm_root_gem(vm);
drm_gem_object_get(bo->exclusive_vm_root_gem);
bo->base.base.resv = bo->exclusive_vm_root_gem->resv;
diff --git a/drivers/gpu/drm/panthor/panthor_gem.h 
b/drivers/gpu/drm/panthor/panthor_gem.h
index 3bccba394d00..e43021cf6d45 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.h
+++ b/drivers/gpu/drm/panthor/panthor_gem.h
@@ -61,6 +61,11 @@ struct panthor_kernel_bo {
 */
struct drm_gem_object *obj;
 
+   /**
+* @vm: VM this private buffer is attached to.
+*/
+   struct panthor_vm *vm;
+
/**
 * @va_node: VA space allocated to this GEM.
 */
@@ -136,7 +141,6 @@ panthor_kernel_bo_create(struct panthor_device *ptdev, 
struct panthor_vm *vm,
 size_t size, u32 bo_flags, u32 vm_map_flags,
 u64 gpu_va);
 
-void panthor_kernel_bo_destroy(struct panthor_vm *vm,
-  struct panthor_kernel_bo *bo);
+void panthor_kernel_bo_destroy(struct panthor_kernel_bo *bo);
 
 #endif /* __PANTHOR_GEM_H__ */
diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
b/drivers/gpu/drm/panthor/panthor_heap.c
index 143fa35f2e74..65921296a18c 100644
--- a/drivers/gpu/drm/panthor/panthor_heap.c
+++ b/drivers/gpu/drm/panthor/panthor_heap.c
@@ -127,7 +127,7 @@ static void panthor_free_heap_chunk(struct panthor_vm *vm,
heap->chunk_count--;
mutex_unlock(>lock);
 
-   panthor_kernel_bo_destroy(vm, chunk->bo);
+   panthor_kernel_bo_destroy(chunk->bo);
kfree(chunk);
 }
 
@@ -183,7 +183,7 @@ static int panthor_alloc_heap_chunk(struct panthor_device 
*ptdev,
return 0;
 
 err_destroy_bo:
-   panthor_kernel_bo_destroy(vm, chunk->bo);
+   panthor_kernel_bo_destroy(chunk->bo);
 
 err_free_chunk:
kfree(chunk);
@@ -391,7 +391,7 @@ int panthor_heap_return_chunk(struct panthor_heap_pool 
*pool,
mutex_unlock(>lock);
 
if (removed) {
-   panthor_kernel_bo_destroy(pool->vm, chunk->bo);
+   panthor_kernel_bo_destroy(chunk->bo);
kfree(chunk);
ret = 0;
} else {
@@ -587,7 +587,7 @@ void panthor_heap_pool_destroy(struct panthor_heap_pool 
*pool)

[PATCH 0/4] drm/panthor: More reset fixes

Hello,

This is a collection of fixes for bugs found while chasing an
unrecoverable fault leading to a device unplug (because of some
other bugs that was introduced in my local dev branch).

The first patch makes sure we immediately reset the GPU on an
unrecoverable fault, and following patches are fixing various
NULL/invalid pointer derefs caused by use-after-free situations
following a device unplug.

Regards,

Boris

Boris Brezillon (4):
  drm/panthor: Force an immediate reset on unrecoverable faults
  drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
  drm/panthor: Reset the FW VM to NULL on unplug
  drm/panthor: Call panthor_sched_post_reset() even if the reset failed

 drivers/gpu/drm/panthor/panthor_device.c |  8 ++---
 drivers/gpu/drm/panthor/panthor_device.h |  1 +
 drivers/gpu/drm/panthor/panthor_fw.c |  5 +--
 drivers/gpu/drm/panthor/panthor_gem.c|  8 +++--
 drivers/gpu/drm/panthor/panthor_gem.h|  8 +++--
 drivers/gpu/drm/panthor/panthor_heap.c   |  8 ++---
 drivers/gpu/drm/panthor/panthor_sched.c  | 40 +---
 drivers/gpu/drm/panthor/panthor_sched.h  |  2 +-
 8 files changed, 51 insertions(+), 29 deletions(-)

-- 
2.44.0

[PATCH 3/4] drm/panthor: Reset the FW VM to NULL on unplug

This way get NULL derefs instead of use-after-free if the FW VM is
referenced after the device has been unplugged.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panthor/panthor_fw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/panthor/panthor_fw.c 
b/drivers/gpu/drm/panthor/panthor_fw.c
index b41685304a83..93165961a6b5 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -1141,6 +1141,7 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
 * state to keep the active_refcnt balanced.
 */
panthor_vm_put(ptdev->fw->vm);
+   ptdev->fw->vm = NULL;
 
panthor_gpu_power_off(ptdev, L2, ptdev->gpu_info.l2_present, 2);
 }
-- 
2.44.0

[PATCH 1/4] drm/panthor: Force an immediate reset on unrecoverable faults

If the FW reports an unrecoverable fault, we need to reset the GPU
before we can start re-using it again.

Signed-off-by: Boris Brezillon 
---
 drivers/gpu/drm/panthor/panthor_device.c |  1 +
 drivers/gpu/drm/panthor/panthor_device.h |  1 +
 drivers/gpu/drm/panthor/panthor_sched.c  | 11 ++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.c 
b/drivers/gpu/drm/panthor/panthor_device.c
index 75276cbeba20..4c5b54e7abb7 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -293,6 +293,7 @@ static const struct panthor_exception_info 
panthor_exception_infos[] = {
PANTHOR_EXCEPTION(ACTIVE),
PANTHOR_EXCEPTION(CS_RES_TERM),
PANTHOR_EXCEPTION(CS_CONFIG_FAULT),
+   PANTHOR_EXCEPTION(CS_UNRECOVERABLE),
PANTHOR_EXCEPTION(CS_ENDPOINT_FAULT),
PANTHOR_EXCEPTION(CS_BUS_FAULT),
PANTHOR_EXCEPTION(CS_INSTR_INVALID),
diff --git a/drivers/gpu/drm/panthor/panthor_device.h 
b/drivers/gpu/drm/panthor/panthor_device.h
index 2fdd671b38fd..e388c0472ba7 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -216,6 +216,7 @@ enum drm_panthor_exception_type {
DRM_PANTHOR_EXCEPTION_CS_RES_TERM = 0x0f,
DRM_PANTHOR_EXCEPTION_MAX_NON_FAULT = 0x3f,
DRM_PANTHOR_EXCEPTION_CS_CONFIG_FAULT = 0x40,
+   DRM_PANTHOR_EXCEPTION_CS_UNRECOVERABLE = 0x41,
DRM_PANTHOR_EXCEPTION_CS_ENDPOINT_FAULT = 0x44,
DRM_PANTHOR_EXCEPTION_CS_BUS_FAULT = 0x48,
DRM_PANTHOR_EXCEPTION_CS_INSTR_INVALID = 0x49,
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c 
b/drivers/gpu/drm/panthor/panthor_sched.c
index 7f16a4a14e9a..1d2708c3ab0a 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -1281,7 +1281,16 @@ cs_slot_process_fatal_event_locked(struct panthor_device 
*ptdev,
if (group)
group->fatal_queues |= BIT(cs_id);
 
-   sched_queue_delayed_work(sched, tick, 0);
+   if (CS_EXCEPTION_TYPE(fatal) == DRM_PANTHOR_EXCEPTION_CS_UNRECOVERABLE) 
{
+   /* If this exception is unrecoverable, queue a reset, and make
+* sure we stop scheduling groups until the reset has happened.
+*/
+   panthor_device_schedule_reset(ptdev);
+   cancel_delayed_work(>tick_work);
+   } else {
+   sched_queue_delayed_work(sched, tick, 0);
+   }
+
drm_warn(>base,
 "CSG slot %d CS slot: %d\n"
 "CS_FATAL.EXCEPTION_TYPE: 0x%x (%s)\n"
-- 
2.44.0

Re: [v1,1/3] drm/panel: ili9341: Correct use of device property APIs

2024-05-02 Thread Andy Shevchenko

On Fri, May 03, 2024 at 01:28:26AM +0800, Sui Jingfeng wrote:
> On 2024/5/2 15:34, Neil Armstrong wrote:
> > On 30/04/2024 11:34, Maxime Ripard wrote:
> > > On Tue, Apr 30, 2024 at 12:54:39AM +0800, Sui Jingfeng wrote:
> > > > On 2024/4/29 19:55, Maxime Ripard wrote:
> > > > > On Sat, Apr 27, 2024 at 01:57:46PM +0800, Sui Jingfeng wrote:
> > > > > > On 2024/4/26 14:23, Maxime Ripard wrote:
> > > > > > > On Fri, Apr 26, 2024 at 04:43:18AM +0800, Sui Jingfeng wrote:
> > > > > > > > On 2024/4/26 03:10, Andy Shevchenko wrote:
> > > > > > > > > On Fri, Apr 26, 2024 at 02:08:16AM +0800, Sui Jingfeng wrote:
> > > > > > > > > > On 2024/4/25 22:26, Andy Shevchenko wrote:

...

> > > > > > > > > > > It seems driver missed the point of proper use of device
> > > > > > > > > > > property APIs.  Correct this by updating headers and
> > > > > > > > > > > calls respectively.

> > > > > > > > > > You are using the 'seems' here exactly saying that you are
> > > > > > > > > > not 100% sure.

> > > > > > > > > > Please allow me to tell you the truth: This patch again has
> > > > > > > > > > ZERO effect.  It fix nothing. And this patch is has the
> > > > > > > > > > risks to be wrong.

> > > > > > > > > Huh?! Really, stop commenting the stuff you do not understand.

> > > > > > > > I'm actually a professional display drivers developer at the
> > > > > > > > downstream in the past, despite my contribution to upstream is
> > > > > > > > less. But I believe that all panel driver developers know what
> > > > > > > > I'm talking about. So please have take a look at my replies.

> > > > > > > Most of the interactions you had in this series has been uncalled
> > > > > > > for.  You might be against a patch, but there's no need to go to
> > > > > > > such length.
> > > > > > > 
> > > > > > > As far as I'm concerned, this patch is fine to me in itself, and
> > > > > > > I don't see anything that would prevent us from merging it.

> > > > > > No one is preventing you, as long as don't misunderstanding what
> > > > > > other people's technical replies intentionally. I'm just a usual
> > > > > > and normal contributor, I hope the world will better than
> > > > > > yesterday.

> > > > > You should seriously consider your tone when replying then.
> > > > > 
> > > > > > Saying such thing to me may not proper, I guess you may want to talk
> > > > > > to peoples who has the push rights

> > > > > I think you misunderstood me. My point was that your several rants
> > > > > were uncalled for and aren't the kind of things we're doing here.
> > > > > 
> > > > > I know very well how to get a patch merged, thanks.
> > > > > 
> > > > > > just make sure it isn't a insult to the professionalism of drm 
> > > > > > bridge
> > > > > > community itself though.

> > > > > I'm not sure why you're bringing the bridge community or its
> > > > > professionalism. It's a panel, not a bridge, and I never doubted the
> > > > > professionalism of anyone.
> > > > 
> > > > I means that the code itself could be adopted, as newer and younger
> > > > programmer (like Andy) need to be encouraged to contribute.
> > > 
> > > Andy has thousands of commits in Linux. He's *very* far from being a new
> > > contributor.
> > > 
> > > > I express no obvious objections, just hints him that something else
> > > > probably should also be taken into consideration as well.
> > > 
> > > That might be what you wanted to express, but you definitely didn't
> > > express it that way.
> > > 
> > > > On the other hand, we probably should allow other people participate in
> > > > discussion so that it is sufficient discussed and ensure that it won't
> > > > be reverted by someone in the future for some reasons. Backing to out
> > > > case happens here, we may need to move things forward.  Therefore, it
> > > > definitely deserve to have a try. It is not a big deal even though it
> > > > gets reverted someday.
> > > > 
> > > > In the end, I don't mind if you think there is nothing that could
> > > > prevent you from merge it, but I still suggest you have a glance at
> > > > peoples siting at the Cc list. I'm busy now and I have a lot of other
> > > > tasks to do, and may not be able to reply you emails on time. So it up
> > > > to you and other maintainers to decide.
> > > > Thank you.
> > > 
> > > So far, you're the only one who reviewed those patches. I'm not sure
> > > what you're talking about here.
> > 
> > Well I (as drm-panel maintainer) did review them positively

[...]

> > because the patches looked perfectly correct in regards of the commit
> > message
> 
> The point is the 'fixes' tag.
> 
> Then, can I ask what's the issue it fixes? I'm asking because I see the
> submitting-patches.html [1] documentation told us that a fixes tag indicates
> that the patch fixes an issue in a previous commit.
> 
> Previously, the driver only meant to be used on the DT systems, so what's 
> issue?

It fixes copy'n'paste error as far as I understand the motivation of this code.

> [1] 
>

Re: [v1,1/3] drm/panel: ili9341: Correct use of device property APIs

2024-05-02 Thread Andy Shevchenko

On Fri, May 03, 2024 at 12:25:17AM +0800, Sui Jingfeng wrote:
> On 2024/5/2 16:32, Andy Shevchenko wrote:
> > On Wed, May 01, 2024 at 12:27:14AM +0800, Sui Jingfeng wrote:
> > > On 2024/4/30 22:13, Andy Shevchenko wrote:
> > > > On Fri, Apr 26, 2024 at 05:13:43AM +0800, Sui Jingfeng wrote:

...

> > > > the former might be subdivided to "is it swnode backed or real fwnode 
> > > > one?"
> > > > 
> > > Yeah,
> > > On non-DT cases, it can be subdivided to swnode backed case and ACPI 
> > > fwnode backed case.
> > > 
> > >   - For swnode backed case: the device_get_match_data() don't has a 
> > > implemented backend.
> > >   - For ACPI fwnode backed case: the device_get_match_data() has a 
> > > implemented backend.
> > > 
> > > But the driver has *neither* software node support
> > True.
> > 
> > > nor ACPI support,
> > Not true.
> 
> Why this is not true? Are you means that the panel-ilitek-ili9341 driver has 
> ACPI support?

That's the idea (as far as I see the copy of the code from tinyDRM),
but broken over the copy'n'paste. This patch rectifies that to be
in align with the original code, which *does* support ACPI.

> I'm asking because I don't see struct acpi_device_id related stuff in that 
> source file,
> am I miss something?

Yes, you are. I leave it for you to research.

> > So, slow down and take your time to get into the code and understand how it 
> > works.
> > 
> > > so that the rotation property can not get and ili9341_dpi_probe() will 
> > > fails.
> > > So in total, this is not a 100% correct use of device property APIs.
> > > 
> > > But I'm fine that if you want to leave(ignore) those less frequent use 
> > > cases temporarily,
> > > there may have programmers want to post patches, to complete the missing 
> > > in the future.
> > > 
> > > So, there do have some gains on non-DT cases.
> > > 
> > >   - As you make it be able to compiled on X86 with the drm-misc-defconfig.
> > >   - You cleanup the code up (at least patch 2 in this series is no 
> > > obvious problem).
> > >   - You allow people to modprobe it, and maybe half right and half 
> > > undefined.
> > > 
> > > But you do helps moving something forward, so congratulations for the 
> > > wake up.

-- 
With Best Regards,
Andy Shevchenko

Re: [v1,1/3] drm/panel: ili9341: Correct use of device property APIs

2024-05-02 Thread Sui Jingfeng

Hi,

On 2024/5/2 15:34, Neil Armstrong wrote:

On 30/04/2024 11:34, Maxime Ripard wrote:

On Tue, Apr 30, 2024 at 12:54:39AM +0800, Sui Jingfeng wrote:

On 2024/4/29 19:55, Maxime Ripard wrote:

On Sat, Apr 27, 2024 at 01:57:46PM +0800, Sui Jingfeng wrote:

On 2024/4/26 14:23, Maxime Ripard wrote:

On Fri, Apr 26, 2024 at 04:43:18AM +0800, Sui Jingfeng wrote:

On 2024/4/26 03:10, Andy Shevchenko wrote:

On Fri, Apr 26, 2024 at 02:08:16AM +0800, Sui Jingfeng wrote:

On 2024/4/25 22:26, Andy Shevchenko wrote:
It seems driver missed the point of proper use of device
property APIs.

Correct this by updating headers and calls respectively.
You are using the 'seems' here exactly saying that you are not
100% sure.

Please allow me to tell you the truth: This patch again has
ZERO effect.

It fix nothing. And this patch is has the risks to be wrong.

Huh?! Really, stop commenting the stuff you do not understand.
I'm actually a professional display drivers developer at the
downstream
in the past, despite my contribution to upstream is less. But I
believe
that all panel driver developers know what I'm talking about. So
please

have take a look at my replies.
Most of the interactions you had in this series has been uncalled
for.
You might be against a patch, but there's no need to go to such
length.

As far as I'm concerned, this patch is fine to me in itself, and
I don't

see anything that would prevent us from merging it.
No one is preventing you, as long as don't misunderstanding what
other

people's technical replies intentionally. I'm just a usual and normal
contributor, I hope the world will better than yesterday.

You should seriously consider your tone when replying then.

Saying such thing to me may not proper, I guess you may want to talk
to peoples who has the push rights
I think you misunderstood me. My point was that your several rants
were

uncalled for and aren't the kind of things we're doing here.

I know very well how to get a patch merged, thanks.

just make sure it isn't a insult to the professionalism of drm bridge
community itself though.

I'm not sure why you're bringing the bridge community or its
professionalism. It's a panel, not a bridge, and I never doubted the
professionalism of anyone.

I means that the code itself could be adopted, as newer and younger
programmer (like Andy) need to be encouraged to contribute.

Andy has thousands of commits in Linux. He's *very* far from being a new
contributor.

I express no obvious objections, just hints him that something else
probably should also be taken into consideration as well.

That might be what you wanted to express, but you definitely didn't
express it that way.

On the other hand, we probably should allow other people participate
in discussion so that it is sufficient discussed and ensure that it
won't be reverted by someone in the future for some reasons. Backing
to out case happens here, we may need to move things forward.
Therefore,

it definitely deserve to have a try. It is not a big deal even though
it gets reverted someday.

In the end, I don't mind if you think there is nothing that could
prevent you from merge it, but I still suggest you have a glance at
peoples siting at the Cc list. I'm busy now and I have a lot of other
tasks to do, and may not be able to reply you emails on time. So it up
to you and other maintainers to decide.
Thank you.

So far, you're the only one who reviewed those patches. I'm not sure
what you're talking about here.

Well I (as drm-panel maintainer) did review them positively

[...]

because the patches looked perfectly correct in regards of the commit
message

The point is the 'fixes' tag.

Then, can I ask what's the issue it fixes? I'm asking because I see the
submitting-patches.html [1] documentation told us that a fixes tag indicates
that the patch fixes an issue in a previous commit.

Previously, the driver only meant to be used on the DT systems, so what's issue?

[1]
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#reviewer-s-statement-of-oversight

I copy & paste the paragraph from link [1] for easier to read. See below:

"A Fixes: tag indicates that the patch fixes an issue in a previous commit. It
is used to make it easy to determine where a bug originated, which can help
review a bug fix. This tag also assists the stable kernel team in determining
which stable kernel versions should receive your fix. This is the preferred
method for indicating a bug fixed by the patch."

and the patchset motivation and

OK, the motivation is good, I agree and I admit.

because I trust Andy being a long time contributor with a lot of
expertise.

Does this means that you are going to merge patches from the experts without
have a glance and
reject or ignore novice's patches unconditionally?

I'm asking because I see there still have a lot of other panel drivers use
of_device_get_match_data()
function to get a match, and most of them has the 'OF'

Re: (subset) [RESEND][PATCH v3][next] backlight: sky81452-backlight: Remove unnecessary call to of_node_get

On Thu, 02 May 2024 22:51:21 +0530, Shresth Prasad wrote:
> `dev->of_node` already has a reference to the device_node and calling
> of_node_get on it is unnecessary. All conresponding calls to
> of_node_put are also removed.
> 
> 

Applied, thanks!

[1/1] backlight: sky81452-backlight: Remove unnecessary call to of_node_get
  commit: 4da294108e38bf9cd5c62c2caa47611e5dbb7fb1

--
Lee Jones [李琼斯]

[RESEND][PATCH v3][next] backlight: sky81452-backlight: Remove unnecessary call to of_node_get

2024-05-02 Thread Shresth Prasad

`dev->of_node` already has a reference to the device_node and calling
of_node_get on it is unnecessary. All conresponding calls to
of_node_put are also removed.

Reviewed-by: Daniel Thompson 
Signed-off-by: Shresth Prasad 
---
Changes in v3:
- Remove unnecessary braces

 drivers/video/backlight/sky81452-backlight.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/video/backlight/sky81452-backlight.c 
b/drivers/video/backlight/sky81452-backlight.c
index eb18c6eb0ff0..19f9f84a9fd6 100644
--- a/drivers/video/backlight/sky81452-backlight.c
+++ b/drivers/video/backlight/sky81452-backlight.c
@@ -182,7 +182,7 @@ static const struct attribute_group sky81452_bl_attr_group 
= {
 static struct sky81452_bl_platform_data *sky81452_bl_parse_dt(
struct device *dev)
 {
-   struct device_node *np = of_node_get(dev->of_node);
+   struct device_node *np = dev->of_node;
struct sky81452_bl_platform_data *pdata;
int num_entry;
unsigned int sources[6];
@@ -194,10 +194,8 @@ static struct sky81452_bl_platform_data 
*sky81452_bl_parse_dt(
}
 
pdata = devm_kzalloc(dev, sizeof(*pdata), GFP_KERNEL);
-   if (!pdata) {
-   of_node_put(np);
+   if (!pdata)
return ERR_PTR(-ENOMEM);
-   }
 
of_property_read_string(np, "name", >name);
pdata->ignore_pwm = of_property_read_bool(np, "skyworks,ignore-pwm");
@@ -217,7 +215,6 @@ static struct sky81452_bl_platform_data 
*sky81452_bl_parse_dt(
num_entry);
if (ret < 0) {
dev_err(dev, "led-sources node is invalid.\n");
-   of_node_put(np);
return ERR_PTR(-EINVAL);
}
 
@@ -237,7 +234,6 @@ static struct sky81452_bl_platform_data 
*sky81452_bl_parse_dt(
if (ret < 0)
pdata->boost_current_limit = 2750;
 
-   of_node_put(np);
return pdata;
 }
 #else
-- 
2.45.0

Re: [PATCH v3 4/9] drm/mipi-dsi: Reduce driver bloat of mipi_dsi_*_write_seq()

2024-05-02 Thread Doug Anderson

Hi,

On Thu, May 2, 2024 at 1:23 AM Linus Walleij  wrote:
>
> On Wed, May 1, 2024 at 5:43 PM Douglas Anderson  wrote:
>
> > Through a cooperative effort between Hsin-Yi Wang and Dmitry
> > Baryshkov, we have realized the dev_err() in the
> > mipi_dsi_*_write_seq() macros was causing quite a bit of bloat to the
> > kernel. Let's hoist this call into drm_mipi_dsi.c by adding a "chatty"
> > version of the functions that includes the print. While doing this,
> > add a bit more comments to these macros making it clear that they
> > print errors and also that they return out of _the caller's_ function.
>
> This doesn't really explain why this takes so much less space.
>
> Please add some explanation like that "the macro gets inlined
> and thus the error report string gets inlined into every call to
> mipi_dsi_dcs_write_seq() and mipi_dsi_generic_write_seq(),
> by using a call to a "chatty" function, the usage is reduced to one
> string in the chatty function and a function call at the invoking
> site".
>
> With some explanation like that +/- added in:
> Reviewed-by: Linus Walleij 

Sure. I'll plan on not sending out a v4 (unless I need to for some
other reason) and I can just add your wording in when applying. Yell
if you want me to do something different.

-Doug

Re: [PATCH v2 0/3] drm/mediatek: Add support for OF graphs

2024-05-02 Thread Alexandre Mergnat





On 30/04/2024 13:33, AngeloGioacchino Del Regno wrote:

Il 30/04/24 12:17, Alexandre Mergnat ha scritto:

Hi Angelo,

On 09/04/2024 14:02, AngeloGioacchino Del Regno wrote:

This series was tested on MT8195 Cherry Tomato and on MT8395 Radxa
NIO-12L with both hardcoded paths, OF graph support and partially
hardcoded paths (meaning main display through OF graph and external
display hardcoded, because of OVL_ADAPTOR).


Is that make sense for you to add the DTS changes of these boards into this 
serie ?
I asked because, IMHO, that could help to understand the serie.



Yes and no... but I imagine that you're asking this because you're trying to
prepare something with a different SoC+board(s) combination :-)

In that case, I'm preventively sorry because what follows here is not 100%
perfectly tidy yet as I didn't mean to send the devicetree commits upstream
before this series got picked

... but there you go - I'm sure that you won't mind and that the example will
be more than good enough for you.

Please note that one of the reasons why I didn't want to add this to the series
is that the following changes show only a mere 50% of the reasons why we want OF
graph support on mediatek-drm (but mainly, it's because I didn't have time to
actually rebase etc :-P )


Thanks for the explanations and examples.
Unfortunately, I have 2 display but only one is working (the main: DSI0) when I 
use the dts method.
I've probably missed something but I don't know what.

In my "mmsys" node, if I swap display (the ext endpoint with the main endpoint), the DPI0 is 
working, but not the DSI0. I conclude my both paths are good.


Then, I've put some trace into "mtk_drm_of_ddp_path_build" to check if it parse the two endpoint of 
the node. Both are parsed, but "of_ep.port" is always = 0. According to "of_graph_parse_endpoint" 
function, "port" is the value of the parent "reg", whereas "id" is the value of the endpoint "reg".

So I replaced "of_ep.port" by "of_ep.id". Now I've of_ep.id = 0 for main and 
of_ep.id = 1 for EXT.

Now I've the good CRTC path, I get this error:
  mediatek-drm mediatek-drm.1.auto: Invalid display hw pipeline. Last 
component: 54 (ret=-2)
  mediatek-drm mediatek-drm.1.auto: probe with driver mediatek-drm failed with 
error -22

After quick look, the "cpath" into "mtk_drm_of_ddp_path_build_one" (or deeper functions) seems not 
be used as it should, due to the previous "of_ep.port" => "of_ep.id" change of course.


But I probably have to fix "of_ep.port" because I've mis-coded something. Just in case, I share you 
my diff:


diff --git a/arch/arm64/boot/dts/mediatek/mt8365-evk.dts 
b/arch/arm64/boot/dts/mediatek/mt8365-evk.dts
index 1aa3426f561b..f660481d3fe8 100644
--- a/arch/arm64/boot/dts/mediatek/mt8365-evk.dts
+++ b/arch/arm64/boot/dts/mediatek/mt8365-evk.dts
@@ -109,15 +109,51 @@ vsys_lcm_reg: regulator-vsys-lcm {
};
 };

+ {
+   proc-supply = <_vproc_reg>;
+   sram-supply = <_vsram_proc_reg>;
+};
+
+ {
+   proc-supply = <_vproc_reg>;
+   sram-supply = <_vsram_proc_reg>;
+};
+
+ {
+   proc-supply = <_vproc_reg>;
+   sram-supply = <_vsram_proc_reg>;
+};
+
+ {
+   proc-supply = <_vproc_reg>;
+   sram-supply = <_vsram_proc_reg>;
+};
+
+_out {
+   remote-endpoint = <_in>;
+};
+
  {
pinctrl-0 = <_default_pins>;
pinctrl-1 = <_idle_pins>;
pinctrl-names = "default", "sleep";
status = "okay";
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;

-   port {
-   dpi_out: endpoint {
-   remote-endpoint = <_in>;
+   port@0 {
+   reg = <0>;
+   dpi0_in: endpoint {
+   remote-endpoint = <_out>;
+   };
+   };
+
+   port@1 {
+   reg = <1>;
+   dpi0_out: endpoint {
+   remote-endpoint = <_in>;
+   };
};
};
 };
@@ -137,36 +173,28 @@ panel@0 {

port {
panel_in: endpoint {
-   remote-endpoint = <_out>;
+   remote-endpoint = <_out>;
};
};
};
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;

-   port {
-   dsi_out: endpoint {
-   remote-endpoint = <_in>;
+   port@0 {
+   reg = <0>;
+   dsi0_in: endpoint {
+   remote-endpoint = <_out>;
+   };
};
-   };
-};

- {
-   proc-supply = <_vproc_reg>;
-   sram-supply = <_vsram_proc_reg>;
-};
-
- {
-   proc-supply = <_vproc_reg>;
-   sram-supply = <_vsram_proc_reg>;
-};
-
- {
-   proc-supply = <_vproc_reg>;
-   sram-supply = <_vsram_proc_reg>;

Re: [PATCH v3 3/5] drm/panthor: Relax the constraints on the tiler chunk size

On Thu, 2 May 2024 16:47:56 +0100
Steven Price  wrote:

> On 02/05/2024 16:40, Boris Brezillon wrote:
> > The field used to store the chunk size if 12 bits wide, and the encoding
> > is chunk_size = chunk_header.chunk_size << 12, which gives us a
> > theoretical [4k:8M] range. This range is further limited by
> > implementation constraints, and all known implementations seem to
> > impose a [128k:8M] range, so do the same here.
> > 
> > We also relax the power-of-two constraint, which doesn't seem to
> > exist on v10. This will allow userspace to fine-tune initial/max
> > tiler memory on memory-constrained devices.
> > 
> > v3:
> > - Add R-bs
> > - Fix valid range in the kerneldoc  
> 
> Sadly the fixed range didn't make it to this posting... ;)

My bad, I was checking the uAPI header and thought I had already fixed
it the other day. Should be good in v4.

> 
> Steve
> 
> > 
> > v2:
> > - Turn the power-of-two constraint into a page-aligned constraint to allow
> >   fine-tune of the initial/max heap memory size
> > - Fix the panthor_heap_create() kerneldoc
> > 
> > Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block")
> > Signed-off-by: Boris Brezillon 
> > Reviewed-by: Liviu Dudau 
> > Reviewed-by: Steven Price 
> > ---
> >  drivers/gpu/drm/panthor/panthor_heap.c | 8 
> >  include/uapi/drm/panthor_drm.h | 6 +-
> >  2 files changed, 9 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
> > b/drivers/gpu/drm/panthor/panthor_heap.c
> > index 3be86ec383d6..683bb94761bc 100644
> > --- a/drivers/gpu/drm/panthor/panthor_heap.c
> > +++ b/drivers/gpu/drm/panthor/panthor_heap.c
> > @@ -253,8 +253,8 @@ int panthor_heap_destroy(struct panthor_heap_pool 
> > *pool, u32 handle)
> >   * @pool: Pool to instantiate the heap context from.
> >   * @initial_chunk_count: Number of chunk allocated at initialization time.
> >   * Must be at least 1.
> > - * @chunk_size: The size of each chunk. Must be a power of two between 256k
> > - * and 2M.
> > + * @chunk_size: The size of each chunk. Must be page-aligned and lie in the
> > + * [128k:2M] range.
> >   * @max_chunks: Maximum number of chunks that can be allocated.
> >   * @target_in_flight: Maximum number of in-flight render passes.
> >   * @heap_ctx_gpu_va: Pointer holding the GPU address of the allocated heap
> > @@ -284,8 +284,8 @@ int panthor_heap_create(struct panthor_heap_pool *pool,
> > if (initial_chunk_count > max_chunks)
> > return -EINVAL;
> >  
> > -   if (hweight32(chunk_size) != 1 ||
> > -   chunk_size < SZ_256K || chunk_size > SZ_2M)
> > +   if (!IS_ALIGNED(chunk_size, PAGE_SIZE) ||
> > +   chunk_size < SZ_128K || chunk_size > SZ_8M)
> > return -EINVAL;
> >  
> > down_read(>lock);
> > diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> > index 5db80a0682d5..b8220d2e698f 100644
> > --- a/include/uapi/drm/panthor_drm.h
> > +++ b/include/uapi/drm/panthor_drm.h
> > @@ -898,7 +898,11 @@ struct drm_panthor_tiler_heap_create {
> > /** @initial_chunk_count: Initial number of chunks to allocate. Must be 
> > at least one. */
> > __u32 initial_chunk_count;
> >  
> > -   /** @chunk_size: Chunk size. Must be a power of two at least 256KB 
> > large. */
> > +   /**
> > +* @chunk_size: Chunk size.
> > +*
> > +* Must be page-aligned and lie in the [128k:8M] range.
> > +*/
> > __u32 chunk_size;
> >  
> > /**  
>

[PATCH v4 5/5] drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints

Make sure the user is aware that drm_panthor_tiler_heap_destroy::handle
must be a handle previously returned by
DRM_IOCTL_PANTHOR_TILER_HEAP_CREATE.

v4:
- Add Steve's R-b

v3:
- New patch

Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
---
 include/uapi/drm/panthor_drm.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index b8220d2e698f..aaed8e12ad0b 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -939,7 +939,11 @@ struct drm_panthor_tiler_heap_create {
  * struct drm_panthor_tiler_heap_destroy - Arguments passed to 
DRM_IOCTL_PANTHOR_TILER_HEAP_DESTROY
  */
 struct drm_panthor_tiler_heap_destroy {
-   /** @handle: Handle of the tiler heap to destroy */
+   /**
+* @handle: Handle of the tiler heap to destroy.
+*
+* Must be a valid heap handle returned by 
DRM_IOCTL_PANTHOR_TILER_HEAP_CREATE.
+*/
__u32 handle;
 
/** @pad: Padding field, MBZ. */
-- 
2.44.0

[PATCH v4 4/5] drm/panthor: Fix an off-by-one in the heap context retrieval logic

The heap ID is used to index the heap context pool, and allocating
in the [1:MAX_HEAPS_PER_POOL] leads to an off-by-one. This was
originally to avoid returning a zero heap handle, but given the handle
is formed with (vm_id << 16) | heap_id, with vm_id > 0, we already can't
end up with a valid heap handle that's zero.

v4:
- s/XA_FLAGS_ALLOC1/XA_FLAGS_ALLOC/

v3:
- Allocate in the [0:MAX_HEAPS_PER_POOL-1] range

v2:
- New patch

Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block")
Reported-by: Eric Smith 
Signed-off-by: Boris Brezillon 
Tested-by: Eric Smith 
---
 drivers/gpu/drm/panthor/panthor_heap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
b/drivers/gpu/drm/panthor/panthor_heap.c
index b0fc5b9ee847..95a1c6c9f35e 100644
--- a/drivers/gpu/drm/panthor/panthor_heap.c
+++ b/drivers/gpu/drm/panthor/panthor_heap.c
@@ -323,7 +323,8 @@ int panthor_heap_create(struct panthor_heap_pool *pool,
if (!pool->vm) {
ret = -EINVAL;
} else {
-   ret = xa_alloc(>xa, , heap, XA_LIMIT(1, 
MAX_HEAPS_PER_POOL), GFP_KERNEL);
+   ret = xa_alloc(>xa, , heap,
+  XA_LIMIT(0, MAX_HEAPS_PER_POOL - 1), GFP_KERNEL);
if (!ret) {
void *gpu_ctx = panthor_get_heap_ctx(pool, id);
 
@@ -543,7 +544,7 @@ panthor_heap_pool_create(struct panthor_device *ptdev, 
struct panthor_vm *vm)
pool->vm = vm;
pool->ptdev = ptdev;
init_rwsem(>lock);
-   xa_init_flags(>xa, XA_FLAGS_ALLOC1);
+   xa_init_flags(>xa, XA_FLAGS_ALLOC);
kref_init(>refcount);
 
pool->gpu_contexts = panthor_kernel_bo_create(ptdev, vm, bosize,
-- 
2.44.0

[PATCH v4 1/5] drm/panthor: Fix tiler OOM handling to allow incremental rendering

From: Antonino Maniscalco 

If the kernel couldn't allocate memory because we reached the maximum
number of chunks but no render passes are in flight
(panthor_heap_grow() returning -ENOMEM), we should defer the OOM
handling to the FW by returning a NULL chunk. The FW will then call
the tiler OOM exception handler, which is supposed to implement
incremental rendering (execute an intermediate fragment job to flush
the pending primitives, release the tiler memory that was used to
store those primitives, and start over from where it stopped).

Instead of checking for both ENOMEM and EBUSY, make panthor_heap_grow()
return ENOMEM no matter the reason of this allocation failure, the FW
doesn't care anyway.

v3:
- Add R-bs

v2:
- Make panthor_heap_grow() return -ENOMEM for all kind of allocation
  failures
- Document the panthor_heap_grow() semantics

Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
Signed-off-by: Antonino Maniscalco 
Signed-off-by: Boris Brezillon 
Reviewed-by: Liviu Dudau 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panthor/panthor_heap.c  | 12 
 drivers/gpu/drm/panthor/panthor_sched.c |  7 ++-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
b/drivers/gpu/drm/panthor/panthor_heap.c
index 143fa35f2e74..c3c0ba744937 100644
--- a/drivers/gpu/drm/panthor/panthor_heap.c
+++ b/drivers/gpu/drm/panthor/panthor_heap.c
@@ -410,6 +410,13 @@ int panthor_heap_return_chunk(struct panthor_heap_pool 
*pool,
  * @renderpasses_in_flight: Number of render passes currently in-flight.
  * @pending_frag_count: Number of fragment jobs waiting for 
execution/completion.
  * @new_chunk_gpu_va: Pointer used to return the chunk VA.
+ *
+ * Return:
+ * - 0 if a new heap was allocated
+ * - -ENOMEM if the tiler context reached the maximum number of chunks
+ *   or if too many render passes are in-flight
+ *   or if the allocation failed
+ * - -EINVAL if any of the arguments passed to panthor_heap_grow() is invalid
  */
 int panthor_heap_grow(struct panthor_heap_pool *pool,
  u64 heap_gpu_va,
@@ -439,10 +446,7 @@ int panthor_heap_grow(struct panthor_heap_pool *pool,
 * handler provided by the userspace driver, if any).
 */
if (renderpasses_in_flight > heap->target_in_flight ||
-   (pending_frag_count > 0 && heap->chunk_count >= heap->max_chunks)) {
-   ret = -EBUSY;
-   goto out_unlock;
-   } else if (heap->chunk_count >= heap->max_chunks) {
+   heap->chunk_count >= heap->max_chunks) {
ret = -ENOMEM;
goto out_unlock;
}
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c 
b/drivers/gpu/drm/panthor/panthor_sched.c
index 7f16a4a14e9a..c126251c5ba7 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -1385,7 +1385,12 @@ static int group_process_tiler_oom(struct panthor_group 
*group, u32 cs_id)
pending_frag_count, _chunk_va);
}
 
-   if (ret && ret != -EBUSY) {
+   /* If the heap context doesn't have memory for us, we want to let the
+* FW try to reclaim memory by waiting for fragment jobs to land or by
+* executing the tiler OOM exception handler, which is supposed to
+* implement incremental rendering.
+*/
+   if (ret && ret != -ENOMEM) {
drm_warn(>base, "Failed to extend the tiler heap\n");
group->fatal_queues |= BIT(cs_id);
sched_queue_delayed_work(sched, tick, 0);
-- 
2.44.0

[PATCH v4 3/5] drm/panthor: Relax the constraints on the tiler chunk size

The field used to store the chunk size if 12 bits wide, and the encoding
is chunk_size = chunk_header.chunk_size << 12, which gives us a
theoretical [4k:8M] range. This range is further limited by
implementation constraints, and all known implementations seem to
impose a [128k:8M] range, so do the same here.

We also relax the power-of-two constraint, which doesn't seem to
exist on v10. This will allow userspace to fine-tune initial/max
tiler memory on memory-constrained devices.

v4:
- Actually fix the range in the kerneldoc

v3:
- Add R-bs
- Fix valid range in the kerneldoc

v2:
- Turn the power-of-two constraint into a page-aligned constraint to allow
  fine-tune of the initial/max heap memory size
- Fix the panthor_heap_create() kerneldoc

Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block")
Signed-off-by: Boris Brezillon 
Reviewed-by: Liviu Dudau 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panthor/panthor_heap.c | 8 
 include/uapi/drm/panthor_drm.h | 6 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
b/drivers/gpu/drm/panthor/panthor_heap.c
index 3be86ec383d6..b0fc5b9ee847 100644
--- a/drivers/gpu/drm/panthor/panthor_heap.c
+++ b/drivers/gpu/drm/panthor/panthor_heap.c
@@ -253,8 +253,8 @@ int panthor_heap_destroy(struct panthor_heap_pool *pool, 
u32 handle)
  * @pool: Pool to instantiate the heap context from.
  * @initial_chunk_count: Number of chunk allocated at initialization time.
  * Must be at least 1.
- * @chunk_size: The size of each chunk. Must be a power of two between 256k
- * and 2M.
+ * @chunk_size: The size of each chunk. Must be page-aligned and lie in the
+ * [128k:8M] range.
  * @max_chunks: Maximum number of chunks that can be allocated.
  * @target_in_flight: Maximum number of in-flight render passes.
  * @heap_ctx_gpu_va: Pointer holding the GPU address of the allocated heap
@@ -284,8 +284,8 @@ int panthor_heap_create(struct panthor_heap_pool *pool,
if (initial_chunk_count > max_chunks)
return -EINVAL;
 
-   if (hweight32(chunk_size) != 1 ||
-   chunk_size < SZ_256K || chunk_size > SZ_2M)
+   if (!IS_ALIGNED(chunk_size, PAGE_SIZE) ||
+   chunk_size < SZ_128K || chunk_size > SZ_8M)
return -EINVAL;
 
down_read(>lock);
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index 5db80a0682d5..b8220d2e698f 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -898,7 +898,11 @@ struct drm_panthor_tiler_heap_create {
/** @initial_chunk_count: Initial number of chunks to allocate. Must be 
at least one. */
__u32 initial_chunk_count;
 
-   /** @chunk_size: Chunk size. Must be a power of two at least 256KB 
large. */
+   /**
+* @chunk_size: Chunk size.
+*
+* Must be page-aligned and lie in the [128k:8M] range.
+*/
__u32 chunk_size;
 
/**
-- 
2.44.0

[PATCH v4 2/5] drm/panthor: Make sure the tiler initial/max chunks are consistent

It doesn't make sense to have a maximum number of chunks smaller than
the initial number of chunks attached to the context.

Fix the uAPI header to reflect the new constraint, and mention the
undocumented "initial_chunk_count > 0" constraint while at it.

v3:
- Add R-b

v2:
- Fix the check

Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block")
Signed-off-by: Boris Brezillon 
Reviewed-by: Liviu Dudau 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panthor/panthor_heap.c | 3 +++
 include/uapi/drm/panthor_drm.h | 8 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
b/drivers/gpu/drm/panthor/panthor_heap.c
index c3c0ba744937..3be86ec383d6 100644
--- a/drivers/gpu/drm/panthor/panthor_heap.c
+++ b/drivers/gpu/drm/panthor/panthor_heap.c
@@ -281,6 +281,9 @@ int panthor_heap_create(struct panthor_heap_pool *pool,
if (initial_chunk_count == 0)
return -EINVAL;
 
+   if (initial_chunk_count > max_chunks)
+   return -EINVAL;
+
if (hweight32(chunk_size) != 1 ||
chunk_size < SZ_256K || chunk_size > SZ_2M)
return -EINVAL;
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index dadb05ab1235..5db80a0682d5 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -895,13 +895,17 @@ struct drm_panthor_tiler_heap_create {
/** @vm_id: VM ID the tiler heap should be mapped to */
__u32 vm_id;
 
-   /** @initial_chunk_count: Initial number of chunks to allocate. */
+   /** @initial_chunk_count: Initial number of chunks to allocate. Must be 
at least one. */
__u32 initial_chunk_count;
 
/** @chunk_size: Chunk size. Must be a power of two at least 256KB 
large. */
__u32 chunk_size;
 
-   /** @max_chunks: Maximum number of chunks that can be allocated. */
+   /**
+* @max_chunks: Maximum number of chunks that can be allocated.
+*
+* Must be at least @initial_chunk_count.
+*/
__u32 max_chunks;
 
/**
-- 
2.44.0

[PATCH v4 0/5] drm/panthor: Collection of tiler heap related fixes

This is a collection of tiler heap fixes for bugs/oddities found while
looking at incremental rendering.

Ideally, we want to land those before 6.10 is released, so we don't need
to increment the driver version to reflect the ABI changes.

Changelog detailed in each commit.

Regards,

Boris

Antonino Maniscalco (1):
  drm/panthor: Fix tiler OOM handling to allow incremental rendering

Boris Brezillon (4):
  drm/panthor: Make sure the tiler initial/max chunks are consistent
  drm/panthor: Relax the constraints on the tiler chunk size
  drm/panthor: Fix an off-by-one in the heap context retrieval logic
  drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity
constraints

 drivers/gpu/drm/panthor/panthor_heap.c  | 28 -
 drivers/gpu/drm/panthor/panthor_sched.c |  7 ++-
 include/uapi/drm/panthor_drm.h  | 20 ++
 3 files changed, 40 insertions(+), 15 deletions(-)

-- 
2.44.0

Re: [PATCH v3 4/5] drm/panthor: Fix an off-by-one in the heap context retrieval logic

On Thu, 2 May 2024 16:52:24 +0100
Steven Price  wrote:

> On 02/05/2024 16:40, Boris Brezillon wrote:
> > The heap ID is used to index the heap context pool, and allocating
> > in the [1:MAX_HEAPS_PER_POOL] leads to an off-by-one. This was
> > originally to avoid returning a zero heap handle, but given the handle
> > is formed with (vm_id << 16) | heap_id, with vm_id > 0, we already can't
> > end up with a valid heap handle that's zero.
> > 
> > v3:
> > - Allocate in the [0:MAX_HEAPS_PER_POOL-1] range
> > 
> > v2:
> > - New patch
> > 
> > Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block")
> > Reported-by: Eric Smith 
> > Signed-off-by: Boris Brezillon 
> > Tested-by: Eric Smith   
> 
> Don't we also need to change the xa_init_flags() in
> panthor_heap_pool_create()?

Uh, we should, indeed.

> 
> Steve
> 
> > ---
> >  drivers/gpu/drm/panthor/panthor_heap.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
> > b/drivers/gpu/drm/panthor/panthor_heap.c
> > index 683bb94761bc..252332f5390f 100644
> > --- a/drivers/gpu/drm/panthor/panthor_heap.c
> > +++ b/drivers/gpu/drm/panthor/panthor_heap.c
> > @@ -323,7 +323,8 @@ int panthor_heap_create(struct panthor_heap_pool *pool,
> > if (!pool->vm) {
> > ret = -EINVAL;
> > } else {
> > -   ret = xa_alloc(>xa, , heap, XA_LIMIT(1, 
> > MAX_HEAPS_PER_POOL), GFP_KERNEL);
> > +   ret = xa_alloc(>xa, , heap,
> > +  XA_LIMIT(0, MAX_HEAPS_PER_POOL - 1), GFP_KERNEL);
> > if (!ret) {
> > void *gpu_ctx = panthor_get_heap_ctx(pool, id);
> >
>

Re: (subset) [PATCH v1 1/1] backlight: mp3309c: fix leds flickering in pwm mode

On Thu, 02 May 2024, Lee Jones wrote:

> On Wed, 17 Apr 2024 17:31:05 +0200, Flavio Suligoi wrote:
> > The mp3309 has two configuration registers, named according to their
> > address (0x00 and 0x01).
> > In the second register (0x01), the bit DIMS (Dimming Mode Select) must
> > be always 0 (zero), in both analog (via i2c commands) and pwm dimming
> > mode.
> > 
> > In the initial driver version, the DIMS bit was set in pwm mode and
> > reset in analog mode.
> > But if the DIMS bit is set in pwm dimming mode and other devices are
> > connected on the same i2c bus, every i2c commands on the bus generates a
> > flickering on the LEDs powered by the mp3309c.
> > 
> > [...]
> 
> Applied, thanks!
> 
> [1/1] backlight: mp3309c: fix leds flickering in pwm mode
>   commit: ce60cddc2abf61902dfca71d630624db95315124

Applied, but in future it's; I2C, PWM and LED, thanks.

-- 
Lee Jones [李琼斯]

Re: (subset) [PATCH v1 1/1] backlight: mp3309c: fix leds flickering in pwm mode

On Wed, 17 Apr 2024 17:31:05 +0200, Flavio Suligoi wrote:
> The mp3309 has two configuration registers, named according to their
> address (0x00 and 0x01).
> In the second register (0x01), the bit DIMS (Dimming Mode Select) must
> be always 0 (zero), in both analog (via i2c commands) and pwm dimming
> mode.
> 
> In the initial driver version, the DIMS bit was set in pwm mode and
> reset in analog mode.
> But if the DIMS bit is set in pwm dimming mode and other devices are
> connected on the same i2c bus, every i2c commands on the bus generates a
> flickering on the LEDs powered by the mp3309c.
> 
> [...]

Applied, thanks!

[1/1] backlight: mp3309c: fix leds flickering in pwm mode
  commit: ce60cddc2abf61902dfca71d630624db95315124

--
Lee Jones [李琼斯]

Re: [PATCH v2] drm/panthor: Make sure we handle 'unknown group state' case properly

On Thu,  2 May 2024 17:52:48 +0200
Boris Brezillon  wrote:

> When we check for state values returned by the FW, we only cover part of
> the 0:7 range. Make sure we catch FW inconsistencies by adding a default
> to the switch statement, and flagging the group state as unknown in that
> case.
> 
> When an unknown state is detected, we trigger a reset, and consider the
> group as unusable after that point, to prevent the potential corruption
> from creeping in other places if we continue executing stuff on this
> context.
> 
> v2:
> - Add Steve's R-b
> - Fix commit message
> 
> Reported-by: Dan Carpenter 
> Suggested-by: Steven Price 
> Signed-off-by: Boris Brezillon 
> Reviewed-by: Steven Price 
> Reviewed-by: Liviu Dudau 

Queued to drm-misc-next-fixes.

> ---
>  drivers/gpu/drm/panthor/panthor_sched.c | 37 +++--
>  1 file changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c 
> b/drivers/gpu/drm/panthor/panthor_sched.c
> index b3a51a6de523..1a14ee30f774 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -490,6 +490,18 @@ enum panthor_group_state {
>* Can no longer be scheduled. The only allowed action is a destruction.
>*/
>   PANTHOR_CS_GROUP_TERMINATED,
> +
> + /**
> +  * @PANTHOR_CS_GROUP_UNKNOWN_STATE: Group is an unknown state.
> +  *
> +  * The FW returned an inconsistent state. The group is flagged unusable
> +  * and can no longer be scheduled. The only allowed action is a
> +  * destruction.
> +  *
> +  * When that happens, we also schedule a FW reset, to start from a fresh
> +  * state.
> +  */
> + PANTHOR_CS_GROUP_UNKNOWN_STATE,
>  };
>  
>  /**
> @@ -1127,6 +1139,7 @@ csg_slot_sync_state_locked(struct panthor_device 
> *ptdev, u32 csg_id)
>   struct panthor_fw_csg_iface *csg_iface;
>   struct panthor_group *group;
>   enum panthor_group_state new_state, old_state;
> + u32 csg_state;
>  
>   lockdep_assert_held(>scheduler->lock);
>  
> @@ -1137,7 +1150,8 @@ csg_slot_sync_state_locked(struct panthor_device 
> *ptdev, u32 csg_id)
>   return;
>  
>   old_state = group->state;
> - switch (csg_iface->output->ack & CSG_STATE_MASK) {
> + csg_state = csg_iface->output->ack & CSG_STATE_MASK;
> + switch (csg_state) {
>   case CSG_STATE_START:
>   case CSG_STATE_RESUME:
>   new_state = PANTHOR_CS_GROUP_ACTIVE;
> @@ -1148,11 +1162,28 @@ csg_slot_sync_state_locked(struct panthor_device 
> *ptdev, u32 csg_id)
>   case CSG_STATE_SUSPEND:
>   new_state = PANTHOR_CS_GROUP_SUSPENDED;
>   break;
> + default:
> + /* The unknown state might be caused by a FW state corruption,
> +  * which means the group metadata can't be trusted anymore, and
> +  * the SUSPEND operation might propagate the corruption to the
> +  * suspend buffers. Flag the group state as unknown to make
> +  * sure it's unusable after that point.
> +  */
> + drm_err(>base, "Invalid state on CSG %d (state=%d)",
> + csg_id, csg_state);
> + new_state = PANTHOR_CS_GROUP_UNKNOWN_STATE;
> + break;
>   }
>  
>   if (old_state == new_state)
>   return;
>  
> + /* The unknown state might be caused by a FW issue, reset the FW to
> +  * take a fresh start.
> +  */
> + if (new_state == PANTHOR_CS_GROUP_UNKNOWN_STATE)
> + panthor_device_schedule_reset(ptdev);
> +
>   if (new_state == PANTHOR_CS_GROUP_SUSPENDED)
>   csg_slot_sync_queues_state_locked(ptdev, csg_id);
>  
> @@ -1783,6 +1814,7 @@ static bool
>  group_can_run(struct panthor_group *group)
>  {
>   return group->state != PANTHOR_CS_GROUP_TERMINATED &&
> +group->state != PANTHOR_CS_GROUP_UNKNOWN_STATE &&
>  !group->destroyed && group->fatal_queues == 0 &&
>  !group->timedout;
>  }
> @@ -2557,7 +2589,8 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
>  
>   if (csg_slot->group) {
>   csgs_upd_ctx_queue_reqs(ptdev, _ctx, i,
> - CSG_STATE_SUSPEND,
> + group_can_run(csg_slot->group) ?
> + CSG_STATE_SUSPEND : 
> CSG_STATE_TERMINATE,
>   CSG_STATE_MASK);
>   }
>   }

Re: [PATCH v3 7/9] drm/panel: boe-tv101wum-nl6: Don't use a table for initting panels

2024-05-02 Thread Doug Anderson

Hi,

On Thu, May 2, 2024 at 6:42 AM Linus Walleij  wrote:
>
> On Wed, May 1, 2024 at 5:43 PM Douglas Anderson  wrote:
>
> > Consensus on the mailing lists is that panels shouldn't use a table of
> > init commands but should instead use init functions. With the recently
> > introduced mipi_dsi_dcs_write_seq_multi() this is not only clean/easy
> > but also saves space. Measuring before/after this change:
> >
> > $ scripts/bloat-o-meter \
> >   .../before/panel-boe-tv101wum-nl6.ko \
> >   .../after/panel-boe-tv101wum-nl6.ko
> > add/remove: 14/8 grow/shrink: 0/1 up/down: 27062/-31433 (-4371)
> > Function old new   delta
> > inx_hj110iz_init   -7040   +7040
> > boe_tv110c9m_init  -6440   +6440
> > boe_init   -5916   +5916
> > starry_qfh032011_53g_init  -1944   +1944
> > starry_himax83102_j02_init -1228   +1228
> > inx_hj110iz_init.d -1040   +1040
> > boe_tv110c9m_init.d- 982+982
> > auo_b101uan08_3_init   - 944+944
> > boe_init.d - 580+580
> > starry_himax83102_j02_init.d   - 512+512
> > starry_qfh032011_53g_init.d- 180+180
> > auo_kd101n80_45na_init - 172+172
> > auo_b101uan08_3_init.d -  82 +82
> > auo_kd101n80_45na_init.d   -   2  +2
> > auo_kd101n80_45na_init_cmd   144   --144
> > boe_panel_prepare592 440-152
> > auo_b101uan08_3_init_cmd1056   -   -1056
> > starry_himax83102_j02_init_cmd  1392   -   -1392
> > starry_qfh032011_53g_init_cmd   2256   -   -2256
> > .compoundliteral3393   -   -3393
> > boe_init_cmd7008   -   -7008
> > boe_tv110c9m_init_cmd   7656   -   -7656
> > inx_hj110iz_init_cmd8376   -   -8376
> > Total: Before=37297, After=32926, chg -11.72%
> >
> > Let's do the conversion.
> >
> > Since we're touching all the tables, let's also convert hex numbers to
> > lower case as per kernel conventions.
> >
> > Signed-off-by: Douglas Anderson 
>
> Wow that's a *VERY* nice patch.
> Reviewed-by: Linus Walleij 

Thanks!


> The metrics surprisingly reports more compact object code,
> I wasn't expecting this, but it's nice.

I think it has to do with the design of the table structure in this
driver. Each "struct panel_init_cmd" was 24-bytes big. That means that
to represent one 1-byte sequence we needed 24 bytes + 1 bytes cmd + 1
byte seq = 26 bytes. Lots of overhead for 2 bytes of content. The old
table stuff could certainly have been made _a lot_ less overhead, but
since it wasn't then it also wasn't hard to be better than it with it
via the new style.

FWIW, it also wouldn't be terribly hard to save a tiny bit more space
with the new style if we wanted. It gets a little ugly, but it doesn't
affect callers of the macro. Specifically, if you assume people aren't
going to pass more than 256-byte sequences, you could stuff the length
in the data:

 #define mipi_dsi_dcs_write_seq_multi(ctx, cmd, seq...)  \
-   do {\
-   static const u8 d[] = { cmd, seq }; \
-   mipi_dsi_dcs_write_buffer_multi(ctx, d, ARRAY_SIZE(d)); \
+   do { \
+   static const u8 d[] = { \
+   (sizeof((u8[]){seq})/sizeof(u8)) + 1, cmd, seq }; \
+   mipi_dsi_dcs_write_buffer_multi(ctx, d); \
} while (0)


...and then snag the length out of the first byte of the data in
mipi_dsi_dcs_write_buffer_multi(). If you do this, you actually get
another 10% space savings on this driver. :-P

add/remove: 0/0 grow/shrink: 7/7 up/down: 1140/-4560 (-3420)
Function old new   delta
inx_hj110iz_init.d  10401385+345
boe_tv110c9m_init.d  9821297+315
boe_init.d   580 870+290
starry_qfh032011_53g_init.d  180 271 +91
starry_himax83102_j02_init.d 512 568 +56
auo_b101uan08_3_init.d82 123 +41
auo_kd101n80_45na_init.d   2   4  +2
auo_kd101n80_45na_init   172 164  -8
auo_b101uan08_3_init 944 780-164
starry_himax83102_j02_init  12281004-224
starry_qfh032011_53g_init   19441580-364
boe_init5916

[PULL] drm-xe-fixes

2024-05-02 Thread Lucas De Marchi


Hi Dave and Sima,

Please pull the drm-xe-fixes for this week targeting v6.9-rc7.

Two important fixes: one avoiding a use-after-free in the rebind worker
and the other to make display work in ADL-N.

thanks
Lucas De Marchi

drm-xe-fixes-2024-05-02:
- Fix UAF on rebind worker
- Fix ADL-N display integration
The following changes since commit e67572cd2204894179d89bd7b984072f19313b03:

  Linux 6.9-rc6 (2024-04-28 13:47:24 -0700)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/xe/kernel.git tags/drm-xe-fixes-2024-05-02

for you to fetch changes up to df04b152fca2d46e75fbb74ed79299bc420bc9e6:

  drm/xe/display: Fix ADL-N detection (2024-05-02 11:11:01 -0500)


- Fix UAF on rebind worker
- Fix ADL-N display integration


Lucas De Marchi (1):
  drm/xe/display: Fix ADL-N detection

Matthew Auld (1):
  drm/xe/vm: prevent UAF in rebind_work_func()

 drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h | 3 ++-
 drivers/gpu/drm/xe/xe_vm.c| 3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

Re: [v1,1/3] drm/panel: ili9341: Correct use of device property APIs

2024-05-02 Thread Sui Jingfeng


Hi,


On 2024/5/2 16:32, Andy Shevchenko wrote:

On Wed, May 01, 2024 at 12:27:14AM +0800, Sui Jingfeng wrote:

On 2024/4/30 22:13, Andy Shevchenko wrote:

On Fri, Apr 26, 2024 at 05:13:43AM +0800, Sui Jingfeng wrote:

...


the former might be subdivided to "is it swnode backed or real fwnode one?"


Yeah,
On non-DT cases, it can be subdivided to swnode backed case and ACPI fwnode 
backed case.

  - For swnode backed case: the device_get_match_data() don't has a implemented 
backend.
  - For ACPI fwnode backed case: the device_get_match_data() has a implemented 
backend.

But the driver has *neither* software node support

True.


nor ACPI support,

Not true.


Why this is not true? Are you means that the panel-ilitek-ili9341 driver has 
ACPI support?
I'm asking because I don't see struct acpi_device_id related stuff in that 
source file,
am I miss something?



So, slow down and take your time to get into the code and understand how it 
works.


so that the rotation property can not get and ili9341_dpi_probe() will fails.
So in total, this is not a 100% correct use of device property APIs.

But I'm fine that if you want to leave(ignore) those less frequent use cases 
temporarily,
there may have programmers want to post patches, to complete the missing in the 
future.

So, there do have some gains on non-DT cases.

  - As you make it be able to compiled on X86 with the drm-misc-defconfig.
  - You cleanup the code up (at least patch 2 in this series is no obvious 
problem).
  - You allow people to modprobe it, and maybe half right and half undefined.

But you do helps moving something forward, so congratulations for the wake up.


--
Best regards,
Sui

Re: [PATCH v3 0/9] drm/mipi-dsi: Reduce bloat and add funcs for cleaner init seqs

2024-05-02 Thread neil . armstrong

On 02/05/2024 16:27, Doug Anderson wrote:

Hi,

On Thu, May 2, 2024 at 12:48 AM Neil Armstrong
wrote:

Hi,

On 01/05/2024 17:41, Douglas Anderson wrote:

The consensus of many DRM folks is that we want to move away from DSI
drivers defining tables of init commands. Instead, we want to move to
init functions that can use common DRM functions. The issue thus far
has been that using the macros mipi_dsi_generic_write_seq() and
mipi_dsi_dcs_write_seq() bloats the driver using them.

While trying to solve bloat, we realized that the majority of the it
was easy to solve. This series solves the bloat for existing drivers
by moving the printout outside of the macro.

During discussion of my v1 patch to fix the bloat [1], we also decided
that we really want to change the way that drivers deal with init
sequences to make it clearer. In addition to being cleaner, a side
effect of moving drivers to the new style reduces bloat _even more_.

This series also contains a few minor fixes / cleanups that I found
along the way.

This series converts four drivers over to the new
mipi_dsi_dcs_write_seq_multi() function. Not all conversions have been
tested, but hopefully they are straightforward enough. I'd appreciate
testing.

NOTE: In v3 I tried to incorporate the feedback from v2. I also
converted the other two panels I could find that used table-based
initialization.

[1]
https://lore.kernel.org/r/20240424172017.1.Id15fae80582bc74a0d4f1338987fa375738f45b9@changeid

Changes in v3:
- ("mipi_dsi_*_write functions don't need to ratelimit...") moved earlier.
- Add a TODO item for cleaning up the deprecated macros/functions.
- Fix spacing of init function.
- Inline kerneldoc comments for struct mipi_dsi_multi_context.
- Rebased upon patch to remove ratelimit of prints.
- Remove an unneeded error print.
- Squash boe-tv101wum-nl6 lowercase patch into main patch
- Use %zd in print instead of casting errors to int.
- drm/panel: ili9882t: Don't use a table for initting panels
- drm/panel: innolux-p079zca: Don't use a table for initting panels

Changes in v2:
- Add some comments to the macros about printing and returning.
- Change the way err value is handled in prep for next patch.
- Modify commit message now that this is part of a series.
- Rebased upon patches to avoid theoretical int overflow.
- drm/mipi-dsi: Fix theoretical int overflow in mipi_dsi_dcs_write_seq()
- drm/mipi-dsi: Fix theoretical int overflow in mipi_dsi_generic_write_seq()
- drm/mipi-dsi: Introduce mipi_dsi_*_write_seq_multi()
- drm/mipi-dsi: mipi_dsi_*_write functions don't need to ratelimit prints
- drm/panel: boe-tv101wum-nl6: Convert hex to lowercase
- drm/panel: boe-tv101wum-nl6: Don't use a table for initting commands
- drm/panel: novatek-nt36672e: Switch to mipi_dsi_dcs_write_seq_multi()

Douglas Anderson (9):
drm/mipi-dsi: Fix theoretical int overflow in mipi_dsi_dcs_write_seq()
drm/mipi-dsi: Fix theoretical int overflow in
mipi_dsi_generic_write_seq()
drm/mipi-dsi: mipi_dsi_*_write functions don't need to ratelimit
prints
drm/mipi-dsi: Reduce driver bloat of mipi_dsi_*_write_seq()
drm/mipi-dsi: Introduce mipi_dsi_*_write_seq_multi()
drm/panel: novatek-nt36672e: Switch to mipi_dsi_dcs_write_seq_multi()
drm/panel: boe-tv101wum-nl6: Don't use a table for initting panels
drm/panel: ili9882t: Don't use a table for initting panels
drm/panel: innolux-p079zca: Don't use a table for initting panels

Thanks Doug!

I think we all agree on the core changes, now I think we can wait a few weeks
and try to get some test feedbacks on the indirectly and directly affected
panels, drm-misc-next won't be merged into linux-next until v6.10-rc1 anyway
so we have some time to test on our boards.

Great!

Just to be clear, are you suggesting that we leave these patches on
the lists for a few weeks before landing in drm-misc-next, or are you
suggesting that it's safe to land them in drm-misc-next because it
won't make it to linuxnext for a while anyway? I assume the former
(AKA leave it on the lists for a while) but just want to be clear. ;-)

Yes you assume right

There's nothing terribly urgent about these patches except that they
are blocking Cong Yang's patch series [0] and lvzhaoxiong's patch
series [1]. I think it would be fine for them to send out their patch
series with mine marked as a dependency so we can finish reviewing
their series and then when mine lands theirs will be good to go too.

Maybe we can aim for 2 weeks of stewing on the list if there are no
issues during that time? I know landing in drm-misc during this time
won't help get into mainline faster, but with ChromeOS's "upstream
first" policy it saves us a bunch of headache if we can land things in
our tree from a maintainer tree with stable git hashes (like
drm-misc-next) instead of landing from a mailing list. Certainly that
doesn't mean we want to rush patches in before they're ready, but I
just want to say that there is still some benefit in

Re: [PATCH v2] drm: move DRM-related CONFIG options into DRM submenu

2024-05-02 Thread Maxime Ripard

On Fri, 26 Apr 2024 22:56:02 +0900, Masahiro Yamada wrote:
> When you create a submenu using the 'menu' syntax, there is no
> ambiguity about its end because the code between 'menu' and 'endmenu'
> becomes the submenu.
> 
> In contrast, 'menuconfig' does not have the corresponding end marker.
> Instead, the end of the submenu is inferred from symbol dependencies.
> 
> [...]

Applied to misc/kernel.git (drm-misc-next-fixes).

Thanks!
Maxime

Re: [RESEND v7 24/37] mfd: sm501: Convert platform_data to OF property

On Thu, 04 Apr 2024, Yoshinori Sato wrote:

> Various parameters of SM501 can be set using platform_data,
> so parameters cannot be passed in the DeviceTree target.
> Expands the parameters set in platform_data so that they can be
> specified using DeviceTree properties.
> 
> Signed-off-by: Yoshinori Sato 
> ---
>  drivers/mfd/sm501.c   | 315 ++
>  drivers/video/fbdev/sm501fb.c | 106 
>  2 files changed, 421 insertions(+)

I don't know exactly what this is, but I do know that I don't like it.

If you manage to get it through another maintainer, more power to you,
but it is not suitable for MFD.

> diff --git a/drivers/mfd/sm501.c b/drivers/mfd/sm501.c
> index b3592982a83b..98a69e254f5f 100644
> --- a/drivers/mfd/sm501.c
> +++ b/drivers/mfd/sm501.c
> @@ -82,6 +82,16 @@ struct sm501_devdata {
>   unsigned int rev;
>  };
>  
> +struct sm501_config_props_uint {
> + char *name;
> + u32 shift;
> +};
> +
> +struct sm501_config_props_flag {
> + char *clr_name;
> + char *set_name;
> + u32 bit;
> +};
>  
>  #define MHZ (1000 * 1000)
>  
> @@ -1370,6 +1380,305 @@ static int sm501_init_dev(struct sm501_devdata *sm)
>   return 0;
>  }
>  
> +static const struct sm501_config_props_uint misc_timing[] = {
> + {"delay",0},
> + {"-",3},
> + {"divider",  4},
> + {"-",6},
> + {"sm0",  8},
> + {"-",   12},
> + {"sm1", 16},
> + {"-",   20},
> + {"xc",  24},
> + {"-",   26},
> + {"ex",  28},
> + {NULL,  32},
> +};
> +
> +static const struct sm501_config_props_flag misc_timing_flag[] = {
> + {"usb-host-normal",  "usb-host-simulation",3},
> + {"no-acpi-control",  "acpi-control",   6},
> + {"pll-debug-input",  "pll-debug-output",   7},
> + {"sdram-clock-mode0-288mhz", "sdram-clock-mode0-div", 12},
> + {"sdram-clock-mode1-288mhz", "sdram-clock-mode1-div", 20},
> + {"usb-over-current-detect-disable",
> +  "usb-over-current-detect-enable",  23},
> + {},
> +};
> +
> +static const struct sm501_config_props_uint misc_control[] = {
> + {"hold", 18},
> + {"refresh",  21},
> + {"-",23},
> + {"usbclk",   28},
> + {"pad",  30},
> + {NULL,   32},
> +};
> +
> +static const struct sm501_config_props_flag misc_control_flag[] = {
> + {"vr-mmio-30mb","vr-mmio-62mb", 4},
> + {"usb-port-master", "usb-port-slave",   9},
> + {"burst-length-8",  "burst-length-1",  10},
> + {"usb-slave-cpu",   "usb-slave-8051",  11},
> + {"dac-power-enable","dac-power-disable",   12},
> + {"pll-clock-count-disable", "pll-clock-count-enable",  15},
> + {"interrupt-normal","interrupt-invarted",  16},
> + {"sh-ready-low","sh-ready-high",   17},
> + {"xtal-freq-24mhz", "xtal-freq-12mhz", 24},
> + {"panel-data-18bit","panel-dtat-24bit",25},
> + {"latch-address-disable",   "latch-address-enable",26},
> + {"uart1",   "ssp1",27},
> + {},
> +};
> +
> +/* Read configuration values */
> +static void sm501_of_read_config(struct device *dev, struct device_node *np,
> +  const char *prefix,
> +  const struct sm501_config_props_uint *props,
> +  const struct sm501_config_props_flag 
> *props_flag,
> +  struct sm501_reg_init *ret)
> +{
> + struct device_node *child;
> + char *name;
> + u32 shift;
> + u32 width;
> + u32 mask;
> + u32 val;
> +
> + ret->mask = ~0;
> + ret->set = 0;
> +
> + child = of_get_child_by_name(np, prefix);
> + if (!child)
> + return;
> +
> + while (props->name) {
> + name = props->name;
> + shift = props->shift;
> + props++;
> +
> + if (name[0] == '-' ||
> + of_property_read_u32(child, name, ))
> + continue;
> +
> + width = props->shift - shift;
> + mask = (1 << width) - 1;
> + if (mask < val) {
> + dev_warn(dev, "%s invalid value %d", name, val);
> + continue;
> + }
> + mask = ~(mask << shift);
> + ret->mask &= mask;
> + ret->set |= val << shift;
> + }
> + while (props_flag->clr_name) {
> + val = ~0;
> + if (of_property_read_bool(child, props_flag->clr_name))
> + val = 0;
> + else if (of_property_read_bool(child, props_flag->set_name))
> + val = 1;
> + if (val != ~0) {
> + val <<= (props_flag->bit & 31);
> + mask =

Re: [PATCH] drm/panthor: Kill the faulty_slots variable in panthor_sched_suspend()

On Thu, 25 Apr 2024 12:18:29 +0100
Steven Price  wrote:

> On 25/04/2024 11:39, Boris Brezillon wrote:
> > We can use upd_ctx.timedout_mask directly, and the faulty_slots update
> > in the flush_caches_failed situation is never used.
> > 
> > Suggested-by: Suggested-by: Steven Price   
> 
> I'm obviously too full of suggestions! ;)

Pushed to drm-misc-next-fixes, but I realize I forgot to drop the extra
Suggested-by. Oh well.

> 
> And you're doing a much better job of my todo list than I am!
> 
> > Signed-off-by: Boris Brezillon   
> 
> Reviewed-by: Steven Price 
> 
> > ---
> >  drivers/gpu/drm/panthor/panthor_sched.c | 10 +++---
> >  1 file changed, 3 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c 
> > b/drivers/gpu/drm/panthor/panthor_sched.c
> > index fad4678ca4c8..fed28c16d5d1 100644
> > --- a/drivers/gpu/drm/panthor/panthor_sched.c
> > +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> > @@ -2584,8 +2584,8 @@ void panthor_sched_suspend(struct panthor_device 
> > *ptdev)
> >  {
> > struct panthor_scheduler *sched = ptdev->scheduler;
> > struct panthor_csg_slots_upd_ctx upd_ctx;
> > -   u32 suspended_slots, faulty_slots;
> > struct panthor_group *group;
> > +   u32 suspended_slots;
> > u32 i;
> >  
> > mutex_lock(>lock);
> > @@ -2605,10 +2605,9 @@ void panthor_sched_suspend(struct panthor_device 
> > *ptdev)
> >  
> > csgs_upd_ctx_apply_locked(ptdev, _ctx);
> > suspended_slots &= ~upd_ctx.timedout_mask;
> > -   faulty_slots = upd_ctx.timedout_mask;
> >  
> > -   if (faulty_slots) {
> > -   u32 slot_mask = faulty_slots;
> > +   if (upd_ctx.timedout_mask) {
> > +   u32 slot_mask = upd_ctx.timedout_mask;
> >  
> > drm_err(>base, "CSG suspend failed, escalating to 
> > termination");
> > csgs_upd_ctx_init(_ctx);
> > @@ -2659,9 +2658,6 @@ void panthor_sched_suspend(struct panthor_device 
> > *ptdev)
> >  
> > slot_mask &= ~BIT(csg_id);
> > }
> > -
> > -   if (flush_caches_failed)
> > -   faulty_slots |= suspended_slots;
> > }
> >  
> > for (i = 0; i < sched->csg_slot_count; i++) {  
>

[PATCH v2] drm/panthor: Make sure we handle 'unknown group state' case properly

When we check for state values returned by the FW, we only cover part of
the 0:7 range. Make sure we catch FW inconsistencies by adding a default
to the switch statement, and flagging the group state as unknown in that
case.

When an unknown state is detected, we trigger a reset, and consider the
group as unusable after that point, to prevent the potential corruption
from creeping in other places if we continue executing stuff on this
context.

v2:
- Add Steve's R-b
- Fix commit message

Reported-by: Dan Carpenter 
Suggested-by: Steven Price 
Signed-off-by: Boris Brezillon 
Reviewed-by: Steven Price 
Reviewed-by: Liviu Dudau 
---
 drivers/gpu/drm/panthor/panthor_sched.c | 37 +++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_sched.c 
b/drivers/gpu/drm/panthor/panthor_sched.c
index b3a51a6de523..1a14ee30f774 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -490,6 +490,18 @@ enum panthor_group_state {
 * Can no longer be scheduled. The only allowed action is a destruction.
 */
PANTHOR_CS_GROUP_TERMINATED,
+
+   /**
+* @PANTHOR_CS_GROUP_UNKNOWN_STATE: Group is an unknown state.
+*
+* The FW returned an inconsistent state. The group is flagged unusable
+* and can no longer be scheduled. The only allowed action is a
+* destruction.
+*
+* When that happens, we also schedule a FW reset, to start from a fresh
+* state.
+*/
+   PANTHOR_CS_GROUP_UNKNOWN_STATE,
 };
 
 /**
@@ -1127,6 +1139,7 @@ csg_slot_sync_state_locked(struct panthor_device *ptdev, 
u32 csg_id)
struct panthor_fw_csg_iface *csg_iface;
struct panthor_group *group;
enum panthor_group_state new_state, old_state;
+   u32 csg_state;
 
lockdep_assert_held(>scheduler->lock);
 
@@ -1137,7 +1150,8 @@ csg_slot_sync_state_locked(struct panthor_device *ptdev, 
u32 csg_id)
return;
 
old_state = group->state;
-   switch (csg_iface->output->ack & CSG_STATE_MASK) {
+   csg_state = csg_iface->output->ack & CSG_STATE_MASK;
+   switch (csg_state) {
case CSG_STATE_START:
case CSG_STATE_RESUME:
new_state = PANTHOR_CS_GROUP_ACTIVE;
@@ -1148,11 +1162,28 @@ csg_slot_sync_state_locked(struct panthor_device 
*ptdev, u32 csg_id)
case CSG_STATE_SUSPEND:
new_state = PANTHOR_CS_GROUP_SUSPENDED;
break;
+   default:
+   /* The unknown state might be caused by a FW state corruption,
+* which means the group metadata can't be trusted anymore, and
+* the SUSPEND operation might propagate the corruption to the
+* suspend buffers. Flag the group state as unknown to make
+* sure it's unusable after that point.
+*/
+   drm_err(>base, "Invalid state on CSG %d (state=%d)",
+   csg_id, csg_state);
+   new_state = PANTHOR_CS_GROUP_UNKNOWN_STATE;
+   break;
}
 
if (old_state == new_state)
return;
 
+   /* The unknown state might be caused by a FW issue, reset the FW to
+* take a fresh start.
+*/
+   if (new_state == PANTHOR_CS_GROUP_UNKNOWN_STATE)
+   panthor_device_schedule_reset(ptdev);
+
if (new_state == PANTHOR_CS_GROUP_SUSPENDED)
csg_slot_sync_queues_state_locked(ptdev, csg_id);
 
@@ -1783,6 +1814,7 @@ static bool
 group_can_run(struct panthor_group *group)
 {
return group->state != PANTHOR_CS_GROUP_TERMINATED &&
+  group->state != PANTHOR_CS_GROUP_UNKNOWN_STATE &&
   !group->destroyed && group->fatal_queues == 0 &&
   !group->timedout;
 }
@@ -2557,7 +2589,8 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
 
if (csg_slot->group) {
csgs_upd_ctx_queue_reqs(ptdev, _ctx, i,
-   CSG_STATE_SUSPEND,
+   group_can_run(csg_slot->group) ?
+   CSG_STATE_SUSPEND : 
CSG_STATE_TERMINATE,
CSG_STATE_MASK);
}
}
-- 
2.44.0

Re: [PATCH v3 5/5] drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints

On 02/05/2024 16:40, Boris Brezillon wrote:
> Make sure the user is aware that drm_panthor_tiler_heap_destroy::handle
> must be a handle previously returned by
> DRM_IOCTL_PANTHOR_TILER_HEAP_CREATE.
> 
> Signed-off-by: Boris Brezillon 

Reviewed-by: Steven Price 

> ---
>  include/uapi/drm/panthor_drm.h | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> index b8220d2e698f..aaed8e12ad0b 100644
> --- a/include/uapi/drm/panthor_drm.h
> +++ b/include/uapi/drm/panthor_drm.h
> @@ -939,7 +939,11 @@ struct drm_panthor_tiler_heap_create {
>   * struct drm_panthor_tiler_heap_destroy - Arguments passed to 
> DRM_IOCTL_PANTHOR_TILER_HEAP_DESTROY
>   */
>  struct drm_panthor_tiler_heap_destroy {
> - /** @handle: Handle of the tiler heap to destroy */
> + /**
> +  * @handle: Handle of the tiler heap to destroy.
> +  *
> +  * Must be a valid heap handle returned by 
> DRM_IOCTL_PANTHOR_TILER_HEAP_CREATE.
> +  */
>   __u32 handle;
>  
>   /** @pad: Padding field, MBZ. */

Re: [PATCH v3 4/5] drm/panthor: Fix an off-by-one in the heap context retrieval logic

On 02/05/2024 16:40, Boris Brezillon wrote:
> The heap ID is used to index the heap context pool, and allocating
> in the [1:MAX_HEAPS_PER_POOL] leads to an off-by-one. This was
> originally to avoid returning a zero heap handle, but given the handle
> is formed with (vm_id << 16) | heap_id, with vm_id > 0, we already can't
> end up with a valid heap handle that's zero.
> 
> v3:
> - Allocate in the [0:MAX_HEAPS_PER_POOL-1] range
> 
> v2:
> - New patch
> 
> Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block")
> Reported-by: Eric Smith 
> Signed-off-by: Boris Brezillon 
> Tested-by: Eric Smith 

Don't we also need to change the xa_init_flags() in
panthor_heap_pool_create()?

Steve

> ---
>  drivers/gpu/drm/panthor/panthor_heap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
> b/drivers/gpu/drm/panthor/panthor_heap.c
> index 683bb94761bc..252332f5390f 100644
> --- a/drivers/gpu/drm/panthor/panthor_heap.c
> +++ b/drivers/gpu/drm/panthor/panthor_heap.c
> @@ -323,7 +323,8 @@ int panthor_heap_create(struct panthor_heap_pool *pool,
>   if (!pool->vm) {
>   ret = -EINVAL;
>   } else {
> - ret = xa_alloc(>xa, , heap, XA_LIMIT(1, 
> MAX_HEAPS_PER_POOL), GFP_KERNEL);
> + ret = xa_alloc(>xa, , heap,
> +XA_LIMIT(0, MAX_HEAPS_PER_POOL - 1), GFP_KERNEL);
>   if (!ret) {
>   void *gpu_ctx = panthor_get_heap_ctx(pool, id);
>

Re: [PATCH v3 3/5] drm/panthor: Relax the constraints on the tiler chunk size

On 02/05/2024 16:40, Boris Brezillon wrote:
> The field used to store the chunk size if 12 bits wide, and the encoding
> is chunk_size = chunk_header.chunk_size << 12, which gives us a
> theoretical [4k:8M] range. This range is further limited by
> implementation constraints, and all known implementations seem to
> impose a [128k:8M] range, so do the same here.
> 
> We also relax the power-of-two constraint, which doesn't seem to
> exist on v10. This will allow userspace to fine-tune initial/max
> tiler memory on memory-constrained devices.
> 
> v3:
> - Add R-bs
> - Fix valid range in the kerneldoc

Sadly the fixed range didn't make it to this posting... ;)

Steve

> 
> v2:
> - Turn the power-of-two constraint into a page-aligned constraint to allow
>   fine-tune of the initial/max heap memory size
> - Fix the panthor_heap_create() kerneldoc
> 
> Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block")
> Signed-off-by: Boris Brezillon 
> Reviewed-by: Liviu Dudau 
> Reviewed-by: Steven Price 
> ---
>  drivers/gpu/drm/panthor/panthor_heap.c | 8 
>  include/uapi/drm/panthor_drm.h | 6 +-
>  2 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
> b/drivers/gpu/drm/panthor/panthor_heap.c
> index 3be86ec383d6..683bb94761bc 100644
> --- a/drivers/gpu/drm/panthor/panthor_heap.c
> +++ b/drivers/gpu/drm/panthor/panthor_heap.c
> @@ -253,8 +253,8 @@ int panthor_heap_destroy(struct panthor_heap_pool *pool, 
> u32 handle)
>   * @pool: Pool to instantiate the heap context from.
>   * @initial_chunk_count: Number of chunk allocated at initialization time.
>   * Must be at least 1.
> - * @chunk_size: The size of each chunk. Must be a power of two between 256k
> - * and 2M.
> + * @chunk_size: The size of each chunk. Must be page-aligned and lie in the
> + * [128k:2M] range.
>   * @max_chunks: Maximum number of chunks that can be allocated.
>   * @target_in_flight: Maximum number of in-flight render passes.
>   * @heap_ctx_gpu_va: Pointer holding the GPU address of the allocated heap
> @@ -284,8 +284,8 @@ int panthor_heap_create(struct panthor_heap_pool *pool,
>   if (initial_chunk_count > max_chunks)
>   return -EINVAL;
>  
> - if (hweight32(chunk_size) != 1 ||
> - chunk_size < SZ_256K || chunk_size > SZ_2M)
> + if (!IS_ALIGNED(chunk_size, PAGE_SIZE) ||
> + chunk_size < SZ_128K || chunk_size > SZ_8M)
>   return -EINVAL;
>  
>   down_read(>lock);
> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> index 5db80a0682d5..b8220d2e698f 100644
> --- a/include/uapi/drm/panthor_drm.h
> +++ b/include/uapi/drm/panthor_drm.h
> @@ -898,7 +898,11 @@ struct drm_panthor_tiler_heap_create {
>   /** @initial_chunk_count: Initial number of chunks to allocate. Must be 
> at least one. */
>   __u32 initial_chunk_count;
>  
> - /** @chunk_size: Chunk size. Must be a power of two at least 256KB 
> large. */
> + /**
> +  * @chunk_size: Chunk size.
> +  *
> +  * Must be page-aligned and lie in the [128k:8M] range.
> +  */
>   __u32 chunk_size;
>  
>   /**

[PATCH v3 5/5] drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints

Make sure the user is aware that drm_panthor_tiler_heap_destroy::handle
must be a handle previously returned by
DRM_IOCTL_PANTHOR_TILER_HEAP_CREATE.

Signed-off-by: Boris Brezillon 
---
 include/uapi/drm/panthor_drm.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index b8220d2e698f..aaed8e12ad0b 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -939,7 +939,11 @@ struct drm_panthor_tiler_heap_create {
  * struct drm_panthor_tiler_heap_destroy - Arguments passed to 
DRM_IOCTL_PANTHOR_TILER_HEAP_DESTROY
  */
 struct drm_panthor_tiler_heap_destroy {
-   /** @handle: Handle of the tiler heap to destroy */
+   /**
+* @handle: Handle of the tiler heap to destroy.
+*
+* Must be a valid heap handle returned by 
DRM_IOCTL_PANTHOR_TILER_HEAP_CREATE.
+*/
__u32 handle;
 
/** @pad: Padding field, MBZ. */
-- 
2.44.0

[PATCH v3 4/5] drm/panthor: Fix an off-by-one in the heap context retrieval logic

The heap ID is used to index the heap context pool, and allocating
in the [1:MAX_HEAPS_PER_POOL] leads to an off-by-one. This was
originally to avoid returning a zero heap handle, but given the handle
is formed with (vm_id << 16) | heap_id, with vm_id > 0, we already can't
end up with a valid heap handle that's zero.

v3:
- Allocate in the [0:MAX_HEAPS_PER_POOL-1] range

v2:
- New patch

Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block")
Reported-by: Eric Smith 
Signed-off-by: Boris Brezillon 
Tested-by: Eric Smith 
---
 drivers/gpu/drm/panthor/panthor_heap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
b/drivers/gpu/drm/panthor/panthor_heap.c
index 683bb94761bc..252332f5390f 100644
--- a/drivers/gpu/drm/panthor/panthor_heap.c
+++ b/drivers/gpu/drm/panthor/panthor_heap.c
@@ -323,7 +323,8 @@ int panthor_heap_create(struct panthor_heap_pool *pool,
if (!pool->vm) {
ret = -EINVAL;
} else {
-   ret = xa_alloc(>xa, , heap, XA_LIMIT(1, 
MAX_HEAPS_PER_POOL), GFP_KERNEL);
+   ret = xa_alloc(>xa, , heap,
+  XA_LIMIT(0, MAX_HEAPS_PER_POOL - 1), GFP_KERNEL);
if (!ret) {
void *gpu_ctx = panthor_get_heap_ctx(pool, id);
 
-- 
2.44.0

[PATCH v3 3/5] drm/panthor: Relax the constraints on the tiler chunk size

The field used to store the chunk size if 12 bits wide, and the encoding
is chunk_size = chunk_header.chunk_size << 12, which gives us a
theoretical [4k:8M] range. This range is further limited by
implementation constraints, and all known implementations seem to
impose a [128k:8M] range, so do the same here.

We also relax the power-of-two constraint, which doesn't seem to
exist on v10. This will allow userspace to fine-tune initial/max
tiler memory on memory-constrained devices.

v3:
- Add R-bs
- Fix valid range in the kerneldoc

v2:
- Turn the power-of-two constraint into a page-aligned constraint to allow
  fine-tune of the initial/max heap memory size
- Fix the panthor_heap_create() kerneldoc

Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block")
Signed-off-by: Boris Brezillon 
Reviewed-by: Liviu Dudau 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panthor/panthor_heap.c | 8 
 include/uapi/drm/panthor_drm.h | 6 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
b/drivers/gpu/drm/panthor/panthor_heap.c
index 3be86ec383d6..683bb94761bc 100644
--- a/drivers/gpu/drm/panthor/panthor_heap.c
+++ b/drivers/gpu/drm/panthor/panthor_heap.c
@@ -253,8 +253,8 @@ int panthor_heap_destroy(struct panthor_heap_pool *pool, 
u32 handle)
  * @pool: Pool to instantiate the heap context from.
  * @initial_chunk_count: Number of chunk allocated at initialization time.
  * Must be at least 1.
- * @chunk_size: The size of each chunk. Must be a power of two between 256k
- * and 2M.
+ * @chunk_size: The size of each chunk. Must be page-aligned and lie in the
+ * [128k:2M] range.
  * @max_chunks: Maximum number of chunks that can be allocated.
  * @target_in_flight: Maximum number of in-flight render passes.
  * @heap_ctx_gpu_va: Pointer holding the GPU address of the allocated heap
@@ -284,8 +284,8 @@ int panthor_heap_create(struct panthor_heap_pool *pool,
if (initial_chunk_count > max_chunks)
return -EINVAL;
 
-   if (hweight32(chunk_size) != 1 ||
-   chunk_size < SZ_256K || chunk_size > SZ_2M)
+   if (!IS_ALIGNED(chunk_size, PAGE_SIZE) ||
+   chunk_size < SZ_128K || chunk_size > SZ_8M)
return -EINVAL;
 
down_read(>lock);
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index 5db80a0682d5..b8220d2e698f 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -898,7 +898,11 @@ struct drm_panthor_tiler_heap_create {
/** @initial_chunk_count: Initial number of chunks to allocate. Must be 
at least one. */
__u32 initial_chunk_count;
 
-   /** @chunk_size: Chunk size. Must be a power of two at least 256KB 
large. */
+   /**
+* @chunk_size: Chunk size.
+*
+* Must be page-aligned and lie in the [128k:8M] range.
+*/
__u32 chunk_size;
 
/**
-- 
2.44.0

[PATCH v3 2/5] drm/panthor: Make sure the tiler initial/max chunks are consistent

It doesn't make sense to have a maximum number of chunks smaller than
the initial number of chunks attached to the context.

Fix the uAPI header to reflect the new constraint, and mention the
undocumented "initial_chunk_count > 0" constraint while at it.

v3:
- Add R-b

v2:
- Fix the check

Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block")
Signed-off-by: Boris Brezillon 
Reviewed-by: Liviu Dudau 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panthor/panthor_heap.c | 3 +++
 include/uapi/drm/panthor_drm.h | 8 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
b/drivers/gpu/drm/panthor/panthor_heap.c
index c3c0ba744937..3be86ec383d6 100644
--- a/drivers/gpu/drm/panthor/panthor_heap.c
+++ b/drivers/gpu/drm/panthor/panthor_heap.c
@@ -281,6 +281,9 @@ int panthor_heap_create(struct panthor_heap_pool *pool,
if (initial_chunk_count == 0)
return -EINVAL;
 
+   if (initial_chunk_count > max_chunks)
+   return -EINVAL;
+
if (hweight32(chunk_size) != 1 ||
chunk_size < SZ_256K || chunk_size > SZ_2M)
return -EINVAL;
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index dadb05ab1235..5db80a0682d5 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -895,13 +895,17 @@ struct drm_panthor_tiler_heap_create {
/** @vm_id: VM ID the tiler heap should be mapped to */
__u32 vm_id;
 
-   /** @initial_chunk_count: Initial number of chunks to allocate. */
+   /** @initial_chunk_count: Initial number of chunks to allocate. Must be 
at least one. */
__u32 initial_chunk_count;
 
/** @chunk_size: Chunk size. Must be a power of two at least 256KB 
large. */
__u32 chunk_size;
 
-   /** @max_chunks: Maximum number of chunks that can be allocated. */
+   /**
+* @max_chunks: Maximum number of chunks that can be allocated.
+*
+* Must be at least @initial_chunk_count.
+*/
__u32 max_chunks;
 
/**
-- 
2.44.0

[PATCH v3 0/5] drm/panthor: Collection of tiler heap related fixes

This is a collection of tiler heap fixes for bugs/oddities found while
looking at incremental rendering.

Ideally, we want to land those before 6.10 is released, so we don't need
to increment the driver version to reflect the ABI changes.

Changelog detailed in each commit.

Regards,

Boris

Antonino Maniscalco (1):
  drm/panthor: Fix tiler OOM handling to allow incremental rendering

Boris Brezillon (4):
  drm/panthor: Make sure the tiler initial/max chunks are consistent
  drm/panthor: Relax the constraints on the tiler chunk size
  drm/panthor: Fix an off-by-one in the heap context retrieval logic
  drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity
constraints

 drivers/gpu/drm/panthor/panthor_heap.c  | 26 -
 drivers/gpu/drm/panthor/panthor_sched.c |  7 ++-
 include/uapi/drm/panthor_drm.h  | 20 +++
 3 files changed, 39 insertions(+), 14 deletions(-)

-- 
2.44.0

[PATCH v3 1/5] drm/panthor: Fix tiler OOM handling to allow incremental rendering

From: Antonino Maniscalco 

If the kernel couldn't allocate memory because we reached the maximum
number of chunks but no render passes are in flight
(panthor_heap_grow() returning -ENOMEM), we should defer the OOM
handling to the FW by returning a NULL chunk. The FW will then call
the tiler OOM exception handler, which is supposed to implement
incremental rendering (execute an intermediate fragment job to flush
the pending primitives, release the tiler memory that was used to
store those primitives, and start over from where it stopped).

Instead of checking for both ENOMEM and EBUSY, make panthor_heap_grow()
return ENOMEM no matter the reason of this allocation failure, the FW
doesn't care anyway.

v3:
- Add R-bs

v2:
- Make panthor_heap_grow() return -ENOMEM for all kind of allocation
  failures
- Document the panthor_heap_grow() semantics

Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
Signed-off-by: Antonino Maniscalco 
Signed-off-by: Boris Brezillon 
Reviewed-by: Liviu Dudau 
Reviewed-by: Steven Price 
---
 drivers/gpu/drm/panthor/panthor_heap.c  | 12 
 drivers/gpu/drm/panthor/panthor_sched.c |  7 ++-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
b/drivers/gpu/drm/panthor/panthor_heap.c
index 143fa35f2e74..c3c0ba744937 100644
--- a/drivers/gpu/drm/panthor/panthor_heap.c
+++ b/drivers/gpu/drm/panthor/panthor_heap.c
@@ -410,6 +410,13 @@ int panthor_heap_return_chunk(struct panthor_heap_pool 
*pool,
  * @renderpasses_in_flight: Number of render passes currently in-flight.
  * @pending_frag_count: Number of fragment jobs waiting for 
execution/completion.
  * @new_chunk_gpu_va: Pointer used to return the chunk VA.
+ *
+ * Return:
+ * - 0 if a new heap was allocated
+ * - -ENOMEM if the tiler context reached the maximum number of chunks
+ *   or if too many render passes are in-flight
+ *   or if the allocation failed
+ * - -EINVAL if any of the arguments passed to panthor_heap_grow() is invalid
  */
 int panthor_heap_grow(struct panthor_heap_pool *pool,
  u64 heap_gpu_va,
@@ -439,10 +446,7 @@ int panthor_heap_grow(struct panthor_heap_pool *pool,
 * handler provided by the userspace driver, if any).
 */
if (renderpasses_in_flight > heap->target_in_flight ||
-   (pending_frag_count > 0 && heap->chunk_count >= heap->max_chunks)) {
-   ret = -EBUSY;
-   goto out_unlock;
-   } else if (heap->chunk_count >= heap->max_chunks) {
+   heap->chunk_count >= heap->max_chunks) {
ret = -ENOMEM;
goto out_unlock;
}
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c 
b/drivers/gpu/drm/panthor/panthor_sched.c
index b3a51a6de523..fd928362d45e 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -1354,7 +1354,12 @@ static int group_process_tiler_oom(struct panthor_group 
*group, u32 cs_id)
pending_frag_count, _chunk_va);
}
 
-   if (ret && ret != -EBUSY) {
+   /* If the heap context doesn't have memory for us, we want to let the
+* FW try to reclaim memory by waiting for fragment jobs to land or by
+* executing the tiler OOM exception handler, which is supposed to
+* implement incremental rendering.
+*/
+   if (ret && ret != -ENOMEM) {
drm_warn(>base, "Failed to extend the tiler heap\n");
group->fatal_queues |= BIT(cs_id);
sched_queue_delayed_work(sched, tick, 0);
-- 
2.44.0

Re: [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range

2024-05-02 Thread Thomas Hellström

On Thu, 2024-05-02 at 09:46 -0300, Jason Gunthorpe wrote:
> On Thu, May 02, 2024 at 11:11:04AM +0200, Thomas Hellström wrote:
> 
> > It's true the cpu vma lookup is a remnant from amdkfd. The idea
> > here is
> > to replace that with fixed prefaulting ranges of tunable size. So
> > far,
> > as you mention, the prefaulting range has been determined by the
> > CPU
> > vma size. Given previous feedback, this is going to change.
> 
> Perhaps limiting prefault to a VMA barrier is a reasonable thing to
> do, but the implementation should be pushed into hmm_range_fault and
> not open coded in the driver.
> 
> > Still the prefaulting range needs to be restricted to avoid -EFAULT
> > failures in hmm_range_fault(). That can ofc be done by calling it
> > without HMM_PFN_REQ_FAULT for the range and interpret the returned
> > pnfs. 
> 
> Yes, this is exactly what that feature is for, you mark your prefetch
> differently from the fault critical page(s).
> 
> > There is a performance concern of this approach as compared to
> > peeking at the CPU vmas directly, since hmm_range_fault() would
> > need to
> > be called twice. Any guidelines ideas here?
> 
> If there is something wrong with hmm_range_fault() then please fix
> it. I'm not sure why you'd call it twice, the HMM_PFN_REQ_FAULT is
> per
> PFN?

Ah, yes you're right. I somehow thought it was per range. Makes sense
now.

Thanks,
Thomas



> 
> Jason

Re: [PATCH v2 4/4] drm/panthor: Fix an off-by-one in the heap context retrieval logic

On Thu, 2 May 2024 16:36:02 +0200
Boris Brezillon  wrote:

> On Thu, 2 May 2024 15:26:55 +0100
> Steven Price  wrote:
> 
> > On 02/05/2024 15:15, Boris Brezillon wrote:  
> > > On Thu, 2 May 2024 15:03:51 +0100
> > > Steven Price  wrote:
> > > 
> > >> On 30/04/2024 12:28, Boris Brezillon wrote:
> > >>> ID 0 is reserved to encode 'no-tiler-heap', the heap ID range is
> > >>> [1:MAX_HEAPS_PER_POOL], which we occasionally need to turn into an index
> > >>> in the [0:MAX_HEAPS_PER_POOL-1] when we want to access the context 
> > >>> object.  
> > >>
> > >> This might be a silly question, but do we need ID 0 to be
> > >> "no-tiler-heap"? Would it be easier to e.g. use a negative number for
> > >> that situation and avoid all the off-by-one problems?
> > >>
> > >> I'm struggling to find the code which needs the 0 value to be special -
> > >> where is it exactly that we encode this "no-tiler-heap" value?
> > > 
> > > Hm, I thought we were passing the heap handle to the group creation
> > > ioctl, but heap queue/heap association is actually done through a CS
> > > instruction, so I guess you have a point. The only thing that makes a
> > > bit hesitant is that handle=0 is reserved for all other kind of handles
> > > we return, and I think I'd prefer to keep it the same for heap handles.
> > > 
> > > This being said, we could do the `+- 1` in
> > > panthor_ioctl_tiler_heap_{create,destroy}() to keep things simple in
> > > panthor_heap.c.
> > 
> > The heap handles returned to user space have the upper 16 bits encoding
> > the VM ID - so hopefully no one is doing anything crazy and splitting it
> > up to treat the lower part specially. And (unless I'm mistaken) the VM
> > IDs start from 1 so we'd still not have IDs of 0. So I don't think we
> > need the +- 1 part anywhere for tiler heaps.  
> 
> Ah, I forgot about that too. Guess we're all good with a
> [0,MAX_HEAPS_PER_POOL-1] range then.
> 
> > 
> > I'd certainly consider it a user space bug to treat the handles as
> > anything other than opaque. Really user space shouldn't be treating 0 as
> > special either: the uAPI doesn't say it's not valid. But I'd be open to
> > updating the uAPI to say 0 is invalid if there's some desire for that.  
> 
> Will do that in v3 then.

Taking that back. I don't think it needs to be enforced in the uAPI. As
you said, it's supposed to be opaque, so I'm tempted to update the
drm_panthor_tiler_heap_destroy::handle kerneldoc saying it must be
a valid handle returned by DRM_IOCTL_PANTHOR_TILER_HEAP_CREATE instead.

It's just that making the handle non-zero is kinda nice for debugging
purposes, and as I said, this way it's consistent with other kind of
handles (GEMs, VMs, syncobjs, ...).

Re: [PATCH 1/2] dt-bindings: drm/bridge: anx7625: Add a perporty to change TDM setting

2024-05-02 Thread Conor Dooley

On Thu, May 02, 2024 at 09:03:31AM +, Hsin-Te Yuan wrote:
> Add a perporty to indicate whether anx7625 should shift the first audio
> data bit. The default TDM setting is to shift the first audio data bit.
> 
> Signed-off-by: Hsin-Te Yuan 
> ---
>  .../devicetree/bindings/display/bridge/analogix,anx7625.yaml  | 4 
> 
>  1 file changed, 4 insertions(+)
> 
> diff --git 
> a/Documentation/devicetree/bindings/display/bridge/analogix,anx7625.yaml 
> b/Documentation/devicetree/bindings/display/bridge/analogix,anx7625.yaml
> index a1ed1004651b9..915d5d54a2160 100644
> --- a/Documentation/devicetree/bindings/display/bridge/analogix,anx7625.yaml
> +++ b/Documentation/devicetree/bindings/display/bridge/analogix,anx7625.yaml
> @@ -82,6 +82,10 @@ properties:
>  type: boolean
>  description: let the driver enable audio HDMI codec function or not.
>  
> +  no-shift-audio-data:
> +type: boolean
> +description: Disable the first audio data bit shift in the TDM settings.

This just looks like software policy, since there's no mention in the
commit message or description as to what property of the hardware causes
this to be required. Can you please explain why this property is needed?

You're also missing a vendor prefix.

Thanks,
Conor.

> +
>aux-bus:
>  $ref: /schemas/display/dp-aux-bus.yaml#
>  
> 
> -- 
> 2.45.0.rc1.225.g2a3ae87e7f-goog
> 


signature.asc
Description: PGP signature

Re: [PATCH v2 4/4] drm/panthor: Fix an off-by-one in the heap context retrieval logic

On Thu, 2 May 2024 15:26:55 +0100
Steven Price  wrote:

> On 02/05/2024 15:15, Boris Brezillon wrote:
> > On Thu, 2 May 2024 15:03:51 +0100
> > Steven Price  wrote:
> >   
> >> On 30/04/2024 12:28, Boris Brezillon wrote:  
> >>> ID 0 is reserved to encode 'no-tiler-heap', the heap ID range is
> >>> [1:MAX_HEAPS_PER_POOL], which we occasionally need to turn into an index
> >>> in the [0:MAX_HEAPS_PER_POOL-1] when we want to access the context 
> >>> object.
> >>
> >> This might be a silly question, but do we need ID 0 to be
> >> "no-tiler-heap"? Would it be easier to e.g. use a negative number for
> >> that situation and avoid all the off-by-one problems?
> >>
> >> I'm struggling to find the code which needs the 0 value to be special -
> >> where is it exactly that we encode this "no-tiler-heap" value?  
> > 
> > Hm, I thought we were passing the heap handle to the group creation
> > ioctl, but heap queue/heap association is actually done through a CS
> > instruction, so I guess you have a point. The only thing that makes a
> > bit hesitant is that handle=0 is reserved for all other kind of handles
> > we return, and I think I'd prefer to keep it the same for heap handles.
> > 
> > This being said, we could do the `+- 1` in
> > panthor_ioctl_tiler_heap_{create,destroy}() to keep things simple in
> > panthor_heap.c.  
> 
> The heap handles returned to user space have the upper 16 bits encoding
> the VM ID - so hopefully no one is doing anything crazy and splitting it
> up to treat the lower part specially. And (unless I'm mistaken) the VM
> IDs start from 1 so we'd still not have IDs of 0. So I don't think we
> need the +- 1 part anywhere for tiler heaps.

Ah, I forgot about that too. Guess we're all good with a
[0,MAX_HEAPS_PER_POOL-1] range then.

> 
> I'd certainly consider it a user space bug to treat the handles as
> anything other than opaque. Really user space shouldn't be treating 0 as
> special either: the uAPI doesn't say it's not valid. But I'd be open to
> updating the uAPI to say 0 is invalid if there's some desire for that.

Will do that in v3 then.

Thanks!

Boris

drm scheduler and wq flavours

2024-05-02 Thread Tvrtko Ursulin




Hi all,

Continuing after the brief IRC discussion yesterday regarding work 
queues being prone to deadlocks or not, I had a browse around the code 
base and ended up a bit confused.


When drm_sched_init documents and allocates an *ordered* wq, if no 
custom one was provided, could someone remind me was the ordered 
property fundamental for something to work correctly? Like run_job vs 
free_job ordering?


I ask because it appears different drivers to different things and at 
the moment it looks we have all possible combos or ordered/unordered, 
bound and unbound, shared or not shared with the timeout wq, or even 
unbound for the timeout wq.


The drivers worth looking at in this respect are probably nouveau, 
panthor, pvr and xe.


Nouveau also talks about a depency betwen run_job and free_job and goes 
to create two unordered wqs.


Then xe looks a bit funky with the workaround/hack for lockep where it 
creates 512 work queues and hands them over to user queues in 
round-robin fashion. (Instead of default 1:1.) Which I suspect is a 
problem which should be applicable for any 1:1 driver given a thorough 
enough test suite.


So anyway.. ordered vs unordered - drm sched dictated or at driver's choice?

Regards,

Tvrtko

Re: [PATCH v3 0/9] drm/mipi-dsi: Reduce bloat and add funcs for cleaner init seqs

2024-05-02 Thread Doug Anderson

Hi,

On Thu, May 2, 2024 at 12:48 AM Neil Armstrong
 wrote:
>
> Hi,
>
> On 01/05/2024 17:41, Douglas Anderson wrote:
> > The consensus of many DRM folks is that we want to move away from DSI
> > drivers defining tables of init commands. Instead, we want to move to
> > init functions that can use common DRM functions. The issue thus far
> > has been that using the macros mipi_dsi_generic_write_seq() and
> > mipi_dsi_dcs_write_seq() bloats the driver using them.
> >
> > While trying to solve bloat, we realized that the majority of the it
> > was easy to solve. This series solves the bloat for existing drivers
> > by moving the printout outside of the macro.
> >
> > During discussion of my v1 patch to fix the bloat [1], we also decided
> > that we really want to change the way that drivers deal with init
> > sequences to make it clearer. In addition to being cleaner, a side
> > effect of moving drivers to the new style reduces bloat _even more_.
> >
> > This series also contains a few minor fixes / cleanups that I found
> > along the way.
> >
> > This series converts four drivers over to the new
> > mipi_dsi_dcs_write_seq_multi() function. Not all conversions have been
> > tested, but hopefully they are straightforward enough. I'd appreciate
> > testing.
> >
> > NOTE: In v3 I tried to incorporate the feedback from v2. I also
> > converted the other two panels I could find that used table-based
> > initialization.
> >
> > [1] 
> > https://lore.kernel.org/r/20240424172017.1.Id15fae80582bc74a0d4f1338987fa375738f45b9@changeid
> >
> > Changes in v3:
> > - ("mipi_dsi_*_write functions don't need to ratelimit...") moved earlier.
> > - Add a TODO item for cleaning up the deprecated macros/functions.
> > - Fix spacing of init function.
> > - Inline kerneldoc comments for struct mipi_dsi_multi_context.
> > - Rebased upon patch to remove ratelimit of prints.
> > - Remove an unneeded error print.
> > - Squash boe-tv101wum-nl6 lowercase patch into main patch
> > - Use %zd in print instead of casting errors to int.
> > - drm/panel: ili9882t: Don't use a table for initting panels
> > - drm/panel: innolux-p079zca: Don't use a table for initting panels
> >
> > Changes in v2:
> > - Add some comments to the macros about printing and returning.
> > - Change the way err value is handled in prep for next patch.
> > - Modify commit message now that this is part of a series.
> > - Rebased upon patches to avoid theoretical int overflow.
> > - drm/mipi-dsi: Fix theoretical int overflow in mipi_dsi_dcs_write_seq()
> > - drm/mipi-dsi: Fix theoretical int overflow in mipi_dsi_generic_write_seq()
> > - drm/mipi-dsi: Introduce mipi_dsi_*_write_seq_multi()
> > - drm/mipi-dsi: mipi_dsi_*_write functions don't need to ratelimit prints
> > - drm/panel: boe-tv101wum-nl6: Convert hex to lowercase
> > - drm/panel: boe-tv101wum-nl6: Don't use a table for initting commands
> > - drm/panel: novatek-nt36672e: Switch to mipi_dsi_dcs_write_seq_multi()
> >
> > Douglas Anderson (9):
> >drm/mipi-dsi: Fix theoretical int overflow in mipi_dsi_dcs_write_seq()
> >drm/mipi-dsi: Fix theoretical int overflow in
> >  mipi_dsi_generic_write_seq()
> >drm/mipi-dsi: mipi_dsi_*_write functions don't need to ratelimit
> >  prints
> >drm/mipi-dsi: Reduce driver bloat of mipi_dsi_*_write_seq()
> >drm/mipi-dsi: Introduce mipi_dsi_*_write_seq_multi()
> >drm/panel: novatek-nt36672e: Switch to mipi_dsi_dcs_write_seq_multi()
> >drm/panel: boe-tv101wum-nl6: Don't use a table for initting panels
> >drm/panel: ili9882t: Don't use a table for initting panels
> >drm/panel: innolux-p079zca: Don't use a table for initting panels
>
> Thanks Doug!
>
> I think we all agree on the core changes, now I think we can wait a few weeks
> and try to get some test feedbacks on the indirectly and directly affected
> panels, drm-misc-next won't be merged into linux-next until v6.10-rc1 anyway
> so we have some time to test on our boards.

Great!

Just to be clear, are you suggesting that we leave these patches on
the lists for a few weeks before landing in drm-misc-next, or are you
suggesting that it's safe to land them in drm-misc-next because it
won't make it to linuxnext for a while anyway? I assume the former
(AKA leave it on the lists for a while) but just want to be clear. ;-)

There's nothing terribly urgent about these patches except that they
are blocking Cong Yang's patch series [0] and lvzhaoxiong's patch
series [1]. I think it would be fine for them to send out their patch
series with mine marked as a dependency so we can finish reviewing
their series and then when mine lands theirs will be good to go too.

Maybe we can aim for 2 weeks of stewing on the list if there are no
issues during that time? I know landing in drm-misc during this time
won't help get into mainline faster, but with ChromeOS's "upstream
first" policy it saves us a bunch of headache if we can land things in
our tree from a maintainer tree with stable git hashes (like

Re: [PATCH v2 4/4] drm/panthor: Fix an off-by-one in the heap context retrieval logic

On 02/05/2024 15:15, Boris Brezillon wrote:
> On Thu, 2 May 2024 15:03:51 +0100
> Steven Price  wrote:
> 
>> On 30/04/2024 12:28, Boris Brezillon wrote:
>>> ID 0 is reserved to encode 'no-tiler-heap', the heap ID range is
>>> [1:MAX_HEAPS_PER_POOL], which we occasionally need to turn into an index
>>> in the [0:MAX_HEAPS_PER_POOL-1] when we want to access the context object.  
>>
>> This might be a silly question, but do we need ID 0 to be
>> "no-tiler-heap"? Would it be easier to e.g. use a negative number for
>> that situation and avoid all the off-by-one problems?
>>
>> I'm struggling to find the code which needs the 0 value to be special -
>> where is it exactly that we encode this "no-tiler-heap" value?
> 
> Hm, I thought we were passing the heap handle to the group creation
> ioctl, but heap queue/heap association is actually done through a CS
> instruction, so I guess you have a point. The only thing that makes a
> bit hesitant is that handle=0 is reserved for all other kind of handles
> we return, and I think I'd prefer to keep it the same for heap handles.
> 
> This being said, we could do the `+- 1` in
> panthor_ioctl_tiler_heap_{create,destroy}() to keep things simple in
> panthor_heap.c.

The heap handles returned to user space have the upper 16 bits encoding
the VM ID - so hopefully no one is doing anything crazy and splitting it
up to treat the lower part specially. And (unless I'm mistaken) the VM
IDs start from 1 so we'd still not have IDs of 0. So I don't think we
need the +- 1 part anywhere for tiler heaps.

I'd certainly consider it a user space bug to treat the handles as
anything other than opaque. Really user space shouldn't be treating 0 as
special either: the uAPI doesn't say it's not valid. But I'd be open to
updating the uAPI to say 0 is invalid if there's some desire for that.

Steve

[PULL] drm-xe-next-fixes

2024-05-02 Thread Thomas Hellstrom

Dave, Sima

This week's small set of fixes for drm-next.

drm-xe-next-fixes-2024-05-02:
Driver Changes:
- Fix for a backmerge going slightly wrong.
- An UAF fix
- Avoid a WA error on LNL.

Thanks,
Thomas

The following changes since commit 4a56c0ed5aa0bcbe1f5f7d755fb1fe1ebf48ae9c:

  Merge tag 'amd-drm-next-6.10-2024-04-26' of 
https://gitlab.freedesktop.org/agd5f/linux into drm-next (2024-04-30 14:43:00 
+1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/xe/kernel.git 
tags/drm-xe-next-fixes-2024-05-02

for you to fetch changes up to 3bc8848bb7f7478e6806e4522b06b63f40a53e1e:

  drm/xe: Merge 16021540221 and 18034896535 WAs (2024-05-02 11:29:42 +0200)


Driver Changes:
- Fix for a backmerge going slightly wrong.
- An UAF fix
- Avoid a WA error on LNL.


Lucas De Marchi (1):
  drm/xe: Merge 16021540221 and 18034896535 WAs

Matthew Auld (1):
  drm/xe/vm: prevent UAF in rebind_work_func()

Thomas Hellström (1):
  drm/xe: Fix unexpected backmerge results

 drivers/gpu/drm/xe/xe_vm.c   | 16 ++--
 drivers/gpu/drm/xe/xe_vm_types.h |  4 
 drivers/gpu/drm/xe/xe_wa.c   |  7 +--
 3 files changed, 15 insertions(+), 12 deletions(-)

Re: [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription

2024-05-02 Thread Maarten Lankhorst


Hey,

Den 2024-04-24 kl. 18:56, skrev Friedrich Vock:

Hi everyone,

recently I've been looking into remedies for apps (in particular, newer
games) that experience significant performance loss when they start to
hit VRAM limits, especially on older or lower-end cards that struggle
to fit both desktop apps and all the game data into VRAM at once.

The root of the problem lies in the fact that from userspace's POV,
buffer eviction is very opaque: Userspace applications/drivers cannot
tell how oversubscribed VRAM is, nor do they have fine-grained control
over which buffers get evicted.  At the same time, with GPU APIs becoming
increasingly lower-level and GPU-driven, only the application itself
can know which buffers are used within a particular submission, and
how important each buffer is. For this, GPU APIs include interfaces
to query oversubscription and specify memory priorities: In Vulkan,
oversubscription can be queried through the VK_EXT_memory_budget
extension. Different buffers can also be assigned priorities via the
VK_EXT_pageable_device_local_memory extension. Modern games, especially
D3D12 games via vkd3d-proton, rely on oversubscription being reported and
priorities being respected in order to perform their memory management.

However, relaying this information to the kernel via the current KMD uAPIs
is not possible. On AMDGPU for example, all work submissions include a
"bo list" that contains any buffer object that is accessed during the
course of the submission. If VRAM is oversubscribed and a buffer in the
list was evicted to system memory, that buffer is moved back to VRAM
(potentially evicting other unused buffers).

Since the usermode driver doesn't know what buffers are used by the
application, its only choice is to submit a bo list that contains every
buffer the application has allocated. In case of VRAM oversubscription,
it is highly likely that some of the application's buffers were evicted,
which almost guarantees that some buffers will get moved around. Since
the bo list is only known at submit time, this also means the buffers
will get moved right before submitting application work, which is the
worst possible time to move buffers from a latency perspective. Another
consequence of the large bo list is that nearly all memory from other
applications will be evicted, too. When different applications (e.g. game
and compositor) submit work one after the other, this causes a ping-pong
effect where each app's submission evicts the other app's memory,
resulting in a large amount of unnecessary moves.

This overly aggressive eviction behavior led to RADV adopting a change
that effectively allows all VRAM applications to reside in system memory
[1].  This worked around the ping-ponging/excessive buffer moving problem,
but also meant that any memory evicted to system memory would forever
stay there, regardless of how VRAM is used.

My proposal aims at providing a middle ground between these extremes.
The goals I want to meet are:
- Userspace is accurately informed about VRAM oversubscription/how much
   VRAM has been evicted
- Buffer eviction respects priorities set by userspace - Wasteful
   ping-ponging is avoided to the extent possible

I have been testing out some prototypes, and came up with this rough
sketch of an API:

- For each ttm_resource_manager, the amount of evicted memory is tracked
   (similarly to how "usage" tracks the memory usage). When memory is
   evicted via ttm_bo_evict, the size of the evicted memory is added, when
   memory is un-evicted (see below), its size is subtracted. The amount of
   evicted memory for e.g. VRAM can be queried by userspace via an ioctl.

- Each ttm_resource_manager maintains a list of evicted buffer objects.

- ttm_mem_unevict walks the list of evicted bos for a given
   ttm_resource_manager and tries moving evicted resources back. When a
   buffer is freed, this function is called to immediately restore some
   evicted memory.

- Each ttm_buffer_object independently tracks the mem_type it wants
   to reside in.

- ttm_bo_try_unevict is added as a helper function which attempts to
   move the buffer to its preferred mem_type. If no space is available
   there, it fails with -ENOSPC/-ENOMEM.

- Similar to how ttm_bo_evict works, each driver can implement
   uneviction_valuable/unevict_flags callbacks to control buffer
   un-eviction.

This is what patches 1-10 accomplish (together with an amdgpu
implementation utilizing the new API).

Userspace priorities could then be implemented as follows:

- TTM already manages priorities for each buffer object. These priorities
   can be updated by userspace via a GEM_OP ioctl to inform the kernel
   which buffers should be evicted before others. If an ioctl increases
   the priority of a buffer, ttm_bo_try_unevict is called on that buffer to
   try and move it back (potentially evicting buffers with a lower
   priority)

- Buffers should never be evicted by other buffers with equal/lower
   priority, but

Re: [PATCH v2 4/4] drm/panthor: Fix an off-by-one in the heap context retrieval logic

On Thu, 2 May 2024 15:03:51 +0100
Steven Price  wrote:

> On 30/04/2024 12:28, Boris Brezillon wrote:
> > ID 0 is reserved to encode 'no-tiler-heap', the heap ID range is
> > [1:MAX_HEAPS_PER_POOL], which we occasionally need to turn into an index
> > in the [0:MAX_HEAPS_PER_POOL-1] when we want to access the context object.  
> 
> This might be a silly question, but do we need ID 0 to be
> "no-tiler-heap"? Would it be easier to e.g. use a negative number for
> that situation and avoid all the off-by-one problems?
> 
> I'm struggling to find the code which needs the 0 value to be special -
> where is it exactly that we encode this "no-tiler-heap" value?

Hm, I thought we were passing the heap handle to the group creation
ioctl, but heap queue/heap association is actually done through a CS
instruction, so I guess you have a point. The only thing that makes a
bit hesitant is that handle=0 is reserved for all other kind of handles
we return, and I think I'd prefer to keep it the same for heap handles.

This being said, we could do the `+- 1` in
panthor_ioctl_tiler_heap_{create,destroy}() to keep things simple in
panthor_heap.c.

Re: [PATCH v2 4/4] drm/panthor: Fix an off-by-one in the heap context retrieval logic

On 30/04/2024 12:28, Boris Brezillon wrote:
> ID 0 is reserved to encode 'no-tiler-heap', the heap ID range is
> [1:MAX_HEAPS_PER_POOL], which we occasionally need to turn into an index
> in the [0:MAX_HEAPS_PER_POOL-1] when we want to access the context object.

This might be a silly question, but do we need ID 0 to be
"no-tiler-heap"? Would it be easier to e.g. use a negative number for
that situation and avoid all the off-by-one problems?

I'm struggling to find the code which needs the 0 value to be special -
where is it exactly that we encode this "no-tiler-heap" value?

Steve

> 
> v2:
> - New patch
> 
> Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block")
> Reported-by: Eric Smith 
> Signed-off-by: Boris Brezillon 
> Tested-by: Eric Smith 
> ---
>  drivers/gpu/drm/panthor/panthor_heap.c | 35 +++---
>  1 file changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
> b/drivers/gpu/drm/panthor/panthor_heap.c
> index 683bb94761bc..b1a7dbf25fb2 100644
> --- a/drivers/gpu/drm/panthor/panthor_heap.c
> +++ b/drivers/gpu/drm/panthor/panthor_heap.c
> @@ -109,7 +109,11 @@ static int panthor_heap_ctx_stride(struct panthor_device 
> *ptdev)
>  
>  static int panthor_get_heap_ctx_offset(struct panthor_heap_pool *pool, int 
> id)
>  {
> - return panthor_heap_ctx_stride(pool->ptdev) * id;
> + /* ID 0 is reserved to encode 'no-tiler-heap', the valid range
> +  * is [1:MAX_HEAPS_PER_POOL], which we need to turn into a
> +  * [0:MAX_HEAPS_PER_POOL-1] context index, hence the minus one here.
> +  */
> + return panthor_heap_ctx_stride(pool->ptdev) * (id - 1);
>  }
>  
>  static void *panthor_get_heap_ctx(struct panthor_heap_pool *pool, int id)
> @@ -118,6 +122,21 @@ static void *panthor_get_heap_ctx(struct 
> panthor_heap_pool *pool, int id)
>  panthor_get_heap_ctx_offset(pool, id);
>  }
>  
> +static int panthor_get_heap_ctx_id(struct panthor_heap_pool *pool,
> +u64 heap_ctx_gpu_va)
> +{
> + u64 offset = heap_ctx_gpu_va - 
> panthor_kernel_bo_gpuva(pool->gpu_contexts);
> + u32 heap_idx = (u32)offset / panthor_heap_ctx_stride(pool->ptdev);
> +
> + if (offset > U32_MAX || heap_idx >= MAX_HEAPS_PER_POOL)
> + return -EINVAL;
> +
> + /* ID 0 is reserved to encode 'no-tiler-heap', the valid range
> +  * is [1:MAX_HEAPS_PER_POOL], hence the plus one here.
> +  */
> + return heap_idx + 1;
> +}
> +
>  static void panthor_free_heap_chunk(struct panthor_vm *vm,
>   struct panthor_heap *heap,
>   struct panthor_heap_chunk *chunk)
> @@ -364,14 +383,13 @@ int panthor_heap_return_chunk(struct panthor_heap_pool 
> *pool,
> u64 heap_gpu_va,
> u64 chunk_gpu_va)
>  {
> - u64 offset = heap_gpu_va - panthor_kernel_bo_gpuva(pool->gpu_contexts);
> - u32 heap_id = (u32)offset / panthor_heap_ctx_stride(pool->ptdev);
> + int heap_id = panthor_get_heap_ctx_id(pool, heap_gpu_va);
>   struct panthor_heap_chunk *chunk, *tmp, *removed = NULL;
>   struct panthor_heap *heap;
>   int ret;
>  
> - if (offset > U32_MAX || heap_id >= MAX_HEAPS_PER_POOL)
> - return -EINVAL;
> + if (heap_id < 0)
> + return heap_id;
>  
>   down_read(>lock);
>   heap = xa_load(>xa, heap_id);
> @@ -427,14 +445,13 @@ int panthor_heap_grow(struct panthor_heap_pool *pool,
> u32 pending_frag_count,
> u64 *new_chunk_gpu_va)
>  {
> - u64 offset = heap_gpu_va - panthor_kernel_bo_gpuva(pool->gpu_contexts);
> - u32 heap_id = (u32)offset / panthor_heap_ctx_stride(pool->ptdev);
> + int heap_id = panthor_get_heap_ctx_id(pool, heap_gpu_va);
>   struct panthor_heap_chunk *chunk;
>   struct panthor_heap *heap;
>   int ret;
>  
> - if (offset > U32_MAX || heap_id >= MAX_HEAPS_PER_POOL)
> - return -EINVAL;
> + if (heap_id < 0)
> + return heap_id;
>  
>   down_read(>lock);
>   heap = xa_load(>xa, heap_id);

Re: [PATCH v2 3/4] drm/panthor: Relax the constraints on the tiler chunk size

On 30/04/2024 14:08, Adrián Larumbe wrote:
> Hi Boris,
> 
> On 30.04.2024 13:28, Boris Brezillon wrote:
>> The field used to store the chunk size if 12 bits wide, and the encoding
>> is chunk_size = chunk_header.chunk_size << 12, which gives us a
>> theoretical [4k:8M] range. This range is further limited by
>> implementation constraints, and all known implementations seem to
>> impose a [128k:8M] range, so do the same here.
>>
>> We also relax the power-of-two constraint, which doesn't seem to
>> exist on v10. This will allow userspace to fine-tune initial/max
>> tiler memory on memory-constrained devices.
>>
>> v2:
>> - Turn the power-of-two constraint into a page-aligned constraint to allow
>>   fine-tune of the initial/max heap memory size
>> - Fix the panthor_heap_create() kerneldoc
>>
>> Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block")
>> Signed-off-by: Boris Brezillon 

Other than the typo Adrián pointed out below...

Reviewed-by: Steven Price 

>> ---
>>  drivers/gpu/drm/panthor/panthor_heap.c | 8 
>>  include/uapi/drm/panthor_drm.h | 6 +-
>>  2 files changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
>> b/drivers/gpu/drm/panthor/panthor_heap.c
>> index 3be86ec383d6..683bb94761bc 100644
>> --- a/drivers/gpu/drm/panthor/panthor_heap.c
>> +++ b/drivers/gpu/drm/panthor/panthor_heap.c
>> @@ -253,8 +253,8 @@ int panthor_heap_destroy(struct panthor_heap_pool *pool, 
>> u32 handle)
>>   * @pool: Pool to instantiate the heap context from.
>>   * @initial_chunk_count: Number of chunk allocated at initialization time.
>>   * Must be at least 1.
>> - * @chunk_size: The size of each chunk. Must be a power of two between 256k
>> - * and 2M.
>> + * @chunk_size: The size of each chunk. Must be page-aligned and lie in the
>> + * [128k:2M] range.
> 
> Probably a typo, but I guess this should be [128k:8M] ?
> 
>>   * @max_chunks: Maximum number of chunks that can be allocated.
>>   * @target_in_flight: Maximum number of in-flight render passes.
>>   * @heap_ctx_gpu_va: Pointer holding the GPU address of the allocated heap
>> @@ -284,8 +284,8 @@ int panthor_heap_create(struct panthor_heap_pool *pool,
>>  if (initial_chunk_count > max_chunks)
>>  return -EINVAL;
>>  
>> -if (hweight32(chunk_size) != 1 ||
>> -chunk_size < SZ_256K || chunk_size > SZ_2M)
>> +if (!IS_ALIGNED(chunk_size, PAGE_SIZE) ||
>> +chunk_size < SZ_128K || chunk_size > SZ_8M)
>>  return -EINVAL;
>>  
>>  down_read(>lock);
>> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
>> index 5db80a0682d5..b8220d2e698f 100644
>> --- a/include/uapi/drm/panthor_drm.h
>> +++ b/include/uapi/drm/panthor_drm.h
>> @@ -898,7 +898,11 @@ struct drm_panthor_tiler_heap_create {
>>  /** @initial_chunk_count: Initial number of chunks to allocate. Must be 
>> at least one. */
>>  __u32 initial_chunk_count;
>>  
>> -/** @chunk_size: Chunk size. Must be a power of two at least 256KB 
>> large. */
>> +/**
>> + * @chunk_size: Chunk size.
>> + *
>> + * Must be page-aligned and lie in the [128k:8M] range.
>> + */
>>  __u32 chunk_size;
>>  
>>  /**
>> -- 
>> 2.44.0
> 
> 
> Adrian Larumbe

Re: [PATCH v2 2/4] drm/panthor: Make sure the tiler initial/max chunks are consistent

On 30/04/2024 12:28, Boris Brezillon wrote:
> It doesn't make sense to have a maximum number of chunks smaller than
> the initial number of chunks attached to the context.
> 
> Fix the uAPI header to reflect the new constraint, and mention the
> undocumented "initial_chunk_count > 0" constraint while at it.
> 
> v2:
> - Fix the check
> 
> Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block")
> Signed-off-by: Boris Brezillon 

Reviewed-by: Steven Price 

Thanks,
Steve

> ---
>  drivers/gpu/drm/panthor/panthor_heap.c | 3 +++
>  include/uapi/drm/panthor_drm.h | 8 ++--
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
> b/drivers/gpu/drm/panthor/panthor_heap.c
> index c3c0ba744937..3be86ec383d6 100644
> --- a/drivers/gpu/drm/panthor/panthor_heap.c
> +++ b/drivers/gpu/drm/panthor/panthor_heap.c
> @@ -281,6 +281,9 @@ int panthor_heap_create(struct panthor_heap_pool *pool,
>   if (initial_chunk_count == 0)
>   return -EINVAL;
>  
> + if (initial_chunk_count > max_chunks)
> + return -EINVAL;
> +
>   if (hweight32(chunk_size) != 1 ||
>   chunk_size < SZ_256K || chunk_size > SZ_2M)
>   return -EINVAL;
> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> index dadb05ab1235..5db80a0682d5 100644
> --- a/include/uapi/drm/panthor_drm.h
> +++ b/include/uapi/drm/panthor_drm.h
> @@ -895,13 +895,17 @@ struct drm_panthor_tiler_heap_create {
>   /** @vm_id: VM ID the tiler heap should be mapped to */
>   __u32 vm_id;
>  
> - /** @initial_chunk_count: Initial number of chunks to allocate. */
> + /** @initial_chunk_count: Initial number of chunks to allocate. Must be 
> at least one. */
>   __u32 initial_chunk_count;
>  
>   /** @chunk_size: Chunk size. Must be a power of two at least 256KB 
> large. */
>   __u32 chunk_size;
>  
> - /** @max_chunks: Maximum number of chunks that can be allocated. */
> + /**
> +  * @max_chunks: Maximum number of chunks that can be allocated.
> +  *
> +  * Must be at least @initial_chunk_count.
> +  */
>   __u32 max_chunks;
>  
>   /**

Re: [PATCH v2 1/4] drm/panthor: Fix tiler OOM handling to allow incremental rendering

On 30/04/2024 12:28, Boris Brezillon wrote:
> From: Antonino Maniscalco 
> 
> If the kernel couldn't allocate memory because we reached the maximum
> number of chunks but no render passes are in flight
> (panthor_heap_grow() returning -ENOMEM), we should defer the OOM
> handling to the FW by returning a NULL chunk. The FW will then call
> the tiler OOM exception handler, which is supposed to implement
> incremental rendering (execute an intermediate fragment job to flush
> the pending primitives, release the tiler memory that was used to
> store those primitives, and start over from where it stopped).
> 
> Instead of checking for both ENOMEM and EBUSY, make panthor_heap_grow()
> return ENOMEM no matter the reason of this allocation failure, the FW
> doesn't care anyway.
> 
> v2:
> - Make panthor_heap_grow() return -ENOMEM for all kind of allocation
>   failures
> - Document the panthor_heap_grow() semantics
> 
> Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
> Signed-off-by: Antonino Maniscalco 
> Signed-off-by: Boris Brezillon 

Reviewed-by: Steven Price 

Thanks,
Steve

> ---
>  drivers/gpu/drm/panthor/panthor_heap.c  | 12 
>  drivers/gpu/drm/panthor/panthor_sched.c |  7 ++-
>  2 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_heap.c 
> b/drivers/gpu/drm/panthor/panthor_heap.c
> index 143fa35f2e74..c3c0ba744937 100644
> --- a/drivers/gpu/drm/panthor/panthor_heap.c
> +++ b/drivers/gpu/drm/panthor/panthor_heap.c
> @@ -410,6 +410,13 @@ int panthor_heap_return_chunk(struct panthor_heap_pool 
> *pool,
>   * @renderpasses_in_flight: Number of render passes currently in-flight.
>   * @pending_frag_count: Number of fragment jobs waiting for 
> execution/completion.
>   * @new_chunk_gpu_va: Pointer used to return the chunk VA.
> + *
> + * Return:
> + * - 0 if a new heap was allocated
> + * - -ENOMEM if the tiler context reached the maximum number of chunks
> + *   or if too many render passes are in-flight
> + *   or if the allocation failed
> + * - -EINVAL if any of the arguments passed to panthor_heap_grow() is invalid
>   */
>  int panthor_heap_grow(struct panthor_heap_pool *pool,
> u64 heap_gpu_va,
> @@ -439,10 +446,7 @@ int panthor_heap_grow(struct panthor_heap_pool *pool,
>* handler provided by the userspace driver, if any).
>*/
>   if (renderpasses_in_flight > heap->target_in_flight ||
> - (pending_frag_count > 0 && heap->chunk_count >= heap->max_chunks)) {
> - ret = -EBUSY;
> - goto out_unlock;
> - } else if (heap->chunk_count >= heap->max_chunks) {
> + heap->chunk_count >= heap->max_chunks) {
>   ret = -ENOMEM;
>   goto out_unlock;
>   }
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c 
> b/drivers/gpu/drm/panthor/panthor_sched.c
> index b3a51a6de523..fd928362d45e 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -1354,7 +1354,12 @@ static int group_process_tiler_oom(struct 
> panthor_group *group, u32 cs_id)
>   pending_frag_count, _chunk_va);
>   }
>  
> - if (ret && ret != -EBUSY) {
> + /* If the heap context doesn't have memory for us, we want to let the
> +  * FW try to reclaim memory by waiting for fragment jobs to land or by
> +  * executing the tiler OOM exception handler, which is supposed to
> +  * implement incremental rendering.
> +  */
> + if (ret && ret != -ENOMEM) {
>   drm_warn(>base, "Failed to extend the tiler heap\n");
>   group->fatal_queues |= BIT(cs_id);
>   sched_queue_delayed_work(sched, tick, 0);

Re: [PATCH v3 1/4] drm: add devm release action

2024-05-02 Thread Maxime Ripard

On Wed, Apr 24, 2024 at 06:06:52PM +0530, Aravind Iddamsetty wrote:
> 
> On 24/04/24 17:21, Maxime Ripard wrote:
> > On Mon, Apr 22, 2024 at 12:27:53PM +0530, Aravind Iddamsetty wrote:
> >> In scenarios where drm_dev_put is directly called by driver we want to
> >> release devm_drm_dev_init_release action associated with struct
> >> drm_device.
> >>
> >> v2: Directly expose the original function, instead of introducing a
> >> helper (Rodrigo)
> >>
> >> v3: add kernel-doc (Maxime Ripard)
> >>
> >> Cc: Maxime Ripard 
> >> Cc: Thomas Hellstr_m 
> >> Cc: Rodrigo Vivi 
> >>
> >> Reviewed-by: Rodrigo Vivi 
> >> Signed-off-by: Aravind Iddamsetty 
> >> ---
> >>  drivers/gpu/drm/drm_drv.c | 13 +
> >>  include/drm/drm_drv.h |  2 ++
> >>  2 files changed, 15 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> >> index 243cacb3575c..9d0409165f1e 100644
> >> --- a/drivers/gpu/drm/drm_drv.c
> >> +++ b/drivers/gpu/drm/drm_drv.c
> >> @@ -714,6 +714,19 @@ static int devm_drm_dev_init(struct device *parent,
> >>devm_drm_dev_init_release, dev);
> >>  }
> >>  
> >> +/**
> >> + * devm_drm_dev_release_action - Call the final release action of the 
> >> device
> >> + * @dev: DRM device
> >> + *
> >> + * In scenarios where drm_dev_put is directly called by driver we want to 
> >> release
> >> + * devm_drm_dev_init_release action associated with struct drm_device.
> >> + */
> > I'm not entirely sure what you mean by that documentation. "In scenarios
> > where drm_dev_put is directly called by the driver", we wouldn't need to
> > consider that function at all, right?
> 
> the drm_dev_put is not being invoked by drivers directly but that is
> associated with devres releases and the scenario here I meant if
> drivers want to have that called by themselves.

Then that needs to be rephrased to mention both that it applies only to
drivers using devm_drm_dev_alloc, and if they want to drop their
reference earlier than anticipated.

> > Also, we should reference it in drm_dev_put and devm_drm_dev_alloc too.
> 
> sorry I didn't get this can you please elaborate.

devm_drm_dev_alloc needs to point at this new function to mention that
we can drop the reference explicitly. And drm_dev_put needs to mention
that it only applies to non-devm drivers, and if we want to drop the
reference on a devm driver, we need to call this new function.

Maxime


signature.asc
Description: PGP signature

Re: [PATCH v3 7/9] drm/panel: boe-tv101wum-nl6: Don't use a table for initting panels

2024-05-02 Thread Linus Walleij

On Wed, May 1, 2024 at 5:43 PM Douglas Anderson  wrote:

> Consensus on the mailing lists is that panels shouldn't use a table of
> init commands but should instead use init functions. With the recently
> introduced mipi_dsi_dcs_write_seq_multi() this is not only clean/easy
> but also saves space. Measuring before/after this change:
>
> $ scripts/bloat-o-meter \
>   .../before/panel-boe-tv101wum-nl6.ko \
>   .../after/panel-boe-tv101wum-nl6.ko
> add/remove: 14/8 grow/shrink: 0/1 up/down: 27062/-31433 (-4371)
> Function old new   delta
> inx_hj110iz_init   -7040   +7040
> boe_tv110c9m_init  -6440   +6440
> boe_init   -5916   +5916
> starry_qfh032011_53g_init  -1944   +1944
> starry_himax83102_j02_init -1228   +1228
> inx_hj110iz_init.d -1040   +1040
> boe_tv110c9m_init.d- 982+982
> auo_b101uan08_3_init   - 944+944
> boe_init.d - 580+580
> starry_himax83102_j02_init.d   - 512+512
> starry_qfh032011_53g_init.d- 180+180
> auo_kd101n80_45na_init - 172+172
> auo_b101uan08_3_init.d -  82 +82
> auo_kd101n80_45na_init.d   -   2  +2
> auo_kd101n80_45na_init_cmd   144   --144
> boe_panel_prepare592 440-152
> auo_b101uan08_3_init_cmd1056   -   -1056
> starry_himax83102_j02_init_cmd  1392   -   -1392
> starry_qfh032011_53g_init_cmd   2256   -   -2256
> .compoundliteral3393   -   -3393
> boe_init_cmd7008   -   -7008
> boe_tv110c9m_init_cmd   7656   -   -7656
> inx_hj110iz_init_cmd8376   -   -8376
> Total: Before=37297, After=32926, chg -11.72%
>
> Let's do the conversion.
>
> Since we're touching all the tables, let's also convert hex numbers to
> lower case as per kernel conventions.
>
> Signed-off-by: Douglas Anderson 

Wow that's a *VERY* nice patch.
Reviewed-by: Linus Walleij 

The metrics surprisingly reports more compact object code,
I wasn't expecting this, but it's nice.

Yours,
Linus Walleij

Re: [PATCH v1 2/2] vfio/pci: Allow MMIO regions to be exported through dma-buf

2024-05-02 Thread Leon Romanovsky

On Thu, May 02, 2024 at 07:50:36AM +, Kasireddy, Vivek wrote:
> Hi Jason,

<...>

> > I'd rather we stick with the original design. Leon is working on DMA
> > API changes that should address half the issue.
> Ok, I'll keep an eye out for Leon's work.

The code for v1 is here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=dma-split-v1
It is constantly rebased till we will be ready to submit it.

v0 is here:
https://lore.kernel.org/linux-rdma/cover.1709635535.git.l...@kernel.org/

Thanks

> 
> Thanks,
> Vivek
> 
> > 
> > Jason
>

Re: [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range

2024-05-02 Thread Jason Gunthorpe

On Thu, May 02, 2024 at 11:11:04AM +0200, Thomas Hellström wrote:

> It's true the cpu vma lookup is a remnant from amdkfd. The idea here is
> to replace that with fixed prefaulting ranges of tunable size. So far,
> as you mention, the prefaulting range has been determined by the CPU
> vma size. Given previous feedback, this is going to change.

Perhaps limiting prefault to a VMA barrier is a reasonable thing to
do, but the implementation should be pushed into hmm_range_fault and
not open coded in the driver.

> Still the prefaulting range needs to be restricted to avoid -EFAULT
> failures in hmm_range_fault(). That can ofc be done by calling it
> without HMM_PFN_REQ_FAULT for the range and interpret the returned
> pnfs. 

Yes, this is exactly what that feature is for, you mark your prefetch
differently from the fault critical page(s).

> There is a performance concern of this approach as compared to
> peeking at the CPU vmas directly, since hmm_range_fault() would need to
> be called twice. Any guidelines ideas here?

If there is something wrong with hmm_range_fault() then please fix
it. I'm not sure why you'd call it twice, the HMM_PFN_REQ_FAULT is per
PFN?

Jason

Re: [PATCH v3 0/2] drm: Fix dma_resv deadlock at drm object pin time

2024-05-02 Thread Thomas Zimmermann


Hi

Am 02.05.24 um 14:00 schrieb Boris Brezillon:

On Thu, 2 May 2024 13:59:41 +0200
Boris Brezillon  wrote:


Hi Thomas,

On Thu, 2 May 2024 13:51:16 +0200
Thomas Zimmermann  wrote:


Hi,

ignoring my r-b on patch 1, I'd like to rethink the current patches in
general.

I think drm_gem_shmem_pin() should become the locked version of _pin(),
so that drm_gem_shmem_object_pin() can call it directly. The existing
_pin_unlocked() would not be needed any longer. Same for the _unpin()
functions. This change would also fix the consistency with the semantics
of the shmem _vmap() functions, which never take reservation locks.

There are only two external callers of drm_gem_shmem_pin(): the test
case and panthor. These assume that drm_gem_shmem_pin() acquires the
reservation lock. The test case should likely call drm_gem_pin()
instead. That would acquire the reservation lock and the test would
validate that shmem's pin helper integrates well into the overall GEM
framework. The way panthor uses drm_gem_shmem_pin() looks wrong to me.
For now, it could receive a wrapper that takes the lock and that's it.

I do agree that the current inconsistencies in the naming is
troublesome (sometimes _unlocked, sometimes _locked, with the version
without any suffix meaning either _locked or _unlocked depending on
what the suffixed version does), and that's the very reason I asked
Dmitry to address that in his shrinker series [1]. So, ideally I'd
prefer if patches from Dmitry's series were applied instead of
trying to fix that here (IIRC, we had an ack from Maxime).

With the link this time :-).

[1]https://lore.kernel.org/lkml/20240105184624.508603-1-dmitry.osipe...@collabora.com/T/


Thanks. I remember these patches. Somehow I thought they would have been 
merged already. I wasn't super happy about the naming changes in patch 
5, because the names of the GEM object callbacks do no longer correspond 
with their implementations. But anyway.


If we go that direction, we should here simply push drm_gem_shmem_pin() 
and drm_gem_shmem_unpin() into panthor and update the shmem tests with 
drm_gem_pin(). Panfrost and lima would call drm_gem_shmem_pin_locked(). 
IMHO we should not promote the use of drm_gem_shmem_object_*() 
functions, as they are meant to be callbacks for struct 
drm_gem_object_funcs. (Auto-generating them would be nice.)


Best regards
Thomas





Regards,

Boris


--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)

Re: [PATCH v3 0/2] drm: Fix dma_resv deadlock at drm object pin time

On Thu, 2 May 2024 13:59:41 +0200
Boris Brezillon  wrote:

> Hi Thomas,
> 
> On Thu, 2 May 2024 13:51:16 +0200
> Thomas Zimmermann  wrote:
> 
> > Hi,
> > 
> > ignoring my r-b on patch 1, I'd like to rethink the current patches in 
> > general.
> > 
> > I think drm_gem_shmem_pin() should become the locked version of _pin(), 
> > so that drm_gem_shmem_object_pin() can call it directly. The existing 
> > _pin_unlocked() would not be needed any longer. Same for the _unpin() 
> > functions. This change would also fix the consistency with the semantics 
> > of the shmem _vmap() functions, which never take reservation locks.
> > 
> > There are only two external callers of drm_gem_shmem_pin(): the test 
> > case and panthor. These assume that drm_gem_shmem_pin() acquires the 
> > reservation lock. The test case should likely call drm_gem_pin() 
> > instead. That would acquire the reservation lock and the test would 
> > validate that shmem's pin helper integrates well into the overall GEM 
> > framework. The way panthor uses drm_gem_shmem_pin() looks wrong to me. 
> > For now, it could receive a wrapper that takes the lock and that's it.  
> 
> I do agree that the current inconsistencies in the naming is
> troublesome (sometimes _unlocked, sometimes _locked, with the version
> without any suffix meaning either _locked or _unlocked depending on
> what the suffixed version does), and that's the very reason I asked
> Dmitry to address that in his shrinker series [1]. So, ideally I'd
> prefer if patches from Dmitry's series were applied instead of
> trying to fix that here (IIRC, we had an ack from Maxime).

With the link this time :-).

[1]https://lore.kernel.org/lkml/20240105184624.508603-1-dmitry.osipe...@collabora.com/T/

> 
> Regards,
> 
> Boris

Re: [PATCH v3 0/2] drm: Fix dma_resv deadlock at drm object pin time