Re: [PATCH v5 3/7] PCI: Drop the `is_thunderbolt` attribute from PCI core

2022-03-01 Thread Mika Westerberg
Hi,

On Mon, Feb 28, 2022 at 10:36:59PM +, Limonciello, Mario wrote:
> [AMD Official Use Only]
> 
> > -Original Message-
> > From: Lukas Wunner 
> > Sent: Monday, February 28, 2022 16:33
> > To: Bjorn Helgaas 
> > Cc: Limonciello, Mario ; Mika Westerberg
> > ; Michael Jamet
> > ; open list:PCI SUBSYSTEM  > p...@vger.kernel.org>; open list:THUNDERBOLT DRIVER  > u...@vger.kernel.org>; Yehezkel Bernat ; open
> > list:DRM DRIVERS ; open list:X86
> > PLATFORM DRIVERS ; Andreas
> > Noever ; open list:RADEON and AMDGPU
> > DRM DRIVERS ; open list:DRM DRIVER FOR
> > NVIDIA GEFORCE/QUADRO GPUS ; Bjorn
> > Helgaas ; Deucher, Alexander
> > 
> > Subject: Re: [PATCH v5 3/7] PCI: Drop the `is_thunderbolt` attribute from 
> > PCI
> > core
> > 
> > On Mon, Feb 28, 2022 at 04:13:44PM -0600, Bjorn Helgaas wrote:
> > > On Mon, Feb 28, 2022 at 03:33:13PM +, Limonciello, Mario wrote:
> > > > > On Fri, Feb 25, 2022 at 11:42:24AM -0600, Bjorn Helgaas wrote:
> > > > > > That would just leave the "PCI_VSEC_ID_INTEL_TBT implies external-
> > > > > facing"
> > > > > > assumption above.  Not having a Thunderbolt spec, I have no idea
> > how
> > > > > > you deal with that.
> > > > >
> > > > > You can download the spec here:
> > [...]
> > > > > Inside the archive there is also the DVSEC spec with name "USB4 DVSEC
> > > > > Version 1.0.pdf".
> > > >
> > > > The spec has Host_Router_indication (bits 18-19) as meaning external
> > facing.
> > > > I'll respin the patch 3 for using that.
> > >
> > > Thanks, please include the spec citation when you do.  And probably
> > > the URL, because it's not at all obvious how the casual reader would
> > > get from "is_thunderbolt" to a recent add-on to the USB4 spec.
> > 
> > PCI_VSEC_ID_INTEL_TBT is not mentioned at all in the USB4 spec,
> > hence there's no connection between "is_thunderbolt" and the USB4 spec.
> > 
> > It's a proprietary VSEC used by Intel and the only way to recognize
> > pre-USB4 Thunderbolt devices that I know of.  Its ID is also
> > different from the DVSEC IDs given in the above-mentioned spec.
> > 
> > Thanks,
> 
> The USB4 DVSEC spec makes comments about DVSEC_ID of 0x8086 and also
> DVSEC VENDOR_ID of 0x8086.  Is that not also present on the Intel TBT3 
> controllers?
> 
> My interpretation of this (and Mika's comment) was that rather than
> looking at the Intel VSEC we should look at the USB4 DVSEC to detect
> the Intel TBT3 controllers.

For pre-USB4 controllers (TBT 1-3) we need to use the existing method
(or a quirk based on device ID) as they don't have the USB4 DVSEC.


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Linus Torvalds
On Mon, Feb 28, 2022 at 4:45 PM Linus Torvalds
 wrote:
>
> Yeah, except that's ugly beyond belief, plus it's literally not what
> we do in the kernel.

(Of course, I probably shouldn't have used 'min()' as an example,
because that is actually one of the few places where we do exactly
that, using our __UNIQUE_ID() macros. Exactly because people _have_
tried to do -Wshadow when doing W=2).

 Linus


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Matthew Wilcox
On Mon, Feb 28, 2022 at 12:37:15PM -0800, Linus Torvalds wrote:
> On Mon, Feb 28, 2022 at 12:16 PM Matthew Wilcox  wrote:
> >
> > Then we can never use -Wshadow ;-(  I'd love to be able to turn it on;
> > it catches real bugs.
> 
> Oh, we already can never use -Wshadow regardless of things like this.
> That bridge hasn't just been burned, it never existed in the first
> place.
> 
> The whole '-Wshadow' thing simply cannot work with local variables in
> macros - something that we've used since day 1.
> 
> Try this (as a "p.c" file):
> 
> #define min(a,b) ({ \
> typeof(a) __a = (a);\
> typeof(b) __b = (b);\
> __a < __b ? __a : __b; })
> 
> int min3(int a, int b, int c)
> {
> return min(a,min(b,c));
> }
> 
> and now do "gcc -O2 -S t.c".
> 
> Then try it with -Wshadow.

#define ___PASTE(a, b)  a##b
#define __PASTE(a, b) ___PASTE(a, b)
#define _min(a, b, u) ({ \
typeof(a) __PASTE(__a,u) = (a);\
typeof(b) __PASTE(__b,u) = (b);\
__PASTE(__a,u) < __PASTE(__b,u) ? __PASTE(__a,u) : __PASTE(__b,u); })

#define min(a, b) _min(a, b, __COUNTER__)

int min3(int a, int b, int c)
{
return min(a,min(b,c));
}

(probably there's a more elegant way to do this)


Re: [PATCH] mm: split vm_normal_pages for LRU and non-LRU handling

2022-03-01 Thread David Hildenbrand
On 28.02.22 21:34, Alex Sierra wrote:
> DEVICE_COHERENT pages introduce a subtle distinction in the way
> "normal" pages can be used by various callers throughout the kernel.
> They behave like normal pages for purposes of mapping in CPU page
> tables, and for COW. But they do not support LRU lists, NUMA
> migration or THP. Therefore we split vm_normal_page into two
> functions vm_normal_any_page and vm_normal_lru_page. The latter will
> only return pages that can be put on an LRU list and that support
> NUMA migration and THP.

Why not s/vm_normal_any_page/vm_normal_page/ and avoid code churn?

> 
> We also introduced a FOLL_LRU flag that adds the same behaviour to
> follow_page and related APIs, to allow callers to specify that they
> expect to put pages on an LRU list.

[...]
> -#define FOLL_WRITE   0x01/* check pte is writable */
> -#define FOLL_TOUCH   0x02/* mark page accessed */
> -#define FOLL_GET 0x04/* do get_page on page */
> -#define FOLL_DUMP0x08/* give error on hole if it would be zero */
> -#define FOLL_FORCE   0x10/* get_user_pages read/write w/o permission */
> -#define FOLL_NOWAIT  0x20/* if a disk transfer is needed, start the IO
> -  * and return without waiting upon it */
> -#define FOLL_POPULATE0x40/* fault in pages (with FOLL_MLOCK) */
> -#define FOLL_NOFAULT 0x80/* do not fault in pages */
> -#define FOLL_HWPOISON0x100   /* check page is hwpoisoned */
> -#define FOLL_NUMA0x200   /* force NUMA hinting page fault */
> -#define FOLL_MIGRATION   0x400   /* wait for page to replace migration 
> entry */
> -#define FOLL_TRIED   0x800   /* a retry, previous pass started an IO */
> -#define FOLL_MLOCK   0x1000  /* lock present pages */
> -#define FOLL_REMOTE  0x2000  /* we are working on non-current tsk/mm */
> -#define FOLL_COW 0x4000  /* internal GUP flag */
> -#define FOLL_ANON0x8000  /* don't do file mappings */
> -#define FOLL_LONGTERM0x1 /* mapping lifetime is indefinite: see 
> below */
> -#define FOLL_SPLIT_PMD   0x2 /* split huge pmd before returning */
> -#define FOLL_PIN 0x4 /* pages must be released via unpin_user_page */
> -#define FOLL_FAST_ONLY   0x8 /* gup_fast: prevent fall-back to slow 
> gup */
> +#define FOLL_WRITE   0x01 /* check pte is writable */
> +#define FOLL_TOUCH   0x02 /* mark page accessed */
> +#define FOLL_GET 0x04 /* do get_page on page */
> +#define FOLL_DUMP0x08 /* give error on hole if it would be zero */
> +#define FOLL_FORCE   0x10 /* get_user_pages read/write w/o permission */
> +#define FOLL_NOWAIT  0x20 /* if a disk transfer is needed, start the IO
> +   * and return without waiting upon it */
> +#define FOLL_POPULATE0x40 /* fault in pages (with FOLL_MLOCK) */
> +#define FOLL_NOFAULT 0x80 /* do not fault in pages */
> +#define FOLL_HWPOISON0x100/* check page is hwpoisoned */
> +#define FOLL_NUMA0x200/* force NUMA hinting page fault */
> +#define FOLL_MIGRATION   0x400/* wait for page to replace migration 
> entry */
> +#define FOLL_TRIED   0x800/* a retry, previous pass started an IO */
> +#define FOLL_MLOCK   0x1000   /* lock present pages */
> +#define FOLL_REMOTE  0x2000   /* we are working on non-current tsk/mm */
> +#define FOLL_COW 0x4000   /* internal GUP flag */
> +#define FOLL_ANON0x8000   /* don't do file mappings */
> +#define FOLL_LONGTERM0x1  /* mapping lifetime is indefinite: see 
> below */
> +#define FOLL_SPLIT_PMD   0x2  /* split huge pmd before returning */
> +#define FOLL_PIN 0x4  /* pages must be released via unpin_user_page 
> */
> +#define FOLL_FAST_ONLY   0x8  /* gup_fast: prevent fall-back to slow 
> gup */
> +#define FOLL_LRU 0x10 /* return only LRU (anon or page cache) */
>  

Can we minimize code churn, please?


>   if (PageReserved(page))
> diff --git a/mm/migrate.c b/mm/migrate.c
> index c31d04b46a5e..17d049311b78 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1614,7 +1614,7 @@ static int add_page_for_migration(struct mm_struct *mm, 
> unsigned long addr,
>   goto out;
>  
>   /* FOLL_DUMP to ignore special (like zero) pages */
> - follflags = FOLL_GET | FOLL_DUMP;
> + follflags = FOLL_GET | FOLL_DUMP | FOLL_LRU;
>   page = follow_page(vma, addr, follflags);

Why wouldn't we want to dump DEVICE_COHERENT pages? This looks wrong.


-- 
Thanks,

David / dhildenb



Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Linus Torvalds
On Mon, Feb 28, 2022 at 1:47 PM Jakob Koschel  wrote:
>
> The goal of this is to get compiler warnings right? This would indeed be 
> great.

Yes, so I don't mind having a one-time patch that has been gathered
using some automated checker tool, but I don't think that works from a
long-term maintenance perspective.

So if we have the basic rule being "don't use the loop iterator after
the loop has finished, because it can cause all kinds of subtle
issues", then in _addition_ to fixing the existing code paths that
have this issue, I really would want to (a) get a compiler warning for
future cases and (b) make it not actually _work_ for future cases.

Because otherwise it will just happen again.

> Changing the list_for_each_entry() macro first will break all of those cases
> (e.g. the ones using 'list_entry_is_head()).

So I have no problems with breaking cases that we basically already
have a patch for due to  your automated tool. There were certainly
more than a handful, but it didn't look _too_ bad to just make the
rule be "don't use the iterator after the loop".

Of course, that's just based on that patch of yours. Maybe there are a
ton of other cases that your patch didn't change, because they didn't
match your trigger case, so I may just be overly optimistic here.

But basically to _me_, the important part is that the end result is
maintainable longer-term. I'm more than happy to have a one-time patch
to fix a lot of dubious cases if we can then have clean rules going
forward.

> I assumed it is better to fix those cases first and then have a simple
> coccinelle script changing the macro + moving the iterator into the scope
> of the macro.

So that had been another plan of mine, until I actually looked at
changing the macro. In the one case I looked at, it was ugly beyond
belief.

It turns out that just syntactically, it's really nice to give the
type of the iterator from outside the way we do now. Yeah, it may be a
bit odd, and maybe it's partly because I'm so used to the
"list_for_each_list_entry()" syntax, but moving the type into the loop
construct really made it nasty - either one very complex line, or
having to split it over two lines which was even worse.

Maybe the place I looked at just happened to have a long typename, but
it's basically always going to be a struct, so it's never a _simple_
type. And it just looked very odd adn unnatural to have the type as
one of the "arguments" to that list_for_each_entry() macro.

So yes, initially my idea had been to just move the iterator entirely
inside the macro. But specifying the type got so ugly that I think
that

typeof (pos) pos

trick inside the macro really ends up giving us the best of all worlds:

 (a) let's us keep the existing syntax and code for all the nice cases
that did everything inside the loop anyway

 (b) gives us a nice warning for any normal use-after-loop case
(unless you explicitly initialized it like that
sgx_mmu_notifier_release() function did for no good reason

 (c) also guarantees that even if you don't get a warning,
non-converted (or newly written) bad code won't actually _work_

so you end up getting the new rules without any ambiguity or mistaken

> With this you are no longer able to set the 'outer' pos within the list
> iterator loop body or am I missing something?

Correct. Any assignment inside the loop will be entirely just to the
local loop case. So any "break;" out of the loop will have to set
another variable - like your updated patch did.

> I fail to see how this will make most of the changes in this
> patch obsolete (if that was the intention).

I hope my explanation above clarifies my thinking: I do not dislike
your patch, and in fact your patch is indeed required to make the new
semantics work.

What I disliked was always the maintainability of your patch - making
the rules be something that isn't actually visible in the source code,
and letting the old semantics still work as well as they ever did, and
having to basically run some verification pass to find bad users.

(I also disliked your original patch that mixed up the "CPU
speculation type safety" with the actual non-speculative problems, but
that was another issue).

Linus


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Linus Torvalds
On Mon, Feb 28, 2022 at 3:26 PM Matthew Wilcox  wrote:
>
> #define ___PASTE(a, b)  a##b
> #define __PASTE(a, b) ___PASTE(a, b)
> #define _min(a, b, u) ({ \

Yeah, except that's ugly beyond belief, plus it's literally not what
we do in the kernel.

Really. The "-Wshadow doesn't work on the kernel" is not some new
issue, because you have to do completely insane things to the source
code to enable it.

Just compare your uglier-than-sin version to my straightforward one.
One does the usual and obvious "use a private variable to avoid the
classic multi-use of a macro argument". And the other one is an
abomination.

  Linus


RE: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread David Laight
From: Matthew Wilcox
> Sent: 28 February 2022 20:16
> 
> On Mon, Feb 28, 2022 at 12:10:24PM -0800, Linus Torvalds wrote:
> > We can do
> >
> > typeof(pos) pos
> >
> > in the 'for ()' loop, and never use __iter at all.
> >
> > That means that inside the for-loop, we use a _different_ 'pos' than 
> > outside.
> 
> Then we can never use -Wshadow ;-(  I'd love to be able to turn it on;
> it catches real bugs.
> 
> > +#define list_for_each_entry(pos, head, member) 
> > \
> > +   for (typeof(pos) pos = list_first_entry(head, typeof(*pos), member);
> > \
> > +!list_entry_is_head(pos, head, member);\
> >  pos = list_next_entry(pos, member))

Actually can't you use 'pos' to temporarily hold the address of 'member'.
Something like:
for (pos = (void *)head; \
pos ? ((pos = (void *)pos - offsetof(member)), 1) : 0; \
pos = (void *)pos->next)
So that 'pos' is NULL if the loop terminates.
No pointers outside structures are generated.
Probably need to kill list_entry_is_head() - or it just checks for NULL.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Segher Boessenkool
On Mon, Feb 28, 2022 at 05:28:58PM -0500, James Bottomley wrote:
> On Mon, 2022-02-28 at 23:59 +0200, Mike Rapoport wrote:
> > 
> > On February 28, 2022 10:42:53 PM GMT+02:00, James Bottomley <
> > james.bottom...@hansenpartnership.com> wrote:
> > > On Mon, 2022-02-28 at 21:07 +0100, Christian König wrote:
> [...]
> > > > > I do wish we could actually poison the 'pos' value after the
> > > > > loop somehow - but clearly the "might be uninitialized" I was
> > > > > hoping for isn't the way to do it.
> > > > > 
> > > > > Anybody have any ideas?
> > > > 
> > > > I think we should look at the use cases why code is touching
> > > > (pos) after the loop.
> > > > 
> > > > Just from skimming over the patches to change this and experience
> > > > with the drivers/subsystems I help to maintain I think the
> > > > primary pattern looks something like this:
> > > > 
> > > > list_for_each_entry(entry, head, member) {
> > > >  if (some_condition_checking(entry))
> > > >  break;
> > > > }
> > > > do_something_with(entry);
> > > 
> > > Actually, we usually have a check to see if the loop found
> > > anything, but in that case it should something like
> > > 
> > > if (list_entry_is_head(entry, head, member)) {
> > >return with error;
> > > }
> > > do_somethin_with(entry);
> > > 
> > > Suffice?  The list_entry_is_head() macro is designed to cope with
> > > the bogus entry on head problem.
> > 
> > Won't suffice because the end goal of this work is to limit scope of
> > entry only to loop. Hence the need for additional variable.
> 
> Well, yes, but my objection is more to the size of churn than the
> desire to do loop local.  I'm not even sure loop local is possible,
> because it's always annoyed me that for (int i = 0; ...  in C++ defines
> i in the outer scope not the loop scope, which is why I never use it.

In C its scope is the rest of the declaration and the entire loop, not
anything after it.  This was the same in C++98 already, btw (but in
pre-standard versions of C++ things were like you remember, yes, and it
was painful).


Segher


RE: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread David Laight
From: Linus Torvalds
> Sent: 28 February 2022 19:56
> On Mon, Feb 28, 2022 at 4:19 AM Christian König
>  wrote:
> >
> > I don't think that using the extra variable makes the code in any way
> > more reliable or easier to read.
> 
> So I think the next step is to do the attached patch (which requires
> that "-std=gnu11" that was discussed in the original thread).
> 
> That will guarantee that the 'pos' parameter of list_for_each_entry()
> is only updated INSIDE the for_each_list_entry() loop, and can never
> point to the (wrongly typed) head entry.
> 
> And I would actually hope that it should actually cause compiler
> warnings about possibly uninitialized variables if people then use the
> 'pos' pointer outside the loop. Except
> 
>  (a) that code in sgx/encl.c currently initializes 'tmp' to NULL for
> inexplicable reasons - possibly because it already expected this
> behavior
> 
>  (b) when I remove that NULL initializer, I still don't get a warning,
> because we've disabled -Wno-maybe-uninitialized since it results in so
> many false positives.
> 
> Oh well.
> 
> Anyway, give this patch a look, and at least if it's expanded to do
> "(pos) = NULL" in the entry statement for the for-loop, it will avoid
> the HEAD type confusion that Jakob is working on. And I think in a
> cleaner way than the horrid games he plays.
> 
> (But it won't avoid possible CPU speculation of such type confusion.
> That, in my opinion, is a completely different issue)
> 
> I do wish we could actually poison the 'pos' value after the loop
> somehow - but clearly the "might be uninitialized" I was hoping for
> isn't the way to do it.
> 
> Anybody have any ideas?
> 
> Linus
diff --git a/include/linux/list.h b/include/linux/list.h
index dd6c2041d09c..bab995596aaa 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -634,10 +634,10 @@ static inline void list_splice_tail_init(struct list_head 
*list,
  * @head:  the head for your list.
  * @member:the name of the list_head within the struct.
  */
-#define list_for_each_entry(pos, head, member) \
-   for (pos = list_first_entry(head, typeof(*pos), member);\
-!list_entry_is_head(pos, head, member);\
-pos = list_next_entry(pos, member))
+#define list_for_each_entry(pos, head, member) 
\
+   for (typeof(pos) __iter = list_first_entry(head, typeof(*pos), member); 
\
+!list_entry_is_head(__iter, head, member) && (((pos)=__iter),1);   
\
+__iter = list_next_entry(__iter, member))
 
 /**
  * list_for_each_entry_reverse - iterate backwards over list of given type.

I think you actually want:
!list_entry_is_head(__iter, head, member) ? (((pos)=__iter),1) : 
(((pos) = NULL),0);

Which can be done in the original by:
!list_entry_is_head(pos, head, member) ? 1 : (((pos) = NULL), 0);

Although it would be safer to have (without looking up the actual name):
for (item *__item = head; \
__item ? (((pos) = list_item(__item, member)), 1) : (((pos) = 
NULL), 0);
__item = (pos)->member)

The local does need 'fixing' to avoid shadowing for nested loops.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Linus Torvalds
On Mon, Feb 28, 2022 at 4:38 PM Segher Boessenkool
 wrote:
>
> In C its scope is the rest of the declaration and the entire loop, not
> anything after it.  This was the same in C++98 already, btw (but in
> pre-standard versions of C++ things were like you remember, yes, and it
> was painful).

Yeah, the original C++ model was just unadulterated garbage, with no
excuse for it, and the scope was not the loop, but the block the loop
existed in.

That would never have been acceptable for the kernel - it's basically
just an even uglier version of "put variable declarations in the
middle of code" (and we use "-Wdeclaration-after-statement" to
disallow that for kernel code, although apparently some of our user
space tooling code doesn't enforce or follow that rule).

The actual C99 version is the sane one which actually makes it easier
and clearer to have loop iterators that are clearly just in loop
scope.

That's a good idea in general, and I have wanted to start using that
in the kernel even aside from some of the loop construct macros.
Because putting variables in natural minimal scope is a GoodThing(tm).

Of course, we shouldn't go crazy with it. Even after we do that
-std=gnu11 thing, we'll have backports to worry about. And it's not
clear that we necessarily want to backport that gnu11 thing - since
people who run old stable kernels also may be still using those really
old compilers...

Linus


[PATCH] drm/amdgpu: Move common initialization operations of each ras block to one function

2022-03-01 Thread yipechai
Define amdgpu_ras_sw_init function to initialize all ras blocks.

Signed-off-by: yipechai 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c|   2 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 143 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h|   1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |  21 ---
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |  16 ---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  28 
 drivers/gpu/drm/amd/amdgpu/mca_v3_0.c  |   6 -
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  17 ---
 9 files changed, 148 insertions(+), 92 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6113ddc765a7..72550e9f6058 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2402,6 +2402,12 @@ static int amdgpu_device_ip_init(struct amdgpu_device 
*adev)
}
}
 
+   r = amdgpu_ras_sw_init(adev);
+   if (r) {
+   DRM_ERROR("amdgpu_ras_early_init failed (%d).\n", r);
+   goto init_failed;
+   }
+
if (amdgpu_sriov_vf(adev))
amdgpu_virt_init_data_exchange(adev);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ab75e189bc0b..544241f357b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -440,8 +440,6 @@ int amdgpu_gmc_ras_early_init(struct amdgpu_device *adev)
 {
if (!adev->gmc.xgmi.connected_to_cpu) {
adev->gmc.xgmi.ras = &xgmi_ras;
-   amdgpu_ras_register_ras_block(adev, 
&adev->gmc.xgmi.ras->ras_block);
-   adev->gmc.xgmi.ras_if = &adev->gmc.xgmi.ras->ras_block.ras_comm;
}
 
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d3875618ebf5..89075ab9e82e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2299,8 +2299,6 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
case CHIP_ALDEBARAN:
if (!adev->gmc.xgmi.connected_to_cpu) {
adev->nbio.ras = &nbio_v7_4_ras;
-   amdgpu_ras_register_ras_block(adev, 
&adev->nbio.ras->ras_block);
-   adev->nbio.ras_if = &adev->nbio.ras->ras_block.ras_comm;
}
break;
default:
@@ -2533,6 +2531,147 @@ void amdgpu_ras_suspend(struct amdgpu_device *adev)
amdgpu_ras_disable_all_features(adev, 1);
 }
 
+int amdgpu_ras_sw_init(struct amdgpu_device *adev)
+{
+   int err = 0;
+
+   if (!amdgpu_ras_asic_supported(adev))
+   return 0;
+
+   if (adev->nbio.ras) {
+   err = amdgpu_ras_register_ras_block(adev, 
&adev->nbio.ras->ras_block);
+   if (err) {
+   dev_err(adev->dev, "Failed to register nbio ras 
block!\n");
+   return err;
+   }
+   adev->nbio.ras_if = &adev->nbio.ras->ras_block.ras_comm;
+   }
+
+   if (adev->gmc.xgmi.ras) {
+   err = amdgpu_ras_register_ras_block(adev, 
&adev->gmc.xgmi.ras->ras_block);
+   if (err) {
+   dev_err(adev->dev, "Failed to register xgmi ras 
block!\n");
+   return err;
+   }
+   adev->gmc.xgmi.ras_if = &adev->gmc.xgmi.ras->ras_block.ras_comm;
+   }
+
+   if (adev->gfx.ras) {
+   err = amdgpu_ras_register_ras_block(adev, 
&adev->gfx.ras->ras_block);
+   if (err) {
+   dev_err(adev->dev, "Failed to register gfx ras 
block!\n");
+   return err;
+   }
+
+   strcpy(adev->gfx.ras->ras_block.ras_comm.name, "gfx");
+   adev->gfx.ras->ras_block.ras_comm.block = AMDGPU_RAS_BLOCK__GFX;
+   adev->gfx.ras->ras_block.ras_comm.type = 
AMDGPU_RAS_ERROR__MULTI_UNCORRECTABLE;
+   adev->gfx.ras_if = &adev->gfx.ras->ras_block.ras_comm;
+
+   /* If not define special ras_late_init function, use gfx 
default ras_late_init */
+   if (!adev->gfx.ras->ras_block.ras_late_init)
+   adev->gfx.ras->ras_block.ras_late_init = 
amdgpu_gfx_ras_late_init;
+
+   /* If not defined special ras_cb function, use default ras_cb */
+   if (!adev->gfx.ras->ras_block.ras_cb)
+   adev->gfx.ras->ras_block.ras_cb = 
amdgpu_gfx_process_ras_data_cb;
+   }
+
+   if (adev->umc.ras) {
+   err = amdgpu_ras_register_ras_block(adev, 
&adev->umc.ras->ras_block);
+   if (err) {
+   dev_err(adev->dev, "Failed to register umc ras 
block!\n");
+   return err;
+   }
+
+   strcpy(adev->umc.ras->ra

[PATCH 1/3] drm/amdgpu/nv: enable clock gating for GC 10.3.7 subblock

2022-03-01 Thread Prike Liang
This will enable the following block clock gating.

 - MC
 - SDMA
 - HDP
 - ATHUB
 - IH
 - VCN/JPEG

Signed-off-by: Prike Liang 
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index f6aeef759ee1..05487894120a 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -946,7 +946,17 @@ static int nv_common_early_init(void *handle)
AMD_CG_SUPPORT_GFX_3D_CGLS |
AMD_CG_SUPPORT_GFX_RLC_LS |
AMD_CG_SUPPORT_GFX_CP_LS |
-   AMD_CG_SUPPORT_GFX_FGCG;
+   AMD_CG_SUPPORT_GFX_FGCG |
+   AMD_CG_SUPPORT_MC_MGCG |
+   AMD_CG_SUPPORT_MC_LS |
+   AMD_CG_SUPPORT_SDMA_LS |
+   AMD_CG_SUPPORT_HDP_MGCG |
+   AMD_CG_SUPPORT_HDP_LS |
+   AMD_CG_SUPPORT_ATHUB_MGCG |
+   AMD_CG_SUPPORT_ATHUB_LS |
+   AMD_CG_SUPPORT_IH_CG |
+   AMD_CG_SUPPORT_VCN_MGCG |
+   AMD_CG_SUPPORT_JPEG_MGCG;
adev->pg_flags = AMD_PG_SUPPORT_VCN |
AMD_PG_SUPPORT_VCN_DPG |
AMD_PG_SUPPORT_JPEG;
-- 
2.17.1



[PATCH 2/3] drm/amdgpu: enable gfx power gating for GC 10.3.7

2022-03-01 Thread Prike Liang
Enable gfx power gating for GC 10.3.7.

Signed-off-by: Prike Liang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/nv.c| 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index fd7ded7799e2..e048635435a2 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -8348,6 +8348,7 @@ static void gfx_v10_cntl_power_gating(struct 
amdgpu_device *adev, bool enable)
case IP_VERSION(10, 3, 1):
case IP_VERSION(10, 3, 3):
case IP_VERSION(10, 3, 6):
+   case IP_VERSION(10, 3, 7):
data = 0x4E20 & 
RLC_PG_DELAY_3__CGCG_ACTIVE_BEFORE_CGPG_MASK_Vangogh;
WREG32_SOC15(GC, 0, mmRLC_PG_DELAY_3, data);
break;
@@ -8417,6 +8418,7 @@ static int gfx_v10_0_set_powergating_state(void *handle,
case IP_VERSION(10, 3, 1):
case IP_VERSION(10, 3, 3):
case IP_VERSION(10, 3, 6):
+   case IP_VERSION(10, 3, 7):
gfx_v10_cntl_pg(adev, enable);
amdgpu_gfx_off_ctrl(adev, enable);
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 05487894120a..e19f14c3ef59 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -959,7 +959,8 @@ static int nv_common_early_init(void *handle)
AMD_CG_SUPPORT_JPEG_MGCG;
adev->pg_flags = AMD_PG_SUPPORT_VCN |
AMD_PG_SUPPORT_VCN_DPG |
-   AMD_PG_SUPPORT_JPEG;
+   AMD_PG_SUPPORT_JPEG |
+   AMD_PG_SUPPORT_GFX_PG;
adev->external_rev_id = adev->rev_id + 0x01;
break;
default:
-- 
2.17.1



[PATCH 3/3] drm/amdgpu: enable gfxoff routine for GC 10.3.7

2022-03-01 Thread Prike Liang
Enable gfxoff routine for GC 10.3.7.

Signed-off-by: Prike Liang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +++
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index e048635435a2..92fdccc4a905 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -6557,6 +6557,7 @@ static void gfx_v10_0_kiq_setting(struct amdgpu_ring 
*ring)
case IP_VERSION(10, 3, 5):
case IP_VERSION(10, 3, 6):
case IP_VERSION(10, 3, 3):
+   case IP_VERSION(10, 3, 7):
tmp = RREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS_Sienna_Cichlid);
tmp &= 0xff00;
tmp |= (ring->me << 5) | (ring->pipe << 3) | (ring->queue);
@@ -7857,6 +7858,7 @@ static void gfx_v10_0_set_safe_mode(struct amdgpu_device 
*adev)
case IP_VERSION(10, 3, 5):
case IP_VERSION(10, 3, 6):
case IP_VERSION(10, 3, 3):
+   case IP_VERSION(10, 3, 7):
WREG32_SOC15(GC, 0, mmRLC_SAFE_MODE_Sienna_Cichlid, data);
 
/* wait for RLC_SAFE_MODE */
@@ -7894,6 +7896,7 @@ static void gfx_v10_0_unset_safe_mode(struct 
amdgpu_device *adev)
case IP_VERSION(10, 3, 5):
case IP_VERSION(10, 3, 6):
case IP_VERSION(10, 3, 3):
+   case IP_VERSION(10, 3, 7):
WREG32_SOC15(GC, 0, mmRLC_SAFE_MODE_Sienna_Cichlid, data);
break;
default:
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 261a3749c089..05783b9b4b9a 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -751,6 +751,7 @@ int smu_v13_0_gfx_off_control(struct smu_context *smu, bool 
enable)
switch (adev->ip_versions[MP1_HWIP][0]) {
case IP_VERSION(13, 0, 1):
case IP_VERSION(13, 0, 3):
+   case IP_VERSION(13, 0, 8):
if (!(adev->pm.pp_feature & PP_GFXOFF_MASK))
return 0;
if (enable)
-- 
2.17.1



[PATCH 1/2] drm/amdgpu: move amdgpu_gmc_noretry_set after ip_versions populated

2022-03-01 Thread Yifan Zhang
otherwise adev->ip_versions is still empty when amdgpu_gmc_noretry_set
is called.

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6113ddc765a7..61a6a7920c76 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1554,7 +1554,6 @@ static int amdgpu_device_check_arguments(struct 
amdgpu_device *adev)
 
amdgpu_gmc_tmz_set(adev);
 
-   amdgpu_gmc_noretry_set(adev);
 
return 0;
 }
@@ -3641,6 +3640,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
if (r)
return r;
 
+   amdgpu_gmc_noretry_set(adev);
/* Need to get xgmi info early to decide the reset behavior*/
if (adev->gmc.xgmi.supported) {
r = adev->gfxhub.funcs->get_xgmi_info(adev);
-- 
2.25.1



[PATCH 2/2] drm/amdgpu: add noretry=1 for gc 10.3.6

2022-03-01 Thread Yifan Zhang
From: Aaron Liu 

By default, set noretry=1 for kfd exception test.

Signed-off-by: Aaron Liu 
Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ab75e189bc0b..4bbd8501c6c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -595,6 +595,13 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
gmc->noretry = amdgpu_noretry;
break;
}
+
+   if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 6)) {
+   if (amdgpu_noretry == -1)
+   gmc->noretry = 1;
+   else
+   gmc->noretry = amdgpu_noretry;
+   }
 }
 
 void amdgpu_gmc_set_vm_fault_masks(struct amdgpu_device *adev, int hub_type,
-- 
2.25.1



Re: [PATCH 2/2] drm/amdgpu: add noretry=1 for gc 10.3.6

2022-03-01 Thread Christian König

Am 01.03.22 um 14:05 schrieb Yifan Zhang:

From: Aaron Liu 

By default, set noretry=1 for kfd exception test.

Signed-off-by: Aaron Liu 
Signed-off-by: Yifan Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ab75e189bc0b..4bbd8501c6c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -595,6 +595,13 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
gmc->noretry = amdgpu_noretry;
break;
}
+
+   if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 6)) {
+   if (amdgpu_noretry == -1)
+   gmc->noretry = 1;
+   else
+   gmc->noretry = amdgpu_noretry;
+   }


You should probably move the whole switch case to version checks instead.

Christian.


  }
  
  void amdgpu_gmc_set_vm_fault_masks(struct amdgpu_device *adev, int hub_type,




[PATCH Review 1/1] drm/amdgpu: support send bad channel info to smu

2022-03-01 Thread Stanley . Yang
Message SMU bad channel information bitmap to update OOB table

Change-Id: I49a79af64d5263c28db059ecb8b8405a471431b4
Signed-off-by: Stanley.Yang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   |  7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h   |  3 ++
 .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 25 ++-
 .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h|  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c   |  5 +++
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 12 ++
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  1 +
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 10 +
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  7 +++
 .../pm/swsmu/inc/pmfw_if/aldebaran_ppsmc.h|  3 +-
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |  3 +-
 .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 43 +++
 12 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d3875618ebf5..f9104f99eb9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2068,6 +2068,7 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
mutex_init(&con->recovery_lock);
INIT_WORK(&con->recovery_work, amdgpu_ras_do_recovery);
atomic_set(&con->in_recovery, 0);
+   con->eeprom_control.bad_channel_bitmap = 0;
 
max_eeprom_records_count = amdgpu_ras_eeprom_max_record_count();
amdgpu_ras_validate_threshold(adev, max_eeprom_records_count);
@@ -2092,6 +2093,11 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
goto free;
 
amdgpu_dpm_send_hbm_bad_pages_num(adev, 
con->eeprom_control.ras_num_recs);
+
+   if (con->update_channel_flag == true) {
+   amdgpu_dpm_send_hbm_bad_channel_flag(adev, 
con->eeprom_control.bad_channel_bitmap);
+   con->update_channel_flag = false;
+   }
}
 
 #ifdef CONFIG_X86_MCE_AMD
@@ -2285,6 +2291,7 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
goto release_con;
}
 
+   con->update_channel_flag = false;
con->features = 0;
INIT_LIST_HEAD(&con->head);
/* Might need get this flag from vbios. */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
index 7cddaad90d6d..9314fde81e68 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
@@ -374,6 +374,9 @@ struct amdgpu_ras {
 
/* record umc error info queried from smu */
struct umc_ecc_info umc_ecc;
+
+   /* Indicates smu whether need update bad channel info */
+   bool update_channel_flag;
 };
 
 struct ras_fs_data {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 2b844a5aafdb..ad5d8667756d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -265,6 +265,7 @@ int amdgpu_ras_eeprom_reset_table(struct 
amdgpu_ras_eeprom_control *control)
 {
struct amdgpu_device *adev = to_amdgpu_device(control);
struct amdgpu_ras_eeprom_table_header *hdr = &control->tbl_hdr;
+   struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
u8 csum;
int res;
 
@@ -285,6 +286,10 @@ int amdgpu_ras_eeprom_reset_table(struct 
amdgpu_ras_eeprom_control *control)
 
amdgpu_dpm_send_hbm_bad_pages_num(adev, control->ras_num_recs);
 
+   control->bad_channel_bitmap = 0;
+   amdgpu_dpm_send_hbm_bad_channel_flag(adev, control->bad_channel_bitmap);
+   con->update_channel_flag = false;
+
amdgpu_ras_debugfs_set_ret_size(control);
 
mutex_unlock(&control->ras_tbl_mutex);
@@ -418,6 +423,7 @@ amdgpu_ras_eeprom_append_table(struct 
amdgpu_ras_eeprom_control *control,
   struct eeprom_table_record *record,
   const u32 num)
 {
+   struct amdgpu_ras *con = 
amdgpu_ras_get_context(to_amdgpu_device(control));
u32 a, b, i;
u8 *buf, *pp;
int res;
@@ -429,9 +435,16 @@ amdgpu_ras_eeprom_append_table(struct 
amdgpu_ras_eeprom_control *control,
/* Encode all of them in one go.
 */
pp = buf;
-   for (i = 0; i < num; i++, pp += RAS_TABLE_RECORD_SIZE)
+   for (i = 0; i < num; i++, pp += RAS_TABLE_RECORD_SIZE) {
__encode_table_record_to_buf(control, &record[i], pp);
 
+   /* update bad channel bitmap */
+   if (!(control->bad_channel_bitmap & (1 << 
record[i].mem_channel))) {
+   control->bad_channel_bitmap |= 1 << 
record[i].mem_channel;
+   con->update_channel_flag = true;
+   }
+   }
+
/* a, first record index to write into.
 * b, last record index to write into.
 * a = first index to read (fri)

回复: [PATCH] drm/amdgpu: Move common initialization operations of each ras block to one function

2022-03-01 Thread Yang, Stanley
[AMD Official Use Only]

Hi yipe,

One suggestion for this patch, please check my comment.

Regards,
Stanley
> -邮件原件-
> 发件人: amd-gfx  代表 yipechai
> 发送时间: Tuesday, March 1, 2022 5:46 PM
> 收件人: amd-gfx@lists.freedesktop.org
> 抄送: Zhou1, Tao ; Zhang, Hawking
> ; Clements, John ;
> Chai, Thomas ; Chai, Thomas
> 
> 主题: [PATCH] drm/amdgpu: Move common initialization operations of each
> ras block to one function
>
> Define amdgpu_ras_sw_init function to initialize all ras blocks.
>
> Signed-off-by: yipechai 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   6 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c|   2 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 143
> -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h|   1 +
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |  21 ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |  16 ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  28 
>  drivers/gpu/drm/amd/amdgpu/mca_v3_0.c  |   6 -
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  17 ---
>  9 files changed, 148 insertions(+), 92 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 6113ddc765a7..72550e9f6058 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2402,6 +2402,12 @@ static int amdgpu_device_ip_init(struct
> amdgpu_device *adev)
>   }
>   }
>
> + r = amdgpu_ras_sw_init(adev);
> + if (r) {
> + DRM_ERROR("amdgpu_ras_early_init failed (%d).\n", r);
> + goto init_failed;
> + }
[Yang, Stanley] : This is ras blocks early init, I  think it's more reasonable 
to move amdgpu_ras_sw_init before amdgpu_ras_init function.

> +
>   if (amdgpu_sriov_vf(adev))
>   amdgpu_virt_init_data_exchange(adev);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index ab75e189bc0b..544241f357b2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -440,8 +440,6 @@ int amdgpu_gmc_ras_early_init(struct
> amdgpu_device *adev)  {
>   if (!adev->gmc.xgmi.connected_to_cpu) {
>   adev->gmc.xgmi.ras = &xgmi_ras;
> - amdgpu_ras_register_ras_block(adev, &adev-
> >gmc.xgmi.ras->ras_block);
> - adev->gmc.xgmi.ras_if = &adev->gmc.xgmi.ras-
> >ras_block.ras_comm;
>   }
>
>   return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index d3875618ebf5..89075ab9e82e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2299,8 +2299,6 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
>   case CHIP_ALDEBARAN:
>   if (!adev->gmc.xgmi.connected_to_cpu) {
>   adev->nbio.ras = &nbio_v7_4_ras;
> - amdgpu_ras_register_ras_block(adev, &adev-
> >nbio.ras->ras_block);
> - adev->nbio.ras_if = &adev->nbio.ras-
> >ras_block.ras_comm;
>   }
>   break;
>   default:
> @@ -2533,6 +2531,147 @@ void amdgpu_ras_suspend(struct
> amdgpu_device *adev)
>   amdgpu_ras_disable_all_features(adev, 1);  }
>
> +int amdgpu_ras_sw_init(struct amdgpu_device *adev) {
> + int err = 0;
> +
> + if (!amdgpu_ras_asic_supported(adev))
> + return 0;
> +
> + if (adev->nbio.ras) {
> + err = amdgpu_ras_register_ras_block(adev, &adev-
> >nbio.ras->ras_block);
> + if (err) {
> + dev_err(adev->dev, "Failed to register nbio ras
> block!\n");
> + return err;
> + }
> + adev->nbio.ras_if = &adev->nbio.ras->ras_block.ras_comm;
> + }
> +
> + if (adev->gmc.xgmi.ras) {
> + err = amdgpu_ras_register_ras_block(adev, &adev-
> >gmc.xgmi.ras->ras_block);
> + if (err) {
> + dev_err(adev->dev, "Failed to register xgmi ras
> block!\n");
> + return err;
> + }
> + adev->gmc.xgmi.ras_if = &adev->gmc.xgmi.ras-
> >ras_block.ras_comm;
> + }
> +
> + if (adev->gfx.ras) {
> + err = amdgpu_ras_register_ras_block(adev, &adev->gfx.ras-
> >ras_block);
> + if (err) {
> + dev_err(adev->dev, "Failed to register gfx ras
> block!\n");
> + return err;
> + }
> +
> + strcpy(adev->gfx.ras->ras_block.ras_comm.name, "gfx");
> + adev->gfx.ras->ras_block.ras_comm.block =
> AMDGPU_RAS_BLOCK__GFX;
> + adev->gfx.ras->ras_block.ras_comm.type =
> AMDGPU_RAS_ERROR__MULTI_UNCORRECTABLE;
> + adev->gfx.ras_if = &adev->gfx.ras->ras_block.ras_comm;
> +
> + /* If not define special ras_late_init function, use gfx default
> ras_late_init */
> + if (!adev->gfx.ras->ras_bloc

Re: [PATCH] drm/amdgpu: enable gfx clock gating control for GC 10.3.7

2022-03-01 Thread Deucher, Alexander
[Public]

Reviewed-by: Alex Deucher 

From: Liang, Prike 
Sent: Monday, February 28, 2022 10:01 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Huang, Ray 
; Liang, Prike 
Subject: [PATCH] drm/amdgpu: enable gfx clock gating control for GC 10.3.7

Enable gfx cg gate/ungate control for GC 10.3.7.

Signed-off-by: Prike Liang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 90158289cd30..fd7ded7799e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -8445,6 +8445,7 @@ static int gfx_v10_0_set_clockgating_state(void *handle,
 case IP_VERSION(10, 3, 5):
 case IP_VERSION(10, 3, 6):
 case IP_VERSION(10, 3, 3):
+   case IP_VERSION(10, 3, 7):
 gfx_v10_0_update_gfx_clock_gating(adev,
  state == AMD_CG_STATE_GATE);
 break;
--
2.17.1



Re: [PATCH 3/3] drm/amdgpu: enable gfxoff routine for GC 10.3.7

2022-03-01 Thread Alex Deucher
Series is:
Reviewed-by: Alex Deucher 

On Tue, Mar 1, 2022 at 7:30 AM Prike Liang  wrote:
>
> Enable gfxoff routine for GC 10.3.7.
>
> Signed-off-by: Prike Liang 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +++
>  drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 1 +
>  2 files changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index e048635435a2..92fdccc4a905 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -6557,6 +6557,7 @@ static void gfx_v10_0_kiq_setting(struct amdgpu_ring 
> *ring)
> case IP_VERSION(10, 3, 5):
> case IP_VERSION(10, 3, 6):
> case IP_VERSION(10, 3, 3):
> +   case IP_VERSION(10, 3, 7):
> tmp = RREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS_Sienna_Cichlid);
> tmp &= 0xff00;
> tmp |= (ring->me << 5) | (ring->pipe << 3) | (ring->queue);
> @@ -7857,6 +7858,7 @@ static void gfx_v10_0_set_safe_mode(struct 
> amdgpu_device *adev)
> case IP_VERSION(10, 3, 5):
> case IP_VERSION(10, 3, 6):
> case IP_VERSION(10, 3, 3):
> +   case IP_VERSION(10, 3, 7):
> WREG32_SOC15(GC, 0, mmRLC_SAFE_MODE_Sienna_Cichlid, data);
>
> /* wait for RLC_SAFE_MODE */
> @@ -7894,6 +7896,7 @@ static void gfx_v10_0_unset_safe_mode(struct 
> amdgpu_device *adev)
> case IP_VERSION(10, 3, 5):
> case IP_VERSION(10, 3, 6):
> case IP_VERSION(10, 3, 3):
> +   case IP_VERSION(10, 3, 7):
> WREG32_SOC15(GC, 0, mmRLC_SAFE_MODE_Sienna_Cichlid, data);
> break;
> default:
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
> index 261a3749c089..05783b9b4b9a 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
> @@ -751,6 +751,7 @@ int smu_v13_0_gfx_off_control(struct smu_context *smu, 
> bool enable)
> switch (adev->ip_versions[MP1_HWIP][0]) {
> case IP_VERSION(13, 0, 1):
> case IP_VERSION(13, 0, 3):
> +   case IP_VERSION(13, 0, 8):
> if (!(adev->pm.pp_feature & PP_GFXOFF_MASK))
> return 0;
> if (enable)
> --
> 2.17.1
>


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Jakob Koschel



> On 1. Mar 2022, at 01:41, Linus Torvalds  
> wrote:
> 
> On Mon, Feb 28, 2022 at 1:47 PM Jakob Koschel  wrote:
>> 
>> The goal of this is to get compiler warnings right? This would indeed be 
>> great.
> 
> Yes, so I don't mind having a one-time patch that has been gathered
> using some automated checker tool, but I don't think that works from a
> long-term maintenance perspective.
> 
> So if we have the basic rule being "don't use the loop iterator after
> the loop has finished, because it can cause all kinds of subtle
> issues", then in _addition_ to fixing the existing code paths that
> have this issue, I really would want to (a) get a compiler warning for
> future cases and (b) make it not actually _work_ for future cases.
> 
> Because otherwise it will just happen again.
> 
>> Changing the list_for_each_entry() macro first will break all of those cases
>> (e.g. the ones using 'list_entry_is_head()).
> 
> So I have no problems with breaking cases that we basically already
> have a patch for due to  your automated tool. There were certainly
> more than a handful, but it didn't look _too_ bad to just make the
> rule be "don't use the iterator after the loop".
> 
> Of course, that's just based on that patch of yours. Maybe there are a
> ton of other cases that your patch didn't change, because they didn't
> match your trigger case, so I may just be overly optimistic here.

Based on the coccinelle script there are ~480 cases that need fixing
in total. I'll now finish all of them and then split them by
submodules as Greg suggested and repost a patch set per submodule.
Sounds good?

> 
> But basically to _me_, the important part is that the end result is
> maintainable longer-term. I'm more than happy to have a one-time patch
> to fix a lot of dubious cases if we can then have clean rules going
> forward.
> 
>> I assumed it is better to fix those cases first and then have a simple
>> coccinelle script changing the macro + moving the iterator into the scope
>> of the macro.
> 
> So that had been another plan of mine, until I actually looked at
> changing the macro. In the one case I looked at, it was ugly beyond
> belief.
> 
> It turns out that just syntactically, it's really nice to give the
> type of the iterator from outside the way we do now. Yeah, it may be a
> bit odd, and maybe it's partly because I'm so used to the
> "list_for_each_list_entry()" syntax, but moving the type into the loop
> construct really made it nasty - either one very complex line, or
> having to split it over two lines which was even worse.
> 
> Maybe the place I looked at just happened to have a long typename, but
> it's basically always going to be a struct, so it's never a _simple_
> type. And it just looked very odd adn unnatural to have the type as
> one of the "arguments" to that list_for_each_entry() macro.
> 
> So yes, initially my idea had been to just move the iterator entirely
> inside the macro. But specifying the type got so ugly that I think
> that
> 
>typeof (pos) pos
> 
> trick inside the macro really ends up giving us the best of all worlds:
> 
> (a) let's us keep the existing syntax and code for all the nice cases
> that did everything inside the loop anyway
> 
> (b) gives us a nice warning for any normal use-after-loop case
> (unless you explicitly initialized it like that
> sgx_mmu_notifier_release() function did for no good reason
> 
> (c) also guarantees that even if you don't get a warning,
> non-converted (or newly written) bad code won't actually _work_
> 
> so you end up getting the new rules without any ambiguity or mistaken
> 
>> With this you are no longer able to set the 'outer' pos within the list
>> iterator loop body or am I missing something?
> 
> Correct. Any assignment inside the loop will be entirely just to the
> local loop case. So any "break;" out of the loop will have to set
> another variable - like your updated patch did.
> 
>> I fail to see how this will make most of the changes in this
>> patch obsolete (if that was the intention).
> 
> I hope my explanation above clarifies my thinking: I do not dislike
> your patch, and in fact your patch is indeed required to make the new
> semantics work.

ok it's all clear now, thanks for clarifying.
I've defined all the 'tmp' iterator variables uninitialized so applying
your patch on top of that later will just give the nice compiler warning 
if they are used past the loop body.

> 
> What I disliked was always the maintainability of your patch - making
> the rules be something that isn't actually visible in the source code,
> and letting the old semantics still work as well as they ever did, and
> having to basically run some verification pass to find bad users.

Since this patch is not a complete list of cases that need fixing (30%)
I haven't included the actual change of moving the iterator variable
into the loop and thought that would be a second step coming after this
is merged.

With these changes alone, yes you still rely on manu

撤回: [PATCH] drm/amdgpu: Move common initialization operations of each ras block to one function

2022-03-01 Thread Yang, Stanley
Yang, Stanley 将撤回邮件“[PATCH] drm/amdgpu: Move common initialization operations 
of each ras block to one function”。

[PATCH v2 1/3] drm/amdgpu: move amdgpu_gmc_noretry_set after ip_versions populated

2022-03-01 Thread Yifan Zhang
otherwise adev->ip_versions is still empty when amdgpu_gmc_noretry_set
is called.

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6113ddc765a7..61a6a7920c76 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1554,7 +1554,6 @@ static int amdgpu_device_check_arguments(struct 
amdgpu_device *adev)
 
amdgpu_gmc_tmz_set(adev);
 
-   amdgpu_gmc_noretry_set(adev);
 
return 0;
 }
@@ -3641,6 +3640,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
if (r)
return r;
 
+   amdgpu_gmc_noretry_set(adev);
/* Need to get xgmi info early to decide the reset behavior*/
if (adev->gmc.xgmi.supported) {
r = adev->gfxhub.funcs->get_xgmi_info(adev);
-- 
2.25.1



[PATCH v2 2/3] drm/amdgpu: convert code name to ip version for noretry set

2022-03-01 Thread Yifan Zhang
Use IP version rather than codename for noretry set.

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ab75e189bc0b..fbc22b7b6315 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -559,14 +559,14 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
 {
struct amdgpu_gmc *gmc = &adev->gmc;
 
-   switch (adev->asic_type) {
-   case CHIP_VEGA10:
-   case CHIP_VEGA20:
-   case CHIP_ARCTURUS:
-   case CHIP_ALDEBARAN:
-   case CHIP_BEIGE_GOBY:
-   case CHIP_YELLOW_CARP:
-   case CHIP_RENOIR:
+   switch (adev->ip_versions[GC_HWIP][0]) {
+   case IP_VERSION(9, 0, 1):
+   case IP_VERSION(9, 4, 0):
+   case IP_VERSION(9, 4, 1):
+   case IP_VERSION(9, 4, 2):
+   case IP_VERSION(10, 3, 5):
+   case IP_VERSION(10, 3, 3):
+   case IP_VERSION(9, 3, 0):
/*
 * noretry = 0 will cause kfd page fault tests fail
 * for some ASICs, so set default to 1 for these ASICs.
@@ -576,7 +576,6 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
else
gmc->noretry = amdgpu_noretry;
break;
-   case CHIP_RAVEN:
default:
/* Raven currently has issues with noretry
 * regardless of what we decide for other
-- 
2.25.1



[PATCH v2 3/3] drm/amdgpu: set noretry=1 for gc 10.3.6

2022-03-01 Thread Yifan Zhang
this patch to set noretry=1 for gc 10.3.6.

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index fbc22b7b6315..7c2a9555b7cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -567,6 +567,7 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
case IP_VERSION(10, 3, 5):
case IP_VERSION(10, 3, 3):
case IP_VERSION(9, 3, 0):
+   case IP_VERSION(10, 3, 6):
/*
 * noretry = 0 will cause kfd page fault tests fail
 * for some ASICs, so set default to 1 for these ASICs.
-- 
2.25.1



Re: [PATCH v2 1/3] drm/amdgpu: move amdgpu_gmc_noretry_set after ip_versions populated

2022-03-01 Thread Alex Deucher
Series is:
Reviewed-by: Alex Deucher 

On Tue, Mar 1, 2022 at 10:02 AM Yifan Zhang  wrote:
>
> otherwise adev->ip_versions is still empty when amdgpu_gmc_noretry_set
> is called.
>
> Signed-off-by: Yifan Zhang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 6113ddc765a7..61a6a7920c76 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1554,7 +1554,6 @@ static int amdgpu_device_check_arguments(struct 
> amdgpu_device *adev)
>
> amdgpu_gmc_tmz_set(adev);
>
> -   amdgpu_gmc_noretry_set(adev);
>
> return 0;
>  }
> @@ -3641,6 +3640,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> if (r)
> return r;
>
> +   amdgpu_gmc_noretry_set(adev);
> /* Need to get xgmi info early to decide the reset behavior*/
> if (adev->gmc.xgmi.supported) {
> r = adev->gfxhub.funcs->get_xgmi_info(adev);
> --
> 2.25.1
>


Re: [PATCH v2 2/3] drm/amdgpu: convert code name to ip version for noretry set

2022-03-01 Thread Paul Menzel

Dear Yifan,


Thank you for your patch.

Am 01.03.22 um 16:01 schrieb Yifan Zhang:

Use IP version rather than codename for noretry set.


Why?


Signed-off-by: Yifan Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 17 -
  1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ab75e189bc0b..fbc22b7b6315 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -559,14 +559,14 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
  {
struct amdgpu_gmc *gmc = &adev->gmc;
  
-	switch (adev->asic_type) {

-   case CHIP_VEGA10:
-   case CHIP_VEGA20:
-   case CHIP_ARCTURUS:
-   case CHIP_ALDEBARAN:
-   case CHIP_BEIGE_GOBY:
-   case CHIP_YELLOW_CARP:
-   case CHIP_RENOIR:
+   switch (adev->ip_versions[GC_HWIP][0]) {
+   case IP_VERSION(9, 0, 1):
+   case IP_VERSION(9, 4, 0):
+   case IP_VERSION(9, 4, 1):
+   case IP_VERSION(9, 4, 2):
+   case IP_VERSION(10, 3, 5):
+   case IP_VERSION(10, 3, 3):
+   case IP_VERSION(9, 3, 0):


I think, sorting these entries might be useful.

Should the names be added as comments for those not having them memorized?


/*
 * noretry = 0 will cause kfd page fault tests fail
 * for some ASICs, so set default to 1 for these ASICs.
@@ -576,7 +576,6 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
else
gmc->noretry = amdgpu_noretry;
break;
-   case CHIP_RAVEN:


Why remove this?


default:
/* Raven currently has issues with noretry
 * regardless of what we decide for other



Kind regards,

Paul


[PATCH 3/3] drm/amdgpu/sdma5: drop unused cyan skillfish firmware

2022-03-01 Thread Alex Deucher
Leftover from bring up.  Not used anymore.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 81e033549dda..53a8df4b030e 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -51,9 +51,6 @@ MODULE_FIRMWARE("amdgpu/navi14_sdma1.bin");
 MODULE_FIRMWARE("amdgpu/navi12_sdma.bin");
 MODULE_FIRMWARE("amdgpu/navi12_sdma1.bin");
 
-MODULE_FIRMWARE("amdgpu/cyan_skillfish_sdma.bin");
-MODULE_FIRMWARE("amdgpu/cyan_skillfish_sdma1.bin");
-
 MODULE_FIRMWARE("amdgpu/cyan_skillfish2_sdma.bin");
 MODULE_FIRMWARE("amdgpu/cyan_skillfish2_sdma1.bin");
 
@@ -264,10 +261,7 @@ static int sdma_v5_0_init_microcode(struct amdgpu_device 
*adev)
chip_name = "navi12";
break;
case IP_VERSION(5, 0, 1):
-   if (adev->apu_flags & AMD_APU_IS_CYAN_SKILLFISH2)
-   chip_name = "cyan_skillfish2";
-   else
-   chip_name = "cyan_skillfish";
+   chip_name = "cyan_skillfish2";
break;
default:
BUG();
-- 
2.35.1



[PATCH 2/3] drm/amdgpu/gfx10: drop unused cyan skillfish firmware

2022-03-01 Thread Alex Deucher
Leftover from bring up.  Not used anymore.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 90158289cd30..0f350a5c4bf2 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -250,13 +250,6 @@ MODULE_FIRMWARE("amdgpu/yellow_carp_mec.bin");
 MODULE_FIRMWARE("amdgpu/yellow_carp_mec2.bin");
 MODULE_FIRMWARE("amdgpu/yellow_carp_rlc.bin");
 
-MODULE_FIRMWARE("amdgpu/cyan_skillfish_ce.bin");
-MODULE_FIRMWARE("amdgpu/cyan_skillfish_pfp.bin");
-MODULE_FIRMWARE("amdgpu/cyan_skillfish_me.bin");
-MODULE_FIRMWARE("amdgpu/cyan_skillfish_mec.bin");
-MODULE_FIRMWARE("amdgpu/cyan_skillfish_mec2.bin");
-MODULE_FIRMWARE("amdgpu/cyan_skillfish_rlc.bin");
-
 MODULE_FIRMWARE("amdgpu/cyan_skillfish2_ce.bin");
 MODULE_FIRMWARE("amdgpu/cyan_skillfish2_pfp.bin");
 MODULE_FIRMWARE("amdgpu/cyan_skillfish2_me.bin");
@@ -4043,10 +4036,7 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device 
*adev)
break;
case IP_VERSION(10, 1, 3):
case IP_VERSION(10, 1, 4):
-   if (adev->apu_flags & AMD_APU_IS_CYAN_SKILLFISH2)
-   chip_name = "cyan_skillfish2";
-   else
-   chip_name = "cyan_skillfish";
+   chip_name = "cyan_skillfish2";
break;
case IP_VERSION(10, 3, 7):
chip_name = "gc_10_3_7";
-- 
2.35.1



[PATCH 1/3] drm/amdgpu: remove unused gpu_info firmwares

2022-03-01 Thread Alex Deucher
These were leftover from bring up and are no longer
necessary.  The information is available via
the IP discovery table.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 23 --
 1 file changed, 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6113ddc765a7..2f732a1758d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -80,12 +80,7 @@ MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/picasso_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/raven2_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/arcturus_gpu_info.bin");
-MODULE_FIRMWARE("amdgpu/renoir_gpu_info.bin");
-MODULE_FIRMWARE("amdgpu/navi10_gpu_info.bin");
-MODULE_FIRMWARE("amdgpu/navi14_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/navi12_gpu_info.bin");
-MODULE_FIRMWARE("amdgpu/vangogh_gpu_info.bin");
-MODULE_FIRMWARE("amdgpu/yellow_carp_gpu_info.bin");
 
 #define AMDGPU_RESUME_MS   2000
 #define AMDGPU_MAX_RETRY_LIMIT 2
@@ -1992,27 +1987,9 @@ static int amdgpu_device_parse_gpu_info_fw(struct 
amdgpu_device *adev)
case CHIP_ARCTURUS:
chip_name = "arcturus";
break;
-   case CHIP_RENOIR:
-   if (adev->apu_flags & AMD_APU_IS_RENOIR)
-   chip_name = "renoir";
-   else
-   chip_name = "green_sardine";
-   break;
-   case CHIP_NAVI10:
-   chip_name = "navi10";
-   break;
-   case CHIP_NAVI14:
-   chip_name = "navi14";
-   break;
case CHIP_NAVI12:
chip_name = "navi12";
break;
-   case CHIP_VANGOGH:
-   chip_name = "vangogh";
-   break;
-   case CHIP_YELLOW_CARP:
-   chip_name = "yellow_carp";
-   break;
}
 
snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_gpu_info.bin", chip_name);
-- 
2.35.1



Re: [PATCH] mm: split vm_normal_pages for LRU and non-LRU handling

2022-03-01 Thread Felix Kuehling



Am 2022-03-01 um 03:03 schrieb David Hildenbrand:

On 28.02.22 21:34, Alex Sierra wrote:

DEVICE_COHERENT pages introduce a subtle distinction in the way
"normal" pages can be used by various callers throughout the kernel.
They behave like normal pages for purposes of mapping in CPU page
tables, and for COW. But they do not support LRU lists, NUMA
migration or THP. Therefore we split vm_normal_page into two
functions vm_normal_any_page and vm_normal_lru_page. The latter will
only return pages that can be put on an LRU list and that support
NUMA migration and THP.

Why not s/vm_normal_any_page/vm_normal_page/ and avoid code churn?


I don't care much, personally, what names we end up with. I'll wait for 
a consensus here.






We also introduced a FOLL_LRU flag that adds the same behaviour to
follow_page and related APIs, to allow callers to specify that they
expect to put pages on an LRU list.

[...]

-#define FOLL_WRITE 0x01/* check pte is writable */
-#define FOLL_TOUCH 0x02/* mark page accessed */
-#define FOLL_GET   0x04/* do get_page on page */
-#define FOLL_DUMP  0x08/* give error on hole if it would be zero */
-#define FOLL_FORCE 0x10/* get_user_pages read/write w/o permission */
-#define FOLL_NOWAIT0x20/* if a disk transfer is needed, start the IO
-* and return without waiting upon it */
-#define FOLL_POPULATE  0x40/* fault in pages (with FOLL_MLOCK) */
-#define FOLL_NOFAULT   0x80/* do not fault in pages */
-#define FOLL_HWPOISON  0x100   /* check page is hwpoisoned */
-#define FOLL_NUMA  0x200   /* force NUMA hinting page fault */
-#define FOLL_MIGRATION 0x400   /* wait for page to replace migration entry */
-#define FOLL_TRIED 0x800   /* a retry, previous pass started an IO */
-#define FOLL_MLOCK 0x1000  /* lock present pages */
-#define FOLL_REMOTE0x2000  /* we are working on non-current tsk/mm */
-#define FOLL_COW   0x4000  /* internal GUP flag */
-#define FOLL_ANON  0x8000  /* don't do file mappings */
-#define FOLL_LONGTERM  0x1 /* mapping lifetime is indefinite: see below */
-#define FOLL_SPLIT_PMD 0x2 /* split huge pmd before returning */
-#define FOLL_PIN   0x4 /* pages must be released via unpin_user_page */
-#define FOLL_FAST_ONLY 0x8 /* gup_fast: prevent fall-back to slow gup */
+#define FOLL_WRITE 0x01 /* check pte is writable */
+#define FOLL_TOUCH 0x02 /* mark page accessed */
+#define FOLL_GET   0x04 /* do get_page on page */
+#define FOLL_DUMP  0x08 /* give error on hole if it would be zero */
+#define FOLL_FORCE 0x10 /* get_user_pages read/write w/o permission */
+#define FOLL_NOWAIT0x20 /* if a disk transfer is needed, start the IO
+ * and return without waiting upon it */
+#define FOLL_POPULATE  0x40 /* fault in pages (with FOLL_MLOCK) */
+#define FOLL_NOFAULT   0x80 /* do not fault in pages */
+#define FOLL_HWPOISON  0x100/* check page is hwpoisoned */
+#define FOLL_NUMA  0x200/* force NUMA hinting page fault */
+#define FOLL_MIGRATION 0x400/* wait for page to replace migration entry */
+#define FOLL_TRIED 0x800/* a retry, previous pass started an IO */
+#define FOLL_MLOCK 0x1000   /* lock present pages */
+#define FOLL_REMOTE0x2000   /* we are working on non-current tsk/mm */
+#define FOLL_COW   0x4000   /* internal GUP flag */
+#define FOLL_ANON  0x8000   /* don't do file mappings */
+#define FOLL_LONGTERM  0x1  /* mapping lifetime is indefinite: see below */
+#define FOLL_SPLIT_PMD 0x2  /* split huge pmd before returning */
+#define FOLL_PIN   0x4  /* pages must be released via unpin_user_page 
*/
+#define FOLL_FAST_ONLY 0x8  /* gup_fast: prevent fall-back to slow gup */
+#define FOLL_LRU   0x10 /* return only LRU (anon or page cache) */
  

Can we minimize code churn, please?


OK. I guess we could unindent the FOLL_LRU number to avoid changing all 
the comments.







if (PageReserved(page))
diff --git a/mm/migrate.c b/mm/migrate.c
index c31d04b46a5e..17d049311b78 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1614,7 +1614,7 @@ static int add_page_for_migration(struct mm_struct *mm, 
unsigned long addr,
goto out;
  
  	/* FOLL_DUMP to ignore special (like zero) pages */

-   follflags = FOLL_GET | FOLL_DUMP;
+   follflags = FOLL_GET | FOLL_DUMP | FOLL_LRU;
page = follow_page(vma, addr, follflags);

Why wouldn't we want to dump DEVICE_COHERENT pages? This looks wrong.


This function later calls isolate_lru_page, which is something you can't 
do with a device page.


Regards,
  Felix






RE: [PATCH] drm/amdgpu: Move CAP firmware loading to the beginning of PSP firmware list

2022-03-01 Thread Zhang, Bokun
[AMD Official Use Only]

Reviewed-by: Bokun Zhang 

Thanks!

-Original Message-
From: Yifan Zha  
Sent: Tuesday, March 1, 2022 2:19 AM
To: amd-gfx@lists.freedesktop.org; Zhang, Bokun 
Cc: Chen, Jingwen ; Liu, Monk ; Chen, 
Guchun ; Zha, YiFan(Even) 
Subject: [PATCH] drm/amdgpu: Move CAP firmware loading to the beginning of PSP 
firmware list

[Why]
As PSP needs to verify the signature, CAP firmware must be loaded first when 
PSP loads firmwares.
Otherwise, when DFC feature is enabled, CP firmwares would be loaded failed.

[ 1149.160480] [drm] MM table gpu addr = 0x800022f000, cpu addr = 
a62afcea.
[ 1149.209874] [drm] failed to load ucode CP_CE(0x8) [ 1149.209878] [drm] psp 
gfx command LOAD_IP_FW(0x6) failed and response status is (0x0007) [ 
1149.215914] [drm] failed to load ucode CP_PFP(0x9) [ 1149.215917] [drm] psp 
gfx command LOAD_IP_FW(0x6) failed and response status is (0x0007) [ 
1149.221941] [drm] failed to load ucode CP_ME(0xA) [ 1149.221944] [drm] psp gfx 
command LOAD_IP_FW(0x6) failed and response status is (0x0007) [ 
1149.228082] [drm] failed to load ucode CP_MEC1(0xB) [ 1149.228085] [drm] psp 
gfx command LOAD_IP_FW(0x6) failed and response status is (0x0007) [ 
1149.234209] [drm] failed to load ucode CP_MEC2(0xD) [ 1149.234212] [drm] psp 
gfx command LOAD_IP_FW(0x6) failed and response status is (0x0007) [ 
1149.242379] [drm] failed to load ucode VCN(0x1C) [ 1149.242382] [drm] psp gfx 
command LOAD_IP_FW(0x6) failed and response status is (0x0007)

[How]
Move CAP UCODE ID to the beginning of AMDGPU_UCODE_ID enum list.

Signed-off-by: Yifan Zha 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
index 428f4df184d0..40dffbac85a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
@@ -343,7 +343,8 @@ union amdgpu_firmware_header {
  * fw loading support
  */
 enum AMDGPU_UCODE_ID {
-   AMDGPU_UCODE_ID_SDMA0 = 0,
+   AMDGPU_UCODE_ID_CAP = 0,
+   AMDGPU_UCODE_ID_SDMA0,
AMDGPU_UCODE_ID_SDMA1,
AMDGPU_UCODE_ID_SDMA2,
AMDGPU_UCODE_ID_SDMA3,
@@ -378,7 +379,6 @@ enum AMDGPU_UCODE_ID {
AMDGPU_UCODE_ID_VCN0_RAM,
AMDGPU_UCODE_ID_VCN1_RAM,
AMDGPU_UCODE_ID_DMCUB,
-   AMDGPU_UCODE_ID_CAP,
AMDGPU_UCODE_ID_MAXIMUM,
 };
 
--
2.25.1


Re: [PATCH] mm: split vm_normal_pages for LRU and non-LRU handling

2022-03-01 Thread Felix Kuehling

Am 2022-03-01 um 11:22 schrieb David Hildenbrand:

if (PageReserved(page))
diff --git a/mm/migrate.c b/mm/migrate.c
index c31d04b46a5e..17d049311b78 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1614,7 +1614,7 @@ static int add_page_for_migration(struct mm_struct *mm, 
unsigned long addr,
goto out;
   
   	/* FOLL_DUMP to ignore special (like zero) pages */

-   follflags = FOLL_GET | FOLL_DUMP;
+   follflags = FOLL_GET | FOLL_DUMP | FOLL_LRU;
page = follow_page(vma, addr, follflags);

Why wouldn't we want to dump DEVICE_COHERENT pages? This looks wrong.

This function later calls isolate_lru_page, which is something you can't
do with a device page.


Then, that code might require care instead. We most certainly don't want
to have random memory holes in a dump just because some anonymous memory
was migrated to DEVICE_COHERENT.

I don't think this code is for core dumps. The call chain I see is

SYSCALL_DEFINE6(move_pages, ...) -> kernel_move_pages -> do_pages_move 
-> add_page_for_migration


As far as I can tell this is for NUMA migrations.

Regards,
  Felix




[PATCH] drm/amdgpu/discovery: add a comment about ip_discovery firmware

2022-03-01 Thread Alex Deucher
The firmware is optional and only used as a fallback when
the IP discovery table is not available (e.g., during asic
bring up).  Once the chip goes into production, the
table will always be available on the part itself.

Unfortunately, there is no way to mark a firmware as
optional.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index e4fcbb385a62..31d86d083968 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -67,6 +67,13 @@
 #include "smuio_v11_0_6.h"
 #include "smuio_v13_0.h"
 
+/* This firmware is optional, but there is no way to
+ * specify a firmware file as optional.  It is only
+ * used as a fallback when the IP discovery table
+ * is not available for some reason (e.g., during
+ * bring up).  Once the product is in production
+ * the IP discovery table is shipped on the part itself.
+ */
 #define FIRMWARE_IP_DISCOVERY "amdgpu/ip_discovery.bin"
 MODULE_FIRMWARE(FIRMWARE_IP_DISCOVERY);
 
-- 
2.35.1



[PATCH 00/12] use dynamic-debug under drm.debug api

2022-03-01 Thread Jim Cromie
hi Jason, Greg, Daniel, DRM-everyone

drm.debug api provides ~23 macros to issue 10 categories of debug
messages, each enabled by a bit in /sys/module/drm/parameters/debug.
drm_debug_enabled(category) tests these bits at runtime; while cheap
individually, the costs accumulate.

Daniel,
I think this revision addresses most of your early review, a lot has
changed since.  Heres the link:
https://patchwork.freedesktop.org/patch/443989/

For CONFIG_DRM_USE_DYNAMIC_DEBUG=y, this patchset obsoletes those
runtime tests (inside drm_*dbg) by wrapping the 2 fns in one of the
dynamic_func_call* Factory macros.  The config dependence is due to
the .data footprint cost of the tables; AMDGPU has ~4k callsites, at
56 bytes each.

This patchset creates entries in /proc/dynamic_debug/control for each
callsite, and each has .class_id = macros' category.  Those entries,
and a new query keyword, allow (1st):

  # 1=DRM_UT_KMS (iirc)
  #> echo "module drm class 1 +p  > /proc/dynamic_debug/control

Then equivalently:
  # except it also clears other flags
  #> echo 0x01 > /sys/module/drm/parameters/debug

series overview:

dyndbg:
 - fix a bug in dyndbg static_key toggling, @stable cc'd
 - adds support for distinct classes to dyndbg (new,unused feature)
 - add DECLARE_DYNAMIC_DEBUG_CLASSBITS macro and callbacks
   to implement bitmap -> classid sysfs knob
dyndbg:
 - drops exported fn: dynamic_debug_exec_queries()
   any potential users would just use macro, or a tweak on it.
 - improve info-msg to print both "old -> new" flags
drm:
 - adapts drm debug category to dyndbg.class_id
 - wraps drm_*dbg() in a dyndbg Factory macro to get NOOP optimized debugs
   this disconnects drm.debug sysfs knob
 - uses DECLARE_DYNAMIC_DEBUG_CLASSBITS macro
   this reconnects sysfs knob

This could be -v12, but the focus and subject has wandered a bit, and
patchwork CI had multiple different notions of the version.

Noteworthy changes:

- no tracefs stuff here, refocus

In contrast, the previous drm.debug approach:

- replaced drm_dbg & drm_devdbg with calls to pr_debug & dev_dbg
  this preserved the optional decorations: module:function:line:

- used DRM_UT_CORE => "drm:core:" prefix-string, cpp cat'd to formats
  this made sites selectable by matching to that format prefix

This version:

- .class_id is easier to explain, and no config/format-string diffs

- wraps drm_dbg & drm_devdbg callsites for jumplabel enablement
  efficiency was original goal.

- loses the optional decorations.
  drm has its own logmsg standards, doesn't need decorations slapped on
  later: could recast flags for drm specific decorations

This is based on 5.17-rc4, for no particular reason.

Its also here: in (dd-drm branch)
  ghlinux-rohttps://github.com/jimc/linux.git (fetch)


Jim Cromie (13):
  dyndbg: fix static_branch manipulation @stable
  dyndbg: add class_id field and query support
  dyndbg: add DEFINE_DYNAMIC_DEBUG_CLASSBITS macro and callbacks
  dyndbg: drop EXPORTed dynamic_debug_exec_queries
  dyndbg: improve change-info to have old and new
  dyndbg: abstract dyndbg_site_is_printing
  drm_print: condense enum drm_debug_category
  drm_print: interpose drm_*dbg with forwarding macros
  drm_print: wrap drm_*_dbg in dyndbg jumplabel
  drm_print: refine drm_debug_enabled for dyndbg+jump-label
  drm_print: prefer bare printk KERN_DEBUG on generic fn
  drm_print: add _ddebug desc to drm_*dbg prototypes
  drm_print: use DEFINE_DYNAMIC_DEBUG_CLASSBITS for drm.debug

 .../admin-guide/dynamic-debug-howto.rst   |   7 +
 drivers/gpu/drm/Kconfig   |  12 ++
 drivers/gpu/drm/Makefile  |   2 +
 drivers/gpu/drm/drm_print.c   |  56 ---
 include/drm/drm_print.h   |  80 +++---
 include/linux/dynamic_debug.h | 113 +++---
 lib/dynamic_debug.c   | 140 ++
 7 files changed, 323 insertions(+), 87 deletions(-)

-- 
2.35.1



[PATCH 01/13] dyndbg: fix static_branch manipulation

2022-03-01 Thread Jim Cromie
In https://lore.kernel.org/lkml/20211209150910.ga23...@axis.com/

Vincent's patch commented on, and worked around, a bug toggling
static_branch's, when a 2nd PRINTK-ish flag was added.  The bug
results in a premature static_branch_disable when the 1st of 2 flags
was disabled.

The cited commit computed newflags, but then in the JUMP_LABEL block,
did not use that result, but used just one of the terms in it.  Using
newflags instead made the code work properly.

This is Vincents test-case, reduced.  It needs the 2nd flag to work
properly, but it's explanatory here.

pt_test() {
echo 5 > /sys/module/dynamic_debug/verbose

site="module tcp" # just one callsite
echo " $site =_ " > /proc/dynamic_debug/control # clear it

# A B ~A ~B
for flg in +T +p "-T #broke here" -p; do
echo " $site $flg " > /proc/dynamic_debug/control
done;

# A B ~B ~A
for flg in +T +p "-p #broke here" -T; do
echo " $site $flg " > /proc/dynamic_debug/control
done
}
pt_test

Fixes: 84da83a6ffc0 dyndbg: combine flags & mask into a struct, simplify with it
CC: vincent.whitchu...@axis.com
CC: sta...@vger.kernel.org
Signed-off-by: Jim Cromie 
---
 lib/dynamic_debug.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index dd7f56af9aed..a56c1286ffa4 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -211,10 +211,11 @@ static int ddebug_change(const struct ddebug_query *query,
continue;
 #ifdef CONFIG_JUMP_LABEL
if (dp->flags & _DPRINTK_FLAGS_PRINT) {
-   if (!(modifiers->flags & _DPRINTK_FLAGS_PRINT))
+   if (!(newflags & _DPRINTK_FLAGS_PRINT))

static_branch_disable(&dp->key.dd_key_true);
-   } else if (modifiers->flags & _DPRINTK_FLAGS_PRINT)
+   } else if (newflags & _DPRINTK_FLAGS_PRINT) {
static_branch_enable(&dp->key.dd_key_true);
+   }
 #endif
dp->flags = newflags;
v4pr_info("changed %s:%d [%s]%s =%s\n",
-- 
2.35.1



[PATCH 02/13] dyndbg: add class_id field and query support

2022-03-01 Thread Jim Cromie
DRM defines/uses 10 enum drm_debug_category's to create exclusive
classes of debug messages.  To support this directly in dynamic-debug,
add the following:

- struct _ddebug.class_id:4 - 4 bits is enough
- define _DPRINTK_SITE_UNCLASSED 15 - see below

and the query support:
- struct _ddebug_query.class_id
- ddebug_parse_query- looks for "class" keyword, then calls..
- parse_class   - accepts uint: 0-15, sets query.class_id and marker
- vpr_info_dq   - displays new field
- ddebug_proc_show  - append column with "cls:%d" if !UNCLASSED

With the patch, this command enables current/unclassed callsites:

  #> echo drm class 15 +p > /proc/dynamic_debug/control

parse_class() accepts 0 .. _DPRINTK_SITE_UNCLASSED, which allows the
>control interface to explicitly manipulate unclassed callsites.

After parsing keywords, ddebug_parse_query() sets .class_id=15, iff it
wasnt explicitly set.  This allows future classed/categorized
callsites to be untouched by legacy (class unaware) queries.

DEFINE_DYNAMIC_DEBUG_METADATA gets _CLS(cls,) suffix and arg, and
initializes the new .class_id=cls field.  The old name gets the default.

Then, these _CLS(cls,...) modifications are repeated up through the
stack of *dynamic_func_call* macros that use the METADATA initializer,
so as to actually supply the category into it.

NOTES:

_DPRINTK_SITE_UNCLASSED: this symbol is used to initialize all
existing/unclassed pr-debug callsites.  Normally, the default would be
zero, but DRM_UT_CORE "uses" that value, in the sense that 0 is
exposed as a bit position in drm.debug.  Using 15 allows identity
mapping from category to class, avoiding fiddly offsets.

CC: Rasmus Villemoes 
Signed-off-by: Jim Cromie 

fixup class-id preset

fix2
---
 .../admin-guide/dynamic-debug-howto.rst   |  7 +++
 include/linux/dynamic_debug.h | 54 ++-
 lib/dynamic_debug.c   | 48 ++---
 3 files changed, 88 insertions(+), 21 deletions(-)

diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst 
b/Documentation/admin-guide/dynamic-debug-howto.rst
index a89cfa083155..8ef8d7dcd140 100644
--- a/Documentation/admin-guide/dynamic-debug-howto.rst
+++ b/Documentation/admin-guide/dynamic-debug-howto.rst
@@ -35,6 +35,7 @@ Dynamic debug has even more useful features:
- line number (including ranges of line numbers)
- module name
- format string
+   - class number:0-15
 
  * Provides a debugfs control file: ``/dynamic_debug/control``
which can be read to display the complete list of known debug
@@ -143,6 +144,7 @@ against.  Possible keywords are:::
 'module' string |
 'format' string |
 'line' line-range
+'class' integer:[0-15]
 
   line-range ::= lineno |
 '-'lineno |
@@ -217,6 +219,11 @@ line
line -1605  // the 1605 lines from line 1 to line 1605
line 1600-  // all lines from line 1600 to the end of the file
 
+class
+This expects a single integer in range: 0-15.
+15 is used/reserved for existing/unclassed callsites,
+and is defaulted in unless specified to >control
+
 The flags specification comprises a change operation followed
 by one or more flag characters.  The change operation is one
 of the characters::
diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index dce631e678dd..d4b48f3cc6e8 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -6,6 +6,8 @@
 #include 
 #endif
 
+#include 
+
 /*
  * An instance of this structure is created in a special
  * ELF section at every dynamic debug callsite.  At runtime,
@@ -21,6 +23,9 @@ struct _ddebug {
const char *filename;
const char *format;
unsigned int lineno:18;
+#define CLS_BITS 4
+   unsigned int class_id:CLS_BITS;
+#define _DPRINTK_SITE_UNCLASSED((1 << CLS_BITS) - 1)
/*
 * The flags field controls the behaviour at the callsite.
 * The bits here are changed dynamically when the user
@@ -87,7 +92,7 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor,
 const struct ib_device *ibdev,
 const char *fmt, ...);
 
-#define DEFINE_DYNAMIC_DEBUG_METADATA(name, fmt)   \
+#define DEFINE_DYNAMIC_DEBUG_METADATA_CLS(name, cls, fmt)  \
static struct _ddebug  __aligned(8) \
__section("__dyndbg") name = {  \
.modname = KBUILD_MODNAME,  \
@@ -96,8 +101,14 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor,
.format = (fmt),\
.lineno = __LINE__, \
.flags = _DPRINTK_FLAGS_DEFAULT,\
+   .class_id = cls,\
_DPRINTK_KEY_INIT   

[PATCH 03/13] dyndbg: add DEFINE_DYNAMIC_DEBUG_CLASSBITS macro and callbacks

2022-03-01 Thread Jim Cromie
DEFINE_DYNAMIC_DEBUG_CLASSBITS(fsname, var, bitmap_desc, classes..)
allows users to create a drm.debug style (bitmap) sysfs interface, to
control sets of pr_debug's according to their .class_id's

This wraps existing "class" keyword and behavior:

   echo "module drm -p ; module drm class 0 +p ; module drm class 2 +p" >control

With the macro in use, this is equivalent:

   echo 0x05 > /sys/module/drm/parameters/debug

To use:

DEFINE_DYNAMIC_DEBUG_CLASSBITS(debug, __drm_debug, "pmfl",
"drm.debug - bits => categories:",
/* vector of uint:4 symbols, ala enum drm_debug_category, 15 is EOL */
DRM_UT_CORE,
DRM_UT_DRIVER,
DRM_UT_KMS ... );

The 3rd arg is a string with any of the dyndbg.flags [pmflt_]+
Full exposure of the flags here lets the module author:

- fully customize/take-over the decorations of enabled sites.
  generally leaving decorations to user is preferred.

- aim the debug-stream:
  now printk, later tracefs.
  using both together means more work (p or T, in practice)
  iface doesn't care about new flags added later

- declare 2 separate sysfs-knobs, one each for p, T, if desired.

- decorations are per callsite,
  shared across sysfs-knobs for any controlled classes

To support the macro, the patch adds:

 - int param_set_dyndbg_classbits()
 - int param_get_dyndbg_classbits()
 - struct kernel_param_ops param_ops_dyndbg_classbits

Following the model of kernel/params.c STANDARD_PARAM_DEFS, these are
non-static and exported.

get/set use an augmented kernel_param; the arg refs a new struct
dyndbg_bitmap_param containing:

A- the vector of classes (drm.debug "categories") being controlled

This in-line vector of constants (uint [0-14]) specifies a sequence of
controlling bits (by position, starting at 0) with the values naming
the class_id's mapped to that bit.

A value of _DPRINTK_SITE_UNCLASSED terminates the vector processing by
param_set_dyndbg_classbits(), and is appended by the macro to insure a
defined termination after max 15 classes are applied.

Technically, the vector is a flex-array, but its size is practically
limited to max 15 in length (repeats are pointless).

B- a pointer to the user module's ulong holding the bits/state.

By accessing client's bit-state, we coordinate with existing code
that still uses drm_debug_enabled(), so they work unchanged.
The change to ulong allows use of BIT() etc.

NOTES:

_DPRINTK_SITE_UNCLASSED = 15, ie 2**CLS_BITS-1, deserves special
mention; it already marks all existing pr-debug callsites, only the
new drm.debug callsites are initialized to other (DRM_UT_*) values.

_DPRINTK_SITE_UNCLASSED is used in param_set_dyndbg_classbits() to
limit the range of bits that are processed to what fits in uint:4.
It also terminates the class-id list passed into the macro, so dont
use it halfway through your list of classes-to-control.

param_set_dyndbg_classbits() compares new vs old bits, and only
updates each class on changes.  This maximally preserves the
underlying state, which may have been customized at some point via
`echo $cmd >control`.  So if users really want to know that all
prdbgs are set precisely, they must pre-clear then set.

Identity mapping in (A) is encouraged.  Your vector should exclude 15,
if used, it terminates the list prematurely; any following bit
mappings will be ignored (it is the default/non category).

The whole (A) vector/list passed to the macro is:

- not strictly needed: 0-N bits are scanned, only N is needed in the
  macro interface to do this, not the whole list.

- 0-N list allows juggling the bit->class map
  Identity map is preferred.
  15 terminates list if used. (macro impl does this)

That said, (A) is self-documenting; the explicit list is greppable,
'DRM_UT_*' provides lots of clues.  Further, it could be upgraded,
something like:

  _pick_sym_(DRM_UT_CORE, "mumble something useful about CORE debug")

_pick_sym_(a,b) a   // gives us what we need here
_pick_help_(a,b) #a " : " b // mod-info fodder

Signed-off-by: Jim Cromie 
---
 include/linux/dynamic_debug.h | 50 +++
 lib/dynamic_debug.c   | 77 +++
 2 files changed, 127 insertions(+)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index d4b48f3cc6e8..e83c4e36ad29 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -209,6 +209,10 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor,
   KERN_DEBUG, prefix_str, prefix_type, \
   rowsize, groupsize, buf, len, ascii)
 
+struct kernel_param;
+int param_set_dyndbg_classbits(const char *instr, const struct kernel_param 
*kp);
+int param_get_dyndbg_classbits(char *buffer, const struct kernel_param *kp);
+
 #else /* !CONFIG_DYNAMIC_DEBUG_CORE */
 
 #include 
@@ -255,6 +259,52 @@ static inline int dynamic_debug_exec_queries(const char 
*query, const char *modn
return 0;
 }
 
+struct kern

[PATCH 05/13] dyndbg: improve change-info to have old and new

2022-03-01 Thread Jim Cromie
move site.flag update after the v4pr_info("change") message, and
improve the message to print both old and new flag values.

Heres new form:
  dyndbg: changed net/ipv4/tcp.c:2424 [tcp]tcp_recvmsg_locked pT -> _

Signed-off-by: Jim Cromie 
---
 lib/dynamic_debug.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 60b2572e64f0..ab93b370d489 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -158,7 +158,7 @@ static int ddebug_change(const struct ddebug_query *query,
struct ddebug_table *dt;
unsigned int newflags;
unsigned int nfound = 0;
-   struct flagsbuf fbuf;
+   struct flagsbuf fbuf, nbuf;
 
/* search for matching ddebugs */
mutex_lock(&ddebug_lock);
@@ -223,11 +223,12 @@ static int ddebug_change(const struct ddebug_query *query,
static_branch_enable(&dp->key.dd_key_true);
}
 #endif
+   v4pr_info("changed %s:%d [%s]%s %s -> %s\n",
+ trim_prefix(dp->filename), dp->lineno,
+ dt->mod_name, dp->function,
+ ddebug_describe_flags(dp->flags, &fbuf),
+ ddebug_describe_flags(newflags, &nbuf));
dp->flags = newflags;
-   v4pr_info("changed %s:%d [%s]%s =%s\n",
-trim_prefix(dp->filename), dp->lineno,
-dt->mod_name, dp->function,
-ddebug_describe_flags(dp->flags, &fbuf));
}
}
mutex_unlock(&ddebug_lock);
-- 
2.35.1



[PATCH 04/13] dyndbg: drop EXPORTed dynamic_debug_exec_queries

2022-03-01 Thread Jim Cromie
This exported fn is effectively obsoleted by Commit:HEAD~2, so remove it.

The export was added here:
  commit a2d375eda771 ("dyndbg: refine export, rename to 
dynamic_debug_exec_queries()")
  commit 4c0d77828d4f ("dyndbg: export ddebug_exec_queries")

Its intent was to allow drm.debug to use the exported function to
implement its drm.debug bitmap api using dynamic_debug.

Instead, HEAD~2 implements the bitmap inside dyndbg, and exposes it in
a macro declarator, and HEAD~1 uses the macro to connect __drm_debug
to the supporting callbacks.

Since there are no other expected users, and any prospects would
likely reuse the bitmap or a straightforward extension of it, we can
drop this function until its really needed.

This also drops the CONFIG_DYNAMIC_DEBUG=N stub-func, and its
pr_warn(), which I avoided in 2012, then added in 2020 :-/

Signed-off-by: Jim Cromie 
---
 include/linux/dynamic_debug.h |  9 -
 lib/dynamic_debug.c   | 29 -
 2 files changed, 38 deletions(-)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index e83c4e36ad29..664bb83778d2 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -60,9 +60,6 @@ struct _ddebug {
 
 #if defined(CONFIG_DYNAMIC_DEBUG_CORE)
 
-/* exported for module authors to exercise >control */
-int dynamic_debug_exec_queries(const char *query, const char *modname);
-
 int ddebug_add_module(struct _ddebug *tab, unsigned int n,
const char *modname);
 extern int ddebug_remove_module(const char *mod_name);
@@ -253,12 +250,6 @@ static inline int ddebug_dyndbg_module_param_cb(char 
*param, char *val,
rowsize, groupsize, buf, len, ascii);   \
} while (0)
 
-static inline int dynamic_debug_exec_queries(const char *query, const char 
*modname)
-{
-   pr_warn("kernel not built with CONFIG_DYNAMIC_DEBUG_CORE\n");
-   return 0;
-}
-
 struct kernel_param;
 static inline int param_set_dyndbg_classbits(const char *instr, const struct 
kernel_param *kp)
 { return 0; }
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 7eb1c31f870d..60b2572e64f0 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -582,35 +582,6 @@ static int ddebug_exec_queries(char *query, const char 
*modname)
return nfound;
 }
 
-/**
- * dynamic_debug_exec_queries - select and change dynamic-debug prints
- * @query: query-string described in admin-guide/dynamic-debug-howto
- * @modname: string containing module name, usually &module.mod_name
- *
- * This uses the >/proc/dynamic_debug/control reader, allowing module
- * authors to modify their dynamic-debug callsites. The modname is
- * canonically struct module.mod_name, but can also be null or a
- * module-wildcard, for example: "drm*".
- */
-int dynamic_debug_exec_queries(const char *query, const char *modname)
-{
-   int rc;
-   char *qry; /* writable copy of query */
-
-   if (!query) {
-   pr_err("non-null query/command string expected\n");
-   return -EINVAL;
-   }
-   qry = kstrndup(query, PAGE_SIZE, GFP_KERNEL);
-   if (!qry)
-   return -ENOMEM;
-
-   rc = ddebug_exec_queries(qry, modname);
-   kfree(qry);
-   return rc;
-}
-EXPORT_SYMBOL_GPL(dynamic_debug_exec_queries);
-
 #ifdef CONFIG_MODULES
 #define KP_MOD_NAME kp->mod->name
 #else
-- 
2.35.1



[PATCH 06/13] dyndbg: abstract dyndbg_site_is_printing

2022-03-01 Thread Jim Cromie
Hide flags test in a macro.
no functional changes.

Signed-off-by: Jim Cromie 
---
 include/linux/dynamic_debug.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index 664bb83778d2..106065244f73 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -56,7 +56,7 @@ struct _ddebug {
 #endif
 } __attribute__((aligned(8)));
 
-
+#define dyndbg_site_is_printing(desc)  (desc->flags & _DPRINTK_FLAGS_PRINT)
 
 #if defined(CONFIG_DYNAMIC_DEBUG_CORE)
 
-- 
2.35.1



[PATCH 07/13] drm_print: condense enum drm_debug_category

2022-03-01 Thread Jim Cromie
enum drm_debug_category has 10 "classes", explicitly initialized with
0x-bitmasks which could be simplified as BIT(X)s.  But lets go
further: use natural enumeration (int, starting at 0), and do the
BIT(cat) in drm_debug_enabled(cat) at runtime.

While this slightly pessimizes the bit-test, the category now fits in
4 bits, allowing it in struct _ddebug.class_id:4.  This sets us up to
adapt drm to use dyndbg with JUMP_LABEL, thus avoiding all those
bit-tests anyway.

Signed-off-by: Jim Cromie 
---
 include/drm/drm_print.h | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index 22fabdeed297..b3b470440e46 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -279,49 +279,49 @@ enum drm_debug_category {
 * @DRM_UT_CORE: Used in the generic drm code: drm_ioctl.c, drm_mm.c,
 * drm_memory.c, ...
 */
-   DRM_UT_CORE = 0x01,
+   DRM_UT_CORE,
/**
 * @DRM_UT_DRIVER: Used in the vendor specific part of the driver: i915,
 * radeon, ... macro.
 */
-   DRM_UT_DRIVER   = 0x02,
+   DRM_UT_DRIVER,
/**
 * @DRM_UT_KMS: Used in the modesetting code.
 */
-   DRM_UT_KMS  = 0x04,
+   DRM_UT_KMS,
/**
 * @DRM_UT_PRIME: Used in the prime code.
 */
-   DRM_UT_PRIME= 0x08,
+   DRM_UT_PRIME,
/**
 * @DRM_UT_ATOMIC: Used in the atomic code.
 */
-   DRM_UT_ATOMIC   = 0x10,
+   DRM_UT_ATOMIC,
/**
 * @DRM_UT_VBL: Used for verbose debug message in the vblank code.
 */
-   DRM_UT_VBL  = 0x20,
+   DRM_UT_VBL,
/**
 * @DRM_UT_STATE: Used for verbose atomic state debugging.
 */
-   DRM_UT_STATE= 0x40,
+   DRM_UT_STATE,
/**
 * @DRM_UT_LEASE: Used in the lease code.
 */
-   DRM_UT_LEASE= 0x80,
+   DRM_UT_LEASE,
/**
 * @DRM_UT_DP: Used in the DP code.
 */
-   DRM_UT_DP   = 0x100,
+   DRM_UT_DP,
/**
 * @DRM_UT_DRMRES: Used in the drm managed resources code.
 */
-   DRM_UT_DRMRES   = 0x200,
+   DRM_UT_DRMRES
 };
 
 static inline bool drm_debug_enabled(enum drm_debug_category category)
 {
-   return unlikely(__drm_debug & category);
+   return unlikely(__drm_debug & BIT(category));
 }
 
 /*
-- 
2.35.1



[PATCH 08/13] drm_print: interpose drm_*dbg with forwarding macros

2022-03-01 Thread Jim Cromie
drm_dev_dbg() & drm_dbg() sit below the categorized layer of the DRM
debug API, and implement most of it.  These are good places to insert
dynamic-debug jump-label mechanics, allowing DRM to avoid the runtime
cost of drm_debug_enabled().

Set up for this by changing the func names by adding '__' prefixes,
and define forwarding macros to the new names.

no functional changes.

memory cost baseline: (unchanged)
bash-5.1# drms_load
[9.220389] dyndbg:   1 debug prints in module drm
[9.224426] ACPI: bus type drm_connector registered
[9.302192] dyndbg:   2 debug prints in module ttm
[9.305033] dyndbg:   8 debug prints in module video
[9.627563] dyndbg: 127 debug prints in module i915
[9.721505] AMD-Vi: AMD IOMMUv2 functionality not available on this system - 
This is not a bug.
[   10.091345] dyndbg: 2196 debug prints in module amdgpu
[   10.106589] [drm] amdgpu kernel modesetting enabled.
[   10.107270] amdgpu: CRAT table not found
[   10.107926] amdgpu: Virtual CRAT table created for CPU
[   10.108398] amdgpu: Topology: Add CPU node
[   10.168507] dyndbg:   3 debug prints in module wmi
[   10.329587] dyndbg:   3 debug prints in module nouveau

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/drm_print.c | 10 +-
 include/drm/drm_print.h |  9 +++--
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c
index f783d4963d4b..e45ba224e57c 100644
--- a/drivers/gpu/drm/drm_print.c
+++ b/drivers/gpu/drm/drm_print.c
@@ -256,8 +256,8 @@ void drm_dev_printk(const struct device *dev, const char 
*level,
 }
 EXPORT_SYMBOL(drm_dev_printk);
 
-void drm_dev_dbg(const struct device *dev, enum drm_debug_category category,
-const char *format, ...)
+void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category,
+  const char *format, ...)
 {
struct va_format vaf;
va_list args;
@@ -278,9 +278,9 @@ void drm_dev_dbg(const struct device *dev, enum 
drm_debug_category category,
 
va_end(args);
 }
-EXPORT_SYMBOL(drm_dev_dbg);
+EXPORT_SYMBOL(__drm_dev_dbg);
 
-void __drm_dbg(enum drm_debug_category category, const char *format, ...)
+void ___drm_dbg(enum drm_debug_category category, const char *format, ...)
 {
struct va_format vaf;
va_list args;
@@ -297,7 +297,7 @@ void __drm_dbg(enum drm_debug_category category, const char 
*format, ...)
 
va_end(args);
 }
-EXPORT_SYMBOL(__drm_dbg);
+EXPORT_SYMBOL(___drm_dbg);
 
 void __drm_err(const char *format, ...)
 {
diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index b3b470440e46..4bed99326631 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -334,7 +334,7 @@ __printf(3, 4)
 void drm_dev_printk(const struct device *dev, const char *level,
const char *format, ...);
 __printf(3, 4)
-void drm_dev_dbg(const struct device *dev, enum drm_debug_category category,
+void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category,
 const char *format, ...);
 
 /**
@@ -383,6 +383,9 @@ void drm_dev_dbg(const struct device *dev, enum 
drm_debug_category category,
}   \
 })
 
+#define drm_dev_dbg(dev, cat, fmt, ...)\
+   __drm_dev_dbg(dev, cat, fmt, ##__VA_ARGS__)
+
 /**
  * DRM_DEV_DEBUG() - Debug output for generic drm code
  *
@@ -484,10 +487,12 @@ void drm_dev_dbg(const struct device *dev, enum 
drm_debug_category category,
  */
 
 __printf(2, 3)
-void __drm_dbg(enum drm_debug_category category, const char *format, ...);
+void ___drm_dbg(enum drm_debug_category category, const char *format, ...);
 __printf(1, 2)
 void __drm_err(const char *format, ...);
 
+#define __drm_dbg(fmt, ...)___drm_dbg(fmt, ##__VA_ARGS__)
+
 /* Macros to make printk easier */
 
 #define _DRM_PRINTK(once, level, fmt, ...) \
-- 
2.35.1



[PATCH 09/13] drm_print: wrap drm_*_dbg in dyndbg jumplabel

2022-03-01 Thread Jim Cromie
For CONFIG_DRM_USE_DYNAMIC_DEBUG=y, wrap drm_dbg() & drm_dev_dbg() in
one of dyndbg's Factory macros: _dynamic_func_call_no_desc().
This makes the (~4000) callsites controllable, typically by class:

  # 0 is DRM_UT_CORE
  #> echo module drm class 0 +p > /proc/dynamic_debug/control

 =N: keeps direct forwarding: drm_*_dbg -> __drm_*_dbg()

I added the CONFIG_DRM_USE_DYNAMIC_DEBUG item because of the .data
footprint cost of per-callsite control; 56 bytes/site * ~2k,4k
callsites (for i915, amdgpu), which is significant enough that a user
might not want it.  Using CONFIG_DYNAMIC_DEBUG_CORE only eliminates
the builtin portion, leaving only drm modules, but still 200k of
module data is a lot.

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/Kconfig  | 12 
 drivers/gpu/drm/Makefile |  2 ++
 include/drm/drm_print.h  | 12 
 3 files changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index b1f22e457fd0..ec14a1cd4449 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -63,6 +63,18 @@ config DRM_DEBUG_MM
 
  If in doubt, say "N".
 
+config DRM_USE_DYNAMIC_DEBUG
+   bool "use dynamic debug to implement drm.debug"
+   default y
+   depends on DRM
+   depends on DYNAMIC_DEBUG || DYNAMIC_DEBUG_CORE
+   depends on JUMP_LABEL
+   help
+ Use dynamic-debug to avoid drm_debug_enabled() runtime overheads.
+ Due to callsite counts in DRM drivers (~4k in amdgpu) and 56
+ bytes per callsite, the .data costs can be substantial, and
+ are therefore configurable.
+
 config DRM_DEBUG_SELFTEST
tristate "kselftests for DRM"
depends on DRM
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 301a44dc18e3..24e6410d6c0e 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -3,6 +3,8 @@
 # Makefile for the drm device driver.  This driver provides support for the
 # Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher.
 
+CFLAGS-$(CONFIG_DRM_USE_DYNAMIC_DEBUG) += -DDYNAMIC_DEBUG_MODULE
+
 drm-y   := drm_aperture.o drm_auth.o drm_cache.o \
drm_file.o drm_gem.o drm_ioctl.o \
drm_drv.o \
diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index 4bed99326631..06f0ee06be1f 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -383,8 +383,14 @@ void __drm_dev_dbg(const struct device *dev, enum 
drm_debug_category category,
}   \
 })
 
+#if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
 #define drm_dev_dbg(dev, cat, fmt, ...)\
__drm_dev_dbg(dev, cat, fmt, ##__VA_ARGS__)
+#else
+#define drm_dev_dbg(dev, cat, fmt, ...)\
+   _dynamic_func_call_no_desc(fmt, __drm_dev_dbg,  \
+  dev, cat, fmt, ##__VA_ARGS__)
+#endif
 
 /**
  * DRM_DEV_DEBUG() - Debug output for generic drm code
@@ -491,7 +497,13 @@ void ___drm_dbg(enum drm_debug_category category, const 
char *format, ...);
 __printf(1, 2)
 void __drm_err(const char *format, ...);
 
+#if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
 #define __drm_dbg(fmt, ...)___drm_dbg(fmt, ##__VA_ARGS__)
+#else
+#define __drm_dbg(cat, fmt, ...)   \
+   _dynamic_func_call_no_desc(fmt, ___drm_dbg, \
+  cat, fmt, ##__VA_ARGS__)
+#endif
 
 /* Macros to make printk easier */
 
-- 
2.35.1



[PATCH 10/13] drm_print: refine drm_debug_enabled for dyndbg+jump-label

2022-03-01 Thread Jim Cromie
In order to use dynamic-debug's jump-label optimization in drm-debug,
its clarifying to refine drm_debug_enabled into 3 uses:

1.   drm_debug_enabled - legacy, public
2. __drm_debug_enabled - optimized for dyndbg jump-label enablement.
3.  _drm_debug_enabled - pr_debug instrumented, observable

1. The legacy version always checks the bits.

2. is privileged, for use by __drm_dbg(), __drm_dev_dbg(), which do an
early return unless the category is enabled (free of call/NOOP side
effects).  For dyndbg builds, debug callsites are selectively
"pre-enabled", so __drm_debug_enabled() short-circuits to true there.
Remaining callers of 1 may be able to use 2, case by case.

3. is 1st wrapped in a macro, with a pr_debug, which reports each
usage in /proc/dynamic_debug/control, making it observable in the
logs.  The macro lets the pr_debug see the real caller, not an inline
function.

When plugged into 1, it identified ~10 remaining callers of the
function, leading to the follow-on cleanup patch, and would allow
activating the pr_debugs, estimating the callrate, and the potential
savings by using the wrapper macro.  It is unused ATM, but it fills
out the picture.

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/drm_print.c |  4 ++--
 include/drm/drm_print.h | 28 
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c
index e45ba224e57c..92e6e18026da 100644
--- a/drivers/gpu/drm/drm_print.c
+++ b/drivers/gpu/drm/drm_print.c
@@ -262,7 +262,7 @@ void __drm_dev_dbg(const struct device *dev, enum 
drm_debug_category category,
struct va_format vaf;
va_list args;
 
-   if (!drm_debug_enabled(category))
+   if (!__drm_debug_enabled(category))
return;
 
va_start(args, format);
@@ -285,7 +285,7 @@ void ___drm_dbg(enum drm_debug_category category, const 
char *format, ...)
struct va_format vaf;
va_list args;
 
-   if (!drm_debug_enabled(category))
+   if (!__drm_debug_enabled(category))
return;
 
va_start(args, format);
diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index 06f0ee06be1f..38ef044d786e 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -319,11 +319,39 @@ enum drm_debug_category {
DRM_UT_DRMRES
 };
 
+/*
+ * 3 name flavors of drm_debug_enabled:
+ *   drm_debug_enabled - public/legacy, always checks bits
+ *  _drm_debug_enabled - instrumented to observe call-rates, est overheads.
+ * __drm_debug_enabled - privileged - knows jump-label state, can short-circuit
+ */
 static inline bool drm_debug_enabled(enum drm_debug_category category)
 {
return unlikely(__drm_debug & BIT(category));
 }
 
+/*
+ * Wrap fn in macro, so that the pr_debug sees the actual caller, not
+ * the inline fn.  Using this name creates a callsite entry / control
+ * point in /proc/dynamic_debug/control.
+ */
+#define _drm_debug_enabled(category)   \
+   ({  \
+   pr_debug("todo: maybe avoid via dyndbg\n"); \
+   drm_debug_enabled(category);\
+   })
+
+#if defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
+/*
+ * dyndbg is wrapping the drm.debug API, so as to avoid the runtime
+ * bit-test overheads of drm_debug_enabled() in those api calls.
+ * In this case, executed callsites are known enabled, so true.
+ */
+#define __drm_debug_enabled(category)  true
+#else
+#define __drm_debug_enabled(category)  drm_debug_enabled(category)
+#endif
+
 /*
  * struct device based logging
  *
-- 
2.35.1



[PATCH 11/13] drm_print: prefer bare printk KERN_DEBUG on generic fn

2022-03-01 Thread Jim Cromie
drm_print.c calls pr_debug() just once, from __drm_printfn_debug(),
which is a generic/service fn.  The callsite is compile-time enabled
by DEBUG in both DYNAMIC_DEBUG=y/n builds.

For dyndbg builds, reverting this callsite back to bare printk is
correcting a few anti-features:

1- callsite is generic, serves multiple drm users.
   its hardwired on currently
   could accidentally: #> echo -p > /proc/dynamic_debug/control

2- optional "decorations" by dyndbg are unhelpful/misleading
   they describe only the generic site, not end users

IOW, 1,2 are unhelpful at best, and possibly confusing.

reverting yields a nominal data and text shrink:

   textdata bss dec hex filename
 462583   36604   54592 553779   87333 
/lib/modules/5.16.0-rc4-lm1-8-ged3eac8ceeea/kernel/drivers/gpu/drm/drm.ko
 462515   36532   54592 553639   872a7 
/lib/modules/5.16.0-rc4-lm1-9-g6ce0b88d2539-dirty/kernel/drivers/gpu/drm/drm.ko

NB: this was noticed using _drm_debug_enabled(), added earlier.

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/drm_print.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c
index 92e6e18026da..24c57b92dc69 100644
--- a/drivers/gpu/drm/drm_print.c
+++ b/drivers/gpu/drm/drm_print.c
@@ -23,8 +23,6 @@
  * Rob Clark 
  */
 
-#define DEBUG /* for pr_debug() */
-
 #include 
 
 #include 
@@ -162,7 +160,8 @@ EXPORT_SYMBOL(__drm_printfn_info);
 
 void __drm_printfn_debug(struct drm_printer *p, struct va_format *vaf)
 {
-   pr_debug("%s %pV", p->prefix, vaf);
+   /* pr_debug callsite decorations are unhelpful here */
+   printk(KERN_DEBUG "%s %pV", p->prefix, vaf);
 }
 EXPORT_SYMBOL(__drm_printfn_debug);
 
-- 
2.35.1



[PATCH 13/13] drm_print: use DEFINE_DYNAMIC_DEBUG_CLASSBITS for drm.debug

2022-03-01 Thread Jim Cromie
if CONFIG_DRM_USE_DYNAMIC_DEBUG=y, use new macro to create the sysfs
bitmap to control drm.debug callsites.

DEFINE_DYNAMIC_DEBUG_CLASSBITS( debug, __drm_debug, "p",
"drm.debug - control summary",
/* inline vector of _ddebug.class_id's to be controlled, max 14 vals */
DRM_UT_CORE,
DRM_UT_DRIVER,
DRM_UT_KMS,
DRM_UT_PRIME,
DRM_UT_ATOMIC,
DRM_UT_VBL,
DRM_UT_STATE,
DRM_UT_LEASE,
DRM_UT_DP,
DRM_UT_DRMRES );

NOTES:

The @_flgs used here is "p", so this bitmap enables input to syslog
only, matching legacy behavior.

Also, no "fmlt" decorator flags are used here; that is discouraged, as
it then toggles those flags along with the "p".  This would overwrite
any customizations a user added since the sysfs-knob was last used.
Still, there may be cases/reasons.

_ddebug.class_id is uint:4, values 0-14 are valid. 15 is reserved for
non-classified callsites (regular pr_debugs).  Using it terminates the
scan, don't use it halfway through your list.

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/drm_print.c | 20 ++--
 include/drm/drm_print.h |  4 ++--
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c
index c9b2a2ab0d3d..d916daa384e5 100644
--- a/drivers/gpu/drm/drm_print.c
+++ b/drivers/gpu/drm/drm_print.c
@@ -38,7 +38,7 @@
  * __drm_debug: Enable debug output.
  * Bitmask of DRM_UT_x. See include/drm/drm_print.h for details.
  */
-unsigned int __drm_debug;
+unsigned long __drm_debug;
 EXPORT_SYMBOL(__drm_debug);
 
 MODULE_PARM_DESC(debug, "Enable debug output, where each bit enables a debug 
category.\n"
@@ -50,7 +50,23 @@ MODULE_PARM_DESC(debug, "Enable debug output, where each bit 
enables a debug cat
 "\t\tBit 5 (0x20)  will enable VBL messages (vblank code)\n"
 "\t\tBit 7 (0x80)  will enable LEASE messages (leasing code)\n"
 "\t\tBit 8 (0x100) will enable DP messages (displayport code)");
-module_param_named(debug, __drm_debug, int, 0600);
+
+#if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
+module_param_named(debug, __drm_debug, ulong, 0600);
+#else
+DEFINE_DYNAMIC_DEBUG_CLASSBITS(debug, __drm_debug, "p",
+   "enable drm.debug categories - 1 bit per category",
+   DRM_UT_CORE,
+   DRM_UT_DRIVER,
+   DRM_UT_KMS,
+   DRM_UT_PRIME,
+   DRM_UT_ATOMIC,
+   DRM_UT_VBL,
+   DRM_UT_STATE,
+   DRM_UT_LEASE,
+   DRM_UT_DP,
+   DRM_UT_DRMRES);
+#endif
 
 void __drm_puts_coredump(struct drm_printer *p, const char *str)
 {
diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index 13d52b60f388..419140bf992d 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -36,7 +36,7 @@
 #include 
 
 /* Do *not* use outside of drm_print.[ch]! */
-extern unsigned int __drm_debug;
+extern unsigned long __drm_debug;
 
 /**
  * DOC: print
@@ -527,7 +527,7 @@ __printf(1, 2)
 void __drm_err(const char *format, ...);
 
 #if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
-#define __drm_dbg(fmt, ...)___drm_dbg(NULL, fmt, ##__VA_ARGS__)
+#define __drm_dbg(cat, fmt, ...)   ___drm_dbg(NULL, cat, fmt, 
##__VA_ARGS__)
 #else
 #define __drm_dbg(cat, fmt, ...)   \
_dynamic_func_call_cls(cat, fmt, ___drm_dbg,\
-- 
2.35.1



[PATCH 12/13] drm_print: add _ddebug desc to drm_*dbg prototypes

2022-03-01 Thread Jim Cromie
Add a struct _ddebug ptr to drm_dbg() and drm_dev_dbg() protos.

And upgrade the current use of _dynamic_func_call_no_desc(); ie drop
the '_no_desc', since the factory macro's callees (these 2 functions)
are now expecting the arg.

This lets those functions act more like pr_debug().  It also means
that these functions don't just get the decorations from an underlying
implementation.  DRM already has standards for logging/messaging;
tossing optional decorations on top may not help.

use that info

provide it to dyndbg [1], which can then
control debug enablement and decoration for all those drm.debug
callsites.

For CONFIG_DRM_USE_DYNAMIC_DEBUG=N, just pass null.
NB: desc->class_id is redundant with category, but !!desc dependent.

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/drm_print.c | 23 +--
 include/drm/drm_print.h | 23 ---
 2 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c
index 24c57b92dc69..c9b2a2ab0d3d 100644
--- a/drivers/gpu/drm/drm_print.c
+++ b/drivers/gpu/drm/drm_print.c
@@ -255,8 +255,8 @@ void drm_dev_printk(const struct device *dev, const char 
*level,
 }
 EXPORT_SYMBOL(drm_dev_printk);
 
-void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category,
-  const char *format, ...)
+void __drm_dev_dbg(struct _ddebug *desc, const struct device *dev,
+  enum drm_debug_category category, const char *format, ...)
 {
struct va_format vaf;
va_list args;
@@ -264,22 +264,25 @@ void __drm_dev_dbg(const struct device *dev, enum 
drm_debug_category category,
if (!__drm_debug_enabled(category))
return;
 
+   /* we know we are printing for either syslog, tracefs, or both */
va_start(args, format);
vaf.fmt = format;
vaf.va = &args;
 
-   if (dev)
-   dev_printk(KERN_DEBUG, dev, "[" DRM_NAME ":%ps] %pV",
-  __builtin_return_address(0), &vaf);
-   else
-   printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV",
-  __builtin_return_address(0), &vaf);
-
+   if (dev) {
+   if (dyndbg_site_is_printing(desc))
+   dev_printk(KERN_DEBUG, dev, "[" DRM_NAME ":%ps] %pV",
+  __builtin_return_address(0), &vaf);
+   } else {
+   if (dyndbg_site_is_printing(desc))
+   printk(KERN_DEBUG "[" DRM_NAME ":%ps] %pV",
+  __builtin_return_address(0), &vaf);
+   }
va_end(args);
 }
 EXPORT_SYMBOL(__drm_dev_dbg);
 
-void ___drm_dbg(enum drm_debug_category category, const char *format, ...)
+void ___drm_dbg(struct _ddebug *desc, enum drm_debug_category category, const 
char *format, ...)
 {
struct va_format vaf;
va_list args;
diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index 38ef044d786e..13d52b60f388 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -361,9 +362,9 @@ static inline bool drm_debug_enabled(enum 
drm_debug_category category)
 __printf(3, 4)
 void drm_dev_printk(const struct device *dev, const char *level,
const char *format, ...);
-__printf(3, 4)
-void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category,
-const char *format, ...);
+__printf(4, 5)
+void __drm_dev_dbg(struct _ddebug *desc, const struct device *dev,
+  enum drm_debug_category category, const char *format, ...);
 
 /**
  * DRM_DEV_ERROR() - Error output.
@@ -413,11 +414,11 @@ void __drm_dev_dbg(const struct device *dev, enum 
drm_debug_category category,
 
 #if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
 #define drm_dev_dbg(dev, cat, fmt, ...)\
-   __drm_dev_dbg(dev, cat, fmt, ##__VA_ARGS__)
+   __drm_dev_dbg(NULL, dev, cat, fmt, ##__VA_ARGS__)
 #else
 #define drm_dev_dbg(dev, cat, fmt, ...)\
-   _dynamic_func_call_no_desc(fmt, __drm_dev_dbg,  \
-  dev, cat, fmt, ##__VA_ARGS__)
+   _dynamic_func_call_cls(cat, fmt, __drm_dev_dbg, \
+  dev, cat, fmt, ##__VA_ARGS__)
 #endif
 
 /**
@@ -520,17 +521,17 @@ void __drm_dev_dbg(const struct device *dev, enum 
drm_debug_category category,
  * Prefer drm_device based logging over device or prink based logging.
  */
 
-__printf(2, 3)
-void ___drm_dbg(enum drm_debug_category category, const char *format, ...);
+__printf(3, 4)
+void ___drm_dbg(struct _ddebug *desc, enum drm_debug_category category, const 
char *format, ...);
 __printf(1, 2)
 void __drm_err(const char *format, ...);
 
 #if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
-#define __drm_dbg(fmt, ...)___drm_dbg(fmt, ##__VA_ARGS__)
+#define __drm_dbg(fmt, ...) 

Re: [PATCH] mm: split vm_normal_pages for LRU and non-LRU handling

2022-03-01 Thread David Hildenbrand
On 01.03.22 17:30, Felix Kuehling wrote:
> Am 2022-03-01 um 11:22 schrieb David Hildenbrand:
>   if (PageReserved(page))
> diff --git a/mm/migrate.c b/mm/migrate.c
> index c31d04b46a5e..17d049311b78 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1614,7 +1614,7 @@ static int add_page_for_migration(struct mm_struct 
> *mm, unsigned long addr,
>   goto out;
>
>   /* FOLL_DUMP to ignore special (like zero) pages */
> - follflags = FOLL_GET | FOLL_DUMP;
> + follflags = FOLL_GET | FOLL_DUMP | FOLL_LRU;
>   page = follow_page(vma, addr, follflags);
 Why wouldn't we want to dump DEVICE_COHERENT pages? This looks wrong.
>>> This function later calls isolate_lru_page, which is something you can't
>>> do with a device page.
>>>
>> Then, that code might require care instead. We most certainly don't want
>> to have random memory holes in a dump just because some anonymous memory
>> was migrated to DEVICE_COHERENT.
> I don't think this code is for core dumps. The call chain I see is
> 
> SYSCALL_DEFINE6(move_pages, ...) -> kernel_move_pages -> do_pages_move 
> -> add_page_for_migration
> 

Ah, sorry, I got mislead by FOLL_DUMP and thought we'd be in
get_dump_page() . As you said, nothing to do.

-- 
Thanks,

David / dhildenb



Re: [PATCH] mm: split vm_normal_pages for LRU and non-LRU handling

2022-03-01 Thread David Hildenbrand


>>
>>> if (PageReserved(page))
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index c31d04b46a5e..17d049311b78 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1614,7 +1614,7 @@ static int add_page_for_migration(struct mm_struct 
>>> *mm, unsigned long addr,
>>> goto out;
>>>   
>>> /* FOLL_DUMP to ignore special (like zero) pages */
>>> -   follflags = FOLL_GET | FOLL_DUMP;
>>> +   follflags = FOLL_GET | FOLL_DUMP | FOLL_LRU;
>>> page = follow_page(vma, addr, follflags);
>> Why wouldn't we want to dump DEVICE_COHERENT pages? This looks wrong.
> 
> This function later calls isolate_lru_page, which is something you can't 
> do with a device page.
> 

Then, that code might require care instead. We most certainly don't want
to have random memory holes in a dump just because some anonymous memory
was migrated to DEVICE_COHERENT.

-- 
Thanks,

David / dhildenb



Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Jakob Koschel



> On 1. Mar 2022, at 18:36, Greg KH  wrote:
> 
> On Tue, Mar 01, 2022 at 12:28:15PM +0100, Jakob Koschel wrote:
>> 
>> 
>>> On 1. Mar 2022, at 01:41, Linus Torvalds  
>>> wrote:
>>> 
>>> On Mon, Feb 28, 2022 at 1:47 PM Jakob Koschel  
>>> wrote:
 
 The goal of this is to get compiler warnings right? This would indeed be 
 great.
>>> 
>>> Yes, so I don't mind having a one-time patch that has been gathered
>>> using some automated checker tool, but I don't think that works from a
>>> long-term maintenance perspective.
>>> 
>>> So if we have the basic rule being "don't use the loop iterator after
>>> the loop has finished, because it can cause all kinds of subtle
>>> issues", then in _addition_ to fixing the existing code paths that
>>> have this issue, I really would want to (a) get a compiler warning for
>>> future cases and (b) make it not actually _work_ for future cases.
>>> 
>>> Because otherwise it will just happen again.
>>> 
 Changing the list_for_each_entry() macro first will break all of those 
 cases
 (e.g. the ones using 'list_entry_is_head()).
>>> 
>>> So I have no problems with breaking cases that we basically already
>>> have a patch for due to  your automated tool. There were certainly
>>> more than a handful, but it didn't look _too_ bad to just make the
>>> rule be "don't use the iterator after the loop".
>>> 
>>> Of course, that's just based on that patch of yours. Maybe there are a
>>> ton of other cases that your patch didn't change, because they didn't
>>> match your trigger case, so I may just be overly optimistic here.
>> 
>> Based on the coccinelle script there are ~480 cases that need fixing
>> in total. I'll now finish all of them and then split them by
>> submodules as Greg suggested and repost a patch set per submodule.
>> Sounds good?
> 
> Sounds good to me!
> 
> If you need help carving these up and maintaining them over time as
> different subsystem maintainers accept/ignore them, just let me know.
> Doing large patchsets like this can be tough without a lot of
> experience.

Very much appreciated!

There will probably be some cases that do not match one of the pattern
we already discussed and need separate attention.

I was planning to start with one subsystem and adjust the coming ones
according to the feedback gather there instead of posting all of them
in one go.

> 
> thanks,
> 
> greg k-h

- Jakob

Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Greg KH
On Mon, Feb 28, 2022 at 01:06:57PM +0100, Jakob Koschel wrote:
> 
> 
> > On 28. Feb 2022, at 12:20, Greg KH  wrote:
> > 
> > On Mon, Feb 28, 2022 at 12:08:18PM +0100, Jakob Koschel wrote:
> >> If the list does not contain the expected element, the value of
> >> list_for_each_entry() iterator will not point to a valid structure.
> >> To avoid type confusion in such case, the list iterator
> >> scope will be limited to list_for_each_entry() loop.
> >> 
> >> In preparation to limiting scope of a list iterator to the list traversal
> >> loop, use a dedicated pointer to point to the found element.
> >> Determining if an element was found is then simply checking if
> >> the pointer is != NULL.
> >> 
> >> Signed-off-by: Jakob Koschel 
> >> ---
> >> arch/x86/kernel/cpu/sgx/encl.c   |  6 +++--
> >> drivers/scsi/scsi_transport_sas.c| 17 -
> >> drivers/thermal/thermal_core.c   | 38 ++--
> >> drivers/usb/gadget/configfs.c| 22 ++--
> >> drivers/usb/gadget/udc/max3420_udc.c | 11 +---
> >> drivers/usb/gadget/udc/tegra-xudc.c  | 11 +---
> >> drivers/usb/mtu3/mtu3_gadget.c   | 11 +---
> >> drivers/usb/musb/musb_gadget.c   | 11 +---
> >> drivers/vfio/mdev/mdev_core.c| 11 +---
> >> 9 files changed, 88 insertions(+), 50 deletions(-)
> > 
> > The drivers/usb/ portion of this patch should be in patch 1/X, right?
> 
> I kept them separate since it's a slightly different case.
> Using 'ptr' instead of '&ptr->other_member'. Regardless, I'll split
> this commit up as you mentioned.
> 
> > 
> > Also, you will have to split these up per-subsystem so that the
> > different subsystem maintainers can take these in their trees.
> 
> Thanks for the feedback.
> To clarify I understand you correctly:
> I should do one patch set per-subsystem with every instance/(file?)
> fixed as a separate commit?

Yes, each file needs a different commit as they are usually all written
or maintained by different people.

As I said in my other response, if you need any help with this, just let
me know, I'm glad to help.

thanks,

greg k-h


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Greg KH
On Tue, Mar 01, 2022 at 12:28:15PM +0100, Jakob Koschel wrote:
> 
> 
> > On 1. Mar 2022, at 01:41, Linus Torvalds  
> > wrote:
> > 
> > On Mon, Feb 28, 2022 at 1:47 PM Jakob Koschel  
> > wrote:
> >> 
> >> The goal of this is to get compiler warnings right? This would indeed be 
> >> great.
> > 
> > Yes, so I don't mind having a one-time patch that has been gathered
> > using some automated checker tool, but I don't think that works from a
> > long-term maintenance perspective.
> > 
> > So if we have the basic rule being "don't use the loop iterator after
> > the loop has finished, because it can cause all kinds of subtle
> > issues", then in _addition_ to fixing the existing code paths that
> > have this issue, I really would want to (a) get a compiler warning for
> > future cases and (b) make it not actually _work_ for future cases.
> > 
> > Because otherwise it will just happen again.
> > 
> >> Changing the list_for_each_entry() macro first will break all of those 
> >> cases
> >> (e.g. the ones using 'list_entry_is_head()).
> > 
> > So I have no problems with breaking cases that we basically already
> > have a patch for due to  your automated tool. There were certainly
> > more than a handful, but it didn't look _too_ bad to just make the
> > rule be "don't use the iterator after the loop".
> > 
> > Of course, that's just based on that patch of yours. Maybe there are a
> > ton of other cases that your patch didn't change, because they didn't
> > match your trigger case, so I may just be overly optimistic here.
> 
> Based on the coccinelle script there are ~480 cases that need fixing
> in total. I'll now finish all of them and then split them by
> submodules as Greg suggested and repost a patch set per submodule.
> Sounds good?

Sounds good to me!

If you need help carving these up and maintaining them over time as
different subsystem maintainers accept/ignore them, just let me know.
Doing large patchsets like this can be tough without a lot of
experience.

thanks,

greg k-h


[PATCH 1/2] drm/amdgpu: Fix sigsev when accessing MMIO on hot unplug.

2022-03-01 Thread Andrey Grodzovsky
Protect with drm_dev_enter/exit

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index f522b52725e4..4294f17cedcb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -23,6 +23,7 @@
  */
 
 #include 
+#include 
 #include "amdgpu.h"
 #include "amdgpu_sched.h"
 #include "amdgpu_ras.h"
@@ -339,7 +340,7 @@ static void amdgpu_ctx_fini(struct kref *ref)
 {
struct amdgpu_ctx *ctx = container_of(ref, struct amdgpu_ctx, refcount);
struct amdgpu_device *adev = ctx->adev;
-   unsigned i, j;
+   unsigned i, j, idx;
 
if (!adev)
return;
@@ -350,7 +351,12 @@ static void amdgpu_ctx_fini(struct kref *ref)
ctx->entities[i][j] = NULL;
}
}
-   amdgpu_ctx_set_stable_pstate(ctx, AMDGPU_CTX_STABLE_PSTATE_NONE);
+
+   if (drm_dev_enter(&adev->ddev, &idx)) {
+   amdgpu_ctx_set_stable_pstate(ctx, 
AMDGPU_CTX_STABLE_PSTATE_NONE);
+   drm_dev_exit(idx);
+   }
+
kfree(ctx);
 }
 
-- 
2.25.1



[PATCH 2/2] drm/amdgpu: Bump minor version for hot plug tests enabliing.

2022-03-01 Thread Andrey Grodzovsky
This will allow to enable the tests only after latest fix
after which the tests passed on my system.

I tested on NV21 standalone and Vega 10 and Polaris as
pair with DRI_PRIME.

It's possible there might be still issues on ASICs i don't
have at my posession but that that the point of enbling
the tests finally - if other people during testing will
encounter errors they will report and I will be able to fix.

The releated merge request for enabling libdrm tests suite  is in
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/227

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 415ecf8b2e05..be4adda8d674 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -100,9 +100,10 @@
  * - 3.43.0 - Add device hot plug/unplug support
  * - 3.44.0 - DCN3 supports DCC independent block settings: !64B && 128B, 64B 
&& 128B
  * - 3.45.0 - Add context ioctl stable pstate interface
+ * * 3.46.0 - Add context ioctl stable pstate interface
  */
 #define KMS_DRIVER_MAJOR   3
-#define KMS_DRIVER_MINOR   45
+#define KMS_DRIVER_MINOR   46
 #define KMS_DRIVER_PATCHLEVEL  0
 
 int amdgpu_vram_limit;
-- 
2.25.1



Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Kees Cook
On Tue, Mar 01, 2022 at 12:28:15PM +0100, Jakob Koschel wrote:
> Based on the coccinelle script there are ~480 cases that need fixing
> in total. I'll now finish all of them and then split them by
> submodules as Greg suggested and repost a patch set per submodule.
> Sounds good?

To help with this splitting, see:
https://github.com/kees/kernel-tools/blob/trunk/split-on-maintainer

It's not perfect, but it'll get you really close. For example, if you
had a single big tree-wide patch applied to your tree:

$ rm 0*.patch
$ git format-patch -1 HEAD
$ mv 0*.patch treewide.patch
$ split-on-maintainer treewide.patch
$ ls 0*.patch

If you have a build log before the patch that spits out warnings, the
--build-log argument can extract those warnings on a per-file basis, too
(though this can be fragile).

-- 
Kees Cook


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Kees Cook
On Mon, Feb 28, 2022 at 04:45:11PM -0800, Linus Torvalds wrote:
> Really. The "-Wshadow doesn't work on the kernel" is not some new
> issue, because you have to do completely insane things to the source
> code to enable it.

The first big glitch with -Wshadow was with shadowed global variables.
GCC 4.8 fixed that, but it still yells about shadowed functions. What
_almost_ works is -Wshadow=local. At first glace, all the warnings
look solvable, but then one will eventually discover __wait_event()
and associated macros that mix when and how deeply it intentionally
shadows variables. :)

Another way to try to catch misused shadow variables is
-Wunused-but-set-varible, but it, too, has tons of false positives.

I tried to capture some of the rationale and research here:
https://github.com/KSPP/linux/issues/152

-- 
Kees Cook


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Greg KH
On Tue, Mar 01, 2022 at 06:40:04PM +0100, Jakob Koschel wrote:
> 
> 
> > On 1. Mar 2022, at 18:36, Greg KH  wrote:
> > 
> > On Tue, Mar 01, 2022 at 12:28:15PM +0100, Jakob Koschel wrote:
> >> 
> >> 
> >>> On 1. Mar 2022, at 01:41, Linus Torvalds  
> >>> wrote:
> >>> 
> >>> On Mon, Feb 28, 2022 at 1:47 PM Jakob Koschel  
> >>> wrote:
>  
>  The goal of this is to get compiler warnings right? This would indeed be 
>  great.
> >>> 
> >>> Yes, so I don't mind having a one-time patch that has been gathered
> >>> using some automated checker tool, but I don't think that works from a
> >>> long-term maintenance perspective.
> >>> 
> >>> So if we have the basic rule being "don't use the loop iterator after
> >>> the loop has finished, because it can cause all kinds of subtle
> >>> issues", then in _addition_ to fixing the existing code paths that
> >>> have this issue, I really would want to (a) get a compiler warning for
> >>> future cases and (b) make it not actually _work_ for future cases.
> >>> 
> >>> Because otherwise it will just happen again.
> >>> 
>  Changing the list_for_each_entry() macro first will break all of those 
>  cases
>  (e.g. the ones using 'list_entry_is_head()).
> >>> 
> >>> So I have no problems with breaking cases that we basically already
> >>> have a patch for due to  your automated tool. There were certainly
> >>> more than a handful, but it didn't look _too_ bad to just make the
> >>> rule be "don't use the iterator after the loop".
> >>> 
> >>> Of course, that's just based on that patch of yours. Maybe there are a
> >>> ton of other cases that your patch didn't change, because they didn't
> >>> match your trigger case, so I may just be overly optimistic here.
> >> 
> >> Based on the coccinelle script there are ~480 cases that need fixing
> >> in total. I'll now finish all of them and then split them by
> >> submodules as Greg suggested and repost a patch set per submodule.
> >> Sounds good?
> > 
> > Sounds good to me!
> > 
> > If you need help carving these up and maintaining them over time as
> > different subsystem maintainers accept/ignore them, just let me know.
> > Doing large patchsets like this can be tough without a lot of
> > experience.
> 
> Very much appreciated!
> 
> There will probably be some cases that do not match one of the pattern
> we already discussed and need separate attention.
> 
> I was planning to start with one subsystem and adjust the coming ones
> according to the feedback gather there instead of posting all of them
> in one go.

That seems wise.  Feel free to use USB as a testing ground for this if
you want to :)

thanks,

greg k-h


Re: [PATCH 2/2] drm/amdgpu: Bump minor version for hot plug tests enabliing.

2022-03-01 Thread Alex Deucher
On Tue, Mar 1, 2022 at 1:08 PM Andrey Grodzovsky
 wrote:
>
> This will allow to enable the tests only after latest fix
> after which the tests passed on my system.
>
> I tested on NV21 standalone and Vega 10 and Polaris as
> pair with DRI_PRIME.
>
> It's possible there might be still issues on ASICs i don't
> have at my posession but that that the point of enbling
> the tests finally - if other people during testing will
> encounter errors they will report and I will be able to fix.
>
> The releated merge request for enabling libdrm tests suite  is in
> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/227
>
> Signed-off-by: Andrey Grodzovsky 

Typo in the title: enabliing -> enabling

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 415ecf8b2e05..be4adda8d674 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -100,9 +100,10 @@
>   * - 3.43.0 - Add device hot plug/unplug support
>   * - 3.44.0 - DCN3 supports DCC independent block settings: !64B && 128B, 
> 64B && 128B
>   * - 3.45.0 - Add context ioctl stable pstate interface
> + * * 3.46.0 - Add context ioctl stable pstate interface

Please update the comment here to say something like bump for hotplug
handing.  With the typo and this fixed, series is:
Reviewed-by: Alex Deucher 


>   */
>  #define KMS_DRIVER_MAJOR   3
> -#define KMS_DRIVER_MINOR   45
> +#define KMS_DRIVER_MINOR   46
>  #define KMS_DRIVER_PATCHLEVEL  0
>
>  int amdgpu_vram_limit;
> --
> 2.25.1
>


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Linus Torvalds
On Mon, Feb 28, 2022 at 2:29 PM James Bottomley
 wrote:
>
> However, if the desire is really to poison the loop variable then we
> can do
>
> #define list_for_each_entry(pos, head, member)  \
> for (pos = list_first_entry(head, typeof(*pos), member);\
>  !list_entry_is_head(pos, head, member) && ((pos = NULL) == NULL; 
>   \
>  pos = list_next_entry(pos, member))
>
> Which would at least set pos to NULL when the loop completes.

That would actually have been excellent if we had done that
originally. It would not only avoid the stale and incorrectly typed
head entry left-over turd, it would also have made it very easy to
test for "did I find an entry in the loop".

But I don't much like it in the situation we are now.

Why? Mainly because it basically changes the semantics of the loop
_without_ any warnings about it.  And we don't actually get the
advantage of the nicer semantics, because we can't actually make code
do

list_for_each_entry(entry, ) {
..
}
if (!entry)
return -ESRCH;
.. use the entry we found ..

because that would be a disaster for back-porting, plus it would be a
flag-day issue (ie we'd have to change the semantics of the loop at
the same time we change every single user).

So instead of that simple "if (!entry)", we'd effectively have to
continue to use something that still works with the old world order
(ie that "if (list_entry_is_head())" model).

So we couldn't really take _advantage_ of the nicer semantics, and
we'd not even get a warning if somebody does it wrong - the code would
just silently do the wrong thing.

IOW: I don't think you are wrong about that patch: it would solve the
problem that Jakob wants to solve, and it would have absolutely been
much better if we had done this from the beginning. But I think that
in our current situation, it's actually a really fragile solution to
the "don't do that then" problem we have.

  Linus


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Matthew Wilcox
On Tue, Mar 01, 2022 at 10:14:07AM -0800, Kees Cook wrote:
> On Mon, Feb 28, 2022 at 04:45:11PM -0800, Linus Torvalds wrote:
> > Really. The "-Wshadow doesn't work on the kernel" is not some new
> > issue, because you have to do completely insane things to the source
> > code to enable it.
> 
> The first big glitch with -Wshadow was with shadowed global variables.
> GCC 4.8 fixed that, but it still yells about shadowed functions. What
> _almost_ works is -Wshadow=local. At first glace, all the warnings
> look solvable, but then one will eventually discover __wait_event()
> and associated macros that mix when and how deeply it intentionally
> shadows variables. :)

Well, that's just disgusting.  Macros fundamentally shouldn't be
referring to things that aren't in their arguments.  The first step to
cleaning this up is ...

I'll take a look at the rest of cleaning this up soon.

>From 28ffe35d56223d4242b915832299e5acc926737e Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)" 
Date: Tue, 1 Mar 2022 13:47:07 -0500
Subject: [PATCH] wait: Parameterize the return variable to ___wait_event()

Macros should not refer to variables which aren't in their arguments.
Pass the name from its callers.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/swait.h| 12 ++--
 include/linux/wait.h | 32 
 include/linux/wait_bit.h |  4 ++--
 3 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/swait.h b/include/linux/swait.h
index 6a8c22b8c2a5..5e8e9b13be2d 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -191,14 +191,14 @@ do {  
\
 } while (0)
 
 #define __swait_event_timeout(wq, condition, timeout)  \
-   ___swait_event(wq, ___wait_cond_timeout(condition), \
+   ___swait_event(wq, ___wait_cond_timeout(condition, __ret),  \
  TASK_UNINTERRUPTIBLE, timeout,\
  __ret = schedule_timeout(__ret))
 
 #define swait_event_timeout_exclusive(wq, condition, timeout)  \
 ({ \
long __ret = timeout;   \
-   if (!___wait_cond_timeout(condition))   \
+   if (!___wait_cond_timeout(condition, __ret))\
__ret = __swait_event_timeout(wq, condition, timeout);  \
__ret;  \
 })
@@ -216,14 +216,14 @@ do {  
\
 })
 
 #define __swait_event_interruptible_timeout(wq, condition, timeout)\
-   ___swait_event(wq, ___wait_cond_timeout(condition), \
+   ___swait_event(wq, ___wait_cond_timeout(condition, __ret),  \
  TASK_INTERRUPTIBLE, timeout,  \
  __ret = schedule_timeout(__ret))
 
 #define swait_event_interruptible_timeout_exclusive(wq, condition, timeout)\
 ({ \
long __ret = timeout;   \
-   if (!___wait_cond_timeout(condition))   \
+   if (!___wait_cond_timeout(condition, __ret))\
__ret = __swait_event_interruptible_timeout(wq, \
condition, timeout);\
__ret;  \
@@ -252,7 +252,7 @@ do {
\
 } while (0)
 
 #define __swait_event_idle_timeout(wq, condition, timeout) \
-   ___swait_event(wq, ___wait_cond_timeout(condition), \
+   ___swait_event(wq, ___wait_cond_timeout(condition, __ret),  \
   TASK_IDLE, timeout,  \
   __ret = schedule_timeout(__ret))
 
@@ -278,7 +278,7 @@ do {
\
 #define swait_event_idle_timeout_exclusive(wq, condition, timeout) \
 ({ \
long __ret = timeout;   \
-   if (!___wait_cond_timeout(condition))   \
+   if (!___wait_cond_timeout(condition, __ret))\
__ret = __swait_event_idle_timeout(wq,  \
   condition, timeout); \
__ret;  \
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 851e07da2583..890cce3c0f2e 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -271,7 +271,7 @@ static inline void 

Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Linus Torvalds
On Tue, Mar 1, 2022 at 10:14 AM Kees Cook  wrote:
>
> The first big glitch with -Wshadow was with shadowed global variables.
> GCC 4.8 fixed that, but it still yells about shadowed functions. What
> _almost_ works is -Wshadow=local.

Heh. Yeah, I just have long memories of "-Wshadow was a disaster". You
looked into the details.

> Another way to try to catch misused shadow variables is
> -Wunused-but-set-varible, but it, too, has tons of false positives.

That on the face of it should be an easy warning to get technically
right for a compiler.

So I assume the "false positives" are simply because we end up having
various variables that really don't end up being used - and
"intentionally" so).

Or rather, they might only be used under some config option - perhaps
the use is even syntactically there and parsed, but the compiler
notices that it's turned off under some

if (IS_ENABLED(..))

option? Because yeah, we have a lot of those.

I think that's a common theme with a lot of compiler warnings: on the
face of it they sound "obviously sane" and nobody should ever write
code like that.

A conditional that is always true? Sounds idiotic, and sounds like a
reasonable thing for a compiler to warn about, since why would you
have a conditional in the first place for that?

But then you realize that maybe the conditional is a build config
option, and "always true" suddenly makes sense. Or it's a test for
something that is always true on _that_architecture_ but not in some
general sense (ie testing "sizeof()"). Or it's a purely syntactic
conditional, like "do { } while (0)".

It's why I'm often so down on a lot of the odd warnings that are
hiding under W=1 and friends. They all may make sense in the trivial
case ("That is insane") but then in the end they happen for sane code.

And yeah, -Wshadow has had tons of history with macro nesting, and
just being badly done in the first place (eg "strlen" can be a
perfectly fine local variable).

That said, maybe people could ask the gcc and clan people for a way to
_mark_ the places where we expect to validly see shadowing. For
example, that "local variable in a macro expression statement" thing
is absolutely horrendous to fix with preprocessor tricks to try to
make for unique identifiers.

But I think it would be much more syntactically reasonable to add (for
example) a "shadow" attribute to such a variable exactly to tell the
compiler "yeah, yeah, I know this identifier could shadow an outer
one" and turn it off that way.

   Linus


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Linus Torvalds
On Tue, Mar 1, 2022 at 11:06 AM Linus Torvalds
 wrote:
>
> So instead of that simple "if (!entry)", we'd effectively have to
> continue to use something that still works with the old world order
> (ie that "if (list_entry_is_head())" model).

Just to prove my point about how this is painful, that doesn't work at all.

If the loop iterator at the end is NULL (good, in theory), we can't
use "list_entry_is_head()" to check whether we ended. We'd have to use
a new thing entirely, to handle the "list_for_each_entry() has the
old/new semantics" cases.

That's largely why I was pushing for the "let's make it impossible to
use the loop iterator at all outside the loop". It avoids the
confusing case, and the patches to move to that stricter semantic can
be merged independently (and before) doing the actual semantic change.

I'm not saying my suggested approach is wonderful either. Honestly,
it's painful that we have so nasty semantics for the end-of-loop case
for list_for_each_entry().

The minimal patch would clearly be to keep those broken semantics, and
just force everybody to use the list_entry_is_head() case. That's the
"we know we messed up, we are too lazy to fix it, we'll just work
around it and people need to be careful" approach.

And laziness is a virtue. But bad semantics are bad semantics. So it's
a question of balancing those two issues.

   Linus


[PATCH AUTOSEL 5.16 20/28] drm/amdgpu: bypass tiling flag check in virtual display case (v2)

2022-03-01 Thread Sasha Levin
From: Guchun Chen 

[ Upstream commit e2b993302f40c4eb714ecf896dd9e1c5be7d4cd7 ]

vkms leverages common amdgpu framebuffer creation, and
also as it does not support FB modifier, there is no need
to check tiling flags when initing framebuffer when virtual
display is enabled.

This can fix below calltrace:

amdgpu :00:08.0: GFX9+ requires FB check based on format modifier
WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 
amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu]

v2: check adev->enable_virtual_display instead as vkms can be
enabled in bare metal as well.

Signed-off-by: Leslie Shi 
Signed-off-by: Guchun Chen 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index dc50c05f23fc2..5c08047adb594 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -1145,7 +1145,7 @@ int amdgpu_display_framebuffer_init(struct drm_device 
*dev,
if (ret)
return ret;
 
-   if (!dev->mode_config.allow_fb_modifiers) {
+   if (!dev->mode_config.allow_fb_modifiers && 
!adev->enable_virtual_display) {
drm_WARN_ONCE(dev, adev->family >= AMDGPU_FAMILY_AI,
  "GFX9+ requires FB check based on format 
modifier\n");
ret = check_tiling_flags_gfx6(rfb);
-- 
2.34.1



[PATCH AUTOSEL 5.15 16/23] drm/amdgpu: bypass tiling flag check in virtual display case (v2)

2022-03-01 Thread Sasha Levin
From: Guchun Chen 

[ Upstream commit e2b993302f40c4eb714ecf896dd9e1c5be7d4cd7 ]

vkms leverages common amdgpu framebuffer creation, and
also as it does not support FB modifier, there is no need
to check tiling flags when initing framebuffer when virtual
display is enabled.

This can fix below calltrace:

amdgpu :00:08.0: GFX9+ requires FB check based on format modifier
WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 
amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu]

v2: check adev->enable_virtual_display instead as vkms can be
enabled in bare metal as well.

Signed-off-by: Leslie Shi 
Signed-off-by: Guchun Chen 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index dc50c05f23fc2..5c08047adb594 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -1145,7 +1145,7 @@ int amdgpu_display_framebuffer_init(struct drm_device 
*dev,
if (ret)
return ret;
 
-   if (!dev->mode_config.allow_fb_modifiers) {
+   if (!dev->mode_config.allow_fb_modifiers && 
!adev->enable_virtual_display) {
drm_WARN_ONCE(dev, adev->family >= AMDGPU_FAMILY_AI,
  "GFX9+ requires FB check based on format 
modifier\n");
ret = check_tiling_flags_gfx6(rfb);
-- 
2.34.1



Re: [PATCH v13 5/5] drm/amdgpu: add drm buddy support to amdgpu

2022-03-01 Thread Arunpravin



On 22/02/22 6:24 pm, Christian König wrote:
> Am 21.02.22 um 17:45 schrieb Arunpravin:
>> - Remove drm_mm references and replace with drm buddy functionalities
>> - Add res cursor support for drm buddy
>>
>> v2(Matthew Auld):
>>- replace spinlock with mutex as we call kmem_cache_zalloc
>>  (..., GFP_KERNEL) in drm_buddy_alloc() function
>>
>>- lock drm_buddy_block_trim() function as it calls
>>  mark_free/mark_split are all globally visible
>>
>> v3(Matthew Auld):
>>- remove trim method error handling as we address the failure case
>>  at drm_buddy_block_trim() function
>>
>> v4:
>>- fix warnings reported by kernel test robot 
>>
>> v5:
>>- fix merge conflict issue
>>
>> v6:
>>- fix warnings reported by kernel test robot 
>>
>> v7:
>>- remove DRM_BUDDY_RANGE_ALLOCATION flag usage
>>
>> v8:
>>- keep DRM_BUDDY_RANGE_ALLOCATION flag usage
>>- resolve conflicts created by drm/amdgpu: remove VRAM accounting v2
>>
>> Signed-off-by: Arunpravin 
>> ---
>>   drivers/gpu/drm/Kconfig   |   1 +
>>   .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h|  97 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |   7 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  | 256 ++
>>   4 files changed, 229 insertions(+), 132 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
>> index 763355330b17..019ec0440ced 100644
>> --- a/drivers/gpu/drm/Kconfig
>> +++ b/drivers/gpu/drm/Kconfig
>> @@ -279,6 +279,7 @@ config DRM_AMDGPU
>>  select HWMON
>>  select BACKLIGHT_CLASS_DEVICE
>>  select INTERVAL_TREE
>> +select DRM_BUDDY
>>  help
>>Choose this option if you have a recent AMD Radeon graphics card.
>>   
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>> index acfa207cf970..da12b4ff2e45 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
>> @@ -30,12 +30,15 @@
>>   #include 
>>   #include 
>>   
>> +#include "amdgpu_vram_mgr.h"
>> +
>>   /* state back for walking over vram_mgr and gtt_mgr allocations */
>>   struct amdgpu_res_cursor {
>>  uint64_tstart;
>>  uint64_tsize;
>>  uint64_tremaining;
>> -struct drm_mm_node  *node;
>> +void*node;
>> +uint32_tmem_type;
>>   };
>>   
>>   /**
>> @@ -52,27 +55,63 @@ static inline void amdgpu_res_first(struct ttm_resource 
>> *res,
>>  uint64_t start, uint64_t size,
>>  struct amdgpu_res_cursor *cur)
>>   {
>> +struct drm_buddy_block *block;
>> +struct list_head *head, *next;
>>  struct drm_mm_node *node;
>>   
>> -if (!res || res->mem_type == TTM_PL_SYSTEM) {
>> -cur->start = start;
>> -cur->size = size;
>> -cur->remaining = size;
>> -cur->node = NULL;
>> -WARN_ON(res && start + size > res->num_pages << PAGE_SHIFT);
>> -return;
>> -}
>> +if (!res)
>> +goto err_out;
> 
> It's not really an error to not have a resource. So I would rather name 
> the label fallback or something like that.
> 
>>   
>>  BUG_ON(start + size > res->num_pages << PAGE_SHIFT);
>>   
>> -node = to_ttm_range_mgr_node(res)->mm_nodes;
>> -while (start >= node->size << PAGE_SHIFT)
>> -start -= node++->size << PAGE_SHIFT;
>> +cur->mem_type = res->mem_type;
>> +
>> +switch (cur->mem_type) {
>> +case TTM_PL_VRAM:
>> +head = &to_amdgpu_vram_mgr_node(res)->blocks;
>> +
>> +block = list_first_entry_or_null(head,
>> + struct drm_buddy_block,
>> + link);
>> +if (!block)
>> +goto err_out;
>> +
>> +while (start >= amdgpu_node_size(block)) {
>> +start -= amdgpu_node_size(block);
>> +
>> +next = block->link.next;
>> +if (next != head)
>> +block = list_entry(next, struct 
>> drm_buddy_block, link);
>> +}
>> +
>> +cur->start = amdgpu_node_start(block) + start;
>> +cur->size = min(amdgpu_node_size(block) - start, size);
>> +cur->remaining = size;
>> +cur->node = block;
>> +break;
>> +case TTM_PL_TT:
>> +node = to_ttm_range_mgr_node(res)->mm_nodes;
>> +while (start >= node->size << PAGE_SHIFT)
>> +start -= node++->size << PAGE_SHIFT;
>> +
>> +cur->start = (node->start << PAGE_SHIFT) + start;
>> +cur->size = min((node->size << PAGE_SHIFT) - start, size);
>> +cur->remaining = size;
>> +cur->node = node;
>> +break;
>> +default:
>>

[PATCH v9] drm/amdgpu: add drm buddy support to amdgpu

2022-03-01 Thread Arunpravin
- Remove drm_mm references and replace with drm buddy functionalities
- Add res cursor support for drm buddy

v2(Matthew Auld):
  - replace spinlock with mutex as we call kmem_cache_zalloc
(..., GFP_KERNEL) in drm_buddy_alloc() function

  - lock drm_buddy_block_trim() function as it calls
mark_free/mark_split are all globally visible

v3(Matthew Auld):
  - remove trim method error handling as we address the failure case
at drm_buddy_block_trim() function

v4:
  - fix warnings reported by kernel test robot 

v5:
  - fix merge conflict issue

v6:
  - fix warnings reported by kernel test robot 

v7:
  - remove DRM_BUDDY_RANGE_ALLOCATION flag usage

v8:
  - keep DRM_BUDDY_RANGE_ALLOCATION flag usage
  - resolve conflicts created by drm/amdgpu: remove VRAM accounting v2

v9(Christian):
  - rename label name as fallback
  - move struct amdgpu_vram_mgr to amdgpu_vram_mgr.h
  - remove unnecessary flags from struct amdgpu_vram_reservation
  - rewrite block NULL check condition
  - change else style as per coding standard
  - rewrite the node max size
  - add a helper function to fetch the first entry from the list

Signed-off-by: Arunpravin 
---
 drivers/gpu/drm/Kconfig   |   1 +
 .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h|  97 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  | 249 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h  |  68 +
 5 files changed, 289 insertions(+), 136 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index f1422bee3dcc..5133c3f028ab 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -280,6 +280,7 @@ config DRM_AMDGPU
select HWMON
select BACKLIGHT_CLASS_DEVICE
select INTERVAL_TREE
+   select DRM_BUDDY
help
  Choose this option if you have a recent AMD Radeon graphics card.
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index acfa207cf970..864c609ba00b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -30,12 +30,15 @@
 #include 
 #include 
 
+#include "amdgpu_vram_mgr.h"
+
 /* state back for walking over vram_mgr and gtt_mgr allocations */
 struct amdgpu_res_cursor {
uint64_tstart;
uint64_tsize;
uint64_tremaining;
-   struct drm_mm_node  *node;
+   void*node;
+   uint32_tmem_type;
 };
 
 /**
@@ -52,27 +55,63 @@ static inline void amdgpu_res_first(struct ttm_resource 
*res,
uint64_t start, uint64_t size,
struct amdgpu_res_cursor *cur)
 {
+   struct drm_buddy_block *block;
+   struct list_head *head, *next;
struct drm_mm_node *node;
 
-   if (!res || res->mem_type == TTM_PL_SYSTEM) {
-   cur->start = start;
-   cur->size = size;
-   cur->remaining = size;
-   cur->node = NULL;
-   WARN_ON(res && start + size > res->num_pages << PAGE_SHIFT);
-   return;
-   }
+   if (!res)
+   goto fallback;
 
BUG_ON(start + size > res->num_pages << PAGE_SHIFT);
 
-   node = to_ttm_range_mgr_node(res)->mm_nodes;
-   while (start >= node->size << PAGE_SHIFT)
-   start -= node++->size << PAGE_SHIFT;
+   cur->mem_type = res->mem_type;
+
+   switch (cur->mem_type) {
+   case TTM_PL_VRAM:
+   head = &to_amdgpu_vram_mgr_node(res)->blocks;
+
+   block = list_first_entry_or_null(head,
+struct drm_buddy_block,
+link);
+   if (!block)
+   goto fallback;
+
+   while (start >= amdgpu_node_size(block)) {
+   start -= amdgpu_node_size(block);
+
+   next = block->link.next;
+   if (next != head)
+   block = list_entry(next, struct 
drm_buddy_block, link);
+   }
+
+   cur->start = amdgpu_node_start(block) + start;
+   cur->size = min(amdgpu_node_size(block) - start, size);
+   cur->remaining = size;
+   cur->node = block;
+   break;
+   case TTM_PL_TT:
+   node = to_ttm_range_mgr_node(res)->mm_nodes;
+   while (start >= node->size << PAGE_SHIFT)
+   start -= node++->size << PAGE_SHIFT;
+
+   cur->start = (node->start << PAGE_SHIFT) + start;
+   cur->size = min((node->size << PAGE_SHIFT) - start, size);
+   cur->remaining = size;
+   cur->node = node;
+   break;
+

Re: [PATCH 6/6] treewide: remove check of list iterator against head past the loop body

2022-03-01 Thread Linus Torvalds
So looking at this patch, I really reacted to the fact that quite
often the "use outside the loop" case is all kinds of just plain
unnecessary, but _used_ to be a convenience feature.

I'll just quote the first chunk in it's entirely as an example - not
because I think this chunk is particularly important, but because it's
a good example:

On Mon, Feb 28, 2022 at 3:09 AM Jakob Koschel  wrote:
>
> diff --git a/arch/arm/mach-mmp/sram.c b/arch/arm/mach-mmp/sram.c
> index 6794e2db1ad5..fc47c107059b 100644
> --- a/arch/arm/mach-mmp/sram.c
> +++ b/arch/arm/mach-mmp/sram.c
> @@ -39,19 +39,22 @@ static LIST_HEAD(sram_bank_list);
>  struct gen_pool *sram_get_gpool(char *pool_name)
>  {
> struct sram_bank_info *info = NULL;
> +   struct sram_bank_info *tmp;
>
> if (!pool_name)
> return NULL;
>
> mutex_lock(&sram_lock);
>
> -   list_for_each_entry(info, &sram_bank_list, node)
> -   if (!strcmp(pool_name, info->pool_name))
> +   list_for_each_entry(tmp, &sram_bank_list, node)
> +   if (!strcmp(pool_name, tmp->pool_name)) {
> +   info = tmp;
> break;
> +   }
>
> mutex_unlock(&sram_lock);
>
> -   if (&info->node == &sram_bank_list)
> +   if (!info)
> return NULL;
>
> return info->gpool;

I realize this was probably at least auto-generated with coccinelle,
but maybe that script could be taught to do avoid the "use after loop"
by simply moving the code _into_ the loop.

IOW, this all would be cleaner and clear written as

if (!pool_name)
return NULL;

mutex_lock(&sram_lock);
list_for_each_entry(info, &sram_bank_list, node) {
if (!strcmp(pool_name, info->pool_name)) {
mutex_unlock(&sram_lock);
return info;
}
}
mutex_unlock(&sram_lock);
return NULL;

Ta-daa - no use outside the loop, no need for new variables, just a
simple "just do it inside the loop". Yes, we end up having that lock
thing twice, but it looks worth it from a "make the code obvious"
standpoint.

Would it be even cleaner if the locking was done in the caller, and
the loop was some simple helper function? It probably would. But that
would require a bit more smarts than probably a simple coccinelle
script would do.

Linus


RE: [PATCH] drm/amdgpu: Move common initialization operations of each ras block to one function

2022-03-01 Thread Chai, Thomas
[AMD Official Use Only]

Hi Stanley,
Thanks for your suggestion. 
I add a comment after your comment.

-Original Message-
From: Yang, Stanley  
Sent: Tuesday, March 1, 2022 9:50 PM
To: Chai, Thomas ; amd-gfx@lists.freedesktop.org
Cc: Zhou1, Tao ; Zhang, Hawking ; 
Clements, John ; Chai, Thomas ; 
Chai, Thomas 
Subject: 回复: [PATCH] drm/amdgpu: Move common initialization operations of each 
ras block to one function

[AMD Official Use Only]

Hi yipe,

One suggestion for this patch, please check my comment.

Regards,
Stanley
> -邮件原件-
> 发件人: amd-gfx  代表 yipechai
> 发送时间: Tuesday, March 1, 2022 5:46 PM
> 收件人: amd-gfx@lists.freedesktop.org
> 抄送: Zhou1, Tao ; Zhang, Hawking 
> ; Clements, John ; Chai, 
> Thomas ; Chai, Thomas 
> 主题: [PATCH] drm/amdgpu: Move common initialization operations of each 
> ras block to one function
>
> Define amdgpu_ras_sw_init function to initialize all ras blocks.
>
> Signed-off-by: yipechai 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   6 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c|   2 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 143
> -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h|   1 +
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |  21 ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |  16 ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  28 
>  drivers/gpu/drm/amd/amdgpu/mca_v3_0.c  |   6 -
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  17 ---
>  9 files changed, 148 insertions(+), 92 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 6113ddc765a7..72550e9f6058 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2402,6 +2402,12 @@ static int amdgpu_device_ip_init(struct 
> amdgpu_device *adev)
>   }
>   }
>
> + r = amdgpu_ras_sw_init(adev);
> + if (r) {
> + DRM_ERROR("amdgpu_ras_early_init failed (%d).\n", r);
> + goto init_failed;
> + }
> [Yang, Stanley] : This is ras blocks early init, I  think it's more 
> reasonable to move amdgpu_ras_sw_init before amdgpu_ras_init function.
 
  [Thomas] Sorry, I will change this error message print.
I also agree with you to initialize all ras operations 
before amdgpu_ras_init,  but the initialization place of each ras instance is 
different.  
The ras block instances of  mmhub umc sdma and hdp are initialized 
in IP .early_init function, but gfx and mca ras block instances are initialized 
in the IP .sw_init function. 
   Then, after calling all IP .sw_init initialization and then calling 
amdgpu_ras_sw_init may be the earliest place to initialize all ras blocks. 
   

> +
>   if (amdgpu_sriov_vf(adev))
>   amdgpu_virt_init_data_exchange(adev);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index ab75e189bc0b..544241f357b2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -440,8 +440,6 @@ int amdgpu_gmc_ras_early_init(struct amdgpu_device 
> *adev)  {
>   if (!adev->gmc.xgmi.connected_to_cpu) {
>   adev->gmc.xgmi.ras = &xgmi_ras;
> - amdgpu_ras_register_ras_block(adev, &adev-
> >gmc.xgmi.ras->ras_block);
> - adev->gmc.xgmi.ras_if = &adev->gmc.xgmi.ras-
> >ras_block.ras_comm;
>   }
>
>   return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index d3875618ebf5..89075ab9e82e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2299,8 +2299,6 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
>   case CHIP_ALDEBARAN:
>   if (!adev->gmc.xgmi.connected_to_cpu) {
>   adev->nbio.ras = &nbio_v7_4_ras;
> - amdgpu_ras_register_ras_block(adev, &adev-
> >nbio.ras->ras_block);
> - adev->nbio.ras_if = &adev->nbio.ras-
> >ras_block.ras_comm;
>   }
>   break;
>   default:
> @@ -2533,6 +2531,147 @@ void amdgpu_ras_suspend(struct amdgpu_device 
> *adev)
>   amdgpu_ras_disable_all_features(adev, 1);  }
>
> +int amdgpu_ras_sw_init(struct amdgpu_device *adev) {
> + int err = 0;
> +
> + if (!amdgpu_ras_asic_supported(adev))
> + return 0;
> +
> + if (adev->nbio.ras) {
> + err = amdgpu_ras_register_ras_block(adev, &adev-
> >nbio.ras->ras_block);
> + if (err) {
> + dev_err(adev->dev, "Failed to register nbio ras
> block!\n");
> + return err;
> + }
> + adev->nbio.ras_if = &adev->nbio.ras->ras_block.ras_comm;
> + }
> +
> + if (adev->gmc.xgmi.ras) {
> + err = amdgpu_ras_register_ras_block(adev, &adev-
> >gmc.xgmi.ras->ras_bl

Re: [PATCH v9] drm/amdgpu: add drm buddy support to amdgpu

2022-03-01 Thread kernel test robot
Hi Arunpravin,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on drm/drm-next]
[also build test WARNING on drm-tip/drm-tip]
[cannot apply to drm-intel/for-linux-next v5.17-rc6 next-20220301]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Arunpravin/drm-amdgpu-add-drm-buddy-support-to-amdgpu/20220302-043936
base:   git://anongit.freedesktop.org/drm/drm drm-next
config: ia64-randconfig-r011-20220301 
(https://download.01.org/0day-ci/archive/20220302/202203020700.yfcutgqz-...@intel.com/config)
compiler: ia64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/22a06757ae067e29c09a9d95eaf2b9053833740f
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Arunpravin/drm-amdgpu-add-drm-buddy-support-to-amdgpu/20220302-043936
git checkout 22a06757ae067e29c09a9d95eaf2b9053833740f
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross 
O=build_dir ARCH=ia64 SHELL=/bin/bash drivers/gpu/drm/amd/amdgpu/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c: In function 
'amdgpu_vram_mgr_new':
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:360:28: warning: variable 
>> 'mem_bytes' set but not used [-Wunused-but-set-variable]
 360 | u64 vis_usage = 0, mem_bytes, max_bytes, min_page_size;
 |^


vim +/mem_bytes +360 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c

   343  
   344  /**
   345   * amdgpu_vram_mgr_new - allocate new ranges
   346   *
   347   * @man: TTM memory type manager
   348   * @tbo: TTM BO we need this range for
   349   * @place: placement flags and restrictions
   350   * @res: the resulting mem object
   351   *
   352   * Allocate VRAM for the given BO.
   353   */
   354  static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
   355 struct ttm_buffer_object *tbo,
   356 const struct ttm_place *place,
   357 struct ttm_resource **res)
   358  {
   359  unsigned long lpfn, pages_per_node, pages_left, pages;
 > 360  u64 vis_usage = 0, mem_bytes, max_bytes, min_page_size;
   361  struct amdgpu_vram_mgr *mgr = to_vram_mgr(man);
   362  struct amdgpu_device *adev = to_amdgpu_device(mgr);
   363  struct amdgpu_vram_mgr_node *node;
   364  struct drm_buddy *mm = &mgr->mm;
   365  struct drm_buddy_block *block;
   366  unsigned i;
   367  int r;
   368  
   369  lpfn = place->lpfn;
   370  if (!lpfn)
   371  lpfn = man->size >> PAGE_SHIFT;
   372  
   373  max_bytes = adev->gmc.mc_vram_size;
   374  if (tbo->type != ttm_bo_type_kernel)
   375  max_bytes -= AMDGPU_VM_RESERVED_VRAM;
   376  
   377  mem_bytes = tbo->base.size;
   378  if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
   379  pages_per_node = ~0ul;
   380  } else {
   381  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
   382  pages_per_node = HPAGE_PMD_NR;
   383  #else
   384  /* default to 2MB */
   385  pages_per_node = 2UL << (20UL - PAGE_SHIFT);
   386  #endif
   387  pages_per_node = max_t(uint32_t, pages_per_node,
   388 tbo->page_alignment);
   389  }
   390  
   391  node = kzalloc(sizeof(*node), GFP_KERNEL);
   392  if (!node)
   393  return -ENOMEM;
   394  
   395  ttm_resource_init(tbo, place, &node->base);
   396  
   397  /* bail out quickly if there's likely not enough VRAM for this 
BO */
   398  if (ttm_resource_manager_usage(man) > max_bytes) {
   399  r = -ENOSPC;
   400  goto error_fini;
   401  }
   402  
   403  INIT_LIST_HEAD(&node->blocks);
   404  
   405  if (place->flags & TTM_PL_FLAG_TOPDOWN)
   406  node->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
   407  
   408  if (place->fpfn || lpfn != man->size >> PAGE_SHIFT)
   409  /* Allocate blocks in desired range */
   410  node->flags |=

[PATCH] drm/amdgpu: fix potential null dereference

2022-03-01 Thread Weiguo Li
"ctx" is dereferenced but null checked later. Swap their positions
to avoid potential null dereference.

Found using a Coccinelle script:
https://coccinelle.gitlabpages.inria.fr/website/rules/mini_null_ref.cocci

Signed-off-by: Weiguo Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index f522b52725e4..b4f035ce44bc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -258,11 +258,12 @@ static void amdgpu_ctx_fini_entity(struct 
amdgpu_ctx_entity *entity)
 static int amdgpu_ctx_get_stable_pstate(struct amdgpu_ctx *ctx,
u32 *stable_pstate)
 {
-   struct amdgpu_device *adev = ctx->adev;
+   struct amdgpu_device *adev;
enum amd_dpm_forced_level current_level;
 
if (!ctx)
return -EINVAL;
+   adev = ctx->adev;
 
current_level = amdgpu_dpm_get_performance_level(adev);
 
@@ -289,12 +290,13 @@ static int amdgpu_ctx_get_stable_pstate(struct amdgpu_ctx 
*ctx,
 static int amdgpu_ctx_set_stable_pstate(struct amdgpu_ctx *ctx,
u32 stable_pstate)
 {
-   struct amdgpu_device *adev = ctx->adev;
+   struct amdgpu_device *adev;
enum amd_dpm_forced_level level;
int r;
 
if (!ctx)
return -EINVAL;
+   adev = ctx->adev;
 
mutex_lock(&adev->pm.stable_pstate_ctx_lock);
if (adev->pm.stable_pstate_ctx && adev->pm.stable_pstate_ctx != ctx) {
-- 
2.25.1



RE: [PATCH Review 1/1] drm/amdgpu: support send bad channel info to smu

2022-03-01 Thread Zhou1, Tao
[AMD Official Use Only]



> -Original Message-
> From: Stanley.Yang 
> Sent: Tuesday, March 1, 2022 9:30 PM
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> ; Zhou1, Tao ; Joo, Maria
> 
> Cc: Yang, Stanley 
> Subject: [PATCH Review 1/1] drm/amdgpu: support send bad channel info to
> smu
> 
> Message SMU bad channel information bitmap to update OOB table
> 
> Change-Id: I49a79af64d5263c28db059ecb8b8405a471431b4
> Signed-off-by: Stanley.Yang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   |  7 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h   |  3 ++
>  .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 25 ++-
>  .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h|  4 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c   |  5 +++
>  drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 12 ++
>  drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  1 +
>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 10 +
>  drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  7 +++
>  .../pm/swsmu/inc/pmfw_if/aldebaran_ppsmc.h|  3 +-
>  drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |  3 +-
>  .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 43 +++
>  12 files changed, 119 insertions(+), 4 deletions(-)

[Tao] It's better to split the patch into two parts, one for amdgpu and one for 
pm.

> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index d3875618ebf5..f9104f99eb9c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2068,6 +2068,7 @@ int amdgpu_ras_recovery_init(struct amdgpu_device
> *adev)
>   mutex_init(&con->recovery_lock);
>   INIT_WORK(&con->recovery_work, amdgpu_ras_do_recovery);
>   atomic_set(&con->in_recovery, 0);
> + con->eeprom_control.bad_channel_bitmap = 0;
> 
>   max_eeprom_records_count =
> amdgpu_ras_eeprom_max_record_count();
>   amdgpu_ras_validate_threshold(adev, max_eeprom_records_count);
> @@ -2092,6 +2093,11 @@ int amdgpu_ras_recovery_init(struct amdgpu_device
> *adev)
>   goto free;
> 
>   amdgpu_dpm_send_hbm_bad_pages_num(adev, con-
> >eeprom_control.ras_num_recs);
> +
> + if (con->update_channel_flag == true) {
[Tao] It can be simplified to "if (con->update_channel_flag)"

> + amdgpu_dpm_send_hbm_bad_channel_flag(adev, con-
> >eeprom_control.bad_channel_bitmap);

[Tao] do we need to check status of the function and stop recovery_init if it 
fails?

> + con->update_channel_flag = false;
> + }
>   }
> 
>  #ifdef CONFIG_X86_MCE_AMD
> @@ -2285,6 +2291,7 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
>   goto release_con;
>   }
> 
> + con->update_channel_flag = false;
>   con->features = 0;
>   INIT_LIST_HEAD(&con->head);
>   /* Might need get this flag from vbios. */ diff --git
> a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> index 7cddaad90d6d..9314fde81e68 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> @@ -374,6 +374,9 @@ struct amdgpu_ras {
> 
>   /* record umc error info queried from smu */
>   struct umc_ecc_info umc_ecc;
> +
> + /* Indicates smu whether need update bad channel info */
> + bool update_channel_flag;
>  };
> 
>  struct ras_fs_data {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> index 2b844a5aafdb..ad5d8667756d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> @@ -265,6 +265,7 @@ int amdgpu_ras_eeprom_reset_table(struct
> amdgpu_ras_eeprom_control *control)  {
>   struct amdgpu_device *adev = to_amdgpu_device(control);
>   struct amdgpu_ras_eeprom_table_header *hdr = &control->tbl_hdr;
> + struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
>   u8 csum;
>   int res;
> 
> @@ -285,6 +286,10 @@ int amdgpu_ras_eeprom_reset_table(struct
> amdgpu_ras_eeprom_control *control)
> 
>   amdgpu_dpm_send_hbm_bad_pages_num(adev, control-
> >ras_num_recs);
> 
> + control->bad_channel_bitmap = 0;
> + amdgpu_dpm_send_hbm_bad_channel_flag(adev, control-
> >bad_channel_bitmap);
> + con->update_channel_flag = false;
> +
>   amdgpu_ras_debugfs_set_ret_size(control);
> 
>   mutex_unlock(&control->ras_tbl_mutex);
> @@ -418,6 +423,7 @@ amdgpu_ras_eeprom_append_table(struct
> amdgpu_ras_eeprom_control *control,
>  struct eeprom_table_record *record,
>  const u32 num)
>  {
> + struct amdgpu_ras *con =
> +amdgpu_ras_get_context(to_amdgpu_device(control));
>   u32 a, b, i;
>   u8 *buf, *pp;
>   int res;
> @@ -429,9 +435,16 @@ amdgpu_ras_eeprom_append_table(struct
> amdgpu_ras_eeprom_control *control,
>   /* Encode all of them in one go.
>

Re: [PATCH v2 2/3] drm/amdgpu: convert code name to ip version for noretry set

2022-03-01 Thread Christian König

Am 01.03.22 um 16:24 schrieb Paul Menzel:

Dear Yifan,


Thank you for your patch.

Am 01.03.22 um 16:01 schrieb Yifan Zhang:

Use IP version rather than codename for noretry set.


Why?


Why not?




Signed-off-by: Yifan Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 17 -
  1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c

index ab75e189bc0b..fbc22b7b6315 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -559,14 +559,14 @@ void amdgpu_gmc_noretry_set(struct 
amdgpu_device *adev)

  {
  struct amdgpu_gmc *gmc = &adev->gmc;
  -    switch (adev->asic_type) {
-    case CHIP_VEGA10:
-    case CHIP_VEGA20:
-    case CHIP_ARCTURUS:
-    case CHIP_ALDEBARAN:
-    case CHIP_BEIGE_GOBY:
-    case CHIP_YELLOW_CARP:
-    case CHIP_RENOIR:
+    switch (adev->ip_versions[GC_HWIP][0]) {
+    case IP_VERSION(9, 0, 1):
+    case IP_VERSION(9, 4, 0):
+    case IP_VERSION(9, 4, 1):
+    case IP_VERSION(9, 4, 2):
+    case IP_VERSION(10, 3, 5):
+    case IP_VERSION(10, 3, 3):
+    case IP_VERSION(9, 3, 0):


I think, sorting these entries might be useful.

Should the names be added as comments for those not having them 
memorized?



  /*
   * noretry = 0 will cause kfd page fault tests fail
   * for some ASICs, so set default to 1 for these ASICs.
@@ -576,7 +576,6 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device 
*adev)

  else
  gmc->noretry = amdgpu_noretry;
  break;
-    case CHIP_RAVEN:


Why remove this?


  default:
  /* Raven currently has issues with noretry
   * regardless of what we decide for other



Kind regards,

Paul




Re: [PATCH v2 2/3] drm/amdgpu: convert code name to ip version for noretry set

2022-03-01 Thread Christian König




Am 01.03.22 um 16:01 schrieb Yifan Zhang:

Use IP version rather than codename for noretry set.

Signed-off-by: Yifan Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 17 -
  1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index ab75e189bc0b..fbc22b7b6315 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -559,14 +559,14 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
  {
struct amdgpu_gmc *gmc = &adev->gmc;
  
-	switch (adev->asic_type) {

-   case CHIP_VEGA10:
-   case CHIP_VEGA20:
-   case CHIP_ARCTURUS:
-   case CHIP_ALDEBARAN:
-   case CHIP_BEIGE_GOBY:
-   case CHIP_YELLOW_CARP:
-   case CHIP_RENOIR:
+   switch (adev->ip_versions[GC_HWIP][0]) {
+   case IP_VERSION(9, 0, 1):
+   case IP_VERSION(9, 4, 0):
+   case IP_VERSION(9, 4, 1):
+   case IP_VERSION(9, 4, 2):
+   case IP_VERSION(10, 3, 5):
+   case IP_VERSION(10, 3, 3):
+   case IP_VERSION(9, 3, 0):


Maybe sort those?

Apart from that Acked-by: Christian König 

Regards,
Christian.


/*
 * noretry = 0 will cause kfd page fault tests fail
 * for some ASICs, so set default to 1 for these ASICs.
@@ -576,7 +576,6 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
else
gmc->noretry = amdgpu_noretry;
break;
-   case CHIP_RAVEN:
default:
/* Raven currently has issues with noretry
 * regardless of what we decide for other




Re: [PATCH v2 2/3] drm/amdgpu: convert code name to ip version for noretry set

2022-03-01 Thread Paul Menzel

Dear Christian,


Am 02.03.22 um 08:46 schrieb Christian König:

Am 01.03.22 um 16:24 schrieb Paul Menzel:


[…]


Am 01.03.22 um 16:01 schrieb Yifan Zhang:

Use IP version rather than codename for noretry set.


Why?


Why not?


No idea as the commit message currently does not document the motivation.


Signed-off-by: Yifan Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 17 -
  1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c

index ab75e189bc0b..fbc22b7b6315 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -559,14 +559,14 @@ void amdgpu_gmc_noretry_set(struct 
amdgpu_device *adev)

  {
  struct amdgpu_gmc *gmc = &adev->gmc;
  -    switch (adev->asic_type) {
-    case CHIP_VEGA10:
-    case CHIP_VEGA20:
-    case CHIP_ARCTURUS:
-    case CHIP_ALDEBARAN:
-    case CHIP_BEIGE_GOBY:
-    case CHIP_YELLOW_CARP:
-    case CHIP_RENOIR:
+    switch (adev->ip_versions[GC_HWIP][0]) {
+    case IP_VERSION(9, 0, 1):
+    case IP_VERSION(9, 4, 0):
+    case IP_VERSION(9, 4, 1):
+    case IP_VERSION(9, 4, 2):
+    case IP_VERSION(10, 3, 5):
+    case IP_VERSION(10, 3, 3):
+    case IP_VERSION(9, 3, 0):


I think, sorting these entries might be useful.

Should the names be added as comments for those not having them 
memorized?



  /*
   * noretry = 0 will cause kfd page fault tests fail
   * for some ASICs, so set default to 1 for these ASICs.
@@ -576,7 +576,6 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device 
*adev)

  else
  gmc->noretry = amdgpu_noretry;
  break;
-    case CHIP_RAVEN:


Why remove this?


  default:
  /* Raven currently has issues with noretry
   * regardless of what we decide for other



Kind regards,

Paul