Re: [PATCH v9 0/4] mm: Enable conversion of powerpc to default topdown mmap layout

2022-04-08 Thread Andrew Morton
On Fri,  8 Apr 2022 09:24:58 +0200 Christophe Leroy 
 wrote:

> Rebased on top of Linux 5.18-rc1
> 
> This is the mm part of the series that converts powerpc to default
> topdown mmap layout, for merge into v5.18

We're at 5.18-rc1.  The 5.18 merge window has closed and we're in
fixes-only mode.

If there's a case to be made that these patches are needed by 5.18
users then please let's make that case.  Otherwise, this is 5.19-rc1 material.

And if it is indeed all 5.19-rc1 material, then please carry all four
in the powerpc tree with Acked-by: Andrew Morton
.

Also, [4/4] has a cc:stable.  This is a bit odd because -stable
candidates should be standalone patches, staged ahead of all
for-next-merge-window material, so we can get them merged up quickly.

More oddly, [4/4]'s changelog provides no explanation for why the patch
should be considered for backporting.



[PATCH] macintosh: fix via-pmu and via-cuda build without RTC_CLASS

2022-04-08 Thread Randy Dunlap
Fix build when RTC_CLASS is not set/enabled.
Eliminates these build errors:

m68k-linux-ld: drivers/macintosh/via-pmu.o: in function `pmu_set_rtc_time':
drivers/macintosh/via-pmu.c:1769: undefined reference to `rtc_tm_to_time64'
m68k-linux-ld: drivers/macintosh/via-cuda.o: in function `cuda_set_rtc_time':
drivers/macintosh/via-cuda.c:797: undefined reference to `rtc_tm_to_time64'

Fixes: 0792a2c8e0bb ("macintosh: Use common code to access RTC")
Signed-off-by: Randy Dunlap 
Reported-by: kernel test robot 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Christophe Leroy 
Cc: Kees Cook 
Cc: Arnd Bergmann 
Cc: Finn Thain 
Cc: Geert Uytterhoeven 
Cc: Nathan Chancellor 
Cc: Nick Desaulniers 
Cc: linuxppc-dev@lists.ozlabs.org
---
 drivers/macintosh/via-cuda.c |5 -
 drivers/macintosh/via-pmu.c  |5 -
 2 files changed, 8 insertions(+), 2 deletions(-)

--- a/drivers/macintosh/via-cuda.c
+++ b/drivers/macintosh/via-cuda.c
@@ -794,7 +794,10 @@ int cuda_set_rtc_time(struct rtc_time *t
u32 now;
struct adb_request req;
 
-   now = lower_32_bits(rtc_tm_to_time64(tm) + RTC_OFFSET);
+   now = lower_32_bits(mktime64(((unsigned int)tm->tm_year + 1900),
+   tm->tm_mon + 1, tm->tm_mday, tm->tm_hour,
+   tm->tm_min, tm->tm_sec)
+   + RTC_OFFSET);
if (cuda_request(, NULL, 6, CUDA_PACKET, CUDA_SET_TIME,
 now >> 24, now >> 16, now >> 8, now) < 0)
return -ENXIO;
--- a/drivers/macintosh/via-pmu.c
+++ b/drivers/macintosh/via-pmu.c
@@ -1766,7 +1766,10 @@ int pmu_set_rtc_time(struct rtc_time *tm
u32 now;
struct adb_request req;
 
-   now = lower_32_bits(rtc_tm_to_time64(tm) + RTC_OFFSET);
+   now = lower_32_bits(mktime64(((unsigned int)tm->tm_year + 1900),
+   tm->tm_mon + 1, tm->tm_mday, tm->tm_hour,
+   tm->tm_min, tm->tm_sec)
+   + RTC_OFFSET);
if (pmu_request(, NULL, 5, PMU_SET_RTC,
now >> 24, now >> 16, now >> 8, now) < 0)
return -ENXIO;


Re: [PATCH net-next 11/15] sfc: Remove usage of list iterator for list_add() after the loop body

2022-04-08 Thread Jakob Koschel
Hello Edward,

> On 7. Apr 2022, at 19:42, Edward Cree  wrote:
> 
> On 07/04/2022 11:28, Jakob Koschel wrote:
>> In preparation to limit the scope of a list iterator to the list
>> traversal loop, use a dedicated pointer to point to the found element [1].
>> 
>> Before, the code implicitly used the head when no element was found
>> when using >list. Since the new variable is only set if an
>> element was found, the list_add() is performed within the loop
>> and only done after the loop if it is done on the list head directly.
>> 
>> Link: 
>> https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
>>  [1]
>> Signed-off-by: Jakob Koschel 
> 
> The commit message doesn't accurately describe the patch; it states
> that "the list_add() is performed within the loop", which doesn't
> appear to be the case.

you're right, I've changed the code last minute. I'll make sure
the changelog reflects the actual behaviour here. Thanks for the
input!

> Also it seems a bit subtle to use `head` as both the head of the
> list to iterate over and the found entry/gap to insert before; a
> comment explaining that wouldn't go amiss.

Also a good point, I'll add a comment as well, or perhaps using
a separate 'struct list_head *pos' variable is even cleaner.

> (I'd question whether this change is really an improvement in this
> case, where the iterator really does hold the thing we want at the
> end of the search and so there's no if(found) special-casing —
> we're not even abusing the type system, because efx->rss_context
> is of the same type as all the list entries, so ctx really is a
> valid pointer and there shouldn't be any issues with speculative
> accesses or whatever — but it seems Linus has already pronounced
> in favour of the scope limiting, and far be it from me to gainsay
> him.)

So, since the head is included in the struct of the same type as
the element, it really doesn't make much of a difference here.
It will always be safe to use.

But this is the very rare exception. There are other benefits of
avoiding the use of list iterator after the loop. One of them
is scope limiting but you also might want to set the iterator
variable to a "safe value" before the processor might execute an additional
iteration in speculative execution on the 'bogus' head element.

If you do these kind of patches on the list macros, you need to make sure
they work for all the uses, including the safe ones (like this one).

> 
> -ed
> 
>> ---
>> drivers/net/ethernet/sfc/rx_common.c | 6 --
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/net/ethernet/sfc/rx_common.c 
>> b/drivers/net/ethernet/sfc/rx_common.c
>> index 1b22c7be0088..a8822152ff83 100644
>> --- a/drivers/net/ethernet/sfc/rx_common.c
>> +++ b/drivers/net/ethernet/sfc/rx_common.c
>> @@ -563,8 +563,10 @@ struct efx_rss_context 
>> *efx_alloc_rss_context_entry(struct efx_nic *efx)
>> 
>>  /* Search for first gap in the numbering */
>>  list_for_each_entry(ctx, head, list) {
>> -if (ctx->user_id != id)
>> +if (ctx->user_id != id) {
>> +head = >list;
>>  break;
>> +}
>>  id++;
>>  /* Check for wrap.  If this happens, we have nearly 2^32
>>   * allocated RSS contexts, which seems unlikely.
>> @@ -582,7 +584,7 @@ struct efx_rss_context 
>> *efx_alloc_rss_context_entry(struct efx_nic *efx)
>> 
>>  /* Insert the new entry into the gap */
>>  new->user_id = id;
>> -list_add_tail(>list, >list);
>> +list_add_tail(>list, head);
>>  return new;
>> }
>> 
>> 
> 

Thanks,
Jakob



Re: [PATCH net-next 02/15] net: dsa: sja1105: Remove usage of iterator for list_add() after loop

2022-04-08 Thread Vladimir Oltean
On Sat, Apr 09, 2022 at 01:58:29AM +0200, Jakob Koschel wrote:
> Hello Jakub,
> > Also the list_add() could be converted to list_add_tail().
> 
> Good point, I wasn't sure if that's considered as something that should be
> done as a separate change. I'm happy to include it in v2.

By now you probably studied more list access patterns than I did,
but I wrote that deliberately using list_add(..., pos->prev) rather than
list_add_tail(), because even though the code is the same, I tend to
think of the "head" argument of list_add_tail() as being the actual head
of the list, and therefore the head->prev being the tail of the list
(hence the name), something which doesn't hold true here where we're
inserting in the middle of the list. Anyway it's just a name and that's
what felt natural to me at the time, I won't oppose the change, but do
make it a separate change and not clump it together with the unrelated
list_for_each_entry() -> list_for_each() change.


Re: [PATCH net-next 02/15] net: dsa: sja1105: Remove usage of iterator for list_add() after loop

2022-04-08 Thread Jakub Kicinski
On Sat, 9 Apr 2022 01:58:29 +0200 Jakob Koschel wrote:
> > This turns a pretty slick piece of code into something ugly :(
> > I'd rather you open coded the iteration here than make it more 
> > complex to satisfy "safe coding guidelines".  
> 
> I'm not entirely sure I understand what you mean with 
> "open coded the iteration". But maybe the solution proposed by Vladimir [1]
> works for you?

Yup, that's what I meant!

> I'm planning to rewrite the cases in that way for the relevant ones.
>
> > Also the list_add() could be converted to list_add_tail().  
> 
> Good point, I wasn't sure if that's considered as something that should be
> done as a separate change. I'm happy to include it in v2.

Ack, separate patch would be better for that. I guess Vladimir may have
used .prev on purpose, since _tail() doesn't intuitively scream _after()
Anyway, not important.


Re: [PATCH net-next 03/15] net: dsa: mv88e6xxx: Replace usage of found with dedicated iterator

2022-04-08 Thread Jakob Koschel



> On 9. Apr 2022, at 01:50, Vladimir Oltean  wrote:
> 
> On Sat, Apr 09, 2022 at 01:44:00AM +0200, Jakob Koschel wrote:
>>> Let's try to not make convoluted code worse. Do the following 2 patches
>>> achieve what you are looking for? Originally I had a single patch (what
>>> is now 2/2) but I figured it would be cleaner to break out the unrelated
>>> change into what is now 1/2.
>> 
>> I do agree with not making convoluted code worse, but I was reluctant with
>> e.g. introducing new functions for this because others essentially
>> have the opposite opinion on this.
>> 
>> I however like solving it that way, it makes it a lot cleaner.
> 
> Yeah, I think 'just adapt to the context and style and intentions of the
> code you're changing and don't try to push a robotic one-size-fits-all
> solution' is sensible enough for an initial guiding principle.
> 
>>> If you want I can submit these changes separately.
>> 
>> Sure if you want to submit them separately, go ahead. Otherwise I can
>> integrate it into a v2, whatever you prefer essentially.
> 
> If you're moving quickly feel free to pick them up. I have lots of other
> things on my backlog so it won't be until late next week until I even
> consider submitting these.

I'm planning to send a v2 earlier than that, so I'll just integrate it there.

Thanks,
Jakob



Re: [PATCH net-next 02/15] net: dsa: sja1105: Remove usage of iterator for list_add() after loop

2022-04-08 Thread Jakob Koschel
Hello Jakub,

> On 8. Apr 2022, at 05:54, Jakub Kicinski  wrote:
> 
> On Thu,  7 Apr 2022 12:28:47 +0200 Jakob Koschel wrote:
>> diff --git a/drivers/net/dsa/sja1105/sja1105_vl.c 
>> b/drivers/net/dsa/sja1105/sja1105_vl.c
>> index b7e95d60a6e4..cfcae4d19eef 100644
>> --- a/drivers/net/dsa/sja1105/sja1105_vl.c
>> +++ b/drivers/net/dsa/sja1105/sja1105_vl.c
>> @@ -27,20 +27,24 @@ static int sja1105_insert_gate_entry(struct 
>> sja1105_gating_config *gating_cfg,
>>  if (list_empty(_cfg->entries)) {
>>  list_add(>list, _cfg->entries);
>>  } else {
>> -struct sja1105_gate_entry *p;
>> +struct sja1105_gate_entry *p = NULL, *iter;
>> 
>> -list_for_each_entry(p, _cfg->entries, list) {
>> -if (p->interval == e->interval) {
>> +list_for_each_entry(iter, _cfg->entries, list) {
>> +if (iter->interval == e->interval) {
>>  NL_SET_ERR_MSG_MOD(extack,
>> "Gate conflict");
>>  rc = -EBUSY;
>>  goto err;
>>  }
>> 
>> -if (e->interval < p->interval)
>> +if (e->interval < iter->interval) {
>> +p = iter;
>> +list_add(>list, iter->list.prev);
>>  break;
>> +}
>>  }
>> -list_add(>list, p->list.prev);
>> +if (!p)
>> +list_add(>list, gating_cfg->entries.prev);
> 
> This turns a pretty slick piece of code into something ugly :(
> I'd rather you open coded the iteration here than make it more 
> complex to satisfy "safe coding guidelines".

I'm not entirely sure I understand what you mean with 
"open coded the iteration". But maybe the solution proposed by Vladimir [1]
works for you? I'm planning to rewrite the cases in that way for the relevant
ones.

> 
> Also the list_add() could be converted to list_add_tail().

Good point, I wasn't sure if that's considered as something that should be
done as a separate change. I'm happy to include it in v2.

Thanks for the input.

Jakob

[1] https://lore.kernel.org/linux-kernel/20220408114120.tvf2lxvhfqbnrlml@skbuf/



Re: [PATCH net-next 02/15] net: dsa: sja1105: Remove usage of iterator for list_add() after loop

2022-04-08 Thread Jakob Koschel
Hello Vladimir,

> On 8. Apr 2022, at 13:41, Vladimir Oltean  wrote:
> 
> Hello Jakob,
> 
> On Thu, Apr 07, 2022 at 12:28:47PM +0200, Jakob Koschel wrote:
>> In preparation to limit the scope of a list iterator to the list
>> traversal loop, use a dedicated pointer to point to the found element [1].
>> 
>> Before, the code implicitly used the head when no element was found
>> when using >list. Since the new variable is only set if an
>> element was found, the list_add() is performed within the loop
>> and only done after the loop if it is done on the list head directly.
>> 
>> Link: 
>> https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
>>  [1]
>> Signed-off-by: Jakob Koschel 
>> ---
>> drivers/net/dsa/sja1105/sja1105_vl.c | 14 +-
>> 1 file changed, 9 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/net/dsa/sja1105/sja1105_vl.c 
>> b/drivers/net/dsa/sja1105/sja1105_vl.c
>> index b7e95d60a6e4..cfcae4d19eef 100644
>> --- a/drivers/net/dsa/sja1105/sja1105_vl.c
>> +++ b/drivers/net/dsa/sja1105/sja1105_vl.c
>> @@ -27,20 +27,24 @@ static int sja1105_insert_gate_entry(struct 
>> sja1105_gating_config *gating_cfg,
>>  if (list_empty(_cfg->entries)) {
>>  list_add(>list, _cfg->entries);
>>  } else {
>> -struct sja1105_gate_entry *p;
>> +struct sja1105_gate_entry *p = NULL, *iter;
>> 
>> -list_for_each_entry(p, _cfg->entries, list) {
>> -if (p->interval == e->interval) {
>> +list_for_each_entry(iter, _cfg->entries, list) {
>> +if (iter->interval == e->interval) {
>>  NL_SET_ERR_MSG_MOD(extack,
>> "Gate conflict");
>>  rc = -EBUSY;
>>  goto err;
>>  }
>> 
>> -if (e->interval < p->interval)
>> +if (e->interval < iter->interval) {
>> +p = iter;
>> +list_add(>list, iter->list.prev);
>>  break;
>> +}
>>  }
>> -list_add(>list, p->list.prev);
>> +if (!p)
>> +list_add(>list, gating_cfg->entries.prev);
>>  }
>> 
>>  gating_cfg->num_entries++;
>> -- 
>> 2.25.1
>> 
> 
> I apologize in advance if I've misinterpreted the end goal of your patch.
> I do have a vague suspicion I understand what you're trying to achieve,
> and in that case, would you mind using this patch instead of yours?

I think you are very much spot on!

> I think it still preserves the intention of the code in a clean manner.
> 
> -[ cut here ]-
> From 7aed740750d1bc3bff6e85fd33298f5905bb4e01 Mon Sep 17 00:00:00 2001
> From: Vladimir Oltean 
> Date: Fri, 8 Apr 2022 13:55:14 +0300
> Subject: [PATCH] net: dsa: sja1105: avoid use of type-confused pointer in
> sja1105_insert_gate_entry()
> 
> It appears that list_for_each_entry() leaks a type-confused pointer when
> the iteration loop ends with no early break, since "*p" will no longer
> point to a "struct sja1105_gate_entry", but rather to some memory in
> front of "gating_cfg->entries".
> 
> This isn't actually a problem here, because if the element we insert has
> the highest interval, therefore we never exit the loop early, "p->list"
> (which is all that we use outside the loop) will in fact point to
> "gating_cfg->entries" even though "p" itself is invalid.
> 
> Nonetheless, there are preparations to increase the safety of
> list_for_each_entry() by making it impossible to use the encapsulating
> structure of the iterator element outside the loop. So something needs
> to change here before those preparations go in, even though this
> constitutes legitimate use.
> 
> Make it clear that we are not dereferencing members of the encapsulating
> "struct sja1105_gate_entry" outside the loop, by using the regular
> list_for_each() iterator, and dereferencing the struct sja1105_gate_entry
> only within the loop.
> 
> With list_for_each(), the iterator element at the end of the loop does
> have a sane value in all cases, and we can just use that as the "head"
> argument of list_add().
> 
> Signed-off-by: Vladimir Oltean 
> ---
> drivers/net/dsa/sja1105/sja1105_vl.c | 12 +---
> 1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/dsa/sja1105/sja1105_vl.c 
> b/drivers/net/dsa/sja1105/sja1105_vl.c
> index c0e45b393fde..fe93c80fe5ef 100644
> --- a/drivers/net/dsa/sja1105/sja1105_vl.c
> +++ b/drivers/net/dsa/sja1105/sja1105_vl.c
> @@ -27,9 +27,15 @@ static int sja1105_insert_gate_entry(struct 
> sja1105_gating_config *gating_cfg,
>   if (list_empty(_cfg->entries)) {
>   list_add(>list, _cfg->entries);
>   } else {
> - struct sja1105_gate_entry *p;
> + struct list_head 

Re: [PATCH net-next 03/15] net: dsa: mv88e6xxx: Replace usage of found with dedicated iterator

2022-04-08 Thread Vladimir Oltean
On Sat, Apr 09, 2022 at 01:44:00AM +0200, Jakob Koschel wrote:
> > Let's try to not make convoluted code worse. Do the following 2 patches
> > achieve what you are looking for? Originally I had a single patch (what
> > is now 2/2) but I figured it would be cleaner to break out the unrelated
> > change into what is now 1/2.
> 
> I do agree with not making convoluted code worse, but I was reluctant with
> e.g. introducing new functions for this because others essentially
> have the opposite opinion on this.
> 
> I however like solving it that way, it makes it a lot cleaner.

Yeah, I think 'just adapt to the context and style and intentions of the
code you're changing and don't try to push a robotic one-size-fits-all
solution' is sensible enough for an initial guiding principle.

> > If you want I can submit these changes separately.
> 
> Sure if you want to submit them separately, go ahead. Otherwise I can
> integrate it into a v2, whatever you prefer essentially.

If you're moving quickly feel free to pick them up. I have lots of other
things on my backlog so it won't be until late next week until I even
consider submitting these.


Re: [PATCH net-next 02/15] net: dsa: sja1105: Remove usage of iterator for list_add() after loop

2022-04-08 Thread Jakob Koschel
Hey Christophe,

> On 8. Apr 2022, at 09:47, Christophe Leroy  
> wrote:
> 
> 
> 
> Le 07/04/2022 à 12:28, Jakob Koschel a écrit :
>> In preparation to limit the scope of a list iterator to the list
>> traversal loop, use a dedicated pointer to point to the found element [1].
>> 
>> Before, the code implicitly used the head when no element was found
>> when using >list. Since the new variable is only set if an
>> element was found, the list_add() is performed within the loop
>> and only done after the loop if it is done on the list head directly.
>> 
>> Link: 
>> https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
>>  [1]
>> Signed-off-by: Jakob Koschel 
>> ---
>> drivers/net/dsa/sja1105/sja1105_vl.c | 14 +-
>> 1 file changed, 9 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/net/dsa/sja1105/sja1105_vl.c 
>> b/drivers/net/dsa/sja1105/sja1105_vl.c
>> index b7e95d60a6e4..cfcae4d19eef 100644
>> --- a/drivers/net/dsa/sja1105/sja1105_vl.c
>> +++ b/drivers/net/dsa/sja1105/sja1105_vl.c
>> @@ -27,20 +27,24 @@ static int sja1105_insert_gate_entry(struct 
>> sja1105_gating_config *gating_cfg,
>>  if (list_empty(_cfg->entries)) {
>>  list_add(>list, _cfg->entries);
>>  } else {
>> -struct sja1105_gate_entry *p;
>> +struct sja1105_gate_entry *p = NULL, *iter;
>> 
>> -list_for_each_entry(p, _cfg->entries, list) {
>> -if (p->interval == e->interval) {
>> +list_for_each_entry(iter, _cfg->entries, list) {
>> +if (iter->interval == e->interval) {
>>  NL_SET_ERR_MSG_MOD(extack,
>>   "Gate conflict");
>>  rc = -EBUSY;
>>  goto err;
>>  }
>> 
>> -if (e->interval < p->interval)
>> +if (e->interval < iter->interval) {
>> +p = iter;
>> +list_add(>list, iter->list.prev);
>>  break;
>> +}
>>  }
>> -list_add(>list, p->list.prev);
>> +if (!p)
>> +list_add(>list, gating_cfg->entries.prev);
>>  }
>> 
>>  gating_cfg->num_entries++;
> 
> This change looks ugly, why duplicating the list_add() to do the same ? 
> At the end of the loop the pointer contains gating_cfg->entries, so it 
> was cleaner before.
> 
> If you don't want to use the loop index outside the loop, fair enough, 
> all you have to do is:
> 
>   struct sja1105_gate_entry *p, *iter;
> 
>   list_for_each_entry(iter, _cfg->entries, list) {
>   if (iter->interval == e->interval) {
>   NL_SET_ERR_MSG_MOD(extack,
>"Gate conflict");
>   rc = -EBUSY;
>   goto err;
>   }
>   p = iter;
> 
>   if (e->interval < iter->interval)
>   break;
>   }
>   list_add(>list, p->list.prev);

Thanks for the review and input.

The code you are showing here would have an uninitialized access to 'p'
if the list is empty.

Also 'p->list.prev' will be the second last entry if the list iterator
ran through completely, whereas the original code was pointing to the last
entry of the list.

IMO Vladimir Oltean posted a nice alternative way to solve it, see [1].
That way it keeps the semantics of the code the same and doesn't duplicate
the list_add.

> 
> 
> 
> Christophe

[1] https://lore.kernel.org/linux-kernel/20220408114120.tvf2lxvhfqbnrlml@skbuf/

Thanks,
Jakob



Re: [PATCH net-next 03/15] net: dsa: mv88e6xxx: Replace usage of found with dedicated iterator

2022-04-08 Thread Jakob Koschel
Hi Vladimir,

> On 8. Apr 2022, at 14:31, Vladimir Oltean  wrote:
> 
> Hi Jakob,
> 
> On Thu, Apr 07, 2022 at 12:28:48PM +0200, Jakob Koschel wrote:
>> To move the list iterator variable into the list_for_each_entry_*()
>> macro in the future it should be avoided to use the list iterator
>> variable after the loop body.
>> 
>> To *never* use the list iterator variable after the loop it was
>> concluded to use a separate iterator variable instead of a
>> found boolean [1].
>> 
>> This removes the need to use a found variable and simply checking if
>> the variable was set, can determine if the break/goto was hit.
>> 
>> Link: 
>> https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
>>  [1]
>> Signed-off-by: Jakob Koschel 
>> ---
>> drivers/net/dsa/mv88e6xxx/chip.c | 21 ++---
>> 1 file changed, 10 insertions(+), 11 deletions(-)
>> 
>> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c 
>> b/drivers/net/dsa/mv88e6xxx/chip.c
>> index 64f4fdd02902..f254f537c357 100644
>> --- a/drivers/net/dsa/mv88e6xxx/chip.c
>> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
>> @@ -1381,28 +1381,27 @@ static int mv88e6xxx_set_mac_eee(struct dsa_switch 
>> *ds, int port,
>> /* Mask of the local ports allowed to receive frames from a given fabric 
>> port */
>> static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int 
>> port)
>> {
>> +struct dsa_port *dp = NULL, *iter, *other_dp;
>>  struct dsa_switch *ds = chip->ds;
>>  struct dsa_switch_tree *dst = ds->dst;
>> -struct dsa_port *dp, *other_dp;
>> -bool found = false;
>>  u16 pvlan;
>> 
>>  /* dev is a physical switch */
>>  if (dev <= dst->last_switch) {
>> -list_for_each_entry(dp, >ports, list) {
>> -if (dp->ds->index == dev && dp->index == port) {
>> -/* dp might be a DSA link or a user port, so it
>> +list_for_each_entry(iter, >ports, list) {
>> +if (iter->ds->index == dev && iter->index == port) {
>> +/* iter might be a DSA link or a user port, so 
>> it
>>   * might or might not have a bridge.
>> - * Use the "found" variable for both cases.
>> + * Set the "dp" variable for both cases.
>>   */
>> -found = true;
>> +dp = iter;
>>  break;
>>  }
>>  }
>>  /* dev is a virtual bridge */
>>  } else {
>> -list_for_each_entry(dp, >ports, list) {
>> -unsigned int bridge_num = dsa_port_bridge_num_get(dp);
>> +list_for_each_entry(iter, >ports, list) {
>> +unsigned int bridge_num = dsa_port_bridge_num_get(iter);
>> 
>>  if (!bridge_num)
>>  continue;
>> @@ -1410,13 +1409,13 @@ static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip 
>> *chip, int dev, int port)
>>  if (bridge_num + dst->last_switch != dev)
>>  continue;
>> 
>> -found = true;
>> +dp = iter;
>>  break;
>>  }
>>  }
>> 
>>  /* Prevent frames from unknown switch or virtual bridge */
>> -if (!found)
>> +if (!dp)
>>  return 0;
>> 
>>  /* Frames from DSA links and CPU ports can egress any local port */
>> -- 
>> 2.25.1
>> 
> 
> Let's try to not make convoluted code worse. Do the following 2 patches
> achieve what you are looking for? Originally I had a single patch (what
> is now 2/2) but I figured it would be cleaner to break out the unrelated
> change into what is now 1/2.

I do agree with not making convoluted code worse, but I was reluctant with
e.g. introducing new functions for this because others essentially
have the opposite opinion on this.

I however like solving it that way, it makes it a lot cleaner.

> 
> If you want I can submit these changes separately.

Sure if you want to submit them separately, go ahead. Otherwise I can
integrate it into a v2, whatever you prefer essentially.

> 
> -[ cut here ]-
> From 2d84ecd87566b1535a04526b4ebb2764e764625f Mon Sep 17 00:00:00 2001
> From: Vladimir Oltean 
> Date: Fri, 8 Apr 2022 15:15:30 +0300
> Subject: [PATCH 1/2] net: dsa: mv88e6xxx: remove redundant check in
> mv88e6xxx_port_vlan()
> 
> We know that "dev > dst->last_switch" in the "else" block.
> In other words, that "dev - dst->last_switch" is > 0.
> 
> dsa_port_bridge_num_get(dp) can be 0, but the check
> "if (bridge_num + dst->last_switch != dev) continue", rewritten as
> "if (bridge_num != dev - dst->last_switch) continue", aka
> "if (bridge_num != something which cannot be 0) continue",
> makes it redundant to have the extra "if (!bridge_num) continue" logic,
> 

Re: [PATCH V3] testing/selftests/mqueue: Fix mq_perf_tests to free the allocated cpu set

2022-04-08 Thread Shuah Khan

On 4/8/22 1:24 AM, Athira Rajeev wrote:

The selftest "mqueue/mq_perf_tests.c" use CPU_ALLOC to allocate
CPU set. This cpu set is used further in pthread_attr_setaffinity_np
and by pthread_create in the code. But in current code, allocated
cpu set is not freed.

Fix this issue by adding CPU_FREE in the "shutdown" function which
is called in most of the error/exit path for the cleanup. There are
few error paths which exit without using shutdown. Add a common goto
error path with CPU_FREE for these cases.

Fixes: 7820b0715b6f ("tools/selftests: add mq_perf_tests")
Signed-off-by: Athira Rajeev 
---
Changelog:
  From v2 -> v3:
   Addressed review comment from Shuah Khan to add
   common "goto" error path with CPU_FREE for few exit
   cases.
  From v1 -> v2:
   Addressed review comment from Shuah Khan to add
   CPU_FREE in other exit paths where it is needed



Thank you. I will queue this up for Linux 5.18-rc3

thanks,
-- Shuah


Re: [PATCH] powerpc: Mark arch_get_ima_policy() and is_ppc_trustedboot_enabled() as __init

2022-04-08 Thread Mimi Zohar
On Fri, 2022-04-08 at 13:31 -0400, Mimi Zohar wrote:
> On Fri, 2022-04-08 at 12:05 -0400, Mimi Zohar wrote:
> > On Fri, 2022-04-08 at 00:15 +1000, Michael Ellerman wrote:
> > > We can mark arch_get_ima_policy() as __init because it's only caller
> > > ima_init_arch_policy() is __init. We can then mark
> > > is_ppc_trustedboot_enabled() __init because its only caller is
> > > arch_get_ima_policy().
> > > 
> > > Signed-off-by: Michael Ellerman 
> > 
> > I assume you want to upstream this via power,
> > 
> > Reviewed-by: Mimi Zohar 
> 
> Sorry, I just noticed that is_ppc_trustedboot_enabled() is also called
> by arch_ima_get_secureboot().

Never mind, arch_ima_get_secureboot() calls
is_ppc_secureboot_enabled(), not is_ppc_trustedboot_enabled().




Re: [PATCH] powerpc: Mark arch_get_ima_policy() and is_ppc_trustedboot_enabled() as __init

2022-04-08 Thread Mimi Zohar
On Fri, 2022-04-08 at 12:05 -0400, Mimi Zohar wrote:
> On Fri, 2022-04-08 at 00:15 +1000, Michael Ellerman wrote:
> > We can mark arch_get_ima_policy() as __init because it's only caller
> > ima_init_arch_policy() is __init. We can then mark
> > is_ppc_trustedboot_enabled() __init because its only caller is
> > arch_get_ima_policy().
> > 
> > Signed-off-by: Michael Ellerman 
> 
> I assume you want to upstream this via power,
> 
> Reviewed-by: Mimi Zohar 

Sorry, I just noticed that is_ppc_trustedboot_enabled() is also called
by arch_ima_get_secureboot().

Mimi



Re: rcu_sched self-detected stall on CPU

2022-04-08 Thread Miguel Ojeda
On Fri, Apr 8, 2022 at 4:42 PM Michael Ellerman  wrote:
>
> The Rust CI has it disabled because I copied that from the x86 defconfig
> they were using back when I added the Rust support. I think that was
> meant to be a stripped down fast config for CI, but the result is it's

Indeed, that was my intention when I created the original config.

> So I'll send a patch to turn HIGH_RES_TIMERS on for the Rust CI, and we
> can debug this further without blocking them.

Thanks! I can also do it on your behalf, if you prefer, when I sync with -rc1.

Cheers,
Miguel


Re: [PATCH] powerpc: Mark arch_get_ima_policy() and is_ppc_trustedboot_enabled() as __init

2022-04-08 Thread Mimi Zohar
On Fri, 2022-04-08 at 00:15 +1000, Michael Ellerman wrote:
> We can mark arch_get_ima_policy() as __init because it's only caller
> ima_init_arch_policy() is __init. We can then mark
> is_ppc_trustedboot_enabled() __init because its only caller is
> arch_get_ima_policy().
> 
> Signed-off-by: Michael Ellerman 

I assume you want to upstream this via power,

Reviewed-by: Mimi Zohar 

thanks,

Mimi



Re: rcu_sched self-detected stall on CPU

2022-04-08 Thread Paul E. McKenney
On Sat, Apr 09, 2022 at 12:42:39AM +1000, Michael Ellerman wrote:
> Michael Ellerman  writes:
> > "Paul E. McKenney"  writes:
> >> On Wed, Apr 06, 2022 at 05:31:10PM +0800, Zhouyi Zhou wrote:
> >>> Hi
> >>> 
> >>> I can reproduce it in a ppc virtual cloud server provided by Oregon
> >>> State University.  Following is what I do:
> >>> 1) curl -l 
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/snapshot/linux-5.18-rc1.tar.gz
> >>> -o linux-5.18-rc1.tar.gz
> >>> 2) tar zxf linux-5.18-rc1.tar.gz
> >>> 3) cp config linux-5.18-rc1/.config
> >>> 4) cd linux-5.18-rc1
> >>> 5) make vmlinux -j 8
> >>> 6) qemu-system-ppc64 -kernel vmlinux -nographic -vga none -no-reboot
> >>> -smp 2 (QEMU 4.2.1)
> >>> 7) after 12 rounds, the bug got reproduced:
> >>> (http://154.223.142.244/logs/20220406/qemu.log.txt)
> >>
> >> Just to make sure, are you both seeing the same thing?  Last I knew,
> >> Zhouyi was chasing an RCU-tasks issue that appears only in kernels
> >> built with CONFIG_PROVE_RCU=y, which Miguel does not have set.  Or did
> >> I miss something?
> >>
> >> Miguel is instead seeing an RCU CPU stall warning where RCU's grace-period
> >> kthread slept for three milliseconds, but did not wake up for more than
> >> 20 seconds.  This kthread would normally have awakened on CPU 1, but
> >> CPU 1 looks to me to be very unhealthy, as can be seen in your console
> >> output below (but maybe my idea of what is healthy for powerpc systems
> >> is outdated).  Please see also the inline annotations.
> >>
> >> Thoughts from the PPC guys?
> >
> > I haven't seen it in my testing. But using Miguel's config I can
> > reproduce it seemingly on every boot.
> >
> > For me it bisects to:
> >
> >   35de589cb879 ("powerpc/time: improve decrementer clockevent processing")
> >
> > Which seems plausible.
> >
> > Reverting that on mainline makes the bug go away.
> >
> > I don't see an obvious bug in the diff, but I could be wrong, or the old
> > code was papering over an existing bug?
> >
> > I'll try and work out what it is about Miguel's config that exposes
> > this vs our defconfig, that might give us a clue.
> 
> It's CONFIG_HIGH_RES_TIMERS=n which triggers the stall.
> 
> I can reproduce just with:
> 
>   $ make ppc64le_guest_defconfig
>   $ ./scripts/config -d HIGH_RES_TIMERS
> 
> We have no defconfigs that disable HIGH_RES_TIMERS, I didn't even
> realise you could disable it TBH :)
> 
> The Rust CI has it disabled because I copied that from the x86 defconfig
> they were using back when I added the Rust support. I think that was
> meant to be a stripped down fast config for CI, but the result is it's
> just using a badly tested combination which is not helpful.
> 
> So I'll send a patch to turn HIGH_RES_TIMERS on for the Rust CI, and we
> can debug this further without blocking them.

Would it make sense to select HIGH_RES_TIMERS from one of the Kconfig*
files in arch/powerpc?  Asking for a friend.  ;-)

Thanx, Paul


[PATCH v4 2/2] PCI/DPC: Disable DPC service when link is in L2/L3 ready, L2 and L3 state

2022-04-08 Thread Kai-Heng Feng
On Intel Alder Lake platforms, Thunderbolt entering D3cold can cause
some errors reported by AER:
[   30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error 
received: :00:1d.0
[   30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected 
(Non-Fatal), type=Transaction Layer, (Requester ID)
[   30.100256] pcieport :00:1d.0:   device [8086:7ab0] error 
status/mask=0010/4000
[   30.100262] pcieport :00:1d.0:[20] UnsupReq   (First)
[   30.100267] pcieport :00:1d.0: AER:   TLP Header: 3400 0852 
 
[   30.100372] thunderbolt :0a:00.0: AER: can't recover (no error_detected 
callback)
[   30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected 
callback)
[   30.100427] pcieport :00:1d.0: AER: device recovery failed

Since AER is disabled in previous patch for a Link in L2/L3 Ready, L2
and L3, also disable DPC here as DPC depends on AER to work.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453
Reviewed-by: Mika Westerberg 
Signed-off-by: Kai-Heng Feng 
---
v4:
 - Wording change.

v3:
 - Wording change to make the patch more clear.

v2:
 - Wording change.
 - Empty line dropped.

 drivers/pci/pcie/dpc.c | 60 +++---
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 3e9afee02e8d1..414258967f08e 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -343,13 +343,33 @@ void pci_dpc_init(struct pci_dev *pdev)
}
 }
 
+static void dpc_enable(struct pcie_device *dev)
+{
+   struct pci_dev *pdev = dev->port;
+   u16 ctl;
+
+   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, );
+   ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | 
PCI_EXP_DPC_CTL_INT_EN;
+   pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+}
+
+static void dpc_disable(struct pcie_device *dev)
+{
+   struct pci_dev *pdev = dev->port;
+   u16 ctl;
+
+   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, );
+   ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN);
+   pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+}
+
 #define FLAG(x, y) (((x) & (y)) ? '+' : '-')
 static int dpc_probe(struct pcie_device *dev)
 {
struct pci_dev *pdev = dev->port;
struct device *device = >device;
int status;
-   u16 ctl, cap;
+   u16 cap;
 
if (!pcie_aer_is_native(pdev) && !pcie_ports_dpc_native)
return -ENOTSUPP;
@@ -364,10 +384,7 @@ static int dpc_probe(struct pcie_device *dev)
}
 
pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, );
-   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, );
-
-   ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | 
PCI_EXP_DPC_CTL_INT_EN;
-   pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+   dpc_enable(dev);
pci_info(pdev, "enabled with IRQ %d\n", dev->irq);
 
pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c 
PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n",
@@ -380,22 +397,33 @@ static int dpc_probe(struct pcie_device *dev)
return status;
 }
 
-static void dpc_remove(struct pcie_device *dev)
+static int dpc_suspend(struct pcie_device *dev)
 {
-   struct pci_dev *pdev = dev->port;
-   u16 ctl;
+   dpc_disable(dev);
+   return 0;
+}
 
-   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, );
-   ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN);
-   pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+static int dpc_resume(struct pcie_device *dev)
+{
+   dpc_enable(dev);
+   return 0;
+}
+
+static void dpc_remove(struct pcie_device *dev)
+{
+   dpc_disable(dev);
 }
 
 static struct pcie_port_service_driver dpcdriver = {
-   .name   = "dpc",
-   .port_type  = PCIE_ANY_PORT,
-   .service= PCIE_PORT_SERVICE_DPC,
-   .probe  = dpc_probe,
-   .remove = dpc_remove,
+   .name   = "dpc",
+   .port_type  = PCIE_ANY_PORT,
+   .service= PCIE_PORT_SERVICE_DPC,
+   .probe  = dpc_probe,
+   .suspend= dpc_suspend,
+   .resume = dpc_resume,
+   .runtime_suspend= dpc_suspend,
+   .runtime_resume = dpc_resume,
+   .remove = dpc_remove,
 };
 
 int __init pcie_dpc_init(void)
-- 
2.34.1



[PATCH v4 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state

2022-04-08 Thread Kai-Heng Feng
On Intel Alder Lake platforms, Thunderbolt entering D3cold can cause
some errors reported by AER:
[   30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error 
received: :00:1d.0
[   30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected 
(Non-Fatal), type=Transaction Layer, (Requester ID)
[   30.100256] pcieport :00:1d.0:   device [8086:7ab0] error 
status/mask=0010/4000
[   30.100262] pcieport :00:1d.0:[20] UnsupReq   (First)
[   30.100267] pcieport :00:1d.0: AER:   TLP Header: 3400 0852 
 
[   30.100372] thunderbolt :0a:00.0: AER: can't recover (no error_detected 
callback)
[   30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected 
callback)
[   30.100427] pcieport :00:1d.0: AER: device recovery failed

So disable AER service to avoid the noises from turning power rails
on/off when the device is in low power states (D3hot and D3cold), as
PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states
that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready
(D3hot), L2 (D3cold with aux power) and L3 (D3cold).

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453
Reviewed-by: Mika Westerberg 
Signed-off-by: Kai-Heng Feng 
---
v4:
 - Explicitly states the spec version.
 - Wording change. 

v3:
 - Remove reference to ACS.
 - Wording change.

v2:
 - Wording change.

 drivers/pci/pcie/aer.c | 31 +--
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 9fa1f97e5b270..e4e9d4a3098d7 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1367,6 +1367,22 @@ static int aer_probe(struct pcie_device *dev)
return 0;
 }
 
+static int aer_suspend(struct pcie_device *dev)
+{
+   struct aer_rpc *rpc = get_service_data(dev);
+
+   aer_disable_rootport(rpc);
+   return 0;
+}
+
+static int aer_resume(struct pcie_device *dev)
+{
+   struct aer_rpc *rpc = get_service_data(dev);
+
+   aer_enable_rootport(rpc);
+   return 0;
+}
+
 /**
  * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
  * @dev: pointer to Root Port, RCEC, or RCiEP
@@ -1433,12 +1449,15 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
 }
 
 static struct pcie_port_service_driver aerdriver = {
-   .name   = "aer",
-   .port_type  = PCIE_ANY_PORT,
-   .service= PCIE_PORT_SERVICE_AER,
-
-   .probe  = aer_probe,
-   .remove = aer_remove,
+   .name   = "aer",
+   .port_type  = PCIE_ANY_PORT,
+   .service= PCIE_PORT_SERVICE_AER,
+   .probe  = aer_probe,
+   .suspend= aer_suspend,
+   .resume = aer_resume,
+   .runtime_suspend= aer_suspend,
+   .runtime_resume = aer_resume,
+   .remove = aer_remove,
 };
 
 /**
-- 
2.34.1



Re: rcu_sched self-detected stall on CPU

2022-04-08 Thread Michael Ellerman
Michael Ellerman  writes:
> "Paul E. McKenney"  writes:
>> On Wed, Apr 06, 2022 at 05:31:10PM +0800, Zhouyi Zhou wrote:
>>> Hi
>>> 
>>> I can reproduce it in a ppc virtual cloud server provided by Oregon
>>> State University.  Following is what I do:
>>> 1) curl -l 
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/snapshot/linux-5.18-rc1.tar.gz
>>> -o linux-5.18-rc1.tar.gz
>>> 2) tar zxf linux-5.18-rc1.tar.gz
>>> 3) cp config linux-5.18-rc1/.config
>>> 4) cd linux-5.18-rc1
>>> 5) make vmlinux -j 8
>>> 6) qemu-system-ppc64 -kernel vmlinux -nographic -vga none -no-reboot
>>> -smp 2 (QEMU 4.2.1)
>>> 7) after 12 rounds, the bug got reproduced:
>>> (http://154.223.142.244/logs/20220406/qemu.log.txt)
>>
>> Just to make sure, are you both seeing the same thing?  Last I knew,
>> Zhouyi was chasing an RCU-tasks issue that appears only in kernels
>> built with CONFIG_PROVE_RCU=y, which Miguel does not have set.  Or did
>> I miss something?
>>
>> Miguel is instead seeing an RCU CPU stall warning where RCU's grace-period
>> kthread slept for three milliseconds, but did not wake up for more than
>> 20 seconds.  This kthread would normally have awakened on CPU 1, but
>> CPU 1 looks to me to be very unhealthy, as can be seen in your console
>> output below (but maybe my idea of what is healthy for powerpc systems
>> is outdated).  Please see also the inline annotations.
>>
>> Thoughts from the PPC guys?
>
> I haven't seen it in my testing. But using Miguel's config I can
> reproduce it seemingly on every boot.
>
> For me it bisects to:
>
>   35de589cb879 ("powerpc/time: improve decrementer clockevent processing")
>
> Which seems plausible.
>
> Reverting that on mainline makes the bug go away.
>
> I don't see an obvious bug in the diff, but I could be wrong, or the old
> code was papering over an existing bug?
>
> I'll try and work out what it is about Miguel's config that exposes
> this vs our defconfig, that might give us a clue.

It's CONFIG_HIGH_RES_TIMERS=n which triggers the stall.

I can reproduce just with:

  $ make ppc64le_guest_defconfig
  $ ./scripts/config -d HIGH_RES_TIMERS

We have no defconfigs that disable HIGH_RES_TIMERS, I didn't even
realise you could disable it TBH :)

The Rust CI has it disabled because I copied that from the x86 defconfig
they were using back when I added the Rust support. I think that was
meant to be a stripped down fast config for CI, but the result is it's
just using a badly tested combination which is not helpful.

So I'll send a patch to turn HIGH_RES_TIMERS on for the Rust CI, and we
can debug this further without blocking them.

cheers


Re: rcu_sched self-detected stall on CPU

2022-04-08 Thread Zhouyi Zhou
On Fri, Apr 8, 2022 at 10:07 PM Paul E. McKenney  wrote:
>
> On Fri, Apr 08, 2022 at 06:02:19PM +0800, Zhouyi Zhou wrote:
> > On Fri, Apr 8, 2022 at 3:23 PM Michael Ellerman  wrote:
> > >
> > > "Paul E. McKenney"  writes:
> > > > On Wed, Apr 06, 2022 at 05:31:10PM +0800, Zhouyi Zhou wrote:
> > > >> Hi
> > > >>
> > > >> I can reproduce it in a ppc virtual cloud server provided by Oregon
> > > >> State University.  Following is what I do:
> > > >> 1) curl -l 
> > > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/snapshot/linux-5.18-rc1.tar.gz
> > > >> -o linux-5.18-rc1.tar.gz
> > > >> 2) tar zxf linux-5.18-rc1.tar.gz
> > > >> 3) cp config linux-5.18-rc1/.config
> > > >> 4) cd linux-5.18-rc1
> > > >> 5) make vmlinux -j 8
> > > >> 6) qemu-system-ppc64 -kernel vmlinux -nographic -vga none -no-reboot
> > > >> -smp 2 (QEMU 4.2.1)
> > > >> 7) after 12 rounds, the bug got reproduced:
> > > >> (http://154.223.142.244/logs/20220406/qemu.log.txt)
> > > >
> > > > Just to make sure, are you both seeing the same thing?  Last I knew,
> > > > Zhouyi was chasing an RCU-tasks issue that appears only in kernels
> > > > built with CONFIG_PROVE_RCU=y, which Miguel does not have set.  Or did
> > > > I miss something?
> > > >
> > > > Miguel is instead seeing an RCU CPU stall warning where RCU's 
> > > > grace-period
> > > > kthread slept for three milliseconds, but did not wake up for more than
> > > > 20 seconds.  This kthread would normally have awakened on CPU 1, but
> > > > CPU 1 looks to me to be very unhealthy, as can be seen in your console
> > > > output below (but maybe my idea of what is healthy for powerpc systems
> > > > is outdated).  Please see also the inline annotations.
> > > >
> > > > Thoughts from the PPC guys?
> > >
> > > I haven't seen it in my testing. But using Miguel's config I can
> > > reproduce it seemingly on every boot.
> > >
> > > For me it bisects to:
> > >
> > >   35de589cb879 ("powerpc/time: improve decrementer clockevent processing")
> > >
> > > Which seems plausible.
> > I also bisect to 35de589cb879 ("powerpc/time: improve decrementer
> > clockevent processing")
>
> Very good!  Thank you all!!!
You are very welcome ;-)  and Thank you all
>
> Thanx, Paul
>
> > > Reverting that on mainline makes the bug go away.
> > I also revert that on the mainline, and am currently doing a pressure
> > test (by repeatedly invoking qemu and checking the console.log) on PPC
> > VM in Oregon State University.
After 306 rounds of stress test on mainline without triggering the bug
(last for 4 hours and 27 minutes), I think the bug is indeed caused by
35de589cb879 ("powerpc/time: improve decrementer clockevent
processing") and stop the test for now.

Thanks ;-)
Zhouyi
> > >
> > > I don't see an obvious bug in the diff, but I could be wrong, or the old
> > > code was papering over an existing bug?
> > >
> > > I'll try and work out what it is about Miguel's config that exposes
> > > this vs our defconfig, that might give us a clue.
> > Great job!
> > >
> > > cheers
> > Thanks
> > Zhouyi


Re: [PATCH v5] PCI hotplug: rpaphp: Error out on busy status from get-sensor-state

2022-04-08 Thread Mahesh J Salgaonkar
On 2022-03-09 22:42:16 Wed, Mahesh Salgaonkar wrote:
> When certain PHB HW failure causes phyp to recover PHB, it marks the PE
> state as temporarily unavailable until recovery is complete. This also
> triggers an EEH handler in Linux which needs to notify drivers, and perform
> recovery. But before notifying the driver about the pci error it uses
> get_adapter_state()->get-sesnor-state() operation of the hotplug_slot to
> determine if the slot contains a device or not. if the slot is empty, the
> recovery is skipped entirely.
> 
> However on certain PHB failures, the rtas call get-sesnor-state() returns
> extended busy error (9902) until PHB is recovered by phyp. Once PHB is
> recovered, the get-sensor-state() returns success with correct presence
> status. The rtas call interface rtas_get_sensor() loops over the rtas call
> on extended delay return code (9902) until the return value is either
> success (0) or error (-1). This causes the EEH handler to get stuck for ~6
> seconds before it could notify that the pci error has been detected and
> stop any active operations. Hence with running I/O traffic, during this 6
> seconds, the network driver continues its operation and hits a timeout
> (netdev watchdog). On timeouts, network driver go into ffdc capture mode
> and reset path assuming the PCI device is in fatal condition. This
> sometimes causes EEH recovery to fail. This impacts the ssh connection and
> leads to the system being inaccessible.
> 
> 
> [52732.244731] DEBUG: ibm_read_slot_reset_state2()
> [52732.244762] DEBUG: ret = 0, rets[0]=5, rets[1]=1, rets[2]=4000, rets[3]=>
> [52732.244798] DEBUG: in eeh_slot_presence_check
> [52732.244804] DEBUG: error state check
> [52732.244807] DEBUG: Is slot hotpluggable
> [52732.244810] DEBUG: hotpluggable ops ?
> [52732.244953] DEBUG: Calling ops->get_adapter_status
> [52732.244958] DEBUG: calling rpaphp_get_sensor_state
> [52736.564262] [ cut here ]
> [52736.564299] NETDEV WATCHDOG: enP64p1s0f3 (tg3): transmit queue 0 timed o>
> [52736.564324] WARNING: CPU: 1442 PID: 0 at net/sched/sch_generic.c:478 dev>
> [...]
> [52736.564505] NIP [c0c32368] dev_watchdog+0x438/0x440
> [52736.564513] LR [c0c32364] dev_watchdog+0x434/0x440
> 
> 
> To avoid this issue, fix the pci hotplug driver (rpaphp) to return an error
> if the slot presence state can not be detected immediately while pe is in
> EEH recovery state. Current implementation uses rtas_get_sensor() API which
> blocks the slot check state until rtas call returns success. Change
> rpaphp_get_sensor_state() to invoke rtas_call(get-sensor-state) directly
> only if the respective pe is in EEH recovery state, and take actions based
> on rtas return status.
> 
> In normal cases (non-EEH case) rpaphp_get_sensor_state() will continue to
> invoke rtas_get_sensor() as it was earlier with no change in existing
> behavior.
> 
> Signed-off-by: Mahesh Salgaonkar 
> ---
> Change in v5:
> - Fixup #define macros with parentheses around the values.
> 
> Change in V4:
> - Error out on sensor busy only if pe is going through EEH recovery instead
>   of always error out.
> 
> Change in V3:
> - Invoke rtas_call(get-sensor-state) directly from
>   rpaphp_get_sensor_state() directly and do special handling.
> - See v2 at
>   https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/237336.html
> 
> Change in V2:
> - Alternate approach to fix the EEH issue instead of delaying slot presence
>   check proposed at
>   https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/236956.html
> 
> Also refer:
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-November/237027.html
> ---
>  drivers/pci/hotplug/rpaphp_pci.c |  100 
> +-
>  1 file changed, 97 insertions(+), 3 deletions(-)

Any comments on this patch?

Thanks,
-Mahesh.

> 
> diff --git a/drivers/pci/hotplug/rpaphp_pci.c 
> b/drivers/pci/hotplug/rpaphp_pci.c
> index c380bdacd1466..e463e915a052a 100644
> --- a/drivers/pci/hotplug/rpaphp_pci.c
> +++ b/drivers/pci/hotplug/rpaphp_pci.c
> @@ -18,12 +18,107 @@
>  #include "../pci.h"  /* for pci_add_new_bus */
>  #include "rpaphp.h"
>  
> +/*
> + * RTAS call get-sensor-state(DR_ENTITY_SENSE) return values as per PAPR:
> + *-1: Hardware Error
> + *-2: RTAS_BUSY
> + *-3: Invalid sensor. RTAS Parameter Error.
> + * -9000: Need DR entity to be powered up and unisolated before RTAS call
> + * -9001: Need DR entity to be powered up, but not unisolated, before RTAS 
> call
> + * -9002: DR entity unusable
> + *  990x: Extended delay - where x is a number in the range of 0-5
> + */
> +#define RTAS_HARDWARE_ERROR  (-1)
> +#define RTAS_INVALID_SENSOR  (-3)
> +#define SLOT_UNISOLATED  (-9000)
> +#define SLOT_NOT_UNISOLATED  (-9001)
> +#define SLOT_NOT_USABLE  (-9002)
> +
> +static int rtas_to_errno(int rtas_rc)
> +{
> + int rc;
> +
> + switch (rtas_rc) {
> + case RTAS_HARDWARE_ERROR:
> 

Re: rcu_sched self-detected stall on CPU

2022-04-08 Thread Paul E. McKenney
On Fri, Apr 08, 2022 at 06:02:19PM +0800, Zhouyi Zhou wrote:
> On Fri, Apr 8, 2022 at 3:23 PM Michael Ellerman  wrote:
> >
> > "Paul E. McKenney"  writes:
> > > On Wed, Apr 06, 2022 at 05:31:10PM +0800, Zhouyi Zhou wrote:
> > >> Hi
> > >>
> > >> I can reproduce it in a ppc virtual cloud server provided by Oregon
> > >> State University.  Following is what I do:
> > >> 1) curl -l 
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/snapshot/linux-5.18-rc1.tar.gz
> > >> -o linux-5.18-rc1.tar.gz
> > >> 2) tar zxf linux-5.18-rc1.tar.gz
> > >> 3) cp config linux-5.18-rc1/.config
> > >> 4) cd linux-5.18-rc1
> > >> 5) make vmlinux -j 8
> > >> 6) qemu-system-ppc64 -kernel vmlinux -nographic -vga none -no-reboot
> > >> -smp 2 (QEMU 4.2.1)
> > >> 7) after 12 rounds, the bug got reproduced:
> > >> (http://154.223.142.244/logs/20220406/qemu.log.txt)
> > >
> > > Just to make sure, are you both seeing the same thing?  Last I knew,
> > > Zhouyi was chasing an RCU-tasks issue that appears only in kernels
> > > built with CONFIG_PROVE_RCU=y, which Miguel does not have set.  Or did
> > > I miss something?
> > >
> > > Miguel is instead seeing an RCU CPU stall warning where RCU's grace-period
> > > kthread slept for three milliseconds, but did not wake up for more than
> > > 20 seconds.  This kthread would normally have awakened on CPU 1, but
> > > CPU 1 looks to me to be very unhealthy, as can be seen in your console
> > > output below (but maybe my idea of what is healthy for powerpc systems
> > > is outdated).  Please see also the inline annotations.
> > >
> > > Thoughts from the PPC guys?
> >
> > I haven't seen it in my testing. But using Miguel's config I can
> > reproduce it seemingly on every boot.
> >
> > For me it bisects to:
> >
> >   35de589cb879 ("powerpc/time: improve decrementer clockevent processing")
> >
> > Which seems plausible.
> I also bisect to 35de589cb879 ("powerpc/time: improve decrementer
> clockevent processing")

Very good!  Thank you all!!!

Thanx, Paul

> > Reverting that on mainline makes the bug go away.
> I also revert that on the mainline, and am currently doing a pressure
> test (by repeatedly invoking qemu and checking the console.log) on PPC
> VM in Oregon State University.
> >
> > I don't see an obvious bug in the diff, but I could be wrong, or the old
> > code was papering over an existing bug?
> >
> > I'll try and work out what it is about Miguel's config that exposes
> > this vs our defconfig, that might give us a clue.
> Great job!
> >
> > cheers
> Thanks
> Zhouyi


Re: rcu_sched self-detected stall on CPU

2022-04-08 Thread Paul E. McKenney
On Fri, Apr 08, 2022 at 05:23:32PM +1000, Michael Ellerman wrote:
> "Paul E. McKenney"  writes:
> > On Wed, Apr 06, 2022 at 05:31:10PM +0800, Zhouyi Zhou wrote:
> >> Hi
> >> 
> >> I can reproduce it in a ppc virtual cloud server provided by Oregon
> >> State University.  Following is what I do:
> >> 1) curl -l 
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/snapshot/linux-5.18-rc1.tar.gz
> >> -o linux-5.18-rc1.tar.gz
> >> 2) tar zxf linux-5.18-rc1.tar.gz
> >> 3) cp config linux-5.18-rc1/.config
> >> 4) cd linux-5.18-rc1
> >> 5) make vmlinux -j 8
> >> 6) qemu-system-ppc64 -kernel vmlinux -nographic -vga none -no-reboot
> >> -smp 2 (QEMU 4.2.1)
> >> 7) after 12 rounds, the bug got reproduced:
> >> (http://154.223.142.244/logs/20220406/qemu.log.txt)
> >
> > Just to make sure, are you both seeing the same thing?  Last I knew,
> > Zhouyi was chasing an RCU-tasks issue that appears only in kernels
> > built with CONFIG_PROVE_RCU=y, which Miguel does not have set.  Or did
> > I miss something?
> >
> > Miguel is instead seeing an RCU CPU stall warning where RCU's grace-period
> > kthread slept for three milliseconds, but did not wake up for more than
> > 20 seconds.  This kthread would normally have awakened on CPU 1, but
> > CPU 1 looks to me to be very unhealthy, as can be seen in your console
> > output below (but maybe my idea of what is healthy for powerpc systems
> > is outdated).  Please see also the inline annotations.
> >
> > Thoughts from the PPC guys?
> 
> I haven't seen it in my testing. But using Miguel's config I can
> reproduce it seemingly on every boot.
> 
> For me it bisects to:
> 
>   35de589cb879 ("powerpc/time: improve decrementer clockevent processing")
> 
> Which seems plausible.
> 
> Reverting that on mainline makes the bug go away.

Thank you for looking into this!

> I don't see an obvious bug in the diff, but I could be wrong, or the old
> code was papering over an existing bug?
> 
> I'll try and work out what it is about Miguel's config that exposes
> this vs our defconfig, that might give us a clue.

I have recently had some RCU bugs that were due to Kconfig failing to
rule out broken .config files.  Maybe this is something similar?

Thanx, Paul


Re: rcu_sched self-detected stall on CPU

2022-04-08 Thread Miguel Ojeda
On Fri, Apr 8, 2022 at 9:23 AM Michael Ellerman  wrote:
>
> I haven't seen it in my testing. But using Miguel's config I can
> reproduce it seemingly on every boot.

Hmm... I noticed this for some kernel builds: in some builds/commits,
it triggered the very first time, while in others I had to re-try
quite a few times. It could be a "fluke", but since it happened to you
too (and Zhouyi seemed to need 12 tries), it may be that particular
kernel builds makes the bug much more likely.

> For me it bisects to:
>
>   35de589cb879 ("powerpc/time: improve decrementer clockevent processing")
>
> Which seems plausible.
>
> Reverting that on mainline makes the bug go away.

That is great, thanks for that -- I can revert that one in our CI meanwhile.

> I'll try and work out what it is about Miguel's config that exposes
> this vs our defconfig, that might give us a clue.

Yeah, it is one based on the "debug" one you sent for Rust PPC.
Assuming you based that one on the others we had for other archs, then
I guess we are bound to find some things like this at times like with
randconfig, since I made them to be fairly minimal and "custom"... :)

Cheers,
Miguel


Re: [PATCH V4 2/7] powerpc/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

2022-04-08 Thread Christophe Leroy


Le 07/04/2022 à 12:32, Anshuman Khandual a écrit :
> This defines and exports a platform specific custom vm_get_page_prot() via
> subscribing ARCH_HAS_VM_GET_PAGE_PROT. While here, this also localizes
> arch_vm_get_page_prot() as powerpc_vm_get_page_prot() and moves it near
> vm_get_page_prot().
> 
> Cc: Michael Ellerman 
> Cc: Paul Mackerras 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Anshuman Khandual 
> ---
>   arch/powerpc/Kconfig|  1 +
>   arch/powerpc/include/asm/mman.h | 12 
>   arch/powerpc/mm/mmap.c  | 26 ++
>   3 files changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 174edabb74fa..eb9b6ddbf92f 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -140,6 +140,7 @@ config PPC
>   select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
>   select ARCH_HAS_UACCESS_FLUSHCACHE
>   select ARCH_HAS_UBSAN_SANITIZE_ALL
> + select ARCH_HAS_VM_GET_PAGE_PROT
>   select ARCH_HAVE_NMI_SAFE_CMPXCHG
>   select ARCH_KEEP_MEMBLOCK
>   select ARCH_MIGHT_HAVE_PC_PARPORT
> diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
> index 7cb6d18f5cd6..1b024e64c8ec 100644
> --- a/arch/powerpc/include/asm/mman.h
> +++ b/arch/powerpc/include/asm/mman.h
> @@ -24,18 +24,6 @@ static inline unsigned long 
> arch_calc_vm_prot_bits(unsigned long prot,
>   }
>   #define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, 
> pkey)
>   
> -static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
> -{
> -#ifdef CONFIG_PPC_MEM_KEYS
> - return (vm_flags & VM_SAO) ?
> - __pgprot(_PAGE_SAO | vmflag_to_pte_pkey_bits(vm_flags)) :
> - __pgprot(0 | vmflag_to_pte_pkey_bits(vm_flags));
> -#else
> - return (vm_flags & VM_SAO) ? __pgprot(_PAGE_SAO) : __pgprot(0);
> -#endif
> -}
> -#define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
> -
>   static inline bool arch_validate_prot(unsigned long prot, unsigned long 
> addr)
>   {
>   if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
> diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c
> index c475cf810aa8..cd17bd6fa36b 100644
> --- a/arch/powerpc/mm/mmap.c
> +++ b/arch/powerpc/mm/mmap.c
> @@ -254,3 +254,29 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct 
> rlimit *rlim_stack)
>   mm->get_unmapped_area = arch_get_unmapped_area_topdown;
>   }
>   }
> +
> +#ifdef CONFIG_PPC64
> +static pgprot_t powerpc_vm_get_page_prot(unsigned long vm_flags)
> +{
> +#ifdef CONFIG_PPC_MEM_KEYS
> + return (vm_flags & VM_SAO) ?
> + __pgprot(_PAGE_SAO | vmflag_to_pte_pkey_bits(vm_flags)) :
> + __pgprot(0 | vmflag_to_pte_pkey_bits(vm_flags));
> +#else
> + return (vm_flags & VM_SAO) ? __pgprot(_PAGE_SAO) : __pgprot(0);
> +#endif
> +}
> +#else
> +static pgprot_t powerpc_vm_get_page_prot(unsigned long vm_flags)
> +{
> + return __pgprot(0);
> +}
> +#endif /* CONFIG_PPC64 */

Can we reduce this forest of #ifdefs and make it more readable ?

mm/mmap.c is going away with patch 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/d6d849621f821af253e777a24eda4c648814a76e.1646847562.git.christophe.le...@csgroup.eu/

So it would be better to add two versions of vm_get_page_prot(), for 
instance one in mm/pgtable_64.c and one in mm/pgtable_32.c


> +
> +pgprot_t vm_get_page_prot(unsigned long vm_flags)
> +{
> + return __pgprot(pgprot_val(protection_map[vm_flags &
> + (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) |
> +pgprot_val(powerpc_vm_get_page_prot(vm_flags)));
> +}
> +EXPORT_SYMBOL(vm_get_page_prot);

By the way I'm a bit puzzled with the name powerpc_vm_get_page_prot(), 
the name suggests that it's just a powerpc replacement of 
vm_get_page_prot(). I'd prefer if it was named __vm_get_page_prot(), it 
would be clearer that it is a complement to vm_get_page_prot() and not a 
remplacement.

Christophe

Re: [PATCH net-next 03/15] net: dsa: mv88e6xxx: Replace usage of found with dedicated iterator

2022-04-08 Thread Vladimir Oltean
Hi Jakob,

On Thu, Apr 07, 2022 at 12:28:48PM +0200, Jakob Koschel wrote:
> To move the list iterator variable into the list_for_each_entry_*()
> macro in the future it should be avoided to use the list iterator
> variable after the loop body.
> 
> To *never* use the list iterator variable after the loop it was
> concluded to use a separate iterator variable instead of a
> found boolean [1].
> 
> This removes the need to use a found variable and simply checking if
> the variable was set, can determine if the break/goto was hit.
> 
> Link: 
> https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
>  [1]
> Signed-off-by: Jakob Koschel 
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 21 ++---
>  1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c 
> b/drivers/net/dsa/mv88e6xxx/chip.c
> index 64f4fdd02902..f254f537c357 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -1381,28 +1381,27 @@ static int mv88e6xxx_set_mac_eee(struct dsa_switch 
> *ds, int port,
>  /* Mask of the local ports allowed to receive frames from a given fabric 
> port */
>  static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int 
> port)
>  {
> + struct dsa_port *dp = NULL, *iter, *other_dp;
>   struct dsa_switch *ds = chip->ds;
>   struct dsa_switch_tree *dst = ds->dst;
> - struct dsa_port *dp, *other_dp;
> - bool found = false;
>   u16 pvlan;
>  
>   /* dev is a physical switch */
>   if (dev <= dst->last_switch) {
> - list_for_each_entry(dp, >ports, list) {
> - if (dp->ds->index == dev && dp->index == port) {
> - /* dp might be a DSA link or a user port, so it
> + list_for_each_entry(iter, >ports, list) {
> + if (iter->ds->index == dev && iter->index == port) {
> + /* iter might be a DSA link or a user port, so 
> it
>* might or might not have a bridge.
> -  * Use the "found" variable for both cases.
> +  * Set the "dp" variable for both cases.
>*/
> - found = true;
> + dp = iter;
>   break;
>   }
>   }
>   /* dev is a virtual bridge */
>   } else {
> - list_for_each_entry(dp, >ports, list) {
> - unsigned int bridge_num = dsa_port_bridge_num_get(dp);
> + list_for_each_entry(iter, >ports, list) {
> + unsigned int bridge_num = dsa_port_bridge_num_get(iter);
>  
>   if (!bridge_num)
>   continue;
> @@ -1410,13 +1409,13 @@ static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip 
> *chip, int dev, int port)
>   if (bridge_num + dst->last_switch != dev)
>   continue;
>  
> - found = true;
> + dp = iter;
>   break;
>   }
>   }
>  
>   /* Prevent frames from unknown switch or virtual bridge */
> - if (!found)
> + if (!dp)
>   return 0;
>  
>   /* Frames from DSA links and CPU ports can egress any local port */
> -- 
> 2.25.1
> 

Let's try to not make convoluted code worse. Do the following 2 patches
achieve what you are looking for? Originally I had a single patch (what
is now 2/2) but I figured it would be cleaner to break out the unrelated
change into what is now 1/2.

If you want I can submit these changes separately.

-[ cut here ]-
>From 2d84ecd87566b1535a04526b4ebb2764e764625f Mon Sep 17 00:00:00 2001
From: Vladimir Oltean 
Date: Fri, 8 Apr 2022 15:15:30 +0300
Subject: [PATCH 1/2] net: dsa: mv88e6xxx: remove redundant check in
 mv88e6xxx_port_vlan()

We know that "dev > dst->last_switch" in the "else" block.
In other words, that "dev - dst->last_switch" is > 0.

dsa_port_bridge_num_get(dp) can be 0, but the check
"if (bridge_num + dst->last_switch != dev) continue", rewritten as
"if (bridge_num != dev - dst->last_switch) continue", aka
"if (bridge_num != something which cannot be 0) continue",
makes it redundant to have the extra "if (!bridge_num) continue" logic,
since a bridge_num of zero would have been skipped anyway.

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 64f4fdd02902..b3aa0e5bc842 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1404,9 +1404,6 @@ static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip 
*chip, int dev, int port)
list_for_each_entry(dp, >ports, 

Re: [PATCH v2 3/4] tools/perf: Fix perf numa bench to fix usage of affinity for machines with #CPUs > 1K

2022-04-08 Thread Srikar Dronamraju
* Athira Rajeev  [2022-04-06 23:21:12]:

> perf bench numa testcase fails on systems with CPU's
> more than 1K.
> 
> Testcase: perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp  1
> Snippet of code:
> <<>>
> perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
> Aborted (core dumped)
> <<>>
> 
> bind_to_node function uses "sched_getaffinity" to save the original
> cpumask and this call is returning EINVAL ((invalid argument).
> This happens because the default mask size in glibc is 1024.
> To overcome this 1024 CPUs mask size limitation of cpu_set_t,
> change the mask size using the CPU_*_S macros ie, use CPU_ALLOC to
> allocate cpumask, CPU_ALLOC_SIZE for size. Apart from fixing this
> for "orig_mask", apply same logic to "mask" as well which is used to
> setaffinity so that mask size is large enough to represent number
> of possible CPU's in the system.
> 
> sched_getaffinity is used in one more place in perf numa bench. It
> is in "bind_to_cpu" function. Apply the same logic there also. Though
> currently no failure is reported from there, it is ideal to change
> getaffinity to work with such system configurations having CPU's more
> than default mask size supported by glibc.
> 
> Also fix "sched_setaffinity" to use mask size which is large enough
> to represent number of possible CPU's in the system.
> 
> Fixed all places where "bind_cpumask" which is part of "struct
> thread_data" is used such that bind_cpumask works in all configuration.
> 
> Tested-by: Disha Goel 
> Signed-off-by: Athira Rajeev 
> Reported-by: Disha Goel 

Looks good to me.

Reviewed-by: Srikar Dronamraju 

> ---
>  tools/perf/bench/numa.c | 97 ++---
>  1 file changed, 71 insertions(+), 26 deletions(-)
> 
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index f2640179ada9..29e41e32bd88 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -54,7 +54,7 @@
> 
>  struct thread_data {
>   int curr_cpu;
> - cpu_set_t   bind_cpumask;
> + cpu_set_t   *bind_cpumask;
>   int bind_node;
>   u8  *process_data;
>   int process_nr;
> @@ -266,46 +266,71 @@ static bool node_has_cpus(int node)
>   return ret;
>  }
> 
> -static cpu_set_t bind_to_cpu(int target_cpu)
> +static cpu_set_t *bind_to_cpu(int target_cpu)
>  {
> - cpu_set_t orig_mask, mask;
> + int nrcpus = numa_num_possible_cpus();
> + cpu_set_t *orig_mask, *mask;
> + size_t size;
>   int ret;
> 
> - ret = sched_getaffinity(0, sizeof(orig_mask), _mask);
> - BUG_ON(ret);
> + orig_mask = CPU_ALLOC(nrcpus);
> + BUG_ON(!orig_mask);
> + size = CPU_ALLOC_SIZE(nrcpus);
> + CPU_ZERO_S(size, orig_mask);
> +
> + ret = sched_getaffinity(0, size, orig_mask);
> + if (ret) {
> + CPU_FREE(orig_mask);
> + BUG_ON(ret);
> + }
> 
> - CPU_ZERO();
> + mask = CPU_ALLOC(nrcpus);
> + BUG_ON(!mask);
> + CPU_ZERO_S(size, mask);
> 
>   if (target_cpu == -1) {
>   int cpu;
> 
>   for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> - CPU_SET(cpu, );
> + CPU_SET_S(cpu, size, mask);
>   } else {
>   BUG_ON(target_cpu < 0 || target_cpu >= g->p.nr_cpus);
> - CPU_SET(target_cpu, );
> + CPU_SET_S(target_cpu, size, mask);
>   }
> 
> - ret = sched_setaffinity(0, sizeof(mask), );
> + ret = sched_setaffinity(0, size, mask);
> + CPU_FREE(mask);
>   BUG_ON(ret);
> 
>   return orig_mask;
>  }
> 
> -static cpu_set_t bind_to_node(int target_node)
> +static cpu_set_t *bind_to_node(int target_node)
>  {
> - cpu_set_t orig_mask, mask;
> + int nrcpus = numa_num_possible_cpus();
> + cpu_set_t *orig_mask, *mask;
> + size_t size;
>   int cpu;
>   int ret;
> 
> - ret = sched_getaffinity(0, sizeof(orig_mask), _mask);
> - BUG_ON(ret);
> + orig_mask = CPU_ALLOC(nrcpus);
> + BUG_ON(!orig_mask);
> + size = CPU_ALLOC_SIZE(nrcpus);
> + CPU_ZERO_S(size, orig_mask);
> +
> + ret = sched_getaffinity(0, size, orig_mask);
> + if (ret) {
> + CPU_FREE(orig_mask);
> + BUG_ON(ret);
> + }
> 
> - CPU_ZERO();
> + mask = CPU_ALLOC(nrcpus);
> + BUG_ON(!mask);
> + CPU_ZERO_S(size, mask);
> 
>   if (target_node == NUMA_NO_NODE) {
>   for (cpu = 0; cpu < g->p.nr_cpus; cpu++)
> - CPU_SET(cpu, );
> + CPU_SET_S(cpu, size, mask);
>   } else {
>   struct bitmask *cpumask = numa_allocate_cpumask();
> 
> @@ -313,24 +338,29 @@ static cpu_set_t bind_to_node(int target_node)
>   if (!numa_node_to_cpus(target_node, cpumask)) {
>   for (cpu = 0; cpu < (int)cpumask->size; cpu++) {
>   if 

Re: [PATCH v2 1/4] tools/perf: Fix perf bench futex to correct usage of affinity for machines with #CPUs > 1K

2022-04-08 Thread Srikar Dronamraju
* Athira Rajeev  [2022-04-06 23:21:10]:

> perf bench futex testcase fails on systems with CPU's
> more than 1K.
> 
> Testcase: perf bench futex all
> Failure snippet:
> <<>>Running futex/hash benchmark...
> 
> perf: pthread_create: No such file or directory
> <<>>
> 
> All the futex benchmarks ( ie hash, lock-api, requeue, wake,
> wake-parallel ), pthread_create is invoked in respective bench_futex_*
> function. Though the logs shows direct failure from pthread_create,
> strace logs showed that actual failure is from  "sched_setaffinity"
> returning EINVAL (invalid argument). This happens because the default
> mask size in glibc is 1024. To overcome this 1024 CPUs mask size
> limitation of cpu_set_t, change the mask size using the CPU_*_S macros.
> 
> Patch addresses this by fixing all the futex benchmarks to use
> CPU_ALLOC to allocate cpumask, CPU_ALLOC_SIZE for size, and
> CPU_SET_S to set the mask.
> 
> Tested-by: Disha Goel 
> Signed-off-by: Athira Rajeev 
> Reported-by: Disha Goel 

Looks good to me
Reviewed-by: Srikar Dronamraju 


> ---
>  tools/perf/bench/futex-hash.c  | 26 +++---
>  tools/perf/bench/futex-lock-pi.c   | 21 -
>  tools/perf/bench/futex-requeue.c   | 21 -
>  tools/perf/bench/futex-wake-parallel.c | 21 -
>  tools/perf/bench/futex-wake.c  | 22 --
>  5 files changed, 83 insertions(+), 28 deletions(-)
> 
> diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
> index 9627b6ab8670..dfce64e551e2 100644
> --- a/tools/perf/bench/futex-hash.c
> +++ b/tools/perf/bench/futex-hash.c
> @@ -122,12 +122,14 @@ static void print_summary(void)
>  int bench_futex_hash(int argc, const char **argv)
>  {
>   int ret = 0;
> - cpu_set_t cpuset;
> + cpu_set_t *cpuset;
>   struct sigaction act;
>   unsigned int i;
>   pthread_attr_t thread_attr;
>   struct worker *worker = NULL;
>   struct perf_cpu_map *cpu;
> + int nrcpus;
> + size_t size;
> 
>   argc = parse_options(argc, argv, options, bench_futex_hash_usage, 0);
>   if (argc) {
> @@ -170,25 +172,35 @@ int bench_futex_hash(int argc, const char **argv)
>   threads_starting = params.nthreads;
>   pthread_attr_init(_attr);
>   gettimeofday(__start, NULL);
> +
> + nrcpus = perf_cpu_map__nr(cpu);
> + cpuset = CPU_ALLOC(nrcpus);
> + BUG_ON(!cpuset);
> + size = CPU_ALLOC_SIZE(nrcpus);
> +
>   for (i = 0; i < params.nthreads; i++) {
>   worker[i].tid = i;
>   worker[i].futex = calloc(params.nfutexes, 
> sizeof(*worker[i].futex));
>   if (!worker[i].futex)
>   goto errmem;
> 
> - CPU_ZERO();
> - CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, 
> );
> + CPU_ZERO_S(size, cpuset);
> 
> - ret = pthread_attr_setaffinity_np(_attr, 
> sizeof(cpu_set_t), );
> - if (ret)
> + CPU_SET_S(perf_cpu_map__cpu(cpu, i % 
> perf_cpu_map__nr(cpu)).cpu, size, cpuset);
> + ret = pthread_attr_setaffinity_np(_attr, size, cpuset);
> + if (ret) {
> + CPU_FREE(cpuset);
>   err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
> -
> + }
>   ret = pthread_create([i].thread, _attr, workerfn,
>(void *)(struct worker *) [i]);
> - if (ret)
> + if (ret) {
> + CPU_FREE(cpuset);
>   err(EXIT_FAILURE, "pthread_create");
> + }
> 
>   }
> + CPU_FREE(cpuset);
>   pthread_attr_destroy(_attr);
> 
>   pthread_mutex_lock(_lock);
> diff --git a/tools/perf/bench/futex-lock-pi.c 
> b/tools/perf/bench/futex-lock-pi.c
> index a512a320df74..61c3bb80d4cf 100644
> --- a/tools/perf/bench/futex-lock-pi.c
> +++ b/tools/perf/bench/futex-lock-pi.c
> @@ -120,11 +120,17 @@ static void *workerfn(void *arg)
>  static void create_threads(struct worker *w, pthread_attr_t thread_attr,
>  struct perf_cpu_map *cpu)
>  {
> - cpu_set_t cpuset;
> + cpu_set_t *cpuset;
>   unsigned int i;
> + int nrcpus =  perf_cpu_map__nr(cpu);
> + size_t size;
> 
>   threads_starting = params.nthreads;
> 
> + cpuset = CPU_ALLOC(nrcpus);
> + BUG_ON(!cpuset);
> + size = CPU_ALLOC_SIZE(nrcpus);
> +
>   for (i = 0; i < params.nthreads; i++) {
>   worker[i].tid = i;
> 
> @@ -135,15 +141,20 @@ static void create_threads(struct worker *w, 
> pthread_attr_t thread_attr,
>   } else
>   worker[i].futex = _futex;
> 
> - CPU_ZERO();
> - CPU_SET(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, 
> );
> + CPU_ZERO_S(size, cpuset);
> + CPU_SET_S(perf_cpu_map__cpu(cpu, i % 
> perf_cpu_map__nr(cpu)).cpu, size, cpuset);
> 
> - 

Re: [PATCH v2 4/4] tools/perf: Fix perf bench numa testcase to check if CPU used to bind task is online

2022-04-08 Thread Srikar Dronamraju
* Athira Rajeev  [2022-04-06 23:21:13]:

> Perf numa bench test fails with error:
> 
> Testcase:
> ./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq
> --thp  1 --no-data_rand_walk
> 
> Failure snippet:
> <<>>
>  Running 'numa/mem' benchmark:
> 
>  # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8
> -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"
> 
> perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
> <<>>
> 
> The Testcases uses CPU???s 0 and 8. In function "parse_setup_cpu_list",
> There is check to see if cpu number is greater than max cpu???s possible
> in the system ie via "if (bind_cpu_0 >= g->p.nr_cpus ||
> bind_cpu_1 >= g->p.nr_cpus) {". But it could happen that system has
> say 48 CPU???s, but only number of online CPU???s is 0-7. Other CPU???s
> are offlined. Since "g->p.nr_cpus" is 48, so function will go ahead
> and set bit for CPU 8 also in cpumask ( td->bind_cpumask).
> 
> bind_to_cpumask function is called to set affinity using
> sched_setaffinity and the cpumask. Since the CPU8 is not present,
> set affinity will fail here with EINVAL. Fix this issue by adding a
> check to make sure that, CPU???s provided in the input argument values
> are online before proceeding further and skip the test. For this,
> include new helper function "is_cpu_online" in
> "tools/perf/util/header.c".
> 
> Since "BIT(x)" definition will get included from header.h, remove
> that from bench/numa.c
> 
> Tested-by: Disha Goel 
> Signed-off-by: Athira Rajeev 
> Reported-by: Disha Goel 

Looks good to me.
Reviewed-by: Srikar Dronamraju 

> ---
>  tools/perf/bench/numa.c  |  8 ++--
>  tools/perf/util/header.c | 43 
>  tools/perf/util/header.h |  1 +
>  3 files changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> index 29e41e32bd88..7992d79b3e41 100644
> --- a/tools/perf/bench/numa.c
> +++ b/tools/perf/bench/numa.c
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
> 
> +#include "../util/header.h"
>  #include 
>  #include 
> 
> @@ -616,6 +617,11 @@ static int parse_setup_cpu_list(void)
>   return -1;
>   }
> 
> + if (is_cpu_online(bind_cpu_0) != 1 || is_cpu_online(bind_cpu_1) 
> != 1) {
> + printf("\nTest not applicable, bind_cpu_0 or bind_cpu_1 
> is offline\n");
> + return -1;
> + }
> +
>   BUG_ON(bind_cpu_0 < 0 || bind_cpu_1 < 0);
>   BUG_ON(bind_cpu_0 > bind_cpu_1);
> 
> @@ -786,8 +792,6 @@ static int parse_nodes_opt(const struct option *opt 
> __maybe_unused,
>   return parse_node_list(arg);
>  }
> 
> -#define BIT(x) (1ul << x)
> -
>  static inline uint32_t lfsr_32(uint32_t lfsr)
>  {
>   const uint32_t taps = BIT(1) | BIT(5) | BIT(6) | BIT(31);
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 6da12e522edc..3f5fcf5d4b3f 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -983,6 +983,49 @@ static int write_dir_format(struct feat_fd *ff,
>   return do_write(ff, >dir.version, sizeof(data->dir.version));
>  }
> 
> +#define SYSFS "/sys/devices/system/cpu/"
> +
> +/*
> + * Check whether a CPU is online
> + *
> + * Returns:
> + * 1 -> if CPU is online
> + * 0 -> if CPU is offline
> + *-1 -> error case
> + */
> +int is_cpu_online(unsigned int cpu)
> +{
> + char sysfs_cpu[255];
> + char buf[255];
> + struct stat statbuf;
> + size_t len;
> + int fd;
> +
> + snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u", cpu);
> +
> + if (stat(sysfs_cpu, ) != 0)
> + return 0;
> +
> + /*
> +  * Check if /sys/devices/system/cpu/cpux/online file
> +  * exists. In kernels without CONFIG_HOTPLUG_CPU, this
> +  * file won't exist.
> +  */
> + snprintf(sysfs_cpu, sizeof(sysfs_cpu), SYSFS "cpu%u/online", cpu);
> + if (stat(sysfs_cpu, ) != 0)
> + return 1;
> +
> + fd = open(sysfs_cpu, O_RDONLY);
> + if (fd == -1)
> + return -1;
> +
> + len = read(fd, buf, sizeof(buf) - 1);
> + buf[len] = '\0';
> + close(fd);
> +
> + return strtoul(buf, NULL, 16);
> +}
> +
>  #ifdef HAVE_LIBBPF_SUPPORT
>  static int write_bpf_prog_info(struct feat_fd *ff,
>  struct evlist *evlist __maybe_unused)
> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
> index c9e3265832d9..0eb4bc29a5a4 100644
> --- a/tools/perf/util/header.h
> +++ b/tools/perf/util/header.h
> @@ -158,6 +158,7 @@ int do_write(struct feat_fd *fd, const void *buf, size_t 
> size);
>  int write_padded(struct feat_fd *fd, const void *bf,
>size_t count, size_t count_aligned);
> 
> +int is_cpu_online(unsigned int cpu);
>  /*
>   * arch specific callback
>   */
> -- 
> 2.35.1
> 


Re: [PATCH] powerpc/numa: Handle partially initialized numa nodes

2022-04-08 Thread Srikar Dronamraju
* Oscar Salvador  [2022-04-06 18:19:00]:

> On Wed, Mar 30, 2022 at 07:21:23PM +0530, Srikar Dronamraju wrote:
> >  arch/powerpc/mm/numa.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > index b9b7fefbb64b..13022d734951 100644
> > --- a/arch/powerpc/mm/numa.c
> > +++ b/arch/powerpc/mm/numa.c
> > @@ -1436,7 +1436,7 @@ int find_and_online_cpu_nid(int cpu)
> > if (new_nid < 0 || !node_possible(new_nid))
> > new_nid = first_online_node;
> >  
> > -   if (NODE_DATA(new_nid) == NULL) {
> > +   if (!node_online(new_nid)) {
> >  #ifdef CONFIG_MEMORY_HOTPLUG
> > /*
> >  * Need to ensure that NODE_DATA is initialized for a node from
> 
> Because of this fix, I wanted to check whether we might have any more 
> fallouts due
> to ("mm: handle uninitialized numa nodes gracefully"), and it made me look 
> closer
> as to why powerpc is the only platform that special cases try_online_node(),
> while all others rely on cpu_up()->try_online_node() to do the right thing.
> 
> So, I had a look.
> Let us rewind a bit:
> 
> The commit that sets find_and_online_cpu_nid() in dlpar_online_cpu was
> e67e02a544e9 ("powerpc/pseries: Fix cpu hotplug crash with memoryless nodes").
> 
> In there, it says that we have the following picture:
> 
> partition_sched_domains
>  arch_update_cpu_topology
>   numa_update_cpu_topology
>find_and_online_cpu_nid
> 
> and by the time find_and_online_cpu_nid() gets called to online the node, it 
> might be
> too late as we might have referenced some NODE_DATA() fields.
> Note that this happens at a much later stage in cpuhp.
> 
> Also note that at a much earlier stage, we do already have a 
> try_online_node() in cpu_up(),
> which should allocate-and-online the node and prevent accessing garbage.
> But the problem is that, on powerpc, all possible cpus have the same node set 
> at boot stage
> (see arch/powerpc/mm/numa.c:mem_topology_setup),
> so cpu_to_node() returns the same thing until it the mapping gets update 
> (which happens in
> start_secondary()->set_numa_node()), and so, the try_online_node() from 
> cpu_up() does not
> apply on the right node, because it still holds the not-up-to-date mapping 
> node <-> cpu.
> 
> (e.g: in my test case, when onlining a CPU belongin to node1, 
> cpu_up()->try_online_node()
>  tries to online node0, or whatever old mapping numa<->cpu is there).
> 
> So, we have something like:
> 
> dlpar_online_cpu
>  device_online
>   dev->bus->online
>cpu_subsys_online
> cpu_device_up
>  cpu_up
>   try_online_node (old mapping nid <-> cpu )
>   ...
>   ...
>   cphp_callbacks
>sched_cpu_activate
> cpuset_update_active_cpus
>  schedule_work(_hotplug_work)
>   cpuset_hotplug_work
>partition_sched_domains
> arch_update_cpu_topology
>  numa_update_cpu_topology
>   find_and_online_cpu_nid (online new_nid)
> 
> 
> So, yeah, the real onlining in 
> numa_update_cpu_topology()->find_and_online_cpu_nid()
> happens too late, that is why adding find_and_online_cpu_nid() back in 
> dlpar_online_cpu()
> fixed the issue, but we should not need this special casing at all.
> 
> We do already know the numa<->cpu associativity in
> dlpar_online_cpu()->find_and_online_cpu_nid(), so we should just be able to
> update the numa<->cpu mapping, and let the try_online_node() do the right 
> thing.
> 
> With this in mind, I came up with the following patch, where I carried out a 
> battery
> of tests for CPU hotplug stuff and it worked as expected.
> But I am not familiar with all powerpc pitfalls, e.g: dedicated vs shared 
> cpus etc, so
> I would like to hear from more familiar people.
> 
> The patch is:
> 
> From: Oscar Salvador 
> Date: Wed, 6 Apr 2022 14:39:15 +0200
> Subject: [PATCH] powerpc/numa: Associate numa node to its cpu earlier
> 
> powerpc is the only platform that do not rely on
> cpu_up()->try_online_node() to bring up a numa node,
> and special cases it, instead, deep in its own machinery:
> 
> dlpar_online_cpu
>  find_and_online_cpu_nid
>   try_online_node
> 
> This should not be needed, but the thing is that the try_online_node()
> from cpu_up() will not apply on the right node, because cpu_to_node()
> will return the old mapping numa<->cpu that gets set on boot stage
> for all possible cpus.
> 
> That can be seen easily if we try to print out the numa node passed
> to try_online_node() in cpu_up().
> 
> The thing is that the numa<->cpu mapping does not get updated till a much
> later stage in start_secondary:
> 
> start_secondary:
>  set_numa_node(numa_cpu_lookup_table[cpu])
> 
> But we do not really care, as we already now the
> CPU <-> NUMA associativity back in find_and_online_cpu_nid(),
> so let us make use of that and set the proper numa<->cpu mapping,
> so cpu_to_node() in cpu_up() returns the right node and
> try_online_node() can do its 

Re: [PATCH net-next 02/15] net: dsa: sja1105: Remove usage of iterator for list_add() after loop

2022-04-08 Thread Vladimir Oltean
Hello Jakob,

On Thu, Apr 07, 2022 at 12:28:47PM +0200, Jakob Koschel wrote:
> In preparation to limit the scope of a list iterator to the list
> traversal loop, use a dedicated pointer to point to the found element [1].
> 
> Before, the code implicitly used the head when no element was found
> when using >list. Since the new variable is only set if an
> element was found, the list_add() is performed within the loop
> and only done after the loop if it is done on the list head directly.
> 
> Link: 
> https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
>  [1]
> Signed-off-by: Jakob Koschel 
> ---
>  drivers/net/dsa/sja1105/sja1105_vl.c | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/dsa/sja1105/sja1105_vl.c 
> b/drivers/net/dsa/sja1105/sja1105_vl.c
> index b7e95d60a6e4..cfcae4d19eef 100644
> --- a/drivers/net/dsa/sja1105/sja1105_vl.c
> +++ b/drivers/net/dsa/sja1105/sja1105_vl.c
> @@ -27,20 +27,24 @@ static int sja1105_insert_gate_entry(struct 
> sja1105_gating_config *gating_cfg,
>   if (list_empty(_cfg->entries)) {
>   list_add(>list, _cfg->entries);
>   } else {
> - struct sja1105_gate_entry *p;
> + struct sja1105_gate_entry *p = NULL, *iter;
>  
> - list_for_each_entry(p, _cfg->entries, list) {
> - if (p->interval == e->interval) {
> + list_for_each_entry(iter, _cfg->entries, list) {
> + if (iter->interval == e->interval) {
>   NL_SET_ERR_MSG_MOD(extack,
>  "Gate conflict");
>   rc = -EBUSY;
>   goto err;
>   }
>  
> - if (e->interval < p->interval)
> + if (e->interval < iter->interval) {
> + p = iter;
> + list_add(>list, iter->list.prev);
>   break;
> + }
>   }
> - list_add(>list, p->list.prev);
> + if (!p)
> + list_add(>list, gating_cfg->entries.prev);
>   }
>  
>   gating_cfg->num_entries++;
> -- 
> 2.25.1
> 

I apologize in advance if I've misinterpreted the end goal of your patch.
I do have a vague suspicion I understand what you're trying to achieve,
and in that case, would you mind using this patch instead of yours?
I think it still preserves the intention of the code in a clean manner.

-[ cut here ]-
>From 7aed740750d1bc3bff6e85fd33298f5905bb4e01 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean 
Date: Fri, 8 Apr 2022 13:55:14 +0300
Subject: [PATCH] net: dsa: sja1105: avoid use of type-confused pointer in
 sja1105_insert_gate_entry()

It appears that list_for_each_entry() leaks a type-confused pointer when
the iteration loop ends with no early break, since "*p" will no longer
point to a "struct sja1105_gate_entry", but rather to some memory in
front of "gating_cfg->entries".

This isn't actually a problem here, because if the element we insert has
the highest interval, therefore we never exit the loop early, "p->list"
(which is all that we use outside the loop) will in fact point to
"gating_cfg->entries" even though "p" itself is invalid.

Nonetheless, there are preparations to increase the safety of
list_for_each_entry() by making it impossible to use the encapsulating
structure of the iterator element outside the loop. So something needs
to change here before those preparations go in, even though this
constitutes legitimate use.

Make it clear that we are not dereferencing members of the encapsulating
"struct sja1105_gate_entry" outside the loop, by using the regular
list_for_each() iterator, and dereferencing the struct sja1105_gate_entry
only within the loop.

With list_for_each(), the iterator element at the end of the loop does
have a sane value in all cases, and we can just use that as the "head"
argument of list_add().

Signed-off-by: Vladimir Oltean 
---
 drivers/net/dsa/sja1105/sja1105_vl.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_vl.c 
b/drivers/net/dsa/sja1105/sja1105_vl.c
index c0e45b393fde..fe93c80fe5ef 100644
--- a/drivers/net/dsa/sja1105/sja1105_vl.c
+++ b/drivers/net/dsa/sja1105/sja1105_vl.c
@@ -27,9 +27,15 @@ static int sja1105_insert_gate_entry(struct 
sja1105_gating_config *gating_cfg,
if (list_empty(_cfg->entries)) {
list_add(>list, _cfg->entries);
} else {
-   struct sja1105_gate_entry *p;
+   struct list_head *pos;
+
+   /* We cannot safely use list_for_each_entry()
+* because we dereference "pos" after the loop
+*/
+   list_for_each(pos, _cfg->entries) {
+   struct 

Re: [PATCH v3 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state

2022-04-08 Thread Kai-Heng Feng
On Thu, Mar 31, 2022 at 3:40 AM Sathyanarayanan Kuppuswamy
 wrote:
>
>
>
> On 3/29/22 1:31 AM, Kai-Heng Feng wrote:
> > On some Intel AlderLake platforms, Thunderbolt entering D3cold can cause
> > some errors reported by AER:
> > [   30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error 
> > received: :00:1d.0
> > [   30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected 
> > (Non-Fatal), type=Transaction Layer, (Requester ID)
> > [   30.100256] pcieport :00:1d.0:   device [8086:7ab0] error 
> > status/mask=0010/4000
> > [   30.100262] pcieport :00:1d.0:[20] UnsupReq   (First)
> > [   30.100267] pcieport :00:1d.0: AER:   TLP Header: 3400 0852 
> >  
> > [   30.100372] thunderbolt :0a:00.0: AER: can't recover (no 
> > error_detected callback)
> > [   30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected 
> > callback)
> > [   30.100427] pcieport :00:1d.0: AER: device recovery failed
>
> Include details about in which platform you have seen it and whether
> this is a generic power issue?

_All_ Alder Lake platforms I worked on have this issue. I don't think
have hardware to analyze if it's a power issue though.

>
> >
> > So disable AER service to avoid the noises from turning power rails
> > on/off when the device is in low power states (D3hot and D3cold), as
> > PCIe spec "5.2 Link State Power Management" states that TLP and DLLP
>
> Also include PCIe specification version number.

Will add in next revision.

Kai-Heng

>
> > transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold
> > with aux power) and L3 (D3cold).
> >
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453
> > Reviewed-by: Mika Westerberg 
> > Signed-off-by: Kai-Heng Feng 
> > ---
> > v3:
> >   - Remove reference to ACS.
> >   - Wording change.
> >
> > v2:
> >   - Wording change.
> >
> >   drivers/pci/pcie/aer.c | 31 +--
> >   1 file changed, 25 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index 9fa1f97e5b270..e4e9d4a3098d7 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -1367,6 +1367,22 @@ static int aer_probe(struct pcie_device *dev)
> >   return 0;
> >   }
> >
> > +static int aer_suspend(struct pcie_device *dev)
> > +{
> > + struct aer_rpc *rpc = get_service_data(dev);
> > +
> > + aer_disable_rootport(rpc);
> > + return 0;
> > +}
> > +
> > +static int aer_resume(struct pcie_device *dev)
> > +{
> > + struct aer_rpc *rpc = get_service_data(dev);
> > +
> > + aer_enable_rootport(rpc);
> > + return 0;
> > +}
> > +
> >   /**
> >* aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
> >* @dev: pointer to Root Port, RCEC, or RCiEP
> > @@ -1433,12 +1449,15 @@ static pci_ers_result_t aer_root_reset(struct 
> > pci_dev *dev)
> >   }
> >
> >   static struct pcie_port_service_driver aerdriver = {
> > - .name   = "aer",
> > - .port_type  = PCIE_ANY_PORT,
> > - .service= PCIE_PORT_SERVICE_AER,
> > -
> > - .probe  = aer_probe,
> > - .remove = aer_remove,
> > + .name   = "aer",
> > + .port_type  = PCIE_ANY_PORT,
> > + .service= PCIE_PORT_SERVICE_AER,
> > + .probe  = aer_probe,
> > + .suspend= aer_suspend,
> > + .resume = aer_resume,
> > + .runtime_suspend= aer_suspend,
> > + .runtime_resume = aer_resume,
> > + .remove = aer_remove,
> >   };
> >
> >   /**
>
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer


Re: [PATCH V4 3/7] arm64/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

2022-04-08 Thread Anshuman Khandual



On 4/8/22 15:58, Catalin Marinas wrote:
> On Thu, Apr 07, 2022 at 04:02:47PM +0530, Anshuman Khandual wrote:
>> diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
>> index 77ada00280d9..307534fcec00 100644
>> --- a/arch/arm64/mm/mmap.c
>> +++ b/arch/arm64/mm/mmap.c
>> @@ -55,3 +55,36 @@ static int __init adjust_protection_map(void)
>>  return 0;
>>  }
>>  arch_initcall(adjust_protection_map);
>> +
>> +static pgprot_t arm64_arch_vm_get_page_prot(unsigned long vm_flags)
>> +{
>> +pteval_t prot = 0;
>> +
>> +if (vm_flags & VM_ARM64_BTI)
>> +prot |= PTE_GP;
>> +
>> +/*
>> + * There are two conditions required for returning a Normal Tagged
>> + * memory type: (1) the user requested it via PROT_MTE passed to
>> + * mmap() or mprotect() and (2) the corresponding vma supports MTE. We
>> + * register (1) as VM_MTE in the vma->vm_flags and (2) as
>> + * VM_MTE_ALLOWED. Note that the latter can only be set during the
>> + * mmap() call since mprotect() does not accept MAP_* flags.
>> + * Checking for VM_MTE only is sufficient since arch_validate_flags()
>> + * does not permit (VM_MTE & !VM_MTE_ALLOWED).
>> + */
>> +if (vm_flags & VM_MTE)
>> +prot |= PTE_ATTRINDX(MT_NORMAL_TAGGED);
>> +
>> +return __pgprot(prot);
>> +}
>> +
>> +pgprot_t vm_get_page_prot(unsigned long vm_flags)
>> +{
>> +pgprot_t ret = __pgprot(pgprot_val(protection_map[vm_flags &
>> +(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) |
>> +
>> pgprot_val(arm64_arch_vm_get_page_prot(vm_flags)));
>> +
>> +return ret;
>> +}
>> +EXPORT_SYMBOL(vm_get_page_prot);
> 
> Could you write all this in a single function? I think I mentioned it in
> a previous series (untested):

Right, missed that.

> 
> pgprot_t vm_get_page_prot(unsigned long vm_flags)
> {
>   pteval_t prot = pgprot_val(protection_map[vm_flags &
>  (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
> 
>   if (vm_flags & VM_ARM64_BTI)
>   prot |= PTE_GP;
> 
>   /*
>* There are two conditions required for returning a Normal Tagged
>* memory type: (1) the user requested it via PROT_MTE passed to
>* mmap() or mprotect() and (2) the corresponding vma supports MTE. We
>* register (1) as VM_MTE in the vma->vm_flags and (2) as
>* VM_MTE_ALLOWED. Note that the latter can only be set during the
>* mmap() call since mprotect() does not accept MAP_* flags.
>* Checking for VM_MTE only is sufficient since arch_validate_flags()
>* does not permit (VM_MTE & !VM_MTE_ALLOWED).
>*/
>   if (vm_flags & VM_MTE)
>   prot |= PTE_ATTRINDX(MT_NORMAL_TAGGED);
> 
>   return __pgprot(prot);
> }
> EXPORT_SYMBOL(vm_get_page_prot);
> 
> With that:

Sure, will change them into a single function.

> 
> Reviewed-by: Catalin Marinas 
> 


Re: [PATCH V4 7/7] mm/mmap: Drop arch_vm_get_page_pgprot()

2022-04-08 Thread Catalin Marinas
On Thu, Apr 07, 2022 at 04:02:51PM +0530, Anshuman Khandual wrote:
> There are no platforms left which use arch_vm_get_page_prot(). Just drop
> generic arch_vm_get_page_prot().
> 
> Cc: Andrew Morton 
> Cc: linux...@kvack.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Anshuman Khandual 

Reviewed-by: Catalin Marinas 


Re: [PATCH V4 6/7] mm/mmap: Drop arch_filter_pgprot()

2022-04-08 Thread Catalin Marinas
On Thu, Apr 07, 2022 at 04:02:50PM +0530, Anshuman Khandual wrote:
> There are no platforms left which subscribe ARCH_HAS_FILTER_PGPROT. Hence
> drop generic arch_filter_pgprot() and also config ARCH_HAS_FILTER_PGPROT.
> 
> Cc: Andrew Morton 
> Cc: linux...@kvack.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Anshuman Khandual 

Reviewed-by: Catalin Marinas 


Re: [PATCH V4 3/7] arm64/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

2022-04-08 Thread Catalin Marinas
On Thu, Apr 07, 2022 at 04:02:47PM +0530, Anshuman Khandual wrote:
> diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
> index 77ada00280d9..307534fcec00 100644
> --- a/arch/arm64/mm/mmap.c
> +++ b/arch/arm64/mm/mmap.c
> @@ -55,3 +55,36 @@ static int __init adjust_protection_map(void)
>   return 0;
>  }
>  arch_initcall(adjust_protection_map);
> +
> +static pgprot_t arm64_arch_vm_get_page_prot(unsigned long vm_flags)
> +{
> + pteval_t prot = 0;
> +
> + if (vm_flags & VM_ARM64_BTI)
> + prot |= PTE_GP;
> +
> + /*
> +  * There are two conditions required for returning a Normal Tagged
> +  * memory type: (1) the user requested it via PROT_MTE passed to
> +  * mmap() or mprotect() and (2) the corresponding vma supports MTE. We
> +  * register (1) as VM_MTE in the vma->vm_flags and (2) as
> +  * VM_MTE_ALLOWED. Note that the latter can only be set during the
> +  * mmap() call since mprotect() does not accept MAP_* flags.
> +  * Checking for VM_MTE only is sufficient since arch_validate_flags()
> +  * does not permit (VM_MTE & !VM_MTE_ALLOWED).
> +  */
> + if (vm_flags & VM_MTE)
> + prot |= PTE_ATTRINDX(MT_NORMAL_TAGGED);
> +
> + return __pgprot(prot);
> +}
> +
> +pgprot_t vm_get_page_prot(unsigned long vm_flags)
> +{
> + pgprot_t ret = __pgprot(pgprot_val(protection_map[vm_flags &
> + (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) |
> + 
> pgprot_val(arm64_arch_vm_get_page_prot(vm_flags)));
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(vm_get_page_prot);

Could you write all this in a single function? I think I mentioned it in
a previous series (untested):

pgprot_t vm_get_page_prot(unsigned long vm_flags)
{
pteval_t prot = pgprot_val(protection_map[vm_flags &
   (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);

if (vm_flags & VM_ARM64_BTI)
prot |= PTE_GP;

/*
 * There are two conditions required for returning a Normal Tagged
 * memory type: (1) the user requested it via PROT_MTE passed to
 * mmap() or mprotect() and (2) the corresponding vma supports MTE. We
 * register (1) as VM_MTE in the vma->vm_flags and (2) as
 * VM_MTE_ALLOWED. Note that the latter can only be set during the
 * mmap() call since mprotect() does not accept MAP_* flags.
 * Checking for VM_MTE only is sufficient since arch_validate_flags()
 * does not permit (VM_MTE & !VM_MTE_ALLOWED).
 */
if (vm_flags & VM_MTE)
prot |= PTE_ATTRINDX(MT_NORMAL_TAGGED);

return __pgprot(prot);
}
EXPORT_SYMBOL(vm_get_page_prot);

With that:

Reviewed-by: Catalin Marinas 


Re: rcu_sched self-detected stall on CPU

2022-04-08 Thread Zhouyi Zhou
On Fri, Apr 8, 2022 at 3:23 PM Michael Ellerman  wrote:
>
> "Paul E. McKenney"  writes:
> > On Wed, Apr 06, 2022 at 05:31:10PM +0800, Zhouyi Zhou wrote:
> >> Hi
> >>
> >> I can reproduce it in a ppc virtual cloud server provided by Oregon
> >> State University.  Following is what I do:
> >> 1) curl -l 
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/snapshot/linux-5.18-rc1.tar.gz
> >> -o linux-5.18-rc1.tar.gz
> >> 2) tar zxf linux-5.18-rc1.tar.gz
> >> 3) cp config linux-5.18-rc1/.config
> >> 4) cd linux-5.18-rc1
> >> 5) make vmlinux -j 8
> >> 6) qemu-system-ppc64 -kernel vmlinux -nographic -vga none -no-reboot
> >> -smp 2 (QEMU 4.2.1)
> >> 7) after 12 rounds, the bug got reproduced:
> >> (http://154.223.142.244/logs/20220406/qemu.log.txt)
> >
> > Just to make sure, are you both seeing the same thing?  Last I knew,
> > Zhouyi was chasing an RCU-tasks issue that appears only in kernels
> > built with CONFIG_PROVE_RCU=y, which Miguel does not have set.  Or did
> > I miss something?
> >
> > Miguel is instead seeing an RCU CPU stall warning where RCU's grace-period
> > kthread slept for three milliseconds, but did not wake up for more than
> > 20 seconds.  This kthread would normally have awakened on CPU 1, but
> > CPU 1 looks to me to be very unhealthy, as can be seen in your console
> > output below (but maybe my idea of what is healthy for powerpc systems
> > is outdated).  Please see also the inline annotations.
> >
> > Thoughts from the PPC guys?
>
> I haven't seen it in my testing. But using Miguel's config I can
> reproduce it seemingly on every boot.
>
> For me it bisects to:
>
>   35de589cb879 ("powerpc/time: improve decrementer clockevent processing")
>
> Which seems plausible.
I also bisect to 35de589cb879 ("powerpc/time: improve decrementer
clockevent processing")
>
> Reverting that on mainline makes the bug go away.
I also revert that on the mainline, and am currently doing a pressure
test (by repeatedly invoking qemu and checking the console.log) on PPC
VM in Oregon State University.
>
> I don't see an obvious bug in the diff, but I could be wrong, or the old
> code was papering over an existing bug?
>
> I'll try and work out what it is about Miguel's config that exposes
> this vs our defconfig, that might give us a clue.
Great job!
>
> cheers
Thanks
Zhouyi


Re: [PATCH V4 1/7] mm/mmap: Add new config ARCH_HAS_VM_GET_PAGE_PROT

2022-04-08 Thread Catalin Marinas
On Thu, Apr 07, 2022 at 04:02:45PM +0530, Anshuman Khandual wrote:
> Add a new config ARCH_HAS_VM_GET_PAGE_PROT, which when subscribed enables a
> given platform to define its own vm_get_page_prot() but still utilizing the
> generic protection_map[] array.
> 
> Cc: Andrew Morton 
> Cc: linux...@kvack.org
> Cc: linux-ker...@vger.kernel.org
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Anshuman Khandual 

Reviewed-by: Catalin Marinas 


Re: [PATCH net-next 02/15] net: dsa: sja1105: Remove usage of iterator for list_add() after loop

2022-04-08 Thread Christophe Leroy


Le 07/04/2022 à 12:28, Jakob Koschel a écrit :
> In preparation to limit the scope of a list iterator to the list
> traversal loop, use a dedicated pointer to point to the found element [1].
> 
> Before, the code implicitly used the head when no element was found
> when using >list. Since the new variable is only set if an
> element was found, the list_add() is performed within the loop
> and only done after the loop if it is done on the list head directly.
> 
> Link: 
> https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=ehreask5sqxpwr9y7k9sa6cwx...@mail.gmail.com/
>  [1]
> Signed-off-by: Jakob Koschel 
> ---
>   drivers/net/dsa/sja1105/sja1105_vl.c | 14 +-
>   1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/dsa/sja1105/sja1105_vl.c 
> b/drivers/net/dsa/sja1105/sja1105_vl.c
> index b7e95d60a6e4..cfcae4d19eef 100644
> --- a/drivers/net/dsa/sja1105/sja1105_vl.c
> +++ b/drivers/net/dsa/sja1105/sja1105_vl.c
> @@ -27,20 +27,24 @@ static int sja1105_insert_gate_entry(struct 
> sja1105_gating_config *gating_cfg,
>   if (list_empty(_cfg->entries)) {
>   list_add(>list, _cfg->entries);
>   } else {
> - struct sja1105_gate_entry *p;
> + struct sja1105_gate_entry *p = NULL, *iter;
>   
> - list_for_each_entry(p, _cfg->entries, list) {
> - if (p->interval == e->interval) {
> + list_for_each_entry(iter, _cfg->entries, list) {
> + if (iter->interval == e->interval) {
>   NL_SET_ERR_MSG_MOD(extack,
>  "Gate conflict");
>   rc = -EBUSY;
>   goto err;
>   }
>   
> - if (e->interval < p->interval)
> + if (e->interval < iter->interval) {
> + p = iter;
> + list_add(>list, iter->list.prev);
>   break;
> + }
>   }
> - list_add(>list, p->list.prev);
> + if (!p)
> + list_add(>list, gating_cfg->entries.prev);
>   }
>   
>   gating_cfg->num_entries++;

This change looks ugly, why duplicating the list_add() to do the same ? 
At the end of the loop the pointer contains gating_cfg->entries, so it 
was cleaner before.

If you don't want to use the loop index outside the loop, fair enough, 
all you have to do is:

struct sja1105_gate_entry *p, *iter;

list_for_each_entry(iter, _cfg->entries, list) {
if (iter->interval == e->interval) {
NL_SET_ERR_MSG_MOD(extack,
   "Gate conflict");
rc = -EBUSY;
goto err;
}
p = iter;

if (e->interval < iter->interval)
break;
}
list_add(>list, p->list.prev);



Christophe

Re: [PATCH V2] testing/selftests/mqueue: Fix mq_perf_tests to free the allocated cpu set

2022-04-08 Thread Athira Rajeev



> On 08-Apr-2022, at 12:31 AM, Shuah Khan  wrote:
> 
> On 4/7/22 12:40 PM, Athira Rajeev wrote:
>> The selftest "mqueue/mq_perf_tests.c" use CPU_ALLOC to allocate
>> CPU set. This cpu set is used further in pthread_attr_setaffinity_np
>> and by pthread_create in the code. But in current code, allocated
>> cpu set is not freed.
>> Fix this issue by adding CPU_FREE in the "shutdown" function which
>> is called in most of the error/exit path for the cleanup. Also add
>> CPU_FREE in some of the error paths where shutdown is not called.
>> Fixes: 7820b0715b6f ("tools/selftests: add mq_perf_tests")
>> Signed-off-by: Athira Rajeev 
>> ---
>> Changelog:
>> From v1 -> v2:
>>  Addressed review comment from Shuah Khan to add
>>  CPU_FREE in other exit paths where it is needed
> 
> Thank you. I have a couple of comments on making the error
> paths simpler. Please see below.
> 
>>  tools/testing/selftests/mqueue/mq_perf_tests.c | 10 +-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>> diff --git a/tools/testing/selftests/mqueue/mq_perf_tests.c 
>> b/tools/testing/selftests/mqueue/mq_perf_tests.c
>> index b019e0b8221c..182434c7898d 100644
>> --- a/tools/testing/selftests/mqueue/mq_perf_tests.c
>> +++ b/tools/testing/selftests/mqueue/mq_perf_tests.c
>> @@ -180,6 +180,9 @@ void shutdown(int exit_val, char *err_cause, int line_no)
>>  if (in_shutdown++)
>>  return;
>>  +   /* Free the cpu_set allocated using CPU_ALLOC in main function */
>> +CPU_FREE(cpu_set);
>> +
>>  for (i = 0; i < num_cpus_to_pin; i++)
>>  if (cpu_threads[i]) {
>>  pthread_kill(cpu_threads[i], SIGUSR1);
>> @@ -589,6 +592,7 @@ int main(int argc, char *argv[])
>>  cpu_set)) {
>>  fprintf(stderr, "Any given CPU may "
>>  "only be given once.\n");
>> +CPU_FREE(cpu_set);
> 
> This could be done in a common error path handling.
> 
>>  exit(1);
>>  } else
>>  CPU_SET_S(cpus_to_pin[cpu],
>> @@ -607,6 +611,7 @@ int main(int argc, char *argv[])
>>  queue_path = malloc(strlen(option) + 2);
>>  if (!queue_path) {
>>  perror("malloc()");
>> +CPU_FREE(cpu_set);
> 
> This could be done in a common error path handling.
> 
>>  exit(1);
>>  }
>>  queue_path[0] = '/';
>> @@ -619,6 +624,7 @@ int main(int argc, char *argv[])
>>  }
>>  if (continuous_mode && num_cpus_to_pin == 0) {
>> +CPU_FREE(cpu_set);
> 
> This could be done in a common error path handling.
> 
>>  fprintf(stderr, "Must pass at least one CPU to continuous "
>>  "mode.\n");
>>  poptPrintUsage(popt_context, stderr, 0);
>> @@ -628,10 +634,12 @@ int main(int argc, char *argv[])
>>  cpus_to_pin[0] = cpus_online - 1;
>>  }
>>  -   if (getuid() != 0)
>> +if (getuid() != 0) {
>> +CPU_FREE(cpu_set);
>>  ksft_exit_skip("Not running as root, but almost all tests "
>>  "require root in order to modify\nsystem settings.  "
>>  "Exiting.\n");
>> +}
>>  
> 
> Why not move this check before CPU_ALLOC and make this the very first
> check in main()?
> 
> With this change the other places where CPU_FREE is added right before
> exit(1). Something like this:
> 
> err_code:
>   CPU_FREE(cpu_set);
>   exit(code)
Hi Shuah

Thanks for the comments. Addressed these in V3

Athira
> 
>>  max_msgs = fopen(MAX_MSGS, "r+");
>>  max_msgsize = fopen(MAX_MSGSIZE, "r+");
> 
> thanks,
> -- Shuah



[PATCH v9 3/4] mm: Add len and flags parameters to arch_get_mmap_end()

2022-04-08 Thread Christophe Leroy
Powerpc needs flags and len to make decision on arch_get_mmap_end().

So add them as parameters to arch_get_mmap_end().

Signed-off-by: Christophe Leroy 
Cc: Steve Capper 
Cc: Catalin Marinas 
Cc: Will Deacon 
Acked-by: Catalin Marinas 
---
 arch/arm64/include/asm/processor.h | 4 ++--
 mm/mmap.c  | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index 73e38d9a540c..bd22d94e844d 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -92,8 +92,8 @@
 #endif /* CONFIG_COMPAT */
 
 #ifndef CONFIG_ARM64_FORCE_52BIT
-#define arch_get_mmap_end(addr) ((addr > DEFAULT_MAP_WINDOW) ? TASK_SIZE :\
-   DEFAULT_MAP_WINDOW)
+#define arch_get_mmap_end(addr, len, flags) \
+   (((addr) > DEFAULT_MAP_WINDOW) ? TASK_SIZE : DEFAULT_MAP_WINDOW)
 
 #define arch_get_mmap_base(addr, base) ((addr > DEFAULT_MAP_WINDOW) ? \
base + TASK_SIZE - DEFAULT_MAP_WINDOW :\
diff --git a/mm/mmap.c b/mm/mmap.c
index 08b40a9715ee..4d63e1cb3b52 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2118,7 +2118,7 @@ unsigned long vm_unmapped_area(struct 
vm_unmapped_area_info *info)
 }
 
 #ifndef arch_get_mmap_end
-#define arch_get_mmap_end(addr)(TASK_SIZE)
+#define arch_get_mmap_end(addr, len, flags)(TASK_SIZE)
 #endif
 
 #ifndef arch_get_mmap_base
@@ -2144,7 +2144,7 @@ generic_get_unmapped_area(struct file *filp, unsigned 
long addr,
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev;
struct vm_unmapped_area_info info;
-   const unsigned long mmap_end = arch_get_mmap_end(addr);
+   const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
 
if (len > mmap_end - mmap_min_addr)
return -ENOMEM;
@@ -2192,7 +2192,7 @@ generic_get_unmapped_area_topdown(struct file *filp, 
unsigned long addr,
struct vm_area_struct *vma, *prev;
struct mm_struct *mm = current->mm;
struct vm_unmapped_area_info info;
-   const unsigned long mmap_end = arch_get_mmap_end(addr);
+   const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
 
/* requested length too big for entire address space */
if (len > mmap_end - mmap_min_addr)
-- 
2.35.1



[PATCH v9 4/4] mm, hugetlbfs: Allow for "high" userspace addresses

2022-04-08 Thread Christophe Leroy
This is a complement of f6795053dac8 ("mm: mmap: Allow for "high"
userspace addresses") for hugetlb.

This patch adds support for "high" userspace addresses that are
optionally supported on the system and have to be requested via a hint
mechanism ("high" addr parameter to mmap).

Architectures such as powerpc and x86 achieve this by making changes to
their architectural versions of hugetlb_get_unmapped_area() function.
However, arm64 uses the generic version of that function.

So take into account arch_get_mmap_base() and arch_get_mmap_end() in
hugetlb_get_unmapped_area(). To allow that, move those two macros
out of mm/mmap.c into include/linux/sched/mm.h

If these macros are not defined in architectural code then they default
to (TASK_SIZE) and (base) so should not introduce any behavioural
changes to architectures that do not define them.

For the time being, only ARM64 is affected by this change.

Signed-off-by: Christophe Leroy 
Cc: Steve Capper 
Cc: Will Deacon 
Cc: Catalin Marinas 
Fixes: f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses")
Cc:  # 5.0.x
Reviewed-by: Catalin Marinas 
---
 fs/hugetlbfs/inode.c | 9 +
 include/linux/sched/mm.h | 8 
 mm/mmap.c| 8 
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 09c8313ee1c4..6f863497bd69 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -205,7 +205,7 @@ hugetlb_get_unmapped_area_bottomup(struct file *file, 
unsigned long addr,
info.flags = 0;
info.length = len;
info.low_limit = current->mm->mmap_base;
-   info.high_limit = TASK_SIZE;
+   info.high_limit = arch_get_mmap_end(addr, len, flags);
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
info.align_offset = 0;
return vm_unmapped_area();
@@ -221,7 +221,7 @@ hugetlb_get_unmapped_area_topdown(struct file *file, 
unsigned long addr,
info.flags = VM_UNMAPPED_AREA_TOPDOWN;
info.length = len;
info.low_limit = max(PAGE_SIZE, mmap_min_addr);
-   info.high_limit = current->mm->mmap_base;
+   info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
info.align_offset = 0;
addr = vm_unmapped_area();
@@ -236,7 +236,7 @@ hugetlb_get_unmapped_area_topdown(struct file *file, 
unsigned long addr,
VM_BUG_ON(addr != -ENOMEM);
info.flags = 0;
info.low_limit = current->mm->mmap_base;
-   info.high_limit = TASK_SIZE;
+   info.high_limit = arch_get_mmap_end(addr, len, flags);
addr = vm_unmapped_area();
}
 
@@ -251,6 +251,7 @@ generic_hugetlb_get_unmapped_area(struct file *file, 
unsigned long addr,
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
struct hstate *h = hstate_file(file);
+   const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
 
if (len & ~huge_page_mask(h))
return -EINVAL;
@@ -266,7 +267,7 @@ generic_hugetlb_get_unmapped_area(struct file *file, 
unsigned long addr,
if (addr) {
addr = ALIGN(addr, huge_page_size(h));
vma = find_vma(mm, addr);
-   if (TASK_SIZE - len >= addr &&
+   if (mmap_end - len >= addr &&
(!vma || addr + len <= vm_start_gap(vma)))
return addr;
}
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 18b2d2b2e0ca..8cd975a8bfeb 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -136,6 +136,14 @@ static inline void mm_update_next_owner(struct mm_struct 
*mm)
 #endif /* CONFIG_MEMCG */
 
 #ifdef CONFIG_MMU
+#ifndef arch_get_mmap_end
+#define arch_get_mmap_end(addr, len, flags)(TASK_SIZE)
+#endif
+
+#ifndef arch_get_mmap_base
+#define arch_get_mmap_base(addr, base) (base)
+#endif
+
 extern void arch_pick_mmap_layout(struct mm_struct *mm,
  struct rlimit *rlim_stack);
 extern unsigned long
diff --git a/mm/mmap.c b/mm/mmap.c
index 4d63e1cb3b52..e9b7d74e58bc 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2117,14 +2117,6 @@ unsigned long vm_unmapped_area(struct 
vm_unmapped_area_info *info)
return addr;
 }
 
-#ifndef arch_get_mmap_end
-#define arch_get_mmap_end(addr, len, flags)(TASK_SIZE)
-#endif
-
-#ifndef arch_get_mmap_base
-#define arch_get_mmap_base(addr, base) (base)
-#endif
-
 /* Get an address range which is currently unmapped.
  * For shmat() with addr=0.
  *
-- 
2.35.1



[PATCH v9 0/4] mm: Enable conversion of powerpc to default topdown mmap layout

2022-04-08 Thread Christophe Leroy
Rebased on top of Linux 5.18-rc1

This is the mm part of the series that converts powerpc to default
topdown mmap layout, for merge into v5.18

powerpc requires its own arch_get_unmapped_area() only when
slices are needed, which is only for book3s/64. First part of
the series moves slices into book3s/64 specific directories
and cleans up other subarchitectures.

The actual convertion of powerpc to default topdown mmap layout will
then be resent in a follow-up series for application on v5.19

First patch modifies the core mm to allow powerpc to still provide its
own arch_randomize_brk()

Second patch modifies core mm to give len and flags to
arch_get_mmap_end() as powerpc needs it. 

Third patch modifies core mm to allow powerpc to use generic versions
of get_unmapped_area functions for Radix while still providing its own
implementation for Hash, the selection between Radix and Hash being
done at runtime.

Fourth patch is a complement/fix of f6795053dac8 ("mm: mmap: Allow for
"high" userspace addresses") for hugetlb. It adds support for "high"
userspace addresses that are optionally supported on the system and
have to be requested via a hint mechanism ("high" addr parameter to mmap).

Previous version of the series is available at
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?state=*=289718

Signed-off-by: Christophe Leroy 

Changes in v9:
- Sending mm patches and powerpc patches separately with the objective
of mm patches going into kernel v5.18 and powerpc patches folling up
into kernel v5.19

Changes in v8:
- Moved patch "sizes.h: Add SZ_1T macro" up from which is already in linux-next 
but not in Linus tree yet.
- Rebased on today's powerpc/next

Changes in v7:
- Taken into account comments from Catalin (patches 3 and 4)

Changes in v6:
- New patch (patch 4) to take arch_get_mmap_base() and arch_get_mmap_end() into 
account in generic hugetlb_get_unmapped_area()
- Get back arch_randomize_brk() simplification as it relies on default topdown 
mmap layout.
- Fixed precedence between || and && in powerpc's arch_get_mmap_end() (patch 9)

Changes in v5:
- Added patch 3
- Added arch_get_mmap_base() and arch_get_mmap_end() to patch 7 to better match 
original powerpc behaviour
- Switched patched 10 and 11 and performed full randomisation in patch 10 just 
before switching to default implementation, as suggested by Nic.

Changes in v4:
- Move arch_randomize_brk() simplification out of this series
- Add a change to core mm to enable using generic implementation
while providing arch specific one at the same time.
- Reworked radix get_unmapped_area to use generic implementation
- Rebase on top of Nic's series v6

Changes in v3:
- Fixed missing  in last patch
- Added a patch to move SZ_1T out of drivers/pci/controller/pci-xgene.c

Changes in v2:
- Moved patch 4 before patch 2
- Make generic arch_randomize_brk() __weak
- Added patch 9

Christophe Leroy (4):
  mm: Allow arch specific arch_randomize_brk() with
CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
  mm, hugetlbfs: Allow an arch to always use generic versions of
get_unmapped_area functions
  mm: Add len and flags parameters to arch_get_mmap_end()
  mm, hugetlbfs: Allow for "high" userspace addresses

 arch/arm64/include/asm/processor.h |  4 +--
 fs/hugetlbfs/inode.c   | 26 --
 include/linux/hugetlb.h|  5 
 include/linux/sched/mm.h   | 17 
 mm/mmap.c  | 43 ++
 mm/util.c  |  2 +-
 6 files changed, 69 insertions(+), 28 deletions(-)

-- 
2.35.1



[PATCH v9 2/4] mm, hugetlbfs: Allow an arch to always use generic versions of get_unmapped_area functions

2022-04-08 Thread Christophe Leroy
Unlike most architectures, powerpc can only define at runtime
if it is going to use the generic arch_get_unmapped_area() or not.

Today, powerpc has a copy of the generic arch_get_unmapped_area()
because when selection HAVE_ARCH_UNMAPPED_AREA the generic
arch_get_unmapped_area() is not available.

Rename it generic_get_unmapped_area() and make it independent of
HAVE_ARCH_UNMAPPED_AREA.

Do the same for arch_get_unmapped_area_topdown() versus
HAVE_ARCH_UNMAPPED_AREA_TOPDOWN.

Do the same for hugetlb_get_unmapped_area() versus
HAVE_ARCH_HUGETLB_UNMAPPED_AREA.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
---
 fs/hugetlbfs/inode.c | 17 +
 include/linux/hugetlb.h  |  5 +
 include/linux/sched/mm.h |  9 +
 mm/mmap.c| 31 ---
 4 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 99c7477cee5c..09c8313ee1c4 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -195,7 +195,6 @@ static int hugetlbfs_file_mmap(struct file *file, struct 
vm_area_struct *vma)
  * Called under mmap_write_lock(mm).
  */
 
-#ifndef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
 static unsigned long
 hugetlb_get_unmapped_area_bottomup(struct file *file, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags)
@@ -244,9 +243,10 @@ hugetlb_get_unmapped_area_topdown(struct file *file, 
unsigned long addr,
return addr;
 }
 
-static unsigned long
-hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
-   unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long
+generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags)
 {
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
@@ -282,6 +282,15 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long 
addr,
return hugetlb_get_unmapped_area_bottomup(file, addr, len,
pgoff, flags);
 }
+
+#ifndef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
+static unsigned long
+hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags)
+{
+   return generic_hugetlb_get_unmapped_area(file, addr, len, pgoff, flags);
+}
 #endif
 
 static size_t
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 53c1b6082a4c..10895adc7635 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -513,6 +513,11 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, 
unsigned long addr,
unsigned long flags);
 #endif /* HAVE_ARCH_HUGETLB_UNMAPPED_AREA */
 
+unsigned long
+generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags);
+
 /*
  * huegtlb page specific state flags.  These flags are located in page.private
  * of the hugetlb head page.  Functions created via the below macros should be
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index a80356e9dc69..18b2d2b2e0ca 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -145,6 +145,15 @@ extern unsigned long
 arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
  unsigned long len, unsigned long pgoff,
  unsigned long flags);
+
+unsigned long
+generic_get_unmapped_area(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags);
+unsigned long
+generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags);
 #else
 static inline void arch_pick_mmap_layout(struct mm_struct *mm,
 struct rlimit *rlim_stack) {}
diff --git a/mm/mmap.c b/mm/mmap.c
index 3aa839f81e63..08b40a9715ee 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2136,10 +2136,10 @@ unsigned long vm_unmapped_area(struct 
vm_unmapped_area_info *info)
  *
  * This function "knows" that -ENOMEM has the bits set.
  */
-#ifndef HAVE_ARCH_UNMAPPED_AREA
 unsigned long
-arch_get_unmapped_area(struct file *filp, unsigned long addr,
-   unsigned long len, unsigned long pgoff, unsigned long flags)
+generic_get_unmapped_area(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags)
 {
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev;
@@ -2169,17 +2169,25 @@ 

[PATCH v9 1/4] mm: Allow arch specific arch_randomize_brk() with CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT

2022-04-08 Thread Christophe Leroy
Commit e7142bf5d231 ("arm64, mm: make randomization selected by
generic topdown mmap layout") introduced a default version of
arch_randomize_brk() provided when
CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT is selected.

powerpc could select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
but needs to provide its own arch_randomize_brk().

In order to allow that, define generic version of arch_randomize_brk()
as a __weak symbol.

Cc: Alexandre Ghiti 
Signed-off-by: Christophe Leroy 
---
 mm/util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/util.c b/mm/util.c
index 54e5e761a9a9..0787fe4af4ac 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -344,7 +344,7 @@ unsigned long randomize_stack_top(unsigned long stack_top)
 }
 
 #ifdef CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
-unsigned long arch_randomize_brk(struct mm_struct *mm)
+unsigned long __weak arch_randomize_brk(struct mm_struct *mm)
 {
/* Is the current task 32bit ? */
if (!IS_ENABLED(CONFIG_64BIT) || is_compat_task())
-- 
2.35.1



[PATCH V3] testing/selftests/mqueue: Fix mq_perf_tests to free the allocated cpu set

2022-04-08 Thread Athira Rajeev
The selftest "mqueue/mq_perf_tests.c" use CPU_ALLOC to allocate
CPU set. This cpu set is used further in pthread_attr_setaffinity_np
and by pthread_create in the code. But in current code, allocated
cpu set is not freed.

Fix this issue by adding CPU_FREE in the "shutdown" function which
is called in most of the error/exit path for the cleanup. There are
few error paths which exit without using shutdown. Add a common goto
error path with CPU_FREE for these cases.

Fixes: 7820b0715b6f ("tools/selftests: add mq_perf_tests")
Signed-off-by: Athira Rajeev 
---
Changelog:
 From v2 -> v3:
  Addressed review comment from Shuah Khan to add
  common "goto" error path with CPU_FREE for few exit
  cases.
 From v1 -> v2:
  Addressed review comment from Shuah Khan to add
  CPU_FREE in other exit paths where it is needed

 .../testing/selftests/mqueue/mq_perf_tests.c  | 25 +--
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/mqueue/mq_perf_tests.c 
b/tools/testing/selftests/mqueue/mq_perf_tests.c
index b019e0b8221c..84fda3b49073 100644
--- a/tools/testing/selftests/mqueue/mq_perf_tests.c
+++ b/tools/testing/selftests/mqueue/mq_perf_tests.c
@@ -180,6 +180,9 @@ void shutdown(int exit_val, char *err_cause, int line_no)
if (in_shutdown++)
return;
 
+   /* Free the cpu_set allocated using CPU_ALLOC in main function */
+   CPU_FREE(cpu_set);
+
for (i = 0; i < num_cpus_to_pin; i++)
if (cpu_threads[i]) {
pthread_kill(cpu_threads[i], SIGUSR1);
@@ -551,6 +554,12 @@ int main(int argc, char *argv[])
perror("sysconf(_SC_NPROCESSORS_ONLN)");
exit(1);
}
+
+   if (getuid() != 0)
+   ksft_exit_skip("Not running as root, but almost all tests "
+   "require root in order to modify\nsystem settings.  "
+   "Exiting.\n");
+
cpus_online = min(MAX_CPUS, sysconf(_SC_NPROCESSORS_ONLN));
cpu_set = CPU_ALLOC(cpus_online);
if (cpu_set == NULL) {
@@ -589,7 +598,7 @@ int main(int argc, char *argv[])
cpu_set)) {
fprintf(stderr, "Any given CPU may "
"only be given once.\n");
-   exit(1);
+   goto err_code;
} else
CPU_SET_S(cpus_to_pin[cpu],
  cpu_set_size, cpu_set);
@@ -607,7 +616,7 @@ int main(int argc, char *argv[])
queue_path = malloc(strlen(option) + 2);
if (!queue_path) {
perror("malloc()");
-   exit(1);
+   goto err_code;
}
queue_path[0] = '/';
queue_path[1] = 0;
@@ -622,17 +631,12 @@ int main(int argc, char *argv[])
fprintf(stderr, "Must pass at least one CPU to continuous "
"mode.\n");
poptPrintUsage(popt_context, stderr, 0);
-   exit(1);
+   goto err_code;
} else if (!continuous_mode) {
num_cpus_to_pin = 1;
cpus_to_pin[0] = cpus_online - 1;
}
 
-   if (getuid() != 0)
-   ksft_exit_skip("Not running as root, but almost all tests "
-   "require root in order to modify\nsystem settings.  "
-   "Exiting.\n");
-
max_msgs = fopen(MAX_MSGS, "r+");
max_msgsize = fopen(MAX_MSGSIZE, "r+");
if (!max_msgs)
@@ -740,4 +744,9 @@ int main(int argc, char *argv[])
sleep(1);
}
shutdown(0, "", 0);
+
+err_code:
+   CPU_FREE(cpu_set);
+   exit(1);
+
 }
-- 
2.35.1



Re: rcu_sched self-detected stall on CPU

2022-04-08 Thread Michael Ellerman
"Paul E. McKenney"  writes:
> On Wed, Apr 06, 2022 at 05:31:10PM +0800, Zhouyi Zhou wrote:
>> Hi
>> 
>> I can reproduce it in a ppc virtual cloud server provided by Oregon
>> State University.  Following is what I do:
>> 1) curl -l 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/snapshot/linux-5.18-rc1.tar.gz
>> -o linux-5.18-rc1.tar.gz
>> 2) tar zxf linux-5.18-rc1.tar.gz
>> 3) cp config linux-5.18-rc1/.config
>> 4) cd linux-5.18-rc1
>> 5) make vmlinux -j 8
>> 6) qemu-system-ppc64 -kernel vmlinux -nographic -vga none -no-reboot
>> -smp 2 (QEMU 4.2.1)
>> 7) after 12 rounds, the bug got reproduced:
>> (http://154.223.142.244/logs/20220406/qemu.log.txt)
>
> Just to make sure, are you both seeing the same thing?  Last I knew,
> Zhouyi was chasing an RCU-tasks issue that appears only in kernels
> built with CONFIG_PROVE_RCU=y, which Miguel does not have set.  Or did
> I miss something?
>
> Miguel is instead seeing an RCU CPU stall warning where RCU's grace-period
> kthread slept for three milliseconds, but did not wake up for more than
> 20 seconds.  This kthread would normally have awakened on CPU 1, but
> CPU 1 looks to me to be very unhealthy, as can be seen in your console
> output below (but maybe my idea of what is healthy for powerpc systems
> is outdated).  Please see also the inline annotations.
>
> Thoughts from the PPC guys?

I haven't seen it in my testing. But using Miguel's config I can
reproduce it seemingly on every boot.

For me it bisects to:

  35de589cb879 ("powerpc/time: improve decrementer clockevent processing")

Which seems plausible.

Reverting that on mainline makes the bug go away.

I don't see an obvious bug in the diff, but I could be wrong, or the old
code was papering over an existing bug?

I'll try and work out what it is about Miguel's config that exposes
this vs our defconfig, that might give us a clue.

cheers