Re: [PATCH V6 17/18] tools/perf: Update data_type_cmp and sort__typeoff_sort function to include var_name in comparison

2024-07-12 Thread Athira Rajeev



> On 13 Jul 2024, at 2:55 AM, Namhyung Kim  wrote:
> 
> On Sun, Jul 07, 2024 at 08:14:18PM +0530, Athira Rajeev wrote:
>> Currently data_type_cmp() only compares size and type name.
>> But in cases where the type name of two data type entries
>> is same, but var_name is different, the comparison can't distinguish
>> two different types.
>> 
>> Consider there is a "long unsigned int" with var_name as "X" and there
>> is global variable "long unsigned int". Currently since
>> data_type_cmp uses only type_name for comparison ( "long unsigned int"),
>> it won't distinguish these as separate entries. Update the
> 
> I'm still not sure if it's ok.  It intentionally merges different
> instances of the same type together as it's a data 'type' profile.
> 
> 
>> functions "data_type_cmp" as well as "sort__typeoff_sort" to
>> compare variable names after type name if it exists.
>> 
>> Also updated "hist_entry__typeoff_snprintf" to print var_name if
>> it is set. With the changes,
>> 
>> 11.42%  long unsigned int  long unsigned int +0 (current_stack_pointer)
>> 4.68%  struct paca_struct  struct paca_struct +2312 (__current)
>> 4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)
>> 2.69%  struct paca_struct  struct paca_struct +2808 (canary)
>> 2.68%  struct paca_struct  struct paca_struct +8 (paca_index)
>> 2.24%  struct paca_struct  struct paca_struct +48 (data_offset)
>> 1.43%  long unsigned int  long unsigned int +0 (no field)
> 
> It seems like an output of `perf report -s type,typeoff`.  But I'm
> curious how it'd work with -s type only?  I guess it'd have two separate
> entries for 'long unsigned int'.  Ideally we can have a single entry
> with two different fields.
> 
> For example, `perf report -s type,typeoff -H`:
> 
>  12.85% long unsigned int
> 11.42% long unsigned int +0 (current_stack_pointer)
>  1.43% long unsigned int +0 (no field)
>  ...
> 

Hi Namhyung,

Thanks for the comments.

While printing, the check for field is done only in “sort__typeoff_sort”.
To summarise,
1. While adding the data type in rb node, if the entry has different field, new 
entry will be added. So we will have different entries in rb node if field 
differs.
2. While using sort key to display, for “typeoff”, we use sort__typeoff_sort . 
For “type”, we use “sort__type_sort”
3. In “sort__type_sort” type_name is used. Hence result will have only single 
entry
4. In “sort__typeoff_sort” we added check for “var_name” too in this patch. So 
result will have two entries if field differs

Example:

Using -H option,

./perf report -s type,typeoff -H

17.65% struct paca_struct
4.68% struct paca_struct +2312 (__current)
4.57% struct paca_struct +2354 (irq_soft_mask)
2.69% struct paca_struct +2808 (canary)
2.68% struct paca_struct +8 (paca_index)
2.24% struct paca_struct +48 (data_offset)
0.55% struct paca_struct +2816 (mmiowb_state.nesting_count)
0.18% struct paca_struct +2818 (mmiowb_state.mmiowb_pending)
0.03% struct paca_struct +2352 (hsrr_valid)
0.02% struct paca_struct +2356 (irq_work_pending)
0.00% struct paca_struct +0 (lppaca_ptr)
12.85% long unsigned int
   11.42% long unsigned int +0 (current_stack_pointer)
1.43% long unsigned int +0 (no field)

As you mentioned, we have single entry with two different fields:

12.85% long unsigned int
   11.42% long unsigned int +0 (current_stack_pointer)
1.43% long unsigned int +0 (no field)

With perf report -s type:

17.65%  struct paca_struct
12.85%  long unsigned int
 1.69%  struct task_struct
 1.51%  struct rq
 1.49%  struct pt_regs
 1.41%  struct vm_fault
 1.20%  struct inode
 1.15%  struct file
 1.08%  struct sd_lb_stats
 0.90%  struct security_hook_list
 0.86%  struct seq_file
 0.79%  struct cfs_rq
 0.78%  struct irq_desc
 0.72%  long long unsigned int


Where as with perf report -s typeoff:

11.42%  long unsigned int +0 (current_stack_pointer)
 4.68%  struct paca_struct +2312 (__current)
 4.57%  struct paca_struct +2354 (irq_soft_mask)
 2.69%  struct paca_struct +2808 (canary)
 2.68%  struct paca_struct +8 (paca_index)
 2.24%  struct paca_struct +48 (data_offset)
 1.43%  long unsigned int +0 (no field)


Namhyung,

If the above shared result for “type” and “typeoff” looks good and other 
changes are fine, I will post V7 with change for sort cmp to use cmp_null.
Please share your response.

Thanks
Athira

>> 
>> Signed-off-by: Athira Rajeev 
>> ---
>> tools/perf/util/annotate-data.c | 24 ++--
>> tools/perf/util/sort.c  | 23 +--
>> 2 files changed, 43 insertions(+), 4 deletions(-)
>> 
>> diff --git a/tools/perf/util/annotate-data.c 
>> b/tools/perf/util/annotate-data.c
>> index 8d05f3dbddf6..759c6a22e719 100644
>> ---

Re: [PATCH V6 18/18] tools/perf: Set instruction name to be used with insn-stat when using raw instruction

2024-07-12 Thread Athira Rajeev



> On 13 Jul 2024, at 2:57 AM, Namhyung Kim  wrote:
> 
> On Sun, Jul 07, 2024 at 08:14:19PM +0530, Athira Rajeev wrote:
>> Since the "ins.name" is not set while using raw instruction,
>> perf annotate with insn-stat gives wrong data:
>> 
>> Result from "./perf annotate --data-type --insn-stat":
>> 
>> Annotate Instruction stats
>> total 615, ok 419 (68.1%), bad 196 (31.9%)
>> 
>>  Name  :  Good   Bad
>> ---
>>:   419   196
>> 
>> Patch sets "dl->ins.name" in arch specific function "check_ppc_insn"
>> while initialising "struct disasm_line". Also update "ins_find" function
>> to pass "struct disasm_line" as a parameter so as to set its name field
>> in arch specific call.
>> 
>> With the patch changes:
>> 
>> Annotate Instruction stats
>> total 609, ok 446 (73.2%), bad 163 (26.8%)
>> 
>>  Name/opcode:  Good   Bad
>> ---
>>  58  :   32380
>>  32  :4943
>>  34  :3311
>>  OP_31_XOP_LDX   : 820
>>  40  :23 0
>>  OP_31_XOP_LWARX : 5 1
>>  OP_31_XOP_LWZX  : 2 3
>>  OP_31_XOP_LDARX : 3 0
>>  33  : 0 2
>>  OP_31_XOP_LBZX  : 0 1
>>  OP_31_XOP_LWAX  : 0 1
>>  OP_31_XOP_LHZX  : 0 1
>> 
>> Signed-off-by: Athira Rajeev 
>> ---
>> .../perf/arch/powerpc/annotate/instructions.c  | 18 +++---
>> tools/perf/builtin-annotate.c  |  4 ++--
>> tools/perf/util/annotate.c |  2 +-
>> tools/perf/util/disasm.c   | 10 +-
>> tools/perf/util/disasm.h   |  2 +-
>> 5 files changed, 24 insertions(+), 12 deletions(-)
>> 
>> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c 
>> b/tools/perf/arch/powerpc/annotate/instructions.c
>> index af1032572bf3..ede9eeade0ab 100644
>> --- a/tools/perf/arch/powerpc/annotate/instructions.c
>> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
>> @@ -189,8 +189,9 @@ static int cmp_offset(const void *a, const void *b)
>> return (val1->value - val2->value);
>> }
>> 
>> -static struct ins_ops *check_ppc_insn(u32 raw_insn)
>> +static struct ins_ops *check_ppc_insn(struct disasm_line *dl)
>> {
>> + int raw_insn = dl->raw.raw_insn;
>> int opcode = PPC_OP(raw_insn);
>> int mem_insn_31 = PPC_21_30(raw_insn);
>> struct insn_offset *ret;
>> @@ -198,19 +199,30 @@ static struct ins_ops *check_ppc_insn(u32 raw_insn)
>> "OP_31_INSN",
>> mem_insn_31
>> };
>> + char name_insn[32];
>> 
>> /*
>>  * Instructions with opcode 32 to 63 are memory
>>  * instructions in powerpc
>>  */
>> if ((opcode & 0x20)) {
>> + /*
>> +  * Set name in case of raw instruction to
>> +  * opcode to be used in insn-stat
>> +  */
>> + if (!strlen(dl->ins.name)) {
>> + sprintf(name_insn, "%d", opcode);
>> + dl->ins.name = strdup(name_insn);
>> + }
>> return &load_store_ops;
>> } else if (opcode == 31) {
>> /* Check for memory instructions with opcode 31 */
>> ret = bsearch(&mem_insns_31_opcode, ins_array, ARRAY_SIZE(ins_array), 
>> sizeof(ins_array[0]), cmp_offset);
>> - if (ret != NULL)
>> + if (ret) {
>> + if (!strlen(dl->ins.name))
>> + dl->ins.name = strdup(ret->name);
>> return &load_store_ops;
>> - else {
>> + } else {
>> mem_insns_31_opcode.value = PPC_22_30(raw_insn);
>> ret = bsearch(&mem_insns_31_opcode, arithmetic_ins_op_31, 
>> ARRAY_SIZE(arithmetic_ins_op_31),
>> sizeof(arithmetic_ins_op_31[0]), cmp_offset);
>> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
>> index b10b7f005658..68e929d4746e 100644
>> --- a/tools/perf/builtin-annotate.c
>> +++ b/tools/perf/builtin-annotate.c
>> @@ -396,10 +396,10 @@ static void print_annotate_item_stat(struct list_head 
>> *head, const char *title)
>> printf("total %d, ok %d (%.1f%%), bad %d (%.1f%%)\n\n", total,
>>total_good, 100.0 * total_good / (total ?: 1),
>>total_bad, 100.0 * total_bad / (total ?: 1));
>> - printf("  %-10s: %5s %5s\n", "Name", "Good", "Bad");
>> + printf("  %-10s: %5s %5s\n", "Name/opcode", "Good", "Bad");
> 
> It should be "%-20s".
> 
> Thanks,
> Namhyung

Sure, will make this change

Thanks
Athira
> 
> 
>> printf("---\n");
>> list_for_each_entry(istat, head, list)
>> - printf("  %-10s: %5d %5d\n", istat->name, istat->good, istat->bad);
>> + printf("  %-20s: %5d %5d\n", istat->name, istat->good, istat->bad);
>> printf("\n");
>> }
>> 
>> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
>> index 8db2f32700aa..e1f24dff8042 100644
>> --- a/tools/perf/util/annotate.c
>> +++ b/tools/perf/util/annotate.c
>> @@ -2229,7 +2229,7 @@ static struct annotated_item_stat 
>> *annotate_data_stat(struct list_head *head,
>> return NULL;
>> 
>> istat->name = strdup(name);
>> - if (istat->name == NULL) {
>> + if ((istat->name == NULL) || (!strlen(istat

Re: [PATCH V6 18/18] tools/perf: Set instruction name to be used with insn-stat when using raw instruction

2024-07-12 Thread Namhyung Kim
On Sun, Jul 07, 2024 at 08:14:19PM +0530, Athira Rajeev wrote:
> Since the "ins.name" is not set while using raw instruction,
> perf annotate with insn-stat gives wrong data:
> 
> Result from "./perf annotate --data-type --insn-stat":
> 
> Annotate Instruction stats
> total 615, ok 419 (68.1%), bad 196 (31.9%)
> 
>   Name  :  Good   Bad
> ---
> :   419   196
> 
> Patch sets "dl->ins.name" in arch specific function "check_ppc_insn"
> while initialising "struct disasm_line". Also update "ins_find" function
> to pass "struct disasm_line" as a parameter so as to set its name field
> in arch specific call.
> 
> With the patch changes:
> 
> Annotate Instruction stats
> total 609, ok 446 (73.2%), bad 163 (26.8%)
> 
>   Name/opcode:  Good   Bad
> ---
>   58  :   32380
>   32  :4943
>   34  :3311
>   OP_31_XOP_LDX   : 820
>   40  :23 0
>   OP_31_XOP_LWARX : 5 1
>   OP_31_XOP_LWZX  : 2 3
>   OP_31_XOP_LDARX : 3 0
>   33  : 0 2
>   OP_31_XOP_LBZX  : 0 1
>   OP_31_XOP_LWAX  : 0 1
>   OP_31_XOP_LHZX  : 0 1
> 
> Signed-off-by: Athira Rajeev 
> ---
>  .../perf/arch/powerpc/annotate/instructions.c  | 18 +++---
>  tools/perf/builtin-annotate.c  |  4 ++--
>  tools/perf/util/annotate.c |  2 +-
>  tools/perf/util/disasm.c   | 10 +-
>  tools/perf/util/disasm.h   |  2 +-
>  5 files changed, 24 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/perf/arch/powerpc/annotate/instructions.c 
> b/tools/perf/arch/powerpc/annotate/instructions.c
> index af1032572bf3..ede9eeade0ab 100644
> --- a/tools/perf/arch/powerpc/annotate/instructions.c
> +++ b/tools/perf/arch/powerpc/annotate/instructions.c
> @@ -189,8 +189,9 @@ static int cmp_offset(const void *a, const void *b)
>   return (val1->value - val2->value);
>  }
>  
> -static struct ins_ops *check_ppc_insn(u32 raw_insn)
> +static struct ins_ops *check_ppc_insn(struct disasm_line *dl)
>  {
> + int raw_insn = dl->raw.raw_insn;
>   int opcode = PPC_OP(raw_insn);
>   int mem_insn_31 = PPC_21_30(raw_insn);
>   struct insn_offset *ret;
> @@ -198,19 +199,30 @@ static struct ins_ops *check_ppc_insn(u32 raw_insn)
>   "OP_31_INSN",
>   mem_insn_31
>   };
> + char name_insn[32];
>  
>   /*
>* Instructions with opcode 32 to 63 are memory
>* instructions in powerpc
>*/
>   if ((opcode & 0x20)) {
> + /*
> +  * Set name in case of raw instruction to
> +  * opcode to be used in insn-stat
> +  */
> + if (!strlen(dl->ins.name)) {
> + sprintf(name_insn, "%d", opcode);
> + dl->ins.name = strdup(name_insn);
> + }
>   return &load_store_ops;
>   } else if (opcode == 31) {
>   /* Check for memory instructions with opcode 31 */
>   ret = bsearch(&mem_insns_31_opcode, ins_array, 
> ARRAY_SIZE(ins_array), sizeof(ins_array[0]), cmp_offset);
> - if (ret != NULL)
> + if (ret) {
> + if (!strlen(dl->ins.name))
> + dl->ins.name = strdup(ret->name);
>   return &load_store_ops;
> - else {
> + } else {
>   mem_insns_31_opcode.value = PPC_22_30(raw_insn);
>   ret = bsearch(&mem_insns_31_opcode, 
> arithmetic_ins_op_31, ARRAY_SIZE(arithmetic_ins_op_31),
>   sizeof(arithmetic_ins_op_31[0]), 
> cmp_offset);
> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> index b10b7f005658..68e929d4746e 100644
> --- a/tools/perf/builtin-annotate.c
> +++ b/tools/perf/builtin-annotate.c
> @@ -396,10 +396,10 @@ static void print_annotate_item_stat(struct list_head 
> *head, const char *title)
>   printf("total %d, ok %d (%.1f%%), bad %d (%.1f%%)\n\n", total,
>  total_good, 100.0 * total_good / (total ?: 1),
>  total_bad, 100.0 * total_bad / (total ?: 1));
> - printf("  %-10s: %5s %5s\n", "Name", "Good", "Bad");
> + printf("  %-10s: %5s %5s\n", "Name/opcode", "Good", "Bad");

It should be "%-20s".

Thanks,
Namhyung


>   printf("---\n");
>   list_for_each_entry(istat, head, list)
> - printf("  %-10s: %5d %5d\n", istat->name, istat->good, 
> istat->bad);
> + printf("  %-20s: %5d %5d\n", istat->name, istat->good, 
> istat->bad);
>   printf("\n");
>  }
>  
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 8db2f32700aa..e1f2

Re: [PATCH V6 17/18] tools/perf: Update data_type_cmp and sort__typeoff_sort function to include var_name in comparison

2024-07-12 Thread Namhyung Kim
On Sun, Jul 07, 2024 at 08:14:18PM +0530, Athira Rajeev wrote:
> Currently data_type_cmp() only compares size and type name.
> But in cases where the type name of two data type entries
> is same, but var_name is different, the comparison can't distinguish
> two different types.
> 
> Consider there is a "long unsigned int" with var_name as "X" and there
> is global variable "long unsigned int". Currently since
> data_type_cmp uses only type_name for comparison ( "long unsigned int"),
> it won't distinguish these as separate entries. Update the

I'm still not sure if it's ok.  It intentionally merges different
instances of the same type together as it's a data 'type' profile.


> functions "data_type_cmp" as well as "sort__typeoff_sort" to
> compare variable names after type name if it exists.
> 
> Also updated "hist_entry__typeoff_snprintf" to print var_name if
> it is set. With the changes,
> 
>  11.42%  long unsigned int  long unsigned int +0 (current_stack_pointer)
>  4.68%  struct paca_struct  struct paca_struct +2312 (__current)
>  4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)
>  2.69%  struct paca_struct  struct paca_struct +2808 (canary)
>  2.68%  struct paca_struct  struct paca_struct +8 (paca_index)
>  2.24%  struct paca_struct  struct paca_struct +48 (data_offset)
>  1.43%  long unsigned int  long unsigned int +0 (no field)

It seems like an output of `perf report -s type,typeoff`.  But I'm
curious how it'd work with -s type only?  I guess it'd have two separate
entries for 'long unsigned int'.  Ideally we can have a single entry
with two different fields.

For example, `perf report -s type,typeoff -H`:

  12.85% long unsigned int
 11.42% long unsigned int +0 (current_stack_pointer)
  1.43% long unsigned int +0 (no field)
  ...

> 
> Signed-off-by: Athira Rajeev 
> ---
>  tools/perf/util/annotate-data.c | 24 ++--
>  tools/perf/util/sort.c  | 23 +--
>  2 files changed, 43 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
> index 8d05f3dbddf6..759c6a22e719 100644
> --- a/tools/perf/util/annotate-data.c
> +++ b/tools/perf/util/annotate-data.c
> @@ -167,7 +167,7 @@ static void exit_type_state(struct type_state *state)
>  }
>  
>  /*
> - * Compare type name and size to maintain them in a tree.
> + * Compare type name, var_name  and size to maintain them in a tree.
>   * I'm not sure if DWARF would have information of a single type in many
>   * different places (compilation units).  If not, it could compare the
>   * offset of the type entry in the .debug_info section.
> @@ -176,12 +176,32 @@ static int data_type_cmp(const void *_key, const struct 
> rb_node *node)
>  {
>   const struct annotated_data_type *key = _key;
>   struct annotated_data_type *type;
> + int64_t ret = 0;
>  
>   type = rb_entry(node, struct annotated_data_type, node);
>  
>   if (key->self.size != type->self.size)
>   return key->self.size - type->self.size;
> - return strcmp(key->self.type_name, type->self.type_name);
> +
> + ret = strcmp(key->self.type_name, type->self.type_name);
> + if (ret) {
> + return ret;
> + }

No need for the parentheses.

> +
> + /*
> +  * Compare var_name if it exists for key and type.
> +  * If both nodes doesn't have var_name, but one of
> +  * them has, return non-zero. This is to indicate nodes
> +  * are not the same if one has var_name, but other doesn't.
> +  */
> + if (key->self.var_name && type->self.var_name) {
> + ret = strcmp(key->self.var_name, type->self.var_name);
> + if (ret)
> + return ret;
> + } else if (key->self.var_name || type->self.var_name)
> + return 1;

I think you need to compare the order properly like in cmp_null() in
util/sort.c.  Please see below.

> +
> + return ret;
>  }
>  
>  static bool data_type_less(struct rb_node *node_a, const struct rb_node 
> *node_b)
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index cd39ea972193..c6d885060ee7 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -2267,9 +2267,25 @@ sort__typeoff_sort(struct hist_entry *left, struct 
> hist_entry *right)
>   right_type = right->mem_type;
>   }
>  
> + /*
> +  * Compare type_name first. Next, ompare var_name if it exists
> +  * for left and right hist_entry. If both entries doesn't have
> +  * var_name, but one of them has, return non-zero. This is to
> +  * indicate entries are not the same if one has var_name, but the
> +  * other doesn't.
> +  * If type_name and var_name is same, use mem_type_off field.
> +  */
>   ret = strcmp(left_type->self.type_name, right_type->self.type_name);
>   if (ret)
>   return ret;
> +
> + if (left_type->self.var_name && ri

Re: [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest

2024-07-12 Thread Amit Machhiwal
Hi Lizhi,

On 2024/07/11 02:18 PM, Lizhi Hou wrote:
> 
> On 7/11/24 11:48, Amit Machhiwal wrote:
> > On 2024/07/10 09:48 PM, Lizhi Hou wrote:
> > > On 7/5/24 12:20, Bjorn Helgaas wrote:
> > > > [+cc Lukas, FYI]
> > > > 
> > > > On Wed, Jul 03, 2024 at 07:46:34PM +0530, Amit Machhiwal wrote:
> > > > > With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug 
> > > > > sequence
> > > > > of a PCI device attached to a PCI-bridge causes following kernel Oops 
> > > > > on
> > > > > a pseries KVM guest:
> > > > > 
> > > > >RTAS: event: 2, Type: Hotplug Event (229), Severity: 1
> > > > >Kernel attempted to read user page (10ec0048) - exploit 
> > > > > attempt? (uid: 0)
> > > > >BUG: Unable to handle kernel data access on read at 0x10ec0048
> > > > >Faulting instruction address: 0xc12d8728
> > > > >Oops: Kernel access of bad area, sig: 11 [#1]
> > > > >LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > > > > 
> > > > >NIP [c12d8728] __of_changeset_entry_invert+0x10/0x1ac
> > > > >LR [c12da7f0] __of_changeset_revert_entries+0x98/0x180
> > > > >Call Trace:
> > > > >[cbcc3970] [c12daa60] of_changeset_revert+0x58/0xd8
> > > > >[cbcc39c0] [c0d0ed78] of_pci_remove_node+0x74/0xb0
> > > > >[cbcc39f0] [c0cdcfe0] 
> > > > > pci_stop_bus_device+0xf4/0x138
> > > > >[cbcc3a30] [c0cdd140] 
> > > > > pci_stop_and_remove_bus_device_locked+0x34/0x64
> > > > >[cbcc3a60] [c0cf3780] remove_store+0xf0/0x108
> > > > >[cbcc3ab0] [c0e89e04] dev_attr_store+0x34/0x78
> > > > >[cbcc3ad0] [c07f8dd4] sysfs_kf_write+0x70/0xa4
> > > > >[cbcc3af0] [c07f7248] 
> > > > > kernfs_fop_write_iter+0x1d0/0x2e0
> > > > >[cbcc3b40] [c06c9b08] vfs_write+0x27c/0x558
> > > > >[cbcc3bf0] [c06ca168] ksys_write+0x90/0x170
> > > > >[cbcc3c40] [c0033248] 
> > > > > system_call_exception+0xf8/0x290
> > > > >[cbcc3e50] [c000d05c] 
> > > > > system_call_vectored_common+0x15c/0x2ec
> > > > > 
> > > > > 
> > > > > A git bisect pointed this regression to be introduced via [1] that 
> > > > > added
> > > > > a mechanism to create device tree nodes for parent PCI bridges when a
> > > > > PCI device is hot-plugged.
> > > > > 
> > > > > The Oops is caused when `pci_stop_dev()` tries to remove a 
> > > > > non-existing
> > > > > device-tree node associated with the pci_dev that was earlier
> > > > > hot-plugged and was attached under a pci-bridge. The PCI dev header
> > > > > `dev->hdr_type` being 0, results a conditional check done with
> > > > > `pci_is_bridge()` into false. Consequently, a call to
> > > > > `of_pci_make_dev_node()` to create a device node is never made. When 
> > > > > at
> > > > > a later point in time, in the device node removal path, a memcpy is
> > > > > attempted in `__of_changeset_entry_invert()`; since the device node 
> > > > > was
> > > > > never created, results in an Oops due to kernel read access to a bad
> > > > > address.
> > > > > 
> > > > > To fix this issue the patch updates `pci_stop_dev()` to ensure that a
> > > > > call to `of_pci_remove_node()` is only made for pci-bridge devices.
> > > > > 
> > > > > [1] commit 407d1a51921e ("PCI: Create device tree node for bridge")
> > > > > 
> > > > > Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
> > > > > Reported-by: Kowshik Jois B S 
> > > > > Tested-by: Kowshik Jois B S 
> > > > > Signed-off-by: Amit Machhiwal 
> > > > Thanks for the patch and testing!  Would like a reviewed-by from
> > > > Lizhi.
> > > of_pci_make_dev_node() will create of nodes for some endpoint devices 
> > > (not a
> > > bridge) as well. And actually this is the main purpose.
> > > 
> > > Maybe the patch as below would resolve the Oops?
> > Thanks for the patch, Lizhi! I tried out this patch and don't see the issue 
> > with
> > the same. The hot-plug and hot-unplug of PCI device seem to work fine as
> > expected.
> 
> Cool! Thanks for trying it. Will you re-spin the patch or you would like me
> to create a patch?

Sure, I'll re-spin and send the V2 of the patch.

~ Amit

> 
> Lizhi
> 
> > 
> > ~ Amit
> > 
> > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > > index dda6092e6d3a..3c693b091ecf 100644
> > > --- a/drivers/of/dynamic.c
> > > +++ b/drivers/of/dynamic.c
> > > @@ -492,21 +492,29 @@ struct device_node *__of_node_dup(const struct
> > > device_node *np,
> > >    * a given changeset.
> > >    *
> > >    * @ocs: Pointer to changeset
> > > + * @np: Pointer to device node. If it is not null, init it directly 
> > > instead
> > > of
> > > + *  allocate a new node.
> > >    * @parent: Pointer to parent device node
> > >    * @full_name: Node full name
> > >    *
> > >    * Return: Pointer to the created device node or NULL in case of an 
> > > error.
> > >    */
> 

Re: [PATCH V6 00/18] Add data type profiling support for powerpc

2024-07-12 Thread kajoljain



On 7/12/24 09:14, Athira Rajeev wrote:
> 
> 
>> On 7 Jul 2024, at 8:14 PM, Athira Rajeev  wrote:
>>
>> The patchset from Namhyung added support for data type profiling
>> in perf tool. This enabled support to associate PMU samples to data
>> types they refer using DWARF debug information. With the upstream
>> perf, currently it possible to run perf report or perf annotate to
>> view the data type information on x86.
>>
>> Initial patchset posted here had changes need to enable data type
>> profiling support for powerpc.
>>
> 
> Hi Namhyung,
> 
> Requesting for review on this V6 patchset. I have addressed review comments 
> from V5. 
> If the patchset looks good, can you please pull this in.
> 
> Thanks
> Athira

Patches looks fine to me. Also tested in powerpc box.

Reviewed-by : Kajol Jain 
Tested-by : Kajol Jain 

Thanks,
Kajol Jain

>> https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9...@csgroup.eu/T/
>>
>> Main change were:
>> 1. powerpc instruction nmemonic table to associate load/store
>> instructions with move_ops which is use to identify if instruction
>> is a memory access one.
>> 2. To get register number and access offset from the given
>> instruction, code uses fields from "struct arch" -> objump.
>> Added entry for powerpc here.
>> 3. A get_arch_regnum to return register number from the
>> register name string.
>>
>> But the apporach used in the initial patchset used parsing of
>> disassembled code which the current perf tool implementation does.
>>
>> Example: lwz r10,0(r9)
>>
>> This line "lwz r10,0(r9)" is parsed to extract instruction name,
>> registers names and offset. Also to find whether there is a memory
>> reference in the operands, "memory_ref_char" field of objdump is used.
>> For x86, "(" is used as memory_ref_char to tackle instructions of the
>> form "mov  (%rax), %rcx".
>>
>> In case of powerpc, not all instructions using "(" are the only memory
>> instructions. Example, above instruction can also be of extended form (X
>> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
>> and extract the source/target registers, second patchset added support to use
>> raw instruction. With raw instruction, macros are added to extract opcode
>> and register fields.
>> Link to second patchset:
>> https://lore.kernel.org/all/20240506121906.76639-1-atraj...@linux.vnet.ibm.com/
>>
>> Example representation using --show-raw-insn in objdump gives result:
>>
>> 38 01 81 e8 ld  r4,312(r1)
>>
>> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
>> this translates to instruction form: "ld RT,DS(RA)" and binary code
>> as:
>>  _
>>  | 58 |  RT  |  RA |  DS   | |
>>  -
>> 06 1116  30 31
>>
>> Second patchset used "objdump" again to read the raw instruction.
>> But since there is no need to disassemble and binary code can be read
>> directly from the DSO, third patchset (ie this patchset) uses below
>> apporach. The apporach preferred in powerpc to parse sample for data
>> type profiling in V3 patchset is:
>> - Read directly from DSO using dso__data_read_offset
>> - If that fails for any case, fallback to using libcapstone
>> - If libcapstone is not supported, approach will use objdump
>>
>> Patchset adds support to pick the opcode and reg fields from this
>> raw/binary instruction code. This approach came in from review comment
>> by Segher Boessenkool and Christophe for the initial patchset.
>>
>> Apart from that, instruction tracking is enabled for powerpc and
>> support function is added to find variables defined as registers
>> Example, in powerpc, below two registers are
>> defined to represent variable:
>> 1. r13: represents local_paca
>> register struct paca_struct *local_paca asm("r13");
>>
>> 2. r1: represents stack_pointer
>> register void *__stack_pointer asm("r1");
>>
>> These are handled in this patchset.
>>
>> - Patch 1 is to rearrange register state type structures to header file
>> so that it can referred from other arch specific files
>> - Patch 2 is to make instruction tracking as a callback to"struct arch"
>> so that it can be implemented by other archs easily and defined in arch
>> specific files
>> - Patch 3 is to handle state type regs array size for x86 and powerpc
>> - Patch 4 adds support to capture and parse raw instruction in powerpc
>> using dso__data_read_offset utility
>> - Patch 4 also adds logic to support using objdump when doing default "perf
>> report" or "perf annotate" since it that needs disassembled instruction.
>> - Patch 5 adds disasm_line__parse to parse raw instruction for powerpc
>> - Patch 6 update parameters for reg extract functions to use raw
>> instruction on powerpc
>> - Patch 7 updates ins__find to carry raw_insn and also adds parse
>> callback for memory instructions for powerpc
>> - Patch 8 add support to identify memory instructions of opcode 31 in
>> powerpc
>> - Patch 9 adds mo

Re: [PATCH 11/13] huge_memory: Remove dead vmf_insert_pXd code

2024-07-12 Thread Peter Xu
On Fri, Jul 12, 2024 at 12:40:39PM +1000, Alistair Popple wrote:
> 
> Peter Xu  writes:
> 
> > On Tue, Jul 09, 2024 at 02:07:31PM +1000, Alistair Popple wrote:
> >> 
> >> Peter Xu  writes:
> >> 
> >> > Hi, Alistair,
> >> >
> >> > On Thu, Jun 27, 2024 at 10:54:26AM +1000, Alistair Popple wrote:
> >> >> Now that DAX is managing page reference counts the same as normal
> >> >> pages there are no callers for vmf_insert_pXd functions so remove
> >> >> them.
> >> >> 
> >> >> Signed-off-by: Alistair Popple 
> >> >> ---
> >> >>  include/linux/huge_mm.h |   2 +-
> >> >>  mm/huge_memory.c| 165 
> >> >> +-
> >> >>  2 files changed, 167 deletions(-)
> >> >> 
> >> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >> >> index 9207d8e..0fb6bff 100644
> >> >> --- a/include/linux/huge_mm.h
> >> >> +++ b/include/linux/huge_mm.h
> >> >> @@ -37,8 +37,6 @@ int change_huge_pmd(struct mmu_gather *tlb, struct 
> >> >> vm_area_struct *vma,
> >> >> pmd_t *pmd, unsigned long addr, pgprot_t newprot,
> >> >> unsigned long cp_flags);
> >> >>  
> >> >> -vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool 
> >> >> write);
> >> >> -vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool 
> >> >> write);
> >> >>  vm_fault_t dax_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool 
> >> >> write);
> >> >>  vm_fault_t dax_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool 
> >> >> write);
> >> >
> >> > There's a plan to support huge pfnmaps in VFIO, which may still make good
> >> > use of these functions.  I think it's fine to remove them but it may mean
> >> > we'll need to add them back when supporting pfnmaps with no memmap.
> >> 
> >> I'm ok with that. If we need them back in future it shouldn't be too
> >> hard to add them back again. I just couldn't find any callers of them
> >> once DAX stopped using them and the usual policy is to remove unused
> >> functions.
> >
> > True.  Currently the pmd/pud helpers are only used in dax.
> >
> >> 
> >> > Is it still possible to make the old API generic to both service the new
> >> > dax refcount plan, but at the meantime working for pfn injections when
> >> > there's no page struct?
> >> 
> >> I don't think so - this new dax refcount plan relies on having a struct
> >> page to take references on so I don't think it makes much sense to
> >> combine it with something that doesn't have a struct page. It sounds
> >> like the situation is the analogue of vm_insert_page()
> >> vs. vmf_insert_pfn() - it's possible for both to exist but there's not
> >> really anything that can be shared between the two APIs as one has a
> >> page and the other is just a raw PFN.
> >
> > I still think most of the codes should be shared on e.g. most of sanity
> > checks, pgtable injections, pgtable deposits (for pmd) and so on.
> 
> Yeah, it was mostly the BUG_ON's that weren't applicable once pXd_devmap
> went away.
> 
> > To be explicit, I wonder whether something like below diff would be
> > applicable on top of the patch "huge_memory: Allow mappings of PMD sized
> > pages" in this series, which introduced dax_insert_pfn_pmd() for dax:
> >
> > $ diff origin new
> > 1c1
> > < vm_fault_t dax_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write)
> > ---
> >> vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write)
> > 55,58c55,60
> > <   folio = page_folio(page);
> > <   folio_get(folio);
> > <   folio_add_file_rmap_pmd(folio, page, vma);
> > <   add_mm_counter(mm, mm_counter_file(folio), HPAGE_PMD_NR);
> > ---
> >> if (page) {
> >> folio = page_folio(page);
> >> folio_get(folio);
> >> folio_add_file_rmap_pmd(folio, page, vma);
> >> add_mm_counter(mm, mm_counter_file(folio), HPAGE_PMD_NR);
> >> }
> 
> We get the page from calling pfn_t_to_page(pfn). This is safe for the
> DAX case but is it safe to use a page returned by this more generally?

Good question.  I thought it should work when the caller doesn't set any
bit in PFN_FLAGS_MASK, but it turns out it's not the case?  As I just
notice:

static inline bool pfn_t_has_page(pfn_t pfn)
{
return (pfn.val & PFN_MAP) == PFN_MAP || (pfn.val & PFN_DEV) == 0;
}

So it looks like "no PFN_FLAGS" case should also fall into this category of
"(pfn.val & PFN_DEV) == 0"..

I'm not sure whether my understanding is correct, though.  Maybe we'd want
to double check with pfn_valid() when it's a generic function.

> 
> From an API perspective it would make more sense for the DAX code to
> pass the page rather than the pfn. I didn't do that because device DAX
> just had the PFN and this was DAX-specific code. But if we want to make
> it generic I'd rather have callers pass the page in.
> 
> Of course that probably doesn't help you, because then the call would be
> vmf_insert_page_pmd() rather than a raw pfn, but as you point out t

Re: [PATCH v2] printk: Add a short description string to kmsg_dump()

2024-07-12 Thread Jocelyn Falempe




On 12/07/2024 15:34, Kees Cook wrote:



On July 12, 2024 2:59:30 AM PDT, Jocelyn Falempe  wrote:

Gentle ping, I need reviews from powerpc, usermod linux, mtd, pstore and 
hyperv, to be able to push it in the drm-misc tree.


Oops, I thought I'd Acked already!

Acked-by: Kees Cook 

And, yeah, as mpe said, you're all good to take this via drm-misc.


Thanks a lot. If there is no objection I will push it to drm-misc mid 
next week. I may have all required acks by then.


Thanks!

-Kees




Best regards,

--

Jocelyn



Re: [PATCH net-next 5/5] soc: fsl: qbman: FSL_DPAA depends on COMPILE_TEST

2024-07-12 Thread Breno Leitao
On Fri, Jul 12, 2024 at 03:14:00PM +0300, Vladimir Oltean wrote:
> On Thu, Jul 11, 2024 at 02:00:25AM +0300, Vladimir Oltean wrote:
> > From: Breno Leitao 
> > 
> > As most of the drivers that depend on ARCH_LAYERSCAPE, make FSL_DPAA
> > depend on COMPILE_TEST for compilation and testing.
> > 
> > # grep -r depends.\*ARCH_LAYERSCAPE.\*COMPILE_TEST | wc -l
> > 29
> > 
> > Signed-off-by: Breno Leitao 
> > Signed-off-by: Vladimir Oltean 
> > ---
> 
> I don't know why nipa says there are 800+ new warnings/errors introduced
> by this patch, but it looks like a false positive (or I can't seem to
> find something relevant).

Right, All of the warnings are basically a set of MODULES_DESCRIPTION()
that are missing in other parts of the kernel, and not related to this
patch series. None of them seems to be in the network stack. (for
context, I've fixed all the missing MODULES_DESCRIPTION in the past).

nipa also complained about an unused variable, and we have a patch for
it already:

https://lore.kernel.org/all/20240712134817.913756-1-lei...@debian.org/


Re: [PATCH v2] printk: Add a short description string to kmsg_dump()

2024-07-12 Thread Kees Cook



On July 12, 2024 2:59:30 AM PDT, Jocelyn Falempe  wrote:
>Gentle ping, I need reviews from powerpc, usermod linux, mtd, pstore and 
>hyperv, to be able to push it in the drm-misc tree.

Oops, I thought I'd Acked already!

Acked-by: Kees Cook 

And, yeah, as mpe said, you're all good to take this via drm-misc.

Thanks!

-Kees


-- 
Kees Cook


Re: [PATCH] macintosh/therm_windtunnel: fix module unload.

2024-07-12 Thread Michael Ellerman
On Wed, 10 Jul 2024 23:54:17 -0400, Nick Bowler wrote:
> The of_device_unregister call in therm_windtunnel's module_exit procedure
> does not fully reverse the effects of of_platform_device_create in the
> module_init prodedure.  Once you unload this module, it is impossible
> to load it ever again since only the first of_platform_device_create
> call on the fan node succeeds.
> 
> This driver predates first git commit, and it turns out back then
> of_platform_device_create worked differently than it does today.
> So this is actually an old regression.
> 
> [...]

Applied to powerpc/next.

[1/1] macintosh/therm_windtunnel: fix module unload.
  https://git.kernel.org/powerpc/c/fd748e177194ebcbbaf98df75152a30e08230cc6

cheers


Re: [PATCH 1/3] powerpc: Drop clang workaround for builtin constant checks

2024-07-12 Thread Michael Ellerman
On Thu, 09 May 2024 22:12:46 +1000, Michael Ellerman wrote:
> The CPU/MMU feature code has build-time checks that the feature value is
> a builtin constant.
> 
> Back when the code was added clang wasn't able to compile the
> checks, so an ifdef was added to avoid the checks for clang builds.
> See b5fa0f7f88ed ("powerpc: Fix build failure with clang due to
> BUILD_BUG_ON()")
> 
> [...]

Applied to powerpc/next.

[1/3] powerpc: Drop clang workaround for builtin constant checks
  https://git.kernel.org/powerpc/c/489116d784bebec7e441f400715fbfe6edbce66c
[2/3] powerpc/xmon: Fix disassembly CPU feature checks
  https://git.kernel.org/powerpc/c/14196e47c5ffe32af7ed5a51c9e421c5ea5bccce
[3/3] powerpc: Check only single values are passed to CPU/MMU feature checks
  https://git.kernel.org/powerpc/c/db25a9625dbc3aa6613c0347f574689c248a3d0b

cheers


Re: [PATCH] powerpc: Remove 40x leftovers

2024-07-12 Thread Michael Ellerman
On Thu, 11 Jul 2024 12:49:01 +0200, Christophe Leroy wrote:
> 


Applied to powerpc/next.

[1/1] powerpc: Remove 40x leftovers
  https://git.kernel.org/powerpc/c/3efe19a9b1549d4a35ec5790ad49f0a0234c

cheers


Re: [PATCH] Documentation/powerpc: Mention 40x is removed

2024-07-12 Thread Michael Ellerman
On Thu, 11 Jul 2024 12:50:21 +0200, Christophe Leroy wrote:
> Commit 732b32daef80 ("powerpc: Remove core support for 40x") removed 40x.
> 
> Update documentation accordingly.
> 
> 

Applied to powerpc/next.

[1/1] Documentation/powerpc: Mention 40x is removed
  https://git.kernel.org/powerpc/c/90e812ac40c4b813fdbafab22f426fe4cdf840a8

cheers


Re: [PATCH 0/5] powerpc64/bpf: jit support for cpuv4 instructions

2024-07-12 Thread Michael Ellerman
On Fri, 17 May 2024 09:56:45 +0200, Artem Savkov wrote:
> Add support for recently added cpuv4 instructions fixing test_bpf module
> failures. This is mostly based on 8ecf3c1dab1c6 (powerpc/bpf/32: Fix
> failing test_bpf tests, 2024-03-05)
> 
> Artem Savkov (5):
>   powerpc64/bpf: jit support for 32bit offset jmp instruction
>   powerpc64/bpf: jit support for unconditional byte swap
>   powerpc64/bpf: jit support for sign extended load
>   powerpc64/bpf: jit support for sign extended mov
>   powerpc64/bpf: jit support for signed division and modulo
> 
> [...]

Applied to powerpc/next.

[1/5] powerpc64/bpf: jit support for 32bit offset jmp instruction
  https://git.kernel.org/powerpc/c/3c086ce222cefcf16d412faa10d456161d076796
[2/5] powerpc64/bpf: jit support for unconditional byte swap
  https://git.kernel.org/powerpc/c/a71c0b09a14db72d59c48a8cda7a73032f4d418b
[3/5] powerpc64/bpf: jit support for sign extended load
  https://git.kernel.org/powerpc/c/717756c9c8ddad9f28389185bfb161d4d88e01a4
[4/5] powerpc64/bpf: jit support for sign extended mov
  https://git.kernel.org/powerpc/c/597b1710982d10b8629697e4a548b30d0d93eeed
[5/5] powerpc64/bpf: jit support for signed division and modulo
  https://git.kernel.org/powerpc/c/fde318326daa48a4bb3ca8ee229bac4d14b5bc2a

cheers


Re: [PATCH v2 07/10] soc: fsl: cpm1: qmc: Introduce functions to get a channel from a phandle list

2024-07-12 Thread Michael Ellerman
LEROY Christophe  writes:
> Le 04/07/2024 à 05:01, Michael Ellerman a écrit :
>> Mark Brown  writes:
>>> On Mon, Jul 01, 2024 at 01:30:34PM +0200, Herve Codina wrote:
 qmc_chan_get_byphandle() and the resource managed version retrieve a
 channel from a simple phandle.

 Extend the API and introduce qmc_chan_get_byphandles_index() and the
 resource managed version in order to retrieve a channel from a phandle
 list using the provided index to identify the phandle in the list.
>>>
>>> These two PowerPC patches seem trivial enough and have got no response,
>>> unless someone objects I'll go ahead and apply them.
>> 
>> Ack.
>> 
>> MAINTAINERS says:
>> 
>> FREESCALE QUICC ENGINE LIBRARY
>> M:  Qiang Zhao 
>> L:  linuxppc-dev@lists.ozlabs.org
>> S:  Maintained
>> F:  drivers/soc/fsl/qe/
>> F:  include/soc/fsl/qe/
>> 
>> But I see no email from that address since January 2021:
>> 
>>https://lore.kernel.org/all/?q=f%3Aqiang.zhao%40nxp.com
>> 
>> And actually drivers/soc/fsl was marked orphan in April, maybe this
>> should be also.
>> 
>> Or does Herve want to take over maintaining it?
>
> We had some discussion about that in April, see 
> https://lore.kernel.org/linuxppc-dev/20240219153016.ntltc76bphwrv6hn@skbuf/T/#mf6d4a5eef79e8eae7ae0456a2794c01e630a6756
>
> Hervé has some of our hardware for a limited period of time because he 
> is doing some implementation for us, but he won't keep that hardware on 
> the long run.
>
> I will send a patch to take over maintaining drivers/soc/fsl/

Thanks.

cheers


Re: [PATCH v2] printk: Add a short description string to kmsg_dump()

2024-07-12 Thread Michael Ellerman
Jocelyn Falempe  writes:
> kmsg_dump doesn't forward the panic reason string to the kmsg_dumper
> callback.
> This patch adds a new struct kmsg_dump_detail, that will hold the
> reason and description, and pass it to the dump() callback.
>
> To avoid updating all kmsg_dump() call, it adds a kmsg_dump_desc()
> function and a macro for backward compatibility.
>
> I've written this for drm_panic, but it can be useful for other
> kmsg_dumper.
> It allows to see the panic reason, like "sysrq triggered crash"
> or "VFS: Unable to mount root fs on " on the drm panic screen.
>
> v2:
>  * Use a struct kmsg_dump_detail to hold the reason and description
>pointer, for more flexibility if we want to add other parameters.
>(Kees Cook)
>  * Fix powerpc/nvram_64 build, as I didn't update the forward
>declaration of oops_to_nvram()
>
> Signed-off-by: Jocelyn Falempe 
> ---
>  arch/powerpc/kernel/nvram_64.c |  8 
>  arch/powerpc/platforms/powernv/opal-kmsg.c |  4 ++--

Acked-by: Michael Ellerman  (powerpc)

cheers


Re: [PATCH v2] printk: Add a short description string to kmsg_dump()

2024-07-12 Thread Michael Ellerman
Jocelyn Falempe  writes:
> On 02/07/2024 14:26, Jocelyn Falempe wrote:
>> kmsg_dump doesn't forward the panic reason string to the kmsg_dumper
>> callback.
>> This patch adds a new struct kmsg_dump_detail, that will hold the
>> reason and description, and pass it to the dump() callback.
>> 
>> To avoid updating all kmsg_dump() call, it adds a kmsg_dump_desc()
>> function and a macro for backward compatibility.
>> 
>> I've written this for drm_panic, but it can be useful for other
>> kmsg_dumper.
>> It allows to see the panic reason, like "sysrq triggered crash"
>> or "VFS: Unable to mount root fs on " on the drm panic screen.
>> 
>> v2:
>>   * Use a struct kmsg_dump_detail to hold the reason and description
>> pointer, for more flexibility if we want to add other parameters.
>> (Kees Cook)
>>   * Fix powerpc/nvram_64 build, as I didn't update the forward
>> declaration of oops_to_nvram()
>> 
>> Signed-off-by: Jocelyn Falempe 
>> ---
>>   arch/powerpc/kernel/nvram_64.c |  8 
>>   arch/powerpc/platforms/powernv/opal-kmsg.c |  4 ++--
>>   arch/um/kernel/kmsg_dump.c |  2 +-
>>   drivers/gpu/drm/drm_panic.c|  4 ++--
>>   drivers/hv/hv_common.c |  4 ++--
>>   drivers/mtd/mtdoops.c  |  2 +-
>>   fs/pstore/platform.c   | 10 +-
>>   include/linux/kmsg_dump.h  | 22 +++---
>>   kernel/panic.c |  2 +-
>>   kernel/printk/printk.c | 11 ---
>>   10 files changed, 45 insertions(+), 24 deletions(-)
>
...
> Gentle ping, I need reviews from powerpc, usermod linux, mtd, pstore and 
> hyperv, to be able to push it in the drm-misc tree.

For a simple mechanical change like that you don't need reviews from
every subsystem. As long as it's posted to each subsystem and there's
been a bit of time for folks to see it, and the build robots to build
it, that should be sufficient. Otherwise you could be waiting forever.

cheers


Re: [PATCH net-next 5/5] soc: fsl: qbman: FSL_DPAA depends on COMPILE_TEST

2024-07-12 Thread Vladimir Oltean
On Thu, Jul 11, 2024 at 02:00:25AM +0300, Vladimir Oltean wrote:
> From: Breno Leitao 
> 
> As most of the drivers that depend on ARCH_LAYERSCAPE, make FSL_DPAA
> depend on COMPILE_TEST for compilation and testing.
> 
>   # grep -r depends.\*ARCH_LAYERSCAPE.\*COMPILE_TEST | wc -l
>   29
> 
> Signed-off-by: Breno Leitao 
> Signed-off-by: Vladimir Oltean 
> ---

I don't know why nipa says there are 800+ new warnings/errors introduced
by this patch, but it looks like a false positive (or I can't seem to
find something relevant).


[PATCH] MAINTAINERS: Update FREESCALE SOC DRIVERS and QUICC ENGINE LIBRARY

2024-07-12 Thread Christophe Leroy
FREESCALE SOC DRIVERS has been orphaned since
commit eaac25d026a1 ("MAINTAINERS: Drop Li Yang as their email address
stopped working")
QUICC ENGINE LIBRARY has Qiang Zhao as maintainer but he hasn't
responded for years and when Li Yang was still maintaining FREESCALE
SOC DRIVERS he was also handling QUICC ENGINE LIBRARY directly.

As a maintainer of LINUX FOR POWERPC EMBEDDED PPC8XX AND PPC83XX, I
also need FREESCALE SOC DRIVERS to be actively maintained, so add
myself as maintainer of FREESCALE SOC DRIVERS and QUICC ENGINE LIBRARY.

See below link for more context.

Link: 
https://lore.kernel.org/linuxppc-dev/20240219153016.ntltc76bphwrv6hn@skbuf/T/#mf6d4a5eef79e8eae7ae0456a2794c01e630a6756
Signed-off-by: Christophe Leroy 
---
 MAINTAINERS | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2e68a68ae810..efb0cf61226e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8832,6 +8832,7 @@ F:drivers/spi/spi-fsl-qspi.c
 
 FREESCALE QUICC ENGINE LIBRARY
 M: Qiang Zhao 
+M: Christophe Leroy 
 L: linuxppc-dev@lists.ozlabs.org
 S: Maintained
 F: drivers/soc/fsl/qe/
@@ -8881,9 +8882,10 @@ S:   Maintained
 F: drivers/tty/serial/ucc_uart.c
 
 FREESCALE SOC DRIVERS
+M: Christophe Leroy 
 L: linuxppc-dev@lists.ozlabs.org
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
-S: Orphan
+S: Maintained
 F: Documentation/devicetree/bindings/misc/fsl,dpaa2-console.yaml
 F: Documentation/devicetree/bindings/soc/fsl/
 F: drivers/soc/fsl/
-- 
2.44.0



Re: [PATCH v2] printk: Add a short description string to kmsg_dump()

2024-07-12 Thread Jocelyn Falempe

On 02/07/2024 14:26, Jocelyn Falempe wrote:

kmsg_dump doesn't forward the panic reason string to the kmsg_dumper
callback.
This patch adds a new struct kmsg_dump_detail, that will hold the
reason and description, and pass it to the dump() callback.

To avoid updating all kmsg_dump() call, it adds a kmsg_dump_desc()
function and a macro for backward compatibility.

I've written this for drm_panic, but it can be useful for other
kmsg_dumper.
It allows to see the panic reason, like "sysrq triggered crash"
or "VFS: Unable to mount root fs on " on the drm panic screen.

v2:
  * Use a struct kmsg_dump_detail to hold the reason and description
pointer, for more flexibility if we want to add other parameters.
(Kees Cook)
  * Fix powerpc/nvram_64 build, as I didn't update the forward
declaration of oops_to_nvram()

Signed-off-by: Jocelyn Falempe 
---
  arch/powerpc/kernel/nvram_64.c |  8 
  arch/powerpc/platforms/powernv/opal-kmsg.c |  4 ++--
  arch/um/kernel/kmsg_dump.c |  2 +-
  drivers/gpu/drm/drm_panic.c|  4 ++--
  drivers/hv/hv_common.c |  4 ++--
  drivers/mtd/mtdoops.c  |  2 +-
  fs/pstore/platform.c   | 10 +-
  include/linux/kmsg_dump.h  | 22 +++---
  kernel/panic.c |  2 +-
  kernel/printk/printk.c | 11 ---
  10 files changed, 45 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
index e385d3164648..f9c6568a9137 100644
--- a/arch/powerpc/kernel/nvram_64.c
+++ b/arch/powerpc/kernel/nvram_64.c
@@ -73,7 +73,7 @@ static const char *nvram_os_partitions[] = {
  };
  
  static void oops_to_nvram(struct kmsg_dumper *dumper,

- enum kmsg_dump_reason reason);
+ struct kmsg_dump_detail *detail);
  
  static struct kmsg_dumper nvram_kmsg_dumper = {

.dump = oops_to_nvram
@@ -643,7 +643,7 @@ void __init nvram_init_oops_partition(int 
rtas_partition_exists)
   * partition.  If that's too much, go back and capture uncompressed text.
   */
  static void oops_to_nvram(struct kmsg_dumper *dumper,
- enum kmsg_dump_reason reason)
+ struct kmsg_dump_detail *detail)
  {
struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf;
static unsigned int oops_count = 0;
@@ -655,7 +655,7 @@ static void oops_to_nvram(struct kmsg_dumper *dumper,
unsigned int err_type = ERR_TYPE_KERNEL_PANIC_GZ;
int rc = -1;
  
-	switch (reason) {

+   switch (detail->reason) {
case KMSG_DUMP_SHUTDOWN:
/* These are almost always orderly shutdowns. */
return;
@@ -671,7 +671,7 @@ static void oops_to_nvram(struct kmsg_dumper *dumper,
break;
default:
pr_err("%s: ignoring unrecognized KMSG_DUMP_* reason %d\n",
-  __func__, (int) reason);
+  __func__, (int) detail->reason);
return;
}
  
diff --git a/arch/powerpc/platforms/powernv/opal-kmsg.c b/arch/powerpc/platforms/powernv/opal-kmsg.c

index 6c3bc4b4da98..bb4218fa796e 100644
--- a/arch/powerpc/platforms/powernv/opal-kmsg.c
+++ b/arch/powerpc/platforms/powernv/opal-kmsg.c
@@ -20,13 +20,13 @@
   * message, it just ensures that OPAL completely flushes the console buffer.
   */
  static void kmsg_dump_opal_console_flush(struct kmsg_dumper *dumper,
-enum kmsg_dump_reason reason)
+struct kmsg_dump_detail *detail)
  {
/*
 * Outside of a panic context the pollers will continue to run,
 * so we don't need to do any special flushing.
 */
-   if (reason != KMSG_DUMP_PANIC)
+   if (detail->reason != KMSG_DUMP_PANIC)
return;
  
  	opal_flush_console(0);

diff --git a/arch/um/kernel/kmsg_dump.c b/arch/um/kernel/kmsg_dump.c
index 4382cf02a6d1..419021175272 100644
--- a/arch/um/kernel/kmsg_dump.c
+++ b/arch/um/kernel/kmsg_dump.c
@@ -8,7 +8,7 @@
  #include 
  
  static void kmsg_dumper_stdout(struct kmsg_dumper *dumper,

-   enum kmsg_dump_reason reason)
+   struct kmsg_dump_detail *detail)
  {
static struct kmsg_dump_iter iter;
static DEFINE_SPINLOCK(lock);
diff --git a/drivers/gpu/drm/drm_panic.c b/drivers/gpu/drm/drm_panic.c
index 948aed00595e..8794c7f6c0ee 100644
--- a/drivers/gpu/drm/drm_panic.c
+++ b/drivers/gpu/drm/drm_panic.c
@@ -655,11 +655,11 @@ static struct drm_plane *to_drm_plane(struct kmsg_dumper 
*kd)
return container_of(kd, struct drm_plane, kmsg_panic);
  }
  
-static void drm_panic(struct kmsg_dumper *dumper, enum kmsg_dump_reason reason)

+static void drm_panic(struct kmsg_dumper *dumper, struct kmsg_dump_detail 
*detail)
  {
struct

[PATCH 2/2] soc: fsl: qbman: fix unconditional WARN_ON() on Arm during fsl_qman_probe()

2024-07-12 Thread Vladimir Oltean
The blamed commit performed a lossy transformation of the original logic.
Due to it, on Arm DPAA1 platforms, we are now faced with this WARN on
each boot:

[ cut here ]
Unexpected architecture using non shared-dma-mem reservations
WARNING: CPU: 0 PID: 1 at drivers/soc/fsl/qbman/qman_ccsr.c:795 
fsl_qman_probe+0x1d0/0x768
pc : fsl_qman_probe+0x1d0/0x768
lr : fsl_qman_probe+0x1b0/0x768
Call trace:
 fsl_qman_probe+0x1d0/0x768
 platform_probe+0xa8/0xd0
 really_probe+0x128/0x2c8

Prior to the refactoring, the logic in fsl_qman_probe() was as follows:

if (fqd_a) { // previously found using 
RESERVEDMEM_OF_DECLARE("fsl,qman-fqd") [0]
 #ifdef CONFIG_PPC
/*
 * For PPC backward DT compatibility
 * FQD memory MUST be zero'd by software
 */
zero_priv_mem(fqd_a, fqd_sz);
 #else
WARN(1, "Unexpected architecture using non shared-dma-mem 
reservations");
 #endif
} else {
// Find FQD using new-style device tree bindings [1]
ret = qbman_init_private_mem(dev, 0, &fqd_a, &fqd_sz);
}

After the refactoring, the search for the new-style and the old-style
got flipped, and both got absorbed into qbman_init_private_mem().

This creates a problem, because there is no longer a place to put the
"fqd_a != 0" branch within fsl_qman_probe(). The callee,
qbman_init_private_mem(), does not distinguish between FQD, PFDR and
FBPR, and zero_priv_mem() must execute only for FQD.

Split qbman_init_private_mem() into two different functions:
qbman_find_reserved_mem_by_idx() for new-style bindings, and
qbman_find_reserved_mem_by_compatible() for old-style.

Let callers explicitly call both, which permits fsl_qman_probe() to
zero-initialize the FQD memory on PowerPC if it matched on a compatible
string.

[0] Legacy bindings used by PowerPC:

/ {
reserved-memory {
qman_fqd: qman-fqd {
compatible = "fsl,qman-fqd";
alloc-ranges = <0 0 0x1 0>;
size = <0 0x40>;
alignment = <0 0x40>;
};
};
};

[1] New bindings:

/ {
reserved-memory {
qman_fqd: qman-fqd {
compatible = "shared-dma-pool";
size = <0 0x80>;
alignment = <0 0x80>;
no-map;
};

qman_pfdr: qman-pfdr {
compatible = "shared-dma-pool";
size = <0 0x200>;
alignment = <0 0x200>;
no-map;
};
};

soc {
qman: qman@188 {
compatible = "fsl,qman";
reg = <0x0 0x188 0x0 0x1>;
memory-region = <&qman_fqd &qman_pfdr>;
};
};
};

Fixes: 3e62273ac63a ("soc: fsl: qbman: Remove RESERVEDMEM_OF_DECLARE usage")
Cc: 
Signed-off-by: Vladimir Oltean 
---
 drivers/soc/fsl/qbman/bman_ccsr.c | 11 --
 drivers/soc/fsl/qbman/dpaa_sys.c  | 62 ++-
 drivers/soc/fsl/qbman/dpaa_sys.h  |  7 ++--
 drivers/soc/fsl/qbman/qman_ccsr.c | 40 +---
 4 files changed, 82 insertions(+), 38 deletions(-)

diff --git a/drivers/soc/fsl/qbman/bman_ccsr.c 
b/drivers/soc/fsl/qbman/bman_ccsr.c
index b0f26f6f731e..d8a440a265c5 100644
--- a/drivers/soc/fsl/qbman/bman_ccsr.c
+++ b/drivers/soc/fsl/qbman/bman_ccsr.c
@@ -231,11 +231,14 @@ static int fsl_bman_probe(struct platform_device *pdev)
return -ENODEV;
}
 
-   ret = qbman_init_private_mem(dev, 0, "fsl,bman-fbpr", &fbpr_a, 
&fbpr_sz);
+   ret = qbman_find_reserved_mem_by_idx(dev, 0, &fbpr_a, &fbpr_sz);
+   if (ret)
+   ret = qbman_find_reserved_mem_by_compatible(dev, 
"fsl,bman-fbpr",
+   &fbpr_a, &fbpr_sz);
if (ret) {
-   dev_err(dev, "qbman_init_private_mem() failed 0x%x\n",
-   ret);
-   return -ENODEV;
+   dev_err(dev, "Failed to find FBPR reserved-memory region: 
%pe\n",
+   ERR_PTR(ret));
+   return ret;
}
 
dev_dbg(dev, "Allocated FBPR 0x%llx 0x%zx\n", fbpr_a, fbpr_sz);
diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c
index b1cee145cbd7..7c775c4c8a21 100644
--- a/drivers/soc/fsl/qbman/dpaa_sys.c
+++ b/drivers/soc/fsl/qbman/dpaa_sys.c
@@ -31,28 +31,14 @@
 #include 
 #include "dpaa_sys.h"
 
-/*
- * Initialize a devices private memory region
- */
-int qbman_init_private_mem(struct device *dev, int idx, const char *compat,
-  dma_addr_t *addr, size_t *size)
+static int qbman_reserved_mem_lookup(struct device_node *mem_node,
+ 

[PATCH 1/2] soc: fsl: qbman: delete bogus device tree fixup in qbman_init_private_mem()

2024-07-12 Thread Vladimir Oltean
This is effectively a revert of commit 6ea4c0fe4570 ("soc/fsl/qbman:
Update device tree with reserved memory").

What that commit intended to do: Fix up the device tree that is passed
to a subsequent kexec-loaded kernel, so that the reserved-memory nodes
have the same base addresses as the currently running kernel.

What that commit actually does: Fix up the running device tree, which
has no effect whatsoever upon the device tree passed to the next kernel.

I would have refrained from making this kind of non-bugfix change in
stable kernels, but qbman_init_private_mem() grossly misrepresents
what this function does, and for an actual upcoming bug fix, it needs to
be refactored. There is no place for the bogus code afterwards, so it
needs to go as part of that, sadly.

Cc: 
Signed-off-by: Vladimir Oltean 
---
 drivers/soc/fsl/qbman/dpaa_sys.c | 31 ---
 1 file changed, 31 deletions(-)

diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c
index e1d7b79cc450..b1cee145cbd7 100644
--- a/drivers/soc/fsl/qbman/dpaa_sys.c
+++ b/drivers/soc/fsl/qbman/dpaa_sys.c
@@ -39,8 +39,6 @@ int qbman_init_private_mem(struct device *dev, int idx, const 
char *compat,
 {
struct device_node *mem_node;
struct reserved_mem *rmem;
-   int err;
-   __be32 *res_array;
 
mem_node = of_parse_phandle(dev->of_node, "memory-region", idx);
if (!mem_node) {
@@ -60,34 +58,5 @@ int qbman_init_private_mem(struct device *dev, int idx, 
const char *compat,
*addr = rmem->base;
*size = rmem->size;
 
-   /*
-* Check if the reg property exists - if not insert the node
-* so upon kexec() the same memory region address will be preserved.
-* This is needed because QBMan HW does not allow the base address/
-* size to be modified once set.
-*/
-   if (!of_property_present(mem_node, "reg")) {
-   struct property *prop;
-
-   prop = devm_kzalloc(dev, sizeof(*prop), GFP_KERNEL);
-   if (!prop)
-   return -ENOMEM;
-   prop->value = res_array = devm_kzalloc(dev, sizeof(__be32) * 4,
-  GFP_KERNEL);
-   if (!prop->value)
-   return -ENOMEM;
-   res_array[0] = cpu_to_be32(upper_32_bits(*addr));
-   res_array[1] = cpu_to_be32(lower_32_bits(*addr));
-   res_array[2] = cpu_to_be32(upper_32_bits(*size));
-   res_array[3] = cpu_to_be32(lower_32_bits(*size));
-   prop->length = sizeof(__be32) * 4;
-   prop->name = devm_kstrdup(dev, "reg", GFP_KERNEL);
-   if (!prop->name)
-   return -ENOMEM;
-   err = of_add_property(mem_node, prop);
-   if (err)
-   return err;
-   }
-
return 0;
 }
-- 
2.34.1



RE: [PATCH v5 0/2] PCI/AER: Handle Advisory Non-Fatal error

2024-07-12 Thread Duan, Zhenzhong
Hi Bjorn,

Kindly ping, this series got Reviewed-by and no comments for a month.
Will you think about picking it or further improvements are needed.
Look forward to your suggestions.

Thanks
Zhenzhong

>-Original Message-
>From: Duan, Zhenzhong 
>Subject: [PATCH v5 0/2] PCI/AER: Handle Advisory Non-Fatal error
>
>Hi,
>
>This is a relay work of Qingshun's v2 [1], but changed to focus on ANFE
>processing as subject suggests and drops trace-event for now. I think it's
>a bit heavy to do extra IOes to get PCIe registers only for trace purpose
>and not see it a community request for now.
>
>According to PCIe Base Specification Revision 6.1, Sections 6.2.3.2.4 and
>6.2.4.3, certain uncorrectable errors will signal ERR_COR instead of
>ERR_NONFATAL, logged as Advisory Non-Fatal Error(ANFE), and set bits in
>both Correctable Error(CE) Status register and Uncorrectable Error(UE)
>Status register. Currently, when handling AER events the kernel will only
>look at CE status or UE status, but never both. In the ANFE case, bits set
>in the UE status register will not be reported and cleared until the next
>FE/NFE arrives.
>
>For instance, previously, when the kernel receives an ANFE with Poisoned
>TLP in OS native AER mode, only the status of CE will be reported and
>cleared:
>
>  AER: Correctable error message received from :b7:02.0
>  PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
>device [8086:0db0] error status/mask=2000/
> [13] NonFatalErr
>
>If the kernel receives a Malformed TLP after that, two UEs will be
>reported, which is unexpected. The Malformed TLP Header is lost since
>the previous ANFE gated the TLP header logs:
>
>  PCIe Bus Error: severity="Uncorrectable (Fatal), type=Transaction Layer,
>(Receiver ID)
>device [8086:0db0] error status/mask=00041000/00180020
> [12] TLP(First)
> [18] MalfTLP
>
>To handle this case properly, calculate potential ANFE related status bits
>and save in aer_err_info. Use this information to determine the status bits
>that need to be cleared.
>
>Now, for the previous scenario, both CE status and related UE status will
>be reported and cleared after ANFE:
>
>  AER: Correctable error message received from :b7:02.0
>  PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
>device [8086:0db0] error status/mask=2000/
> [13] NonFatalErr
>Uncorrectable errors that may cause Advisory Non-Fatal:
> [12] TLP
>
>Note:
>checkpatch.pl will produce following warnings on PATCH1&2:
>
>WARNING: 'UE' may be misspelled - perhaps 'USE'?
>#22:
>uncorrectable error(UE) status should be cleared. However, there is no
>
>...similar warnings omitted...
>
>This is a false-positive, so not fixed.
>
>WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit
>description?)
>#10:
>  PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
>
>...similar warnings omitted...
>
>For readability reasons, these warnings are not fixed.
>
>
>
>[1] https://lore.kernel.org/linux-pci/20240125062802.50819-1-
>qingshun.w...@linux.intel.com
>
>Thanks
>Qingshun, Zhenzhong
>
>Changelog:
>v5:
> - squash patch 1 and 3 (Kuppuswamy)
> - add comment about avoiding race and fix typo error (Kuppuswamy)
> - collect Jonathan and Kuppuswamy's R-b
>
>v4:
>  - Fix a race in anfe_get_uc_status() (Jonathan)
>  - Add a comment to explain side effect of processing ANFE as NFE (Jonathan)
>  - Drop the check for PCI_EXP_DEVSTA_NFED
>
>v3:
>  - Split ANFE print and processing to two patches (Bjorn)
>  - Simplify ANFE handling, drop trace event
>  - Polish comments and patch description
>  - Add Tested-by
>
>v2:
>  - Reference to the latest PCIe Specification in both commit messages
>and comments, as suggested by Bjorn Helgaas.
>  - Describe the reason for storing additional information in
>aer_err_info in the commit message of PATCH 1, as suggested by Bjorn
>Helgaas.
>  - Add more details of behavior changes in the commit message of PATCH
>2, as suggested by Bjorn Helgaas.
>
>v4: https://lkml.org/lkml/2024/5/9/247
>v3: https://lore.kernel.org/lkml/20240417061407.1491361-1-
>zhenzhong.d...@intel.com
>v2: https://lore.kernel.org/linux-pci/20240125062802.50819-1-
>qingshun.w...@linux.intel.com
>v1: https://lore.kernel.org/linux-pci/20240111073227.31488-1-
>qingshun.w...@linux.intel.com
>
>
>Zhenzhong Duan (2):
>  PCI/AER: Clear UNCOR_STATUS bits that might be ANFE
>  PCI/AER: Print UNCOR_STATUS bits that might be ANFE
>
> drivers/pci/pci.h  |  1 +
> drivers/pci/pcie/aer.c | 79
>+-
> 2 files changed, 79 insertions(+), 1 deletion(-)
>
>--
>2.34.1



[PATCH v3 1/1] x86/elf: Add a new .note section containing xfeatures buffer layout info to x86 core files

2024-07-12 Thread Vignesh Balasubramanian
Add a new .note section containing type, size, offset and flags of every
xfeature that is present.

This information will be used by debuggers to understand the XSAVE
layout of the machine where the core file has been dumped, and to read
XSAVE registers, especially during cross-platform debugging.

The XSAVE layouts of modern AMD and Intel CPUs differ, especially since
Memory Protection Keys and the AVX-512 features have been inculcated into
the AMD CPUs.

Since AMD never adopted (and hence never left room in the XSAVE
layout for) the Intel MPX feature. Tools like GDB had assumed a fixed
XSAVE layout matching that of Intel (based on the XCR0 mask).

Hence, core dumps from AMD CPUs didn't match the known size for the
XCR0 mask. This resulted in GDB and other tools not being able to access
the values of the AVX-512 and PKRU registers on AMD CPUs.

To solve this, an interim solution has been accepted into GDB, and is
already a part of GDB 14, see
https://sourceware.org/pipermail/gdb-patches/2023-March/198081.html.

But it depends on heuristics based on the total XSAVE register set size
and the XCR0 mask to infer the layouts of the various register blocks
for core dumps, and hence, is not a foolproof mechanism to determine the
layout of the XSAVE area.

Therefore, add a new core dump note in order to allow GDB/LLDB and other
relevant tools to determine the layout of the XSAVE area of the machine
where the corefile was dumped.

The new core dump note (which is being proposed as a per-process .note
section), NT_X86_XSAVE_LAYOUT (0x205) contains an array of structures.

Each structure describes an individual extended feature containing
offset, size and flags in this format:

struct xfeat_component {
   u32 type;
   u32 size;
   u32 offset;
   u32 flags;
};

and in an independent manner, allowing for future extensions without
depending on hw arch specifics like CPUID etc.

Co-developed-by: Jini Susan George 
Signed-off-by: Jini Susan George 
Co-developed-by: Borislav Petkov (AMD) 
Signed-off-by: Borislav Petkov (AMD) 
Signed-off-by: Vignesh Balasubramanian 
---
V2->v3: simplified code as per review comments.
 arch/x86/Kconfig |  1 +
 arch/x86/include/asm/elf.h   |  9 
 arch/x86/kernel/fpu/xstate.c | 87 
 fs/binfmt_elf.c  |  4 +-
 include/uapi/linux/elf.h |  1 +
 5 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1d7122a18..0ea43da0b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -107,6 +107,7 @@ config X86
select ARCH_HAS_DEBUG_WX
select ARCH_HAS_ZONE_DMA_SET if EXPERT
select ARCH_HAVE_NMI_SAFE_CMPXCHG
+   select ARCH_HAVE_EXTRA_ELF_NOTES
select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_MIGHT_HAVE_PC_PARPORT
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 1fb83d477..cad37090b 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -13,6 +13,15 @@
 #include 
 #include 
 
+struct xfeat_component {
+   u32 type;
+   u32 size;
+   u32 offset;
+   u32 flags;
+} __packed;
+
+_Static_assert(sizeof(struct xfeat_component)%4 == 0, "xfeat_component is not 
aligned");
+
 typedef unsigned long elf_greg_t;
 
 #define ELF_NGREG (sizeof(struct user_regs_struct) / sizeof(elf_greg_t))
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c5a026fee..5d59f390c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1838,3 +1839,89 @@ int proc_pid_arch_status(struct seq_file *m, struct 
pid_namespace *ns,
return 0;
 }
 #endif /* CONFIG_PROC_PID_ARCH_STATUS */
+
+#ifdef CONFIG_COREDUMP
+static const char owner_name[] = "LINUX";
+
+/*
+ * Dump type, size, offset and flag values for every xfeature that is present.
+ */
+static int dump_xsave_layout_desc(struct coredump_params *cprm)
+{
+   int num_records = 0;
+   int i;
+
+   for_each_extended_xfeature(i, fpu_user_cfg.max_features) {
+   struct xfeat_component xc = {
+   .type   = i,
+   .size   = xstate_sizes[i],
+   .offset = xstate_offsets[i],
+   /* reserved for future use */
+   .flags  = 0,
+   };
+
+   if (!dump_emit(cprm, &xc, sizeof(xc)))
+   return 0;
+
+   num_records++;
+   }
+   return num_records;
+}
+
+static u32 get_xsave_desc_size(void)
+{
+   u32 cnt = 0;
+   u32 i;
+
+   for_each_extended_xfeature(i, fpu_user_cfg.max_features)
+   cnt++;
+
+   return cnt * (sizeof(struct xfeat_component));
+}
+
+int elf_coredump_extra_notes_write(struct coredump_params *cprm)
+{
+   int num_records = 0;
+   struct

[PATCH v3 0/1] Add XSAVE layout description to Core files for debuggers to support varying XSAVE layouts

2024-07-12 Thread Vignesh Balasubramanian
This patch proposes to add an extra .note section in the corefile to dump the 
CPUID information of a machine. This is being done to solve the issue of tools 
like the debuggers having to deal with coredumps from machines with varying 
XSAVE layouts in spite of having the same XCR0 bits. The new proposed .note 
section, at this point, consists of an array of records containing the 
information of each extended feature that is present. This provides details 
about the offsets and the sizes of the various extended save state components 
of the machine where the application crash occurred. Requesting a review for 
this patch.

Vignesh Balasubramanian (1):
  x86/elf: Add a new .note section containing xfeatures buffer layout
info to x86 core files

 arch/x86/Kconfig |  1 +
 arch/x86/include/asm/elf.h   |  9 
 arch/x86/kernel/fpu/xstate.c | 87 
 fs/binfmt_elf.c  |  4 +-
 include/uapi/linux/elf.h |  1 +
 5 files changed, 100 insertions(+), 2 deletions(-)

-- 
2.34.1



Re: [PATCH v2 07/10] soc: fsl: cpm1: qmc: Introduce functions to get a channel from a phandle list

2024-07-12 Thread LEROY Christophe


Le 04/07/2024 à 05:01, Michael Ellerman a écrit :
> Mark Brown  writes:
>> On Mon, Jul 01, 2024 at 01:30:34PM +0200, Herve Codina wrote:
>>> qmc_chan_get_byphandle() and the resource managed version retrieve a
>>> channel from a simple phandle.
>>>
>>> Extend the API and introduce qmc_chan_get_byphandles_index() and the
>>> resource managed version in order to retrieve a channel from a phandle
>>> list using the provided index to identify the phandle in the list.
>>
>> These two PowerPC patches seem trivial enough and have got no response,
>> unless someone objects I'll go ahead and apply them.
> 
> Ack.
> 
> MAINTAINERS says:
> 
> FREESCALE QUICC ENGINE LIBRARY
> M:  Qiang Zhao 
> L:  linuxppc-dev@lists.ozlabs.org
> S:  Maintained
> F:  drivers/soc/fsl/qe/
> F:  include/soc/fsl/qe/
> 
> But I see no email from that address since January 2021:
> 
>https://lore.kernel.org/all/?q=f%3Aqiang.zhao%40nxp.com
> 
> And actually drivers/soc/fsl was marked orphan in April, maybe this
> should be also.
> 
> Or does Herve want to take over maintaining it?

We had some discussion about that in April, see 
https://lore.kernel.org/linuxppc-dev/20240219153016.ntltc76bphwrv6hn@skbuf/T/#mf6d4a5eef79e8eae7ae0456a2794c01e630a6756

Hervé has some of our hardware for a limited period of time because he 
is doing some implementation for us, but he won't keep that hardware on 
the long run.

I will send a patch to take over maintaining drivers/soc/fsl/

Christophe