Re: [PATCH 02/12] powerpc/pseries: Restructure hvc_get_chars() endianness

2024-02-19 Thread Christophe Leroy
Michael ? Any reason for not picking that one ?

Le 30/10/2023 à 14:16, Aneesh Kumar K.V a écrit :
> Benjamin Gray  writes:
> 
>> Sparse reports an endian mismatch in hvc_get_chars().
>>
>> At first it seemed like the retbuf should be __be64[], but actually
>> retbuf holds serialized registers returned by the hypervisor call, so
>> it's correctly CPU endian typed.
>>
>> Instead, it is the be64_to_cpu() that's misleading. The intent is to do
>> a byte swap on a little endian CPU. The swap is required because we
>> wanted to store the register values to memory without 'swapping' bytes,
>> so that the high order byte of the first register is the first byte
>> in the result buffer.
>>
>> In diagram form, on a LE CPU with the letters representing the return
>> string bytes:
>>
>>  (register bytes) A B C D E F G H   I J K L M N O P
>>(retbuf mem bytes) H G F E D C B A   P O N M L K J I
>> (buf/lbuf mem bytes) A B C D E F G H   I J K L M N O P
>>
>> So retbuf stores the registers in CPU endian, and buf always stores in
>> big endian.
>>
>> So replace the byte swap function with cpu_to_be64() and cast lbuf as an
>> array of __be64 to match the semantics closer.
>>
>> Signed-off-by: Benjamin Gray 
>> ---
>>   arch/powerpc/platforms/pseries/hvconsole.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/hvconsole.c 
>> b/arch/powerpc/platforms/pseries/hvconsole.c
>> index 1ac52963e08b..647718a15e78 100644
>> --- a/arch/powerpc/platforms/pseries/hvconsole.c
>> +++ b/arch/powerpc/platforms/pseries/hvconsole.c
>> @@ -29,11 +29,11 @@ int hvc_get_chars(uint32_t vtermno, char *buf, int count)
>>   {
>>  long ret;
>>  unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
>> -unsigned long *lbuf = (unsigned long *)buf;
>> +__be64 *lbuf = (__be64 __force *)buf;
>>   
>>  ret = plpar_hcall(H_GET_TERM_CHAR, retbuf, vtermno);
>> -lbuf[0] = be64_to_cpu(retbuf[1]);
>> -lbuf[1] = be64_to_cpu(retbuf[2]);
>> +lbuf[0] = cpu_to_be64(retbuf[1]);
>> +lbuf[1] = cpu_to_be64(retbuf[2]);
>>   
>>  if (ret == H_SUCCESS)
>>  return retbuf[0];
>>
> 
> There is no functionality change in this patch. It is clarifying the
> details that it expect the buf to have the big-endian format and retbuf
> contains native endian format.
> 
> Not sure why this was not picked.
> 
> Reviewed-by: Aneesh Kumar K.V 


Re: [PATCH 00/13] kmsan: Enable on powerpc

2024-02-19 Thread Christophe Leroy


Le 14/12/2023 à 06:55, Nicholas Miehlbradt a écrit :
> This series provides the minimal support for Kernal Memory Sanitizer on
> powerpc pseries le guests. Kernal Memory Sanitizer is a tool which detects
> uses of uninitialized memory. Currently KMSAN is clang only.
> 
> The clang support for powerpc has not yet been merged, the pull request can
> be found here [1].

As clang doesn't support it yet, it is probably prematurate to merge 
that in the kernel.

I have open https://github.com/linuxppc/issues/issues/475 to follow through

In the meantime I flag this series as "change requested" for a revisit 
it when time comes

Christophe

> 
> In addition to this series, there are a number of changes required in
> generic kmsan code. These changes are already on mailing lists as part of
> the series implementing KMSAN for s390 [2]. This series is intended to be
> rebased on top of the s390 series.
> 
> In addition, I found a bug in the rtc driver used on powerpc. I have sent
> a fix to this in a seperate series [3].
> 
> With this series and the two series mentioned above, I can successfully
> boot pseries le defconfig without KMSAN warnings. I have not tested other
> powerpc platforms.
> 
> [1] https://github.com/llvm/llvm-project/pull/73611
> [2] 
> https://lore.kernel.org/linux-mm/20231121220155.1217090-1-...@linux.ibm.com/
> [3] 
> https://lore.kernel.org/linux-rtc/20231129073647.2624497-1-nicho...@linux.ibm.com/
> 
> Nicholas Miehlbradt (13):
>kmsan: Export kmsan_handle_dma
>hvc: Fix use of uninitialized array in udbg_hvc_putc
>powerpc: Disable KMSAN santitization for prom_init, vdso and purgatory
>powerpc: Disable CONFIG_DCACHE_WORD_ACCESS when KMSAN is enabled
>powerpc: Unpoison buffers populated by hcalls
>powerpc/pseries/nvram: Unpoison buffer populated by rtas_call
>powerpc/kprobes: Unpoison instruction in kprobe struct
>powerpc: Unpoison pt_regs
>powerpc: Disable KMSAN checks on functions which walk the stack
>powerpc: Define KMSAN metadata address ranges for vmalloc and ioremap
>powerpc: Implement architecture specific KMSAN interface
>powerpc/string: Add KMSAN support
>powerpc: Enable KMSAN on powerpc
> 
>   arch/powerpc/Kconfig  |  3 +-
>   arch/powerpc/include/asm/book3s/64/pgtable.h  | 42 +++
>   arch/powerpc/include/asm/interrupt.h  |  2 +
>   arch/powerpc/include/asm/kmsan.h  | 51 +++
>   arch/powerpc/include/asm/string.h | 18 ++-
>   arch/powerpc/kernel/Makefile  |  2 +
>   arch/powerpc/kernel/irq_64.c  |  2 +
>   arch/powerpc/kernel/kprobes.c |  2 +
>   arch/powerpc/kernel/module.c  |  2 +-
>   arch/powerpc/kernel/process.c |  6 +--
>   arch/powerpc/kernel/stacktrace.c  | 10 ++--
>   arch/powerpc/kernel/vdso/Makefile |  1 +
>   arch/powerpc/lib/Makefile |  2 +
>   arch/powerpc/lib/mem_64.S |  5 +-
>   arch/powerpc/lib/memcpy_64.S  |  2 +
>   arch/powerpc/perf/callchain.c |  2 +-
>   arch/powerpc/platforms/pseries/hvconsole.c|  2 +
>   arch/powerpc/platforms/pseries/nvram.c|  4 ++
>   arch/powerpc/purgatory/Makefile   |  1 +
>   arch/powerpc/sysdev/xive/spapr.c  |  3 ++
>   drivers/tty/hvc/hvc_vio.c |  2 +-
>   mm/kmsan/hooks.c  |  1 +
>   .../selftests/powerpc/copyloops/asm/kmsan.h   |  0
>   .../powerpc/copyloops/linux/export.h  |  1 +
>   24 files changed, 152 insertions(+), 14 deletions(-)
>   create mode 100644 arch/powerpc/include/asm/kmsan.h
>   create mode 100644 tools/testing/selftests/powerpc/copyloops/asm/kmsan.h
> 


Re: [PATCH] powerpc: remove unused *_syscall_64.o variables in Makefile

2024-02-19 Thread Michael Ellerman
Masahiro Yamada  writes:
> +To: Daniel Axtens

Unfortunately dja doesn't work on the kernel anymore.

> Maybe, we should check if the issue fixed by
> 2f26ed1764b42a8c40d9c48441c73a70d805decf
> came back.

That crash is specific to Power8, which probably no one is running
syzkaller on anymore.

If I enable KCOV and boot with 1T segments disabled (to make it easier
to trigger the bug) it crashes similarly to the report in that commit.

> On Fri, Feb 16, 2024 at 10:55 PM Masahiro Yamada  wrote:
...
>> To restore the original behavior, we could replace them with:
>>
>>   GCOV_PROFILE_interrupt.o := n
>>   KCOV_INSTRUMENT_interrupt.o := n
>>   UBSAN_SANITIZE_interrupt.o := n

But just putting those back isn't actually enough to fix it, the code
has changed and there are other places that need KCOV disabled on P8.

So I'm not sure how to handle this one. I guess I might just take it and
then make a todo to fix the KCOV problems later.

cheers


Re: [PATCH] mm/debug_vm_pgtable: Fix BUG_ON with pud advanced test

2024-02-19 Thread Aneesh Kumar K.V
On 2/20/24 8:16 AM, Andrew Morton wrote:
> On Mon, 29 Jan 2024 13:43:39 +0530 "Aneesh Kumar K.V" 
>  wrote:
> 
>>> return (pud_val(pud) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE;
>>> }
>>> #endif
>>>
>>> #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>> static inline int pud_devmap(pud_t pud)
>>> {
>>> return !!(pud_val(pud) & _PAGE_DEVMAP);
>>> }
>>> #else
>>> static inline int pud_devmap(pud_t pud)
>>> {
>>> return 0;
>>> }
>>> #endif
>>>
>>> We might need some more clarity on this regarding x86 platform's pud huge
>>> page implementation.
>>>
>>
>> static vm_fault_t create_huge_pud(struct vm_fault *vmf)
>> {
>> #if defined(CONFIG_TRANSPARENT_HUGEPAGE) &&  \
>>  defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
>>  struct vm_area_struct *vma = vmf->vma;
>>  /* No support for anonymous transparent PUD pages yet */
>>  if (vma_is_anonymous(vma))
>>  return VM_FAULT_FALLBACK;
>>  if (vma->vm_ops->huge_fault)
>>  return vma->vm_ops->huge_fault(vmf, PUD_ORDER);
>> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>  return VM_FAULT_FALLBACK;
>> }
> 
> cryptic reply, unreplied to.
> 
> What's the thinking here?  Should we proceed with the patch as-is, or
> are changes needed?
> 

Sorry for the confusion. What i wanted to update with the code was to reiterate
that no architectures currently does anonymous pud hugepage. So restricting
debug_vm_pgtable pud hugepage test to devmap pte entries should be ok w.r.t
these tests.

-aneesh


Re: [PATCH] mm/debug_vm_pgtable: Fix BUG_ON with pud advanced test

2024-02-19 Thread Andrew Morton
On Mon, 29 Jan 2024 13:43:39 +0530 "Aneesh Kumar K.V"  
wrote:

> > return (pud_val(pud) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE;
> > }
> > #endif
> > 
> > #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> > static inline int pud_devmap(pud_t pud)
> > {
> > return !!(pud_val(pud) & _PAGE_DEVMAP);
> > }
> > #else
> > static inline int pud_devmap(pud_t pud)
> > {
> > return 0;
> > }
> > #endif
> > 
> > We might need some more clarity on this regarding x86 platform's pud huge
> > page implementation.
> > 
> 
> static vm_fault_t create_huge_pud(struct vm_fault *vmf)
> {
> #if defined(CONFIG_TRANSPARENT_HUGEPAGE) &&   \
>   defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
>   struct vm_area_struct *vma = vmf->vma;
>   /* No support for anonymous transparent PUD pages yet */
>   if (vma_is_anonymous(vma))
>   return VM_FAULT_FALLBACK;
>   if (vma->vm_ops->huge_fault)
>   return vma->vm_ops->huge_fault(vmf, PUD_ORDER);
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>   return VM_FAULT_FALLBACK;
> }

cryptic reply, unreplied to.

What's the thinking here?  Should we proceed with the patch as-is, or
are changes needed?



Re: [PATCH RFC] powerpc/pseries: exploit H_PAGE_SET_UNUSED for partition migration

2024-02-19 Thread Nathan Lynch
>
> Although the H_PAGE_INIT hcall's H_PAGE_SET_UNUSED historically has
> been tied to the cooperative memory overcommit (CMO) platform feature,
> the flag also is treated by the PowerVM hypervisor as a hint that the
> page contents need not be copied to the destination during a live
> partition migration.
>
> Use the "ibm,migratable-partition" root node property to determine
> whether this partition/guest can be migrated. Mark freed pages unused
> if so (or if CMO is in use, as before).
>
> Signed-off-by: Nathan Lynch 
> ---
> Several things yet to improve here:
>
> * powerpc's arch_free_page()/HAVE_ARCH_FREE_PAGE should be decoupled
>   from CONFIG_PPC_SMLPAR.
>
> * powerpc's arch_free_page() could be made to use a static key if
>   justified.
>
> * I have not yet measured the overhead this introduces, nor have I
>   measured the benefit to a live migration.
>
> To date, I have smoke tested it by doing a live migration and
> performing a build on a kernel with the change, to ensure it doesn't
> introduce obvious memory corruption or anything.

An update on this. I've been working on quantifying the benefit, and
I've now seen memory corruption after partition migrations, with several
programs getting mysterious SIGSEGV/SIGILL:

mobility: calling ibm,suspend-me on CPU 7
mobility: CPU 7 waking all threads
mobility: waiting for memory transfer to complete...
touch[10988]: segfault (11) at 8 nip 7fff8fc2f9fc lr 7fff8fc26684 code 1 in 
ld-2.28.so[7fff8fc2+3]
touch[10987]: segfault (11) at 8 nip 7fff86f5f9fc lr 7fff86f56684 code 1 in 
ld-2.28.so[7fff86f5+3]
touch[10988]: code: 3d22 81297470 71290020 4082254c e93d00f0 2fa9 
f93f00b0 409e2458 
touch[10988]: code: e93d00f8 e95d0068 eb1d 2fa9  419e00f8 
e93d0158 e91d0058 
touch[10987]: code: 3d22 81297470 71290020 4082254c e93d00f0 2fa9 
f93f00b0 409e2458 
touch[10987]: code: e93d00f8 e95d0068 eb1d 2fa9  419e00f8 
e93d0158 e91d0058 

Maybe I had some debug options enabled that masked this before, or just
got lucky.

In any case, it seems obvious now that for this to be safe,
powerpc/pseries needs to implement arch_alloc_page() to undo setting the
"unused" flag.


>
> This will be a possibly significant behavior change in that we will be
> flagging pages unused where we typically did not before. Until now,
> having CMO enabled was the only way to do this, and I don't think that
> feature is used all that much?
>
> Posting this as RFC to see if there are any major concerns.
> ---
>  arch/powerpc/platforms/pseries/lpar.c | 21 +++--
>  1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
> b/arch/powerpc/platforms/pseries/lpar.c
> index 4561667832ed..b264371d8e12 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1772,17 +1773,25 @@ static void pSeries_set_page_state(struct page *page, 
> int order,
>   }
>  }
>  
> +static bool migratable_partition;
> +
>  void arch_free_page(struct page *page, int order)
>  {
> - if (radix_enabled())
> - return;
> - if (!cmo_free_hint_flag || !firmware_has_feature(FW_FEATURE_CMO))
> - return;
> -
> - pSeries_set_page_state(page, order, H_PAGE_SET_UNUSED);
> + if (migratable_partition ||
> + (firmware_has_feature(FW_FEATURE_CMO) && cmo_free_hint_flag))
> + pSeries_set_page_state(page, order, H_PAGE_SET_UNUSED);
>  }
>  EXPORT_SYMBOL(arch_free_page);
>  
> +static int __init check_migratable_partition(void)
> +{
> + struct device_node *root = of_find_node_by_path("/");
> + migratable_partition = !!of_find_property(root, 
> "ibm,migratable-partition", NULL);
> + of_node_put(root);
> + return 0;
> +}
> +machine_device_initcall(pseries, check_migratable_partition);
> +
>  #endif /* CONFIG_PPC_SMLPAR */
>  #endif /* CONFIG_PPC_BOOK3S_64 */
>  
>


Re: [PATCH 01/13] kmsan: Export kmsan_handle_dma

2024-02-19 Thread Christophe Leroy


Le 14/12/2023 à 06:55, Nicholas Miehlbradt a écrit :
> kmsan_handle_dma is required by virtio drivers. Export kmsan_handle_dma
> so that the drivers can be compiled as modules.
> 
> Signed-off-by: Nicholas Miehlbradt 
> ---
>   mm/kmsan/hooks.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
> index 7a30274b893c..3532d9275ca5 100644
> --- a/mm/kmsan/hooks.c
> +++ b/mm/kmsan/hooks.c
> @@ -358,6 +358,7 @@ void kmsan_handle_dma(struct page *page, size_t offset, 
> size_t size,
>   size -= to_go;
>   }
>   }
> +EXPORT_SYMBOL(kmsan_handle_dma);

virtio is GPL and all exports inside virtio are EXPORT_SYMBOL_GPL().
Should this one be _GPL as well ?

>   
>   void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
>enum dma_data_direction dir)


[PATCHv8 1/2] watchdog/softlockup: low-overhead detection of interrupt

2024-02-19 Thread Bitao Hu
The following softlockup is caused by interrupt storm, but it cannot be
identified from the call tree. Because the call tree is just a snapshot
and doesn't fully capture the behavior of the CPU during the soft lockup.
  watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
  ...
  Call trace:
__do_softirq+0xa0/0x37c
__irq_exit_rcu+0x108/0x140
irq_exit+0x14/0x20
__handle_domain_irq+0x84/0xe0
gic_handle_irq+0x80/0x108
el0_irq_naked+0x50/0x58

Therefore,I think it is necessary to report CPU utilization during the
softlockup_thresh period (report once every sample_period, for a total
of 5 reportings), like this:
  watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
  CPU#28 Utilization every 4s during lockup:
#1: 0% system, 0% softirq, 100% hardirq, 0% idle
#2: 0% system, 0% softirq, 100% hardirq, 0% idle
#3: 0% system, 0% softirq, 100% hardirq, 0% idle
#4: 0% system, 0% softirq, 100% hardirq, 0% idle
#5: 0% system, 0% softirq, 100% hardirq, 0% idle
  ...

This would be helpful in determining whether an interrupt storm has
occurred or in identifying the cause of the softlockup. The criteria for
determination are as follows:
  a. If the hardirq utilization is high, then interrupt storm should be
  considered and the root cause cannot be determined from the call tree.
  b. If the softirq utilization is high, then we could analyze the call
  tree but it may cannot reflect the root cause.
  c. If the system utilization is high, then we could analyze the root
  cause from the call tree.

The mechanism requires a considerable amount of global storage space
when configured for the maximum number of CPUs. Therefore, adding a
SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob that defaults to "yes"
if the max number of CPUs is <= 128.

Signed-off-by: Bitao Hu 
Reviewed-by: Douglas Anderson 
Reviewed-by: Liu Song 
---
 kernel/watchdog.c | 98 ++-
 lib/Kconfig.debug | 13 +++
 2 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 81a8862295d6..69e72d7e461d 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -16,6 +16,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -35,6 +37,8 @@ static DEFINE_MUTEX(watchdog_mutex);
 # define WATCHDOG_HARDLOCKUP_DEFAULT   0
 #endif
 
+#define NUM_SAMPLE_PERIODS 5
+
 unsigned long __read_mostly watchdog_enabled;
 int __read_mostly watchdog_user_enabled = 1;
 static int __read_mostly watchdog_hardlockup_user_enabled = 
WATCHDOG_HARDLOCKUP_DEFAULT;
@@ -333,6 +337,95 @@ __setup("watchdog_thresh=", watchdog_thresh_setup);
 
 static void __lockup_detector_cleanup(void);
 
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR_INTR_STORM
+enum stats_per_group {
+   STATS_SYSTEM,
+   STATS_SOFTIRQ,
+   STATS_HARDIRQ,
+   STATS_IDLE,
+   NUM_STATS_PER_GROUP,
+};
+
+static const enum cpu_usage_stat tracked_stats[NUM_STATS_PER_GROUP] = {
+   CPUTIME_SYSTEM,
+   CPUTIME_SOFTIRQ,
+   CPUTIME_IRQ,
+   CPUTIME_IDLE,
+};
+
+static DEFINE_PER_CPU(u16, cpustat_old[NUM_STATS_PER_GROUP]);
+static DEFINE_PER_CPU(u8, 
cpustat_util[NUM_SAMPLE_PERIODS][NUM_STATS_PER_GROUP]);
+static DEFINE_PER_CPU(u8, cpustat_tail);
+
+/*
+ * We don't need nanosecond resolution. A granularity of 16ms is
+ * sufficient for our precision, allowing us to use u16 to store
+ * cpustats, which will roll over roughly every ~1000 seconds.
+ * 2^24 ~= 16 * 10^6
+ */
+static u16 get_16bit_precision(u64 data_ns)
+{
+   return data_ns >> 24LL; /* 2^24ns ~= 16.8ms */
+}
+
+static void update_cpustat(void)
+{
+   int i;
+   u8 util;
+   u16 old_stat, new_stat;
+   struct kernel_cpustat kcpustat;
+   u64 *cpustat = kcpustat.cpustat;
+   u8 tail = __this_cpu_read(cpustat_tail);
+   u16 sample_period_16 = get_16bit_precision(sample_period);
+
+   kcpustat_cpu_fetch(&kcpustat, smp_processor_id());
+
+   for (i = 0; i < NUM_STATS_PER_GROUP; i++) {
+   old_stat = __this_cpu_read(cpustat_old[i]);
+   new_stat = get_16bit_precision(cpustat[tracked_stats[i]]);
+   util = DIV_ROUND_UP(100 * (new_stat - old_stat), 
sample_period_16);
+   __this_cpu_write(cpustat_util[tail][i], util);
+   __this_cpu_write(cpustat_old[i], new_stat);
+   }
+
+   __this_cpu_write(cpustat_tail, (tail + 1) % NUM_SAMPLE_PERIODS);
+}
+
+static void print_cpustat(void)
+{
+   int i, group;
+   u8 tail = __this_cpu_read(cpustat_tail);
+   u64 sample_period_second = sample_period;
+
+   do_div(sample_period_second, NSEC_PER_SEC);
+
+   /*
+* We do not want the "watchdog: " prefix on every line,
+* hence we use "printk" instead of "pr_crit".
+*/
+   printk(KERN_CRIT "CPU#%d Utilization every %llus during lockup:\n",
+  smp_processor_id(), sample_period_second);
+
+   for (i = 0; i < NUM

[PATCHv8 0/2] *** Detect interrupt storm in softlockup ***

2024-02-19 Thread Bitao Hu
Hi, guys.
I have implemented a low-overhead method for detecting interrupt
storm in softlockup. Please review it, all comments are welcome.

Changes from v7 to v8:

- From Thomas Gleixner, implement statistics within the interrupt
core code and provide sensible interfaces for the watchdog code. 

- Patch #1 remains unchanged. Patch #2 has significant changes
based on Thomas's suggestions, which is why I have removed
Liu Song and Douglas's Reviewed-by from patch #2. Please review
it again, and all comments are welcome.

Changes from v6 to v7:

- Remove "READ_ONCE" in "start_counting_irqs"

- Replace the hard-coded 5 with "NUM_SAMPLE_PERIODS" macro in
"set_sample_period".

- Add empty lines to help with reading the code.

- Remove the branch that processes IRQs where "counts_diff = 0".

- Add the Reviewed-by of Liu Song and Douglas.

Changes from v5 to v6:

- Use "./scripts/checkpatch.pl --strict" to get a few extra
style nits and fix them.

- Squash patch #3 into patch #1, and wrapp the help text to
80 columns.

- Sort existing headers alphabetically in watchdog.c

- Drop "softlockup_hardirq_cpus", just read "hardirq_counts"
and see if it's non-NULL.

- Store "nr_irqs" in a local variable.

- Simplify the calculation of "cpu_diff".

Changes from v4 to v5:

- Rearranging variable placement to make code look neater.

Changes from v3 to v4:

- Renaming some variable and function names to make the code logic
more readable.

- Change the code location to avoid predeclaring.

- Just swap rather than a double loop in tabulate_irq_count.

- Since nr_irqs has the potential to grow at runtime, bounds-check
logic has been implemented.

- Add SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob.

Changes from v2 to v3:

- From Liu Song, using enum instead of macro for cpu_stats, shortening
the name 'idx_to_stat' to 'stats', adding 'get_16bit_precesion' instead
of using right shift operations, and using 'struct irq_counts'.

- From kernel robot test, using '__this_cpu_read' and '__this_cpu_write'
instead of accessing to an per-cpu array directly, in order to avoid
this warning.
'sparse: incorrect type in initializer (different modifiers)'

Changes from v1 to v2:

- From Douglas, optimize the memory of cpustats. With the maximum number
of CPUs, that's now this.
2 * 8192 * 4 + 1 * 8192 * 5 * 4 + 1 * 8192 = 237,568 bytes.

- From Liu Song, refactor the code format and add necessary comments.

- From Douglas, use interrupt counts instead of interrupt time to
determine the cause of softlockup.

- Remove the cmdline parameter added in PATCHv1.


Bitao Hu (2):
  watchdog/softlockup: low-overhead detection of interrupt
  watchdog/softlockup: report the most frequent interrupts

 arch/mips/dec/setup.c|   2 +-
 arch/parisc/kernel/smp.c |   2 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c |   2 +-
 include/linux/irqdesc.h  |   9 +-
 include/linux/kernel_stat.h  |   4 +
 kernel/irq/internals.h   |   2 +-
 kernel/irq/irqdesc.c |  34 -
 kernel/irq/proc.c|   9 +-
 kernel/watchdog.c| 213 ++-
 lib/Kconfig.debug|  13 ++
 scripts/gdb/linux/interrupts.py  |   6 +-
 11 files changed, 269 insertions(+), 27 deletions(-)

-- 
2.37.1 (Apple Git-137.1)



[PATCHv8 2/2] watchdog/softlockup: report the most frequent interrupts

2024-02-19 Thread Bitao Hu
When the watchdog determines that the current soft lockup is due
to an interrupt storm based on CPU utilization, reporting the
most frequent interrupts could be good enough for further
troubleshooting.

Below is an example of interrupt storm. The call tree does not
provide useful information, but we can analyze which interrupt
caused the soft lockup by comparing the counts of interrupts.

[ 2987.488075] watchdog: BUG: soft lockup - CPU#9 stuck for 23s! 
[kworker/9:1:214]
[ 2987.488607] CPU#9 Utilization every 4s during lockup:
[ 2987.488941]  #1:   0% system,  0% softirq,   100% hardirq, 0% 
idle
[ 2987.489357]  #2:   0% system,  0% softirq,   100% hardirq, 0% 
idle
[ 2987.489771]  #3:   0% system,  0% softirq,   100% hardirq, 0% 
idle
[ 2987.490186]  #4:   0% system,  0% softirq,   100% hardirq, 0% 
idle
[ 2987.490601]  #5:   0% system,  0% softirq,   100% hardirq, 0% 
idle
[ 2987.491034] CPU#9 Detect HardIRQ Time exceeds 50%. Most frequent HardIRQs:
[ 2987.491493]  #1: 330985  irq#7
[ 2987.491743]  #2: 5000irq#10
[ 2987.492039]  #3: 9   irq#91
[ 2987.492318]  #4: 3   irq#118
...
[ 2987.492728] Call trace:
[ 2987.492729]  __do_softirq+0xa8/0x364

Signed-off-by: Bitao Hu 
---
 arch/mips/dec/setup.c|   2 +-
 arch/parisc/kernel/smp.c |   2 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c |   2 +-
 include/linux/irqdesc.h  |   9 ++-
 include/linux/kernel_stat.h  |   4 +
 kernel/irq/internals.h   |   2 +-
 kernel/irq/irqdesc.c |  34 ++--
 kernel/irq/proc.c|   9 +--
 kernel/watchdog.c| 115 ++-
 scripts/gdb/linux/interrupts.py  |   6 +-
 10 files changed, 159 insertions(+), 26 deletions(-)

diff --git a/arch/mips/dec/setup.c b/arch/mips/dec/setup.c
index 6c3704f51d0d..87f0a1436bf9 100644
--- a/arch/mips/dec/setup.c
+++ b/arch/mips/dec/setup.c
@@ -756,7 +756,7 @@ void __init arch_init_irq(void)
NULL))
pr_err("Failed to register fpu interrupt\n");
desc_fpu = irq_to_desc(irq_fpu);
-   fpu_kstat_irq = this_cpu_ptr(desc_fpu->kstat_irqs);
+   fpu_kstat_irq = this_cpu_ptr(&desc_fpu->kstat_irqs->cnt);
}
if (dec_interrupt[DEC_IRQ_CASCADE] >= 0) {
if (request_irq(dec_interrupt[DEC_IRQ_CASCADE], no_action,
diff --git a/arch/parisc/kernel/smp.c b/arch/parisc/kernel/smp.c
index 444154271f23..800eb64e91ad 100644
--- a/arch/parisc/kernel/smp.c
+++ b/arch/parisc/kernel/smp.c
@@ -344,7 +344,7 @@ static int smp_boot_one_cpu(int cpuid, struct task_struct 
*idle)
struct irq_desc *desc = irq_to_desc(i);
 
if (desc && desc->kstat_irqs)
-   *per_cpu_ptr(desc->kstat_irqs, cpuid) = 0;
+   *per_cpu_ptr(desc->kstat_irqs, cpuid) = (struct 
irqstat) { };
}
 #endif
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index e42984878503..f2636414d82a 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -837,7 +837,7 @@ static inline void this_cpu_inc_rm(unsigned int __percpu 
*addr)
  */
 static void kvmppc_rm_handle_irq_desc(struct irq_desc *desc)
 {
-   this_cpu_inc_rm(desc->kstat_irqs);
+   this_cpu_inc_rm(&desc->kstat_irqs->cnt);
__this_cpu_inc(kstat.irqs_sum);
 }
 
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index d9451d456a73..2912b1998670 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -17,6 +17,11 @@ struct irq_desc;
 struct irq_domain;
 struct pt_regs;
 
+struct irqstat {
+   unsigned intcnt;
+   unsigned intref;
+};
+
 /**
  * struct irq_desc - interrupt descriptor
  * @irq_common_data:   per irq and chip data passed down to chip functions
@@ -55,7 +60,7 @@ struct pt_regs;
 struct irq_desc {
struct irq_common_data  irq_common_data;
struct irq_data irq_data;
-   unsigned int __percpu   *kstat_irqs;
+   struct irqstat __percpu *kstat_irqs;
irq_flow_handler_t  handle_irq;
struct irqaction*action;/* IRQ action list */
unsigned intstatus_use_accessors;
@@ -119,7 +124,7 @@ extern struct irq_desc irq_desc[NR_IRQS];
 static inline unsigned int irq_desc_kstat_cpu(struct irq_desc *desc,
  unsigned int cpu)
 {
-   return desc->kstat_irqs ? *per_cpu_ptr(desc->kstat_irqs, cpu) : 0;
+   return desc->kstat_irqs ? per_cpu(desc->kstat_irqs->cnt, cpu) : 0;
 }
 
 static inline struct irq_desc *irq_data_to_desc(struct irq_data *data)
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 9935f7ecbfb9..9cbb1361f957 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -79,6 +79,10 @@ static 

Re: [RESEND PATCH net v4 1/2] soc: fsl: qbman: Always disable interrupts when taking cgr_lock

2024-02-19 Thread Vladimir Oltean
Hi Sean,

On Thu, Feb 15, 2024 at 11:23:26AM -0500, Sean Anderson wrote:
> smp_call_function_single disables IRQs when executing the callback. To
> prevent deadlocks, we must disable IRQs when taking cgr_lock elsewhere.
> This is already done by qman_update_cgr and qman_delete_cgr; fix the
> other lockers.
> 
> Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()")
> CC: sta...@vger.kernel.org
> Signed-off-by: Sean Anderson 
> Reviewed-by: Camelia Groza 
> Tested-by: Vladimir Oltean 
> ---
> I got no response the first time I sent this, so I am resending to net.
> This issue was introduced in a series which went through net, so I hope
> it makes sense to take it via net.
> 
> [1] 
> https://lore.kernel.org/linux-arm-kernel/20240108161904.2865093-1-sean.ander...@seco.com/
> 
> (no changes since v3)
> 
> Changes in v3:
> - Change blamed commit to something more appropriate
> 
> Changes in v2:
> - Fix one additional call to spin_unlock

Leo Li (Li Yang) is no longer with NXP. Until we figure out within NXP
how to continue with the maintainership of drivers/soc/fsl/, yes, please
continue to submit this series to 'net'. I would also like to point
out to Arnd that this is the case.

Arnd, a large portion of drivers/soc/fsl/ is networking-related
(dpio, qbman). Would it make sense to transfer the maintainership
of these under the respective networking drivers, to simplify the
procedures?

Also, your patches are whitespace-damaged. They do not apply to the
kernel, and patchwork shows this as well.
https://patchwork.kernel.org/project/netdevbpf/patch/20240215162327.3663092-1-sean.ander...@seco.com/

Please repost with this fixed.


Re: [PATCH v6 12/18] arm64/mm: Wire up PTE_CONT for user mappings

2024-02-19 Thread Catalin Marinas
On Fri, Feb 16, 2024 at 12:53:43PM +, Ryan Roberts wrote:
> On 16/02/2024 12:25, Catalin Marinas wrote:
> > On Thu, Feb 15, 2024 at 10:31:59AM +, Ryan Roberts wrote:
> >> +pte_t contpte_ptep_get_lockless(pte_t *orig_ptep)
> >> +{
> >> +  /*
> >> +   * Gather access/dirty bits, which may be populated in any of the ptes
> >> +   * of the contig range. We may not be holding the PTL, so any contiguous
> >> +   * range may be unfolded/modified/refolded under our feet. Therefore we
> >> +   * ensure we read a _consistent_ contpte range by checking that all ptes
> >> +   * in the range are valid and have CONT_PTE set, that all pfns are
> >> +   * contiguous and that all pgprots are the same (ignoring access/dirty).
> >> +   * If we find a pte that is not consistent, then we must be racing with
> >> +   * an update so start again. If the target pte does not have CONT_PTE
> >> +   * set then that is considered consistent on its own because it is not
> >> +   * part of a contpte range.
> >> +*/
[...]
> > After writing the comments above, I think I figured out that the whole
> > point of this loop is to check that the ptes in the contig range are
> > still consistent and the only variation allowed is the dirty/young
> > state to be passed to the orig_pte returned. The original pte may have
> > been updated by the time this loop finishes but I don't think it
> > matters, it wouldn't be any different than reading a single pte and
> > returning it while it is being updated.
> 
> Correct. The pte can be updated at any time, before after or during the reads.
> That was always the case. But now we have to cope with a whole contpte block
> being repainted while we are reading it. So we are just checking to make sure
> that all the ptes that we read from the contpte block are consistent with
> eachother and therefore we can trust that the access/dirty bits we gathered 
> are
> consistent.

I've been thinking a bit more about this - do any of the callers of
ptep_get_lockless() check the dirty/access bits? The only one that seems
to care is ptdump but in that case I'd rather see the raw bits for
debugging rather than propagating the dirty/access bits to the rest in
the contig range.

So with some clearer documentation on the requirements, I think we don't
need an arm64-specific ptep_get_lockless() (unless I missed something).

-- 
Catalin


Re: [PATCH v12 08/15] media: uapi: Define audio sample format fourcc type

2024-02-19 Thread Mauro Carvalho Chehab
Em Mon, 19 Feb 2024 12:05:02 +0800
Shengjiu Wang  escreveu:

> Hi Mauro
> 
> On Sat, Feb 17, 2024 at 5:19 PM Mauro Carvalho Chehab
>  wrote:
> >
> > Em Thu, 18 Jan 2024 20:32:01 +0800
> > Shengjiu Wang  escreveu:
> >  
> > > The audio sample format definition is from alsa,
> > > the header file is include/uapi/sound/asound.h, but
> > > don't include this header file directly, because in
> > > user space, there is another copy in alsa-lib.
> > > There will be conflict in userspace for include
> > > videodev2.h & asound.h and asoundlib.h
> > >
> > > Here still use the fourcc format.
> > >
> > > Signed-off-by: Shengjiu Wang 
> > > ---
> > >  .../userspace-api/media/v4l/pixfmt-audio.rst  | 87 +++
> > >  .../userspace-api/media/v4l/pixfmt.rst|  1 +
> > >  drivers/media/v4l2-core/v4l2-ioctl.c  | 13 +++
> > >  include/uapi/linux/videodev2.h| 23 +
> > >  4 files changed, 124 insertions(+)
> > >  create mode 100644 Documentation/userspace-api/media/v4l/pixfmt-audio.rst
> > >
> > > diff --git a/Documentation/userspace-api/media/v4l/pixfmt-audio.rst 
> > > b/Documentation/userspace-api/media/v4l/pixfmt-audio.rst
> > > new file mode 100644
> > > index ..04b4a7fbd8f4
> > > --- /dev/null
> > > +++ b/Documentation/userspace-api/media/v4l/pixfmt-audio.rst
> > > @@ -0,0 +1,87 @@
> > > +.. SPDX-License-Identifier: GFDL-1.1-no-invariants-or-later
> > > +
> > > +.. _pixfmt-audio:
> > > +
> > > +*
> > > +Audio Formats
> > > +*
> > > +
> > > +These formats are used for :ref:`audiomem2mem` interface only.
> > > +
> > > +.. tabularcolumns:: |p{5.8cm}|p{1.2cm}|p{10.3cm}|
> > > +
> > > +.. cssclass:: longtable
> > > +
> > > +.. flat-table:: Audio Format
> > > +:header-rows:  1
> > > +:stub-columns: 0
> > > +:widths:   3 1 4
> > > +
> > > +* - Identifier
> > > +  - Code
> > > +  - Details
> > > +* .. _V4L2-AUDIO-FMT-S8:
> > > +
> > > +  - ``V4L2_AUDIO_FMT_S8``
> > > +  - 'S8'
> > > +  - Corresponds to SNDRV_PCM_FORMAT_S8 in ALSA
> > > +* .. _V4L2-AUDIO-FMT-S16-LE:  
> >
> > Hmm... why can't we just use SNDRV_*_FORMAT_*? Those are already part of
> > an uAPI header. No need to add any abstraction here and/or redefine
> > what is there already at include/uapi/sound/asound.h.
> >  
> Actually I try to avoid including the include/uapi/sound/asound.h.
> Because in user space, there is another copy in alsa-lib (asoundlib.h).
> There will be conflict in userspace when including videodev2.h and
> asoundlib.h.

Well, alsasoundlib.h seems to be using the same definitions:
https://github.com/michaelwu/alsa-lib/blob/master/include/pcm.h

So, I can't see what would be the actual issue, as both userspace library
and ALSA internal headers use the same magic numbers.

You can still do things like:

#ifdef __KERNEL__
#  include 
#else
#  include 
#endif

To avoid such kind of conflicts, if you need to have it included on
some header file. Yet, I can't see why you would need that.

IMO, at uAPI headers, you just need to declare the uAPI audiofmt field
to be either __u32 or __u64, pointing it to where this value comes from
(on both userspace and Kernelspace. E. g.:

/**
 * struct v4l2_audio_format - audio data format definition
 * @audioformat:
 *  an integer number matching the fields inside
 *  enum snd_pcm_format_t (e. g. `SNDRV_PCM_FORMAT_*`), as defined
 *  in include/uapi/sound/asound.h and
 *  
https://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html#gaa14b7f26877a812acbb39811364177f8.
 * @channels:   channel numbers
 * @buffersize: maximum size in bytes required for data
 */
struct v4l2_audio_format {
__u32   audioformat;
__u32   channels;
__u32   buffersize;
} __attribute__ ((packed));

Then, at documentation you just need to point to where the
possible values for SNDRV_PCM_FORMAT_ are defined. No need to
document them one by one.

With such definition, you'll only need to include sound/asound.h
within the kAPI scope.

> 
> And in the V4l framework, the fourcc type is commonly used in other
> cases (video, radio, touch, meta), to avoid changing common code
> a lot, so I think using fourcc definition for audio may be simpler.

Those are real video streams (or a video-related streams, in the case
of metadata) where fourcc is widely used. There, it makes sense.
However, ALSA format definitions are already being used for a long time.
There's no sense on trying to reinvent it - or having an abstract layer
to convert from/to fourcc <==> enum snd_pcm_format_t. Just use what is
there already.

Thanks,
Mauro


Re: [PATCH v12 10/15] media: uapi: Add audio rate controls support

2024-02-19 Thread Mauro Carvalho Chehab
Em Mon, 19 Feb 2024 14:03:37 +0800
Shengjiu Wang  escreveu:

> On Sat, Feb 17, 2024 at 5:57 PM Mauro Carvalho Chehab
>  wrote:
> >
> > Em Thu, 18 Jan 2024 20:32:03 +0800
> > Shengjiu Wang  escreveu:
> >  
> > > Add V4L2_CID_M2M_AUDIO_SOURCE_RATE and V4L2_CID_M2M_AUDIO_DEST_RATE
> > > new IDs for rate control.
> > >
> > > Add V4L2_CID_M2M_AUDIO_SOURCE_RATE_OFFSET and
> > > V4L2_CID_M2M_AUDIO_DEST_RATE_OFFSET for clock drift.
> > >
> > > Signed-off-by: Shengjiu Wang 
> > > ---
> > >  .../media/v4l/ext-ctrls-audio-m2m.rst | 20 +++
> > >  drivers/media/v4l2-core/v4l2-ctrls-defs.c |  6 ++
> > >  include/uapi/linux/v4l2-controls.h|  5 +
> > >  3 files changed, 31 insertions(+)
> > >
> > > diff --git 
> > > a/Documentation/userspace-api/media/v4l/ext-ctrls-audio-m2m.rst 
> > > b/Documentation/userspace-api/media/v4l/ext-ctrls-audio-m2m.rst
> > > index 82d2ecedbfee..de579ab8fb94 100644
> > > --- a/Documentation/userspace-api/media/v4l/ext-ctrls-audio-m2m.rst
> > > +++ b/Documentation/userspace-api/media/v4l/ext-ctrls-audio-m2m.rst
> > > @@ -19,3 +19,23 @@ Audio M2M Control IDs
> > >  The Audio M2M class descriptor. Calling
> > >  :ref:`VIDIOC_QUERYCTRL` for this control will
> > >  return a description of this control class.
> > > +
> > > +.. _v4l2-audio-asrc:
> > > +
> > > +``V4L2_CID_M2M_AUDIO_SOURCE_RATE (integer menu)``
> > > +Sets the audio source sample rate, unit is Hz
> > > +
> > > +``V4L2_CID_M2M_AUDIO_DEST_RATE (integer menu)``
> > > +Sets the audio destination sample rate, unit is Hz
> > > +
> > > +``V4L2_CID_M2M_AUDIO_SOURCE_RATE_OFFSET (fixed point)``
> > > +Sets the offset from the audio source sample rate, unit is Hz.
> > > +The offset compensates for any clock drift. The actual source audio
> > > +sample rate is the ideal source audio sample rate from
> > > +``V4L2_CID_M2M_AUDIO_SOURCE_RATE`` plus this fixed point offset.
> > > +
> > > +``V4L2_CID_M2M_AUDIO_DEST_RATE_OFFSET (fixed point)``
> > > +Sets the offset from the audio destination sample rate, unit is Hz.
> > > +The offset compensates for any clock drift. The actual destination 
> > > audio
> > > +sample rate is the ideal source audio sample rate from
> > > +``V4L2_CID_M2M_AUDIO_DEST_RATE`` plus this fixed point offset.  
> >
> > Hmm... first of all, controls on V4L2 API can either be get or set.
> > So, starting the sentence with "Set" sounds an assumption that may
> > be wrong.  
> 
> Ok, will update the description.
> >
> > Also, I would explain a little bit more about the frequency offset values,
> > as clock drift adjustment on PCM streams is something that can be done
> > using different approaches.
> >
> > I'm assuming that what you wanted here is to use it to check if the
> > video and audio clocks have some drift, and reducing or increasing
> > the audio sample rate dynamically to ensure that such drift will
> > stay constraint to a maximum allowed drift measured in mili or nano
> > seconds. So, userspace would be expected to be monitoring such drift
> > and increasing/decreasing the sample frequency as needed to maintain
> > such constraint.
> >
> > Is that the way such uAPI is expected to work?  
> 
> Yes. Userspace should monitor the drift, get the offset based on the
> common sample rate (8k, 44.1k, 48k...) then send it to the driver.
> The offset is a fixed point. it is base on the patch:
> https://patchwork.kernel.org/project/linux-media/patch/cec82507-ced9-4e7d-802c-04a40f84a...@xs4all.nl/

Ok, so write a description about how this should be used (like the
test I written),to help userspace developers to better understand
how this uAPI should be used.

See, one of the main goals of the uAPI documentation is to ensure that
userspace programs will implement the uAPI bits the right way, behaving
the right way when using it. That's why we even have some userspace code 
examples for some ioctls. IMO frequence drift is one of such cases
where more explanation is needed.


> 
> Best regards
> Shengjiu Wang
> >  
> > > diff --git a/drivers/media/v4l2-core/v4l2-ctrls-defs.c 
> > > b/drivers/media/v4l2-core/v4l2-ctrls-defs.c
> > > index 2a85ea3dc92f..91e1f5348c23 100644
> > > --- a/drivers/media/v4l2-core/v4l2-ctrls-defs.c
> > > +++ b/drivers/media/v4l2-core/v4l2-ctrls-defs.c
> > > @@ -1245,6 +1245,8 @@ const char *v4l2_ctrl_get_name(u32 id)
> > >
> > >   /* Audio M2M controls */
> > >   case V4L2_CID_M2M_AUDIO_CLASS:  return "Audio M2M Controls";
> > > + case V4L2_CID_M2M_AUDIO_SOURCE_RATE:return "Audio Source Sample 
> > > Rate";
> > > + case V4L2_CID_M2M_AUDIO_DEST_RATE:  return "Audio Destination 
> > > Sample Rate";
> > >   default:
> > >   return NULL;
> > >   }
> > > @@ -1606,6 +1608,10 @@ void v4l2_ctrl_fill(u32 id, const char **name, 
> > > enum v4l2_ctrl_type *type,
> > >   case V4L2_CID_COLORIMETRY_HDR10_MASTERING_DISPLAY:
> > >   *type = V4L2_CTRL_TYPE_HDR10_MASTERIN

Re: [kvm-unit-tests PATCH v4 8/8] migration: add a migration selftest

2024-02-19 Thread Nicholas Piggin
On Mon Feb 19, 2024 at 4:56 PM AEST, Thomas Huth wrote:
> On 17/02/2024 08.19, Nicholas Piggin wrote:
> > On Fri Feb 16, 2024 at 9:15 PM AEST, Thomas Huth wrote:
> >> On 09/02/2024 10.11, Nicholas Piggin wrote:
> >>> Add a selftest for migration support in  guest library and test harness
> >>> code. It performs migrations in a tight loop to irritate races and bugs
> >>> in the test harness code.
> >>>
> >>> Include the test in arm, s390, powerpc.
> >>>
> >>> Acked-by: Claudio Imbrenda  (s390x)
> >>> Reviewed-by: Thomas Huth 
> >>> Signed-off-by: Nicholas Piggin 
> >>> ---
> >>>arm/Makefile.common  |  1 +
> >>>arm/selftest-migration.c |  1 +
> >>>arm/unittests.cfg|  6 ++
> >>
> >>Hi Nicholas,
> >>
> >> I just gave the patches a try, but the arm test seems to fail for me: Only
> >> the first getchar() seems to wait for a character, all the subsequent ones
> >> don't wait anymore and just continue immediately ... is this working for
> >> you? Or do I need another patch on top?
> > 
> > Hey sorry missed this comment
> > 
> > It does seem to work for me, I've mostly tested pseries but I did test
> > others too (that's how I saw the arm getchar limit).
> > 
> > How are you observing it not waiting for migration?
>
> According to you other mail, I think you figured it out already, but just 
> for the records: You can see it when running the guest manually, e.g. 
> something like:
>
>   qemu-system-aarch64 -nodefaults -machine virt -accel tcg -cpu cortex-a57 \
> -device virtio-serial-device -device virtconsole,chardev=ctd \
> -chardev testdev,id=ctd -device pci-testdev -display none \
> -serial mon:stdio -kernel arm/selftest-migration.flat -smp 1
>
> Without my "lib/arm/io: Fix calling getchar() multiple times" patch, the 
> guest only waits during the first getchar(), all the others simply return 
> immediately.

Yeah I got it -- I re-ran it on arm and it is obvious since you told
me it's not waiting. At the time I tested I thought it was just arm
migrating really fast :D

Thanks,
Nick


[PATCH next] bus: fsl-mc: constify the struct device_type usage

2024-02-19 Thread Ricardo B. Marliere
_type fsl_mc_bus_dprc_type;
+extern const struct device_type fsl_mc_bus_dpni_type;
+extern const struct device_type fsl_mc_bus_dpio_type;
+extern const struct device_type fsl_mc_bus_dpsw_type;
+extern const struct device_type fsl_mc_bus_dpbp_type;
+extern const struct device_type fsl_mc_bus_dpcon_type;
+extern const struct device_type fsl_mc_bus_dpmcp_type;
+extern const struct device_type fsl_mc_bus_dpmac_type;
+extern const struct device_type fsl_mc_bus_dprtc_type;
+extern const struct device_type fsl_mc_bus_dpseci_type;
+extern const struct device_type fsl_mc_bus_dpdmux_type;
+extern const struct device_type fsl_mc_bus_dpdcei_type;
+extern const struct device_type fsl_mc_bus_dpaiop_type;
+extern const struct device_type fsl_mc_bus_dpci_type;
+extern const struct device_type fsl_mc_bus_dpdmai_type;
 
 static inline bool is_fsl_mc_bus_dprc(const struct fsl_mc_device *mc_dev)
 {

---
base-commit: b401b621758e46812da61fa58a67c3fd8d91de0d
change-id: 20240219-device_cleanup-fsl-mc-b2d8f7a5308d

Best regards,
-- 
Ricardo B. Marliere 



Re: [PATCH v7 1/3] powerpc: make fadump resilient with memory add/remove events

2024-02-19 Thread Sourabh Jain

Hello Hari,

On 23/01/24 15:39, Hari Bathini wrote:



On 11/01/24 7:39 pm, Sourabh Jain wrote:

Due to changes in memory resources caused by either memory hotplug or
online/offline events, the elfcorehdr, which describes the CPUs and
memory of the crashed kernel to the kernel that collects the dump (known
as second/fadump kernel), becomes outdated. Consequently, attempting
dump collection with an outdated elfcorehdr can lead to failed or
inaccurate dump collection.

Memory hotplug or online/offline events is referred as memory add/remove
events in reset of the commit message.

The current solution to address the aforementioned issue is as follows:
Monitor memory add/remove events in userspace using udev rules, and
re-register fadump whenever there are changes in memory resources. This
leads to the creation of a new elfcorehdr with updated system memory
information.

There are several notable issues associated with re-registering fadump
for every memory add/remove events.

1. Bulk memory add/remove events with udev-based fadump re-registration
    can lead to race conditions and, more importantly, it creates a wide
    window during which fadump is inactive until all memory add/remove
    events are settled.
2. Re-registering fadump for every memory add/remove event is
    inefficient.
3. The memory for elfcorehdr is allocated based on the memblock regions
    available during early boot and remains fixed thereafter. 
However, if

    elfcorehdr is later recreated with additional memblock regions, its
    size will increase, potentially leading to memory corruption.

Address the aforementioned challenges by shifting the creation of
elfcorehdr from the first kernel (also referred as the crashed kernel),
where it was created and frequently recreated for every memory
add/remove event, to the fadump kernel. As a result, the elfcorehdr only
needs to be created once, thus eliminating the necessity to re-register
fadump during memory add/remove events.

At present, the first kernel prepares the fadump header and stores it in
the fadump reserved area. The fadump header contains start address of
the elfcorehd, crashing CPU details, etc.  In the event of first kernel


"elfcorehd" used instead of "elfcorehdr" at a couple of places..


Fixed it now. Thanks.




crash, the second/fadump boots and access the fadump header prepared by
first kernel and do the following in a platform-specific function
[rtas|opal]_fadump_process:

At present, the first kernel is responsible for preparing the fadump
header and storing it in the fadump reserved area. The fadump header
includes the start address of the elfcorehd, crashing CPU details, and
other relevant information. In the event of a crash in the first kernel,
the second/fadump boots and accesses the fadump header prepared by the
first kernel. It then performs the following steps in a
platform-specific function [rtas|opal]_fadump_process:

1. Sanity check for fadump header
2. Update CPU notes in elfcorehdr
3. Set the global variable elfcorehdr_addr to the address of the
    fadump header's elfcorehdr. For vmcore module to process it later 
on.


Along with the above, update the setup_fadump()/fadump.c to create
elfcorehdr in second/fadump kernel.

Section below outlines the information required to create the elfcorehdr
and the changes made to make it available to the fadump kernel if it's
not already.

To create elfcorehdr, the following crashed kernel information is
required: CPU notes, vmcoreinfo, and memory ranges.

At present, the CPU notes are already prepared in the fadump kernel, so
no changes are needed in that regard. The fadump kernel has access to
all crashed kernel memory regions, including boot memory regions that
are relocated by firmware to fadump reserved areas, so no changes for
that either. However, it is necessary to add new members to the fadump
header, i.e., the 'fadump_crash_info_header' structure, in order to pass
the crashed kernel's vmcoreinfo address and its size to fadump kernel.

In addition to the vmcoreinfo address and size, there are a few other
attributes also added to the fadump_crash_info_header structure.

1. version:
    It stores the fadump header version, which is currently set to 1.
    This provides flexibility to update the fadump crash info header in
    the future without changing the magic number. For each change in the
    fadump header, the version will be increased. This will help the
    updated kernel determine how to handle kernel dumps from older
    kernels. The magic number remains relevant for checking fadump 
header

    corruption.

2. elfcorehdr_size:
    since elfcorehdr is now prepared in the fadump/second kernel and
    it is not part of the reserved area, this attribute is needed to
    track the memory allocated for elfcorehdr to do the deallocation
    properly.

3. pt_regs_sz/cpu_mask_sz:
    Store size of pt_regs and cpu_mask strucutre in first kernel. These
    attributes are used avoid processing the dump if