Re: [PATCH v6 05/29] x86/apic/vector: Do not allocate vectors for NMIs

2022-05-13 Thread Ricardo Neri
On Fri, May 13, 2022 at 10:50:09PM +0200, Thomas Gleixner wrote:
> On Fri, May 13 2022 at 11:03, Ricardo Neri wrote:
> > On Fri, May 06, 2022 at 11:12:20PM +0200, Thomas Gleixner wrote:
> >> Why would a NMI ever end up in this code? There is no vector management
> >> required and this find cpu exercise is pointless.
> >
> > But even if the NMI has a fixed vector, it is still necessary to determine
> > which CPU will get the NMI. It is still necessary to determine what to
> > write in the Destination ID field of the MSI message.
> >
> > irq_matrix_find_best_cpu() would find the CPU with the lowest number of
> > managed vectors so that the NMI is directed to that CPU. 
> 
> What's the point to send it to the CPU with the lowest number of
> interrupts. It's not that this NMI happens every 50 microseconds.
> We pick one online CPU and are done.

Indeed, that is sensible.

> 
> > In today's code, an NMI would end up here because we rely on the existing
> > interrupt management infrastructure... Unless the check is done at the entry
> > points as you propose.
> 
> Correct. We don't want to call into functions which are not designed for
> NMIs.

Agreed.

>  
> >> > +
> >> > +if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
> >> > +cpu = irq_matrix_find_best_cpu_managed(vector_matrix, dest);
> >> > +apicd->cpu = cpu;
> >> > +vector = 0;
> >> > +goto no_vector;
> >> > +}
> >> 
> >> This code can never be reached for a NMI delivery. If so, then it's a
> >> bug.
> >> 
> >> This all is special purpose for that particular HPET NMI watchdog use
> >> case and we are not exposing this to anything else at all.
> >> 
> >> So why are you sprinkling this NMI nonsense all over the place? Just
> >> because? There are well defined entry points to all of this where this
> >> can be fenced off.
> >
> > I put the NMI checks in these points because assign_vector_locked() and
> > assign_managed_vector() are reached through multiple paths and these are
> > the two places where the allocation of the vector is requested and the
> > destination CPU is determined.
> >
> > I do observe this code being reached for an NMI, but that is because this
> > code still does not know about NMIs... Unless the checks for NMI are put
> > in the entry points as you pointed out.
> >
> > The intent was to refactor the code in a generic manner and not to focus
> > only in the NMI watchdog. That would have looked hacky IMO.
> 
> We don't want to have more of this really. Supporting NMIs on x86 in a
> broader way is simply not reasonable because there is only one NMI
> vector and we have no sensible way to get to the cause of the NMI
> without a massive overhead.
> 
> Even if we get multiple NMI vectors some shiny day, this will be
> fundamentally different than regular interrupts and certainly not
> exposed broadly. There will be 99.99% fixed vectors for simplicity sake.

Understood.

> 
> >> +  if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
> >> +  /*
> >> +   * NMIs have a fixed vector and need their own
> >> +   * interrupt chip so nothing can end up in the
> >> +   * regular local APIC management code except the
> >> +   * MSI message composing callback.
> >> +   */
> >> +  irqd->chip = _nmi_controller;
> >> +  /*
> >> +   * Don't allow affinity setting attempts for NMIs.
> >> +   * This cannot work with the regular affinity
> >> +   * mechanisms and for the intended HPET NMI
> >> +   * watchdog use case it's not required.
> >
> > But we do need the ability to set affinity, right? As stated above, we need
> > to know what Destination ID to write in the MSI message or in the interrupt
> > remapping table entry.
> >
> > It cannot be any CPU because only one specific CPU is supposed to handle the
> > NMI from the HPET channel.
> >
> > We cannot hard-code a CPU for that because it may go offline (and ignore NMIs)
> > or not be part of the monitored CPUs.
> >
> > Also, if lapic_nmi_controller.irq_set_affinity() is NULL, then irq_chips
> > INTEL-IR, AMD-IR, those using msi_domain_set_affinity() need to check for NULL.
> > They currently unconditionally call the parent irq_chip's irq_set_affinity().
> > I see that there is a irq_chip_set_affinity_parent() function. Perhaps it can
> > be used for this check?
> 
> Yes, this lacks obviously a NMI specific set_affinity callback and this
> can be very trivial and does not have any of the complexity of interrupt
> affinity assignment. First online CPU in the mask with a fallback to any
> online CPU.

Why would we need a fallback to any online CPU? Shouldn't it fail if it cannot
find an online CPU in the mask?
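
Just to make sure we are talking about the same shape of callback, below is a
minimal sketch of what I understand you to mean (illustrative only: the
function name, the omitted vector_lock handling and the exact return values
are my assumptions):

static int lapic_nmi_set_affinity(struct irq_data *irqd,
                                  const struct cpumask *dest, bool force)
{
        struct apic_chip_data *apicd = apic_chip_data(irqd);
        unsigned int cpu;

        /* First online CPU in the requested mask... */
        cpu = cpumask_first_and(dest, cpu_online_mask);
        /* ...with a fallback to any online CPU (the branch I would drop). */
        if (cpu >= nr_cpu_ids)
                cpu = cpumask_first(cpu_online_mask);
        if (cpu >= nr_cpu_ids)
                return -ENODEV;

        apicd->cpu = cpu;
        /* NMI delivery uses a fixed vector; only the destination changes. */
        apic_update_irq_cfg(irqd, 0, cpu);
        irq_data_update_effective_affinity(irqd, cpumask_of(cpu));

        return IRQ_SET_MASK_OK;
}

If the fallback is not wanted, the middle branch would simply go away and the
error would be returned instead.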

> 
> I did not claim that this is complete. This was for illustration.

In the reworked patch, 

Re: [PATCH v6 24/29] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog"

2022-05-13 Thread Ricardo Neri
On Tue, May 10, 2022 at 08:46:41PM +1000, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> > Prepare hardlockup_panic_setup() to handle a comma-separated list of
> > options. Thus, it can continue parsing its own command-line options while
> > ignoring parameters that are relevant only to specific implementations of
> > the hardlockup detector. Such implementations may use an early_param to
> > parse their own options.
> 
> It can't really handle comma separated list though, until the next
> patch. nmi_watchdog=panic,0 does not make sense, so you lost error
> handling of that.

Yes, that is true. All possible combinations need to be checked.

> 
> And is it kosher to double handle options like this? I'm sure it
> happens but it's ugly.
> 
> Would you consider just add a new option for x86 and avoid changing
> this? Less code and patches.

Sure, I can leave this code unmodified and add an x86-specific command-line
option instead.
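
For example, something along these lines (purely illustrative; the option
name and the flag variable are placeholders, not necessarily what the next
version will use):

static bool hardlockup_use_hpet __ro_after_init;

static int __init hardlockup_detector_hpet_setup(char *str)
{
        /* e.g. "nmi_watchdog_hpet" on the kernel command line */
        hardlockup_use_hpet = true;
        return 0;
}
early_param("nmi_watchdog_hpet", hardlockup_detector_hpet_setup);

That would leave the existing "nmi_watchdog=" parsing untouched.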

Thanks and BR,
Ricardo


Re: [PATCH v6 20/29] init/main: Delay initialization of the lockup detector after smp_init()

2022-05-13 Thread Ricardo Neri
On Tue, May 10, 2022 at 08:38:22PM +1000, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 9:59 am:
> > Certain implementations of the hardlockup detector require support for
> > Inter-Processor Interrupt shorthands. On x86, support for these can only
> > be determined after all the possible CPUs have booted once (in
> > smp_init()). Other architectures may not need such check.
> > 
> > lockup_detector_init() only performs the initializations of data
> > structures of the lockup detector. Hence, there are no dependencies on
> > smp_init().
> 

Thank you for your feedback Nicholas!

> I think this is the only real thing which affects other watchdog types?

Also patches 18 and 19 that decouple the NMI watchdog functionality from
perf.

> 
> Not sure if it's a big problem, the secondary CPUs coming up won't
> have their watchdog active until quite late, and the primary could
> implement its own timeout in __cpu_up for secondary coming up, and
> IPI it to get traces if necessary which is probably more robust.

Indeed that could work. Another alternative I have been pondering is to boot
the system with the perf-based NMI watchdog enabled. Once all CPUs are up
and running, switch to the HPET-based NMI watchdog and free the PMU counters.

> 
> Acked-by: Nicholas Piggin 

Thank you!

BR,
Ricardo


Re: [PATCH 11/12] powerpc: wiiu: don't enforce flat memory

2022-05-13 Thread Pali Rohár
On Wednesday 02 March 2022 15:44:05 Ash Logan wrote:
> pgtable_32.c:mapin_ram loops over each valid memory range, which means
> non-contiguous memory just works.

Hello! Does this mean that non-contiguous memory works on any 32-bit
powerpc platform, and not only on wiiu? If so, should non-contiguous memory
support not be enabled for all 32-bit ppc boards then?

> Signed-off-by: Ash Logan 
> ---
>  arch/powerpc/mm/init_32.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
> index 3d690be48e84..59a84629d9a0 100644
> --- a/arch/powerpc/mm/init_32.c
> +++ b/arch/powerpc/mm/init_32.c
> @@ -125,10 +125,10 @@ void __init MMU_init(void)
>* lowmem_end_addr is initialized below.
>*/
>   if (memblock.memory.cnt > 1) {
> -#ifndef CONFIG_WII
> +#if !defined(CONFIG_WII) && !defined(CONFIG_WIIU)
>   memblock_enforce_memory_limit(memblock.memory.regions[0].size);
>   pr_warn("Only using first contiguous memory region\n");
> -#else
> +#elif defined(CONFIG_WII)
>   wii_memory_fixups();
>  #endif
>   }
> -- 
> 2.35.1
> 


Re: [PATCH v6 22/29] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector

2022-05-13 Thread Ricardo Neri
On Mon, May 09, 2022 at 04:03:39PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> > +   if (is_hpet_hld_interrupt(hdata)) {
> > +   /*
> > +* Kick the timer first. If the HPET channel is periodic, it
> > +* helps to reduce the delta between the expected TSC value and
> > +* its actual value the next time the HPET channel fires.
> > +*/
> > +   kick_timer(hdata, !(hdata->has_periodic));
> > +
> > +   if (cpumask_weight(hld_data->monitored_cpumask) > 1) {
> > +   /*
> > +* Since we cannot know the source of an NMI, the best
> > +* we can do is to use a flag to indicate to all online
> > +* CPUs that they will get an NMI and that the source of
> > +* that NMI is the hardlockup detector. Offline CPUs
> > +* also receive the NMI but they ignore it.
> > +*
> > +* Even though we are in NMI context, we have concluded
> > +* that the NMI came from the HPET channel assigned to
> > +* the detector, an event that is infrequent and only
> > +* occurs in the handling CPU. There should not be races
> > +* with other NMIs.
> > +*/
> > +   cpumask_copy(hld_data->inspect_cpumask,
> > +cpu_online_mask);
> > +
> > +   /* If we are here, IPI shorthands are enabled. */
> > +   apic->send_IPI_allbutself(NMI_VECTOR);
> 
> So if the monitored cpumask is a subset of online CPUs, which is the
> case when isolation features are enabled, then you still send NMIs to
> those isolated CPUs. I'm sure the isolation folks will be enthused.

Yes, I acknowledged this limitation in the cover letter. I should also update
Documentation/admin-guide/lockup-watchdogs.rst.

This patchset proposes the HPET NMI watchdog as an opt-in feature.

Perhaps the limitation could be mitigated by adding a check for non-housekeeping
and non-monitored CPUs in exc_nmi(). However, that would not eliminate the
problem of isolated CPUs also receiving the NMI.

Thanks and BR,
Ricardo


Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()

2022-05-13 Thread Ricardo Neri
On Fri, May 06, 2022 at 11:51:52PM +0200, Thomas Gleixner wrote:
> On Fri, May 06 2022 at 23:41, Thomas Gleixner wrote:
> > On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> >> Programming an HPET channel as periodic requires setting the
> >> HPET_TN_SETVAL bit in the channel configuration. Plus, the comparator
> >> register must be written twice (once for the comparator value and once for
> >> the periodic value). Since this programming might be needed in several
> >> places (e.g., the HPET clocksource and the HPET-based hardlockup detector),
> >> add a helper function for this purpose.
> >>
> >> A helper function hpet_set_comparator_oneshot() could also be implemented.
> >> However, such function would only program the comparator register and the
> >> function would be quite small. Hence, it is better to not bloat the code
> >> with such an obvious function.
> >
> > This word salad above does not provide a single reason why the periodic
> > programming function is required and better suited for the NMI watchdog
> > case and then goes on and blurbs about why a function which is not
> > required is not implemented. The argument about not bloating the code
> > with an "obvious???" function which is quite small is slightly beyond my
> > comprehension level.
> 
> What's even more uncomprehensible is that the patch which actually sets
> up that NMI watchdog cruft has:
> 
> > +   if (hc->boot_cfg & HPET_TN_PERIODIC_CAP)
> > +   hld_data->has_periodic = true;
> 
> So how the heck does that work with a HPET which does not support
> periodic mode?

If the HPET channel does not support periodic mode (as indicated by the flag
above), the watchdog will read the HPET counter into a local variable, add the
required delta to that local variable, and write the result to the comparator
of the HPET channel.

If the HPET channel does support periodic mode, the watchdog will not kick it
again. It only kicks a periodic HPET channel if needed (e.g., if the NMI
watchdog was idle or watchdog_thresh changed its value).
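
In rough pseudo-C, the kick logic is something like this (a simplified sketch
of kick_timer() in this series, not a verbatim copy; the struct field names
are my shorthand):

static void kick_timer(struct hpet_hld_data *hdata, bool force)
{
        u32 count;

        /* A periodic channel reloads itself; only reprogram when forced. */
        if (hdata->has_periodic && !force)
                return;

        count = hpet_readl(HPET_COUNTER);
        count += hdata->ticks_per_second * watchdog_thresh;

        if (hdata->has_periodic)
                /* Periodic channel: reprogram comparator and period. */
                hpet_set_comparator_periodic(hdata->channel, count,
                                             hdata->ticks_per_second * watchdog_thresh);
        else
                /* One-shot channel: arm the comparator relative to "now". */
                hpet_writel(count, HPET_Tn_CMP(hdata->channel));
}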

> 
> That watchdog muck will still happily invoke that set periodic function
> in the hope that it works by chance?

It will not. It will check hld_data->has_periodic and act accordingly.

FWIW, I have tested this NMI watchdog with periodic and non-periodic HPET
channels.

Thanks and BR,
Ricardo


Re: [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic()

2022-05-13 Thread Ricardo Neri
On Fri, May 06, 2022 at 11:41:13PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> > Programming an HPET channel as periodic requires setting the
> > HPET_TN_SETVAL bit in the channel configuration. Plus, the comparator
> > register must be written twice (once for the comparator value and once for
> > the periodic value). Since this programming might be needed in several
> > places (e.g., the HPET clocksource and the HPET-based hardlockup detector),
> > add a helper function for this purpose.
> >
> > A helper function hpet_set_comparator_oneshot() could also be implemented.
> > However, such function would only program the comparator register and the
> > function would be quite small. Hence, it is better to not bloat the code
> > with such an obvious function.
> 
> This word salad above does not provide a single reason why the periodic
> programming function is required and better suited for the NMI watchdog
> case

The goal of hpet_set_comparator_periodic() is to avoid code duplication. The
functions hpet_clkevt_set_state_periodic() and kick_timer() in the HPET NMI
watchdog need to program a periodic HPET channel when supported.


> and then goes on and blurbs about why a function which is not
> required is not implemented.

I can remove this.

> The argument about not bloating the code
> with an "obvious???" function which is quite small is slightly beyond my
> comprehension level.

That obvious function would look like this:

void hpet_set_comparator_one_shot(int channel, u32 delta)
{
        u32 count;

        count = hpet_readl(HPET_COUNTER);
        count += delta;
        hpet_writel(count, HPET_Tn_CMP(channel));
}

It involves one register read, one addition and one register write. IMO, this
code is sufficiently simple and small to allow duplication.

Programming a periodic HPET channel is not as straightforward, IMO. It involves
handling two different values (period and comparator) written in a specific
sequence, one configuration bit, and one delay. It also involves three register
writes and one register read.
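
For comparison, a sketch of the periodic helper (it follows the sequence
described above rather than quoting the patch verbatim) looks like:

void hpet_set_comparator_periodic(int channel, u32 cmp, u32 period)
{
        u32 cfg;

        /* HPET_TN_SETVAL lets the comparator of a periodic channel be set. */
        cfg = hpet_readl(HPET_Tn_CFG(channel));
        cfg |= HPET_TN_SETVAL;
        hpet_writel(cfg, HPET_Tn_CFG(channel));

        /* The first write sets the comparator... */
        hpet_writel(cmp, HPET_Tn_CMP(channel));

        /* ...and, after a small delay, the second write sets the period. */
        udelay(1);
        hpet_writel(period, HPET_Tn_CMP(channel));
}

That accounts for the one register read, the three register writes, the
HPET_TN_SETVAL configuration bit, and the delay mentioned above.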

Thanks and BR,
Ricardo


Re: [PATCH v6 05/29] x86/apic/vector: Do not allocate vectors for NMIs

2022-05-13 Thread Thomas Gleixner
On Fri, May 13 2022 at 11:03, Ricardo Neri wrote:
> On Fri, May 06, 2022 at 11:12:20PM +0200, Thomas Gleixner wrote:
>> Why would a NMI ever end up in this code? There is no vector management
>> required and this find cpu exercise is pointless.
>
> But even if the NMI has a fixed vector, it is still necessary to determine
> which CPU will get the NMI. It is still necessary to determine what to
> write in the Destination ID field of the MSI message.
>
> irq_matrix_find_best_cpu() would find the CPU with the lowest number of
> managed vectors so that the NMI is directed to that CPU. 

What's the point to send it to the CPU with the lowest number of
interrupts. It's not that this NMI happens every 50 microseconds.
We pick one online CPU and are done.

> In today's code, an NMI would end up here because we rely on the existing
> interrupt management infrastructure... Unless the check is done at the entry
> points as you propose.

Correct. We don't want to call into functions which are not designed for
NMIs.
 
>> > +
>> > +  if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
>> > +  cpu = irq_matrix_find_best_cpu_managed(vector_matrix, dest);
>> > +  apicd->cpu = cpu;
>> > +  vector = 0;
>> > +  goto no_vector;
>> > +  }
>> 
>> This code can never be reached for a NMI delivery. If so, then it's a
>> bug.
>> 
>> This all is special purpose for that particular HPET NMI watchdog use
>> case and we are not exposing this to anything else at all.
>> 
>> So why are you sprinkling this NMI nonsense all over the place? Just
>> because? There are well defined entry points to all of this where this
>> can be fenced off.
>
> I put the NMI checks in these points because assign_vector_locked() and
> assign_managed_vector() are reached through multiple paths and these are
> the two places where the allocation of the vector is requested and the
> destination CPU is determined.
>
> I do observe this code being reached for an NMI, but that is because this
> code still does not know about NMIs... Unless the checks for NMI are put
> in the entry points as you pointed out.
>
> The intent was to refactor the code in a generic manner and not to focus
> only in the NMI watchdog. That would have looked hacky IMO.

We don't want to have more of this really. Supporting NMIs on x86 in a
broader way is simply not reasonable because there is only one NMI
vector and we have no sensible way to get to the cause of the NMI
without a massive overhead.

Even if we get multiple NMI vectors some shiny day, this will be
fundamentally different than regular interrupts and certainly not
exposed broadly. There will be 99.99% fixed vectors for simplicity sake.

>> +if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
>> +/*
>> + * NMIs have a fixed vector and need their own
>> + * interrupt chip so nothing can end up in the
>> + * regular local APIC management code except the
>> + * MSI message composing callback.
>> + */
>> +irqd->chip = _nmi_controller;
>> +/*
>> + * Don't allow affinity setting attempts for NMIs.
>> + * This cannot work with the regular affinity
>> + * mechanisms and for the intended HPET NMI
>> + * watchdog use case it's not required.
>
> But we do need the ability to set affinity, right? As stated above, we need
> to know what Destination ID to write in the MSI message or in the interrupt
> remapping table entry.
>
> It cannot be any CPU because only one specific CPU is supposed to handle the
> NMI from the HPET channel.
>
> We cannot hard-code a CPU for that because it may go offline (and ignore NMIs)
> or not be part of the monitored CPUs.
>
> Also, if lapic_nmi_controller.irq_set_affinity() is NULL, then irq_chips
> INTEL-IR, AMD-IR, those using msi_domain_set_affinity() need to check for NULL.
> They currently unconditionally call the parent irq_chip's irq_set_affinity().
> I see that there is a irq_chip_set_affinity_parent() function. Perhaps it can
> be used for this check?

Yes, this lacks obviously a NMI specific set_affinity callback and this
can be very trivial and does not have any of the complexity of interrupt
affinity assignment. First online CPU in the mask with a fallback to any
online CPU.

I did not claim that this is complete. This was for illustration.

>> + */
>> +irqd_set_no_balance(irqd);
>
> This code does not set apicd->hw_irq_cfg.delivery_mode as NMI, right?
> I had to add that to make it work.

I assumed you can figure that out on your own :)

Thanks,

tglx


Re: [PATCH v6 13/29] iommu/amd: Compose MSI messages for NMI irqs in non-IR format

2022-05-13 Thread Ricardo Neri
On Fri, May 06, 2022 at 11:31:56PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> > +*
> > +* Also, NMIs do not have an associated vector. No need for cleanup.
> 
> They have a vector and what the heck is this cleanup comment for here?
> There is nothing cleanup alike anywhere near.
> 
> Adding confusing comments is worse than adding no comments at all.

I will remove the comment regarding cleanup. I will clarify that NMI has a
fixed vector.

Thanks and BR,
Ricardo


Re: [PATCH v6 12/29] iommu/amd: Enable NMIPass when allocating an NMI irq

2022-05-13 Thread Ricardo Neri
On Fri, May 06, 2022 at 11:26:22PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> >  
> > +   if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
> > +   /* Only one IRQ per NMI */
> > +   if (nr_irqs != 1)
> > +   return -EINVAL;
> 
> See previous reply.

I will remove this check.

Thanks and BR,
Ricardo
> 


Re: [PATCH v6 10/29] iommu/vt-d: Implement minor tweaks for NMI irqs

2022-05-13 Thread Ricardo Neri
On Fri, May 06, 2022 at 11:23:23PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> > The Intel IOMMU interrupt remapping driver already programs correctly the
> > delivery mode of individual irqs as per their irq_data. Improve handling
> > of NMIs. Allow only one irq per NMI. Also, it is not necessary to cleanup
> > irq vectors after updating affinity.
> 
> Structuring a changelog in paragraphs might make it readable. New lines
> exist for a reason.

Sure, I can structure this in paragraphs.
> 
> > NMIs do not have associated vectors.
> 
> Again. NMI has an vector associated, but it is not subject to dynamic
> vector management.

Indeed, it is clear to me now.

> 
> > diff --git a/drivers/iommu/intel/irq_remapping.c 
> > b/drivers/iommu/intel/irq_remapping.c
> > index fb2d71bea98d..791a9331e257 100644
> > --- a/drivers/iommu/intel/irq_remapping.c
> > +++ b/drivers/iommu/intel/irq_remapping.c
> > @@ -1198,8 +1198,12 @@ intel_ir_set_affinity(struct irq_data *data, const 
> > struct cpumask *mask,
> >  * After this point, all the interrupts will start arriving
> >  * at the new destination. So, time to cleanup the previous
> >  * vector allocation.
> > +*
> > +* Do it only for non-NMI irqs. NMIs don't have associated
> > +* vectors.
> 
> See above.

Sure.

> 
> >  */
> > -   send_cleanup_vector(cfg);
> > +   if (cfg->delivery_mode != APIC_DELIVERY_MODE_NMI)
> > +   send_cleanup_vector(cfg);
> 
> So this needs to be replicated for all invocations of
> send_cleanup_vector(), right? Why can't you put it into that function?

Certainly, it can be done inside the function.
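
Something along these lines (just a sketch; it assumes the delivery_mode
field that this series adds to the interrupt configuration, and the existing
body is elided):

void send_cleanup_vector(struct irq_cfg *cfg)
{
        /* NMIs are not subject to vector management; nothing to clean up. */
        if (cfg->delivery_mode == APIC_DELIVERY_MODE_NMI)
                return;

        /* ... existing cleanup logic unchanged ... */
}

Then the callers, including the one above, would not need their own check.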

>   
> > return IRQ_SET_MASK_OK_DONE;
> >  }
> > @@ -1352,6 +1356,9 @@ static int intel_irq_remapping_alloc(struct 
> > irq_domain *domain,
> > if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI)
> > info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
> >  
> > +   if ((info->flags & X86_IRQ_ALLOC_AS_NMI) && nr_irqs != 1)
> > +   return -EINVAL;
> 
> This cannot be reached when the vector allocation domain already
> rejected it, but copy & pasta is wonderful and increases the line count.

Yes, this is not needed.

Thanks and BR,
Ricardo
> 
> Thanks,
> 
> tglx
> 
> 


Re: [PATCH v6 05/29] x86/apic/vector: Do not allocate vectors for NMIs

2022-05-13 Thread Ricardo Neri
On Fri, May 06, 2022 at 11:12:20PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> > Vectors are meaningless when allocating IRQs with NMI as the delivery
> > mode.
> 
> Vectors are not meaningless. NMI has a fixed vector.
> 
> The point is that for a fixed vector there is no vector management
> required.
> 
> Can you spot the difference?

Yes, I see your point now. Thank you for the explanation.

> 
> > In such case, skip the reservation of IRQ vectors. Do it in the lowest-
> > level functions where the actual IRQ reservation takes place.
> >
> > Since NMIs target specific CPUs, keep the functionality to find the best
> > CPU.
> 
> Again. What for?
>   
> > +   if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
> > +   cpu = irq_matrix_find_best_cpu(vector_matrix, dest);
> > +   apicd->cpu = cpu;
> > +   vector = 0;
> > +   goto no_vector;
> > +   }
> 
> Why would a NMI ever end up in this code? There is no vector management
> required and this find cpu exercise is pointless.

But even if the NMI has a fixed vector, it is still necessary to determine
which CPU will get the NMI. It is still necessary to determine what to
write in the Destination ID field of the MSI message.

irq_matrix_find_best_cpu() would find the CPU with the lowest number of
managed vectors so that the NMI is directed to that CPU. 

In today's code, an NMI would end up here because we rely on the existing
interrupt management infrastructure... Unless the check is done at the entry
points as you propose.

> 
> > vector = irq_matrix_alloc(vector_matrix, dest, resvd, );
> > trace_vector_alloc(irqd->irq, vector, resvd, vector);
> > if (vector < 0)
> > return vector;
> > apic_update_vector(irqd, vector, cpu);
> > +
> > +no_vector:
> > apic_update_irq_cfg(irqd, vector, cpu);
> >  
> > return 0;
> > @@ -321,12 +330,22 @@ assign_managed_vector(struct irq_data *irqd, const 
> > struct cpumask *dest)
> > /* set_affinity might call here for nothing */
> > if (apicd->vector && cpumask_test_cpu(apicd->cpu, vector_searchmask))
> > return 0;
> > +
> > +   if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
> > +   cpu = irq_matrix_find_best_cpu_managed(vector_matrix, dest);
> > +   apicd->cpu = cpu;
> > +   vector = 0;
> > +   goto no_vector;
> > +   }
> 
> This code can never be reached for a NMI delivery. If so, then it's a
> bug.
> 
> This all is special purpose for that particular HPET NMI watchdog use
> case and we are not exposing this to anything else at all.
> 
> So why are you sprinkling this NMI nonsense all over the place? Just
> because? There are well defined entry points to all of this where this
> can be fenced off.

I put the NMI checks in these points because assign_vector_locked() and
assign_managed_vector() are reached through multiple paths and these are
the two places where the allocation of the vector is requested and the
destination CPU is determined.

I do observe this code being reached for an NMI, but that is because this
code still does not know about NMIs... Unless the checks for NMI are put
in the entry points as you pointed out.

The intent was to refactor the code in a generic manner and not to focus
only in the NMI watchdog. That would have looked hacky IMO.

> 
> If at some day the hardware people get their act together and provide a
> proper vector space for NMIs then we have to revisit that, but then
> there will be a separate NMI vector management too.
> 
> What you want is the below which also covers the next patch. Please keep
> an eye on the comments I added/modified.

Thank you for the code and the clarifying comments.
> 
> Thanks,
> 
> tglx
> ---
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -42,6 +42,7 @@ EXPORT_SYMBOL_GPL(x86_vector_domain);
>  static DEFINE_RAW_SPINLOCK(vector_lock);
>  static cpumask_var_t vector_searchmask;
>  static struct irq_chip lapic_controller;
> +static struct irq_chip lapic_nmi_controller;
>  static struct irq_matrix *vector_matrix;
>  #ifdef CONFIG_SMP
>  static DEFINE_PER_CPU(struct hlist_head, cleanup_list);
> @@ -451,6 +452,10 @@ static int x86_vector_activate(struct ir
>   trace_vector_activate(irqd->irq, apicd->is_managed,
> apicd->can_reserve, reserve);
>  
> + /* NMI has a fixed vector. No vector management required */
> + if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI)
> + return 0;
> +
>   raw_spin_lock_irqsave(_lock, flags);
>   if (!apicd->can_reserve && !apicd->is_managed)
>   assign_irq_vector_any_locked(irqd);
> @@ -472,6 +477,10 @@ static void vector_free_reserved_and_man
>   trace_vector_teardown(irqd->irq, apicd->is_managed,
> apicd->has_reserved);
>  
> + /* NMI has a fixed vector. No vector management required */
> +

Re: [PATCH -next v4 7/7] arm64: add cow to machine check safe

2022-05-13 Thread Mark Rutland
On Wed, Apr 20, 2022 at 03:04:18AM +, Tong Tiangen wrote:
> In the cow(copy on write) processing, the data of the user process is
> copied, when hardware memory error is encountered during copy, only the
> relevant processes are affected, so killing the user process and isolate
> the user page with hardware memory errors is a more reasonable choice than
> kernel panic.

There are plenty of other places we'll access user pages via a kernel
alias (e.g. when performing IO), so why is this special?

To be clear, I am not entirely averse to this, but it seems like this is
being done because it's easy to do rather than necessarily being all
that useful, and I'm not keen on having to duplicate a bunch of logic
for this.

> Add new helper copy_page_mc() which provide a page copy implementation with
> machine check safe. At present, only used in cow. In future, we can expand
> more scenes. As long as the consequences of page copy failure are not
> fatal(eg: only affect user process), we can use this helper.
> 
> The copy_page_mc() in copy_page_mc.S is largely borrows from copy_page()
> in copy_page.S and the main difference is copy_page_mc() add extable entry
> to every load/store insn to support machine check safe. largely to keep the
> patch simple. If needed those optimizations can be folded in.
> 
> Add new extable type EX_TYPE_COPY_PAGE_MC which used in copy_page_mc().
> 
> This type only be processed in fixup_exception_mc(), The reason is that
> copy_page_mc() is consistent with copy_page() except machine check safe is
> considered, and copy_page() do not need to consider exception fixup.
> 
> Signed-off-by: Tong Tiangen 
> ---
>  arch/arm64/include/asm/asm-extable.h |  5 ++
>  arch/arm64/include/asm/page.h| 10 
>  arch/arm64/lib/Makefile  |  2 +
>  arch/arm64/lib/copy_page_mc.S| 86 
>  arch/arm64/mm/copypage.c | 36 ++--
>  arch/arm64/mm/extable.c  |  2 +
>  include/linux/highmem.h  |  8 +++
>  mm/memory.c  |  2 +-
>  8 files changed, 144 insertions(+), 7 deletions(-)
>  create mode 100644 arch/arm64/lib/copy_page_mc.S
> 
> diff --git a/arch/arm64/include/asm/asm-extable.h 
> b/arch/arm64/include/asm/asm-extable.h
> index 80410899a9ad..74c056ddae15 100644
> --- a/arch/arm64/include/asm/asm-extable.h
> +++ b/arch/arm64/include/asm/asm-extable.h
> @@ -14,6 +14,7 @@
>  /* _MC indicates that can fixup from machine check errors */
>  #define EX_TYPE_UACCESS_MC   5
>  #define EX_TYPE_UACCESS_MC_ERR_ZERO  6
> +#define EX_TYPE_COPY_PAGE_MC 7
>  
>  #ifdef __ASSEMBLY__
>  
> @@ -42,6 +43,10 @@
>   __ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_UACCESS_MC, 0)
>   .endm
>  
> + .macro  _asm_extable_copy_page_mc, insn, fixup
> + __ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_COPY_PAGE_MC, 0)
> + .endm
> +
>  /*
>   * Create an exception table entry for `insn` if `fixup` is provided. 
> Otherwise
>   * do nothing.
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 993a27ea6f54..832571a7dddb 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -29,6 +29,16 @@ void copy_user_highpage(struct page *to, struct page *from,
>  void copy_highpage(struct page *to, struct page *from);
>  #define __HAVE_ARCH_COPY_HIGHPAGE
>  
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> +extern void copy_page_mc(void *to, const void *from);
> +void copy_highpage_mc(struct page *to, struct page *from);
> +#define __HAVE_ARCH_COPY_HIGHPAGE_MC
> +
> +void copy_user_highpage_mc(struct page *to, struct page *from,
> + unsigned long vaddr, struct vm_area_struct *vma);
> +#define __HAVE_ARCH_COPY_USER_HIGHPAGE_MC
> +#endif
> +
>  struct page *alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,
>   unsigned long vaddr);
>  #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE_MOVABLE
> diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
> index 29490be2546b..0d9f292ef68a 100644
> --- a/arch/arm64/lib/Makefile
> +++ b/arch/arm64/lib/Makefile
> @@ -15,6 +15,8 @@ endif
>  
>  lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
>  
> +lib-$(CONFIG_ARCH_HAS_COPY_MC) += copy_page_mc.o
> +
>  obj-$(CONFIG_CRC32) += crc32.o
>  
>  obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
> diff --git a/arch/arm64/lib/copy_page_mc.S b/arch/arm64/lib/copy_page_mc.S
> new file mode 100644
> index ..655161363dcf
> --- /dev/null
> +++ b/arch/arm64/lib/copy_page_mc.S
> @@ -0,0 +1,86 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define CPY_MC(l, x...)  \
> +:   x;   \
> + _asm_extable_copy_page_mcb, l
> +
> +/*
> + * Copy a page from src to dest (both are page 

Re: [PATCH -next v4 6/7] arm64: add {get, put}_user to machine check safe

2022-05-13 Thread Mark Rutland
On Wed, Apr 20, 2022 at 03:04:17AM +, Tong Tiangen wrote:
> Add {get, put}_user() to machine check safe.
> 
> If get/put fail due to hardware memory error, only the relevant processes
> are affected, so killing the user process and isolate the user page with
> hardware memory errors is a more reasonable choice than kernel panic.
> 
> Add new extable type EX_TYPE_UACCESS_MC_ERR_ZERO which can be used for
> uaccess that can be recovered from hardware memory errors. The difference
> from EX_TYPE_UACCESS_MC is that this type also sets additional two target
> register which save error code and value needs to be set zero.

Why does this need to be in any way distinct from the existing
EX_TYPE_UACCESS_ERR_ZERO ?

Other than the case where we currently (ab)use that for
copy_{to,from}_kernel_nofault(), where do we *not* want to use
EX_TYPE_UACCESS_ERR_ZERO and *not* recover from a memory error?

Thanks,
Mark.

> 
> Signed-off-by: Tong Tiangen 
> ---
>  arch/arm64/include/asm/asm-extable.h | 14 ++
>  arch/arm64/include/asm/uaccess.h |  4 ++--
>  arch/arm64/mm/extable.c  |  4 
>  3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/asm-extable.h 
> b/arch/arm64/include/asm/asm-extable.h
> index 75b2c00e9523..80410899a9ad 100644
> --- a/arch/arm64/include/asm/asm-extable.h
> +++ b/arch/arm64/include/asm/asm-extable.h
> @@ -13,6 +13,7 @@
>  
>  /* _MC indicates that can fixup from machine check errors */
>  #define EX_TYPE_UACCESS_MC   5
> +#define EX_TYPE_UACCESS_MC_ERR_ZERO  6
>  
>  #ifdef __ASSEMBLY__
>  
> @@ -78,6 +79,15 @@
>  #define EX_DATA_REG(reg, gpr)
> \
>   "((.L__gpr_num_" #gpr ") << " __stringify(EX_DATA_REG_##reg##_SHIFT) ")"
>  
> +#define _ASM_EXTABLE_UACCESS_MC_ERR_ZERO(insn, fixup, err, zero) 
> \
> + __DEFINE_ASM_GPR_NUMS   
> \
> + __ASM_EXTABLE_RAW(#insn, #fixup,
> \
> +   __stringify(EX_TYPE_UACCESS_MC_ERR_ZERO), 
> \
> +   "("   
> \
> + EX_DATA_REG(ERR, err) " | " 
> \
> + EX_DATA_REG(ZERO, zero) 
> \
> +   ")")
> +
>  #define _ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, err, zero)
> \
>   __DEFINE_ASM_GPR_NUMS   \
>   __ASM_EXTABLE_RAW(#insn, #fixup,\
> @@ -90,6 +100,10 @@
>  #define _ASM_EXTABLE_UACCESS_ERR(insn, fixup, err)   \
>   _ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, err, wzr)
>  
> +
> +#define _ASM_EXTABLE_UACCESS_MC_ERR(insn, fixup, err)
> \
> + _ASM_EXTABLE_UACCESS_MC_ERR_ZERO(insn, fixup, err, wzr)
> +
>  #define EX_DATA_REG_DATA_SHIFT   0
>  #define EX_DATA_REG_DATA GENMASK(4, 0)
>  #define EX_DATA_REG_ADDR_SHIFT   5
> diff --git a/arch/arm64/include/asm/uaccess.h 
> b/arch/arm64/include/asm/uaccess.h
> index e8dce0cc5eaa..e41b47df48b0 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -236,7 +236,7 @@ static inline void __user *__uaccess_mask_ptr(const void 
> __user *ptr)
>   asm volatile(   \
>   "1: " load "" reg "1, [%2]\n"   \
>   "2:\n"  \
> - _ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 2b, %w0, %w1) \
> + _ASM_EXTABLE_UACCESS_MC_ERR_ZERO(1b, 2b, %w0, %w1)  \
>   : "+r" (err), "=" (x) \
>   : "r" (addr))
>  
> @@ -325,7 +325,7 @@ do {  
> \
>   asm volatile(   \
>   "1: " store "   " reg "1, [%2]\n"   \
>   "2:\n"  \
> - _ASM_EXTABLE_UACCESS_ERR(1b, 2b, %w0)   \
> + _ASM_EXTABLE_UACCESS_MC_ERR(1b, 2b, %w0)\
>   : "+r" (err)\
>   : "r" (x), "r" (addr))
>  
> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
> index 525876c3ebf4..1023ccdb2f89 100644
> --- a/arch/arm64/mm/extable.c
> +++ b/arch/arm64/mm/extable.c
> @@ -88,6 +88,7 @@ bool fixup_exception(struct pt_regs *regs)
>   case EX_TYPE_BPF:
>   return ex_handler_bpf(ex, regs);
>   case EX_TYPE_UACCESS_ERR_ZERO:
> + case EX_TYPE_UACCESS_MC_ERR_ZERO:
>   return ex_handler_uaccess_err_zero(ex, regs);
>   case EX_TYPE_LOAD_UNALIGNED_ZEROPAD:
>   return 

Re: [PATCH -next v4 5/7] arm64: mte: Clean up user tag accessors

2022-05-13 Thread Mark Rutland
On Wed, Apr 20, 2022 at 03:04:16AM +, Tong Tiangen wrote:
> From: Robin Murphy 
> 
> Invoking user_ldst to explicitly add a post-increment of 0 is silly.
> Just use a normal USER() annotation and save the redundant instruction.
> 
> Signed-off-by: Robin Murphy 
> Reviewed-by: Tong Tiangen 

When posting someone else's patch, you need to add your own
Signed-off-by tag. Please see:

  
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin

That said, the patch itself looks sane, and matches its original posting
at:

  
https://lore.kernel.org/linux-arm-kernel/38c6d4b5-a3db-5c3e-02e7-39875edb3...@arm.com/

So:

  Acked-by: Mark Rutland 

Catalin, are you happy to pick up this patch as a cleanup?

Thanks,
Mark.

> ---
>  arch/arm64/lib/mte.S | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/lib/mte.S b/arch/arm64/lib/mte.S
> index 8590af3c98c0..eeb9e45bcce8 100644
> --- a/arch/arm64/lib/mte.S
> +++ b/arch/arm64/lib/mte.S
> @@ -93,7 +93,7 @@ SYM_FUNC_START(mte_copy_tags_from_user)
>   mov x3, x1
>   cbz x2, 2f
>  1:
> - user_ldst 2f, ldtrb, w4, x1, 0
> +USER(2f, ldtrb   w4, [x1])
>   lsl x4, x4, #MTE_TAG_SHIFT
>   stg x4, [x0], #MTE_GRANULE_SIZE
>   add x1, x1, #1
> @@ -120,7 +120,7 @@ SYM_FUNC_START(mte_copy_tags_to_user)
>  1:
>   ldg x4, [x1]
>   ubfxx4, x4, #MTE_TAG_SHIFT, #MTE_TAG_SIZE
> - user_ldst 2f, sttrb, w4, x0, 0
> +USER(2f, sttrb   w4, [x0])
>   add x0, x0, #1
>   add x1, x1, #MTE_GRANULE_SIZE
>   subsx2, x2, #1
> -- 
> 2.25.1
> 


Re: [PATCH -next v4 4/7] arm64: add copy_{to, from}_user to machine check safe

2022-05-13 Thread Mark Rutland
On Wed, Apr 20, 2022 at 03:04:15AM +, Tong Tiangen wrote:
> Add copy_{to, from}_user() to machine check safe.
> 
> If copy fail due to hardware memory error, only the relevant processes are
> affected, so killing the user process and isolate the user page with
> hardware memory errors is a more reasonable choice than kernel panic.
> 
> Add new extable type EX_TYPE_UACCESS_MC which can be used for uaccess that
> can be recovered from hardware memory errors.

I don't understand why we need this.

If we apply EX_TYPE_UACCESS consistently to *all* user accesses, and
*only* to user accesses, that would *always* indicate that we can
recover, and that seems much simpler to deal with.

Today we use EX_TYPE_UACCESS_ERR_ZERO for kernel accesses in a couple of
cases, which we should clean up, and we use EX_TYPE_FIXUP for a couple
of user accesses, but those could easily be converted over.

> The x16 register is used to save the fixup type in copy_xxx_user which
> used extable type EX_TYPE_UACCESS_MC.

Why x16?

How is this intended to be consumed, and why is that behaviour different
from any *other* fault?

Mark.

> Signed-off-by: Tong Tiangen 
> ---
>  arch/arm64/include/asm/asm-extable.h | 14 ++
>  arch/arm64/include/asm/asm-uaccess.h | 15 ++-
>  arch/arm64/lib/copy_from_user.S  | 18 +++---
>  arch/arm64/lib/copy_to_user.S| 18 +++---
>  arch/arm64/mm/extable.c  | 18 ++
>  5 files changed, 60 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/asm-extable.h 
> b/arch/arm64/include/asm/asm-extable.h
> index c39f2437e08e..75b2c00e9523 100644
> --- a/arch/arm64/include/asm/asm-extable.h
> +++ b/arch/arm64/include/asm/asm-extable.h
> @@ -2,12 +2,18 @@
>  #ifndef __ASM_ASM_EXTABLE_H
>  #define __ASM_ASM_EXTABLE_H
>  
> +#define FIXUP_TYPE_NORMAL0
> +#define FIXUP_TYPE_MC1
> +
>  #define EX_TYPE_NONE 0
>  #define EX_TYPE_FIXUP1
>  #define EX_TYPE_BPF  2
>  #define EX_TYPE_UACCESS_ERR_ZERO 3
>  #define EX_TYPE_LOAD_UNALIGNED_ZEROPAD   4
>  
> +/* _MC indicates that can fixup from machine check errors */
> +#define EX_TYPE_UACCESS_MC   5
> +
>  #ifdef __ASSEMBLY__
>  
>  #define __ASM_EXTABLE_RAW(insn, fixup, type, data)   \
> @@ -27,6 +33,14 @@
>   __ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP, 0)
>   .endm
>  
> +/*
> + * Create an exception table entry for `insn`, which will branch to `fixup`
> + * when an unhandled fault(include sea fault) is taken.
> + */
> + .macro  _asm_extable_uaccess_mc, insn, fixup
> + __ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_UACCESS_MC, 0)
> + .endm
> +
>  /*
>   * Create an exception table entry for `insn` if `fixup` is provided. 
> Otherwise
>   * do nothing.
> diff --git a/arch/arm64/include/asm/asm-uaccess.h 
> b/arch/arm64/include/asm/asm-uaccess.h
> index 0557af834e03..6c23c138e1fc 100644
> --- a/arch/arm64/include/asm/asm-uaccess.h
> +++ b/arch/arm64/include/asm/asm-uaccess.h
> @@ -63,6 +63,11 @@ alternative_else_nop_endif
>  :x;  \
>   _asm_extableb, l
>  
> +
> +#define USER_MC(l, x...) \
> +:x;  \
> + _asm_extable_uaccess_mc b, l
> +
>  /*
>   * Generate the assembly for LDTR/STTR with exception table entries.
>   * This is complicated as there is no post-increment or pair versions of the
> @@ -73,8 +78,8 @@ alternative_else_nop_endif
>  8889:ldtr\reg2, [\addr, #8];
>   add \addr, \addr, \post_inc;
>  
> - _asm_extableb,\l;
> - _asm_extable8889b,\l;
> + _asm_extable_uaccess_mc b, \l;
> + _asm_extable_uaccess_mc 8889b, \l;
>   .endm
>  
>   .macro user_stp l, reg1, reg2, addr, post_inc
> @@ -82,14 +87,14 @@ alternative_else_nop_endif
>  8889:sttr\reg2, [\addr, #8];
>   add \addr, \addr, \post_inc;
>  
> - _asm_extableb,\l;
> - _asm_extable8889b,\l;
> + _asm_extable_uaccess_mc b,\l;
> + _asm_extable_uaccess_mc 8889b,\l;
>   .endm
>  
>   .macro user_ldst l, inst, reg, addr, post_inc
>  :\inst   \reg, [\addr];
>   add \addr, \addr, \post_inc;
>  
> - _asm_extableb,\l;
> + _asm_extable_uaccess_mc b, \l;
>   .endm
>  #endif
> diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
> index 34e317907524..480cc5ac0a8d 100644
> --- a/arch/arm64/lib/copy_from_user.S
> +++ b/arch/arm64/lib/copy_from_user.S
> @@ -25,7 +25,7 @@
>   .endm
>  
>   .macro strb1 reg, ptr, val
> - strb \reg, [\ptr], \val
> + USER_MC(9998f, strb \reg, [\ptr], \val)
>   .endm
>  
>   .macro 

Re: [PATCH -next v4 3/7] arm64: add support for machine check error safe

2022-05-13 Thread Mark Rutland
On Wed, Apr 20, 2022 at 03:04:14AM +, Tong Tiangen wrote:
> During the processing of arm64 kernel hardware memory errors(do_sea()), if
> the errors is consumed in the kernel, the current processing is panic.
> However, it is not optimal.
> 
> Take uaccess for example, if the uaccess operation fails due to memory
> error, only the user process will be affected, kill the user process
> and isolate the user page with hardware memory errors is a better choice.

Conceptually, I'm fine with the idea of constraining what we do for a
true uaccess, but I don't like the implementation of this at all, and I
think we first need to clean up the arm64 extable usage to clearly
distinguish a uaccess from another access.

> This patch only enable machine error check framework, it add exception
> fixup before kernel panic in do_sea() and only limit the consumption of
> hardware memory errors in kernel mode triggered by user mode processes.
> If fixup successful, panic can be avoided.
> 
> Consistent with PPC/x86, it is implemented by CONFIG_ARCH_HAS_COPY_MC.
> 
> Also add copy_mc_to_user() in include/linux/uaccess.h, this helper is
> called when CONFIG_ARCH_HAS_COPOY_MC is open.
> 
> Signed-off-by: Tong Tiangen 
> ---
>  arch/arm64/Kconfig   |  1 +
>  arch/arm64/include/asm/extable.h |  1 +
>  arch/arm64/mm/extable.c  | 17 +
>  arch/arm64/mm/fault.c| 27 ++-
>  include/linux/uaccess.h  |  9 +
>  5 files changed, 54 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index d9325dd95eba..012e38309955 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -19,6 +19,7 @@ config ARM64
>   select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
>   select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
>   select ARCH_HAS_CACHE_LINE_SIZE
> + select ARCH_HAS_COPY_MC if ACPI_APEI_GHES
>   select ARCH_HAS_CURRENT_STACK_POINTER
>   select ARCH_HAS_DEBUG_VIRTUAL
>   select ARCH_HAS_DEBUG_VM_PGTABLE
> diff --git a/arch/arm64/include/asm/extable.h 
> b/arch/arm64/include/asm/extable.h
> index 72b0e71cc3de..f80ebd0addfd 100644
> --- a/arch/arm64/include/asm/extable.h
> +++ b/arch/arm64/include/asm/extable.h
> @@ -46,4 +46,5 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
>  #endif /* !CONFIG_BPF_JIT */
>  
>  bool fixup_exception(struct pt_regs *regs);
> +bool fixup_exception_mc(struct pt_regs *regs);
>  #endif
> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
> index 489455309695..4f0083a550d4 100644
> --- a/arch/arm64/mm/extable.c
> +++ b/arch/arm64/mm/extable.c
> @@ -9,6 +9,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  static inline unsigned long
>  get_ex_fixup(const struct exception_table_entry *ex)
> @@ -84,3 +85,19 @@ bool fixup_exception(struct pt_regs *regs)
>  
>   BUG();
>  }
> +
> +bool fixup_exception_mc(struct pt_regs *regs)
> +{
> + const struct exception_table_entry *ex;
> +
> + ex = search_exception_tables(instruction_pointer(regs));
> + if (!ex)
> + return false;
> +
> + /*
> +  * This is not complete, More Machine check safe extable type can
> +  * be processed here.
> +  */
> +
> + return false;
> +}

This is at best misnamed; it doesn't actually apply the fixup, it just
searches for one.

> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 77341b160aca..a9e6fb1999d1 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -695,6 +695,29 @@ static int do_bad(unsigned long far, unsigned int esr, 
> struct pt_regs *regs)
>   return 1; /* "fault" */
>  }
>  
> +static bool arm64_do_kernel_sea(unsigned long addr, unsigned int esr,
> +  struct pt_regs *regs, int sig, int code)
> +{
> + if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
> + return false;
> +
> + if (user_mode(regs) || !current->mm)
> + return false;
> +
> + if (apei_claim_sea(regs) < 0)
> + return false;
> +
> + if (!fixup_exception_mc(regs))
> + return false;
> +
> + set_thread_esr(0, esr);
> +
> + arm64_force_sig_fault(sig, code, addr,
> + "Uncorrected hardware memory error in kernel-access\n");
> +
> + return true;
> +}
> +
>  static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
>  {
>   const struct fault_info *inf;
> @@ -720,7 +743,9 @@ static int do_sea(unsigned long far, unsigned int esr, 
> struct pt_regs *regs)
>*/
>   siaddr  = untagged_addr(far);
>   }
> - arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
> +
> + if (!arm64_do_kernel_sea(siaddr, esr, regs, inf->sig, inf->code))
> + arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, 
> esr);
>  
>   return 0;
>  }
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 

Re: [PATCH 11/30] um: Improve panic notifiers consistency and ordering

2022-05-13 Thread Johannes Berg
On Wed, 2022-05-11 at 17:22 -0300, Guilherme G. Piccoli wrote:
> On 10/05/2022 11:28, Petr Mladek wrote:
> > [...]
> > It is not clear to me why user mode linux should not care about
> > the other notifiers. It might be because I do not know much
> > about the user mode linux.
> > 
> > Is the because they always create core dump or are never running
> > in a hypervisor or ...?
> > 
> > AFAIK, the notifiers do many different things. For example, there
> > is a notifier that disables RCU watchdog, print some extra
> > information. Why none of them make sense here?
> > 
> 
> Hi Petr, my understanding is that UML is a form of running Linux as a
> regular userspace process for testing purposes.

Correct.

> With that said, as soon
> as we exit in the error path, less "pollution" would happen, so users
> can use GDB to debug the core dump for example.
> 
> In later patches of this series (when we split the panic notifiers in 3
> lists) these UML notifiers run in the pre-reboot list, so they run after
> the informational notifiers for example (in the default level).
> But without the list split we cannot order properly, so my gut feeling
> is that makes sense to run them rather earlier than later in the panic
> process...
> 
> Maybe Anton / Johannes / Richard could give their opinions - appreciate
> that, I'm not attached to the priority here, it's more about users'
> common usage of UML I can think of...

It's hard to say ... In a sense I'm not sure it matters?

OTOH something like the ftrace dump notifier (kernel/trace/trace.c)
might still be useful to run before the mconsole and coredump ones, even
if you could probably use gdb to figure out the information.

Personally, I don't have a scenario where I'd care about the trace
buffers though, and most of the others I found would seem irrelevant
(drivers that aren't even compiled, hung tasks won't really happen since
we exit immediately, and similar.)

johannes


Re: [PATCH v3] KVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context()

2022-05-13 Thread Christophe Leroy


Le 10/05/2022 à 14:37, Alexander Graf a écrit :
> Commit 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
> moved the switch_mmu_context() to C. While in principle a good idea, it
> meant that the function now uses the stack. The stack is not accessible
> from real mode though.
> 
> So to keep calling the function, let's turn on MSR_DR while we call it.
> That way, all pointer references to the stack are handled virtually.
> 
> In addition, make sure to save/restore r12 on the stack, as it may get
> clobbered by the C function.
> 
> Reported-by: Matt Evans 
> Fixes: 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
> Signed-off-by: Alexander Graf 
> Cc: sta...@vger.kernel.org # v5.14+
> 
> ---
> 
> v1 -> v2:
> 
>- Save and restore R12, so that we don't touch volatile registers
>  while calling into C.
> 
> v2 -> v3:
> 
>- Save and restore R12 on the stack. SPRGs may be clobbered by
>  page faults.
> ---
>   arch/powerpc/kvm/book3s_32_sr.S | 26 +-
>   1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_32_sr.S b/arch/powerpc/kvm/book3s_32_sr.S
> index e3ab9df6cf19..6cfcd20d4668 100644
> --- a/arch/powerpc/kvm/book3s_32_sr.S
> +++ b/arch/powerpc/kvm/book3s_32_sr.S
> @@ -122,11 +122,27 @@
>   
>   /* 0x0 - 0xb */
>   
> - /* 'current->mm' needs to be in r4 */
> - tophys(r4, r2)
> - lwz r4, MM(r4)
> - tophys(r4, r4)
> - /* This only clobbers r0, r3, r4 and r5 */
> + /* switch_mmu_context() needs paging, let's enable it */
> + mfmsr   r9
> + ori r11, r9, MSR_DR
> + mtmsr   r11
> + sync
> +
> + /* switch_mmu_context() clobbers r12, rescue it */
> + SAVE_GPR(12, r1)
> +
> + /* Calling switch_mmu_context(, current->mm, ); */
> + lwz r4, MM(r2)
>   bl  switch_mmu_context
>   
> + /* restore r12 */
> + REST_GPR(12, r1)
> +
> + /* Disable paging again */
> + mfmsr   r9
> + li  r6, MSR_DR
> + andcr9, r9, r6

Instead of li/andc you can do:

rlwinm  r9, r9, 0, ~MSR_DR

> + mtmsr   r9
> + sync
> +
>   .endm

Re: [PATCH 5/5] bpf ppc32: Add instructions for atomic_[cmp]xchg

2022-05-13 Thread Christophe Leroy


Le 12/05/2022 à 09:45, Hari Bathini a écrit :
> This adds two atomic opcodes BPF_XCHG and BPF_CMPXCHG on ppc32, both
> of which include the BPF_FETCH flag.  The kernel's atomic_cmpxchg
> operation fundamentally has 3 operands, but we only have two register
> fields. Therefore the operand we compare against (the kernel's API
> calls it 'old') is hard-coded to be BPF_REG_R0. Also, kernel's
> atomic_cmpxchg returns the previous value at dst_reg + off. JIT the
> same for BPF too with return value put in BPF_REG_0.
> 
>BPF_REG_R0 = atomic_cmpxchg(dst_reg + off, BPF_REG_R0, src_reg);


Ah, now we mix the xchg's with the bitwise operations. OK, I now better 
understand the goto atomic_ops in the previous patch. But it becomes 
harder to read and follow.

I think it would be cleaner to separate completely the bitwise 
operations and this, even if it duplicates half a dozen of lines.

> 
> Signed-off-by: Hari Bathini 
> ---
>   arch/powerpc/net/bpf_jit_comp32.c | 17 +
>   1 file changed, 17 insertions(+)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
> b/arch/powerpc/net/bpf_jit_comp32.c
> index 5604ae1b60ab..4690fd6e9e52 100644
> --- a/arch/powerpc/net/bpf_jit_comp32.c
> +++ b/arch/powerpc/net/bpf_jit_comp32.c
> @@ -829,6 +829,23 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
> struct codegen_context *
>   /* we're done if this succeeded */
>   PPC_BCC_SHORT(COND_NE, tmp_idx);
>   break;
> + case BPF_CMPXCHG:
> + /* Compare with old value in BPF_REG_0 */
> + EMIT(PPC_RAW_CMPW(bpf_to_ppc(BPF_REG_0), _R0));
> + /* Don't set if different from old value */
> + PPC_BCC_SHORT(COND_NE, (ctx->idx + 3) * 4);
> + fallthrough;
> + case BPF_XCHG:
> + /* store new value */
> + EMIT(PPC_RAW_STWCX(src_reg, tmp_reg, dst_reg));
> + PPC_BCC_SHORT(COND_NE, tmp_idx);
> + /*
> +  * Return old value in src_reg for BPF_XCHG &
> +  * BPF_REG_0 for BPF_CMPXCHG.
> +  */
> + EMIT(PPC_RAW_MR(imm == BPF_XCHG ? src_reg : 
> bpf_to_ppc(BPF_REG_0),
> + _R0));

If the line spreads over two lines, the compact form is probably not worth
it. It would be more readable as:

if (imm == BPF_XCHG)
        EMIT(PPC_RAW_MR(src_reg, _R0));
else
        EMIT(PPC_RAW_MR(bpf_to_ppc(BPF_REG_0), _R0));


At the end, it's probably even more readable if you separate both cases 
completely:

case BPF_CMPXCHG:
        /* Compare with old value in BPF_REG_0 */
        EMIT(PPC_RAW_CMPW(bpf_to_ppc(BPF_REG_0), _R0));
        /* Don't set if different from old value */
        PPC_BCC_SHORT(COND_NE, (ctx->idx + 3) * 4);
        /* store new value */
        EMIT(PPC_RAW_STWCX(src_reg, tmp_reg, dst_reg));
        PPC_BCC_SHORT(COND_NE, tmp_idx);
        /* Return old value in BPF_REG_0 */
        EMIT(PPC_RAW_MR(bpf_to_ppc(BPF_REG_0), _R0));
        break;
case BPF_XCHG:
        /* store new value */
        EMIT(PPC_RAW_STWCX(src_reg, tmp_reg, dst_reg));
        PPC_BCC_SHORT(COND_NE, tmp_idx);
        /* Return old value in src_reg */
        EMIT(PPC_RAW_MR(src_reg, _R0));
        break;


> + break;
>   default:
>   pr_err_ratelimited("eBPF filter atomic op code 
> %02x (@%d) unsupported\n",
>  code, i);

Re: [PATCH V2] tools/perf/tests: Fix session topology test to skip the test in guest environment

2022-05-13 Thread Disha Goel


-Original Message-
From: Athira Rajeev 
To: a...@kernel.org, jo...@kernel.org
Cc: m...@ellerman.id.au, linux-perf-us...@vger.kernel.org, 
linuxppc-dev@lists.ozlabs.org, ma...@linux.vnet.ibm.com, 
rnsas...@linux.ibm.com, kj...@linux.ibm.com, disg...@linux.vnet.ibm.com
, irog...@google.com
Subject: [PATCH V2] tools/perf/tests: Fix session topology test to skip
the test in guest environment
Date: Wed, 11 May 2022 17:19:59 +0530

The session topology test fails in powerpc pSeries platform. Test logs:

<<>>
Session topology : FAILED!
<<>>

This testcase tests cpu topology by checking the core_id and socket_id
stored in perf_env from perf session. The data from perf session is
compared with the cpu topology information from
"/sys/devices/system/cpu/cpuX/topology" like core_id and
physical_package_id. In case of a virtual environment, detail like
physical_package_id is restricted from being exposed. Hence
physical_package_id is set to -1. The testcase fails on such platforms
since socket_id can't be fetched from topology info.

Skip the testcase in powerpc if physical_package_id returns -1.

Signed-off-by: Athira Rajeev 
---
Changelog:
v1 -> v2: Addressed review comments from Arnaldo and Michael Ellerman.
Skip the test in powerpc when physical_package_id is set to -1. Dropped
patch 1 from V1 since the current change doesn't use info from
/proc/cpuinfo and rather uses the physical_package_id value.

 tools/perf/tests/topology.c | 11 +++
 1 file changed, 11 insertions(+)

Tested the patch on powerpc and powernv, session topology test works
fine with the patch.

Tested-by: Disha Goel 

diff --git a/tools/perf/tests/topology.c b/tools/perf/tests/topology.c
index ee1e3dcbc0bd..d23a9e322ff5 100644
--- a/tools/perf/tests/topology.c
+++ b/tools/perf/tests/topology.c
@@ -109,6 +109,17 @@ static int check_cpu_topology(char *path, struct perf_cpu_map *map)
            && strncmp(session->header.env.arch, "aarch64", 7))
                return TEST_SKIP;
 
+       /*
+        * In powerpc pSeries platform, not all the topology information
+        * are exposed via sysfs. Due to restriction, detail like
+        * physical_package_id will be set to -1. Hence skip this
+        * test if physical_package_id returns -1 for cpu from perf_cpu_map.
+        */
+       if (strncmp(session->header.env.arch, "powerpc", 7)) {
+               if (cpu__get_socket_id(perf_cpu_map__cpu(map, 0)) == -1)
+                       return TEST_SKIP;
+       }
+
        TEST_ASSERT_VAL("Session header CPU map not set", session->header.env.cpu);
 
        for (i = 0; i < session->header.env.nr_cpus_avail; i++) {


Re: [PATCH V2] tools/perf/tests: Skip perf BPF test if clang is not present

2022-05-13 Thread Disha Goel


-Original Message-
From: Athira Rajeev 
To: a...@kernel.org, jo...@kernel.org
Cc: m...@ellerman.id.au, linux-perf-us...@vger.kernel.org, 
linuxppc-dev@lists.ozlabs.org, ma...@linux.vnet.ibm.com, 
rnsas...@linux.ibm.com, kj...@linux.ibm.com, disg...@linux.vnet.ibm.com,
irog...@google.com
Subject: [PATCH V2] tools/perf/tests: Skip perf BPF test if clang is
not present
Date: Wed, 11 May 2022 17:24:38 +0530

Perf BPF filter test fails in an environment where "clang" is not
installed.

Test failure logs:

<<>>
42: BPF filter                :
42.1: Basic BPF filtering     : Skip
42.2: BPF pinning             : FAILED!
42.3: BPF prologue generation : FAILED!
<<>>

Enabling the verbose option provides debug logs which say that clang/llvm
needs to be installed. Snippet of the verbose logs:

<<>>
42.2: BPF pinning :
--- start ---
test child forked, pid 61423
ERROR: unable to find clang.
Hint:  Try to install latest clang/llvm to support BPF.
Check your $PATH
<>
Failed to compile test case: 'Basic BPF llvm compile'
Unable to get BPF object, fix kbuild first
test child finished with -1
---- end ----
BPF filter subtest 2: FAILED!
<<>>

Here the subtests "BPF pinning" and "BPF prologue generation" failed and
the logs show that clang/llvm needs to be installed. After installing
clang, the testcase passes.

Reason why the subtests fail even though the logs carry the proper debug
information: the main function __test__bpf calls test_llvm__fetch_bpf_obj
with the 4th argument set to true (the 4th argument maps to the parameter
"force" in test_llvm__fetch_bpf_obj). This causes test_llvm__fetch_bpf_obj
to skip the check for clang/llvm.

Snippet of the code part which checks for clang based on the parameter
"force" in test_llvm__fetch_bpf_obj:

<<>>
if (!force && (!llvm_param.user_set_param &&
<<>>

Since force is passed as "true", this check is bypassed, so the test isn't
skipped and instead fails to compile the test case. The BPF code
compilation needs clang, so pass the fourth argument as "false" and also
skip the test if the reason for the return is "TEST_SKIP".

After the patch:

<<>>
42: BPF filter                :
42.1: Basic BPF filtering     : Skip
42.2: BPF pinning             : Skip
42.3: BPF prologue generation : Skip
<<>>

Fixes: ba1fae431e74 ("perf test: Add 'perf test BPF'")
Signed-off-by: Athira Rajeev 
---
Changelog:
Addressed review comments from Arnaldo by adding the reason for skipping
the testcase.

 tools/perf/tests/bpf.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Tested the patch on powerpc and powernv, the perf BPF test works fine
with the patch.

Tested-by: Disha Goel 
diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
index 57b9591f7cbb..17c023823713 100644
--- a/tools/perf/tests/bpf.c
+++ b/tools/perf/tests/bpf.c
@@ -222,11 +222,11 @@ static int __test__bpf(int idx)
 	ret = test_llvm__fetch_bpf_obj(&obj_buf, &obj_buf_sz,
 				       bpf_testcase_table[idx].prog_id,
-				       true, NULL);
+				       false, NULL);
 	if (ret != TEST_OK || !obj_buf || !obj_buf_sz) {
 		pr_debug("Unable to get BPF object, %s\n",
 			 bpf_testcase_table[idx].msg_compile_fail);
-		if (idx == 0)
+		if ((idx == 0) || (ret == TEST_SKIP))
 			return TEST_SKIP;
 		else
 			return TEST_FAIL;
@@ -364,9 +364,11 @@ static int test__bpf_prologue_test(struct test_suite *test __maybe_unused,
 static struct test_case bpf_tests[] = {
 #ifdef HAVE_LIBBPF_SUPPORT
 	TEST_CASE("Basic BPF filtering", basic_bpf_test),
-	TEST_CASE("BPF pinning", bpf_pinning),
+	TEST_CASE_REASON("BPF pinning", bpf_pinning,
+			"clang isn't installed or environment missing BPF support"),
 #ifdef HAVE_BPF_PROLOGUE
-	TEST_CASE("BPF prologue generation", bpf_prologue_test),
+	TEST_CASE_REASON("BPF prologue generation", bpf_prologue_test,
+			"clang isn't installed or environment missing BPF support"),
 #else
 	TEST_CASE_REASON("BPF prologue generation", bpf_prologue_test, "not compiled in"),
 #endif


Re: [PATCH 4/5] bpf ppc32: add support for BPF_ATOMIC bitwise operations

2022-05-13 Thread Christophe Leroy


On 12/05/2022 at 09:45, Hari Bathini wrote:
> Adding instructions for ppc32 for
> 
> atomic_and
> atomic_or
> atomic_xor
> atomic_fetch_add
> atomic_fetch_and
> atomic_fetch_or
> atomic_fetch_xor
> 
> Signed-off-by: Hari Bathini 
> ---
>   arch/powerpc/net/bpf_jit_comp32.c | 45 +--
>   1 file changed, 31 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
> index e46ed1e8c6ca..5604ae1b60ab 100644
> --- a/arch/powerpc/net/bpf_jit_comp32.c
> +++ b/arch/powerpc/net/bpf_jit_comp32.c
> @@ -798,25 +798,42 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
>* BPF_STX ATOMIC (atomic ops)
>*/
>   case BPF_STX | BPF_ATOMIC | BPF_W:
> - if (imm != BPF_ADD) {
> - pr_err_ratelimited("eBPF filter atomic op code %02x (@%d) unsupported\n",
> -    code, i);
> - return -ENOTSUPP;
> - }
> -
> - /* *(u32 *)(dst + off) += src */
> -
>   bpf_set_seen_register(ctx, tmp_reg);
>   /* Get offset into TMP_REG */
>   EMIT(PPC_RAW_LI(tmp_reg, off));
> + tmp_idx = ctx->idx * 4;
>   /* load value from memory into r0 */
>   EMIT(PPC_RAW_LWARX(_R0, tmp_reg, dst_reg, 0));
> - /* add value from src_reg into this */
> - EMIT(PPC_RAW_ADD(_R0, _R0, src_reg));
> - /* store result back */
> - EMIT(PPC_RAW_STWCX(_R0, tmp_reg, dst_reg));
> - /* we're done if this succeeded */
> - PPC_BCC_SHORT(COND_NE, (ctx->idx - 3) * 4);
> + switch (imm) {
> + case BPF_ADD:
> + case BPF_ADD | BPF_FETCH:
> + EMIT(PPC_RAW_ADD(_R0, _R0, src_reg));
> + goto atomic_ops;
> + case BPF_AND:
> + case BPF_AND | BPF_FETCH:
> + EMIT(PPC_RAW_AND(_R0, _R0, src_reg));
> + goto atomic_ops;
> + case BPF_OR:
> + case BPF_OR | BPF_FETCH:
> + EMIT(PPC_RAW_OR(_R0, _R0, src_reg));
> + goto atomic_ops;
> + case BPF_XOR:
> + case BPF_XOR | BPF_FETCH:
> + EMIT(PPC_RAW_XOR(_R0, _R0, src_reg));
> +atomic_ops:

This looks like an odd construct.

The default case doesn't fall through, so the below part could go after 
the switch and all cases could just break instead of goto atomic_ops.
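
For illustration, the restructuring being suggested might look roughly like 
this (a sketch only, reusing the emit macros, registers and error path from 
the quoted patch):

		switch (imm) {
		case BPF_ADD:
		case BPF_ADD | BPF_FETCH:
			EMIT(PPC_RAW_ADD(_R0, _R0, src_reg));
			break;
		case BPF_AND:
		case BPF_AND | BPF_FETCH:
			EMIT(PPC_RAW_AND(_R0, _R0, src_reg));
			break;
		case BPF_OR:
		case BPF_OR | BPF_FETCH:
			EMIT(PPC_RAW_OR(_R0, _R0, src_reg));
			break;
		case BPF_XOR:
		case BPF_XOR | BPF_FETCH:
			EMIT(PPC_RAW_XOR(_R0, _R0, src_reg));
			break;
		default:
			pr_err_ratelimited("eBPF filter atomic op code %02x (@%d) unsupported\n",
					   code, i);
			return -EOPNOTSUPP;
		}

		/* Common tail, reached by every supported op, so no goto needed */
		/* (BPF_FETCH old-value handling goes here, see the sketch further below) */
		/* store result back */
		EMIT(PPC_RAW_STWCX(_R0, tmp_reg, dst_reg));
		/* we're done if this succeeded */
		PPC_BCC_SHORT(COND_NE, tmp_idx);
		break;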

> + /* For the BPF_FETCH variant, get old data into 
> src_reg */
> + if (imm & BPF_FETCH)
> + EMIT(PPC_RAW_LWARX(src_reg, tmp_reg, dst_reg, 0));

I think this is wrong. By doing a new LWARX you kill the reservation 
done by the previous one. If the data has changed between the first 
LWARX and now, it will go undetected.

It should be a LWZX I believe.

But there is another problem: you clobber src_reg, then what happens if 
STWCX fails and it loops back to tmp_idx ?
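
For illustration only, one way to address both points is to latch the old 
value into a scratch register right after the lwarx, and only move it into 
src_reg once the stwcx. has succeeded, so nothing clobbers src_reg inside 
the retry loop and no second reservation is taken. This is just a sketch, 
not the posted patch; using BPF_REG_AX via bpf_to_ppc(BPF_REG_AX) as the 
scratch register is an assumption about what is free at this point:

		/* Get offset into TMP_REG */
		EMIT(PPC_RAW_LI(tmp_reg, off));
		tmp_idx = ctx->idx * 4;
		/* load value from memory into r0; this takes the reservation */
		EMIT(PPC_RAW_LWARX(_R0, tmp_reg, dst_reg, 0));
		/* latch the old value before _R0 is overwritten by the op (assumed scratch reg) */
		if (imm & BPF_FETCH)
			EMIT(PPC_RAW_MR(bpf_to_ppc(BPF_REG_AX), _R0));

		/* ... per-op EMIT() on _R0 from the switch, e.g. PPC_RAW_ADD(_R0, _R0, src_reg) ... */

		/* store result back; src_reg is untouched, so a retry is safe */
		EMIT(PPC_RAW_STWCX(_R0, tmp_reg, dst_reg));
		/* loop back to the lwarx if the reservation was lost */
		PPC_BCC_SHORT(COND_NE, tmp_idx);
		/* only after the store succeeded is src_reg clobbered */
		if (imm & BPF_FETCH)
			EMIT(PPC_RAW_MR(src_reg, bpf_to_ppc(BPF_REG_AX)));

The scratch register would also need a bpf_set_seen_register() call, and the 
fetch variants presumably still have to zero the high word of src_reg on 
ppc32; both are left out of the sketch.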

> + /* store result back */
> + EMIT(PPC_RAW_STWCX(_R0, tmp_reg, dst_reg));
> + /* we're done if this succeeded */
> + PPC_BCC_SHORT(COND_NE, tmp_idx);
> + break;
> + default:
> + pr_err_ratelimited("eBPF filter atomic op code %02x (@%d) unsupported\n",
> +    code, i);
> + return -EOPNOTSUPP;
> + }
>   break;
>   
>   case BPF_STX | BPF_ATOMIC | BPF_DW: /* *(u64 *)(dst + off) += src */

Re: [PATCH 0/5] Atomics support for eBPF on powerpc

2022-05-13 Thread Michael Ellerman
Daniel Borkmann  writes:
> On 5/12/22 9:45 AM, Hari Bathini wrote:
>> This patchset adds atomic operations to the eBPF instruction set on
>> powerpc. The instructions that are added here can be summarised with
>> this list of kernel operations for ppc64:
>> 
>> * atomic[64]_[fetch_]add
>> * atomic[64]_[fetch_]and
>> * atomic[64]_[fetch_]or
>> * atomic[64]_[fetch_]xor
>> * atomic[64]_xchg
>> * atomic[64]_cmpxchg
>> 
>> and this list of kernel operations for ppc32:
>> 
>> * atomic_[fetch_]add
>> * atomic_[fetch_]and
>> * atomic_[fetch_]or
>> * atomic_[fetch_]xor
>> * atomic_xchg
>> * atomic_cmpxchg
>> 
>> The following are left out of scope for this effort:
>> 
>> * 64 bit operations on ppc32.
>> * Explicit memory barriers, 16 and 8 bit operations on both ppc32
>>& ppc64.
>> 
>> The first patch adds support for bitwise atomic operations on ppc64.
>> The next patch adds fetch variant support for these instructions. The
>> third patch adds support for xchg and cmpxchg atomic operations on
>> ppc64. Patch #4 adds support for 32-bit atomic bitwise operations on
>> ppc32. Patch #5 adds support for xchg and cmpxchg atomic operations
>> on ppc32.
>
> Thanks for adding these, Hari! I presume they'll get routed via Michael,
> right?

Yeah I'm happy to take them if they are OK by you.

I do wonder if the BPF jit code should eventually move out of arch/, but
that's a discussion for another day.

> One thing that may be worth adding to the commit log as well is
> the test result from test_bpf.ko given it has an extensive suite around
> atomics useful for testing corner cases in JITs.

Yes please, test results make me feel much better about merging things :)

cheers