Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection

2020-04-02 Thread Thomas Gleixner
Jann Horn  writes:
> On Thu, Apr 2, 2020 at 9:34 AM Christian König  
> wrote:
>> Am 02.04.20 um 04:34 schrieb Jann Horn:
>> > [x86 folks in CC so that they can chime in on the precise rules for
>> > this stuff]

They are pretty simple.

Any code using FPU needs to be completely isolated from regular code
either by using inline asm or by moving it to a different compilation
unit. The invocations need fpu_begin/end() of course.

>> > I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/
>> > turn on floating-point instructions in the compiler flags
>> > (-mhard-float, -msse and -msse2) in order to make the "float" and
>> > "double" types usable from C code without requiring helper functions.
>> >
>> > However, as far as I know, code running in normal kernel context isn't
>> > allowed to use floating-point registers without special protection
>> > using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also
>> > require that the protected code never blocks). If you violate that
>> > rule, that can lead to various issues - among other things, I think
>> > the kernel will clobber userspace FPU register state, and I think the
>> > kernel code can blow up if a context switch happens at the wrong time,
>> > since in-kernel task switches don't preserve FPU state.
>> >
>> > Is there some hidden trick I'm missing that makes it okay to use FPU
>> > registers here?
>> >
>> > I would try testing this, but unfortunately none of the AMD devices I
>> > have here have the appropriate graphics hardware...
>>
>> yes, using the floating point calculations in the display code has been
>> a source of numerous problems and confusion in the past.
>>
>> The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind
>> the DC_FP_START() and DC_FP_END() macros which are supposed to hide the
>> architecture depend handling for x86 and PPC64.
>
> Hmm... but as far as I can tell, you're using those macros from inside
> functions that are already compiled with the FPU on:
>
>  - drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c uses the macros,
> but is already compiled with calcs_ccflags
>  - drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c uses the
> macros, but is already compiled with "-mhard-float -msse -msse2"
>  - drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c uses the
> macros, but is already compiled with "-mhard-float -msse -msse2"
>
> AFAIK as soon as you enter any function in any file compiled with FPU
> instructions, you may encounter SSE instructions, e.g. via things like
> compiler-generated memory-zeroing code - not just when you're actually
> using doubles or floats.

That's correct and this will silently cause FPU state corruption ...

We really need objtool support to validate that.

Peter, now that we know how to do it (noinstr, clac/stac) we can emit
annotations (see patch below) and validate that any FPU instruction is
inside a safe region. Hmm?

Thanks,

tglx

8<---
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -19,8 +19,27 @@
  * If you intend to use the FPU in softirq you need to check first with
  * irq_fpu_usable() if it is possible.
  */
-extern void kernel_fpu_begin(void);
-extern void kernel_fpu_end(void);
+extern void __kernel_fpu_begin(void);
+extern void __kernel_fpu_end(void);
+
+static inline void kernel_fpu_begin(void)
+{
+   asm volatile("%c0:\n\t"
+".pushsection .discard.fpu_begin\n\t"
+".long %c0b - .\n\t"
+".popsection\n\t" : : "i" (__COUNTER__));
+   __kernel_fpu_begin();
+}
+
+static inline void kernel_fpu_end(void)
+{
+   __kernel_fpu_end();
+   asm volatile("%c0:\n\t"
+".pushsection .discard.fpu_end\n\t"
+".long %c0b - .\n\t"
+".popsection\n\t" : : "i" (__COUNTER__));
+}
+
 extern bool irq_fpu_usable(void);
 extern void fpregs_mark_activate(void);
 
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -82,7 +82,7 @@ bool irq_fpu_usable(void)
 }
 EXPORT_SYMBOL(irq_fpu_usable);
 
-void kernel_fpu_begin(void)
+void __kernel_fpu_begin(void)
 {
preempt_disable();
 
@@ -102,16 +102,16 @@ void kernel_fpu_begin(void)
}
__cpu_invalidate_fpregs_state();
 }
-EXPORT_SYMBOL_GPL(kernel_fpu_begin);
+EXPORT_SYMBOL_GPL(__kernel_fpu_begin);
 
-void kernel_fpu_end(void)
+void __kernel_fpu_end(void)
 {
WARN_ON_FPU(!this_cpu_read(in_kernel_fpu));
 
this_cpu_write(in_kernel_fpu, false);
preempt_enable();
 }
-EXPORT_SYMBOL_GPL(kernel_fpu_end);
+EXPORT_SYMBOL_GPL(__kernel_fpu_end);
 
 /*
  * Save the FPU state (mark it for reload if necessary):



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH RFC PKS/PMEM 05/58] kmap: Introduce k[un]map_thread

2020-11-10 Thread Thomas Gleixner
Ira,

On Fri, Oct 09 2020 at 12:49, ira weiny wrote:
> From: Ira Weiny 
>
> To correctly support the semantics of kmap() with Kernel protection keys
> (PKS), kmap() may be required to set the protections on multiple
> processors (globally).  Enabling PKS globally can be very expensive
> depending on the requested operation.  Furthermore, enabling a domain
> globally reduces the protection afforded by PKS.
>
> Most kmap() (Aprox 209 of 229) callers use the map within a single thread and
> have no need for the protection domain to be enabled globally.  However, the
> remaining callers do not follow this pattern and, as best I can tell, expect
> the mapping to be 'global' and available to any thread who may access the
> mapping.[1]
>
> We don't anticipate global mappings to pmem, however in general there is a
> danger in changing the semantics of kmap().  Effectively, this would cause an
> unresolved page fault with little to no information about why the failure
> occurred.
>
> To resolve this a number of options were considered.
>
> 1) Attempt to change all the thread local kmap() calls to kmap_atomic()[2]
> 2) Introduce a flags parameter to kmap() to indicate if the mapping should be
>global or not
> 3) Change ~20 call sites to 'kmap_global()' to indicate that they require a
>global enablement of the pages.
> 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is 
> to
>be used within that thread of execution only
>
> Option 1 is simply not feasible.  Option 2 would require all of the call sites
> of kmap() to change.  Option 3 seems like a good minimal change but there is a
> danger that new code may miss the semantic change of kmap() and not get the
> behavior the developer intended.  Therefore, #4 was chosen.

There is Option #5:

Convert the thread local kmap() invocations to the proposed kmap_local()
interface which is coming along [1].

That solves a couple of issues:

 1) It relieves the current kmap_atomic() usage sites from the implict
pagefault/preempt disable semantics which apply even when
CONFIG_HIGHMEM is disabled. kmap_local() still can be invoked from
atomic context.

 2) Due to #1 it allows to replace the conditional usage of kmap() and
kmap_atomic() for purely thread local mappings.

 3) It puts the burden on the HIGHMEM inflicted systems

 4) It is actually more efficient for most of the pure thread local use
cases on HIGHMEM inflicted systems because it avoids the overhead of
the global lock and the potential kmap slot exhaustion. A potential
preemption will be more expensive, but that's not really the case we
want to optimize for.

 5) It solves the RT issue vs. kmap_atomic()

So instead of creating yet another variety of kmap() which is just
scratching the particular PKRS itch, can we please consolidate all of
that on the wider reaching kmap_local() approach?

Thanks,

tglx
 
[1] https://lore.kernel.org/lkml/20201103092712.714480...@linutronix.de/

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH RFC PKS/PMEM 05/58] kmap: Introduce k[un]map_thread

2020-11-10 Thread Thomas Gleixner
On Mon, Nov 09 2020 at 20:59, Ira Weiny wrote:
> On Tue, Nov 10, 2020 at 02:13:56AM +0100, Thomas Gleixner wrote:
> Also, we can convert the new memcpy_*_page() calls to kmap_local() as well.
> [For now my patch just uses kmap_atomic().]
>
> I've not looked at all of the patches in your latest version.  Have you
> included converting any of the kmap() call sites?  I thought you were more
> focused on converting the kmap_atomic() to kmap_local()?

I did not touch any of those yet, but it's a logical consequence to
convert all kmap() instances which are _not_ creating a global mapping
over to it.

Thanks,

tglx

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH v3 1/2] cpu/smt: add a notifier to notify the SMT changes

2023-03-31 Thread Thomas Gleixner
On Fri, Mar 31 2023 at 05:49, WenYou Yang wrote:


Removing pointlessly copied mail headers. Please fix your email
client


>> >
>> > So what do you want to have happen when someone goes and manually
>> > offlines all the SMT siblings using
>> > /sys/devices/system/cpu/cpu*/online
>> > ?
>> 
>> I don't consider this situation.  Any suggestions will be deeply appreciated.
>
> Hi Peter,
>
> I don't find a good method to handle this situation.
> Yes, manually offlining all the SMT sibling will get the same result of SMT 
> disabling on the fly.
>
> Actually, the normal way to enable/disable SMT on the fly is to echo on/off > 
> /sys/device/system/cpu/smt/control

That's the most convenient way, right.

But why do we need a kernel notifier for this, if you can do the same
with a sysfs knob for your driver?

Then user space can fiddle with SMT control in sysfs and afterwards tell
the driver that it should reconfigure.

That makes a ton more sense than this random notifier.

Thanks,

tglx