Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-09-08 Thread Pavel Machek
On Sat 2020-08-22 11:51:56, Sedat Dilek wrote:
> On Sat, Aug 22, 2020 at 11:23 AM Sedat Dilek  wrote:
> >
> > On Sat, Aug 22, 2020 at 10:42 AM Segher Boessenkool
> >  wrote:
> > >
> > > Hi Arvind,
> > >
> > > On Fri, Aug 21, 2020 at 11:55:52PM -0400, Arvind Sankar wrote:
> > > > Cc Segher.
> > > >
> > > > Segher, we were looking at gcc PR82602, where IRA could reorder volatile
> > > > asm's (reported on ARM). The fix was backported to gcc-6.
> > >
> > > I know ;-)
> > >
> > > > Do you know if
> > > > there is any reason the problem couldn't occur on x86 on older gcc
> > > > without the fix?
> > >
> > > No, I see no particular reason, at least GCC 5 seems vulnerable.  (The
> > > GCC 5 release branch was closed at the time this bug report was made,
> > > already).  There is no reason I see why it would work on x86 but fail
> > > elsewhere, either.
> > >
> >
> > [1] says:
> >
> > Current Minimal Requirements
> > ...
> > == ===  
> > 
> > ProgramMinimal version   Command to check the version
> > == ===  
> > 
> > GNU C  4.9  gcc --version
> >
> > - Sedat -
> >
> > [1] 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst#n32
> 
> [ CC Miguel Ojeda (Compiler Attributes maintainer) ]
> 
> There exist gcc-4.8 and gcc-4.9 for Debian/jessie where EOL was June
> 30, 2020 (see [1] and [2]).
> 
> In the latest available version "4.9.2-10+deb8u1" I see no PR82602 was
> backported (see [3] and [4]).
> 
> I am asking myself who is using such ancient compilers?
> Recently, I threw away GCC-8 from my Debian system.

I do have 4.9.2 on some systems. They work well, and are likely to compile
significantly faster than newer ones.

Please don't break them.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


RE: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-23 Thread David Laight
From: Arvind Sankar
> Sent: 22 August 2020 22:17
...
> Assuming we don't want to risk removing force_order, I'd suggest
> - make it an input/output operand, so it enforces ordering fully.
> - either restrict it to gcc < 8, or just provide a proper definition in
>   some file (maybe arch/x86/kernel/cpu/common.c)?

Is it possible to replace __force_order with a symbol that the
linker scripts defines?
Or just define __force_order in the linker script to some
'random' constant (eg 0).

ISTM that adding "m"(__force_order) to asm volatile can do no harm.
Especially for accesses to CRn and MSRn (etc) which might have obscure
side effects.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-22 Thread Arvind Sankar
On Sat, Aug 22, 2020 at 05:10:21PM -0700, Linus Torvalds wrote:
> On Sat, Aug 22, 2020 at 4:11 PM Arvind Sankar  wrote:
> >
> > Actually, is a memory clobber required for correctness? Memory accesses
> > probably shouldn't be reordered across a CRn write. Is asm volatile
> > enough to stop that or do you need a memory clobber?
> 
> You do need a memory clobber if you really care about ordering wrt
> normal memory references.
> 
> That said, I'm not convinced we do care here. Normal memory accesses
> (as seen by the compiler) should be entirely immune to any changes we
> do wrt CRx registers.
> 
> Because code that really fundamentally changes kernel mappings or
> access rules is already written in low-level assembler (eg the entry
> routines or bootup).
> 
> Anything that relies on the more subtle changes (ie user space
> accesses etc) should already be ordered by other things - usually by
> the fact that they are also "asm volatile".
> 
> But hey, maybe somebody can come up with an exception to that.
> 
> Linus

I'm sure in practice it can't happen, as any memory accesses happening
immediately around write_cr3() are probably mapped the same in both
pagetables anyway, but eg cleanup_trampoline() in
arch/x86/boot/compressed/pgtable_64.c:

memcpy(pgtable, trampoline_pgtable, PAGE_SIZE);
native_write_cr3((unsigned long)pgtable);

There'll probably be trouble if the compiler were to reverse the order
here.

We could actually make write_crn() use memory clobber, and read_crn()
use "m"(*(int *)0x1000) as an input operand. A bit hacky, but no global
variable needed. And maybe read_crn() doesn't even have to be volatile.

Also, if we look at the rdmsr/wrmsr pair, there's no force_order
equivalent AFAICT. wrmsr has a memory clobber, but the asm volatile-ness
is the only thing enforcing read/write ordering.


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-22 Thread Linus Torvalds
On Sat, Aug 22, 2020 at 4:11 PM Arvind Sankar  wrote:
>
> Actually, is a memory clobber required for correctness? Memory accesses
> probably shouldn't be reordered across a CRn write. Is asm volatile
> enough to stop that or do you need a memory clobber?

You do need a memory clobber if you really care about ordering wrt
normal memory references.

That said, I'm not convinced we do care here. Normal memory accesses
(as seen by the compiler) should be entirely immune to any changes we
do wrt CRx registers.

Because code that really fundamentally changes kernel mappings or
access rules is already written in low-level assembler (eg the entry
routines or bootup).

Anything that relies on the more subtle changes (ie user space
accesses etc) should already be ordered by other things - usually by
the fact that they are also "asm volatile".

But hey, maybe somebody can come up with an exception to that.

Linus


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-22 Thread Arvind Sankar
On Sat, Aug 22, 2020 at 02:08:27PM -0700, Linus Torvalds wrote:
> However, in this case, can we just leave that old "__force_order" hack
> alone, and to work around the clang thing, just make a dummy
> definition of it anyway.
> 
> Alternatively, just use the memory clobber. We use memory clobbers
> elsewhere in inline asms to make sure they are serialized, it's not
> normally a huge problem. Both clang and gcc should be smart enough to
> know that a memory clobber doesn't matter for things like local
> variables etc that might be on stack but have never had their address
> taken.
> 
> Or are there other cases than that particular __force_order thing that
> people now worry about?
> 
>  Linus

Actually, is a memory clobber required for correctness? Memory accesses
probably shouldn't be reordered across a CRn write. Is asm volatile
enough to stop that or do you need a memory clobber?

Replacing force_order with memory clobber introduces a few extra
instructions (testing with defconfig), but only in x86-64
hibernate/reboot/sleep code and early_ioremap_init on x86-32.


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-22 Thread Arvind Sankar
On Sat, Aug 22, 2020 at 08:17:32PM +0200, Miguel Ojeda wrote:
> On Sat, Aug 22, 2020 at 11:52 AM Sedat Dilek  wrote:
> >
> > I am asking myself who is using such ancient compilers?
> 
> There are many users/companies using older versions of compilers,
> kernels and everything. GCC <= 4.9 will still be used/supported (by
> third parties) for a handful of years at least.
> 
> However, the important question is whether those users/companies care
> about running the latest kernels. Many of those definitely do not want
> to touch their kernel either. For those that do, there are several
> longterms to pick from that still support 4.9, as well as other
> workarounds.
> 
> Thus I am usually in favor of raising the minimum whenever new hacks
> are required to be added. On the other hand, we already raised the
> version twice this year and it is not clear to me what is the minimum
> version we would need to go for to ensure this does not bite us.
> 
> > If this is a real problem with GCC version <= 5, so can this be moved
> > to a GCC specific include header-file?
> > Thinking of include/linux/compiler-gcc.h or
> > include/linux/compiler_types.h with a GCC-VERSION check?
> 
> That would be better if it can be done, yes.
> 
> Cheers,
> Miguel

The fix landed in gcc 6.5, 7.3 and 8.1. The bug is presumably quite
difficult to actually trigger. As a sample data point, I verified that
7.1 vs 7.1+fix have no differences on 32-bit and 64-bit x86 defconfigs,
on current mainline.

Assuming we don't want to risk removing force_order, I'd suggest
- make it an input/output operand, so it enforces ordering fully.
- either restrict it to gcc < 8, or just provide a proper definition in
  some file (maybe arch/x86/kernel/cpu/common.c)?

Thanks.


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-22 Thread Linus Torvalds
On Sat, Aug 22, 2020 at 11:17 AM Miguel Ojeda
 wrote:
>
> However, the important question is whether those users/companies care
> about running the latest kernels. Many of those definitely do not want
> to touch their kernel either. For those that do, there are several
> longterms to pick from that still support 4.9, as well as other
> workarounds.
>
> Thus I am usually in favor of raising the minimum whenever new hacks
> are required to be added. On the other hand, we already raised the
> version twice this year and it is not clear to me what is the minimum
> version we would need to go for to ensure this does not bite us.

Yeah. The good news is that I haven't seen a lot of pushback on the
gcc version updates so far. I was expecting some complaints. I haven't
seen a single one.

That may be because people did end up finding it very onerous and
complained internally on channels I haven't seen, but it might also be
indicative of us having perhaps been a bit too timid about compiler
version updates.

However, in this case, can we just leave that old "__force_order" hack
alone, and to work around the clang thing, just make a dummy
definition of it anyway.

Alternatively, just use the memory clobber. We use memory clobbers
elsewhere in inline asms to make sure they are serialized, it's not
normally a huge problem. Both clang and gcc should be smart enough to
know that a memory clobber doesn't matter for things like local
variables etc that might be on stack but have never had their address
taken.

Or are there other cases than that particular __force_order thing that
people now worry about?

 Linus


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-22 Thread Miguel Ojeda
On Sat, Aug 22, 2020 at 11:52 AM Sedat Dilek  wrote:
>
> I am asking myself who is using such ancient compilers?

There are many users/companies using older versions of compilers,
kernels and everything. GCC <= 4.9 will still be used/supported (by
third parties) for a handful of years at least.

However, the important question is whether those users/companies care
about running the latest kernels. Many of those definitely do not want
to touch their kernel either. For those that do, there are several
longterms to pick from that still support 4.9, as well as other
workarounds.

Thus I am usually in favor of raising the minimum whenever new hacks
are required to be added. On the other hand, we already raised the
version twice this year and it is not clear to me what is the minimum
version we would need to go for to ensure this does not bite us.

> If this is a real problem with GCC version <= 5, so can this be moved
> to a GCC specific include header-file?
> Thinking of include/linux/compiler-gcc.h or
> include/linux/compiler_types.h with a GCC-VERSION check?

That would be better if it can be done, yes.

Cheers,
Miguel


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-22 Thread Arnd Bergmann
On Sat, Aug 22, 2020 at 12:26 PM Segher Boessenkool
 wrote:
> [ There is GCC 4.9.4, no one should use an older 4.9. ]
>
> I mentioned 5 for a reason: the whole function this patch is to did not
> exist before then!  That does not mean the bug existed or did not exist
> before GCC 5, but it does for example mean that a backport to 4.9 or
> older isn't trivial at all.
>
> > I am asking myself who is using such ancient compilers?
>
> Some distros have a GCC 4.8 as system compiler.  We allow building GCC
> itself with a compiler that far back, for various reasons as well (and
> this is very sharp already, the last mainline GCC 4.8 release is from
> June 2015, not all that long ago at all).
>
> But, one reason this works is because people actually test it.  Does
> anyone actually test the kernel with old compilers?  It isn't hard to
> build a new compiler (because we make sure building a newer compiler
> works with older compilers, etc. :-) ), and as you say, most distros
> have newer compilers available nowadays.

We only recently changed the minimum from 4.6 to 4.8, and
subsequently to 4.9. Most people have fairly recent compilers,
but there are a number of notable kernel developers that
intentionally stick to old versions because of compile speed.

Each major compiler release adds about 4% overhead in total time
to compile a kernel, so between gcc-4.6 and gcc-11 you add over
50% in build time.

  Arnd


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-22 Thread Segher Boessenkool
On Sat, Aug 22, 2020 at 11:51:56AM +0200, Sedat Dilek wrote:
> On Sat, Aug 22, 2020 at 11:23 AM Sedat Dilek  wrote:
> >
> > On Sat, Aug 22, 2020 at 10:42 AM Segher Boessenkool
> >  wrote:
> > >
> > > Hi Arvind,
> > >
> > > On Fri, Aug 21, 2020 at 11:55:52PM -0400, Arvind Sankar wrote:
> > > > Cc Segher.
> > > >
> > > > Segher, we were looking at gcc PR82602, where IRA could reorder volatile
> > > > asm's (reported on ARM). The fix was backported to gcc-6.
> > >
> > > I know ;-)
> > >
> > > > Do you know if
> > > > there is any reason the problem couldn't occur on x86 on older gcc
> > > > without the fix?
> > >
> > > No, I see no particular reason, at least GCC 5 seems vulnerable.  (The
> > > GCC 5 release branch was closed at the time this bug report was made,
> > > already).  There is no reason I see why it would work on x86 but fail
> > > elsewhere, either.

> There exist gcc-4.8 and gcc-4.9 for Debian/jessie where EOL was June
> 30, 2020 (see [1] and [2]).
> 
> In the latest available version "4.9.2-10+deb8u1" I see no PR82602 was
> backported (see [3] and [4]).

[ There is GCC 4.9.4, no one should use an older 4.9. ]

I mentioned 5 for a reason: the whole function this patch is to did not
exist before then!  That does not mean the bug existed or did not exist
before GCC 5, but it does for example mean that a backport to 4.9 or
older isn't trivial at all.

> I am asking myself who is using such ancient compilers?

Some distros have a GCC 4.8 as system compiler.  We allow building GCC
itself with a compiler that far back, for various reasons as well (and
this is very sharp already, the last mainline GCC 4.8 release is from
June 2015, not all that long ago at all).

But, one reason this works is because people actually test it.  Does
anyone actually test the kernel with old compilers?  It isn't hard to
build a new compiler (because we make sure building a newer compiler
works with older compilers, etc. :-) ), and as you say, most distros
have newer compilers available nowadays.


Segher


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-22 Thread Sedat Dilek
On Sat, Aug 22, 2020 at 11:23 AM Sedat Dilek  wrote:
>
> On Sat, Aug 22, 2020 at 10:42 AM Segher Boessenkool
>  wrote:
> >
> > Hi Arvind,
> >
> > On Fri, Aug 21, 2020 at 11:55:52PM -0400, Arvind Sankar wrote:
> > > Cc Segher.
> > >
> > > Segher, we were looking at gcc PR82602, where IRA could reorder volatile
> > > asm's (reported on ARM). The fix was backported to gcc-6.
> >
> > I know ;-)
> >
> > > Do you know if
> > > there is any reason the problem couldn't occur on x86 on older gcc
> > > without the fix?
> >
> > No, I see no particular reason, at least GCC 5 seems vulnerable.  (The
> > GCC 5 release branch was closed at the time this bug report was made,
> > already).  There is no reason I see why it would work on x86 but fail
> > elsewhere, either.
> >
>
> [1] says:
>
> Current Minimal Requirements
> ...
> == ===  
> 
> ProgramMinimal version   Command to check the version
> == ===  
> 
> GNU C  4.9  gcc --version
>
> - Sedat -
>
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst#n32

[ CC Miguel Ojeda (Compiler Attributes maintainer) ]

There exist gcc-4.8 and gcc-4.9 for Debian/jessie where EOL was June
30, 2020 (see [1] and [2]).

In the latest available version "4.9.2-10+deb8u1" I see no PR82602 was
backported (see [3] and [4]).

I am asking myself who is using such ancient compilers?
Recently, I threw away GCC-8 from my Debian system.

If this is a real problem with GCC version <= 5, so can this be moved
to a GCC specific include header-file?
Thinking of include/linux/compiler-gcc.h or
include/linux/compiler_types.h with a GCC-VERSION check?

Thoughts?

- Sedat -

P.S.: Yesterday, I built with dropping __force_order entirely and LLVM
toolchain v11.0.0-rc2 on Debian/unstable AMD64 on top of recent Linux
v5.9-rc1+.

[1] https://packages.debian.org/search?keywords=gcc-4
[2] https://wiki.debian.org/LTS
[3] https://sources.debian.org/src/gcc-4.9/
[4] https://sources.debian.org/src/gcc-4.9/4.9.2-10+deb8u1/debian/patches/


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-22 Thread Sedat Dilek
On Sat, Aug 22, 2020 at 10:42 AM Segher Boessenkool
 wrote:
>
> Hi Arvind,
>
> On Fri, Aug 21, 2020 at 11:55:52PM -0400, Arvind Sankar wrote:
> > Cc Segher.
> >
> > Segher, we were looking at gcc PR82602, where IRA could reorder volatile
> > asm's (reported on ARM). The fix was backported to gcc-6.
>
> I know ;-)
>
> > Do you know if
> > there is any reason the problem couldn't occur on x86 on older gcc
> > without the fix?
>
> No, I see no particular reason, at least GCC 5 seems vulnerable.  (The
> GCC 5 release branch was closed at the time this bug report was made,
> already).  There is no reason I see why it would work on x86 but fail
> elsewhere, either.
>

[1] says:

Current Minimal Requirements
...
== ===  
ProgramMinimal version   Command to check the version
== ===  
GNU C  4.9  gcc --version

- Sedat -

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst#n32


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-22 Thread Segher Boessenkool
Hi Arvind,

On Fri, Aug 21, 2020 at 11:55:52PM -0400, Arvind Sankar wrote:
> Cc Segher.
> 
> Segher, we were looking at gcc PR82602, where IRA could reorder volatile
> asm's (reported on ARM). The fix was backported to gcc-6.

I know ;-)

> Do you know if
> there is any reason the problem couldn't occur on x86 on older gcc
> without the fix?

No, I see no particular reason, at least GCC 5 seems vulnerable.  (The
GCC 5 release branch was closed at the time this bug report was made,
already).  There is no reason I see why it would work on x86 but fail
elsewhere, either.


Segher


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-21 Thread Arvind Sankar
On Sat, Aug 22, 2020 at 02:43:08AM +0200, Thomas Gleixner wrote:
> On Fri, Aug 21 2020 at 16:16, Nick Desaulniers wrote:
> > On Fri, Aug 21, 2020 at 4:04 PM Arvind Sankar  wrote:
> >> On Fri, Aug 21, 2020 at 02:37:48AM +0200, Thomas Gleixner wrote:
> >> The gcc bug I linked to earlier is only fixed in gcc-6 onwards. Is that
> >
> > (based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602#c14)
> >
> >> good enough to remove force_order? I can test gcc-4.9 and gcc-5 to check
> >> if it would currently have any impact.
> 
> And that test tells you what exactly? That your particular build of
> those compilers does not have the problem. A truly scientific approach.

More that the current kernel code doesn't have that problem, but yeah,
it might creep in later.

> 
> > I think checking the disassemblies with a pre-gcc-6 would be good
> > enough then; that bug isn't specific to this particular case.
> 
> What? I clearly want a statement from the GCC people that this won't
> happen on pre gcc6 compilers and not just some 'works for me' statement
> based on a randomly picked compiler build.

Presumably also from clang that the compiler does have protections
against this, as opposed to doesn't happen today.

> 
> Thanks,
> 
> tglx

Cc Segher.

Segher, we were looking at gcc PR82602, where IRA could reorder volatile
asm's (reported on ARM). The fix was backported to gcc-6. Do you know if
there is any reason the problem couldn't occur on x86 on older gcc
without the fix?

Thanks.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-21 Thread Thomas Gleixner
On Fri, Aug 21 2020 at 16:16, Nick Desaulniers wrote:
> On Fri, Aug 21, 2020 at 4:04 PM Arvind Sankar  wrote:
>> On Fri, Aug 21, 2020 at 02:37:48AM +0200, Thomas Gleixner wrote:
>> The gcc bug I linked to earlier is only fixed in gcc-6 onwards. Is that
>
> (based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602#c14)
>
>> good enough to remove force_order? I can test gcc-4.9 and gcc-5 to check
>> if it would currently have any impact.

And that test tells you what exactly? That your particular build of
those compilers does not have the problem. A truly scientific approach.

> I think checking the disassemblies with a pre-gcc-6 would be good
> enough then; that bug isn't specific to this particular case.

What? I clearly want a statement from the GCC people that this won't
happen on pre gcc6 compilers and not just some 'works for me' statement
based on a randomly picked compiler build.

Thanks,

tglx


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-21 Thread Arvind Sankar
On Fri, Aug 21, 2020 at 04:16:56PM -0700, Nick Desaulniers wrote:
> On Fri, Aug 21, 2020 at 4:04 PM Arvind Sankar  wrote:
> >
> > On Fri, Aug 21, 2020 at 02:37:48AM +0200, Thomas Gleixner wrote:
> > > On Thu, Aug 20 2020 at 09:06, Arvind Sankar wrote:
> > > > I don't think that's an issue, or at least, not one where force_order
> > > > helps.
> > > >
> > > > If the source for foo() is not visible to the compiler, the only reason
> > > > force_order prevents the reordering is because foo() might have
> > > > references to it, but equally foo() might have volatile asm, so the
> > > > reordering isn't possible anyway.
> > > >
> > > > If the source is visible, force_order won't prevent any reordering
> > > > except across references to force_order, but the only references are
> > > > from the volatile asm's which already prevent reordering.
> > > >
> > > > I think force_order can only help with buggy compilers, and for those it
> > > > should really have been an input-output operand -- it wouldn't currently
> > > > do anything to prevent cr writes from being reordered.
> 
> I agree 100%.  From the link to GCC docs, the code in question doesn't
> even follow the pattern from the doc from informing the compiler of
> any dependency, it just looks like !@#$.
> 
> > >
> > > Fair enough. Care to provide a patch which has the collected wisdom of
> > > this thread in the changelog?
> > >
> > > Thanks,
> > >
> > > tglx
> >
> > The gcc bug I linked to earlier is only fixed in gcc-6 onwards. Is that
> 
> (based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602#c14)

I actually checked gcc's git repo too. The fix is not there in gcc-4.9
and gcc-5.

> 
> > good enough to remove force_order? I can test gcc-4.9 and gcc-5 to check
> > if it would currently have any impact.
> 
> I think checking the disassemblies with a pre-gcc-6 would be good
> enough then; that bug isn't specific to this particular case.
> 
> > CBL guys, can you confirm that clang also will not reorder volatile asm?
> 
> Full disassemblies of vmlinux pre vs post __force_order removal are
> the same.  That's pretty good actually; I was worried for a code base
> of this size whether two compiles would produce the exact same
> disassemblies; I know the version strings are timestamped, for
> instance, but I didn't compare data, just .text.  I should triple
> check i386, and some of the ko's that use modified functions.  I'd be
> happy to help provide a tested by tag for numerous configurations with
> Clang.
> 
> Attaching the diff I was testing, feel free to add a commit message.
> --
> Thanks,
> ~Nick Desaulniers

Thanks, will write it up over the weekend.


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-21 Thread Nick Desaulniers
On Fri, Aug 21, 2020 at 4:04 PM Arvind Sankar  wrote:
>
> On Fri, Aug 21, 2020 at 02:37:48AM +0200, Thomas Gleixner wrote:
> > On Thu, Aug 20 2020 at 09:06, Arvind Sankar wrote:
> > > I don't think that's an issue, or at least, not one where force_order
> > > helps.
> > >
> > > If the source for foo() is not visible to the compiler, the only reason
> > > force_order prevents the reordering is because foo() might have
> > > references to it, but equally foo() might have volatile asm, so the
> > > reordering isn't possible anyway.
> > >
> > > If the source is visible, force_order won't prevent any reordering
> > > except across references to force_order, but the only references are
> > > from the volatile asm's which already prevent reordering.
> > >
> > > I think force_order can only help with buggy compilers, and for those it
> > > should really have been an input-output operand -- it wouldn't currently
> > > do anything to prevent cr writes from being reordered.

I agree 100%.  From the link to GCC docs, the code in question doesn't
even follow the pattern from the doc from informing the compiler of
any dependency, it just looks like !@#$.

> >
> > Fair enough. Care to provide a patch which has the collected wisdom of
> > this thread in the changelog?
> >
> > Thanks,
> >
> > tglx
>
> The gcc bug I linked to earlier is only fixed in gcc-6 onwards. Is that

(based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602#c14)

> good enough to remove force_order? I can test gcc-4.9 and gcc-5 to check
> if it would currently have any impact.

I think checking the disassemblies with a pre-gcc-6 would be good
enough then; that bug isn't specific to this particular case.

> CBL guys, can you confirm that clang also will not reorder volatile asm?

Full disassemblies of vmlinux pre vs post __force_order removal are
the same.  That's pretty good actually; I was worried for a code base
of this size whether two compiles would produce the exact same
disassemblies; I know the version strings are timestamped, for
instance, but I didn't compare data, just .text.  I should triple
check i386, and some of the ko's that use modified functions.  I'd be
happy to help provide a tested by tag for numerous configurations with
Clang.

Attaching the diff I was testing, feel free to add a commit message.
--
Thanks,
~Nick Desaulniers


force_order.patch
Description: Binary data


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-21 Thread Arvind Sankar
On Fri, Aug 21, 2020 at 02:37:48AM +0200, Thomas Gleixner wrote:
> On Thu, Aug 20 2020 at 09:06, Arvind Sankar wrote:
> > I don't think that's an issue, or at least, not one where force_order
> > helps.
> >
> > If the source for foo() is not visible to the compiler, the only reason
> > force_order prevents the reordering is because foo() might have
> > references to it, but equally foo() might have volatile asm, so the
> > reordering isn't possible anyway.
> >
> > If the source is visible, force_order won't prevent any reordering
> > except across references to force_order, but the only references are
> > from the volatile asm's which already prevent reordering.
> >
> > I think force_order can only help with buggy compilers, and for those it
> > should really have been an input-output operand -- it wouldn't currently
> > do anything to prevent cr writes from being reordered.
> 
> Fair enough. Care to provide a patch which has the collected wisdom of
> this thread in the changelog?
> 
> Thanks,
> 
> tglx

The gcc bug I linked to earlier is only fixed in gcc-6 onwards. Is that
good enough to remove force_order? I can test gcc-4.9 and gcc-5 to check
if it would currently have any impact.

CBL guys, can you confirm that clang also will not reorder volatile asm?

Thanks.


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-20 Thread Thomas Gleixner
On Thu, Aug 20 2020 at 09:06, Arvind Sankar wrote:
> I don't think that's an issue, or at least, not one where force_order
> helps.
>
> If the source for foo() is not visible to the compiler, the only reason
> force_order prevents the reordering is because foo() might have
> references to it, but equally foo() might have volatile asm, so the
> reordering isn't possible anyway.
>
> If the source is visible, force_order won't prevent any reordering
> except across references to force_order, but the only references are
> from the volatile asm's which already prevent reordering.
>
> I think force_order can only help with buggy compilers, and for those it
> should really have been an input-output operand -- it wouldn't currently
> do anything to prevent cr writes from being reordered.

Fair enough. Care to provide a patch which has the collected wisdom of
this thread in the changelog?

Thanks,

tglx


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-20 Thread Arvind Sankar
On Thu, Aug 20, 2020 at 12:44:06PM +0200, Thomas Gleixner wrote:
> On Thu, Aug 13 2020 at 14:09, Arvind Sankar wrote:
> > On Thu, Aug 13, 2020 at 10:37:01AM -0700, Paul E. McKenney wrote:
> >> > Let me ask (hopefully) useful questions this time:
> >> > 
> >> >   Is a compiler allowed to reorder two 'asm volatile()'?
> >> > 
> >> >   Are there compilers (gcc >= 4.9 or other supported ones) which do that?
> >> 
> >> I would hope that the answer to both of these questions is "no"!
> >> 
> >> But I freely confess that I have been disappointed before on this sort
> >> of thing.  :-/
> >> 
> >>Thanx, Paul
> >
> > Ok, I found this, so gcc developers consider re-ordering volatile asm
> > wrt each other a bug at least.
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602
> 
> Yes. It prevents reordering of volatiles, but it does not necessarily
> prevent reorder of something like this:
> 
> asm volatile(...);
> foo();
> asm volatile(...);
> 
> it might turn that into
> 
> foo();
> asm volatile(...);
> asm volatile(...);
> 
> if I understood their discussion correctly. So removing this magic is
> not really straight forward.
> 
> Thanks,
> 
> tglx
> 
> 

I don't think that's an issue, or at least, not one where force_order
helps.

If the source for foo() is not visible to the compiler, the only reason
force_order prevents the reordering is because foo() might have
references to it, but equally foo() might have volatile asm, so the
reordering isn't possible anyway.

If the source is visible, force_order won't prevent any reordering
except across references to force_order, but the only references are
from the volatile asm's which already prevent reordering.

I think force_order can only help with buggy compilers, and for those it
should really have been an input-output operand -- it wouldn't currently
do anything to prevent cr writes from being reordered.

Thanks.


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-20 Thread Thomas Gleixner
On Thu, Aug 13 2020 at 14:09, Arvind Sankar wrote:
> On Thu, Aug 13, 2020 at 10:37:01AM -0700, Paul E. McKenney wrote:
>> > Let me ask (hopefully) useful questions this time:
>> > 
>> >   Is a compiler allowed to reorder two 'asm volatile()'?
>> > 
>> >   Are there compilers (gcc >= 4.9 or other supported ones) which do that?
>> 
>> I would hope that the answer to both of these questions is "no"!
>> 
>> But I freely confess that I have been disappointed before on this sort
>> of thing.  :-/
>> 
>>  Thanx, Paul
>
> Ok, I found this, so gcc developers consider re-ordering volatile asm
> wrt each other a bug at least.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602

Yes. It prevents reordering of volatiles, but it does not necessarily
prevent reorder of something like this:

asm volatile(...);
foo();
asm volatile(...);

it might turn that into

foo();
asm volatile(...);
asm volatile(...);

if I understood their discussion correctly. So removing this magic is
not really straight forward.

Thanks,

tglx




Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-16 Thread Sedat Dilek
GCC toolchain simply ignores if kaslr_64.o has __force_order means the
build ends up successfully whereas LLVM toolchain and IAS breaks and
the build stops and needs explicitly commit
df6d4f9db79c1a5d6f48b59db35ccd1e9ff9adfc ("x86/boot/compressed: Don't
declare __force_order in kaslr_64.c") reverted to fix this.
With the revert GCC toolchain is also fine.

Maybe it is good to revert that commit?

This is with [1]:

diff --git a/arch/x86/include/asm/special_insns.h
b/arch/x86/include/asm/special_insns.h
index 59a3e13204c3..e1c19c5ecd5e 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -17,7 +17,7 @@
  * all loads stores around it, which can hurt performance. Solution is to
  * use a variable and mimic reads and writes to it to enforce serialization
  */
-extern unsigned long __force_order;
+extern unsigned long __force_order __weak;

 void native_write_cr0(unsigned long val);

...and the patchset of "x86/boot: Remove run-time relocations from
compressed kernel" applied [3].

More details in [4].

- Sedat -

References:
[1] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703
[2] https://git.kernel.org/linus/df6d4f9db79c1a5d6f48b59db35ccd1e9ff9adfc
[3] https://lore.kernel.org/patchwork/project/lkml/list/?series=456251
[4] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674502114


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-15 Thread Sedat Dilek
On Sat, Aug 15, 2020 at 10:23 AM Sedat Dilek  wrote:
>
> On Sat, Aug 15, 2020 at 5:28 AM Sedat Dilek  wrote:
> >
> > On Sat, Aug 15, 2020 at 2:27 AM Nick Desaulniers
> >  wrote:
> > >
> > > On Fri, Aug 14, 2020 at 3:57 PM Nick Desaulniers
> > >  wrote:
> > > >
> > > > On Fri, Aug 14, 2020 at 2:19 PM Sedat Dilek  
> > > > wrote:
> > > > >
> > > > > On Fri, Aug 14, 2020 at 7:29 PM Sedat Dilek  
> > > > > wrote:
> > > > > >
> > > > > > Thanks for the proposal.
> > > > > >
> > > > > > I have adapted it to fit my patchset against Linux v5.8.
> > > > > >
> > > > > > Both Debian's GCC-10 and a snapshot version of LLVM toolchain
> > > > > > v11.0.0-rc1+ seems to be OK.
> > > > > >
> > > > >
> > > > > Yupp, OK.
> > > > >
> > > > > I was able to boot FreeDOS 1.2 VM in VirtualBox GUI.
> > > >
> > > > Hi Sedat,
> > > > Apologies, but it's not clear to me precisely which patch you tested.
> > > > Can you please confirm whether you tested:
> > > > 1. Arnd's patch that started this thread.
> > > > 2. My proposed diff adding -fno-addrsig to CFLAGS_powernow-k6.o.
> > > > 3. My proposed diff removing __force_order from the kernel.
> > > >
> > > > I'm hoping you were referring to testing 3., but it's not clear to me.
> > >
> > > Ah, sorry, I missed your comment on github:
> > > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674282107
> > >
> > > Ok, I will look at more disassembly next week and hopefully have a
> > > patch ready, with your tested by tag.
> > >
> >
> > Sorry for not being precise - I tested with solution (3.).
> > Later I added the diff I used as mentioned in your above comment.
> >
> > See [1]:
> >
> > > In a 2nd run building with a selfmade clang-11 and LLVM "bin"utils is 
> > > fine, too.
> >
> > I cannot say much to older versions of GCC and/or LLVM/Clang if
> > removing "__force_order" works fine.
> >
> > Another (4.) solution:
> > Sami tried successfully by adding "__weak" declaration with
> > CONFIG_LKDTM=m (see [2]).
> > I am OK if this works, too.
> >
> > Please, see my attachments.
> >
> > - Sedat -
> >
> > [1] 
> > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674340760
> > [2] 
> > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703
>
> Unfortunately, the diff from Sami does not work together with Arvind's
> patchset...
>
> x86/boot: Remove run-time relocations from compressed kernel
>
> ...which got included in  recently.
>
> I see the following:
>
>   ld.lld-11 -m elf_x86_64  -pie  --no-dynamic-linker -T
> arch/x86/boot/compressed/vmlinux.lds
> arch/x86/boot/compressed/kernel_info.o
> arch/x86/boot/compressed/head_64.o arch/x86/boot/compressed/misc.o
> arch/x86/boot/compressed/string.o arch/x86/boot/compressed/cmdline.o
> arch/x86/boot/compressed/error.o arch/x86/boot/compressed/piggy.o
> arch/x86/boot/compressed/cpuflags.o
> arch/x86/boot/compressed/early_serial_console.o
> arch/x86/boot/compressed/kaslr.o arch/x86/boot/compressed/kaslr_64.o
> arch/x86/boot/compressed/mem_encrypt.o
> arch/x86/boot/compressed/pgtable_64.o arch/x86/boot/compressed/acpi.o
> arch/x86/boot/compressed/efi_thunk_64.o
> drivers/firmware/efi/libstub/lib.a -o arch/x86/boot/compressed/vmlinux
> ld.lld-11: error: Unexpected GOT entries detected!
> ld.lld-11: error: Unexpected run-time relocations detected!
> ld.lld-11: error: Unexpected GOT entries detected!
> ld.lld-11: error: Unexpected run-time relocations detected!
> make[5]: *** [arch/x86/boot/compressed/Makefile:91:
> arch/x86/boot/compressed/vmlinux] Error 1
>
> When you need further informations, please let me know.
>
> - Sedat -
>
> [1] 
> https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703

When I revert...

commit df6d4f9db79c1a5d6f48b59db35ccd1e9ff9adfc
"x86/boot/compressed: Don't declare __force_order in kaslr_64.c"

...I can build, boot on bare metal and start FreeDOS VM in VirtualBox.

For more details see [2].

- Sedat -

[1] https://git.kernel.org/linus/df6d4f9db79c1a5d6f48b59db35ccd1e9ff9adfc
[2] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674378085


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-15 Thread Sedat Dilek
On Sat, Aug 15, 2020 at 5:28 AM Sedat Dilek  wrote:
>
> On Sat, Aug 15, 2020 at 2:27 AM Nick Desaulniers
>  wrote:
> >
> > On Fri, Aug 14, 2020 at 3:57 PM Nick Desaulniers
> >  wrote:
> > >
> > > On Fri, Aug 14, 2020 at 2:19 PM Sedat Dilek  wrote:
> > > >
> > > > On Fri, Aug 14, 2020 at 7:29 PM Sedat Dilek  
> > > > wrote:
> > > > >
> > > > > Thanks for the proposal.
> > > > >
> > > > > I have adapted it to fit my patchset against Linux v5.8.
> > > > >
> > > > > Both Debian's GCC-10 and a snapshot version of LLVM toolchain
> > > > > v11.0.0-rc1+ seems to be OK.
> > > > >
> > > >
> > > > Yupp, OK.
> > > >
> > > > I was able to boot FreeDOS 1.2 VM in VirtualBox GUI.
> > >
> > > Hi Sedat,
> > > Apologies, but it's not clear to me precisely which patch you tested.
> > > Can you please confirm whether you tested:
> > > 1. Arnd's patch that started this thread.
> > > 2. My proposed diff adding -fno-addrsig to CFLAGS_powernow-k6.o.
> > > 3. My proposed diff removing __force_order from the kernel.
> > >
> > > I'm hoping you were referring to testing 3., but it's not clear to me.
> >
> > Ah, sorry, I missed your comment on github:
> > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674282107
> >
> > Ok, I will look at more disassembly next week and hopefully have a
> > patch ready, with your tested by tag.
> >
>
> Sorry for not being precise - I tested with solution (3.).
> Later I added the diff I used as mentioned in your above comment.
>
> See [1]:
>
> > In a 2nd run building with a selfmade clang-11 and LLVM "bin"utils is fine, 
> > too.
>
> I cannot say much to older versions of GCC and/or LLVM/Clang if
> removing "__force_order" works fine.
>
> Another (4.) solution:
> Sami tried successfully by adding "__weak" declaration with
> CONFIG_LKDTM=m (see [2]).
> I am OK if this works, too.
>
> Please, see my attachments.
>
> - Sedat -
>
> [1] 
> https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674340760
> [2] 
> https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703

Unfortunately, the diff from Sami does not work together with Arvind's
patchset...

x86/boot: Remove run-time relocations from compressed kernel

...which got included in  recently.

I see the following:

  ld.lld-11 -m elf_x86_64  -pie  --no-dynamic-linker -T
arch/x86/boot/compressed/vmlinux.lds
arch/x86/boot/compressed/kernel_info.o
arch/x86/boot/compressed/head_64.o arch/x86/boot/compressed/misc.o
arch/x86/boot/compressed/string.o arch/x86/boot/compressed/cmdline.o
arch/x86/boot/compressed/error.o arch/x86/boot/compressed/piggy.o
arch/x86/boot/compressed/cpuflags.o
arch/x86/boot/compressed/early_serial_console.o
arch/x86/boot/compressed/kaslr.o arch/x86/boot/compressed/kaslr_64.o
arch/x86/boot/compressed/mem_encrypt.o
arch/x86/boot/compressed/pgtable_64.o arch/x86/boot/compressed/acpi.o
arch/x86/boot/compressed/efi_thunk_64.o
drivers/firmware/efi/libstub/lib.a -o arch/x86/boot/compressed/vmlinux
ld.lld-11: error: Unexpected GOT entries detected!
ld.lld-11: error: Unexpected run-time relocations detected!
ld.lld-11: error: Unexpected GOT entries detected!
ld.lld-11: error: Unexpected run-time relocations detected!
make[5]: *** [arch/x86/boot/compressed/Makefile:91:
arch/x86/boot/compressed/vmlinux] Error 1

When you need further informations, please let me know.

- Sedat -

[1] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-15 Thread Sedat Dilek
On Sat, Aug 15, 2020 at 12:46 PM Sedat Dilek  wrote:
>
> On Sat, Aug 15, 2020 at 10:23 AM Sedat Dilek  wrote:
> >
> > On Sat, Aug 15, 2020 at 5:28 AM Sedat Dilek  wrote:
> > >
> > > On Sat, Aug 15, 2020 at 2:27 AM Nick Desaulniers
> > >  wrote:
> > > >
> > > > On Fri, Aug 14, 2020 at 3:57 PM Nick Desaulniers
> > > >  wrote:
> > > > >
> > > > > On Fri, Aug 14, 2020 at 2:19 PM Sedat Dilek  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Aug 14, 2020 at 7:29 PM Sedat Dilek  
> > > > > > wrote:
> > > > > > >
> > > > > > > Thanks for the proposal.
> > > > > > >
> > > > > > > I have adapted it to fit my patchset against Linux v5.8.
> > > > > > >
> > > > > > > Both Debian's GCC-10 and a snapshot version of LLVM toolchain
> > > > > > > v11.0.0-rc1+ seems to be OK.
> > > > > > >
> > > > > >
> > > > > > Yupp, OK.
> > > > > >
> > > > > > I was able to boot FreeDOS 1.2 VM in VirtualBox GUI.
> > > > >
> > > > > Hi Sedat,
> > > > > Apologies, but it's not clear to me precisely which patch you tested.
> > > > > Can you please confirm whether you tested:
> > > > > 1. Arnd's patch that started this thread.
> > > > > 2. My proposed diff adding -fno-addrsig to CFLAGS_powernow-k6.o.
> > > > > 3. My proposed diff removing __force_order from the kernel.
> > > > >
> > > > > I'm hoping you were referring to testing 3., but it's not clear to me.
> > > >
> > > > Ah, sorry, I missed your comment on github:
> > > > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674282107
> > > >
> > > > Ok, I will look at more disassembly next week and hopefully have a
> > > > patch ready, with your tested by tag.
> > > >
> > >
> > > Sorry for not being precise - I tested with solution (3.).
> > > Later I added the diff I used as mentioned in your above comment.
> > >
> > > See [1]:
> > >
> > > > In a 2nd run building with a selfmade clang-11 and LLVM "bin"utils is 
> > > > fine, too.
> > >
> > > I cannot say much to older versions of GCC and/or LLVM/Clang if
> > > removing "__force_order" works fine.
> > >
> > > Another (4.) solution:
> > > Sami tried successfully by adding "__weak" declaration with
> > > CONFIG_LKDTM=m (see [2]).
> > > I am OK if this works, too.
> > >
> > > Please, see my attachments.
> > >
> > > - Sedat -
> > >
> > > [1] 
> > > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674340760
> > > [2] 
> > > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703
> >
> > Unfortunately, the diff from Sami does not work together with Arvind's
> > patchset...
> >
> > x86/boot: Remove run-time relocations from compressed kernel
> >
> > ...which got included in  recently.
> >
> > I see the following:
> >
> >   ld.lld-11 -m elf_x86_64  -pie  --no-dynamic-linker -T
> > arch/x86/boot/compressed/vmlinux.lds
> > arch/x86/boot/compressed/kernel_info.o
> > arch/x86/boot/compressed/head_64.o arch/x86/boot/compressed/misc.o
> > arch/x86/boot/compressed/string.o arch/x86/boot/compressed/cmdline.o
> > arch/x86/boot/compressed/error.o arch/x86/boot/compressed/piggy.o
> > arch/x86/boot/compressed/cpuflags.o
> > arch/x86/boot/compressed/early_serial_console.o
> > arch/x86/boot/compressed/kaslr.o arch/x86/boot/compressed/kaslr_64.o
> > arch/x86/boot/compressed/mem_encrypt.o
> > arch/x86/boot/compressed/pgtable_64.o arch/x86/boot/compressed/acpi.o
> > arch/x86/boot/compressed/efi_thunk_64.o
> > drivers/firmware/efi/libstub/lib.a -o arch/x86/boot/compressed/vmlinux
> > ld.lld-11: error: Unexpected GOT entries detected!
> > ld.lld-11: error: Unexpected run-time relocations detected!
> > ld.lld-11: error: Unexpected GOT entries detected!
> > ld.lld-11: error: Unexpected run-time relocations detected!
> > make[5]: *** [arch/x86/boot/compressed/Makefile:91:
> > arch/x86/boot/compressed/vmlinux] Error 1
> >
> > When you need further informations, please let me know.
> >
> > - Sedat -
> >
> > [1] 
> > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703
>
> When I revert...
>
> commit df6d4f9db79c1a5d6f48b59db35ccd1e9ff9adfc
> "x86/boot/compressed: Don't declare __force_order in kaslr_64.c"
>
> ...I can build, boot on bare metal and start FreeDOS VM in VirtualBox.
>
> For more details see [2].
>
> - Sedat -
>
> [1] https://git.kernel.org/linus/df6d4f9db79c1a5d6f48b59db35ccd1e9ff9adfc
> [2] 
> https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674378085

Fine with using Debian's GCC v10.2 and GNU/ld v2.35 (from binutils v2.35).

All details in [1].

[1] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674406068


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-14 Thread Nick Desaulniers
On Fri, Aug 14, 2020 at 3:57 PM Nick Desaulniers
 wrote:
>
> On Fri, Aug 14, 2020 at 2:19 PM Sedat Dilek  wrote:
> >
> > On Fri, Aug 14, 2020 at 7:29 PM Sedat Dilek  wrote:
> > >
> > > Thanks for the proposal.
> > >
> > > I have adapted it to fit my patchset against Linux v5.8.
> > >
> > > Both Debian's GCC-10 and a snapshot version of LLVM toolchain
> > > v11.0.0-rc1+ seems to be OK.
> > >
> >
> > Yupp, OK.
> >
> > I was able to boot FreeDOS 1.2 VM in VirtualBox GUI.
>
> Hi Sedat,
> Apologies, but it's not clear to me precisely which patch you tested.
> Can you please confirm whether you tested:
> 1. Arnd's patch that started this thread.
> 2. My proposed diff adding -fno-addrsig to CFLAGS_powernow-k6.o.
> 3. My proposed diff removing __force_order from the kernel.
>
> I'm hoping you were referring to testing 3., but it's not clear to me.

Ah, sorry, I missed your comment on github:
https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674282107

Ok, I will look at more disassembly next week and hopefully have a
patch ready, with your tested by tag.

-- 
Thanks,
~Nick Desaulniers


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-14 Thread Nick Desaulniers
On Fri, Aug 14, 2020 at 2:19 PM Sedat Dilek  wrote:
>
> On Fri, Aug 14, 2020 at 7:29 PM Sedat Dilek  wrote:
> >
> > Thanks for the proposal.
> >
> > I have adapted it to fit my patchset against Linux v5.8.
> >
> > Both Debian's GCC-10 and a snapshot version of LLVM toolchain
> > v11.0.0-rc1+ seems to be OK.
> >
>
> Yupp, OK.
>
> I was able to boot FreeDOS 1.2 VM in VirtualBox GUI.

Hi Sedat,
Apologies, but it's not clear to me precisely which patch you tested.
Can you please confirm whether you tested:
1. Arnd's patch that started this thread.
2. My proposed diff adding -fno-addrsig to CFLAGS_powernow-k6.o.
3. My proposed diff removing __force_order from the kernel.

I'm hoping you were referring to testing 3., but it's not clear to me.
I've been comparing the full disassemblies of vmlinux images when
built with Clang with 3 applied (they're no different, which is a
pleasant surprise, I didn't think kernel builds woulds would be fully
deterministic given the sheer amount of source).  I still need to
check the compressed vmlinux image, and various .ko's (XEN) that use
these read/write_cr[0,1,2,4]() functions, and then check them again
when built with GCC.  I'm falling behind a little trying to get our MC
organized for plumbers, as well as the end of intern season and
beginning of bi-annual "performance review" ("not stack ranking" I'm
told) at work.  If I don't find any differences, or if I do but don't
find them to be meaningful, I hope to push a more formal patch (rather
than just a diff) maybe next week.  I'll include my findings either
way; if it was 3 that you tested, I'll include your tested by tag when
sending.  Otherwise maybe you can help us test the more formal patch
next week?
-- 
Thanks,
~Nick Desaulniers


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-14 Thread Sedat Dilek
On Tue, Aug 4, 2020 at 2:09 AM 'Nick Desaulniers' via Clang Built
Linux  wrote:
>
> On Wed, May 27, 2020 at 6:53 AM Arnd Bergmann  wrote:
> >
> > When using the clang integrated assembler, we get a reference
> > to __force_order that should normally get ignored in a few
> > rare cases:
> >
> > ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] undefined!
> >
> > Add a 'static' definition so any file in which this happens can
> > have a local copy.
> >
> > Signed-off-by: Arnd Bergmann 
>
> Hi Arnd,
> Looks like
> $ ARCH=i386 make CC=clang LLVM_IAS=1 -j71
> defconfig+CONFIG_X86_POWERNOW_K6=m
> is the simplest reproducer.
>
> If I run
> $ llvm-readelf -s drivers/cpufreq/powernow-k6.o | grep __force_order
> 39:  0 NOTYPE  GLOBAL DEFAULT   UND __force_order
> we can indeed see an undefined reference to __force_order.
>
> If I build the .s file via
> $ ARCH=i386 make CC=clang LLVM_IAS=1 -j71 drivers/cpufreq/powernow-k6.s
> the only reference I see to __force_order is:
> 979   .addrsig_sym __force_order
>
> which is created by Clang's implicit -faddr-sig.  If I disable that
> for this file via:
>
> ```diff
> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> index f1b7e3dd6e5d..87d655d5af49 100644
> --- a/drivers/cpufreq/Makefile
> +++ b/drivers/cpufreq/Makefile
> @@ -28,6 +28,9 @@ obj-$(CONFIG_X86_ACPI_CPUFREQ)+=
> acpi-cpufreq.o
>  obj-$(CONFIG_X86_POWERNOW_K8)  += powernow-k8.o
>  obj-$(CONFIG_X86_PCC_CPUFREQ)  += pcc-cpufreq.o
>  obj-$(CONFIG_X86_POWERNOW_K6)  += powernow-k6.o
> +ifdef CONFIG_CC_IS_CLANG
> +CFLAGS_powernow-k6.o   += -fno-addrsig
> +endif
>  obj-$(CONFIG_X86_POWERNOW_K7)  += powernow-k7.o
>  obj-$(CONFIG_X86_LONGHAUL) += longhaul.o
>  obj-$(CONFIG_X86_E_POWERSAVER) += e_powersaver.o
> ```
> then the module links without error, and we get no hits for
> __force_order from llvm-readelf -s.  This makes me think there may be
> a bug in Clang generating address significance tables for global
> variables that are otherwise unused, resulting in such linkage
> failures. +pcc@ for that.
>
> I ran a creduce job on drivers/cpufreq/powernow-k6.i where I'd compile
> twice, one with the implicit default value of -faddr-sig and look for
> the undefined __force_order, and again with -fno-addrsig and ensure
> there was no undefined __force_order, which coughed up:
> extern int __force_order;
> int a(void) { asm("" : "=m"(__force_order)); return 0; }
> as the bare minimum for an address significant table.
> https://godbolt.org/z/cjfaqM
>
> I'll bet this is coming from the call to read_cr0() in
> powernow_k6_set_cpu_multiplier().  If __force_order is defined in
> arch/x86/boot/compressed/pgtable_64.c, then I'm not sure it's a good
> idea to build drivers/cpufreq/powernow-k6.c as a kernel module
> (CONFIG_X86_POWERNOW_K6=m) vs statically compiled in.  Wouldn't
> __force_order need to be EXPORT'ed for kernel modules to use it
> safely?
>
> > ---
> >  arch/x86/boot/compressed/pgtable_64.c | 2 ++
> >  arch/x86/include/asm/special_insns.h  | 7 +++
> >  2 files changed, 9 insertions(+)
> >
> > diff --git a/arch/x86/boot/compressed/pgtable_64.c 
> > b/arch/x86/boot/compressed/pgtable_64.c
> > index c8862696a47b..8595194cea41 100644
> > --- a/arch/x86/boot/compressed/pgtable_64.c
> > +++ b/arch/x86/boot/compressed/pgtable_64.c
> > @@ -12,7 +12,9 @@
> >   * It is not referenced from the code, but GCC < 5 with -fPIE would fail
> >   * due to an undefined symbol. Define it to make these ancient GCCs work.
> >   */
> > +#ifndef CONFIG_CC_IS_CLANG
> >  unsigned long __force_order;
> > +#endif
> >
> >  #define BIOS_START_MIN 0x2U/* 128K, less than this is 
> > insane */
> >  #define BIOS_START_MAX 0x9f000U/* 640K, absolute maximum */
> > diff --git a/arch/x86/include/asm/special_insns.h 
> > b/arch/x86/include/asm/special_insns.h
> > index 82436cb04ccf..7081e587c1ea 100644
> > --- a/arch/x86/include/asm/special_insns.h
> > +++ b/arch/x86/include/asm/special_insns.h
> > @@ -16,8 +16,15 @@
> >   * A memory clobber would solve the problem, but would prevent reordering 
> > of
> >   * all loads stores around it, which can hurt performance. Solution is to
> >   * use a variable and mimic reads and writes to it to enforce serialization
> > + *
> > + * Clang sometimes fails to kill the reference to the dummy variable, so
> > + * provide an actual copy.
> >   */
> > +#ifdef CONFIG_CC_IS_CLANG
> > +static unsigned long __force_order;
> > +#else
> >  extern unsigned long __force_order;
> > +#endif
> >
> >  void native_write_cr0(unsigned long val);
> >
> > --
> > 2.26.2
> >

Thanks for the proposal.

I have adapted it to fit my patchset against Linux v5.8.

Both Debian's GCC-10 and a snapshot version of LLVM toolchain
v11.0.0-rc1+ seems to be OK.

MAKE_OPTS="V=1 -j3 CC=gcc-10 LD=ld.bfd"
make $MAKE_OPTS arch/x86/kernel/cpu/common.o

MAKE_OPTS="V=1 -j3 HOSTCC=clang-11 HOST

Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-13 Thread Paul E. McKenney
On Thu, Aug 13, 2020 at 02:09:33PM -0400, Arvind Sankar wrote:
> On Thu, Aug 13, 2020 at 10:37:01AM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 13, 2020 at 07:28:57PM +0200, Thomas Gleixner wrote:
> > > Nick Desaulniers  writes:
> > > > On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner  
> > > > wrote:
> > > >> > + *
> > > >> > + * Clang sometimes fails to kill the reference to the dummy 
> > > >> > variable, so
> > > >> > + * provide an actual copy.
> > > >>
> > > >> Can that compiler be fixed instead?
> > > >
> > > > I don't think so. The logic in the compiler whether to emit an
> > > 
> > > Forget that I asked. Heat induced brain damaged.
> > > 
> > > > I'd much rather remove all of __force_order.
> > > 
> > > Right.
> > > 
> > > > Not sure about the comment in arch/x86/include/asm/special_insns.h
> > > > either; smells fishy like a bug with a compiler from a long time ago.
> > > > It looks like it was introduced in:
> > > > commit d3ca901f94b32 ("x86: unify paravirt parts of system.h")
> > > > Lore has this thread:
> > > > https://lore.kernel.org/lkml/4755a809.4050...@qumranet.com/
> > > > Patch 4: 
> > > > https://lore.kernel.org/lkml/11967844071346-git-send-email-gco...@redhat.com/
> > > > It seems like there was a discussion about %cr8, but no one asked
> > > > "what's going on here with __force_order, is that right?"
> > > 
> > > Correct and the changelog is uselss in this regard.
> > > 
> > > > Quick boot test of the below works for me, though I should probably
> > > > test hosting a virtualized guest since d3ca901f94b32 refers to
> > > > paravirt.  Thoughts?
> > > 
> > > Let me ask (hopefully) useful questions this time:
> > > 
> > >   Is a compiler allowed to reorder two 'asm volatile()'?
> > > 
> > >   Are there compilers (gcc >= 4.9 or other supported ones) which do that?
> > 
> > I would hope that the answer to both of these questions is "no"!
> > 
> > But I freely confess that I have been disappointed before on this sort
> > of thing.  :-/
> > 
> > Thanx, Paul
> 
> Ok, I found this, so gcc developers consider re-ordering volatile asm
> wrt each other a bug at least.
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602

Whew!!!  ;-)

Thanx, Paul


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-13 Thread Arvind Sankar
On Thu, Aug 13, 2020 at 10:37:01AM -0700, Paul E. McKenney wrote:
> On Thu, Aug 13, 2020 at 07:28:57PM +0200, Thomas Gleixner wrote:
> > Nick Desaulniers  writes:
> > > On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner  wrote:
> > >> > + *
> > >> > + * Clang sometimes fails to kill the reference to the dummy variable, 
> > >> > so
> > >> > + * provide an actual copy.
> > >>
> > >> Can that compiler be fixed instead?
> > >
> > > I don't think so. The logic in the compiler whether to emit an
> > 
> > Forget that I asked. Heat induced brain damaged.
> > 
> > > I'd much rather remove all of __force_order.
> > 
> > Right.
> > 
> > > Not sure about the comment in arch/x86/include/asm/special_insns.h
> > > either; smells fishy like a bug with a compiler from a long time ago.
> > > It looks like it was introduced in:
> > > commit d3ca901f94b32 ("x86: unify paravirt parts of system.h")
> > > Lore has this thread:
> > > https://lore.kernel.org/lkml/4755a809.4050...@qumranet.com/
> > > Patch 4: 
> > > https://lore.kernel.org/lkml/11967844071346-git-send-email-gco...@redhat.com/
> > > It seems like there was a discussion about %cr8, but no one asked
> > > "what's going on here with __force_order, is that right?"
> > 
> > Correct and the changelog is uselss in this regard.
> > 
> > > Quick boot test of the below works for me, though I should probably
> > > test hosting a virtualized guest since d3ca901f94b32 refers to
> > > paravirt.  Thoughts?
> > 
> > Let me ask (hopefully) useful questions this time:
> > 
> >   Is a compiler allowed to reorder two 'asm volatile()'?
> > 
> >   Are there compilers (gcc >= 4.9 or other supported ones) which do that?
> 
> I would hope that the answer to both of these questions is "no"!
> 
> But I freely confess that I have been disappointed before on this sort
> of thing.  :-/
> 
>   Thanx, Paul

Ok, I found this, so gcc developers consider re-ordering volatile asm
wrt each other a bug at least.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-13 Thread Paul E. McKenney
On Thu, Aug 13, 2020 at 07:28:57PM +0200, Thomas Gleixner wrote:
> Nick Desaulniers  writes:
> > On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner  wrote:
> >> > + *
> >> > + * Clang sometimes fails to kill the reference to the dummy variable, so
> >> > + * provide an actual copy.
> >>
> >> Can that compiler be fixed instead?
> >
> > I don't think so. The logic in the compiler whether to emit an
> 
> Forget that I asked. Heat induced brain damaged.
> 
> > I'd much rather remove all of __force_order.
> 
> Right.
> 
> > Not sure about the comment in arch/x86/include/asm/special_insns.h
> > either; smells fishy like a bug with a compiler from a long time ago.
> > It looks like it was introduced in:
> > commit d3ca901f94b32 ("x86: unify paravirt parts of system.h")
> > Lore has this thread:
> > https://lore.kernel.org/lkml/4755a809.4050...@qumranet.com/
> > Patch 4: 
> > https://lore.kernel.org/lkml/11967844071346-git-send-email-gco...@redhat.com/
> > It seems like there was a discussion about %cr8, but no one asked
> > "what's going on here with __force_order, is that right?"
> 
> Correct and the changelog is uselss in this regard.
> 
> > Quick boot test of the below works for me, though I should probably
> > test hosting a virtualized guest since d3ca901f94b32 refers to
> > paravirt.  Thoughts?
> 
> Let me ask (hopefully) useful questions this time:
> 
>   Is a compiler allowed to reorder two 'asm volatile()'?
> 
>   Are there compilers (gcc >= 4.9 or other supported ones) which do that?

I would hope that the answer to both of these questions is "no"!

But I freely confess that I have been disappointed before on this sort
of thing.  :-/

Thanx, Paul


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-13 Thread Thomas Gleixner
Nick Desaulniers  writes:
> On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner  wrote:
>> > + *
>> > + * Clang sometimes fails to kill the reference to the dummy variable, so
>> > + * provide an actual copy.
>>
>> Can that compiler be fixed instead?
>
> I don't think so. The logic in the compiler whether to emit an

Forget that I asked. Heat induced brain damaged.

> I'd much rather remove all of __force_order.

Right.

> Not sure about the comment in arch/x86/include/asm/special_insns.h
> either; smells fishy like a bug with a compiler from a long time ago.
> It looks like it was introduced in:
> commit d3ca901f94b32 ("x86: unify paravirt parts of system.h")
> Lore has this thread:
> https://lore.kernel.org/lkml/4755a809.4050...@qumranet.com/
> Patch 4: 
> https://lore.kernel.org/lkml/11967844071346-git-send-email-gco...@redhat.com/
> It seems like there was a discussion about %cr8, but no one asked
> "what's going on here with __force_order, is that right?"

Correct and the changelog is uselss in this regard.

> Quick boot test of the below works for me, though I should probably
> test hosting a virtualized guest since d3ca901f94b32 refers to
> paravirt.  Thoughts?

Let me ask (hopefully) useful questions this time:

  Is a compiler allowed to reorder two 'asm volatile()'?

  Are there compilers (gcc >= 4.9 or other supported ones) which do that?

Thanks,

tglx


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-13 Thread Arvind Sankar
On Wed, Aug 12, 2020 at 05:12:34PM -0700, Nick Desaulniers wrote:
> On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner  wrote:
> >
> > Arnd Bergmann  writes:
> > > When using the clang integrated assembler, we get a reference
> > > to __force_order that should normally get ignored in a few
> > > rare cases:
> > >
> > > ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] 
> > > undefined!
> > >
> > > Add a 'static' definition so any file in which this happens can
> > > have a local copy.
> >
> > That's a horrible hack.
> 
> Agreed.  And static means everyone gets their own copy, rather than
> sharing one memory address.  I guess no one actually writes to it, so
> it doesn't really matter, but __force_order just seems so strange to
> me.
> 
> > And the only reason why it does not trigger -Wunused-variable warnings
> > all over the place is because it's "referenced" in unused inline
> > functions and then optimized out along with the unused inlines.
> >
> > >   * It is not referenced from the code, but GCC < 5 with -fPIE would fail
> > >   * due to an undefined symbol. Define it to make these ancient GCCs
> > >   work.
> >
> > Bah, we really should have moved straight to GCC5 instead of upping it
> > just to 4.9
> >
> > > + *
> > > + * Clang sometimes fails to kill the reference to the dummy variable, so
> > > + * provide an actual copy.
> >
> > Can that compiler be fixed instead?
> 
> I don't think so. The logic in the compiler whether to emit an
> "address is significant" assembler directive is based on whether the
> variable is "used."  The "use" of `__force_order` is as output of all
> of these control register read/write functions' inline asm, even
> though the inline asm doesn't actually write to them.  We'd have to
> peek inside of the inline asm and build "use/def chains" for the
> inline asm, to see that you don't actually use the output variable.
> Best we can do is see it listed as an output to the inline asm
> statement.  And if you reference an `extern` variable, it should be no
> wonder that you can get undefined symbol linkage failures.
> 
> I'd much rather remove all of __force_order.
> 
> >
> > Aside of that is there a reason to make this 'static' thing wrapped in
> > #ifdeffery? A quick check with GCC8.3 just works. But maybe 4.9 gets
> > unhappy. Can't say due to: -ENOANCIENTCOMPILER :)
> 
> >From the comment in arch/x86/boot/compressed/pgtable_64.c, there's a
> hint that maybe gcc < 5 and -pie (CONFIG_RANDOMIZE_BASE?) would fail
> due to undefined symbol, though I'm not sure which symbol the comment
> is referring to.  If it's __force_order, then removing outright likely
> fixes that issue.

Yes, it's __force_order. Compressed kernel is always -fPIE, and gcc <5
and clang will generate mov instructions with GOTPCREL relocations to
load the address of __force_order into a register for use by the inline
asm. gcc-5+ works because it doesn't use GOTPCREL for global variables,
instead relying on the linker inserting copy relocations if necessary.

> 
> Not sure about the comment in arch/x86/include/asm/special_insns.h
> either; smells fishy like a bug with a compiler from a long time ago.
> It looks like it was introduced in:
> commit d3ca901f94b32 ("x86: unify paravirt parts of system.h")
> Lore has this thread:
> https://lore.kernel.org/lkml/4755a809.4050...@qumranet.com/
> Patch 4: 
> https://lore.kernel.org/lkml/11967844071346-git-send-email-gco...@redhat.com/
> It seems like there was a discussion about %cr8, but no one asked
> "what's going on here with __force_order, is that right?"
> Latest GCC release on December 4 2007 would have been GCC 4.2.2 according to:
> https://gcc.gnu.org/releases.html
> 
> Quick boot test of the below works for me, though I should probably
> test hosting a virtualized guest since d3ca901f94b32 refers to
> paravirt.  Thoughts?

It's unclear if there was a real problem this fixes, but if there was
I'd expect it on native, not paravirt, given it's native that has this
__force_order hack?

gcc's documentation of volatile extended asm includes a caveat.

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile

Near the end of 6.47.2.1:
"Note that the compiler can move even volatile asm instructions relative
to other code, including across jump instructions."

and it provides an example of unexpected code motion, with the fix being
adding an artificial dependency to the asm.

So it might do something silly like reversing the order of two
%crn writes, maybe?


RE: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-13 Thread David Laight
From: Nick Desaulniers
> Sent: 13 August 2020 01:13
> 
> On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner  wrote:
> >
> > Arnd Bergmann  writes:
> > > When using the clang integrated assembler, we get a reference
> > > to __force_order that should normally get ignored in a few
> > > rare cases:
> > >
> > > ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] 
> > > undefined!
> > >
> > > Add a 'static' definition so any file in which this happens can
> > > have a local copy.
> >
> > That's a horrible hack.
> 
> Agreed.  And static means everyone gets their own copy, rather than
> sharing one memory address.  I guess no one actually writes to it, so
> it doesn't really matter, but __force_order just seems so strange to
> me.

It could be changed to use a symbol that the linker script already defines.
However it does look like a workaround for a broken version of gcc.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-12 Thread Nick Desaulniers
On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner  wrote:
>
> Arnd Bergmann  writes:
> > When using the clang integrated assembler, we get a reference
> > to __force_order that should normally get ignored in a few
> > rare cases:
> >
> > ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] undefined!
> >
> > Add a 'static' definition so any file in which this happens can
> > have a local copy.
>
> That's a horrible hack.

Agreed.  And static means everyone gets their own copy, rather than
sharing one memory address.  I guess no one actually writes to it, so
it doesn't really matter, but __force_order just seems so strange to
me.

> And the only reason why it does not trigger -Wunused-variable warnings
> all over the place is because it's "referenced" in unused inline
> functions and then optimized out along with the unused inlines.
>
> >   * It is not referenced from the code, but GCC < 5 with -fPIE would fail
> >   * due to an undefined symbol. Define it to make these ancient GCCs
> >   work.
>
> Bah, we really should have moved straight to GCC5 instead of upping it
> just to 4.9
>
> > + *
> > + * Clang sometimes fails to kill the reference to the dummy variable, so
> > + * provide an actual copy.
>
> Can that compiler be fixed instead?

I don't think so. The logic in the compiler whether to emit an
"address is significant" assembler directive is based on whether the
variable is "used."  The "use" of `__force_order` is as output of all
of these control register read/write functions' inline asm, even
though the inline asm doesn't actually write to them.  We'd have to
peek inside of the inline asm and build "use/def chains" for the
inline asm, to see that you don't actually use the output variable.
Best we can do is see it listed as an output to the inline asm
statement.  And if you reference an `extern` variable, it should be no
wonder that you can get undefined symbol linkage failures.

I'd much rather remove all of __force_order.

>
> Aside of that is there a reason to make this 'static' thing wrapped in
> #ifdeffery? A quick check with GCC8.3 just works. But maybe 4.9 gets
> unhappy. Can't say due to: -ENOANCIENTCOMPILER :)

>From the comment in arch/x86/boot/compressed/pgtable_64.c, there's a
hint that maybe gcc < 5 and -pie (CONFIG_RANDOMIZE_BASE?) would fail
due to undefined symbol, though I'm not sure which symbol the comment
is referring to.  If it's __force_order, then removing outright likely
fixes that issue.

Not sure about the comment in arch/x86/include/asm/special_insns.h
either; smells fishy like a bug with a compiler from a long time ago.
It looks like it was introduced in:
commit d3ca901f94b32 ("x86: unify paravirt parts of system.h")
Lore has this thread:
https://lore.kernel.org/lkml/4755a809.4050...@qumranet.com/
Patch 4: 
https://lore.kernel.org/lkml/11967844071346-git-send-email-gco...@redhat.com/
It seems like there was a discussion about %cr8, but no one asked
"what's going on here with __force_order, is that right?"
Latest GCC release on December 4 2007 would have been GCC 4.2.2 according to:
https://gcc.gnu.org/releases.html

Quick boot test of the below works for me, though I should probably
test hosting a virtualized guest since d3ca901f94b32 refers to
paravirt.  Thoughts?
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -5,15 +5,6 @@
 #include "pgtable.h"
 #include "../string.h"

-/*
- * __force_order is used by special_insns.h asm code to force instruction
- * serialization.
- *
- * It is not referenced from the code, but GCC < 5 with -fPIE would fail
- * due to an undefined symbol. Define it to make these ancient GCCs work.
- */
-unsigned long __force_order;
-
 #define BIOS_START_MIN 0x2U/* 128K, less than
this is insane */
 #define BIOS_START_MAX 0x9f000U/* 640K, absolute maximum */

diff --git a/arch/x86/include/asm/special_insns.h
b/arch/x86/include/asm/special_insns.h
index 59a3e13204c3..d2e0d53b0f69 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -10,46 +10,37 @@
 #include 
 #include 

-/*
- * Volatile isn't enough to prevent the compiler from reordering the
- * read/write functions for the control registers and messing everything up.
- * A memory clobber would solve the problem, but would prevent reordering of
- * all loads stores around it, which can hurt performance. Solution is to
- * use a variable and mimic reads and writes to it to enforce serialization
- */
-extern unsigned long __force_order;
-
 void native_write_cr0(unsigned long val);

 static inline unsigned long native_read_cr0(void)
 {
unsigned long val;
-   asm volatile("mov %%cr0,%0\n\t" : "=r" (val), "=m" (__force_order));
+   asm volatile("mov %%cr0,%0\n\t" : "=r" (val));
return val;
 }

 static __always_inline unsigned long native_read_cr2(void)
 {
unsigned long val;
-   asm volatile("mov %%cr2,%0\n\t" : "=r" (val), "=m" (__force_orde

Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-07 Thread Sedat Dilek
On Fri, Aug 7, 2020 at 12:13 AM Thomas Gleixner  wrote:
>
> Sedat Dilek  writes:
> > what is the status of this patch?
>
> Just looked at it.
>
> > I needed this one to be able to build VirtualBox via DKMS as an
> > out-of-tree kernel-module.
>
> Not being able to build the vbox rootkit is a feature, not a bug.
>

It must be a funny job to work for Linux/x86.
Keep your humour :-).

We have a second issue hitting this problem when CONFIG_LKDTM=m.
There are more details in CBL issue #1120.
Especially see comments from Nick [2] and user pcc [3].

Thanks.

- Sedat -

[1] https://github.com/ClangBuiltLinux/linux/issues/1120
[2] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-668736160
[3] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-668746486


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-06 Thread Thomas Gleixner
Sedat Dilek  writes:
> what is the status of this patch?

Just looked at it.

> I needed this one to be able to build VirtualBox via DKMS as an
> out-of-tree kernel-module.

Not being able to build the vbox rootkit is a feature, not a bug.

Thanks,

tglx


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-06 Thread Thomas Gleixner
Arnd Bergmann  writes:
> When using the clang integrated assembler, we get a reference
> to __force_order that should normally get ignored in a few
> rare cases:
>
> ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] undefined!
>
> Add a 'static' definition so any file in which this happens can
> have a local copy.

That's a horrible hack.

And the only reason why it does not trigger -Wunused-variable warnings
all over the place is because it's "referenced" in unused inline
functions and then optimized out along with the unused inlines.

>   * It is not referenced from the code, but GCC < 5 with -fPIE would fail
>   * due to an undefined symbol. Define it to make these ancient GCCs
>   work.

Bah, we really should have moved straight to GCC5 instead of upping it
just to 4.9

> + *
> + * Clang sometimes fails to kill the reference to the dummy variable, so
> + * provide an actual copy.

Can that compiler be fixed instead?

Aside of that is there a reason to make this 'static' thing wrapped in
#ifdeffery? A quick check with GCC8.3 just works. But maybe 4.9 gets
unhappy. Can't say due to: -ENOANCIENTCOMPILER :)

Thanks,

tglx


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-03 Thread Nick Desaulniers
On Wed, May 27, 2020 at 6:53 AM Arnd Bergmann  wrote:
>
> When using the clang integrated assembler, we get a reference
> to __force_order that should normally get ignored in a few
> rare cases:
>
> ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] undefined!
>
> Add a 'static' definition so any file in which this happens can
> have a local copy.
>
> Signed-off-by: Arnd Bergmann 

Hi Arnd,
Looks like
$ ARCH=i386 make CC=clang LLVM_IAS=1 -j71
defconfig+CONFIG_X86_POWERNOW_K6=m
is the simplest reproducer.

If I run
$ llvm-readelf -s drivers/cpufreq/powernow-k6.o | grep __force_order
39:  0 NOTYPE  GLOBAL DEFAULT   UND __force_order
we can indeed see an undefined reference to __force_order.

If I build the .s file via
$ ARCH=i386 make CC=clang LLVM_IAS=1 -j71 drivers/cpufreq/powernow-k6.s
the only reference I see to __force_order is:
979   .addrsig_sym __force_order

which is created by Clang's implicit -faddr-sig.  If I disable that
for this file via:

```diff
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index f1b7e3dd6e5d..87d655d5af49 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -28,6 +28,9 @@ obj-$(CONFIG_X86_ACPI_CPUFREQ)+=
acpi-cpufreq.o
 obj-$(CONFIG_X86_POWERNOW_K8)  += powernow-k8.o
 obj-$(CONFIG_X86_PCC_CPUFREQ)  += pcc-cpufreq.o
 obj-$(CONFIG_X86_POWERNOW_K6)  += powernow-k6.o
+ifdef CONFIG_CC_IS_CLANG
+CFLAGS_powernow-k6.o   += -fno-addrsig
+endif
 obj-$(CONFIG_X86_POWERNOW_K7)  += powernow-k7.o
 obj-$(CONFIG_X86_LONGHAUL) += longhaul.o
 obj-$(CONFIG_X86_E_POWERSAVER) += e_powersaver.o
```
then the module links without error, and we get no hits for
__force_order from llvm-readelf -s.  This makes me think there may be
a bug in Clang generating address significance tables for global
variables that are otherwise unused, resulting in such linkage
failures. +pcc@ for that.

I ran a creduce job on drivers/cpufreq/powernow-k6.i where I'd compile
twice, one with the implicit default value of -faddr-sig and look for
the undefined __force_order, and again with -fno-addrsig and ensure
there was no undefined __force_order, which coughed up:
extern int __force_order;
int a(void) { asm("" : "=m"(__force_order)); return 0; }
as the bare minimum for an address significant table.
https://godbolt.org/z/cjfaqM

I'll bet this is coming from the call to read_cr0() in
powernow_k6_set_cpu_multiplier().  If __force_order is defined in
arch/x86/boot/compressed/pgtable_64.c, then I'm not sure it's a good
idea to build drivers/cpufreq/powernow-k6.c as a kernel module
(CONFIG_X86_POWERNOW_K6=m) vs statically compiled in.  Wouldn't
__force_order need to be EXPORT'ed for kernel modules to use it
safely?

> ---
>  arch/x86/boot/compressed/pgtable_64.c | 2 ++
>  arch/x86/include/asm/special_insns.h  | 7 +++
>  2 files changed, 9 insertions(+)
>
> diff --git a/arch/x86/boot/compressed/pgtable_64.c 
> b/arch/x86/boot/compressed/pgtable_64.c
> index c8862696a47b..8595194cea41 100644
> --- a/arch/x86/boot/compressed/pgtable_64.c
> +++ b/arch/x86/boot/compressed/pgtable_64.c
> @@ -12,7 +12,9 @@
>   * It is not referenced from the code, but GCC < 5 with -fPIE would fail
>   * due to an undefined symbol. Define it to make these ancient GCCs work.
>   */
> +#ifndef CONFIG_CC_IS_CLANG
>  unsigned long __force_order;
> +#endif
>
>  #define BIOS_START_MIN 0x2U/* 128K, less than this is 
> insane */
>  #define BIOS_START_MAX 0x9f000U/* 640K, absolute maximum */
> diff --git a/arch/x86/include/asm/special_insns.h 
> b/arch/x86/include/asm/special_insns.h
> index 82436cb04ccf..7081e587c1ea 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -16,8 +16,15 @@
>   * A memory clobber would solve the problem, but would prevent reordering of
>   * all loads stores around it, which can hurt performance. Solution is to
>   * use a variable and mimic reads and writes to it to enforce serialization
> + *
> + * Clang sometimes fails to kill the reference to the dummy variable, so
> + * provide an actual copy.
>   */
> +#ifdef CONFIG_CC_IS_CLANG
> +static unsigned long __force_order;
> +#else
>  extern unsigned long __force_order;
> +#endif
>
>  void native_write_cr0(unsigned long val);
>
> --
> 2.26.2
>
> --


-- 
Thanks,
~Nick Desaulniers


Re: [PATCH] x86: work around clang IAS bug referencing __force_order

2020-08-01 Thread Sedat Dilek
On Wed, May 27, 2020 at 3:53 PM Arnd Bergmann  wrote:
>
> When using the clang integrated assembler, we get a reference
> to __force_order that should normally get ignored in a few
> rare cases:
>
> ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] undefined!
>
> Add a 'static' definition so any file in which this happens can
> have a local copy.
>
> Signed-off-by: Arnd Bergmann 

Hi,

what is the status of this patch?

I needed this one to be able to build VirtualBox via DKMS as an
out-of-tree kernel-module.
Package: virtualbox-dkms version 6.1.12-dfsg-8 (Debian/unstable)

To quote myself (see [1]):
Passing LLVM_IAS=1 results in:

  AR  /var/lib/dkms/virtualbox/6.1.12/build/built-in.a
  MODPOST /var/lib/dkms/virtualbox/6.1.12/build/Module.symvers
ERROR: modpost: "__force_order"
[/var/lib/dkms/virtualbox/6.1.12/build/vboxdrv/vboxdrv.ko] undefined!

Arnd's patch is mandatory to build with Clang's Integrated Assembler
(make LLVM_IAS=1).
Here: LLVM toolchain version 11.0.0-rc1

Feel free to add:
Reported-by: Sedat Dilek 
Tested-by: Sedat Dilek 

Can one of the tip maintainers pick this up, please?

Thanks.

Regards,
- Sedat -

[1] https://github.com/ClangBuiltLinux/linux/issues/1104#issuecomment-667470053


> ---
>  arch/x86/boot/compressed/pgtable_64.c | 2 ++
>  arch/x86/include/asm/special_insns.h  | 7 +++
>  2 files changed, 9 insertions(+)
>
> diff --git a/arch/x86/boot/compressed/pgtable_64.c 
> b/arch/x86/boot/compressed/pgtable_64.c
> index c8862696a47b..8595194cea41 100644
> --- a/arch/x86/boot/compressed/pgtable_64.c
> +++ b/arch/x86/boot/compressed/pgtable_64.c
> @@ -12,7 +12,9 @@
>   * It is not referenced from the code, but GCC < 5 with -fPIE would fail
>   * due to an undefined symbol. Define it to make these ancient GCCs work.
>   */
> +#ifndef CONFIG_CC_IS_CLANG
>  unsigned long __force_order;
> +#endif
>
>  #define BIOS_START_MIN 0x2U/* 128K, less than this is 
> insane */
>  #define BIOS_START_MAX 0x9f000U/* 640K, absolute maximum */
> diff --git a/arch/x86/include/asm/special_insns.h 
> b/arch/x86/include/asm/special_insns.h
> index 82436cb04ccf..7081e587c1ea 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -16,8 +16,15 @@
>   * A memory clobber would solve the problem, but would prevent reordering of
>   * all loads stores around it, which can hurt performance. Solution is to
>   * use a variable and mimic reads and writes to it to enforce serialization
> + *
> + * Clang sometimes fails to kill the reference to the dummy variable, so
> + * provide an actual copy.
>   */
> +#ifdef CONFIG_CC_IS_CLANG
> +static unsigned long __force_order;
> +#else
>  extern unsigned long __force_order;
> +#endif
>
>  void native_write_cr0(unsigned long val);
>
> --
> 2.26.2
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to clang-built-linux+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/clang-built-linux/20200527135329.1172644-1-arnd%40arndb.de.


[PATCH] x86: work around clang IAS bug referencing __force_order

2020-05-27 Thread Arnd Bergmann
When using the clang integrated assembler, we get a reference
to __force_order that should normally get ignored in a few
rare cases:

ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] undefined!

Add a 'static' definition so any file in which this happens can
have a local copy.

Signed-off-by: Arnd Bergmann 
---
 arch/x86/boot/compressed/pgtable_64.c | 2 ++
 arch/x86/include/asm/special_insns.h  | 7 +++
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/boot/compressed/pgtable_64.c 
b/arch/x86/boot/compressed/pgtable_64.c
index c8862696a47b..8595194cea41 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -12,7 +12,9 @@
  * It is not referenced from the code, but GCC < 5 with -fPIE would fail
  * due to an undefined symbol. Define it to make these ancient GCCs work.
  */
+#ifndef CONFIG_CC_IS_CLANG
 unsigned long __force_order;
+#endif
 
 #define BIOS_START_MIN 0x2U/* 128K, less than this is 
insane */
 #define BIOS_START_MAX 0x9f000U/* 640K, absolute maximum */
diff --git a/arch/x86/include/asm/special_insns.h 
b/arch/x86/include/asm/special_insns.h
index 82436cb04ccf..7081e587c1ea 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -16,8 +16,15 @@
  * A memory clobber would solve the problem, but would prevent reordering of
  * all loads stores around it, which can hurt performance. Solution is to
  * use a variable and mimic reads and writes to it to enforce serialization
+ *
+ * Clang sometimes fails to kill the reference to the dummy variable, so
+ * provide an actual copy.
  */
+#ifdef CONFIG_CC_IS_CLANG
+static unsigned long __force_order;
+#else
 extern unsigned long __force_order;
+#endif
 
 void native_write_cr0(unsigned long val);
 
-- 
2.26.2