Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat 2020-08-22 11:51:56, Sedat Dilek wrote: > On Sat, Aug 22, 2020 at 11:23 AM Sedat Dilek wrote: > > > > On Sat, Aug 22, 2020 at 10:42 AM Segher Boessenkool > > wrote: > > > > > > Hi Arvind, > > > > > > On Fri, Aug 21, 2020 at 11:55:52PM -0400, Arvind Sankar wrote: > > > > Cc Segher. > > > > > > > > Segher, we were looking at gcc PR82602, where IRA could reorder volatile > > > > asm's (reported on ARM). The fix was backported to gcc-6. > > > > > > I know ;-) > > > > > > > Do you know if > > > > there is any reason the problem couldn't occur on x86 on older gcc > > > > without the fix? > > > > > > No, I see no particular reason, at least GCC 5 seems vulnerable. (The > > > GCC 5 release branch was closed at the time this bug report was made, > > > already). There is no reason I see why it would work on x86 but fail > > > elsewhere, either. > > > > > > > [1] says: > > > > Current Minimal Requirements > > ... > > == === > > > > ProgramMinimal version Command to check the version > > == === > > > > GNU C 4.9 gcc --version > > > > - Sedat - > > > > [1] > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst#n32 > > [ CC Miguel Ojeda (Compiler Attributes maintainer) ] > > There exist gcc-4.8 and gcc-4.9 for Debian/jessie where EOL was June > 30, 2020 (see [1] and [2]). > > In the latest available version "4.9.2-10+deb8u1" I see no PR82602 was > backported (see [3] and [4]). > > I am asking myself who is using such ancient compilers? > Recently, I threw away GCC-8 from my Debian system. I do have 4.9.2 on some systems. They work well, and are likely to compile significantly faster than newer ones. Please don't break them. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
RE: [PATCH] x86: work around clang IAS bug referencing __force_order
From: Arvind Sankar > Sent: 22 August 2020 22:17 ... > Assuming we don't want to risk removing force_order, I'd suggest > - make it an input/output operand, so it enforces ordering fully. > - either restrict it to gcc < 8, or just provide a proper definition in > some file (maybe arch/x86/kernel/cpu/common.c)? Is it possible to replace __force_order with a symbol that the linker scripts defines? Or just define __force_order in the linker script to some 'random' constant (eg 0). ISTM that adding "m"(__force_order) to asm volatile can do no harm. Especially for accesses to CRn and MSRn (etc) which might have obscure side effects. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 22, 2020 at 05:10:21PM -0700, Linus Torvalds wrote: > On Sat, Aug 22, 2020 at 4:11 PM Arvind Sankar wrote: > > > > Actually, is a memory clobber required for correctness? Memory accesses > > probably shouldn't be reordered across a CRn write. Is asm volatile > > enough to stop that or do you need a memory clobber? > > You do need a memory clobber if you really care about ordering wrt > normal memory references. > > That said, I'm not convinced we do care here. Normal memory accesses > (as seen by the compiler) should be entirely immune to any changes we > do wrt CRx registers. > > Because code that really fundamentally changes kernel mappings or > access rules is already written in low-level assembler (eg the entry > routines or bootup). > > Anything that relies on the more subtle changes (ie user space > accesses etc) should already be ordered by other things - usually by > the fact that they are also "asm volatile". > > But hey, maybe somebody can come up with an exception to that. > > Linus I'm sure in practice it can't happen, as any memory accesses happening immediately around write_cr3() are probably mapped the same in both pagetables anyway, but eg cleanup_trampoline() in arch/x86/boot/compressed/pgtable_64.c: memcpy(pgtable, trampoline_pgtable, PAGE_SIZE); native_write_cr3((unsigned long)pgtable); There'll probably be trouble if the compiler were to reverse the order here. We could actually make write_crn() use memory clobber, and read_crn() use "m"(*(int *)0x1000) as an input operand. A bit hacky, but no global variable needed. And maybe read_crn() doesn't even have to be volatile. Also, if we look at the rdmsr/wrmsr pair, there's no force_order equivalent AFAICT. wrmsr has a memory clobber, but the asm volatile-ness is the only thing enforcing read/write ordering.
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 22, 2020 at 4:11 PM Arvind Sankar wrote: > > Actually, is a memory clobber required for correctness? Memory accesses > probably shouldn't be reordered across a CRn write. Is asm volatile > enough to stop that or do you need a memory clobber? You do need a memory clobber if you really care about ordering wrt normal memory references. That said, I'm not convinced we do care here. Normal memory accesses (as seen by the compiler) should be entirely immune to any changes we do wrt CRx registers. Because code that really fundamentally changes kernel mappings or access rules is already written in low-level assembler (eg the entry routines or bootup). Anything that relies on the more subtle changes (ie user space accesses etc) should already be ordered by other things - usually by the fact that they are also "asm volatile". But hey, maybe somebody can come up with an exception to that. Linus
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 22, 2020 at 02:08:27PM -0700, Linus Torvalds wrote: > However, in this case, can we just leave that old "__force_order" hack > alone, and to work around the clang thing, just make a dummy > definition of it anyway. > > Alternatively, just use the memory clobber. We use memory clobbers > elsewhere in inline asms to make sure they are serialized, it's not > normally a huge problem. Both clang and gcc should be smart enough to > know that a memory clobber doesn't matter for things like local > variables etc that might be on stack but have never had their address > taken. > > Or are there other cases than that particular __force_order thing that > people now worry about? > > Linus Actually, is a memory clobber required for correctness? Memory accesses probably shouldn't be reordered across a CRn write. Is asm volatile enough to stop that or do you need a memory clobber? Replacing force_order with memory clobber introduces a few extra instructions (testing with defconfig), but only in x86-64 hibernate/reboot/sleep code and early_ioremap_init on x86-32.
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 22, 2020 at 08:17:32PM +0200, Miguel Ojeda wrote: > On Sat, Aug 22, 2020 at 11:52 AM Sedat Dilek wrote: > > > > I am asking myself who is using such ancient compilers? > > There are many users/companies using older versions of compilers, > kernels and everything. GCC <= 4.9 will still be used/supported (by > third parties) for a handful of years at least. > > However, the important question is whether those users/companies care > about running the latest kernels. Many of those definitely do not want > to touch their kernel either. For those that do, there are several > longterms to pick from that still support 4.9, as well as other > workarounds. > > Thus I am usually in favor of raising the minimum whenever new hacks > are required to be added. On the other hand, we already raised the > version twice this year and it is not clear to me what is the minimum > version we would need to go for to ensure this does not bite us. > > > If this is a real problem with GCC version <= 5, so can this be moved > > to a GCC specific include header-file? > > Thinking of include/linux/compiler-gcc.h or > > include/linux/compiler_types.h with a GCC-VERSION check? > > That would be better if it can be done, yes. > > Cheers, > Miguel The fix landed in gcc 6.5, 7.3 and 8.1. The bug is presumably quite difficult to actually trigger. As a sample data point, I verified that 7.1 vs 7.1+fix have no differences on 32-bit and 64-bit x86 defconfigs, on current mainline. Assuming we don't want to risk removing force_order, I'd suggest - make it an input/output operand, so it enforces ordering fully. - either restrict it to gcc < 8, or just provide a proper definition in some file (maybe arch/x86/kernel/cpu/common.c)? Thanks.
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 22, 2020 at 11:17 AM Miguel Ojeda wrote: > > However, the important question is whether those users/companies care > about running the latest kernels. Many of those definitely do not want > to touch their kernel either. For those that do, there are several > longterms to pick from that still support 4.9, as well as other > workarounds. > > Thus I am usually in favor of raising the minimum whenever new hacks > are required to be added. On the other hand, we already raised the > version twice this year and it is not clear to me what is the minimum > version we would need to go for to ensure this does not bite us. Yeah. The good news is that I haven't seen a lot of pushback on the gcc version updates so far. I was expecting some complaints. I haven't seen a single one. That may be because people did end up finding it very onerous and complained internally on channels I haven't seen, but it might also be indicative of us having perhaps been a bit too timid about compiler version updates. However, in this case, can we just leave that old "__force_order" hack alone, and to work around the clang thing, just make a dummy definition of it anyway. Alternatively, just use the memory clobber. We use memory clobbers elsewhere in inline asms to make sure they are serialized, it's not normally a huge problem. Both clang and gcc should be smart enough to know that a memory clobber doesn't matter for things like local variables etc that might be on stack but have never had their address taken. Or are there other cases than that particular __force_order thing that people now worry about? Linus
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 22, 2020 at 11:52 AM Sedat Dilek wrote: > > I am asking myself who is using such ancient compilers? There are many users/companies using older versions of compilers, kernels and everything. GCC <= 4.9 will still be used/supported (by third parties) for a handful of years at least. However, the important question is whether those users/companies care about running the latest kernels. Many of those definitely do not want to touch their kernel either. For those that do, there are several longterms to pick from that still support 4.9, as well as other workarounds. Thus I am usually in favor of raising the minimum whenever new hacks are required to be added. On the other hand, we already raised the version twice this year and it is not clear to me what is the minimum version we would need to go for to ensure this does not bite us. > If this is a real problem with GCC version <= 5, so can this be moved > to a GCC specific include header-file? > Thinking of include/linux/compiler-gcc.h or > include/linux/compiler_types.h with a GCC-VERSION check? That would be better if it can be done, yes. Cheers, Miguel
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 22, 2020 at 12:26 PM Segher Boessenkool wrote: > [ There is GCC 4.9.4, no one should use an older 4.9. ] > > I mentioned 5 for a reason: the whole function this patch is to did not > exist before then! That does not mean the bug existed or did not exist > before GCC 5, but it does for example mean that a backport to 4.9 or > older isn't trivial at all. > > > I am asking myself who is using such ancient compilers? > > Some distros have a GCC 4.8 as system compiler. We allow building GCC > itself with a compiler that far back, for various reasons as well (and > this is very sharp already, the last mainline GCC 4.8 release is from > June 2015, not all that long ago at all). > > But, one reason this works is because people actually test it. Does > anyone actually test the kernel with old compilers? It isn't hard to > build a new compiler (because we make sure building a newer compiler > works with older compilers, etc. :-) ), and as you say, most distros > have newer compilers available nowadays. We only recently changed the minimum from 4.6 to 4.8, and subsequently to 4.9. Most people have fairly recent compilers, but there are a number of notable kernel developers that intentionally stick to old versions because of compile speed. Each major compiler release adds about 4% overhead in total time to compile a kernel, so between gcc-4.6 and gcc-11 you add over 50% in build time. Arnd
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 22, 2020 at 11:51:56AM +0200, Sedat Dilek wrote: > On Sat, Aug 22, 2020 at 11:23 AM Sedat Dilek wrote: > > > > On Sat, Aug 22, 2020 at 10:42 AM Segher Boessenkool > > wrote: > > > > > > Hi Arvind, > > > > > > On Fri, Aug 21, 2020 at 11:55:52PM -0400, Arvind Sankar wrote: > > > > Cc Segher. > > > > > > > > Segher, we were looking at gcc PR82602, where IRA could reorder volatile > > > > asm's (reported on ARM). The fix was backported to gcc-6. > > > > > > I know ;-) > > > > > > > Do you know if > > > > there is any reason the problem couldn't occur on x86 on older gcc > > > > without the fix? > > > > > > No, I see no particular reason, at least GCC 5 seems vulnerable. (The > > > GCC 5 release branch was closed at the time this bug report was made, > > > already). There is no reason I see why it would work on x86 but fail > > > elsewhere, either. > There exist gcc-4.8 and gcc-4.9 for Debian/jessie where EOL was June > 30, 2020 (see [1] and [2]). > > In the latest available version "4.9.2-10+deb8u1" I see no PR82602 was > backported (see [3] and [4]). [ There is GCC 4.9.4, no one should use an older 4.9. ] I mentioned 5 for a reason: the whole function this patch is to did not exist before then! That does not mean the bug existed or did not exist before GCC 5, but it does for example mean that a backport to 4.9 or older isn't trivial at all. > I am asking myself who is using such ancient compilers? Some distros have a GCC 4.8 as system compiler. We allow building GCC itself with a compiler that far back, for various reasons as well (and this is very sharp already, the last mainline GCC 4.8 release is from June 2015, not all that long ago at all). But, one reason this works is because people actually test it. Does anyone actually test the kernel with old compilers? It isn't hard to build a new compiler (because we make sure building a newer compiler works with older compilers, etc. :-) ), and as you say, most distros have newer compilers available nowadays. Segher
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 22, 2020 at 11:23 AM Sedat Dilek wrote: > > On Sat, Aug 22, 2020 at 10:42 AM Segher Boessenkool > wrote: > > > > Hi Arvind, > > > > On Fri, Aug 21, 2020 at 11:55:52PM -0400, Arvind Sankar wrote: > > > Cc Segher. > > > > > > Segher, we were looking at gcc PR82602, where IRA could reorder volatile > > > asm's (reported on ARM). The fix was backported to gcc-6. > > > > I know ;-) > > > > > Do you know if > > > there is any reason the problem couldn't occur on x86 on older gcc > > > without the fix? > > > > No, I see no particular reason, at least GCC 5 seems vulnerable. (The > > GCC 5 release branch was closed at the time this bug report was made, > > already). There is no reason I see why it would work on x86 but fail > > elsewhere, either. > > > > [1] says: > > Current Minimal Requirements > ... > == === > > ProgramMinimal version Command to check the version > == === > > GNU C 4.9 gcc --version > > - Sedat - > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst#n32 [ CC Miguel Ojeda (Compiler Attributes maintainer) ] There exist gcc-4.8 and gcc-4.9 for Debian/jessie where EOL was June 30, 2020 (see [1] and [2]). In the latest available version "4.9.2-10+deb8u1" I see no PR82602 was backported (see [3] and [4]). I am asking myself who is using such ancient compilers? Recently, I threw away GCC-8 from my Debian system. If this is a real problem with GCC version <= 5, so can this be moved to a GCC specific include header-file? Thinking of include/linux/compiler-gcc.h or include/linux/compiler_types.h with a GCC-VERSION check? Thoughts? - Sedat - P.S.: Yesterday, I built with dropping __force_order entirely and LLVM toolchain v11.0.0-rc2 on Debian/unstable AMD64 on top of recent Linux v5.9-rc1+. [1] https://packages.debian.org/search?keywords=gcc-4 [2] https://wiki.debian.org/LTS [3] https://sources.debian.org/src/gcc-4.9/ [4] https://sources.debian.org/src/gcc-4.9/4.9.2-10+deb8u1/debian/patches/
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 22, 2020 at 10:42 AM Segher Boessenkool wrote: > > Hi Arvind, > > On Fri, Aug 21, 2020 at 11:55:52PM -0400, Arvind Sankar wrote: > > Cc Segher. > > > > Segher, we were looking at gcc PR82602, where IRA could reorder volatile > > asm's (reported on ARM). The fix was backported to gcc-6. > > I know ;-) > > > Do you know if > > there is any reason the problem couldn't occur on x86 on older gcc > > without the fix? > > No, I see no particular reason, at least GCC 5 seems vulnerable. (The > GCC 5 release branch was closed at the time this bug report was made, > already). There is no reason I see why it would work on x86 but fail > elsewhere, either. > [1] says: Current Minimal Requirements ... == === ProgramMinimal version Command to check the version == === GNU C 4.9 gcc --version - Sedat - [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst#n32
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
Hi Arvind, On Fri, Aug 21, 2020 at 11:55:52PM -0400, Arvind Sankar wrote: > Cc Segher. > > Segher, we were looking at gcc PR82602, where IRA could reorder volatile > asm's (reported on ARM). The fix was backported to gcc-6. I know ;-) > Do you know if > there is any reason the problem couldn't occur on x86 on older gcc > without the fix? No, I see no particular reason, at least GCC 5 seems vulnerable. (The GCC 5 release branch was closed at the time this bug report was made, already). There is no reason I see why it would work on x86 but fail elsewhere, either. Segher
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 22, 2020 at 02:43:08AM +0200, Thomas Gleixner wrote: > On Fri, Aug 21 2020 at 16:16, Nick Desaulniers wrote: > > On Fri, Aug 21, 2020 at 4:04 PM Arvind Sankar wrote: > >> On Fri, Aug 21, 2020 at 02:37:48AM +0200, Thomas Gleixner wrote: > >> The gcc bug I linked to earlier is only fixed in gcc-6 onwards. Is that > > > > (based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602#c14) > > > >> good enough to remove force_order? I can test gcc-4.9 and gcc-5 to check > >> if it would currently have any impact. > > And that test tells you what exactly? That your particular build of > those compilers does not have the problem. A truly scientific approach. More that the current kernel code doesn't have that problem, but yeah, it might creep in later. > > > I think checking the disassemblies with a pre-gcc-6 would be good > > enough then; that bug isn't specific to this particular case. > > What? I clearly want a statement from the GCC people that this won't > happen on pre gcc6 compilers and not just some 'works for me' statement > based on a randomly picked compiler build. Presumably also from clang that the compiler does have protections against this, as opposed to doesn't happen today. > > Thanks, > > tglx Cc Segher. Segher, we were looking at gcc PR82602, where IRA could reorder volatile asm's (reported on ARM). The fix was backported to gcc-6. Do you know if there is any reason the problem couldn't occur on x86 on older gcc without the fix? Thanks. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Fri, Aug 21 2020 at 16:16, Nick Desaulniers wrote: > On Fri, Aug 21, 2020 at 4:04 PM Arvind Sankar wrote: >> On Fri, Aug 21, 2020 at 02:37:48AM +0200, Thomas Gleixner wrote: >> The gcc bug I linked to earlier is only fixed in gcc-6 onwards. Is that > > (based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602#c14) > >> good enough to remove force_order? I can test gcc-4.9 and gcc-5 to check >> if it would currently have any impact. And that test tells you what exactly? That your particular build of those compilers does not have the problem. A truly scientific approach. > I think checking the disassemblies with a pre-gcc-6 would be good > enough then; that bug isn't specific to this particular case. What? I clearly want a statement from the GCC people that this won't happen on pre gcc6 compilers and not just some 'works for me' statement based on a randomly picked compiler build. Thanks, tglx
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Fri, Aug 21, 2020 at 04:16:56PM -0700, Nick Desaulniers wrote: > On Fri, Aug 21, 2020 at 4:04 PM Arvind Sankar wrote: > > > > On Fri, Aug 21, 2020 at 02:37:48AM +0200, Thomas Gleixner wrote: > > > On Thu, Aug 20 2020 at 09:06, Arvind Sankar wrote: > > > > I don't think that's an issue, or at least, not one where force_order > > > > helps. > > > > > > > > If the source for foo() is not visible to the compiler, the only reason > > > > force_order prevents the reordering is because foo() might have > > > > references to it, but equally foo() might have volatile asm, so the > > > > reordering isn't possible anyway. > > > > > > > > If the source is visible, force_order won't prevent any reordering > > > > except across references to force_order, but the only references are > > > > from the volatile asm's which already prevent reordering. > > > > > > > > I think force_order can only help with buggy compilers, and for those it > > > > should really have been an input-output operand -- it wouldn't currently > > > > do anything to prevent cr writes from being reordered. > > I agree 100%. From the link to GCC docs, the code in question doesn't > even follow the pattern from the doc from informing the compiler of > any dependency, it just looks like !@#$. > > > > > > > Fair enough. Care to provide a patch which has the collected wisdom of > > > this thread in the changelog? > > > > > > Thanks, > > > > > > tglx > > > > The gcc bug I linked to earlier is only fixed in gcc-6 onwards. Is that > > (based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602#c14) I actually checked gcc's git repo too. The fix is not there in gcc-4.9 and gcc-5. > > > good enough to remove force_order? I can test gcc-4.9 and gcc-5 to check > > if it would currently have any impact. > > I think checking the disassemblies with a pre-gcc-6 would be good > enough then; that bug isn't specific to this particular case. > > > CBL guys, can you confirm that clang also will not reorder volatile asm? > > Full disassemblies of vmlinux pre vs post __force_order removal are > the same. That's pretty good actually; I was worried for a code base > of this size whether two compiles would produce the exact same > disassemblies; I know the version strings are timestamped, for > instance, but I didn't compare data, just .text. I should triple > check i386, and some of the ko's that use modified functions. I'd be > happy to help provide a tested by tag for numerous configurations with > Clang. > > Attaching the diff I was testing, feel free to add a commit message. > -- > Thanks, > ~Nick Desaulniers Thanks, will write it up over the weekend.
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Fri, Aug 21, 2020 at 4:04 PM Arvind Sankar wrote: > > On Fri, Aug 21, 2020 at 02:37:48AM +0200, Thomas Gleixner wrote: > > On Thu, Aug 20 2020 at 09:06, Arvind Sankar wrote: > > > I don't think that's an issue, or at least, not one where force_order > > > helps. > > > > > > If the source for foo() is not visible to the compiler, the only reason > > > force_order prevents the reordering is because foo() might have > > > references to it, but equally foo() might have volatile asm, so the > > > reordering isn't possible anyway. > > > > > > If the source is visible, force_order won't prevent any reordering > > > except across references to force_order, but the only references are > > > from the volatile asm's which already prevent reordering. > > > > > > I think force_order can only help with buggy compilers, and for those it > > > should really have been an input-output operand -- it wouldn't currently > > > do anything to prevent cr writes from being reordered. I agree 100%. From the link to GCC docs, the code in question doesn't even follow the pattern from the doc from informing the compiler of any dependency, it just looks like !@#$. > > > > Fair enough. Care to provide a patch which has the collected wisdom of > > this thread in the changelog? > > > > Thanks, > > > > tglx > > The gcc bug I linked to earlier is only fixed in gcc-6 onwards. Is that (based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602#c14) > good enough to remove force_order? I can test gcc-4.9 and gcc-5 to check > if it would currently have any impact. I think checking the disassemblies with a pre-gcc-6 would be good enough then; that bug isn't specific to this particular case. > CBL guys, can you confirm that clang also will not reorder volatile asm? Full disassemblies of vmlinux pre vs post __force_order removal are the same. That's pretty good actually; I was worried for a code base of this size whether two compiles would produce the exact same disassemblies; I know the version strings are timestamped, for instance, but I didn't compare data, just .text. I should triple check i386, and some of the ko's that use modified functions. I'd be happy to help provide a tested by tag for numerous configurations with Clang. Attaching the diff I was testing, feel free to add a commit message. -- Thanks, ~Nick Desaulniers force_order.patch Description: Binary data
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Fri, Aug 21, 2020 at 02:37:48AM +0200, Thomas Gleixner wrote: > On Thu, Aug 20 2020 at 09:06, Arvind Sankar wrote: > > I don't think that's an issue, or at least, not one where force_order > > helps. > > > > If the source for foo() is not visible to the compiler, the only reason > > force_order prevents the reordering is because foo() might have > > references to it, but equally foo() might have volatile asm, so the > > reordering isn't possible anyway. > > > > If the source is visible, force_order won't prevent any reordering > > except across references to force_order, but the only references are > > from the volatile asm's which already prevent reordering. > > > > I think force_order can only help with buggy compilers, and for those it > > should really have been an input-output operand -- it wouldn't currently > > do anything to prevent cr writes from being reordered. > > Fair enough. Care to provide a patch which has the collected wisdom of > this thread in the changelog? > > Thanks, > > tglx The gcc bug I linked to earlier is only fixed in gcc-6 onwards. Is that good enough to remove force_order? I can test gcc-4.9 and gcc-5 to check if it would currently have any impact. CBL guys, can you confirm that clang also will not reorder volatile asm? Thanks.
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Thu, Aug 20 2020 at 09:06, Arvind Sankar wrote: > I don't think that's an issue, or at least, not one where force_order > helps. > > If the source for foo() is not visible to the compiler, the only reason > force_order prevents the reordering is because foo() might have > references to it, but equally foo() might have volatile asm, so the > reordering isn't possible anyway. > > If the source is visible, force_order won't prevent any reordering > except across references to force_order, but the only references are > from the volatile asm's which already prevent reordering. > > I think force_order can only help with buggy compilers, and for those it > should really have been an input-output operand -- it wouldn't currently > do anything to prevent cr writes from being reordered. Fair enough. Care to provide a patch which has the collected wisdom of this thread in the changelog? Thanks, tglx
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Thu, Aug 20, 2020 at 12:44:06PM +0200, Thomas Gleixner wrote: > On Thu, Aug 13 2020 at 14:09, Arvind Sankar wrote: > > On Thu, Aug 13, 2020 at 10:37:01AM -0700, Paul E. McKenney wrote: > >> > Let me ask (hopefully) useful questions this time: > >> > > >> > Is a compiler allowed to reorder two 'asm volatile()'? > >> > > >> > Are there compilers (gcc >= 4.9 or other supported ones) which do that? > >> > >> I would hope that the answer to both of these questions is "no"! > >> > >> But I freely confess that I have been disappointed before on this sort > >> of thing. :-/ > >> > >>Thanx, Paul > > > > Ok, I found this, so gcc developers consider re-ordering volatile asm > > wrt each other a bug at least. > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602 > > Yes. It prevents reordering of volatiles, but it does not necessarily > prevent reorder of something like this: > > asm volatile(...); > foo(); > asm volatile(...); > > it might turn that into > > foo(); > asm volatile(...); > asm volatile(...); > > if I understood their discussion correctly. So removing this magic is > not really straight forward. > > Thanks, > > tglx > > I don't think that's an issue, or at least, not one where force_order helps. If the source for foo() is not visible to the compiler, the only reason force_order prevents the reordering is because foo() might have references to it, but equally foo() might have volatile asm, so the reordering isn't possible anyway. If the source is visible, force_order won't prevent any reordering except across references to force_order, but the only references are from the volatile asm's which already prevent reordering. I think force_order can only help with buggy compilers, and for those it should really have been an input-output operand -- it wouldn't currently do anything to prevent cr writes from being reordered. Thanks.
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Thu, Aug 13 2020 at 14:09, Arvind Sankar wrote: > On Thu, Aug 13, 2020 at 10:37:01AM -0700, Paul E. McKenney wrote: >> > Let me ask (hopefully) useful questions this time: >> > >> > Is a compiler allowed to reorder two 'asm volatile()'? >> > >> > Are there compilers (gcc >= 4.9 or other supported ones) which do that? >> >> I would hope that the answer to both of these questions is "no"! >> >> But I freely confess that I have been disappointed before on this sort >> of thing. :-/ >> >> Thanx, Paul > > Ok, I found this, so gcc developers consider re-ordering volatile asm > wrt each other a bug at least. > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602 Yes. It prevents reordering of volatiles, but it does not necessarily prevent reorder of something like this: asm volatile(...); foo(); asm volatile(...); it might turn that into foo(); asm volatile(...); asm volatile(...); if I understood their discussion correctly. So removing this magic is not really straight forward. Thanks, tglx
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
GCC toolchain simply ignores if kaslr_64.o has __force_order means the build ends up successfully whereas LLVM toolchain and IAS breaks and the build stops and needs explicitly commit df6d4f9db79c1a5d6f48b59db35ccd1e9ff9adfc ("x86/boot/compressed: Don't declare __force_order in kaslr_64.c") reverted to fix this. With the revert GCC toolchain is also fine. Maybe it is good to revert that commit? This is with [1]: diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 59a3e13204c3..e1c19c5ecd5e 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -17,7 +17,7 @@ * all loads stores around it, which can hurt performance. Solution is to * use a variable and mimic reads and writes to it to enforce serialization */ -extern unsigned long __force_order; +extern unsigned long __force_order __weak; void native_write_cr0(unsigned long val); ...and the patchset of "x86/boot: Remove run-time relocations from compressed kernel" applied [3]. More details in [4]. - Sedat - References: [1] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703 [2] https://git.kernel.org/linus/df6d4f9db79c1a5d6f48b59db35ccd1e9ff9adfc [3] https://lore.kernel.org/patchwork/project/lkml/list/?series=456251 [4] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674502114
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 15, 2020 at 10:23 AM Sedat Dilek wrote: > > On Sat, Aug 15, 2020 at 5:28 AM Sedat Dilek wrote: > > > > On Sat, Aug 15, 2020 at 2:27 AM Nick Desaulniers > > wrote: > > > > > > On Fri, Aug 14, 2020 at 3:57 PM Nick Desaulniers > > > wrote: > > > > > > > > On Fri, Aug 14, 2020 at 2:19 PM Sedat Dilek > > > > wrote: > > > > > > > > > > On Fri, Aug 14, 2020 at 7:29 PM Sedat Dilek > > > > > wrote: > > > > > > > > > > > > Thanks for the proposal. > > > > > > > > > > > > I have adapted it to fit my patchset against Linux v5.8. > > > > > > > > > > > > Both Debian's GCC-10 and a snapshot version of LLVM toolchain > > > > > > v11.0.0-rc1+ seems to be OK. > > > > > > > > > > > > > > > > Yupp, OK. > > > > > > > > > > I was able to boot FreeDOS 1.2 VM in VirtualBox GUI. > > > > > > > > Hi Sedat, > > > > Apologies, but it's not clear to me precisely which patch you tested. > > > > Can you please confirm whether you tested: > > > > 1. Arnd's patch that started this thread. > > > > 2. My proposed diff adding -fno-addrsig to CFLAGS_powernow-k6.o. > > > > 3. My proposed diff removing __force_order from the kernel. > > > > > > > > I'm hoping you were referring to testing 3., but it's not clear to me. > > > > > > Ah, sorry, I missed your comment on github: > > > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674282107 > > > > > > Ok, I will look at more disassembly next week and hopefully have a > > > patch ready, with your tested by tag. > > > > > > > Sorry for not being precise - I tested with solution (3.). > > Later I added the diff I used as mentioned in your above comment. > > > > See [1]: > > > > > In a 2nd run building with a selfmade clang-11 and LLVM "bin"utils is > > > fine, too. > > > > I cannot say much to older versions of GCC and/or LLVM/Clang if > > removing "__force_order" works fine. > > > > Another (4.) solution: > > Sami tried successfully by adding "__weak" declaration with > > CONFIG_LKDTM=m (see [2]). > > I am OK if this works, too. > > > > Please, see my attachments. > > > > - Sedat - > > > > [1] > > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674340760 > > [2] > > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703 > > Unfortunately, the diff from Sami does not work together with Arvind's > patchset... > > x86/boot: Remove run-time relocations from compressed kernel > > ...which got included in recently. > > I see the following: > > ld.lld-11 -m elf_x86_64 -pie --no-dynamic-linker -T > arch/x86/boot/compressed/vmlinux.lds > arch/x86/boot/compressed/kernel_info.o > arch/x86/boot/compressed/head_64.o arch/x86/boot/compressed/misc.o > arch/x86/boot/compressed/string.o arch/x86/boot/compressed/cmdline.o > arch/x86/boot/compressed/error.o arch/x86/boot/compressed/piggy.o > arch/x86/boot/compressed/cpuflags.o > arch/x86/boot/compressed/early_serial_console.o > arch/x86/boot/compressed/kaslr.o arch/x86/boot/compressed/kaslr_64.o > arch/x86/boot/compressed/mem_encrypt.o > arch/x86/boot/compressed/pgtable_64.o arch/x86/boot/compressed/acpi.o > arch/x86/boot/compressed/efi_thunk_64.o > drivers/firmware/efi/libstub/lib.a -o arch/x86/boot/compressed/vmlinux > ld.lld-11: error: Unexpected GOT entries detected! > ld.lld-11: error: Unexpected run-time relocations detected! > ld.lld-11: error: Unexpected GOT entries detected! > ld.lld-11: error: Unexpected run-time relocations detected! > make[5]: *** [arch/x86/boot/compressed/Makefile:91: > arch/x86/boot/compressed/vmlinux] Error 1 > > When you need further informations, please let me know. > > - Sedat - > > [1] > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703 When I revert... commit df6d4f9db79c1a5d6f48b59db35ccd1e9ff9adfc "x86/boot/compressed: Don't declare __force_order in kaslr_64.c" ...I can build, boot on bare metal and start FreeDOS VM in VirtualBox. For more details see [2]. - Sedat - [1] https://git.kernel.org/linus/df6d4f9db79c1a5d6f48b59db35ccd1e9ff9adfc [2] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674378085
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 15, 2020 at 5:28 AM Sedat Dilek wrote: > > On Sat, Aug 15, 2020 at 2:27 AM Nick Desaulniers > wrote: > > > > On Fri, Aug 14, 2020 at 3:57 PM Nick Desaulniers > > wrote: > > > > > > On Fri, Aug 14, 2020 at 2:19 PM Sedat Dilek wrote: > > > > > > > > On Fri, Aug 14, 2020 at 7:29 PM Sedat Dilek > > > > wrote: > > > > > > > > > > Thanks for the proposal. > > > > > > > > > > I have adapted it to fit my patchset against Linux v5.8. > > > > > > > > > > Both Debian's GCC-10 and a snapshot version of LLVM toolchain > > > > > v11.0.0-rc1+ seems to be OK. > > > > > > > > > > > > > Yupp, OK. > > > > > > > > I was able to boot FreeDOS 1.2 VM in VirtualBox GUI. > > > > > > Hi Sedat, > > > Apologies, but it's not clear to me precisely which patch you tested. > > > Can you please confirm whether you tested: > > > 1. Arnd's patch that started this thread. > > > 2. My proposed diff adding -fno-addrsig to CFLAGS_powernow-k6.o. > > > 3. My proposed diff removing __force_order from the kernel. > > > > > > I'm hoping you were referring to testing 3., but it's not clear to me. > > > > Ah, sorry, I missed your comment on github: > > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674282107 > > > > Ok, I will look at more disassembly next week and hopefully have a > > patch ready, with your tested by tag. > > > > Sorry for not being precise - I tested with solution (3.). > Later I added the diff I used as mentioned in your above comment. > > See [1]: > > > In a 2nd run building with a selfmade clang-11 and LLVM "bin"utils is fine, > > too. > > I cannot say much to older versions of GCC and/or LLVM/Clang if > removing "__force_order" works fine. > > Another (4.) solution: > Sami tried successfully by adding "__weak" declaration with > CONFIG_LKDTM=m (see [2]). > I am OK if this works, too. > > Please, see my attachments. > > - Sedat - > > [1] > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674340760 > [2] > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703 Unfortunately, the diff from Sami does not work together with Arvind's patchset... x86/boot: Remove run-time relocations from compressed kernel ...which got included in recently. I see the following: ld.lld-11 -m elf_x86_64 -pie --no-dynamic-linker -T arch/x86/boot/compressed/vmlinux.lds arch/x86/boot/compressed/kernel_info.o arch/x86/boot/compressed/head_64.o arch/x86/boot/compressed/misc.o arch/x86/boot/compressed/string.o arch/x86/boot/compressed/cmdline.o arch/x86/boot/compressed/error.o arch/x86/boot/compressed/piggy.o arch/x86/boot/compressed/cpuflags.o arch/x86/boot/compressed/early_serial_console.o arch/x86/boot/compressed/kaslr.o arch/x86/boot/compressed/kaslr_64.o arch/x86/boot/compressed/mem_encrypt.o arch/x86/boot/compressed/pgtable_64.o arch/x86/boot/compressed/acpi.o arch/x86/boot/compressed/efi_thunk_64.o drivers/firmware/efi/libstub/lib.a -o arch/x86/boot/compressed/vmlinux ld.lld-11: error: Unexpected GOT entries detected! ld.lld-11: error: Unexpected run-time relocations detected! ld.lld-11: error: Unexpected GOT entries detected! ld.lld-11: error: Unexpected run-time relocations detected! make[5]: *** [arch/x86/boot/compressed/Makefile:91: arch/x86/boot/compressed/vmlinux] Error 1 When you need further informations, please let me know. - Sedat - [1] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Sat, Aug 15, 2020 at 12:46 PM Sedat Dilek wrote: > > On Sat, Aug 15, 2020 at 10:23 AM Sedat Dilek wrote: > > > > On Sat, Aug 15, 2020 at 5:28 AM Sedat Dilek wrote: > > > > > > On Sat, Aug 15, 2020 at 2:27 AM Nick Desaulniers > > > wrote: > > > > > > > > On Fri, Aug 14, 2020 at 3:57 PM Nick Desaulniers > > > > wrote: > > > > > > > > > > On Fri, Aug 14, 2020 at 2:19 PM Sedat Dilek > > > > > wrote: > > > > > > > > > > > > On Fri, Aug 14, 2020 at 7:29 PM Sedat Dilek > > > > > > wrote: > > > > > > > > > > > > > > Thanks for the proposal. > > > > > > > > > > > > > > I have adapted it to fit my patchset against Linux v5.8. > > > > > > > > > > > > > > Both Debian's GCC-10 and a snapshot version of LLVM toolchain > > > > > > > v11.0.0-rc1+ seems to be OK. > > > > > > > > > > > > > > > > > > > Yupp, OK. > > > > > > > > > > > > I was able to boot FreeDOS 1.2 VM in VirtualBox GUI. > > > > > > > > > > Hi Sedat, > > > > > Apologies, but it's not clear to me precisely which patch you tested. > > > > > Can you please confirm whether you tested: > > > > > 1. Arnd's patch that started this thread. > > > > > 2. My proposed diff adding -fno-addrsig to CFLAGS_powernow-k6.o. > > > > > 3. My proposed diff removing __force_order from the kernel. > > > > > > > > > > I'm hoping you were referring to testing 3., but it's not clear to me. > > > > > > > > Ah, sorry, I missed your comment on github: > > > > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674282107 > > > > > > > > Ok, I will look at more disassembly next week and hopefully have a > > > > patch ready, with your tested by tag. > > > > > > > > > > Sorry for not being precise - I tested with solution (3.). > > > Later I added the diff I used as mentioned in your above comment. > > > > > > See [1]: > > > > > > > In a 2nd run building with a selfmade clang-11 and LLVM "bin"utils is > > > > fine, too. > > > > > > I cannot say much to older versions of GCC and/or LLVM/Clang if > > > removing "__force_order" works fine. > > > > > > Another (4.) solution: > > > Sami tried successfully by adding "__weak" declaration with > > > CONFIG_LKDTM=m (see [2]). > > > I am OK if this works, too. > > > > > > Please, see my attachments. > > > > > > - Sedat - > > > > > > [1] > > > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674340760 > > > [2] > > > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703 > > > > Unfortunately, the diff from Sami does not work together with Arvind's > > patchset... > > > > x86/boot: Remove run-time relocations from compressed kernel > > > > ...which got included in recently. > > > > I see the following: > > > > ld.lld-11 -m elf_x86_64 -pie --no-dynamic-linker -T > > arch/x86/boot/compressed/vmlinux.lds > > arch/x86/boot/compressed/kernel_info.o > > arch/x86/boot/compressed/head_64.o arch/x86/boot/compressed/misc.o > > arch/x86/boot/compressed/string.o arch/x86/boot/compressed/cmdline.o > > arch/x86/boot/compressed/error.o arch/x86/boot/compressed/piggy.o > > arch/x86/boot/compressed/cpuflags.o > > arch/x86/boot/compressed/early_serial_console.o > > arch/x86/boot/compressed/kaslr.o arch/x86/boot/compressed/kaslr_64.o > > arch/x86/boot/compressed/mem_encrypt.o > > arch/x86/boot/compressed/pgtable_64.o arch/x86/boot/compressed/acpi.o > > arch/x86/boot/compressed/efi_thunk_64.o > > drivers/firmware/efi/libstub/lib.a -o arch/x86/boot/compressed/vmlinux > > ld.lld-11: error: Unexpected GOT entries detected! > > ld.lld-11: error: Unexpected run-time relocations detected! > > ld.lld-11: error: Unexpected GOT entries detected! > > ld.lld-11: error: Unexpected run-time relocations detected! > > make[5]: *** [arch/x86/boot/compressed/Makefile:91: > > arch/x86/boot/compressed/vmlinux] Error 1 > > > > When you need further informations, please let me know. > > > > - Sedat - > > > > [1] > > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674182703 > > When I revert... > > commit df6d4f9db79c1a5d6f48b59db35ccd1e9ff9adfc > "x86/boot/compressed: Don't declare __force_order in kaslr_64.c" > > ...I can build, boot on bare metal and start FreeDOS VM in VirtualBox. > > For more details see [2]. > > - Sedat - > > [1] https://git.kernel.org/linus/df6d4f9db79c1a5d6f48b59db35ccd1e9ff9adfc > [2] > https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674378085 Fine with using Debian's GCC v10.2 and GNU/ld v2.35 (from binutils v2.35). All details in [1]. [1] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674406068
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Fri, Aug 14, 2020 at 3:57 PM Nick Desaulniers wrote: > > On Fri, Aug 14, 2020 at 2:19 PM Sedat Dilek wrote: > > > > On Fri, Aug 14, 2020 at 7:29 PM Sedat Dilek wrote: > > > > > > Thanks for the proposal. > > > > > > I have adapted it to fit my patchset against Linux v5.8. > > > > > > Both Debian's GCC-10 and a snapshot version of LLVM toolchain > > > v11.0.0-rc1+ seems to be OK. > > > > > > > Yupp, OK. > > > > I was able to boot FreeDOS 1.2 VM in VirtualBox GUI. > > Hi Sedat, > Apologies, but it's not clear to me precisely which patch you tested. > Can you please confirm whether you tested: > 1. Arnd's patch that started this thread. > 2. My proposed diff adding -fno-addrsig to CFLAGS_powernow-k6.o. > 3. My proposed diff removing __force_order from the kernel. > > I'm hoping you were referring to testing 3., but it's not clear to me. Ah, sorry, I missed your comment on github: https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-674282107 Ok, I will look at more disassembly next week and hopefully have a patch ready, with your tested by tag. -- Thanks, ~Nick Desaulniers
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Fri, Aug 14, 2020 at 2:19 PM Sedat Dilek wrote: > > On Fri, Aug 14, 2020 at 7:29 PM Sedat Dilek wrote: > > > > Thanks for the proposal. > > > > I have adapted it to fit my patchset against Linux v5.8. > > > > Both Debian's GCC-10 and a snapshot version of LLVM toolchain > > v11.0.0-rc1+ seems to be OK. > > > > Yupp, OK. > > I was able to boot FreeDOS 1.2 VM in VirtualBox GUI. Hi Sedat, Apologies, but it's not clear to me precisely which patch you tested. Can you please confirm whether you tested: 1. Arnd's patch that started this thread. 2. My proposed diff adding -fno-addrsig to CFLAGS_powernow-k6.o. 3. My proposed diff removing __force_order from the kernel. I'm hoping you were referring to testing 3., but it's not clear to me. I've been comparing the full disassemblies of vmlinux images when built with Clang with 3 applied (they're no different, which is a pleasant surprise, I didn't think kernel builds woulds would be fully deterministic given the sheer amount of source). I still need to check the compressed vmlinux image, and various .ko's (XEN) that use these read/write_cr[0,1,2,4]() functions, and then check them again when built with GCC. I'm falling behind a little trying to get our MC organized for plumbers, as well as the end of intern season and beginning of bi-annual "performance review" ("not stack ranking" I'm told) at work. If I don't find any differences, or if I do but don't find them to be meaningful, I hope to push a more formal patch (rather than just a diff) maybe next week. I'll include my findings either way; if it was 3 that you tested, I'll include your tested by tag when sending. Otherwise maybe you can help us test the more formal patch next week? -- Thanks, ~Nick Desaulniers
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Tue, Aug 4, 2020 at 2:09 AM 'Nick Desaulniers' via Clang Built Linux wrote: > > On Wed, May 27, 2020 at 6:53 AM Arnd Bergmann wrote: > > > > When using the clang integrated assembler, we get a reference > > to __force_order that should normally get ignored in a few > > rare cases: > > > > ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] undefined! > > > > Add a 'static' definition so any file in which this happens can > > have a local copy. > > > > Signed-off-by: Arnd Bergmann > > Hi Arnd, > Looks like > $ ARCH=i386 make CC=clang LLVM_IAS=1 -j71 > defconfig+CONFIG_X86_POWERNOW_K6=m > is the simplest reproducer. > > If I run > $ llvm-readelf -s drivers/cpufreq/powernow-k6.o | grep __force_order > 39: 0 NOTYPE GLOBAL DEFAULT UND __force_order > we can indeed see an undefined reference to __force_order. > > If I build the .s file via > $ ARCH=i386 make CC=clang LLVM_IAS=1 -j71 drivers/cpufreq/powernow-k6.s > the only reference I see to __force_order is: > 979 .addrsig_sym __force_order > > which is created by Clang's implicit -faddr-sig. If I disable that > for this file via: > > ```diff > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile > index f1b7e3dd6e5d..87d655d5af49 100644 > --- a/drivers/cpufreq/Makefile > +++ b/drivers/cpufreq/Makefile > @@ -28,6 +28,9 @@ obj-$(CONFIG_X86_ACPI_CPUFREQ)+= > acpi-cpufreq.o > obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o > obj-$(CONFIG_X86_PCC_CPUFREQ) += pcc-cpufreq.o > obj-$(CONFIG_X86_POWERNOW_K6) += powernow-k6.o > +ifdef CONFIG_CC_IS_CLANG > +CFLAGS_powernow-k6.o += -fno-addrsig > +endif > obj-$(CONFIG_X86_POWERNOW_K7) += powernow-k7.o > obj-$(CONFIG_X86_LONGHAUL) += longhaul.o > obj-$(CONFIG_X86_E_POWERSAVER) += e_powersaver.o > ``` > then the module links without error, and we get no hits for > __force_order from llvm-readelf -s. This makes me think there may be > a bug in Clang generating address significance tables for global > variables that are otherwise unused, resulting in such linkage > failures. +pcc@ for that. > > I ran a creduce job on drivers/cpufreq/powernow-k6.i where I'd compile > twice, one with the implicit default value of -faddr-sig and look for > the undefined __force_order, and again with -fno-addrsig and ensure > there was no undefined __force_order, which coughed up: > extern int __force_order; > int a(void) { asm("" : "=m"(__force_order)); return 0; } > as the bare minimum for an address significant table. > https://godbolt.org/z/cjfaqM > > I'll bet this is coming from the call to read_cr0() in > powernow_k6_set_cpu_multiplier(). If __force_order is defined in > arch/x86/boot/compressed/pgtable_64.c, then I'm not sure it's a good > idea to build drivers/cpufreq/powernow-k6.c as a kernel module > (CONFIG_X86_POWERNOW_K6=m) vs statically compiled in. Wouldn't > __force_order need to be EXPORT'ed for kernel modules to use it > safely? > > > --- > > arch/x86/boot/compressed/pgtable_64.c | 2 ++ > > arch/x86/include/asm/special_insns.h | 7 +++ > > 2 files changed, 9 insertions(+) > > > > diff --git a/arch/x86/boot/compressed/pgtable_64.c > > b/arch/x86/boot/compressed/pgtable_64.c > > index c8862696a47b..8595194cea41 100644 > > --- a/arch/x86/boot/compressed/pgtable_64.c > > +++ b/arch/x86/boot/compressed/pgtable_64.c > > @@ -12,7 +12,9 @@ > > * It is not referenced from the code, but GCC < 5 with -fPIE would fail > > * due to an undefined symbol. Define it to make these ancient GCCs work. > > */ > > +#ifndef CONFIG_CC_IS_CLANG > > unsigned long __force_order; > > +#endif > > > > #define BIOS_START_MIN 0x2U/* 128K, less than this is > > insane */ > > #define BIOS_START_MAX 0x9f000U/* 640K, absolute maximum */ > > diff --git a/arch/x86/include/asm/special_insns.h > > b/arch/x86/include/asm/special_insns.h > > index 82436cb04ccf..7081e587c1ea 100644 > > --- a/arch/x86/include/asm/special_insns.h > > +++ b/arch/x86/include/asm/special_insns.h > > @@ -16,8 +16,15 @@ > > * A memory clobber would solve the problem, but would prevent reordering > > of > > * all loads stores around it, which can hurt performance. Solution is to > > * use a variable and mimic reads and writes to it to enforce serialization > > + * > > + * Clang sometimes fails to kill the reference to the dummy variable, so > > + * provide an actual copy. > > */ > > +#ifdef CONFIG_CC_IS_CLANG > > +static unsigned long __force_order; > > +#else > > extern unsigned long __force_order; > > +#endif > > > > void native_write_cr0(unsigned long val); > > > > -- > > 2.26.2 > > Thanks for the proposal. I have adapted it to fit my patchset against Linux v5.8. Both Debian's GCC-10 and a snapshot version of LLVM toolchain v11.0.0-rc1+ seems to be OK. MAKE_OPTS="V=1 -j3 CC=gcc-10 LD=ld.bfd" make $MAKE_OPTS arch/x86/kernel/cpu/common.o MAKE_OPTS="V=1 -j3 HOSTCC=clang-11 HOST
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Thu, Aug 13, 2020 at 02:09:33PM -0400, Arvind Sankar wrote: > On Thu, Aug 13, 2020 at 10:37:01AM -0700, Paul E. McKenney wrote: > > On Thu, Aug 13, 2020 at 07:28:57PM +0200, Thomas Gleixner wrote: > > > Nick Desaulniers writes: > > > > On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner > > > > wrote: > > > >> > + * > > > >> > + * Clang sometimes fails to kill the reference to the dummy > > > >> > variable, so > > > >> > + * provide an actual copy. > > > >> > > > >> Can that compiler be fixed instead? > > > > > > > > I don't think so. The logic in the compiler whether to emit an > > > > > > Forget that I asked. Heat induced brain damaged. > > > > > > > I'd much rather remove all of __force_order. > > > > > > Right. > > > > > > > Not sure about the comment in arch/x86/include/asm/special_insns.h > > > > either; smells fishy like a bug with a compiler from a long time ago. > > > > It looks like it was introduced in: > > > > commit d3ca901f94b32 ("x86: unify paravirt parts of system.h") > > > > Lore has this thread: > > > > https://lore.kernel.org/lkml/4755a809.4050...@qumranet.com/ > > > > Patch 4: > > > > https://lore.kernel.org/lkml/11967844071346-git-send-email-gco...@redhat.com/ > > > > It seems like there was a discussion about %cr8, but no one asked > > > > "what's going on here with __force_order, is that right?" > > > > > > Correct and the changelog is uselss in this regard. > > > > > > > Quick boot test of the below works for me, though I should probably > > > > test hosting a virtualized guest since d3ca901f94b32 refers to > > > > paravirt. Thoughts? > > > > > > Let me ask (hopefully) useful questions this time: > > > > > > Is a compiler allowed to reorder two 'asm volatile()'? > > > > > > Are there compilers (gcc >= 4.9 or other supported ones) which do that? > > > > I would hope that the answer to both of these questions is "no"! > > > > But I freely confess that I have been disappointed before on this sort > > of thing. :-/ > > > > Thanx, Paul > > Ok, I found this, so gcc developers consider re-ordering volatile asm > wrt each other a bug at least. > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602 Whew!!! ;-) Thanx, Paul
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Thu, Aug 13, 2020 at 10:37:01AM -0700, Paul E. McKenney wrote: > On Thu, Aug 13, 2020 at 07:28:57PM +0200, Thomas Gleixner wrote: > > Nick Desaulniers writes: > > > On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner wrote: > > >> > + * > > >> > + * Clang sometimes fails to kill the reference to the dummy variable, > > >> > so > > >> > + * provide an actual copy. > > >> > > >> Can that compiler be fixed instead? > > > > > > I don't think so. The logic in the compiler whether to emit an > > > > Forget that I asked. Heat induced brain damaged. > > > > > I'd much rather remove all of __force_order. > > > > Right. > > > > > Not sure about the comment in arch/x86/include/asm/special_insns.h > > > either; smells fishy like a bug with a compiler from a long time ago. > > > It looks like it was introduced in: > > > commit d3ca901f94b32 ("x86: unify paravirt parts of system.h") > > > Lore has this thread: > > > https://lore.kernel.org/lkml/4755a809.4050...@qumranet.com/ > > > Patch 4: > > > https://lore.kernel.org/lkml/11967844071346-git-send-email-gco...@redhat.com/ > > > It seems like there was a discussion about %cr8, but no one asked > > > "what's going on here with __force_order, is that right?" > > > > Correct and the changelog is uselss in this regard. > > > > > Quick boot test of the below works for me, though I should probably > > > test hosting a virtualized guest since d3ca901f94b32 refers to > > > paravirt. Thoughts? > > > > Let me ask (hopefully) useful questions this time: > > > > Is a compiler allowed to reorder two 'asm volatile()'? > > > > Are there compilers (gcc >= 4.9 or other supported ones) which do that? > > I would hope that the answer to both of these questions is "no"! > > But I freely confess that I have been disappointed before on this sort > of thing. :-/ > > Thanx, Paul Ok, I found this, so gcc developers consider re-ordering volatile asm wrt each other a bug at least. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Thu, Aug 13, 2020 at 07:28:57PM +0200, Thomas Gleixner wrote: > Nick Desaulniers writes: > > On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner wrote: > >> > + * > >> > + * Clang sometimes fails to kill the reference to the dummy variable, so > >> > + * provide an actual copy. > >> > >> Can that compiler be fixed instead? > > > > I don't think so. The logic in the compiler whether to emit an > > Forget that I asked. Heat induced brain damaged. > > > I'd much rather remove all of __force_order. > > Right. > > > Not sure about the comment in arch/x86/include/asm/special_insns.h > > either; smells fishy like a bug with a compiler from a long time ago. > > It looks like it was introduced in: > > commit d3ca901f94b32 ("x86: unify paravirt parts of system.h") > > Lore has this thread: > > https://lore.kernel.org/lkml/4755a809.4050...@qumranet.com/ > > Patch 4: > > https://lore.kernel.org/lkml/11967844071346-git-send-email-gco...@redhat.com/ > > It seems like there was a discussion about %cr8, but no one asked > > "what's going on here with __force_order, is that right?" > > Correct and the changelog is uselss in this regard. > > > Quick boot test of the below works for me, though I should probably > > test hosting a virtualized guest since d3ca901f94b32 refers to > > paravirt. Thoughts? > > Let me ask (hopefully) useful questions this time: > > Is a compiler allowed to reorder two 'asm volatile()'? > > Are there compilers (gcc >= 4.9 or other supported ones) which do that? I would hope that the answer to both of these questions is "no"! But I freely confess that I have been disappointed before on this sort of thing. :-/ Thanx, Paul
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
Nick Desaulniers writes: > On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner wrote: >> > + * >> > + * Clang sometimes fails to kill the reference to the dummy variable, so >> > + * provide an actual copy. >> >> Can that compiler be fixed instead? > > I don't think so. The logic in the compiler whether to emit an Forget that I asked. Heat induced brain damaged. > I'd much rather remove all of __force_order. Right. > Not sure about the comment in arch/x86/include/asm/special_insns.h > either; smells fishy like a bug with a compiler from a long time ago. > It looks like it was introduced in: > commit d3ca901f94b32 ("x86: unify paravirt parts of system.h") > Lore has this thread: > https://lore.kernel.org/lkml/4755a809.4050...@qumranet.com/ > Patch 4: > https://lore.kernel.org/lkml/11967844071346-git-send-email-gco...@redhat.com/ > It seems like there was a discussion about %cr8, but no one asked > "what's going on here with __force_order, is that right?" Correct and the changelog is uselss in this regard. > Quick boot test of the below works for me, though I should probably > test hosting a virtualized guest since d3ca901f94b32 refers to > paravirt. Thoughts? Let me ask (hopefully) useful questions this time: Is a compiler allowed to reorder two 'asm volatile()'? Are there compilers (gcc >= 4.9 or other supported ones) which do that? Thanks, tglx
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Wed, Aug 12, 2020 at 05:12:34PM -0700, Nick Desaulniers wrote: > On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner wrote: > > > > Arnd Bergmann writes: > > > When using the clang integrated assembler, we get a reference > > > to __force_order that should normally get ignored in a few > > > rare cases: > > > > > > ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] > > > undefined! > > > > > > Add a 'static' definition so any file in which this happens can > > > have a local copy. > > > > That's a horrible hack. > > Agreed. And static means everyone gets their own copy, rather than > sharing one memory address. I guess no one actually writes to it, so > it doesn't really matter, but __force_order just seems so strange to > me. > > > And the only reason why it does not trigger -Wunused-variable warnings > > all over the place is because it's "referenced" in unused inline > > functions and then optimized out along with the unused inlines. > > > > > * It is not referenced from the code, but GCC < 5 with -fPIE would fail > > > * due to an undefined symbol. Define it to make these ancient GCCs > > > work. > > > > Bah, we really should have moved straight to GCC5 instead of upping it > > just to 4.9 > > > > > + * > > > + * Clang sometimes fails to kill the reference to the dummy variable, so > > > + * provide an actual copy. > > > > Can that compiler be fixed instead? > > I don't think so. The logic in the compiler whether to emit an > "address is significant" assembler directive is based on whether the > variable is "used." The "use" of `__force_order` is as output of all > of these control register read/write functions' inline asm, even > though the inline asm doesn't actually write to them. We'd have to > peek inside of the inline asm and build "use/def chains" for the > inline asm, to see that you don't actually use the output variable. > Best we can do is see it listed as an output to the inline asm > statement. And if you reference an `extern` variable, it should be no > wonder that you can get undefined symbol linkage failures. > > I'd much rather remove all of __force_order. > > > > > Aside of that is there a reason to make this 'static' thing wrapped in > > #ifdeffery? A quick check with GCC8.3 just works. But maybe 4.9 gets > > unhappy. Can't say due to: -ENOANCIENTCOMPILER :) > > >From the comment in arch/x86/boot/compressed/pgtable_64.c, there's a > hint that maybe gcc < 5 and -pie (CONFIG_RANDOMIZE_BASE?) would fail > due to undefined symbol, though I'm not sure which symbol the comment > is referring to. If it's __force_order, then removing outright likely > fixes that issue. Yes, it's __force_order. Compressed kernel is always -fPIE, and gcc <5 and clang will generate mov instructions with GOTPCREL relocations to load the address of __force_order into a register for use by the inline asm. gcc-5+ works because it doesn't use GOTPCREL for global variables, instead relying on the linker inserting copy relocations if necessary. > > Not sure about the comment in arch/x86/include/asm/special_insns.h > either; smells fishy like a bug with a compiler from a long time ago. > It looks like it was introduced in: > commit d3ca901f94b32 ("x86: unify paravirt parts of system.h") > Lore has this thread: > https://lore.kernel.org/lkml/4755a809.4050...@qumranet.com/ > Patch 4: > https://lore.kernel.org/lkml/11967844071346-git-send-email-gco...@redhat.com/ > It seems like there was a discussion about %cr8, but no one asked > "what's going on here with __force_order, is that right?" > Latest GCC release on December 4 2007 would have been GCC 4.2.2 according to: > https://gcc.gnu.org/releases.html > > Quick boot test of the below works for me, though I should probably > test hosting a virtualized guest since d3ca901f94b32 refers to > paravirt. Thoughts? It's unclear if there was a real problem this fixes, but if there was I'd expect it on native, not paravirt, given it's native that has this __force_order hack? gcc's documentation of volatile extended asm includes a caveat. https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile Near the end of 6.47.2.1: "Note that the compiler can move even volatile asm instructions relative to other code, including across jump instructions." and it provides an example of unexpected code motion, with the fix being adding an artificial dependency to the asm. So it might do something silly like reversing the order of two %crn writes, maybe?
RE: [PATCH] x86: work around clang IAS bug referencing __force_order
From: Nick Desaulniers > Sent: 13 August 2020 01:13 > > On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner wrote: > > > > Arnd Bergmann writes: > > > When using the clang integrated assembler, we get a reference > > > to __force_order that should normally get ignored in a few > > > rare cases: > > > > > > ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] > > > undefined! > > > > > > Add a 'static' definition so any file in which this happens can > > > have a local copy. > > > > That's a horrible hack. > > Agreed. And static means everyone gets their own copy, rather than > sharing one memory address. I guess no one actually writes to it, so > it doesn't really matter, but __force_order just seems so strange to > me. It could be changed to use a symbol that the linker script already defines. However it does look like a workaround for a broken version of gcc. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Thu, Aug 6, 2020 at 3:11 PM Thomas Gleixner wrote: > > Arnd Bergmann writes: > > When using the clang integrated assembler, we get a reference > > to __force_order that should normally get ignored in a few > > rare cases: > > > > ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] undefined! > > > > Add a 'static' definition so any file in which this happens can > > have a local copy. > > That's a horrible hack. Agreed. And static means everyone gets their own copy, rather than sharing one memory address. I guess no one actually writes to it, so it doesn't really matter, but __force_order just seems so strange to me. > And the only reason why it does not trigger -Wunused-variable warnings > all over the place is because it's "referenced" in unused inline > functions and then optimized out along with the unused inlines. > > > * It is not referenced from the code, but GCC < 5 with -fPIE would fail > > * due to an undefined symbol. Define it to make these ancient GCCs > > work. > > Bah, we really should have moved straight to GCC5 instead of upping it > just to 4.9 > > > + * > > + * Clang sometimes fails to kill the reference to the dummy variable, so > > + * provide an actual copy. > > Can that compiler be fixed instead? I don't think so. The logic in the compiler whether to emit an "address is significant" assembler directive is based on whether the variable is "used." The "use" of `__force_order` is as output of all of these control register read/write functions' inline asm, even though the inline asm doesn't actually write to them. We'd have to peek inside of the inline asm and build "use/def chains" for the inline asm, to see that you don't actually use the output variable. Best we can do is see it listed as an output to the inline asm statement. And if you reference an `extern` variable, it should be no wonder that you can get undefined symbol linkage failures. I'd much rather remove all of __force_order. > > Aside of that is there a reason to make this 'static' thing wrapped in > #ifdeffery? A quick check with GCC8.3 just works. But maybe 4.9 gets > unhappy. Can't say due to: -ENOANCIENTCOMPILER :) >From the comment in arch/x86/boot/compressed/pgtable_64.c, there's a hint that maybe gcc < 5 and -pie (CONFIG_RANDOMIZE_BASE?) would fail due to undefined symbol, though I'm not sure which symbol the comment is referring to. If it's __force_order, then removing outright likely fixes that issue. Not sure about the comment in arch/x86/include/asm/special_insns.h either; smells fishy like a bug with a compiler from a long time ago. It looks like it was introduced in: commit d3ca901f94b32 ("x86: unify paravirt parts of system.h") Lore has this thread: https://lore.kernel.org/lkml/4755a809.4050...@qumranet.com/ Patch 4: https://lore.kernel.org/lkml/11967844071346-git-send-email-gco...@redhat.com/ It seems like there was a discussion about %cr8, but no one asked "what's going on here with __force_order, is that right?" Latest GCC release on December 4 2007 would have been GCC 4.2.2 according to: https://gcc.gnu.org/releases.html Quick boot test of the below works for me, though I should probably test hosting a virtualized guest since d3ca901f94b32 refers to paravirt. Thoughts? --- a/arch/x86/boot/compressed/pgtable_64.c +++ b/arch/x86/boot/compressed/pgtable_64.c @@ -5,15 +5,6 @@ #include "pgtable.h" #include "../string.h" -/* - * __force_order is used by special_insns.h asm code to force instruction - * serialization. - * - * It is not referenced from the code, but GCC < 5 with -fPIE would fail - * due to an undefined symbol. Define it to make these ancient GCCs work. - */ -unsigned long __force_order; - #define BIOS_START_MIN 0x2U/* 128K, less than this is insane */ #define BIOS_START_MAX 0x9f000U/* 640K, absolute maximum */ diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 59a3e13204c3..d2e0d53b0f69 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -10,46 +10,37 @@ #include #include -/* - * Volatile isn't enough to prevent the compiler from reordering the - * read/write functions for the control registers and messing everything up. - * A memory clobber would solve the problem, but would prevent reordering of - * all loads stores around it, which can hurt performance. Solution is to - * use a variable and mimic reads and writes to it to enforce serialization - */ -extern unsigned long __force_order; - void native_write_cr0(unsigned long val); static inline unsigned long native_read_cr0(void) { unsigned long val; - asm volatile("mov %%cr0,%0\n\t" : "=r" (val), "=m" (__force_order)); + asm volatile("mov %%cr0,%0\n\t" : "=r" (val)); return val; } static __always_inline unsigned long native_read_cr2(void) { unsigned long val; - asm volatile("mov %%cr2,%0\n\t" : "=r" (val), "=m" (__force_orde
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Fri, Aug 7, 2020 at 12:13 AM Thomas Gleixner wrote: > > Sedat Dilek writes: > > what is the status of this patch? > > Just looked at it. > > > I needed this one to be able to build VirtualBox via DKMS as an > > out-of-tree kernel-module. > > Not being able to build the vbox rootkit is a feature, not a bug. > It must be a funny job to work for Linux/x86. Keep your humour :-). We have a second issue hitting this problem when CONFIG_LKDTM=m. There are more details in CBL issue #1120. Especially see comments from Nick [2] and user pcc [3]. Thanks. - Sedat - [1] https://github.com/ClangBuiltLinux/linux/issues/1120 [2] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-668736160 [3] https://github.com/ClangBuiltLinux/linux/issues/1120#issuecomment-668746486
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
Sedat Dilek writes: > what is the status of this patch? Just looked at it. > I needed this one to be able to build VirtualBox via DKMS as an > out-of-tree kernel-module. Not being able to build the vbox rootkit is a feature, not a bug. Thanks, tglx
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
Arnd Bergmann writes: > When using the clang integrated assembler, we get a reference > to __force_order that should normally get ignored in a few > rare cases: > > ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] undefined! > > Add a 'static' definition so any file in which this happens can > have a local copy. That's a horrible hack. And the only reason why it does not trigger -Wunused-variable warnings all over the place is because it's "referenced" in unused inline functions and then optimized out along with the unused inlines. > * It is not referenced from the code, but GCC < 5 with -fPIE would fail > * due to an undefined symbol. Define it to make these ancient GCCs > work. Bah, we really should have moved straight to GCC5 instead of upping it just to 4.9 > + * > + * Clang sometimes fails to kill the reference to the dummy variable, so > + * provide an actual copy. Can that compiler be fixed instead? Aside of that is there a reason to make this 'static' thing wrapped in #ifdeffery? A quick check with GCC8.3 just works. But maybe 4.9 gets unhappy. Can't say due to: -ENOANCIENTCOMPILER :) Thanks, tglx
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Wed, May 27, 2020 at 6:53 AM Arnd Bergmann wrote: > > When using the clang integrated assembler, we get a reference > to __force_order that should normally get ignored in a few > rare cases: > > ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] undefined! > > Add a 'static' definition so any file in which this happens can > have a local copy. > > Signed-off-by: Arnd Bergmann Hi Arnd, Looks like $ ARCH=i386 make CC=clang LLVM_IAS=1 -j71 defconfig+CONFIG_X86_POWERNOW_K6=m is the simplest reproducer. If I run $ llvm-readelf -s drivers/cpufreq/powernow-k6.o | grep __force_order 39: 0 NOTYPE GLOBAL DEFAULT UND __force_order we can indeed see an undefined reference to __force_order. If I build the .s file via $ ARCH=i386 make CC=clang LLVM_IAS=1 -j71 drivers/cpufreq/powernow-k6.s the only reference I see to __force_order is: 979 .addrsig_sym __force_order which is created by Clang's implicit -faddr-sig. If I disable that for this file via: ```diff diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile index f1b7e3dd6e5d..87d655d5af49 100644 --- a/drivers/cpufreq/Makefile +++ b/drivers/cpufreq/Makefile @@ -28,6 +28,9 @@ obj-$(CONFIG_X86_ACPI_CPUFREQ)+= acpi-cpufreq.o obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o obj-$(CONFIG_X86_PCC_CPUFREQ) += pcc-cpufreq.o obj-$(CONFIG_X86_POWERNOW_K6) += powernow-k6.o +ifdef CONFIG_CC_IS_CLANG +CFLAGS_powernow-k6.o += -fno-addrsig +endif obj-$(CONFIG_X86_POWERNOW_K7) += powernow-k7.o obj-$(CONFIG_X86_LONGHAUL) += longhaul.o obj-$(CONFIG_X86_E_POWERSAVER) += e_powersaver.o ``` then the module links without error, and we get no hits for __force_order from llvm-readelf -s. This makes me think there may be a bug in Clang generating address significance tables for global variables that are otherwise unused, resulting in such linkage failures. +pcc@ for that. I ran a creduce job on drivers/cpufreq/powernow-k6.i where I'd compile twice, one with the implicit default value of -faddr-sig and look for the undefined __force_order, and again with -fno-addrsig and ensure there was no undefined __force_order, which coughed up: extern int __force_order; int a(void) { asm("" : "=m"(__force_order)); return 0; } as the bare minimum for an address significant table. https://godbolt.org/z/cjfaqM I'll bet this is coming from the call to read_cr0() in powernow_k6_set_cpu_multiplier(). If __force_order is defined in arch/x86/boot/compressed/pgtable_64.c, then I'm not sure it's a good idea to build drivers/cpufreq/powernow-k6.c as a kernel module (CONFIG_X86_POWERNOW_K6=m) vs statically compiled in. Wouldn't __force_order need to be EXPORT'ed for kernel modules to use it safely? > --- > arch/x86/boot/compressed/pgtable_64.c | 2 ++ > arch/x86/include/asm/special_insns.h | 7 +++ > 2 files changed, 9 insertions(+) > > diff --git a/arch/x86/boot/compressed/pgtable_64.c > b/arch/x86/boot/compressed/pgtable_64.c > index c8862696a47b..8595194cea41 100644 > --- a/arch/x86/boot/compressed/pgtable_64.c > +++ b/arch/x86/boot/compressed/pgtable_64.c > @@ -12,7 +12,9 @@ > * It is not referenced from the code, but GCC < 5 with -fPIE would fail > * due to an undefined symbol. Define it to make these ancient GCCs work. > */ > +#ifndef CONFIG_CC_IS_CLANG > unsigned long __force_order; > +#endif > > #define BIOS_START_MIN 0x2U/* 128K, less than this is > insane */ > #define BIOS_START_MAX 0x9f000U/* 640K, absolute maximum */ > diff --git a/arch/x86/include/asm/special_insns.h > b/arch/x86/include/asm/special_insns.h > index 82436cb04ccf..7081e587c1ea 100644 > --- a/arch/x86/include/asm/special_insns.h > +++ b/arch/x86/include/asm/special_insns.h > @@ -16,8 +16,15 @@ > * A memory clobber would solve the problem, but would prevent reordering of > * all loads stores around it, which can hurt performance. Solution is to > * use a variable and mimic reads and writes to it to enforce serialization > + * > + * Clang sometimes fails to kill the reference to the dummy variable, so > + * provide an actual copy. > */ > +#ifdef CONFIG_CC_IS_CLANG > +static unsigned long __force_order; > +#else > extern unsigned long __force_order; > +#endif > > void native_write_cr0(unsigned long val); > > -- > 2.26.2 > > -- -- Thanks, ~Nick Desaulniers
Re: [PATCH] x86: work around clang IAS bug referencing __force_order
On Wed, May 27, 2020 at 3:53 PM Arnd Bergmann wrote: > > When using the clang integrated assembler, we get a reference > to __force_order that should normally get ignored in a few > rare cases: > > ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] undefined! > > Add a 'static' definition so any file in which this happens can > have a local copy. > > Signed-off-by: Arnd Bergmann Hi, what is the status of this patch? I needed this one to be able to build VirtualBox via DKMS as an out-of-tree kernel-module. Package: virtualbox-dkms version 6.1.12-dfsg-8 (Debian/unstable) To quote myself (see [1]): Passing LLVM_IAS=1 results in: AR /var/lib/dkms/virtualbox/6.1.12/build/built-in.a MODPOST /var/lib/dkms/virtualbox/6.1.12/build/Module.symvers ERROR: modpost: "__force_order" [/var/lib/dkms/virtualbox/6.1.12/build/vboxdrv/vboxdrv.ko] undefined! Arnd's patch is mandatory to build with Clang's Integrated Assembler (make LLVM_IAS=1). Here: LLVM toolchain version 11.0.0-rc1 Feel free to add: Reported-by: Sedat Dilek Tested-by: Sedat Dilek Can one of the tip maintainers pick this up, please? Thanks. Regards, - Sedat - [1] https://github.com/ClangBuiltLinux/linux/issues/1104#issuecomment-667470053 > --- > arch/x86/boot/compressed/pgtable_64.c | 2 ++ > arch/x86/include/asm/special_insns.h | 7 +++ > 2 files changed, 9 insertions(+) > > diff --git a/arch/x86/boot/compressed/pgtable_64.c > b/arch/x86/boot/compressed/pgtable_64.c > index c8862696a47b..8595194cea41 100644 > --- a/arch/x86/boot/compressed/pgtable_64.c > +++ b/arch/x86/boot/compressed/pgtable_64.c > @@ -12,7 +12,9 @@ > * It is not referenced from the code, but GCC < 5 with -fPIE would fail > * due to an undefined symbol. Define it to make these ancient GCCs work. > */ > +#ifndef CONFIG_CC_IS_CLANG > unsigned long __force_order; > +#endif > > #define BIOS_START_MIN 0x2U/* 128K, less than this is > insane */ > #define BIOS_START_MAX 0x9f000U/* 640K, absolute maximum */ > diff --git a/arch/x86/include/asm/special_insns.h > b/arch/x86/include/asm/special_insns.h > index 82436cb04ccf..7081e587c1ea 100644 > --- a/arch/x86/include/asm/special_insns.h > +++ b/arch/x86/include/asm/special_insns.h > @@ -16,8 +16,15 @@ > * A memory clobber would solve the problem, but would prevent reordering of > * all loads stores around it, which can hurt performance. Solution is to > * use a variable and mimic reads and writes to it to enforce serialization > + * > + * Clang sometimes fails to kill the reference to the dummy variable, so > + * provide an actual copy. > */ > +#ifdef CONFIG_CC_IS_CLANG > +static unsigned long __force_order; > +#else > extern unsigned long __force_order; > +#endif > > void native_write_cr0(unsigned long val); > > -- > 2.26.2 > > -- > You received this message because you are subscribed to the Google Groups > "Clang Built Linux" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to clang-built-linux+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/clang-built-linux/20200527135329.1172644-1-arnd%40arndb.de.
[PATCH] x86: work around clang IAS bug referencing __force_order
When using the clang integrated assembler, we get a reference to __force_order that should normally get ignored in a few rare cases: ERROR: modpost: "__force_order" [drivers/cpufreq/powernow-k6.ko] undefined! Add a 'static' definition so any file in which this happens can have a local copy. Signed-off-by: Arnd Bergmann --- arch/x86/boot/compressed/pgtable_64.c | 2 ++ arch/x86/include/asm/special_insns.h | 7 +++ 2 files changed, 9 insertions(+) diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c index c8862696a47b..8595194cea41 100644 --- a/arch/x86/boot/compressed/pgtable_64.c +++ b/arch/x86/boot/compressed/pgtable_64.c @@ -12,7 +12,9 @@ * It is not referenced from the code, but GCC < 5 with -fPIE would fail * due to an undefined symbol. Define it to make these ancient GCCs work. */ +#ifndef CONFIG_CC_IS_CLANG unsigned long __force_order; +#endif #define BIOS_START_MIN 0x2U/* 128K, less than this is insane */ #define BIOS_START_MAX 0x9f000U/* 640K, absolute maximum */ diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 82436cb04ccf..7081e587c1ea 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -16,8 +16,15 @@ * A memory clobber would solve the problem, but would prevent reordering of * all loads stores around it, which can hurt performance. Solution is to * use a variable and mimic reads and writes to it to enforce serialization + * + * Clang sometimes fails to kill the reference to the dummy variable, so + * provide an actual copy. */ +#ifdef CONFIG_CC_IS_CLANG +static unsigned long __force_order; +#else extern unsigned long __force_order; +#endif void native_write_cr0(unsigned long val); -- 2.26.2