Re: [PATCH v7 2/2] Kbuild: implement support for DWARF v5
On Thu, Feb 04, 2021 at 08:06:12AM +0900, Masahiro Yamada wrote: > GCC never outputs '.file 0', which is why > this test is only needed for Clang, correct? No, GCC outputs .file 0 if it during configure time detected assembler that supports it and doesn't have any of the known bugs related to it. But that means kernel doesn't need to care because GCC already took care of that. Jakub
Re: [PATCH v6 2/2] Kbuild: implement support for DWARF v5
On Fri, Jan 29, 2021 at 02:05:59PM -0800, Nick Desaulniers wrote: > Ah, I see. Then I should update the script I add > (scripts/test_dwarf5_support.sh) to feature detect that bug, since > it's the latest of the bunch. Also, should update my comment to note > that this requires binutils greater than 2.35.1 (which is what I have, > which fails, since the backport landed in ... what?!) How was this > backported to 2.35 > (https://sourceware.org/bugzilla/show_bug.cgi?id=27195#c12, Jan 26 > 2021) when binutils-2_35_1 was tagged sept 19 2020? Or will there be > a binutils 2.35.2 point release? AFAIK yes, soon. Jakub
Re: [PATCH v6 1/2] Kbuild: make DWARF version a choice
On Fri, Jan 29, 2021 at 04:32:32PM -0500, Arvind Sankar wrote: > Given what Jakub is saying, i.e. it was previously impossible to get > dwarf2 with gcc, and you get dwarf4 whether or not DEBUG_INFO_DWARF4 was It isn't impossible to get it, -gdwarf-2 works, it is just not a very good choice (at least unless one knows some debug info consumer is not DWARF3 or later ready). Though, even gcc -gdwarf-2 will use many extensions from DWARF3 and later, as long as there is no way to describe stuff in DWARF2. -gstrict-dwarf option requests that no DWARF extensions are used. Jakub
Re: [PATCH v6 2/2] Kbuild: implement support for DWARF v5
On Fri, Jan 29, 2021 at 01:05:56PM -0800, Nick Desaulniers wrote: > > Wasn't that fixed in GAS? > > https://sourceware.org/bugzilla/show_bug.cgi?id=27195 > > $ make LLVM=1 -j72 defconfig > $ ./scripts/config -e DEBUG_INFO -e DEBUG_INFO_DWARF5 > $ make LLVM=1 -j72 > ... > /tmp/init-d50d89.s: Assembler messages: > /tmp/init-d50d89.s:10: Error: file number less than one > /tmp/init-d50d89.s:11: Error: junk at end of line, first unrecognized > character is `m' > > which is https://sourceware.org/bugzilla/show_bug.cgi?id=25611. > > $ as --version | head -n1 > GNU assembler (GNU Binutils for Debian) 2.35.1 > > Maybe GAS should not need to be told -gdwarf-5 to parse these? Then > we would not need to pass -Wa,-gdwarf-5 via clang with > -no-integrated-as. That is what sw#27195 is about, just try current binutils 2.35, 2.36 or trunk branches. Jakub
Re: [PATCH v6 2/2] Kbuild: implement support for DWARF v5
On Fri, Jan 29, 2021 at 12:48:11PM -0800, Nick Desaulniers wrote: > > Should this be...? > > > > KBUILD_AFLAGS += -Wa,-gdwarf-5 > > No; under the set of conditions Clang is compiling .c to .S with DWARF > v5 assembler directives. GAS will choke unless told -gdwarf-5 via > -Wa,-gdwarf-5 for .c source files, hence it is a C flag, not an A Wasn't that fixed in GAS? https://sourceware.org/bugzilla/show_bug.cgi?id=27195 Jakub
Re: [PATCH v6 1/2] Kbuild: make DWARF version a choice
On Fri, Jan 29, 2021 at 11:43:17AM -0800, Nick Desaulniers wrote: > Modifies CONFIG_DEBUG_INFO_DWARF4 to be a member of a choice. Adds an > explicit CONFIG_DEBUG_INFO_DWARF2, which is the default. Does so in a > way that's forward compatible with existing configs, and makes adding > future versions more straightforward. > > Suggested-by: Arvind Sankar > Suggested-by: Fangrui Song > Suggested-by: Nathan Chancellor > Suggested-by: Masahiro Yamada > Signed-off-by: Nick Desaulniers > --- > Makefile | 6 +++--- > lib/Kconfig.debug | 21 - > 2 files changed, 19 insertions(+), 8 deletions(-) > > diff --git a/Makefile b/Makefile > index 95ab9856f357..20141cd9319e 100644 > --- a/Makefile > +++ b/Makefile > @@ -830,9 +830,9 @@ ifneq ($(LLVM_IAS),1) > KBUILD_AFLAGS+= -Wa,-gdwarf-2 > endif > > -ifdef CONFIG_DEBUG_INFO_DWARF4 > -DEBUG_CFLAGS += -gdwarf-4 > -endif > +dwarf-version-$(CONFIG_DEBUG_INFO_DWARF2) := 2 > +dwarf-version-$(CONFIG_DEBUG_INFO_DWARF4) := 4 > +DEBUG_CFLAGS += -gdwarf-$(dwarf-version-y) Why do you make DWARF2 the default? That seems a big step back from what the Makefile used to do before, where it defaulted to whatever DWARF version the compiler defaulted to? E.g. GCC 4.8 up to 10 defaults to -gdwarf-4 and GCC 11 will default to -gdwarf-5. DWARF2 is more than 27 years old standard, DWARF3 15 years old, DWARF4 over 10 years old and DWARF5 almost 4 years old... It is true that some tools aren't DWARF5 ready at this point, but with GCC defaulting to that it will change quickly, but at least DWARF4 support has been around for years. Jakub
Re: [PATCH] ilog2: Improve ilog2 for constant arguments
On Sat, Nov 21, 2020 at 09:23:10PM +0100, Luc Van Oostenryck wrote: > On Fri, Nov 20, 2020 at 01:51:54PM +0100, Peter Zijlstra wrote: > > > > Other option would be to change the const_ilog2 macro, though as the > > description says it is meant to be used also in C constant expressions, > > and while GCC will fold it to constant with constant argument even in > > those, perhaps it is better to avoid using extensions in that case. > > Just for info, the description is outdated and Sparse is just fine with > __builtin_clzll() and friends in constant expressions (since Feb 2017) Why is the description outdated? It is still an extension that not every compiler might fold in constant expressions. And, the large expressions aren't really a problem in constant expressions, they will be folded there to constant or error. The problem the patch was trying to solve is that the large expressions are a problem at least for GCC in runtime code when guarded by __builtin_constant_p, because __builtin_constant_p is folded quite late (intentionally so, so that more constants can be propagated into it, e.g. after inlining etc.), and the large expressions might confuse inliner heuristics. Jakub
[tip: core/core] ilog2 vs. GCC inlining heuristics
The following commit has been merged into the core/core branch of tip: Commit-ID: ecbd43f6728a5cf79c8b50ed326658e9181531b1 Gitweb: https://git.kernel.org/tip/ecbd43f6728a5cf79c8b50ed326658e9181531b1 Author:Jakub Jelinek AuthorDate:Wed, 21 Oct 2020 15:27:18 +02:00 Committer: Peter Zijlstra CommitterDate: Thu, 19 Nov 2020 11:26:18 +01:00 ilog2 vs. GCC inlining heuristics Hi! Based on the GCC PR97445 discussions, I'd like to propose following change, which should significantly decrease the amount of code in inline functions that use ilog2, but as I'm already two decades out of the Linux kernel development, I'd appreciate if some kernel developer could try that (all I have done is check that it gives the same results as before) and if it works submit it for inclusion into the kernel? Thanks. Improve ilog2 for constant arguments As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445 the const_ilog2 macro generates a lot of code which interferes badly with GCC inlining heuristics, until it can be proven that the ilog2 argument can or can't be simplified into a constant. It can be expressed using __builtin_clzll builtin which is supported by GCC 3.4 and later and when used only in the __builtin_constant_p guarded code it ought to always fold back to a constant. Other compilers support the same builtin for many years too. Other option would be to change the const_ilog2 macro, though as the description says it is meant to be used also in C constant expressions, and while GCC will fold it to constant with constant argument even in those, perhaps it is better to avoid using extensions in that case. Signed-off-by: Jakub Jelinek Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20201021132718.GB2176@tucnak --- include/linux/log2.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/log2.h b/include/linux/log2.h index c619ec6..4307d34 100644 --- a/include/linux/log2.h +++ b/include/linux/log2.h @@ -156,7 +156,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n) #define ilog2(n) \ ( \ __builtin_constant_p(n) ? \ - const_ilog2(n) :\ + ((n) < 2 ? 0 : \ +63 - __builtin_clzll (n)) :\ (sizeof(n) <= 4) ? \ __ilog2_u32(n) :\ __ilog2_u64(n) \
Re: [PATCH] Kbuild: implement support for DWARF5
On Tue, Nov 03, 2020 at 02:21:22PM -0800, Nick Desaulniers wrote: > > > This script fails for GCC 10. > > > > One thing is GCC DWARF-5 support, that is whether the compiler > > will support -gdwarf-5 flag, and that support should be there from > > GCC 7 onwards. > > I should improve my Kconfig check; I don't actually have a test for > -gdwarf-5 for the compiler. In godbolt, it looks like -gdwarf-5 > produces an error from GCC up until GCC 5.1. Does (5.1 < GCC < 7) not > produce DWARF5? No. After all, those versions also predate DWARF5. All 5.1 - 6.x did was start accepting -gdwarf-5 as experimental option that enabled some small DWARF subset (initially only a few DW_LANG_* codes newly added to DWARF5 drafts). Only GCC 7 (released after DWARF 5 has been finalized) started emitting DWARF5 section headers and got most of the DWARF5 changes in, e.g. including switching over most of the now standardized GNU extensions from their DW_*_GNU_* codes to DWARF5 DW_*). With GCC 5/6, you get: echo 'int i;' | gcc -c -o /tmp/test.o -xc - -gdwarf-5; readelf -wi /tmp/test.o | grep Version: Version: 4 while with 7+ Version: 5 instead. > Maybe there's a more specific test you had in mind? Guess what you want to test is what version you actually get in .debug_info if you compile with -gdwarf-5. > > Another separate thing is whether the assembler does support > > the -gdwarf-5 option (i.e. if you can compile assembler files > > with -Wa,-gdwarf-5) for GNU as I think that is binutils 35.1, > > i.e. very new); but only if you want to pass the -Wa,-gdwarf-5 > > only when compiling *.s and *.S files. That option is about whether > > the assembler will emit DWARF5 or DWARF2 .debug_line. > > It is fine to compile C sources with -gdwarf-5 and use DWARF2 > > .debug_line for assembler files if as doesn't support it. > > > > Yet another thing is if you can pass -Wa,-gdwarf-5 even when > > compiling C files. There are several bugs in that category that have been > > fixed only in the last few days on binutils trunk, I'd suggest > > just not to bother, GCC 11 will have proper test for fixed assembler > > and will pass -gdwarf-5 to as when compiling even C sources with -gdwarf-5. > > Do you have links? I would prefer to do feature detection rather than The https://gcc.gnu.org/r11-3693 https://gcc.gnu.org/r11-4338 commits contain those tests in gcc/configure.ac Jakub
Re: [PATCH] Kbuild: implement support for DWARF5
On Mon, Nov 02, 2020 at 11:20:41AM +0900, Masahiro Yamada wrote: > > --- /dev/null > > +++ b/scripts/test_dwarf5_support.sh > > @@ -0,0 +1,4 @@ > > +#!/bin/sh > > +# SPDX-License-Identifier: GPL-2.0 > > +set -eu > > +echo ".file 0 \"asdf\"" | $* -Wa,-gdwarf-5 -c -x assembler -o /dev/null - > > > > Please tell me how this script detects the dwarf-5 capability. > > > This script fails for GCC 10. One thing is GCC DWARF-5 support, that is whether the compiler will support -gdwarf-5 flag, and that support should be there from GCC 7 onwards. Another separate thing is whether the assembler does support the -gdwarf-5 option (i.e. if you can compile assembler files with -Wa,-gdwarf-5) for GNU as I think that is binutils 35.1, i.e. very new); but only if you want to pass the -Wa,-gdwarf-5 only when compiling *.s and *.S files. That option is about whether the assembler will emit DWARF5 or DWARF2 .debug_line. It is fine to compile C sources with -gdwarf-5 and use DWARF2 .debug_line for assembler files if as doesn't support it. Yet another thing is if you can pass -Wa,-gdwarf-5 even when compiling C files. There are several bugs in that category that have been fixed only in the last few days on binutils trunk, I'd suggest just not to bother, GCC 11 will have proper test for fixed assembler and will pass -gdwarf-5 to as when compiling even C sources with -gdwarf-5. The reason is to get DWARF5 .debug_line (.debug_line is usually produced by the assembler, not compiler, from .file/.loc directives). Jakub
Re: GCC section alignment, and GCC-4.9 being a weird one
On Wed, Oct 21, 2020 at 10:00:31AM +0200, Peter Zijlstra wrote: > On Tue, Oct 20, 2020 at 04:39:38PM -0700, Florian Fainelli wrote: > > This patch causes all files under kernel/sched/* that include sched.h to > > be rebuilt whenever the value of CONFIG_BLK_DEV_INITRD. There are at > > least two build systems (buildroot and OpenWrt) that toggle this > > configuration value in order to produce a kernel image without an > > initramfs, and one with. > > > > On ARM we get all of these to be needlessly rebuilt: > > Is it really ARM specific? AFAICT this should happen on everything. > > > Short of moving the STRUCT_ALIGNMENT to a separate header that would not > > be subject to any configuration key change, can you think of a good way > > to avoid these rebuilds, including for architectures like ARM that ship > > their own vmlinux.lds.h? I would not say this is a bug, but it is > > definitively an inconvenience. > > Well, no :/ I barely made it work in the first place. This linker cruft > is not my forte. GCC-4.9 being 'special' here is just weird in any case. > > We can ask our friends on linux-toolchains; maybe they'll have a clue. > > Guys, the problem is the below commit which, for dubious raisins makes > kernel/sched/sched.h depend on asm-generic/vmlinux.lds.h and triggers > rebuilds whenever a CONFIG mentioned in asm-generic/vmlinux.lds.h > changes. > > Is there an explanation for why GCC-4.9 is weird and is there a better > way to find the appropriate value? I have only looked at x86. If you are talking about: struct sched_class { #define A(n) void (*n) (void); #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) B(fn1) B(fn2) B(fn3) B(fn4) A(fn50) A(fn51) A(fn52) A(fn53) } __attribute__((aligned(32))); struct sched_class a; having the the variable aligned to 64 bytes rather than 32 bytes, it started with https://gcc.gnu.org/r0-127405-gd3c2fee09607e7d70cc7e69822638fab2bda6c7b and disappeared with https://gcc.gnu.org/r5-5887-g239711f6afe3070a11c4f1d9266588e8db1217ee As the aligned attribute is just on the type and not on the actual variables, I don't consider it a bug, GCC only guarantees it when the variable itself has user specified alignment. Otherwise the compiler can choose to align variables more if they are large for performance reasons (to help vectorization e.g. when the variable is copied around). Plus the ABI sometimes requires bigger alignment too (e.g. on x86_64 the psABI says that array variables larger than 15 bytes must have alignment at least 16 bytes). So, if you tweak the above testcase to have either struct sched_class a __attribute__((aligned(32))); or even remove it also from the sched_class struct, you should get the behavior you want even from GCC 4.9 and other GCC versions. And, if e.g. 32-byte alignment is not what you really need, but e.g. natural alignment of the struct sched_class would be sufficient, you could use __attribute__((aligned (alignof (struct sched_class on the variables instead (perhaps wrapped in some macro). BTW, the 32 vs. 64 vs. whatever else byte alignment is heavily architecture and GCC version dependent, it is not that on all arches larger structures will be always 32 byte aligned no matter what. > Fixes: c3a340f7e7ea ("sched: Have sched_class_highest define by > vmlinux.lds.h") > Reported-by: kernel test robot > Signed-off-by: Peter Zijlstra (Intel) > Link: > https://lkml.kernel.org/r/20200630144905.gx4...@hirez.programming.kicks-ass.net > > diff --git a/include/asm-generic/vmlinux.lds.h > b/include/asm-generic/vmlinux.lds.h > index 66fb84c3dc7e..3ceb4b7279ec 100644 > --- a/include/asm-generic/vmlinux.lds.h > +++ b/include/asm-generic/vmlinux.lds.h > @@ -108,6 +108,17 @@ > #define SBSS_MAIN .sbss > #endif > > +/* > + * GCC 4.5 and later have a 32 bytes section alignment for structures. > + * Except GCC 4.9, that feels the need to align on 64 bytes. > + */ > +#if __GNUC__ == 4 && __GNUC_MINOR__ == 9 > +#define STRUCT_ALIGNMENT 64 > +#else > +#define STRUCT_ALIGNMENT 32 > +#endif > +#define STRUCT_ALIGN() . = ALIGN(STRUCT_ALIGNMENT) > + > /* > * The order of the sched class addresses are important, as they are > * used to determine the order of the priority of each sched class in > @@ -123,13 +134,6 @@ > *(__stop_sched_class) \ > __end_sched_classes = .; > > -/* > - * Align to a 32 byte boundary equal to the > - * alignment gcc 4.5 uses for a struct > - */ > -#define STRUCT_ALIGNMENT 32 > -#define STRUCT_ALIGN() . = ALIGN(STRUCT_ALIGNMENT) > - > /* The actual configuration determine if the init/exit sections > * are handled as text/data or they can be discarded (which > * often happens at runtime) > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > index 5aa6661ecaf1..9bef2dd01247 100644 > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -67,6 +67,7 @@ > #include > > #include > +#include > > #ifdef CONFIG_PARAVIRT > # include > @@ -1810,
Re: [PATCH] [RFT] crypto: aes-generic - turn off -ftree-pre and -ftree-sra
On Wed, Dec 20, 2017 at 09:52:05PM +0100, Arnd Bergmann wrote: > diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c > index ca554d57d01e..35f973ba9878 100644 > --- a/crypto/aes_generic.c > +++ b/crypto/aes_generic.c > @@ -1331,6 +1331,20 @@ EXPORT_SYMBOL_GPL(crypto_aes_set_key); > f_rl(bo, bi, 3, k); \ > } while (0) > > +#if __GNUC__ >= 7 > +/* > + * Newer compilers try to optimize integer arithmetic more aggressively, > + * which generally improves code quality a lot, but in this specific case > + * ends up hurting more than it helps, in some configurations drastically > + * so. This turns off two optimization steps that have been shown to > + * lead to rather badly optimized code with gcc-7. > + * > + * See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 > + */ > +#pragma GCC optimize("-fno-tree-pre") > +#pragma GCC optimize("-fno-tree-sra") So do it only when UBSAN is enabled? GCC doesn't have a particular predefined macro for those (only for asan and tsan), but either the kernel does have something already, or could have something added in the corresponding Makefile. Jakub
Re: [PATCH] ARC: uaccess: get_user to zero out dest in cause of fault
On Sat, Aug 20, 2016 at 05:45:00PM -0700, Linus Torvalds wrote: > You have to save the stack pointer at the setjmp point too. And there > might be other architecture-specific ABI rules for that. But you're > right, it might be worth it. > > I *would* be a bit worried about code generation issues. > setjmp/longjmp is so seldom used that it's one of those things where > it might be best to verify with some gcc person that it doesn't cause > huge code-gen problems. > > Adding Jakub just to check: Jakub, would a setjump/longjump kind of > interface for exception handling going to cause us problems > (performance or correctness) with gcc? If you plan to use setjmp/longjmp a lot, then it is certainly a major performance and compile time/memory problem. Older versions don't model it properly, and newer gccs emit abnormal edges from every longjmp or call that might longjmp to an artificial basic block and from there to every setjmp. Also note that gcc has/supports two setjmp kind of APIs, normal setjmp and slightly more lightweight __builtin_setjmp which saves fewer registers, and on some targets is/used to be used for EH instead of DWARF based ones. Jakub
Re: Q: why didn't GCC warn about this uninitialized variable? (was: Re: [PATCH] perf tests: initialize sa.sa_flags)
On Thu, Mar 03, 2016 at 02:47:16PM +0100, Ingo Molnar wrote: > I tried to distill a testcase out of it, and the following silly hack seems > to > trigger it: ... This is a known issue, which we don't have a solution for yet. The thing is, GCC has 2 uninitialized warning passes, one is done very early, on fairly unoptimized code, which warns for -O and above only about must be uninitialized cases in code that is executed unconditionally (if the containing function is executed, and doesn't have PHI handling code), and then a very late uninitialized pass, that warns also about maybe-uninitialized cases, has predicate aware handling in it, etc.; but this warns only about the cases where the uninitialized uses survived through the optimizations until that phase. In the testcase, the conditional uninitialized uses got optimized away, passes seeing that you can get alt_idx initialized say to 2 from one branch and uninitialized from another one just optimize it into 2. Warning right away at that spot when the optimization pass performs this might not be the right thing, as it could warn for stuff in dead code, or couldn't be backed up by the predicate aware uninit analysis which is costly and couldn't be done in every pass that just happens to optimize away some uninitialized stuff. Not to mention that it doesn't have to be always even so obvious to the optimizing pass. Say, when computing value ranges, the uninitialized uses should be ignored, because they can't be used in valid paths, so if say you have value range [2, 34] from one branch and uninitialized use from another branch, the resulting value range will be [2, 34]. Then later on, you just optimize based on this value range and perhaps the uninitialized use will go away because of that. We could handle the uninitialized uses pessimistically, by not optimizing PHI into just initialized_2, etc., by considering uninitialized uses as VARYING ([min, max] range) rather than something that doesn't happen, etc., and then the late uninitialized pass would warn here. But then we'd trade the warning for less optimized code. GCC is primarily an optimizing compiler, rather than static analyzer, so that is why GCC chooses to do what it does. Do you want us introduce -Ow mode, which will prefer warnings over generated code quality? BTW, as for false positives and new warnings, my experience is that in the kernel generally such warnings are just disabled, even if they helped discover severe errors in other packages. Jakub
Re: Q: why didn't GCC warn about this uninitialized variable? (was: Re: [PATCH] perf tests: initialize sa.sa_flags)
On Thu, Mar 03, 2016 at 02:24:34PM +0100, Ingo Molnar wrote: > 6 hours of PeterZ time translates to quite a bit of code restructuring > overhead to > eliminate false positive warnings... I'll file a bugzilla enhancement request for this (with new attribute), perhaps we could do it in FRE that is able to see through memory stores/loads even in addressable structures in some cases. Though, certainly GCC 7 material. And, in this particular case it couldn't do anything anyway, because the sigfillset call is not inlined, and takes address of a field in the structure. The compiler can't know if it doesn't cast it back to struct sigaction and initialize the other fields. BTW, valgrind should be able to detect this. Jakub
Re: Q: why didn't GCC warn about this uninitialized variable? (was: Re: [PATCH] perf tests: initialize sa.sa_flags)
On Thu, Mar 03, 2016 at 01:19:44PM +0100, Ingo Molnar wrote: > struct sigaction sa; > > ... > > sigfillset(&sa.sa_mask); > sa.sa_sigaction = segfault_handler; > sigaction(SIGSEGV, &sa, NULL); > > ... which uninitialized sa.sa_flags field GCC merrily accepted as proper C > code, > despite us turning on essentially _all_ GCC warnings for the perf build that > exist > under the sun: GCC -W{,maybe-}uninitialized warning works only on SSA form, so in order to get that warning, either it needs to be some scalar that is uninitialized, or Scalar Replacement of Aggregates needs to be able to turn the structure into independent scalars. Neither is the case here, as you take address of the struct when passing its address to sigaction, and that call can't be inlined nor is in any other way visible to the compiler, so that it could optimize both the caller and sigaction itself at the same time. Even if GCC added infrastructure for tracking which bits/bytes in aggregates are or might be uninitialized at which point, generally, struct XYZ abc; foo (&abc); is so common pattern that warning here that the fields are uninitialized would have extremely high false positive ratio. Even if as somebody mentioned that the argument is const struct sigaction * rather than struct sigaction *, that doesn't change really anything, you can cast away the constness and still write into it in the other function. Furthermore, in many APIs, only a subset of fields need to be initialized unconditionally, and other fields might need to be initialized only conditionally, depending on those always initialized fields, other parameters to functions, etc. So, in order to warn here, we'd need some assurance (new attribute on sigaction function?) that when it is called, it has to have all named fields in the pointed to structures initialized (perhaps attribute like nonnull, which can either apply to all pointer arguments, or selected ones) at the entry of the function and not initializing them all is a bug. So, this really isn't as trivial as you might think it is. Jakub
Re: gcc feature request / RFC: extra clobbered regs
On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote: > Actually it raise a question for me. If we describe that a function > clobbers more than calling convention and then use it as a value (assigning > a variable or passing as an argument) and loosing a track of it and than > call it. How can RA know what the call clobbers actually. So for the > function with the attributes we should prohibit use it as a value or make > the attributes as a part of the function type, or at least say it is unsafe. > So now I see this as a *bigger problem* with this extension. Although I > guess it already exists as we have description of different ABI as an > extension. Unfortunately target attribute is function decl attribute rather than function type. And having more attributes affect switchable targets will be non-fun. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: gcc feature request / RFC: extra clobbered regs
On Wed, Jul 01, 2015 at 11:23:17AM -0400, Vladimir Makarov wrote: > >>(I'm not necessarily suggesting that we do this for the syscall bodies > >>themselves. I want to do it for the entry and exit helpers, so we'd > >>still lose the five cycles in the full fast-path case, but we'd do > >>better in the slower paths, and the slower paths are becoming > >>increasingly important in real workloads.) > >GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG > >options, which allow to tweak the calling conventions; but it is per > >translation unit right now. It isn't clear which of these options > >you mean with the extra_clobber. > >I assume you are looking for a possibility to change this to be > >per-function, with caller with a different calling convention having to > >adjust for different ABI callee. To some extent, recent GCC versions > >do that automatically with -fipa-ra already - if some call used registers > >are not clobbered by some call and the caller can analyze that callee, > >it can stick values in such registers across the call. > >I'd say the most natural API for this would be to allow > >f{fixed,call-{used,saved}}-REG in target attribute. > > > > > One consequence of frequent changing calling convention per function or > register usage could be GCC slowdown. RA calculates too many data and it > requires a lot of time to recalculate them after something in the register > usage convention is changed. That is true. i?86/x86_64 is a switchable target, so at least for the case of info computed for the callee with non-standard calling convention such info can be computed just once when the function with such a target attribute would be seen first. But for the caller side, I agree not everything can be precomputed, if we can't use e.g. regsets saved in the callee; as a single function can call different functions with different ABIs. But to some extent we have that already with -fipa-ra, don't we? Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: gcc feature request / RFC: extra clobbered regs
On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote: > I'm working on a massive set of cleanups to Linux's syscall handling. > We currently have a nasty optimization in which we don't save rbx, > rbp, r12, r13, r14, and r15 on x86_64 before calling C functions. > This works, but it makes the code a huge mess. I'd rather save all > regs in asm and then call C code. > > Unfortunately, this will add five cycles (on SNB) to one of the > hottest paths in the kernel. To counteract it, I have a gcc feature > request that might not be all that crazy. When writing C functions > intended to be called from asm, what if we could do: > > __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14", > "r15"))) void func(void); > > This will save enough pushes and pops that it could easily give us our > five cycles back and then some. It's also easy to be compatible with > old GCC versions -- we could just omit the attribute, since preserving > a register is always safe. > > Thoughts? Is this totally crazy? Is it easy to implement? > > (I'm not necessarily suggesting that we do this for the syscall bodies > themselves. I want to do it for the entry and exit helpers, so we'd > still lose the five cycles in the full fast-path case, but we'd do > better in the slower paths, and the slower paths are becoming > increasingly important in real workloads.) GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG options, which allow to tweak the calling conventions; but it is per translation unit right now. It isn't clear which of these options you mean with the extra_clobber. I assume you are looking for a possibility to change this to be per-function, with caller with a different calling convention having to adjust for different ABI callee. To some extent, recent GCC versions do that automatically with -fipa-ra already - if some call used registers are not clobbered by some call and the caller can analyze that callee, it can stick values in such registers across the call. I'd say the most natural API for this would be to allow f{fixed,call-{used,saved}}-REG in target attribute. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: Optimize variable_test_bit()
On Fri, May 01, 2015 at 09:03:32AM -0700, Linus Torvalds wrote: > > PPS. Jakub, I see gcc5.1 still hasn't got output operands for asm goto; > > is this something we can get 'fixed' ? CCing Richard as author of asm goto and Vlad as register allocator maintainer. There are a few enhancement requests to support this, like http://gcc.gnu.org/PR59615 and http://gcc.gnu.org/PR52381 , but indeed the reason why no outputs are allowed is the register allocation issue. Don't know if LRA would be better suited to handle that case, but it would indeed be pretty hard. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: bit fields && data tearing
On Thu, Sep 04, 2014 at 08:24:12AM -0400, Peter Hurley wrote: > And I just confirmed with the Alpha cross-compiler that the fields are > not 'padded out' if volatile either. They can't be, struct layout is part of the ABI. Guess you can introduce say atomic_bool and similar typedefs which would be bool or char on most arches and on alpha int. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: bit fields && data tearing
On Thu, Sep 04, 2014 at 10:57:40AM +0200, Mikael Pettersson wrote: > Benjamin Herrenschmidt writes: > > On Wed, 2014-09-03 at 18:51 -0400, Peter Hurley wrote: > > > > > Apologies for hijacking this thread but I need to extend this discussion > > > somewhat regarding what a compiler might do with adjacent fields in a > structure. > > > > > > The tty subsystem defines a large aggregate structure, struct tty_struct. > > > Importantly, several different locks apply to different fields within > that > > > structure; ie., a specific spinlock will be claimed before updating or > accessing > > > certain fields while a different spinlock will be claimed before > updating or > > > accessing certain _adjacent_ fields. > > > > > > What is necessary and sufficient to prevent accidental false-sharing? > > > The patch below was flagged as insufficient on ia64, and possibly ARM. > > > > We expect native aligned scalar types to be accessed atomically (the > > read/modify/write of a larger quantity that gcc does on some bitfield > > cases has been flagged as a gcc bug, but shouldn't happen on normal > > scalar types). > > > > I am not 100% certain of "bool" here, I assume it's treated as a normal > > scalar and thus atomic but if unsure, you can always use int. > > Please use an aligned int or long. Some machines cannot do atomic > accesses on sub-int/long quantities, so 'bool' may cause unexpected > rmw cycles on adjacent fields. Yeah, at least pre-EV56 Alpha performs rmw cycles on char/short accesses and thus those are not atomic. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] gcc version 5: add basic definition header for latest gcc version
On Fri, Aug 15, 2014 at 03:23:01PM -0400, Paul Gortmaker wrote: > --- /dev/null > +++ b/include/linux/compiler-gcc5.h > @@ -0,0 +1,52 @@ > +#ifndef __LINUX_COMPILER_H > +#error "Please don't include directly, include > instead." > +#endif > + > +#define __used __attribute__((__used__)) > +#define __must_check __attribute__((warn_unused_result)) > +#define __compiler_offsetof(a, b) __builtin_offsetof(a, b) > + > +/* Mark functions as cold. gcc will assume any path leading to a call > + to them will be unlikely. This means a lot of manual unlikely()s > + are unnecessary now for any paths leading to the usual suspects > + like BUG(), printk(), panic() etc. [but let's keep them for now for > + older compilers] > + > + Early snapshots of gcc 4.3 don't support this and we can't detect this > + in the preprocessor, but we can live with this because they're unreleased. > + Maketime probing would be overkill here. The above 4 lines are probably unnecessary. More importantly, is it a good idea to store the gcc major number in the header? gcc 5.x will be released next year, but the year after that we'll have gcc 6.x, so you'd need to copy this header each year. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Tue, Aug 05, 2014 at 03:36:39PM -0700, Linus Torvalds wrote: > On Tue, Aug 5, 2014 at 2:07 PM, Frank Ch. Eigler wrote: > > > > Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes > > and to extract variables at those probes, much as systemtap does. > > Without var-tracking, probes placed at most interior points of > > functions will make variables inaccessible. > > .. and as mentioned, -O2 already does that for many things, even > *with* tracking. Sure, debug info coverage for highly optimized code is never going to be perfect, but there is a difference if you have 75% of vars or just 33% of vars (see the numbers I've posted, I've picked ext4.o just randomly because it was one of the largest modules, can post more numbers if needed). BTW, because var-tracking is not performed at -O0, sometimes debug info is actually worse in -O0 than at higher optimization levels, because variable locations are bogus in prologues, epilogues and for variables with register keyword anywhere. There have been several man-years of work to get from the 25% var coverage to 67%, several DWARF extensions (most of them to be available in DWARF5 or work in progress on that) and with -fno-var-tracking-assignments that is just returned to the old state. > In other words, anybody who relies on it has already learnt to work > around it. Or, more likely, there just isn't anybody who relies on it. > > I don't understand how you guys can be so cavalier about a compiler > bug that has already resulted in actual real problems. You bring up I have no problem with a -fno-var-tracking-assignments workaround for compilers that have the PR61801 wrong-code bug. What I have problem with is with disabling it even for compilers that have that bug fixed. That is in essence disabling a useful feature just because it could have other bugs. If my memory serves me well, PR61801 is the only wrong-code I remember caused by -fvar-tracking-assignments during the 5 years since it has been introduced into gcc. Sure, there have been several -fcompare-debug bugs, where we generated slightly different code between -g and -g0, and as you mentioned we have one still pending (Vladimir is working on it right now), but that is mainly relevant to the case where you'd ship -g0 built binaries (== kernel) and then only if bugs appear wanted someone else to build kernel with -g and get identical binary, so that you could debug it. I believe if people build kernel with -g, then they usually build it with -g from the beginning, and either save the kernel with debug info somewhere, strip it to file or handle it similarly. If there is a fear there could be other wrong-code bugs with -fno-var-tracking-assignments, from the past experience that would be ~ another 5 years to discover it. Compare that to the frequency of -O2 wrong code issues, with that you'd need to disable -O2 because of the fear of unknown compiler bugs first. And, we had various wrong-code bugs even at -O0, so even that wouldn't help. Compiler bugs are just that, bugs that need to be reported, fixed, fixed compiler distributed to users, it is the same thing with kernel bugs, libc bugs etc. > theoretical cases that nobody has actually reported, and are > apparently ignoring the fact that the compiler generates INCORRECT > CODE. So on one hand we have known breakage, on the other we have It actually isn't theoretical, actually various -fvar-tracking-assignments changes have been done because people complained about important variables in the kernel being optimized away, the whole DW_OP_GNU_entry_value DWARF extension (DW_OP_entry_value in DWARF5 when it is released) was added because of bugreports from people trying to debug the kernel. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Tue, Aug 05, 2014 at 01:46:49PM +0200, Markus Trippelsdorf wrote: > On 2014.08.05 at 07:31 -0400, Josh Boyer wrote: > > Sorry to bring this back up after the fact, but it's important for a > > number of things in various distros. I don't disagree it should be > > disabled by default, but making it unconditional is going to force the > > distributions that care about perf, systemtap, and debuggers to > > manually revert this. That deviation is concerning because the > > upstream kernel won't easily be buildable the same way distros build > > it. > > > > I'm happy to come up with a config option patch, but I'm not sure if > > it would be accepted. Is that a possibility at this point? > > Please note that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61923 But that isn't a wrong-code bug, in some cases it is no code generation difference between -g and -g0 at all (just :TI on the dumps being different), in some cases it can affect scheduling decisions for real, but still not in a way that disregards the actual dependencies in between instructions. You can get such scheduling differences also if you just use slightly different compiler revision. > isn't fixed yet. So it would be premature to manually revert Linus' > patch yet. > > When PR61923 gets fixed (and backported) its testcase could be combined > with the testcase Jakub posted earlier in this thread. Just some data to show that the debug info differences are nothing close to insignificant. I've built a 3.16 kernel (from Fedora 21 rpm) both normally (which includes the -fno-var-tracking-assignments change) and with that line in Makefile commented out. E.g. on fs/ext4/ext4.o, dwlocstat says for -fno-var-tracking-assignments: dwlocstat --tabulate=0.0:10,99,100 fs/ext4/ext4.o cov%samples cumul 0.0 19620/75% 19620/75% 0..10 518/1% 20138/77% 11..20 317/1% 20455/78% 21..30 362/1% 20817/79% 31..40 349/1% 21166/81% 41..50 240/0% 21406/82% 51..60 297/1% 21703/83% 61..70 293/1% 21996/84% 71..80 355/1% 22351/85% 81..90 402/1% 22753/87% 91..99 901/3% 23654/90% 100 2425/9% 26079/100% and without it: dwlocstat --tabulate=0.0:10,99,100 fs/ext4/ext4.o cov%samples cumul 0.0 8697/33%8697/33% 0..10 431/1% 9128/35% 11..20 371/1% 9499/36% 21..30 608/2% 10107/38% 31..40 525/2% 10632/40% 41..50 461/1% 11093/42% 51..60 881/3% 11974/45% 61..70 678/2% 12652/48% 71..80 894/3% 13546/51% 81..90 1036/3% 14582/55% 91..99 949/3% 15531/59% 100 10548/40% 26079/100% So, with -fno-var-tracking-assignments, about 75% of parameters/variables have no location info at all in ext4.o, 9% of parameters/variables have coverage for all instructions where the parameter/variable is in scope, the rest something in between. Without -fno-var-tracking-assignments, only 33% of params/vars have no location info at all, and 40% have coverage on all instructions where they are in scope. Even for the params/vars without 100% coverage there can be seen improvements, e.g. in the 51%-90% coverage. Or e.g. -fno-var-tracking-assignments: dwlocstat --tabulate=0.0:10,99,100 lib/built-in.o cov%samples cumul 0.0 6323/64%6323/64% 0..10 144/1% 6467/66% 11..20 138/1% 6605/67% 21..30 123/1% 6728/68% 31..40 144/1% 6872/70% 41..50 169/1% 7041/72% 51..60 130/1% 7171/73% 61..70 146/1% 7317/74% 71..80 198/2% 7515/76% 81..90 258/2% 7773/79% 91..99 541/5% 8314/85% 100 1448/14%9762/100% without -fno-var-tracking-assignments: dwlocstat --tabulate=0.0:10,99,100 lib/built-in.o cov%samples cumul 0.0 2954/30%2954/30% 0..10 131/1% 3085/31% 11..20 110/1% 3195/32% 21..30 141/1% 3336/34% 31..40 212/2% 3548/36% 41..50 226/2% 3774/38% 51..60 254/2% 4028/41% 61..70 237/2% 4265/43% 71..80 325/3% 4590/47% 81..90 420/4% 5010/51% 91..99 425/4% 5435/55% 100 4328/44%9763/100% It is visible also e.g. in the section sizes on vmlinux: -fno-var-tracking-assignments: readelf -WS vmlinux 2>&1 | awk '/\.debug_/{printf "%s %fMB\n", $2, strtonum("0x"$6)/1024.0/1024.0}' .debug_aranges 0.115387MB .debug_info 96.625183MB .debug_abbrev 2.585021MB .debug_line 5.936069MB .debug_frame 1.710655MB .debug_str 1.956143MB .debug_loc 8.713246MB .debug_ranges 3.020081MB without -fno-var-tracking-assignments: readelf -WS vmlinux 2>&1 | awk '/\.debug_/{printf "%s %fMB\n", $2, strtonum("0x"$6)/1024.0/1024.0}' .debug_aranges 0.115387MB .debug_info 99.564449MB .debug_abbrev 2.665213MB .debug_line 5.936069MB .debug_frame 1.710655MB .debug_str 1.955960MB .debug_loc 22.607447MB .debug_ranges 3.020264MB .debug_info growth is only very small (not even 3MB), only few vars have the same value everywhere, but .debug_loc growth is significant (2.6 times bigger). Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Wed, Jul 30, 2014 at 09:13:08AM +0200, Markus Trippelsdorf wrote: > On 2014.07.30 at 08:53 +0200, Jakub Jelinek wrote: > > On Tue, Jul 29, 2014 at 06:49:09PM -0700, Greg Kroah-Hartman wrote: > > > 3.15-stable review patch. If anyone has any objections, please let me > > > know. > > > > IMNSHO this is a too big hammer approach. The bug happened on a single > > file only (right?), so if anything, IMHO it could be disabled for that > > single file only, and better do it only for compilers with the bug. > > No. There are many more files affected. It just happens that Linus > analyzed the assembly of this single file (fair.c) and found a bug. > Just build your redhat distro kernel with GCC_COMPARE_DEBUG=1 and you'll > see. So unless someone analyzes the assembly output of all other > affected files by hand and finds no issues, one has to assume the worst. I'm talking about wrong-code issues. For -fcompare-debug, we indeed check it primarily during gcc bootstrap (through bootstrap-debug) and some testcases, and we'll certainly try to build some more code with -fcompare-debug and fix the issues. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Tue, Jul 29, 2014 at 06:49:09PM -0700, Greg Kroah-Hartman wrote: > 3.15-stable review patch. If anyone has any objections, please let me know. IMNSHO this is a too big hammer approach. The bug happened on a single file only (right?), so if anything, IMHO it could be disabled for that single file only, and better do it only for compilers with the bug. If there are wrong code bugs with -O2 (we've fixed many in the past), you also don't turn off -O2 everywhere, similarly for -Os or any other options. Disabling it just in case the same bug happens elsewhere when it actually took 5 years before the bug caused miscompilation of something is too defensive. We had at least 15 other wrong-code bugfixes just in between 4.9.0 and 4.9.1. -fvar-tracking-assignments doesn't make a small difference in debug info, but significant for optimized code. If you build the kernel without and with -fno-var-tracking-assignments, you can use e.g. the dwlocstat tool to see what kind of difference it makes for the kernel in particular in variable debug info coverage. > > --- a/Makefile > +++ b/Makefile > @@ -669,6 +669,8 @@ KBUILD_CFLAGS += -fomit-frame-pointer > endif > endif > > +KBUILD_CFLAGS += $(call cc-option, -fno-var-tracking-assignments) > + > ifdef CONFIG_DEBUG_INFO > KBUILD_CFLAGS+= -g > KBUILD_AFLAGS+= -Wa,--gdwarf-2 > Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Random panic in load_balance() with 3.16-rc
On Mon, Jul 28, 2014 at 08:09:02PM +0200, Markus Trippelsdorf wrote: > Here's the testcase: > > int a, b, c; > void fn1 () > { > int d; > if (fn2 () && !0) > { > b = ( >{ >int e; >fn3 (); >switch (0) >default: >asm volatile("" : "=a"(e) : "0"(a), "i"(0)); >e; >}); > d = b; > } > c = d; > } int a, c; int bar (void); void baz (void); void foo (void) { int d; if (bar ()) { int e; baz (); asm volatile ("" : "=a" (e) : "0" (a), "i" (0)); d = e; } c = d; } fails the same way and has more creduce cruft removed. Fails also with 4.7 at -O2 -fcompare-debug. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Random panic in load_balance() with 3.16-rc
On Sat, Jul 26, 2014 at 10:20:55PM +0200, Markus Trippelsdorf wrote: > On 2014.07.26 at 15:55 -0400, Theodore Ts'o wrote: > > On Sat, Jul 26, 2014 at 09:35:57PM +0200, Markus Trippelsdorf wrote: > > > > > > But fortunately the workaround for the new inode.c bug is the same as > > > for the original bug: -fno-var-tracking-assignments. > > > > > > It would make sense to enabled it unconditionally for all debug > > > configurations for now. > > > > What's the downside of enabling this unconditionally on a compiler > > with the bug fixed? I assume a certain amount of optimization will > > lost, but is it significant/measurable? > > Only the quality of the debug info would suffer a bit. Which for various tools that use kernel's debug info is a significant difference. So adding the option even for fixed gcc is undesirable (and, tracking gcc version numbers only is not enough, I guess most of the distro gccs will backport the fix soon). This PR is the first -fcompare-debug wrong-code in the last few years I remember. There are -fcompare-debug failures from time to time, but usually they are just that either there is insignificant code change or no change at all, just changes in the text dump files -fcompare-debug uses to check whether there might be code differences or not. GCC's stated goal is that -g should not affect code generation, so we treat all such differences as bugs, but most of the time they aren't breaking anything. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Random panic in load_balance() with 3.16-rc
On Fri, Jul 25, 2014 at 01:01:11PM -0700, Linus Torvalds wrote: > For example, gcc will not create a small stack frame with "sub > $8,%rsp". No, what gcc does is to use a random "push" instruction. > Fair enough, but that really makes things much harder to see. Here's > an example: That is because for -Os, push is certainly shorter than sub $8,%rsp. If you want to test for this gcc bug in the kernel, supposedly one should just take the short testcase from the GCC PR, try to compile it and see if you e.g. get a -fcompare-debug -Os failure with the testcase. In that case, you could instead of giving up completely just -fno-var-tracking-assignments. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Random panic in load_balance() with 3.16-rc
On Thu, Jul 24, 2014 at 11:47:17AM -0700, Linus Torvalds wrote: > Adding Jakub to the cc, because gcc-4.9.0 seems to be terminally broken. ... > Jakub, any ideas? Can I ask anyone involved in this for preprocessed source and all gcc command line options to reproduce it, best in the form of a http://gcc.gnu.org/bugzilla/ bugreport? Thanks. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] tell gcc optimizer to never introduce new data races
On Tue, Jun 10, 2014 at 05:13:29PM +0200, Peter Zijlstra wrote: > On Tue, Jun 10, 2014 at 05:04:55PM +0200, Marek Polacek wrote: > > On Tue, Jun 10, 2014 at 04:53:27PM +0200, Peter Zijlstra wrote: > > > On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote: > > > > +# Tell gcc to never replace conditional load with a non-conditional one > > > > +KBUILD_CFLAGS += $(call cc-option,--param allow-store-data-races=0) > > > > + > > > > > > Why do we not want: -fmemory-model=safe? And should we not at the very > > > least also disable packed-store-data-races? > > > > Note that the option does not exist, even though it is mentioned in the > > documentation. > > Urgh.. ok. Any word on the packed-store-data thing? That is recognized, undocumented and never used in the compiler (not in 4.7 or any later release till now). Most of the spots in the compiler that could introduce data races were actually just changed, there is (already since 4.7) just a single conditional on the --param allow-store-data-races=X value. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [tip:x86/urgent] compiler/gcc4: Make quirk for asm_volatile_goto( ) unconditional
On Thu, Feb 13, 2014 at 03:37:08AM -0800, tip-bot for Steven Noonan wrote: > Commit-ID: a9f180345f5378ac87d80ed0bea55ba421d83859 > Gitweb: http://git.kernel.org/tip/a9f180345f5378ac87d80ed0bea55ba421d83859 > Author: Steven Noonan > AuthorDate: Wed, 12 Feb 2014 23:01:07 -0800 > Committer: Ingo Molnar > CommitDate: Thu, 13 Feb 2014 12:34:05 +0100 > > compiler/gcc4: Make quirk for asm_volatile_goto() unconditional > > I started noticing problems with KVM guest destruction on Linux > 3.12+, where guest memory wasn't being cleaned up. I bisected it > down to the commit introducing the new 'asm goto'-based atomics, > and found this quirk was later applied to those. > > Unfortunately, even with GCC 4.8.2 (which ostensibly fixed the > known 'asm goto' bug) I am still getting some kind of > miscompilation. If I enable the asm_volatile_goto quirk for my > compiler, KVM guests are destroyed correctly and the memory is > cleaned up. BTW, which exact 4.8.2 were you using? The last known asm goto bug has been fixed on October, 10th, 2013: http://gcc.gnu.org/PR58670 so before the October, 16th, 2013 4.8.2 release. But already since May 31th, 2013 the tip of the 4.8 GCC branch has been announcing itself as 4.8.2 prerelease. While some distribution versions of GCC announce themselves as the new version only starting from the release date, i.e. snapshots in between 4.8.1 release and 4.8.2 release announce themselves as 4.8.1, in other distributions or upstream it announces itself as 4.8.2. So, if you are using the latter and a snapshot in between May 31th, 2013 and October, 10th, 2013, then you could see gcc patchlevel 2, yet have a gcc with that bug unfixed. So, if the kernel doesn't use a runtime test/configure test to check for this issue, but instead just relies on the patchlevel version, the only safe way would be to look for GCC >= 4.9 or GCC 4.8 with patchlevel > 2 rather than > 1. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PULL] compiler/gcc4: Add quirk for 'asm goto' miscompilation bug
On Wed, Feb 12, 2014 at 08:11:49PM -0800, Linus Torvalds wrote: > Jakub, any suggestions to how Steven might be able to pinpoint where > the code generation problem lies? For a suspected wrong-code where you have no idea where the problem is from debugging or oops etc., usually the best way is to bisect first at the *.o level between (ABI compatible) objects built in a way that it works (in this case supposedly the asm goto workaround you have in the kernel, but generally it can be -O0 compilation, a different version of the compiler, some other compiler flags that make the issue go away) and objects built in a way that fails, once you narrow it down to (hopefully a single) object where everything built the "bad" way but that single object works in the end and everything built the "good" way but that single object fails, the next step is to narrow it down to a single routine (unless there is e.g. just a single asm goto left in the TU). For that one can try e.g. optimize attribute on selected functions, or in this case supposedly preprocessing the file and bisecting between portions of the preprocessed file from "good" preprocessed file (with the asm goto workarounds applied) and "bad" preprocessed file (without them) - count the number of asm gotos there and bisect to find which asm goto matters. Or it is also possible to bisect at the *.s level (easiest is to build with -fno-asynchronous-unwind-tables -fno-exceptions -g0 if possible to decrease number of labels, build twice ("good" and "bad"), then rename the .LNNN labels in one or both files such that they don't clash (say to .LXNNN in one file, .LYNNN in another one) and then bisect at the function level - start with taking approximately first half of functions from one file and second half from another file, etc.). When you narrow it down to function level, eyeball the assembly and try to find out where it is wrong, or try to create a self-contained executable testcase out of the preprocessed source (cut out everything unneeded from the preprocessed file, keep there just the problematic routine and everything that is inlined into it and all symbols they need, supply from another file? a new main that calls the routine with the parameters that cause the problem and stub all the functions it calls with something that still reproduces it), or even just file a http://gcc.gnu.org/bugzilla/ bug with the preprocessed source, compiler options and hopefully detailed description what to look at. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: current_thread_info() not respecting program order with gcc 4.8.x
On Tue, Nov 19, 2013 at 04:57:49PM +0100, Peter Zijlstra wrote: > On Tue, Nov 19, 2013 at 03:29:12PM +, Mathieu Desnoyers wrote: > > However, looking at ARM arch/arm/include/asm/thread_info.h: > > > > static inline struct thread_info *current_thread_info(void) { register > > unsigned long sp asm ("sp"); return (struct thread_info *)(sp & > > ~(THREAD_SIZE - 1)); } > > > > The inline assembly has no clobber and is not volatile. (this is also > > true for all other architectures I've looked at so far, which includes > > x86 and powerpc) The above is not inline assembly, it is a local register variable extension, see http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Local-Reg-Vars.html#Local-Reg-Vars > > Since each current_thread_info() is a different asm ("sp") without > > clobber nor volatile, AFAIU, the compiler is within its right to > > reorder them. Sure. > > One possible solution to this might be to add "memory" clobber and > > volatile to this inline asm, but I fear it would put way too much > > constraints on the compiler optimizations (too heavyweight). As it is not inline asm extension, you can't. Of course you could add asm volatile ("" : "=r" (ret) : "0" (sp)); or similar and thus make it a barrier, but I think the current definition of current_thread_info meant to avoid all that extra overhead. Why does it matter when sp is different between the current_thread_info () calls? As long as sp & ~(THREAD_SIZE - 1) is the same, it shouldn't make a difference. Or is the call in between those changing sp to something else? Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] gcc4: Add 'asm goto' miscompilation quirk
On Thu, Oct 10, 2013 at 07:04:18AM -0700, Richard Henderson wrote: > On 10/10/2013 01:31 AM, Jakub Jelinek wrote: > > Also, for the bitops patch, you probably want an asm_volatile_goto variant. > > Why? Asm without output (which asm goto must be) are automatically volatile. You're right. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH, -v2] compiler/gcc4: Add quirk for 'asm goto' miscompilation bug
On Thu, Oct 10, 2013 at 01:56:17PM +0200, Peter Zijlstra wrote: > On Thu, Oct 10, 2013 at 10:55:06AM +0200, Ingo Molnar wrote: > > +/* > > + * GCC 'asm goto' miscompiles certain code sequences: > > + * > > + * http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 > > + * > > + * Work it around via quirk suggested by Jakub Jelinek. > > + * Fixed in GCC 4.8.2 and later versions. > > + */ > > +#if GCC_VERSION <= 40801 > > We didn't do version checks for CC_HAVE_ASM_GOTO because of vendor > backports; can't we detect this in the same way? The problem is that it will be harder to check for this as compile time only check, and for runtime check you'd need to have the assembly string for every architecture and you couldn't do it for cross-compiling anyway. For compile time only check, it wouldn't be 100% reliable, you could e.g. check for that using -S -O2 -xc - -o - on: int foo (int a, int b) { if (a) return -3; asm volatile goto ("asm volatile goto to %l[lab]" : : "m" (b) : "memory" : lab); return 0; lab: return 0; } and use awk on the resulting assembly to find out if the asm volatile goto to (.*)$ string, then skip lines starting in column 0 with an assembly comment character(s) (#, %, //, not sure if those 3 are all you can see) and check that the first non-skipped line starts with the string matching (.*) earlier followed by : (or perhaps skip other labels too?). That said, the check could fail even in fixed gccs, so perhaps you want to combine that with both version check and test, if version is >= 4.8.3 (note, while I hope it will be fixed in 4.8.2 release, people using prerelease compilers would still have __GNUC_PATCHLEVEL__ == 2, at least in upstream gcc (e.g. in Fedora/RHEL we patch down the patchlevel version, so that __GNUC_PATCHLEVEL__ is 2 only for GCC release x.y.2 and following snapshots, while upstream bumps patchlevel immediately after a release is made), even with gcc containing that bug. So for >= 4.8.3 just assume no workaround is needed, otherwise scan assembly. > > > +# define __asm_goto(vol, x...) do { asm vol goto(x); asm (""); } while (0) > > +#else > > +# define __asm_goto(vol, x...) do { asm vol goto(x); } while (0) > > +#endif > > This places the asm("") in the fallthrough case; but Jakub wrote: > > > @@ -8,6 +8,7 @@ foo (int a, int b) > >asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : > > lab); > >return 0; > > lab: > > + asm (""); > >return 0; > > } > > Which places the asm ("") after the label, these two are not the same. See the follow-up mails, I think placing it immediately after asm goto might be better. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] gcc4: Add 'asm goto' miscompilation quirk
On Thu, Oct 10, 2013 at 10:24:30AM +0200, Ingo Molnar wrote: > Something like the patch below? (Totally untested and all that.) > > Notes: > > - If the bug is fixed in 4.8.3 then the version check can be sharpened > from 9 to 40803. The bug is likely going to be fixed already for 4.8.2 (to be released next week or so). > - I'd really prefer this quirk versus having to add the extra barrier to > the label, as it makes the actual usage sites a lot less painful. Please check how much it bloats the generated code. Also, for the bitops patch, you probably want an asm_volatile_goto variant. > --- a/include/linux/compiler-gcc4.h > +++ b/include/linux/compiler-gcc4.h > @@ -65,6 +65,19 @@ > #define __visible __attribute__((externally_visible)) > #endif > > +/* > + * GCC 'asm goto' miscompiles certain code sequences: > + * > + * http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 > + * > + * Work it around via quirk suggested by Jakub Jelinek. > + * Not yet fixed, so use the quirk on all compiler versions: > + */ > +#if GCC_VERSION <= 9 > +# define asm_goto(x...) do { asm goto(x); asm (""); } while (0) > +#else > +# define asm_goto(x...) do { asm goto(x); } while (0) > +#endif > > #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP > #if GCC_VERSION >= 40400 Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Thu, Oct 10, 2013 at 08:51:04AM +0200, Jakub Jelinek wrote: > @@ -8,6 +8,7 @@ foo (int a, int b) >asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab); >return 0; > lab: > + asm (""); >return 0; > } Or alternatively put the asm (""); right after asm goto, asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab); asm (""); return ...; lab; return ...; What generates better code remains to be tested. In any case, please conditionalize the hacks on non-fixed compilers once the fix is released. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Thu, Oct 10, 2013 at 08:22:38AM +0200, Ingo Molnar wrote: > > On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote: > > > On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote: > > > > > > > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670 Seems all of > > > > 4.[6-9] miscompile it. Will have a look tomorrow unless somebody > > > > beats me to it. But historically, the case where asm goto labels > > > > jump to fallthru basic block had numerous problems in the past. > > > > > > That bug lists the component as middle end; this suggests x86_64 would > > > be vulnerable too, can you confirm? So far we've only observed the > > > wrong code on i386 targets, x86_64 targets appeared correct. > > > > Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and > > even say on ppc64 (sure, one would have to rewrite the asm to have it > > fail at runtime). > > Please let us know once you know enough about the bug to suggest > workarounds. Because it's a nice optimization even extra instruction(s) > would be acceptable I suspect: we could perhaps put a NOP into a slowpath, > with an (unused) goto to it, or something like that? IMHO you don't need to put there a nop, I guess asm (""); would be enough, that will still make sure the label is never in the fallthru basic block and the whole class of issues with asm goto with labels in the fallthru bb can't hit. The disadvantage is that it will generate worse code. @@ -8,6 +8,7 @@ foo (int a, int b) asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab); return 0; lab: + asm (""); return 0; } on the testcase from the PR results in something like: #APP # 8 "pr58670-1.c" 1 bts $1, -4(%rsp); jc .L3 # 0 "" 2 #NO_APP .L5: xorl%eax, %eax ret .p2align 4,,10 .p2align 3 .L3: xorl%eax, %eax ret .p2align 4,,10 .p2align 3 .L4: movl$-3, %eax ret while code without the extra asm (""); and with a fixed compiler: #APP # 6 "pr58670.c" 1 bts $1, -4(%rsp); jc .L3 # 0 "" 2 #NO_APP .L3: xorl%eax, %eax ret .p2align 4,,10 .p2align 3 .L4: .L2: movl$-3, %eax ret FYI, list of past compiler issues with asm goto include: PR54127, PR46226, PR44071, PR52650, PR54455, PR51767. I hope we get this fixed for 4.8.2, so you could then avoid these hacks for GCC 4.8.2 and later. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote: > On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote: > > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670 > > Seems all of 4.[6-9] miscompile it. Will have a look tomorrow > > unless somebody beats me to it. But historically, the case where > > asm goto labels jump to fallthru basic block had numerous problems in the > > past. > > That bug lists the component as middle end; this suggests x86_64 would > be vulnerable too, can you confirm? So far we've only observed the wrong > code on i386 targets, x86_64 targets appeared correct. Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and even say on ppc64 (sure, one would have to rewrite the asm to have it fail at runtime). Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Wed, Oct 09, 2013 at 04:46:56PM +0200, Peter Zijlstra wrote: > On Wed, Oct 09, 2013 at 04:33:59PM +0200, Peter Zijlstra wrote: > > On Wed, Oct 09, 2013 at 04:07:34PM +0200, Peter Zijlstra wrote: > > > Once I force a x86_64 build using the 'same' config it goes away and > > > generates 'sensible' code again (although I don't see why L9 isn't > > > merged with L2): > > > > i386-SMP also generates correct code afaict; a tad stupid but not wrong. > > > > If I remove ftrace from the .config its still broken.. > > If I also remove the likely/unlikely tracer its still broken and lots > > smaller: > > OK, its -march=winchip2 that's buggered. Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670 Seems all of 4.[6-9] miscompile it. Will have a look tomorrow unless somebody beats me to it. But historically, the case where asm goto labels jump to fallthru basic block had numerous problems in the past. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [x86] BUG: unable to handle kernel paging request at 00740060
On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote: > On 10/08, Linus Torvalds wrote: > > > > (not yet merged), see: > > > > > > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d > > I do not really understand inline assembly constraints, but I'll ask > anyway. > > +#define __GEN_RMWcc(fullop, var, cc, ...) \ > +do { \ > + asm volatile goto (fullop "; j" cc " %l[cc_label]" \ > + : : "m" (var), ## __VA_ARGS__ \ > ^ > > don't we need > > "+m" (var) > > here? You actually can't have output operands with asm goto, only inputs and clobbers. But the "memory" clobber should be enough here. If you suspect a compiler bug, can somebody please narrow it down to a single object file (if I've skimmed the patch right, it is just an optimization, where object files compiled without and with the patch should actually coexist fine in the same kernel), ideally to a single routine if possible and post a preprocessed source + gcc command line + version of gcc? Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Friendlier EPERM - Request for input
On Wed, Jan 09, 2013 at 12:53:40PM -0800, Casey Schaufler wrote: > I'm suggesting that the string returned by get_extended_error_info() > ought to be the audit record the system call would generate, regardless > of whether the audit system would emit it or not. What system call would that info be for and would it be reset on next syscall that succeeded, or also failed? The thing is, various functions e.g. perform some syscall, save errno, do some other syscall, and if they decide that the first syscall should be what determines the whole function's errno, just restore errno from the saved value and return. Similarly, various functions just set errno upon detecting some error condition in userspace. There is no 1:1 mapping between many libc library calls and syscalls. So, when would it be safe to call this new get_extended_error_info function and how to determine to which syscall it was relevant? Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [x86.git#mm] stack protector fixes, vmsplice exploit
On Thu, Feb 14, 2008 at 09:25:35PM +0100, Ingo Molnar wrote: > The per function call overhead from stackprotector is already pretty > serious IMO, but at least that's something that GCC _could_ be doing > (much) smarter (why doesnt it jne forward out to __check_stk_failure, > instead of generating 4 instructions, one of them a default-mispredicted > branch instruction??), so that overhead could in theory be something > like 4 fall-through instructions per function, instead of the current 6. Where do you see a mispredicted branch? int foo (void) { char buf[64]; bar (buf); return 6; } -O2 -fstack-protector -m64: subq$88, %rsp movq%fs:40, %rax movq%rax, 72(%rsp) xorl%eax, %eax movq%rsp, %rdi callbar movq72(%rsp), %rdx xorq%fs:40, %rdx movl$6, %eax jne .L5 addq$88, %rsp ret .L5: .p2align 4,,6 .p2align 3 call__stack_chk_fail -O2 -fstack-protector -m32: pushl %ebp movl%esp, %ebp subl$88, %esp movl%gs:20, %eax movl%eax, -4(%ebp) xorl%eax, %eax leal-68(%ebp), %eax movl%eax, (%esp) callbar movl$6, %eax movl-4(%ebp), %edx xorl%gs:20, %edx jne .L5 leave ret .L5: .p2align 4,,7 .p2align 3 call__stack_chk_fail -O2 -fstack-protector -m64 -mcmodel=kernel: subq$88, %rsp movq%gs:40, %rax movq%rax, 72(%rsp) xorl%eax, %eax movq%rsp, %rdi callbar movq72(%rsp), %rdx xorq%gs:40, %rdx movl$6, %eax jne .L5 addq$88, %rsp ret .L5: .p2align 4,,6 .p2align 3 call__stack_chk_fail both with gcc 4.1.x and 4.3.0. BTW, you can use -fstack-protector --param=ssp-buffer-size=4 etc. to tweak the size of buffers to trigger stack protection, the default is 8, but e.g. whole Fedora is compiled with 4. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: asm-x86/sigcontext.h changes break userland
On Wed, Feb 13, 2008 at 08:26:50AM +0100, Ingo Molnar wrote: > > * Jakub Jelinek <[EMAIL PROTECTED]> wrote: > > > x86: use generic register names in struct sigcontext > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=742fa54a62be6a263df14a553bf832724471dfbe > > > > changeset breaks userland, e.g. it is not possible to compile gcc > > anymore (both 32-bit and 64-bit libgcc), and I expect any other > > program which pokes into struct sigcontext. The register names with e > > resp. r have been in use for years, what's the point breaking it now? > > ok - does the patch below solve the problem for you? Yes, this fixes it. Thanks. FYI, gcc uses glibc headers to get at struct sigcontext, but on i386 (and many other arches) glibc's just includes . On x86_64, ia64 and sparc* glibc doesn't include asm/sigcontext.h, but provides its own definitions, so for gcc itself only changing 32-bit parts woiuld be enough. That said, there are certainly programs which include asm/sigcontext.h directly (plus there are other c libraries, some of which may use asm/sigcontext.h). Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
asm-x86/sigcontext.h changes break userland
Hi! The x86: use generic register names in struct sigcontext http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=742fa54a62be6a263df14a553bf832724471dfbe changeset breaks userland, e.g. it is not possible to compile gcc anymore (both 32-bit and 64-bit libgcc), and I expect any other program which pokes into struct sigcontext. The register names with e resp. r have been in use for years, what's the point breaking it now? Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [regression] Re: brk randomization breaks columns
On Tue, Feb 05, 2008 at 01:54:26PM +0100, Ingo Molnar wrote: > * Jiri Kosina <[EMAIL PROTECTED]> wrote: > > > On Tue, 5 Feb 2008, Pavel Machek wrote: > > > > > > Actually, this clearly shows that either prehistoric libc.so.5 or the > > > > program itself are broken. > > > I believe it shows clear regression in latest 2.6.25 kernel. > > > > I am still not completely sure. It might be a regression, but it also > > might just trigger the bug in ancient version in libc.so.5 which might > > be fixed in some later version [...] > > which too is a regression ... > > really, lets add a sysctl for this, and a .config option that either > disables or enables it. Then we will default to disabled. (but users can > enable it - and distros can build their kernels with this .config option > enabled) I don't think kernel should care about programs which are buggy and make invalid assumptions, and that's the case here. I remember we have been through this 5 years ago when brk randomization has been added to Red Hat kernels. There was one or two broken programs which made assumptions on what brk(0) is supposed to return at program startup, everything else was ok. For the buggy apps there is always setarch i386 -R ./the_buggy_program so I don't think we need to add another sysctl for this. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Possible 2.6.24-rc7 issue w/respect to pthreads
On Wed, Jan 09, 2008 at 02:35:32AM -0800, [EMAIL PROTECTED] wrote: > After I patched my 2.6.23 kernel to 2.6.24-rc7 this morning, I noticed > some odd behavior with respect to POSIX threads in a test program I had > written (originally to test epoll.) > > The behavior is as follows: > > 1. main() creates a new thread of execution with pthread_create > 2. thread_func() immediately calls pthread_detach(), which is supposed to > ensure that thread resources are cleaned up when the thread terminates. > 3. The spawned thread sleeps and then prints a message "got here" > 4. The main thread calls pthread_join(). According to the POSIX > documentation, this should suspend execution until the spawned thread has > terminated. Your testcase is buggy. Detached threads aren't joinable, you can't call pthread_join on them. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fedora's latest gcc produces unbootable kernels
On Mon, Dec 03, 2007 at 12:34:17PM +0100, Thomas Gleixner wrote: > Of course just to annoy you :) It doesn't matter whether I'm annoyed about this or not, but whether gcc is able to generate decent code with it or not. And especially with union it is not, at least through all the tree ssa passes. You already have a lot of the details hidden in ktime.h accessor inlines, so I don't think it would be hard to add further one or two. Anyway, even just using typedef struct ktime { s64 tv64; } ktime_t; could make things better in case you have just one field. Unlike unions, structs can be (and in this case most likely will be) scalarized by SRA, so half of tree SSA passes will see it as integral var and will be able to perform optimizations on it. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fedora's latest gcc produces unbootable kernels
On Mon, Dec 03, 2007 at 09:17:22AM +0100, Thomas Gleixner wrote: > I looked at the disassembly but I can not spot the problem. > > I think the real problem is somewhere else. Likely candidates are > hrtimer_forward() or hrtimer_start() - in that order. Should be hopefully fixed in latest Fedora gcc. The problem was in code like typedef union { long long int s; } U; typedef struct { U u; } S; void foo (S *s, long long int x, unsigned long int y) { s->u = ({ (U) { .s = s->u.s + x * y }; }); } where a backport of a recent optimization of mine, without which gcc handles terribly initializers from compound literals (which is something hrtimer uses just everywhere - why can't ktime.h for #if BITS_PER_LONG == 64 || defined(CONFIG_KTIME_SCALAR) just use a scalar rather than union with a scalar in it??), sets the LHS object to the compound literal's initializer rather than forcing creation of a temporary object (the compound literal). Unfortunately the gimplifier had some bugs in case the initializer references (or at least might reference) parts of LHS object. Fixed by backporting 2 Ada bugfixes for the gimplifier from GCC trunk (Ada was hitting those bugs even without this compound literal optimization). Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC: 2.6 patch] add -fno-tree-scev-cprop to KBUILD_CFLAGS
On Sun, Nov 11, 2007 at 07:48:29AM +0100, Adrian Bunk wrote: > The gcc from svn that will become gcc 4.3 generates libgcc calls in > cases like the following (on 32bit architectures): > > <-- snip --> > > static inline void timespec_add_ns(struct timespec *a, u64 ns) > { > ... > while(ns >= NSEC_PER_SEC) { > ns -= NSEC_PER_SEC; > a->tv_sec++; > } > ... > > <-- snip --> Blindly using -fno-tree-scev-cprop just to get rid of one case where this turns out to be a pessimization when kernel knows ns is usually very small is IMHO a wrong thing, you'd lose many cases where this optimization can actually improve performance. Instead, for this exact case just add an optimization barrier to avoid gcc doing this. Adding asm ("" : "=r" (ns) : "0" (ns)); (or hide it in some macro) into the loop will do the job just fine. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [20/45] x86_64: Use 8 byte stack alignment when possible
On Fri, Sep 21, 2007 at 10:45:02PM +0200, Andi Kleen wrote: > > Kernel doesn't use SSE2, so it doesn't need 16 byte alignment. Also > the stack can be already unaligned so letting the compiler align > is useless. This may make some stack frames smaller. Shouldn't sources that are compiled into the VDSO or VSYSCALL pages revert this to the default? Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: increase AT_VECTOR_SIZE to terminate saved_auxv properly
On Fri, Sep 14, 2007 at 01:00:57PM +0200, Olaf Hering wrote: > include/asm-powerpc/elf.h has 6 entries in ARCH_DLINFO. > fs/binfmt_elf.c has 14 unconditional NEW_AUX_ENT entries and 2 > conditional NEW_AUX_ENT entries. > So in the worst case, saved_auxv does not get an AT_NULL entry at the > end. > > Is an AT_NULL entry required or must userspace use the AT_VECTOR_SIZE > to not loop past the end of the array? Of course it is required, AT_VECTOR_SIZE is a kernel implementation detail. > If AT_NULL is required, AT_VECTOR_SIZE should be changed from 44 to 46. No, it should be computed instead from the number of target independent aux vector pairs and then from an per-arch macro which says how many arch specific aux vector pairs are needed. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [RESEND] PIE executable randomization
On Wed, Aug 08, 2007 at 04:03:07PM +0200, Jiri Kosina wrote: > @@ -870,11 +917,15 @@ static int load_elf_binary(struct linux_binprm *bprm, > struct pt_regs *regs) >* default mmap base, as well as whatever program they >* might try to exec. This is because the brk will >* follow the loader, and is not movable. */ > +#ifdef CONFIG_X86 > + load_bias = 0; > +#else > load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr); > +#endif > } > > error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt, > - elf_prot, elf_flags); > + elf_prot, elf_flags,0); > if (BAD_ADDR(error)) { > send_sig(SIGKILL, current, 0); > retval = IS_ERR((void *)error) ? If I'm reading the above hunk correctly, this means we will randomize all PIEs and even all dynamic linkers invoked as executables on i?86 and x86_64, and on the rest of arches we won't randomize at all, instead load ET_DYN objects at ELF_ET_DYN_BASE address. But I don't see anything i?86/x86_64 specific on this. What would make much more sense to me would be conditionalizing on whether we are loading a dynamic linker (in which case loading it at ELF_ET_DYN_BASE is desirable or not (PIEs, ...; and for PIEs we want to randomize on all architectures). So something like if (elf_interpreter) load_bias = 0; else /* Probably dynamic linker invoked as /lib*/ld*so* program args - load at ELF_ET_DYN_BASE. */ load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr); instead of #ifdef CONFIG_X86 load_bias = 0; #else load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr); #endif Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Implementation of POSIX mqueues in Linux 2.6
On Fri, Aug 03, 2007 at 09:59:32AM +, gregfe wrote: > I find little documentation on the actual implementation of POSIX message > queues in Linux, and need some advise. In particular, I am wondering > whether it supports inter-process *and* inter-thread communication, and if Not sure what exactly you mean by inter-thread communication, whether communication between threads within one process or between threads from different processes. You can use mq_* for either, except that mq_notify registered signal notification is sent to the process that called mq_notify, not thread (and for SIGEV_THREAD a new thread is created). Though of course for communication between threads within one process mq_* is a huge overkill. > On more thing: kernel's "make menuconfig" of > version 2.6.11 says : > > >> To use this feature you will also need mqueue library, available > > >> from <... a URL ... to M. Wronski's and K. Benedyczak's home page>" > > Is it still up to date ? No, glibc supports mq_* APIs for more than 3 years now. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: gcc fixed size char array initialization bug - known?
On Thu, Aug 02, 2007 at 09:55:51PM +0200, Guennadi Liakhovetski wrote: > I've run across the following gcc "feature": > > char c[4] = "01234"; > > gcc emits a nice warning > > warning: initializer-string for array of chars is too long > > But do a > > char c[4] = "0123"; > > and - a wonder - no warning. No warning with gcc 3.3.2, 3.3.5, 3.4.5, > 4.1.2. I was told 4.2.x does produce a warning. 4.2.x nor 4.3 doesn't warn either and it is correct not to warn about perfectly valid code. ISO C99 is very obvious in that the terminating '\0' (resp. L'\0') from the string literal is only added if there is room in the array or if the array has unknown size. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: smaller kernel with no real time futexes
On Wed, Aug 01, 2007 at 09:24:34PM +0200, Andi Kleen wrote: > Adrian, > > You said earlier you're looking at smaller allnoconfig kernels. > One thing I noticed recently that realtime pi futexes are always > enabled and that pulls in a lot of other code (like the plists) > > Userland needs to handle them not being available anyways for older > kernels. > > Might be worth looking into turning that into a CONFIG. That's a very bad idea. glibc configured for 2.6.18 and higher kernels assumes PI futexes are present. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: vdso.so mislinked by buggy linker was Re: Linus 2.6.23-rc1
On Mon, Jul 23, 2007 at 01:56:20AM +0200, Andi Kleen wrote: > On Monday 23 July 2007 01:38:40 Andre Noll wrote: > [readded linux-kernel, Linus] > > > [Nr] Name Type Address Offset > >Size EntSize Flags Link Info Align > > [ 0] NULL > > 0 0 0 > > [ 1] .hash HASH ff700120 0120 > >00b4 0004 A 2 0 8 > > [ 2] .dynsym DYNSYM ff7001d8 01d8 > >0270 0018 A 312 8 > > [ 3] .dynstr STRTAB ff700448 0448 > >0059 A 0 0 1 > > [ 4] .gnu.version VERSYM ff7004a2 04a2 > >0034 0002 A 2 0 2 > > [ 5] .gnu.version_dVERDEF ff7004d8 04d8 > >0038 A 3 2 8 > > [ 6] .text PROGBITS ff700c00 00100bab > > >02e4 AX 0 0 64 > > It puts .text at 1MB. Your vdso file must be huge? > > It looks like it ignores the > -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 > options passed to it. The AMD64 ABI has a 1MB minimum page size, but > these options are supposed to disable it. > > Not sure how to work around this, but having an 1+MB vdso would be incredibly > wasteful. What version is it? Perhaps we just drop support for this. I can't > think of a workaround currently. Looking at vdso.lds.S, if you change just VDSO_TEXT_OFFSET to 0xc00 and don't tweak the linker script, then you jump backwards with the dot, you should even get a linker warning about it: . = VDSO_PRELINK + VDSO_TEXT_OFFSET; .text : { *(.text) }:text .text.ptr : { *(.text.ptr) }:text . = VDSO_PRELINK + 0x900; Guess that 0x900 should have been VDSO_TEXT_OFFSET + 0x400 or something similar. Also note that it is highly desirable to fit the whole vdso into one page, so increasing VDSO_TEXT_OFFSET etc. offsets too much is just wasting memory. From the above dump, VDSO_TEXT_OFFSET 0x500 is too low, but 0x600 should work, assuming .data section is moved 0x100 higher as well. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Linus 2.6.23-rc1
On Mon, Jul 23, 2007 at 01:31:00AM +0200, Andi Kleen wrote: > On Monday 23 July 2007 01:23:38 Andre Noll wrote: > > On 00:22, Andi Kleen wrote: > > > > /usr/bin/ld: section .text [ff700500 -> ff7007e3] > > > > overlaps section .gnu.version_d [ff7004d8 -> ff70050f] > > > > > > Does this patch fix it? > > > > Nope, with 0x600 I still get the same error. But it helped to further > > increase VDSO_TEXT_OFFSET to 0xc00. I tried 0x700, 0x800,... and 0xc00 > > is the smallest value in this series that makes the error go away, i.e. > > the patch below works for me. > > Can you send (privately) readelf -a output from your vdso.so ? > Your linker must be doing something weird. > > 0xc00 is quite wasteful. I think Roland's --build-id doesn't create very big section, the likely culprit would be a hacked up ld that e.g. defaults to --hash-style=both. Can you retry with --hash-style=sysv? vdso really has to include the traditional .hash section, otherwise it wouldn't be compatible with old glibcs, and an additional .gnu.hash might be an overkill for it - doesn't the vdso define only very few symbols? Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: vdso.so mislinked by buggy linker was Re: Linus 2.6.23-rc1
On Mon, Jul 23, 2007 at 01:56:20AM +0200, Andi Kleen wrote: > On Monday 23 July 2007 01:38:40 Andre Noll wrote: > [readded linux-kernel, Linus] > > > [Nr] Name Type Address Offset > >Size EntSize Flags Link Info Align > > [ 0] NULL > > 0 0 0 > > [ 1] .hash HASH ff700120 0120 > >00b4 0004 A 2 0 8 > > [ 2] .dynsym DYNSYM ff7001d8 01d8 > >0270 0018 A 312 8 > > [ 3] .dynstr STRTAB ff700448 0448 > >0059 A 0 0 1 > > [ 4] .gnu.version VERSYM ff7004a2 04a2 > >0034 0002 A 2 0 2 > > [ 5] .gnu.version_dVERDEF ff7004d8 04d8 > >0038 A 3 2 8 > > [ 6] .text PROGBITS ff700c00 00100bab > > >02e4 AX 0 0 64 > > It puts .text at 1MB. Your vdso file must be huge? > > It looks like it ignores the > -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096 > options passed to it. The AMD64 ABI has a 1MB minimum page size, but > these options are supposed to disable it. These options are fairly new, before they were ignored (like all unknown -z options). They were added 2006-05-30 to CVS binutils. I guess the problem is caused by the gap being too big and old binutils. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RESEND] PIE randomization
On Mon, Jul 09, 2007 at 11:58:07PM +0200, Jiri Kosina wrote: > On Mon, 9 Jul 2007, Jiri Kosina wrote: > > [ ... ] > > > - if (!BAD_ADDR(elf_entry)) { > > > + if (!IS_ERR((void *)elf_entry)) { > > I agree that this is better solution. Andrew, this Jakub's patch should > > replace the pie-randomization-fix-bad_addr-macro.patch if possible. You > > can add > > as this raced :) with Andrew who already folded the > pie-randomization-fix-bad_addr-macro.patch into pie-randomization.patch, > do you think you could rebase this change against the current state of -mm > and resend it? Thanks, Here it is: Restore BAD_ADDR check strictness, use IS_ERR in the only place where the stricter BAD_ADDR can't work, as the value is a load bias rather than userland address. Signed-off-by: Jakub Jelinek <[EMAIL PROTECTED]> --- linux/fs/binfmt_elf.c 2007-07-10 11:39:29.0 +0200 +++ linux/fs/binfmt_elf.c 2007-07-10 11:41:03.0 +0200 @@ -80,7 +80,7 @@ static struct linux_binfmt elf_format = .hasvdso= 1 }; -#define BAD_ADDR(x) IS_ERR_VALUE(x) +#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE) static int set_brk(unsigned long start, unsigned long end) { @@ -1005,7 +1005,7 @@ static int load_elf_binary(struct linux_ interpreter, &interp_map_addr, load_bias); - if (!BAD_ADDR(elf_entry)) { + if (!IS_ERR((void *)elf_entry)) { /* * load_elf_interp() returns relocation * adjustment Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RESEND] PIE randomization
On Sat, Jul 07, 2007 at 02:13:01AM +0200, Jiri Kosina wrote: > On Thu, 5 Jul 2007, Rik van Riel wrote: > > > So the original patch has: > > #define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE) > > For some reason(?) it got changed to the clearly buggy: > > #define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK) > > Jiri's patch undoes that second buggy define, which is very > > different from the original that was sent in by you and Ernie. > > This is a part of execshield patch, fthe pie-compiled binary executable > memory layout randomization was extracted from - see > http://people.redhat.com/~mingo/exec-shield/exec-shield-nx-2.6.19.patch > > Note that load_elf_interp() in vanilla kernel differs from the > execshield's (and pie-randomization.patch) version. > > The fix makes the BAD_ADDR check whether the address belongs to the > ERR_PTR range, which seems valid for all uses of BAD_ADDR in the patched > binfmt_elf.c (do_brk(), elf_map(), do_mmap() etc return valid address or > err ptr) ... am I missing something obvious here? I believe BAD_ADDR macro was changes from ((unsigned long)(x) >= TASK_SIZE) (which is the right test for invalid user addresses, stronger check than >= PAGE_MASK) to >= PAGE_MASK only because of the one check of the return value of load_elf_interp. All other uses of BAD_ADDR macro are either on userland addresses (what do_mmap, elf_map, do_brk etc. return; where TASK_SIZE or more is certainly wrong) or in one case still on unbiased ELF p_vaddr: if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz || in load_elf_binary (where >= TASK_SIZE check is ok too). So perhaps doing this instead of changing BAD_ADDR to IS_ERR_VAL might be better: Signed-off-by: Jakub Jelinek <[EMAIL PROTECTED]> --- linux/fs/binfmt_elf.c 2007-06-08 21:53:45.0 +0200 +++ linux/fs/binfmt_elf.c 2007-07-07 14:19:14.0 +0200 @@ -80,7 +80,7 @@ static struct linux_binfmt elf_format = .hasvdso= 1 }; -#define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK) +#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE) static int set_brk(unsigned long start, unsigned long end) { @@ -1015,7 +1015,7 @@ static int load_elf_binary(struct linux_ interpreter, &interp_map_addr, load_bias); - if (!BAD_ADDR(elf_entry)) { + if (!IS_ERR((void *)elf_entry)) { /* load_elf_interp() returns relocation adjustment */ interp_load_addr = elf_entry; elf_entry += loc->interp_elf_ex.e_entry; Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RESEND] PIE randomization
On Wed, May 23, 2007 at 10:50:24AM +0200, Jiri Kosina wrote: > From: Jan Kratochvil <[EMAIL PROTECTED]> > > This patch is using mmap()'s randomization functionality in such a way > that it maps the main executable of (specially compiled/linked -pie/-fpie) > ET_DYN binaries onto a random address (in cases in which mmap() is allowed > to perform a randomization). > > Origin of this patch is in exec-shield > (http://people.redhat.com/mingo/exec-shield/) > > Signed-off-by: Jan Kratochvil <[EMAIL PROTECTED]> > Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]> > Cc: Ingo Molnar <[EMAIL PROTECTED]> > Cc: Roland McGrath <[EMAIL PROTECTED]> > Cc: Jakub Jelinek <[EMAIL PROTECTED]> > -#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE) > +#define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK) ... > @@ -442,8 +491,7 @@ static unsigned long load_elf_interp(str > goto out_close; > } > > - *interp_load_addr = load_addr; > - error = ((unsigned long)interp_elf_ex->e_entry) + load_addr; > + error = load_addr; ... > if (elf_interpreter) { > - if (interpreter_type == INTERPRETER_AOUT) > + if (interpreter_type == INTERPRETER_AOUT) { > elf_entry = load_aout_interp(&loc->interp_ex, >interpreter); > - else > + } else { > + unsigned long interp_map_addr; /* unused */ > + > elf_entry = load_elf_interp(&loc->interp_elf_ex, > interpreter, > - &interp_load_addr); > + &interp_map_addr, > + load_bias); > + if (!BAD_ADDR(elf_entry)) { > + /* > + * load_elf_interp() returns relocation > + * adjustment > + */ > + interp_load_addr = elf_entry; > + elf_entry += loc->interp_elf_ex.e_entry; > + } > + } > if (BAD_ADDR(elf_entry)) { > force_sig(SIGSEGV, current); > retval = IS_ERR((void *)elf_entry) ? The above highlighted changes are the cause of random segfaults of PIE binaries. See https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=246623 The problem is if ld.so is prelinked to some address in the area where the kernel actually maps it, particularly if elf_map in load_elf_interp returns an address one page below its first PT_LOAD segments vaddr. Then load_addr (it is a load bias actually) returned from load_elf_interp is 0xf000 (on 32-bit kernels) and BAD_ADDR are all addresses >= 0xf000 (on i?86). The fix should be either changing the definition of BAD_ADDR to e.g. IS_ERR_VALUE(x), or at least changing the if (!BAD_ADDR(elf_entry)) { above to if (!IS_ERR_VALUE(elf_entry)) {, the second BAD_ADDR can already stay, because at that place elf_entry is no longer a bias (difference between actual and preferred load address), but an actual address, where very high addresses are of course invalid. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Userspace compiler support of "long long"
On Thu, Jun 28, 2007 at 07:53:51AM -0400, Kyle Moffett wrote: > On Jun 27, 2007, at 23:57:54, Matthew Wilcox wrote: > >On Wed, Jun 27, 2007 at 06:30:52PM -0400, Kyle Moffett wrote: > >>Then all 64-bit archs have: > >>typedef signed long __s64; > >>typedef unsigned long __u64; > >> > >>While all 32-bit archs have: > >>typedef signed long long __s64; > >>typedef unsigned long long __u64; > > > >include/asm-parisc/types.h:typedef unsigned long long __u64; > > > >For both 32 and 64-bit. > > > >include/asm-sh64/types.h:typedef unsigned long long __u64; > >include/asm-x86_64/types.h:typedef unsigned long long __u64; > > > >So that's three architectures that violate your first assertion. > > Oh, ok, that makes it even easier to say this with certainty: > Changing the other 64-bit archs to use "long long" for their 64-bit > numbers will not cause additional warnings. I'm also almost certain > there are no architectures which use "long long" for 128-bit > integers. (Moreover, I can't find hardly anything which does 128-bit > integers at all). unsigned long and unsigned long long have the same size, precision and alignment on all LP64 arches, that's true. But they have different ranks and more importantly they mangle differently in C++. So, whether some user exposed type uses unsigned long or unsigned long long is part of the ABI, whether that's size_t, uintptr_t, uint64_t, u_int64_t or any other type, you can't change it without breaking the ABI. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] get_random_long() and AT_ENTROPY for auxv, kernel 2.6.21.5
On Sun, Jun 24, 2007 at 09:43:03PM -0700, Arjan van de Ven wrote: > > - something to do with aux vector headers > > the primary goal is to pass a random value to userspace at process > start; this to save glibc from having to open /dev/urandom on ever > program start (which it does now for all apps compiled with > -fstack-protector, which in various distros is "everything"). There are 2 ways to compile -fstack-protector supporting glibc actually, only one opens /dev/urandom on every program initialization, the other computes the stack guard from some bits of the stack address (so indirectly depends on get_random_int() in stack randomization). Nevertheless, having one random long (32-bit for 32-bit arches, 64-bit otherwise) in aux vector would be useful. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: O_CLOEXEC: An alternate proposal
On Fri, Jun 08, 2007 at 03:47:12AM -0400, Daniel Colascione wrote: > Hey, this is my first post to linux-kernel, so please be kind. :-) > > Linus Torvalds wrote on May 31: > > I'm with Uli on this one. "Stateful" stuff is bad. It's essentially > > impossible to handle with libraries - either the library would have to > > explciitly always turn the state the way _it_ needs it, or the library > > will do the wrogn thing. > > I agree that stateful stuff is generally not very elegant, > but I think it's a win here -- we wouldn't have to create any > new APIs except for the state-setting stuff. > > The state just has to be thread-local. > > If it's thread-local, a library, say, glibc, > can use code like this: > > /* Internal library function */ > old_fd_flags = kernel_default_fd_flags(FD_CLOEXEC | FD_RANDFD); > event_fd = super_duper_event_polling_mechanism_fd(); > kernel_default_fd_flags(old_fd_flags); It is not a win, what if a signal comes in between the two kernel_default_fd_flags syscalls? open and other functions are async signal safe and programs will be certainly upset if suddenly the syscalls in the signal handler start to behave differently depending on which exact code the async signal has interrupted. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Introduce O_CLOEXEC (take >2)
On Thu, May 31, 2007 at 11:46:31AM -0700, Davide Libenzi wrote: > On Thu, 31 May 2007, Ulrich Drepper wrote: > > Davide Libenzi wrote: > > > Isn't this better be a global process flag? Default should be, for legacy > > > reasons, > > > > No. Policies are always wrong since it means code that cannot change > > the policy (e.g, all runtime libraries) have no access to the > > functionality. I cannot set the policy to default to close-on-exit in > > glibc all the while the application assumes this is not the case. > > I was talking for a broader usage, not only glibc centric. Most ppl > writing MT+exec apps wants all but (eventually) and handfull of files > leaking across the exec boundary. If open (and all other syscalls that create fds) have O_CLOEXEC (and something similar for other syscalls), then such a policy can be easily implemented on the userland, if desired. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory
On Tue, May 08, 2007 at 04:12:00PM +1000, Nick Piggin wrote: > I didn't actually check system and user times for the mysql > benchmark, but that's exactly what I had in mind when I > mentioned the poor cache behaviour this patch could cause. I > definitely did see user times go up in benchmarks where I > measured. > > We have percpu and cache affine page allocators, so when > userspace just frees a page, it is likely to be cache hot, so > we want to free it up so it can be reused by this CPU ASAP. > Likewise, when we newly allocate a page, we want it to be one > that is cache hot on this CPU. malloc has per-thread arenas, so when using MADV_FREE the pages should be local to the thread as well (unless the thread has switched to a different CPU also to the CPU) and in case of sysbench should be cache hot as well (it is reused RSN). With MADV_DONTNEED you need to clear the pages while that is not necessary with MADV_FREE. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc
On Thu, May 03, 2007 at 11:28:15PM -0700, Andrew Morton wrote: > > > The posix spec implies that negative `len' is permitted - presumably > > > "allocate > > > ahead of `offset'". How peculiar. > > > > I just checked the man page for posix_fallocate() and it says: > > > > EINVAL offset or len was less than zero. That describes the current glibc implementation. > > We should probably follow this lead. > > Yes, I think so. I'm suspecting that > http://www.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html > is just buggy. Or I can't read. > > I mean, if we're going to support negative `len' then is the byte at > `offset' inside or outside the segment? Head spins. > > However it would be neat if someone could test $OTHER_OS and, perhaps more > importantly, the present glibc emulation (which I assume your manpage is > referring to, so this would be a manpage test ;)). int posix_fallocate (int fd, __off_t offset, __off_t len) { struct stat64 st; struct statfs f; /* `off_t' is a signed type. Therefore we can determine whether OFFSET + LEN is too large if it is a negative value. */ if (offset < 0 || len < 0) return EINVAL; if (offset + len < 0) return EFBIG; /* First thing we have to make sure is that this is really a regular file. */ if (__fxstat64 (_STAT_VER, fd, &st) != 0) return EBADF; if (S_ISFIFO (st.st_mode)) return ESPIPE; if (! S_ISREG (st.st_mode)) return ENODEV; if (len == 0) { if (st.st_size < offset) { int ret = __ftruncate (fd, offset); if (ret != 0) ret = errno; return ret; } return 0; } ... is what glibc does ATM. Seems we violate the case where len == 0, as EINVAL in that case is "shall fail". But reading the standard to imply negative len is ok is too much guessing, there is no word what it means when len is negative and "required storage for regular file data starting at offset and continuing for len bytes" doesn't make sense for negative size. And given the general "Implementations may support additional errors not included in this list, may generate errors included in this list under circumstances other than those described here, or may contain extensions or limitations that prevent some errors from occurring." I believe returning EINVAL for len < 0 is not a POSIX violation. That doesn't mean the standard shouldn't be clarified, whether by saying EINVAL must be returned for non-positive len or saying that using negative len has undefined or implementation defined behavior. > The above opengroup page only permits S_ISREG. Preallocating directories > sounds quite useful to me, although it's something which would be pretty > hard to emulate if the FS doesn't support it. And there's a decent case to > be made for emulating it - run-anywhere reasons. Does glibc emulation support > directories? Quite unlikely. No, see above. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: MADV_FREE functionality
On Mon, Apr 30, 2007 at 06:18:39PM -0700, Andrew Morton wrote: > > In short: > > - both MADV_FREE and MADV_DONTNEED only unmap file pages > > - after MADV_DONTNEED the application will always get back > >fresh zero filled anonymous pages when accessing the > >memory > > - after MADV_FREE the application can either get back the > >original data (without a page fault) or zero filled > >anonymous memory > > > > The Linux MADV_DONTNEED behavior is not POSIX compliant. > > POSIX says that with MADV_DONTNEED the application's data > > will be preserved. > > > > Currently glibc simply ignores POSIX_MADV_DONTNEED requests > > from applications on Linux. Changing the behaviour which > > some Linux applications may rely on might not be the best > > idea. > > OK, thanks. I stuck that in the changelog. FYI, Solaris man page on MADV_FREE says: MADV_FREE Tells the kernel that contents in the specified address range are no longer important and the range will be overwritten. When there is demand for memory, the system will free pages associated with the speci- fied address range. In this instance, the next time a page in the address range is referenced, it will con- tail all zeroes. Otherwise, it will contain the data that was there prior to the MADV_FREE call. References made to the address range will not make the system read from backing store (swap space) until the page is modified again. This value cannot be used on mappings that have under- lying file objects. The last paragraph seems to be just about the operation being undefined, madvise MADV_FREE on MAP_SHARED file mapping returns 0 rather than flagging an error. FreeBSD man page: MADV_FREEGives the VM system the freedom to free pages, and tells the system that information in the specified page range is no longer important. This is an efficient way of allowing malloc(3) to free pages anywhere in the address space, while keeping the address space valid. The next time that the page is referenced, the page might be demand zeroed, or might contain the data that was there before the MADV_FREE call. References made to that address space range will not make the VM system page the information back in from backing store until the page is modified again. > Also, where did we end up with the Solaris compatibility? > > The patch I have at present retains MADV_FREE=0x05 for sparc and sparc64 > which should be good. > > Did we decide that the Solaris and Linux implementations of MADV_FREE are > compatible? SPARC Solaris binary compatibility in Linux is in really bad shape, madvise in Solaris is implemented using memcntl syscall (at least according to truss(1)) and that syscall is systbl.S: .word solaris_unimplemented /* memcntl 131 */ When/if anyone decides to put more effort into the Solaris binary compatibility (I'm quite doubtful anyone will), codes which don't match can be simply translated into other codes, ignored etc., we can't use sys_madvise to implement memcntl syscall anyway. While Solaris MADV_FREE is the same as Linux MADV_FREE proposed by Rik (except perhaps the documented undefined behavior with file mappings, on #include #include #include int main (void) { getpid (); int fd = open ("test", O_RDWR); void *p = mmap (0, 8192, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); memset (p, ' ', 8192); madvise (p, 8192, MADV_FREE); return 0; } on Solaris the spaces actually made it into the file), MADV_DONTNEED is not, but that doesn't really matter except for arch/sparc*/solaris/ layer if anyone cares. We certainly can't change current MADV_DONTNEED behavior, all we can do is implement a new MADV_* code with a different behavior and let glibc translate POSIX_MADV_* codes on posix_madvise to the Linux specific MADV_*. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Big reserved mappings on x86_64
On Wed, Apr 25, 2007 at 10:42:20AM +0200, Jan Engelhardt wrote: > I actually took a look at `pmap $$`, which reveals that a lot of shared > libraries map 2044K or 2048K unreadable-unwritable-private > mappings...for _what_ purpose? > > 10:37 opteron:~ > pmap $$ > 4403: bash > START SIZE RSS DIRTY PERM MAPPING > 2ae6cca7212K172K 0K r-xp /lib64/libreadline.so.5.2 > 2ae6ccaa5000 2044K 0K 0K ---p /lib64/libreadline.so.5.2 <-- > 2ae6ccca4000 4K 4K 4K r--p /lib64/libreadline.so.5.2 > 2ae6ccca5000 28K 28K 28K rw-p /lib64/libreadline.so.5.2 > 2ae6cccac000 8K 8K 8K rw-p [anon] > 2ae6cccae000 28K 16K 0K r-xp /lib64/libhistory.so.5.2 > 2ae6cccb5000 2048K 0K 0K ---p /lib64/libhistory.so.5.2 <-- > 2ae6cceb5000 8K 8K 8K rw-p /lib64/libhistory.so.5.2 > 2ae6cceb7000320K208K 0K r-xp /lib64/libncurses.so.5.5 > 2ae6ccf07000 2048K 0K 0K ---p /lib64/libncurses.so.5.5 <-- > 2ae6cd107000 48K 48K 48K r--p /lib64/libncurses.so.5.5 > 2ae6cd113000 28K 28K 28K rw-p /lib64/libncurses.so.5.5 > [...] > > What could these ominous mappings be? Does anyone else see that - > perhaps someone with x86_64 && !(opensuse 10.2)? While i386 only supports 4KB pages, x86_64 ELF objects ought to support up to 2MB pages. The gap between read-only/executable and writable segments is intentionally mapped PROT_NONE, so that other things aren't mapped in between the segments. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] lazy freeing of memory through MADV_FREE
On Mon, Apr 23, 2007 at 08:21:37PM +1000, Nick Piggin wrote: > I guess it is a good idea to batch these things. But can you > do that on all architectures? What happens if your tlb flush > happens after another thread already accesses it again, or > after it subsequently gets removed from the address space via > another CPU? Accessing the page by another thread before madvise (MADV_FREE) returns is undefined behavior, it can act as if that access happened right before the madvise (MADV_FREE) call or right after it. That's ok for glibc and supposedly any other malloc implementation, madvise (MADV_FREE) is called while holding containing's arena lock and for whatever malloc implementaton, madvise (MADV_FREE) would be part of free operations and you definitely need some synchronization between one thread freeing some memory and other thread deciding to reuse that memory and return it from malloc/realloc/calloc/etc. My only concern is whether using non-atomic update of the pte is ok or not. ptep_test_and_clear_young/ptep_test_and_clear_dirty Rik's patch was doing before are done using atomic instructions, at least on x86_64. The operation we want for MADV_FREE is, clear young/dirty bits if they have been set on entry to the MADV_FREE madvise call, undefined values for these 2 bits if some other task modifies the young/dirty bits concurrently with this MADV_FREE zap_page_range, but I'd say other bits need to be unmodified. Now, is there some kernel code which while either not holding corresponding mmap_sem at all or holding it just down_read modifies other bits in the pte? If yes, we need to do this clearing atomically, basically do a cmpxchg loop until we succeed to clear the 2 bits and then flush the tlb if any of them was set before (ptep_test_and_clear_dirty_and_young?), if not, set_pte_at is ok and faster than a lock prefixed insn. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] lazy freeing of memory through MADV_FREE
On Fri, Apr 20, 2007 at 07:52:44PM -0400, Rik van Riel wrote: > It turns out that Nick's patch does not improve peak > performance much, but it does prevent the decline when > running with 16 threads on my quad core CPU! > > We _definately_ want both patches, there's a huge benefit > in having them both. > > Here are the transactions/seconds for each combination: > >vanilla new glibc madv_free kernel madv_free + mmap_sem > threads > > 1 610 609 596545 > 2103211361196 1200 > 4107011282014 2024 > 8100010881665 2087 > 1677910731310 1999 FYI, I have uploaded a testing glibc that uses MADV_FREE and falls back to MADV_DONTUSE if MADV_FREE is not available, to http://people.redhat.com/jakub/glibc/2.5.90-21.1/ and I'm also attaching the glibc patch for those who want to build it themselves: 2007-04-19 Ulrich Drepper <[EMAIL PROTECTED]> Jakub Jelinek <[EMAIL PROTECTED]> * malloc/arena.c (heap_info): Add mprotect_size field, adjust pad. (new_heap): Initialize mprotect_size. (no_madv_free): New variable. (grow_heap): When growing, only mprotect from mprotect_size till new_size if mprotect_size is smaller. When shrinking, use PROT_NONE MMAP for __libc_enable_secure only, otherwise if MADV_FREE is available use it and fall back to MADV_DONTNEED. * sysdeps/unix/sysv/linux/alpha/bits/mman.h (MADV_FREE): Define. * sysdeps/unix/sysv/linux/ia64/bits/mman.h (MADV_FREE): Likewise. * sysdeps/unix/sysv/linux/i386/bits/mman.h (MADV_FREE): Likewise. * sysdeps/unix/sysv/linux/s390/bits/mman.h (MADV_FREE): Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/mman.h (MADV_FREE): Likewise. * sysdeps/unix/sysv/linux/x86_64/bits/mman.h (MADV_FREE): Likewise. * sysdeps/unix/sysv/linux/sparc/bits/mman.h (MADV_FREE): Likewise. * sysdeps/unix/sysv/linux/sh/bits/mman.h (MADV_FREE): Likewise. --- libc/malloc/arena.c.jj 2006-10-31 23:05:31.0 +0100 +++ libc/malloc/arena.c 2007-04-19 18:54:20.0 +0200 @@ -1,5 +1,6 @@ /* Malloc implementation for multiple threads without lock contention. - Copyright (C) 2001,2002,2003,2004,2005,2006 Free Software Foundation, Inc. + Copyright (C) 2001,2002,2003,2004,2005,2006,2007 + Free Software Foundation, Inc. This file is part of the GNU C Library. Contributed by Wolfram Gloger <[EMAIL PROTECTED]>, 2001. @@ -59,10 +60,12 @@ typedef struct _heap_info { mstate ar_ptr; /* Arena for this heap. */ struct _heap_info *prev; /* Previous heap. */ size_t size; /* Current size in bytes. */ + size_t mprotect_size;/* Size in bytes that has been mprotected + PROT_READ|PROT_WRITE. */ /* Make sure the following data is properly aligned, particularly that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of - MALLOG_ALIGNMENT. */ - char pad[-5 * SIZE_SZ & MALLOC_ALIGN_MASK]; + MALLOC_ALIGNMENT. */ + char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK]; } heap_info; /* Get a compile-time error if the heap_info padding is not correct @@ -692,10 +695,15 @@ new_heap(size, top_pad) size_t size, top } h = (heap_info *)p2; h->size = size; + h->mprotect_size = size; THREAD_STAT(stat_n_heaps++); return h; } +#if defined _LIBC && defined MADV_FREE +static int no_madv_free; +#endif + /* Grow or shrink a heap. size is automatically rounded up to a multiple of the page size if it is positive. */ @@ -714,17 +722,49 @@ grow_heap(h, diff) heap_info *h; long di new_size = (long)h->size + diff; if((unsigned long) new_size > (unsigned long) HEAP_MAX_SIZE) return -1; -if(mprotect((char *)h + h->size, diff, PROT_READ|PROT_WRITE) != 0) - return -2; +if((unsigned long) new_size > h->mprotect_size) { + if (mprotect((char *)h + h->mprotect_size, + (unsigned long) new_size - h->mprotect_size, + PROT_READ|PROT_WRITE) != 0) + return -2; + h->mprotect_size = new_size; +} } else { new_size = (long)h->size + diff; if(new_size < (long)sizeof(*h)) return -1; /* Try to re-map the extra heap space freshly to save memory, and make it inaccessible. */ -if((char *)MMAP((char *)h + new_size, -diff, PROT_NONE, -MAP_PRIVATE|MAP_FIXED) == (char *) MAP_FAILED) - return -2; +#ifdef _LIBC +if (__builtin_expect (__libc_enable_secure, 0)) +#else +if (1) +#endif + { + if((char *)MMAP((char *)h + new_size, -diff, PROT_NONE, + MAP_PRIVATE|MAP_FIXED) == (char *) MAP_FAILED) + return -2; +
Re: Interface for the new fallocate() system call
On Fri, Apr 20, 2007 at 07:21:46PM +0530, Amit K. Arora wrote: > Ok. > In this case we may have to consider following things: > > 1) Obviously, for this glibc will have to call fallocate() syscall with > different arguments on s390, than other archs. I think this should be > doable and should not be an issue with glibc folks (right?). glibc can cope with this easily, will just add sysdeps/unix/sysv/linux/s390/fallocate.c or something similar to override the generic Linux implementation. > 2) we also need to see how strace behaves in this case. With little > knowledge that I have of strace, I don't think it should depend on > argument ordering of a system call on different archs (since it uses > ptrace internally and that should take care of it). But, it will be > nice if someone can confirm this. strace would solve this with #ifdef mess, it already does that in many places so guess another few lines don't make it significantly worse. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] make MADV_FREE lazily free memory
On Mon, Apr 16, 2007 at 11:10:39AM -0500, Anton Blanchard wrote: > > Making the pte clean also needs to clear the hardware writable > > bit on architectures where we do pte dirtying in software. > > > > If we don't, we would have corruption problems all over the VM, > > for example in the code around pte_clean_one :) > > > > >But as Linus recently said, even hardware handled faults still > > >take expensive microarchitectural traps. > > > > Nowhere near as expensive as a full page fault, though... > > Unfortunately it will be expensive on architectures that have software > referenced and changed. It would be great if we could just leave them > dirty in the pagetables and transition between a clean and dirty state > via madvise calls, but thats just wishful thinking on my part :) That would mean an additional syscall. Furthermore, if you allocate a big chunk of memory, dirty it, then free (with madvise (MADV_FREE)) it and soon allocate the same size of memory again, it is better to start that with non-dirty memory, it might be that this time you e.g. don't modify a big part of the chunk. If all that memory was kept dirty all the time and just marked/unmarked for lazy reuse with MADV_FREE/MADV_UNDO_FREE, all that memory would need to be saved to disk when paging out as it was marked dirty, while with current Rik's MADV_FREE that will happen only for pages that were actually dirtied after the last malloc. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH, take4] FUTEX : new PRIVATE futexes
On Sat, Apr 07, 2007 at 10:43:39AM +0200, Eric Dumazet wrote: > get_futex_key() does a check against sizeof(u32) regardless of futex being > 64bits or not. > So it is possible a 64bit futex spans two pages of memory... That would be a user bug. 32-bit futexes have to be 32-bit aligned, 64-bit futexes have to be 64-bit aligned. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: missing madvise functionality
On Thu, Apr 05, 2007 at 03:31:24AM -0400, Rik van Riel wrote: > >My guess is that all the page zeroing is pretty expensive as well and > >takes significant time, but I haven't profiled it. > > With the attached patch (Andrew, I'll change the details around > if you want - I just wanted something to test now), your test > case run time went down considerably. Thanks. --- linux-2.6.20.noarch/mm/madvise.c.madvise2007-04-03 21:53:47.0 -0400 +++ linux-2.6.20.noarch/mm/madvise.c2007-04-04 23:48:34.0 -0400 @@ -142,8 +142,12 @@ static long madvise_dontneed(struct vm_a .last_index = ULONG_MAX, }; zap_page_range(vma, start, end - start, &details); - } else - zap_page_range(vma, start, end - start, NULL); + } else { + struct zap_details details = { + .madv_free = 1, + }; + zap_page_range(vma, start, end - start, &details); + } return 0; } @@ -209,7 +213,9 @@ madvise_vma(struct vm_area_struct *vma, error = madvise_willneed(vma, prev, start, end); break; + /* FIXME: POSIX says that MADV_DONTNEED cannot throw away data. */ case MADV_DONTNEED: + case MADV_FREE: error = madvise_dontneed(vma, prev, start, end); break; I think you should only use the new behavior for madvise MADV_FREE, not for MADV_DONTNEED. The current MADV_DONTNEED behavior (which conflicts with POSIX POSIX_MADV_DONTNEED, but that doesn't matter since what glibc maps posix_madvise POSIX_MADV_DONTNEED in madvise call if anything doesn't have to be MADV_DONTNEED, but can be anything else) is apparently documented in Linux man pages: MADV_DONTNEED Do not expect access in the near future. (For the time being, the application is finished with the given range, so the kernel can free resources associated with it.) Subsequent accesses of pages in this range will succeed, but will result either in re-loading of the memory contents from the under- lying mapped file (see mmap()) or zero-fill-on-demand pages for mappings without an underlying file. so it wouldn't surprise me if something relied on zero filling. So IMHO madv_free in details should be only set if MADV_FREE. Also, I think MADV_FREE shouldn't do anything at all (i.e. don't call zap_page_range, but don't fail either) for shared or file backed vmas, only for private anon memory it should do something. After all, it is just an optimization and it makes sense only for private anon mappings. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: missing madvise functionality
On Wed, Apr 04, 2007 at 05:46:12PM +1000, Nick Piggin wrote: > Does mmap(PROT_NONE) actually free the memory? Yes. /* Clear old maps */ error = -ENOMEM; munmap_back: vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent); if (vma && vma->vm_start < addr + len) { if (do_munmap(mm, addr, len)) return -ENOMEM; goto munmap_back; } > In the case of pages being unused then almost immediately reused, why is > it a bad solution to avoid freeing? Is it that you want to avoid > heuristics because in some cases they could fail and end up using memory? free(3) doesn't know if the memory will be reused soon, late or never. So avoiding trimming could substantially increase memory consumption with certain malloc/free patterns, especially in threaded programs that use multiple arenas. Implementing some sort of deferred memory trimming in malloc is "solving" the problem in a wrong place, each app really has no idea (and should not have) what the current system memory pressure is. > Secondly, why is MADV_DONTNEED bad? How much more expensive is a pagefault > than a syscall? (including the cost of the TLB fill for the memory access > after the syscall, of course). That's page fault per page rather than a syscall for the whole chunk, furthermore zeroing is expensive. We really want something like FreeBSD MADV_FREE in Linux, see e.g. http://mail.nl.linux.org/linux-mm/2000-03/msg00059.html for some details. Apparently FreeBSD malloc is using MADV_FREE for years (according to their CVS for 10 years already). Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: getting processor numbers
On Tue, Apr 03, 2007 at 07:04:58PM -0700, Paul Jackson wrote: > Andrew wrote: > > I'd have thought that in general an application should be querying its > > present affinity mask - something like sched_getaffinity()? That fixes the > > CPU hotplug issues too, of course. > > The sched_getaffinity call is quick, too, and it nicely reflects any > cpuset constraints, while still working on kernels which don't have > CPUSETs configured. > > There are really at least four "number of CPUs" answers here, and we > should be aware of which we are providing. There are, in order of > decreasing size: > 1) the size of the kernels cpumask_t (NR_CPUS), > 2) the maximum number of CPUs that might ever be hotplugged into a > booted system, > 3) the current number of CPUs online in that system, and > 4) the number of CPUs that the current task is allowed to use. > > I would suggest that (4) is what we should typically return. > Certainly it would seem that the use that Ulrich is concerned with, > by OpenMP, wants (4). > > Currently, the sysconf(_SC_NPROCESSORS_CONF) returns (3), by counting > the CPUs in /proc/stat, which is rather bogus on cpuset, or even > sched_setaffinity, constrained systems. OpenMP wants (4) and I'll change it that way. sysconf(_SC_NPROCESSORS_ONLN) must return (3) (this currently scans /proc/stat) and sysconf(_SC_NPROCESSORS_CONF) should IMHO return (2) (this currently scans /proc/cpuinfo on alpha and sparc{,64} for ((ncpus|CPUs) probed|cpus detected) and for the rest just returns sysconf(_SC_NPROCESSORS_ONLN)). Neither of the sysconf returned values should be affected by affinity. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: missing madvise functionality
On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote: > Andrew Morton wrote: > > Ulrich, could you suggest a little test app which would demonstrate this > > behaviour? > > It's not really reliably possible to demonstrate this with a small > program using malloc. You'd need something like this mysql test case > which Rik said is not hard to run by yourself. > > If somebody adds a kernel interface I can easily produce a glibc patch > so that the test can be run in the new environment. > > But it's of course easy enough to simulate the specific problem in a > micro benchmark. If you want that let me know. I think something like following testcase which simulates what free and malloc do when trimming/growing a non-main arena. My guess is that all the page zeroing is pretty expensive as well and takes significant time, but I haven't profiled it. #include #include #include #include void * tf (void *arg) { (void) arg; size_t ps = sysconf (_SC_PAGE_SIZE); void *p = mmap (NULL, 128 * ps, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (p == MAP_FAILED) exit (1); int i; for (i = 0; i < 10; i++) { /* Pretend to use the buffer. */ char *q, *r = (char *) p + 128 * ps; size_t s; for (q = (char *) p; q < r; q += ps) *q = 1; for (s = 0, q = (char *) p; q < r; q += ps) s += *q; /* Free it. Replace this mmap with madvise (p, 128 * ps, MADV_THROWAWAY) when implemented. */ if (mmap (p, 128 * ps, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) != p) exit (2); /* And immediately malloc again. This would then be deleted. */ if (mprotect (p, 128 * ps, PROT_READ | PROT_WRITE)) exit (3); } return NULL; } int main (void) { pthread_t th[32]; int i; for (i = 0; i < 32; i++) if (pthread_create (&th[i], NULL, tf, NULL)) exit (4); for (i = 0; i < 32; i++) pthread_join (th[i], NULL); return 0; } Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: getting processor numbers
On Tue, Apr 03, 2007 at 10:59:53AM -0700, Ulrich Drepper wrote: > Siddha, Suresh B wrote: > > Not all of the cpu* directories in /sys/devices/system/cpu may be > > online. > > Brilliant. You people really know how to create user interfaces. So > now in any case per CPU another stat/access syscall is needed to check > the 'online' pseudo-file? With this the readdir solution is only > marginally faster than parsing /proc/cpuinfo which means, it's > unacceptably slow. Note that glibc actually parses /proc/stat in preference of /proc/cpuinfo ATM, because /proc/stat is at least uniform while parsing /proc/cpuinfo needs a special parser for each architecture. And /proc/stat reading is even slower than /proc/cpuinfo, on x86_64 reading/parsing /proc/stat takes about 450usec, while e.g. stat64 on /sys/devices/system/cpu is just 2.5usec. But if that can't be trusted as the number of online CPUs, can somebody please add a short file to proc or sysfs which will contain the number of online and number of configured CPUs? See e.g. http://openmp.org/pipermail/omp/2007/000714.html where the first time after second g++ invocation is with omp_set_dynamic (1) and ought to be about as fast as omp_set_dynamic (0) case with the same number of threads, but it is far slower due to slow sysconf (_SC_NPROCESSORS_ONLN). Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Interface for the new fallocate() system call
On Thu, Mar 29, 2007 at 10:10:10AM -0700, Andrew Morton wrote: > > Platform: s390 > > -- > > s390 prefers following layout: > > > >int fallocate(int fd, loff_t offset, loff_t len, int mode) > > > > For details on why and how "int, int, loff_t, loff_t" is a problem on > > s390, please see Heiko's mail on 16th March. Here is the link: > > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg133595.html > > > > Platform: ppc, arm > > -- > > ppc (32 bit) has a problem with "int, loff_t, loff_t, int" layout, > > since this will result in a pad between fd and offset, making seven > > arguments total - which is not supported by ppc32. It supports only > > 6 arguments. Thus the desired layout by ppc32 is: > > > >int fallocate(int fd, int mode, loff_t offset, loff_t len) > > > > Even ARM prefers above kind of layout. For details please see the > > definition of sys_arm_sync_file_range(). > > This is a clean-looking option. Can s390 be changed to support seven-arg > syscalls? Wouldn't int fallocate(loff_t offset, loff_t len, int fd, int mode) work on both s390 and ppc/arm? glibc will certainly wrap it and reorder the arguments as needed, so there is no need to keep fd first. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.21-rc4-mm1 4/4] sys_futex64 : allows 64bit futexes
On Wed, Mar 21, 2007 at 10:54:36AM +0100, [EMAIL PROTECTED] wrote: > This last patch is an adaptation of the sys_futex64 syscall provided in -rt > patch (originally written by Ingo Molnar). It allows the use of 64-bit futex. > > I have re-worked most of the code to avoid the duplication of the code. > > It does not provide the functionality for all architectures (only for x64 for > now). I don't think you should blindly add all operations to sys_futex64 without thinking what they really do. E.g. FUTEX_{{,UN,TRY}LOCK,CMP_REQUEUE}_PI doesn't really make any sense for 64-bit futexes, the format of PI futexes is hardcoded in the kernel and is always 32-bit, see FUTEX_TID_MASK, FUTEX_WAITERS, FUTEX_OWNER_DIED definitions. exit_robust_list/handle_futex_death will handle 32-bit PI futexes anyway. Similarly, sys_futex64 shouldn't support the obsolete operations that are there solely for compatibility (e.g. FUTEX_REQUEUE or FUTEX_FD). When you just -ENOSYS on the PI ops, there is no need to implement futex_atomic_cmpxchg_inatomic64. FUTEX_WAKE_OP is questionable for 64-bit, IMHO it is better to just -ENOSYS on it and only if anyone ever finds actual uses for it, add it. For 64-bit futexes the only needed operations are actually FUTEX_WAIT and perhaps FUTEX_CMP_REQUEUE, so I wonder if it isn't better to just add FUTEX_WAIT64 and FUTEX_CMP_REQUEUE64 ops to sys_futex instead of adding a new syscall. But the FUTEX_{{,UN,TRY}LOCK,CMP_REQUEUE}_PI removal for 64-bit futexes is IMHO the most important part of my complain. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: thread stacks and strict vm overcommit accounting
On Thu, Mar 15, 2007 at 11:08:40PM +, Hugh Dickins wrote: > > appears to increase Committed_AS by around 200kb. But we've committed to > > providing it with 8MB for stack. > > > > How come this is correct? > > We've no more committed to providing each instance with 8MB of stack, > than we've committed to providing each instance with RLIMIT_AS of > address space. The rlimits are limits, not commitments, surely? RLIMIT_STACK only applies to the initial thread, POSIX threads have just stack size attribute, not maximum thread stack size attribute. If you set it explicitly with pthread_attr_setstacksize, then libpthread will honor whatever thread stack size you want, otherwise it just uses some default thread stack size. This happens to be in NPTL derived from RLIMIT_STACK value, but very well could be something else. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SMP performance degradation with sysbench
On Tue, Mar 13, 2007 at 01:02:44PM +0100, Eric Dumazet wrote: > On Tuesday 13 March 2007 12:42, Andrea Arcangeli wrote: > > > My wild guess is that they're allocating memory after taking > > futexes. If they do, something like this will happen: > > > > taskA taskB taskC > > user lock > > mmap_sem lock > > mmap sem -> schedule > > user lock -> schedule > > > > If taskB wouldn't be there triggering more random trashing over the > > mmap_sem, the lock holder wouldn't wait and task C wouldn't wait too. > > > > I suspect the real fix is not to allocate memory or to run other > > expensive syscalls that can block inside the futex critical sections... > > glibc malloc uses arenas, and trylock() only. It should not block because if > an arena is already locked, thread automatically chose another arena, and > might create a new one if necessary. Well, only when allocating it uses trylock, free uses normal lock. glibc malloc will by default use the same arena for all threads, only when it sees contention during allocation it gives different threads different arenas. So, e.g. if mysql did all allocations while holding some global heap lock (thus glibc wouldn't see any contention on allocation), but freeing would be done outside of application's critical section, you would see contention on main arena's lock in the free path. Calling malloc_stats (); from e.g. atexit handler could give interesting details, especially if you recompile glibc malloc with -DTHREAD_STATS=1. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: -freg-struct-return?
On Thu, Feb 22, 2007 at 12:09:04AM -0800, Jeremy Fitzhardinge wrote: > Arjan van de Ven wrote: > > Do we know how many gcc bugs this has? (regparm used to have many) > > other than that.. sounds like a win... > > > > The documentation suggests that its the preferred mode of operation, and > that its the default on platforms where gcc is the primary compiler. So > the fact that it isn't for Linux suggests either an oversight or that it > is actually broken... It is used for Linux on many architectures (x86_64, sparc64, ia64, ppc{,64}, arm, sh, m68k to name just a few), but it is an ABI decision, so e.g. on i386 is not used by default as the ABI mandates structs/unions are returned in memory. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: dvb shared datastructure bug?
On Tue, Feb 13, 2007 at 03:14:23PM +0400, Manu Abraham wrote: > >thanks for pointing out this issue. > > > >attached find a patch that fixes the problem. > > > >@mauro - please pull changeset a7ac92d208fe > > dvbdev: fix illegal re-usage of fileoperations struct > > > >from http://www.linuxtv.org/hg/~mws/v4l-dvb-fixtree > > > > Ack'd-by: Manu Abraham <[EMAIL PROTECTED]> Wouldn't it be better to kmalloc both struct dvb_device and struct file_operations together instead of doing 2 separate allocations? struct dvd_device_plus_fops { struct dvb_device dev; struct file_operations fops; } *dev_fops = kmalloc (sizeof (struct dvd_device_plus_fops), GFP_KERNEL); *pdvbdev = dvbdev = (struct dvb_device *)dev_fops; if (dev_fops == NULL) error handling; memset (&dev_fops->fops, 0, sizeof (dev_fops->fops)); ... dvbdev->fops = &dev_fops->fops; Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ANN] Userspace M-on-N threading model implementation. Alpha release.
On Sun, Feb 04, 2007 at 03:12:32PM -0500, Bill Davidsen wrote: > Arjan van de Ven wrote: > >>Because user threading can avoid context switches, there will always be > >>cases where it will outperform o/s threads for hardware reasons. > > > >actually.. switching from one "real" thread to another in Linux is not > >an actual context switch in the hardware sense... at least this part of > >your argument seems to be incorrect ;) > > > How does that work? Switching between kernel threads requires going into > the kernel, user level thread switches are all done in user mode. > > Do you have some way to change o/s threads w/o going into the kernel? But going into kernel is not very expensive on Linux. On the other side, the overhead you need to add for every single syscall that might block for the M:N threads and the associated complications which make it far harder to conform to POSIX IMHO far outweight the costs of going into the kernel for a context switch. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Define the EF_AS_NO_RANDOM e_flag bit
On Wed, Jan 24, 2007 at 12:06:45AM +0300, Samium Gromoff wrote: > Should we introduce per-arch asm/elf.h files to hold the relevant flag > definitions then? On some architectures there are no bits left. On others you'd need to go through whomever maintains the relevant psABI to get a bit officially allocated. Really, it is very bad idea to use e_flags for this. If all you care about is running setuid LISP programs, you'd much better put your energy into fixing the buggy ELF dumper in it. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] Define the EF_AS_NO_RANDOM e_flag bit
On Tue, Jan 23, 2007 at 11:28:13PM +0300, Samium Gromoff wrote: > Author: Samium Gromoff <[EMAIL PROTECTED]> > Date: Tue Jan 23 22:31:13 2007 +0300 > > Define the ELF binary header flag EF_AS_NO_RANDOM > > EF_AS_NO_RANDOM should mean that the binary requests to not apply > randomisation to address spaces of its processes. > > diff --git a/include/linux/elf.h b/include/linux/elf.h > index 60713e6..58ebb47 100644 > --- a/include/linux/elf.h > +++ b/include/linux/elf.h > @@ -172,6 +172,8 @@ typedef struct elf64_sym { > > #define EI_NIDENT 16 > > +#define EF_AS_NO_RANDOM 0x1/* do not randomise the address space */ > + You can't make up EF_* flags this way, they are arch specific, the LSB bit (but many others too) are already used on many architectures. E.g.: elf/mt.h:#define EF_MT_CPU_MRISC 0x0001 /* default */ elf/sparc.h:#define EF_SPARCV9_PSO0x1 /* partial store ordering */ elf/bfin.h:#define EF_BFIN_PIC0x0001 /* -fpic */ elf/alpha.h:#define EF_ALPHA_32BIT0x0001 elf/mips.h:#define EF_MIPS_NOREORDER 0x0001 elf/m68k.h:#define EF_M68K_CF_ISA_A_NODIV 0x01 /* ISA A except for div */ elf/sh.h:#define EF_SH1 1 elf/arm.h:#define EF_ARM_RELEXEC 0x01 elf/cris.h:#define EF_CRIS_UNDERSCORE 0x0001 elf/ia64.h:#define EF_IA_64_TRAPNIL (1 << 0) /* Trap NIL pointer dereferences. */ elf/vax.h:#define EF_VAX_NONPIC 0x0001 /* Object contains non-PIC code */ elf/iq2000.h:#define EF_IQ2000_CPU_IQ2000 0x0001 /* default */ elf/frv.h:#define EF_FRV_GPR_32 0x0001 /* -mgpr-32 */ to name just a few. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.20-rc4 4/4][RFC] sys_futex64 : allows 64bit futexes
On Tue, Jan 09, 2007 at 05:25:26PM +0100, Pierre Peiffer wrote: > This latest patch is an adaptation of the sys_futex64 syscall provided in > -rt > patch (originally written by Ingo). It allows the use of 64bit futex. > > I have re-worked most of the code to avoid the duplication of the code. > > It does not provide the functionality for all architectures, and thus, it > can > not be applied "as is". > But, again, feedbacks and comments are welcome. Why do you support all operations for 64-bit futexes? IMHO PI futexes don't make sense for 64-bit futexes, PI futexes have hardcoded bit layout of the 32-bit word. Similarly, FUTEX_WAKE is not really necessary for 64-bit futexes, 32-bit futex's FUTEX_WAKE can wake it equally well (it never reads anything, all it cares is about the futex's address). Similarly, I don't see a need for FUTEX_WAKE_OP (and this could simplify the patch quite a lot, no need to change asm*/futex.h headers at all). All that's needed is 64-bit FUTEX_WAIT and perhaps FUTEX_CMP_REQUEUE. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.20-rc4 1/4] futex priority based wakeup
On Wed, Jan 10, 2007 at 12:47:21PM +0100, Pierre Peiffer wrote: > So, yes it (logically) has a cost, depending of the number of different > priorities used, so it's specially measurable with real-time threads. > With SCHED_OTHER, I suppose that the priorities are not be very distributed. > > May be, supposing it makes sense to respect the priority order only for > real-time pthreads, I can register all SCHED_OTHER threads to the same > MAX_RT_PRIO priotity ? > Or do you think this must be set behind a CONFIG* option ? > (Or finally not interesting enough for mainline ?) As soon as there is at least one non-SCHED_OTHER thread among the waiters, there is no question about whether plist should be used or not, that's a correctness issue and if we want to conform to POSIX, we have to use that. I guess Ulrich's question was mainly about performance differences with/without plist wakeup when all threads are SCHED_OTHER. I'd say for that a pure pthread_mutex_{lock,unlock} benchmark or even just a program which uses futex FUTEX_WAIT/FUTEX_WAKE in a bunch of threads would be better. In the past we talked with Ingo about the possibilities here, one is use plist always and prove that it doesn't add measurable overhead over current FIFO (when only SCHED_OTHER is involved), the other possibility would be to start using FIFOs as before, but when the first non-SCHED_OTHER thread decides to wait on the futex, switch it to plist wakeup mode (convert the FIFO into a plist) and from that point on just use plist wakeups on it. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Intel Core Duo/Duo2 T2300/E6400 - Hyper-Threading (the absence of)
On Mon, Jan 08, 2007 at 01:44:32AM -0800, Robin H. Johnson wrote: > (Please CC me, I am not subscribed to LKML [I have set the > Mail-Followup-To header accordingly]). > > On two of my new machines, with Intel Core Duo T2300 and Core2 Duo E6400 > chips respectively, I noticed some weirdness in how many CPUs are > present. > > If the hyper-threading bit is present in the CPU info, should there > always be a an extra CPU presented to the system per physical core? No. The ht flag just says whether HT reporting via CPUID is supported. Core2 Duo E6400 is AFAIK not hyper-threaded, you just have 2 real sibling CPUs (except that they share L2 cache). Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel + gcc 4.1 = several problems
On Wed, Jan 03, 2007 at 05:32:16AM -0800, Arjan van de Ven wrote: > On Wed, 2007-01-03 at 12:44 +, Alan wrote: > > > > fixed. At that point an i686 kernel would contain i686 instructions and > > > > actually run on all i686 processors ending all the i586 pain for most > > > > users and distributions. > > > > > > Could you explain why CMOV is pointless now? Are there any benchmarks > > > proving that? > > > > Take a look at the recent ffmpeg bits on the mplayer list for one example > > I have to hand - P4 cmov is pretty slow. The crypto folks find the same > > things. > > cmov is effectively the same cost as a compare and jump, in both cases > the cpu needs to do a prediction, and on a mispredict, restart. > > the reason cmov can make sense is because it's smaller code... BTW, from GCC POV availability of CMOV is the only difference between -march=i586 -mtune=something and -march=i686 -mtune=something. So this is just a naming thing, it could be called -march=i686cmov to make it more obvious but it is too late (and too unimportant) to change it now. Perhaps adding a note to info gcc/man gcc ought to be enough? If you don't want CMOV being emitted, compile with -march=i586 -mtune=generic (or whatever other tuning you pick up), with -march=i686 -mtune=generic you tell GCC you have CMOV. Whether CMOV is actually used in generated code is another matter, which should be decided based on the selected -mtune. For -Os CMOV should be used whenever available, as that means usually smaller code, otherwise if on some particular chip CMOV is actually slower than compare, jump and assignment, then CMOV should not be selected for that particular tuning (say if Pentium4 has slower CMOV than compare+jump+assignment, -mtune=pentium4 should not emit CMOV, at least not often), if you have examples of that, please file a bug to http://gcc.gnu.org/bugzilla/. -mtune=generic should emit resp. not emit CMOV depending on whether it is a win on the currently common CPUs. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away
On Fri, Dec 01, 2006 at 08:28:16AM +0100, Willy Tarreau wrote: > Oh, I'm perfectly aware of this. That's in part why I started the hotfix > branch in the past :-) But sometimes, fixes consist in merging all the > patches from the maintenance branch (eg: from 4.1.0 to 4.1.1), and if > this is the case, there would not be much justification not to simply > update the version. In fact, what's really missing is a "fixlevel" in > the packages, to inform the user that 4.1.0 as shipped by the distro > has the same level of fixes as 4.1.1. But this is what the version is > used for today. This is even more complicated by the fact that upstream GCC release branches (and also several Linux distributors) start announcing the upcoming version already a few days after a release is tagged. E.g. 14 days old gcc-4_1-branch says: ./xgcc -B ./ --version; ./xgcc -B ./ -dD -E -xc /dev/null | grep GNU xgcc (GCC) 4.1.2 20061114 (prerelease) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. #define __GNUC__ 4 #define __GNUC_MINOR__ 1 #define __GNUC_PATCHLEVEL__ 2 but GCC 4.1.2 has not been released yet. In Fedora Core/RHEL and I think a few other distros the version number is only changed when it is officially released, e.g.: gcc --version; gcc -dD -E -xc /dev/null | grep GNU gcc (GCC) 4.1.1 20061011 (Red Hat 4.1.1-30) Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. #define __GNUC__ 4 #define __GNUC_MINOR__ 1 #define __GNUC_PATCHLEVEL__ 1 #define __GNUC_RH_RELEASE__ 30 Note, 4.1.1 was released end of May this year and 4.1.2 has not been released. So, using __GNUC_PATCHLEVEL__ to detect if a bug has been fixed or not isn't very useful (you'd need to rule out also __GNUC_PATCHLEVEL__ <= 1 because gcc-4_1-branch was announcing that patchlevel already since beggining of March, on the other side there is a lot of GCCs with __GNUC_PATCHLEVEL__ == 1 that certainly have that bug fixed). You perhaps could parse the prerelease vs. release vs. vendor strings, but that could be quite difficult, perhaps easier would be just parse the date in the --version output. Checking for the bug is best though, because that will catch even backports of the bugfix without rebasing from the release branch. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm 4/5][AIO] - AIO completion signal notification
On Wed, Nov 29, 2006 at 11:33:01AM +0100, S?bastien Dugu? wrote: > AIO completion signal notification > > The current 2.6 kernel does not support notification of user space via > an RT signal upon an asynchronous IO completion. The POSIX specification > states that when an AIO request completes, a signal can be delivered to > the application as notification. > > This patch adds a struct sigevent *aio_sigeventp to the iocb. > The relevant fields (pid, signal number and value) are stored in the kiocb > for use when the request completes. > > That sigevent structure is filled by the application as part of the AIO > request preparation. Upon request completion, the kernel notifies the > application using those sigevent parameters. If SIGEV_NONE has been specified, > then the old behaviour is retained and the application must rely on polling > the completion queue using io_getevents(). Well, from what I see applications must rely on polling the completion queue using io_getevents() in any case, isn't that the only way how to free the kernel resources associated with the AIO request, even if it uses SIGEV_SIGNAL or thread notification? aio_error/aio_return/aio_suspend will still need to io_getevents, the only important difference with this patch is that a) the polling doesn't need to be asynchronous (i.e. have a special thread which just loops doing io_getevents) b) it doesn't need to care about notification itself. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away
On Wed, Nov 29, 2006 at 02:56:20PM +1100, Keith Owens wrote: > Nicholas Miell (on Tue, 28 Nov 2006 19:08:25 -0800) wrote: > >On Wed, 2006-11-29 at 13:22 +1100, Keith Owens wrote: > >> Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux), > >> wait_hpet_tick is optimized away to a never ending loop and the kernel > >> hangs on boot in timer setup. > >> > >> 001a : > >> 1a: 55 push %ebp > >> 1b: 89 e5 mov%esp,%ebp > >> 1d: eb fe jmp1d > >> > >> This is not a problem with gcc 3.3.5. Adding barrier() calls to > >> wait_hpet_tick does not help, making the variables volatile does. > >> > >> Signed-off-by: Keith Owens > >> > >> --- > >> arch/i386/kernel/time_hpet.c |2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> Index: linux-2.6/arch/i386/kernel/time_hpet.c > >> === > >> --- linux-2.6.orig/arch/i386/kernel/time_hpet.c > >> +++ linux-2.6/arch/i386/kernel/time_hpet.c > >> @@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d, > >> */ > >> static void __devinit wait_hpet_tick(void) > >> { > >> - unsigned int start_cmp_val, end_cmp_val; > >> + unsigned volatile int start_cmp_val, end_cmp_val; > >> > >>start_cmp_val = hpet_readl(HPET_T0_CMP); > >>do { > > > >When you examine the inlined functions involved, this looks an awful lot > >like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278 > > > >Perhaps SUSE should fix their gcc instead of working around compiler > >problems in the kernel? > > Firstly, the fix for 22278 is included in gcc 4.1.0. This actually sounds more like http://gcc.gnu.org/PR27236 And that one is broken in 4.1.0, fixed in 4.1.1. Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] work around gcc4 issue with -Os in Dwarf2 stack unwind code
On Tue, Nov 28, 2006 at 02:48:15PM +, Jan Beulich wrote: > I disagree - the standard says there's a sequence point at a function > call after evaluating all function arguments. To me this means that any That's true, that sequence point makes sure e.g. all side effects such as pre-{dec,inc}rement on the arguments happen before the call. But as I said, no sequence point demands any particular ordering of evaluation of the LHS and RHS of +=. > (parts of an) expression the function call is contained in must be > evaluated after the function call. Otherwise it would be illegal to e.g. > modify a variable in both operands of && or ||. That's different, there is a sequence point at the end of the first operand of &&, ||, ?: and , operators (second bullet in ISO C99 Annex C). Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] work around gcc4 issue with -Os in Dwarf2 stack unwind code
On Tue, Nov 28, 2006 at 02:12:24PM +, Jan Beulich wrote: > This fixes a problem with gcc4 mis-compiling the stack unwind code under > -Os, which resulted in 'stuck' messages whenever an assembly routine was > encountered. "mis-compiling" and "work around" are wrong words, the code had undefined behavior (there is no sequence point between evaluation of ptr and get_uleb128(&ptr, end) and ptr is modified twice, so the compiler can evaluate it e.g. as: temp = ptr; temp = temp + get_uleb128(&ptr, end); ptr = temp; or temp = get_uleb128(&ptr, end); ptr += temp; While gcc has some warnings for sequence point semantics violations (-Wsequence-point), this can't be one of the cases at least until IPA moves much further, because get_uleb128 might very well not modify the variable and at that point the code would be ok). > Signed-off-by: Jan Beulich <[EMAIL PROTECTED]> > > --- linux-2.6.19-rc6/kernel/unwind.c 2006-11-22 14:54:10.0 +0100 > +++ 2.6.19-rc6-unwind-stuck/kernel/unwind.c 2006-11-28 15:02:15.0 > +0100 > @@ -938,8 +938,11 @@ int unwind(struct unwind_frame_info *fra > else { > retAddrReg = state.version <= 1 ? *ptr++ : > get_uleb128(&ptr, end); > /* skip augmentation */ > - if (((const char *)(cie + 2))[1] == 'z') > - ptr += get_uleb128(&ptr, end); > + if (((const char *)(cie + 2))[1] == 'z') { > + uleb128_t augSize = get_uleb128(&ptr, end); > + > + ptr += augSize; > + } > if (ptr > end > || retAddrReg >= ARRAY_SIZE(reg_info) > || REG_INVALID(retAddrReg) Jakub - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/