Re: [PATCH v7 2/2] Kbuild: implement support for DWARF v5

2021-02-03 Thread Jakub Jelinek
On Thu, Feb 04, 2021 at 08:06:12AM +0900, Masahiro Yamada wrote:
> GCC never outputs '.file 0', which is why
> this test is only needed for Clang, correct?

No, GCC outputs .file 0 if it during configure time detected assembler that
supports it and doesn't have any of the known bugs related to it.
But that means kernel doesn't need to care because GCC already took care of
that.

Jakub



Re: [PATCH v6 2/2] Kbuild: implement support for DWARF v5

2021-01-29 Thread Jakub Jelinek
On Fri, Jan 29, 2021 at 02:05:59PM -0800, Nick Desaulniers wrote:
> Ah, I see.  Then I should update the script I add
> (scripts/test_dwarf5_support.sh) to feature detect that bug, since
> it's the latest of the bunch.  Also, should update my comment to note
> that this requires binutils greater than 2.35.1 (which is what I have,
> which fails, since the backport landed in ... what?!)  How was this
> backported to 2.35
> (https://sourceware.org/bugzilla/show_bug.cgi?id=27195#c12, Jan 26
> 2021) when binutils-2_35_1 was tagged sept 19 2020?  Or will there be
> a binutils 2.35.2 point release?

AFAIK yes, soon.

Jakub



Re: [PATCH v6 1/2] Kbuild: make DWARF version a choice

2021-01-29 Thread Jakub Jelinek
On Fri, Jan 29, 2021 at 04:32:32PM -0500, Arvind Sankar wrote:
> Given what Jakub is saying, i.e. it was previously impossible to get
> dwarf2 with gcc, and you get dwarf4 whether or not DEBUG_INFO_DWARF4 was

It isn't impossible to get it, -gdwarf-2 works, it is just not a very good
choice (at least unless one knows some debug info consumer is not DWARF3 or
later ready).
Though, even gcc -gdwarf-2 will use many extensions from DWARF3 and later,
as long as there is no way to describe stuff in DWARF2.  -gstrict-dwarf
option requests that no DWARF extensions are used.

Jakub



Re: [PATCH v6 2/2] Kbuild: implement support for DWARF v5

2021-01-29 Thread Jakub Jelinek
On Fri, Jan 29, 2021 at 01:05:56PM -0800, Nick Desaulniers wrote:
> > Wasn't that fixed in GAS?
> > https://sourceware.org/bugzilla/show_bug.cgi?id=27195
> 
> $ make LLVM=1 -j72 defconfig
> $ ./scripts/config -e DEBUG_INFO -e DEBUG_INFO_DWARF5
> $ make LLVM=1 -j72
> ...
> /tmp/init-d50d89.s: Assembler messages:
> /tmp/init-d50d89.s:10: Error: file number less than one
> /tmp/init-d50d89.s:11: Error: junk at end of line, first unrecognized
> character is `m'
> 
> which is https://sourceware.org/bugzilla/show_bug.cgi?id=25611.
> 
> $ as --version | head -n1
> GNU assembler (GNU Binutils for Debian) 2.35.1
> 
> Maybe GAS should not need to be told -gdwarf-5 to parse these?  Then
> we would not need to pass -Wa,-gdwarf-5 via clang with
> -no-integrated-as.

That is what sw#27195 is about, just try current binutils 2.35, 2.36 or
trunk branches.

Jakub



Re: [PATCH v6 2/2] Kbuild: implement support for DWARF v5

2021-01-29 Thread Jakub Jelinek
On Fri, Jan 29, 2021 at 12:48:11PM -0800, Nick Desaulniers wrote:
> > Should this be...?
> >
> > KBUILD_AFLAGS += -Wa,-gdwarf-5
> 
> No; under the set of conditions Clang is compiling .c to .S with DWARF
> v5 assembler directives. GAS will choke unless told -gdwarf-5 via
> -Wa,-gdwarf-5 for .c source files, hence it is a C flag, not an A

Wasn't that fixed in GAS?
https://sourceware.org/bugzilla/show_bug.cgi?id=27195

Jakub



Re: [PATCH v6 1/2] Kbuild: make DWARF version a choice

2021-01-29 Thread Jakub Jelinek
On Fri, Jan 29, 2021 at 11:43:17AM -0800, Nick Desaulniers wrote:
> Modifies CONFIG_DEBUG_INFO_DWARF4 to be a member of a choice. Adds an
> explicit CONFIG_DEBUG_INFO_DWARF2, which is the default. Does so in a
> way that's forward compatible with existing configs, and makes adding
> future versions more straightforward.
> 
> Suggested-by: Arvind Sankar 
> Suggested-by: Fangrui Song 
> Suggested-by: Nathan Chancellor 
> Suggested-by: Masahiro Yamada 
> Signed-off-by: Nick Desaulniers 
> ---
>  Makefile  |  6 +++---
>  lib/Kconfig.debug | 21 -
>  2 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 95ab9856f357..20141cd9319e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -830,9 +830,9 @@ ifneq ($(LLVM_IAS),1)
>  KBUILD_AFLAGS+= -Wa,-gdwarf-2
>  endif
>  
> -ifdef CONFIG_DEBUG_INFO_DWARF4
> -DEBUG_CFLAGS += -gdwarf-4
> -endif
> +dwarf-version-$(CONFIG_DEBUG_INFO_DWARF2) := 2
> +dwarf-version-$(CONFIG_DEBUG_INFO_DWARF4) := 4
> +DEBUG_CFLAGS += -gdwarf-$(dwarf-version-y)

Why do you make DWARF2 the default?  That seems a big step back from what
the Makefile used to do before, where it defaulted to whatever DWARF version
the compiler defaulted to?
E.g. GCC 4.8 up to 10 defaults to -gdwarf-4 and GCC 11 will default to
-gdwarf-5.
DWARF2 is more than 27 years old standard, DWARF3 15 years old,
DWARF4 over 10 years old and DWARF5 almost 4 years old...
It is true that some tools aren't DWARF5 ready at this point, but with GCC
defaulting to that it will change quickly, but at least DWARF4 support has
been around for years.

Jakub



Re: [PATCH] ilog2: Improve ilog2 for constant arguments

2020-11-21 Thread Jakub Jelinek
On Sat, Nov 21, 2020 at 09:23:10PM +0100, Luc Van Oostenryck wrote:
> On Fri, Nov 20, 2020 at 01:51:54PM +0100, Peter Zijlstra wrote:
> > 
> > Other option would be to change the const_ilog2 macro, though as the
> > description says it is meant to be used also in C constant expressions,
> > and while GCC will fold it to constant with constant argument even in
> > those, perhaps it is better to avoid using extensions in that case.
> 
> Just for info, the description is outdated and Sparse is just fine with
> __builtin_clzll() and friends in constant expressions (since Feb 2017)

Why is the description outdated?  It is still an extension that not every
compiler might fold in constant expressions.  And, the large expressions
aren't really a problem in constant expressions, they will be folded there
to constant or error.
The problem the patch was trying to solve is that the large expressions are
a problem at least for GCC in runtime code when guarded by
__builtin_constant_p, because __builtin_constant_p is folded quite late
(intentionally so, so that more constants can be propagated into it, e.g.
after inlining etc.), and the large expressions might confuse inliner
heuristics.

Jakub



[tip: core/core] ilog2 vs. GCC inlining heuristics

2020-11-20 Thread tip-bot2 for Jakub Jelinek
The following commit has been merged into the core/core branch of tip:

Commit-ID: ecbd43f6728a5cf79c8b50ed326658e9181531b1
Gitweb:
https://git.kernel.org/tip/ecbd43f6728a5cf79c8b50ed326658e9181531b1
Author:Jakub Jelinek 
AuthorDate:Wed, 21 Oct 2020 15:27:18 +02:00
Committer: Peter Zijlstra 
CommitterDate: Thu, 19 Nov 2020 11:26:18 +01:00

ilog2 vs. GCC inlining heuristics

Hi!

Based on the GCC PR97445 discussions, I'd like to propose following change,
which should significantly decrease the amount of code in inline functions
that use ilog2, but as I'm already two decades out of the Linux kernel
development, I'd appreciate if some kernel developer could try that (all
I have done is check that it gives the same results as before) and if it
works submit it for inclusion into the kernel?

Thanks.

Improve ilog2 for constant arguments

As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
the const_ilog2 macro generates a lot of code which interferes badly
with GCC inlining heuristics, until it can be proven that the ilog2
argument can or can't be simplified into a constant.

It can be expressed using __builtin_clzll builtin which is supported
by GCC 3.4 and later and when used only in the __builtin_constant_p guarded
code it ought to always fold back to a constant.
Other compilers support the same builtin for many years too.

Other option would be to change the const_ilog2 macro, though as the
description says it is meant to be used also in C constant expressions,
and while GCC will fold it to constant with constant argument even in
those, perhaps it is better to avoid using extensions in that case.

Signed-off-by: Jakub Jelinek 
Signed-off-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20201021132718.GB2176@tucnak
---
 include/linux/log2.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/log2.h b/include/linux/log2.h
index c619ec6..4307d34 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -156,7 +156,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
 #define ilog2(n) \
 ( \
__builtin_constant_p(n) ?   \
-   const_ilog2(n) :\
+   ((n) < 2 ? 0 :  \
+63 - __builtin_clzll (n)) :\
(sizeof(n) <= 4) ?  \
__ilog2_u32(n) :\
__ilog2_u64(n)  \


Re: [PATCH] Kbuild: implement support for DWARF5

2020-11-04 Thread Jakub Jelinek
On Tue, Nov 03, 2020 at 02:21:22PM -0800, Nick Desaulniers wrote:
> > > This script fails for GCC 10.
> >
> > One thing is GCC DWARF-5 support, that is whether the compiler
> > will support -gdwarf-5 flag, and that support should be there from
> > GCC 7 onwards.
> 
> I should improve my Kconfig check; I don't actually have a test for
> -gdwarf-5 for the compiler.  In godbolt, it looks like -gdwarf-5
> produces an error from GCC up until GCC 5.1.  Does (5.1 < GCC < 7) not
> produce DWARF5?

No.  After all, those versions also predate DWARF5.
All 5.1 - 6.x did was start accepting -gdwarf-5 as experimental option
that enabled some small DWARF subset (initially only a few DW_LANG_* codes
newly added to DWARF5 drafts).  Only GCC 7 (released after DWARF 5 has
been finalized) started emitting DWARF5 section headers and got most of the
DWARF5 changes in, e.g. including switching over most of the now
standardized GNU extensions from their DW_*_GNU_* codes to DWARF5 DW_*).
With GCC 5/6, you get:
echo 'int i;' | gcc -c -o /tmp/test.o -xc - -gdwarf-5; readelf -wi /tmp/test.o 
| grep Version:
   Version:   4
while with 7+
   Version:   5
instead.

>  Maybe there's a more specific test you had in mind?

Guess what you want to test is what version you actually get in .debug_info
if you compile with -gdwarf-5.

> > Another separate thing is whether the assembler does support
> > the -gdwarf-5 option (i.e. if you can compile assembler files
> > with -Wa,-gdwarf-5) for GNU as I think that is binutils 35.1,
> > i.e. very new); but only if you want to pass the -Wa,-gdwarf-5
> > only when compiling *.s and *.S files.  That option is about whether
> > the assembler will emit DWARF5 or DWARF2 .debug_line.
> > It is fine to compile C sources with -gdwarf-5 and use DWARF2
> > .debug_line for assembler files if as doesn't support it.
> >
> > Yet another thing is if you can pass -Wa,-gdwarf-5 even when
> > compiling C files.  There are several bugs in that category that have been
> > fixed only in the last few days on binutils trunk, I'd suggest
> > just not to bother, GCC 11 will have proper test for fixed assembler
> > and will pass -gdwarf-5 to as when compiling even C sources with -gdwarf-5.
> 
> Do you have links?  I would prefer to do feature detection rather than

The
https://gcc.gnu.org/r11-3693
https://gcc.gnu.org/r11-4338
commits contain those tests in gcc/configure.ac

Jakub



Re: [PATCH] Kbuild: implement support for DWARF5

2020-11-02 Thread Jakub Jelinek
On Mon, Nov 02, 2020 at 11:20:41AM +0900, Masahiro Yamada wrote:
> > --- /dev/null
> > +++ b/scripts/test_dwarf5_support.sh
> > @@ -0,0 +1,4 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0
> > +set -eu
> > +echo ".file 0 \"asdf\"" | $* -Wa,-gdwarf-5 -c -x assembler -o /dev/null -
> 
> 
> 
> Please tell me how this script detects the dwarf-5 capability.
> 
> 
> This script fails for GCC 10.

One thing is GCC DWARF-5 support, that is whether the compiler
will support -gdwarf-5 flag, and that support should be there from
GCC 7 onwards.

Another separate thing is whether the assembler does support
the -gdwarf-5 option (i.e. if you can compile assembler files
with -Wa,-gdwarf-5) for GNU as I think that is binutils 35.1,
i.e. very new); but only if you want to pass the -Wa,-gdwarf-5
only when compiling *.s and *.S files.  That option is about whether
the assembler will emit DWARF5 or DWARF2 .debug_line.
It is fine to compile C sources with -gdwarf-5 and use DWARF2
.debug_line for assembler files if as doesn't support it.

Yet another thing is if you can pass -Wa,-gdwarf-5 even when
compiling C files.  There are several bugs in that category that have been
fixed only in the last few days on binutils trunk, I'd suggest
just not to bother, GCC 11 will have proper test for fixed assembler
and will pass -gdwarf-5 to as when compiling even C sources with -gdwarf-5.
The reason is to get DWARF5 .debug_line (.debug_line is usually produced
by the assembler, not compiler, from .file/.loc directives).

Jakub



Re: GCC section alignment, and GCC-4.9 being a weird one

2020-10-21 Thread Jakub Jelinek
On Wed, Oct 21, 2020 at 10:00:31AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 20, 2020 at 04:39:38PM -0700, Florian Fainelli wrote:
> > This patch causes all files under kernel/sched/* that include sched.h to
> > be rebuilt whenever the value of CONFIG_BLK_DEV_INITRD. There are at
> > least two build systems (buildroot and OpenWrt) that toggle this
> > configuration value in order to produce a kernel image without an
> > initramfs, and one with.
> > 
> > On ARM we get all of these to be needlessly rebuilt:
> 
> Is it really ARM specific? AFAICT this should happen on everything.
> 
> > Short of moving the STRUCT_ALIGNMENT to a separate header that would not
> > be subject to any configuration key change, can you think of a good way
> > to avoid these rebuilds, including for architectures like ARM that ship
> > their own vmlinux.lds.h? I would not say this is a bug, but it is
> > definitively an inconvenience.
> 
> Well, no :/ I barely made it work in the first place. This linker cruft
> is not my forte. GCC-4.9 being 'special' here is just weird in any case.
> 
> We can ask our friends on linux-toolchains; maybe they'll have a clue.
> 
> Guys, the problem is the below commit which, for dubious raisins makes
> kernel/sched/sched.h depend on asm-generic/vmlinux.lds.h and triggers
> rebuilds whenever a CONFIG mentioned in asm-generic/vmlinux.lds.h
> changes.
> 
> Is there an explanation for why GCC-4.9 is weird and is there a better
> way to find the appropriate value?

I have only looked at x86.
If you are talking about:
struct sched_class {
#define A(n) void (*n) (void);
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4)
  B(fn1) B(fn2) B(fn3) B(fn4)
  A(fn50) A(fn51) A(fn52) A(fn53)
} __attribute__((aligned(32)));

struct sched_class a;

having the the variable aligned to 64 bytes rather than 32 bytes, it
started with
https://gcc.gnu.org/r0-127405-gd3c2fee09607e7d70cc7e69822638fab2bda6c7b
and disappeared with
https://gcc.gnu.org/r5-5887-g239711f6afe3070a11c4f1d9266588e8db1217ee

As the aligned attribute is just on the type and not on the actual
variables, I don't consider it a bug, GCC only guarantees it when
the variable itself has user specified alignment.
Otherwise the compiler can choose to align variables more if they
are large for performance reasons (to help vectorization e.g. when
the variable is copied around).  Plus the ABI sometimes requires
bigger alignment too (e.g. on x86_64 the psABI
says that array variables larger than 15 bytes must have alignment at least
16 bytes).

So, if you tweak the above testcase to have either
struct sched_class a __attribute__((aligned(32)));
or even remove it also from the sched_class struct, you should get
the behavior you want even from GCC 4.9 and other GCC versions.

And, if e.g. 32-byte alignment is not what you really need, but e.g.
natural alignment of the struct sched_class would be sufficient, you could
use __attribute__((aligned (alignof (struct sched_class on the variables
instead (perhaps wrapped in some macro).

BTW, the 32 vs. 64 vs. whatever else byte alignment is heavily architecture
and GCC version dependent, it is not that on all arches larger structures
will be always 32 byte aligned no matter what.

> Fixes: c3a340f7e7ea ("sched: Have sched_class_highest define by 
> vmlinux.lds.h")
> Reported-by: kernel test robot 
> Signed-off-by: Peter Zijlstra (Intel) 
> Link: 
> https://lkml.kernel.org/r/20200630144905.gx4...@hirez.programming.kicks-ass.net
> 
> diff --git a/include/asm-generic/vmlinux.lds.h 
> b/include/asm-generic/vmlinux.lds.h
> index 66fb84c3dc7e..3ceb4b7279ec 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -108,6 +108,17 @@
>  #define SBSS_MAIN .sbss
>  #endif
>  
> +/*
> + * GCC 4.5 and later have a 32 bytes section alignment for structures.
> + * Except GCC 4.9, that feels the need to align on 64 bytes.
> + */
> +#if __GNUC__ == 4 && __GNUC_MINOR__ == 9
> +#define STRUCT_ALIGNMENT 64
> +#else
> +#define STRUCT_ALIGNMENT 32
> +#endif
> +#define STRUCT_ALIGN() . = ALIGN(STRUCT_ALIGNMENT)
> +
>  /*
>   * The order of the sched class addresses are important, as they are
>   * used to determine the order of the priority of each sched class in
> @@ -123,13 +134,6 @@
>   *(__stop_sched_class)   \
>   __end_sched_classes = .;
>  
> -/*
> - * Align to a 32 byte boundary equal to the
> - * alignment gcc 4.5 uses for a struct
> - */
> -#define STRUCT_ALIGNMENT 32
> -#define STRUCT_ALIGN() . = ALIGN(STRUCT_ALIGNMENT)
> -
>  /* The actual configuration determine if the init/exit sections
>   * are handled as text/data or they can be discarded (which
>   * often happens at runtime)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 5aa6661ecaf1..9bef2dd01247 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -67,6 +67,7 @@
>  #include 
>  
>  #include 
> +#include 
>  
>  #ifdef CONFIG_PARAVIRT
>  # include 
> @@ -1810,

Re: [PATCH] [RFT] crypto: aes-generic - turn off -ftree-pre and -ftree-sra

2017-12-20 Thread Jakub Jelinek
On Wed, Dec 20, 2017 at 09:52:05PM +0100, Arnd Bergmann wrote:
> diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c
> index ca554d57d01e..35f973ba9878 100644
> --- a/crypto/aes_generic.c
> +++ b/crypto/aes_generic.c
> @@ -1331,6 +1331,20 @@ EXPORT_SYMBOL_GPL(crypto_aes_set_key);
>   f_rl(bo, bi, 3, k); \
>  } while (0)
>  
> +#if __GNUC__ >= 7
> +/*
> + * Newer compilers try to optimize integer arithmetic more aggressively,
> + * which generally improves code quality a lot, but in this specific case
> + * ends up hurting more than it helps, in some configurations drastically
> + * so. This turns off two optimization steps that have been shown to
> + * lead to rather badly optimized code with gcc-7.
> + *
> + * See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
> + */
> +#pragma GCC optimize("-fno-tree-pre")
> +#pragma GCC optimize("-fno-tree-sra")

So do it only when UBSAN is enabled?  GCC doesn't have a particular
predefined macro for those (only for asan and tsan), but either the kernel
does have something already, or could have something added in the
corresponding Makefile.

Jakub


Re: [PATCH] ARC: uaccess: get_user to zero out dest in cause of fault

2016-08-20 Thread Jakub Jelinek
On Sat, Aug 20, 2016 at 05:45:00PM -0700, Linus Torvalds wrote:
> You have to save the stack pointer at the setjmp point too. And there
> might be other architecture-specific ABI rules for that. But you're
> right, it might be worth it.
> 
> I *would* be a bit worried about code generation issues.
> setjmp/longjmp is so seldom used that it's one of those things where
> it might be best to verify with some gcc person that it doesn't cause
> huge code-gen problems.
> 
> Adding Jakub just to check: Jakub, would a setjump/longjump kind of
> interface for exception handling going to cause us problems
> (performance or correctness) with gcc?

If you plan to use setjmp/longjmp a lot, then it is certainly a major
performance and compile time/memory problem.
Older versions don't model it properly, and newer gccs emit abnormal edges
from every longjmp or call that might longjmp to an artificial basic block
and from there to every setjmp.
Also note that gcc has/supports two setjmp kind of APIs, normal setjmp and
slightly more lightweight __builtin_setjmp which saves fewer registers, and
on some targets is/used to be used for EH instead of DWARF based ones.

Jakub


Re: Q: why didn't GCC warn about this uninitialized variable? (was: Re: [PATCH] perf tests: initialize sa.sa_flags)

2016-03-03 Thread Jakub Jelinek
On Thu, Mar 03, 2016 at 02:47:16PM +0100, Ingo Molnar wrote:
> I tried to distill a testcase out of it, and the following silly hack seems 
> to 
> trigger it:

...

This is a known issue, which we don't have a solution for yet.
The thing is, GCC has 2 uninitialized warning passes, one is done
very early, on fairly unoptimized code, which warns for -O and above
only about must be uninitialized cases in code that is executed
unconditionally (if the containing function is executed, and doesn't
have PHI handling code), and then a very late uninitialized pass,
that warns also about maybe-uninitialized cases, has predicate aware
handling in it, etc.; but this warns only about the cases where the
uninitialized uses survived through the optimizations until that phase.
In the testcase, the conditional uninitialized uses got optimized away,
passes seeing that you can get alt_idx initialized say to 2 from one branch
and uninitialized from another one just optimize it into 2.
Warning right away at that spot when the optimization pass performs this
might not be the right thing, as it could warn for stuff in dead code,
or couldn't be backed up by the predicate aware uninit analysis which is
costly and couldn't be done in every pass that just happens to optimize away
some uninitialized stuff.  Not to mention that it doesn't have to be always
even so obvious to the optimizing pass.  Say, when computing value ranges,
the uninitialized uses should be ignored, because they can't be used in
valid paths, so if say you have value range [2, 34] from one branch and
uninitialized use from another branch, the resulting value range will be
[2, 34].  Then later on, you just optimize based on this value range and
perhaps the uninitialized use will go away because of that.
We could handle the uninitialized uses pessimistically, by not optimizing
PHI  into just initialized_2, etc., by
considering uninitialized uses as VARYING ([min, max] range) rather than
something that doesn't happen, etc., and then the late uninitialized pass
would warn here.  But then we'd trade the warning for less optimized code.
GCC is primarily an optimizing compiler, rather than static analyzer, so
that is why GCC chooses to do what it does.  Do you want us introduce
-Ow mode, which will prefer warnings over generated code quality?

BTW, as for false positives and new warnings, my experience is that
in the kernel generally such warnings are just disabled, even if they
helped discover severe errors in other packages.

Jakub


Re: Q: why didn't GCC warn about this uninitialized variable? (was: Re: [PATCH] perf tests: initialize sa.sa_flags)

2016-03-03 Thread Jakub Jelinek
On Thu, Mar 03, 2016 at 02:24:34PM +0100, Ingo Molnar wrote:
> 6 hours of PeterZ time translates to quite a bit of code restructuring 
> overhead to 
> eliminate false positive warnings...

I'll file a bugzilla enhancement request for this (with new attribute),
perhaps we could do it in FRE that is able to see through memory
stores/loads even in addressable structures in some cases.
Though, certainly GCC 7 material.
And, in this particular case it couldn't do anything anyway, because
the sigfillset call is not inlined, and takes address of a field in the
structure.  The compiler can't know if it doesn't cast it back to struct
sigaction and initialize the other fields.
BTW, valgrind should be able to detect this.

Jakub


Re: Q: why didn't GCC warn about this uninitialized variable? (was: Re: [PATCH] perf tests: initialize sa.sa_flags)

2016-03-03 Thread Jakub Jelinek
On Thu, Mar 03, 2016 at 01:19:44PM +0100, Ingo Molnar wrote:
> struct sigaction sa;
> 
>   ...
> 
> sigfillset(&sa.sa_mask);
> sa.sa_sigaction = segfault_handler;
> sigaction(SIGSEGV, &sa, NULL);
> 
> ... which uninitialized sa.sa_flags field GCC merrily accepted as proper C 
> code, 
> despite us turning on essentially _all_ GCC warnings for the perf build that 
> exist 
> under the sun:

GCC -W{,maybe-}uninitialized warning works only on SSA form, so in order to
get that warning, either it needs to be some scalar that is uninitialized,
or Scalar Replacement of Aggregates needs to be able to turn the structure
into independent scalars.  Neither is the case here, as you take address of
the struct when passing its address to sigaction, and that call can't be
inlined nor is in any other way visible to the compiler, so that it could
optimize both the caller and sigaction itself at the same time.

Even if GCC added infrastructure for tracking which bits/bytes in
aggregates are or might be uninitialized at which point, generally,
  struct XYZ abc;
  foo (&abc);
is so common pattern that warning here that the fields are uninitialized
would have extremely high false positive ratio.
Even if as somebody mentioned that the argument is const struct sigaction *
rather than struct sigaction *, that doesn't change really anything,
you can cast away the constness and still write into it in the other
function.
Furthermore, in many APIs, only a subset of fields need to be initialized
unconditionally, and other fields might need to be initialized only
conditionally, depending on those always initialized fields, other
parameters to functions, etc.
So, in order to warn here, we'd need some assurance (new attribute on
sigaction function?) that when it is called, it has to have all named fields in
the pointed to structures initialized (perhaps attribute like nonnull, which
can either apply to all pointer arguments, or selected ones) at the entry of
the function and not initializing them all is a bug.
So, this really isn't as trivial as you might think it is.

Jakub


Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Jakub Jelinek
On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
> Actually it raise a question for me.  If we describe that a function
> clobbers more than calling convention and then use it as a value (assigning
> a variable or passing as an argument) and loosing a track of it and than
> call it.  How can RA know what the call clobbers actually.  So for the
> function with the attributes we should prohibit use it as a value or make
> the attributes as a part of the function type, or at least say it is unsafe.
> So now I see this as a *bigger problem* with this extension.  Although I
> guess it already exists as we have description of different ABI as an
> extension.

Unfortunately target attribute is function decl attribute rather than
function type.  And having more attributes affect switchable targets will be
non-fun.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: gcc feature request / RFC: extra clobbered regs

2015-07-01 Thread Jakub Jelinek
On Wed, Jul 01, 2015 at 11:23:17AM -0400, Vladimir Makarov wrote:
> >>(I'm not necessarily suggesting that we do this for the syscall bodies
> >>themselves.  I want to do it for the entry and exit helpers, so we'd
> >>still lose the five cycles in the full fast-path case, but we'd do
> >>better in the slower paths, and the slower paths are becoming
> >>increasingly important in real workloads.)
> >GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
> >options, which allow to tweak the calling conventions; but it is per
> >translation unit right now.  It isn't clear which of these options
> >you mean with the extra_clobber.
> >I assume you are looking for a possibility to change this to be
> >per-function, with caller with a different calling convention having to
> >adjust for different ABI callee.  To some extent, recent GCC versions
> >do that automatically with -fipa-ra already - if some call used registers
> >are not clobbered by some call and the caller can analyze that callee,
> >it can stick values in such registers across the call.
> >I'd say the most natural API for this would be to allow
> >f{fixed,call-{used,saved}}-REG in target attribute.
> >
> >
> One consequence of frequent changing calling convention per function or
> register usage could be GCC slowdown.  RA calculates too many data and it
> requires a lot of time to recalculate them after something in the register
> usage convention is changed.

That is true.  i?86/x86_64 is a switchable target, so at least for the case
of info computed for the callee with non-standard calling convention such
info can be computed just once when the function with such a target
attribute would be seen first.  But for the caller side, I agree not
everything can be precomputed, if we can't use e.g. regsets saved in the
callee; as a single function can call different functions with different
ABIs.  But to some extent we have that already with -fipa-ra, don't we?

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: gcc feature request / RFC: extra clobbered regs

2015-06-30 Thread Jakub Jelinek
On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
> I'm working on a massive set of cleanups to Linux's syscall handling.
> We currently have a nasty optimization in which we don't save rbx,
> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
> This works, but it makes the code a huge mess.  I'd rather save all
> regs in asm and then call C code.
> 
> Unfortunately, this will add five cycles (on SNB) to one of the
> hottest paths in the kernel.  To counteract it, I have a gcc feature
> request that might not be all that crazy.  When writing C functions
> intended to be called from asm, what if we could do:
> 
> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
> "r15"))) void func(void);
> 
> This will save enough pushes and pops that it could easily give us our
> five cycles back and then some.  It's also easy to be compatible with
> old GCC versions -- we could just omit the attribute, since preserving
> a register is always safe.
> 
> Thoughts?  Is this totally crazy?  Is it easy to implement?
> 
> (I'm not necessarily suggesting that we do this for the syscall bodies
> themselves.  I want to do it for the entry and exit helpers, so we'd
> still lose the five cycles in the full fast-path case, but we'd do
> better in the slower paths, and the slower paths are becoming
> increasingly important in real workloads.)

GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
options, which allow to tweak the calling conventions; but it is per
translation unit right now.  It isn't clear which of these options
you mean with the extra_clobber.
I assume you are looking for a possibility to change this to be
per-function, with caller with a different calling convention having to
adjust for different ABI callee.  To some extent, recent GCC versions
do that automatically with -fipa-ra already - if some call used registers
are not clobbered by some call and the caller can analyze that callee,
it can stick values in such registers across the call.
I'd say the most natural API for this would be to allow
f{fixed,call-{used,saved}}-REG in target attribute.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-01 Thread Jakub Jelinek
On Fri, May 01, 2015 at 09:03:32AM -0700, Linus Torvalds wrote:
> > PPS. Jakub, I see gcc5.1 still hasn't got output operands for asm goto;
> >  is this something we can get 'fixed' ?

CCing Richard as author of asm goto and Vlad as register allocator
maintainer.  There are a few enhancement requests to support this, like
http://gcc.gnu.org/PR59615 and http://gcc.gnu.org/PR52381 , but indeed the
reason why no outputs are allowed is the register allocation issue.
Don't know if LRA would be better suited to handle that case, but it would
indeed be pretty hard.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: bit fields && data tearing

2014-09-04 Thread Jakub Jelinek
On Thu, Sep 04, 2014 at 08:24:12AM -0400, Peter Hurley wrote:
> And I just confirmed with the Alpha cross-compiler that the fields are
> not 'padded out' if volatile either.

They can't be, struct layout is part of the ABI.
Guess you can introduce say atomic_bool and similar typedefs which would be
bool or char on most arches and on alpha int.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: bit fields && data tearing

2014-09-04 Thread Jakub Jelinek
On Thu, Sep 04, 2014 at 10:57:40AM +0200, Mikael Pettersson wrote:
> Benjamin Herrenschmidt writes:
>  > On Wed, 2014-09-03 at 18:51 -0400, Peter Hurley wrote:
>  > 
>  > > Apologies for hijacking this thread but I need to extend this discussion
>  > > somewhat regarding what a compiler might do with adjacent fields in a 
> structure.
>  > > 
>  > > The tty subsystem defines a large aggregate structure, struct tty_struct.
>  > > Importantly, several different locks apply to different fields within 
> that
>  > > structure; ie., a specific spinlock will be claimed before updating or 
> accessing
>  > > certain fields while a different spinlock will be claimed before 
> updating or
>  > > accessing certain _adjacent_ fields.
>  > > 
>  > > What is necessary and sufficient to prevent accidental false-sharing?
>  > > The patch below was flagged as insufficient on ia64, and possibly ARM.
>  > 
>  > We expect native aligned scalar types to be accessed atomically (the
>  > read/modify/write of a larger quantity that gcc does on some bitfield
>  > cases has been flagged as a gcc bug, but shouldn't happen on normal
>  > scalar types).
>  > 
>  > I am not 100% certain of "bool" here, I assume it's treated as a normal
>  > scalar and thus atomic but if unsure, you can always use int.
> 
> Please use an aligned int or long.  Some machines cannot do atomic
> accesses on sub-int/long quantities, so 'bool' may cause unexpected
> rmw cycles on adjacent fields.

Yeah, at least pre-EV56 Alpha performs rmw cycles on char/short accesses
and thus those are not atomic.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] gcc version 5: add basic definition header for latest gcc version

2014-08-15 Thread Jakub Jelinek
On Fri, Aug 15, 2014 at 03:23:01PM -0400, Paul Gortmaker wrote:
> --- /dev/null
> +++ b/include/linux/compiler-gcc5.h
> @@ -0,0 +1,52 @@
> +#ifndef __LINUX_COMPILER_H
> +#error "Please don't include  directly, include 
>  instead."
> +#endif
> +
> +#define __used   __attribute__((__used__))
> +#define __must_check __attribute__((warn_unused_result))
> +#define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
> +
> +/* Mark functions as cold. gcc will assume any path leading to a call
> +   to them will be unlikely.  This means a lot of manual unlikely()s
> +   are unnecessary now for any paths leading to the usual suspects
> +   like BUG(), printk(), panic() etc. [but let's keep them for now for
> +   older compilers]
> +
> +   Early snapshots of gcc 4.3 don't support this and we can't detect this
> +   in the preprocessor, but we can live with this because they're unreleased.
> +   Maketime probing would be overkill here.

The above 4 lines are probably unnecessary.

More importantly, is it a good idea to store the gcc major number in the
header?  gcc 5.x will be released next year, but the year after that we'll
have gcc 6.x, so you'd need to copy this header each year.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-06 Thread Jakub Jelinek
On Tue, Aug 05, 2014 at 03:36:39PM -0700, Linus Torvalds wrote:
> On Tue, Aug 5, 2014 at 2:07 PM, Frank Ch. Eigler  wrote:
> >
> > Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes
> > and to extract variables at those probes, much as systemtap does.
> > Without var-tracking, probes placed at most interior points of
> > functions will make variables inaccessible.
> 
> .. and as mentioned, -O2 already does that for many things, even
> *with* tracking.

Sure, debug info coverage for highly optimized code is never going to be
perfect, but there is a difference if you have 75% of vars 
or just 33% of vars (see the numbers I've posted, I've picked ext4.o just
randomly because it was one of the largest modules, can post more numbers if
needed).  BTW, because var-tracking is not performed at -O0, sometimes debug
info is actually worse in -O0 than at higher optimization levels, because
variable locations are bogus in prologues, epilogues and for variables with
register keyword anywhere.
There have been several man-years of work to get from the 25% var coverage
to 67%, several DWARF extensions (most of them to be available in DWARF5 or
work in progress on that) and with -fno-var-tracking-assignments that is
just returned to the old state.

> In other words, anybody who relies on it has already learnt to work
> around it. Or, more likely, there just isn't anybody who relies on it.
> 
> I don't understand how you guys can be so cavalier about a compiler
> bug that has already resulted in actual real problems. You bring up

I have no problem with a -fno-var-tracking-assignments workaround for
compilers that have the PR61801 wrong-code bug.  What I have problem with
is with disabling it even for compilers that have that bug fixed.
That is in essence disabling a useful feature just because it could have
other bugs.  If my memory serves me well, PR61801 is the only wrong-code
I remember caused by -fvar-tracking-assignments during the 5 years since
it has been introduced into gcc.  Sure, there have been several
-fcompare-debug bugs, where we generated slightly different code between
-g and -g0, and as you mentioned we have one still pending (Vladimir is
working on it right now), but that is mainly relevant to the case where
you'd ship -g0 built binaries (== kernel) and then only if bugs appear
wanted someone else to build kernel with -g and get identical binary, so
that you could debug it.  I believe if people build kernel with -g, then
they usually build it with -g from the beginning, and either save the kernel
with debug info somewhere, strip it to file or handle it similarly.
If there is a fear there could be other wrong-code bugs with
-fno-var-tracking-assignments, from the past experience that would be
~ another 5 years to discover it.  Compare that to the frequency of -O2
wrong code issues, with that you'd need to disable -O2 because of the fear
of unknown compiler bugs first.  And, we had various wrong-code bugs even at
-O0, so even that wouldn't help.  Compiler bugs are just that, bugs that
need to be reported, fixed, fixed compiler distributed to users, it is the
same thing with kernel bugs, libc bugs etc.

> theoretical cases that nobody has actually reported, and are
> apparently ignoring the fact that the compiler generates INCORRECT
> CODE. So on one hand we have known breakage, on the other we have

It actually isn't theoretical, actually various -fvar-tracking-assignments
changes have been done because people complained about important variables
in the kernel being optimized away, the whole DW_OP_GNU_entry_value DWARF
extension (DW_OP_entry_value in DWARF5 when it is released) was added
because of bugreports from people trying to debug the kernel.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-05 Thread Jakub Jelinek
On Tue, Aug 05, 2014 at 01:46:49PM +0200, Markus Trippelsdorf wrote:
> On 2014.08.05 at 07:31 -0400, Josh Boyer wrote:
> > Sorry to bring this back up after the fact, but it's important for a
> > number of things in various distros.  I don't disagree it should be
> > disabled by default, but making it unconditional is going to force the
> > distributions that care about perf, systemtap, and debuggers to
> > manually revert this.  That deviation is concerning because the
> > upstream kernel won't easily be buildable the same way distros build
> > it.
> > 
> > I'm happy to come up with a config option patch, but I'm not sure if
> > it would be accepted.  Is that a possibility at this point?
> 
> Please note that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61923

But that isn't a wrong-code bug, in some cases it is no code generation 
difference
between -g and -g0 at all (just :TI on the dumps being different), in some
cases it can affect scheduling decisions for real, but still not in a way
that disregards the actual dependencies in between instructions.  You can
get such scheduling differences also if you just use slightly different
compiler revision.

> isn't fixed yet. So it would be premature to manually revert Linus'
> patch yet. 
> 
> When PR61923 gets fixed (and backported) its testcase could be combined
> with the testcase Jakub posted earlier in this thread.

Just some data to show that the debug info differences are nothing close to
insignificant.  I've built a 3.16 kernel (from Fedora 21 rpm) both normally
(which includes the -fno-var-tracking-assignments change) and with that line
in Makefile commented out.  E.g. on fs/ext4/ext4.o, dwlocstat says for
-fno-var-tracking-assignments:
dwlocstat --tabulate=0.0:10,99,100 fs/ext4/ext4.o
cov%samples cumul
0.0 19620/75%   19620/75%
0..10   518/1%  20138/77%
11..20  317/1%  20455/78%
21..30  362/1%  20817/79%
31..40  349/1%  21166/81%
41..50  240/0%  21406/82%
51..60  297/1%  21703/83%
61..70  293/1%  21996/84%
71..80  355/1%  22351/85%
81..90  402/1%  22753/87%
91..99  901/3%  23654/90%
100 2425/9% 26079/100%
and without it:
dwlocstat --tabulate=0.0:10,99,100 fs/ext4/ext4.o
cov%samples cumul
0.0 8697/33%8697/33%
0..10   431/1%  9128/35%
11..20  371/1%  9499/36%
21..30  608/2%  10107/38%
31..40  525/2%  10632/40%
41..50  461/1%  11093/42%
51..60  881/3%  11974/45%
61..70  678/2%  12652/48%
71..80  894/3%  13546/51%
81..90  1036/3% 14582/55%
91..99  949/3%  15531/59%
100 10548/40%   26079/100%

So, with -fno-var-tracking-assignments, about 75% of parameters/variables have
no location info at all in ext4.o, 9% of parameters/variables have coverage
for all instructions where the parameter/variable is in scope, the rest
something in between.  Without -fno-var-tracking-assignments, only 33% of
params/vars have no location info at all, and 40% have coverage on all
instructions where they are in scope.  Even for the params/vars without 100%
coverage there can be seen improvements, e.g. in the 51%-90% coverage.

Or e.g. -fno-var-tracking-assignments:
dwlocstat --tabulate=0.0:10,99,100 lib/built-in.o
cov%samples cumul
0.0 6323/64%6323/64%
0..10   144/1%  6467/66%
11..20  138/1%  6605/67%
21..30  123/1%  6728/68%
31..40  144/1%  6872/70%
41..50  169/1%  7041/72%
51..60  130/1%  7171/73%
61..70  146/1%  7317/74%
71..80  198/2%  7515/76%
81..90  258/2%  7773/79%
91..99  541/5%  8314/85%
100 1448/14%9762/100%
without -fno-var-tracking-assignments:
dwlocstat --tabulate=0.0:10,99,100 lib/built-in.o
cov%samples cumul
0.0 2954/30%2954/30%
0..10   131/1%  3085/31%
11..20  110/1%  3195/32%
21..30  141/1%  3336/34%
31..40  212/2%  3548/36%
41..50  226/2%  3774/38%
51..60  254/2%  4028/41%
61..70  237/2%  4265/43%
71..80  325/3%  4590/47%
81..90  420/4%  5010/51%
91..99  425/4%  5435/55%
100 4328/44%9763/100%

It is visible also e.g. in the section sizes on vmlinux:
-fno-var-tracking-assignments:
readelf -WS vmlinux 2>&1 | awk '/\.debug_/{printf "%s %fMB\n", $2, 
strtonum("0x"$6)/1024.0/1024.0}'
.debug_aranges 0.115387MB
.debug_info 96.625183MB
.debug_abbrev 2.585021MB
.debug_line 5.936069MB
.debug_frame 1.710655MB
.debug_str 1.956143MB
.debug_loc 8.713246MB
.debug_ranges 3.020081MB
without -fno-var-tracking-assignments:
readelf -WS vmlinux 2>&1 | awk '/\.debug_/{printf "%s %fMB\n", $2, 
strtonum("0x"$6)/1024.0/1024.0}'
.debug_aranges 0.115387MB
.debug_info 99.564449MB
.debug_abbrev 2.665213MB
.debug_line 5.936069MB
.debug_frame 1.710655MB
.debug_str 1.955960MB
.debug_loc 22.607447MB
.debug_ranges 3.020264MB

.debug_info growth is only very small (not even 3MB), only few vars have the
same value everywhere, but .debug_loc growth is significant (2.6 times
bigger).

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read

Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-07-30 Thread Jakub Jelinek
On Wed, Jul 30, 2014 at 09:13:08AM +0200, Markus Trippelsdorf wrote:
> On 2014.07.30 at 08:53 +0200, Jakub Jelinek wrote:
> > On Tue, Jul 29, 2014 at 06:49:09PM -0700, Greg Kroah-Hartman wrote:
> > > 3.15-stable review patch.  If anyone has any objections, please let me 
> > > know.
> > 
> > IMNSHO this is a too big hammer approach.  The bug happened on a single
> > file only (right?), so if anything, IMHO it could be disabled for that
> > single file only, and better do it only for compilers with the bug.
> 
> No. There are many more files affected. It just happens that Linus
> analyzed the assembly of this single file (fair.c) and found a bug. 
> Just build your redhat distro kernel with GCC_COMPARE_DEBUG=1 and you'll
> see. So unless someone analyzes the assembly output of all other
> affected files by hand and finds no issues, one has to assume the worst.

I'm talking about wrong-code issues.  For -fcompare-debug, we indeed check
it primarily during gcc bootstrap (through bootstrap-debug) and some
testcases, and we'll certainly try to build some more code with
-fcompare-debug and fix the issues.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-07-29 Thread Jakub Jelinek
On Tue, Jul 29, 2014 at 06:49:09PM -0700, Greg Kroah-Hartman wrote:
> 3.15-stable review patch.  If anyone has any objections, please let me know.

IMNSHO this is a too big hammer approach.  The bug happened on a single
file only (right?), so if anything, IMHO it could be disabled for that
single file only, and better do it only for compilers with the bug.
If there are wrong code bugs with -O2 (we've fixed many in the past), you
also don't turn off -O2 everywhere, similarly for -Os or any other options.
Disabling it just in case the same bug happens elsewhere when it actually
took 5 years before the bug caused miscompilation of something is too
defensive.  We had at least 15 other wrong-code bugfixes just in between
4.9.0 and 4.9.1.  -fvar-tracking-assignments doesn't make a small difference
in debug info, but significant for optimized code.
If you build the kernel without and with -fno-var-tracking-assignments,
you can use e.g. the dwlocstat tool to see what kind of difference it makes
for the kernel in particular in variable debug info coverage.
> 
> --- a/Makefile
> +++ b/Makefile
> @@ -669,6 +669,8 @@ KBUILD_CFLAGS += -fomit-frame-pointer
>  endif
>  endif
>  
> +KBUILD_CFLAGS   += $(call cc-option, -fno-var-tracking-assignments)
> +
>  ifdef CONFIG_DEBUG_INFO
>  KBUILD_CFLAGS+= -g
>  KBUILD_AFLAGS+= -Wa,--gdwarf-2
> 

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Random panic in load_balance() with 3.16-rc

2014-07-29 Thread Jakub Jelinek
On Mon, Jul 28, 2014 at 08:09:02PM +0200, Markus Trippelsdorf wrote:
> Here's the testcase:
> 
> int a, b, c;
> void fn1 ()
> {
>   int d;
>   if (fn2 () && !0)
> {
>   b = (
>{
>int e;
>fn3 ();
>switch (0)
>default:
>asm volatile("" : "=a"(e) : "0"(a), "i"(0));
>e;
>});
>   d = b;
> }
>   c = d;
> }

int a, c;
int bar (void);
void baz (void);

void
foo (void)
{
  int d;
  if (bar ())
{
  int e;
  baz ();
  asm volatile ("" : "=a" (e) : "0" (a), "i" (0));
  d = e;
}
  c = d;
}

fails the same way and has more creduce cruft removed.
Fails also with 4.7 at -O2 -fcompare-debug.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Random panic in load_balance() with 3.16-rc

2014-07-26 Thread Jakub Jelinek
On Sat, Jul 26, 2014 at 10:20:55PM +0200, Markus Trippelsdorf wrote:
> On 2014.07.26 at 15:55 -0400, Theodore Ts'o wrote:
> > On Sat, Jul 26, 2014 at 09:35:57PM +0200, Markus Trippelsdorf wrote:
> > > 
> > > But fortunately the workaround for the new inode.c bug is the same as
> > > for the original bug: -fno-var-tracking-assignments. 
> > > 
> > > It would make sense to enabled it unconditionally for all debug
> > > configurations for now.
> > 
> > What's the downside of enabling this unconditionally on a compiler
> > with the bug fixed?  I assume a certain amount of optimization will
> > lost, but is it significant/measurable?
> 
> Only the quality of the debug info would suffer a bit.

Which for various tools that use kernel's debug info is a significant
difference.
So adding the option even for fixed gcc is undesirable (and, tracking
gcc version numbers only is not enough, I guess most of the distro gccs
will backport the fix soon).

This PR is the first -fcompare-debug wrong-code in the last few years
I remember. There are -fcompare-debug failures from time to time, but
usually they are just that either there is insignificant code change or
no change at all, just changes in the text dump files -fcompare-debug
uses to check whether there might be code differences or not.
GCC's stated goal is that -g should not affect code generation, so we
treat all such differences as bugs, but most of the time they aren't
breaking anything.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Random panic in load_balance() with 3.16-rc

2014-07-25 Thread Jakub Jelinek
On Fri, Jul 25, 2014 at 01:01:11PM -0700, Linus Torvalds wrote:
> For example, gcc will not create a small stack frame with "sub
> $8,%rsp". No, what gcc does is to use a random "push" instruction.
> Fair enough, but that really makes things much harder to see. Here's
> an example:

That is because for -Os, push is certainly shorter than sub $8,%rsp.

If you want to test for this gcc bug in the kernel, supposedly one should
just take the short testcase from the GCC PR, try to compile it and see if
you e.g. get a -fcompare-debug -Os failure with the testcase.
In that case, you could instead of giving up completely just
-fno-var-tracking-assignments.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Random panic in load_balance() with 3.16-rc

2014-07-24 Thread Jakub Jelinek
On Thu, Jul 24, 2014 at 11:47:17AM -0700, Linus Torvalds wrote:
> Adding Jakub to the cc, because gcc-4.9.0 seems to be terminally broken.
...
> Jakub, any ideas?

Can I ask anyone involved in this for preprocessed source and all gcc command
line options to reproduce it, best in the form of a http://gcc.gnu.org/bugzilla/
bugreport?

Thanks.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] tell gcc optimizer to never introduce new data races

2014-06-10 Thread Jakub Jelinek
On Tue, Jun 10, 2014 at 05:13:29PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 10, 2014 at 05:04:55PM +0200, Marek Polacek wrote:
> > On Tue, Jun 10, 2014 at 04:53:27PM +0200, Peter Zijlstra wrote:
> > > On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote:
> > > > +# Tell gcc to never replace conditional load with a non-conditional one
> > > > +KBUILD_CFLAGS  += $(call cc-option,--param allow-store-data-races=0)
> > > > +
> > > 
> > > Why do we not want: -fmemory-model=safe? And should we not at the very
> > > least also disable packed-store-data-races?
> > 
> > Note that the option does not exist, even though it is mentioned in the
> > documentation.
> 
> Urgh.. ok. Any word on the packed-store-data thing?

That is recognized, undocumented and never used in the compiler (not in 4.7
or any later release till now).  Most of the spots in the compiler that
could introduce data races were actually just changed, there is (already
since 4.7) just a single conditional on the --param allow-store-data-races=X
value.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [tip:x86/urgent] compiler/gcc4: Make quirk for asm_volatile_goto( ) unconditional

2014-02-13 Thread Jakub Jelinek
On Thu, Feb 13, 2014 at 03:37:08AM -0800, tip-bot for Steven Noonan wrote:
> Commit-ID:  a9f180345f5378ac87d80ed0bea55ba421d83859
> Gitweb: http://git.kernel.org/tip/a9f180345f5378ac87d80ed0bea55ba421d83859
> Author: Steven Noonan 
> AuthorDate: Wed, 12 Feb 2014 23:01:07 -0800
> Committer:  Ingo Molnar 
> CommitDate: Thu, 13 Feb 2014 12:34:05 +0100
> 
> compiler/gcc4: Make quirk for asm_volatile_goto() unconditional
> 
> I started noticing problems with KVM guest destruction on Linux
> 3.12+, where guest memory wasn't being cleaned up. I bisected it
> down to the commit introducing the new 'asm goto'-based atomics,
> and found this quirk was later applied to those.
> 
> Unfortunately, even with GCC 4.8.2 (which ostensibly fixed the
> known 'asm goto' bug) I am still getting some kind of
> miscompilation. If I enable the asm_volatile_goto quirk for my
> compiler, KVM guests are destroyed correctly and the memory is
> cleaned up.

BTW, which exact 4.8.2 were you using?
The last known asm goto bug has been fixed on October, 10th, 2013:
http://gcc.gnu.org/PR58670
so before the October, 16th, 2013 4.8.2 release.  But already since
May 31th, 2013 the tip of the 4.8 GCC branch has been announcing itself
as 4.8.2 prerelease.  While some distribution versions of GCC announce
themselves as the new version only starting from the release date,
i.e. snapshots in between 4.8.1 release and 4.8.2 release announce
themselves as 4.8.1, in other distributions or upstream it announces itself
as 4.8.2.  So, if you are using the latter and a snapshot in between May
31th, 2013 and October, 10th, 2013, then you could see gcc patchlevel 2,
yet have a gcc with that bug unfixed.
So, if the kernel doesn't use a runtime test/configure test to check for
this issue, but instead just relies on the patchlevel version, the only
safe way would be to look for GCC >= 4.9 or GCC 4.8 with patchlevel > 2
rather than > 1.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [GIT PULL] compiler/gcc4: Add quirk for 'asm goto' miscompilation bug

2014-02-12 Thread Jakub Jelinek
On Wed, Feb 12, 2014 at 08:11:49PM -0800, Linus Torvalds wrote:
> Jakub, any suggestions to how Steven might be able to pinpoint where
> the code generation problem lies?

For a suspected wrong-code where you have no idea where the problem is from
debugging or oops etc., usually the best way is to bisect first at the *.o
level between (ABI compatible) objects built in a way that it works (in this
case supposedly the asm goto workaround you have in the kernel, but
generally it can be -O0 compilation, a different version of the compiler,
some other compiler flags that make the issue go away) and objects built in
a way that fails, once you narrow it down to (hopefully a single) object
where everything built the "bad" way but that single object works in the end
and everything built the "good" way but that single object fails, the next
step is to narrow it down to a single routine (unless there is e.g. just a
single asm goto left in the TU).  For that one can try e.g. optimize
attribute on selected functions, or in this case supposedly preprocessing
the file and bisecting between portions of the preprocessed file from "good"
preprocessed file (with the asm goto workarounds applied) and "bad"
preprocessed file (without them) - count the number of asm gotos there and
bisect to find which asm goto matters.  Or it is also possible to bisect
at the *.s level (easiest is to build with -fno-asynchronous-unwind-tables
-fno-exceptions -g0 if possible to decrease number of labels, build twice
("good" and "bad"), then rename the .LNNN labels in one or both files
such that they don't clash (say to .LXNNN in one file, .LYNNN in another
one) and then bisect at the function level - start with taking approximately
first half of functions from one file and second half from another file,
etc.).  When you narrow it down to function level, eyeball the assembly and
try to find out where it is wrong, or try to create a self-contained
executable testcase out of the preprocessed source (cut out everything
unneeded from the preprocessed file, keep there just the problematic routine
and everything that is inlined into it and all symbols they need, supply
from another file? a new main that calls the routine with the parameters
that cause the problem and stub all the functions it calls with something
that still reproduces it), or even just file a http://gcc.gnu.org/bugzilla/
bug with the preprocessed source, compiler options and hopefully detailed
description what to look at.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: current_thread_info() not respecting program order with gcc 4.8.x

2013-11-19 Thread Jakub Jelinek
On Tue, Nov 19, 2013 at 04:57:49PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 19, 2013 at 03:29:12PM +, Mathieu Desnoyers wrote:
> > However, looking at ARM arch/arm/include/asm/thread_info.h:
> > 
> > static inline struct thread_info *current_thread_info(void) { register
> > unsigned long sp asm ("sp"); return (struct thread_info *)(sp &
> > ~(THREAD_SIZE - 1)); }
> > 
> > The inline assembly has no clobber and is not volatile. (this is also
> > true for all other architectures I've looked at so far, which includes
> > x86 and powerpc)

The above is not inline assembly, it is a local register variable extension,
see
http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Local-Reg-Vars.html#Local-Reg-Vars

> > Since each current_thread_info() is a different asm ("sp") without
> > clobber nor volatile, AFAIU, the compiler is within its right to
> > reorder them.

Sure.

> > One possible solution to this might be to add "memory" clobber and
> > volatile to this inline asm, but I fear it would put way too much
> > constraints on the compiler optimizations (too heavyweight).

As it is not inline asm extension, you can't.
Of course you could add asm volatile ("" : "=r" (ret) : "0" (sp));
or similar and thus make it a barrier, but I think the current
definition of current_thread_info meant to avoid all that extra
overhead.  Why does it matter when sp is different between the
current_thread_info () calls?  As long as sp & ~(THREAD_SIZE - 1) is
the same, it shouldn't make a difference.  Or is the call in between those
changing sp to something else?

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] gcc4: Add 'asm goto' miscompilation quirk

2013-10-10 Thread Jakub Jelinek
On Thu, Oct 10, 2013 at 07:04:18AM -0700, Richard Henderson wrote:
> On 10/10/2013 01:31 AM, Jakub Jelinek wrote:
> > Also, for the bitops patch, you probably want an asm_volatile_goto variant.
> 
> Why?  Asm without output (which asm goto must be) are automatically volatile.

You're right.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH, -v2] compiler/gcc4: Add quirk for 'asm goto' miscompilation bug

2013-10-10 Thread Jakub Jelinek
On Thu, Oct 10, 2013 at 01:56:17PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 10, 2013 at 10:55:06AM +0200, Ingo Molnar wrote:
> > +/*
> > + * GCC 'asm goto' miscompiles certain code sequences:
> > + *
> > + *   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
> > + *
> > + * Work it around via quirk suggested by Jakub Jelinek.
> > + * Fixed in GCC 4.8.2 and later versions.
> > + */
> > +#if GCC_VERSION <= 40801
> 
> We didn't do version checks for CC_HAVE_ASM_GOTO because of vendor
> backports; can't we detect this in the same way?

The problem is that it will be harder to check for this as compile time only
check, and for runtime check you'd need to have the assembly string for
every architecture and you couldn't do it for cross-compiling anyway.
For compile time only check, it wouldn't be 100% reliable, you could e.g.
check for that using -S -O2 -xc - -o - on:
int
foo (int a, int b)
{
  if (a)
return -3;
  asm volatile goto ("asm volatile goto to %l[lab]" : : "m" (b) : "memory" : 
lab);
  return 0;
lab:
  return 0;
}
and use awk on the resulting assembly to find out if the
asm volatile goto to (.*)$
string, then skip lines starting in column 0 with an
assembly comment character(s) (#, %, //, not sure if those 3 are all you can
see) and check that the first non-skipped line starts with the string matching
(.*) earlier followed by : (or perhaps skip other labels too?).
That said, the check could fail even in fixed gccs, so perhaps you want to
combine that with both version check and test, if version is >= 4.8.3
(note, while I hope it will be fixed in 4.8.2 release, people using
prerelease compilers would still have __GNUC_PATCHLEVEL__ == 2, at least
in upstream gcc (e.g. in Fedora/RHEL we patch down the patchlevel version,
so that __GNUC_PATCHLEVEL__ is 2 only for GCC release x.y.2 and following
snapshots, while upstream bumps patchlevel immediately after a release is
made), even with gcc containing that bug.  So for >= 4.8.3 just assume no
workaround is needed, otherwise scan assembly.

> 
> > +# define __asm_goto(vol, x...) do { asm vol goto(x); asm (""); } while (0)
> > +#else
> > +# define __asm_goto(vol, x...) do { asm vol goto(x); } while (0)
> > +#endif
> 
> This places the asm("") in the fallthrough case; but Jakub wrote:
> 
> > @@ -8,6 +8,7 @@ foo (int a, int b)
> >asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : 
> > lab);
> >return 0;
> >  lab:
> > +  asm ("");
> >return 0;
> >  }
> 
> Which places the asm ("") after the label, these two are not the same.

See the follow-up mails, I think placing it immediately after asm goto might
be better.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] gcc4: Add 'asm goto' miscompilation quirk

2013-10-10 Thread Jakub Jelinek
On Thu, Oct 10, 2013 at 10:24:30AM +0200, Ingo Molnar wrote:
> Something like the patch below? (Totally untested and all that.)
> 
> Notes:
> 
> - If the bug is fixed in 4.8.3 then the version check can be sharpened
>   from 9 to 40803.

The bug is likely going to be fixed already for 4.8.2 (to be released
next week or so).

> - I'd really prefer this quirk versus having to add the extra barrier to 
>   the label, as it makes the actual usage sites a lot less painful.

Please check how much it bloats the generated code.
Also, for the bitops patch, you probably want an asm_volatile_goto variant.

> --- a/include/linux/compiler-gcc4.h
> +++ b/include/linux/compiler-gcc4.h
> @@ -65,6 +65,19 @@
>  #define __visible __attribute__((externally_visible))
>  #endif
>  
> +/*
> + * GCC 'asm goto' miscompiles certain code sequences:
> + *
> + *   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
> + *
> + * Work it around via quirk suggested by Jakub Jelinek.
> + * Not yet fixed, so use the quirk on all compiler versions:
> + */
> +#if GCC_VERSION <= 9
> +# define asm_goto(x...) do { asm goto(x); asm (""); } while (0)
> +#else
> +# define asm_goto(x...) do { asm goto(x); } while (0)
> +#endif
>  
>  #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
>  #if GCC_VERSION >= 40400

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-10 Thread Jakub Jelinek
On Thu, Oct 10, 2013 at 08:51:04AM +0200, Jakub Jelinek wrote:
> @@ -8,6 +8,7 @@ foo (int a, int b)
>asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
>return 0;
>  lab:
> +  asm ("");
>return 0;
>  }

Or alternatively put the asm (""); right after asm goto,
  asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
  asm ("");
  return ...;
lab;
  return ...;
What generates better code remains to be tested.  In any case, please
conditionalize the hacks on non-fixed compilers once the fix is released.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Jakub Jelinek
On Thu, Oct 10, 2013 at 08:22:38AM +0200, Ingo Molnar wrote:
> > On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> > >
> > > > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670 Seems all of 
> > > > 4.[6-9] miscompile it.  Will have a look tomorrow unless somebody 
> > > > beats me to it.  But historically, the case where asm goto labels 
> > > > jump to fallthru basic block had numerous problems in the past.
> > > 
> > > That bug lists the component as middle end; this suggests x86_64 would 
> > > be vulnerable too, can you confirm? So far we've only observed the 
> > > wrong code on i386 targets, x86_64 targets appeared correct.
> > 
> > Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and 
> > even say on ppc64 (sure, one would have to rewrite the asm to have it 
> > fail at runtime).
> 
> Please let us know once you know enough about the bug to suggest 
> workarounds. Because it's a nice optimization even extra instruction(s) 
> would be acceptable I suspect: we could perhaps put a NOP into a slowpath, 
> with an (unused) goto to it, or something like that?

IMHO you don't need to put there a nop, I guess asm (""); would be enough,
that will still make sure the label is never in the fallthru basic block
and the whole class of issues with asm goto with labels in the fallthru
bb can't hit.  The disadvantage is that it will generate worse code.

@@ -8,6 +8,7 @@ foo (int a, int b)
   asm volatile goto ("bts $1, %0; jc %l[lab]" : : "m" (b) : "memory" : lab);
   return 0;
 lab:
+  asm ("");
   return 0;
 }

on the testcase from the PR results in something like:
#APP
# 8 "pr58670-1.c" 1
bts $1, -4(%rsp); jc .L3
# 0 "" 2
#NO_APP
.L5:
xorl%eax, %eax
ret
.p2align 4,,10
.p2align 3
.L3:
xorl%eax, %eax
ret
.p2align 4,,10
.p2align 3
.L4:
movl$-3, %eax
ret
while code without the extra asm (""); and with a fixed compiler:
#APP
# 6 "pr58670.c" 1
bts $1, -4(%rsp); jc .L3
# 0 "" 2
#NO_APP
.L3:
xorl%eax, %eax
ret
.p2align 4,,10
.p2align 3
.L4:
.L2:
movl$-3, %eax
ret

FYI, list of past compiler issues with asm goto include:
PR54127, PR46226, PR44071, PR52650, PR54455, PR51767.

I hope we get this fixed for 4.8.2, so you could then avoid
these hacks for GCC 4.8.2 and later.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Jakub Jelinek
On Wed, Oct 09, 2013 at 09:02:31PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 08:16:13PM +0200, Jakub Jelinek wrote:
> > Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
> > Seems all of 4.[6-9] miscompile it.  Will have a look tomorrow
> > unless somebody beats me to it.  But historically, the case where
> > asm goto labels jump to fallthru basic block had numerous problems in the
> > past.
> 
> That bug lists the component as middle end; this suggests x86_64 would
> be vulnerable too, can you confirm? So far we've only observed the wrong
> code on i386 targets, x86_64 targets appeared correct.

Any target, the testcase in the bugzilla aborts on x86_64 with -O2, and
even say on ppc64 (sure, one would have to rewrite the asm to have it fail
at runtime).

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-09 Thread Jakub Jelinek
On Wed, Oct 09, 2013 at 04:46:56PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 09, 2013 at 04:33:59PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 09, 2013 at 04:07:34PM +0200, Peter Zijlstra wrote:
> > > Once I force a x86_64 build using the 'same' config it goes away and
> > > generates 'sensible' code again (although I don't see why L9 isn't
> > > merged with L2):
> > 
> > i386-SMP also generates correct code afaict; a tad stupid but not wrong.
> > 
> > If I remove ftrace from the .config its still broken..
> > If I also remove the likely/unlikely tracer its still broken and lots
> > smaller:
> 
> OK, its -march=winchip2 that's buggered.

Confirmed as gcc bug, filed http://gcc.gnu.org/PR58670
Seems all of 4.[6-9] miscompile it.  Will have a look tomorrow
unless somebody beats me to it.  But historically, the case where
asm goto labels jump to fallthru basic block had numerous problems in the
past.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Jakub Jelinek
On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> On 10/08, Linus Torvalds wrote:
> >
> > (not yet merged), see:
> >
> > 
> > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d
> 
> I do not really understand inline assembly constraints, but I'll ask
> anyway.
> 
>   +#define __GEN_RMWcc(fullop, var, cc, ...) \
>   +do { \
>   + asm volatile goto (fullop "; j" cc " %l[cc_label]" \
>   + : : "m" (var), ## __VA_ARGS__ \
> ^
> 
> don't we need
> 
>   "+m" (var)
> 
> here?

You actually can't have output operands with asm goto, only inputs
and clobbers.  But the "memory" clobber should be enough here.

If you suspect a compiler bug, can somebody please narrow it down to
a single object file (if I've skimmed the patch right, it is just an
optimization, where object files compiled without and with the patch
should actually coexist fine in the same kernel), ideally to a single
routine if possible and post a preprocessed source + gcc command line
+ version of gcc?

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Friendlier EPERM - Request for input

2013-01-09 Thread Jakub Jelinek
On Wed, Jan 09, 2013 at 12:53:40PM -0800, Casey Schaufler wrote:
> I'm suggesting that the string returned by get_extended_error_info()
> ought to be the audit record the system call would generate, regardless
> of whether the audit system would emit it or not.

What system call would that info be for and would it be reset on next
syscall that succeeded, or also failed?

The thing is, various functions e.g. perform some syscall, save errno, do
some other syscall, and if they decide that the first syscall should be what
determines the whole function's errno, just restore errno from the saved
value and return.  Similarly, various functions just set errno upon
detecting some error condition in userspace.
There is no 1:1 mapping between many libc library calls and syscalls.
So, when would it be safe to call this new get_extended_error_info function
and how to determine to which syscall it was relevant?

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [x86.git#mm] stack protector fixes, vmsplice exploit

2008-02-14 Thread Jakub Jelinek
On Thu, Feb 14, 2008 at 09:25:35PM +0100, Ingo Molnar wrote:
> The per function call overhead from stackprotector is already pretty 
> serious IMO, but at least that's something that GCC _could_ be doing 
> (much) smarter (why doesnt it jne forward out to __check_stk_failure, 
> instead of generating 4 instructions, one of them a default-mispredicted 
> branch instruction??), so that overhead could in theory be something 
> like 4 fall-through instructions per function, instead of the current 6.

Where do you see a mispredicted branch?
int foo (void)
{
  char buf[64];
  bar (buf);
  return 6;
}

-O2 -fstack-protector -m64:
subq$88, %rsp
movq%fs:40, %rax
movq%rax, 72(%rsp)
xorl%eax, %eax
movq%rsp, %rdi
callbar
movq72(%rsp), %rdx
xorq%fs:40, %rdx
movl$6, %eax
jne .L5
addq$88, %rsp
ret
.L5:
.p2align 4,,6
.p2align 3
call__stack_chk_fail
-O2 -fstack-protector -m32:
pushl   %ebp
movl%esp, %ebp
subl$88, %esp
movl%gs:20, %eax
movl%eax, -4(%ebp)
xorl%eax, %eax
leal-68(%ebp), %eax
movl%eax, (%esp)
callbar
movl$6, %eax
movl-4(%ebp), %edx
xorl%gs:20, %edx
jne .L5
leave
ret
.L5:
.p2align 4,,7
.p2align 3
call__stack_chk_fail
-O2 -fstack-protector -m64 -mcmodel=kernel:
subq$88, %rsp
movq%gs:40, %rax
movq%rax, 72(%rsp)
xorl%eax, %eax
movq%rsp, %rdi
callbar
movq72(%rsp), %rdx
xorq%gs:40, %rdx
movl$6, %eax
jne .L5
addq$88, %rsp
ret
.L5:
.p2align 4,,6
.p2align 3
call__stack_chk_fail

both with gcc 4.1.x and 4.3.0.
BTW, you can use -fstack-protector --param=ssp-buffer-size=4
etc. to tweak the size of buffers to trigger stack protection, the
default is 8, but e.g. whole Fedora is compiled with 4.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: asm-x86/sigcontext.h changes break userland

2008-02-13 Thread Jakub Jelinek
On Wed, Feb 13, 2008 at 08:26:50AM +0100, Ingo Molnar wrote:
> 
> * Jakub Jelinek <[EMAIL PROTECTED]> wrote:
> 
> > x86: use generic register names in struct sigcontext 
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=742fa54a62be6a263df14a553bf832724471dfbe
> > 
> > changeset breaks userland, e.g. it is not possible to compile gcc 
> > anymore (both 32-bit and 64-bit libgcc), and I expect any other 
> > program which pokes into struct sigcontext.  The register names with e 
> > resp. r have been in use for years, what's the point breaking it now?
> 
> ok - does the patch below solve the problem for you?

Yes, this fixes it.  Thanks.

FYI, gcc uses glibc headers to get at struct sigcontext, but
on i386 (and many other arches) glibc's  just includes
.  On x86_64, ia64 and sparc* glibc doesn't include
asm/sigcontext.h, but provides its own definitions, so for gcc itself
only changing 32-bit parts woiuld be enough.  That said, there are certainly
programs which include asm/sigcontext.h directly (plus there are other c
libraries, some of which may use asm/sigcontext.h).

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


asm-x86/sigcontext.h changes break userland

2008-02-12 Thread Jakub Jelinek
Hi!

The

x86: use generic register names in struct sigcontext
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=742fa54a62be6a263df14a553bf832724471dfbe

changeset breaks userland, e.g. it is not possible to compile gcc anymore
(both 32-bit and 64-bit libgcc), and I expect any other program which pokes
into struct sigcontext.  The register names with e resp. r have been in use
for years, what's the point breaking it now?

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [regression] Re: brk randomization breaks columns

2008-02-05 Thread Jakub Jelinek
On Tue, Feb 05, 2008 at 01:54:26PM +0100, Ingo Molnar wrote:
> * Jiri Kosina <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, 5 Feb 2008, Pavel Machek wrote:
> > 
> > > > Actually, this clearly shows that either prehistoric libc.so.5 or the 
> > > > program itself are broken.
> > > I believe it shows clear regression in latest 2.6.25 kernel.
> > 
> > I am still not completely sure. It might be a regression, but it also 
> > might just trigger the bug in ancient version in libc.so.5 which might 
> > be fixed in some later version [...]
> 
> which too is a regression ...
> 
> really, lets add a sysctl for this, and a .config option that either 
> disables or enables it. Then we will default to disabled. (but users can 
> enable it - and distros can build their kernels with this .config option 
> enabled)

I don't think kernel should care about programs which are buggy and make invalid
assumptions, and that's the case here.  I remember we have been through this
5 years ago when brk randomization has been added to Red Hat kernels.  There was
one or two broken programs which made assumptions on what brk(0) is supposed
to return at program startup, everything else was ok.
For the buggy apps there is always setarch i386 -R ./the_buggy_program
so I don't think we need to add another sysctl for this.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Possible 2.6.24-rc7 issue w/respect to pthreads

2008-01-09 Thread Jakub Jelinek
On Wed, Jan 09, 2008 at 02:35:32AM -0800, [EMAIL PROTECTED] wrote:
> After I patched my 2.6.23 kernel to 2.6.24-rc7 this morning, I noticed
> some odd behavior with respect to POSIX threads in a test program I had
> written (originally to test epoll.)
> 
> The behavior is as follows:
> 
> 1.  main() creates a new thread of execution with pthread_create
> 2.  thread_func() immediately calls pthread_detach(), which is supposed to
> ensure that thread resources are cleaned up when the thread terminates.
> 3.  The spawned thread sleeps and then prints a message "got here"
> 4.  The main thread calls pthread_join().  According to the POSIX
> documentation, this should suspend execution until the spawned thread has
> terminated.

Your testcase is buggy.  Detached threads aren't joinable, you can't call
pthread_join on them.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fedora's latest gcc produces unbootable kernels

2007-12-03 Thread Jakub Jelinek
On Mon, Dec 03, 2007 at 12:34:17PM +0100, Thomas Gleixner wrote:
> Of course just to annoy you :)

It doesn't matter whether I'm annoyed about this or not, but whether gcc is
able to generate decent code with it or not.  And especially with union it
is not, at least through all the tree ssa passes.  You already have a lot of
the details hidden in ktime.h accessor inlines, so I don't think it would be
hard to add further one or two.

Anyway, even just using typedef struct ktime { s64 tv64; } ktime_t; could
make things better in case you have just one field.  Unlike unions, structs
can be (and in this case most likely will be) scalarized by SRA, so
half of tree SSA passes will see it as integral var and will be able to
perform optimizations on it.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fedora's latest gcc produces unbootable kernels

2007-12-03 Thread Jakub Jelinek
On Mon, Dec 03, 2007 at 09:17:22AM +0100, Thomas Gleixner wrote:
> I looked at the disassembly but I can not spot the problem.
> 
> I think the real problem is somewhere else. Likely candidates are
> hrtimer_forward() or hrtimer_start() - in that order.

Should be hopefully fixed in latest Fedora gcc.  The problem was in code like
typedef union { long long int s; } U;
typedef struct { U u; } S;

void foo (S *s, long long int x, unsigned long int y)
{
  s->u = ({ (U) { .s = s->u.s + x * y }; });
}

where a backport of a recent optimization of mine, without which gcc handles
terribly initializers from compound literals (which is something hrtimer
uses just everywhere - why can't ktime.h for #if BITS_PER_LONG == 64 || 
defined(CONFIG_KTIME_SCALAR)
just use a scalar rather than union with a scalar in it??), sets the LHS
object to the compound literal's initializer rather than forcing creation of
a temporary object (the compound literal).  Unfortunately the gimplifier
had some bugs in case the initializer references (or at least might
reference) parts of LHS object.  Fixed by backporting 2 Ada bugfixes for the
gimplifier from GCC trunk (Ada was hitting those bugs even without this
compound literal optimization).

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC: 2.6 patch] add -fno-tree-scev-cprop to KBUILD_CFLAGS

2007-11-12 Thread Jakub Jelinek
On Sun, Nov 11, 2007 at 07:48:29AM +0100, Adrian Bunk wrote:
> The gcc from svn that will become gcc 4.3 generates libgcc calls in 
> cases like the following (on 32bit architectures):
> 
> <--  snip  -->
> 
> static inline void timespec_add_ns(struct timespec *a, u64 ns)
> {
> ...
> while(ns >= NSEC_PER_SEC) {
> ns -= NSEC_PER_SEC;
> a->tv_sec++;
> }
> ...
> 
> <--  snip  -->

Blindly using -fno-tree-scev-cprop just to get rid of one case where
this turns out to be a pessimization when kernel knows ns is usually very
small is IMHO a wrong thing, you'd lose many cases where this optimization
can actually improve performance.  Instead, for this exact case just
add an optimization barrier to avoid gcc doing this.
Adding asm ("" : "=r" (ns) : "0" (ns)); (or hide it in some macro) into the
loop will do the job just fine.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [20/45] x86_64: Use 8 byte stack alignment when possible

2007-09-21 Thread Jakub Jelinek
On Fri, Sep 21, 2007 at 10:45:02PM +0200, Andi Kleen wrote:
> 
> Kernel doesn't use SSE2, so it doesn't need 16 byte alignment. Also
> the stack can be already unaligned so letting the compiler align
> is useless. This may make some stack frames smaller.

Shouldn't sources that are compiled into the VDSO or VSYSCALL pages
revert this to the default?

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: increase AT_VECTOR_SIZE to terminate saved_auxv properly

2007-09-15 Thread Jakub Jelinek
On Fri, Sep 14, 2007 at 01:00:57PM +0200, Olaf Hering wrote:
> include/asm-powerpc/elf.h has 6 entries in ARCH_DLINFO.
> fs/binfmt_elf.c has 14 unconditional NEW_AUX_ENT entries and 2
> conditional NEW_AUX_ENT entries.
> So in the worst case, saved_auxv does not get an AT_NULL entry at the
> end.
> 
> Is an AT_NULL entry required or must userspace use the AT_VECTOR_SIZE
> to not loop past the end of the array?

Of course it is required, AT_VECTOR_SIZE is a kernel implementation detail.

> If AT_NULL is required, AT_VECTOR_SIZE should be changed from 44 to 46.

No, it should be computed instead from the number of target independent aux
vector pairs and then from an per-arch macro which says how many arch
specific aux vector pairs are needed.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [RESEND] PIE executable randomization

2007-08-14 Thread Jakub Jelinek
On Wed, Aug 08, 2007 at 04:03:07PM +0200, Jiri Kosina wrote:
> @@ -870,11 +917,15 @@ static int load_elf_binary(struct linux_binprm *bprm, 
> struct pt_regs *regs)
>* default mmap base, as well as whatever program they
>* might try to exec.  This is because the brk will
>* follow the loader, and is not movable.  */
> +#ifdef CONFIG_X86
> + load_bias = 0;
> +#else
>   load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
> +#endif
>   }
>  
>   error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
> - elf_prot, elf_flags);
> + elf_prot, elf_flags,0);
>   if (BAD_ADDR(error)) {
>   send_sig(SIGKILL, current, 0);
>   retval = IS_ERR((void *)error) ?

If I'm reading the above hunk correctly, this means we will randomize
all PIEs and even all dynamic linkers invoked as executables on i?86 and
x86_64, and on the rest of arches we won't randomize at all, instead
load ET_DYN objects at ELF_ET_DYN_BASE address.

But I don't see anything i?86/x86_64 specific on this.

What would make much more sense to me would be conditionalizing on
whether we are loading a dynamic linker (in which case loading it
at ELF_ET_DYN_BASE is desirable or not (PIEs, ...; and for PIEs we
want to randomize on all architectures).

So something like
if (elf_interpreter)
load_bias = 0;
else
/* Probably dynamic linker invoked as
   /lib*/ld*so* program args - load at
   ELF_ET_DYN_BASE.  */
load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - 
vaddr);
instead of
#ifdef CONFIG_X86
load_bias = 0;
#else
load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
#endif

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Implementation of POSIX mqueues in Linux 2.6

2007-08-03 Thread Jakub Jelinek
On Fri, Aug 03, 2007 at 09:59:32AM +, gregfe wrote:
> I find little documentation on the actual implementation of POSIX message
> queues in Linux, and need some advise. In particular, I am wondering
> whether it supports inter-process *and* inter-thread communication, and if

Not sure what exactly you mean by inter-thread communication, whether
communication between threads within one process or between threads from
different processes.  You can use mq_* for either, except that mq_notify
registered signal notification is sent to the process that called mq_notify,
not thread (and for SIGEV_THREAD a new thread is created).

Though of course for communication between threads within one process
mq_* is a huge overkill.

> On more thing: kernel's "make menuconfig" of
> version 2.6.11 says :
> 
> >> To use this feature you will also need mqueue library, available
> 
> >> from <... a URL ... to M. Wronski's and K. Benedyczak's home page>"
> 
> Is it still up to date ?

No, glibc supports mq_* APIs for more than 3 years now.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: gcc fixed size char array initialization bug - known?

2007-08-02 Thread Jakub Jelinek
On Thu, Aug 02, 2007 at 09:55:51PM +0200, Guennadi Liakhovetski wrote:
> I've run across the following gcc "feature":
> 
>   char c[4] = "01234";
> 
> gcc emits a nice warning
> 
> warning: initializer-string for array of chars is too long
> 
> But do a
> 
>   char c[4] = "0123";
> 
> and - a wonder - no warning. No warning with gcc 3.3.2, 3.3.5, 3.4.5, 
> 4.1.2. I was told 4.2.x does produce a warning.

4.2.x nor 4.3 doesn't warn either and it is correct not to warn about
perfectly valid code.
ISO C99 is very obvious in that the terminating '\0' (resp. L'\0') from
the string literal is only added if there is room in the array or if the
array has unknown size.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: smaller kernel with no real time futexes

2007-08-01 Thread Jakub Jelinek
On Wed, Aug 01, 2007 at 09:24:34PM +0200, Andi Kleen wrote:
> Adrian,
> 
> You said earlier you're looking at smaller allnoconfig kernels.
> One thing I noticed recently that realtime pi futexes are always
> enabled and that pulls in a lot of other code (like the plists) 
> 
> Userland needs to handle them not being available anyways for older
> kernels.
> 
> Might be worth looking into turning that into a CONFIG.

That's a very bad idea.  glibc configured for 2.6.18 and higher kernels
assumes PI futexes are present.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: vdso.so mislinked by buggy linker was Re: Linus 2.6.23-rc1

2007-07-23 Thread Jakub Jelinek
On Mon, Jul 23, 2007 at 01:56:20AM +0200, Andi Kleen wrote:
> On Monday 23 July 2007 01:38:40 Andre Noll wrote:
> [readded linux-kernel, Linus]
> 
> >   [Nr] Name  Type Address   Offset
> >Size  EntSize  Flags  Link  Info  Align
> >   [ 0]   NULL   
> >     0 0 0
> >   [ 1] .hash HASH ff700120  0120
> >00b4  0004   A   2 0 8
> >   [ 2] .dynsym   DYNSYM   ff7001d8  01d8
> >0270  0018   A   312 8
> >   [ 3] .dynstr   STRTAB   ff700448  0448
> >0059     A   0 0 1
> >   [ 4] .gnu.version  VERSYM   ff7004a2  04a2
> >0034  0002   A   2 0 2
> >   [ 5] .gnu.version_dVERDEF   ff7004d8  04d8
> >0038     A   3 2 8
> >   [ 6] .text PROGBITS ff700c00  00100bab
>   
> >02e4    AX   0 0 64
> 
> It puts .text at 1MB. Your vdso file must be huge? 
> 
> It looks like it ignores the
> -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096
> options passed to it. The AMD64 ABI has a 1MB minimum page size, but
> these options are supposed to disable it.
> 
> Not sure how to work around this, but having an 1+MB vdso would be incredibly
> wasteful. What version is it? Perhaps we just drop support for this. I can't
> think of a workaround currently.

Looking at vdso.lds.S, if you change just VDSO_TEXT_OFFSET to 0xc00 and
don't tweak the linker script, then you jump backwards with the dot, you
should even get a linker warning about it:

  . = VDSO_PRELINK + VDSO_TEXT_OFFSET;

  .text   : { *(.text) }:text
  .text.ptr   : { *(.text.ptr) }:text
  . = VDSO_PRELINK + 0x900;

Guess that 0x900 should have been VDSO_TEXT_OFFSET + 0x400 or something
similar.  Also note that it is highly desirable to fit the whole vdso into
one page, so increasing VDSO_TEXT_OFFSET etc. offsets too much is just
wasting memory.  From the above dump, VDSO_TEXT_OFFSET 0x500 is too low,
but 0x600 should work, assuming .data section is moved 0x100 higher as well.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linus 2.6.23-rc1

2007-07-22 Thread Jakub Jelinek
On Mon, Jul 23, 2007 at 01:31:00AM +0200, Andi Kleen wrote:
> On Monday 23 July 2007 01:23:38 Andre Noll wrote:
> > On 00:22, Andi Kleen wrote:
> > > > /usr/bin/ld: section .text [ff700500 -> ff7007e3] 
> > > > overlaps section .gnu.version_d [ff7004d8 -> ff70050f]
> > > 
> > > Does this patch fix it?
> > 
> > Nope, with 0x600 I still get the same error. But it helped to further
> > increase VDSO_TEXT_OFFSET to 0xc00. I tried 0x700, 0x800,... and 0xc00
> > is the smallest value in this series that makes the error go away, i.e.
> > the patch below works for me.
> 
> Can you send (privately) readelf -a output from your vdso.so ? 
> Your linker must be doing something weird.
> 
> 0xc00 is quite wasteful.

I think Roland's --build-id doesn't create very big section, the likely
culprit would be a hacked up ld that e.g. defaults to --hash-style=both.
Can you retry with --hash-style=sysv?  vdso really has to include the
traditional .hash section, otherwise it wouldn't be compatible with
old glibcs, and an additional .gnu.hash might be an overkill for it
- doesn't the vdso define only very few symbols?

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: vdso.so mislinked by buggy linker was Re: Linus 2.6.23-rc1

2007-07-22 Thread Jakub Jelinek
On Mon, Jul 23, 2007 at 01:56:20AM +0200, Andi Kleen wrote:
> On Monday 23 July 2007 01:38:40 Andre Noll wrote:
> [readded linux-kernel, Linus]
> 
> >   [Nr] Name  Type Address   Offset
> >Size  EntSize  Flags  Link  Info  Align
> >   [ 0]   NULL   
> >     0 0 0
> >   [ 1] .hash HASH ff700120  0120
> >00b4  0004   A   2 0 8
> >   [ 2] .dynsym   DYNSYM   ff7001d8  01d8
> >0270  0018   A   312 8
> >   [ 3] .dynstr   STRTAB   ff700448  0448
> >0059     A   0 0 1
> >   [ 4] .gnu.version  VERSYM   ff7004a2  04a2
> >0034  0002   A   2 0 2
> >   [ 5] .gnu.version_dVERDEF   ff7004d8  04d8
> >0038     A   3 2 8
> >   [ 6] .text PROGBITS ff700c00  00100bab
>   
> >02e4    AX   0 0 64
> 
> It puts .text at 1MB. Your vdso file must be huge? 
> 
> It looks like it ignores the
> -Wl,-z,max-page-size=4096 -Wl,-z,common-page-size=4096
> options passed to it. The AMD64 ABI has a 1MB minimum page size, but
> these options are supposed to disable it.

These options are fairly new, before they were ignored (like all unknown
-z options).  They were added 2006-05-30 to CVS binutils.

I guess the problem is caused by the gap being too big and old binutils.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RESEND] PIE randomization

2007-07-10 Thread Jakub Jelinek
On Mon, Jul 09, 2007 at 11:58:07PM +0200, Jiri Kosina wrote:
> On Mon, 9 Jul 2007, Jiri Kosina wrote:
> > [ ... ]
> > > - if (!BAD_ADDR(elf_entry)) {
> > > + if (!IS_ERR((void *)elf_entry)) {
> > I agree that this is better solution. Andrew, this Jakub's patch should 
> > replace the pie-randomization-fix-bad_addr-macro.patch if possible. You 
> > can add 
> 
> as this raced :) with Andrew who already folded the 
> pie-randomization-fix-bad_addr-macro.patch into pie-randomization.patch, 
> do you think you could rebase this change against the current state of -mm 
> and resend it? Thanks,

Here it is:

Restore BAD_ADDR check strictness, use IS_ERR in the only place where
the stricter BAD_ADDR can't work, as the value is a load bias rather
than userland address.

Signed-off-by: Jakub Jelinek <[EMAIL PROTECTED]>

--- linux/fs/binfmt_elf.c   2007-07-10 11:39:29.0 +0200
+++ linux/fs/binfmt_elf.c   2007-07-10 11:41:03.0 +0200
@@ -80,7 +80,7 @@ static struct linux_binfmt elf_format = 
.hasvdso= 1
 };
 
-#define BAD_ADDR(x) IS_ERR_VALUE(x)
+#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
 
 static int set_brk(unsigned long start, unsigned long end)
 {
@@ -1005,7 +1005,7 @@ static int load_elf_binary(struct linux_
interpreter,
&interp_map_addr,
load_bias);
-   if (!BAD_ADDR(elf_entry)) {
+   if (!IS_ERR((void *)elf_entry)) {
/*
 * load_elf_interp() returns relocation
 * adjustment


Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RESEND] PIE randomization

2007-07-07 Thread Jakub Jelinek
On Sat, Jul 07, 2007 at 02:13:01AM +0200, Jiri Kosina wrote:
> On Thu, 5 Jul 2007, Rik van Riel wrote:
> 
> > So the original patch has:
> > #define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
> > For some reason(?) it got changed to the clearly buggy:
> > #define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK)
> > Jiri's patch undoes that second buggy define, which is very
> > different from the original that was sent in by you and Ernie.
> 
> This is a part of execshield patch, fthe pie-compiled binary executable 
> memory layout randomization was extracted from - see 
> http://people.redhat.com/~mingo/exec-shield/exec-shield-nx-2.6.19.patch
> 
> Note that load_elf_interp() in vanilla kernel differs from the 
> execshield's (and pie-randomization.patch) version.
> 
> The fix makes the BAD_ADDR check whether the address belongs to the 
> ERR_PTR range, which seems valid for all uses of BAD_ADDR in the patched 
> binfmt_elf.c (do_brk(), elf_map(), do_mmap() etc return valid address or 
> err ptr) ... am I missing something obvious here?

I believe BAD_ADDR macro was changes from ((unsigned long)(x) >= TASK_SIZE)
(which is the right test for invalid user addresses, stronger check than
>= PAGE_MASK) to >= PAGE_MASK only because of the one check of the return
value of load_elf_interp.  All other uses of BAD_ADDR macro are either on
userland addresses (what do_mmap, elf_map, do_brk etc. return;
where TASK_SIZE or more is certainly wrong) or in one case still on unbiased
ELF p_vaddr:
if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
in load_elf_binary (where >= TASK_SIZE check is ok too).

So perhaps doing this instead of changing BAD_ADDR to IS_ERR_VAL
might be better:

Signed-off-by: Jakub Jelinek <[EMAIL PROTECTED]>

--- linux/fs/binfmt_elf.c   2007-06-08 21:53:45.0 +0200
+++ linux/fs/binfmt_elf.c   2007-07-07 14:19:14.0 +0200
@@ -80,7 +80,7 @@ static struct linux_binfmt elf_format = 
.hasvdso= 1
 };
 
-#define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK)
+#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
 
 static int set_brk(unsigned long start, unsigned long end)
 {
@@ -1015,7 +1015,7 @@ static int load_elf_binary(struct linux_
interpreter,
&interp_map_addr,
load_bias);
-   if (!BAD_ADDR(elf_entry)) {
+   if (!IS_ERR((void *)elf_entry)) {
/* load_elf_interp() returns relocation 
adjustment */
interp_load_addr = elf_entry;
elf_entry += loc->interp_elf_ex.e_entry;


Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RESEND] PIE randomization

2007-07-04 Thread Jakub Jelinek
On Wed, May 23, 2007 at 10:50:24AM +0200, Jiri Kosina wrote:
> From: Jan Kratochvil <[EMAIL PROTECTED]>
> 
> This patch is using mmap()'s randomization functionality in such a way 
> that it maps the main executable of (specially compiled/linked -pie/-fpie) 
> ET_DYN binaries onto a random address (in cases in which mmap() is allowed 
> to perform a randomization).
> 
> Origin of this patch is in exec-shield
> (http://people.redhat.com/mingo/exec-shield/)
> 
> Signed-off-by: Jan Kratochvil <[EMAIL PROTECTED]>
> Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>
> Cc: Ingo Molnar <[EMAIL PROTECTED]>
> Cc: Roland McGrath <[EMAIL PROTECTED]>
> Cc: Jakub Jelinek <[EMAIL PROTECTED]>

> -#define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
> +#define BAD_ADDR(x) ((unsigned long)(x) >= PAGE_MASK)
...
> @@ -442,8 +491,7 @@ static unsigned long load_elf_interp(str
>   goto out_close;
>   }
>  
> - *interp_load_addr = load_addr;
> - error = ((unsigned long)interp_elf_ex->e_entry) + load_addr;
> + error = load_addr;
...
>   if (elf_interpreter) {
> - if (interpreter_type == INTERPRETER_AOUT)
> + if (interpreter_type == INTERPRETER_AOUT) {
>   elf_entry = load_aout_interp(&loc->interp_ex,
>interpreter);
> - else
> + } else {
> + unsigned long interp_map_addr;  /* unused */
> +
>   elf_entry = load_elf_interp(&loc->interp_elf_ex,
>   interpreter,
> - &interp_load_addr);
> + &interp_map_addr,
> + load_bias);
> + if (!BAD_ADDR(elf_entry)) {
> + /*
> +  * load_elf_interp() returns relocation
> +  * adjustment
> +  */
> + interp_load_addr = elf_entry;
> + elf_entry += loc->interp_elf_ex.e_entry;
> + }
> + }
>   if (BAD_ADDR(elf_entry)) {
>   force_sig(SIGSEGV, current);
>   retval = IS_ERR((void *)elf_entry) ?

The above highlighted changes are the cause of random segfaults of PIE
binaries.  See
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=246623
The problem is if ld.so is prelinked to some address in the area where
the kernel actually maps it, particularly if elf_map in load_elf_interp
returns an address one page below its first PT_LOAD segments vaddr.
Then load_addr (it is a load bias actually) returned from load_elf_interp
is 0xf000 (on 32-bit kernels) and BAD_ADDR are all
addresses >= 0xf000 (on i?86).
The fix should be either changing the definition of BAD_ADDR to
e.g. IS_ERR_VALUE(x), or at least changing the if (!BAD_ADDR(elf_entry)) {
above to if (!IS_ERR_VALUE(elf_entry)) {, the second BAD_ADDR can already
stay, because at that place elf_entry is no longer a bias (difference
between actual and preferred load address), but an actual address, where
very high addresses are of course invalid.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Userspace compiler support of "long long"

2007-06-28 Thread Jakub Jelinek
On Thu, Jun 28, 2007 at 07:53:51AM -0400, Kyle Moffett wrote:
> On Jun 27, 2007, at 23:57:54, Matthew Wilcox wrote:
> >On Wed, Jun 27, 2007 at 06:30:52PM -0400, Kyle Moffett wrote:
> >>Then all 64-bit archs have:
> >>typedef   signed long  __s64;
> >>typedef unsigned long  __u64;
> >>
> >>While all 32-bit archs have:
> >>typedef   signed long long __s64;
> >>typedef unsigned long long __u64;
> >
> >include/asm-parisc/types.h:typedef unsigned long long __u64;
> >
> >For both 32 and 64-bit.
> >
> >include/asm-sh64/types.h:typedef unsigned long long __u64;
> >include/asm-x86_64/types.h:typedef unsigned long long  __u64;
> >
> >So that's three architectures that violate your first assertion.
> 
> Oh, ok, that makes it even easier to say this with certainty:   
> Changing the other 64-bit archs to use "long long" for their 64-bit  
> numbers will not cause additional warnings.  I'm also almost certain  
> there are no architectures which use "long long" for 128-bit  
> integers. (Moreover, I can't find hardly anything which does 128-bit  
> integers at all).

unsigned long and unsigned long long have the same size, precision
and alignment on all LP64 arches, that's true.  But they have
different ranks and more importantly they mangle differently in C++.
So, whether some user exposed type uses unsigned long or unsigned long long
is part of the ABI, whether that's size_t, uintptr_t, uint64_t, u_int64_t
or any other type, you can't change it without breaking the ABI.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] get_random_long() and AT_ENTROPY for auxv, kernel 2.6.21.5

2007-06-25 Thread Jakub Jelinek
On Sun, Jun 24, 2007 at 09:43:03PM -0700, Arjan van de Ven wrote:
> > - something to do with aux vector headers
> 
> the primary goal is to pass a random value to userspace at process
> start; this to save glibc from having to open /dev/urandom on ever
> program start (which it does now for all apps compiled with
> -fstack-protector, which in various distros is "everything").

There are 2 ways to compile -fstack-protector supporting glibc actually,
only one opens /dev/urandom on every program initialization, the other
computes the stack guard from some bits of the stack address (so indirectly
depends on get_random_int() in stack randomization).
Nevertheless, having one random long (32-bit for 32-bit arches, 64-bit
otherwise) in aux vector would be useful.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: O_CLOEXEC: An alternate proposal

2007-06-08 Thread Jakub Jelinek
On Fri, Jun 08, 2007 at 03:47:12AM -0400, Daniel Colascione wrote:
> Hey, this is my first post to linux-kernel, so please be kind. :-)
> 
> Linus Torvalds wrote on May 31:
> > I'm with Uli on this one. "Stateful" stuff is bad. It's essentially
> > impossible to handle with libraries - either the library would have to
> > explciitly always turn the state the way _it_ needs it, or the library
> > will do the wrogn thing.
> 
> I agree that stateful stuff is generally not very elegant,
> but I think it's a win here -- we wouldn't have to create any
> new APIs except for the state-setting stuff.
> 
> The state just has to be thread-local.
> 
> If it's thread-local, a library, say, glibc,
> can use code like this:
> 
>   /* Internal library function */
>   old_fd_flags = kernel_default_fd_flags(FD_CLOEXEC | FD_RANDFD);
>   event_fd = super_duper_event_polling_mechanism_fd();
>   kernel_default_fd_flags(old_fd_flags);

It is not a win, what if a signal comes in between the two
kernel_default_fd_flags syscalls?  open and other functions
are async signal safe and programs will be certainly upset
if suddenly the syscalls in the signal handler start to behave
differently depending on which exact code the async signal
has interrupted.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Introduce O_CLOEXEC (take >2)

2007-05-31 Thread Jakub Jelinek
On Thu, May 31, 2007 at 11:46:31AM -0700, Davide Libenzi wrote:
> On Thu, 31 May 2007, Ulrich Drepper wrote:
> > Davide Libenzi wrote:
> > > Isn't this better be a global process flag? Default should be, for legacy
> > > reasons,
> > 
> > No.  Policies are always wrong since it means code that cannot change
> > the policy (e.g, all runtime libraries) have no access to the
> > functionality.  I cannot set the policy to default to close-on-exit in
> > glibc all the while the application assumes this is not the case.
> 
> I was talking for a broader usage, not only glibc centric. Most ppl 
> writing MT+exec apps wants all but (eventually) and handfull of files 
> leaking across the exec boundary.

If open (and all other syscalls that create fds) have O_CLOEXEC (and
something similar for other syscalls), then such a policy can be easily
implemented on the userland, if desired.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] MM: implement MADV_FREE lazy freeing of anonymous memory

2007-05-08 Thread Jakub Jelinek
On Tue, May 08, 2007 at 04:12:00PM +1000, Nick Piggin wrote:
> I didn't actually check system and user times for the mysql
> benchmark, but that's exactly what I had in mind when I
> mentioned the poor cache behaviour this patch could cause. I
> definitely did see user times go up in benchmarks where I
> measured.
> 
> We have percpu and cache affine page allocators, so when
> userspace just frees a page, it is likely to be cache hot, so
> we want to free it up so it can be reused by this CPU ASAP.
> Likewise, when we newly allocate a page, we want it to be one
> that is cache hot on this CPU.

malloc has per-thread arenas, so when using MADV_FREE the pages
should be local to the thread as well (unless the thread has switched
to a different CPU also to the CPU) and in case of sysbench should
be cache hot as well (it is reused RSN).  With MADV_DONTNEED you need to
clear the pages while that is not necessary with MADV_FREE.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc

2007-05-03 Thread Jakub Jelinek
On Thu, May 03, 2007 at 11:28:15PM -0700, Andrew Morton wrote:
> > > The posix spec implies that negative `len' is permitted - presumably 
> > > "allocate
> > > ahead of `offset'".  How peculiar.
> > 
> > I just checked the man page for posix_fallocate() and it says:
> > 
> >   EINVAL  offset or len was less than zero.

That describes the current glibc implementation.


> > We should probably follow this lead.
> 
> Yes, I think so.  I'm suspecting that
> http://www.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html
> is just buggy.  Or I can't read.
> 
> I mean, if we're going to support negative `len' then is the byte at
> `offset' inside or outside the segment?  Head spins.
> 
> However it would be neat if someone could test $OTHER_OS and, perhaps more
> importantly, the present glibc emulation (which I assume your manpage is
> referring to, so this would be a manpage test ;)).

int
posix_fallocate (int fd, __off_t offset, __off_t len)
{
  struct stat64 st;
  struct statfs f;

  /* `off_t' is a signed type.  Therefore we can determine whether
 OFFSET + LEN is too large if it is a negative value.  */
  if (offset < 0 || len < 0)
return EINVAL;
  if (offset + len < 0)
return EFBIG;

  /* First thing we have to make sure is that this is really a regular
 file.  */
  if (__fxstat64 (_STAT_VER, fd, &st) != 0)
return EBADF;
  if (S_ISFIFO (st.st_mode))
return ESPIPE;
  if (! S_ISREG (st.st_mode))
return ENODEV;

  if (len == 0)
{
  if (st.st_size < offset)
{
  int ret = __ftruncate (fd, offset);

  if (ret != 0)
ret = errno;
  return ret;
}
  return 0;
}
...

is what glibc does ATM.  Seems we violate the case where len == 0, as
EINVAL in that case is "shall fail".  But reading the standard to imply
negative len is ok is too much guessing, there is no word what it means
when len is negative and
"required storage for regular file data starting at offset and continuing for 
len bytes"
doesn't make sense for negative size.  
And given the general
"Implementations may support additional errors not included in this list,
may generate errors included in this list under circumstances other than
those described here, or may contain extensions or limitations that prevent
some errors from occurring."
I believe returning EINVAL for len < 0 is not a POSIX violation.
That doesn't mean the standard shouldn't be clarified, whether by saying
EINVAL must be returned for non-positive len or saying that using negative
len has undefined or implementation defined behavior.

> The above opengroup page only permits S_ISREG.  Preallocating directories
> sounds quite useful to me, although it's something which would be pretty
> hard to emulate if the FS doesn't support it.  And there's a decent case to
> be made for emulating it - run-anywhere reasons.  Does glibc emulation support
> directories?  Quite unlikely.

No, see above.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: MADV_FREE functionality

2007-05-01 Thread Jakub Jelinek
On Mon, Apr 30, 2007 at 06:18:39PM -0700, Andrew Morton wrote:
> > In short:
> > - both MADV_FREE and MADV_DONTNEED only unmap file pages
> > - after MADV_DONTNEED the application will always get back
> >fresh zero filled anonymous pages when accessing the
> >memory
> > - after MADV_FREE the application can either get back the
> >original data (without a page fault) or zero filled
> >anonymous memory
> > 
> > The Linux MADV_DONTNEED behavior is not POSIX compliant.
> > POSIX says that with MADV_DONTNEED the application's data
> > will be preserved.
> > 
> > Currently glibc simply ignores POSIX_MADV_DONTNEED requests
> > from applications on Linux.  Changing the behaviour which
> > some Linux applications may rely on might not be the best
> > idea.
> 
> OK, thanks.  I stuck that in the changelog.

FYI, Solaris man page on MADV_FREE says:

  MADV_FREE
Tells  the  kernel  that  contents  in  the  specified
address  range  are  no longer important and the range
will be overwritten. When there is demand for  memory,
the  system will free pages associated with the speci-
fied address range. In this instance, the next time  a
page  in the address range is referenced, it will con-
tail all zeroes.  Otherwise, it will contain the  data
that was there prior to the MADV_FREE call. References
made to the address range will  not  make  the  system
read from backing store (swap space) until the page is
modified again.

This value cannot be used on mappings that have under-
lying file objects.

The last paragraph seems to be just about the operation being
undefined, madvise MADV_FREE on MAP_SHARED file mapping returns 0
rather than flagging an error.

FreeBSD man page:

MADV_FREEGives the VM system the freedom to free pages, and 
tells
 the system that information in the specified page range
 is no longer important.  This is an efficient way of
 allowing malloc(3) to free pages anywhere in the 
address
 space, while keeping the address space valid.  The next
 time that the page is referenced, the page might be
 demand zeroed, or might contain the data that was there
 before the MADV_FREE call.  References made to that
 address space range will not make the VM system page 
the
 information back in from backing store until the page 
is
 modified again.

> Also, where did we end up with the Solaris compatibility?
> 
> The patch I have at present retains MADV_FREE=0x05 for sparc and sparc64
> which should be good.
> 
> Did we decide that the Solaris and Linux implementations of MADV_FREE are
> compatible?

SPARC Solaris binary compatibility in Linux is in really bad shape, madvise
in Solaris is implemented using memcntl syscall (at least according to truss(1))
and that syscall is
systbl.S:   .word solaris_unimplemented /* memcntl  131 
*/
When/if anyone decides to put more effort into the Solaris binary compatibility
(I'm quite doubtful anyone will), codes which don't match can be simply 
translated into
other codes, ignored etc., we can't use sys_madvise to implement memcntl
syscall anyway.  While Solaris MADV_FREE is the same as Linux MADV_FREE proposed
by Rik (except perhaps the documented undefined behavior with file mappings,
on
#include 
#include 
#include 

int
main (void)
{
  getpid ();
  int fd = open ("test", O_RDWR);
  void *p = mmap (0, 8192, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  memset (p, ' ', 8192);
  madvise (p, 8192, MADV_FREE);
  return 0;
}
on Solaris the spaces actually made it into the file), MADV_DONTNEED is not,
but that doesn't really matter except for arch/sparc*/solaris/ layer if anyone
cares.  We certainly can't change current MADV_DONTNEED behavior, all we
can do is implement a new MADV_* code with a different behavior and let glibc
translate POSIX_MADV_* codes on posix_madvise to the Linux specific MADV_*.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Big reserved mappings on x86_64

2007-04-25 Thread Jakub Jelinek
On Wed, Apr 25, 2007 at 10:42:20AM +0200, Jan Engelhardt wrote:
> I actually took a look at `pmap $$`, which reveals that a lot of shared 
> libraries map 2044K or 2048K unreadable-unwritable-private 
> mappings...for _what_ purpose?
> 
> 10:37 opteron:~ > pmap $$
> 4403: bash
> START   SIZE RSS   DIRTY PERM MAPPING
> 2ae6cca7212K172K  0K r-xp /lib64/libreadline.so.5.2
> 2ae6ccaa5000   2044K  0K  0K ---p /lib64/libreadline.so.5.2 <--
> 2ae6ccca4000  4K  4K  4K r--p /lib64/libreadline.so.5.2
> 2ae6ccca5000 28K 28K 28K rw-p /lib64/libreadline.so.5.2
> 2ae6cccac000  8K  8K  8K rw-p [anon]
> 2ae6cccae000 28K 16K  0K r-xp /lib64/libhistory.so.5.2
> 2ae6cccb5000   2048K  0K  0K ---p /lib64/libhistory.so.5.2 <--
> 2ae6cceb5000  8K  8K  8K rw-p /lib64/libhistory.so.5.2
> 2ae6cceb7000320K208K  0K r-xp /lib64/libncurses.so.5.5
> 2ae6ccf07000   2048K  0K  0K ---p /lib64/libncurses.so.5.5 <--
> 2ae6cd107000 48K 48K 48K r--p /lib64/libncurses.so.5.5
> 2ae6cd113000 28K 28K 28K rw-p /lib64/libncurses.so.5.5
> [...]
> 
> What could these ominous mappings be? Does anyone else see that - 
> perhaps someone with x86_64 && !(opensuse 10.2)?

While i386 only supports 4KB pages, x86_64 ELF objects ought to support
up to 2MB pages.  The gap between read-only/executable and writable segments
is intentionally mapped PROT_NONE, so that other things aren't mapped in
between the segments.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] lazy freeing of memory through MADV_FREE

2007-04-23 Thread Jakub Jelinek
On Mon, Apr 23, 2007 at 08:21:37PM +1000, Nick Piggin wrote:
> I guess it is a good idea to batch these things. But can you
> do that on all architectures? What happens if your tlb flush
> happens after another thread already accesses it again, or
> after it subsequently gets removed from the address space via
> another CPU?

Accessing the page by another thread before madvise (MADV_FREE)
returns is undefined behavior, it can act as if that access happened
right before the madvise (MADV_FREE) call or right after it.
That's ok for glibc and supposedly any other malloc implementation,
madvise (MADV_FREE) is called while holding containing's arena lock
and for whatever malloc implementaton, madvise (MADV_FREE) would be
part of free operations and you definitely need some synchronization
between one thread freeing some memory and other thread deciding
to reuse that memory and return it from malloc/realloc/calloc/etc.

My only concern is whether using non-atomic update of the pte is
ok or not.
ptep_test_and_clear_young/ptep_test_and_clear_dirty Rik's patch
was doing before are done using atomic instructions, at least on x86_64.
The operation we want for MADV_FREE is, clear young/dirty bits if they
have been set on entry to the MADV_FREE madvise call, undefined values
for these 2 bits if some other task modifies the young/dirty bits
concurrently with this MADV_FREE zap_page_range, but I'd say other
bits need to be unmodified.
Now, is there some kernel code which while either not holding corresponding
mmap_sem at all or holding it just down_read modifies other bits
in the pte?  If yes, we need to do this clearing atomically, basically
do a cmpxchg loop until we succeed to clear the 2 bits and then flush
the tlb if any of them was set before (ptep_test_and_clear_dirty_and_young?),
if not, set_pte_at is ok and faster than a lock prefixed insn.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] lazy freeing of memory through MADV_FREE

2007-04-21 Thread Jakub Jelinek
On Fri, Apr 20, 2007 at 07:52:44PM -0400, Rik van Riel wrote:
> It turns out that Nick's patch does not improve peak
> performance much, but it does prevent the decline when
> running with 16 threads on my quad core CPU!
> 
> We _definately_ want both patches, there's a huge benefit
> in having them both.
> 
> Here are the transactions/seconds for each combination:
> 
>vanilla   new glibc  madv_free kernel   madv_free + mmap_sem
> threads
> 
> 1 610 609 596545
> 2103211361196   1200
> 4107011282014   2024
> 8100010881665   2087
> 1677910731310   1999

FYI, I have uploaded a testing glibc that uses MADV_FREE and falls back
to MADV_DONTUSE if MADV_FREE is not available, to
http://people.redhat.com/jakub/glibc/2.5.90-21.1/
and I'm also attaching the glibc patch for those who want to build it
themselves:

2007-04-19  Ulrich Drepper  <[EMAIL PROTECTED]>
Jakub Jelinek  <[EMAIL PROTECTED]>

* malloc/arena.c (heap_info): Add mprotect_size field, adjust pad.
(new_heap): Initialize mprotect_size.
(no_madv_free): New variable.
(grow_heap): When growing, only mprotect from mprotect_size till
new_size if mprotect_size is smaller.  When shrinking, use PROT_NONE
MMAP for __libc_enable_secure only, otherwise if MADV_FREE is
available use it and fall back to MADV_DONTNEED.
* sysdeps/unix/sysv/linux/alpha/bits/mman.h (MADV_FREE): Define.
* sysdeps/unix/sysv/linux/ia64/bits/mman.h (MADV_FREE): Likewise.
* sysdeps/unix/sysv/linux/i386/bits/mman.h (MADV_FREE): Likewise.
* sysdeps/unix/sysv/linux/s390/bits/mman.h (MADV_FREE): Likewise.
* sysdeps/unix/sysv/linux/powerpc/bits/mman.h (MADV_FREE): Likewise.
* sysdeps/unix/sysv/linux/x86_64/bits/mman.h (MADV_FREE): Likewise.
* sysdeps/unix/sysv/linux/sparc/bits/mman.h (MADV_FREE): Likewise.
* sysdeps/unix/sysv/linux/sh/bits/mman.h (MADV_FREE): Likewise.

--- libc/malloc/arena.c.jj  2006-10-31 23:05:31.0 +0100
+++ libc/malloc/arena.c 2007-04-19 18:54:20.0 +0200
@@ -1,5 +1,6 @@
 /* Malloc implementation for multiple threads without lock contention.
-   Copyright (C) 2001,2002,2003,2004,2005,2006 Free Software Foundation, Inc.
+   Copyright (C) 2001,2002,2003,2004,2005,2006,2007
+   Free Software Foundation, Inc.
This file is part of the GNU C Library.
Contributed by Wolfram Gloger <[EMAIL PROTECTED]>, 2001.
 
@@ -59,10 +60,12 @@ typedef struct _heap_info {
   mstate ar_ptr; /* Arena for this heap. */
   struct _heap_info *prev; /* Previous heap. */
   size_t size;   /* Current size in bytes. */
+  size_t mprotect_size;/* Size in bytes that has been mprotected
+  PROT_READ|PROT_WRITE.  */
   /* Make sure the following data is properly aligned, particularly
  that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
- MALLOG_ALIGNMENT. */
-  char pad[-5 * SIZE_SZ & MALLOC_ALIGN_MASK];
+ MALLOC_ALIGNMENT. */
+  char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
 } heap_info;
 
 /* Get a compile-time error if the heap_info padding is not correct
@@ -692,10 +695,15 @@ new_heap(size, top_pad) size_t size, top
   }
   h = (heap_info *)p2;
   h->size = size;
+  h->mprotect_size = size;
   THREAD_STAT(stat_n_heaps++);
   return h;
 }
 
+#if defined _LIBC && defined MADV_FREE
+static int no_madv_free;
+#endif
+
 /* Grow or shrink a heap.  size is automatically rounded up to a
multiple of the page size if it is positive. */
 
@@ -714,17 +722,49 @@ grow_heap(h, diff) heap_info *h; long di
 new_size = (long)h->size + diff;
 if((unsigned long) new_size > (unsigned long) HEAP_MAX_SIZE)
   return -1;
-if(mprotect((char *)h + h->size, diff, PROT_READ|PROT_WRITE) != 0)
-  return -2;
+if((unsigned long) new_size > h->mprotect_size) {
+  if (mprotect((char *)h + h->mprotect_size,
+  (unsigned long) new_size - h->mprotect_size,
+  PROT_READ|PROT_WRITE) != 0)
+   return -2;
+  h->mprotect_size = new_size;
+}
   } else {
 new_size = (long)h->size + diff;
 if(new_size < (long)sizeof(*h))
   return -1;
 /* Try to re-map the extra heap space freshly to save memory, and
make it inaccessible. */
-if((char *)MMAP((char *)h + new_size, -diff, PROT_NONE,
-MAP_PRIVATE|MAP_FIXED) == (char *) MAP_FAILED)
-  return -2;
+#ifdef _LIBC
+if (__builtin_expect (__libc_enable_secure, 0))
+#else
+if (1)
+#endif
+  {
+   if((char *)MMAP((char *)h + new_size, -diff, PROT_NONE,
+   MAP_PRIVATE|MAP_FIXED) == (char *) MAP_FAILED)
+ return -2;
+   

Re: Interface for the new fallocate() system call

2007-04-20 Thread Jakub Jelinek
On Fri, Apr 20, 2007 at 07:21:46PM +0530, Amit K. Arora wrote:
> Ok.
> In this case we may have to consider following things:
> 
> 1) Obviously, for this glibc will have to call fallocate() syscall with
> different arguments on s390, than other archs. I think this should be
> doable and should not be an issue with glibc folks (right?).

glibc can cope with this easily, will just add
sysdeps/unix/sysv/linux/s390/fallocate.c or something similar to override
the generic Linux implementation.

> 2) we also need to see how strace behaves in this case. With little
> knowledge that I have of strace, I don't think it should depend on
> argument ordering of a system call on different archs (since it uses
> ptrace internally and that should take care of it). But, it will be
> nice if someone can confirm this.

strace would solve this with #ifdef mess, it already does that in many
places so guess another few lines don't make it significantly worse.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] make MADV_FREE lazily free memory

2007-04-16 Thread Jakub Jelinek
On Mon, Apr 16, 2007 at 11:10:39AM -0500, Anton Blanchard wrote:
> > Making the pte clean also needs to clear the hardware writable
> > bit on architectures where we do pte dirtying in software.
> > 
> > If we don't, we would have corruption problems all over the VM,
> > for example in the code around pte_clean_one :)
> > 
> > >But as Linus recently said, even hardware handled faults still
> > >take expensive microarchitectural traps.
> > 
> > Nowhere near as expensive as a full page fault, though...
> 
> Unfortunately it will be expensive on architectures that have software
> referenced and changed. It would be great if we could just leave them
> dirty in the pagetables and transition between a clean and dirty state
> via madvise calls, but thats just wishful thinking on my part :)

That would mean an additional syscall.  Furthermore, if you allocate a big
chunk of memory, dirty it, then free (with madvise (MADV_FREE)) it and soon
allocate the same size of memory again, it is better to start that with
non-dirty memory, it might be that this time you e.g. don't modify a big
part of the chunk.  If all that memory was kept dirty all the time and
just marked/unmarked for lazy reuse with MADV_FREE/MADV_UNDO_FREE, all that
memory would need to be saved to disk when paging out as it was marked
dirty, while with current Rik's MADV_FREE that will happen only for pages
that were actually dirtied after the last malloc.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH, take4] FUTEX : new PRIVATE futexes

2007-04-07 Thread Jakub Jelinek
On Sat, Apr 07, 2007 at 10:43:39AM +0200, Eric Dumazet wrote:
> get_futex_key() does a check against sizeof(u32) regardless of futex being 
> 64bits or not.
> So it is possible a 64bit futex spans two pages of memory...

That would be a user bug.  32-bit futexes have to be 32-bit aligned, 64-bit
futexes have to be 64-bit aligned.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: missing madvise functionality

2007-04-05 Thread Jakub Jelinek
On Thu, Apr 05, 2007 at 03:31:24AM -0400, Rik van Riel wrote:
> >My guess is that all the page zeroing is pretty expensive as well and
> >takes significant time, but I haven't profiled it.
> 
> With the attached patch (Andrew, I'll change the details around
> if you want - I just wanted something to test now), your test
> case run time went down considerably.

Thanks.

--- linux-2.6.20.noarch/mm/madvise.c.madvise2007-04-03 21:53:47.0 
-0400
+++ linux-2.6.20.noarch/mm/madvise.c2007-04-04 23:48:34.0 -0400
@@ -142,8 +142,12 @@ static long madvise_dontneed(struct vm_a
.last_index = ULONG_MAX,
};
zap_page_range(vma, start, end - start, &details);
-   } else
-   zap_page_range(vma, start, end - start, NULL);
+   } else {
+   struct zap_details details = {
+   .madv_free = 1,
+   };
+   zap_page_range(vma, start, end - start, &details);
+   }
return 0;
 }
 
@@ -209,7 +213,9 @@ madvise_vma(struct vm_area_struct *vma, 
error = madvise_willneed(vma, prev, start, end);
break;
 
+   /* FIXME: POSIX says that MADV_DONTNEED cannot throw away data. */
case MADV_DONTNEED:
+   case MADV_FREE:
error = madvise_dontneed(vma, prev, start, end);
break;
 
I think you should only use the new behavior for madvise MADV_FREE, not for
MADV_DONTNEED.  The current MADV_DONTNEED behavior (which conflicts with
POSIX POSIX_MADV_DONTNEED, but that doesn't matter since what glibc
maps posix_madvise POSIX_MADV_DONTNEED in madvise call if anything doesn't
have to be MADV_DONTNEED, but can be anything else) is apparently documented
in Linux man pages:
   MADV_DONTNEED
  Do not expect access in the near future.  (For the time being, 
the application is finished with  the
  given  range, so the kernel can free resources associated with 
it.)  Subsequent accesses of pages in
  this range will succeed, but will result either in re-loading of 
the memory contents from the under-
  lying mapped file (see mmap()) or zero-fill-on-demand pages for 
mappings without an underlying file.
so it wouldn't surprise me if something relied on zero filling.
So IMHO madv_free in details should be only set if MADV_FREE.

Also, I think MADV_FREE shouldn't do anything at all (i.e. don't call
zap_page_range, but don't fail either) for shared or file backed vmas,
only for private anon memory it should do something.  After all, it
is just an optimization and it makes sense only for private anon mappings.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: missing madvise functionality

2007-04-04 Thread Jakub Jelinek
On Wed, Apr 04, 2007 at 05:46:12PM +1000, Nick Piggin wrote:
> Does mmap(PROT_NONE) actually free the memory?

Yes.
/* Clear old maps */
error = -ENOMEM;
munmap_back:
vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent);
if (vma && vma->vm_start < addr + len) {
if (do_munmap(mm, addr, len))
return -ENOMEM;
goto munmap_back;
}

> In the case of pages being unused then almost immediately reused, why is
> it a bad solution to avoid freeing? Is it that you want to avoid
> heuristics because in some cases they could fail and end up using memory?

free(3) doesn't know if the memory will be reused soon, late or never.
So avoiding trimming could substantially increase memory consumption with
certain malloc/free patterns, especially in threaded programs that use
multiple arenas.  Implementing some sort of deferred memory trimming
in malloc is "solving" the problem in a wrong place, each app really has no
idea (and should not have) what the current system memory pressure is.

> Secondly, why is MADV_DONTNEED bad? How much more expensive is a pagefault
> than a syscall? (including the cost of the TLB fill for the memory access
> after the syscall, of course).

That's page fault per page rather than a syscall for the whole chunk,
furthermore zeroing is expensive.

We really want something like FreeBSD MADV_FREE in Linux, see e.g.
http://mail.nl.linux.org/linux-mm/2000-03/msg00059.html
for some details.  Apparently FreeBSD malloc is using MADV_FREE for years
(according to their CVS for 10 years already).

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: getting processor numbers

2007-04-03 Thread Jakub Jelinek
On Tue, Apr 03, 2007 at 07:04:58PM -0700, Paul Jackson wrote:
> Andrew wrote:
> > I'd have thought that in general an application should be querying its
> > present affinity mask - something like sched_getaffinity()?  That fixes the
> > CPU hotplug issues too, of course.
> 
> The sched_getaffinity call is quick, too, and it nicely reflects any
> cpuset constraints, while still working on kernels which don't have
> CPUSETs configured.
> 
> There are really at least four "number of CPUs" answers here, and we
> should be aware of which we are providing.  There are, in order of
> decreasing size:
>  1) the size of the kernels cpumask_t (NR_CPUS),
>  2) the maximum number of CPUs that might ever be hotplugged into a
> booted system,
>  3) the current number of CPUs online in that system, and
>  4) the number of CPUs that the current task is allowed to use.
> 
> I would suggest that (4) is what we should typically return.
> Certainly it would seem that the use that Ulrich is concerned with,
> by OpenMP, wants (4).
> 
> Currently, the sysconf(_SC_NPROCESSORS_CONF) returns (3), by counting
> the CPUs in /proc/stat, which is rather bogus on cpuset, or even
> sched_setaffinity, constrained systems.

OpenMP wants (4) and I'll change it that way.

sysconf(_SC_NPROCESSORS_ONLN) must return (3) (this currently scans /proc/stat)
and
sysconf(_SC_NPROCESSORS_CONF) should IMHO return (2) (this currently
scans /proc/cpuinfo on alpha and sparc{,64} for ((ncpus|CPUs) probed|cpus 
detected)
and for the rest just returns sysconf(_SC_NPROCESSORS_ONLN)).
Neither of the sysconf returned values should be affected by affinity.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: missing madvise functionality

2007-04-03 Thread Jakub Jelinek
On Tue, Apr 03, 2007 at 01:17:09PM -0700, Ulrich Drepper wrote:
> Andrew Morton wrote:
> > Ulrich, could you suggest a little test app which would demonstrate this
> > behaviour?
> 
> It's not really reliably possible to demonstrate this with a small
> program using malloc.  You'd need something like this mysql test case
> which Rik said is not hard to run by yourself.
> 
> If somebody adds a kernel interface I can easily produce a glibc patch
> so that the test can be run in the new environment.
> 
> But it's of course easy enough to simulate the specific problem in a
> micro benchmark.  If you want that let me know.

I think something like following testcase which simulates what free
and malloc do when trimming/growing a non-main arena.

My guess is that all the page zeroing is pretty expensive as well and
takes significant time, but I haven't profiled it.

#include 
#include 
#include 
#include 

void *
tf (void *arg)
{
  (void) arg;
  size_t ps = sysconf (_SC_PAGE_SIZE);
  void *p = mmap (NULL, 128 * ps, PROT_READ | PROT_WRITE,
  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
exit (1);
  int i;
  for (i = 0; i < 10; i++)
{
  /* Pretend to use the buffer.  */
  char *q, *r = (char *) p + 128 * ps;
  size_t s;
  for (q = (char *) p; q < r; q += ps)
*q = 1;
  for (s = 0, q = (char *) p; q < r; q += ps)
s += *q;
  /* Free it.  Replace this mmap with
 madvise (p, 128 * ps, MADV_THROWAWAY) when implemented.  */
  if (mmap (p, 128 * ps, PROT_NONE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) != p)
exit (2);
  /* And immediately malloc again.  This would then be deleted.  */
  if (mprotect (p, 128 * ps, PROT_READ | PROT_WRITE))
exit (3);
}
  return NULL;
}

int
main (void)
{
  pthread_t th[32];
  int i;
  for (i = 0; i < 32; i++)
if (pthread_create (&th[i], NULL, tf, NULL))
  exit (4);
  for (i = 0; i < 32; i++)
pthread_join (th[i], NULL);
  return 0;
}

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: getting processor numbers

2007-04-03 Thread Jakub Jelinek
On Tue, Apr 03, 2007 at 10:59:53AM -0700, Ulrich Drepper wrote:
> Siddha, Suresh B wrote:
> > Not all of the cpu* directories in /sys/devices/system/cpu may be
> > online.
> 
> Brilliant.  You people really know how to create user interfaces.  So
> now in any case per CPU another stat/access syscall is needed to check
> the 'online' pseudo-file?  With this the readdir solution is only
> marginally faster than parsing /proc/cpuinfo which means, it's
> unacceptably slow.

Note that glibc actually parses /proc/stat in preference of /proc/cpuinfo
ATM, because /proc/stat is at least uniform while parsing /proc/cpuinfo
needs a special parser for each architecture.  And /proc/stat reading
is even slower than /proc/cpuinfo, on x86_64 reading/parsing /proc/stat
takes about 450usec, while e.g. stat64 on /sys/devices/system/cpu
is just 2.5usec.
But if that can't be trusted as the number of online CPUs, can
somebody please add a short file to proc or sysfs which will contain
the number of online and number of configured CPUs?

See e.g. http://openmp.org/pipermail/omp/2007/000714.html
where the first time after second g++ invocation is with omp_set_dynamic (1)
and ought to be about as fast as omp_set_dynamic (0) case with the same
number of threads, but it is far slower due to slow
sysconf (_SC_NPROCESSORS_ONLN).

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Interface for the new fallocate() system call

2007-03-29 Thread Jakub Jelinek
On Thu, Mar 29, 2007 at 10:10:10AM -0700, Andrew Morton wrote:
> > Platform: s390
> > --
> > s390 prefers following layout:
> > 
> >int fallocate(int fd, loff_t offset, loff_t len, int mode)
> > 
> > For details on why and how "int, int, loff_t, loff_t" is a problem on
> > s390, please see Heiko's mail on 16th March. Here is the link:
> > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg133595.html
> > 
> > Platform: ppc, arm
> > --
> > ppc (32 bit) has a problem with "int, loff_t, loff_t, int" layout,
> > since this will result in a pad between fd and offset, making seven
> > arguments total - which is not supported by ppc32. It supports only
> > 6 arguments. Thus the desired layout by ppc32 is:
> > 
> >int fallocate(int fd, int mode, loff_t offset, loff_t len)
> > 
> > Even ARM prefers above kind of layout. For details please see the
> > definition of sys_arm_sync_file_range().
> 
> This is a clean-looking option.  Can s390 be changed to support seven-arg
> syscalls?

Wouldn't
int fallocate(loff_t offset, loff_t len, int fd, int mode)
work on both s390 and ppc/arm?  glibc will certainly wrap it and
reorder the arguments as needed, so there is no need to keep fd first.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21-rc4-mm1 4/4] sys_futex64 : allows 64bit futexes

2007-03-27 Thread Jakub Jelinek
On Wed, Mar 21, 2007 at 10:54:36AM +0100, [EMAIL PROTECTED] wrote:
> This last patch is an adaptation of the sys_futex64 syscall provided in -rt
> patch (originally written by Ingo Molnar). It allows the use of 64-bit futex.
> 
> I have re-worked most of the code to avoid the duplication of the code.
> 
> It does not provide the functionality for all architectures (only for x64 for 
> now).

I don't think you should blindly add all operations to sys_futex64 without
thinking what they really do.
E.g. FUTEX_{{,UN,TRY}LOCK,CMP_REQUEUE}_PI doesn't really make any sense for 
64-bit
futexes, the format of PI futexes is hardcoded in the kernel and is always
32-bit, see FUTEX_TID_MASK, FUTEX_WAITERS, FUTEX_OWNER_DIED definitions.
exit_robust_list/handle_futex_death will handle 32-bit PI futexes anyway.
Similarly, sys_futex64 shouldn't support the obsolete operations that
are there solely for compatibility (e.g. FUTEX_REQUEUE or FUTEX_FD).

When you just -ENOSYS on the PI ops, there is no need to implement
futex_atomic_cmpxchg_inatomic64.

FUTEX_WAKE_OP is questionable for 64-bit, IMHO it is better to just
-ENOSYS on it and only if anyone ever finds actual uses for it, add it.

For 64-bit futexes the only needed operations are actually
FUTEX_WAIT and perhaps FUTEX_CMP_REQUEUE, so I wonder if it isn't
better to just add FUTEX_WAIT64 and FUTEX_CMP_REQUEUE64 ops to sys_futex
instead of adding a new syscall.

But the FUTEX_{{,UN,TRY}LOCK,CMP_REQUEUE}_PI removal for 64-bit futexes
is IMHO the most important part of my complain.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: thread stacks and strict vm overcommit accounting

2007-03-16 Thread Jakub Jelinek
On Thu, Mar 15, 2007 at 11:08:40PM +, Hugh Dickins wrote:
> > appears to increase Committed_AS by around 200kb.  But we've committed to
> > providing it with 8MB for stack.
> > 
> > How come this is correct?
> 
> We've no more committed to providing each instance with 8MB of stack,
> than we've committed to providing each instance with RLIMIT_AS of
> address space.  The rlimits are limits, not commitments, surely?

RLIMIT_STACK only applies to the initial thread, POSIX threads have just
stack size attribute, not maximum thread stack size attribute.
If you set it explicitly with pthread_attr_setstacksize, then libpthread
will honor whatever thread stack size you want, otherwise it just uses
some default thread stack size.  This happens to be in NPTL derived
from RLIMIT_STACK value, but very well could be something else.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: SMP performance degradation with sysbench

2007-03-13 Thread Jakub Jelinek
On Tue, Mar 13, 2007 at 01:02:44PM +0100, Eric Dumazet wrote:
> On Tuesday 13 March 2007 12:42, Andrea Arcangeli wrote:
> 
> > My wild guess is that they're allocating memory after taking
> > futexes. If they do, something like this will happen:
> >
> >  taskA  taskB   taskC
> >  user lock
> > mmap_sem lock
> >  mmap sem -> schedule
> > user lock -> schedule
> >
> > If taskB wouldn't be there triggering more random trashing over the
> > mmap_sem, the lock holder wouldn't wait and task C wouldn't wait too.
> >
> > I suspect the real fix is not to allocate memory or to run other
> > expensive syscalls that can block inside the futex critical sections...
> 
> glibc malloc uses arenas, and trylock() only. It should not block because if 
> an arena is already locked, thread automatically chose another arena, and 
> might create a new one if necessary.

Well, only when allocating it uses trylock, free uses normal lock.
glibc malloc will by default use the same arena for all threads, only when
it sees contention during allocation it gives different threads different
arenas.  So, e.g. if mysql did all allocations while holding some global
heap lock (thus glibc wouldn't see any contention on allocation), but
freeing would be done outside of application's critical section, you would
see contention on main arena's lock in the free path.
Calling malloc_stats (); from e.g. atexit handler could give interesting
details, especially if you recompile glibc malloc with -DTHREAD_STATS=1.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: -freg-struct-return?

2007-02-22 Thread Jakub Jelinek
On Thu, Feb 22, 2007 at 12:09:04AM -0800, Jeremy Fitzhardinge wrote:
> Arjan van de Ven wrote:
> > Do we know how many gcc bugs this has? (regparm used to have many)
> > other than that.. sounds like a win...
> >   
> 
> The documentation suggests that its the preferred mode of operation, and
> that its the default on platforms where gcc is the primary compiler.  So
> the fact that it isn't for Linux suggests either an oversight or that it
> is actually broken...

It is used for Linux on many architectures (x86_64, sparc64, ia64,
ppc{,64}, arm, sh, m68k to name just a few), but it is an ABI decision,
so e.g. on i386 is not used by default as the ABI mandates structs/unions
are returned in memory.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: dvb shared datastructure bug?

2007-02-13 Thread Jakub Jelinek
On Tue, Feb 13, 2007 at 03:14:23PM +0400, Manu Abraham wrote:
> >thanks for pointing out this issue.
> >
> >attached find a patch that fixes the problem.
> >
> >@mauro - please pull changeset a7ac92d208fe
> >   dvbdev: fix illegal re-usage of fileoperations struct
> >
> >from  http://www.linuxtv.org/hg/~mws/v4l-dvb-fixtree
> >
> 
> Ack'd-by: Manu Abraham <[EMAIL PROTECTED]>

Wouldn't it be better to kmalloc both struct dvb_device and
struct file_operations together instead of doing 2 separate allocations?
struct dvd_device_plus_fops
{
  struct dvb_device dev;
  struct file_operations fops;
} *dev_fops = kmalloc (sizeof (struct dvd_device_plus_fops), GFP_KERNEL);
*pdvbdev = dvbdev = (struct dvb_device *)dev_fops;
if (dev_fops == NULL)
  error handling;
memset (&dev_fops->fops, 0, sizeof (dev_fops->fops));
...
dvbdev->fops = &dev_fops->fops;

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ANN] Userspace M-on-N threading model implementation. Alpha release.

2007-02-04 Thread Jakub Jelinek
On Sun, Feb 04, 2007 at 03:12:32PM -0500, Bill Davidsen wrote:
> Arjan van de Ven wrote:
> >>Because user threading can avoid context switches, there will always be 
> >>cases where it will outperform o/s threads for hardware reasons.
> >
> >actually.. switching from one "real" thread to another in Linux is not
> >an actual context switch in the hardware sense... at least this part of
> >your argument seems to be incorrect ;)
> >
> How does that work? Switching between kernel threads requires going into 
> the kernel, user level thread switches are all done in user mode.
> 
> Do you have some way to change o/s threads w/o going into the kernel?

But going into kernel is not very expensive on Linux.

On the other side, the overhead you need to add for every single syscall
that might block for the M:N threads and the associated complications
which make it far harder to conform to POSIX IMHO far outweight the costs
of going into the kernel for a context switch.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] Define the EF_AS_NO_RANDOM e_flag bit

2007-01-23 Thread Jakub Jelinek
On Wed, Jan 24, 2007 at 12:06:45AM +0300, Samium Gromoff wrote:
> Should we introduce per-arch asm/elf.h files to hold the relevant flag 
> definitions then?

On some architectures there are no bits left.  On others you'd need to go
through whomever maintains the relevant psABI to get a bit officially
allocated.  Really, it is very bad idea to use e_flags for this.

If all you care about is running setuid LISP programs, you'd much better put
your energy into fixing the buggy ELF dumper in it.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] Define the EF_AS_NO_RANDOM e_flag bit

2007-01-23 Thread Jakub Jelinek
On Tue, Jan 23, 2007 at 11:28:13PM +0300, Samium Gromoff wrote:
> Author: Samium Gromoff <[EMAIL PROTECTED]>
> Date:   Tue Jan 23 22:31:13 2007 +0300
> 
> Define the ELF binary header flag EF_AS_NO_RANDOM
> 
> EF_AS_NO_RANDOM should mean that the binary requests to not apply
> randomisation to address spaces of its processes.
> 
> diff --git a/include/linux/elf.h b/include/linux/elf.h
> index 60713e6..58ebb47 100644
> --- a/include/linux/elf.h
> +++ b/include/linux/elf.h
> @@ -172,6 +172,8 @@ typedef struct elf64_sym {
>  
>  #define EI_NIDENT  16
>  
> +#define EF_AS_NO_RANDOM 0x1/* do not randomise the address space */
> +

You can't make up EF_* flags this way, they are arch specific, the LSB bit
(but many others too) are already used on many architectures.
E.g.:
elf/mt.h:#define EF_MT_CPU_MRISC  0x0001  /* default */
elf/sparc.h:#define EF_SPARCV9_PSO0x1 /* partial store 
ordering */
elf/bfin.h:#define EF_BFIN_PIC0x0001  /* -fpic */
elf/alpha.h:#define EF_ALPHA_32BIT0x0001
elf/mips.h:#define EF_MIPS_NOREORDER  0x0001
elf/m68k.h:#define EF_M68K_CF_ISA_A_NODIV 0x01  /* ISA A except for div */
elf/sh.h:#define EF_SH1  1
elf/arm.h:#define EF_ARM_RELEXEC 0x01
elf/cris.h:#define EF_CRIS_UNDERSCORE 0x0001
elf/ia64.h:#define EF_IA_64_TRAPNIL (1 << 0)  /* Trap NIL pointer dereferences. 
 */
elf/vax.h:#define EF_VAX_NONPIC   0x0001  /* Object contains 
non-PIC code */
elf/iq2000.h:#define EF_IQ2000_CPU_IQ2000 0x0001  /* default */
elf/frv.h:#define EF_FRV_GPR_32   0x0001  /* -mgpr-32 */
to name just a few.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.20-rc4 4/4][RFC] sys_futex64 : allows 64bit futexes

2007-01-11 Thread Jakub Jelinek
On Tue, Jan 09, 2007 at 05:25:26PM +0100, Pierre Peiffer wrote:
> This latest patch is an adaptation of the sys_futex64 syscall provided in 
> -rt
> patch (originally written by Ingo). It allows the use of 64bit futex.
> 
> I have re-worked most of the code to avoid the duplication of the code.
> 
> It does not provide the functionality for all architectures, and thus, it 
> can
> not be applied "as is".
> But, again, feedbacks and comments are welcome.

Why do you support all operations for 64-bit futexes?
IMHO PI futexes don't make sense for 64-bit futexes, PI futexes have
hardcoded bit layout of the 32-bit word.  Similarly, FUTEX_WAKE
is not really necessary for 64-bit futexes, 32-bit futex's FUTEX_WAKE
can wake it equally well (it never reads anything, all it cares
is about the futex's address).  Similarly, I don't see a need for
FUTEX_WAKE_OP (and this could simplify the patch quite a lot, no
need to change asm*/futex.h headers at all).
All that's needed is 64-bit FUTEX_WAIT and perhaps FUTEX_CMP_REQUEUE.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.20-rc4 1/4] futex priority based wakeup

2007-01-10 Thread Jakub Jelinek
On Wed, Jan 10, 2007 at 12:47:21PM +0100, Pierre Peiffer wrote:
> So, yes it (logically) has a cost, depending of the number of different 
> priorities used, so it's specially measurable with real-time threads.
> With SCHED_OTHER, I suppose that the priorities are not be very distributed.
> 
> May be, supposing it makes sense to respect the priority order only for 
> real-time pthreads, I can register all SCHED_OTHER threads to the same 
> MAX_RT_PRIO priotity ?
> Or do you think this must be set behind a CONFIG* option ?
> (Or finally not interesting enough for mainline ?)

As soon as there is at least one non-SCHED_OTHER thread among the waiters,
there is no question about whether plist should be used or not, that's
a correctness issue and if we want to conform to POSIX, we have to use that.

I guess Ulrich's question was mainly about performance differences
with/without plist wakeup when all threads are SCHED_OTHER.  I'd say for
that a pure pthread_mutex_{lock,unlock} benchmark or even just a program
which uses futex FUTEX_WAIT/FUTEX_WAKE in a bunch of threads would be
better.

In the past we talked with Ingo about the possibilities here, one is use
plist always and prove that it doesn't add measurable overhead over current
FIFO (when only SCHED_OTHER is involved), the other possibility would be
to start using FIFOs as before, but when the first non-SCHED_OTHER thread
decides to wait on the futex, switch it to plist wakeup mode (convert the
FIFO into a plist) and from that point on just use plist wakeups on it.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Intel Core Duo/Duo2 T2300/E6400 - Hyper-Threading (the absence of)

2007-01-08 Thread Jakub Jelinek
On Mon, Jan 08, 2007 at 01:44:32AM -0800, Robin H. Johnson wrote:
> (Please CC me, I am not subscribed to LKML [I have set the
> Mail-Followup-To header accordingly]).
> 
> On two of my new machines, with Intel Core Duo T2300 and Core2 Duo E6400
> chips respectively, I noticed some weirdness in how many CPUs are
> present. 
> 
> If the hyper-threading bit is present in the CPU info, should there
> always be a an extra CPU presented to the system per physical core?

No.  The ht flag just says whether HT reporting via CPUID is supported.
Core2 Duo E6400 is AFAIK not hyper-threaded, you just have 2 real sibling
CPUs (except that they share L2 cache).

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kernel + gcc 4.1 = several problems

2007-01-03 Thread Jakub Jelinek
On Wed, Jan 03, 2007 at 05:32:16AM -0800, Arjan van de Ven wrote:
> On Wed, 2007-01-03 at 12:44 +, Alan wrote:
> > > > fixed. At that point an i686 kernel would contain i686 instructions and
> > > > actually run on all i686 processors ending all the i586 pain for most
> > > > users and distributions.
> > > 
> > > Could you explain why CMOV is pointless now? Are there any benchmarks 
> > > proving that?
> > 
> > Take a look at the recent ffmpeg bits on the mplayer list for one example
> > I have to hand - P4 cmov is pretty slow. The crypto folks find the same
> > things.
> 
> cmov is effectively the same cost as a compare and jump, in both cases
> the cpu needs to do a prediction, and on a mispredict, restart.
> 
> the reason cmov can make sense is because it's smaller code...

BTW, from GCC POV availability of CMOV is the only difference between
-march=i586 -mtune=something and -march=i686 -mtune=something.  So this is
just a naming thing, it could be called -march=i686cmov to make it more
obvious but it is too late (and too unimportant) to change it now.
Perhaps adding a note to info gcc/man gcc ought to be enough?
If you don't want CMOV being emitted, compile with -march=i586 -mtune=generic
(or whatever other tuning you pick up), with -march=i686 -mtune=generic
you tell GCC you have CMOV.  Whether CMOV is actually used in generated
code is another matter, which should be decided based on the selected
-mtune.  For -Os CMOV should be used whenever available, as that means
usually smaller code, otherwise if on some particular chip CMOV is actually
slower than compare, jump and assignment, then CMOV should not be selected
for that particular tuning (say if Pentium4 has slower CMOV than
compare+jump+assignment, -mtune=pentium4 should not emit CMOV, at least not
often), if you have examples of that, please file a bug to
http://gcc.gnu.org/bugzilla/.  -mtune=generic should emit resp. not emit
CMOV depending on whether it is a win on the currently common CPUs.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-30 Thread Jakub Jelinek
On Fri, Dec 01, 2006 at 08:28:16AM +0100, Willy Tarreau wrote:
> Oh, I'm perfectly aware of this. That's in part why I started the hotfix
> branch in the past :-) But sometimes, fixes consist in merging all the
> patches from the maintenance branch (eg: from 4.1.0 to 4.1.1), and if
> this is the case, there would not be much justification not to simply
> update the version. In fact, what's really missing is a "fixlevel" in
> the packages, to inform the user that 4.1.0 as shipped by the distro
> has the same level of fixes as 4.1.1. But this is what the version is
> used for today.

This is even more complicated by the fact that upstream GCC release branches
(and also several Linux distributors) start announcing the upcoming version
already a few days after a release is tagged.
E.g. 14 days old gcc-4_1-branch says:
./xgcc -B ./ --version; ./xgcc -B ./ -dD -E -xc /dev/null | grep GNU
xgcc (GCC) 4.1.2 20061114 (prerelease)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

#define __GNUC__ 4
#define __GNUC_MINOR__ 1
#define __GNUC_PATCHLEVEL__ 2

but GCC 4.1.2 has not been released yet.
In Fedora Core/RHEL and I think a few other distros the version number
is only changed when it is officially released, e.g.:

gcc --version; gcc -dD -E -xc /dev/null | grep GNU
gcc (GCC) 4.1.1 20061011 (Red Hat 4.1.1-30)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

#define __GNUC__ 4
#define __GNUC_MINOR__ 1
#define __GNUC_PATCHLEVEL__ 1
#define __GNUC_RH_RELEASE__ 30

Note, 4.1.1 was released end of May this year and 4.1.2 has not been
released.  So, using __GNUC_PATCHLEVEL__ to detect if a bug has been fixed
or not isn't very useful (you'd need to rule out also __GNUC_PATCHLEVEL__ <= 1
because gcc-4_1-branch was announcing that patchlevel already since
beggining of March, on the other side there is a lot of GCCs with
__GNUC_PATCHLEVEL__ == 1 that certainly have that bug fixed).

You perhaps could parse the prerelease vs. release vs. vendor strings,
but that could be quite difficult, perhaps easier would be just parse the
date in the --version output.  Checking for the bug is best though, because
that will catch even backports of the bugfix without rebasing from the
release branch.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 4/5][AIO] - AIO completion signal notification

2006-11-29 Thread Jakub Jelinek
On Wed, Nov 29, 2006 at 11:33:01AM +0100, S?bastien Dugu? wrote:
>   AIO completion signal notification
> 
>   The current 2.6 kernel does not support notification of user space via
> an RT signal upon an asynchronous IO completion. The POSIX specification
> states that when an AIO request completes, a signal can be delivered to
> the application as notification.
> 
>   This patch adds a struct sigevent *aio_sigeventp to the iocb.
> The relevant fields (pid, signal number and value) are stored in the kiocb
> for use when the request completes.
> 
>   That sigevent structure is filled by the application as part of the AIO
> request preparation. Upon request completion, the kernel notifies the
> application using those sigevent parameters. If SIGEV_NONE has been specified,
> then the old behaviour is retained and the application must rely on polling
> the completion queue using io_getevents().

Well, from what I see applications must rely on polling the completion
queue using io_getevents() in any case, isn't that the only way how to free
the kernel resources associated with the AIO request, even if it uses
SIGEV_SIGNAL or thread notification?  aio_error/aio_return/aio_suspend
will still need to io_getevents, the only important difference with this
patch is that a) the polling doesn't need to be asynchronous (i.e. have
a special thread which just loops doing io_getevents)
b) it doesn't need to care about notification itself.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-29 Thread Jakub Jelinek
On Wed, Nov 29, 2006 at 02:56:20PM +1100, Keith Owens wrote:
> Nicholas Miell (on Tue, 28 Nov 2006 19:08:25 -0800) wrote:
> >On Wed, 2006-11-29 at 13:22 +1100, Keith Owens wrote:
> >> Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux),
> >> wait_hpet_tick is optimized away to a never ending loop and the kernel
> >> hangs on boot in timer setup.
> >> 
> >> 001a :
> >>   1a:   55  push   %ebp
> >>   1b:   89 e5   mov%esp,%ebp
> >>   1d:   eb fe   jmp1d 
> >> 
> >> This is not a problem with gcc 3.3.5.  Adding barrier() calls to
> >> wait_hpet_tick does not help, making the variables volatile does.
> >> 
> >> Signed-off-by: Keith Owens 
> >> 
> >> ---
> >>  arch/i386/kernel/time_hpet.c |2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> Index: linux-2.6/arch/i386/kernel/time_hpet.c
> >> ===
> >> --- linux-2.6.orig/arch/i386/kernel/time_hpet.c
> >> +++ linux-2.6/arch/i386/kernel/time_hpet.c
> >> @@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
> >>   */
> >>  static void __devinit wait_hpet_tick(void)
> >>  {
> >> -  unsigned int start_cmp_val, end_cmp_val;
> >> +  unsigned volatile int start_cmp_val, end_cmp_val;
> >>  
> >>start_cmp_val = hpet_readl(HPET_T0_CMP);
> >>do {
> >
> >When you examine the inlined functions involved, this looks an awful lot
> >like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278
> >
> >Perhaps SUSE should fix their gcc instead of working around compiler
> >problems in the kernel?
> 
> Firstly, the fix for 22278 is included in gcc 4.1.0.

This actually sounds more like http://gcc.gnu.org/PR27236
And that one is broken in 4.1.0, fixed in 4.1.1.

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] work around gcc4 issue with -Os in Dwarf2 stack unwind code

2006-11-28 Thread Jakub Jelinek
On Tue, Nov 28, 2006 at 02:48:15PM +, Jan Beulich wrote:
> I disagree - the standard says there's a sequence point at a function
> call after evaluating all function arguments. To me this means that any

That's true, that sequence point makes sure e.g. all side effects such as
pre-{dec,inc}rement on the arguments happen before the call.
But as I said, no sequence point demands any particular ordering of
evaluation of the LHS and RHS of +=.

> (parts of an) expression the function call is contained in must be
> evaluated after the function call. Otherwise it would be illegal to e.g.
> modify a variable in both operands of && or ||.

That's different, there is a sequence point at the end of the first operand
of &&, ||, ?: and , operators (second bullet in ISO C99 Annex C).

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] work around gcc4 issue with -Os in Dwarf2 stack unwind code

2006-11-28 Thread Jakub Jelinek
On Tue, Nov 28, 2006 at 02:12:24PM +, Jan Beulich wrote:
> This fixes a problem with gcc4 mis-compiling the stack unwind code under
> -Os, which resulted in 'stuck' messages whenever an assembly routine was
> encountered.

"mis-compiling" and "work around" are wrong words, the code had undefined
behavior (there is no sequence point between evaluation of ptr and
get_uleb128(&ptr, end) and ptr is modified twice, so the compiler can
evaluate it e.g. as:
temp = ptr;
temp = temp + get_uleb128(&ptr, end);
ptr = temp;
or
temp = get_uleb128(&ptr, end);
ptr += temp;
While gcc has some warnings for sequence point semantics violations
(-Wsequence-point), this can't be one of the cases at least until IPA moves
much further, because get_uleb128 might very well not modify the variable
and at that point the code would be ok).

> Signed-off-by: Jan Beulich <[EMAIL PROTECTED]>
> 
> --- linux-2.6.19-rc6/kernel/unwind.c  2006-11-22 14:54:10.0 +0100
> +++ 2.6.19-rc6-unwind-stuck/kernel/unwind.c   2006-11-28 15:02:15.0 
> +0100
> @@ -938,8 +938,11 @@ int unwind(struct unwind_frame_info *fra
>   else {
>   retAddrReg = state.version <= 1 ? *ptr++ : 
> get_uleb128(&ptr, end);
>   /* skip augmentation */
> - if (((const char *)(cie + 2))[1] == 'z')
> - ptr += get_uleb128(&ptr, end);
> + if (((const char *)(cie + 2))[1] == 'z') {
> + uleb128_t augSize = get_uleb128(&ptr, end);
> +
> + ptr += augSize;
> + }
>   if (ptr > end
>  || retAddrReg >= ARRAY_SIZE(reg_info)
>  || REG_INVALID(retAddrReg)

Jakub
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   >