Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics
On Fri, Jul 07, 2023 at 10:04:06AM -0400, Olivier Dion wrote:
> On Tue, 04 Jul 2023, Peter Zijlstra wrote:
> > On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
> [...]
> >> On x86-64 (gcc 13.1 -O2) we get:
> >>
> >>   t0():
> >>           movl    $1, x(%rip)
> >>           movl    $1, %eax
> >>           xchgl   dummy(%rip), %eax
> >>           lock orq $0, (%rsp)     ;; Redundant with previous exchange.
> >>           movl    y(%rip), %eax
> >>           movl    %eax, r0(%rip)
> >>           ret
> >>   t1():
> >>           movl    $1, y(%rip)
> >>           lock orq $0, (%rsp)
> >>           movl    x(%rip), %eax
> >>           movl    %eax, r1(%rip)
> >>           ret
> >
> > So I would expect the compilers to do better here. It should know those
> > __atomic_thread_fence() thingies are superfluous and simply not emit
> > them. This could even be done as a peephole pass later, where it sees
> > consecutive atomic ops and the second being a no-op.
>
> Indeed, a peephole optimization could work for this Dekker, if the
> compiler adds the pattern for it. However, AFAIK, a peephole cannot be
> applied when the two fences are in different basic blocks. For example,
> only emitting a fence on a compare_exchange success. This limitation
> implies that the optimization cannot be done across functions/modules
> (shared libraries).

LTO FTW :-)

> For example, it would be interesting to be able to
> promote an acquire fence of a pthread_mutex_lock() to a full fence on
> weakly ordered architectures while preventing a redundant fence on
> strongly ordered architectures.

That's a very non-trivial thing to do. I know Linux has
smp_mb__after_spinlock() and that x86 has it as a no-op, but even on x86
adding a full fence after a lock has observable differences IIRC.

Specifically, the actual store that acquires the lock is not well
ordered vs the critical section itself for non-trivial spinlock
implementations (notably qspinlock).

For RCU you mostly care about RCsc locks (IIRC), and upgrading unlock is
a 'simpler' (IMO) approach to achieve that (which is what RCU does with
smp_mb_after_unlock_lock()).
> We know that at least Clang has such peephole optimizations for some
> architecture backends. It seems however that they do not recognize
> lock-prefixed instructions as fences.

They seem confused in general for emitting MFENCE.

> AFAIK, GCC does not have that kind of optimization.
>
> We are also aware that some research has been done on this topic [0].
> The idea is to use PRE for elimination of redundant fences. This would
> work across multiple basic blocks, although the paper focuses on
> intra-procedural eliminations. However, it seems that the latest work
> on that [1] has never been completed [2].
>
> Our proposed approach provides a means for the user to express -- and
> document -- the wanted semantics in the source code. This allows the
> compiler to only emit wanted fences, therefore not relying on
> architecture-specific backend optimizations. In other words, this
> applies even on unoptimized binaries.

I'm not a tools person, but if I were, I'd be very hesitant to add
__builtin functions that 'conflict'/'overlap' with what an optimizer
should be able to do. Either way around you need work done on the
compilers, and I'm thinking 'fixing' the optimizer will benefit far
more people than adding __builtins.

Then again, I'm not a tools person, so you don't need to convince me.
But one of the selling points of the whole Atomics as a language
feature was that whole optimizer angle. Otherwise you might as well do
as we do, inline asm the world.

I'll shut up now, thanks for that PRE reference [0], that seems a fun
read for when I'm bored.
Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics
On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
> int x = 0;
> int y = 0;
> int r0, r1;
>
> int dummy;
>
> void t0(void)
> {
>         __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
>
>         __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
>         __atomic_thread_fence(__ATOMIC_SEQ_CST);
>
>         r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
> }
>
> void t1(void)
> {
>         __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
>         __atomic_thread_fence(__ATOMIC_SEQ_CST);
>         r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
> }
>
> // BUG_ON(r0 == 0 && r1 == 0)
>
> On x86-64 (gcc 13.1 -O2) we get:
>
>   t0():
>           movl    $1, x(%rip)
>           movl    $1, %eax
>           xchgl   dummy(%rip), %eax
>           lock orq $0, (%rsp)     ;; Redundant with previous exchange.
>           movl    y(%rip), %eax
>           movl    %eax, r0(%rip)
>           ret
>   t1():
>           movl    $1, y(%rip)
>           lock orq $0, (%rsp)
>           movl    x(%rip), %eax
>           movl    %eax, r1(%rip)
>           ret

So I would expect the compilers to do better here. It should know those
__atomic_thread_fence() thingies are superfluous and simply not emit
them. This could even be done as a peephole pass later, where it sees
consecutive atomic ops and the second being a no-op.

> On x86-64 (clang 16 -O2) we get:
>
>   t0():
>           movl    $1, x(%rip)
>           movl    $1, %eax
>           xchgl   %eax, dummy(%rip)
>           mfence                  ;; Redundant with previous exchange.

And that's just terrible :/ Nobody should be using MFENCE for this. And
using MFENCE after a LOCK-prefixed instruction (implicit in this case)
is just fail, because I don't think C++ atomics cover MMIO and other
such 'lovely' things.

>           movl    y(%rip), %eax
>           movl    %eax, r0(%rip)
>           retq
>   t1():
>           movl    $1, y(%rip)
>           mfence
>           movl    x(%rip), %eax
>           movl    %eax, r1(%rip)
>           retq
Re: GCC 12 miscompilation of volatile asm (was: Re: [PATCH] arm64/io: Remind compiler that there is a memory side effect)
On Tue, Apr 05, 2022 at 01:51:30PM +0100, Mark Rutland wrote:
> Hi all,
>
> [adding kernel folk who work on asm stuff]
>
> As a heads-up, GCC 12 (not yet released) appears to erroneously optimize away
> calls to functions with volatile asm. Szabolcs has raised an issue on the GCC
> bugzilla:
>
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160
>
> ... which is a P1 release blocker, and is currently being investigated.
>
> Jeremy originally reported this as an issue with {readl,writel}_relaxed(), but
> the underlying problem doesn't have anything to do with those specifically.
>
> I'm dumping a bunch of info here largely for posterity / archival, and to find
> out who (from the kernel side) is willing and able to test proposed compiler
> fixes, once those are available.
>
> I'm happy to do so for aarch64; Peter, I assume you'd be happy to look at the
> x86 side?

Sure..
Re: Re: typeof and operands in named address spaces
On Mon, Nov 16, 2020 at 12:23:17PM +, Uecker, Martin wrote: > > > > Another way to drop qualifiers is using a cast. So you > > > > can use typeof twice: > > > > > > > > typeof((typeof(_var))_var) tmp__; > > > > > > > > This also works for non-scalars but this is a GCC extension. > > (That casts drop qualifiers is standard C. The extensions > are 'typeof' and that casts can be used for non-scalar types.) Ah, I'll clarify. Thanks!
Re: Re: typeof and operands in named address spaces
On Mon, Nov 16, 2020 at 12:10:56PM +0100, Peter Zijlstra wrote: > > > Another way to drop qualifiers is using a cast. So you > > > can use typeof twice: > > > > > > typeof((typeof(_var))_var) tmp__; > > > > > > This also works for non-scalars but this is a GCC extension. FWIW, clang seems to support this extension as well..
Re: Re: typeof and operands in named address spaces
( restoring at least linux-toolcha...@vger.kernel.org, since that seems
to have gone missing )

On Mon, Nov 16, 2020 at 10:11:50AM +0100, Richard Biener wrote:
> On Sun, Nov 15, 2020 at 11:53 AM Uecker, Martin wrote:
> > > On Wed, Nov 04, 2020 at 07:31:42PM +0100, Uros Bizjak wrote:
> > > > Hello!
> > > >
> > > > I was looking at the recent linux patch series [1] where segment
> > > > qualifiers (named address spaces) were introduced to handle percpu
> > > > variables. In the patch [2], the author mentions that:
> > > >
> > > > --q--
> > > > Unfortunately, gcc does not provide a way to remove segment
> > > > qualifiers, which is needed to use typeof() to create local instances
> > > > of the per-cpu variable. For this reason, do not use the segment
> > > > qualifier for per-cpu variables, and do casting using the segment
> > > > qualifier instead.
> > > > --/q--
> > >
> > > C in general does not provide means to strip qualifiers. We recently had
> > > a _lot_ of 'fun' trying to strip volatile from a type, see here:
> > >
> > >   https://lore.kernel.org/lkml/875zimp0ay@mpe.ellerman.id.au
> > >
> > > which resulted in the current __unqual_scalar_typeof() hack.
> > >
> > > If we're going to do compiler extensions here, can we pretty please have
> > > a sane means of modifying qualifiers in general?
> >
> > Another way to drop qualifiers is using a cast. So you
> > can use typeof twice:
> >
> >   typeof((typeof(_var))_var) tmp__;
> >
> > This also works for non-scalars but this is a GCC extension.
> >
> > WG14 plans to standardize typeof. I would like to hear opinions
> > whether we should have typeof drop qualifiers or not.
> >
> > Currently, it does not do this on all compilers I tested
> > (except _Atomic on GCC) and there are also use cases for
> > keeping qualifiers. This is an argument for keeping qualifiers
> > should we standardize it, but then we need a way to drop
> > qualifiers.
> >
> > lvalue conversion drops qualifiers in C.
> > In GCC, this is not
> > implemented correctly as it is unobservable in standard C
> > (but it is observable using typeof).
> >
> >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97702
> >
> > I have a working patch in preparation to change this. Then you
> > could use
> >
> >   typeof( ((void)0, x) )

Neat, that actually already works with clang. And I suppose we can use
the above GCC extension until such time as that GCC is fixed. See
below..

> > to drop qualifiers. But this would then
> > also do array-to-pointer conversion. I am not sure
> > whether this is a problem.

I don't _think_ so, but..

> > Of course, we could also introduce a new feature for
> > dropping qualifiers. Thoughts?
>
> Just add a new qualifier that un-qualifies?
>
>   _Unqual volatile T x;
>
> is T with volatile (eventually) removed. Or just a way to drop
> all using _Unqual?
>
>   _Unqual T x;
>
> removing all qualifiers from T. Or add a special _Unqual_all
> to achieve that. I think removing a specific qualification is
> useful. Leaves cases like
>
>   _Unqual volatile volatile T x;
>
> to be specified (that is ordering and cancellation of the
> unqual and qual variants of qualifiers).

I rather like this, however I think I'd prefer the syntax be something
like:

  _Unqual T x;

for removing all qualifiers, and:

  _Unqual(volatile) volatile T X;

for stripping specific qualifiers. The syntax as proposed above seems
very error prone to me.

---
Subject: compiler: Improve __unqual_typeof()

Improve our __unqual_scalar_typeof() implementation by relying on C
dropping qualifiers for lvalue conversions. There is one small catch in
that GCC is currently known broken in this respect, however it happens
to have a C language extension that achieves the very same, it drops
qualifiers on casts.

This gets rid of the _Generic() usage and should improve compile times
(less preprocessor output) as well as increase the capabilities of the
macros.
XXX: I've only verified the below actually compiles, I've not verified
the generated code is actually 'correct'.

Suggested-by: "Uecker, Martin"
Signed-off-by: Peter Zijlstra (Intel)
---
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 74c6c0486eed..3c5cb52c12f9 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -156,3 +156,11 @@
 #else
 #define __diag_GCC_8(s)
 #endif
+
+/*
+ * GCC has a bug where lvalue conversion
Re: typeof and operands in named address spaces
On Tue, Nov 10, 2020 at 10:42:58AM -0800, Nick Desaulniers wrote:
> When I think of qualifiers, I think of const and volatile. I'm not
> sure why the first post I'm cc'ed on talks about "segment" qualifiers.
> Maybe it's in reference to a variable attribute that the kernel
> defines? Looking at Clang's Qualifier class, I see const, volatile,
> restrict (ah, right), some Objective-C stuff, and address space
> (TR18037 is referenced, I haven't looked up what that is) though maybe
> "segment" pseudo qualifiers the kernel defines expand to address space
> variable attributes?

Right, x86 Named Address Space:

  https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Named-Address-Spaces.html#Named-Address-Spaces

Also, Google found me this:

  https://reviews.llvm.org/D64676

The basic problem seems to be they act exactly like qualifiers in that
typeof() preserves them, so if you have:

( and now I realize the parent isn't Cc'd to LKML, find here:
  https://gcc.gnu.org/pipermail/gcc/2020-November/234119.html )

> --cut here--
> #define foo(_var)                                       \
> ({                                                      \
>         typeof(_var) tmp__;                             \
>         asm ("mov %1, %0" : "=r"(tmp__) : "m"(_var));   \
>         tmp__;                                          \
> })
>
> __seg_fs int x;
>
> int test (void)
> {
>   int y;
>
>   y = foo (x);
>   return y;
> }
> --cut here--
>
> when compiled with -O2 for x86 target, the compiler reports:
>
> pcpu.c: In function 'test':
> pcpu.c:14:3: error: '__seg_fs' specified for auto variable 'tmp__'

> Maybe stripping all qualifiers is fine since you can add them back in
> if necessary?

So far that seems sufficient. Although the Devil's advocate in me is
trying to construct a case where we need to preserve const but strip
volatile, and that then means we need to detect if the original has
const or not, because unconditionally adding it will be wrong.
Re: typeof and operands in named address spaces
On Mon, Nov 09, 2020 at 11:50:15AM -0800, Nick Desaulniers wrote:
> On Mon, Nov 9, 2020 at 11:46 AM Segher Boessenkool wrote:
> >
> > On Mon, Nov 09, 2020 at 01:47:13PM +0100, Peter Zijlstra wrote:
> > >
> > > + lots of people and linux-toolchains
> > >
> > > On Wed, Nov 04, 2020 at 07:31:42PM +0100, Uros Bizjak wrote:
> > > > Hello!
> > > >
> > > > I was looking at the recent linux patch series [1] where segment
> > > > qualifiers (named address spaces) were introduced to handle percpu
> > > > variables. In the patch [2], the author mentions that:
> > > >
> > > > --q--
> > > > Unfortunately, gcc does not provide a way to remove segment
> > > > qualifiers, which is needed to use typeof() to create local instances
> > > > of the per-cpu variable. For this reason, do not use the segment
> > > > qualifier for per-cpu variables, and do casting using the segment
> > > > qualifier instead.
> > > > --/q--
> > >
> > > C in general does not provide means to strip qualifiers.
> >
> > Most ways you can try to use the result are undefined behaviour, even.
>
> Yes, removing `const` from a `const` declared variable (via cast) then
> expecting to use the result is a great way to have clang omit the use
> from the final program. This has bitten us in the past getting MIPS
> support up and running, and one of the MTK gfx drivers.

Stripping const to declare another variable is useful though.

Sure C has sharp edges, esp. if you cast stuff, but since when did that
stop anyone ;-)

The point is, C++ has these very nice template helpers that can strip
qualifiers, I want that too, for much of the same reasons. We might not
have templates :-(, but we've become very creative with our
pre-processor. Surely our __unqual_scalar_typeof() cries for a better
solution.
Re: typeof and operands in named address spaces
On Mon, Nov 09, 2020 at 01:38:51PM -0600, Segher Boessenkool wrote:
> On Mon, Nov 09, 2020 at 01:47:13PM +0100, Peter Zijlstra wrote:
> >
> > + lots of people and linux-toolchains
> >
> > On Wed, Nov 04, 2020 at 07:31:42PM +0100, Uros Bizjak wrote:
> > > Hello!
> > >
> > > I was looking at the recent linux patch series [1] where segment
> > > qualifiers (named address spaces) were introduced to handle percpu
> > > variables. In the patch [2], the author mentions that:
> > >
> > > --q--
> > > Unfortunately, gcc does not provide a way to remove segment
> > > qualifiers, which is needed to use typeof() to create local instances
> > > of the per-cpu variable. For this reason, do not use the segment
> > > qualifier for per-cpu variables, and do casting using the segment
> > > qualifier instead.
> > > --/q--
> >
> > C in general does not provide means to strip qualifiers.
>
> Most ways you can try to use the result are undefined behaviour, even.
>
> > We recently had
> > a _lot_ of 'fun' trying to strip volatile from a type, see here:
> >
> >   https://lore.kernel.org/lkml/875zimp0ay@mpe.ellerman.id.au
> >
> > which resulted in the current __unqual_scalar_typeof() hack.
> >
> > If we're going to do compiler extensions here, can we pretty please have
> > a sane means of modifying qualifiers in general?
>
> What do you want to do with it? It may be more feasible to do a
> compiler extension for *that*.

Like with the parent use-case it's pretty much always declaring
temporaries in macros. We don't want the temporaries to be volatile,
or as the parent post points out, to have a segment qualifier.
Re: typeof and operands in named address spaces
+ lots of people and linux-toolchains

On Wed, Nov 04, 2020 at 07:31:42PM +0100, Uros Bizjak wrote:
> Hello!
>
> I was looking at the recent linux patch series [1] where segment
> qualifiers (named address spaces) were introduced to handle percpu
> variables. In the patch [2], the author mentions that:
>
> --q--
> Unfortunately, gcc does not provide a way to remove segment
> qualifiers, which is needed to use typeof() to create local instances
> of the per-cpu variable. For this reason, do not use the segment
> qualifier for per-cpu variables, and do casting using the segment
> qualifier instead.
> --/q--

C in general does not provide means to strip qualifiers. We recently had
a _lot_ of 'fun' trying to strip volatile from a type, see here:

  https://lore.kernel.org/lkml/875zimp0ay@mpe.ellerman.id.au

which resulted in the current __unqual_scalar_typeof() hack.

If we're going to do compiler extensions here, can we pretty please have
a sane means of modifying qualifiers in general?
Re: Broken check rejecting -fcf-protection and -mindirect-branch=thunk-extern
On Tue, Apr 28, 2020 at 02:41:33PM +0100, Andrew Cooper wrote:
> It's fine to focus on userspace first, but the kernel is far more simple.
>
> Looking at that presentation, the only thing missing for kernel is the
> notrack thunks, in the unlikely case that such code would be tolerated
> (Frankly, I don't expect Xen or Linux to run with notrack enabled, as
> there is no legacy code to be concerned with).

Uhhh.. ftrace and kretprobes play dodgy games with the return stack,
doesn't that make the CET thing slightly more interesting?
Re: [PATCH] tell gcc optimizer to never introduce new data races
On Tue, Jun 10, 2014 at 05:04:55PM +0200, Marek Polacek wrote:
> On Tue, Jun 10, 2014 at 04:53:27PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote:
> > > +# Tell gcc to never replace conditional load with a non-conditional one
> > > +KBUILD_CFLAGS   += $(call cc-option,--param allow-store-data-races=0)
> > > +
> >
> > Why do we not want: -fmemory-model=safe? And should we not at the very
> > least also disable packed-store-data-races?
>
> Note that the option does not exist, even though it is mentioned in the
> documentation.

Urgh.. ok. Any word on the packed-store-data thing?
Re: [PATCH] tell gcc optimizer to never introduce new data races
On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote:
> +# Tell gcc to never replace conditional load with a non-conditional one
> +KBUILD_CFLAGS   += $(call cc-option,--param allow-store-data-races=0)
> +

Why do we not want: -fmemory-model=safe? And should we not at the very
least also disable packed-store-data-races?
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 19, 2014 at 12:07:02PM +0100, Torvald Riegel wrote:
> > It's not only hardware; also the kernel/user boundary has this same
> > problem. We cannot a priori say what userspace will do; in fact, because
> > we're a general purpose OS, we must assume it will willfully try its
> > bestest to wreck whatever assumptions we make about its behaviour.
>
> That's a good note, and I think a distinct case from those below,
> because here you're saying that you can't assume that the userspace code
> follows the C11 semantics ...

Right; we can malfunction in those cases though; as long as the
malfunctioning happens on the userspace side. That is, whatever
userspace does should not cause the kernel to crash, but userspace
crashing itself, or getting crap data or whatever, is its own damn
fault for not following expected behaviour.

To stay on topic; if the kernel/user interface requires memory ordering
and userspace explicitly omits the barriers, all malfunctioning should
be on the user. For instance it might lose a forward progress guarantee
or data integrity guarantees.

In specific, given a kernel/user lockless producer/consumer buffer, if
the user side allows the tail write to happen before its data reads are
complete, the kernel might overwrite the data it's still reading.

Or in case of futexes, if the user side doesn't use the appropriate
operations, its lock state gets corrupt, but only userspace should
suffer.

But yes, this does require some care and consideration from our side.
Re: [RFC][PATCH 0/5] arch: atomic rework
> > 4. Some drivers allow user-mode code to mmap() some of their
> > state. Any changes undertaken by the user-mode code would
> > be invisible to the compiler.
>
> A good point, but a compiler that doesn't try to (incorrectly) assume
> something about the semantics of mmap will simply see that the mmap'ed
> data will escape to stuff it can't analyze, so it will not be able to
> make a proof.
>
> This is different from, for example, malloc(), which is guaranteed to
> return "fresh" nonaliasing memory.

The kernel side of this is different.. it looks like 'normal' memory,
we just happen to allow it to end up in userspace too.

But on that point; how do you tell the compiler the difference between
malloc() and mmap()? Is that some function attribute?
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, Feb 18, 2014 at 10:21:56PM +0100, Torvald Riegel wrote:
> Yes, I do. But that seems to be "volatile" territory. It crosses the
> boundaries of the abstract machine, and thus is input/output. Which
> fraction of your atomic accesses can read values produced by hardware?
> I would still suppose that lots of synchronization is not affected by
> this.

It's not only hardware; also the kernel/user boundary has this same
problem. We cannot a priori say what userspace will do; in fact,
because we're a general purpose OS, we must assume it will willfully
try its bestest to wreck whatever assumptions we make about its
behaviour.

We also have loadable modules -- much like regular userspace DSOs -- so
there too we cannot say what will or will not happen.

We also have JITs that generate code on demand.

And I'm absolutely sure (with the exception of the JITs, it's not an
area I've worked on) that we have atomic usage across all those
boundaries.

I must agree with Linus, global state driven optimizations are crack
brained; esp. for atomics. We simply cannot know all state at compile
time. The best we can hope for are local optimizations.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, Feb 18, 2014 at 10:21:56PM +0100, Torvald Riegel wrote:
> Well, that's how atomics that aren't volatile are defined in the
> standard. I can see that you want something else too, but that doesn't
> mean that the other thing is broken.

Well, that other thing depends on being able to see the entire program
at compile time. PaulMck already listed various ways in which this is
not feasible even for normal userspace code.

In particular; DSOs and JITs were mentioned.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, Feb 18, 2014 at 12:12:06PM +, Peter Sewell wrote:
> Several of you have said that the standard and compiler should not
> permit speculative writes of atomics, or (effectively) that the
> compiler should preserve dependencies.

The example below only deals with control dependencies; so I'll limit
myself to that.

> In simple examples it's easy
> to see what that means, but in general it's not so clear what the
> language should guarantee, because dependencies may go via non-atomic
> code in other compilation units, and we have to consider the extent to
> which it's desirable to limit optimisation there.
>
> For example, suppose we have, in one compilation unit:
>
>   void f(int ra, int *rb) {
>     if (ra == 42)
>       *rb = 42;
>     else
>       *rb = 42;
>   }
>
> and in another compilation unit the bodies of two threads:
>
>   // Thread 0
>   r1 = x;
>   f(r1, &r2);
>   y = r2;
>
>   // Thread 1
>   r3 = y;
>   f(r3, &r4);
>   x = r4;
>
> where accesses to x and y are annotated C11 atomic
> memory_order_relaxed or Linux ACCESS_ONCE(), accesses to
> r1,r2,r3,r4,ra,rb are not annotated, and x and y initially hold 0.

So I'm intuitively ok with this, however I would expect something like:

  void f(_Atomic int ra, _Atomic int *rb);

to preserve dependencies and not make the conditional go away, simply
because in that case, in:

  if (ra == 42)

the 'ra' usage can be seen as an atomic load.

> So as far as we can see, either:
>
> 1) if you can accept the latter behaviour (if the Linux codebase does
>    not rely on its absence), the language definition should permit it,
>    and current compiler optimisations can be used,

Currently there's exactly 1 site in the Linux kernel that relies on
control dependencies as far as I know -- the one I put in. And it's
limited to a single function, so no cross translation unit funnies
there.

Of course, nobody is going to tell me when or where they'll put in the
next one; since it's now documented as accepted practice.
However, PaulMck and our RCU usage very much do cross all sorts of TU boundaries; but those are data dependencies. ~ Peter
Re: [RFC][PATCH 0/5] arch: atomic rework
On Thu, Feb 13, 2014 at 09:07:55PM -0800, Torvald Riegel wrote: > That depends on what your goal is. A compiler that we don't need to fight in order to generate sane code would be nice. But as Linus said; we can continue to ignore you lot and go on as we've done.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote:
> You need volatile semantics to force the compiler to ignore any proofs
> it might otherwise attempt to construct. Hence all the ACCESS_ONCE()
> calls in my email to Torvald. (Hopefully I translated your example
> reasonably.)

My brain gave out for today; but it did appear to have the right
structure.

I would prefer it if C11 did not require the volatile casts. It should
simply _never_ speculate with atomic writes, volatile or not.
Re: [RFC][PATCH 0/5] arch: atomic rework
> I don't know the specifics of your example, but from how I understand
> it, I don't see a problem if the compiler can prove that the store will
> always happen.
>
> To be more specific, if the compiler can prove that the store will
> happen anyway, and the region of code can be assumed to always run
> atomically (e.g., there's no loop or such in there), then it is known
> that we have one atomic region of code that will always perform the
> store, so we might as well do the stuff in the region in some order.
>
> Now, if any of the memory accesses are atomic, then the whole region of
> code containing those accesses is often not atomic because other threads
> might observe intermediate results in a data-race-free way.
>
> (I know that this isn't a very precise formulation, but I hope it brings
> my line of reasoning across.)

So given something like:

  if (x)
    y = 3;

assuming both x and y are atomic (so don't gimme crap for not knowing
the C11 atomic incantations); and you can prove x is always true; you
don't see a problem with not emitting the conditional?

Avoiding the conditional changes the result; see that control
dependency email from earlier. In the above example the load of x and
the store to y are strictly ordered, due to control dependencies. Not
emitting the condition and maybe not even emitting the load completely
wrecks this.

It's therefore an invalid optimization to take out the conditional or
speculate the store, since it takes out the dependency.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 11:49:29AM +, Will Deacon wrote:
> On Mon, Feb 10, 2014 at 11:48:13AM +0000, Peter Zijlstra wrote:
> > On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote:
> > > As near as I can tell, compiler writers hate the idea of prohibiting
> > > speculative-store optimizations because it requires them to introduce
> > > both control and data dependency tracking into their compilers. Many of
> > > them seem to hate dependency tracking with a purple passion. At least,
> > > such a hatred would go a long way towards explaining the incomplete
> > > and high-overhead implementations of memory_order_consume, the long
> > > and successful use of idioms based on the memory_order_consume pattern
> > > notwithstanding [*]. ;-)
> >
> > Just tell them that because the hardware provides control dependencies
> > we actually use and rely on them.
>
> s/control/address/ ?

Nope, control.

Since stores cannot be speculated and thus require linear control flow
history we can use it to order LOAD -> STORE when the LOAD is required
for the control flow decision and the STORE depends on the control flow
path.

Also see commit 18c03c61444a211237f3d4782353cb38dba795df to
Documentation/memory-barriers.txt

---
commit c7f2e3cd6c1f4932ccc4135d050eae3f7c7aef63
Author: Peter Zijlstra
Date:   Mon Nov 25 11:49:10 2013 +0100

    perf: Optimize ring-buffer write by depending on control dependencies

    Remove a full barrier from the ring-buffer write path by relying on
    a control dependency to order a LOAD -> STORE scenario.

    Cc: "Paul E. McKenney"
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-8alv40z6ikk57jzbaobnx...@git.kernel.org
    Signed-off-by: Ingo Molnar

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index e8b168af135b..146a5792b1d2 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -61,19 +61,20 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 	 *
 	 *   kernel				user
 	 *
-	 *   READ ->data_tail			READ ->data_head
-	 *   smp_mb()	(A)			smp_rmb()	(C)
-	 *   WRITE $data			READ $data
-	 *   smp_wmb()	(B)			smp_mb()	(D)
-	 *   STORE ->data_head			WRITE ->data_tail
+	 *   if (LOAD ->data_tail) {		LOAD ->data_head
+	 *			(A)		smp_rmb()	(C)
+	 *	STORE $data			LOAD $data
+	 *	smp_wmb()	(B)		smp_mb()	(D)
+	 *	STORE ->data_head		STORE ->data_tail
+	 *   }
 	 *
 	 * Where A pairs with D, and B pairs with C.
 	 *
-	 * I don't think A needs to be a full barrier because we won't in fact
-	 * write data until we see the store from userspace. So we simply don't
-	 * issue the data WRITE until we observe it. Be conservative for now.
+	 * In our case (A) is a control dependency that separates the load of
+	 * the ->data_tail and the stores of $data. In case ->data_tail
+	 * indicates there is no room in the buffer to store $data we do not.
 	 *
-	 * OTOH, D needs to be a full barrier since it separates the data READ
+	 * D needs to be a full barrier since it separates the data READ
 	 * from the tail WRITE.
 	 *
 	 * For B a WMB is sufficient since it separates two WRITEs, and for C
@@ -81,7 +82,7 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 	 *
 	 * See perf_output_begin().
 	 */
-	smp_wmb();
+	smp_wmb(); /* B, matches C */
 	rb->user_page->data_head = head;

 	/*
@@ -144,17 +145,26 @@ int perf_output_begin(struct perf_output_handle *handle,
 		if (!rb->overwrite &&
 		    unlikely(CIRC_SPACE(head, tail, perf_data_size(rb)) < size))
 			goto fail;
+
+		/*
+		 * The above forms a control dependency barrier separating the
+		 * @tail load above from the data stores below. Since the @tail
+		 * load is required to compute the branch to fail below.
+		 *
+		 * A, matches D; the full memory barrier userspace SHOULD issue
+		 * after reading the data and before storing the new tail
+		 * position.
+		 *
+		 * See perf_output_put_handle().
+		 */
+
 		head += size;
 	} while (local_cmpxchg(&rb->head, offset, head) != offset);

 	/*
-	 * Separate the userpage->tail read from the data stores below.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote: > As near as I can tell, compiler writers hate the idea of prohibiting > speculative-store optimizations because it requires them to introduce > both control and data dependency tracking into their compilers. Many of > them seem to hate dependency tracking with a purple passion. At least, > such a hatred would go a long way towards explaining the incomplete > and high-overhead implementations of memory_order_consume, the long > and successful use of idioms based on the memory_order_consume pattern > notwithstanding [*]. ;-) Just tell them that because the hardware provides control dependencies we actually use and rely on them. Not that I expect they care too much what we do, given the current state of things.
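For reference, the kind of control dependency being talked about (hypothetical names; this relies on the hardware guarantee that stores are not speculated, not on anything the C11 standard promises) looks like this:

```cpp
#include <atomic>

std::atomic<int> flag{0};
std::atomic<int> data{0};

// The store to 'data' only executes on the branch that observed the
// value loaded from 'flag'.  Hardware that does not speculate stores
// therefore orders the store after the load, even though both
// accesses are relaxed -- this is the control dependency the kernel
// uses and relies on.
void consumer(void)
{
    if (flag.load(std::memory_order_relaxed))       // load
        data.store(1, std::memory_order_relaxed);   // dependent store
}
```

The complaint above is precisely that a C11/C++11 compiler is not obliged to preserve this load-to-store ordering.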
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 01:27:51AM +0100, Torvald Riegel wrote:
> > Initial state: x == y == 0
> >
> > T1: r1 = atomic_load_explicit(x, memory_order_relaxed);
> >     atomic_store_explicit(42, y, memory_order_relaxed);
> >     if (r1 != 42)
> >       atomic_store_explicit(r1, y, memory_order_relaxed);
> >
> > T2: r2 = atomic_load_explicit(y, memory_order_relaxed);
> >     atomic_store_explicit(r2, x, memory_order_relaxed);
>
> Intuitively, this is wrong because this lets the program take a step
> the abstract machine wouldn't do. This is different to the sequential
> code that Peter posted because it uses atomics, and thus one can't
> easily assume that the difference is not observable.

Yeah, my bad for not being familiar with the atrocious crap C11 made of atomics :/
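The litmus test above, transcribed into compilable form (a sketch; scheduling makes several outcomes possible, and the point of the example is that r1 == r2 == 42 would be an out-of-thin-air result):

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2;

void t1(void)
{
    r1 = x.load(std::memory_order_relaxed);
    y.store(42, std::memory_order_relaxed);
    if (r1 != 42)
        y.store(r1, std::memory_order_relaxed);
}

void t2(void)
{
    r2 = y.load(std::memory_order_relaxed);
    x.store(r2, std::memory_order_relaxed);
}
```

Since T1 stores 42 to y on every path, a compiler could hoist that store above the load of x, at which point T2 can forward 42 into x and T1's load can observe it: the value 42 in r1 justifies itself.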
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 07, 2014 at 05:13:36PM +, Will Deacon wrote:
> Understood, but that doesn't explain why Paul wants to add ISB/isync
> instructions which affect the *CPU* rather than the compiler!

I doubt Paul wants it, but yeah, I'm curious about that proposal as well; sounds like someone took a big toke from the bong again; it seems a favourite pastime amongst committees.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 07, 2014 at 04:55:48PM +, Will Deacon wrote:
> Hi Paul,
>
> On Fri, Feb 07, 2014 at 04:50:28PM +, Paul E. McKenney wrote:
> > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > > > Hopefully some discussion of out-of-thin-air values as well.
> > >
> > > Yes, absolutely shoot store speculation in the head already. Then drive
> > > a wooden stake through its heart.
> > >
> > > C11/C++11 should not be allowed to claim itself a memory model until that
> > > is sorted.
> >
> > There actually is a proposal being put forward, but it might not make ARM
> > and Power people happy because it involves adding a compare, a branch,
> > and an ISB/isync after every relaxed load... Me, I agree with you,
> > much preferring the no-store-speculation approach.
>
> Can you elaborate a bit on this please? We don't permit speculative stores
> in the ARM architecture, so it seems counter-intuitive that GCC needs to
> emit any additional instructions to prevent that from happening.
>
> Stores can, of course, be observed out-of-order but that's a lot more
> reasonable :)

This is more about the compiler speculating on stores; imagine:

	if (x)
		y = 1;
	else
		y = 2;

The compiler is allowed to change that into:

	y = 2;
	if (x)
		y = 1;

Which is of course a big problem when you want to rely on the ordering.

There are further problems where things like memset() can write outside the specified address range. Examples are memset() using single instructions to wipe entire cachelines and then 'restoring' the tail bit. While valid for single-threaded code, it's a complete disaster for concurrent code.

There's more, but it all boils down to doing stores you don't expect in a 'sane' concurrent environment and/or that don't respect the control flow.
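A sketch of how C11/C++11 atomics are supposed to rule that transformation out (names hypothetical): once y is an atomic object, the compiler may no longer invent the unconditional 'y = 2' store, because the transient value would be observable by another thread:

```cpp
#include <atomic>

std::atomic<int> y{0};

// The compiler must emit exactly one store, on the branch actually
// taken; speculating 'y = 2' ahead of the test would create a value
// that no execution of the abstract machine produces.
void set_y(bool x)
{
    if (x)
        y.store(1, std::memory_order_relaxed);
    else
        y.store(2, std::memory_order_relaxed);
}
```

With a plain (non-atomic) int the speculative form is a legal transformation precisely because concurrent observation would already be a data race.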
Re: [RFC][PATCH 0/5] arch: atomic rework
On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> Hopefully some discussion of out-of-thin-air values as well.

Yes, absolutely shoot store speculation in the head already. Then drive a wooden stake through its heart.

C11/C++11 should not be allowed to claim itself a memory model until that is sorted.
Re: [RFC] gcc feature request: Moving blocks into sections
On Mon, Aug 12, 2013 at 10:47:37AM -0700, H. Peter Anvin wrote:
> On 08/12/2013 09:09 AM, Peter Zijlstra wrote:
> >>
> >> On the majority of architectures, including x86, you cannot simply copy
> >> a piece of code elsewhere and have it still work.
> >
> > I thought we used -fPIC which would allow just that.
>
> Doubly wrong. The kernel is not compiled with -fPIC, nor does -fPIC
> allow this kind of movement for code that contains intramodule
> references (that is *all* references in the kernel). Since we really
> don't want to burden the kernel with a GOT and a PLT, that is life.

OK, never mind then..
Re: [RFC] gcc feature request: Moving blocks into sections
On Mon, Aug 12, 2013 at 09:02:02AM -0700, Andi Kleen wrote:
> "H. Peter Anvin" writes:
> > However, I would really like to
> > understand what the value is.
>
> Probably very little. When I last looked at it, the main overhead in
> perf currently seems to be backtraces and the ring buffer, not this
> code.

Backtraces do indeed blow and make pretty much everything else irrelevant, but when not using them the branch forest was significant when I last looked at it.
Re: [RFC] gcc feature request: Moving blocks into sections
On Mon, Aug 12, 2013 at 07:56:10AM -0700, H. Peter Anvin wrote:
> On 08/12/2013 02:17 AM, Peter Zijlstra wrote:
> >
> > I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
> > if-forest functions like perf_prepare_sample() and perf_output_sample().
> >
> > They are of the form:
> >
> >   void func(obj, args..)
> >   {
> >     unsigned long f = ...;
> >
> >     if (f & F1)
> >       do_f1();
> >
> >     if (f & F2)
> >       do_f2();
> >
> >     ...
> >
> >     if (f & FN)
> >       do_fn();
> >   }
>
> Am I reading this right that f can be a combination of any of these?

Correct.

> > Where f is constant for the entire lifetime of the particular object.
> >
> > So I was thinking of having these functions use static_key/asm-goto;
> > then write the proper static key values unsafe so as to avoid all
> > trickery (as these functions would never actually be used) and copy the
> > end result into object private memory. The object will then use indirect
> > calls into these functions.
>
> I'm really not following what you are proposing here, especially not
> "copy the end result into object private memory."
>
> With asm goto you end up with at minimum a jump or NOP for each of these
> function entries, whereas an actual JIT can elide that as well.
>
> On the majority of architectures, including x86, you cannot simply copy
> a piece of code elsewhere and have it still work.

I thought we used -fPIC which would allow just that.

> You end up doing a
> bunch of the work that a JIT would do anyway, and would end up with
> considerably higher complexity and worse results than a true JIT.

Well, less complexity but worse result, yes. We'd only poke the specific static_branch sites with either NOPs or the (relative) jump target for each of these branches. Then copy the result.

> You
> also say "the object will then use indirect calls into these
> functions"... you mean the JIT or pseudo-JIT generated functions, or the
> calls inside them?

The calls to these pseudo-JIT generated functions.

> > I suppose the question is, do people strenuously object to creativity
> > like that and/or is there something GCC can do to make this
> > easier/better still?
>
> I think it would be much easier to just write a minimal JIT for this,
> even though it is per architecture. However, I would really like to
> understand what the value is.

Removing a lot of the conditionals from the sample path. Depending on the configuration these can be quite expensive.
Re: [RFC] gcc feature request: Moving blocks into sections
On Mon, Aug 05, 2013 at 12:55:15PM -0400, Steven Rostedt wrote:
> [ sent to both Linux kernel mailing list and to gcc list ]

Let me hijack this thread for something related...

I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT if-forest functions like perf_prepare_sample() and perf_output_sample(). They are of the form:

  void func(obj, args..)
  {
    unsigned long f = ...;

    if (f & F1)
      do_f1();

    if (f & F2)
      do_f2();

    ...

    if (f & FN)
      do_fn();
  }

Where f is constant for the entire lifetime of the particular object.

So I was thinking of having these functions use static_key/asm-goto; then write the proper static key values unsafe so as to avoid all trickery (as these functions would never actually be used) and copy the end result into object private memory. The object will then use indirect calls into these functions.

The advantage of using something like this is that it would work for all architectures that now support the asm-goto feature. For arch/gcc combinations that do not, we'd simply revert to the current state of affairs.

I suppose the question is, do people strenuously object to creativity like that and/or is there something GCC can do to make this easier/better still?
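Concretely, the shape being described - with made-up flag names and trivial bodies standing in for the real sample-preparation steps - is a chain of flag tests that are constant per object yet re-evaluated on every call:

```cpp
#include <cstdint>

// Hypothetical stand-in for perf_prepare_sample()-style code: 'f' is
// fixed for the lifetime of the object, but each call re-tests every bit.
enum : uint64_t { F1 = 1u << 0, F2 = 1u << 1, F3 = 1u << 2 };

struct obj {
    uint64_t f;   // constant per object
    int out;      // stands in for the work the do_f*() steps produce
};

static void do_f1(obj &o) { o.out += 1; }
static void do_f2(obj &o) { o.out += 10; }
static void do_f3(obj &o) { o.out += 100; }

void func(obj &o)
{
    if (o.f & F1)
        do_f1(o);
    if (o.f & F2)
        do_f2(o);
    if (o.f & F3)
        do_f3(o);
}
```

The proposal above amounts to patching each `if` site into a NOP or a direct jump once `f` is known, then copying the specialized body into per-object memory so the tests disappear entirely.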
Re: WTF?
On Wed, 2009-11-25 at 20:38 +0100, Richard Guenther wrote: > If you can offer advice on how to teach quilt > (which I belive uses patch) to ignore whitespace changes when > applying patches then more power to you QUILT_PATCH_OPTS="-l"
problems bootstrapping gcc-4.0-20051117 on i386-pc-solaris2.10
Hi,

I'm having a lot of problems bootstrapping this compiler on said target. What I did so far: from the build dir (which I located in the extracted source dir)

  ../configure --srcdir=.. --enable-languages="c c++" --with-gnu-as --with-as=/usr/sfw/bin/gas --without-gnu-ld --with-ld=/usr/ccs/bin/ld

however this fails to select '/usr/sfw/bin/gas' as the assembler; per the configure output it says:

  ...
  checking for i386-pc-solaris2.10-as... no
  checking for as... as
  ...

So I set the environment variable 'AS' to point to gas: export AS='/usr/sfw/bin/gas'; removed the build dir and started over. This gave:

  ...
  checking for i386-pc-solaris2.10-as... /usr/sfw/bin/gas
  ...

after which I continued with:

  make -j5 bootstrap

At that point it starts building, but fails with an assembler error:

  /usr/local/src/gcc-4.0-20051117/gcc/config/i386/gmon-sol2.c:406: warning: control reaches end of non-void function
  /usr/local/src/gcc-4.0-20051117/gcc/config/i386/gmon-sol2.c: At top level:
  /usr/local/src/gcc-4.0-20051117/gcc/config/i386/gmon-sol2.c:58: warning: 'sccsid' defined but not used
  Fixing headers into /usr/local/src/gcc-4.0-20051117/build/gcc/include for i386-pc-solaris2.10 target
  Assembler: "", line 1 : Illegal flag (-)
  make[2]: *** [gmon.o] Error 1
  make[2]: *** Waiting for unfinished jobs

Using truss I find that it is actually using /usr/ccs/bin/as?!? Luckily it stats a few paths before happening on /usr/ccs/bin/as, so what I did is to symlink /usr/sfw/bin/gas to one of the other paths (ln -s /usr/sfw/bin/gas /usr/local/i386-pc-solaris2.10/bin/as) and restart all over again.

This gets me further along, however not quite there. Now it fails to configure libstdc++ with the following error:

  configure: error: unable to detect exception model
  make[1]: *** [configure-target-libstdc++-v3] Error 1
  make[1]: Leaving directory `/usr/local/src/gcc-4.0-20051117/build'
  make: *** [bootstrap] Error 2

I'm sure I'm doing something horribly wrong here, can somebody point me to the way of a working compiler?

Kind regards,

Peter Zijlstra
Re: C++ vs. pthread_cancel
On Mon, 2005-08-15 at 09:53 -0400, Daniel Jacobowitz wrote:
> On Mon, Aug 15, 2005 at 09:51:17AM -0400, Andrew Pinski wrote:
> > > Yes, I'm aware of the list. My question was what the current behaviour
> > > of the various gcc versions is. And if gcc supports the various
> > > workarounds mentioned. Like explicitly configuring the behaviour of the
> > > 'catch (...)' etc..
> >
> > There is none yet because there have been no consensus yet. That is
> > why I mentioned the list. GCC is not going to implement anything
> > until there is a consensus of how to proceed.
>
> Eh, that's obviously incorrect. The feature was implemented back in
> 3.3; the behavior hasn't been _changed_ and won't be until there is
> consensus.
>
> I believe that it still can be caught, must be rethrown, or the
> program will be aborted. Someone who knows better than I may want to
> confirm this.

AFAICT this is indeed correct; I've been reading the libstdc++5 and NPTL sources. However, once it is caught it is impossible to distinguish from other exceptions due to the lack of exception information set in:

  glibc-2.3.5/nptl/unwind.c:__pthread_unwind()

Hence one is left in the situation where both forward and backward are not an option. Nor do I think they (being the company I work for) will allow me to ship patched versions of libpthread.so and libstdc++.so.5.

Too bad, guess I have to redesign the issue.

Peter Zijlstra
Re: C++ vs. pthread_cancel
On Mon, 2005-08-15 at 09:33 -0400, Andrew Pinski wrote:
> >
> > Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > On this controversial subject, could somebody please - pretty please
> > > with a cherry on top - tell me what the current status is:
> > > - in general,
> > > - as implemented in the 3.4 series and
> > > - as implemented in the 4.0 series.
> > >
> > > At work we're using 3.4 and we have managed to shoot our foot off with
> > > this issue :-(, google gives a lot of hits on the issue but it is a bit
> > > hard to get the current impl. status for 3.4. Which in turn makes it
> > > hard to decide on how to bandage our foot.
> >
> > Could you provide a link to a description of the particular
> > problem? I looked around, and all I could find was
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=111548
> >
> > I suppose the controversial part is that you're using
> > pthread_cancel, which is somewhat frowned upon as
> > inherently unsafe.
>
> There is a whole mailing list about this:
> http://www.codesourcery.com/archives/c++-pthreads/threads.html
>
> This has to be done correctly with the C++ standard and POSIX people and
> the GCC people will be involved but not on the GCC list as it just gets
> in the way.

Yes, I'm aware of the list. My question was what the current behaviour of the various gcc versions is, and whether gcc supports the various workarounds mentioned, like explicitly configuring the behaviour of the 'catch (...)' etc..

Peter Zijlstra
re: C++ vs. pthread_cancel
On Mon, 2005-08-15 at 06:12 -0700, Dan Kegel wrote:
> Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > On this controversial subject, could somebody please - pretty please
> > with a cherry on top - tell me what the current status is:
> > - in general,
> > - as implemented in the 3.4 series and
> > - as implemented in the 4.0 series.
> >
> > At work we're using 3.4 and we have managed to shoot our foot off with
> > this issue :-(, google gives a lot of hits on the issue but it is a bit
> > hard to get the current impl. status for 3.4. Which in turn makes it
> > hard to decide on how to bandage our foot.
>
> Could you provide a link to a description of the particular
> problem? I looked around, and all I could find was
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=111548
>
> I suppose the controversial part is that you're using
> pthread_cancel, which is somewhat frowned upon as
> inherently unsafe.
> - Dan

Yes, it seems to be that problem. Discussion here:

  http://gcc.gnu.org/ml/gcc/2003-12/msg00743.html

And this list seems dedicated to the problem:

  http://www.codesourcery.com/archives/c++-pthreads/maillist.html

The issue seems to be that pthread_cancel is implemented using forced unwinding, the same mechanism as used for exception handling, and the interaction is ill-defined. The behaviour of gcc-3.4 is that the unclassified exception caused by SIGCANCEL can be caught by the catch-all clause 'catch (...)', and then, when not rethrown, causes an abort.

Peter Zijlstra
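The behaviour being described can be demonstrated with a small sketch (Linux/NPTL assumed; on other runtimes the interaction differs): cancellation unwinds the thread like an exception, lands in 'catch (...)', and must be rethrown - swallowing it aborts the process:

```cpp
#include <pthread.h>

// Worker that spins on a cancellation point.  When pthread_cancel()
// takes effect, the forced unwind enters the catch-all clause below;
// rethrowing lets the unwind complete, so the thread terminates with
// PTHREAD_CANCELED instead of abort()ing the whole program.
extern "C" void *worker(void *)
{
    try {
        for (;;)
            pthread_testcancel();   // explicit cancellation point
    } catch (...) {
        // Swallowing the cancellation exception here would abort().
        throw;                      // mandatory rethrow
    }
    return nullptr;                 // not reached
}
```

A caller would pthread_create() this worker, pthread_cancel() it, and expect pthread_join() to report PTHREAD_CANCELED; the catch-all is only useful for cleanup, never for suppressing the cancellation.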
C++ vs. pthread_cancel
Hi all,

On this controversial subject, could somebody please - pretty please with a cherry on top - tell me what the current status is:
- in general,
- as implemented in the 3.4 series and
- as implemented in the 4.0 series.

At work we're using 3.4 and we have managed to shoot our foot off with this issue :-(, google gives a lot of hits on the issue but it is a bit hard to get the current impl. status for 3.4. Which in turn makes it hard to decide on how to bandage our foot.

Kind regards,

Peter Zijlstra