Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

2023-07-07 Thread Peter Zijlstra
On Fri, Jul 07, 2023 at 10:04:06AM -0400, Olivier Dion wrote:
> On Tue, 04 Jul 2023, Peter Zijlstra  wrote:
> > On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
> [...]
> >> On x86-64 (gcc 13.1 -O2) we get:
> >> 
> >>   t0():
> >>   movl    $1, x(%rip)
> >>   movl    $1, %eax
> >>   xchgl   dummy(%rip), %eax
> >>   lock orq $0, (%rsp)   ;; Redundant with previous exchange.
> >>   movl    y(%rip), %eax
> >>   movl    %eax, r0(%rip)
> >>   ret
> >>   t1():
> >>   movl    $1, y(%rip)
> >>   lock orq $0, (%rsp)
> >>   movl    x(%rip), %eax
> >>   movl    %eax, r1(%rip)
> >>   ret
> >
> > So I would expect the compilers to do better here. It should know those
> > __atomic_thread_fence() thingies are superfluous and simply not emit
> > them. This could even be done as a peephole pass later, where it sees
> > consecutive atomic ops and the second being a no-op.
> 
> Indeed, a peephole optimization could work for this Dekker, if the
> compiler adds the pattern for it.  However, AFAIK, a peephole can not be
> applied when the two fences are in different basic blocks.  For example,
> only emitting a fence on a compare_exchange success.  This limitation
> implies that the optimization can not be done across functions/modules
> (shared libraries).

LTO FTW :-)

> For example, it would be interesting to be able to
> promote an acquire fence of a pthread_mutex_lock() to a full fence on
> weakly ordered architectures while preventing a redundant fence on
> strongly ordered architectures.

That's a very non-trivial thing to do. I know Linux has
smp_mb__after_spinlock() and that x86 has it a no-op, but even on x86
adding a full fence after a lock has observable differences IIRC.

Specifically, the actual store that acquires the lock is not well
ordered vs the critical section itself for non-trivial spinlock
implementations (notably qspinlock).

For RCU you mostly care about RCsc locks (IIRC), and upgrading unlock is
a 'simpler' (IMO) approach to achieve that (which is what RCU does with
smp_mb__after_unlock_lock()).

> We know that at least Clang has such peephole optimizations for some
> architecture backends.  It seems however that they do not recognize
> lock-prefixed instructions as fence.

They seem confused in general for emitting MFENCE.

> AFAIK, GCC does not have that kind
> of optimization.

> We are also aware that some research has been done on this topic [0].
> The idea is to use PRE for elimination of redundant fences.  This would
> work across multiple basic blocks, although the paper focuses on
> intra-procedural eliminations.  However, it seems that the latest work
> on that [1] has never been completed [2].
> 
> Our proposed approach provides a means for the user to express -- and
> document -- the wanted semantics in the source code.  This allows the
> compiler to only emit wanted fences, therefore not relying on
> architecture specific backend optimizations.  In other words, this
> applies even on unoptimized binaries.

I'm not a tool person, but if I were, I'd be very hesitant to add
__builtin functions that 'conflict'/'overlap' with what an optimizer
should be able to do.

Either way around you need work done on the compilers, and I'm thinking
'fixing' the optimizer will benefit far more people than adding
__builtin's.

Then again, I'm not a tools person, so you don't need to convince me.
But one of the selling points of the whole Atomics as a language feature
was that whole optimizer angle. Otherwise you might as well do as we do,
inline asm the world.

I'll shut up now, thanks for that PRE reference [0], that seems a fun
read for when I'm bored.


Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

2023-07-04 Thread Peter Zijlstra
On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:

>   int x = 0;
>   int y = 0;
>   int r0, r1;
> 
>   int dummy;
> 
>   void t0(void)
>   {
>           __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
> 
>           __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
>           __atomic_thread_fence(__ATOMIC_SEQ_CST);
> 
>           r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
>   }
> 
>   void t1(void)
>   {
>           __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
>           __atomic_thread_fence(__ATOMIC_SEQ_CST);
>           r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
>   }
> 
>   // BUG_ON(r0 == 0 && r1 == 0)
> 
> On x86-64 (gcc 13.1 -O2) we get:
> 
>   t0():
>   movl    $1, x(%rip)
>   movl    $1, %eax
>   xchgl   dummy(%rip), %eax
>   lock orq $0, (%rsp)   ;; Redundant with previous exchange.
>   movl    y(%rip), %eax
>   movl    %eax, r0(%rip)
>   ret
>   t1():
>   movl    $1, y(%rip)
>   lock orq $0, (%rsp)
>   movl    x(%rip), %eax
>   movl    %eax, r1(%rip)
>   ret

So I would expect the compilers to do better here. It should know those
__atomic_thread_fence() thingies are superfluous and simply not emit
them. This could even be done as a peephole pass later, where it sees
consecutive atomic ops and the second being a no-op.

> On x86-64 (clang 16 -O2) we get:
> 
>   t0():
>   movl    $1, x(%rip)
>   movl    $1, %eax
>   xchgl   %eax, dummy(%rip)
>   mfence                ;; Redundant with previous exchange.

And that's just terrible :/ Nobody should be using MFENCE for this. And
using MFENCE after a LOCK prefixed instruction (implicit in this case)
is just fail, because I don't think C++ atomics cover MMIO and other
such 'lovely' things.

>   movl    y(%rip), %eax
>   movl    %eax, r0(%rip)
>   retq
>   t1():
>   movl    $1, y(%rip)
>   mfence
>   movl    x(%rip), %eax
>   movl    %eax, r1(%rip)
>   retq
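
For anyone wanting to poke at the quoted test from userspace, here is a
minimal sketch that wraps t0()/t1() in pthreads. The harness name
sb_run_once() and the reset logic are assumptions made for illustration,
they are not part of the original posting. With a seq_cst fence between
the store and the load in both threads, the outcome r0 == 0 && r1 == 0
should never be observed.

```c
#include <pthread.h>
#include <stddef.h>

static int x, y, dummy;
static int r0, r1;

static void *t0(void *unused)
{
	(void)unused;
	__atomic_store_n(&x, 1, __ATOMIC_RELAXED);

	__atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
	__atomic_thread_fence(__ATOMIC_SEQ_CST);

	r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
	return NULL;
}

static void *t1(void *unused)
{
	(void)unused;
	__atomic_store_n(&y, 1, __ATOMIC_RELAXED);
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
	return NULL;
}

/*
 * Hypothetical harness: run both threads once.
 * Returns 1 if the forbidden outcome (r0 == 0 && r1 == 0) was seen,
 * 0 if it was not, and -1 if a thread could not be created.
 */
static int sb_run_once(void)
{
	pthread_t a, b;

	x = y = dummy = 0;
	r0 = r1 = -1;

	if (pthread_create(&a, NULL, t0, NULL))
		return -1;
	if (pthread_create(&b, NULL, t1, NULL)) {
		pthread_join(a, NULL);
		return -1;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	return r0 == 0 && r1 == 0;
}
```

Calling sb_run_once() in a loop and checking that it always returns 0 is
only a smoke test; a single machine not showing the outcome proves
nothing about the memory model. Older toolchains may need -pthread.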



Re: [RFC/RFT,V2] CFI: Add support for gcc CFI in aarch64

2023-03-27 Thread Peter Zijlstra
On Sat, Mar 25, 2023 at 01:54:16AM -0700, Dan Li wrote:

> In the compiler part[4], most of the content is the same as Sami's
> implementation[3], except for some minor differences, mainly including:
> 
> 1. The function typeid is calculated differently and it is difficult
> to be consistent.

This means there is an effective ABI break between the compilers, which
is sad :-( Is there really nothing to be done about this?


Re: GCC 12 miscompilation of volatile asm (was: Re: [PATCH] arm64/io: Remind compiler that there is a memory side effect)

2022-04-05 Thread Peter Zijlstra
On Tue, Apr 05, 2022 at 01:51:30PM +0100, Mark Rutland wrote:
> Hi all,
> 
> [adding kernel folk who work on asm stuff]
> 
> As a heads-up, GCC 12 (not yet released) appears to erroneously optimize away
> calls to functions with volatile asm. Szabolcs has raised an issue on the GCC
> bugzilla:  
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160
> 
> ... which is a P1 release blocker, and is currently being investigated.
> 
> Jemery originally reported this as an issue with {readl,writel}_relaxed(), but
> the underlying problem doesn't have anything to do with those specifically.
> 
> I'm dumping a bunch of info here largely for posterity / archival, and to find
> out who (from the kernel side) is willing and able to test proposed compiler
> fixes, once those are available.
> 
> I'm happy to do so for aarch64; Peter, I assume you'd be happy to look at the
> x86 side?

Sure..


Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes

2021-11-15 Thread Peter Zijlstra
On Mon, Nov 15, 2021 at 12:33:16PM +0530, Prathamesh Kulkarni wrote:
> On Sun, 14 Nov 2021 at 02:07, David Malcolm via Gcc-patches

> > +/* Handle "returns_zero_on_failure" and "returns_zero_on_success" 
> > attributes;
> > +   arguments as in struct attribute_spec.handler.  */
> > +
> > +static tree
> > +handle_returns_zero_on_attributes (tree *node, tree name, tree, int,
> > +  bool *no_add_attrs)
> > +{
> > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (*node)))
> > +{
> > +  error ("%qE attribute on a function not returning an integral type",
> > +name);
> > +  *no_add_attrs = true;
> > +}
> > +  return NULL_TREE;
> Hi David,
> Just curious if a warning should be emitted if the function is marked
> with the attribute but it's return value isn't actually 0 ?
> 
> There are other constants like -1 or 1 that are often used to indicate
> error, so maybe tweak the attribute to
> take the integer as an argument ?
> Sth like returns_int_on_success(cst) / returns_int_on_failure(cst) ?
> 
> Also, would it make sense to extend it for pointers too for returning
> NULL on success / failure ?

Please also consider that in Linux we use the 'last' page for error code
returns. That is, a function returning a pointer could return '(void
*)-EFAULT'; also see linux/err.h.
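
A userspace sketch of that convention, modelled on the helpers in
linux/err.h. The lower-case names err_ptr(), ptr_err(), is_err() and the
lookup() function are made up for this example; the kernel's own helpers
are ERR_PTR(), PTR_ERR() and IS_ERR(). The point for the proposed
attributes is that such a function signals failure with neither 0 nor
NULL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* errno values live in the last page */

/* Encode a negative errno as a pointer value. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the negative errno from such a pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr falls inside the top MAX_ERRNO bytes of the address space. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int some_object;

/* Hypothetical function: returns an object or an encoded error. */
static void *lookup(int key)
{
	if (key < 0)
		return err_ptr(-EFAULT);
	return &some_object;
}
```

A caller then checks is_err(p) before dereferencing and uses ptr_err(p)
to get the error code back.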


Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust boundaries

2021-11-13 Thread Peter Zijlstra
On Sat, Nov 13, 2021 at 03:37:24PM -0500, David Malcolm wrote:

> This approach is much less expressive than the custom address space
> approach; it would only cover the trust boundary aspect; it wouldn't
> cover any differences between generic pointers and __user, vs __iomem,
> __percpu, and __rcu which I admit I only dimly understand.

__iomem would point at device memory, which can have curious side
effects or is yet another trust boundary, depending on device and usage.

__percpu is an address space that denotes a per-cpu variable's relative
offset; it needs to be combined with a per-cpu offset to get a 'real'
pointer. On x86_64 the %gs segment offset is used for this purpose; other
architectures are less fortunate. The whole per_cpu()/this_cpu_*()
family of APIs accepts such pointers.
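
To make the 'relative offset' idea concrete, here is a small userspace
model. Everything in it (struct percpu_area, PERCPU_OFF(), percpu_ptr(),
the array of areas) is an assumption made for illustration and not the
kernel implementation: each simulated CPU owns one copy of the area, and
a per-cpu 'pointer' is only an offset that becomes a real pointer once a
CPU's base address is added.

```c
#include <stddef.h>

#define NR_CPUS 4

/* Layout of the per-cpu area; every CPU gets its own copy. */
struct percpu_area {
	int counter;
	long ticks;
};

static struct percpu_area areas[NR_CPUS];

/* In this model a per-cpu "pointer" is just an offset into the area. */
#define PERCPU_OFF(member) offsetof(struct percpu_area, member)

/* Combine the relative offset with one CPU's base to get a real pointer. */
static inline void *percpu_ptr(size_t off, int cpu)
{
	return (char *)&areas[cpu] + off;
}

/* Bump the counter that belongs to the given CPU only. */
static inline void counter_inc(int cpu)
{
	int *p = percpu_ptr(PERCPU_OFF(counter), cpu);

	(*p)++;
}
```

Dereferencing the bare offset makes no sense, which is the kind of
mistake a distinct address space lets a checker catch.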

__rcu is the regular kernel address space, but denotes that the object
pointed to has RCU lifetime management. The attribute is laundered
through rcu_dereference() to remove the __rcu qualifier.

> Possibly silly question: is it always a bug for the value of a kernel
> pointer to leak into user space?  i.e. should I be complaining about an
> infoleak if the value of a trusted_ptr itself is written to
> *untrusted_ptr?  e.g.

Yes, always. Leaking kernel pointers is unconditionally bad.


Re: Re: typeof and operands in named address spaces

2020-11-16 Thread Peter Zijlstra
On Mon, Nov 16, 2020 at 12:23:17PM +, Uecker, Martin wrote:

> > > > Another way to drop qualifiers is using a cast. So you
> > > > can use typeof twice:
> > > > 
> > > > typeof((typeof(_var))_var) tmp__;
> > > > 
> > > > This also works for non-scalars but this is a GCC extension.
> 
> (That casts drop qualifiers is standard C. The extensions
> are 'typeof' and that casts can be used for non-scalar types.)

Ah, I'll clarify. Thanks!


Re: Re: typeof and operands in named address spaces

2020-11-16 Thread Peter Zijlstra
On Mon, Nov 16, 2020 at 12:10:56PM +0100, Peter Zijlstra wrote:

> > > Another way to drop qualifiers is using a cast. So you
> > > can use typeof twice:
> > >
> > > typeof((typeof(_var))_var) tmp__;
> > >
> > > This also works for non-scalars but this is a GCC extension.

FWIW, clang seems to support this extension as well..
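
A minimal sketch of the double-typeof trick for const/volatile. The
macro name unqual_tmp_typeof() and the function load_twice() are
invented for this example, and typeof is spelled __typeof__ only so that
it also builds in strict -std=c11 mode. The assignment to tmp would be
rejected if const survived, so the fact that this compiles is the
check.

```c
/*
 * The inner typeof names the (qualified) type of _var; casting _var to
 * that type yields an unqualified rvalue, so the outer typeof sees the
 * type without its qualifiers.
 */
#define unqual_tmp_typeof(_var) __typeof__((__typeof__(_var))(_var))

static int load_twice(const volatile int *p)
{
	unqual_tmp_typeof(*p) tmp;	/* expected to be plain int */

	tmp = *p;	/* would not compile if 'const' were kept */
	tmp += *p;
	return tmp;
}
```

Whether the same holds for named address spaces such as __seg_fs is
exactly what the thread is about, and is not shown here.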


Re: Re: typeof and operands in named address spaces

2020-11-16 Thread Peter Zijlstra


( restoring at least linux-toolcha...@vger.kernel.org, since that seems
  to have gone missing )

On Mon, Nov 16, 2020 at 10:11:50AM +0100, Richard Biener wrote:
> On Sun, Nov 15, 2020 at 11:53 AM Uecker, Martin
>  wrote:
> > > On Wed, Nov 04, 2020 at 07:31:42PM +0100, Uros Bizjak wrote:
> > > > Hello!
> > > >
> > > > I was looking at the recent linux patch series [1] where segment
> > > > qualifiers (named address spaces) were introduced to handle percpu
> > > > variables. In the patch [2], the author mentions that:
> > > >
> > > > --q--
> > > > Unfortunately, gcc does not provide a way to remove segment
> > > > qualifiers, which is needed to use typeof() to create local instances
> > > > of the per-cpu variable. For this reason, do not use the segment
> > > > qualifier for per-cpu variables, and do casting using the segment
> > > > qualifier instead.
> > > > --/q--
> > >
> > > C in general does not provide means to strip qualifiers. We recently had
> > > a _lot_ of 'fun' trying to strip volatile from a type, see here:
> > >
> > >   https://lore.kernel.org/lkml/875zimp0ay@mpe.ellerman.id.au
> > >
> > > which resulted in the current __unqual_scalar_typeof() hack.
> > >
> > > If we're going to do compiler extensions here, can we pretty please have
> > > a sane means of modifying qualifiers in general?
> >
> > Another way to drop qualifiers is using a cast. So you
> > can use typeof twice:
> >
> > typeof((typeof(_var))_var) tmp__;
> >
> > This also works for non-scalars but this is a GCC extension.
> >
> >
> > WG14 plans to standardize typeof. I would like to hear opinion
> > whether we should have typeof drop qualifiers or not.
> >
> > Currently, it does not do this on all compilers I tested
> > (except _Atomic on GCC) and there are also use cases for
> > keeping qualifiers. This is an argument for keeping qualifiers
> > should we standardize it, but then we need a way to drop
> > qualifiers.
> >
> >
> > lvalue conversion drops qualifiers in C.  In GCC, this is not
> > implemented correctly as it is unobservable in standard C
> > (but it is observable using typeof).
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97702
> >
> > I have a working patch in preparation to change this. Then you
> > could use
> >
> > typeof( ((void)0, x) )

Neat, that actually already works with clang. And I suppose we can use
the above GCC extension until such time as that GCC is fixed.

See below..

> > to drop qualifiers. But this would then
> > also do array-to-pointer conversion. I am not sure
> > whether this is a problem.

I don't _think_ so, but..

> > Of course, we could also introduce a new feature for
> > dropping qualifiers. Thoughts?

> Just add a new qualifier that un-qualifies?
> 
> _Unqual volatile T x;
> 
> is T with volatile (eventually) removed.  Or just a way to drop
> all using _Unqual?
> 
> _Unqual T x;
> 
> removing all qualifiers from T.  Or add a special _Unqual_all
> to achieve that.  I think removing a specific qualification is
> useful.  Leaves cases like
> 
> _Unqual volatile volatile T x;
> 
> to be specified (that is ordering and cancellation of the
> unqual and qual variants of qualifiers).

I rather like this, however I think I'd prefer the syntax be something
like:

_Unqual T x;

for removing all qualifiers, and:

_Unqual(volatile) volatile T x;

for stripping specific qualifiers. The syntax as proposed above seems
very error prone to me.


---
Subject: compiler: Improve __unqual_typeof()

Improve our __unqual_scalar_typeof() implementation by relying on C
dropping qualifiers for lvalue conversions. There is one small catch in
that GCC is currently known broken in this respect, however it happens
to have a C language extension that achieves the very same, it drops
qualifiers on casts.

This gets rid of the _Generic() usage and should improve compile times
(less preprocessor output) as well as increase the capabilities of the
macros.

XXX: I've only verified the below actually compiles, I've not verified
 the generated code is actually 'correct'.

Suggested-by: "Uecker, Martin" 
Signed-off-by: Peter Zijlstra (Intel) 
---
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 74c6c0486eed..3c5cb52c12f9 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -156,3 +156,11 @@
 #else
 #define __diag_GCC_8(s)
 #endif
+
+/*
+ * GCC has a bug where lvalue conversion doesn't drop qualifiers, use a GCC
+ * extent

Re: typeof and operands in named address spaces

2020-11-10 Thread Peter Zijlstra
On Tue, Nov 10, 2020 at 10:42:58AM -0800, Nick Desaulniers wrote:

> When I think of qualifiers, I think of const and volatile.  I'm not
> sure why the first post I'm cc'ed on talks about "segment" qualifiers.
> Maybe it's in reference to a variable attribute that the kernel
> defines?  Looking at Clang's Qualifier class, I see const, volatile,
> restrict (ah, right), some Objective-C stuff, and address space
> (TR18037 is referenced, I haven't looked up what that is) though maybe
> "segment" pseudo qualifiers the kernel defines expand to address space
> variable attributes?

Right, x86 Named Address Space:

  
  https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Named-Address-Spaces.html#Named-Address-Spaces

Also, Google found me this:

  https://reviews.llvm.org/D64676

The basic problem seems to be they act exactly like qualifiers in that
typeof() preserves them, so if you have:

( and now I realize the parent isn't Cc'd to LKML, find here:
  https://gcc.gnu.org/pipermail/gcc/2020-November/234119.html )

> --cut here--
> #define foo(_var)                                       \
> ({                                                      \
>         typeof(_var) tmp__;                             \
>         asm ("mov %1, %0" : "=r"(tmp__) : "m"(_var));   \
>         tmp__;                                          \
> })
>
> __seg_fs int x;
>
> int test (void)
> {
>         int y;
>
>         y = foo (x);
>         return y;
> }
> --cut here--

> when compiled with -O2 for x86 target, the compiler reports:
>
> pcpu.c: In function ‘test’:
> pcpu.c:14:3: error: ‘__seg_fs’ specified for auto variable ‘tmp__’


> Maybe stripping all qualifiers is fine since you can add them back in
> if necessary?

So far that seems sufficient. Although the Devil's advocate in me is
trying to construct a case where we need to preserve const but strip
volatile, and that then means we need to detect if the original has
const or not, because unconditionally adding it will be wrong.




Re: typeof and operands in named address spaces

2020-11-09 Thread Peter Zijlstra
On Mon, Nov 09, 2020 at 11:50:15AM -0800, Nick Desaulniers wrote:
> On Mon, Nov 9, 2020 at 11:46 AM Segher Boessenkool
>  wrote:
> >
> > On Mon, Nov 09, 2020 at 01:47:13PM +0100, Peter Zijlstra wrote:
> > >
> > > + lots of people and linux-toolchains
> > >
> > > On Wed, Nov 04, 2020 at 07:31:42PM +0100, Uros Bizjak wrote:
> > > > Hello!
> > > >
> > > > I was looking at the recent linux patch series [1] where segment
> > > > qualifiers (named address spaces) were introduced to handle percpu
> > > > variables. In the patch [2], the author mentions that:
> > > >
> > > > --q--
> > > > Unfortunately, gcc does not provide a way to remove segment
> > > > qualifiers, which is needed to use typeof() to create local instances
> > > > of the per-cpu variable. For this reason, do not use the segment
> > > > qualifier for per-cpu variables, and do casting using the segment
> > > > qualifier instead.
> > > > --/q--
> > >
> > > C in general does not provide means to strip qualifiers.
> >
> > Most ways you can try to use the result are undefined behaviour, even.
> 
> Yes, removing `const` from a `const` declared variable (via cast) then
> expecting to use the result is a great way to have clang omit the use
> from the final program.  This has bitten us in the past getting MIPS
> support up and running, and one of the MTK gfx drivers.

Stripping const to declare another variable is useful though. Sure C has
sharp edges, esp. if you cast stuff, but since when did that stop anyone
;-)

The point is, C++ has these very nice template helpers that can strip
qualifiers, I want that too, for much of the same reasons. We might not
have templates :-(, but we've become very creative with our
pre-processor.

Surely our __unqual_scalar_typeof() cries for a better solution.


Re: typeof and operands in named address spaces

2020-11-09 Thread Peter Zijlstra
On Mon, Nov 09, 2020 at 01:38:51PM -0600, Segher Boessenkool wrote:
> On Mon, Nov 09, 2020 at 01:47:13PM +0100, Peter Zijlstra wrote:
> > 
> > + lots of people and linux-toolchains
> > 
> > On Wed, Nov 04, 2020 at 07:31:42PM +0100, Uros Bizjak wrote:
> > > Hello!
> > > 
> > > I was looking at the recent linux patch series [1] where segment
> > > qualifiers (named address spaces) were introduced to handle percpu
> > > variables. In the patch [2], the author mentions that:
> > > 
> > > --q--
> > > Unfortunately, gcc does not provide a way to remove segment
> > > qualifiers, which is needed to use typeof() to create local instances
> > > of the per-cpu variable. For this reason, do not use the segment
> > > qualifier for per-cpu variables, and do casting using the segment
> > > qualifier instead.
> > > --/q--
> > 
> > C in general does not provide means to strip qualifiers.
> 
> Most ways you can try to use the result are undefined behaviour, even.
> 
> > We recently had
> > a _lot_ of 'fun' trying to strip volatile from a type, see here:
> > 
> >   https://lore.kernel.org/lkml/875zimp0ay@mpe.ellerman.id.au
> > 
> > which resulted in the current __unqual_scalar_typeof() hack.
> > 
> > If we're going to do compiler extensions here, can we pretty please have
> > a sane means of modifying qualifiers in general?
> 
> What do you want to do with it?  It may be more feasible to do a
> compiler extension for *that*.

Like with the parent use-case it's pretty much always declaring
temporaries in macros. We don't want the temporaries to be volatile, or
as the parent post points out, to have a segment qualifier.



Re: typeof and operands in named address spaces

2020-11-09 Thread Peter Zijlstra


+ lots of people and linux-toolchains

On Wed, Nov 04, 2020 at 07:31:42PM +0100, Uros Bizjak wrote:
> Hello!
> 
> I was looking at the recent linux patch series [1] where segment
> qualifiers (named address spaces) were introduced to handle percpu
> variables. In the patch [2], the author mentions that:
> 
> --q--
> Unfortunately, gcc does not provide a way to remove segment
> qualifiers, which is needed to use typeof() to create local instances
> of the per-cpu variable. For this reason, do not use the segment
> qualifier for per-cpu variables, and do casting using the segment
> qualifier instead.
> --/q--

C in general does not provide means to strip qualifiers. We recently had
a _lot_ of 'fun' trying to strip volatile from a type, see here:

  https://lore.kernel.org/lkml/875zimp0ay@mpe.ellerman.id.au

which resulted in the current __unqual_scalar_typeof() hack.

If we're going to do compiler extensions here, can we pretty please have
a sane means of modifying qualifiers in general?


Re: Broken check rejecting -fcf-protection and -mindirect-branch=thunk-extern

2020-04-28 Thread Peter Zijlstra
On Tue, Apr 28, 2020 at 02:41:33PM +0100, Andrew Cooper wrote:
> It's fine to focus on userspace first, but the kernel is far simpler.
> 
> Looking at that presentation, the only thing missing for kernel is the
> notrack thunks, in the unlikely case that such code would be tolerated
> (Frankly, I don't expect Xen or Linux to run with notrack enabled, as
> there is no legacy code to be concerned with).

Uhhh.. ftrace and kretprobes play dodgy games with the
return stack, doesn't that make the CET thing slightly more interesting?


Re: [PATCH] tell gcc optimizer to never introduce new data races

2014-06-10 Thread Peter Zijlstra
On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote:
> +# Tell gcc to never replace conditional load with a non-conditional one
> +KBUILD_CFLAGS += $(call cc-option,--param allow-store-data-races=0)
> +

Why do we not want: -fmemory-model=safe? And should we not at the very
least also disable packed-store-data-races?




Re: [PATCH] tell gcc optimizer to never introduce new data races

2014-06-10 Thread Peter Zijlstra
On Tue, Jun 10, 2014 at 05:04:55PM +0200, Marek Polacek wrote:
> On Tue, Jun 10, 2014 at 04:53:27PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 10, 2014 at 03:23:36PM +0200, Jiri Kosina wrote:
> > > +# Tell gcc to never replace conditional load with a non-conditional one
> > > +KBUILD_CFLAGS += $(call cc-option,--param allow-store-data-races=0)
> > > +
> > 
> > Why do we not want: -fmemory-model=safe? And should we not at the very
> > least also disable packed-store-data-races?
> 
> Note that the option does not exist, even though it is mentioned in the
> documentation.

Urgh.. ok. Any word on the packed-store-data thing?




Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-19 Thread Peter Zijlstra
On Wed, Feb 19, 2014 at 12:07:02PM +0100, Torvald Riegel wrote:
> > It's not only hardware; also the kernel/user boundary has this same
> > problem. We cannot a priori say what userspace will do; in fact, because
> > we're a general purpose OS, we must assume it will willfully try its
> > bestest to wreck whatever assumptions we make about its behaviour.
> 
> That's a good note, and I think a distinct case from those below,
> because here you're saying that you can't assume that the userspace code
> follows the C11 semantics ...

Right; we can malfunction in those cases though; as long as the
malfunctioning happens on the userspace side. That is, whatever
userspace does should not cause the kernel to crash, but userspace
crashing itself, or getting crap data or whatever is its own damn fault
for not following expected behaviour.

To stay on topic; if the kernel/user interface requires memory ordering
and userspace explicitly omits the barriers, all malfunctioning should be
on the user. For instance it might lose a fwd progress guarantee or
data integrity guarantees.

Specifically, given a kernel/user lockless producer/consumer buffer, if
the user-side allows the tail write to happen before its data reads are
complete, the kernel might overwrite the data it's still reading.

Or in case of futexes, if the user side doesn't use the appropriate
operations its lock state gets corrupt but only userspace should suffer.

But yes, this does require some care and consideration from our side.
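
To illustrate the producer/consumer case above, a minimal single-file
sketch of such a ring. The structure and the names struct ring,
ring_produce() and ring_consume() are assumptions for illustration and
do not describe any particular kernel interface. The part that matters
is the consumer's release store to tail: it keeps the data reads ahead
of the tail update, so the producer cannot reuse a slot that is still
being read.

```c
#include <stddef.h>

#define RING_SIZE 8	/* power of two */

struct ring {
	unsigned long head;	/* only written by the producer */
	unsigned long tail;	/* only written by the consumer */
	int data[RING_SIZE];
};

/* Producer side: returns 1 on success, 0 if the ring is full. */
static int ring_produce(struct ring *r, int v)
{
	unsigned long tail = __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);
	unsigned long head = r->head;

	if (head - tail == RING_SIZE)
		return 0;

	r->data[head % RING_SIZE] = v;
	/* Publish the entry only after its data has been written. */
	__atomic_store_n(&r->head, head + 1, __ATOMIC_RELEASE);
	return 1;
}

/* Consumer side: copy out up to max entries, then publish the new tail. */
static size_t ring_consume(struct ring *r, int *out, size_t max)
{
	unsigned long head = __atomic_load_n(&r->head, __ATOMIC_ACQUIRE);
	unsigned long tail = r->tail;
	size_t n = 0;

	while (tail != head && n < max)
		out[n++] = r->data[tail++ % RING_SIZE];

	/*
	 * Release: the data reads above are complete before the producer
	 * can observe the new tail and overwrite those slots.
	 */
	__atomic_store_n(&r->tail, tail, __ATOMIC_RELEASE);
	return n;
}
```

If the consumer used a relaxed store for tail instead, the reads could be
reordered after it, and the consumer would be the one seeing garbage.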


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-18 Thread Peter Zijlstra
On Tue, Feb 18, 2014 at 12:12:06PM +, Peter Sewell wrote:
> Several of you have said that the standard and compiler should not
> permit speculative writes of atomics, or (effectively) that the
> compiler should preserve dependencies.

The example below only deals with control dependencies; so I'll limit
myself to that.

> In simple examples it's easy
> to see what that means, but in general it's not so clear what the
> language should guarantee, because dependencies may go via non-atomic
> code in other compilation units, and we have to consider the extent to
> which it's desirable to limit optimisation there.
> 
> For example, suppose we have, in one compilation unit:
> 
> void f(int ra, int*rb) {
>   if (ra==42)
>     *rb=42;
>   else
>     *rb=42;
> }
> 
> and in another compilation unit the bodies of two threads:
> 
> // Thread 0
> r1 = x;
> f(r1,&r2);
> y = r2;
> 
> // Thread 1
> r3 = y;
> f(r3,&r4);
> x = r4;
> 
> where accesses to x and y are annotated C11 atomic
> memory_order_relaxed or Linux ACCESS_ONCE(), accesses to
> r1,r2,r3,r4,ra,rb are not annotated, and x and y initially hold 0.

So I'm intuitively ok with this, however I would expect something like:

  void f(_Atomic int ra, _Atomic int *rb);

to preserve dependencies and not make the conditional go away, simply
because in that case, in:

  if (ra == 42)

the 'ra' usage can be seen as an atomic load.

> So as far as we can see, either:
> 
> 1) if you can accept the latter behaviour (if the Linux codebase does
>    not rely on its absence), the language definition should permit it,
>    and current compiler optimisations can be used,

Currently there's exactly 1 site in the Linux kernel that relies on
control dependencies as far as I know -- the one I put in. And it's
limited to a single function, so no cross translation unit funnies
there.

Of course, nobody is going to tell me when or where they'll put in the
next one; since it's now documented as accepted practise.

However, PaulMck and our RCU usage very much do cross all sorts of TU
boundaries; but those are data dependencies.

~ Peter


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-18 Thread Peter Zijlstra
On Tue, Feb 18, 2014 at 10:21:56PM +0100, Torvald Riegel wrote:
> Yes, I do.  But that seems to be volatile territory.  It crosses the
> boundaries of the abstract machine, and thus is input/output.  Which
> fraction of your atomic accesses can read values produced by hardware?
> I would still suppose that lots of synchronization is not affected by
> this.

It's not only hardware; also the kernel/user boundary has this same
problem. We cannot a priori say what userspace will do; in fact, because
we're a general purpose OS, we must assume it will willfully try its
bestest to wreck whatever assumptions we make about its behaviour.

We also have loadable modules -- much like regular userspace DSOs -- so
there too we cannot say what will or will not happen.

We also have JITs that generate code on demand.

And I'm absolutely sure (with the exception of the JITs, it's not an area
I've worked on) that we have atomic usage across all those boundaries.

I must agree with Linus, global state driven optimizations are crack
brained; esp. for atomics. We simply cannot know all state at compile
time. The best we can hope for are local optimizations.


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-18 Thread Peter Zijlstra
> > 4.  Some drivers allow user-mode code to mmap() some of their
> >     state.  Any changes undertaken by the user-mode code would
> >     be invisible to the compiler.
> 
> A good point, but a compiler that doesn't try to (incorrectly) assume
> something about the semantics of mmap will simply see that the mmap'ed
> data will escape to stuff it can't analyze, so it will not be able to
> make a proof.
> 
> This is different from, for example, malloc(), which is guaranteed to
> return fresh nonaliasing memory.

The kernel side of this is different.. it looks like 'normal' memory, we
just happen to allow it to end up in userspace too.

But on that point; how do you tell the compiler the difference between
malloc() and mmap()? Is that some function attribute?


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-14 Thread Peter Zijlstra
On Thu, Feb 13, 2014 at 09:07:55PM -0800, Torvald Riegel wrote:
> That depends on what your goal is.

A compiler that we don't need to fight in order to generate sane code
would be nice. But as Linus said; we can continue to ignore you lot and
go on as we've done.


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-12 Thread Peter Zijlstra
> I don't know the specifics of your example, but from how I understand
> it, I don't see a problem if the compiler can prove that the store will
> always happen.
> 
> To be more specific, if the compiler can prove that the store will
> happen anyway, and the region of code can be assumed to always run
> atomically (e.g., there's no loop or such in there), then it is known
> that we have one atomic region of code that will always perform the
> store, so we might as well do the stuff in the region in some order.
> 
> Now, if any of the memory accesses are atomic, then the whole region of
> code containing those accesses is often not atomic because other threads
> might observe intermediate results in a data-race-free way.
> 
> (I know that this isn't a very precise formulation, but I hope it brings
> my line of reasoning across.)

So given something like:

	if (x)
		y = 3;

assuming both x and y are atomic (so don't gimme crap for not knowing
the C11 atomic incantations); and you can prove x is always true; you
don't see a problem with not emitting the conditional?

Avoiding the conditional changes the result; see that control dependency
email from earlier. In the above example the load of X and the store to
Y are strictly ordered, due to control dependencies. Not emitting the
condition and maybe not even emitting the load completely wrecks this.

It's therefore an invalid optimization to take out the conditional or
speculate the store, since it takes out the dependency.
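
A sketch of the above in the Linux ACCESS_ONCE() style of the time. The
macro is spelled with __typeof__ so it builds outside the kernel, and
maybe_store() is a name made up for this example. The volatile accesses
mean the compiler has to emit both the load and the conditional store;
the load->store ordering itself is then left to the hardware respecting
the control dependency, which this sketch does not and cannot
demonstrate.

```c
/* Force a single, non-elidable access to v through a volatile lvalue. */
#define ACCESS_ONCE(v) (*(volatile __typeof__(v) *)&(v))

static int x, y;

/* Store to y only if the load of x returned non-zero. */
static void maybe_store(void)
{
	if (ACCESS_ONCE(x))
		ACCESS_ONCE(y) = 3;
}
```

With plain (non-volatile, non-atomic) accesses a compiler that could
prove x != 0 would be free to drop the load and the branch entirely.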


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-12 Thread Peter Zijlstra
On Wed, Feb 12, 2014 at 09:42:09AM -0800, Paul E. McKenney wrote:
> You need volatile semantics to force the compiler to ignore any proofs
> it might otherwise attempt to construct.  Hence all the ACCESS_ONCE()
> calls in my email to Torvald.  (Hopefully I translated your example
> reasonably.)

My brain gave out for today; but it did appear to have the right
structure.

I would prefer if C11 would not require the volatile casts. It should
simply _never_ speculate with atomic writes, volatile or not.





Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-10 Thread Peter Zijlstra
On Mon, Feb 10, 2014 at 01:27:51AM +0100, Torvald Riegel wrote:
  Initial state: x == y == 0
  
  T1: r1 = atomic_load_explicit(&x, memory_order_relaxed);
      atomic_store_explicit(&y, 42, memory_order_relaxed);
      if (r1 != 42)
          atomic_store_explicit(&y, r1, memory_order_relaxed);
  
  T2: r2 = atomic_load_explicit(&y, memory_order_relaxed);
      atomic_store_explicit(&x, r2, memory_order_relaxed);
 
 Intuitively, this is wrong because this lets the program take a step
 the abstract machine wouldn't do.  This is different to the sequential
 code that Peter posted because it uses atomics, and thus one can't
 easily assume that the difference is not observable.

Yeah, my bad for not being familiar with the atrocious crap C11 made of
atomics :/



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-10 Thread Peter Zijlstra
On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote:
 As near as I can tell, compiler writers hate the idea of prohibiting
 speculative-store optimizations because it requires them to introduce
 both control and data dependency tracking into their compilers.  Many of
 them seem to hate dependency tracking with a purple passion.  At least,
 such a hatred would go a long way towards explaining the incomplete
 and high-overhead implementations of memory_order_consume, the long
 and successful use of idioms based on the memory_order_consume pattern
 notwithstanding [*].  ;-)

Just tell them that because the hardware provides control dependencies
we actually use and rely on them.

Not that I expect they care too much what we do, given the current state
of things.


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-10 Thread Peter Zijlstra
On Mon, Feb 10, 2014 at 11:49:29AM +, Will Deacon wrote:
 On Mon, Feb 10, 2014 at 11:48:13AM +, Peter Zijlstra wrote:
  On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote:
   As near as I can tell, compiler writers hate the idea of prohibiting
   speculative-store optimizations because it requires them to introduce
   both control and data dependency tracking into their compilers.  Many of
   them seem to hate dependency tracking with a purple passion.  At least,
   such a hatred would go a long way towards explaining the incomplete
   and high-overhead implementations of memory_order_consume, the long
   and successful use of idioms based on the memory_order_consume pattern
   notwithstanding [*].  ;-)
  
  Just tell them that because the hardware provides control dependencies
  we actually use and rely on them.
 
 s/control/address/ ?

Nope, control.

Since stores cannot be speculated and thus require linear control flow
history we can use it to order LOAD -> STORE when the LOAD is required
for the control flow decision and the STORE depends on the control flow
path.

Also see commit 18c03c61444a211237f3d4782353cb38dba795df to
Documentation/memory-barriers.txt
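
A rough user-space sketch of that LOAD -> STORE pattern on the producer side
of a ring buffer. Everything here is hypothetical (struct ring, ring_put(),
RB_SIZE) and it is not the perf code. It also deliberately mirrors the kernel
idiom: the tail load is relaxed and the ordering against the data store is
left to the control dependency, which ISO C11 does not guarantee. Strictly
conforming C11 would use memory_order_acquire on the tail load instead.

```c
#include <stdatomic.h>

#define RB_SIZE 8u	/* hypothetical capacity, power of two */

struct ring {
	_Atomic unsigned int head;	/* advanced by the producer */
	_Atomic unsigned int tail;	/* advanced by the consumer */
	int data[RB_SIZE];
};

/* Producer side: returns 0 on success, -1 when there is no room. */
int ring_put(struct ring *rb, int val)
{
	unsigned int head = atomic_load_explicit(&rb->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);

	/* (A) control dependency: the data store below depends on this branch. */
	if (head - tail >= RB_SIZE)
		return -1;

	rb->data[head % RB_SIZE] = val;		/* STORE $data */

	/* (B) release store: data is visible before the new head is. */
	atomic_store_explicit(&rb->head, head + 1, memory_order_release);
	return 0;
}
```

The consumer would pair with this by reading head with acquire semantics,
reading the data, and only then publishing the new tail after a full barrier.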

---
commit c7f2e3cd6c1f4932ccc4135d050eae3f7c7aef63
Author: Peter Zijlstra pet...@infradead.org
Date:   Mon Nov 25 11:49:10 2013 +0100

perf: Optimize ring-buffer write by depending on control dependencies

Remove a full barrier from the ring-buffer write path by relying on
a control dependency to order a LOAD -> STORE scenario.

Cc: Paul E. McKenney paul...@us.ibm.com
Signed-off-by: Peter Zijlstra pet...@infradead.org
Link: http://lkml.kernel.org/n/tip-8alv40z6ikk57jzbaobnx...@git.kernel.org
Signed-off-by: Ingo Molnar mi...@kernel.org

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index e8b168af135b..146a5792b1d2 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -61,19 +61,20 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 	 *
 	 *   kernel                          user
 	 *
-	 *   READ ->data_tail                READ ->data_head
-	 *   smp_mb()    (A)                 smp_rmb()   (C)
-	 *   WRITE $data                     READ $data
-	 *   smp_wmb()   (B)                 smp_mb()    (D)
-	 *   STORE ->data_head               WRITE ->data_tail
+	 *   if (LOAD ->data_tail) {         LOAD ->data_head
+	 *                       (A)         smp_rmb()   (C)
+	 *      STORE $data                  LOAD $data
+	 *      smp_wmb()        (B)         smp_mb()    (D)
+	 *      STORE ->data_head            STORE ->data_tail
+	 *   }
 	 *
 	 * Where A pairs with D, and B pairs with C.
 	 *
-	 * I don't think A needs to be a full barrier because we won't in fact
-	 * write data until we see the store from userspace. So we simply don't
-	 * issue the data WRITE until we observe it. Be conservative for now.
+	 * In our case (A) is a control dependency that separates the load of
+	 * the ->data_tail and the stores of $data. In case ->data_tail
+	 * indicates there is no room in the buffer to store $data we do not.
 	 *
-	 * OTOH, D needs to be a full barrier since it separates the data READ
+	 * D needs to be a full barrier since it separates the data READ
 	 * from the tail WRITE.
 	 *
 	 * For B a WMB is sufficient since it separates two WRITEs, and for C
@@ -81,7 +82,7 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 	 *
 	 * See perf_output_begin().
 	 */
-	smp_wmb();
+	smp_wmb(); /* B, matches C */
 	rb->user_page->data_head = head;
 
 	/*
@@ -144,17 +145,26 @@ int perf_output_begin(struct perf_output_handle *handle,
 		if (!rb->overwrite &&
 		    unlikely(CIRC_SPACE(head, tail, perf_data_size(rb)) < size))
 			goto fail;
+
+		/*
+		 * The above forms a control dependency barrier separating the
+		 * @tail load above from the data stores below. Since the @tail
+		 * load is required to compute the branch to fail below.
+		 *
+		 * A, matches D; the full memory barrier userspace SHOULD issue
+		 * after reading the data and before storing the new tail
+		 * position.
+		 *
+		 * See perf_output_put_handle().
+		 */
+
 		head += size;
 	} while (local_cmpxchg(&rb->head, offset, head) != offset);
 
 	/*
-	 * Separate the userpage->tail read from the data stores below.
-	 * Matches the MB userspace SHOULD issue after reading the data
-	 * and before storing the new tail position.
-	 *
-	 * See perf_output_put_handle

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Peter Zijlstra
On Fri, Feb 07, 2014 at 05:13:36PM +, Will Deacon wrote:
 Understood, but that doesn't explain why Paul wants to add ISB/isync
 instructions which affect the *CPU* rather than the compiler!

I doubt Paul wants it, but yeah, I'm curious about that proposal as
well; sounds like someone took a big toke from the bong again. It seems
a favourite pastime amongst committees.


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-07 Thread Peter Zijlstra
On Fri, Feb 07, 2014 at 04:55:48PM +, Will Deacon wrote:
 Hi Paul,
 
 On Fri, Feb 07, 2014 at 04:50:28PM +, Paul E. McKenney wrote:
  On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
   On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
Hopefully some discussion of out-of-thin-air values as well.
   
   Yes, absolutely shoot store speculation in the head already. Then drive
   a wooden stake through its heart.
   
   C11/C++11 should not be allowed to claim itself a memory model until that
   is sorted.
  
  There actually is a proposal being put forward, but it might not make ARM
  and Power people happy because it involves adding a compare, a branch,
  and an ISB/isync after every relaxed load...  Me, I agree with you,
  much preferring the no-store-speculation approach.
 
 Can you elaborate a bit on this please? We don't permit speculative stores
 in the ARM architecture, so it seems counter-intuitive that GCC needs to
 emit any additional instructions to prevent that from happening.
 
 Stores can, of course, be observed out-of-order but that's a lot more
 reasonable :)

This is more about the compiler speculating on stores; imagine:

  if (x)
	y = 1;
  else
	y = 2;

The compiler is allowed to change that into:

  y = 2;
  if (x)
	y = 1;

Which is of course a big problem when you want to rely on the ordering.
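
Spelled out as two complete functions (a sketch; the function names are made
up and y is a plain int on purpose), the two forms look like this:

```c
/* Shared variable from the example above. */
int y;

/* As written: exactly one store to y, selected by x. */
void store_as_written(int x)
{
	if (x)
		y = 1;
	else
		y = 2;
}

/*
 * What the compiler may turn it into: an unconditional store of 2,
 * possibly followed by a second store of 1. The final value is the
 * same, but another CPU can now observe y == 2 even when x != 0, and
 * the first store no longer depends on the value of x at all.
 */
void store_speculated(int x)
{
	y = 2;
	if (x)
		y = 1;
}
```

Single-threaded, both leave the same value in y, which is why the
transformation is allowed; the difference only shows up to a concurrent
observer.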

There are further problems where things like memset() can write outside
the specified address range. Examples are memset() using single
instructions to wipe entire cachelines and then 'restoring' the tail
bit.

While valid for single threaded code, it's a complete disaster for
concurrent code.

There's more, but it all boils down to doing stores you don't expect in
a 'sane' concurrent environment and/or don't respect the control flow.




Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-06 Thread Peter Zijlstra
On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
 Hopefully some discussion of out-of-thin-air values as well.

Yes, absolutely shoot store speculation in the head already. Then drive
a wooden stake through its heart.

C11/C++11 should not be allowed to claim itself a memory model until that
is sorted.


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-13 Thread Peter Zijlstra
On Mon, Aug 12, 2013 at 10:47:37AM -0700, H. Peter Anvin wrote:
 On 08/12/2013 09:09 AM, Peter Zijlstra wrote:
 
  On the majority of architectures, including x86, you cannot simply copy
  a piece of code elsewhere and have it still work.
  
  I thought we used -fPIC which would allow just that.
  
 
 Doubly wrong.  The kernel is not compiled with -fPIC, nor does -fPIC
 allow this kind of movement for code that contains intramodule
 references (that is *all* references in the kernel).  Since we really
 don't want to burden the kernel with a GOT and a PLT, that is life.

OK. never mind then..


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-12 Thread Peter Zijlstra
On Mon, Aug 05, 2013 at 12:55:15PM -0400, Steven Rostedt wrote:
 [ sent to both Linux kernel mailing list and to gcc list ]
 

Let me hijack this thread for something related...

I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
if-forest functions like perf_prepare_sample() and perf_output_sample().

They are of the form:

void func(obj, args..)
{
	unsigned long f = ...;

	if (f & F1)
		do_f1();

	if (f & F2)
		do_f2();

	...

	if (f & FN)
		do_fn();
}

Where f is constant for the entire lifetime of the particular object.
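
A compilable toy version of such an if-forest, to make the shape concrete.
All names here (struct obj, the F* bits, the do_f*() bodies) are invented
for illustration and are not the perf ones:

```c
/* Hypothetical feature bits. */
#define F1 (1UL << 0)
#define F2 (1UL << 1)
#define F3 (1UL << 2)

struct obj {
	unsigned long flags;	/* fixed for the lifetime of the object */
	unsigned long done;	/* records which handlers actually ran */
};

void do_f1(struct obj *o) { o->done |= F1; }
void do_f2(struct obj *o) { o->done |= F2; }
void do_f3(struct obj *o) { o->done |= F3; }

/* Every call re-tests every bit, although f never changes per object. */
void func(struct obj *o)
{
	unsigned long f = o->flags;

	if (f & F1)
		do_f1(o);

	if (f & F2)
		do_f2(o);

	if (f & F3)
		do_f3(o);
}
```

Since f is fixed per object, every one of those tests always goes the same
way for a given object; those are the conditionals a per-object specialised
copy would drop.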

So I was thinking of having these functions use static_key/asm-goto;
then write the proper static key values unsafe so as to avoid all
trickery (as these functions would never actually be used) and copy the
end result into object private memory. The object will then use indirect
calls into these functions.

The advantage of using something like this is that it would work for all
architectures that now support the asm-goto feature. For arch/gcc
combinations that do not we'd simply revert to the current state of
affairs.

I suppose the question is, do people strenuously object to creativity
like that and/or is there something GCC can do to make this
easier/better still?



Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-12 Thread Peter Zijlstra
On Mon, Aug 12, 2013 at 07:56:10AM -0700, H. Peter Anvin wrote:
 On 08/12/2013 02:17 AM, Peter Zijlstra wrote:
  
  I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
  if-forest functions like perf_prepare_sample() and perf_output_sample().
  
  They are of the form:
  
  void func(obj, args..)
  {
  	unsigned long f = ...;
  
  	if (f & F1)
  		do_f1();
  
  	if (f & F2)
  		do_f2();
  
  	...
  
  	if (f & FN)
  		do_fn();
  }
  
 
 Am I reading this right that f can be a combination of any of these?

Correct.

  Where f is constant for the entire lifetime of the particular object.
  
  So I was thinking of having these functions use static_key/asm-goto;
  then write the proper static key values unsafe so as to avoid all
  trickery (as these functions would never actually be used) and copy the
  end result into object private memory. The object will then use indirect
  calls into these functions.
 
 I'm really not following what you are proposing here, especially not
 copy the end result into object private memory.
 
 With asm goto you end up with at minimum a jump or NOP for each of these
 function entries, whereas an actual JIT can elide that as well.
 
 On the majority of architectures, including x86, you cannot simply copy
 a piece of code elsewhere and have it still work.

I thought we used -fPIC which would allow just that.

 You end up doing a
 bunch of the work that a JIT would do anyway, and would end up with
 considerably higher complexity and worse results than a true JIT.  

Well, less complexity but worse result, yes. We'd only poke the specific
static_branch sites with either NOPs or the (relative) jump target for
each of these branches. Then copy the result.

 You
 also say the object will then use indirect calls into these
 functions... you mean the JIT or pseudo-JIT generated functions, or the
 calls inside them?

The calls to these pseudo-JIT generated functions.

  I suppose the question is, do people strenuously object to creativity
  like that and/or is there something GCC can do to make this
  easier/better still?
 
 I think it would be much easier to just write a minimal JIT for this,
 even though it is per architecture.  However, I would really like to
 understand what the value is.

Removing a lot of the conditionals from the sample path. Depending on
the configuration these can be quite expensive.


Re: WTF?

2009-11-27 Thread Peter Zijlstra
On Wed, 2009-11-25 at 20:38 +0100, Richard Guenther wrote:
 If you can offer advice on how to teach quilt
 (which I believe uses patch) to ignore whitespace changes when
 applying patches then more power to you 

QUILT_PATCH_OPTS=-l


problems bootstrapping gcc-4.0-20051117 on i386-pc-solaris2.10

2005-11-24 Thread Peter Zijlstra
Hi,

I'm having a lot of problems bootstrapping this compiler on said target.
What I did so far:

from the build dir (which I located in the extracted source dir)
../configure --srcdir=.. --enable-languages=c c++ --with-gnu-as
--with-as=/usr/sfw/bin/gas --without-gnu-ld --with-ld=/usr/ccs/bin/ld

however this fails to select '/usr/sfw/bin/gas' as the assembler; the
configure output says:

...
checking for i386-pc-solaris2.10-as... no
checking for as... as
...


So I set the environment variable 'AS' to point to gas:
export AS='/usr/sfw/bin/gas'; removed the build dir and started over.
This gave:

...
checking for i386-pc-solaris2.10-as... /usr/sfw/bin/gas
...

after which I continued with: make -j5 bootstrap
At that point it starts building, but fails with an assembler error:

/usr/local/src/gcc-4.0-20051117/gcc/config/i386/gmon-sol2.c:406: warning: 
control reaches end of non-void function
/usr/local/src/gcc-4.0-20051117/gcc/config/i386/gmon-sol2.c: At top level:
/usr/local/src/gcc-4.0-20051117/gcc/config/i386/gmon-sol2.c:58: warning: 
'sccsid' defined but not used
Fixing headers into /usr/local/src/gcc-4.0-20051117/build/gcc/include for 
i386-pc-solaris2.10 target
Assembler:
, line 1 : Illegal flag (-)
make[2]: *** [gmon.o] Error 1
make[2]: *** Waiting for unfinished jobs


Using truss I find that it is actually using /usr/ccs/bin/as ?!?
Luckily it stats a few paths before happening on /usr/ccs/bin/as, so
what I did was symlink /usr/sfw/bin/gas to one of the other paths (ln
-s /usr/sfw/bin/gas /usr/local/i386-pc-solaris2.10/bin/as) and restart
all over again.

This gets me further along, however not quite there. Now it fails to
configure libstdc++ with the following error:

checking for exception model to use... configure: error: unable to
detect exception model
make[1]: *** [configure-target-libstdc++-v3] Error 1
make[1]: Leaving directory `/usr/local/src/gcc-4.0-20051117/build'
make: *** [bootstrap] Error 2


I'm sure I'm doing something horribly wrong here, can somebody point me
to the way of a working compiler?

Kind regards,

Peter Zijlstra



C++ vs. pthread_cancel

2005-08-15 Thread Peter Zijlstra
Hi all,

On this controversial subject, could somebody please - pretty please
with a cherry on top - tell me what the current status is:
 - in general,
 - as implemented in the 3.4 series and
 - as implemented in the 4.0 series.

At work we're using 3.4 and we have managed to shoot our foot off with
this issue :-(, google gives a lot of hits on the issue but it is a bit
hard to get the current impl. status for 3.4. Which in turn makes it
hard to decide on how to bandage our foot.

Kind regards,

Peter Zijlstra



re: C++ vs. pthread_cancel

2005-08-15 Thread Peter Zijlstra
On Mon, 2005-08-15 at 06:12 -0700, Dan Kegel wrote:
 Peter Zijlstra [EMAIL PROTECTED] wrote:
  On this controversial subject, could somebody please - pretty please
  with a cherry on top - tell me what the current status is:
   - in general,
   - as implemented in the 3.4 series and
   - as implemented in the 4.0 series.
  
  At work we're using 3.4 and we have managed to shoot our foot off with
  this issue :-(, google gives a lot of hits on the issue but it is a bit
  hard to get the current impl. status for 3.4. Which in turn makes it
  hard to decide on how to bandage our foot.
 
 Could you provide a link to a description of the particular
 problem?  I looked around, and all I could find was
 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=111548
 
 I suppose the controversial part is that you're using
 pthread_cancel, which is somewhat frowned upon as
 inherently unsafe.
 - Dan
 

Yes, it seems to be that problem.

Discussion here:
http://gcc.gnu.org/ml/gcc/2003-12/msg00743.html

And this list seems dedicated to the problem:
http://www.codesourcery.com/archives/c++-pthreads/maillist.html


The issue seems to be that pthread_cancel is implemented using
force_unwind, the same mechanism as used for exception handling. And the
interaction is ill-defined.

The behaviour of gcc-3.4 is that the unclassified exception caused by
SIGCANCEL can be caught by the catch-all clause 'catch (...)'. If it is
then not rethrown, the program aborts.

Peter Zijlstra



Re: C++ vs. pthread_cancel

2005-08-15 Thread Peter Zijlstra
On Mon, 2005-08-15 at 09:33 -0400, Andrew Pinski wrote:
  
  Peter Zijlstra [EMAIL PROTECTED] wrote:
   On this controversial subject, could somebody please - pretty please
   with a cherry on top - tell me what the current status is:
- in general,
- as implemented in the 3.4 series and
- as implemented in the 4.0 series.
   
   At work we're using 3.4 and we have managed to shoot our foot off with
   this issue :-(, google gives a lot of hits on the issue but it is a bit
   hard to get the current impl. status for 3.4. Which in turn makes it
   hard to decide on how to bandage our foot.
  
  Could you provide a link to a description of the particular
  problem?  I looked around, and all I could find was
  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=111548
  
  I suppose the controversial part is that you're using
  pthread_cancel, which is somewhat frowned upon as
  inherently unsafe.
 
 There is a whole mailing list about this:
 http://www.codesourcery.com/archives/c++-pthreads/threads.html
 
 This has to be done correctly with the C++ standard and POSIX people and
 the GCC people will be involved but not on the GCC list as it just gets
 in the way.

Yes, I'm aware of the list. My question was what the current behaviour
of the various gcc versions is, and whether gcc supports the various
workarounds mentioned, like explicitly configuring the behaviour of
'catch (...)' etc.

Peter Zijlstra



Re: C++ vs. pthread_cancel

2005-08-15 Thread Peter Zijlstra
On Mon, 2005-08-15 at 09:53 -0400, Daniel Jacobowitz wrote:
 On Mon, Aug 15, 2005 at 09:51:17AM -0400, Andrew Pinski wrote:
   Yes, I'm aware of the list. My question was what the current behaviour
   of the various gcc versions is, and whether gcc supports the various
   workarounds mentioned, like explicitly configuring the behaviour of
   'catch (...)' etc.
  
  There is none yet because there have been no consensus yet.  That is 
  why I mentioned the list.  GCC is not going to implement anything
  until there is a consensus of how to proceed.
 
 Eh, that's obviously incorrect.  The feature was implemented back in
 3.3; the behavior hasn't been _changed_ and won't be until there is
 consensus.
 
 I believe that it still can be caught, must be rethrown, or the
 program will be aborted.  Someone who knows better than I may want to
 confirm this.
 

AFAICT this is indeed correct; I've been reading the libstdc++5 and NPTL
sources. However, once it is caught it is impossible to distinguish from
other exceptions, due to the lack of exception information set in:
  glibc-2.3.5/nptl/unwind.c:__pthread_unwind()
Hence one is left in the situation where neither forward nor backward is
an option.

Nor do I think they (being the company I work for) will allow me to ship
patched versions of libpthread.so and libstdc++.so.5.

Too bad, guess I have to redesign the issue.

Peter Zijlstra