subject:"Usefulness of GCC's 64bit __sync

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-12 Thread Dave Martin

On Tue, Jul 12, 2011 at 02:25:09PM +0100, Richard Sandiford wrote:
> Sorry, should have included this in my last reply.
> 
> Dave Martin  writes:
> >> Also, remember this whole discussion is just to print a message and exit 
> >> nicely
> >> in the case of someone using a currently incredibly rare function on an old
> >> kernel!
> >
> > I'd say we want to notify the operating environment and/or the user.  This
> > may be realised by writing some text to stderr, but that's not useful in
> > the context of GUI environments.
> 
> Shouldn't a decent GUI environment be able to catch these kinds of
> errors and log them though?

Ideally yes.  Which is why calling some hook may be better than writing
a message to stderr, because messages written to stderr are rarely
usefully machine-readable (even if not locale-translated).  GUI toolkits
may provide such a hook, but in the case of dnyamic linking errors and
similar, the GUI toolkit is unlikely to have started up at the time the
error occurs.

Anyway, it's out of our scope to solve this here, so long as we're no
worse than the de facto behaviour.

> One problem with early write()s is that we can't do any locale-specific
> translation.  The message is always going to be in English.  That's
> something that the VSO approach solves for free, because it's then
> up to libc or the dynamic loader to print a suitable message.

Agreed, although locale-translated messages on stderr, while good for
humans, tend to be even less machine-readable.

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-12 Thread Richard Sandiford

Sorry, should have included this in my last reply.

Dave Martin  writes:
>> Also, remember this whole discussion is just to print a message and exit 
>> nicely
>> in the case of someone using a currently incredibly rare function on an old
>> kernel!
>
> I'd say we want to notify the operating environment and/or the user.  This
> may be realised by writing some text to stderr, but that's not useful in
> the context of GUI environments.

Shouldn't a decent GUI environment be able to catch these kinds of
errors and log them though?

One problem with early write()s is that we can't do any locale-specific
translation.  The message is always going to be in English.  That's
something that the VSO approach solves for free, because it's then
up to libc or the dynamic loader to print a suitable message.

Richard

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-12 Thread David Gilbert

On 12 July 2011 13:40, Dave Martin  wrote:
> On Tue, Jul 12, 2011 at 01:10:24PM +0100, David Gilbert wrote:

>> Does it help address rth's concerns though?
>
> Which ones in particular?

Good question - hence my prompt to rth at the bottom; I know he originally
asked why not go to VDSO, so what I don't yet understand is whether
he would accept the other alternatives.


>> Isn't this a bit over-the-top for the failure we're trying to catch?
>
> Yes and no.  Implementing such a solution just for this case would be very
> over-the-top.  But integrating with a general mechanism for reporting errors
> if it exists (now or in the future) is very sensible.

Right - but I don't think we should be doing anything new here; we
can't be the first people to be trying to detect running on an older kernel
and exiting relatively cleanly.

> If it's necessary to to any check at all before concluding that it's safe
> to call a function, I don't see the difference.  Checking a magic number
> in one place seems no better than checking for a version number in another
> place, AFAICT.

Something like AUXV would have seemed cleaner to me; but that's
just preference.

> Having a faulting instruction doesn't permit the program to abort cleanly
> ahead of time (unless you check in advance -- in which case you're really
> back to a magic number again).
>
> This matters if constructors might do things like screwing up the terminal --
> if so, the chance to un-screw thing up should be offered to the program before
> it exits.
>
> Possibly constructors should never do things like that, in which case these
> concerns don't apply.  I don't know what the official line is regarding
> what kind of things global constructors should and shouldn't do.

If I understand correctly that's the whole point of abort() - if they
need to tidy
up then they should be handling SIGABRT.

>> Also, remember this whole discussion is just to print a message and exit 
>> nicely
>> in the case of someone using a currently incredibly rare function on an old
>> kernel!
>
> I'd say we want to notify the operating environment and/or the user.  This
> may be realised by writing some text to stderr, but that's not useful in
> the context of GUI environments.

Yes it is; exiting with a non-zero return value and writing to stderr
is perfectly
normal - whatever launched it can go and display that text.
(Or as I say trapping SIGABRT).

> I am not suggesting that we should engineer a special solution to that
> problem here, so long as we don't defeat potential solutions either.
> For now, it doesn't like any of the proposals risk that though.

Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-12 Thread Richard Sandiford

Dave Martin  writes:
>> To be honest I don't see the point in the more general indirected
>> approach; if we
>> want to be more general then I think we should use IFUNC (it would be the 1st
>> use of it, which means we may have to fix some issues but hey that's what 
>> we're
>> here for).
>
> Does IFUNC rely on support from libc in order to get resolved correctly?
> My concern is that if so, binaries using it will silently misbehave on
> older libcs.

Well, it uses new relocations, so it relies on the dynamic loader
implementing those relocations correctly.  But older loaders from
older libcs will error out if they see that relocation.  Programs
wouldn't silently misbehave if your loader is too old.

Static executables are responsible for resolving their own ifuncs
(in the common start-up code) so they should run on any system.

Richard

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-12 Thread Dave Martin

On Tue, Jul 12, 2011 at 01:10:24PM +0100, David Gilbert wrote:
> On 12 July 2011 11:43, Dave Martin  wrote:
> 
> > Just for context, I had a quick play to get a feel for the feasibility of
> > implementing this directly, without relying either on a VDSO or on IFUNC.
> 
> I originally thought about doing something similar to what you've done
> with the indirection; but eventually convinced myself that it didn't provide
> anything that the initial constructor check didn't provide.
> 
> > It's possible to produce something which works reasonably well: see the
> > attachment.  The result is almost the same as what IFUNC would achieve
> > (although &__kernel_cmpxchg64 leaves something to be desired; macros
> > could potentially fix that).
> 
> Does it help address rth's concerns though?

Which ones in particular?

> > The various helpers are checked independently at startup if they are used,
> > by registering the resolver functions as constructors as others have
> > suggested.  (Note that gcc/libc already fails to do this check for the
> > older kernel helper functions; it's just that you're unlikely to observe
> > a problem on any reasonably new kernel.)
> 
> I don't see any point going back and adding checks for those; the code is
> already there.

Indeed -- I'm merely observing that this problem is not actually new with
cmpxchg64; it's just that it wasn't solved previously.

> > The problem of constructor ordering is solved by pointing each indirect
> > function pointer at the appropriate resolver function initially.
> > Threading is not taken into account because no additional threads can
> > exist during execution of constructors (though if more libraries are
> > later dlopen()'d they may invoke additional constructors in the presence
> > of threads, so this might need fixing).
> 
> I'm not sure threads are a problem; you could have two threads going into
> the check at the same time; with any luck they would both make the same
> decision and rewrite the pointer to the same value and either call the 
> function
> ro exit; I guess you could end up with two threads trying to print the error
> and exit at the same time.

Hmmm, probably right.

> > The main question is the correct way to blow the program up if a needed
> > helper function is not available.  Calling exit() is possibly too gentle,
> > whereas calling abort() may be too aggressive.  libc may have some correct
> > internal mechanism for doing this, but my knowledge falls short of that.
> 
> I took the write() -> abort() from someother libc code that forced an exit.
> 
> > What to do in the case of recursive lookup failures isn't obvious:
> > currently I just call abort().  By this point the process should be
> > running atexit handlers or destructors, so calling exit() isn't going
> > to help in this situation anyway.
> 
> You could always just send yourself a SIGKILL.

Probably, yes.  abort() is almost the same, except that SIGKILL is not
catchable.

It's probably up to libc guys to judge what would be correct.

> > The other question is whether and how the program can be enabled to catch
> > the error and report something intelligible to the user.  If there is a
> > hook for catching dynamic link failures, we should maybe use the same.
> > If this problem is not solved for dynamic linking in general, it doesn't
> > make sense to require the kuser functions to solve it either (except that
> > if the problem is eventually solved, we want the fix to work smoothly for
> > both).
> 
> Isn't this a bit over-the-top for the failure we're trying to catch?

Yes and no.  Implementing such a solution just for this case would be very
over-the-top.  But integrating with a general mechanism for reporting errors
if it exists (now or in the future) is very sensible.

> > Of course, all of these are generic dynamic linking challenges which
> > ld.so likely has some policy or solution for already.
> >
> > Overall, this isn't a perfect solution, but it doesn't feel catastrophically
> > bad either.
> >
> > Any thoughts?
> 
> We seem to be growing solutions rather than figuring out which ones satisfy 
> the
> requirements of both gcc and kernel:
> 
>   1) my original simple solution (about 5 lines of code)
>   2) IFUNC
>   3) Your more general indirected code
>   4) A VDSO

I was just exploring the problem space; I appreciate that's not the same
thing as choosing a solution.

> To be honest I don't see the point in the more general indirected
> approach; if we
> want to be more general then I think we should use IFUNC (it would be the 1st
> use of it, which means we may have to fix some issues but hey that's what 
> we're
> here for).

Does IFUNC rely on support from libc in order to get resolved correctly?
My concern is that if so, binaries using it will silently misbehave on
older libcs.  This seems to be the case at present, though I haven't figured
out whether this is caused by libc or the toolchain.  It may be that I'm
worrying for nothing -- I don't know exac

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-12 Thread David Gilbert

On 12 July 2011 11:43, Dave Martin  wrote:

> Just for context, I had a quick play to get a feel for the feasibility of
> implementing this directly, without relying either on a VDSO or on IFUNC.

I originally thought about doing something similar to what you've done
with the indirection; but eventually convinced myself that it didn't provide
anything that the initial constructor check didn't provide.

> It's possible to produce something which works reasonably well: see the
> attachment.  The result is almost the same as what IFUNC would achieve
> (although &__kernel_cmpxchg64 leaves something to be desired; macros
> could potentially fix that).

Does it help address rth's concerns though?

> The various helpers are checked independently at startup if they are used,
> by registering the resolver functions as constructors as others have
> suggested.  (Note that gcc/libc already fails to do this check for the
> older kernel helper functions; it's just that you're unlikely to observe
> a problem on any reasonably new kernel.)

I don't see any point going back and adding checks for those; the code is
already there.

> The problem of constructor ordering is solved by pointing each indirect
> function pointer at the appropriate resolver function initially.
> Threading is not taken into account because no additional threads can
> exist during execution of constructors (though if more libraries are
> later dlopen()'d they may invoke additional constructors in the presence
> of threads, so this might need fixing).

I'm not sure threads are a problem; you could have two threads going into
the check at the same time; with any luck they would both make the same
decision and rewrite the pointer to the same value and either call the function
ro exit; I guess you could end up with two threads trying to print the error
and exit at the same time.

> The main question is the correct way to blow the program up if a needed
> helper function is not available.  Calling exit() is possibly too gentle,
> whereas calling abort() may be too aggressive.  libc may have some correct
> internal mechanism for doing this, but my knowledge falls short of that.

I took the write() -> abort() from someother libc code that forced an exit.

> What to do in the case of recursive lookup failures isn't obvious:
> currently I just call abort().  By this point the process should be
> running atexit handlers or destructors, so calling exit() isn't going
> to help in this situation anyway.

You could always just send yourself a SIGKILL.

> The other question is whether and how the program can be enabled to catch
> the error and report something intelligible to the user.  If there is a
> hook for catching dynamic link failures, we should maybe use the same.
> If this problem is not solved for dynamic linking in general, it doesn't
> make sense to require the kuser functions to solve it either (except that
> if the problem is eventually solved, we want the fix to work smoothly for
> both).

Isn't this a bit over-the-top for the failure we're trying to catch?

> Of course, all of these are generic dynamic linking challenges which
> ld.so likely has some policy or solution for already.
>
> Overall, this isn't a perfect solution, but it doesn't feel catastrophically
> bad either.
>
> Any thoughts?

We seem to be growing solutions rather than figuring out which ones satisfy the
requirements of both gcc and kernel:

  1) my original simple solution (about 5 lines of code)
  2) IFUNC
  3) Your more general indirected code
  4) A VDSO

To be honest I don't see the point in the more general indirected
approach; if we
want to be more general then I think we should use IFUNC (it would be the 1st
use of it, which means we may have to fix some issues but hey that's what we're
here for).

There is some stuff that is a bit of a shame that it wasn't more
general already;
e.g. those slots in the commpage for helper functions - if they were
filled with a known
marker, you could just check the marker rather than having a separate version,
or a faulting instruction or the like.

Also, remember this whole discussion is just to print a message and exit nicely
in the case of someone using a currently incredibly rare function on an old
kernel!

RTH: From a gcc point of view is anything other than the VDSO acceptible?

Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-12 Thread Dave Martin

On Mon, Jul 11, 2011 at 01:00:08PM +0100, David Gilbert wrote:
> On 11 July 2011 12:30, Dave Martin  wrote:
> > On Mon, Jul 11, 2011 at 10:42:27AM +0100, Richard Sandiford wrote:
> >> Dave Martin  writes:
> >> > IFUNC doesn't solve the problem because either it gets resolved
> >> > lazily (violating the above principle (*)), or we have to force _all_
> >> > symbols to resolve at startup, with may have a significant impact on
> >> > startup time for large programs.
> >>
> >> IFUNCs are never resolved lazily; that's one way in which they differ
> >> from PLTs.
> >>
> >> (BTW, in response to a comment upthread, ARM does support IFUNCs now.)
> >
> > Do you know when support was merged in eglibc?
> >
> > For example, Linaro binutils 2.21.0.20110327-2ubuntu3 appears to support
> > IFUNC, but the natty version of eglibc (2.13-0ubuntu13) does not handle
> > it correctly.
> 
> That's why I didn't try and use it; I thought it was best to wait for
> it to percolate through.
> 
> Richard's patch went into glibc-ports on April 26th and looks like it
> got imported into eglibc-ports
> on 5th May (svn rev 13608); there was also a fix from Joseph Myers in that 
> area
> on 21st June.

Just for context, I had a quick play to get a feel for the feasibility of
implementing this directly, without relying either on a VDSO or on IFUNC.

It's possible to produce something which works reasonably well: see the
attachment.  The result is almost the same as what IFUNC would achieve
(although &__kernel_cmpxchg64 leaves something to be desired; macros
could potentially fix that).

[see attached]

The various helpers are checked independently at startup if they are used,
by registering the resolver functions as constructors as others have
suggested.  (Note that gcc/libc already fails to do this check for the
older kernel helper functions; it's just that you're unlikely to observe
a problem on any reasonably new kernel.)

The problem of constructor ordering is solved by pointing each indirect
function pointer at the appropriate resolver function initially.
Threading is not taken into account because no additional threads can
exist during execution of constructors (though if more libraries are
later dlopen()'d they may invoke additional constructors in the presence
of threads, so this might need fixing).

The main question is the correct way to blow the program up if a needed
helper function is not available.  Calling exit() is possibly too gentle,
whereas calling abort() may be too aggressive.  libc may have some correct
internal mechanism for doing this, but my knowledge falls short of that.

What to do in the case of recursive lookup failures isn't obvious:
currently I just call abort().  By this point the process should be
running atexit handlers or destructors, so calling exit() isn't going
to help in this situation anyway.

The other question is whether and how the program can be enabled to catch
the error and report something intelligible to the user.  If there is a
hook for catching dynamic link failures, we should maybe use the same.
If this problem is not solved for dynamic linking in general, it doesn't
make sense to require the kuser functions to solve it either (except that
if the problem is eventually solved, we want the fix to work smoothly for
both).

Of course, all of these are generic dynamic linking challenges which
ld.so likely has some policy or solution for already.

Overall, this isn't a perfect solution, but it doesn't feel catastrophically
bad either.

Any thoughts?

Cheers
---Dave

libkuser.tar.gz
Description: Binary data
___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-11 Thread David Gilbert

On 11 July 2011 12:30, Dave Martin  wrote:
> On Mon, Jul 11, 2011 at 10:42:27AM +0100, Richard Sandiford wrote:
>> Dave Martin  writes:
>> > IFUNC doesn't solve the problem because either it gets resolved
>> > lazily (violating the above principle (*)), or we have to force _all_
>> > symbols to resolve at startup, with may have a significant impact on
>> > startup time for large programs.
>>
>> IFUNCs are never resolved lazily; that's one way in which they differ
>> from PLTs.
>>
>> (BTW, in response to a comment upthread, ARM does support IFUNCs now.)
>
> Do you know when support was merged in eglibc?
>
> For example, Linaro binutils 2.21.0.20110327-2ubuntu3 appears to support
> IFUNC, but the natty version of eglibc (2.13-0ubuntu13) does not handle
> it correctly.

That's why I didn't try and use it; I thought it was best to wait for
it to percolate through.

Richard's patch went into glibc-ports on April 26th and looks like it
got imported into eglibc-ports
on 5th May (svn rev 13608); there was also a fix from Joseph Myers in that area
on 21st June.

Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-11 Thread Dave Martin

On Mon, Jul 11, 2011 at 10:42:27AM +0100, Richard Sandiford wrote:
> Dave Martin  writes:
> > IFUNC doesn't solve the problem because either it gets resolved
> > lazily (violating the above principle (*)), or we have to force _all_
> > symbols to resolve at startup, with may have a significant impact on
> > startup time for large programs.
> 
> IFUNCs are never resolved lazily; that's one way in which they differ
> from PLTs.
> 
> (BTW, in response to a comment upthread, ARM does support IFUNCs now.)

Do you know when support was merged in eglibc?

For example, Linaro binutils 2.21.0.20110327-2ubuntu3 appears to support
IFUNC, but the natty version of eglibc (2.13-0ubuntu13) does not handle
it correctly.

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-11 Thread Dave Martin

On Mon, Jul 11, 2011 at 10:42:27AM +0100, Richard Sandiford wrote:
> Dave Martin  writes:
> > IFUNC doesn't solve the problem because either it gets resolved
> > lazily (violating the above principle (*)), or we have to force _all_
> > symbols to resolve at startup, with may have a significant impact on
> > startup time for large programs.
> 
> IFUNCs are never resolved lazily; that's one way in which they differ
> from PLTs.
> 
> (BTW, in response to a comment upthread, ARM does support IFUNCs now.)

Hmmm, OK then -- the point about IFUNC not being suitable for this case
is perhaps less valid, in that case.

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-11 Thread Richard Sandiford

Dave Martin  writes:
> IFUNC doesn't solve the problem because either it gets resolved
> lazily (violating the above principle (*)), or we have to force _all_
> symbols to resolve at startup, with may have a significant impact on
> startup time for large programs.

IFUNCs are never resolved lazily; that's one way in which they differ
from PLTs.

(BTW, in response to a comment upthread, ARM does support IFUNCs now.)

Richard

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-11 Thread Peter Maydell

On 11 July 2011 09:42, David Gilbert  wrote:
> On 11 July 2011 09:36, Dave Martin  wrote:
>> On Sat, Jul 09, 2011 at 12:29:01AM +0100, Peter Maydell wrote:
>>> QEMU supports calls into the fixed vector page (it just special cases
>>> attempts to execute at addresses >= 0x and emits code to do
>>> "cause special purpose exception" so we get control back at runtime).
>>> What it doesn't support is attempts to do a load from the vector page,
>>> because so far nobody has had a need to probe the vector page for what
>>> it supports.
>>
>> To solve that particular problem, can you just map a page in RAM with the
>> right content for __kernel_helper_version?
>
> We could, but that's dependent on the target system letting us do that.
> (for that matter if the target system decided to map another page there
> it won't fault access to it and then we're going to get who knows what).

To clarify, that's the system we're running on (usually x86, more often
called the "host system"). This is because in linux-user emulation mode
QEMU doesn't completely manage the guest address space, but just exposes
a contiguous chunk of the host QEMU process' address space to the target.
So if the host kernel has mapped something in the area corresponding to
where the guest vector page is supposed to be then you're stuck.

(This implementation has other problems, eg
https://bugs.launchpad.net/qemu-linaro/+bug/806873
but has the advantage of being fast... In the long term I quite like
the idea suggested by rth on qemu-devel of making the linux-user mode
use qemu's softmmu so we actually control the guest process address
space completely; however that's a fairly substantial bit of rework.)

-- PMM

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-11 Thread David Gilbert

On 11 July 2011 09:36, Dave Martin  wrote:
> On Sat, Jul 09, 2011 at 12:29:01AM +0100, Peter Maydell wrote:
>> On 8 July 2011 19:32, Nicolas Pitre  wrote:
>> > On Fri, 8 Jul 2011, Dave Martin wrote:
>> >> On Fri, Jul 08, 2011 at 12:21:27AM +0100, David Gilbert wrote:
>> >> > Nicolas just added; Richard's argument is that if it was actually a
>> >> > VDSO I'd just have linked against a symbol and if the symbol wasn't
>> >> > there then I would have got a fairly normal linker error - I wouldn't
>> >> > have needed any special code; and to be honest I think that would also
>> >> > solve the problem that accessing that wacky address is also a pain for
>> >> > things like qemu because we're going to have to intercept that read.
>> >>
>> >> Note that pre-existing code already calls into the vectors page.
>> >
>> > Right, qemu must be intercepting it already.  The kernel helpers located
>> > at a fixed address in the vector page have been around for more than 6
>> > years.
>>
>> QEMU supports calls into the fixed vector page (it just special cases
>> attempts to execute at addresses >= 0x and emits code to do
>> "cause special purpose exception" so we get control back at runtime).
>> What it doesn't support is attempts to do a load from the vector page,
>> because so far nobody has had a need to probe the vector page for what
>> it supports.
>
> To solve that particular problem, can you just map a page in RAM with the
> right content for __kernel_helper_version?

We could, but that's dependent on the target system letting us do that.
(for that matter if the target system decided to map another page there
it won't fault access to it and then we're going to get who knows what).

Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-11 Thread Dave Martin

On Sat, Jul 09, 2011 at 12:29:01AM +0100, Peter Maydell wrote:
> On 8 July 2011 19:32, Nicolas Pitre  wrote:
> > On Fri, 8 Jul 2011, Dave Martin wrote:
> >> On Fri, Jul 08, 2011 at 12:21:27AM +0100, David Gilbert wrote:
> >> > Nicolas just added; Richard's argument is that if it was actually a
> >> > VDSO I'd just have linked against a symbol and if the symbol wasn't
> >> > there then I would have got a fairly normal linker error - I wouldn't
> >> > have needed any special code; and to be honest I think that would also
> >> > solve the problem that accessing that wacky address is also a pain for
> >> > things like qemu because we're going to have to intercept that read.
> >>
> >> Note that pre-existing code already calls into the vectors page.
> >
> > Right, qemu must be intercepting it already.  The kernel helpers located
> > at a fixed address in the vector page have been around for more than 6
> > years.
> 
> QEMU supports calls into the fixed vector page (it just special cases
> attempts to execute at addresses >= 0x and emits code to do
> "cause special purpose exception" so we get control back at runtime).
> What it doesn't support is attempts to do a load from the vector page,
> because so far nobody has had a need to probe the vector page for what
> it supports.

To solve that particular problem, can you just map a page in RAM with the
right content for __kernel_helper_version?

The value there can be a static feature of QEMU, indicating which kuser
helper functions QEMU knows how to emulate.

User processes shouldn't read anything else from the vectors page, at least
for now.

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-11 Thread Dave Martin

On Fri, Jul 08, 2011 at 11:36:30AM -0700, Richard Henderson wrote:
> On 07/08/2011 09:55 AM, Dave Martin wrote:
> > Talking to Will Deacon about this, it sounds like there may be little
> > appetite for VDSO-ifying the vectors page unless there's a real, concrete
> > benefit.
> > 
> > Making the libc startup's job slightly easier probably doesn't count
> > as such a benefit, but if it renders possible something which is
> > otherwise impossible to do from userspace than that would be more
> > compelling.
> 
> I don't consider it anything to do with "making libc's job easier",
> but of providing a standard mechanism for version control of the ABI.
> 
> All you're going to do by using a different mechanism is re-invent
> the wheel.
> 
> >> There's limited support for dlopen within statically linked 
> >> programs as well.  The userland side can provide a static
> >> interface which defers to the kernel implementation.
> > 
> > But you don't dlopen the VDSO, it's just mapped by the kernel
> > on process startup.  To find it, on other architectures something
> > in the C library needs to grab the base address passed by the
> > kernel in the aux vector.
> 
> Of course.  Did you think I didn't know that?  And so does the
> dynamic linker, and the dynamic linker stub in libc.a.  The point
> is to get a handle that you can pass into the other functions
> like dlsym.

I guess I misunderstood the point you were making when you referred
to being able to call dlopen from statically-linked programs...

> See also _dl_vdso_vsym in libc/sysdeps/unix/sysv/linux/dl-vdso.c,
> and the uses in libc/sysdeps/unix/sysv/linux/*/init-first.c.
> 
> > Can individual IFUNCs be set to resolve at startup?
> 
> No.  They're kinda sorta a special type of PLT entry.  So they're
> resolved either when called, or at startup.  See also LD_BIND_NOW
> for the entire program and DT_BIND_NOW (-Wl,-z,now) for individual
> libraries and executables.

The key problem that we have is not how to direct the 64-bit atomics
to the correct function, but rather what to do about it when there
is no suitable target function at all.

I really think we have to detect that on process startup -- detecting
that and destroying the program just when it first tries to do a
cmpxchg64 is unacceptable IMHO. (*)

I don't think that a constructor function solves this in a maintainable
way, because there is no metadata allowing us to determine when this
check should be done relative to the other constructors.  Simply
putting it first raises the question of what order to do things in if
we also have another contructor which much be called first.

IFUNC doesn't solve the problem because either it gets resolved
lazily (violating the above principle (*)), or we have to force _all_
symbols to resolve at startup, with may have a significant impact on
startup time for large programs.

We could modify the libc startup objects to do the check, but that
feels a bit unclean.

That brings us back to VDSO -- am I right in thinking that the symbol
versioning checks allow the program to be killed on startup of the
VDSO ABI is tool old?  If so, maybe that's the right approach after
all.

This doesn't replace the existing interface, and the few non-libc
based programs can still interact directly with the vectors page to
do the same check if they need to.

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-08 Thread Nicolas Pitre

On Fri, 8 Jul 2011, Richard Henderson wrote:

> I find this attitude to be short-sighted.  These exact same arguments
> were made about a.out vs elf, and all the horrible extra overhead
> that elf has with its plts and dynamic symbol resolution.

The fact that people came up with hacks such as prelinking must not be 
for nothing.

> I urge you to go ahead and create the VDSO anyway, even if you 
> continue to bypass it with new fixed addresses for new entry points.

I have no problem with that at all.  Keeping the fixed address ABI will 
at least provide an alternative to those who are always after the last 
bit of performance because they don't know better than calling get_tls() 
a couple thousand times per second within the same thread.

> The VDSO does more than simply hold the code for those functions,
> it also describes them with the symbol and symbol versioning tables,
> and the unwind info.  This is of enormous value to debugging and
> other introspection tools.

Absolutely.

Nicolas

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-08 Thread Richard Henderson

On 07/08/2011 12:33 PM, Nicolas Pitre wrote:
> I'm not sure I agree.  We're talking about extremely lightweight 
> functions here, in the order of a very few assembly instructions only.  
> Adding a significant overhead relative to their cost is not very 
> appealing.
...
> But my point is that they 
> were willing to do such hacks to completely avoid the call overhead, and 
> in such cases I don't see adding to it with a full blown VDSO as 
> something positive.

My last word on a subject I'm not currently being paid to care about:

I find this attitude to be short-sighted.  These exact same arguments
were made about a.out vs elf, and all the horrible extra overhead
that elf has with its plts and dynamic symbol resolution.  And yet,
thankfully, a.out has gone the way of the dodo.

I urge you to go ahead and create the VDSO anyway, even if you 
continue to bypass it with new fixed addresses for new entry points.
The VDSO does more than simply hold the code for those functions,
it also describes them with the symbol and symbol versioning tables,
and the unwind info.  This is of enormous value to debugging and
other introspection tools.

r~

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-08 Thread Richard Henderson

On 07/08/2011 09:55 AM, Dave Martin wrote:
> Talking to Will Deacon about this, it sounds like there may be little
> appetite for VDSO-ifying the vectors page unless there's a real, concrete
> benefit.
> 
> Making the libc startup's job slightly easier probably doesn't count
> as such a benefit, but if it renders possible something which is
> otherwise impossible to do from userspace than that would be more
> compelling.

I don't consider it anything to do with "making libc's job easier",
but of providing a standard mechanism for version control of the ABI.

All you're going to do by using a different mechanism is re-invent
the wheel.

>> There's limited support for dlopen within statically linked 
>> programs as well.  The userland side can provide a static
>> interface which defers to the kernel implementation.
> 
> But you don't dlopen the VDSO, it's just mapped by the kernel
> on process startup.  To find it, on other architectures something
> in the C library needs to grab the base address passed by the
> kernel in the aux vector.

Of course.  Did you think I didn't know that?  And so does the
dynamic linker, and the dynamic linker stub in libc.a.  The point
is to get a handle that you can pass into the other functions
like dlsym.

See also _dl_vdso_vsym in libc/sysdeps/unix/sysv/linux/dl-vdso.c,
and the uses in libc/sysdeps/unix/sysv/linux/*/init-first.c.

> Can individual IFUNCs be set to resolve at startup?

No.  They're kinda sorta a special type of PLT entry.  So they're
resolved either when called, or at startup.  See also LD_BIND_NOW
for the entire program and DT_BIND_NOW (-Wl,-z,now) for individual
libraries and executables.

r~

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-08 Thread Peter Maydell

On 8 July 2011 19:32, Nicolas Pitre  wrote:
> On Fri, 8 Jul 2011, Dave Martin wrote:
>> On Fri, Jul 08, 2011 at 12:21:27AM +0100, David Gilbert wrote:
>> > Nicolas just added; Richard's argument is that if it was actually a
>> > VDSO I'd just have linked against a symbol and if the symbol wasn't
>> > there then I would have got a fairly normal linker error - I wouldn't
>> > have needed any special code; and to be honest I think that would also
>> > solve the problem that accessing that wacky address is also a pain for
>> > things like qemu because we're going to have to intercept that read.
>>
>> Note that pre-existing code already calls into the vectors page.
>
> Right, qemu must be intercepting it already.  The kernel helpers located
> at a fixed address in the vector page have been around for more than 6
> years.

QEMU supports calls into the fixed vector page (it just special cases
attempts to execute at addresses >= 0x and emits code to do
"cause special purpose exception" so we get control back at runtime).
What it doesn't support is attempts to do a load from the vector page,
because so far nobody has had a need to probe the vector page for what
it supports.

-- PMM

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-08 Thread Nicolas Pitre

On Fri, 8 Jul 2011, Richard Henderson wrote:

> On 07/08/2011 01:23 AM, Richard Earnshaw wrote:
> > There is a slight performance hit to using a VDSO in that each entry
> > will need to go through the PLT rather than jumping directly to the
> > helper function in the kernel.
> 
> Yes.  But IMO the flexibility gained is worth it.

I'm not sure I agree.  We're talking about extremely lightweight 
functions here, in the order of a very few assembly instructions only.  
Adding a significant overhead relative to their cost is not very 
appealing.  For example, we have this code located at 0x0fe0 to 
retrieve the TLS value.  Here's the non-SMP implementation:

ldr r0, [pc, #(16 - 8)]
bx  lr

Of course the location relative to the pc where the TLS value is 
retrieved is implementation specific and not part of the ABI at all.  
Yet, some people found the call to this code too much overhead and 
started fetching the TLS value directly from memory themselves (*).  
Obviously their program would break if executed on a SMP system because 
then the TLS value is not stored in memory.  But my point is that they 
were willing to do such hacks to completely avoid the call overhead, and 
in such cases I don't see adding to it with a full blown VDSO as 
something positive.

(*) I even considered changing the location of the TLS value in that 
case to break those abusers and make it clear that this is not the 
proper interface.

Nicolas

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-08 Thread Nicolas Pitre

On Fri, 8 Jul 2011, Dave Martin wrote:

> On Fri, Jul 08, 2011 at 12:21:27AM +0100, David Gilbert wrote:
> > Nicolas just added; Richard's argument is that if it was actually a
> > VDSO I'd just have linked against a symbol and if the symbol wasn't
> > there then I would have got a fairly normal linker error - I wouldn't
> > have needed any special code; and to be honest I think that would also
> > solve the problem that accessing that wacky address is also a pain for
> > things like qemu because we're going to have to intercept that read.
> 
> Note that pre-existing code already calls into the vectors page.

Right, qemu must be intercepting it already.  The kernel helpers located 
at a fixed address in the vector page have been around for more than 6 
years.

> This doesn't necessarily mean that adding a VDSO interface is bad,
> but it does mean the existing interface can't easily be changed or
> removed.

Agreed.  There is nothing preventing the addition of a VDSO, but I've 
yet to see a convincing argument for spending time adding it.


Nicolas

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-08 Thread Dave Martin

On Fri, Jul 08, 2011 at 12:21:27AM +0100, David Gilbert wrote:
> On 5 July 2011 15:49, Dave Martin  wrote:
> > On Fri, Jul 1, 2011 at 6:10 PM, David Gilbert  
> > wrote:

[...]

> My patch is having to test an arbitrary address in the commpage in
> init code to see if it's new enough to have the new helper that

Minor niggle: the address is in no way arbitrary.  Although there's no
corresponding C symbol, it's a well-known absolute address.

> Nicolas just added; Richard's argument is that if it was actually a
> VDSO I'd just have linked against a symbol and if the symbol wasn't
> there then I would have got a fairly normal linker error - I wouldn't
> have needed any special code; and to be honest I think that would also
> solve the problem that accessing that wacky address is also a pain for
> things like qemu because we're going to have to intercept that read.

Note that pre-existing code already calls into the vectors page.
Introducing a VDSO interface will not automatically cause all software
to stop using the binary interface.

How much of a problem that is in practice, I don't know.

This doesn't necessarily mean that adding a VDSO interface is bad,
but it does mean the existing interface can't easily be changed or
removed.

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-08 Thread Dave Martin

On Fri, Jul 08, 2011 at 09:03:51AM -0700, Richard Henderson wrote:
> On 07/07/2011 04:21 PM, David Gilbert wrote:
> >> We could possibly wrap the vectors page to make it look like a DSO in
> >> a forwards-compatible way, but since this has not happened so far it
> >> feels like people never saw much benefit.
> > 
> > Any idea how big a job that is?
> 
> It's slightly tricky to get the memory layout correct, but ought
> not be *that* hard.  The x86_64 port also had legacy absolute
> addresses to deal with.

Talking to Will Deacon about this, it sounds like there may be little
appetite for VDSO-ifying the vectors page unless there's a real, concrete
benefit.

Making the libc startup's job slightly easier probably doesn't count
as such a benefit, but if it renders possible something which is
otherwise impossible to do from userspace than that would be more
compelling.

> >> Can you describe the benefits with regard to this case?  (And, out of
> >> interest, how do statically-linked programs make use of needed
> >> functionality in the VDSO?
> 
> There's limited support for dlopen within statically linked 
> programs as well.  The userland side can provide a static
> interface which defers to the kernel implementation.

But you don't dlopen the VDSO, it's just mapped by the kernel
on process startup.  To find it, on other architectures something
in the C library needs to grab the base address passed by the
kernel in the aux vector.

If calls into the VDSO are needed, the C library might have
to find the VDSO and get its symbols by special means in the case
of statically linked programs.  I'm not sure how this works
in practice for architectures using a VDSO.

> Ideally this would be done with an IFUNC relocation, but ARM
> binutils doesn't support that feature yet.  Thankfully it is
> ABI compatible to replace a normal function with an IFUNC at
> a later date.

If we can do this properly using IFUNC in the future, then it
might make sense to go for an interim solution for now, and tidy
it up when IFUNC is available.

Can individual IFUNCs be set to resolve at startup?

I'm still not sure how that would work for static executables...

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-08 Thread Richard Earnshaw

On 08/07/11 17:04, Richard Henderson wrote:
> On 07/08/2011 01:23 AM, Richard Earnshaw wrote:
>> There is a slight performance hit to using a VDSO in that each entry
>> will need to go through the PLT rather than jumping directly to the
>> helper function in the kernel.
> 
> Yes.  But IMO the flexibility gained is worth it.
> 
> 
> r~
> 

How do statically linked executables work?

R.


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-08 Thread Richard Henderson

On 07/08/2011 01:23 AM, Richard Earnshaw wrote:
> There is a slight performance hit to using a VDSO in that each entry
> will need to go through the PLT rather than jumping directly to the
> helper function in the kernel.

Yes.  But IMO the flexibility gained is worth it.


r~

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-08 Thread Richard Henderson

On 07/07/2011 04:21 PM, David Gilbert wrote:
>> We could possibly wrap the vectors page to make it look like a DSO in
>> a forwards-compatible way, but since this has not happened so far it
>> feels like people never saw much benefit.
> 
> Any idea how big a job that is?

It's slightly tricky to get the memory layout correct, but ought
not be *that* hard.  The x86_64 port also had legacy absolute
addresses to deal with.

>> Can you describe the benefits with regard to this case?  (And, out of
>> interest, how do statically-linked programs make use of needed
>> functionality in the VDSO?

There's limited support for dlopen within statically linked 
programs as well.  The userland side can provide a static
interface which defers to the kernel implementation.

Ideally this would be done with an IFUNC relocation, but ARM
binutils doesn't support that feature yet.  Thankfully it is
ABI compatible to replace a normal function with an IFUNC at
a later date.

r~

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-08 Thread Richard Earnshaw

On 08/07/11 00:21, David Gilbert wrote:
> On 5 July 2011 15:49, Dave Martin  wrote:
>> On Fri, Jul 1, 2011 at 6:10 PM, David Gilbert  
>> wrote:
>>> Hi All,
>>>  I've just submitted the patches for the 64 bit atomic stuff to the
>>> gcc-patches list.
>>> Richard Henderson has raised the question of why the ARM commpage isn't a 
>>> full
>>> VDSO and, if it was, then it would make the version number check a lot 
>>> simpler.
>>>
>>> What's the history behind this/how big a job is it/
>>
>> I think it's there partly for historical reasons, and partly because
>> there has to be a page mapped somewhere for the exception vectors
>> anyway, so we may as well put other snippets of required userspace
>> code there.
> 
> OK.
> 
>>> Dave (on holiday for a week but checking mail intermittently)
>>
>> Because existing userspace source and binaries already call directly
>> into the helper functions in the vectors page, we can't easily
>> remove/move it.
>>
>> We could possibly wrap the vectors page to make it look like a DSO in
>> a forwards-compatible way, but since this has not happened so far it
>> feels like people never saw much benefit.
> 
> Any idea how big a job that is?
> 
>> Can you describe the benefits with regard to this case?  (And, out of
>> interest, how do statically-linked programs make use of needed
>> functionality in the VDSO?  Browsing the web hasn't revealed any
>> coherent answers to that question for me...)
> 
> Richard's question to my patch is here:
> 
> http://old.nabble.com/-Patch-0-3--ARM-64-bit-atomic-operations-tp31974816p31974901.html
> 
> My patch is having to test an arbitrary address in the commpage in
> init code to see if it's new enough to have the new helper that
> Nicolas just added; Richard's argument is that if it was actually a
> VDSO I'd just have linked against a symbol and if the symbol wasn't
> there then I would have got a fairly normal linker error - I wouldn't
> have needed any special code; and to be honest I think that would also
> solve the problem that accessing that wacky address is also a pain for
> things like qemu because we're going to have to intercept that read.
> 
> (I've added Richard to the cc).

There is a slight performance hit to using a VDSO in that each entry
will need to go through the PLT rather than jumping directly to the
helper function in the kernel.

R.

R.



___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-07 Thread David Gilbert

On 5 July 2011 15:49, Dave Martin  wrote:
> On Fri, Jul 1, 2011 at 6:10 PM, David Gilbert  
> wrote:
>> Hi All,
>>  I've just submitted the patches for the 64 bit atomic stuff to the
>> gcc-patches list.
>> Richard Henderson has raised the question of why the ARM commpage isn't a 
>> full
>> VDSO and, if it was, then it would make the version number check a lot 
>> simpler.
>>
>> What's the history behind this/how big a job is it/
>
> I think it's there partly for historical reasons, and partly because
> there has to be a page mapped somewhere for the exception vectors
> anyway, so we may as well put other snippets of required userspace
> code there.

OK.

>> Dave (on holiday for a week but checking mail intermittently)
>
> Because existing userspace source and binaries already call directly
> into the helper functions in the vectors page, we can't easily
> remove/move it.
>
> We could possibly wrap the vectors page to make it look like a DSO in
> a forwards-compatible way, but since this has not happened so far it
> feels like people never saw much benefit.

Any idea how big a job that is?

> Can you describe the benefits with regard to this case?  (And, out of
> interest, how do statically-linked programs make use of needed
> functionality in the VDSO?  Browsing the web hasn't revealed any
> coherent answers to that question for me...)

Richard's question to my patch is here:

http://old.nabble.com/-Patch-0-3--ARM-64-bit-atomic-operations-tp31974816p31974901.html

My patch is having to test an arbitrary address in the commpage in
init code to see if it's new enough to have the new helper that
Nicolas just added; Richard's argument is that if it was actually a
VDSO I'd just have linked against a symbol and if the symbol wasn't
there then I would have got a fairly normal linker error - I wouldn't
have needed any special code; and to be honest I think that would also
solve the problem that accessing that wacky address is also a pain for
things like qemu because we're going to have to intercept that read.

(I've added Richard to the cc).

Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-05 Thread Dave Martin

On Fri, Jul 1, 2011 at 6:10 PM, David Gilbert  wrote:
> Hi All,
>  I've just submitted the patches for the 64 bit atomic stuff to the
> gcc-patches list.
> Richard Henderson has raised the question of why the ARM commpage isn't a full
> VDSO and, if it was, then it would make the version number check a lot 
> simpler.
>
> What's the history behind this/how big a job is it/

I think it's there partly for historical reasons, and partly because
there has to be a page mapped somewhere for the exception vectors
anyway, so we may as well put other snippets of required userspace
code there.

> Dave (on holiday for a week but checking mail intermittently)

Because existing userspace source and binaries already call directly
into the helper functions in the vectors page, we can't easily
remove/move it.

We could possibly wrap the vectors page to make it look like a DSO in
a forwards-compatible way, but since this has not happened so far it
feels like people never saw much benefit.

Can you describe the benefits with regard to this case?  (And, out of
interest, how do statically-linked programs make use of needed
functionality in the VDSO?  Browsing the web hasn't revealed any
coherent answers to that question for me...)

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-07-01 Thread David Gilbert

Hi All,
  I've just submitted the patches for the 64 bit atomic stuff to the
gcc-patches list.
Richard Henderson has raised the question of why the ARM commpage isn't a full
VDSO and, if it was, then it would make the version number check a lot simpler.

What's the history behind this/how big a job is it/


Dave (on holiday for a week but checking mail intermittently)

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-06-13 Thread David Gilbert

On 10 June 2011 20:38, Nicolas Pitre  wrote:
> On Fri, 10 Jun 2011, David Gilbert wrote:
>
>> On 25 May 2011 04:45, Nicolas Pitre  wrote:
>>
>> 
>>
>> > FWIW, here's what the kernel part might look like, i.e. for
>> > compatibility with pre ARMv6k systems (beware, only compile tested):
>>
>> 
>>
>> Hi Nicolas,
>>   I've just about got a set of gcc backend changes working for the inline 
>> case
>> and plan on attacking libgcc next week.
>>
>>   What are your intentions for that code that you sent in this message - do
>> you intend to finish it off and upstream it or were you wanting me to do that
>> as part of this task?
>
> I'll refine it and push it upstream.

OK, thanks.

>
>>   If you're doing it could you confirm the interface to work to.
>
> Here it goes:
>
>  * Reference prototype:
>  *
>  *     int __kernel_cmpxchgd64(const int64_t *oldval,
>  *                             const int64_t *newval,
>  *                             volatile int64_t *ptr);
>  *
>  * Input:
>  *
>  *     r0 = pointer to oldval
>  *     r1 = pointer to newval
>  *     r2 = pointer to target value
>  *     lr = return address
>  *
>  * Output:
>  *
>  *     r0 = returned value (zero or non-zero)
>  *     C flag = set if r0 == 0, clear if r0 != 0
>  *
>  * Clobbered:
>  *
>  *     r3, condition flags

OK



>  *    - This call is valid only if __kernel_helper_version >= 5.
>
> Of course, as discussed, this would be preferable if the interface in
> libgcc was in a separate object file so any reference to it would also
> bring in a global constructor from which the __kernel_helper_version
> value could be verified in order to prevent usage of this interface on
> kernels that lack it.

I was thinking of looking at ifunc for this;but I've not looked yet.

>> Also, do you think there should be a halfword and byte interface as well,
>> given that halfword and byte ldrex implementations aren't available on
>> older ARMs,
>> or expect user space to do a bit of mask fiddling with the 32bit ones?
>
> It is indeed better if user space deals with it using the 32-bit
> interface. I'd prefer adding new kernel helpers only when absolutely
> required.

Actually, I've found Julian Brown already implemented that in user space in 2008
(see gcc/config/arm/linux-atomic.c)

Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-06-10 Thread Nicolas Pitre

On Fri, 10 Jun 2011, David Gilbert wrote:

> On 25 May 2011 04:45, Nicolas Pitre  wrote:
> 
> 
> 
> > FWIW, here's what the kernel part might look like, i.e. for
> > compatibility with pre ARMv6k systems (beware, only compile tested):
> 
> 
> 
> Hi Nicolas,
>   I've just about got a set of gcc backend changes working for the inline case
> and plan on attacking libgcc next week.
> 
>   What are your intentions for that code that you sent in this message - do
> you intend to finish it off and upstream it or were you wanting me to do that
> as part of this task?

I'll refine it and push it upstream.

>   If you're doing it could you confirm the interface to work to.

Here it goes:

 * Reference prototype:
 *
 * int __kernel_cmpxchgd64(const int64_t *oldval,
 * const int64_t *newval,
 * volatile int64_t *ptr);
 *
 * Input:
 *
 * r0 = pointer to oldval
 * r1 = pointer to newval
 * r2 = pointer to target value
 * lr = return address
 *
 * Output:
 *
 * r0 = returned value (zero or non-zero)
 * C flag = set if r0 == 0, clear if r0 != 0
 *
 * Clobbered:
 *
 * r3, condition flags
 *
 * Definition and user space usage example:
 *
 * typedef int (__kernel_cmpxchg64_t)(const int64_t *oldval,
 *const int64_t *newval,
 *volatile int64_t *ptr);
 * #define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *)0x0f60)
 *
 * Atomically store *newval in *ptr if *ptr is equal to *oldval for user space.
 * Return zero if *ptr was changed or non-zero if no exchange happened.
 * The C flag is also set if *ptr was changed to allow for assembly
 * optimization in the calling code.
 *
 * Notes:
 *
 *- This routine already includes memory barriers as needed.
 *
 *- Due to the length of some sequences, this spans 2 regular kuser
 *  "slots" so 0x0f80 is not used as a valid entry point.
 *
 *- This call is valid only if __kernel_helper_version >= 5.

Of course, as discussed, this would be preferable if the interface in 
libgcc was in a separate object file so any reference to it would also 
bring in a global constructor from which the __kernel_helper_version 
value could be verified in order to prevent usage of this interface on 
kernels that lack it.

> Also, do you think there should be a halfword and byte interface as well,
> given that halfword and byte ldrex implementations aren't available on
> older ARMs,
> or expect user space to do a bit of mask fiddling with the 32bit ones?

It is indeed better if user space deals with it using the 32-bit 
interface. I'd prefer adding new kernel helpers only when absolutely 
required.


Nicolas

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-06-10 Thread David Gilbert

On 25 May 2011 04:45, Nicolas Pitre  wrote:

> FWIW, here's what the kernel part might look like, i.e. for
> compatibility with pre ARMv6k systems (beware, only compile tested):

Hi Nicolas,
  I've just about got a set of gcc backend changes working for the inline case
and plan on attacking libgcc next week.

  What are your intentions for that code that you sent in this message - do
you intend to finish it off and upstream it or were you wanting me to do that
as part of this task?

  If you're doing it could you confirm the interface to work to.
Also, do you think there should be a halfword and byte interface as well,
given that halfword and byte ldrex implementations aren't available on
older ARMs,
or expect user space to do a bit of mask fiddling with the 32bit ones?

Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-06-01 Thread Richard Earnshaw

On Tue, 2011-05-31 at 16:03 +0100, David Gilbert wrote:
> On 31 May 2011 15:35, Richard Earnshaw  wrote:
> > I think the difficulty here is that glibc expects either the compiler,
> > or libgcc to provide the sync primitives; and while GCC can tie the
> > inlined copy of the primitive to use of CPUs with the relevant
> > instruction, the libgcc version doesn't know how to specify that the
> > code it's relying on requires a minimal kernel version...
> >
> > It could throw the dependency back on glibc, but then you've got an
> > expensive operation (the libgcc copies are normally implemented as
> > private, per-library, helpers to avoid a PLT call overhead).
> 
> You say
>  'the libgcc version doesn't know how to specify that the
>  code it's relying on requires a minimal kernel version...'
> 
> but why isn't that code just in libgcc; if the libgcc helper gets
> called and the kernel interface is too old it throws an error.

You don't want to repeat the test every time the sync function is
called.  That's excessive.  Further, you want to know when the program
starts if the kernel isn't current enough, not a long way into
execution.

It might be possible to construct an object file that contained the
support, but which also contained a 'constructor' that was run at init
time.  The 'constructor' would run the kernel version check and
terminate the program if it wasn't sufficiently up-to-date.  That still
begs the question of how this should be done.  Not very useful printing
a message if the program is based on a GUI.  The constructor would have
to be run early to avoid the case of another (user or library)
constructor needing the sync primitive.

> eglibc always uses the compilers sync primitives

Which the compiler will either inline, or fall back to the libgcc
versions.
> 
> (I've not checked if all of eglibc's atomic's match gcc's or if it
> needs any more).
> 

I presume that (e)glibc doesn't use 64-bit sync or we would have added
it before now...

R.

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-06-01 Thread Dave Martin

On Tue, May 31, 2011 at 12:53:35PM -0400, Nicolas Pitre wrote:
> On Tue, 31 May 2011, Dave Martin wrote:
> 
> > On Tue, May 31, 2011 at 03:35:42PM +0100, Richard Earnshaw wrote:
> > > I think the difficulty here is that glibc expects either the compiler,
> > > or libgcc to provide the sync primitives; and while GCC can tie the
> > > inlined copy of the primitive to use of CPUs with the relevant
> > > instruction, the libgcc version doesn't know how to specify that the
> > > code it's relying on requires a minimal kernel version...
> > 
> > The libgcc 64-bit atomic helpers could do a runtime check on
> > the __kernel_helper_version field in the vectors page before calling
> > the 64-bit cmpxchg helper.  This will allow the absence of the helper
> > to be reliably detected on older kernels.  Because this is data, it
> > might cause an extra D-TLB miss to accompany any other miss associated
> > with calling the kernel helper (if it exists).
> 
> Isn't there a way to pull in a global constructor or similar whenever a 
> reference to the 64-bit cmpxchg helper is made, so that the constructor 
> code could simply validate the kuser version number and bail out 
> otherwise?

Hmm, that should be possible.

I don't know how this interacts with other constructors: really this
would need to run at a defined point early program startup.  If it just
ends up somewhere in the middle of the list of global constructurs, we'd
have a problem if some of the other constructurs try to use the 64-bit
atomics.

Richard, do you know how this could work?

> I know this is certainly the case on x86 that if you statically compile 
> a binary on a modern distro in the hope of being able to execute that 
> binary anywhere, you may get a "kernel too old" error message when 
> trying to run it on older distros.  The same thing could be done for the 
> ARM 64-bit cmpxchg helper as well, at startup rather than on every 
> invokation, no?

As a principle, it feels like it ought to work.

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-31 Thread Nicolas Pitre

On Tue, 31 May 2011, Dave Martin wrote:

> On Tue, May 31, 2011 at 03:35:42PM +0100, Richard Earnshaw wrote:
> > I think the difficulty here is that glibc expects either the compiler,
> > or libgcc to provide the sync primitives; and while GCC can tie the
> > inlined copy of the primitive to use of CPUs with the relevant
> > instruction, the libgcc version doesn't know how to specify that the
> > code it's relying on requires a minimal kernel version...
> 
> The libgcc 64-bit atomic helpers could do a runtime check on
> the __kernel_helper_version field in the vectors page before calling
> the 64-bit cmpxchg helper.  This will allow the absence of the helper
> to be reliably detected on older kernels.  Because this is data, it
> might cause an extra D-TLB miss to accompany any other miss associated
> with calling the kernel helper (if it exists).

Isn't there a way to pull in a global constructor or similar whenever a 
reference to the 64-bit cmpxchg helper is made, so that the constructor 
code could simply validate the kuser version number and bail out 
otherwise?

I know this is certainly the case on x86 that if you statically compile 
a binary on a modern distro in the hope of being able to execute that 
binary anywhere, you may get a "kernel too old" error message when 
trying to run it on older distros.  The same thing could be done for the 
ARM 64-bit cmpxchg helper as well, at startup rather than on every 
invokation, no?

Nicolas

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-31 Thread David Gilbert

On 31 May 2011 15:35, Richard Earnshaw  wrote:
> I think the difficulty here is that glibc expects either the compiler,
> or libgcc to provide the sync primitives; and while GCC can tie the
> inlined copy of the primitive to use of CPUs with the relevant
> instruction, the libgcc version doesn't know how to specify that the
> code it's relying on requires a minimal kernel version...
>
> It could throw the dependency back on glibc, but then you've got an
> expensive operation (the libgcc copies are normally implemented as
> private, per-library, helpers to avoid a PLT call overhead).

You say
 'the libgcc version doesn't know how to specify that the
 code it's relying on requires a minimal kernel version...'

but why isn't that code just in libgcc; if the libgcc helper gets
called and the kernel interface is too old it throws an error.

If the compiler knows it's on 6k or above it inlines else it calls libgcc
If libgcc was built with a 6k compiler it inlines (in case something
explicitly calls the
library routine)
If libgcc was built with a <6k compiler it checks to see if it finds
itself on a new enough kernel  else it errors.

eglibc always uses the compilers sync primitives

(I've not checked if all of eglibc's atomic's match gcc's or if it
needs any more).

Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-31 Thread Dave Martin

On Tue, May 31, 2011 at 03:35:42PM +0100, Richard Earnshaw wrote:
> 
> On Tue, 2011-05-31 at 13:17 +0100, Dave Martin wrote:
> > On Mon, May 30, 2011 at 09:38:25AM +0200, Ken Werner wrote:
> > > On 05/25/2011 03:17 PM, Dave Martin wrote:
> > > >On Wed, May 25, 2011 at 12:58:30PM +0100, David Gilbert wrote:
> > > >>On 25 May 2011 04:45, Nicolas Pitre  wrote:
> > > >>>FWIW, here's what the kernel part might look like, i.e. for
> > > >>>compatibility with pre ARMv6k systems (beware, only compile tested):
> > > >>
> > > >>OK, so that makes a eglibc part for that pretty easy.
> > > >>For things like fetch_and_add (which I can see membase needs)
> > > >>would you expect implementation using this cmpxchg so it has a fall
> > > >>back or just to use ldrexd directly which I assume would be somewhat
> > > >>more efficient.
> > > >>
> > > >>(Question holds for both eglibc and gcc's __sync_*)
> > > >
> > > >It depends on the baseline architecture for the build.
> > > >
> > > >An eglibc built for ARMv6 and above would need to call the helper by
> > > >default, though it could also use ldrexd/strexd if it determines at run-
> > > >time that this is supported by the CPU.
> > > >
> > > >Similarly, if GCC is building for -march=marmv7-a it can inline the
> > > >atomics directly using ldrex/strex and friends, but for -march=armv6 it
> > > >will need to call helpers via libgcc.
> > > 
> > > I would have thought that the libc does not decide this directly but
> > > just calls the GCC __sync_* routines (if build with a GCC that
> > > supports them). Then the GCC decides whether to inline them using
> > > ldrexd/strexd (ARMv6+) or emit calls to libgcc which calls the
> > > kernel helpers.
> > 
> > You're right; it looks like eglibc uses the GCC __sync_*() functions if
> > they exist.  So, it would be natural to follow this model for 64-bit
> > atomics too.
> 
> I think the difficulty here is that glibc expects either the compiler,
> or libgcc to provide the sync primitives; and while GCC can tie the
> inlined copy of the primitive to use of CPUs with the relevant
> instruction, the libgcc version doesn't know how to specify that the
> code it's relying on requires a minimal kernel version...

The libgcc 64-bit atomic helpers could do a runtime check on
the __kernel_helper_version field in the vectors page before calling
the 64-bit cmpxchg helper.  This will allow the absence of the helper
to be reliably detected on older kernels.  Because this is data, it
might cause an extra D-TLB miss to accompany any other miss associated
with calling the kernel helper (if it exists).

It's an overhead, so this may not be very desirable; however,
the overhead will normally not apply if the platform is built for armv7+,
since I believe in that case the atomics will usually get inlined --
is that the case?

If the runtime check finds the kernel helper isn't there, there's
a question of what to do.  If libgcc was not built for v7, the safest
approach is probably to spit out a diasnotsic message and call abort()
or similar, since there's no guarantee of doing a 64-bit atomic at all
in such situations.

These issues only apply to the 64-bit atomics.  For the other atomics,
we have a de facto "assume the kernel is new enough" policy, which
seems OK in practice due to the fact that the GCC atomics support is
rather newer than the kernel helpers themselves in any case.

Cheers
---Dave

> It could throw the dependency back on glibc, but then you've got an
> expensive operation (the libgcc copies are normally implemented as
> private, per-library, helpers to avoid a PLT call overhead).
> 
> I'm not sure what the best solution is here.
> 
> R.
> 
> 

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-31 Thread Richard Earnshaw


On Tue, 2011-05-31 at 13:17 +0100, Dave Martin wrote:
> On Mon, May 30, 2011 at 09:38:25AM +0200, Ken Werner wrote:
> > On 05/25/2011 03:17 PM, Dave Martin wrote:
> > >On Wed, May 25, 2011 at 12:58:30PM +0100, David Gilbert wrote:
> > >>On 25 May 2011 04:45, Nicolas Pitre  wrote:
> > >>>FWIW, here's what the kernel part might look like, i.e. for
> > >>>compatibility with pre ARMv6k systems (beware, only compile tested):
> > >>
> > >>OK, so that makes a eglibc part for that pretty easy.
> > >>For things like fetch_and_add (which I can see membase needs)
> > >>would you expect implementation using this cmpxchg so it has a fall
> > >>back or just to use ldrexd directly which I assume would be somewhat
> > >>more efficient.
> > >>
> > >>(Question holds for both eglibc and gcc's __sync_*)
> > >
> > >It depends on the baseline architecture for the build.
> > >
> > >An eglibc built for ARMv6 and above would need to call the helper by
> > >default, though it could also use ldrexd/strexd if it determines at run-
> > >time that this is supported by the CPU.
> > >
> > >Similarly, if GCC is building for -march=marmv7-a it can inline the
> > >atomics directly using ldrex/strex and friends, but for -march=armv6 it
> > >will need to call helpers via libgcc.
> > 
> > I would have thought that the libc does not decide this directly but
> > just calls the GCC __sync_* routines (if build with a GCC that
> > supports them). Then the GCC decides whether to inline them using
> > ldrexd/strexd (ARMv6+) or emit calls to libgcc which calls the
> > kernel helpers.
> 
> You're right; it looks like eglibc uses the GCC __sync_*() functions if
> they exist.  So, it would be natural to follow this model for 64-bit
> atomics too.

I think the difficulty here is that glibc expects either the compiler,
or libgcc to provide the sync primitives; and while GCC can tie the
inlined copy of the primitive to use of CPUs with the relevant
instruction, the libgcc version doesn't know how to specify that the
code it's relying on requires a minimal kernel version...

It could throw the dependency back on glibc, but then you've got an
expensive operation (the libgcc copies are normally implemented as
private, per-library, helpers to avoid a PLT call overhead).

I'm not sure what the best solution is here.

R.



___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-31 Thread Dave Martin

On Mon, May 30, 2011 at 09:38:25AM +0200, Ken Werner wrote:
> On 05/25/2011 03:17 PM, Dave Martin wrote:
> >On Wed, May 25, 2011 at 12:58:30PM +0100, David Gilbert wrote:
> >>On 25 May 2011 04:45, Nicolas Pitre  wrote:
> >>>FWIW, here's what the kernel part might look like, i.e. for
> >>>compatibility with pre ARMv6k systems (beware, only compile tested):
> >>
> >>OK, so that makes a eglibc part for that pretty easy.
> >>For things like fetch_and_add (which I can see membase needs)
> >>would you expect implementation using this cmpxchg so it has a fall
> >>back or just to use ldrexd directly which I assume would be somewhat
> >>more efficient.
> >>
> >>(Question holds for both eglibc and gcc's __sync_*)
> >
> >It depends on the baseline architecture for the build.
> >
> >An eglibc built for ARMv6 and above would need to call the helper by
> >default, though it could also use ldrexd/strexd if it determines at run-
> >time that this is supported by the CPU.
> >
> >Similarly, if GCC is building for -march=marmv7-a it can inline the
> >atomics directly using ldrex/strex and friends, but for -march=armv6 it
> >will need to call helpers via libgcc.
> 
> I would have thought that the libc does not decide this directly but
> just calls the GCC __sync_* routines (if build with a GCC that
> supports them). Then the GCC decides whether to inline them using
> ldrexd/strexd (ARMv6+) or emit calls to libgcc which calls the
> kernel helpers.

You're right; it looks like eglibc uses the GCC __sync_*() functions if
they exist.  So, it would be natural to follow this model for 64-bit
atomics too.

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-30 Thread Ken Werner


On 05/25/2011 03:17 PM, Dave Martin wrote:

On Wed, May 25, 2011 at 12:58:30PM +0100, David Gilbert wrote:

On 25 May 2011 04:45, Nicolas Pitre  wrote:

FWIW, here's what the kernel part might look like, i.e. for
compatibility with pre ARMv6k systems (beware, only compile tested):


OK, so that makes a eglibc part for that pretty easy.
For things like fetch_and_add (which I can see membase needs)
would you expect implementation using this cmpxchg so it has a fall
back or just to use ldrexd directly which I assume would be somewhat
more efficient.

(Question holds for both eglibc and gcc's __sync_*)


It depends on the baseline architecture for the build.

An eglibc built for ARMv6 and above would need to call the helper by
default, though it could also use ldrexd/strexd if it determines at run-
time that this is supported by the CPU.

Similarly, if GCC is building for -march=marmv7-a it can inline the
atomics directly using ldrex/strex and friends, but for -march=armv6 it
will need to call helpers via libgcc.


I would have thought that the libc does not decide this directly but 
just calls the GCC __sync_* routines (if build with a GCC that supports 
them). Then the GCC decides whether to inline them using ldrexd/strexd 
(ARMv6+) or emit calls to libgcc which calls the kernel helpers.


Regards
Ken

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-26 Thread Dave Martin

On Thu, May 26, 2011 at 09:29:51AM -0400, Nicolas Pitre wrote:
> On Thu, 26 May 2011, Dave Martin wrote:
> 
> > > The core logic spans 5 instructions.  Only 3 of them are actually the 
> > > same in both cases i.e. those with no references to 1b or 2b, and 
> > > they're perfectly interlaced in between the others.  Trying to make this 
> > > into common runtime code would result in even bigger code I'm afraid.  
> > > This could be made into common source code via a macro though.
> > 
> > Fair enough -- a macro might be worth a try _if_ it simplifies things
> > in the source, but I think you're right that merging the actual code
> > probably isn't worth it just to save a few words in the vectors page
> > (which eats up 4K regardless of what we put in it) and a few cycles per
> > fault (which already costs many, many cycles).
> 
> In the normal cases, there is no additional cycles per fault as the 
> inline check remains unchanged, and it goes like this:
> 
> @ Make sure our user space atomic helper is restarted
> @ if it was interrupted in a critical region.  Here we
> @ perform a quick test inline since it should be false
> @ 99.% of the time.  The rest is done out of line.
> cmp r2, #TASK_SIZE
> blhskuser_cmpxchg_fixup
> 
> In most cases the branch is not taken.

True!

---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-26 Thread Nicolas Pitre

On Thu, 26 May 2011, Dave Martin wrote:

> > The core logic spans 5 instructions.  Only 3 of them are actually the 
> > same in both cases i.e. those with no references to 1b or 2b, and 
> > they're perfectly interlaced in between the others.  Trying to make this 
> > into common runtime code would result in even bigger code I'm afraid.  
> > This could be made into common source code via a macro though.
> 
> Fair enough -- a macro might be worth a try _if_ it simplifies things
> in the source, but I think you're right that merging the actual code
> probably isn't worth it just to save a few words in the vectors page
> (which eats up 4K regardless of what we put in it) and a few cycles per
> fault (which already costs many, many cycles).

In the normal cases, there is no additional cycles per fault as the 
inline check remains unchanged, and it goes like this:

@ Make sure our user space atomic helper is restarted
@ if it was interrupted in a critical region.  Here we
@ perform a quick test inline since it should be false
@ 99.% of the time.  The rest is done out of line.
cmp r2, #TASK_SIZE
blhskuser_cmpxchg_fixup

In most cases the branch is not taken.

Nicolas

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-26 Thread Dave Martin

On Wed, May 25, 2011 at 10:21:43AM -0400, Nicolas Pitre wrote:
> On Wed, 25 May 2011, Dave Martin wrote:
> 
> > On Tue, May 24, 2011 at 11:45:04PM -0400, Nicolas Pitre wrote:
> > > + *   typedef int (__kernel_cmpxchg64_t)(const int64_t *oldval,
> > > + *  const int64_t *newval,
> > > + *  volatile int64_t *ptr);
> > > + *   #define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *)0x0f60)
> > > + *
> > > + * Atomically store newval in *ptr if *ptr is equal to oldval for user 
> > > space.
> > > + * Return zero if *ptr was changed or non-zero if no exchange happened.
> > > + * The C flag is also set if *ptr was changed to allow for assembly
> > > + * optimization in the calling code.
> > 
> >  * Do not attempt to call this function unless __kernel_helper_version >= 5.
> >  *
> 
> Yep.  I will queue your other patch prior to this one and make this 
> blurb consistent with the rest.

Oh, OK.  I hadn't feedback on that yet, so I wasn't sure whether anyone
was picking it up.  I will repost to alkml, but I was waiting for the
merge window to close since this was just a tidyup rather than something
urgent.

This warning is more important for the new helper than for the existing
ones (where really we could omit it without much risk).

> 
> > > +kuser_cmpxchg64_fixup:
> > > + @ Called from kuser_cmpxchg_fixup.
> > > + @ r2 = address of interrupted insn (must be preserved).
> > > + @ sp = saved regs. r7 and r8 are clobbered.
> > > + @ 1b = first critical insn, 2b = last critical insn.
> > > + @ If r2 >= 1b and r2 <= 2b then saved pc_usr is set to 1b.
> > > + mov r7, #0x0fff
> > > + sub r7, r7, #(0x0fff - (0x0f60 + (1b - __kuser_cmpxchg64)))
> > > + subsr8, r2, r7
> > > + rsbcss  r8, r8, #(2b - 1b)
> > > + strcs   r7, [sp, #S_PC]
> > > +#if __LINUX_ARM_ARCH__ < 6
> > > + b   kuser_cmpxchg32_fixup
> > 
> > Can we just have movcs pc,lr here, and put kuser_cmpxchg32_fixup
> > immediately after?
> > 
> > This would allow us to skip the branch, and the initial "mov r7" in
> > the kuser_cmpxchg32_fixup code.
> 
> The 'mov r7' is still needed unless the second instruction uses another 
> target register.

What I meant is that at the end of the above sequence, r7 = 1b.
So we can just fall through in to the 32-bit fixup code and add the
appropriate small offset to r7 so that it now points to the corresponding
1b label for __kuser_cmpxchg32, without needing to re-derive the address
using mov+sub.

It's a pretty minor tweak though.

> I thought about the possibility of "movcs pc, lr", especially since the 
> .text segments are simply concatenated and therefore the branch is 
> effectively branching to the very next instruction.  So there could be 
> like a common preamble, then a list of concatenated fixup segments (only 
> two of them now) and finally a postamble which would simply be "mov pc, lr".  
> This would all be put contigous at link time.  However I'm not sure yet 
> if this is worth optimizing that far since this code is far from being 
> hot, and clarity would also be affected.

Also, probably the number of fixups is never going to grow very large.

> 
> > There's a fair amount of duplicated logic from the 32-bit case.
> > Is it worth trying to merge the two?
> 
> The core logic spans 5 instructions.  Only 3 of them are actually the 
> same in both cases i.e. those with no references to 1b or 2b, and 
> they're perfectly interlaced in between the others.  Trying to make this 
> into common runtime code would result in even bigger code I'm afraid.  
> This could be made into common source code via a macro though.

Fair enough -- a macro might be worth a try _if_ it simplifies things
in the source, but I think you're right that merging the actual code
probably isn't worth it just to save a few words in the vectors page
(which eats up 4K regardless of what we put in it) and a few cycles per
fault (which already costs many, many cycles).

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-25 Thread Nicolas Pitre

On Wed, 25 May 2011, Dave Martin wrote:

> On Tue, May 24, 2011 at 11:45:04PM -0400, Nicolas Pitre wrote:
> > + * typedef int (__kernel_cmpxchg64_t)(const int64_t *oldval,
> > + *const int64_t *newval,
> > + *volatile int64_t *ptr);
> > + * #define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *)0x0f60)
> > + *
> > + * Atomically store newval in *ptr if *ptr is equal to oldval for user 
> > space.
> > + * Return zero if *ptr was changed or non-zero if no exchange happened.
> > + * The C flag is also set if *ptr was changed to allow for assembly
> > + * optimization in the calling code.
> 
>  * Do not attempt to call this function unless __kernel_helper_version >= 5.
>  *

Yep.  I will queue your other patch prior to this one and make this 
blurb consistent with the rest.

> > +kuser_cmpxchg64_fixup:
> > +   @ Called from kuser_cmpxchg_fixup.
> > +   @ r2 = address of interrupted insn (must be preserved).
> > +   @ sp = saved regs. r7 and r8 are clobbered.
> > +   @ 1b = first critical insn, 2b = last critical insn.
> > +   @ If r2 >= 1b and r2 <= 2b then saved pc_usr is set to 1b.
> > +   mov r7, #0x0fff
> > +   sub r7, r7, #(0x0fff - (0x0f60 + (1b - __kuser_cmpxchg64)))
> > +   subsr8, r2, r7
> > +   rsbcss  r8, r8, #(2b - 1b)
> > +   strcs   r7, [sp, #S_PC]
> > +#if __LINUX_ARM_ARCH__ < 6
> > +   b   kuser_cmpxchg32_fixup
> 
> Can we just have movcs pc,lr here, and put kuser_cmpxchg32_fixup
> immediately after?
> 
> This would allow us to skip the branch, and the initial "mov r7" in
> the kuser_cmpxchg32_fixup code.

The 'mov r7' is still needed unless the second instruction uses another 
target register.

I thought about the possibility of "movcs pc, lr", especially since the 
.text segments are simply concatenated and therefore the branch is 
effectively branching to the very next instruction.  So there could be 
like a common preamble, then a list of concatenated fixup segments (only 
two of them now) and finally a postamble which would simply be "mov pc, lr".  
This would all be put contigous at link time.  However I'm not sure yet 
if this is worth optimizing that far since this code is far from being 
hot, and clarity would also be affected.

> There's a fair amount of duplicated logic from the 32-bit case.
> Is it worth trying to merge the two?

The core logic spans 5 instructions.  Only 3 of them are actually the 
same in both cases i.e. those with no references to 1b or 2b, and 
they're perfectly interlaced in between the others.  Trying to make this 
into common runtime code would result in even bigger code I'm afraid.  
This could be made into common source code via a macro though.

Thanks for the review.

Nicolas

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-25 Thread Dave Martin

On Wed, May 25, 2011 at 12:58:30PM +0100, David Gilbert wrote:
> On 25 May 2011 04:45, Nicolas Pitre  wrote:
> > FWIW, here's what the kernel part might look like, i.e. for
> > compatibility with pre ARMv6k systems (beware, only compile tested):
> 
> OK, so that makes a eglibc part for that pretty easy.
> For things like fetch_and_add (which I can see membase needs)
> would you expect implementation using this cmpxchg so it has a fall
> back or just to use ldrexd directly which I assume would be somewhat
> more efficient.
> 
> (Question holds for both eglibc and gcc's __sync_*)

It depends on the baseline architecture for the build.

An eglibc built for ARMv6 and above would need to call the helper by
default, though it could also use ldrexd/strexd if it determines at run-
time that this is supported by the CPU.

Similarly, if GCC is building for -march=marmv7-a it can inline the
atomics directly using ldrex/strex and friends, but for -march=armv6 it
will need to call helpers via libgcc.

> 
> 
> > + * Notes:
> > + *
> > + *    - This routine already includes memory barriers as needed.
> 
> Hmm I wonder whether something like the atomic add primitives from user
> space need that or not; it depends whether people are using them to build

The GCC __sync_* primitives must mostly be full barriers.  This is what
the Itanium ABI specifies (this is the spec GCC follows for these).

The kernel user helper itself could omit the barriers, but this would
deviate from the existing 32-bit implementation, and also slow down the
(common) case where at least one of the barriers really is needed.

_just_ updating a counter doesn't need the barriers.  But if it's important
that a counter can be atomically updated by multiple threads, the
synchronisation of that update against other data structures usually turns
out to be important too...

> larger data structures or just trying to keep a consistent count somewhere.

GCC's atomic primitives don't address the full/partial barrier distinction.
Some other atomics APIs, like Qt's for example, do express different flavours
of barrier behavour and so in principle can be more optimal for cases where
it makes a difference.

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-25 Thread David Gilbert

On 25 May 2011 04:45, Nicolas Pitre  wrote:
> FWIW, here's what the kernel part might look like, i.e. for
> compatibility with pre ARMv6k systems (beware, only compile tested):

OK, so that makes a eglibc part for that pretty easy.
For things like fetch_and_add (which I can see membase needs)
would you expect implementation using this cmpxchg so it has a fall
back or just to use ldrexd directly which I assume would be somewhat
more efficient.

(Question holds for both eglibc and gcc's __sync_*)

> + * Notes:
> + *
> + *    - This routine already includes memory barriers as needed.

Hmm I wonder whether something like the atomic add primitives from user
space need that or not; it depends whether people are using them to build
larger data structures or just trying to keep a consistent count somewhere.

Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-25 Thread Dave Martin

On Tue, May 24, 2011 at 11:45:04PM -0400, Nicolas Pitre wrote:
> On Tue, 24 May 2011, Michael Hope wrote:
> 
> > On Tue, May 24, 2011 at 10:33 AM, Michael Casadevall
> >  wrote:
> > > On 05/19/2011 10:56 AM, David Gilbert wrote:
> > >> On 19 May 2011 16:49, Ken Werner  wrote:
> > >>> On 05/19/2011 12:40 PM, David Rusling wrote:
> > 
> >  Is this going to end up in a blueprint?   This is the last loose end of
> >  SMP / atomic memory operations work and I'd like to see it happen
> > >>>
> > >>> Hi,
> > >>>
> > >>> Yep, there is one (kind of a skeleton) in place at:
> > >>> https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives
> > >>
> > >> Which I'll be filling out in the next few days.
> > >>
> > >> Dave
> > >
> > > Is there a timeline for this feature? It's been requested by members of
> > > the ARM Server Club to have this implemented, and its important that
> > > this makes it into the Ubuntu 11.10 release.
> > > Michael
> > 
> > Hi Michael.  The topics for this planning cycle are listed here:
> >  https://wiki.linaro.org/Cycles//TechnicalTopics/Toolchain
> > 
> > 64 bit sync primitives are medium priority so they will be achieved in
> > the next six months.
> > 
> > A draft of what is in whos queue is at:
> >  https://wiki.linaro.org/Cycles//TechnicalTopics/Toolchain/Planning
> > 
> > The primitives are second in Dave's queue so should be started in the
> > next three months.
> 
> FWIW, here's what the kernel part might look like, i.e. for 
> compatibility with pre ARMv6k systems (beware, only compile tested):
> 
> diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
> index e8d8856..53830a7 100644
> --- a/arch/arm/kernel/entry-armv.S
> +++ b/arch/arm/kernel/entry-armv.S
> @@ -383,7 +383,7 @@ ENDPROC(__pabt_svc)
>   .endm
>  
>   .macro  kuser_cmpxchg_check
> -#if __LINUX_ARM_ARCH__ < 6 && !defined(CONFIG_NEEDS_SYSCALL_FOR_CMPXCHG)
> +#if !defined(CONFIG_CPU_32v6K) && !defined(CONFIG_NEEDS_SYSCALL_FOR_CMPXCHG)
>  #ifndef CONFIG_MMU
>  #warning "NPTL on non MMU needs fixing"
>  #else
> @@ -392,7 +392,7 @@ ENDPROC(__pabt_svc)
>   @ perform a quick test inline since it should be false
>   @ 99.% of the time.  The rest is done out of line.
>   cmp r2, #TASK_SIZE
> - blhskuser_cmpxchg_fixup
> + blhskuser_cmpxchg64_fixup
>  #endif
>  #endif
>   .endm
> @@ -797,6 +797,139 @@ __kuser_helper_start:
>  /*
>   * Reference prototype:
>   *
> + *   int __kernel_cmpxchgd64(int64_t *oldval, int64_t *newval, int64_t *ptr)
> + *
> + * Input:
> + *
> + *   r0 = pointer to oldval
> + *   r1 = pointer to newval
> + *   r2 = pointer to target value
> + *   lr = return address
> + *
> + * Output:
> + *
> + *   r0 = returned value (zero or non-zero)
> + *   C flag = set if r0 == 0, clear if r0 != 0
> + *
> + * Clobbered:
> + *
> + *   r3, flags
> + *
> + * Definition and user space usage example:
> + *
> + *   typedef int (__kernel_cmpxchg64_t)(const int64_t *oldval,
> + *  const int64_t *newval,
> + *  volatile int64_t *ptr);
> + *   #define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *)0x0f60)
> + *
> + * Atomically store newval in *ptr if *ptr is equal to oldval for user space.
> + * Return zero if *ptr was changed or non-zero if no exchange happened.
> + * The C flag is also set if *ptr was changed to allow for assembly
> + * optimization in the calling code.

 * Do not attempt to call this function unless __kernel_helper_version >= 5.
 *

> + *
> + * Notes:
> + *
> + *- This routine already includes memory barriers as needed.
> + *
> + *- Due to the length of some sequences, this spans 2 regular kuser
> + *  "slots" so 0x0f80 is not used as a valid entry point.
> + */
> +
> +__kuser_cmpxchg64:   @ 0x0f60
> +
> +#if defined(CONFIG_NEEDS_SYSCALL_FOR_CMPXCHG)
> +
> + /*
> +  * Poor you.  No fast solution possible...
> +  * The kernel itself must perform the operation.
> +  * A special ghost syscall is used for that (see traps.c).
> +  */
> + stmfd   sp!, {r7, lr}
> + ldr r7, 1f  @ it's 20 bits
> + swi __ARM_NR_cmpxchg64
> + ldmfd   sp!, {r7, pc}
> +1:   .word   __ARM_NR_cmpxchg64
> +
> +#elif defined(CONFIG_CPU_32v6K)
> +
> + stmfd   sp!, {r4, r5, r6, r7}
> + ldrdr4, r5, [r0]@ load old val
> + ldrdr6, r7, [r1]@ load new val
> + smp_dmb arm
> +1:   ldrexd  r0, r1, [r2]@ load current val
> + eorsr3, r0, r4  @ compare with oldval (1)
> + eoreqs  r3, r1, r5  @ compare with oldval (2)
> + strexdeq r3, r6, r7, [r2]   @ store newval if eq
> + teqeq   r3, #1  @ success?
> + beq 1b  @ if no then retry
> + smp_dmb arm
> +

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-24 Thread Nicolas Pitre

On Tue, 24 May 2011, Michael Hope wrote:

> On Tue, May 24, 2011 at 10:33 AM, Michael Casadevall
>  wrote:
> > On 05/19/2011 10:56 AM, David Gilbert wrote:
> >> On 19 May 2011 16:49, Ken Werner  wrote:
> >>> On 05/19/2011 12:40 PM, David Rusling wrote:
> 
>  Is this going to end up in a blueprint?   This is the last loose end of
>  SMP / atomic memory operations work and I'd like to see it happen
> >>>
> >>> Hi,
> >>>
> >>> Yep, there is one (kind of a skeleton) in place at:
> >>> https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives
> >>
> >> Which I'll be filling out in the next few days.
> >>
> >> Dave
> >
> > Is there a timeline for this feature? It's been requested by members of
> > the ARM Server Club to have this implemented, and its important that
> > this makes it into the Ubuntu 11.10 release.
> > Michael
> 
> Hi Michael.  The topics for this planning cycle are listed here:
>  https://wiki.linaro.org/Cycles//TechnicalTopics/Toolchain
> 
> 64 bit sync primitives are medium priority so they will be achieved in
> the next six months.
> 
> A draft of what is in whos queue is at:
>  https://wiki.linaro.org/Cycles//TechnicalTopics/Toolchain/Planning
> 
> The primitives are second in Dave's queue so should be started in the
> next three months.

FWIW, here's what the kernel part might look like, i.e. for 
compatibility with pre ARMv6k systems (beware, only compile tested):

diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index e8d8856..53830a7 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -383,7 +383,7 @@ ENDPROC(__pabt_svc)
.endm
 
.macro  kuser_cmpxchg_check
-#if __LINUX_ARM_ARCH__ < 6 && !defined(CONFIG_NEEDS_SYSCALL_FOR_CMPXCHG)
+#if !defined(CONFIG_CPU_32v6K) && !defined(CONFIG_NEEDS_SYSCALL_FOR_CMPXCHG)
 #ifndef CONFIG_MMU
 #warning "NPTL on non MMU needs fixing"
 #else
@@ -392,7 +392,7 @@ ENDPROC(__pabt_svc)
@ perform a quick test inline since it should be false
@ 99.% of the time.  The rest is done out of line.
cmp r2, #TASK_SIZE
-   blhskuser_cmpxchg_fixup
+   blhskuser_cmpxchg64_fixup
 #endif
 #endif
.endm
@@ -797,6 +797,139 @@ __kuser_helper_start:
 /*
  * Reference prototype:
  *
+ * int __kernel_cmpxchgd64(int64_t *oldval, int64_t *newval, int64_t *ptr)
+ *
+ * Input:
+ *
+ * r0 = pointer to oldval
+ * r1 = pointer to newval
+ * r2 = pointer to target value
+ * lr = return address
+ *
+ * Output:
+ *
+ * r0 = returned value (zero or non-zero)
+ * C flag = set if r0 == 0, clear if r0 != 0
+ *
+ * Clobbered:
+ *
+ * r3, flags
+ *
+ * Definition and user space usage example:
+ *
+ * typedef int (__kernel_cmpxchg64_t)(const int64_t *oldval,
+ *const int64_t *newval,
+ *volatile int64_t *ptr);
+ * #define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *)0x0f60)
+ *
+ * Atomically store newval in *ptr if *ptr is equal to oldval for user space.
+ * Return zero if *ptr was changed or non-zero if no exchange happened.
+ * The C flag is also set if *ptr was changed to allow for assembly
+ * optimization in the calling code.
+ *
+ * Notes:
+ *
+ *- This routine already includes memory barriers as needed.
+ *
+ *- Due to the length of some sequences, this spans 2 regular kuser
+ *  "slots" so 0x0f80 is not used as a valid entry point.
+ */
+
+__kuser_cmpxchg64: @ 0x0f60
+
+#if defined(CONFIG_NEEDS_SYSCALL_FOR_CMPXCHG)
+
+   /*
+* Poor you.  No fast solution possible...
+* The kernel itself must perform the operation.
+* A special ghost syscall is used for that (see traps.c).
+*/
+   stmfd   sp!, {r7, lr}
+   ldr r7, 1f  @ it's 20 bits
+   swi __ARM_NR_cmpxchg64
+   ldmfd   sp!, {r7, pc}
+1: .word   __ARM_NR_cmpxchg64
+
+#elif defined(CONFIG_CPU_32v6K)
+
+   stmfd   sp!, {r4, r5, r6, r7}
+   ldrdr4, r5, [r0]@ load old val
+   ldrdr6, r7, [r1]@ load new val
+   smp_dmb arm
+1: ldrexd  r0, r1, [r2]@ load current val
+   eorsr3, r0, r4  @ compare with oldval (1)
+   eoreqs  r3, r1, r5  @ compare with oldval (2)
+   strexdeq r3, r6, r7, [r2]   @ store newval if eq
+   teqeq   r3, #1  @ success?
+   beq 1b  @ if no then retry
+   smp_dmb arm
+   rsbsr0, r3, #0  @ set returned val and C flag
+   ldmfd   sp!, {r4, r5, r6, r7}
+   usr_ret lr
+
+#elif !defined(CONFIG_SMP)
+
+#ifdef CONFIG_MMU
+
+   /*
+* The only thing that can break atomicity in this cmpxchg64
+* implementation is either an IRQ or a data abort excep

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-23 Thread Michael Hope

On Tue, May 24, 2011 at 10:33 AM, Michael Casadevall
 wrote:
> On 05/19/2011 10:56 AM, David Gilbert wrote:
>> On 19 May 2011 16:49, Ken Werner  wrote:
>>> On 05/19/2011 12:40 PM, David Rusling wrote:

 Is this going to end up in a blueprint?   This is the last loose end of
 SMP / atomic memory operations work and I'd like to see it happen
>>>
>>> Hi,
>>>
>>> Yep, there is one (kind of a skeleton) in place at:
>>> https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives
>>
>> Which I'll be filling out in the next few days.
>>
>> Dave
>
> Is there a timeline for this feature? It's been requested by members of
> the ARM Server Club to have this implemented, and its important that
> this makes it into the Ubuntu 11.10 release.
> Michael

Hi Michael.  The topics for this planning cycle are listed here:
 https://wiki.linaro.org/Cycles//TechnicalTopics/Toolchain

64 bit sync primitives are medium priority so they will be achieved in
the next six months.

A draft of what is in whos queue is at:
 https://wiki.linaro.org/Cycles//TechnicalTopics/Toolchain/Planning

The primitives are second in Dave's queue so should be started in the
next three months.

-- Michael

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-23 Thread Michael Casadevall

On 05/19/2011 10:56 AM, David Gilbert wrote:
> On 19 May 2011 16:49, Ken Werner  wrote:
>> On 05/19/2011 12:40 PM, David Rusling wrote:
>>>
>>> Is this going to end up in a blueprint?   This is the last loose end of
>>> SMP / atomic memory operations work and I'd like to see it happen
>>
>> Hi,
>>
>> Yep, there is one (kind of a skeleton) in place at:
>> https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives
> 
> Which I'll be filling out in the next few days.
> 
> Dave

Is there a timeline for this feature? It's been requested by members of
the ARM Server Club to have this implemented, and its important that
this makes it into the Ubuntu 11.10 release.
Michael


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-19 Thread David Gilbert

On 19 May 2011 16:49, Ken Werner  wrote:
> On 05/19/2011 12:40 PM, David Rusling wrote:
>>
>> Is this going to end up in a blueprint?   This is the last loose end of
>> SMP / atomic memory operations work and I'd like to see it happen
>
> Hi,
>
> Yep, there is one (kind of a skeleton) in place at:
> https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives

Which I'll be filling out in the next few days.

Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-19 Thread Ken Werner


On 05/19/2011 12:40 PM, David Rusling wrote:

Is this going to end up in a blueprint?   This is the last loose end of SMP / 
atomic memory operations work and I'd like to see it happen


Hi,

Yep, there is one (kind of a skeleton) in place at:
https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives

Regards
Ken

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-19 Thread David Rusling

Is this going to end up in a blueprint?   This is the last loose end of SMP / 
atomic memory operations work and I'd like to see it happen

Dave

On 18 May 2011, at 16:07, Dave Martin wrote:

> On Tue, May 17, 2011 at 5:30 PM, Nicolas Pitre  
> wrote:
>> On Tue, 17 May 2011, Dave Martin wrote:
>> 
>>> On Fri, May 6, 2011 at 10:39 PM, Nicolas Pitre  
>>> wrote:
 On Fri, 6 May 2011, Ramana Radhakrishnan wrote:
 
> On 6 May 2011 16:06, Ken Werner  wrote:
>> 
>> Currently the GCC ARM backend doesn't provide a pattern to inline 64bit
>> __sync_* functions but the compiler emits __sync_*_8 function calls [1]. 
>> The
>> libgcc does not provide these symbols via the usual thin wrapper around 
>> the
>> kernel helper [2] because the ARM Linux __kernel_cmpxchg supports 32bit 
>> only.
>> My understanding is that for ARMv7 the GCC backend could be enhanced to 
>> inline
>> the __sync_* functions by using the LDREXD and STREXD instructions. But 
>> for
>> ARMv5 we would still rely on a new kernel helper.
> 
> It's a bit tricky with when you want to use the kernel helper for v5t,
> so we've got to find a way of turning this on only with special knobs
> and not by default and that's a bit tricky.
 
 What is the problem with v5t?
 
> Think new user space and old kernel and a jump into never-never land.
 
 The kernel helpers are "versioned".  At 0x0ffc you can read the
 number of helpers currently available.  If a program uses a new helper
 then when it is started this value can be verified to equal or exceed
 the expected value and bail out otherwise.
>>> 
>>> To what extent do we think that userspace code is actually checking this?
>> 
>> I think right now it is none of it simply because most of the helpers
>> were added almost all at once.  But if future helpers are added then it
>> would be a good idea to check this but only if the new helper is
>> actually invoked for a given application.
>> 
>>> I may suggest a patch to the documentation text in entry-armv.S to
>>> make this requirement clearer, as well as getting rid of the example
>>> assembler code (which I consider to be mis-educative, but more
>>> importantly it's also Thumb-incompatible and will likely suffer poor
>>> branch-prediction on recent CPUs.
>> 
>> If you have suggestions for improving this then please do so.
> 
> I have a patch which I'll suggest at some point, but it's not high priority.
> 
>> 
>>> This code was the source of a TLS
>>> bug in bionic, and may have been inappropriately pasted elsewhere
>>> too...)
>> 
>> What bug?
> 
> Actually, to be fair I think I may be mis-remembering here... I can't
> seem to find the exact bug now.
> 
> Cheers
> ---Dave
> 
> ___
> linaro-dev mailing list
> linaro-dev@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-dev


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-18 Thread Dave Martin

On Tue, May 17, 2011 at 5:30 PM, Nicolas Pitre  wrote:
> On Tue, 17 May 2011, Dave Martin wrote:
>
>> On Fri, May 6, 2011 at 10:39 PM, Nicolas Pitre  
>> wrote:
>> > On Fri, 6 May 2011, Ramana Radhakrishnan wrote:
>> >
>> >> On 6 May 2011 16:06, Ken Werner  wrote:
>> >> >
>> >> > Currently the GCC ARM backend doesn't provide a pattern to inline 64bit
>> >> > __sync_* functions but the compiler emits __sync_*_8 function calls 
>> >> > [1]. The
>> >> > libgcc does not provide these symbols via the usual thin wrapper around 
>> >> > the
>> >> > kernel helper [2] because the ARM Linux __kernel_cmpxchg supports 32bit 
>> >> > only.
>> >> > My understanding is that for ARMv7 the GCC backend could be enhanced to 
>> >> > inline
>> >> > the __sync_* functions by using the LDREXD and STREXD instructions. But 
>> >> > for
>> >> > ARMv5 we would still rely on a new kernel helper.
>> >>
>> >> It's a bit tricky with when you want to use the kernel helper for v5t,
>> >> so we've got to find a way of turning this on only with special knobs
>> >> and not by default and that's a bit tricky.
>> >
>> > What is the problem with v5t?
>> >
>> >> Think new user space and old kernel and a jump into never-never land.
>> >
>> > The kernel helpers are "versioned".  At 0x0ffc you can read the
>> > number of helpers currently available.  If a program uses a new helper
>> > then when it is started this value can be verified to equal or exceed
>> > the expected value and bail out otherwise.
>>
>> To what extent do we think that userspace code is actually checking this?
>
> I think right now it is none of it simply because most of the helpers
> were added almost all at once.  But if future helpers are added then it
> would be a good idea to check this but only if the new helper is
> actually invoked for a given application.
>
>> I may suggest a patch to the documentation text in entry-armv.S to
>> make this requirement clearer, as well as getting rid of the example
>> assembler code (which I consider to be mis-educative, but more
>> importantly it's also Thumb-incompatible and will likely suffer poor
>> branch-prediction on recent CPUs.
>
> If you have suggestions for improving this then please do so.

I have a patch which I'll suggest at some point, but it's not high priority.

>
>> This code was the source of a TLS
>> bug in bionic, and may have been inappropriately pasted elsewhere
>> too...)
>
> What bug?

Actually, to be fair I think I may be mis-remembering here... I can't
seem to find the exact bug now.

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-17 Thread Nicolas Pitre

On Tue, 17 May 2011, Dave Martin wrote:

> On Fri, May 6, 2011 at 10:39 PM, Nicolas Pitre  
> wrote:
> > On Fri, 6 May 2011, Ramana Radhakrishnan wrote:
> >
> >> On 6 May 2011 16:06, Ken Werner  wrote:
> >> >
> >> > Currently the GCC ARM backend doesn't provide a pattern to inline 64bit
> >> > __sync_* functions but the compiler emits __sync_*_8 function calls [1]. 
> >> > The
> >> > libgcc does not provide these symbols via the usual thin wrapper around 
> >> > the
> >> > kernel helper [2] because the ARM Linux __kernel_cmpxchg supports 32bit 
> >> > only.
> >> > My understanding is that for ARMv7 the GCC backend could be enhanced to 
> >> > inline
> >> > the __sync_* functions by using the LDREXD and STREXD instructions. But 
> >> > for
> >> > ARMv5 we would still rely on a new kernel helper.
> >>
> >> It's a bit tricky with when you want to use the kernel helper for v5t,
> >> so we've got to find a way of turning this on only with special knobs
> >> and not by default and that's a bit tricky.
> >
> > What is the problem with v5t?
> >
> >> Think new user space and old kernel and a jump into never-never land.
> >
> > The kernel helpers are "versioned".  At 0x0ffc you can read the
> > number of helpers currently available.  If a program uses a new helper
> > then when it is started this value can be verified to equal or exceed
> > the expected value and bail out otherwise.
> 
> To what extent do we think that userspace code is actually checking this?

I think right now it is none of it simply because most of the helpers 
were added almost all at once.  But if future helpers are added then it 
would be a good idea to check this but only if the new helper is 
actually invoked for a given application.

> I may suggest a patch to the documentation text in entry-armv.S to
> make this requirement clearer, as well as getting rid of the example
> assembler code (which I consider to be mis-educative, but more
> importantly it's also Thumb-incompatible and will likely suffer poor
> branch-prediction on recent CPUs.

If you have suggestions for improving this then please do so.

> This code was the source of a TLS
> bug in bionic, and may have been inappropriately pasted elsewhere
> too...)

What bug?


Nicolas___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-17 Thread Dave Martin

On Fri, May 6, 2011 at 10:39 PM, Nicolas Pitre  wrote:
> On Fri, 6 May 2011, Ramana Radhakrishnan wrote:
>
>> On 6 May 2011 16:06, Ken Werner  wrote:
>> >
>> > Currently the GCC ARM backend doesn't provide a pattern to inline 64bit
>> > __sync_* functions but the compiler emits __sync_*_8 function calls [1]. 
>> > The
>> > libgcc does not provide these symbols via the usual thin wrapper around the
>> > kernel helper [2] because the ARM Linux __kernel_cmpxchg supports 32bit 
>> > only.
>> > My understanding is that for ARMv7 the GCC backend could be enhanced to 
>> > inline
>> > the __sync_* functions by using the LDREXD and STREXD instructions. But for
>> > ARMv5 we would still rely on a new kernel helper.
>>
>> It's a bit tricky with when you want to use the kernel helper for v5t,
>> so we've got to find a way of turning this on only with special knobs
>> and not by default and that's a bit tricky.
>
> What is the problem with v5t?
>
>> Think new user space and old kernel and a jump into never-never land.
>
> The kernel helpers are "versioned".  At 0x0ffc you can read the
> number of helpers currently available.  If a program uses a new helper
> then when it is started this value can be verified to equal or exceed
> the expected value and bail out otherwise.

To what extent do we think that userspace code is actually checking this?

I may suggest a patch to the documentation text in entry-armv.S to
make this requirement clearer, as well as getting rid of the example
assembler code (which I consider to be mis-educative, but more
importantly it's also Thumb-incompatible and will likely suffer poor
branch-prediction on recent CPUs.  This code was the source of a TLS
bug in bionic, and may have been inappropriately pasted elsewhere
too...)

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-17 Thread Dave Martin

On Wed, May 11, 2011 at 6:14 PM, saeed bishara  wrote:
> On Sat, May 7, 2011 at 12:39 AM, Nicolas Pitre  
> wrote:
>> On Fri, 6 May 2011, Ramana Radhakrishnan wrote:
>>
>>> On 6 May 2011 16:06, Ken Werner  wrote:
>>> >
>>> > Currently the GCC ARM backend doesn't provide a pattern to inline 64bit
>>> > __sync_* functions but the compiler emits __sync_*_8 function calls [1]. 
>>> > The
>>> > libgcc does not provide these symbols via the usual thin wrapper around 
>>> > the
>>> > kernel helper [2] because the ARM Linux __kernel_cmpxchg supports 32bit 
>>> > only.
>>> > My understanding is that for ARMv7 the GCC backend could be enhanced to 
>>> > inline
>>> > the __sync_* functions by using the LDREXD and STREXD instructions. But 
>>> > for
>>> > ARMv5 we would still rely on a new kernel helper.
>>>
>>> It's a bit tricky with when you want to use the kernel helper for v5t,
>>> so we've got to find a way of turning this on only with special knobs
>>> and not by default and that's a bit tricky.
>>
>> What is the problem with v5t?
> Hi, when I run the following on my jaunty (for armv5), I get the following
>
>
> echo "void f(){__sync_synchronize();}" | gcc -O2 -x c -Wall -dA -S - -o -
>        .arch armv6
>        .eabi_attribute 27, 3
>        .fpu vfp
>        .eabi_attribute 20, 1
>        .eabi_attribute 21, 1
>        .eabi_attribute 23, 3
>        .eabi_attribute 24, 1
>        .eabi_attribute 25, 1
>        .eabi_attribute 26, 2
>        .eabi_attribute 30, 2
>        .eabi_attribute 18, 4
>        .file   ""
>        .text
>        .align  2
>        .global f
>        .type   f, %function
> f:
>        @ args = 0, pretend = 0, frame = 0
>        @ frame_needed = 0, uses_anonymous_args = 0
>        @ link register save eliminated.
>        @ basic block 2
>        bx      lr
>        .size   f, .-f
>        .ident  "GCC: (Ubuntu 4.4.1-4ubuntu9) 4.4.1"
>        .section        .note.GNU-stack,"",%progbits
>
> but I can see (with debugger) that the kernel helper functions are
> reached. any idea how?

There was a compiler bug some time ago which mistakenly generated no
code for __sync_synchronize():
https://bugs.launchpad.net/ubuntu/+source/gcc-4.4/+bug/491872

I suspect your Jaunty-era compiler may be affected by this -- you
probably need to upgrade it.

If you're observing calls to the helpers, they may be coming from
glibc etc. instead of your function.


You _should_ be seeing code something like this:

$ echo 'void f(void) { __sync_synchronize(); }' |
arm-linux-gnueabi-gcc -march=armv7-a -O2 -S -o - -x c -
[...]
f:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
dmb sy
bx  lr
[...]

(If -march is set high enough, the dmb is explicit and inline.)

$ echo 'void f(void) { __sync_synchronize(); }' |
arm-linux-gnueabi-gcc -march=armv5t -O2 -S -o - -x c -
[...]
f:
push{r3, lr}
bl  __sync_synchronize
@ sp needed for prologue
pop {r3, pc}

(-march=armv5t indicates an architecture which doesn't have the
barrier instructions, so the kuser helper is needed instead)

Cheers
---Dave

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-11 Thread saeed bishara

On Sat, May 7, 2011 at 12:39 AM, Nicolas Pitre  wrote:
> On Fri, 6 May 2011, Ramana Radhakrishnan wrote:
>
>> On 6 May 2011 16:06, Ken Werner  wrote:
>> >
>> > Currently the GCC ARM backend doesn't provide a pattern to inline 64bit
>> > __sync_* functions but the compiler emits __sync_*_8 function calls [1]. 
>> > The
>> > libgcc does not provide these symbols via the usual thin wrapper around the
>> > kernel helper [2] because the ARM Linux __kernel_cmpxchg supports 32bit 
>> > only.
>> > My understanding is that for ARMv7 the GCC backend could be enhanced to 
>> > inline
>> > the __sync_* functions by using the LDREXD and STREXD instructions. But for
>> > ARMv5 we would still rely on a new kernel helper.
>>
>> It's a bit tricky with when you want to use the kernel helper for v5t,
>> so we've got to find a way of turning this on only with special knobs
>> and not by default and that's a bit tricky.
>
> What is the problem with v5t?
Hi, when I run the following on my jaunty (for armv5), I get the following


echo "void f(){__sync_synchronize();}" | gcc -O2 -x c -Wall -dA -S - -o -
.arch armv6
.eabi_attribute 27, 3
.fpu vfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 18, 4
.file   ""
.text
.align  2
.global f
.type   f, %function
f:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
@ basic block 2
bx  lr
.size   f, .-f
.ident  "GCC: (Ubuntu 4.4.1-4ubuntu9) 4.4.1"
.section.note.GNU-stack,"",%progbits

but I can see (with debugger) that the kernel helper functions are
reached. any idea how?

saeed

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-06 Thread Nicolas Pitre

On Fri, 6 May 2011, Ramana Radhakrishnan wrote:

> On 6 May 2011 16:06, Ken Werner  wrote:
> >
> > Currently the GCC ARM backend doesn't provide a pattern to inline 64bit
> > __sync_* functions but the compiler emits __sync_*_8 function calls [1]. The
> > libgcc does not provide these symbols via the usual thin wrapper around the
> > kernel helper [2] because the ARM Linux __kernel_cmpxchg supports 32bit 
> > only.
> > My understanding is that for ARMv7 the GCC backend could be enhanced to 
> > inline
> > the __sync_* functions by using the LDREXD and STREXD instructions. But for
> > ARMv5 we would still rely on a new kernel helper.
> 
> It's a bit tricky with when you want to use the kernel helper for v5t,
> so we've got to find a way of turning this on only with special knobs
> and not by default and that's a bit tricky.

What is the problem with v5t?

> Think new user space and old kernel and a jump into never-never land.

The kernel helpers are "versioned".  At 0x0ffc you can read the 
number of helpers currently available.  If a program uses a new helper 
then when it is started this value can be verified to equal or exceed 
the expected value and bail out otherwise.


Nicolas

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-06 Thread Rob Herring


Ken,

On 05/06/2011 10:25 AM, Richard Earnshaw wrote:


On Fri, 2011-05-06 at 17:06 +0200, Ken Werner wrote:

Hi,

We've been thinking about adding support for the built-in functions for 64bit
atomic memory access and I'd like to know if this is of any interest.
Currently the main use of these functions seems to be to implement (SMP safe)
locking mechanisms where the existent 32bit memory ops are sufficient.
However, there might be code out there that implements a parallel algorithm
using 64bit atomic memory operations.


I've been told (though I've not verified it) that the Membase NoSQL
database needs 64-bit atomic operations to work effectively
(http://www.membase.org/downloadsrc)



By work effectively, that means they are needed to get membase to build 
and run. Without gcc support you have to copy the routines out of the 
kernel and link them with membase.


Rob

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-06 Thread Ramana Radhakrishnan

On 6 May 2011 16:06, Ken Werner  wrote:
>
> Currently the GCC ARM backend doesn't provide a pattern to inline 64bit
> __sync_* functions but the compiler emits __sync_*_8 function calls [1]. The
> libgcc does not provide these symbols via the usual thin wrapper around the
> kernel helper [2] because the ARM Linux __kernel_cmpxchg supports 32bit only.
> My understanding is that for ARMv7 the GCC backend could be enhanced to inline
> the __sync_* functions by using the LDREXD and STREXD instructions. But for
> ARMv5 we would still rely on a new kernel helper.

It's a bit tricky with when you want to use the kernel helper for v5t,
so we've got to find a way of turning this on only with special knobs
and not by default and that's a bit tricky. Think new user space and
old kernel and a jump into never-never land. For v7-a you could inline
the function using ldrexd / strexd.


cheers
Ramana






>
> ___
> linaro-dev mailing list
> linaro-dev@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-dev
>

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-06 Thread Ken Werner

On Friday, May 06, 2011 5:25:46 pm Richard Earnshaw wrote:
> On Fri, 2011-05-06 at 17:06 +0200, Ken Werner wrote:
> > Hi,
> > 
> > We've been thinking about adding support for the built-in functions for
> > 64bit atomic memory access and I'd like to know if this is of any
> > interest. Currently the main use of these functions seems to be to
> > implement (SMP safe) locking mechanisms where the existent 32bit memory
> > ops are sufficient. However, there might be code out there that
> > implements a parallel algorithm using 64bit atomic memory operations.
> 
> I've been told (though I've not verified it) that the Membase NoSQL
> database needs 64-bit atomic operations to work effectively
> (http://www.membase.org/downloadsrc)

Interesting...
..Thanks!

> R.

Regards
Ken

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-06 Thread Richard Earnshaw


On Fri, 2011-05-06 at 17:06 +0200, Ken Werner wrote:
> Hi,
> 
> We've been thinking about adding support for the built-in functions for 64bit 
> atomic memory access and I'd like to know if this is of any interest. 
> Currently the main use of these functions seems to be to implement (SMP safe) 
> locking mechanisms where the existent 32bit memory ops are sufficient. 
> However, there might be code out there that implements a parallel algorithm 
> using 64bit atomic memory operations.

I've been told (though I've not verified it) that the Membase NoSQL
database needs 64-bit atomic operations to work effectively
(http://www.membase.org/downloadsrc)

R.





___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Usefulness of GCC's 64bit __sync_* ops on ARM

2011-05-06 Thread Ken Werner

Hi,

We've been thinking about adding support for the built-in functions for 64bit 
atomic memory access and I'd like to know if this is of any interest. 
Currently the main use of these functions seems to be to implement (SMP safe) 
locking mechanisms where the existent 32bit memory ops are sufficient. 
However, there might be code out there that implements a parallel algorithm 
using 64bit atomic memory operations.

Currently the GCC ARM backend doesn't provide a pattern to inline 64bit 
__sync_* functions but the compiler emits __sync_*_8 function calls [1]. The 
libgcc does not provide these symbols via the usual thin wrapper around the 
kernel helper [2] because the ARM Linux __kernel_cmpxchg supports 32bit only. 
My understanding is that for ARMv7 the GCC backend could be enhanced to inline 
the __sync_* functions by using the LDREXD and STREXD instructions. But for 
ARMv5 we would still rely on a new kernel helper.

Any ideas/thoughts are appreciated.

[1] https://wiki.linaro.org/WorkingGroups/ToolChain/AtomicMemoryOperations#GCC
[2] 
https://wiki.linaro.org/WorkingGroups/ToolChain/AtomicMemoryOperations#implementation_details

Regards
Ken

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

65 matches

Mail list logo