Re: Libitm issues porting to POWER8 HTM

2013-06-19 Thread Andi Kleen
On Wed, Jun 19, 2013 at 11:04:25AM -0500, Peter Bergner wrote:
> On Tue, 2013-06-18 at 21:48 +0200, Andi Kleen wrote:
> > > Given Torvald's comment, can you verify whether your hw txn succeeds
> > > (all the way to commit) or whether it is failing and somehow skips
> > > the fall through code that is hanging for us (Power and S390)?
> > 
> > All three transactions in reentrant.c abort.
> 
> Can you please explain the above?  When you say abort, do you mean
> that libitm is calling htm_abort() or that your xbegin hardware
> instruction isn't succeeding?

XBEGIN aborts, according to the hardware counters.
> 
> > That's not surprising, because there are usually lots of aborts in
> > the startup phase of programs, and the test doesn't use a loop.
> 
> Is this a libitm statement or an Intel RTM statement, that the
> startup phase usually has lots of aborts?

This is an Intel RTM statement.

-Andi


Re: Libitm issues porting to POWER8 HTM

2013-06-19 Thread Peter Bergner
On Tue, 2013-06-18 at 21:48 +0200, Andi Kleen wrote:
> > Given Torvald's comment, can you verify whether your hw txn succeeds
> > (all the way to commit) or whether it is failing and somehow skips
> > the fall through code that is hanging for us (Power and S390)?
> 
> All three transactions in reentrant.c abort.

Can you please explain the above?  When you say abort, do you mean
that libitm is calling htm_abort() or that your xbegin hardware
instruction isn't succeeding?

> That's not surprising, because there are usually lots of aborts in
> the startup phase of programs, and the test doesn't use a loop.

Is this a libitm statement or an Intel RTM statement, that the
startup phase usually has lots of aborts?

Peter





Re: Libitm issues porting to POWER8 HTM

2013-06-18 Thread Andi Kleen
> Given Torvald's comment, can you verify whether your hw txn succeeds
> (all the way to commit) or whether it is failing and somehow skips
> the fall through code that is hanging for us (Power and S390)?

All three transactions in reentrant.c abort. That's not surprising,
because there are usually lots of aborts in the startup phase of
programs, and the test doesn't use a loop.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: Libitm issues porting to POWER8 HTM

2013-06-18 Thread Peter Bergner
On Tue, 2013-06-18 at 18:41 +0200, Torvald Riegel wrote:
> On Fri, 2013-06-14 at 19:44 -0500, Peter Bergner wrote:
> > I'll note that if I hack the call to
> > htm_abort_should_retry(ret) so that we break out of the loop and fall back
> > to SW TM, then the test case executes correctly.
> 
> That matches what I suppose the bug is.
> 
> Please feel free to create a bug report.  I will work on a patch.

Done.  http://gcc.gnu.org/PR57643

Since this seems to pass on x86, let me know if you want me to test a
patch on our power8 system.

Peter





Re: Libitm issues porting to POWER8 HTM

2013-06-18 Thread Peter Bergner
On Tue, 2013-06-18 at 11:22 -0700, Andi Kleen wrote:
> Peter Bergner  writes:
> >
> > I have yet to track down who has the write lock and why, but I am working
> > towards that.  Talking with Andreas, he said he is seeing the same failure
> > on S390, so I'm wondering whether this might be a generic libitm issue
> > and it might hit Intel too.  Does anyone know whether this executes correctly
> > on Intel hardware with RTM?  I'll note that if I hack the call to
> 
> FWIW on a TSX system I get the following for libitm with current
> trunk. So no hangs on reentrant at least.

Given Torvald's comment, can you verify whether your hw txn succeeds
(all the way to commit) or whether it is failing and somehow skips
the fall through code that is hanging for us (Power and S390)?

Thanks!

Peter





Re: Libitm issues porting to POWER8 HTM

2013-06-18 Thread Andi Kleen
Peter Bergner  writes:
>
> I have yet to track down who has the write lock and why, but I am working
> towards that.  Talking with Andreas, he said he is seeing the same failure
> on S390, so I'm wondering whether this might be a generic libitm issue
> and it might hit Intel too.  Does anyone know whether this executes correctly
> on Intel hardware with RTM?  I'll note that if I hack the call to

FWIW on a TSX system I get the following for libitm with current
trunk. So no hangs on reentrant at least.

Native configuration is x86_64-unknown-linux-gnu

=== libitm tests ===

Schedule of variations:
unix

Running target unix
Running /home/ak/gcc/gcc/libitm/testsuite/libitm.c/c.exp ...
PASS: libitm.c/cancel.c (test for excess errors)
PASS: libitm.c/cancel.c execution test
PASS: libitm.c/clone-1.c (test for excess errors)
PASS: libitm.c/clone-1.c execution test
PASS: libitm.c/dropref-2.c (test for excess errors)
XFAIL: libitm.c/dropref-2.c execution test
PASS: libitm.c/dropref.c (test for excess errors)
XFAIL: libitm.c/dropref.c execution test
PASS: libitm.c/memcpy-1.c (test for excess errors)
PASS: libitm.c/memcpy-1.c execution test
PASS: libitm.c/memset-1.c (test for excess errors)
PASS: libitm.c/memset-1.c execution test
PASS: libitm.c/notx.c (test for excess errors)
PASS: libitm.c/notx.c execution test
PASS: libitm.c/reentrant.c (test for excess errors)
PASS: libitm.c/reentrant.c execution test
PASS: libitm.c/simple-1.c (test for excess errors)
PASS: libitm.c/simple-1.c execution test
PASS: libitm.c/simple-2.c (test for excess errors)
PASS: libitm.c/simple-2.c execution test
PASS: libitm.c/stackundo.c (test for excess errors)
PASS: libitm.c/stackundo.c execution test
PASS: libitm.c/txrelease.c (test for excess errors)
PASS: libitm.c/txrelease.c execution test
Running /home/ak/gcc/gcc/libitm/testsuite/libitm.c++/c++.exp ...
PASS: libitm.c++/dropref.C (test for excess errors)
XFAIL: libitm.c++/dropref.C execution test
PASS: libitm.c++/eh-1.C (test for excess errors)
PASS: libitm.c++/eh-1.C execution test
UNSUPPORTED: libitm.c++/static_ctor.C
PASS: libitm.c++/throwdown.C (test for excess errors)

=== libitm Summary ===

# of expected passes            26
# of expected failures          3
# of unsupported tests          1

-Andi



Re: Libitm issues porting to POWER8 HTM

2013-06-18 Thread Torvald Riegel
On Fri, 2013-06-14 at 19:44 -0500, Peter Bergner wrote:
> I'm currently implementing support for hardware transactional memory in
> the rs6000 backend for POWER8.  Things seem to be mostly working, but I
> have run into a few issues I'm wondering whether other people are seeing.
> 
> For me, all of the libitm execution test cases in libitm/testsuite/libitm.c/
> compile and execute without error, except for reentrant.c, which hangs for me.
> My gdb hasn't been ported to support HTM on Power yet, so debugging has been
> slow, but what I've learned is that my tbegin. instruction succeeds, but I
> fail the test (meaning someone has the write lock) at beginend.cc:200:
> 
> if (unlikely(serial_lock.is_write_locked()))
>   htm_abort();
> 
> ...so we abort the transaction.  The failure is not persistent, so we do
> not break out of the loop due to:
> 
> if (!htm_abort_should_retry(ret))
>   break;
> 
> We then fall into the following code, where we hang trying to get the
> read lock:
> 
> serial_lock.read_lock(tx);
> 
> I have yet to track down who has the write lock and why, but I am working
> towards that.  Talking with Andreas, he said he is seeing the same failure
> on S390, so I'm wondering whether this might be a generic libitm issue
> and it might hit Intel too.

I think that this is a bug in libitm's HTM fastpath.  What I suppose
happens is that we have a relaxed outermost transaction that executes
unsafe code (see reentrant.c), thus switches to serial-irrevocable mode,
and then tries to start a nested transaction.  The nested txn then
observes in the HTM fastpath that there is a serial-mode txn already,
but it never checks whether it is enclosed in an already serial
outermost transaction.
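To make the suspected failure concrete, here is a minimal single-threaded model of it in C++. Everything in it (`serial_lock_model`, `tx_model`, `begin_nested_buggy`, `begin_nested_fixed`) is hypothetical and far simpler than libitm: the real `read_lock()` blocks instead of throwing, and the "fixed" variant only illustrates the missing check described above, not the actual patch.

```cpp
#include <stdexcept>

// Hypothetical, single-threaded model of the scenario described above.
// None of these types are libitm's; they only capture the state involved.
struct serial_lock_model {
  bool write_locked = false;  // set while some txn runs serial-irrevocable

  bool is_write_locked() const { return write_locked; }

  // libitm's read_lock() would wait for the writer to leave. Here the
  // writer is our own outermost txn, so that wait could never end; the
  // model throws instead so the self-deadlock is observable.
  void read_lock() const {
    if (write_locked)
      throw std::runtime_error("would wait forever on our own write lock");
  }
};

struct tx_model {
  int nesting = 0;      // > 0 once an outermost txn has begun
  bool serial = false;  // outermost txn switched to serial-irrevocable
};

// Buggy shape: the fastpath only asks "is *any* txn serial?".
void begin_nested_buggy(serial_lock_model& lock, tx_model& tx) {
  if (lock.is_write_locked()) {
    // tbegin./xbegin succeeded, but the lock check calls htm_abort();
    // the abort is not persistent, so the retry path waits in read_lock().
    lock.read_lock();
  }
  ++tx.nesting;
}

// Sketch of the missing check: if this thread's own outermost txn is
// already serial, the nested txn simply runs inside it.
void begin_nested_fixed(serial_lock_model& lock, tx_model& tx) {
  if (tx.nesting > 0 && tx.serial) {
    ++tx.nesting;
    return;
  }
  begin_nested_buggy(lock, tx);
}
```

Usage: set up an outermost transaction that has gone serial (`nesting = 1`, `serial = true`, lock write-held). The buggy path then waits on its own lock, while the fixed path only increments the nesting depth.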

> Does anyone know whether this executes correctly
> on Intel hardware with RTM?

I don't know currently, but I suppose the bug should trigger there too
(unless, for some reason, the nested txn always aborts immediately with
RTM).

> I'll note that if I hack the call to
> htm_abort_should_retry(ret) so that we break out of the loop and fall back
> to SW TM, then the test case executes correctly.

That matches what I suppose the bug is.

Please feel free to create a bug report.  I will work on a patch.

Torvald



Re: Libitm issues porting to POWER8 HTM

2013-06-17 Thread Patrick Marlier
Hi Peter,

On Sat, Jun 15, 2013 at 2:44 AM, Peter Bergner  wrote:
> I'm currently implementing support for hardware transactional memory in
> the rs6000 backend for POWER8.  Things seem to be mostly working, but I
> have run into a few issues I'm wondering whether other people are seeing.

It sounds great! Is it already publicly available?

> Finally, when compiling (static or non-static) static_ctor.C, I'm seeing:
>
> /home/bergner/gcc/gcc-fsf-mainline-htm/libitm/testsuite/libitm.c++/static_ctor.C:12:18: error: unsafe function call 'void __cxa_guard_release(long long int*)' within 'transaction_safe' function
>    static int y = x;
>                   ^
> /home/bergner/gcc/gcc-fsf-mainline-htm/libitm/testsuite/libitm.c++/static_ctor.C:12:18: error: unsafe function call 'int __cxa_guard_acquire(long long int*)' within 'transaction_safe' function
>
> Does x86 not get calls to __cxa_guard_acquire and __cxa_guard_release for
> this access, so it doesn't see this error?  To be honest, I'm not sure
> what we're supposed to do with this error.

Sorry, I don't have answers to your previous questions (I may in the
future, once I get a CPU with HTM).

About the last one, this has been failing for a long time now (even on x86):
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51173
Indeed, static constructors are not transaction-safe yet, and we should
have a workaround for this...
--
Patrick


Libitm issues porting to POWER8 HTM

2013-06-14 Thread Peter Bergner
I'm currently implementing support for hardware transactional memory in
the rs6000 backend for POWER8.  Things seem to be mostly working, but I
have run into a few issues I'm wondering whether other people are seeing.

For me, all of the libitm execution test cases in libitm/testsuite/libitm.c/
compile and execute without error, except for reentrant.c, which hangs for me.
My gdb hasn't been ported to support HTM on Power yet, so debugging has been
slow, but what I've learned is that my tbegin. instruction succeeds, but I
fail the test (meaning someone has the write lock) at beginend.cc:200:

if (unlikely(serial_lock.is_write_locked()))
  htm_abort();

...so we abort the transaction.  The failure is not persistent, so we do
not break out of the loop due to:

if (!htm_abort_should_retry(ret))
  break;

We then fall into the following code, where we hang trying to get the
read lock:

serial_lock.read_lock(tx);

I have yet to track down who has the write lock and why, but I am working
towards that.  Talking with Andreas, he said he is seeing the same failure
on S390, so I'm wondering whether this might be a generic libitm issue
and it might hit Intel too.  Does anyone know whether this executes correctly
on Intel hardware with RTM?  I'll note that if I hack the call to
htm_abort_should_retry(ret) so that we break out of the loop and fall back
to SW TM, then the test case executes correctly.
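The retry logic described above can be modelled in a few lines. This is a sketch under stated assumptions, not libitm's code: `htm_status`, `fastpath_result`, `run_fastpath`, and the `writer_active` callback are hypothetical, and a real abort status carries much more information than three cases. It shows why a transient abort caused by a held write lock leads to the wait on the lock, and why breaking out early (the hack mentioned above) avoids it.

```cpp
#include <functional>

// Hypothetical abort classification (real HTM status words carry far
// more detail than this).
enum class htm_status { started, abort_transient, abort_persistent };

struct fastpath_result {
  int attempts = 0;        // hardware transactions tried
  int waits = 0;           // times we waited for a serial-mode writer
  bool fell_back = false;  // true if we left the loop for the SW TM path
};

// Shape of the loop described above: retry transient aborts up to a
// budget, leave early on a persistent abort, then continue in software.
fastpath_result run_fastpath(const std::function<htm_status()>& try_htm,
                             const std::function<bool()>& writer_active,
                             int max_attempts) {
  fastpath_result r;
  for (int t = max_attempts; t > 0; --t) {
    htm_status ret = try_htm();
    ++r.attempts;
    if (ret == htm_status::started)
      return r;  // hardware txn is running
    if (ret == htm_status::abort_persistent)
      break;     // analogous to !htm_abort_should_retry(ret)
    // Transient abort: before retrying, wait for the serial-mode writer
    // (the serial_lock.read_lock(tx) call quoted above). If that writer
    // is our own enclosing txn, this wait is where the hang occurs.
    if (writer_active())
      ++r.waits;
  }
  r.fell_back = true;  // continue on the software TM path
  return r;
}
```

With a write lock that never goes away, every transient abort reaches the wait; in the nested-serial case the very first wait would hang. Treating the abort as persistent skips the wait entirely and goes straight to the software path.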


Secondly, many of the test cases in libitm/testsuite/libitm.c++/ fail
to build for me when I use -static with the following error:

/home/bergner/gcc/install/gcc-fsf-mainline-htm/lib64/libitm.a(method-serial.o):(.opd+0x1098): multiple definition of `__cxa_pure_virtual'
/home/bergner/gcc/install/gcc-fsf-mainline-htm/lib64/libstdc++.a(pure.o):(.opd+0x0): first defined here
collect2: error: ld returned 1 exit status

The comment in method-serial.cc says it's trying to avoid a dependency
on libstdc++.  Is the __cxa_pure_virtual workaround in method-serial.cc
supposed to work with -static?
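For readers unfamiliar with the symbol: `__cxa_pure_virtual` is the Itanium C++ ABI handler that vtable slots for pure virtual functions point to, so a library with abstract classes needs a definition from somewhere. The sketch below shows one hypothetical way a library-local definition could coexist with libstdc++.a in a `-static` link: marking it weak, so a strong definition wins instead of colliding. It is not what method-serial.cc does, it assumes GCC on an ELF target, and `dispatch_base`/`serial_dispatch` are made-up names.

```cpp
#include <cstdlib>

// Hypothetical variant of a library-local __cxa_pure_virtual. Marked weak
// so that a strong definition (e.g. libstdc++.a(pure.o) in a -static link)
// takes precedence instead of triggering "multiple definition".
// GCC/ELF-specific.
extern "C" __attribute__((weak)) void __cxa_pure_virtual() {
  std::abort();  // calling a pure virtual function is always a bug
}

// Hypothetical abstract class: the vtable slot for load() in
// dispatch_base's own vtable refers to __cxa_pure_virtual.
struct dispatch_base {
  virtual ~dispatch_base() = default;
  virtual int load(int v) const = 0;
};

struct serial_dispatch : dispatch_base {
  int load(int v) const override { return v; }
};
```

Normal virtual dispatch through `dispatch_base` is unaffected; the handler only matters if a pure virtual is ever called, for example during construction or destruction.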


Finally, when compiling (static or non-static) static_ctor.C, I'm seeing:

/home/bergner/gcc/gcc-fsf-mainline-htm/libitm/testsuite/libitm.c++/static_ctor.C:12:18: error: unsafe function call 'void __cxa_guard_release(long long int*)' within 'transaction_safe' function
   static int y = x;
                  ^
/home/bergner/gcc/gcc-fsf-mainline-htm/libitm/testsuite/libitm.c++/static_ctor.C:12:18: error: unsafe function call 'int __cxa_guard_acquire(long long int*)' within 'transaction_safe' function

Does x86 not get calls to __cxa_guard_acquire and __cxa_guard_release for
this access, so it doesn't see this error?  To be honest, I'm not sure
what we're supposed to do with this error.
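The guard calls are not target-specific. Under the Itanium C++ ABI, which GCC uses on both x86-64 and POWER Linux, a function-local static with a dynamic initializer such as `static int y = x;` is by default initialized thread-safely by bracketing the initializer with `__cxa_guard_acquire` and `__cxa_guard_release`; the pointer argument in the errors is the guard variable. The hand-written approximation below uses the real declarations from `<cxxabi.h>`; `x`, `get_y`, and `init_count` are hypothetical stand-ins, not code from static_ctor.C.

```cpp
#include <cxxabi.h>

// Hypothetical stand-ins for the variables in static_ctor.C.
int x = 42;          // not a constant expression, so y needs dynamic init
int init_count = 0;  // counts how often the initializer actually runs

// Hand-written approximation of what the compiler emits for
//   int& get_y() { static int y = x; return y; }
// with thread-safe statics (GCC's default).
int& get_y() {
  static int y_storage;                // zero-initialized storage for y
  static __cxxabiv1::__guard y_guard;  // zero means "not yet initialized"
  // Compilers first test the guard inline and only call the runtime while
  // it is still unset; __cxa_guard_acquire repeats that check itself and
  // returns 0 once initialization has completed.
  if (__cxxabiv1::__cxa_guard_acquire(&y_guard)) {
    y_storage = x;  // the dynamic initializer
    ++init_count;
    __cxxabiv1::__cxa_guard_release(&y_guard);
  }
  return y_storage;
}
```

Calling `get_y()` twice returns 42 both times and runs the initializer once. Because these runtime calls are not transaction-safe, any transaction_safe function containing such a static hits the error above on every target, which is consistent with Patrick's note that it fails on x86 as well.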


Peter