Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC

2012-09-04 Thread Segher Boessenkool
For things that do mftb with high frequency, maybe you should  
also add a
builtin that does just an mftb, i.e. returns a 32-bit result on  
32-bit

implementations.


Are you thinking in a function that returns only the TBL?


On 32-bit, just TBL; on 64-bit, the whole TB (there is no machine
instruction to read just TBL on 64-bit, so it doesn't make much
sense to have it return a 32-bit number).


It sounds like you are asking for an additional interface for
high-frequency events that only reads one register on both PPC32 and
PPC64.


Yes.  A builtin only makes sense for measuring very short intervals;
the builtin is quite a hassle (the timebase is not part of the UISA,
and as we see it actually differs a lot between implementations), and
there is no advantage over having it in some library if you're
measuring big intervals.


  I do not believe that interface currently exists for PPC in
GLibc


Does glibc implement the timebase thing at all?  I lost track of
those patches.


and that seems out of the scope of this patch.  It could be a
nice feature, but it's a new feature request that is not necessary for
this round of patches.


Sure; on the other hand, it seems simple enough to implement.  It
was just a request.


Segher



Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC

2012-09-04 Thread David Edelsohn
On Wed, Aug 29, 2012 at 3:09 PM, Segher Boessenkool
 wrote:
>>> For things that do mftb with high frequency, maybe you should also add a
>>> builtin that does just an mftb, i.e. returns a 32-bit result on 32-bit
>>> implementations.
>>
>> Are you thinking in a function that returns only the TBL?
>
> On 32-bit, just TBL; on 64-bit, the whole TB (there is no machine
> instruction to read just TBL on 64-bit, so it doesn't make much
> sense to have it return a 32-bit number).

It sounds like you are asking for an additional interface for
high-frequency events that only reads one register on both PPC32 and
PPC64.  I do not believe that interface currently exists for PPC in
GLibc and that seems out of the scope of this patch.  It could be a
nice feature, but it's a new feature request that is not necessary for
this round of patches.

Thanks, David


Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC

2012-08-29 Thread Segher Boessenkool

Point being, for simulator environments, you may not want the
loop that was suggested later.  On the other hand, that might
not be an observable period, either.


I don't think looping a million times would be too slow for the
testsuite: there are many tests that do a lot more work than that,
already.

The worst case for hardware that I know of can take about 100
clock cycles for one timebase tick.

But how about this then, which only iterates much if the test
fails:


int
main (void)
{
uint64_t t = __builtin_ppc_get_timebase ();
int j;

for (j = 0; j < 100; j++)
if (t != __builtin_ppc_get_timebase ())
break;

return (j == 100);
}


Segher



Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC

2012-08-29 Thread Tulio Magno Quites Machado Filho

Segher Boessenkool  writes:

For things that do mftb with high frequency, maybe you should 
also add a
builtin that does just an mftb, i.e. returns a 32-bit result 
on 32- 
bit

implementations.


Are you thinking in a function that returns only the TBL?


On 32-bit, just TBL; on 64-bit, the whole TB (there is no 
machine

instruction to read just TBL on 64-bit, so it doesn't make much
sense to have it return a 32-bit number).


OK.


+(define_insn "get_timebase_ppc32"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+(unspec_volatile:DI [(const_int 0)] UNSPECV_GETTB))
+   (clobber (match_scratch:SI 1 "=r"))]
+  "TARGET_POWERPC && !TARGET_POWERPC64"
+{
+return "mftbu %0\;"
+  "mftb %L0\;"
+  "mftbu %1\;"
+  "cmpw %0,%1\;"
+  "bne- $-16";
+})


This only works for WORDS_BIG_ENDIAN.


Yes.


Do you mean you are fixing it?  :-)


Yes. At least I'll try to.  :-)

Does mftb work on all supported assemblers?  The machine 
instruction

is phased out, but some assemblers translate it to mfspr.


According to the Power ISA 2.06 they should translate it to 
mfspr.


Yes, I realised that later.

But then a binary compiled with an assembler that emits mfspr 
for mftb
will not run on POWER3 or 601.  I don't know what to do about 
that;

maybe just document it.


We can easily fix this at runtime, which isn't the case here.

Thanks again,

--
Tulio Magno



Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC

2012-08-29 Thread Hans-Peter Nilsson
On Wed, 29 Aug 2012, Michael Meissner wrote:
> On Wed, Aug 29, 2012 at 01:56:05PM -0400, Hans-Peter Nilsson wrote:
> > On Wed, 29 Aug 2012, Segher Boessenkool wrote:
> > > On some systems the timebase runs at a rather low frequency, say 20MHz.
> > > This test will spuriously fail there.  Waste a million CPU cycles before
> > > reading TB the second time?
> >
> > Waste said million cycles portably by calling sched_yield()?
> > (Available only on POSIX systems. :)
>
> Well only for a test environment.  You don't want to call sched_yield in the
> normal case, since the apps that do this many millions of times need this to 
> be
> as a fast as possible.

Surely, but IMHO what goes for the normal case is not a valid
reading of "waste"..."millions of cycles". ;)

Point being, for simulator environments, you may not want the
loop that was suggested later.  On the other hand, that might
not be an observable period, either.

brgds, H-P


Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC

2012-08-29 Thread Segher Boessenkool
For things that do mftb with high frequency, maybe you should also  
add a
builtin that does just an mftb, i.e. returns a 32-bit result on 32- 
bit

implementations.


Are you thinking in a function that returns only the TBL?


On 32-bit, just TBL; on 64-bit, the whole TB (there is no machine
instruction to read just TBL on 64-bit, so it doesn't make much
sense to have it return a 32-bit number).


+(define_insn "get_timebase_ppc32"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+(unspec_volatile:DI [(const_int 0)] UNSPECV_GETTB))
+   (clobber (match_scratch:SI 1 "=r"))]
+  "TARGET_POWERPC && !TARGET_POWERPC64"
+{
+return "mftbu %0\;"
+  "mftb %L0\;"
+  "mftbu %1\;"
+  "cmpw %0,%1\;"
+  "bne- $-16";
+})


This only works for WORDS_BIG_ENDIAN.


Yes.


Do you mean you are fixing it?  :-)


Does mftb work on all supported assemblers?  The machine instruction
is phased out, but some assemblers translate it to mfspr.


According to the Power ISA 2.06 they should translate it to mfspr.


Yes, I realised that later.

But then a binary compiled with an assembler that emits mfspr for mftb
will not run on POWER3 or 601.  I don't know what to do about that;
maybe just document it.


Segher



Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC

2012-08-29 Thread Segher Boessenkool
On some systems the timebase runs at a rather low frequency, say  
20MHz.
This test will spuriously fail there.  Waste a million CPU cycles  
before

reading TB the second time?


Waste said million cycles portably by calling sched_yield()?
(Available only on POSIX systems. :)


I was thinking more along the lines of

int j;
for (j = 0; j < 100; j++)
asm("" : : "r"(j));

which is more portable (and a lot more predictable).


Segher



Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC

2012-08-29 Thread Tulio Magno Quites Machado Filho


Hi Segher,

Segher Boessenkool  writes:
Add __builtin_ppc_get_timebase to read the time base register 
on PowerPC.
This is required for applications that measure time at high 
frequencies

with high precision that can't afford a syscall.


For things that do mftb with high frequency, maybe you should 
also add a
builtin that does just an mftb, i.e. returns a 32-bit result on 
32-bit

implementations.


Are you thinking in a function that returns only the TBL?
I don't think such a builtin would make sense on a 64-bit 
environment, right?

Do you have a suggestion for its name?


Please add documentation for the new builtin(s).


Sure!


--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1429,6 +1429,9 @@ BU_SPECIAL_X (RS6000_BUILTIN_RSQRT, 
"__builtin_rsqrt", RS6000_BTM_FRSQRTE,
 BU_SPECIAL_X (RS6000_BUILTIN_RSQRTF, "__builtin_rsqrtf", 
 RS6000_BTM_FRSQRTES,

  RS6000_BTC_FP)

+BU_SPECIAL_X (RS6000_BUILTIN_GET_TB, 
"__builtin_ppc_get_timebase",

+RS6000_BTM_POWERPC, RS6000_BTC_MISC)


RS6000_BTM_POWERPC does not exist anymore.  RS6000_BTM_ALWAYS?


I'm replacing.

+/* Expand an expression EXP that calls a builtin without 
arguments.  */

+static rtx
+rs6000_expand_noop_builtin (enum insn_code icode, rtx target)


"noop" gives the wrong idea, "zeroop" perhaps?


zeroop is much better.




+(define_expand "get_timebase"


You should probably prefix this with powerpc_ or rs6000_ as 
well.

The existing code is not very consistent in this.


OK.


+  [(use (match_operand:DI 0 "gpc_reg_operand" ""))]
+  ""
+  "
+{
+  if (TARGET_POWERPC64)
+emit_insn (gen_get_timebase_ppc64 (operands[0]));
+  else if (TARGET_POWERPC)
+emit_insn (gen_get_timebase_ppc32 (operands[0]));
+  else
+FAIL;
+  DONE;
+}")


TARGET_POWERPC is always true.


OK.


+(define_insn "get_timebase_ppc32"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+(unspec_volatile:DI [(const_int 0)] UNSPECV_GETTB))
+   (clobber (match_scratch:SI 1 "=r"))]
+  "TARGET_POWERPC && !TARGET_POWERPC64"
+{
+return "mftbu %0\;"
+  "mftb %L0\;"
+  "mftbu %1\;"
+  "cmpw %0,%1\;"
+  "bne- $-16";
+})


This only works for WORDS_BIG_ENDIAN.


Yes.

You should say you clobber CR0 here I think; actually, allow any 
CRn

instead.


Yes.

Does mftb work on all supported assemblers?  The machine 
instruction

is phased out, but some assemblers translate it to mfspr.


According to the Power ISA 2.06 they should translate it to mfspr.


+(define_insn "get_timebase_ppc64"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+(unspec_volatile:DI [(const_int 0)] UNSPECV_GETTB))]
+  "TARGET_POWERPC64"
+{
+return "mfspr %0, 268";
+})


POWER3 needs mftb.


Nice catch!


+int
+main(void)
+{
+  uint64_t t1, t2, t3;
+
+  t1 = __builtin_ppc_get_timebase ();
+  t2 = __builtin_ppc_get_timebase ();
+  t3 = __builtin_ppc_get_timebase ();
+
+  if (t1 != t2 && t1 != t3 && t2 != t3)
+return 0;
+
+  return 1;
+}


On some systems the timebase runs at a rather low frequency, say 
20MHz.
This test will spuriously fail there.  Waste a million CPU 
cycles before

reading TB the second time?


Yes.

Thank you,

--
Tulio Magno



Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC

2012-08-29 Thread Michael Meissner
On Wed, Aug 29, 2012 at 01:56:05PM -0400, Hans-Peter Nilsson wrote:
> On Wed, 29 Aug 2012, Segher Boessenkool wrote:
> > On some systems the timebase runs at a rather low frequency, say 20MHz.
> > This test will spuriously fail there.  Waste a million CPU cycles before
> > reading TB the second time?
> 
> Waste said million cycles portably by calling sched_yield()?
> (Available only on POSIX systems. :)

Well only for a test environment.  You don't want to call sched_yield in the
normal case, since the apps that do this many millions of times need this to be
as a fast as possible.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899



Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC

2012-08-29 Thread Hans-Peter Nilsson
On Wed, 29 Aug 2012, Segher Boessenkool wrote:
> > +++ b/gcc/testsuite/gcc.target/powerpc/ppc-get-timebase.c
> > @@ -0,0 +1,22 @@
> > +/* { dg-do run { target { powerpc*-*-* } } } */
> > +
> > +/* Test if __builtin_ppc_get_timebase() is compatible with the current
> > +   processor and if it's changing between reads.  A read failure might
> > indicate
> > +   a Power ISA or binutils change.  */
> > +
> > +#include 
> > +
> > +int
> > +main(void)
> > +{
> > +  uint64_t t1, t2, t3;
> > +
> > +  t1 = __builtin_ppc_get_timebase ();
> > +  t2 = __builtin_ppc_get_timebase ();
> > +  t3 = __builtin_ppc_get_timebase ();
> > +
> > +  if (t1 != t2 && t1 != t3 && t2 != t3)
> > +return 0;
> > +
> > +  return 1;
> > +}
>
> On some systems the timebase runs at a rather low frequency, say 20MHz.
> This test will spuriously fail there.  Waste a million CPU cycles before
> reading TB the second time?

Waste said million cycles portably by calling sched_yield()?
(Available only on POSIX systems. :)

brgds, H-P



Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC

2012-08-29 Thread Andrew Pinski
On Wed, Aug 29, 2012 at 6:56 AM, Tulio Magno Quites Machado Filho
 wrote:
> Add __builtin_ppc_get_timebase to read the time base register on PowerPC.
> This is required for applications that measure time at high frequencies
> with high precision that can't afford a syscall.
>
> [gcc]
> 2012-08-29 Tulio Magno Quites Machado Filho 
>
> * config/rs6000/rs6000-builtin.def: Add __builtin_ppc_get_timebase.
> * config/rs6000/rs6000.c (rs6000_expand_noop_builtin): New
> function to expand an expression that calls a builtin without
> arguments.
> (rs6000_expand_builtin): Add __builtin_ppc_get_timebase.
> (rs6000_init_builtins): Likewise.
> * config/rs6000/rs6000.md: Likewise.
>
> [gcc/testsuite]
> 2012-08-29 Tulio Magno Quites Machado Filho 
>
> * gcc.target/powerpc/ppc-get-timebase.c: New file.
> ---
>  gcc/config/rs6000/rs6000-builtin.def   |3 ++
>  gcc/config/rs6000/rs6000.c |   31 +
>  gcc/config/rs6000/rs6000.md|   36 
> 
>  .../gcc.target/powerpc/ppc-get-timebase.c  |   22 
>  4 files changed, 92 insertions(+), 0 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/ppc-get-timebase.c
>
> diff --git a/gcc/config/rs6000/rs6000-builtin.def 
> b/gcc/config/rs6000/rs6000-builtin.def
> index c8f8f86..75ad184 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -1429,6 +1429,9 @@ BU_SPECIAL_X (RS6000_BUILTIN_RSQRT, "__builtin_rsqrt", 
> RS6000_BTM_FRSQRTE,
>  BU_SPECIAL_X (RS6000_BUILTIN_RSQRTF, "__builtin_rsqrtf", RS6000_BTM_FRSQRTES,
>   RS6000_BTC_FP)
>
> +BU_SPECIAL_X (RS6000_BUILTIN_GET_TB, "__builtin_ppc_get_timebase",
> +RS6000_BTM_POWERPC, RS6000_BTC_MISC)
> +
>  /* Darwin CfString builtin.  */
>  BU_SPECIAL_X (RS6000_BUILTIN_CFSTRING, "__builtin_cfstring", 
> RS6000_BTM_ALWAYS,
>   RS6000_BTC_MISC)
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 6c58307..24e274d 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -9747,6 +9747,30 @@ rs6000_overloaded_builtin_p (enum rs6000_builtins 
> fncode)
>return (rs6000_builtin_info[(int)fncode].attr & RS6000_BTC_OVERLOADED) != 
> 0;
>  }
>
> +/* Expand an expression EXP that calls a builtin without arguments.  */
> +static rtx
> +rs6000_expand_noop_builtin (enum insn_code icode, rtx target)
> +{
> +  rtx pat;
> +  enum machine_mode tmode = insn_data[icode].operand[0].mode;
> +
> +  if (icode == CODE_FOR_nothing)
> +/* Builtin not supported on this processor.  */
> +return 0;
> +
> +  if (target == 0
> +  || GET_MODE (target) != tmode
> +  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
> +target = gen_reg_rtx (tmode);
> +
> +  pat = GEN_FCN (icode) (target);
> +  if (! pat)
> +return 0;
> +  emit_insn (pat);
> +
> +  return target;
> +}
> +
>
>  static rtx
>  rs6000_expand_unop_builtin (enum insn_code icode, tree exp, rtx target)
> @@ -11336,6 +11360,9 @@ rs6000_expand_builtin (tree exp, rtx target, rtx 
> subtarget ATTRIBUTE_UNUSED,
>? CODE_FOR_bpermd_di
>: CODE_FOR_bpermd_si), exp, 
> target);
>
> +case RS6000_BUILTIN_GET_TB:
> +  return rs6000_expand_noop_builtin (CODE_FOR_get_timebase, target);
> +
>  case ALTIVEC_BUILTIN_MASK_FOR_LOAD:
>  case ALTIVEC_BUILTIN_MASK_FOR_STORE:
>{
> @@ -11620,6 +11647,10 @@ rs6000_init_builtins (void)
>  POWER7_BUILTIN_BPERMD, "__builtin_bpermd");
>def_builtin ("__builtin_bpermd", ftype, POWER7_BUILTIN_BPERMD);
>
> +  ftype = build_function_type_list (unsigned_intDI_type_node,
> +   NULL_TREE);
> +  def_builtin ("__builtin_ppc_get_timebase", ftype, RS6000_BUILTIN_GET_TB);
> +
>  #if TARGET_XCOFF
>/* AIX libm provides clog as __clog.  */
>if ((tdecl = builtin_decl_explicit (BUILT_IN_CLOG)) != NULL_TREE)
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index d5ffd81..09bdd80 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -136,6 +136,7 @@
> UNSPECV_PROBE_STACK_RANGE   ; probe range of stack addresses
> UNSPECV_EH_RR   ; eh_reg_restore
> UNSPECV_ISYNC   ; isync instruction
> +   UNSPECV_GETTB   ; get timebase built-in
>])
>
>
> @@ -14101,6 +14102,41 @@
>""
>"")
>
> +(define_expand "get_timebase"
> +  [(use (match_operand:DI 0 "gpc_reg_operand" ""))]
> +  ""
> +  "
> +{
> +  if (TARGET_POWERPC64)
> +emit_insn (gen_get_timebase_ppc64 (operands[0]));
> +  else if (TARGET_POWERPC)
> +emit_insn (gen_get_timebase_ppc32 (operands[0]));
> +  else
> +FAIL;
> +  DONE;
> +}")
> +
> +(define_insn "get_timebase_ppc32"
> +  [(set (match_operand:DI 0 "gp

Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC

2012-08-29 Thread Segher Boessenkool

Hi Tulio,

Add __builtin_ppc_get_timebase to read the time base register on  
PowerPC.
This is required for applications that measure time at high  
frequencies

with high precision that can't afford a syscall.


For things that do mftb with high frequency, maybe you should also add a
builtin that does just an mftb, i.e. returns a 32-bit result on 32-bit
implementations.

Please add documentation for the new builtin(s).


--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1429,6 +1429,9 @@ BU_SPECIAL_X (RS6000_BUILTIN_RSQRT,  
"__builtin_rsqrt", RS6000_BTM_FRSQRTE,
 BU_SPECIAL_X (RS6000_BUILTIN_RSQRTF, "__builtin_rsqrtf",  
RS6000_BTM_FRSQRTES,

  RS6000_BTC_FP)

+BU_SPECIAL_X (RS6000_BUILTIN_GET_TB, "__builtin_ppc_get_timebase",
+RS6000_BTM_POWERPC, RS6000_BTC_MISC)


RS6000_BTM_POWERPC does not exist anymore.  RS6000_BTM_ALWAYS?

+/* Expand an expression EXP that calls a builtin without  
arguments.  */

+static rtx
+rs6000_expand_noop_builtin (enum insn_code icode, rtx target)


"noop" gives the wrong idea, "zeroop" perhaps?


+(define_expand "get_timebase"


You should probably prefix this with powerpc_ or rs6000_ as well.
The existing code is not very consistent in this.


+  [(use (match_operand:DI 0 "gpc_reg_operand" ""))]
+  ""
+  "
+{
+  if (TARGET_POWERPC64)
+emit_insn (gen_get_timebase_ppc64 (operands[0]));
+  else if (TARGET_POWERPC)
+emit_insn (gen_get_timebase_ppc32 (operands[0]));
+  else
+FAIL;
+  DONE;
+}")


TARGET_POWERPC is always true.


+(define_insn "get_timebase_ppc32"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+(unspec_volatile:DI [(const_int 0)] UNSPECV_GETTB))
+   (clobber (match_scratch:SI 1 "=r"))]
+  "TARGET_POWERPC && !TARGET_POWERPC64"
+{
+return "mftbu %0\;"
+  "mftb %L0\;"
+  "mftbu %1\;"
+  "cmpw %0,%1\;"
+  "bne- $-16";
+})


This only works for WORDS_BIG_ENDIAN.

You should say you clobber CR0 here I think; actually, allow any CRn
instead.

Does mftb work on all supported assemblers?  The machine instruction
is phased out, but some assemblers translate it to mfspr.


+(define_insn "get_timebase_ppc64"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+(unspec_volatile:DI [(const_int 0)] UNSPECV_GETTB))]
+  "TARGET_POWERPC64"
+{
+return "mfspr %0, 268";
+})


POWER3 needs mftb.


--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-get-timebase.c
@@ -0,0 +1,22 @@
+/* { dg-do run { target { powerpc*-*-* } } } */
+
+/* Test if __builtin_ppc_get_timebase() is compatible with the  
current
+   processor and if it's changing between reads.  A read failure  
might indicate

+   a Power ISA or binutils change.  */
+
+#include 
+
+int
+main(void)
+{
+  uint64_t t1, t2, t3;
+
+  t1 = __builtin_ppc_get_timebase ();
+  t2 = __builtin_ppc_get_timebase ();
+  t3 = __builtin_ppc_get_timebase ();
+
+  if (t1 != t2 && t1 != t3 && t2 != t3)
+return 0;
+
+  return 1;
+}


On some systems the timebase runs at a rather low frequency, say 20MHz.
This test will spuriously fail there.  Waste a million CPU cycles before
reading TB the second time?


Segher