Re: [PATCH 1/3] Add rtx costs for sse integer ops

2012-06-27 Thread Igor Zamyatin
On Wed, Jun 27, 2012 at 9:14 PM, Richard Henderson  wrote:
> On 06/27/2012 10:08 AM, Igor Zamyatin wrote:
>> On Wed, Jun 27, 2012 at 9:00 PM, Richard Henderson  wrote:
>>> > On 06/27/2012 09:07 AM, Igor Zamyatin wrote:
>>> >> May I ask about the purpose of the following part of the change? Doesn't
>>> >> it affect non-SSE cases as well?
>>> >
>>> > Err, no, it doesn't affect non-sse cases.  All MODE_VECTOR_INT
>>> > cases will be implemented in the xmm registers (ignoring the
>>> > deprecated and largely ignored mmx case).
>> I'm probably misunderstanding something, but this condition - GET_MODE_SIZE
>> (mode) < UNITS_PER_WORD - is outside the check for MODE_VECTOR_INT.
>
> Of course.
>
> We currently have
>
>        if (vector mode)
>        else if (mode <= word size)
>        else /* scalar mode > word size */

Sure, that's clear. So am I right that, for the second possibility, we now
run the code that was executed for the third possibility before the patch
(it was under !TARGET_64BIT && GET_MODE (XEXP (x, 0)) == DImode)? If so, why?

>
> We could no doubt legitimately rearrange this to
>
>        if (mode < word size)
>        else if (vector mode)
>        else /* scalar mode > word size */
>
> but I don't see how that's any clearer.  We certainly
> can't eliminate any tests, since there are in fact
> three different possibilities.
>
>
>
> r~


Re: [PATCH 1/3] Add rtx costs for sse integer ops

2012-06-27 Thread Richard Henderson
On 06/27/2012 10:08 AM, Igor Zamyatin wrote:
> On Wed, Jun 27, 2012 at 9:00 PM, Richard Henderson  wrote:
>> > On 06/27/2012 09:07 AM, Igor Zamyatin wrote:
>>> >> May I ask about the purpose of the following part of the change? Doesn't
>>> >> it affect non-SSE cases as well?
>> >
>> > Err, no, it doesn't affect non-sse cases.  All MODE_VECTOR_INT
>> > cases will be implemented in the xmm registers (ignoring the
>> > deprecated and largely ignored mmx case).
> I'm probably misunderstanding something, but this condition - GET_MODE_SIZE
> (mode) < UNITS_PER_WORD - is outside the check for MODE_VECTOR_INT.

Of course.

We currently have

if (vector mode)
else if (mode <= word size)
else /* scalar mode > word size */

We could no doubt legitimately rearrange this to

if (mode < word size)
else if (vector mode)
else /* scalar mode > word size */

but I don't see how that's any clearer.  We certainly
can't eliminate any tests, since there are in fact
three different possibilities.



r~


Re: [PATCH 1/3] Add rtx costs for sse integer ops

2012-06-27 Thread Igor Zamyatin
On Wed, Jun 27, 2012 at 9:00 PM, Richard Henderson  wrote:
> On 06/27/2012 09:07 AM, Igor Zamyatin wrote:
>> May I ask about the purpose of the following part of the change? Doesn't
>> it affect non-SSE cases as well?
>
> Err, no, it doesn't affect non-sse cases.  All MODE_VECTOR_INT
> cases will be implemented in the xmm registers (ignoring the
> deprecated and largely ignored mmx case).

I'm probably misunderstanding something, but this condition - GET_MODE_SIZE
(mode) < UNITS_PER_WORD - is outside the check for MODE_VECTOR_INT.

>
>
>>
>> @@ -32038,7 +32042,15 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
>>     case ASHIFTRT:
>>     case LSHIFTRT:
>>     case ROTATERT:
>> -      if (!TARGET_64BIT && GET_MODE (XEXP (x, 0)) == DImode)
>> +      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
>> +       {
>> +         /* ??? Should be SSE vector operation cost.  */
>> +         /* At least for published AMD latencies, this really is the same
>> +            as the latency for a simple fpu operation like fabs.  */
>> +         *total = cost->fabs;
>> +         return false;
>> +       }
>> +      if (GET_MODE_SIZE (mode) < UNITS_PER_WORD)
>>        {
>>          if (CONST_INT_P (XEXP (x, 1)))
>>            {
>>
>> It also seems that we reversed the condition for the code that is now
>> under if (GET_MODE_SIZE (mode) < UNITS_PER_WORD). Why do we need this?
>
> I'm not sure what you're suggesting.  But we certainly don't use
> the xmm registers to implement DImode operations in 32-bit, so...
>
>
> r~


Re: [PATCH 1/3] Add rtx costs for sse integer ops

2012-06-27 Thread Richard Henderson
On 06/27/2012 09:07 AM, Igor Zamyatin wrote:
> May I ask about the purpose of the following part of the change? Doesn't
> it affect non-SSE cases as well?

Err, no, it doesn't affect non-sse cases.  All MODE_VECTOR_INT
cases will be implemented in the xmm registers (ignoring the
deprecated and largely ignored mmx case).


> 
> @@ -32038,7 +32042,15 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
>     case ASHIFTRT:
>     case LSHIFTRT:
>     case ROTATERT:
> -      if (!TARGET_64BIT && GET_MODE (XEXP (x, 0)) == DImode)
> +      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> +       {
> +         /* ??? Should be SSE vector operation cost.  */
> +         /* At least for published AMD latencies, this really is the same
> +            as the latency for a simple fpu operation like fabs.  */
> +         *total = cost->fabs;
> +         return false;
> +       }
> +      if (GET_MODE_SIZE (mode) < UNITS_PER_WORD)
>        {
>          if (CONST_INT_P (XEXP (x, 1)))
>            {
> 
> It also seems that we reversed the condition for the code that is now
> under if (GET_MODE_SIZE (mode) < UNITS_PER_WORD). Why do we need this?

I'm not sure what you're suggesting.  But we certainly don't use
the xmm registers to implement DImode operations in 32-bit, so...


r~


Re: [PATCH 1/3] Add rtx costs for sse integer ops

2012-06-27 Thread Igor Zamyatin
May I ask about the purpose of the following part of the change? Doesn't
it affect non-SSE cases as well?

@@ -32038,7 +32042,15 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
     case ASHIFTRT:
     case LSHIFTRT:
     case ROTATERT:
-      if (!TARGET_64BIT && GET_MODE (XEXP (x, 0)) == DImode)
+      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+       {
+         /* ??? Should be SSE vector operation cost.  */
+         /* At least for published AMD latencies, this really is the same
+            as the latency for a simple fpu operation like fabs.  */
+         *total = cost->fabs;
+         return false;
+       }
+      if (GET_MODE_SIZE (mode) < UNITS_PER_WORD)
       {
         if (CONST_INT_P (XEXP (x, 1)))
           {

It also seems that we reversed the condition for the code that is now
under if (GET_MODE_SIZE (mode) < UNITS_PER_WORD). Why do we need this?


Thanks,
Igor

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org
[mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Richard Henderson
Sent: Saturday, June 16, 2012 12:57 AM
To: gcc-patches@gcc.gnu.org
Cc: rguent...@suse.de; ubiz...@gmail.com; hjl.to...@gmail.com
Subject: [PATCH 1/3] Add rtx costs for sse integer ops

---
 gcc/config/i386/i386.c |   50 ++-
 1 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e2f5740..578a756 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -31990,13 +31990,16 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
            break;
          case 0:
          case -1:
-           /* Start with (MEM (SYMBOL_REF)), since that's where
-              it'll probably end up.  Add a penalty for size.  */
-           *total = (COSTS_N_INSNS (1)
-                     + (flag_pic != 0 && !TARGET_64BIT)
-                     + (mode == SFmode ? 0 : mode == DFmode ? 1 : 2));
            break;
          }
+      /* FALLTHRU */
+
+    case CONST_VECTOR:
+      /* Start with (MEM (SYMBOL_REF)), since that's where
+        it'll probably end up.  Add a penalty for size.  */
+      *total = (COSTS_N_INSNS (1)
+               + (flag_pic != 0 && !TARGET_64BIT)
+               + (mode == SFmode ? 0 : mode == DFmode ? 1 : 2));
       return true;

     case ZERO_EXTEND:
@@ -32016,8 +32019,9 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
       return false;

     case ASHIFT:
-      if (CONST_INT_P (XEXP (x, 1))
-         && (GET_MODE (XEXP (x, 0)) != DImode || TARGET_64BIT))
+      if (SCALAR_INT_MODE_P (mode)
+         && GET_MODE_SIZE (mode) < UNITS_PER_WORD
+         && CONST_INT_P (XEXP (x, 1)))
        {
          HOST_WIDE_INT value = INTVAL (XEXP (x, 1));
          if (value == 1)
@@ -32038,7 +32042,15 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
     case ASHIFTRT:
     case LSHIFTRT:
     case ROTATERT:
-      if (!TARGET_64BIT && GET_MODE (XEXP (x, 0)) == DImode)
+      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+       {
+         /* ??? Should be SSE vector operation cost.  */
+         /* At least for published AMD latencies, this really is the same
+            as the latency for a simple fpu operation like fabs.  */
+         *total = cost->fabs;
+         return false;
+       }
+      if (GET_MODE_SIZE (mode) < UNITS_PER_WORD)
        {
          if (CONST_INT_P (XEXP (x, 1)))
            {
@@ -32107,6 +32119,16 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
          *total = cost->fmul;
          return false;
        }
+      else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+       {
+         /* Without sse4.1, we don't have PMULLD; it's emulated with 7
+            insns, including two PMULUDQ.  */
+         if (mode == V4SImode && !(TARGET_SSE4_1 || TARGET_AVX))
+           *total = cost->fmul * 2 + cost->fabs * 5;
+         else
+           *total = cost->fmul;
+         return false;
+       }
      else
        {
          rtx op0 = XEXP (x, 0);
@@ -32171,7 +32193,7 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,

     case PLUS:
      if (GET_MODE_CLASS (mode) == MODE_INT
-         && GET_MODE_BITSIZE (mode) <= GET_MODE_BITSIZE (Pmode))
+         && GET_MODE_SIZE (mode) <= UNITS_PER_WORD)
       {
         if (GET_CODE (XEXP (x, 0)) == PLUS
             && GET_CODE (XEXP (XEXP (x, 0), 0)) == MULT
@@ -32271,6 +32293,14 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
      /* FALLTHRU */

    case NOT:
+      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+       {
+         /* ??? Should be SSE vector operation cost.  */
+         /* At least for published AMD latencies, this really is the same
+            as the latency for a simple fpu operation like fabs.  */
[PATCH 1/3] Add rtx costs for sse integer ops

2012-06-15 Thread Richard Henderson
---
 gcc/config/i386/i386.c |   50 ++-
 1 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e2f5740..578a756 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -31990,13 +31990,16 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
            break;
          case 0:
          case -1:
-           /* Start with (MEM (SYMBOL_REF)), since that's where
-              it'll probably end up.  Add a penalty for size.  */
-           *total = (COSTS_N_INSNS (1)
-                     + (flag_pic != 0 && !TARGET_64BIT)
-                     + (mode == SFmode ? 0 : mode == DFmode ? 1 : 2));
            break;
          }
+      /* FALLTHRU */
+
+    case CONST_VECTOR:
+      /* Start with (MEM (SYMBOL_REF)), since that's where
+        it'll probably end up.  Add a penalty for size.  */
+      *total = (COSTS_N_INSNS (1)
+               + (flag_pic != 0 && !TARGET_64BIT)
+               + (mode == SFmode ? 0 : mode == DFmode ? 1 : 2));
       return true;

     case ZERO_EXTEND:
@@ -32016,8 +32019,9 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
       return false;

     case ASHIFT:
-      if (CONST_INT_P (XEXP (x, 1))
-         && (GET_MODE (XEXP (x, 0)) != DImode || TARGET_64BIT))
+      if (SCALAR_INT_MODE_P (mode)
+         && GET_MODE_SIZE (mode) < UNITS_PER_WORD
+         && CONST_INT_P (XEXP (x, 1)))
        {
          HOST_WIDE_INT value = INTVAL (XEXP (x, 1));
          if (value == 1)
@@ -32038,7 +32042,15 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
     case ASHIFTRT:
     case LSHIFTRT:
     case ROTATERT:
-      if (!TARGET_64BIT && GET_MODE (XEXP (x, 0)) == DImode)
+      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+       {
+         /* ??? Should be SSE vector operation cost.  */
+         /* At least for published AMD latencies, this really is the same
+            as the latency for a simple fpu operation like fabs.  */
+         *total = cost->fabs;
+         return false;
+       }
+      if (GET_MODE_SIZE (mode) < UNITS_PER_WORD)
        {
          if (CONST_INT_P (XEXP (x, 1)))
            {
@@ -32107,6 +32119,16 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
          *total = cost->fmul;
          return false;
        }
+      else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+       {
+         /* Without sse4.1, we don't have PMULLD; it's emulated with 7
+            insns, including two PMULUDQ.  */
+         if (mode == V4SImode && !(TARGET_SSE4_1 || TARGET_AVX))
+           *total = cost->fmul * 2 + cost->fabs * 5;
+         else
+           *total = cost->fmul;
+         return false;
+       }
      else
        {
          rtx op0 = XEXP (x, 0);
@@ -32171,7 +32193,7 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,

     case PLUS:
       if (GET_MODE_CLASS (mode) == MODE_INT
-         && GET_MODE_BITSIZE (mode) <= GET_MODE_BITSIZE (Pmode))
+         && GET_MODE_SIZE (mode) <= UNITS_PER_WORD)
        {
          if (GET_CODE (XEXP (x, 0)) == PLUS
             && GET_CODE (XEXP (XEXP (x, 0), 0)) == MULT
@@ -32271,6 +32293,14 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
       /* FALLTHRU */

     case NOT:
+      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+       {
+         /* ??? Should be SSE vector operation cost.  */
+         /* At least for published AMD latencies, this really is the same
+            as the latency for a simple fpu operation like fabs.  */
+         *total = cost->fabs;
+         return false;
+       }
       if (!TARGET_64BIT && mode == DImode)
        *total = cost->add * 2;
       else
@@ -32331,7 +32361,7 @@ ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
       /* ??? Assume all of these vector manipulation patterns are
         recognizable.  In which case they all pretty much have the
         same cost.  */
-      *total = COSTS_N_INSNS (1);
+      *total = cost->fabs;
       return true;

     default:
-- 
1.7.7.6