Re: [Mesa-dev] [PATCH v3] R600/SI: Add pattern for AMDGPUurecip

2013-04-20 Thread Christian König


On 20.04.2013 06:06, Tom Stellard wrote:

On Thu, Apr 11, 2013 at 10:12:01AM +0200, Christian König wrote:

On 10.04.2013 18:50, Tom Stellard wrote:

On Wed, Apr 10, 2013 at 05:59:48PM +0200, Michel Dänzer wrote:

[SNIP]

We should start using the updated pattern syntax for all new patterns.
This means replacing register classes with types for the input patterns
and omitting the type in the output pattern:

def : Pat <
  (AMDGPUurecip i32:$src0),
  (V_CVT_U32_F32_e32
    (V_MUL_F32_e32 CONST.FP_UINT_MAX_PLUS_1,
                   (V_RCP_IFLAG_F32_e32 (V_CVT_F32_U32_e32 $src0))))
>;

With that change:

Reviewed-by: Tom Stellard 

BTW: I created the attached patches two weeks ago. They rework most
of the existing patterns on SI to use the new format, but I
currently don't have time to rebase, test & commit them. They
shouldn't change anything in functionality, so if you guys think
they are ok then please review and commit them.


Thanks for doing this.  I've thrown these patches into a branch along
with changes to the R600 patterns.  I will try to test them next week.
Is there any reason why we can't squash all these patches together before
we commit?


No, not really. I usually just split patches up to test each one
individually, so feel free to squash-merge them for commit.


Christian.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v3] R600/SI: Add pattern for AMDGPUurecip

2013-04-19 Thread Tom Stellard

On Thu, Apr 11, 2013 at 10:12:01AM +0200, Christian König wrote:
> On 10.04.2013 18:50, Tom Stellard wrote:
> >On Wed, Apr 10, 2013 at 05:59:48PM +0200, Michel Dänzer wrote:
> >>[SNIP]
> >We should start using the updated pattern syntax for all new patterns.
> >This means replacing register classes with types for the input patterns
> >and omitting the type in the output pattern:
> >
> >def : Pat <
> >  (AMDGPUurecip i32:$src0),
> >  (V_CVT_U32_F32_e32
> >    (V_MUL_F32_e32 CONST.FP_UINT_MAX_PLUS_1,
> >                   (V_RCP_IFLAG_F32_e32 (V_CVT_F32_U32_e32 $src0))))
> >>;
> >
> >With that change:
> >
> >Reviewed-by: Tom Stellard 
> 
> BTW: I created the attached patches two weeks ago. They rework most
> of the existing patterns on SI to use the new format, but I
> currently don't have time to rebase, test & commit them. They
> shouldn't change anything in functionality, so if you guys think
> they are ok then please review and commit them.
> 

Thanks for doing this.  I've thrown these patches into a branch along
with changes to the R600 patterns.  I will try to test them next week.
Is there any reason why we can't squash all these patches together before
we commit?

-Tom


> Thanks,
> Christian.

> From f0175c616db5f6d3f1024137edbd8773c118f7dc Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Christian=20K=C3=B6nig?= 
> Date: Thu, 28 Mar 2013 12:50:55 +0100
> Subject: [PATCH 1/9] R600/SI: remove nonsense select pattern
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Fortunately this pattern never matched, otherwise
> we would have generated incorrect code.
> 
> Signed-off-by: Christian König
> ---
>  lib/Target/R600/SIInstructions.td |9 +
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/lib/Target/R600/SIInstructions.td 
> b/lib/Target/R600/SIInstructions.td
> index eb410d7..e37003e 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -1019,18 +1019,11 @@ def S_MAX_U32 : SOP2_32 <0x0009, "S_MAX_U32", []>;
>  def S_CSELECT_B32 : SOP2 <
>0x000a, (outs SReg_32:$dst),
>(ins SReg_32:$src0, SReg_32:$src1, SCCReg:$scc), "S_CSELECT_B32",
> -  [(set (i32 SReg_32:$dst), (select (i1 SCCReg:$scc),
> - SReg_32:$src0, SReg_32:$src1))]
> +  []
>  >;
>  
>  def S_CSELECT_B64 : SOP2_64 <0x000b, "S_CSELECT_B64", []>;
>  
> -// f32 pattern for S_CSELECT_B32
> -def : Pat <
> -  (f32 (select (i1 SCCReg:$scc), SReg_32:$src0, SReg_32:$src1)),
> -  (S_CSELECT_B32 SReg_32:$src0, SReg_32:$src1, SCCReg:$scc)
> ->;
> -
>  def S_AND_B32 : SOP2_32 <0x000e, "S_AND_B32", []>;
>  
>  def S_AND_B64 : SOP2_64 <0x000f, "S_AND_B64",
> -- 
> 1.7.10.4
> 

> From 7a2c0f084fa9ac949084a2c719d9944dd680a866 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Christian=20K=C3=B6nig?= 
> Date: Thu, 28 Mar 2013 11:18:00 +0100
> Subject: [PATCH 2/9] R600/SI: start reworking patterns
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> We don't need register classes in patterns any longer.
> Let's start with the indirect addressing patterns.
> 
> Signed-off-by: Christian König
> ---
>  lib/Target/R600/SIInstructions.td |   36 ++--
>  1 file changed, 14 insertions(+), 22 deletions(-)
> 
> diff --git a/lib/Target/R600/SIInstructions.td 
> b/lib/Target/R600/SIInstructions.td
> index e37003e..6ee3923 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -1542,45 +1542,37 @@ defm : SMRD_Pattern  S_LOAD_DWORDX8_SGPR, v32i8>;
>  /**   Indirect adressing   **/
>  /** == **/
>  
> -multiclass SI_INDIRECT_Pattern  -SI_INDIRECT_DST IndDst> {
> +multiclass SI_INDIRECT_Pattern  {
> +
>// 1. Extract with offset
>def : Pat<
> -(vector_extract (vt rc:$vec),
> -  (i64 (zext (i32 (add VReg_32:$idx, imm:$off
> -),
> -(f32 (SI_INDIRECT_SRC (IMPLICIT_DEF), rc:$vec, VReg_32:$idx, imm:$off))
> +(vector_extract vt:$vec, (i64 (zext (add i32:$idx, imm:$off,
> +(f32 (SI_INDIRECT_SRC (IMPLICIT_DEF), vt:$vec, VReg_32:$idx, imm:$off))
>>;
>  
>// 2. Extract without offset
>def : Pat<
> -(vector_extract (vt rc:$vec),
> -  (i64 (zext (i32 VReg_32:$idx)))
> -),
> -(f32 (SI_INDIRECT_SRC (IMPLICIT_DEF), rc:$vec, VReg_32:$idx, 0))
> +(vector_extract vt:$vec, (i64 (zext i32:$idx))),
> +(f32 (SI_INDIRECT_SRC (IMPLICIT_DEF), vt:$vec, i32:$idx, 0))
>>;
>  
>// 3. Insert with offset
>def : Pat<
> -(vector_insert (vt rc:$vec), (f32 VReg_32:$val),
> -  (i64 (zext (i32 (add VReg_32:$idx, imm:$off
> -),
> -(vt (IndDst (IMPLICIT_DEF), rc:$vec, VReg_32:$idx, imm:$off, 
> VReg_32:$val))
> +(vector_insert vt:$vec, f32:$val, (i64 (zext (add i32:$idx, imm:$off,
> +(IndDst (IMPLICIT_DEF), vt:$vec, i32:$idx,

Re: [Mesa-dev] [PATCH v3] R600/SI: Add pattern for AMDGPUurecip

2013-04-11 Thread Christian König


On 10.04.2013 18:50, Tom Stellard wrote:

On Wed, Apr 10, 2013 at 05:59:48PM +0200, Michel Dänzer wrote:

[SNIP]

We should start using the updated pattern syntax for all new patterns.
This means replacing register classes with types for the input patterns
and omitting the type in the output pattern:

def : Pat <
  (AMDGPUurecip i32:$src0),
  (V_CVT_U32_F32_e32
    (V_MUL_F32_e32 CONST.FP_UINT_MAX_PLUS_1,
                   (V_RCP_IFLAG_F32_e32 (V_CVT_F32_U32_e32 $src0))))
>;

With that change:

Reviewed-by: Tom Stellard 


BTW: I created the attached patches two weeks ago. They rework most of 
the existing patterns on SI to use the new format, but I currently don't 
have time to rebase, test & commit them. They shouldn't change anything 
in functionality, so if you guys think they are ok then please review 
and commit them.


Thanks,
Christian.
From f0175c616db5f6d3f1024137edbd8773c118f7dc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= 
Date: Thu, 28 Mar 2013 12:50:55 +0100
Subject: [PATCH 1/9] R600/SI: remove nonsense select pattern
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fortunately this pattern never matched, otherwise
we would have generated incorrect code.

Signed-off-by: Christian König
---
 lib/Target/R600/SIInstructions.td |9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index eb410d7..e37003e 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1019,18 +1019,11 @@ def S_MAX_U32 : SOP2_32 <0x0009, "S_MAX_U32", []>;
 def S_CSELECT_B32 : SOP2 <
   0x000a, (outs SReg_32:$dst),
   (ins SReg_32:$src0, SReg_32:$src1, SCCReg:$scc), "S_CSELECT_B32",
-  [(set (i32 SReg_32:$dst), (select (i1 SCCReg:$scc),
- SReg_32:$src0, SReg_32:$src1))]
+  []
 >;
 
 def S_CSELECT_B64 : SOP2_64 <0x000b, "S_CSELECT_B64", []>;
 
-// f32 pattern for S_CSELECT_B32
-def : Pat <
-  (f32 (select (i1 SCCReg:$scc), SReg_32:$src0, SReg_32:$src1)),
-  (S_CSELECT_B32 SReg_32:$src0, SReg_32:$src1, SCCReg:$scc)
->;
-
 def S_AND_B32 : SOP2_32 <0x000e, "S_AND_B32", []>;
 
 def S_AND_B64 : SOP2_64 <0x000f, "S_AND_B64",
-- 
1.7.10.4

From 7a2c0f084fa9ac949084a2c719d9944dd680a866 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= 
Date: Thu, 28 Mar 2013 11:18:00 +0100
Subject: [PATCH 2/9] R600/SI: start reworking patterns
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We don't need register classes in patterns any longer.
Let's start with the indirect addressing patterns.

Signed-off-by: Christian König
---
 lib/Target/R600/SIInstructions.td |   36 ++--
 1 file changed, 14 insertions(+), 22 deletions(-)

diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index e37003e..6ee3923 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1542,45 +1542,37 @@ defm : SMRD_Pattern ;
 /**   Indirect adressing   **/
 /** == **/
 
-multiclass SI_INDIRECT_Pattern  {
+multiclass SI_INDIRECT_Pattern  {
+
   // 1. Extract with offset
   def : Pat<
-(vector_extract (vt rc:$vec),
-  (i64 (zext (i32 (add VReg_32:$idx, imm:$off
-),
-(f32 (SI_INDIRECT_SRC (IMPLICIT_DEF), rc:$vec, VReg_32:$idx, imm:$off))
+(vector_extract vt:$vec, (i64 (zext (add i32:$idx, imm:$off,
+(f32 (SI_INDIRECT_SRC (IMPLICIT_DEF), vt:$vec, VReg_32:$idx, imm:$off))
   >;
 
   // 2. Extract without offset
   def : Pat<
-(vector_extract (vt rc:$vec),
-  (i64 (zext (i32 VReg_32:$idx)))
-),
-(f32 (SI_INDIRECT_SRC (IMPLICIT_DEF), rc:$vec, VReg_32:$idx, 0))
+(vector_extract vt:$vec, (i64 (zext i32:$idx))),
+(f32 (SI_INDIRECT_SRC (IMPLICIT_DEF), vt:$vec, i32:$idx, 0))
   >;
 
   // 3. Insert with offset
   def : Pat<
-(vector_insert (vt rc:$vec), (f32 VReg_32:$val),
-  (i64 (zext (i32 (add VReg_32:$idx, imm:$off
-),
-(vt (IndDst (IMPLICIT_DEF), rc:$vec, VReg_32:$idx, imm:$off, VReg_32:$val))
+(vector_insert vt:$vec, f32:$val, (i64 (zext (add i32:$idx, imm:$off,
+(IndDst (IMPLICIT_DEF), vt:$vec, i32:$idx, imm:$off, f32:$val)
   >;
 
   // 4. Insert without offset
   def : Pat<
-(vector_insert (vt rc:$vec), (f32 VReg_32:$val),
-  (i64 (zext (i32 VReg_32:$idx)))
-),
-(vt (IndDst (IMPLICIT_DEF), rc:$vec, VReg_32:$idx, 0, VReg_32:$val))
+(vector_insert vt:$vec, f32:$val, (i64 (zext i32:$idx))),
+(IndDst (IMPLICIT_DEF), vt:$vec, i32:$idx, 0, f32:$val)
   >;
 }
 
-defm : SI_INDIRECT_Pattern ;
-defm : SI_INDIRECT_Pattern ;
-defm : SI_INDIRECT_Pattern ;
-defm : SI_INDIRECT_Pattern ;
+defm : SI_INDIRECT_Pattern ;
+defm : SI_INDIRECT_Pattern ;
+defm : SI_INDIRECT_Pattern ;
+defm : SI_INDIRECT_Pattern ;
 
 /** ===

Re: [Mesa-dev] [PATCH v3] R600/SI: Add pattern for AMDGPUurecip

2013-04-10 Thread Tom Stellard

On Wed, Apr 10, 2013 at 05:59:48PM +0200, Michel Dänzer wrote:
> From: Michel Dänzer 
> 
> 21 more little piglits with radeonsi.
> 
> Signed-off-by: Michel Dänzer 
> ---
> 
> v3: Use constant for and add comments about scaling multiplications
> 
>  lib/Target/R600/AMDGPUInstructions.td |  1 +
>  lib/Target/R600/R600Instructions.td   |  3 ++-
>  lib/Target/R600/SIInstructions.td | 12 ++--
>  test/CodeGen/R600/urecip.ll   | 12 
>  4 files changed, 25 insertions(+), 3 deletions(-)
>  create mode 100644 test/CodeGen/R600/urecip.ll
> 
> diff --git a/lib/Target/R600/AMDGPUInstructions.td 
> b/lib/Target/R600/AMDGPUInstructions.td
> index e740348..fa890c1 100644
> --- a/lib/Target/R600/AMDGPUInstructions.td
> +++ b/lib/Target/R600/AMDGPUInstructions.td
> @@ -94,6 +94,7 @@ class Constants {
>  int TWO_PI = 0x40c90fdb;
>  int PI = 0x40490fdb;
>  int TWO_PI_INV = 0x3e22f983;
> +int FP_UINT_MAX_PLUS_1 = 0x4f800000; // 1 << 32 in floating point encoding
>  }
>  def CONST : Constants;
>  
> diff --git a/lib/Target/R600/R600Instructions.td 
> b/lib/Target/R600/R600Instructions.td
> index b4c45e1..8ede6cc 100644
> --- a/lib/Target/R600/R600Instructions.td
> +++ b/lib/Target/R600/R600Instructions.td
> @@ -1923,10 +1923,11 @@ def : COS_PAT ;
>  defm DIV_cm : DIV_Common;
>  
>  // RECIP_UINT emulation for Cayman
> +// The multiplication scales from [0,1] to the unsigned integer range
>  def : Pat <
>(AMDGPUurecip R600_Reg32:$src0),
>(FLT_TO_UINT_eg (MUL_IEEE (RECIP_IEEE_cm (UINT_TO_FLT_eg 
> R600_Reg32:$src0)),
> -(MOV_IMM_I32 0x4f800000)))
> +(MOV_IMM_I32 CONST.FP_UINT_MAX_PLUS_1)))
>  >;
>  
>  
> diff --git a/lib/Target/R600/SIInstructions.td 
> b/lib/Target/R600/SIInstructions.td
> index e2a08fc..0226d5a 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -602,8 +602,8 @@ defm V_READFIRSTLANE_B32 : VOP1_32 <0x0002, 
> "V_READFIRSTLANE_B32", []>;
>  defm V_CVT_F32_I32 : VOP1_32 <0x0005, "V_CVT_F32_I32",
>[(set VReg_32:$dst, (sint_to_fp VSrc_32:$src0))]
>  >;
> -//defm V_CVT_F32_U32 : VOP1_32 <0x0006, "V_CVT_F32_U32", []>;
> -//defm V_CVT_U32_F32 : VOP1_32 <0x0007, "V_CVT_U32_F32", []>;
> +defm V_CVT_F32_U32 : VOP1_32 <0x0006, "V_CVT_F32_U32", []>;
> +defm V_CVT_U32_F32 : VOP1_32 <0x0007, "V_CVT_U32_F32", []>;
>  defm V_CVT_I32_F32 : VOP1_32 <0x0008, "V_CVT_I32_F32",
>[(set (i32 VReg_32:$dst), (fp_to_sint VSrc_32:$src0))]
>  >;
> @@ -1514,6 +1514,14 @@ def : Pat <
>(BUFFER_LOAD_DWORD 0, 1, 0, 0, 0, 0, VReg_32:$voff, SReg_128:$sbase, 0, 0, 
> 0)
>  >;
>  
> +// The multiplication scales from [0,1] to the unsigned integer range
> +def : Pat <
> +  (AMDGPUurecip VSrc_32:$src0),
> +  (V_CVT_U32_F32_e32
> +    (V_MUL_F32_e32 CONST.FP_UINT_MAX_PLUS_1,
> +                   (V_RCP_IFLAG_F32_e32 (V_CVT_F32_U32_e32 VSrc_32:$src0))))
> +>;
> +

We should start using the updated pattern syntax for all new patterns.
This means replacing register classes with types for the input patterns
and omitting the type in the output pattern:

def : Pat <
  (AMDGPUurecip i32:$src0),
  (V_CVT_U32_F32_e32
    (V_MUL_F32_e32 CONST.FP_UINT_MAX_PLUS_1,
                   (V_RCP_IFLAG_F32_e32 (V_CVT_F32_U32_e32 $src0))))
>;

With that change:

Reviewed-by: Tom Stellard 
>  /** == **/
>  /**   VOP3 Patterns**/
>  /** == **/
> diff --git a/test/CodeGen/R600/urecip.ll b/test/CodeGen/R600/urecip.ll
> new file mode 100644
> index 000..dad02dd
> --- /dev/null
> +++ b/test/CodeGen/R600/urecip.ll
> @@ -0,0 +1,12 @@
> +;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck %s
> +
> +;CHECK: V_RCP_IFLAG_F32_e32
> +
> +define void @test(i32 %p, i32 %q) {
> +   %i = udiv i32 %p, %q
> +   %r = bitcast i32 %i to float
> +   call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %r, 
> float %r, float %r, float %r)
> +   ret void
> +}
> +
> +declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, 
> float)
> -- 
> 1.8.2
> 
> ___
> llvm-commits mailing list
> llvm-comm...@cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

[Mesa-dev] [PATCH v3] R600/SI: Add pattern for AMDGPUurecip

2013-04-10 Thread Michel Dänzer

From: Michel Dänzer 

21 more little piglits with radeonsi.

Signed-off-by: Michel Dänzer 
---

v3: Use constant for and add comments about scaling multiplications

 lib/Target/R600/AMDGPUInstructions.td |  1 +
 lib/Target/R600/R600Instructions.td   |  3 ++-
 lib/Target/R600/SIInstructions.td | 12 ++--
 test/CodeGen/R600/urecip.ll   | 12 
 4 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 test/CodeGen/R600/urecip.ll

diff --git a/lib/Target/R600/AMDGPUInstructions.td 
b/lib/Target/R600/AMDGPUInstructions.td
index e740348..fa890c1 100644
--- a/lib/Target/R600/AMDGPUInstructions.td
+++ b/lib/Target/R600/AMDGPUInstructions.td
@@ -94,6 +94,7 @@ class Constants {
 int TWO_PI = 0x40c90fdb;
 int PI = 0x40490fdb;
 int TWO_PI_INV = 0x3e22f983;
+int FP_UINT_MAX_PLUS_1 = 0x4f800000;   // 1 << 32 in floating point encoding
 }
 def CONST : Constants;
 
diff --git a/lib/Target/R600/R600Instructions.td 
b/lib/Target/R600/R600Instructions.td
index b4c45e1..8ede6cc 100644
--- a/lib/Target/R600/R600Instructions.td
+++ b/lib/Target/R600/R600Instructions.td
@@ -1923,10 +1923,11 @@ def : COS_PAT ;
 defm DIV_cm : DIV_Common;
 
 // RECIP_UINT emulation for Cayman
+// The multiplication scales from [0,1] to the unsigned integer range
 def : Pat <
   (AMDGPUurecip R600_Reg32:$src0),
   (FLT_TO_UINT_eg (MUL_IEEE (RECIP_IEEE_cm (UINT_TO_FLT_eg R600_Reg32:$src0)),
-(MOV_IMM_I32 0x4f800000)))
+(MOV_IMM_I32 CONST.FP_UINT_MAX_PLUS_1)))
 >;
 
 
diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index e2a08fc..0226d5a 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -602,8 +602,8 @@ defm V_READFIRSTLANE_B32 : VOP1_32 <0x0002, 
"V_READFIRSTLANE_B32", []>;
 defm V_CVT_F32_I32 : VOP1_32 <0x0005, "V_CVT_F32_I32",
   [(set VReg_32:$dst, (sint_to_fp VSrc_32:$src0))]
 >;
-//defm V_CVT_F32_U32 : VOP1_32 <0x0006, "V_CVT_F32_U32", []>;
-//defm V_CVT_U32_F32 : VOP1_32 <0x0007, "V_CVT_U32_F32", []>;
+defm V_CVT_F32_U32 : VOP1_32 <0x0006, "V_CVT_F32_U32", []>;
+defm V_CVT_U32_F32 : VOP1_32 <0x0007, "V_CVT_U32_F32", []>;
 defm V_CVT_I32_F32 : VOP1_32 <0x0008, "V_CVT_I32_F32",
   [(set (i32 VReg_32:$dst), (fp_to_sint VSrc_32:$src0))]
 >;
@@ -1514,6 +1514,14 @@ def : Pat <
   (BUFFER_LOAD_DWORD 0, 1, 0, 0, 0, 0, VReg_32:$voff, SReg_128:$sbase, 0, 0, 0)
 >;
 
+// The multiplication scales from [0,1] to the unsigned integer range
+def : Pat <
+  (AMDGPUurecip VSrc_32:$src0),
+  (V_CVT_U32_F32_e32
+    (V_MUL_F32_e32 CONST.FP_UINT_MAX_PLUS_1,
+                   (V_RCP_IFLAG_F32_e32 (V_CVT_F32_U32_e32 VSrc_32:$src0))))
+>;
+
 /** == **/
 /**   VOP3 Patterns**/
 /** == **/
diff --git a/test/CodeGen/R600/urecip.ll b/test/CodeGen/R600/urecip.ll
new file mode 100644
index 000..dad02dd
--- /dev/null
+++ b/test/CodeGen/R600/urecip.ll
@@ -0,0 +1,12 @@
+;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck %s
+
+;CHECK: V_RCP_IFLAG_F32_e32
+
+define void @test(i32 %p, i32 %q) {
+   %i = udiv i32 %p, %q
+   %r = bitcast i32 %i to float
+   call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %r, 
float %r, float %r, float %r)
+   ret void
+}
+
+declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, 
float)
-- 
1.8.2
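
For readers following the arithmetic, here is a rough Python sketch of what the
AMDGPUurecip pattern in this patch computes: convert the unsigned input to
float, take the reciprocal, scale by 2^32, and convert back to unsigned. The
helper names (bits_to_f32, round_f32, urecip_sketch) are made up for this
illustration. Two things are assumptions, not statements about the hardware:
the reciprocal is modelled as a correctly rounded single-precision 1/x, and the
final conversion is modelled as truncating and clamping to the 32-bit range.

```python
import struct

# Bit pattern of the constant named FP_UINT_MAX_PLUS_1 in the patch.
FP_UINT_MAX_PLUS_1 = 0x4F800000


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def urecip_sketch(src: int) -> int:
    """Approximate 2^32 / src using the same steps as the pattern."""
    if not 1 <= src <= 0xFFFFFFFF:
        raise ValueError("src must be a non-zero 32-bit unsigned value")
    as_float = round_f32(float(src))                 # V_CVT_F32_U32
    recip = round_f32(1.0 / as_float)                # V_RCP_IFLAG_F32 (assumed exact rounding)
    scale = bits_to_f32(FP_UINT_MAX_PLUS_1)          # 2^32 as a float
    scaled = round_f32(scale * recip)                # V_MUL_F32
    return min(int(scaled), 0xFFFFFFFF)              # V_CVT_U32_F32 (assumed truncate + clamp)


if __name__ == "__main__":
    print(bits_to_f32(FP_UINT_MAX_PLUS_1))
    for value in (1, 2, 3, 1000):
        print(value, urecip_sketch(value))
```

The reciprocal of any input of 1 or more lies in (0, 1], which is why the
multiplication by 2^32 is what "scales from [0,1] to the unsigned integer
range" in the comment added by the patch.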

