Re: [Mesa-dev] R600/SI: Support for local memory and derivatives

2013-07-10 Thread Michel Dänzer
On Fri, 2013-06-28 at 14:37 -0700, Tom Stellard wrote:
> On Wed, Jun 19, 2013 at 06:28:21PM +0200, Michel Dänzer wrote:
> >
> > These patches implement enough of local memory support to allow radeonsi
> > to use that for computing derivatives, as suggested by Tom.
> >
> > They also almost allow test/CodeGen/R600/local-memory.ll to generate
> > code for SI. Right now it still fails because it tries to copy a VGPR to
> > an SGPR, which is not possible.
>
> Can you add some lit tests for these new intrinsics

Done, updated patches attached.

> and also add CHECK lines for SI to the existing local-memory.ll test.

Can't do that while it still fails to generate SI code. Should I commit
the other patches anyway, which are only necessary for that test?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer
From 3572bab6a6b5c967d19add0b0497a96123754ec2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michel=20D=C3=A4nzer?= michel.daen...@amd.com
Date: Thu, 21 Feb 2013 16:12:45 +0100
Subject: [PATCH v2 1/4] R600/SI: Add intrinsics for texture sampling with user
 derivatives
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Michel Dänzer michel.daen...@amd.com
---

v2: Add lit test

 lib/Target/R600/SIInstructions.td|   7 +-
 lib/Target/R600/SIIntrinsics.td  |   1 +
 test/CodeGen/R600/llvm.SI.sampled.ll | 140 +++
 3 files changed, 147 insertions(+), 1 deletion(-)
 create mode 100644 test/CodeGen/R600/llvm.SI.sampled.ll

diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 9c96c08..c9eac7d 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -535,7 +535,7 @@ def IMAGE_SAMPLE_B : MIMG_Sampler_Helper <0x0025, "IMAGE_SAMPLE_B">;
 //def IMAGE_SAMPLE_LZ : MIMG_NoPattern_ <"IMAGE_SAMPLE_LZ", 0x0027>;
 def IMAGE_SAMPLE_C : MIMG_Sampler_Helper <0x0028, "IMAGE_SAMPLE_C">;
 //def IMAGE_SAMPLE_C_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_CL", 0x0029>;
-//def IMAGE_SAMPLE_C_D : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D", 0x002a>;
+def IMAGE_SAMPLE_C_D : MIMG_Sampler_Helper <0x002a, "IMAGE_SAMPLE_C_D">;
 //def IMAGE_SAMPLE_C_D_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D_CL", 0x002b>;
 def IMAGE_SAMPLE_C_L : MIMG_Sampler_Helper <0x002c, "IMAGE_SAMPLE_C_L">;
 def IMAGE_SAMPLE_C_B : MIMG_Sampler_Helper <0x002d, "IMAGE_SAMPLE_C_B">;
@@ -1296,6 +1296,11 @@ multiclass SamplePatterns<ValueType addr_type> {
   def : SampleArrayPattern <int_SI_sampleb, IMAGE_SAMPLE_B, addr_type>;
   def : SampleShadowPattern <int_SI_sampleb, IMAGE_SAMPLE_C_B, addr_type>;
   def : SampleShadowArrayPattern <int_SI_sampleb, IMAGE_SAMPLE_C_B, addr_type>;
+
+  def : SamplePattern <int_SI_sampled, IMAGE_SAMPLE_D, addr_type>;
+  def : SampleArrayPattern <int_SI_sampled, IMAGE_SAMPLE_D, addr_type>;
+  def : SampleShadowPattern <int_SI_sampled, IMAGE_SAMPLE_C_D, addr_type>;
+  def : SampleShadowArrayPattern <int_SI_sampled, IMAGE_SAMPLE_C_D, addr_type>;
 }
 
 defm : SamplePatterns<v2i32>;
diff --git a/lib/Target/R600/SIIntrinsics.td b/lib/Target/R600/SIIntrinsics.td
index 224cd2f..d2643e0 100644
--- a/lib/Target/R600/SIIntrinsics.td
+++ b/lib/Target/R600/SIIntrinsics.td
@@ -23,6 +23,7 @@ let TargetPrefix = "SI", isTarget = 1 in {
 
   def int_SI_sample : Sample;
   def int_SI_sampleb : Sample;
+  def int_SI_sampled : Sample;
   def int_SI_samplel : Sample;
 
   def int_SI_imageload : Intrinsic <[llvm_v4i32_ty], [llvm_anyvector_ty, llvm_v32i8_ty, llvm_i32_ty], [IntrNoMem]>;
diff --git a/test/CodeGen/R600/llvm.SI.sampled.ll b/test/CodeGen/R600/llvm.SI.sampled.ll
new file mode 100644
index 000..71b8ef5
--- /dev/null
+++ b/test/CodeGen/R600/llvm.SI.sampled.ll
@@ -0,0 +1,140 @@
+;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck %s
+
+;CHECK: IMAGE_SAMPLE_D {{VGPR[0-9]+_VGPR[0-9]+_VGPR[0-9]+_VGPR[0-9]+}}, 15
+;CHECK: IMAGE_SAMPLE_D {{VGPR[0-9]+_VGPR[0-9]+}}, 3
+;CHECK: IMAGE_SAMPLE_D {{VGPR[0-9]+}}, 2
+;CHECK: IMAGE_SAMPLE_D {{VGPR[0-9]+}}, 1
+;CHECK: IMAGE_SAMPLE_D {{VGPR[0-9]+}}, 4
+;CHECK: IMAGE_SAMPLE_D {{VGPR[0-9]+}}, 8
+;CHECK: IMAGE_SAMPLE_C_D {{VGPR[0-9]+_VGPR[0-9]+}}, 5
+;CHECK: IMAGE_SAMPLE_C_D {{VGPR[0-9]+_VGPR[0-9]+}}, 9
+;CHECK: IMAGE_SAMPLE_C_D {{VGPR[0-9]+_VGPR[0-9]+}}, 6
+;CHECK: IMAGE_SAMPLE_D {{VGPR[0-9]+_VGPR[0-9]+}}, 10
+;CHECK: IMAGE_SAMPLE_D {{VGPR[0-9]+_VGPR[0-9]+}}, 12
+;CHECK: IMAGE_SAMPLE_C_D {{VGPR[0-9]+_VGPR[0-9]+_VGPR[0-9]+}}, 7
+;CHECK: IMAGE_SAMPLE_C_D {{VGPR[0-9]+_VGPR[0-9]+_VGPR[0-9]+}}, 11
+;CHECK: IMAGE_SAMPLE_C_D {{VGPR[0-9]+_VGPR[0-9]+_VGPR[0-9]+}}, 13
+;CHECK: IMAGE_SAMPLE_D {{VGPR[0-9]+_VGPR[0-9]+_VGPR[0-9]+}}, 14
+;CHECK: IMAGE_SAMPLE_D {{VGPR[0-9]+}}, 8
+
+define void @test(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
+   %v1 = insertelement <4 x i32> undef, i32 %a1, i32 0
+   %v2 = insertelement <4 x i32> undef, i32 %a1, i32 1
+   %v3 = insertelement <4 x i32> undef, i32 %a1, i32 2
+   %v4 = insertelement <4 x i32> undef, i32 %a1, i32 3
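
For reference, the trailing immediate in the CHECK lines above reads like a channel write mask, and each expected destination tuple has one VGPR per set bit (15 gives four registers, 3 gives two, 7 gives three). A minimal C++ sketch of that relationship, assuming the immediate is a 4-bit mask; `resultVGPRCount` is a hypothetical helper and not something the patch adds:

```cpp
#include <cassert>

// Hypothetical helper (not part of the patch): number of destination VGPRs
// expected for a 4-bit channel mask, i.e. the count of set bits in the mask.
static unsigned resultVGPRCount(unsigned mask) {
  unsigned count = 0;
  for (unsigned bit = 0; bit < 4; ++bit) {
    if (mask & (1u << bit))
      ++count;
  }
  return count;
}
```

Under that assumption, `resultVGPRCount(10)` is 2, which matches the two-register tuple the test expects for mask 10.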

Re: [Mesa-dev] R600/SI: Support for local memory and derivatives

2013-07-10 Thread Tom Stellard
On Wed, Jul 10, 2013 at 12:32:25PM +0200, Michel Dänzer wrote:
> On Fri, 2013-06-28 at 14:37 -0700, Tom Stellard wrote:
> > On Wed, Jun 19, 2013 at 06:28:21PM +0200, Michel Dänzer wrote:
> > >
> > > These patches implement enough of local memory support to allow radeonsi
> > > to use that for computing derivatives, as suggested by Tom.
> > >
> > > They also almost allow test/CodeGen/R600/local-memory.ll to generate
> > > code for SI. Right now it still fails because it tries to copy a VGPR to
> > > an SGPR, which is not possible.
> >
> > Can you add some lit tests for these new intrinsics
>
> Done, updated patches attached.
>
> > and also add CHECK lines for SI to the existing local-memory.ll test.
>
> Can't do that while it still fails to generate SI code. Should I commit
> the other patches anyway, which are only necessary for that test?


Can you add a TODO comment to that test for adding SI checks?

With that change, the patches are:

Reviewed-by: Tom Stellard thomas.stell...@amd.com
 

Re: [Mesa-dev] R600/SI: Support for local memory and derivatives

2013-07-10 Thread Michel Dänzer
On Wed, 2013-07-10 at 08:15 -0700, Tom Stellard wrote:
> On Wed, Jul 10, 2013 at 12:32:25PM +0200, Michel Dänzer wrote:
> > On Fri, 2013-06-28 at 14:37 -0700, Tom Stellard wrote:
> > > On Wed, Jun 19, 2013 at 06:28:21PM +0200, Michel Dänzer wrote:
> > > >
> > > > These patches implement enough of local memory support to allow radeonsi
> > > > to use that for computing derivatives, as suggested by Tom.
> > > >
> > > > They also almost allow test/CodeGen/R600/local-memory.ll to generate
> > > > code for SI. Right now it still fails because it tries to copy a VGPR to
> > > > an SGPR, which is not possible.
> > >
> > > Can you add some lit tests for these new intrinsics
> >
> > Done, updated patches attached.
> >
> > > and also add CHECK lines for SI to the existing local-memory.ll test.
> >
> > Can't do that while it still fails to generate SI code. Should I commit
> > the other patches anyway, which are only necessary for that test?
>
> Can you add a TODO comment to that test for adding SI checks?
>
> With that change, the patches are:
>
> Reviewed-by: Tom Stellard thomas.stell...@amd.com

Thanks, I managed to enable basic lit testing after all, see the
attached patches 4 and 5.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer
From 0f11058228a2c6504ed78f9856e6de3f8af0c0e8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michel=20D=C3=A4nzer?= michel.daen...@amd.com
Date: Wed, 19 Jun 2013 11:01:00 +0200
Subject: [PATCH 4/5] R600/SI: Add pattern for the AMDGPU.barrier.local
 intrinsic
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

lit test coverage to follow in the next commit.

Reviewed-by: Tom Stellard thomas.stell...@amd.com
Signed-off-by: Michel Dänzer michel.daen...@amd.com
---
 lib/Target/R600/SIInstructions.td | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 61755b4..30f2a4a 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -774,8 +774,17 @@ def S_CBRANCH_EXECNZ : SOPP <
 } // End isBranch = 1
 } // End isTerminator = 1
 
-//def S_BARRIER : SOPP_ <0x000a, "S_BARRIER", []>;
 let hasSideEffects = 1 in {
+def S_BARRIER : SOPP <0x000a, (ins), "S_BARRIER",
+  [(int_AMDGPU_barrier_local)]
+> {
+  let SIMM16 = 0;
+  let isBarrier = 1;
+  let hasCtrlDep = 1;
+  let mayLoad = 1;
+  let mayStore = 1;
+}
+
 def S_WAITCNT : SOPP <0x000c, (ins i32imm:$simm16), "S_WAITCNT $simm16",
   []
 >;
-- 
1.8.3.2

From 09715a4574c2e35b02176516f542bc0d1d0dc132 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michel=20D=C3=A4nzer?= michel.daen...@amd.com
Date: Mon, 17 Jun 2013 12:21:29 +0200
Subject: [PATCH v2 5/5] R600/SI: Initial local memory support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Enough for the radeonsi driver to use it for calculating derivatives.

Reviewed-by: Tom Stellard thomas.stell...@amd.com
Signed-off-by: Michel Dänzer michel.daen...@amd.com
---

v2: Enable some lit testing of local memory on SI.

 lib/Target/R600/AMDGPUAsmPrinter.cpp  |  7 +++
 lib/Target/R600/AMDGPUISelLowering.cpp|  4 +-
 lib/Target/R600/R600ISelLowering.cpp  |  2 +
 lib/Target/R600/SIDefines.h   |  4 ++
 lib/Target/R600/SIISelLowering.cpp|  5 ++
 lib/Target/R600/SIInstructions.td | 15 ++
 test/CodeGen/R600/local-memory-two-objects.ll | 51 
 test/CodeGen/R600/local-memory.ll | 67 ++-
 8 files changed, 100 insertions(+), 55 deletions(-)
 create mode 100644 test/CodeGen/R600/local-memory-two-objects.ll

diff --git a/lib/Target/R600/AMDGPUAsmPrinter.cpp b/lib/Target/R600/AMDGPUAsmPrinter.cpp
index 996d2a6..e039b77 100644
--- a/lib/Target/R600/AMDGPUAsmPrinter.cpp
+++ b/lib/Target/R600/AMDGPUAsmPrinter.cpp
@@ -233,7 +233,14 @@ void AMDGPUAsmPrinter::EmitProgramInfoSI(MachineFunction &MF) {
 
   OutStreamer.EmitIntValue(RsrcReg, 4);
   OutStreamer.EmitIntValue(S_00B028_VGPRS(MaxVGPR / 4) | S_00B028_SGPRS(MaxSGPR / 8), 4);
+
+  if (MFI->ShaderType == ShaderType::COMPUTE) {
+    OutStreamer.EmitIntValue(R_00B84C_COMPUTE_PGM_RSRC2, 4);
+    OutStreamer.EmitIntValue(S_00B84C_LDS_SIZE(RoundUpToAlignment(MFI->LDSSize, 256) >> 8), 4);
+  }
   if (MFI->ShaderType == ShaderType::PIXEL) {
+    OutStreamer.EmitIntValue(R_00B02C_SPI_SHADER_PGM_RSRC2_PS, 4);
+    OutStreamer.EmitIntValue(S_00B02C_EXTRA_LDS_SIZE(RoundUpToAlignment(MFI->LDSSize, 256) >> 8), 4);
     OutStreamer.EmitIntValue(R_0286CC_SPI_PS_INPUT_ENA, 4);
     OutStreamer.EmitIntValue(MFI->PSInputAddr, 4);
   }
diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
index 4019a1f..7fad3bb 100644
--- a/lib/Target/R600/AMDGPUISelLowering.cpp
+++ b/lib/Target/R600/AMDGPUISelLowering.cpp
@@ -72,8 +72,6 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
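
For reference, the two RSRC2 writes in patch 5/5 above round LDSSize up to a multiple of 256 and shift the result by 8, which reads as expressing the LDS allocation in 256-byte units (taking the shift as a right shift). A minimal C++ sketch of that arithmetic; `roundUpToAlignment` here is a local stand-in for LLVM's RoundUpToAlignment(), `ldsSizeInBlocks` is a hypothetical name, and the field packing done by the S_00B84C_LDS_SIZE / S_00B02C_EXTRA_LDS_SIZE macros is not modelled:

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for LLVM's RoundUpToAlignment(): round value up to the next
// multiple of align (align must be non-zero).
static uint64_t roundUpToAlignment(uint64_t value, uint64_t align) {
  return (value + align - 1) / align * align;
}

// Hypothetical helper: number of 256-byte blocks needed to hold ldsBytes,
// i.e. the value handed to the LDS size field macros in the patch.
static uint32_t ldsSizeInBlocks(uint64_t ldsBytes) {
  return static_cast<uint32_t>(roundUpToAlignment(ldsBytes, 256) >> 8);
}
```

With this reading, a shader that uses 257 bytes of local memory would report 2 blocks, and one that uses none would report 0.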
   

Re: [Mesa-dev] R600/SI: Support for local memory and derivatives

2013-06-28 Thread Tom Stellard
On Wed, Jun 19, 2013 at 06:28:21PM +0200, Michel Dänzer wrote:
>
> These patches implement enough of local memory support to allow radeonsi
> to use that for computing derivatives, as suggested by Tom.
>
> They also almost allow test/CodeGen/R600/local-memory.ll to generate
> code for SI. Right now it still fails because it tries to copy a VGPR to
> an SGPR, which is not possible.
>


Can you add some lit tests for these new intrinsics and also add CHECK
lines for SI to the existing local-memory.ll test.

With the tests added, these patches are:

Reviewed-by: Tom Stellard thomas.stell...@amd.com


[Mesa-dev] R600/SI: Support for local memory and derivatives

2013-06-19 Thread Michel Dänzer

These patches implement enough of local memory support to allow radeonsi
to use that for computing derivatives, as suggested by Tom.

They also almost allow test/CodeGen/R600/local-memory.ll to generate
code for SI. Right now it still fails because it tries to copy a VGPR to
an SGPR, which is not possible.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer
From f4ca359c4536aa53122b654196f2e007d50976f8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michel=20D=C3=A4nzer?= michel.daen...@amd.com
Date: Thu, 21 Feb 2013 16:12:45 +0100
Subject: [PATCH 1/6] R600/SI: Add intrinsics for texture sampling with user
 derivatives
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Michel Dänzer michel.daen...@amd.com
---
 lib/Target/R600/SIInstructions.td | 7 ++-
 lib/Target/R600/SIIntrinsics.td   | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 9c96c08..c9eac7d 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -535,7 +535,7 @@ def IMAGE_SAMPLE_B : MIMG_Sampler_Helper <0x0025, "IMAGE_SAMPLE_B">;
 //def IMAGE_SAMPLE_LZ : MIMG_NoPattern_ <"IMAGE_SAMPLE_LZ", 0x0027>;
 def IMAGE_SAMPLE_C : MIMG_Sampler_Helper <0x0028, "IMAGE_SAMPLE_C">;
 //def IMAGE_SAMPLE_C_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_CL", 0x0029>;
-//def IMAGE_SAMPLE_C_D : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D", 0x002a>;
+def IMAGE_SAMPLE_C_D : MIMG_Sampler_Helper <0x002a, "IMAGE_SAMPLE_C_D">;
 //def IMAGE_SAMPLE_C_D_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D_CL", 0x002b>;
 def IMAGE_SAMPLE_C_L : MIMG_Sampler_Helper <0x002c, "IMAGE_SAMPLE_C_L">;
 def IMAGE_SAMPLE_C_B : MIMG_Sampler_Helper <0x002d, "IMAGE_SAMPLE_C_B">;
@@ -1296,6 +1296,11 @@ multiclass SamplePatterns<ValueType addr_type> {
   def : SampleArrayPattern <int_SI_sampleb, IMAGE_SAMPLE_B, addr_type>;
   def : SampleShadowPattern <int_SI_sampleb, IMAGE_SAMPLE_C_B, addr_type>;
   def : SampleShadowArrayPattern <int_SI_sampleb, IMAGE_SAMPLE_C_B, addr_type>;
+
+  def : SamplePattern <int_SI_sampled, IMAGE_SAMPLE_D, addr_type>;
+  def : SampleArrayPattern <int_SI_sampled, IMAGE_SAMPLE_D, addr_type>;
+  def : SampleShadowPattern <int_SI_sampled, IMAGE_SAMPLE_C_D, addr_type>;
+  def : SampleShadowArrayPattern <int_SI_sampled, IMAGE_SAMPLE_C_D, addr_type>;
 }
 
 defm : SamplePatterns<v2i32>;
diff --git a/lib/Target/R600/SIIntrinsics.td b/lib/Target/R600/SIIntrinsics.td
index 224cd2f..d2643e0 100644
--- a/lib/Target/R600/SIIntrinsics.td
+++ b/lib/Target/R600/SIIntrinsics.td
@@ -23,6 +23,7 @@ let TargetPrefix = "SI", isTarget = 1 in {
 
   def int_SI_sample : Sample;
   def int_SI_sampleb : Sample;
+  def int_SI_sampled : Sample;
   def int_SI_samplel : Sample;
 
   def int_SI_imageload : Intrinsic <[llvm_v4i32_ty], [llvm_anyvector_ty, llvm_v32i8_ty, llvm_i32_ty], [IntrNoMem]>;
-- 
1.8.3.1

From 7a0048bb2ab1b661831da2b764bf1a52f66bec15 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michel=20D=C3=A4nzer?= michel.daen...@amd.com
Date: Thu, 21 Feb 2013 18:51:38 +0100
Subject: [PATCH v3 2/6] R600/SI: Initial support for LDS/GDS instructions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Michel Dänzer michel.daen...@amd.com
---

v3: Drop vdst operand from DS_Store_Helper class, and adapt
SIInsertWaits::getHwCounts() to handle that. Unfortunately, this seems
to mess up the asm string output somehow, not sure what's going on
there.

 lib/Target/R600/SIInsertWaits.cpp  |  2 ++
 lib/Target/R600/SIInstrFormats.td  | 24 
 lib/Target/R600/SIInstrInfo.td | 23 +++
 lib/Target/R600/SIInstructions.td  |  3 +++
 lib/Target/R600/SILowerControlFlow.cpp | 16 
 5 files changed, 68 insertions(+)

diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp
index c36e1dc..d31da45 100644
--- a/lib/Target/R600/SIInsertWaits.cpp
+++ b/lib/Target/R600/SIInsertWaits.cpp
@@ -134,6 +134,8 @@ Counters SIInsertWaits::getHwCounts(MachineInstr &MI) {
   if (TSFlags & SIInstrFlags::LGKM_CNT) {
 
     MachineOperand &Op = MI.getOperand(0);
+    if (!Op.isReg())
+      Op = MI.getOperand(1);
     assert(Op.isReg() && "First LGKM operand must be a register!");
 
     unsigned Reg = Op.getReg();
diff --git a/lib/Target/R600/SIInstrFormats.td b/lib/Target/R600/SIInstrFormats.td
index 51f323d..434aa7e 100644
--- a/lib/Target/R600/SIInstrFormats.td
+++ b/lib/Target/R600/SIInstrFormats.td
@@ -281,6 +281,30 @@ class VINTRP <bits <2> op, dag outs, dag ins, string asm, list<dag> pattern> :
 
 let Uses = [EXEC] in {
 
+class DS <bits<8> op, dag outs, dag ins, string asm, list<dag> pattern> :
+    Enc64 <outs, ins, asm, pattern> {
+
+  bits<8> vdst;
+  bits<1> gds;
+  bits<8> addr;
+  bits<8> data0;
+  bits<8> data1;
+  bits<8> offset0;
+  bits<8> offset1;
+
+
+  let
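
For reference, the getHwCounts() change in patch 2/6 above falls back to operand 1 when operand 0 is not a register, which the v3 note ties to dropping the vdst operand from DS_Store_Helper. A minimal C++ sketch of that selection rule, using a hypothetical `Operand` struct in place of MachineOperand; it returns an index instead of assigning through a reference, so the operand list itself is never modified:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for MachineOperand: only records whether the operand
// is a register and, if so, which one.
struct Operand {
  bool isReg;
  unsigned reg;
};

// Returns the index (0 or 1) of the operand that would be inspected for LGKM
// counting: operand 0 if it is a register, otherwise operand 1. Returns -1 if
// neither is a register, the case the assert in the patch guards against.
static int firstLGKMRegOperand(const std::vector<Operand> &ops) {
  for (int i = 0; i < 2 && i < static_cast<int>(ops.size()); ++i) {
    if (ops[i].isReg)
      return i;
  }
  return -1;
}
```

A load-like instruction whose first operand is a destination register would yield 0, while a store-like instruction with a leading immediate would yield 1.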