Re: [Mesa-dev] R600 Patchset: Optimizations for bfgminer

2013-04-29 Thread Aaron Watry
Hi Tom,

I'm not really qualified to review the LLVM code changes, but they looked
sane. I did want to point out a few piglit changes/regressions that result
from this set of patches.

On my HD6850, running the latest LLVM from git:
gegl-rgb-gamma-u8-to-ragabaf: pass -> fail
v3i32-stack: pass -> fail
v3i32-stack-array(All Tests): skip -> fail

Dumps attached for each of these tests using the following environment var:
R600_DEBUG=cs,compute

Also, I ran make check in LLVM, and test/CodeGen/R600/setcc.ll failed with
the error below. I also hit this same error for the CL abs(int2) builtin,
but that test was already failing before this patchset, so I haven't
included it above. I'm assuming that we just need to expand ISD::SRA for
v2i32/v4i32 (just as we already do for SHL and SRL).



FAIL: LLVM :: CodeGen/R600/setcc.ll (2104 of 7693)
 TEST 'LLVM :: CodeGen/R600/setcc.ll' FAILED

Script:
--
/home/awatry/src/llvm-build/Debug+Asserts/bin/llc < /home/awatry/src/llvm/test/CodeGen/R600/setcc.ll -march=r600 -mcpu=redwood | /home/awatry/src/llvm-build/Debug+Asserts/bin/FileCheck /home/awatry/src/llvm/test/CodeGen/R600/setcc.ll
--
Exit Code: 2
Command Output (stderr):
--
LLVM ERROR: Cannot select: 0x20dce30: v2i32 = sra 0x20dd310, 0x20dcc30 [ID=26]
  0x20dd310: v2i32 = BUILD_VECTOR 0x20dcf30, 0x20dd210 [ID=25]
    0x20dcf30: i32 = shl 0x20dc830, 0x20dcb30 [ID=24]
      0x20dc830: i32 = select_cc 0x20d9d60, 0x20dc130, 0x20dc330, 0x20d9c60, 0x20d9b60 [ID=22]
        0x20d9d60: i32 = extract_vector_elt 0x20d9860, 0x20d9c60 [ID=17]
          0x20d9860: v2i32,ch = load 0x20a7828, 0x20d9760, 0x20d9560 [ORD=1] [ID=12]
            0x20d9760: i32 = Constant<40> [ORD=1] [ID=3]
            0x20d9560: i32 = undef [ORD=1] [ID=2]
          0x20d9c60: i32 = Constant<0> [ID=6]
        0x20dc130: i32 = extract_vector_elt 0x20d9a60, 0x20d9c60 [ID=19]
          0x20d9a60: v2i32,ch = load 0x20a7828, 0x20d9960, 0x20d9560 [ORD=1] [ID=13]
            0x20d9960: i32 = Constant<48> [ORD=1] [ID=4]
            0x20d9560: i32 = undef [ORD=1] [ID=2]
          0x20d9c60: i32 = Constant<0> [ID=6]
        0x20dc330: i32 = Constant<-1> [ID=7]
        0x20d9c60: i32 = Constant<0> [ID=6]
      0x20dcb30: i32 = Constant<31> [ID=9]
    0x20dd210: i32 = shl 0x20d9e60, 0x20dcb30 [ID=23]
      0x20d9e60: i32 = select_cc 0x20dc630, 0x20dc730, 0x20dc330, 0x20d9c60, 0x20d9b60 [ID=21]
        0x20dc630: i32 = extract_vector_elt 0x20d9860, 0x20dc530 [ID=16]
          0x20d9860: v2i32,ch = load 0x20a7828, 0x20d9760, 0x20d9560 [ORD=1] [ID=12]
            0x20d9760: i32 = Constant<40> [ORD=1] [ID=3]
            0x20d9560: i32 = undef [ORD=1] [ID=2]
          0x20dc530: i32 = Constant<1> [ID=8]
        0x20dc730: i32 = extract_vector_elt 0x20d9a60, 0x20dc530 [ID=18]
          0x20d9a60: v2i32,ch = load 0x20a7828, 0x20d9960, 0x20d9560 [ORD=1] [ID=13]
            0x20d9960: i32 = Constant<48> [ORD=1] [ID=4]
            0x20d9560: i32 = undef [ORD=1] [ID=2]
          0x20dc530: i32 = Constant<1> [ID=8]
        0x20dc330: i32 = Constant<-1> [ID=7]
        0x20d9c60: i32 = Constant<0> [ID=6]
      0x20dcb30: i32 = Constant<31> [ID=9]
  0x20dcc30: v2i32 = BUILD_VECTOR 0x20dcb30, 0x20dcb30 [ID=14]
    0x20dcb30: i32 = Constant<31> [ID=9]
    0x20dcb30: i32 = Constant<31> [ID=9]
In function: setcc_v2i32
FileCheck error: '-' is empty.
--


Testing Time: 48.76s

Failing Tests (1):
LLVM :: CodeGen/R600/setcc.ll

  Expected Passes    : 5543
  Expected Failures  : 29
  Unsupported Tests  : 2120
  Unexpected Failures: 1
make[1]: *** [check-local] Error 1
make[1]: Leaving directory `/home/awatry/src/llvm-build/test'
make: *** [check] Error 2
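
For reference, here is a minimal standalone sketch of what expanding a
v2i32 SRA amounts to: the vector shift is split into one scalar arithmetic
shift per lane. This is plain C++, not LLVM code, and the names V2i32,
sra_i32 and sra_v2i32_expanded are hypothetical. In the backend itself the
fix would presumably be a setOperationAction call marking ISD::SRA as
Expand for the vector types, but that is an assumption and has not been
verified against the patchset.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a v2i32 value: two 32-bit lanes.
using V2i32 = std::array<int32_t, 2>;

// Scalar arithmetic shift right, done with unsigned math so the result does
// not depend on how the compiler treats >> on negative signed values.
inline int32_t sra_i32(int32_t value, uint32_t amount) {
  assert(amount < 32 && "shift amount must be below the lane width");
  uint32_t bits = static_cast<uint32_t>(value) >> amount;
  if (value < 0 && amount != 0) {
    // Refill the vacated high bits with copies of the sign bit.
    bits |= ~(UINT32_MAX >> amount);
  }
  int32_t result;
  std::memcpy(&result, &bits, sizeof(result));  // reinterpret the bit pattern
  return result;
}

// "Expanded" vector SRA: the two lanes are shifted independently, which is
// what scalarizing the vector node would produce.
inline V2i32 sra_v2i32_expanded(const V2i32 &value, const V2i32 &amount) {
  return {sra_i32(value[0], static_cast<uint32_t>(amount[0])),
          sra_i32(value[1], static_cast<uint32_t>(amount[1]))};
}
```

In the failing DAG each lane is a select_cc result (-1 or 0) shifted left
by 31 and then arithmetically shifted right by 31, so per lane the expanded
form only needs the scalar i32 shift.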



--Aaron





On Mon, Apr 29, 2013 at 3:24 PM, Tom Stellard wrote:

> Hi,
>
> The attached patchset implements a few optimizations for the bfgminer
> bitcoin mining program.
>
> Please Review.
>
> -Tom
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>


gegl-rgb-gamma-u8-to-ragabaf.cl.dump
Description: Binary data


v3i32-stack.cl.dump
Description: Binary data


v3i32-stack-array.cl.dump
Description: Binary data


[Mesa-dev] R600 Patchset: Optimizations for bfgminer

2013-04-29 Thread Tom Stellard
Hi,

The attached patchset implements a few optimizations for the bfgminer
bitcoin mining program.

Please Review.

-Tom
From 661e832408a8bafc03a7c4c600c4a140b03054b4 Mon Sep 17 00:00:00 2001
From: Dmitry Cherkassov 
Date: Thu, 7 Mar 2013 20:17:59 +0400
Subject: [PATCH 1/3] R600: Add 64-bit load/store support

* Added R600_Reg64 class
* Added T#Index#.XY registers definition
* Added v2i32 register reads from parameter and global space
* Added f32 and i32 elements extraction from v2f32 and v2i32
* Added v2i32 -> v2f32 conversions

Signed-off-by: Dmitry Cherkassov 

Tom Stellard:
  - Mark vec2 operations as expand.  The addition of a vec2 register
class made them all legal.
---
 lib/Target/R600/AMDGPUISelLowering.cpp             |  6 +++
 lib/Target/R600/AMDILISelDAGToDAG.cpp              | 10 -
 lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp |  3 ++
 lib/Target/R600/R600ISelLowering.cpp               | 17 +
 lib/Target/R600/R600InstrInfo.cpp                  | 19 ++
 lib/Target/R600/R600Instructions.td                | 44 ++
 lib/Target/R600/R600RegisterInfo.td                | 16
 test/CodeGen/R600/64bit-kernel-args.ll             | 41
 test/CodeGen/R600/fadd.ll                          | 10 +
 test/CodeGen/R600/fdiv.ll                          | 37 +-
 test/CodeGen/R600/fmul.ll                          | 10 +
 test/CodeGen/R600/fp_to_sint.ll                    | 10 +
 test/CodeGen/R600/fp_to_uint.ll                    | 10 +
 test/CodeGen/R600/fsub.ll                          | 20 +++---
 test/CodeGen/R600/setcc.ll                         | 18 +++--
 test/CodeGen/R600/sint_to_fp.ll                    | 10 +
 test/CodeGen/R600/udiv.ll                          | 20 +++---
 test/CodeGen/R600/uint_to_fp.ll                    | 10 +
 test/CodeGen/R600/urem.ll                          | 21 ---
 19 files changed, 292 insertions(+), 40 deletions(-)
 create mode 100644 test/CodeGen/R600/64bit-kernel-args.ll

diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
index a266df5..4a064b1 100644
--- a/lib/Target/R600/AMDGPUISelLowering.cpp
+++ b/lib/Target/R600/AMDGPUISelLowering.cpp
@@ -51,6 +51,9 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::STORE, MVT::f32, Promote);
   AddPromotedToType(ISD::STORE, MVT::f32, MVT::i32);
 
+  setOperationAction(ISD::STORE, MVT::v2f32, Promote);
+  AddPromotedToType(ISD::STORE, MVT::v2f32, MVT::v2i32);
+
   setOperationAction(ISD::STORE, MVT::v4f32, Promote);
   AddPromotedToType(ISD::STORE, MVT::v4f32, MVT::v4i32);
 
@@ -60,6 +63,9 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::LOAD, MVT::v4f32, Promote);
   AddPromotedToType(ISD::LOAD, MVT::v4f32, MVT::v4i32);
 
+  setOperationAction(ISD::LOAD, MVT::v2f32, Promote);
+  AddPromotedToType(ISD::LOAD, MVT::v2f32, MVT::v2i32);
+
   setOperationAction(ISD::MUL, MVT::i64, Expand);
 
   setOperationAction(ISD::UDIV, MVT::i32, Expand);
diff --git a/lib/Target/R600/AMDILISelDAGToDAG.cpp b/lib/Target/R600/AMDILISelDAGToDAG.cpp
index ba75a44..198cd7e 100644
--- a/lib/Target/R600/AMDILISelDAGToDAG.cpp
+++ b/lib/Target/R600/AMDILISelDAGToDAG.cpp
@@ -167,12 +167,20 @@ SDNode *AMDGPUDAGToDAGISel::Select(SDNode *N) {
     if (ST.device()->getGeneration() > AMDGPUDeviceInfo::HD6XXX) {
       break;
     }
+    unsigned RegSequenceClassID;
+    EVT VT = N->getValueType(0);
+    assert(VT.isVector());
+    switch (VT.getVectorNumElements()) {
+    case 4: RegSequenceClassID = AMDGPU::R600_Reg128RegClassID; break;
+    case 2: RegSequenceClassID = AMDGPU::R600_Reg64RegClassID; break;
+    default: llvm_unreachable("Unhandled vector width in BUILD_VECTOR");
+    }
     // BUILD_VECTOR is usually lowered into an IMPLICIT_DEF + 4 INSERT_SUBREG
     // that adds a 128 bits reg copy when going through TwoAddressInstructions
     // pass. We want to avoid 128 bits copies as much as possible because they
     // can't be bundled by our scheduler.
     SDValue RegSeqArgs[9] = {
-      CurDAG->getTargetConstant(AMDGPU::R600_Reg128RegClassID, MVT::i32),
+      CurDAG->getTargetConstant(RegSequenceClassID, MVT::i32),
       SDValue(), CurDAG->getTargetConstant(AMDGPU::sub0, MVT::i32),
       SDValue(), CurDAG->getTargetConstant(AMDGPU::sub1, MVT::i32),
       SDValue(), CurDAG->getTargetConstant(AMDGPU::sub2, MVT::i32),
diff --git a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
index 7c83d86..030fc87 100644
--- a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
+++ b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
@@ -150,6 +150,7 @@ void R600MCCodeEmitter::EncodeInstruction(const MCInst &MI, raw_ostream &OS,
   } else {
     switch(MI.getOpcode()) {
     case AMDGPU::RAT_WRITE_CACHELESS_32_eg:
+    case AMDGPU::RAT_WRITE_CACHEL