Re: [Beignet] [PATCH v3 3/3] add utest for creating 2d image from buffer.

2015-09-15 Thread Guo, Yejun
2 comments inline, thanks.

-----Original Message-----
From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of 
xionghu@intel.com
Sent: Wednesday, September 09, 2015 1:44 PM
To: beignet@lists.freedesktop.org
Cc: Luo, Xionghu
Subject: [Beignet] [PATCH v3 3/3] add utest for creating 2d image from buffer.



+  OCL_CALL (clGetDeviceInfo, device, CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT,
0, 0, &param_value_size);
+  size_t base_address_alignment = 0;
+  OCL_CALL (clGetDeviceInfo, device, CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT,
param_value_size, &base_address_alignment, &param_value_size);
[Yejun] the proper usage is: OCL_CALL (clGetDeviceInfo, device,
CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT, sizeof(base_address_alignment),
&base_address_alignment, NULL);



+  // Setup kernel and images
+  size_t buffer_sz = sizeof(uint32_t) * w * h;
[yejun] it is better to query CL_DEVICE_IMAGE_PITCH_ALIGNMENT and align w to it
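
A minimal sketch of the alignment suggested above, assuming the pitch alignment (in pixels) has already been obtained from clGetDeviceInfo with CL_DEVICE_IMAGE_PITCH_ALIGNMENT. The helpers align_up and aligned_buffer_size are hypothetical names for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not in the patch): round `value` up to the next
// multiple of `alignment`.
inline std::size_t align_up(std::size_t value, std::size_t alignment) {
  assert(alignment != 0);
  return (value + alignment - 1) / alignment * alignment;
}

// Size in bytes of a buffer backing a w x h image of 32-bit pixels whose
// row width is padded to `pitch_alignment_pixels` (an assumed input that
// would come from the device query).
inline std::size_t aligned_buffer_size(std::size_t w, std::size_t h,
                                       std::size_t pitch_alignment_pixels) {
  const std::size_t padded_w = align_up(w, pitch_alignment_pixels);
  return sizeof(std::uint32_t) * padded_w * h;
}
```

With this, buffer_sz in the test could be computed from the padded width rather than from w directly, and the same padded width would be used to derive the image row pitch.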


___
Beignet mailing list
Beignet@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/beignet


[Beignet] [PATCH] GBE: implement pre-register-allocation instruction scheduling.

2015-09-15 Thread Zhigang Gong
Finding an instruction scheduling policy that achieves the theoretical minimum
number of registers required in a basic block is an NP-hard problem, so we have
to use a heuristic to simplify the algorithm. Many studies indicate that
bottom-up list scheduling is much better than the top-down method in terms of
register pressure. I chose one such research paper as our target:

"Register-Sensitive Selection, Duplication, and Sequencing of Instructions"
It uses bottom-up list scheduling with a Sethi-Ullman label as the
heuristic number. As we will do cycle-aware scheduling after register
allocation, we don't need to bother with cycle-related heuristics here, so
I skipped the EST computation and usage part of the algorithm.

It turns out this algorithm works well. It reduces the register spilling
in clBlas's sgemmBlock kernel from 83+ to only 20.

This scheduling method seems to lower the ILP (instruction-level parallelism),
but that is not a big issue: the following register allocation stage allocates
as many different registers as possible, and a post-allocation instruction
scheduling pass then tries to recover as much ILP as possible.
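
For readers unfamiliar with the Sethi-Ullman label used as the heuristic here, the following is a small sketch of how the label is computed on a plain binary expression tree. It is not the patch's implementation: the Node type and the helpers leaf and op are hypothetical, every leaf is assumed to need one register (the load/store variant), and the paper's handling of DAGs is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical expression-tree node, for illustration only.
struct Node {
  std::unique_ptr<Node> left, right;  // both null for a leaf
};

// Sethi-Ullman label: the number of registers needed to evaluate the
// subtree without spilling, assuming each leaf occupies one register.
inline int sethiUllman(const Node *n) {
  assert(n != nullptr);
  if (!n->left && !n->right) return 1;                     // leaf
  if (!n->left) return sethiUllman(n->right.get());        // unary node
  if (!n->right) return sethiUllman(n->left.get());        // unary node
  const int l = sethiUllman(n->left.get());
  const int r = sethiUllman(n->right.get());
  // Equal needs force one extra register to hold the first result while
  // the second subtree is evaluated; otherwise the larger need dominates.
  return l == r ? l + 1 : std::max(l, r);
}

inline std::unique_ptr<Node> leaf() {
  return std::unique_ptr<Node>(new Node());
}

inline std::unique_ptr<Node> op(std::unique_ptr<Node> l,
                                std::unique_ptr<Node> r) {
  std::unique_ptr<Node> n(new Node());
  n->left = std::move(l);
  n->right = std::move(r);
  return n;
}
```

A balanced tree such as (a+b)*(c+d) gets a higher label than a chain such as ((a+b)+c)+d, which is why a scheduler that prefers to finish the more register-hungry operand first tends to keep fewer values live.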

Signed-off-by: Zhigang Gong 
---
 backend/src/backend/gen_insn_scheduling.cpp | 137 +++-
 1 file changed, 116 insertions(+), 21 deletions(-)

diff --git a/backend/src/backend/gen_insn_scheduling.cpp 
b/backend/src/backend/gen_insn_scheduling.cpp
index 358a2ce..f4f1e70 100644
--- a/backend/src/backend/gen_insn_scheduling.cpp
+++ b/backend/src/backend/gen_insn_scheduling.cpp
@@ -41,26 +41,29 @@
  * ==
  *
  * We try to limit the register pressure.
- * Well, this is a hard problem and we have a decent strategy now that we 
called
- * "zero cycled LIFO scheduling".
- * We use a local forward list scheduling and we schedule the instructions in a
- * LIFO order i.e. as a stack. Basically, we take the most recent instruction
- * and schedule it right away. Obviously we ignore completely the real 
latencies
- * and throuputs and just simulate instructions that are issued and completed 
in
- * zero cycle. For the complex kernels we already have (like menger sponge),
- * this provides a pretty good strategy enabling SIMD16 code generation where
- * when scheduling is deactivated, even SIMD8 fails
  *
- * One may argue that this strategy is bad, latency wise. This is not true 
since
- * the register allocator will anyway try to burn as many registers as 
possible.
- * So, there is still opportunities to schedule after register allocation.
+ * Finding an instruction scheduling policy that achieves the theoretical
+ * minimum number of registers required in a basic block is an NP-hard problem,
+ * so we have to use a heuristic to simplify the algorithm. Many studies
+ * indicate that bottom-up list scheduling is much better than the top-down
+ * method in terms of register pressure. I chose one such research paper as our
+ * target:
  *
- * Our idea seems to work decently. There is however a strong research article
- * that is able to near-optimally reschudle the instructions to minimize
- * register use. This is:
+ * "Register-Sensitive Selection, Duplication, and Sequencing of Instructions"
+ * It uses bottom-up list scheduling with a Sethi-Ullman label as the
+ * heuristic number. As we will do cycle-aware scheduling after register
+ * allocation, we don't need to bother with cycle-related heuristics here,
+ * so I skipped the EST computation and usage part of the algorithm.
  *
- * "Minimum Register Instruction Sequence Problem: Revisiting Optimal Code
- *  Generation for DAGs"
+ * It turns out this algorithm works well. It reduces the register spilling
+ * in clBlas's sgemmBlock kernel from 83+ to only 20.
+ *
+ * This scheduling method seems to lower the ILP (instruction-level parallelism),
+ * but that is not a big issue: the following register allocation stage allocates
+ * as many different registers as possible, and a post-allocation instruction
+ * scheduling pass then tries to recover as much ILP as possible.
+ *
+ * FIXME: we only need to do this scheduling when a BB is really under high register pressure.
  *
  * After the register allocation
  * ==
@@ -114,7 +117,7 @@ namespace gbe
   struct ScheduleDAGNode
   {
 INLINE ScheduleDAGNode(SelectionInstruction &insn) :
-  insn(insn), refNum(0), retiredCycle(0), preRetired(false), 
readDistance(0x7fff) {}
+  insn(insn), refNum(0), depNum(0), retiredCycle(0), preRetired(false), 
readDistance(0x7fff) {}
 bool dependsOn(ScheduleDAGNode *node) const {
   GBE_ASSERT(node != NULL);
   for (auto child : node->children)
@@ -128,6 +131,10 @@ namespace gbe
 SelectionInstruction 
 /*! Number of nodes that point to us 

[Beignet] [PATCH 2/8] Backend: Add FDIV64 function for gen_insn_selection.

2015-09-15 Thread junyan . he
From: Junyan He 

Signed-off-by: Junyan He 
---
 backend/src/backend/gen_context.cpp|  4 +++
 backend/src/backend/gen_context.hpp|  1 +
 .../src/backend/gen_insn_gen7_schedule_info.hxx|  1 +
 backend/src/backend/gen_insn_selection.cpp | 41 +-
 backend/src/backend/gen_insn_selection.hxx |  1 +
 5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/backend/src/backend/gen_context.cpp 
b/backend/src/backend/gen_context.cpp
index 25fdf08..9e2fd03 100644
--- a/backend/src/backend/gen_context.cpp
+++ b/backend/src/backend/gen_context.cpp
@@ -1679,6 +1679,10 @@ namespace gbe
 }
   }
 
+  void GenContext::emitF64DIVInstruction(const SelectionInstruction &insn) {
+GBE_ASSERT(0); // No support for double on Gen7
+  }
+
   void GenContext::emitTernaryInstruction(const SelectionInstruction &insn) {
 const GenRegister dst = ra->genReg(insn.dst(0));
 const GenRegister src0 = ra->genReg(insn.src(0));
diff --git a/backend/src/backend/gen_context.hpp 
b/backend/src/backend/gen_context.hpp
index 34f9293..57eb0a6 100644
--- a/backend/src/backend/gen_context.hpp
+++ b/backend/src/backend/gen_context.hpp
@@ -173,6 +173,7 @@ namespace gbe
 void emitGetImageInfoInstruction(const SelectionInstruction &insn);
 virtual void emitI64MULInstruction(const SelectionInstruction &insn);
 virtual void emitI64DIVREMInstruction(const SelectionInstruction &insn);
+virtual void emitF64DIVInstruction(const SelectionInstruction &insn);
 void scratchWrite(const GenRegister header, uint32_t offset, uint32_t 
reg_num, uint32_t reg_type, uint32_t channel_mode);
 void scratchRead(const GenRegister dst, const GenRegister header, uint32_t 
offset, uint32_t reg_num, uint32_t reg_type, uint32_t channel_mode);
 unsigned beforeMessage(const SelectionInstruction &insn, GenRegister bti, 
GenRegister flagTemp, unsigned desc);
diff --git a/backend/src/backend/gen_insn_gen7_schedule_info.hxx 
b/backend/src/backend/gen_insn_gen7_schedule_info.hxx
index d073770..9b60c17 100644
--- a/backend/src/backend/gen_insn_gen7_schedule_info.hxx
+++ b/backend/src/backend/gen_insn_gen7_schedule_info.hxx
@@ -43,3 +43,4 @@ DECL_GEN7_SCHEDULE(Atomic,  80,1,1)
 DECL_GEN7_SCHEDULE(I64MUL,  20,40,  20)
 DECL_GEN7_SCHEDULE(I64SATADD,   20,40,  20)
 DECL_GEN7_SCHEDULE(I64SATSUB,   20,40,  20)
+DECL_GEN7_SCHEDULE(F64DIV,  20,40,  20)
diff --git a/backend/src/backend/gen_insn_selection.cpp 
b/backend/src/backend/gen_insn_selection.cpp
index ab00269..eaf56f9 100644
--- a/backend/src/backend/gen_insn_selection.cpp
+++ b/backend/src/backend/gen_insn_selection.cpp
@@ -361,8 +361,10 @@ namespace gbe
 bool has32X32Mul() const { return bHas32X32Mul; }
 void setHas32X32Mul(bool b) { bHas32X32Mul = b; }
 bool hasLongType() const { return bHasLongType; }
+bool hasDoubleType() const { return bHasDoubleType; }
 bool hasHalfType() const { return bHasHalfType; }
 void setHasLongType(bool b) { bHasLongType = b; }
+void setHasDoubleType(bool b) { bHasDoubleType = b; }
 void setHasHalfType(bool b) { bHasHalfType = b; }
 bool hasLongRegRestrict() { return bLongRegRestrict; }
 void setLongRegRestrict(bool b) { bLongRegRestrict = b; }
@@ -669,6 +671,8 @@ namespace gbe
 void I64DIV(Reg dst, Reg src0, Reg src1, GenRegister *tmp, int tmp_int);
 /*! 64-bit integer remainder of division */
 void I64REM(Reg dst, Reg src0, Reg src1, GenRegister *tmp, int tmp_int);
+/*! double division */
+void F64DIV(Reg dst, Reg src0, Reg src1, GenRegister tmp[7]);
 /* common functions for both binary instruction and sel_cmp and compare 
instruction.
It will handle the IMM or normal register assignment, and will try to 
avoid LOADI
as much as possible. */
@@ -745,6 +749,7 @@ namespace gbe
 uint32_t currAuxLabel;
 bool bHas32X32Mul;
 bool bHasLongType;
+bool bHasDoubleType;
 bool bHasHalfType;
 bool bLongRegRestrict;
 uint32_t ldMsgOrder;
@@ -788,7 +793,7 @@ namespace gbe
 curr(ctx.getSimdWidth()), file(ctx.getFunction().getRegisterFile()),
 maxInsnNum(ctx.getFunction().getLargestBlockSize()), dagPool(maxInsnNum),
 stateNum(0), vectorNum(0), bwdCodeGeneration(false), 
currAuxLabel(ctx.getFunction().labelNum()),
-bHas32X32Mul(false), bHasLongType(false), bHasHalfType(false), 
bLongRegRestrict(false),
+bHas32X32Mul(false), bHasLongType(false), bHasDoubleType(false), 
bHasHalfType(false), bLongRegRestrict(false),
 ldMsgOrder(LD_MSG_ORDER_IVB), slowByteGather(false)
   {
 const ir::Function  = ctx.getFunction();
@@ -1618,6 +1623,15 @@ namespace gbe
   insn->dst(i + 1) = tmp[i];
   }
 
+  void Selection::Opaque::F64DIV(Reg dst, Reg src0, Reg src1, GenRegister 
tmp[7]) {
+SelectionInstruction *insn = this->appendInsn(SEL_OP_F64DIV, 7 + 1, 2);
+

[Beignet] [PATCH 0/8] Implement double division on BDW

2015-09-15 Thread junyan . he
From: Junyan He 

We use the macro:
r0 = 0, r6 = a, r7 = b, r1 = 1

math.eo.f0.0 (4) r8.acc2 r6.noacc r7.noacc 0xE
(-f0.0) if
madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2   // Step(1), q0=a*y0
madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2 // Step(2), e0=(1-b*y0)
madm (4) r11.acc5 r6.noacc -r7.noacc r9.acc3 // Step(3), r0=a-b*q0
madm (4) r12.acc6 r8.acc2 r10.acc4 r8.acc2   // Step(4), y1=y0+e0*y0
madm (4) r13.acc7 r1.noacc -r7.noacc r12.acc6 // Step(5), e1=(1-b*y1)
madm (4) r8.acc8 r8.acc2 r10.acc4 r12.acc6   // Step(6), y2=y0+e0*y1
madm (4) r9.acc9 r9.acc3 r11.acc5 r12.acc6   // Step(7), q1=q0+r0*y1
madm (4) r12.acc2 r12.acc6 r8.acc8 r13.acc7  // Step(8), y3=y1+e1*y2
madm (4) r11.acc3 r6.noacc -r7.noacc r9.acc9 // Step(9), r1=a-b*q1

madm (4) r8.noacc r9.acc9 r11.acc3 r12.acc2  // Step(10), q=q1+r1*y3
endif

to implement high-precision double division on BDW.
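
The refinement steps in the macro can be followed on the host with fused multiply-adds. The sketch below is only an emulation under stated assumptions: madm is modelled as src0 + src1 * src2 with a single rounding (std::fma), the initial estimate y0 that the math instruction would provide is replaced by a single-precision reciprocal, and neither the predicated early-out path nor the extended range of the special accumulators is modelled. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Model of "madm dst src0 src1 src2": dst = src0 + src1 * src2.
inline double madm(double src0, double src1, double src2) {
  return std::fma(src1, src2, src0);
}

// Host-side emulation of Step(1)..Step(10). Assumes b is nonzero and that
// both b and 1/b fit comfortably in single precision, because y0 is taken
// from a float reciprocal here (an assumption, not the hardware estimate).
inline double div_by_madm(double a, double b) {
  assert(b != 0.0);
  const double y0 = static_cast<double>(1.0f / static_cast<float>(b));
  const double q0 = madm(0.0, a, y0);   // Step(1),  q0 = a*y0
  const double e0 = madm(1.0, -b, y0);  // Step(2),  e0 = 1 - b*y0
  const double r0 = madm(a, -b, q0);    // Step(3),  r0 = a - b*q0
  const double y1 = madm(y0, e0, y0);   // Step(4),  y1 = y0 + e0*y0
  const double e1 = madm(1.0, -b, y1);  // Step(5),  e1 = 1 - b*y1
  const double y2 = madm(y0, e0, y1);   // Step(6),  y2 = y0 + e0*y1
  const double q1 = madm(q0, r0, y1);   // Step(7),  q1 = q0 + r0*y1
  const double y3 = madm(y1, e1, y2);   // Step(8),  y3 = y1 + e1*y2
  const double r1 = madm(a, -b, q1);    // Step(9),  r1 = a - b*q1
  return madm(q1, r1, y3);              // Step(10), q  = q1 + r1*y3
}
```

Each pass roughly squares the error of the reciprocal estimate, and the final step corrects the quotient with the remainder r1, which is why a short fixed sequence is enough for moderate inputs.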

Signed-off-by: Junyan He 
---




[Beignet] [PATCH 4/8] Backend: Add MATH_WITH_ACC function.

2015-09-15 Thread junyan . he
From: Junyan He 

Also add the setSrc0WithAcc and setSrc1WithAcc helper functions
to set the correct special accumulator fields of the instruction.

Signed-off-by: Junyan He 
---
 backend/src/backend/gen8_encoder.cpp | 61 +++-
 backend/src/backend/gen8_encoder.hpp |  5 +++
 backend/src/backend/gen_defs.hpp | 11 +++
 3 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/backend/src/backend/gen8_encoder.cpp 
b/backend/src/backend/gen8_encoder.cpp
index 69eabb2..0af27a3 100644
--- a/backend/src/backend/gen8_encoder.cpp
+++ b/backend/src/backend/gen8_encoder.cpp
@@ -360,6 +360,46 @@ namespace gbe
 gen8_insn->bits1.da1.dest_horiz_stride = dest.hstride;
   }
 
+  void Gen8Encoder::setSrc0WithAcc(GenNativeInstruction *insn, GenRegister 
reg, uint32_t accN) {
+Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
+assert(reg.file == GEN_GENERAL_REGISTER_FILE);
+assert(reg.nr < 128);
+assert(gen8_insn->header.access_mode == GEN_ALIGN_16);
+assert(reg.subnr == 0);
+assert(gen8_insn->header.execution_size >= GEN_WIDTH_4);
+
+gen8_insn->bits1.da16acc.src0_reg_file = reg.file;
+gen8_insn->bits1.da16acc.src0_reg_type = reg.type;
+gen8_insn->bits2.da16acc.src0_abs = reg.absolute;
+gen8_insn->bits2.da16acc.src0_negate = reg.negation;
+gen8_insn->bits2.da16acc.src0_address_mode = reg.address_mode;
+gen8_insn->bits2.da16acc.src0_subreg_nr = reg.subnr / 16;
+gen8_insn->bits2.da16acc.src0_reg_nr = reg.nr;
+gen8_insn->bits2.da16acc.src0_specal_acc_lo = accN;
+gen8_insn->bits2.da16acc.src0_specal_acc_hi = 0;
+gen8_insn->bits2.da16acc.src0_vert_stride = reg.vstride;
+  }
+
+  void Gen8Encoder::setSrc1WithAcc(GenNativeInstruction *insn, GenRegister 
reg, uint32_t accN) {
+Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
+assert(reg.file == GEN_GENERAL_REGISTER_FILE);
+assert(reg.nr < 128);
+assert(gen8_insn->header.access_mode == GEN_ALIGN_16);
+assert(reg.subnr == 0);
+assert(gen8_insn->header.execution_size >= GEN_WIDTH_4);
+
+gen8_insn->bits2.da16acc.src1_reg_file = reg.file;
+gen8_insn->bits2.da16acc.src1_reg_type = reg.type;
+gen8_insn->bits3.da16acc.src1_abs = reg.absolute;
+gen8_insn->bits3.da16acc.src1_negate = reg.negation;
+gen8_insn->bits3.da16acc.src1_address_mode = reg.address_mode;
+gen8_insn->bits3.da16acc.src1_subreg_nr = reg.subnr / 16;
+gen8_insn->bits3.da16acc.src1_reg_nr = reg.nr;
+gen8_insn->bits3.da16acc.src1_specal_acc_lo = accN;
+gen8_insn->bits3.da16acc.src1_specal_acc_hi = 0;
+gen8_insn->bits3.da16acc.src1_vert_stride = reg.vstride;
+  }
+
   void Gen8Encoder::setSrc0(GenNativeInstruction *insn, GenRegister reg) {
 Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
 if (reg.file != GEN_ARCHITECTURE_REGISTER_FILE)
@@ -372,7 +412,7 @@ namespace gbe
   gen8_insn->bits2.da1.src0_negate = reg.negation;
   gen8_insn->bits2.da1.src0_address_mode = reg.address_mode;
   if (reg.file == GEN_IMMEDIATE_VALUE) {
-if (reg.type == GEN_TYPE_L || reg.type == GEN_TYPE_UL) {
+if (reg.type == GEN_TYPE_L || reg.type == GEN_TYPE_UL || reg.type == 
GEN_TYPE_DF_IMM) {
   gen8_insn->bits3.ud = (uint32_t)(reg.value.i64 >> 32);
   gen8_insn->bits2.ud = (uint32_t)(reg.value.i64);
 } else {
@@ -532,4 +572,23 @@ namespace gbe
 gen8_insn->bits3.da3src.src2_reg_nr++;
  }
   }
+
+  void Gen8Encoder::MATH_WITH_ACC(GenRegister dst, uint32_t function, 
GenRegister src0, GenRegister src1,
+ uint32_t dstAcc, uint32_t src0Acc, uint32_t 
src1Acc)
+  {
+ GenNativeInstruction *insn = this->next(GEN_OPCODE_MATH);
+ Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
+ assert(dst.file == GEN_GENERAL_REGISTER_FILE);
+ assert(src0.file == GEN_GENERAL_REGISTER_FILE);
+ assert(src1.file == GEN_GENERAL_REGISTER_FILE);
+ assert(dst.hstride == GEN_HORIZONTAL_STRIDE_1 || dst.hstride == 
GEN_HORIZONTAL_STRIDE_0);
+
+ gen8_insn->header.access_mode = GEN_ALIGN_16;
+ insn->header.destreg_or_condmod = function;
+ this->setHeader(insn);
+ this->setDst(insn, dst);
+ gen8_insn->bits1.da16acc.dst_specal_acc = dstAcc;
+ this->setSrc0WithAcc(insn, src0, src0Acc);
+ this->setSrc1WithAcc(insn, src1, src1Acc);
+  }
 } /* End of the name space. */
diff --git a/backend/src/backend/gen8_encoder.hpp 
b/backend/src/backend/gen8_encoder.hpp
index 504e13d..53ec3d1 100644
--- a/backend/src/backend/gen8_encoder.hpp
+++ b/backend/src/backend/gen8_encoder.hpp
@@ -69,6 +69,11 @@ namespace gbe
 virtual unsigned setAtomicMessageDesc(GenNativeInstruction *insn, unsigned 
function, unsigned bti, unsigned srcNum);
 virtual unsigned setUntypedReadMessageDesc(GenNativeInstruction *insn, 
unsigned bti, unsigned elemNum);
 virtual unsigned setUntypedWriteMessageDesc(GenNativeInstruction *insn, 
unsigned bti, unsigned 

[Beignet] [PATCH 5/8] Backend: Add the MADM function to gen8 encoder.

2015-09-15 Thread junyan . he
From: Junyan He 

Signed-off-by: Junyan He 
---
 backend/src/backend/gen8_encoder.cpp | 56 
 backend/src/backend/gen8_encoder.hpp |  2 ++
 backend/src/backend/gen_defs.hpp |  2 ++
 3 files changed, 60 insertions(+)

diff --git a/backend/src/backend/gen8_encoder.cpp 
b/backend/src/backend/gen8_encoder.cpp
index 0af27a3..002a8b5 100644
--- a/backend/src/backend/gen8_encoder.cpp
+++ b/backend/src/backend/gen8_encoder.cpp
@@ -591,4 +591,60 @@ namespace gbe
  this->setSrc0WithAcc(insn, src0, src0Acc);
  this->setSrc1WithAcc(insn, src1, src1Acc);
   }
+
+  void Gen8Encoder::MADM(GenRegister dst, GenRegister src0, GenRegister src1, 
GenRegister src2,
+  uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t src2Acc)
+  {
+GenNativeInstruction *insn = this->next(GEN_OPCODE_MADM);
+Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
+assert(dst.file == GEN_GENERAL_REGISTER_FILE);
+assert(src0.file == GEN_GENERAL_REGISTER_FILE);
+assert(src1.file == GEN_GENERAL_REGISTER_FILE);
+assert(src2.file == GEN_GENERAL_REGISTER_FILE);
+assert(dst.hstride == GEN_HORIZONTAL_STRIDE_1 || dst.hstride == 
GEN_HORIZONTAL_STRIDE_0);
+assert(src0.type == GEN_TYPE_DF || src0.type == GEN_TYPE_F);
+assert(src0.type == dst.type);
+assert(src0.type == src1.type);
+assert(src0.type == src2.type);
+int32_t dataType = src0.type == GEN_TYPE_DF ? 3 : 0;
+
+this->setHeader(insn);
+gen8_insn->bits1.da3srcacc.dest_reg_nr = dst.nr;
+gen8_insn->bits1.da3srcacc.dest_subreg_nr = dst.subnr / 16;
+gen8_insn->bits1.da3srcacc.dst_specal_acc = dstAcc;
+gen8_insn->bits1.da3srcacc.src_type = dataType;
+gen8_insn->bits1.da3srcacc.dest_type = dataType;
+gen8_insn->header.access_mode = GEN_ALIGN_16;
+
+assert(src0.file == GEN_GENERAL_REGISTER_FILE);
+assert(src0.address_mode == GEN_ADDRESS_DIRECT);
+assert(src0.nr < 128);
+gen8_insn->bits2.da3srcacc.src0_specal_acc = src0Acc;
+gen8_insn->bits2.da3srcacc.src0_subreg_nr = src0.subnr / 4 ;
+gen8_insn->bits2.da3srcacc.src0_reg_nr = src0.nr;
+gen8_insn->bits1.da3srcacc.src0_abs = src0.absolute;
+gen8_insn->bits1.da3srcacc.src0_negate = src0.negation;
+gen8_insn->bits2.da3srcacc.src0_rep_ctrl = src0.vstride == 
GEN_VERTICAL_STRIDE_0;
+
+assert(src1.file == GEN_GENERAL_REGISTER_FILE);
+assert(src1.address_mode == GEN_ADDRESS_DIRECT);
+assert(src1.nr < 128);
+gen8_insn->bits2.da3srcacc.src1_specal_acc = src1Acc;
+gen8_insn->bits2.da3srcacc.src1_subreg_nr_low = (src1.subnr / 4) & 0x3;
+gen8_insn->bits3.da3srcacc.src1_subreg_nr_high = (src1.subnr / 4) >> 2;
+gen8_insn->bits2.da3srcacc.src1_rep_ctrl = src1.vstride == 
GEN_VERTICAL_STRIDE_0;
+gen8_insn->bits3.da3srcacc.src1_reg_nr = src1.nr;
+gen8_insn->bits1.da3srcacc.src1_abs = src1.absolute;
+gen8_insn->bits1.da3srcacc.src1_negate = src1.negation;
+
+assert(src2.file == GEN_GENERAL_REGISTER_FILE);
+assert(src2.address_mode == GEN_ADDRESS_DIRECT);
+assert(src2.nr < 128);
+gen8_insn->bits3.da3srcacc.src2_specal_acc = src2Acc;
+gen8_insn->bits3.da3srcacc.src2_subreg_nr = src2.subnr / 4;
+gen8_insn->bits3.da3srcacc.src2_rep_ctrl = src2.vstride == 
GEN_VERTICAL_STRIDE_0;
+gen8_insn->bits3.da3srcacc.src2_reg_nr = src2.nr;
+gen8_insn->bits1.da3srcacc.src2_abs = src2.absolute;
+gen8_insn->bits1.da3srcacc.src2_negate = src2.negation;
+  }
 } /* End of the name space. */
diff --git a/backend/src/backend/gen8_encoder.hpp 
b/backend/src/backend/gen8_encoder.hpp
index 53ec3d1..8e7939b 100644
--- a/backend/src/backend/gen8_encoder.hpp
+++ b/backend/src/backend/gen8_encoder.hpp
@@ -74,6 +74,8 @@ namespace gbe
 
 void MATH_WITH_ACC(GenRegister dst, uint32_t function, GenRegister src0, 
GenRegister src1,
uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc);
+void MADM(GenRegister dst, GenRegister src0, GenRegister src1, GenRegister 
src2,
+  uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t 
src2Acc);
   };
 }
 #endif /* __GBE_GEN8_ENCODER_HPP__ */
diff --git a/backend/src/backend/gen_defs.hpp b/backend/src/backend/gen_defs.hpp
index a1bd8dd..1b550ac 100644
--- a/backend/src/backend/gen_defs.hpp
+++ b/backend/src/backend/gen_defs.hpp
@@ -174,6 +174,8 @@ enum opcode {
   GEN_OPCODE_LINE = 89,
   GEN_OPCODE_PLN = 90,
   GEN_OPCODE_MAD = 91,
+  GEN_OPCODE_LRP = 92,
+  GEN_OPCODE_MADM = 93,
   GEN_OPCODE_NOP = 126,
 };
 
-- 
1.9.1





[Beignet] [PATCH 3/8] Backend: Add gen8 instruction field for special accumulator.

2015-09-15 Thread junyan . he
From: Junyan He 

The madm and invm functions need to set the accumulator ID in the
instruction. On BDW, the write mask of the dst and the channel
mask of the src are reinterpreted to select acc2~acc9.
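
As an illustration of the reinterpretation described above, here is a sketch of writing a destination accumulator selector into instruction dword 1 with plain shifts and masks. It rests on assumptions: the selector values follow the naming used elsewhere in this series (0..7 for acc2..acc9, 8 for no accumulator), the 4-bit dst field is assumed to start at bit 16 (the sum of the widths of the fields declared before it in da16acc), and bit-fields are assumed to be allocated from the least significant bit. All names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed selector encoding: 0..7 select acc2..acc9, 8 means "noacc".
constexpr std::uint32_t kNoAcc = 8;

// Map an accumulator number (2..9) to its assumed selector value.
inline std::uint32_t accSelector(int accNumber) {
  assert(accNumber >= 2 && accNumber <= 9);
  return static_cast<std::uint32_t>(accNumber - 2);
}

// Assumed position of the 4-bit dst selector in dword 1:
// 1 + 1 + 1 + 2 + 4 + 2 + 4 + 1 = 16 bits of lower fields come first.
constexpr unsigned kDstAccShift = 16;
constexpr std::uint32_t kDstAccMask = 0xFu << kDstAccShift;

// Return dword1 with its dst selector field replaced by `sel`.
inline std::uint32_t setDstSpecialAcc(std::uint32_t dword1, std::uint32_t sel) {
  assert(sel <= kNoAcc);
  return (dword1 & ~kDstAccMask) | (sel << kDstAccShift);
}
```

The patch itself uses C bit-fields for this; the mask form is shown only because it makes the bit position explicit.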

Signed-off-by: Junyan He 
---
 backend/src/backend/gen8_instruction.hpp | 86 
 1 file changed, 86 insertions(+)

diff --git a/backend/src/backend/gen8_instruction.hpp 
b/backend/src/backend/gen8_instruction.hpp
index 5cf1032..2aa5bf7 100644
--- a/backend/src/backend/gen8_instruction.hpp
+++ b/backend/src/backend/gen8_instruction.hpp
@@ -135,6 +135,22 @@ union Gen8NativeInstruction
 uint32_t dest_address_mode:1;
   } ia16;
 
+  struct { // The sub reg field is reinterpreted as accumulator selector.
+uint32_t flag_sub_reg_nr:1;
+uint32_t flag_reg_nr:1;
+uint32_t mask_control:1;
+uint32_t dest_reg_file:2;
+uint32_t dest_reg_type:4;
+uint32_t src0_reg_file:2;
+uint32_t src0_reg_type:4;
+uint32_t pad:1;
+uint32_t dst_specal_acc:4;
+uint32_t dest_subreg_nr:1;
+uint32_t dest_reg_nr:8;
+uint32_t reserved:2;
+uint32_t dest_address_mode:1;
+  } da16acc;
+
   struct {
 uint32_t flag_sub_reg_nr:1;
 uint32_t flag_reg_nr:1;
@@ -153,6 +169,25 @@ union Gen8NativeInstruction
 uint32_t dest_subreg_nr:3;
 uint32_t dest_reg_nr:8;
   } da3src;
+
+  struct {
+uint32_t flag_sub_reg_nr:1;
+uint32_t flag_reg_nr:1;
+uint32_t mask_control:1;
+uint32_t src1_type:1;
+uint32_t src2_type:1;
+uint32_t src0_abs:1;
+uint32_t src0_negate:1;
+uint32_t src1_abs:1;
+uint32_t src1_negate:1;
+uint32_t src2_abs:1;
+uint32_t src2_negate:1;
+uint32_t src_type:3;
+uint32_t dest_type:3;
+uint32_t dst_specal_acc:4;
+uint32_t dest_subreg_nr:3;
+uint32_t dest_reg_nr:8;
+  } da3srcacc;
 }bits1;
 
 union {
@@ -219,6 +254,21 @@ union Gen8NativeInstruction
   } ia16;
 
   struct {
+uint32_t src0_specal_acc_lo:4;
+uint32_t src0_subreg_nr:1;
+uint32_t src0_reg_nr:8;
+uint32_t src0_abs:1;
+uint32_t src0_negate:1;
+uint32_t src0_address_mode:1;
+uint32_t src0_specal_acc_hi:4;
+uint32_t pad0:1;
+uint32_t src0_vert_stride:4;
+uint32_t src1_reg_file:2;
+uint32_t src1_reg_type:4;
+uint32_t pad:1;
+  } da16acc;
+
+  struct {
 uint32_t src0_rep_ctrl:1;
 uint32_t src0_swizzle:8;
 uint32_t src0_subreg_nr:3;
@@ -230,6 +280,17 @@ union Gen8NativeInstruction
   } da3src;
 
   struct {
+uint32_t src0_rep_ctrl:1;
+uint32_t src0_specal_acc:8;
+uint32_t src0_subreg_nr:3;
+uint32_t src0_reg_nr:8;
+uint32_t src0_subreg_nr_w:1;
+uint32_t src1_rep_ctrl:1;
+uint32_t src1_specal_acc:8;
+uint32_t src1_subreg_nr_low:2;
+  } da3srcacc;
+
+  struct {
 uint32_t uip:32;
   } gen8_branch;
 
@@ -294,6 +355,19 @@ union Gen8NativeInstruction
   } ia16;
 
   struct {
+uint32_t src1_specal_acc_lo:4;
+uint32_t src1_subreg_nr:1;
+uint32_t src1_reg_nr:8;
+uint32_t src1_abs:1;
+uint32_t src1_negate:1;
+uint32_t src1_address_mode:1;
+uint32_t src1_specal_acc_hi:4;
+uint32_t pad1:1;
+uint32_t src1_vert_stride:4;
+uint32_t pad2:7;
+  } da16acc;
+
+  struct {
 uint32_t function_control:19;
 uint32_t header_present:1;
 uint32_t response_length:5;
@@ -504,6 +578,18 @@ union Gen8NativeInstruction
 uint32_t pad:1;
   } da3src;
 
+  struct {
+uint32_t src1_subreg_nr_high:1;
+uint32_t src1_reg_nr:8;
+uint32_t src1_subreg_nr_w:1;
+uint32_t src2_rep_ctrl:1;
+uint32_t src2_specal_acc:8;
+uint32_t src2_subreg_nr:3;
+uint32_t src2_reg_nr:8;
+uint32_t src2_subreg_nr_w:1;
+uint32_t pad:1;
+  } da3srcacc;
+
   /*! Message gateway */
   struct {
 uint32_t subfunc:3;
-- 
1.9.1





[Beignet] [PATCH 8/8] Utest: Add double division test.

2015-09-15 Thread junyan . he
From: Junyan He 

Signed-off-by: Junyan He 
---
 kernels/compiler_double_4.cl   |  5 -
 kernels/compiler_double_div.cl |  5 +
 utests/CMakeLists.txt  |  1 +
 utests/compiler_double_4.cpp   | 40 
 utests/compiler_double_div.cpp | 42 ++
 5 files changed, 48 insertions(+), 45 deletions(-)
 delete mode 100644 kernels/compiler_double_4.cl
 create mode 100644 kernels/compiler_double_div.cl
 delete mode 100644 utests/compiler_double_4.cpp
 create mode 100644 utests/compiler_double_div.cpp

diff --git a/kernels/compiler_double_4.cl b/kernels/compiler_double_4.cl
deleted file mode 100644
index e5e46f9..000
--- a/kernels/compiler_double_4.cl
+++ /dev/null
@@ -1,5 +0,0 @@
-#pragma OPENCL EXTENSION cl_khr_fp64 : enable
-kernel void compiler_double_4(global double *src1, global double *src2, global 
double *dst) {
-  int i = get_global_id(0);
-  dst[i] = src1[i] + src2[i];
-}
diff --git a/kernels/compiler_double_div.cl b/kernels/compiler_double_div.cl
new file mode 100644
index 000..3758e65
--- /dev/null
+++ b/kernels/compiler_double_div.cl
@@ -0,0 +1,5 @@
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+kernel void compiler_double_div(global double *src1, global double *src2, 
global double *dst) {
+  int i = get_global_id(0);
+  dst[i] = src1[i] / src2[i];
+}
diff --git a/utests/CMakeLists.txt b/utests/CMakeLists.txt
index e7a9e26..aeae3d6 100644
--- a/utests/CMakeLists.txt
+++ b/utests/CMakeLists.txt
@@ -194,6 +194,7 @@ set (utests_sources
   compiler_sub_group_all.cpp
   compiler_time_stamp.cpp
   compiler_double_precision.cpp
+  compiler_double_div.cpp
   load_program_from_gen_bin.cpp
   load_program_from_spir.cpp
   get_arg_info.cpp
diff --git a/utests/compiler_double_4.cpp b/utests/compiler_double_4.cpp
deleted file mode 100644
index cb25bd4..000
--- a/utests/compiler_double_4.cpp
+++ /dev/null
@@ -1,40 +0,0 @@
-#include 
-#include "utest_helper.hpp"
-
-void compiler_double_4(void)
-{
-  const size_t n = 16;
-  double cpu_src1[n], cpu_src2[n];
-
-  // Setup kernel and buffers
-  OCL_CREATE_KERNEL("compiler_double_4");
-  OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(double), NULL);
-  OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(double), NULL);
-  OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(double), NULL);
-  OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
-  OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);
-  OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]);
-  globals[0] = n;
-  locals[0] = 16;
-
-  // Run random tests
-  OCL_MAP_BUFFER(0);
-  OCL_MAP_BUFFER(1);
-  for (int32_t i = 0; i < (int32_t) n; ++i) {
-cpu_src1[i] = ((double*)buf_data[0])[i] = rand() * 1e-2;
-cpu_src2[i] = ((double*)buf_data[1])[i] = rand() * 1e-2;
-  }
-  OCL_UNMAP_BUFFER(0);
-  OCL_UNMAP_BUFFER(1);
-
-  // Run the kernel on GPU
-  OCL_NDRANGE(1);
-
-  // Compare
-  OCL_MAP_BUFFER(2);
-  for (int32_t i = 0; i < (int32_t) n; ++i)
-OCL_ASSERT(fabs(((double*)buf_data[2])[i] - cpu_src1[i] - cpu_src2[i]) < 
1e-4);
-  OCL_UNMAP_BUFFER(2);
-}
-
-MAKE_UTEST_FROM_FUNCTION(compiler_double_4);
diff --git a/utests/compiler_double_div.cpp b/utests/compiler_double_div.cpp
new file mode 100644
index 000..f3a21df
--- /dev/null
+++ b/utests/compiler_double_div.cpp
@@ -0,0 +1,42 @@
+#include 
+#include "utest_helper.hpp"
+
+void compiler_double_div(void)
+{
+  const size_t n = 16;
+  double cpu_src0[n], cpu_src1[n];
+
+  // Setup kernel and buffers
+  OCL_CREATE_KERNEL("compiler_double_div");
+  OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(double), NULL);
+  OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(double), NULL);
+  OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(double), NULL);
+  OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
+  OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);
+  OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]);
+  globals[0] = n;
+  locals[0] = 16;
+
+  // Run random tests
+  OCL_MAP_BUFFER(0);
+  OCL_MAP_BUFFER(1);
+  for (int32_t i = 0; i < (int32_t) n; ++i) {
+cpu_src0[i] = ((double*)buf_data[0])[i] = ((double)(((i - 5)*1334) * 
11105));
+cpu_src1[i] = ((double*)buf_data[1])[i] = 499.13542123d*(i + 132.43d + 
142.32*i);
+  }
+  OCL_UNMAP_BUFFER(0);
+  OCL_UNMAP_BUFFER(1);
+
+  // Run the kernel on GPU
+  OCL_NDRANGE(1);
+
+  // Compare
+  OCL_MAP_BUFFER(2);
+  for (int32_t i = 0; i < (int32_t) n; ++i) {
+OCL_ASSERT(fabs(((double*)buf_data[2])[i] - cpu_src0[i]/cpu_src1[i]) < 
1e-32);
+//printf("%d :  %fref value: %f\n", i, ((double*)buf_data[2])[i], 
cpu_src0[i]/cpu_src1[i]);
+  }
+  OCL_UNMAP_BUFFER(2);
+}
+
+MAKE_UTEST_FROM_FUNCTION(compiler_double_div);
-- 
1.9.1





[Beignet] [PATCH 7/8] Backend: Add madm and invm instructions to disasm.

2015-09-15 Thread junyan . he
From: Junyan He 

We also print the special accumulator field in the disassembly.
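
A sketch of the selector-to-suffix lookup this adds to the disassembler output, written as a standalone C++ function. The nine names mirror the table in the diff below; the bounds check and the ".invalid" fallback are assumptions added here because the selector field is 4 bits wide, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Map a 4-bit special accumulator selector to the suffix printed after a
// register operand. Values above 8 have no name in the table, so this
// sketch returns a placeholder for them (an assumption, not patch behavior).
inline std::string specialAccName(std::uint32_t sel) {
  static const char *const names[] = {
      ".acc2", ".acc3", ".acc4", ".acc5", ".acc6",
      ".acc7", ".acc8", ".acc9", ".noacc",
  };
  constexpr std::uint32_t count = sizeof(names) / sizeof(names[0]);
  assert(sel <= 0xFu);  // the field is 4 bits wide
  return sel < count ? names[sel] : ".invalid";
}
```

With this mapping an operand such as r8 with selector 0 is printed as r8.acc2, matching the notation used in the cover letter's macro.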

Signed-off-by: Junyan He 
---
 backend/src/backend/gen/gen_mesa_disasm.c | 89 +--
 1 file changed, 84 insertions(+), 5 deletions(-)

diff --git a/backend/src/backend/gen/gen_mesa_disasm.c 
b/backend/src/backend/gen/gen_mesa_disasm.c
index 5220233..733b2e6 100644
--- a/backend/src/backend/gen/gen_mesa_disasm.c
+++ b/backend/src/backend/gen/gen_mesa_disasm.c
@@ -84,6 +84,7 @@ static const struct {
   [GEN_OPCODE_DP3] = { .name = "dp3", .nsrc = 2, .ndst = 1 },
   [GEN_OPCODE_DP2] = { .name = "dp2", .nsrc = 2, .ndst = 1 },
   [GEN_OPCODE_MATH] = { .name = "math", .nsrc = 2, .ndst = 1 },
+  [GEN_OPCODE_MADM] = { .name = "madm", .nsrc = 3, .ndst = 1 },
 
   [GEN_OPCODE_AVG] = { .name = "avg", .nsrc = 2, .ndst = 1 },
   [GEN_OPCODE_ADD] = { .name = "add", .nsrc = 2, .ndst = 1 },
@@ -311,6 +312,18 @@ static const char *writemask[16] = {
   [0xf] = "",
 };
 
+static const char *special_acc[9] = {
+  [0x0] = ".acc2",
+  [0x1] = ".acc3",
+  [0x2] = ".acc4",
+  [0x3] = ".acc5",
+  [0x4] = ".acc6",
+  [0x5] = ".acc7",
+  [0x6] = ".acc8",
+  [0x7] = ".acc9",
+  [0x8] = ".noacc",
+};
+
 static const char *end_of_thread[2] = {
   [0] = "",
   [1] = "EOT"
@@ -532,6 +545,24 @@ static int gen_version;
 #define GENERIC_MSG_LENGTH(inst)   GEN_BITS_FIELD(inst, 
bits3.generic_gen5.msg_length)
 #define GENERIC_RESPONSE_LENGTH(inst) GEN_BITS_FIELD(inst, 
bits3.generic_gen5.response_length)
 
+static int is_special_acc(const void* inst)
+{
+  if (gen_version < 80)
+return 0;
+
+  if (OPCODE(inst) != GEN_OPCODE_MADM && OPCODE(inst) != GEN_OPCODE_MATH)
+return 0;
+
+  if (OPCODE(inst) == GEN_OPCODE_MATH &&
+(MATH_FUNCTION(inst) != GEN8_MATH_FUNCTION_INVM && MATH_FUNCTION(inst) != 
GEN8_MATH_FUNCTION_RSQRTM))
+return 0;
+
+  if (ACCESS_MODE(inst) != GEN_ALIGN_16)
+return 0;
+
+  return 1;
+}
+
 static int string(FILE *file, const char *string)
 {
   fputs (string, file);
@@ -688,7 +719,12 @@ static int dest(FILE *file, const void* inst)
 format(file, ".%d", GEN_BITS_FIELD(inst, bits1.da16.dest_subreg_nr) /
reg_type_size[GEN_BITS_FIELD(inst, bits1.da16.dest_reg_type)]);
   string(file, "<1>");
-  err |= control(file, "writemask", writemask, GEN_BITS_FIELD(inst, bits1.da16.dest_writemask), NULL);
+
+  if (is_special_acc(inst)) {
+    err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits1.da16acc.dst_specal_acc, NULL);
+  } else {
+    err |= control(file, "writemask", writemask, GEN_BITS_FIELD(inst, bits1.da16.dest_writemask), NULL);
+  }
   err |= control(file, "dest reg encoding", reg_encoding, GEN_BITS_FIELD(inst, bits1.da16.dest_reg_type), NULL);
 } else {
   err = 1;
@@ -710,7 +746,11 @@ static int dest_3src(FILE *file, const void *inst)
   if (GEN_BITS_FIELD(inst, bits1.da3src.dest_subreg_nr))
 format(file, ".%d", GEN_BITS_FIELD(inst, bits1.da3src.dest_subreg_nr));
   string(file, "<1>");
-  err |= control(file, "writemask", writemask, GEN_BITS_FIELD(inst, bits1.da3src.dest_writemask), NULL);
+  if (is_special_acc(inst)) {
+    err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits1.da3srcacc.dst_specal_acc, NULL);
+  } else {
+    err |= control(file, "writemask", writemask, GEN_BITS_FIELD(inst, bits1.da3src.dest_writemask), NULL);
+  }
   err |= control(file, "dest reg encoding", reg_encoding, GEN_TYPE_F, NULL);
 
   return 0;
@@ -775,7 +815,7 @@ static int src_ia1(FILE *file,
   return err;
 }
 
-static int src_da16(FILE *file,
+static int src_da16(FILE *file, const void* inst, int src_num,
 uint32_t _reg_type,
 uint32_t _reg_file,
 uint32_t _vert_stride,
@@ -803,6 +843,17 @@ static int src_da16(FILE *file,
 
   err |= control(file, "vert stride", vert_stride, _vert_stride, NULL);
   string(file, ",4,1>");
+
+  if (is_special_acc(inst)) {
+    if (src_num == 0) {
+      err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits2.da16acc.src0_specal_acc_lo, NULL);
+    } else {
+      assert(src_num == 1);
+      err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits3.da16acc.src1_specal_acc_lo, NULL);
+    }
+    return err;
+  }
+
   /*
* Three kinds of swizzle display:
*  identity - nothing printed
@@ -850,6 +901,12 @@ static int src0_3src(FILE *file, const void* inst)
 string(file, "<8,8,1>");
   err |= control(file, "src da16 reg type", reg_encoding,
  GEN_TYPE_F, NULL);
+
+  if (is_special_acc(inst)) {
+    err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits2.da3srcacc.src0_specal_acc, NULL);
+    return err;
+  }
+
   /*
* Three kinds of swizzle display:
*  identity - 

[Beignet] [PATCH 6/8] Backend: Implement FDIV64 on BDW.

2015-09-15 Thread junyan . he
From: Junyan He 

According to the document, we use a set of instructions
to implement double type division.

Signed-off-by: Junyan He 
---
 backend/src/backend/gen8_context.cpp | 68 
 backend/src/backend/gen8_context.hpp |  2 ++
 2 files changed, 70 insertions(+)

diff --git a/backend/src/backend/gen8_context.cpp 
b/backend/src/backend/gen8_context.cpp
index b497ee5..f465832 100644
--- a/backend/src/backend/gen8_context.cpp
+++ b/backend/src/backend/gen8_context.cpp
@@ -924,6 +924,74 @@ namespace gbe
 this->unpackLongVec(src, dst, p->curr.execWidth);
   }
 
+  void Gen8Context::emitF64DIVInstruction(const SelectionInstruction &insn) {
+/* Macro for Double Precision IEEE Compliant fdiv
+
+   Set Rounding Mode in CR to RNE
+   GRF are initialized: r0 = 0, r6 = a, r7 = b, r1 = 1
+   The default data type for the macro is :df
+
+   math.eo.f0.0 (4) r8.acc2 r6.noacc r7.noacc 0xE
+   (-f0.0) if
+   madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2   // Step(1), q0=a*y0
+   madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2 // Step(2), e0=(1-b*y0)
+   madm (4) r11.acc5 r6.noacc -r7.noacc r9.acc3 // Step(3), r0=a-b*q0
+   madm (4) r12.acc6 r8.acc2 r10.acc4 r8.acc2   // Step(4), y1=y0+e0*y0
+   madm (4) r13.acc7 r1.noacc -r7.noacc r12.acc6// Step(5), e1=(1-b*y1)
+   madm (4) r8.acc8 r8.acc2 r10.acc4 r12.acc6   // Step(6), y2=y0+e0*y1
+   madm (4) r9.acc9 r9.acc3 r11.acc5 r12.acc6   // Step(7), q1=q0+r0*y1
+   madm (4) r12.acc2 r12.acc6 r8.acc8 r13.acc7  // Step(8), y3=y1+e1*y2
+   madm (4) r11.acc3 r6.noacc -r7.noacc r9.acc9 // Step(9), r1=a-b*q1
+
+   Change Rounding Mode in CR if required
+   Implicit Accumulator for destination is NULL
+
+   madm (4) r8.noacc r9.acc9 r11.acc3 r12.acc2  // Step(10), q=q1+r1*y3
+   endif */
+    GenRegister r6 = GenRegister::retype(ra->genReg(insn.src(0)), GEN_TYPE_DF);
+    GenRegister r7 = GenRegister::retype(ra->genReg(insn.src(1)), GEN_TYPE_DF);
+    GenRegister r8 = GenRegister::retype(ra->genReg(insn.dst(0)), GEN_TYPE_DF);
+    const GenRegister r0 = GenRegister::retype(ra->genReg(insn.dst(1)), GEN_TYPE_DF);
+    const GenRegister r1 = GenRegister::retype(ra->genReg(insn.dst(2)), GEN_TYPE_DF);
+    const GenRegister r9 = GenRegister::retype(ra->genReg(insn.dst(3)), GEN_TYPE_DF);
+    const GenRegister r10 = GenRegister::retype(ra->genReg(insn.dst(4)), GEN_TYPE_DF);
+    const GenRegister r11 = GenRegister::retype(ra->genReg(insn.dst(5)), GEN_TYPE_DF);
+    const GenRegister r12 = GenRegister::retype(ra->genReg(insn.dst(6)), GEN_TYPE_DF);
+    const GenRegister r13 = GenRegister::retype(ra->genReg(insn.dst(7)), GEN_TYPE_DF);
+    Gen8Encoder *p8 = reinterpret_cast<Gen8Encoder *>(p);
+    p->push(); {
+      p->curr.execWidth = 4;
+      p->curr.predicate = GEN_PREDICATE_NONE;
+      p->curr.noMask= 1;
+      p->MOV(r1, GenRegister::immdf(1.0d));
+      p->MOV(r0, GenRegister::immdf(0.0d));
+
+      for (int i = 0; i < (simdWidth == 16 ? 4 : 2); i++) {
+        p->curr.predicate = GEN_PREDICATE_NONE;
+        p8->MATH_WITH_ACC(r8, GEN8_MATH_FUNCTION_INVM, r6, r7, GEN8_INSN_ACC2, GEN8_INSN_NOACC, GEN8_INSN_NOACC);
+        p->curr.useFlag(insn.state.flag, insn.state.subFlag);
+        p->curr.predicate = GEN_PREDICATE_NORMAL;
+        p->curr.inversePredicate = 1;
+        p->curr.noMask= 0;
+        p8->MADM(r9, r0, r6, r8, GEN8_INSN_ACC3, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC2);
+        p8->MADM(r10, r1, GenRegister::negate(r7), r8, GEN8_INSN_ACC4, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC2);
+        p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC5, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC3);
+        p8->MADM(r12, r8, r10, r8, GEN8_INSN_ACC6, GEN8_INSN_ACC2, GEN8_INSN_ACC4, GEN8_INSN_ACC2);
+        p8->MADM(r13, r1, GenRegister::negate(r7), r12, GEN8_INSN_ACC7, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC6);
+        p8->MADM(r8, r8, r10, r12, GEN8_INSN_ACC8, GEN8_INSN_ACC2, GEN8_INSN_ACC4, GEN8_INSN_ACC6);
+        p8->MADM(r9, r9, r11, r12, GEN8_INSN_ACC9, GEN8_INSN_ACC3, GEN8_INSN_ACC5, GEN8_INSN_ACC6);
+        p8->MADM(r12, r12, r8, r13, GEN8_INSN_ACC2, GEN8_INSN_ACC6, GEN8_INSN_ACC8, GEN8_INSN_ACC7);
+        p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC3, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC9);
+
+        p8->MADM(r8, r9, r11, r12, GEN8_INSN_NOACC, GEN8_INSN_ACC9, GEN8_INSN_ACC3, GEN8_INSN_ACC2);
+
+        r6 = GenRegister::offset(r6, 1);
+        r7 = GenRegister::offset(r7, 1);
+        r8 = GenRegister::offset(r8, 1);
+      }
+    } p->pop();
+  }
+
   void Gen8Context::setA0Content(uint16_t new_a0[16], uint16_t max_offset, int 
sz) {
 if (sz == 0)
   sz = 16;
diff --git a/backend/src/backend/gen8_context.hpp 
b/backend/src/backend/gen8_context.hpp
index 84508e9..386f7f3 100644
--- 
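
The ten steps listed in the macro comment read as a Newton-Raphson style refinement of an initial reciprocal estimate. A minimal scalar sketch of that data flow follows, under stated assumptions: std::fma stands in for madm (dst = src0 + src1 * src2, as the step comments imply), and a reciprocal rounded to float stands in for the INVM seed y0. The name fdiv_sketch is hypothetical. The sketch does not model the acc2..acc9 accumulators, the flag written by the math instruction, or rounding-mode changes, and it is only meaningful when b fits in float range.

```cpp
#include <cassert>
#include <cmath>

// Scalar model of the ten-step sequence from the macro comment.
// std::fma(x, y, z) computes x*y + z with a single rounding; this sketch uses
// it to model "madm dst, s0, s1, s2" as dst = s0 + s1*s2.
double fdiv_sketch(double a, double b) {
  // The sketch handles neither special values nor a zero divisor.
  assert(std::isfinite(a) && std::isfinite(b) && b != 0.0);

  // Assumed stand-in for the INVM seed: a low-precision reciprocal of b.
  const double y0 = static_cast<double>(1.0f / static_cast<float>(b));

  const double q0 = std::fma(a, y0, 0.0);   // Step(1):  q0 = a*y0
  const double e0 = std::fma(-b, y0, 1.0);  // Step(2):  e0 = 1 - b*y0
  const double r0 = std::fma(-b, q0, a);    // Step(3):  r0 = a - b*q0
  const double y1 = std::fma(e0, y0, y0);   // Step(4):  y1 = y0 + e0*y0
  const double e1 = std::fma(-b, y1, 1.0);  // Step(5):  e1 = 1 - b*y1
  const double y2 = std::fma(e0, y1, y0);   // Step(6):  y2 = y0 + e0*y1
  const double q1 = std::fma(r0, y1, q0);   // Step(7):  q1 = q0 + r0*y1
  const double y3 = std::fma(e1, y2, y1);   // Step(8):  y3 = y1 + e1*y2
  const double r1 = std::fma(-b, q1, a);    // Step(9):  r1 = a - b*q1
  return std::fma(r1, y3, q1);              // Step(10): q  = q1 + r1*y3
}
```

Calling fdiv_sketch(6.0, 2.0) walks through the same sequence with an exact seed (y0 = 0.5), so every error term is zero and the quotient is simply q0.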

Re: [Beignet] [PATCH 6/8] Backend: Implement FDIV64 on BDW.

2015-09-15 Thread He Junyan
On Tue, Sep 15, 2015 at 06:00:57AM -0700, Matt Turner wrote:
> Date: Tue, 15 Sep 2015 06:00:57 -0700
> From: Matt Turner 
> To: "junyan.he" 
> Cc: "beignet@lists.freedesktop.org" 
> Subject: Re: [Beignet] [PATCH 6/8] Backend: Implement FDIV64 on BDW.
> 
> On Tue, Sep 15, 2015 at 4:15 AM,   wrote:
> > From: Junyan He 
> >
> > According to the document, we use a set of instructions
> > to implement double type division.
> >
> > Signed-off-by: Junyan He 
> > ---
> >  backend/src/backend/gen8_context.cpp | 68 
> > 
> >  backend/src/backend/gen8_context.hpp |  2 ++
> >  2 files changed, 70 insertions(+)
> >
> > diff --git a/backend/src/backend/gen8_context.cpp 
> > b/backend/src/backend/gen8_context.cpp
> > index b497ee5..f465832 100644
> > --- a/backend/src/backend/gen8_context.cpp
> > +++ b/backend/src/backend/gen8_context.cpp
> > @@ -924,6 +924,74 @@ namespace gbe
> >  this->unpackLongVec(src, dst, p->curr.execWidth);
> >}
> >
> > +  void Gen8Context::emitF64DIVInstruction(const SelectionInstruction 
> > &insn) {
> > +/* Macro for Double Precision IEEE Compliant fdiv
> > +
> > +   Set Rounding Mode in CR to RNE
> > +   GRF are initialized: r0 = 0, r6 = a, r7 = b, r1 = 1
> > +   The default data type for the macro is :df
> > +
> > +   math.eo.f0.0 (4) r8.acc2 r6.noacc r7.noacc 0xE
> > +   (-f0.0) if
> > +   madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2   // Step(1), q0=a*y0
> > +   madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2 // Step(2), 
> > e0=(1-b*y0)
> > +   madm (4) r11.acc5 r6.noacc -r7.noacc r9.acc3 // Step(3), 
> > r0=a-b*q0
> > +   madm (4) r12.acc6 r8.acc2 r10.acc4 r8.acc2   // Step(4), 
> > y1=y0+e0*y0
> > +   madm (4) r13.acc7 r1.noacc -r7.noacc r12.acc6// Step(5), 
> > e1=(1-b*y1)
> > +   madm (4) r8.acc8 r8.acc2 r10.acc4 r12.acc6   // Step(6), 
> > y2=y0+e0*y1
> > +   madm (4) r9.acc9 r9.acc3 r11.acc5 r12.acc6   // Step(7), 
> > q1=q0+r0*y1
> > +   madm (4) r12.acc2 r12.acc6 r8.acc8 r13.acc7  // Step(8), 
> > y3=y1+e1*y2
> > +   madm (4) r11.acc3 r6.noacc -r7.noacc r9.acc9 // Step(9), 
> > r1=a-b*q1
> > +
> > +   Change Rounding Mode in CR if required
> > +   Implicit Accumulator for destination is NULL
> > +
> > +   madm (4) r8.noacc r9.acc9 r11.acc3 r12.acc2  // Step(10), 
> > q=q1+r1*y3
> > +   endif */
> 
> I don't see an IF or an ENDIF instruction emitted in the code below.
> Is that intentional, or am I misreading the code?
> 
Here we use f0.1 as the predicate on every instruction, like:
(-f0.1) madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2 
(-f0.1) madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2
.
I avoided IF/ENDIF here because we would have to count the instructions
inside the IF clause, which is inconvenient.
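
The trade-off described above can be sketched with a toy emitter. This is only an illustration, assuming (as the reply suggests) that an IF has to record how many instructions its clause contains. The Insn record and both emit functions are hypothetical and are not Beignet types.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instruction record used only for this illustration.
struct Insn {
  std::string text;  // assembly-like text
  int skipCount;     // for "if": number of instructions in the clause
};

// Variant A: predicate every instruction. Nothing has to be counted.
std::vector<Insn> emitPredicated(const std::vector<std::string>& body) {
  std::vector<Insn> out;
  for (const std::string& op : body)
    out.push_back(Insn{"(-f0.1) " + op, 0});
  return out;
}

// Variant B: wrap the body in if/endif. The clause length is only known once
// the body has been emitted, so the "if" record is patched afterwards.
std::vector<Insn> emitBranched(const std::vector<std::string>& body) {
  std::vector<Insn> out;
  const std::size_t ifIndex = out.size();
  out.push_back(Insn{"(-f0.1) if", 0});
  for (const std::string& op : body)
    out.push_back(Insn{op, 0});
  out[ifIndex].skipCount = static_cast<int>(out.size() - ifIndex - 1);
  out.push_back(Insn{"endif", 0});
  return out;
}
```

Variant A costs a predicate on each instruction but needs no bookkeeping, which matches the reason given for not emitting IF/ENDIF.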

> > +GenRegister r6 = GenRegister::retype(ra->genReg(insn.src(0)), 
> > GEN_TYPE_DF);
> > +GenRegister r7 = GenRegister::retype(ra->genReg(insn.src(1)), 
> > GEN_TYPE_DF);
> > +GenRegister r8 = GenRegister::retype(ra->genReg(insn.dst(0)), 
> > GEN_TYPE_DF);
> > +const GenRegister r0 = GenRegister::retype(ra->genReg(insn.dst(1)), 
> > GEN_TYPE_DF);
> > +const GenRegister r1 = GenRegister::retype(ra->genReg(insn.dst(2)), 
> > GEN_TYPE_DF);
> > +const GenRegister r9 = GenRegister::retype(ra->genReg(insn.dst(3)), 
> > GEN_TYPE_DF);
> > +const GenRegister r10 = GenRegister::retype(ra->genReg(insn.dst(4)), 
> > GEN_TYPE_DF);
> > +const GenRegister r11 = GenRegister::retype(ra->genReg(insn.dst(5)), 
> > GEN_TYPE_DF);
> > +const GenRegister r12 = GenRegister::retype(ra->genReg(insn.dst(6)), 
> > GEN_TYPE_DF);
> > +const GenRegister r13 = GenRegister::retype(ra->genReg(insn.dst(7)), 
> > GEN_TYPE_DF);
> > +Gen8Encoder *p8 = reinterpret_cast<Gen8Encoder *>(p);
> > +p->push(); {
> > +  p->curr.execWidth = 4;
> > +  p->curr.predicate = GEN_PREDICATE_NONE;
> > +  p->curr.noMask= 1;
> > +  p->MOV(r1, GenRegister::immdf(1.0d));
> > +  p->MOV(r0, GenRegister::immdf(0.0d));
> > +
> > +  for (int i = 0; i < (simdWidth == 16 ? 4 : 2); i++) {
> > +p->curr.predicate = GEN_PREDICATE_NONE;
> > +p8->MATH_WITH_ACC(r8, GEN8_MATH_FUNCTION_INVM, r6, r7, 
> > GEN8_INSN_ACC2, GEN8_INSN_NOACC, GEN8_INSN_NOACC);
> > +p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> > +p->curr.predicate = GEN_PREDICATE_NORMAL;
> > +p->curr.inversePredicate = 1;
> > +p->curr.noMask= 0;
> > +p8->MADM(r9, r0, r6, r8, GEN8_INSN_ACC3, GEN8_INSN_NOACC, 
> > GEN8_INSN_NOACC, GEN8_INSN_ACC2);
> > +p8->MADM(r10, r1, GenRegister::negate(r7), r8, GEN8_INSN_ACC4, 
> > GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC2);
> > +p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC5, 
> > 

Re: [Beignet] [PATCH 5/8] Backend: Add the MADM function to gen8 encoder.

2015-09-15 Thread He Junyan
On Tue, Sep 15, 2015 at 05:57:13AM -0700, Matt Turner wrote:
> Date: Tue, 15 Sep 2015 05:57:13 -0700
> From: Matt Turner 
> To: "junyan.he" 
> Cc: "beignet@lists.freedesktop.org" 
> Subject: Re: [Beignet] [PATCH 5/8] Backend: Add the MADM function to gen8
>  encoder.
> 
> On Tue, Sep 15, 2015 at 4:15 AM,   wrote:
> > From: Junyan He 
> >
> > Signed-off-by: Junyan He 
> > ---
> >  backend/src/backend/gen8_encoder.cpp | 56 
> > 
> >  backend/src/backend/gen8_encoder.hpp |  2 ++
> >  backend/src/backend/gen_defs.hpp |  2 ++
> >  3 files changed, 60 insertions(+)
> >
> > diff --git a/backend/src/backend/gen8_encoder.cpp 
> > b/backend/src/backend/gen8_encoder.cpp
> > index 0af27a3..002a8b5 100644
> > --- a/backend/src/backend/gen8_encoder.cpp
> > +++ b/backend/src/backend/gen8_encoder.cpp
> > @@ -591,4 +591,60 @@ namespace gbe
> >   this->setSrc0WithAcc(insn, src0, src0Acc);
> >   this->setSrc1WithAcc(insn, src1, src1Acc);
> >}
> > +
> > +  void Gen8Encoder::MADM(GenRegister dst, GenRegister src0, GenRegister 
> > src1, GenRegister src2,
> > +  uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t 
> > src2Acc)
> > +  {
> > +GenNativeInstruction *insn = this->next(GEN_OPCODE_MADM);
> > +Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
> > +assert(dst.file == GEN_GENERAL_REGISTER_FILE);
> > +assert(src0.file == GEN_GENERAL_REGISTER_FILE);
> > +assert(src1.file == GEN_GENERAL_REGISTER_FILE);
> > +assert(src2.file == GEN_GENERAL_REGISTER_FILE);
> > +assert(dst.hstride == GEN_HORIZONTAL_STRIDE_1 || dst.hstride == 
> > GEN_HORIZONTAL_STRIDE_0);
> > +assert(src0.type == GEN_TYPE_DF || src0.type == GEN_TYPE_F);
> > +assert(src0.type == dst.type);
> > +assert(src0.type == src1.type);
> > +assert(src0.type == src2.type);
> > +int32_t dataType = src0.type == GEN_TYPE_DF ? 3 : 0;
> > +
> > +this->setHeader(insn);
> > +gen8_insn->bits1.da3srcacc.dest_reg_nr = dst.nr;
> > +gen8_insn->bits1.da3srcacc.dest_subreg_nr = dst.subnr / 16;
> > +gen8_insn->bits1.da3srcacc.dst_specal_acc = dstAcc;
> > +gen8_insn->bits1.da3srcacc.src_type = dataType;
> > +gen8_insn->bits1.da3srcacc.dest_type = dataType;
> > +gen8_insn->header.access_mode = GEN_ALIGN_16;
> > +
> > +assert(src0.file == GEN_GENERAL_REGISTER_FILE);
> > +assert(src0.address_mode == GEN_ADDRESS_DIRECT);
> > +assert(src0.nr < 128);
> > +gen8_insn->bits2.da3srcacc.src0_specal_acc = src0Acc;
> > +gen8_insn->bits2.da3srcacc.src0_subreg_nr = src0.subnr / 4 ;
> > +gen8_insn->bits2.da3srcacc.src0_reg_nr = src0.nr;
> > +gen8_insn->bits1.da3srcacc.src0_abs = src0.absolute;
> > +gen8_insn->bits1.da3srcacc.src0_negate = src0.negation;
> > +gen8_insn->bits2.da3srcacc.src0_rep_ctrl = src0.vstride == 
> > GEN_VERTICAL_STRIDE_0;
> > +
> > +assert(src1.file == GEN_GENERAL_REGISTER_FILE);
> > +assert(src1.address_mode == GEN_ADDRESS_DIRECT);
> > +assert(src1.nr < 128);
> > +gen8_insn->bits2.da3srcacc.src1_specal_acc = src1Acc;
> > +gen8_insn->bits2.da3srcacc.src1_subreg_nr_low = (src1.subnr / 4) & 0x3;
> > +gen8_insn->bits3.da3srcacc.src1_subreg_nr_high = (src1.subnr / 4) >> 2;
> > +gen8_insn->bits2.da3srcacc.src1_rep_ctrl = src1.vstride == 
> > GEN_VERTICAL_STRIDE_0;
> > +gen8_insn->bits3.da3srcacc.src1_reg_nr = src1.nr;
> > +gen8_insn->bits1.da3srcacc.src1_abs = src1.absolute;
> > +gen8_insn->bits1.da3srcacc.src1_negate = src1.negation;
> > +
> > +assert(src2.file == GEN_GENERAL_REGISTER_FILE);
> > +assert(src2.address_mode == GEN_ADDRESS_DIRECT);
> > +assert(src2.nr < 128);
> > +gen8_insn->bits3.da3srcacc.src2_specal_acc = src2Acc;
> > +gen8_insn->bits3.da3srcacc.src2_subreg_nr = src2.subnr / 4;
> > +gen8_insn->bits3.da3srcacc.src2_rep_ctrl = src2.vstride == 
> > GEN_VERTICAL_STRIDE_0;
> > +gen8_insn->bits3.da3srcacc.src2_reg_nr = src2.nr;
> > +gen8_insn->bits1.da3srcacc.src2_abs = src2.absolute;
> > +gen8_insn->bits1.da3srcacc.src2_negate = src2.negation;
> > +  }
> >  } /* End of the name space. */
> > diff --git a/backend/src/backend/gen8_encoder.hpp 
> > b/backend/src/backend/gen8_encoder.hpp
> > index 53ec3d1..8e7939b 100644
> > --- a/backend/src/backend/gen8_encoder.hpp
> > +++ b/backend/src/backend/gen8_encoder.hpp
> > @@ -74,6 +74,8 @@ namespace gbe
> >
> >  void MATH_WITH_ACC(GenRegister dst, uint32_t function, GenRegister 
> > src0, GenRegister src1,
> > uint32_t dstAcc, uint32_t src0Acc, uint32_t 
> > src1Acc);
> > +void MADM(GenRegister dst, GenRegister src0, GenRegister src1, 
> > GenRegister src2,
> > +  uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, 
> > uint32_t src2Acc);
> >};
> >  }
> >  #endif /* __GBE_GEN8_ENCODER_HPP__ */
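
The quoted MADM encoder stores the src1 sub-register index in two pieces: the low two bits in bits2 and the remaining bits in bits3. A minimal sketch of that split and its inverse, assuming subnr is a byte offset inside a 32-byte register and a multiple of 4. SplitSubreg, splitSubreg and joinSubreg are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the two halves of the src1 sub-register index.
struct SplitSubreg {
  std::uint32_t low;   // goes to src1_subreg_nr_low  (2 bits)
  std::uint32_t high;  // goes to src1_subreg_nr_high (remaining bits)
};

// Mirrors "(src1.subnr / 4) & 0x3" and "(src1.subnr / 4) >> 2" from the patch.
SplitSubreg splitSubreg(std::uint32_t subnrBytes) {
  assert(subnrBytes % 4 == 0 && subnrBytes < 32);  // assumed dword-aligned
  const std::uint32_t dwordIndex = subnrBytes / 4;
  return SplitSubreg{dwordIndex & 0x3u, dwordIndex >> 2};
}

// Inverse operation, as a disassembler would need it: back to a byte offset.
std::uint32_t joinSubreg(const SplitSubreg& s) {
  return ((s.high << 2) | s.low) * 4;
}
```

For example, a byte offset of 20 is dword index 5, which splits into low = 1 and high = 1.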

Re: [Beignet] [PATCH 3/8] Backend: Add gen8 instruction field for special accumulator.

2015-09-15 Thread Matt Turner
On Tue, Sep 15, 2015 at 4:15 AM,   wrote:
> From: Junyan He 
>
> The madm and invm function need to set accumulator id in the
> instruction. On BDW, the write mask of the dst and channel
> mask of src are reinterpreted for acc2~acc9 selection.
>
> Signed-off-by: Junyan He 
> ---
>  backend/src/backend/gen8_instruction.hpp | 86 
> 
>  1 file changed, 86 insertions(+)
>
> diff --git a/backend/src/backend/gen8_instruction.hpp 
> b/backend/src/backend/gen8_instruction.hpp
> index 5cf1032..2aa5bf7 100644
> --- a/backend/src/backend/gen8_instruction.hpp
> +++ b/backend/src/backend/gen8_instruction.hpp
> @@ -135,6 +135,22 @@ union Gen8NativeInstruction
>  uint32_t dest_address_mode:1;
>} ia16;
>
> +  struct { // The sub reg field is reinterpreted as accumulator selector.
> +uint32_t flag_sub_reg_nr:1;
> +uint32_t flag_reg_nr:1;
> +uint32_t mask_control:1;
> +uint32_t dest_reg_file:2;
> +uint32_t dest_reg_type:4;
> +uint32_t src0_reg_file:2;
> +uint32_t src0_reg_type:4;
> +uint32_t pad:1;
> +uint32_t dst_specal_acc:4;

s/specal/special/ throughout this patch.
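
To make the reinterpretation concrete, here is a minimal sketch of reading the 4-bit destination selector and mapping it to the suffixes from the disassembler's special_acc table. The bit position is an assumption: it is derived by summing the widths of the fields declared before dst_specal_acc in the quoted struct (1+1+1+2+4+2+4+1 = 16) and assumes bit-fields are allocated from the least significant bit. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Suffixes copied from the special_acc table in the disassembler patch.
static const char* const kSpecialAccNames[9] = {
  ".acc2", ".acc3", ".acc4", ".acc5", ".acc6",
  ".acc7", ".acc8", ".acc9", ".noacc",
};

// Assumed bit offset of the selector within the dword (see the note above).
static const unsigned kDstSpecialAccShift = 16;

// Extract the 4-bit selector that occupies the former write-mask position.
std::uint32_t dstSpecialAcc(std::uint32_t bits1) {
  return (bits1 >> kDstSpecialAccShift) & 0xFu;
}

// Map a selector to its printed suffix; values 9..15 have no table entry.
const char* specialAccName(std::uint32_t selector) {
  assert(selector < 9);
  return kSpecialAccNames[selector];
}
```

With these helpers, a dword whose selector field holds 0x8 prints as ".noacc", and a selector of 0 prints as ".acc2".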


Re: [Beignet] [PATCH 5/8] Backend: Add the MADM function to gen8 encoder.

2015-09-15 Thread Matt Turner
On Tue, Sep 15, 2015 at 4:15 AM,   wrote:
> From: Junyan He 
>
> Signed-off-by: Junyan He 
> ---
>  backend/src/backend/gen8_encoder.cpp | 56 
> 
>  backend/src/backend/gen8_encoder.hpp |  2 ++
>  backend/src/backend/gen_defs.hpp |  2 ++
>  3 files changed, 60 insertions(+)
>
> diff --git a/backend/src/backend/gen8_encoder.cpp 
> b/backend/src/backend/gen8_encoder.cpp
> index 0af27a3..002a8b5 100644
> --- a/backend/src/backend/gen8_encoder.cpp
> +++ b/backend/src/backend/gen8_encoder.cpp
> @@ -591,4 +591,60 @@ namespace gbe
>   this->setSrc0WithAcc(insn, src0, src0Acc);
>   this->setSrc1WithAcc(insn, src1, src1Acc);
>}
> +
> +  void Gen8Encoder::MADM(GenRegister dst, GenRegister src0, GenRegister 
> src1, GenRegister src2,
> +  uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t src2Acc)
> +  {
> +GenNativeInstruction *insn = this->next(GEN_OPCODE_MADM);
> +Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
> +assert(dst.file == GEN_GENERAL_REGISTER_FILE);
> +assert(src0.file == GEN_GENERAL_REGISTER_FILE);
> +assert(src1.file == GEN_GENERAL_REGISTER_FILE);
> +assert(src2.file == GEN_GENERAL_REGISTER_FILE);
> +assert(dst.hstride == GEN_HORIZONTAL_STRIDE_1 || dst.hstride == 
> GEN_HORIZONTAL_STRIDE_0);
> +assert(src0.type == GEN_TYPE_DF || src0.type == GEN_TYPE_F);
> +assert(src0.type == dst.type);
> +assert(src0.type == src1.type);
> +assert(src0.type == src2.type);
> +int32_t dataType = src0.type == GEN_TYPE_DF ? 3 : 0;
> +
> +this->setHeader(insn);
> +gen8_insn->bits1.da3srcacc.dest_reg_nr = dst.nr;
> +gen8_insn->bits1.da3srcacc.dest_subreg_nr = dst.subnr / 16;
> +gen8_insn->bits1.da3srcacc.dst_specal_acc = dstAcc;
> +gen8_insn->bits1.da3srcacc.src_type = dataType;
> +gen8_insn->bits1.da3srcacc.dest_type = dataType;
> +gen8_insn->header.access_mode = GEN_ALIGN_16;
> +
> +assert(src0.file == GEN_GENERAL_REGISTER_FILE);
> +assert(src0.address_mode == GEN_ADDRESS_DIRECT);
> +assert(src0.nr < 128);
> +gen8_insn->bits2.da3srcacc.src0_specal_acc = src0Acc;
> +gen8_insn->bits2.da3srcacc.src0_subreg_nr = src0.subnr / 4 ;
> +gen8_insn->bits2.da3srcacc.src0_reg_nr = src0.nr;
> +gen8_insn->bits1.da3srcacc.src0_abs = src0.absolute;
> +gen8_insn->bits1.da3srcacc.src0_negate = src0.negation;
> +gen8_insn->bits2.da3srcacc.src0_rep_ctrl = src0.vstride == 
> GEN_VERTICAL_STRIDE_0;
> +
> +assert(src1.file == GEN_GENERAL_REGISTER_FILE);
> +assert(src1.address_mode == GEN_ADDRESS_DIRECT);
> +assert(src1.nr < 128);
> +gen8_insn->bits2.da3srcacc.src1_specal_acc = src1Acc;
> +gen8_insn->bits2.da3srcacc.src1_subreg_nr_low = (src1.subnr / 4) & 0x3;
> +gen8_insn->bits3.da3srcacc.src1_subreg_nr_high = (src1.subnr / 4) >> 2;
> +gen8_insn->bits2.da3srcacc.src1_rep_ctrl = src1.vstride == 
> GEN_VERTICAL_STRIDE_0;
> +gen8_insn->bits3.da3srcacc.src1_reg_nr = src1.nr;
> +gen8_insn->bits1.da3srcacc.src1_abs = src1.absolute;
> +gen8_insn->bits1.da3srcacc.src1_negate = src1.negation;
> +
> +assert(src2.file == GEN_GENERAL_REGISTER_FILE);
> +assert(src2.address_mode == GEN_ADDRESS_DIRECT);
> +assert(src2.nr < 128);
> +gen8_insn->bits3.da3srcacc.src2_specal_acc = src2Acc;
> +gen8_insn->bits3.da3srcacc.src2_subreg_nr = src2.subnr / 4;
> +gen8_insn->bits3.da3srcacc.src2_rep_ctrl = src2.vstride == 
> GEN_VERTICAL_STRIDE_0;
> +gen8_insn->bits3.da3srcacc.src2_reg_nr = src2.nr;
> +gen8_insn->bits1.da3srcacc.src2_abs = src2.absolute;
> +gen8_insn->bits1.da3srcacc.src2_negate = src2.negation;
> +  }
>  } /* End of the name space. */
> diff --git a/backend/src/backend/gen8_encoder.hpp 
> b/backend/src/backend/gen8_encoder.hpp
> index 53ec3d1..8e7939b 100644
> --- a/backend/src/backend/gen8_encoder.hpp
> +++ b/backend/src/backend/gen8_encoder.hpp
> @@ -74,6 +74,8 @@ namespace gbe
>
>  void MATH_WITH_ACC(GenRegister dst, uint32_t function, GenRegister src0, 
> GenRegister src1,
> uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc);
> +void MADM(GenRegister dst, GenRegister src0, GenRegister src1, 
> GenRegister src2,
> +  uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t 
> src2Acc);
>};
>  }
>  #endif /* __GBE_GEN8_ENCODER_HPP__ */
> diff --git a/backend/src/backend/gen_defs.hpp 
> b/backend/src/backend/gen_defs.hpp
> index a1bd8dd..1b550ac 100644
> --- a/backend/src/backend/gen_defs.hpp
> +++ b/backend/src/backend/gen_defs.hpp
> @@ -174,6 +174,8 @@ enum opcode {
>GEN_OPCODE_LINE = 89,
>GEN_OPCODE_PLN = 90,
>GEN_OPCODE_MAD = 91,
> +  GEN_OPCODE_LRP = 92,

Unrelated to the main purpose of the patch: Do I understand correctly
that Beignet does not emit the LRP instruction?

If not, I'm curious why not? It maps pretty well to the mix() function
(just reverse the 

Re: [Beignet] [PATCH 6/8] Backend: Implement FDIV64 on BDW.

2015-09-15 Thread Matt Turner
On Tue, Sep 15, 2015 at 4:15 AM,   wrote:
> From: Junyan He 
>
> According to the document, we use a set of instructions
> to implement double type division.
>
> Signed-off-by: Junyan He 
> ---
>  backend/src/backend/gen8_context.cpp | 68 
> 
>  backend/src/backend/gen8_context.hpp |  2 ++
>  2 files changed, 70 insertions(+)
>
> diff --git a/backend/src/backend/gen8_context.cpp 
> b/backend/src/backend/gen8_context.cpp
> index b497ee5..f465832 100644
> --- a/backend/src/backend/gen8_context.cpp
> +++ b/backend/src/backend/gen8_context.cpp
> @@ -924,6 +924,74 @@ namespace gbe
>  this->unpackLongVec(src, dst, p->curr.execWidth);
>}
>
> +  void Gen8Context::emitF64DIVInstruction(const SelectionInstruction &insn) {
> +/* Macro for Double Precision IEEE Compliant fdiv
> +
> +   Set Rounding Mode in CR to RNE
> +   GRF are initialized: r0 = 0, r6 = a, r7 = b, r1 = 1
> +   The default data type for the macro is :df
> +
> +   math.eo.f0.0 (4) r8.acc2 r6.noacc r7.noacc 0xE
> +   (-f0.0) if
> +   madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2   // Step(1), q0=a*y0
> +   madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2 // Step(2), 
> e0=(1-b*y0)
> +   madm (4) r11.acc5 r6.noacc -r7.noacc r9.acc3 // Step(3), r0=a-b*q0
> +   madm (4) r12.acc6 r8.acc2 r10.acc4 r8.acc2   // Step(4), 
> y1=y0+e0*y0
> +   madm (4) r13.acc7 r1.noacc -r7.noacc r12.acc6// Step(5), 
> e1=(1-b*y1)
> +   madm (4) r8.acc8 r8.acc2 r10.acc4 r12.acc6   // Step(6), 
> y2=y0+e0*y1
> +   madm (4) r9.acc9 r9.acc3 r11.acc5 r12.acc6   // Step(7), 
> q1=q0+r0*y1
> +   madm (4) r12.acc2 r12.acc6 r8.acc8 r13.acc7  // Step(8), 
> y3=y1+e1*y2
> +   madm (4) r11.acc3 r6.noacc -r7.noacc r9.acc9 // Step(9), r1=a-b*q1
> +
> +   Change Rounding Mode in CR if required
> +   Implicit Accumulator for destination is NULL
> +
> +   madm (4) r8.noacc r9.acc9 r11.acc3 r12.acc2  // Step(10), 
> q=q1+r1*y3
> +   endif */

I don't see an IF or an ENDIF instruction emitted in the code below.
Is that intentional, or am I misreading the code?

> +GenRegister r6 = GenRegister::retype(ra->genReg(insn.src(0)), 
> GEN_TYPE_DF);
> +GenRegister r7 = GenRegister::retype(ra->genReg(insn.src(1)), 
> GEN_TYPE_DF);
> +GenRegister r8 = GenRegister::retype(ra->genReg(insn.dst(0)), 
> GEN_TYPE_DF);
> +const GenRegister r0 = GenRegister::retype(ra->genReg(insn.dst(1)), 
> GEN_TYPE_DF);
> +const GenRegister r1 = GenRegister::retype(ra->genReg(insn.dst(2)), 
> GEN_TYPE_DF);
> +const GenRegister r9 = GenRegister::retype(ra->genReg(insn.dst(3)), 
> GEN_TYPE_DF);
> +const GenRegister r10 = GenRegister::retype(ra->genReg(insn.dst(4)), 
> GEN_TYPE_DF);
> +const GenRegister r11 = GenRegister::retype(ra->genReg(insn.dst(5)), 
> GEN_TYPE_DF);
> +const GenRegister r12 = GenRegister::retype(ra->genReg(insn.dst(6)), 
> GEN_TYPE_DF);
> +const GenRegister r13 = GenRegister::retype(ra->genReg(insn.dst(7)), 
> GEN_TYPE_DF);
> +Gen8Encoder *p8 = reinterpret_cast<Gen8Encoder *>(p);
> +p->push(); {
> +  p->curr.execWidth = 4;
> +  p->curr.predicate = GEN_PREDICATE_NONE;
> +  p->curr.noMask= 1;
> +  p->MOV(r1, GenRegister::immdf(1.0d));
> +  p->MOV(r0, GenRegister::immdf(0.0d));
> +
> +  for (int i = 0; i < (simdWidth == 16 ? 4 : 2); i++) {
> +p->curr.predicate = GEN_PREDICATE_NONE;
> +p8->MATH_WITH_ACC(r8, GEN8_MATH_FUNCTION_INVM, r6, r7, 
> GEN8_INSN_ACC2, GEN8_INSN_NOACC, GEN8_INSN_NOACC);
> +p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> +p->curr.predicate = GEN_PREDICATE_NORMAL;
> +p->curr.inversePredicate = 1;
> +p->curr.noMask= 0;
> +p8->MADM(r9, r0, r6, r8, GEN8_INSN_ACC3, GEN8_INSN_NOACC, 
> GEN8_INSN_NOACC, GEN8_INSN_ACC2);
> +p8->MADM(r10, r1, GenRegister::negate(r7), r8, GEN8_INSN_ACC4, 
> GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC2);
> +p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC5, 
> GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC3);
> +p8->MADM(r12, r8, r10, r8, GEN8_INSN_ACC6, GEN8_INSN_ACC2, 
> GEN8_INSN_ACC4, GEN8_INSN_ACC2);
> +p8->MADM(r13, r1, GenRegister::negate(r7), r12, GEN8_INSN_ACC7, 
> GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC6);
> +p8->MADM(r8, r8, r10, r12, GEN8_INSN_ACC8, GEN8_INSN_ACC2, 
> GEN8_INSN_ACC4, GEN8_INSN_ACC6);
> +p8->MADM(r9, r9, r11, r12, GEN8_INSN_ACC9, GEN8_INSN_ACC3, 
> GEN8_INSN_ACC5, GEN8_INSN_ACC6);
> +p8->MADM(r12, r12, r8, r13, GEN8_INSN_ACC2, GEN8_INSN_ACC6, 
> GEN8_INSN_ACC8, GEN8_INSN_ACC7);
> +p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC3, 
> GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC9);
> +
> +p8->MADM(r8, r9, r11, r12, GEN8_INSN_NOACC, GEN8_INSN_ACC9, 
> GEN8_INSN_ACC3, GEN8_INSN_ACC2);
> +
> +