Re: [Beignet] [PATCH v3 3/3] add utest for creating 2d image from buffer.
2 comments inline, thanks.

-----Original Message-----
From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of xionghu@intel.com
Sent: Wednesday, September 09, 2015 1:44 PM
To: beignet@lists.freedesktop.org
Cc: Luo, Xionghu
Subject: [Beignet] [PATCH v3 3/3] add utest for creating 2d image from buffer.

+  OCL_CALL (clGetDeviceInfo, device, CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT, 0, 0, &param_value_size);
+  size_t base_address_alignment = 0;
+  OCL_CALL (clGetDeviceInfo, device, CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT, param_value_size, &base_address_alignment, &param_value_size);

[Yejun] The proper usage is:
OCL_CALL (clGetDeviceInfo, device, CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT, sizeof(base_address_alignment), &base_address_alignment, NULL);

+  // Setup kernel and images
+  size_t buffer_sz = sizeof(uint32_t) * w * h;

[Yejun] It is better to query IMAGE_PITCH_ALIGNMENT and align w to it.

___
Beignet mailing list
Beignet@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/beignet
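The second review comment amounts to rounding the image width up to the device's pitch alignment before sizing the buffer. A minimal sketch, assuming the alignment has already been queried with clGetDeviceInfo and is expressed in pixels; `align_up` is a hypothetical helper name, not something from the utest:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round `value` up to the next multiple of `alignment`.
// An alignment of 0 is treated here as "no constraint" (assumption).
inline std::size_t align_up(std::size_t value, std::size_t alignment) {
  if (alignment == 0) return value;
  const std::size_t rem = value % alignment;
  return rem == 0 ? value : value + (alignment - rem);
}
```

With such a helper the buffer size would be computed from the aligned width, for example `sizeof(uint32_t) * align_up(w, pitch_alignment) * h`, where `pitch_alignment` stands for the queried value.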
[Beignet] [PATCH] GBE: implement pre-register-allocation instruction scheduling.
Finding an instruction schedule that achieves the theoretical minimum number of registers required in a basic block is an NP-hard problem, so we have to use a heuristic to simplify the algorithm. Many research results indicate that bottom-up list scheduling is much better than the top-down method in terms of register pressure. I chose one such research paper as our target:

"Register-Sensitive Selection, Duplication, and Sequencing of Instructions"

It uses bottom-up list scheduling with a Sethi-Ullman label as the heuristic number. As we do cycle-aware scheduling after register allocation, we don't need to bother with cycle-related heuristics here, so I skipped the EST computation and usage part of the algorithm.

It turns out this algorithm works well. It reduces the register spilling in clBlas's sgemmBlock kernel from 83+ to only 20.

This scheduling method seems to lower the ILP (instruction-level parallelism), but that is not a big issue: the following register allocation stage allocates as many different registers as possible, and the post-allocation instruction scheduling then tries to recover as much ILP as possible.

Signed-off-by: Zhigang Gong
---
 backend/src/backend/gen_insn_scheduling.cpp | 137 +++-
 1 file changed, 116 insertions(+), 21 deletions(-)

diff --git a/backend/src/backend/gen_insn_scheduling.cpp b/backend/src/backend/gen_insn_scheduling.cpp
index 358a2ce..f4f1e70 100644
--- a/backend/src/backend/gen_insn_scheduling.cpp
+++ b/backend/src/backend/gen_insn_scheduling.cpp
@@ -41,26 +41,29 @@
  * ==
  *
  * We try to limit the register pressure.
- * Well, this is a hard problem and we have a decent strategy now that we called
- * "zero cycled LIFO scheduling".
- * We use a local forward list scheduling and we schedule the instructions in a
- * LIFO order i.e. as a stack. Basically, we take the most recent instruction
- * and schedule it right away. Obviously we ignore completely the real latencies
- * and throuputs and just simulate instructions that are issued and completed in
- * zero cycle. For the complex kernels we already have (like menger sponge),
- * this provides a pretty good strategy enabling SIMD16 code generation where
- * when scheduling is deactivated, even SIMD8 fails
  *
- * One may argue that this strategy is bad, latency wise. This is not true since
- * the register allocator will anyway try to burn as many registers as possible.
- * So, there is still opportunities to schedule after register allocation.
+ * To find out an instruction scheduling policy to achieve the theoretical minimum
+ * registers required in a basic block is a NP problem. We have to use some heuristic
+ * factor to simplify the algorithm. There are many researchs which indicate a
+ * bottom-up list scheduling is much better than the top-down method in turns of
+ * register pressure. I choose one of such research paper as our target. The paper
+ * is as below:
  *
- * Our idea seems to work decently. There is however a strong research article
- * that is able to near-optimally reschudle the instructions to minimize
- * register use. This is:
+ * "Register-Sensitive Selection, Duplication, and Sequencing of Instructions"
+ * It use the bottom-up list scheduling with a Sethi-Ullman label as an
+ * heuristic number. As we will do cycle awareness scheduling after the register
+ * allocation, we don't need to bother with cycle related heuristic number here.
+ * I just skipped the EST computing and usage part in the algorithm.
  *
- * "Minimum Register Instruction Sequence Problem: Revisiting Optimal Code
- * Generation for DAGs"
+ * It turns out this algorithm works well. It could reduce the register spilling
+ * in clBlas's sgemmBlock kernel from 83+ to only 20.
+ *
+ * Although this scheduling method seems to be lowering the ILP(instruction level parallism).
+ * It's not a big issue, because we will allocate as much as possible different registers
+ * in the following register allocation stage, and we will do a after allocation
+ * instruction scheduling which will try to get as much ILP as possible.
+ *
+ * FIXME: we only need to do this scheduling when a BB is really under high register pressure.
  *
  * After the register allocation
  * ==
@@ -114,7 +117,7 @@ namespace gbe
   struct ScheduleDAGNode
   {
     INLINE ScheduleDAGNode(SelectionInstruction &insn) :
-      insn(insn), refNum(0), retiredCycle(0), preRetired(false), readDistance(0x7fff) {}
+      insn(insn), refNum(0), depNum(0), retiredCycle(0), preRetired(false), readDistance(0x7fff) {}
     bool dependsOn(ScheduleDAGNode *node) const {
       GBE_ASSERT(node != NULL);
       for (auto child : node->children)
@@ -128,6 +131,10 @@ namespace gbe
     SelectionInstruction &insn;
     /*! Number of nodes that point to us
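For readers unfamiliar with the Sethi-Ullman label mentioned in the commit message, here is a minimal sketch of the classic labeling on a binary expression tree. It is not the patch's implementation, which works on the scheduling DAG; `Expr`, `leaf`, `node` and `sethi_ullman_label` are hypothetical names, and the sketch assumes the convention that every leaf costs one register.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical expression tree: a node with no children is a leaf.
struct Expr {
  std::unique_ptr<Expr> left;
  std::unique_ptr<Expr> right;
};

inline std::unique_ptr<Expr> leaf() { return std::make_unique<Expr>(); }

inline std::unique_ptr<Expr> node(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
  auto n = std::make_unique<Expr>();
  n->left = std::move(l);
  n->right = std::move(r);
  return n;
}

// Classic Sethi-Ullman number: the minimum number of registers needed to
// evaluate the subtree without spilling, assuming each leaf needs one register.
inline int sethi_ullman_label(const Expr *e) {
  if (!e->left && !e->right) return 1;                   // leaf
  if (!e->right) return sethi_ullman_label(e->left.get());   // unary node
  if (!e->left) return sethi_ullman_label(e->right.get());   // unary node
  const int l = sethi_ullman_label(e->left.get());
  const int r = sethi_ullman_label(e->right.get());
  // Equal needs force one extra register to hold the first result;
  // otherwise the more demanding child is evaluated first.
  return l == r ? l + 1 : std::max(l, r);
}
```

A scheduler that prefers the operand with the larger label first keeps fewer values live at once, which is the intuition behind using the label as a priority in list scheduling.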
[Beignet] [PATCH 2/8] Backend: Add FDIV64 function for gen_insn_selection.
From: Junyan He

Signed-off-by: Junyan He
---
 backend/src/backend/gen_context.cpp                 |  4 +++
 backend/src/backend/gen_context.hpp                 |  1 +
 .../src/backend/gen_insn_gen7_schedule_info.hxx     |  1 +
 backend/src/backend/gen_insn_selection.cpp          | 41 +-
 backend/src/backend/gen_insn_selection.hxx          |  1 +
 5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/backend/src/backend/gen_context.cpp b/backend/src/backend/gen_context.cpp
index 25fdf08..9e2fd03 100644
--- a/backend/src/backend/gen_context.cpp
+++ b/backend/src/backend/gen_context.cpp
@@ -1679,6 +1679,10 @@ namespace gbe
     }
   }

+  void GenContext::emitF64DIVInstruction(const SelectionInstruction &insn) {
+    GBE_ASSERT(0); // No support for double on Gen7
+  }
+
   void GenContext::emitTernaryInstruction(const SelectionInstruction &insn) {
     const GenRegister dst = ra->genReg(insn.dst(0));
     const GenRegister src0 = ra->genReg(insn.src(0));
diff --git a/backend/src/backend/gen_context.hpp b/backend/src/backend/gen_context.hpp
index 34f9293..57eb0a6 100644
--- a/backend/src/backend/gen_context.hpp
+++ b/backend/src/backend/gen_context.hpp
@@ -173,6 +173,7 @@ namespace gbe
     void emitGetImageInfoInstruction(const SelectionInstruction &insn);
     virtual void emitI64MULInstruction(const SelectionInstruction &insn);
     virtual void emitI64DIVREMInstruction(const SelectionInstruction &insn);
+    virtual void emitF64DIVInstruction(const SelectionInstruction &insn);
     void scratchWrite(const GenRegister header, uint32_t offset, uint32_t reg_num, uint32_t reg_type, uint32_t channel_mode);
     void scratchRead(const GenRegister dst, const GenRegister header, uint32_t offset, uint32_t reg_num, uint32_t reg_type, uint32_t channel_mode);
     unsigned beforeMessage(const SelectionInstruction &insn, GenRegister bti, GenRegister flagTemp, unsigned desc);
diff --git a/backend/src/backend/gen_insn_gen7_schedule_info.hxx b/backend/src/backend/gen_insn_gen7_schedule_info.hxx
index d073770..9b60c17 100644
--- a/backend/src/backend/gen_insn_gen7_schedule_info.hxx
+++ b/backend/src/backend/gen_insn_gen7_schedule_info.hxx
@@ -43,3 +43,4 @@ DECL_GEN7_SCHEDULE(Atomic, 80,1,1)
 DECL_GEN7_SCHEDULE(I64MUL, 20,40, 20)
 DECL_GEN7_SCHEDULE(I64SATADD, 20,40, 20)
 DECL_GEN7_SCHEDULE(I64SATSUB, 20,40, 20)
+DECL_GEN7_SCHEDULE(F64DIV, 20,40, 20)
diff --git a/backend/src/backend/gen_insn_selection.cpp b/backend/src/backend/gen_insn_selection.cpp
index ab00269..eaf56f9 100644
--- a/backend/src/backend/gen_insn_selection.cpp
+++ b/backend/src/backend/gen_insn_selection.cpp
@@ -361,8 +361,10 @@ namespace gbe
     bool has32X32Mul() const { return bHas32X32Mul; }
     void setHas32X32Mul(bool b) { bHas32X32Mul = b; }
     bool hasLongType() const { return bHasLongType; }
+    bool hasDoubleType() const { return bHasDoubleType; }
     bool hasHalfType() const { return bHasHalfType; }
     void setHasLongType(bool b) { bHasLongType = b; }
+    void setHasDoubleType(bool b) { bHasDoubleType = b; }
     void setHasHalfType(bool b) { bHasHalfType = b; }
     bool hasLongRegRestrict() { return bLongRegRestrict; }
     void setLongRegRestrict(bool b) { bLongRegRestrict = b; }
@@ -669,6 +671,8 @@ namespace gbe
     void I64DIV(Reg dst, Reg src0, Reg src1, GenRegister *tmp, int tmp_int);
     /*! 64-bit integer remainder of division */
     void I64REM(Reg dst, Reg src0, Reg src1, GenRegister *tmp, int tmp_int);
+    /*! double division */
+    void F64DIV(Reg dst, Reg src0, Reg src1, GenRegister tmp[7]);
     /* common functions for both binary instruction and sel_cmp and compare instruction.
        It will handle the IMM or normal register assignment, and will try to avoid LOADI
        as much as possible. */
@@ -745,6 +749,7 @@ namespace gbe
     uint32_t currAuxLabel;
     bool bHas32X32Mul;
     bool bHasLongType;
+    bool bHasDoubleType;
     bool bHasHalfType;
     bool bLongRegRestrict;
     uint32_t ldMsgOrder;
@@ -788,7 +793,7 @@ namespace gbe
     curr(ctx.getSimdWidth()), file(ctx.getFunction().getRegisterFile()),
     maxInsnNum(ctx.getFunction().getLargestBlockSize()), dagPool(maxInsnNum),
     stateNum(0), vectorNum(0), bwdCodeGeneration(false), currAuxLabel(ctx.getFunction().labelNum()),
-    bHas32X32Mul(false), bHasLongType(false), bHasHalfType(false), bLongRegRestrict(false),
+    bHas32X32Mul(false), bHasLongType(false), bHasDoubleType(false), bHasHalfType(false), bLongRegRestrict(false),
     ldMsgOrder(LD_MSG_ORDER_IVB), slowByteGather(false)
   {
     const ir::Function = ctx.getFunction();
@@ -1618,6 +1623,15 @@ namespace gbe
     insn->dst(i + 1) = tmp[i];
   }

+  void Selection::Opaque::F64DIV(Reg dst, Reg src0, Reg src1, GenRegister tmp[7]) {
+    SelectionInstruction *insn = this->appendInsn(SEL_OP_F64DIV, 7 + 1, 2);
+
[Beignet] [PATCH 0/8] Implement double division on BDW
From: Junyan He

We use the macro:

  r0 = 0, r6 = a, r7 = b, r1 = 1
  math.eo.f0.0 (4) r8.acc2 r6.noacc r7.noacc 0xE
  (-f0.0) if
  madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2      // Step(1), q0=a*y0
  madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2    // Step(2), e0=(1-b*y0)
  madm (4) r11.acc5 r6.noacc -r7.noacc r9.acc3    // Step(3), r0=a-b*q0
  madm (4) r12.acc6 r8.acc2 r10.acc4 r8.acc2      // Step(4), y1=y0+e0*y0
  madm (4) r13.acc7 r1.noacc -r7.noacc r12.acc6   // Step(5), e1=(1-b*y1)
  madm (4) r8.acc8 r8.acc2 r10.acc4 r12.acc6      // Step(6), y2=y0+e0*y1
  madm (4) r9.acc9 r9.acc3 r11.acc5 r12.acc6      // Step(7), q1=q0+r0*y1
  madm (4) r12.acc2 r12.acc6 r8.acc8 r13.acc7     // Step(8), y3=y1+e1*y2
  madm (4) r11.acc3 r6.noacc -r7.noacc r9.acc9    // Step(9), r1=a-b*q1
  madm (4) r8.noacc r9.acc9 r11.acc3 r12.acc2     // Step(10), q=q1+r1*y3
  endif

to implement high-precision double division on BDW.

Signed-off-by: Junyan He
---
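The ten madm steps form a Newton-Raphson style refinement of a reciprocal estimate and of the quotient. A minimal host-side sketch of the same arithmetic, assuming madm computes a fused src0 + src1 * src2 and using a single-precision reciprocal as a stand-in for the hardware's initial estimate; `madm` and `fdiv_macro_model` are hypothetical names, and the model ignores the extended accumulator range and the predicated special cases:

```cpp
#include <cassert>
#include <cmath>

// Model of "madm dst src0 src1 src2" as a fused dst = src0 + src1 * src2.
inline double madm(double src0, double src1, double src2) {
  return std::fma(src1, src2, src0);
}

// Follows Step(1)..Step(10) of the macro for finite, non-zero, normal inputs.
inline double fdiv_macro_model(double a, double b) {
  // Assumption: a low-precision seed stands in for the math instruction's y0.
  const double y0 = static_cast<double>(1.0f / static_cast<float>(b));
  const double q0 = madm(0.0, a, y0);   // Step(1),  q0 = a*y0
  const double e0 = madm(1.0, -b, y0);  // Step(2),  e0 = 1 - b*y0
  const double r0 = madm(a, -b, q0);    // Step(3),  r0 = a - b*q0
  const double y1 = madm(y0, e0, y0);   // Step(4),  y1 = y0 + e0*y0
  const double e1 = madm(1.0, -b, y1);  // Step(5),  e1 = 1 - b*y1
  const double y2 = madm(y0, e0, y1);   // Step(6),  y2 = y0 + e0*y1
  const double q1 = madm(q0, r0, y1);   // Step(7),  q1 = q0 + r0*y1
  const double y3 = madm(y1, e1, y2);   // Step(8),  y3 = y1 + e1*y2
  const double r1 = madm(a, -b, q1);    // Step(9),  r1 = a - b*q1
  return madm(q1, r1, y3);              // Step(10), q  = q1 + r1*y3
}
```

Each residual (e0, e1, r0, r1) is computed with a single fused operation, which is what lets the correction terms survive rounding.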
[Beignet] [PATCH 4/8] Backend: Add MATH_WITH_ACC function.
From: Junyan He

Also add the setSrc0WithAcc and setSrc1WithAcc helper functions to set the correct special accumulator fields of the instruction.

Signed-off-by: Junyan He
---
 backend/src/backend/gen8_encoder.cpp | 61 +++-
 backend/src/backend/gen8_encoder.hpp |  5 +++
 backend/src/backend/gen_defs.hpp     | 11 +++
 3 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/backend/src/backend/gen8_encoder.cpp b/backend/src/backend/gen8_encoder.cpp
index 69eabb2..0af27a3 100644
--- a/backend/src/backend/gen8_encoder.cpp
+++ b/backend/src/backend/gen8_encoder.cpp
@@ -360,6 +360,46 @@ namespace gbe
     gen8_insn->bits1.da1.dest_horiz_stride = dest.hstride;
   }

+  void Gen8Encoder::setSrc0WithAcc(GenNativeInstruction *insn, GenRegister reg, uint32_t accN) {
+    Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
+    assert(reg.file == GEN_GENERAL_REGISTER_FILE);
+    assert(reg.nr < 128);
+    assert(gen8_insn->header.access_mode == GEN_ALIGN_16);
+    assert(reg.subnr == 0);
+    assert(gen8_insn->header.execution_size >= GEN_WIDTH_4);
+
+    gen8_insn->bits1.da16acc.src0_reg_file = reg.file;
+    gen8_insn->bits1.da16acc.src0_reg_type = reg.type;
+    gen8_insn->bits2.da16acc.src0_abs = reg.absolute;
+    gen8_insn->bits2.da16acc.src0_negate = reg.negation;
+    gen8_insn->bits2.da16acc.src0_address_mode = reg.address_mode;
+    gen8_insn->bits2.da16acc.src0_subreg_nr = reg.subnr / 16;
+    gen8_insn->bits2.da16acc.src0_reg_nr = reg.nr;
+    gen8_insn->bits2.da16acc.src0_specal_acc_lo = accN;
+    gen8_insn->bits2.da16acc.src0_specal_acc_hi = 0;
+    gen8_insn->bits2.da16acc.src0_vert_stride = reg.vstride;
+  }
+
+  void Gen8Encoder::setSrc1WithAcc(GenNativeInstruction *insn, GenRegister reg, uint32_t accN) {
+    Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
+    assert(reg.file == GEN_GENERAL_REGISTER_FILE);
+    assert(reg.nr < 128);
+    assert(gen8_insn->header.access_mode == GEN_ALIGN_16);
+    assert(reg.subnr == 0);
+    assert(gen8_insn->header.execution_size >= GEN_WIDTH_4);
+
+    gen8_insn->bits2.da16acc.src1_reg_file = reg.file;
+    gen8_insn->bits2.da16acc.src1_reg_type = reg.type;
+    gen8_insn->bits3.da16acc.src1_abs = reg.absolute;
+    gen8_insn->bits3.da16acc.src1_negate = reg.negation;
+    gen8_insn->bits3.da16acc.src1_address_mode = reg.address_mode;
+    gen8_insn->bits3.da16acc.src1_subreg_nr = reg.subnr / 16;
+    gen8_insn->bits3.da16acc.src1_reg_nr = reg.nr;
+    gen8_insn->bits3.da16acc.src1_specal_acc_lo = accN;
+    gen8_insn->bits3.da16acc.src1_specal_acc_hi = 0;
+    gen8_insn->bits3.da16acc.src1_vert_stride = reg.vstride;
+  }
+
   void Gen8Encoder::setSrc0(GenNativeInstruction *insn, GenRegister reg) {
     Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
     if (reg.file != GEN_ARCHITECTURE_REGISTER_FILE)
@@ -372,7 +412,7 @@ namespace gbe
     gen8_insn->bits2.da1.src0_negate = reg.negation;
     gen8_insn->bits2.da1.src0_address_mode = reg.address_mode;
     if (reg.file == GEN_IMMEDIATE_VALUE) {
-      if (reg.type == GEN_TYPE_L || reg.type == GEN_TYPE_UL) {
+      if (reg.type == GEN_TYPE_L || reg.type == GEN_TYPE_UL || reg.type == GEN_TYPE_DF_IMM) {
         gen8_insn->bits3.ud = (uint32_t)(reg.value.i64 >> 32);
         gen8_insn->bits2.ud = (uint32_t)(reg.value.i64);
       } else {
@@ -532,4 +572,23 @@ namespace gbe
       gen8_insn->bits3.da3src.src2_reg_nr++;
     }
   }
+
+  void Gen8Encoder::MATH_WITH_ACC(GenRegister dst, uint32_t function, GenRegister src0, GenRegister src1,
+                                  uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc)
+  {
+    GenNativeInstruction *insn = this->next(GEN_OPCODE_MATH);
+    Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
+    assert(dst.file == GEN_GENERAL_REGISTER_FILE);
+    assert(src0.file == GEN_GENERAL_REGISTER_FILE);
+    assert(src1.file == GEN_GENERAL_REGISTER_FILE);
+    assert(dst.hstride == GEN_HORIZONTAL_STRIDE_1 || dst.hstride == GEN_HORIZONTAL_STRIDE_0);
+
+    gen8_insn->header.access_mode = GEN_ALIGN_16;
+    insn->header.destreg_or_condmod = function;
+    this->setHeader(insn);
+    this->setDst(insn, dst);
+    gen8_insn->bits1.da16acc.dst_specal_acc = dstAcc;
+    this->setSrc0WithAcc(insn, src0, src0Acc);
+    this->setSrc1WithAcc(insn, src1, src1Acc);
+  }
 } /* End of the name space. */
diff --git a/backend/src/backend/gen8_encoder.hpp b/backend/src/backend/gen8_encoder.hpp
index 504e13d..53ec3d1 100644
--- a/backend/src/backend/gen8_encoder.hpp
+++ b/backend/src/backend/gen8_encoder.hpp
@@ -69,6 +69,11 @@ namespace gbe
     virtual unsigned setAtomicMessageDesc(GenNativeInstruction *insn, unsigned function, unsigned bti, unsigned srcNum);
     virtual unsigned setUntypedReadMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned elemNum);
     virtual unsigned setUntypedWriteMessageDesc(GenNativeInstruction *insn, unsigned bti, unsigned
[Beignet] [PATCH 5/8] Backend: Add the MADM function to gen8 encoder.
From: Junyan He

Signed-off-by: Junyan He
---
 backend/src/backend/gen8_encoder.cpp | 56 
 backend/src/backend/gen8_encoder.hpp |  2 ++
 backend/src/backend/gen_defs.hpp     |  2 ++
 3 files changed, 60 insertions(+)

diff --git a/backend/src/backend/gen8_encoder.cpp b/backend/src/backend/gen8_encoder.cpp
index 0af27a3..002a8b5 100644
--- a/backend/src/backend/gen8_encoder.cpp
+++ b/backend/src/backend/gen8_encoder.cpp
@@ -591,4 +591,60 @@ namespace gbe
     this->setSrc0WithAcc(insn, src0, src0Acc);
     this->setSrc1WithAcc(insn, src1, src1Acc);
   }
+
+  void Gen8Encoder::MADM(GenRegister dst, GenRegister src0, GenRegister src1, GenRegister src2,
+                         uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t src2Acc)
+  {
+    GenNativeInstruction *insn = this->next(GEN_OPCODE_MADM);
+    Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
+    assert(dst.file == GEN_GENERAL_REGISTER_FILE);
+    assert(src0.file == GEN_GENERAL_REGISTER_FILE);
+    assert(src1.file == GEN_GENERAL_REGISTER_FILE);
+    assert(src2.file == GEN_GENERAL_REGISTER_FILE);
+    assert(dst.hstride == GEN_HORIZONTAL_STRIDE_1 || dst.hstride == GEN_HORIZONTAL_STRIDE_0);
+    assert(src0.type == GEN_TYPE_DF || src0.type == GEN_TYPE_F);
+    assert(src0.type == dst.type);
+    assert(src0.type == src1.type);
+    assert(src0.type == src2.type);
+    int32_t dataType = src0.type == GEN_TYPE_DF ? 3 : 0;
+
+    this->setHeader(insn);
+    gen8_insn->bits1.da3srcacc.dest_reg_nr = dst.nr;
+    gen8_insn->bits1.da3srcacc.dest_subreg_nr = dst.subnr / 16;
+    gen8_insn->bits1.da3srcacc.dst_specal_acc = dstAcc;
+    gen8_insn->bits1.da3srcacc.src_type = dataType;
+    gen8_insn->bits1.da3srcacc.dest_type = dataType;
+    gen8_insn->header.access_mode = GEN_ALIGN_16;
+
+    assert(src0.file == GEN_GENERAL_REGISTER_FILE);
+    assert(src0.address_mode == GEN_ADDRESS_DIRECT);
+    assert(src0.nr < 128);
+    gen8_insn->bits2.da3srcacc.src0_specal_acc = src0Acc;
+    gen8_insn->bits2.da3srcacc.src0_subreg_nr = src0.subnr / 4 ;
+    gen8_insn->bits2.da3srcacc.src0_reg_nr = src0.nr;
+    gen8_insn->bits1.da3srcacc.src0_abs = src0.absolute;
+    gen8_insn->bits1.da3srcacc.src0_negate = src0.negation;
+    gen8_insn->bits2.da3srcacc.src0_rep_ctrl = src0.vstride == GEN_VERTICAL_STRIDE_0;
+
+    assert(src1.file == GEN_GENERAL_REGISTER_FILE);
+    assert(src1.address_mode == GEN_ADDRESS_DIRECT);
+    assert(src1.nr < 128);
+    gen8_insn->bits2.da3srcacc.src1_specal_acc = src1Acc;
+    gen8_insn->bits2.da3srcacc.src1_subreg_nr_low = (src1.subnr / 4) & 0x3;
+    gen8_insn->bits3.da3srcacc.src1_subreg_nr_high = (src1.subnr / 4) >> 2;
+    gen8_insn->bits2.da3srcacc.src1_rep_ctrl = src1.vstride == GEN_VERTICAL_STRIDE_0;
+    gen8_insn->bits3.da3srcacc.src1_reg_nr = src1.nr;
+    gen8_insn->bits1.da3srcacc.src1_abs = src1.absolute;
+    gen8_insn->bits1.da3srcacc.src1_negate = src1.negation;
+
+    assert(src2.file == GEN_GENERAL_REGISTER_FILE);
+    assert(src2.address_mode == GEN_ADDRESS_DIRECT);
+    assert(src2.nr < 128);
+    gen8_insn->bits3.da3srcacc.src2_specal_acc = src2Acc;
+    gen8_insn->bits3.da3srcacc.src2_subreg_nr = src2.subnr / 4;
+    gen8_insn->bits3.da3srcacc.src2_rep_ctrl = src2.vstride == GEN_VERTICAL_STRIDE_0;
+    gen8_insn->bits3.da3srcacc.src2_reg_nr = src2.nr;
+    gen8_insn->bits1.da3srcacc.src2_abs = src2.absolute;
+    gen8_insn->bits1.da3srcacc.src2_negate = src2.negation;
+  }
 } /* End of the name space. */
diff --git a/backend/src/backend/gen8_encoder.hpp b/backend/src/backend/gen8_encoder.hpp
index 53ec3d1..8e7939b 100644
--- a/backend/src/backend/gen8_encoder.hpp
+++ b/backend/src/backend/gen8_encoder.hpp
@@ -74,6 +74,8 @@ namespace gbe

     void MATH_WITH_ACC(GenRegister dst, uint32_t function, GenRegister src0, GenRegister src1,
                        uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc);
+    void MADM(GenRegister dst, GenRegister src0, GenRegister src1, GenRegister src2,
+              uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t src2Acc);
   };
 }
 #endif /* __GBE_GEN8_ENCODER_HPP__ */
diff --git a/backend/src/backend/gen_defs.hpp b/backend/src/backend/gen_defs.hpp
index a1bd8dd..1b550ac 100644
--- a/backend/src/backend/gen_defs.hpp
+++ b/backend/src/backend/gen_defs.hpp
@@ -174,6 +174,8 @@ enum opcode {
   GEN_OPCODE_LINE = 89,
   GEN_OPCODE_PLN = 90,
   GEN_OPCODE_MAD = 91,
+  GEN_OPCODE_LRP = 92,
+  GEN_OPCODE_MADM = 93,
   GEN_OPCODE_NOP = 126,
 };

-- 
1.9.1
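One detail of MADM that is easy to miss is that the src1 sub-register number straddles two instruction dwords: the low two bits go into bits2 and the remaining bit into bits3. A minimal sketch of that split and its inverse, assuming `subnr` is a byte offset within the register as the `/ 4` in the encoder suggests; `Src1SubregBits`, `split_src1_subreg` and `join_src1_subreg` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the two halves of the src1 sub-register field.
struct Src1SubregBits {
  uint32_t low;   // goes into bits2.da3srcacc.src1_subreg_nr_low (2 bits)
  uint32_t high;  // goes into bits3.da3srcacc.src1_subreg_nr_high (1 bit)
};

// Mirrors the encoder: convert the byte offset to a dword index, then split it.
inline Src1SubregBits split_src1_subreg(uint32_t subnr_bytes) {
  const uint32_t dword_index = subnr_bytes / 4;
  return Src1SubregBits{dword_index & 0x3u, dword_index >> 2};
}

// Inverse operation, as a decoder or disassembler would need it.
inline uint32_t join_src1_subreg(Src1SubregBits bits) {
  return ((bits.high << 2) | bits.low) * 4;
}
```

For a byte offset of 20 the dword index is 5, so the low field holds 1 and the high field holds 1.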
[Beignet] [PATCH 3/8] Backend: Add gen8 instruction field for special accumulator.
From: Junyan He

The madm and invm functions need to set the accumulator id in the instruction. On BDW, the write mask of the dst and the channel mask of the src are reinterpreted for acc2~acc9 selection.

Signed-off-by: Junyan He
---
 backend/src/backend/gen8_instruction.hpp | 86 
 1 file changed, 86 insertions(+)

diff --git a/backend/src/backend/gen8_instruction.hpp b/backend/src/backend/gen8_instruction.hpp
index 5cf1032..2aa5bf7 100644
--- a/backend/src/backend/gen8_instruction.hpp
+++ b/backend/src/backend/gen8_instruction.hpp
@@ -135,6 +135,22 @@ union Gen8NativeInstruction
       uint32_t dest_address_mode:1;
     } ia16;

+    struct { // The sub reg field is reinterpreted as accumulator selector.
+      uint32_t flag_sub_reg_nr:1;
+      uint32_t flag_reg_nr:1;
+      uint32_t mask_control:1;
+      uint32_t dest_reg_file:2;
+      uint32_t dest_reg_type:4;
+      uint32_t src0_reg_file:2;
+      uint32_t src0_reg_type:4;
+      uint32_t pad:1;
+      uint32_t dst_specal_acc:4;
+      uint32_t dest_subreg_nr:1;
+      uint32_t dest_reg_nr:8;
+      uint32_t reserved:2;
+      uint32_t dest_address_mode:1;
+    } da16acc;
+
     struct {
       uint32_t flag_sub_reg_nr:1;
       uint32_t flag_reg_nr:1;
@@ -153,6 +169,25 @@ union Gen8NativeInstruction
       uint32_t dest_subreg_nr:3;
       uint32_t dest_reg_nr:8;
     } da3src;
+
+    struct {
+      uint32_t flag_sub_reg_nr:1;
+      uint32_t flag_reg_nr:1;
+      uint32_t mask_control:1;
+      uint32_t src1_type:1;
+      uint32_t src2_type:1;
+      uint32_t src0_abs:1;
+      uint32_t src0_negate:1;
+      uint32_t src1_abs:1;
+      uint32_t src1_negate:1;
+      uint32_t src2_abs:1;
+      uint32_t src2_negate:1;
+      uint32_t src_type:3;
+      uint32_t dest_type:3;
+      uint32_t dst_specal_acc:4;
+      uint32_t dest_subreg_nr:3;
+      uint32_t dest_reg_nr:8;
+    } da3srcacc;
   }bits1;

   union {
@@ -219,6 +254,21 @@ union Gen8NativeInstruction
     } ia16;

     struct {
+      uint32_t src0_specal_acc_lo:4;
+      uint32_t src0_subreg_nr:1;
+      uint32_t src0_reg_nr:8;
+      uint32_t src0_abs:1;
+      uint32_t src0_negate:1;
+      uint32_t src0_address_mode:1;
+      uint32_t src0_specal_acc_hi:4;
+      uint32_t pad0:1;
+      uint32_t src0_vert_stride:4;
+      uint32_t src1_reg_file:2;
+      uint32_t src1_reg_type:4;
+      uint32_t pad:1;
+    } da16acc;
+
+    struct {
       uint32_t src0_rep_ctrl:1;
       uint32_t src0_swizzle:8;
       uint32_t src0_subreg_nr:3;
@@ -230,6 +280,17 @@ union Gen8NativeInstruction
     } da3src;

     struct {
+      uint32_t src0_rep_ctrl:1;
+      uint32_t src0_specal_acc:8;
+      uint32_t src0_subreg_nr:3;
+      uint32_t src0_reg_nr:8;
+      uint32_t src0_subreg_nr_w:1;
+      uint32_t src1_rep_ctrl:1;
+      uint32_t src1_specal_acc:8;
+      uint32_t src1_subreg_nr_low:2;
+    } da3srcacc;
+
+    struct {
       uint32_t uip:32;
     } gen8_branch;

@@ -294,6 +355,19 @@ union Gen8NativeInstruction
     } ia16;

     struct {
+      uint32_t src1_specal_acc_lo:4;
+      uint32_t src1_subreg_nr:1;
+      uint32_t src1_reg_nr:8;
+      uint32_t src1_abs:1;
+      uint32_t src1_negate:1;
+      uint32_t src1_address_mode:1;
+      uint32_t src1_specal_acc_hi:4;
+      uint32_t pad1:1;
+      uint32_t src1_vert_stride:4;
+      uint32_t pad2:7;
+    } da16acc;
+
+    struct {
       uint32_t function_control:19;
       uint32_t header_present:1;
       uint32_t response_length:5;
@@ -504,6 +578,18 @@ union Gen8NativeInstruction
       uint32_t pad:1;
     } da3src;

+    struct {
+      uint32_t src1_subreg_nr_high:1;
+      uint32_t src1_reg_nr:8;
+      uint32_t src1_subreg_nr_w:1;
+      uint32_t src2_rep_ctrl:1;
+      uint32_t src2_specal_acc:8;
+      uint32_t src2_subreg_nr:3;
+      uint32_t src2_reg_nr:8;
+      uint32_t src2_subreg_nr_w:1;
+      uint32_t pad:1;
+    } da3srcacc;
+
     /*! Message gateway */
     struct {
       uint32_t subfunc:3;
-- 
1.9.1
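The selector values stored in these fields are not the accumulator numbers themselves. Going by the special_acc table in the disassembler patch of this series, selector 0 through 7 name acc2 through acc9 and selector 8 means no accumulator. A minimal sketch of that mapping; `special_acc_name` is a hypothetical helper, and returning an empty string for larger values is an assumption:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Maps a special accumulator selector to the suffix the disassembler prints.
inline std::string special_acc_name(uint32_t selector) {
  if (selector <= 7) return ".acc" + std::to_string(selector + 2);  // acc2..acc9
  if (selector == 8) return ".noacc";
  return "";  // Assumption: values above 8 are not named in the series.
}
```

So the `r8.acc2` destination in the division macro corresponds to selector 0, and every `.noacc` operand to selector 8.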
[Beignet] [PATCH 8/8] Utest: Add double division test.
From: Junyan He

Signed-off-by: Junyan He
---
 kernels/compiler_double_4.cl   |  5 -
 kernels/compiler_double_div.cl |  5 +
 utests/CMakeLists.txt          |  1 +
 utests/compiler_double_4.cpp   | 40 
 utests/compiler_double_div.cpp | 42 ++
 5 files changed, 48 insertions(+), 45 deletions(-)
 delete mode 100644 kernels/compiler_double_4.cl
 create mode 100644 kernels/compiler_double_div.cl
 delete mode 100644 utests/compiler_double_4.cpp
 create mode 100644 utests/compiler_double_div.cpp

diff --git a/kernels/compiler_double_4.cl b/kernels/compiler_double_4.cl
deleted file mode 100644
index e5e46f9..000
--- a/kernels/compiler_double_4.cl
+++ /dev/null
@@ -1,5 +0,0 @@
-#pragma OPENCL EXTENSION cl_khr_fp64 : enable
-kernel void compiler_double_4(global double *src1, global double *src2, global double *dst) {
-  int i = get_global_id(0);
-  dst[i] = src1[i] + src2[i];
-}
diff --git a/kernels/compiler_double_div.cl b/kernels/compiler_double_div.cl
new file mode 100644
index 000..3758e65
--- /dev/null
+++ b/kernels/compiler_double_div.cl
@@ -0,0 +1,5 @@
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+kernel void compiler_double_div(global double *src1, global double *src2, global double *dst) {
+  int i = get_global_id(0);
+  dst[i] = src1[i] / src2[i];
+}
diff --git a/utests/CMakeLists.txt b/utests/CMakeLists.txt
index e7a9e26..aeae3d6 100644
--- a/utests/CMakeLists.txt
+++ b/utests/CMakeLists.txt
@@ -194,6 +194,7 @@ set (utests_sources
   compiler_sub_group_all.cpp
   compiler_time_stamp.cpp
   compiler_double_precision.cpp
+  compiler_double_div.cpp
   load_program_from_gen_bin.cpp
   load_program_from_spir.cpp
   get_arg_info.cpp
diff --git a/utests/compiler_double_4.cpp b/utests/compiler_double_4.cpp
deleted file mode 100644
index cb25bd4..000
--- a/utests/compiler_double_4.cpp
+++ /dev/null
@@ -1,40 +0,0 @@
-#include
-#include "utest_helper.hpp"
-
-void compiler_double_4(void)
-{
-  const size_t n = 16;
-  double cpu_src1[n], cpu_src2[n];
-
-  // Setup kernel and buffers
-  OCL_CREATE_KERNEL("compiler_double_4");
-  OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(double), NULL);
-  OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(double), NULL);
-  OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(double), NULL);
-  OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
-  OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);
-  OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]);
-  globals[0] = n;
-  locals[0] = 16;
-
-  // Run random tests
-  OCL_MAP_BUFFER(0);
-  OCL_MAP_BUFFER(1);
-  for (int32_t i = 0; i < (int32_t) n; ++i) {
-    cpu_src1[i] = ((double*)buf_data[0])[i] = rand() * 1e-2;
-    cpu_src2[i] = ((double*)buf_data[1])[i] = rand() * 1e-2;
-  }
-  OCL_UNMAP_BUFFER(0);
-  OCL_UNMAP_BUFFER(1);
-
-  // Run the kernel on GPU
-  OCL_NDRANGE(1);
-
-  // Compare
-  OCL_MAP_BUFFER(2);
-  for (int32_t i = 0; i < (int32_t) n; ++i)
-    OCL_ASSERT(fabs(((double*)buf_data[2])[i] - cpu_src1[i] - cpu_src2[i]) < 1e-4);
-  OCL_UNMAP_BUFFER(2);
-}
-
-MAKE_UTEST_FROM_FUNCTION(compiler_double_4);
diff --git a/utests/compiler_double_div.cpp b/utests/compiler_double_div.cpp
new file mode 100644
index 000..f3a21df
--- /dev/null
+++ b/utests/compiler_double_div.cpp
@@ -0,0 +1,42 @@
+#include
+#include "utest_helper.hpp"
+
+void compiler_double_div(void)
+{
+  const size_t n = 16;
+  double cpu_src0[n], cpu_src1[n];
+
+  // Setup kernel and buffers
+  OCL_CREATE_KERNEL("compiler_double_div");
+  OCL_CREATE_BUFFER(buf[0], 0, n * sizeof(double), NULL);
+  OCL_CREATE_BUFFER(buf[1], 0, n * sizeof(double), NULL);
+  OCL_CREATE_BUFFER(buf[2], 0, n * sizeof(double), NULL);
+  OCL_SET_ARG(0, sizeof(cl_mem), &buf[0]);
+  OCL_SET_ARG(1, sizeof(cl_mem), &buf[1]);
+  OCL_SET_ARG(2, sizeof(cl_mem), &buf[2]);
+  globals[0] = n;
+  locals[0] = 16;
+
+  // Run random tests
+  OCL_MAP_BUFFER(0);
+  OCL_MAP_BUFFER(1);
+  for (int32_t i = 0; i < (int32_t) n; ++i) {
+    cpu_src0[i] = ((double*)buf_data[0])[i] = ((double)(((i - 5)*1334) * 11105));
+    cpu_src1[i] = ((double*)buf_data[1])[i] = 499.13542123d*(i + 132.43d + 142.32*i);
+  }
+  OCL_UNMAP_BUFFER(0);
+  OCL_UNMAP_BUFFER(1);
+
+  // Run the kernel on GPU
+  OCL_NDRANGE(1);
+
+  // Compare
+  OCL_MAP_BUFFER(2);
+  for (int32_t i = 0; i < (int32_t) n; ++i) {
+    OCL_ASSERT(fabs(((double*)buf_data[2])[i] - cpu_src0[i]/cpu_src1[i]) < 1e-32);
+    //printf("%d : %fref value: %f\n", i, ((double*)buf_data[2])[i], cpu_src0[i]/cpu_src1[i]);
+  }
+  OCL_UNMAP_BUFFER(2);
+}
+
+MAKE_UTEST_FROM_FUNCTION(compiler_double_div);
-- 
1.9.1
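The 1e-32 threshold in the new test is far below the spacing of doubles at the magnitudes it produces, so in practice it only passes when the GPU result is bit-identical to the host result. If a bounded rounding difference were ever acceptable, a comparison in units in the last place would express that more directly. A minimal sketch, not part of the patch; `within_ulps` is a hypothetical helper:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns true when `value` is within `max_ulps` units in the last place of
// `reference`, measuring the ULP at the magnitude of `reference`.
inline bool within_ulps(double value, double reference, int max_ulps) {
  if (std::isnan(value) || std::isnan(reference)) return false;
  if (value == reference) return true;
  const double mag = std::fabs(reference);
  // Distance from |reference| to the next representable double above it.
  const double ulp = std::nextafter(mag, std::numeric_limits<double>::infinity()) - mag;
  return std::fabs(value - reference) <= max_ulps * ulp;
}
```

With `max_ulps` set to 0 this degenerates to exact equality, which matches what the test effectively checks today.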
[Beignet] [PATCH 7/8] Backend: Add madm and invm instructions to disasm.
From: Junyan He

We also add special accumulator field print to disasm.

Signed-off-by: Junyan He
---
 backend/src/backend/gen/gen_mesa_disasm.c | 89 +--
 1 file changed, 84 insertions(+), 5 deletions(-)

diff --git a/backend/src/backend/gen/gen_mesa_disasm.c b/backend/src/backend/gen/gen_mesa_disasm.c
index 5220233..733b2e6 100644
--- a/backend/src/backend/gen/gen_mesa_disasm.c
+++ b/backend/src/backend/gen/gen_mesa_disasm.c
@@ -84,6 +84,7 @@ static const struct {
   [GEN_OPCODE_DP3] = { .name = "dp3", .nsrc = 2, .ndst = 1 },
   [GEN_OPCODE_DP2] = { .name = "dp2", .nsrc = 2, .ndst = 1 },
   [GEN_OPCODE_MATH] = { .name = "math", .nsrc = 2, .ndst = 1 },
+  [GEN_OPCODE_MADM] = { .name = "madm", .nsrc = 3, .ndst = 1 },

   [GEN_OPCODE_AVG] = { .name = "avg", .nsrc = 2, .ndst = 1 },
   [GEN_OPCODE_ADD] = { .name = "add", .nsrc = 2, .ndst = 1 },
@@ -311,6 +312,18 @@ static const char *writemask[16] = {
   [0xf] = "",
 };

+static const char *special_acc[9] = {
+  [0x0] = ".acc2",
+  [0x1] = ".acc3",
+  [0x2] = ".acc4",
+  [0x3] = ".acc5",
+  [0x4] = ".acc6",
+  [0x5] = ".acc7",
+  [0x6] = ".acc8",
+  [0x7] = ".acc9",
+  [0x8] = ".noacc",
+};
+
 static const char *end_of_thread[2] = {
   [0] = "",
   [1] = "EOT"
@@ -532,6 +545,24 @@ static int gen_version;
 #define GENERIC_MSG_LENGTH(inst) GEN_BITS_FIELD(inst, bits3.generic_gen5.msg_length)
 #define GENERIC_RESPONSE_LENGTH(inst) GEN_BITS_FIELD(inst, bits3.generic_gen5.response_length)

+static int is_special_acc(const void* inst)
+{
+  if (gen_version < 80)
+    return 0;
+
+  if (OPCODE(inst) != GEN_OPCODE_MADM && OPCODE(inst) != GEN_OPCODE_MATH)
+    return 0;
+
+  if (OPCODE(inst) == GEN_OPCODE_MATH &&
+      (MATH_FUNCTION(inst) != GEN8_MATH_FUNCTION_INVM && MATH_FUNCTION(inst) != GEN8_MATH_FUNCTION_RSQRTM))
+    return 0;
+
+  if (ACCESS_MODE(inst) != GEN_ALIGN_16)
+    return 0;
+
+  return 1;
+}
+
 static int string(FILE *file, const char *string)
 {
   fputs (string, file);
@@ -688,7 +719,12 @@ static int dest(FILE *file, const void* inst)
         format(file, ".%d", GEN_BITS_FIELD(inst, bits1.da16.dest_subreg_nr) /
                reg_type_size[GEN_BITS_FIELD(inst, bits1.da16.dest_reg_type)]);
       string(file, "<1>");
-      err |= control(file, "writemask", writemask, GEN_BITS_FIELD(inst, bits1.da16.dest_writemask), NULL);
+
+      if (is_special_acc(inst)) {
+        err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits1.da16acc.dst_specal_acc, NULL);
+      } else {
+        err |= control(file, "writemask", writemask, GEN_BITS_FIELD(inst, bits1.da16.dest_writemask), NULL);
+      }
       err |= control(file, "dest reg encoding", reg_encoding, GEN_BITS_FIELD(inst, bits1.da16.dest_reg_type), NULL);
     } else {
       err = 1;
@@ -710,7 +746,11 @@ static int dest_3src(FILE *file, const void *inst)
   if (GEN_BITS_FIELD(inst, bits1.da3src.dest_subreg_nr))
     format(file, ".%d", GEN_BITS_FIELD(inst, bits1.da3src.dest_subreg_nr));
   string(file, "<1>");
-  err |= control(file, "writemask", writemask, GEN_BITS_FIELD(inst, bits1.da3src.dest_writemask), NULL);
+  if (is_special_acc(inst)) {
+    err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits1.da3srcacc.dst_specal_acc, NULL);
+  } else {
+    err |= control(file, "writemask", writemask, GEN_BITS_FIELD(inst, bits1.da16.dest_writemask), NULL);
+  }
   err |= control(file, "dest reg encoding", reg_encoding, GEN_TYPE_F, NULL);

   return 0;
@@ -775,7 +815,7 @@ static int src_ia1(FILE *file,
   return err;
 }

-static int src_da16(FILE *file,
+static int src_da16(FILE *file, const void* inst, int src_num,
                     uint32_t _reg_type,
                     uint32_t _reg_file,
                     uint32_t _vert_stride,
@@ -803,6 +843,17 @@ static int src_da16(FILE *file,
   err |= control(file, "vert stride", vert_stride, _vert_stride, NULL);
   string(file, ",4,1>");
+
+  if (is_special_acc(inst)) {
+    if (src_num == 0) {
+      err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits2.da16acc.src0_specal_acc_lo, NULL);
+    } else {
+      assert(src_num == 1);
+      err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits3.da16acc.src1_specal_acc_lo, NULL);
+    }
+    return err;
+  }
+
   /*
    * Three kinds of swizzle display:
    * identity - nothing printed
@@ -850,6 +901,12 @@ static int src0_3src(FILE *file, const void* inst)
   string(file, "<8,8,1>");
   err |= control(file, "src da16 reg type", reg_encoding,
                  GEN_TYPE_F, NULL);
+
+  if (is_special_acc(inst)) {
+    err |= control(file, "specialacc", special_acc, ((const union Gen8NativeInstruction *)inst)->bits2.da3srcacc.src0_specal_acc, NULL);
+    return err;
+  }
+
   /*
    * Three kinds of swizzle display:
    * identity -
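A minimal sketch of the selector-to-suffix mapping that the special_acc[] table above encodes, written as standalone C++ rather than as part of the disassembler. The helper name special_acc_name and the ".invalid" marker are hypothetical and do not appear in the patch. The selector field is 4 bits wide but the table has only 9 entries, so this sketch checks the range explicitly; how the patch's control() helper treats values above 0x8 is not shown in this message.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Suffixes copied from the special_acc[] table in the patch.
static const char *const kSpecialAccNames[] = {
    ".acc2", ".acc3", ".acc4", ".acc5", ".acc6",
    ".acc7", ".acc8", ".acc9", ".noacc",
};

// Hypothetical helper, not part of the patch: map a 4-bit selector to its
// printed suffix, and flag encodings that the 9-entry table does not cover.
std::string special_acc_name(uint32_t selector) {
    const uint32_t field = selector & 0xFu;  // the selector field is 4 bits wide
    const uint32_t count = sizeof(kSpecialAccNames) / sizeof(kSpecialAccNames[0]);
    if (field >= count)
        return ".invalid";  // assumption: one possible way to report unknown encodings
    return kSpecialAccNames[field];
}
```

For example, special_acc_name(0x8) yields ".noacc", which matches the operand suffixes used in the FDIV64 macro comment elsewhere in this series.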
[Beignet] [PATCH 6/8] Backend: Implement FDIV64 on BDW.
From: Junyan He

According to the document, we use a set of instructions
to implement double type division.

Signed-off-by: Junyan He
---
 backend/src/backend/gen8_context.cpp | 68
 backend/src/backend/gen8_context.hpp | 2 ++
 2 files changed, 70 insertions(+)

diff --git a/backend/src/backend/gen8_context.cpp b/backend/src/backend/gen8_context.cpp
index b497ee5..f465832 100644
--- a/backend/src/backend/gen8_context.cpp
+++ b/backend/src/backend/gen8_context.cpp
@@ -924,6 +924,74 @@ namespace gbe
     this->unpackLongVec(src, dst, p->curr.execWidth);
   }

+  void Gen8Context::emitF64DIVInstruction(const SelectionInstruction &insn) {
+    /* Macro for Double Precision IEEE Compliant fdiv
+
+       Set Rounding Mode in CR to RNE
+       GRF are initialized: r0 = 0, r6 = a, r7 = b, r1 = 1
+       The default data type for the macro is :df
+
+       math.eo.f0.0 (4) r8.acc2 r6.noacc r7.noacc 0xE
+       (-f0.0) if
+       madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2      // Step(1), q0=a*y0
+       madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2    // Step(2), e0=(1-b*y0)
+       madm (4) r11.acc5 r6.noacc -r7.noacc r9.acc3    // Step(3), r0=a-b*q0
+       madm (4) r12.acc6 r8.acc2 r10.acc4 r8.acc2      // Step(4), y1=y0+e0*y0
+       madm (4) r13.acc7 r1.noacc -r7.noacc r12.acc6   // Step(5), e1=(1-b*y1)
+       madm (4) r8.acc8 r8.acc2 r10.acc4 r12.acc6      // Step(6), y2=y0+e0*y1
+       madm (4) r9.acc9 r9.acc3 r11.acc5 r12.acc6      // Step(7), q1=q0+r0*y1
+       madm (4) r12.acc2 r12.acc6 r8.acc8 r13.acc7     // Step(8), y3=y1+e1*y2
+       madm (4) r11.acc3 r6.noacc -r7.noacc r9.acc9    // Step(9), r1=a-b*q1
+
+       Change Rounding Mode in CR if required
+       Implicit Accumulator for destination is NULL
+
+       madm (4) r8.noacc r9.acc9 r11.acc3 r12.acc2     // Step(10), q=q1+r1*y3
+       endif */
+    GenRegister r6 = GenRegister::retype(ra->genReg(insn.src(0)), GEN_TYPE_DF);
+    GenRegister r7 = GenRegister::retype(ra->genReg(insn.src(1)), GEN_TYPE_DF);
+    GenRegister r8 = GenRegister::retype(ra->genReg(insn.dst(0)), GEN_TYPE_DF);
+    const GenRegister r0 = GenRegister::retype(ra->genReg(insn.dst(1)), GEN_TYPE_DF);
+    const GenRegister r1 = GenRegister::retype(ra->genReg(insn.dst(2)), GEN_TYPE_DF);
+    const GenRegister r9 = GenRegister::retype(ra->genReg(insn.dst(3)), GEN_TYPE_DF);
+    const GenRegister r10 = GenRegister::retype(ra->genReg(insn.dst(4)), GEN_TYPE_DF);
+    const GenRegister r11 = GenRegister::retype(ra->genReg(insn.dst(5)), GEN_TYPE_DF);
+    const GenRegister r12 = GenRegister::retype(ra->genReg(insn.dst(6)), GEN_TYPE_DF);
+    const GenRegister r13 = GenRegister::retype(ra->genReg(insn.dst(7)), GEN_TYPE_DF);
+    Gen8Encoder *p8 = reinterpret_cast<Gen8Encoder *>(p);
+    p->push(); {
+      p->curr.execWidth = 4;
+      p->curr.predicate = GEN_PREDICATE_NONE;
+      p->curr.noMask= 1;
+      p->MOV(r1, GenRegister::immdf(1.0d));
+      p->MOV(r0, GenRegister::immdf(0.0d));
+
+      for (int i = 0; i < (simdWidth == 16 ? 4 : 2); i++) {
+        p->curr.predicate = GEN_PREDICATE_NONE;
+        p8->MATH_WITH_ACC(r8, GEN8_MATH_FUNCTION_INVM, r6, r7, GEN8_INSN_ACC2, GEN8_INSN_NOACC, GEN8_INSN_NOACC);
+        p->curr.useFlag(insn.state.flag, insn.state.subFlag);
+        p->curr.predicate = GEN_PREDICATE_NORMAL;
+        p->curr.inversePredicate = 1;
+        p->curr.noMask= 0;
+        p8->MADM(r9, r0, r6, r8, GEN8_INSN_ACC3, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC2);
+        p8->MADM(r10, r1, GenRegister::negate(r7), r8, GEN8_INSN_ACC4, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC2);
+        p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC5, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC3);
+        p8->MADM(r12, r8, r10, r8, GEN8_INSN_ACC6, GEN8_INSN_ACC2, GEN8_INSN_ACC4, GEN8_INSN_ACC2);
+        p8->MADM(r13, r1, GenRegister::negate(r7), r12, GEN8_INSN_ACC7, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC6);
+        p8->MADM(r8, r8, r10, r12, GEN8_INSN_ACC8, GEN8_INSN_ACC2, GEN8_INSN_ACC4, GEN8_INSN_ACC6);
+        p8->MADM(r9, r9, r11, r12, GEN8_INSN_ACC9, GEN8_INSN_ACC3, GEN8_INSN_ACC5, GEN8_INSN_ACC6);
+        p8->MADM(r12, r12, r8, r13, GEN8_INSN_ACC2, GEN8_INSN_ACC6, GEN8_INSN_ACC8, GEN8_INSN_ACC7);
+        p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC3, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC9);
+
+        p8->MADM(r8, r9, r11, r12, GEN8_INSN_NOACC, GEN8_INSN_ACC9, GEN8_INSN_ACC3, GEN8_INSN_ACC2);
+
+        r6 = GenRegister::offset(r6, 1);
+        r7 = GenRegister::offset(r7, 1);
+        r8 = GenRegister::offset(r8, 1);
+      }
+    } p->pop();
+  }
+
   void Gen8Context::setA0Content(uint16_t new_a0[16], uint16_t max_offset, int sz) {
     if (sz == 0)
       sz = 16;
diff --git a/backend/src/backend/gen8_context.hpp b/backend/src/backend/gen8_context.hpp
index 84508e9..386f7f3 100644
---
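The ten refinement steps in the macro comment can be followed in ordinary scalar arithmetic. The sketch below is only an illustration under stated assumptions: it reads each madm as dst = src0 + src1 * src2 (inferred from the step comments, for instance Step(2) e0 = 1 - b*y0), it uses std::fma in place of the hardware instruction, and it replaces the INVM seed with a single-precision reciprocal. The function name fdiv64_sketch is hypothetical. The predication on the flag written by the math instruction, the extended-range accumulators, and special values (zero, infinity, NaN, denormals) are all ignored here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar model of the madm sequence; not the Beignet implementation.
double fdiv64_sketch(double a, double b) {
    assert(b != 0.0);  // special cases are outside the scope of this sketch

    // Stand-in for INVM: a low-precision reciprocal seed (assumption).
    // Only meaningful while b fits comfortably in the float range.
    const double y0 = static_cast<double>(1.0f / static_cast<float>(b));

    const double q0 = std::fma(a, y0, 0.0);   // Step(1),  q0 = a*y0
    const double e0 = std::fma(-b, y0, 1.0);  // Step(2),  e0 = 1 - b*y0
    const double r0 = std::fma(-b, q0, a);    // Step(3),  r0 = a - b*q0
    const double y1 = std::fma(e0, y0, y0);   // Step(4),  y1 = y0 + e0*y0
    const double e1 = std::fma(-b, y1, 1.0);  // Step(5),  e1 = 1 - b*y1
    const double y2 = std::fma(e0, y1, y0);   // Step(6),  y2 = y0 + e0*y1
    const double q1 = std::fma(r0, y1, q0);   // Step(7),  q1 = q0 + r0*y1
    const double y3 = std::fma(e1, y2, y1);   // Step(8),  y3 = y1 + e1*y2
    const double r1 = std::fma(-b, q1, a);    // Step(9),  r1 = a - b*q1
    return std::fma(r1, y3, q1);              // Step(10), q  = q1 + r1*y3
}
```

Each step that refines y roughly squares the relative error of the reciprocal estimate, while the final step corrects the quotient using the residual r1, which is why a coarse seed is sufficient.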
Re: [Beignet] [PATCH 6/8] Backend: Implement FDIV64 on BDW.
On Tue, Sep 15, 2015 at 06:00:57AM -0700, Matt Turner wrote:
> Date: Tue, 15 Sep 2015 06:00:57 -0700
> From: Matt Turner
> To: "junyan.he"
> Cc: "beignet@lists.freedesktop.org"
> Subject: Re: [Beignet] [PATCH 6/8] Backend: Implement FDIV64 on BDW.
>
> On Tue, Sep 15, 2015 at 4:15 AM, wrote:
> > From: Junyan He
> >
> > According to the document, we use a set of instructions
> > to implement double type division.
> >
> > Signed-off-by: Junyan He
> > ---
> >  backend/src/backend/gen8_context.cpp | 68
> >  backend/src/backend/gen8_context.hpp | 2 ++
> >  2 files changed, 70 insertions(+)
> >
> > diff --git a/backend/src/backend/gen8_context.cpp b/backend/src/backend/gen8_context.cpp
> > index b497ee5..f465832 100644
> > --- a/backend/src/backend/gen8_context.cpp
> > +++ b/backend/src/backend/gen8_context.cpp
> > @@ -924,6 +924,74 @@ namespace gbe
> >      this->unpackLongVec(src, dst, p->curr.execWidth);
> >    }
> >
> > +  void Gen8Context::emitF64DIVInstruction(const SelectionInstruction &insn) {
> > +    /* Macro for Double Precision IEEE Compliant fdiv
> > +
> > +       Set Rounding Mode in CR to RNE
> > +       GRF are initialized: r0 = 0, r6 = a, r7 = b, r1 = 1
> > +       The default data type for the macro is :df
> > +
> > +       math.eo.f0.0 (4) r8.acc2 r6.noacc r7.noacc 0xE
> > +       (-f0.0) if
> > +       madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2      // Step(1), q0=a*y0
> > +       madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2    // Step(2), e0=(1-b*y0)
> > +       madm (4) r11.acc5 r6.noacc -r7.noacc r9.acc3    // Step(3), r0=a-b*q0
> > +       madm (4) r12.acc6 r8.acc2 r10.acc4 r8.acc2      // Step(4), y1=y0+e0*y0
> > +       madm (4) r13.acc7 r1.noacc -r7.noacc r12.acc6   // Step(5), e1=(1-b*y1)
> > +       madm (4) r8.acc8 r8.acc2 r10.acc4 r12.acc6      // Step(6), y2=y0+e0*y1
> > +       madm (4) r9.acc9 r9.acc3 r11.acc5 r12.acc6      // Step(7), q1=q0+r0*y1
> > +       madm (4) r12.acc2 r12.acc6 r8.acc8 r13.acc7     // Step(8), y3=y1+e1*y2
> > +       madm (4) r11.acc3 r6.noacc -r7.noacc r9.acc9    // Step(9), r1=a-b*q1
> > +
> > +       Change Rounding Mode in CR if required
> > +       Implicit Accumulator for destination is NULL
> > +
> > +       madm (4) r8.noacc r9.acc9 r11.acc3 r12.acc2     // Step(10), q=q1+r1*y3
> > +       endif */
>
> I don't see an IF or an ENDIF instruction emitted in the code below.
> Is that intentional, or am I misreading the code?
>

Here, we use f0.1 as the predicate for all the instructions, like:

(-f0.1) madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2
(-f0.1) madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2
...

I avoid using IF/ENDIF here because we would need to count the instructions
inside the IF clause, which is not convenient.

> > +    GenRegister r6 = GenRegister::retype(ra->genReg(insn.src(0)), GEN_TYPE_DF);
> > +    GenRegister r7 = GenRegister::retype(ra->genReg(insn.src(1)), GEN_TYPE_DF);
> > +    GenRegister r8 = GenRegister::retype(ra->genReg(insn.dst(0)), GEN_TYPE_DF);
> > +    const GenRegister r0 = GenRegister::retype(ra->genReg(insn.dst(1)), GEN_TYPE_DF);
> > +    const GenRegister r1 = GenRegister::retype(ra->genReg(insn.dst(2)), GEN_TYPE_DF);
> > +    const GenRegister r9 = GenRegister::retype(ra->genReg(insn.dst(3)), GEN_TYPE_DF);
> > +    const GenRegister r10 = GenRegister::retype(ra->genReg(insn.dst(4)), GEN_TYPE_DF);
> > +    const GenRegister r11 = GenRegister::retype(ra->genReg(insn.dst(5)), GEN_TYPE_DF);
> > +    const GenRegister r12 = GenRegister::retype(ra->genReg(insn.dst(6)), GEN_TYPE_DF);
> > +    const GenRegister r13 = GenRegister::retype(ra->genReg(insn.dst(7)), GEN_TYPE_DF);
> > +    Gen8Encoder *p8 = reinterpret_cast<Gen8Encoder *>(p);
> > +    p->push(); {
> > +      p->curr.execWidth = 4;
> > +      p->curr.predicate = GEN_PREDICATE_NONE;
> > +      p->curr.noMask= 1;
> > +      p->MOV(r1, GenRegister::immdf(1.0d));
> > +      p->MOV(r0, GenRegister::immdf(0.0d));
> > +
> > +      for (int i = 0; i < (simdWidth == 16 ? 4 : 2); i++) {
> > +        p->curr.predicate = GEN_PREDICATE_NONE;
> > +        p8->MATH_WITH_ACC(r8, GEN8_MATH_FUNCTION_INVM, r6, r7, GEN8_INSN_ACC2, GEN8_INSN_NOACC, GEN8_INSN_NOACC);
> > +        p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> > +        p->curr.predicate = GEN_PREDICATE_NORMAL;
> > +        p->curr.inversePredicate = 1;
> > +        p->curr.noMask= 0;
> > +        p8->MADM(r9, r0, r6, r8, GEN8_INSN_ACC3, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC2);
> > +        p8->MADM(r10, r1, GenRegister::negate(r7), r8, GEN8_INSN_ACC4, GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC2);
> > +        p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC5,
> >
Re: [Beignet] [PATCH 5/8] Backend: Add the MADM function to gen8 encoder.
On Tue, Sep 15, 2015 at 05:57:13AM -0700, Matt Turner wrote: > Date: Tue, 15 Sep 2015 05:57:13 -0700 > From: Matt Turner> To: "junyan.he" > Cc: "beignet@lists.freedesktop.org" > Subject: Re: [Beignet] [PATCH 5/8] Backend: Add the MADM function to gen8 > encoder. > > On Tue, Sep 15, 2015 at 4:15 AM, wrote: > > From: Junyan He > > > > Signed-off-by: Junyan He > > --- > > backend/src/backend/gen8_encoder.cpp | 56 > > > > backend/src/backend/gen8_encoder.hpp | 2 ++ > > backend/src/backend/gen_defs.hpp | 2 ++ > > 3 files changed, 60 insertions(+) > > > > diff --git a/backend/src/backend/gen8_encoder.cpp > > b/backend/src/backend/gen8_encoder.cpp > > index 0af27a3..002a8b5 100644 > > --- a/backend/src/backend/gen8_encoder.cpp > > +++ b/backend/src/backend/gen8_encoder.cpp > > @@ -591,4 +591,60 @@ namespace gbe > > this->setSrc0WithAcc(insn, src0, src0Acc); > > this->setSrc1WithAcc(insn, src1, src1Acc); > >} > > + > > + void Gen8Encoder::MADM(GenRegister dst, GenRegister src0, GenRegister > > src1, GenRegister src2, > > + uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t > > src2Acc) > > + { > > +GenNativeInstruction *insn = this->next(GEN_OPCODE_MADM); > > +Gen8NativeInstruction *gen8_insn = >gen8_insn; > > +assert(dst.file == GEN_GENERAL_REGISTER_FILE); > > +assert(src0.file == GEN_GENERAL_REGISTER_FILE); > > +assert(src1.file == GEN_GENERAL_REGISTER_FILE); > > +assert(src2.file == GEN_GENERAL_REGISTER_FILE); > > +assert(dst.hstride == GEN_HORIZONTAL_STRIDE_1 || dst.hstride == > > GEN_HORIZONTAL_STRIDE_0); > > +assert(src0.type == GEN_TYPE_DF || src0.type == GEN_TYPE_F); > > +assert(src0.type == dst.type); > > +assert(src0.type == src1.type); > > +assert(src0.type == src2.type); > > +int32_t dataType = src0.type == GEN_TYPE_DF ? 
3 : 0; > > + > > +this->setHeader(insn); > > +gen8_insn->bits1.da3srcacc.dest_reg_nr = dst.nr; > > +gen8_insn->bits1.da3srcacc.dest_subreg_nr = dst.subnr / 16; > > +gen8_insn->bits1.da3srcacc.dst_specal_acc = dstAcc; > > +gen8_insn->bits1.da3srcacc.src_type = dataType; > > +gen8_insn->bits1.da3srcacc.dest_type = dataType; > > +gen8_insn->header.access_mode = GEN_ALIGN_16; > > + > > +assert(src0.file == GEN_GENERAL_REGISTER_FILE); > > +assert(src0.address_mode == GEN_ADDRESS_DIRECT); > > +assert(src0.nr < 128); > > +gen8_insn->bits2.da3srcacc.src0_specal_acc = src0Acc; > > +gen8_insn->bits2.da3srcacc.src0_subreg_nr = src0.subnr / 4 ; > > +gen8_insn->bits2.da3srcacc.src0_reg_nr = src0.nr; > > +gen8_insn->bits1.da3srcacc.src0_abs = src0.absolute; > > +gen8_insn->bits1.da3srcacc.src0_negate = src0.negation; > > +gen8_insn->bits2.da3srcacc.src0_rep_ctrl = src0.vstride == > > GEN_VERTICAL_STRIDE_0; > > + > > +assert(src1.file == GEN_GENERAL_REGISTER_FILE); > > +assert(src1.address_mode == GEN_ADDRESS_DIRECT); > > +assert(src1.nr < 128); > > +gen8_insn->bits2.da3srcacc.src1_specal_acc = src1Acc; > > +gen8_insn->bits2.da3srcacc.src1_subreg_nr_low = (src1.subnr / 4) & 0x3; > > +gen8_insn->bits3.da3srcacc.src1_subreg_nr_high = (src1.subnr / 4) >> 2; > > +gen8_insn->bits2.da3srcacc.src1_rep_ctrl = src1.vstride == > > GEN_VERTICAL_STRIDE_0; > > +gen8_insn->bits3.da3srcacc.src1_reg_nr = src1.nr; > > +gen8_insn->bits1.da3srcacc.src1_abs = src1.absolute; > > +gen8_insn->bits1.da3srcacc.src1_negate = src1.negation; > > + > > +assert(src2.file == GEN_GENERAL_REGISTER_FILE); > > +assert(src2.address_mode == GEN_ADDRESS_DIRECT); > > +assert(src2.nr < 128); > > +gen8_insn->bits3.da3srcacc.src2_specal_acc = src2Acc; > > +gen8_insn->bits3.da3srcacc.src2_subreg_nr = src2.subnr / 4; > > +gen8_insn->bits3.da3srcacc.src2_rep_ctrl = src2.vstride == > > GEN_VERTICAL_STRIDE_0; > > +gen8_insn->bits3.da3srcacc.src2_reg_nr = src2.nr; > > +gen8_insn->bits1.da3srcacc.src2_abs = src2.absolute; > > 
+gen8_insn->bits1.da3srcacc.src2_negate = src2.negation; > > + } > > } /* End of the name space. */ > > diff --git a/backend/src/backend/gen8_encoder.hpp > > b/backend/src/backend/gen8_encoder.hpp > > index 53ec3d1..8e7939b 100644 > > --- a/backend/src/backend/gen8_encoder.hpp > > +++ b/backend/src/backend/gen8_encoder.hpp > > @@ -74,6 +74,8 @@ namespace gbe > > > > void MATH_WITH_ACC(GenRegister dst, uint32_t function, GenRegister > > src0, GenRegister src1, > > uint32_t dstAcc, uint32_t src0Acc, uint32_t > > src1Acc); > > +void MADM(GenRegister dst, GenRegister src0, GenRegister src1, > > GenRegister src2, > > + uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, > > uint32_t src2Acc); > >}; > > } > > #endif /* __GBE_GEN8_ENCODER_HPP__ */
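The quoted encoder splits the src1 sub-register number across two instruction words: the low two bits go to bits2 and the remaining high bit goes to bits3. A minimal sketch of that split and its inverse follows. The struct and function names are hypothetical, and it assumes subnr is a byte offset that is a multiple of 4 within a 32-byte register, so that subnr / 4 fits in three bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the two halves written by the encoder.
struct SplitSubreg {
    uint32_t low;   // corresponds to src1_subreg_nr_low  (2 bits)
    uint32_t high;  // corresponds to src1_subreg_nr_high (remaining bit)
};

// Mirrors the two assignments in the patch:
//   low  = (subnr / 4) & 0x3
//   high = (subnr / 4) >> 2
SplitSubreg split_src1_subreg(uint32_t subnr_bytes) {
    assert(subnr_bytes % 4 == 0 && subnr_bytes < 32);  // assumption, see lead-in
    const uint32_t dword_index = subnr_bytes / 4;
    return SplitSubreg{dword_index & 0x3u, dword_index >> 2};
}

// Inverse operation, as a disassembler would need it.
uint32_t join_src1_subreg(SplitSubreg s) {
    return ((s.high << 2) | s.low) * 4;
}
```

For a byte offset of 20 the dword index is 5 (binary 101), so the low part is 1 and the high part is 1, and joining them gives 20 again.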
Re: [Beignet] [PATCH 3/8] Backend: Add gen8 instruction field for special accumulator.
On Tue, Sep 15, 2015 at 4:15 AM, wrote:
> From: Junyan He
>
> The madm and invm function need to set accumulator id in the
> instruction. On BDW, the write mask of the dst and channel
> mask of src are reinterpreted for acc2~acc9 selection.
>
> Signed-off-by: Junyan He
> ---
>  backend/src/backend/gen8_instruction.hpp | 86
>  1 file changed, 86 insertions(+)
>
> diff --git a/backend/src/backend/gen8_instruction.hpp b/backend/src/backend/gen8_instruction.hpp
> index 5cf1032..2aa5bf7 100644
> --- a/backend/src/backend/gen8_instruction.hpp
> +++ b/backend/src/backend/gen8_instruction.hpp
> @@ -135,6 +135,22 @@ union Gen8NativeInstruction
>      uint32_t dest_address_mode:1;
>    } ia16;
>
> +  struct { // The sub reg field is reinterpreted as accumulator selector.
> +    uint32_t flag_sub_reg_nr:1;
> +    uint32_t flag_reg_nr:1;
> +    uint32_t mask_control:1;
> +    uint32_t dest_reg_file:2;
> +    uint32_t dest_reg_type:4;
> +    uint32_t src0_reg_file:2;
> +    uint32_t src0_reg_type:4;
> +    uint32_t pad:1;
> +    uint32_t dst_specal_acc:4;

s/specal/special/ throughout this patch.
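For readers who want to see where the selector lands in the 32-bit word, the sketch below adds up the widths of the fields quoted above and reads or writes the selector with shifts and masks. It assumes the compiler allocates bit-fields starting from the least significant bit, as GCC and Clang do on little-endian x86; bit-field layout is implementation-defined, so this is an illustration and not a portable description of the struct. The function and constant names are hypothetical, and the reviewer's suggested spelling "special" is used for them.

```cpp
#include <cassert>
#include <cstdint>

// Widths of the fields that precede dst_specal_acc in the quoted struct:
// flag_sub_reg_nr:1, flag_reg_nr:1, mask_control:1, dest_reg_file:2,
// dest_reg_type:4, src0_reg_file:2, src0_reg_type:4, pad:1.
constexpr uint32_t kDstSpecialAccShift = 1 + 1 + 1 + 2 + 4 + 2 + 4 + 1;
constexpr uint32_t kDstSpecialAccMask = 0xFu;  // dst_specal_acc:4
static_assert(kDstSpecialAccShift == 16, "selector starts at bit 16 under the LSB-first assumption");

// Read the 4-bit accumulator selector out of the raw word.
uint32_t get_dst_special_acc(uint32_t word) {
    return (word >> kDstSpecialAccShift) & kDstSpecialAccMask;
}

// Return a copy of the word with the selector replaced.
uint32_t set_dst_special_acc(uint32_t word, uint32_t selector) {
    assert(selector <= kDstSpecialAccMask);
    word &= ~(kDstSpecialAccMask << kDstSpecialAccShift);
    return word | (selector << kDstSpecialAccShift);
}
```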
Re: [Beignet] [PATCH 5/8] Backend: Add the MADM function to gen8 encoder.
On Tue, Sep 15, 2015 at 4:15 AM,wrote: > From: Junyan He > > Signed-off-by: Junyan He > --- > backend/src/backend/gen8_encoder.cpp | 56 > > backend/src/backend/gen8_encoder.hpp | 2 ++ > backend/src/backend/gen_defs.hpp | 2 ++ > 3 files changed, 60 insertions(+) > > diff --git a/backend/src/backend/gen8_encoder.cpp > b/backend/src/backend/gen8_encoder.cpp > index 0af27a3..002a8b5 100644 > --- a/backend/src/backend/gen8_encoder.cpp > +++ b/backend/src/backend/gen8_encoder.cpp > @@ -591,4 +591,60 @@ namespace gbe > this->setSrc0WithAcc(insn, src0, src0Acc); > this->setSrc1WithAcc(insn, src1, src1Acc); >} > + > + void Gen8Encoder::MADM(GenRegister dst, GenRegister src0, GenRegister > src1, GenRegister src2, > + uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t src2Acc) > + { > +GenNativeInstruction *insn = this->next(GEN_OPCODE_MADM); > +Gen8NativeInstruction *gen8_insn = >gen8_insn; > +assert(dst.file == GEN_GENERAL_REGISTER_FILE); > +assert(src0.file == GEN_GENERAL_REGISTER_FILE); > +assert(src1.file == GEN_GENERAL_REGISTER_FILE); > +assert(src2.file == GEN_GENERAL_REGISTER_FILE); > +assert(dst.hstride == GEN_HORIZONTAL_STRIDE_1 || dst.hstride == > GEN_HORIZONTAL_STRIDE_0); > +assert(src0.type == GEN_TYPE_DF || src0.type == GEN_TYPE_F); > +assert(src0.type == dst.type); > +assert(src0.type == src1.type); > +assert(src0.type == src2.type); > +int32_t dataType = src0.type == GEN_TYPE_DF ? 
3 : 0; > + > +this->setHeader(insn); > +gen8_insn->bits1.da3srcacc.dest_reg_nr = dst.nr; > +gen8_insn->bits1.da3srcacc.dest_subreg_nr = dst.subnr / 16; > +gen8_insn->bits1.da3srcacc.dst_specal_acc = dstAcc; > +gen8_insn->bits1.da3srcacc.src_type = dataType; > +gen8_insn->bits1.da3srcacc.dest_type = dataType; > +gen8_insn->header.access_mode = GEN_ALIGN_16; > + > +assert(src0.file == GEN_GENERAL_REGISTER_FILE); > +assert(src0.address_mode == GEN_ADDRESS_DIRECT); > +assert(src0.nr < 128); > +gen8_insn->bits2.da3srcacc.src0_specal_acc = src0Acc; > +gen8_insn->bits2.da3srcacc.src0_subreg_nr = src0.subnr / 4 ; > +gen8_insn->bits2.da3srcacc.src0_reg_nr = src0.nr; > +gen8_insn->bits1.da3srcacc.src0_abs = src0.absolute; > +gen8_insn->bits1.da3srcacc.src0_negate = src0.negation; > +gen8_insn->bits2.da3srcacc.src0_rep_ctrl = src0.vstride == > GEN_VERTICAL_STRIDE_0; > + > +assert(src1.file == GEN_GENERAL_REGISTER_FILE); > +assert(src1.address_mode == GEN_ADDRESS_DIRECT); > +assert(src1.nr < 128); > +gen8_insn->bits2.da3srcacc.src1_specal_acc = src1Acc; > +gen8_insn->bits2.da3srcacc.src1_subreg_nr_low = (src1.subnr / 4) & 0x3; > +gen8_insn->bits3.da3srcacc.src1_subreg_nr_high = (src1.subnr / 4) >> 2; > +gen8_insn->bits2.da3srcacc.src1_rep_ctrl = src1.vstride == > GEN_VERTICAL_STRIDE_0; > +gen8_insn->bits3.da3srcacc.src1_reg_nr = src1.nr; > +gen8_insn->bits1.da3srcacc.src1_abs = src1.absolute; > +gen8_insn->bits1.da3srcacc.src1_negate = src1.negation; > + > +assert(src2.file == GEN_GENERAL_REGISTER_FILE); > +assert(src2.address_mode == GEN_ADDRESS_DIRECT); > +assert(src2.nr < 128); > +gen8_insn->bits3.da3srcacc.src2_specal_acc = src2Acc; > +gen8_insn->bits3.da3srcacc.src2_subreg_nr = src2.subnr / 4; > +gen8_insn->bits3.da3srcacc.src2_rep_ctrl = src2.vstride == > GEN_VERTICAL_STRIDE_0; > +gen8_insn->bits3.da3srcacc.src2_reg_nr = src2.nr; > +gen8_insn->bits1.da3srcacc.src2_abs = src2.absolute; > +gen8_insn->bits1.da3srcacc.src2_negate = src2.negation; > + } > } /* End of the name 
space. */ > diff --git a/backend/src/backend/gen8_encoder.hpp > b/backend/src/backend/gen8_encoder.hpp > index 53ec3d1..8e7939b 100644 > --- a/backend/src/backend/gen8_encoder.hpp > +++ b/backend/src/backend/gen8_encoder.hpp > @@ -74,6 +74,8 @@ namespace gbe > > void MATH_WITH_ACC(GenRegister dst, uint32_t function, GenRegister src0, > GenRegister src1, > uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc); > +void MADM(GenRegister dst, GenRegister src0, GenRegister src1, > GenRegister src2, > + uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t > src2Acc); >}; > } > #endif /* __GBE_GEN8_ENCODER_HPP__ */ > diff --git a/backend/src/backend/gen_defs.hpp > b/backend/src/backend/gen_defs.hpp > index a1bd8dd..1b550ac 100644 > --- a/backend/src/backend/gen_defs.hpp > +++ b/backend/src/backend/gen_defs.hpp > @@ -174,6 +174,8 @@ enum opcode { >GEN_OPCODE_LINE = 89, >GEN_OPCODE_PLN = 90, >GEN_OPCODE_MAD = 91, > + GEN_OPCODE_LRP = 92, Unrelated to the main purpose of the patch: Do I understand correctly that Beignet does not emit the LRP instruction? If not, I'm curious why not? It maps pretty well to the mix() function (just reverse the
Re: [Beignet] [PATCH 6/8] Backend: Implement FDIV64 on BDW.
On Tue, Sep 15, 2015 at 4:15 AM,wrote: > From: Junyan He > > According to the document, we use a set of instructions > to implement double type division. > > Signed-off-by: Junyan He > --- > backend/src/backend/gen8_context.cpp | 68 > > backend/src/backend/gen8_context.hpp | 2 ++ > 2 files changed, 70 insertions(+) > > diff --git a/backend/src/backend/gen8_context.cpp > b/backend/src/backend/gen8_context.cpp > index b497ee5..f465832 100644 > --- a/backend/src/backend/gen8_context.cpp > +++ b/backend/src/backend/gen8_context.cpp > @@ -924,6 +924,74 @@ namespace gbe > this->unpackLongVec(src, dst, p->curr.execWidth); >} > > + void Gen8Context::emitF64DIVInstruction(const SelectionInstruction ) { > +/* Macro for Double Precision IEEE Compliant fdiv > + > + Set Rounding Mode in CR to RNE > + GRF are initialized: r0 = 0, r6 = a, r7 = b, r1 = 1 > + The default data type for the macro is :df > + > + math.eo.f0.0 (4) r8.acc2 r6.noacc r7.noacc 0xE > + (-f0.0) if > + madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2 // Step(1), q0=a*y0 > + madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2 // Step(2), > e0=(1-b*y0) > + madm (4) r11.acc5 r6.noacc -r7.noacc r9.acc3 // Step(3), r0=a-b*q0 > + madm (4) r12.acc6 r8.acc2 r10.acc4 r8.acc2 // Step(4), > y1=y0+e0*y0 > + madm (4) r13.acc7 r1.noacc -r7.noacc r12.acc6// Step(5), > e1=(1-b*y1) > + madm (4) r8.acc8 r8.acc2 r10.acc4 r12.acc6 // Step(6), > y2=y0+e0*y1 > + madm (4) r9.acc9 r9.acc3 r11.acc5 r12.acc6 // Step(7), > q1=q0+r0*y1 > + madm (4) r12.acc2 r12.acc6 r8.acc8 r13.acc7 // Step(8), > y3=y1+e1*y2 > + madm (4) r11.acc3 r6.noacc -r7.noacc r9.acc9 // Step(9), r1=a-b*q1 > + > + Change Rounding Mode in CR if required > + Implicit Accumulator for destination is NULL > + > + madm (4) r8.noacc r9.acc9 r11.acc3 r12.acc2 // Step(10), > q=q1+r1*y3 > + endif */ I don't see an IF or an ENDIF instruction emitted in the code below. Is that intentional, or am I misreading the code? 
> +GenRegister r6 = GenRegister::retype(ra->genReg(insn.src(0)), > GEN_TYPE_DF); > +GenRegister r7 = GenRegister::retype(ra->genReg(insn.src(1)), > GEN_TYPE_DF); > +GenRegister r8 = GenRegister::retype(ra->genReg(insn.dst(0)), > GEN_TYPE_DF); > +const GenRegister r0 = GenRegister::retype(ra->genReg(insn.dst(1)), > GEN_TYPE_DF); > +const GenRegister r1 = GenRegister::retype(ra->genReg(insn.dst(2)), > GEN_TYPE_DF); > +const GenRegister r9 = GenRegister::retype(ra->genReg(insn.dst(3)), > GEN_TYPE_DF); > +const GenRegister r10 = GenRegister::retype(ra->genReg(insn.dst(4)), > GEN_TYPE_DF); > +const GenRegister r11 = GenRegister::retype(ra->genReg(insn.dst(5)), > GEN_TYPE_DF); > +const GenRegister r12 = GenRegister::retype(ra->genReg(insn.dst(6)), > GEN_TYPE_DF); > +const GenRegister r13 = GenRegister::retype(ra->genReg(insn.dst(7)), > GEN_TYPE_DF); > +Gen8Encoder *p8 = reinterpret_cast(p); > +p->push(); { > + p->curr.execWidth = 4; > + p->curr.predicate = GEN_PREDICATE_NONE; > + p->curr.noMask= 1; > + p->MOV(r1, GenRegister::immdf(1.0d)); > + p->MOV(r0, GenRegister::immdf(0.0d)); > + > + for (int i = 0; i < (simdWidth == 16 ? 
4 : 2); i++) { > +p->curr.predicate = GEN_PREDICATE_NONE; > +p8->MATH_WITH_ACC(r8, GEN8_MATH_FUNCTION_INVM, r6, r7, > GEN8_INSN_ACC2, GEN8_INSN_NOACC, GEN8_INSN_NOACC); > +p->curr.useFlag(insn.state.flag, insn.state.subFlag); > +p->curr.predicate = GEN_PREDICATE_NORMAL; > +p->curr.inversePredicate = 1; > +p->curr.noMask= 0; > +p8->MADM(r9, r0, r6, r8, GEN8_INSN_ACC3, GEN8_INSN_NOACC, > GEN8_INSN_NOACC, GEN8_INSN_ACC2); > +p8->MADM(r10, r1, GenRegister::negate(r7), r8, GEN8_INSN_ACC4, > GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC2); > +p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC5, > GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC3); > +p8->MADM(r12, r8, r10, r8, GEN8_INSN_ACC6, GEN8_INSN_ACC2, > GEN8_INSN_ACC4, GEN8_INSN_ACC2); > +p8->MADM(r13, r1, GenRegister::negate(r7), r12, GEN8_INSN_ACC7, > GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC6); > +p8->MADM(r8, r8, r10, r12, GEN8_INSN_ACC8, GEN8_INSN_ACC2, > GEN8_INSN_ACC4, GEN8_INSN_ACC6); > +p8->MADM(r9, r9, r11, r12, GEN8_INSN_ACC9, GEN8_INSN_ACC3, > GEN8_INSN_ACC5, GEN8_INSN_ACC6); > +p8->MADM(r12, r12, r8, r13, GEN8_INSN_ACC2, GEN8_INSN_ACC6, > GEN8_INSN_ACC8, GEN8_INSN_ACC7); > +p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC3, > GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC9); > + > +p8->MADM(r8, r9, r11, r12, GEN8_INSN_NOACC, GEN8_INSN_ACC9, > GEN8_INSN_ACC3, GEN8_INSN_ACC2); > + > +