Re: [Beignet] [PATCH 1/2] Backend: Add optimization for negtive modifier

2017-05-22 Thread Matt Turner
On Wed, May 17, 2017 at 1:20 AM, rander.wang  wrote:
>  LLVM transform Mad(a, -b, c) to
>  Add b, -b, 0
>  Mad val, a, b, c

I think you mean that LLVM translates

>  Add b, -b, 0
>  Mad val, a, b, c

to

> Mad(a, -b, c)

As written, your summary says that LLVM performs the inverse
transformation.
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH] utests: added for optimization negtiveAdd

2017-05-22 Thread Matt Turner
In the patch title and the code: negtive -> negative


Re: [Beignet] [PATCH V2] GBE: try to avoid bank conflict in register allocator.

2016-04-27 Thread Matt Turner
On Wed, Apr 27, 2016 at 12:43 AM, Ruiling Song  wrote:
> v2:
> fix build error.

Some documentation or description about this would be very welcome. I
honestly cannot imagine anyone being able to review this without it,
and I think I know what this patch is doing. :)

Some questions I think your patch's commit message (as well as
comments in the code) should answer:

 - What is a register bank?
 - What is a register bank conflict?
 - How are the register banks laid out? Does the layout differ between
hardware generations?
 - What effect does a register bank conflict have? E.g., changes in
instruction issue rate, instruction latency, ability to co-issue.
 - How does this patch attempt to avoid register bank conflicts?
 - In practice, what effect does this patch have on register bank
conflicts? For instance, I might count the number of register bank
conflicts in a collection of programs before and after the patch to
demonstrate that it is effective.

> Signed-off-by: Ruiling Song 
> ---
>  backend/src/backend/gen_reg_allocation.cpp | 31 
> --
>  1 file changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/backend/src/backend/gen_reg_allocation.cpp 
> b/backend/src/backend/gen_reg_allocation.cpp
> index 89c53d4..ce07f8a 100644
> --- a/backend/src/backend/gen_reg_allocation.cpp
> +++ b/backend/src/backend/gen_reg_allocation.cpp
> @@ -35,6 +35,7 @@
>  #include 
>
>
> +#define HALF_REGISTER_FILE_OFFSET (32*64)
>  namespace gbe
>  {
>
> /
> @@ -48,9 +49,10 @@ namespace gbe
> */
>struct GenRegInterval {
>  INLINE GenRegInterval(ir::Register reg) :
> -  reg(reg), minID(INT_MAX), maxID(-INT_MAX) {}
> +  reg(reg), minID(INT_MAX), maxID(-INT_MAX), conflictReg(0) {}
>  ir::Register reg; //!< (virtual) register of the interval
>  int32_t minID, maxID; //!< Starting and ending points
> +ir::Register conflictReg; // < has banck conflict with this register

Typo: bank

>};
>
>typedef struct GenRegIntervalKey {
> @@ -1052,7 +1054,17 @@ namespace gbe
>  // and the source is a scalar Dword. If that is the case, the byte 
> register
>  // must get 4byte alignment register offset.
>  alignment = (alignment + 3) & ~3;
> -while ((grfOffset = ctx.allocate(size, alignment)) == 0) {
> +
> +bool direction = true;
> +if (interval.conflictReg != 0) {
> +  // try to allocate conflict registers in top/bottom half.
> +  if (RA.contains(interval.conflictReg)) {
> +if (RA.find(interval.conflictReg)->second < 
> HALF_REGISTER_FILE_OFFSET) {
> +  direction = false;
> +}
> +  }
> +}
> +while ((grfOffset = ctx.allocate(size, alignment, direction)) == 0) {
>const bool success = this->expireGRF(interval);
>if (success == false) {
>  if (spillAtInterval(interval, size, alignment) == false)
> @@ -1104,6 +1116,7 @@ namespace gbe
>for (auto &insn : block.insnList) {
>  const uint32_t srcNum = insn.srcNum, dstNum = insn.dstNum;
>  insn.ID  = insnID;
> +bool is3SrcOp = insn.opcode == SEL_OP_MAD;

Does Beignet not use other 3-src opcodes? LRP, BFI2, BFE, CSEL? Of
course, MAD is the most important one.
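
To make the questions above concrete, here is a minimal C sketch of the policy the quoted hunk appears to implement, plus a wider 3-source check along the lines suggested. Everything in it is an assumption for illustration: the opcode enum and function names are hypothetical, the register file is assumed to be 128 GRFs of 32 bytes, and the meaning of the `direction` flag passed to `ctx.allocate()` (false = search from the top half) is inferred from the patch, not documented anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 128 GRFs of 32 bytes, so the patch's HALF_REGISTER_FILE_OFFSET
 * (32*64 = 2048 bytes) splits the file into registers 0-63 and 64-127. */
#define HALF_REGISTER_FILE_OFFSET (32 * 64)

/* Hypothetical opcode subset; the patch itself only checks MAD. */
typedef enum { OP_MOV, OP_ADD, OP_MAD, OP_LRP, OP_BFI2, OP_BFE, OP_CSEL } opcode;

/* True for 3-source instructions, whose operands may conflict. */
static bool is_three_source(opcode op) {
  switch (op) {
  case OP_MAD: case OP_LRP: case OP_BFI2: case OP_BFE: case OP_CSEL:
    return true;
  default:
    return false;
  }
}

/* Mirrors the `direction` choice in the hunk: true keeps the default
 * search order, false asks the allocator to search the other way.
 * conflict_offset is the byte offset already assigned to the conflicting
 * register, or a negative value if it has not been allocated yet. */
static bool default_search_direction(int32_t conflict_offset) {
  if (conflict_offset >= 0 && conflict_offset < HALF_REGISTER_FILE_OFFSET)
    return false; /* partner is in the bottom half: (assumed) aim for the top */
  return true;
}
```

The sketch takes the half-way split at face value; whether that split actually matches the bank layout on each generation is exactly what a commit message would need to explain.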


Re: [Beignet] [PATCH V2 2/2] utests: add an utest for mix

2015-11-02 Thread Matt Turner
On Wed, Oct 21, 2015 at 8:21 PM, Pan Xiuli  wrote:
> Add a testcase for compiler mix. Since mix will have
> error, we take err limit as 1e-3 and print the max err.

I don't know what OpenCL's spec says about this issue, but you should
be aware that Gen's LRP instruction does not return INFINITY when 
or  is INFINITY.
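
For context, here is a small C model of how mix() maps onto a lerp instruction. It assumes LRP computes src0 * src1 + (1 - src0) * src2, so that mix(x, y, a) becomes lrp(a, y, x), while OpenCL defines mix(x, y, a) as x + (y - x) * a. The helper names are hypothetical. The two formulas round differently, so they can disagree slightly for general inputs; this model says nothing about how the hardware instruction itself treats infinities.

```c
#include <assert.h>

/* Assumed LRP semantics: dst = src0*src1 + (1 - src0)*src2. */
static float lrp(float src0, float src1, float src2) {
  return src0 * src1 + (1.0f - src0) * src2;
}

/* OpenCL definition of mix(): x + (y - x) * a. */
static float mix_ref(float x, float y, float a) {
  return x + (y - x) * a;
}

/* mix(x, y, a) expressed with LRP by reversing the argument order. */
static float mix_via_lrp(float x, float y, float a) {
  return lrp(a, y, x);
}
```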


Re: [Beignet] [PATCH 6/8] Backend: Implement FDIV64 on BDW.

2015-09-15 Thread Matt Turner
On Tue, Sep 15, 2015 at 7:00 AM, He Junyan  wrote:
> On Tue, Sep 15, 2015 at 06:00:57AM -0700, Matt Turner wrote:
>> Date: Tue, 15 Sep 2015 06:00:57 -0700
>> From: Matt Turner 
>> To: "junyan.he" 
>> Cc: "beignet@lists.freedesktop.org" 
>> Subject: Re: [Beignet] [PATCH 6/8] Backend: Implement FDIV64 on BDW.
>>
>> On Tue, Sep 15, 2015 at 4:15 AM,   wrote:
>> > From: Junyan He 
>> >
>> > According to the document, we use a set of instructions
>> > to implement double type division.
>> >
>> > Signed-off-by: Junyan He 
>> > ---
>> >  backend/src/backend/gen8_context.cpp | 68 
>> > 
>> >  backend/src/backend/gen8_context.hpp |  2 ++
>> >  2 files changed, 70 insertions(+)
>> >
>> > diff --git a/backend/src/backend/gen8_context.cpp 
>> > b/backend/src/backend/gen8_context.cpp
>> > index b497ee5..f465832 100644
>> > --- a/backend/src/backend/gen8_context.cpp
>> > +++ b/backend/src/backend/gen8_context.cpp
>> > @@ -924,6 +924,74 @@ namespace gbe
>> >  this->unpackLongVec(src, dst, p->curr.execWidth);
>> >}
>> >
>> > +  void Gen8Context::emitF64DIVInstruction(const SelectionInstruction 
>> > &insn) {
>> > +/* Macro for Double Precision IEEE Compliant fdiv
>> > +
>> > +   Set Rounding Mode in CR to RNE
>> > +   GRF are initialized: r0 = 0, r6 = a, r7 = b, r1 = 1
>> > +   The default data type for the macro is :df
>> > +
>> > +   math.eo.f0.0 (4) r8.acc2 r6.noacc r7.noacc 0xE
>> > +   (-f0.0) if
>> > +   madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2   // Step(1), 
>> > q0=a*y0
>> > +   madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2 // Step(2), 
>> > e0=(1-b*y0)
>> > +   madm (4) r11.acc5 r6.noacc -r7.noacc r9.acc3 // Step(3), 
>> > r0=a-b*q0
>> > +   madm (4) r12.acc6 r8.acc2 r10.acc4 r8.acc2   // Step(4), 
>> > y1=y0+e0*y0
>> > +   madm (4) r13.acc7 r1.noacc -r7.noacc r12.acc6// Step(5), 
>> > e1=(1-b*y1)
>> > +   madm (4) r8.acc8 r8.acc2 r10.acc4 r12.acc6   // Step(6), 
>> > y2=y0+e0*y1
>> > +   madm (4) r9.acc9 r9.acc3 r11.acc5 r12.acc6   // Step(7), 
>> > q1=q0+r0*y1
>> > +   madm (4) r12.acc2 r12.acc6 r8.acc8 r13.acc7  // Step(8), 
>> > y3=y1+e1*y2
>> > +   madm (4) r11.acc3 r6.noacc -r7.noacc r9.acc9 // Step(9), 
>> > r1=a-b*q1
>> > +
>> > +   Change Rounding Mode in CR if required
>> > +   Implicit Accumulator for destination is NULL
>> > +
>> > +   madm (4) r8.noacc r9.acc9 r11.acc3 r12.acc2  // Step(10), 
>> > q=q1+r1*y3
>> > +   endif */
>>
>> I don't see an IF or an ENDIF instruction emitted in the code below.
>> Is that intentional, or am I misreading the code?
>>
> Here, we use f0.1 as the predication for all the instructions, like:
> (-f0.1) madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2
> (-f0.1) madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2
> .
> I avoid using IF-Endif here, because we need to calculate the instruction 
> number
> within IF clause, and it is not convenient.

Ah, I see.

While that works, I think it does not take advantage of the "early
out" capability of the INVM math instruction. As I understand it, for
some input values, INVM can calculate a full double-precision result
without any of the MADM sequence, so using IF/ENDIF will allow the EU
to jump over all of the MADM instructions. If you only predicate the
instructions, the EU cannot jump over them; it must send each one down
the pipeline.

Just something to consider. I don't know whether the difficulties of
using IF/ENDIF are great enough to justify avoiding them.


Re: [Beignet] [PATCH 6/8] Backend: Implement FDIV64 on BDW.

2015-09-15 Thread Matt Turner
On Tue, Sep 15, 2015 at 4:15 AM,   wrote:
> From: Junyan He 
>
> According to the document, we use a set of instructions
> to implement double type division.
>
> Signed-off-by: Junyan He 
> ---
>  backend/src/backend/gen8_context.cpp | 68 
> 
>  backend/src/backend/gen8_context.hpp |  2 ++
>  2 files changed, 70 insertions(+)
>
> diff --git a/backend/src/backend/gen8_context.cpp 
> b/backend/src/backend/gen8_context.cpp
> index b497ee5..f465832 100644
> --- a/backend/src/backend/gen8_context.cpp
> +++ b/backend/src/backend/gen8_context.cpp
> @@ -924,6 +924,74 @@ namespace gbe
>  this->unpackLongVec(src, dst, p->curr.execWidth);
>}
>
> +  void Gen8Context::emitF64DIVInstruction(const SelectionInstruction &insn) {
> +/* Macro for Double Precision IEEE Compliant fdiv
> +
> +   Set Rounding Mode in CR to RNE
> +   GRF are initialized: r0 = 0, r6 = a, r7 = b, r1 = 1
> +   The default data type for the macro is :df
> +
> +   math.eo.f0.0 (4) r8.acc2 r6.noacc r7.noacc 0xE
> +   (-f0.0) if
> +   madm (4) r9.acc3 r0.noacc r6.noacc r8.acc2   // Step(1), q0=a*y0
> +   madm (4) r10.acc4 r1.noacc -r7.noacc r8.acc2 // Step(2), 
> e0=(1-b*y0)
> +   madm (4) r11.acc5 r6.noacc -r7.noacc r9.acc3 // Step(3), r0=a-b*q0
> +   madm (4) r12.acc6 r8.acc2 r10.acc4 r8.acc2   // Step(4), 
> y1=y0+e0*y0
> +   madm (4) r13.acc7 r1.noacc -r7.noacc r12.acc6// Step(5), 
> e1=(1-b*y1)
> +   madm (4) r8.acc8 r8.acc2 r10.acc4 r12.acc6   // Step(6), 
> y2=y0+e0*y1
> +   madm (4) r9.acc9 r9.acc3 r11.acc5 r12.acc6   // Step(7), 
> q1=q0+r0*y1
> +   madm (4) r12.acc2 r12.acc6 r8.acc8 r13.acc7  // Step(8), 
> y3=y1+e1*y2
> +   madm (4) r11.acc3 r6.noacc -r7.noacc r9.acc9 // Step(9), r1=a-b*q1
> +
> +   Change Rounding Mode in CR if required
> +   Implicit Accumulator for destination is NULL
> +
> +   madm (4) r8.noacc r9.acc9 r11.acc3 r12.acc2  // Step(10), 
> q=q1+r1*y3
> +   endif */

I don't see an IF or an ENDIF instruction emitted in the code below.
Is that intentional, or am I misreading the code?

> +GenRegister r6 = GenRegister::retype(ra->genReg(insn.src(0)), 
> GEN_TYPE_DF);
> +GenRegister r7 = GenRegister::retype(ra->genReg(insn.src(1)), 
> GEN_TYPE_DF);
> +GenRegister r8 = GenRegister::retype(ra->genReg(insn.dst(0)), 
> GEN_TYPE_DF);
> +const GenRegister r0 = GenRegister::retype(ra->genReg(insn.dst(1)), 
> GEN_TYPE_DF);
> +const GenRegister r1 = GenRegister::retype(ra->genReg(insn.dst(2)), 
> GEN_TYPE_DF);
> +const GenRegister r9 = GenRegister::retype(ra->genReg(insn.dst(3)), 
> GEN_TYPE_DF);
> +const GenRegister r10 = GenRegister::retype(ra->genReg(insn.dst(4)), 
> GEN_TYPE_DF);
> +const GenRegister r11 = GenRegister::retype(ra->genReg(insn.dst(5)), 
> GEN_TYPE_DF);
> +const GenRegister r12 = GenRegister::retype(ra->genReg(insn.dst(6)), 
> GEN_TYPE_DF);
> +const GenRegister r13 = GenRegister::retype(ra->genReg(insn.dst(7)), 
> GEN_TYPE_DF);
> +Gen8Encoder *p8 = reinterpret_cast(p);
> +p->push(); {
> +  p->curr.execWidth = 4;
> +  p->curr.predicate = GEN_PREDICATE_NONE;
> +  p->curr.noMask= 1;
> +  p->MOV(r1, GenRegister::immdf(1.0d));
> +  p->MOV(r0, GenRegister::immdf(0.0d));
> +
> +  for (int i = 0; i < (simdWidth == 16 ? 4 : 2); i++) {
> +p->curr.predicate = GEN_PREDICATE_NONE;
> +p8->MATH_WITH_ACC(r8, GEN8_MATH_FUNCTION_INVM, r6, r7, 
> GEN8_INSN_ACC2, GEN8_INSN_NOACC, GEN8_INSN_NOACC);
> +p->curr.useFlag(insn.state.flag, insn.state.subFlag);
> +p->curr.predicate = GEN_PREDICATE_NORMAL;
> +p->curr.inversePredicate = 1;
> +p->curr.noMask= 0;
> +p8->MADM(r9, r0, r6, r8, GEN8_INSN_ACC3, GEN8_INSN_NOACC, 
> GEN8_INSN_NOACC, GEN8_INSN_ACC2);
> +p8->MADM(r10, r1, GenRegister::negate(r7), r8, GEN8_INSN_ACC4, 
> GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC2);
> +p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC5, 
> GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC3);
> +p8->MADM(r12, r8, r10, r8, GEN8_INSN_ACC6, GEN8_INSN_ACC2, 
> GEN8_INSN_ACC4, GEN8_INSN_ACC2);
> +p8->MADM(r13, r1, GenRegister::negate(r7), r12, GEN8_INSN_ACC7, 
> GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC6);
> +p8->MADM(r8, r8, r10, r12, GEN8_INSN_ACC8, GEN8_INSN_ACC2, 
> GEN8_INSN_ACC4, GEN8_INSN_ACC6);
> +p8->MADM(r9, r9, r11, r12, GEN8_INSN_ACC9, GEN8_INSN_ACC3, 
> GEN8_INSN_ACC5, GEN8_INSN_ACC6);
> +p8->MADM(r12, r12, r8, r13, GEN8_INSN_ACC2, GEN8_INSN_ACC6, 
> GEN8_INSN_ACC8, GEN8_INSN_ACC7);
> +p8->MADM(r11, r6, GenRegister::negate(r7), r9, GEN8_INSN_ACC3, 
> GEN8_INSN_NOACC, GEN8_INSN_NOACC, GEN8_INSN_ACC9);
> +
> +p8->MADM(r8, r9, r11, r12, GEN8_INSN_NOACC, GEN8_INSN_ACC9, 
> GEN8_INSN_ACC3, GEN8_INSN_ACC2);
> +
> +r6 = GenRegister::offset(r6, 1);
> +r7 = GenRegister::offse

Re: [Beignet] [PATCH 5/8] Backend: Add the MADM function to gen8 encoder.

2015-09-15 Thread Matt Turner
On Tue, Sep 15, 2015 at 4:15 AM,   wrote:
> From: Junyan He 
>
> Signed-off-by: Junyan He 
> ---
>  backend/src/backend/gen8_encoder.cpp | 56 
> 
>  backend/src/backend/gen8_encoder.hpp |  2 ++
>  backend/src/backend/gen_defs.hpp |  2 ++
>  3 files changed, 60 insertions(+)
>
> diff --git a/backend/src/backend/gen8_encoder.cpp 
> b/backend/src/backend/gen8_encoder.cpp
> index 0af27a3..002a8b5 100644
> --- a/backend/src/backend/gen8_encoder.cpp
> +++ b/backend/src/backend/gen8_encoder.cpp
> @@ -591,4 +591,60 @@ namespace gbe
>   this->setSrc0WithAcc(insn, src0, src0Acc);
>   this->setSrc1WithAcc(insn, src1, src1Acc);
>}
> +
> +  void Gen8Encoder::MADM(GenRegister dst, GenRegister src0, GenRegister 
> src1, GenRegister src2,
> +  uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t src2Acc)
> +  {
> +GenNativeInstruction *insn = this->next(GEN_OPCODE_MADM);
> +Gen8NativeInstruction *gen8_insn = &insn->gen8_insn;
> +assert(dst.file == GEN_GENERAL_REGISTER_FILE);
> +assert(src0.file == GEN_GENERAL_REGISTER_FILE);
> +assert(src1.file == GEN_GENERAL_REGISTER_FILE);
> +assert(src2.file == GEN_GENERAL_REGISTER_FILE);
> +assert(dst.hstride == GEN_HORIZONTAL_STRIDE_1 || dst.hstride == 
> GEN_HORIZONTAL_STRIDE_0);
> +assert(src0.type == GEN_TYPE_DF || src0.type == GEN_TYPE_F);
> +assert(src0.type == dst.type);
> +assert(src0.type == src1.type);
> +assert(src0.type == src2.type);
> +int32_t dataType = src0.type == GEN_TYPE_DF ? 3 : 0;
> +
> +this->setHeader(insn);
> +gen8_insn->bits1.da3srcacc.dest_reg_nr = dst.nr;
> +gen8_insn->bits1.da3srcacc.dest_subreg_nr = dst.subnr / 16;
> +gen8_insn->bits1.da3srcacc.dst_specal_acc = dstAcc;
> +gen8_insn->bits1.da3srcacc.src_type = dataType;
> +gen8_insn->bits1.da3srcacc.dest_type = dataType;
> +gen8_insn->header.access_mode = GEN_ALIGN_16;
> +
> +assert(src0.file == GEN_GENERAL_REGISTER_FILE);
> +assert(src0.address_mode == GEN_ADDRESS_DIRECT);
> +assert(src0.nr < 128);
> +gen8_insn->bits2.da3srcacc.src0_specal_acc = src0Acc;
> +gen8_insn->bits2.da3srcacc.src0_subreg_nr = src0.subnr / 4 ;
> +gen8_insn->bits2.da3srcacc.src0_reg_nr = src0.nr;
> +gen8_insn->bits1.da3srcacc.src0_abs = src0.absolute;
> +gen8_insn->bits1.da3srcacc.src0_negate = src0.negation;
> +gen8_insn->bits2.da3srcacc.src0_rep_ctrl = src0.vstride == 
> GEN_VERTICAL_STRIDE_0;
> +
> +assert(src1.file == GEN_GENERAL_REGISTER_FILE);
> +assert(src1.address_mode == GEN_ADDRESS_DIRECT);
> +assert(src1.nr < 128);
> +gen8_insn->bits2.da3srcacc.src1_specal_acc = src1Acc;
> +gen8_insn->bits2.da3srcacc.src1_subreg_nr_low = (src1.subnr / 4) & 0x3;
> +gen8_insn->bits3.da3srcacc.src1_subreg_nr_high = (src1.subnr / 4) >> 2;
> +gen8_insn->bits2.da3srcacc.src1_rep_ctrl = src1.vstride == 
> GEN_VERTICAL_STRIDE_0;
> +gen8_insn->bits3.da3srcacc.src1_reg_nr = src1.nr;
> +gen8_insn->bits1.da3srcacc.src1_abs = src1.absolute;
> +gen8_insn->bits1.da3srcacc.src1_negate = src1.negation;
> +
> +assert(src2.file == GEN_GENERAL_REGISTER_FILE);
> +assert(src2.address_mode == GEN_ADDRESS_DIRECT);
> +assert(src2.nr < 128);
> +gen8_insn->bits3.da3srcacc.src2_specal_acc = src2Acc;
> +gen8_insn->bits3.da3srcacc.src2_subreg_nr = src2.subnr / 4;
> +gen8_insn->bits3.da3srcacc.src2_rep_ctrl = src2.vstride == 
> GEN_VERTICAL_STRIDE_0;
> +gen8_insn->bits3.da3srcacc.src2_reg_nr = src2.nr;
> +gen8_insn->bits1.da3srcacc.src2_abs = src2.absolute;
> +gen8_insn->bits1.da3srcacc.src2_negate = src2.negation;
> +  }
>  } /* End of the name space. */
> diff --git a/backend/src/backend/gen8_encoder.hpp 
> b/backend/src/backend/gen8_encoder.hpp
> index 53ec3d1..8e7939b 100644
> --- a/backend/src/backend/gen8_encoder.hpp
> +++ b/backend/src/backend/gen8_encoder.hpp
> @@ -74,6 +74,8 @@ namespace gbe
>
>  void MATH_WITH_ACC(GenRegister dst, uint32_t function, GenRegister src0, 
> GenRegister src1,
> uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc);
> +void MADM(GenRegister dst, GenRegister src0, GenRegister src1, 
> GenRegister src2,
> +  uint32_t dstAcc, uint32_t src0Acc, uint32_t src1Acc, uint32_t 
> src2Acc);
>};
>  }
>  #endif /* __GBE_GEN8_ENCODER_HPP__ */
> diff --git a/backend/src/backend/gen_defs.hpp 
> b/backend/src/backend/gen_defs.hpp
> index a1bd8dd..1b550ac 100644
> --- a/backend/src/backend/gen_defs.hpp
> +++ b/backend/src/backend/gen_defs.hpp
> @@ -174,6 +174,8 @@ enum opcode {
>GEN_OPCODE_LINE = 89,
>GEN_OPCODE_PLN = 90,
>GEN_OPCODE_MAD = 91,
> +  GEN_OPCODE_LRP = 92,

Unrelated to the main purpose of the patch: Do I understand correctly
that Beignet does not emit the LRP instruction?

If so, I'm curious why not. It maps pretty well to the mix() function
(just reverse the argument order), but it does not handle infinities
properly [0], which 

Re: [Beignet] [PATCH 3/8] Backend: Add gen8 instruction field for special accumulator.

2015-09-15 Thread Matt Turner
On Tue, Sep 15, 2015 at 4:15 AM,   wrote:
> From: Junyan He 
>
> The madm and invm function need to set accumulator id in the
> instruction. On BDW, the write mask of the dst and channel
> mask of src are reinterpreted for acc2~acc9 selection.
>
> Signed-off-by: Junyan He 
> ---
>  backend/src/backend/gen8_instruction.hpp | 86 
> 
>  1 file changed, 86 insertions(+)
>
> diff --git a/backend/src/backend/gen8_instruction.hpp 
> b/backend/src/backend/gen8_instruction.hpp
> index 5cf1032..2aa5bf7 100644
> --- a/backend/src/backend/gen8_instruction.hpp
> +++ b/backend/src/backend/gen8_instruction.hpp
> @@ -135,6 +135,22 @@ union Gen8NativeInstruction
>  uint32_t dest_address_mode:1;
>} ia16;
>
> +  struct { // The sub reg field is reinterpreted as accumulator selector.
> +uint32_t flag_sub_reg_nr:1;
> +uint32_t flag_reg_nr:1;
> +uint32_t mask_control:1;
> +uint32_t dest_reg_file:2;
> +uint32_t dest_reg_type:4;
> +uint32_t src0_reg_file:2;
> +uint32_t src0_reg_type:4;
> +uint32_t pad:1;
> +uint32_t dst_specal_acc:4;

s/specal/special/ throughout this patch.


Re: [Beignet] [patch v2 2/3] enable create image 2d from buffer in clCreateImage.

2015-08-28 Thread Matt Turner
On Fri, Aug 28, 2015 at 12:52 AM,   wrote:
> From: Luo Xionghu 
>
> this patch allows create 2d image with a cl buffer with zero copy.
> v2: should use reference to manage the release the buffer and image.
> After being created, the buffer reference count is 2, and image reference
> count is 1.
> if image is released first, decrease the image reference count and
> buffer reference count both, release the bo when the buffer is released
> at last;
> if buffer is released first, decrease the buffer reference count only,
> release the buffer when the image is released.
> add CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT in cl_device_info.
>
> Signed-off-by: Luo Xionghu 
> ---
>  src/cl_api.c|   3 +-
>  src/cl_device_id.c  |   2 +
>  src/cl_device_id.h  |   2 +
>  src/cl_extensions.c |   2 +
>  src/cl_gt_device.h  |   3 +-
>  src/cl_mem.c| 109 
> +++-
>  src/cl_mem.h|   1 +
>  7 files changed, 93 insertions(+), 29 deletions(-)
>
> diff --git a/src/cl_api.c b/src/cl_api.c
> index 5c9b250..0690af4 100644
> --- a/src/cl_api.c
> +++ b/src/cl_api.c
> @@ -549,8 +549,9 @@ clCreateImage(cl_context context,
>  goto error;
>}
>/* buffer refers to a valid buffer memory object if image_type is
> - CL_MEM_OBJECT_IMAGE1D_BUFFER. Otherwise it must be NULL. */
> + CL_MEM_OBJECT_IMAGE1D_BUFFER or CL_MEM_OBJECT_IMAGE2D. Otherwise it 
> must be NULL. */
>if (image_desc->image_type != CL_MEM_OBJECT_IMAGE1D_BUFFER &&
> +  image_desc->image_type != CL_MEM_OBJECT_IMAGE2D &&
>   image_desc->buffer) {
>  err = CL_INVALID_IMAGE_DESCRIPTOR;
>  goto error;
> diff --git a/src/cl_device_id.c b/src/cl_device_id.c
> index 1778292..78d2cf4 100644
> --- a/src/cl_device_id.c
> +++ b/src/cl_device_id.c
> @@ -810,6 +810,8 @@ cl_get_device_info(cl_device_id device,
>  DECL_FIELD(PARTITION_AFFINITY_DOMAIN, affinity_domain)
>  DECL_FIELD(PARTITION_TYPE, partition_type)
>  DECL_FIELD(REFERENCE_COUNT, device_reference_count)
> +DECL_FIELD(IMAGE_PITCH_ALIGNMENT, image_pitch_alignment)
> +DECL_FIELD(IMAGE_BASE_ADDRESS_ALIGNMENT, image_base_address_alignment)
>
>  case CL_DRIVER_VERSION:
>if (param_value_size_ret) {
> diff --git a/src/cl_device_id.h b/src/cl_device_id.h
> index b5db91c..02d1e0f 100644
> --- a/src/cl_device_id.h
> +++ b/src/cl_device_id.h
> @@ -116,6 +116,8 @@ struct _cl_device_id {
>cl_device_partition_property partition_type[3];
>cl_uint  device_reference_count;
>uint32_t atomic_test_result;
> +  uint32_t image_pitch_alignment;
> +  uint32_t image_base_address_alignment;
>  };
>
>  /* Get a device from the given platform */
> diff --git a/src/cl_extensions.c b/src/cl_extensions.c
> index 3eb303f..6cb1579 100644
> --- a/src/cl_extensions.c
> +++ b/src/cl_extensions.c
> @@ -46,6 +46,8 @@ void check_opt1_extension(cl_extensions_t *extensions)
>  if (id == EXT_ID(khr_spir))
>extensions->extensions[id].base.ext_enabled = 1;
>  #endif
> +if (id == EXT_ID(khr_image2d_from_buffer))
> +  extensions->extensions[id].base.ext_enabled = 1;
>}
>  }
>
> diff --git a/src/cl_gt_device.h b/src/cl_gt_device.h
> index a51843d..c2f9f56 100644
> --- a/src/cl_gt_device.h
> +++ b/src/cl_gt_device.h
> @@ -126,4 +126,5 @@ DECL_INFO_STRING(driver_version, 
> LIBCL_DRIVER_VERSION_STRING)
>  .affinity_domain = 0,
>  .partition_type = {0},
>  .device_reference_count = 1,
> -
> +.image_pitch_alignment = 1,
> +.image_base_address_alignment = 4096,
> diff --git a/src/cl_mem.c b/src/cl_mem.c
> index b5671bd..bb065f5 100644
> --- a/src/cl_mem.c
> +++ b/src/cl_mem.c
> @@ -264,6 +264,7 @@ cl_mem_allocate(enum cl_mem_type type,
>SET_ICD(mem->dispatch)
>mem->ref_n = 1;
>mem->magic = CL_MAGIC_MEM_HEADER;
> +  mem->is_image_from_buffer = 0;
>mem->flags = flags;
>mem->is_userptr = 0;
>mem->offset = 0;
> @@ -308,10 +309,19 @@ cl_mem_allocate(enum cl_mem_type type,
>}
>  }
>
> -if (!mem->is_userptr)
> +if(type == CL_MEM_IMAGE_TYPE && host_ptr && ((cl_mem)host_ptr)->magic == 
> CL_MAGIC_MEM_HEADER) {
> +  // if the image if created from buffer, should use the bo directly to 
> share same bo.
> +  mem->bo = ((cl_mem)host_ptr)->bo;
> +  mem->is_image_from_buffer = 1;
> +} else  if (!mem->is_userptr)
>mem->bo = cl_buffer_alloc(bufmgr, "CL memory object", sz, alignment);
>  #else
> -mem->bo = cl_buffer_alloc(bufmgr, "CL memory object", sz, alignment);
> +if(type == CL_MEM_IMAGE_TYPE && host_ptr && ((cl_mem)host_ptr)->magic == 
> CL_MAGIC_MEM_HEADER) {
> +  // if the image if created from buffer, should use the bo directly to 
> share same bo.
> +  mem->bo = ((cl_mem)host_ptr)->bo;
> +  mem->is_image_from_buffer = 1;
> +} else
> +  mem->bo = cl_buffer_alloc(bufmgr, "CL memory object", sz, alignment);
>  #endif
>
>  if (UNLIKELY(mem->bo == NULL)) {
> @@ -756,6 +766,8 @@ _cl_mem_new_image(cl_context ctx,
>   
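
The v2 reference-counting scheme described in the quoted commit message can be modeled in a few lines of C. This is a sketch under assumptions, not Beignet's cl_mem code: the struct, fields, and functions are hypothetical. It only shows that if the image holds one reference on its parent buffer and only the buffer frees the shared bo, the bo is freed exactly once, whichever object is released first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: the buffer owns the bo; an image created from the
 * buffer shares that bo and holds one reference on the buffer. */
typedef struct mem_obj {
  int ref_n;
  bool is_image_from_buffer;
  struct mem_obj *parent; /* source buffer for an image, else NULL */
  int *bo_free_count;     /* counts frees of the shared bo */
} mem_obj;

static mem_obj *buffer_create(int *bo_free_count) {
  mem_obj *buf = calloc(1, sizeof *buf);
  assert(buf != NULL);
  buf->ref_n = 1;
  buf->bo_free_count = bo_free_count;
  return buf;
}

static mem_obj *image_create_from_buffer(mem_obj *buf) {
  mem_obj *img = calloc(1, sizeof *img);
  assert(img != NULL);
  img->ref_n = 1;
  img->is_image_from_buffer = true;
  img->parent = buf;
  img->bo_free_count = buf->bo_free_count;
  buf->ref_n++;            /* buffer ref count becomes 2, image stays at 1 */
  return img;
}

static void mem_release(mem_obj *m) {
  assert(m->ref_n > 0);
  if (--m->ref_n > 0)
    return;
  if (m->is_image_from_buffer)
    mem_release(m->parent); /* drop the image's reference on the buffer */
  else
    (*m->bo_free_count)++;  /* only the buffer frees the shared bo */
  free(m);
}
```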

Re: [Beignet] [PATCH] libocl: fix degrees function precision issue.

2015-08-06 Thread Matt Turner
On Thu, Aug 6, 2015 at 12:57 AM,   wrote:
> From: Luo Xionghu 
>
> should define and use M_180_PI_F directly instead of using 180/M_PI_F.
>
> Signed-off-by: Luo Xionghu 
> ---
>  backend/src/libocl/include/ocl_float.h | 1 +
>  backend/src/libocl/tmpl/ocl_common.tmpl.cl | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/backend/src/libocl/include/ocl_float.h 
> b/backend/src/libocl/include/ocl_float.h
> index 916233b..e63eaf9 100644
> --- a/backend/src/libocl/include/ocl_float.h
> +++ b/backend/src/libocl/include/ocl_float.h
> @@ -88,6 +88,7 @@ INLINE_OVERLOADABLE int __ocl_finitef (float x){
>  #define M_PI_4_F 0.7853981633974483F
>  #define M_1_PI_F 0.3183098861837907F
>  #define M_2_PI_F 0.6366197723675814F
> +#define M_180_PI_F   57.295779513082321F
>  #define M_2_SQRTPI_F 1.1283791670955126F
>  #define M_SQRT2_F1.4142135623730951F
>  #define M_SQRT1_2_F  0.7071067811865476F
> diff --git a/backend/src/libocl/tmpl/ocl_common.tmpl.cl 
> b/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> index 76aca2b..136fe70 100644
> --- a/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> +++ b/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> @@ -44,7 +44,7 @@ OVERLOADABLE float clamp(float v, float l, float u) {
>
>
>  OVERLOADABLE float degrees(float radians) {
> -  return (180 / M_PI_F) * radians;
> +  return M_180_PI_F * radians;

I was surprised by this, so I wrote a program to test. Indeed, 180 /
(float)M_PI is less precise:

(float)(180 / M_PI) = 57.2957763671875 (0x1.ca5dc0p+5) (0x42652ee0)
(180 / (float)M_PI) = 57.2957763671875 (0x1.ca5dc0p+5) (0x42652ee0)
57.295779513082321F = 57.29578018188476562 (0x1.ca5dc2p+5) (0x42652ee1)

A difference of one bit. :)
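
A small C program along these lines reproduces the comparison for the OpenCL expression 180 / M_PI_F, which divides in single precision. PI_D and PI_F are local stand-ins for M_PI and M_PI_F, and the bit patterns assume IEEE-754 binary32 floats. By the same reasoning, rounding the double-precision quotient 180/pi to float lands on 0x42652ee1, the same value as the new constant; it is the single-precision division that loses the bit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PI_D 3.14159265358979323846     /* stand-in for M_PI (double) */
#define PI_F 3.14159265358979323846f    /* stand-in for M_PI_F (float) */
#define M_180_PI_F 57.295779513082321F  /* constant added by the patch */

static uint32_t float_bits(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return u;
}

/* Old libocl formula: 180 / M_PI_F is evaluated in single precision. */
static float degrees_old(float radians) { return (180 / PI_F) * radians; }

/* Patched formula: multiply by the correctly rounded constant. */
static float degrees_new(float radians) { return M_180_PI_F * radians; }

/* Usage: print both results with their bit patterns. */
static void print_degrees_constants(void) {
  float old_c = degrees_old(1.0f), new_c = degrees_new(1.0f);
  printf("180 / PI_F = %.17g (0x%08x)\n", old_c, (unsigned)float_bits(old_c));
  printf("M_180_PI_F = %.17g (0x%08x)\n", new_c, (unsigned)float_bits(new_c));
}
```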


Re: [Beignet] [PATCH] fix a powr function issue in cpu compiler math

2015-07-29 Thread Matt Turner
On Sun, Jul 19, 2015 at 6:33 AM, Meng Mengmeng  wrote:
> In OpenCL spec, gentype powr(gentype x, gentype y). In the meantime,
> added edge tests for powr.
> ---
>  utests/utest_math_gen.py | 24 
>  1 file changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/utests/utest_math_gen.py b/utests/utest_math_gen.py
> index 83edcc3..24ddaa4 100755
> --- a/utests/utest_math_gen.py
> +++ b/utests/utest_math_gen.py
> @@ -467,14 +467,30 @@ static float pown(float x, int y){
>pownUtests = 
> func('pown','pown',[pown_input_type1,pown_input_type2],pown_output_type,[pown_input_values1,pown_input_values2],'16
>  * FLT_ULP', pown_cpu_func)
>
># gentype powr(gentype x, gentype y)
> -  powr_input_values1 = [80, -80, 3.14, -3.14, 0.5, 1, -1, 
> 0.0,6,1500.24,-1500.24]
> -  powr_input_values2 = [5,6,7,8,10,11,12,13,14,0,12]
> +  powr_input_values1 = [80,-80,3.14,1, 1.257,+0, -0,+0,-0,  +0, -0,  +1, 
> +1,  -80,  0,-0,0,-0, 
> 'INFINITY','INFINITY',+1,+1,0,2.5,'NAN','NAN','NAN' ]
> +  powr_input_values2 = [5.5,6,7, +0,-0,-1,-15.67,'-INFINITY', '-INFINITY',1, 
>  -2.7,10.5, 3.1415,3.5,-0,-0,0,0,   0,   
> -0,'INFINITY','-INFINITY','NAN','NAN',-1.5,0,1.5]
>powr_input_type1 = ['float','float2','float4','float8','float16']
>powr_input_type2 = ['float','float2','float4','float8','float16']
>powr_output_type = ['float','float2','float4','float8','float16']
>powr_cpu_func='''
> -static float powr(float x, int y){
> -if (x<0)
> +static float powr(float x, float y){
> +if (((x > 0) && (x != +INFINITY)) &&((y == -0) || (y == -0)))

Space after &&. I think you meant (y == +0) here, but I have more
comments below about this.

> +return 1;
> +else if (((x == +0) || (x == -0)) && ((y <0) || (y == -INFINITY)))

Space after <

> +return +INFINITY;
> +else if (((x == +0) || (x == -0)) && (y > 0))
> +return +0;
> +else if (((x == +0) || (x == -0)) && ((y == +0) || (y == -0)))
> +return NAN;
> +else if ((x == +1) && ((y == +INFINITY) || (y == -INFINITY)))
> +return NAN;
> +else if ((x == +1) && ((y != +INFINITY) && (y != -INFINITY)))
> +return 1;
> +else if ((x == +INFINITY) && ((y == +0) || (y == -0)))

This pattern of (y == +0) || (y == -0) is meaningless for a few reasons:

Float == comparison against 0.0f is true if the float is positive or
negative 0.0f. There's no need to test for +0.0f and -0.0f separately.

Also, the literals you've used ("+0", "-0") are integers which are
implicitly promoted to float, and since there isn't a negative-zero
integer representation, they both evaluate to (y == 0.0f)... which as
I said already handles both positive and negative zero.

The code should simply be (y == 0.0f). (Note that y == 0.0 would
implicitly promote y to double, since 0.0 without the f suffix is a
double-precision literal.)

> +return NAN;
> +else if (isnan(x) || (x < 0))
> +return NAN;
> +else if ((x >=  0) && (isnan(y)))
>  return NAN;
>  else
>  return powf(x,y);
> --
> 1.9.1


Re: [Beignet] [PATCH 6/8] Backend: Add half float ASM output support.

2015-05-21 Thread Matt Turner
On Thu, May 21, 2015 at 1:25 AM,   wrote:
> From: Junyan He 
>
> Signed-off-by: Junyan He 
> ---
>  backend/src/backend/gen/gen_mesa_disasm.c | 83 
> +--
>  1 file changed, 78 insertions(+), 5 deletions(-)
>
> diff --git a/backend/src/backend/gen/gen_mesa_disasm.c 
> b/backend/src/backend/gen/gen_mesa_disasm.c
> index 705f5e2..a8a3aa0 100644
> --- a/backend/src/backend/gen/gen_mesa_disasm.c
> +++ b/backend/src/backend/gen/gen_mesa_disasm.c
> @@ -257,7 +257,7 @@ static const char *access_mode[2] = {
>[1] = "align16",
>  };
>
> -static const char *reg_encoding[10] = {
> +static const char *reg_encoding[11] = {
>[0] = ":UD",
>[1] = ":D",
>[2] = ":UW",
> @@ -267,10 +267,11 @@ static const char *reg_encoding[10] = {
>[6] = ":DF",
>[7] = ":F",
>[8] = ":UQ",
> -  [9] = ":Q"
> +  [9] = ":Q",
> +  [10] = ":HF"
>  };
>
> -int reg_type_size[10] = {
> +int reg_type_size[11] = {
>[0] = 4,
>[1] = 4,
>[2] = 2,
> @@ -280,7 +281,8 @@ int reg_type_size[10] = {
>[6] = 8,
>[7] = 4,
>[8] = 8,
> -  [9] = 8
> +  [9] = 8,
> +  [10] = 2,
>  };
>
>  static const char *reg_file[4] = {
> @@ -463,6 +465,17 @@ static int gen_version;
>  bits;   \
>})
>
> +#define GEN_BITS_FIELD_WITH_TYPE(inst, gen, TYPE)   \
> +  ({\
> +TYPE bits;  \
> +if (gen_version < 80)   \
> +  bits = ((const union Gen7NativeInstruction *)inst)->gen; \
> +else\
> +  bits = ((const union Gen8NativeInstruction *)inst)->gen; \
> +bits;   \
> +  })
> +
> +
>  #define GEN_BITS_FIELD2(inst, gen7, gen8)   \
>({\
>  int bits;   \
> @@ -954,6 +967,57 @@ static int src2_3src(FILE *file, const void* inst)
>return err;
>  }
>
> +static uint32_t __conv_half_to_float(uint16_t h)
> +{
> +  struct __FP32 {
> +uint32_t mantissa:23;
> +uint32_t exponent:8;
> +uint32_t sign:1;
> +  };
> +  struct __FP16 {
> +uint32_t mantissa:10;
> +uint32_t exponent:5;
> +uint32_t sign:1;
> +  };
> +  uint32_t f;
> +  struct __FP32 o;
> +  memset(&o, 0, sizeof(o));
> +  struct __FP16 i;
> +  memcpy(&i, &h, sizeof(uint16_t));
> +
> +  if (i.exponent == 0 && i.mantissa == 0) // (Signed) zero
> +o.sign = i.sign;
> +  else {
> +if (i.exponent == 0) { // Denormal (converts to normalized)
> +  // Adjust mantissa so it's normalized (and keep
> +  // track of exponent adjustment)
> +  int e = -1;
> +  uint m = i.mantissa;
> +  do {
> +e++;
> +m <<= 1;
> +  } while ((m & 0x400) == 0);
> +
> +  o.mantissa = (m & 0x3ff) << 13;
> +  o.exponent = 127 - 15 - e;
> +  o.sign = i.sign;
> +} else if (i.exponent == 0x1f) { // Inf/NaN
> +  // NOTE: Both can be handled with same code path
> +  // since we just pass through mantissa bits.
> +  o.mantissa = i.mantissa << 13;
> +  o.exponent = 255;
> +  o.sign = i.sign;
> +} else { // Normalized number
> +  o.mantissa = i.mantissa << 13;
> +  o.exponent = 127 - 15 + i.exponent;
> +  o.sign = i.sign;
> +}
> +  }

Using the F16C intrinsics here might really be worth it, at least for
reducing the amount of code. See the f16cintrin.h header shipped with
GCC.
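
For comparison, here is a compact sketch of the same conversion. When the compiler targets F16C hardware (for example with -mf16c, which defines __F16C__), the _cvtsh_ss() intrinsic does it in one conversion instruction; otherwise a short bit-manipulation fallback covers the same cases as the quoted patch (zero, denormal, Inf/NaN, normal). It assumes IEEE-754 binary16/binary32 layouts, and half_to_float is a hypothetical name.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>
#if defined(__F16C__)
#include <immintrin.h>
#endif

/* Convert an IEEE-754 binary16 bit pattern to float. */
static float half_to_float(uint16_t h) {
#if defined(__F16C__)
  return _cvtsh_ss(h);                  /* hardware half -> float conversion */
#else
  uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1fu;
  uint32_t mant = h & 0x3ffu;
  uint32_t bits;
  if (exp == 0x1f) {                    /* Inf/NaN: pass mantissa through */
    bits = sign | 0x7f800000u | (mant << 13);
  } else if (exp != 0) {                /* normal: rebias exponent 15 -> 127 */
    bits = sign | ((exp + 112u) << 23) | (mant << 13);
  } else if (mant == 0) {               /* signed zero */
    bits = sign;
  } else {                              /* denormal: normalize the mantissa */
    uint32_t e = 113;
    while ((mant & 0x400u) == 0) {
      mant <<= 1;
      e--;
    }
    bits = sign | (e << 23) | ((mant & 0x3ffu) << 13);
  }
  float f;
  memcpy(&f, &bits, sizeof f);
  return f;
#endif
}
```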


Re: [Beignet] [PATCH 1/8] Backend: Add half float as a new type.

2015-05-21 Thread Matt Turner
On Thu, May 21, 2015 at 1:25 AM,   wrote:
> From: Junyan He 
>
> Because the CPU of X86 does not support half float
> instructions, there is no support for half float operations.
> So we introduce the half class to handle the operations for
> half float using llvm's APFloat utility.

Ivybridge and newer have the F16C instruction set
(http://en.wikipedia.org/wiki/F16C) which offers instructions to
convert half-precision <-> single-precision floats.

I don't know if it's valuable to use it, but it's there.
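Because a generic x86 build cannot assume Ivybridge or newer, using F16C would probably need a runtime check. Here is a minimal sketch, assuming GCC or Clang and their <cpuid.h> (`__get_cpuid`, `bit_F16C`, `bit_OSXSAVE`). The name `cpu_has_f16c` is hypothetical. F16C is reported in CPUID leaf 1, ECX bit 29. Because its instructions are VEX-encoded, the sketch also checks through XGETBV that the OS has enabled AVX register state.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* True if the CPU reports F16C and the OS saves XMM/YMM state. */
static bool cpu_has_f16c(void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    return false;
  if (!(ecx & bit_F16C) || !(ecx & bit_OSXSAVE))
    return false;
  unsigned int xcr0_lo, xcr0_hi;
  __asm__ volatile ("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
  (void)xcr0_hi;
  return (xcr0_lo & 0x6u) == 0x6u;   /* XMM and YMM state enabled */
#else
  return false;                      /* not x86: no F16C */
#endif
}
```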


Re: [Beignet] [PATCH 6/7] replace mad with llvm intrinsic.

2015-03-10 Thread Matt Turner
On Tue, Mar 10, 2015 at 6:55 PM, Song, Ruiling  wrote:
>> I'm not sure that it matters for this patch, but do we know if Gen's MAD
>> instruction is a fused-multiply-add? That is, does it not do an intermediate
>> rounding step after the multiply?
> I also have such kind of concern, so I did a simple test:
> on cpu side, I use "reference = (double)x1*(double)x2 + (double)x3;"

Some recent CPUs have FMA instructions. You should make sure you know
whether your code is compiled using FMA or not.

> And on gpu side, I use "result = mad(x1, x2, x3);"
> Then compare the result and reference, the bits are exactly the same, so I 
> think gen's MAD does not do intermediate rounding after multiply.

The intermediate rounding step will not affect many pairs of numbers
that are multiplied together. You need to make sure you're testing a
pair of numbers that are affected by the intermediate rounding step.
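A float * float product is exact in double. A reference such as (double)x1*(double)x2 + (double)x3 therefore never rounds the product and behaves like a fused multiply-add, apart from rare double-rounding cases. To make both behaviours explicit on the host, regardless of whether the compiler contracts x*y+z into an FMA instruction, here is a minimal sketch. The names `mad_unfused` and `mad_fused_ref` are hypothetical.

```c
/* Multiply-add WITH an intermediate rounding step: storing the product in a
 * volatile float rounds it to single precision and prevents the compiler
 * from contracting the expression into an FMA. */
static float mad_unfused(float x, float y, float z)
{
  volatile float product = x * y;
  return product + z;
}

/* Near-fused reference: the product is exact in double, but the add and the
 * final conversion each round, so in rare cases this can still differ from
 * a true fmaf() (double rounding). */
static float mad_fused_ref(float x, float y, float z)
{
  return (float)((double)x * (double)y + (double)z);
}
```

For x = 1 + 2^-23, y = 1.0f/3.0f and z = 2.0f/3.0f, the rounded product lands exactly halfway between 1.0 and the next float, so the unfused version rounds to 1.0 while the fused one gives 1 + 2^-23.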

I wrote a small program to find cases where fmaf(x, y, z) != x*y+z
(attached). Compile with -std=c99 -O2 -march=native -lm. I'm testing
on Haswell which has FMA.

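It shows that

fmaf(1, 0.333333, 0.666667) is 1 (0x1.000002p+0), but 1 * 0.333333 +
0.666667 is 1 (0x1p+0)

Here %g prints x = 1 + 2^-23 (the float just above 1.0) as "1"; the %a
output shows that the two results really differ.

Please test that Gen's MAD instruction produces what fmaf() produces
for x = 1.0f + 0x1p-23f (i.e. nextafterf(1.0f, 2.0f)), y = 1.0f / 3.0f,
z = 2.0f / 3.0f.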

Assuming glibc's fmaf() is correct... I'm again surprised by
floating-point numbers. :)
#include <math.h>
#include <stdio.h>

int main() {
	const float y = 1.0f / 3.0f;
	const float z = 2.0f / 3.0f;

	for (float x = 1.0f; x < 10.0f; x = nextafterf(x, 2.0f)) {
		float fma_result = fmaf(x, y, z);
		float opencoded_result = x * y + z;

		if (fma_result != opencoded_result) {
			printf("fmaf(%g, %g, %g) is %g (%a), but %g * %g + %g is %g (%a)\n",
			   x, y, z, fma_result, fma_result,
			   x, y, z, opencoded_result, opencoded_result);
			return -1;
		}
	}

	return 0;
}


Re: [Beignet] double precision support

2015-03-10 Thread Matt Turner
On Tue, Mar 10, 2015 at 2:19 AM, Zhigang Gong
 wrote:
> 2. The double support is not fully supported. For example, all the math
>functions and even the divide instruction is not supported.

You're right that the hardware doesn't natively do most of the math
operations on doubles (it doesn't even do floor/ceil/trunc!), but this
BSpec page [0] does describe using features new to Broadwell to get
IEEE-compliant fdiv and sqrt for both single-precision and
double-precision.

It uses the new INVM and RSQRTM math operations, the new MADM
instruction, and the additional accumulator registers.

It seems that INVM/RSQRTM always write the flag register (the
math.eo.f0 apparently means "early out"; it only seems to be
documented in passing on that page) so that some instructions can be
skipped when they are not necessary.
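The BSpec sequence itself is not reproduced here. As a loose illustration of the presumed idea, here is a plain-C Newton-Raphson sketch: start from a low-precision reciprocal estimate (the role INVM presumably plays) and refine it with multiply-add steps (the role MADM presumably plays). `recip_newton` is a hypothetical name, and the float division only stands in for a hardware estimate. Without fused operations and the extra accumulator precision, the result is not guaranteed to be correctly rounded.

```c
/* Refine an approximate reciprocal of d with Newton-Raphson steps:
 *   r' = r * (2 - d * r)
 * Each step roughly doubles the number of correct bits. */
static double recip_newton(double d, int steps)
{
  double r = (double)(1.0f / (float)d);   /* ~24-bit initial estimate */
  for (int i = 0; i < steps; i++)
    r = r * (2.0 - d * r);
  return r;
}
```

Two steps take a ~24-bit estimate close to full double precision, which is why a fast approximation plus a few multiply-adds is attractive in hardware.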

[0] 3D-Media-GPGPU Engine > EU Overview > ISA Introduction >
Instruction Set Reference > EUISA Instructions > math – Extended Math
Function [SNB+]


Re: [Beignet] [PATCH 6/7] replace mad with llvm intrinsic.

2015-03-10 Thread Matt Turner
On Mon, Mar 9, 2015 at 10:59 PM,   wrote:
> From: Luo Xionghu 
>
> translate native mad to llvm.fma.

I'm not sure that it matters for this patch, but do we know if Gen's
MAD instruction is a fused multiply-add? That is, does it not do an
intermediate rounding step after the multiply?


Re: [Beignet] [PATCH] libocl: refine implementation of sign().

2015-01-29 Thread Matt Turner
On Wed, Jan 28, 2015 at 11:18 PM, Ruiling Song  wrote:
> Avoid if-branching.
>
> Signed-off-by: Ruiling Song 
> ---
>  backend/src/libocl/tmpl/ocl_common.tmpl.cl |   16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/backend/src/libocl/tmpl/ocl_common.tmpl.cl 
> b/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> index db7b0d8..77bd2d3 100644
> --- a/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> +++ b/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> @@ -17,6 +17,7 @@
>   */
>  #include "ocl_common.h"
>  #include "ocl_float.h"
> +#include "ocl_relational.h"
>
>  /
>  // Common Functions
> @@ -55,11 +56,12 @@ OVERLOADABLE float smoothstep(float e0, float e1, float 
> x) {
>  }
>
>  OVERLOADABLE float sign(float x) {
> -  if(x > 0)
> -return 1;
> -  if(x < 0)
> -return -1;
> -  if(x == -0.f)
> -return -0.f;
> -  return 0.f;
> +  union {float f; unsigned u;} ieee;
> +  ieee.f = x;
> +  unsigned k = ieee.u;
> +  float r = (k&0x80000000) ? -1.0f : 1.0f;
> +  // differentiate +0.0f -0.0f
> +  float s = 0.0f * r;
> +  s = (x == 0.0f) ? s : r;
> +  return isnan(x) ? 0.0f : s;
>  }
> --
> 1.7.10.4

I don't know if the structure of Beignet allows it (I see that the
implementation is in OpenCL C rather than hardware instructions), but
Mesa implements sign() for GLSL in three instructions:

cmp.nz.f0  null    x:f     0.0:f
and        ret:ud  x:ud    0x80000000:ud
(+f0) or   ret:ud  ret:ud  0x3f800000:ud

The AND instruction extracts the sign bit, and the predicated OR
instruction ORs in the hex value of 1.0 if x is not zero.

This gives +1.0 if x > 0.0
           +0.0 if x == +0.0
           -0.0 if x == -0.0
           -1.0 if x < 0.0

And since the CMP.NZ's src1 is zero, you can move the conditional
modifier back into the instruction that generated x.

The CL spec says sign() must return 0.0 for a NaN input, which this
sequence doesn't handle, but that should just take two additional
instructions, I think:

 (I don't remember precisely... CMPN.U maybe?)
(+f0) mov  ret:f   0.0f

I think this should be a few instructions shorter than what your code
will compile to.
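For checking results on the host, here is a hedged C model of the cmp/and/or sequence plus the NaN rule. `sign_bits` is a hypothetical name. The NaN step is a plain comparison because the exact instruction for it is not recalled above. Note that NaN compares not-equal to zero, so without the last step a NaN input would produce ±1.0.

```c
#include <stdint.h>
#include <string.h>

/* Bit-level model of sign(): and / predicated or, then the NaN fix-up. */
static float sign_bits(float x)
{
  uint32_t u;
  memcpy(&u, &x, sizeof u);

  uint32_t r = u & 0x80000000u;   /* and: keep only the sign bit */
  if (x != 0.0f)                  /* cmp.nz: also true for NaN (unordered) */
    r |= 0x3f800000u;             /* predicated or: bit pattern of 1.0f */
  if (x != x)                     /* NaN: the CL spec requires 0.0 */
    r = 0;

  float f;
  memcpy(&f, &r, sizeof f);
  return f;
}
```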


Re: [Beignet] [PATCH 1/3] GBE: switch to CLANG native sampler_t.

2014-12-18 Thread Matt Turner
On Sun, Dec 14, 2014 at 5:02 PM, Zhigang Gong  wrote:
> diff --git a/backend/src/libocl/include/ocl_types.h 
> b/backend/src/libocl/include/ocl_types.h
> index 49ac907..7798ee1 100644
> --- a/backend/src/libocl/include/ocl_types.h
> +++ b/backend/src/libocl/include/ocl_types.h
> @@ -87,8 +87,8 @@ DEF(double);
>  // FIXME:
>  // This is a transitional hack to bypass the LLVM 3.3 built-in types.
>  // See the Khronos SPIR specification for handling of these types.
> -#define sampler_t __sampler_t
> -typedef const ushort __sampler_t;
> +//#define sampler_t __sampler_t
> +//typedef const ushort __sampler_t;

Did you mean to delete these lines, instead of commenting them out?

>
>  /
>  // OpenCL built-in event types
> diff --git a/backend/src/libocl/src/ocl_image.cl 
> b/backend/src/libocl/src/ocl_image.cl
> index c4ca2f8..6da8e90 100644
> --- a/backend/src/libocl/src/ocl_image.cl
> +++ b/backend/src/libocl/src/ocl_image.cl
> @@ -136,18 +136,24 @@ GEN_VALIDATE_ARRAY_INDEX(int, image1d_buffer_t)
>  // integer type surfaces correctly with CLK_ADDRESS_CLAMP and 
> CLK_FILTER_NEAREST.
>  // The work around is to use a LD message instead of normal sample message.
>  
> ///
> +
> +bool __gen_ocl_sampler_need_fix(sampler_t);
> +bool __gen_ocl_sampler_need_rounding_fix(sampler_t);
> +
>  bool __gen_sampler_need_fix(const sampler_t sampler)
>  {
> -  return (((sampler & __CLK_ADDRESS_MASK) == CLK_ADDRESS_CLAMP) &&
> -  ((sampler & __CLK_FILTER_MASK) == CLK_FILTER_NEAREST));
> +  return __gen_ocl_sampler_need_fix(sampler);
> +
> +//  return (((sampler & __CLK_ADDRESS_MASK) == CLK_ADDRESS_CLAMP) &&
> +//  ((sampler & __CLK_FILTER_MASK) == CLK_FILTER_NEAREST));

And here?

>  }
>
>  bool __gen_sampler_need_rounding_fix(const sampler_t sampler)
>  {
> -  return ((sampler & CLK_NORMALIZED_COORDS_TRUE) == 0);
> +  return __gen_ocl_sampler_need_rounding_fix(sampler);
> +//  return ((sampler & CLK_NORMALIZED_COORDS_TRUE) == 0);

And here?

Copyright date should be updated as well.


Re: [Beignet] [PATCH] GBE: disable spill register under simd16 mode.

2014-11-25 Thread Matt Turner
On Thu, Nov 20, 2014 at 8:09 PM, Zhigang Gong  wrote:
> Register spilling awlays cost much more than fallback to simd8
> which could avoid register spilling or at least reduce the spilled
> registers.

For what it's worth, we made the same decision in the i965 Mesa driver.

There has been some conjecture that a spilling SIMD16 program *could*
potentially be faster than a non-spilling SIMD8 program, but I don't
know of any hard evidence.


Re: [Beignet] [PATCH] Update LunarGLASS copyright.

2014-09-04 Thread Matt Turner
On Thu, Sep 4, 2014 at 7:25 PM, Yang Rong  wrote:
> LunarGLASS have update his copyright, so update the copyright in 
> llvm_scalarize.cpp.
>
> Signed-off-by: Yang Rong 
> ---
>  backend/src/llvm/llvm_scalarize.cpp | 44 
> -
>  1 file changed, 29 insertions(+), 15 deletions(-)
>
> diff --git a/backend/src/llvm/llvm_scalarize.cpp 
> b/backend/src/llvm/llvm_scalarize.cpp
> index 5c14012..030939a 100644
> --- a/backend/src/llvm/llvm_scalarize.cpp
> +++ b/backend/src/llvm/llvm_scalarize.cpp
> @@ -1,4 +1,4 @@
> -;/*
> +/*
>   * Copyright © 2012 Intel Corporation
>   *
>   * This library is free software; you can redistribute it and/or
> @@ -20,28 +20,42 @@
>   * \author Yang Rong 
>   *
>   * This file is derived from:
> - *  
> https://code.google.com/p/lunarglass/source/browse/trunk/Core/Passes/Transforms/Scalarize.cpp?r=605
> + *  
> https://code.google.com/p/lunarglass/source/browse/trunk/Core/Passes/Transforms/Scalarize.cpp?r=903
>   */
>
>  //===- Scalarize.cpp - Scalarize LunarGLASS IR 
> ===//
>  //
>  // LunarGLASS: An Open Modular Shader Compiler Architecture
> -// Copyright (C) 2010-2011 LunarG, Inc.
> +// Copyright (C) 2010-2014 LunarG, Inc.
>  //
> -// This program is free software; you can redistribute it and/or
> -// modify it under the terms of the GNU General Public License
> -// as published by the Free Software Foundation; version 2 of the
> -// License.
> +// Redistribution and use in source and binary forms, with or without
> +// modification, are permitted provided that the following conditions
> +// are met:
>  //
> -// This program is distributed in the hope that it will be useful,
> -// but WITHOUT ANY WARRANTY; without even the implied warranty of
> -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -// GNU General Public License for more details.
> +// Redistributions of source code must retain the above copyright
> +// notice, this list of conditions and the following disclaimer.
>  //
> -// You should have received a copy of the GNU General Public License
> -// along with this program; if not, write to the Free Software
> -// Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> -// 02110-1301, USA.
> +// Redistributions in binary form must reproduce the above
> +// copyright notice, this list of conditions and the following
> +// disclaimer in the documentation and/or other materials provided
> +// with the distribution.
> +//
> +// Neither the name of LunarG Inc. nor the names of its
> +// contributors may be used to endorse or promote products derived
> +// from this software without specific prior written permission.
> +//
> +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> +// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> +// COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> +// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
> +// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> +// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
> +// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> +// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
> +// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> +// POSSIBILITY OF SUCH DAMAGE.
>  //
>  
> //===--===//
>  //
> --

I don't know that this is sufficient. The version of the code you
imported was GPLv2. They've updated the code since then and changed
the license. I don't think you can just take the old code under the
new license, more permissive or not.

Just reimport the new BSD licensed code.