On Mon, 2017-06-26 at 10:38 -0700, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes:
>
> > On Fri, 2017-06-23 at 11:06 -0700, Francisco Jerez wrote:
> > > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes:
> > >
> > > > On Thu, 2017-06-22 at 16:25 -0700, Francisco Jerez wrote:
> > > > > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes:
> > > > >
> > > > > > Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
> > > > > > ---
> > > > > >  src/intel/compiler/brw_eu_defines.h          |   2 +
> > > > > >  src/intel/compiler/brw_shader.cpp            |   5 +
> > > > > >  src/intel/compiler/brw_vec4.cpp              |   7 ++
> > > > > >  src/intel/compiler/brw_vec4.h                |   8 ++
> > > > > >  src/intel/compiler/brw_vec4_generator.cpp    | 136 +++++++++++++++++++++++++++
> > > > > >  src/intel/compiler/brw_vec4_reg_allocate.cpp |   6 +-
> > > > > >  src/intel/compiler/brw_vec4_visitor.cpp      |  49 ++++++++++
> > > > > >  7 files changed, 212 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/src/intel/compiler/brw_eu_defines.h b/src/intel/compiler/brw_eu_defines.h
> > > > > > index 1af835d47e..3c148de0fa 100644
> > > > > > --- a/src/intel/compiler/brw_eu_defines.h
> > > > > > +++ b/src/intel/compiler/brw_eu_defines.h
> > > > > > @@ -436,6 +436,8 @@ enum opcode {
> > > > > >     VEC4_OPCODE_PICK_HIGH_32BIT,
> > > > > >     VEC4_OPCODE_SET_LOW_32BIT,
> > > > > >     VEC4_OPCODE_SET_HIGH_32BIT,
> > > > > > +   VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_LOW,
> > > > > > +   VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_HIGH,
> > > > > >
> > > > >
> > > > > What's the point of introducing two different opcodes with essentially
> > > > > the same semantics (read 32B worth of data) as the current
> > > > > SHADER_OPCODE_GEN4_SCRATCH_READ?
> > > >
> > > > Originally I had only SHADER_OPCODE_GEN4_SCRATCH_READ, but I changed it
> > > > so as not to allocate more registers than needed when doing the scratch
> > > > write of a partial DF write. Let me explain:
> > > >
> > > > When spilling, as DF instructions are both split and scalarized, we
> > > > read the existing contents in scratch memory, overwrite them with the
> > > > destination of the instruction, then emit the scratch write. Together
> > > > with the fact that I am not shuffling DF data, we only need to allocate
> > > > 1 GRF to do so, instead of 2 (if I had emitted
> > > > SHADER_OPCODE_GEN4_SCRATCH_READ), when spilling partial DF writes.
> > > >
> > >
> > > Why would you need to allocate more GRFs for
> > > SHADER_OPCODE_GEN4_SCRATCH_READ? It also only reads one register, which
> > > should be sufficient for a single scalarized instruction as long as you
> > > don't shuffle data around -- Have a look at how the FS back-end
> > > addresses this problem.
> > >
> >
> > OK
> >
> > > > > Is there any downside from using the current opcode with
> > > > > force_writemask_all? If anything it would give you better performance
> > > > > because you'd only have to set up one header (which stalls the EU
> > > > > pipeline twice), send down one message to the dataport, and avoid
> > > > > stalling to shuffle the data around in the return payload (which
> > > > > prevents your two 1OWORD messages from being pipelined at all).
> > > > >
> > > >
> > > > Sorry, I am confused here. Do you mean using
> > > > SHADER_OPCODE_GEN4_SCRATCH_READ as-is, which emits an "OWord Dual Block
> > > > Read" message (so only one message)?
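A rough CPU-side model of the partial DF write spill described above: read the existing scratch contents into one temporary, overwrite only the writemask-enabled components with the instruction's result, then write the whole slot back. This is a hypothetical sketch, not Mesa code; `spill_partial_df_write` and the `WRITEMASK_*` names are illustrative, and it assumes one vertex's dvec4 (4 x 8 bytes) fits in a single 32-byte GRF-sized slot, which is why one temporary is enough.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: one vertex's spilled dvec4 is 4 doubles = 32 bytes,
 * i.e. one GRF-sized scratch slot. */
#define DVEC4_COMPONENTS 4

/* Illustrative writemask bits, one per component (x = bit 0 ... w = bit 3). */
enum { WRITEMASK_X = 1, WRITEMASK_Y = 2, WRITEMASK_Z = 4, WRITEMASK_W = 8 };

/* Read-modify-write of a spilled slot for a writemasked DF instruction:
 * components outside the writemask keep the value read back from scratch. */
void spill_partial_df_write(double scratch_slot[DVEC4_COMPONENTS],
                            const double value[DVEC4_COMPONENTS],
                            unsigned writemask)
{
   double tmp[DVEC4_COMPONENTS];            /* the single temporary "GRF" */

   assert(writemask <= 0xfu);
   memcpy(tmp, scratch_slot, sizeof(tmp));  /* scratch read (fill) */
   for (int c = 0; c < DVEC4_COMPONENTS; c++) {
      if (writemask & (1u << c))
         tmp[c] = value[c];                 /* masked instruction result */
   }
   memcpy(scratch_slot, tmp, sizeof(tmp));  /* scratch write (spill) */
}
```

Writing {10, 20, 30, 40} with an .xy writemask over a slot holding {1, 2, 3, 4} leaves {10, 20, 3, 4}: z and w survive only because the slot was read first, independently of how the data is laid out in scratch.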
> > > >
> > > > If that's the case, then I should shuffle the destination data of the
> > > > partial DF write, change the 1-OWord block write offsets and so on...
> > >
> > > Why would you need to shuffle any spilled data? I don't think there's
> > > much of a benefit from shuffling since scratch overwrites need to read
> > > the original data for the most part anyway because of writemasking. In
> > > fact shuffling DF data is probably the reason things blow up right now
> > > whenever you have mixed DF and single-precision reads or writes to the
> > > same spilled variable, which I guess is the reason you need to look for
> > > those cases and mark them as no_spill...
> > >
> >
> > Right, I don't need to shuffle data for the scratch write.
> >
> > > > in order to save it inside scratch memory in the proper place to make
> > > > OWord Dual Block Read work. That would require some extra instructions,
> > > > but I don't know whether this would give better performance than the
> > > > current implementation or not.
> > > >
> > >
> > > I expect the most serious performance issue with the approach of this
> > > patch will be the sequence of non-pipelined single-oword reads, which
> > > means you get to pay for the EU-dataport roundtrip latency twice instead
> > > of once.
> > >
> > > > Then, why do I need force_writemask=true when emitting
> > > > SHADER_OPCODE_GEN4_SCRATCH_READ?
> > > >
> > >
> > > Because you probably don't want to shuffle data in your scratch buffer,
> > > and you don't want the dataport to apply bogus 16B channel enables to
> > > your reads and writes.
> > >
> >
> > If we save the dvec4 data of a vertex together in 32 consecutive bytes
> > of scratch memory (i.e.
> > no need for shuffling, and we use force_writemask_all as you said), then
> > we need to create a special case for IVB and for partial DF reads on
> > HSW+ when unspilling the data.
> >
> > What I am thinking now is that if the scratch write is done wisely, we
> > can write the data in the proper places for the two
> > SHADER_OPCODE_GEN4_SCRATCH_READ we use to unspill DF data: write the XY
> > components with their respective 1-OWord scratch write messages, and the
> > ZW components with other 1-OWord scratch write messages at an offset of
> > 32 bytes. Thanks to this, we don't need to touch the current unspilling
> > code (which does data shuffling), and it allows us to do unspilling on
> > IVB and partial DF reads on HSW+ without any special case.
> >
> > If we choose the no-shuffling-at-all solution, it is an improvement over
> > what I have sent in this v1, but I am leaning toward the solution in the
> > last paragraph because it re-uses existing code and simplifies the
> > changes, although we have some data shuffling overhead.
> >
> > What do you think?
> >
>
> Can't we just drop the shuffling on HSW+ too? AFAIA it has the same
> drawbacks on HSW+ as it has on IVB, so I don't see any reason for
> supporting both codepaths.
>
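The two scratch layouts under discussion can be compared by computing where each double lands. This is a hypothetical byte-offset model, not Mesa code. The helper names are illustrative. It assumes a SIMD4x2 dvec4 (two vertices, four 8-byte components each) spilled at a 32-byte-aligned `base`, and that an OWord Dual Block Read fetches one OWord for the first vertex at the message offset and one for the second vertex 16 bytes later. Layout B is one reading of the XY/ZW proposal above, not a confirmed description of it.

```c
#include <assert.h>

/* Hypothetical byte-offset model of one spilled SIMD4x2 dvec4: two vertices,
 * four 8-byte components each, 64 bytes starting at a 32-byte-aligned base. */
enum { GRF_SIZE = 32, OWORD_SIZE = 16, DF_SIZE = 8 };

/* Layout A ("no shuffling"): each vertex's dvec4 is one contiguous
 * 32-byte block, with vertex 1 following vertex 0. */
unsigned offset_contiguous(unsigned base, unsigned vertex, unsigned comp)
{
   assert(vertex < 2 && comp < 4);
   return base + vertex * GRF_SIZE + comp * DF_SIZE;
}

/* Layout B (the XY/ZW proposal, as read here): each vertex's XY pair is one
 * OWord in the first 32 bytes, its ZW pair one OWord in the next 32 bytes,
 * with vertex 1 one OWord after vertex 0 in both halves. */
unsigned offset_split_xy_zw(unsigned base, unsigned vertex, unsigned comp)
{
   assert(vertex < 2 && comp < 4);
   return base + (comp / 2) * GRF_SIZE + vertex * OWORD_SIZE
               + (comp % 2) * DF_SIZE;
}
```

Under layout B, and given the assumption above, dual block reads at `base` and `base + 32` would return {v0.xy, v1.xy} and {v0.zw, v1.zw} respectively. The existing two-read unspill path could therefore stay as it is. Under layout A, each vertex's dvec4 stays in one 32-byte block, which matches the no-shuffling option.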
OK!

Sam

> > Sam
> > > >
> > > > I can try this alternative solution if this is what you meant. It has
> > > > the advantage of simplifying the changes a lot, which is always great.
> > > >
> > > > Sam
> > > >
> > > > > >     FS_OPCODE_DDX_COARSE,
> > > > > >     FS_OPCODE_DDX_FINE,
> > > > > > diff --git a/src/intel/compiler/brw_shader.cpp b/src/intel/compiler/brw_shader.cpp
> > > > > > index 53d0742d2e..248feacbd2 100644
> > > > > > --- a/src/intel/compiler/brw_shader.cpp
> > > > > > +++ b/src/intel/compiler/brw_shader.cpp
> > > > > > @@ -296,6 +296,11 @@ brw_instruction_name(const struct gen_device_info *devinfo, enum opcode op)
> > > > > >     case FS_OPCODE_PACK:
> > > > > >        return "pack";
> > > > > >
> > > > > > +
> > > > > > +   case VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_LOW:
> > > > > > +      return "gen4_scratch_read_1word_low";
> > > > > > +   case VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_HIGH:
> > > > > > +      return "gen4_scratch_read_1word_high";
> > > > > >     case SHADER_OPCODE_GEN4_SCRATCH_READ:
> > > > > >        return "gen4_scratch_read";
> > > > > >     case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
> > > > > > diff --git a/src/intel/compiler/brw_vec4.cpp b/src/intel/compiler/brw_vec4.cpp
> > > > > > index b443effca9..b6d409eea2 100644
> > > > > > --- a/src/intel/compiler/brw_vec4.cpp
> > > > > > +++ b/src/intel/compiler/brw_vec4.cpp
> > > > > > @@ -259,6 +259,8 @@ bool
> > > > > >  vec4_instruction::can_do_writemask(const struct gen_device_info *devinfo)
> > > > > >  {
> > > > > >     switch (opcode) {
> > > > > > +   case VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_LOW:
> > > > > > +   case VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_HIGH:
> > > > > >     case SHADER_OPCODE_GEN4_SCRATCH_READ:
> > > > > >     case VEC4_OPCODE_DOUBLE_TO_F32:
> > > > > >     case VEC4_OPCODE_DOUBLE_TO_D32:
> > > > > > @@ -335,6 +337,9 @@ vec4_visitor::implied_mrf_writes(vec4_instruction *inst)
> > > > > >        return 1;
> > > > > >     case VS_OPCODE_PULL_CONSTANT_LOAD:
> > > > > >        return 2;
> > > > > > +   case VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_LOW:
> > > > > > +   case VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_HIGH:
> > > > > > +      return 1;
> > > > > >     case SHADER_OPCODE_GEN4_SCRATCH_READ:
> > > > > >        return 2;
> > > > > >     case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
> > > > > > @@ -2091,6 +2096,8 @@ get_lowered_simd_width(const struct gen_device_info *devinfo,
> > > > > >  {
> > > > > >     /* Do not split some instructions that require special handling */
> > > > > >     switch (inst->opcode) {
> > > > > > +   case VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_LOW:
> > > > > > +   case VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_HIGH:
> > > > > >     case SHADER_OPCODE_GEN4_SCRATCH_READ:
> > > > > >     case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
> > > > > >        return inst->exec_size;
> > > > > > diff --git a/src/intel/compiler/brw_vec4.h b/src/intel/compiler/brw_vec4.h
> > > > > > index d828da02ea..a5b45aca21 100644
> > > > > > --- a/src/intel/compiler/brw_vec4.h
> > > > > > +++ b/src/intel/compiler/brw_vec4.h
> > > > > > @@ -214,6 +214,9 @@ public:
> > > > > >                              enum brw_conditional_mod condition);
> > > > > >     vec4_instruction *IF(enum brw_predicate predicate);
> > > > > >     EMIT1(SCRATCH_READ)
> > > > > > +   vec4_instruction *DF_IVB_SCRATCH_READ(const dst_reg &dst, const src_reg &src0,
> > > > > > +                                         bool low);
> > > > > > +
> > > > > >     EMIT2(SCRATCH_WRITE)
> > > > > >     EMIT3(LRP)
> > > > > >     EMIT1(BFREV)
> > > > > > @@ -294,6 +297,11 @@ public:
> > > > > >                             dst_reg dst,
> > > > > >                             src_reg orig_src,
> > > > > >                             int base_offset);
> > > > > > +   void emit_1grf_df_ivb_scratch_read(bblock_t *block,
> > > > > > +                                      vec4_instruction *inst,
> > > > > > +                                      dst_reg temp, src_reg orig_src,
> > > > > > +                                      int base_offset, bool first_grf);
> > > > > > +
> > > > > >     void emit_scratch_write(bblock_t *block, vec4_instruction *inst,
> > > > > >                             int base_offset);
> > > > > >     void emit_pull_constant_load(bblock_t *block, vec4_instruction *inst,
> > > > > > diff --git a/src/intel/compiler/brw_vec4_generator.cpp b/src/intel/compiler/brw_vec4_generator.cpp
> > > > > > index 334933d15a..3bb931385a 100644
> > > > > > --- a/src/intel/compiler/brw_vec4_generator.cpp
> > > > > > +++ b/src/intel/compiler/brw_vec4_generator.cpp
> > > > > > @@ -1133,6 +1133,73 @@ generate_unpack_flags(struct brw_codegen *p,
> > > > > >  }
> > > > > >
> > > > > >  static void
> > > > > > +generate_scratch_read_1oword(struct brw_codegen *p,
> > > > > > +                             vec4_instruction *inst,
> > > > > > +                             struct brw_reg dst,
> > > > > > +                             struct brw_reg index,
> > > > > > +                             bool low)
> > > > > > +{
> > > > > > +   const struct gen_device_info *devinfo = p->devinfo;
> > > > > > +
> > > > > > +   assert(devinfo->gen >= 7 && inst->exec_size == 4 &&
> > > > > > +          type_sz(dst.type) == 8);
> > > > > > +   brw_set_default_access_mode(p, BRW_ALIGN_1);
> > > > > > +   brw_set_default_exec_size(p, BRW_EXECUTE_8);
> > > > > > +
> > > > > > +   if (!low) {
> > > > > > +      /* Read second GRF (offset in OWORDs) */
> > > > > > +      for (int i = 0; i < 2; i++) {
> > > > > > +         brw_oword_block_read_scratch(p,
> > > > > > +                                      dst,
> > > > > > +                                      brw_message_reg(inst->base_mrf),
> > > > > > +                                      1, 32*inst->offset + 16*i + 32, false, true);
> > > > > > +         if (i == 0) {
> > > > > > +            /* The scratch read message writes the 128 MSB (OWORD1 HIGH) of
> > > > > > +             * the destination. We need to move them to dst.0 so we can
> > > > > > +             * read the pending 128 bits without using a temporary register.
> > > > > > +             */
> > > > > > +            brw_set_default_exec_size(p, BRW_EXECUTE_4);
> > > > > > +            struct brw_reg tmp =
> > > > > > +               stride(suboffset(dst, 16 / type_sz(dst.type)),
> > > > > > +                      4, 4, 1);
> > > > > > +
> > > > > > +            brw_set_default_mask_control(p, true);
> > > > > > +            brw_MOV(p, dst, tmp);
> > > > > > +            brw_set_default_mask_control(p, inst->force_writemask_all);
> > > > > > +            brw_set_default_exec_size(p, BRW_EXECUTE_8);
> > > > > > +         }
> > > > > > +      }
> > > > > > +   } else {
> > > > > > +      /* Read first GRF (offset in OWORDs) */
> > > > > > +      for (int i = 1; i >= 0; i--) {
> > > > > > +         brw_oword_block_read_scratch(p,
> > > > > > +                                      dst,
> > > > > > +                                      brw_message_reg(inst->base_mrf),
> > > > > > +                                      1, 32*inst->offset + 16*i, true, false);
> > > > > > +
> > > > > > +         if (i == 1) {
> > > > > > +            /* The scratch read message writes the 128 LSB (OWORD1 LOW) of
> > > > > > +             * the destination. We need to move them to dst.4 so we can
> > > > > > +             * read the pending 128 bits without using a temporary register.
> > > > > > +             */
> > > > > > +            struct brw_reg tmp = stride(dst, 4, 4, 1);
> > > > > > +            brw_set_default_exec_size(p, BRW_EXECUTE_4);
> > > > > > +            brw_set_default_mask_control(p, true);
> > > > > > +            brw_MOV(p,
> > > > > > +                    suboffset(dst, 16 / type_sz(dst.type)),
> > > > > > +                    tmp);
> > > > > > +            brw_set_default_mask_control(p, inst->force_writemask_all);
> > > > > > +            brw_set_default_exec_size(p, BRW_EXECUTE_8);
> > > > > > +         }
> > > > > > +      }
> > > > > > +   }
> > > > > > +
> > > > > > +   brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
> > > > > > +   brw_set_default_access_mode(p, BRW_ALIGN_16);
> > > > > > +   return;
> > > > > > +}
> > > > > > +
> > > > > > +static void
> > > > > >  generate_scratch_read(struct brw_codegen *p,
> > > > > >                        vec4_instruction *inst,
> > > > > >                        struct brw_reg dst,
> > > > > > @@ -1143,6 +1210,16 @@ generate_scratch_read(struct brw_codegen *p,
> > > > > >
> > > > > >     gen6_resolve_implied_move(p, &header, inst->base_mrf);
> > > > > >
> > > > > > +   if (devinfo->gen >= 7 && inst->exec_size == 4 &&
> > > > > > +       type_sz(dst.type) == 8) {
> > > > > > +      /* First read second GRF (offset in OWORDs) */
> > > > > > +      struct brw_reg dst_high = suboffset(dst, 32 / type_sz(dst.type));
> > > > > > +      generate_scratch_read_1oword(p, inst, dst_high, index, false);
> > > > > > +      /* Now read first GRF (data from first vertex) */
> > > > > > +      generate_scratch_read_1oword(p, inst, dst, index, true);
> > > > > > +      return;
> > > > > > +   }
> > > > > > +
> > > > > >     generate_oword_dual_block_offsets(p, brw_message_reg(inst->base_mrf + 1),
> > > > > >                                       index);
> > > > > >
> > > > > > @@ -1192,6 +1269,57 @@ generate_scratch_write(struct brw_codegen *p,
> > > > > >     struct brw_reg header = brw_vec8_grf(0, 0);
> > > > > >     bool write_commit;
> > > > > >
> > > > > > +   if (devinfo->gen >= 7 && inst->exec_size == 4 &&
> > > > > > +       type_sz(src.type) == 8) {
> > > > > > +      brw_set_default_access_mode(p, BRW_ALIGN_1);
> > > > > > +
> > > > > > +      /* The messages only works with group == 0, we use the group to know which
> > > > > > +       * message emit (1-OWORD LOW or 1-OWORD HIGH).
> > > > > > +       */
> > > > > > +      brw_set_default_group(p, 0);
> > > > > > +
> > > > > > +      if (inst->group == 0) {
> > > > > > +         for (int i = 0; i < 2; i++) {
> > > > > > +            brw_set_default_exec_size(p, BRW_EXECUTE_4);
> > > > > > +            brw_set_default_mask_control(p, true);
> > > > > > +            struct brw_reg temp =
> > > > > > +               retype(suboffset(src, i * 16 / type_sz(src.type)), BRW_REGISTER_TYPE_UD);
> > > > > > +            temp = stride(temp, 4, 4, 1);
> > > > > > +
> > > > > > +            brw_MOV(p, brw_uvec_mrf(4, inst->base_mrf + 1, 0),
> > > > > > +                    temp);
> > > > > > +            brw_set_default_mask_control(p, inst->force_writemask_all);
> > > > > > +            brw_set_default_exec_size(p, BRW_EXECUTE_8);
> > > > > > +
> > > > > > +            /* Offset in OWORDs */
> > > > > > +            brw_oword_block_write_scratch(p, brw_message_reg(inst->base_mrf),
> > > > > > +                                          1, 32*inst->offset + 16*i, true, false);
> > > > > > +         }
> > > > > > +      } else {
> > > > > > +         for (int i = 0; i < 2; i++) {
> > > > > > +            brw_set_default_exec_size(p, BRW_EXECUTE_4);
> > > > > > +
> > > > > > +            brw_set_default_mask_control(p, true);
> > > > > > +            struct brw_reg temp =
> > > > > > +               retype(suboffset(src, i * 16 / type_sz(src.type)), BRW_REGISTER_TYPE_UD);
> > > > > > +            temp = stride(temp, 4, 4, 1);
> > > > > > +
> > > > > > +            brw_MOV(p, brw_uvec_mrf(4, inst->base_mrf + 1, 4),
> > > > > > +                    temp);
> > > > > > +
> > > > > > +            brw_set_default_mask_control(p, inst->force_writemask_all);
> > > > > > +            brw_set_default_exec_size(p, BRW_EXECUTE_8);
> > > > > > +
> > > > > > +            /* Offset in OWORDs */
> > > > > > +            brw_oword_block_write_scratch(p, brw_message_reg(inst->base_mrf),
> > > > > > +                                          1, 32*inst->offset + 16*i + 32, false, true);
> > > > > > +         }
> > > > > > +      }
> > > > > > +      brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
> > > > > > +      brw_set_default_access_mode(p, BRW_ALIGN_16);
> > > > > > +      return;
> > > > > > +   }
> > > > > > +
> > > > > >     /* If the instruction is predicated, we'll predicate the send, not
> > > > > >      * the header setup.
> > > > > >      */
> > > > > > @@ -1780,6 +1908,14 @@ generate_code(struct brw_codegen *p,
> > > > > >           generate_vs_urb_write(p, inst);
> > > > > >           break;
> > > > > >
> > > > > > +      case VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_LOW:
> > > > > > +         generate_scratch_read_1oword(p, inst, dst, src[0], true);
> > > > > > +         fill_count++;
> > > > > > +         break;
> > > > > > +      case VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_HIGH:
> > > > > > +         generate_scratch_read_1oword(p, inst, dst, src[0], false);
> > > > > > +         fill_count++;
> > > > > > +         break;
> > > > > >        case SHADER_OPCODE_GEN4_SCRATCH_READ:
> > > > > >           generate_scratch_read(p, inst, dst, src[0]);
> > > > > >           fill_count++;
> > > > > > diff --git a/src/intel/compiler/brw_vec4_reg_allocate.cpp b/src/intel/compiler/brw_vec4_reg_allocate.cpp
> > > > > > index a0ba77b867..ec5ba10e86 100644
> > > > > > --- a/src/intel/compiler/brw_vec4_reg_allocate.cpp
> > > > > > +++ b/src/intel/compiler/brw_vec4_reg_allocate.cpp
> > > > > > @@ -332,7 +332,9 @@ can_use_scratch_for_source(const vec4_instruction *inst, unsigned i,
> > > > > >         * reusing scratch_reg for this instruction.
> > > > > >         */
> > > > > >        if (prev_inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE ||
> > > > > > -          prev_inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ)
> > > > > > +          prev_inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ ||
> > > > > > +          prev_inst->opcode == VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_LOW ||
> > > > > > +          prev_inst->opcode == VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_HIGH)
> > > > > >           continue;
> > > > > >
> > > > > >        /* If the previous instruction does not write to scratch_reg, then check
> > > > > > @@ -467,6 +469,8 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill)
> > > > > >           loop_scale /= 10;
> > > > > >           break;
> > > > > >
> > > > > > +      case VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_LOW:
> > > > > > +      case VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_HIGH:
> > > > > >        case SHADER_OPCODE_GEN4_SCRATCH_READ:
> > > > > >        case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
> > > > > >           for (int i = 0; i < 3; i++) {
> > > > > > diff --git a/src/intel/compiler/brw_vec4_visitor.cpp b/src/intel/compiler/brw_vec4_visitor.cpp
> > > > > > index 22ee4dd1c4..37ae31c0d5 100644
> > > > > > --- a/src/intel/compiler/brw_vec4_visitor.cpp
> > > > > > +++ b/src/intel/compiler/brw_vec4_visitor.cpp
> > > > > > @@ -264,6 +264,24 @@ vec4_visitor::SCRATCH_READ(const dst_reg &dst, const src_reg &index)
> > > > > >  }
> > > > > >
> > > > > >  vec4_instruction *
> > > > > > +vec4_visitor::DF_IVB_SCRATCH_READ(const dst_reg &dst,
> > > > > > +                                  const src_reg &index,
> > > > > > +                                  bool first_grf)
> > > > > > +{
> > > > > > +   vec4_instruction *inst;
> > > > > > +   enum opcode op = first_grf ?
> > > > > > +      VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_LOW :
> > > > > > +      VEC4_OPCODE_GEN4_SCRATCH_READ_1OWORD_HIGH;
> > > > > > +
> > > > > > +   inst = new(mem_ctx) vec4_instruction(op,
> > > > > > +                                        dst, index);
> > > > > > +   inst->base_mrf = FIRST_SPILL_MRF(devinfo->gen) + 1;
> > > > > > +   inst->mlen = 1;
> > > > > > +
> > > > > > +   return inst;
> > > > > > +}
> > > > > > +
> > > > > > +vec4_instruction *
> > > > > >  vec4_visitor::SCRATCH_WRITE(const dst_reg &dst, const src_reg &src,
> > > > > >                              const src_reg &index)
> > > > > >  {
> > > > > > @@ -1472,6 +1490,37 @@ vec4_visitor::get_scratch_offset(bblock_t *block, vec4_instruction *inst,
> > > > > >
> > > > > >  /**
> > > > > >   * Emits an instruction before @inst to load the value named by @orig_src
> > > > > > + * from scratch space at @base_offset to @temp. This instruction only reads
> > > > > > + * DF value on IVB, one GRF each time.
> > > > > > + *
> > > > > > + * @base_offset is measured in 32-byte units (the size of a register).
> > > > > > + * @first_grf indicates if we want to read first vertex data (true) or
> > > > > > + * the second (false).
> > > > > > + */
> > > > > > +void
> > > > > > +vec4_visitor::emit_1grf_df_ivb_scratch_read(bblock_t *block,
> > > > > > +                                            vec4_instruction *inst,
> > > > > > +                                            dst_reg temp, src_reg orig_src,
> > > > > > +                                            int base_offset, bool first_grf)
> > > > > > +{
> > > > > > +   assert(orig_src.offset % REG_SIZE == 0);
> > > > > > +   src_reg index = get_scratch_offset(block, inst, 0, base_offset);
> > > > > > +
> > > > > > +   assert(devinfo->gen == 7 && !devinfo->is_haswell && type_sz(temp.type) == 8);
> > > > > > +   temp.offset = 0;
> > > > > > +   vec4_instruction *read = DF_IVB_SCRATCH_READ(temp, index, first_grf);
> > > > > > +   read->exec_size = 4;
> > > > > > +   /* The instruction will use group 0 but a different message depending of the
> > > > > > +    * vertex data to load.
> > > > > > +    */
> > > > > > +   read->group = 0;
> > > > > > +   read->offset = base_offset;
> > > > > > +   read->size_written = 1;
> > > > > > +   emit_before(block, inst, read);
> > > > > > +}
> > > > > > +
> > > > > > +/**
> > > > > > + * Emits an instruction before @inst to load the value named by @orig_src
> > > > > >   * from scratch space at @base_offset to @temp.
> > > > > >   *
> > > > > >   * @base_offset is measured in 32-byte units (the size of a register).
> > > > > > --
> > > > > > 2.11.0
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev