On 04/03/17 01:04, Francisco Jerez wrote: > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > >> From: "Juan A. Suarez Romero" <jasua...@igalia.com> >> >> When splitting VEC4_OPCODE_FROM_DOUBLE in Ivybridge/Baytrail, the second >> part should use a temporal register, and then move the values to the >> second half of the original destination, so we get all the results in the >> same register. >> >> v2: >> - Fix typos (Matt). >> --- >> src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 +++++++++++++---- >> src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 1 + >> 2 files changed, 14 insertions(+), 4 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >> index 64b435f3ec4..adcde085305 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >> @@ -2191,9 +2191,15 @@ vec4_visitor::lower_simd_width() >> linst->group = channel_offset; >> linst->size_written = size_written; >> >> + /* When splitting VEC4_OPCODE_FROM_DOUBLE on Ivybridge, the second >> part >> + * should use in a temporal register. Later we will move the values >> + * to the second half of the original destination, so we get all >> the >> + * results in the same register. We use d2f_pass to detect this >> case. >> + */ >> + bool d2f_pass = (inst->opcode == VEC4_OPCODE_FROM_DOUBLE && n > 0); >> /* Compute split dst region */ >> dst_reg dst; >> - if (needs_temp) { >> + if (needs_temp || d2f_pass) { >> unsigned num_regs = DIV_ROUND_UP(size_written, REG_SIZE); >> dst = retype(dst_reg(VGRF, alloc.allocate(num_regs)), >> inst->dst.type); >> @@ -2226,9 +2232,12 @@ vec4_visitor::lower_simd_width() >> /* If we used a temporary to store the result of the split >> * instruction, copy the result to the original destination >> */ >> - if (needs_temp) { >> - vec4_instruction *mov = >> - MOV(offset(inst->dst, lowered_width, n), src_reg(dst)); >> + if (needs_temp || d2f_pass) { >> + vec4_instruction *mov; >> + if (d2f_pass) >> + mov = MOV(horiz_offset(inst->dst, n * >> type_sz(inst->dst.type)), src_reg(dst)); > > I have no idea how this could possibly work... horiz_offset() expects a > number of scalar components, not bytes. Anyway I have a hunch this is > trying to workaround the bug I pointed out in PATCH 15... >
This worked by luck. We want an horizontal offset of 'lowered_width' components per split. As inst->dst.type is F, type_sz(inst->dst.type) = 4 and we set the maximum 'lowered_width' value to 4, it matches in the cases we tested... however, this is a bug. Actually it works too if we use offset() instead, so this hunk is not needed. Sam >> + else >> + mov = MOV(offset(inst->dst, lowered_width, n), src_reg(dst)); >> mov->exec_size = lowered_width; >> mov->group = channel_offset; >> mov->size_written = size_written; >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >> index 7fa1afc9073..b570792badd 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >> @@ -1532,6 +1532,7 @@ generate_code(struct brw_codegen *p, >> is_ivb_df); >> >> assert(inst->group % 8 == 0 || >> + (inst->exec_size == 4 && inst->group % 4 == 0) || >> inst->dst.type == BRW_REGISTER_TYPE_DF || >> inst->src[0].type == BRW_REGISTER_TYPE_DF || >> inst->src[1].type == BRW_REGISTER_TYPE_DF || >> -- >> 2.11.0 >> >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: OpenPGP digital signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev