[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #2 from Michel Dänzer ---
Created attachment 127204 --> https://bugs.freedesktop.org/attachment.cgi?id=127204&action=edit
Work with a local reference of so->fence

Does this patch help?

--
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: emit TA_CS_BC_BASE_ADDR on SI only if the kernel allows it
Reviewed-by: Nicolai Hähnle

On 10.10.2016 13:25, Marek Olšák wrote:
From: Marek Olšák

The kernel patch has been sent to amd-gfx.
---
 src/gallium/drivers/radeonsi/si_compute.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
index 1d1df2f..8a803c9 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -244,21 +244,26 @@ static void si_initialize_compute(struct si_context *sctx)
 	}
 
 	/* Set the pointer to border colors. */
 	bc_va = sctx->border_color_buffer->gpu_address;
 
 	if (sctx->b.chip_class >= CIK) {
 		radeon_set_uconfig_reg_seq(cs, R_030E00_TA_CS_BC_BASE_ADDR, 2);
 		radeon_emit(cs, bc_va >> 8);  /* R_030E00_TA_CS_BC_BASE_ADDR */
 		radeon_emit(cs, bc_va >> 40); /* R_030E04_TA_CS_BC_BASE_ADDR_HI */
 	} else {
-		radeon_set_config_reg(cs, R_00950C_TA_CS_BC_BASE_ADDR, bc_va >> 8);
+		if (sctx->screen->b.info.drm_major == 3 ||
+		    (sctx->screen->b.info.drm_major == 2 &&
+		     sctx->screen->b.info.drm_minor >= 48)) {
+			radeon_set_config_reg(cs, R_00950C_TA_CS_BC_BASE_ADDR,
+					      bc_va >> 8);
+		}
 	}
 
 	sctx->cs_shader_state.emitted_program = NULL;
 	sctx->cs_shader_state.initialized = true;
 }
 
 static bool si_setup_compute_scratch_buffer(struct si_context *sctx,
 					    struct si_shader *shader,
 					    struct si_shader_config *config)
 {
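The kernel-version gate above can be expressed as a small helper. The sketch below is hypothetical and not part of radeonsi. It states the patch's condition as "DRM interface version at least 2.48". Any 3.x (amdgpu) kernel qualifies, and a 2.x (radeon) kernel needs minor >= 48. One difference from the patch's explicit `drm_major == 3` test: this form would also accept a hypothetical 4.x.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a radeonsi function): true when the kernel's DRM
 * interface version is at least want_major.want_minor. */
static bool drm_version_at_least(int drm_major, int drm_minor,
                                 int want_major, int want_minor)
{
   if (drm_major != want_major)
      return drm_major > want_major;
   return drm_minor >= want_minor;
}

/* The patch's condition for emitting R_00950C_TA_CS_BC_BASE_ADDR on SI:
 * DRM 3.x always, DRM 2.x only from minor 48 on. */
static bool si_can_set_ta_cs_bc_base_addr(int drm_major, int drm_minor)
{
   return drm_version_at_least(drm_major, drm_minor, 2, 48);
}
```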
[Mesa-dev] [PATCH v2 014/103] i965/disasm: align16 DF source regions have a width of 2
Reviewed-by: Francisco Jerez
---
 src/mesa/drivers/dri/i965/brw_disasm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c
index 5e51be7..1d2a4d2 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -942,7 +942,10 @@ src_da16(FILE *file,
       format(file, ".%d", 16 / reg_type_size[_reg_type]);
    string(file, "<");
    err |= control(file, "vert stride", vert_stride, _vert_stride, NULL);
-   string(file, ",4,1>");
+   if (reg_type_size[_reg_type] == 8)
+      string(file, ",2,1>");
+   else
+      string(file, ",4,1>");
    err |= src_swizzle(file, BRW_SWIZZLE4(swz_x, swz_y, swz_z, swz_w));
    err |= control(file, "src da16 reg type", reg_encoding, _reg_type, NULL);
    return err;
-- 
2.7.4
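Here is a simplified model of the region suffix that src_da16() prints after this change. It is a sketch only. The real disassembler prints the vertical stride through control() from its hardware encoding. Here `vstride` is a plain number, and `format_da16_region` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: Align16 source regions print as <vstride,width,1>,
 * where 8-byte (DF) operands use a width of 2 and 32-bit types a width of 4. */
static void format_da16_region(char *buf, size_t len,
                               unsigned vstride, unsigned type_size)
{
   unsigned width = (type_size == 8) ? 2 : 4;
   snprintf(buf, len, "<%u,%u,1>", vstride, width);
}
```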
[Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0
It's been some time since we sent the first version of the patches, so here is a v2, which adds:

1. Feedback from Curro to v1. I think the only thing missing is the suggestion to change the semantics of the offset() helper in vec4 to match those in the scalar backend. I sent this as a separate series [1] that is still awaiting review. Once that is good to land we should adapt this series accordingly.

2. Adaptations to the sub-register offsets work done by Curro in master.

3. Some rudimentary support for 64-bit spilling. This is quite limited at the moment, since it skips spilling of fp64 data in a number of cases where it is not safe to do it at present. I guess we can look for ways to improve this going forward, but I'd rather do that after we land the bulk of fp64, since the series is already quite big as it is.

4. Avoid scalarizing a number of swizzle combinations that we can support natively.

5. Many other small clean-ups and fixes.

The series is available for testing in the 'i965-fp64-gen7-scalar-vec4-rc2' branch of our github repository [2].

This series implements the bulk of the fp64 align16 backend support and creates the infrastructure to implement 64-bit vertex attributes as well, so once this lands in master we plan to send additional series that add VA64 for Haswell, and then fp64 and VA64 for IvyBridge.
[1] https://lists.freedesktop.org/archives/mesa-dev/2016-October/130459.html
[2] https://github.com/Igalia/mesa/tree/i965-fp64-gen7-scalar-vec4-rc2

Connor Abbott (6):
  i965/vec4/nir: simplify glsl_type_for_nir_alu_type()
  i965/vec4/nir: allocate two registers for dvec3/dvec4
  i965/vec4/nir: set the right type for 64-bit registers
  i965/vec4: add support for printing DF immediates
  i965: add brw_vecn_grf()
  i965/vec4: don't constant propagate 64-bit immediates

Iago Toral Quiroga (92):
  i965/vec4/nir: Add bit-size information to types
  i965/vec4/nir: support doubles in ALU operations
  i965/vec4/nir: fix emitting 64-bit immediates
  i965/vec4: add double/float conversion pseudo-opcodes
  i965/vec4: translate d2f/f2d
  i965: fix subnr overflow in suboffset()
  i965/vec4: set correct register regions for 32-bit and 64-bit
  i965/disasm: align16 DF source regions have a width of 2
  i965/vec4: We only support 32-bit integer ALU operations for now
  i965/vec4: add dst_null_df()
  i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes
  i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes
  i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW,HIGH}_32BIT
  i965/vec4: don't copy propagate vector opcodes that operate in align1 mode
  i965/vec4: implement double unpacking
  i965/vec4: implement double packing
  i965/vec4/nir: implement double comparisons
  i965/vec4: fix base offset for nir_registers with doubles
  i965/vec4: fix indentation in get_nir_src()
  i965/vec4: fix get_nir_dest() to use DF type for 64-bit destinations
  i965/vec4: make opt_vector_float ignore doubles
  i965/vec4: fix register allocation for 64-bit undef sources
  i965/vec4: Rename DF to/from F generator opcodes
  i965/vec4: add helpers for conversions to/from doubles
  i965/vec4: implement hardware workaround for align16 double to float conversion
  i965/vec4: implement d2i, d2u, i2d and u2d
  i965/vec4: implement d2b
  i965/vec4: implement fsign() for doubles
  i965/vec4: fix optimize predicate for doubles
  i965/vec4: add a helper function to create double immediates
  i965: move exec_size from fs_instruction to backend_instruction
  i965/vec4: fix size_written for doubles
  i965/vec4: fix regs_read() for doubles
  i965/vec4: use the IR's execution size
  i965/vec4: dump the instruction execution size
  i965/vec4: add a horiz_offset() helper
  i965: move the group field from fs_inst to backend_instruction.
  i965/vec4: add a SIMD lowering pass
  i965/vec4: make the generator set correct NibCtrl for SIMD4 DF instructions
  i965/vec4: dump NibCtrl for instructions with execsize != 8
  i965/disasm: print NibCtrl for instructions with execsize < 8
  i965/vec4: teach CSE about exec_size, group and doubles
  i965/vec4: teach cmod propagation about different execution sizes
  i965/vec4: split double-precision bcsel
  i965/vec4: add a scalarization pass for double-precision instructions
  i965/vec4: translate 64-bit swizzles to 32-bit
  i965/vec4: implement access to DF source components Z/W
  i965/disasm: fix subreg for dst in Align16 mode
  i965/vec4: teach register coalescing about 64-bit
  i965/vec4: fix pack_uniform_registers for doubles
  i965/vec4: fix indentation in pack_uniform_registers
  i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands
  i965/vec4/nir: do not emit 64-bit MAD
  i965/vec4: do not emit 64-bit MAD
  i965/vec4: support multiple dispatch widths and groups in the IR builder.
  i965/vec4: Add a shuffle_64bit_data helper
  i965/vec4: Fix UBO loads for 64-bit data
  i965/vec4: Fix SSBO loads for 64-bit data
  i965/vec4: Fix SSBO stores for 64-bit data
  i965/vec4: prevent co
[Mesa-dev] [PATCH 2/3] radv/winsys: Move a 'default:' to the end of case stmt
Shift this down and maintain the exact same behaviour as the current code.

Signed-off-by: Edward O'Callaghan
---
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c
index 7319a98..3f41778 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c
@@ -242,10 +242,11 @@ static unsigned radv_eg_tile_split_rev(unsigned eg_tile_split)
 	case 128: return 1;
 	case 256: return 2;
 	case 512: return 3;
-	default:
 	case 1024: return 4;
 	case 2048: return 5;
 	case 4096: return 6;
+	default:
+		return 4;
 	}
 }
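The function reads as follows after the patch. This is a sketch, not the radv source. The hunk shows only the cases from 128 bytes upward. The `case 64: return 0;` entry is an assumption, based on the visible cases following log2(bytes / 64). The point is that moving `default:` to the end keeps unknown sizes on the 1024-byte encoding, as the old fall-through did.

```c
#include <assert.h>

/* Sketch of radv_eg_tile_split_rev() after the patch; the 64-byte case is an
 * assumption, the rest matches the hunk. */
static unsigned eg_tile_split_rev(unsigned eg_tile_split)
{
   switch (eg_tile_split) {
   case 64:   return 0;
   case 128:  return 1;
   case 256:  return 2;
   case 512:  return 3;
   case 1024: return 4;
   case 2048: return 5;
   case 4096: return 6;
   default:
      /* Same result the old 'default:' got by falling into 'case 1024:'. */
      return 4;
   }
}
```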
[Mesa-dev] [PATCH v2 017/103] i965/vec4: add VEC4_OPCODE_PICK_{LOW, HIGH}_32BIT opcodes
These opcodes will pick the low/high 32-bit in each 64-bit data element using Align1 mode. We will use this, for example, to do things like unpackDouble2x32.

We use Align1 mode because in order to implement this in Align16 mode we would need to use 32-bit logical swizzles (XZ for low, YW for high), but the IR works in terms of 64-bit logical swizzles for DF operands all the way up to codegen.

v2:
 - use suboffset() instead of get_element_ud()
 - no need to set the width on the dst
---
 src/mesa/drivers/dri/i965/brw_defines.h          |  2 ++
 src/mesa/drivers/dri/i965/brw_shader.cpp         |  4
 src/mesa/drivers/dri/i965/brw_vec4.cpp           |  4
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 25
 4 files changed, 35 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 79b96a4..8ffb50c 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1100,6 +1100,8 @@ enum opcode {
    VEC4_OPCODE_UNPACK_UNIFORM,
    VEC4_OPCODE_DOUBLE_TO_FLOAT,
    VEC4_OPCODE_FLOAT_TO_DOUBLE,
+   VEC4_OPCODE_PICK_LOW_32BIT,
+   VEC4_OPCODE_PICK_HIGH_32BIT,
 
    FS_OPCODE_DDX_COARSE,
    FS_OPCODE_DDX_FINE,
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index b063f77..b2f3a56 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -321,6 +321,10 @@ brw_instruction_name(const struct gen_device_info *devinfo, enum opcode op)
       return "double_to_float";
    case VEC4_OPCODE_FLOAT_TO_DOUBLE:
       return "float_to_double";
+   case VEC4_OPCODE_PICK_LOW_32BIT:
+      return "pick_low_32bit";
+   case VEC4_OPCODE_PICK_HIGH_32BIT:
+      return "pick_high_32bit";
 
    case FS_OPCODE_DDX_COARSE:
       return "ddx_coarse";
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 40f8702..4fd04f1 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -255,6 +255,8 @@ vec4_instruction::can_do_writemask(const struct gen_device_info *devinfo)
    case SHADER_OPCODE_GEN4_SCRATCH_READ:
    case VEC4_OPCODE_DOUBLE_TO_FLOAT:
    case VEC4_OPCODE_FLOAT_TO_DOUBLE:
+   case VEC4_OPCODE_PICK_LOW_32BIT:
+   case VEC4_OPCODE_PICK_HIGH_32BIT:
    case VS_OPCODE_PULL_CONSTANT_LOAD:
    case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7:
    case VS_OPCODE_SET_SIMD4X2_HEADER_GEN9:
@@ -510,6 +512,8 @@ vec4_visitor::opt_reduce_swizzle()
 
       case VEC4_OPCODE_FLOAT_TO_DOUBLE:
       case VEC4_OPCODE_DOUBLE_TO_FLOAT:
+      case VEC4_OPCODE_PICK_LOW_32BIT:
+      case VEC4_OPCODE_PICK_HIGH_32BIT:
          swizzle = brw_swizzle_for_size(4);
          break;
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 6f4c438..b8778c4 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1940,6 +1940,31 @@ generate_code(struct brw_codegen *p,
          break;
       }
 
+      case VEC4_OPCODE_PICK_LOW_32BIT:
+      case VEC4_OPCODE_PICK_HIGH_32BIT: {
+         /* Stores the low/high 32-bit of each 64-bit element in src[0] into
+          * dst using ALIGN1 mode and a <8,4,2>:UD region on the source.
+          */
+         assert(type_sz(src[0].type) == 8);
+         assert(type_sz(dst.type) == 4);
+
+         brw_set_default_access_mode(p, BRW_ALIGN_1);
+
+         dst = retype(dst, BRW_REGISTER_TYPE_UD);
+         dst.hstride = BRW_HORIZONTAL_STRIDE_1;
+
+         src[0] = retype(src[0], BRW_REGISTER_TYPE_UD);
+         if (inst->opcode == VEC4_OPCODE_PICK_HIGH_32BIT)
+            src[0] = suboffset(src[0], 1);
+         src[0].vstride = BRW_VERTICAL_STRIDE_8;
+         src[0].width = BRW_WIDTH_4;
+         src[0].hstride = BRW_HORIZONTAL_STRIDE_2;
+         brw_MOV(p, dst, src[0]);
+
+         brw_set_default_access_mode(p, BRW_ALIGN_16);
+         break;
+      }
+
       case VEC4_OPCODE_PACK_BYTES: {
          /* Is effectively:
           *
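A CPU model helps show why a <8,4,2>:UD source region at subregister 0 picks the low dwords, and why subregister 1 picks the high ones. This is an illustrative sketch, not driver code. The names `read_region_ud` and `pick_32bit` are hypothetical. It assumes a little-endian layout (low dword first), an execution size of 8, and a source spanning two GRFs (8 qwords, 16 dwords).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of an Align1 <vstride,width,hstride> read: channel i comes from row
 * i / width and column i % width, offsets counted in 32-bit elements from
 * subregister 'base'. */
static void read_region_ud(const uint32_t *reg, unsigned base,
                           unsigned vstride, unsigned width, unsigned hstride,
                           unsigned exec_size, uint32_t *out)
{
   for (unsigned i = 0; i < exec_size; i++) {
      unsigned row = i / width, col = i % width;
      out[i] = reg[base + row * vstride + col * hstride];
   }
}

/* PICK_LOW_32BIT / PICK_HIGH_32BIT: view 8 qwords as 16 dwords and read them
 * with <8,4,2>, starting at subregister 0 (low) or 1 (high). */
static void pick_32bit(const uint64_t src[8], int high, uint32_t dst[8])
{
   uint32_t ud[16];
   memcpy(ud, src, sizeof(ud));  /* little-endian host assumed */
   read_region_ud(ud, high ? 1 : 0, 8, 4, 2, 8, dst);
}
```

With hstride 2, channel i always lands on dword 2*i (plus the starting subregister), so each channel reads exactly one half of qword i.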
[Mesa-dev] [PATCH 1/3] radv/winsys: Trivial style and readability fixups
Drop/add a few newlines where appropriate and drop a couple of unnecessary braces.

Signed-off-by: Edward O'Callaghan
---
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c      | 16 ++--
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.h      |  2 +-
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c |  8
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c  |  3 ++-
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c
index dedc778..330b59b 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c
@@ -84,6 +84,7 @@ static bool radv_amdgpu_fence_wait(struct radeon_winsys *_ws,
 	unsigned flags = absolute ? AMDGPU_QUERY_FENCE_TIMEOUT_IS_ABSOLUTE : 0;
 	int r;
 	uint32_t expired = 0;
+
 	/* Now use the libdrm query. */
 	r = amdgpu_cs_query_fence_status(fence,
 					 timeout,
@@ -95,16 +96,16 @@ static bool radv_amdgpu_fence_wait(struct radeon_winsys *_ws,
 		return false;
 	}
 
-	if (expired) {
+	if (expired)
 		return true;
-	}
-	return false;
 
+	return false;
 }
 
 static void radv_amdgpu_cs_destroy(struct radeon_winsys_cs *rcs)
 {
 	struct radv_amdgpu_cs *cs = radv_amdgpu_cs(rcs);
+
 	if (cs->ib_buffer)
 		cs->ws->base.buffer_destroy(cs->ib_buffer);
 	else
@@ -112,6 +113,7 @@ static void radv_amdgpu_cs_destroy(struct radeon_winsys_cs *rcs)
 
 	for (unsigned i = 0; i < cs->num_old_ib_buffers; ++i)
 		cs->ws->base.buffer_destroy(cs->old_ib_buffers[i]);
+
 	free(cs->old_ib_buffers);
 	free(cs->handles);
 	free(cs->priorities);
@@ -121,9 +123,9 @@ static void radv_amdgpu_cs_destroy(struct radeon_winsys_cs *rcs)
 static boolean radv_amdgpu_init_cs(struct radv_amdgpu_cs *cs,
 				   enum ring_type ring_type)
 {
-	for (int i = 0; i < ARRAY_SIZE(cs->buffer_hash_table); ++i) {
+	for (int i = 0; i < ARRAY_SIZE(cs->buffer_hash_table); ++i)
 		cs->buffer_hash_table[i] = -1;
-	}
+
 	return true;
 }
 
@@ -297,7 +299,7 @@ static int radv_amdgpu_cs_find_buffer(struct radv_amdgpu_cs *cs,
 	if (index == -1)
 		return -1;
 
-	if(cs->handles[index] == bo)
+	if (cs->handles[index] == bo)
 		return index;
 
 	for (unsigned i = 0; i < cs->num_buffers; ++i) {
@@ -306,6 +308,7 @@ static int radv_amdgpu_cs_find_buffer(struct radv_amdgpu_cs *cs,
 			return i;
 		}
 	}
+
 	return -1;
 }
 
@@ -455,6 +458,7 @@ static int radv_amdgpu_create_bo_list(struct radv_amdgpu_winsys *ws,
 		free(handles);
 		free(priorities);
 	}
+
 	return r;
 }
 
diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.h b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.h
index b4482fc..affee95 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.h
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.h
@@ -36,8 +36,8 @@ #include
 #include
 
 #include "radv_radeon_winsys.h"
-
 #include "radv_amdgpu_winsys.h"
+
 struct radv_amdgpu_ctx {
 	struct radv_amdgpu_winsys *ws;
 	amdgpu_context_handle ctx;
diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c
index a3c2411..31927ec 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c
@@ -27,12 +27,14 @@
  */
 
 #include
+
 #include "radv_private.h"
 #include "addrlib/addrinterface.h"
 #include "util/bitset.h"
 #include "radv_amdgpu_winsys.h"
 #include "radv_amdgpu_surface.h"
 #include "sid.h"
+
 #ifndef NO_ENTRIES
 #define NO_ENTRIES 32
 #endif
@@ -194,9 +196,8 @@ static int radv_compute_level(ADDR_HANDLE addrlib,
 
 	ret = AddrComputeSurfaceInfo(addrlib, AddrSurfInfoIn, AddrSurfInfoOut);
-	if (ret != ADDR_OK) {
+	if (ret != ADDR_OK)
 		return ret;
-	}
 
 	surf_level = is_stencil ? &surf->stencil_level[level] : &surf->level[level];
 	surf_level->offset = align64(surf->bo_size, AddrSurfInfoOut->baseAlign);
@@ -340,8 +341,7 @@ static int radv_amdgpu_winsys_surface_init(struct radeon_winsys *_ws,
 		default:
 			assert(0);
 		}
-	}
-	else {
+	} else {
 		AddrDccIn.bpp = AddrSurfInfoIn.bpp = surf->bpe * 8;
 	}
 
diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
index 9450536..0ce44ac 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
@@ -37,6 +37,7 @@
 #include "radv_amdgpu_cs.h"
 #include "radv_a
[Mesa-dev] [PATCH v2 013/103] i965/vec4: set correct register regions for 32-bit and 64-bit
For 32-bit instructions we want to use <4,4,1> regions for VGRF sources so we should really set a width of 4 (we were setting 8).

For 64-bit instructions we want to use a width of 2 because the hardware uses 32-bit swizzles, meaning that we can only address 2 consecutive 64-bit components in a row. Also, Curro suggested that the hardware is probably fixing the width to 2 for 64-bit instructions anyway, so just go with that and use <2,2,1>.

v2:
 - No need to explicitly set the vertical stride of 64-bit regions to 2, brw_vecn_grf with a width of 2 will do that for us.
 - No need to adjust the width of dst registers.

Signed-off-by: Connor Abbott
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 32c04b2..40f8702 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1873,20 +1873,24 @@ vec4_visitor::convert_to_hw_regs()
          struct src_reg &src = inst->src[i];
          struct brw_reg reg;
          switch (src.file) {
-         case VGRF:
-            reg = byte_offset(brw_vec8_grf(src.nr, 0), src.offset);
+         case VGRF: {
+            unsigned type_size = type_sz(src.type);
+            unsigned width = REG_SIZE / 2 / MAX2(4, type_size);
+            reg = byte_offset(brw_vecn_grf(width, src.nr, 0), src.offset);
             reg.type = src.type;
             reg.swizzle = src.swizzle;
             reg.abs = src.abs;
             reg.negate = src.negate;
             break;
+         }
 
-         case UNIFORM:
+         case UNIFORM: {
+            unsigned width = REG_SIZE / 2 / MAX2(4, type_sz(src.type));
             reg = stride(byte_offset(brw_vec4_grf(
                                         prog_data->base.dispatch_grf_start_reg +
                                         src.nr / 2, src.nr % 2 * 4),
                                      src.offset),
-                         0, 4, 1);
+                         0, width, 1);
             reg.type = src.type;
             reg.swizzle = src.swizzle;
             reg.abs = src.abs;
@@ -1895,6 +1899,7 @@ vec4_visitor::convert_to_hw_regs()
             /* This should have been moved to pull constants. */
             assert(!src.reladdr);
             break;
+         }
 
          case ARF:
          case FIXED_GRF:
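The width formula `REG_SIZE / 2 / MAX2(4, type_size)` can be checked in isolation. REG_SIZE is 32 bytes per GRF. Half a register is one vec4 slot of the SIMD4x2 layout (16 bytes), so 32-bit types get width 4 (<4,4,1>) and 64-bit types get width 2 (<2,2,1>). The sketch below defines REG_SIZE and MAX2 locally to mirror Mesa's macros. The function name `vec4_src_width` is hypothetical.

```c
#include <assert.h>

#define REG_SIZE 32                          /* bytes per GRF, as in brw_reg.h */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))   /* as in Mesa's util/macros.h */

/* Source region width chosen by convert_to_hw_regs() after the patch.
 * Types narrower than 32 bits are treated as 4 bytes wide. */
static unsigned vec4_src_width(unsigned type_size)
{
   return REG_SIZE / 2 / MAX2(4, type_size);
}
```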
[Mesa-dev] Various radv fixups, style + one mem leak fix
Nothing major here, patch 3 is the only interesting one.

Edward O'Callaghan (3):
  [PATCH 1/3] radv/winsys: Trivial style and readability fixups
  [PATCH 2/3] radv/winsys: Move a 'default:' to the end of case stmt
  [PATCH 3/3] radv/winsys: Fix mem leak at failed do_winsys_init() call
[Mesa-dev] [PATCH v2 022/103] i965/vec4: implement double packing
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 2631bf3..37c3d7c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1538,6 +1538,17 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
       break;
    }
 
+   case nir_op_pack_double_2x32_split: {
+      dst_reg result = dst_reg(this, glsl_type::dvec4_type);
+      dst_reg tmp = dst_reg(this, glsl_type::uvec4_type);
+      emit(MOV(tmp, retype(op[0], BRW_REGISTER_TYPE_UD)));
+      emit(VEC4_OPCODE_SET_LOW_32BIT, result, src_reg(tmp));
+      emit(MOV(tmp, retype(op[1], BRW_REGISTER_TYPE_UD)));
+      emit(VEC4_OPCODE_SET_HIGH_32BIT, result, src_reg(tmp));
+      emit(MOV(dst, src_reg(result)));
+      break;
+   }
+
    case nir_op_unpack_double_2x32_split_x:
    case nir_op_unpack_double_2x32_split_y: {
       enum opcode oper = (instr->op == nir_op_unpack_double_2x32_split_x) ?
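Per component, the SET_LOW_32BIT / SET_HIGH_32BIT pair computes a scalar function. The reference sketch below is hypothetical and not driver code. `op[0]` supplies the low 32 bits of the double and `op[1]` the high 32 bits. The memcpy reinterprets the assembled bit pattern as a double without violating C aliasing rules.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference for nir_op_pack_double_2x32_split(lo, hi). */
static double pack_double_2x32_split(uint32_t lo, uint32_t hi)
{
   uint64_t bits = ((uint64_t)hi << 32) | lo;
   double d;
   memcpy(&d, &bits, sizeof(d));
   return d;
}
```

For example, a high word of 0x3FF00000 with a zero low word is the IEEE-754 encoding of 1.0.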
[Mesa-dev] [PATCH v2 032/103] i965/vec4: implement d2i, d2u, i2d and u2d
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 0170d21..cc10247 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1166,6 +1166,20 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
                             BRW_REGISTER_TYPE_F);
       break;
 
+   case nir_op_d2i:
+   case nir_op_d2u:
+      emit_double_to_single(dst, op[0], instr->dest.saturate,
+                            instr->op == nir_op_d2i ? BRW_REGISTER_TYPE_D :
+                                                      BRW_REGISTER_TYPE_UD);
+      break;
+
+   case nir_op_i2d:
+   case nir_op_u2d:
+      emit_single_to_double(dst, op[0], instr->dest.saturate,
+                            instr->op == nir_op_i2d ? BRW_REGISTER_TYPE_D :
+                                                      BRW_REGISTER_TYPE_UD);
+      break;
+
    case nir_op_iadd:
       assert(nir_dest_bit_size(instr->dest.dest) < 64);
    case nir_op_fadd:
[Mesa-dev] [PATCH v2 029/103] i965/vec4: Rename DF to/from F generator opcodes
The opcodes are not specific to conversions to/from float, since we need the same for conversions to/from other 32-bit types. Rename the opcodes accordingly and change the asserts to check the size of the types involved instead.
---
 src/mesa/drivers/dri/i965/brw_defines.h                 |  4 ++--
 src/mesa/drivers/dri/i965/brw_shader.cpp                |  8
 src/mesa/drivers/dri/i965/brw_vec4.cpp                  |  8
 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp |  4 ++--
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp        | 12 ++--
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp              |  6 +++---
 6 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 35d638c..b137fb4 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1098,8 +1098,8 @@ enum opcode {
    VEC4_OPCODE_MOV_BYTES,
    VEC4_OPCODE_PACK_BYTES,
    VEC4_OPCODE_UNPACK_UNIFORM,
-   VEC4_OPCODE_DOUBLE_TO_FLOAT,
-   VEC4_OPCODE_FLOAT_TO_DOUBLE,
+   VEC4_OPCODE_DOUBLE_TO_SINGLE,
+   VEC4_OPCODE_SINGLE_TO_DOUBLE,
    VEC4_OPCODE_PICK_LOW_32BIT,
    VEC4_OPCODE_PICK_HIGH_32BIT,
    VEC4_OPCODE_SET_LOW_32BIT,
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 153bd43..df43509 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -317,10 +317,10 @@ brw_instruction_name(const struct gen_device_info *devinfo, enum opcode op)
       return "pack_bytes";
    case VEC4_OPCODE_UNPACK_UNIFORM:
       return "unpack_uniform";
-   case VEC4_OPCODE_DOUBLE_TO_FLOAT:
-      return "double_to_float";
-   case VEC4_OPCODE_FLOAT_TO_DOUBLE:
-      return "float_to_double";
+   case VEC4_OPCODE_DOUBLE_TO_SINGLE:
+      return "double_to_single";
+   case VEC4_OPCODE_SINGLE_TO_DOUBLE:
+      return "single_to_double";
    case VEC4_OPCODE_PICK_LOW_32BIT:
       return "pick_low_32bit";
    case VEC4_OPCODE_PICK_HIGH_32BIT:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 675b7fc..75a8473 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -253,8 +253,8 @@ vec4_instruction::can_do_writemask(const struct gen_device_info *devinfo)
 {
    switch (opcode) {
    case SHADER_OPCODE_GEN4_SCRATCH_READ:
-   case VEC4_OPCODE_DOUBLE_TO_FLOAT:
-   case VEC4_OPCODE_FLOAT_TO_DOUBLE:
+   case VEC4_OPCODE_DOUBLE_TO_SINGLE:
+   case VEC4_OPCODE_SINGLE_TO_DOUBLE:
    case VEC4_OPCODE_PICK_LOW_32BIT:
    case VEC4_OPCODE_PICK_HIGH_32BIT:
    case VEC4_OPCODE_SET_LOW_32BIT:
@@ -513,8 +513,8 @@ vec4_visitor::opt_reduce_swizzle()
          swizzle = brw_swizzle_for_size(2);
          break;
 
-      case VEC4_OPCODE_FLOAT_TO_DOUBLE:
-      case VEC4_OPCODE_DOUBLE_TO_FLOAT:
+      case VEC4_OPCODE_SINGLE_TO_DOUBLE:
+      case VEC4_OPCODE_DOUBLE_TO_SINGLE:
       case VEC4_OPCODE_PICK_LOW_32BIT:
       case VEC4_OPCODE_PICK_HIGH_32BIT:
       case VEC4_OPCODE_SET_LOW_32BIT:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index d0045a7..49920c2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -286,8 +286,8 @@ static bool
 is_align1_opcode(unsigned opcode)
 {
    switch (opcode) {
-   case VEC4_OPCODE_DOUBLE_TO_FLOAT:
-   case VEC4_OPCODE_FLOAT_TO_DOUBLE:
+   case VEC4_OPCODE_DOUBLE_TO_SINGLE:
+   case VEC4_OPCODE_SINGLE_TO_DOUBLE:
    case VEC4_OPCODE_PICK_LOW_32BIT:
    case VEC4_OPCODE_PICK_HIGH_32BIT:
    case VEC4_OPCODE_SET_LOW_32BIT:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 120797b..4d05fcd 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1896,9 +1896,9 @@ generate_code(struct brw_codegen *p,
          break;
       }
 
-      case VEC4_OPCODE_DOUBLE_TO_FLOAT: {
-         assert(src[0].type == BRW_REGISTER_TYPE_DF);
-         assert(dst.type == BRW_REGISTER_TYPE_F);
+      case VEC4_OPCODE_DOUBLE_TO_SINGLE: {
+         assert(type_sz(src[0].type) == 8);
+         assert(type_sz(dst.type) == 4);
 
          brw_set_default_access_mode(p, BRW_ALIGN_1);
 
@@ -1917,9 +1917,9 @@ generate_code(struct brw_codegen *p,
          break;
       }
 
-      case VEC4_OPCODE_FLOAT_TO_DOUBLE: {
-         assert(src[0].type == BRW_REGISTER_TYPE_F);
-         assert(dst.type == BRW_REGISTER_TYPE_DF);
+      case VEC4_OPCODE_SINGLE_TO_DOUBLE: {
+         assert(type_sz(src[0].type) == 4);
+         assert(type_sz(dst.type) == 8);
 
          brw_set_default_access_mode(p, BRW_ALIGN_1);
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 4dffd76..5
[Mesa-dev] [PATCH v2 008/103] i965/vec4: add support for printing DF immediates
From: Connor Abbott

Reviewed-by: Francisco Jerez
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 6aa9102..c29cfb5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1517,6 +1517,9 @@ vec4_visitor::dump_instruction(backend_instruction *be_inst, FILE *file)
          case BRW_REGISTER_TYPE_F:
             fprintf(file, "%fF", inst->src[i].f);
             break;
+         case BRW_REGISTER_TYPE_DF:
+            fprintf(file, "%fDF", inst->src[i].df);
+            break;
          case BRW_REGISTER_TYPE_D:
             fprintf(file, "%dD", inst->src[i].d);
             break;
[Mesa-dev] [PATCH 3/3] radv/winsys: Fix mem leak at failed do_winsys_init() call site
Probably unlikely; however, ensure we don't leak a heap allocation on the fail path.

Signed-off-by: Edward O'Callaghan
---
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
index 0ce44ac..ded5ed7 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
@@ -356,5 +356,6 @@ radv_amdgpu_winsys_create(int fd)
 	return &ws->base;
 
 fail:
+	free(ws);
 	return NULL;
 }
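The fix follows the usual C cleanup-label pattern: every path that jumps to `fail` releases what the function allocated before returning NULL. The sketch below is a minimal, self-contained illustration. `struct winsys`, `do_init()` and `winsys_create()` are hypothetical stand-ins, not the radv API. Initialization is made to fail for negative fds so both paths can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the winsys object and its init step. */
struct winsys {
   int fd;
};

static bool do_init(struct winsys *ws, int fd)
{
   ws->fd = fd;
   return fd >= 0;   /* pretend init fails for invalid fds */
}

static struct winsys *winsys_create(int fd)
{
   struct winsys *ws = calloc(1, sizeof(*ws));
   if (!ws)
      return NULL;

   if (!do_init(ws, fd))
      goto fail;

   return ws;

fail:
   free(ws);   /* the line the patch adds: don't leak ws on failure */
   return NULL;
}
```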
[Mesa-dev] [PATCH v2 007/103] i965/vec4/nir: fix emitting 64-bit immediates
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 05e7f29..ce95c8d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -352,8 +352,15 @@ vec4_visitor::get_indirect_offset(nir_intrinsic_instr *instr)
 void
 vec4_visitor::nir_emit_load_const(nir_load_const_instr *instr)
 {
-   dst_reg reg = dst_reg(VGRF, alloc.allocate(1));
-   reg.type = BRW_REGISTER_TYPE_D;
+   dst_reg reg;
+
+   if (instr->def.bit_size == 64) {
+      reg = dst_reg(VGRF, alloc.allocate(2));
+      reg.type = BRW_REGISTER_TYPE_DF;
+   } else {
+      reg = dst_reg(VGRF, alloc.allocate(1));
+      reg.type = BRW_REGISTER_TYPE_D;
+   }
 
    unsigned remaining = brw_writemask_for_size(instr->def.num_components);
 
@@ -368,13 +375,20 @@ vec4_visitor::nir_emit_load_const(nir_load_const_instr *instr)
          continue;
 
       for (unsigned j = i; j < instr->def.num_components; j++) {
-         if (instr->value.u32[i] == instr->value.u32[j]) {
+         if ((instr->def.bit_size == 32 &&
+              instr->value.u32[i] == instr->value.u32[j]) ||
+             (instr->def.bit_size == 64 &&
+              instr->value.f64[i] == instr->value.f64[j])) {
             writemask |= 1 << j;
          }
       }
 
       reg.writemask = writemask;
-      emit(MOV(reg, brw_imm_d(instr->value.i32[i])));
+      if (instr->def.bit_size == 64) {
+         emit(MOV(reg, brw_imm_df(instr->value.f64[i])));
+      } else {
+         emit(MOV(reg, brw_imm_d(instr->value.i32[i])));
+      }
 
       remaining &= ~writemask;
    }
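The load_const loop groups components that hold the same constant, so one MOV with a combined writemask writes all of them. The model below is a hypothetical sketch (`group_writemasks` is not a Mesa function), and it deliberately differs from the patch in one respect. The patch compares 64-bit values as doubles (`f64[i] == f64[j]`), and a NaN never compares equal to itself. If I read the loop correctly, a NaN component would then get a writemask without its own bit and never be written. Comparing raw bit patterns, as done here, keeps each component matching itself.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the writemask grouping for an n-component constant (1 <= n <= 4).
 * Returns the number of MOVs; masks[k] is the writemask of the k-th MOV. */
static unsigned group_writemasks(const uint64_t *bits, unsigned n,
                                 unsigned masks[4])
{
   unsigned remaining = (1u << n) - 1;  /* like brw_writemask_for_size(n) */
   unsigned count = 0;

   for (unsigned i = 0; i < n; i++) {
      if (!(remaining & (1u << i)))
         continue;

      unsigned writemask = 0;
      for (unsigned j = i; j < n; j++) {
         /* Bitwise comparison: a NaN bit pattern still matches itself. */
         if (bits[i] == bits[j])
            writemask |= 1u << j;
      }

      masks[count++] = writemask;
      remaining &= ~writemask;
   }
   return count;
}
```

For a dvec4 holding (1, 2, 1, 2), two MOVs suffice, with writemasks .xz and .yw.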
[Mesa-dev] [PATCH v2 010/103] i965/vec4: translate d2f/f2d
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 24
 1 file changed, 24 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index ce95c8d..b75337c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -,6 +,30 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
       inst = emit(MOV(dst, op[0]));
       break;
 
+   case nir_op_d2f: {
+      dst_reg temp = dst_reg(this, glsl_type::dvec4_type);
+      emit(MOV(temp, op[0]));
+
+      dst_reg temp2 = dst_reg(this, glsl_type::dvec4_type);
+      temp2 = retype(temp2, BRW_REGISTER_TYPE_F);
+      emit(VEC4_OPCODE_DOUBLE_TO_FLOAT, temp2, src_reg(temp))
+         ->size_written = 2 * REG_SIZE;
+
+      vec4_instruction *inst = emit(MOV(dst, src_reg(temp2)));
+      inst->saturate = instr->dest.saturate;
+      break;
+   }
+
+   case nir_op_f2d: {
+      dst_reg tmp_dst = dst_reg(src_reg(this, glsl_type::dvec4_type));
+      src_reg tmp_src = src_reg(this, glsl_type::vec4_type);
+      emit(MOV(dst_reg(tmp_src), retype(op[0], BRW_REGISTER_TYPE_F)));
+      emit(VEC4_OPCODE_FLOAT_TO_DOUBLE, tmp_dst, tmp_src);
+      vec4_instruction *inst = emit(MOV(dst, src_reg(tmp_dst)));
+      inst->saturate = instr->dest.saturate;
+      break;
+   }
+
    case nir_op_fadd:
       /* fall through */
    case nir_op_iadd:
[Mesa-dev] [PATCH v2 002/103] i965/vec4/nir: simplify glsl_type_for_nir_alu_type()
From: Connor Abbott

Less duplication, one less case to handle for doubles and support for sized NIR types.

v2: Fix call to get_instance by swapping rows and columns params (Iago)

Signed-off-by: Iago Toral Quiroga
Reviewed-by: Francisco Jerez
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 1d834a4..ddeff2d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1784,20 +1784,8 @@ const glsl_type *
 glsl_type_for_nir_alu_type(nir_alu_type alu_type,
                            unsigned components)
 {
-   switch (alu_type) {
-   case nir_type_float:
-      return glsl_type::vec(components);
-   case nir_type_int:
-      return glsl_type::ivec(components);
-   case nir_type_uint:
-      return glsl_type::uvec(components);
-   case nir_type_bool:
-      return glsl_type::bvec(components);
-   default:
-      return glsl_type::error_type;
-   }
-
-   return glsl_type::error_type;
+   return glsl_type::get_instance(brw_glsl_base_type_for_nir_type(alu_type),
+                                  components, 1);
 }
 
 void
[Mesa-dev] [PATCH v2 026/103] i965/vec4: fix get_nir_dest() to use DF type for 64-bit destinations
v2: Make dst_reg_for_nir_reg() handle this for nir_register since we want to have the correct type set before we call offset(). --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index c825aeb..fdd3cba 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -259,8 +259,10 @@ dst_reg_for_nir_reg(vec4_visitor *v, nir_register *nir_reg, dst_reg reg; reg = v->nir_locals[nir_reg->index]; - if (nir_reg->bit_size == 64) + if (nir_reg->bit_size == 64) { base_offset *= 2; + reg.type = BRW_REGISTER_TYPE_DF; + } reg = offset(reg, base_offset); if (indirect) { reg.reladdr = @@ -277,6 +279,8 @@ vec4_visitor::get_nir_dest(const nir_dest &dest) if (dest.is_ssa) { dst_reg dst = dst_reg(VGRF, alloc.allocate(DIV_ROUND_UP(dest.ssa.bit_size, 32))); + if (dest.ssa.bit_size == 64) + dst.type = BRW_REGISTER_TYPE_DF; nir_ssa_values[dest.ssa.index] = dst; return dst; } else { -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 003/103] i965/vec4/nir: allocate two registers for dvec3/dvec4
From: Connor Abbott v2 (Curro): - Do not special-case for a bit-size of 64, divide the bit_size by 32 instead. - Use DIV_ROUND_UP so we can handle sub-32-bit types. --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index ddeff2d..af76730 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -140,8 +140,8 @@ vec4_visitor::nir_emit_impl(nir_function_impl *impl) foreach_list_typed(nir_register, reg, node, &impl->registers) { unsigned array_elems = reg->num_array_elems == 0 ? 1 : reg->num_array_elems; - - nir_locals[reg->index] = dst_reg(VGRF, alloc.allocate(array_elems)); + unsigned num_regs = array_elems * DIV_ROUND_UP(reg->bit_size, 32); + nir_locals[reg->index] = dst_reg(VGRF, alloc.allocate(num_regs)); } nir_ssa_values = ralloc_array(mem_ctx, dst_reg, impl->ssa_alloc); @@ -270,7 +270,8 @@ dst_reg vec4_visitor::get_nir_dest(const nir_dest &dest) { if (dest.is_ssa) { - dst_reg dst = dst_reg(VGRF, alloc.allocate(1)); + dst_reg dst = + dst_reg(VGRF, alloc.allocate(DIV_ROUND_UP(dest.ssa.bit_size, 32))); nir_ssa_values[dest.ssa.index] = dst; return dst; } else { -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
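The allocation rule in this patch is "one vec4 virtual register per 32 bits of component size, times the number of array elements". The sketch below isolates that arithmetic. `vec4_regs_for_nir_value` is a hypothetical name, and the local `DIV_ROUND_UP` is assumed to behave like Mesa's macro. The sketch only counts registers and says nothing about how the backend lays data out inside them.

```cpp
#include <cassert>

// Local stand-in for Mesa's DIV_ROUND_UP macro (assumed equivalent).
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

// Registers needed for a NIR register or SSA value: one per 32 bits of
// component size (so sub-32-bit types still get one), times the array length
// (a non-array register reports num_array_elems == 0).
static unsigned
vec4_regs_for_nir_value(unsigned num_array_elems, unsigned bit_size)
{
   assert(bit_size > 0);
   unsigned array_elems = num_array_elems == 0 ? 1 : num_array_elems;
   return array_elems * DIV_ROUND_UP(bit_size, 32);
}
```

For example, a scalar-or-vector `dvec4` value (64-bit) gets two registers. A three-element array of them gets six.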
[Mesa-dev] [PATCH v2 012/103] i965: add brw_vecn_grf()
From: Connor Abbott

Reviewed-by: Francisco Jerez
---
 src/mesa/drivers/dri/i965/brw_reg.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h
index 8907c9c..1fa2595 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -722,6 +722,12 @@ brw_vec16_grf(unsigned nr, unsigned subnr)
    return brw_vec16_reg(BRW_GENERAL_REGISTER_FILE, nr, subnr);
 }
 
+static inline struct brw_reg
+brw_vecn_grf(unsigned width, unsigned nr, unsigned subnr)
+{
+   return brw_vecn_reg(width, BRW_GENERAL_REGISTER_FILE, nr, subnr);
+}
+
 static inline struct brw_reg
 brw_uw8_grf(unsigned nr, unsigned subnr)
-- 
2.7.4
[Mesa-dev] [PATCH v2 034/103] i965/vec4: implement fsign() for doubles
v2: Use a predicated MOV instead of a CMP, like we do in d2b, to skip loading a double immediate.
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 64 +++---
 1 file changed, 49 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 69f11ff..c0cb141 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1773,24 +1773,58 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
       unreachable("not reached: should have been lowered");
 
    case nir_op_fsign:
-      /* AND(val, 0x80000000) gives the sign bit.
-       *
-       * Predicated OR ORs 1.0 (0x3f800000) with the sign bit if val is not
-       * zero.
-       */
-      emit(CMP(dst_null_f(), op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ));
+      if (type_sz(op[0].type) < 8) {
+         /* AND(val, 0x80000000) gives the sign bit.
+          *
+          * Predicated OR ORs 1.0 (0x3f800000) with the sign bit if val is not
+          * zero.
+          */
+         emit(CMP(dst_null_f(), op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ));
 
-      op[0].type = BRW_REGISTER_TYPE_UD;
-      dst.type = BRW_REGISTER_TYPE_UD;
-      emit(AND(dst, op[0], brw_imm_ud(0x80000000u)));
+         op[0].type = BRW_REGISTER_TYPE_UD;
+         dst.type = BRW_REGISTER_TYPE_UD;
+         emit(AND(dst, op[0], brw_imm_ud(0x80000000u)));
 
-      inst = emit(OR(dst, src_reg(dst), brw_imm_ud(0x3f800000u)));
-      inst->predicate = BRW_PREDICATE_NORMAL;
-      dst.type = BRW_REGISTER_TYPE_F;
+         inst = emit(OR(dst, src_reg(dst), brw_imm_ud(0x3f800000u)));
+         inst->predicate = BRW_PREDICATE_NORMAL;
+         dst.type = BRW_REGISTER_TYPE_F;
+
+         if (instr->dest.saturate) {
+            inst = emit(MOV(dst, src_reg(dst)));
+            inst->saturate = true;
+         }
+      } else {
+         /* For doubles we do the same but we need to consider:
+          *
+          * - We use a predicated MOV instead of a CMP so that we can skip
+          *   loading a 0.0 immediate. We use a source modifier on the source
+          *   of the MOV so that we flush denormalized values to 0. Since we
+          *   want to compare against 0, this won't alter the result.
+          * - We need to extract the high 32-bit of each DF where the sign
+          *   is stored.
+          * - We need to produce a DF result.
+          */
+
+         /* Check for zero */
+         src_reg value = op[0];
+         value.abs = true;
+         inst = emit(MOV(dst_null_df(), value));
+         inst->conditional_mod = BRW_CONDITIONAL_NZ;
+
+         /* AND each high 32-bit channel with 0x80000000u */
+         dst_reg tmp = dst_reg(this, glsl_type::uvec4_type);
+         emit(VEC4_OPCODE_PICK_HIGH_32BIT, tmp, op[0]);
+         emit(AND(tmp, src_reg(tmp), brw_imm_ud(0x80000000u)));
+
+         /* Add 1.0 to each channel, predicated to skip the cases where the
+          * channel's value was 0
+          */
+         inst = emit(OR(tmp, src_reg(tmp), brw_imm_ud(0x3f800000u)));
+         inst->predicate = BRW_PREDICATE_NORMAL;
 
-      if (instr->dest.saturate) {
-         inst = emit(MOV(dst, src_reg(dst)));
-         inst->saturate = true;
+         /* Now convert the result from float to double */
+         emit_single_to_double(dst, src_reg(tmp), instr->dest.saturate,
+                               BRW_REGISTER_TYPE_F);
       }
       break;
 
-- 
2.7.4
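The double-precision fsign path above builds a float sign result from the high dword of each DF and then widens it. The CPU model below reproduces those steps for one channel: pick the high 32 bits, mask the sign bit, OR in 1.0f when the value is non-zero, then convert to double. `fsign_double_model` is a hypothetical name. Unlike the hardware sequence, the model does not flush denormals to zero, so a denormal input yields ±1.0 here rather than 0.0.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static double
fsign_double_model(double value)
{
   uint64_t bits;
   std::memcpy(&bits, &value, sizeof(bits));

   // PICK_HIGH_32BIT: the sign bit lives in the high dword of the DF.
   uint32_t tmp = static_cast<uint32_t>(bits >> 32);

   // AND with the float sign-bit mask.
   tmp &= 0x80000000u;

   // Predicated OR with 1.0f (0x3f800000): only for non-zero inputs.
   if (std::fabs(value) != 0.0)
      tmp |= 0x3f800000u;

   // tmp now holds a float bit pattern: +-1.0f or +-0.0f; widen to double.
   float result;
   std::memcpy(&result, &tmp, sizeof(result));
   return static_cast<double>(result);
}
```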
[Mesa-dev] [PATCH v2 018/103] i965/vec4: add VEC4_OPCODE_SET_{LOW, HIGH}_32BIT opcodes
These opcodes will set the low/high 32-bit in each 64-bit data element using Align1 mode. We will use this to implement packDouble2x32. We use Align1 mode because in order to implement this in Align16 mode we would need to use 32-bit logical swizzles (XZ for low, YW for high), but the IR works in terms of 64-bit logical swizzles for DF operands all the way up to codegen. v2: - use suboffset() instead of get_element_ud() - no need to set the width on the dst --- src/mesa/drivers/dri/i965/brw_defines.h | 2 ++ src/mesa/drivers/dri/i965/brw_shader.cpp | 4 src/mesa/drivers/dri/i965/brw_vec4.cpp | 4 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 25 4 files changed, 35 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 8ffb50c..35d638c 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -1102,6 +1102,8 @@ enum opcode { VEC4_OPCODE_FLOAT_TO_DOUBLE, VEC4_OPCODE_PICK_LOW_32BIT, VEC4_OPCODE_PICK_HIGH_32BIT, + VEC4_OPCODE_SET_LOW_32BIT, + VEC4_OPCODE_SET_HIGH_32BIT, FS_OPCODE_DDX_COARSE, FS_OPCODE_DDX_FINE, diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index b2f3a56..153bd43 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -325,6 +325,10 @@ brw_instruction_name(const struct gen_device_info *devinfo, enum opcode op) return "pick_low_32bit"; case VEC4_OPCODE_PICK_HIGH_32BIT: return "pick_high_32bit"; + case VEC4_OPCODE_SET_LOW_32BIT: + return "set_low_32bit"; + case VEC4_OPCODE_SET_HIGH_32BIT: + return "set_high_32bit"; case FS_OPCODE_DDX_COARSE: return "ddx_coarse"; diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 4fd04f1..06fa38f 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -257,6 +257,8 @@ vec4_instruction::can_do_writemask(const struct gen_device_info 
*devinfo) case VEC4_OPCODE_FLOAT_TO_DOUBLE: case VEC4_OPCODE_PICK_LOW_32BIT: case VEC4_OPCODE_PICK_HIGH_32BIT: + case VEC4_OPCODE_SET_LOW_32BIT: + case VEC4_OPCODE_SET_HIGH_32BIT: case VS_OPCODE_PULL_CONSTANT_LOAD: case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7: case VS_OPCODE_SET_SIMD4X2_HEADER_GEN9: @@ -514,6 +516,8 @@ vec4_visitor::opt_reduce_swizzle() case VEC4_OPCODE_DOUBLE_TO_FLOAT: case VEC4_OPCODE_PICK_LOW_32BIT: case VEC4_OPCODE_PICK_HIGH_32BIT: + case VEC4_OPCODE_SET_LOW_32BIT: + case VEC4_OPCODE_SET_HIGH_32BIT: swizzle = brw_swizzle_for_size(4); break; diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index b8778c4..120797b 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -1965,6 +1965,31 @@ generate_code(struct brw_codegen *p, break; } + case VEC4_OPCODE_SET_LOW_32BIT: + case VEC4_OPCODE_SET_HIGH_32BIT: { + /* Reads consecutive 32-bit elements from src[0] and writes + * them to the low/high 32-bit of each 64-bit element in dst. + */ + assert(type_sz(src[0].type) == 4); + assert(type_sz(dst.type) == 8); + + brw_set_default_access_mode(p, BRW_ALIGN_1); + + dst = retype(dst, BRW_REGISTER_TYPE_UD); + if (inst->opcode == VEC4_OPCODE_SET_HIGH_32BIT) +dst = suboffset(dst, 1); + dst.hstride = BRW_HORIZONTAL_STRIDE_2; + + src[0] = retype(src[0], BRW_REGISTER_TYPE_UD); + src[0].vstride = BRW_VERTICAL_STRIDE_4; + src[0].width = BRW_WIDTH_4; + src[0].hstride = BRW_HORIZONTAL_STRIDE_1; + brw_MOV(p, dst, src[0]); + + brw_set_default_access_mode(p, BRW_ALIGN_16); + break; + } + case VEC4_OPCODE_PACK_BYTES: { /* Is effectively: * -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
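The generator code above writes UD elements with a horizontal stride of 2. It starts at offset 0 for SET_LOW_32BIT and at `suboffset(dst, 1)` for SET_HIGH_32BIT. On a little-endian layout, that overwrites one half of each 64-bit element and leaves the other half alone.

The sketch below models the effect on plain integers and shows how two partial writes compose into packDouble2x32. All names are hypothetical, and the four elements stand for four DF channels rather than the real SIMD4x2 register layout.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using DF4 = std::array<uint64_t, 4>;
using UD4 = std::array<uint32_t, 4>;

// Overwrite the low or high dword of each 64-bit element with src[i].
static DF4
set_32bit(DF4 dst, const UD4 &src, bool high)
{
   for (int i = 0; i < 4; i++) {
      if (high)
         dst[i] = (dst[i] & 0x00000000ffffffffull) | (uint64_t(src[i]) << 32);
      else
         dst[i] = (dst[i] & 0xffffffff00000000ull) | src[i];
   }
   return dst;
}

// packDouble2x32(uvec2(lo, hi)) as a SET_LOW followed by a SET_HIGH.
static double
pack_double_2x32(uint32_t lo, uint32_t hi)
{
   DF4 d{};
   d = set_32bit(d, UD4{lo, lo, lo, lo}, false);
   d = set_32bit(d, UD4{hi, hi, hi, hi}, true);

   static_assert(sizeof(double) == sizeof(uint64_t), "DF is 64-bit");
   double result;
   std::memcpy(&result, &d[0], sizeof(result));
   return result;
}
```

Note that the second write depends on the first one surviving, because each writes only half of every element.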
[Mesa-dev] [PATCH v2 004/103] i965/vec4/nir: Add bit-size information to types
Reviewed-by: Francisco Jerez --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index af76730..5048c4e 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -325,7 +325,7 @@ src_reg vec4_visitor::get_nir_src(const nir_src &src, unsigned num_components) { /* if type is not specified, default to signed int */ - return get_nir_src(src, nir_type_int, num_components); + return get_nir_src(src, nir_type_int32, num_components); } src_reg @@ -747,7 +747,7 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) const nir_intrinsic_info *info = &nir_intrinsic_infos[instr->intrinsic]; /* Get the arguments of the atomic intrinsic. */ - src_reg offset = get_nir_src(instr->src[0], nir_type_int, + src_reg offset = get_nir_src(instr->src[0], nir_type_int32, instr->num_components); const src_reg surface = brw_imm_ud(surf_index); const src_reg src0 = (info->num_srcs >= 2 @@ -793,7 +793,7 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) * from any live channel. */ surf_index = src_reg(this, glsl_type::uint_type); - emit(ADD(dst_reg(surf_index), get_nir_src(instr->src[0], nir_type_int, + emit(ADD(dst_reg(surf_index), get_nir_src(instr->src[0], nir_type_int32, instr->num_components), brw_imm_ud(prog_data->base.binding_table.ubo_start))); surf_index = emit_uniformize(surf_index); @@ -811,7 +811,7 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) if (const_offset) { offset = brw_imm_ud(const_offset->u32[0] & ~15); } else { - offset = get_nir_src(instr->src[1], nir_type_int, 1); + offset = get_nir_src(instr->src[1], nir_type_uint32, 1); } src_reg packed_consts = src_reg(this, glsl_type::vec4_type); -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 011/103] i965: fix subnr overflow in suboffset()
---
 src/mesa/drivers/dri/i965/brw_reg.h | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h
index 3b46d27..8907c9c 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -520,14 +520,6 @@ sechalf(struct brw_reg reg)
 }
 
 static inline struct brw_reg
-suboffset(struct brw_reg reg, unsigned delta)
-{
-   reg.subnr += delta * type_sz(reg.type);
-   return reg;
-}
-
-
-static inline struct brw_reg
 offset(struct brw_reg reg, unsigned delta)
 {
    reg.nr += delta;
@@ -544,6 +536,11 @@ byte_offset(struct brw_reg reg, unsigned bytes)
    return reg;
 }
 
+static inline struct brw_reg
+suboffset(struct brw_reg reg, unsigned delta)
+{
+   return byte_offset(reg, delta * type_sz(reg.type));
+}
 
 /** Construct unsigned word[16] register */
 static inline struct brw_reg
-- 
2.7.4
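The old `suboffset()` only added to `subnr`, so an offset that crossed the end of a 32-byte register never advanced `nr`. With 8-byte DF elements, four elements already span a full register. The fixed version routes through `byte_offset()`, which carries whole registers into `nr`.

The model below assumes a simplified register struct with only `nr` and `subnr`; the struct and function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for struct brw_reg: register number plus byte offset.
struct reg_sketch {
   unsigned nr;
   unsigned subnr;
};

static const unsigned REG_SIZE = 32; // bytes per GRF

// Byte offset with carry: overflow of subnr moves into nr.
static reg_sketch
byte_offset_sketch(reg_sketch reg, unsigned bytes)
{
   unsigned newoffset = reg.nr * REG_SIZE + reg.subnr + bytes;
   reg.nr = newoffset / REG_SIZE;
   reg.subnr = newoffset % REG_SIZE;
   return reg;
}

// Element-based offset, as in the fixed suboffset().
static reg_sketch
suboffset_sketch(reg_sketch reg, unsigned delta, unsigned type_size)
{
   return byte_offset_sketch(reg, delta * type_size);
}
```

For example, moving 5 DF elements from `g2.0` lands at `g3.8` rather than at an out-of-range `subnr` of 40.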
[Mesa-dev] [PATCH v2 005/103] i965/vec4/nir: support doubles in ALU operations
This involves using the bit-size information to set the appropriate type on both the operands and the destination.

v2 (Curro)
- Don't use two temporaries (and write one of them twice) to obtain the nir_alu_type.

Reviewed-by: Francisco Jerez
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 5048c4e..0d4c8f5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1055,14 +1055,17 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
 {
    vec4_instruction *inst;
 
-   dst_reg dst = get_nir_dest(instr->dest.dest,
-                              nir_op_infos[instr->op].output_type);
+   nir_alu_type dst_type = (nir_alu_type) (nir_op_infos[instr->op].output_type |
+                                           nir_dest_bit_size(instr->dest.dest));
+   dst_reg dst = get_nir_dest(instr->dest.dest, dst_type);
    dst.writemask = instr->dest.write_mask;
 
    src_reg op[4];
    for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) {
-      op[i] = get_nir_src(instr->src[i].src,
-                          nir_op_infos[instr->op].input_types[i], 4);
+      nir_alu_type src_type = (nir_alu_type)
+         (nir_op_infos[instr->op].input_types[i] |
+          nir_src_bit_size(instr->src[i].src));
+      op[i] = get_nir_src(instr->src[i].src, src_type, 4);
       op[i].swizzle = brw_swizzle_for_nir_swizzle(instr->src[i].swizzle);
       op[i].abs = instr->src[i].abs;
       op[i].negate = instr->src[i].negate;
-- 
2.7.4
[Mesa-dev] [PATCH v2 028/103] i965/vec4: fix register allocation for 64-bit undef sources
Reviewed-by: Francisco Jerez
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index fdd3cba..4dffd76 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -2085,7 +2085,8 @@ vec4_visitor::nir_emit_texture(nir_tex_instr *instr)
 void
 vec4_visitor::nir_emit_undef(nir_ssa_undef_instr *instr)
 {
-   nir_ssa_values[instr->def.index] = dst_reg(VGRF, alloc.allocate(1));
+   nir_ssa_values[instr->def.index] =
+      dst_reg(VGRF, alloc.allocate(DIV_ROUND_UP(instr->def.bit_size, 32)));
 }
 
 }
-- 
2.7.4
[Mesa-dev] [PATCH v2 021/103] i965/vec4: implement double unpacking
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 12
 1 file changed, 12 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 04f70ef..2631bf3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1538,6 +1538,18 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
       break;
    }
 
+   case nir_op_unpack_double_2x32_split_x:
+   case nir_op_unpack_double_2x32_split_y: {
+      enum opcode oper = (instr->op == nir_op_unpack_double_2x32_split_x) ?
+         VEC4_OPCODE_PICK_LOW_32BIT : VEC4_OPCODE_PICK_HIGH_32BIT;
+      dst_reg tmp = dst_reg(this, glsl_type::dvec4_type);
+      emit(MOV(tmp, op[0]));
+      dst_reg tmp2 = dst_reg(this, glsl_type::uvec4_type);
+      emit(oper, tmp2, src_reg(tmp));
+      emit(MOV(dst, src_reg(tmp2)));
+      break;
+   }
+
    case nir_op_unpack_half_2x16:
       /* As NIR does not guarantee that we have a correct swizzle outside the
        * boundaries of a vector, and the implementation of emit_unpack_half_2x16
-- 
2.7.4
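Per channel, the two unpack opcodes return one dword of the double's bit pattern: `_split_x` gives the low 32 bits (PICK_LOW_32BIT) and `_split_y` gives the high 32 bits (PICK_HIGH_32BIT). The CPU reference below could be used to check results; `unpack_double_2x32_split` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// high == false: nir_op_unpack_double_2x32_split_x (low dword)
// high == true:  nir_op_unpack_double_2x32_split_y (high dword)
static uint32_t
unpack_double_2x32_split(double value, bool high)
{
   uint64_t bits;
   std::memcpy(&bits, &value, sizeof(bits));
   return high ? uint32_t(bits >> 32) : uint32_t(bits & 0xffffffffu);
}
```

For 1.0 (bit pattern 0x3ff0000000000000), x is 0 and y is 0x3ff00000.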
[Mesa-dev] [PATCH v2 006/103] i965/vec4/nir: set the right type for 64-bit registers
From: Connor Abbott
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 0d4c8f5..05e7f29 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -142,6 +142,9 @@ vec4_visitor::nir_emit_impl(nir_function_impl *impl)
          reg->num_array_elems == 0 ? 1 : reg->num_array_elems;
       unsigned num_regs = array_elems * DIV_ROUND_UP(reg->bit_size, 32);
       nir_locals[reg->index] = dst_reg(VGRF, alloc.allocate(num_regs));
+
+      if (reg->bit_size == 64)
+         nir_locals[reg->index].type = BRW_REGISTER_TYPE_DF;
    }
 
    nir_ssa_values = ralloc_array(mem_ctx, dst_reg, impl->ssa_alloc);
-- 
2.7.4
[Mesa-dev] [PATCH v2 035/103] i965/vec4: fix optimize predicate for doubles
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index c0cb141..088ed13 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1010,8 +1010,10 @@ vec4_visitor::optimize_predicate(nir_alu_instr *instr,
    src_reg op[2];
    assert(nir_op_infos[cmp_instr->op].num_inputs == 2);
    for (unsigned i = 0; i < 2; i++) {
-      op[i] = get_nir_src(cmp_instr->src[i].src,
-                          nir_op_infos[cmp_instr->op].input_types[i], 4);
+      nir_alu_type type = nir_op_infos[cmp_instr->op].input_types[i];
+      unsigned bit_size = nir_src_bit_size(cmp_instr->src[i].src);
+      type = (nir_alu_type) (((unsigned) type) | bit_size);
+      op[i] = get_nir_src(cmp_instr->src[i].src, type, 4);
       unsigned base_swizzle =
          brw_swizzle_for_nir_swizzle(cmp_instr->src[i].swizzle);
       op[i].swizzle = brw_compose_swizzle(size_swizzle, base_swizzle);
-- 
2.7.4
[Mesa-dev] [PATCH v2 067/103] i965/vec4: Fix SSBO stores for 64-bit data
In this case we need to shuffle the 64-bit data before we write it to memory, source from reg_offset + 1 to write components Z and W and consider that each DF channel is twice as big. --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 40 -- 1 file changed, 32 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 001a62f..60a8425 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -510,7 +510,7 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) } /* Value */ - src_reg val_reg = get_nir_src(instr->src[0], 4); + src_reg val_reg = get_nir_src(instr->src[0], BRW_REGISTER_TYPE_F, 4); /* Writemask */ unsigned write_mask = instr->const_index[0]; @@ -556,24 +556,47 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) const vec4_builder bld = vec4_builder(this).at_end() .annotate(current_annotation, base_ir); - int swizzle[4] = { 0, 0, 0, 0}; + unsigned type_slots = nir_src_bit_size(instr->src[0]) / 32; + if (type_slots == 2) { + dst_reg tmp = dst_reg(this, glsl_type::dvec4_type); + shuffle_64bit_data(tmp, retype(val_reg, tmp.type), true); + val_reg = src_reg(retype(tmp, BRW_REGISTER_TYPE_F)); + } + + uint8_t swizzle[4] = { 0, 0, 0, 0}; int num_channels = 0; unsigned skipped_channels = 0; int num_components = instr->num_components; for (int i = 0; i < num_components; i++) { + /* Read components Z/W of a dvec from the appropriate place. We will + * also have to adjust the swizzle (we do that with the '% 4' below) + */ + if (i == 2 && type_slots == 2) +val_reg = offset(val_reg, 1); + /* Check if this channel needs to be written. 
If so, record the * channel we need to take the data from in the swizzle array */ int component_mask = 1 << i; int write_test = write_mask & component_mask; - if (write_test) -swizzle[num_channels++] = i; + if (write_test) { +/* If we are writing doubles we have to write 2 channels worth of + * of data (64 bits) for each double component. + */ +swizzle[num_channels++] = (i * type_slots) % 4; +if (type_slots == 2) + swizzle[num_channels++] = (i * type_slots + 1) % 4; + } /* If we don't have to write this channel it means we have a gap in the * vector, so write the channels we accumulated until now, if any. Do - * the same if this was the last component in the vector. + * the same if this was the last component in the vector, if we have + * enough channels for a full vec4 write or if we have processed + * components XY of a dvec (since components ZW are not in the same + * SIMD register) */ - if (!write_test || i == num_components - 1) { + if (!write_test || i == num_components - 1 || num_channels == 4 || + (i == 1 && type_slots == 2)) { if (num_channels > 0) { /* We have channels to write, so update the offset we need to * write at to skip the channels we skipped, if any. @@ -607,8 +630,9 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) num_channels = 0; } -/* We did not write the current channel, so increase skipped count */ -skipped_channels++; +/* If we didn't write the channel, increase skipped count */ +if (!write_test) + skipped_channels += type_slots; } } -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
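The channel bookkeeping in the SSBO store loop is easier to follow in isolation. The sketch below re-creates only how written components are grouped into surface writes and which 32-bit swizzle channels each write takes. It also tracks whether a write reads the first register of the shuffled dvec (XY) or the second (ZW).

The write offsets and `skipped_channels` accounting of the real code are deliberately left out. `write_batch` and `group_ssbo_channels` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct write_batch {
   unsigned reg;                  // 0: components XY, 1: components ZW
   std::vector<uint8_t> swizzle;  // 32-bit channels read by this write
};

static std::vector<write_batch>
group_ssbo_channels(unsigned write_mask, int num_components, unsigned type_slots)
{
   assert(type_slots == 1 || type_slots == 2);
   std::vector<write_batch> batches;
   uint8_t swizzle[4] = {0, 0, 0, 0};
   int num_channels = 0;
   unsigned reg = 0;

   for (int i = 0; i < num_components; i++) {
      // Components Z/W of a dvec live in the next register.
      if (i == 2 && type_slots == 2)
         reg = 1;

      bool write_test = write_mask & (1u << i);
      if (write_test) {
         // Each double component takes two 32-bit channels.
         swizzle[num_channels++] = (i * type_slots) % 4;
         if (type_slots == 2)
            swizzle[num_channels++] = (i * type_slots + 1) % 4;
      }

      // Flush on a gap, on the last component, on a full vec4, or after the
      // XY components of a dvec.
      if (!write_test || i == num_components - 1 || num_channels == 4 ||
          (i == 1 && type_slots == 2)) {
         if (num_channels > 0) {
            batches.push_back({reg, std::vector<uint8_t>(swizzle, swizzle + num_channels)});
            num_channels = 0;
         }
      }
   }
   return batches;
}
```

A fully written `dvec4` therefore becomes two writes, each taking channels XYZW of its own register. A `vec4` becomes a single write.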
[Mesa-dev] [PATCH v2 044/103] i965/vec4: add a horiz_offset() helper
This will come in handy when we implement a simd lowering pass in a follow-up patch. --- src/mesa/drivers/dri/i965/brw_ir_vec4.h | 41 + 1 file changed, 41 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h b/src/mesa/drivers/dri/i965/brw_ir_vec4.h index 7451f44..e271fe1 100644 --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h @@ -69,6 +69,40 @@ offset(src_reg reg, unsigned delta) return reg; } +static inline void +add_horiz_offset(backend_reg *reg, unsigned delta) +{ + switch (reg->file) { + case BAD_FILE: + break; + case MRF: + case VGRF: + case ATTR: + case UNIFORM: { + reg->offset += delta * type_sz(reg->type); + assert(reg->offset % 16 == 0); + break; + } + case ARF: + case FIXED_GRF: { + const unsigned suboffset = reg->subnr + delta * type_sz(reg->type); + reg->nr += suboffset / REG_SIZE; + reg->subnr = suboffset % REG_SIZE; + assert(reg->subnr % 16 == 0); + break; + } + default: + assert(delta == 0); + } +} + +static inline src_reg +horiz_offset(src_reg reg, unsigned delta) +{ + add_horiz_offset(®, delta); + return reg; +} + /** * Reswizzle a given source register. * \sa brw_swizzle(). @@ -139,6 +173,13 @@ offset(dst_reg reg, unsigned delta) } static inline dst_reg +horiz_offset(dst_reg reg, unsigned delta) +{ + add_horiz_offset(®, delta); + return reg; +} + +static inline dst_reg writemask(dst_reg reg, unsigned mask) { assert(reg.file != IMM); -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 041/103] i965/vec4: use the IR's execution size
In the vec4 backend, the generator sets the execution size of every instruction to 8 by default. To implement 64-bit floating point, however, we will need to split certain instructions into smaller execution sizes, so the IR needs to convey this information, as it already does in the scalar backend.

This patch makes the generator use the execution size from the vec4 IR. We will use this feature in a later patch when we implement a SIMD splitting pass.

v2:
- Drop the assertion on the execution size being 8 or 4 (Curro)
- Use exec_size from backend_instruction (Curro)
---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 4d05fcd..e4e2742 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1498,6 +1498,7 @@ generate_code(struct brw_codegen *p,
       brw_set_default_saturate(p, inst->saturate);
       brw_set_default_mask_control(p, inst->force_writemask_all);
       brw_set_default_acc_write_control(p, inst->writes_accumulator);
+      brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
 
       assert(inst->base_mrf + inst->mlen <= BRW_MAX_MRF(devinfo->gen));
       assert(inst->mlen <= BRW_MAX_MSG_LENGTH);
-- 
2.7.4
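The expression `cvt(inst->exec_size) - 1` converts a channel count into the log2-style exec-size encoding (1 -> 0, 2 -> 1, 4 -> 2, 8 -> 3, 16 -> 4). The sketch below re-creates that mapping under the assumption that `cvt()` maps 2^n to n + 1 and 0 to 0. `cvt_sketch` and `exec_size_field` are hypothetical names, not the Mesa helpers.

```cpp
#include <cassert>

// Assumed behaviour of cvt(): 0 -> 0, and 2^n -> n + 1 for n = 0..5.
static unsigned
cvt_sketch(unsigned val)
{
   switch (val) {
   case 0: return 0;
   case 1: return 1;
   case 2: return 2;
   case 4: return 3;
   case 8: return 4;
   case 16: return 5;
   case 32: return 6;
   }
   assert(!"not a power-of-two execution size");
   return 0;
}

// Value handed to the default exec-size state for an instruction.
static unsigned
exec_size_field(unsigned exec_size)
{
   return cvt_sketch(exec_size) - 1;
}
```

With this mapping, the generator's previous fixed default of 8 corresponds to field value 3, and a split SIMD4 instruction gets 2.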
[Mesa-dev] [PATCH v2 051/103] i965/vec4: teach cmod propagation about different execution sizes
We can't propagate the conditional modifier from one instruction to another with a different execution size or group, since that would change the channels affected by the conditional.
---
 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
index c531fba..4454cdb 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
@@ -76,7 +76,9 @@ opt_cmod_propagation_local(bblock_t *block)
                  scan_inst->dst.writemask != WRITEMASK_XYZW) ||
                 (scan_inst->dst.writemask == WRITEMASK_XYZW &&
                  inst->src[0].swizzle != BRW_SWIZZLE_XYZW) ||
-                (inst->dst.writemask & ~scan_inst->dst.writemask) != 0) {
+                (inst->dst.writemask & ~scan_inst->dst.writemask) != 0 ||
+                scan_inst->exec_size != inst->exec_size ||
+                scan_inst->group != inst->group) {
                break;
             }
-- 
2.7.4
[Mesa-dev] [PATCH v2 074/103] i965/vec4: Do not use DepCtrl with 64-bit instructions
The BDW PRM says that DepCtrl is not supported with 64-bit instructions, but gen7 seems to be affected too: using DepCtrl on double-precision instructions leads to GPU hangs in some cases. That is perhaps unsurprising, since the restriction is documented for the next hardware generation. The SKL PRMs do not mention this restriction, so gen9+ is probably fine.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 7f6acc3..f60334f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -908,12 +908,16 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction *inst)
     (reg.type == BRW_REGISTER_TYPE_UD || \
      reg.type == BRW_REGISTER_TYPE_D)
 
+#define IS_64BIT(reg) (reg.file != BAD_FILE && type_sz(reg.type) == 8)
+
    /* From the Cherryview and Broadwell PRMs:
     *
     * "When source or destination datatype is 64b or operation is integer DWord
     * multiply, DepCtrl must not be used."
     *
-    * SKL PRMs don't include this restriction though.
+    * SKL PRMs don't include this restriction, however, gen7 seems to be
+    * affected, at least by the 64b restriction, since DepCtrl with double
+    * precision instructions seems to produce GPU hangs in some cases.
     */
    if (devinfo->gen == 8) {
       if (inst->opcode == BRW_OPCODE_MUL &&
@@ -921,6 +925,14 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction *inst)
          IS_DWORD(inst->src[1]))
          return true;
    }
+
+   if (devinfo->gen >= 7 && devinfo->gen <= 8) {
+      if (IS_64BIT(inst->dst) || IS_64BIT(inst->src[0]) ||
+          IS_64BIT(inst->src[1]) || IS_64BIT(inst->src[2]))
+         return true;
+   }
+
+#undef IS_64BIT
 #undef IS_DWORD
 
    if (devinfo->gen >= 8) {
-- 
2.7.4
[Mesa-dev] [PATCH v2 019/103] i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW, HIGH}_32BIT
These Align1 opcodes do partial writes of 64-bit data. We want to use both of them on the same register to implement packDouble2x32. From DCE's point of view, however, both opcodes write the same register channels, so the second write appears to overwrite the first and DCE eliminates the first one. That is not correct, so prevent it from happening.

v2: Make a helper in vec4_instruction to know if the instruction is an Align1 partial write. This will come in handy when we implement a SIMD splitting pass in a later patch.
---
 src/mesa/drivers/dri/i965/brw_ir_vec4.h                    | 6 ++
 src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp | 3 ++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
index a8e5f4a..7451f44 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
@@ -232,6 +232,12 @@ public:
    bool can_change_types() const;
    bool has_source_and_destination_hazard() const;
 
+   bool is_align1_partial_write()
+   {
+      return opcode == VEC4_OPCODE_SET_LOW_32BIT ||
+             opcode == VEC4_OPCODE_SET_HIGH_32BIT;
+   }
+
    bool reads_flag()
    {
       return predicate || opcode == VS_OPCODE_UNPACK_FLAGS_SIMD4X2;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp
index 50706a9..950c6c8 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp
@@ -109,7 +109,8 @@ vec4_visitor::dead_code_eliminate()
             }
          }
 
-         if (inst->dst.file == VGRF && !inst->predicate) {
+         if (inst->dst.file == VGRF && !inst->predicate &&
+             !inst->is_align1_partial_write()) {
             for (unsigned i = 0; i < regs_written(inst); i++) {
                for (int c = 0; c < 4; c++) {
                   if (inst->dst.writemask & (1 << c)) {
-- 
2.7.4
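Why a partial write must not "kill" earlier writes to the same register can be seen in a toy backward-liveness DCE. The real pass tracks liveness per register and channel through writemasks; the toy below tracks whole registers only. `toy_inst` and `toy_dce` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Toy IR: one destination register (or -1), a list of source registers.
struct toy_inst {
   int dst;
   std::vector<int> srcs;
   bool partial_write; // e.g. SET_LOW_32BIT / SET_HIGH_32BIT
   bool side_effect;   // e.g. a store; never eliminated
};

// Walk backwards tracking live registers; return which instructions survive.
static std::vector<bool>
toy_dce(const std::vector<toy_inst> &prog, int num_regs)
{
   std::vector<bool> live(num_regs, false);
   std::vector<bool> keep(prog.size(), false);

   for (int i = int(prog.size()) - 1; i >= 0; i--) {
      const toy_inst &inst = prog[i];
      bool needed = inst.side_effect || (inst.dst >= 0 && live[inst.dst]);
      if (!needed)
         continue;
      keep[i] = true;

      // A full write defines the whole register, so earlier writes to it
      // become dead. A partial write leaves the register live, so the earlier
      // write that provides the other half is kept.
      if (inst.dst >= 0 && !inst.partial_write)
         live[inst.dst] = false;
      for (int s : inst.srcs)
         live[s] = true;
   }
   return keep;
}
```

For a packDouble2x32 sequence (`r0.low = r1; r0.high = r2; store r0`), all three instructions survive. If the two writes are treated as full writes, the first one is wrongly dropped.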
[Mesa-dev] [PATCH v2 077/103] i965/vec4: fix scratch reads for 64bit data
v2: Setup for a 64-bit scratch read by checking the type size of the correct register (Iago) --- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 5d47f83..44e6709 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -1499,7 +1499,17 @@ vec4_visitor::emit_scratch_read(bblock_t *block, vec4_instruction *inst, src_reg index = get_scratch_offset(block, inst, orig_src.reladdr, reg_offset); - emit_before(block, inst, SCRATCH_READ(temp, index)); + if (type_sz(orig_src.type) < 8) { + emit_before(block, inst, SCRATCH_READ(temp, index)); + } else { + dst_reg shuffled = dst_reg(this, glsl_type::dvec4_type); + dst_reg shuffled_float = retype(shuffled, BRW_REGISTER_TYPE_F); + emit_before(block, inst, SCRATCH_READ(shuffled_float, index)); + index = get_scratch_offset(block, inst, orig_src.reladdr, reg_offset + 1); + vec4_instruction *last_read = SCRATCH_READ(offset(shuffled_float, 1), index); + emit_before(block, inst, last_read); + shuffle_64bit_data(temp, src_reg(shuffled), false, block, last_read); + } } /** @@ -1565,7 +1575,8 @@ vec4_visitor::emit_resolve_reladdr(int scratch_loc[], bblock_t *block, /* Now handle scratch access on src */ if (src.file == VGRF && scratch_loc[src.nr] != -1) { - dst_reg temp = dst_reg(this, glsl_type::vec4_type); + dst_reg temp = dst_reg(this, type_sz(src.type) == 8 ? + glsl_type::dvec4_type : glsl_type::vec4_type); emit_scratch_read(block, inst, temp, src, scratch_loc[src.nr]); src.nr = temp.nr; src.offset %= REG_SIZE; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 033/103] i965/vec4: implement d2b
v2 (Curro): - Generate the flag register with a MOV that has a conditional modifier instead of a CMP instruction, which has the benefit that we can skip loading a DF 0.0 constant. - Avoid the PICK_LOW_32BIT + MOV by using the flag result and a SEL to set the boolean result. --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18 ++ 1 file changed, 18 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index cc10247..69f11ff 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -1547,6 +1547,24 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) emit(CMP(dst, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ)); break; + case nir_op_d2b: { + /* We use a predicated MOV to check if the provided value is 0.0. We want + * this to flush denormalized numbers to zero, so we set a source modifier + * on the source operand to trigger this, as source modifiers don't + * affect the result of the testing against 0.0. + */ + src_reg value = op[0]; + value.abs = true; + vec4_instruction *inst = emit(MOV(dst_null_df(), value)); + inst->conditional_mod = BRW_CONDITIONAL_NZ; + + src_reg one = src_reg(this, glsl_type::ivec4_type); + emit(MOV(dst_reg(one), brw_imm_d(~0))); + inst = emit(BRW_OPCODE_SEL, dst, one, brw_imm_d(0)); + inst->predicate = BRW_PREDICATE_NORMAL; + break; + } + case nir_op_i2b: emit(CMP(dst, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ)); break; -- 2.7.4
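The value computed by the d2b sequence can be modelled on the CPU as below. This is a sketch of the semantics only: the function name `d2b()` is hypothetical, and the denormal flush is written out explicitly, whereas on the hardware the patch relies on the abs source modifier to trigger it. The result is ~0 (true) when the value compares not-equal to 0.0 after flushing, and 0 otherwise, matching the SEL between ~0 and 0.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Models the MOV.nz (with abs modifier) + SEL sequence for nir_op_d2b.
static int32_t d2b(double value)
{
   double v = std::fabs(value);               // abs: sign does not matter for != 0.0
   if (std::fpclassify(v) == FP_SUBNORMAL)
      v = 0.0;                                 // denormals are flushed to zero
   return v != 0.0 ? ~0 : 0;                  // SEL picks ~0 where the flag is set
}
```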
[Mesa-dev] [PATCH v2 053/103] i965/vec4: add a scalarization pass for double-precision instructions
The hardware only supports 32-bit swizzles, which means that we can only directly access channels XY of a DF, making access to channels ZW more difficult, especially considering the various regioning restrictions imposed by the hardware. The combination of both things makes handling random swizzles on DF operands rather difficult, as there are many combinations that can't be represented at all, at least not without some work and some level of instruction splitting depending on the case. Writemasks are 64-bit in general; however, XY and ZW writemasks also work in 32-bit, which means these writemasks can't be represented natively, adding to the complexity. For now, we decided to simplify things as much as possible and avoid dealing with all of this from the get-go by adding a scalarization pass that runs after the main optimization loop. By fully scalarizing DF instructions in align16 we avoid most of the complexity introduced by the aforementioned hardware restrictions, and we have an easier path to an initial, fully functional version of the vector backend on Haswell and Ivy Bridge. Later, we can improve the implementation so we don't necessarily scalarize everything, iteratively adding more complexity and building on top of a framework that is already working. Curro drafted some ideas for how this could be done here: https://bugs.freedesktop.org/show_bug.cgi?id=92760#c82 v2: - Use a copy constructor for the scalar instructions so we copy all relevant instruction fields from the original instruction.
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 91 ++ src/mesa/drivers/dri/i965/brw_vec4.h | 1 + 2 files changed, 92 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 48816be..b15fcee 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -2137,6 +2137,95 @@ vec4_visitor::lower_simd_width() return progress; } +static bool +is_align1_df(vec4_instruction *inst) +{ + switch (inst->opcode) { + case VEC4_OPCODE_DOUBLE_TO_SINGLE: + case VEC4_OPCODE_SINGLE_TO_DOUBLE: + case VEC4_OPCODE_PICK_LOW_32BIT: + case VEC4_OPCODE_PICK_HIGH_32BIT: + case VEC4_OPCODE_SET_LOW_32BIT: + case VEC4_OPCODE_SET_HIGH_32BIT: + return true; + default: + return false; + } +} + +static brw_predicate +scalarize_predicate(brw_predicate predicate, unsigned writemask) +{ + if (predicate != BRW_PREDICATE_NORMAL) + return predicate; + + switch (writemask) { + case WRITEMASK_X: + return BRW_PREDICATE_ALIGN16_REPLICATE_X; + case WRITEMASK_Y: + return BRW_PREDICATE_ALIGN16_REPLICATE_Y; + case WRITEMASK_Z: + return BRW_PREDICATE_ALIGN16_REPLICATE_Z; + case WRITEMASK_W: + return BRW_PREDICATE_ALIGN16_REPLICATE_W; + default: + unreachable("invalid writemask"); + } +} + +bool +vec4_visitor::scalarize_df() +{ + bool progress = false; + + foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) { + /* Skip DF instructions that operate in Align1 mode */ + if (is_align1_df(inst)) + continue; + + /* Check if this is a double-precision instruction */ + bool is_double = type_sz(inst->dst.type) == 8; + for (int arg = 0; !is_double && arg < 3; arg++) { + is_double = inst->src[arg].file != BAD_FILE && + type_sz(inst->src[arg].type) == 8; + } + + if (!is_double) + continue; + + /* Generate scalar instructions for each enabled channel */ + for (unsigned chan = 0; chan < 4; chan++) { + unsigned chan_mask = 1 << chan; + if (!(inst->dst.writemask & chan_mask)) +continue; + + vec4_instruction *scalar_inst = 
new(mem_ctx) vec4_instruction(*inst); + + for (unsigned i = 0; i < 3; i++) { +unsigned swz = BRW_GET_SWZ(inst->src[i].swizzle, chan); +scalar_inst->src[i].swizzle = BRW_SWIZZLE4(swz, swz, swz, swz); + } + + scalar_inst->dst.writemask = chan_mask; + + if (inst->predicate != BRW_PREDICATE_NONE) { +scalar_inst->predicate = + scalarize_predicate(inst->predicate, chan_mask); + } + + inst->insert_before(block, scalar_inst); + } + + inst->remove(block); + progress = true; + } + + if (progress) + invalidate_live_intervals(); + + return progress; +} + bool vec4_visitor::run() { @@ -2236,6 +2325,8 @@ vec4_visitor::run() if (failed) return false; + OPT(scalarize_df); + setup_payload(); if (unlikely(INTEL_DEBUG & DEBUG_SPILL_VEC4)) { diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 3f7045e..03c7345 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -163,6 +163,7 @@ public: void convert_to_hw_regs(); bool lower_simd_width(); + bool scal
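The core of the scalarization loop can be sketched in isolation: one instruction is emitted per enabled writemask channel, and each source swizzle is replaced by that channel's component replicated four times. The names below (`make_swizzle`, `get_swz`, `ScalarInst`, `scalarize`) are hypothetical; the 2-bits-per-channel encoding is assumed to mirror what BRW_SWIZZLE4/BRW_GET_SWZ do, and only one source is modelled.

```cpp
#include <cassert>
#include <vector>

// Channel c of a swizzle occupies bits 2c..2c+1 (assumed encoding).
static unsigned make_swizzle(unsigned x, unsigned y, unsigned z, unsigned w)
{
   return x | (y << 2) | (z << 4) | (w << 6);
}

static unsigned get_swz(unsigned swizzle, unsigned chan)
{
   return (swizzle >> (2 * chan)) & 0x3;
}

struct ScalarInst {
   unsigned writemask;  // exactly one channel bit set
   unsigned swizzle;    // the selected component replicated to all channels
};

// Splits one vector instruction into one instruction per enabled channel.
static std::vector<ScalarInst> scalarize(unsigned writemask, unsigned swizzle)
{
   std::vector<ScalarInst> out;
   for (unsigned chan = 0; chan < 4; chan++) {
      unsigned chan_mask = 1u << chan;
      if (!(writemask & chan_mask))
         continue;
      unsigned swz = get_swz(swizzle, chan);
      out.push_back({chan_mask, make_swizzle(swz, swz, swz, swz)});
   }
   return out;
}
```

For example, an instruction writing `.xz` with source swizzle `.yxwz` becomes two instructions: one writing `.x` from `.yyyy` and one writing `.z` from `.wwww`.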
[Mesa-dev] [PATCH v2 097/103] i965/vec4: run scalarize_df() after spilling
Spilling of 64-bit data requires data shuffling for the corresponding scratch read/write messages. This produces unsupported swizzle regions and writemasks that we need to scalarize. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 18 ++ 1 file changed, 18 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index edb8a84..29ac2d6 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -2632,6 +2632,12 @@ vec4_visitor::run() return false; OPT(translate_64bit_mad_to_mul_add); + + /* Run this before payload setup because tesselation shaders +* rely on it to prevent cross dvec2 regioning on DF attributes +* that are setup so that XY are on the second half of register and +* ZW are in the first half of the next. +*/ OPT(scalarize_df); setup_payload(); @@ -2647,6 +2653,12 @@ vec4_visitor::run() continue; spill_reg(i); } + + /* We want to run this after spilling because 64-bit (un)spills need to + * emit code to shuffle 64-bit data for the 32-bit scratch read/write + * messages that can produce unsupported 64-bit swizzle regions. + */ + OPT(scalarize_df); } bool allocated_without_spills = reg_allocate(); @@ -2662,6 +2674,12 @@ vec4_visitor::run() if (failed) return false; } + + /* We want to run this after spilling because 64-bit (un)spills need to + * emit code to shuffle 64-bit data for the 32-bit scratch read/write + * messages that can produce unsupported 64-bit swizzle regions. + */ + OPT(scalarize_df); } opt_schedule_instructions(); -- 2.7.4
[Mesa-dev] [PATCH v2 039/103] i965/vec4: fix size_written for doubles
--- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 619e010..4e7515c 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -46,7 +46,6 @@ vec4_instruction::vec4_instruction(enum opcode opcode, const dst_reg &dst, this->predicate = BRW_PREDICATE_NONE; this->predicate_inverse = false; this->target = 0; - this->size_written = (dst.file == BAD_FILE ? 0 : REG_SIZE); this->shadow_compare = false; this->ir = NULL; this->urb_write_flags = BRW_URB_WRITE_NO_FLAGS; @@ -56,6 +55,8 @@ vec4_instruction::vec4_instruction(enum opcode opcode, const dst_reg &dst, this->base_mrf = 0; this->offset = 0; this->exec_size = 8; + this->size_written = (dst.file == BAD_FILE ? + 0 : this->exec_size * type_sz(dst.type)); this->annotation = NULL; } -- 2.7.4
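The arithmetic behind the new size_written value is easy to check in isolation. The sketch below assumes a 32-byte GRF (REG_SIZE) and uses a hypothetical `size_written()` helper that takes the relevant instruction fields as plain parameters.

```cpp
#include <cassert>

constexpr unsigned REG_SIZE = 32;  // bytes per GRF (assumed)

// A BAD_FILE (null) destination writes nothing; otherwise one element of
// the destination type is written per channel.
static unsigned size_written(bool dst_is_bad_file, unsigned exec_size,
                             unsigned type_sz)
{
   return dst_is_bad_file ? 0 : exec_size * type_sz;
}
```

An 8-wide float instruction still writes exactly REG_SIZE bytes, but an 8-wide DF instruction writes 64 bytes (two registers), which the old hard-coded REG_SIZE under-reported.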
[Mesa-dev] [PATCH v2 016/103] i965/vec4: add dst_null_df()
Reviewed-by: Francisco Jerez --- src/mesa/drivers/dri/i965/brw_vec4.h | 5 + 1 file changed, 5 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 1505ba6..86e58f3 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -79,6 +79,11 @@ public: return dst_reg(brw_null_reg()); } + dst_reg dst_null_df() + { + return dst_reg(retype(brw_null_reg(), BRW_REGISTER_TYPE_DF)); + } + dst_reg dst_null_d() { return dst_reg(retype(brw_null_reg(), BRW_REGISTER_TYPE_D)); -- 2.7.4
[Mesa-dev] [PATCH v2 084/103] i965/vec4: fix attribute setup for doubles
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 426faf0..56a46ad 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1659,12 +1659,19 @@ vec4_visitor::dump_instruction(backend_instruction *be_inst, FILE *file) static inline struct brw_reg -attribute_to_hw_reg(int attr, bool interleaved) +attribute_to_hw_reg(int attr, brw_reg_type type, bool interleaved) { - if (interleaved) - return stride(brw_vec4_grf(attr / 2, (attr % 2) * 4), 0, 4, 1); - else - return brw_vec8_grf(attr, 0); + struct brw_reg reg; + + unsigned width = REG_SIZE / 2 / MAX2(4, type_sz(type)); + if (interleaved) { + reg = stride(brw_vecn_grf(width, attr / 2, (attr % 2) * 4), 0, width, 1); + } else { + reg = brw_vecn_grf(width, attr, 0); + } + + reg.type = type; + return reg; } @@ -1698,9 +1705,9 @@ vec4_visitor::lower_attributes_to_hw_regs(const int *attribute_map, */ assert(grf != 0); - struct brw_reg reg = attribute_to_hw_reg(grf, interleaved); + struct brw_reg reg = +attribute_to_hw_reg(grf, inst->src[i].type, interleaved); reg.swizzle = inst->src[i].swizzle; - reg.type = inst->src[i].type; if (inst->src[i].abs) reg = brw_abs(reg); if (inst->src[i].negate) -- 2.7.4
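The region width chosen by the new attribute_to_hw_reg() can be worked out by hand: REG_SIZE / 2 is 16 bytes, which holds four 32-bit channels or two 64-bit ones, and channels narrower than 4 bytes are still counted as 4. The helper name `attribute_width()` is hypothetical and REG_SIZE is assumed to be 32 bytes.

```cpp
#include <algorithm>
#include <cassert>

constexpr unsigned REG_SIZE = 32;  // bytes per GRF (assumed)

// Same expression as in the patch: REG_SIZE / 2 / MAX2(4, type_sz(type)).
static unsigned attribute_width(unsigned type_sz)
{
   return REG_SIZE / 2 / std::max(4u, type_sz);
}
```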
[Mesa-dev] [PATCH v2 037/103] i965/vec4: use the new helper function to create double immediates
From: Samuel Iglesias Gonsálvez Signed-off-by: Samuel Iglesias Gonsálvez --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 4d5fa96..1da7c85 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -391,7 +391,7 @@ vec4_visitor::nir_emit_load_const(nir_load_const_instr *instr) reg.writemask = writemask; if (instr->def.bit_size == 64) { - emit(MOV(reg, brw_imm_df(instr->value.f64[i]))); + emit(MOV(reg, setup_imm_df(instr->value.f64[i]))); } else { emit(MOV(reg, brw_imm_d(instr->value.i32[i]))); } -- 2.7.4
[Mesa-dev] [PATCH v2 047/103] i965/vec4: make the generator set correct NibCtrl for SIMD4 DF instructions
From the HSW PRM, Command Reference, QtrCtrl: "NibCtrl is only allowed for SIMD4 instructions with a DF (Double Float) source or destination type." v2: Assert that the type is DF (Samuel) v3: Don't set the default group to 0 and then set it only for 4-wide instructions. Instead, assert that exec size and group are always a correct match and then always set the default group from the instruction. (Curro) Signed-off-by: Samuel Iglesias Gonsálvez --- src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 9 + 1 file changed, 9 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index e4e2742..33071f2 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -1500,6 +1500,15 @@ generate_code(struct brw_codegen *p, brw_set_default_acc_write_control(p, inst->writes_accumulator); brw_set_default_exec_size(p, cvt(inst->exec_size) - 1); + assert(inst->group % inst->exec_size == 0); + assert(inst->group % 8 == 0 || + inst->dst.type == BRW_REGISTER_TYPE_DF || + inst->src[0].type == BRW_REGISTER_TYPE_DF || + inst->src[1].type == BRW_REGISTER_TYPE_DF || + inst->src[2].type == BRW_REGISTER_TYPE_DF); + if (!inst->force_writemask_all) + brw_set_default_group(p, inst->group); + assert(inst->base_mrf + inst->mlen <= BRW_MAX_MRF(devinfo->gen)); assert(inst->mlen <= BRW_MAX_MSG_LENGTH); -- 2.7.4
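The two assertions added to generate_code() encode a simple validity rule, restated below as a standalone predicate. `group_is_valid()` is a hypothetical name and the DF check is collapsed into a single boolean. A group must be a multiple of the execution size, and a group that is not a multiple of 8 (which selects the second nibble via NibCtrl, so it can only occur for narrow instructions such as SIMD4) is only accepted when some operand is DF.

```cpp
#include <cassert>

static bool group_is_valid(unsigned group, unsigned exec_size,
                           bool has_df_operand)
{
   if (group % exec_size != 0)
      return false;                       // group must align to exec size
   return group % 8 == 0 || has_df_operand;  // NibCtrl only with DF
}
```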
[Mesa-dev] [PATCH v2 024/103] i965/vec4: fix base offset for nir_registers with doubles
v2: do this inside dst_reg_for_nir_reg() instead of its callers --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 815082e..860ec51 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -259,6 +259,8 @@ dst_reg_for_nir_reg(vec4_visitor *v, nir_register *nir_reg, dst_reg reg; reg = v->nir_locals[nir_reg->index]; + if (nir_reg->bit_size == 64) + base_offset *= 2; reg = offset(reg, base_offset); if (indirect) { reg.reladdr = -- 2.7.4
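The reason for doubling base_offset can be shown with a one-line helper (hypothetical name `register_offset()`): in the vec4 backend each element of a nir_register array is a vec4, which fills one GRF for 32-bit data but two GRFs for 64-bit data, so element N of a 64-bit array starts 2N registers in.

```cpp
#include <cassert>

// Register offset of array element base_offset for a given bit size.
static unsigned register_offset(unsigned base_offset, unsigned bit_size)
{
   return bit_size == 64 ? base_offset * 2 : base_offset;
}
```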
[Mesa-dev] [PATCH v2 009/103] i965/vec4: add double/float conversion pseudo-opcodes
These need to be emitted as align1 MOVs, since they need a stride of 2 on the float register (whether source or destination) so that data from another thread doesn't cross the middle of a SIMD8 register. v2 (Iago): - The float-to-double conversion needs to align 32-bit data to 64-bit before converting. This was doable in align16 when we tried to use an execsize of 4, but with an execsize of 8 we would need another align1 opcode to do it (since the data needs to cross the middle of a SIMD register). Making the opcode handle this internally seems more practical than adding another opcode just for this purpose and requiring the caller to know about it before converting. - The double-to-float conversion produces 32-bit elements aligned to 64-bit, so we make the opcode re-pack the result to 32-bit and fit it in one register, as expected by SIMD4x2 operation. This still requires callers to reserve two registers for the float destination, because we need to produce 64-bit aligned data first and re-pack it later in the same destination register, but it avoids a separate re-pack opcode and lets the operation complete in a single opcode. Hopefully that is worth the weirdness of the double register allocation...
Signed-off-by: Connor Abbott Signed-off-by: Iago Toral Quiroga --- src/mesa/drivers/dri/i965/brw_defines.h | 2 ++ src/mesa/drivers/dri/i965/brw_shader.cpp | 4 +++ src/mesa/drivers/dri/i965/brw_vec4.cpp | 8 + src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 44 4 files changed, 58 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index c4e0f27..79b96a4 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -1098,6 +1098,8 @@ enum opcode { VEC4_OPCODE_MOV_BYTES, VEC4_OPCODE_PACK_BYTES, VEC4_OPCODE_UNPACK_UNIFORM, + VEC4_OPCODE_DOUBLE_TO_FLOAT, + VEC4_OPCODE_FLOAT_TO_DOUBLE, FS_OPCODE_DDX_COARSE, FS_OPCODE_DDX_FINE, diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index ed81563..b063f77 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -317,6 +317,10 @@ brw_instruction_name(const struct gen_device_info *devinfo, enum opcode op) return "pack_bytes"; case VEC4_OPCODE_UNPACK_UNIFORM: return "unpack_uniform"; + case VEC4_OPCODE_DOUBLE_TO_FLOAT: + return "double_to_float"; + case VEC4_OPCODE_FLOAT_TO_DOUBLE: + return "float_to_double"; case FS_OPCODE_DDX_COARSE: return "ddx_coarse"; diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index c29cfb5..32c04b2 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -253,6 +253,8 @@ vec4_instruction::can_do_writemask(const struct gen_device_info *devinfo) { switch (opcode) { case SHADER_OPCODE_GEN4_SCRATCH_READ: + case VEC4_OPCODE_DOUBLE_TO_FLOAT: + case VEC4_OPCODE_FLOAT_TO_DOUBLE: case VS_OPCODE_PULL_CONSTANT_LOAD: case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7: case VS_OPCODE_SET_SIMD4X2_HEADER_GEN9: @@ -505,6 +507,12 @@ vec4_visitor::opt_reduce_swizzle() case BRW_OPCODE_DP2: swizzle = brw_swizzle_for_size(2); break; + + case 
VEC4_OPCODE_FLOAT_TO_DOUBLE: + case VEC4_OPCODE_DOUBLE_TO_FLOAT: + swizzle = brw_swizzle_for_size(4); + break; + default: swizzle = brw_swizzle_for_mask(inst->dst.writemask); break; diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index 163cf9d..6f4c438 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -1896,6 +1896,50 @@ generate_code(struct brw_codegen *p, break; } + case VEC4_OPCODE_DOUBLE_TO_FLOAT: { + assert(src[0].type == BRW_REGISTER_TYPE_DF); + assert(dst.type == BRW_REGISTER_TYPE_F); + + brw_set_default_access_mode(p, BRW_ALIGN_1); + + dst.hstride = BRW_HORIZONTAL_STRIDE_2; + dst.width = BRW_WIDTH_4; + src[0].vstride = BRW_VERTICAL_STRIDE_4; + src[0].width = BRW_WIDTH_4; + brw_MOV(p, dst, src[0]); + + struct brw_reg dst_as_src = dst; + dst.hstride = BRW_HORIZONTAL_STRIDE_1; + dst.width = BRW_WIDTH_8; + brw_MOV(p, dst, dst_as_src); + + brw_set_default_access_mode(p, BRW_ALIGN_16); + break; + } + + case VEC4_OPCODE_FLOAT_TO_DOUBLE: { + assert(src[0].type == BRW_REGISTER_TYPE_F); + assert(dst.type == BRW_REGISTER_TYPE_DF); + + brw_set_default_access_mode(p, BRW_ALIGN_1); + + struct brw_reg tmp = retype(dst, src[0].type); + tmp.hstride = BRW_HORIZONTAL_STRIDE_2; + tmp.wid
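The data movement performed by the two pseudo-opcodes can be simulated on plain arrays. This is a sketch under stated assumptions: registers are modelled as arrays of float slots (16 slots = two 32-byte GRFs), the function names are hypothetical, and because the FLOAT_TO_DOUBLE diff is cut off above, that half follows the commit message's description ("align 32-bit data to 64-bit before doing the conversion") rather than the actual generator code. The temporary is also modelled as a separate array, whereas the generator reuses the destination register.

```cpp
#include <array>
#include <cassert>

// DOUBLE_TO_FLOAT: convert with a destination stride of 2 (float i lands
// in the slot that lines up with double i), then re-pack with stride 1 so
// all eight results fit in the first register.
static std::array<float, 16> double_to_float(const std::array<double, 8> &src)
{
   std::array<float, 16> dst{};
   for (int i = 0; i < 8; i++)
      dst[2 * i] = static_cast<float>(src[i]);   // MOV, dst hstride 2
   for (int i = 0; i < 8; i++)
      dst[i] = dst[2 * i];                       // re-pack, hstride 1
   return dst;
}

// FLOAT_TO_DOUBLE: first spread the floats to a stride of 2 so each one
// sits at a 64-bit aligned offset, then convert from those slots.
static std::array<double, 8> float_to_double(const std::array<float, 8> &src)
{
   std::array<float, 16> tmp{};
   for (int i = 0; i < 8; i++)
      tmp[2 * i] = src[i];                       // align to 64 bits
   std::array<double, 8> dst{};
   for (int i = 0; i < 8; i++)
      dst[i] = tmp[2 * i];                       // convert F -> DF
   return dst;
}
```

The re-pack loop is safe in place because slot 2i is always read before iteration 2i could overwrite it; the need for 16 float slots is why callers reserve two registers for the float destination.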
[Mesa-dev] [PATCH v2 079/103] i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64-bit data
--- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index b0b5f39..f12a114 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -1817,8 +1817,23 @@ vec4_visitor::move_uniform_array_access_to_pull_constants() assert(inst->src[0].swizzle == BRW_SWIZZLE_NOOP); - emit_pull_constant_load(block, inst, inst->dst, inst->src[0], - pull_constant_loc[uniform_nr], inst->src[1]); + if (type_sz(inst->src[0].type) != 8) { + emit_pull_constant_load(block, inst, inst->dst, inst->src[0], + pull_constant_loc[uniform_nr], inst->src[1]); + } else { + dst_reg shuffled = dst_reg(this, glsl_type::dvec4_type); + dst_reg shuffled_float = retype(shuffled, BRW_REGISTER_TYPE_F); + + emit_pull_constant_load(block, inst, shuffled_float, inst->src[0], + pull_constant_loc[uniform_nr], inst->src[1]); + emit_pull_constant_load(block, inst, offset(shuffled_float, 1), + offset(inst->src[0], 1), + pull_constant_loc[uniform_nr], inst->src[1]); + + shuffle_64bit_data(retype(inst->dst, BRW_REGISTER_TYPE_DF), +src_reg(shuffled), false, block, inst); + } + inst->remove(block); } -- 2.7.4
[Mesa-dev] [PATCH v2 063/103] i965/vec4: support multiple dispatch widths and groups in the IR builder.
--- src/mesa/drivers/dri/i965/brw_vec4_builder.h | 39 ++-- 1 file changed, 37 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_builder.h b/src/mesa/drivers/dri/i965/brw_vec4_builder.h index dab6e03..8352542 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_builder.h +++ b/src/mesa/drivers/dri/i965/brw_vec4_builder.h @@ -52,8 +52,9 @@ namespace brw { /** * Construct a vec4_builder that inserts instructions into \p shader. */ - vec4_builder(backend_shader *shader) : + vec4_builder(backend_shader *shader, unsigned dispatch_width = 8) : shader(shader), block(NULL), cursor(NULL), + _dispatch_width(dispatch_width), _group(0), force_writemask_all(false), annotation() { @@ -67,6 +68,7 @@ namespace brw { */ vec4_builder(backend_shader *shader, bblock_t *block, instruction *inst) : shader(shader), block(block), cursor(inst), + _dispatch_width(inst->exec_size), _group(inst->group), force_writemask_all(inst->force_writemask_all) { annotation.str = inst->annotation; @@ -99,6 +101,25 @@ namespace brw { } /** + * Construct a builder specifying the default SIMD width and group of + * channel enable signals, inheriting other code generation parameters + * from this. + * + * \p n gives the default SIMD width, \p i gives the slot group used for + * predication and control flow masking in multiples of \p n channels. + */ + vec4_builder + group(unsigned n, unsigned i) const + { + assert(force_writemask_all || +(n <= dispatch_width() && i < dispatch_width() / n)); + vec4_builder bld = *this; + bld._dispatch_width = n; + bld._group += i * n; + return bld; + } + + /** * Construct a builder with per-channel control flow execution masking * disabled if \p b is true. If control flow execution masking is * already disabled this has no effect. @@ -130,7 +151,16 @@ namespace brw { unsigned dispatch_width() const { - return 8; + return _dispatch_width; + } + + /** + * Get the channel group in use. 
+ */ + unsigned + group() const + { + return _group; } /** @@ -281,7 +311,10 @@ namespace brw { instruction * emit(instruction *inst) const { + inst->exec_size = dispatch_width(); + inst->group = group(); inst->force_writemask_all = force_writemask_all; + inst->size_written = inst->exec_size * type_sz(inst->dst.type); inst->annotation = annotation.str; inst->ir = annotation.ir; @@ -587,6 +620,8 @@ namespace brw { bblock_t *block; exec_node *cursor; + unsigned _dispatch_width; + unsigned _group; bool force_writemask_all; /** Debug annotation info. */ -- 2.7.4
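The bookkeeping behind the new group(n, i) builder method can be modelled with a small struct. `Builder` and `sub_group()` are hypothetical stand-ins (the method is renamed only to avoid clashing with the `group` field); the assertion and the width/group arithmetic follow the patch: the new builder is n channels wide and starts i * n channels past the parent's group.

```cpp
#include <cassert>

struct Builder {
   unsigned dispatch_width;
   unsigned group;
   bool force_writemask_all;

   // n: SIMD width of the derived builder; i: which n-wide slice to use.
   Builder sub_group(unsigned n, unsigned i) const
   {
      assert(force_writemask_all ||
             (n <= dispatch_width && i < dispatch_width / n));
      Builder bld = *this;
      bld.dispatch_width = n;
      bld.group += i * n;
      return bld;
   }
};
```

For example, the second SIMD4 half of an 8-wide builder has group 4, and taking the second SIMD2 slice of that gives group 6.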
[Mesa-dev] [PATCH v2 093/103] i965/vec4: split instructions that read 64-bit interleaved attributes
Stages that use interleaved attributes generate regions with a vstride=0 that can hit the gen7 hardware decompression bug. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 28 ++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 33a8c52..9b9bef1 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -2033,6 +2033,20 @@ vec4_visitor::convert_to_hw_regs() } } +bool +stage_uses_interleaved_attributes(unsigned stage, + enum shader_dispatch_mode dispatch_mode) +{ + switch (stage) { + case MESA_SHADER_TESS_EVAL: + return true; + case MESA_SHADER_GEOMETRY: + return dispatch_mode != DISPATCH_MODE_4X2_DUAL_OBJECT; + default: + return false; + } +} + /** * Get the closest native SIMD width supported by the hardware for instruction * \p inst. The instruction will be left untouched by @@ -2041,7 +2055,8 @@ vec4_visitor::convert_to_hw_regs() */ static unsigned get_lowered_simd_width(const struct gen_device_info *devinfo, - const vec4_instruction *inst) + enum shader_dispatch_mode dispatch_mode, + unsigned stage, const vec4_instruction *inst) { /* Do not split some instructions that require special handling */ switch (inst->opcode) { @@ -2076,6 +2091,14 @@ get_lowered_simd_width(const struct gen_device_info *devinfo, continue; if (inst->size_read(i) <= REG_SIZE) lowered_width = MIN2(lowered_width, 4); + + /* Interleaved attribute setups use a vertical stride of 0, which + * makes them hit the associated instruction decompression bug in gen7. + * Split them to prevent this. 
+ */ + if (inst->src[i].file == ATTR && + stage_uses_interleaved_attributes(stage, dispatch_mode)) +lowered_width = MIN2(lowered_width, 4); } } @@ -2117,7 +2140,8 @@ vec4_visitor::lower_simd_width() bool progress = false; foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) { - const unsigned lowered_width = get_lowered_simd_width(devinfo, inst); + const unsigned lowered_width = + get_lowered_simd_width(devinfo, prog_data->dispatch_mode, stage, inst); assert(lowered_width <= inst->exec_size); if (lowered_width == inst->exec_size) continue; -- 2.7.4
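The new splitting rule can be exercised on its own with the sketch below. The enums are hypothetical stand-ins for the Mesa shader-stage and dispatch-mode enums, and `lowered_width()` keeps only this one rule, ignoring the other conditions that get_lowered_simd_width() applies. Tessellation evaluation always uses interleaved attributes, geometry shaders do unless they run in 4x2 dual-object mode, and any instruction reading an ATTR source in such a stage is clamped to SIMD4.

```cpp
#include <algorithm>
#include <cassert>

enum class Stage { VERTEX, TESS_CTRL, TESS_EVAL, GEOMETRY, FRAGMENT };
enum class DispatchMode { MODE_4X1_SINGLE, MODE_4X2_DUAL_INSTANCE,
                          MODE_4X2_DUAL_OBJECT, MODE_SIMD8 };

static bool stage_uses_interleaved_attributes(Stage stage, DispatchMode mode)
{
   switch (stage) {
   case Stage::TESS_EVAL:
      return true;
   case Stage::GEOMETRY:
      return mode != DispatchMode::MODE_4X2_DUAL_OBJECT;
   default:
      return false;
   }
}

// Interleaved attribute regions use vstride=0, which hits the gen7
// decompression bug, so such instructions are split to SIMD4.
static unsigned lowered_width(unsigned exec_size, bool reads_attr,
                              Stage stage, DispatchMode mode)
{
   unsigned width = std::min(16u, exec_size);
   if (reads_attr && stage_uses_interleaved_attributes(stage, mode))
      width = std::min(width, 4u);
   return width;
}
```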
[Mesa-dev] [PATCH v2 075/103] i965/vec4: do not split scratch read/write opcodes
64-bit scratch reads/writes require shuffling data around, so we need access to the full 64-bit data. We will do the right thing for these when we emit the messages. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 + 1 file changed, 9 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index f60334f..75e47f9 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -2034,6 +2034,15 @@ static unsigned get_lowered_simd_width(const struct gen_device_info *devinfo, const vec4_instruction *inst) { + /* Do not split some instructions that require special handling */ + switch (inst->opcode) { + case SHADER_OPCODE_GEN4_SCRATCH_READ: + case SHADER_OPCODE_GEN4_SCRATCH_WRITE: + return inst->exec_size; + default: + break; + } + unsigned lowered_width = MIN2(16, inst->exec_size); /* We need to split some cases of double-precision instructions that write -- 2.7.4
[Mesa-dev] [PATCH v2 083/103] i965/vec4: fix indentation in lower_attributes_to_hw_regs()
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index e732bf4..426faf0 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1686,8 +1686,8 @@ vec4_visitor::lower_attributes_to_hw_regs(const int *attribute_map, { foreach_block_and_inst(block, vec4_instruction, inst, cfg) { for (int i = 0; i < 3; i++) { -if (inst->src[i].file != ATTR) - continue; + if (inst->src[i].file != ATTR) +continue; int grf = attribute_map[inst->src[i].nr + inst->src[i].offset / REG_SIZE]; @@ -1698,13 +1698,13 @@ vec4_visitor::lower_attributes_to_hw_regs(const int *attribute_map, */ assert(grf != 0); -struct brw_reg reg = attribute_to_hw_reg(grf, interleaved); -reg.swizzle = inst->src[i].swizzle; + struct brw_reg reg = attribute_to_hw_reg(grf, interleaved); + reg.swizzle = inst->src[i].swizzle; reg.type = inst->src[i].type; -if (inst->src[i].abs) - reg = brw_abs(reg); -if (inst->src[i].negate) - reg = negate(reg); + if (inst->src[i].abs) +reg = brw_abs(reg); + if (inst->src[i].negate) +reg = negate(reg); inst->src[i] = reg; } -- 2.7.4
[Mesa-dev] [PATCH v2 052/103] i965/vec4: split double-precision bcsel
There is a hardware bug affecting compressed double-precision bcsel instructions in Align16 mode: they don't read the predication mask properly. The bug does not affect other predicated instructions, and it does not affect bcsel in Align1 mode either. This was found empirically and verified by Curro in the simulator. Fix this by splitting double-precision bcsel in Align16 mode to use an execution size of 4. v2: Check that the dst type is 64-bit, since we can have 16-wide single-precision bcsel instructions that also write 2 registers. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 69fdb1e..48816be 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1997,6 +1997,12 @@ get_lowered_simd_width(const struct gen_device_info *devinfo, * only hardware that implements fp64 in Align16. */ if (devinfo->gen == 7 && inst->size_written > REG_SIZE) { + /* Align16 8-wide double-precision bcsel does not work well. Verified + * empirically. + */ + if (inst->opcode == BRW_OPCODE_SEL && type_sz(inst->dst.type) == 8) + lowered_width = MIN2(lowered_width, 4); + /* HSW PRM, 3D Media GPGPU Engine, Region Alignment Rules for Direct * Register Addressing: * -- 2.7.4
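The bcsel split condition can be isolated as below. `InstDesc` and `lowered_sel_width()` are hypothetical simplifications that keep only this rule, with REG_SIZE assumed to be 32 bytes. On gen7 a SEL that writes more than one register is split to SIMD4 only when its destination is 64-bit; a 16-wide single-precision SEL also writes two registers but is not affected by this particular rule, which is what the v2 dst-type check is for.

```cpp
#include <algorithm>
#include <cassert>

constexpr unsigned REG_SIZE = 32;  // bytes per GRF (assumed)

struct InstDesc {
   bool is_sel;            // BRW_OPCODE_SEL, used for bcsel
   unsigned dst_type_sz;   // bytes per destination element
   unsigned exec_size;
};

static unsigned lowered_sel_width(unsigned gen, const InstDesc &inst)
{
   unsigned width = std::min(16u, inst.exec_size);
   unsigned size_written = inst.exec_size * inst.dst_type_sz;
   if (gen == 7 && size_written > REG_SIZE && inst.is_sel &&
       inst.dst_type_sz == 8)
      width = std::min(width, 4u);
   return width;
}
```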
[Mesa-dev] [PATCH v2 042/103] i965/vec4: dump the instruction execution size
Reviewed-by: Francisco Jerez --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 2bde628..3191eab 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1437,7 +1437,8 @@ vec4_visitor::dump_instruction(backend_instruction *be_inst, FILE *file) pred_ctrl_align16[inst->predicate]); } - fprintf(file, "%s", brw_instruction_name(devinfo, inst->opcode)); + fprintf(file, "%s(%d)", brw_instruction_name(devinfo, inst->opcode), + inst->exec_size); if (inst->saturate) fprintf(file, ".sat"); if (inst->conditional_mod) { -- 2.7.4
[Mesa-dev] [PATCH v2 068/103] i965/vec4: don't constant propagate 64-bit immediates
From: Connor Abbott v2: Also check if the instruction source target is 64-bit. (Samuel) Signed-off-by: Samuel Iglesias Gonsálvez --- src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index 49920c2..7b53aed 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -151,6 +151,13 @@ try_constant_propagate(const struct gen_device_info *devinfo, if (value.file != IMM) return false; + /* 64-bit types can't be used except for one-source instructions, which +* higher levels should have constant folded away, so there's no point in +* propagating immediates here. +*/ + if (type_sz(value.type) == 8 || type_sz(inst->src[arg].type) == 8) + return false; + if (value.type == BRW_REGISTER_TYPE_VF) { /* The result of bit-casting the component values of a vector float * cannot in general be represented as an immediate. -- 2.7.4
[Mesa-dev] [PATCH v2 043/103] i965/vec4: handle 32 and 64 bit channels in liveness analysis
From: "Juan A. Suarez Romero" Our current data flow analysis does not take into account that channels on 64-bit operands are 64-bit. This is a problem when the same register is accessed using both 64-bit and 32-bit channels. This is very common in operations where we need to access 64-bit data in 32-bit chunks, such as the double packing and unpacking operations. This patch changes the analysis by checking the bits that each source or destination datatype needs. Actually, rather than bits, we use blocks of 32 bits, which is the minimum channel size. Because a vgrf can contain a dvec4 (256 bits), we reserve 8 32-bit blocks to map the channels. v2 (Curro): - Simplify code by making the var_from_reg helpers take an extra argument with the register component we want. - Fix a couple of cases where we had to update the code to the new way of representing live variables. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 2 +- .../dri/i965/brw_vec4_dead_code_eliminate.cpp | 25 + .../drivers/dri/i965/brw_vec4_live_variables.cpp | 32 +++--- .../drivers/dri/i965/brw_vec4_live_variables.h | 15 ++ 5 files changed, 42 insertions(+), 34 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 3191eab..34cab04 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1140,7 +1140,7 @@ vec4_visitor::opt_register_coalesce() /* Can't coalesce this GRF if someone else was going to * read it later.
*/ - if (var_range_end(var_from_reg(alloc, dst_reg(inst->src[0])), 4) > ip) + if (var_range_end(var_from_reg(alloc, dst_reg(inst->src[0])), 8) > ip) continue; /* We need to check interference with the final destination between this diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp index 1b91db9..bef897a 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp @@ -246,7 +246,7 @@ vec4_visitor::opt_cse_local(bblock_t *block) * more -- a sure sign they'll fail operands_match(). */ if (src->file == VGRF) { - if (var_range_end(var_from_reg(alloc, dst_reg(*src)), 4) < ip) { + if (var_range_end(var_from_reg(alloc, dst_reg(*src)), 8) < ip) { entry->remove(); ralloc_free(entry); break; diff --git a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp index 950c6c8..6a80810 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp @@ -57,12 +57,13 @@ vec4_visitor::dead_code_eliminate() if ((inst->dst.file == VGRF && !inst->has_side_effects()) || (inst->dst.is_null() && inst->writes_flag())){ bool result_live[4] = { false }; - if (inst->dst.file == VGRF) { - for (unsigned i = 0; i < regs_written(inst); i++) { - for (int c = 0; c < 4; c++) - result_live[c] |= BITSET_TEST( -live, var_from_reg(alloc, offset(inst->dst, i), c)); + for (unsigned i = 0; i < 2 * regs_written(inst); i++) { + for (int c = 0; c < 4; c++) { + const unsigned v = +var_from_reg(alloc, inst->dst, c, i); + result_live[c] |= BITSET_TEST(live, v); + } } } else { for (unsigned c = 0; c < 4; c++) @@ -111,11 +112,12 @@ vec4_visitor::dead_code_eliminate() if (inst->dst.file == VGRF && !inst->predicate && !inst->is_align1_partial_write()) { -for (unsigned i = 0; i < regs_written(inst); i++) { +for (unsigned i = 0; i < 2 * regs_written(inst); i++) { for (int c = 0; c < 4; c++) 
{ if (inst->dst.writemask & (1 << c)) { - BITSET_CLEAR(live, var_from_reg(alloc, - offset(inst->dst, i), c)); + const unsigned v = +var_from_reg(alloc, inst->dst, c, i); + BITSET_CLEAR(live, v); } } } @@ -133,10 +135,11 @@ vec4_visitor::dead_code_eliminate() for (int i = 0; i < 3; i++) { if (inst->src[i].file == VGRF) { - for (unsigned j = 0; j < regs_read(inst, i); j++) { + for (unsigned j = 0; j < 2 * regs_read(inst, i); j++) { for (int c = 0; c < 4; c++) { - BITSET_SET(live, var_from_reg(alloc, -
[Mesa-dev] [PATCH v2 059/103] i965/vec4: fix indentation in pack_uniform_registers
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index b79fd5e..45d49e9 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -660,25 +660,25 @@ vec4_visitor::pack_uniform_registers() int dst; /* Find the lowest place we can slot this uniform in. */ for (dst = 0; dst < src; dst++) { -if (chans_used[dst] + size <= 4) - break; + if (chans_used[dst] + size <= 4) +break; } if (src == dst) { -new_loc[src] = dst; -new_chan[src] = 0; + new_loc[src] = dst; + new_chan[src] = 0; } else { -new_loc[src] = dst; -new_chan[src] = chans_used[dst]; + new_loc[src] = dst; + new_chan[src] = chans_used[dst]; -/* Move the references to the data */ -for (int j = 0; j < size; j++) { - stage_prog_data->param[dst * 4 + new_chan[src] + j] = - stage_prog_data->param[src * 4 + j]; -} + /* Move the references to the data */ + for (int j = 0; j < size; j++) { +stage_prog_data->param[dst * 4 + new_chan[src] + j] = + stage_prog_data->param[src * 4 + j]; + } -chans_used[dst] += size; -chans_used[src] = 0; + chans_used[dst] += size; + chans_used[src] = 0; } new_uniform_count = MAX2(new_uniform_count, dst + 1); @@ -691,8 +691,8 @@ vec4_visitor::pack_uniform_registers() for (int i = 0 ; i < 3; i++) { int src = inst->src[i].nr; -if (inst->src[i].file != UNIFORM) - continue; + if (inst->src[i].file != UNIFORM) +continue; inst->src[i].nr = new_loc[src]; inst->src[i].swizzle += BRW_SWIZZLE4(new_chan[src], new_chan[src], -- 2.7.4
[Mesa-dev] [PATCH v2 095/103] i965/vec4/scalarize_df: support more swizzles via vstride=0
By exploiting gen7's hardware decompression bug with vstride=0 we gain the capacity to support additional swizzle combinations. This also fixes ZW writes from X/Y channels like in: mov r2.z:df r0.:df Because DF regions use 2-wide rows with a vstride of 2, the region generated for the source would be r0<2,2,1>.xyxy:DF, which is equivalent to r0.xxzz, so we end up writing r0.z in r2.z instead of r0.x. Using a vertical stride of 0 in these cases we get to replicate the XX swizzle and write what we want. --- src/mesa/drivers/dri/i965/brw_reg.h| 2 + src/mesa/drivers/dri/i965/brw_vec4.cpp | 68 -- src/mesa/drivers/dri/i965/brw_vec4.h | 2 +- 3 files changed, 51 insertions(+), 21 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h index 39cc25a..f849f42 100644 --- a/src/mesa/drivers/dri/i965/brw_reg.h +++ b/src/mesa/drivers/dri/i965/brw_reg.h @@ -81,11 +81,13 @@ struct gen_device_info; #define BRW_SWIZZLE_ BRW_SWIZZLE4(2,2,2,2) #define BRW_SWIZZLE_ BRW_SWIZZLE4(3,3,3,3) #define BRW_SWIZZLE_XYXY BRW_SWIZZLE4(0,1,0,1) +#define BRW_SWIZZLE_YXYX BRW_SWIZZLE4(1,0,1,0) #define BRW_SWIZZLE_XZXZ BRW_SWIZZLE4(0,2,0,2) #define BRW_SWIZZLE_YZXW BRW_SWIZZLE4(1,2,0,3) #define BRW_SWIZZLE_YWYW BRW_SWIZZLE4(1,3,1,3) #define BRW_SWIZZLE_ZXYW BRW_SWIZZLE4(2,0,1,3) #define BRW_SWIZZLE_ZWZW BRW_SWIZZLE4(2,3,2,3) +#define BRW_SWIZZLE_WZWZ BRW_SWIZZLE4(3,2,3,2) #define BRW_SWIZZLE_WZYX BRW_SWIZZLE4(3,2,1,0) #define BRW_SWIZZLE_XXZZ BRW_SWIZZLE4(0,0,2,2) #define BRW_SWIZZLE_YYWW BRW_SWIZZLE4(1,1,3,3) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 438dce1..d33fb65 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -2259,18 +2259,33 @@ scalarize_predicate(brw_predicate predicate, unsigned writemask) } } +/* Gen7 has a hardware decompression bug that we can exploit to represent + * handful of additional swizzles natively. 
+ */ +static bool +is_gen7_supported_64bit_swizzle(vec4_instruction *inst, unsigned arg) +{ + switch (inst->src[arg].swizzle) { + case BRW_SWIZZLE_: + case BRW_SWIZZLE_: + case BRW_SWIZZLE_: + case BRW_SWIZZLE_: + case BRW_SWIZZLE_XYXY: + case BRW_SWIZZLE_YXYX: + case BRW_SWIZZLE_ZWZW: + case BRW_SWIZZLE_WZWZ: + return true; + default: + return false; + } +} + /* 64-bit sources use regions with a width of 2. These 2 elements in each row * can be addressed using 32-bit swizzles (which is what the hardware supports) * but it also means that the swizzle we apply on the first two components of a * dvec4 is coupled with the swizzle we use for the last 2. In other words, * only some specific swizzle combinations can be natively supported. * - * FIXME: We can also exploit the vstride 0 decompression bug in gen7 to - *implement some more swizzles via simple translations. For - *example: as XYXY, as ZWZW (same for and by - *using subnr), XYXY as XYZW, YXYX as ZWXY (same for ZWZW and - *WZWZ using subnr). - * * FIXME: we can go an step further and implement even more swizzle *variations using only partial scalarization. * @@ -2278,8 +2293,9 @@ scalarize_predicate(brw_predicate predicate, unsigned writemask) * https://bugs.freedesktop.org/show_bug.cgi?id=92760#c82 */ bool -vec4_visitor::is_supported_64bit_region(src_reg src) +vec4_visitor::is_supported_64bit_region(vec4_instruction *inst, unsigned arg) { + const src_reg &src = inst->src[arg]; assert(type_sz(src.type) == 8); /* Uniform regions have a vstride=0. 
Because we use 2-wide rows with @@ -2301,7 +2317,7 @@ vec4_visitor::is_supported_64bit_region(src_reg src) case BRW_SWIZZLE_YXWZ: return true; default: - return false; + return devinfo->gen == 7 && is_gen7_supported_64bit_swizzle(inst, arg); } } @@ -2340,8 +2356,7 @@ vec4_visitor::scalarize_df() for (unsigned i = 0; i < 3; i++) { if (inst->src[i].file == BAD_FILE || type_sz(inst->src[i].type) < 8) continue; -skip_lowering = skip_lowering && -is_supported_64bit_region(inst->src[i]); +skip_lowering = skip_lowering && is_supported_64bit_region(inst, i); } } @@ -2455,9 +2470,10 @@ vec4_visitor::apply_logical_swizzle(struct brw_reg *hw_reg, /* Take the 64-bit logical swizzle channel and translate it to 32-bit */ assert(brw_is_single_value_swizzle(reg.swizzle) || - is_supported_64bit_region(reg)); + is_supported_64bit_region(inst, arg)); - if (is_supported_64bit_region(reg)) { + if (is_supported_64bit_region(inst, arg) && + !is_gen7_supported_64bit_swizzle(inst, arg)) { /* Supported 64-bit swizzles are those such
[Mesa-dev] [PATCH v2 027/103] i965/vec4: make opt_vector_float ignore doubles
The pass does not support doubles in its current form. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 06fa38f..675b7fc 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -393,6 +393,7 @@ vec4_visitor::opt_vector_float() inst->src[0].file == IMM && inst->predicate == BRW_PREDICATE_NONE && inst->dst.writemask != WRITEMASK_XYZW && + type_sz(inst->src[0].type) < 8 && (inst->src[0].type == inst->dst.type || inst->src[0].d == 0)) { vf = brw_float_to_vf(inst->src[0].d); -- 2.7.4
[Mesa-dev] [PATCH v2 048/103] i965/vec4: dump NibCtrl for instructions with execsize != 8
v2: do it in the same fashion as the FS backend for consistency (Curro) --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 490cbae..69fdb1e 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1612,6 +1612,9 @@ vec4_visitor::dump_instruction(backend_instruction *be_inst, FILE *file) if (inst->force_writemask_all) fprintf(file, " NoMask"); + if (inst->exec_size != 8) + fprintf(file, " group%d", inst->group); + fprintf(file, "\n"); } -- 2.7.4
[Mesa-dev] [PATCH v2 030/103] i965/vec4: add helpers for conversions to/from doubles
Use these helpers to implement d2f and f2d. We will reuse these helpers when we implement things like d2i or i2d as well. --- src/mesa/drivers/dri/i965/brw_vec4.h | 5 +++ src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 54 +++--- 2 files changed, 39 insertions(+), 20 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 86e58f3..0111966 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -317,6 +317,11 @@ public: bool optimize_predicate(nir_alu_instr *instr, enum brw_predicate *predicate); + void emit_double_to_single(dst_reg dst, src_reg src, bool saturate, + brw_reg_type single_type); + void emit_single_to_double(dst_reg dst, src_reg src, bool saturate, + brw_reg_type single_type); + virtual void emit_nir_code(); virtual void nir_setup_uniforms(); virtual void nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 502a290..94d0161 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -1074,6 +1074,34 @@ emit_find_msb_using_lzd(const vec4_builder &bld, } void +vec4_visitor::emit_double_to_single(dst_reg dst, src_reg src, bool saturate, +brw_reg_type single_type) +{ + dst_reg temp = dst_reg(this, glsl_type::dvec4_type); + emit(MOV(temp, src)); + + dst_reg temp2 = dst_reg(this, glsl_type::dvec4_type); + temp2 = retype(temp2, single_type); + emit(VEC4_OPCODE_DOUBLE_TO_SINGLE, temp2, src_reg(temp)) + ->size_written = 2 * REG_SIZE; + + vec4_instruction *inst = emit(MOV(dst, src_reg(temp2))); + inst->saturate = saturate; +} + +void +vec4_visitor::emit_single_to_double(dst_reg dst, src_reg src, bool saturate, +brw_reg_type single_type) +{ + dst_reg tmp_dst = dst_reg(src_reg(this, glsl_type::dvec4_type)); + src_reg tmp_src = retype(src_reg(this, glsl_type::vec4_type), single_type); + emit(MOV(dst_reg(tmp_src), 
retype(src, single_type))); + emit(VEC4_OPCODE_SINGLE_TO_DOUBLE, tmp_dst, tmp_src); + vec4_instruction *inst = emit(MOV(dst, src_reg(tmp_dst))); + inst->saturate = saturate; +} + +void vec4_visitor::nir_emit_alu(nir_alu_instr *instr) { vec4_instruction *inst; @@ -1117,29 +1145,15 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) inst = emit(MOV(dst, op[0])); break; - case nir_op_d2f: { - dst_reg temp = dst_reg(this, glsl_type::dvec4_type); - emit(MOV(temp, op[0])); - - dst_reg temp2 = dst_reg(this, glsl_type::dvec4_type); - temp2 = retype(temp2, BRW_REGISTER_TYPE_F); - emit(VEC4_OPCODE_DOUBLE_TO_SINGLE, temp2, src_reg(temp)) - ->regs_written = 2 * REG_SIZE; - - vec4_instruction *inst = emit(MOV(dst, src_reg(temp2))); - inst->saturate = instr->dest.saturate; + case nir_op_d2f: + emit_double_to_single(dst, op[0], instr->dest.saturate, +BRW_REGISTER_TYPE_F); break; - } - case nir_op_f2d: { - dst_reg tmp_dst = dst_reg(src_reg(this, glsl_type::dvec4_type)); - src_reg tmp_src = src_reg(this, glsl_type::vec4_type); - emit(MOV(dst_reg(tmp_src), retype(op[0], BRW_REGISTER_TYPE_F))); - emit(VEC4_OPCODE_SINGLE_TO_DOUBLE, tmp_dst, tmp_src); - vec4_instruction *inst = emit(MOV(dst, src_reg(tmp_dst))); - inst->saturate = instr->dest.saturate; + case nir_op_f2d: + emit_single_to_double(dst, op[0], instr->dest.saturate, +BRW_REGISTER_TYPE_F); break; - } case nir_op_iadd: assert(nir_dest_bit_size(instr->dest.dest) < 64); -- 2.7.4
[Mesa-dev] [PATCH v2 064/103] i965/vec4: Add a shuffle_64bit_data helper
SIMD4x2 64-bit data is stored in register space like this: r0.0:DF x0 y0 z0 w0 r0.1:DF x1 y1 z1 w1 When we need to write data such as this to memory using 32-bit write messages, we need to shuffle it in this fashion: r0.0:DF x0 y0 x1 y1 r0.1:DF z0 w0 z1 w1 and emit two 32-bit write messages, one for r0.0 at base_offset and another one for r0.1 at base_offset+16. We also need to do the inverse operation when we read using 32-bit messages to produce valid SIMD4x2 64-bit data from the data read. We can achieve this by applying the exact same shuffling to the data read, although we need to apply different channel enables since the layout of the data is reversed. This helper implements the data shuffling logic and we will use it in various places where we read and write 64-bit data from/to memory. v2 (Curro): - Use the writemask helper and don't assert on the original writemask being XYZW. - Use the Vec4 IR builder to simplify the implementation. --- src/mesa/drivers/dri/i965/brw_vec4.h | 5 +++ src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 70 ++ 2 files changed, 75 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 0af55c5..6942918 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -330,6 +330,11 @@ public: src_reg setup_imm_df(double v); + vec4_instruction *shuffle_64bit_data(dst_reg dst, src_reg src, +bool for_write, +bblock_t *block = NULL, +vec4_instruction *ref = NULL); + virtual void emit_nir_code(); virtual void nir_setup_uniforms(); virtual void nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 0b8c808..04e95a7 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -2227,4 +2227,74 @@ vec4_visitor::nir_emit_undef(nir_ssa_undef_instr *instr) dst_reg(VGRF, 
alloc.allocate(DIV_ROUND_UP(instr->def.bit_size, 32))); } +/* SIMD4x2 64bit data is stored in register space like this: + * + * r0.0:DF x0 y0 z0 w0 + * r0.1:DF x1 y1 z1 w1 + * + * When we need to write data such as this to memory using 32-bit write + * messages we need to shuffle it in this fashion: + * + * r0.0:DF x0 y0 x1 y1 (to be written at base offset) + * r0.0:DF z0 w0 z1 w1 (to be written at base offset + 16) + * + * We need to do the inverse operation when we read using 32-bit messages, + * which we can do by applying the same exact shuffling on the 64-bit data + * read, only that because the data for each vertex is positioned differently + * we need to apply different channel enables. + * + * This function takes 64bit data and shuffles it as explained above. + * + * The @for_write parameter is used to specify if the shuffling is being done + * for proper SIMD4x2 64-bit data that needs to be shuffled prior to a 32-bit + * write message (for_write = true), or instead we are doing the inverse + * opperation and we have just read 64-bit data using a 32-bit messages that we + * need to shuffle to create valid SIMD4x2 64-bit data (for_write = false). + * + * If @block and @ref are non-NULL, then the shuffling is done after @ref, + * otherwise the instructions are emitted normally at the end. The function + * returns the last instruction inserted. + * + * Notice that @src and @dst cannot be the same register. + */ +vec4_instruction * +vec4_visitor::shuffle_64bit_data(dst_reg dst, src_reg src, bool for_write, + bblock_t *block, vec4_instruction *ref) +{ + assert(type_sz(src.type) == 8); + assert(type_sz(dst.type) == 8); + assert(!regions_overlap(dst, 2 * REG_SIZE, src, 2 * REG_SIZE)); + assert(!ref == !block); + + const vec4_builder bld = !ref ? 
vec4_builder(this).at_end() : + vec4_builder(this).at(block, ref->next); + + /* Resolve swizzle in src */ + vec4_instruction *inst; + if (src.swizzle != BRW_SWIZZLE_XYZW) { + dst_reg data = dst_reg(this, glsl_type::dvec4_type); + inst = bld.MOV(data, src); + src = src_reg(data); + } + + /* dst+0.XY = src+0.XY */ + inst = bld.group(4, 0).MOV(writemask(dst, WRITEMASK_XY), src); + + /* dst+0.ZW = src+1.XY */ + inst = bld.group(4, for_write ? 1 : 0). +MOV(writemask(dst, WRITEMASK_ZW), +swizzle(offset(src, 1), BRW_SWIZZLE_XYXY)); + + /* dst+1.XY = src+0.ZW */ + inst = bld.group(4, for_write ? 0 : 1). +MOV(writemask(offset(dst, 1), WRITEMASK_XY), +swizzle(src, BRW_SWIZZLE_ZWZW)); + + /* dst+1.ZW = src+1.ZW */ + inst = bld.group(4, 1). +MOV(writemask(offset(dst, 1), WRITEMASK_ZW), offset(src, 1)); + + return inst; +} + } -- 2.7.4
[Mesa-dev] [PATCH v2 055/103] i965/vec4: implement access to DF source components Z/W
The general idea is that with 32-bit swizzles we cannot address DF components Z/W directly, so instead we select the region that starts at the 16B offset into the register and use X/Y swizzles. The above, however, has the caveat that we can't do that without violating register region restrictions unless we do some sort of SIMD splitting. Alternatively, we can accomplish what we need without SIMD splitting by exploiting the gen7 hardware decompression bug for instructions with a vstride=0. For example, an instruction like this: mov(8) r2.x:DF r0.2<0>xyzw:DF activates the hardware bug and produces this region: Component: x0 y0 z0 w0 x1 y1 z1 w1 Register: r0.2 r0.3 r0.2 r0.3 r1.2 r1.3 r1.2 r1.3 where r0.2 and r0.3 are r0.z:DF for the first vertex of the SIMD4x2 execution and r1.2 and r1.3 are the same for the second vertex. Using this to our advantage, we can select r0.z:DF by doing r0.2<0,2,1>.xyxy and r0.w by doing r0.2<0,2,1>.zwzw without needing to split the instruction. Of course, this only works for gen7, but that is the only hardware platform where we implement align16/fp64 at the moment. v2: Adapted to the fact that we now do this after converting to hardware registers (Iago) --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 21 + 1 file changed, 21 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index b37dd59..c728e38 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -2264,7 +2264,28 @@ vec4_visitor::apply_logical_swizzle(struct brw_reg *hw_reg, */ assert(brw_is_single_value_swizzle(reg.swizzle)); + /* To gain access to Z/W components we need to select the second half +* of the register and then use a X/Y swizzle to select Z/W respectively.
+*/ unsigned swizzle = BRW_GET_SWZ(reg.swizzle, 0); + + if (swizzle >= 2) { + *hw_reg = suboffset(*hw_reg, 2); + swizzle -= 2; + } + + /* Any 64-bit source with an offset at 16B is intended to address the +* second half of a register and needs a vertical stride of 0 so we: +* +* 1. Don't violate register region restrictions. +* 2. Activate the gen7 instruction decompresion bug exploit when +*execsize > 4 +*/ + if (hw_reg->subnr % REG_SIZE == 16) { + assert(devinfo->gen == 7); + hw_reg->vstride = BRW_VERTICAL_STRIDE_0; + } + hw_reg->swizzle = BRW_SWIZZLE4(swizzle * 2, swizzle * 2 + 1, swizzle * 2, swizzle * 2 + 1); } -- 2.7.4
[Mesa-dev] [PATCH v2 094/103] i965/vec4/scalarize_df: do not scalarize swizzles that we can support natively
Certain swizzles like XYZW can be supported by translating only the first two 64-bit swizzle channels to 32-bit channels. This happens with swizzles such that the first two logical components, when translated to 32-bit channels and replicated across the second dvec2 row, select the same channels specified by the 3rd and 4th logical swizzle components. Notice that this opens up the possibility that some instructions are not scalarized and can end up with XY or ZW 32-bit writemasks. Make sure we always scalarize in such cases. --- src/mesa/drivers/dri/i965/brw_reg.h| 3 + src/mesa/drivers/dri/i965/brw_vec4.cpp | 133 ++--- src/mesa/drivers/dri/i965/brw_vec4.h | 1 + 3 files changed, 112 insertions(+), 25 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h index 1fa2595..39cc25a 100644 --- a/src/mesa/drivers/dri/i965/brw_reg.h +++ b/src/mesa/drivers/dri/i965/brw_reg.h @@ -87,6 +87,9 @@ struct gen_device_info; #define BRW_SWIZZLE_ZXYW BRW_SWIZZLE4(2,0,1,3) #define BRW_SWIZZLE_ZWZW BRW_SWIZZLE4(2,3,2,3) #define BRW_SWIZZLE_WZYX BRW_SWIZZLE4(3,2,1,0) +#define BRW_SWIZZLE_XXZZ BRW_SWIZZLE4(0,0,2,2) +#define BRW_SWIZZLE_YYWW BRW_SWIZZLE4(1,1,3,3) +#define BRW_SWIZZLE_YXWZ BRW_SWIZZLE4(1,0,3,2) #define BRW_SWZ_COMP_INPUT(comp) (BRW_SWIZZLE_XYZW >> ((comp)*2)) #define BRW_SWZ_COMP_OUTPUT(comp) (BRW_SWIZZLE_XYZW << ((comp)*2)) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 9b9bef1..438dce1 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -2259,6 +2259,52 @@ scalarize_predicate(brw_predicate predicate, unsigned writemask) } } +/* 64-bit sources use regions with a width of 2. These 2 elements in each row + * can be addressed using 32-bit swizzles (which is what the hardware supports) + * but it also means that the swizzle we apply on the first two components of a + * dvec4 is coupled with the swizzle we use for the last 2. 
In other words, + * only some specific swizzle combinations can be natively supported. + * + * FIXME: We can also exploit the vstride 0 decompression bug in gen7 to + *implement some more swizzles via simple translations. For + *example: as XYXY, as ZWZW (same for and by + *using subnr), XYXY as XYZW, YXYX as ZWXY (same for ZWZW and + *WZWZ using subnr). + * + * FIXME: we can go an step further and implement even more swizzle + *variations using only partial scalarization. + * + * For more details see: + * https://bugs.freedesktop.org/show_bug.cgi?id=92760#c82 + */ +bool +vec4_visitor::is_supported_64bit_region(src_reg src) +{ + assert(type_sz(src.type) == 8); + + /* Uniform regions have a vstride=0. Because we use 2-wide rows with +* 64-bit regions it means that we cannot access components Z/W, so +* return false for any such case. Interleaved attributes will also be +* mapped to GRF registers with a vstride of 0, so apply the same +* treatment. +*/ + if ((is_uniform(src) || +(stage_uses_interleaved_attributes(stage, prog_data->dispatch_mode) && + src.file == ATTR)) && + (brw_mask_for_swizzle(src.swizzle) & 12)) + return false; + + switch (src.swizzle) { + case BRW_SWIZZLE_XYZW: + case BRW_SWIZZLE_XXZZ: + case BRW_SWIZZLE_YYWW: + case BRW_SWIZZLE_YXWZ: + return true; + default: + return false; + } +} + bool vec4_visitor::scalarize_df() { @@ -2279,6 +2325,29 @@ vec4_visitor::scalarize_df() if (!is_double) continue; + /* Skip the lowering for specific regioning scenarios that we can + * support natively. + */ + bool skip_lowering = true; + + /* XY and ZW writemasks operate in 32-bit, which means that they don't + * have a native 64-bit representation and they should always be split. 
+ */ + if (inst->dst.writemask == WRITEMASK_XY || + inst->dst.writemask == WRITEMASK_ZW) { + skip_lowering = false; + } else { + for (unsigned i = 0; i < 3; i++) { +if (inst->src[i].file == BAD_FILE || type_sz(inst->src[i].type) < 8) + continue; +skip_lowering = skip_lowering && +is_supported_64bit_region(inst->src[i]); + } + } + + if (skip_lowering) + continue; + /* Generate scalar instructions for each enabled channel */ for (unsigned chan = 0; chan < 4; chan++) { unsigned chan_mask = 1 << chan; @@ -2384,35 +2453,49 @@ vec4_visitor::apply_logical_swizzle(struct brw_reg *hw_reg, return; } - /* Otherwise we should have scalarized the instruction, so take the single -* 64-bit logical swizzle channel and translate it to 32-bit -*/ - assert(brw_is_single_value_swizzle(reg.swizzle)); + /* Take the 64-bit logical swizzle c
[Mesa-dev] [PATCH v2 069/103] i965/vec4: prevent copy-propagation from values with a different type size
The meaning of the swizzles and writemasks involved differs between type sizes, so replacing the source would lead to different semantics. --- src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index 7b53aed..08da96d 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -324,6 +324,13 @@ try_copy_propagate(const struct gen_device_info *devinfo, value.file != ATTR) return false; + /* If the type of the copy value is different from the type of the +* instruction then the swizzles and writemasks involved don't have the same +* meaning and simply replacing the source would produce different semantics. +*/ + if (type_sz(value.type) != type_sz(inst->src[arg].type)) + return false; + if (devinfo->gen >= 8 && (value.negate || value.abs) && is_logic_op(inst->opcode)) { return false; -- 2.7.4
[Mesa-dev] [PATCH v2 092/103] i965/vec4: dump subnr for FIXED_GRF
This came in handy when debugging the payload setup for Tess Eval, since it prints the correct subnr for attributes that can be loaded in the second half of a register. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 56a46ad..33a8c52 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1561,7 +1561,7 @@ vec4_visitor::dump_instruction(backend_instruction *be_inst, FILE *file) fprintf(file, "vgrf%d", inst->src[i].nr); break; case FIXED_GRF: - fprintf(file, "g%d", inst->src[i].nr); + fprintf(file, "g%d.%d", inst->src[i].nr, inst->src[i].subnr); break; case ATTR: fprintf(file, "attr%d", inst->src[i].nr); -- 2.7.4
[Mesa-dev] [PATCH v2 099/103] i965/vec4: avoid spilling of registers that mix 32-bit and 64-bit access
When 64-bit registers are (un)spilled, we need to execute data shuffling code before writing to or after reading from memory. If 64-bit data is operated on via 32-bit instructions, (un)spills for the register produced by those 32-bit instructions will not do any data shuffling (because we only see a normal 32-bit instruction seemingly operating on 32-bit data). This means that subsequent reads from that register using DF access will unshuffle data read from memory that was never adequately shuffled when it was written. Fixing this would require identifying which 32-bit instructions write 64-bit data, emitting spill instructions only when the full 64-bit data has been written (by multiple 32-bit instructions writing to different offsets of the same register), and always emitting 64-bit unspills whenever 64-bit data is read, even when the instruction uses a 32-bit type to read it. --- .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 24 ++ 1 file changed, 24 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp index 7aff2d8..79951e2 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp @@ -374,9 +374,13 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill) { float loop_scale = 1.0; + unsigned *reg_type_size = (unsigned *) + ralloc_size(NULL, this->alloc.count * sizeof(unsigned)); + for (unsigned i = 0; i < this->alloc.count; i++) { spill_costs[i] = 0.0; no_spill[i] = alloc.sizes[i] != 1 && alloc.sizes[i] != 2; + reg_type_size[i] = 0; } /* Calculate costs for spilling nodes.
Call it a cost of 1 per @@ -406,6 +410,15 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill) if (type_sz(inst->src[i].type) == 8 && inst->exec_size != 8) no_spill[inst->src[i].nr] = true; } + +/* We can't spill registers that mix 32-bit and 64-bit access (that + * contain 64-bit data that is operated on via 32-bit instructions) + */ +unsigned type_size = type_sz(inst->src[i].type); +if (reg_type_size[inst->src[i].nr] == 0) + reg_type_size[inst->src[i].nr] = type_size; +else if (reg_type_size[inst->src[i].nr] != type_size) + no_spill[inst->src[i].nr] = true; } } @@ -422,6 +435,15 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill) */ if (type_sz(inst->dst.type) == 8 && inst->exec_size != 8) no_spill[inst->dst.nr] = true; + + /* We can't spill registers that mix 32-bit and 64-bit access (that + * contain 64-bit data that is operated on via 32-bit instructions) + */ + unsigned type_size = type_sz(inst->dst.type); + if (reg_type_size[inst->dst.nr] == 0) +reg_type_size[inst->dst.nr] = type_size; + else if (reg_type_size[inst->dst.nr] != type_size) +no_spill[inst->dst.nr] = true; } switch (inst->opcode) { @@ -448,6 +470,8 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill) break; } } + + ralloc_free(reg_type_size); } int -- 2.7.4
[Mesa-dev] [PATCH v2 065/103] i965/vec4: Fix UBO loads for 64-bit data
We need to emit two 32-bit load messages to load a full dvec4. If only 1 or 2 double components are needed, dead-code elimination will remove the second one. We also need to shuffle the result of the 32-bit messages to form valid 64-bit SIMD4x2 data. --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 46 +- 1 file changed, 32 insertions(+), 14 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 04e95a7..f234e65 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -829,31 +829,49 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) nir->info.num_ubos - 1); } - src_reg offset; + src_reg offset_reg; nir_const_value *const_offset = nir_src_as_const_value(instr->src[1]); if (const_offset) { - offset = brw_imm_ud(const_offset->u32[0] & ~15); + offset_reg = src_reg(this, glsl_type::uint_type); + emit(MOV(dst_reg(offset_reg), brw_imm_ud(const_offset->u32[0] & ~15))); } else { - offset = get_nir_src(instr->src[1], nir_type_uint32, 1); + offset_reg = get_nir_src(instr->src[1], nir_type_uint32, 1); } - src_reg packed_consts = src_reg(this, glsl_type::vec4_type); - packed_consts.type = dest.type; + src_reg packed_consts; + if (nir_dest_bit_size(instr->dest) == 32) { + packed_consts = src_reg(this, glsl_type::vec4_type); + emit_pull_constant_load_reg(dst_reg(packed_consts), + surf_index, + offset_reg, + NULL, NULL /* before_block/inst */); + } else { + src_reg temp = src_reg(this, glsl_type::dvec4_type); + src_reg temp_float = retype(temp, BRW_REGISTER_TYPE_F); + + emit_pull_constant_load_reg(dst_reg(temp_float), + surf_index, offset_reg, NULL, NULL); - emit_pull_constant_load_reg(dst_reg(packed_consts), - surf_index, - offset, - NULL, NULL /* before_block/inst */); + emit(ADD(dst_reg(offset_reg), offset_reg, brw_imm_ud(16u))); + emit_pull_constant_load_reg(dst_reg(offset(temp_float, 1)), + surf_index, offset_reg, NULL, NULL); + + packed_consts = 
src_reg(this, glsl_type::dvec4_type); + shuffle_64bit_data(dst_reg(packed_consts), temp, false); + } packed_consts.swizzle = brw_swizzle_for_size(instr->num_components); if (const_offset) { - packed_consts.swizzle += BRW_SWIZZLE4(const_offset->u32[0] % 16 / 4, - const_offset->u32[0] % 16 / 4, - const_offset->u32[0] % 16 / 4, - const_offset->u32[0] % 16 / 4); + unsigned type_size = type_sz(dest.type); + packed_consts.swizzle += +BRW_SWIZZLE4(const_offset->u32[0] % 16 / type_size, + const_offset->u32[0] % 16 / type_size, + const_offset->u32[0] % 16 / type_size, + const_offset->u32[0] % 16 / type_size); } - emit(MOV(dest, packed_consts)); + emit(MOV(dest, retype(packed_consts, dest.type))); + break; }
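The offset arithmetic behind this change can be checked in isolation. The sketch assumes a constant offset and models only the bookkeeping visible in the diff: the load address is aligned down to 16 bytes, 64-bit data needs a second 16-byte message at the next aligned offset (which dead-code elimination can drop when only one or two doubles are used), and the starting component is `offset % 16 / type_size`. `ubo_load_plan` and `plan_const_ubo_load()` are hypothetical names, and the 32-to-64-bit shuffle is not modelled.

```cpp
#include <cassert>

// Hypothetical description of how a constant-offset UBO load is issued.
struct ubo_load_plan {
   unsigned num_messages;    // 1 for 32-bit data, 2 for 64-bit data
   unsigned first_offset;    // byte offset of the first 16-byte message
   unsigned second_offset;   // byte offset of the second message (64-bit only)
   unsigned first_component; // swizzle start inside the loaded vector
};

ubo_load_plan plan_const_ubo_load(unsigned const_offset, unsigned type_size)
{
   assert(type_size == 4 || type_size == 8);
   ubo_load_plan p;
   p.first_offset = const_offset & ~15u;          // align down to a vec4
   p.num_messages = (type_size == 8) ? 2 : 1;     // a dvec4 spans 32 bytes
   p.second_offset = (p.num_messages == 2) ? p.first_offset + 16 : 0;
   p.first_component = const_offset % 16 / type_size;
   return p;
}
```

With 32-bit data at byte 20 the load starts at 16 and selects component 1; with 64-bit data at byte 40 the two messages start at 32 and 48 and the swizzle again starts at component 1.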
[Mesa-dev] [PATCH v2 091/103] i965/vec4/tes: consider register offsets during attribute setup
--- src/mesa/drivers/dri/i965/brw_vec4_tes.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp index c8fa2ca..a1aa672 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp @@ -84,8 +84,8 @@ vec4_tes_visitor::setup_payload() bool is_64bit = type_sz(inst->src[i].type) == 8; - struct brw_reg grf = -brw_vec4_grf(reg + inst->src[i].nr / 2, 4 * (inst->src[i].nr % 2)); + unsigned slot = inst->src[i].nr + inst->src[i].offset / 16; + struct brw_reg grf = brw_vec4_grf(reg + slot / 2, 4 * (slot % 2)); grf = stride(grf, 0, is_64bit ? 2 : 4, 1); grf.swizzle = inst->src[i].swizzle; grf.type = inst->src[i].type;
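The slot arithmetic in this fix is easy to check by hand. In the sketch below (a hypothetical helper, not Mesa code), `payload_reg` plays the role of `reg` in `setup_payload()`, `attr_nr` and `byte_offset` correspond to `inst->src[i].nr` and `inst->src[i].offset`, and the returned `subnr` is the element offset passed to `brw_vec4_grf()`, assuming two 16-byte vec4 slots per 32-byte GRF.

```cpp
#include <cassert>

// Where a vec4 attribute slot lives in the push payload (hypothetical type).
struct grf_location {
   unsigned nr;    // GRF number
   unsigned subnr; // element offset within the GRF: 0 or 4 (second half)
};

grf_location tes_attr_location(unsigned payload_reg, unsigned attr_nr,
                               unsigned byte_offset)
{
   // Fold the register offset into the slot index: each vec4 slot is 16 bytes.
   unsigned slot = attr_nr + byte_offset / 16;
   // Two slots share one 32-byte GRF.
   return grf_location{payload_reg + slot / 2, 4 * (slot % 2)};
}
```

Without the offset term, a source with `nr == 1` and a 16-byte offset would map to the second half of the first GRF instead of the start of the next one.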
[Mesa-dev] [PATCH v2 089/103] i965/vec4/tes: fix input loading for 64bit data types
--- src/mesa/drivers/dri/i965/brw_vec4_tes.cpp | 72 +++--- 1 file changed, 55 insertions(+), 17 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp index 226dcb4..f2a4507 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp @@ -177,10 +177,12 @@ vec4_tes_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) case nir_intrinsic_load_input: case nir_intrinsic_load_per_vertex_input: { src_reg indirect_offset = get_indirect_offset(instr); - dst_reg dst = get_nir_dest(instr->dest, BRW_REGISTER_TYPE_D); unsigned imm_offset = instr->const_index[0]; - unsigned first_component = nir_intrinsic_component(instr); src_reg header = input_read_header; + bool is_64bit = nir_dest_bit_size(instr->dest) == 64; + unsigned first_component = nir_intrinsic_component(instr); + if (is_64bit) + first_component /= 2; if (indirect_offset.file != BAD_FILE) { header = src_reg(this, glsl_type::uvec4_type); @@ -192,31 +194,67 @@ vec4_tes_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) */ const unsigned max_push_slots = 24; if (imm_offset < max_push_slots) { -src_reg src = src_reg(ATTR, imm_offset, glsl_type::ivec4_type); +const glsl_type *src_glsl_type = + is_64bit ? glsl_type::dvec4_type : glsl_type::ivec4_type; +src_reg src = src_reg(ATTR, imm_offset, src_glsl_type); src.swizzle = BRW_SWZ_COMP_INPUT(first_component); -emit(MOV(dst, src)); +const brw_reg_type dst_reg_type = + is_64bit ? BRW_REGISTER_TYPE_DF : BRW_REGISTER_TYPE_D; +emit(MOV(get_nir_dest(instr->dest, dst_reg_type), src)); + prog_data->urb_read_length = MAX2(prog_data->urb_read_length, -DIV_ROUND_UP(imm_offset + 1, 2)); +DIV_ROUND_UP(imm_offset + (is_64bit ? 
2 : 1), 2)); break; } } - dst_reg temp(this, glsl_type::ivec4_type); - vec4_instruction *read = - emit(VEC4_OPCODE_URB_READ, temp, src_reg(header)); - read->offset = imm_offset; - read->urb_write_flags = BRW_URB_WRITE_PER_SLOT_OFFSET; + if (!is_64bit) { + dst_reg temp(this, glsl_type::ivec4_type); + vec4_instruction *read = +emit(VEC4_OPCODE_URB_READ, temp, src_reg(header)); + read->offset = imm_offset; + read->urb_write_flags = BRW_URB_WRITE_PER_SLOT_OFFSET; - src_reg src = src_reg(temp); - src.swizzle = BRW_SWZ_COMP_INPUT(first_component); + src_reg src = src_reg(temp); + src.swizzle = BRW_SWZ_COMP_INPUT(first_component); - /* Copy to target. We might end up with some funky writemasks landing - * in here, but we really don't want them in the above pseudo-ops. - */ - dst.writemask = brw_writemask_for_size(instr->num_components); - emit(MOV(dst, src)); + /* Copy to target. We might end up with some funky writemasks landing + * in here, but we really don't want them in the above pseudo-ops. 
+ */ + dst_reg dst = get_nir_dest(instr->dest, BRW_REGISTER_TYPE_D); + dst.writemask = brw_writemask_for_size(instr->num_components); + emit(MOV(dst, src)); + } else { + /* For 64-bit we need to load twice as many 32-bit components, and for + * dvec3/4 we need to emit 2 URB Read messages + */ + dst_reg temp(this, glsl_type::dvec4_type); + dst_reg temp_d = retype(temp, BRW_REGISTER_TYPE_D); + + vec4_instruction *read = +emit(VEC4_OPCODE_URB_READ, temp_d, src_reg(header)); + read->offset = imm_offset; + read->urb_write_flags = BRW_URB_WRITE_PER_SLOT_OFFSET; + + if (instr->num_components > 2) { +read = + emit(VEC4_OPCODE_URB_READ, offset(temp_d, 1), src_reg(header)); +read->offset = imm_offset + 1; +read->urb_write_flags = BRW_URB_WRITE_PER_SLOT_OFFSET; + } + + src_reg temp_as_src = src_reg(temp); + temp_as_src.swizzle = BRW_SWZ_COMP_INPUT(first_component); + + dst_reg shuffled(this, glsl_type::dvec4_type); + shuffle_64bit_data(shuffled, temp_as_src, false); + + dst_reg dst = get_nir_dest(instr->dest, BRW_REGISTER_TYPE_DF); + dst.writemask = brw_writemask_for_size(instr->num_components); + emit(MOV(dst, src_reg(shuffled))); + } break; } default:
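The counting rules in this patch can be summarised in a few lines. The sketch is a hypothetical model (`plan_tes_input()` is not a Mesa function) that only reproduces the arithmetic in the diff: 64-bit component indices are halved, dvec3/dvec4 inputs need a second URB read on the pull path, and the push path raises `urb_read_length` to `DIV_ROUND_UP(imm_offset + (is_64bit ? 2 : 1), 2)`.

```cpp
#include <cassert>

// Hypothetical summary of how a TES input load is emitted.
struct tes_input_plan {
   unsigned first_component;  // swizzle start, in units of the data type
   unsigned urb_reads;        // URB read messages on the pull path
   unsigned push_read_length; // minimum urb_read_length on the push path
};

tes_input_plan plan_tes_input(unsigned imm_offset, unsigned component,
                              unsigned num_components, bool is_64bit)
{
   tes_input_plan p;
   // NIR counts components in 32-bit units; a double covers two of them.
   p.first_component = is_64bit ? component / 2 : component;
   // dvec3/dvec4 span two vec4 slots, so the pull path issues a second read.
   p.urb_reads = (is_64bit && num_components > 2) ? 2 : 1;
   // DIV_ROUND_UP(imm_offset + slots, 2): the read length counts slot pairs.
   unsigned slots = is_64bit ? 2 : 1;
   p.push_read_length = (imm_offset + slots + 1) / 2;
   return p;
}
```

For a dvec4 input at slot 3 starting at 32-bit component 2, this gives a first 64-bit component of 1, two URB reads, and a push read length of at least 3.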
[Mesa-dev] [PATCH v2 070/103] i965/vec4: Prevent copy propagation from violating pre-gen8 restrictions
On gen < 8, instructions that write more than one register need to read more than one register too. Make sure we don't break that restriction by copy-propagating from a uniform. --- src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index 08da96d..116287e 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -324,6 +324,13 @@ try_copy_propagate(const struct gen_device_info *devinfo, value.file != ATTR) return false; + /* In gen < 8 instructions that write 2 registers also need to read 2 +* registers. Make sure we don't break that restriction by copy +* propagating from a uniform. +*/ + if (devinfo->gen < 8 && inst->size_written > REG_SIZE && is_uniform(value)) + return false; + /* If the type of the copy value is different from the type of the * instruction then the swizzles and writemasks involved don't have the same * meaning and simply replacing the source would produce different semantics.
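The new guard is a single predicate over three facts about the candidate propagation. A minimal sketch, assuming `REG_SIZE` is 32 bytes as in the i965 backend and reducing `is_uniform(value)` to a boolean flag; `copy_prop_allowed_pre_gen8()` is a hypothetical name.

```cpp
#include <cassert>

const unsigned kRegSize = 32; // bytes per GRF, REG_SIZE in the i965 backend

// Hypothetical predicate mirroring the new early return: on gen < 8, an
// instruction writing more than one GRF must not get a uniform source.
bool copy_prop_allowed_pre_gen8(unsigned gen, unsigned size_written,
                                bool value_is_uniform)
{
   if (gen < 8 && size_written > kRegSize && value_is_uniform)
      return false;
   return true;
}
```

Only the combination of an older generation, a multi-register write, and a uniform value is rejected; relaxing any one of the three lets propagation proceed.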
[Mesa-dev] [PATCH v2 071/103] i965/vec4: don't propagate single-precision uniforms into 4-wide instructions
Otherwise, we end up producing code that violates the register region restriction which says that when execsize == width and hstride != 0, the vstride can't be 0. --- src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index 116287e..4f7b844 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -331,6 +331,17 @@ try_copy_propagate(const struct gen_device_info *devinfo, if (devinfo->gen < 8 && inst->size_written > REG_SIZE && is_uniform(value)) return false; + /* There is a regioning restriction such that if execsize == width +* and hstride != 0 then the vstride can't be 0. When we split instructions +* that take a single-precision source (like F->DF conversions) we end up +* with a 4-wide source on an instruction with an execution size of 4. +* If we then copy-propagate the source from a uniform we also end up with a +* vstride of 0 and we violate the restriction. +*/ + if (inst->exec_size == 4 && value.file == UNIFORM && + type_sz(value.type) == 4) + return false; + /* If the type of the copy value is different from the type of the * instruction then the swizzles and writemasks involved don't have the same * meaning and simply replacing the source would produce different semantics.
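The region rule quoted in the commit message can be written as a check on a `<vstride; width, hstride>` triple. The sketch below is illustrative only: `region` and `violates_exec_width_rule()` are hypothetical, it ignores every other regioning restriction, and it assumes a single-precision uniform is read with a `<0;4,1>` region.

```cpp
#include <cassert>

// A register region <vstride; width, hstride> as written in Gen assembly.
struct region {
   unsigned vstride;
   unsigned width;
   unsigned hstride;
};

// Checks only the rule quoted in the patch: when the execution size equals
// the region width and hstride != 0, vstride must not be 0.
bool violates_exec_width_rule(unsigned exec_size, const region &r)
{
   return exec_size == r.width && r.hstride != 0 && r.vstride == 0;
}
```

A `<0;4,1>` uniform is fine on an 8-wide instruction but trips the rule once the instruction has been split to an execution size of 4, which is exactly the case the new guard refuses.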
[Mesa-dev] [PATCH v2 066/103] i965/vec4: Fix SSBO loads for 64-bit data
Same requirements as for UBO loads. --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 31 +- 1 file changed, 26 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index f234e65..001a62f 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -645,7 +645,8 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) src_reg offset_reg; nir_const_value *const_offset = nir_src_as_const_value(instr->src[1]); if (const_offset) { - offset_reg = brw_imm_ud(const_offset->u32[0]); + offset_reg = src_reg(this, glsl_type::uint_type); + emit(MOV(dst_reg(offset_reg), brw_imm_ud(const_offset->u32[0]))); } else { offset_reg = get_nir_src(instr->src[1], 1); } @@ -654,14 +655,34 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) const vec4_builder bld = vec4_builder(this).at_end() .annotate(current_annotation, base_ir); - src_reg read_result = emit_untyped_read(bld, surf_index, offset_reg, - 1 /* dims */, 4 /* size*/, - BRW_PREDICATE_NONE); + src_reg read_result; dst_reg dest = get_nir_dest(instr->dest); + if (type_sz(dest.type) < 8) { + read_result = emit_untyped_read(bld, surf_index, offset_reg, + 1 /* dims */, 4 /* size*/, + BRW_PREDICATE_NONE); + } else { + src_reg shuffled = src_reg(this, glsl_type::dvec4_type); + + src_reg temp; + temp = emit_untyped_read(bld, surf_index, offset_reg, + 1 /* dims */, 4 /* size*/, + BRW_PREDICATE_NONE); + emit(MOV(dst_reg(retype(shuffled, temp.type)), temp)); + + emit(ADD(dst_reg(offset_reg), offset_reg, brw_imm_ud(16))); + temp = emit_untyped_read(bld, surf_index, offset_reg, + 1 /* dims */, 4 /* size*/, + BRW_PREDICATE_NONE); + emit(MOV(dst_reg(retype(offset(shuffled, 1), temp.type)), temp)); + + read_result = src_reg(this, glsl_type::dvec4_type); + shuffle_64bit_data(dst_reg(read_result), shuffled, false); + } + read_result.type = dest.type; read_result.swizzle = 
brw_swizzle_for_size(instr->num_components); emit(MOV(dest, read_result)); - break; }
[Mesa-dev] [PATCH v2 001/103] i965/nir: double/dvec2 uniforms only need to be padded to a single vec4 slot
From: Samuel Iglesias Gonsálvez max_vector_size is used in the vec4 backend to pad out the uniform components to match a size that is a multiple of a vec4. Double and dvec2 uniforms only require a single vec4 slot, not two. Signed-off-by: Samuel Iglesias Gonsálvez Signed-off-by: Iago Toral Quiroga Reviewed-by: Timothy Arceri --- src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp b/src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp index b752ad5..e3ce5f9 100644 --- a/src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp +++ b/src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp @@ -107,7 +107,8 @@ brw_nir_setup_glsl_uniform(gl_shader_stage stage, nir_variable *var, unsigned max_vector_size = 4; if (storage->type->base_type == GLSL_TYPE_DOUBLE) { vector_size *= 2; -max_vector_size *= 2; +if (vector_size > 4) + max_vector_size = 8; } for (unsigned s = 0; s < vector_count; s++) {
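The padding rule can be tabulated directly. A sketch under the assumption that sizes are counted in 32-bit components, as in `brw_nir_setup_glsl_uniform()`; `uniform_pad()` is a hypothetical helper that reproduces only the `vector_size`/`max_vector_size` computation from the diff.

```cpp
#include <cassert>

struct uniform_padding {
   unsigned vector_size;     // 32-bit components the uniform occupies
   unsigned max_vector_size; // 32-bit components it is padded to
};

uniform_padding uniform_pad(unsigned components, bool is_double)
{
   uniform_padding p{components, 4};
   if (is_double) {
      p.vector_size *= 2;          // each double takes two 32-bit components
      if (p.vector_size > 4)
         p.max_vector_size = 8;    // only dvec3/dvec4 need a second vec4 slot
   }
   return p;
}
```

Under the old rule (`max_vector_size *= 2` for every double type), a lone `double` or a `dvec2` would also have been padded to 8 components, i.e. two vec4 slots.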
[Mesa-dev] [PATCH v2 085/103] i965/vec4: fix store output for 64-bit types
We need to shuffle the data before it is written to the URB. Also, dvec3/4 need two vec4 slots. --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 29 ++--- 1 file changed, 26 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 60a8425..dfe2740 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -437,16 +437,39 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) int varying = instr->const_index[0] + const_offset->u32[0]; - src = get_nir_src(instr->src[0], BRW_REGISTER_TYPE_F, -instr->num_components); + bool is_64bit = nir_src_bit_size(instr->src[0]) == 64; + if (is_64bit) { + src_reg data; + src = get_nir_src(instr->src[0], BRW_REGISTER_TYPE_DF, + instr->num_components); + data = src_reg(this, glsl_type::dvec4_type); + shuffle_64bit_data(dst_reg(data), src, true); + src = retype(data, BRW_REGISTER_TYPE_F); + } else { + src = get_nir_src(instr->src[0], BRW_REGISTER_TYPE_F, + instr->num_components); + } if (varying >= VARYING_SLOT_VAR0) { unsigned c = nir_intrinsic_component(instr); unsigned v = varying - VARYING_SLOT_VAR0; + + unsigned num_components = instr->num_components; + if (is_64bit) +num_components *= 2; + output_generic_reg[v][c] = dst_reg(src); - output_generic_num_components[v][c] = instr->num_components; + output_generic_num_components[v][c] = MIN2(4, num_components); + + if (is_64bit && num_components > 4) { +assert(num_components <= 8); +output_generic_reg[v + 1][c] = offset(dst_reg(src), 1); +output_generic_num_components[v + 1][c] = num_components - 4; + } } else { output_reg[varying] = dst_reg(src); + if (is_64bit && instr->num_components > 2) +output_reg[varying + 1] = offset(dst_reg(src), 1); } break; }
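How a 64-bit generic output is divided between two varying slots follows from the `MIN2(4, num_components)` and `num_components - 4` expressions in the diff. The sketch below reproduces just that split; `split_generic_output()` is a hypothetical name and the data shuffle is not modelled.

```cpp
#include <algorithm>
#include <cassert>

struct output_split {
   unsigned first;  // 32-bit components stored in varying slot v
   unsigned second; // 32-bit components stored in slot v + 1 (0 if unused)
};

output_split split_generic_output(unsigned num_components, bool is_64bit)
{
   // Count in 32-bit components: each double needs two.
   unsigned n = is_64bit ? num_components * 2 : num_components;
   assert(n <= 8);
   return output_split{std::min(4u, n), n > 4 ? n - 4 : 0};
}
```

A dvec3 therefore fills slot `v` with 4 components and puts the remaining 2 in slot `v + 1`, while a dvec2 still fits in a single slot.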
[Mesa-dev] [PATCH v2 060/103] i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands
We make scalar sources in 3src instructions use subnr instead of swizzles because they don't really use swizzles. With doubles, it is more complicated because we use vstride=0 in more scenarios, including ones where it does not produce a scalar region. Also, RepCtrl=1 is not allowed with 64-bit operands, so we should avoid this. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 45d49e9..190581e 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1961,9 +1961,12 @@ vec4_visitor::convert_to_hw_regs() if (inst->is_3src(devinfo)) { /* 3-src instructions with scalar sources support arbitrary subnr, * but don't actually use swizzles. Convert swizzle into subnr. + * Skip this for double-precision instructions: RepCtrl=1 is not + * allowed for them and they need special handling. */ for (int i = 0; i < 3; i++) { -if (inst->src[i].vstride == BRW_VERTICAL_STRIDE_0) { +if (inst->src[i].vstride == BRW_VERTICAL_STRIDE_0 && +type_sz(inst->src[i].type) < 8) { assert(brw_is_single_value_swizzle(inst->src[i].swizzle)); inst->src[i].subnr += 4 * BRW_GET_SWZ(inst->src[i].swizzle, 0); }
[Mesa-dev] [PATCH v2 090/103] i965/vec4/tes: fix setup_payload() for 64bit data types
Use a width of 2 with 64-bit attributes. Also, if we have a dvec3/4 attribute that gets split across two registers such that components XY are stored in the second half of a register and components ZW are stored in the first half of the next, we need to fix regioning for any instruction that reads components Z/W of the attribute. Notice this also means that we can't support sources that read cross-dvec2 swizzles (like XZ for example). v2: don't assert that we have a single channel swizzle in the case that we have to fix up Z/W access on the first half of the next register. We can handle any swizzle that does not cross dvec2 boundaries, which the double scalarization pass should have prevented anyway. --- src/mesa/drivers/dri/i965/brw_vec4_tes.cpp | 21 - 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp index f2a4507..c8fa2ca 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp @@ -82,14 +82,33 @@ vec4_tes_visitor::setup_payload() if (inst->src[i].file != ATTR) continue; + bool is_64bit = type_sz(inst->src[i].type) == 8; + struct brw_reg grf = brw_vec4_grf(reg + inst->src[i].nr / 2, 4 * (inst->src[i].nr % 2)); - grf = stride(grf, 0, 4, 1); + grf = stride(grf, 0, is_64bit ? 2 : 4, 1); grf.swizzle = inst->src[i].swizzle; grf.type = inst->src[i].type; grf.abs = inst->src[i].abs; grf.negate = inst->src[i].negate; + /* For 64-bit attributes we can end up with components XY in the + * second half of a register and components ZW in the first half + * of the next. Fix it up here. + */ + if (is_64bit && grf.subnr > 0) { +/* We can't do swizzles that mix XY and ZW channels in this case. + * Such cases should have been handled by the scalarization pass. 
+ */ +assert((brw_mask_for_swizzle(grf.swizzle) & 0x3) ^ + (brw_mask_for_swizzle(grf.swizzle) & 0xc)); +if (brw_mask_for_swizzle(grf.swizzle) & 0xc) { + grf.subnr = 0; + grf.nr++; + grf.swizzle -= BRW_SWIZZLE_; +} + } + inst->src[i] = grf; } }
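The swizzle handling can be sketched with swizzles packed as four 2-bit channel selectors (X=0 through W=3). The helpers below are hypothetical: `channels_read()` computes the set of source channels a swizzle references, which is what the diff appears to use `brw_mask_for_swizzle()` for, and `rebase_zw_swizzle()` maps Z to X and W to Y when the access moves to the next register. One detail is worth noting: because `m & 0x3` and `m & 0xc` share no bits, their XOR equals `m` itself, so the assertion as posted holds for any swizzle that reads at least one channel. The sketch spells out the intended "not both halves" condition instead.

```cpp
#include <cassert>

// Pack a swizzle: field i (2 bits) selects the source channel for channel i.
unsigned make_swz(unsigned x, unsigned y, unsigned z, unsigned w)
{
   return x | (y << 2) | (z << 4) | (w << 6);
}

unsigned swz_chan(unsigned swz, unsigned i) { return (swz >> (2 * i)) & 3; }

// Set of source channels referenced by a swizzle (bit 0 = X, bit 3 = W).
unsigned channels_read(unsigned swz)
{
   unsigned mask = 0;
   for (unsigned i = 0; i < 4; i++)
      mask |= 1u << swz_chan(swz, i);
   return mask;
}

// True if the swizzle reads from both the XY half and the ZW half.
bool crosses_dvec2(unsigned swz)
{
   unsigned m = channels_read(swz);
   return (m & 0x3) != 0 && (m & 0xc) != 0;
}

// For a ZW-only swizzle, rebase it onto the next register: Z->X, W->Y.
unsigned rebase_zw_swizzle(unsigned swz)
{
   assert(!crosses_dvec2(swz) && (channels_read(swz) & 0xc) != 0);
   unsigned out = 0;
   for (unsigned i = 0; i < 4; i++)
      out |= (swz_chan(swz, i) - 2) << (2 * i);
   return out;
}
```

An XZXZ swizzle is reported as crossing the dvec2 boundary, while ZWZW can be rebased to XYXY on the following register.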
[Mesa-dev] [PATCH v2 082/103] i965/vec4: make emit_pull_constant_load support 64-bit loads
This way callers don't need to know about 64-bit particularities and we reuse some code. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 22 ++- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 81 ++ 2 files changed, 50 insertions(+), 53 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index b0bc2d5..e732bf4 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -884,24 +884,12 @@ vec4_visitor::move_push_constants_to_pull_constants() int uniform = inst->src[i].nr; - dst_reg temp; - if (type_sz(inst->src[i].type) != 8) { -temp = dst_reg(this, glsl_type::vec4_type); -emit_pull_constant_load(block, inst, temp, inst->src[i], -pull_constant_loc[uniform], src_reg()); - } else { -dst_reg shuffled = dst_reg(this, glsl_type::dvec4_type); -dst_reg shuffled_float = retype(shuffled, BRW_REGISTER_TYPE_F); - -emit_pull_constant_load(block, inst, shuffled_float, inst->src[i], -pull_constant_loc[uniform], src_reg()); -emit_pull_constant_load(block, inst, offset(shuffled_float, 1), -offset(inst->src[i], 1), -pull_constant_loc[uniform], src_reg()); + const glsl_type *temp_type = type_sz(inst->src[i].type) == 8 ? 
+glsl_type::dvec4_type : glsl_type::vec4_type; + dst_reg temp = dst_reg(this, temp_type); -temp = dst_reg(this, glsl_type::dvec4_type); -shuffle_64bit_data(temp, src_reg(shuffled), false, block, inst); - } + emit_pull_constant_load(block, inst, temp, inst->src[i], + pull_constant_loc[uniform], src_reg()); inst->src[i].file = temp.file; inst->src[i].nr = temp.nr; diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index f12a114..0177f68 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -1718,33 +1718,57 @@ vec4_visitor::move_grf_array_access_to_scratch() */ void vec4_visitor::emit_pull_constant_load(bblock_t *block, vec4_instruction *inst, - dst_reg temp, src_reg orig_src, + dst_reg temp, src_reg orig_src, int base_offset, src_reg indirect) { assert(orig_src.offset % 16 == 0); - int reg_offset = base_offset + orig_src.offset / 16; const unsigned index = prog_data->base.binding_table.pull_constants_start; - src_reg offset; - if (indirect.file != BAD_FILE) { - offset = src_reg(this, glsl_type::uint_type); - - emit_before(block, inst, ADD(dst_reg(offset), indirect, - brw_imm_ud(reg_offset * 16))); - } else if (devinfo->gen >= 8) { - /* Store the offset in a GRF so we can send-from-GRF. */ - offset = src_reg(this, glsl_type::uint_type); - emit_before(block, inst, MOV(dst_reg(offset), brw_imm_ud(reg_offset * 16))); - } else { - offset = brw_imm_d(reg_offset * 16); + /* For 64bit loads we need to emit two 32-bit load messages and we also +* need to shuffle the 32-bit data result into proper 64-bit data. To do +* that we emit the 32-bit loads into a temporary and we shuffle the result +* into the original destination. 
+*/ + dst_reg orig_temp = temp; + bool is_64bit = type_sz(orig_src.type) == 8; + if (is_64bit) { + assert(type_sz(temp.type) == 8); + dst_reg temp_df = dst_reg(this, glsl_type::dvec4_type); + temp = retype(temp_df, BRW_REGISTER_TYPE_F); } - emit_pull_constant_load_reg(temp, - brw_imm_ud(index), - offset, - block, inst); + src_reg src = orig_src; + for (int i = 0; i < (is_64bit ? 2 : 1); i++) { + int reg_offset = base_offset + src.offset / 16; + + src_reg byte_offset; + if (indirect.file != BAD_FILE) { + byte_offset = src_reg(this, glsl_type::uint_type); + emit_before(block, inst, ADD(dst_reg(byte_offset), indirect, + brw_imm_ud(reg_offset * 16))); + } else if (devinfo->gen >= 8) { + /* Store the offset in a GRF so we can send-from-GRF. */ + byte_offset = src_reg(this, glsl_type::uint_type); + emit_before(block, inst, MOV(dst_reg(byte_offset), + brw_imm_ud(reg_offset * 16))); + } else { + byte_offset = brw_imm_d(reg_offset * 16); + } + + emit_pull_constant_load_reg(offset(temp, i), + brw_imm_ud(index), + byte_offset, +
[Mesa-dev] [PATCH v2 086/103] i965/vec4/gs: fix input loading for 64bit data
From: Samuel Iglesias Gonsálvez v2 (Iago): - Adapt 64-bit path to component packing changes. Signed-off-by: Samuel Iglesias Gonsálvez Signed-off-by: Iago Toral Quiroga --- src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp | 51 ++- 1 file changed, 34 insertions(+), 17 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp index 16d2410..ed8c03b 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp @@ -64,23 +64,40 @@ vec4_gs_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) * be constant. We should handle indirects someday. */ nir_const_value *vertex = nir_src_as_const_value(instr->src[0]); - nir_const_value *offset = nir_src_as_const_value(instr->src[1]); - - /* Make up a type...we have no way of knowing... */ - const glsl_type *const type = glsl_type::ivec(instr->num_components); - - src = src_reg(ATTR, BRW_VARYING_SLOT_COUNT * vertex->u32[0] + - instr->const_index[0] + offset->u32[0], -type); - src.swizzle = BRW_SWZ_COMP_INPUT(nir_intrinsic_component(instr)); - - /* gl_PointSize is passed in the .w component of the VUE header */ - if (instr->const_index[0] == VARYING_SLOT_PSIZ) - src.swizzle = BRW_SWIZZLE_; - - dest = get_nir_dest(instr->dest, src.type); - dest.writemask = brw_writemask_for_size(instr->num_components); - emit(MOV(dest, src)); + nir_const_value *offset_reg = nir_src_as_const_value(instr->src[1]); + + if (nir_dest_bit_size(instr->dest) == 64) { + src = src_reg(ATTR, BRW_VARYING_SLOT_COUNT * vertex->u32[0] + + instr->const_index[0] + offset_reg->u32[0], + glsl_type::dvec4_type); + + dst_reg tmp = dst_reg(this, glsl_type::dvec4_type); + shuffle_64bit_data(tmp, src, false); + + src = src_reg(tmp); + src.swizzle = BRW_SWZ_COMP_INPUT(nir_intrinsic_component(instr) / 2); + + /* Write to dst reg taking into account original writemask */ + dest = get_nir_dest(instr->dest, BRW_REGISTER_TYPE_DF); + dest.writemask = 
brw_writemask_for_size(instr->num_components); + emit(MOV(dest, src)); + } else { + /* Make up a type...we have no way of knowing... */ + const glsl_type *const type = glsl_type::ivec(instr->num_components); + + src = src_reg(ATTR, BRW_VARYING_SLOT_COUNT * vertex->u32[0] + + instr->const_index[0] + offset_reg->u32[0], + type); + src.swizzle = BRW_SWZ_COMP_INPUT(nir_intrinsic_component(instr)); + + /* gl_PointSize is passed in the .w component of the VUE header */ + if (instr->const_index[0] == VARYING_SLOT_PSIZ) +src.swizzle = BRW_SWIZZLE_; + + dest = get_nir_dest(instr->dest, src.type); + dest.writemask = brw_writemask_for_size(instr->num_components); + emit(MOV(dest, src)); + } break; }
[Mesa-dev] [PATCH v2 057/103] i965/vec4: teach register coalescing about 64-bit
Specifically, at least for now, we don't want to deal with the fact that channel sizes for fp64 instructions are twice as large, so prevent coalescing from instructions with a different type size. Also, if we are coalescing a register from another MOV, we should check that we are reading the same amount of data written by that MOV; otherwise it might not be safe to eliminate it. This can happen, for example, when we have split fp64 MOVs with an exec size of 4 that only write one register each, followed by a MOV with an exec size of 8 that reads both. We want to prevent the pass from thinking that it can coalesce from the first split MOV alone. Ideally, the pass would see that it can coalesce from both split MOVs instead, but for now we keep it simple. Finally, the pass doesn't support coalescing of multiple registers, but normal SIMD4x2 double-precision instructions naturally write two registers (one per vertex), and there is no reason why we should not allow coalescing in this case. Change the restriction to bail if we see instructions that write more than 8 channels, where the channels can be 32-bit or 64-bit. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index c728e38..e5391b9 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1191,6 +1191,19 @@ vec4_visitor::opt_register_coalesce() scan_inst->dst.type == scan_inst->src[0].type)) break; +/* Only allow coalescing between registers of the same type size. + * Otherwise we would need to make the pass aware of the fact that + * channel sizes are different for single and double precision.
+ */ +if (type_sz(inst->src[0].type) != type_sz(scan_inst->src[0].type)) + break; + +/* Check that scan_inst writes at least the same amount of data + * that we read in the instruction + */ +if (scan_inst->size_written >= inst->size_read(0)) + break; + /* If we can't handle the swizzle, bail. */ if (!scan_inst->can_reswizzle(devinfo, inst->dst.writemask, inst->src[0].swizzle, @@ -1198,10 +1211,12 @@ vec4_visitor::opt_register_coalesce() break; } -/* This only handles coalescing of a single register starting at - * the source offset of the copy instruction. +/* This only handles coalescing writes of 8 channels (1 register + * for single-precision and 2 registers for double-precision) + * starting at the source offset of the copy instruction. */ -if (scan_inst->size_written > REG_SIZE || +if (DIV_ROUND_UP(scan_inst->size_written, + type_sz(scan_inst->dst.type)) > 8 || scan_inst->dst.offset != inst->src[0].offset) break;
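The relaxed width check counts channels instead of bytes. The sketch below (hypothetical helper names, not Mesa code) reproduces the new `DIV_ROUND_UP(size_written, type_sz(type)) > 8` bail-out, so a one-register float write and a two-register SIMD4x2 double write are both accepted, while a two-register float write is still rejected.

```cpp
#include <cassert>

// Channels written, rounding a partial channel up (DIV_ROUND_UP).
unsigned channels_written(unsigned size_written, unsigned type_size)
{
   return (size_written + type_size - 1) / type_size;
}

// The relaxed limit: at most 8 channels, whatever the channel width.
bool coalesce_width_ok(unsigned size_written, unsigned type_size)
{
   return channels_written(size_written, type_size) <= 8;
}
```

With 32-byte registers, 32 bytes of floats and 64 bytes of doubles are both 8 channels, whereas the previous `size_written > REG_SIZE` test would have rejected the double-precision case.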
[Mesa-dev] [PATCH v2 020/103] i965/vec4: don't copy propagate vector opcodes that operate in align1 mode
ALIGN1 mode ignores swizzles on the input vectors, so we don't want the copy propagation pass to mess with them. --- .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 24 ++ 1 file changed, 24 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index 545f4c7..d0045a7 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -283,6 +283,22 @@ try_constant_propagate(const struct gen_device_info *devinfo, } static bool +is_align1_opcode(unsigned opcode) +{ + switch (opcode) { + case VEC4_OPCODE_DOUBLE_TO_FLOAT: + case VEC4_OPCODE_FLOAT_TO_DOUBLE: + case VEC4_OPCODE_PICK_LOW_32BIT: + case VEC4_OPCODE_PICK_HIGH_32BIT: + case VEC4_OPCODE_SET_LOW_32BIT: + case VEC4_OPCODE_SET_HIGH_32BIT: + return true; + default: + return false; + } +} + +static bool try_copy_propagate(const struct gen_device_info *devinfo, vec4_instruction *inst, int arg, const copy_entry *entry, int attributes_per_reg) @@ -326,6 +342,14 @@ try_copy_propagate(const struct gen_device_info *devinfo, unsigned composed_swizzle = brw_compose_swizzle(inst->src[arg].swizzle, value.swizzle); + + /* Instructions that operate on vectors in ALIGN1 mode will ignore swizzles +* so copy-propagation won't be safe if the composed swizzle is anything +* other than the identity. +*/ + if (is_align1_opcode(inst->opcode) && composed_swizzle != BRW_SWIZZLE_XYZW) + return false; + if (inst->is_3src(devinfo) && (value.file == UNIFORM || (value.file == ATTR && attributes_per_reg != 1)) &&
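Whether a propagation is safe for these opcodes depends only on the composed swizzle. The sketch below assumes swizzles packed as four 2-bit selectors and composes them so that channel `i` of the result is `value_swizzle[inst_swizzle[i]]`, which is how reading through two swizzles in sequence behaves. The helper names are hypothetical, and the sketch does not claim to match `brw_compose_swizzle()` bit for bit.

```cpp
#include <cassert>

unsigned make_swizzle(unsigned x, unsigned y, unsigned z, unsigned w)
{
   return x | (y << 2) | (z << 4) | (w << 6);
}

unsigned swizzle_chan(unsigned swz, unsigned i) { return (swz >> (2 * i)) & 3; }

// Reading `value` through `value_swizzle`, then through `inst_swizzle`:
// channel i of the result is value_swizzle[inst_swizzle[i]].
unsigned compose_swizzle(unsigned inst_swizzle, unsigned value_swizzle)
{
   unsigned out = 0;
   for (unsigned i = 0; i < 4; i++)
      out |= swizzle_chan(value_swizzle, swizzle_chan(inst_swizzle, i)) << (2 * i);
   return out;
}

// ALIGN1 opcodes ignore swizzles, so only an identity composition is safe.
bool align1_copy_prop_ok(unsigned inst_swizzle, unsigned value_swizzle)
{
   return compose_swizzle(inst_swizzle, value_swizzle) == make_swizzle(0, 1, 2, 3);
}
```

Two YXWZ swizzles cancel out and remain propagatable, while propagating a broadcast such as XXXX into an ALIGN1 instruction would silently change the result, so it is refused.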
[Mesa-dev] [PATCH v2 050/103] i965/vec4: teach CSE about exec_size, group and doubles
--- src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 31 +++--- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp index bef897a..229d7b2 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp @@ -130,6 +130,8 @@ instructions_match(vec4_instruction *a, vec4_instruction *b) a->dst.writemask == b->dst.writemask && a->force_writemask_all == b->force_writemask_all && a->size_written == b->size_written && + a->exec_size == b->exec_size && + a->group == b->group && operands_match(a, b); } @@ -181,9 +183,17 @@ vec4_visitor::opt_cse_local(bblock_t *block) regs_written(entry->generator)), NULL), inst->dst.type); - for (unsigned i = 0; i < regs_written(entry->generator); ++i) { - vec4_instruction *copy = MOV(offset(entry->generator->dst, i), - offset(entry->tmp, i)); + unsigned type_scale = DIV_ROUND_UP(type_sz(entry->tmp.type), 4); + unsigned regs_per_mov = + DIV_ROUND_UP(type_scale * entry->generator->exec_size, 8); + unsigned num_copy_movs = + DIV_ROUND_UP(regs_written(entry->generator), regs_per_mov); + for (unsigned i = 0; i < num_copy_movs; ++i) { + vec4_instruction *copy = + MOV(offset(entry->generator->dst, i * regs_per_mov), +offset(entry->tmp, i * regs_per_mov)); + copy->exec_size = entry->generator->exec_size; + copy->group = entry->generator->group; copy->force_writemask_all = entry->generator->force_writemask_all; entry->generator->insert_after(block, copy); @@ -195,10 +205,17 @@ vec4_visitor::opt_cse_local(bblock_t *block) /* dest <- temp */ if (!inst->dst.is_null()) { assert(inst->dst.type == entry->tmp.type); - - for (unsigned i = 0; i < regs_written(inst); ++i) { - vec4_instruction *copy = MOV(offset(inst->dst, i), - offset(entry->tmp, i)); + unsigned type_scale = DIV_ROUND_UP(type_sz(inst->dst.type), 4); + unsigned regs_per_mov = + DIV_ROUND_UP(type_scale * inst->exec_size, 8); + unsigned num_copy_movs = + 
DIV_ROUND_UP(regs_written(inst), regs_per_mov); + for (unsigned i = 0; i < num_copy_movs; ++i) { + vec4_instruction *copy = + MOV(offset(inst->dst, i * regs_per_mov), + offset(entry->tmp, i * regs_per_mov)); + copy->exec_size = inst->exec_size; + copy->group = inst->group; copy->force_writemask_all = inst->force_writemask_all; inst->insert_before(block, copy); } -- 2.7.4
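The number of copy MOVs now depends on both the element size and the execution size. This sketch reproduces the patch's arithmetic in isolation. `num_copy_movs()` is a hypothetical helper, and type sizes are given in bytes:

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper reproducing the patch's computation.
 *   type_size:    bytes per element (4 for F/D/UD, 8 for DF)
 *   exec_size:    channels per instruction
 *   regs_written: GRFs written by the original instruction
 * Returns the number of MOVs; *regs_per_mov receives the GRF stride. */
static unsigned
num_copy_movs(unsigned type_size, unsigned exec_size, unsigned regs_written,
              unsigned *regs_per_mov)
{
   unsigned type_scale = DIV_ROUND_UP(type_size, 4);
   *regs_per_mov = DIV_ROUND_UP(type_scale * exec_size, 8);
   return DIV_ROUND_UP(regs_written, *regs_per_mov);
}
```

For example, an 8-wide DF instruction covers two GRFs per MOV, so a 2-register result needs a single copy. A 4-wide DF instruction needs one MOV per register.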
[Mesa-dev] [PATCH v2 015/103] i965/vec4: We only support 32-bit integer ALU operations for now
Add asserts so we remember to address this when we enable 64-bit integer support, as suggested by Connor and Jason. Reviewed-by: Francisco Jerez --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 71 ++ 1 file changed, 53 insertions(+), 18 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index b75337c..04f70ef 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -1135,9 +1135,9 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) break; } - case nir_op_fadd: - /* fall through */ case nir_op_iadd: + assert(nir_dest_bit_size(instr->dest.dest) < 64); + case nir_op_fadd: inst = emit(ADD(dst, op[0], op[1])); inst->saturate = instr->dest.saturate; break; @@ -1148,6 +1148,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) break; case nir_op_imul: { + assert(nir_dest_bit_size(instr->dest.dest) < 64); if (devinfo->gen < 8) { nir_const_value *value0 = nir_src_as_const_value(instr->src[0].src); nir_const_value *value1 = nir_src_as_const_value(instr->src[1].src); @@ -1183,6 +1184,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) case nir_op_imul_high: case nir_op_umul_high: { + assert(nir_dest_bit_size(instr->dest.dest) < 64); struct brw_reg acc = retype(brw_acc_reg(8), dst.type); if (devinfo->gen >= 8) @@ -1221,6 +1223,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) case nir_op_idiv: case nir_op_udiv: + assert(nir_dest_bit_size(instr->dest.dest) < 64); emit_math(SHADER_OPCODE_INT_QUOTIENT, dst, op[0], op[1]); break; @@ -1230,6 +1233,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) * appears that our hardware just does the right thing for signed * remainder. 
*/ + assert(nir_dest_bit_size(instr->dest.dest) < 64); emit_math(SHADER_OPCODE_INT_REMAINDER, dst, op[0], op[1]); break; @@ -1283,6 +1287,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) break; case nir_op_uadd_carry: { + assert(nir_dest_bit_size(instr->dest.dest) < 64); struct brw_reg acc = retype(brw_acc_reg(8), BRW_REGISTER_TYPE_UD); emit(ADDC(dst_null_ud(), op[0], op[1])); @@ -1291,6 +1296,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) } case nir_op_usub_borrow: { + assert(nir_dest_bit_size(instr->dest.dest) < 64); struct brw_reg acc = retype(brw_acc_reg(8), BRW_REGISTER_TYPE_UD); emit(SUBB(dst_null_ud(), op[0], op[1])); @@ -1358,16 +1364,18 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) break; } - case nir_op_fmin: case nir_op_imin: case nir_op_umin: + assert(nir_dest_bit_size(instr->dest.dest) < 64); + case nir_op_fmin: inst = emit_minmax(BRW_CONDITIONAL_L, dst, op[0], op[1]); inst->saturate = instr->dest.saturate; break; - case nir_op_fmax: case nir_op_imax: case nir_op_umax: + assert(nir_dest_bit_size(instr->dest.dest) < 64); + case nir_op_fmax: inst = emit_minmax(BRW_CONDITIONAL_GE, dst, op[0], op[1]); inst->saturate = instr->dest.saturate; break; @@ -1380,26 +1388,30 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) case nir_op_fddy_fine: unreachable("derivatives are not valid in vertex shaders"); - case nir_op_flt: case nir_op_ilt: case nir_op_ult: - case nir_op_fge: case nir_op_ige: case nir_op_uge: - case nir_op_feq: case nir_op_ieq: - case nir_op_fne: case nir_op_ine: + assert(nir_dest_bit_size(instr->dest.dest) < 64); + /* Fallthrough */ + case nir_op_flt: + case nir_op_fge: + case nir_op_feq: + case nir_op_fne: emit(CMP(dst, op[0], op[1], brw_conditional_for_nir_comparison(instr->op))); break; - case nir_op_ball_fequal2: case nir_op_ball_iequal2: - case nir_op_ball_fequal3: case nir_op_ball_iequal3: - case nir_op_ball_fequal4: - case nir_op_ball_iequal4: { + case nir_op_ball_iequal4: + assert(nir_dest_bit_size(instr->dest.dest) < 
64); + /* Fallthrough */ + case nir_op_ball_fequal2: + case nir_op_ball_fequal3: + case nir_op_ball_fequal4: { unsigned swiz = brw_swizzle_for_size(nir_op_infos[instr->op].input_sizes[0]); @@ -1411,12 +1423,14 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) break; } - case nir_op_bany_fnequal2: case nir_op_bany_inequal2: - case nir_op_bany_fnequal3: case nir_op_bany_inequal3: - case nir_op_bany_fnequal4: - case nir_op_bany_inequal4: { + case nir_op_bany_inequal4: + assert(nir_dest_bit_size(instr->dest.dest) < 64); + /* Fallthrough */ + case nir_op_bany_fnequal2: + case nir_op_bany_fnequal3: + case nir_op_bany_fnequal4: { unsigned swiz = brw_swizzle_for_size(nir_op_infos[inst
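The patch relies on case ordering. The integer opcodes come first, assert on the destination bit size, and then fall through to the float case that shares the emission code. Below is a reduced C sketch of that pattern; the opcode names are made up, and string labels stand in for the real emit calls:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Made-up stand-ins for nir_op_iadd, nir_op_fadd, nir_op_imin, nir_op_fmin. */
enum op { OP_IADD, OP_FADD, OP_IMIN, OP_FMIN };

/* Integer cases assert on the bit size, then fall through to the code
 * shared with the float variant. */
static const char *
emit_alu(enum op op, unsigned bit_size)
{
   switch (op) {
   case OP_IADD:
      assert(bit_size < 64); /* 64-bit integer ALU not supported yet */
      /* fallthrough */
   case OP_FADD:
      return "ADD";
   case OP_IMIN:
      assert(bit_size < 64);
      /* fallthrough */
   case OP_FMIN:
      return "MIN";
   }
   return NULL;
}
```

These are plain assert()s, so they only fire in debug builds. With NDEBUG defined, the integer cases silently fall through, which suits their role as reminders rather than runtime checks.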
[Mesa-dev] [PATCH v2 078/103] i965/vec4: fix scratch writes for 64bit data
Mostly the same as usual: we need to shuffle the data before writing it, and we need to emit two 32-bit write messages (with the appropriate 32-bit writemask channels set) for a full dvec4 scratch write. --- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 64 ++ 1 file changed, 55 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 44e6709..b0b5f39 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -1534,17 +1534,63 @@ vec4_visitor::emit_scratch_write(bblock_t *block, vec4_instruction *inst, * weren't initialized, it will confuse live interval analysis, which will * make spilling fail to make progress. */ - const src_reg temp = swizzle(retype(src_reg(this, glsl_type::vec4_type), + bool is_64bit = type_sz(inst->dst.type) == 8; + const glsl_type *alloc_type = + is_64bit ? glsl_type::dvec4_type : glsl_type::vec4_type; + const src_reg temp = swizzle(retype(src_reg(this, alloc_type), inst->dst.type), brw_swizzle_for_mask(inst->dst.writemask)); - dst_reg dst = dst_reg(brw_writemask(brw_vec8_grf(0, 0), - inst->dst.writemask)); - vec4_instruction *write = SCRATCH_WRITE(dst, temp, index); - if (inst->opcode != BRW_OPCODE_SEL) - write->predicate = inst->predicate; - write->ir = inst->ir; - write->annotation = inst->annotation; - inst->insert_after(block, write); + + if (!is_64bit) { + dst_reg dst = dst_reg(brw_writemask(brw_vec8_grf(0, 0), + inst->dst.writemask)); + vec4_instruction *write = SCRATCH_WRITE(dst, temp, index); + if (inst->opcode != BRW_OPCODE_SEL) + write->predicate = inst->predicate; + write->ir = inst->ir; + write->annotation = inst->annotation; + inst->insert_after(block, write); + } else { + dst_reg shuffled = dst_reg(this, alloc_type); + vec4_instruction *last = + shuffle_64bit_data(shuffled, temp, true, block, inst); + src_reg shuffled_float = src_reg(retype(shuffled, BRW_REGISTER_TYPE_F)); + + uint8_t
mask = 0; + if (inst->dst.writemask & WRITEMASK_X) + mask |= WRITEMASK_XY; + if (inst->dst.writemask & WRITEMASK_Y) + mask |= WRITEMASK_ZW; + if (mask) { + dst_reg dst = dst_reg(brw_writemask(brw_vec8_grf(0, 0), mask)); + + vec4_instruction *write = SCRATCH_WRITE(dst, shuffled_float, index); + if (inst->opcode != BRW_OPCODE_SEL) +write->predicate = inst->predicate; + write->ir = inst->ir; + write->annotation = inst->annotation; + last->insert_after(block, write); + } + + mask = 0; + if (inst->dst.writemask & WRITEMASK_Z) + mask |= WRITEMASK_XY; + if (inst->dst.writemask & WRITEMASK_W) + mask |= WRITEMASK_ZW; + if (mask) { + dst_reg dst = dst_reg(brw_writemask(brw_vec8_grf(0, 0), mask)); + + src_reg index = get_scratch_offset(block, inst, inst->dst.reladdr, +reg_offset + 1); + vec4_instruction *write = +SCRATCH_WRITE(dst, offset(shuffled_float, 1), index); + if (inst->opcode != BRW_OPCODE_SEL) +write->predicate = inst->predicate; + write->ir = inst->ir; + write->annotation = inst->annotation; + last->insert_after(block, write); + } + } inst->dst.file = temp.file; inst->dst.nr = temp.nr; -- 2.7.4
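The writemask handling above maps each 64-bit channel onto a pair of 32-bit channels, split across two scratch messages. Below is a C sketch of that mapping. It assumes the usual WRITEMASK_* bit values (X = 1, Y = 2, Z = 4, W = 8), and `split_64bit_writemask()` is a hypothetical helper:

```c
#include <assert.h>

/* Assumed to match Mesa's WRITEMASK_* values. */
enum { WM_X = 1, WM_Y = 2, WM_Z = 4, WM_W = 8,
       WM_XY = WM_X | WM_Y, WM_ZW = WM_Z | WM_W };

/* Split a dvec4 writemask into the 32-bit writemasks of the two scratch
 * messages: DF.x/DF.y land in the first register, DF.z/DF.w in the second,
 * and each 64-bit channel occupies two 32-bit channels. */
static void
split_64bit_writemask(unsigned df_mask, unsigned out[2])
{
   out[0] = ((df_mask & WM_X) ? WM_XY : 0) | ((df_mask & WM_Y) ? WM_ZW : 0);
   out[1] = ((df_mask & WM_Z) ? WM_XY : 0) | ((df_mask & WM_W) ? WM_ZW : 0);
}
```

A message whose computed mask is zero is simply not emitted, which is what the `if (mask)` guards in the patch do.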
[Mesa-dev] [PATCH v2 072/103] i965/vec4: don't copy propagate misaligned registers
From: Samuel Iglesias Gonsálvez Copy-propagating misaligned registers would mean propagating partial reads or writes, which can change the result. Signed-off-by: Samuel Iglesias Gonsálvez --- src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index 4f7b844..db2b317 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -354,6 +354,9 @@ try_copy_propagate(const struct gen_device_info *devinfo, return false; } + if (inst->src[arg].offset % REG_SIZE || value.offset % REG_SIZE) + return false; + bool has_source_modifiers = value.negate || value.abs; /* gen6 math and gen7+ SENDs from GRFs ignore source modifiers on -- 2.7.4
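The new check rejects propagation whenever either offset is not a whole number of registers. Below is the same check in C. It assumes REG_SIZE is 32 bytes, as in the i965 backend, and the helper name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define REG_SIZE 32 /* bytes per GRF, assumed as in the i965 backend */

/* Propagation is allowed only if both the source offset and the
 * propagated value's offset start on a register boundary. */
static bool
offsets_reg_aligned(unsigned src_offset, unsigned value_offset)
{
   return src_offset % REG_SIZE == 0 && value_offset % REG_SIZE == 0;
}
```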
[Mesa-dev] [PATCH v2 046/103] i965/vec4: add a SIMD lowering pass
Generally, instructions in Align16 mode only ever write to a single register and don't need any form of SIMD splitting; that's why we have never had a SIMD splitting pass in the vec4 backend. However, double-precision instructions typically write 2 registers, and in some cases they run into hardware bugs and limitations that we need to work around by splitting the instructions so that each one writes only 1 register. This patch implements a SIMD splitting pass similar to the one in the scalar backend. Because we only use double-precision instructions in Align16 mode on gen7 (gen8+ is fully scalar and gens < 7 do not implement fp64), the pass should be a no-op on any other generation. For now the pass only handles the gen7 restriction that any instruction that writes 2 registers must also read 2 registers. This affects, for example, double-precision instructions reading uniforms. Later patches will extend the lowering pass with a few more cases. v2: - Move the simd lowering pass after the main optimization loop and run copy-propagation and dce if it reports progress (Curro) - Compute number of registers written instead of fixing it to 1 (Iago) - Use group from backend_instruction (Iago) - Drop assertion that checked that we only split 8-wide instructions into 4-wide.
(Curro) - Don't assume that instructions can only be 8-wide, we might want to use 16-wide instructions in the future too (Curro) - Wrap gen7 workarounds in a conditional to ease adding workarounds for other gens in the future (Curro) - Handle dst/src overlap hazard (Curro) - Use the horiz_offset() helper to simplify the implementation (Curro) - Drop the assertion that checks that each split instruction writes exactly one register (Curro) - Use the copy constructor to generate split instructions with all the relevant fields initialized to the values in the original instruction instead of copying only a handful of them manually (Curro) --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 156 + src/mesa/drivers/dri/i965/brw_vec4.h | 2 + 2 files changed, 158 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 34cab04..490cbae 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1977,6 +1977,157 @@ vec4_visitor::convert_to_hw_regs() } } +/** + * Get the closest native SIMD width supported by the hardware for instruction + * \p inst. The instruction will be left untouched by + * vec4_visitor::lower_simd_width() if the returned value matches the + * instruction's original execution size. + */ +static unsigned +get_lowered_simd_width(const struct gen_device_info *devinfo, + const vec4_instruction *inst) +{ + unsigned lowered_width = MIN2(16, inst->exec_size); + + /* We need to split some cases of double-precision instructions that write +* 2 registers. We only need to care about this in gen7 because that is the +* only hardware that implements fp64 in Align16. +*/ + if (devinfo->gen == 7 && inst->size_written > REG_SIZE) { + /* HSW PRM, 3D Media GPGPU Engine, Region Alignment Rules for Direct + * Register Addressing: + * + *"When destination spans two registers, the source MUST span two + * registers." 
+ */ + for (unsigned i = 0; i < 3; i++) { + if (inst->src[i].file == BAD_FILE) +continue; + if (inst->size_read(i) <= REG_SIZE) +lowered_width = MIN2(lowered_width, 4); + } + } + + return lowered_width; +} + +static bool +dst_src_regions_overlap(vec4_instruction *inst) +{ + if (inst->size_written == 0) + return false; + + unsigned dst_start = inst->dst.offset; + unsigned dst_end = dst_start + inst->size_written - 1; + for (int i = 0; i < 3; i++) { + if (inst->src[i].file == BAD_FILE) + continue; + + if (inst->dst.file != inst->src[i].file || + inst->dst.nr != inst->src[i].nr) + continue; + + unsigned src_start = inst->src[i].offset; + unsigned src_end = src_start + inst->size_read(i) - 1; + + if ((dst_start >= src_start && dst_start <= src_end) || + (dst_end >= src_start && dst_end <= src_end) || + (dst_start <= src_start && dst_end >= src_end)) { + return true; + } + } + + return false; +} + +bool +vec4_visitor::lower_simd_width() +{ + bool progress = false; + + foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) { + const unsigned lowered_width = get_lowered_simd_width(devinfo, inst); + assert(lowered_width <= inst->exec_size); + if (lowered_width == inst->exec_size) + continue; + + /* We need to deal with source / destination overlaps when splitting. + * The hardware supports reading from and writing to the same register + * in the same instruction, but we need to be caref
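Two pieces of the pass lend themselves to a small model: the gen7 width rule quoted from the HSW PRM, and the overlap test on inclusive byte ranges. The C sketch below simplifies both. The function names, and the convention that a size of 0 marks an unused source (BAD_FILE in the real code), are assumptions made for illustration:

```c
#include <assert.h>
#include <stdbool.h>

#define REG_SIZE 32
#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Simplified gen7 rule: if the destination spans more than one GRF, any
 * source that reads one GRF or less forces a split to 4 channels. */
static unsigned
lowered_width_gen7(unsigned exec_size, unsigned size_written,
                   const unsigned src_size_read[3])
{
   unsigned width = MIN2(16, exec_size);
   if (size_written > REG_SIZE) {
      for (int i = 0; i < 3; i++) {
         if (src_size_read[i] != 0 && src_size_read[i] <= REG_SIZE)
            width = MIN2(width, 4);
      }
   }
   return width;
}

/* Standard intersection test on inclusive ranges [start, end]. */
static bool
ranges_overlap(unsigned a_start, unsigned a_end,
               unsigned b_start, unsigned b_end)
{
   return a_start <= b_end && b_start <= a_end;
}
```

For well-formed ranges (start <= end), the interval form is equivalent to the three-clause condition in dst_src_regions_overlap(). A DF uniform source reads 4 * 8 = 32 bytes, which is a single GRF, so an 8-wide DF instruction reading one is split to 4 channels.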
[Mesa-dev] [PATCH v2 025/103] i965/vec4: fix indentation in get_nir_src()
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 860ec51..c825aeb 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -308,8 +308,8 @@ vec4_visitor::get_nir_src(const nir_src &src, enum brw_reg_type type, reg = nir_ssa_values[src.ssa->index]; } else { - reg = dst_reg_for_nir_reg(this, src.reg.reg, src.reg.base_offset, - src.reg.indirect); + reg = dst_reg_for_nir_reg(this, src.reg.reg, src.reg.base_offset, +src.reg.indirect); } reg = retype(reg, type); -- 2.7.4
[Mesa-dev] [PATCH v2 040/103] i965/vec4: fix regs_read() for doubles
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 75a8473..2bde628 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -228,8 +228,8 @@ vec4_instruction::size_read(unsigned arg) const case UNIFORM: return 4 * type_sz(src[arg].type); default: - /* XXX - Represent actual execution size and vertical stride. */ - return 8 * type_sz(src[arg].type); + /* XXX - Represent actual vertical stride. */ + return exec_size * type_sz(src[arg].type); } } -- 2.7.4
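Below is a simplified C model of the updated size_read(). It covers only the two cases shown in the hunk; the real function also handles other register files:

```c
#include <assert.h>
#include <stdbool.h>

/* Uniforms read one vec4 worth of elements; other register files read
 * one element per channel of the execution size. */
static unsigned
size_read(bool is_uniform, unsigned exec_size, unsigned type_size)
{
   if (is_uniform)
      return 4 * type_size;
   return exec_size * type_size;
}
```

An 8-wide DF source now reports 64 bytes (two GRFs), while a 4-wide DF source reports 32 bytes.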
[Mesa-dev] [PATCH v2 073/103] i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8 platforms
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 7af65ab..7f6acc3 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -908,11 +908,14 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction *inst) (reg.type == BRW_REGISTER_TYPE_UD || \ reg.type == BRW_REGISTER_TYPE_D) - /* "When source or destination datatype is 64b or operation is integer DWord + /* From the Cherryview and Broadwell PRMs: +* +* "When source or destination datatype is 64b or operation is integer DWord * multiply, DepCtrl must not be used." -* May apply to future SoCs as well. +* +* SKL PRMs don't include this restriction though. */ - if (devinfo->is_cherryview) { + if (devinfo->gen == 8) { if (inst->opcode == BRW_OPCODE_MUL && IS_DWORD(inst->src[0]) && IS_DWORD(inst->src[1])) -- 2.7.4
[Mesa-dev] [PATCH v2 036/103] i965/vec4: add a helper function to create double immediates
Gen7 hardware does not support double immediates so these need to be moved in 32-bit chunks to a regular vgrf instead. Instead of doing this every time we need to create a DF immediate, create a helper function that does the right thing depending on the hardware generation. v2 (Curro): - Use swizzle() and writemask() helpers and make tmp const. Signed-off-by: Samuel Iglesias Gonsálvez --- src/mesa/drivers/dri/i965/brw_vec4.h | 2 ++ src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 38 ++ 2 files changed, 40 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 0111966..6063bee 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -322,6 +322,8 @@ public: void emit_single_to_double(dst_reg dst, src_reg src, bool saturate, brw_reg_type single_type); + src_reg setup_imm_df(double v); + virtual void emit_nir_code(); virtual void nir_setup_uniforms(); virtual void nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 088ed13..4d5fa96 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -1114,6 +1114,44 @@ vec4_visitor::emit_single_to_double(dst_reg dst, src_reg src, bool saturate, inst->saturate = saturate; } +src_reg +vec4_visitor::setup_imm_df(double v) +{ + assert(devinfo->gen >= 7); + + if (devinfo->gen >= 8) + return brw_imm_df(v); + + /* gen7 does not support DF immediates */ + union { + double d; + struct { + uint32_t i1; + uint32_t i2; + }; + } di; + + di.d = v; + + /* Write the low 32-bit of the constant to the X:UD channel and the +* high 32-bit to the Y:UD channel to build the constant in a VGRF. +* We have to do this twice (offset 0 and offset 1), since a DF VGRF takes +* two SIMD8 registers in SIMD4x2 execution. 
Finally, return a swizzle +* so any access to the VGRF only reads the constant data in these +* channels. +*/ + const dst_reg tmp = + retype(dst_reg(VGRF, alloc.allocate(2)), BRW_REGISTER_TYPE_UD); + for (int n = 0; n < 2; n++) { + emit(MOV(writemask(offset(tmp, n), WRITEMASK_X), brw_imm_ud(di.i1))) + ->force_writemask_all = true; + emit(MOV(writemask(offset(tmp, n), WRITEMASK_Y), brw_imm_ud(di.i2))) + ->force_writemask_all = true; + } + + return swizzle(src_reg(retype(tmp, BRW_REGISTER_TYPE_DF)), BRW_SWIZZLE_); +} + void vec4_visitor::nir_emit_alu(nir_alu_instr *instr) { -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 023/103] i965/vec4/nir: implement double comparisons
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 37c3d7c..815082e 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -1399,10 +1399,25 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) case nir_op_flt: case nir_op_fge: case nir_op_feq: - case nir_op_fne: - emit(CMP(dst, op[0], op[1], - brw_conditional_for_nir_comparison(instr->op))); + case nir_op_fne: { + enum brw_conditional_mod conditional_mod = + brw_conditional_for_nir_comparison(instr->op); + if (nir_src_bit_size(instr->src[0].src) < 64) { + emit(CMP(dst, op[0], op[1], conditional_mod)); + } else { + /* Produce a 32-bit boolean result from the DF comparison by selecting + * only the low 32-bit in each DF produced. Do this in a temporary + * so we can then move from there to the result using align16 again + * to honor the original writemask. + */ + dst_reg temp = dst_reg(this, glsl_type::dvec4_type); + emit(CMP(temp, op[0], op[1], conditional_mod)); + dst_reg result = dst_reg(this, glsl_type::bvec4_type); + emit(VEC4_OPCODE_PICK_LOW_32BIT, result, src_reg(temp)); + emit(MOV(dst, src_reg(result))); + } break; + } case nir_op_ball_iequal2: case nir_op_ball_iequal3: -- 2.7.4
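The 64-bit path works if the DF comparison produces a per-channel mask that is either all ones or all zeros. In that case its low dword alone is a valid 32-bit boolean. Below is a C model of that reasoning; the helpers are illustrative and assume that all-ones/all-zeros result:

```c
#include <assert.h>
#include <stdint.h>

/* Model of a DF comparison: each 64-bit result channel is assumed to be
 * all ones (true) or zero (false). */
static uint64_t
df_cmp_lt(double a, double b)
{
   return a < b ? UINT64_MAX : 0;
}

/* Model of PICK_LOW_32BIT: keep only the low dword of a 64-bit channel,
 * giving the 32-bit ~0 / 0 boolean the rest of the backend expects. */
static uint32_t
pick_low_32bit(uint64_t v)
{
   return (uint32_t)v;
}
```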
[Mesa-dev] [PATCH v2 049/103] i965/disasm: print NibCtrl for instructions with execsize < 8
v2 (Curro): - Print it also for execsize < 4. - QtrCtrl is still in effect, so print 2 * qtr_ctl + nib_ctl + 1 - Do not read the nib ctl from the instruction in gen < 7, the field only exists in gen7+. --- src/mesa/drivers/dri/i965/brw_disasm.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c index 1d2a4d2..0c43217 100644 --- a/src/mesa/drivers/dri/i965/brw_disasm.c +++ b/src/mesa/drivers/dri/i965/brw_disasm.c @@ -1193,7 +1193,11 @@ qtr_ctrl(FILE *file, const struct gen_device_info *devinfo, brw_inst *inst) int qtr_ctl = brw_inst_qtr_control(devinfo, inst); int exec_size = 1 << brw_inst_exec_size(devinfo, inst); - if (exec_size == 8) { + if (exec_size < 8) { + const unsigned nib_ctl = devinfo->gen < 7 ? 0 : + brw_inst_nib_control(devinfo, inst); + format(file, " %dN", qtr_ctl * 2 + nib_ctl + 1); + } else if (exec_size == 8) { switch (qtr_ctl) { case 0: string(file, " 1Q"); -- 2.7.4
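The printed label can be reproduced in a few lines of C. This sketch mirrors the expression in the patch; `nib_label()` and its parameters are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* For exec sizes below 8, print which nibble (group of 4 channels) the
 * instruction targets. QtrCtrl still applies, so the index is
 * 2 * qtr + nib + 1. Gen < 7 has no NibCtrl field, so it reads as 0. */
static void
nib_label(char *buf, size_t len, int gen, int qtr_ctl, int nib_ctl)
{
   if (gen < 7)
      nib_ctl = 0;
   snprintf(buf, len, " %dN", qtr_ctl * 2 + nib_ctl + 1);
}
```

For example, a 4-wide instruction with QtrCtrl 1 and NibCtrl 1 covers channels 12 to 15 and is printed as `4N`.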
[Mesa-dev] [PATCH v2 080/103] i965/vec4: fix indentation in move_push_constants_to_pull_constants()
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 60 +- 1 file changed, 30 insertions(+), 30 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 75e47f9..0788ba2 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -842,34 +842,34 @@ vec4_visitor::move_push_constants_to_pull_constants() pull_constant_loc[i / 4] = -1; if (i >= max_uniform_components) { -const gl_constant_value **values = &stage_prog_data->param[i]; + const gl_constant_value **values = &stage_prog_data->param[i]; -/* Try to find an existing copy of this uniform in the pull - * constants if it was part of an array access already. - */ -for (unsigned int j = 0; j < stage_prog_data->nr_pull_params; j += 4) { - int matches; + /* Try to find an existing copy of this uniform in the pull + * constants if it was part of an array access already. + */ + for (unsigned int j = 0; j < stage_prog_data->nr_pull_params; j += 4) { +int matches; - for (matches = 0; matches < 4; matches++) { - if (stage_prog_data->pull_param[j + matches] != values[matches]) - break; - } +for (matches = 0; matches < 4; matches++) { + if (stage_prog_data->pull_param[j + matches] != values[matches]) + break; +} - if (matches == 4) { - pull_constant_loc[i / 4] = j / 4; - break; - } -} +if (matches == 4) { + pull_constant_loc[i / 4] = j / 4; + break; +} + } -if (pull_constant_loc[i / 4] == -1) { - assert(stage_prog_data->nr_pull_params % 4 == 0); - pull_constant_loc[i / 4] = stage_prog_data->nr_pull_params / 4; + if (pull_constant_loc[i / 4] == -1) { +assert(stage_prog_data->nr_pull_params % 4 == 0); +pull_constant_loc[i / 4] = stage_prog_data->nr_pull_params / 4; - for (int j = 0; j < 4; j++) { - stage_prog_data->pull_param[stage_prog_data->nr_pull_params++] = +for (int j = 0; j < 4; j++) { + stage_prog_data->pull_param[stage_prog_data->nr_pull_params++] = values[j]; - } -} +} + } } } @@ -878,21 +878,21 @@ 
vec4_visitor::move_push_constants_to_pull_constants() */ foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) { for (int i = 0 ; i < 3; i++) { -if (inst->src[i].file != UNIFORM || + if (inst->src[i].file != UNIFORM || pull_constant_loc[inst->src[i].nr] == -1) - continue; +continue; int uniform = inst->src[i].nr; -dst_reg temp = dst_reg(this, glsl_type::vec4_type); + dst_reg temp = dst_reg(this, glsl_type::vec4_type); -emit_pull_constant_load(block, inst, temp, inst->src[i], -pull_constant_loc[uniform], src_reg()); + emit_pull_constant_load(block, inst, temp, inst->src[i], + pull_constant_loc[uniform], src_reg()); -inst->src[i].file = temp.file; + inst->src[i].file = temp.file; inst->src[i].nr = temp.nr; -inst->src[i].offset %= 16; -inst->src[i].reladdr = NULL; + inst->src[i].offset %= 16; + inst->src[i].reladdr = NULL; } } -- 2.7.4
[Mesa-dev] [PATCH v2 062/103] i965/vec4: do not emit 64-bit MAD
The previous patch made sure that we do not generate MAD instructions for any 64-bit NIR ffma, but nothing prevents i965 from producing MAD instructions as a result of lowerings or optimization passes. This patch makes sure that any 64-bit MAD produced inside the driver after translating from NIR is also converted to MUL+ADD before we generate code. v2: - Use a copy constructor to copy all relevant instruction fields from the original mad into the add and mul instructions --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 44 ++ src/mesa/drivers/dri/i965/brw_vec4.h | 1 + 2 files changed, 45 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 190581e..7af65ab 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -2255,6 +2255,49 @@ vec4_visitor::scalarize_df() return progress; } +bool +vec4_visitor::translate_64bit_mad_to_mul_add() +{ + bool progress = false; + + foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) { + if (inst->opcode != BRW_OPCODE_MAD) + continue; + + if (type_sz(inst->dst.type) != 8) + continue; + + dst_reg mul_dst = dst_reg(this, glsl_type::dvec4_type); + + /* Use the copy constructor so we copy all relevant instruction fields + * from the original mad into the add and mul instructions + */ + vec4_instruction *mul = new(mem_ctx) vec4_instruction(*inst); + mul->opcode = BRW_OPCODE_MUL; + mul->dst = mul_dst; + mul->src[0] = inst->src[1]; + mul->src[1] = inst->src[2]; + mul->src[2].file = BAD_FILE; + + vec4_instruction *add = new(mem_ctx) vec4_instruction(*inst); + add->opcode = BRW_OPCODE_ADD; + add->src[0] = src_reg(mul_dst); + add->src[1] = inst->src[0]; + add->src[2].file = BAD_FILE; + + inst->insert_before(block, mul); + inst->insert_before(block, add); + inst->remove(block); + + progress = true; + } + + if (progress) + invalidate_live_intervals(); + + return progress; +} + /* The align16 hardware can only do 32-bit
swizzle channels, so we need to * translate the logical 64-bit swizzle channels that we use in the Vec4 IR * to 32-bit swizzle channels in hardware registers. @@ -2414,6 +2457,7 @@ vec4_visitor::run() if (failed) return false; + OPT(translate_64bit_mad_to_mul_add); OPT(scalarize_df); setup_payload(); diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 7e51c41..0af55c5 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -164,6 +164,7 @@ public: bool lower_simd_width(); bool scalarize_df(); + bool translate_64bit_mad_to_mul_add(); void apply_logical_swizzle(struct brw_reg *hw_reg, vec4_instruction *inst, int arg); -- 2.7.4
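The rewrite depends on the operand order of the hardware MAD, where src0 is the addend: dst = src0 + src1 * src2. Below is a scalar C sketch of the resulting MUL+ADD pair; the function is illustrative only:

```c
#include <assert.h>

/* MAD dst, src0, src1, src2 computes src0 + src1 * src2, so the patch
 * multiplies src[1] by src[2] and then adds src[0]. */
static double
mad_as_mul_add(double src0, double src1, double src2)
{
   double mul = src1 * src2; /* MUL tmp, src1, src2 */
   return mul + src0;        /* ADD dst, tmp, src0 */
}
```

Because the product is stored to a temporary before the ADD, the split form rounds twice: once after the multiply and once after the add.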
[Mesa-dev] [PATCH v2 058/103] i965/vec4: fix pack_uniform_registers for doubles
We need to consider the fact that dvec3/4 require two vec4 slots. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index e5391b9..b79fd5e 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -610,13 +610,20 @@ vec4_visitor::pack_uniform_registers() if (inst->src[i].file != UNIFORM) continue; + assert(type_sz(inst->src[i].type) % 4 == 0); + unsigned channel_size = type_sz(inst->src[i].type) / 4; + int reg = inst->src[i].nr; for (int c = 0; c < 4; c++) { if (!(readmask & (1 << c))) continue; -chans_used[reg] = MAX2(chans_used[reg], - BRW_GET_SWZ(inst->src[i].swizzle, c) + 1); +unsigned channel = BRW_GET_SWZ(inst->src[i].swizzle, c) + 1; +unsigned used = MAX2(chans_used[reg], channel * channel_size); +if (used <= 4) + chans_used[reg] = used; +else + chans_used[reg + 1] = used - 4; } } -- 2.7.4
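Below is a C model of the per-channel bookkeeping. It assumes chans_used[] counts 32-bit components per vec4 slot, and `mark_channel_used()` is a hypothetical wrapper around the loop body:

```c
#include <assert.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* A swizzle channel of a channel_size-component type (2 for doubles)
 * marks components in slot reg and may spill into slot reg + 1. */
static void
mark_channel_used(unsigned chans_used[], int reg, unsigned swz_chan,
                  unsigned channel_size)
{
   unsigned channel = swz_chan + 1;
   unsigned used = MAX2(chans_used[reg], channel * channel_size);
   if (used <= 4)
      chans_used[reg] = used;
   else
      chans_used[reg + 1] = used - 4;
}
```

A DF read of .w therefore marks all four components of the next slot as used. Before the patch, that read only marked the first slot.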
[Mesa-dev] [PATCH v2 088/103] i965/vec4/tcs: fix outputs for 64-bit data
--- src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp | 31 -- 1 file changed, 29 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp index f62dc9c..914396c 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp @@ -443,13 +443,40 @@ vec4_tcs_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) unsigned first_component = nir_intrinsic_component(instr); if (first_component) { + if (nir_src_bit_size(instr->src[0]) == 64) +first_component /= 2; assert(swiz == BRW_SWIZZLE_XYZW); swiz = BRW_SWZ_COMP_OUTPUT(first_component); mask = mask << first_component; } - emit_urb_write(swizzle(value, swiz), mask, - imm_offset, indirect_offset); + if (nir_src_bit_size(instr->src[0]) == 64) { + /* For 64-bit data we need to shuffle the data before we write and + * emit two messages. Also, since each channel is twice as large we + * need to fix the writemask in each 32-bit message to account for it. + */ + value = swizzle(retype(value, BRW_REGISTER_TYPE_DF), swiz); + dst_reg shuffled = dst_reg(this, glsl_type::dvec4_type); + shuffle_64bit_data(shuffled, value, true); + src_reg shuffled_float = src_reg(retype(shuffled, BRW_REGISTER_TYPE_F)); + + for (int n = 0; n < 2; n++) { +unsigned fixed_mask = 0; +if (mask & WRITEMASK_X) + fixed_mask |= WRITEMASK_XY; +if (mask & WRITEMASK_Y) + fixed_mask |= WRITEMASK_ZW; +emit_urb_write(shuffled_float, fixed_mask, + imm_offset, indirect_offset); + +shuffled_float = offset(shuffled_float, 1); +mask >>= 2; +imm_offset++; + } + } else { + emit_urb_write(swizzle(value, swiz), mask, +imm_offset, indirect_offset); + } break; } -- 2.7.4
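The loop above consumes the 64-bit writemask two channels at a time, one URB write per iteration. Below is a C model of just the mask handling. It assumes WRITEMASK_* values X = 1, Y = 2, Z = 4, W = 8; the function is hypothetical and ignores the data shuffle and imm_offset:

```c
#include <assert.h>

enum { WM_X = 1, WM_Y = 2, WM_Z = 4, WM_W = 8 };

/* Each iteration turns DF.x into 32-bit XY and DF.y into ZW, then shifts
 * the mask by 2 so DF.z/DF.w become the next message's x/y. */
static void
tcs_64bit_urb_masks(unsigned mask, unsigned out[2])
{
   for (int n = 0; n < 2; n++) {
      unsigned fixed_mask = 0;
      if (mask & WM_X)
         fixed_mask |= WM_X | WM_Y;
      if (mask & WM_Y)
         fixed_mask |= WM_Z | WM_W;
      out[n] = fixed_mask;
      mask >>= 2;
   }
}
```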
[Mesa-dev] [PATCH v2 056/103] i965/disasm: fix subreg for dst in Align16 mode
There is a single bit for this, so it is a binary 0 or 1 meaning
offset 0B or 16B respectively.

v2:
 - Since brw_inst_dst_da16_subreg_nr() is known to be 1, remove it
   from the expression (Curro)

Reviewed-by: Francisco Jerez
---
 src/mesa/drivers/dri/i965/brw_disasm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c
index 0c43217..e439ec4 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -772,7 +772,7 @@ dest(FILE *file, const struct gen_device_info *devinfo, brw_inst *inst)
          if (err == -1)
             return 0;
          if (brw_inst_dst_da16_subreg_nr(devinfo, inst))
-            format(file, ".%"PRIu64, brw_inst_dst_da16_subreg_nr(devinfo, inst) /
+            format(file, ".%u", 16 /
                    reg_type_size[brw_inst_dst_reg_type(devinfo, inst)]);
          string(file, "<1>");
          err |= control(file, "writemask", writemask,
-- 
2.7.4
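Why the old expression was wrong: the field is a single bit that is 1 whenever this branch runs, so integer-dividing it by the type size yields 0, while the intended output is the 16-byte offset expressed in elements of the destination type. A small sketch of the corrected formatting, with the hypothetical helper `format_da16_subreg` standing in for the disassembler's `format()` call:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format the Align16 destination subregister suffix.
 * `subreg_bit` is the single subregister bit (0 or 1, i.e. byte offset 0
 * or 16) and `type_size` is the destination type size in bytes.
 * Returns the number of characters written (0 if no suffix is needed). */
static int
format_da16_subreg(char *buf, size_t len, unsigned subreg_bit,
                   unsigned type_size)
{
   if (!subreg_bit)
      return 0;
   /* The bit can only be 1 here, so the byte offset is always 16;
    * print it in units of the destination type. */
   return snprintf(buf, len, ".%u", 16 / type_size);
}
```

For a 4-byte type this prints `.4`, i.e. element 4 of the register, which is byte offset 16.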
[Mesa-dev] [PATCH v2 081/103] i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit data
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 0788ba2..b0bc2d5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -884,10 +884,24 @@ vec4_visitor::move_push_constants_to_pull_constants()

          int uniform = inst->src[i].nr;

-         dst_reg temp = dst_reg(this, glsl_type::vec4_type);
+         dst_reg temp;
+         if (type_sz(inst->src[i].type) != 8) {
+            temp = dst_reg(this, glsl_type::vec4_type);
+            emit_pull_constant_load(block, inst, temp, inst->src[i],
+                                    pull_constant_loc[uniform], src_reg());
+         } else {
+            dst_reg shuffled = dst_reg(this, glsl_type::dvec4_type);
+            dst_reg shuffled_float = retype(shuffled, BRW_REGISTER_TYPE_F);
+
+            emit_pull_constant_load(block, inst, shuffled_float, inst->src[i],
+                                    pull_constant_loc[uniform], src_reg());
+            emit_pull_constant_load(block, inst, offset(shuffled_float, 1),
+                                    offset(inst->src[i], 1),
+                                    pull_constant_loc[uniform], src_reg());

-         emit_pull_constant_load(block, inst, temp, inst->src[i],
-                                 pull_constant_loc[uniform], src_reg());
+            temp = dst_reg(this, glsl_type::dvec4_type);
+            shuffle_64bit_data(temp, src_reg(shuffled), false, block, inst);
+         }

          inst->src[i].file = temp.file;
          inst->src[i].nr = temp.nr;
-- 
2.7.4
[Mesa-dev] [PATCH v2 045/103] i965: move the group field from fs_inst to backend_instruction.
Just like the exec_size, we are going to need this in the vec4 backend
when we implement a simd splitting pass.
---
 src/mesa/drivers/dri/i965/brw_ir_fs.h | 9 -
 src/mesa/drivers/dri/i965/brw_shader.h | 9 +
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 1 +
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h b/src/mesa/drivers/dri/i965/brw_ir_fs.h
index c569bd4..cad3712 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
@@ -367,15 +367,6 @@ public:

    uint8_t sources; /**< Number of fs_reg sources. */

-   /**
-    * Channel group from the hardware execution and predication mask that
-    * should be applied to the instruction. The subset of channel enable
-    * signals (calculated from the EU control flow and predication state)
-    * given by [group, group + exec_size) will be used to mask GRF writes and
-    * any other side effects of the instruction.
-    */
-   uint8_t group;
-
    bool eot:1;
    bool pi_noperspective:1; /**< Pixel interpolator noperspective flag */
 };
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h
index aca26dc..0c8f296 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -140,6 +140,15 @@ struct backend_instruction {
     */
    uint8_t exec_size;

+   /**
+    * Channel group from the hardware execution and predication mask that
+    * should be applied to the instruction. The subset of channel enable
+    * signals (calculated from the EU control flow and predication state)
+    * given by [group, group + exec_size) will be used to mask GRF writes and
+    * any other side effects of the instruction.
+    */
+   uint8_t group;
+
    uint32_t offset; /**< spill/unspill offset or texture offset bitfield */
    uint8_t mlen; /**< SEND message length */
    int8_t base_mrf; /**< First MRF in the SEND message, if mlen is nonzero. */
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 4e7515c..75c60a0 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -55,6 +55,7 @@ vec4_instruction::vec4_instruction(enum opcode opcode, const dst_reg &dst,
    this->base_mrf = 0;
    this->offset = 0;
    this->exec_size = 8;
+   this->group = 0;
    this->size_written = (dst.file == BAD_FILE ?
                          0 : this->exec_size * type_sz(dst.type));
    this->annotation = NULL;
-- 
2.7.4
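The `[group, group + exec_size)` range described in the moved comment can be made concrete as a bit mask over channel-enable signals. A minimal sketch, assuming at most 32 channels; `group_channel_mask` is a hypothetical helper, not part of the i965 backend:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the channel-enable bits selected by an instruction
 * with the given `group` and `exec_size`, i.e. the half-open range
 * [group, group + exec_size), as a 32-bit mask. */
static uint32_t
group_channel_mask(unsigned group, unsigned exec_size)
{
   assert(exec_size >= 1 && group + exec_size <= 32);
   /* Avoid the undefined 32-bit shift when exec_size covers every channel. */
   uint32_t bits = exec_size == 32 ? UINT32_MAX
                                   : ((UINT32_C(1) << exec_size) - 1);
   return bits << group;
}
```

With the defaults the patch gives vec4 instructions (`group = 0`, `exec_size = 8`), the mask is `0xff`.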
[Mesa-dev] [PATCH v2 031/103] i965/vec4: implement hardware workaround for align16 double to float conversion
From the BDW PRM, Workarounds chapter:

   "DF->f format conversion for Align16 has wrong emask calculation when
    source is immediate."

So detect the case and move the immediate source to a VGRF before we
attempt the conversion.

Notice that Broadwell and later are strictly scalar at the moment though,
so this is not really necessary.
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 94d0161..0170d21 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1077,6 +1077,17 @@ void
 vec4_visitor::emit_double_to_single(dst_reg dst, src_reg src, bool saturate,
                                     brw_reg_type single_type)
 {
+   /* BDW PRM vol 15 - workarounds:
+    * DF->f format conversion for Align16 has wrong emask calculation when
+    * source is immediate.
+    */
+   if (devinfo->gen == 8 && single_type == BRW_REGISTER_TYPE_F &&
+       src.file == BRW_IMMEDIATE_VALUE) {
+      dst_reg fixed_src = dst_reg(this, glsl_type::dvec4_type);
+      emit(MOV(fixed_src, src));
+      src = src_reg(fixed_src);
+   }
+
    dst_reg temp = dst_reg(this, glsl_type::dvec4_type);
    emit(MOV(temp, src));
-- 
2.7.4
[Mesa-dev] [PATCH v2 054/103] i965/vec4: translate 64-bit swizzles to 32-bit
The hardware can only operate with 32-bit swizzles, which is a rather
limiting restriction. However, the idea is not to expose this to the
optimization passes, which would be a mess to deal with. Instead, we let
the bulk of the vec4 backend ignore this fact and we fix the swizzles
right at codegen time.

At the moment the pass only needs to handle single value swizzles
thanks to the scalarization pass that runs before it.

Notice that this only works for X/Y swizzles. We will add support for
Z/W swizzles in the next patch, since they need a bit more work.

v2 (Sam):
 - Do not expand swizzle of 64-bit immediate values.

v3:
 - Do this after translation to hardware registers instead of doing it
   right before so we don't need the force_vstride0 flag (Curro).
 - Squashed patch that included FIXED_GRF in the list of register files
   that need this translation (Iago).
 - Remove swizzle assignments for VGRF and UNIFORM files in
   convert_to_hw_regs(); they will be set by apply_logical_swizzle() (Iago).

Signed-off-by: Samuel Iglesias Gonsálvez
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 49 +++---
 src/mesa/drivers/dri/i965/brw_vec4.h | 2 ++
 2 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index b15fcee..b37dd59 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1891,7 +1891,6 @@ vec4_visitor::convert_to_hw_regs()
             unsigned width = REG_SIZE / 2 / MAX2(4, type_size);
             reg = byte_offset(brw_vecn_grf(width, src.nr, 0), src.offset);
             reg.type = src.type;
-            reg.swizzle = src.swizzle;
             reg.abs = src.abs;
             reg.negate = src.negate;
             break;
@@ -1905,7 +1904,6 @@ vec4_visitor::convert_to_hw_regs()
                                     src.offset), 0, width, 1);
             reg.type = src.type;
-            reg.swizzle = src.swizzle;
             reg.abs = src.abs;
             reg.negate = src.negate;

@@ -1914,8 +1912,13 @@ vec4_visitor::convert_to_hw_regs()
             break;
          }

-         case ARF:
          case FIXED_GRF:
+            if (type_sz(src.type) == 8) {
+               reg = src.as_brw_reg();
+               break;
+            }
+            /* fallthrough */
+         case ARF:
          case IMM:
             continue;

@@ -1929,6 +1932,7 @@ vec4_visitor::convert_to_hw_regs()
             unreachable("not reached");
          }

+         apply_logical_swizzle(&reg, inst, i);
          src = reg;
       }

@@ -2226,6 +2230,45 @@ vec4_visitor::scalarize_df()
    return progress;
 }

+/* The align16 hardware can only do 32-bit swizzle channels, so we need to
+ * translate the logical 64-bit swizzle channels that we use in the Vec4 IR
+ * to 32-bit swizzle channels in hardware registers.
+ *
+ * @inst and @arg identify the original vec4 IR source operand we need to
+ * translate the swizzle for and @hw_reg is the hardware register where we
+ * will write the hardware swizzle to use.
+ *
+ * This pass assumes that Align16/DF instructions have been fully scalarized
+ * previously so there is just one 64-bit swizzle channel to deal with for any
+ * given Vec4 IR source.
+ */
+void
+vec4_visitor::apply_logical_swizzle(struct brw_reg *hw_reg,
+                                    vec4_instruction *inst, int arg)
+{
+   src_reg reg = inst->src[arg];
+
+   if (reg.file == BAD_FILE || reg.file == BRW_IMMEDIATE_VALUE)
+      return;
+
+   /* If this is not a 64-bit operand or this is a scalar instruction we don't
+    * need to do anything about the swizzles.
+    */
+   if(type_sz(reg.type) < 8 || is_align1_df(inst)) {
+      hw_reg->swizzle = reg.swizzle;
+      return;
+   }
+
+   /* Otherwise we should have scalarized the instruction, so take the single
+    * 64-bit logical swizzle channel and translate it to 32-bit
+    */
+   assert(brw_is_single_value_swizzle(reg.swizzle));
+
+   unsigned swizzle = BRW_GET_SWZ(reg.swizzle, 0);
+   hw_reg->swizzle = BRW_SWIZZLE4(swizzle * 2, swizzle * 2 + 1,
+                                  swizzle * 2, swizzle * 2 + 1);
+}
+
 bool
 vec4_visitor::run()
 {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 03c7345..7e51c41 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -164,6 +164,8 @@ public:
    bool lower_simd_width();
    bool scalarize_df();
+   void apply_logical_swizzle(struct brw_reg *hw_reg,
+                              vec4_instruction *inst, int arg);

    vec4_instruction *emit(vec4_instruction *inst);
-- 
2.7.4
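The core of `apply_logical_swizzle()` remaps one logical 64-bit channel `c` to the 32-bit selectors `2c` and `2c + 1`. Below is a standalone sketch; the macros are local stand-ins assumed to match the backend's 2-bits-per-channel `BRW_SWIZZLE4`/`BRW_GET_SWZ` packing, and `translate_64bit_swizzle` is a hypothetical name. The second assertion encodes the X/Y-only limitation from the commit message: for Z or W, `2c + 1` would not fit in a 2-bit selector.

```c
#include <assert.h>

/* Local stand-ins for the 2-bits-per-channel swizzle packing
 * (assumed to match BRW_SWIZZLE4 / BRW_GET_SWZ). */
#define SWIZZLE4(a, b, c, d) (((a) << 0) | ((b) << 2) | ((c) << 4) | ((d) << 6))
#define GET_SWZ(swz, idx) (((swz) >> ((idx) * 2)) & 0x3)

/* Hypothetical standalone version of the translation: map a single-value
 * 64-bit logical swizzle to the 32-bit hardware swizzle that reads both
 * 32-bit halves of that 64-bit channel. */
static unsigned
translate_64bit_swizzle(unsigned logical_swz)
{
   unsigned c = GET_SWZ(logical_swz, 0);
   /* Single-value swizzle: all four selectors are the same channel. */
   assert(logical_swz == SWIZZLE4(c, c, c, c));
   /* Only X/Y work: 2 * c + 1 must still be a valid selector (<= 3). */
   assert(c < 2);
   return SWIZZLE4(c * 2, c * 2 + 1, c * 2, c * 2 + 1);
}
```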