On Fri, Jul 14, 2017 at 3:23 PM, Francisco Jerez <curroje...@riseup.net> wrote: > Matt Turner <matts...@gmail.com> writes: > >> The implementations of the ARB_shader_group_vote intrinsics will >> explicitly write the flag as the destination register. >> --- >> src/intel/compiler/brw_fs.cpp | 12 ++++++++++-- >> 1 file changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp >> index 43b6e34204..97908a4563 100644 >> --- a/src/intel/compiler/brw_fs.cpp >> +++ b/src/intel/compiler/brw_fs.cpp >> @@ -890,9 +890,17 @@ fs_inst::flags_written() const >> opcode != BRW_OPCODE_WHILE)) || >> opcode == FS_OPCODE_MOV_DISPATCH_TO_FLAGS) { >> return flag_mask(this); >> - } else { >> - return 0; >> + } else if (dst.file == ARF) { >> + if (dst.nr == BRW_ARF_FLAG + 0 && dst.subnr == 0) >> + return 0b0001; >> + if (dst.nr == BRW_ARF_FLAG + 0 && dst.subnr == 1) >> + return 0b0010; >> + if (dst.nr == BRW_ARF_FLAG + 1 && dst.subnr == 0) >> + return 0b0100; >> + if (dst.nr == BRW_ARF_FLAG + 1 && dst.subnr == 1) >> + return 0b1000; > > This doesn't look right to me, it assumes that flag registers are 16b > (they're 32b), and that you can only write a single byte of the flag > register at a time, which is hardly ever the case.
I don't think I read the implementation of flag_mask(), just the comment, and thought that it returned a bit per flag subregister. > > I think you want something along the lines of: > > | unsigned > | bit_mask(unsigned n) > | { > | return (n >= CHAR_BIT * sizeof(bit_mask(n)) ? ~0u : (1 << n) - 1); > | } > | > | unsigned > | flag_mask(const fs_reg &r, unsigned sz) > | { > | if (r.file == ARF) { > | const unsigned start = (r.nr - BRW_ARF_FLAG) * 4 + r.subnr; > | const unsigned end = start + sz; > | return bit_mask(end) & ~bit_mask(start); > | } else { > | return 0; > | } > | } > > [Yes, this should give you the right result (i.e. zero) for ARF > registers outside the range of the (flag) ARF window of the 32-bit > bitmask.] Notice the similarity between this and the current > flag_mask() overload. This will allow you to easily re-use the same > logic for both flags_read() and flags_written(), you just need to pass > 'inst->src[i], inst->size_read(i)' and 'inst->dst, inst->size_written' > as arguments respectively, ORing the results for each source in the > former case. Wow, thanks. That's fantastic. I've made this change (and also changed the authorship on 05/20 and 12/20 to you! :) _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev