El 14/07/18 a las 00:14, Francisco Jerez escribió: > Jose Maria Casanova Crespo <jmcasan...@igalia.com> writes: > >> For a register source/destination of an instruction the function returns >> the read/write byte pattern of a 32-byte registers as a unsigned int. >> >> The returned pattern takes into account the exec_size of the instruction, >> the type bitsize, the stride and if the register is source or destination. >> >> The objective of the functions if to help to know the read/written bytes >> of the instructions to improve the liveness analysis for partial read/writes. >> >> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL >> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize >> parameter they have a different read pattern. >> --- >> src/intel/compiler/brw_fs.cpp | 183 +++++++++++++++++++++++++++++++++ >> src/intel/compiler/brw_ir_fs.h | 1 + >> 2 files changed, 184 insertions(+) >> >> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp >> index 2b8363ca362..f3045c4ff6c 100644 >> --- a/src/intel/compiler/brw_fs.cpp >> +++ b/src/intel/compiler/brw_fs.cpp >> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const >> this->dst.offset % REG_SIZE != 0); >> } >> >> +/** >> + * Returns a 32-bit uint whose bits represent if the associated register >> byte >> + * has been read/written by the instruction. The returned pattern takes into >> + * account the exec_size of the instruction, the type bitsize and the >> register >> + * stride and the register is source or destination for the instruction. >> + * >> + * The objective of this function is to identify which parts of the register >> + * are read or written for operations that don't read/write a full register. >> + * So we can identify in live range variable analysis if a partial write has >> + * completelly defined the part of the register used by a partial read. So >> we >> + * avoid extending the liveness range because all data read was already >> + * defined although the wasn't completely written. >> + */ >> +unsigned >> +fs_inst::register_byte_use_pattern(const fs_reg &r, boolean is_dst) const >> +{ >> + if (is_dst) {
> Please split into two functions (like fs_inst::src_read and > ::src_written) since that would make the call-sites of this method more > self-documenting than a boolean parameter. You should be able to share > code by refactoring the common logic into a separate function (see below > for some suggestions on how that could be achieved). Sure, it would improve readability and simplifies the logic, I've chosen dst_write_pattern and src_read_pattern. > >> + /* We don't know what is written so we return the worts case */ > > "worst" Fixed. >> + if (this->predicate && this->opcode != BRW_OPCODE_SEL) >> + return 0; >> + /* We assume that send destinations are completely written */ >> + if (this->is_send_from_grf()) >> + return ~0u; > > Some send-like instructions won't be caught by this condition, you > should check for this->mlen != 0 in addition. Would it be enough to check for (this->mlen > 0) and forget about is_send_from_grf? I am using this approach in v2 I am sending. >> + } else { >> + /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned >> + * so the read pattern depends on the bitsize stored at src[4] >> + */ >> + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && >> + this->src[1].nr == r.nr) { > I feel uncomfortable about attempting to guess the source the caller is > referring to by comparing the registers for equality. E.g. you could > potentially end up with two sources that compare equal but have > different semantics (e.g. as a result of CSE) which might cause it to > get the wrong answer. It would probably be better to pass a source > index and a byte offset as argument instead of an fs_reg. I've didn't thought about CSE, I'm now receiving the number of source and the reg_offset. I'm using reg_offset instead of byte offsets as it simplifies the logic. Now we are using always the base src register to do all the calculation >> + switch (this->src[4].ud) { >> + case 32: >> + return ~0u; >> + case 16: >> + return 0x33333333; >> + case 8: >> + return 0x11111111; >> + default: >> + unreachable("Unsupported bitsize at >> byte_scattered_write_logical"); >> + } > > Replace the above switch statement with a call to "periodic_mask(8, 4, > this->src[4].ud / 8)" (see below for the definition). Ok. >> + } >> + /* As for byte_scattered_write_logical but we need to take into >> account >> + * that data written are in the payload offset 32 with SIMD8 and >> offset >> + * 64 with SIMD16. >> + */ >> + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE && >> + this->src[0].nr == r.nr) { >> + fs_reg payload = this->src[0]; >> + payload.offset = REG_SIZE * this->exec_size / 8; > > byte_offset() is your friend. I've removed the overlap logic, and I'm just checking if we are in the reg_offset 1 on SIMD8 or reg_offset 2-3 in SIMD16. >> + if (regions_overlap(r, REG_SIZE, >> + payload, REG_SIZE * this->exec_size / 8)) { >> + switch (this->src[2].ud) { >> + case 32: >> + return ~0u; >> + case 16: >> + return 0x33333333; >> + case 8: >> + return 0x11111111; >> + default: >> + unreachable("Unsupported bitsize at byte_scattered_write"); >> + } > > Replace the above switch statement with a call to "periodic_mask(8, 4, > this->src[2].ud / 8)". Ok. >> + } else { >> + return ~0u; >> + } >> + } >> + } >> + >> + /* We define the most conservative value in order to calculate liveness >> + * range. If it is a destination nothing is defined and if is a source >> + * all the bytes of the register could be read. So for release builds >> + * the unreachables would have always safe return value. */ >> + unsigned pattern = is_dst ? 0 : ~0u; >> + >> + /* In the general case we calculate the pattern for a specific register >> + * on base of the type_size and stride. We calculate the SIMD8 pattern >> + * and then we adjust the patter if needed for different exec_sizes >> + * and offset >> + */ >> + switch (type_sz(r.type)){ >> + case 1: >> + switch (r.stride) { >> + case 0: >> + pattern = 0X1; >> + break; >> + case 1: >> + pattern = 0xff; >> + break; >> + case 2: >> + pattern = 0x5555; >> + break; >> + case 4: >> + pattern = 0x11111111; >> + break; >> + case 8: >> + pattern = 0x01010101; >> + break; >> + default: >> + unreachable("Unknown pattern unsupported 8-bit stride"); >> + } >> + break; >> + case 2: >> + switch (r.stride) { >> + case 0: >> + pattern = 0X3; >> + break; >> + case 1: >> + pattern = 0xffff; >> + break; >> + case 2: >> + pattern = 0x33333333; >> + break; >> + case 4: >> + pattern = 0x03030303; >> + break; >> + case 8: >> + pattern = 0x00030003; >> + break; >> + default: >> + unreachable("Unknown pattern unsupported 16-bit stride"); >> + } >> + break; >> + case 4: >> + switch (r.stride) { >> + case 0: >> + pattern = 0Xf; >> + break; >> + case 1: >> + pattern = ~0u; >> + break; >> + case 2: >> + pattern = 0x0f0f0f0f; >> + break; >> + case 4: >> + pattern = 0x000f000f; >> + break; >> + default: >> + unreachable("Unknown pattern unsupported 32-bit stride"); >> + } >> + break; >> + case 8: >> + switch (r.stride) { >> + case 0: >> + pattern = 0Xff; >> + break; >> + case 1: >> + pattern = ~0u; >> + break; >> + case 2: >> + pattern = 0x00ff00ff; >> + break; >> + case 4: >> + pattern = 0xff; >> + break; >> + default: >> + unreachable("Unknown pattern unsupported 64-bit stride"); >> + } >> + break; >> + default: >> + unreachable("Unknown pattern for unsupported bitsize "); >> + } >> + >> + if (this->exec_size > 8 && r.stride * type_sz(r.type) * 8 < REG_SIZE) { >> + /* For exec_size greater than SIMD8 we repeat the pattern until it >> + * represents a full register already represent a full register */ >> + pattern = pattern | (pattern << (8 * r.stride * type_sz(r.type))); >> + if (this->exec_size > 16 && r.stride * type_sz(r.type) * 16 < >> REG_SIZE) >> + pattern = pattern | (pattern << (16 * r.stride * type_sz(r.type))); >> + } else if (this->exec_size < 8 && >> + r.stride * type_sz(r.type) * this->exec_size < REG_SIZE) { >> + /* For exec_size smaller than SIMD8 we reduce the pattern if its size >> + * is smaller than a full register. */ >> + pattern = pattern >> (MIN2(REG_SIZE, 8 * type_sz(r.type) * r.stride) - >> + this->exec_size * type_sz(r.type) * r.stride); >> + } >> + > This seems really mad, no clue whether it's correct. Why not replace > the above ~110 lines with a call to the following (fully > untested) 5-LOC function: Your suggestion seems to work perfectly fine, my original approach was trying to avoid the loop of creating the read/write pattern but after testing my v2 I wasn't able to notice any performance difference running shader-db and having the same results. I was originally trying that SIMD8 patterns were already constants. Sorry for the added complexity. > > | uint32_t > | periodic_mask(unsigned count, unsigned step, unsigned bits) > | { > | uint32_t m = (count ? (1 << bits) - 1 : 0); > | const unsigined max = MIN2(count * step, sizeof(m) * CHAR_BITS); > | > | for (unsigned shift = step; shift < max; shift *= 2) > | m |= m << shift; > | > | return m; > | } I've used your function just changing the sizeof(m) * CHAR_BITS for the REG_SIZE, to not include the limits.h. And I've also included an offset parameter that allows us to shift the bits of the pattern when we have an offset inside the register. >> + /* We adjust the pattern to the byte_offset of the register */ >> + pattern = pattern << (r.offset % REG_SIZE); >> + > > This doesn't really work except for r equal to the first GRF read by the > source. Regions with non-zero sub-GRF offset that straddle multiple > GRFs are not really very common at the IR level, but they're allowed by > the hardware with some restrictions, so it would make sense to at least > handle them conservatively in order to keep things from breaking silently. I included an assert in the periodic_mask function to detect if the combination of offset and mask goes over the REG_SIZE, so we would detect an straddle at this level of the IR. assert(offset + max - (step - bits) <= REG_SIZE); Thanks for the review, I think that v2 has better shape. Chema >> + assert(pattern); >> + >> + return pattern; >> +} >> + >> + >> unsigned >> fs_inst::components_read(unsigned i) const >> { >> diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h >> index 92dad269a34..5ea6294b8ad 100644 >> --- a/src/intel/compiler/brw_ir_fs.h >> +++ b/src/intel/compiler/brw_ir_fs.h >> @@ -350,6 +350,7 @@ public: >> bool equals(fs_inst *inst) const; >> bool is_send_from_grf() const; >> bool is_partial_write() const; >> + unsigned register_byte_use_pattern(const fs_reg &r, boolean is_dst) >> const; >> bool is_copy_payload(const brw::simple_allocator &grf_alloc) const; >> unsigned components_read(unsigned i) const; >> unsigned size_read(int arg) const; >> -- >> 2.17.1 >> >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >> >> >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev