El 26/07/18 a las 20:02, Francisco Jerez escribió: > Chema Casanova <jmcasan...@igalia.com> writes: > >> El 20/07/18 a las 22:10, Francisco Jerez escribió: >>> Chema Casanova <jmcasan...@igalia.com> writes: >>> >>>> El 20/07/18 a las 00:34, Francisco Jerez escribió: >>>>> Chema Casanova <jmcasan...@igalia.com> writes: >>>>> >>>>>> El 14/07/18 a las 00:14, Francisco Jerez escribió: >>>>>>> Jose Maria Casanova Crespo <jmcasan...@igalia.com> writes: >>>>>>> >>>>>>>> For a register source/destination of an instruction the function >>>>>>>> returns >>>>>>>> the read/write byte pattern of a 32-byte registers as a unsigned int. >>>>>>>> >>>>>>>> The returned pattern takes into account the exec_size of the >>>>>>>> instruction, >>>>>>>> the type bitsize, the stride and if the register is source or >>>>>>>> destination. >>>>>>>> >>>>>>>> The objective of the functions if to help to know the read/written >>>>>>>> bytes >>>>>>>> of the instructions to improve the liveness analysis for partial >>>>>>>> read/writes. >>>>>>>> >>>>>>>> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL >>>>>>>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize >>>>>>>> parameter they have a different read pattern. >>>>>>>> --- >>>>>>>> src/intel/compiler/brw_fs.cpp | 183 +++++++++++++++++++++++++++++++++ >>>>>>>> src/intel/compiler/brw_ir_fs.h | 1 + >>>>>>>> 2 files changed, 184 insertions(+) >>>>>>>> >>>>>>>> diff --git a/src/intel/compiler/brw_fs.cpp >>>>>>>> b/src/intel/compiler/brw_fs.cpp >>>>>>>> index 2b8363ca362..f3045c4ff6c 100644 >>>>>>>> --- a/src/intel/compiler/brw_fs.cpp >>>>>>>> +++ b/src/intel/compiler/brw_fs.cpp >>>>>>>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const >>>>>>>> this->dst.offset % REG_SIZE != 0); >>>>>>>> } >>>>>>>> >>>>>>>> +/** >>>>>>>> + * Returns a 32-bit uint whose bits represent if the associated >>>>>>>> register byte >>>>>>>> + * has been read/written by the instruction. The returned pattern >>>>>>>> takes into >>>>>>>> + * account the exec_size of the instruction, the type bitsize and the >>>>>>>> register >>>>>>>> + * stride and the register is source or destination for the >>>>>>>> instruction. >>>>>>>> + * >>>>>>>> + * The objective of this function is to identify which parts of the >>>>>>>> register >>>>>>>> + * are read or written for operations that don't read/write a full >>>>>>>> register. >>>>>>>> + * So we can identify in live range variable analysis if a partial >>>>>>>> write has >>>>>>>> + * completelly defined the part of the register used by a partial >>>>>>>> read. So we >>>>>>>> + * avoid extending the liveness range because all data read was >>>>>>>> already >>>>>>>> + * defined although the wasn't completely written. >>>>>>>> + */ >>>>>>>> +unsigned >>>>>>>> +fs_inst::register_byte_use_pattern(const fs_reg &r, boolean is_dst) >>>>>>>> const >>>>>>>> +{ >>>>>>>> + if (is_dst) { >>>>>> >>>>>>> Please split into two functions (like fs_inst::src_read and >>>>>>> ::src_written) since that would make the call-sites of this method more >>>>>>> self-documenting than a boolean parameter. You should be able to share >>>>>>> code by refactoring the common logic into a separate function (see below >>>>>>> for some suggestions on how that could be achieved). >>>>>> >>>>>> Sure, it would improve readability and simplifies the logic, I've chosen >>>>>> dst_write_pattern and src_read_pattern. >>>>>> >>>>>>> >>>>>>>> + /* We don't know what is written so we return the worts case */ >>>>>>> >>>>>>> "worst" >>>>>> >>>>>> Fixed. >>>>>> >>>>>>>> + if (this->predicate && this->opcode != BRW_OPCODE_SEL) >>>>>>>> + return 0; >>>>>>>> + /* We assume that send destinations are completely written */ >>>>>>>> + if (this->is_send_from_grf()) >>>>>>>> + return ~0u; >>>>>>> >>>>>>> Some send-like instructions won't be caught by this condition, you >>>>>>> should check for this->mlen != 0 in addition. >>>>>> >>>>>> Would it be enough to check for (this->mlen > 0) and forget about >>>>>> is_send_from_grf? I am using this approach in v2 I am sending. >>>>>> >>>>> >>>>> I don't think the mlen > 0 condition would catch all cases either... >>>>> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC. You probably need both >>>>> conditions. Sucks... >>>> >>>> That is true, so now we have the: >>>> (this->is_send_from_grf() || this->mlen != 0) >>>> >>>>>>>> + } else { >>>>>>>> + /* byte_scattered_write_logical pattern of src[1] is 32-bit >>>>>>>> aligned >>>>>>>> + * so the read pattern depends on the bitsize stored at src[4] >>>>>>>> + */ >>>>>>>> + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL >>>>>>>> && >>>>>>>> + this->src[1].nr == r.nr) { >>>>>> >>>>>>> I feel uncomfortable about attempting to guess the source the caller is >>>>>>> referring to by comparing the registers for equality. E.g. you could >>>>>>> potentially end up with two sources that compare equal but have >>>>>>> different semantics (e.g. as a result of CSE) which might cause it to >>>>>>> get the wrong answer. It would probably be better to pass a source >>>>>>> index and a byte offset as argument instead of an fs_reg. >>>>>> >>>>>> I've didn't thought about CSE, I'm now receiving the number of source >>>>>> and the reg_offset. I'm using reg_offset instead of byte offsets as it >>>>>> simplifies the logic. Now we are using always the base src register to >>>>>> do all the calculation >>>>>>>> + switch (this->src[4].ud) { >>>>>>>> + case 32: >>>>>>>> + return ~0u; >>>>>>>> + case 16: >>>>>>>> + return 0x33333333; >>>>>>>> + case 8: >>>>>>>> + return 0x11111111; >>>>>>>> + default: >>>>>>>> + unreachable("Unsupported bitsize at >>>>>>>> byte_scattered_write_logical"); >>>>>>>> + } >>>>>>> >>>>>>> Replace the above switch statement with a call to "periodic_mask(8, 4, >>>>>>> this->src[4].ud / 8)" (see below for the definition). >>>>>> >>>>>> Ok. >>>>>> >>>>>>>> + } >>>>>>>> + /* As for byte_scattered_write_logical but we need to take into >>>>>>>> account >>>>>>>> + * that data written are in the payload offset 32 with SIMD8 >>>>>>>> and offset >>>>>>>> + * 64 with SIMD16. >>>>>>>> + */ >>>>>>>> + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE && >>>>>>>> + this->src[0].nr == r.nr) { >>>>>>>> + fs_reg payload = this->src[0]; >>>>>>>> + payload.offset = REG_SIZE * this->exec_size / 8; >>>>>>> >>>>>>> byte_offset() is your friend. >>>>>> >>>>>> I've removed the overlap logic, and I'm just checking if we are in the >>>>>> reg_offset 1 on SIMD8 or reg_offset 2-3 in SIMD16. >>>>>> >>>>>>>> + if (regions_overlap(r, REG_SIZE, >>>>>>>> + payload, REG_SIZE * this->exec_size / >>>>>>>> 8)) { >>>>>>>> + switch (this->src[2].ud) { >>>>>>>> + case 32: >>>>>>>> + return ~0u; >>>>>>>> + case 16: >>>>>>>> + return 0x33333333; >>>>>>>> + case 8: >>>>>>>> + return 0x11111111; >>>>>>>> + default: >>>>>>>> + unreachable("Unsupported bitsize at >>>>>>>> byte_scattered_write"); >>>>>>>> + } >>>>>>> >>>>>>> Replace the above switch statement with a call to "periodic_mask(8, 4, >>>>>>> this->src[2].ud / 8)". >>>>>> >>>>>> Ok. >>>>>> >>>>>>>> + } else { >>>>>>>> + return ~0u; >>>>>>>> + } >>>>>>>> + } >>>>>>>> + } >>>>>>>> + >>>>>>>> + /* We define the most conservative value in order to calculate >>>>>>>> liveness >>>>>>>> + * range. If it is a destination nothing is defined and if is a >>>>>>>> source >>>>>>>> + * all the bytes of the register could be read. So for release >>>>>>>> builds >>>>>>>> + * the unreachables would have always safe return value. */ >>>>>>>> + unsigned pattern = is_dst ? 0 : ~0u; >>>>>>>> + >>>>>>>> + /* In the general case we calculate the pattern for a specific >>>>>>>> register >>>>>>>> + * on base of the type_size and stride. We calculate the SIMD8 >>>>>>>> pattern >>>>>>>> + * and then we adjust the patter if needed for different exec_sizes >>>>>>>> + * and offset >>>>>>>> + */ >>>>>>>> + switch (type_sz(r.type)){ >>>>>>>> + case 1: >>>>>>>> + switch (r.stride) { >>>>>>>> + case 0: >>>>>>>> + pattern = 0X1; >>>>>>>> + break; >>>>>>>> + case 1: >>>>>>>> + pattern = 0xff; >>>>>>>> + break; >>>>>>>> + case 2: >>>>>>>> + pattern = 0x5555; >>>>>>>> + break; >>>>>>>> + case 4: >>>>>>>> + pattern = 0x11111111; >>>>>>>> + break; >>>>>>>> + case 8: >>>>>>>> + pattern = 0x01010101; >>>>>>>> + break; >>>>>>>> + default: >>>>>>>> + unreachable("Unknown pattern unsupported 8-bit stride"); >>>>>>>> + } >>>>>>>> + break; >>>>>>>> + case 2: >>>>>>>> + switch (r.stride) { >>>>>>>> + case 0: >>>>>>>> + pattern = 0X3; >>>>>>>> + break; >>>>>>>> + case 1: >>>>>>>> + pattern = 0xffff; >>>>>>>> + break; >>>>>>>> + case 2: >>>>>>>> + pattern = 0x33333333; >>>>>>>> + break; >>>>>>>> + case 4: >>>>>>>> + pattern = 0x03030303; >>>>>>>> + break; >>>>>>>> + case 8: >>>>>>>> + pattern = 0x00030003; >>>>>>>> + break; >>>>>>>> + default: >>>>>>>> + unreachable("Unknown pattern unsupported 16-bit stride"); >>>>>>>> + } >>>>>>>> + break; >>>>>>>> + case 4: >>>>>>>> + switch (r.stride) { >>>>>>>> + case 0: >>>>>>>> + pattern = 0Xf; >>>>>>>> + break; >>>>>>>> + case 1: >>>>>>>> + pattern = ~0u; >>>>>>>> + break; >>>>>>>> + case 2: >>>>>>>> + pattern = 0x0f0f0f0f; >>>>>>>> + break; >>>>>>>> + case 4: >>>>>>>> + pattern = 0x000f000f; >>>>>>>> + break; >>>>>>>> + default: >>>>>>>> + unreachable("Unknown pattern unsupported 32-bit stride"); >>>>>>>> + } >>>>>>>> + break; >>>>>>>> + case 8: >>>>>>>> + switch (r.stride) { >>>>>>>> + case 0: >>>>>>>> + pattern = 0Xff; >>>>>>>> + break; >>>>>>>> + case 1: >>>>>>>> + pattern = ~0u; >>>>>>>> + break; >>>>>>>> + case 2: >>>>>>>> + pattern = 0x00ff00ff; >>>>>>>> + break; >>>>>>>> + case 4: >>>>>>>> + pattern = 0xff; >>>>>>>> + break; >>>>>>>> + default: >>>>>>>> + unreachable("Unknown pattern unsupported 64-bit stride"); >>>>>>>> + } >>>>>>>> + break; >>>>>>>> + default: >>>>>>>> + unreachable("Unknown pattern for unsupported bitsize "); >>>>>>>> + } >>>>>>>> + >>>>>>>> + if (this->exec_size > 8 && r.stride * type_sz(r.type) * 8 < >>>>>>>> REG_SIZE) { >>>>>>>> + /* For exec_size greater than SIMD8 we repeat the pattern until >>>>>>>> it >>>>>>>> + * represents a full register already represent a full register >>>>>>>> */ >>>>>>>> + pattern = pattern | (pattern << (8 * r.stride * >>>>>>>> type_sz(r.type))); >>>>>>>> + if (this->exec_size > 16 && r.stride * type_sz(r.type) * 16 < >>>>>>>> REG_SIZE) >>>>>>>> + pattern = pattern | (pattern << (16 * r.stride * >>>>>>>> type_sz(r.type))); >>>>>>>> + } else if (this->exec_size < 8 && >>>>>>>> + r.stride * type_sz(r.type) * this->exec_size < >>>>>>>> REG_SIZE) { >>>>>>>> + /* For exec_size smaller than SIMD8 we reduce the pattern if >>>>>>>> its size >>>>>>>> + * is smaller than a full register. */ >>>>>>>> + pattern = pattern >> (MIN2(REG_SIZE, 8 * type_sz(r.type) * >>>>>>>> r.stride) - >>>>>>>> + this->exec_size * type_sz(r.type) * >>>>>>>> r.stride); >>>>>>>> + } >>>>>>>> + >>>>>> >>>>>>> This seems really mad, no clue whether it's correct. Why not replace >>>>>>> the above ~110 lines with a call to the following (fully >>>>>>> untested) 5-LOC function: >>>>>> >>>>>> Your suggestion seems to work perfectly fine, my original approach was >>>>>> trying to avoid the loop of creating the read/write pattern but after >>>>>> testing my v2 I wasn't able to notice any performance difference running >>>>>> shader-db and having the same results. I was originally trying that >>>>>> SIMD8 patterns were already constants. Sorry for the added complexity. >>>>>> >>>>> >>>>> The loop runs for a logarithmic number of iterations though, so it has >>>>> the exact same run-time complexity as your original patch, roughly the >>>>> same amount of branches at run-time (but possibly less indirect >>>>> branches!), and it should be compiled into a substantially lower number >>>>> of instructions, which may actually cause it to perform better due to >>>>> more favourable caching. It's hard to tell though which one will >>>>> perform better in practice without benchmarking them, and as you >>>>> probably realized this is so far from being a bottleneck that whatever >>>>> the difference was is likely lost in the noise. So it really doesn't >>>>> matter which one performs better... >>>>> >>>>>>> >>>>>>> | uint32_t >>>>>>> | periodic_mask(unsigned count, unsigned step, unsigned bits) >>>>>>> | { >>>>>>> | uint32_t m = (count ? (1 << bits) - 1 : 0); >>>>>>> | const unsigined max = MIN2(count * step, sizeof(m) * CHAR_BITS); >>>>>>> | >>>>>>> | for (unsigned shift = step; shift < max; shift *= 2) >>>>>>> | m |= m << shift; >>>>>>> | >>>>>>> | return m; >>>>>>> | } >>>>>> >>>>>> I've used your function just changing the sizeof(m) * CHAR_BITS for the >>>>>> REG_SIZE, to not include the limits.h. >>>>> >>>>> My intention was to make the function as agnostic to IR details as >>>>> possible: the only reason there is a limit of 32 bits is because that's >>>>> the size of the type used to hold the return value. Using sizeof makes >>>>> sure that e.g. extending the code to 64 bits is as simple as changing >>>>> the datatype to uint64_t. >>>>> >>>>>> And I've also included an offset parameter that allows us to shift the >>>>>> bits of the pattern when we have an offset inside the register. >>>>>> >>>>> >>>>> That sounds fine. >>>>> >>>>>>>> + /* We adjust the pattern to the byte_offset of the register */ >>>>>>>> + pattern = pattern << (r.offset % REG_SIZE); >>>>>>>> + >>>>>>> >>>>>>> This doesn't really work except for r equal to the first GRF read by the >>>>>>> source. Regions with non-zero sub-GRF offset that straddle multiple >>>>>>> GRFs are not really very common at the IR level, but they're allowed by >>>>>>> the hardware with some restrictions, so it would make sense to at least >>>>>>> handle them conservatively in order to keep things from breaking >>>>>>> silently. >>>>>> >>>>>> >>>>>> I included an assert in the periodic_mask function to detect if the >>>>>> combination of offset and mask goes over the REG_SIZE, so we would >>>>>> detect an straddle at this level of the IR. >>>>>> >>>>>> assert(offset + max - (step - bits) <= REG_SIZE); >>>> >>>>> Uhm... What's the definition of "offset" here? You seem to be passing >>>>> the offset relative to the start of the VGRF modulo REG_SIZE but that >>>>> doesn't really make sense to me whenever "reg_offset" is non-zero. >>>> >>>> Yes I agree on removing this assert in the periodic_mask as it a generic >>>> function. I was to focus on seen the integer as the register :) >>>> >>>> The motivation was because of the following cases that happen with >>>> 8/16-bit more usually: >>>> >>>> (1) mov(16) g12<2>HF g1<16,8,2>HF { align1 1H }; >>>> (2) mov(16) g12.1<2>HF g2<16,8,2>HF { align1 1H }; >>>> >>>> In the dst_write_pattern, at point of calling the general periodic_mask >>>> case we know that instruction is not expected to be a send message >>>> destination. So the pattern is the same for reg_offset = 0 and >>>> reg_offset = 1. >>>> >>> >>> In this specific example, yes, but in general the pattern will not be >>> the same for multiple registers for arbitrary fs_reg::offset values. >> >> I've been running some tests. I've found that the three opcodes reach >> this situation in shader-db (MOV, INDIRECT_MOV and OR). >> >> If we just manage the MOV opcode that we agree that the same pattern is >> repeated for the same source/dst used registers have the same spilling >> reduction that with my original patch. >> >>>> 0x3333333 for case (1) and 0xccccccc for case (2). But without repeating >>>> the pattern offset we would get 0xcccccccc for reg_offset=0 but >>>> 0x333333333 for reg_offset=1 witch would be incorrect. >>>> >>>>> I think you want to pass "src[i].offset % REG_SIZE - reg_offset * >>>>> REG_SIZE" as a signed integer in order to get the offset of the first >>>>> byte actually written by the instruction relative to the first byte of >>>>> the GRF window of the pattern. You don't really need to assert-fail >>>>> when the offset is greater or equal to 32 (which shouldn't actually >>>>> happen in practice), "return 0" gives you the correct behavior. For >>>>> negative offsets (which means the pattern starts after the first byte >>>>> written by the instruction) you can just return ~0u conservatively >>>>> whenever the current logic wouldn't work [assuming you don't feel like >>>>> implementing the code to handle that case accurately ;)]. >>>> >>>> Maybe you feel more comfortable with the following approach for >>>> src_read_pattern.: >>>> >>>> For read sources we also assume that SENDs sources are completely >>>> read so return ~0u except the byte_scattered_write source exceptions. >>>> >>>> if (this->is_send_from_grf() || this->mlen != 0) >>>> return ~0u; >>>> >>>> So after this I think that we can assume that the following condition >>>> should be always correct where source operand must reside within two >>>> adjacent 256-bit registers so the pattern would be periodic for >>>> all reg_offsets and we can use the (this->src[i].offset % REG_SIZE). >>>> >>> >>> That's not a particularly futureproof assumption. I'm okay with >>> handling non-reg_offset-periodic cases inaccurately for the moment for >>> the sake of simplicity, but there is no reason to have the code blow up >>> in such cases if you could just return ~0u. >> >> I've added some modifications in the v3, we return ~0u for reads and 0u >> for writes by default, but i added the case of !this->is_partial_write() >> that will return ~0u for writes, that won't we reached with current use, >> but to be coherent with any future uses of this function. >> >>>> assert(reg_offset < DIV_ROUND_UP(this->src[i].stride ? >>>> this->src[i].stride * type_sz(this->src[i].type) * this->exec_size : >>>> type_sz(this->src[i].type), REG_SIZE)); >>>> >> >>> I don't think it's useful to assert-fail on this condition, since it >>> doesn't handle all cases where the code below will be broken for a >>> certain valid IR, and the condition it does protect against can be >>> handled trivially in periodic_mask(), by returning zero when the 32-byte >>> pattern window falls outside the region read or written by the >>> instruction -- But currently you can't even know whether that's the >>> case, because periodic_mask() is unaware where the 32-byte window starts >>> relative to the source region, which is *all* it cares about, instead >>> you're passing an offset to it that is equal to that in some special >>> cases -- In all other cases the result of the function will be bogus. >> >> I understand your concerns, so in my V3, I am using only the >> periodic_mask function when reg_offset is 0, and for the MOV special >> case that I understand we agree that the internal offset for >> reg_offset=1 is generally reg.offset % REG_SIZE. With that two cases >> covered we solve the register_pressure on current supported 64/16/8 bit >> cases. >>
> I don't understand. The MOV opcode is not substantially different from > any other ALU opcode with regards to regioning. The assumptions this is > based on are invalid for general Align1 regions, whether the opcode is a > MOV or not. Let's try step by step to find my misunderstanding :-) My original understanding was that for a general ALU operations to define witch parts of the VGRF are written/read we need to use 4 different variables in the scalar backend for src/dst registers. - The type size. - The horizontal stride. - The execution size. - The initial byte offset of the first register of src[i]/dst If stride==1 && type_size * execution_size>=REG_SIZE && byte_offset== 0 we have a full/write of one or two registers. In that case the solution is easy everything is written ~0u. If we have a partial write/read: I understood that you my initial patter proposal would only be ok for the first GRF of src[i]/dst (reg_offset == 0) periodic_mask(this->exec_size, /* count */ this->src[i].stride * type_sz(this->src[i].type), /*step */ type_sz(this->src[i].type), /* bits */ this->src[i].offset % REG_SIZE); /* offset */ In the case we manage only reg_offset == 0 we get a huge improvement reducing all problems many of the register_pressure we have now on all SIMD8 shaders with 8/16bits test cases. I understood that you didn't agree that for cases where src/destination use more than 1 GRF (reg_offset == 1) we can not guarantee that we can apply the same internal offset (this->src[i].offset % REG_SIZE) as the base register to calculate a patter. So It would be better to return ~0u on reads or 0u in writes. That is the point were I couldn't found a counter-case with the search I did to identify the different register region that appear in shaderdb generated MOV instructions and 16/8 bit CTS tests, that used two registers with a region pattern that implied a different byte_pattern, for the first and the second register. But it should exist :) If my bad understanding is before this point, I should back to re-review the Regioning PRM section. Or maybe I messing the IR level and the emission level that is another option with different semantics :-? Thank you for your time. Chema >>>> So we could call in this scenario. >>>> >>>> return periodic_mask(this->exec_size, >>>> this->src[i].stride * type_sz(this->src[i].type), >>>> type_sz(this->src[i].type), >>>> this->src[i].offset % REG_SIZE); >>>> >>>> The only issue I could think about that could generate issues would a >>>> case that I didn't found in our current code where a source is defined >>>> like r5.0<1;8;2>:w explained at PRM KBL vol07 Page 790 "A 16-element >>>> register region with interleaved rows (r5.0<1;8,2>:w)". >>>> >>> >>> You couldn't really handle that even if you wanted to, because there is >>> no way to represent such a thing at the IR level, since the horizontal >>> and vertical strides cannot be controlled independently, only the stride >>> member is meaningful for VGRFs. >> >> Good, that simplifies things. >> >> Thanks again. >> >> Chema >> >>>> What do you think? >>>> >>>> Chema >>>> >>>> >>>>>> Thanks for the review, I think that v2 has better shape. >>>>>> >>>>> >>>>> No problem. >>>>> >>>>>> Chema >>>>>> >>>>>>>> + assert(pattern); >>>>>>>> + >>>>>>>> + return pattern; >>>>>>>> +} >>>>>>>> + >>>>>>>> + >>>>>>>> unsigned >>>>>>>> fs_inst::components_read(unsigned i) const >>>>>>>> { >>>>>>>> diff --git a/src/intel/compiler/brw_ir_fs.h >>>>>>>> b/src/intel/compiler/brw_ir_fs.h >>>>>>>> index 92dad269a34..5ea6294b8ad 100644 >>>>>>>> --- a/src/intel/compiler/brw_ir_fs.h >>>>>>>> +++ b/src/intel/compiler/brw_ir_fs.h >>>>>>>> @@ -350,6 +350,7 @@ public: >>>>>>>> bool equals(fs_inst *inst) const; >>>>>>>> bool is_send_from_grf() const; >>>>>>>> bool is_partial_write() const; >>>>>>>> + unsigned register_byte_use_pattern(const fs_reg &r, boolean >>>>>>>> is_dst) const; >>>>>>>> bool is_copy_payload(const brw::simple_allocator &grf_alloc) const; >>>>>>>> unsigned components_read(unsigned i) const; >>>>>>>> unsigned size_read(int arg) const; >>>>>>>> -- >>>>>>>> 2.17.1 >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> mesa-dev mailing list >>>>>>>> mesa-dev@lists.freedesktop.org >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> mesa-dev mailing list >>>>>>>> mesa-dev@lists.freedesktop.org >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> mesa-dev mailing list >>>>>>>> mesa-dev@lists.freedesktop.org >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> mesa-dev mailing list >>>>>>>> mesa-dev@lists.freedesktop.org >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev