Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Tue, May 5, 2015 at 4:13 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Wed, Apr 15, 2015 at 6:44 PM, Matt Turner matts...@gmail.com wrote: On Wed, Apr 15, 2015 at 5:13 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Wed, Apr 15, 2015 at 1:31 PM, Matt Turner matts...@gmail.com wrote: On Wed, Apr 1, 2015 at 6:19 PM, Jason Ekstrand ja...@jlekstrand.net wrote: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. Let's not do this. Nothing puts a saturate modifier on LOAD_PAYLOAD today, and it is kind of confusing about what it means. Can't we have fbwrites that write depth as well. I wouldn't think we wanted to saturate that. Sure. I can drop saturate and just assert that it's not set. We do want to keep force_sechalf and force_writemask_all though. I didn't think about those before, but I don't know how a load_payload could have force_writemask_all set. Have I missed something? No, no one (to my knowlege) sets force_writemask_all on it but I see no reason why it shouldn't be respected. As for saturate, we do for fb_writes when key-clamp_fragment_color is set. --Jason In your last email in this thread, I thought we'd agreed not to do anything (i.e., not allow) saturate on load_payload. Handling it seems confusing and moreover, unnecessary. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Tue, May 5, 2015 at 4:20 PM, Matt Turner matts...@gmail.com wrote: On Tue, May 5, 2015 at 4:13 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Wed, Apr 15, 2015 at 6:44 PM, Matt Turner matts...@gmail.com wrote: On Wed, Apr 15, 2015 at 5:13 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Wed, Apr 15, 2015 at 1:31 PM, Matt Turner matts...@gmail.com wrote: On Wed, Apr 1, 2015 at 6:19 PM, Jason Ekstrand ja...@jlekstrand.net wrote: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. Let's not do this. Nothing puts a saturate modifier on LOAD_PAYLOAD today, and it is kind of confusing about what it means. Can't we have fbwrites that write depth as well. I wouldn't think we wanted to saturate that. Sure. I can drop saturate and just assert that it's not set. We do want to keep force_sechalf and force_writemask_all though. I didn't think about those before, but I don't know how a load_payload could have force_writemask_all set. Have I missed something? No, no one (to my knowlege) sets force_writemask_all on it but I see no reason why it shouldn't be respected. As for saturate, we do for fb_writes when key-clamp_fragment_color is set. --Jason In your last email in this thread, I thought we'd agreed not to do anything (i.e., not allow) saturate on load_payload. Handling it seems confusing and moreover, unnecessary. Right. It'll be a bit of a pain, but I can get that fixed... --Jason ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Wed, Apr 15, 2015 at 6:44 PM, Matt Turner matts...@gmail.com wrote: On Wed, Apr 15, 2015 at 5:13 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Wed, Apr 15, 2015 at 1:31 PM, Matt Turner matts...@gmail.com wrote: On Wed, Apr 1, 2015 at 6:19 PM, Jason Ekstrand ja...@jlekstrand.net wrote: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. Let's not do this. Nothing puts a saturate modifier on LOAD_PAYLOAD today, and it is kind of confusing about what it means. Can't we have fbwrites that write depth as well. I wouldn't think we wanted to saturate that. Sure. I can drop saturate and just assert that it's not set. We do want to keep force_sechalf and force_writemask_all though. I didn't think about those before, but I don't know how a load_payload could have force_writemask_all set. Have I missed something? No, no one (to my knowlege) sets force_writemask_all on it but I see no reason why it shouldn't be respected. As for saturate, we do for fb_writes when key-clamp_fragment_color is set. --Jason I see that setup_color_payload sets force_sechalf for dual-source fbwrites -- that's the only case we're going to have force_sechalf set, right? That is, the Gen 6 case is going to be handled by passing 16-channel sources to load_payload and letting it do compr4? I don't think it buys us anything. If we just run copy propagation after lower_load_payload() we'll get the code we want. [snip] +/* The COMPR4 code took care of the first 4 sources. We'll let + * the regular path handle any remaining sources. Yes, we are + * modifying the instruction but we're about to delete it so + * this really doesn't hurt anything. + */ +inst-header_size += 4; I mean, while the comment is a true statement, why is doing this any better than just... + } + + for (uint8_t i = inst-header_size; i inst-sources; i++) { ... changing this to inst-header_size + 4? Because the inst-header_size += 4 is predicated on it being a COMPR4 destination while the code below handles both the remaining sources (in the COMPR4 case) and the regular non-COMPR4 case. Ahh, right. It'd be fewer lines (no commenting necessary) to just have a 'start' variable that you set to header_size at the top and +=4 here. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Wed, Apr 1, 2015 at 6:19 PM, Jason Ekstrand ja...@jlekstrand.net wrote: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. Let's not do this. Nothing puts a saturate modifier on LOAD_PAYLOAD today, and it is kind of confusing about what it means. Can't we have fbwrites that write depth as well. I wouldn't think we wanted to saturate that. I don't think it buys us anything. If we just run copy propagation after lower_load_payload() we'll get the code we want. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 154 ++- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 - 2 files changed, 80 insertions(+), 77 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index f8e26c0..bc45a38 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3201,93 +3201,99 @@ fs_visitor::lower_load_payload() { bool progress = false; - int vgrf_to_reg[alloc.count]; - int reg_count = 0; - for (unsigned i = 0; i alloc.count; ++i) { - vgrf_to_reg[i] = reg_count; - reg_count += alloc.sizes[i]; - } - - struct { - bool written:1; /* Whether this register has ever been written */ - bool force_writemask_all:1; - bool force_sechalf:1; - } metadata[reg_count]; - memset(metadata, 0, sizeof(metadata)); - foreach_block_and_inst_safe (block, fs_inst, inst, cfg) { - if (inst-dst.file == GRF) { - const int dst_reg = vgrf_to_reg[inst-dst.reg] + inst-dst.reg_offset; - bool force_sechalf = inst-force_sechalf - !inst-force_writemask_all; - bool toggle_sechalf = inst-dst.width == 16 - type_sz(inst-dst.type) == 4 - !inst-force_writemask_all; - for (int i = 0; i inst-regs_written; ++i) { -metadata[dst_reg + i].written = true; -metadata[dst_reg + i].force_sechalf = force_sechalf; -metadata[dst_reg + i].force_writemask_all = inst-force_writemask_all; -force_sechalf = (toggle_sechalf != force_sechalf); - } - } - if (inst-opcode == SHADER_OPCODE_LOAD_PAYLOAD) { We should invert this condition and 'continue' so that we can avoid an extra level of indentation which only makes the code harder to read. assert(inst-dst.file == MRF || inst-dst.file == GRF); + fs_reg dst = inst-dst; - for (int i = 0; i inst-sources; i++) { -dst.width = inst-src[i].effective_width; -dst.type = inst-src[i].type; - -if (inst-src[i].file == BAD_FILE) { - /* Do nothing but otherwise increment as normal */ -} else if (dst.file == MRF - dst.width == 8 - brw-has_compr4 - i + 4 inst-sources - inst-src[i + 4].equals(horiz_offset(inst-src[i], 8))) { - fs_reg compr4_dst = dst; - compr4_dst.reg += BRW_MRF_COMPR4; - compr4_dst.width = 16; - fs_reg compr4_src = inst-src[i]; - compr4_src.width = 16; - fs_inst *mov = MOV(compr4_dst, compr4_src); + /* Get rid of COMPR4. We'll add it back in if we need it */ + if (dst.file == MRF dst.reg BRW_MRF_COMPR4) +dst.reg = dst.reg ~BRW_MRF_COMPR4; No point in checking whether the BRW_MRF_COMPR4 bit is set before clearing it. Just clear it. + + dst.width = 8; + for (uint8_t i = 0; i inst-header_size; i++) { +if (inst-src[i].file != BAD_FILE) { + fs_reg mov_dst = retype(dst, BRW_REGISTER_TYPE_UD); + fs_reg mov_src = retype(inst-src[i], BRW_REGISTER_TYPE_UD); + mov_src.width = 8; + fs_inst *mov = MOV(mov_dst, mov_src); mov-force_writemask_all = true; inst-insert_before(block, mov); - /* Mark i+4 as BAD_FILE so we don't emit a MOV for it */ - inst-src[i + 4].file = BAD_FILE; -} else { - fs_inst *mov = MOV(dst, inst-src[i]); -
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Fri, Apr 3, 2015 at 7:28 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Apr 2, 2015 at 3:01 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. I had a quick glance at the series and it seems to be going in the right direction. One thing I honestly don't like is the ad-hoc and IMHO premature treatment of payload headers, it still feels like the LOAD_PAYLOAD instruction has more complex semantics than necessary and the benefit is unclear to me. I suppose that your motivation was to avoid setting force_writemask_all in LOAD_PAYLOAD instructions with header. The optimizer should be able to cope with those flags and get rid of them from the resulting moves where they are redundant, and if it's not able to it looks like something that should be fixed anyway. The explicit handling of headers is responsible for much of the churn in this series and is likely to complicate users of LOAD_PAYLOAD and optimization passes that have to manipulate them. Avoiding force_writemask_all is only half of the motivation and the small half at that. A header source, more properly defined, is a single physical register that, conceptually, applies to all channels. Effectively, a header source (I should have stated this clearly) has two properties: 1) It has force_writemask_all set 2) It is exactly one physical hardware register. This second property is the more important of the two. Most of the disaster of the previous LOAD_PAYLOAD implementation was that we did a pile of guesswork and had a ill-conceved effective width think in order to figure out how big the register actually was. Making the user specify which sources are header sources eliminates that guesswork. It also has the nice side-effect that we can do the right force_writemask_all and we can properly handle COMPR4 for the the user. Yeah, true, but this seems like the least orthogonal and most annoying to use solution for this problem, it forces the caller to provide redundant information, it takes into account the saturate flag on some arguments and not others, There's no way to get a saturated load_payload, as far as I can tell. Let's just avoid the problem you mention all together and continue not doing it. I agree that this patch adding support for that is not a good idea. it shuffles sources with respect to the specified order when COMPR4 is set, but only for the first four non-header sources. To confirm I understand -- the existing lower_load_payload takes 2x 8-channel color sources and recognizes that src[i] and src[i+4] can be emitted as a compr4 MOV. The proposed patch changes that to take 1x 16-channel color source and just emits a compr4 MOV. I don't really find that change particularly contentious. In fact, I kind of wonder what case we were trying to handle by allowing separate sources for the first 8-channels and the second 8-channels. Is there some case where these are going to be produced by different instructions that can't do compr4? Texturing or math instructions come to mind, but they can't have MRF destinations anyway. In another email, you mention that compr4 change in this patch means that optimization passes need to know some extra information. I'm not able to think of any places where we'd need that information in optimization passes. The only passes that really operate on load_payload are CSE, copy propagate, and register coalescing; and I don't think any of these will touch things writing to MRFs. Am I forgetting something? I think I understand the concern in general -- the sources are copied into different destination registers dependent on (dst.reg BRW_MRF_COMPR4) == 0. That's a little strange, but that strangeness is in the hardware. I mean, if it's strange for load_payload to do this, isn't it strange for other hardware instructions to do it? Or is it just that it's a virtual opcode doing a hardware-specific thing? I don't know. It seems like it's just hiding a little bit of complexity (give me 16-channels and I'll emit compr4 MOVs to put them in the right hardware-specific places). I think any of the following
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Wed, Apr 15, 2015 at 1:31 PM, Matt Turner matts...@gmail.com wrote: On Wed, Apr 1, 2015 at 6:19 PM, Jason Ekstrand ja...@jlekstrand.net wrote: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. Let's not do this. Nothing puts a saturate modifier on LOAD_PAYLOAD today, and it is kind of confusing about what it means. Can't we have fbwrites that write depth as well. I wouldn't think we wanted to saturate that. Sure. I can drop saturate and just assert that it's not set. We do want to keep force_sechalf and force_writemask_all though. I don't think it buys us anything. If we just run copy propagation after lower_load_payload() we'll get the code we want. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 154 ++- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 - 2 files changed, 80 insertions(+), 77 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index f8e26c0..bc45a38 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3201,93 +3201,99 @@ fs_visitor::lower_load_payload() { bool progress = false; - int vgrf_to_reg[alloc.count]; - int reg_count = 0; - for (unsigned i = 0; i alloc.count; ++i) { - vgrf_to_reg[i] = reg_count; - reg_count += alloc.sizes[i]; - } - - struct { - bool written:1; /* Whether this register has ever been written */ - bool force_writemask_all:1; - bool force_sechalf:1; - } metadata[reg_count]; - memset(metadata, 0, sizeof(metadata)); - foreach_block_and_inst_safe (block, fs_inst, inst, cfg) { - if (inst-dst.file == GRF) { - const int dst_reg = vgrf_to_reg[inst-dst.reg] + inst-dst.reg_offset; - bool force_sechalf = inst-force_sechalf - !inst-force_writemask_all; - bool toggle_sechalf = inst-dst.width == 16 - type_sz(inst-dst.type) == 4 - !inst-force_writemask_all; - for (int i = 0; i inst-regs_written; ++i) { -metadata[dst_reg + i].written = true; -metadata[dst_reg + i].force_sechalf = force_sechalf; -metadata[dst_reg + i].force_writemask_all = inst-force_writemask_all; -force_sechalf = (toggle_sechalf != force_sechalf); - } - } - if (inst-opcode == SHADER_OPCODE_LOAD_PAYLOAD) { We should invert this condition and 'continue' so that we can avoid an extra level of indentation which only makes the code harder to read. Sure. I can do that. assert(inst-dst.file == MRF || inst-dst.file == GRF); + fs_reg dst = inst-dst; - for (int i = 0; i inst-sources; i++) { -dst.width = inst-src[i].effective_width; -dst.type = inst-src[i].type; - -if (inst-src[i].file == BAD_FILE) { - /* Do nothing but otherwise increment as normal */ -} else if (dst.file == MRF - dst.width == 8 - brw-has_compr4 - i + 4 inst-sources - inst-src[i + 4].equals(horiz_offset(inst-src[i], 8))) { - fs_reg compr4_dst = dst; - compr4_dst.reg += BRW_MRF_COMPR4; - compr4_dst.width = 16; - fs_reg compr4_src = inst-src[i]; - compr4_src.width = 16; - fs_inst *mov = MOV(compr4_dst, compr4_src); + /* Get rid of COMPR4. We'll add it back in if we need it */ + if (dst.file == MRF dst.reg BRW_MRF_COMPR4) +dst.reg = dst.reg ~BRW_MRF_COMPR4; No point in checking whether the BRW_MRF_COMPR4 bit is set before clearing it. Just clear it. Sure. + + dst.width = 8; + for (uint8_t i = 0; i inst-header_size; i++) { +if (inst-src[i].file != BAD_FILE) { + fs_reg mov_dst = retype(dst, BRW_REGISTER_TYPE_UD); + fs_reg mov_src = retype(inst-src[i], BRW_REGISTER_TYPE_UD); + mov_src.width = 8; + fs_inst *mov = MOV(mov_dst, mov_src); mov-force_writemask_all = true;
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Wed, Apr 15, 2015 at 5:13 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Wed, Apr 15, 2015 at 1:31 PM, Matt Turner matts...@gmail.com wrote: On Wed, Apr 1, 2015 at 6:19 PM, Jason Ekstrand ja...@jlekstrand.net wrote: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. Let's not do this. Nothing puts a saturate modifier on LOAD_PAYLOAD today, and it is kind of confusing about what it means. Can't we have fbwrites that write depth as well. I wouldn't think we wanted to saturate that. Sure. I can drop saturate and just assert that it's not set. We do want to keep force_sechalf and force_writemask_all though. I didn't think about those before, but I don't know how a load_payload could have force_writemask_all set. Have I missed something? I see that setup_color_payload sets force_sechalf for dual-source fbwrites -- that's the only case we're going to have force_sechalf set, right? That is, the Gen 6 case is going to be handled by passing 16-channel sources to load_payload and letting it do compr4? I don't think it buys us anything. If we just run copy propagation after lower_load_payload() we'll get the code we want. [snip] +/* The COMPR4 code took care of the first 4 sources. We'll let + * the regular path handle any remaining sources. Yes, we are + * modifying the instruction but we're about to delete it so + * this really doesn't hurt anything. + */ +inst-header_size += 4; I mean, while the comment is a true statement, why is doing this any better than just... + } + + for (uint8_t i = inst-header_size; i inst-sources; i++) { ... changing this to inst-header_size + 4? Because the inst-header_size += 4 is predicated on it being a COMPR4 destination while the code below handles both the remaining sources (in the COMPR4 case) and the regular non-COMPR4 case. Ahh, right. It'd be fewer lines (no commenting necessary) to just have a 'start' variable that you set to header_size at the top and +=4 here. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Fri, Apr 3, 2015 at 7:28 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Apr 2, 2015 at 3:01 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. I had a quick glance at the series and it seems to be going in the right direction. One thing I honestly don't like is the ad-hoc and IMHO premature treatment of payload headers, it still feels like the LOAD_PAYLOAD instruction has more complex semantics than necessary and the benefit is unclear to me. I suppose that your motivation was to avoid setting force_writemask_all in LOAD_PAYLOAD instructions with header. The optimizer should be able to cope with those flags and get rid of them from the resulting moves where they are redundant, and if it's not able to it looks like something that should be fixed anyway. The explicit handling of headers is responsible for much of the churn in this series and is likely to complicate users of LOAD_PAYLOAD and optimization passes that have to manipulate them. Avoiding force_writemask_all is only half of the motivation and the small half at that. A header source, more properly defined, is a single physical register that, conceptually, applies to all channels. Effectively, a header source (I should have stated this clearly) has two properties: 1) It has force_writemask_all set 2) It is exactly one physical hardware register. This second property is the more important of the two. Most of the disaster of the previous LOAD_PAYLOAD implementation was that we did a pile of guesswork and had a ill-conceved effective width think in order to figure out how big the register actually was. Making the user specify which sources are header sources eliminates that guesswork. It also has the nice side-effect that we can do the right force_writemask_all and we can properly handle COMPR4 for the the user. Yeah, true, but this seems like the least orthogonal and most annoying to use solution for this problem, it forces the caller to provide redundant information, it takes into account the saturate flag on some arguments and not others, it shuffles sources with respect to the specified order when COMPR4 is set, but only for the first four non-header sources. I think any of the following solutions would be better-behaved than the current approach: I don't know that saying which sources are headers is really redundant. It's explicit which is what we want. Yes, the COMPR4 thing is a bit magical but we have to do COMPR4 in lower_load_payload so we have to have some way of doing it and this method puts the interleving code in one place instead of two. 1/ Use the source width to determine the size of each copy. This would imply that the source width carries semantic information and hence would have to be left alone by e.g. copy propagation. That's what do now and it's terrible. The effective width field was basically a width that gets kept. 2/ Use the instruction exec size and flags to determine the properties of *all* copies. This means that if a header is present the exec size would necessarily have to be 8 and the halves of a 16-wide register would have to be specified separately, which sounds annoying at first but in practice wouldn't necessarily be because it could be handled by the LOAD_PAYLOAD() helper based on the argument widths without running into problems with optimization passes changing the meaning of the instruction. The semantics of the instruction itself would be as stupid as possible, but the implementation could still trivially recognise 16-wide and COMPR4 copies using the exact same mechanism you are using now. Yes, that might work. I'll try and take a swing at it today. It will *hopefully* have less code churn than the solution in this series because the magic will still happen, just in a different place. Of course, this solution also requires that we lower everything with force_writemask_all and *hopefully* we can get rid of those and set force_sechalf appropriately in optimization passes. 3/ Split LOAD_PAYLOAD into two separate instructions, each of them
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Fri, Apr 3, 2015 at 8:37 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Apr 3, 2015 at 7:28 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Apr 2, 2015 at 3:01 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. I had a quick glance at the series and it seems to be going in the right direction. One thing I honestly don't like is the ad-hoc and IMHO premature treatment of payload headers, it still feels like the LOAD_PAYLOAD instruction has more complex semantics than necessary and the benefit is unclear to me. I suppose that your motivation was to avoid setting force_writemask_all in LOAD_PAYLOAD instructions with header. The optimizer should be able to cope with those flags and get rid of them from the resulting moves where they are redundant, and if it's not able to it looks like something that should be fixed anyway. The explicit handling of headers is responsible for much of the churn in this series and is likely to complicate users of LOAD_PAYLOAD and optimization passes that have to manipulate them. Avoiding force_writemask_all is only half of the motivation and the small half at that. A header source, more properly defined, is a single physical register that, conceptually, applies to all channels. Effectively, a header source (I should have stated this clearly) has two properties: 1) It has force_writemask_all set 2) It is exactly one physical hardware register. This second property is the more important of the two. Most of the disaster of the previous LOAD_PAYLOAD implementation was that we did a pile of guesswork and had a ill-conceved effective width think in order to figure out how big the register actually was. Making the user specify which sources are header sources eliminates that guesswork. It also has the nice side-effect that we can do the right force_writemask_all and we can properly handle COMPR4 for the the user. Ok, Allow me to be a bit more explicit as to what all we need to keep track of: 1) How big is the source register for real. Even immediates can end up being two registers in the copy. 2) Do we want force_writemask_all? 3) If not, do we want force_sechalf? 4) On g45 and gen5, we want to use COMPR4 for interlaced movs 5) When lowering, we want to use 16-wide moves when possible in SIMD16 With the patch series I sent, all of this is explicit except for COMPR4 which is, admittedly, kind of magic. Which of these sources are headers? is a reasonable question for the caller to answer. It knows explicitly and it would take the LOAD_PAYLOAD helper some work to guess it correctly. Another option would be to guess that based on exec sizes but then the caller has to know not to pass in the wrong register type or the guess will be wrong. I like explicit. Yeah, true, but this seems like the least orthogonal and most annoying to use solution for this problem, it forces the caller to provide redundant information, it takes into account the saturate flag on some arguments and not others, it shuffles sources with respect to the specified order when COMPR4 is set, but only for the first four non-header sources. I think any of the following solutions would be better-behaved than the current approach: I don't know that saying which sources are headers is really redundant. It's explicit which is what we want. Yes, the COMPR4 thing is a bit magical but we have to do COMPR4 in lower_load_payload so we have to have some way of doing it and this method puts the interleving code in one place instead of two. Well, at least with the previous approach LOAD_PAYLOAD had consistent (if broken) semantics across its arguments, and regardless of COMPR4 being used or not, which IMHO is preferable to the modest code saving. To be clear, I don't really like the way I did COMPR4 either. I just couldn't come up with anything better. 1/ Use the source width to determine the size of each copy. This would imply that the source width carries semantic information and hence would have to be left alone by e.g. copy
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Apr 3, 2015 at 8:37 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Apr 3, 2015 at 7:28 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Apr 2, 2015 at 3:01 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. I had a quick glance at the series and it seems to be going in the right direction. One thing I honestly don't like is the ad-hoc and IMHO premature treatment of payload headers, it still feels like the LOAD_PAYLOAD instruction has more complex semantics than necessary and the benefit is unclear to me. I suppose that your motivation was to avoid setting force_writemask_all in LOAD_PAYLOAD instructions with header. The optimizer should be able to cope with those flags and get rid of them from the resulting moves where they are redundant, and if it's not able to it looks like something that should be fixed anyway. The explicit handling of headers is responsible for much of the churn in this series and is likely to complicate users of LOAD_PAYLOAD and optimization passes that have to manipulate them. Avoiding force_writemask_all is only half of the motivation and the small half at that. A header source, more properly defined, is a single physical register that, conceptually, applies to all channels. Effectively, a header source (I should have stated this clearly) has two properties: 1) It has force_writemask_all set 2) It is exactly one physical hardware register. This second property is the more important of the two. Most of the disaster of the previous LOAD_PAYLOAD implementation was that we did a pile of guesswork and had a ill-conceved effective width think in order to figure out how big the register actually was. Making the user specify which sources are header sources eliminates that guesswork. It also has the nice side-effect that we can do the right force_writemask_all and we can properly handle COMPR4 for the the user. Ok, Allow me to be a bit more explicit as to what all we need to keep track of: 1) How big is the source register for real. Even immediates can end up being two registers in the copy. 2) Do we want force_writemask_all? 3) If not, do we want force_sechalf? 4) On g45 and gen5, we want to use COMPR4 for interlaced movs 5) When lowering, we want to use 16-wide moves when possible in SIMD16 With the patch series I sent, all of this is explicit except for COMPR4 which is, admittedly, kind of magic. Which of these sources are headers? is a reasonable question for the caller to answer. It knows explicitly and it would take the LOAD_PAYLOAD helper some work to guess it correctly. Another option would be to guess that based on exec sizes but then the caller has to know not to pass in the wrong register type or the guess will be wrong. I like explicit. I'm not advocating LOAD_PAYLOAD to perform any guesswork, I'm advocating it to be more stupid -- Let it do as much as it can sensibly do with the information it already has, and no more. Yeah, true, but this seems like the least orthogonal and most annoying to use solution for this problem, it forces the caller to provide redundant information, it takes into account the saturate flag on some arguments and not others, it shuffles sources with respect to the specified order when COMPR4 is set, but only for the first four non-header sources. I think any of the following solutions would be better-behaved than the current approach: I don't know that saying which sources are headers is really redundant. It's explicit which is what we want. Yes, the COMPR4 thing is a bit magical but we have to do COMPR4 in lower_load_payload so we have to have some way of doing it and this method puts the interleving code in one place instead of two. Well, at least with the previous approach LOAD_PAYLOAD had consistent (if broken) semantics across its arguments, and regardless of COMPR4 being used or not, which IMHO is preferable to the modest code saving. To be clear, I don't really like the way I
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Apr 2, 2015 at 3:01 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. I had a quick glance at the series and it seems to be going in the right direction. One thing I honestly don't like is the ad-hoc and IMHO premature treatment of payload headers, it still feels like the LOAD_PAYLOAD instruction has more complex semantics than necessary and the benefit is unclear to me. I suppose that your motivation was to avoid setting force_writemask_all in LOAD_PAYLOAD instructions with header. The optimizer should be able to cope with those flags and get rid of them from the resulting moves where they are redundant, and if it's not able to it looks like something that should be fixed anyway. The explicit handling of headers is responsible for much of the churn in this series and is likely to complicate users of LOAD_PAYLOAD and optimization passes that have to manipulate them. Avoiding force_writemask_all is only half of the motivation and the small half at that. A header source, more properly defined, is a single physical register that, conceptually, applies to all channels. Effectively, a header source (I should have stated this clearly) has two properties: 1) It has force_writemask_all set 2) It is exactly one physical hardware register. This second property is the more important of the two. Most of the disaster of the previous LOAD_PAYLOAD implementation was that we did a pile of guesswork and had a ill-conceved effective width think in order to figure out how big the register actually was. Making the user specify which sources are header sources eliminates that guesswork. It also has the nice side-effect that we can do the right force_writemask_all and we can properly handle COMPR4 for the the user. Yeah, true, but this seems like the least orthogonal and most annoying to use solution for this problem, it forces the caller to provide redundant information, it takes into account the saturate flag on some arguments and not others, it shuffles sources with respect to the specified order when COMPR4 is set, but only for the first four non-header sources. I think any of the following solutions would be better-behaved than the current approach: 1/ Use the source width to determine the size of each copy. This would imply that the source width carries semantic information and hence would have to be left alone by e.g. copy propagation. 2/ Use the instruction exec size and flags to determine the properties of *all* copies. This means that if a header is present the exec size would necessarily have to be 8 and the halves of a 16-wide register would have to be specified separately, which sounds annoying at first but in practice wouldn't necessarily be because it could be handled by the LOAD_PAYLOAD() helper based on the argument widths without running into problems with optimization passes changing the meaning of the instruction. The semantics of the instruction itself would be as stupid as possible, but the implementation could still trivially recognise 16-wide and COMPR4 copies using the exact same mechanism you are using now. 3/ Split LOAD_PAYLOAD into two separate instructions, each of them dead simple, say COLLECT and LOAD_HEADER. COLLECT dst, src0, ..., srcn would be equivalent to the LOAD_PAYLOAD instruction described in 2, but it would only be used to load full-width non-header sources of the payload, so you would avoid the need to split 16-wide registers in half. LOAD_HEADER dst, header, payload would handle the asymmetric requirements of prepending a header, like using 8-wide instead of exec_size-wide copies and setting force_writemask_all. You could use mlen to specify the size of payload as is usual for instructions taking a payload source. --Jason Thanks for looking into this BTW. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 154 ++- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 - 2 files changed, 80 insertions(+), 77 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Apr 3, 2015 at 7:28 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Apr 2, 2015 at 3:01 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. I had a quick glance at the series and it seems to be going in the right direction. One thing I honestly don't like is the ad-hoc and IMHO premature treatment of payload headers, it still feels like the LOAD_PAYLOAD instruction has more complex semantics than necessary and the benefit is unclear to me. I suppose that your motivation was to avoid setting force_writemask_all in LOAD_PAYLOAD instructions with header. The optimizer should be able to cope with those flags and get rid of them from the resulting moves where they are redundant, and if it's not able to it looks like something that should be fixed anyway. The explicit handling of headers is responsible for much of the churn in this series and is likely to complicate users of LOAD_PAYLOAD and optimization passes that have to manipulate them. Avoiding force_writemask_all is only half of the motivation and the small half at that. A header source, more properly defined, is a single physical register that, conceptually, applies to all channels. Effectively, a header source (I should have stated this clearly) has two properties: 1) It has force_writemask_all set 2) It is exactly one physical hardware register. This second property is the more important of the two. Most of the disaster of the previous LOAD_PAYLOAD implementation was that we did a pile of guesswork and had a ill-conceved effective width think in order to figure out how big the register actually was. Making the user specify which sources are header sources eliminates that guesswork. It also has the nice side-effect that we can do the right force_writemask_all and we can properly handle COMPR4 for the the user. Yeah, true, but this seems like the least orthogonal and most annoying to use solution for this problem, it forces the caller to provide redundant information, it takes into account the saturate flag on some arguments and not others, it shuffles sources with respect to the specified order when COMPR4 is set, but only for the first four non-header sources. I think any of the following solutions would be better-behaved than the current approach: I don't know that saying which sources are headers is really redundant. It's explicit which is what we want. Yes, the COMPR4 thing is a bit magical but we have to do COMPR4 in lower_load_payload so we have to have some way of doing it and this method puts the interleving code in one place instead of two. Well, at least with the previous approach LOAD_PAYLOAD had consistent (if broken) semantics across its arguments, and regardless of COMPR4 being used or not, which IMHO is preferable to the modest code saving. 1/ Use the source width to determine the size of each copy. This would imply that the source width carries semantic information and hence would have to be left alone by e.g. copy propagation. That's what do now and it's terrible. The effective width field was basically a width that gets kept. Couldn't you just prevent copy propagation from propagating a source of different width into a LOAD_PAYLOAD? You would still be able to get rid of the metadata guess and effective_width, but you may have to re-run copy propagate after lower_load_payload() for the case you missed any optimization oportunities. 2/ Use the instruction exec size and flags to determine the properties of *all* copies. This means that if a header is present the exec size would necessarily have to be 8 and the halves of a 16-wide register would have to be specified separately, which sounds annoying at first but in practice wouldn't necessarily be because it could be handled by the LOAD_PAYLOAD() helper based on the argument widths without running into problems with optimization passes changing the meaning of the instruction. The semantics of the instruction itself would be as stupid as possible, but the
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
Jason Ekstrand ja...@jlekstrand.net writes: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. I had a quick glance at the series and it seems to be going in the right direction. One thing I honestly don't like is the ad-hoc and IMHO premature treatment of payload headers, it still feels like the LOAD_PAYLOAD instruction has more complex semantics than necessary and the benefit is unclear to me. I suppose that your motivation was to avoid setting force_writemask_all in LOAD_PAYLOAD instructions with header. The optimizer should be able to cope with those flags and get rid of them from the resulting moves where they are redundant, and if it's not able to it looks like something that should be fixed anyway. The explicit handling of headers is responsible for much of the churn in this series and is likely to complicate users of LOAD_PAYLOAD and optimization passes that have to manipulate them. Thanks for looking into this BTW. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 154 ++- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 - 2 files changed, 80 insertions(+), 77 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index f8e26c0..bc45a38 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3201,93 +3201,99 @@ fs_visitor::lower_load_payload() { bool progress = false; - int vgrf_to_reg[alloc.count]; - int reg_count = 0; - for (unsigned i = 0; i alloc.count; ++i) { - vgrf_to_reg[i] = reg_count; - reg_count += alloc.sizes[i]; - } - - struct { - bool written:1; /* Whether this register has ever been written */ - bool force_writemask_all:1; - bool force_sechalf:1; - } metadata[reg_count]; - memset(metadata, 0, sizeof(metadata)); - foreach_block_and_inst_safe (block, fs_inst, inst, cfg) { - if (inst-dst.file == GRF) { - const int dst_reg = vgrf_to_reg[inst-dst.reg] + inst-dst.reg_offset; - bool force_sechalf = inst-force_sechalf - !inst-force_writemask_all; - bool toggle_sechalf = inst-dst.width == 16 - type_sz(inst-dst.type) == 4 - !inst-force_writemask_all; - for (int i = 0; i inst-regs_written; ++i) { -metadata[dst_reg + i].written = true; -metadata[dst_reg + i].force_sechalf = force_sechalf; -metadata[dst_reg + i].force_writemask_all = inst-force_writemask_all; -force_sechalf = (toggle_sechalf != force_sechalf); - } - } - if (inst-opcode == SHADER_OPCODE_LOAD_PAYLOAD) { assert(inst-dst.file == MRF || inst-dst.file == GRF); + fs_reg dst = inst-dst; - for (int i = 0; i inst-sources; i++) { -dst.width = inst-src[i].effective_width; -dst.type = inst-src[i].type; - -if (inst-src[i].file == BAD_FILE) { - /* Do nothing but otherwise increment as normal */ -} else if (dst.file == MRF - dst.width == 8 - brw-has_compr4 - i + 4 inst-sources - inst-src[i + 4].equals(horiz_offset(inst-src[i], 8))) { - fs_reg compr4_dst = dst; - compr4_dst.reg += BRW_MRF_COMPR4; - compr4_dst.width = 16; - fs_reg compr4_src = inst-src[i]; - compr4_src.width = 16; - fs_inst *mov = MOV(compr4_dst, compr4_src); + /* Get rid of COMPR4. We'll add it back in if we need it */ + if (dst.file == MRF dst.reg BRW_MRF_COMPR4) +dst.reg = dst.reg ~BRW_MRF_COMPR4; + + dst.width = 8; + for (uint8_t i = 0; i inst-header_size; i++) { +if (inst-src[i].file != BAD_FILE) { + fs_reg mov_dst = retype(dst, BRW_REGISTER_TYPE_UD); + fs_reg mov_src = retype(inst-src[i], BRW_REGISTER_TYPE_UD); + mov_src.width = 8; + fs_inst *mov = MOV(mov_dst, mov_src); mov-force_writemask_all = true;
Re: [Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
On Thu, Apr 2, 2015 at 3:01 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. I had a quick glance at the series and it seems to be going in the right direction. One thing I honestly don't like is the ad-hoc and IMHO premature treatment of payload headers, it still feels like the LOAD_PAYLOAD instruction has more complex semantics than necessary and the benefit is unclear to me. I suppose that your motivation was to avoid setting force_writemask_all in LOAD_PAYLOAD instructions with header. The optimizer should be able to cope with those flags and get rid of them from the resulting moves where they are redundant, and if it's not able to it looks like something that should be fixed anyway. The explicit handling of headers is responsible for much of the churn in this series and is likely to complicate users of LOAD_PAYLOAD and optimization passes that have to manipulate them. Avoiding force_writemask_all is only half of the motivation and the small half at that. A header source, more properly defined, is a single physical register that, conceptually, applies to all channels. Effectively, a header source (I should have stated this clearly) has two properties: 1) It has force_writemask_all set 2) It is exactly one physical hardware register. This second property is the more important of the two. Most of the disaster of the previous LOAD_PAYLOAD implementation was that we did a pile of guesswork and had a ill-conceved effective width think in order to figure out how big the register actually was. Making the user specify which sources are header sources eliminates that guesswork. It also has the nice side-effect that we can do the right force_writemask_all and we can properly handle COMPR4 for the the user. --Jason Thanks for looking into this BTW. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 154 ++- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 - 2 files changed, 80 insertions(+), 77 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index f8e26c0..bc45a38 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3201,93 +3201,99 @@ fs_visitor::lower_load_payload() { bool progress = false; - int vgrf_to_reg[alloc.count]; - int reg_count = 0; - for (unsigned i = 0; i alloc.count; ++i) { - vgrf_to_reg[i] = reg_count; - reg_count += alloc.sizes[i]; - } - - struct { - bool written:1; /* Whether this register has ever been written */ - bool force_writemask_all:1; - bool force_sechalf:1; - } metadata[reg_count]; - memset(metadata, 0, sizeof(metadata)); - foreach_block_and_inst_safe (block, fs_inst, inst, cfg) { - if (inst-dst.file == GRF) { - const int dst_reg = vgrf_to_reg[inst-dst.reg] + inst-dst.reg_offset; - bool force_sechalf = inst-force_sechalf - !inst-force_writemask_all; - bool toggle_sechalf = inst-dst.width == 16 - type_sz(inst-dst.type) == 4 - !inst-force_writemask_all; - for (int i = 0; i inst-regs_written; ++i) { -metadata[dst_reg + i].written = true; -metadata[dst_reg + i].force_sechalf = force_sechalf; -metadata[dst_reg + i].force_writemask_all = inst-force_writemask_all; -force_sechalf = (toggle_sechalf != force_sechalf); - } - } - if (inst-opcode == SHADER_OPCODE_LOAD_PAYLOAD) { assert(inst-dst.file == MRF || inst-dst.file == GRF); + fs_reg dst = inst-dst; - for (int i = 0; i inst-sources; i++) { -dst.width = inst-src[i].effective_width; -dst.type = inst-src[i].type; - -if (inst-src[i].file == BAD_FILE) { - /* Do nothing but otherwise increment as normal */ -} else if (dst.file == MRF - dst.width == 8 - brw-has_compr4 - i + 4 inst-sources - inst-src[i +
[Mesa-dev] [PATCH 09/13] SQUASH: i965/fs: Rework fs_visitor::lower_load_payload
Instead of the complicated and broken-by-design pile of heuristics we had before, we now have a straightforward lowering: 1) All header sources are copied directly using force_writemask_all and, since they are guaranteed to be a single register, there are no force_sechalf issues. 2) All non-header sources are copied using the exact same force_sechalf and saturate modifiers as the LOAD_PAYLOAD operation itself. 3) In order to accommodate older gens that need interleaved colors, lower_load_payload detects when the destination is a COMPR4 register and automatically interleaves the non-header sources. The lower_load_payload pass does the right thing here regardless of whether or not the hardware actually supports COMPR4. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 154 ++- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 - 2 files changed, 80 insertions(+), 77 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index f8e26c0..bc45a38 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3201,93 +3201,99 @@ fs_visitor::lower_load_payload() { bool progress = false; - int vgrf_to_reg[alloc.count]; - int reg_count = 0; - for (unsigned i = 0; i alloc.count; ++i) { - vgrf_to_reg[i] = reg_count; - reg_count += alloc.sizes[i]; - } - - struct { - bool written:1; /* Whether this register has ever been written */ - bool force_writemask_all:1; - bool force_sechalf:1; - } metadata[reg_count]; - memset(metadata, 0, sizeof(metadata)); - foreach_block_and_inst_safe (block, fs_inst, inst, cfg) { - if (inst-dst.file == GRF) { - const int dst_reg = vgrf_to_reg[inst-dst.reg] + inst-dst.reg_offset; - bool force_sechalf = inst-force_sechalf - !inst-force_writemask_all; - bool toggle_sechalf = inst-dst.width == 16 - type_sz(inst-dst.type) == 4 - !inst-force_writemask_all; - for (int i = 0; i inst-regs_written; ++i) { -metadata[dst_reg + i].written = true; -metadata[dst_reg + i].force_sechalf = force_sechalf; -metadata[dst_reg + i].force_writemask_all = inst-force_writemask_all; -force_sechalf = (toggle_sechalf != force_sechalf); - } - } - if (inst-opcode == SHADER_OPCODE_LOAD_PAYLOAD) { assert(inst-dst.file == MRF || inst-dst.file == GRF); + fs_reg dst = inst-dst; - for (int i = 0; i inst-sources; i++) { -dst.width = inst-src[i].effective_width; -dst.type = inst-src[i].type; - -if (inst-src[i].file == BAD_FILE) { - /* Do nothing but otherwise increment as normal */ -} else if (dst.file == MRF - dst.width == 8 - brw-has_compr4 - i + 4 inst-sources - inst-src[i + 4].equals(horiz_offset(inst-src[i], 8))) { - fs_reg compr4_dst = dst; - compr4_dst.reg += BRW_MRF_COMPR4; - compr4_dst.width = 16; - fs_reg compr4_src = inst-src[i]; - compr4_src.width = 16; - fs_inst *mov = MOV(compr4_dst, compr4_src); + /* Get rid of COMPR4. We'll add it back in if we need it */ + if (dst.file == MRF dst.reg BRW_MRF_COMPR4) +dst.reg = dst.reg ~BRW_MRF_COMPR4; + + dst.width = 8; + for (uint8_t i = 0; i inst-header_size; i++) { +if (inst-src[i].file != BAD_FILE) { + fs_reg mov_dst = retype(dst, BRW_REGISTER_TYPE_UD); + fs_reg mov_src = retype(inst-src[i], BRW_REGISTER_TYPE_UD); + mov_src.width = 8; + fs_inst *mov = MOV(mov_dst, mov_src); mov-force_writemask_all = true; inst-insert_before(block, mov); - /* Mark i+4 as BAD_FILE so we don't emit a MOV for it */ - inst-src[i + 4].file = BAD_FILE; -} else { - fs_inst *mov = MOV(dst, inst-src[i]); - if (inst-src[i].file == GRF) { - int src_reg = vgrf_to_reg[inst-src[i].reg] + -inst-src[i].reg_offset; - mov-force_sechalf = metadata[src_reg].force_sechalf; - mov-force_writemask_all = metadata[src_reg].force_writemask_all; - } else { - /* We don't have any useful metadata for immediates or - * uniforms. Assume that any of the channels of the - * destination may be used. - */ - assert(inst-src[i].file == IMM || - inst-src[i].file == UNIFORM); - mov-force_writemask_all = true; - } +}