Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
On Fri, Oct 23, 2015 at 5:38 AM, Francisco Jerez wrote: > Kristian Høgsberg writes: > >> On Tue, Oct 20, 2015 at 11:56 AM, Francisco Jerez >> wrote: >>> Kristian Høgsberg writes: >>> On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez wrote: > Kristian Høgsberg writes: > >> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez >> wrote: >>> Neil Roberts writes: >>> Just a thought, would it be better to move this check into the eliminate_find_live_channel optimisation? That way it could catch sources that become immediates through later optimisations. One problem with this that I've seen before is that eliminating the FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be eliminated because the copy propagation doesn't work. >>> >>> I believe in this particular case the BROADCAST instruction should >>> already be eliminated (by opt_algebraic), because its first source is an >>> immediate, so this optimization would in fact be redundant with >>> opt_algebraic+constant propagation if it weren't because the latter is >>> unable to handle scalar copies. >> >> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets >> eliminated because it's outside control flow >> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets >> reduced to a MOV in opt_algebraic because src0 is uniform (immediate >> constant). The problem then is the dst of the MOV has stride 0 which >> can_propagate_from() then bails on. >> >>> Another possibility that would likely >>> be more general than this (because it would handle cases in which, as >>> Neil said, the argument becomes an immediate only after optimization, >>> and because it would also avoid the issue with liveness analysis >>> extending the live ranges of scalar values incorrectly [1]) would be to >>> extend the destination register of the BROADCAST instructions to a >>> vector [2] and then add a case to the switch statement of >>> try_constant_propagate() so that the immediate MOV resulting from >>> opt_algebraic is propagated into surface instructions (see attachment). >> >> I wrote exactly this code to make constant propagate work, plus adding >> the extra opcodes to the switch statement. It works and I could >> certainly send that out after this, but >> >> 1) This doesn't mean we shouldn't do the if (src.file == IMM) >> shortcut. It saves the compiler a bit of work in the very common case >> of >> non-indirect buffer access. >> >> 2) I'm not even sure it makes sense to extend copy-propagation to do >> this (which is why I went back to just the IMM test). Anything that >> would be an immediate at this point should be an immediate, if not >> we're missing something in nir. >> > Still this doesn't address the root of the problem, which is that > emit_uniformize() emits scalar code that the rest of the compiler is not > able to handle properly. This is not a case of papering over the root cause, it's about not creating the root cause in the first place. The output of emit_uniformize() always ends up as either a surface or a sampler index, which we only look at later in the generator. There are no other cases where the result of emit_uniformize() might be part of an expression that we can copy propagate or otherwise optimize. If the input to emit_uniformize() isn't an immediate where it could be, nir optimization needs fixing. So if we add these two lines to emit_uniformize() to pass immediates straight through, we avoid generating code that we have to extend the copy prop pass to handle. >>> >>> Kristian, there are legitimate uses of emit_uniformize() in which the >>> argument is not an immediate but still can be optimized out later on -- >>> E.g. for images it will frequently be a uniform register, or a >>> non-constant expression calculated within uniform control flow (in which >>> case the FIND_LIVE_CHANNEL instruction emitted here will be reduced to a >>> constant MOV by eliminate_find_live_chaso nnel()). In such cases we still >>> want copy-propagation to kick in, but it wont because of the problem I >>> was talking about with scalar writes. Even if the instructions emitted >>> by emit_uniformize() cannot be optimized out, liveness analysis will >>> overestimate the live ranges of the temporaries used by >>> emit_uniformize() for the same reason, potentially causing the register >>> allocator to spill or run out of registers. >> >> Yeah, fair point. I went and took a look at non-constant surface index >> and sent out a new series that make that work better as well as ssbo >> write optimizations. I took out the imm shortcut and put in the >> generic optimizations. It makes non-const surface index a little >> better, but as I wrote in the co
Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
Kristian Høgsberg writes: > On Tue, Oct 20, 2015 at 11:56 AM, Francisco Jerez > wrote: >> Kristian Høgsberg writes: >> >>> On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez >>> wrote: Kristian Høgsberg writes: > On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez > wrote: >> Neil Roberts writes: >> >>> Just a thought, would it be better to move this check into the >>> eliminate_find_live_channel optimisation? That way it could catch >>> sources that become immediates through later optimisations. One problem >>> with this that I've seen before is that eliminating the >>> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be >>> eliminated because the copy propagation doesn't work. >> >> I believe in this particular case the BROADCAST instruction should >> already be eliminated (by opt_algebraic), because its first source is an >> immediate, so this optimization would in fact be redundant with >> opt_algebraic+constant propagation if it weren't because the latter is >> unable to handle scalar copies. > > Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets > eliminated because it's outside control flow > (fs_visitor::eliminate_find_live_channel). Then the broadcast gets > reduced to a MOV in opt_algebraic because src0 is uniform (immediate > constant). The problem then is the dst of the MOV has stride 0 which > can_propagate_from() then bails on. > >> Another possibility that would likely >> be more general than this (because it would handle cases in which, as >> Neil said, the argument becomes an immediate only after optimization, >> and because it would also avoid the issue with liveness analysis >> extending the live ranges of scalar values incorrectly [1]) would be to >> extend the destination register of the BROADCAST instructions to a >> vector [2] and then add a case to the switch statement of >> try_constant_propagate() so that the immediate MOV resulting from >> opt_algebraic is propagated into surface instructions (see attachment). > > I wrote exactly this code to make constant propagate work, plus adding > the extra opcodes to the switch statement. It works and I could > certainly send that out after this, but > > 1) This doesn't mean we shouldn't do the if (src.file == IMM) > shortcut. It saves the compiler a bit of work in the very common case > of > non-indirect buffer access. > > 2) I'm not even sure it makes sense to extend copy-propagation to do > this (which is why I went back to just the IMM test). Anything that > would be an immediate at this point should be an immediate, if not > we're missing something in nir. > Still this doesn't address the root of the problem, which is that emit_uniformize() emits scalar code that the rest of the compiler is not able to handle properly. >>> >>> This is not a case of papering over the root cause, it's about not >>> creating the root cause in the first place. The output of >>> emit_uniformize() always ends up as either a surface or a sampler >>> index, which we only look at later in the generator. There are no >>> other cases where the result of emit_uniformize() might be part of an >>> expression that we can copy propagate or otherwise optimize. If the >>> input to emit_uniformize() isn't an immediate where it could be, nir >>> optimization needs fixing. So if we add these two lines to >>> emit_uniformize() to pass immediates straight through, we avoid >>> generating code that we have to extend the copy prop pass to handle. >>> >> >> Kristian, there are legitimate uses of emit_uniformize() in which the >> argument is not an immediate but still can be optimized out later on -- >> E.g. for images it will frequently be a uniform register, or a >> non-constant expression calculated within uniform control flow (in which >> case the FIND_LIVE_CHANNEL instruction emitted here will be reduced to a >> constant MOV by eliminate_find_live_chaso nnel()). In such cases we still >> want copy-propagation to kick in, but it wont because of the problem I >> was talking about with scalar writes. Even if the instructions emitted >> by emit_uniformize() cannot be optimized out, liveness analysis will >> overestimate the live ranges of the temporaries used by >> emit_uniformize() for the same reason, potentially causing the register >> allocator to spill or run out of registers. > > Yeah, fair point. I went and took a look at non-constant surface index > and sent out a new series that make that work better as well as ssbo > write optimizations. I took out the imm shortcut and put in the > generic optimizations. It makes non-const surface index a little > better, but as I wrote in the cover letter, it still doesn't work that > well as we don't propagate the surface index into the send > instructions. > >> Anyway don't take me wrong, I
Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
On Tue, Oct 20, 2015 at 11:56 AM, Francisco Jerez wrote: > Kristian Høgsberg writes: > >> On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez >> wrote: >>> Kristian Høgsberg writes: >>> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez wrote: > Neil Roberts writes: > >> Just a thought, would it be better to move this check into the >> eliminate_find_live_channel optimisation? That way it could catch >> sources that become immediates through later optimisations. One problem >> with this that I've seen before is that eliminating the >> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be >> eliminated because the copy propagation doesn't work. > > I believe in this particular case the BROADCAST instruction should > already be eliminated (by opt_algebraic), because its first source is an > immediate, so this optimization would in fact be redundant with > opt_algebraic+constant propagation if it weren't because the latter is > unable to handle scalar copies. Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets eliminated because it's outside control flow (fs_visitor::eliminate_find_live_channel). Then the broadcast gets reduced to a MOV in opt_algebraic because src0 is uniform (immediate constant). The problem then is the dst of the MOV has stride 0 which can_propagate_from() then bails on. > Another possibility that would likely > be more general than this (because it would handle cases in which, as > Neil said, the argument becomes an immediate only after optimization, > and because it would also avoid the issue with liveness analysis > extending the live ranges of scalar values incorrectly [1]) would be to > extend the destination register of the BROADCAST instructions to a > vector [2] and then add a case to the switch statement of > try_constant_propagate() so that the immediate MOV resulting from > opt_algebraic is propagated into surface instructions (see attachment). I wrote exactly this code to make constant propagate work, plus adding the extra opcodes to the switch statement. It works and I could certainly send that out after this, but 1) This doesn't mean we shouldn't do the if (src.file == IMM) shortcut. It saves the compiler a bit of work in the very common case of non-indirect buffer access. 2) I'm not even sure it makes sense to extend copy-propagation to do this (which is why I went back to just the IMM test). Anything that would be an immediate at this point should be an immediate, if not we're missing something in nir. >>> Still this doesn't address the root of the problem, which is that >>> emit_uniformize() emits scalar code that the rest of the compiler is not >>> able to handle properly. >> >> This is not a case of papering over the root cause, it's about not >> creating the root cause in the first place. The output of >> emit_uniformize() always ends up as either a surface or a sampler >> index, which we only look at later in the generator. There are no >> other cases where the result of emit_uniformize() might be part of an >> expression that we can copy propagate or otherwise optimize. If the >> input to emit_uniformize() isn't an immediate where it could be, nir >> optimization needs fixing. So if we add these two lines to >> emit_uniformize() to pass immediates straight through, we avoid >> generating code that we have to extend the copy prop pass to handle. >> > > Kristian, there are legitimate uses of emit_uniformize() in which the > argument is not an immediate but still can be optimized out later on -- > E.g. for images it will frequently be a uniform register, or a > non-constant expression calculated within uniform control flow (in which > case the FIND_LIVE_CHANNEL instruction emitted here will be reduced to a > constant MOV by eliminate_find_live_chaso nnel()). In such cases we still > want copy-propagation to kick in, but it wont because of the problem I > was talking about with scalar writes. Even if the instructions emitted > by emit_uniformize() cannot be optimized out, liveness analysis will > overestimate the live ranges of the temporaries used by > emit_uniformize() for the same reason, potentially causing the register > allocator to spill or run out of registers. Yeah, fair point. I went and took a look at non-constant surface index and sent out a new series that make that work better as well as ssbo write optimizations. I took out the imm shortcut and put in the generic optimizations. It makes non-const surface index a little better, but as I wrote in the cover letter, it still doesn't work that well as we don't propagate the surface index into the send instructions. > Anyway don't take me wrong, I'm not NAK-ing your patch or anything, I > just have the feeling that by fixing this more generally you could've > saved us [or most
Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
Kristian Høgsberg writes: > On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez > wrote: >> Kristian Høgsberg writes: >> >>> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez >>> wrote: Neil Roberts writes: > Just a thought, would it be better to move this check into the > eliminate_find_live_channel optimisation? That way it could catch > sources that become immediates through later optimisations. One problem > with this that I've seen before is that eliminating the > FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be > eliminated because the copy propagation doesn't work. I believe in this particular case the BROADCAST instruction should already be eliminated (by opt_algebraic), because its first source is an immediate, so this optimization would in fact be redundant with opt_algebraic+constant propagation if it weren't because the latter is unable to handle scalar copies. >>> >>> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets >>> eliminated because it's outside control flow >>> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets >>> reduced to a MOV in opt_algebraic because src0 is uniform (immediate >>> constant). The problem then is the dst of the MOV has stride 0 which >>> can_propagate_from() then bails on. >>> Another possibility that would likely be more general than this (because it would handle cases in which, as Neil said, the argument becomes an immediate only after optimization, and because it would also avoid the issue with liveness analysis extending the live ranges of scalar values incorrectly [1]) would be to extend the destination register of the BROADCAST instructions to a vector [2] and then add a case to the switch statement of try_constant_propagate() so that the immediate MOV resulting from opt_algebraic is propagated into surface instructions (see attachment). >>> >>> I wrote exactly this code to make constant propagate work, plus adding >>> the extra opcodes to the switch statement. It works and I could >>> certainly send that out after this, but >>> >>> 1) This doesn't mean we shouldn't do the if (src.file == IMM) >>> shortcut. It saves the compiler a bit of work in the very common case >>> of >>> non-indirect buffer access. >>> >>> 2) I'm not even sure it makes sense to extend copy-propagation to do >>> this (which is why I went back to just the IMM test). Anything that >>> would be an immediate at this point should be an immediate, if not >>> we're missing something in nir. >>> >> Still this doesn't address the root of the problem, which is that >> emit_uniformize() emits scalar code that the rest of the compiler is not >> able to handle properly. > > This is not a case of papering over the root cause, it's about not > creating the root cause in the first place. The output of > emit_uniformize() always ends up as either a surface or a sampler > index, which we only look at later in the generator. There are no > other cases where the result of emit_uniformize() might be part of an > expression that we can copy propagate or otherwise optimize. If the > input to emit_uniformize() isn't an immediate where it could be, nir > optimization needs fixing. So if we add these two lines to > emit_uniformize() to pass immediates straight through, we avoid > generating code that we have to extend the copy prop pass to handle. > Kristian, there are legitimate uses of emit_uniformize() in which the argument is not an immediate but still can be optimized out later on -- E.g. for images it will frequently be a uniform register, or a non-constant expression calculated within uniform control flow (in which case the FIND_LIVE_CHANNEL instruction emitted here will be reduced to a constant MOV by eliminate_find_live_channel()). In such cases we still want copy-propagation to kick in, but it wont because of the problem I was talking about with scalar writes. Even if the instructions emitted by emit_uniformize() cannot be optimized out, liveness analysis will overestimate the live ranges of the temporaries used by emit_uniformize() for the same reason, potentially causing the register allocator to spill or run out of registers. Anyway don't take me wrong, I'm not NAK-ing your patch or anything, I just have the feeling that by fixing this more generally you could've saved us [or most likely me ;)] work in the near future. > Kristian > >> Re-implementing a special case of the >> optimization already done by opt_algebraic() might have helped in this >> specific case, but it won't help when the argument is of some other kind >> of uniform value which are also frequently encountered in practice and >> could also be copy-propagated, and it won't help avoid the liveness >> analysis bug [1] in cases where the argument is not an immediate (the >> bug is reported to break some thousands SSBO dEQP tests, although I >> don't know what fraction of them actual
Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez wrote: > Kristian Høgsberg writes: > >> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez >> wrote: >>> Neil Roberts writes: >>> Just a thought, would it be better to move this check into the eliminate_find_live_channel optimisation? That way it could catch sources that become immediates through later optimisations. One problem with this that I've seen before is that eliminating the FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be eliminated because the copy propagation doesn't work. >>> >>> I believe in this particular case the BROADCAST instruction should >>> already be eliminated (by opt_algebraic), because its first source is an >>> immediate, so this optimization would in fact be redundant with >>> opt_algebraic+constant propagation if it weren't because the latter is >>> unable to handle scalar copies. >> >> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets >> eliminated because it's outside control flow >> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets >> reduced to a MOV in opt_algebraic because src0 is uniform (immediate >> constant). The problem then is the dst of the MOV has stride 0 which >> can_propagate_from() then bails on. >> >>> Another possibility that would likely >>> be more general than this (because it would handle cases in which, as >>> Neil said, the argument becomes an immediate only after optimization, >>> and because it would also avoid the issue with liveness analysis >>> extending the live ranges of scalar values incorrectly [1]) would be to >>> extend the destination register of the BROADCAST instructions to a >>> vector [2] and then add a case to the switch statement of >>> try_constant_propagate() so that the immediate MOV resulting from >>> opt_algebraic is propagated into surface instructions (see attachment). >> >> I wrote exactly this code to make constant propagate work, plus adding >> the extra opcodes to the switch statement. It works and I could >> certainly send that out after this, but >> >> 1) This doesn't mean we shouldn't do the if (src.file == IMM) >> shortcut. It saves the compiler a bit of work in the very common case >> of >> non-indirect buffer access. >> >> 2) I'm not even sure it makes sense to extend copy-propagation to do >> this (which is why I went back to just the IMM test). Anything that >> would be an immediate at this point should be an immediate, if not >> we're missing something in nir. >> > Still this doesn't address the root of the problem, which is that > emit_uniformize() emits scalar code that the rest of the compiler is not > able to handle properly. This is not a case of papering over the root cause, it's about not creating the root cause in the first place. The output of emit_uniformize() always ends up as either a surface or a sampler index, which we only look at later in the generator. There are no other cases where the result of emit_uniformize() might be part of an expression that we can copy propagate or otherwise optimize. If the input to emit_uniformize() isn't an immediate where it could be, nir optimization needs fixing. So if we add these two lines to emit_uniformize() to pass immediates straight through, we avoid generating code that we have to extend the copy prop pass to handle. Kristian > Re-implementing a special case of the > optimization already done by opt_algebraic() might have helped in this > specific case, but it won't help when the argument is of some other kind > of uniform value which are also frequently encountered in practice and > could also be copy-propagated, and it won't help avoid the liveness > analysis bug [1] in cases where the argument is not an immediate (the > bug is reported to break some thousands SSBO dEQP tests, although I > don't know what fraction of them actually use non-constant indexing). > The alternative solution I provided patches for (and you seem to have > implemented yourself independently) would address all these issues at > once. > >> Kristian >> I made this patch a while ago but I never posted it anywhere because it's a of a kludge and it would probably be better to fix the copy propagation: https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa >>> Heh, yeah, I'd rather fix copy propagation instead, which I believe will >>> become much easier with the use-def-chain analysis pass I'm working on. >>> Either way though I don't think it would do any harm to have Kristian's patch as well even if we did improve elimintate_find_live_channel so it is: Reviewed-by: Neil Roberts - Neil Kristian Høgsberg Kristensen writes: > An immdiate is already uniform so just return it up front. Without this, > brw_fs_surface_builder ends up passing immediate surface indices through > SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't > constant
Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
Kristian Høgsberg writes: > On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez > wrote: >> Neil Roberts writes: >> >>> Just a thought, would it be better to move this check into the >>> eliminate_find_live_channel optimisation? That way it could catch >>> sources that become immediates through later optimisations. One problem >>> with this that I've seen before is that eliminating the >>> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be >>> eliminated because the copy propagation doesn't work. >> >> I believe in this particular case the BROADCAST instruction should >> already be eliminated (by opt_algebraic), because its first source is an >> immediate, so this optimization would in fact be redundant with >> opt_algebraic+constant propagation if it weren't because the latter is >> unable to handle scalar copies. > > Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets > eliminated because it's outside control flow > (fs_visitor::eliminate_find_live_channel). Then the broadcast gets > reduced to a MOV in opt_algebraic because src0 is uniform (immediate > constant). The problem then is the dst of the MOV has stride 0 which > can_propagate_from() then bails on. > >> Another possibility that would likely >> be more general than this (because it would handle cases in which, as >> Neil said, the argument becomes an immediate only after optimization, >> and because it would also avoid the issue with liveness analysis >> extending the live ranges of scalar values incorrectly [1]) would be to >> extend the destination register of the BROADCAST instructions to a >> vector [2] and then add a case to the switch statement of >> try_constant_propagate() so that the immediate MOV resulting from >> opt_algebraic is propagated into surface instructions (see attachment). > > I wrote exactly this code to make constant propagate work, plus adding > the extra opcodes to the switch statement. It works and I could > certainly send that out after this, but > > 1) This doesn't mean we shouldn't do the if (src.file == IMM) > shortcut. It saves the compiler a bit of work in the very common case > of > non-indirect buffer access. > > 2) I'm not even sure it makes sense to extend copy-propagation to do > this (which is why I went back to just the IMM test). Anything that > would be an immediate at this point should be an immediate, if not > we're missing something in nir. > Still this doesn't address the root of the problem, which is that emit_uniformize() emits scalar code that the rest of the compiler is not able to handle properly. Re-implementing a special case of the optimization already done by opt_algebraic() might have helped in this specific case, but it won't help when the argument is of some other kind of uniform value which are also frequently encountered in practice and could also be copy-propagated, and it won't help avoid the liveness analysis bug [1] in cases where the argument is not an immediate (the bug is reported to break some thousands SSBO dEQP tests, although I don't know what fraction of them actually use non-constant indexing). The alternative solution I provided patches for (and you seem to have implemented yourself independently) would address all these issues at once. > Kristian > >>> I made this patch a while ago but I never posted it anywhere because >>> it's a of a kludge and it would probably be better to fix the copy >>> propagation: >>> >>> https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa >>> >> Heh, yeah, I'd rather fix copy propagation instead, which I believe will >> become much easier with the use-def-chain analysis pass I'm working on. >> >>> Either way though I don't think it would do any harm to have Kristian's >>> patch as well even if we did improve elimintate_find_live_channel so it >>> is: >>> >>> Reviewed-by: Neil Roberts >>> >>> - Neil >>> >>> Kristian Høgsberg Kristensen writes: >>> An immdiate is already uniform so just return it up front. Without this, brw_fs_surface_builder ends up passing immediate surface indices through SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't constant propagate out of, and further, we don't constant propagate into the typed/untype read/write opcodes at all. The end result is that all typed/untyped read/write/atomics end up as indirect sends. Code generation should always produce either an immediate or an actual indirect surface index, so we can fix this by just special casing immediates in emit_uniformize. Signed-off-by: Kristian Høgsberg Kristensen --- src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h b/src/mesa/drivers/dri/i965/brw_fs_builder.h index df10a9d..98ce71e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h @@ -394,6 +3
Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
I wrote a similar patch a while back because I was annoyed by how this was causing the send disassembly to not be as nice. :) I dropped it from my branch at some point, but I still think it is a good idea. Reviewed-by: Jordan Justen On 2015-10-18 21:31:43, Kristian Høgsberg Kristensen wrote: > An immdiate is already uniform so just return it up front. Without this, > brw_fs_surface_builder ends up passing immediate surface indices through > SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't > constant propagate out of, and further, we don't constant propagate into > the typed/untype read/write opcodes at all. The end result is that all > typed/untyped read/write/atomics end up as indirect sends. > > Code generation should always produce either an immediate or an actual > indirect surface index, so we can fix this by just special casing > immediates in emit_uniformize. > > Signed-off-by: Kristian Høgsberg Kristensen > --- > src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h > b/src/mesa/drivers/dri/i965/brw_fs_builder.h > index df10a9d..98ce71e 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h > +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h > @@ -394,6 +394,9 @@ namespace brw { > const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 0); > const dst_reg dst = component(vgrf(src.type), 0); > > + if (src.file == IMM) > +return src; > + > ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index); > ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index); > > -- > 2.6.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez wrote: > Neil Roberts writes: > >> Just a thought, would it be better to move this check into the >> eliminate_find_live_channel optimisation? That way it could catch >> sources that become immediates through later optimisations. One problem >> with this that I've seen before is that eliminating the >> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be >> eliminated because the copy propagation doesn't work. > > I believe in this particular case the BROADCAST instruction should > already be eliminated (by opt_algebraic), because its first source is an > immediate, so this optimization would in fact be redundant with > opt_algebraic+constant propagation if it weren't because the latter is > unable to handle scalar copies. Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets eliminated because it's outside control flow (fs_visitor::eliminate_find_live_channel). Then the broadcast gets reduced to a MOV in opt_algebraic because src0 is uniform (immediate constant). The problem then is the dst of the MOV has stride 0 which can_propagate_from() then bails on. > Another possibility that would likely > be more general than this (because it would handle cases in which, as > Neil said, the argument becomes an immediate only after optimization, > and because it would also avoid the issue with liveness analysis > extending the live ranges of scalar values incorrectly [1]) would be to > extend the destination register of the BROADCAST instructions to a > vector [2] and then add a case to the switch statement of > try_constant_propagate() so that the immediate MOV resulting from > opt_algebraic is propagated into surface instructions (see attachment). I wrote exactly this code to make constant propagate work, plus adding the extra opcodes to the switch statement. It works and I could certainly send that out after this, but 1) This doesn't mean we shouldn't do the if (src.file == IMM) shortcut. It saves the compiler a bit of work in the very common case of non-indirect buffer access. 2) I'm not even sure it makes sense to extend copy-propagation to do this (which is why I went back to just the IMM test). Anything that would be an immediate at this point should be an immediate, if not we're missing something in nir. Kristian >> I made this patch a while ago but I never posted it anywhere because >> it's a of a kludge and it would probably be better to fix the copy >> propagation: >> >> https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa >> > Heh, yeah, I'd rather fix copy propagation instead, which I believe will > become much easier with the use-def-chain analysis pass I'm working on. > >> Either way though I don't think it would do any harm to have Kristian's >> patch as well even if we did improve elimintate_find_live_channel so it >> is: >> >> Reviewed-by: Neil Roberts >> >> - Neil >> >> Kristian Høgsberg Kristensen writes: >> >>> An immdiate is already uniform so just return it up front. Without this, >>> brw_fs_surface_builder ends up passing immediate surface indices through >>> SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't >>> constant propagate out of, and further, we don't constant propagate into >>> the typed/untype read/write opcodes at all. The end result is that all >>> typed/untyped read/write/atomics end up as indirect sends. >>> >>> Code generation should always produce either an immediate or an actual >>> indirect surface index, so we can fix this by just special casing >>> immediates in emit_uniformize. >>> >>> Signed-off-by: Kristian Høgsberg Kristensen >>> --- >>> src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h >>> b/src/mesa/drivers/dri/i965/brw_fs_builder.h >>> index df10a9d..98ce71e 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h >>> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h >>> @@ -394,6 +394,9 @@ namespace brw { >>> const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), >>> 0); >>> const dst_reg dst = component(vgrf(src.type), 0); >>> >>> + if (src.file == IMM) >>> +return src; >>> + >>> ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index); >>> ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index); >>> >>> -- >>> 2.6.2 >>> >>> ___ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097085.html > [2] > http://lists.freedesktop.org/archives/mesa-dev/attachments/20151014/25dd38dc/attachment.patch > _
Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
Neil Roberts writes: > Just a thought, would it be better to move this check into the > eliminate_find_live_channel optimisation? That way it could catch > sources that become immediates through later optimisations. One problem > with this that I've seen before is that eliminating the > FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be > eliminated because the copy propagation doesn't work. I believe in this particular case the BROADCAST instruction should already be eliminated (by opt_algebraic), because its first source is an immediate, so this optimization would in fact be redundant with opt_algebraic+constant propagation if it weren't because the latter is unable to handle scalar copies. Another possibility that would likely be more general than this (because it would handle cases in which, as Neil said, the argument becomes an immediate only after optimization, and because it would also avoid the issue with liveness analysis extending the live ranges of scalar values incorrectly [1]) would be to extend the destination register of the BROADCAST instructions to a vector [2] and then add a case to the switch statement of try_constant_propagate() so that the immediate MOV resulting from opt_algebraic is propagated into surface instructions (see attachment). > I made this patch a while ago but I never posted it anywhere because > it's a of a kludge and it would probably be better to fix the copy > propagation: > > https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa > Heh, yeah, I'd rather fix copy propagation instead, which I believe will become much easier with the use-def-chain analysis pass I'm working on. > Either way though I don't think it would do any harm to have Kristian's > patch as well even if we did improve elimintate_find_live_channel so it > is: > > Reviewed-by: Neil Roberts > > - Neil > > Kristian Høgsberg Kristensen writes: > >> An immdiate is already uniform so just return it up front. Without this, >> brw_fs_surface_builder ends up passing immediate surface indices through >> SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't >> constant propagate out of, and further, we don't constant propagate into >> the typed/untype read/write opcodes at all. The end result is that all >> typed/untyped read/write/atomics end up as indirect sends. >> >> Code generation should always produce either an immediate or an actual >> indirect surface index, so we can fix this by just special casing >> immediates in emit_uniformize. >> >> Signed-off-by: Kristian Høgsberg Kristensen >> --- >> src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h >> b/src/mesa/drivers/dri/i965/brw_fs_builder.h >> index df10a9d..98ce71e 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h >> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h >> @@ -394,6 +394,9 @@ namespace brw { >> const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), >> 0); >> const dst_reg dst = component(vgrf(src.type), 0); >> >> + if (src.file == IMM) >> +return src; >> + >> ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index); >> ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index); >> >> -- >> 2.6.2 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097085.html [2] http://lists.freedesktop.org/archives/mesa-dev/attachments/20151014/25dd38dc/attachment.patch diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp index 230b0ca..3970bd8 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp @@ -635,6 +635,18 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry) progress = true; break; + case SHADER_OPCODE_UNTYPED_SURFACE_READ: + case SHADER_OPCODE_UNTYPED_SURFACE_WRITE: + case SHADER_OPCODE_UNTYPED_ATOMIC: + case SHADER_OPCODE_TYPED_SURFACE_READ: + case SHADER_OPCODE_TYPED_SURFACE_WRITE: + case SHADER_OPCODE_TYPED_ATOMIC: + if (i > 0) { +inst->src[i] = val; +progress = true; + } + break; + default: break; } diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index 610caef..e4f5ad2 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -248,6 +248,
Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
Just a thought, would it be better to move this check into the eliminate_find_live_channel optimisation? That way it could catch sources that become immediates through later optimisations. One problem with this that I've seen before is that eliminating the FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be eliminated because the copy propagation doesn't work. I made this patch a while ago but I never posted it anywhere because it's a of a kludge and it would probably be better to fix the copy propagation: https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa Either way though I don't think it would do any harm to have Kristian's patch as well even if we did improve elimintate_find_live_channel so it is: Reviewed-by: Neil Roberts - Neil Kristian Høgsberg Kristensen writes: > An immdiate is already uniform so just return it up front. Without this, > brw_fs_surface_builder ends up passing immediate surface indices through > SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't > constant propagate out of, and further, we don't constant propagate into > the typed/untype read/write opcodes at all. The end result is that all > typed/untyped read/write/atomics end up as indirect sends. > > Code generation should always produce either an immediate or an actual > indirect surface index, so we can fix this by just special casing > immediates in emit_uniformize. > > Signed-off-by: Kristian Høgsberg Kristensen > --- > src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h > b/src/mesa/drivers/dri/i965/brw_fs_builder.h > index df10a9d..98ce71e 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h > +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h > @@ -394,6 +394,9 @@ namespace brw { > const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 0); > const dst_reg dst = component(vgrf(src.type), 0); > > + if (src.file == IMM) > +return src; > + > ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index); > ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index); > > -- > 2.6.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
On Sun, 2015-10-18 at 21:31 -0700, Kristian Høgsberg Kristensen wrote: > An immdiate is already uniform so just return it up front. Without this, > brw_fs_surface_builder ends up passing immediate surface indices through > SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't > constant propagate out of, and further, we don't constant propagate into > the typed/untype read/write opcodes at all. The end result is that all > typed/untyped read/write/atomics end up as indirect sends. > > Code generation should always produce either an immediate or an actual > indirect surface index, so we can fix this by just special casing > immediates in emit_uniformize. > > Signed-off-by: Kristian Høgsberg Kristensen > --- > src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h > b/src/mesa/drivers/dri/i965/brw_fs_builder.h > index df10a9d..98ce71e 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h > +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h > @@ -394,6 +394,9 @@ namespace brw { > const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 0); > const dst_reg dst = component(vgrf(src.type), 0); > > + if (src.file == IMM) > +return src; > + We can move this check to the top of emit_uniformize(), right? > ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index); > ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index); > Iago ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates
An immdiate is already uniform so just return it up front. Without this, brw_fs_surface_builder ends up passing immediate surface indices through SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't constant propagate out of, and further, we don't constant propagate into the typed/untype read/write opcodes at all. The end result is that all typed/untyped read/write/atomics end up as indirect sends. Code generation should always produce either an immediate or an actual indirect surface index, so we can fix this by just special casing immediates in emit_uniformize. Signed-off-by: Kristian Høgsberg Kristensen --- src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h b/src/mesa/drivers/dri/i965/brw_fs_builder.h index df10a9d..98ce71e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h @@ -394,6 +394,9 @@ namespace brw { const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 0); const dst_reg dst = component(vgrf(src.type), 0); + if (src.file == IMM) +return src; + ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index); ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index); -- 2.6.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev