Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-23 Thread Francisco Jerez
Kristian Høgsberg  writes:

> On Tue, Oct 20, 2015 at 11:56 AM, Francisco Jerez  
> wrote:
>> Kristian Høgsberg  writes:
>>
>>> On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez  
>>> wrote:
 Kristian Høgsberg  writes:

> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez  
> wrote:
>> Neil Roberts  writes:
>>
>>> Just a thought, would it be better to move this check into the
>>> eliminate_find_live_channel optimisation? That way it could catch
>>> sources that become immediates through later optimisations. One problem
>>> with this that I've seen before is that eliminating the
>>> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
>>> eliminated because the copy propagation doesn't work.
>>
>> I believe in this particular case the BROADCAST instruction should
>> already be eliminated (by opt_algebraic), because its first source is an
>> immediate, so this optimization would in fact be redundant with
>> opt_algebraic+constant propagation if it weren't because the latter is
>> unable to handle scalar copies.
>
> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets
> eliminated because it's outside control flow
> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets
> reduced to a MOV in opt_algebraic because src0 is uniform (immediate
> constant). The problem then is the dst of the MOV has stride 0 which
> can_propagate_from() then bails on.
>
>> Another possibility that would likely
>> be more general than this (because it would handle cases in which, as
>> Neil said, the argument becomes an immediate only after optimization,
>> and because it would also avoid the issue with liveness analysis
>> extending the live ranges of scalar values incorrectly [1]) would be to
>> extend the destination register of the BROADCAST instructions to a
>> vector [2] and then add a case to the switch statement of
>> try_constant_propagate() so that the immediate MOV resulting from
>> opt_algebraic is propagated into surface instructions (see attachment).
>
> I wrote exactly this code to make constant propagate work, plus adding
> the extra opcodes to the switch statement. It works and I could
> certainly send that out after this, but
>
> 1) This doesn't mean we shouldn't do the if (src.file == IMM)
> shortcut. It saves the compiler a bit of work in the very common case
> of
> non-indirect buffer access.
>
> 2) I'm not even sure it makes sense to extend copy-propagation to do
> this (which is why I went back to just the IMM test). Anything that
> would be an immediate at this point should be an immediate, if not
> we're missing something in nir.
>
 Still this doesn't address the root of the problem, which is that
 emit_uniformize() emits scalar code that the rest of the compiler is not
 able to handle properly.
>>>
>>> This is not a case of papering over the root cause, it's about not
>>> creating the root cause in the first place. The output of
>>> emit_uniformize() always ends up as either a surface or a sampler
>>> index, which we only look at later in the generator. There are no
>>> other cases where the result of emit_uniformize() might be part of an
>>> expression that we can copy propagate or otherwise optimize. If the
>>> input to emit_uniformize() isn't an immediate where it could be, nir
>>> optimization needs fixing. So if we add these two lines to
>>> emit_uniformize() to pass immediates straight through, we avoid
>>> generating code that we have to extend the copy prop pass to handle.
>>>
>>
>> Kristian, there are legitimate uses of emit_uniformize() in which the
>> argument is not an immediate but still can be optimized out later on --
>> E.g. for images it will frequently be a uniform register, or a
>> non-constant expression calculated within uniform control flow (in which
>> case the FIND_LIVE_CHANNEL instruction emitted here will be reduced to a
>> constant MOV by eliminate_find_live_chaso nnel()).  In such cases we still
>> want copy-propagation to kick in, but it wont because of the problem I
>> was talking about with scalar writes.  Even if the instructions emitted
>> by emit_uniformize() cannot be optimized out, liveness analysis will
>> overestimate the live ranges of the temporaries used by
>> emit_uniformize() for the same reason, potentially causing the register
>> allocator to spill or run out of registers.
>
> Yeah, fair point. I went and took a look at non-constant surface index
> and sent out a new series that make that work better as well as ssbo
> write optimizations. I took out the imm shortcut and put in the
> generic optimizations. It makes non-const surface index a little
> better, but as I wrote in the 

Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-23 Thread Kristian Høgsberg
On Fri, Oct 23, 2015 at 5:38 AM, Francisco Jerez  wrote:
> Kristian Høgsberg  writes:
>
>> On Tue, Oct 20, 2015 at 11:56 AM, Francisco Jerez  
>> wrote:
>>> Kristian Høgsberg  writes:
>>>
 On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez  
 wrote:
> Kristian Høgsberg  writes:
>
>> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez  
>> wrote:
>>> Neil Roberts  writes:
>>>
 Just a thought, would it be better to move this check into the
 eliminate_find_live_channel optimisation? That way it could catch
 sources that become immediates through later optimisations. One problem
 with this that I've seen before is that eliminating the
 FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
 eliminated because the copy propagation doesn't work.
>>>
>>> I believe in this particular case the BROADCAST instruction should
>>> already be eliminated (by opt_algebraic), because its first source is an
>>> immediate, so this optimization would in fact be redundant with
>>> opt_algebraic+constant propagation if it weren't because the latter is
>>> unable to handle scalar copies.
>>
>> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets
>> eliminated because it's outside control flow
>> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets
>> reduced to a MOV in opt_algebraic because src0 is uniform (immediate
>> constant). The problem then is the dst of the MOV has stride 0 which
>> can_propagate_from() then bails on.
>>
>>> Another possibility that would likely
>>> be more general than this (because it would handle cases in which, as
>>> Neil said, the argument becomes an immediate only after optimization,
>>> and because it would also avoid the issue with liveness analysis
>>> extending the live ranges of scalar values incorrectly [1]) would be to
>>> extend the destination register of the BROADCAST instructions to a
>>> vector [2] and then add a case to the switch statement of
>>> try_constant_propagate() so that the immediate MOV resulting from
>>> opt_algebraic is propagated into surface instructions (see attachment).
>>
>> I wrote exactly this code to make constant propagate work, plus adding
>> the extra opcodes to the switch statement. It works and I could
>> certainly send that out after this, but
>>
>> 1) This doesn't mean we shouldn't do the if (src.file == IMM)
>> shortcut. It saves the compiler a bit of work in the very common case
>> of
>> non-indirect buffer access.
>>
>> 2) I'm not even sure it makes sense to extend copy-propagation to do
>> this (which is why I went back to just the IMM test). Anything that
>> would be an immediate at this point should be an immediate, if not
>> we're missing something in nir.
>>
> Still this doesn't address the root of the problem, which is that
> emit_uniformize() emits scalar code that the rest of the compiler is not
> able to handle properly.

 This is not a case of papering over the root cause, it's about not
 creating the root cause in the first place. The output of
 emit_uniformize() always ends up as either a surface or a sampler
 index, which we only look at later in the generator. There are no
 other cases where the result of emit_uniformize() might be part of an
 expression that we can copy propagate or otherwise optimize. If the
 input to emit_uniformize() isn't an immediate where it could be, nir
 optimization needs fixing. So if we add these two lines to
 emit_uniformize() to pass immediates straight through, we avoid
 generating code that we have to extend the copy prop pass to handle.

>>>
>>> Kristian, there are legitimate uses of emit_uniformize() in which the
>>> argument is not an immediate but still can be optimized out later on --
>>> E.g. for images it will frequently be a uniform register, or a
>>> non-constant expression calculated within uniform control flow (in which
>>> case the FIND_LIVE_CHANNEL instruction emitted here will be reduced to a
>>> constant MOV by eliminate_find_live_chaso nnel()).  In such cases we still
>>> want copy-propagation to kick in, but it wont because of the problem I
>>> was talking about with scalar writes.  Even if the instructions emitted
>>> by emit_uniformize() cannot be optimized out, liveness analysis will
>>> overestimate the live ranges of the temporaries used by
>>> emit_uniformize() for the same reason, potentially causing the register
>>> allocator to spill or run out of registers.
>>
>> Yeah, fair point. I went and took a look at non-constant surface index
>> and sent out a new series that make that work better as well as 

Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-22 Thread Kristian Høgsberg
On Tue, Oct 20, 2015 at 11:56 AM, Francisco Jerez  wrote:
> Kristian Høgsberg  writes:
>
>> On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez  
>> wrote:
>>> Kristian Høgsberg  writes:
>>>
 On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez  
 wrote:
> Neil Roberts  writes:
>
>> Just a thought, would it be better to move this check into the
>> eliminate_find_live_channel optimisation? That way it could catch
>> sources that become immediates through later optimisations. One problem
>> with this that I've seen before is that eliminating the
>> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
>> eliminated because the copy propagation doesn't work.
>
> I believe in this particular case the BROADCAST instruction should
> already be eliminated (by opt_algebraic), because its first source is an
> immediate, so this optimization would in fact be redundant with
> opt_algebraic+constant propagation if it weren't because the latter is
> unable to handle scalar copies.

 Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets
 eliminated because it's outside control flow
 (fs_visitor::eliminate_find_live_channel). Then the broadcast gets
 reduced to a MOV in opt_algebraic because src0 is uniform (immediate
 constant). The problem then is the dst of the MOV has stride 0 which
 can_propagate_from() then bails on.

> Another possibility that would likely
> be more general than this (because it would handle cases in which, as
> Neil said, the argument becomes an immediate only after optimization,
> and because it would also avoid the issue with liveness analysis
> extending the live ranges of scalar values incorrectly [1]) would be to
> extend the destination register of the BROADCAST instructions to a
> vector [2] and then add a case to the switch statement of
> try_constant_propagate() so that the immediate MOV resulting from
> opt_algebraic is propagated into surface instructions (see attachment).

 I wrote exactly this code to make constant propagate work, plus adding
 the extra opcodes to the switch statement. It works and I could
 certainly send that out after this, but

 1) This doesn't mean we shouldn't do the if (src.file == IMM)
 shortcut. It saves the compiler a bit of work in the very common case
 of
 non-indirect buffer access.

 2) I'm not even sure it makes sense to extend copy-propagation to do
 this (which is why I went back to just the IMM test). Anything that
 would be an immediate at this point should be an immediate, if not
 we're missing something in nir.

>>> Still this doesn't address the root of the problem, which is that
>>> emit_uniformize() emits scalar code that the rest of the compiler is not
>>> able to handle properly.
>>
>> This is not a case of papering over the root cause, it's about not
>> creating the root cause in the first place. The output of
>> emit_uniformize() always ends up as either a surface or a sampler
>> index, which we only look at later in the generator. There are no
>> other cases where the result of emit_uniformize() might be part of an
>> expression that we can copy propagate or otherwise optimize. If the
>> input to emit_uniformize() isn't an immediate where it could be, nir
>> optimization needs fixing. So if we add these two lines to
>> emit_uniformize() to pass immediates straight through, we avoid
>> generating code that we have to extend the copy prop pass to handle.
>>
>
> Kristian, there are legitimate uses of emit_uniformize() in which the
> argument is not an immediate but still can be optimized out later on --
> E.g. for images it will frequently be a uniform register, or a
> non-constant expression calculated within uniform control flow (in which
> case the FIND_LIVE_CHANNEL instruction emitted here will be reduced to a
> constant MOV by eliminate_find_live_chaso nnel()).  In such cases we still
> want copy-propagation to kick in, but it wont because of the problem I
> was talking about with scalar writes.  Even if the instructions emitted
> by emit_uniformize() cannot be optimized out, liveness analysis will
> overestimate the live ranges of the temporaries used by
> emit_uniformize() for the same reason, potentially causing the register
> allocator to spill or run out of registers.

Yeah, fair point. I went and took a look at non-constant surface index
and sent out a new series that make that work better as well as ssbo
write optimizations. I took out the imm shortcut and put in the
generic optimizations. It makes non-const surface index a little
better, but as I wrote in the cover letter, it still doesn't work that
well as we don't propagate the surface index into the send
instructions.

> Anyway don't take me wrong, 

Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-20 Thread Francisco Jerez
Kristian Høgsberg  writes:

> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez  
> wrote:
>> Neil Roberts  writes:
>>
>>> Just a thought, would it be better to move this check into the
>>> eliminate_find_live_channel optimisation? That way it could catch
>>> sources that become immediates through later optimisations. One problem
>>> with this that I've seen before is that eliminating the
>>> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
>>> eliminated because the copy propagation doesn't work.
>>
>> I believe in this particular case the BROADCAST instruction should
>> already be eliminated (by opt_algebraic), because its first source is an
>> immediate, so this optimization would in fact be redundant with
>> opt_algebraic+constant propagation if it weren't because the latter is
>> unable to handle scalar copies.
>
> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets
> eliminated because it's outside control flow
> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets
> reduced to a MOV in opt_algebraic because src0 is uniform (immediate
> constant). The problem then is the dst of the MOV has stride 0 which
> can_propagate_from() then bails on.
>
>> Another possibility that would likely
>> be more general than this (because it would handle cases in which, as
>> Neil said, the argument becomes an immediate only after optimization,
>> and because it would also avoid the issue with liveness analysis
>> extending the live ranges of scalar values incorrectly [1]) would be to
>> extend the destination register of the BROADCAST instructions to a
>> vector [2] and then add a case to the switch statement of
>> try_constant_propagate() so that the immediate MOV resulting from
>> opt_algebraic is propagated into surface instructions (see attachment).
>
> I wrote exactly this code to make constant propagate work, plus adding
> the extra opcodes to the switch statement. It works and I could
> certainly send that out after this, but
>
> 1) This doesn't mean we shouldn't do the if (src.file == IMM)
> shortcut. It saves the compiler a bit of work in the very common case
> of
> non-indirect buffer access.
>
> 2) I'm not even sure it makes sense to extend copy-propagation to do
> this (which is why I went back to just the IMM test). Anything that
> would be an immediate at this point should be an immediate, if not
> we're missing something in nir.
>
Still this doesn't address the root of the problem, which is that
emit_uniformize() emits scalar code that the rest of the compiler is not
able to handle properly.  Re-implementing a special case of the
optimization already done by opt_algebraic() might have helped in this
specific case, but it won't help when the argument is of some other kind
of uniform value which are also frequently encountered in practice and
could also be copy-propagated, and it won't help avoid the liveness
analysis bug [1] in cases where the argument is not an immediate (the
bug is reported to break some thousands SSBO dEQP tests, although I
don't know what fraction of them actually use non-constant indexing).
The alternative solution I provided patches for (and you seem to have
implemented yourself independently) would address all these issues at
once.

> Kristian
>
>>> I made this patch a while ago but I never posted it anywhere because
>>> it's a of a kludge and it would probably be better to fix the copy
>>> propagation:
>>>
>>> https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa
>>>
>> Heh, yeah, I'd rather fix copy propagation instead, which I believe will
>> become much easier with the use-def-chain analysis pass I'm working on.
>>
>>> Either way though I don't think it would do any harm to have Kristian's
>>> patch as well even if we did improve elimintate_find_live_channel so it
>>> is:
>>>
>>> Reviewed-by: Neil Roberts 
>>>
>>> - Neil
>>>
>>> Kristian Høgsberg Kristensen  writes:
>>>
 An immdiate is already uniform so just return it up front. Without this,
 brw_fs_surface_builder ends up passing immediate surface indices through
 SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't
 constant propagate out of, and further, we don't constant propagate into
 the typed/untype read/write opcodes at all.  The end result is that all
 typed/untyped read/write/atomics end up as indirect sends.

 Code generation should always produce either an immediate or an actual
 indirect surface index, so we can fix this by just special casing
 immediates in emit_uniformize.

 Signed-off-by: Kristian Høgsberg Kristensen 
 ---
  src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++
  1 file changed, 3 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
 b/src/mesa/drivers/dri/i965/brw_fs_builder.h
 index df10a9d..98ce71e 

Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-20 Thread Jordan Justen
I wrote a similar patch a while back because I was annoyed by how this
was causing the send disassembly to not be as nice. :) I dropped it
from my branch at some point, but I still think it is a good idea.

Reviewed-by: Jordan Justen 

On 2015-10-18 21:31:43, Kristian Høgsberg Kristensen wrote:
> An immdiate is already uniform so just return it up front. Without this,
> brw_fs_surface_builder ends up passing immediate surface indices through
> SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't
> constant propagate out of, and further, we don't constant propagate into
> the typed/untype read/write opcodes at all.  The end result is that all
> typed/untyped read/write/atomics end up as indirect sends.
> 
> Code generation should always produce either an immediate or an actual
> indirect surface index, so we can fix this by just special casing
> immediates in emit_uniformize.
> 
> Signed-off-by: Kristian Høgsberg Kristensen 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
> b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> index df10a9d..98ce71e 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> @@ -394,6 +394,9 @@ namespace brw {
>   const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 0);
>   const dst_reg dst = component(vgrf(src.type), 0);
>  
> + if (src.file == IMM)
> +return src;
> +
>   ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
>   ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index);
>  
> -- 
> 2.6.2
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-20 Thread Kristian Høgsberg
On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez  wrote:
> Kristian Høgsberg  writes:
>
>> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez  
>> wrote:
>>> Neil Roberts  writes:
>>>
 Just a thought, would it be better to move this check into the
 eliminate_find_live_channel optimisation? That way it could catch
 sources that become immediates through later optimisations. One problem
 with this that I've seen before is that eliminating the
 FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
 eliminated because the copy propagation doesn't work.
>>>
>>> I believe in this particular case the BROADCAST instruction should
>>> already be eliminated (by opt_algebraic), because its first source is an
>>> immediate, so this optimization would in fact be redundant with
>>> opt_algebraic+constant propagation if it weren't because the latter is
>>> unable to handle scalar copies.
>>
>> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets
>> eliminated because it's outside control flow
>> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets
>> reduced to a MOV in opt_algebraic because src0 is uniform (immediate
>> constant). The problem then is the dst of the MOV has stride 0 which
>> can_propagate_from() then bails on.
>>
>>> Another possibility that would likely
>>> be more general than this (because it would handle cases in which, as
>>> Neil said, the argument becomes an immediate only after optimization,
>>> and because it would also avoid the issue with liveness analysis
>>> extending the live ranges of scalar values incorrectly [1]) would be to
>>> extend the destination register of the BROADCAST instructions to a
>>> vector [2] and then add a case to the switch statement of
>>> try_constant_propagate() so that the immediate MOV resulting from
>>> opt_algebraic is propagated into surface instructions (see attachment).
>>
>> I wrote exactly this code to make constant propagate work, plus adding
>> the extra opcodes to the switch statement. It works and I could
>> certainly send that out after this, but
>>
>> 1) This doesn't mean we shouldn't do the if (src.file == IMM)
>> shortcut. It saves the compiler a bit of work in the very common case
>> of
>> non-indirect buffer access.
>>
>> 2) I'm not even sure it makes sense to extend copy-propagation to do
>> this (which is why I went back to just the IMM test). Anything that
>> would be an immediate at this point should be an immediate, if not
>> we're missing something in nir.
>>
> Still this doesn't address the root of the problem, which is that
> emit_uniformize() emits scalar code that the rest of the compiler is not
> able to handle properly.

This is not a case of papering over the root cause, it's about not
creating the root cause in the first place. The output of
emit_uniformize() always ends up as either a surface or a sampler
index, which we only look at later in the generator. There are no
other cases where the result of emit_uniformize() might be part of an
expression that we can copy propagate or otherwise optimize. If the
input to emit_uniformize() isn't an immediate where it could be, nir
optimization needs fixing. So if we add these two lines to
emit_uniformize() to pass immediates straight through, we avoid
generating code that we have to extend the copy prop pass to handle.

Kristian

> Re-implementing a special case of the
> optimization already done by opt_algebraic() might have helped in this
> specific case, but it won't help when the argument is of some other kind
> of uniform value which are also frequently encountered in practice and
> could also be copy-propagated, and it won't help avoid the liveness
> analysis bug [1] in cases where the argument is not an immediate (the
> bug is reported to break some thousands SSBO dEQP tests, although I
> don't know what fraction of them actually use non-constant indexing).
> The alternative solution I provided patches for (and you seem to have
> implemented yourself independently) would address all these issues at
> once.
>
>> Kristian
>>
 I made this patch a while ago but I never posted it anywhere because
 it's a of a kludge and it would probably be better to fix the copy
 propagation:

 https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa

>>> Heh, yeah, I'd rather fix copy propagation instead, which I believe will
>>> become much easier with the use-def-chain analysis pass I'm working on.
>>>
 Either way though I don't think it would do any harm to have Kristian's
 patch as well even if we did improve elimintate_find_live_channel so it
 is:

 Reviewed-by: Neil Roberts 

 - Neil

 Kristian Høgsberg Kristensen  writes:

> An immdiate is already uniform so just return it up front. Without this,
> brw_fs_surface_builder ends up 

Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-20 Thread Francisco Jerez
Kristian Høgsberg  writes:

> On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez  
> wrote:
>> Kristian Høgsberg  writes:
>>
>>> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez  
>>> wrote:
 Neil Roberts  writes:

> Just a thought, would it be better to move this check into the
> eliminate_find_live_channel optimisation? That way it could catch
> sources that become immediates through later optimisations. One problem
> with this that I've seen before is that eliminating the
> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
> eliminated because the copy propagation doesn't work.

 I believe in this particular case the BROADCAST instruction should
 already be eliminated (by opt_algebraic), because its first source is an
 immediate, so this optimization would in fact be redundant with
 opt_algebraic+constant propagation if it weren't because the latter is
 unable to handle scalar copies.
>>>
>>> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets
>>> eliminated because it's outside control flow
>>> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets
>>> reduced to a MOV in opt_algebraic because src0 is uniform (immediate
>>> constant). The problem then is the dst of the MOV has stride 0 which
>>> can_propagate_from() then bails on.
>>>
 Another possibility that would likely
 be more general than this (because it would handle cases in which, as
 Neil said, the argument becomes an immediate only after optimization,
 and because it would also avoid the issue with liveness analysis
 extending the live ranges of scalar values incorrectly [1]) would be to
 extend the destination register of the BROADCAST instructions to a
 vector [2] and then add a case to the switch statement of
 try_constant_propagate() so that the immediate MOV resulting from
 opt_algebraic is propagated into surface instructions (see attachment).
>>>
>>> I wrote exactly this code to make constant propagate work, plus adding
>>> the extra opcodes to the switch statement. It works and I could
>>> certainly send that out after this, but
>>>
>>> 1) This doesn't mean we shouldn't do the if (src.file == IMM)
>>> shortcut. It saves the compiler a bit of work in the very common case
>>> of
>>> non-indirect buffer access.
>>>
>>> 2) I'm not even sure it makes sense to extend copy-propagation to do
>>> this (which is why I went back to just the IMM test). Anything that
>>> would be an immediate at this point should be an immediate, if not
>>> we're missing something in nir.
>>>
>> Still this doesn't address the root of the problem, which is that
>> emit_uniformize() emits scalar code that the rest of the compiler is not
>> able to handle properly.
>
> This is not a case of papering over the root cause, it's about not
> creating the root cause in the first place. The output of
> emit_uniformize() always ends up as either a surface or a sampler
> index, which we only look at later in the generator. There are no
> other cases where the result of emit_uniformize() might be part of an
> expression that we can copy propagate or otherwise optimize. If the
> input to emit_uniformize() isn't an immediate where it could be, nir
> optimization needs fixing. So if we add these two lines to
> emit_uniformize() to pass immediates straight through, we avoid
> generating code that we have to extend the copy prop pass to handle.
>

Kristian, there are legitimate uses of emit_uniformize() in which the
argument is not an immediate but still can be optimized out later on --
E.g. for images it will frequently be a uniform register, or a
non-constant expression calculated within uniform control flow (in which
case the FIND_LIVE_CHANNEL instruction emitted here will be reduced to a
constant MOV by eliminate_find_live_channel()).  In such cases we still
want copy-propagation to kick in, but it wont because of the problem I
was talking about with scalar writes.  Even if the instructions emitted
by emit_uniformize() cannot be optimized out, liveness analysis will
overestimate the live ranges of the temporaries used by
emit_uniformize() for the same reason, potentially causing the register
allocator to spill or run out of registers.

Anyway don't take me wrong, I'm not NAK-ing your patch or anything, I
just have the feeling that by fixing this more generally you could've
saved us [or most likely me ;)] work in the near future.

> Kristian
>
>> Re-implementing a special case of the
>> optimization already done by opt_algebraic() might have helped in this
>> specific case, but it won't help when the argument is of some other kind
>> of uniform value which are also frequently encountered in practice and
>> could also be copy-propagated, and it won't help avoid the liveness
>> analysis bug [1] in cases where the argument is not an immediate (the
>> 

Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-19 Thread Francisco Jerez
Neil Roberts  writes:

> Just a thought, would it be better to move this check into the
> eliminate_find_live_channel optimisation? That way it could catch
> sources that become immediates through later optimisations. One problem
> with this that I've seen before is that eliminating the
> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
> eliminated because the copy propagation doesn't work.

I believe in this particular case the BROADCAST instruction should
already be eliminated (by opt_algebraic), because its first source is an
immediate, so this optimization would in fact be redundant with
opt_algebraic+constant propagation if it weren't because the latter is
unable to handle scalar copies.  Another possibility that would likely
be more general than this (because it would handle cases in which, as
Neil said, the argument becomes an immediate only after optimization,
and because it would also avoid the issue with liveness analysis
extending the live ranges of scalar values incorrectly [1]) would be to
extend the destination register of the BROADCAST instructions to a
vector [2] and then add a case to the switch statement of
try_constant_propagate() so that the immediate MOV resulting from
opt_algebraic is propagated into surface instructions (see attachment).

> I made this patch a while ago but I never posted it anywhere because
> it's a of a kludge and it would probably be better to fix the copy
> propagation:
>
> https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa
>
Heh, yeah, I'd rather fix copy propagation instead, which I believe will
become much easier with the use-def-chain analysis pass I'm working on.

> Either way though I don't think it would do any harm to have Kristian's
> patch as well even if we did improve elimintate_find_live_channel so it
> is:
>
> Reviewed-by: Neil Roberts 
>
> - Neil
>
> Kristian Høgsberg Kristensen  writes:
>
>> An immdiate is already uniform so just return it up front. Without this,
>> brw_fs_surface_builder ends up passing immediate surface indices through
>> SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't
>> constant propagate out of, and further, we don't constant propagate into
>> the typed/untype read/write opcodes at all.  The end result is that all
>> typed/untyped read/write/atomics end up as indirect sends.
>>
>> Code generation should always produce either an immediate or an actual
>> indirect surface index, so we can fix this by just special casing
>> immediates in emit_uniformize.
>>
>> Signed-off-by: Kristian Høgsberg Kristensen 
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
>> b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>> index df10a9d..98ce71e 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>> @@ -394,6 +394,9 @@ namespace brw {
>>   const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 
>> 0);
>>   const dst_reg dst = component(vgrf(src.type), 0);
>>  
>> + if (src.file == IMM)
>> +return src;
>> +
>>   ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
>>   ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index);
>>  
>> -- 
>> 2.6.2
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097085.html
[2] 
http://lists.freedesktop.org/archives/mesa-dev/attachments/20151014/25dd38dc/attachment.patch

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index 230b0ca..3970bd8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -635,6 +635,18 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
  progress = true;
  break;
 
+  case SHADER_OPCODE_UNTYPED_SURFACE_READ:
+  case SHADER_OPCODE_UNTYPED_SURFACE_WRITE:
+  case SHADER_OPCODE_UNTYPED_ATOMIC:
+  case SHADER_OPCODE_TYPED_SURFACE_READ:
+  case SHADER_OPCODE_TYPED_SURFACE_WRITE:
+  case SHADER_OPCODE_TYPED_ATOMIC:
+ if (i > 0) {
+inst->src[i] = val;
+progress = true;
+ }
+ break;
+
   default:
  break;
   }
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index 610caef..e4f5ad2 100644
--- 

Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-19 Thread Iago Toral
On Sun, 2015-10-18 at 21:31 -0700, Kristian Høgsberg Kristensen wrote:
> An immdiate is already uniform so just return it up front. Without this,
> brw_fs_surface_builder ends up passing immediate surface indices through
> SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't
> constant propagate out of, and further, we don't constant propagate into
> the typed/untype read/write opcodes at all.  The end result is that all
> typed/untyped read/write/atomics end up as indirect sends.
> 
> Code generation should always produce either an immediate or an actual
> indirect surface index, so we can fix this by just special casing
> immediates in emit_uniformize.
> 
> Signed-off-by: Kristian Høgsberg Kristensen 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
> b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> index df10a9d..98ce71e 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> @@ -394,6 +394,9 @@ namespace brw {
>   const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 0);
>   const dst_reg dst = component(vgrf(src.type), 0);
>  
> + if (src.file == IMM)
> +return src;
> +

We can move this check to the top of emit_uniformize(), right?

>   ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
>   ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index);
>  

Iago

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-19 Thread Neil Roberts
Just a thought, would it be better to move this check into the
eliminate_find_live_channel optimisation? That way it could catch
sources that become immediates through later optimisations. One problem
with this that I've seen before is that eliminating the
FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
eliminated because the copy propagation doesn't work. I made this patch
a while ago but I never posted it anywhere because it's a of a kludge
and it would probably be better to fix the copy propagation:

https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa

Either way though I don't think it would do any harm to have Kristian's
patch as well even if we did improve elimintate_find_live_channel so it
is:

Reviewed-by: Neil Roberts 

- Neil

Kristian Høgsberg Kristensen  writes:

> An immdiate is already uniform so just return it up front. Without this,
> brw_fs_surface_builder ends up passing immediate surface indices through
> SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't
> constant propagate out of, and further, we don't constant propagate into
> the typed/untype read/write opcodes at all.  The end result is that all
> typed/untyped read/write/atomics end up as indirect sends.
>
> Code generation should always produce either an immediate or an actual
> indirect surface index, so we can fix this by just special casing
> immediates in emit_uniformize.
>
> Signed-off-by: Kristian Høgsberg Kristensen 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
> b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> index df10a9d..98ce71e 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> @@ -394,6 +394,9 @@ namespace brw {
>   const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 0);
>   const dst_reg dst = component(vgrf(src.type), 0);
>  
> + if (src.file == IMM)
> +return src;
> +
>   ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
>   ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index);
>  
> -- 
> 2.6.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-19 Thread Kristian Høgsberg
On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez  wrote:
> Neil Roberts  writes:
>
>> Just a thought, would it be better to move this check into the
>> eliminate_find_live_channel optimisation? That way it could catch
>> sources that become immediates through later optimisations. One problem
>> with this that I've seen before is that eliminating the
>> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
>> eliminated because the copy propagation doesn't work.
>
> I believe in this particular case the BROADCAST instruction should
> already be eliminated (by opt_algebraic), because its first source is an
> immediate, so this optimization would in fact be redundant with
> opt_algebraic+constant propagation if it weren't because the latter is
> unable to handle scalar copies.

Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets
eliminated because it's outside control flow
(fs_visitor::eliminate_find_live_channel). Then the broadcast gets
reduced to a MOV in opt_algebraic because src0 is uniform (immediate
constant). The problem then is the dst of the MOV has stride 0 which
can_propagate_from() then bails on.

> Another possibility that would likely
> be more general than this (because it would handle cases in which, as
> Neil said, the argument becomes an immediate only after optimization,
> and because it would also avoid the issue with liveness analysis
> extending the live ranges of scalar values incorrectly [1]) would be to
> extend the destination register of the BROADCAST instructions to a
> vector [2] and then add a case to the switch statement of
> try_constant_propagate() so that the immediate MOV resulting from
> opt_algebraic is propagated into surface instructions (see attachment).

I wrote exactly this code to make constant propagate work, plus adding
the extra opcodes to the switch statement. It works and I could
certainly send that out after this, but

1) This doesn't mean we shouldn't do the if (src.file == IMM)
shortcut. It saves the compiler a bit of work in the very common case
of
non-indirect buffer access.

2) I'm not even sure it makes sense to extend copy-propagation to do
this (which is why I went back to just the IMM test). Anything that
would be an immediate at this point should be an immediate, if not
we're missing something in nir.

Kristian

>> I made this patch a while ago but I never posted it anywhere because
>> it's a of a kludge and it would probably be better to fix the copy
>> propagation:
>>
>> https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa
>>
> Heh, yeah, I'd rather fix copy propagation instead, which I believe will
> become much easier with the use-def-chain analysis pass I'm working on.
>
>> Either way though I don't think it would do any harm to have Kristian's
>> patch as well even if we did improve elimintate_find_live_channel so it
>> is:
>>
>> Reviewed-by: Neil Roberts 
>>
>> - Neil
>>
>> Kristian Høgsberg Kristensen  writes:
>>
>>> An immdiate is already uniform so just return it up front. Without this,
>>> brw_fs_surface_builder ends up passing immediate surface indices through
>>> SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't
>>> constant propagate out of, and further, we don't constant propagate into
>>> the typed/untype read/write opcodes at all.  The end result is that all
>>> typed/untyped read/write/atomics end up as indirect sends.
>>>
>>> Code generation should always produce either an immediate or an actual
>>> indirect surface index, so we can fix this by just special casing
>>> immediates in emit_uniformize.
>>>
>>> Signed-off-by: Kristian Høgsberg Kristensen 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
>>> b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>>> index df10a9d..98ce71e 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
>>> @@ -394,6 +394,9 @@ namespace brw {
>>>   const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 
>>> 0);
>>>   const dst_reg dst = component(vgrf(src.type), 0);
>>>
>>> + if (src.file == IMM)
>>> +return src;
>>> +
>>>   ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
>>>   ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index);
>>>
>>> --
>>> 2.6.2
>>>
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097085.html
> [2] 
>