Re: [Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-10 Thread Jason Ekstrand
On Tue, Oct 10, 2017 at 1:10 PM, Jason Ekstrand 
wrote:

> On Tue, Oct 10, 2017 at 9:16 AM, Connor Abbott 
> wrote:
>
>> I'm a little nervous about this, because really, the only solution to
>> this problem is to ignore all non-WE_all definitions of all variables
>> in liveness analysis. For example, in something like:
>>
>> vec4 color2 = ...
>> if (...) {
>>color2 = texture();
>> }
>>
>> texture() can also overwrite inactive channels of color2. We happen to
>> get this right because we turn live ranges into live intervals without
>> holes, but I can't come up with a good reason why that would save us
>> in all cases except the one in this patch -- which makes me worry that
>> we'll find yet another case where there's a similar problem. I think
>> it would be clearer if we what I said above, i.e. ignore all
>> non-WE_all definitions, which will make things much worse, but then
>> apply Curro's patch which will return things to pretty much how they
>> were before, except this case will be fixed and maybe some other cases
>> we haven't thought of.
>>
>
> What you're suggesting may actually be less code and is arguably better in
> terms of being more straightforward.  However, I think intervals plus this
> patch is equivalent.  Curro's patch + always-partial will cause us to start
> the live range at the IP where it first *may* be defined and we keep the
> behavior of ending the live range at the last IP where some reachable
> instruction may use it.  With my patch + Curro's, we start the live range
> at the IP where it is first defined which will always all places it *may*
> be defined unless there is a back edge.  If there is a back-edge, I pull
> the live range up across the back edge.
>
> That said, I think I agree with you that my solution treats it as a
> special case and not a general problem.  However, I'm relucant to just
> change liveness analysis to assume partial writes because we use it for
> more than computing variable interference.  In particular, we use it for
> dead code elimination and the concept of partial/complete writes is crucial
> there.  I've got another patch cooking which I'll send out soon which
> should make you happier with it.
>

Forget the other patch I have cooking.  It came out burned. :-(  At the
moment, I have no better plan than the one in this patch.  Every "more
general" thing I've tried today has ended up extending the live ranges far
further than needed and making hash of shader-db.  The more I think about
it, the more I only see two solutions:

 1) Take advantage of structure like this patch does.  I really don't think
there's anything wrong, given that we have a structured IR, in thinking of
it in a structured way.

 2) Go fully for the dual-liveness model. Have what amounts to a virtual
"physical" CFG which is just the logical CFG with some edges added and have
"physical" livein/out sets which model the conflicting version.  The tricky
part would be to get the physical CFG right.  You would need at least:

a) Edges from each break to the block after the break like the first
attempt

b) Same thing for the continues

c) Edges from the then block into the else block (this shouldn't
actually matter thanks to intervals but we may as well)

d) An edge from the top of every loop to the bottom to simulate going
through the entire loop as a disabled channel.  You can think of this as
putting the entire contents of the loop in a virtual "if" statement.

I believe what was missing from the patch I sent out first was actually
edge d.  It's all well and good to get to the point where the def is
available at the top of the loop (which a) does) but it's another thing
entirely to say that there's a path from the top to the bottom where
nothing in the middle has a chance to kill it.  The thing I don't like
about option 2 is that it's still taking advantage of structure, it's just
adds enough compiler theory language on top to confuse things to the point
where you almost can't see it.  We may be able to come up with a model
that's truly unstructured but I'd need to think long and hard about how
unstructured SIMD control flow works.

Thoughts?  Curro, I'm happy for you to chime in too now that you're back.

--Jason


> --Jason
>
>
>>
>> On Thu, Oct 5, 2017 at 2:52 PM, Jason Ekstrand 
>> wrote:
>> > No Shader-db changes.
>> >
>> > Cc: mesa-sta...@lists.freedesktop.org
>> > ---
>> >  src/intel/compiler/brw_fs_live_variables.cpp | 55
>> 
>> >  1 file changed, 55 insertions(+)
>> >
>> > diff --git a/src/intel/compiler/brw_fs_live_variables.cpp
>> b/src/intel/compiler/brw_fs_live_variables.cpp
>> > index c449672..380060d 100644
>> > --- a/src/intel/compiler/brw_fs_live_variables.cpp
>> > +++ b/src/intel/compiler/brw_fs_live_variables.cpp
>> > @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>> >   }
>> >}
>> > }
>> > +
>> > +   /* Due to the 

Re: [Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-10 Thread Jason Ekstrand
On Tue, Oct 10, 2017 at 9:16 AM, Connor Abbott  wrote:

> I'm a little nervous about this, because really, the only solution to
> this problem is to ignore all non-WE_all definitions of all variables
> in liveness analysis. For example, in something like:
>
> vec4 color2 = ...
> if (...) {
>color2 = texture();
> }
>
> texture() can also overwrite inactive channels of color2. We happen to
> get this right because we turn live ranges into live intervals without
> holes, but I can't come up with a good reason why that would save us
> in all cases except the one in this patch -- which makes me worry that
> we'll find yet another case where there's a similar problem. I think
> it would be clearer if we what I said above, i.e. ignore all
> non-WE_all definitions, which will make things much worse, but then
> apply Curro's patch which will return things to pretty much how they
> were before, except this case will be fixed and maybe some other cases
> we haven't thought of.
>

What you're suggesting may actually be less code and is arguably better in
terms of being more straightforward.  However, I think intervals plus this
patch is equivalent.  Curro's patch + always-partial will cause us to start
the live range at the IP where it first *may* be defined and we keep the
behavior of ending the live range at the last IP where some reachable
instruction may use it.  With my patch + Curro's, we start the live range
at the IP where it is first defined which will always all places it *may*
be defined unless there is a back edge.  If there is a back-edge, I pull
the live range up across the back edge.

That said, I think I agree with you that my solution treats it as a special
case and not a general problem.  However, I'm relucant to just change
liveness analysis to assume partial writes because we use it for more than
computing variable interference.  In particular, we use it for dead code
elimination and the concept of partial/complete writes is crucial there.
I've got another patch cooking which I'll send out soon which should make
you happier with it.

--Jason


>
> On Thu, Oct 5, 2017 at 2:52 PM, Jason Ekstrand 
> wrote:
> > No Shader-db changes.
> >
> > Cc: mesa-sta...@lists.freedesktop.org
> > ---
> >  src/intel/compiler/brw_fs_live_variables.cpp | 55
> 
> >  1 file changed, 55 insertions(+)
> >
> > diff --git a/src/intel/compiler/brw_fs_live_variables.cpp
> b/src/intel/compiler/brw_fs_live_variables.cpp
> > index c449672..380060d 100644
> > --- a/src/intel/compiler/brw_fs_live_variables.cpp
> > +++ b/src/intel/compiler/brw_fs_live_variables.cpp
> > @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
> >   }
> >}
> > }
> > +
> > +   /* Due to the explicit way the SIMD data is handled on GEN, we need
> to be a
> > +* bit more careful with live ranges and loops.  Consider the
> following
> > +* example:
> > +*
> > +*vec4 color2;
> > +*while (1) {
> > +*   vec4 color = texture();
> > +*   if (...) {
> > +*  color2 = color * 2;
> > +*  break;
> > +*   }
> > +*}
> > +*gl_FragColor = color2;
> > +*
> > +* In this case, the definition of color2 dominates the use because
> the
> > +* loop only has the one exit.  This means that the live range
> interval for
> > +* color2 goes from the statement in the if to it's use below the
> loop.
> > +* Now suppose that the texture operation has a header register that
> gets
> > +* assigned one of the registers used for color2.  If the loop
> condition is
> > +* non-uniform and some of the threads will take the and others will
> > +* continue.  In this case, the next pass through the loop, the
> WE_all
> > +* setup of the header register will stomp the disabled channels of
> color2
> > +* and corrupt the value.
> > +*
> > +* This same problem can occur if you have a mix of 64, 32, and
> 16-bit
> > +* registers because the channels do not line up or if you have a
> SIMD16
> > +* program and the first half of one value overlaps the second half
> of the
> > +* other.
> > +*
> > +* To solve this problem, we take any VGRFs whose live ranges cross
> the
> > +* while instruction of a loop and extend their live ranges to the
> top of
> > +* the loop.  This more accurately models the hardware because the
> value in
> > +* the VGRF needs to be carried through subsequent loop iterations
> in order
> > +* to remain valid when we finally do break.
> > +*/
> > +   foreach_block (block, cfg) {
> > +  if (block->end()->opcode != BRW_OPCODE_WHILE)
> > + continue;
> > +
> > +  /* This is a WHILE instrution. Find the DO block. */
> > +  bblock_t *do_block = NULL;
> > +  foreach_list_typed(bblock_link, child_link, link,
> >children) {
> > + if (child_link->block->start_ip < block->end_ip) {
> > 

Re: [Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-10 Thread Connor Abbott
No, this is a different situation. On i965, the hardware keeps track
of that stuff automatically. The problem is that the header, which is
shared across all the threads in a wavefront and specifies stuff like
LOD, LOD bias, texture array offset, etc., uses the same register
space as normal, vector registers. So, to set it up before we do the
texture, we have to disable the usual exec masking stuff and just
write to certain channels of some random register indiscriminately,
regardless of whether the execution mask is enabled for them. This
then leads to the problems described.

I'm not sure how nvidia handles it, but they probably do something
much more sane like AMD. Although it turns out that you probably have
to deal with this problem anyways when implementing the new subgroup
reduction stuff, since the implementation usually involves some kind
of exec-mask-ignoring write.


On Tue, Oct 10, 2017 at 12:29 PM, Ilia Mirkin  wrote:
> I hope I'm not butting in too much with irrelevant info, but I think
> we had a similar issue in nouveau. On Kepler, texture instructions
> take an arbitrary amount of time to complete, and only write into
> destination registers on completion, while other instructions are
> executing after that tex dispatch. [You have to insert a barrier to
> force a wait for the tex to complete.]
>
> We had a clever thing that figured things out based on texture result
> uses and worked for reasonable cases. However the tricky case to
> handle turned out to be
>
> color = texture();
> if (a) {
>   use color
> } else {
>   dont use color
> }
>
> In that situation, the texture call would randomly overwrite registers
> when we went into the else case, since nothing used the texture
> results and that wasn't properly tracked.
>
> I know what you're going to say - just code-motion the texture into
> the if. But that's not always possible -- the actual original
> situation also added a loops with conditional texture calls to
> complicate matters.
>
> Not sure if this is exactly your situation, but thought I'd point it
> out as it may be relevant.
>
> Cheers,
>
>   -ilia
>
> On Tue, Oct 10, 2017 at 12:16 PM, Connor Abbott  wrote:
>> I'm a little nervous about this, because really, the only solution to
>> this problem is to ignore all non-WE_all definitions of all variables
>> in liveness analysis. For example, in something like:
>>
>> vec4 color2 = ...
>> if (...) {
>>color2 = texture();
>> }
>>
>> texture() can also overwrite inactive channels of color2. We happen to
>> get this right because we turn live ranges into live intervals without
>> holes, but I can't come up with a good reason why that would save us
>> in all cases except the one in this patch -- which makes me worry that
>> we'll find yet another case where there's a similar problem. I think
>> it would be clearer if we what I said above, i.e. ignore all
>> non-WE_all definitions, which will make things much worse, but then
>> apply Curro's patch which will return things to pretty much how they
>> were before, except this case will be fixed and maybe some other cases
>> we haven't thought of.
>>
>>
>>
>> On Thu, Oct 5, 2017 at 2:52 PM, Jason Ekstrand  wrote:
>>> No Shader-db changes.
>>>
>>> Cc: mesa-sta...@lists.freedesktop.org
>>> ---
>>>  src/intel/compiler/brw_fs_live_variables.cpp | 55 
>>> 
>>>  1 file changed, 55 insertions(+)
>>>
>>> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
>>> b/src/intel/compiler/brw_fs_live_variables.cpp
>>> index c449672..380060d 100644
>>> --- a/src/intel/compiler/brw_fs_live_variables.cpp
>>> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
>>> @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>>>   }
>>>}
>>> }
>>> +
>>> +   /* Due to the explicit way the SIMD data is handled on GEN, we need to 
>>> be a
>>> +* bit more careful with live ranges and loops.  Consider the following
>>> +* example:
>>> +*
>>> +*vec4 color2;
>>> +*while (1) {
>>> +*   vec4 color = texture();
>>> +*   if (...) {
>>> +*  color2 = color * 2;
>>> +*  break;
>>> +*   }
>>> +*}
>>> +*gl_FragColor = color2;
>>> +*
>>> +* In this case, the definition of color2 dominates the use because the
>>> +* loop only has the one exit.  This means that the live range interval 
>>> for
>>> +* color2 goes from the statement in the if to it's use below the loop.
>>> +* Now suppose that the texture operation has a header register that 
>>> gets
>>> +* assigned one of the registers used for color2.  If the loop 
>>> condition is
>>> +* non-uniform and some of the threads will take the and others will
>>> +* continue.  In this case, the next pass through the loop, the WE_all
>>> +* setup of the header register will stomp the disabled channels of 
>>> color2
>>> +* and corrupt 

Re: [Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-10 Thread Ilia Mirkin
I hope I'm not butting in too much with irrelevant info, but I think
we had a similar issue in nouveau. On Kepler, texture instructions
take an arbitrary amount of time to complete, and only write into
destination registers on completion, while other instructions are
executing after that tex dispatch. [You have to insert a barrier to
force a wait for the tex to complete.]

We had a clever thing that figured things out based on texture result
uses and worked for reasonable cases. However the tricky case to
handle turned out to be

color = texture();
if (a) {
  use color
} else {
  dont use color
}

In that situation, the texture call would randomly overwrite registers
when we went into the else case, since nothing used the texture
results and that wasn't properly tracked.

I know what you're going to say - just code-motion the texture into
the if. But that's not always possible -- the actual original
situation also added a loops with conditional texture calls to
complicate matters.

Not sure if this is exactly your situation, but thought I'd point it
out as it may be relevant.

Cheers,

  -ilia

On Tue, Oct 10, 2017 at 12:16 PM, Connor Abbott  wrote:
> I'm a little nervous about this, because really, the only solution to
> this problem is to ignore all non-WE_all definitions of all variables
> in liveness analysis. For example, in something like:
>
> vec4 color2 = ...
> if (...) {
>color2 = texture();
> }
>
> texture() can also overwrite inactive channels of color2. We happen to
> get this right because we turn live ranges into live intervals without
> holes, but I can't come up with a good reason why that would save us
> in all cases except the one in this patch -- which makes me worry that
> we'll find yet another case where there's a similar problem. I think
> it would be clearer if we what I said above, i.e. ignore all
> non-WE_all definitions, which will make things much worse, but then
> apply Curro's patch which will return things to pretty much how they
> were before, except this case will be fixed and maybe some other cases
> we haven't thought of.
>
>
>
> On Thu, Oct 5, 2017 at 2:52 PM, Jason Ekstrand  wrote:
>> No Shader-db changes.
>>
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>  src/intel/compiler/brw_fs_live_variables.cpp | 55 
>> 
>>  1 file changed, 55 insertions(+)
>>
>> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
>> b/src/intel/compiler/brw_fs_live_variables.cpp
>> index c449672..380060d 100644
>> --- a/src/intel/compiler/brw_fs_live_variables.cpp
>> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
>> @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>>   }
>>}
>> }
>> +
>> +   /* Due to the explicit way the SIMD data is handled on GEN, we need to 
>> be a
>> +* bit more careful with live ranges and loops.  Consider the following
>> +* example:
>> +*
>> +*vec4 color2;
>> +*while (1) {
>> +*   vec4 color = texture();
>> +*   if (...) {
>> +*  color2 = color * 2;
>> +*  break;
>> +*   }
>> +*}
>> +*gl_FragColor = color2;
>> +*
>> +* In this case, the definition of color2 dominates the use because the
>> +* loop only has the one exit.  This means that the live range interval 
>> for
>> +* color2 goes from the statement in the if to it's use below the loop.
>> +* Now suppose that the texture operation has a header register that gets
>> +* assigned one of the registers used for color2.  If the loop condition 
>> is
>> +* non-uniform and some of the threads will take the and others will
>> +* continue.  In this case, the next pass through the loop, the WE_all
>> +* setup of the header register will stomp the disabled channels of 
>> color2
>> +* and corrupt the value.
>> +*
>> +* This same problem can occur if you have a mix of 64, 32, and 16-bit
>> +* registers because the channels do not line up or if you have a SIMD16
>> +* program and the first half of one value overlaps the second half of 
>> the
>> +* other.
>> +*
>> +* To solve this problem, we take any VGRFs whose live ranges cross the
>> +* while instruction of a loop and extend their live ranges to the top of
>> +* the loop.  This more accurately models the hardware because the value 
>> in
>> +* the VGRF needs to be carried through subsequent loop iterations in 
>> order
>> +* to remain valid when we finally do break.
>> +*/
>> +   foreach_block (block, cfg) {
>> +  if (block->end()->opcode != BRW_OPCODE_WHILE)
>> + continue;
>> +
>> +  /* This is a WHILE instrution. Find the DO block. */
>> +  bblock_t *do_block = NULL;
>> +  foreach_list_typed(bblock_link, child_link, link, >children) {
>> + if (child_link->block->start_ip < block->end_ip) {
>> +assert(do_block == NULL);
>> +

Re: [Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-10 Thread Connor Abbott
I'm a little nervous about this, because really, the only solution to
this problem is to ignore all non-WE_all definitions of all variables
in liveness analysis. For example, in something like:

vec4 color2 = ...
if (...) {
   color2 = texture();
}

texture() can also overwrite inactive channels of color2. We happen to
get this right because we turn live ranges into live intervals without
holes, but I can't come up with a good reason why that would save us
in all cases except the one in this patch -- which makes me worry that
we'll find yet another case where there's a similar problem. I think
it would be clearer if we what I said above, i.e. ignore all
non-WE_all definitions, which will make things much worse, but then
apply Curro's patch which will return things to pretty much how they
were before, except this case will be fixed and maybe some other cases
we haven't thought of.



On Thu, Oct 5, 2017 at 2:52 PM, Jason Ekstrand  wrote:
> No Shader-db changes.
>
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/intel/compiler/brw_fs_live_variables.cpp | 55 
> 
>  1 file changed, 55 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
> b/src/intel/compiler/brw_fs_live_variables.cpp
> index c449672..380060d 100644
> --- a/src/intel/compiler/brw_fs_live_variables.cpp
> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
> @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>   }
>}
> }
> +
> +   /* Due to the explicit way the SIMD data is handled on GEN, we need to be 
> a
> +* bit more careful with live ranges and loops.  Consider the following
> +* example:
> +*
> +*vec4 color2;
> +*while (1) {
> +*   vec4 color = texture();
> +*   if (...) {
> +*  color2 = color * 2;
> +*  break;
> +*   }
> +*}
> +*gl_FragColor = color2;
> +*
> +* In this case, the definition of color2 dominates the use because the
> +* loop only has the one exit.  This means that the live range interval 
> for
> +* color2 goes from the statement in the if to it's use below the loop.
> +* Now suppose that the texture operation has a header register that gets
> +* assigned one of the registers used for color2.  If the loop condition 
> is
> +* non-uniform and some of the threads will take the and others will
> +* continue.  In this case, the next pass through the loop, the WE_all
> +* setup of the header register will stomp the disabled channels of color2
> +* and corrupt the value.
> +*
> +* This same problem can occur if you have a mix of 64, 32, and 16-bit
> +* registers because the channels do not line up or if you have a SIMD16
> +* program and the first half of one value overlaps the second half of the
> +* other.
> +*
> +* To solve this problem, we take any VGRFs whose live ranges cross the
> +* while instruction of a loop and extend their live ranges to the top of
> +* the loop.  This more accurately models the hardware because the value 
> in
> +* the VGRF needs to be carried through subsequent loop iterations in 
> order
> +* to remain valid when we finally do break.
> +*/
> +   foreach_block (block, cfg) {
> +  if (block->end()->opcode != BRW_OPCODE_WHILE)
> + continue;
> +
> +  /* This is a WHILE instrution. Find the DO block. */
> +  bblock_t *do_block = NULL;
> +  foreach_list_typed(bblock_link, child_link, link, >children) {
> + if (child_link->block->start_ip < block->end_ip) {
> +assert(do_block == NULL);
> +do_block = child_link->block;
> + }
> +  }
> +  assert(do_block);
> +
> +  for (int i = 0; i < num_vars; i++) {
> + if (start[i] < block->end_ip && end[i] > block->end_ip)
> +start[i] = MIN2(start[i], do_block->start_ip);
> +  }
> +   }
>  }
>
>  fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg)
> --
> 2.5.0.400.gff86faf
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-10 Thread Chema Casanova
With this patch applied I can not reproduce anymore the regression
related to cross-channel variable interference in non-uniformly executed
loops exposed at
dEQP-VK.glsl.return.return_in_dynamic_loop_dynamic_vertex when applying
Curro's liveness patch

Tested-by: Jose Maria Casanova Crespo 

On 05/10/17 20:52, Jason Ekstrand wrote:
> No Shader-db changes.
> 
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/intel/compiler/brw_fs_live_variables.cpp | 55 
> 
>  1 file changed, 55 insertions(+)
> 
> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
> b/src/intel/compiler/brw_fs_live_variables.cpp
> index c449672..380060d 100644
> --- a/src/intel/compiler/brw_fs_live_variables.cpp
> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
> @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>   }
>}
> }
> +
> +   /* Due to the explicit way the SIMD data is handled on GEN, we need to be 
> a
> +* bit more careful with live ranges and loops.  Consider the following
> +* example:
> +*
> +*vec4 color2;
> +*while (1) {
> +*   vec4 color = texture();
> +*   if (...) {
> +*  color2 = color * 2;
> +*  break;
> +*   }
> +*}
> +*gl_FragColor = color2;
> +*
> +* In this case, the definition of color2 dominates the use because the
> +* loop only has the one exit.  This means that the live range interval 
> for
> +* color2 goes from the statement in the if to it's use below the loop.
> +* Now suppose that the texture operation has a header register that gets
> +* assigned one of the registers used for color2.  If the loop condition 
> is
> +* non-uniform and some of the threads will take the and others will
> +* continue.  In this case, the next pass through the loop, the WE_all
> +* setup of the header register will stomp the disabled channels of color2
> +* and corrupt the value.
> +*
> +* This same problem can occur if you have a mix of 64, 32, and 16-bit
> +* registers because the channels do not line up or if you have a SIMD16
> +* program and the first half of one value overlaps the second half of the
> +* other.
> +*
> +* To solve this problem, we take any VGRFs whose live ranges cross the
> +* while instruction of a loop and extend their live ranges to the top of
> +* the loop.  This more accurately models the hardware because the value 
> in
> +* the VGRF needs to be carried through subsequent loop iterations in 
> order
> +* to remain valid when we finally do break.
> +*/
> +   foreach_block (block, cfg) {
> +  if (block->end()->opcode != BRW_OPCODE_WHILE)
> + continue;
> +
> +  /* This is a WHILE instrution. Find the DO block. */
> +  bblock_t *do_block = NULL;
> +  foreach_list_typed(bblock_link, child_link, link, >children) {
> + if (child_link->block->start_ip < block->end_ip) {
> +assert(do_block == NULL);
> +do_block = child_link->block;
> + }
> +  }
> +  assert(do_block);
> +
> +  for (int i = 0; i < num_vars; i++) {
> + if (start[i] < block->end_ip && end[i] > block->end_ip)
> +start[i] = MIN2(start[i], do_block->start_ip);
> +  }
> +   }
>  }
>  
>  fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg)
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-05 Thread Jason Ekstrand
No Shader-db changes.

Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/compiler/brw_fs_live_variables.cpp | 55 
 1 file changed, 55 insertions(+)

diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
b/src/intel/compiler/brw_fs_live_variables.cpp
index c449672..380060d 100644
--- a/src/intel/compiler/brw_fs_live_variables.cpp
+++ b/src/intel/compiler/brw_fs_live_variables.cpp
@@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
  }
   }
}
+
+   /* Due to the explicit way the SIMD data is handled on GEN, we need to be a
+* bit more careful with live ranges and loops.  Consider the following
+* example:
+*
+*vec4 color2;
+*while (1) {
+*   vec4 color = texture();
+*   if (...) {
+*  color2 = color * 2;
+*  break;
+*   }
+*}
+*gl_FragColor = color2;
+*
+* In this case, the definition of color2 dominates the use because the
+* loop only has the one exit.  This means that the live range interval for
+* color2 goes from the statement in the if to it's use below the loop.
+* Now suppose that the texture operation has a header register that gets
+* assigned one of the registers used for color2.  If the loop condition is
+* non-uniform and some of the threads will take the and others will
+* continue.  In this case, the next pass through the loop, the WE_all
+* setup of the header register will stomp the disabled channels of color2
+* and corrupt the value.
+*
+* This same problem can occur if you have a mix of 64, 32, and 16-bit
+* registers because the channels do not line up or if you have a SIMD16
+* program and the first half of one value overlaps the second half of the
+* other.
+*
+* To solve this problem, we take any VGRFs whose live ranges cross the
+* while instruction of a loop and extend their live ranges to the top of
+* the loop.  This more accurately models the hardware because the value in
+* the VGRF needs to be carried through subsequent loop iterations in order
+* to remain valid when we finally do break.
+*/
+   foreach_block (block, cfg) {
+  if (block->end()->opcode != BRW_OPCODE_WHILE)
+ continue;
+
+  /* This is a WHILE instrution. Find the DO block. */
+  bblock_t *do_block = NULL;
+  foreach_list_typed(bblock_link, child_link, link, >children) {
+ if (child_link->block->start_ip < block->end_ip) {
+assert(do_block == NULL);
+do_block = child_link->block;
+ }
+  }
+  assert(do_block);
+
+  for (int i = 0; i < num_vars; i++) {
+ if (start[i] < block->end_ip && end[i] > block->end_ip)
+start[i] = MIN2(start[i], do_block->start_ip);
+  }
+   }
 }
 
 fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg)
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-04 Thread Jason Ekstrand
Bah!  This one's bogus too.  I think it messes up register coalesce but I'm
not 100% sure...

On Wed, Oct 4, 2017 at 8:22 PM, Jason Ekstrand  wrote:

> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/intel/compiler/brw_fs_live_variables.cpp | 55
> 
>  1 file changed, 55 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp
> b/src/intel/compiler/brw_fs_live_variables.cpp
> index c449672..23ec280 100644
> --- a/src/intel/compiler/brw_fs_live_variables.cpp
> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
> @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>   }
>}
> }
> +
> +   /* Due to the explicit way the SIMD data is handled on GEN, we need to
> be a
> +* bit more careful with live ranges and loops.  Consider the following
> +* example:
> +*
> +*vec4 color2;
> +*while (1) {
> +*   vec4 color = texture();
> +*   if (...) {
> +*  color2 = color * 2;
> +*  break;
> +*   }
> +*}
> +*gl_FragColor = color2;
> +*
> +* In this case, the definition of color2 dominates the use because the
> +* loop only has the one exit.  This means that the live range
> interval for
> +* color2 goes from the statement in the if to it's use below the loop.
> +* Now suppose that the texture operation has a header register that
> gets
> +* assigned one of the registers used for color2.  If the loop
> condition is
> +* non-uniform and some of the threads will take the and others will
> +* continue.  In this case, the next pass through the loop, the WE_all
> +* setup of the header register will stomp the disabled channels of
> color2
> +* and corrupt the value.
> +*
> +* This same problem can occur if you have a mix of 64, 32, and 16-bit
> +* registers because the channels do not line up or if you have a
> SIMD16
> +* program and the first half of one value overlaps the second half of
> the
> +* other.
> +*
> +* To solve this problem, we take any VGRFs whose live ranges cross the
> +* while instruction of a loop and extend their live ranges to the top
> of
> +* the loop.  This more accurately models the hardware because the
> value in
> +* the VGRF needs to be carried through subsequent loop iterations in
> order
> +* to remain valid when we finally do break.
> +*/
> +   foreach_block (block, cfg) {
> +  if (block->end()->opcode != BRW_OPCODE_WHILE)
> + continue;
> +
> +  /* This is a WHILE instrution. Find the DO block. */
> +  bblock_t *do_block = NULL;
> +  foreach_list_typed(bblock_link, child_link, link,
> >children) {
> + if (child_link->block->start_ip < block->end_ip) {
> +assert(do_block == NULL);
> +do_block = child_link->block;
> + }
> +  }
> +  assert(do_block);
> +
> +  for (int i = 0; i < num_vars; i++) {
> + if (start[i] < block->end_ip && end[i] > block->end_ip)
> +start[i] = do_block->start_ip;
> +  }
> +   }
>  }
>
>  fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg)
> --
> 2.5.0.400.gff86faf
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-04 Thread Jason Ekstrand
Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/compiler/brw_fs_live_variables.cpp | 55 
 1 file changed, 55 insertions(+)

diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
b/src/intel/compiler/brw_fs_live_variables.cpp
index c449672..23ec280 100644
--- a/src/intel/compiler/brw_fs_live_variables.cpp
+++ b/src/intel/compiler/brw_fs_live_variables.cpp
@@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
  }
   }
}
+
+   /* Due to the explicit way the SIMD data is handled on GEN, we need to be a
+* bit more careful with live ranges and loops.  Consider the following
+* example:
+*
+*vec4 color2;
+*while (1) {
+*   vec4 color = texture();
+*   if (...) {
+*  color2 = color * 2;
+*  break;
+*   }
+*}
+*gl_FragColor = color2;
+*
+* In this case, the definition of color2 dominates the use because the
+* loop only has the one exit.  This means that the live range interval for
+* color2 goes from the statement in the if to it's use below the loop.
+* Now suppose that the texture operation has a header register that gets
+* assigned one of the registers used for color2.  If the loop condition is
+* non-uniform and some of the threads will take the and others will
+* continue.  In this case, the next pass through the loop, the WE_all
+* setup of the header register will stomp the disabled channels of color2
+* and corrupt the value.
+*
+* This same problem can occur if you have a mix of 64, 32, and 16-bit
+* registers because the channels do not line up or if you have a SIMD16
+* program and the first half of one value overlaps the second half of the
+* other.
+*
+* To solve this problem, we take any VGRFs whose live ranges cross the
+* while instruction of a loop and extend their live ranges to the top of
+* the loop.  This more accurately models the hardware because the value in
+* the VGRF needs to be carried through subsequent loop iterations in order
+* to remain valid when we finally do break.
+*/
+   foreach_block (block, cfg) {
+  if (block->end()->opcode != BRW_OPCODE_WHILE)
+ continue;
+
+  /* This is a WHILE instrution. Find the DO block. */
+  bblock_t *do_block = NULL;
+  foreach_list_typed(bblock_link, child_link, link, >children) {
+ if (child_link->block->start_ip < block->end_ip) {
+assert(do_block == NULL);
+do_block = child_link->block;
+ }
+  }
+  assert(do_block);
+
+  for (int i = 0; i < num_vars; i++) {
+ if (start[i] < block->end_ip && end[i] > block->end_ip)
+start[i] = do_block->start_ip;
+  }
+   }
 }
 
 fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg)
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev