Re: [Mesa-dev] [PATCH 3/4] intel/cfg: Always add both successors to a break

2017-10-04 Thread Jason Ekstrand
New patch on the list.

On Wed, Oct 4, 2017 at 7:42 PM, Jason Ekstrand  wrote:

> On Wed, Oct 4, 2017 at 5:35 PM, Jason Ekstrand 
> wrote:
>
>> On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott 
>> wrote:
>>
>>> This won't completely solve the problem. For example, what if you
>>> hoist the assignment to color2 outside the loop?
>>>
>>> vec4 color2;
>>> while (1) {
>>>vec4 color = texture();
>>>color2 = color * 2;
>>>if (...) {
>>>   break;
>>>}
>>> }
>>> gl_FragColor = color2;
>>>
>>>
>>> Now the definition still dominates the use, even with the modified
>>> control-flow graph, and you have the same problem
>>
>>
>> Curro had me convinced that some detail of the liveness analysis pass
>> saved us here but now I can't remember what. :-(
>>
>>
>>> The real problem is
>>> that the assignment to color2 is really a conditional assignment: if
>>> we're going channel-by-channel, it's not, but if you consider the
>>> *whole* register at the same time, it is. To really fix the problem,
>>> you need to model exactly what the machine actually does: you need to
>>> insert "fake" edges like these, that model the jumps that the machine
>>> can take, and you need to make every assignment a conditional
>>> assignment (i.e. it doesn't kill the register). It's probably not as
>>> bad with Curro's patch on top, though. Also, once you do this you can
>>> make register allocation more accurate by generating interferences
>>> from the liveness information directly instead of from the intervals.
>>>
>>> One thing I've thought about is, in addition to maintaining this
>>> "whole-vector" view of things, is to maintain a "per-channel" liveness
>>> that doesn't use the extra edges, partial definitions etc. and then
>>> use the "per-channel view" to calculate interference when the channels
>>> always line up.
>>>
>>
>> Yes, we've considered that and it's a good idea.  However, I'm trying to
>> fix bugs right now, not write the world's best liveness analysis pass. :-)
>>
>
> You're correct, as usual... I've inspected the result of liveness anlaysis
> and we do indeed get it wrong.  I'll come up with something less bogus.
>
>
>> On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand 
>>> wrote:
>>> > Shader-db results on Sky Lake:
>>> >
>>> > total instructions in shared programs: 12955125 -> 12953698
>>> (-0.01%)
>>> > instructions in affected programs: 55956 -> 54529 (-2.55%)
>>> > helped: 6
>>> > HURT: 38
>>> >
>>> > All of the hurt programs were hurt by exactly one instruction because
>>> > this patch affects copy propagation.  Most of the helped instructions
>>> > came from a single orbital explorer shader that was helped by 14.26%
>>> >
>>> > Cc: mesa-sta...@lists.freedesktop.org
>>> > ---
>>> >  src/intel/compiler/brw_cfg.cpp | 37 ++
>>> +--
>>> >  1 file changed, 35 insertions(+), 2 deletions(-)
>>> >
>>> > diff --git a/src/intel/compiler/brw_cfg.cpp
>>> b/src/intel/compiler/brw_cfg.cpp
>>> > index fad12ee..d8bf725 100644
>>> > --- a/src/intel/compiler/brw_cfg.cpp
>>> > +++ b/src/intel/compiler/brw_cfg.cpp
>>> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)
>>> >   assert(cur_while != NULL);
>>> >  cur->add_successor(mem_ctx, cur_while);
>>> >
>>> > + /* We also add the next block as a successor of the break.
>>> If the
>>> > +  * break is predicated, we need to do this because the break
>>> may not
>>> > +  * be taken.  If the break is not predicated, we add it
>>> anyway so
>>> > +  * that our live intervals computations will operate as if
>>> the break
>>> > +  * may or may not be taken.  Consider the following example:
>>> > +  *
>>> > +  *vec4 color2;
>>> > +  *while (1) {
>>> > +  *   vec4 color = texture();
>>> > +  *   if (...) {
>>> > +  *  color2 = color * 2;
>>> > +  *  break;
>>> > +  *   }
>>> > +  *}
>>> > +  *gl_FragColor = color2;
>>> > +  *
>>> > +  * In this case, the definition of color2 dominates the use
>>> because
>>> > +  * the loop only has the one exit.  This means that the live
>>> range
>>> > +  * interval for color2 goes from the statement in the if to
>>> it's use
>>> > +  * below the loop.  Now suppose that the texture operation
>>> has a
>>> > +  * header register that gets assigned one of the registers
>>> used for
>>> > +  * color2.  If the loop condition is non-uniform and some of
>>> the
>>> > +  * threads will take the break and others will continue.  In
>>> this
>>> > +  * case, the next pass through the loop, the WE_all setup of
>>> the
>>> > +  * header register will stomp the disabled channels of
>>> color2 and
>>> > +  * corrupt the value.
>>> > +  *
>>> > +  * This same problem can occur if you have a mix of 64, 32,
>>> and
>>> > +

Re: [Mesa-dev] [PATCH 3/4] intel/cfg: Always add both successors to a break

2017-10-04 Thread Jason Ekstrand
On Wed, Oct 4, 2017 at 5:35 PM, Jason Ekstrand  wrote:

> On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott  wrote:
>
>> This won't completely solve the problem. For example, what if you
>> hoist the assignment to color2 outside the loop?
>>
>> vec4 color2;
>> while (1) {
>>vec4 color = texture();
>>color2 = color * 2;
>>if (...) {
>>   break;
>>}
>> }
>> gl_FragColor = color2;
>>
>>
>> Now the definition still dominates the use, even with the modified
>> control-flow graph, and you have the same problem
>
>
> Curro had me convinced that some detail of the liveness analysis pass
> saved us here but now I can't remember what. :-(
>
>
>> The real problem is
>> that the assignment to color2 is really a conditional assignment: if
>> we're going channel-by-channel, it's not, but if you consider the
>> *whole* register at the same time, it is. To really fix the problem,
>> you need to model exactly what the machine actually does: you need to
>> insert "fake" edges like these, that model the jumps that the machine
>> can take, and you need to make every assignment a conditional
>> assignment (i.e. it doesn't kill the register). It's probably not as
>> bad with Curro's patch on top, though. Also, once you do this you can
>> make register allocation more accurate by generating interferences
>> from the liveness information directly instead of from the intervals.
>>
>> One thing I've thought about is, in addition to maintaining this
>> "whole-vector" view of things, is to maintain a "per-channel" liveness
>> that doesn't use the extra edges, partial definitions etc. and then
>> use the "per-channel view" to calculate interference when the channels
>> always line up.
>>
>
> Yes, we've considered that and it's a good idea.  However, I'm trying to
> fix bugs right now, not write the world's best liveness analysis pass. :-)
>

You're correct, as usual... I've inspected the result of liveness anlaysis
and we do indeed get it wrong.  I'll come up with something less bogus.


> On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand 
>> wrote:
>> > Shader-db results on Sky Lake:
>> >
>> > total instructions in shared programs: 12955125 -> 12953698 (-0.01%)
>> > instructions in affected programs: 55956 -> 54529 (-2.55%)
>> > helped: 6
>> > HURT: 38
>> >
>> > All of the hurt programs were hurt by exactly one instruction because
>> > this patch affects copy propagation.  Most of the helped instructions
>> > came from a single orbital explorer shader that was helped by 14.26%
>> >
>> > Cc: mesa-sta...@lists.freedesktop.org
>> > ---
>> >  src/intel/compiler/brw_cfg.cpp | 37 ++
>> +--
>> >  1 file changed, 35 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/src/intel/compiler/brw_cfg.cpp
>> b/src/intel/compiler/brw_cfg.cpp
>> > index fad12ee..d8bf725 100644
>> > --- a/src/intel/compiler/brw_cfg.cpp
>> > +++ b/src/intel/compiler/brw_cfg.cpp
>> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)
>> >   assert(cur_while != NULL);
>> >  cur->add_successor(mem_ctx, cur_while);
>> >
>> > + /* We also add the next block as a successor of the break.
>> If the
>> > +  * break is predicated, we need to do this because the break
>> may not
>> > +  * be taken.  If the break is not predicated, we add it
>> anyway so
>> > +  * that our live intervals computations will operate as if
>> the break
>> > +  * may or may not be taken.  Consider the following example:
>> > +  *
>> > +  *vec4 color2;
>> > +  *while (1) {
>> > +  *   vec4 color = texture();
>> > +  *   if (...) {
>> > +  *  color2 = color * 2;
>> > +  *  break;
>> > +  *   }
>> > +  *}
>> > +  *gl_FragColor = color2;
>> > +  *
>> > +  * In this case, the definition of color2 dominates the use
>> because
>> > +  * the loop only has the one exit.  This means that the live
>> range
>> > +  * interval for color2 goes from the statement in the if to
>> it's use
>> > +  * below the loop.  Now suppose that the texture operation
>> has a
>> > +  * header register that gets assigned one of the registers
>> used for
>> > +  * color2.  If the loop condition is non-uniform and some of
>> the
>> > +  * threads will take the break and others will continue.  In
>> this
>> > +  * case, the next pass through the loop, the WE_all setup of
>> the
>> > +  * header register will stomp the disabled channels of color2
>> and
>> > +  * corrupt the value.
>> > +  *
>> > +  * This same problem can occur if you have a mix of 64, 32,
>> and
>> > +  * 16-bit registers because the channels do not line up or if
>> you
>> > +  * have a SIMD16 program and the first half of one value
>> overlaps the
>> > +  * second half of the other.  To solve it, 

Re: [Mesa-dev] [PATCH 3/4] intel/cfg: Always add both successors to a break

2017-10-04 Thread Connor Abbott
On Wed, Oct 4, 2017 at 8:35 PM, Jason Ekstrand  wrote:
> On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott  wrote:
>>
>> This won't completely solve the problem. For example, what if you
>> hoist the assignment to color2 outside the loop?
>>
>> vec4 color2;
>> while (1) {
>>vec4 color = texture();
>>color2 = color * 2;
>>if (...) {
>>   break;
>>}
>> }
>> gl_FragColor = color2;
>>
>>
>> Now the definition still dominates the use, even with the modified
>> control-flow graph, and you have the same problem
>
>
> Curro had me convinced that some detail of the liveness analysis pass saved
> us here but now I can't remember what. :-(
>
>>
>> The real problem is
>> that the assignment to color2 is really a conditional assignment: if
>> we're going channel-by-channel, it's not, but if you consider the
>> *whole* register at the same time, it is. To really fix the problem,
>> you need to model exactly what the machine actually does: you need to
>> insert "fake" edges like these, that model the jumps that the machine
>> can take, and you need to make every assignment a conditional
>> assignment (i.e. it doesn't kill the register). It's probably not as
>> bad with Curro's patch on top, though. Also, once you do this you can
>> make register allocation more accurate by generating interferences
>> from the liveness information directly instead of from the intervals.
>>
>> One thing I've thought about is, in addition to maintaining this
>> "whole-vector" view of things, is to maintain a "per-channel" liveness
>> that doesn't use the extra edges, partial definitions etc. and then
>> use the "per-channel view" to calculate interference when the channels
>> always line up.
>
>
> Yes, we've considered that and it's a good idea.  However, I'm trying to fix
> bugs right now, not write the world's best liveness analysis pass. :-)

That's fair, although just implementing the first bit shouldn't be too hard.

>
>>
>> On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand 
>> wrote:
>> > Shader-db results on Sky Lake:
>> >
>> > total instructions in shared programs: 12955125 -> 12953698 (-0.01%)
>> > instructions in affected programs: 55956 -> 54529 (-2.55%)
>> > helped: 6
>> > HURT: 38
>> >
>> > All of the hurt programs were hurt by exactly one instruction because
>> > this patch affects copy propagation.  Most of the helped instructions
>> > came from a single orbital explorer shader that was helped by 14.26%
>> >
>> > Cc: mesa-sta...@lists.freedesktop.org
>> > ---
>> >  src/intel/compiler/brw_cfg.cpp | 37
>> > +++--
>> >  1 file changed, 35 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/src/intel/compiler/brw_cfg.cpp
>> > b/src/intel/compiler/brw_cfg.cpp
>> > index fad12ee..d8bf725 100644
>> > --- a/src/intel/compiler/brw_cfg.cpp
>> > +++ b/src/intel/compiler/brw_cfg.cpp
>> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)
>> >   assert(cur_while != NULL);
>> >  cur->add_successor(mem_ctx, cur_while);
>> >
>> > + /* We also add the next block as a successor of the break.  If
>> > the
>> > +  * break is predicated, we need to do this because the break
>> > may not
>> > +  * be taken.  If the break is not predicated, we add it anyway
>> > so
>> > +  * that our live intervals computations will operate as if the
>> > break
>> > +  * may or may not be taken.  Consider the following example:
>> > +  *
>> > +  *vec4 color2;
>> > +  *while (1) {
>> > +  *   vec4 color = texture();
>> > +  *   if (...) {
>> > +  *  color2 = color * 2;
>> > +  *  break;
>> > +  *   }
>> > +  *}
>> > +  *gl_FragColor = color2;
>> > +  *
>> > +  * In this case, the definition of color2 dominates the use
>> > because
>> > +  * the loop only has the one exit.  This means that the live
>> > range
>> > +  * interval for color2 goes from the statement in the if to
>> > it's use
>> > +  * below the loop.  Now suppose that the texture operation has
>> > a
>> > +  * header register that gets assigned one of the registers
>> > used for
>> > +  * color2.  If the loop condition is non-uniform and some of
>> > the
>> > +  * threads will take the break and others will continue.  In
>> > this
>> > +  * case, the next pass through the loop, the WE_all setup of
>> > the
>> > +  * header register will stomp the disabled channels of color2
>> > and
>> > +  * corrupt the value.
>> > +  *
>> > +  * This same problem can occur if you have a mix of 64, 32,
>> > and
>> > +  * 16-bit registers because the channels do not line up or if
>> > you
>> > +  * have a SIMD16 program and the first half of one value
>> > overlaps the
>> > +  * second half of the other.  To solve it, we simply treat the
>> > break

Re: [Mesa-dev] [PATCH 3/4] intel/cfg: Always add both successors to a break

2017-10-04 Thread Jason Ekstrand
On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott  wrote:

> This won't completely solve the problem. For example, what if you
> hoist the assignment to color2 outside the loop?
>
> vec4 color2;
> while (1) {
>vec4 color = texture();
>color2 = color * 2;
>if (...) {
>   break;
>}
> }
> gl_FragColor = color2;
>
>
> Now the definition still dominates the use, even with the modified
> control-flow graph, and you have the same problem


Curro had me convinced that some detail of the liveness analysis pass saved
us here but now I can't remember what. :-(


> The real problem is
> that the assignment to color2 is really a conditional assignment: if
> we're going channel-by-channel, it's not, but if you consider the
> *whole* register at the same time, it is. To really fix the problem,
> you need to model exactly what the machine actually does: you need to
> insert "fake" edges like these, that model the jumps that the machine
> can take, and you need to make every assignment a conditional
> assignment (i.e. it doesn't kill the register). It's probably not as
> bad with Curro's patch on top, though. Also, once you do this you can
> make register allocation more accurate by generating interferences
> from the liveness information directly instead of from the intervals.
>
> One thing I've thought about is, in addition to maintaining this
> "whole-vector" view of things, is to maintain a "per-channel" liveness
> that doesn't use the extra edges, partial definitions etc. and then
> use the "per-channel view" to calculate interference when the channels
> always line up.
>

Yes, we've considered that and it's a good idea.  However, I'm trying to
fix bugs right now, not write the world's best liveness analysis pass. :-)


> On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand 
> wrote:
> > Shader-db results on Sky Lake:
> >
> > total instructions in shared programs: 12955125 -> 12953698 (-0.01%)
> > instructions in affected programs: 55956 -> 54529 (-2.55%)
> > helped: 6
> > HURT: 38
> >
> > All of the hurt programs were hurt by exactly one instruction because
> > this patch affects copy propagation.  Most of the helped instructions
> > came from a single orbital explorer shader that was helped by 14.26%
> >
> > Cc: mesa-sta...@lists.freedesktop.org
> > ---
> >  src/intel/compiler/brw_cfg.cpp | 37 ++
> +--
> >  1 file changed, 35 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/intel/compiler/brw_cfg.cpp
> b/src/intel/compiler/brw_cfg.cpp
> > index fad12ee..d8bf725 100644
> > --- a/src/intel/compiler/brw_cfg.cpp
> > +++ b/src/intel/compiler/brw_cfg.cpp
> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)
> >   assert(cur_while != NULL);
> >  cur->add_successor(mem_ctx, cur_while);
> >
> > + /* We also add the next block as a successor of the break.  If
> the
> > +  * break is predicated, we need to do this because the break
> may not
> > +  * be taken.  If the break is not predicated, we add it anyway
> so
> > +  * that our live intervals computations will operate as if the
> break
> > +  * may or may not be taken.  Consider the following example:
> > +  *
> > +  *vec4 color2;
> > +  *while (1) {
> > +  *   vec4 color = texture();
> > +  *   if (...) {
> > +  *  color2 = color * 2;
> > +  *  break;
> > +  *   }
> > +  *}
> > +  *gl_FragColor = color2;
> > +  *
> > +  * In this case, the definition of color2 dominates the use
> because
> > +  * the loop only has the one exit.  This means that the live
> range
> > +  * interval for color2 goes from the statement in the if to
> it's use
> > +  * below the loop.  Now suppose that the texture operation has
> a
> > +  * header register that gets assigned one of the registers
> used for
> > +  * color2.  If the loop condition is non-uniform and some of
> the
> > +  * threads will take the break and others will continue.  In
> this
> > +  * case, the next pass through the loop, the WE_all setup of
> the
> > +  * header register will stomp the disabled channels of color2
> and
> > +  * corrupt the value.
> > +  *
> > +  * This same problem can occur if you have a mix of 64, 32, and
> > +  * 16-bit registers because the channels do not line up or if
> you
> > +  * have a SIMD16 program and the first half of one value
> overlaps the
> > +  * second half of the other.  To solve it, we simply treat the
> break
> > +  * as if it may also continue on because some of the threads
> may
> > +  * continue on.
> > +  */
> >  next = new_block();
> > -if (inst->predicate)
> > -   cur->add_successor(mem_ctx, next);
> > +cur->add_successor(mem_ctx, next);
> >
> > 

Re: [Mesa-dev] [PATCH 3/4] intel/cfg: Always add both successors to a break

2017-10-04 Thread Connor Abbott
This won't completely solve the problem. For example, what if you
hoist the assignment to color2 outside the loop?

vec4 color2;
while (1) {
   vec4 color = texture();
   color2 = color * 2;
   if (...) {
  break;
   }
}
gl_FragColor = color2;


Now the definition still dominates the use, even with the modified
control-flow graph, and you have the same problem. The real problem is
that the assignment to color2 is really a conditional assignment: if
we're going channel-by-channel, it's not, but if you consider the
*whole* register at the same time, it is. To really fix the problem,
you need to model exactly what the machine actually does: you need to
insert "fake" edges like these, that model the jumps that the machine
can take, and you need to make every assignment a conditional
assignment (i.e. it doesn't kill the register). It's probably not as
bad with Curro's patch on top, though. Also, once you do this you can
make register allocation more accurate by generating interferences
from the liveness information directly instead of from the intervals.

One thing I've thought about is, in addition to maintaining this
"whole-vector" view of things, is to maintain a "per-channel" liveness
that doesn't use the extra edges, partial definitions etc. and then
use the "per-channel view" to calculate interference when the channels
always line up.


On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand  wrote:
> Shader-db results on Sky Lake:
>
> total instructions in shared programs: 12955125 -> 12953698 (-0.01%)
> instructions in affected programs: 55956 -> 54529 (-2.55%)
> helped: 6
> HURT: 38
>
> All of the hurt programs were hurt by exactly one instruction because
> this patch affects copy propagation.  Most of the helped instructions
> came from a single orbital explorer shader that was helped by 14.26%
>
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/intel/compiler/brw_cfg.cpp | 37 +++--
>  1 file changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_cfg.cpp b/src/intel/compiler/brw_cfg.cpp
> index fad12ee..d8bf725 100644
> --- a/src/intel/compiler/brw_cfg.cpp
> +++ b/src/intel/compiler/brw_cfg.cpp
> @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)
>   assert(cur_while != NULL);
>  cur->add_successor(mem_ctx, cur_while);
>
> + /* We also add the next block as a successor of the break.  If the
> +  * break is predicated, we need to do this because the break may not
> +  * be taken.  If the break is not predicated, we add it anyway so
> +  * that our live intervals computations will operate as if the break
> +  * may or may not be taken.  Consider the following example:
> +  *
> +  *vec4 color2;
> +  *while (1) {
> +  *   vec4 color = texture();
> +  *   if (...) {
> +  *  color2 = color * 2;
> +  *  break;
> +  *   }
> +  *}
> +  *gl_FragColor = color2;
> +  *
> +  * In this case, the definition of color2 dominates the use because
> +  * the loop only has the one exit.  This means that the live range
> +  * interval for color2 goes from the statement in the if to it's use
> +  * below the loop.  Now suppose that the texture operation has a
> +  * header register that gets assigned one of the registers used for
> +  * color2.  If the loop condition is non-uniform and some of the
> +  * threads will take the break and others will continue.  In this
> +  * case, the next pass through the loop, the WE_all setup of the
> +  * header register will stomp the disabled channels of color2 and
> +  * corrupt the value.
> +  *
> +  * This same problem can occur if you have a mix of 64, 32, and
> +  * 16-bit registers because the channels do not line up or if you
> +  * have a SIMD16 program and the first half of one value overlaps 
> the
> +  * second half of the other.  To solve it, we simply treat the break
> +  * as if it may also continue on because some of the threads may
> +  * continue on.
> +  */
>  next = new_block();
> -if (inst->predicate)
> -   cur->add_successor(mem_ctx, next);
> +cur->add_successor(mem_ctx, next);
>
>  set_next_block(&cur, next, ip);
>  break;
> --
> 2.5.0.400.gff86faf
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] intel/cfg: Always add both successors to a break

2017-10-04 Thread Jason Ekstrand
Shader-db results on Sky Lake:

total instructions in shared programs: 12955125 -> 12953698 (-0.01%)
instructions in affected programs: 55956 -> 54529 (-2.55%)
helped: 6
HURT: 38

All of the hurt programs were hurt by exactly one instruction because
this patch affects copy propagation.  Most of the helped instructions
came from a single orbital explorer shader that was helped by 14.26%

Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/compiler/brw_cfg.cpp | 37 +++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_cfg.cpp b/src/intel/compiler/brw_cfg.cpp
index fad12ee..d8bf725 100644
--- a/src/intel/compiler/brw_cfg.cpp
+++ b/src/intel/compiler/brw_cfg.cpp
@@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)
  assert(cur_while != NULL);
 cur->add_successor(mem_ctx, cur_while);
 
+ /* We also add the next block as a successor of the break.  If the
+  * break is predicated, we need to do this because the break may not
+  * be taken.  If the break is not predicated, we add it anyway so
+  * that our live intervals computations will operate as if the break
+  * may or may not be taken.  Consider the following example:
+  *
+  *vec4 color2;
+  *while (1) {
+  *   vec4 color = texture();
+  *   if (...) {
+  *  color2 = color * 2;
+  *  break;
+  *   }
+  *}
+  *gl_FragColor = color2;
+  *
+  * In this case, the definition of color2 dominates the use because
+  * the loop only has the one exit.  This means that the live range
+  * interval for color2 goes from the statement in the if to it's use
+  * below the loop.  Now suppose that the texture operation has a
+  * header register that gets assigned one of the registers used for
+  * color2.  If the loop condition is non-uniform and some of the
+  * threads will take the break and others will continue.  In this
+  * case, the next pass through the loop, the WE_all setup of the
+  * header register will stomp the disabled channels of color2 and
+  * corrupt the value.
+  *
+  * This same problem can occur if you have a mix of 64, 32, and
+  * 16-bit registers because the channels do not line up or if you
+  * have a SIMD16 program and the first half of one value overlaps the
+  * second half of the other.  To solve it, we simply treat the break
+  * as if it may also continue on because some of the threads may
+  * continue on.
+  */
 next = new_block();
-if (inst->predicate)
-   cur->add_successor(mem_ctx, next);
+cur->add_successor(mem_ctx, next);
 
 set_next_block(&cur, next, ip);
 break;
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev