[Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-09 Thread Gert Wollny
Dear all,

as I wrote before, I was looking into the temporary register renaming.

This series of patches implements a new approach that achieves a tigher
estimation of the life time of the temporaries, and as a result the Piano
and Voloplosion benchmarks implemented in gputest [1] now work. Before
they failed with "r600_pipe_shader_create - translation from TGSI failed!"

Piglit shows 7 fixes and 6 regressions compared to git 8fac894f, but they don't
seem to be related to shaders. I've also tested other programs like the 
unignie-*
benchmarks and they didn't show regressions.

I think that the patch will need a few more iterations to remove code 
duplication
and generally adhere to the mesa style, but I think it is atthe point where I 
could
need a bit of feedback to get it into shape to be acceptable, and I'd also like 
to
mention that since I'm new to mesa this I have no commit rights.

many thanks,
Gert

[1] http://www.geeks3d.com/gputest/

Gert Wollny (3):
  mesa/st: glsl_to_tgsi move some helper classes to extra files
  mesa/st: glsl_to_tgsi Implement a new lifetime tracker for temporaries
  mesa/st: glsl_to_tgsi: tie in the new register renaming approach

 configure.ac   |   1 +
 src/mesa/Makefile.am   |   4 +-
 src/mesa/Makefile.sources  |   4 +
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 302 +---
 src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp | 241 +++
 src/mesa/state_tracker/st_glsl_to_tgsi_private.h   | 135 
 .../state_tracker/st_glsl_to_tgsi_temprename.cpp   | 551 ++
 .../state_tracker/st_glsl_to_tgsi_temprename.h | 114 +++
 src/mesa/state_tracker/tests/Makefile.am   |  40 ++
 src/mesa/state_tracker/tests/st-renumerate-test| 210 ++
 .../tests/test_glsl_to_tgsi_lifetime.cpp   | 789 +
 11 files changed, 2104 insertions(+), 287 deletions(-)
 create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp
 create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_private.h
 create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_temprename.cpp
 create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_temprename.h
 create mode 100644 src/mesa/state_tracker/tests/Makefile.am
 create mode 100755 src/mesa/state_tracker/tests/st-renumerate-test
 create mode 100644 src/mesa/state_tracker/tests/test_glsl_to_tgsi_lifetime.cpp

-- 
2.13.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-11 Thread Marek Olšák
Hi Gert,

Have you measured the CPU overhead of the new code?

Marek

On Sat, Jun 10, 2017 at 1:15 AM, Gert Wollny  wrote:
> Dear all,
>
> as I wrote before, I was looking into the temporary register renaming.
>
> This series of patches implements a new approach that achieves a tigher
> estimation of the life time of the temporaries, and as a result the Piano
> and Voloplosion benchmarks implemented in gputest [1] now work. Before
> they failed with "r600_pipe_shader_create - translation from TGSI failed!"
>
> Piglit shows 7 fixes and 6 regressions compared to git 8fac894f, but they 
> don't
> seem to be related to shaders. I've also tested other programs like the 
> unignie-*
> benchmarks and they didn't show regressions.
>
> I think that the patch will need a few more iterations to remove code 
> duplication
> and generally adhere to the mesa style, but I think it is atthe point where I 
> could
> need a bit of feedback to get it into shape to be acceptable, and I'd also 
> like to
> mention that since I'm new to mesa this I have no commit rights.
>
> many thanks,
> Gert
>
> [1] http://www.geeks3d.com/gputest/
>
> Gert Wollny (3):
>   mesa/st: glsl_to_tgsi move some helper classes to extra files
>   mesa/st: glsl_to_tgsi Implement a new lifetime tracker for temporaries
>   mesa/st: glsl_to_tgsi: tie in the new register renaming approach
>
>  configure.ac   |   1 +
>  src/mesa/Makefile.am   |   4 +-
>  src/mesa/Makefile.sources  |   4 +
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 302 +---
>  src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp | 241 +++
>  src/mesa/state_tracker/st_glsl_to_tgsi_private.h   | 135 
>  .../state_tracker/st_glsl_to_tgsi_temprename.cpp   | 551 ++
>  .../state_tracker/st_glsl_to_tgsi_temprename.h | 114 +++
>  src/mesa/state_tracker/tests/Makefile.am   |  40 ++
>  src/mesa/state_tracker/tests/st-renumerate-test| 210 ++
>  .../tests/test_glsl_to_tgsi_lifetime.cpp   | 789 
> +
>  11 files changed, 2104 insertions(+), 287 deletions(-)
>  create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp
>  create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_private.h
>  create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_temprename.cpp
>  create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_temprename.h
>  create mode 100644 src/mesa/state_tracker/tests/Makefile.am
>  create mode 100755 src/mesa/state_tracker/tests/st-renumerate-test
>  create mode 100644 
> src/mesa/state_tracker/tests/test_glsl_to_tgsi_lifetime.cpp
>
> --
> 2.13.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-11 Thread Marek Olšák
Also, I don't know if people will like that it uses STL. I personally
have no issue with that as long as it doesn't break apps (e.g. the STL
shipped with apps should be the same as the STL shipped with the
distribution).

Marek

On Sun, Jun 11, 2017 at 4:12 PM, Marek Olšák  wrote:
> Hi Gert,
>
> Have you measured the CPU overhead of the new code?
>
> Marek
>
> On Sat, Jun 10, 2017 at 1:15 AM, Gert Wollny  wrote:
>> Dear all,
>>
>> as I wrote before, I was looking into the temporary register renaming.
>>
>> This series of patches implements a new approach that achieves a tigher
>> estimation of the life time of the temporaries, and as a result the Piano
>> and Voloplosion benchmarks implemented in gputest [1] now work. Before
>> they failed with "r600_pipe_shader_create - translation from TGSI failed!"
>>
>> Piglit shows 7 fixes and 6 regressions compared to git 8fac894f, but they 
>> don't
>> seem to be related to shaders. I've also tested other programs like the 
>> unignie-*
>> benchmarks and they didn't show regressions.
>>
>> I think that the patch will need a few more iterations to remove code 
>> duplication
>> and generally adhere to the mesa style, but I think it is atthe point where 
>> I could
>> need a bit of feedback to get it into shape to be acceptable, and I'd also 
>> like to
>> mention that since I'm new to mesa this I have no commit rights.
>>
>> many thanks,
>> Gert
>>
>> [1] http://www.geeks3d.com/gputest/
>>
>> Gert Wollny (3):
>>   mesa/st: glsl_to_tgsi move some helper classes to extra files
>>   mesa/st: glsl_to_tgsi Implement a new lifetime tracker for temporaries
>>   mesa/st: glsl_to_tgsi: tie in the new register renaming approach
>>
>>  configure.ac   |   1 +
>>  src/mesa/Makefile.am   |   4 +-
>>  src/mesa/Makefile.sources  |   4 +
>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 302 +---
>>  src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp | 241 +++
>>  src/mesa/state_tracker/st_glsl_to_tgsi_private.h   | 135 
>>  .../state_tracker/st_glsl_to_tgsi_temprename.cpp   | 551 ++
>>  .../state_tracker/st_glsl_to_tgsi_temprename.h | 114 +++
>>  src/mesa/state_tracker/tests/Makefile.am   |  40 ++
>>  src/mesa/state_tracker/tests/st-renumerate-test| 210 ++
>>  .../tests/test_glsl_to_tgsi_lifetime.cpp   | 789 
>> +
>>  11 files changed, 2104 insertions(+), 287 deletions(-)
>>  create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp
>>  create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_private.h
>>  create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_temprename.cpp
>>  create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_temprename.h
>>  create mode 100644 src/mesa/state_tracker/tests/Makefile.am
>>  create mode 100755 src/mesa/state_tracker/tests/st-renumerate-test
>>  create mode 100644 
>> src/mesa/state_tracker/tests/test_glsl_to_tgsi_lifetime.cpp
>>
>> --
>> 2.13.0
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-11 Thread Gert Wollny
Hello Marek, 

thanks for chiming in. 

Am Sonntag, den 11.06.2017, 16:15 +0200 schrieb Marek Olšák:
> Also, I don't know if people will like that it uses STL. I personally
> have no issue with that as long as it doesn't break apps (e.g. the
> STL shipped with apps should be the same as the STL shipped with the
> distribution).

Well, on Linux I would take it for granted that the STL used to run the
code is the same like the one the code was compiled with, and there are
already quite some places in the mesa code where STL constructs are
used (if that wounld't  have been the case, then I would tried to avoid
the STL). I am actually more concerned that propagating the  C++11
requirement to the whole  of src/mesa might not be welcomed (although
everything compiles and runs fine).


> On Sun, Jun 11, 2017 at 4:12 PM, Marek Olšák 
> wrote:
> > Hi Gert,
> > 
> > Have you measured the CPU overhead of the new code?

So far no, I guess one would do that with the shader-db to get
reasonable complex shaders, but I only have a r600 based card so I'm
not sure whether I can run this. In any case, tomorrow I will take a
look into this. 

Best, 
Gert 

> > 
> > Marek
> > 
> > On Sat, Jun 10, 2017 at 1:15 AM, Gert Wollny 
> > wrote:
> > > Dear all,
> > > 
> > > as I wrote before, I was looking into the temporary register
> > > renaming.
> > > 
> > > This series of patches implements a new approach that achieves a
> > > tigher
> > > estimation of the life time of the temporaries, and as a result
> > > the Piano
> > > and Voloplosion benchmarks implemented in gputest [1] now work.
> > > Before
> > > they failed with "r600_pipe_shader_create - translation from TGSI
> > > failed!"
> > > 
> > > Piglit shows 7 fixes and 6 regressions compared to git 8fac894f,
> > > but they don't
> > > seem to be related to shaders. I've also tested other programs
> > > like the unignie-*
> > > benchmarks and they didn't show regressions.
> > > 
> > > I think that the patch will need a few more iterations to remove
> > > code duplication
> > > and generally adhere to the mesa style, but I think it is atthe
> > > point where I could
> > > need a bit of feedback to get it into shape to be acceptable, and
> > > I'd also like to
> > > mention that since I'm new to mesa this I have no commit rights.
> > > 
> > > many thanks,
> > > Gert
> > > 
> > > [1] http://www.geeks3d.com/gputest/
> > > 
> > > Gert Wollny (3):
> > >   mesa/st: glsl_to_tgsi move some helper classes to extra files
> > >   mesa/st: glsl_to_tgsi Implement a new lifetime tracker for
> > > temporaries
> > >   mesa/st: glsl_to_tgsi: tie in the new register renaming
> > > approach
> > > 
> > >  configure.ac   |   1 +
> > >  src/mesa/Makefile.am   |   4 +-
> > >  src/mesa/Makefile.sources  |   4 +
> > >  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 302 +
> > > ---
> > >  src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp | 241 +++
> > >  src/mesa/state_tracker/st_glsl_to_tgsi_private.h   | 135 
> > >  .../state_tracker/st_glsl_to_tgsi_temprename.cpp   | 551
> > > ++
> > >  .../state_tracker/st_glsl_to_tgsi_temprename.h | 114 +++
> > >  src/mesa/state_tracker/tests/Makefile.am   |  40 ++
> > >  src/mesa/state_tracker/tests/st-renumerate-test| 210 ++
> > >  .../tests/test_glsl_to_tgsi_lifetime.cpp   | 789
> > > +
> > >  11 files changed, 2104 insertions(+), 287 deletions(-)
> > >  create mode 100644
> > > src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp
> > >  create mode 100644
> > > src/mesa/state_tracker/st_glsl_to_tgsi_private.h
> > >  create mode 100644
> > > src/mesa/state_tracker/st_glsl_to_tgsi_temprename.cpp
> > >  create mode 100644
> > > src/mesa/state_tracker/st_glsl_to_tgsi_temprename.h
> > >  create mode 100644 src/mesa/state_tracker/tests/Makefile.am
> > >  create mode 100755 src/mesa/state_tracker/tests/st-renumerate-
> > > test
> > >  create mode 100644
> > > src/mesa/state_tracker/tests/test_glsl_to_tgsi_lifetime.cpp
> > > 
> > > --
> > > 2.13.0
> > > 
> > > ___
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-11 Thread Michel Dänzer
On 10/06/17 08:15 AM, Gert Wollny wrote:
> 
> Piglit shows 7 fixes and 6 regressions compared to git 8fac894f, but they 
> don't
> seem to be related to shaders.

Which tests regressed (maybe you can put up a piglit HTML summary
somewhere generated from a run with and without your patches)? Do they
consistently pass without your patches and fail with them?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-12 Thread Gert Wollny
Am Sonntag, den 11.06.2017, 19:21 +0200 schrieb Gert Wollny:
> 
> > > 
> > > Have you measured the CPU overhead of the new code?
> 
> So far no, I guess one would do that with the shader-db to get
> reasonable complex shaders, but I only have a r600 based card so I'm
> not sure whether I can run this. In any case, tomorrow I will take a
> look into this. 

I did runs of the shader-db/run program with valgrind/callgrind 

Here the original merge_registers reports 0.21% and my code reports
0.50%. 

If it is important to cut down on these addes 0.3%, I think, I can
eliminate 0.1% by changing the implementation to not used so many
dynamically allocated objects, but beyond that it will be difficult.

Best, 
Gert 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-12 Thread Gert Wollny
Am Montag, den 12.06.2017, 15:44 +0900 schrieb Michel Dänzer:
> On 10/06/17 08:15 AM, Gert Wollny wrote:
> > 
> > Piglit shows 7 fixes and 6 regressions compared to git 8fac894f,
> > but they don't
> > seem to be related to shaders.
> 
> Which tests regressed (maybe you can put up a piglit HTML summary
> somewhere generated from a run with and without your patches)? Do
> they consistently pass without your patches and fail with them?


I had to redo the results, because I realized that I had compared the
system mesa version (with EGL support) versus the test version (without
  EGL support). 

Now both tested versions were configure with the same options and I run
both versions two times. The result diff of the quick test are: 

piglit summary console -d results/o1 results/o2 results/n1 results/n2 

glx/glx-multithread-texture: pass pass fail fail

glx/glx-visuals-stencil: fail fail pass pass

glx/glx_arb_sync_control/timing -fullscreen -divisor 1: pass fail pass
fail

glx/glx_arb_sync_control/timing -fullscreen -divisor 2: pass fail fail
warn

glx/glx_arb_sync_control/timing -fullscreen -msc-delta 1: fail fail
warn fail

glx/glx_arb_sync_control/timing -fullscreen -msc-delta 2: fail fail
fail pass

glx/glx_arb_sync_control/timing -msc-delta 2: warn fail pass fail

glx/glx_arb_sync_control/timing -waitformsc -msc-delta 2: fail pass
pass fail

spec/arb_shader_bit_encoding/execution/and-clamp: fail fail pass fail

spec/arb_shading_language_420pack/active sampler conflict: fail fail
pass fail

spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec2-index-
rd: fail fail pass pass

spec/nv_conditional_render/drawpixels: fail pass fail pass

spec/nv_conditional_render/vertex_array: fail pass pass pass


summary:
   name: o1 o2 n1 n2
     -- -- -- --
   pass:  31583  31584  31588  31585
   fail:   1454   1454   1449   1452
  crash:  5  5  5  5
   skip:  17356  17356  17356  17356
timeout:  0  0  0  0
   warn: 14 13 14 14
 incomplete:  0  0  0  0
 dmesg-warn:  0  0  0  0
 dmesg-fail:  0  0  0  0
changes:  0  6  9  9
  fixes:  0  3  7  3
regressions:  0  3  2  6
  total:  50412  50412  50412  50412


Best, 
Gert 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-12 Thread Michel Dänzer
On 12/06/17 06:32 PM, Gert Wollny wrote:
> Am Montag, den 12.06.2017, 15:44 +0900 schrieb Michel Dänzer:
>> On 10/06/17 08:15 AM, Gert Wollny wrote:
>>>
>>> Piglit shows 7 fixes and 6 regressions compared to git 8fac894f,
>>> but they don't
>>> seem to be related to shaders.
>>
>> Which tests regressed (maybe you can put up a piglit HTML summary
>> somewhere generated from a run with and without your patches)? Do
>> they consistently pass without your patches and fail with them?
> 
> 
> I had to redo the results, because I realized that I had compared the
> system mesa version (with EGL support) versus the test version (without
>   EGL support). 
> 
> Now both tested versions were configure with the same options and I run
> both versions two times. The result diff of the quick test are: 
> 
> piglit summary console -d results/o1 results/o2 results/n1 results/n2 
> 
> glx/glx-multithread-texture: pass pass fail fail

Might want to make sure this isn't a regression caused by your patches,
but FWIW this test seems to fail for me with radeonsi with "random" values.


> glx/glx_arb_sync_control/timing -fullscreen -divisor 1: pass fail pass
> fail
> glx/glx_arb_sync_control/timing -fullscreen -divisor 2: pass fail fail
> warn
> glx/glx_arb_sync_control/timing -fullscreen -msc-delta 1: fail fail
> warn fail
> glx/glx_arb_sync_control/timing -fullscreen -msc-delta 2: fail fail
> fail pass
> glx/glx_arb_sync_control/timing -msc-delta 2: warn fail pass fail
> glx/glx_arb_sync_control/timing -waitformsc -msc-delta 2: fail pass
> pass fail

You can ignore these, their results are somewhat random. The piglit
patches below help a little, but the results are still not 100% reliable
when piglit runs multiple tests concurrently.

https://patchwork.freedesktop.org/patch/150486/
https://patchwork.freedesktop.org/patch/150484/
https://patchwork.freedesktop.org/patch/150485/


In summary, it looks like your patches don't cause any piglit regressions.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-12 Thread Nicolai Hähnle

On 12.06.2017 11:32, Gert Wollny wrote:

Am Montag, den 12.06.2017, 15:44 +0900 schrieb Michel Dänzer:

On 10/06/17 08:15 AM, Gert Wollny wrote:


Piglit shows 7 fixes and 6 regressions compared to git 8fac894f,
but they don't
seem to be related to shaders.


Which tests regressed (maybe you can put up a piglit HTML summary
somewhere generated from a run with and without your patches)? Do
they consistently pass without your patches and fail with them?



I had to redo the results, because I realized that I had compared the
system mesa version (with EGL support) versus the test version (without
   EGL support).

Now both tested versions were configure with the same options and I run
both versions two times. The result diff of the quick test are:

piglit summary console -d results/o1 results/o2 results/n1 results/n2

glx/glx-multithread-texture: pass pass fail fail

glx/glx-visuals-stencil: fail fail pass pass

glx/glx_arb_sync_control/timing -fullscreen -divisor 1: pass fail pass
fail

glx/glx_arb_sync_control/timing -fullscreen -divisor 2: pass fail fail
warn

glx/glx_arb_sync_control/timing -fullscreen -msc-delta 1: fail fail
warn fail

glx/glx_arb_sync_control/timing -fullscreen -msc-delta 2: fail fail
fail pass

glx/glx_arb_sync_control/timing -msc-delta 2: warn fail pass fail

glx/glx_arb_sync_control/timing -waitformsc -msc-delta 2: fail pass
pass fail


The above are probably noise.



spec/arb_shader_bit_encoding/execution/and-clamp: fail fail pass fail

spec/arb_shading_language_420pack/active sampler conflict: fail fail
pass fail

spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec2-index-
rd: fail fail pass pass

spec/nv_conditional_render/drawpixels: fail pass fail pass

spec/nv_conditional_render/vertex_array: fail pass pass pass 


It's disconcerting that you have tests here whose pass status is 
unstable. Those tests really should be deterministic.


You should definitely investigate 
spec/arb_shader_bit_encoding/execution/and-clamp to see if it's related 
to your patches.


Cheers,
Nicolai





summary:
name: o1 o2 n1 n2
  -- -- -- --
pass:  31583  31584  31588  31585
fail:   1454   1454   1449   1452
   crash:  5  5  5  5
skip:  17356  17356  17356  17356
 timeout:  0  0  0  0
warn: 14 13 14 14
  incomplete:  0  0  0  0
  dmesg-warn:  0  0  0  0
  dmesg-fail:  0  0  0  0
 changes:  0  6  9  9
   fixes:  0  3  7  3
regressions:  0  3  2  6
   total:  50412  50412  50412  50412


Best,
Gert

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-12 Thread Nicolai Hähnle

On 10.06.2017 01:15, Gert Wollny wrote:

Dear all,

as I wrote before, I was looking into the temporary register renaming.

This series of patches implements a new approach that achieves a tigher
estimation of the life time of the temporaries, and as a result the Piano
and Voloplosion benchmarks implemented in gputest [1] now work. Before
they failed with "r600_pipe_shader_create - translation from TGSI failed!"

Piglit shows 7 fixes and 6 regressions compared to git 8fac894f, but they don't
seem to be related to shaders. I've also tested other programs like the 
unignie-*
benchmarks and they didn't show regressions.

I think that the patch will need a few more iterations to remove code 
duplication
and generally adhere to the mesa style, but I think it is atthe point where I 
could
need a bit of feedback to get it into shape to be acceptable, and I'd also like 
to
mention that since I'm new to mesa this I have no commit rights.


Plenty of style issues aside, can you explain where and why you get 
tighter lifetimes?


Cheers,
Nicolai



many thanks,
Gert

[1] http://www.geeks3d.com/gputest/

Gert Wollny (3):
   mesa/st: glsl_to_tgsi move some helper classes to extra files
   mesa/st: glsl_to_tgsi Implement a new lifetime tracker for temporaries
   mesa/st: glsl_to_tgsi: tie in the new register renaming approach

  configure.ac   |   1 +
  src/mesa/Makefile.am   |   4 +-
  src/mesa/Makefile.sources  |   4 +
  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 302 +---
  src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp | 241 +++
  src/mesa/state_tracker/st_glsl_to_tgsi_private.h   | 135 
  .../state_tracker/st_glsl_to_tgsi_temprename.cpp   | 551 ++
  .../state_tracker/st_glsl_to_tgsi_temprename.h | 114 +++
  src/mesa/state_tracker/tests/Makefile.am   |  40 ++
  src/mesa/state_tracker/tests/st-renumerate-test| 210 ++
  .../tests/test_glsl_to_tgsi_lifetime.cpp   | 789 +
  11 files changed, 2104 insertions(+), 287 deletions(-)
  create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_private.cpp
  create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_private.h
  create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_temprename.cpp
  create mode 100644 src/mesa/state_tracker/st_glsl_to_tgsi_temprename.h
  create mode 100644 src/mesa/state_tracker/tests/Makefile.am
  create mode 100755 src/mesa/state_tracker/tests/st-renumerate-test
  create mode 100644 src/mesa/state_tracker/tests/test_glsl_to_tgsi_lifetime.cpp




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-12 Thread Gert Wollny
Am Montag, den 12.06.2017, 12:28 +0200 schrieb Nicolai Hähnle:
> On 10.06.2017 01:15, Gert Wollny wrote:
> > 
> Plenty of style issues aside, can you explain where and why you get 
> tighter lifetimes?

In the original code if a temporary is used within a loop it gets the
whole life time of the loop assigned. 

With this patch I track in more detail where a temporary is accesses
and base the lifetime on this: For instance, if a variable is first
unconditionally written and later read for the last time in the same
scope (loop, if, or switch branch), then the lifetime can be restricted
to that first-written - last-read range.

The code gets more complex because it tries to resolve this also for
nested scopes, and one also has to take care about whether a variable
is written only conditionally within a loop, or conditionally read
before it is written (in the source code sense, but not in the program
flow sense).

Shaders that profit from this better lifetime estimation are the ones
that have many short living values within long loops, think 

for () {
  float4 t[1] = f(in2); 
  float4 t[2] = g(in1); 
  float4 t[3] = op(t[1], t[2]); 
   ...
  sum += t[200];
}
 
Here the old code would keep all of the t[i] alive for the whole loop,
and in fact with the GpuTest Piano benchmark I have seen a shader with
>2000 temporaries where many were used in a loop but only required for
two or three lines so that my code could merge them to less then 100
temporaries, and this made it possible for the tgsi to bytecode layer
in r600g to actually translate the shader.

Best, 
Gert 

PS: Regarding style, I am fully aware that I have to iterate over this
code a few more times. I tried to adhere to the way the existing code
represents itself to me, but I'm happy to listen to any advise I can
get. 

In any case, I though it might be best to send this patch out now for
discussion. Now, with the unit tests in place I will rework it and
focus also more on style questions. One thing that comes up immediately
up is that I will try to reduce the use of dynamically allocated
memory, since 60% of the run time of my code is with memory allocation
and de-allocation. 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-12 Thread Gert Wollny
Hello Nicolai, 

Am Montag, den 12.06.2017, 12:17 +0200 schrieb Nicolai Hähnle:
> > spec/arb_shader_bit_encoding/execution/and-clamp: fail fail pass
> > fail
> > 
> > 
> > 

> It's disconcerting that you have tests here whose pass status is 
> unstable. Those tests really should be deterministic.

When I run only these tests ("all" and specified with -t ) then they
always pass (= three times in a row).

spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec2-
> > index-
> > rd: fail fail pass pass
> > 
This one actually now passes because of the patch (before it failed
because it needed 125 registers, and only 124 are free to be used. 


> > spec/arb_shading_language_420pack/active sampler conflict: fail
> > fail pass fail
> > 
> > spec/nv_conditional_render/drawpixels: fail pass fail pass
> > 
> > spec/nv_conditional_render/vertex_array: fail pass pass pass > 

These, however, are unstable, independent on whether my patches are
applied or not.

Best, 
Gert 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-12 Thread Nicolai Hähnle

On 12.06.2017 14:34, Gert Wollny wrote:

Am Montag, den 12.06.2017, 12:28 +0200 schrieb Nicolai Hähnle:

On 10.06.2017 01:15, Gert Wollny wrote:



Plenty of style issues aside, can you explain where and why you get
tighter lifetimes?


In the original code if a temporary is used within a loop it gets the
whole life time of the loop assigned.

With this patch I track in more detail where a temporary is accesses
and base the lifetime on this: For instance, if a variable is first
unconditionally written and later read for the last time in the same
scope (loop, if, or switch branch), then the lifetime can be restricted
to that first-written - last-read range.

The code gets more complex because it tries to resolve this also for
nested scopes, and one also has to take care about whether a variable
is written only conditionally within a loop, or conditionally read
before it is written (in the source code sense, but not in the program
flow sense).

Shaders that profit from this better lifetime estimation are the ones
that have many short living values within long loops, think

for () {
   float4 t[1] = f(in2);
   float4 t[2] = g(in1);
   float4 t[3] = op(t[1], t[2]);
...
   sum += t[200];
}
  
Here the old code would keep all of the t[i] alive for the whole loop,

and in fact with the GpuTest Piano benchmark I have seen a shader with

2000 temporaries where many were used in a loop but only required for

two or three lines so that my code could merge them to less then 100
temporaries, and this made it possible for the tgsi to bytecode layer
in r600g to actually translate the shader.


Okay. I think you should seriously re-think your algorithm in a way that 
makes it a more natural evolution from the algorithm that's already there.


Basically, the current algorithm tracks (first_write, last_read), so 
think about what you need to track in order to obtain a single-pass 
algorithm that computes lifetime (first, last) for every temporary. I 
think the following should do it for a first cut:


   struct st_calculate_lifetime_state {
  /* First and last instruction that has this temporary */
  unsigned first;
  unsigned last;

  /* First instruction of the outer-most in-active loop that
   * contains this temporary. (A loop is active if we're
   * currently processing instructions in it.)
  unsigned loop_first;

  /* Position of a read without preceding dominating write. */
  unsigned undef_read[4];

  /* First write in the program that is dominating our
   * current position, per channel.
   */
  unsigned first_dominating_write[4];
   };

In addition, you need to keep a stack of active scopes (loops and ifs), 
but you really only need to remember the start of the scope (and for 
loops, probably the position of the first BREAK).


Here's a sketch of the "state machine" that you need to run while 
traversing the program, assuming no BRK and CONT:


Init: last = 0, everything else = ~0

These are updates on individual variables on use:

On any use (source or dest):
- if first > cur_pos, set first = loop_first = cur_pos
- if loop_first < first, set first = loop_first
- update last

On use as source:
- if first_dominating_write > cur_pos and undef_read > cur_pos, set 
undef_read = cur_pos


On use as dest:
- if first_dominating_write > cur_pos, set first_dominating_write = cur_pos

These are updates of all temporaries on scope change:

On ENDLOOP, for all temps:
- if loop_first > start of loop, set loop_first = start of loop
- first < start of loop, update last to end of loop
- if undef_read between start and end of loop: set first = MIN(first, 
start of loop) and last = end of loop

- if first_dominating_write < end of loop, set undef_read = ~0

On ELSE and ENDIF, for all temps:
- if first_dominating_write > start of scope, set first_dominating_write 
= ~0


I'm not sure right now whether BREAK / CONT need special treatment at 
all. I think what you need is:


On ENDLOOP:
- if first_dominating_write is between the first BREAK in the loop and 
the end of the loop, set first_dominating_write = ~0


And for CONT, you probably don't really need anything, because CONTs 
cannot make you skip code forever.


What this state machine doesn't yet cover is

IF ..
  MOV TEMP[0], ...
ELSE
  MOV TEMP[0], ...
ENDIF

Still, I'd start with it and see whether you need to cover that case.

And even that case can probably be dealt with in a fairly efficient and 
pragmatic way. The idea is to keep track of the nesting level of IFs, 
plus an


   uint32 dominating_write_in_true_block[4];

per temp. Then:

On ELSE:
- if first_dominating_write < cur_pos, set the bit corresponding to the 
current nesting level in dominating_write_in_true_block


On ENDIF:
- don't reset first_dominating_write if the bit corresponding to the 
current nesting level in dominating_write_in_true_block is set

- unconditionally clear that bit

It won't cover cases with nesting level > 32, but do we really care?

Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-12 Thread Nicolai Hähnle

On 12.06.2017 20:52, Nicolai Hähnle wrote:

On 12.06.2017 14:34, Gert Wollny wrote:

Am Montag, den 12.06.2017, 12:28 +0200 schrieb Nicolai Hähnle:

On 10.06.2017 01:15, Gert Wollny wrote:



Plenty of style issues aside, can you explain where and why you get
tighter lifetimes?


In the original code if a temporary is used within a loop it gets the
whole life time of the loop assigned.

With this patch I track in more detail where a temporary is accesses
and base the lifetime on this: For instance, if a variable is first
unconditionally written and later read for the last time in the same
scope (loop, if, or switch branch), then the lifetime can be restricted
to that first-written - last-read range.

The code gets more complex because it tries to resolve this also for
nested scopes, and one also has to take care about whether a variable
is written only conditionally within a loop, or conditionally read
before it is written (in the source code sense, but not in the program
flow sense).

Shaders that profit from this better lifetime estimation are the ones
that have many short living values within long loops, think

for () {
   float4 t[1] = f(in2);
   float4 t[2] = g(in1);
   float4 t[3] = op(t[1], t[2]);
...
   sum += t[200];
}
Here the old code would keep all of the t[i] alive for the whole loop,
and in fact with the GpuTest Piano benchmark I have seen a shader with

2000 temporaries where many were used in a loop but only required for

two or three lines so that my code could merge them to less then 100
temporaries, and this made it possible for the tgsi to bytecode layer
in r600g to actually translate the shader.


Okay. I think you should seriously re-think your algorithm in a way that 
makes it a more natural evolution from the algorithm that's already there.


Basically, the current algorithm tracks (first_write, last_read), so 
think about what you need to track in order to obtain a single-pass 
algorithm that computes lifetime (first, last) for every temporary. I 
think the following should do it for a first cut:


struct st_calculate_lifetime_state {
   /* First and last instruction that has this temporary */
   unsigned first;
   unsigned last;

   /* First instruction of the outer-most in-active loop that
* contains this temporary. (A loop is active if we're
* currently processing instructions in it.)
   unsigned loop_first;

   /* Position of a read without preceding dominating write. */
   unsigned undef_read[4];

   /* First write in the program that is dominating our
* current position, per channel.
*/
   unsigned first_dominating_write[4];
};

In addition, you need to keep a stack of active scopes (loops and ifs), 
but you really only need to remember the start of the scope (and for 
loops, probably the position of the first BREAK).


Here's a sketch of the "state machine" that you need to run while 
traversing the program, assuming no BRK and CONT:


Init: last = 0, everything else = ~0

These are updates on individual variables on use:

On any use (source or dest):
- if first > cur_pos, set first = loop_first = cur_pos
- if loop_first < first, set first = loop_first
- update last

On use as source:
- if first_dominating_write > cur_pos and undef_read > cur_pos, set 
undef_read = cur_pos


On use as dest:
- if first_dominating_write > cur_pos, set first_dominating_write = cur_pos

These are updates of all temporaries on scope change:

On ENDLOOP, for all temps:
- if loop_first > start of loop, set loop_first = start of loop
- first < start of loop, update last to end of loop


This should of course read:

- if first < start of loop and last is inside the loop, then update last 
to the end of the loop


Cheers,
Nicolai


- if undef_read between start and end of loop: set first = MIN(first, 
start of loop) and last = end of loop

- if first_dominating_write < end of loop, set undef_read = ~0

On ELSE and ENDIF, for all temps:
- if first_dominating_write > start of scope, set first_dominating_write 
= ~0


I'm not sure right now whether BREAK / CONT need special treatment at 
all. I think what you need is:


On ENDLOOP:
- if first_dominating_write is between the first BREAK in the loop and 
the end of the loop, set first_dominating_write = ~0


And for CONT, you probably don't really need anything, because CONTs 
cannot make you skip code forever.


What this state machine doesn't yet cover is

IF ..
   MOV TEMP[0], ...
ELSE
   MOV TEMP[0], ...
ENDIF

Still, I'd start with it and see whether you need to cover that case.

And even that case can probably be dealt with in a fairly efficient and 
pragmatic way. The idea is to keep track of the nesting level of IFs, 
plus an


uint32 dominating_write_in_true_block[4];

per temp. Then:

On ELSE:
- if first_dominating_write < cur_pos, set the bit corresponding to the 
current nesting level in dominating_write_in_true_block


On ENDIF:
- don't reset first_dominating_

Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-13 Thread Gert Wollny
Am Montag, den 12.06.2017, 21:00 +0200 schrieb Nicolai Hähnle:

Thanks for you comments, although I do not agree with most of them. 

> 
> > Okay. I think you should seriously re-think your algorithm in a way
> > that  makes it a more natural evolution from the algorithm that's
> > already there.

Well, when I look at the current algorithm than I don't really see much
to evolve from: The tracking of loops is minimal and it actually only
uses the outer loop to assign life times.
In any case, 80% of this code I already re-used (i.e. the loop over the
instructions and iterating over the registers). 


> > Basically, the current algorithm tracks (first_write, last_read),
> > so  think about what you need to track in order to obtain a single-
> > pass  algorithm that computes lifetime (first, last) for every
> > temporary. I 

IMHO a single pass algorithm is not better then what I do now.
Especially with nested loops a single pass algorithm will be more
complicated.

Think e.g. 

...
BEGIN_LOOP
  ...
  BEGIN_LOOP
 a = ... 
 b = ... 
 c = a OP b; 
  END_LOOP 
  ... /* more processing that doesn't access a, b, or c */
END_LOOP

out =   f(a, c, ...)
END 


Here, a and c must be kept alive for both loops, but b only is 
needed for a few instructions. However, in a single pass state tracker 
I have to keep all the information for a, b, and c until the end of the
program, because only then I can discard the loop information for b,
and resolve everything else for a and c.

With the approach I am using, i.e. only collecting all the access
information in the pass over the program, and resolving the life-times
at the end by re-playing the temp-access timeline, the above can be
achieved with less hassle, because I don't need to track per temporary
information for all the temporaries all the time, instead, I only need
to resolve loops and scopes when it is really needed.

[snip]

To me the algorithm you lined out looks quite complicated, but not too
different from what I'm doing when I replay the access time-line of a
register. However, with your approach one has to track the state of
each variable all the time, information that could be shared otherwise
and might not even needed.

[snip]

> > 
> > I hope I didn't miss anything, because after all this is
> > admittedly subtle stuff.
It is, and this is why I think it is better to separate the steps into
manageable chunks that can be put under test. (I admit, I'm a fan of
test driven development, and for that I also think it is more important
to write test cases instead of sketching out algorithms).

> > Still, I think this kind of state-machine approach should 
> > work and allow you to avoid *lots* of allocations and pointer-
> > chasing.
The allocations I can (and will) get rid of, but I don't see some
pointer-chasing as a problem, since it is encapsulated within class
methods.

I thank you for your comments, but given that my code is working I
don't see that re-doing it from scratch is such a good idea. I think
refactoring it to eliminate the allocations and covering additional
test cases is a better approach. If this makes it possible to move the
implementation to be single pass, then I might consider it, but I think
tracking all the information for all temporaries all the time is not
such a good idea, especially for large shaders that might have 2000+
temporaries before register merging. 

Best, 
Gert 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-13 Thread Nicolai Hähnle

On 13.06.2017 10:01, Gert Wollny wrote:

Am Montag, den 12.06.2017, 21:00 +0200 schrieb Nicolai Hähnle:

Thanks for you comments, although I do not agree with most of them.




Okay. I think you should seriously re-think your algorithm in a way
that  makes it a more natural evolution from the algorithm that's
already there.


Well, when I look at the current algorithm than I don't really see much
to evolve from: The tracking of loops is minimal and it actually only
uses the outer loop to assign life times.
In any case, 80% of this code I already re-used (i.e. the loop over the
instructions and iterating over the registers).



Basically, the current algorithm tracks (first_write, last_read),
so  think about what you need to track in order to obtain a single-
pass  algorithm that computes lifetime (first, last) for every
temporary. I


IMHO a single pass algorithm is not better then what I do now.
Especially with nested loops a single pass algorithm will be more
complicated.

Think e.g.

...
BEGIN_LOOP
   ...
   BEGIN_LOOP
  a = ...
  b = ...
  c = a OP b;
   END_LOOP
   ... /* more processing that doesn't access a, b, or c */
END_LOOP

out =   f(a, c, ...)
END


Here, a and c must be kept alive for both loops, but b only is
needed for a few instructions. However, in a single pass state tracker
I have to keep all the information for a, b, and c until the end of the
program, because only then I can discard the loop information for b,
and resolve everything else for a and c.


In terms of memory use, your approach also keeps the information for all 
variables all the time, because of the multi-pass.


In terms of running time, it's true that what I sketched will loop over 
all temporaries for every end-of-scope.


However, that's fairly simple to fix by keeping track of which 
temporaries occurred per-scope, and then only looping over those 
temporaries.


You could even do the tracking with a single stack-like array, so you 
end up with only 3 memory allocations:


- stack of currently active scopes
- array of temporaries
- stack of temporary-occurences per scope



With the approach I am using, i.e. only collecting all the access
information in the pass over the program, and resolving the life-times
at the end by re-playing the temp-access timeline, the above can be
achieved with less hassle, because I don't need to track per temporary
information for all the temporaries all the time, instead, I only need
to resolve loops and scopes when it is really needed.

[snip]

To me the algorithm you lined out looks quite complicated, but not too
different from what I'm doing when I replay the access time-line of a
register. However, with your approach one has to track the state of
each variable all the time, information that could be shared otherwise
and might not even needed.

[snip]



I hope I didn't miss anything, because after all this is
admittedly subtle stuff.

It is, and this is why I think it is better to separate the steps into
manageable chunks that can be put under test. (I admit, I'm a fan of
test driven development, and for that I also think it is more important
to write test cases instead of sketching out algorithms).


Still, I think this kind of state-machine approach should
work and allow you to avoid *lots* of allocations and pointer-
chasing.

The allocations I can (and will) get rid of, but I don't see some
pointer-chasing as a problem, since it is encapsulated within class
methods.


How will you get rid of those allocations? I find that it's useful to be 
able to sketch the data structures first.


At the very least, you need a vector per temporary. You're also not 
handling the case where a temporary is set in both branches of an 
if/else, and I'd argue that the approach is more annoying to adapt to 
handling it than what I sketched.


By the way, that's related to another conceptual advantage of having a 
state machine that acts on end-of-scope, which is that this is basically 
like having an action for phi-nodes.




I thank you for your comments, but given that my code is working I
don't see that re-doing it from scratch is such a good idea. I think
refactoring it to eliminate the allocations and covering additional
test cases is a better approach. If this makes it possible to move the
implementation to be single pass, then I might consider it, but I think
tracking all the information for all temporaries all the time is not
such a good idea, especially for large shaders that might have 2000+
temporaries before register merging.


Well, as I said, you're tracking all the information anyway, and as for 
when you *update* the information, there's an easy way to make that lean.


I'm curious what you'd suggest for getting rid of allocations anyway.

Cheers,
Nicolai
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman

Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-13 Thread Gert Wollny
Am Dienstag, den 13.06.2017, 11:07 +0200 schrieb Nicolai Hähnle:
> 
> 
> I'm curious what you'd suggest for getting rid of allocations anyway.

As the refactoring goes I think I will end up with a hybrid approach:
In the temporaries  I will not keep the full time line, but the
important read/write information - just like you suggested. However, I
will not resolve the scopes at the end of each loop, but only after the
program is fully scanned. For that I need to keep the information of
all scopes available and links to the key scopes fro each temporary.

Equal to what you pointed out above, I'll need three allocations for
this: 

- the vector for the scopes, 
- the vector for the temporaries 
- a stack to handle the scope changes.

To not limit the number of scopes and the scope nesting level the scope
vector and the stack might do re-allocations though.

I think right now I will not go for tracking whether a temporary is
written in both if/else branches or all switch cases. What I want to
achieve is that the drivers don't get into trouble because too many
temporaries need to be allocated when the TGSI is translated into byte
code (test case R600 with 124 free usable registers), and so far this
seems to work without tackling this detail. 

Thank you again for your insights, 
Gert 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

2017-06-14 Thread Nicolai Hähnle

On 14.06.2017 08:30, Gert Wollny wrote:

Am Dienstag, den 13.06.2017, 11:07 +0200 schrieb Nicolai Hähnle:



I'm curious what you'd suggest for getting rid of allocations anyway.


As the refactoring goes I think I will end up with a hybrid approach:
In the temporaries  I will not keep the full time line, but the
important read/write information - just like you suggested. However, I
will not resolve the scopes at the end of each loop, but only after the
program is fully scanned. For that I need to keep the information of
all scopes available and links to the key scopes fro each temporary.


Okay. Hmm, I think in a sense what you might end up doing is resolving 
scopes lazily. That does sound good, and I'm starting to see how it 
might be possible. Try to keep the data structures clean so that you can 
easily explain all the variables in comments. That should help a lot.




Equal to what you pointed out above, I'll need three allocations for
this:

- the vector for the scopes,
- the vector for the temporaries
- a stack to handle the scope changes.

To not limit the number of scopes and the scope nesting level the scope
vector and the stack might do re-allocations though.

I think right now I will not go for tracking whether a temporary is
written in both if/else branches or all switch cases. What I want to
achieve is that the drivers don't get into trouble because too many
temporaries need to be allocated when the TGSI is translated into byte
code (test case R600 with 124 free usable registers), and so far this
seems to work without tackling this detail.


Sounds good. If I'm understanding the "lazy scope resolution" correctly, 
it might even be not too difficult to add if/else and switch case 
resolution after the fact. Anyway, it's not needed for a first version.


Cheers,
Nicolai




Thank you again for your insights,
Gert




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev