Re: glxsync - explicit frame synchronization sample implementation

2021-12-30 Thread Eero Tamminen

Hi,

Different window managers do resizes differently.  See for example:
https://gitlab.gnome.org/GNOME/mutter/-/issues/60

On which window managers did you test this?


- Eero

On 30.12.2021 7.20, Michael Clark wrote:

Dear Mesa Developers,

I have been using GLFW for tiny cross-platform OpenGL demos for some 
time, but something that has really been bothering me is the visual 
artifacts when resizing windows. Over the last year or so I have made 
multiple attempts at solving this issue, digging progressively deeper 
each time, until spending the last month researching compositor 
synchronization protocols, reading compositor code, and writing this 
demo as a prelude to figuring out how one might fix this issue in GLFW 
or even Chrome.


I decided that it might first be a good idea to come up with the 
simplest possible isolated example comprising a near-complete solution, 
without the unnecessary complexity of layering for all of the 
cross-platform abstractions. It seems to me that despite the ease with 
which this can be solved with Wayland EGL, it is still useful, primarily 
for wider compatibility, to be able to package X11 GLX applications, 
X11 being the window system that I typically use when targeting Linux 
with GLFW.


That brings me to _glxsync_, which is an attempt at creating a minimally 
correct implementation of explicit frame synchronization using X11, GLX, 
XSync and the latest compositor synchronization protocols [1,2], tested 
to work with mutter and GNOME on Xorg or Xwayland.


- https://github.com/michaeljclark/glxsync/

_glxsync_ is an X11 OpenGL demo app using GLX and XSync extended frame 
synchronization, responding to synchronization requests sent by the 
compositor for configuration changes such as window resizes. The demo 
updates extended synchronization counters before and after frames to 
signal to the compositor that rendering is in progress, so that buffers 
read by the compositor are complete and match the size in configuration 
change events. It also has rudimentary congestion control.


_glxsync_ depends on the following X11 window system atoms:

- _NET_WM_SYNC_REQUEST
- _NET_WM_SYNC_REQUEST_COUNTER
- _NET_WM_FRAME_DRAWN
- _NET_WM_FRAME_TIMINGS
- _NET_WM_PING

_glxsync_ *does not* yet implement the following extensions:

- _NET_WM_SYNC_FENCES
- _NET_WM_MOVERESIZE

_glxsync_ depends on the following libraries: _X11, Xext, GLX, GL_.

I have to say there were numerous subtle issues that I found while 
testing this code on Ubuntu 21.10 XWayland with an Intel Mesa graphics 
stack and Ubuntu 20.04 LTS Xorg with the NVIDIA proprietary graphics 
stack, so I have no idea how it will fly with other drivers and am very 
interested in feedback. There really is not much sample code that I 
could find that addresses this issue.


I found the Intel driver particularly finicky, and there are some very 
carefully placed XFlush calls *before* frame renders, and XSync calls 
during congestion. There are also the beginnings of adaptive frame rate 
using frame times and render timings stored in a circular buffer. That 
said, there is no advanced adaptive frame rate logic beyond detecting 
circumstances that can lead to tears, with a back-off to the measured 
short-term average frame rate from statistics, and some logic to delay 
frames when they collide with Expose events.


There is also some rudimentary tracing infrastructure and some carefully 
placed calls to poll, XEventsQueued(d, QueuedAlready), and 
XEventsQueued(d, QueuedAfterReading) to avoid blocking in XNextEvent at 
all costs. I found it necessary to add a heuristic to avoid frame 
submission until receiving frame timings from the compositor. 
Intuitively one might think this makes the loop synchronous, but with 
the NVIDIA driver the heuristic still appears to allow multiple frames 
to be submitted in advance. It is certainly finicky to debug. There is 
a --no-sync option to simulate the absence of compositor 
synchronization as a testing aid.


There is very little back-pressure signaling to the client beyond the 
ability to observe timings and serial numbers in frame drawn and frame 
timing messages. It worries me that I need very careful placement of 
XFlush and XSync to make the demo work so I would really appreciate 
feedback if I am doing it wrong. There is some interesting potential for 
control loops when using stats for adaptive frame rate, so I have not 
yet attempted any sophisticated congestion control algorithm.


In any case, I am sharing this code in the hope that folks can help 
with testing. I was thinking of making a patch for GLFW, but this was a 
first step. I would really appreciate it if folks could help test on 
different drivers such as nouveau and amdgpu, as I don't have access to 
them. The code is currently released under the PLEASE LICENSE, which is 
practically public domain with one exception, but I am not disinclined 
towards releasing it under an MIT license if it were found to be a 
useful

Re: revenge of CALLOC_STRUCT

2021-12-27 Thread Eero Tamminen

Hi,

Valgrind should be able to report mismatching alloc & free calls, and I 
think one can also annotate which functions are alloc & free ones.


AFAIK Valgrind does not work on Windows, but you could Valgrind Windows 
Mesa version on Linux using Wine.



- Eero

On 26.12.2021 12.37, Jose Fonseca wrote:
I believe that as long as CALLOC_STRUCT continues to get paired with the 
right FREE call (or equivalent, if these get renamed), either way should 
work for us.  Whichever proposed option gets followed, there's a risk 
these can get out of balance, but the risk seems comparable.



For context, IIRC, the main reason these macros remain useful for VMware 
is the sad state of memory debugging tools on Windows.  AFAICT, the best 
hope of one day deprecating this would be to use AddressSanitizer, which 
is supported on MSVC [1], but unfortunately not yet on MinGW w/ GCC [2], 
which we rely upon a lot for day-to-day development, using Linux 
cross-compilation.  Using a MinGW w/ Clang cross compiler seems to be a 
way to overcome this difficulty, but that too was still in a somewhat 
experimental state when I last tried it.



Jose

[1] 
https://devblogs.microsoft.com/cppblog/asan-for-windows-x64-and-debug-build-support/ 

[2] 
https://stackoverflow.com/questions/67619314/cannot-use-fsanitize-address-in-mingw-compiler 



*From:* Dave Airlie 
*Sent:* Wednesday, December 22, 2021 22:35
*To:* mesa-dev ; Jose Fonseca 
; Brian Paul 

*Subject:* revenge of CALLOC_STRUCT
Hey,

Happy holidays, and a thought to consider over the break:

We have the VMware-originated MALLOC/FREE/CALLOC/CALLOC_STRUCT wrappers 
used throughout gallium.

We have ST_CALLOC_STRUCT in use in the mesa state tracker, not used in 
gallium.


Merging the state tracker into mesa proper, and even prior to this a
few CALLOC_STRUCT have started to leak into src/mesa/*.

Now I don't think we want that, but CALLOC_STRUCT is a genuinely
useful macro on its own, so
I'm considering just defining a calloc_struct instead for the
non-gallium uses, one that goes with plain malloc/calloc/free.

Any opinions, or should mesa just get u_memory.h support for Xmas?

Dave.


Re: [Mesa-dev] Workflow Proposal

2021-10-11 Thread Eero Tamminen

Hi,

On 7.10.2021 16.00, Alyssa Rosenzweig wrote:

I would love to see this be the process across Mesa.  We already don't
rewrite commit messages for freedreno and i915g, and I only have to do
the rebase (busy-)work for my projects in other areas of the tree.


Likewise for Panfrost. At least, I don't do the rewriting. Some Panfrost
devs do, which I'm fine with. But it's not a requirement to merging.

The arguments about "who can help support this years from now?" are moot
at our scale... the team is small enough that the name on the reviewer
is likely the code owner / maintainer, and patches regularly go in
unreviewed for lack of review bandwidth.


I was thinking more of outsiders trying to track down issues (who may be 
using older Mesa versions, and wanting both to ask about a bisected 
commit, and to find out whether there's a fix for a given issue upstream).


A longer list of people is useful in case the Mesa driver team has 
changed in the intervening years.


But this is very much a corner-case concern.



I think it's reasonable that small driver teams and large common
components have different review needs, and it's ok for the process to
reflect that.


+1


- Eero


Re: [Mesa-dev] Workflow Proposal

2021-10-07 Thread Eero Tamminen

Hi,

On 7.10.2021 19.51, Daniel Stone wrote:

On Thu, 7 Oct 2021 at 09:38, Eero Tamminen  wrote:

This sounds horrible from the point of view of trying to track down
somebody who knows about what's & why's of some old commit that is later
on found to cause issues...


But why would your first point of call not be to go back to the review
discussion and look at the context and what was said at the time? Then
when you do that, you can see not only what happened, but also who was
involved and saying what at the time.


You're assuming that:
- The review discussion is still available [1]
- One can find it based on a given individual commit

[1] The system hosting it could be down, or the network could be down.

It's maybe a bit of a contrived situation, but I kind of prefer 
self-contained information. The what, why and who are better kept in the 
commit itself than only in the MR.



- Eero


Re: [Mesa-dev] Workflow Proposal

2021-10-07 Thread Eero Tamminen

Hi,

On 6.10.2021 23.00, Jordan Justen wrote:

Bas Nieuwenhuizen  writes:

On Wed, Oct 6, 2021 at 8:49 PM Jordan Justen  wrote:

I guess I missed where it was suggested that Marge should remove
Reviewed-by tags. I don't think Marge should ever remove something from
the commit message.


AFAIU this is upstream Marge behavior. Once you enable the
Approval->Rb tag conversion, it removes existing Rb tags. That's why we
don't have the conversion enabled.



Ah, I guess it is documented for --add-reviewers here:

https://github.com/smarkets/marge-bot#adding-reviewed-by-tested-and-part-of-to-commit-messages

"All existing Reviewed-by: trailers on commits in the branch will be
  stripped."


This sounds horrible from the point of view of trying to track down 
somebody who knows about what's & why's of some old commit that is later 
on found to cause issues...


For whom are those extra Rb tags a problem, and why?



I hope we would wait for Marge to add a --add-approvers switch which
would leave Reviewed-by tags alone, but add Approved-by tags.


+1


- Eero



Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Eero Tamminen

Hi,

On 28.2.2020 10.48, Dave Airlie wrote:

Yes, we could federate everything back out so everyone runs their own
builds and executes those. Tinderbox did something really similar to
that IIRC; not sure if Buildbot does as well. Probably rules out
pre-merge testing, mind.


Why? Does gitlab not support the model? Having builds done in parallel
on runners closer to the test runners seems like it should be a thing.
I guess artifact transfer would cost less as a result.


Do the container images being tested currently include sources, build 
dependencies or intermediate build results in addition to the installed 
binaries?



Note: with e.g. Docker, removing those after the build step doesn't 
reduce the image size.


You either need to do the removal in the same RUN step, like this:

RUN git clone --branch ${TAG_LIBDRM} --depth 1 \
https://gitlab.freedesktop.org/mesa/drm.git && \
cd drm  &&  mkdir build  &&  cd build  && \
meson --prefix=${INSTALL_DIR} --libdir ${LIB_DIR} \
  -Dcairo-tests=false ../  && \
ninja  &&  ninja install  && \
cd ../..  &&  rm -r drm


Or use a multi-stage container, where the final stage installs just
the run-time dependencies and copies the final (potentially stripped)
installed binaries from the build stage.
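The multi-stage variant might look roughly like this. It is a sketch: 
the base images and package names are illustrative assumptions, and 
the libdrm build is borrowed from the RUN example above (TAG_LIBDRM is 
assumed to be passed as a build argument, as before).

```dockerfile
# Build stage: sources, toolchain and intermediate objects stay here.
FROM debian:bullseye AS build
ARG TAG_LIBDRM
RUN apt-get update && apt-get install -y \
    git gcc meson ninja-build pkg-config libpciaccess-dev
RUN git clone --branch ${TAG_LIBDRM} --depth 1 \
      https://gitlab.freedesktop.org/mesa/drm.git && \
    cd drm && mkdir build && cd build && \
    meson --prefix=/usr/local -Dcairo-tests=false ../ && \
    ninja && ninja install

# Final stage: run-time dependencies plus the installed binaries only;
# nothing from the build stage bloats the shipped image.
FROM debian:bullseye-slim
RUN apt-get update && apt-get install -y libpciaccess0
COPY --from=build /usr/local /usr/local
```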


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Profile-guided optimizations

2020-02-14 Thread Eero Tamminen

Hi,

On 13.2.2020 20.44, Dylan Baker wrote:

I actually spent a bunch of time toying with PGO a couple of years ago. I got
the guidance all working and was able to train it, but what we found was that it
made the specific workloads we threw at it much faster, but it made every real
world use case I tried (playing a game, running piglit, gnome) slower, often
significantly so.

The hard part is not setting up pgo, it's getting the right training data.


Do you still remember which exact use-cases were used for training, and
which exact cases suffered worst as a result?  Did you, for example,
also profile the X server and (some) X and Wayland compositor during
the application profiling?

Also, did you use normal FDO (instrumented code) or AutoFDO (profiling)
to gather the profiling data?  I'm wondering whether AutoFDO could be
done so that you install Mesa to the system, profile the whole system,
and just filter out non-Mesa profiling data when applying the
information to the new build...
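The AutoFDO flow hinted at above might look roughly like this. This is 
a sketch assuming the Google autofdo tooling; every path, flag and 
binary name here is an illustrative assumption, not a tested recipe.

```shell
# Sample the whole system with hardware branch records while
# running the GL workloads of interest.
perf record -b -a -- sleep 60

# Convert the system-wide profile into GCC's format, keyed to the
# Mesa binary of interest; samples from other binaries are dropped
# in the process, giving the "filter out non-Mesa data" step.
create_gcov --binary=/usr/lib/dri/iris_dri.so \
            --profile=perf.data --gcov=mesa.afdo

# Rebuild Mesa against the gathered profile.
CFLAGS=-fauto-profile=mesa.afdo meson setup build-afdo
ninja -C build-afdo
```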


- Eero


Dylan

Quoting Marek Olšák (2020-02-13 10:30:46)

[Forked from the other thread]

Guys, we could run some simple tests similar to piglit/drawoverhead as the last
step of the pgo=generate build. Tests like that should exercise the most common
codepaths in drivers. We could add subtests that we care about the most.

Marek

On Thu., Feb. 13, 2020, 13:16 Dylan Baker,  wrote:

 meson has buildtins for both of these, -Db_lto=true turns on lto, for pgo
 you
 would run:

 meson build -Db_pgo=generate
 ninja -C build
 
 meson configure build -Db_pgo=use
 ninja -C build

 Quoting Marek Olšák (2020-02-12 10:46:12)
 > How do you enable LTO+PGO? Is it something we could enable by default for
 > release builds?
 >
 > Marek
 >
 > On Wed, Feb 12, 2020 at 1:56 AM Dieter Nützel 
 wrote:
 >
 >     Hello Gert,
 >
 >     your merge 'broke' LTO and then later on PGO compilation/linking.
 >
 >     I do generally compiling with '-Dgallium-drivers=
 r600,radeonsi,swrast'
 >     for testing radeonsi and (your) r600 work. ;-)
 >
 >     After your merge I get several warnings in 'addrlib' with LTO and
 even a
 >     compiler error (gcc (SUSE Linux) 9.2.1 20200128).
 >
 >     I had to disable 'r600' ('swrast' is needed for 'nine') to get a
 working
 >     LTO and even better PGO radeonsi driver.
 >     I'm preparing GREAT LTO+PGO (the later is the greater) numbers over
 the
 >     last 2 months. I'll send my results later, today.
 >
 >     Summary
 >     radeonsi is ~40% smaller and 16-20% faster with PGO (!!!).
 >
 >     Honza and the GCC people (Intel's ICC folks) do GREAT things.
 >     'glmark2' numbers are better then 'vkmark'. (Hello, Marek.).
 >
 >     Need some sleep.
 >
 >     See my log, below.
 >
 >     Greetings and GREAT work!
 >
 >     -Dieter
 >
 >     Am 09.02.2020 15:46, schrieb Gert Wollny:
 >     > Am Donnerstag, den 23.01.2020, 20:31 +0100 schrieb Gert Wollny:
 >     >> has anybody any objections if I merge the r600/NIR code?
 >     >> Without explicitely setting the debug flag it doesn't change a
 >     >> thing, but it would be better to continue developing in-tree.
 >     > Okay, if nobody objects, I'll merge it Monday evening.
 >     >
 >     > Best,
 >     > Gert
 >
 >     [1425/1433] Linking target src/gallium/targets/dri/libgallium_dri.so.
 >     FAILED: src/gallium/targets/dri/libgallium_dri.so
 >     c++  -o src/gallium/targets/dri/libgallium_dri.so
 >     'src/gallium/targets/dri/8381c20@@gallium_dri@sha/target.c.o' -flto
 >     -fprofile-generate -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared
 >     -fPIC -Wl,--start-group -Wl,-soname,libgallium_dri.so
 >     src/mesa/libmesa_gallium.a src/mesa/libmesa_common.a
 >     src/compiler/glsl/libglsl.a src/compiler/glsl/glcpp/libglcpp.a
 >     src/util/libmesa_util.a src/util/format/libmesa_format.a
 >     src/compiler/nir/libnir.a src/compiler/libcompiler.a
 >     src/mesa/libmesa_sse41.a src/mesa/drivers/dri/common/libdricommon.a
 >     src/mesa/drivers/dri/common/libmegadriver_stub.a
 >     src/gallium/state_trackers/dri/libdri.a
 >     src/gallium/auxiliary/libgalliumvl.a src/gallium/auxiliary/
 libgallium.a
 >     src/mapi/shared-glapi/libglapi.so.0.0.0
 >     src/gallium/auxiliary/pipe-loader/libpipe_loader_static.a
 >     src/loader/libloader.a src/util/libxmlconfig.a
 >     src/gallium/winsys/sw/null/libws_null.a
 >     src/gallium/winsys/sw/wrapper/libwsw.a
 >     src/gallium/winsys/sw/dri/libswdri.a
 >     src/gallium/winsys/sw/kms-dri/libswkmsdri.a
 >     src/gallium/drivers/llvmpipe/libllvmpipe.a
 >     src/gallium/drivers/softpipe/libsoftpipe.a
 >     src/gallium/drivers/r600/libr600.a
 >     src/gallium/winsys/r

Re: [Mesa-dev] Merging experimental r600/nir code

2020-02-13 Thread Eero Tamminen

Hi,

On 13.2.2020 10.38, Timur Kristóf wrote:

I think the question about PGO is this: are the profiles of the users'
applications gonna be the same as the profile that is collected from
the benchmarks?

Eg. if the test benchmark uses different draw calls or triggers
different shader compiler code paths than a your favourite game, in
theory PGO could harm the performance of your game.

Also, how do we prevent it from making bad decisions based on the hw
that the profile was made on?

For example, if you collect the profiling data from a machine that has
a Polaris 10 GPU, then the profile will show that chip_class is
extremely likely to be GFX8 and thus the PGO build will be optimized
for that case. If I then run the same build on my Navi 10, the PGO
build might actually be slower, because the driver needs to take a
different code path than what the PGO build was optimized for.

What do you guys think about this?


How much HW-specific stuff can impact things depends on whether those 
things are executed constantly, or only once.  If the former, it may be 
useful to (try to) design the driver so that they get executed only 
once.


The most CPU-intensive part is shader compilation (with Intel, the 
linking stage more than the things done before it), and the heavy part 
is AFAIK to a large extent HW-independent.  In benchmarks, shader 
compilation is almost always done at startup; in games, shader 
compilation typically happens also afterwards.


As to how much PGO can make things worse, I think that depends on how 
independent the non-executed part of the code is.  If it's not mixed 
with code that did get executed, I don't think there will be any visible 
impact.  But if it's badly mixed, hot/cold function identification will 
group things wrongly.



- Eero


Best regards,
Timur

On Thu, 2020-02-13 at 02:40 -0500, Marek Olšák wrote:

Can we automate this?

Let's say we implement noop ioctls for radeonsi and iris, and then we
run the drivers to collect pgo data on any hw.

Can meson execute this build sequence:
build with pgo=generate
run tests
clean
build with pgo=use

automated as buildtype=release-pgo.
Marek

On Wed., Feb. 12, 2020, 23:37 Dieter Nützel, 
wrote:

Hello Marek,

I hoped you would ask this...
...but first sorry for the delay of my announced numbers.
Our family is/was sick, my wife more than me and our children are
fine,
again.
So be lenient with me somewhat.

Am 12.02.2020 19:46, schrieb Marek Olšák:

How do you enable LTO+PGO? Is it something we could enable by

default

for release builds?

Marek


I think we can achieve this.

I'm running with LTO+PGO 'release' builds since late December (around
Christmas).
My KDE Plasma5 (OpenGL 3.0) system/desktop has never felt more
agile/fluid since then.
Even the numbers (glmark2) show it. The 'glmark2' numbers are the best
I've ever seen on this system.
LTO offers only a small space reduction and hardly any speedup.
But LTO+PGO is GREAT.

First I compile with '-Db_lto=true -Db_pgo=generate'.

mkdir build
cd build
meson ../ --strip --buildtype release -Ddri-drivers=
-Dplatforms=drm,x11
-Dgallium-drivers=r600,radeonsi,swrast -Dvulkan-drivers=amd
-Dgallium-nine=true -Dgallium-opencl=standalone -Dglvnd=true
-Dgallium-va=true -Dgallium-xvmc=false -Dgallium-omx=disabled
-Dgallium-xa=false -Db_lto=true -Db_pgo=generate

After that my 'build' dir looks like this:

drwxr-xr-x  8 dieter users4096 13. Feb 04:34 .
drwxr-xr-x 14 dieter users4096 13. Feb 04:33 ..
drwxr-xr-x  2 dieter users4096 13. Feb 04:34 bin
-rw-r--r--  1 dieter users 4369873 13. Feb 04:34 build.ninja
-rw-r--r--  1 dieter users 4236719 13. Feb 04:34
compile_commands.json
drwxr-xr-x  2 dieter users4096 13. Feb 04:34 include
drwxr-xr-x  2 dieter users4096 13. Feb 04:34 meson-info
drwxr-xr-x  2 dieter users4096 13. Feb 04:33 meson-logs
drwxr-xr-x  2 dieter users4096 13. Feb 04:34 meson-private
drwxr-xr-x 14 dieter users4096 13. Feb 04:34 src

time nice +19 ninja

Lasts ~15 minutes on my aging/'slow' Intel Xeon X3470 Nehalem,
4c/8t,
2.93 GHz, 24 GB, Polaris 20.
Without LTO+PGO it is ~4-5 minutes. (AMD anyone?)

Then I remove all files/dirs except 'src'.

Next 'installing' the new built files under '/usr/local/' (mostly
symlinked to /usr/lib64/).

Now run as many OpenGL/Vulkan progs as I can.
Normally starting with glmark2 and vkmark.

Here comes my (whole) list:
Knights
Wireshark
K3b
Skanlite
Kdenlive
GIMP
Krita
FreeCAD
Blender 2.81x
digikam
K4DirStat
Discover
YaST
Do some 'movements'/work in/with every prog.
+
some LibreOffice work (OpenGL enabled)
one or two OpenGL games
and Vulkan games
+
run some WebGL stuff in my browsers (Konqi/FF).

After that I have the needed '*.gcda' files in 'src'.

Now second rebuild in 'src'.
Due to the deleted files/dirs I can do a second 'meson' config run
in my
current 'build' dir.

meson ../ --strip --buildtype release -Ddri-drivers=
-Dplatforms=drm,x11
-Dgallium-drivers=r600,radeonsi,swrast -Dvulkan-drivers=amd
-Dgallium-nine=true -

Re: [Mesa-dev] Mesa CI with trace regression testing

2019-09-30 Thread Eero Tamminen

Hi,

On 27.9.2019 4.32, Eric Anholt wrote:

Alexandros Frantzis  writes:

The last couple of months we (at Collabora) have been working on a
prototype for a Mesa testing system based on trace replays that supports
correctness regression testing and, in the future, performance
regression testing.

We are aware that large-scale CI systems that perform extensive checks
on Mesa already exist. However, our goal is not to reach that kind of
scale or exhaustiveness, but to produce a system that will be simple and
robust enough to be maintained by the community, while being useful
enough so that the community will want to use and maintain it. We also
want to be able to make it fast enough so that it will be run eventually
on a regular basis, ideally in pre-commit fashion.

The current prototype focuses on the correctness aspect, replaying
traces and comparing images against a set of reference images on
multiple devices. At the moment, we run on softpipe and
intel/chromebook, but it's straightforward to add other devices through
gitlab runners.

For the prototype we have used a simple approach for image comparison,
storing a separate set of reference images per device and using exact
image comparison, but we are also investigating alternative ways to deal
with this. First results indicate that the frequency of reference image
mismatches due to non-bug changes in Mesa is acceptable, but we will get
a more complete picture once we have a richer set of traces and a longer
CI run history.


For CI, I think discarding/ignoring too unstable / slow traces would
be perfectly acceptable. [1]



Some missing context: I was told that over 2400 commits, in glmark2 + a
couple of other open source traces, on intel, there was one spurious
failure due to this diff method.  This is lower than I felt like it was
when I did this in piglit on vc4, but then I was very actively changing
optimization in the compiler while I was using that tool.


A few years ago when I was looking at the results from ezBench (at
the same time) bisecting Mesa commit ranges for build, run-time,
performance and rendering issues, it was very useful to have
rendering diff results in addition to performance numbers.

Rendering didn't change too often, but one needs to look at every
changed screenshot directly; error metrics about them aren't enough.
Some innocent accuracy difference due to a calculation order change can
cause e.g. a marginally different color over a huge area of the rendered
result [1], whereas a real rendering error can be just some tiny
reflection missing from the render, which one would never see while
running the benchmark (even with a correct one running beside it); one
sees them only in static screenshots.

[1] Whether screenshots are affected by calculation changes, depends
a lot on the benchmark, how stable its calculations are in regards to
accuracy variations. Some benchmarks even use random in their shaders...

(If I remember correctly, good example of unstable results were some
of the GpuTest benchmarks.)



The current design is based on an out-of-tree approach, where the tracie
CI works independently from Mesa CI, fetching and building the latest
Mesa on its own. We did this for maximum flexibility in the prototyping
phase, but this has a complexity cost, and although we could continue to
work this way, we would like to hear people's thoughts about eventually
integrating with Mesa more closely, by becoming part of the upstream
Mesa testing pipelines.

It's worth noting that the last few months other people, most notably
Eric Anholt, have made proposals to extend the scope of testing in CI.
We believe there is much common ground here (multiple devices,
deployment with gitlab runners) and room for cooperation and eventual
integration into upstream Mesa. In the end, the main difference between
all these efforts are the kind of tests (deqp, traces, performance) that
are being run, which all have their place and offer different
trade-offs.

We have also implemented a prototype dashboard to display the results,
which we have deployed at:

https://tracie.freedesktop.org

We are working to improve the dashboard and provide more value by
extracting and displaying additional information, e.g., "softpipe broken
since commit NNN".

The dashboard is currently specific to the trace playback results, but
it would be nice to eventually converge to a single MesaCI dashboard
covering all kinds of Mesa CI test results. We would be happy to help
develop in this direction if there is interest.

You can find the CI scripts for tracie at:

https://gitlab.freedesktop.org/gfx-ci/tracie/tracie

Code for the dashboard is at:

https://gitlab.freedesktop.org/gfx-ci/tracie/tracie_dashboard

Here is an example of a failed CI job (for a purposefully broken Mesa
commit) and the report of the failed trace (click on the red X to
see the image diffs):

https://tracie.freedesktop.org/dashboard/job/642369/

Looking forward to your thoughts and comments.


A couple of thoughts o

Re: [Mesa-dev] 19.2 release train going off the rails

2019-09-04 Thread Eero Tamminen

Hi,

On 3.9.2019 20.27, Matt Turner wrote:

What is going on with the 19.2 RCs? I know that we said we would push
the releases back a week to let some feature work land, but the last
blocker of the feature tracker [0] landed on the 14th and RC1 wasn't
until the 20th. Now another two weeks have passed without an RC. RC2
should have been tagged last week and RC3 this week.

Can someone take over 19.2 and get this rolling again?

Thanks,
Matt

[0] https://bugs.freedesktop.org/show_bug.cgi?id=111265


The 19.2 release tracker bug has quite a few regressions open:
https://bugs.freedesktop.org/showdependencytree.cgi?id=111444&hide_resolved=1

Some of the bugs seem to have active development, and a couple of them 
might not be Mesa bugs (although they started to trigger with new Mesa), 
but there are also a couple which don't have any comments.



- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] renderdoc-traces: like shader-db for runtime

2019-06-25 Thread Eero Tamminen

Hi,

On 24.6.2019 19.36, Elie Tournier wrote:

Great topic. For the past few days, I was looking at a CI for Mesa:
https://gitlab.freedesktop.org/hopetech/tracie
OK, it's in a very very alpha stage. ;)

My idea was to use apitrace to dump and replay traces, then compare 
images with reference images or with images dumped the previous week.
Apitrace was a good choice for a "correctness CI", maybe not for a 
"performance CI".

@eric Out of curiosity, did you look at apitrace or did you go straight 
to renderdoc?


Note: ezBench supports both Apitrace & vktrace.



I add below some comments based on what I learned playing with the CI.


On Sat, Jun 22, 2019 at 10:59:34AM -0700, Rob Clark wrote:

On Thu, Jun 20, 2019 at 12:26 PM Eric Anholt  wrote:


Hey folks, I wanted to show you this follow-on to shader-db I've been
working on:

https://gitlab.freedesktop.org/anholt/renderdoc-traces


"On each frame drawn, renderdoccmd replay sets up the initial GL state 
again. This will include compiling programs."


Ouch.  This makes it pretty much useless for performance testing.



For x86 development I've got a collection of ad-hoc scripts to capture
FPS numbers from various moderately interesting open source apps so I
could compare-perf them.  I was only looking at specific apps when they
seemed relevant, so it would be easy to miss regressions.

Starting work on freedreno, one of the first questions I ran into was
"does this change to the command stream make the driver faster?".  I
don't have my old set of apps on my debian ARM systems, and even less so
for Chrome OS.  Ultimately, users will be judging us based on web
browser and android app performance, not whatever I've got laying around
on my debian system.  And, I'd love to fix that "I ignore apps unless I
think of them" thing.

So, I've used renderdoc to capture some traces from Android apps.  With
an unlocked phone, it's pretty easy.  Tossing those in a repo (not
shared here), I can then run driver changes past them to see what
happens.  See
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1134 for some
results.

Where is this repo going from here?

- I add a runner for doing frame-to-frame consistency tests.  We could
   catch UB in a lot of circumstances by replaying a few times and making
   sure that results are consistent.  Comparing frames between drivers
   might also be interesting, though for that you would need human
   validation since pixel values and pixels lit will change on many
   shader optimization changes.

Comparing frames between drivers is hard. I tried comparing LLVMpipe, 
softpipe and i965.
They all produce different frames.
The human validation is sadly hard to avoid. One of the ideas Erik came 
up with was to use a mask.


A statistical approach could be better (like how error is handled in
video compression).



I think we should first focus on comparing frames from the same driver 
and extend later.
The subject is hard enough. ;)


Note that there are some benchmarks which don't produce stable rendering
results because they include random input, so you can't do automated
rendering-difference detection for them.  I would suggest just dropping
those (I don't remember anymore which benchmarks were such, but Martin
Peres might).



- Need to collect more workloads for the public repo:

I would be happy to help here.
We should create a list of FOSS games/apps to dump based on their OGL 
requirements.


   - I've tried to capture webgl on Chrome and Firefox on Linux with no
 luck. WebGL on ff is supposed to work under apitrace, maybe I could
 do that and then replay on top of renderdoc to capture.


perhaps worth a try capturing these on android?

I have managed to apitrace chromium-browser in the past.. it ends up a
bit weird because there are multiple contexts, but apitrace has
managed to replay them.  Maybe the multiple ctx thing is confusing
renderdoc?

(tbh I've not really played w/ renderdoc yet.. I should probably do so..)


   - Mozilla folks tell me that firefox's WebRender display lists can be
 captured in browser and then replayed from the WR repo under
 apitrace or renderdoc.

   - I tried capturing Mozilla's new Pathfinder (think SVG renderer), but
 it wouldn't play the demo under renderdoc.

   Do you have some apps that should be represented here?

- Add microbenchmarks?  Looks like it would be pretty easy to grab
   piglit drawoverhead results, not using renderdoc.  Capturing from
   arbitrary apps expands the scope of the repo in a way I'm not sure I'm
   excited about (Do we do different configs in those apps?  Then we need
   config infrastructure.  Ugh).

- I should probably add an estimate of "does this overall improve or
   hurt perf?"  Yay doing more stats.


A good way to measure perf could be repeating a specific frame in a trace
(when that doesn't include re-compiling the shaders).

If I remember correctly, that's already supported by vktrace and
(apitrace based) frameretrace.



Sure. Sadly mo

Re: [Mesa-dev] renderdoc-traces: like shader-db for runtime

2019-06-24 Thread Eero Tamminen

Hi,

On 20.6.2019 22.26, Eric Anholt wrote:

Hey folks, I wanted to show you this follow-on to shader-db I've been
working on:

https://gitlab.freedesktop.org/anholt/renderdoc-traces

For x86 development I've got a collection of ad-hoc scripts to capture
FPS numbers from various moderately interesting open source apps so I
could compare-perf them.  I was only looking at specific apps when they
seemed relevant, so it would be easy to miss regressions.

Starting work on freedreno, one of the first questions I ran into was
"does this change to the command stream make the driver faster?".  I
don't have my old set of apps on my debian ARM systems, and even less so
for Chrome OS.  Ultimately, users will be judging us based on web
browser and android app performance, not whatever I've got laying around
on my debian system.  And, I'd love to fix that "I ignore apps unless I
think of them" thing.

So, I've used renderdoc to capture some traces from Android apps.  With
an unlocked phone, it's pretty easy.  Tossing those in a repo (not
shared here), I can then run driver changes past them to see what
happens.  See
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1134 for some
results.

Where is this repo going from here?

- I add a runner for doing frame-to-frame consistency tests.  We could
   catch UB in a lot of circumstances by replaying a few times and making
   sure that results are consistent.  Comparing frames between drivers
   might also be interesting, though for that you would need human
   validation since pixel values and pixels lit will change on many
   shader optimization changes.


When bisecting, ezBench[1] lists all perf and rendering changes, and
provides a visual diff for *all* the rendering changes.  Without the
latter, something like this would have been really hard to notice & bisect:
https://bugs.freedesktop.org/show_bug.cgi?id=103197

Or something like this in a moving part of a long benchmark:
https://bugs.freedesktop.org/attachment.cgi?id=134878
https://bugs.freedesktop.org/attachment.cgi?id=134879

As to providing info on how likely the driver is to be rendering
something wrong / differently, instead of differences being just due to
accuracy / calculation order... maybe something like PSNR would be useful:
https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
?

- Eero

[1] https://github.com/freedesktop/ezbench
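The PSNR idea suggested above is simple to prototype. A minimal
pure-Python sketch (not Mesa or ezBench code; frames are assumed to be
flat sequences of 8-bit channel values):

```python
import math

def psnr(frame_a, frame_b, max_value=255):
    """Peak signal-to-noise ratio between two equally sized frames.

    Frames are flat sequences of 0..max_value channel values; higher
    PSNR means more similar images, identical frames give infinity.
    """
    assert len(frame_a) == len(frame_b) and frame_a
    mse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_value ** 2 / mse)
```

A consistency runner could then flag a replay whose PSNR against the
reference frame drops below some threshold, instead of requiring
bit-exact output.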



- Need to collect more workloads for the public repo:

   - I've tried to capture webgl on Chrome and Firefox on Linux with no
 luck. WebGL on ff is supposed to work under apitrace, maybe I could
 do that and then replay on top of renderdoc to capture.

   - Mozilla folks tell me that firefox's WebRender display lists can be
 captured in browser and then replayed from the WR repo under
 apitrace or renderdoc.

   - I tried capturing Mozilla's new Pathfinder (think SVG renderer), but
 it wouldn't play the demo under renderdoc.

   Do you have some apps that should be represented here?

- Add microbenchmarks?  Looks like it would be pretty easy to grab
   piglit drawoverhead results, not using renderdoc.  Capturing from
   arbitrary apps expands the scope of the repo in a way I'm not sure I'm
   excited about (Do we do different configs in those apps?  Then we need
   config infrastructure.  Ugh).

- I should probably add an estimate of "does this overall improve or
   hurt perf?"  Yay doing more stats.

- I'd love to drop scipy.  I only need it for stats.t.ppf, but it
   prevents me from running run.py directly on my targets.
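For the stats.t.ppf dependency mentioned above, a stdlib-only stand-in
is possible. This is a sketch (not the actual run.py code, and assuming
roughly normal samples): for the sample counts a benchmark run typically
produces, the normal quantile from Python's statistics module
approximates the Student-t quantile that scipy provides.

```python
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(samples, confidence=0.95):
    """Approximate confidence interval for the mean of FPS samples.

    Uses the normal quantile instead of scipy's stats.t.ppf; the two
    converge as the sample count grows.
    """
    n = len(samples)
    sem = stdev(samples) / n ** 0.5          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    m = fmean(samples)
    return m - z * sem, m + z * sem
```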


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev




Re: [Mesa-dev] GLU and GLUT issues migrated to gitlab

2019-05-15 Thread Eero Tamminen

Hi,

On 14.5.2019 20.06, Adam Jackson wrote:

On Mon, 2019-05-13 at 12:14 -0400, Adam Jackson wrote:

Not, I hope, that anyone is likely to notice, but I've moved the open
GLU and GLUT bugs from bugzilla to gitlab and closed the components in
bugzilla. I say "moved" but really glut only had one open bug and it
was fixed ages ago. Anyway, use gitlab now please.


mesa/demos has been migrated now as well.


Looking at the results for a patch inlined in a comment in this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=97043

Migration tooling will need some improvements to avoid messed-up formatting.


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Mesa/DRI applications freezing randomly

2019-04-18 Thread Eero Tamminen

Hi,

(Please add answers to the bug you create about this)

On 17.4.2019 16.22, Augusto Mecking Caringi wrote:

Recently, applications using Mesa/DRI libraries started to freeze
randomly in my system...

I'm running Fedora 29 x86_64:
- Kernel: x86_64 Linux 5.0.7-200.fc29.x86_64
- X.Org X Server 1.20.4
- mesa 18.3.6


Have you tried using older versions of each of these to see which one is 
the cause of the regression?


Or latest Mesa git version?



$ lspci -k | grep -EA3 'VGA compatible'
00:02.0 VGA compatible controller: Intel Corporation Skylake GT2 [HD
Graphics 520] (rev 07)


 Subsystem: Lenovo Device 2238
 Kernel driver in use: i915
 Kernel modules: i915

$ glxinfo | grep -i "vendor\|rendering"
direct rendering: Yes
server glx vendor string: SGI
client glx vendor string: Mesa Project and SGI
 Vendor: Intel Open Source Technology Center (0x8086)
OpenGL vendor string: Intel Open Source Technology Center

$ grep LoadModule /var/log/Xorg.0.log
[16.093] (II) LoadModule: "glx"
[16.110] (II) LoadModule: "intel"
[16.138] (II) LoadModule: "dri3"
[16.138] (II) LoadModule: "dri2"
[16.138] (II) LoadModule: "present"
[16.325] (II) LoadModule: "libinput"


I'm attaching the backtrace of some frozen applications.


They all seem to be stuck in the same place, waiting for a buffer from
the X server or compositor.


Which desktop environment / compositor are you using?


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Mesa 19.1.0 release plan

2019-04-16 Thread Eero Tamminen

Hi,

On 16.4.2019 11.38, Juan A. Suarez Romero wrote:

On Mon, 2019-04-15 at 18:57 +0300, Eero Tamminen wrote:

On 15.4.2019 17.15, Juan A. Suarez Romero wrote:

Here is the tentative release plan for 19.1.0.

As many of you are well aware, it's time for the next branch point.
The calendar is already updated, so these are the tentative dates:

   Apr 30 2019 - Feature freeze/Release candidate 1
   May 07 2019 - Release candidate 2
   May 14 2019 - Release candidate 3
   May 21 2019 - Release candidate 4/final release

This gives us around 2 weeks until the branch point.

Note: In the spirit of keeping things clearer and more transparent, we
will be keeping track of any features planned for the release in
Bugzilla [1].

Do add a separate "Depends on" for each work you have planned.
Alternatively you can reply to this email and I'll add them for you.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=110439


Should the current GPU hang bugs initially be blockers for that?


I'm asking because I assume that GPU hangs are serious enough issues
that they should be checked out before release.


At least if they have reproducible test-case and/or error state files.



For example the remaining (open) mesa-19.0 blocker bug:
https://bugs.freedesktop.org/show_bug.cgi?id=108820


I understand that, since 19.0 was released anyway, that issue is not actually a blocker.


So it seems.

If the process had worked as I understand it, some Mesa developer would
have looked at the bug, decided that it's not a blocker, and removed it
from the dependencies before release, instead of it just being ignored.


(I had added the bug as a blocker after getting confirmation from the
list that regressions from the previous release can be added as such.)




(There has been no comment on the attached error states from Mesa devs
since Lionel's initial comment 4 months ago, e.g. on whether Jakub's and
my cases are the same or should be filed as separate bugs.)



- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Mesa 19.1.0 release plan

2019-04-15 Thread Eero Tamminen

Hi,

On 15.4.2019 17.15, Juan A. Suarez Romero wrote:

Here is the tentative release plan for 19.1.0.

As many of you are well aware, it's time for the next branch point.
The calendar is already updated, so these are the tentative dates:

  Apr 30 2019 - Feature freeze/Release candidate 1
  May 07 2019 - Release candidate 2
  May 14 2019 - Release candidate 3
  May 21 2019 - Release candidate 4/final release

This gives us around 2 weeks until the branch point.

Note: In the spirit of keeping things clearer and more transparent, we
will be keeping track of any features planned for the release in
Bugzilla [1].

Do add a separate "Depends on" for each work you have planned.
Alternatively you can reply to this email and I'll add them for you.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=110439


Should the current GPU hang bugs initially be blockers for that?

For example the remaining (open) mesa-19.0 blocker bug:
https://bugs.freedesktop.org/show_bug.cgi?id=108820

(There has been no comment on the attached error states from Mesa devs
since Lionel's initial comment 4 months ago, e.g. on whether Jakub's and
my cases are the same or should be filed as separate bugs.)


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] meson: upgrade minimum meson version

2019-04-09 Thread Eero Tamminen

Hi,

On 9.4.2019 14.16, Ernst Sjöstrand wrote:

For people building newer Mesa pip3 install --user meson is quite a
lot better and more accessible.


The nice thing with backports is that they get upgraded when one
upgrades the rest of the distro.


It's probably less of an issue with Python than with some other things,
but having local installs of things that are also available from the
distro itself can cause nasty issues (e.g. a forgotten local install of
libdrm in /usr/local/lib can mean that your desktop fails after you
upgrade your distro).




Not a solution for distributions that want to backport newer Mesa etc., though.

Those build dependencies look more like runtime dependencies btw;
Meson is just pure Python, right?


They were Meson package build deps. Run-time deps are just ninja & python:
https://packages.ubuntu.com/cosmic/meson

My guess is that the build configures Meson for the distro, and what's
available in it.  Dylan?




- Eero


Regards
//Ernst

On Tue, 9 Apr 2019 at 13:00, Eero Tamminen wrote:


Hi,

On 9.4.2019 0.22, Dylan Baker wrote:

It would be nice if people wouldn't try to build bleeding edge mesa on a 12 year
old distro :D. Unfortunately, as Eero points out, they do. I'd like to bump the
meson version to 0.47 in the future for some of the other features it adds
(automatically figuring out the dependencies for pkg-config, feature options,
etc), but that'll have to wait a bit for distros to catch up.


Currently at least Meson 0.45 is required, which I think rules out any
12-year-old distros :D.  Requiring Meson 0.47 would rule out 2-year-old
distro releases.

If somebody can test the latest stable distros to see whether they work
when one just takes a newer Meson binary package from a newer version of
the distro, maybe Mesa could add instructions for users to do that?

In Ubuntu 18.04 those instructions would be:
- replace "bionic" in /etc/apt/sources.list temporarily with "cosmic"
- do "sudo apt update && sudo apt install meson"
- revert /etc/apt/sources.list back to bionic

In Debian stable/stretch (which has Meson v0.37):
- enable "stretch-backports" repo to install Meson (v0.49)


 - Eero

PS. The reason why I say Meson binary packages from a newer distro
version need to work is that I think expecting users to also build a new
Meson version from source is not realistic, as it seems to need a huge
amount of esoteric build dependencies:
 https://packages.ubuntu.com/source/cosmic/meson


We've actually talked about removing these deprecation warnings in cases where
the minimum version doesn't have the recommended replacement, but that's hard.

Dylan

Quoting Eero Tamminen (2019-04-08 04:16:49)

Hi,

On 6.4.2019 11.44, Khaled Emara wrote:

Sorry, I thought LTS distros used old versions of mesa.
Got it.


Those *provide* an older Mesa version, so naturally people want to build
the latest Mesa on them (especially if they've just bought HW that isn't
supported by the distro version of Mesa).  Building would fail if Mesa
required a newer Meson version than the one in the distro.


  - Eero

PS. On Ubuntu 18.04 LTS, one can install 0.47 Meson binary package
from 18.10, it works fine.  Maybe something similar can be done also
on other LTS distros.


On Sat, Apr 6, 2019 at 12:38 AM Dylan Baker <mailto:dy...@pnwbakers.com> wrote:

  Quoting Khaled Emara (2019-04-05 11:28:28)
   > Upgraded meson's minimum version from 0.45 to 0.47
   > Version 0.47 is required to use build_always_stale
   > ---
   >  meson.build | 2 +-
   >  1 file changed, 1 insertion(+), 1 deletion(-)
   >
   > diff --git a/meson.build b/meson.build
   > index 2c98e9e18a9..454928e244a 100644
   > --- a/meson.build
   > +++ b/meson.build
   > @@ -25,7 +25,7 @@ project(
   >  [find_program('python', 'python2', 'python3'),
  'bin/meson_get_version.py']
   >).stdout(),
   >license : 'MIT',
   > -  meson_version : '>= 0.45',
   > +  meson_version : '>= 0.47',
   >default_options : ['buildtype=debugoptimized',
  'b_ndebug=if-release', 'c_std=c99', 'cpp_std=c++11']
   >  )
   >
   > --
   > 2.21.0
   >
   > ___
   > mesa-dev mailing list
   > mesa-dev@lists.freedesktop.org
  <mailto:mesa-dev@lists.freedesktop.org>
   > https://lists.freedesktop.org/mailman/listinfo/mesa-dev

  Unfortunately we can't bump the minimum version at the moment, there
  are a
  couple of LTS distros we need support for that only ship 0.45.x
  currently. As an
  upstream meson dev, we have no time table in place for the removal of
  build_alw

Re: [Mesa-dev] [PATCH 2/2] meson: upgrade minimum meson version

2019-04-09 Thread Eero Tamminen

Hi,

On 9.4.2019 0.22, Dylan Baker wrote:

It would be nice if people wouldn't try to build bleeding edge mesa on a 12 year
old distro :D. Unfortunately, as Eero points out, they do. I'd like to bump the
meson version to 0.47 in the future for some of the other features it adds
(automatically figuring out the dependencies for pkg-config, feature options,
etc), but that'll have to wait a bit for distros to catch up.


Currently at least Meson 0.45 is required, which I think rules out any
12-year-old distros :D.  Requiring Meson 0.47 would rule out 2-year-old
distro releases.


If somebody can test the latest stable distros to see whether they work
when one just takes a newer Meson binary package from a newer version of
the distro, maybe Mesa could add instructions for users to do that?

In Ubuntu 18.04 those instructions would be:
- replace "bionic" in /etc/apt/sources.list temporarily with "cosmic"
- do "sudo apt update && sudo apt install meson"
- revert /etc/apt/sources.list back to bionic

In Debian stable/stretch (which has Meson v0.37):
- enable "stretch-backports" repo to install Meson (v0.49)


- Eero

PS. The reason why I say Meson binary packages from a newer distro
version need to work is that I think expecting users to also build a new
Meson version from source is not realistic, as it seems to need a huge
amount of esoteric build dependencies:
https://packages.ubuntu.com/source/cosmic/meson


We've actually talked about removing these deprecation warnings in cases where
the minimum version doesn't have the recommended replacement, but that's hard.

Dylan

Quoting Eero Tamminen (2019-04-08 04:16:49)

Hi,

On 6.4.2019 11.44, Khaled Emara wrote:

Sorry, I thought LTS distros used old versions of mesa.
Got it.


Those *provide* an older Mesa version, so naturally people want to build
the latest Mesa on them (especially if they've just bought HW that isn't
supported by the distro version of Mesa).  Building would fail if Mesa
required a newer Meson version than the one in the distro.


 - Eero

PS. On Ubuntu 18.04 LTS, one can install 0.47 Meson binary package
from 18.10, it works fine.  Maybe something similar can be done also
on other LTS distros.


On Sat, Apr 6, 2019 at 12:38 AM Dylan Baker <mailto:dy...@pnwbakers.com> wrote:

 Quoting Khaled Emara (2019-04-05 11:28:28)
  > Upgraded meson's minimum version from 0.45 to 0.47
  > Version 0.47 is required to use build_always_stale
  > ---
  >  meson.build | 2 +-
  >  1 file changed, 1 insertion(+), 1 deletion(-)
  >
  > diff --git a/meson.build b/meson.build
  > index 2c98e9e18a9..454928e244a 100644
  > --- a/meson.build
  > +++ b/meson.build
  > @@ -25,7 +25,7 @@ project(
  >      [find_program('python', 'python2', 'python3'),
 'bin/meson_get_version.py']
  >    ).stdout(),
  >    license : 'MIT',
  > -  meson_version : '>= 0.45',
  > +  meson_version : '>= 0.47',
  >    default_options : ['buildtype=debugoptimized',
 'b_ndebug=if-release', 'c_std=c99', 'cpp_std=c++11']
  >  )
  >
  > --
  > 2.21.0
  >
  > ___
  > mesa-dev mailing list
  > mesa-dev@lists.freedesktop.org
 <mailto:mesa-dev@lists.freedesktop.org>
  > https://lists.freedesktop.org/mailman/listinfo/mesa-dev

 Unfortunately we can't bump the minimum version at the moment, there
 are a
 couple of LTS distros we need support for that only ship 0.45.x
 currently. As an
 upstream meson dev, we have no time table in place for the removal of
 build_always.

 Dylan


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev




Re: [Mesa-dev] [PATCH 2/2] meson: upgrade minimum meson version

2019-04-08 Thread Eero Tamminen

Hi,

On 6.4.2019 11.44, Khaled Emara wrote:

Sorry, I thought LTS distros used old versions of mesa.
Got it.


Those *provide* an older Mesa version, so naturally people want to build
the latest Mesa on them (especially if they've just bought HW that isn't
supported by the distro version of Mesa).  Building would fail if Mesa
required a newer Meson version than the one in the distro.


- Eero

PS. On Ubuntu 18.04 LTS, one can install 0.47 Meson binary package
from 18.10, it works fine.  Maybe something similar can be done also
on other LTS distros.

On Sat, Apr 6, 2019 at 12:38 AM Dylan Baker wrote:


Quoting Khaled Emara (2019-04-05 11:28:28)
 > Upgraded meson's minimum version from 0.45 to 0.47
 > Version 0.47 is required to use build_always_stale
 > ---
 >  meson.build | 2 +-
 >  1 file changed, 1 insertion(+), 1 deletion(-)
 >
 > diff --git a/meson.build b/meson.build
 > index 2c98e9e18a9..454928e244a 100644
 > --- a/meson.build
 > +++ b/meson.build
 > @@ -25,7 +25,7 @@ project(
 >      [find_program('python', 'python2', 'python3'),
'bin/meson_get_version.py']
 >    ).stdout(),
 >    license : 'MIT',
 > -  meson_version : '>= 0.45',
 > +  meson_version : '>= 0.47',
 >    default_options : ['buildtype=debugoptimized',
'b_ndebug=if-release', 'c_std=c99', 'cpp_std=c++11']
 >  )
 >
 > --
 > 2.21.0
 >
 > ___
 > mesa-dev mailing list
 > mesa-dev@lists.freedesktop.org

 > https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Unfortunately we can't bump the minimum version at the moment, there
are a
couple of LTS distros we need support for that only ship 0.45.x
currently. As an
upstream meson dev, we have no time table in place for the removal of
build_always.

Dylan


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev




Re: [Mesa-dev] [PATCH] RFC: Workaround for pthread_setaffinity_np() seccomp filtering

2019-03-13 Thread Eero Tamminen

Hi,

On 12.3.2019 10.59, Marc-André Lureau wrote:

On Fri, Mar 1, 2019 at 12:13 PM Mathias Fröhlich
 wrote:

On Friday, 1 March 2019 12:15:08 CET Eero Tamminen wrote:

On 1.3.2019 11.12, Michel Dänzer wrote:

On 2019-02-28 8:41 p.m., Marek Olšák wrote:

On Thu, Feb 28, 2019 at 1:37 PM Eero Tamminen 

Why do distro versions of Qemu filter the sched_setaffinity() syscall?


(https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1815889)

Daniel Berrange (berrange) wrote on 2019-02-27: #19

"IMHO that mesa change is not valid. It is setting its affinity to
run on all threads which is definitely *NOT* something we want to be
allowed. Management applications want to control which CPUs QEMU runs
on, and as such Mesa should honour the CPU placement that the QEMU
process has.

This is a great example of why QEMU wants to use seccomp to block
affinity changes to prevent something silently trying to use more CPUs
than are assigned to this QEMU."



Mesa uses thread affinity to optimize memory access performance on some
CPUs (see util_pin_thread_to_L3). Other places in Mesa need to restore the
original thread affinity for some child threads. Additionally, if games
limit the thread affinity, Mesa needs to restore the full thread affinity
for some of its child threads.


The last part sounds like Mesa clearly overstepping its authority.



In essence, the thread affinity should only be considered a hint for the
kernel for optimal performance. There is no reason to kill the process if
it's disallowed. Just ignore the call or modify the thread mask to make it
legal.


The fundamental issue here is that Mesa is using the thread affinity API
for something else than it's intended for. If there was an API for what
Mesa wants (encouraging certain sets of threads to run on topologically
close cores), there should be no need to block that.


Why does such a process need to be killed instead of the request being
masked suitably?  Is there some program that breaks subtly if the
affinity request is masked (and that being worse than the program being
killed)?


But that is still a situation that could be nicely handled with an
EPERM error return. Way better than just killing a process.
That 'badly affected' program can still call abort then.
But nicely working programs don't just get killed then!!



Returning an error seems less secure than prohibiting it completely,
and it may lead to subtle bugs in rarely tested code paths.

It's legitimate that QEMU and management layers want to prevent
arbitrary code from changing resource allocation etc.


They can do that by no-oping the system call, or masking the parts they 
don't want to be modified.  As that affects only (potentially) 
performance, not functionality, it seems to me better than outright 
killing a process.


(As with killing, there should probably be some way to log things that 
were ignored/masked.)




There is no easy way I can think of for mesa (and other libraries) to
probe the seccomp filters and associated actions.

So we need a way to tell mesa not to call setaffinity() (and other
syscalls). MESA_NO_THREAD_AFFINITY or MESA_NO_SYSCALLS=setaffinity,...
seem like a relatively easy way to go.



- Eero

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] RFC: Workaround for pthread_setaffinity_np() seccomp filtering

2019-03-01 Thread Eero Tamminen

Hi,

On 1.3.2019 11.12, Michel Dänzer wrote:

On 2019-02-28 8:41 p.m., Marek Olšák wrote:

On Thu, Feb 28, 2019 at 1:37 PM Eero Tamminen 

Why do distro versions of Qemu filter the sched_setaffinity() syscall?


(https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1815889)

Daniel Berrange (berrange) wrote on 2019-02-27: #19

"IMHO that mesa change is not valid. It is setting its affinity to
run on all threads which is definitely *NOT* something we want to be
allowed. Management applications want to control which CPUs QEMU runs
on, and as such Mesa should honour the CPU placement that the QEMU
process has.

This is a great example of why QEMU wants to use seccomp to block
affinity changes to prevent something silently trying to use more CPUs
than are assigned to this QEMU."



Mesa uses thread affinity to optimize memory access performance on some
CPUs (see util_pin_thread_to_L3). Other places in Mesa need to restore the
original thread affinity for some child threads. Additionally, if games
limit the thread affinity, Mesa needs to restore the full thread affinity
for some of its child threads.


The last part sounds like Mesa clearly overstepping its authority.



In essence, the thread affinity should only be considered a hint for the
kernel for optimal performance. There is no reason to kill the process if
it's disallowed. Just ignore the call or modify the thread mask to make it
legal.


The fundamental issue here is that Mesa is using the thread affinity API
for something else than it's intended for. If there was an API for what
Mesa wants (encouraging certain sets of threads to run on topologically
close cores), there should be no need to block that.


Why does such a process need to be killed instead of the request being
masked suitably?  Is there some program that breaks subtly if the
affinity request is masked (and that being worse than the program being
killed)?



- eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] RFC: Workaround for pthread_setaffinity_np() seccomp filtering

2019-02-28 Thread Eero Tamminen

Hi,

On 28.2.2019 11.57, Marc-André Lureau wrote:

On Thu, Feb 28, 2019 at 1:17 AM Marek Olšák  wrote:

I'd rather have something more robust than an env var, like catching SIGSYS.


SIGSYS is info for the invoking parent, not the (Mesa) process doing the 
syscall.


From "man 2 seccomp":

The process terminates as though killed by a SIGSYS signal.  Even if a 
signal handler has been registered for SIGSYS,  the  handler will be 
ignored in this case and the process always terminates.  To a parent 
process that is waiting on this process (using waitpid(2) or similar), 
the returned wstatus will indicate that its child was terminated as 
though by a SIGSYS signal.




With current qemu in most distros, it defaults to SIGSYS (we switched
away from SCMP_ACT_KILL, which had other problems). With more recent
qemu/libseccomp, it will default to SCMP_ACT_KILL_PROCESS. In those
KILL action cases, mesa will not be able to catch the failing
syscalls.


Qemu / libvirt isn't the only thing using seccomp.

For example Docker enables seccomp filters (along with capability
restrictions) for the invoked containers unless that is explicitly
disabled:
https://docs.docker.com/engine/security/seccomp/

What actually gets filtered is trivially changeable on the Docker
command line, by giving a JSON file specifying the syscall filtering.


The default policy seems to whitelist the affinity syscall:
https://github.com/moby/moby/blob/master/profiles/seccomp/default.json
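For illustration, a profile override that explicitly allows the affinity
syscalls might look like this. This is a sketch following the schema of
the linked default.json, not a security recommendation; a real profile
would extend the full default syscall list, and this fragment only shows
the relevant entry:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["sched_getaffinity", "sched_setaffinity"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

Such a file is passed with `docker run --security-opt seccomp=profile.json`.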


Why do distro versions of Qemu filter the sched_setaffinity() syscall?


- Eero


Marek

On Wed, Feb 27, 2019 at 6:13 PM  wrote:


From: Marc-André Lureau 

Since commit d877451b48a59ab0f9a4210fc736f51da5851c9a ("util/u_queue:
add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY"), mesa calls
sched_setaffinity syscall. Unfortunately, qemu crashes with SIGSYS
when sandboxing is enabled (by default with libvirt), as this syscall
is filtered.

There doesn't seem to be a way to check for the seccomp rule other
than doing a call, which may result in various behaviour depending on
seccomp actions. There is a PTRACE_SECCOMP_GET_FILTER, but it is
low-level and a privileged operation (but there might be a way to use
it?). A safe way would be to try the call in a subprocess,
unfortunately, qemu also prohibits fork(). Also this could be subject
to TOCTOU.

There seems to be few solutions, but the issue can be considered a
regression for various libvirt/Boxes users.

Introduce MESA_NO_THREAD_AFFINITY environment variable to prevent the
offending call. Wrap pthread_setaffinity_np() in a utility function
u_pthread_setaffinity_np(), returning a EACCESS error if the variable
is set.

Note: one call is left with a FIXME, as I didn't investigate how to
build and test it, help welcome!

See also:
https://bugs.freedesktop.org/show_bug.cgi?id=109695

Signed-off-by: Marc-André Lureau 
---
  .../drivers/swr/rasterizer/core/threads.cpp   |  1 +
  src/util/u_queue.c|  2 +-
  src/util/u_thread.h   | 15 ++-
  3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/threads.cpp 
b/src/gallium/drivers/swr/rasterizer/core/threads.cpp
index e30c1170568..d10c79512a1 100644
--- a/src/gallium/drivers/swr/rasterizer/core/threads.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/threads.cpp
@@ -364,6 +364,7 @@ void bindThread(SWR_CONTEXT* pContext,
  CPU_ZERO(&cpuset);
  CPU_SET(threadId, &cpuset);

+/* FIXME: use u_pthread_setaffinity_np() if possible */
  int err = pthread_setaffinity_np(thread, sizeof(cpu_set_t), &cpuset);
  if (err != 0)
  {
diff --git a/src/util/u_queue.c b/src/util/u_queue.c
index 3812c824b6d..dea8d2bb4ae 100644
--- a/src/util/u_queue.c
+++ b/src/util/u_queue.c
@@ -249,7 +249,7 @@ util_queue_thread_func(void *input)
for (unsigned i = 0; i < CPU_SETSIZE; i++)
   CPU_SET(i, &cpuset);

-  pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
+  u_pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
 }
  #endif

diff --git a/src/util/u_thread.h b/src/util/u_thread.h
index a46c18d3db2..a4e6dbae5d7 100644
--- a/src/util/u_thread.h
+++ b/src/util/u_thread.h
@@ -70,6 +70,19 @@ static inline void u_thread_setname( const char *name )
 (void)name;
  }

+#if defined(HAVE_PTHREAD_SETAFFINITY)
+static inline int u_pthread_setaffinity_np(pthread_t thread, size_t cpusetsize,
+   const cpu_set_t *cpuset)
+{
+   if (getenv("MESA_NO_THREAD_AFFINITY")) {
+  errno = EACCES;
+  return -1;
+   }
+
+   return pthread_setaffinity_np(thread, cpusetsize, cpuset);
+}
+#endif
+
  /**
   * An AMD Zen CPU consists of multiple modules where each module has its own 
L3
   * cache. Inter-thread communication such as locks and atomics between modules
@@ -89,7 +102,7 @@ util_pin_thread_to_L3(thrd_t thread, unsigned L3_index, 
unsigned cores_per_L3)
 CPU_Z

Re: [Mesa-dev] A few NIR compile time optimisations

2019-02-20 Thread Eero Tamminen

Hi,

On 20.2.2019 17.32, Marek Olšák wrote:
On Wed, Feb 20, 2019 at 2:31 AM Connor Abbott wrote:
On Wed, Feb 20, 2019 at 4:29 AM Marek Olšák wrote:
...

That's a harsh reaction to a relatively good benchmarking setup.
I use debugoptimized with -DDEBUG. My performance is probably
more affected by -fno-omit-frame-pointer than -DDEBUG.

Why would you enable DEBUG in a profiling build? AFAIK it's supposed
to enable validation more expensive than simple asserts, which you
probably don't want in a benchmarking setup, although from grepping
the source it's not used much. It might be a good idea to move
running NIR validation behind DEBUG instead of !NDEBUG. In the
meantime, unless you really want to benchmark things with assertions
enabled in which case NIR_VALIDATE=0 works (but why would you?), you
can set -Db_ndebug=false in Meson. I just saw today that they're
enabled by default in debugoptimized builds (whoops, better go fix
that and re-profile...).

Assertions and other debug code really don't cost much, so I don't see a 
reason to undef DEBUG.


At least on other projects I've seen cases where:
* assert or debug code has been added to places where it's expensive, or 
some of those places become expensive later
* one sees low-cost code in profiles and assumes it's specific to the 
DEBUG build, although somebody had forgotten to ifdef it properly
* the debug build (or part of it) is compiled with different optimization 
flags than the release build [1]
* added debug code causes important code to no longer fit into a smaller 
cache level (though this is very specific to the cache sizes on a given 
machine and the code GCC generates)


[1] Unexpected things happen.  Recently I e.g. saw switching
between Meson & Autotools causing Mesa URB allocation changes:
https://bugs.freedesktop.org/show_bug.cgi?id=108787


-> It would help if you could verify what the perf gap between release 
and debug builds is, and provide numbers for both (preferably each time 
there are significant code changes).


Just using release builds should be simpler though (unless your 
intention actually is investigating why debug code is slow).



- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [ANNOUNCE] mesa 19.0.0-rc1

2019-02-11 Thread Eero Tamminen

Hi,

On 6.2.2019 20.03, Dylan Baker wrote:

Quoting Eero Tamminen (2019-02-04 04:41:12)

[...]

* going through bugzilla bugs that have been created after previous
release branch point was tagged and, adding the relevant ones
to the release tracker bug


This has always been the job of each driver team to do.


Wouldn't it still be useful to post a list of bugs reported since
the previous release in the first RC message, and ask which of them
are relevant for the release?


> Some drivers don't care
> about our releases, others don't like to use the tracker bug.

If there's a verified regression, it could still be mentioned in release
notes, regardless of whether the driver team is interested in fixing
for the release.


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [ANNOUNCE] mesa 19.0.0-rc

2019-02-06 Thread Eero Tamminen

Hi,

On 4.2.2019 14.41, Eero Tamminen wrote:

On 2.2.2019 3.20, Mark Janes wrote:

[...]

These regressions all need to be added to the release tracker.  Thank
you for reporting them.


If that should track all the things that regressed since the previous
18.3 version was branched:
---
tag 18.3-branchpoint
Tagger: Emil Velikov 
Date:   Thu Nov 1 18:47:06 2018 +
---

Then there are a couple of other (i965) regressions I've reported:


As there were no comments, I added them as blockers to the release bug.


* https://bugs.freedesktop.org/show_bug.cgi?id=109055 (Vulkan / CPU)
* https://bugs.freedesktop.org/show_bug.cgi?id=108820 (compute hangs)


It would be good for somebody to:

* Check the GPU error data attached to this bug to see whether 
(nowadays) *very* infrequently happening CarChase hangs are related to 
Jakub's *fully reproducible* compute hangs.  If not -> ask him to file 
a separate bug.


* Bisect Jakub's minimal compute test-case to see which Mesa commit 
started triggering the GPU hangs.




* https://bugs.freedesktop.org/show_bug.cgi?id=108787 (BSW CarChase aborts)

[...]


- Eero

PS. "Somebody" = somebody else than me, as I'm not supposed to spend 
work time on 3D/Mesa (reporting bugs I happen to otherwise notice is OK, 
spending time investigating them isn't).

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [ANNOUNCE] mesa 19.0.0-rc1

2019-02-04 Thread Eero Tamminen

Hi,

On 2.2.2019 3.20, Mark Janes wrote:

Eero Tamminen  writes:

On 31.1.2019 1.37, Dylan Baker wrote:

This email announces the mesa 19.0 release candidate 1. I'll keep this email
fairly brief since I'm already running a little late on getting this done :)
I've just had to resolve quite a few autotools issues to get the dist built.

Notable in the 19.0-rc1 branch is SWR is set to require LLVM 7 instead of LLVM
6. It is impossible to bootstrap SWR with LLVM 6 and compile with  LLVM 7 due to
LLVM API changes. Since RadeonSI and Radv both require LLVM 7 I've taken the
liberty of bumping SWR so that we could get a tarball built.

We've had an exciting release cycle, plenty of GL and Vulkan extensions, ~1600
commits since the 18.3 branchpoint with substantial work across all areas of
mesa.


Are all the recent (i965) perf regressions included in it:
* https://bugs.freedesktop.org/show_bug.cgi?id=109517 (spilling)
* https://bugs.freedesktop.org/show_bug.cgi?id=109505 (Unigine)
* https://bugs.freedesktop.org/show_bug.cgi?id=109216 (Vulkan)


These regressions all need to be added to the release tracker.  Thank
you for reporting them.


If that should track all the things that regressed since the previous
18.3 version was branched:
---
tag 18.3-branchpoint
Tagger: Emil Velikov 
Date:   Thu Nov 1 18:47:06 2018 +
---

Then there are a couple of other (i965) regressions I've reported:
* https://bugs.freedesktop.org/show_bug.cgi?id=109055 (Vulkan / CPU)
* https://bugs.freedesktop.org/show_bug.cgi?id=108820 (compute hangs)
* https://bugs.freedesktop.org/show_bug.cgi?id=108787 (BSW CarChase aborts)

(First two seem at least partially resolved.)



PS. There's also much older:
* https://bugs.freedesktop.org/show_bug.cgi?id=107510

Which was already fixed, but then regressed again, and the regressing
commit wasn't reverted this time.  I'm mentioning it because Timothy had
a patch series in October that fixed the tess/geom shader regressions
(which were largest), but for some reason it's not yet upstream.


That one regressed after 18.2-branchpoint:
---
Tagger: Andres Gomez 
Date:   Thu Aug 2 17:57:41 2018 +0300
---

And it wasn't listed in 18.2.x release tracker bug.

Neither were its duplicates:
* https://bugs.freedesktop.org/show_bug.cgi?id=107706
* https://bugs.freedesktop.org/show_bug.cgi?id=107743


If there's a release check list for Mesa, it could have items for:
* having the release tracker meta bug
* going through bugzilla bugs that have been created after previous
  release branch point was tagged and, adding the relevant ones
  to the release tracker bug


- Eero


Expect rc2 about this time next week, see you then.

Dylan

git tag: mesa-19.0.0-rc1

https://mesa.freedesktop.org/archive/mesa-19.0.0-rc1.tar.gz
MD5:  b3a610b204d0cb3431823353a8cbe8e6  mesa-19.0.0-rc1.tar.gz
SHA1: d1f0d0bc49ec7e02d0cd7d141127fd2fefc72e35  mesa-19.0.0-rc1.tar.gz
SHA256: 0a14bb059f6cead4e50923df9c24d3c5025d9310803ca5189e019f07e539639e  
mesa-19.0.0-rc1.tar.gz
SHA512: 
5bedc917afecef6a0dd11c56688a3e3fdbbaeaceca33062d6825b5525c6e78663e873bdecc96b98b0448d988ad81a7a8617c523e2d312384369c6a333b790b86
  mesa-19.0.0-rc1.tar.gz
PGP:  https://mesa.freedesktop.org/archive/mesa-19.0.0-rc1.tar.gz.sig

https://mesa.freedesktop.org/archive/mesa-19.0.0-rc1.tar.xz
MD5:  727abb6469e518ff1a2e1bde33543503  mesa-19.0.0-rc1.tar.xz
SHA1: 577642259cd269c883007df7c2772c8c636fabfb  mesa-19.0.0-rc1.tar.xz
SHA256: 8efb32956c428d23f78364f9eace5491bda9feaafd767128133672a5f79659e8  
mesa-19.0.0-rc1.tar.xz
SHA512: 
23d21d6c4f03a1d9073ecb1f43dc251d581cdeb6b7cc24a19c299571070b4184ad4f22b0ca170ca42e58c62bb46eca0dadc334a952bbb7e0379961a30a6ca856
  mesa-19.0.0-rc1.tar.xz
PGP:  https://mesa.freedesktop.org/archive/mesa-19.0.0-rc1.tar.xz.sig

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [ANNOUNCE] mesa 19.0.0-rc1

2019-02-01 Thread Eero Tamminen

Hi,

On 31.1.2019 1.37, Dylan Baker wrote:

This email announces the mesa 19.0 release candidate 1. I'll keep this email
fairly brief since I'm already running a little late on getting this done :)
I've just had to resolve quite a few autotools issues to get the dist built.

Notable in the 19.0-rc1 branch is SWR is set to require LLVM 7 instead of LLVM
6. It is impossible to bootstrap SWR with LLVM 6 and compile with  LLVM 7 due to
LLVM API changes. Since RadeonSI and Radv both require LLVM 7 I've taken the
liberty of bumping SWR so that we could get a tarball built.

We've had an exciting release cycle, plenty of GL and Vulkan extensions, ~1600
commits since the 18.3 branchpoint with substantial work across all areas of
mesa.


Are all the recent (i965) perf regressions included in it:
* https://bugs.freedesktop.org/show_bug.cgi?id=109517 (spilling)
* https://bugs.freedesktop.org/show_bug.cgi?id=109505 (Unigine)
* https://bugs.freedesktop.org/show_bug.cgi?id=109216 (Vulkan)
?

- Eero

PS. There's also much older:
* https://bugs.freedesktop.org/show_bug.cgi?id=107510

Which was already fixed, but then regressed again, and the regressing 
commit wasn't reverted this time.  I'm mentioning it because Timothy had 
a patch series in October that fixed the tess/geom shader regressions 
(which were largest), but for some reason it's not yet upstream.



Expect rc2 about this time next week, see you then.

Dylan

git tag: mesa-19.0.0-rc1

https://mesa.freedesktop.org/archive/mesa-19.0.0-rc1.tar.gz
MD5:  b3a610b204d0cb3431823353a8cbe8e6  mesa-19.0.0-rc1.tar.gz
SHA1: d1f0d0bc49ec7e02d0cd7d141127fd2fefc72e35  mesa-19.0.0-rc1.tar.gz
SHA256: 0a14bb059f6cead4e50923df9c24d3c5025d9310803ca5189e019f07e539639e  
mesa-19.0.0-rc1.tar.gz
SHA512: 
5bedc917afecef6a0dd11c56688a3e3fdbbaeaceca33062d6825b5525c6e78663e873bdecc96b98b0448d988ad81a7a8617c523e2d312384369c6a333b790b86
  mesa-19.0.0-rc1.tar.gz
PGP:  https://mesa.freedesktop.org/archive/mesa-19.0.0-rc1.tar.gz.sig

https://mesa.freedesktop.org/archive/mesa-19.0.0-rc1.tar.xz
MD5:  727abb6469e518ff1a2e1bde33543503  mesa-19.0.0-rc1.tar.xz
SHA1: 577642259cd269c883007df7c2772c8c636fabfb  mesa-19.0.0-rc1.tar.xz
SHA256: 8efb32956c428d23f78364f9eace5491bda9feaafd767128133672a5f79659e8  
mesa-19.0.0-rc1.tar.xz
SHA512: 
23d21d6c4f03a1d9073ecb1f43dc251d581cdeb6b7cc24a19c299571070b4184ad4f22b0ca170ca42e58c62bb46eca0dadc334a952bbb7e0379961a30a6ca856
  mesa-19.0.0-rc1.tar.xz
PGP:  https://mesa.freedesktop.org/archive/mesa-19.0.0-rc1.tar.xz.sig


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] autotools: Deprecate the use of autotools

2019-01-14 Thread Eero Tamminen

Hi,

On 13.1.2019 9.44, Jonathan Gray wrote:
[...]
As we can not depend on python to build Mesa in OpenBSD I will have to

go back to maintaining a local Mesa build system if autotools is removed.


Mesa already needs python-mako (for code generation) with Autotools.
Why is Meson also needing Python a problem?


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Chromium - Application-level nouveau blacklist

2019-01-08 Thread Eero Tamminen

Hi,

On 8.1.2019 8.56, Stéphane Marchesin wrote:

Yes I think the Chrome-side is very simple here: because there isn't
time or means for in-depth investigation, if a driver crashes too
much, it gets blacklisted. The situation is not unique, the GPU
blacklist file is 1700 lines:
https://chromium.googlesource.com/chromium/src/gpu/+/master/config/software_rendering_list.json

Anyway, IMO if the biggest crashers can be fixed, I think we could
eventually make a case to reenable.


Can Chrome crash tracking system provide following information:
* which www-page caused the crash
* upstream Mesa commit used
* upstream kernel commit used
* if crash is from ChromeOS
   - what patches have been applied on top of kernel & Mesa
* if crash is from a Linux distro
  - distro version and windowing system setup (X/Wayland, compositor)
  - package versions for kernel, Mesa, display server and compositor
(to find what kernel & Mesa patches are applied)
* detailed HW information
* backtrace
?

So that developers could try the corresponding combination?


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] NIR documentation (was: NIR constant problem for GPU which doesn't have native integer support)

2019-01-07 Thread Eero Tamminen

Hi,

On 7.1.2019 12.42, Erik Faye-Lund wrote:

On Fri, 2019-01-04 at 09:40 -0600, Jason Ekstrand wrote:

[...]

Yeah...  Patches welcome?  There have been many attempts by Connor
and myself to better document NIR.  They all end up in /dev/null due
to EBIGGERFIRES. :-(

That said, if you ever want to know how something works, I'm logged
into IRC 24/7 and will happily answer questions.



That's understandable, but even so, at some point we need to reduce
some bus-factor. Somehow.

Do you have some links to the attempts at documenting NIR? Perhaps I
could take a look at it during some down-time?


Google search gave me:
http://www.jlekstrand.net/jason/projects/mesa/nir-notes/
https://people.freedesktop.org/~cwabbott0/nir-docs/

There are also some Igalia blog posts:

https://blogs.igalia.com/apinheiro/2016/06/02/introducing-mesa-intermediate-representations-on-intel-drivers-with-a-practical-example/

I assume you've already found them.  If they're still up to date, it 
might be good to add links to them on the Mesa site too.  At least I 
didn't notice them being referenced there yet:

https://www.mesa3d.org/

(And they aren't referenced in any of the files in Mesa git.)


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] autotools: Deprecate the use of autotools

2018-12-20 Thread Eero Tamminen

Hi,

On 20.12.2018 4.40, Stuart Young wrote:
Could this be reduced from an error to a warning, with the 
command-line option suppressing the warning?


Perhaps as well as producing the warning, the build could sleep for say 
30 seconds after producing the warning message. This would be noticeable 
if you're building it yourself (which would definitely get the message 
out there), but wouldn't stop automated build processes (as it's not 
stopping, just sleeping).


At a later date (when meson has more exposure) perhaps we could 
move from a warning to an error.


Thoughts?


So it would take 2 releases, i.e. about half a year?

* In the next release the autotools build warns that Meson should be 
used instead, as autotools support will eventually be removed.


* The release after that:
  - errors out unless a specific option is used to build with autotools

  - tells that autotools support will be removed in the following release

* The release after that removes autotools support.


(I assume Emil would be the person supporting Autotools during this as 
he had already volunteered for it.)



While I think the above is closer to how other large projects handle 
deprecation in general (i.e. with a clear deprecation period across 
releases), build system changes are probably more often done with a 
flag day (early in the release cycle, as maintaining multiple build 
systems can be a pretty large pain).



- Eero




On Thu, 20 Dec 2018 at 07:37, Gert Wollny wrote:


Am Mittwoch, den 19.12.2018, 12:19 -0500 schrieb Ilia Mirkin:
 > On Sun, Dec 16, 2018 at 6:24 AM Gert Wollny wrote:
 > >
 > > Since Meson will eventually be the only build system deprecate
 > > autotools
 > > now. It can still be used by invoking configure with the flag
 > >   --enable-autotools
 > >
 > > Signed-off-by: Gert Wollny
 >
 > In case it's not clear from the later discussion:
 >
 > Strongly NAKed-by: Ilia Mirkin
 > This should not be applied to the repository.

You should probably add this to the patch proposals that want to remove
autotools then.

I'm certainly not going to push this patch with your NAK, but I'd like
to ask you to reconsider: All this patch does is require to add  --
enable-autotools to the configure call once - a little hickup in the
workflow of those who do not yet use exclusively meson, but it gets the
message out to the others that things are going to change.

Best,
Gert

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org 
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



--
Stuart Young (aka Cefiar)

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] last call for autotools

2018-12-17 Thread Eero Tamminen

Hi,

On 17.12.2018 8.08, Marek Olšák wrote:
[...]
I think one of the serious usability issues is that environment 
variables such as CFLAGS, CXXFLAGS, LDFLAGS, and PKG_CONFIG_PATH are not 
saved by meson for future reconfigures.


I don't know what Meson is supposed to do, but to me that would be
a bug in a build tool.

Re-configure is supposed to adapt SW to changes in the build
environment, and environment variables are part of that (along with
command line options and SW installed to the system).  A build
configure tool deciding to "remember" some of those things instead
of checking the new situation seems like a great opportunity for
confusion.


- Eero


I think meson should ignore the 
variables completely, or the Mesa documentation should discourage users 
from setting the variables. The current situation would be 
acceptable to me if users were warned that setting environment variables 
is a trap.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Suggestions for improving meson-based builds

2018-12-17 Thread Eero Tamminen

Hi,

On 17.12.2018 2.33, Jason Ekstrand wrote:
On Sun, Dec 16, 2018 at 1:49 PM Ilia Mirkin wrote:

[...]

2. Summary at the end. This is not strictly an autotools-specific
issue, but rather how mesa has used it. That summary at the end is
very useful to make sure you (or someone you're helping) hasn't
screwed something up.

Run "meson configure" with no options.  It will dump out a full 
configuration.  Sure, it's not the same format but, unless I'm 
misunderstanding something, all of the information from the old summary 
and more is available there.  Or are you concerned with various derived 
or auto-configured values?  Or is it your concern that it get dumped out 
at the end of a configure rather than having to ask for it?  It'd be 
helpful if you'd be more specific about what you want than "the 
autotools thing".


Having the build itself output (by default) the configured settings is
useful when going through build logs looking for errors, and for
spotting differences in build env & configuration.

It's run-time generated documentation of how the software decided
to configure itself to its build environment.  Personally I'd rather
read well formatted and concise documentation.

Nice (default) output of how the software configured itself to a given
environment IMHO makes a project much more user-friendly, and it's the
first impression of the project for a new developer.


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Let's talk about -DDEBUG

2018-12-13 Thread Eero Tamminen

Hi,

On 13.12.2018 12.19, Eric Engestrom wrote:

On Wednesday, 2018-12-12 19:45:06 -0500, Marek Olšák wrote:

On Wed, Dec 12, 2018 at 7:35 PM Rob Clark  wrote:

On Wed, Dec 12, 2018 at 7:14 PM Dylan Baker  wrote:

Quoting Rob Clark (2018-12-12 15:52:47)

...

I guess I should have covered that:

autotools had effectively two build types "debug" and "not debug",

"debug" set

"-DDEBUG -g -O2", "not debug" set -DNDEBUG

Meson has 4 build types, and a separate toggle for NDEBUG:
debug: -O0 -DDEBUG (we add -DDEBUG)
debugoptimized: -O2 -g
release: -O2
plain: (nothing)


I generally view meson as an upgrade in this respect, I guess mostly
because I have no use for '-DDEBUG -g -O2' (ie. the -O0 equiv is fine
by me).. maybe making meson debug config use -O2 would bridge the gap?
  If so I have no problem with dropping -O0 and making debug config
also use -O2


A target without optimizations is good for tracking down issues triggered
by compiler optimizations, and for having faster builds.



Sounds good.

Also I think we should make -Db_ndebug=true the default for release builds.
!NDEBUG enables a lot more debugging code than just assertions.


That's already the case :)

The default value (line 29 of the root meson.build) is:

   b_ndebug=if-release

which means it will default to `true` if `buildtype=release`, and `false`
otherwise, while still being overridden if you manually set `b_ndebug`.


There needs to be something that's a release with debug symbols, but
without _any_ other differences.  If a build type's assert handling differs
from release, its debug symbols are of less use for debugging issues
with release builds.


Build type names could also be more descriptive, although something
like this seems overkill:
- release
- release_with_symbols
- release_with_symbols_and_asserts
- full_debug

Meson can help by outputting exact description of each build types,
e.g. when one doesn't specify one, or unrecognized one is provided.


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/4] nir: detect more induction variables

2018-11-28 Thread Eero Tamminen

Hi,

On 28.11.2018 5.25, Timothy Arceri wrote:

This allows loop analysis to detect induction variables that
are incremented in both branches of an if rather than in a main
loop block. For example:

[...]

Unfortunately GCM could move the addition out of the if for us
(making this patch unnecessary) but we still cannot enable the GCM
pass without regressions.


Maybe there should be a TODO comment about removing this code when GCM 
is added?



- Eero


This unrolls a loop in Rise of The Tomb Raider.

vkpipeline-db results (VEGA):

Totals from affected shaders:
SGPRS: 88 -> 96 (9.09 %)
VGPRS: 56 -> 52 (-7.14 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 2168 -> 4560 (110.33 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 4 -> 4 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32211
---
  src/compiler/nir/nir_loop_analyze.c | 36 +
  1 file changed, 36 insertions(+)

diff --git a/src/compiler/nir/nir_loop_analyze.c 
b/src/compiler/nir/nir_loop_analyze.c
index 8903e15105..cf97d6bf06 100644
--- a/src/compiler/nir/nir_loop_analyze.c
+++ b/src/compiler/nir/nir_loop_analyze.c
@@ -245,6 +245,42 @@ compute_induction_information(loop_info_state *state)
   if (src_var->in_if_branch || src_var->in_nested_loop)
  break;
  
+ /* Detect induction variables that are incremented in both branches
+  * of an unnested if rather than in a loop block.
+  */
+ if (is_var_phi(src_var)) {
+nir_phi_instr *src_phi =
+   nir_instr_as_phi(src_var->def->parent_instr);
+
+nir_op alu_op;
+nir_ssa_def *alu_srcs[2] = {0};
+nir_foreach_phi_src(src2, src_phi) {
+   nir_loop_variable *src_var2 =
+  get_loop_var(src2->src.ssa, state);
+
+   if (!src_var2->in_if_branch || !is_var_alu(src_var2))
+  break;
+
+   nir_alu_instr *alu =
+  nir_instr_as_alu(src_var2->def->parent_instr);
+   if (nir_op_infos[alu->op].num_inputs != 2)
+  break;
+
+   if (alu->src[0].src.ssa == alu_srcs[0] &&
+   alu->src[1].src.ssa == alu_srcs[1] &&
+   alu->op == alu_op) {
+  /* Both branches perform the same calculation so we can use
+   * one of them to find the induction variable.
+   */
+  src_var = src_var2;
+   } else {
+  alu_srcs[0] = alu->src[0].src.ssa;
+  alu_srcs[1] = alu->src[1].src.ssa;
+  alu_op = alu->op;
+   }
+}
+ }
+
   if (!src_var->in_loop) {
  biv->def_outside_loop = src_var;
   } else if (is_var_alu(src_var)) {



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/6] nir: Fix holes in nir_instr

2018-11-28 Thread Eero Tamminen

Hi,

On 28.11.2018 2.38, Ian Romanick wrote:

From: Ian Romanick 

Found using pahole.


While pahole does static analysis of binaries, Valgrind has a tool that
can also do _run-time_ analysis of structure and other heap allocation
utilization:
http://valgrind.org/docs/manual/dh-manual.html

With structure-member usage-frequency information, one can move
often-used parts onto the same cache lines and infrequently used parts
together, for more efficient cache utilization.  One may also find
structure members that are never used in practice although there's
code that refers to them.


- Eero


Changes in peak memory usage according to Valgrind massif:

mean soft fp64 using uint64:   1,343,991,403 => 1,342,759,331
gfxbench5 aztec ruins high 11:63,619,971 =>63,555,571
deus ex mankind divided 148:  62,887,728 =>62,845,304
deus ex mankind divided 2890: 72,399,750 =>71,922,686
dirt showdown 676:69,464,023 =>69,238,607
dolphin ubershaders 210:  78,359,728 =>77,822,072

Signed-off-by: Ian Romanick 
---
  src/compiler/nir/nir.h | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index a292ec73e1e..74c700026ad 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -486,7 +486,7 @@ typedef struct nir_register {
  #define nir_foreach_register_safe(reg, reg_list) \
 foreach_list_typed_safe(nir_register, reg, node, reg_list)
  
-typedef enum {

+typedef enum PACKED {
 nir_instr_type_alu,
 nir_instr_type_deref,
 nir_instr_type_call,
@@ -501,16 +501,16 @@ typedef enum {
  
  typedef struct nir_instr {

 struct exec_node node;
-   nir_instr_type type;
 struct nir_block *block;
-
-   /** generic instruction index. */
-   unsigned index;
+   nir_instr_type type;
  
 /* A temporary for optimization and analysis passes to use for storing

  * flags.  For instance, DCE uses this to store the "dead/live" info.
  */
 uint8_t pass_flags;
+
+   /** generic instruction index. */
+   unsigned index;
  } nir_instr;
  
  static inline nir_instr *




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Lets talk about autotools

2018-11-27 Thread Eero Tamminen

Hi,

On 27.11.2018 19.05, Matt Turner wrote:

On Tue, Nov 27, 2018 at 1:13 AM Timo Aaltonen  wrote:

On 17.11.2018 6.04, Dylan Baker wrote:

Quoting Dylan Baker (2018-09-17 09:44:07)

I feel like for !windows meson is in good enough shape at this point that we
can start having the discussion about deleting the autotools build. So, is there
anything left that autotools can do that meson cannot (that we actually want to
implement)? And, what is a reasonable time-table to remove the autotools build?
I think we could reasonably remove it as soon as 18.3 if others felt confident
that it would work for them.


Okay, time for an update on things and a chance to talk about what else we need.

Support for llvm-config (and any binary, actually) overriding has landed in
meson, and will be present in the 0.49.0 release, which is due out December 9th.


Hi, just a note that Ubuntu 18.04 LTS ships with meson 0.45.1 and will
get Mesa backports up until and including 20.0.x, so I wonder how
complex these required new features in meson are to be backported, or
perhaps easily worked around? Backporting a whole new version of meson
might not happen..


I understand the LTS concept, but what's the value in never upgrading
something like a build tool like Meson? Yeah, new versions give a
possibility of regressions, but with something evolving as quickly as
Meson the version available in April 2018 becomes less useful for its
intended purpose with each passing month...


Ubuntu has so called hardware enabling (HWE) packages, which provide
newer versions of kernel, mesa and few other components for LTS, and
a meta package(s) that pull those in. They're based on package versions
tested in development versions of Ubuntu.

For example, relevant Ubuntu 18.10 packages would be available as
HWE packages for 18.04 somewhere around February, according to
preliminary Ubuntu kernel schedule.

Of the packages relevant to Mesa, 18.10 has for example:
- kernel v4.18   (18.04 has v4.15)
- LLVM v7.0  (18.04 has LLVM v6)
- Meson v0.47.2  (18.04 has v0.45)

I don't know how much that 4-month gap (before new development
distro release package versions become available in the preceding LTS
release as HWE packages) fluctuates.

Timo?


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] meson: Bump version to 0.46 for python module

2018-11-23 Thread Eero Tamminen

Hi,

On 21.11.2018 20.13, Dylan Baker wrote:

From: Arfrever Frehtes Taifersar Arahesis 

Meson has two modules for finding python, the python3 module and the
python module. Python3 is older, and has some corner cases, python is
newer, has no known corner cases and can detect python2. Things have
generally seemed to work okay for us using python3, but there are cases
where things fall down (such as if you have python 3.4 as your default
python3).

Debian provides 0.48.x in buster (testing)
fedora has 0.47.x in 27 and 28
fedora has 0.48.x in 29
arch has 0.48.x


Ubuntu 18.04 LTS has meson v0.45.1 (and LLVM v6 & kernel v4.15).

First HW enabling packages from 18.10 (kernel v4.18, LLVM v7,
meson v0.47.2) are scheduled for next February 2019.

It would be nice if build breakage would be postponed enough
that deps would be available there too.


- Eero



cc: matts...@gmail.com
distro-bug: https://bugs.gentoo.org/671308
Reviewed-by: Dylan Baker 
---
  docs/meson.html | 2 +-
  meson.build | 4 ++--
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/meson.html b/docs/meson.html
index 68f80d6ac42..af109d3d8b0 100644
--- a/docs/meson.html
+++ b/docs/meson.html
@@ -24,7 +24,7 @@ for production
  The meson build is tested on Linux, macOS, Cygwin and Haiku, FreeBSD,
  DragonflyBSD, NetBSD, and should work on OpenBSD.
  
-Mesa requires Meson >= 0.45.0 to build.

+Mesa requires Meson >= 0.46.0 to build.
  
  Some older versions of meson do not check that they are too old and will error

  out in odd ways.
diff --git a/meson.build b/meson.build
index 33f4e5ad3cf..ee2d1a82984 100644
--- a/meson.build
+++ b/meson.build
@@ -25,7 +25,7 @@ project(
  [find_program('python', 'python2', 'python3'), 'bin/meson_get_version.py']
).stdout(),
license : 'MIT',
-  meson_version : '>= 0.45',
+  meson_version : '>= 0.46',
default_options : ['buildtype=debugoptimized', 'b_ndebug=if-release', 
'c_std=c99', 'cpp_std=c++11']
  )
  
@@ -709,7 +709,7 @@ if with_platform_haiku

pre_args += '-DHAVE_HAIKU_PLATFORM'
  endif
  
-prog_python = import('python3').find_python()

+prog_python = import('python').find_installation('python3')
  has_mako = run_command(
prog_python, '-c',
'''



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/5] i965/icl: Fix L3 configurations

2018-11-16 Thread Eero Tamminen

Hi,

On 16.11.2018 10.33, Francisco Jerez wrote:

Kenneth Graunke  writes:

[...]

Perhaps we'll get both configs working, and then will want to be able
to select between them.  I question whether the additional URB is truly
that valuable - how large are the actual gains? - considering that we
have to stall in order to reconfigure everything anyway...


It's more about the value of the additional space for caching textures.

One can calculate the required max URB space when GS/TS isn't used,
whereas textures can fill all available cache.  For example, if a draw
does just a single quad, L3 is better utilized with minimal URB space,
leaving the rest for texture caching.




That just means that the update frequency needs to be low enough for the
stall overhead to be negligible -- E.g. at batch buffer boundaries or
wherever we're getting stalled anyway.



- Eero


Re: [Mesa-dev] [PATCH 00/11] intel/fs: Add a generic SEND opcode and use it on

2018-11-05 Thread Eero Tamminen

Hi,

On 3.11.2018 2.06, Jason Ekstrand wrote:

This patch series is something we've talked about doing for a while and
haven't gotten around to yet.  It implements a generic SEND opcode and then
reworks a bunch of the current SEND instructions such as texturing to use
that instead of piles of shader codegen.  Among other things, this gives us
substantially better scheduling of instructions around indirect sends.
They currently generate up to 5 ALU instructions in the generator to build
the indirect descriptor and this is reduced to 1 with the other 4 happening
in the IR level where we can schedule.


Would that have any impact on (currently missing) GEN9+ SENDS support?

A split send message uses 2 register ranges which can be independently
allocated & updated (this can improve performance in cases like MRT and
blur shaders with several consecutive & partly similar sends).

I.e. instructions updating those register ranges should be scheduled
differently to take advantage of SENDS.  Would that need to be taken
into account at the IR level too?


- Eero


Jason Ekstrand (11):
   intel/defines: Explicitly cast to uint32_t in SET_FIELD and SET_BITS
   intel/fs: Handle IMAGE_SIZE in size_read() and is_send_from_grf()
   intel/fs: Take an explicit exec size in brw_surface_payload_size()
   intel/eu: Add has_simd4x2 bools to surface_write functions
   intel/eu: Rework surface descriptor helpers
   intel/fs: Add a generic SEND opcode
   intel/fs: Use the generic SEND opcode for surface messages
   intel/fs: Mark texture surfaces used in brw_fs_nir
   intel/fs: Use a logical opcode for IMAGE_SIZE
   intel/fs: Use SHADER_OPCODE_SEND for texturing on gen7+
   intel/fs: Use SHADER_OPCODE_SEND for varying UBO pulls on gen7+

  src/intel/compiler/brw_eu.h   | 237 +--
  src/intel/compiler/brw_eu_defines.h   |  13 +-
  src/intel/compiler/brw_eu_emit.c  | 376 +++---
  src/intel/compiler/brw_fs.cpp | 371 ++---
  src/intel/compiler/brw_fs.h   |  12 +-
  src/intel/compiler/brw_fs_cse.cpp |   6 +-
  src/intel/compiler/brw_fs_generator.cpp   | 334 +++-
  src/intel/compiler/brw_fs_nir.cpp |  30 +-
  src/intel/compiler/brw_fs_reg_allocate.cpp|   6 +-
  .../compiler/brw_schedule_instructions.cpp|  90 -
  src/intel/compiler/brw_shader.cpp |  10 +-
  src/intel/compiler/brw_shader.h   |   5 +
  src/mesa/drivers/dri/i965/brw_defines.h   |   2 +-
  13 files changed, 778 insertions(+), 714 deletions(-)





Re: [Mesa-dev] [PATCH] intel/compiler: Print message descriptor as immediate source

2018-10-26 Thread Eero Tamminen

Hi,

Toni had a working assembler in early 2017, but he had to stop working
on it because nobody reviewed the patches needed for the disassembler.

Maybe his patch series for the disassembly output still has something
useful for your assembler:

* i965 shader disassembly label support:
https://patchwork.freedesktop.org/series/19946/
* i965 shader disassembly formatting changes:
https://patchwork.freedesktop.org/series/19945/
* Always print message descriptor and SFID for SEND instructions:
https://patchwork.freedesktop.org/patch/223750/

?

- Eero

On 10/25/18 2:25 AM, Sagar Ghuge wrote:

While disassembling send(c) instruction print message descriptor as
immediate source operand along with message descriptor. This allows
assembler to read immediate source operand and set bits accordingly.

Signed-off-by: Sagar Ghuge 
---
  src/intel/compiler/brw_disasm.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_disasm.c b/src/intel/compiler/brw_disasm.c
index 6a7e988641..9c6df9e645 100644
--- a/src/intel/compiler/brw_disasm.c
+++ b/src/intel/compiler/brw_disasm.c
@@ -1606,7 +1606,12 @@ brw_disassemble_inst(FILE *file, const struct 
gen_device_info *devinfo,
   /* show the indirect descriptor source */
   pad(file, 48);
   err |= src1(file, devinfo, inst);
-  }
+ pad(file, 64);
+  } else
+ pad(file, 48);
+
+  /* Print message descriptor as immediate source */
+  fprintf(file, "0x%08"PRIx64, inst->data[1] >> 32);
  
newline(file);

pad(file, 16);
@@ -1615,7 +1620,7 @@ brw_disassemble_inst(FILE *file, const struct 
gen_device_info *devinfo,
fprintf(file, "");
err |= control(file, "SFID", devinfo->gen >= 6 ? gen6_sfid : gen4_sfid,
   sfid, &space);
-
+  string(file, " MsgDesc:");
  
if (brw_inst_src1_reg_file(devinfo, inst) != BRW_IMMEDIATE_VALUE) {

   format(file, " indirect");





Re: [Mesa-dev] [PATCH 0/5] anv: Add a NIR cache

2018-10-16 Thread Eero Tamminen

Hi,

On 10/13/18 3:08 AM, Jason Ekstrand wrote:

This patch series adds a simple NIR shader cache that sits right after
spirv_to_nir and brw_preprocess_nir and before linking.  This should help
alleviate some of the added overhead of link-time optimization since most
of the NIR-level optimization is now cached prior to linking.

I have no numbers to back this series up; just intuition.


Fortunately, approximate numbers for where the bottlenecks are are easy
to come by for shader compilation.

1. Start use-case doing lot of shader compilation (shader-db?)
2. "sudo perf record -a"
3. ^C when shader compilation stops
4. "sudo perf report"

When I profiled GL shader compilation before Mesa switched to NIR, the
linking phase was 2/3 of the cycles taken by the whole shader compilation.

(That is why the shader cache needed to cache linking results to have a
significant impact.)



- Eero


Jason Ekstrand (5):
   anv/pipeline: Move wpos and input attachment lowering to lower_nir
   anv/pipeline: Hash shader modules and spec constants separately
   compiler/types: Serialize/deserialize subpass input types correctly
   anv/pipeline_cache: Add support for caching NIR
   anv/pipeline: Cache the pre-lowered NIR

  src/compiler/glsl_types.cpp   |   4 +-
  src/intel/vulkan/anv_pipeline.c   | 118 ++
  src/intel/vulkan/anv_pipeline_cache.c | 100 ++
  src/intel/vulkan/anv_private.h|  18 
  4 files changed, 204 insertions(+), 36 deletions(-)





Re: [Mesa-dev] [PATCH] anv: Use separate MOCS settings for external BOs

2018-10-03 Thread Eero Tamminen

Hi,

On 10/3/18 12:11 AM, Jason Ekstrand wrote:

On Broadwell and above, we have to use different MOCS settings to allow
the kernel to take over and disable caching when needed for external
buffers.  On Broadwell, this is especially important because the kernel
can't disable eLLC so we have to do it in userspace.  We very badly
don't want to do that on everything so we need separate MOCS for
external and internal BOs.

In order to do this, we add an anv-specific BO flag for "external" and
use that to distinguish between buffers which may be shared with other
processes and/or display and those which are entirely internal.  That,
together with an anv_mocs_for_bo helper lets us choose the right MOCS
settings for each BO use.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99507
Cc: mesa-sta...@lists.freedesktop.org


Tested-By: Eero Tamminen 

This fixes the horrible corruption (especially with modifiers)
in bug 99507. Tested on SKL GT2.


- Eero


Re: [Mesa-dev] [PATCH] util/u_queue: don't inherit thread affinity from parent thread

2018-10-03 Thread Eero Tamminen

Hi,

On 10/3/18 2:00 AM, Marek Olšák wrote:

On Tue, Oct 2, 2018 at 6:36 PM Rob Clark  wrote:

On Tue, Oct 2, 2018 at 6:30 PM Marek Olšák  wrote:

From: Marek Olšák 

[...]

Just curious (and maybe I missed some previous discussion), would this
override taskset?

Asking because when benchmarking on big/little arm SoCs I tend to use
taskset to pin things to either the fast cores or slow cores, to
eliminate a source of uncertainty in the result.  (And I use u_queue
to split of the 2nd half of batch submits, Ie. the part that generates
gmem/tiling cmds and does the kernel submit ioctl).  Would be slightly
annoying to lose the ability to control which group of cores the
u_queue thread runs on.

(But admittedly this is kind of an edge case, so I guess an env var to
override the behavior would be ok.)


I don't know, but I guess it affects it.

pipe_context::set_context_param(ctx,
PIPE_CONTEXT_PARAM_PIN_THREADS_TO_L3_CACHE, L3_group_index); is
similar to what you need.

The ideal option would be to have whatever default behavior is most
desirable on ARM. An env var is the second option.


Use of taskset is not ARM specific.  I've seen nasty kernel scheduler 
CPU core bouncing issues also on other platforms.  Debugging those 
required using taskset for the given game.


Also, to get any kind of reliable CPU utilization figure, one needs to
bind the task to a specific core (or build a kernel with freq-stats
support, which isn't enabled by default in any distro kernels, and have
suitable tooling to parse that data).


So, which games set thread affinity on threads that use Mesa, and how
much of a problem is that, i.e. why is this patch needed?



- Eero


Re: [Mesa-dev] [PATCH 3/4] drirc: Initial blacklist for adaptive sync

2018-09-13 Thread Eero Tamminen

Hi,

On 11.09.2018 19:24, Nicholas Kazlauskas wrote:

Applications that don't present at a predictable rate (ie. not games)
shouldn't have adaptive sync enabled. This list covers some of the
common desktop compositors and some web browsers.


Why wouldn't videos (in video players like vlc & mpv) have a predictable
rate?

Being able to use VRR to get a refresh rate that exactly matches the
video's frame rate (24, 25, 30, 50 etc.), or some multiple of it, should
look much better than the player being forced to interpolate to
something else...
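As a sketch of that point (the helper name and the 48-144 Hz panel range below are assumptions for illustration, not from any real driver), one could pick the highest multiple of the video frame rate that fits the panel's VRR range, so every video frame is shown for a whole number of refresh cycles:

```c
/* Hypothetical helper: choose the highest multiple of the video frame
 * rate that fits a panel's variable-refresh range, so each video frame
 * is displayed for a whole number of refresh cycles (no interpolation).
 */
static int vrr_rate_for_fps(int fps, int vrr_min_hz, int vrr_max_hz)
{
    if (fps <= 0)
        return 0;
    for (int mult = vrr_max_hz / fps; mult >= 1; mult--) {
        int rate_hz = fps * mult;
        if (rate_hz >= vrr_min_hz && rate_hz <= vrr_max_hz)
            return rate_hz;   /* exact multiple found */
    }
    return 0;   /* no exact multiple fits; fall back to fixed refresh */
}
/* e.g. on an assumed 48-144 Hz panel: 24 fps -> 144 Hz,
 * 25 fps -> 125 Hz, 30 fps -> 120 Hz, 50 fps -> 100 Hz */
```

With fixed 60 Hz output, 24 or 25 fps content cannot hit an integer cadence at all, which is exactly the interpolation/judder case described above.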



- Eero


Signed-off-by: Nicholas Kazlauskas 
---
  src/util/00-mesa-defaults.conf | 37 ++
  1 file changed, 37 insertions(+)

diff --git a/src/util/00-mesa-defaults.conf b/src/util/00-mesa-defaults.conf
index a68bc03027..a5fbead7b9 100644
--- a/src/util/00-mesa-defaults.conf
+++ b/src/util/00-mesa-defaults.conf
@@ -21,6 +21,8 @@ Application bugs worked around in this file:
built-ins (specifically gl_VertexID), which causes the vertex shaders to 
fail
to compile.
  
+* Applications that are not suitable for adapative sync are blacklisted here.

+
  TODO: document the other workarounds.
  
  -->

@@ -306,6 +308,41 @@ TODO: document the other workarounds.
  
  
  
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
  
  
  





Re: [Mesa-dev] [PATCH] glsl: fixer lexer for unreachable defines

2018-09-03 Thread Eero Tamminen

Hi,

Seems like a lot of replicated code to deal with a regression from a
few-line patch, but at least it fixed the failing GfxBench ALU2 test. :-)


Tested-By: Eero Tamminen 


- Eero

On 01.09.2018 16:57, Timothy Arceri wrote:

If we have something like:

#ifdef NOT_DEFINED
#define A_MACRO(x) \
if (x)
#endif

The # on the #define is not skipped but the define itself is, so
this then gets recognised as #if.

Until 28a3731e3f this didn't happen because we ended up in
{NONSPACE} where BEGIN INITIAL was called stopping the
problem from happening.

This change makes sure we never call RETURN_TOKEN_NEVER_SKIP for
if/else/endif when processing a define.

Cc: Ian Romanick 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107772
---
  src/compiler/glsl/glcpp/glcpp-lex.l | 60 ++---
  src/compiler/glsl/glcpp/glcpp.h |  1 +
  2 files changed, 38 insertions(+), 23 deletions(-)

diff --git a/src/compiler/glsl/glcpp/glcpp-lex.l 
b/src/compiler/glsl/glcpp/glcpp-lex.l
index fe5845acd4e..f7003da0cc8 100644
--- a/src/compiler/glsl/glcpp/glcpp-lex.l
+++ b/src/compiler/glsl/glcpp/glcpp-lex.l
@@ -289,6 +289,7 @@ HEXADECIMAL_INTEGER 0[xX][0-9a-fA-F]+[uU]?
   * token. */
if (parser->first_non_space_token_this_line) {
BEGIN HASH;
+   yyextra->in_define = false;
}
  
  	RETURN_TOKEN_NEVER_SKIP (HASH_TOKEN);

@@ -336,43 +337,55 @@ HEXADECIMAL_INTEGER   0[xX][0-9a-fA-F]+[uU]?
/* For the pre-processor directives, we return these tokens
 * even when we are otherwise skipping. */
  ifdef {
-   BEGIN INITIAL;
-   yyextra->lexing_directive = 1;
-   yyextra->space_tokens = 0;
-   RETURN_TOKEN_NEVER_SKIP (IFDEF);
+   if (!yyextra->in_define) {
+   BEGIN INITIAL;
+   yyextra->lexing_directive = 1;
+   yyextra->space_tokens = 0;
+   RETURN_TOKEN_NEVER_SKIP (IFDEF);
+   }
  }
  
  ifndef {

-   BEGIN INITIAL;
-   yyextra->lexing_directive = 1;
-   yyextra->space_tokens = 0;
-   RETURN_TOKEN_NEVER_SKIP (IFNDEF);
+   if (!yyextra->in_define) {
+   BEGIN INITIAL;
+   yyextra->lexing_directive = 1;
+   yyextra->space_tokens = 0;
+   RETURN_TOKEN_NEVER_SKIP (IFNDEF);
+   }
  }
  
  if/[^_a-zA-Z0-9] {

-   BEGIN INITIAL;
-   yyextra->lexing_directive = 1;
-   yyextra->space_tokens = 0;
-   RETURN_TOKEN_NEVER_SKIP (IF);
+   if (!yyextra->in_define) {
+   BEGIN INITIAL;
+   yyextra->lexing_directive = 1;
+   yyextra->space_tokens = 0;
+   RETURN_TOKEN_NEVER_SKIP (IF);
+   }
  }
  
  elif/[^_a-zA-Z0-9] {

-   BEGIN INITIAL;
-   yyextra->lexing_directive = 1;
-   yyextra->space_tokens = 0;
-   RETURN_TOKEN_NEVER_SKIP (ELIF);
+   if (!yyextra->in_define) {
+   BEGIN INITIAL;
+   yyextra->lexing_directive = 1;
+   yyextra->space_tokens = 0;
+   RETURN_TOKEN_NEVER_SKIP (ELIF);
+   }
  }
  
  else {

-   BEGIN INITIAL;
-   yyextra->space_tokens = 0;
-   RETURN_TOKEN_NEVER_SKIP (ELSE);
+   if (!yyextra->in_define) {
+   BEGIN INITIAL;
+   yyextra->space_tokens = 0;
+   RETURN_TOKEN_NEVER_SKIP (ELSE);
+   }
  }
  
  endif {

-   BEGIN INITIAL;
-   yyextra->space_tokens = 0;
-   RETURN_TOKEN_NEVER_SKIP (ENDIF);
+   if (!yyextra->in_define) {
+   BEGIN INITIAL;
+   yyextra->space_tokens = 0;
+   RETURN_TOKEN_NEVER_SKIP (ENDIF);
+   }
  }
  
  error[^\r\n]* {

@@ -399,7 +412,8 @@ HEXADECIMAL_INTEGER 0[xX][0-9a-fA-F]+[uU]?
 *and not whitespace). This will generate an error.
 */
  define{HSPACE}* {
-   if (! parser->skipping) {
+   yyextra->in_define = true;
+   if (!parser->skipping) {
BEGIN DEFINE;
yyextra->space_tokens = 0;
RETURN_TOKEN (DEFINE_TOKEN);
diff --git a/src/compiler/glsl/glcpp/glcpp.h b/src/compiler/glsl/glcpp/glcpp.h
index c7e382ed30c..e786b24b132 100644
--- a/src/compiler/glsl/glcpp/glcpp.h
+++ b/src/compiler/glsl/glcpp/glcpp.h
@@ -197,6 +197,7 @@ struct glcpp_parser {
int first_non_space_token_this_line;
int newline_as_space;
int in_control_line;
+   bool in_define;
int paren_count;
int commented_newlines;
skip_node_t *skip_stack;





Re: [Mesa-dev] [PATCH] Bounding box avx2 intrinsic algorithm for openGL/GLES

2018-08-31 Thread Eero Tamminen

Hi,

On 31.08.2018 08:25, kedar.j.kara...@intel.com wrote:

From: "J Karanje, Kedar" 

The feature is enabled by default during make however we need to
add the following to drirc to enable the feature at runtime.


vbo: Main algorithm & code to check for MVP & vertex position location
Build Files: Flags to enable BBOX Code and check AVX version
compiler: Code to recognize simple shader
   (gl_position is a simple function of mvp and vertex)
i965 & util: dri query to check if feature is enabled

vbo: Implements a bounding box algorithm for Mesa. We hook into the default
 drawelements and drawrangelements, get the MVP & vertex position locations
 and the corresponding program, re-create the frustum planes from this data,
 create a box around the object, and check whether the 8 box vertices are
 within the frustum. We drop draw calls that are completely outside the
 view frustum and fall back to sub-boxes for objects that intersect the
 frustum planes.
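As an illustration of the culling test described above (this is a minimal sketch, not code from the patch; the struct names and the plane sign convention are assumptions), a draw can only be dropped when all 8 corners of its bounding box lie behind at least one frustum plane:

```c
#include <stdbool.h>

struct plane { float a, b, c, d; };   /* ax + by + cz + d >= 0: inside */
struct aabb  { float min[3], max[3]; };

/* Sketch of the conservative test: cull only if every corner of the
 * box is behind the same frustum plane; anything intersecting or
 * inside the frustum must still be drawn. */
static bool
bbox_outside_frustum(const struct aabb *box, const struct plane planes[6])
{
    for (int p = 0; p < 6; p++) {
        int out = 0;
        for (int corner = 0; corner < 8; corner++) {
            float x = (corner & 1) ? box->max[0] : box->min[0];
            float y = (corner & 2) ? box->max[1] : box->min[1];
            float z = (corner & 4) ? box->max[2] : box->min[2];
            if (planes[p].a * x + planes[p].b * y +
                planes[p].c * z + planes[p].d < 0.0f)
                out++;
        }
        if (out == 8)
            return true;   /* fully behind this plane: drop the draw */
    }
    return false;          /* inside or intersecting: must draw */
}
```

The sub-box refinement the patch mentions would recurse on boxes for which this returns false but which straddle a plane.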

The current patch has been verified on KBL + Ubuntu 16.04; we noticed
8~10% improvement in GfxBench T-Rex offscreen and ~2% for Manhattan
offscreen. Platforms where AVX2 is not supported shall still see ~6-8%
improvement; the other KPIs were not impacted.


I've tried this with Egypt and T-Rex on KBL GT3e, and I'm not seeing
a performance difference.  What am I missing / what exactly should I
do to get it enabled, see that it's enabled, and see the perf
difference?



Based on empirical data we have set the minimum vertex count to 999 and the
sub-box size to 198; this provides the best results. We have also implemented
some level of caching for the box and frustum-plane coordinates, and
optimized some algorithms to use AVX2 when the target supports it.

Shader classification code is currently in HIR, and we have received review
comments asking to move it to NIR.

Signed-off-by: Aravindan Muthukumar 
Signed-off-by: Yogesh Marathe 
---
  Android.common.mk|   19 +
  configure.ac |   34 +-
  src/compiler/glsl/ast_to_hir.cpp |  168 +++-
  src/compiler/glsl/glsl_parser_extras.cpp |   10 +
  src/compiler/glsl/glsl_parser_extras.h   |7 +
  src/compiler/glsl/linker.cpp |   18 +
  src/intel/common/gen_debug.c |7 +
  src/mesa/Makefile.sources|   11 +
  src/mesa/drivers/dri/i965/brw_context.c  |   17 +
  src/mesa/drivers/dri/i965/intel_screen.c |4 +
  src/mesa/main/bufferobj.c|   19 +
  src/mesa/main/mtypes.h   |   51 +
  src/mesa/program/Android.mk  |1 +
  src/mesa/program/program.c   |3 +
  src/mesa/vbo/vbo_bbox.c  | 1538 ++
  src/mesa/vbo/vbo_bbox.h  |  383 
  src/mesa/vbo/vbo_bbox_cache.c|  195 
  src/mesa/vbo/vbo_context.c   |   11 +-
  src/mesa/vbo/vbo_exec_array.c|   37 +-
  src/util/00-mesa-defaults.conf   |4 +
  src/util/xmlpool/t_options.h |5 +
  21 files changed, 2535 insertions(+), 7 deletions(-)
  mode change 100644 => 100755 src/compiler/glsl/ast_to_hir.cpp
  create mode 100644 src/mesa/vbo/vbo_bbox.c
  create mode 100644 src/mesa/vbo/vbo_bbox.h
  create mode 100644 src/mesa/vbo/vbo_bbox_cache.c

diff --git a/Android.common.mk b/Android.common.mk
index aa1b266..efd6792 100644
--- a/Android.common.mk
+++ b/Android.common.mk
@@ -21,6 +21,8 @@
  # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
  # DEALINGS IN THE SOFTWARE.
  
+MESA_BBOX_ENABLE=true

+
  ifeq ($(LOCAL_IS_HOST_MODULE),true)
  LOCAL_CFLAGS += -D_GNU_SOURCE
  endif
@@ -80,6 +82,10 @@ LOCAL_CFLAGS += \
-fno-trapping-math \
-Wno-sign-compare
  
+ifeq ($(MESA_BBOX_ENABLE),true)

+LOCAL_CFLAGS += -DMESA_BBOX_OPT
+endif
+
  LOCAL_CPPFLAGS += \
-D__STDC_CONSTANT_MACROS \
-D__STDC_FORMAT_MACROS \
@@ -87,6 +93,10 @@ LOCAL_CPPFLAGS += \
-Wno-error=non-virtual-dtor \
-Wno-non-virtual-dtor
  
+ifeq ($(MESA_BBOX_ENABLE),true)

+LOCAL_CPPFLAGS += -DMESA_BBOX_OPT
+endif
+
  # mesa requires at least c99 compiler
  LOCAL_CONLYFLAGS += \
-std=c99
@@ -98,6 +108,15 @@ ifeq ($(filter 5 6 7 8 9, $(MESA_ANDROID_MAJOR_VERSION)),)
  LOCAL_CFLAGS += -DHAVE_TIMESPEC_GET
  endif
  
+ifeq ($(MESA_BBOX_ENABLE),true)

+#if defined(CONFIG_AS_AVX)
+LOCAL_CONLYFLAGS += -mavx
+#elif
+LOCAL_CONLYFLAGS += -msse4.1
+#endif
+endif
+
+
  ifeq ($(strip $(MESA_ENABLE_ASM)),true)
  ifeq ($(TARGET_ARCH),x86)
  LOCAL_CFLAGS += \
diff --git a/configure.ac b/configure.ac
index 4d9d9e5..dcdbcf3 100644
--- a/configure.ac
+++ b/configure.ac
@@ -278,7 +278,8 @@ _SAVE_LDFLAGS="$LDFLAGS"
  _SAVE_CPPFLAGS="$CPPFLAGS"
  
  dnl Compiler macros

-DEFINES="-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS"
+DEFINES="-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_

Re: [Mesa-dev] [PATCH 3/3] i965/screen: Allow modifiers on sRGB formats

2018-08-29 Thread Eero Tamminen

Hi,

On 29.08.2018 01:22, Jason Ekstrand wrote:

This effectively reverts a26693493570a9d0f0fba1be617e01ee7bfff4db which
was a misguided attempt at protecting intel_query_dma_buf_modifiers from
invalid formats.  Unfortunately, in some internal EGL cases, we can get
an SRGB format validly in this function.


While E2E RBC got completely lost in the SynMark EGL tests, there was
also some perf impact (at most half of the E2E RBC impact) in GLX with
the onscreen versions of the GfxBench ALU2, Tessellation and Manhattan
3.0 tests.

This patch series (with the "continue" fix you mentioned in a later
comment) fixes both the SynMark & GfxBench regressions.


Tested-By: Eero Tamminen 


- Eero


Rejecting such formats caused
us to not allow CCS in some cases where we should have been allowing it.

There's some question of whether or not we really should be using SRGB
"fourcc" formats that aren't actually in drm_fourcc.h but there's not
much harm in allowing them through here.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107223
Fixes: a26693493570 "i965/screen: Return false for unsupported..."
---
  src/mesa/drivers/dri/i965/intel_screen.c | 14 +++---
  1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index eaf5a3b9feb..ac1938f6204 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1274,9 +1274,9 @@ static bool
  intel_image_format_is_supported(const struct gen_device_info *devinfo,
  const struct intel_image_format *fmt)
  {
-   if (fmt->fourcc == __DRI_IMAGE_FOURCC_SARGB ||
-   fmt->fourcc == __DRI_IMAGE_FOURCC_SABGR)
-  return false;
+   /* Currently, all formats with an intel_image_format are available on all
+* platforms so there's really nothing to check there.
+*/
  
  #ifndef NDEBUG

 if (fmt->nplanes == 1) {
@@ -1302,6 +1302,14 @@ intel_query_dma_buf_formats(__DRIscreen *_screen, int 
max,
 int num_formats = 0, i;
  
 for (i = 0; i < ARRAY_SIZE(intel_image_formats); i++) {

+  /* These two formats are valid DRI formats but do not exist in
+   * drm_fourcc.h in the Linux kernel.  We don't want to accidentally
+   * advertise them through the EGL layer.
+   */
+  if (intel_image_formats[i].fourcc == __DRI_IMAGE_FOURCC_SARGB ||
+  intel_image_formats[i].fourcc == __DRI_IMAGE_FOURCC_SABGR)
+ return false;
+
if (!intel_image_format_is_supported(&screen->devinfo,
 &intel_image_formats[i]))
   continue;





Re: [Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering

2018-08-28 Thread Eero Tamminen

Hi,

On 28.08.2018 19:15, Rogovin, Kevin wrote:

On the question of tests: if people want, I can adapt the piglit test for
ARB_fragment_shader_interlock to this INTEL one. In general, I have an
app/library that uses the extension, and testing of that definitely works on
i965 (which should be utterly unsurprising since the produced assembly is
identical).


Did you mean this library:
https://01.org/fast-ui-draw
?


- Eero


Indeed, the current Mesa implementation of ARB_fragment_shader_interlock is
as flexible as the INTEL one, since the compiler accepts code with the
begin/end interlock anywhere; the only backend that supports it, i965, can
easily accept that. I view the INTEL_fragment_shader_ordering implementation
in a similar light as NV_fragment_shader_interlock: it does not really do
anything new, but signals to an application that the Mesa/i965
implementation allows more (and does it correctly) than the ARB/NV
extensions restrict to.
-Kevin





Re: [Mesa-dev] [PATCH v2 2/2] i965, anv: Use INTEL_DEBUG for disk_cache driver flags

2018-07-23 Thread Eero Tamminen

Hi,

Should this fix also:
https://bugs.freedesktop.org/show_bug.cgi?id=106382
?

- Eero

On 23.07.2018 07:27, Jordan Justen wrote:

Since various options within INTEL_DEBUG could impact code generation,
we need to set the disk cache driver_flags parameter based on the
INTEL_DEBUG flags in use.

An example that will affect the program generated by i965 is the
INTEL_DEBUG=nocompact option.

The DEBUG_DISK_CACHE_MASK value is added to mask the settings of
INTEL_DEBUG that can affect program generation.

v2:
  * Use driver_flags (Tim)
  * Also update Anvil (Jason)

Signed-off-by: Jordan Justen 
Reviewed-by: Lionel Landwerlin  (v1)
---
  src/intel/common/gen_debug.h   | 5 +
  src/intel/vulkan/anv_device.c  | 3 ++-
  src/mesa/drivers/dri/i965/brw_disk_cache.c | 3 ++-
  3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/intel/common/gen_debug.h b/src/intel/common/gen_debug.h
index f6c44eeb912..aa9f3cf80d7 100644
--- a/src/intel/common/gen_debug.h
+++ b/src/intel/common/gen_debug.h
@@ -84,6 +84,11 @@ extern uint64_t INTEL_DEBUG;
  #define DEBUG_COLOR   (1ull << 40)
  #define DEBUG_REEMIT  (1ull << 41)
  
+/* These flags may affect program generation */

+#define DEBUG_DISK_CACHE_MASK \
+   (DEBUG_SHADER_TIME | DEBUG_NO16 | DEBUG_NO_DUAL_OBJECT_GS | DEBUG_NO8 | \
+   DEBUG_SPILL_FS | DEBUG_SPILL_VEC4 | DEBUG_NO_COMPACTION | DEBUG_DO32)
+
  #ifdef HAVE_ANDROID_PLATFORM
  #define LOG_TAG "INTEL-MESA"
  #if ANDROID_API_LEVEL >= 26
diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 247ba641336..97a71563b8a 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -286,7 +286,8 @@ anv_physical_device_init_disk_cache(struct 
anv_physical_device *device)
 char timestamp[41];
 _mesa_sha1_format(timestamp, device->driver_build_sha1);
  
-   device->disk_cache = disk_cache_create(renderer, timestamp, 0);

+   const uint64_t driver_flags = INTEL_DEBUG & DEBUG_DISK_CACHE_MASK;
+   device->disk_cache = disk_cache_create(renderer, timestamp, driver_flags);
  #else
 device->disk_cache = NULL;
  #endif
diff --git a/src/mesa/drivers/dri/i965/brw_disk_cache.c 
b/src/mesa/drivers/dri/i965/brw_disk_cache.c
index a678c355b9d..8f1b064fd61 100644
--- a/src/mesa/drivers/dri/i965/brw_disk_cache.c
+++ b/src/mesa/drivers/dri/i965/brw_disk_cache.c
@@ -393,6 +393,7 @@ brw_disk_cache_init(struct intel_screen *screen)
 char timestamp[41];
 _mesa_sha1_format(timestamp, id_sha1);
  
-   screen->disk_cache = disk_cache_create(renderer, timestamp, 0);

+   const uint64_t driver_flags = INTEL_DEBUG & DEBUG_DISK_CACHE_MASK;
+   screen->disk_cache = disk_cache_create(renderer, timestamp, driver_flags);
  #endif
  }





Re: [Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

2018-07-11 Thread Eero Tamminen

Hi,

On 06.07.2018 00:28, Jason Ekstrand wrote:
On Thu, Jul 5, 2018 at 2:18 PM, Jason Ekstrand > wrote:

[...]

>> Optimizing for the latter case is an essentially
>> heuristic assumption that needs to be verified experimentally.  Have 
you
>> tested the effect of this pass on non-DX workloads extensively?
>>
>
> Yes, it is a trade-off.  No, I have not done particularly extensive
> testing.  We do, however, know of non-DXVK workloads that would 
benefit
> from this.  I believe Manhattan is one such example though I have not 
yet
> benchmarked it.
>

You should grab some numbers then to make sure there are no
regressions...


I'm working on that.  Unfortunately the perf system is giving me
trouble so I don't have the numbers yet.

But keep in mind that the i965 scheduler is already
performing a similar optimization (locally, but with cycle-count
information).  This will only help over the existing
optimization if the
shaders that represent a bottleneck in Manhattan have sufficient
control
flow for the basic block boundaries to represent a problem to the
(local) scheduler.


I'm not sure about the manhattan shader but the Skyrim shader does
have control flow which the discard has to get moved above.


I have results from the perf system now and somehow this pass makes 
manhattan noticeably worse.  I'll look into that.


Note: All the more complex GfxBench tests use discard:
Egypt, T-Rex, Manhattan, CarChase, AztecRuins...

(Most of the other benchmarks we're running don't use it.)


- Eero


Re: [Mesa-dev] Loop unrolling and if statement opts

2018-07-11 Thread Eero Tamminen

Hi,

On 11.07.2018 12:00, Timothy Arceri wrote:

On 11/07/18 18:20, Eero Tamminen wrote:

Have you considered partial loop unrolling support?

I.e. when the loop counter is known but too high for a full unroll, do
partial loop unrolling (e.g. unroll 4x) and divide the loop counter by
the same amount (if it doesn't divide evenly, the remainder needs to be
unrolled outside the loop).


This is supported e.g. by Intel Windows compiler.



Do you have any examples of apps this helps?


Sorry, no. This was found a while ago when doing synthetic tests for 
things that are handled by compilers on the CPU side (GCC, LLVM...).




If I remember correctly there are very few (if any?) shaders in shader-db
that are not unrolled due to the limit.


Loop unrolling should probably have some limits based on shader cache
size and how many instructions the loop body has, not on the loop
counter, but only the backend has that info...



- Eero


There was a measurable impact from the unroll limits on the Talos
benchmark for RADV. I guess it might be interesting to try partial
unrolling with that game; it would be good to know where else it might
help.




 - Eero

On 11.07.2018 09:48, Timothy Arceri wrote:

This series started out as me trying to unroll some useless loops I
spotted in some shaders from DXVK games (see patch 10), but I found
some other issues and improvements along the way.

The biggest winner seems like it could be the dolphin uber shaders on
i965 (on radeonsi the shaders don't seem to have spilling issues).
The loops in the uber shaders that are unrolled are those used as
wrappers around switchs by GLSL IR.

shader-db results for the series on IVB (note as the loops that are
unrolled only have a single iteration I enabled shader-db reporting
on shaders where loops are unrolled):

total instructions in shared programs: 10018187 -> 10016468 (-0.02%)
instructions in affected programs: 104080 -> 102361 (-1.65%)
helped: 36
HURT: 15

total cycles in shared programs: 220065064 -> 154529655 (-29.78%)
cycles in affected programs: 126063017 -> 60527608 (-51.99%)
helped: 51
HURT: 0

total loops in shared programs: 2515 -> 2308 (-8.23%)
loops in affected programs: 903 -> 696 (-22.92%)
helped: 51
HURT: 0

total spills in shared programs: 4370 -> 4124 (-5.63%)
spills in affected programs: 1397 -> 1151 (-17.61%)
helped: 9
HURT: 12

total fills in shared programs: 4581 -> 4419 (-3.54%)
fills in affected programs: 2201 -> 2039 (-7.36%)
helped: 9
HURT: 15

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev




Re: [Mesa-dev] Loop unrolling and if statement opts

2018-07-11 Thread Eero Tamminen

Hi,

Have you considered partial loop unrolling support?

I.e. when the loop counter is known but too high for a full unroll, do 
partial loop unrolling (e.g. unroll 4x) and divide the loop counter 
by the same amount (if it doesn't divide evenly, the remainder needs 
to be unrolled outside the loop).
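As a hand-written C illustration of the transformation (the trip count of 10 and the 4x factor are example values):

```c
/* Original loop with a known trip count of 10. */
float sum_original(const float *v)
{
   float sum = 0.0f;
   for (int i = 0; i < 10; i++)
      sum += v[i];
   return sum;
}

/* The same loop partially unrolled 4x by hand: the main loop now runs
 * 10 / 4 = 2 iterations doing 4 additions each, and the 10 % 4 = 2
 * leftover iterations are unrolled (peeled) after the loop. */
float sum_unrolled_4x(const float *v)
{
   float sum = 0.0f;
   int i;
   for (i = 0; i + 4 <= 10; i += 4) {
      sum += v[i + 0];
      sum += v[i + 1];
      sum += v[i + 2];
      sum += v[i + 3];
   }
   for (; i < 10; i++) /* remainder handled outside the unrolled loop */
      sum += v[i];
   return sum;
}
```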


This is supported e.g. by the Intel Windows compiler.


- Eero

On 11.07.2018 09:48, Timothy Arceri wrote:

This series started out as me trying to unroll some useless loops I
spotted in some shaders from DXVK games (see patch 10), but I found
some other issues and improvements along the way.

The biggest winner seems like it could be the dolphin uber shaders on
i965 (on radeonsi the shaders don't seem to have spilling issues).
The loops in the uber shaders that are unrolled are those used as
wrappers around switches by GLSL IR.

shader-db results for the series on IVB (note as the loops that are
unrolled only have a single iteration I enabled shader-db reporting
on shaders where loops are unrolled):

total instructions in shared programs: 10018187 -> 10016468 (-0.02%)
instructions in affected programs: 104080 -> 102361 (-1.65%)
helped: 36
HURT: 15

total cycles in shared programs: 220065064 -> 154529655 (-29.78%)
cycles in affected programs: 126063017 -> 60527608 (-51.99%)
helped: 51
HURT: 0

total loops in shared programs: 2515 -> 2308 (-8.23%)
loops in affected programs: 903 -> 696 (-22.92%)
helped: 51
HURT: 0

total spills in shared programs: 4370 -> 4124 (-5.63%)
spills in affected programs: 1397 -> 1151 (-17.61%)
helped: 9
HURT: 12

total fills in shared programs: 4581 -> 4419 (-3.54%)
fills in affected programs: 2201 -> 2039 (-7.36%)
helped: 9
HURT: 15



Re: [Mesa-dev] [PATCH 6/7] mesa/util: add allow_glsl_relaxed_es driconfig override

2018-06-08 Thread Eero Tamminen

Hi,

On 08.06.2018 15:19, Timothy Arceri wrote:

This relaxes a number of ES shader restrictions allowing shaders
to follow more desktop GLSL like rules.

This initial implementation relaxes the following:

  - allows linking ES shaders with desktop shaders
  - allows mismatching precision qualifiers
  - always enables standard derivative builtins

These relaxations allow Google Earth VR shaders to compile.


The fixes should rather go into the apps than as bug workarounds into Mesa.

Has the bug been reported to the Google Earth VR maintainers, and have 
they stated that they aren't going to fix the issues?


If different shader stages have different precision in a variable that's 
actually being used, that's a real bug in the application.


(There are a lot of apps on Android that have mismatches on unused/dead 
variables, but for those the mismatches don't matter in practice.)



- Eero


---
  src/compiler/glsl/builtin_functions.cpp   |  3 ++-
  src/compiler/glsl/linker.cpp  | 22 +++
  .../auxiliary/pipe-loader/driinfo_gallium.h   |  1 +
  src/gallium/include/state_tracker/st_api.h|  1 +
  src/gallium/state_trackers/dri/dri_screen.c   |  2 ++
  src/mesa/main/mtypes.h|  6 +
  src/mesa/state_tracker/st_extensions.c|  3 +++
  src/util/xmlpool/t_options.h  |  5 +
  8 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/src/compiler/glsl/builtin_functions.cpp 
b/src/compiler/glsl/builtin_functions.cpp
index e1ee9943172..de206b8fd71 100644
--- a/src/compiler/glsl/builtin_functions.cpp
+++ b/src/compiler/glsl/builtin_functions.cpp
@@ -446,7 +446,8 @@ fs_oes_derivatives(const _mesa_glsl_parse_state *state)
  {
 return state->stage == MESA_SHADER_FRAGMENT &&
(state->is_version(110, 300) ||
-   state->OES_standard_derivatives_enable);
+   state->OES_standard_derivatives_enable ||
+   state->ctx->Const.AllowGLSLRelaxedES);
  }
  
  static bool

diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
index f060c5316fa..3159af181f1 100644
--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -894,7 +894,7 @@ validate_intrastage_arrays(struct gl_shader_program *prog,
   * Perform validation of global variables used across multiple shaders
   */
  static void
-cross_validate_globals(struct gl_shader_program *prog,
+cross_validate_globals(struct gl_context *ctx, struct gl_shader_program *prog,
 struct exec_list *ir, glsl_symbol_table *variables,
 bool uniforms_only)
  {
@@ -1115,7 +1115,8 @@ cross_validate_globals(struct gl_shader_program *prog,
   /* Check the precision qualifier matches for uniform variables on
* GLSL ES.
*/
- if (prog->IsES && !var->get_interface_type() &&
+ if (!ctx->Const.AllowGLSLRelaxedES &&
+ prog->IsES && !var->get_interface_type() &&
   existing->data.precision != var->data.precision) {
  if ((existing->data.used && var->data.used) || prog->data->Version 
>= 300) {
 linker_error(prog, "declarations for %s `%s` have "
@@ -1168,15 +1169,16 @@ cross_validate_globals(struct gl_shader_program *prog,
   * Perform validation of uniforms used across multiple shader stages
   */
  static void
-cross_validate_uniforms(struct gl_shader_program *prog)
+cross_validate_uniforms(struct gl_context *ctx,
+struct gl_shader_program *prog)
  {
 glsl_symbol_table variables;
 for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
if (prog->_LinkedShaders[i] == NULL)
   continue;
  
-  cross_validate_globals(prog, prog->_LinkedShaders[i]->ir, &variables,

- true);
+  cross_validate_globals(ctx, prog, prog->_LinkedShaders[i]->ir,
+ &variables, true);
 }
  }
  
@@ -2202,7 +2204,8 @@ link_intrastage_shaders(void *mem_ctx,

 for (unsigned i = 0; i < num_shaders; i++) {
if (shader_list[i] == NULL)
   continue;
-  cross_validate_globals(prog, shader_list[i]->ir, &variables, false);
+  cross_validate_globals(ctx, prog, shader_list[i]->ir, &variables,
+ false);
 }
  
 if (!prog->data->LinkStatus)

@@ -4799,7 +4802,8 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
min_version = MIN2(min_version, prog->Shaders[i]->Version);
max_version = MAX2(max_version, prog->Shaders[i]->Version);
  
-  if (prog->Shaders[i]->IsES != prog->Shaders[0]->IsES) {

+  if (!ctx->Const.AllowGLSLRelaxedES &&
+  prog->Shaders[i]->IsES != prog->Shaders[0]->IsES) {
   linker_error(prog, "all shaders must use same shading "
"language version\n");
   goto done;
@@ -4817,7 +4821,7 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
 /* In desktop GLSL, d

Re: [Mesa-dev] Mesa logo animation (was: Move the Mesa Website to Sphinx)

2018-06-08 Thread Eero Tamminen

Hi,

On 08.06.2018 15:06, robdcl...@gmail.com wrote:

On Fri, Jun 8, 2018 at 3:02 AM, Jordan Justen  wrote:

On Thu, Jun 7, 2018 at 2:56 AM Eero Tamminen  wrote:

...

For Mesa, WebGL could be a more fitting implementation than SVG though...


https://github.com/gears3d/gears3d.github.io/blob/master/webgl10.js

One comment I would have for any animation on the main pages (as
opposed to a separate 'easter egg' page), it probably should be
significantly slower moving than the traditional 70 degrees / second.
The faster animation would be distracting on the main pages.


so one idea, which I think isn't too over the top, is to have the
static mesa-gears logo in top corner, but clicking on it starts/stops
the animation (just toggle between static and animated svg, I guess?)


Typically the website logo leads back to the main page of the site, so 
maybe that functionality should be only on the main page?


(Easter eggs are supposed to be slightly harder to find, right?)


If it's done with JS / SVG, maybe it could also fetch the latest Mesa 
release version number from some file on the site, and show that e.g. on 
the gear(s), rotating with them.



- Eero


Re: [Mesa-dev] [PATCH 00/16] Move the Mesa Website to Sphinx

2018-06-07 Thread Eero Tamminen

Hi,

On 07.06.2018 12:01, Erik Faye-Lund wrote:

Just as a fun toy, I decided to give an animated SVG "variation" of
this a go myself:

https://codepen.io/kusma/pen/vrXppL

The actual SVG can be found here:

https://gitlab.freedesktop.org/snippets/492

The gears were generated by this python script, based on the glxgears
source code:

https://gitlab.freedesktop.org/snippets/491

Now, dropping this onto the black background doesn't work that well,
as it gets a bit bland, so it's probably better to add back the colors
then.

Also, I'm not really sure if animation is a good idea or not.


Maybe it could be a link target for the static logo?

(Kind of website "easter egg").



But I definitely think logos should be vector rather than raster ;)


For Mesa, WebGL would be a more fitting implementation than SVG though...


- Eero



On Sun, Jun 3, 2018 at 1:49 AM, Jason Ekstrand  wrote:

With all this talk of fancy looking new websites, I decided to take a crack
at being artsy and coming up with a fancy-looking logo to put on it.  Here's
a first rough draft:

https://people.freedesktop.org/~jekstrand/gears-logo-bold.png

It's not great but it turned out better than I expected.  That version looks
good on a light background but it could probably be tweaked to look good on
dark/black if we want.

--Jason



On Fri, Jun 1, 2018 at 2:46 AM, Eric Engestrom 
wrote:


On Thursday, 2018-05-31 21:33:16 -0400,
mesa-dev-boun...@lists.freedesktop.org wrote:

On Thu, May 31, 2018 at 8:31 PM, Jordan Justen
 wrote:

On 2018-05-31 16:06:13, Laura Ekstrand wrote:

A little bit of messing around this afternoon, led to this:

https://drive.google.com/file/d/1rteTv9415XEMN1E11fyN0l3hTUCCbgwz/view?usp=sharing


Nice. That confirms we can easily get back to something closer to
the older look in short order. (Unless the consensus is to actually
stay with the generic look.)


The gears are from Font Awesome, which comes with the RTD format.


It'd be nice to get the icon back, but the gear glyph is an
improvement compared to the house glyph.

Anyway, I stick by what I said below too in reply to Eric about
withdrawing my feedback.


my $0.02 (which is probably worth even less when it comes to the topic
of aesthetics): Laura's new proposal looks much nicer than generic rtd,
which looks better than the current site.

I mostly don't want to get a much needed infrastructure/process
improvement derailed with a bikeshed discussion about themes.  Nothing
against playing with theme options and seeing what people (with better
taste than me) like, just don't want that to be something that derails
the bigger picture ;-)


Entirely agreed. I also think the old style was distinctive and had
become part of the "personality" of our website, and I would like the
new website to look similar, but this is a minor detail that we can
trivially alter later.

For now let's focus on the infrastructure changes, discussing the style
can be done in parallel or later, but shouldn't block this.



BR,
-R




Thanks,

-Jordan


On Thu, May 31, 2018 at 1:55 PM, Jordan Justen

wrote:


On 2018-05-31 12:27:04, Eric Anholt wrote:

Jordan Justen  writes:


On 2018-05-24 17:37:09, Laura Ekstrand wrote:

A few of the commits are quite large and awaiting list approval.  I
suggest that you take a look at the code here:
https://gitlab.freedesktop.org/ldeks/mesa/tree/website1_75,
and the new website here: https://mesa-test.freedesktop.org/index.html


I think the theme should be changed to look somewhat close to the
current mesa3d.org website before changing the main site.  (Color
scheme and missing icons.)

Right now the test website looks like a million other sphinx
websites,
including every readthedocs book.

Based on http://www.sphinx-doc.org/en/master/ and
https://www.python.org/, I would say sphinx gives a lot of
opportunities for a custom look.


I think looking like a generic other sphinx website is a step forward,
and I look forward to this series landing.  No need for continuity with
the old theme.


I think it is 1 step forward, 1 step back. As no else appears to
agree, I'll withdraw my feedback.

-Jordan












Re: [Mesa-dev] [PATCH 00/53] intel/fs: SIMD32 support for fragment shaders

2018-06-01 Thread Eero Tamminen

Hi,

On 30.05.2018 17:30, Jason Ekstrand wrote:

On May 30, 2018 06:45:29 Eero Tamminen  wrote:

On 29.05.2018 18:58, Eero Tamminen wrote:

On 25.05.2018 00:55, Jason Ekstrand wrote:

This patch series adds back-end compiler support for SIMD32 fragment
shaders.  Support is added and everything works but it's currently hidden
behind INTEL_DEBUG=do32.  We know that it improves performance in some
cases but we do not yet have a good enough heuristic to start turning it on
by default.  The objective of this series is just to get the compiler
infrastructure landed so that it stops bit-rotting in Curro's branch.


Tested v3 on BXT & SKL.  Everything seems to work fine.


Everything works fine also on GEN8 (BSW & BDW GT2), but half the tests
invoke GPU hangs on GEN7 (BYT & HSW GT2).


That problem is known.  It's caused by using SIMD32 shaders for fast 
clears.  The SIMD32 replicated clear shaders were added at the last 
minute and didn't get good enough testing before sending out the series. 
We can either drop those two patches and modify the last one to not do 
SIMD32 when use_replicated_clear is set, or I have another patch that 
just disables SIMD32 for fast clears.


AFAIK plain copy & write shaders (like clear) don't benefit from SIMD32. 
 They are 100% bottlenecked by input/output bandwidth already with 
SIMD16, so the instruction scheduling latency improvement can't help.  At 
worst SIMD32 can make them slower, if it causes extra cache thrashing.



- Eero


One option would be to support SIMD32 just for GEN8+.



Tested-by Eero Tamminen 



Figuring out a good heuristic is left as an exercise to the reader. :-)


A simple heuristic that just enables SIMD32 for everything that isn't an
MRT shader gives nice perf improvements on BXT J4205:
* +30% GfxBench ALU2
* +25% SynMark PSPom
* +10% GpuTest Julia32
* +9% GfxBench CarChase
* +7% GfxBench Manhattan 3.0
* +3-7% GLB T-Rex, SynMark ShMapVsm, GpuTest Triangle
* +1-3% GfxBench Manhattan 3.1 & T-Rex, Unigine Heaven, GpuTest FurMark
* -1-2% GfxBench Aztec Ruins, MemBW Write, SynMark DeferredAA, Fill*,
VSInstancing & ZBuffer
* -2-3% GLB 2.7 Fill
* -4-5% MemBW Blend

On SKL, perf differences are smaller.
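That "anything but an MRT shader" rule amounts to something like the following sketch (hypothetical code, not the actual i965 logic; the parameter name is made up):

```c
#include <stdbool.h>

/* Hypothetical sketch of the simple heuristic described above: allow
 * 32-wide fragment shader dispatch whenever the shader writes at most
 * one render target, i.e. is not a multiple-render-target (MRT)
 * shader. */
bool allow_simd32(unsigned nr_render_targets_written)
{
   return nr_render_targets_written <= 1;
}
```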


On GEN8, the improvements are smaller and regressions larger with
the same heuristic.

The main difference with the 12EU single-channel BSW is a -15% regression
in perf of SynMark FillTexMulti, i.e. sampling 8 textures and writing
out their average value.  With single-channel memory, increased memory
latency causes a lot more thrashing with SIMD32 when many textures are
being sampled close together.



SIMD32 can cause write bound tests to thrash, which is visible as a perf
regression in the fully write bound tests above (that's also the reason
why SIMD32 is good to disable with MRT shaders).

As to reads, SIMD32 improves cache locality until it starts thrashing.
In the above GfxBench tests, with the amount of texture sampling they do,
this shows in HW counters as increased texture cache misses (thrashing),
but fewer L3 misses (better locality).  Along with (more important) better
latency compensation, these explain why SIMD32 improves performance in
them.


More advanced heuristics that try to avoid the SIMD32 performance
regressions unfortunately also get rid of a clear part of the above
improvements.  Such heuristics would need an improved instruction scheduler

Heuristics for things affecting texture fetch latencies would help, like
how many fetches there are, to how many different textures, and how close
together they are, vs. how large the caches are and how fast the RAM is.


- Eero


that provides feedback on which shaders have latency issues where SIMD32
would help.

(A potential run-time heuristic would be disabling SIMD32 when too
large textures are bound for draw.)


   - Eero


Francisco Jerez (34):
  intel/eu: Remove brw_codegen::compressed_stack.
  intel/fs: Rename a local variable so it doesn't shadow component()
  intel/fs: Use the ATTR file for FS inputs
  intel/fs: Replace the CINTERP opcode with a simple MOV
  intel/fs: Add explicit last_rt flag to fb writes orthogonal to eot.
  intel/fs: Fix Gen4-5 FB write AA data payload munging for non-EOT
    writes.
  intel/eu: Return new instruction to caller from brw_fb_WRITE().
  intel/fs: Fix fs_inst::flags_written() for Gen4-5 FB writes.
  intel/fs: Fix implied_mrf_writes() for headerless FB writes.
  intel/fs: Remove program key argument from generator.
  intel/fs: Disable SIMD32 dispatch on Gen4-6 with control flow
  intel/fs: Disable SIMD32 dispatch for fragment shaders with discard.
  intel/eu: Fix pixel interpolator queries for SIMD32.
  intel/fs: Fix codegen of FS_OPCODE_SET_SAMPLE_ID for SIMD32.
  intel/fs: Don't enable dual source blend if no outputs are written
  intel/fs: Fix FB write message control codegen for SIMD32.
  intel/fs: Fix logical FB write lowering for SIMD32
  intel/fs: Fix FB read header setup for SIMD32.
  intel/fs

Re: [Mesa-dev] [PATCH 00/53] intel/fs: SIMD32 support for fragment shaders

2018-05-30 Thread Eero Tamminen

Hi,

On 29.05.2018 18:58, Eero Tamminen wrote:

On 25.05.2018 00:55, Jason Ekstrand wrote:

This patch series adds back-end compiler support for SIMD32 fragment
shaders.  Support is added and everything works but it's currently hidden
behind INTEL_DEBUG=do32.  We know that it improves performance in some
cases but we do not yet have a good enough heuristic to start turning it on
by default.  The objective of this series is just to get the compiler
infrastructure landed so that it stops bit-rotting in Curro's branch.


Tested v3 on BXT & SKL.  Everything seems to work fine.


Everything works fine also on GEN8 (BSW & BDW GT2), but half the tests
invoke GPU hangs on GEN7 (BYT & HSW GT2).

One option would be to support SIMD32 just for GEN8+.



Tested-by Eero Tamminen 



Figuring out a good heuristic is left as an exercise to the reader. :-)


A simple heuristic that just enables SIMD32 for everything that isn't an
MRT shader gives nice perf improvements on BXT J4205:
* +30% GfxBench ALU2
* +25% SynMark PSPom
* +10% GpuTest Julia32
* +9% GfxBench CarChase
* +7% GfxBench Manhattan 3.0
* +3-7% GLB T-Rex, SynMark ShMapVsm, GpuTest Triangle
* +1-3% GfxBench Manhattan 3.1 & T-Rex, Unigine Heaven, GpuTest FurMark
* -1-2% GfxBench Aztec Ruins, MemBW Write, SynMark DeferredAA, Fill*, 
VSInstancing & ZBuffer

* -2-3% GLB 2.7 Fill
* -4-5% MemBW Blend


On SKL, perf differences are smaller.


On GEN8, the improvements are smaller and regressions larger with
the same heuristic.

The main difference with the 12EU single-channel BSW is a -15% regression
in perf of SynMark FillTexMulti, i.e. sampling 8 textures and writing
out their average value.  With single-channel memory, increased memory
latency causes a lot more thrashing with SIMD32 when many textures are
being sampled close together.



SIMD32 can cause write bound tests to thrash, which is visible as a perf
regression in the fully write bound tests above (that's also the reason
why SIMD32 is good to disable with MRT shaders).

As to reads, SIMD32 improves cache locality until it starts thrashing.
In the above GfxBench tests, with the amount of texture sampling they do,
this shows in HW counters as increased texture cache misses (thrashing),
but fewer L3 misses (better locality).  Along with (more important) better
latency compensation, these explain why SIMD32 improves performance in
them.


More advanced heuristics that try to avoid the SIMD32 performance
regressions unfortunately also get rid of a clear part of the above
improvements.  Such heuristics would need an improved instruction scheduler


Heuristics for things affecting texture fetch latencies would help, like
how many fetches there are, to how many different textures, and how close
together they are, vs. how large the caches are and how fast the RAM is.


- Eero


that provides feedback on which shaders have latency issues where SIMD32
would help.

(A potential run-time heuristic would be disabling SIMD32 when too
large textures are bound for draw.)


 - Eero


Francisco Jerez (34):
   intel/eu: Remove brw_codegen::compressed_stack.
   intel/fs: Rename a local variable so it doesn't shadow component()
   intel/fs: Use the ATTR file for FS inputs
   intel/fs: Replace the CINTERP opcode with a simple MOV
   intel/fs: Add explicit last_rt flag to fb writes orthogonal to eot.
   intel/fs: Fix Gen4-5 FB write AA data payload munging for non-EOT
 writes.
   intel/eu: Return new instruction to caller from brw_fb_WRITE().
   intel/fs: Fix fs_inst::flags_written() for Gen4-5 FB writes.
   intel/fs: Fix implied_mrf_writes() for headerless FB writes.
   intel/fs: Remove program key argument from generator.
   intel/fs: Disable SIMD32 dispatch on Gen4-6 with control flow
   intel/fs: Disable SIMD32 dispatch for fragment shaders with discard.
   intel/eu: Fix pixel interpolator queries for SIMD32.
   intel/fs: Fix codegen of FS_OPCODE_SET_SAMPLE_ID for SIMD32.
   intel/fs: Don't enable dual source blend if no outputs are written
   intel/fs: Fix FB write message control codegen for SIMD32.
   intel/fs: Fix logical FB write lowering for SIMD32
   intel/fs: Fix FB read header setup for SIMD32.
   intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET
   intel/fs: Mark LINTERP opcode as writing accumulator implicitly on
 pre-Gen7.
   intel/fs: Disable opt_sampler_eot() in 32-wide dispatch.
   i965: Add plumbing for shader time in 32-wide FS dispatch mode.
   intel/fs: Simplify fs_visitor::emit_samplepos_setup
   intel/fs: Use fs_regs instead of brw_regs in the unlit centroid
 workaround
   intel/fs: Wrap FS payload register look-up in a helper function.
   intel/fs: Extend thread payload layout to SIMD32
   intel/fs: Implement 32-wide FS payload setup on Gen6+
   intel/fs: Fix Gen7 compressed source region alignment restriction for
 SIMD32
   intel/fs: Fix sample id setup for SIMD32.
   intel/fs: Generalize the unlit centroid workaround
   intel/fs: Fix Gen6+ inter

Re: [Mesa-dev] [PATCH 00/53] intel/fs: SIMD32 support for fragment shaders

2018-05-29 Thread Eero Tamminen

Hi,

On 29.05.2018 18:58, Eero Tamminen wrote:

On 25.05.2018 00:55, Jason Ekstrand wrote:

This patch series adds back-end compiler support for SIMD32 fragment
shaders.  Support is added and everything works but it's currently hidden
behind INTEL_DEBUG=do32.  We know that it improves performance in some
cases but we do not yet have a good enough heuristic to start turning it on
by default.  The objective of this series is just to get the compiler
infrastructure landed so that it stops bit-rotting in Curro's branch.


Tested v3 on BXT & SKL.  Everything seems to work otherwise fine.


s/otherwise//


- Eero

(regardless of how many times one reads a mail before sending, there
always seems to be some leftover one misses.)


Tested-by Eero Tamminen 



Figuring out a good heuristic is left as an exercise to the reader. :-)


A simple heuristic that just enables SIMD32 for everything that isn't an
MRT shader gives nice perf improvements on BXT J4205:
* +30% GfxBench ALU2
* +25% SynMark PSPom
* +10% GpuTest Julia32
* +9% GfxBench CarChase
* +7% GfxBench Manhattan 3.0
* +3-7% GLB T-Rex, SynMark ShMapVsm, GpuTest Triangle
* +1-3% GfxBench Manhattan 3.1 & T-Rex, Unigine Heaven, GpuTest FurMark
* -1-2% GfxBench Aztec Ruins, MemBW Write, SynMark DeferredAA, Fill*, 
VSInstancing & ZBuffer

* -2-3% GLB 2.7 Fill
* -4-5% MemBW Blend

On SKL, perf differences are smaller.

SIMD32 can cause write bound tests to thrash, which is visible as a perf
regression in the fully write bound tests above (that's also the reason
why SIMD32 is good to disable with MRT shaders).

As to reads, SIMD32 improves cache locality until it starts thrashing.
In the above GfxBench tests, with the amount of texture sampling they do,
this shows in HW counters as increased texture cache misses (thrashing),
but fewer L3 misses (better locality).  Along with (more important) better
latency compensation, these explain why SIMD32 improves performance in
them.


More advanced heuristics that try to avoid the SIMD32 performance
regressions unfortunately also get rid of a clear part of the above
improvements.  Such heuristics would need an improved instruction scheduler
that provides feedback on which shaders have latency issues where SIMD32
would help.

(A potential run-time heuristic would be disabling SIMD32 when too
large textures are bound for draw.)


 - Eero


Francisco Jerez (34):
   intel/eu: Remove brw_codegen::compressed_stack.
   intel/fs: Rename a local variable so it doesn't shadow component()
   intel/fs: Use the ATTR file for FS inputs
   intel/fs: Replace the CINTERP opcode with a simple MOV
   intel/fs: Add explicit last_rt flag to fb writes orthogonal to eot.
   intel/fs: Fix Gen4-5 FB write AA data payload munging for non-EOT
 writes.
   intel/eu: Return new instruction to caller from brw_fb_WRITE().
   intel/fs: Fix fs_inst::flags_written() for Gen4-5 FB writes.
   intel/fs: Fix implied_mrf_writes() for headerless FB writes.
   intel/fs: Remove program key argument from generator.
   intel/fs: Disable SIMD32 dispatch on Gen4-6 with control flow
   intel/fs: Disable SIMD32 dispatch for fragment shaders with discard.
   intel/eu: Fix pixel interpolator queries for SIMD32.
   intel/fs: Fix codegen of FS_OPCODE_SET_SAMPLE_ID for SIMD32.
   intel/fs: Don't enable dual source blend if no outputs are written
   intel/fs: Fix FB write message control codegen for SIMD32.
   intel/fs: Fix logical FB write lowering for SIMD32
   intel/fs: Fix FB read header setup for SIMD32.
   intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET
   intel/fs: Mark LINTERP opcode as writing accumulator implicitly on
 pre-Gen7.
   intel/fs: Disable opt_sampler_eot() in 32-wide dispatch.
   i965: Add plumbing for shader time in 32-wide FS dispatch mode.
   intel/fs: Simplify fs_visitor::emit_samplepos_setup
   intel/fs: Use fs_regs instead of brw_regs in the unlit centroid
 workaround
   intel/fs: Wrap FS payload register look-up in a helper function.
   intel/fs: Extend thread payload layout to SIMD32
   intel/fs: Implement 32-wide FS payload setup on Gen6+
   intel/fs: Fix Gen7 compressed source region alignment restriction for
 SIMD32
   intel/fs: Fix sample id setup for SIMD32.
   intel/fs: Generalize the unlit centroid workaround
   intel/fs: Fix Gen6+ interpolation setup for SIMD32
   intel/fs: Fix fs_builder::sample_mask_reg() for 32-wide FS dispatch.
   intel/fs: Fix nir_intrinsic_load_helper_invocation for SIMD32.
   intel/fs: Build 32-wide FS shaders.

Jason Ekstrand (19):
   intel/fs: Assert that the gen4-6 plane restrictions are followed
   intel/fs: Use groups for SIMD16 LINTERP on gen11+
   intel/fs: FS_OPCODE_REP_FB_WRITE has side effects
   intel/fs: Properly track implied header regs read by FB writes
   intel/fs: Pull FB write implied headers from src[0]
   intel/fs: Set up FB write message headers in the visitor
   i965: Re-arrange shader kernel setup in WM state
   intel/compiler: Add and use helper

Re: [Mesa-dev] [PATCH 00/53] intel/fs: SIMD32 support for fragment shaders

2018-05-29 Thread Eero Tamminen

Hi,

On 25.05.2018 00:55, Jason Ekstrand wrote:

This patch series adds back-end compiler support for SIMD32 fragment
shaders.  Support is added and everything works but it's currently hidden
behind INTEL_DEBUG=do32.  We know that it improves performance in some
cases but we do not yet have a good enough heuristic to start turning it on
by default.  The objective of this series is just to get the compiler
infrastructure landed so that it stops bit-rotting in Curro's branch.


Tested v3 on BXT & SKL.  Everything seems to work otherwise fine.

Tested-by Eero Tamminen 



Figuring out a good heuristic is left as an exercise to the reader. :-)


A simple heuristic that just enables SIMD32 for everything that isn't an
MRT shader gives nice perf improvements on BXT J4205:
* +30% GfxBench ALU2
* +25% SynMark PSPom
* +10% GpuTest Julia32
* +9% GfxBench CarChase
* +7% GfxBench Manhattan 3.0
* +3-7% GLB T-Rex, SynMark ShMapVsm, GpuTest Triangle
* +1-3% GfxBench Manhattan 3.1 & T-Rex, Unigine Heaven, GpuTest FurMark
* -1-2% GfxBench Aztec Ruins, MemBW Write, SynMark DeferredAA, Fill*, 
VSInstancing & ZBuffer

* -2-3% GLB 2.7 Fill
* -4-5% MemBW Blend

On SKL, perf differences are smaller.

SIMD32 can cause write bound tests to thrash, which is visible as a perf
regression in the fully write bound tests above (that's also the reason
why SIMD32 is good to disable with MRT shaders).

As to reads, SIMD32 improves cache locality until it starts thrashing.
In the above GfxBench tests, with the amount of texture sampling they do,
this shows in HW counters as increased texture cache misses (thrashing),
but fewer L3 misses (better locality).  Along with (more important) better
latency compensation, these explain why SIMD32 improves performance in
them.


More advanced heuristics that try to avoid the SIMD32 performance
regressions unfortunately also get rid of a clear part of the above
improvements.  Such heuristics would need an improved instruction scheduler
that provides feedback on which shaders have latency issues where SIMD32
would help.

(A potential run-time heuristic would be disabling SIMD32 when too
large textures are bound for draw.)


- Eero


Francisco Jerez (34):
   intel/eu: Remove brw_codegen::compressed_stack.
   intel/fs: Rename a local variable so it doesn't shadow component()
   intel/fs: Use the ATTR file for FS inputs
   intel/fs: Replace the CINTERP opcode with a simple MOV
   intel/fs: Add explicit last_rt flag to fb writes orthogonal to eot.
   intel/fs: Fix Gen4-5 FB write AA data payload munging for non-EOT
 writes.
   intel/eu: Return new instruction to caller from brw_fb_WRITE().
   intel/fs: Fix fs_inst::flags_written() for Gen4-5 FB writes.
   intel/fs: Fix implied_mrf_writes() for headerless FB writes.
   intel/fs: Remove program key argument from generator.
   intel/fs: Disable SIMD32 dispatch on Gen4-6 with control flow
   intel/fs: Disable SIMD32 dispatch for fragment shaders with discard.
   intel/eu: Fix pixel interpolator queries for SIMD32.
   intel/fs: Fix codegen of FS_OPCODE_SET_SAMPLE_ID for SIMD32.
   intel/fs: Don't enable dual source blend if no outputs are written
   intel/fs: Fix FB write message control codegen for SIMD32.
   intel/fs: Fix logical FB write lowering for SIMD32
   intel/fs: Fix FB read header setup for SIMD32.
   intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET
   intel/fs: Mark LINTERP opcode as writing accumulator implicitly on
 pre-Gen7.
   intel/fs: Disable opt_sampler_eot() in 32-wide dispatch.
   i965: Add plumbing for shader time in 32-wide FS dispatch mode.
   intel/fs: Simplify fs_visitor::emit_samplepos_setup
   intel/fs: Use fs_regs instead of brw_regs in the unlit centroid
 workaround
   intel/fs: Wrap FS payload register look-up in a helper function.
   intel/fs: Extend thread payload layout to SIMD32
   intel/fs: Implement 32-wide FS payload setup on Gen6+
   intel/fs: Fix Gen7 compressed source region alignment restriction for
 SIMD32
   intel/fs: Fix sample id setup for SIMD32.
   intel/fs: Generalize the unlit centroid workaround
   intel/fs: Fix Gen6+ interpolation setup for SIMD32
   intel/fs: Fix fs_builder::sample_mask_reg() for 32-wide FS dispatch.
   intel/fs: Fix nir_intrinsic_load_helper_invocation for SIMD32.
   intel/fs: Build 32-wide FS shaders.

Jason Ekstrand (19):
   intel/fs: Assert that the gen4-6 plane restrictions are followed
   intel/fs: Use groups for SIMD16 LINTERP on gen11+
   intel/fs: FS_OPCODE_REP_FB_WRITE has side effects
   intel/fs: Properly track implied header regs read by FB writes
   intel/fs: Pull FB write implied headers from src[0]
   intel/fs: Set up FB write message headers in the visitor
   i965: Re-arrange shader kernel setup in WM state
   intel/compiler: Add and use helpers for working with KSP indices
   intel/fs: Rework KSP data to be SIMD width-based
   intel/fs: Split instructions low to high in lower_simd_width
   intel/fs: Properly copy default flag reg 

Re: [Mesa-dev] Mesa documentation (was: Gitlab migration)

2018-05-24 Thread Eero Tamminen

Hi,

On 23.05.2018 22:34, Jason Ekstrand wrote:
[...]> One of the motivations for doing this now is that there has been 
some
desire to move the mesa website away from raw HTML and over to a 
platform such as sphinx.  If we're going to do that, we need a system 
for building the website whenever someone pushes to mesa.  The solution 
that the fd.o admins would like us to use for that is the docker-based 
gitlab CI.  Laura has been working on this the last couple of weeks and 
the results are pretty nice looking so far.  You can check out a preview 
here: https://mesa-test.freedesktop.org/intro.html


I looked at this, and noticed that a few things seem out of date:

* Developers section doesn't list Igalia, and latest mentioned larger
  Intel contribution is for Mesa 7.9

* Acknowledgements section content seems to be from the '90s

* Software drivers list is missing SWR

* Conformance section blurb says:
-
This file has the latest results of testing Mesa with the OpenGL 1.2 
conformance tests. Testing with the preliminary OpenGL 1.3 tests has 
also been done

-

* Compiling and Installing section doesn't mention Meson,
  although there's separate sub section for it

* Bug database "The old bug database on SourceForge is no longer used."
  comment may be out of date

* Shading Language Support section "Contents" list is completely
  out of sync with the page content.  Aren't TOCs autogenerated?

* Open GL ES section:
  "Mesa implements OpenGL ES 1.1 and OpenGL ES 2.0"

* Performance Tips section should tell to use modern GL features
  instead of things like: "Try to maximize the amount of drawing
  done between glBegin/glEnd pairs."

* Application Issues section is out of date.  Maybe it could have
  a link to Mesa Git version of driconf, for a reference of applications
  for which Mesa needs workarounds?

If it's too much work to keep these up to date, maybe the obsolete
information could just be removed?


Also:

* Mesa/DRI Wiki item is a link.  It could be moved to Links section.

* Regarding "i945/i965 driver environment variables" subsection:
  - i945 -> i915 ?
  - i915 certainly doesn't implement all of the options that i965 does,
so maybe it needs a separate section?
  - In the Sphinx generated lists it's hard to see what options
are under what variables.  IMHO lists should use bullets to indicate
hierarchy like the current documentation does:
https://www.mesa3d.org/envvars.html


Using sphinx gives 
us all sorts of neat things like nice text formatting, syntax 
highlighting, and autogenerated searchable navigation. Right now, it's 
still using one of the standard sphinx themes so it looks a bit 
"default" but that's something we can change.



- Eero

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/22] intel/compiler: implement 16-bit multiply-add

2018-05-21 Thread Eero Tamminen

Hi,

On 21.05.2018 10:42, Iago Toral wrote:

On Fri, 2018-05-18 at 12:08 +0300, Eero Tamminen wrote:

On 17.05.2018 14:25, Eero Tamminen wrote:

On 17.05.2018 11:46, Iago Toral Quiroga wrote:

The PRM for MAD states that F, DF and HF are supported, however,
then
it requires that the instruction includes a 2-bit mask specifying
the types of each operand like this:



00: 32-bit float
01: 32-bit signed integer
10: 32-bit unsigned integer
11: 64-bit float


Where was this?


This is in the decription of the MAD instruction here (for SKL):

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol02a-commandreference-instructions.pdf

I guess the contents for this were copy & pasted from previous PRMs
that didn't support HF...


Ouch.  That looks pretty different from what's in here:
https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf

And in BSW one:

In
https://01.org/sites/default/files/documentation/intel-gfx-bspec-osrc-chv-bsw-vol07-3d-media-gpgpu-engine.pdf


I'll ask around about who's currently maintaining the 01.org docs, and
whether she/he could update the opcode docs.  All of them from BSW to KBL
seem to have the old information, while the Media GPGPU docs have newer info.



Btw.  Did you have test-cases utilizing mad() instructions, and did
they work OK with your patchset?

(If yes, better test-cases may be required.)



Section "EU Changes by Processor Generation" states:
-
These features or behaviors are added for CHV, BSW, continuing to
later generations:
...
In the 3-source instruction format, widen the SrcType and DstType
fields
and add an encoding for the HF (Half Float) type.
-

(I.e. it applies to GEN9+ [1] and on GEN8 for BSW/CHV.)


Actually, I have just verified that the BDW PRMs have the exact same
thing, but stating BDW instead of BSW/CHV, so I guess BDW should be
supported too.


Yes, right.



In the section "GEN Instruction Format – 3-src" table, both the "Dst Type"
and "Src Type" fields are 3 bits, and there are additional 1-bit
"Src1 Type" and "Src2 Type" fields to differentiate formats for src1 & src2.



Then, when looking at "Source or Destination Operand Fields
(Alphabetically by Short Name)" section:
---
DstType:

Encoding for three source instructions:
000b = :f. Single precision Float (32-bit).
001b = :d. Signed Doubleword integer.
010b = :ud. Unsigned Doubleword integer.
011b = :df. Double precision Float (64-bit).
100b = :hf. Half precision Float (16-bit).
101b - 111b. Reserved.

...

SrcType:

Three source instructions use one SrcType field for all source
operands,
with a 3-bit encoding that allows fewer data types:

Encoding for three source instructions:
000b = :f. Single precision Float (32-bit).
001b = :d. Signed Doubleword integer.
010b = :ud. Unsigned Doubleword integer.
011b = :df. Double precision Float (64-bit).
100b = :hf. Half precision Float (16-bit).
101b - 111b. Reserved.

Three source instructions can use operands with mixed-mode
precision.
When SrcType field is set to :f or :hf it defines precision for
source 0
only, and fields Src1Type and Src2Type define precision for other
source
operands:
0b = :f. Single precision Float (32-bit).
1b = :hf. Half precision Float (16-bit).
---
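For illustration, the quoted encodings (and the mixed-mode rule where
SrcType covers only source 0) could be sketched like this — the names are
invented for this sketch, and this is not Mesa's actual representation:

```cpp
#include <cstdint>

// Sketch of the GEN8+ (BSW/CHV) 3-source type encodings quoted above.
enum class Type3Src : uint8_t {
   F  = 0b000,  // :f,  single precision float (32-bit)
   D  = 0b001,  // :d,  signed doubleword integer
   UD = 0b010,  // :ud, unsigned doubleword integer
   DF = 0b011,  // :df, double precision float (64-bit)
   HF = 0b100,  // :hf, half precision float (16-bit)
};

// Mixed-mode: the 1-bit Src1Type/Src2Type fields pick :f (0) or :hf (1).
enum class TypeMixed : uint8_t { F = 0, HF = 1 };

// Effective type of source operand `idx` under the quoted rules:
// SrcType defines all operands, except that for :f/:hf it defines
// only src0, with Src1Type/Src2Type covering the other two.
inline Type3Src src_type(unsigned idx, Type3Src src_type_field,
                         TypeMixed src1, TypeMixed src2)
{
   if (src_type_field != Type3Src::F && src_type_field != Type3Src::HF)
      return src_type_field;            // no mixed mode for int/df types
   if (idx == 0)
      return src_type_field;            // SrcType covers src0 only
   TypeMixed t = (idx == 1) ? src1 : src2;
   return t == TypeMixed::HF ? Type3Src::HF : Type3Src::F;
}
```

The authoritative encodings are in the PRM tables quoted above; this only
captures how the fields combine.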


Note also the section "Special Requirements for Handling Mixed Mode
Float Operations".


Both BDW & BSW specs also list this restriction, which is lifted in GEN9:
* Calculations using the HF (Half Float) type do not support denormals 
or gradual underflow.




Btw. Mesa supporting HF with mad() is important because pln()
doesn't support it in HW, but you can implement the HF version of pln()
with two mad()s.
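A scalar sketch of that decomposition (illustration only; here mad(a, b, c)
computes b*c + a, matching the operand order of the bld.MAD() builder call
seen in the patch):

```cpp
// mad(a, b, c) = b*c + a, mirroring fs_builder::MAD() operand order.
inline float mad(float a, float b, float c) { return b * c + a; }

// pln evaluates the plane equation p0*u + p1*v + p3 used for attribute
// interpolation; without a HW pln for HF, it decomposes into two mads.
inline float pln(const float p[4], float u, float v)
{
   float t = mad(p[3], p[1], v);   // p1*v + p3
   return mad(t, p[0], u);         // p0*u + (p1*v + p3)
}
```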



- Eero


[1]: SkyLake & KabyLake PRMs list the same number of bits, but don't
tell what they represent (often one needs to look into the PRM for the
HW where these changes were first introduced, which in this case was
Braswell / Cherryview).



So 16-bit float would not be supported.  The driver also asserts that
the types involved in ALIGN16 3-src operations are one of these
(MAD is always emitted as an align16 instruction prior to gen10).
---
   src/intel/compiler/brw_fs_nir.cpp | 9 -
   1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index 91283ab4911..58ddc456bae 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -1525,7 +1525,14 @@ fs_visitor::nir_emit_alu(const fs_builder
&bld,
nir_alu_instr *instr)
 break;
  case nir_op_ffma:
-  inst = bld.MAD(result, op[2], op[1], op[0]);
+  /* 3-src MAD doesn't support 16-bit operands */
+  if (nir_dest_bit_size(instr->dest.dest) >= 32) {
+   

Re: [Mesa-dev] [PATCH 09/22] intel/compiler: implement 16-bit multiply-add

2018-05-17 Thread Eero Tamminen

Hi,

On 17.05.2018 11:46, Iago Toral Quiroga wrote:

The PRM for MAD states that F, DF and HF are supported, however, then
it requires that the instruction includes a 2-bit mask specifying
the types of each operand like this:


00: 32-bit float
01: 32-bit signed integer
10: 32-bit unsigned integer
11: 64-bit float


Where was this?


In 
https://01.org/sites/default/files/documentation/intel-gfx-bspec-osrc-chv-bsw-vol07-3d-media-gpgpu-engine.pdf


Section "EU Changes by Processor Generation" states:
-
These features or behaviors are added for CHV, BSW, continuing to later 
generations:

...
In the 3-source instruction format, widen the SrcType and DstType fields 
and add an encoding for the HF (Half Float) type.

-

(I.e. it applies to GEN9+ [1] and on GEN8 for BSW/CHV.)


In the section "GEN Instruction Format – 3-src" table, both the "Dst Type"
and "Src Type" fields are 3 bits, and there are additional 1-bit "Src1 Type"
and "Src2 Type" fields to differentiate formats for src1 & src2.



Then, when looking at "Source or Destination Operand Fields 
(Alphabetically by Short Name)" section:

---
DstType:

Encoding for three source instructions:
000b = :f. Single precision Float (32-bit).
001b = :d. Signed Doubleword integer.
010b = :ud. Unsigned Doubleword integer.
011b = :df. Double precision Float (64-bit).
100b = :hf. Half precision Float (16-bit).
101b - 111b. Reserved.

...

SrcType:

Three source instructions use one SrcType field for all source operands, 
with a 3-bit encoding that allows fewer data types:


Encoding for three source instructions:
000b = :f. Single precision Float (32-bit).
001b = :d. Signed Doubleword integer.
010b = :ud. Unsigned Doubleword integer.
011b = :df. Double precision Float (64-bit).
100b = :hf. Half precision Float (16-bit).
101b - 111b. Reserved.

Three source instructions can use operands with mixed-mode precision. 
When SrcType field is set to :f or :hf it defines precision for source 0 
only, and fields Src1Type and Src2Type define precision for other source 
operands:

0b = :f. Single precision Float (32-bit).
1b = :hf. Half precision Float (16-bit).
---


- Eero

[1]: SkyLake & KabyLake PRMs list the same number of bits, but don't
tell what they represent (often one needs to look into the PRM for the
HW where these changes were first introduced, which in this case was
Braswell / Cherryview).




So 16-bit float would not be supported.  The driver also asserts that
the types involved in ALIGN16 3-src operations are one of these
(MAD is always emitted as an align16 instruction prior to gen10).
---
  src/intel/compiler/brw_fs_nir.cpp | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 91283ab4911..58ddc456bae 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -1525,7 +1525,14 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
break;
  
 case nir_op_ffma:

-  inst = bld.MAD(result, op[2], op[1], op[0]);
+  /* 3-src MAD doesn't support 16-bit operands */
+  if (nir_dest_bit_size(instr->dest.dest) >= 32) {
+ inst = bld.MAD(result, op[2], op[1], op[0]);
+  } else {
+ fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_HF);
+ bld.MUL(tmp, op[1], op[0]);
+ inst = bld.ADD(result, tmp, op[2]);
+  }
inst->saturate = instr->dest.saturate;
break;
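In scalar terms, the fallback in the hunk above trades a single fused
operation for a multiply with an intermediate rounding step followed by an
add — a sketch for illustration:

```cpp
#include <cmath>

// Fused multiply-add: a*b + c with a single rounding at the end.
inline float ffma_fused(float a, float b, float c) { return std::fmaf(a, b, c); }

// The MUL+ADD fallback: the product is rounded before the addition,
// so results can differ from the fused form in the last ulp(s).
inline float ffma_split(float a, float b, float c) { return a * b + c; }
```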
  



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] gcc bug / crash in ast_type_qualifier::validate_in_qualifier()?

2018-05-09 Thread Eero Tamminen

Hi,

On 08.05.2018 20:33, Francisco Jerez wrote:

Kenneth Graunke  writes:

On Tuesday, May 8, 2018 1:23:32 AM PDT Eero Tamminen wrote:

On 08.05.2018 06:45, Matt Turner wrote:

On Mon, May 7, 2018 at 8:02 PM, Brian Paul  wrote:

This probably prevents use of xmm instructions, but I haven't inspected the
code.

Is anyone else seeing this?


Yes, it's https://bugs.freedesktop.org/show_bug.cgi?id=105497

I was surprised that we decided it's not worth working around.


By making the above part perform worse for everybody using -O3, or by
disabling the vectorization optimization (enabled by -O3) just for
the buggy GCC version?

(If that GCC version gets it wrong in this place, it may get it
wrong also elsewhere, so better turn that particular -O3 optimization
off completely.)

Is there an upstream GCC bug report about that, which would tell
which GCC versions are affected?


I wouldn't worry about performance here, the AST code is basically
never the hot path (even without shader cache, and now it's glacial).
I was honestly surprised to see it start using xmm intrinsics.


I agree that vectorizing this data structure is unlikely to make any
measurable performance difference in practice, but I think Eero still
has a point -- How do we know that this GCC optimization is not
miscompiling code elsewhere, potentially in a less frequently hit
codepath?  I wouldn't take the risk of shipping a binary of Mesa built
with GCC 5.4 and -O3 even with this workaround.  It may make more sense
to drop support for this GCC version (or as Eero suggested to turn the
optimization off).


I tried to find a related GCC bug, but the only GCC alignment bugs were
either for GCC 4.4 or older, or for ARM:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57271

(ARM SIMD instruction issue fixed in GCC v6.0 -> is anybody testing
Mesa with GCC -O3 on ARM?)


However, I found a related Glibc bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=21120

Which was fixed in 2017, in Glibc v2.26.

NOTE: When optimizations are enabled, GCC will assume that all
incoming memory addresses (stack, malloc etc.) have a certain
minimum alignment (nowadays 16 bytes), and doesn't do any
alignment for them in the callee.
There's no mention in the bug of where the memory given to x86 SIMD
instructions (which require 16-byte alignment) came from.  Is it
from the stack, or from malloc()?


- Eero

PS. As the bug is with a shader-db test, I don't think the problem is linking
with objects from other compilers, or too-old GCC versions, or allocations
coming from auto-generated JIT / hand-written / old assembly code,
which doesn't implement the now-required 16-byte alignment.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] gcc bug / crash in ast_type_qualifier::validate_in_qualifier()?

2018-05-08 Thread Eero Tamminen

Hi,

On 08.05.2018 06:45, Matt Turner wrote:

On Mon, May 7, 2018 at 8:02 PM, Brian Paul  wrote:


I don't know when this started happening (I'll try bisecting tomorrow) but
we're seeing a crash in ast_type_qualifier::validate_in_qualifier() in -O3
builds with gcc 5.4.0 on Ubuntu 16.04.

Specifically, at ast_type.cpp:654:

if ((this->flags.i & ~valid_in_mask.flags.i) != 0) {

It seems to be the ~ operator/function which is implemented with an SSE pxor
instruction.

I found that this patch avoids the issue:

diff --git a/src/compiler/glsl/ast.h b/src/compiler/glsl/ast.h
index a1ec0d5..2e518ce 100644
--- a/src/compiler/glsl/ast.h
+++ b/src/compiler/glsl/ast.h
@@ -474,7 +474,7 @@ enum {

  struct ast_type_qualifier {
 DECLARE_RALLOC_CXX_OPERATORS(ast_type_qualifier);
-   DECLARE_BITSET_T(bitset_t, 128);
+   DECLARE_BITSET_T(bitset_t, 96);

 union flags {
struct {

This probably prevents use of xmm instructions, but I haven't inspected the
code.

Is anyone else seeing this?


Yes, it's https://bugs.freedesktop.org/show_bug.cgi?id=105497

I was surprised that we decided it's not worth working around.


By making the above part perform worse for everybody using -O3, or by
disabling the vectorization optimization (enabled by -O3) just for
the buggy GCC version?

(If that GCC version gets it wrong in this place, it may get it
wrong also elsewhere, so better turn that particular -O3 optimization
off completely.)

Is there an upstream GCC bug report about that, which would tell
which GCC versions are affected?


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] improve buffer cache and reuse

2018-05-07 Thread Eero Tamminen

Hi,

On 05.05.2018 03:56, James Xiong wrote:

From: "Xiong, James" 

With the current implementation, brw_bufmgr may round up a request
size to the next bucket size, resulting in 25% more memory allocated in
the worst scenario. For example:

Request size    Actual size
32KB+1Byte      40KB
...
8MB+1Byte       10MB
...
96MB+1Byte      112MB

This series aligns the buffer size up to a page instead of a bucket size
to improve memory allocation efficiency.
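The arithmetic behind the examples above can be sketched as follows
(illustration only — bucket spacing of four steps per power of two
reproduces the quoted numbers, but the authoritative logic is in
brw_bufmgr.c):

```cpp
#include <cstdint>

constexpr uint64_t PAGE_SIZE = 4096;

// Round a request up to the next 4 KiB page (the proposed behavior).
inline uint64_t page_align(uint64_t size)
{
   return (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

// Round up to the next cache bucket, with buckets at each power of two
// plus 1.25x/1.5x/1.75x steps; a request just past a bucket boundary
// therefore wastes up to ~25% of its size.
inline uint64_t bucket_align(uint64_t size)
{
   for (uint64_t pot = PAGE_SIZE; ; pot *= 2) {
      for (int i = 0; i < 4; i++) {
         uint64_t bucket = pot + i * (pot / 4);
         if (bucket >= size)
            return bucket;
      }
   }
}
```

With these helpers, a 32KB+1B request rounds to 40KB under bucketing but
only to 36KB under page alignment, matching the table above.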

Performance and memory usage were measured on a gen9 platform using
Basemark ES3, GfxBench 4 and 5, each test case ran 6 times.

Basemark ES3
          score                            peak memory size(KB)
          before     after     diff        before  after   diff
          max  avg   max  avg  max    avg
          22   21    23   21   2.83%  1.21%   409928  395573  -14355
          20   20    20   20   0.53%  0.41%


Thanks for the new data!

As the values below seem similar to what you earlier sent, I assume
the tests are listed here in the same order, i.e:


GfxBench 4.0
                  score                              peak memory size(KB)
                  before      after       diff            before  after   diff
                  max   avg   max   avg   max     avg
gl_4               584   577   586   583   0.45%   1.02%  566489  539699  -26791
manhattan         1604  1144  1650  1202   2.81%   4.86%  439220  411596  -27624
gl_trex           2711        2718  2152   0.25%  -3.25%  126065  121398   -4667
gl_alu2           1218  1213  1212  1154  -0.53%  -5.10%   54153   53868    -285
driver2            106   104   106   103   0.85%  -1.66%   12730   12666     -64
gl_4_off           728   727   727   726  -0.03%  -0.16%  614730  586794  -27936
manhattan_off     1732  1709  1740  1728   0.49%   1.11%  475716  447726  -27990
gl_trex_off       3051  2969  3066  3047   0.50%   2.55%  154169  148962   -5207
gl_alu2_off       2626  2607  2626  2625   0.00%   0.70%   84119   83150    -969
driver2_off        211   208   208   205  -1.26%  -1.21%   39924   39667    -257



GfxBench 5.0
                  score                          peak memory size(KB)
                  before    after     diff           before   after    diff
                  max  avg  max  avg  max     avg
gl_5              260  258  259  256  -0.39%  -0.85%      037  1013520   -97517
gl_5_off          298  295  298  297   0.00%   0.45%  1143593  1040844  -102749


As expected, max gives more stable results than average.

There could be a performance improvement in Manhattan v3.0.  At least it
had the largest peak memory usage saving in GfxBench v4, both absolutely &
relatively (6%).

gl_alu2 onscreen average drop seems also suspiciously large, but as it's
not visible in the max value, or in alu2 offscreen, or in your previous
test, I think it's just random variation.

In light of what I know of these tests' variance on TDP-limited devices,
I think the rest of your GfxBench v4 & v5 performance changes also fall
within random variance.



- Eero



Xiong, James (4):
   i965/drm: Reorganize code for the next patch
   i965/drm: Round down buffer size and calculate the bucket index
   i965/drm: Searching for a cached buffer for reuse
   i965/drm: Purge the bucket when its cached buffer is evicted

  src/mesa/drivers/dri/i965/brw_bufmgr.c | 139 ++---
  src/util/list.h|   5 ++
  2 files changed, 79 insertions(+), 65 deletions(-)

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3] i965: Fix ETC2/EAC GetCompressed* functions on Gen7 GPUs

2018-05-07 Thread Eero Tamminen

Hi,

On 04.05.2018 22:06, Eleni Maria Stea wrote:

On Fri, 4 May 2018 18:29:55 +0300
Eero Tamminen  wrote:


You mean returning CAVEAT_SUPPORT in params for compressed formats
which are transparently converted to uncompressed data?


Well, that would be the best solution I think, if it's possible to
modify an existing query in the extension, although I am not certain
which is the best query to modify: TEXTURE_COMPRESSED, or
INTERNALFORMAT_SUPPORTED (or maybe both?).

There's also another solution that we already have, but we are not sure
if it is correct:

I noticed that both mesa and nvidia drivers return GL_FALSE when the
pname is GL_TEXTURE_COMPRESSED and the format is emulated and GL_TRUE
for the natively supported formats.


So, in Nvidia's case only the ancient GLES 1.x GL_PALETTE_* formats
are emulated.  I guess Nvidia HW was never newer than HSW. :-)



(Specifically on mesa the code that
performs the check is in src/mesa/main/formatquery.c and tests only
for native support).


But if you take a look at this part of the extension specification:

TEXTURE_COMPRESSED: If  is a compressed format
   that is supported for this type of resource, TRUE is returned in
   . If the internal format is not compressed, or the type of
   resource is not supported, FALSE is returned.

it is not very clear if we should return true or false for an
emulated format. Maybe returning false when we provide emulation is a
bug in both drivers, just a convenient one in this case. :-)

Is there any way to clarify what should be the correct behavior?


Khronos documentation bug tracker?

If you can test also e.g. AMD driver, that would be good background
data for the clarification ticket.



Do you think that even if the current behavior of the
TEXTURE_COMPRESSED query is correct, in which case it should keep
returning GL_FALSE for the emulated formats, we should nevertheless
modify something else, e.g. the INTERNALFORMAT_SUPPORTED query, to
return CAVEAT_SUPPORT?


Uh, I'm not the best person to answer that.  Maybe somebody
on Portland team who has contacts to game engine developers?



That API's not available for GLES v2, where I think ETC is most widely
used, so it would be more of a solution for GLES v3.x applications
only. Sounds OK to me.

Hardest part will be propagating use of this query to engines &
toolkits that would benefit from using it. :-)


+1 on that :)

Thanks a lot for the suggestions and the feedback,
Eleni

PS: here is some code to clarify the current situation:

[1]: https://github.com/hikiko/test-compression is a test program to
quickly check the compressed formats supported (see
the function print_compressed_formats at the end of main.c)

[2]: https://pastebin.com/Qif74fFn is the output of [1] on HSW using
the ETC patch and on nvidia where you can see that the natively
supported compression formats return GL_TRUE in both cards whereas the
emulated ones return GL_FALSE in both cards


Thanks for checking these!


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3] i965: Fix ETC2/EAC GetCompressed* functions on Gen7 GPUs

2018-05-04 Thread Eero Tamminen

Hi,

On 04.05.2018 14:11, Eleni Maria Stea wrote:

Thanks for your feedback,
On Thu, 3 May 2018 13:30:38 +0300
Eero Tamminen  wrote:

On 02.05.2018 20:19, Matt Turner wrote:

On Wed, May 2, 2018 at 9:13 AM, Eleni Maria Stea 
wrote:

Gen 7 GPUs store the compressed EAC/ETC2 images in other
non-compressed formats that can render. When GetCompressed*
functions are called, the pixels are returned in the
non-compressed format that is used for the rendering.

With this patch we store both the compressed and non-compressed
versions of the image, so that both rendering commands and
GetCompressed* commands work.

Also, the assertions for GL_MAP_WRITE_BIT and
GL_MAP_INVALIDATE_RANGE_BIT in intel_miptree_map_etc function have
been removed because when the miptree is mapped for reading (for
example from a GetCompress* function) the GL_MAP_WRITE_BIT won't
be set (and shouldn't be set).

Fixes: the following test in CTS for gen7:
KHR-GL45.direct_state_access.textures_compressed_subimage test

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104272


I think you can add

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81843

as well :)


This is a really lovely feature.

Compressed texture formats are used to:
1. Reduce disk / network usage for the application install
2. Increase run-time performance (by reducing required bandwidth)
3. Reduce program memory usage

At the cost of worse texture quality.

Mesa transparently converting ETC to uncompressed data on platforms
that don't support ETC in HW means that the application doesn't get 2),
just worse texture quality, although some applications would have the
capability to fall back to another (HW-supported) texture compression
format.


Examples of these are GLBenchmark and GfxBench benchmarks GLES versions.

(First one defaults to ETC1, but can use also DXT if ETC is not
available.  GfxBench v5 defaults to ASTC/ETC2, but can use also DXT,
if those textures are shipped with it.  DXT is the default for the GL
version.)



And this new patch means that instead of 3), memory usage actually
_increases_ compared to an application using non-compressed textures
directly.



You are right about the memory usage and about the purpose of the patch.


This was a bit of a cry of "why couldn't the world be a better place?".
I know hoping for a better solution for this, when it also requires
users of the driver to do something extra, is a bit desperate...



Some (many?) applications might fail to run if ETC isn't supported, so
I  understand why this feature is done, but it would be nice to have
some better way to handle it.

Maybe some new extension that can be used by future game engines &
application toolkits to query which of the compressed texture formats
are faked, so that they can instead select a compression format that
actually provides run-time benefits?


After having a look, I think that we already have an extension for that
type of query: ARB_internalformat_query2. We are not sure if the
existing queries would fit, but in case they don't, we think that the
new extension would rather add more queries to this one than introduce
new methods. What do you think?


You mean returning CAVEAT_SUPPORT in params for compressed formats
which are transparently converted to uncompressed data?

That API's not available for GLES v2, where I think ETC is most widely
used, so it would be more of a solution for GLES v3.x applications only.
Sounds OK to me.

Hardest part will be propagating use of this query to engines & toolkits
that would benefit from using it. :-)


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] improve buffer cache and reuse

2018-05-03 Thread Eero Tamminen

Hi,

On 02.05.2018 21:19, James Xiong wrote:

On Wed, 2 May 2018 14:18:21 +0300
Eero Tamminen  wrote:

[...]

You're missing information on:
* On which platform you did the testing (affects variance)
* how many test rounds you ran, and
* what is your variance


I ran these tests on a gen9 platform/ubuntu 17.10 LTS.


If it's TDP limited in 3D tests (like all NUC and e.g. Broxton devices
seem to be in long running tests), it has clearly higher variance than
non-TDP (or temperature) limited desktop platforms.



Most of the tests
are consistent, especially the memory usage. The only exception is
GfxBench 4.0 gl_manhattan; I had to run it 3 times and pick the highest
one. I will apply this method to all tests and re-send with updated
results.


(comments below are about FPS results, not memory usage.)

Performance of many GPU-bound tests doesn't have a normal Gaussian
distribution, but two (tight) peaks.  On our BXT machines these peaks
are currently, e.g. in GfxBench Manhattan tests, *3%* apart from each
other.

While you can get results from both performance peaks, whether your
results fall onto either of these peaks is more likely to change
between boots (I think due to alignment changes in kernel memory
allocations) than between successive runs.
-> Your results may have less chance of being misleading, if you
   don't reboot when switching between the Mesa version with your patch
   and the one without.

Especially if you're running tests only on one machine (i.e. you don't
have extra data from other machines against which you can correlate
results), I think you need more than 3 runs, both with and without your
patch.

While max() can provide a better comparison for this kind of bimodal
result distribution than avg(), you should still calculate and provide
the variance for your data with your patches.
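A sketch of reporting max alongside mean and variance for a set of runs
(helper names invented for illustration):

```cpp
#include <algorithm>
#include <vector>

// Summarize repeated benchmark runs.  For bimodal FPS distributions,
// max() picks the peak a run landed on, while the variance reveals
// whether the runs straddled both peaks.
struct RunStats { double max, mean, variance; };

inline RunStats summarize(const std::vector<double> &runs)
{
   double mx = *std::max_element(runs.begin(), runs.end());
   double sum = 0.0;
   for (double r : runs)
      sum += r;
   double mean = sum / runs.size();
   double var = 0.0;
   for (double r : runs)
      var += (r - mean) * (r - mean);
   var /= runs.size();   // population variance; use n-1 for sample variance
   return {mx, mean, var};
}
```

For runs split evenly between, say, a 10 FPS and a 14 FPS peak, the mean
lands at 12 FPS — a value no run ever produced — while max and variance
together expose the bimodality.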


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3] i965: Fix ETC2/EAC GetCompressed* functions on Gen7 GPUs

2018-05-03 Thread Eero Tamminen

Hi,

On 02.05.2018 20:19, Matt Turner wrote:

On Wed, May 2, 2018 at 9:13 AM, Eleni Maria Stea  wrote:

Gen 7 GPUs store the compressed EAC/ETC2 images in other non-compressed
formats that can render. When GetCompressed* functions are called, the
pixels are returned in the non-compressed format that is used for the
rendering.

With this patch we store both the compressed and non-compressed versions
of the image, so that both rendering commands and GetCompressed*
commands work.

Also, the assertions for GL_MAP_WRITE_BIT and GL_MAP_INVALIDATE_RANGE_BIT
in intel_miptree_map_etc function have been removed because when the
miptree is mapped for reading (for example from a GetCompress*
function) the GL_MAP_WRITE_BIT won't be set (and shouldn't be set).

Fixes: the following test in CTS for gen7:
KHR-GL45.direct_state_access.textures_compressed_subimage test

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104272


I think you can add

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81843

as well :)


This is a really lovely feature.

Compressed texture formats are used to:
1. Reduce disk / network usage for the application install
2. Increase run-time performance (by reducing required bandwidth)
3. Reduce program memory usage

At the cost of worse texture quality.

Mesa transparently converting ETC to uncompressed data on platforms that
don't support ETC in HW means that the application doesn't get 2), just
worse texture quality, although some applications would have the capability
to fall back to another (HW-supported) texture compression format.

And this new patch means that instead of 3), memory usage actually
_increases_ compared to an application using non-compressed textures
directly.


Some (many?) applications might fail to run if ETC isn't supported, so
I  understand why this feature is done, but it would be nice to have
some better way to handle it.

Maybe some new extension that can be used by future game engines &
application toolkits to query which of the compressed texture formats
are faked, so that they can instead select a compression format that
actually provides run-time benefits?


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] improve buffer cache and reuse

2018-05-02 Thread Eero Tamminen

Hi,

On 02.05.2018 02:25, James Xiong wrote:

From: "Xiong, James" 

With the current implementation, brw_bufmgr may round up a request
size to the next bucket size, resulting in 25% more memory allocated in
the worst scenario. For example:

Request size    Actual size
32KB+1Byte      40KB
...
8MB+1Byte       10MB
...
96MB+1Byte      112MB

This series aligns the buffer size up to a page instead of a bucket size
to improve memory allocation efficiency. Performance is almost the
same with Basemark ES3, GfxBench 4 and 5:

Basemark ES3
score                         peak memory allocation
before     after      diff    before     after      diff
21.537462  21.888784  1.61%   419766272  408809472  -10956800
19.566198  19.763429  1.00%
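The worst-case overhead quoted above follows from the ~25% growth between bucket sizes. A simplified model of the bucket ladder (powers of two plus three intermediate steps each, loosely mirroring brw_bufmgr's scheme; the helper names are illustrative, not the actual Mesa code):

```python
PAGE = 4096

def bucket_sizes(limit=128 * 1024 * 1024):
    """Simplified bucket ladder: each power of two plus three intermediate
    steps, i.e. roughly 25% growth per bucket."""
    sizes = [PAGE, 2 * PAGE, 3 * PAGE]
    size = 4 * PAGE
    while size <= limit:
        for step in range(4):
            sizes.append(size + size * step // 4)
        size *= 2
    return sizes

def bucket_round_up(n):
    # old behaviour: a request is rounded up to the next bucket size
    return next(s for s in bucket_sizes() if s >= n)

def page_round_up(n):
    # proposed behaviour: align the request up to the page size only
    return -(-n // PAGE) * PAGE

# worst case from the cover letter: 32KB+1 lands in the 40KB bucket,
# while page alignment needs only 36KB
print(bucket_round_up(32 * 1024 + 1), page_round_up(32 * 1024 + 1))
# -> 40960 36864
```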


Which memory are you measuring:

* VmSize (not that relevant unless you're running out of address space)?

* PrivateDirty (listed in /proc/PID/smaps and e.g. by "smem" tool [1])?

* total of allocation sizes used by Mesa?

Or something else?

In general, unused memory isn't much of a problem, only dirty (written) 
memory is.  The kernel maps all unused memory to a single zero page, so 
unused memory takes only a few bytes of RAM for the page table entries 
(required for tracking the allocation pages).
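The zero-page point is easy to demonstrate on Linux: an anonymous mapping costs almost no resident memory until it is written. A small Linux-only sketch (reads /proc/self/status; sizes and thresholds are illustrative):

```python
import mmap

def status_kb(field):
    """Read a memory field (e.g. 'VmRSS') from /proc/self/status, in kB."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    return 0

SIZE = 256 * 1024 * 1024  # 256 MB anonymous mapping

before = status_kb("VmRSS")
m = mmap.mmap(-1, SIZE)                  # allocated, but every page is
untouched = status_kb("VmRSS") - before  # still backed by the zero page
m[: SIZE // 2] = b"x" * (SIZE // 2)      # dirty half of the mapping
dirtied = status_kb("VmRSS") - before
m.close()

print(f"untouched: {untouched} kB resident, after dirtying half: {dirtied} kB")
```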




GfxBench 4.0
                 score                                   peak memory
                 before          after           diff    before     after      diff
gl_4             564.6052246094  565.2348632813   0.11%  578490368  550199296  -28291072
gl_4_off         727.0440063477  703.5833129883  -3.33%  629501952  598216704  -31285248
gl_manhattan     1053.4223632813 1057.3690185547  0.37%  449568768  421134336  -28434432
gl_trex          2708.0656738281 2699.2646484375 -0.33%  130076672  125042688  -5033984
gl_alu2          1207.1490478516 1212.2220458984  0.42%  55496704   55029760   -466944
gl_driver2       103.0383071899  103.5478439331   0.49%  13107200   12980224   -126976
gl_manhattan_off 1703.4780273438 1736.9074707031  1.92%  490016768  456548352  -33468416
gl_trex_off      2951.6809082031 3058.5422363281  3.49%  157511680  152260608  -5251072
gl_alu2_off      2604.0903320313 2626.2524414063  0.84%  86130688   85483520   -647168
gl_driver2_off   204.0173187256  207.0510101318   1.47%  40869888   40615936   -253952


You're missing information on:
* on which platform you did the testing (affects variance),
* how many test rounds you ran, and
* what your variance is.

-> I don't know whether your numbers are just random noise.


Memory is allocated in pages from the kernel, so there's no point in 
showing its usage in bytes.  Please use KBs, that's more readable.


(Because of randomness, e.g. interactions with the windowing system, 
there can be some variance also in process memory usage, which may
also be useful to report.)

Because of variance, you don't need that many decimals for the scores. 
Removing the extra ones makes the data a bit more readable too.



- Eero

[1] "smem" is python based tool available at least in Debian.
If you want something simpler, e.g. shell script working with
minimal shells like Busybox, you can use this:
https://github.com/maemo-tools-old/sp-memusage/blob/master/scripts/mem-smaps-private
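For reference, summing Private_Dirty from smaps is simple enough to do without extra tools; a minimal Python sketch (Linux-only, helper name made up):

```python
def private_dirty_kb(pid="self"):
    """Sum the Private_Dirty fields of /proc/<pid>/smaps (values are in kB)."""
    total = 0
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            if line.startswith("Private_Dirty:"):
                total += int(line.split()[1])
    return total

print(private_dirty_kb(), "kB of private dirty memory in this process")
```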



GfxBench 5.0
          score          peak memory
          before  after  before      after       diff
gl_5      259     259    1137549312  1038286848  -99262464
gl_5_off  297     297    1170853888  1071357952  -99495936

Xiong, James (4):
   i965/drm: Reorganize code for the next patch
   i965/drm: Round down buffer size and calculate the bucket index
   i965/drm: Searching for a cached buffer for reuse
   i965/drm: Purge the bucket when its cached buffer is evicted

  src/mesa/drivers/dri/i965/brw_bufmgr.c | 139 ++---
  src/util/list.h|   5 ++
  2 files changed, 79 insertions(+), 65 deletions(-)





Re: [Mesa-dev] [PATCH] dri3: Only update number of back buffers in loader_dri3_get_buffers

2018-04-30 Thread Eero Tamminen

Hi,

On 27.04.2018 18:56, Michel Dänzer wrote:

From: Michel Dänzer 

And only free no longer needed back buffers there as well.

We want to stick to the same back buffer throughout a frame, otherwise
we can run into various issues.

Bugzilla: https://bugs.freedesktop.org/105906
Fixes: 3160cb86aa92 "egl/x11: Re-allocate buffers if format is suboptimal"
Signed-off-by: Michel Dänzer 
---

Sergii, Eero, here's an alternative fix which I think is cleaner; can
you test it?


Tested on KBL GT2.

Without the patch, there are >30 compiz crashes within a ~3h test run.

With the patch, the same compiz process works fine for the whole run.

-> looks good to me.


Tested-by: Eero Tamminen 


- Eero



  src/loader/loader_dri3_helper.c | 19 +++
  1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c
index 23729f7ecb2..6db8303d26d 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -420,13 +420,6 @@ dri3_handle_present_event(struct loader_dri3_drawable 
*draw,
  
   if (buf && buf->pixmap == ie->pixmap)

  buf->busy = 0;
-
- if (buf && draw->cur_blit_source != b && !buf->busy &&
- (buf->reallocate ||
- (draw->num_back <= b && b < LOADER_DRI3_MAX_BACK))) {
-dri3_free_render_buffer(draw, buf);
-draw->buffers[b] = NULL;
- }
}
break;
 }
@@ -559,7 +552,6 @@ dri3_find_back(struct loader_dri3_drawable *draw)
 /* Check whether we need to reuse the current back buffer as new back.
  * In that case, wait until it's not busy anymore.
  */
-   dri3_update_num_back(draw);
 num_to_consider = draw->num_back;
 if (!loader_dri3_have_image_blit(draw) && draw->cur_blit_source != -1) {
num_to_consider = 1;
@@ -1815,6 +1807,7 @@ loader_dri3_get_buffers(__DRIdrawable *driDrawable,
  {
 struct loader_dri3_drawable *draw = loaderPrivate;
 struct loader_dri3_buffer   *front, *back;
+   int buf_id;
  
 buffers->image_mask = 0;

 buffers->front = NULL;
@@ -1826,6 +1819,16 @@ loader_dri3_get_buffers(__DRIdrawable *driDrawable,
 if (!dri3_update_drawable(driDrawable, draw))
return false;
  
+   dri3_update_num_back(draw);

+
+   /* Free no longer needed back buffers */
+   for (buf_id = draw->num_back; buf_id < LOADER_DRI3_MAX_BACK; buf_id++) {
+  if (draw->cur_blit_source != buf_id && draw->buffers[buf_id]) {
+ dri3_free_render_buffer(draw, draw->buffers[buf_id]);
+ draw->buffers[buf_id] = NULL;
+  }
+   }
+
 /* pixmaps always have front buffers.
  * Exchange swaps also mandate fake front buffers.
  */





Re: [Mesa-dev] [PATCH] mesa/main: Rework the shader capture naming convention

2018-04-30 Thread Eero Tamminen

Hi,

On 28.04.2018 17:39, Benedikt Schemmer wrote:

Change from a purely number.shader_test naming scheme to sha_number.shader_test,
because games especially often use the same number for entirely different
shaders based on graphics settings etc., and then already-captured shaders
get overwritten.  It is also useful for capturing shaders from applications
that reuse program numbers a lot, e.g. piglit.  This has helped to identify
problems there as well.

Plus minor cleanups that could be split into a separate patch.


For review, you need to have these as separate patches:

- moving code around
- actual functional changes
- non-functional code "cleanups", like changing "foo == NULL" to "!foo"

(Small commits in your own repo and "git rebase -i" help.)


- Eero


I have used this for approximately two months now; it seems to work just fine.

---
  src/mesa/main/shaderapi.c | 315 --
  1 file changed, 168 insertions(+), 147 deletions(-)

diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index 44b18af492..65cf0a67a2 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -64,6 +64,122 @@
  #include "util/mesa-sha1.h"
  #include "util/crc32.h"

+
+/**
+ * Generate a SHA-1 hash value string for given source string.
+ */
+static void
+generate_sha1(const char *source, char sha_str[64])
+{
+   unsigned char sha[20];
+   _mesa_sha1_compute(source, strlen(source), sha);
+   _mesa_sha1_format(sha_str, sha);
+}
+
+
+#ifdef ENABLE_SHADER_CACHE
+/**
+ * Construct a full path for shader replacement functionality using
+ * following format:
+ *
+ * /_.glsl
+ */
+static char *
+construct_name(const gl_shader_stage stage, const char *source,
+   const char *path)
+{
+   char sha[64];
+   static const char *types[] = {
+  "VS", "TC", "TE", "GS", "FS", "CS",
+   };
+
+   generate_sha1(source, sha);
+   return ralloc_asprintf(NULL, "%s/%s_%s.glsl", path, types[stage], sha);
+}
+
+/**
+ * Write given shader source to a file in MESA_SHADER_DUMP_PATH.
+ */
+static void
+dump_shader(const gl_shader_stage stage, const char *source)
+{
+   static bool path_exists = true;
+   char *dump_path;
+   FILE *f;
+
+   if (!path_exists)
+  return;
+
+   dump_path = getenv("MESA_SHADER_DUMP_PATH");
+   if (!dump_path) {
+  path_exists = false;
+  return;
+   }
+
+   char *name = construct_name(stage, source, dump_path);
+
+   f = fopen(name, "w");
+   if (f) {
+  fputs(source, f);
+  fclose(f);
+   } else {
+  GET_CURRENT_CONTEXT(ctx);
+  _mesa_warning(ctx, "could not open %s for dumping shader (%s)", name,
+strerror(errno));
+   }
+   ralloc_free(name);
+}
+
+/**
+ * Read shader source code from a file.
+ * Useful for debugging to override an app's shader.
+ */
+static GLcharARB *
+read_shader(const gl_shader_stage stage, const char *source)
+{
+   char *read_path;
+   static bool path_exists = true;
+   int len, shader_size = 0;
+   GLcharARB *buffer;
+   FILE *f;
+
+   if (!path_exists)
+  return NULL;
+
+   read_path = getenv("MESA_SHADER_READ_PATH");
+   if (!read_path) {
+  path_exists = false;
+  return NULL;
+   }
+
+   char *name = construct_name(stage, source, read_path);
+   f = fopen(name, "r");
+   ralloc_free(name);
+   if (!f)
+  return NULL;
+
+   /* allocate enough room for the entire shader */
+   fseek(f, 0, SEEK_END);
+   shader_size = ftell(f);
+   rewind(f);
+   assert(shader_size);
+
+   /* add one for terminating zero */
+   shader_size++;
+
+   buffer = malloc(shader_size);
+   assert(buffer);
+
+   len = fread(buffer, 1, shader_size, f);
+   buffer[len] = 0;
+
+   fclose(f);
+
+   return buffer;
+}
+
+#endif /* ENABLE_SHADER_CACHE */
+
  /**
   * Return mask of GLSL_x flags by examining the MESA_GLSL env var.
   */
@@ -125,11 +241,10 @@ _mesa_init_shader_state(struct gl_context *ctx)
 /* Device drivers may override these to control what kind of instructions
  * are generated by the GLSL compiler.
  */
-   struct gl_shader_compiler_options options;
+   struct gl_shader_compiler_options options = {};
 gl_shader_stage sh;
 int i;

-   memset(&options, 0, sizeof(options));
 options.MaxUnrollIterations = 32;
 options.MaxIfDepth = UINT_MAX;

@@ -138,7 +253,7 @@ _mesa_init_shader_state(struct gl_context *ctx)

 ctx->Shader.Flags = _mesa_get_shader_flags();

-   if (ctx->Shader.Flags != 0)
+   if (ctx->Shader.Flags)
ctx->Const.GenerateTemporaryNames = true;

 /* Extended for ARB_separate_shader_objects */
@@ -655,7 +770,8 @@ get_programiv(struct gl_context *ctx, GLuint program, 
GLenum pname,
GLint *params)
  {
 struct gl_shader_program *shProg
-  = _mesa_lookup_shader_program_err(ctx, program, 
"glGetProgramiv(program)");
+  = _mesa_lookup_shader_program_err(ctx, program,
+"glGetProgramiv(program)");

 /* Is transform feedback available in this conte

Re: [Mesa-dev] [PATCH] anv/allocator: Don't srink either end of the block pool

2018-04-23 Thread Eero Tamminen

Hi,

On 21.04.2018 08:15, Jason Ekstrand wrote:

Previously, we only tried to ensure that we didn't shrink either end
below what was already handed out.  However, due to the way we handle
relocations with block pools, we can't shrink the back end at all.  It's
probably best to not shrink in either direction.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105374
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106147


Verified that "texture3d" demo works with this patch, crashes without it.

Tested-by: Eero Tamminen 


- Eero


Cc: mesa-sta...@lists.freedesktop.org
---
  src/intel/vulkan/anv_allocator.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index f884ac3..642e161 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -508,12 +508,12 @@ anv_block_pool_grow(struct anv_block_pool *pool, struct 
anv_block_state *state)
assert(center_bo_offset >= back_used);
  
/* Make sure we don't shrink the back end of the pool */

-  if (center_bo_offset < pool->back_state.end)
- center_bo_offset = pool->back_state.end;
+  if (center_bo_offset < back_required)
+ center_bo_offset = back_required;
  
/* Make sure that we don't shrink the front end of the pool */

-  if (size - center_bo_offset < pool->state.end)
- center_bo_offset = size - pool->state.end;
+  if (size - center_bo_offset < front_required)
+ center_bo_offset = size - front_required;
 }
  
 assert(center_bo_offset % PAGE_SIZE == 0);






Re: [Mesa-dev] [PATCH v2] dri3: Prevent multiple freeing of buffers.

2018-04-11 Thread Eero Tamminen

Hi,

Tested on KBL GT2.

Before the patch, there are a few dozen compiz segfaults during ~3 hours 
of testing.  With the patched version there are no segfaults.


Tested-by: Eero Tamminen 

- Eero

On 10.04.2018 10:44, Sergii Romantsov wrote:

Commit 3160cb86aa92 adds an optimization with the flag 'reallocate'.
Processing of the flag causes buffers to be freed while a pointer is
still held in the caller's stack and is then used again to be freed.

Fixes: 3160cb86aa92 "egl/x11: Re-allocate buffers if format is suboptimal"

v2:
  used flag 'busy' instead of introducing new one.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105906
Signed-off-by: Sergii Romantsov 
Tested-by: Andriy Khulap 
---
  src/loader/loader_dri3_helper.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c
index fe17df1..a934db1 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -1688,6 +1688,7 @@ dri3_get_buffer(__DRIdrawable *driDrawable,
 (buffer_type == loader_dri3_buffer_front && draw->have_fake_front))
&& buffer) {
  
+ buffer->busy = true;

   /* Fill the new buffer with data from an old buffer */
   dri3_fence_await(draw->conn, draw, buffer);
   if (!loader_dri3_blit_image(draw,
@@ -1731,6 +1732,7 @@ dri3_get_buffer(__DRIdrawable *driDrawable,
draw->buffers[buf_id] = buffer;
 }
 dri3_fence_await(draw->conn, draw, buffer);
+   buffer = draw->buffers[buf_id];
  
 /*

  * Do we need to preserve the content of a previous buffer?
@@ -1744,7 +1746,8 @@ dri3_get_buffer(__DRIdrawable *driDrawable,
 if (buffer_type == loader_dri3_buffer_back &&
 draw->cur_blit_source != -1 &&
 draw->buffers[draw->cur_blit_source] &&
-   buffer != draw->buffers[draw->cur_blit_source]) {
+   buffer != draw->buffers[draw->cur_blit_source] &&
+   buffer != NULL) {
  
struct loader_dri3_buffer *source = draw->buffers[draw->cur_blit_source];
  





Re: [Mesa-dev] [PATCH] dri3: Prevent multiple freeing of buffers.

2018-04-06 Thread Eero Tamminen

Hi,

I tested this on KBL GT2.  Compiz crashes during a 3h test run dropped 
from 30 to none.  There were a couple of percent changes in a few synthetic 
tests, but I think that's just because Compiz is now running properly.


-> looks good!


- Eero

Tested-by: Eero Tamminen 


On 06.04.2018 11:12, Sergii Romantsov wrote:

Commit 3160cb86aa92 adds an optimization with the flag 'reallocate'.
Processing of the flag causes buffers to be freed while a pointer is
still held in the caller's stack and is then used again to be freed.

Fixes: 3160cb86aa92 "egl/x11: Re-allocate buffers if format is suboptimal"

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105906
Signed-off-by: Sergii Romantsov 
Tested-by: Andriy Khulap 
---
  src/loader/loader_dri3_helper.c | 7 +--
  src/loader/loader_dri3_helper.h | 1 +
  2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/loader/loader_dri3_helper.c b/src/loader/loader_dri3_helper.c
index fe17df1..5f9cc42 100644
--- a/src/loader/loader_dri3_helper.c
+++ b/src/loader/loader_dri3_helper.c
@@ -422,7 +422,7 @@ dri3_handle_present_event(struct loader_dri3_drawable *draw,
  buf->busy = 0;
  
   if (buf && draw->cur_blit_source != b && !buf->busy &&

- (buf->reallocate ||
+ ((buf->reallocate && !buf->in_use_to_destroy) ||
   (draw->num_back <= b && b < LOADER_DRI3_MAX_BACK))) {
  dri3_free_render_buffer(draw, buf);
  draw->buffers[b] = NULL;
@@ -1688,6 +1688,7 @@ dri3_get_buffer(__DRIdrawable *driDrawable,
 (buffer_type == loader_dri3_buffer_front && draw->have_fake_front))
&& buffer) {
  
+ buffer->in_use_to_destroy = true;

   /* Fill the new buffer with data from an old buffer */
   dri3_fence_await(draw->conn, draw, buffer);
   if (!loader_dri3_blit_image(draw,
@@ -1731,6 +1732,7 @@ dri3_get_buffer(__DRIdrawable *driDrawable,
draw->buffers[buf_id] = buffer;
 }
 dri3_fence_await(draw->conn, draw, buffer);
+   buffer = draw->buffers[buf_id];
  
 /*

  * Do we need to preserve the content of a previous buffer?
@@ -1744,7 +1746,8 @@ dri3_get_buffer(__DRIdrawable *driDrawable,
 if (buffer_type == loader_dri3_buffer_back &&
 draw->cur_blit_source != -1 &&
 draw->buffers[draw->cur_blit_source] &&
-   buffer != draw->buffers[draw->cur_blit_source]) {
+   buffer != draw->buffers[draw->cur_blit_source] &&
+   buffer != NULL) {
  
struct loader_dri3_buffer *source = draw->buffers[draw->cur_blit_source];
  
diff --git a/src/loader/loader_dri3_helper.h b/src/loader/loader_dri3_helper.h

index 7e3d829..9232d61 100644
--- a/src/loader/loader_dri3_helper.h
+++ b/src/loader/loader_dri3_helper.h
@@ -62,6 +62,7 @@ struct loader_dri3_buffer {
 bool busy;   /* Set on swap, cleared on IdleNotify */
 bool own_pixmap; /* We allocated the pixmap ID, free on 
destroy */
 bool reallocate; /* Buffer should be reallocated and not 
reused */
+   bool in_use_to_destroy; /* Buffer is in use and will be 
destroyed soon */
  
 uint32_t num_planes;

 uint32_t size;





Re: [Mesa-dev] [PATCH] i965: annotate brw_oa.py's --header and --code as required

2018-03-28 Thread Eero Tamminen

Hi,

On 20.03.2018 19:06, Dylan Baker wrote:

Quoting Emil Velikov (2018-03-20 09:29:00)
[snip]

  gens = []
  for xml_file in args.xml_files:
@@ -617,7 +610,7 @@ def main():
  
  """))
  
-c("#include \"" + os.path.basename(args.header) + "\"")

+c("#include \"" + os.path.basename(header_file) + "\"")


You're calling os.path.basename on a file object, which isn't valid. This should
still be args.header.


One could also use .name.
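For context, `argparse.FileType` makes argparse open the file and store the file object, and that object keeps the original path in its `.name` attribute, so `os.path.basename(args.header.name)` works too. A minimal sketch (the `--header` flag mirrors brw_oa.py; the output path is made up):

```python
import argparse
import os
import tempfile

parser = argparse.ArgumentParser()
# argparse opens the file itself and stores the file object in args.header
parser.add_argument("--header", type=argparse.FileType("w"), required=True)

path = os.path.join(tempfile.gettempdir(), "brw_oa_metrics.h")
args = parser.parse_args(["--header", path])

# the file object remembers the path it was opened with
print(os.path.basename(args.header.name))  # -> brw_oa_metrics.h
args.header.close()
```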


- Eero

  
  c(textwrap.dedent("""\

  #include "brw_context.h"



Re: [Mesa-dev] GfxBench & CSDof failures

2018-03-28 Thread Eero Tamminen

Hi,

On 28.03.2018 13:27, Eero Tamminen wrote:

Mesa built from following (last evening) commit:

commit 76dfed8ae2d5c6c509eb2661389be3c6a25077df
Author: Rob Clark 
AuthorDate: Thu Mar 15 18:42:44 2018 -0400
Commit: Rob Clark 
CommitDate: Tue Mar 27 08:36:37 2018 -0400

     nir: mako all the intrinsics


Fails with the following GL 4.3 benchmarks I tested:
* GfxBench Manhattan 3.1, CarChase and AztecRuins
* SynMark CSDof

The issues are the following:
* CSDof & AztecRuins: the compile takes so long that the test is killed


Sorry, I should have read more of my 2k-mail mesa-dev backlog,
not just checked the bugs.

It was the memory usage issue mentioned by Ian & Clayton,
not a compile-time issue.



* Manhattan 3.1 shader linking failed:
-
error: declarations for uniform `depth_parameters` are inside block 
`cameraConsts` and outside a block

-


And this was also using all memory, so I assume the issue was due to the
Mesa shader compiler handling memory allocation failure badly (continuing
to a bogus compile error instead of reporting compile failure due to the 
allocation failure).




A commit from one day earlier didn't have any problems.

Neither does the latest Mesa git version.

Which commit fixed the regression?  Is it this one:


I assume it was:
-
commit 5f21a7afe072f8a6e558ccc47407a0a94e0d1313
Author: Jason Ekstrand 
AuthorDate: Tue Mar 27 16:12:16 2018 -0700
Commit: Jason Ekstrand 
CommitDate: Tue Mar 27 18:18:26 2018 -0700

nir/intrinsics: Don't report negative dest_components
-


- Eero


[Mesa-dev] GfxBench & CSDof failures

2018-03-28 Thread Eero Tamminen

Hi,

Mesa built from following (last evening) commit:

commit 76dfed8ae2d5c6c509eb2661389be3c6a25077df
Author: Rob Clark 
AuthorDate: Thu Mar 15 18:42:44 2018 -0400
Commit: Rob Clark 
CommitDate: Tue Mar 27 08:36:37 2018 -0400

nir: mako all the intrinsics


Fails with the following GL 4.3 / compute shader benchmarks I tested:
* GfxBench Manhattan 3.1, CarChase and AztecRuins
* SynMark CSDof

The issues are the following:
* CSDof & AztecRuins: the compile takes so long that the test is killed
* Manhattan 3.1 shader linking failed:
-
error: declarations for uniform `depth_parameters` are inside block 
`cameraConsts` and outside a block

-

A commit from one day earlier didn't have any problems.

Neither does the latest Mesa git version.

Which commit fixed the regression?  Is it this one:
-
commit 629ee690addad9b3dc8f68cfff5ae09858f31caf
Author: Timothy Arceri 
AuthorDate: Mon Mar 26 11:41:51 2018 +1100
Commit: Timothy Arceri 
CommitDate: Wed Mar 28 09:59:38 2018 +1100

nir: fix crash in loop unroll corner case
-
?


- Eero


Re: [Mesa-dev] [PATCH 1/5] i965: Hard code scratch_ids_per_subslice for Cherryview

2018-03-07 Thread Eero Tamminen

Hi,

Tested SynMark CSDof and GfxBench Aztec Ruins GL & GLES / normal & high 
versions, which were earlier GPU hanging.  With this patch hangs are gone.


Tested-by: Eero Tamminen 


On 07.03.2018 10:16, Jordan Justen wrote:

Ken suggested that we might be underallocating scratch space on HD
400. Allocating scratch space as though there was actually 8 EUs


s/8/18/?

- Eero



seems to help with a GPU hang seen on synmark CSDof.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104636
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105290
Cc: Kenneth Graunke 
Cc: Eero Tamminen 
Cc: 
Signed-off-by: Jordan Justen 
---
  src/mesa/drivers/dri/i965/brw_program.c | 44 -
  1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
b/src/mesa/drivers/dri/i965/brw_program.c
index 527f003977b..c121136c439 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -402,23 +402,33 @@ brw_alloc_stage_scratch(struct brw_context *brw,
if (devinfo->gen >= 9)
   subslices = 4 * brw->screen->devinfo.num_slices;
  
-  /* WaCSScratchSize:hsw

-   *
-   * Haswell's scratch space address calculation appears to be sparse
-   * rather than tightly packed.  The Thread ID has bits indicating
-   * which subslice, EU within a subslice, and thread within an EU
-   * it is.  There's a maximum of two slices and two subslices, so these
-   * can be stored with a single bit.  Even though there are only 10 EUs
-   * per subslice, this is stored in 4 bits, so there's an effective
-   * maximum value of 16 EUs.  Similarly, although there are only 7
-   * threads per EU, this is stored in a 3 bit number, giving an effective
-   * maximum value of 8 threads per EU.
-   *
-   * This means that we need to use 16 * 8 instead of 10 * 7 for the
-   * number of threads per subslice.
-   */
-  const unsigned scratch_ids_per_subslice =
- devinfo->is_haswell ? 16 * 8 : devinfo->max_cs_threads;
+  unsigned scratch_ids_per_subslice;
+  if (devinfo->is_haswell) {
+ /* WaCSScratchSize:hsw
+  *
+  * Haswell's scratch space address calculation appears to be sparse
+  * rather than tightly packed. The Thread ID has bits indicating
+  * which subslice, EU within a subslice, and thread within an EU it
+  * is. There's a maximum of two slices and two subslices, so these
+  * can be stored with a single bit. Even though there are only 10 EUs
+  * per subslice, this is stored in 4 bits, so there's an effective
+  * maximum value of 16 EUs. Similarly, although there are only 7
+  * threads per EU, this is stored in a 3 bit number, giving an
+  * effective maximum value of 8 threads per EU.
+  *
+  * This means that we need to use 16 * 8 instead of 10 * 7 for the
+  * number of threads per subslice.
+  */
+ scratch_ids_per_subslice = 16 * 8;
+  } else if (devinfo->is_cherryview) {
+ /* For Cherryview, it appears that the scratch addresses for the 6 EU
+  * devices may still generate compute scratch addresses covering the
+  * same range as 8 EU.
+  */
+ scratch_ids_per_subslice = 8 * 7;
+  } else {
+ scratch_ids_per_subslice = devinfo->max_cs_threads;
+  }
  
thread_count = scratch_ids_per_subslice * subslices;

break;
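The dispatch added by the patch boils down to a three-way choice; a sketch of just that logic (a plain dict stands in for the devinfo struct, which is an assumption for illustration):

```python
def scratch_ids_per_subslice(devinfo):
    """Mirror of the patch's three-way dispatch, on a plain-dict devinfo."""
    if devinfo.get("is_haswell"):
        # sparse thread-ID encoding: 4 bits of EU (16) x 3 bits of thread (8),
        # even though HSW only has 10 EUs x 7 threads per subslice
        return 16 * 8
    if devinfo.get("is_cherryview"):
        # 6-EU CHV parts may still generate scratch addresses covering
        # the 8-EU range, so size for 8 EUs x 7 threads
        return 8 * 7
    return devinfo["max_cs_threads"]

print(scratch_ids_per_subslice({"is_haswell": True}))     # -> 128
print(scratch_ids_per_subslice({"is_cherryview": True}))  # -> 56
print(scratch_ids_per_subslice({"max_cs_threads": 64}))   # -> 64
```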






Re: [Mesa-dev] [PATCH 3/3] i965/tex_image: Avoid the ASTC LDR workaround on gen9lp

2018-03-07 Thread Eero Tamminen

Hi,

This drops the internal benchmark's startup time from 18 seconds to ~1 second 
(for warm startup) on the gen9lp platform where I tested it.


Tested-by: Eero tammi...@intel.com

- Eero

On 06.03.2018 00:07, Nanley Chery wrote:

Both the internal documentation and the results of testing this in the
CI suggest that this is unnecessary. Add the fixes tag because this
reduces an internal benchmark's startup time significantly. Eero
reported that a less optimal version of this patch reduced the startup
time of the benchmark by about 14 seconds.

Fixes: 710b1d2e665 "i965/tex_image: Flush certain subnormal ASTC channel values"
Cc: Eero Tamminen 
Cc: Scott D Phillips 
---
  src/mesa/drivers/dri/i965/intel_tex_image.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index e25bc9a0c08..a0408b304d5 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -927,7 +927,7 @@ intelCompressedTexSubImage(struct gl_context *ctx, GLuint 
dims,
  !_mesa_is_srgb_format(gl_format);
 struct brw_context *brw = (struct brw_context*) ctx;
 const struct gen_device_info *devinfo = &brw->screen->devinfo;
-   if (devinfo->gen == 9 && is_linear_astc)
+   if (devinfo->gen == 9 && !gen_device_info_is_9lp(devinfo) && is_linear_astc)
flush_astc_denorms(ctx, dims, texImage,
   xoffset, yoffset, zoffset,
   width, height, depth);





Re: [Mesa-dev] [PATCH 5/6] intel/fs: Handle surface opcode sample masks via predication.

2018-03-02 Thread Eero Tamminen

Hi,

On 27.02.2018 23:38, Francisco Jerez wrote:

The main motivation is to enable HDC surface opcodes on ICL which no
longer allows the sample mask to be provided in a message header, but
this is enabled all the way back to IVB when possible because it
decreases the instruction count of some shaders using HDC messages
significantly, e.g. one of the SynMark2 CSDof compute shaders
decreases instruction count by about 40% due to the removal of header
setup boilerplate which in turn makes a number of send message
payloads more easily CSE-able.  Shader-db results on SKL:

  total instructions in shared programs: 15325319 -> 15314384 (-0.07%)
  instructions in affected programs: 311532 -> 300597 (-3.51%)
  helped: 491
  HURT: 1

Shader-db results on BDW where the optimization needs to be disabled
in some cases due to hardware restrictions:

  total instructions in shared programs: 15604794 -> 15598028 (-0.04%)
  instructions in affected programs: 220863 -> 214097 (-3.06%)
  helped: 351
  HURT: 0

The FPS of SynMark2 CSDof improves by 5.09% ±0.36% (n=10) on my SKL
laptop with this change.


I tested the series with our full benchmark set, on BYT, HSW GT2, BXT 
and SKL GT2.


CSDof improved by:
* 9% on BYT
* 7-8% on BXT J4205, and on SKL GT2 desktop

(Variance on HSW was too large in this test to conclude anything.)

Changes in all the other tests were within daily variance.


Tested-By: Eero Tamminen 


- Eero

---
  src/intel/compiler/brw_fs.cpp | 42 +-
  1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 0b87d8ab14e..639432b4f49 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -4432,6 +4432,8 @@ static void
  lower_surface_logical_send(const fs_builder &bld, fs_inst *inst, opcode op,
 const fs_reg &sample_mask)
  {
+   const gen_device_info *devinfo = bld.shader->devinfo;
+
 /* Get the logical send arguments. */
 const fs_reg &addr = inst->src[0];
 const fs_reg &src = inst->src[1];
@@ -4442,7 +,20 @@ lower_surface_logical_send(const fs_builder &bld, 
fs_inst *inst, opcode op,
 /* Calculate the total number of components of the payload. */
 const unsigned addr_sz = inst->components_read(0);
 const unsigned src_sz = inst->components_read(1);
-   const unsigned header_sz = (sample_mask.file == BAD_FILE ? 0 : 1);
+   /* From the BDW PRM Volume 7, page 147:
+*
+*  "For the Data Cache Data Port*, the header must be present for the
+*   following message types: [...] Typed read/write/atomics"
+*
+* Earlier generations have a similar wording.  Because of this restriction
+* we don't attempt to implement sample masks via predication for such
+* messages prior to Gen9, since we have to provide a header anyway.  On
+* Gen11+ the header has been removed so we can only use predication.
+*/
+   const unsigned header_sz = devinfo->gen < 9 &&
+  (op == SHADER_OPCODE_TYPED_SURFACE_READ ||
+   op == SHADER_OPCODE_TYPED_SURFACE_WRITE ||
+   op == SHADER_OPCODE_TYPED_ATOMIC) ? 1 : 0;
 const unsigned sz = header_sz + addr_sz + src_sz;
  
 /* Allocate space for the payload. */

@@ -4462,6 +4477,31 @@ lower_surface_logical_send(const fs_builder &bld, 
fs_inst *inst, opcode op,
  
 bld.LOAD_PAYLOAD(payload, components, sz, header_sz);
  
+   /* Predicate the instruction on the sample mask if no header is

+* provided.
+*/
+   if (!header_sz && sample_mask.file != BAD_FILE &&
+   sample_mask.file != IMM) {
+  const fs_builder ubld = bld.group(1, 0).exec_all();
+  if (inst->predicate) {
+ assert(inst->predicate == BRW_PREDICATE_NORMAL);
+ assert(!inst->predicate_inverse);
+ assert(inst->flag_subreg < 2);
+ /* Combine the sample mask with the existing predicate by using a
+  * vertical predication mode.
+   */
+ inst->predicate = BRW_PREDICATE_ALIGN1_ALLV;
+ ubld.MOV(retype(brw_flag_subreg(inst->flag_subreg + 2),
+ sample_mask.type),
+  sample_mask);
+  } else {
+ inst->flag_subreg = 2;
+ inst->predicate = BRW_PREDICATE_NORMAL;
+ ubld.MOV(retype(brw_flag_subreg(inst->flag_subreg), sample_mask.type),
+  sample_mask);
+  }
+   }
+
 /* Update the original instruction. */
 inst->opcode = op;
 inst->mlen = header_sz + (addr_sz + src_sz) * inst->exec_size / 8;



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/15] glsl: Switch ast_type_qualifier to a 128-bit bitset.

2018-02-26 Thread Eero Tamminen

Hi,

On 25.02.2018 22:12, Francisco Jerez wrote:

Roland Scheidegger  writes:


Am 25.02.2018 um 03:35 schrieb Francisco Jerez:

Roland Scheidegger  writes:


This seems to have broken compilation with some gcc versions (with scons
build):

In file included from src/compiler/glsl/ast_array_index.cpp:24:0:
src/compiler/glsl/ast.h:648:16: error: member
‘ast_type_qualifier::bitset_t ast_type_qualifier::flags::i’ with
constructor not allowed in union
bitset_t i;
 ^


Oops...  And the only reason bitset_t has a default constructor was...
to avoid using another C++11 feature (defaulted member functions).
Does the attached patch fix the build failure for you?  The cleaner
alternative would be to define the default constructor of the bitset
object like 'T() = default', but that would imply dropping support for
GCC 4.2-4.3 which don't implement the feature...


FWIW the compile error was happening with gcc 4.8 - I didn't see it with
gcc 5.4.


I do.  It's broken with Ubuntu 16.04 gcc:
$ g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.6) 5.4.0 20160609


It does compile with gcc 6.3 in Ubuntu 17.04 though.  However, with
that, I get a segfault when using INTEL_DEBUG=shader_time with this commit,
e.g. with SynMark v7 and GpuTest v0.7.  I didn't get a segfault with a
day-older Mesa git version.



(I don't think at vmware we'd care about anything older than gcc 4.4 at
least but last time someone wanted to bump gcc requirements there were
still people requiring gcc 4.2.)

The patch compiles albeit there's about two dozen warnings like the
following:
glsl/ast_type.cpp: In member function 'bool
ast_fully_specified_type::has_qualifiers(_mesa_glsl_parse_state*) const':
glsl/ast_type.cpp:50:67: warning: ISO C++ says that these are ambiguous,
even though the worst conversion for the first is better than the worst
conversion for the second: [enabled by default]
 return (this->qualifier.flags.i & ~subroutine_only.flags.i) != 0;
^
In file included from glsl/ast.h:31:0,
  from glsl/ast_type.cpp:24:
../../src/util/bitset.h:181:7: note: candidate 1: bool operator!=(const
ast_type_qualifier::bitset_t&, unsigned int)
operator!=(const T &b, BITSET_WORD x) \
^
glsl/ast.h:477:4: note: in expansion of macro 'DECLARE_BITSET_T'
 DECLARE_BITSET_T(bitset_t, 128);
 ^
glsl/ast_type.cpp:50:67: note: candidate 2: operator!=(int, int) 
 return (this->qualifier.flags.i & ~subroutine_only.flags.i) != 0;


Ah, yeah, that's because I didn't provide overloads for signed integer
types, but it should be harmless since the two candidates have the same
semantics, and should go away with a C++11-capable compiler.  I think
the attached patch should shut the warnings on older compilers.


Yes, it fixed the compilation with gcc 5.4, and the INTEL_DEBUG=shader_time
segfault with the gcc 6.3 build.

Tested-by: Eero Tamminen 


- Eero






src/compiler/glsl/ast.h:648:16: note: unrestricted unions only available
with -std=c++11 or -std=gnu++11
scons: *** [build/linux-x86_64-checked/compiler/glsl/ast_array_index.os]
Error 1
src/gallium/tests/unit/u_format_test.c: In function ‘main’:
src/gallium/tests/unit/u_format_test.c:649:44: warning: array subscript
is above array bounds [-Warray-bounds]
   unpacked[i][j] = test->unpacked[i][j][1];
 ^
In file included from src/compiler/glsl/ast_expr.cpp:24:0:
src/compiler/glsl/ast.h:648:16: error: member
‘ast_type_qualifier::bitset_t ast_type_qualifier::flags::i’ with
constructor not allowed in union
bitset_t i;
 ^
src/compiler/glsl/ast.h:648:16: note: unrestricted unions only available
with -std=c++11 or -std=gnu++11

Roland

[...]











Re: [Mesa-dev] [PATCH 4/4] i965: Implement dual-patch tessellation evaluation shaders.

2018-02-23 Thread Eero Tamminen

Hi,

Could you also give dual & single shaders different names when dumping
them, e.g. something like the attached patch?


- Eero

On 23.02.2018 10:36, Kenneth Graunke wrote:

Normally, SIMD8 tessellation evaluation shaders operate on a single
patch, with each channel operating on a different vertex within the
patch.  For patch primitives with fewer than 8 vertices, this means
that some of the channels are disabled, effectively wasting compute
power.  This is a fairly common case.

To solve this, Skylake and later add "dual-patch" domain shader support.
Dual-patch mode only supports PATCHLIST_1..4.  The first four channels
process vertices in the first patch, while the second four channels
process vertices from a second patch.  This can double the throughput.

Similar to pixel shader SIMD8 vs. SIMD16 handling, 3DSTATE_DS accepts
two KSPs - one for single-patch mode, and one for dual-patch mode.  The
hardware will dynamically dispatch whichever kernel makes sense.

The two shader modes are nearly identical.  The three differences are:

- There are two URB handles instead of one.
- Dual-patch has an extra payload register (g1) for PrimitiveID.
   (single-patch packs it in g0, but they couldn't fit two IDs there)
- Pushed input data arrives interleaved rather than packed (in SIMD4x2
   style rather than SIMD4x1 style).  For example, varyings 'a' and 'b'
   would show up as follows:

   Single patch (SIMD4x1 style inputs):
   g6: | b1.x | b1.y | b1.z | b1.w | a1.x | a1.y | a1.z | a1.w |

   Dual patch (SIMD4x2 style inputs):
   g6: | a2.x | a2.y | a2.z | a2.w | a1.x | a1.y | a1.z | a1.w |
   g7: | b2.x | b2.y | b2.z | b2.w | b1.x | b1.y | b1.z | b1.w |

This is fairly easy to adjust for - in fact, we can take the FS IR
we already generated for single-patch mode and transform it to
create a dual-patch shader.  We only need to repeat register allocation.

Our load_input handling for TES shaders always emits MOVs to expand
a scalar input to a full SIMD8 register.  While these may be copy
propagated away, it does mean all input file access will be a scalar
<0,1,0> region.  (The FS backend in theory could recognize things
like dot(input1, input2) and emit a vector DP4 operation, but it does
not do such things today, nor is it likely to gain such support.)

Likewise, our URB access reads a single 32-bit URB handle from the
payload, expanding it to 8 handles for the SIMD8 URB messages.  We
can adjust the region from <0,1,0> to <1,4,0> on that MOV to replicate
the first four times, and the second another four times.

Despite having a register for PrimitiveID, the documentation says
dual-patch mode is not allowed when PrimitiveID is in use.  So we
don't need to handle that.

This should be all that's required.

Improves performance in Synmark's Gl40TerrainFlyTess at 1024x768 on
Apollolake 3x6 with single-channel RAM by 0.161727% +/- 0.0686365%
(n=1062).
---
  src/intel/compiler/brw_compiler.h |  2 +
  src/intel/compiler/brw_fs.cpp | 98 ++-
  src/intel/compiler/brw_fs.h   |  2 +
  src/intel/compiler/brw_fs_visitor.cpp |  1 +
  src/intel/compiler/brw_shader.cpp |  3 +
  src/mesa/drivers/dri/i965/genX_state_upload.c |  7 ++
  6 files changed, 112 insertions(+), 1 deletion(-)

This is not that impressive in itself, but it seems like it can only
help...the hardware ought to automatically dispatch in dual mode if
half the channels in an invocation were going to be wasted.

diff --git a/src/intel/compiler/brw_compiler.h 
b/src/intel/compiler/brw_compiler.h
index b1086bbcee5..311cff5a33d 100644
--- a/src/intel/compiler/brw_compiler.h
+++ b/src/intel/compiler/brw_compiler.h
@@ -986,6 +986,8 @@ struct brw_tes_prog_data
 enum brw_tess_partitioning partitioning;
 enum brw_tess_output_topology output_topology;
 enum brw_tess_domain domain;
+
+   uint32_t prog_offset_dual_patch;
  };
  
  struct brw_gs_prog_data

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index bed632d21b9..02383280932 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -6278,6 +6278,16 @@ fs_visitor::run_tcs_single_patch()
 return !failed;
  }
  
+static void

+copy_fs_inst_list(void *mem_ctx, exec_list *dst, exec_list *src)
+{
+   dst->make_empty();
+
+   foreach_in_list(fs_inst, inst, src) {
+  dst->push_tail(new(mem_ctx) fs_inst(*inst));
+   }
+}
+
  bool
  fs_visitor::run_tes()
  {
@@ -6307,9 +6317,95 @@ fs_visitor::run_tes()
 assign_tes_urb_setup();
  
 fixup_3src_null_dest();

+
+   /* The Skylake 3DSTATE_DS documentation says:
+*
+*"SIMD8_SINGLE_OR_DUAL_PATCH must not be used if the domain shader
+* kernel uses primitive id."
+*
+* So, don't bother compiling a dual-patch shader in that case.
+*/
+   cfg_t *single_patch_cfg = cfg;
+   if (devinfo->gen >= 9 &&
+   (nir->info.system_values_read & SYSTEM_BIT_PRIMITIVE_ID) == 0 &&
+   !(IN

Re: [Mesa-dev] [PATCH shaderdb 3/3] run: shader program file created via GetProgramBinary

2018-02-14 Thread Eero Tamminen

Hi,

On 13.02.2018 03:26, Dongwon Kim wrote:

extraction of linked binary program to a file using glGetProgramBinary.
This file is intended to be loaded by glProgramBinary in the graphic
application running on the target system.

A new option, '--out=' is available to be used for specifying
the output file name.

Signed-off-by: Dongwon Kim 
---
  run.c | 46 --
  1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/run.c b/run.c
index d066567..54575e1 100644
--- a/run.c
+++ b/run.c
@@ -358,18 +358,20 @@ const struct platform platforms[] = {
  enum
  {
  PCI_ID_OVERRIDE_OPTION = CHAR_MAX + 1,
+OUT_PROGRAM_OPTION,
  };
  
  const struct option const long_options[] =

  {
  {"pciid", required_argument, NULL, PCI_ID_OVERRIDE_OPTION},
+{"out", required_argument, NULL, OUT_PROGRAM_OPTION},
  {NULL, 0, NULL, 0}
  };
  
  void print_usage(const char *prog_name)

  {
  fprintf(stderr,
-"Usage: %s [-d ] [-j ] [-o ] [-p ] 
[--pciid=] \n",
+"Usage: %s [-d ] [-j ] [-o ] [-p ] 
[--pciid=] [--out=] \n",
  prog_name);
  }
  
@@ -450,6 +452,7 @@ main(int argc, char **argv)

  int opt;
  bool platf_overridden = 0;
  bool pci_id_overridden = 0;
+char out_file[64] = {0};


File names can be potentially thousands of chars long.

Why not just use:
const char *out_file = NULL;
?


  max_threads = omp_get_max_threads();
  
@@ -518,6 +521,13 @@ main(int argc, char **argv)

  setenv("INTEL_DEVID_OVERRIDE", optarg, 1);
  pci_id_overridden = 1;
  break;
+case OUT_PROGRAM_OPTION:
+if (optarg[0] == 0) {
+  fprintf(stderr, "Output file name is empty.\n");
+  return -1;
+}
+strncpy(out_file, optarg, 64);


...and if the pointer cannot be assigned directly, strdup() it and assert
if that fails.



+break;
  default:
  fprintf(stderr, "Unknown option: %x\n", opt);
  print_usage(argv[0]);
@@ -858,13 +868,13 @@ main(int argc, char **argv)
  }
  } else if (type == TYPE_CORE || type == TYPE_COMPAT || type == 
TYPE_ES) {
  GLuint prog = glCreateProgram();
+GLint param;
  
  for (unsigned i = 0; i < num_shaders; i++) {

  GLuint s = glCreateShader(shader[i].type);
  glShaderSource(s, 1, &shader[i].text, &shader[i].length);
  glCompileShader(s);
  
-GLint param;

  glGetShaderiv(s, GL_COMPILE_STATUS, &param);
  if (unlikely(!param)) {
  GLchar log[4096];
@@ -879,6 +889,38 @@ main(int argc, char **argv)
  }
  
  glLinkProgram(prog);

+
+glGetProgramiv(prog, GL_LINK_STATUS, &param);
+if (unlikely(!param)) {
+   GLchar log[4096];


Maybe add define for log buffer size as it's used in multiple places?



+   GLsizei length;
+   glGetProgramInfoLog(prog, 4096, &length, log);


4096 -> sizeof(log)



+
+   fprintf(stderr, "ERROR: failed to link progam:\n%s\n",
+   log);
+} else {
+   if (out_file[0] != 0) {


If changed to pointer, check for NULL.



+  char *prog_buf = (char *)malloc(10*1024*1024);
+  GLenum format;
+  GLsizei length;
+  FILE *fp;
+
+  glGetProgramBinary(prog, 10*1024*1024, &length, &format, 
prog_buf);


Use a define for size instead of magic value.


+
+  param = glGetError();
+  if (param != GL_NO_ERROR) {
+ fprintf(stderr, "ERROR: failed to get Program 
Binary\n");
+  } else {
+ fp = fopen(out_file, "wb");
+ fprintf(stdout, "Binary program is generated (%d 
Byte).\n", length);
+ fprintf(stdout, "Binary Format is %d\n", format);
+ fprintf(stdout, "Now writing to the file\n");
+ fwrite(prog_buf, sizeof(char), length, fp);
+ fclose(fp);
+  }
+  free(prog_buf);
+   }
+}
  glDeleteProgram(prog);
  } else {
  for (unsigned i = 0; i < num_shaders; i++) {




- Eero


Re: [Mesa-dev] [PATCH 0/6] i965: Misc sRGB and CCS fixes

2018-01-11 Thread Eero Tamminen

Hi,

This series increases performance except in SynMark DrvState, where the
drop is acceptable & expected [1].


BXT J4205 improvements:
 1-2% in GfxBench Manhattan 3.1
 1% in GfxBench CarChase & SynMark Deferred

KBL GT2 improvements:
 2-3% in GfxBench Manhattan 3.1
 1-2% in GfxBench CarChase & SynMark Deferred

KBL GT3e improvements:
 ~2% in GfxBench Manhattan 3.1 & CarChase and SynMark Deferred

By accident I had left Aztec Ruins out, but I assume ~1% drop in that is 
also fixed.



Tested-by: Eero Tamminen 


- Eero

[1] SynMark DrvState drops a few percent because of the CPU overhead of
fast-clear resolving for numerous small viewports.



On 10.01.2018 21:22, Jason Ekstrand wrote:

This series contains the fixes required to fully re-enable CCS for sRGB
render buffers.  A bunch of these patches have been sent out individually
or in smaller series but they sort-of all go together.

Jason Ekstrand (6):
   i965: Call brw_cache_flush_for_render in predraw_resolve_framebuffer
   i965: Track format and aux usage in the render cache
   Re-enable regular fast-clears (CCS_D) on gen9+
   i965/miptree: Refactor CCS_E and CCS_D cases in render_aux_usage
   i965/draw: Do resolves properly for textures used by TXF
   i965: Enable CCS_E sampling of sRGB textures as UNORM

  src/mesa/drivers/dri/i965/brw_context.h   |  2 +-
  src/mesa/drivers/dri/i965/brw_draw.c  | 53 +-
  src/mesa/drivers/dri/i965/brw_meta_util.c | 10 -
  src/mesa/drivers/dri/i965/genX_blorp_exec.c   | 14 --
  src/mesa/drivers/dri/i965/intel_fbo.c | 65 +--
  src/mesa/drivers/dri/i965/intel_fbo.h |  8 +++-
  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 62 +++--
  7 files changed, 147 insertions(+), 67 deletions(-)





Re: [Mesa-dev] [PATCH] spirv: Loosten the validation for load/store type matching

2018-01-02 Thread Eero Tamminen

Hi,

On 02.01.2018 06:11, Jason Ekstrand wrote:

This patch depends on the first three patches of this series:

https://patchwork.freedesktop.org/series/35469/


I tested a subset of Sacha Willems' demos with the above patch series +
this patch.  Without them, the raytracing demo (still) crashes; with them,
it works fine.


Tested-by: Eero Tamminen 


On Mon, Jan 1, 2018 at 8:09 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:


Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104424
---
  src/compiler/spirv/vtn_variables.c | 31
+--
  1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/src/compiler/spirv/vtn_variables.c
b/src/compiler/spirv/vtn_variables.c
index d69b056..48797f6 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -1899,6 +1899,28 @@ vtn_create_variable(struct vtn_builder *b,
struct vtn_value *val,
     }
  }

+static void
+vtn_assert_types_equal(struct vtn_builder *b, SpvOp opcode,
+                       struct vtn_type *dst_type, struct vtn_type
*src_type)
+{
+   if (dst_type->val == src_type->val)
+      return;
+
+   if (dst_type->type == src_type->type) {
+      /* Early versions of GLSLang would re-emit types
unnecessarily and you
+       * would end up with OpLoad, OpStore, or OpCopyMemory opcodes
which have
+       * mismatched source and destination types.
+       */
+      vtn_warn("Source and destination types of %s do not match",
+               spirv_op_to_string(opcode));
+   } else {
+      vtn_fail("Source and destination types of %s do not match: %s
vs. %s",
+               spirv_op_to_string(opcode),
+               glsl_get_type_name(dst_type->type),
+               glsl_get_type_name(src_type->type));
+   }
+}
+
  void
  vtn_handle_variables(struct vtn_builder *b, SpvOp opcode,
                       const uint32_t *w, unsigned count)
@@ -1975,8 +1997,7 @@ vtn_handle_variables(struct vtn_builder *b,
SpvOp opcode,
        struct vtn_value *dest = vtn_value(b, w[1],
vtn_value_type_pointer);
        struct vtn_value *src = vtn_value(b, w[2],
vtn_value_type_pointer);

-      vtn_fail_if(dest->type->deref != src->type->deref,
-                  "Dereferenced pointer types to OpCopyMemory do
not match");
+      vtn_assert_types_equal(b, opcode, dest->type->deref,
src->type->deref);

        vtn_variable_copy(b, dest->pointer, src->pointer);
        break;
@@ -1988,8 +2009,7 @@ vtn_handle_variables(struct vtn_builder *b,
SpvOp opcode,
        struct vtn_value *src_val = vtn_value(b, w[3],
vtn_value_type_pointer);
        struct vtn_pointer *src = src_val->pointer;

-      vtn_fail_if(res_type != src_val->type->deref,
-                  "Result and pointer types of OpLoad do not match");
+      vtn_assert_types_equal(b, opcode, res_type,
src_val->type->deref);

        if (src->mode == vtn_variable_mode_image ||
            src->mode == vtn_variable_mode_sampler) {
@@ -2006,8 +2026,7 @@ vtn_handle_variables(struct vtn_builder *b,
SpvOp opcode,
        struct vtn_pointer *dest = dest_val->pointer;
        struct vtn_value *src_val = vtn_untyped_value(b, w[2]);

-      vtn_fail_if(dest_val->type->deref != src_val->type,
-                  "Value and pointer types of OpStore do not match");
+      vtn_assert_types_equal(b, opcode, dest_val->type->deref,
src_val->type);

        if (glsl_type_is_sampler(dest->type->type)) {
           vtn_warn("OpStore of a sampler detected.  Doing
on-the-fly copy "
--
2.5.0.400.gff86faf









Re: [Mesa-dev] [PATCH] spirv: avoid infinite loop / freeze in vtn_cfg_walk_blocks()

2017-12-22 Thread Eero Tamminen

Hi,

On 22.12.2017 11:42, Eero Tamminen wrote:

On 21.12.2017 22:19, Mark Janes wrote:

This patch doesn't apply to master as formatted.


It was against master.  Could you try the attached patch instead?


Argh, I meant the patch attached to this mail.  Sorry again.


- Eero


I assume I had screwed something up when I inlined it to my earlier mail. :-/



I've reverted the bisected commit, since it disables testing on master.


My fix is rather obvious: it just moves a few lines to make sure the loop
cursor is incremented every round, like it was before the bad commit.



     - Eero


Eero Tamminen  writes:


Fixes: 9702fac68e (spirv: consider bitsize when handling OpSwitch cases)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104359
---
   src/compiler/spirv/vtn_cfg.c | 14 +++---
   1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/compiler/spirv/vtn_cfg.c b/src/compiler/spirv/vtn_cfg.c
index 9c4cbe2..3d5de37 100644
--- a/src/compiler/spirv/vtn_cfg.c
+++ b/src/compiler/spirv/vtn_cfg.c
@@ -549,19 +549,19 @@ vtn_cfg_walk_blocks(struct vtn_builder *b, struct
list_head *cf_list,
   struct vtn_block *case_block =
  vtn_value(b, *w, vtn_value_type_block)->block;

-    if (case_block == break_block)
-   continue;
-
-    vtn_assert(case_block->switch_case);
-
-    vtn_order_case(swtch, case_block->switch_case);
-
   if (bitsize <= 32) {
  w += 2;
   } else {
  assert(bitsize == 64);
  w += 3;
   }
+
+    if (case_block == break_block)
+   continue;
+
+    vtn_assert(case_block->switch_case);
+
+    vtn_order_case(swtch, case_block->switch_case);
    }

    enum vtn_branch_type branch_type =
--
2.7.4







>From b4094813c55eddcb14bae712bf33d9d3ab8910d1 Mon Sep 17 00:00:00 2001
From: Eero Tamminen 
Date: Thu, 21 Dec 2017 15:30:16 +0200
Subject: [PATCH] spirv: avoid infinite loop / freeze in vtn_cfg_walk_blocks()

Fixes: 9702fac68e (spirv: consider bitsize when handling OpSwitch cases)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104359
---
 src/compiler/spirv/vtn_cfg.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/compiler/spirv/vtn_cfg.c b/src/compiler/spirv/vtn_cfg.c
index 9c4cbe2..3d5de37 100644
--- a/src/compiler/spirv/vtn_cfg.c
+++ b/src/compiler/spirv/vtn_cfg.c
@@ -549,19 +549,19 @@ vtn_cfg_walk_blocks(struct vtn_builder *b, struct list_head *cf_list,
 struct vtn_block *case_block =
vtn_value(b, *w, vtn_value_type_block)->block;
 
-if (case_block == break_block)
-   continue;
-
-vtn_assert(case_block->switch_case);
-
-vtn_order_case(swtch, case_block->switch_case);
-
 if (bitsize <= 32) {
w += 2;
 } else {
assert(bitsize == 64);
w += 3;
 }
+
+if (case_block == break_block)
+   continue;
+
+vtn_assert(case_block->switch_case);
+
+vtn_order_case(swtch, case_block->switch_case);
  }
 
  enum vtn_branch_type branch_type =
-- 
2.7.4



Re: [Mesa-dev] [PATCH] spirv: avoid infinite loop / freeze in vtn_cfg_walk_blocks()

2017-12-22 Thread Eero Tamminen

Hi,

On 21.12.2017 22:19, Mark Janes wrote:

This patch doesn't apply to master as formatted.


It was against master.  Could you try the attached patch instead?

I assume I had screwed something up when I inlined it to my earlier mail. :-/



I've reverted the bisected commit, since it disables testing on master.


My fix is rather obvious: it just moves a few lines to make sure the loop
cursor is incremented every round, like it was before the bad commit.



    - Eero


Eero Tamminen  writes:


Fixes: 9702fac68e (spirv: consider bitsize when handling OpSwitch cases)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104359
---
   src/compiler/spirv/vtn_cfg.c | 14 +++---
   1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/compiler/spirv/vtn_cfg.c b/src/compiler/spirv/vtn_cfg.c
index 9c4cbe2..3d5de37 100644
--- a/src/compiler/spirv/vtn_cfg.c
+++ b/src/compiler/spirv/vtn_cfg.c
@@ -549,19 +549,19 @@ vtn_cfg_walk_blocks(struct vtn_builder *b, struct
list_head *cf_list,
   struct vtn_block *case_block =
  vtn_value(b, *w, vtn_value_type_block)->block;

-if (case_block == break_block)
-   continue;
-
-vtn_assert(case_block->switch_case);
-
-vtn_order_case(swtch, case_block->switch_case);
-
   if (bitsize <= 32) {
  w += 2;
   } else {
  assert(bitsize == 64);
  w += 3;
   }
+
+if (case_block == break_block)
+   continue;
+
+vtn_assert(case_block->switch_case);
+
+vtn_order_case(swtch, case_block->switch_case);
}

enum vtn_branch_type branch_type =
--
2.7.4


diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp b/src/intel/compiler/brw_fs_copy_propagation.cpp
index af5635eacef..92cc0a8de58 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -186,8 +186,7 @@ fs_copy_prop_dataflow::setup_initial_values()
 
/* Populate the initial values for the livein and liveout sets.  For the
 * block at the start of the program, livein = 0 and liveout = copy.
-* For the others, set liveout to 0 (the empty set) and livein to ~0
-* (the universal set).
+* For the others, set liveout and livein to ~0 (the universal set).
 */
foreach_block (block, cfg) {
   if (block->parents.is_empty()) {
@@ -197,7 +196,7 @@ fs_copy_prop_dataflow::setup_initial_values()
  }
   } else {
  for (int i = 0; i < bitset_words; i++) {
-bd[block->num].liveout[i] = 0u;
+bd[block->num].liveout[i] = ~0u;
 bd[block->num].livein[i] = ~0u;
  }
   }
@@ -228,34 +227,17 @@ fs_copy_prop_dataflow::run()
do {
   progress = false;
 
-  /* Update liveout for all blocks. */
   foreach_block (block, cfg) {
  if (block->parents.is_empty())
 continue;
 
  for (int i = 0; i < bitset_words; i++) {
 const BITSET_WORD old_liveout = bd[block->num].liveout[i];
-
-bd[block->num].liveout[i] =
-   bd[block->num].copy[i] | (bd[block->num].livein[i] &
- ~bd[block->num].kill[i]);
-
-if (old_liveout != bd[block->num].liveout[i])
-   progress = true;
- }
-  }
-
-  /* Update livein for all blocks.  If a copy is live out of all parent
-   * blocks, it's live coming in to this block.
-   */
-  foreach_block (block, cfg) {
- if (block->parents.is_empty())
-continue;
-
- for (int i = 0; i < bitset_words; i++) {
-const BITSET_WORD old_livein = bd[block->num].livein[i];
 BITSET_WORD livein_from_any_block = 0;
 
+/* Update livein for this block.  If a copy is live out of all
+ * parent blocks, it's live coming in to this block.
+ */
 bd[block->num].livein[i] = ~0u;
 foreach_list_typed(bblock_link, parent_link, link, &block->parents) {
bblock_t *parent = parent_link->block;
@@ -278,7 +260,12 @@ fs_copy_prop_dataflow::run()
  */
 bd[block->num].livein[i] &= livein_from_any_block;
 
-if (old_livein != bd[block->num].livein[i])
+/* Update liveout for this block. */
+bd[block->num].liveout[i] =
+   bd[block->num].copy[i] | (bd[block->num].livein[i] &
+ ~bd[block->num].kill[i]);
+
+if (old_liveout != bd[block->num].liveout[i])
progress = true;
  }
   }

Re: [Mesa-dev] [PATCHv2] intel/fs: Optimize and simplify the copy propagation dataflow logic.

2017-12-21 Thread Eero Tamminen

Hi,

I tested this on HSW GT2, BXT and SKL GT3e, and didn't see any significant
regressions this time.  I'll also try it on a machine with smaller variance
than those (now that it has become free), and send a note if that does show
something.


- Eero

On 20.12.2017 21:27, Francisco Jerez wrote:

Previously the dataflow propagation algorithm would calculate the ACP
live-in and -out sets in a two-pass fixed-point algorithm.  The first
pass would update the live-out sets of all basic blocks of the program
based on their live-in sets, while the second pass would update the
live-in sets based on the live-out sets.  This is incredibly
inefficient in the typical case where the CFG of the program is
approximately acyclic, because it can take up to 2*n passes for an ACP
entry introduced at the top of the program to reach the bottom (where
n is the number of basic blocks in the program), until which point the
algorithm won't be able to reach a fixed point.

The same effect can be achieved in a single pass by computing the
live-in and -out sets in lock-step, because that makes sure that
processing of any basic block will pick up the updated live-out sets
of the lexically preceding blocks.  This gives the dataflow
propagation algorithm effectively O(n) run-time instead of O(n^2) in
the acyclic case.

The time spent in dataflow propagation is reduced by 30x in the
GLES31.functional.ssbo.layout.random.all_shared_buffer.5 dEQP
test-case on my CHV system (the improvement is likely to be of the
same order of magnitude on other platforms).  This more than reverses
an apparent run-time regression in this test-case from my previous
copy-propagation undefined-value handling patch, which was ultimately
caused by the additional work introduced in that commit to account for
undefined values being multiplied by a huge quadratic factor.

According to Chad this test was failing on CHV due to a 30s time-out
imposed by the Android CTS (this was the case regardless of my
undefined-value handling patch, even though my patch substantially
exacerbated the issue).  On my CHV system this patch reduces the
overall run-time of the test by approximately 12x, getting us to
around 13s, well below the time-out.

v2: Initialize live-out set to the universal set to avoid rather
 pessimistic dataflow estimation in shaders with cycles (Addresses
 performance regression reported by Eero in GpuTest Piano).
 Performance numbers given above still apply.  No shader-db changes
 with respect to master.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104271
Reported-by: Chad Versace 
---
  src/intel/compiler/brw_fs_copy_propagation.cpp | 35 --
  1 file changed, 11 insertions(+), 24 deletions(-)

diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp 
b/src/intel/compiler/brw_fs_copy_propagation.cpp
index af5635eacef..92cc0a8de58 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -186,8 +186,7 @@ fs_copy_prop_dataflow::setup_initial_values()
  
 /* Populate the initial values for the livein and liveout sets.  For the

  * block at the start of the program, livein = 0 and liveout = copy.
-* For the others, set liveout to 0 (the empty set) and livein to ~0
-* (the universal set).
+* For the others, set liveout and livein to ~0 (the universal set).
  */
 foreach_block (block, cfg) {
if (block->parents.is_empty()) {
@@ -197,7 +196,7 @@ fs_copy_prop_dataflow::setup_initial_values()
   }
} else {
   for (int i = 0; i < bitset_words; i++) {
-bd[block->num].liveout[i] = 0u;
+bd[block->num].liveout[i] = ~0u;
  bd[block->num].livein[i] = ~0u;
   }
}
@@ -228,34 +227,17 @@ fs_copy_prop_dataflow::run()
 do {
progress = false;
  
-  /* Update liveout for all blocks. */
foreach_block (block, cfg) {
   if (block->parents.is_empty())
  continue;
  
   for (int i = 0; i < bitset_words; i++) {
  const BITSET_WORD old_liveout = bd[block->num].liveout[i];
-
-bd[block->num].liveout[i] =
-   bd[block->num].copy[i] | (bd[block->num].livein[i] &
- ~bd[block->num].kill[i]);
-
-if (old_liveout != bd[block->num].liveout[i])
-   progress = true;
- }
-  }
-
-  /* Update livein for all blocks.  If a copy is live out of all parent
-   * blocks, it's live coming in to this block.
-   */
-  foreach_block (block, cfg) {
- if (block->parents.is_empty())
-continue;
-
- for (int i = 0; i < bitset_words; i++) {
-const BITSET_WORD old_livein = bd[block->num].livein[i];
  BITSET_WORD livein_from_any_block = 0;
  
+/* Update livein for this block.  If a copy is live out of all
+ * parent blocks, it's live coming in to this block.
+ */

[Mesa-dev] [PATCH] spirv: avoid infinite loop / freeze in vtn_cfg_walk_blocks()

2017-12-21 Thread Eero Tamminen

Fixes: 9702fac68e (spirv: consider bitsize when handling OpSwitch cases)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104359
---
 src/compiler/spirv/vtn_cfg.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/compiler/spirv/vtn_cfg.c b/src/compiler/spirv/vtn_cfg.c
index 9c4cbe2..3d5de37 100644
--- a/src/compiler/spirv/vtn_cfg.c
+++ b/src/compiler/spirv/vtn_cfg.c
@@ -549,19 +549,19 @@ vtn_cfg_walk_blocks(struct vtn_builder *b, struct 
list_head *cf_list,

 struct vtn_block *case_block =
vtn_value(b, *w, vtn_value_type_block)->block;

-if (case_block == break_block)
-   continue;
-
-vtn_assert(case_block->switch_case);
-
-vtn_order_case(swtch, case_block->switch_case);
-
 if (bitsize <= 32) {
w += 2;
 } else {
assert(bitsize == 64);
w += 3;
 }
+
+if (case_block == break_block)
+   continue;
+
+vtn_assert(case_block->switch_case);
+
+vtn_order_case(swtch, case_block->switch_case);
  }

  enum vtn_branch_type branch_type =
--
2.7.4
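For reference, the shape of the bug this patch fixes: the loop `continue`d past the word-pointer advance whenever a case pointed at the break block, so `w` never moved and the walk froze. Moving the advance before the `continue` restores progress. A hedged Python sketch of the corrected control flow; the word layout and names here are illustrative, not the actual vtn structures:

```python
# Illustrative sketch (not the real vtn code): walking OpSwitch
# (literal, label) entries.  Literals take 1 word for bitsize <= 32
# and 2 words for 64-bit, so the cursor must advance *before* any
# 'continue', or the loop never makes progress.

def walk_switch_cases(words, bitsize, break_block):
    """Return ordered case labels, skipping the break block.
    words: flat list of literal word(s) followed by a label, repeated."""
    cases = []
    lit_words = 1 if bitsize <= 32 else 2
    w = 0
    while w < len(words):
        label = words[w + lit_words]   # label follows the literal words
        w += lit_words + 1             # advance unconditionally: the fix
        if label == break_block:
            continue                   # buggy code continued before advancing
        cases.append(label)
    return cases
```

With the advance after the `continue` (the pre-patch order), a 64-bit switch whose case list contains the break block would loop on the same entry forever.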
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/fs: Optimize and simplify the copy propagation dataflow logic.

2017-12-20 Thread Eero Tamminen

Hi,

I got unexpected results with this, when testing it on BXT & SKL GT2.

While performance in GpuTest Volplosion and GfxBench v4 Tessellation 
test improved slightly, performance in SynMark v7 CSDof and GpuTest v0.7 
Piano dropped clearly.


Piano dropped only on SKL, but there the drop was >10%.  There was also 
a peculiar change in its CPU vs. GPU utilization at the start of the 
test.  With this patch, there's 5-10s of 100% CPU usage before the test 
starts; that extra CPU usage happens only with this patch.


Attached is Callgrind output of where the extra time at startup goes.

(I didn't see that issue with the other tests which perf changed.)


- Eero

On 20.12.2017 04:19, Francisco Jerez wrote:

Previously the dataflow propagation algorithm would calculate the ACP
live-in and -out sets in a two-pass fixed-point algorithm.  The first
pass would update the live-out sets of all basic blocks of the program
based on their live-in sets, while the second pass would update the
live-in sets based on the live-out sets.  This is incredibly
inefficient in the typical case where the CFG of the program is
approximately acyclic, because it can take up to 2*n passes for an ACP
entry introduced at the top of the program to reach the bottom (where
n is the number of basic blocks in the program), until which point the
algorithm won't be able to reach a fixed point.

The same effect can be achieved in a single pass by computing the
live-in and -out sets in lock-step, because that makes sure that
processing of any basic block will pick up the updated live-out sets
of the lexically preceding blocks.  This gives the dataflow
propagation algorithm effectively O(n) run-time instead of O(n^2) in
the acyclic case.
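As a rough illustration of the lock-step idea (a Python sketch under simplified assumptions, not Mesa's actual data structures): updating each block's livein and liveout together means a later block already sees its parents' new liveout within the same pass, so an acyclic chain converges in a couple of iterations instead of one per block.

```python
# Illustrative sketch (not Mesa code): single-pass lock-step dataflow
# propagation.  Blocks are dicts with 'parents', 'copy' and 'kill' bit
# masks, listed so parents lexically precede children (the typical
# approximately-acyclic case).

def propagate(blocks):
    """Iterate to a fixed point; return the number of passes taken."""
    for b in blocks:
        if not b["parents"]:
            b["livein"], b["liveout"] = 0, b["copy"]
        else:
            b["livein"] = b["liveout"] = ~0   # universal set

    passes, progress = 0, True
    while progress:
        progress = False
        passes += 1
        for b in blocks:
            if not b["parents"]:
                continue
            old = (b["livein"], b["liveout"])
            # A copy is live into the block iff live out of all parents.
            livein = ~0
            for p in b["parents"]:
                livein &= p["liveout"]
            b["livein"] = livein
            # Update liveout immediately, so later blocks in this same
            # pass already see the new value (the lock-step trick).
            b["liveout"] = b["copy"] | (b["livein"] & ~b["kill"])
            if (b["livein"], b["liveout"]) != old:
                progress = True
    return passes
```

On a straight-line chain of n blocks this settles in 2 passes (one to propagate, one to confirm the fixed point), whereas alternating separate liveout/livein passes can take on the order of n iterations.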

The time spent in dataflow propagation is reduced by 30x in the
GLES31.functional.ssbo.layout.random.all_shared_buffer.5 dEQP
test-case on my CHV system (the improvement is likely to be of the
same order of magnitude on other platforms).  This more than reverses
an apparent run-time regression in this test-case from my previous
copy-propagation undefined-value handling patch, which was ultimately
caused by the additional work introduced in that commit to account for
undefined values being multiplied by a huge quadratic factor.

According to Chad this test was failing on CHV due to a 30s time-out
imposed by the Android CTS (this was the case regardless of my
undefined-value handling patch, even though my patch substantially
exacerbated the issue).  On my CHV system this patch reduces the
overall run-time of the test by approximately 12x, getting us to
around 13s, well below the time-out.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104271
Reported-by: Chad Versace 
---
  src/intel/compiler/brw_fs_copy_propagation.cpp | 30 --
  1 file changed, 9 insertions(+), 21 deletions(-)

diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp b/src/intel/compiler/brw_fs_copy_propagation.cpp
index af5635eacef..8d6ab18cd13 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -228,34 +228,17 @@ fs_copy_prop_dataflow::run()
 do {
progress = false;
  
-  /* Update liveout for all blocks. */
foreach_block (block, cfg) {
   if (block->parents.is_empty())
  continue;
  
   for (int i = 0; i < bitset_words; i++) {
  const BITSET_WORD old_liveout = bd[block->num].liveout[i];
-
-bd[block->num].liveout[i] =
-   bd[block->num].copy[i] | (bd[block->num].livein[i] &
- ~bd[block->num].kill[i]);
-
-if (old_liveout != bd[block->num].liveout[i])
-   progress = true;
- }
-  }
-
-  /* Update livein for all blocks.  If a copy is live out of all parent
-   * blocks, it's live coming in to this block.
-   */
-  foreach_block (block, cfg) {
- if (block->parents.is_empty())
-continue;
-
- for (int i = 0; i < bitset_words; i++) {
-const BITSET_WORD old_livein = bd[block->num].livein[i];
  BITSET_WORD livein_from_any_block = 0;
  
+/* Update livein for this block.  If a copy is live out of all
+ * parent blocks, it's live coming in to this block.
+ */
  bd[block->num].livein[i] = ~0u;
  foreach_list_typed(bblock_link, parent_link, link, 
&block->parents) {
 bblock_t *parent = parent_link->block;
@@ -278,7 +261,12 @@ fs_copy_prop_dataflow::run()
   */
  bd[block->num].livein[i] &= livein_from_any_block;
  
-if (old_livein != bd[block->num].livein[i])

+/* Update liveout for this block. */
+bd[block->num].liveout[i] =
+   bd[block->num].copy[i] | (bd[block->num].livein[i] &
+ ~bd[block->num].kill[i]);
+
+if (old_live

Re: [Mesa-dev] [PATCH] intel/fs: Optimize and simplify the copy propagation dataflow logic.

2017-12-20 Thread Eero Tamminen

Hi,

On 20.12.2017 16:29, Eero Tamminen wrote:

I got unexpected results with this, when testing it on BXT & SKL GT2.

While performance in GpuTest Volplosion and GfxBench v4 Tessellation 
test improved slightly, performance in SynMark v7 CSDof and GpuTest v0.7 
Piano dropped clearly.


Piano dropped only on SKL, but there the drop was >10%.


When looking at the generated shader assembly before and after, there's 
one huge shader for which Mesa is able to generate only SIMD8 assembly.



Before the patch:

SIMD8 shader: 2727 instructions. 5 loops. 470840 cycles. 0:0 
spills:fills. Promoted 58 constants. Compacted 43632 to 32560 bytes (25%)



After the patch:

SIMD8 shader: 4784 instructions. 5 loops. 808006 cycles. 212:367 
spills:fills. Promoted 9 constants. Compacted 76544 to 46192 bytes (40%)



Because the shader spills (i.e. there are no free regs), there are also 
more register bank conflicts.



- Eero

> There was also
> a peculiar change in its CPU vs. GPU utilization at the start of the
> test.  With this patch, there's 5-10s of 100% CPU usage before the test
> starts, which extra CPU usage happens only with this patch.
>
> Attached is Callgrind output of where the extra time at startup goes.
>

(I didn't see that issue with the other tests which perf changed.)


 - Eero

On 20.12.2017 04:19, Francisco Jerez wrote:

Previously the dataflow propagation algorithm would calculate the ACP
live-in and -out sets in a two-pass fixed-point algorithm.  The first
pass would update the live-out sets of all basic blocks of the program
based on their live-in sets, while the second pass would update the
live-in sets based on the live-out sets.  This is incredibly
inefficient in the typical case where the CFG of the program is
approximately acyclic, because it can take up to 2*n passes for an ACP
entry introduced at the top of the program to reach the bottom (where
n is the number of basic blocks in the program), until which point the
algorithm won't be able to reach a fixed point.

The same effect can be achieved in a single pass by computing the
live-in and -out sets in lock-step, because that makes sure that
processing of any basic block will pick up the updated live-out sets
of the lexically preceding blocks.  This gives the dataflow
propagation algorithm effectively O(n) run-time instead of O(n^2) in
the acyclic case.

The time spent in dataflow propagation is reduced by 30x in the
GLES31.functional.ssbo.layout.random.all_shared_buffer.5 dEQP
test-case on my CHV system (the improvement is likely to be of the
same order of magnitude on other platforms).  This more than reverses
an apparent run-time regression in this test-case from my previous
copy-propagation undefined-value handling patch, which was ultimately
caused by the additional work introduced in that commit to account for
undefined values being multiplied by a huge quadratic factor.

According to Chad this test was failing on CHV due to a 30s time-out
imposed by the Android CTS (this was the case regardless of my
undefined-value handling patch, even though my patch substantially
exacerbated the issue).  On my CHV system this patch reduces the
overall run-time of the test by approximately 12x, getting us to
around 13s, well below the time-out.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104271
Reported-by: Chad Versace 
---
  src/intel/compiler/brw_fs_copy_propagation.cpp | 30 
--

  1 file changed, 9 insertions(+), 21 deletions(-)

diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp b/src/intel/compiler/brw_fs_copy_propagation.cpp
index af5635eacef..8d6ab18cd13 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -228,34 +228,17 @@ fs_copy_prop_dataflow::run()
 do {
    progress = false;
-  /* Update liveout for all blocks. */
    foreach_block (block, cfg) {
   if (block->parents.is_empty())
  continue;
   for (int i = 0; i < bitset_words; i++) {
  const BITSET_WORD old_liveout = bd[block->num].liveout[i];
-
-    bd[block->num].liveout[i] =
-   bd[block->num].copy[i] | (bd[block->num].livein[i] &
- ~bd[block->num].kill[i]);
-
-    if (old_liveout != bd[block->num].liveout[i])
-   progress = true;
- }
-  }
-
-  /* Update livein for all blocks.  If a copy is live out of all 
parent

-   * blocks, it's live coming in to this block.
-   */
-  foreach_block (block, cfg) {
- if (block->parents.is_empty())
-    continue;
-
- for (int i = 0; i < bitset_words; i++) {
-    const BITSET_WORD old_livein = bd[block->num].livein[i];
  BITSET_WORD livein_from_any_block = 0;
+    /* Update livein for this block.  If a 

Re: [Mesa-dev] [PATCH 3/3] Revert "i965: Disable regular fast-clears (CCS_D) on gen9+"

2017-12-14 Thread Eero Tamminen

Hi,

As expected, this series fixes the perf regression in GfxBench when fast 
clears were disabled.  On SKL GT2:

* 2-5% Manhattan 3.1
* 1% AztecRuins & CarChase (on top of Francisco's large improvement 
between the perf regression and this fix)


On 14.12.2017 03:54, Jason Ekstrand wrote:

Better commit message:

     Re-enable regular fast-clears (CCS_D) on gen9+

     This reverts commit ee57b15ec764736e2d5360beaef9fb2045ed0f68, "i965:
     Disable regular fast-clears (CCS_D) on gen9+".  How taht we've 
fixed the

     issue with too many different aux usages in the render cache, it should
     be safe to re-enable CCS_D for sRGB.


* s/How taht/Now that/

* Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=104163

* Tested-by: Eero Tamminen 


- Eero


On Wed, Dec 13, 2017 at 5:52 PM, Jason Ekstrand wrote:


This reverts commit ee57b15ec764736e2d5360beaef9fb2045ed0f68.

Cc: "17.3" mesa-sta...@lists.freedesktop.org
---
  src/mesa/drivers/dri/i965/brw_meta_util.c     | 10 -
  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 57
---
  2 files changed, 25 insertions(+), 42 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_util.c b/src/mesa/drivers/dri/i965/brw_meta_util.c
index 54dc6a5..b311815 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_util.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_util.c
@@ -293,17 +293,7 @@ brw_is_color_fast_clear_compatible(struct
brw_context *brw,
         brw->mesa_to_isl_render_format[mt->format])
        return false;

-   /* Gen9 doesn't support fast clear on single-sampled SRGB
buffers. When
-    * GL_FRAMEBUFFER_SRGB is enabled any color renderbuffers will be
-    * resolved in intel_update_state. In that case it's pointless
to do a
-    * fast clear because it's very likely to be immediately resolved.
-    */
     const bool srgb_rb = _mesa_get_srgb_format_linear(mt->format)
!= mt->format;
-   if (devinfo->gen >= 9 &&
-       mt->surf.samples == 1 &&
-       ctx->Color.sRGBEnabled && srgb_rb)
-      return false;
-
    /* Gen10 doesn't automatically decode the clear color of sRGB
buffers. Since
     * we currently don't perform this decode in software, avoid a
fast-clear
     * altogether. TODO: Do this in software.
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index c1a4ce1..b87d356 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -207,13 +207,7 @@ intel_miptree_supports_ccs(struct brw_context *brw,
     if (!brw->mesa_format_supports_render[mt->format])
        return false;

-   if (devinfo->gen >= 9) {
-      mesa_format linear_format =
_mesa_get_srgb_format_linear(mt->format);
-      const enum isl_format isl_format =
-         brw_isl_format_for_mesa_format(linear_format);
-      return isl_format_supports_ccs_e(&brw->screen->devinfo,
isl_format);
-   } else
-      return true;
+   return true;
  }

  static bool
@@ -256,7 +250,7 @@ intel_miptree_supports_hiz(const struct
brw_context *brw,
   * our HW tends to support more linear formats than sRGB ones, we
use this
   * format variant for check for CCS_E compatibility.
   */
-MAYBE_UNUSED static bool
+static bool
  format_ccs_e_compat_with_miptree(const struct gen_device_info
*devinfo,
                                   const struct intel_mipmap_tree *mt,
                                   enum isl_format access_format)
@@ -290,12 +284,13 @@ intel_miptree_supports_ccs_e(struct
brw_context *brw,
     if (!intel_miptree_supports_ccs(brw, mt))
        return false;

-   /* Fast clear can be also used to clear srgb surfaces by using
equivalent
-    * linear format. This trick, however, can't be extended to be
used with
-    * lossless compression and therefore a check is needed to see
if the format
-    * really is linear.
+   /* Many window system buffers are sRGB even if they are never
rendered as
+    * sRGB.  For those, we want CCS_E for when sRGBEncode is
false.  When the
+    * surface is used as sRGB, we fall back to CCS_D.
      */
-   return _mesa_get_srgb_format_linear(mt->format) == mt->format;
+   mesa_format linear_format =
_mesa_get_srgb_format_linear(mt->format);
+   enum isl_format isl_format =
brw_isl_format_for_mesa_format(linear_format);
+   return isl_format_supports_ccs_e(&brw->screen->devinfo, isl_format);
  }

  /**
@@ -2686,29 +2681,27 @@ intel_miptre

Re: [Mesa-dev] [PATCH 0/2] i965: scratch space fixes (v2)

2017-12-12 Thread Eero Tamminen

Hi,

Tested-by: Eero Tamminen 

Fixes GPU hangs I'm seeing also with the *GL* version of CarChase on KBL 
GT3e (when using Ubuntu 16.04 kernel).


- Eero

On 12.12.2017 14:17, kevin.rogo...@intel.com wrote:

From: Kevin Rogovin 

This patch series offers a readability improvement for programming
MEDIA_VFE_STATE and fixes a scratch space sizing bug for Gen9.
Together with the ASTC5x5 fixes posted before, carchase on GLES
works on my SKL GT4.

v2:
  correctly state that first patch is just readability patch
  correctly compute subslices as 4 times the number of slices

Kevin Rogovin (2):
   i965: Program MEDIA_VFE_STATE in a more readable fashion.
   i965: compute scratch space size correctly for Gen9

  src/mesa/drivers/dri/i965/brw_program.c   |  6 +-
  src/mesa/drivers/dri/i965/genX_state_upload.c | 19 +--
  2 files changed, 18 insertions(+), 7 deletions(-)





Re: [Mesa-dev] [PATCH 1/2] intel/fs: Implement GRF bank conflict mitigation pass.

2017-12-12 Thread Eero Tamminen

Hi,

On 11.12.2017 12:28, Eero Tamminen wrote:

Thanks for finally having this handled in Mesa!

This patch series, live intervals and "Don't let undefined values 
prevent copy propagation" commits help performance in following tests:

* GfxBench CarChase (2% by live intervals)


That was on SKL GT2.

On BXT J4205, it (or the whole set of commits) improved CarChase by 4-5%!



* GfxBench AztecRuins & Manhattan 3.0 (very marginally)
* GfxBench Tessellation & ALU (not ALU2)
* GpuTest Volplosion & Julia FP64 (maybe also FurMark)
* SynMark CSDof (2-3% by copy propagation)
* SynMark PSPom (1% by live intervals)

Most visible improvements are on (all) GEN9+ platforms, but several of 
them are visible also on earlier GENS.



Shader compilation speed (in SynMark DrvShComp) drops by ~10%, mainly 
from the copy propagation commit.



Live intervals commit may have introduced small rendering regression in 
DOTA2 (Vulkan version), I'll check that next.


I wasn't able to reproduce that with real games, so I assume it to be an 
issue with Vulkan trace / replay used for the render validation (trace 
replay showed a lot of barrier errors with Vulkan API validation).


-> Seems I need to update our DOTA2 (and other games) validation traces 
after updating vktrace/replay.  Hopefully new versions work better with 
latest Mesa code.



- Eero


On 06.12.2017 22:38, Francisco Jerez wrote:

This series (which is ready for production and improves the cycle count
of over 46k shaders) has been sitting here for nearly half a year.  I'm
planning to self-review it and land it (along with PATCH 3/2 I just sent
to make sure we keep regressions under control) if nobody else does in
the next two weeks.

Francisco Jerez  writes:


Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput.  This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation.  It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.
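A toy model of what the pass is minimizing. The bank assignment below (bit 1 of the GRF number) and the conflict rule are simplifying assumptions for illustration, not the documented GEN behavior:

```python
# Simplified model (assumptions, not the real i965 rules): a GRF's bank
# is taken from bit 1 of its register number, and a ternary instruction
# (e.g. MAD) pays a conflict when its last two sources land in the same
# bank in the same cycle.

def bank(reg):
    """Toy bank assignment: GRF pairs alternate between two banks."""
    return (reg >> 1) & 1

def count_conflicts(insns):
    """insns: list of (src0, src1, src2) GRF numbers for 3-src ops."""
    conflicts = 0
    for _src0, src1, src2 in insns:
        if src1 == src2:
            # Duplicate last sources (e.g. squaring with MAD) always
            # conflict on earlier gens; renaming can't fix this.
            conflicts += 1
        elif bank(src1) == bank(src2):
            conflicts += 1
    return conflicts
```

In this model, remapping r6 to r5 in an instruction reading (r2, r6) removes its conflict because the two sources now sit in different banks, while the duplicate-source case cannot be fixed by renaming alone — matching the caveat above about extra copies.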

In a shader-db run on SKL this helps roughly 46k shaders:

    total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
    conflicts in affected programs: 816222 -> 407702 (-50.05%)
    helped: 46234
    HURT: 72

The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%±1.13% with n=20 due to the additional work done by the
compiler back-end.

On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies.  E.g. for a shader-db run on SNB:

    total conflicts in shared programs: 944636 -> 623185 (-34.03%)
    conflicts in affected programs: 853258 -> 531807 (-37.67%)
    helped: 31052
    HURT: 19

And on BDW:

    total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
    conflicts in affected programs: 1179787 -> 748933 (-36.52%)
    helped: 47592
    HURT: 70

On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
±0.33% with n=16.

NOTE: This patch intentionally disregards some i965 coding conventions
   for the sake of reviewability.  This is addressed by the next
   squash patch which introduces an amount of (for the most part
   boring) boilerplate that might distract reviewers from the
   non-trivial algorithmic details of the pass.
---
  src/intel/Makefile.sources   |   1 +
  src/intel/compiler/brw_fs.cpp    |   2 +
  src/intel/compiler/brw_fs.h  |   1 +
  src/intel/compiler/brw_fs_bank_conflicts.cpp | 791 
+++

  4 files changed, 795 insertions(+)
  create mode 100644 src/intel/compiler/brw_fs_bank_conflicts.cpp

diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
index a877ff2..1b9799a 100644
--- a/src/intel/Makefile.sources
+++ b/src/intel/Makefile.sources
@@ -44,6 +44,7 @@ COMPILER_FILES = \
  compiler/brw_eu_util.c \
  compiler/brw_eu_validate.c \
  compiler/brw_fs_builder.h \
+    compiler/brw_fs_bank_conflicts.cpp \
  compiler/brw_fs_cmod_propagation.cpp \
  compiler/brw_fs_combine_constants.cpp \
  compiler/brw_fs_copy_propagation.cpp \
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 43b6e34..0a85c0c 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -5858,6 +5858,8 @@ fs_visitor::allocate_registers(bool 
allow_spilling)

 if (failed)
    return;
+   opt_bank_conflicts();
+
 schedule_instructions(SCHEDULE_POST);
 if (last_scratch > 0) {
diff --git a/src/intel/compil

Re: [Mesa-dev] [PATCH 0/2] i965: scratch space fixes

2017-12-12 Thread Eero Tamminen

Hi,

Tested-by: Eero Tamminen 

On 12.12.2017 12:04, kevin.rogo...@intel.com wrote:

From: Kevin Rogovin 

This patch series fixes 2 issues for scratch space
on compute shaders for GEN. Together with the ASTC5x5
fixes posted before, carchase on GLES works on my SKL
GT4.


These patches fix GPU hangs I'm seeing also with the *GL* version of 
CarChase on KBL GT3e, when using Ubuntu 16.04 kernel.


NOTE: those hangs don't happen when doing tests with latest drm-tip 
kernel.  Besides the older Ubuntu kernel, I'm seeing hangs also with the 
4.13 drm-tip kernel, but these have happened only since early November 
version of Mesa.



- Eero



Kevin Rogovin (2):
   i965: correctly program MEDIA_VFE_STATE for compute shading
   i965: compute scratch space size correctly for Gen9

  src/mesa/drivers/dri/i965/brw_program.c   | 12 ++--
  src/mesa/drivers/dri/i965/genX_state_upload.c | 19 +--
  2 files changed, 23 insertions(+), 8 deletions(-)





Re: [Mesa-dev] [PATCH 1/2] intel/fs: Implement GRF bank conflict mitigation pass.

2017-12-11 Thread Eero Tamminen

Hi,

Thanks for finally having this handled in Mesa!

This patch series, live intervals and "Don't let undefined values 
prevent copy propagation" commits help performance in following tests:

* GfxBench CarChase (2% by live intervals)
* GfxBench AztecRuins & Manhattan 3.0 (very marginally)
* GfxBench Tessellation & ALU (not ALU2)
* GpuTest Volplosion & Julia FP64 (maybe also FurMark)
* SynMark CSDof (2-3% by copy propagation)
* SynMark PSPom (1% by live intervals)

Most visible improvements are on (all) GEN9+ platforms, but several of 
them are visible also on earlier GENS.



Shader compilation speed (in SynMark DrvShComp) drops by ~10%, mainly 
from the copy propagation commit.



Live intervals commit may have introduced small rendering regression in 
DOTA2 (Vulkan version), I'll check that next.



- Eero

On 06.12.2017 22:38, Francisco Jerez wrote:

This series (which is ready for production and improves the cycle count
of over 46k shaders) has been sitting here for nearly half a year.  I'm
planning to self-review it and land it (along with PATCH 3/2 I just sent
to make sure we keep regressions under control) if nobody else does in
the next two weeks.

Francisco Jerez  writes:


Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput.  This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation.  It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.

In a shader-db run on SKL this helps roughly 46k shaders:

total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
conflicts in affected programs: 816222 -> 407702 (-50.05%)
helped: 46234
HURT: 72

The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%±1.13% with n=20 due to the additional work done by the
compiler back-end.

On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies.  E.g. for a shader-db run on SNB:

total conflicts in shared programs: 944636 -> 623185 (-34.03%)
conflicts in affected programs: 853258 -> 531807 (-37.67%)
helped: 31052
HURT: 19

And on BDW:

total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
conflicts in affected programs: 1179787 -> 748933 (-36.52%)
helped: 47592
HURT: 70

On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
±0.33% with n=16.

NOTE: This patch intentionally disregards some i965 coding conventions
   for the sake of reviewability.  This is addressed by the next
   squash patch which introduces an amount of (for the most part
   boring) boilerplate that might distract reviewers from the
   non-trivial algorithmic details of the pass.
---
  src/intel/Makefile.sources   |   1 +
  src/intel/compiler/brw_fs.cpp|   2 +
  src/intel/compiler/brw_fs.h  |   1 +
  src/intel/compiler/brw_fs_bank_conflicts.cpp | 791 +++
  4 files changed, 795 insertions(+)
  create mode 100644 src/intel/compiler/brw_fs_bank_conflicts.cpp

diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
index a877ff2..1b9799a 100644
--- a/src/intel/Makefile.sources
+++ b/src/intel/Makefile.sources
@@ -44,6 +44,7 @@ COMPILER_FILES = \
compiler/brw_eu_util.c \
compiler/brw_eu_validate.c \
compiler/brw_fs_builder.h \
+compiler/brw_fs_bank_conflicts.cpp \
compiler/brw_fs_cmod_propagation.cpp \
compiler/brw_fs_combine_constants.cpp \
compiler/brw_fs_copy_propagation.cpp \
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 43b6e34..0a85c0c 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -5858,6 +5858,8 @@ fs_visitor::allocate_registers(bool allow_spilling)
 if (failed)
return;
  
+   opt_bank_conflicts();

+
 schedule_instructions(SCHEDULE_POST);
  
 if (last_scratch > 0) {

diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 6c8c027..b1fc7b3 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -141,6 +141,7 @@ public:
 exec_list *acp);
 bool opt_drop_redundant_mov_to_flags();
 bool opt_register_renaming();
+   bool opt_bank_conflicts();
 bool register_coalesce();
 bool compute_to_mrf();
 bool eliminate_find_live_channel();
diff --git a/src/intel/compiler/brw_fs_bank_conflicts.cpp b/src/intel/compiler/brw_fs_bank_conflicts.cpp
new file mode 100644
index 000..0225c70
--- /

Re: [Mesa-dev] GPU (and system) monitoring

2017-11-21 Thread Eero Tamminen

Hi,

On 21.11.2017 02:39, Gordon Haverland wrote:

On Mon, 20 Nov 2017 13:25:26 +0100
Nicolai Hähnle  wrote:


... umr does
that for the amdgpu kernel module


I downloaded the source tree, compiled umr, mounted the debugfs to use
umr, and ran
   umr --top

Switching to sensor mode, you could see loads.  I collected a few
seconds of information to a log file.  Then I umounted the debugfs.

The load values can change from quite high to quite low from sampling
interval to interval.  I think some kind of averaging would probably
need to be done; just taking a single load value chosen at random would
not be useful.

In a sense, temperature is going to be an average related to power
(load), so maybe just having the temperatures may be enough?


1. Temperature is affected by the temperature of the surrounding media, 
power usage less so
2. Temperature sensor might not be exactly where current load is using 
most power (e.g. ALUs vs. memory)




For now, I think I should collect a longer series of data, and then see
how the (moving) average behaves with respect to how many data points go
into the (moving) average.
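A minimal sketch of the kind of smoothing being discussed (plain Python, nothing umr-specific):

```python
# Simple moving average over noisy GPU load samples: a single
# randomly-sampled value is too jumpy to be meaningful, so average
# over a sliding window instead.
from collections import deque

def moving_average(samples, window):
    """Return the running mean of the last `window` samples."""
    buf, total, out = deque(), 0.0, []
    for s in samples:
        buf.append(s)
        total += s
        if len(buf) > window:
            total -= buf.popleft()
        out.append(total / len(buf))
    return out
```

The window size trades responsiveness for stability; comparing a few sizes against the logged data, as suggested, is the easiest way to pick one.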



- Eero


  1   2   3   >