Re: [Mesa-dev] Profile-guided optimizations

2020-02-13 Thread Timur Kristóf
The GCC wiki says:

"GCC uses execution profiles consisting of basic block and edge frequency 
counts to guide optimizations such as instruction scheduling, basic block 
reordering, function splitting, and register allocation."

More info here:
https://gcc.gnu.org/wiki/AutoFDO/Tutorial
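
As a toy illustration of what such an execution profile contains (this is not how GCC collects or stores it; the block names and the trace are made up), edge counts can be derived from a sequence of executed basic blocks and turned into branch probabilities:

```python
from collections import Counter

def edge_profile(trace):
    """Count how often each basic block and each block-to-block edge was executed."""
    block_counts = Counter(trace)
    edge_counts = Counter(zip(trace, trace[1:]))
    return block_counts, edge_counts

def branch_probabilities(edge_counts):
    """Turn raw edge counts into per-source-block probabilities."""
    outgoing = Counter()
    for (src, _dst), n in edge_counts.items():
        outgoing[src] += n
    return {(src, dst): n / outgoing[src] for (src, dst), n in edge_counts.items()}

# Hypothetical trace: a loop whose body takes the "fast" path 9 times and "slow" once.
trace = []
for i in range(10):
    trace += ["head", "slow" if i == 9 else "fast", "tail"]

blocks, edges = edge_profile(trace)
probs = branch_probabilities(edges)
print(probs[("head", "fast")])  # 0.9
```

With numbers like these a compiler can make "fast" the fall-through successor of "head" and move "slow" out of line, which is the basic block reordering the quote mentions.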

Timur

On Friday, 14 February 2020, Marek Olšák wrote:
> Yeah I guess it reduces instruction cache misses, but then other codepaths
> are likely to get more misses.
> 
> Does it do anything smarter?
> 
> Marek
> 
> On Thu., Feb. 13, 2020, 17:52 Dave Airlie,  wrote:
> 
> > On Fri, 14 Feb 2020 at 08:22, Marek Olšák  wrote:
> > >
> > > I wonder what PGO really does other than placing likely/unlikely.
> >
> > With LTO it can do a lot more, like grouping hot functions into closer
> > regions so they avoid TLB misses and faults etc.
> >
> > Dave.
> >
>

-- 
Sent from my Sailfish device
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Profile-guided optimizations

2020-02-13 Thread Marek Olšák
Yeah I guess it reduces instruction cache misses, but then other codepaths
are likely to get more misses.

Does it do anything smarter?

Marek

On Thu., Feb. 13, 2020, 17:52 Dave Airlie,  wrote:

> On Fri, 14 Feb 2020 at 08:22, Marek Olšák  wrote:
> >
> > I wonder what PGO really does other than placing likely/unlikely.
>
> With LTO it can do a lot more, like grouping hot functions into closer
> regions so they avoid TLB misses and faults etc.
>
> Dave.
>


Re: [Mesa-dev] Profile-guided optimizations

2020-02-13 Thread Dave Airlie
On Fri, 14 Feb 2020 at 08:22, Marek Olšák  wrote:
>
> I wonder what PGO really does other than placing likely/unlikely.

With LTO it can do a lot more, like grouping hot functions into closer
regions so they avoid TLB misses and faults etc.
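
A rough sketch of why grouping helps, under made-up assumptions (4 KiB pages, invented function names and sizes, nothing measured on Mesa): count how many pages contain hot code when hot functions are scattered between cold ones versus placed together.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def pages_touched(layout, hot):
    """Return the set of pages that contain at least one byte of a hot function.

    layout is a list of (name, size_in_bytes) in link order.
    """
    pages = set()
    offset = 0
    for name, size in layout:
        if name in hot and size > 0:
            first = offset // PAGE_SIZE
            last = (offset + size - 1) // PAGE_SIZE
            pages.update(range(first, last + 1))
        offset += size
    return pages

# Hypothetical functions: 4 hot ones of 1 KiB each, separated by cold 8 KiB ones.
hot = {"draw", "bind", "emit", "flush"}
source_order = [("draw", 1024), ("cold_a", 8192), ("bind", 1024), ("cold_b", 8192),
                ("emit", 1024), ("cold_c", 8192), ("flush", 1024)]
# Same functions with the hot ones placed first (sorted() is stable).
hot_first = sorted(source_order, key=lambda f: f[0] not in hot)

print(len(pages_touched(source_order, hot)))  # 4
print(len(pages_touched(hot_first, hot)))     # 1
```

Fewer pages holding hot code means fewer TLB entries and page faults on the hot path, which is the effect described above.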

Dave.


Re: [Mesa-dev] Profile-guided optimizations

2020-02-13 Thread Marek Olšák
I wonder what PGO really does other than placing likely/unlikely.

Marek

On Thu., Feb. 13, 2020, 13:43 Dylan Baker,  wrote:

> I actually spent a bunch of time toying with PGO a couple of years ago. I
> got the guidance all working and was able to train it, but what we found
> was that it made the specific workloads we threw at it much faster, but it
> made every real world use case I tried (playing a game, running piglit,
> gnome) slower, often significantly so.
>
> The hard part is not setting up pgo, it's getting the right training data.
>
> Dylan
>
> Quoting Marek Olšák (2020-02-13 10:30:46)
> > [Forked from the other thread]
> >
> > Guys, we could run some simple tests similar to piglit/drawoverhead as
> > the last step of the pgo=generate build. Tests like that should exercise
> > the most common codepaths in drivers. We could add subtests that we care
> > about the most.
> >
> > Marek
> >
> > On Thu., Feb. 13, 2020, 13:16 Dylan Baker wrote:
> >
> > meson has built-ins for both of these: -Db_lto=true turns on LTO; for
> > PGO you would run:
> >
> > meson build -Db_pgo=generate
> > ninja -C build
> >
> > meson configure build -Db_pgo=use
> > ninja -C build
> >
> > Quoting Marek Olšák (2020-02-12 10:46:12)
> > > How do you enable LTO+PGO? Is it something we could enable by default
> > > for release builds?
> > >
> > > Marek
> > >
> > > On Wed, Feb 12, 2020 at 1:56 AM Dieter Nützel <die...@nuetzel-hh.de>
> > > wrote:
> > >
> > > Hello Gert,
> > >
> > > your merge 'broke' LTO and then later on PGO compilation/linking.
> > >
> > > I generally compile with '-Dgallium-drivers=r600,radeonsi,swrast'
> > > for testing radeonsi and (your) r600 work. ;-)
> > >
> > > After your merge I get several warnings in 'addrlib' with LTO and
> > > even a compiler error (gcc (SUSE Linux) 9.2.1 20200128).
> > >
> > > I had to disable 'r600' ('swrast' is needed for 'nine') to get a
> > > working LTO and even better PGO radeonsi driver.
> > > I have been preparing GREAT LTO+PGO (the latter is the greater)
> > > numbers over the last 2 months. I'll send my results later today.
> > >
> > > Summary
> > > radeonsi is ~40% smaller and 16-20% faster with PGO (!!!).
> > >
> > > Honza and the GCC people (Intel's ICC folks) do GREAT things.
> > > 'glmark2' numbers are better than 'vkmark'. (Hello, Marek.).
> > >
> > > Need some sleep.
> > >
> > > See my log, below.
> > >
> > > Greetings and GREAT work!
> > >
> > > -Dieter
> > >
> > > On 09.02.2020 15:46, Gert Wollny wrote:
> > > > On Thursday, 23.01.2020, 20:31 +0100, Gert Wollny wrote:
> > > >> has anybody any objections if I merge the r600/NIR code?
> > > >> Without explicitly setting the debug flag it doesn't change a
> > > >> thing, but it would be better to continue developing in-tree.
> > > > Okay, if nobody objects, I'll merge it Monday evening.
> > > >
> > > > Best,
> > > > Gert
> > >
> > > [1425/1433] Linking target
> src/gallium/targets/dri/libgallium_dri.so.
> > > FAILED: src/gallium/targets/dri/libgallium_dri.so
> > > c++  -o src/gallium/targets/dri/libgallium_dri.so
> > > 'src/gallium/targets/dri/8381c20@@gallium_dri@sha/target.c.o'
> -flto
> > > -fprofile-generate -Wl,--as-needed -Wl,--no-undefined -Wl,-O1
> -shared
> > > -fPIC -Wl,--start-group -Wl,-soname,libgallium_dri.so
> > > src/mesa/libmesa_gallium.a src/mesa/libmesa_common.a
> > > src/compiler/glsl/libglsl.a src/compiler/glsl/glcpp/libglcpp.a
> > > src/util/libmesa_util.a src/util/format/libmesa_format.a
> > > src/compiler/nir/libnir.a src/compiler/libcompiler.a
> > > src/mesa/libmesa_sse41.a
> src/mesa/drivers/dri/common/libdricommon.a
> > > src/mesa/drivers/dri/common/libmegadriver_stub.a
> > > src/gallium/state_trackers/dri/libdri.a
> > > src/gallium/auxiliary/libgalliumvl.a src/gallium/auxiliary/
> > libgallium.a
> > > src/mapi/shared-glapi/libglapi.so.0.0.0
> > > src/gallium/auxiliary/pipe-loader/libpipe_loader_static.a
> > > src/loader/libloader.a src/util/libxmlconfig.a
> > > src/gallium/winsys/sw/null/libws_null.a
> > > src/gallium/winsys/sw/wrapper/libwsw.a
> > > src/gallium/winsys/sw/dri/libswdri.a
> > > src/gallium/winsys/sw/kms-dri/libswkmsdri.a
> > > src/gallium/drivers/llvmpipe/libllvmpipe.a
> > > src/gallium/drivers/softpipe/libsoftpipe.a
> > > src/gallium/drivers/r600/libr600.a
> > > src/gallium/winsys/radeon/drm/libradeonwinsys.a
> > > src/gallium/drivers/radeonsi/libradeonsi.a
> > >   

[Mesa-dev] [ANNOUNCE] mesa 19.3.4

2020-02-13 Thread Dylan Baker
Hi List,

Mesa 19.3.4 is now available.

There's lots of stuff here, but also a ton of release process data changes.
We've got changes all over the tree, but aco and anv are leading the way in
changes.


Dylan


Shortlog


Bas Nieuwenhuizen (1):
  radv: Do not set SX DISABLE bits for RB+ with unused surfaces.

Boris Brezillon (1):
  panfrost: Fix the damage box clamping logic

Brian Ho (2):
  anv: Properly fetch partial results in vkGetQueryPoolResults
  anv: Handle unavailable queries in vkCmdCopyQueryPoolResults

Danylo Piliaiev (2):
  i965: Do not set front_buffer_dirty if there is no front buffer
  st/mesa: Handle the rest renderbuffer formats from OSMesa

Drew Davenport (1):
  radeonsi: Clear uninitialized variable

Dylan Baker (17):
  docs: Add SHA 256 sums for 19.3.3
  .pick_status.json: Mark 58c929be0ddbbd9291d0dadbf11538170178e791 as backported
  .pick_status.json: Mark df34fa14bb872447fed9076e06ffc504d85e2d1c as backported
  .pick_status.json: Update to 997040e4b8353fe9b71a5e9fde2f933eae09c7a3
  .pick_status.json: Update to ca6a22305b275b49fbc88b8f4cba2fefb24c2a5d
  .pick_status.json: Mark 552028c013cc1d49a2b61ebe0fc3a3781a9ba826 as denominated
  .pick_status.json: Update to f09c466732e4a5b648d750378c926dd93c29
  bin/pick-ui: Add a new maintainer script for picking patches
  .pick_status.json: Update to b550b7ef3b8d12f533b67b1a03159a127a3ff34a
  .pick_status.json: Update to 9afdcd64f2c96f3fcc1a28912987f2e8066aa995
  .pick_status.json: Update to 7eaf21cb6f67adbe0e79b80b4feb8c816a98a720
  .pick_status.json: Mark ca6a22305b275b49fbc88b8f4cba2fefb24c2a5d as backported
  .pick_status.json: Update to d8bae10bfe0f487dcaec721743cd51441bcc12f5
  .pick_status.json: Update to 689817c9dfde9a0852f2b2489cb0fa93ffbcb215
  .pick_status.json: Update to 23037627359e739c42b194dec54875aefbb9d00b
  docs: Add release notes for 19.3.4
  VERSION: bump version for 19.3.4

Eric Anholt (1):
  Revert "gallium: Fix big-endian addressing of non-bitmask array formats."

Florian Will (1):
  radv/winsys: set IB flags prior to submit in the sysmem path

Georg Lehmann (3):
  Correctly wait in the fragment stage until all semaphores are signaled
  Vulkan Overlay: Don't try to change the image layout to present twice
  Vulkan overlay: use the corresponding image index for each swapchain

Hyunjun Ko (1):
  freedreno/ir3: put the conversion back for half const to the right place.

Ian Romanick (1):
  intel/fs: Don't count integer instructions as being possibly coissue

Jan Vesely (1):
  clover: Use explicit conversion from llvm::StringRef to std::string

Jason Ekstrand (6):
  anv: Insert holes for non-existant XFB varyings
  anv: Improve BTI change cache flushing
  anv,iris: Set 3DSTATE_SF::DerefBlockSize to per-poly on Gen12+
  genxml: Add a new 3DSTATE_SF field on gen12
  intel/fs: Write the address register with NoMask for MOV_INDIRECT
  anv/blorp: Use the correct size for vkCmdCopyBufferToImage

Kenneth Graunke (1):
  i965: Use brw_batch_references in tex_busy check

Lionel Landwerlin (1):
  isl: drop CCS row pitch requirement for linear surfaces

Marek Olšák (1):
  radeonsi: fix the DCC MSAA bug workaround

Marek Vasut (1):
  etnaviv: Destroy rsc->pending_ctx set in etna_resource_destroy()

Michel Dänzer (6):
  winsys/amdgpu: Keep a list of amdgpu_screen_winsyses in amdgpu_winsys
  winsys/amdgpu: Keep track of retrieved KMS handles using hash tables
  winsys/amdgpu: Only re-export KMS handles for different DRM FDs
  util: Add os_same_file_description helper
  winsys/amdgpu: Re-use amdgpu_screen_winsys when possible
  winsys/amdgpu: Close KMS handles for other DRM file descriptions

Neha Bhende (1):
  svga: fix size of format_conversion_table[]

Pierre-Eric Pelloux-Prayer (2):
  radeonsi: disable display DCC
  radeonsi: stop using the VM_ALWAYS_VALID flag

Rafael Antognolli (1):
  intel: Load the driver even if I915_PARAM_REVISION is not found.

Rhys Perry (6):
  aco: fix operand to scc when selecting SGPR ufind_msb/ifind_msb
  aco: ensure predecessors' p_logical_end is in WQM when a p_phi is in WQM
  aco: run p_wqm instructions in WQM
  aco: don't consider loop header blocks branch blocks in add_coupling_code
  aco: don't always add logical edges from continue_break blocks to headers
  aco: fix target calculation when vgpr spilling introduces sgpr spilling

Samuel Pitoiset (2):
  radv: do not allow sparse resources with multi-planar formats
  nir: do not use De Morgan's Law rules for flt and fge

Tapani Pälli (2):
  mapi: add GetInteger64vEXT with EXT_disjoint_timer_query
  mesa: allow bit queries for EXT_disjoint_timer_query

Thomas Hellstrom (1):
  svga: Fix banded DMA upload

Vasily Khoruzhick (1):
  lima: ppir: don't delete root ld_tex nodes without 

Re: [Mesa-dev] Profile-guided optimizations

2020-02-13 Thread Dylan Baker
I actually spent a bunch of time toying with PGO a couple of years ago. I got
the guidance all working and was able to train it, but what we found was that it
made the specific workloads we threw at it much faster while making every
real-world use case I tried (playing a game, running piglit, gnome) slower, often
significantly so.

The hard part is not setting up PGO, it's getting the right training data.

Dylan

Quoting Marek Olšák (2020-02-13 10:30:46)
> [Forked from the other thread]
>
> Guys, we could run some simple tests similar to piglit/drawoverhead as the
> last step of the pgo=generate build. Tests like that should exercise the
> most common codepaths in drivers. We could add subtests that we care about
> the most.
>
> Marek
>
> On Thu., Feb. 13, 2020, 13:16 Dylan Baker wrote:
>
> meson has built-ins for both of these: -Db_lto=true turns on LTO; for PGO
> you would run:
>
> meson build -Db_pgo=generate
> ninja -C build
>
> meson configure build -Db_pgo=use
> ninja -C build
> 
> Quoting Marek Olšák (2020-02-12 10:46:12)
> > How do you enable LTO+PGO? Is it something we could enable by default
> > for release builds?
> >
> > Marek
> >
> > On Wed, Feb 12, 2020 at 1:56 AM Dieter Nützel wrote:
> >
> >     Hello Gert,
> >
> >     your merge 'broke' LTO and then later on PGO compilation/linking.
> >
> >     I generally compile with '-Dgallium-drivers=r600,radeonsi,swrast'
> >     for testing radeonsi and (your) r600 work. ;-)
> >
> >     After your merge I get several warnings in 'addrlib' with LTO and
> >     even a compiler error (gcc (SUSE Linux) 9.2.1 20200128).
> >
> >     I had to disable 'r600' ('swrast' is needed for 'nine') to get a
> >     working LTO and even better PGO radeonsi driver.
> >     I have been preparing GREAT LTO+PGO (the latter is the greater)
> >     numbers over the last 2 months. I'll send my results later today.
> >
> >     Summary
> >     radeonsi is ~40% smaller and 16-20% faster with PGO (!!!).
> >
> >     Honza and the GCC people (Intel's ICC folks) do GREAT things.
> >     'glmark2' numbers are better than 'vkmark'. (Hello, Marek.).
> >
> >     Need some sleep.
> >
> >     See my log, below.
> >
> >     Greetings and GREAT work!
> >
> >     -Dieter
> >
> >     On 09.02.2020 15:46, Gert Wollny wrote:
> >     > On Thursday, 23.01.2020, 20:31 +0100, Gert Wollny wrote:
> >     >> has anybody any objections if I merge the r600/NIR code?
> >     >> Without explicitly setting the debug flag it doesn't change a
> >     >> thing, but it would be better to continue developing in-tree.
> >     > Okay, if nobody objects, I'll merge it Monday evening.
> >     >
> >     > Best,
> >     > Gert
> >
> >     [1425/1433] Linking target 
> src/gallium/targets/dri/libgallium_dri.so.
> >     FAILED: src/gallium/targets/dri/libgallium_dri.so
> >     c++  -o src/gallium/targets/dri/libgallium_dri.so
> >     'src/gallium/targets/dri/8381c20@@gallium_dri@sha/target.c.o' -flto
> >     -fprofile-generate -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 
> -shared
> >     -fPIC -Wl,--start-group -Wl,-soname,libgallium_dri.so
> >     src/mesa/libmesa_gallium.a src/mesa/libmesa_common.a
> >     src/compiler/glsl/libglsl.a src/compiler/glsl/glcpp/libglcpp.a
> >     src/util/libmesa_util.a src/util/format/libmesa_format.a
> >     src/compiler/nir/libnir.a src/compiler/libcompiler.a
> >     src/mesa/libmesa_sse41.a src/mesa/drivers/dri/common/libdricommon.a
> >     src/mesa/drivers/dri/common/libmegadriver_stub.a
> >     src/gallium/state_trackers/dri/libdri.a
> >     src/gallium/auxiliary/libgalliumvl.a src/gallium/auxiliary/
> libgallium.a
> >     src/mapi/shared-glapi/libglapi.so.0.0.0
> >     src/gallium/auxiliary/pipe-loader/libpipe_loader_static.a
> >     src/loader/libloader.a src/util/libxmlconfig.a
> >     src/gallium/winsys/sw/null/libws_null.a
> >     src/gallium/winsys/sw/wrapper/libwsw.a
> >     src/gallium/winsys/sw/dri/libswdri.a
> >     src/gallium/winsys/sw/kms-dri/libswkmsdri.a
> >     src/gallium/drivers/llvmpipe/libllvmpipe.a
> >     src/gallium/drivers/softpipe/libsoftpipe.a
> >     src/gallium/drivers/r600/libr600.a
> >     src/gallium/winsys/radeon/drm/libradeonwinsys.a
> >     src/gallium/drivers/radeonsi/libradeonsi.a
> >     src/gallium/winsys/amdgpu/drm/libamdgpuwinsys.a
> >     src/amd/addrlib/libaddrlib.a src/amd/common/libamd_common.a
> >     src/amd/llvm/libamd_common_llvm.a -Wl,--build-id=sha1
> -Wl,--gc-sections
> >     -Wl,--version-script /opt/mesa/src/gallium/targets/dri/dri.sym
> >     -Wl,--dynamic-list 
> /opt/mesa/src/gallium/targets/dri/../dri-vdpau.dyn
> >     

[Mesa-dev] [ANNOUNCE] mesa 20.0.0-rc3

2020-02-13 Thread Dylan Baker
Hi list,

Mesa 20.0.0-rc3 is now available. This is a much smaller release than last time,
things seem to be slowing down nicely, and the number of opened issues/MRs
against the 20.0 release milestone is 2; I'm hopeful that means we can have the
20.0 release next week, and begin the normal release process without a dozen
RCs.

There's a bit of everything in here: gallium, freedreno, vulkan overlays, anv,
radeonsi, svga, intel common, aco, nir, swr, and panfrost, but no one thing
dominates the changes, which I like a lot.

Dylan


Shortlog


Dylan Baker (4):
  .pick_status.json: Update to d8bae10bfe0f487dcaec721743cd51441bcc12f5
  .pick_status.json: Update to 689817c9dfde9a0852f2b2489cb0fa93ffbcb215
  .pick_status.json: Update to 23037627359e739c42b194dec54875aefbb9d00b
  VERSION: bump for 20.0.0-rc3

Eric Anholt (1):
  Revert "gallium: Fix big-endian addressing of non-bitmask array formats."

Georg Lehmann (3):
  Correctly wait in the fragment stage until all semaphores are signaled
  Vulkan Overlay: Don't try to change the image layout to present twice
  Vulkan overlay: use the corresponding image index for each swapchain

Hyunjun Ko (1):
  freedreno/ir3: put the conversion back for half const to the right place.

James Xiong (1):
  gallium: let the pipe drivers decide the supported modifiers

Lionel Landwerlin (1):
  anv: set MOCS on push constants

Marek Olšák (2):
  radeonsi: don't report that multi-plane formats are supported
  radeonsi: fix the DCC MSAA bug workaround

Neha Bhende (2):
  svga: fix size of format_conversion_table[]
  svga: Use pipe_shader_state_from_tgsi to set shader state

Rafael Antognolli (1):
  intel: Load the driver even if I915_PARAM_REVISION is not found.

Rhys Perry (1):
  aco: fix gfx10_wave64_bpermute

Samuel Pitoiset (4):
  aco: do not use ds_{read,write}2 on GFX6
  aco: fix waiting for scalar stores before "writing back" data on GFX8-GFX9
  aco: fix creating v_madak if v_mad_f32 has two sgpr literals
  nir: do not use De Morgan's Law rules for flt and fge

Tapani Pälli (1):
  intel/vec4: fix valgrind errors with vf_values array

Thomas Hellstrom (1):
  svga: Fix banded DMA upload

Timur Kristóf (1):
  aco/optimizer: Don't combine uniform bool s_and to s_andn2.

Vinson Lee (2):
  swr: Fix GCC 4.9 checks.
  panfrost: Remove unused anonymous enum variables.


git tag: mesa-20.0.0-rc3

https://mesa.freedesktop.org/archive/mesa-20.0.0-rc3.tar.xz
SHA256: aca72ed6201caed1f212dc00770724e5e2b51063bd31df8380f879c90160b4e8  mesa-20.0.0-rc3.tar.xz
SHA512: df873cf961e641b9d9e9a6ce7eccde1a865e9125507e304b1600c6c28f15f89f9b66898a5a474f08a8ad05781d46db532fc7aedf92de3bb73b9ed1f2ba24b6cb  mesa-20.0.0-rc3.tar.xz
PGP:  https://mesa.freedesktop.org/archive/mesa-20.0.0-rc3.tar.xz.sig





[Mesa-dev] Profile-guided optimizations

2020-02-13 Thread Marek Olšák
[Forked from the other thread]

Guys, we could run some simple tests similar to piglit/drawoverhead as the
last step of the pgo=generate build. Tests like that should exercise the
most common codepaths in drivers. We could add subtests that we care about
the most.

Marek

On Thu., Feb. 13, 2020, 13:16 Dylan Baker wrote:

> meson has built-ins for both of these: -Db_lto=true turns on LTO; for PGO
> you would run:
>
> meson build -Db_pgo=generate
> ninja -C build
> 
> meson configure build -Db_pgo=use
> ninja -C build
>
> Quoting Marek Olšák (2020-02-12 10:46:12)
> > How do you enable LTO+PGO? Is it something we could enable by default for
> > release builds?
> >
> > Marek
> >
> > On Wed, Feb 12, 2020 at 1:56 AM Dieter Nützel wrote:
> >
> > Hello Gert,
> >
> > your merge 'broke' LTO and then later on PGO compilation/linking.
> >
> > I generally compile with '-Dgallium-drivers=r600,radeonsi,swrast'
> > for testing radeonsi and (your) r600 work. ;-)
> >
> > After your merge I get several warnings in 'addrlib' with LTO and
> > even a compiler error (gcc (SUSE Linux) 9.2.1 20200128).
> >
> > I had to disable 'r600' ('swrast' is needed for 'nine') to get a
> > working LTO and even better PGO radeonsi driver.
> > I have been preparing GREAT LTO+PGO (the latter is the greater) numbers
> > over the last 2 months. I'll send my results later today.
> >
> > Summary
> > radeonsi is ~40% smaller and 16-20% faster with PGO (!!!).
> >
> > Honza and the GCC people (Intel's ICC folks) do GREAT things.
> > 'glmark2' numbers are better than 'vkmark'. (Hello, Marek.).
> >
> > Need some sleep.
> >
> > See my log, below.
> >
> > Greetings and GREAT work!
> >
> > -Dieter
> >
> > On 09.02.2020 15:46, Gert Wollny wrote:
> > > On Thursday, 23.01.2020, 20:31 +0100, Gert Wollny wrote:
> > >> has anybody any objections if I merge the r600/NIR code?
> > >> Without explicitly setting the debug flag it doesn't change a
> > >> thing, but it would be better to continue developing in-tree.
> > > Okay, if nobody objects, I'll merge it Monday evening.
> > >
> > > Best,
> > > Gert
> >
> > [1425/1433] Linking target src/gallium/targets/dri/libgallium_dri.so.
> > FAILED: src/gallium/targets/dri/libgallium_dri.so
> > c++  -o src/gallium/targets/dri/libgallium_dri.so
> > 'src/gallium/targets/dri/8381c20@@gallium_dri@sha/target.c.o' -flto
> > -fprofile-generate -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared
> > -fPIC -Wl,--start-group -Wl,-soname,libgallium_dri.so
> > src/mesa/libmesa_gallium.a src/mesa/libmesa_common.a
> > src/compiler/glsl/libglsl.a src/compiler/glsl/glcpp/libglcpp.a
> > src/util/libmesa_util.a src/util/format/libmesa_format.a
> > src/compiler/nir/libnir.a src/compiler/libcompiler.a
> > src/mesa/libmesa_sse41.a src/mesa/drivers/dri/common/libdricommon.a
> > src/mesa/drivers/dri/common/libmegadriver_stub.a
> > src/gallium/state_trackers/dri/libdri.a
> > src/gallium/auxiliary/libgalliumvl.a
> src/gallium/auxiliary/libgallium.a
> > src/mapi/shared-glapi/libglapi.so.0.0.0
> > src/gallium/auxiliary/pipe-loader/libpipe_loader_static.a
> > src/loader/libloader.a src/util/libxmlconfig.a
> > src/gallium/winsys/sw/null/libws_null.a
> > src/gallium/winsys/sw/wrapper/libwsw.a
> > src/gallium/winsys/sw/dri/libswdri.a
> > src/gallium/winsys/sw/kms-dri/libswkmsdri.a
> > src/gallium/drivers/llvmpipe/libllvmpipe.a
> > src/gallium/drivers/softpipe/libsoftpipe.a
> > src/gallium/drivers/r600/libr600.a
> > src/gallium/winsys/radeon/drm/libradeonwinsys.a
> > src/gallium/drivers/radeonsi/libradeonsi.a
> > src/gallium/winsys/amdgpu/drm/libamdgpuwinsys.a
> > src/amd/addrlib/libaddrlib.a src/amd/common/libamd_common.a
> > src/amd/llvm/libamd_common_llvm.a -Wl,--build-id=sha1
> -Wl,--gc-sections
> > -Wl,--version-script /opt/mesa/src/gallium/targets/dri/dri.sym
> > -Wl,--dynamic-list /opt/mesa/src/gallium/targets/dri/../dri-vdpau.dyn
> > /usr/lib64/libdrm.so -L/usr/local/lib -lLLVM-10git -pthread
> > /usr/lib64/libexpat.so
> > /usr/lib64/gcc/x86_64-suse-linux/9/../../../../lib64/libz.so -lm
> > /usr/lib64/gcc/x86_64-suse-linux/9/../../../../lib64/libzstd.so
> > -L/usr/local/lib -lLLVM-10git /usr/lib64/libunwind.so -ldl -lsensors
> > -L/usr/local/lib -lLLVM-10git /usr/lib64/libdrm_radeon.so
> > /usr/lib64/libelf.so -L/usr/local/lib -lLLVM-10git -L/usr/local/lib
> > -lLLVM-10git -L/usr/local/lib -lLLVM-10git
> /usr/lib64/libdrm_amdgpu.so
> > -L/usr/local/lib -lLLVM-10git -Wl,--end-group
> >
>  '-Wl,-rpath,$ORIGIN/../../../mesa:$ORIGIN/../../../compiler/glsl:$ORIGIN/..
> >
>  /../../compiler/glsl/glcpp:$ORIGIN/../../../util:$ORIGIN/../../../util/
> >
>  format:$ORIGIN/../../../compiler/nir:$ORIGIN/../../../compiler:$ORIGIN/..
> >
>  

Re: [Mesa-dev] Merging experimental r600/nir code

2020-02-13 Thread Dylan Baker
meson has built-ins for both of these: -Db_lto=true turns on LTO; for PGO you
would run:

meson build -Db_pgo=generate
ninja -C build

meson configure build -Db_pgo=use
ninja -C build

Quoting Marek Olšák (2020-02-12 10:46:12)
> How do you enable LTO+PGO? Is it something we could enable by default for
> release builds?
>
> Marek
>
> On Wed, Feb 12, 2020 at 1:56 AM Dieter Nützel wrote:
>
> Hello Gert,
>
> your merge 'broke' LTO and then later on PGO compilation/linking.
>
> I generally compile with '-Dgallium-drivers=r600,radeonsi,swrast'
> for testing radeonsi and (your) r600 work. ;-)
>
> After your merge I get several warnings in 'addrlib' with LTO and even a
> compiler error (gcc (SUSE Linux) 9.2.1 20200128).
>
> I had to disable 'r600' ('swrast' is needed for 'nine') to get a working
> LTO and even better PGO radeonsi driver.
> I have been preparing GREAT LTO+PGO (the latter is the greater) numbers
> over the last 2 months. I'll send my results later today.
>
> Summary
> radeonsi is ~40% smaller and 16-20% faster with PGO (!!!).
>
> Honza and the GCC people (Intel's ICC folks) do GREAT things.
> 'glmark2' numbers are better than 'vkmark'. (Hello, Marek.).
>
> Need some sleep.
>
> See my log, below.
>
> Greetings and GREAT work!
>
> -Dieter
>
> On 09.02.2020 15:46, Gert Wollny wrote:
> > On Thursday, 23.01.2020, 20:31 +0100, Gert Wollny wrote:
> >> has anybody any objections if I merge the r600/NIR code?
> >> Without explicitly setting the debug flag it doesn't change a
> >> thing, but it would be better to continue developing in-tree.
> > Okay, if nobody objects, I'll merge it Monday evening.
> >
> > Best,
> > Gert
> 
> [1425/1433] Linking target src/gallium/targets/dri/libgallium_dri.so.
> FAILED: src/gallium/targets/dri/libgallium_dri.so
> c++  -o src/gallium/targets/dri/libgallium_dri.so
> 'src/gallium/targets/dri/8381c20@@gallium_dri@sha/target.c.o' -flto
> -fprofile-generate -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared
> -fPIC -Wl,--start-group -Wl,-soname,libgallium_dri.so
> src/mesa/libmesa_gallium.a src/mesa/libmesa_common.a
> src/compiler/glsl/libglsl.a src/compiler/glsl/glcpp/libglcpp.a
> src/util/libmesa_util.a src/util/format/libmesa_format.a
> src/compiler/nir/libnir.a src/compiler/libcompiler.a
> src/mesa/libmesa_sse41.a src/mesa/drivers/dri/common/libdricommon.a
> src/mesa/drivers/dri/common/libmegadriver_stub.a
> src/gallium/state_trackers/dri/libdri.a
> src/gallium/auxiliary/libgalliumvl.a src/gallium/auxiliary/libgallium.a
> src/mapi/shared-glapi/libglapi.so.0.0.0
> src/gallium/auxiliary/pipe-loader/libpipe_loader_static.a
> src/loader/libloader.a src/util/libxmlconfig.a
> src/gallium/winsys/sw/null/libws_null.a
> src/gallium/winsys/sw/wrapper/libwsw.a
> src/gallium/winsys/sw/dri/libswdri.a
> src/gallium/winsys/sw/kms-dri/libswkmsdri.a
> src/gallium/drivers/llvmpipe/libllvmpipe.a
> src/gallium/drivers/softpipe/libsoftpipe.a
> src/gallium/drivers/r600/libr600.a
> src/gallium/winsys/radeon/drm/libradeonwinsys.a
> src/gallium/drivers/radeonsi/libradeonsi.a
> src/gallium/winsys/amdgpu/drm/libamdgpuwinsys.a
> src/amd/addrlib/libaddrlib.a src/amd/common/libamd_common.a
> src/amd/llvm/libamd_common_llvm.a -Wl,--build-id=sha1 -Wl,--gc-sections
> -Wl,--version-script /opt/mesa/src/gallium/targets/dri/dri.sym
> -Wl,--dynamic-list /opt/mesa/src/gallium/targets/dri/../dri-vdpau.dyn
> /usr/lib64/libdrm.so -L/usr/local/lib -lLLVM-10git -pthread
> /usr/lib64/libexpat.so
> /usr/lib64/gcc/x86_64-suse-linux/9/../../../../lib64/libz.so -lm
> /usr/lib64/gcc/x86_64-suse-linux/9/../../../../lib64/libzstd.so
> -L/usr/local/lib -lLLVM-10git /usr/lib64/libunwind.so -ldl -lsensors
> -L/usr/local/lib -lLLVM-10git /usr/lib64/libdrm_radeon.so
> /usr/lib64/libelf.so -L/usr/local/lib -lLLVM-10git -L/usr/local/lib
> -lLLVM-10git -L/usr/local/lib -lLLVM-10git /usr/lib64/libdrm_amdgpu.so
> -L/usr/local/lib -lLLVM-10git -Wl,--end-group
> 
> '-Wl,-rpath,$ORIGIN/../../../mesa:$ORIGIN/../../../compiler/glsl:$ORIGIN/..
> /../../compiler/glsl/glcpp:$ORIGIN/../../../util:$ORIGIN/../../../util/
> format:$ORIGIN/../../../compiler/nir:$ORIGIN/../../../compiler:$ORIGIN/..
> /../../mesa/drivers/dri/common:$ORIGIN/../../state_trackers/dri:$ORIGIN/..
> /../auxiliary:$ORIGIN/../../../mapi/shared-glapi:$ORIGIN/../../auxiliary/
> 
> pipe-loader:$ORIGIN/../../../loader:$ORIGIN/../../winsys/sw/null:$ORIGIN/..
> /../winsys/sw/wrapper:$ORIGIN/../../winsys/sw/dri:$ORIGIN/../../winsys/sw/
> kms-dri:$ORIGIN/../../drivers/llvmpipe:$ORIGIN/../../drivers/
> 
> softpipe:$ORIGIN/../../drivers/r600:$ORIGIN/../../winsys/radeon/drm:$ORIGIN
> /../../drivers/radeonsi:$ORIGIN/../../winsys/amdgpu/drm:$ORIGIN/../../../
>

Re: [Mesa-dev] Merging experimental r600/nir code

2020-02-13 Thread Michel Dänzer
On 2020-02-12 7:46 p.m., Marek Olšák wrote:
> How do you enable LTO+PGO? Is it something we could enable by default for
> release builds?

Enabling LTO for Mesa, I get a lot of warnings about issues affecting it
specifically, making me doubt that it's currently safe in general, in
particular for the radeonsi/RADV drivers (due to issues in addrlib). It
shouldn't be enabled by default before those issues are addressed (and
ideally CI coverage in place to prevent them from creeping back in).


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] Merging experimental r600/nir code

2020-02-13 Thread Eero Tamminen

Hi,

On 13.2.2020 10.38, Timur Kristóf wrote:

I think the question about PGO is this: are the profiles of the users'
applications gonna be the same as the profile that is collected from
the benchmarks?

E.g. if the test benchmark uses different draw calls or triggers
different shader compiler code paths than your favourite game, in
theory PGO could harm the performance of your game.

Also, how do we prevent it from making bad decisions based on the hw
that the profile was made on?

For example, if you collect the profiling data from a machine that has
a Polaris 10 GPU, then the profile will show that chip_class is
extremely likely to be GFX8 and thus the PGO build will be optimized
for that case. If I then run the same build on my Navi 10, the PGO
build might actually be slower, because the driver needs to take a
different code path than what the PGO build was optimized for.

What do you guys think about this?


How much HW-specific code can affect things depends on whether it is
executed constantly or only once. If the former, it may be useful to
(try to) design the driver so that such code gets executed only once.


The most CPU-intensive part is shader compilation (with Intel, the linking
stage more than what is done before it), and the heavy part is AFAIK to a
large extent HW independent. In benchmarks, shader compilation is almost
always done at startup; in games it typically also happens afterwards.


As to how much PGO can make things worse, I think that depends on how 
independent the non-executed part of the code is.  If it's not mixed 
with code that did get executed, I don't think there will be any visible 
impact.  But if it's badly mixed, hot/cold function identification will 
group things wrong.
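
To make the hot/cold concern concrete, here is a minimal sketch with invented sample counts and illustrative function names (not real Mesa symbols, and not how a compiler actually classifies code). It asks how much of a real workload would run in functions that the training run saw as cold:

```python
def hot_set(samples, threshold=0.01):
    """Functions that account for at least `threshold` of all samples in a profile."""
    total = sum(samples.values())
    return {name for name, n in samples.items() if n / total >= threshold}

def misclassified_share(training, real, threshold=0.01):
    """Share of real-workload samples spent in functions the training run saw as cold."""
    hot = hot_set(training, threshold)
    total = sum(real.values())
    return sum(n for name, n in real.items() if name not in hot) / total

# Hypothetical profiles: trained on a GFX8 machine, then used on a GFX10 one.
training = {"draw_gfx8": 900, "compile_shader": 100, "draw_gfx10": 0}
real     = {"draw_gfx8": 0,   "compile_shader": 200, "draw_gfx10": 800}

print(misclassified_share(training, real))  # 0.8
```

In this made-up case the HW-independent part (shader compilation) is still classified correctly, while the HW-specific draw path is not, which matches the distinction drawn above.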



- Eero


Best regards,
Timur

On Thu, 2020-02-13 at 02:40 -0500, Marek Olšák wrote:

Can we automate this?

Let's say we implement noop ioctls for radeonsi and iris, and then we
run the drivers to collect pgo data on any hw.

Can meson execute this build sequence:
build with pgo=generate
run tests
clean
build with pgo=use

automated as buildtype=release-pgo.
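
A minimal sketch of such a sequence as a small driver script, assuming the meson options mentioned in this thread (-Db_pgo, -Db_lto). The training command "./run-training.sh" is a placeholder, and the sketch reconfigures and rebuilds in place instead of running an explicit clean step:

```python
import shlex

def pgo_plan(build_dir, training_cmd, lto=True):
    """Return the command sequence for a generate / train / use PGO cycle."""
    setup = ["meson", "setup", build_dir, "-Db_pgo=generate"]
    if lto:
        setup.append("-Db_lto=true")
    return [
        setup,                                             # instrumented configuration
        ["ninja", "-C", build_dir],                        # build with instrumentation
        list(training_cmd),                                # run the training workload
        ["meson", "configure", build_dir, "-Db_pgo=use"],  # switch to using the profile
        ["ninja", "-C", build_dir],                        # rebuild with profile data
    ]

# "./run-training.sh" stands in for whatever tests are chosen as training data.
for cmd in pgo_plan("build", ["./run-training.sh"]):
    print(shlex.join(cmd))
```

To actually execute the plan, each entry could be passed to subprocess.run(cmd, check=True); the sketch only prints the commands so it can be inspected first.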
a bit
Marek

On Wed., Feb. 12, 2020, 23:37 Dieter Nützel wrote:

Hello Marek,

I hoped you would ask this...
...but first sorry for the delay of my announced numbers.
Our family is/was sick, my wife more than me, and our children are fine
again.
So please be somewhat lenient with me.

Am 12.02.2020 19:46, schrieb Marek Olšák:

How do you enable LTO+PGO? Is it something we could enable by default
for release builds?

Marek


I think we can achieve this.

I'm running with LTO+PGO 'release' since late December (around Christmas).
My KDE Plasma5 (OpenGL 3.0) system/desktop was never agiler/fluider
since then.
Even the numbers (glmark2) show it. The 'glmark2' numbers are the best
I've ever seen on this system.
LTO offers only some small space reduction and hardly any speedup.
But LTO+PGO is GREAT.

First I compile with '-Db_lto=true -Db_pgo=generate'.

mkdir build
cd build
meson ../ --strip --buildtype release -Ddri-drivers=
-Dplatforms=drm,x11
-Dgallium-drivers=r600,radeonsi,swrast -Dvulkan-drivers=amd
-Dgallium-nine=true -Dgallium-opencl=standalone -Dglvnd=true
-Dgallium-va=true -Dgallium-xvmc=false -Dgallium-omx=disabled
-Dgallium-xa=false -Db_lto=true -Db_pgo=generate

After that my 'build' dir looks like this:

drwxr-xr-x  8 dieter users    4096 13. Feb 04:34 .
drwxr-xr-x 14 dieter users    4096 13. Feb 04:33 ..
drwxr-xr-x  2 dieter users    4096 13. Feb 04:34 bin
-rw-r--r--  1 dieter users 4369873 13. Feb 04:34 build.ninja
-rw-r--r--  1 dieter users 4236719 13. Feb 04:34 compile_commands.json
drwxr-xr-x  2 dieter users    4096 13. Feb 04:34 include
drwxr-xr-x  2 dieter users    4096 13. Feb 04:34 meson-info
drwxr-xr-x  2 dieter users    4096 13. Feb 04:33 meson-logs
drwxr-xr-x  2 dieter users    4096 13. Feb 04:34 meson-private
drwxr-xr-x 14 dieter users    4096 13. Feb 04:34 src

time nice +19 ninja

Lasts ~15 minutes on my aging/'slow' Intel Xeon X3470 Nehalem,
4c/8t,
2.93 GHz, 24 GB, Polaris 20.
Without LTO+PGO it is ~4-5 minutes. (AMD anyone?)

Then I remove all files/dirs except 'src'.
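
The cleanup step could be scripted roughly like this. It is only a sketch:
keep_only_src is a hypothetical helper name, and it uses the -mindepth and
-maxdepth options of find, which GNU and BSD find provide but POSIX does
not require.

```shell
# keep_only_src DIR: delete every top-level entry of DIR except 'src',
# so that the object files and profile data below src/ survive.
keep_only_src() {
    dir=$1
    # Refuse to touch anything that does not look like a build directory.
    [ -d "$dir/src" ] || { echo "no src/ in $dir" >&2; return 1; }
    find "$dir" -mindepth 1 -maxdepth 1 ! -name src -exec rm -rf {} +
}

# Example (from the source tree): keep_only_src build
```

The guard on src/ is there because the function runs rm -rf; calling it on
the wrong directory by accident then fails instead of deleting things.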

Next 'installing' the new built files under '/usr/local/' (mostly
symlinked to /usr/lib64/).

Now I run as many OpenGL/Vulkan progs as I can.
Normally starting with glmark2 and vkmark.

Here comes my (whole) list:
Knights
Wireshark
K3b
Skanlite
Kdenlive
GIMP
Krita
FreeCAD
Blender 2.81x
digikam
K4DirStat
Discover
YaST
Do some 'movements'/work in/with every prog.
+
some LibreOffice work (OpenGL enabled)
one or two OpenGL games
and Vulkan games
+
run some WebGL stuff in my browsers (Konqi/FF).

After that I have the needed '*.gcda' files in 'src'.
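
Before starting the second build it can be worth checking that profile data
was really written. A small sketch, with count_gcda being a hypothetical
helper name:

```shell
# count_gcda DIR: print the number of '*.gcda' profile files below DIR.
count_gcda() {
    find "$1" -type f -name '*.gcda' | wc -l | tr -d ' '
}

# Example:
#   n=$(count_gcda src)
#   [ "$n" -gt 0 ] || echo "no profile data found in src" >&2
```

A count of 0 would mean the instrumented libraries were never loaded by the
test programs, in which case a '-Db_pgo=use' build has nothing to work with.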

Now second rebuild in 'src'.
Due to the deleted files/dirs I can do a second 'meson' config run
in my
current 'build' dir.

meson ../ --strip --buildtype release -Ddri-drivers=
-Dplatforms=drm,x11
-Dgallium-drivers=r600,radeonsi,swrast -Dvulkan-drivers=amd
-Dgallium-nine=true 

Re: [Mesa-dev] Merging experimental r600/nir code

2020-02-13 Thread Timur Kristóf
I think the question about PGO is this: are the profiles of the users'
applications gonna be the same as the profile that is collected from
the benchmarks?

E.g. if the test benchmark uses different draw calls or triggers
different shader compiler code paths than your favourite game, in
theory PGO could harm the performance of your game.

Also, how do we prevent it from making bad decisions based on the hw
that the profile was made on?

For example, if you collect the profiling data from a machine that has
a Polaris 10 GPU, then the profile will show that chip_class is
extremely likely to be GFX8 and thus the PGO build will be optimized
for that case. If I then run the same build on my Navi 10, the PGO
build might actually be slower, because the driver needs to take a
different code path than what the PGO build was optimized for.

What do you guys think about this?

Best regards,
Timur

On Thu, 2020-02-13 at 02:40 -0500, Marek Olšák wrote:
> Can we automate this?
> 
> Let's say we implement noop ioctls for radeonsi and iris, and then we
> run the drivers to collect pgo data on any hw.
> 
> Can meson execute this build sequence:
> build with pgo=generate
> run tests
> clean
> build with pgo=use
> 
> automated as buildtype=release-pgo.
> 
> Marek
> 
> On Wed., Feb. 12, 2020, 23:37 Dieter Nützel, 
> wrote:
> > Hello Marek,
> > 
> > I hoped you would ask this...
> > ...but first sorry for the delay of my announced numbers.
> > Our family is/was sick, my wife more than me and our children are
> > fine, 
> > again.
> > So be lenient with me somewhat.
> > 
> > Am 12.02.2020 19:46, schrieb Marek Olšák:
> > > How do you enable LTO+PGO? Is it something we could enable by
> > default
> > > for release builds?
> > > 
> > > Marek
> > 
> > I think we can achieve this.
> > 
> > I'm running with LTO+PGO 'release' since late December (around 
> > Christmas).
> > My KDE Plasma5 (OpenGL 3.0) system/desktop was never
> > agiler/fluider 
> > since then.
> > Even the numbers (glmark2) show it. The 'glmark2' numbers are the
> > best 
> > I've ever seen on this system.
> > LTO offers only some small space reduction and hardly any speedup.
> > But LTO+PGO is GREAT.
> > 
> > First I compile with '-Db_lto=true -Db_pgo=generate'.
> > 
> > mkdir build
> > cd build
> > meson ../ --strip --buildtype release -Ddri-drivers=
> > -Dplatforms=drm,x11 
> > -Dgallium-drivers=r600,radeonsi,swrast -Dvulkan-drivers=amd 
> > -Dgallium-nine=true -Dgallium-opencl=standalone -Dglvnd=true 
> > -Dgallium-va=true -Dgallium-xvmc=false -Dgallium-omx=disabled 
> > -Dgallium-xa=false -Db_lto=true -Db_pgo=generate
> > 
> > After that my 'build' dir looks like this:
> > 
> > drwxr-xr-x  8 dieter users    4096 13. Feb 04:34 .
> > drwxr-xr-x 14 dieter users    4096 13. Feb 04:33 ..
> > drwxr-xr-x  2 dieter users    4096 13. Feb 04:34 bin
> > -rw-r--r--  1 dieter users 4369873 13. Feb 04:34 build.ninja
> > -rw-r--r--  1 dieter users 4236719 13. Feb 04:34 compile_commands.json
> > drwxr-xr-x  2 dieter users    4096 13. Feb 04:34 include
> > drwxr-xr-x  2 dieter users    4096 13. Feb 04:34 meson-info
> > drwxr-xr-x  2 dieter users    4096 13. Feb 04:33 meson-logs
> > drwxr-xr-x  2 dieter users    4096 13. Feb 04:34 meson-private
> > drwxr-xr-x 14 dieter users    4096 13. Feb 04:34 src
> > 
> > time nice +19 ninja
> > 
> > Lasts ~15 minutes on my aging/'slow' Intel Xeon X3470 Nehalem,
> > 4c/8t, 
> > 2.93 GHz, 24 GB, Polaris 20.
> > Without LTO+PGO it is ~4-5 minutes. (AMD anyone?)
> > 
> > Then I remove all files/dirs except 'src'.
> > 
> > Next 'installing' the new built files under '/usr/local/' (mostly 
> > symlinked to /usr/lib64/).
> > 
> > Now I run as many OpenGL/Vulkan progs as I can.
> > Normally starting with glmark2 and vkmark.
> > 
> > Here comes my (whole) list:
> > Knights
> > Wireshark
> > K3b
> > Skanlite
> > Kdenlive
> > GIMP
> > Krita
> > FreeCAD
> > Blender 2.81x
> > digikam
> > K4DirStat
> > Discover
> > YaST
> > Do some 'movements'/work in/with every prog.
> > +
> > some LibreOffice work (OpenGL enabled)
> > one or two OpenGL games
> > and Vulkan games
> > +
> > run some WebGL stuff in my browsers (Konqi/FF).
> > 
> > After that I have the needed '*.gcda' files in 'src'.
> > 
> > Now second rebuild in 'src'.
> > Due to the deleted files/dirs I can do a second 'meson' config run
> > in my 
> > current 'build' dir.
> > 
> > meson ../ --strip --buildtype release -Ddri-drivers=
> > -Dplatforms=drm,x11 
> > -Dgallium-drivers=r600,radeonsi,swrast -Dvulkan-drivers=amd 
> > -Dgallium-nine=true -Dgallium-opencl=standalone -Dglvnd=true 
> > -Dgallium-va=true -Dgallium-xvmc=false -Dgallium-omx=disabled 
> > -Dgallium-xa=false -Db_lto=true -Db_pgo=use
> > 
> > After around 5-6 minutes (!!!) I can install the LTO+PGO 'release'
> > build 
> > driver files and enjoy next level of OpenGL speed.
> > Vulkan does NOT show such GREAT improvements.
> > 
> > Only the '-Db_lto=true -Db_pgo=generate' build needs ~3 times the
> > compilation and (mostly) linking time.
> > 
> > Below