Re: Lavapipe license

2024-02-15 Thread Marek Olšák
You should only see the MIT license for all Mesa contributions that
don't import external source code.

Marek

On Tue, Feb 13, 2024 at 5:23 AM George Karpathios  wrote:
>
> Hi everyone,
>
> I'd like to bundle Lavapipe's binary that I've built (also contains LLVM 
> thanks to static linking) with a commercial application and I'm confused 
> regarding which licenses I should include into the product. Reading in 
> https://docs.mesa3d.org/license.html, "Different copyrights and licenses 
> apply to different components" and "In general, consult the source files for 
> license terms." makes me think that I should search through every component that 
> Lavapipe uses (how can I figure these out precisely?), is that correct? For 
> example, do I need the licenses for LLVM, Main Mesa code, Gallium, llvmpipe 
> and more? Additionally, looking inside Lavapipe's source files under 
> src/gallium/frontends/lavapipe, I see various license texts from RedHat, 
> Intel, AMD, Valve, VMware etc.
>
> I feel a bit overwhelmed as to what's the proper thing to do, so if anyone 
> could help me learn how to figure situations like this out, I would be really 
> grateful. Thanks in advance.
>
> Best regards,
> George


Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-24 Thread Marek Olšák
Gallium looks like it was just a copy of DX10, and likely many things were
known from DX10 in advance before anything started. Vulkanium doesn't have
anything to draw inspiration from. It's a completely unexplored idea.

AMD's PAL is the same idea as Gallium. It's used to implement Vulkan, DX,
Mantle, Metal, etc.

Marek

On Wed, Jan 24, 2024, 13:40 Faith Ekstrand  wrote:

> On Wed, Jan 24, 2024 at 12:26 PM Zack Rusin 
> wrote:
> >
> > On Wed, Jan 24, 2024 at 10:27 AM Faith Ekstrand 
> wrote:
> > >
> > > Jose,
> > >
> > > Thanks for your thoughts!
> > >
> > > On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca <
> jose.fons...@broadcom.com> wrote:
> > > >
> > > > I don't know much about the current Vulkan driver internals to have
> or provide an informed opinion on the path forward, but I'd like to share
> my backwards looking perspective.
> > > >
> > > > Looking back, Gallium was two things effectively:
> > > > (1) an abstraction layer, that's watertight (as in upper layers
> shouldn't reach through to lower layers)
> > > > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
> > > >
> > > > (1) was of course important -- and the discipline it imposed is what
> enabled great simplifications -- but it also became a straitjacket,
> as GPUs didn't stand still, and sooner or later the
> see-every-hardware-as-the-same lenses stop reflecting reality.
> > > >
> > > > If I had to pick one, I'd say that (2) is far more useful and
> practical. Take components like gallium's draw and other util modules. A
> driver can choose to use them or not.  One could fork them within Mesa
> source tree, and only the drivers that opt-in into the fork would need to
> be tested/adapted/etc
> > > >
> > > > On the flip side, Vulkan API is already a pretty low level HW
> abstraction.  It's also very flexible and extensible, so it's hard to
> provide a watertight abstraction underneath it without either taking the
> lowest common denominator, or having lots of optional bits of functionality
> governed by a myriad of caps like you alluded to.
> > >
> > > There is a third thing that isn't really recognized in your
> description:
> > >
> > > (3) A common "language" to talk about GPUs and data structures that
> > > represent that language
> > >
> > > This is precisely what the Vulkan runtime today doesn't have. Classic
> > > meta sucked because we were trying to implement GL in GL. u_blitter,
> > > on the other hand, is pretty fantastic because Gallium provides a much
> > > more sane interface to write those common components in terms of.
> > >
> > > So far, we've been trying to build those components in terms of the
> > > Vulkan API itself with calls jumping back into the dispatch table to
> > > try and get inside the driver. This is working but it's getting more
> > > and more fragile the more tools we add to that box. A lot of what I
> > > want to do with gallium2 or whatever we're calling it is to fix our
> > > layering problems so that calls go in one direction and we can
> > > untangle the jumble. I'm still not sure what I want that to look like
> > > but I think I want it to look a lot like Vulkan, just with a handier
> > > interface.
> >
> > Yes, that makes sense. When we were writing the initial components for
> > gallium (draw and cso) I really liked the general concept and thought
> > about trying to reuse them in the old, non-gallium Mesa drivers but
> > the obstacle was that there was no common interface to lay them on.
> > Using GL to implement GL was silly and using Vulkan to implement
> > Vulkan is not much better.
> >
> > Having said that my general thoughts on GPU abstractions largely match
> > what Jose has said. To me it's a question of whether a clean
> > abstraction:
> > - on top of which you can build an entire GPU driver toolkit (i.e. all
> > the components and helpers)
> > - that makes it trivial to figure out what needs to be done to write a
> > new driver and makes bootstrapping a new driver a lot simpler
> > - that makes it easier to reason about cross hardware concepts (it's a
> > lot easier to understand the entirety of the ecosystem if every driver
> > is not doing something unique to implement similar functionality)
> > is worth more than almost exponentially increasing the difficulty of:
> > - advancing the ecosystem (i.e. it might be easier to understand but
> > it's way harder to create clean abstractions across such different
> > hardware).
> > - driver maintenance (i.e. there will be a constant stream of
> > regressions hitting your driver as a result of other people working on
> > their drivers)
> > - general development (i.e. bug fixes/new features being held back
> > because they break some other driver)
> >
> > Some of those can certainly be tilted one way or the other, e.g. the
> > driver maintenance can be somewhat eased by requiring that every
> > driver working on top of the new abstraction has to have a stable
> > Mesa-CI setup (be it lava or ci-tron, or whatever) but all of those

Re: MesaGL development on Fedora

2023-12-25 Thread Marek Olšák
Hi,

All Mesa libraries must come from the same build; otherwise they will be
rejected at runtime.

Marek

On Mon, Dec 25, 2023 at 7:15 AM Mischa Baars  wrote:
>
> Hello,
>
> I was going over MesaGL Blending when I discovered an issue that needs 
> fixing. The problem is that when I compile and install a custom version of 
> the Fedora mesa-libGL package, i.e. 
> https://archive.mesa3d.org/mesa-23.1.9.tar.xz, my system 
> (gnome-terminal, firefox, chromium) becomes unstable even without the patch 
> applied. At first I thought it was the patch, but it isn't.
>
> Help would be appreciated.
>
> Best regards,
> Mischa Baars.
>


Better lost context robustness for window systems

2023-08-06 Thread Marek Olšák
It is possible to have robust contexts for all apps within a window
system even when the apps themselves are not robust.

If an app is robust, losing a context skips all following API calls.
The app must recreate its context and resources and continue.
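
For reference, here is roughly what a robust app has to do itself today -- a
minimal sketch using EGL_EXT_create_context_robustness and GL_KHR_robustness;
the EGL display/config setup and the eglGetProcAddress lookup are assumed to
happen elsewhere:

#include <stdbool.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

/* Create a robust GLES2 context that loses the context on reset
 * (assumes dpy/config came from the usual EGL setup). */
EGLContext create_robust_context(EGLDisplay dpy, EGLConfig config)
{
   static const EGLint attribs[] = {
      EGL_CONTEXT_CLIENT_VERSION, 2,
      EGL_CONTEXT_OPENGL_ROBUST_ACCESS_EXT, EGL_TRUE,
      EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_EXT,
      EGL_LOSE_CONTEXT_ON_RESET_EXT,
      EGL_NONE
   };
   return eglCreateContext(dpy, config, EGL_NO_CONTEXT, attribs);
}

/* Polled once per frame; a non-NO_ERROR status means the app has to
 * recreate its context and resources and continue. */
bool context_was_lost(PFNGLGETGRAPHICSRESETSTATUSKHRPROC get_reset_status)
{
   return get_reset_status() != GL_NO_ERROR;
}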

When an app is not robust, losing a context results in undefined
behavior, including but not limited to process termination.

The last case can be improved as follows. The window system can tell
its apps (via EGL/Wayland, etc.) to create all contexts as robust if an
app doesn't request that itself. In this case, the reset status isn't
returned to the app, but instead to the window system.

If the window system receives it, it knows that:
- the app is not robust
- the app lost its context

The window system can do this:
- gray out the window
- inform the user about the situation
- give the user a list of actions to choose from (wait or terminate
the process, report the issue to the distro vendor, etc.)

Marek


Re: X Crashes after Installing Mesa Driver

2023-07-13 Thread Marek Olšák
Please follow https://github.com/marekolsak/marek-build/

Marek

On Wed, Jul 12, 2023 at 11:55 AM Chen, Jinyi (Joey)  wrote:
>
> [AMD Official Use Only - General]
>
>
> Hi,
>
>
> I have built and installed the Mesa driver. I installed it to the system so 
> Xorg boots up with it, but I am encountering black screens and flashes.
>
> I first did a local install to a directory. It works with applications like 
> glxinfo and glxgears when I export LD_LIBRARY_PATH to point at the custom Mesa 
> driver I built, but LD_LIBRARY_PATH seemed to be set only after Xorg loads 
> (even when I put it in /etc/environment). So I did a system install instead. 
> However, when I try to start the X server with the debug Mesa driver, I 
> encounter a flickering black screen. I have experimented with various versions 
> of Mesa, including the main branch, 23.0, and 23.1. It works without installing 
> amdgpu but does not work after installing amdgpu. My OS is 
> Ubuntu 22.04.
>
> To build Mesa, I followed these steps after downloading the source code from 
> https://gitlab.freedesktop.org/mesa/mesa/-/tree/main/
>
> meson setup builddbg/
> meson configure builddbg/ -Dbuildtype=debug
> meson install -C builddbg/
>
>
>
> I have also tried without specifying build type.
>
>
>
> Can you share the steps you use to build Mesa and how you attach it to Xorg? 
> Thanks in advance for your help!
>
>
>
>
>
> Best,
>
> Joey


Re: Benefits of cryptographic hash functions for uniquely identifying Vulkan shaders.

2023-06-29 Thread Marek Olšák
If there is a hash collision, it will cause a GPU hang. A cryptographic
hash function reduces that chance to practically zero.
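
As a back-of-the-envelope check (the shader count below is illustrative, not a
measurement), the birthday bound gives the collision probability for n cached
shaders and a b-bit hash:

P_{\text{collision}} \approx \frac{n^2}{2^{\,b+1}}
% For example, n = 10^6 shaders with a 160-bit hash:
%   P \approx 10^{12} / 2^{161} \approx 3 \times 10^{-37}
% while a 32-bit checksum with the same n pushes the estimate past 1,
% i.e. a collision is essentially certain.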

Marek

On Thu, Jun 29, 2023, 07:04 mikolajlubiak1337 
wrote:

> Hi,
> I have recently read Phoronix article[1] about you switching to BLAKE3
> instead of SHA1.
> If BLAKE3 is a cryptographic hash function, wouldn't it be faster to use a
> non-cryptographic hash function or even a checksum function? Do you need
> the benefits of cryptographic hash functions over other hash/checksum
> functions for the purpose of uniquely identifying Vulkan shaders?
>
> [1]: https://www.phoronix.com/news/Mesa-BLAKE3-Shader-Hashing
>
> -- me
>
>


Re: Need support to add NV12 format in mesa code

2023-06-17 Thread Marek Olšák
I don't think u_format.csv can describe it. It only describes formats of
pixels in a single plane. All formats that have multiple planes where a
single pixel stores only a subset of its bits in each plane are handled as
special cases. Drivers also don't support such formats directly. For
example, drivers usually handle NV12 as 2 separate textures R8 and R8G8.
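
As a rough illustration of that two-plane view -- the struct and helper below
are hypothetical, not Mesa code; only the R8/R8G8 per-plane formats and bit
sizes come from the description above:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-plane description of NV12 as two single-plane formats. */
struct plane_desc {
   const char *format;                /* per-plane pixel format */
   unsigned bits_per_element;
   unsigned subsample_x, subsample_y; /* chroma subsampling divisors */
};

static const struct plane_desc nv12_planes[] = {
   { "R8",   8,  1, 1 },    /* plane 0: full-resolution Y */
   { "R8G8", 16, 2, 2 },    /* plane 1: half-resolution interleaved CbCr */
};

int main(void)
{
   unsigned w = 1920, h = 1080, total_bits = 0;

   for (unsigned i = 0; i < 2; i++) {
      const struct plane_desc *p = &nv12_planes[i];
      unsigned pw = w / p->subsample_x, ph = h / p->subsample_y;
      total_bits += pw * ph * p->bits_per_element;
      printf("plane %u: %s, %ux%u, %u bpp\n", i, p->format, pw, ph,
             p->bits_per_element);
   }
   /* Averaged over the whole image this is the familiar 12 bits per pixel. */
   printf("average: %u bits per pixel\n", total_bits / (w * h));
   return 0;
}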

Marek

On Fri, Jun 16, 2023, 11:30 Gurpreet Kaur (QUIC) 
wrote:

> Hi All,
>
>
>
> While trying to add support of GBM_FORMAT_NV12 format in mesa GBM, we are
> facing some issues to modify u_format.csv file.
>
>
>
> *Progress till now:*
>
> Initially we faced a gbm_dri_bo_create API failure because
> gbm_format_to_dri_format was returning 0, since gbm_dri_visuals_table
> does not have a mapping for GBM_FORMAT_NV12. This has been resolved by
> adding a mapping for GBM_FORMAT_NV12:
>
> diff --git a/src/gbm/backends/dri/gbm_dri.c b/src/gbm/backends/dri/gbm_dri.c
> index 560b97f2b70..67e23d5b368 100644
> --- a/src/gbm/backends/dri/gbm_dri.c
> +++ b/src/gbm/backends/dri/gbm_dri.c
> @@ -605,6 +605,9 @@ static const struct gbm_dri_visual gbm_dri_visuals_table[] = {
>       { 16, 16, 16, 16 },
>       true,
>    },
> +   {
> +      GBM_FORMAT_NV12, __DRI_IMAGE_FORMAT_NV12,
> +   },
>  };
>
>
>
> After that, additional changes were required, as loader_dri_create_image was
> not successful; it eventually calls dri2_create_image_common, which looks up
> the dri2_format_mapping based on the format, and the mapping for
> __DRI_IMAGE_FORMAT_NV12 was missing. We resolved this by adding the changes
> below:
>
>
>
> diff --git a/src/gallium/frontends/dri/dri_helpers.c b/src/gallium/frontends/dri/dri_helpers.c
> index 215fb4e4e3a..6ae1fc85d12 100644
> --- a/src/gallium/frontends/dri/dri_helpers.c
> +++ b/src/gallium/frontends/dri/dri_helpers.c
> @@ -484,6 +484,11 @@ static const struct dri2_format_mapping dri2_format_table[] = {
>         __DRI_IMAGE_COMPONENTS_RG, PIPE_FORMAT_RG1616_UNORM, 1,
>         { { 0, 0, 0, __DRI_IMAGE_FORMAT_GR1616 } } },
> +   { DRM_FORMAT_NV12, __DRI_IMAGE_FORMAT_NV12,
> +     __DRI_IMAGE_COMPONENTS_Y_UV, PIPE_FORMAT_NV12, 2,
> +     { { 0, 0, 0, __DRI_IMAGE_FORMAT_R8 },
> +       { 1, 1, 1, __DRI_IMAGE_FORMAT_GR88 } } },
>
>
>
> Then, in the kms_sw_displaytarget_create API, DRM_IOCTL_MODE_CREATE_DUMB was
> failing with a return value of -1, because util_format_get_blocksizebits was
> returning a bpp value of zero, which is incorrect for the NV12 format. When we
> manually pass a bpp value of 8, gbm_bo_create succeeds.
>
> diff --git a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
> index c91f7e2ca9a..6a649e9c173 100644
> --- a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
> +++ b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c
> @@ -181,25 +181,33 @@ kms_sw_displaytarget_create(struct sw_winsys *ws,
>     kms_sw_dt->ro_mapped = MAP_FAILED;
>     kms_sw_dt->format = format;
>     memset(&create_req, 0, sizeof(create_req));
>     create_req.bpp = util_format_get_blocksizebits(format);
> +   if (format == 213)
> +      create_req.bpp = 8;
>
>
> *Next Challenge:*
>
> The next challenge for us is to get the correct bpp value, which is 8, as the
> format description comes from the u_format.csv file. We have tried to modify
> the PIPE_FORMAT_NV12 row in u_format.csv by adding the number of bits in a
> cell, but that does not seem to work. We need help with how to modify
> u_format.csv, as it is processed at compile time.
>
> Please provide your feedback on how we can add the bpp value in
> u_format.csv.
>
>
>
> Thanks,
>
> Gurpreet
>


Re: Zink MR signoff tags

2022-10-05 Thread Marek Olšák
That's a good idea.

Marek

On Wed, Oct 5, 2022, 11:22 Erik Faye-Lund 
wrote:

> On Wed, 2022-10-05 at 08:20 -0400, Alyssa Rosenzweig wrote:
> > + for not requiring rb/ab tags ...
>
> I think it's time to think about making this change all over Mesa as
> well. We're deeply in bed with GitLab by now, so I don't think there's
> a realistic chance that this isn't going to just be duplicate info any
> time soon...
>
> > I kinda like the s-o-b tags but those
> > don't require fiddly rebases, just -s in the right place..
> >
> > On Tue, Oct 04, 2022 at 10:44:31PM -0500, Mike Blumenkrantz wrote:
> > >Hi,
> > >After some vigorous and robust discussion with Erik, we've
> > > decided that
> > >zink will no longer require any rb/ab/etb tags to be applied to
> > > patches in
> > >MRs.
> > >Following in Turnip's footsteps, any MR that receives sufficient
> > > reviewage
> > >in gitlab comments can be merged directly with no further action
> > >necessary.
> > >Mike
>
> --
> Erik Faye-Lund
> Principal Engineer
>
> Collabora Ltd.
> Platinum Building, St John's Innovation Park, Cambridge CB4 0DS, United
> Kingdom
> Registered in England & Wales, no. 5513718
>
>


Re: Moving amber into separate repo?

2022-09-24 Thread Marek Olšák
Removing mainline drivers from the build system of Amber is a good idea.

Marek

On Fri, Sep 23, 2022, 06:33 Filip Gawin  wrote:

> Hi, recently I've seen a case of a user using Amber when the hardware was
> supported by mainline Mesa. This gave me a couple of thoughts.
>
> 1) Users don't correlate "Amber" with "Legacy" and probably it's gonna be
> best to always also print "Legacy" together with "Mesa".
> 2) Not sure if problem of choosing best driver is on mesa's or distro
> maintainer's side, but it became more complicated for maintainers.
>
> I'm thinking that moving Amber into separate repo may make this situation
> more clear. (Disabling duplicated drivers or only allowing glsl_to_tgsi
> codepath may further help.)
>
> Some more reasoning from gitlab:
>
>
>1. web based tools provided by gitlab are quite useful, unfortunately
>they work best with main branch.
>2. repo is growing large. Amber kinda requires long history, modern
>mesa not. This may be good spot to split if cleanup is required.
>3. imho having amber's issues in this repo, won't create new
>contributors. Due to lack of kernel driver (on commercial level) or
>documentation for these gpus, so you need to be both mesa and kernel
>developer. (Any contribution is gonna require deep knowledge about
>hardware, domain and time consuming effort.)
>4. for normal users (not software developers) amber is kinda "hidden
>under the carpet". Communities like vogons may be interested in having
>simpler access to kinda documentation for these ancient gpus.
>
>
> Thanks for all insights, Filip.


Re: Moving amber into separate repo?

2022-09-24 Thread Marek Olšák
Git stores all commits. Removing files at the top of main doesn't make the
repository smaller. It actually makes it bigger. Forking the repository
would also double the size on the server.

Marek

On Fri, Sep 23, 2022, 06:33 Filip Gawin  wrote:

> Hi, recently I've seen a case of a user using Amber when the hardware was
> supported by mainline Mesa. This gave me a couple of thoughts.
>
> 1) Users don't correlate "Amber" with "Legacy" and probably it's gonna be
> best to always also print "Legacy" together with "Mesa".
> 2) Not sure if problem of choosing best driver is on mesa's or distro
> maintainer's side, but it became more complicated for maintainers.
>
> I'm thinking that moving Amber into separate repo may make this situation
> more clear. (Disabling duplicated drivers or only allowing glsl_to_tgsi
> codepath may further help.)
>
> Some more reasoning from gitlab:
>
>
>1. web based tools provided by gitlab are quite useful, unfortunately
>they work best with main branch.
>2. repo is growing large. Amber kinda requires long history, modern
>mesa not. This may be good spot to split if cleanup is required.
>3. imho having amber's issues in this repo, won't create new
>contributors. Due to lack of kernel driver (on commercial level) or
>documentation for these gpus, so you need to be both mesa and kernel
>developer. (Any contribution is gonna require deep knowledge about
>hardware, domain and time consuming effort.)
>4. for normal users (not software developers) amber is kinda "hidden
>under the carpet". Communities like vogons may be interested in having
>simpler access to kinda documentation for these ancient gpus.
>
>
> Thanks for all insights, Filip.


Re: Gitlab filters in Gmail

2022-07-14 Thread Marek Olšák
MRs always appear to have the MR label. Even if some comments don't have
it, the whole thread gets the MR label as long as at least one message has
it, and that's enough.

Marek

On Thu, Jul 14, 2022 at 7:03 AM Timur Kristóf 
wrote:

> How do you separate gitlab comments on merge requests and issues?
>
> Timur
>
> On Thu, 2022-07-14 at 05:28 -0400, Marek Olšák wrote:
> > Hi,
> >
> > Gitlab emails are difficult to filter because issues and MRs are not
> > easy to distinguish. I know that Matt has a script which does this,
> > but since that was kind of difficult for me to set up, I resorted to
> > these filters instead and they work pretty well.
> >
> > Matches: from:gitlab ("created a merge request" OR "pushed new
> > commits to merge request" OR ("Merge request" AROUND 1 "was"))
> > Do this: Apply label "MR"
> >
> > Matches: from:gitlab "Issue was closed by"
> > Do this: Apply label "Issue Closed"
> >
> > Matches: from:gitlab "Merge request" AROUND 1 "was merged"
> > Do this: Apply label "Merged"
> >
> > Matches: from:gitlab "Merge request" AROUND 1 "was closed"
> > Do this: Apply label "MR Closed"
> >
> >
> > Issues are not labeled and it doesn't seem possible, but anything
> > that is not an MR should be a Gitlab issue.
> >
> > Marek
>
>


Gitlab filters in Gmail

2022-07-14 Thread Marek Olšák
Hi,

Gitlab emails are difficult to filter because issues and MRs are not easy
to distinguish. I know that Matt has a script which does this, but since
that was kind of difficult for me to set up, I resorted to these filters
instead and they work pretty well.

Matches: from:gitlab ("created a merge request" OR "pushed new commits to
merge request" OR ("Merge request" AROUND 1 "was"))
Do this: Apply label "MR"

Matches: from:gitlab "Issue was closed by"
Do this: Apply label "Issue Closed"

Matches: from:gitlab "Merge request" AROUND 1 "was merged"
Do this: Apply label "Merged"

Matches: from:gitlab "Merge request" AROUND 1 "was closed"
Do this: Apply label "MR Closed"


Issues are not labeled and it doesn't seem possible, but anything that is
not an MR should be a Gitlab issue.

Marek


Re: Xbox Series S/X UWP

2022-06-06 Thread Marek Olšák
> I'd love to see Mesa used to bring Vulkan to consoles!

Ever heard of Steam Deck? ;)

Marek

On Mon, Jun 6, 2022 at 12:59 PM Jason Ekstrand  wrote:

> On Mon, Jun 6, 2022 at 11:38 AM Jesse Natalie 
> wrote:
>
>> (Hopefully this goes through and not to spam like last time I tried to
>> respond…)
>>
>>
>>
>> No, neither of these would currently work with UWP.
>>
>>
>>
>> The primary reason is that neither Khronos API has extensions to
>> initialize the winsys on top of the UWP core window infrastructure. In
>> theory, you could initialize Dozen for offscreen rendering and then
>> explicitly marshal the contents out – that would probably work actually.
>> There’s 2 more gotchas there though:
>>
>>1. The ICD loaders (OpenGL32.dll, Vulkan-1.dll) are not available in
>>the UWP environment. You could explicitly use the non-ICD version of GL
>>(i.e. Mesa’s OpenGL32.dll from the libgl-gdi target), include the
>>open-source Vulkan ICD loader, or use the ICD version of either
>>(OpenGLOn12.dll/libgallium_wgl.dll for GL – I plan to delete the former at
>>some point and just use the latter at some point; vulkan_dzn.dll for VK).
>>
>> If the objective here is to write a Vulkan app and ship it in UWP, I
> don't see any reason why it couldn't be used for that eventually.  Without
> the loader, you can still load the driver DLL directly.  You just have to
> use vkGetInstance/DeviceProcAddr for everything because drivers don't
> expose the Vulkan 1.0 API symbols (the loader does).  We'd also have to
> come up with a story for window-system integration, as Jesse alluded.  I'd
> love to see Mesa used to bring Vulkan to consoles!  Unfortunately, no one's
> working on that currently or has any plans, as far as I know.
>
> --Jason
>
>
>
>>
>>1.
>>2. There’s not currently extensions for D3D12 interop either spec’d
>>or implemented.
>>
>>
>>
>> There’s one more problem for GL that I don’t think is problematic for VK,
>> which is that it uses APIs that are banned from the UWP environment,
>> specifically around inserting window hooks for Win32 framebuffer lifetime
>> management. So you’d probably have to build a custom version that has all
>> of that stuff stripped out to get it to be shippable in a UWP.
>>
>>
>>
>> We (Microsoft) don’t really have plans to add this kind of stuff, at
>> least not in the near future, but I’d be open to accepting contributions
>> that enable this.
>>
>>
>>
>> -Jesse
>>
>>
>>
>> *From:* mesa-dev  *On Behalf Of 
>> *Daniel
>> Price
>> *Sent:* Monday, June 6, 2022 5:41 AM
>> *To:* mesa-dev@lists.freedesktop.org
>> *Subject:* [EXTERNAL] Xbox Series S/X UWP
>>
>>
>>
>>
>> Hi, I was wondering if these two layers would work with UWP on Xbox
>> Series Console or if not will there be plans to add support?
>>
>>
>>
>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14766
>>
>>
>>
>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14881
>>
>>
>>
>> Many Thanks
>>
>> Dan
>>
>


Re: Replacing NIR with SPIR-V?

2022-01-21 Thread Marek Olšák
> - Does it make sense to move to SPIR-V?

No, it doesn't make any sense whatsoever.

Marek

On Wed, Jan 19, 2022 at 8:17 PM Abel Bernabeu <
abel.berna...@esperantotech.com> wrote:

> Hi,
>
> My name is Abel Bernabeu and I currently chair the Graphics and ML Special
> Interest Group within RISC-V.
>
> As part of my work for RISC-V I am currently looking at what is needed for
> supporting a graphics product that uses a (potentially extended) RISC-V ISA
> for its shading cores. My initial focus has been on analyzing the
> functional gap between RISC-V and SPIR-V, assuming that whatever is needed
> for a modern graphics accelerator is inevitably present on SPIR-V.
>
> Now, the thing is that most of the potential adopters on our committee
> will likely be interested in using mesa for developing their drivers and
> that means using NIR as intermediate representation. Thus, I also need to
> consider NIR when looking at the functional gap, doubling the amount of
> work during the analysis.
>
> Why is mesa using NIR as intermediate representation rather than SPIR-V?
> It would make my life easier if mesa used SPIR-V rather than NIR for
> communicating between the front-end and the backends.
>
> I know it is a lot of work to migrate to SPIR-V, but I am interested in
> knowing what is the opinion of the mesa developers:
>
> - My understanding is that when mesa adopted NIR, there was no SPIR-V. Was
> a comparison made after the SPIR-V ratification?
>
> - Does it make sense to move to SPIR-V?
>
> - Is it feasible in terms of functionality supported by SPIR-V?
>
> - Is the cost worth the potential advantage of using a more commonly
> adopted standard?
>
> Thanks in advance for your time and thoughts.
>
> Regards.
>


NIR: is_used_once breaks multi-pass rendering

2022-01-20 Thread Marek Olšák
Hi,

"is_used_once" within an inexact transformation in nir_opt_algebraic can
lead to geometry differences with multi-pass rendering, causing incorrect
output. Here's an example to prove this:

Let's assume there is a pass that writes out some intermediate value from
the position calculation as a varying. Let's assume there is another pass
that does the same thing, but only draws to the depth buffer, so varyings
are eliminated. The second pass would get "is_used_once" because there is
just the position, and let's assume there is an inexact transformation with
"is_used_once" that matches that. On the other hand, the first pass
wouldn't get "is_used_once" because there is the varying. Now the same
position calculation is different for each pass, causing depth test
functions commonly used in multi-pass rendering such as EQUAL to fail.

The application might even use the exact same shader for both passes, and
the driver might just look for depth-only rendering and remove the varyings
based on that. Or it can introduce more "is_used_once" cases via uniform
inlining. From the app's point of view, the positions should be identical
between both passes if it's the exact same shader.
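
To make the numeric side concrete, here is a tiny standalone C example (generic
C, not Mesa code) showing how fusing a multiply-add in one pass but not the
other already changes the result bit-for-bit; it assumes the compiler is not
contracting a*b + c on its own (e.g. build with -ffp-contract=off):

#include <math.h>
#include <stdio.h>

int main(void)
{
   /* a*b = 1 + 2^-11 + 2^-24 is not exactly representable in float, so the
    * separately rounded product loses the 2^-24 term that fmaf() keeps. */
   float a = 1.0f + 0x1p-12f;
   float b = 1.0f + 0x1p-12f;
   float c = -(1.0f + 0x1p-11f);

   float separate = a * b + c;     /* product rounded first: 0.0 */
   float fused    = fmaf(a, b, c); /* exact product used: 2^-24 */

   printf("a*b + c     = %a\n", separate);
   printf("fmaf(a,b,c) = %a\n", fused);
   return 0;
}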

The workaround we have for this issue is called
"vs_position_always_invariant", which was added for inexact FMA fusing, but
it works with all inexact transformations containing "is_used_once".

This issue could be exacerbated by future optimizations.

Some of the solutions are:
- Remove "is_used_once" (safe)
- Enable vs_position_always_invariant by default (not helpful if the data
flow is shader->texture->shader->position)
- Always suppress inexact transformations containing "is_used_once" for all
instructions contributing to the final position value (less aggressive than
vs_position_always_invariant; it needs a proof that it's equivalent to
vs_position_always_invariant in terms of invariance, not behavior)
- Continue using app workarounds.

Just some food for thought.

Marek


Re: git and Marge troubles this week

2022-01-08 Thread Marek Olšák
The ac_surface_meta_address_test timeout occurs rarely and it's because the
test is computationally demanding. It's also possible the machine got
slower for some reason.

Marek

On Fri, Jan 7, 2022 at 12:32 PM Emma Anholt  wrote:

> On Fri, Jan 7, 2022 at 6:18 AM Connor Abbott  wrote:
> >
> > Unfortunately batch mode has only made it *worse* - I'm sure it's not
> > intentional, but it seems that it's still running the CI pipelines
> > individually after the batch pipeline passes and not merging them
> > right away, which completely defeats the point. See, for example,
> > !14213 which has gone through 8 cycles being batched with earlier MRs,
> > 5 of those passing only to have an earlier job in the batch spuriously
> > fail when actually merging and Marge seemingly giving up on merging it
> > (???). As I type it was "lucky" enough to be the first job in a batch
> > which passed and is currently running its pipeline and is blocked on
> > iris-whl-traces-performance (I have !14453 to disable that broken job,
> > but who knows with the Marge chaos when it's going to get merged...).
> >
> > Stepping back, I think it was a bad idea to push a "I think this might
> > help" type change like this without first carefully monitoring things
> > afterwards. An hour or so of babysitting Marge would've caught that
> > this wasn't working, and would've prevented many hours of backlog and
> > perception of general CI instability.
>
> I spent the day watching marge, like I do every day.  Looking at the
> logs, we got 0 MRs in during my work hours PST, out of about 14 or so
> marge assignments that day.  Leaving marge broken for the night would
> have been indistinguishable from the status quo, was my assessment.
>
> There was definitely some extra spam about trying batches, more than
> there were actual batches attempted.  My guess would be gitlab
> connection reliability stuff, but I'm not sure.
>
> Of the 5 batches marge attempted before the change was reverted, three
> fell to https://gitlab.freedesktop.org/mesa/mesa/-/issues/5837, one to
> the git fetch fails, and one to a new timeout I don't think I've seen
> before: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/17357425#L1731.
> Of all the sub-MRs involved in those batches, I think two of those
> might have gotten through by dodging the LAVA lab fail.  Marge's batch
> backoff did work, and !14436 and maybe !14433 landed during that time.
>


Re: revenge of CALLOC_STRUCT

2021-12-27 Thread Marek Olšák
I remember that it wasn't unusual on Windows to define malloc, calloc,
strdup, and free macros to redirect those calls to custom wrappers. That
would eliminate the need to have non-standard names like MALLOC.
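
A minimal sketch of that kind of redirection (the wrapper names are
hypothetical, not the actual Mesa or VMware code):

#include <stdlib.h>
#include <stdio.h>

static void *debug_malloc(size_t size, const char *file, int line)
{
   void *ptr = malloc(size);
   fprintf(stderr, "malloc(%zu) = %p at %s:%d\n", size, ptr, file, line);
   return ptr;
}

static void debug_free(void *ptr, const char *file, int line)
{
   fprintf(stderr, "free(%p) at %s:%d\n", ptr, file, line);
   free(ptr);
}

/* The macros are defined after the wrappers so the wrappers themselves
 * still call the real allocator; all later calls to the standard names
 * get redirected without needing non-standard names like MALLOC. */
#define malloc(sz)  debug_malloc(sz, __FILE__, __LINE__)
#define free(p)     debug_free(p, __FILE__, __LINE__)

int main(void)
{
   int *data = malloc(16 * sizeof(*data));
   free(data);
   return 0;
}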

There is also the pipe_reference debug code. Is anybody using that?

Marek

On Sun., Dec. 26, 2021, 05:37 Jose Fonseca,  wrote:

> I believe that as long as CALLOC_STRUCT continues to get paired with the
> right FREE call (or equivalent if these get renamed), either way should work
> for us.  Whichever proposed option gets followed, there's a risk these can
> get out of balance, but the risk seems comparable.
>
>
> For context, IIRC, the main reason these macros remain useful for VMware
> is the sad state of memory debugging tools on Windows.  AFAICT, best hope
> of one day deprecating this would be to use AddressSanitizer, which is
> supported on MSVC [1], but unfortunately not yet on MinGW w/ GCC [2], and
> we rely upon a lot for day-to-day development, using Linux
> cross-compilation.  Using MinGW w/ Clang cross compiler seems be a way to
> overcome this difficulty, but that too was still in somewhat experimental
> state when I last tried it.
>
>
> Jose
>
> [1]
> https://devblogs.microsoft.com/cppblog/asan-for-windows-x64-and-debug-build-support/
> [2]
> https://stackoverflow.com/questions/67619314/cannot-use-fsanitize-address-in-mingw-compiler
> --
> *From:* Dave Airlie 
> *Sent:* Wednesday, December 22, 2021 22:35
> *To:* mesa-dev ; Jose Fonseca <
> jfons...@vmware.com>; Brian Paul 
> *Subject:* revenge of CALLOC_STRUCT
>
> Hey,
>
> Happy holidays, and as though to consider over the break,
>
> We have the VMware MALLOC/FREE/CALLOC/CALLOC_STRUCT wrappers used
> throughout gallium.
>
> We have ST_CALLOC_STRUCT in use in the mesa state tracker, not used in
> gallium.
>
> Merging the state tracker into mesa proper, and even prior to this a
> few CALLOC_STRUCT have started to leak into src/mesa/*.
>
> Now I don't think we want that, but CALLOC_STRUCT is a genuinely
> useful macro on its own, so
> I'm considering just defining a calloc_struct instead for the
> non-gallium use that goes with malloc/calloc/free.
>
> Any opinions, or should mesa just get u_memory.h support for Xmas?
>
> Dave.
>


Re: Moving code around, post classic

2021-12-07 Thread Marek Olšák
While the current directory structure is confusing, the new suggested
directory structure might not be helpful because GL is more spread out
anyway.

mapi is libglapi, so it seems to be its own thing, not a gallium thing.
glx is libgl, same thing.
egl is libegl, same thing.
... unless we want to merge all libs with all drivers into one mega lib
built by gallium.

loader is more like util. mapi is partially util too besides being a lib.

Marek

On Tue, Dec 7, 2021 at 2:45 PM Dave Airlie  wrote:

> On Tue, 7 Dec 2021 at 09:51, Dylan Baker  wrote:
> >
> > Classic is gone, and the cleanups have begun, obviously. There is
> > another cleanup that I had in mind, which is moving src/mesa into
> > src/gallium/frontends/mesa. This makes the build system a little
> > cleaner, as currently we do some bending over backwards to get gallium,
> > mesa, and their tests built in the right order. But that's a big ol `git
> > mv`, and when I proposed it Dave and Ilia suggested it would be best to
> > do all of the post-classic code motion at once. So, let's talk about
> > what we want to move, and where we want to move it.
> >
> > Among the suggestions we had were:
> >
> > 1. Move src/mesa into src/gallium/frontends/mesa (I have patches for
> >this)
> >
> >Seems like a pretty obvious thing to do, given that all of the other
> >gallium state trackers live there (OpenCL, video, d3d9, etc)
>
> I'm against this just for history-finding reasons: although git tracks
> file renames, it AFAIK fails to track directories, so you can only
> follow the files, not the whole subdir, back through history once you
> move it.
>
> But I guess enough people want to see it happen, and it will.
>
> >
> > 2. Move src/compiler/glsl into src/gallium/frontends/mesa as well
> >
> > Given that there are now no? drivers that use GLSL-IR directly, it
> > might make sense to move the glsl compiler into the mesa
> > state_tracker, and just have that lower to TGSI or NIR, and treat
> > GLSL-IR as an implementation detail of the OpenGL frontend.
> >
> > Unfortunately, there are a lot of code outside of glsl that uses the
> > linked list implementation in the glsl compiler, and not the one in
> > util.
> >
> > 3. Move src/gallium* to src/
> >
> > This was suggested, though given the existence of Vulkan, it wasn't
> > clear whether this was a good idea or not
> >
> > 4. What to do about the src/loader, src/glx, src/egl, src/mapi,
> >src/glapi
> >
> > These are all part of OpenGL, but not really part of gallium, but if
> > we don't move src/gallium/* to src/ does it make sense to leave them
> > in the root?
>
> src/opengl ?
>
> Dave.
>


Re: Moving code around, post classic

2021-12-06 Thread Marek Olšák
Hi,

1. If this happens, let's call it src/gallium/frontends/gl.

3. The src directory already has too much stuff.

Marek

On Mon, Dec 6, 2021 at 6:51 PM Dylan Baker  wrote:

> Classic is gone, and the cleanups have begun, obviously. There is
> another cleanup that I had in mind, which is moving src/mesa into
> src/gallium/frontends/mesa. This makes the build system a little
> cleaner, as currently we do some bending over backwards to get gallium,
> mesa, and their tests built in the right order. But that's a big ol `git
> mv`, and when I proposed it Dave and Ilia suggested it would be best to
> do all of the post-classic code motion at once. So, let's talk about
> what we want to move, and where we want to move it.
>
> Among the suggestions we had were:
>
> 1. Move src/mesa into src/gallium/frontends/mesa (I have patches for
>this)
>
>Seems like a pretty obvious thing to do, given that all of the other
>gallium state trackers live there (OpenCL, video, d3d9, etc)
>
> 2. Move src/compiler/glsl into src/gallium/frontends/mesa as well
>
> Given that there are now no? drivers that use GLSL-IR directly, it
> might make sense to move the glsl compiler into the mesa
> state_tracker, and just have that lower to TGSI or NIR, and treat
> GLSL-IR as an implementation detail of the OpenGL frontend.
>
> Unfortunately, there are a lot of code outside of glsl that uses the
> linked list implementation in the glsl compiler, and not the one in
> util.
>
> 3. Move src/gallium* to src/
>
> This was suggested, though given the existence of Vulkan, it wasn't
> clear whether this was a good idea or not
>
> 4. What to do about the src/loader, src/glx, src/egl, src/mapi,
>src/glapi
>
> These are all part of OpenGL, but not really part of gallium, but if
> we don't move src/gallium/* to src/ does it make sense to leave them
> in the root?
>
>
> Cheers,
> Dylan


Re: [Mesa-dev] glvnd: a lot of CPU overhead, lower performance

2021-10-19 Thread Marek Olšák
https://gitlab.freedesktop.org/glvnd/libglvnd/-/issues/222

On Fri, Oct 15, 2021 at 3:27 PM Marek Olšák  wrote:

> Hi,
>
> I just enabled glvnd and noticed the performance is much lower.
>
> glxgears: 14% perf drop
> glvnd off: ~28000 FPS
> glvnd on: ~24000 FPS
>
> viewperf13 (some subtest): 11% perf drop
> glvnd off: 201 FPS
> glvnd on: 179 FPS
>
> glvnd spends a lot of time in libGL.so and some in libGLdispatch.so.
>
> The "off" results are with LD_LIBRARY_PATH redirected to libGL without
> glvnd.
>
> I'm curious if anybody knows how to persuade glvnd to be faster.
>
> Thanks,
> Marek
>


[Mesa-dev] glvnd: a lot of CPU overhead, lower performance

2021-10-15 Thread Marek Olšák
Hi,

I just enabled glvnd and noticed the performance is much lower.

glxgears: 14% perf drop
glvnd off: ~28000 FPS
glvnd on: ~24000 FPS

viewperf13 (some subtest): 11% perf drop
glvnd off: 201 FPS
glvnd on: 179 FPS

glvnd spends a lot of time in libGL.so and some in libGLdispatch.so.

The "off" results are with LD_LIBRARY_PATH redirected to libGL without
glvnd.

I'm curious if anybody knows how to persuade glvnd to be faster.

Thanks,
Marek


Re: [Mesa-dev] Workflow Proposal

2021-10-12 Thread Marek Olšák
I'd like gitlab macros :rb: and :ab: that put the tags into the comment.

Marek

On Tue, Oct 12, 2021 at 5:01 PM Jason Ekstrand  wrote:

> On Tue, Oct 12, 2021 at 3:56 PM apinheiro  wrote:
> >
> >
> > On 12/10/21 13:55, Alyssa Rosenzweig wrote:
> >
> > I would love to see this be the process across Mesa.  We already don't
> > rewrite commit messages for freedreno and i915g, and I only have to do
> > the rebase (busy-)work for my projects in other areas of the tree.
> >
> > Likewise for Panfrost. At least, I don't do the rewriting. Some Panfrost
> > devs do, which I'm fine with. But it's not a requirement to merging.
> >
> > The arguments about "who can help support this years from now?" are moot
> > at our scale... the team is small enough that the name on the reviewer
> > is likely the code owner / maintainer, and patches regularly go in
> > unreviewed for lack of review bandwidth.
> >
> > There is another reason for the Rb tag, which is to measure the quantity
> > of patch review people do.
> >
> > This was well summarized some years ago by Matt Turner, as it was
> > minimized (even suggested to be removed) on a different thread:
> >
> > https://lists.freedesktop.org/archives/mesa-dev/2019-January/213586.html
> >
> > I was part of the Intel team when people started doing this r-b
> > counting.  I believe that it was being done due to Intel management's
> > failure to understand who was doing the work on the team and credit
> > them appropriately, and also to encourage those doing less to step up.
> >
> >
> > That's basically the same problem with trying to measure and compare
> developers just by commit count. In theory commit count is a bad measure
> for that. In practice it is used somehow.
> >
> > Unfortunately, the problem with Intel management wasn't a lack of
> > available information, and I didn't see publishing the counts change
> > reviews either.
> >
> > 
> >
> > Upstream should do what's best for upstream, not for Intel's "unique"
> > management.
> >
> >
> > Not sure how, from Emma explaining how Rb tags were used by Intel
> management, the conclusion came that they were used in that way only by
> Intel management. Spoiler: they are not.
> >
> > Replying to both: that's one of the reasons I pointed to Matt
> Turner's original email. He never explicitly mentioned Intel management, nor
> presented this as an accurate measure of the use. Quoting:
> >
> > "The number of R-b tags is not a 100% accurate picture of the
> > situation, but it gives at least a good overview of who is doing the
> > tedious work of patch review. "
> >
> > In any case, just to be clear here: I'm not saying that the Rb tags' main
> use is this one. Just saying that it is one of their uses, and the value of
> such use can be debatable, but it is not zero.
>
> Negative numbers aren't zero!
>
> --Jason
>


Re: [Mesa-dev] Workflow Proposal

2021-10-07 Thread Marek Olšák
Despite all the time it takes to add the tags and force-push, I have no
objection to doing that. It captures per-commit reviews well.

Marek

On Thu, Oct 7, 2021 at 1:17 PM Eero Tamminen 
wrote:

> Hi,
>
> On 7.10.2021 19.51, Daniel Stone wrote:
> > On Thu, 7 Oct 2021 at 09:38, Eero Tamminen 
> wrote:
> >> This sounds horrible from the point of view of trying to track down
> >> somebody who knows about what's & why's of some old commit that is later
> >> on found to cause issues...
> >
> > But why would your first point of call not be to go back to the review
> > discussion and look at the context and what was said at the time? Then
> > when you do that, you can see not only what happened, but also who was
> > involved and saying what at the time.
>
> You're assuming that:
> - The review discussion is still available [1]
> - One can find it based on given individual commit
>
> [1] system hosting it could be down, or network could be down.
>
> It's maybe a bit contrived situation, but I kind of prefer
> self-contained information. What, why and who is better to be in commit
> itself than only in MR.
>
>
> - Eero
>


Re: [Mesa-dev] Let's enable _GLIBCXX_ASSERTIONS=1 on mesa debug builds

2021-09-10 Thread Marek Olšák
Yes, this would be useful.

Marek


On Fri, Sep 10, 2021 at 1:20 PM Timur Kristóf 
wrote:

> Hi,
>
> We've been recently working on tracking down some "mysterious" crashes
> that some users experienced on distro builds of mesa but we couldn't
> reproduce locally, until we found out about _GLIBCXX_ASSERTIONS=1 which
> seems to be not enabled by default in mesa, but is enabled by a lot of
> distros.
>
> I realize that enabling it by default on all mesa builds would have
> performance implications, so I propose to just enable it by default in
> mesa debug builds.
>
> What do you think? Would this be okay with the mesa community?
>
> Thanks & best regards,
> Timur
>
>
>


Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-17 Thread Marek Olšák
Timeline semaphore waits (polling on memory) will be unmonitored and as
fast as the roundtrip to memory. Semaphore writes will be slower because
the copy of those write requests will also be forwarded to the kernel.
Arbitrary writes are not protected by the hw but the kernel will take
action against such behavior because it will receive them too.
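
A minimal sketch of what the unmonitored wait side could look like from the CPU
(the names are hypothetical; the point is that the wait is just polling a
memory-backed, monotonically increasing 64-bit value):

#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: the semaphore is a 64-bit counter in shared memory that the
 * GPU (or another process) advances.  Waiting costs only the round trip to
 * memory; no kernel call is involved. */
void timeline_wait(const _Atomic uint64_t *sem, uint64_t wait_value)
{
   while (atomic_load_explicit(sem, memory_order_acquire) < wait_value) {
      /* spin; a real implementation would back off or futex-wait */
   }
}

/* The signaling side is what gets mirrored to the kernel in the scheme
 * described above, so misbehaving writers can be identified. */
void timeline_signal(_Atomic uint64_t *sem, uint64_t value)
{
   atomic_store_explicit(sem, value, memory_order_release);
}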

I don't know if that would work with dma_fence.

Marek


On Thu, Jun 17, 2021 at 3:04 PM Daniel Vetter  wrote:

> On Thu, Jun 17, 2021 at 02:28:06PM -0400, Marek Olšák wrote:
> > The kernel will know who should touch the implicit-sync semaphore next,
> and
> > at the same time, the copy of all write requests to the implicit-sync
> > semaphore will be forwarded to the kernel for monitoring and bo_wait.
> >
> > Syncobjs could either use the same monitored access as implicit sync or
> be
> > completely unmonitored. We haven't decided yet.
> >
> > Syncfiles could either use one of the above or wait for a syncobj to go
> > idle before converting to a syncfile.
>
> Hm this sounds all like you're planning to completely rewrap everything
> ... I'm assuming the plan is still that this is going to be largely
> wrapped in dma_fence? Maybe with timeline objects being a bit more
> optimized, but I'm not sure how much you can optimize without breaking the
> interfaces.
> -Daniel
>
> >
> > Marek
> >
> >
> >
> > On Thu, Jun 17, 2021 at 12:48 PM Daniel Vetter  wrote:
> >
> > > On Mon, Jun 14, 2021 at 07:13:00PM +0200, Christian König wrote:
> > > > As long as we can figure out who touched a certain sync object
> last
> > > that
> > > > would indeed work, yes.
> > >
> > > Don't you need to know who will touch it next, i.e. who is holding up
> your
> > > fence? Or maybe I'm just again totally confused.
> > > -Daniel
> > >
> > > >
> > > > Christian.
> > > >
> > > > Am 14.06.21 um 19:10 schrieb Marek Olšák:
> > > > > The call to the hw scheduler has a limitation on the size of all
> > > > > parameters combined. I think we can only pass a 32-bit sequence
> number
> > > > > and a ~16-bit global (per-GPU) syncobj handle in one call and not
> much
> > > > > else.
> > > > >
> > > > > The syncobj handle can be an element index in a global (per-GPU)
> > > syncobj
> > > > > table and it's read only for all processes with the exception of
> the
> > > > > signal command. Syncobjs can either have per VMID write access
> flags
> > > for
> > > > > the signal command (slow), or any process can write to any
> syncobjs and
> > > > > only rely on the kernel checking the write log (fast).
> > > > >
> > > > > In any case, we can execute the memory write in the queue engine
> and
> > > > > only use the hw scheduler for logging, which would be perfect.
> > > > >
> > > > > Marek
> > > > >
> > > > > On Thu, Jun 10, 2021 at 12:33 PM Christian König
> > > > >  > > > > <mailto:ckoenig.leichtzumer...@gmail.com>> wrote:
> > > > >
> > > > > Hi guys,
> > > > >
> > > > > maybe soften that a bit. Reading from the shared memory of the
> > > > > user fence is ok for everybody. What we need to take more care
> of
> > > > > is the writing side.
> > > > >
> > > > > So my current thinking is that we allow read only access, but
> > > > > writing a new sequence value needs to go through the
> > > scheduler/kernel.
> > > > >
> > > > > So when the CPU wants to signal a timeline fence it needs to
> call
> > > > > an IOCTL. When the GPU wants to signal the timeline fence it
> needs
> > > > > to hand that of to the hardware scheduler.
> > > > >
> > > > > If we lockup the kernel can check with the hardware who did the
> > > > > last write and what value was written.
> > > > >
> > > > > That together with an IOCTL to give out sequence number for
> > > > > implicit sync to applications should be sufficient for the
> kernel
> > > > > to track who is responsible if something bad happens.
> > > > >
> > > > > In other words when the hardware says that the shader wrote
> stuff
> > > > > like 0xdeadbeef 0x0 or 0xff

Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-17 Thread Marek Olšák
The kernel will know who should touch the implicit-sync semaphore next, and
at the same time, the copy of all write requests to the implicit-sync
semaphore will be forwarded to the kernel for monitoring and bo_wait.

Syncobjs could either use the same monitored access as implicit sync or be
completely unmonitored. We haven't decided yet.

Syncfiles could either use one of the above or wait for a syncobj to go
idle before converting to a syncfile.

Marek



On Thu, Jun 17, 2021 at 12:48 PM Daniel Vetter  wrote:

> On Mon, Jun 14, 2021 at 07:13:00PM +0200, Christian König wrote:
> > As long as we can figure out who touched a certain sync object last
> that
> > would indeed work, yes.
>
> Don't you need to know who will touch it next, i.e. who is holding up your
> fence? Or maybe I'm just again totally confused.
> -Daniel
>
> >
> > Christian.
> >
> > Am 14.06.21 um 19:10 schrieb Marek Olšák:
> > > The call to the hw scheduler has a limitation on the size of all
> > > parameters combined. I think we can only pass a 32-bit sequence number
> > > and a ~16-bit global (per-GPU) syncobj handle in one call and not much
> > > else.
> > >
> > > The syncobj handle can be an element index in a global (per-GPU)
> syncobj
> > > table and it's read only for all processes with the exception of the
> > > signal command. Syncobjs can either have per VMID write access flags
> for
> > > the signal command (slow), or any process can write to any syncobjs and
> > > only rely on the kernel checking the write log (fast).
> > >
> > > In any case, we can execute the memory write in the queue engine and
> > > only use the hw scheduler for logging, which would be perfect.
> > >
> > > Marek
> > >
> > > On Thu, Jun 10, 2021 at 12:33 PM Christian König
> > >  > > <mailto:ckoenig.leichtzumer...@gmail.com>> wrote:
> > >
> > > Hi guys,
> > >
> > > maybe soften that a bit. Reading from the shared memory of the
> > > user fence is ok for everybody. What we need to take more care of
> > > is the writing side.
> > >
> > > So my current thinking is that we allow read only access, but
> > > writing a new sequence value needs to go through the
> scheduler/kernel.
> > >
> > > So when the CPU wants to signal a timeline fence it needs to call
> > > an IOCTL. When the GPU wants to signal the timeline fence it needs
> > > to hand that of to the hardware scheduler.
> > >
> > > If we lockup the kernel can check with the hardware who did the
> > > last write and what value was written.
> > >
> > > That together with an IOCTL to give out sequence number for
> > > implicit sync to applications should be sufficient for the kernel
> > > to track who is responsible if something bad happens.
> > >
> > > In other words when the hardware says that the shader wrote stuff
> > > like 0xdeadbeef 0x0 or 0x into memory we kill the process
> > > who did that.
> > >
> > > If the hardware says that seq - 1 was written fine, but seq is
> > > missing then the kernel blames whoever was supposed to write seq.
> > >
> > > Just piping the write through a privileged instance should be
> > > fine to make sure that we don't run into issues.
> > >
> > > Christian.
> > >
> > > Am 10.06.21 um 17:59 schrieb Marek Olšák:
> > > > Hi Daniel,
> > > >
> > > > We just talked about this whole topic internally and we came up
> > > > to the conclusion that the hardware needs to understand sync
> > > > object handles and have high-level wait and signal operations in
> > > > the command stream. Sync objects will be backed by memory, but
> > > > they won't be readable or writable by processes directly. The
> > > > hardware will log all accesses to sync objects and will send the
> > > > log to the kernel periodically. The kernel will identify
> > > > malicious behavior.
> > > >
> > > > Example of a hardware command stream:
> > > > ...
> > > > ImplicitSyncWait(syncObjHandle, sequenceNumber); // the sequence
> > > > number is assigned by the kernel
> > > > Draw();
> > > > ImplicitSyncSignalWhenDone(syncObjHandle);
> > > > ...
> > > >
> > > > I'm afra

Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-14 Thread Marek Olšák
The call to the hw scheduler has a limitation on the size of all parameters
combined. I think we can only pass a 32-bit sequence number and a ~16-bit
global (per-GPU) syncobj handle in one call and not much else.

The syncobj handle can be an element index in a global (per-GPU) syncobj
table and it's read only for all processes with the exception of the signal
command. Syncobjs can either have per VMID write access flags for the
signal command (slow), or any process can write to any syncobjs and only
rely on the kernel checking the write log (fast).

In any case, we can execute the memory write in the queue engine and only
use the hw scheduler for logging, which would be perfect.

Marek

On Thu, Jun 10, 2021 at 12:33 PM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Hi guys,
>
> maybe soften that a bit. Reading from the shared memory of the user fence
> is ok for everybody. What we need to take more care of is the writing side.
>
> So my current thinking is that we allow read only access, but writing a
> new sequence value needs to go through the scheduler/kernel.
>
> So when the CPU wants to signal a timeline fence it needs to call an
> IOCTL. When the GPU wants to signal the timeline fence it needs to hand
> that of to the hardware scheduler.
>
> If we lockup the kernel can check with the hardware who did the last write
> and what value was written.
>
> That together with an IOCTL to give out sequence number for implicit sync
> to applications should be sufficient for the kernel to track who is
> responsible if something bad happens.
>
> In other words when the hardware says that the shader wrote stuff like
> 0xdeadbeef 0x0 or 0x into memory we kill the process who did that.
>
> If the hardware says that seq - 1 was written fine, but seq is missing
> then the kernel blames whoever was supposed to write seq.
>
> Just piping the write through a privileged instance should be fine to
> make sure that we don't run into issues.
>
> Christian.
>
> Am 10.06.21 um 17:59 schrieb Marek Olšák:
>
> Hi Daniel,
>
> We just talked about this whole topic internally and we came to the
> conclusion that the hardware needs to understand sync object handles and
> have high-level wait and signal operations in the command stream. Sync
> objects will be backed by memory, but they won't be readable or writable by
> processes directly. The hardware will log all accesses to sync objects and
> will send the log to the kernel periodically. The kernel will identify
> malicious behavior.
>
> Example of a hardware command stream:
> ...
> ImplicitSyncWait(syncObjHandle, sequenceNumber); // the sequence number is
> assigned by the kernel
> Draw();
> ImplicitSyncSignalWhenDone(syncObjHandle);
> ...
>
> I'm afraid we have no other choice because of the TLB invalidation
> overhead.
>
> Marek
>
>
> On Wed, Jun 9, 2021 at 2:31 PM Daniel Vetter  wrote:
>
>> On Wed, Jun 09, 2021 at 03:58:26PM +0200, Christian König wrote:
>> > Am 09.06.21 um 15:19 schrieb Daniel Vetter:
>> > > [SNIP]
>> > > > Yeah, we call this the lightweight and the heavyweight tlb flush.
>> > > >
>> > > > The lightweight can be used when you are sure that you don't have
>> any of the
>> > > > PTEs currently in flight in the 3D/DMA engine and you just need to
>> > > > invalidate the TLB.
>> > > >
>> > > > The heavyweight must be used when you need to invalidate the TLB
>> *AND* make
>> > > > sure that no concurrently operation moves new stuff into the TLB.
>> > > >
>> > > > The problem is for this use case we have to use the heavyweight one.
>> > > Just for my own curiosity: So the lightweight flush is only for
>> in-between
>> > > CS when you know access is idle? Or does that also not work if
>> userspace
>> > > has a CS on a dma engine going at the same time because the tlb aren't
>> > > isolated enough between engines?
>> >
>> > More or less correct, yes.
>> >
>> > The problem is a lightweight flush only invalidates the TLB, but doesn't
>> > take care of entries which have been handed out to the different
>> engines.
>> >
>> > In other words what can happen is the following:
>> >
>> > 1. Shader asks TLB to resolve address X.
>> > 2. TLB looks into its cache and can't find address X so it asks the
>> walker
>> > to resolve.
>> > 3. Walker comes back with result for address X and TLB puts that into
>> its
>> > cache and gives it to Shader.
>> > 4. Shader starts doing some operation 

Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-10 Thread Marek Olšák
Hi Daniel,

We just talked about this whole topic internally and we came to the
conclusion that the hardware needs to understand sync object handles and
have high-level wait and signal operations in the command stream. Sync
objects will be backed by memory, but they won't be readable or writable by
processes directly. The hardware will log all accesses to sync objects and
will send the log to the kernel periodically. The kernel will identify
malicious behavior.

Example of a hardware command stream:
...
ImplicitSyncWait(syncObjHandle, sequenceNumber); // the sequence number is
assigned by the kernel
Draw();
ImplicitSyncSignalWhenDone(syncObjHandle);
...

I'm afraid we have no other choice because of the TLB invalidation overhead.

Marek


On Wed, Jun 9, 2021 at 2:31 PM Daniel Vetter  wrote:

> On Wed, Jun 09, 2021 at 03:58:26PM +0200, Christian König wrote:
> > Am 09.06.21 um 15:19 schrieb Daniel Vetter:
> > > [SNIP]
> > > > Yeah, we call this the lightweight and the heavyweight tlb flush.
> > > >
> > > > The lightweight can be used when you are sure that you don't have any
> of the
> > > > PTEs currently in flight in the 3D/DMA engine and you just need to
> > > > invalidate the TLB.
> > > >
> > > > The heavyweight must be used when you need to invalidate the TLB
> *AND* make
> > > > sure that no concurrent operation moves new stuff into the TLB.
> > > >
> > > > The problem is for this use case we have to use the heavyweight one.
> > > Just for my own curiosity: So the lightweight flush is only for
> in-between
> > > CS when you know access is idle? Or does that also not work if
> userspace
> > > has a CS on a dma engine going at the same time because the tlb aren't
> > > isolated enough between engines?
> >
> > More or less correct, yes.
> >
> > The problem is a lightweight flush only invalidates the TLB, but doesn't
> > take care of entries which have been handed out to the different engines.
> >
> > In other words what can happen is the following:
> >
> > 1. Shader asks TLB to resolve address X.
> > 2. TLB looks into its cache and can't find address X so it asks the
> walker
> > to resolve.
> > 3. Walker comes back with result for address X and TLB puts that into its
> > cache and gives it to Shader.
> > 4. Shader starts doing some operation using result for address X.
> > 5. You send lightweight TLB invalidate and TLB throws away cached values
> for
> > address X.
> > 6. Shader happily still uses whatever the TLB gave to it in step 3 to
> > accesses address X
> >
> > See it like the shader has their own 1 entry L0 TLB cache which is not
> > affected by the lightweight flush.
> >
> > The heavyweight flush on the other hand sends out a broadcast signal to
> > everybody and only comes back when we are sure that an address is not in
> use
> > any more.
>
> Ah makes sense. On intel the shaders only operate in VA, everything goes
> around as explicit async messages to IO blocks. So we don't have this, the
> only difference in tlb flushes is between tlb flush in the IB and an mmio
> one which is independent for anything currently being executed on an
> engine.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
>


Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-03 Thread Marek Olšák
On Thu., Jun. 3, 2021, 15:18 Daniel Vetter,  wrote:

> On Thu, Jun 3, 2021 at 7:53 PM Marek Olšák  wrote:
> >
> > Daniel, I think what you are suggesting is that we need to enable user
> queues with the drm scheduler and dma_fence first, and once that works, we
> can investigate how much of that kernel logic can be moved to the hw. Would
> that work? In theory it shouldn't matter whether the kernel does it or the
> hw does it. It's the same code, just in a different place.
>
> Yeah I guess that's another way to look at it. Maybe in practice
> you'll just move it from the kernel to userspace, which then programs
> the hw waits directly into its IB. That's at least how I'd do it on
> i915, assuming I'd have such hw. So these fences that userspace
> programs directly (to sync within itself) won't even show up as
> dependencies in the kernel.
>
> And then yes on the other side you can lift work from the
> drm/scheduler wrt dependencies you get in the kernel (whether explicit
> sync with sync_file, or implicit sync fished out of dma_resv) and
> program the hw directly that way. That would mean that userspace won't
> fill the ringbuffer directly, but the kernel would do that, so that
> you have space to stuff in the additional waits. Again assuming i915
> hw model, maybe works differently on amd. Iirc we have some of that
> already in the i915 scheduler, but I'd need to recheck how much it
> really uses the hw semaphores.
>

I was thinking we would pass per process syncobj handles and buffer handles
into commands in the user queue, or something equivalent. We do have a
large degree of programmability in the hw that we can do something like
that. The question is whether this high level user->hw interface would have
any advantage over trivial polling on memory, etc. My impression is no
because the kernel would be robust enough that it wouldn't matter what
userspace does, but I don't know. Anyway, all we need is user queues and
what you proposed seems totally sufficient.

Marek

-Daniel
>
> > Thanks,
> > Marek
> >
> > On Thu, Jun 3, 2021 at 7:22 AM Daniel Vetter  wrote:
> >>
> >> On Thu, Jun 3, 2021 at 12:55 PM Marek Olšák  wrote:
> >> >
> >> > On Thu., Jun. 3, 2021, 06:03 Daniel Vetter,  wrote:
> >> >>
> >> >> On Thu, Jun 03, 2021 at 04:20:18AM -0400, Marek Olšák wrote:
> >> >> > On Thu, Jun 3, 2021 at 3:47 AM Daniel Vetter 
> wrote:
> >> >> >
> >> >> > > On Wed, Jun 02, 2021 at 11:16:39PM -0400, Marek Olšák wrote:
> >> >> > > > On Wed, Jun 2, 2021 at 2:48 PM Daniel Vetter 
> wrote:
> >> >> > > >
> >> >> > > > > On Wed, Jun 02, 2021 at 05:38:51AM -0400, Marek Olšák wrote:
> >> >> > > > > > On Wed, Jun 2, 2021 at 5:34 AM Marek Olšák <
> mar...@gmail.com> wrote:
> >> >> > > > > >
> >> >> > > > > > > Yes, we can't break anything because we don't want to
> complicate
> >> >> > > things
> >> >> > > > > > > for us. It's pretty much all NAK'd already. We are
> trying to gather
> >> >> > > > > more
> >> >> > > > > > > knowledge and then make better decisions.
> >> >> > > > > > >
> >> >> > > > > > > The idea we are considering is that we'll expose
> memory-based sync
> >> >> > > > > objects
> >> >> > > > > > > to userspace for read only, and the kernel or hw will
> strictly
> >> >> > > control
> >> >> > > > > the
> >> >> > > > > > > memory writes to those sync objects. The hole in that
> idea is that
> >> >> > > > > > > userspace can decide not to signal a job, so even if
> userspace
> >> >> > > can't
> >> >> > > > > > > overwrite memory-based sync object states arbitrarily,
> it can still
> >> >> > > > > decide
> >> >> > > > > > > not to signal them, and then a future fence is born.
> >> >> > > > > > >
> >> >> > > > > >
> >> >> > > > > > This would actually be treated as a GPU hang caused by
> that context,
> >> >> > > so
> >> >> > > > > it
> >> >> > > > > > should be fine.
> >> >> > > > >
> >> > > >

Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-03 Thread Marek Olšák
Daniel, I think what you are suggesting is that we need to enable user
queues with the drm scheduler and dma_fence first, and once that works, we
can investigate how much of that kernel logic can be moved to the hw. Would
that work? In theory it shouldn't matter whether the kernel does it or the
hw does it. It's the same code, just in a different place.

Thanks,
Marek

On Thu, Jun 3, 2021 at 7:22 AM Daniel Vetter  wrote:

> On Thu, Jun 3, 2021 at 12:55 PM Marek Olšák  wrote:
> >
> > On Thu., Jun. 3, 2021, 06:03 Daniel Vetter,  wrote:
> >>
> >> On Thu, Jun 03, 2021 at 04:20:18AM -0400, Marek Olšák wrote:
> >> > On Thu, Jun 3, 2021 at 3:47 AM Daniel Vetter  wrote:
> >> >
> >> > > On Wed, Jun 02, 2021 at 11:16:39PM -0400, Marek Olšák wrote:
> >> > > > On Wed, Jun 2, 2021 at 2:48 PM Daniel Vetter 
> wrote:
> >> > > >
> >> > > > > On Wed, Jun 02, 2021 at 05:38:51AM -0400, Marek Olšák wrote:
> >> > > > > > On Wed, Jun 2, 2021 at 5:34 AM Marek Olšák 
> wrote:
> >> > > > > >
> >> > > > > > > Yes, we can't break anything because we don't want to
> complicate
> >> > > things
> >> > > > > > > for us. It's pretty much all NAK'd already. We are trying
> to gather
> >> > > > > more
> >> > > > > > > knowledge and then make better decisions.
> >> > > > > > >
> >> > > > > > > The idea we are considering is that we'll expose
> memory-based sync
> >> > > > > objects
> >> > > > > > > to userspace for read only, and the kernel or hw will
> strictly
> >> > > control
> >> > > > > the
> >> > > > > > > memory writes to those sync objects. The hole in that idea
> is that
> >> > > > > > > userspace can decide not to signal a job, so even if
> userspace
> >> > > can't
> >> > > > > > > overwrite memory-based sync object states arbitrarily, it
> can still
> >> > > > > decide
> >> > > > > > > not to signal them, and then a future fence is born.
> >> > > > > > >
> >> > > > > >
> >> > > > > > This would actually be treated as a GPU hang caused by that
> context,
> >> > > so
> >> > > > > it
> >> > > > > > should be fine.
> >> > > > >
> >> > > > > This is practically what I proposed already, except you're not
> doing it
> >> > > with
> >> > > > > dma_fence. And on the memory fence side this also doesn't
> actually give
> >> > > > > what you want for that compute model.
> >> > > > >
> >> > > > > This seems like a bit a worst of both worlds approach to me?
> Tons of
> >> > > work
> >> > > > > in the kernel to hide these not-dma_fence-but-almost, and still
> pain to
> >> > > > > actually drive the hardware like it should be for compute or
> direct
> >> > > > > display.
> >> > > > >
> >> > > > > Also maybe I've missed it, but I didn't see any replies to my
> >> > > suggestion
> >> > > > > how to fake the entire dma_fence stuff on top of new hw. Would
> be
> >> > > > > interesting to know what doesn't work there instead of amd
> folks going
> >> > > of
> >> > > > > into internal again and then coming back with another rfc from
> out of
> >> > > > > nowhere :-)
> >> > > > >
> >> > > >
> >> > > > Going internal again is probably a good idea to spare you the long
> >> > > > discussions and not waste your time, but we haven't talked about
> the
> >> > > > dma_fence stuff internally other than acknowledging that it can be
> >> > > solved.
> >> > > >
> >> > > > The compute use case already uses the hw as-is with no
> inter-process
> >> > > > sharing, which mostly keeps the kernel out of the picture. It uses
> >> > > glFinish
> >> > > > to sync with GL.
> >> > > >
> >> > > > The gfx use case needs new hardware logic to support implicit and
> >> > > explicit
> >> > >

Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-03 Thread Marek Olšák
On Thu., Jun. 3, 2021, 06:03 Daniel Vetter,  wrote:

> On Thu, Jun 03, 2021 at 04:20:18AM -0400, Marek Olšák wrote:
> > On Thu, Jun 3, 2021 at 3:47 AM Daniel Vetter  wrote:
> >
> > > On Wed, Jun 02, 2021 at 11:16:39PM -0400, Marek Olšák wrote:
> > > > On Wed, Jun 2, 2021 at 2:48 PM Daniel Vetter 
> wrote:
> > > >
> > > > > On Wed, Jun 02, 2021 at 05:38:51AM -0400, Marek Olšák wrote:
> > > > > > On Wed, Jun 2, 2021 at 5:34 AM Marek Olšák 
> wrote:
> > > > > >
> > > > > > > Yes, we can't break anything because we don't want to
> complicate
> > > things
> > > > > > > for us. It's pretty much all NAK'd already. We are trying to
> gather
> > > > > more
> > > > > > > knowledge and then make better decisions.
> > > > > > >
> > > > > > > The idea we are considering is that we'll expose memory-based
> sync
> > > > > objects
> > > > > > > to userspace for read only, and the kernel or hw will strictly
> > > control
> > > > > the
> > > > > > > memory writes to those sync objects. The hole in that idea is
> that
> > > > > > > userspace can decide not to signal a job, so even if userspace
> > > can't
> > > > > > > overwrite memory-based sync object states arbitrarily, it can
> still
> > > > > decide
> > > > > > > not to signal them, and then a future fence is born.
> > > > > > >
> > > > > >
> > > > > > This would actually be treated as a GPU hang caused by that
> context,
> > > so
> > > > > it
> > > > > > should be fine.
> > > > >
> > > > > This is practically what I proposed already, except you're not doing
> it
> > > with
> > > > > dma_fence. And on the memory fence side this also doesn't actually
> give
> > > > > what you want for that compute model.
> > > > >
> > > > > This seems like a bit a worst of both worlds approach to me? Tons
> of
> > > work
> > > > > in the kernel to hide these not-dma_fence-but-almost, and still
> pain to
> > > > > actually drive the hardware like it should be for compute or direct
> > > > > display.
> > > > >
> > > > > Also maybe I've missed it, but I didn't see any replies to my
> > > suggestion
> > > > > how to fake the entire dma_fence stuff on top of new hw. Would be
> > > > > interesting to know what doesn't work there instead of amd folks
> going
> > > of
> > > > > into internal again and then coming back with another rfc from out
> of
> > > > > nowhere :-)
> > > > >
> > > >
> > > > Going internal again is probably a good idea to spare you the long
> > > > discussions and not waste your time, but we haven't talked about the
> > > > dma_fence stuff internally other than acknowledging that it can be
> > > solved.
> > > >
> > > > The compute use case already uses the hw as-is with no inter-process
> > > > sharing, which mostly keeps the kernel out of the picture. It uses
> > > glFinish
> > > > to sync with GL.
> > > >
> > > > The gfx use case needs new hardware logic to support implicit and
> > > explicit
> > > > sync. When we propose a solution, it's usually torn apart the next
> day by
> > > > ourselves.
> > > >
> > > > Since we are talking about next hw or next next hw, preemption
> should be
> > > > better.
> > > >
> > > > user queue = user-mapped ring buffer
> > > >
> > > > For implicit sync, we will only let userspace lock access to a buffer
> > > via a
> > > > user queue, which waits for the per-buffer sequence counter in
> memory to
> > > be
> > > > >= the number assigned by the kernel, and later unlock the access
> with
> > > > another command, which increments the per-buffer sequence counter in
> > > memory
> > > > with atomic_inc regardless of the number assigned by the kernel. The
> > > kernel
> > > > counter and the counter in memory can be out-of-sync, and I'll
> explain
> > > why
> > > > it's OK. If a process increments the kernel counter but not the
> memory
> > > > c

Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-03 Thread Marek Olšák
On Thu, Jun 3, 2021 at 3:47 AM Daniel Vetter  wrote:

> On Wed, Jun 02, 2021 at 11:16:39PM -0400, Marek Olšák wrote:
> > On Wed, Jun 2, 2021 at 2:48 PM Daniel Vetter  wrote:
> >
> > > On Wed, Jun 02, 2021 at 05:38:51AM -0400, Marek Olšák wrote:
> > > > On Wed, Jun 2, 2021 at 5:34 AM Marek Olšák  wrote:
> > > >
> > > > > Yes, we can't break anything because we don't want to complicate
> things
> > > > > for us. It's pretty much all NAK'd already. We are trying to gather
> > > more
> > > > > knowledge and then make better decisions.
> > > > >
> > > > > The idea we are considering is that we'll expose memory-based sync
> > > objects
> > > > > to userspace for read only, and the kernel or hw will strictly
> control
> > > the
> > > > > memory writes to those sync objects. The hole in that idea is that
> > > > > userspace can decide not to signal a job, so even if userspace
> can't
> > > > > overwrite memory-based sync object states arbitrarily, it can still
> > > decide
> > > > > not to signal them, and then a future fence is born.
> > > > >
> > > >
> > > > This would actually be treated as a GPU hang caused by that context,
> so
> > > it
> > > > should be fine.
> > >
> > > This is practically what I proposed already, except you're not doing it
> with
> > > dma_fence. And on the memory fence side this also doesn't actually give
> > > what you want for that compute model.
> > >
> > > This seems like a bit a worst of both worlds approach to me? Tons of
> work
> > > in the kernel to hide these not-dma_fence-but-almost, and still pain to
> > > actually drive the hardware like it should be for compute or direct
> > > display.
> > >
> > > Also maybe I've missed it, but I didn't see any replies to my
> suggestion
> > > how to fake the entire dma_fence stuff on top of new hw. Would be
> > > interesting to know what doesn't work there instead of amd folks going
> of
> > > into internal again and then coming back with another rfc from out of
> > > nowhere :-)
> > >
> >
> > Going internal again is probably a good idea to spare you the long
> > discussions and not waste your time, but we haven't talked about the
> > dma_fence stuff internally other than acknowledging that it can be
> solved.
> >
> > The compute use case already uses the hw as-is with no inter-process
> > sharing, which mostly keeps the kernel out of the picture. It uses
> glFinish
> > to sync with GL.
> >
> > The gfx use case needs new hardware logic to support implicit and
> explicit
> > sync. When we propose a solution, it's usually torn apart the next day by
> > ourselves.
> >
> > Since we are talking about next hw or next next hw, preemption should be
> > better.
> >
> > user queue = user-mapped ring buffer
> >
> > For implicit sync, we will only let userspace lock access to a buffer
> via a
> > user queue, which waits for the per-buffer sequence counter in memory to
> be
> > >= the number assigned by the kernel, and later unlock the access with
> > another command, which increments the per-buffer sequence counter in
> memory
> > with atomic_inc regardless of the number assigned by the kernel. The
> kernel
> > counter and the counter in memory can be out-of-sync, and I'll explain
> why
> > it's OK. If a process increments the kernel counter but not the memory
> > counter, that's its problem and it's the same as a GPU hang caused by
> that
> > process. If a process increments the memory counter but not the kernel
> > counter, the ">=" condition alongside atomic_inc guarantee that
> signaling n
> > will signal n+1, so it will never deadlock but also it will effectively
> > disable synchronization. This method of disabling synchronization is
> > similar to a process corrupting the buffer, which should be fine. Can you
> > find any flaw in it? I can't find any.
>
> Hm maybe I misunderstood what exactly you wanted to do earlier. That kind
> of "we let userspace free-wheel whatever it wants, kernel ensures
> correctness of the resulting chain of dma_fence with reset the entire
> context" is what I proposed too.
>
> Like you say, userspace is allowed to render garbage already.
>
> > The explicit submit can be done by userspace (if there is no
> > synchronization), but we plan to use the kernel to do it for implicit
> sync.

Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-02 Thread Marek Olšák
On Wed, Jun 2, 2021 at 2:48 PM Daniel Vetter  wrote:

> On Wed, Jun 02, 2021 at 05:38:51AM -0400, Marek Olšák wrote:
> > On Wed, Jun 2, 2021 at 5:34 AM Marek Olšák  wrote:
> >
> > > Yes, we can't break anything because we don't want to complicate things
> > > for us. It's pretty much all NAK'd already. We are trying to gather
> more
> > > knowledge and then make better decisions.
> > >
> > > The idea we are considering is that we'll expose memory-based sync
> objects
> > > to userspace for read only, and the kernel or hw will strictly control
> the
> > > memory writes to those sync objects. The hole in that idea is that
> > > userspace can decide not to signal a job, so even if userspace can't
> > > overwrite memory-based sync object states arbitrarily, it can still
> decide
> > > not to signal them, and then a future fence is born.
> > >
> >
> > This would actually be treated as a GPU hang caused by that context, so
> it
> > should be fine.
>
> This is practically what I proposed already, except you're not doing it with
> dma_fence. And on the memory fence side this also doesn't actually give
> what you want for that compute model.
>
> This seems like a bit a worst of both worlds approach to me? Tons of work
> in the kernel to hide these not-dma_fence-but-almost, and still pain to
> actually drive the hardware like it should be for compute or direct
> display.
>
> Also maybe I've missed it, but I didn't see any replies to my suggestion
> how to fake the entire dma_fence stuff on top of new hw. Would be
> interesting to know what doesn't work there instead of amd folks going of
> into internal again and then coming back with another rfc from out of
> nowhere :-)
>

Going internal again is probably a good idea to spare you the long
discussions and not waste your time, but we haven't talked about the
dma_fence stuff internally other than acknowledging that it can be solved.

The compute use case already uses the hw as-is with no inter-process
sharing, which mostly keeps the kernel out of the picture. It uses glFinish
to sync with GL.

The gfx use case needs new hardware logic to support implicit and explicit
sync. When we propose a solution, it's usually torn apart the next day by
ourselves.

Since we are talking about next hw or next next hw, preemption should be
better.

user queue = user-mapped ring buffer

For implicit sync, we will only let userspace lock access to a buffer via a
user queue, which waits for the per-buffer sequence counter in memory to be
>= the number assigned by the kernel, and later unlock the access with
another command, which increments the per-buffer sequence counter in memory
with atomic_inc regardless of the number assigned by the kernel. The kernel
counter and the counter in memory can be out-of-sync, and I'll explain why
it's OK. If a process increments the kernel counter but not the memory
counter, that's its problem and it's the same as a GPU hang caused by that
process. If a process increments the memory counter but not the kernel
counter, the ">=" condition alongside atomic_inc guarantee that signaling n
will signal n+1, so it will never deadlock but also it will effectively
disable synchronization. This method of disabling synchronization is
similar to a process corrupting the buffer, which should be fine. Can you
find any flaw in it? I can't find any.

The explicit submit can be done by userspace (if there is no
synchronization), but we plan to use the kernel to do it for implicit sync.
Essentially, the kernel will receive a buffer list and addresses of wait
commands in the user queue. It will assign new sequence numbers to all
buffers and write those numbers into the wait commands, and ring the hw
doorbell to start execution of that queue.
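
As a rough pseudo-code sketch of that kernel-side submit (the helper names and structures are illustrative, not a real driver API):

/* hypothetical implicit-sync submit of a user queue */
submit_user_queue(dev, queue, buffers, wait_cmd_addrs, count)
{
    lock(dev->submit_lock);
    for (i = 0; i < count; i++) {
        n = ++buffers[i]->kernel_seq;              /* assign a new sequence number     */
        write_wait_cmd(wait_cmd_addrs[i], n - 1);  /* queue waits until seq_mem >= n-1 */
        /* the queue's unlock command later does atomic_inc(buffers[i]->seq_mem), so
         * ">=" plus atomic_inc keeps making progress even if the memory counter and
         * the kernel counter drift apart */
    }
    ring_doorbell(queue);                          /* start executing the user queue   */
    unlock(dev->submit_lock);
}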

Marek


Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-02 Thread Marek Olšák
On Wed, Jun 2, 2021 at 5:44 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 02.06.21 um 10:57 schrieb Daniel Stone:
> > Hi Christian,
> >
> > On Tue, 1 Jun 2021 at 13:51, Christian König
> >  wrote:
> >> Am 01.06.21 um 14:30 schrieb Daniel Vetter:
> >>> If you want to enable this use-case with driver magic and without the
> >>> compositor being aware of what's going on, the solution is EGLStreams.
> >>> Not sure we want to go there, but it's definitely a lot more feasible
> >>> than trying to stuff eglstreams semantics into dma-buf implicit
> >>> fencing support in a desperate attempt to not change compositors.
> >> Well not changing compositors is certainly not something I would try
> >> with this use case.
> >>
> >> Not changing compositors is more like ok we have Ubuntu 20.04 and need
> >> to support that with the newest hardware generation.
> > Serious question, have you talked to Canonical?
> >
> > I mean there's a hell of a lot of effort being expended here, but it
> > seems to all be predicated on the assumption that Ubuntu's LTS
> > HWE/backport policy is totally immutable, and that we might need to
> > make the kernel do backflips to work around that. But ... is it? Has
> > anyone actually asked them how they feel about this?
>
> This was merely an example. What I wanted to say is that we need to
> support systems already deployed.
>
> In other words our customers won't accept that they need to replace the
> compositor just because they switch to a new hardware generation.
>
> > I mean, my answer to the first email is 'no, absolutely not' from the
> > technical perspective (the initial proposal totally breaks current and
> > future userspace), from a design perspective (it breaks a lot of
> > usecases which aren't single-vendor GPU+display+codec, or aren't just
> > a simple desktop), and from a sustainability perspective (cutting
> > Android adrift again isn't acceptable collateral damage to make it
> > easier to backport things to last year's Ubuntu release).
> >
> > But then again, I don't even know what I'm NAKing here ... ? The
> > original email just lists a proposal to break a ton of things, with
> > proposed replacements which aren't technically viable, and it's not
> > clear why? Can we please have some more details and some reasoning
> > behind them?
> >
> > I don't mind that userspace (compositor, protocols, clients like Mesa
> > as well as codec APIs) need to do a lot of work to support the new
> > model. I do really care though that the hard-binary-switch model works
> > fine enough for AMD but totally breaks heterogeneous systems and makes
> > it impossible for userspace to do the right thing.
>
> Well how the handling for new Android, distributions etc... is going to
> look like is a completely different story.
>
> And I completely agree with both Daniel Vetter and you that we need to
> keep this in mind when designing the compatibility with older software.
>
> For Android I'm really not sure what to do. In general Android is
> already trying to do the right thing by using explicit sync, the problem
> is that this is build around the idea that this explicit sync is
> syncfile kernel based.
>
> Either we need to change Android and come up with something that works
> with user fences as well or we somehow invent a compatibility layer for
> syncfile as well.
>

What's the issue with syncfiles that syncobjs don't suffer from?

Marek


Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-02 Thread Marek Olšák
On Wed, Jun 2, 2021 at 5:34 AM Marek Olšák  wrote:

> Yes, we can't break anything because we don't want to complicate things
> for us. It's pretty much all NAK'd already. We are trying to gather more
> knowledge and then make better decisions.
>
> The idea we are considering is that we'll expose memory-based sync objects
> to userspace for read only, and the kernel or hw will strictly control the
> memory writes to those sync objects. The hole in that idea is that
> userspace can decide not to signal a job, so even if userspace can't
> overwrite memory-based sync object states arbitrarily, it can still decide
> not to signal them, and then a future fence is born.
>

This would actually be treated as a GPU hang caused by that context, so it
should be fine.

Marek


Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-02 Thread Marek Olšák
Yes, we can't break anything because we don't want to complicate things for
us. It's pretty much all NAK'd already. We are trying to gather more
knowledge and then make better decisions.

The idea we are considering is that we'll expose memory-based sync objects
to userspace for read only, and the kernel or hw will strictly control the
memory writes to those sync objects. The hole in that idea is that
userspace can decide not to signal a job, so even if userspace can't
overwrite memory-based sync object states arbitrarily, it can still decide
not to signal them, and then a future fence is born.

Marek

On Wed, Jun 2, 2021 at 4:57 AM Daniel Stone  wrote:

> Hi Christian,
>
> On Tue, 1 Jun 2021 at 13:51, Christian König
>  wrote:
> > Am 01.06.21 um 14:30 schrieb Daniel Vetter:
> > > If you want to enable this use-case with driver magic and without the
> > > compositor being aware of what's going on, the solution is EGLStreams.
> > > Not sure we want to go there, but it's definitely a lot more feasible
> > > than trying to stuff eglstreams semantics into dma-buf implicit
> > > fencing support in a desperate attempt to not change compositors.
> >
> > Well not changing compositors is certainly not something I would try
> > with this use case.
> >
> > Not changing compositors is more like ok we have Ubuntu 20.04 and need
> > to support that with the newest hardware generation.
>
> Serious question, have you talked to Canonical?
>
> I mean there's a hell of a lot of effort being expended here, but it
> seems to all be predicated on the assumption that Ubuntu's LTS
> HWE/backport policy is totally immutable, and that we might need to
> make the kernel do backflips to work around that. But ... is it? Has
> anyone actually asked them how they feel about this?
>
> I mean, my answer to the first email is 'no, absolutely not' from the
> technical perspective (the initial proposal totally breaks current and
> future userspace), from a design perspective (it breaks a lot of
> usecases which aren't single-vendor GPU+display+codec, or aren't just
> a simple desktop), and from a sustainability perspective (cutting
> Android adrift again isn't acceptable collateral damage to make it
> easier to backport things to last year's Ubuntu release).
>
> But then again, I don't even know what I'm NAKing here ... ? The
> original email just lists a proposal to break a ton of things, with
> proposed replacements which aren't technically viable, and it's not
> clear why? Can we please have some more details and some reasoning
> behind them?
>
> I don't mind that userspace (compositor, protocols, clients like Mesa
> as well as codec APIs) need to do a lot of work to support the new
> model. I do really care though that the hard-binary-switch model works
> fine enough for AMD but totally breaks heterogeneous systems and makes
> it impossible for userspace to do the right thing.
>
> Cheers,
> Daniel
>


Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-06-01 Thread Marek Olšák
On Tue., Jun. 1, 2021, 08:51 Christian König, <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 01.06.21 um 14:30 schrieb Daniel Vetter:
> > On Tue, Jun 1, 2021 at 2:10 PM Christian König
> >  wrote:
> >> Am 01.06.21 um 12:49 schrieb Michel Dänzer:
> >>> On 2021-06-01 12:21 p.m., Christian König wrote:
> >>>> Am 01.06.21 um 11:02 schrieb Michel Dänzer:
> >>>>> On 2021-05-27 11:51 p.m., Marek Olšák wrote:
> >>>>>> 3) Compositors (and other privileged processes, and display
> flipping) can't trust imported/exported fences. They need a timeout
> recovery mechanism from the beginning, and the following are some possible
> solutions to timeouts:
> >>>>>>
> >>>>>> a) use a CPU wait with a small absolute timeout, and display the
> previous content on timeout
> >>>>>> b) use a GPU wait with a small absolute timeout, and conditional
> rendering will choose between the latest content (if signalled) and
> previous content (if timed out)
> >>>>>>
> >>>>>> The result would be that the desktop can run close to 60 fps even
> if an app runs at 1 fps.
> >>>>> FWIW, this is working with
> >>>>> https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 , even
> with implicit sync (on current Intel GPUs; amdgpu/radeonsi would need to
> provide the same dma-buf poll semantics as other drivers and high priority
> GFX contexts via EGL_IMG_context_priority which can preempt lower priority
> ones).
> >>>> Yeah, that is really nice to have.
> >>>>
> >>>> One question is if you wait on the CPU or the GPU for the new surface
> to become available?
> >>> It's based on polling dma-buf fds, i.e. CPU.
> >>>
> >>>> The former is a bit bad for latency and power management.
> >>> There isn't a choice for Wayland compositors in general, since there
> can be arbitrary other state which needs to be applied atomically together
> with the new buffer. (Though in theory, a compositor might get fancy and
> special-case surface commits which can be handled by waiting on the GPU)
> >>>
> >>> Latency is largely a matter of scheduling in the compositor. The
> latency incurred by the compositor shouldn't have to be more than
> single-digit milliseconds. (I've seen total latency from when the client
> starts processing a (static) frame to when it starts being scanned out as
> low as ~6 ms with
> https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1620, lower than
> typical with Xorg)
> >> Well let me describe it like this:
> >>
> >> We have use cases for 144 Hz guaranteed refresh rate. That
> >> essentially means that the client application needs to be able to spit
> >> out one frame/window content every ~6.9ms. That's tough, but doable.
> >>
> >> When you now add 6ms latency in the compositor that means the client
> >> application has only .9ms left for it's frame which is basically
> >> impossible to do.
> >>
> >> See for the user fences handling the display engine will learn to read
> >> sequence numbers from memory and decide on its own if the old frame or
> >> the new one is scanned out. To get the latency there as low as possible.
> > This won't work with implicit sync at all.
> >
> > If you want to enable this use-case with driver magic and without the
> > compositor being aware of what's going on, the solution is EGLStreams.
> > Not sure we want to go there, but it's definitely a lot more feasible
> > than trying to stuff eglstreams semantics into dma-buf implicit
> > fencing support in a desperate attempt to not change compositors.
>
> Well not changing compositors is certainly not something I would try
> with this use case.
>
> Not changing compositors is more like ok we have Ubuntu 20.04 and need
> to support that with the newest hardware generation.
>
> > I still think the most reasonable approach here is that we wrap a
> > dma_fence compat layer/mode over new hw for existing
> > userspace/compositors. And then enable userspace memory fences and the
> > fancy new features those allow with a new model that's built for them.
>
> Yeah, that's basically the same direction I'm heading. Question is how
> to fix all those details.
>
> > Also even with dma_fence we could implement your model of staying with
> > the previous buffer (or an older buffer at that's already rendered),
> > but it needs explicit involvement of the compositor. At least without
> > adding eglstreams fd to the kernel a

Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-05-28 Thread Marek Olšák
My first email can be ignored except for the sync files. Oh well.

I think I see what you mean, Christian. If we assume that an imported fence
is always read only (the buffer with the sequence number is read only),
only the process that created and exported the fence can signal it. If the
fence is not signaled, the exporting process is guilty. The only thing the
importing process must do when it's about to use the fence as a dependency
is to notify the kernel about it. Thus, the kernel will always know the
dependency graph. Then if the importing process times out, the kernel will
blame any of the processes that passed it a fence that is still unsignaled.
The kernel will blame the process that timed out only if all imported
fences have been signaled. It seems pretty robust.

It's the same with implicit sync except that the buffer with the sequence
number is writable. Any process that has an implicitly-sync'd buffer can
set the sequence number to 0 or UINT64_MAX. 0 will cause a timeout for the
next job, while UINT64_MAX might cause a timeout a little later. The
timeout can be mitigated by the kernel because the kernel knows the
greatest number that should be there, but it's not possible to know which
process is guilty (all processes holding the buffer handle would be
suspects).
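
The blame rule above, written as hypothetical kernel pseudo code (all names are illustrative):

on_job_timeout(job)
{
    for_each_dependency(job, fence) {
        if (fence->imported && !fence_is_signaled(fence))
            return blame(fence->exporting_process);  /* the exporter never signaled */
    }
    return blame(job->process);  /* all imported fences signaled: the job itself hung */
}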

Marek

On Fri, May 28, 2021 at 6:25 PM Marek Olšák  wrote:

> If both implicit and explicit synchronization are handled the same, then
> the kernel won't be able to identify the process that caused an implicit
> sync deadlock. The process that is stuck waiting for a fence can be
> innocent, and the kernel can't punish it. Likewise, the GPU reset query
> that reports which process is guilty and innocent will only be able to
> report unknown. Is that OK?
>
> Marek
>
> On Fri, May 28, 2021 at 10:41 AM Christian König <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> Hi Marek,
>>
>> well I don't think that implicit and explicit synchronization needs to be
>> mutual exclusive.
>>
>> What we should do is to have the ability to transport a synchronization
>> object with each BO.
>>
>> Implicit and explicit synchronization then basically become the same,
>> they just transport the synchronization object differently.
>>
>> The biggest problem are the sync_files for Android, since they are really
>> not easy to support at all. If Android wants to support user queues we
>> would probably have to do some changes there.
>>
>> Regards,
>> Christian.
>>
>> Am 27.05.21 um 23:51 schrieb Marek Olšák:
>>
>> Hi,
>>
>> Since Christian believes that we can't deadlock the kernel with some
>> changes there, we just need to make everything nice for userspace too.
>> Instead of explaining how it will work, I will explain the cases where
>> future hardware (and its kernel driver) will break existing userspace in
>> order to protect everybody from deadlocks. Anything that uses implicit sync
>> will be spared, so X and Wayland will be fine, assuming they don't
>> import/export fences. Those use cases that do import/export fences might or
>> might not work, depending on how the fences are used.
>>
>> One of the necessities is that all fences will become future fences. The
>> semantics of imported/exported fences will change completely and will have
>> new restrictions on the usage. The restrictions are:
>>
>>
>> 1) Android sync files will be impossible to support, so won't be
>> supported. (they don't allow future fences)
>>
>>
>> 2) Implicit sync and explicit sync will be mutually exclusive between
>> processes. A process can either use one or the other, but not both. This is
>> meant to prevent a deadlock condition with future fences where any process
>> can malevolently deadlock execution of any other process, even execution of
>> a higher-privileged process. The kernel will impose the following
>> restrictions to protect against the deadlock:
>>
>> a) a process with an implicitly-sync'd imported/exported buffer can't
>> import/export a fence from/to another process
>> b) a process with an imported/exported fence can't import/export an
>> implicitly-sync'd buffer from/to another process
>>
>> Alternative: A higher-privileged process could enforce both restrictions
>> instead of the kernel to protect itself from the deadlock, but this would
>> be a can of worms for existing userspace. It would be better if the kernel
>> just broke unsafe userspace on future hw, just like sync files.
>>
>> If both implicit and explicit sync are allowed to occur simultaneously,
>> sending a future fence that will never signal to any process will deadlock
>> that process 

Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-05-28 Thread Marek Olšák
If both implicit and explicit synchronization are handled the same, then
the kernel won't be able to identify the process that caused an implicit
sync deadlock. The process that is stuck waiting for a fence can be
innocent, and the kernel can't punish it. Likewise, the GPU reset query
that reports which process is guilty and innocent will only be able to
report unknown. Is that OK?

Marek

On Fri, May 28, 2021 at 10:41 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Hi Marek,
>
> well I don't think that implicit and explicit synchronization needs to be
> mutual exclusive.
>
> What we should do is to have the ability to transport a synchronization
> object with each BO.
>
> Implicit and explicit synchronization then basically become the same, they
> just transport the synchronization object differently.
>
> The biggest problem are the sync_files for Android, since they are really
> not easy to support at all. If Android wants to support user queues we
> would probably have to do some changes there.
>
> Regards,
> Christian.
>
> Am 27.05.21 um 23:51 schrieb Marek Olšák:
>
> Hi,
>
> Since Christian believes that we can't deadlock the kernel with some
> changes there, we just need to make everything nice for userspace too.
> Instead of explaining how it will work, I will explain the cases where
> future hardware (and its kernel driver) will break existing userspace in
> order to protect everybody from deadlocks. Anything that uses implicit sync
> will be spared, so X and Wayland will be fine, assuming they don't
> import/export fences. Those use cases that do import/export fences might or
> might not work, depending on how the fences are used.
>
> One of the necessities is that all fences will become future fences. The
> semantics of imported/exported fences will change completely and will have
> new restrictions on the usage. The restrictions are:
>
>
> 1) Android sync files will be impossible to support, so won't be
> supported. (they don't allow future fences)
>
>
> 2) Implicit sync and explicit sync will be mutually exclusive between
> processes. A process can either use one or the other, but not both. This is
> meant to prevent a deadlock condition with future fences where any process
> can malevolently deadlock execution of any other process, even execution of
> a higher-privileged process. The kernel will impose the following
> restrictions to protect against the deadlock:
>
> a) a process with an implicitly-sync'd imported/exported buffer can't
> import/export a fence from/to another process
> b) a process with an imported/exported fence can't import/export an
> implicitly-sync'd buffer from/to another process
>
> Alternative: A higher-privileged process could enforce both restrictions
> instead of the kernel to protect itself from the deadlock, but this would
> be a can of worms for existing userspace. It would be better if the kernel
> just broke unsafe userspace on future hw, just like sync files.
>
> If both implicit and explicit sync are allowed to occur simultaneously,
> sending a future fence that will never signal to any process will deadlock
> that process after it acquires the implicit sync lock, which is a sequence
> number that the process is required to write to memory and send an
> interrupt from the GPU in a finite time. This is how the deadlock can
> happen:
>
> * The process gets sequence number N from the kernel for an
> implicitly-sync'd buffer.
> * The process inserts (into the GPU user-mapped queue) a wait for sequence
> number N-1.
> * The process inserts a wait for a fence, but it doesn't know that it will
> never signal ==> deadlock.
> ...
> * The process inserts a command to write sequence number N to a
> predetermined memory location. (which will make the buffer idle and send an
> interrupt to the kernel)
> ...
> * The kernel will terminate the process because it has never received the
> interrupt. (i.e. a less-privileged process just killed a more-privileged
> process)
>
> It's the interrupt for implicit sync that never arrived that caused the
> termination, and the only way another process can cause it is by sending a
> fence that will never signal. Thus, importing/exporting fences from/to
> other processes can't be allowed simultaneously with implicit sync.
>
>
> 3) Compositors (and other privileged processes, and display flipping)
> can't trust imported/exported fences. They need a timeout recovery
> mechanism from the beginning, and the following are some possible solutions
> to timeouts:
>
> a) use a CPU wait with a small absolute timeout, and display the previous
> content on timeout
> b) use a GPU wait with a small absolute timeout, and conditional rendering
> will c

[Mesa-dev] Linux Graphics Next: Userspace submission update

2021-05-27 Thread Marek Olšák
Hi,

Since Christian believes that we can't deadlock the kernel with some
changes there, we just need to make everything nice for userspace too.
Instead of explaining how it will work, I will explain the cases where
future hardware (and its kernel driver) will break existing userspace in
order to protect everybody from deadlocks. Anything that uses implicit sync
will be spared, so X and Wayland will be fine, assuming they don't
import/export fences. Those use cases that do import/export fences might or
might not work, depending on how the fences are used.

One of the necessities is that all fences will become future fences. The
semantics of imported/exported fences will change completely and will have
new restrictions on the usage. The restrictions are:


1) Android sync files will be impossible to support, so won't be supported.
(they don't allow future fences)


2) Implicit sync and explicit sync will be mutually exclusive between
processes. A process can either use one or the other, but not both. This is
meant to prevent a deadlock condition with future fences where any process
can malevolently deadlock execution of any other process, even execution of
a higher-privileged process. The kernel will impose the following
restrictions to protect against the deadlock:

a) a process with an implicitly-sync'd imported/exported buffer can't
import/export a fence from/to another process
b) a process with an imported/exported fence can't import/export an
implicitly-sync'd buffer from/to another process

Alternative: A higher-privileged process could enforce both restrictions
instead of the kernel to protect itself from the deadlock, but this would
be a can of worms for existing userspace. It would be better if the kernel
just broke unsafe userspace on future hw, just like sync files.

If both implicit and explicit sync are allowed to occur simultaneously,
sending a future fence that will never signal to any process will deadlock
that process after it acquires the implicit sync lock, which is a sequence
number that the process is required to write to memory and send an
interrupt from the GPU in a finite time. This is how the deadlock can
happen:

* The process gets sequence number N from the kernel for an
implicitly-sync'd buffer.
* The process inserts (into the GPU user-mapped queue) a wait for sequence
number N-1.
* The process inserts a wait for a fence, but it doesn't know that it will
never signal ==> deadlock.
...
* The process inserts a command to write sequence number N to a
predetermined memory location. (which will make the buffer idle and send an
interrupt to the kernel)
...
* The kernel will terminate the process because it has never received the
interrupt. (i.e. a less-privileged process just killed a more-privileged
process)

It's the interrupt for implicit sync that never arrived that caused the
termination, and the only way another process can cause it is by sending a
fence that will never signal. Thus, importing/exporting fences from/to
other processes can't be allowed simultaneously with implicit sync.
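
Written out as the contents of the user-mapped queue, the deadlock looks roughly like this (pseudo code, names illustrative):

wait_mem(buf->seq >= N - 1);  /* implicit-sync lock, N assigned by the kernel    */
wait_fence(imported_fence);   /* never signals, so the queue stalls here forever */
draw();
write_mem(buf->seq = N);      /* never reached ...                               */
send_irq();                   /* ... so the kernel never gets the completion
                                 interrupt and terminates this process           */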


3) Compositors (and other privileged processes, and display flipping) can't
trust imported/exported fences. They need a timeout recovery mechanism from
the beginning, and the following are some possible solutions to timeouts:

a) use a CPU wait with a small absolute timeout, and display the previous
content on timeout
b) use a GPU wait with a small absolute timeout, and conditional rendering
will choose between the latest content (if signalled) and previous content
(if timed out)

The result would be that the desktop can run close to 60 fps even if an app
runs at 1 fps.
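
As an illustration of option (a), a minimal userspace sketch, assuming the compositor has a sync_file (or dma-buf) fd for the client's latest buffer; the function and buffer type are made up for the example:

#include <poll.h>

struct buffer;  /* the compositor's buffer type; details don't matter here */

static struct buffer *pick_buffer_for_flip(int fence_fd, struct buffer *latest,
                                           struct buffer *previous)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };

    /* small absolute timeout, a few milliseconds */
    if (poll(&pfd, 1, 4) == 1 && (pfd.revents & POLLIN))
        return latest;   /* fence signaled in time: flip to the new content */

    return previous;     /* timed out: keep displaying the previous frame */
}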

*Redefining imported/exported fences and breaking some users/OSs is the
only way to have userspace GPU command submission, and the deadlock example
here is the counterexample proving that there is no other way.*

So, what are the chances this is going to fly with the ecosystem?

Thanks,
Marek


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-04 Thread Marek Olšák
I see some mentions of XNACK and recoverable page faults. Note that all
gaming AMD hw that has userspace queues doesn't have XNACK, so there is no
overhead in compute units. My understanding is that recoverable page faults
are still supported without XNACK, but instead of the compute unit
replaying the faulting instruction, the L1 cache does that. Anyway, the
point is that XNACK is totally irrelevant here.

Marek

On Tue., May 4, 2021, 08:48 Christian König, <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 04.05.21 um 13:13 schrieb Daniel Vetter:
> > On Tue, May 4, 2021 at 12:53 PM Christian König
> >  wrote:
> >> Am 04.05.21 um 11:47 schrieb Daniel Vetter:
> >>> [SNIP]
>  Yeah, it just takes to long for the preemption to complete to be
> really
>  useful for the feature we are discussing here.
> 
>  As I said when the kernel requests to preempt a queue we can easily
> expect a
>  timeout of ~100ms until that comes back. For compute that is even in
> the
>  multiple seconds range.
> >>> 100ms for preempting an idle request sounds like broken hw to me. Of
> >>> course preemting something that actually runs takes a while, that's
> >>> nothing new. But it's also not the thing we're talking about here. Is
> this
> >>> 100ms actual numbers from hw for an actual idle ringbuffer?
> >> Well 100ms is just an example of the scheduler granularity. Let me
> >> explain in a wider context.
> >>
> >> The hardware can have X queues mapped at the same time and every Y time
> >> interval the hardware scheduler checks if those queues have changed and
> >> only if they have changed the necessary steps to reload them are
> started.
> >>
> >> Multiple queues can be rendering at the same time, so you can have X as
> >> a high priority queue active and just waiting for a signal to start and
> >> the client rendering one frame after another and a third background
> >> compute task mining bitcoins for you.
> >>
> >> As long as everything is static this is perfectly performant. Adding a
> >> queue to the list of active queues is also relatively simple, but taking
> >> one down requires you to wait until we are sure the hardware has seen
> >> the change and reloaded the queues.
> >>
> >> Think of it as an RCU grace period. This is simply not something which
> >> is made to be used constantly, but rather just at process termination.
> > Uh ... that indeed sounds rather broken.
>
> Well I wouldn't call it broken. It's just not made for the use case we
> are trying to abuse it for.
>
> > Otoh it's just a dma_fence that we'd inject as this unload-fence.
>
> Yeah, exactly that's why it isn't much of a problem for process
> termination or freeing memory.
>
> > So by and large everyone should already be able to cope with it taking a
> > bit longer. So from a design pov I don't see a huge problem, but I
> > guess you guys won't be happy since it means on amd hw there will be
> > random unsightly stalls in desktop linux usage.
> >
>  The "preemption" feature is really called suspend and made just for
> the case
>  when we want to put a process to sleep or need to forcefully kill it
> for
>  misbehavior or stuff like that. It is not meant to be used in normal
>  operation.
> 
>  If we only attach it on ->move then yeah maybe a last resort
> possibility to
>  do it this way, but I think in that case we could rather stick with
> kernel
>  submissions.
> >>> Well this is a hybrid userspace ring + kernel augmented submit mode, so
> you
> >>> can keep dma-fences working. Because the dma-fence stuff wont work with
> >>> pure userspace submit, I think that conclusion is rather solid. Once
> more
> >>> even after this long thread here.
> >> When assisted with unload fences, then yes. Problem is that I can't see
> >> how we could implement those performant currently.
> > Is there really no way to fix fw here? Like if process start/teardown
> > takes 100ms, that's going to suck no matter what.
>
> As I said adding the queue is unproblematic and teardown just results in
> a bit more waiting to free things up.
>
> Problematic is more overcommit swapping and OOM situations which need to
> wait for the hw scheduler to come back and tell us that the queue is now
> unmapped.
>
> > Also, if userspace lies to us and keeps pushing crap into the ring
> > after it's supposed to be idle: Userspace is already allowed to waste
> > gpu time. If you're too worried about this set a fairly aggressive
> > preempt timeout on the unload fence, and kill the context if it takes
> > longer than what preempting an idle ring should take (because that
> > would indicate broken/evil userspace).
>  I think you have the wrong expectation here. It is perfectly valid and
>  expected for userspace to keep writing commands into the ring buffer.
> 
>  After all when one frame is completed they want to immediately start
>  rendering the next one.
> >>> Sure, for the true userspace direct 

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-03 Thread Marek Olšák
Proposal for a new CS ioctl, kernel pseudo code:

lock(_lock);                          /* one global lock serializes all submissions */
serial = get_next_serial(dev);        /* kernel-assigned, monotonically increasing  */
add_wait_command(ring, serial - 1);   /* wait until the previous submission signals */
add_exec_cmdbuf(ring, user_cmdbuf);   /* execute the userspace command buffer       */
add_signal_command(ring, serial);     /* then signal this submission's serial       */
*ring->doorbell = FIRE;               /* kick the hardware                          */
unlock(_lock);

See? Just like userspace submit, but in the kernel without
concurrency/preemption. Is this now safe enough for dma_fence?

Marek

On Mon, May 3, 2021 at 4:36 PM Marek Olšák  wrote:

> What about direct submit from the kernel where the process still has write
> access to the GPU ring buffer but doesn't use it? I think that solves your
> preemption example, but leaves a potential backdoor for a process to
> overwrite the signal commands, which shouldn't be a problem since we are OK
> with timeouts.
>
> Marek
>
> On Mon, May 3, 2021 at 11:23 AM Jason Ekstrand 
> wrote:
>
>> On Mon, May 3, 2021 at 10:16 AM Bas Nieuwenhuizen
>>  wrote:
>> >
>> > On Mon, May 3, 2021 at 5:00 PM Jason Ekstrand 
>> wrote:
>> > >
>> > > Sorry for the top-post but there's no good thing to reply to here...
>> > >
>> > > One of the things pointed out to me recently by Daniel Vetter that I
>> > > didn't fully understand before is that dma_buf has a very subtle
>> > > second requirement beyond finite time completion:  Nothing required
>> > > for signaling a dma-fence can allocate memory.  Why?  Because the act
>> > > of allocating memory may wait on your dma-fence.  This, as it turns
>> > > out, is a massively more strict requirement than finite time
>> > > completion and, I think, throws out all of the proposals we have so
>> > > far.
>> > >
>> > > Take, for instance, Marek's proposal for userspace involvement with
>> > > dma-fence by asking the kernel for a next serial and the kernel
>> > > trusting userspace to signal it.  That doesn't work at all if
>> > > allocating memory to trigger a dma-fence can blow up.  There's simply
>> > > no way for the kernel to trust userspace to not do ANYTHING which
>> > > might allocate memory.  I don't even think there's a way userspace can
>> > > trust itself there.  It also blows up my plan of moving the fences to
>> > > transition boundaries.
>> > >
>> > > Not sure where that leaves us.
>> >
>> > Honestly the more I look at things I think userspace-signalable fences
>> > with a timeout sound like they are a valid solution for these issues.
>> > Especially since (as has been mentioned countless times in this email
>> > thread) userspace already has a lot of ways to cause timeouts and or
>> > GPU hangs through GPU work already.
>> >
>> > Adding a timeout on the signaling side of a dma_fence would ensure:
>> >
>> > - The dma_fence signals in finite time
>> > -  If the timeout case does not allocate memory then memory allocation
>> > is not a blocker for signaling.
>> >
>> > Of course you lose the full dependency graph and we need to make sure
>> > garbage collection of fences works correctly when we have cycles.
>> > However, the latter sounds very doable and the first sounds like it is
>> > to some extent inevitable.
>> >
>> > I feel like I'm missing some requirement here given that we
>> > immediately went to much more complicated things but can't find it.
>> > Thoughts?
>>
>> Timeouts are sufficient to protect the kernel but they make the fences
>> unpredictable and unreliable from a userspace PoV.  One of the big
>> problems we face is that, once we expose a dma_fence to userspace,
>> we've allowed for some pretty crazy potential dependencies that
>> neither userspace nor the kernel can sort out.  Say you have marek's
>> "next serial, please" proposal and a multi-threaded application.
>> Between the time you ask the kernel for a serial and get a dma_fence
>> and submit the work to signal that serial, your process may get
>> preempted, something else shoved in which allocates memory, and then
>> we end up blocking on that dma_fence.  There's no way userspace can
>> predict and defend itself from that.
>>
>> So I think where that leaves us is that there is no safe place to
>> create a dma_fence except for inside the ioctl which submits the work
>> and only after any necessary memory has been allocated.  That's a
>> pretty stiff requirement.  We may still be able to interact with
>> userspace a bit more explicitly but I think it throws any notion of
>> userspace direct submit out the window.
>>
>> --Jason
>

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-03 Thread Marek Olšák
What about direct submit from the kernel where the process still has write
access to the GPU ring buffer but doesn't use it? I think that solves your
preemption example, but leaves a potential backdoor for a process to
overwrite the signal commands, which shouldn't be a problem since we are OK
with timeouts.

Marek

On Mon, May 3, 2021 at 11:23 AM Jason Ekstrand  wrote:

> On Mon, May 3, 2021 at 10:16 AM Bas Nieuwenhuizen
>  wrote:
> >
> > On Mon, May 3, 2021 at 5:00 PM Jason Ekstrand 
> wrote:
> > >
> > > Sorry for the top-post but there's no good thing to reply to here...
> > >
> > > One of the things pointed out to me recently by Daniel Vetter that I
> > > didn't fully understand before is that dma_buf has a very subtle
> > > second requirement beyond finite time completion:  Nothing required
> > > for signaling a dma-fence can allocate memory.  Why?  Because the act
> > > of allocating memory may wait on your dma-fence.  This, as it turns
> > > out, is a massively more strict requirement than finite time
> > > completion and, I think, throws out all of the proposals we have so
> > > far.
> > >
> > > Take, for instance, Marek's proposal for userspace involvement with
> > > dma-fence by asking the kernel for a next serial and the kernel
> > > trusting userspace to signal it.  That doesn't work at all if
> > > allocating memory to trigger a dma-fence can blow up.  There's simply
> > > no way for the kernel to trust userspace to not do ANYTHING which
> > > might allocate memory.  I don't even think there's a way userspace can
> > > trust itself there.  It also blows up my plan of moving the fences to
> > > transition boundaries.
> > >
> > > Not sure where that leaves us.
> >
> > Honestly the more I look at things I think userspace-signalable fences
> > with a timeout sound like they are a valid solution for these issues.
> > Especially since (as has been mentioned countless times in this email
> > thread) userspace already has a lot of ways to cause timeouts and/or
> > GPU hangs through GPU work.
> >
> > Adding a timeout on the signaling side of a dma_fence would ensure:
> >
> > - The dma_fence signals in finite time
> > -  If the timeout case does not allocate memory then memory allocation
> > is not a blocker for signaling.
> >
> > Of course you lose the full dependency graph and we need to make sure
> > garbage collection of fences works correctly when we have cycles.
> > However, the latter sounds very doable and the first sounds like it is
> > to some extent inevitable.
> >
> > I feel like I'm missing some requirement here given that we
> > immediately went to much more complicated things but can't find it.
> > Thoughts?
>
> Timeouts are sufficient to protect the kernel but they make the fences
> unpredictable and unreliable from a userspace PoV.  One of the big
> problems we face is that, once we expose a dma_fence to userspace,
> we've allowed for some pretty crazy potential dependencies that
> neither userspace nor the kernel can sort out.  Say you have marek's
> "next serial, please" proposal and a multi-threaded application.
> Between the time you ask the kernel for a serial and get a dma_fence
> and submit the work to signal that serial, your process may get
> preempted, something else shoved in which allocates memory, and then
> we end up blocking on that dma_fence.  There's no way userspace can
> predict and defend itself from that.
>
> So I think where that leaves us is that there is no safe place to
> create a dma_fence except for inside the ioctl which submits the work
> and only after any necessary memory has been allocated.  That's a
> pretty stiff requirement.  We may still be able to interact with
> userspace a bit more explicitly but I think it throws any notion of
> userspace direct submit out the window.
>
> --Jason
>
>
> > - Bas
> > >
> > > --Jason
> > >
> > > On Mon, May 3, 2021 at 9:42 AM Alex Deucher 
> wrote:
> > > >
> > > > On Sat, May 1, 2021 at 6:27 PM Marek Olšák  wrote:
> > > > >
> > > > > On Wed, Apr 28, 2021 at 5:07 AM Michel Dänzer 
> wrote:
> > > > >>
> > > > >> On 2021-04-28 8:59 a.m., Christian König wrote:
> > > > >> > Hi Dave,
> > > > >> >
> > > > >> > Am 27.04.21 um 21:23 schrieb Marek Olšák:
> > > > >> >> Supporting interop with any device is always possible. It
> depends on which drivers we need to interoperate

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-01 Thread Marek Olšák
On Wed, Apr 28, 2021 at 5:07 AM Michel Dänzer  wrote:

> On 2021-04-28 8:59 a.m., Christian König wrote:
> > Hi Dave,
> >
> > Am 27.04.21 um 21:23 schrieb Marek Olšák:
> >> Supporting interop with any device is always possible. It depends on
> which drivers we need to interoperate with and update them. We've already
> found the path forward for amdgpu. We just need to find out how many other
> drivers need to be updated and evaluate the cost/benefit aspect.
> >>
> >> Marek
> >>
> >> On Tue, Apr 27, 2021 at 2:38 PM Dave Airlie <airl...@gmail.com> wrote:
> >>
> >> On Tue, 27 Apr 2021 at 22:06, Christian König
> >> <ckoenig.leichtzumer...@gmail.com> wrote:
> >> >
> >> > Correct, we wouldn't have synchronization between devices with and
> without user queues any more.
> >> >
> >> > That could only be a problem for A+I Laptops.
> >>
> >> Since I think you mentioned you'd only be enabling this on newer
> >> chipsets, won't it be a problem for A+A where one A is a generation
> >> behind the other?
> >>
> >
> > Crap, that is a good point as well.
> >
> >>
> >> I'm not really liking where this is going btw, seems like an ill
> >> thought out concept, if AMD is really going down the road of
> designing
> >> hw that is currently Linux incompatible, you are going to have to
> >> accept a big part of the burden in bringing this support in to more
> >> than just amd drivers for upcoming generations of gpu.
> >>
> >
> > Well we don't really like that either, but we have no other option as
> far as I can see.
>
> I don't really understand what "future hw may remove support for kernel
> queues" means exactly. While the per-context queues can be mapped to
> userspace directly, they don't *have* to be, do they? I.e. the kernel
> driver should be able to either intercept userspace access to the queues,
> or in the worst case do it all itself, and provide the existing
> synchronization semantics as needed?
>
> Surely there are resource limits for the per-context queues, so the kernel
> driver needs to do some kind of virtualization / multi-plexing anyway, or
> we'll get sad user faces when there's no queue available for <insert game>.
>
> I'm probably missing something though, awaiting enlightenment. :)
>

The hw interface for userspace is that the ring buffer is mapped to the
process address space alongside a doorbell aperture (4K page) that isn't
real memory, but when the CPU writes into it, it tells the hw scheduler
that there are new GPU commands in the ring buffer. Userspace inserts all
the wait, draw, and signal commands into the ring buffer and then "rings"
the doorbell. It's my understanding that the ring buffer and the doorbell
are always mapped in the same GPU address space as the process, which makes
it very difficult to emulate the current protected ring buffers in the
kernel. The VMID of the ring buffer is also not changeable.

The hw scheduler doesn't do any synchronization and it doesn't see any
dependencies. It only chooses which queue to execute, so it's really just a
simple queue manager handling the virtualization aspect and not much else.
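
To make the shape of that concrete, a userspace submission in this model is
nothing more than writing packets into the mapped ring and then writing the
doorbell. All names and packet opcodes below are invented for illustration;
the real PM4/doorbell encoding is different:

    #include <stdint.h>

    enum { PKT_WAIT = 0x1, PKT_SIGNAL = 0x2 };  /* placeholder opcodes */

    struct user_queue {
        uint32_t *ring;               /* ring buffer mapped into the process   */
        volatile uint32_t *doorbell;  /* 4K doorbell aperture, not real memory */
        uint32_t wptr;                /* write pointer, in dwords              */
        uint32_t ring_mask;
    };

    static void emit(struct user_queue *q, uint32_t dw)
    {
        q->ring[q->wptr++ & q->ring_mask] = dw;
    }

    void submit(struct user_queue *q, const uint32_t *cmds, uint32_t count)
    {
        emit(q, PKT_WAIT);            /* wait on whatever fences we depend on */
        for (uint32_t i = 0; i < count; i++)
            emit(q, cmds[i]);         /* the draw/dispatch commands           */
        emit(q, PKT_SIGNAL);          /* signal our fence value in memory     */

        *q->doorbell = q->wptr;       /* "ring" the doorbell                  */
    }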

Marek


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-27 Thread Marek Olšák
On Wed., Apr. 28, 2021, 00:01 Jason Ekstrand,  wrote:

> On Tue, Apr 27, 2021 at 4:59 PM Marek Olšák  wrote:
> >
> > Jason, both memory-based signalling as well as interrupt-based
> signalling to the CPU would be supported by amdgpu. External devices don't
> need to support memory-based sync objects. The only limitation is that they
> can't convert amdgpu sync objects to dma_fence.
>
> Sure.  I'm not worried about the mechanism.  We just need a word that
> means "the new fence thing" and I've been throwing "memory fence"
> around for that.  Other mechanisms may work as well.
>
> > The sad thing is that "external -> amdgpu" dependencies are really
> "external <-> amdgpu" dependencies due to mutually-exclusive access
> required by non-explicitly-sync'd buffers, so amdgpu-amdgpu interop is the
> only interop that would initially work with those buffers. Explicitly
> sync'd buffers also won't work if other drivers convert explicit fences to
> dma_fence. Thus, both implicit sync and explicit sync might not work with
> other drivers at all. The only interop that would initially work is
> explicit fences with memory-based waiting and signalling on the external
> device to keep the kernel out of the picture.
>
> Yup.  This is where things get hard.  That said, I'm not quite ready
> to give up on memory/interrupt fences just yet.
>
> One thought that came to mind which might help would be if we added an
> extremely strict concept of memory ownership.  The idea would be that
> any given BO would be in one of two states at any given time:
>
>  1. legacy: dma_fences and implicit sync works as normal but it cannot
> be resident in any "modern" (direct submission, ULLS, whatever you
> want to call it) context
>
>  2. modern: In this mode they should not be used by any legacy
> context.  We can't strictly prevent this, unfortunately, but maybe we
> can say reading produces garbage and writes may be discarded.  In this
> mode, they can be bound to modern contexts.
>
> In theory, when in "modern" mode, you could bind the same buffer in
> multiple modern contexts at a time.  However, when that's the case, it
> makes ownership really tricky to track.  Therefore, we might want some
> sort of dma-buf create flag for "always modern" vs. "switchable" and
> only allow binding to one modern context at a time when it's
> switchable.
>
> If we did this, we may be able to move any dma_fence shenanigans to
> the ownership transition points.  We'd still need some sort of "wait
> for fence and transition" which has a timeout.  However, then we'd be
> fairly well guaranteed that the application (not just Mesa!) has
> really and truly decided it's done with the buffer and we wouldn't (I
> hope!) end up with the accidental edges in the dependency graph.
>
> Of course, I've not yet proven any of this correct so feel free to
> tell me why it won't work. :-)  It was just one of those "about to go
> to bed and had a thunk" type thoughts.
>

We'd like to keep userspace outside of Mesa drivers intact and working
except for interop where we don't have much choice. At the same time,
future hw may remove support for kernel queues, so we might not have much
choice there either, depending on what the hw interface will look like.

The idea is to have an ioctl for querying a timeline semaphore buffer
associated with a shared BO, and an ioctl for querying the next wait and
signal number (e.g. n and n+1) for that semaphore. Waiting for n would be
like mutex lock and signaling would be like mutex unlock. The next process
would use the same ioctl and get n+1 and n+2, etc. There is a deadlock
condition because one process can do lock A, lock B, while another does
lock B, lock A; this can be prevented by having the ioctl that returns the
numbers return them for multiple buffers at once. This solution needs
no changes to userspace outside of Mesa drivers, and we'll also keep the BO
wait ioctl for GPU-CPU sync.
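
A sketch of what that uapi could look like; the struct layouts and names here
are invented purely for illustration and are not actual amdgpu ioctls:

    /* Query the timeline semaphore buffer associated with a shared BO. */
    struct drm_bo_semaphore_info {
        __u32 bo_handle;
        __u32 pad;
        __u64 sem_gpu_addr;   /* out: address of the semaphore for this BO */
    };

    /* Next wait/signal pair for one BO. */
    struct drm_bo_sync_point {
        __u32 bo_handle;
        __u32 pad;
        __u64 wait_value;     /* out: value to wait for before using the BO */
        __u64 signal_value;   /* out: value to signal when done (wait + 1)  */
    };

    /* Returning the points for several BOs in one call is what avoids the
     * lock A/lock B vs. lock B/lock A deadlock mentioned above. */
    struct drm_bo_acquire_sync_points {
        __u64 points_ptr;     /* user pointer to an array of drm_bo_sync_point */
        __u32 num_points;
        __u32 pad;
    };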

Marek


> --Jason
>
> P.S.  Daniel was 100% right when he said this discussion needs a glossary.
>
>
> > Marek
> >
> >
> > On Tue, Apr 27, 2021 at 3:41 PM Jason Ekstrand 
> wrote:
> >>
> >> Trying to figure out which e-mail in this mess is the right one to
> reply to
> >>
> >> On Tue, Apr 27, 2021 at 12:31 PM Lucas Stach 
> wrote:
> >> >
> >> > Hi,
> >> >
> >> > Am Dienstag, dem 27.04.2021 um 09:26 -0400 schrieb Marek Olšák:
> >> > > Ok. So that would only make the following use cases broken for now:
> >> > > - amd render -> external gpu
> >>
> >> Assuming said external GPU doe

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-27 Thread Marek Olšák
Jason, both memory-based signalling as well as interrupt-based signalling
to the CPU would be supported by amdgpu. External devices don't need to
support memory-based sync objects. The only limitation is that they can't
convert amdgpu sync objects to dma_fence.

The sad thing is that "external -> amdgpu" dependencies are really
"external <-> amdgpu" dependencies due to mutually-exclusive access
required by non-explicitly-sync'd buffers, so amdgpu-amdgpu interop is the
only interop that would initially work with those buffers. Explicitly
sync'd buffers also won't work if other drivers convert explicit fences to
dma_fence. Thus, both implicit sync and explicit sync might not work with
other drivers at all. The only interop that would initially work is
explicit fences with memory-based waiting and signalling on the external
device to keep the kernel out of the picture.

Marek


On Tue, Apr 27, 2021 at 3:41 PM Jason Ekstrand  wrote:

> Trying to figure out which e-mail in this mess is the right one to reply
> to
>
> On Tue, Apr 27, 2021 at 12:31 PM Lucas Stach 
> wrote:
> >
> > Hi,
> >
> > Am Dienstag, dem 27.04.2021 um 09:26 -0400 schrieb Marek Olšák:
> > > Ok. So that would only make the following use cases broken for now:
> > > - amd render -> external gpu
>
> Assuming said external GPU doesn't support memory fences.  If we do
> amdgpu and i915 at the same time, that covers basically most of the
> external GPU use-cases.  Of course, we'd want to convert nouveau as
> well for the rest.
>
> > > - amd video encode -> network device
> >
> > FWIW, "only" breaking amd render -> external gpu will make us pretty
> > unhappy, as we have some cases where we are combining an AMD APU with a
> > FPGA based graphics card. I can't go into the specifics of this use-
> > case too much but basically the AMD graphics is rendering content that
> > gets composited on top of a live video pipeline running through the
> > FPGA.
>
> I think it's worth taking a step back and asking what's being proposed here
> before we freak out too much.  If we do go this route, it doesn't mean
> that your FPGA use-case can't work, it just means it won't work
> out-of-the box anymore.  You'll have to separate execution and memory
> dependencies inside your FPGA driver.  That's still not great but it's
> not as bad as you maybe made it sound.
>
> > > What about the case when we get a buffer from an external device and
> > > we're supposed to make it "busy" when we are using it, and the
> > > external device wants to wait until we stop using it? Is it something
> > > that can happen, thus turning "external -> amd" into "external <->
> > > amd"?
> >
> > Zero-copy texture sampling from a video input certainly appreciates
> > this very much. Trying to pass the render fence through the various
> > layers of userspace to be able to tell when the video input can reuse a
> > buffer is a great experience in yak shaving. Allowing the video input
> > to reuse the buffer as soon as the read dma_fence from the GPU is
> > signaled is much more straight forward.
>
> Oh, it's definitely worse than that.  Every window system interaction
> is bi-directional.  The X server has to wait on the client before
> compositing from it and the client has to wait on X before re-using
> that back-buffer.  Of course, we can break that latter dependency by
> doing a full CPU wait but that's going to mean either more latency or
> reserving more back buffers.  There's no good clean way to claim that
> any of this is one-directional.
>
> --Jason
>


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-27 Thread Marek Olšák
Supporting interop with any device is always possible. It depends on which
drivers we need to interoperate with and update them. We've already found
the path forward for amdgpu. We just need to find out how many other
drivers need to be updated and evaluate the cost/benefit aspect.

Marek

On Tue, Apr 27, 2021 at 2:38 PM Dave Airlie  wrote:

> On Tue, 27 Apr 2021 at 22:06, Christian König
>  wrote:
> >
> > Correct, we wouldn't have synchronization between devices with and
> without user queues any more.
> >
> > That could only be a problem for A+I Laptops.
>
> Since I think you mentioned you'd only be enabling this on newer
> chipsets, won't it be a problem for A+A where one A is a generation
> behind the other?
>
> I'm not really liking where this is going btw, seems like an ill
> thought out concept, if AMD is really going down the road of designing
> hw that is currently Linux incompatible, you are going to have to
> accept a big part of the burden in bringing this support in to more
> than just amd drivers for upcoming generations of gpu.
>
> Dave.
>


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-27 Thread Marek Olšák
Ok. So that would only make the following use cases broken for now:
- amd render -> external gpu
- amd video encode -> network device

What about the case when we get a buffer from an external device and we're
supposed to make it "busy" when we are using it, and the external device
wants to wait until we stop using it? Is it something that can happen, thus
turning "external -> amd" into "external <-> amd"?

Marek

On Tue., Apr. 27, 2021, 08:50 Christian König, <
ckoenig.leichtzumer...@gmail.com> wrote:

> Only amd -> external.
>
> We can easily install something in an user queue which waits for a
> dma_fence in the kernel.
>
> But we can't easily wait for an user queue as dependency of a dma_fence.
>
> The good thing is we have this wait before signal case on Vulkan timeline
> semaphores which have the same problem in the kernel.
>
> The good news is I think we can relatively easily convert i915 and older
> amdgpu device to something which is compatible with user fences.
>
> So yes, getting that fixed case by case should work.
>
> Christian
>
> Am 27.04.21 um 14:46 schrieb Marek Olšák:
>
> I'll defer to Christian and Alex to decide whether dropping sync with
> non-amd devices (GPUs, cameras etc.) is acceptable.
>
> Rewriting those drivers to this new sync model could be done on a case by
> case basis.
>
> For now, would we only lose the "amd -> external" dependency? Or the
> "external -> amd" dependency too?
>
> Marek
>
> On Tue., Apr. 27, 2021, 08:15 Daniel Vetter,  wrote:
>
>> On Tue, Apr 27, 2021 at 2:11 PM Marek Olšák  wrote:
>> > Ok. I'll interpret this as "yes, it will work, let's do it".
>>
>> It works if all you care about is drm/amdgpu. I'm not sure that's a
>> reasonable approach for upstream, but it definitely is an approach :-)
>>
>> We've already gone somewhat through the pain of drm/amdgpu redefining
>> how implicit sync works without sufficiently talking with other
>> people, maybe we should avoid a repeat of this ...
>> -Daniel
>>
>> >
>> > Marek
>> >
>> > On Tue., Apr. 27, 2021, 08:06 Christian König, <
>> ckoenig.leichtzumer...@gmail.com> wrote:
>> >>
> >> Correct, we wouldn't have synchronization between devices with and
>> without user queues any more.
>> >>
>> >> That could only be a problem for A+I Laptops.
>> >>
>> >> Memory management will just work with preemption fences which pause
>> the user queues of a process before evicting something. That will be a
>> dma_fence, but also a well known approach.
>> >>
>> >> Christian.
>> >>
>> >> Am 27.04.21 um 13:49 schrieb Marek Olšák:
>> >>
>> >> If we don't use future fences for DMA fences at all, e.g. we don't use
>> them for memory management, it can work, right? Memory management can
>> suspend user queues anytime. It doesn't need to use DMA fences. There might
>> be something that I'm missing here.
>> >>
>> >> What would we lose without DMA fences? Just inter-device
>> synchronization? I think that might be acceptable.
>> >>
>> >> The only case when the kernel will wait on a future fence is before a
>> page flip. Everything today already depends on userspace not hanging the
>> gpu, which makes everything a future fence.
>> >>
>> >> Marek
>> >>
>> >> On Tue., Apr. 27, 2021, 04:02 Daniel Vetter,  wrote:
>> >>>
>> >>> On Mon, Apr 26, 2021 at 04:59:28PM -0400, Marek Olšák wrote:
>> >>> > Thanks everybody. The initial proposal is dead. Here are some
>> thoughts on
>> >>> > how to do it differently.
>> >>> >
>> >>> > I think we can have direct command submission from userspace via
>> >>> > memory-mapped queues ("user queues") without changing window
>> systems.
>> >>> >
>> >>> > The memory management doesn't have to use GPU page faults like HMM.
>> >>> > Instead, it can wait for user queues of a specific process to go
>> idle and
>> >>> > then unmap the queues, so that userspace can't submit anything.
>> Buffer
>> >>> > evictions, pinning, etc. can be executed when all queues are
>> unmapped
>> >>> > (suspended). Thus, no BO fences and page faults are needed.
>> >>> >
>> >>> > Inter-process synchronization can use timeline semaphores.

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-27 Thread Marek Olšák
I'll defer to Christian and Alex to decide whether dropping sync with
non-amd devices (GPUs, cameras etc.) is acceptable.

Rewriting those drivers to this new sync model could be done on a case by
case basis.

For now, would we only lose the "amd -> external" dependency? Or the
"external -> amd" dependency too?

Marek

On Tue., Apr. 27, 2021, 08:15 Daniel Vetter,  wrote:

> On Tue, Apr 27, 2021 at 2:11 PM Marek Olšák  wrote:
> > Ok. I'll interpret this as "yes, it will work, let's do it".
>
> It works if all you care about is drm/amdgpu. I'm not sure that's a
> reasonable approach for upstream, but it definitely is an approach :-)
>
> We've already gone somewhat through the pain of drm/amdgpu redefining
> how implicit sync works without sufficiently talking with other
> people, maybe we should avoid a repeat of this ...
> -Daniel
>
> >
> > Marek
> >
> > On Tue., Apr. 27, 2021, 08:06 Christian König, <
> ckoenig.leichtzumer...@gmail.com> wrote:
> >>
> >> Correct, we wouldn't have synchronization between devices with and
> without user queues any more.
> >>
> >> That could only be a problem for A+I Laptops.
> >>
> >> Memory management will just work with preemption fences which pause the
> user queues of a process before evicting something. That will be a
> dma_fence, but also a well known approach.
> >>
> >> Christian.
> >>
> >> Am 27.04.21 um 13:49 schrieb Marek Olšák:
> >>
> >> If we don't use future fences for DMA fences at all, e.g. we don't use
> them for memory management, it can work, right? Memory management can
> suspend user queues anytime. It doesn't need to use DMA fences. There might
> be something that I'm missing here.
> >>
> >> What would we lose without DMA fences? Just inter-device
> synchronization? I think that might be acceptable.
> >>
> >> The only case when the kernel will wait on a future fence is before a
> page flip. Everything today already depends on userspace not hanging the
> gpu, which makes everything a future fence.
> >>
> >> Marek
> >>
> >> On Tue., Apr. 27, 2021, 04:02 Daniel Vetter,  wrote:
> >>>
> >>> On Mon, Apr 26, 2021 at 04:59:28PM -0400, Marek Olšák wrote:
> >>> > Thanks everybody. The initial proposal is dead. Here are some
> thoughts on
> >>> > how to do it differently.
> >>> >
> >>> > I think we can have direct command submission from userspace via
> >>> > memory-mapped queues ("user queues") without changing window systems.
> >>> >
> >>> > The memory management doesn't have to use GPU page faults like HMM.
> >>> > Instead, it can wait for user queues of a specific process to go
> idle and
> >>> > then unmap the queues, so that userspace can't submit anything.
> Buffer
> >>> > evictions, pinning, etc. can be executed when all queues are unmapped
> >>> > (suspended). Thus, no BO fences and page faults are needed.
> >>> >
> >>> > Inter-process synchronization can use timeline semaphores. Userspace
> will
> >>> > query the wait and signal value for a shared buffer from the kernel.
> The
> >>> > kernel will keep a history of those queries to know which process is
> >>> > responsible for signalling which buffer. There is only the
> wait-timeout
> >>> > issue and how to identify the culprit. One of the solutions is to
> have the
> >>> > GPU send all GPU signal commands and all timed out wait commands via
> an
> >>> > interrupt to the kernel driver to monitor and validate userspace
> behavior.
> >>> > With that, it can be identified whether the culprit is the waiting
> process
> >>> > or the signalling process and which one. Invalid signal/wait
> parameters can
> >>> > also be detected. The kernel can force-signal only the semaphores
> that time
> >>> > out, and punish the processes which caused the timeout or used
> invalid
> >>> > signal/wait parameters.
> >>> >
> >>> > The question is whether this synchronization solution is robust
> enough for
> >>> > dma_fence and whatever the kernel and window systems need.
> >>>
> >>> The proper model here is the preempt-ctx dma_fence that amdkfd uses
> >>> (without page faults). That means dma_fence for synchronization is
> doa, at
> >>> least as-is, and we're back to figuring out the w

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-27 Thread Marek Olšák
Ok. I'll interpret this as "yes, it will work, let's do it".

Marek

On Tue., Apr. 27, 2021, 08:06 Christian König, <
ckoenig.leichtzumer...@gmail.com> wrote:

> Correct, we wouldn't have synchronization between devices with and without
> user queues any more.
>
> That could only be a problem for A+I Laptops.
>
> Memory management will just work with preemption fences which pause the
> user queues of a process before evicting something. That will be a
> dma_fence, but also a well known approach.
>
> Christian.
>
> Am 27.04.21 um 13:49 schrieb Marek Olšák:
>
> If we don't use future fences for DMA fences at all, e.g. we don't use
> them for memory management, it can work, right? Memory management can
> suspend user queues anytime. It doesn't need to use DMA fences. There might
> be something that I'm missing here.
>
> What would we lose without DMA fences? Just inter-device synchronization?
> I think that might be acceptable.
>
> The only case when the kernel will wait on a future fence is before a page
> flip. Everything today already depends on userspace not hanging the gpu,
> which makes everything a future fence.
>
> Marek
>
> On Tue., Apr. 27, 2021, 04:02 Daniel Vetter,  wrote:
>
>> On Mon, Apr 26, 2021 at 04:59:28PM -0400, Marek Olšák wrote:
>> > Thanks everybody. The initial proposal is dead. Here are some thoughts
>> on
>> > how to do it differently.
>> >
>> > I think we can have direct command submission from userspace via
>> > memory-mapped queues ("user queues") without changing window systems.
>> >
>> > The memory management doesn't have to use GPU page faults like HMM.
>> > Instead, it can wait for user queues of a specific process to go idle
>> and
>> > then unmap the queues, so that userspace can't submit anything. Buffer
>> > evictions, pinning, etc. can be executed when all queues are unmapped
>> > (suspended). Thus, no BO fences and page faults are needed.
>> >
>> > Inter-process synchronization can use timeline semaphores. Userspace
>> will
>> > query the wait and signal value for a shared buffer from the kernel. The
>> > kernel will keep a history of those queries to know which process is
>> > responsible for signalling which buffer. There is only the wait-timeout
>> > issue and how to identify the culprit. One of the solutions is to have
>> the
>> > GPU send all GPU signal commands and all timed out wait commands via an
>> > interrupt to the kernel driver to monitor and validate userspace
>> behavior.
>> > With that, it can be identified whether the culprit is the waiting
>> process
>> > or the signalling process and which one. Invalid signal/wait parameters
>> can
>> > also be detected. The kernel can force-signal only the semaphores that
>> time
>> > out, and punish the processes which caused the timeout or used invalid
>> > signal/wait parameters.
>> >
>> > The question is whether this synchronization solution is robust enough
>> for
>> > dma_fence and whatever the kernel and window systems need.
>>
>> The proper model here is the preempt-ctx dma_fence that amdkfd uses
>> (without page faults). That means dma_fence for synchronization is doa, at
>> least as-is, and we're back to figuring out the winsys problem.
>>
>> "We'll solve it with timeouts" is very tempting, but doesn't work. It's
>> akin to saying that we're solving deadlock issues in a locking design by
>> doing a global s/mutex_lock/mutex_lock_timeout/ in the kernel. Sure it
>> avoids having to reach the reset button, but that's about it.
>>
>> And the fundamental problem is that once you throw in userspace command
>> submission (and syncing, at least within the userspace driver, otherwise
>> there's kinda no point if you still need the kernel for cross-engine sync)
>> means you get deadlocks if you still use dma_fence for sync under
>> perfectly legit use-case. We've discussed that one ad nauseam last summer:
>>
>>
>> https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_fence#indefinite-dma-fences
>>
>> See silly diagramm at the bottom.
>>
>> Now I think all isn't lost, because imo the first step to getting to this
>> brave new world is rebuilding the driver on top of userspace fences, and
>> with the adjusted cmd submit model. You probably don't want to use amdkfd,
>> but port that as a context flag or similar to render nodes for gl/vk. Of
>> course that means you can only use this mode in headless, without
>> glx/wayland w

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-27 Thread Marek Olšák
If we don't use future fences for DMA fences at all, e.g. we don't use them
for memory management, it can work, right? Memory management can suspend
user queues anytime. It doesn't need to use DMA fences. There might be
something that I'm missing here.

What would we lose without DMA fences? Just inter-device synchronization? I
think that might be acceptable.

The only case when the kernel will wait on a future fence is before a page
flip. Everything today already depends on userspace not hanging the gpu,
which makes everything a future fence.

Marek

On Tue., Apr. 27, 2021, 04:02 Daniel Vetter,  wrote:

> On Mon, Apr 26, 2021 at 04:59:28PM -0400, Marek Olšák wrote:
> > Thanks everybody. The initial proposal is dead. Here are some thoughts on
> > how to do it differently.
> >
> > I think we can have direct command submission from userspace via
> > memory-mapped queues ("user queues") without changing window systems.
> >
> > The memory management doesn't have to use GPU page faults like HMM.
> > Instead, it can wait for user queues of a specific process to go idle and
> > then unmap the queues, so that userspace can't submit anything. Buffer
> > evictions, pinning, etc. can be executed when all queues are unmapped
> > (suspended). Thus, no BO fences and page faults are needed.
> >
> > Inter-process synchronization can use timeline semaphores. Userspace will
> > query the wait and signal value for a shared buffer from the kernel. The
> > kernel will keep a history of those queries to know which process is
> > responsible for signalling which buffer. There is only the wait-timeout
> > issue and how to identify the culprit. One of the solutions is to have
> the
> > GPU send all GPU signal commands and all timed out wait commands via an
> > interrupt to the kernel driver to monitor and validate userspace
> behavior.
> > With that, it can be identified whether the culprit is the waiting
> process
> > or the signalling process and which one. Invalid signal/wait parameters
> can
> > also be detected. The kernel can force-signal only the semaphores that
> time
> > out, and punish the processes which caused the timeout or used invalid
> > signal/wait parameters.
> >
> > The question is whether this synchronization solution is robust enough
> for
> > dma_fence and whatever the kernel and window systems need.
>
> The proper model here is the preempt-ctx dma_fence that amdkfd uses
> (without page faults). That means dma_fence for synchronization is doa, at
> least as-is, and we're back to figuring out the winsys problem.
>
> "We'll solve it with timeouts" is very tempting, but doesn't work. It's
> akin to saying that we're solving deadlock issues in a locking design by
> doing a global s/mutex_lock/mutex_lock_timeout/ in the kernel. Sure it
> avoids having to reach the reset button, but that's about it.
>
> And the fundamental problem is that once you throw in userspace command
> submission (and syncing, at least within the userspace driver, otherwise
> there's kinda no point if you still need the kernel for cross-engine sync)
> means you get deadlocks if you still use dma_fence for sync under
> perfectly legit use-case. We've discussed that one ad nauseam last summer:
>
>
> https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_fence#indefinite-dma-fences
>
> See silly diagramm at the bottom.
>
> Now I think all isn't lost, because imo the first step to getting to this
> brave new world is rebuilding the driver on top of userspace fences, and
> with the adjusted cmd submit model. You probably don't want to use amdkfd,
> but port that as a context flag or similar to render nodes for gl/vk. Of
> course that means you can only use this mode in headless, without
> glx/wayland winsys support, but it's a start.
> -Daniel
>
> >
> > Marek
> >
> > On Tue, Apr 20, 2021 at 4:34 PM Daniel Stone 
> wrote:
> >
> > > Hi,
> > >
> > > On Tue, 20 Apr 2021 at 20:30, Daniel Vetter  wrote:
> > >
> > >> The thing is, you can't do this in drm/scheduler. At least not without
> > >> splitting up the dma_fence in the kernel into separate memory fences
> > >> and sync fences
> > >
> > >
> > > I'm starting to think this thread needs its own glossary ...
> > >
> > > I propose we use 'residency fence' for execution fences which enact
> > > memory-residency operations, e.g. faulting in a page ultimately
> depending
> > > on GPU work retiring.
> > >
> > > And 'value fence' for the pure-userspace model suggested by timeline
> > > semaphores, i.e. 

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-26 Thread Marek Olšák
Thanks everybody. The initial proposal is dead. Here are some thoughts on
how to do it differently.

I think we can have direct command submission from userspace via
memory-mapped queues ("user queues") without changing window systems.

The memory management doesn't have to use GPU page faults like HMM.
Instead, it can wait for user queues of a specific process to go idle and
then unmap the queues, so that userspace can't submit anything. Buffer
evictions, pinning, etc. can be executed when all queues are unmapped
(suspended). Thus, no BO fences and page faults are needed.

Inter-process synchronization can use timeline semaphores. Userspace will
query the wait and signal value for a shared buffer from the kernel. The
kernel will keep a history of those queries to know which process is
responsible for signalling which buffer. There is only the wait-timeout
issue and how to identify the culprit. One of the solutions is to have the
GPU send all GPU signal commands and all timed out wait commands via an
interrupt to the kernel driver to monitor and validate userspace behavior.
With that, it can be identified whether the culprit is the waiting process
or the signalling process and which one. Invalid signal/wait parameters can
also be detected. The kernel can force-signal only the semaphores that time
out, and punish the processes which caused the timeout or used invalid
signal/wait parameters.

The question is whether this synchronization solution is robust enough for
dma_fence and whatever the kernel and window systems need.
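
As a very rough sketch of the kernel-side validation this implies (every name
below is hypothetical), the interrupt handler would compare each GPU-reported
event against what the query ioctl recorded:

    /* Sketch only: validate userspace signal/wait behavior from the IRQ path. */
    void handle_sync_event(struct gpu_device *gdev, struct sync_event *ev)
    {
        /* Recorded when the process queried wait/signal values for this BO. */
        struct sync_record *rec = lookup_record(gdev, ev->semaphore, ev->value);

        switch (ev->type) {
        case EVENT_SIGNAL:
            if (!rec || rec->owner != ev->process)
                punish(ev->process);            /* invalid signal parameters */
            else
                rec->signalled = true;
            break;
        case EVENT_WAIT_TIMEOUT:
            if (!rec)
                punish(ev->process);            /* waited on a bogus value   */
            else
                punish(rec->owner);             /* the signaller is at fault */
            force_signal(gdev, ev->semaphore, ev->value); /* unblock waiters */
            break;
        }
    }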

Marek

On Tue, Apr 20, 2021 at 4:34 PM Daniel Stone  wrote:

> Hi,
>
> On Tue, 20 Apr 2021 at 20:30, Daniel Vetter  wrote:
>
>> The thing is, you can't do this in drm/scheduler. At least not without
>> splitting up the dma_fence in the kernel into separate memory fences
>> and sync fences
>
>
> I'm starting to think this thread needs its own glossary ...
>
> I propose we use 'residency fence' for execution fences which enact
> memory-residency operations, e.g. faulting in a page ultimately depending
> on GPU work retiring.
>
> And 'value fence' for the pure-userspace model suggested by timeline
> semaphores, i.e. fences being (*addr == val) rather than being able to look
> at ctx seqno.
>
> Cheers,
> Daniel
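
For reference, the "value fence" model is small enough to sketch in a few
lines. This is a simplified illustration, not any driver's actual
implementation; a timeline point is usually treated as signalled once the
stored value has reached it:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* A 64-bit timeline living in memory visible to both CPU and GPU. */
    struct value_fence {
        _Atomic uint64_t *addr;
    };

    /* Signal: write a monotonically increasing value. */
    static inline void value_fence_signal(struct value_fence *f, uint64_t val)
    {
        atomic_store_explicit(f->addr, val, memory_order_release);
    }

    /* Check: point 'val' is signalled once *addr has reached it. A real
     * implementation would sleep (e.g. via kernel helpers) instead of
     * spinning on this check. */
    static inline bool value_fence_signalled(struct value_fence *f, uint64_t val)
    {
        return atomic_load_explicit(f->addr, memory_order_acquire) >= val;
    }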


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Marek Olšák
On Tue, Apr 20, 2021 at 2:39 PM Daniel Vetter  wrote:

> On Tue, Apr 20, 2021 at 6:25 PM Marek Olšák  wrote:
> >
> > Daniel, imagine hardware that can only do what Windows does: future
> fences signalled by userspace whenever userspace wants, and no kernel
> queues like we have today.
> >
> > The only reason why current AMD GPUs work is because they have a ring
> buffer per queue with pointers to userspace command buffers followed by
> fences. What will we do if that ring buffer is removed?
>
> Well this is an entirely different problem than what you set out to
> describe. This is essentially the problem where hw does not have any
> support for priviledged commands and separate priviledges command
> buffer, and direct userspace submit is the only thing that is
> available.
>
> I think if this is your problem, then you get to implement some very
> interesting compat shim. But that's an entirely different problem from
> what you've described in your mail. This pretty much assumes at the hw
> level the only thing that works is ATS/pasid, and vram is managed with
> HMM exclusively. Once you have that pure driver stack you get to fake
> it in the kernel for compat with everything that exists already. How
> exactly that will look and how exactly you best construct your
> dma_fences for compat will depend highly upon how much is still there
> in this hw (e.g. wrt interrupt generation). A lot of the
> infrastructure was also done as part of drm_syncobj. I mean we have
> entirely fake kernel drivers like vgem/vkms that create dma_fence, so
> a hw ringbuffer is really not required.
>
> So ... is this your problem underneath it all, or was that more a wild
> strawman for the discussion?
>

Yes, that's the problem.

Marek


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Marek Olšák
Daniel, imagine hardware that can only do what Windows does: future fences
signalled by userspace whenever userspace wants, and no kernel queues like
we have today.

The only reason why current AMD GPUs work is because they have a ring
buffer per queue with pointers to userspace command buffers followed by
fences. What will we do if that ring buffer is removed?

Marek

On Tue, Apr 20, 2021 at 11:50 AM Daniel Stone  wrote:

> Hi,
>
> On Tue, 20 Apr 2021 at 16:16, Christian König <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> Am 20.04.21 um 17:07 schrieb Daniel Stone:
>>
>> If the compositor no longer has a guarantee that the buffer will be ready
>> for composition in a reasonable amount of time (which dma_fence gives us,
>> and this proposal does not appear to give us), then the compositor isn't
>> trying to use the buffer for compositing, it's waiting asynchronously on a
>> notification that the fence has signaled before it attempts to use the
>> buffer.
>>
>> Marek's initial suggestion is that the kernel signal the fence, which
>> would unblock composition (and presumably show garbage on screen, or at
>> best jump back to old content).
>>
>> My position is that the compositor will know the process has crashed
>> anyway - because its socket has been closed - at which point we destroy all
>> the client's resources including its windows and buffers regardless.
>> Signaling the fence doesn't give us any value here, _unless_ the compositor
>> is just blindly waiting for the fence to signal ... which it can't do
>> because there's no guarantee the fence will ever signal.
>>
>>
>> Yeah, but that assumes that the compositor has changed to not blindly wait
>> for the client to finish rendering and as Daniel explained that is rather
>> unrealistic.
>>
>> What we need is a fallback mechanism which signals the fence after a
>> timeout and gives a penalty to the one causing the timeout.
>>
>> That gives us the same functionality we have today with the in software
>> scheduler inside the kernel.
>>
>
> OK, if that's the case then I think I'm really missing something which
> isn't explained in this thread, because I don't understand what the
> additional complexity and API change gains us (see my first reply in this
> thread).
>
> By way of example - say I have a blind-but-explicit compositor that takes
> a drm_syncobj along with a dmabuf with each client presentation request,
> but doesn't check syncobj completion, it just imports that into a
> VkSemaphore + VkImage and schedules work for the next frame.
>
> Currently, that generates an execbuf ioctl for the composition (ignore KMS
> for now) with a sync point to wait on, and the kernel+GPU scheduling
> guarantees that the composition work will not begin until the client
> rendering work has retired. We have a further guarantee that this work will
> complete in reasonable time, for some value of 'reasonable'.
>
> My understanding of this current proposal is that:
> * userspace creates a 'present fence' with this new ioctl
> * the fence becomes signaled when a value is written to a location in
> memory, which is visible through both CPU and GPU mappings of that page
> * this 'present fence' is imported as a VkSemaphore (?) and the userspace
> Vulkan driver will somehow wait on this value, either before submitting
> work or as a possibly-hardware-assisted GPU-side wait (?)
> * the kernel's scheduler is thus eliminated from the equation, and every
> execbuf is submitted directly to hardware, because either userspace knows
> that the fence has already been signaled, or it will issue a GPU-side wait
> (?)
> * but the kernel is still required to monitor completion of every fence
> itself, so it can forcibly complete, or penalise the client (?)
>
> Lastly, let's say we stop ignoring KMS: what happens for the
> render-with-GPU-display-on-KMS case? Do we need to do the equivalent of
> glFinish() in userspace and only submit the KMS atomic request when the GPU
> work has fully retired?
>
> Clarifying those points would be really helpful so this is less of a
> strawman. I have some further opinions, but I'm going to wait until I
> understand what I'm actually arguing against before I go too far. :) The
> last point is very salient though.
>
> Cheers,
> Daniel
>


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-20 Thread Marek Olšák
Daniel, are you suggesting that we should skip any deadlock prevention in
the kernel, and just let userspace wait for and signal any fence it has
access to?

Do you have any concern with the deprecation/removal of BO fences in the
kernel assuming userspace is only using explicit fences? Any concern with
the submit and return fences for modesetting and other producer<->consumer
scenarios?

Thanks,
Marek

On Tue, Apr 20, 2021 at 6:34 AM Daniel Vetter  wrote:

> On Tue, Apr 20, 2021 at 12:15 PM Christian König
>  wrote:
> >
> > Am 19.04.21 um 17:48 schrieb Jason Ekstrand:
> > > Not going to comment on everything on the first pass...
> > >
> > > On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák  wrote:
> > >> Hi,
> > >>
> > >> This is our initial proposal for explicit fences everywhere and new
> memory management that doesn't use BO fences. It's a redesign of how Linux
> graphics drivers work, and it can coexist with what we have now.
> > >>
> > >>
> > >> 1. Introduction
> > >> (skip this if you are already sold on explicit fences)
> > >>
> > >> The current Linux graphics architecture was initially designed for
> GPUs with only one graphics queue where everything was executed in the
> submission order and per-BO fences were used for memory management and
> CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple
> queues were added on top, which required the introduction of implicit
> GPU-GPU synchronization between queues of different processes using per-BO
> fences. Recently, even parallel execution within one queue was enabled
> where a command buffer starts draws and compute shaders, but doesn't wait
> for them, enabling parallelism between back-to-back command buffers.
> Modesetting also uses per-BO fences for scheduling flips. Our GPU scheduler
> was created to enable all those use cases, and it's the only reason why the
> scheduler exists.
> > >>
> > >> The GPU scheduler, implicit synchronization, BO-fence-based memory
> management, and the tracking of per-BO fences increase CPU overhead and
> latency, and reduce parallelism. There is a desire to replace all of them
> with something much simpler. Below is how we could do it.
> > >>
> > >>
> > >> 2. Explicit synchronization for window systems and modesetting
> > >>
> > >> The producer is an application and the consumer is a compositor or a
> modesetting driver.
> > >>
> > >> 2.1. The Present request
> > >>
> > >> As part of the Present request, the producer will pass 2 fences (sync
> objects) to the consumer alongside the presented DMABUF BO:
> > >> - The submit fence: Initially unsignalled, it will be signalled when
> the producer has finished drawing into the presented buffer.
> > >> - The return fence: Initially unsignalled, it will be signalled when
> the consumer has finished using the presented buffer.
> > > I'm not sure syncobj is what we want.  In the Intel world we're trying
> > > to go even further to something we're calling "userspace fences" which
> > > are a timeline implemented as a single 64-bit value in some
> > > CPU-mappable BO.  The client writes a higher value into the BO to
> > > signal the timeline.
> >
> > Well that is exactly what our Windows guys have suggested as well, but
> > it strongly looks like this isn't sufficient.
> >
> > First of all you run into security problems when any application can
> > just write any value to that memory location. Just imagine an
> > application sets the counter to zero and X waits forever for some
> > rendering to finish.
>
> The thing is, with userspace fences the security boundary problem
> moves into userspace entirely. And it really doesn't matter whether
> the event you're waiting on doesn't complete because the other app
> crashed or was stupid or intentionally gave you a wrong fence point:
> You have to somehow handle that, e.g. perhaps with conditional
> rendering and just using the old frame in compositing if the new one
> doesn't show up in time. Or something like that. So trying to get the
> kernel involved but also not so much involved sounds like a bad design
> to me.
>
> > Additional to that in such a model you can't determine who is the guilty
> > queue in case of a hang and can't reset the synchronization primitives
> > in case of an error.
> >
> > Apart from that this is rather inefficient, e.g. we don't have any way
> > to prevent priority inversion when used as a synchronization mechanism
> > between different GPU q

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-19 Thread Marek Olšák
We already don't have accurate BO fences in some cases. Instead, BOs can
have fences which are equal to the last seen command buffer for each queue.
It's practically the same as if the kernel had no visibility into command
submissions and just added a fence into all queues when it needed to wait
for idle. That's already one alternative to BO fences that would work
today. The only BOs that need accurate BO fences are shared buffers, and
those use cases can be converted to explicit fences.

Removing memory management from all command buffer submission logic would
be one of the benefits that is quite appealing.

You don't need to depend on apps for budgeting and placement determination.
You can sort buffers according to driver usage, e.g. scratch/spill buffers,
shader IO rings, MSAA images, other images, and buffers. Alternatively, you
can have just internal buffers vs app buffers. Then you assign VRAM from
left to right until you reach the quota. This is optional, so this part can
be ignored.
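
A sketch of that "left to right until the quota is reached" assignment, with
made-up category and domain names:

    #include <stdint.h>

    /* Hypothetical buckets, ordered from most to least important. */
    enum bo_category { BO_SCRATCH, BO_SHADER_RINGS, BO_MSAA, BO_IMAGE, BO_BUFFER,
                       BO_NUM_CATEGORIES };
    enum bo_domain { DOMAIN_VRAM, DOMAIN_GTT };

    struct bo {
        struct bo *next;
        uint64_t size;
        enum bo_domain preferred_domain;
    };

    void assign_domains(struct bo *lists[BO_NUM_CATEGORIES], uint64_t vram_quota)
    {
        uint64_t used = 0;

        for (int cat = 0; cat < BO_NUM_CATEGORIES; cat++) {
            for (struct bo *bo = lists[cat]; bo; bo = bo->next) {
                /* Keep buffers in VRAM until the quota is exhausted. */
                if (used + bo->size <= vram_quota) {
                    bo->preferred_domain = DOMAIN_VRAM;
                    used += bo->size;
                } else {
                    bo->preferred_domain = DOMAIN_GTT;
                }
            }
        }
    }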

>> - A GPU hang signals all fences. Other deadlocks will be handled like
>> GPU hangs.
>
> What do you mean by "all"?  All fences that were supposed to be
> signaled by the hung context?

Yes, that's one of the possibilities. Any GPU hang followed by a GPU reset
can clear VRAM, so all processes should recreate their contexts and
reinitialize resources. A deadlock caused by userspace could be handled
similarly.

I don't know how timeline fences would work across processes and how
resilient they would be to segfaults.

Marek

On Mon, Apr 19, 2021 at 11:48 AM Jason Ekstrand 
wrote:

> Not going to comment on everything on the first pass...
>
> On Mon, Apr 19, 2021 at 5:48 AM Marek Olšák  wrote:
> >
> > Hi,
> >
> > This is our initial proposal for explicit fences everywhere and new
> memory management that doesn't use BO fences. It's a redesign of how Linux
> graphics drivers work, and it can coexist with what we have now.
> >
> >
> > 1. Introduction
> > (skip this if you are already sold on explicit fences)
> >
> > The current Linux graphics architecture was initially designed for GPUs
> with only one graphics queue where everything was executed in the
> submission order and per-BO fences were used for memory management and
> CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple
> queues were added on top, which required the introduction of implicit
> GPU-GPU synchronization between queues of different processes using per-BO
> fences. Recently, even parallel execution within one queue was enabled
> where a command buffer starts draws and compute shaders, but doesn't wait
> for them, enabling parallelism between back-to-back command buffers.
> Modesetting also uses per-BO fences for scheduling flips. Our GPU scheduler
> was created to enable all those use cases, and it's the only reason why the
> scheduler exists.
> >
> > The GPU scheduler, implicit synchronization, BO-fence-based memory
> management, and the tracking of per-BO fences increase CPU overhead and
> latency, and reduce parallelism. There is a desire to replace all of them
> with something much simpler. Below is how we could do it.
> >
> >
> > 2. Explicit synchronization for window systems and modesetting
> >
> > The producer is an application and the consumer is a compositor or a
> modesetting driver.
> >
> > 2.1. The Present request
> >
> > As part of the Present request, the producer will pass 2 fences (sync
> objects) to the consumer alongside the presented DMABUF BO:
> > - The submit fence: Initially unsignalled, it will be signalled when the
> producer has finished drawing into the presented buffer.
> > - The return fence: Initially unsignalled, it will be signalled when the
> consumer has finished using the presented buffer.
>
> I'm not sure syncobj is what we want.  In the Intel world we're trying
> to go even further to something we're calling "userspace fences" which
> are a timeline implemented as a single 64-bit value in some
> CPU-mappable BO.  The client writes a higher value into the BO to
> signal the timeline.  The kernel then provides some helpers for
> waiting on them reliably and without spinning.  I don't expect
> everyone to support these right away but, If we're going to re-plumb
> userspace for explicit synchronization, I'd like to make sure we take
> this into account so we only have to do it once.
>
>
> > Deadlock mitigation to recover from segfaults:
> > - The kernel knows which process is obliged to signal which fence. This
> information is part of the Present request and supplied by userspace.
>
> This isn't clear to me.  Yes, if we're using anything dma-fence based
> like syncobj, this is true.  But it doesn't seem to

[Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-19 Thread Marek Olšák
Hi,

This is our initial proposal for explicit fences everywhere and new memory
management that doesn't use BO fences. It's a redesign of how Linux
graphics drivers work, and it can coexist with what we have now.


*1. Introduction*
(skip this if you are already sold on explicit fences)

The current Linux graphics architecture was initially designed for GPUs
with only one graphics queue where everything was executed in the
submission order and per-BO fences were used for memory management and
CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple
queues were added on top, which required the introduction of implicit
GPU-GPU synchronization between queues of different processes using per-BO
fences. Recently, even parallel execution within one queue was enabled
where a command buffer starts draws and compute shaders, but doesn't wait
for them, enabling parallelism between back-to-back command buffers.
Modesetting also uses per-BO fences for scheduling flips. Our GPU scheduler
was created to enable all those use cases, and it's the only reason why the
scheduler exists.

The GPU scheduler, implicit synchronization, BO-fence-based memory
management, and the tracking of per-BO fences increase CPU overhead and
latency, and reduce parallelism. There is a desire to replace all of them
with something much simpler. Below is how we could do it.


*2. Explicit synchronization for window systems and modesetting*

The producer is an application and the consumer is a compositor or a
modesetting driver.

*2.1. The Present request*

As part of the Present request, the producer will pass 2 fences (sync
objects) to the consumer alongside the presented DMABUF BO:
- The submit fence: Initially unsignalled, it will be signalled when the
producer has finished drawing into the presented buffer.
- The return fence: Initially unsignalled, it will be signalled when the
consumer has finished using the presented buffer.

Deadlock mitigation to recover from segfaults:
- The kernel knows which process is obliged to signal which fence. This
information is part of the Present request and supplied by userspace.
- If the producer crashes, the kernel signals the submit fence, so that the
consumer can make forward progress.
- If the consumer crashes, the kernel signals the return fence, so that the
producer can reclaim the buffer.
- A GPU hang signals all fences. Other deadlocks will be handled like GPU
hangs.

Other window system requests can follow the same idea.

Merged fences where one fence object contains multiple fences will be
supported. A merged fence is signalled only when its fences are signalled.
The consumer will have the option to redefine the unsignalled return fence
to a merged fence.
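A sketch of the merged-fence rule in C, with a hypothetical per-fence check standing in for whatever the final interface provides:

  #include <stdbool.h>

  struct fence;                                     /* opaque */
  extern bool fence_is_signalled(struct fence *f);  /* hypothetical helper */

  /* A merged fence is signalled only when every member fence is signalled. */
  static bool merged_fence_is_signalled(struct fence **fences, unsigned count)
  {
      for (unsigned i = 0; i < count; i++) {
          if (!fence_is_signalled(fences[i]))
              return false;
      }
      return true;
  }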

*2.2. Modesetting*

Since a modesetting driver can also be the consumer, the present ioctl will
contain a submit fence and a return fence too. One small problem with this
is that userspace can hang the modesetting driver, but in theory, any later
present ioctl can override the previous one, so the unsignalled
presentation is never used.


*3. New memory management*

The per-BO fences will be removed and the kernel will not know which
buffers are busy. This will reduce CPU overhead and latency. The kernel
will not need per-BO fences with explicit synchronization, so we just need
to remove their last user: buffer evictions. It also resolves the current
OOM deadlock.

*3.1. Evictions*

If the kernel wants to move a buffer, it will have to wait for everything
to go idle, halt all userspace command submissions, move the buffer, and
resume everything. This is not expected to happen when memory is not
exhausted. Other more efficient ways of synchronization are also possible
(e.g. sync only one process), but are not discussed here.

*3.2. Per-process VRAM usage quota*

Each process can optionally and periodically query its VRAM usage quota and
change domains of its buffers to obey that quota. For example, a process
allocated 2 GB of buffers in VRAM, but the kernel decreased the quota to 1
GB. The process can change the domains of the least important buffers to
GTT to get the best outcome for itself. If the process doesn't do it, the
kernel will choose which buffers to evict at random. (thanks to Christian
Koenig for this idea)
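A sketch of the userspace side of this, with made-up query and ioctl wrappers; only the "query the quota, demote the least important VRAM buffers until under it" loop reflects the proposal:

  #include <stdint.h>

  /* All types and wrappers below are hypothetical. */
  enum domain { DOMAIN_VRAM, DOMAIN_GTT };

  struct buffer {
      enum domain domain;
      uint64_t size;
  };

  extern uint64_t query_vram_quota(void);                          /* hypothetical */
  extern uint64_t query_vram_usage(void);                          /* hypothetical */
  extern void set_buffer_domain(struct buffer *b, enum domain d);  /* hypothetical */

  /* 'bufs' is assumed to be sorted least-important-first. */
  static void obey_vram_quota(struct buffer **bufs, unsigned count)
  {
      uint64_t quota = query_vram_quota();
      uint64_t used = query_vram_usage();

      for (unsigned i = 0; i < count && used > quota; i++) {
          if (bufs[i]->domain != DOMAIN_VRAM)
              continue;
          set_buffer_domain(bufs[i], DOMAIN_GTT);
          bufs[i]->domain = DOMAIN_GTT;
          used -= bufs[i]->size;
      }
  }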

*3.3. Buffer destruction without per-BO fences*

When the buffer destroy ioctl is called, an optional fence list can be
passed to the kernel to indicate when it's safe to deallocate the buffer.
If the fence list is empty, the buffer will be deallocated immediately.
Shared buffers will be handled by merging fence lists from all processes
that destroy them. Mitigation of malicious behavior:
- If userspace destroys a busy buffer, it will get a GPU page fault.
- If userspace sends fences that never signal, the kernel will have a
timeout period and then will proceed to deallocate the buffer anyway.
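As a rough sketch (not an existing UAPI), the ioctl arguments could carry the fence list like this:

  #include <stdint.h>

  /* Hypothetical destroy-ioctl argument block.  An empty fence list means
   * "deallocate immediately"; otherwise the kernel waits for the listed
   * fences, bounded by a timeout, before freeing the buffer. */
  struct destroy_buffer_args {
      uint32_t buffer_handle;
      uint32_t num_fences;     /* 0 = deallocate immediately */
      uint64_t fences_ptr;     /* user pointer to an array of fence handles */
      uint64_t timeout_ns;     /* safety net against fences that never signal */
  };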

*3.4. Other notes on MM*

Overcommitment of GPU-accessible memory will cause an allocation failure or
invoke the OOM 

Re: [Mesa-dev] [RFC] Concrete proposal to split classic

2021-04-04 Thread Marek Olšák
Another thing is that glsl_to_tgsi is going to be removed but an old driver
may want to keep it. For this case, glsl_to_tgsi will be preserved in the
lts branch.

Marek

On Mon., Mar. 29, 2021, 18:59 Ilia Mirkin,  wrote:

> Probably nv30 would do well to "move on" as well. But it also presents
> an interesting question -- the nv30 driver has lots of problems. I
> have no plans to fix them, nor am I aware of anyone else with such
> plans. However if such a developer were to turn up, would it be
> reasonable to assume that their work would ultimately land in this
> "lts" branch/tree/whatever? Some of the "fixes" will require large-ish
> changes to the driver...
>
> Cheers,
>
>   -ilia
>
> On Mon, Mar 29, 2021 at 6:48 PM Marek Olšák  wrote:
> >
> > Alright that's r300 and swr that are going to find a new home in the lts
> branch. Do any other gallium drivers want to join them?
> >
> > Marek
> >
> > On Mon., Mar. 29, 2021, 13:51 Zielinski, Jan, 
> wrote:
> >>
> >> On Thursday, March 25, 2021 8:47 Marek Olšák wrote:
> >> > Same thinking could be applied to other gallium drivers for old
> hardware that don't receive new development and are becoming more and more
> irrelevant every year due to their age.
> >>
> >> Can we also keep Gallium for OpenSWR driver on -lts branch? We
> currently are focusing effort on other OSS projects, and want to maintain
> OpenSWR at its current feature level, but we are often seeing Mesa core
> changes causing problems in OpenSWR, that we can’t always address right
> away. So, we would like to point our users to a stable branch, that limits
> the amount of effort required for OpenSWR to support its existing users.
> >>
> >> Jan
> >>
> >> On Wed, Mar 24, 2021, at 09:15, Jason Ekstrand wrote:
> >> > > On Wed, Mar 24, 2021 at 10:28 AM Rob Clark <robdcl...@gmail.com> wrote:
> >> > >
> >> > > On Mon, Mar 22, 2021 at 3:15 PM Dylan Baker <dy...@pnwbakers.com> wrote:
> >> > > >
> >> > > > Hi list,
> >> > > >
> >> > > > We've talked about it a number of times, but I think it's time to
> >> > > > discuss splitting the classic drivers off of the main development
> branch
> >> > > > again, although this time I have a concrete plan for how this
> would
> >> > > > work.
> >> > > >
> >> > > > First, why? Basically, all of the classic drivers are in
> maintenance
> >> > > > mode (even i965). Second, many of them rely on code that no one
> works
> >> > > > on, and very few people still understand. There is no CI for most
> of
> >> > > > them, and the Intel CI is not integrated with gitlab, so it's
> easy to
> >> > > > unintentionally break them, and this breakage usually isn't
> noticed
> >> > > > until just before or just after a release. 21.0 was held up (in
> small
> >> > > > part, also me just getting behind) because of such breakages.
> >> > > >
> >> > > > I know there is some interest in getting i915g in good enough
> shape that
> >> > > > it could replace i915c, at least for the common case. I also am
> aware
> >> > > > that Dave, Ilia, and Eric (with some pointers from Ken) have been
> >> > > > working on a gallium driver to replace i965. Neither of those
> things are
> >> > > > ready yet, but I've taken them into account.
> >> > > >
> >> > > > Here's the plan:
> >> > > >
> >> > > > 1) 21.1 release happens
> >> > > > 2) we remove classic from master
> >> > > > 3) 21.1 reaches EOL because of 21.2
> >> > > > 4) we fork the 21.1 branch into a "classic-lts"¹ branch
> >> > > > 5) we disable all vulkan and gallium drivers in said branch, at
> least at
> >> > > >the Meson level
> >> > >
> >> > > I'm +1 for the -lts branch.. the layering between mesa "classic" and
> >> > > gallium is already starting to get poked thru in the name of
> >> > > performance, and we've already discovered cases of classic drivers
> >> > > being broken for multiple months with no one noticing.  I think a
> >> > > slower moving -lts branch is the best approach to keeping things
> >> > > working for folks with older hw.
> >> > >
> >> > >

Re: [Mesa-dev] [RFC] Concrete proposal to split classic

2021-03-29 Thread Marek Olšák
Alright that's r300 and swr that are going to find a new home in the lts
branch. Do any other gallium drivers want to join them?

Marek

On Mon., Mar. 29, 2021, 13:51 Zielinski, Jan, 
wrote:

> On Thursday, March 25, 2021 8:47 Marek Olšák wrote:
> > Same thinking could be applied to other gallium drivers for old hardware
> that don't receive new development and are becoming more and more
> irrelevant every year due to their age.
>
> Can we also keep Gallium for OpenSWR driver on -lts branch? We currently
> are focusing effort on other OSS projects, and want to maintain OpenSWR at
> its current feature level, but we are often seeing Mesa core changes
> causing problems in OpenSWR, that we can’t always address right away. So,
> we would like to point our users to a stable branch, that limits the amount
> of effort required for OpenSWR to support its existing users.
>
> Jan
>
> On Wed, Mar 24, 2021, at 09:15, Jason Ekstrand wrote:
> > On Wed, Mar 24, 2021 at 10:28 AM Rob Clark <robdcl...@gmail.com>
> wrote:
> > >
> > > On Mon, Mar 22, 2021 at 3:15 PM Dylan Baker <dy...@pnwbakers.com> wrote:
> > > >
> > > > Hi list,
> > > >
> > > > We've talked about it a number of times, but I think it's time
> to
> > > > discuss splitting the classic drivers off of the main development
> branch
> > > > again, although this time I have a concrete plan for how this would
> > > > work.
> > > >
> > > > First, why? Basically, all of the classic drivers are in maintenance
> > > > mode (even i965). Second, many of them rely on code that no one works
> > > > on, and very few people still understand. There is no CI for most of
> > > > them, and the Intel CI is not integrated with gitlab, so it's easy to
> > > > unintentionally break them, and this breakage usually isn't noticed
> > > > until just before or just after a release. 21.0 was held up (in small
> > > > part, also me just getting behind) because of such breakages.
> > > >
> > > > I know there is some interest in getting i915g in good enough shape
> that
> > > > it could replace i915c, at least for the common case. I also am aware
> > > > that Dave, Ilia, and Eric (with some pointers from Ken) have been
> > > > working on a gallium driver to replace i965. Neither of those things
> are
> > > > ready yet, but I've taken them into account.
> > > >
> > > > Here's the plan:
> > > >
> > > > 1) 21.1 release happens
> > > > 2) we remove classic from master
> > > > 3) 21.1 reaches EOL because of 21.2
> > > > 4) we fork the 21.1 branch into a "classic-lts"¹ branch
> > > > 5) we disable all vulkan and gallium drivers in said branch, at
> least at
> > > >the Meson level
> > >
> > > I'm +1 for the -lts branch.. the layering between mesa "classic" and
> > > gallium is already starting to get poked thru in the name of
> > > performance, and we've already discovered cases of classic drivers
> > > being broken for multiple months with no one noticing.  I think a
> > > slower moving -lts branch is the best approach to keeping things
> > > working for folks with older hw.
> > >
> > > But possibly there is some value in not completely disabling gallium
> > > completely in the -lts branch.  We do have some older gallium drivers
> > > which do not have CI coverage and I think are not used frequently by
> > > developers who are tracking the latest main/master branch.  I'm not
> > > suggesting that we remove them from the main (non-lts) branch but it
> > > might be useful to be able to recommend users of those drivers stick
> > > with the -lts version for better stability?
> >
> > I agree with this.  Generally, I don't think we should delete anything
> > from the -lts branch.  Doing so only risks more breakage.  We probably
> > want to change some meson build defaults to not build anything but old
> > drivers but that's it.
> >
> > --Jason
> >
> > > BR,
> > > -R
> > >
> > > > 6) We change the name and precedence of the glvnd loader file
> > > > 7) apply any build fixups (turn off intel generators for versions >=
> 7.5,
> > > >    for example)
> > > > 8) maintain that branch with build and critical bug fixes only
> > > >
> > > > This gives distros and end users two options.
> > > > 1) they can build *on

Re: [Mesa-dev] [RFC] Concrete proposal to split classic

2021-03-25 Thread Marek Olšák
On Thu., Mar. 25, 2021, 12:14 Dylan Baker,  wrote:

> By delete I mean "remove -Dgallium-drivers and -Dvulkan-drivers" from
> Meson. Maybe it makes sense to keep gallium for r300? But how many r300
> breakages have we had in recent memory?
>

We don't have any recent information on the status of r300. Splitting it
would be wise.

Same thinking could be applied to other gallium drivers for old hardware
that don't receive new development and are becoming more and more
irrelevant every year due to their age.

Marek


> On Wed, Mar 24, 2021, at 09:15, Jason Ekstrand wrote:
> > On Wed, Mar 24, 2021 at 10:28 AM Rob Clark  wrote:
> > >
> > > On Mon, Mar 22, 2021 at 3:15 PM Dylan Baker 
> wrote:
> > > >
> > > > Hi list,
> > > >
> > > > We've talked about it a number of times, but I think it's time
> to
> > > > discuss splitting the classic drivers off of the main development
> branch
> > > > again, although this time I have a concrete plan for how this would
> > > > work.
> > > >
> > > > First, why? Basically, all of the classic drivers are in maintenance
> > > > mode (even i965). Second, many of them rely on code that no one works
> > > > on, and very few people still understand. There is no CI for most of
> > > > them, and the Intel CI is not integrated with gitlab, so it's easy to
> > > > unintentionally break them, and this breakage usually isn't noticed
> > > > until just before or just after a release. 21.0 was held up (in small
> > > > part, also me just getting behind) because of such breakages.
> > > >
> > > > I know there is some interest in getting i915g in good enough shape
> that
> > > > it could replace i915c, at least for the common case. I also am aware
> > > > that Dave, Ilia, and Eric (with some pointers from Ken) have been
> > > > working on a gallium driver to replace i965. Neither of those things
> are
> > > > ready yet, but I've taken them into account.
> > > >
> > > > Here's the plan:
> > > >
> > > > 1) 21.1 release happens
> > > > 2) we remove classic from master
> > > > 3) 21.1 reaches EOL because of 21.2
> > > > 4) we fork the 21.1 branch into a "classic-lts"¹ branch
> > > > 5) we disable all vulkan and gallium drivers in said branch, at
> least at
> > > >the Meson level
> > >
> > > I'm +1 for the -lts branch.. the layering between mesa "classic" and
> > > gallium is already starting to get poked thru in the name of
> > > performance, and we've already discovered cases of classic drivers
> > > being broken for multiple months with no one noticing.  I think a
> > > slower moving -lts branch is the best approach to keeping things
> > > working for folks with older hw.
> > >
> > > But possibly there is some value in not completely disabling gallium
> > > completely in the -lts branch.  We do have some older gallium drivers
> > > which do not have CI coverage and I think are not used frequently by
> > > developers who are tracking the latest main/master branch.  I'm not
> > > suggesting that we remove them from the main (non-lts) branch but it
> > > might be useful to be able to recommend users of those drivers stick
> > > with the -lts version for better stability?
> >
> > I agree with this.  Generally, I don't think we should delete anything
> > from the -lts branch.  Doing so only risks more breakage.  We probably
> > want to change some meson build defaults to not build anything but old
> > drivers but that's it.
> >
> > --Jason
> >
> > > BR,
> > > -R
> > >
> > > > 6) We change the name and precedence of the glvnd loader file
> > > > 7) apply any build fixups (turn off intel generators for versions >=
> 7.5,
> > > >    for example)
> > > > 8) maintain that branch with build and critical bug fixes only
> > > >
> > > > This gives distros and end users two options.
> > > > 1) they can build *only* the legacy branch in the normal Mesa
> provides
> > > >libGL interfaces fashion
> > > > 2) They can use glvnd and install current mesa and the legacy branch
> in
> > > >parallel
> > > >
> > > > Because of glvnd, we can control which driver will get loaded first,
> and
> > > > thus if we decide i915g or the i965 replacement is ready and turn it
> on
> > > > by default it will be loaded by default. An end user who doesn't like
> > > > this can add a new glvnd loader file that makes the classic drivers
> > > > higher precedent and continue to use them.
> > > >
> > > > Why fork from 21.1 instead of master?
> > > >
> > > > First, it allows us to delete classic immediately, which will allow
> > > > refactoring to happen earlier in the cycle, and for any fallout to be
> > > > caught and hopefully fixed before the release. Second, it means that
> > > > when a user is switched from 21.1 to the new classic-lts branch,
> there
> > > > will be no regressions, and no one has to spend time figuring out
> what
> > > > broke and fixing the lts branch.
> > > >
> > > > When you say "build and critical bug fixes", what do you mean?
> > > >
> > > > I mean update Meson if we rely on something that in 

Re: [Mesa-dev] [RFC] Concrete proposal to split classic

2021-03-23 Thread Marek Olšák
+1

We still have some CPU overhead performance targets we haven't reached. One
of them is to decrease CPU overhead for one benchmark 4 times compared to
everything we already have in master. I don't know how we are going to do
that, but we'll try.

Marek

On Mon, Mar 22, 2021 at 6:15 PM Dylan Baker  wrote:

> Hi list,
>
> We've talked about it a number of times, but I think it's time to
> discuss splitting the classic drivers off of the main development branch
> again, although this time I have a concrete plan for how this would
> work.
>
> First, why? Basically, all of the classic drivers are in maintenance
> mode (even i965). Second, many of them rely on code that no one works
> on, and very few people still understand. There is no CI for most of
> them, and the Intel CI is not integrated with gitlab, so it's easy to
> unintentionally break them, and this breakage usually isn't noticed
> until just before or just after a release. 21.0 was held up (in small
> part, also me just getting behind) because of such breakages.
>
> I know there is some interest in getting i915g in good enough shape that
> it could replace i915c, at least for the common case. I also am aware
> that Dave, Ilia, and Eric (with some pointers from Ken) have been
> working on a gallium driver to replace i965. Neither of those things are
> ready yet, but I've taken them into account.
>
> Here's the plan:
>
> 1) 21.1 release happens
> 2) we remove classic from master
> 3) 21.1 reaches EOL because of 21.2
> 4) we fork the 21.1 branch into a "classic-lts"¹ branch
> 5) we disable all vulkan and gallium drivers in said branch, at least at
>the Meson level
> 6) We change the name and precedence of the glvnd loader file
> 7) apply any build fixups (turn off intel generators for versions >= 7.5,
>    for example)
> 8) maintain that branch with build and critical bug fixes only
>
> This gives distros and end users two options.
> 1) they can build *only* the legacy branch in the normal Mesa provides
>libGL interfaces fashion
> 2) They can use glvnd and install current mesa and the legacy branch in
>parallel
>
> Because of glvnd, we can control which driver will get loaded first, and
> thus if we decide i915g or the i965 replacement is ready and turn it on
> by default it will be loaded by default. An end user who doesn't like
> this can add a new glvnd loader file that makes the classic drivers
> higher precedent and continue to use them.
>
> Why fork from 21.1 instead of master?
>
> First, it allows us to delete classic immediately, which will allow
> refactoring to happen earlier in the cycle, and for any fallout to be
> caught and hopefully fixed before the release. Second, it means that
> when a user is switched from 21.1 to the new classic-lts branch, there
> will be no regressions, and no one has to spend time figuring out what
> broke and fixing the lts branch.
>
> When you say "build and critical bug fixes", what do you mean?
>
> I mean update Meson if we rely on something that in the future is
> deprecated and removed, and would prevent building the branch or an
> relying on some compiler behavior that changes, gaping exploitable
> security holes, that kind of thing.
>
> footnotes
> ¹Or whatever color you like your
> bikeshed


Re: [Mesa-dev] llvmpipe MSAA (Was: Fwd: [Mesa-users] Issues with removal of classic OSMesa)

2021-01-06 Thread Marek Olšák
It looks like llvmpipe has real MSAA with NIR.

Marek


On Wed, Jan 6, 2021 at 5:06 AM Jose Fonseca  wrote:

> That's an interesting idea!
>
> llvmpipe rasterization is complicated and very optimized, so changing
> llvmpipe's rasterizer to spit out MSAA coverages is very hard.  I think
> that a good way to approach this is to:
>
> 1) continue to do single sample rasterization, but adjust the line coeffs
> of the triangles edges (in llvmpipe rasterizer code known as *plane
> coefficients*) to do conservative rasterization (ie, so that any fragment
> that intersects a triangle is covered) when MSAA is enabled.  See
> https://developer.nvidia.com/content/dont-be-conservative-conservative-rasterization
>
> 2) once inside the fragment shader, compute SampleMaskIn from the
> unadjusted vertex positions, using the desired number of samples (and
> corresponding sample pattern)
>
> None of this would be throwaway work: the SampleMaskIn are correct and could
> be used for full MSAA support in the future too, and the conservative
> rasterization could be a feature on its own right too eventually.
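A sketch of step 2 in C: build the per-pixel sample mask by evaluating the unadjusted edge equations at each sample position. The plane layout is made up; only the "coverage bit per sample" idea comes from the description above.

  #include <stdint.h>

  /* One triangle edge as a half-plane: inside if a*x + b*y + c >= 0.
   * Illustrative layout, not llvmpipe's actual plane struct. */
  struct edge_eq {
      float a, b, c;
  };

  /* Bit i of the result is set if sample i of the pixel at (px, py)
   * lies inside all three unadjusted triangle edges. */
  static uint32_t compute_sample_mask(const struct edge_eq edges[3],
                                      float px, float py,
                                      const float (*sample_pos)[2],
                                      unsigned num_samples)
  {
      uint32_t mask = 0;

      for (unsigned s = 0; s < num_samples; s++) {
          float x = px + sample_pos[s][0];
          float y = py + sample_pos[s][1];
          unsigned inside = 1;

          for (unsigned e = 0; e < 3; e++)
              inside &= (edges[e].a * x + edges[e].b * y + edges[e].c >= 0.0f);

          mask |= inside << s;
      }
      return mask;
  }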
>
> Jose
>
> ------
> *From:* mesa-dev  on behalf of
> Marek Olšák 
> *Sent:* Wednesday, January 6, 2021 05:57
> *To:* Brian Paul 
> *Cc:* mesa-dev@lists.freedesktop.org ;
> mesa-us...@lists.freedesktop.org 
> *Subject:* Re: [Mesa-dev] Fwd: [Mesa-users] Issues with removal of
> classic OSMesa
>
> Hi,
>
> llvmpipe could implement line and polygon smoothing by rasterizing in MSAA
> and passing the coverage to SampleMaskIn in the fragment shader, but doing
> Z/S tests and color writes and everything else single-sampled. Then,
> FragColor.a *= bitcount(SampleMaskIn) / (float)num_samples. It's roughly
> what OpenGL requires. There is at least one other gallium driver that does
> that.
>
> Marek
>
> On Mon, Jan 4, 2021 at 3:02 PM Brian Paul  wrote:
>
> Hi Andreas,
>
> I'm forwarding your message to the mesa-dev list for better visibility.
>
> BTW, when you say "antialiasing" below, what exactly do you mean?
>
> -Brian
>
>
>  Forwarded Message 
> Subject:[Mesa-users] Issues with removal of classic OSMesa
> Date:   Thu, 31 Dec 2020 12:56:04 +0100
> From:   Andreas Fänger 
> To: mesa-us...@lists.freedesktop.org
>
> Hi,
>
> I've just seen that classic OSMesa has been removed (again) from Mesa3D
> a few weeks ago with this commit "mesa: Retire classic OSMesa".
>
> We are still actively using classical OSMesa for high quality rendering
> of still images in a headless environment with no GPU support
> (server-based rendering on windows and linux)
>
> Unfortunately, none of the alternative software renderers provide all
> the features that we require, which is antialiasing and anisotropic
> filtering. The current state is (correct me if I'm wrong)
>
> * softpipe: anisotropic filtering is supported, no antialiasing
>
> * llvmpipe: no anisotropic filtering, has MSAA
>
> * openswr: no anisotropic filtering, has MSAA, no OSMesa interface (?)
>
> We had hoped that classical OSMesa is only removed when there is a full
> replacement after the discussions in 2016 when OSMesa was about to be
> removed for the first time
>
> https://lists.freedesktop.org/archives/mesa-dev/2016-March/109665.html
>
> https://lists.freedesktop.org/archives/mesa-users/2016-March/001132.html
>
> and the commit that reverted the removal
>
>
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=9601815b4be886f4d92bf74916de98f3bdb7275c

[Mesa-dev] Applying fixes to stable branches

2021-01-06 Thread Marek Olšák
Hi,

How do you apply the fixes?

Is it possible to pick a random commit in master and apply all fixes that
are newer than that commit?

Thanks,
Marek


Re: [Mesa-dev] Fwd: [Mesa-users] Issues with removal of classic OSMesa

2021-01-05 Thread Marek Olšák
Hi,

llvmpipe could implement line and polygon smoothing by rasterizing in MSAA
and passing the coverage to SampleMaskIn in the fragment shader, but doing
Z/S tests and color writes and everything else single-sampled. Then,
FragColor.a *= bitcount(SampleMaskIn) / (float)num_samples. It's roughly
what OpenGL requires. There is at least one other gallium driver that does
that.
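The smoothing itself then boils down to a popcount at the end of the fragment shader; in C terms (a sketch of the math only, not actual llvmpipe code):

  #include <stdint.h>

  /* Coverage-to-alpha: scale the fragment's alpha by the fraction of
   * covered samples, everything else stays single-sampled. */
  static float smooth_alpha(float frag_alpha, uint32_t sample_mask_in,
                            unsigned num_samples)
  {
      return frag_alpha * (float)__builtin_popcount(sample_mask_in) /
             (float)num_samples;
  }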

Marek

On Mon, Jan 4, 2021 at 3:02 PM Brian Paul  wrote:

> Hi Andreas,
>
> I'm forwarding your message to the mesa-dev list for better visibility.
>
> BTW, when you say "antialiasing" below, what exactly do you mean?
>
> -Brian
>
>
>  Forwarded Message 
> Subject:[Mesa-users] Issues with removal of classic OSMesa
> Date:   Thu, 31 Dec 2020 12:56:04 +0100
> From:   Andreas Fänger 
> To: mesa-us...@lists.freedesktop.org
>
> Hi,
>
> I've just seen that classic OSMesa has been removed (again) from Mesa3D
> a few weeks ago with this commit "mesa: Retire classic OSMesa".
>
> We are still actively using classical OSMesa for high quality rendering
> of still images in a headless environment with no GPU support
> (server-based rendering on windows and linux)
>
> Unfortunately, none of the alternative software renderers provide all
> the features that we require, which is antialiasing and anisotropic
> filtering. The current state is (correct me if I'm wrong)
>
> * softpipe: anisotropic filtering is supported, no antialiasing
>
> * llvmpipe: no anisotropic filtering, has MSAA
>
> * openswr: no anisotropic filtering, has MSAA, no OSMesa interface (?)
>
> We had hoped that classical OSMesa is only removed when there is a full
> replacement after the discussions in 2016 when OSMesa was about to be
> removed for the first time
>
> https://lists.freedesktop.org/archives/mesa-dev/2016-March/109665.html
>
> https://lists.freedesktop.org/archives/mesa-users/2016-March/001132.html
>
> and the commit that reverted the removal
>
>
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=9601815b4be886f4d92bf74916de98f3bdb7275c
>
> Are there any plans to enhance the renderers so that at least one of
> them is providing both anisotropic filtering and antialiasing?
>
> As far as I know, anisotropic texture filtering is also one of the
> OpenGL 4.6 requirements.
>
> In 2016 I was told that there are only very few developers involved in
> llvmpipe and that chances are not high that someone is going to port the
> softpipe anisotropic filtering implementation as llvmpipe is much more
> complex. Is there any change in that situation?
>
> If there are no such plans, is there any chance of reverting this commit
> again so that classical OSMesa is available for windows and linux in
> mesa >20?
>
> Regards,
>
> Andreas Fänger
>


Re: [Mesa-dev] Mesa 20.2.x and GL_RG8_SNORM/GL_NONE

2020-10-17 Thread Marek Olšák
If CPU overhead is a problem, mesa_glthread=true is likely to give you a
large performance boost and it should be conformant.

Marek

On Fri., Oct. 16, 2020, 00:28 Jason Ekstrand,  wrote:

> Generally, you need to be careful with forcing no_error.  Some apps
> rely on gl errors to check for features and other things.
> Force-disabling errors may break the app.  Mesa's implementation of
> the no_error extension has been a gradual process where people have
> been removing the error checking paths one at a time as time permits.
> It's entirely possible that an error checking path that the app relies
> on got removed from Mesa.  The solution to this is to stop forcing
> no_error.
>
> --Jason
>
> On Thu, Oct 15, 2020 at 9:17 PM Daniel Mota Leite 
> wrote:
> >
> > >   Since i updated to mesa 20.2.0 and then to 20.2.1, i'm unable to
> > > load war thunder game, it now just returns:
> > >
> > > Unsupported format/type: GL_RG8_SNORM/GL_NONE
> >
> > The other user found that running as a different user the game
> ran,
> > so after some debug, we found that we had the mesa_no_error" option
> > enabled, from past tests. With this option disabled, the game runs again
> > without any problem.
> >
> > Still, we do not know if the problem was triggered by a game update or
> > by the mesa 20.2.x update. The no_error did give a small performance
> > increase in the past.
> >
> > Best regards
> > higuita
> > --
> > Naturally the common people don't want war... but after all it is the
> > leaders of a country who determine the policy, and it is always a
> > simple matter to drag the people along, whether it is a democracy, or
> > a fascist dictatorship, or a parliament, or a communist dictatorship.
> > Voice or no voice, the people can always be brought to the bidding of
> > the leaders. That is easy. All you have to do is tell them they are
> > being attacked, and denounce the pacifists for lack of patriotism and
> > exposing the country to danger.  It works the same in every country.
> >-- Hermann Goering, Nazi and war criminal, 1883-1946


Re: [Mesa-dev] Rust drivers in Mesa

2020-10-04 Thread Marek Olšák
I think it's just going to get more messy and complicated for people who
don't want to learn or use another language. Mesa already requires people
to know C, Python, and now newly Gitlab CI scripts just to get stuff done
and merged. Another language would only exacerbate the issue and steepen
the learning curve.

Marek

On Thu., Oct. 1, 2020, 21:36 Alyssa Rosenzweig, <
alyssa.rosenzw...@collabora.com> wrote:

> Hi all,
>
> Recently I've been thinking about the potential for the Rust programming
> language in Mesa. Rust bills itself a safe system programming language
> with comparable performance to C [0], which is a naturally fit for
> graphics driver development.
>
> Mesa today is written primarily in C, a notoriously low-level language,
> with some components in C++. To handle the impedance mismatch, we've
> built up a number of abstractions in-tree, including multiple ad hoc
> code generators (GenXML, NIR algebraic passes, Bifrost disassembler). A
> higher level language can help avoid the web of metaprogramming and
> effect code that is simpler and easier to reason about. Similarly, a
> better type system can aid static analysis.
>
> Beyond abstraction, Rust's differentiating feature is the borrow checker
> to guarantee memory safety. Historically, safety has not been a primary
> concern of graphics drivers, since drivers are implemented as regular
> userspace code running in the process of the application calling them.
> Unfortunately, now that OpenGL is being exposed to untrusted code via
> WebGL, the driver does become an attack vector.
>
> For the time being, Mesa attempts to minimize memory bugs with defensive
> programming, safe in-tree abstractions (including ralloc), and static
> analysis via Coverity. Nevertheless, these are all heuristic solutions.
> Static analysis is imperfect and in our case, proprietary software.
> Ideally, the bugs we've been fixing via Coverity could be caught at
> compile-time with a free and open source toolchain.
>
> As Rust would allow exactly this, I see the primary benefit of Rust in
> verifying correctness and robustness, rather than security concerns per
> se.  Indeed, safety guarantees do translate well beyond WebGL.
>
> Practically, how would Rust fit in with our existing C codebase?
> Obviously I'm not suggesting a rewrite of Mesa's more than 15 million
> lines of C. Instead, I see value in introducing Rust in targeted parts
> of the tree. In particular, I envision backend compilers written in part
> in Rust. While creating an idiomatic Rust wrapper for NIR or Gallium
> would be prohibitively costly for now, a backend compiler could be
> written in Rust with IR builders exported for use of the NIR -> backend
> IR translator written in C.
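A hedged sketch of what the C side of such a boundary might look like; every name here is made up purely for illustration:

  /* The NIR -> backend-IR translator stays in C and only sees an opaque
   * handle plus a few builder entry points exported from the Rust side. */
  struct backend_ir;   /* opaque, owned by the Rust code */

  struct backend_ir *backend_ir_create(void);
  void backend_ir_emit_alu(struct backend_ir *ir, unsigned op,
                           unsigned dst, unsigned src0, unsigned src1);
  void backend_ir_destroy(struct backend_ir *ir);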
>
> This would have minimal impact on the tree. Users that are not building
> such a driver would be unaffected. For those who _are_ building Rust
> code, the Rust compiler would be added as a build-time dependency and
> the (statically linked) Rust standard library would be added as a
> runtime dependency. There is concern about the Rust compiler requiring
> LLVM as a dependency, but again this is build-time, and no worse than
> Mesa already requiring LLVM as a runtime dependency for llvmpipe and
> clover. As for the standard library, it is possible to eliminate the
> dependency as embedded Rust does, perhaps calling out to the C standard
> library via the FFI, but this is likely quixotic. I do regret the binary
> size increase, however.
>
> Implications for the build system vary. Rust prefers to be built by its
> own package manager, Cargo, which is tricky to integrate with other
> build systems. Actually, Meson has native support for Rust, invoking the
> compiler directly and skipping Cargo, as if it were C code. This support
> is not widely adopted as it prevents linking with external libraries
> ("crates", in Rust parlance), with discussions between Rust and Meson
> developers ending in a stand-still [1]. For Mesa, this might be just
> fine. Our out-of-tree run-time dependencies are minimal for the C code,
> and Rust's standard library largely avoids the need for us to maintain a
> Rust version of util/ in-tree. If this proves impractical in the
> long-term, it is possible to integrate Cargo with Meson on our end [2].
>
> One outstanding concern is build-time, which has been a notorious
> growing pain for Rust due to both language design and LLVM itself [3],
> although there is active work to improve both fronts [4][5]. I build
> Mesa on my Arm laptop, so I suppose I'd be hurt more than many of us.
> There's also awkward bootstrapping questions, but there is work here too
> [6].
>
> If this is of interest, please discuss. It's clear to me Rust is not
> going away any time soon, and I see value in Mesa embracing the new
> technology. I'd like to hear other Mesa developers' thoughts.
>
> Thanks,
>
> Alyssa
>
> [0] https://www.rust-lang.org/
> [1] https://github.com/mesonbuild/meson/issues/2173
> [2] 

Re: [Mesa-dev] issue about context reference

2020-09-30 Thread Marek Olšák
Hi,

Does the issue happen with mesa/master?

Marek


On Mon, Sep 28, 2020 at 3:11 AM Zhu Yijun  wrote:

> hi all,
>
> I use qemu/kvm to boot some android guests with virgl and run apks,
> the host kernel invokes oom after several hours.
>
> 1. From /proc/meminfo, I see the 'SUnreclaim' is the largest part.
>
> MemTotal: 16553672 kB
> MemFree: 128688 kB
> MemAvailable: 34648 kB
> Slab: 10169908 kB
> SReclaimable: 64632 kB
> SUnreclaim: 10105276 kB
>
> 2. From slabinfo, 'kmalloc-8192' nearly uses 5G memory which is the
> largest part of slab.
>
> kmalloc-8192 566782 566782 8192 4 8 : tunables 0 0 0 : slabdata 141697
> 141697 0
>
> 3. Then I append 'slub_debug=U,kmalloc-8192' to host kernel command
> line to reproduce this issue, find the call number of amdgpu_ctx_free
> is much less than amdgpu_ctx_alloc after running a few minutes test
> with only one android guest. It is the reason that 'kmalloc-8192' slab
> memory increases continuously.
>
> #cat /sys/kernel/slab/kmalloc-8192/alloc_calls
>   2 __vring_new_virtqueue+0x64/0x188 [virtio_ring]
> age=2779779/2779779/2779779 pid=1069 cpus=19 nodes=0
>   1 rd_alloc_device+0x34/0x48 [target_core_mod] age=2776755
> pid=1969 cpus=20 nodes=0
>   2 mb_cache_create+0x7c/0x128 [mbcache]
> age=2777018/2777221/2777425 pid=1186-1810 cpus=3,36 nodes=0
>   2 ext4_fill_super+0x128/0x25b0 [ext4]
> age=2777019/2777222/2777426 pid=1186-1810 cpus=3,36 nodes=0
>   2 svc_rqst_alloc+0x3c/0x170 [sunrpc] age=2775427/2775462/2775497
> pid=2346-2636 cpus=36-37 nodes=0
>   2 cache_create_net+0x4c/0xc0 [sunrpc]
> age=2737590/2757403/2777217 pid=1280-4987 cpus=20,44 nodes=0
>   2 rpc_alloc_iostats+0x2c/0x60 [sunrpc]
> age=2775494/2775495/2775497 pid=2346 cpus=36 nodes=0
>1570 amdgpu_ctx_init+0xb4/0x2a0 [amdgpu] age=30110/314435/1914218
> pid=63167 cpus=1-7,9-10,16-20,23,27,29-35,40-47,52,60,63,95,118,120,122-123
> nodes=0
>1570 amdgpu_ctx_ioctl+0x198/0x2f8 [amdgpu] age=30110/314435/1914218
> pid=63167 cpus=1-7,9-10,16-20,23,27,29-35,40-47,52,60,63,95,118,120,122-123
> nodes=0
>   2 gfx_v8_0_init_microcode+0x290/0x740 [amdgpu]
> age=2776838/2776924/2777011 pid=660 cpus=64 nodes=0
>   2 construct+0xe0/0x4b8 [amdgpu] age=2776819/2776901/2776983
> pid=660 cpus=64 nodes=0
>   2 mod_freesync_create+0x68/0x1d0 [amdgpu]
> age=2776819/2776901/2776983 pid=660 cpus=64 nodes=0
>   1 kvm_set_irq_routing+0xa8/0x2c8 [kvm_arm_0] age=1909635
> pid=63172 cpus=56 nodes=0
>   1 fat_fill_super+0x5c/0xc20 [fat] age=2777014 pid=1817 cpus=49
> nodes=0
>  11 cgroup1_mount+0x180/0x4e0 age=2779901/2779901/2779911 pid=1
> cpus=1 nodes=0
>  12 kvmalloc_node+0x64/0xa8 age=35454/1370665/2776188
> pid=2176-63167 cpus=2,23,34,42,44 nodes=0
> 128 zswap_dstmem_prepare+0x48/0x78 age=2780252/2780252/2780252
> pid=1 cpus=19 nodes=0
>   1 register_leaf_sysctl_tables+0x9c/0x1d0 age=2786535 pid=0 cpus=0
> nodes=0
>   2 do_register_framebuffer+0x298/0x300
> age=2779680/2783032/2786385 pid=1-656 cpus=0,5 nodes=0
>   1 vc_do_resize+0xb4/0x570 age=2786385 pid=1 cpus=5 nodes=0
>   5 vc_allocate+0x144/0x218 age=2776216/2776219/2776224 pid=2019
> cpus=40 nodes=0
>   8 arm_smmu_device_probe+0x2d8/0x640 age=2780865/2780894/2780924
> pid=1 cpus=0 nodes=0
>   4 __usb_create_hcd+0x44/0x258 age=2780467/2780534/2780599
> pid=5-660 cpus=0,64 nodes=0
>   2 xhci_alloc_virt_device+0x9c/0x308 age=2780463/2780476/2780489
> pid=5-656 cpus=0 nodes=0
>   1 hid_add_field+0x120/0x320 age=2780373 pid=1 cpus=19 nodes=0
>   2 hid_allocate_device+0x2c/0x100 age=2780345/2780362/2780380
> pid=1 cpus=19 nodes=0
>   1 ipv4_sysctl_init_net+0x44/0x148 age=2737590 pid=4987 cpus=44
> nodes=0
>   1 ipv4_sysctl_init_net+0xa8/0x148 age=2737590 pid=4987 cpus=44
> nodes=0
>   1 ipv4_sysctl_init_net+0xf8/0x148 age=2780293 pid=1 cpus=19 nodes=0
>   1 netlink_proto_init+0x60/0x19c age=2786498 pid=1 cpus=0 nodes=0
>   1 ip_rt_init+0x3c/0x20c age=2786473 pid=1 cpus=3 nodes=0
>   1 ip_rt_init+0x6c/0x20c age=2786472 pid=1 cpus=3 nodes=0
>   1 udp_init+0xa0/0x108 age=2786472 pid=1 cpus=4 nodes=0
>
> # cat /sys/kernel/slab/kmalloc-8192
> #cat /sys/kernel/slab/kmalloc-8192/free_calls
>1473  age=4297679817 pid=0 cpus=0 nodes=0
>  46 rpc_free+0x5c/0x80 [sunrpc] age=1760585/1918856/1935279
> pid=33422-68056 cpus=32,34,38,40-42,48,55,57,59,61-63 nodes=0
>   1 rpc_free_iostats+0x14/0x20 [sunrpc] age=2776482 pid=2346 cpus=36
> nodes=0
> 122 free_user_work+0x30/0x40 [ipmi_msghandler]
> age=59465/347716/1905020 pid=781-128311 cpus=32-46,50,52,63 nodes=0
> 740 amdgpu_ctx_fini+0x98/0xc8 [amdgpu] age=32012/286664/1910687
> pid=63167-63222
> cpus=1-11,16-24,27,29-35,40,42-45,47,52,60,63,95,118,120,122-123
> nodes=0
> 719 amdgpu_ctx_fini+0xb0/0xc8 [amdgpu] age=31957/287696/1910687
> pid=63167-63222
> cpus=1-7,10-11,13,16-24,27,29-35,40-47,52,57,60,63,95,118,120,122-123
> nodes=0
>   1 dc_release_state+0x3c/0x48 [amdgpu] age=2777920 

Re: [Mesa-dev] [PATCH v3 2/4] gallium/auxiliary: Add helper support for bptc format compress/decompress

2020-09-22 Thread Marek Olšák
bptc-float-modes is fixed by:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6774

Marek

On Tue, Sep 22, 2020 at 4:33 AM Denis Pauk  wrote:

> Hi Dave,
>
> Could you please check
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6809/diffs ?
>
> It contains possible fixes for bptc rgba unorm. The bug is related to
> incorrect reuse of the current bit_offset between iterations.
>
> Decompress rgb_float has been left without fixes for now.
>
> On Sat, May 23, 2020 at 12:47 PM Denis Pauk  wrote:
>
>> Hi Dave,
>>
>> I had tested code before only with bptc-modes and bptc-float-modes from
>> piglit. It was free time project, so no real tests.
>>
>> Code had reused implementation from intel classic driver if i correctly
>> remember.
>>
>> Maybe something wrong with pixel type conversion. I will check.
>>
>> On Sun, May 10, 2020 at 10:26 AM Dave Airlie  wrote:
>>
>>> On Wed, 27 Jun 2018 at 06:36, Denis Pauk  wrote:
>>> >
>>> > Reuse code shared with mesa/main/texcompress_bptc.
>>> >
>>> > v2: Use block decompress function
>>> > v3: Include static bptc code from texcompress_bptc_tmp.h
>>> > Suggested-by: Marek Olšák 
>>> >
>>> > Signed-off-by: Denis Pauk 
>>> > CC: Nicolai Hähnle 
>>> > CC: Marek Olšák 
>>> > CC: Gert Wollny 
>>> > ---
>>>
>>> Hi Denis,
>>>
>>> not sure you are still around or interested in this code, but I've
>>> recently run Vulkan CTS over it and it fails some bc7 tests.
>>>
>>> It also fails a piglit test that I'm not sure is related or not yet.
>>>
>>> It only seems to be a corner case failure, like 6 or 7 pixels in the
>>> CTS cases, but I'm wondering if you have any insight or memory of
>>> where it might diverge.
>>>
>>> Dave.
>>>
>>
>>
>> --
>> Best regards,
>>   Denis.
>>
>
>
> --
> Best regards,
>   Denis.


[Mesa-dev] Gallium interface rename proposals

2020-09-19 Thread Marek Olšák
Hi,

I don't know if you have been following gitlab, but there are a few
cleanups that I have been considering doing.

Rename PIPE_TRANSFER flags to PIPE_MAP, and pipe_transfer_usage to
pipe_map_flags:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5749

Other proposed renames:

transfer_map -> resource_map
transfer_unmap -> resource_unmap
transfer_flush_region -> resource_flush_mapped_range
draw_vbo -> draw

pipe_transfer_* aux helpers -> pipe_resource_* or pipe_texture_* depending
on context. We already have pipe_buffer_map.

I'm inclined to keep the struct pipe_transfer name unchanged to indicate
that mappings can cause internal copies.

Please let me know your preferences.

Thanks,
Marek


Re: [Mesa-dev] Gallium: Anybody object to adding a new PIPE_CAP_NIR_ATOMICS_AS_DEREF?

2020-08-08 Thread Marek Olšák
No objections from me.

Marek

On Mon, Aug 3, 2020 at 6:07 AM Gert Wollny 
wrote:

> Hi,
>
> has anybody any objection to adding a new cap
> PIPE_CAP_NIR_ATOMICS_AS_DEREF, see:
>
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6025/diffs?commit_id=1e8c3032b96a4878f6fee44aaa10ca341da97e1f
>
> Otherwise I'd like to merge this to move forward with r600/nir atomics
> support,
>
> many thanks,
> Gert
>
>
>


Re: [Mesa-dev] Unit and performance tests for VA-API

2020-06-29 Thread Marek Olšák
+ Leo

On Sat., Jun. 27, 2020, 06:30 Jahnvi Gupta,  wrote:

> Greeting!
> I want to contribute to Project "Unit and performance tests for VA-API".
> Please update me on the status of the project. I would also like to know
> any Prerequisite for the project. Some pointers for getting started will be
> very helpful.
>
> Rgds
> Jahnvi
>
>
>


Re: [Mesa-dev] mediump support: future work

2020-05-04 Thread Marek Olšák
16-bit varyings only make sense if they are packed, i.e. we need to fit 2
16-bit 4D varyings into 1 vec4 slot to save memory for IO. Without that,
AMD (and most others?) won't benefit from 16-bit IO much.
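Concretely, packing means each 32-bit component of a vec4 slot carries two 16-bit channels, so two fp16 4-component varyings share one slot. A C sketch of the layout (an illustration, not driver code):

  #include <stdint.h>

  /* One vec4 varying slot is 4 x 32-bit components.  Packed, each component
   * holds two 16-bit channels, i.e. two fp16 vec4 varyings per slot. */
  union varying_slot {
      uint32_t dwords[4];   /* how the hardware sees the slot */
      uint16_t halfs[8];    /* two packed fp16 vec4 varyings */
  };

  /* Pack two 16-bit channels into one 32-bit component. */
  static uint32_t pack_2x16(uint16_t lo, uint16_t hi)
  {
      return (uint32_t)lo | ((uint32_t)hi << 16);
  }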

16-bit uniforms would help everybody, because there is potential for
uniform packing, saving memory (and cache lines).

The other items are just for eliminating conversion instructions. We must
have more vectorized 16-bit vec2 instructions than "conversion
instructions + vec2 packing instructions" for mediump to pay off. We also
don't get decreased register usage if we are not vectorized, so mediump is
a tough sell at the moment.

Marek

On Mon, May 4, 2020 at 7:03 PM Rob Clark  wrote:

> On Mon, May 4, 2020 at 11:44 AM Marek Olšák  wrote:
> >
> > Hi,
> >
> > This is the status of mediump support in Mesa. What I listed is what AMD
> GPUs can do. "Yes" means what Mesa supports.
> >
> > Feature                                               FP16 support   Int16 support
> > ALU                                                   Yes            No
> > Uniforms                                              No             No
> > VS in                                                 No             No
> > VS out / FS in                                        No             No
> > FS out                                                No             No
> > TCS, TES, GS out / in                                 No             No
> > Sampler coordinates (only coord, derivs, lod, bias;
> >   not offset and compare)                             No             ---
> > Image coordinates                                     ---            No
> > Return value from samplers (incl. sampler buffers)    Yes            No
> > Return value from image loads (incl. image buffers)   No             No
> > Data source for image stores (incl. image buffers)    No             No
> > If 16-bit sampler/image instructions are surrounded
> >   by conversions, promote them to 32 bits             No             No
> >
> > Please let me know if you don't see the table correctly.
> >
> > I'd like to know if I can enable some of them using the existing FP16
> CAP. The only drivers supporting FP16 are currently Freedreno and Panfrost.
> >
>
> I think in general it should be ok.
>
> I think for ir3 we want 32b inputs/outputs for geom stages
> (vs/hs/ds/gs).  For frag outs we use nir_lower_mediump_outputs.. maybe
> this is a good approach to continue, to use a simple nir lowering pass
> for cases where a shader stage can directly take 16b input/output.
> For frag inputs we fold the narrowing conversion in to the varying
> fetch instruction in backend.
>
> int16 would be pretty useful, for loop counters especially.. these can
> have a long live-range and currently wastefully occupy a full 32b reg.
>
> Uniforms we haven't cared too much about, since we can (usually) read
> a 32b uniform as a 16b and fold that directly into alu instructions..
> we handle that in the backend.
>
> Pushing mediump support further would be great, and we can definitely
> help if it ends up needing changes in freedreno backend.  The deqp
> coverage in CI should give us pretty good confidence about whether or
> not we are breaking things in the ir3 backend.
>
> BR,
> -R
>


[Mesa-dev] mediump support: future work

2020-05-04 Thread Marek Olšák
Hi,

This is the status of mediump support in Mesa. What I listed is what AMD
GPUs can do. "Yes" means what Mesa supports.

Feature                                                 FP16 support   Int16 support
ALU                                                     Yes            No
Uniforms                                                No             No
VS in                                                   No             No
VS out / FS in                                          No             No
FS out                                                  No             No
TCS, TES, GS out / in                                   No             No
Sampler coordinates (only coord, derivs, lod, bias;
  not offset and compare)                               No             ---
Image coordinates                                       ---            No
Return value from samplers (incl. sampler buffers)      Yes            No
Return value from image loads (incl. image buffers)     No             No
Data source for image stores (incl. image buffers)      No             No
If 16-bit sampler/image instructions are surrounded
  by conversions, promote them to 32 bits               No             No

Please let me know if you don't see the table correctly.

I'd like to know if I can enable some of them using the existing FP16 CAP.
The only drivers supporting FP16 are currently Freedreno and Panfrost.

Thanks,
Marek


Re: [Mesa-dev] [ANNOUNCE] mesa 20.0.5

2020-04-23 Thread Marek Olšák
I don't think this was preventable given how many things went wrong.
Sometimes we have to accept that shit happens. We can fix that in the next
release.

Marek

On Thu., Apr. 23, 2020, 07:01 Danylo Piliaiev, 
wrote:

> I want to sum up what happened from my perspective, I think it could be
> useful to improve the process:
>
> 1) There was a regression in 20.*
> (https://gitlab.freedesktop.org/mesa/mesa/-/issues/2758)
> 2) I "fixed" the regression and broke non-iris drivers
> (
> https://gitlab.freedesktop.org/mesa/mesa/-/commit/d684fb37bfbc47d098158cb03c0672119a4469fe
> )
> 3) I "fixed" the new regression of fix 2)
> (
> https://gitlab.freedesktop.org/mesa/mesa/-/commit/829013d0cad0fa2513b32ae07cf8d745f6e5c62d
> )
> 4) After that, it appeared that due to a bug in piglit, Intel CI didn't
> run piglit tests which gave me a false sense of commit's correctness
>(https://gitlab.freedesktop.org/mesa/mesa/-/issues/2815)
> 5) I aimed to have a fix before the next minor release on 2020-04-29 by
> consulting the release calendar.
> 6) I accidentally saw 20.0.5 being released with one of the two of my
> commits.
>
> I see multiple failure points:
> 1) Me not carefully examining all code paths, because at least one that
> failed should have been obvious to test.
> 2) CI not communicating that piglit tests were not executed. Again, I
> could have seen this, examined CI results, but did not.
> 3) After CI capabilities were restored, the test was added to "expected
> failure", and this may have contributed to the regression
> being missed when testing the release. I'm not sure about this part
> so correct me if I'm wrong.
> 4) I didn't know about this release and that this release was held up
> for the fix of 2758.
> 5) There was no window between announcing the scope of the release and the
> release itself. Since I knew about the regression
> I could have notified about it. Also there is no milestone for minor
> releases so it's problematic to link issue and release.
>
> It's a second release in a row with clear regression crept in. I believe
> that we can use this to improve the process and
> safeguard against such regressions in the future.
>
> P.S. I'm preparing and will test a final fix which will be sent soon.
>
> Danylo
>
> On 23.04.20 07:40, Dylan Baker wrote:
> > Quoting Ilia Mirkin (2020-04-22 15:47:59)
> >> On Wed, Apr 22, 2020 at 6:39 PM Danylo Piliaiev
> >>  wrote:
> >>> I'm sorry for this trouble. However looking at it I think: maybe
> something could be
> >>> changed about applying patches to stable to safeguard against such
> issues.
> >> We used to get pre-announcements a few days prior to a release which
> >> would allow developers to look things over, which would allow us to
> >> notice things like that. Not sure when this got dropped.
> >>
> >> Cheers,
> >>
> >>-ilia
> > That was dropped in favor of a live staging branch that is updated daily
> (at
> > least on week days).
> >
> > Dylan
>


Re: [Mesa-dev] Remove classic drivers or fork src/mesa for gallium?

2020-04-07 Thread Marek Olšák
The first milestone:
- make src/compiler a standalone lib

is done: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4400

Marek

On Mon, Mar 30, 2020 at 11:40 AM Kristian Høgsberg 
wrote:

> On Mon, Mar 30, 2020, 7:59 AM Adam Jackson  wrote:
>
>> On Sun, 2020-03-29 at 09:45 -0700, Kristian Høgsberg wrote:
>>
>> > As for loading, doesn't glvnd solve that?
>>
>> It could. It does not. Right now there's not a (good) way for the DDX
>> driver to communicate a preferred implementation name to the GLX
>> client. xserver's glx just knows it needs an implementation named mesa,
>> and nvidia's glx, nvidia. Not a hard thing to wire up, and in fact you
>> can give multiple names and the client side will try them in sequence
>> so fallback stands a chance of working.
>>
>> Now, if we're doing that, we should maybe consider using glvnd's
>> libGLdispatch directly, as I think right now we have an ugly double-
>> indirection between glHamSandwichEXT and _mesa_HamSandwichEXT if you're
>> building for glvnd. The only thing in the world besides Mesa that cares
>> about glapi and what a DRI driver interface is is xserver, and that's a
>> detail I'd like to eliminate and the new EGL-backed GLX in Xwayland
>> gets really close to eliminating it. But if nobody else gets excited
>> that much about fixing GLX, I completely understand.
>>
>
> Yeah it would make sense to disable the double dispatch and it would be
> tempting to get rid of dri driver loading entirely then...
>
> - ajax
>>
>>


Re: [Mesa-dev] [PATCH] radeonsi: fix Segmentation fault during vaapi enc test

2020-04-06 Thread Marek Olšák
Hi James,

We use gitlab for merge requests and pushing:
https://www.mesa3d.org/submittingpatches.html

Marek


On Mon, Apr 6, 2020 at 2:12 PM James Zhu  wrote:

> Fix Segmentation fault during vaapi enc test on Arcturus.
>
> Signed-off-by: James Zhu 
> ---
>  src/gallium/drivers/radeonsi/si_compute_blit.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_compute_blit.c
> b/src/gallium/drivers/radeonsi/si_compute_blit.c
> index 6e3b07c..a56676a 100644
> --- a/src/gallium/drivers/radeonsi/si_compute_blit.c
> +++ b/src/gallium/drivers/radeonsi/si_compute_blit.c
> @@ -63,7 +63,8 @@ static void si_launch_grid_internal(struct si_context
> *sctx, struct pipe_grid_in
> sctx->flags |= SI_CONTEXT_STOP_PIPELINE_STATS;
> sctx->render_cond_force_off = true;
> /* Skip decompression to prevent infinite recursion. */
> -   sctx->blitter->running = true;
> +   if (sctx->blitter)
> +  sctx->blitter->running = true;
>
> /* Dispatch compute. */
> sctx->b.launch_grid(&sctx->b, info);
> @@ -72,7 +73,8 @@ static void si_launch_grid_internal(struct si_context
> *sctx, struct pipe_grid_in
> sctx->flags &= ~SI_CONTEXT_STOP_PIPELINE_STATS;
> sctx->flags |= SI_CONTEXT_START_PIPELINE_STATS;
> sctx->render_cond_force_off = false;
> -   sctx->blitter->running = false;
> +   if (sctx->blitter)
> +  sctx->blitter->running = false;
>  }
>
>  static void si_compute_clear_12bytes_buffer(struct si_context *sctx,
> struct pipe_resource *dst,
> --
> 2.7.4
>


Re: [Mesa-dev] Remove classic drivers or fork src/mesa for gallium?

2020-03-29 Thread Marek Olšák
If you want a complete fork or removal, that's fine with me. It's
technically more challenging for driver loaders and packaging though.

Marek

On Sun., Mar. 29, 2020, 02:51 Jason Ekstrand,  wrote:

> On Sat, Mar 28, 2020 at 11:41 PM Marek Olšák  wrote:
> >
> > The #include spaghetti will be resolved before the split. I don't care
> about including gallium, but no code will include src/mesa outside of that.
>
> If we make sure that we modify the #include guards on every header in
> src/mesa_classic so that any accidental cross-includes lead to double
> definitions and therefore compile errors, that would probably help.
> It'd certainly give me a lot more confidence that we've done it right.
>
> > The biggest part is to make src/compiler completely independent and
> that's a worthy goal by itself.
>
> Yeah, I've wanted to see that happen since we started splitting stuff
> out to make Vulkan a possibility.
>
> > Milestones:
> > - make src/compiler a standalone lib
> > - don't include src/mesa in other places
> > - split classic drivers into src/mesa_classic
>
> If you're willing to do the work, I guess I'm not opposed for now.
>
> That said, I also have some somewhat selfish reasons for wanting to do
> a fork.  In particular, I'd like to freeze i965 and possibly Gen7
> Vulkan in time so that we can stop maintaining the i965 and the old
> vec4 back-end compiler.  Even though we're not adding features, I
> still find myself having to fix those up from time to time due to
> reworks elsewhere.  Maybe the answer is to copy and isolate them too
> but, at that point, why not just put them in a branch?
>
> --Jason
>
>
> > Marek
> >
> > On Sun., Mar. 29, 2020, 00:08 Jason Ekstrand, 
> wrote:
> >>
> >> On Wed, Mar 25, 2020 at 6:32 PM Marek Olšák  wrote:
> >> >
> >> >
> >> >
> >> > On Thu, Dec 5, 2019 at 2:58 AM Kenneth Graunke 
> wrote:
> >> >>
> >> >> On Tuesday, December 3, 2019 4:39:15 PM PST Marek Olšák wrote:
> >> >> > Hi,
> >> >> >
> >> >> > Here are 2 proposals to simplify and better optimize the
> GL->Gallium
> >> >> > translation.
> >> >> >
> >> >> > 1) Move classic drivers to a fork of Mesa, and remove them from
> master.
> >> >> > Classic drivers won't share any code with master. glvnd will load
> them, but
> >> >> > glvnd is not ready for this yet.
> >> >> >
> >> >> > 2) Keep classic drivers. Fork src/mesa for Gallium. I think only
> mesa/main,
> >> >> > mesa/vbo, mesa/program, and drivers/dri/common need to be forked
> and
> >> >> > mesa/state_tracker moved. src/gallium/state-trackers/gl/ can be
> the target
> >> >> > location.
> >> >> >
> >> >> > Option 2 is more acceptable to people who want to keep classic
> drivers in
> >> >> > the tree and it can be done right now.
> >> >> >
> >> >> > Opinions?
> >> >> >
> >> >> > Thanks,
> >> >> > Marek
> >> >>
> >> >> FWIW, I am not in favor of either plan for the time being.
> >> >>
> >> >> - I agree with Eric that we're finally starting to clean up and
> >> >>   de-duplicate things, and pay off technical debt we've put off for
> >> >>   a long time.  I think forking everything in-tree would just be a
> >> >>   giant mess.
> >> >>
> >> >> - I agree with Dave that this seems to be a giant hassle for our
> >> >>   downstreams with little benefit for them in the short term.
> >> >
> >> >
> >> > If classic drivers were moved to src/mesa_classic, nothing would
> change for downstream.
> >>
> >> I'm concerned that doing that is just asking for more maintenance
> >> problems than we have today.  One of the problems we currently have is
> >> that our #includes are still spaghetti.  We've got stuff in src/util
> >> which includes stuff in gallium as well as stuff in mesa/main.  If we
> >> even have one cross-wired include, it could lead to bizarre and nearly
> >> impossible to trace bugs due to ABI incompatibility between the two
> >> copies of src/mesa the moment we start changing structs or function
> >> prototypes.  The obvious answer to this is "we'll sort out the
> >> spaghetti mess".  However, that also means serious changes to
> >> src/

Re: [Mesa-dev] Remove classic drivers or fork src/mesa for gallium?

2020-03-28 Thread Marek Olšák
The #include spaghetti will be resolved before the split. I don't care
about including gallium, but no code will include src/mesa outside of that.
The biggest part is to make src/compiler completely independent and that's
a worthy goal by itself.

Milestones:
- make src/compiler a standalone lib
- don't include src/mesa in other places
- split classic drivers into src/mesa_classic

Marek

On Sun., Mar. 29, 2020, 00:08 Jason Ekstrand,  wrote:

> On Wed, Mar 25, 2020 at 6:32 PM Marek Olšák  wrote:
> >
> >
> >
> > On Thu, Dec 5, 2019 at 2:58 AM Kenneth Graunke 
> wrote:
> >>
> >> On Tuesday, December 3, 2019 4:39:15 PM PST Marek Olšák wrote:
> >> > Hi,
> >> >
> >> > Here are 2 proposals to simplify and better optimize the GL->Gallium
> >> > translation.
> >> >
> >> > 1) Move classic drivers to a fork of Mesa, and remove them from
> master.
> >> > Classic drivers won't share any code with master. glvnd will load
> them, but
> >> > glvnd is not ready for this yet.
> >> >
> >> > 2) Keep classic drivers. Fork src/mesa for Gallium. I think only
> mesa/main,
> >> > mesa/vbo, mesa/program, and drivers/dri/common need to be forked and
> >> > mesa/state_tracker moved. src/gallium/state-trackers/gl/ can be the
> target
> >> > location.
> >> >
> >> > Option 2 is more acceptable to people who want to keep classic
> drivers in
> >> > the tree and it can be done right now.
> >> >
> >> > Opinions?
> >> >
> >> > Thanks,
> >> > Marek
> >>
> >> FWIW, I am not in favor of either plan for the time being.
> >>
> >> - I agree with Eric that we're finally starting to clean up and
> >>   de-duplicate things, and pay off technical debt we've put off for
> >>   a long time.  I think forking everything in-tree would just be a
> >>   giant mess.
> >>
> >> - I agree with Dave that this seems to be a giant hassle for our
> >>   downstreams with little benefit for them in the short term.
> >
> >
> > If classic drivers were moved to src/mesa_classic, nothing would change
> for downstream.
>
> I'm concerned that doing that is just asking for more maintenance
> problems than we have today.  One of the problems we currently have is
> that our #includes are still spaghetti.  We've got stuff in src/util
> which includes stuff in gallium as well as stuff in mesa/main.  If we
> even have one cross-wired include, it could lead to bizarre and nearly
> impossible to trace bugs due to ABI incompatibility between the two
> copies of src/mesa the moment we start changing structs or function
> prototypes.  The obvious answer to this is "we'll sort out the
> spaghetti mess".  However, that also means serious changes to
> src/compiler/glsl because it includes mtypes.h.  Do we clone that too?
>  I honestly think that this particular cure is likely far worse than
> the disease of having to do careful testing of src/mesa changes.
>
> IMO, a far better plan would be to give it one more year so that
> distros and users get experience with iris and then fork off an LTS
> branch and delete all the legacy stuff from master.  We can continue
> to do maintenance in the LTS branch as needed but I honestly don't
> expect much work to happen there.  The most difficult thing will be
> deciding what to remove from master but I don't want to make that
> decision today.  However, doing a clean fork means we don't have to
> worry about cross-contamination in shared code causing weird issues
> because they're in separate branches.  We will have to figure out a
> few loader issues to ensure that the master drivers get preferred over
> the LTS ones or somehow disable all master drivers in the LTS branch.
>
> --Jason
>


Re: [Mesa-dev] Mesa repo commit access

2020-03-26 Thread Marek Olšák
Hi Alexandros,

I recommend submitting more merge requests to have a bigger commit history.

Ultimately the virgl maintainers will have to decide this.

Marek

On Wed, Mar 25, 2020 at 10:00 AM Alexandros Frantzis <
alexandros.frant...@collabora.com> wrote:

> Hi everyone,
>
> I would like to request commit access to the Mesa repo (user
> 'afrantzis'). For the record:
>
> mesa$ git log --author="Alexandros Frantzis" --oneline origin/master | wc
> -l
> 30
>
> Please let me know if you need something more from me.
>
> Thanks,
> Alexandros


Re: [Mesa-dev] Drop scons for 20.1?

2020-03-26 Thread Marek Olšák
In the long term we should reduce the complexity of the project. scons is a
maintenance burden. Every time I break the scons build and the CI reports
it, can I politely ask you to fix my MR instead of me doing it? Then at
least the real maintenance cost would be known to scons supporters, instead
of the cost being invisible to most.

In the meantime, I think we can remove all parts of scons that VMware does
NOT care about. Do you need haiku-softpipe? Do you need graw-null? Do you
need swr? glx? There is a bunch you don't really need on Windows.

Marek

On Wed, Feb 26, 2020 at 3:44 PM Jose Fonseca  wrote:

> We already solved some pieces (e.g., how to consume and use Meson, while
> following our internal legal process required for adding new 3rd party
> dependencies), and figured a way to consume Meson build without having to
> migrate lots of internal build logic from Scons to Meson.  But other stuff
> just keeps getting higher priority, and we haven't fully migrated.
>
> Please do understand, SCons *just* *works* for us.  We are making
> progress with Meson.  It's just not the highest priority, when time is
> short, it gets deferred.
>
> I don't understand the rush.  If it was trivial and easy we'd obviously
> would have done it.
>
> Jose
>
> --
> *From:* Jason Ekstrand 
> *Sent:* Wednesday, February 26, 2020 04:15
> *To:* Rob Clark ; Kristian Høgsberg <
> hoegsb...@gmail.com>
> *Cc:* mesa-dev ; Dylan Baker <
> baker.dyla...@gmail.com>; Jose Fonseca ; Brian Paul <
> bri...@vmware.com>
> *Subject:* Re: [Mesa-dev] Drop scons for 20.1?
>
> +Jose & Brian
>
> I'm not personally opposed but I also can't remember the last time I had
> to
> fix the scons build. I think it's been years. Maybe that's because I don't
> work on GL anymore? In any case, I don't know that it's really costing us
> that much given that basically none of the drivers actually build with it.
> But fat meh, I guess.
>
> --Jason
>
> On February 25, 2020 21:56:30 Rob Clark  wrote:
>
> > It looks like we have 4 scons build jobs in CI.. I'm not sure how much
> > that costs us, but I guess those cycles could be put to better use?
> > So even ignoring the developer-cycles issue (ie. someone making
> > changes that effects scons build, and has to setup a scons build env
> > to fix breakage of their MR) I guess there is at least an argument to
> > remove scons from CI.  Whether it is worth keeping a dead build system
> > after it is removed from CI is an issue that I'm ambivalent about.
> >
> > BR,
> > -R
> >
> > On Tue, Feb 25, 2020 at 3:42 PM Kristian Høgsberg 
> wrote:
> >>
> >> It's been a while since Dylan did the work to make meson support
> >> Windows and there's been plenty of time to provide feedback or improve
> >> argue why we still need scons. I haven't seen any such discussion and
> >> I think we've waited long enough.
> >>
> >> Let's drop scons for the next release and move things forward?
> >>
> >> Kristian


Re: [Mesa-dev] Remove classic drivers or fork src/mesa for gallium?

2020-03-25 Thread Marek Olšák
On Thu, Dec 5, 2019 at 2:58 AM Kenneth Graunke 
wrote:

> On Tuesday, December 3, 2019 4:39:15 PM PST Marek Olšák wrote:
> > Hi,
> >
> > Here are 2 proposals to simplify and better optimize the GL->Gallium
> > translation.
> >
> > 1) Move classic drivers to a fork of Mesa, and remove them from master.
> > Classic drivers won't share any code with master. glvnd will load them,
> but
> > glvnd is not ready for this yet.
> >
> > 2) Keep classic drivers. Fork src/mesa for Gallium. I think only
> mesa/main,
> > mesa/vbo, mesa/program, and drivers/dri/common need to be forked and
> > mesa/state_tracker moved. src/gallium/state-trackers/gl/ can be the
> target
> > location.
> >
> > Option 2 is more acceptable to people who want to keep classic drivers in
> > the tree and it can be done right now.
> >
> > Opinions?
> >
> > Thanks,
> > Marek
>
> FWIW, I am not in favor of either plan for the time being.
>
> - I agree with Eric that we're finally starting to clean up and
>   de-duplicate things, and pay off technical debt we've put off for
>   a long time.  I think forking everything in-tree would just be a
>   giant mess.
>
> - I agree with Dave that this seems to be a giant hassle for our
>   downstreams with little benefit for them in the short term.
>

If classic drivers were moved to src/mesa_classic, nothing would change for
downstream.


>
> - Shuffling r100/r200/i915/nouveau_vieux off to a legacy fork seems
>   like a fine plan.  They're ancient hardware that can't (or barely
>   can) do GL 2.x.  Nothing much has happened with them in years,
>   and I'm not sure all of them even really have maintainers.
>
>   The main blocker here I think would be ironing out all the glvnd
>   issues and making sure that is working solidly for everyone.
>   (FWIW, glvnd is not even enabled by default in our build system!)
>
> - Shuffling i965 off to legacy is really premature I think.  Iris
>   isn't even shipping in distros yet (hopefully soon!), and even
>   then, we have a _ton_ of Haswell users.  Check the Phoronix
>   comments if you want to see how pissed off people are about even
>   bringing up this topic.  (Or better yet, don't...basic human
>   decency toward others seems to be lacking.  Hooray, internet...)
>
> - Writing a Gallium driver for Intel Gen4-7.5 would be interesting.
>
>   I've been thinking about this some, and it might be possible to
>   retain some of the niceties of the iris memory model even on older
>   hardware, by relying on a modern kernel and possibly making some
>   (hopefully minor) improvements.  Even if we didn't, going back to
>   the i965 model wouldn't be the worst thing.  The new driver would
>   almost certainly be faster than i965, if not as good as iris.
>
>   ajax and I were planning to call it crocus, if I wrote one.  I don't
>   think it would take nearly as long.  But it's still a bunch of time
>   that I don't think I can justify spending.  The new hardware can do
>   so much more, and is so much faster, and much lower power.  I think
>   it would be better for me to spend my time on Gen11+.
>

There is also "ilo", which supports gen6-7, but not GL 4.x and NIR, and
probably isn't as fast as i965 (that might be fixable with NIR):
https://cgit.freedesktop.org/mesa/mesa/commit/?id=38794259175852084532499a09dec85b6c6a4321


>
> - Vulkan has really taken off, and OpenGL feels increasingly like a
>   dead end.  DXVK has brought more games than we had with native ports.
>   Feral is even reworking some native OpenGL ports to use Vulkan.  New
>   graphics features are happening in the Vulkan space.
>

There are markets like workstation where OpenGL is the main API. OpenGL is
also still relevant in some other markets. New games have transitioned to
Vulkan but not good old games.

I don't think Zink can outperform an optimized Gallium driver across all
legacy features, so it will only be useful in a small subset of use cases.

Marek


[Mesa-dev] meson doesn't build gallium drivers in parallel with src/mesa

2020-03-20 Thread Marek Olšák
Hi,

I think the problem is that src/mesa is linked using "link_with", while
gallium drivers are linked using "dependencies" in meson, but I might be
wrong.

It looks like meson only compiles the dependencies in "link_with" in
parallel, then waits for completion, and then meson looks at
"dependencies", which triggers python scripts like sid_tables_h WHICH RUN
SINGLE-THREADED UNTIL COMPLETION, and then meson starts another wave of
compilation.

Any idea how to fix this?

Thanks,
Marek


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-19 Thread Marek Olšák
On Thu., Mar. 19, 2020, 06:51 Daniel Vetter,  wrote:

> On Tue, Mar 17, 2020 at 11:01:57AM +0100, Michel Dänzer wrote:
> > On 2020-03-16 7:33 p.m., Marek Olšák wrote:
> > > On Mon, Mar 16, 2020 at 5:57 AM Michel Dänzer 
> wrote:
> > >> On 2020-03-16 4:50 a.m., Marek Olšák wrote:
> > >>> The synchronization works because the Mesa driver waits for idle
> (drains
> > >>> the GFX pipeline) at the end of command buffers and there is only 1
> > >>> graphics queue, so everything is ordered.
> > >>>
> > >>> The GFX pipeline runs asynchronously to the command buffer, meaning
> the
> > >>> command buffer only starts draws and doesn't wait for completion. If
> the
> > >>> Mesa driver didn't wait at the end of the command buffer, the command
> > >>> buffer would finish and a different process could start execution of
> its
> > >>> own command buffer while shaders of the previous process are still
> > >> running.
> > >>>
> > >>> If the Mesa driver submits a command buffer internally (because it's
> > >> full),
> > >>> it doesn't wait, so the GFX pipeline doesn't notice that a command
> buffer
> > >>> ended and a new one started.
> > >>>
> > >>> The waiting at the end of command buffers happens only when the
> flush is
> > >>> external (Swap buffers, glFlush).
> > >>>
> > >>> It's a performance problem, because the GFX queue is blocked until
> the
> > >> GFX
> > >>> pipeline is drained at the end of every frame at least.
> > >>>
> > >>> So explicit fences for SwapBuffers would help.
> > >>
> > >> Not sure what difference it would make, since the same thing needs to
> be
> > >> done for explicit fences as well, doesn't it?
> > >
> > > No. Explicit fences don't require userspace to wait for idle in the
> command
> > > buffer. Fences are signalled when the last draw is complete and caches
> are
> > > flushed. Before that happens, any command buffer that is not dependent
> on
> > > the fence can start execution. There is never a need for the GPU to be
> idle
> > > if there is enough independent work to do.
> >
> > I don't think explicit fences in the context of this discussion imply
> > using that different fence signalling mechanism though. My understanding
> > is that the API proposed by Jason allows implicit fences to be used as
> > explicit ones and vice versa, so presumably they have to use the same
> > signalling mechanism.
> >
> >
> > Anyway, maybe the different fence signalling mechanism you describe
> > could be used by the amdgpu kernel driver in general, then Mesa could
> > drop the waits for idle and get the benefits with implicit sync as well?
>
> Yeah, this is entirely about the programming model visible to userspace.
> There shouldn't be any impact on the driver's choice of a top vs. bottom
> of the gpu pipeline used for synchronization, that's entirely up to what
> you're hw/driver/scheduler can pull off.
>
> Doing a full gfx pipeline flush for shared buffers, when your hw can do
> be, sounds like an issue to me that's not related to this here at all. It
> might be intertwined with amdgpu's special interpretation of dma_resv
> fences though, no idea. We might need to revamp all that. But for a
> userspace client that does nothing fancy (no multiple render buffer
> targets in one bo, or vk style "I write to everything all the time,
> perhaps" stuff) there should be 0 perf difference between implicit sync
> through dma_resv and explicit sync through sync_file/syncobj/dma_fence
> directly.
>
> If there is I'd consider that a bit a driver bug.
>

Last time I checked, there was no fence sync in gnome shell and compiz
after an app passes a buffer to it. So drivers have to invent hacks to work
around it, which decreases performance. It's not a driver bug.

Implicit sync really means that apps and compositors don't sync, so the
driver has to guess when it should sync.

Marek


-Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
>


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-17 Thread Marek Olšák
On Tue., Mar. 17, 2020, 06:02 Michel Dänzer,  wrote:

> On 2020-03-16 7:33 p.m., Marek Olšák wrote:
> > On Mon, Mar 16, 2020 at 5:57 AM Michel Dänzer 
> wrote:
> >> On 2020-03-16 4:50 a.m., Marek Olšák wrote:
> >>> The synchronization works because the Mesa driver waits for idle
> (drains
> >>> the GFX pipeline) at the end of command buffers and there is only 1
> >>> graphics queue, so everything is ordered.
> >>>
> >>> The GFX pipeline runs asynchronously to the command buffer, meaning the
> >>> command buffer only starts draws and doesn't wait for completion. If
> the
> >>> Mesa driver didn't wait at the end of the command buffer, the command
> >>> buffer would finish and a different process could start execution of
> its
> >>> own command buffer while shaders of the previous process are still
> >> running.
> >>>
> >>> If the Mesa driver submits a command buffer internally (because it's
> >> full),
> >>> it doesn't wait, so the GFX pipeline doesn't notice that a command
> buffer
> >>> ended and a new one started.
> >>>
> >>> The waiting at the end of command buffers happens only when the flush
> is
> >>> external (Swap buffers, glFlush).
> >>>
> >>> It's a performance problem, because the GFX queue is blocked until the
> >> GFX
> >>> pipeline is drained at the end of every frame at least.
> >>>
> >>> So explicit fences for SwapBuffers would help.
> >>
> >> Not sure what difference it would make, since the same thing needs to be
> >> done for explicit fences as well, doesn't it?
> >
> > No. Explicit fences don't require userspace to wait for idle in the
> command
> > buffer. Fences are signalled when the last draw is complete and caches
> are
> > flushed. Before that happens, any command buffer that is not dependent on
> > the fence can start execution. There is never a need for the GPU to be
> idle
> > if there is enough independent work to do.
>
> I don't think explicit fences in the context of this discussion imply
> using that different fence signalling mechanism though. My understanding
> is that the API proposed by Jason allows implicit fences to be used as
> explicit ones and vice versa, so presumably they have to use the same
> signalling mechanism.
>
>
> Anyway, maybe the different fence signalling mechanism you describe
> could be used by the amdgpu kernel driver in general, then Mesa could
> drop the waits for idle and get the benefits with implicit sync as well?
>

Yes. If there is any waiting, it should be done in the GPU scheduler, not
in the command buffer, so that independent command buffers can use the GFX
queue.

Marek


>
> --
> Earthling Michel Dänzer   |   https://redhat.com
> Libre software enthusiast | Mesa and X developer
>


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-16 Thread Marek Olšák
On Mon, Mar 16, 2020 at 5:57 AM Michel Dänzer  wrote:

> On 2020-03-16 4:50 a.m., Marek Olšák wrote:
> > The synchronization works because the Mesa driver waits for idle (drains
> > the GFX pipeline) at the end of command buffers and there is only 1
> > graphics queue, so everything is ordered.
> >
> > The GFX pipeline runs asynchronously to the command buffer, meaning the
> > command buffer only starts draws and doesn't wait for completion. If the
> > Mesa driver didn't wait at the end of the command buffer, the command
> > buffer would finish and a different process could start execution of its
> > own command buffer while shaders of the previous process are still
> running.
> >
> > If the Mesa driver submits a command buffer internally (because it's
> full),
> > it doesn't wait, so the GFX pipeline doesn't notice that a command buffer
> > ended and a new one started.
> >
> > The waiting at the end of command buffers happens only when the flush is
> > external (Swap buffers, glFlush).
> >
> > It's a performance problem, because the GFX queue is blocked until the
> GFX
> > pipeline is drained at the end of every frame at least.
> >
> > So explicit fences for SwapBuffers would help.
>
> Not sure what difference it would make, since the same thing needs to be
> done for explicit fences as well, doesn't it?
>

No. Explicit fences don't require userspace to wait for idle in the command
buffer. Fences are signalled when the last draw is complete and caches are
flushed. Before that happens, any command buffer that is not dependent on
the fence can start execution. There is never a need for the GPU to be idle
if there is enough independent work to do.
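
To put that in API terms: with explicit sync the dependency travels with the
submission itself. A minimal sketch with a Vulkan timeline semaphore (not
Mesa code; "queue", "timeline" and the two command buffers are placeholders
assumed to be created elsewhere, with the semaphore being of
VK_SEMAPHORE_TYPE_TIMELINE):

#include <vulkan/vulkan.h>

/* cmd_b declares its dependency on cmd_a explicitly; everything else on
 * the queue can run without any wait-for-idle in between. */
static void submit_pair(VkQueue queue, VkSemaphore timeline,
                        VkCommandBuffer cmd_a, VkCommandBuffer cmd_b)
{
   uint64_t value = 1;
   VkPipelineStageFlags wait_stage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;

   VkTimelineSemaphoreSubmitInfo ts_signal = {
      .sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO,
      .signalSemaphoreValueCount = 1,
      .pSignalSemaphoreValues = &value,
   };
   VkSubmitInfo submit_a = {
      .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
      .pNext = &ts_signal,
      .commandBufferCount = 1,
      .pCommandBuffers = &cmd_a,
      .signalSemaphoreCount = 1,
      .pSignalSemaphores = &timeline,  /* signaled when cmd_a's work is done */
   };

   VkTimelineSemaphoreSubmitInfo ts_wait = {
      .sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO,
      .waitSemaphoreValueCount = 1,
      .pWaitSemaphoreValues = &value,
   };
   VkSubmitInfo submit_b = {
      .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
      .pNext = &ts_wait,
      .waitSemaphoreCount = 1,
      .pWaitSemaphores = &timeline,    /* only cmd_b waits on the fence */
      .pWaitDstStageMask = &wait_stage,
      .commandBufferCount = 1,
      .pCommandBuffers = &cmd_b,
   };

   VkSubmitInfo submits[] = { submit_a, submit_b };
   vkQueueSubmit(queue, 2, submits, VK_NULL_HANDLE);
}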

Marek


Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-15 Thread Marek Olšák
The synchronization works because the Mesa driver waits for idle (drains
the GFX pipeline) at the end of command buffers and there is only 1
graphics queue, so everything is ordered.

The GFX pipeline runs asynchronously to the command buffer, meaning the
command buffer only starts draws and doesn't wait for completion. If the
Mesa driver didn't wait at the end of the command buffer, the command
buffer would finish and a different process could start execution of its
own command buffer while shaders of the previous process are still running.

If the Mesa driver submits a command buffer internally (because it's full),
it doesn't wait, so the GFX pipeline doesn't notice that a command buffer
ended and a new one started.

The waiting at the end of command buffers happens only when the flush is
external (Swap buffers, glFlush).

It's a performance problem, because the GFX queue is blocked until the GFX
pipeline is drained at the end of every frame at least.

So explicit fences for SwapBuffers would help.
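
To make the difference concrete, a toy sketch (hypothetical code, not the
real radeonsi flush path): only the externally visible flush has to drain
the pipeline, and that drain is exactly what explicit fences would remove.

#include <stdbool.h>
#include <stdio.h>

struct cmdbuf { unsigned num_dw; };   /* stand-in for a GFX IB */

static void emit_wait_for_idle(struct cmdbuf *cs)
{
   /* A real driver would emit end-of-pipe and cache-flush packets here. */
   printf("  emit: drain GFX pipeline + flush caches\n");
   cs->num_dw++;
}

static void submit(struct cmdbuf *cs)
{
   printf("  submit %u dwords to the single graphics queue\n", cs->num_dw);
   cs->num_dw = 0;
}

/* external == SwapBuffers/glFlush: another process may consume the result. */
static void flush(struct cmdbuf *cs, bool external)
{
   if (external)
      emit_wait_for_idle(cs);   /* serializes against the next process */
   submit(cs);                  /* internal flushes just keep pipelining */
}

int main(void)
{
   struct cmdbuf cs = { .num_dw = 100 };

   printf("internal flush (IB got full):\n");
   flush(&cs, false);

   cs.num_dw = 100;
   printf("external flush (SwapBuffers):\n");
   flush(&cs, true);
   return 0;
}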

Marek

On Sun., Mar. 15, 2020, 22:49 Jason Ekstrand,  wrote:

> Could you elaborate. If there's something missing from my mental model of
> how implicit sync works, I'd like to have it corrected. People continue
> claiming that AMD is somehow special but I have yet to grasp what makes it
> so.  (Not that anyone has bothered to try all that hard to explain it.)
>
>
> --Jason
>
> On March 13, 2020 21:03:21 Marek Olšák  wrote:
>
>> There is no synchronization between processes (e.g. 3D app and
>> compositor) within X on AMD hw. It works because of some hacks in Mesa.
>>
>> Marek
>>
>> On Wed, Mar 11, 2020 at 1:31 PM Jason Ekstrand 
>> wrote:
>>
>>> All,
>>>
>>> Sorry for casting such a broad net with this one. I'm sure most people
>>> who reply will get at least one mailing list rejection.  However, this
>>> is an issue that affects a LOT of components and that's why it's
>>> thorny to begin with.  Please pardon the length of this e-mail as
>>> well; I promise there's a concrete point/proposal at the end.
>>>
>>>
>>> Explicit synchronization is the future of graphics and media.  At
>>> least, that seems to be the consensus among all the graphics people
>>> I've talked to.  I had a chat with one of the lead Android graphics
>>> engineers recently who told me that doing explicit sync from the start
>>> was one of the best engineering decisions Android ever made.  It's
>>> also the direction being taken by more modern APIs such as Vulkan.
>>>
>>>
>>> ## What are implicit and explicit synchronization?
>>>
>>> For those that aren't familiar with this space, GPUs, media encoders,
>>> etc. are massively parallel and synchronization of some form is
>>> required to ensure that everything happens in the right order and
>>> avoid data races.  Implicit synchronization is when bits of work (3D,
>>> compute, video encode, etc.) are implicitly based on the absolute
>>> CPU-time order in which API calls occur.  Explicit synchronization is
>>> when the client (whatever that means in any given context) provides
>>> the dependency graph explicitly via some sort of synchronization
>>> primitives.  If you're still confused, consider the following
>>> examples:
>>>
>>> With OpenGL and EGL, almost everything is implicit sync.  Say you have
>>> two OpenGL contexts sharing an image where one writes to it and the
>>> other textures from it.  The way the OpenGL spec works, the client has
>>> to make the API calls to render to the image before (in CPU time) it
>>> makes the API calls which texture from the image.  As long as it does
>>> this (and maybe inserts a glFlush?), the driver will ensure that the
>>> rendering completes before the texturing happens and you get correct
>>> contents.
>>>
>>> Implicit synchronization can also happen across processes.  Wayland,
>>> for instance, is currently built on implicit sync where the client
>>> does their rendering and then does a hand-off (via wl_surface::commit)
>>> to tell the compositor it's done at which point the compositor can now
>>> texture from the surface.  The hand-off ensures that the client's
>>> OpenGL API calls happen before the server's OpenGL API calls.
>>>
>>> A good example of explicit synchronization is the Vulkan API.  There,
>>> a client (or multiple clients) can simultaneously build command
>>> buffers in different threads where one of those command buffers
>>> renders to an image and the other textures from it and then submit
>>> both of

Re: [Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem

2020-03-13 Thread Marek Olšák
There is no synchronization between processes (e.g. 3D app and compositor)
within X on AMD hw. It works because of some hacks in Mesa.

Marek

On Wed, Mar 11, 2020 at 1:31 PM Jason Ekstrand  wrote:

> All,
>
> Sorry for casting such a broad net with this one. I'm sure most people
> who reply will get at least one mailing list rejection.  However, this
> is an issue that affects a LOT of components and that's why it's
> thorny to begin with.  Please pardon the length of this e-mail as
> well; I promise there's a concrete point/proposal at the end.
>
>
> Explicit synchronization is the future of graphics and media.  At
> least, that seems to be the consensus among all the graphics people
> I've talked to.  I had a chat with one of the lead Android graphics
> engineers recently who told me that doing explicit sync from the start
> was one of the best engineering decisions Android ever made.  It's
> also the direction being taken by more modern APIs such as Vulkan.
>
>
> ## What are implicit and explicit synchronization?
>
> For those that aren't familiar with this space, GPUs, media encoders,
> etc. are massively parallel and synchronization of some form is
> required to ensure that everything happens in the right order and
> avoid data races.  Implicit synchronization is when bits of work (3D,
> compute, video encode, etc.) are implicitly based on the absolute
> CPU-time order in which API calls occur.  Explicit synchronization is
> when the client (whatever that means in any given context) provides
> the dependency graph explicitly via some sort of synchronization
> primitives.  If you're still confused, consider the following
> examples:
>
> With OpenGL and EGL, almost everything is implicit sync.  Say you have
> two OpenGL contexts sharing an image where one writes to it and the
> other textures from it.  The way the OpenGL spec works, the client has
> to make the API calls to render to the image before (in CPU time) it
> makes the API calls which texture from the image.  As long as it does
> this (and maybe inserts a glFlush?), the driver will ensure that the
> rendering completes before the texturing happens and you get correct
> contents.
>
> Implicit synchronization can also happen across processes.  Wayland,
> for instance, is currently built on implicit sync where the client
> does their rendering and then does a hand-off (via wl_surface::commit)
> to tell the compositor it's done at which point the compositor can now
> texture from the surface.  The hand-off ensures that the client's
> OpenGL API calls happen before the server's OpenGL API calls.
>
> A good example of explicit synchronization is the Vulkan API.  There,
> a client (or multiple clients) can simultaneously build command
> buffers in different threads where one of those command buffers
> renders to an image and the other textures from it and then submit
> both of them at the same time with instructions to the driver for
> which order to execute them in.  The execution order is described via
> the VkSemaphore primitive.  With the new VK_KHR_timeline_semaphore
> extension, you can even submit the work which does the texturing
> BEFORE the work which does the rendering and the driver will sort it
> out.
>
> The #1 problem with implicit synchronization (which explicit solves)
> is that it leads to a lot of over-synchronization both in client space
> and in driver/device space.  The client has to synchronize a lot more
> because it has to ensure that the API calls happen in a particular
> order.  The driver/device have to synchronize a lot more because they
> never know what is going to end up being a synchronization point as an
> API call on another thread/process may occur at any time.  As we move
> to more and more multi-threaded programming this synchronization (on
> the client-side especially) becomes more and more painful.
>
>
> ## Current status in Linux
>
> Implicit synchronization in Linux works via a the kernel's internal
> dma_buf and dma_fence data structures.  A dma_fence is a tiny object
> which represents the "done" status for some bit of work.  Typically,
> dma_fences are created as a by-product of someone submitting some bit
> of work (say, 3D rendering) to the kernel.  The dma_buf object has a
> set of dma_fences on it representing shared (read) and exclusive
> (write) access to the object.  When work is submitted which, for
> instance renders to the dma_buf, it's queued waiting on all the fences
> on the dma_buf and and a dma_fence is created representing the end of
> said rendering work and it's installed as the dma_buf's exclusive
> fence.  This way, the kernel can manage all its internal queues (3D
> rendering, display, video encode, etc.) and know which things to
> submit in what order.
>
> For the last few years, we've had sync_file in the kernel and it's
> plumbed into some drivers.  A sync_file is just a wrapper around a
> single dma_fence.  A sync_file is typically created as a by-product of
> submitting work 

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-29 Thread Marek Olšák
For Mesa, we could run CI only when Marge pushes, so that it's a strictly
pre-merge CI.

Marek

On Sat., Feb. 29, 2020, 17:20 Nicolas Dufresne, 
wrote:

> Le samedi 29 février 2020 à 15:54 -0600, Jason Ekstrand a écrit :
> > On Sat, Feb 29, 2020 at 3:47 PM Timur Kristóf 
> wrote:
> > > On Sat, 2020-02-29 at 14:46 -0500, Nicolas Dufresne wrote:
> > > > > 1. I think we should completely disable running the CI on MRs which
> > > > > are
> > > > > marked WIP. Speaking from personal experience, I usually make a lot
> > > > > of
> > > > > changes to my MRs before they are merged, so it is a waste of CI
> > > > > resources.
> > > >
> > > > In the mean time, you can help by taking the habit to use:
> > > >
> > > >   git push -o ci.skip
> > >
> > > Thanks for the advice, I wasn't aware such an option exists. Does this
> > > also work on the mesa gitlab or is this a GStreamer only thing?
> >
> > Mesa is already set up so that it only runs on MRs and branches named
> > ci-* (or maybe it's ci/*; I can't remember).
> >
> > > How hard would it be to make this the default?
> >
> > I strongly suggest looking at how Mesa does it and doing that in
> > GStreamer if you can.  It seems to work pretty well in Mesa.
>
> You are right, they added CI_MERGE_REQUEST_SOURCE_BRANCH_NAME in 11.6
> (we started our CI a while ago). But there is even better now, ou can
> do:
>
>   only:
> refs:
>   - merge_requests
>
> Thanks for the hint, I'll suggest that. I've looked up some of the backend
> of mesa; I think it's really nice, though there are a lot of concepts
> that won't work in a multi-repo CI. Again, I need to refresh on what
> was moved from the enterprise to the community version in this regard,
>
> >
> > --Jason
> >
> >
> > > > That's a much more difficult goal than it looks like. Let each
> > > > project
> > > > manage their CI graph and content, as each case is unique. Running
> > > > more
> > > > tests, or building more code isn't the main issue as the CPU time is
> > > > mostly sponsored. The data transfers between the cloud of gitlab and
> > > > the runners (which are external), along to sending OS image to Lava
> > > > labs is what is likely the most expensive.
> > > >
> > > > As it was already mention in the thread, what we are missing now, and
> > > > being worked on, is per group/project statistics that give us the
> > > > hotspot so we can better target the optimization work.
> > >
> > > Yes, would be nice to know what the hotspot is, indeed.
> > >
> > > As far as I understand, the problem is not CI itself, but the bandwidth
> > > needed by the build artifacts, right? Would it be possible to not host
> > > the build artifacts on the gitlab, but rather only the place where the
> > > build actually happened? Or at least, only transfer the build artifacts
> > > on-demand?
> > >
> > > I'm not exactly familiar with how the system works, so sorry if this is
> > > a silly question.
> > >


Re: [Mesa-dev] [RFC PATCH] mesa: fix _mesa_draw_nonzero_divisor_bits to return nonzero divisors

2020-02-29 Thread Marek Olšák
Rb. I'm on the phone.

M.

On Sat., Feb. 29, 2020, 22:53 Ilia Mirkin,  wrote:

> The bitmask is _EffEnabledNonZeroDivisor, so no need to invert it before
> returning.
>
> Fixes: fd6636ebc06d (st/mesa: simplify determination whether a draw needs
> min/max index)
> Signed-off-by: Ilia Mirkin 
> ---
>
> I haven't done any extensive testing on this, but it does seem to fix a
> regression on nv50 where the limits were not being supplied. (And
> there's an assert to make sure that they are.)
>
>  src/mesa/main/arrayobj.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/arrayobj.h b/src/mesa/main/arrayobj.h
> index 19ab65b3242..3efcd577ae5 100644
> --- a/src/mesa/main/arrayobj.h
> +++ b/src/mesa/main/arrayobj.h
> @@ -221,7 +221,7 @@ _mesa_draw_nonzero_divisor_bits(const struct
> gl_context *ctx)
>  {
> const struct gl_vertex_array_object *const vao = ctx->Array._DrawVAO;
> assert(vao->NewArrays == 0);
> -   return ~vao->_EffEnabledNonZeroDivisor &
> ctx->Array._DrawVAOEnabledAttribs;
> +   return vao->_EffEnabledNonZeroDivisor &
> ctx->Array._DrawVAOEnabledAttribs;
>  }
>
>
> --
> 2.24.1
>
>


Re: [Mesa-dev] Request for "Developer" of mesa project.

2020-02-14 Thread Marek Olšák
Done. You now have the Developer status.

Marek

On Fri, Feb 14, 2020 at 3:52 AM Hyunjun Ko  wrote:

> Hi,
>
> I'm Hyunjun Ko, working on Mesa actively nowadays.
> I've been working for over 1 year since I dived into this attractive
> project, mainly focusing on freedreno.
>
> Now I feel it's the right time to request to grant access.
>
> You can see my works in the past:
> https://cgit.freedesktop.org/mesa/mesa/log/?qt=author=zzoon
>
> And also there are several pending patches for mediump, which are
> expected to land soon.
>
> I would like to appreciate it if this request is approved.
> Thanks.
>
> PS. Now I can't even label "turnip" on the issue!
> https://gitlab.freedesktop.org/mesa/mesa/issues/2514 XD


Re: [Mesa-dev] Profile-guided optimizations

2020-02-13 Thread Marek Olšák
Yeah I guess it reduces instruction cache misses, but then other codepaths
are likely to get more misses.

Does it do anything smarter?

Marek

On Thu., Feb. 13, 2020, 17:52 Dave Airlie,  wrote:

> On Fri, 14 Feb 2020 at 08:22, Marek Olšák  wrote:
> >
> > I wonder what PGO really does other than placing likely/unlikely.
>
> With LTO it can do a lot more, like grouping hot functions into closer
> regions so they avoid TLB misses and faults etc.
>
> Dave.
>


Re: [Mesa-dev] Profile-guided optimizations

2020-02-13 Thread Marek Olšák
I wonder what PGO really does other than placing likely/unlikely.

Marek

On Thu., Feb. 13, 2020, 13:43 Dylan Baker,  wrote:

> I actually spent a bunch of time toying with PGO a couple of years ago. I
> got
> the guidance all working and was able to train it, but what we found was
> that it
> made the specific workloads we threw at it much faster, but it made every
> real
> world use case I tried (playing a game, running piglit, gnome) slower,
> often
> significantly so.
>
> The hard part is not setting up pgo, it's getting the right training data.
>
> Dylan
>
> Quoting Marek Olšák (2020-02-13 10:30:46)
> > [Forked from the other thread]
> >
> > Guys, we could run some simple tests similar to piglit/drawoverhead as
> the last
> > step of the pgo=generate build. Tests like that should exercise the most
> common
> > codepaths in drivers. We could add subtests that we care about the most.
> >
> > Marek
> >
> > On Thu., Feb. 13, 2020, 13:16 Dylan Baker,  wrote:
> >
> > meson has builtins for both of these, -Db_lto=true turns on lto,
> for pgo
> > you
> > would run:
> >
> > meson build -Db_pgo=generate
> > ninja -C build
> > 
> > meson configure build -Db_pgo=use
> > ninja -C build
> >
> > Quoting Marek Olšák (2020-02-12 10:46:12)
> > > How do you enable LTO+PGO? Is it something we could enable by
> default for
> > > release builds?
> > >
> > > Marek
> > >
> > > On Wed, Feb 12, 2020 at 1:56 AM Dieter Nützel <
> die...@nuetzel-hh.de>
> > wrote:
> > >
> > > Hello Gert,
> > >
> > > your merge 'broke' LTO and then later on PGO
> compilation/linking.
> > >
> > > I do generally compiling with '-Dgallium-drivers=
> > r600,radeonsi,swrast'
> > > for testing radeonsi and (your) r600 work. ;-)
> > >
> > > After your merge I get several warnings in 'addrlib' with LTO
> and
> > even a
> > > compiler error (gcc (SUSE Linux) 9.2.1 20200128).
> > >
> > > I had to disable 'r600' ('swrast' is needed for 'nine') to get
> a
> > working
> > > LTO and even better PGO radeonsi driver.
> > > I'm preparing GREAT LTO+PGO (the later is the greater) numbers
> over
> > the
> > > last 2 months. I'll send my results later, today.
> > >
> > > Summary
> > > radeonsi is ~40% smaller and 16-20% faster with PGO (!!!).
> > >
> > > Honza and the GCC people (Intel's ICC folks) do GREAT things.
> > > 'glmark2' numbers are better then 'vkmark'. (Hello, Marek.).
> > >
> > > Need some sleep.
> > >
> > > See my log, below.
> > >
> > > Greetings and GREAT work!
> > >
> > > -Dieter
> > >
> > > Am 09.02.2020 15:46, schrieb Gert Wollny:
> > > > Am Donnerstag, den 23.01.2020, 20:31 +0100 schrieb Gert
> Wollny:
> > > >> has anybody any objections if I merge the r600/NIR code?
> > > >> Without explicitely setting the debug flag it doesn't
> change a
> > > >> thing, but it would be better to continue developing
> in-tree.
> > > > Okay, if nobody objects, I'll merge it Monday evening.
> > > >
> > > > Best,
> > > > Gert
> > >
> > > [1425/1433] Linking target
> src/gallium/targets/dri/libgallium_dri.so.
> > > FAILED: src/gallium/targets/dri/libgallium_dri.so
> > > c++  -o src/gallium/targets/dri/libgallium_dri.so
> > > 'src/gallium/targets/dri/8381c20@@gallium_dri@sha/target.c.o'
> -flto
> > > -fprofile-generate -Wl,--as-needed -Wl,--no-undefined -Wl,-O1
> -shared
> > > -fPIC -Wl,--start-group -Wl,-soname,libgallium_dri.so
> > > src/mesa/libmesa_gallium.a src/mesa/libmesa_common.a
> > > src/compiler/glsl/libglsl.a src/compiler/glsl/glcpp/libglcpp.a
> > > src/util/libmesa_util.a src/util/format/libmesa_format.a
> > > src/compiler/nir/libnir.a src/compiler/libcompiler.a
> > > src/mesa/libmesa_sse41.a
> src/mesa/drivers/dri/common/libdricommon.a
> > > src/mesa/drivers/dri/common/libmegadriver_stub.a
> > > 

[Mesa-dev] Profile-guided optimizations

2020-02-13 Thread Marek Olšák
[Forked from the other thread]

Guys, we could run some simple tests similar to piglit/drawoverhead as the
last step of the pgo=generate build. Tests like that should exercise the
most common codepaths in drivers. We could add subtests that we care about
the most.
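
Something tiny would probably do as the training step. A hypothetical sketch
(not piglit code; it assumes a current GL context plus a VAO and program
created elsewhere, and uses libepoxy for the GL entry points):

#include <epoxy/gl.h>

/* drawoverhead-style loop: lots of small draws so the profile data covers
 * the driver's hot draw/validation path rather than the GPU. */
void pgo_training_draws(GLuint vao, GLuint prog, int iterations)
{
   glBindVertexArray(vao);
   glUseProgram(prog);

   for (int i = 0; i < iterations; i++) {
      /* Toggle a bit of state so state-change codepaths get profiled too. */
      if (i & 1)
         glEnable(GL_DEPTH_TEST);
      else
         glDisable(GL_DEPTH_TEST);

      glDrawArrays(GL_TRIANGLES, 0, 3);
   }

   /* Make sure the work is submitted before the process exits and the
    * .gcda profile files get written. */
   glFinish();
}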

Marek

On Thu., Feb. 13, 2020, 13:16 Dylan Baker,  wrote:

> meson has builtins for both of these, -Db_lto=true turns on lto, for pgo
> you
> would run:
>
> meson build -Db_pgo=generate
> ninja -C build
> 
> meson configure build -Db_pgo=use
> ninja -C build
>
> Quoting Marek Olšák (2020-02-12 10:46:12)
> > How do you enable LTO+PGO? Is it something we could enable by default for
> > release builds?
> >
> > Marek
> >
> > On Wed, Feb 12, 2020 at 1:56 AM Dieter Nützel 
> wrote:
> >
> > Hello Gert,
> >
> > your merge 'broke' LTO and then later on PGO compilation/linking.
> >
> > I do generally compiling with
> '-Dgallium-drivers=r600,radeonsi,swrast'
> > for testing radeonsi and (your) r600 work. ;-)
> >
> > After your merge I get several warnings in 'addrlib' with LTO and
> even a
> > compiler error (gcc (SUSE Linux) 9.2.1 20200128).
> >
> > I had to disable 'r600' ('swrast' is needed for 'nine') to get a
> working
> > LTO and even better PGO radeonsi driver.
> > I'm preparing GREAT LTO+PGO (the latter is the greater) numbers over
> the
> > last 2 months. I'll send my results later, today.
> >
> > Summary
> > radeonsi is ~40% smaller and 16-20% faster with PGO (!!!).
> >
> > Honza and the GCC people (Intel's ICC folks) do GREAT things.
> > 'glmark2' numbers are better than 'vkmark'. (Hello, Marek.).
> >
> > Need some sleep.
> >
> > See my log, below.
> >
> > Greetings and GREAT work!
> >
> > -Dieter
> >
> > Am 09.02.2020 15:46, schrieb Gert Wollny:
> > > Am Donnerstag, den 23.01.2020, 20:31 +0100 schrieb Gert Wollny:
> > >> has anybody any objections if I merge the r600/NIR code?
> > >> Without explicitely setting the debug flag it doesn't change a
> > >> thing, but it would be better to continue developing in-tree.
> > > Okay, if nobody objects, I'll merge it Monday evening.
> > >
> > > Best,
> > > Gert
> >
> > [1425/1433] Linking target src/gallium/targets/dri/libgallium_dri.so.
> > FAILED: src/gallium/targets/dri/libgallium_dri.so
> > c++  -o src/gallium/targets/dri/libgallium_dri.so
> > 'src/gallium/targets/dri/8381c20@@gallium_dri@sha/target.c.o' -flto
> > -fprofile-generate -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared
> > -fPIC -Wl,--start-group -Wl,-soname,libgallium_dri.so
> > src/mesa/libmesa_gallium.a src/mesa/libmesa_common.a
> > src/compiler/glsl/libglsl.a src/compiler/glsl/glcpp/libglcpp.a
> > src/util/libmesa_util.a src/util/format/libmesa_format.a
> > src/compiler/nir/libnir.a src/compiler/libcompiler.a
> > src/mesa/libmesa_sse41.a src/mesa/drivers/dri/common/libdricommon.a
> > src/mesa/drivers/dri/common/libmegadriver_stub.a
> > src/gallium/state_trackers/dri/libdri.a
> > src/gallium/auxiliary/libgalliumvl.a
> src/gallium/auxiliary/libgallium.a
> > src/mapi/shared-glapi/libglapi.so.0.0.0
> > src/gallium/auxiliary/pipe-loader/libpipe_loader_static.a
> > src/loader/libloader.a src/util/libxmlconfig.a
> > src/gallium/winsys/sw/null/libws_null.a
> > src/gallium/winsys/sw/wrapper/libwsw.a
> > src/gallium/winsys/sw/dri/libswdri.a
> > src/gallium/winsys/sw/kms-dri/libswkmsdri.a
> > src/gallium/drivers/llvmpipe/libllvmpipe.a
> > src/gallium/drivers/softpipe/libsoftpipe.a
> > src/gallium/drivers/r600/libr600.a
> > src/gallium/winsys/radeon/drm/libradeonwinsys.a
> > src/gallium/drivers/radeonsi/libradeonsi.a
> > src/gallium/winsys/amdgpu/drm/libamdgpuwinsys.a
> > src/amd/addrlib/libaddrlib.a src/amd/common/libamd_common.a
> > src/amd/llvm/libamd_common_llvm.a -Wl,--build-id=sha1
> -Wl,--gc-sections
> > -Wl,--version-script /opt/mesa/src/gallium/targets/dri/dri.sym
> > -Wl,--dynamic-list /opt/mesa/src/gallium/targets/dri/../dri-vdpau.dyn
> > /usr/lib64/libdrm.so -L/usr/local/lib -lLLVM-10git -pthread
> > /usr/lib64/libexpat.so
> > /usr/lib64/gcc/x86_64-suse-linux/9/../../../../lib64/libz.so -lm
> > /usr/lib64/gcc/x86_64-suse-linux/9/../../../../lib64/libzstd.so
> > -L/usr/local/lib -lLLVM-10git /usr/lib64/libunwind.so -

Re: [Mesa-dev] Merging experimental r600/nir code

2020-02-12 Thread Marek Olšák
Can we automate this?

Let's say we implement noop ioctls for radeonsi and iris, and then we run
the drivers to collect pgo data on any hw.

Can meson execute this build sequence:
build with pgo=generate
run tests
clean
build with pgo=use

automated as buildtype=release-pgo.

Marek

On Wed., Feb. 12, 2020, 23:37 Dieter Nützel,  wrote:

> Hello Marek,
>
> I hoped you would ask this...
> ...but first sorry for the delay of my announced numbers.
> Our family is/was sick, my wife more than me and our children are fine,
> again.
> So be lenient with me somewhat.
>
> Am 12.02.2020 19:46, schrieb Marek Olšák:
> > How do you enable LTO+PGO? Is it something we could enable by default
> > for release builds?
> >
> > Marek
>
> I think we can achieve this.
>
> I've been running LTO+PGO 'release' builds since late December (around
> Christmas).
> My KDE Plasma5 (OpenGL 3.0) system/desktop has never been more agile/fluid
> than it is now.
> Even the numbers (glmark2) show it. The 'glmark2' numbers are the best
> I've ever seen on this system.
> LTO offers only a small space reduction and hardly any speedup.
> But LTO+PGO is GREAT.
>
> First I compile with '-Db_lto=true -Db_pgo=generate'.
>
> mkdir build
> cd build
> meson ../ --strip --buildtype release -Ddri-drivers= -Dplatforms=drm,x11
> -Dgallium-drivers=r600,radeonsi,swrast -Dvulkan-drivers=amd
> -Dgallium-nine=true -Dgallium-opencl=standalone -Dglvnd=true
> -Dgallium-va=true -Dgallium-xvmc=false -Dgallium-omx=disabled
> -Dgallium-xa=false -Db_lto=true -Db_pgo=generate
>
> After that my 'build' dir looks like this:
>
> drwxr-xr-x  8 dieter users4096 13. Feb 04:34 .
> drwxr-xr-x 14 dieter users4096 13. Feb 04:33 ..
> drwxr-xr-x  2 dieter users4096 13. Feb 04:34 bin
> -rw-r--r--  1 dieter users 4369873 13. Feb 04:34 build.ninja
> -rw-r--r--  1 dieter users 4236719 13. Feb 04:34 compile_commands.json
> drwxr-xr-x  2 dieter users4096 13. Feb 04:34 include
> drwxr-xr-x  2 dieter users4096 13. Feb 04:34 meson-info
> drwxr-xr-x  2 dieter users4096 13. Feb 04:33 meson-logs
> drwxr-xr-x  2 dieter users4096 13. Feb 04:34 meson-private
> drwxr-xr-x 14 dieter users4096 13. Feb 04:34 src
>
> time nice +19 ninja
>
> It takes ~15 minutes on my aging/'slow' Intel Xeon X3470 Nehalem, 4c/8t,
> 2.93 GHz, 24 GB, Polaris 20.
> Without LTO+PGO it is ~4-5 minutes. (AMD anyone?)
>
> Then I remove all files/dirs except 'src'.
>
> Next 'installing' the new built files under '/usr/local/' (mostly
> symlinked to /usr/lib64/).
>
> Now I run as many OpenGL/Vulkan programs as I can.
> Normally starting with glmark2 and vkmark.
>
> Here comes my (whole) list:
> Knights
> Wireshark
> K3b
> Skanlite
> Kdenlive
> GIMP
> Krita
> FreeCAD
> Blender 2.81x
> digikam
> K4DirStat
> Discover
> YaST
> Do some 'movements'/work in/with every prog.
> +
> some LibreOffice work (OpenGL enabled)
> one or two OpenGL games
> and Vulkan games
> +
> run some WebGL stuff in my browsers (Konqi/FF).
>
> After that I have the needed '*.gcda' files in 'src'.
>
> Now second rebuild in 'src'.
> Due to the deleted files/dirs I can do a second 'meson' config run in my
> current 'build' dir.
>
> meson ../ --strip --buildtype release -Ddri-drivers= -Dplatforms=drm,x11
> -Dgallium-drivers=r600,radeonsi,swrast -Dvulkan-drivers=amd
> -Dgallium-nine=true -Dgallium-opencl=standalone -Dglvnd=true
> -Dgallium-va=true -Dgallium-xvmc=false -Dgallium-omx=disabled
> -Dgallium-xa=false -Db_lto=true -Db_pgo=use
>
> After around 5-6 minutes (!!!) I can install the LTO+PGO 'release' build
> driver files and enjoy the next level of OpenGL speed.
> Vulkan does NOT show such GREAT improvements.
>
> Only the '-Db_lto=true -Db_pgo=generate' build needs ~3 times the compile
> and (mostly) link time.
>
> Below are some memory and speed numbers.
> Should I send an additional post with a better title to the list?
> Hope this helps ;-)))
>
> -Dieter
>
>
> ***
>
> Mesa git 21bc16a723 (somewhat older)
>
> normal
>
> -rwxr-xr-x   4 root root 9525520 13. Jan 20:00
> libvdpau_radeonsi.so.1.0.0
> -rwxr-xr-x   4 root root 9525520 13. Jan 20:00 libvdpau_r600.so.1.0.0
>
> -rwxr-xr-x   8 root root 18444192 13. Jan 20:00 swrast_dri.so
> -rwxr-xr-x   8 root root 18444192 13. Jan 20:00 radeonsi_dri.so
> -rwxr-xr-x   8 root root 18444192 13. Jan 20:00 r600_dri.so
> -rwxr-xr-x   8 root root 18444192 13. Jan 20:00 kms_swrast_dri.so
> -rwxr-xr-x   4 root root  9505072 13. Jan 20:00 radeonsi_drv_video.so
> -rwxr-xr-x   4 root root  9505072 13. Jan 20:00 r600_drv_video.so
>
>
> -Db_lto=true
>

Re: [Mesa-dev] Merging experimental r600/nir code

2020-02-12 Thread Marek Olšák
How do you enable LTO+PGO? Is it something we could enable by default for
release builds?

Marek

On Wed, Feb 12, 2020 at 1:56 AM Dieter Nützel  wrote:

> Hello Gert,
>
> your merge 'broke' LTO and then later on PGO compilation/linking.
>
> I do generally compiling with '-Dgallium-drivers=r600,radeonsi,swrast'
> for testing radeonsi and (your) r600 work. ;-)
>
> After your merge I get several warnings in 'addrlib' with LTO and even a
> compiler error (gcc (SUSE Linux) 9.2.1 20200128).
>
> I had to disable 'r600' ('swrast' is needed for 'nine') to get a working
> LTO and even better PGO radeonsi driver.
> I'm preparing GREAT LTO+PGO (the latter is the greater) numbers over the
> last 2 months. I'll send my results later, today.
>
> Summary
> radeonsi is ~40% smaller and 16-20% faster with PGO (!!!).
>
> Honza and the GCC people (Intel's ICC folks) do GREAT things.
> 'glmark2' numbers are better than 'vkmark'. (Hello, Marek.).
>
> Need some sleep.
>
> See my log, below.
>
> Greetings and GREAT work!
>
> -Dieter
>
> Am 09.02.2020 15:46, schrieb Gert Wollny:
> > Am Donnerstag, den 23.01.2020, 20:31 +0100 schrieb Gert Wollny:
> >> has anybody any objections if I merge the r600/NIR code?
> >> Without explicitely setting the debug flag it doesn't change a
> >> thing, but it would be better to continue developing in-tree.
> > Okay, if nobody objects, I'll merge it Monday evening.
> >
> > Best,
> > Gert
>
> [1425/1433] Linking target src/gallium/targets/dri/libgallium_dri.so.
> FAILED: src/gallium/targets/dri/libgallium_dri.so
> c++  -o src/gallium/targets/dri/libgallium_dri.so
> 'src/gallium/targets/dri/8381c20@@gallium_dri@sha/target.c.o' -flto
> -fprofile-generate -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared
> -fPIC -Wl,--start-group -Wl,-soname,libgallium_dri.so
> src/mesa/libmesa_gallium.a src/mesa/libmesa_common.a
> src/compiler/glsl/libglsl.a src/compiler/glsl/glcpp/libglcpp.a
> src/util/libmesa_util.a src/util/format/libmesa_format.a
> src/compiler/nir/libnir.a src/compiler/libcompiler.a
> src/mesa/libmesa_sse41.a src/mesa/drivers/dri/common/libdricommon.a
> src/mesa/drivers/dri/common/libmegadriver_stub.a
> src/gallium/state_trackers/dri/libdri.a
> src/gallium/auxiliary/libgalliumvl.a src/gallium/auxiliary/libgallium.a
> src/mapi/shared-glapi/libglapi.so.0.0.0
> src/gallium/auxiliary/pipe-loader/libpipe_loader_static.a
> src/loader/libloader.a src/util/libxmlconfig.a
> src/gallium/winsys/sw/null/libws_null.a
> src/gallium/winsys/sw/wrapper/libwsw.a
> src/gallium/winsys/sw/dri/libswdri.a
> src/gallium/winsys/sw/kms-dri/libswkmsdri.a
> src/gallium/drivers/llvmpipe/libllvmpipe.a
> src/gallium/drivers/softpipe/libsoftpipe.a
> src/gallium/drivers/r600/libr600.a
> src/gallium/winsys/radeon/drm/libradeonwinsys.a
> src/gallium/drivers/radeonsi/libradeonsi.a
> src/gallium/winsys/amdgpu/drm/libamdgpuwinsys.a
> src/amd/addrlib/libaddrlib.a src/amd/common/libamd_common.a
> src/amd/llvm/libamd_common_llvm.a -Wl,--build-id=sha1 -Wl,--gc-sections
> -Wl,--version-script /opt/mesa/src/gallium/targets/dri/dri.sym
> -Wl,--dynamic-list /opt/mesa/src/gallium/targets/dri/../dri-vdpau.dyn
> /usr/lib64/libdrm.so -L/usr/local/lib -lLLVM-10git -pthread
> /usr/lib64/libexpat.so
> /usr/lib64/gcc/x86_64-suse-linux/9/../../../../lib64/libz.so -lm
> /usr/lib64/gcc/x86_64-suse-linux/9/../../../../lib64/libzstd.so
> -L/usr/local/lib -lLLVM-10git /usr/lib64/libunwind.so -ldl -lsensors
> -L/usr/local/lib -lLLVM-10git /usr/lib64/libdrm_radeon.so
> /usr/lib64/libelf.so -L/usr/local/lib -lLLVM-10git -L/usr/local/lib
> -lLLVM-10git -L/usr/local/lib -lLLVM-10git /usr/lib64/libdrm_amdgpu.so
> -L/usr/local/lib -lLLVM-10git -Wl,--end-group
> '-Wl,-rpath,$ORIGIN/../../../mesa:$ORIGIN/../../../compiler/glsl:$ORIGIN/../../../compiler/glsl/glcpp:$ORIGIN/../../../util:$ORIGIN/../../../util/format:$ORIGIN/../../../compiler/nir:$ORIGIN/../../../compiler:$ORIGIN/../../../mesa/drivers/dri/common:$ORIGIN/../../state_trackers/dri:$ORIGIN/../../auxiliary:$ORIGIN/../../../mapi/shared-glapi:$ORIGIN/../../auxiliary/pipe-loader:$ORIGIN/../../../loader:$ORIGIN/../../winsys/sw/null:$ORIGIN/../../winsys/sw/wrapper:$ORIGIN/../../winsys/sw/dri:$ORIGIN/../../winsys/sw/kms-dri:$ORIGIN/../../drivers/llvmpipe:$ORIGIN/../../drivers/softpipe:$ORIGIN/../../drivers/r600:$ORIGIN/../../winsys/radeon/drm:$ORIGIN/../../drivers/radeonsi:$ORIGIN/../../winsys/amdgpu/drm:$ORIGIN/../../../amd/addrlib:$ORIGIN/../../../amd/common:$ORIGIN/../../../amd/llvm'
>
> -Wl,-rpath-link,/opt/mesa/build/src/mesa
> -Wl,-rpath-link,/opt/mesa/build/src/compiler/glsl
> -Wl,-rpath-link,/opt/mesa/build/src/compiler/glsl/glcpp
> -Wl,-rpath-link,/opt/mesa/build/src/util
> -Wl,-rpath-link,/opt/mesa/build/src/util/format
> -Wl,-rpath-link,/opt/mesa/build/src/compiler/nir
> -Wl,-rpath-link,/opt/mesa/build/src/compiler
> -Wl,-rpath-link,/opt/mesa/build/src/mesa/drivers/dri/common
> -Wl,-rpath-link,/opt/mesa/build/src/gallium/state_trackers/dri
> -Wl,-rpath-link,/opt/mesa/build/src/gallium/auxiliary
> 

Re: [Mesa-dev] [ANNOUNCE] Mesa 20.0 branchpoint planned for 2020/01/29, Milestone opened

2020-01-29 Thread Marek Olšák
Feel free to make the branchpoint now. We don't really need to wait for
anybody. Drivers that need fixing can have their fixes backported later.

The longer we wait, the greater the chance that somebody will introduce new
regressions in master.

Marek

On Wed, Jan 29, 2020 at 11:02 PM Jason Ekstrand 
wrote:

> On Wed, Jan 29, 2020 at 4:46 AM Bas Nieuwenhuizen 
> wrote:
>
>> On Tue, Jan 28, 2020 at 8:46 PM Dylan Baker  wrote:
>> >
>> > Quoting Dylan Baker (2020-01-22 10:27:05)
>> > > Hi list, due to some last-minute changes in plan I'll be managing the
>> > > 20.0 release. The release calendar has been updated, but the gitlab
>> > > milestone wasn't opened. That has been corrected, and is here:
>> > > https://gitlab.freedesktop.org/mesa/mesa/-/milestones/9. Please add any
>> > > issues or MRs you would like to land before the branchpoint to the
>> > > milestone.
>> > >
>> > > Thanks,
>> > > Dylan
>> > >
>> >
>> > Hi list,
>> >
>> > There are still a fair number of issues and MRs open for the 20.0
>> > branch point; should we postpone the branch point?
>>
>> IMO we should primarily look at what is needed to be ready in time for
>> the Spring 2020 distro releases* in this release cycle. It doesn't
>> make sense to add a few more features if we're effectively postponing
>> improvements (most of which should already be committed!) from getting
>> into the hands of users.
>>
>
> Scanning through the milestone, !3591 is the only thing on there that
> isn't a bugfix that we would likely want to backport anyway.  I don't know
> that we need to hold up the release on all the Gen12 Intel fixes.  We can
> back-port them if we want.
>
> --Jason


Re: [Mesa-dev] [ANNOUNCE] Mesa 20.0 branchpoint planned for 2020/01/29, Milestone opened

2020-01-28 Thread Marek Olšák
I've added the VBO & CSO optimizations to the list. Let me know if this
is something you would like in 20.0.

Marek

On Wed, Jan 22, 2020 at 1:27 PM Dylan Baker  wrote:

> Hi list, due to some last-minute changes in plan I'll be managing the 20.0
> release. The release calendar has been updated, but the gitlab milestone
> wasn't opened. That has been corrected, and is here:
> https://gitlab.freedesktop.org/mesa/mesa/-/milestones/9. Please add any
> issues or MRs you would like to land before the branchpoint to the
> milestone.
>
> Thanks,
> Dylan


Re: [Mesa-dev] [Mesa-stable] [Review Request (master branch)] st/mesa: release tgsi tokens for shader states

2019-12-19 Thread Marek Olšák
Pushed, thanks!

Marek

On Thu, Dec 19, 2019 at 2:22 PM Neha Bhende  wrote:

> Since we are using st_common_variant while creating a variant for a vertex
> program, we can release the tokens created in st_create_vp_variant, which
> are already stored in the respective states.
> This fixes a memory leak found with piglit tests.
>
> Fixes bc99b22a305b ('st/mesa: use a separate VS variant for the draw
> module')
>
> Reviewed-by: Charmaine Lee 
> ---
>  src/mesa/state_tracker/st_program.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/src/mesa/state_tracker/st_program.c
> b/src/mesa/state_tracker/st_program.c
> index a9ff68c1f50..ef10399fa18 100644
> --- a/src/mesa/state_tracker/st_program.c
> +++ b/src/mesa/state_tracker/st_program.c
> @@ -694,6 +694,10 @@ st_create_vp_variant(struct st_context *st,
>    else
>       vpv->base.driver_shader = pipe->create_vs_state(pipe, &state);
>
> +   if (state.tokens) {
> +      tgsi_free_tokens(state.tokens);
> +   }
> +
>    return vpv;
>  }
>
> --
> 2.17.1
>


Re: [Mesa-dev] Merge bot ("Marge") enabled

2019-12-17 Thread Marek Olšák
Hi Eric,

Does it mean people no longer need push access, because Marge can merge
anything?

So any random person can create a merge request and immediately assign it
to Marge to merge it?

Marek

On Fri, Dec 13, 2019 at 4:35 PM Eric Anholt  wrote:

> I finally got back around to experimenting with the gitlab merge bot,
> and it turns out that the day I spent a few weeks back I had actually
> given up 5 minutes before the finish line.
>
> Marge is now enabled for mesa/mesa, piglit, and parallel-deqp-runner.
> How you interact with marge:
>
> - Collect your reviews
> - Put reviewed-by tags in your commits
> - When you would have clicked "Merge when pipeline succeeds" (or,
> worse, rebase and then merge when pipeline succeeds), instead edit the
> assignee of the MR (top right panel of the UI) and assign to Marge Bot
> - Marge will eventually take your MR, rebase it and let the pipeline run.
> - If the pipeline passes, Marge will merge it
> - If the pipeline fails, Marge will note it in the logs and unassign
> herself (so your next push with a "fix" won't get auto-merged until
> you decide to again).
>
> In the commit logs of the commits that Marge rebased (they'll always
> be rebased), you'll get:
>
> Part-of: <
> https://gitlab.freedesktop.org/anholt/gitlab-experiments/merge_requests/3>
>
> In the final commit of that MR, you'll get:
>
> Tested-by: Marge Bot
>  >
>
> I feel like this is a major improvement to our workflow, in terms of
> linking commits directly to their discussions without indirecting
> through google.
>
> Note that one Marge instance will only process one MR at a time, so we
> could end up backed up.  There's a mode that will form merge trains,
> but I don't understand that mode well enough yet to trust it. I think for
> Mesa at this point this is going to be fine, as we should still be
> able to push tens of MRs through per day.  As we scale up, we may find
> that we need a separate Marge for piglit and other projects, which I
> should be able to set up reasonably at this point.
>
> Once we settle in with Marge and learn to trust our robot overlords,
> I'll update the contributor docs to direct people to Marge instead of
> the "merge when pipeline succeeds" button.  I'm also hoping that once
> our commit logs are full of links to discussions, we can drop the
> mandatory squashing of r-b tags into commit messages and thus make our
> process even easier for new contributors.


Re: [Mesa-dev] Requiring a full author name when contributing to mesa?

2019-12-11 Thread Marek Olšák
On Wed, Dec 11, 2019 at 7:47 PM Eric Engestrom  wrote:

> On 2019-12-11 at 23:46, Timothy Arceri  wrote:
> > On 12/12/19 10:38 am, Eric Engestrom wrote:
> > > On 2019-12-11 at 23:09, Eric Anholt  wrote:
> > >> On Wed, Dec 11, 2019 at 2:35 PM Timothy Arceri  wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> So it seems lately we have been increasingly merging patches with
> > >>> made-up names, or single names, etc. [1]. The latest submitted patch
> > >>> has the name Icecream95. This seems wrong to me from the point of view
> > >>> of keeping up the integrity of the project. I'm not a legal expert, but
> > >>> it doesn't seem ideal to be amassing commits with these types of author
> > >>> tags from that point of view either.
> > >>>
> > >>> Is it just me, or do others agree we should at least require a proper
> > >>> name on the commits (as fake as that may also be)? Seems like a low bar
> > >>> to me.
> > >>
> > >> I'm of the opinion that in fact all names are made up,
> > >
> > > Wholeheartedly agreed.
> > >
> > > Remember that many different cultures exist, and they have different
> > > customs around names. As an example, a teacher of mine had a single
> > > name, but the school required two separate "first name" and "last name"
> > > fields so he wrote his name twice, which appeared on every form we got
> > > from the school, yet everyone knew he didn't have what we called a
> > > "last name"/"family name".
> > > Another example is people from Asia who often assume a made up
> > > Western-sounding pseudonym to use when communicating with Western
> > > people, and those often don't look like real names to us.
> > >
> > > What looks like a real name to you?
> > > How would you even start to define such a rule?
> >
> > As per my reply to Eric Anholt, I'm most concerned about the look of the
> > project. IMO contributions with names like Icecream95 or an atom symbol
> > just look unprofessional; open source gets a hard enough time about its
> > professionalism as it is without encouraging this. A little common sense
> > can go a long way here.
>
> If you want to ask someone to provide a real name when you think they
> didn't, I definitely agree, and if you want to document that we want real
> names, I'm also OK with that. But all I'm saying is that you can't
> *require* it, because there's no reliable way to enforce that.
>

The question is about whether we require real names. If people lie, that's
on them.

Marek


  1   2   3   4   5   6   7   8   9   10   >