On 02/08/2019 11:19 AM, Andy Ritger wrote:
(I'll omit EGL and Vulkan for the moment, for the sake of focus, and those
APIs have programmatic ways to enumerate and select GPUs.  Though, some
of what we decide here for GLX we may want to leverage for other APIs.)


Today, GLX implementations loaded into the X server register themselves
on a per-screen basis, GLXVND in the server dispatches GLX requests to
the registered vendor per screen, and libglvnd determines the client-side
vendor library to use by querying the per-screen GLX_VENDOR_NAMES_EXT
string from the X server (e.g., "mesa" or "nvidia").

The GLX_VENDOR_NAMES_EXT string can be overridden within libglvnd
through the __GLX_VENDOR_LIBRARY_NAME environment variable, though I
don't believe that is used much currently.

To enable GLX to be used in a multi-vendor PRIME GPU offload environment,
it seems there are several desirable user-visible behaviors:

* By default, users should get the same behavior we have today (i.e.,
   the GLX implementation used within the client and the server, for an X
   screen, is dictated by the X driver of the X screen).

* The user should be able to request a different GLX vendor for use on a
   per-process basis through either an environment variable (potentially
   reusing __GLX_VENDOR_LIBRARY_NAME) or possibly a future application
   profile mechanism in libglvnd.

* To make configuration optionally more "portable", the selection override
   mechanism should be able to refer to more generic names like
   "performance" or "battery", and those generic names should be mapped
   to specific GPUs/vendors on a per-system basis.

* To make configuration optionally more explicit, the selection override
   mechanism should be able to distinguish between individual GPUs by
   using hardware specific identifiers such as PCI BusID-based names like
   what DRI_PRIME currently honors (e.g., "pci-0000_03_00_0").

Do those behaviors seem reasonable?

If so, it seems like there are two general directions we could take to
implement that infrastructure in client-side libglvnd and GLXVND within
the X server, if the user or application profile requests a particular
vendor, either by vendor name (e.g., "mesa"/"nvidia"), functional
name (e.g., "battery"/"performance"), or hardware-based name (e.g.,
"pci-0000_03_00_0"/"pci-0000_01_00_0"):

(1) If configured for PRIME GPU offloading (environment variable or
     application profile), client-side libglvnd could load the possible
     libGLX_${vendor}.so libraries it finds, and call into each to
     find which vendor (and possibly which GPU) matches the specified
     string. Once a vendor is selected, the vendor library could optionally
     tell the X server which GLX vendor to use server-side for this
     client connection.

(2) The GLX implementations within the X server could, when registering
     with GLXVND, tell GLXVND which screens they can support for PRIME
     GPU offloading.  That list could be queried by client-side libglvnd,
     and then used to interpret __GLX_VENDOR_LIBRARY_NAME and pick the
     corresponding vendor library to load.  Client-side would tell the X
     server which GLX vendor to use server-side for this client connection.

In either direction, if the user-requested string is a hardware-based
name ("pci-0000_03_00_0"), the GLX vendor library presumably needs to be
told that GPU, so that the vendor implementation can use the right GPU
(in the case that the vendor supports multiple GPUs in the system).
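
As a rough illustration of the hardware-based case, a name of the
"pci-0000_03_00_0" form could be split into its PCI fields before being
handed to a vendor library.  This is only a sketch: struct pci_addr and
parse_pci_selector() are hypothetical names, not part of libglvnd or any
driver, and the field layout (hexadecimal domain, bus, device, function)
is assumed from the example strings above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for a parsed PCI-style selector. */
struct pci_addr {
    unsigned int domain, bus, device, function;
};

/*
 * Parse a selector such as "pci-0000_03_00_0".
 * Returns 1 on success, 0 if the string does not have that shape.
 * Fields are assumed to be hexadecimal, as in PCI bus addresses.
 */
int parse_pci_selector(const char *selector, struct pci_addr *out)
{
    int consumed = 0;

    assert(out != NULL);
    if (selector == NULL)
        return 0;

    if (sscanf(selector, "pci-%4x_%2x_%2x_%1x%n",
               &out->domain, &out->bus, &out->device, &out->function,
               &consumed) != 4)
        return 0;

    /* PCI allows 32 devices per bus and 8 functions per device. */
    if (out->device > 0x1f || out->function > 7)
        return 0;

    /* Reject trailing characters after the function digit. */
    return selector[consumed] == '\0';
}
```

A string that fails to parse (for example "nvidia" or "performance")
would then be treated as a vendor name or a functional name instead.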

But, both (1) and (2) are really just points on a continuum.  I suppose
the more general question is: how much of the implementation should go
in the server and how much should go in the client?

At one extreme, the client could do nearly all the work (with the
practical downside of potentially loading multiple vendor libraries in
order to interpret __GLX_VENDOR_LIBRARY_NAME).

At the other extreme, the server could do nearly all the work of
generating the possible __GLX_VENDOR_LIBRARY_NAME strings (with the
practical downside of each server-side GLX vendor needing to enumerate
the GPUs it can drive, in order to generate the hardware-specific
identifiers).

I'm not sure where on that spectrum it makes the most sense to land,
and I'm curious what others think.

Thanks,
- Andy


For a more concrete example, this is what I've been working on for a client-based interface:
https://github.com/kbrenneman/libglvnd/tree/libglx-gpu-offloading

For this design, I've tried to keep the interface as simple as possible and to impose as few requirements or assumptions as possible. The basic idea behind it is that the only thing that a GLX application has to care about is calling GLX functions, and the only thing that libglvnd has to care about is forwarding those functions to the correct vendor library.

The general design is this:
* Libglvnd gets a list of alternate vendor libraries from an app profile (config file, environment variable, whatever)
* For each vendor in that list, libglvnd will load the vendor and call a new callback function, which asks the vendor to set up offloading. This call applies to the whole display, so the vendor can do all of its display initialization here.
* If that callback succeeds, then libglvnd calls a second function to check which screens the vendor actually supports offloading on. Libglvnd assigns the vendor to those screens.
* For any remaining screens, libglvnd will use its current selection logic.
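
The steps above could be sketched roughly as follows. Everything here is hypothetical: struct offload_candidate, its two callbacks, and assign_offload_vendor() are stand-ins for illustration, not the interface in the branch linked above, and loading a vendor library is replaced by a pre-populated table. The sketch also assumes that the first vendor whose setup succeeds is the only offload vendor used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one profile entry and its loaded vendor. */
struct offload_candidate {
    const char *name;                    /* vendor library name */
    const char *config;                  /* optional vendor-specific data */
    int (*setup_offload)(const char *config); /* display-wide; nonzero = ok */
    int (*supports_screen)(int screen);  /* nonzero = can offload here */
};

/*
 * Fill assigned[0..num_screens-1] with the offload vendor for each screen,
 * or NULL where the existing per-screen selection should be used.
 * Returns the chosen vendor, or NULL if no vendor could set up offloading.
 */
const struct offload_candidate *
assign_offload_vendor(const struct offload_candidate *list, size_t count,
                      const struct offload_candidate **assigned,
                      int num_screens)
{
    assert(assigned != NULL);

    /* Default: every screen falls back to the current selection logic. */
    for (int s = 0; s < num_screens; s++)
        assigned[s] = NULL;

    for (size_t i = 0; i < count; i++) {
        if (!list[i].setup_offload(list[i].config))
            continue; /* this vendor cannot offload; try the next one */

        /* Ask which screens the vendor actually supports. */
        for (int s = 0; s < num_screens; s++) {
            if (list[i].supports_screen(s))
                assigned[s] = &list[i];
        }
        return &list[i];
    }
    return NULL;
}

/* Toy callbacks, used only to exercise the loop. */
int setup_fails(const char *config) { (void)config; return 0; }
int setup_ok(const char *config) { (void)config; return 1; }
int only_screen_zero(int screen) { return screen == 0; }
```

With a list whose first entry uses setup_fails and whose second uses setup_ok and only_screen_zero, the second entry would be assigned to screen 0 and any other screen would keep the normal selection.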

The entire interface is defined in an extension to the libglvnd GLX vendor library interface. That means the interface itself is entirely client-side, but a driver is free to use whatever combination of client- and server-side logic it wants. For example, a driver can implement device enumeration in the client vendor library (like Mesa does), or it could do that in the server and communicate the results back to the client.

You could also have multiple client vendor libraries that all work with a single server-side library, or even a client vendor library that doesn't have a server-side counterpart at all.

The profile that libglvnd uses is just a list of vendor library names that libglvnd should try before it falls back to its normal vendor selection. Along with each vendor name, the profile can also optionally have some vendor-specific configuration data. That extra data can be used to select a specific device. For Mesa, for example, you could use the same string that you'd otherwise specify by setting the DRI_PRIME environment variable.

The config file format I put together is JSON-based. It contains a list of profiles (selected based on the executable name of the process), and each profile contains a list of vendors. In addition to naming a vendor directly, a profile can list a generic descriptor, which acts like a macro that expands out to a list of vendors. Drivers can install config files to provide profiles and to provide definitions for those descriptors. Libglvnd will merge the vendor lists in profiles and descriptors from different files so that multiple drivers don't clobber each other. As a result, it should be possible for vendors (or distros) to provide reasonable default behavior, but still allow a user to override any profile or descriptor if they want to.
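
To make the merging idea a bit more concrete, here is a minimal sketch of combining the vendor lists that two config files might give for the same descriptor. The merge rule shown (append in order, skip names already present) is an assumption made for illustration, and merge_vendor_lists() is a hypothetical helper, not a function from the branch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Append the names in src to dst, skipping names dst already holds.
 * dst has room for cap entries and currently holds *len of them.
 * Returns the number of new names that did not fit.
 */
size_t merge_vendor_lists(const char **dst, size_t *len, size_t cap,
                          const char *const *src, size_t src_len)
{
    size_t dropped = 0;

    assert(dst != NULL && len != NULL);

    for (size_t i = 0; i < src_len; i++) {
        int seen = 0;

        /* A vendor listed by several files should appear only once. */
        for (size_t j = 0; j < *len; j++) {
            if (strcmp(dst[j], src[i]) == 0) {
                seen = 1;
                break;
            }
        }
        if (seen)
            continue;

        if (*len < cap)
            dst[(*len)++] = src[i];
        else
            dropped++;
    }
    return dropped;
}
```

Under that assumed rule, if one file defines a descriptor as { "nvidia" } and another defines it as { "mesa", "nvidia" }, merging them in that order would give { "nvidia", "mesa" }.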

I think a client-based interface like this is a strict functional superset of anything that requires server-side device enumeration. GLXVND would have to rely on the server-side vendor libraries to do that enumeration, and that same logic could just as easily be an implementation detail between a client and server library.

The one exception is that this interface doesn't allow offloading to different vendors on different screens if no single vendor can handle all of them, but in order to run into that case, you'd need at least two X screens and at least four different GPU vendors. That's still not a client versus server limitation, though. It's just a consequence of libglvnd selecting a single offload vendor and letting it initialize the whole display all at once.

Since this seems to be a sticking point, there's also an option to avoid unnecessarily loading extra client-side vendor libraries. If a client vendor needs a server-side counterpart, then libglvnd can filter it out based on a really simple server query. Right now, it just checks the GLX_VENDOR_NAMES_EXT string (since that was easy to test), but we may want to define some new string for this. This is the closest that libglvnd gets to the server at any point in this process, and even this part is optional and should be a pretty trivial extension to the GLXVND interface in the server.
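
The filtering step could look something like the sketch below. It assumes the server string is a space-separated list of vendor names, and vendor_in_list() is a hypothetical helper name; the actual query of the X server is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if vendor appears as a whole token in names, else 0.
 * names is assumed to be a space-separated list such as "mesa nvidia".
 */
int vendor_in_list(const char *names, const char *vendor)
{
    size_t vlen;

    assert(vendor != NULL);
    if (names == NULL)
        return 0;

    vlen = strlen(vendor);
    if (vlen == 0)
        return 0;

    while (*names != '\0') {
        size_t tok;

        names += strspn(names, " ");  /* skip separators */
        tok = strcspn(names, " ");    /* length of the next token */

        /* Compare whole tokens so "nvidia" does not match "nvidia-legacy". */
        if (tok == vlen && strncmp(names, vendor, vlen) == 0)
            return 1;
        names += tok;
    }
    return 0;
}
```

A client vendor that declares it needs a server-side counterpart would only be loaded if this check passes; vendors without that requirement would skip the check entirely.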

-Kyle

_______________________________________________
xorg-devel@lists.x.org: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: https://lists.x.org/mailman/listinfo/xorg-devel
