From: Marek Olšák <marek.ol...@amd.com>
---
 src/gallium/drivers/radeonsi/si_shader.h | 67 ++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)
diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h
index 5b665b5..57685e0 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -19,20 +19,87 @@
  * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
  * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  *
  * Authors:
  *      Tom Stellard <thomas.stell...@amd.com>
  *      Michel Dänzer <michel.daen...@amd.com>
  *      Christian König <christian.koe...@amd.com>
  */
 
+/* The compiler middle-end architecture: Explaining (non-)monolithic shaders
+ * -------------------------------------------------------------------------
+ *
+ * Typically, there is a one-to-one correspondence between API and HW shaders,
+ * that is, for every API shader, there is exactly one shader binary in
+ * the driver.
+ *
+ * The problem with that is that we also have to emulate some API states
+ * (e.g. alpha-test, and many others) in shaders too. The two obvious ways
+ * to deal with that are:
+ * - each shader has multiple variants, one for each combination of emulated
+ *   states; the variants are compiled on demand, possibly relying on
+ *   a shader cache for good performance
+ * - patch shaders at the binary level
+ *
+ * This driver uses something completely different. The emulated states are
+ * usually implemented at the beginning or end of shaders. Therefore, we can
+ * split a shader into 3 parts:
+ * - prolog part (shader code dependent on states)
+ * - main part (the API shader)
+ * - epilog part (shader code dependent on states)
+ *
+ * Each part is compiled as a separate shader and the final binaries are
+ * concatenated. This type of shader is called non-monolithic, because it
+ * consists of multiple independent binaries. Creating a new shader variant
+ * is therefore only a concatenation of shader parts (binaries) and doesn't
+ * involve any compilation.
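To illustrate the idea of the comment above: building a non-monolithic variant is just a concatenation of already-compiled part binaries. The following is a minimal, self-contained sketch; `part_binary` and `build_variant` are hypothetical names for illustration, not the actual radeonsi API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: a shader part "binary" is just a byte buffer.
 * A non-monolithic variant is the byte-wise concatenation of its
 * prolog, main, and epilog part binaries -- no compilation involved. */
struct part_binary {
	const unsigned char *code;
	size_t size;
};

/* Concatenate the given parts into one contiguous variant binary.
 * Returns a malloc'd buffer (caller frees) and its size via *total_size,
 * or NULL on allocation failure. */
static unsigned char *build_variant(const struct part_binary *parts,
				    unsigned num_parts, size_t *total_size)
{
	size_t size = 0;
	for (unsigned i = 0; i < num_parts; i++)
		size += parts[i].size;

	unsigned char *variant = malloc(size);
	if (!variant)
		return NULL;

	size_t offset = 0;
	for (unsigned i = 0; i < num_parts; i++) {
		memcpy(variant + offset, parts[i].code, parts[i].size);
		offset += parts[i].size;
	}
	*total_size = size;
	return variant;
}
```

Because no compiler is invoked here, creating a new variant for a changed state combination is cheap: only the prolog/epilog parts differ, and those are small and cached.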
+ *
+ * The main shader parts are the only parts compiled when applications
+ * create shader objects. The prolog and epilog parts are compiled on first
+ * use and saved, so that their binaries can be reused by many other shaders.
+ *
+ * One of the roles of the prolog part is to compute vertex buffer addresses
+ * for vertex shaders. Among the roles of the epilog part are the color
+ * buffer format conversions that pixel shaders have to do manually, and
+ * writing tessellation factors in tessellation control shaders. The prolog
+ * and epilog parts have many other important responsibilities in various
+ * shader stages. They don't just "emulate legacy stuff".
+ *
+ * Monolithic shaders are shaders where the parts are combined before LLVM
+ * compilation, and the whole thing is compiled and optimized as one unit
+ * with one binary as the output. The result is functionally the same as
+ * the non-monolithic shader, but the final code can be better, because LLVM
+ * can optimize across all shader parts. Monolithic shaders aren't used
+ * except in these special cases:
+ *
+ * 1) Some rarely-used states require modification of the main shader part
+ *    itself, and in such cases, only the monolithic shader variant is
+ *    compiled, and that's always done on first use.
+ *
+ * 2) When we do cross-stage optimizations for separate shader objects and
+ *    e.g. eliminate unused shader varyings, the resulting optimized shader
+ *    variants are always compiled as monolithic shaders, and always
+ *    asynchronously (i.e. not stalling ongoing rendering). We call them
+ *    "optimized monolithic" shaders. The important property here is that
+ *    the unoptimized non-monolithic shader variant is always available for
+ *    use while the asynchronous compilation of the optimized shader hasn't
+ *    finished yet.
+ *
+ * Starting with GFX9 chips, some shader stages are merged, and the number
+ * of shader parts per shader has increased.
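The fallback property described in case 2 above can be sketched as follows. This is a simplified illustration with hypothetical names (`shader_object`, `select_variant`), not the real radeonsi data structures: the draw path uses the optimized monolithic variant only once its asynchronous compile has published a binary, and otherwise falls back to the always-available non-monolithic variant.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of variant selection for "optimized monolithic"
 * shaders compiled asynchronously. */
struct shader_variant {
	const void *binary;
	bool is_optimized_monolithic;
};

struct shader_object {
	struct shader_variant nonmono;    /* always available */
	struct shader_variant *optimized; /* NULL until async compile finishes */
};

static const struct shader_variant *
select_variant(const struct shader_object *sh)
{
	/* Never stall rendering waiting for the optimized variant; use it
	 * only once the asynchronous compilation has completed. */
	if (sh->optimized)
		return sh->optimized;
	return &sh->nonmono;
}
```

The design point is that rendering never blocks on the optimizing compile: the non-monolithic variant is cheap to produce (concatenation only), so there is always something valid to bind.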
+ * The complete new list of shader parts is:
+ * - 1st shader: prolog part
+ * - 1st shader: main part
+ * - 2nd shader: prolog part
+ * - 2nd shader: main part
+ * - 2nd shader: epilog part
+ */
+
 /* How linking shader inputs and outputs between vertex, tessellation, and
  * geometry shaders works.
  *
  * Inputs and outputs between shaders are stored in a buffer. This buffer
  * lives in LDS (typical case for tessellation), but it can also live
  * in memory (ESGS). Each input or output has a fixed location within a vertex.
  * The highest used input or output determines the stride between vertices.
  *
  * Since GS and tessellation are only possible in the OpenGL core profile,
  * only these semantics are valid for per-vertex data:
-- 
2.7.4

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev