> To me this speaks to another aspect of the gallium interface which is
> a bit odd -- in particular the way several of our interfaces basically
> copy their inputs into a structure and pass that back to the state
> tracker. Why are we doing that? The state tracker already knows what
> it asked us to do, and there is no reason to assume that it needs us
> to re-present that information back to it.
Yes, only the CSOs don't have this form of copying: all other structures include the input parameters. As a random example, pipe_sampler_view has lots of parameters that a driver would have converted into the hardware format and that are thus redundant and unlikely to be read by the state tracker. Textures and buffers also have many visible data members that the state tracker may or may not read. In particular, the Mesa state tracker already keeps everything in Mesa's internal structures, and so benefits little from such data.

We may want to consider going toward making _all_ Gallium structures opaque (and, by the way, using declared-only structs instead of void* like we do for CSOs, which are not checkable by the compiler).

Another serious data duplication issue is drivers that just copy the input state into internal structures and return, to then process everything in draw calls. This usually results in state being duplicated (and copied) 3 times: in Mesa's internal structures, in the state tracker's structures and then in the driver. The draw module may also keep a 4th copy of the state. Note that when reference counting is involved, copies are even more expensive since they now need atomic operations.

Usually drivers do this because:
1. They need to pass data to the draw module in case of fallbacks, and thus cannot send it to hardware and forget about it
2. They need to recreate the whole hardware context state in some cases
3. They multiplex multiple pipe_contexts on a single screen
4. They need a global view of state, rather than a single state change at a time, to decide what to do

A possible solution is to remove all set_* and bind_* calls and replace them with data members of pipe_context that the state tracker would use instead of its own internal structures. In addition, a new "what's new" bitmask would be added, and the driver would check it on draw calls.
Performance-wise, this replaces num_state_changes dynamic function calls into the driver with (log2(total_states) + num_state_changes) branches to check the "what's new" bitmask. Furthermore:
1. State is never copied, since the state tracker constructs it in place
2. There is no longer any need for the "state save helpers" in the blitter module and similar
3. The draw module can potentially read state directly from pipe_context instead of duplicating it yet again
4. Drivers no longer need all the functions that store the parameters, set a dirty flag and return

Note that the Direct3D DDI does not do this, but they have to keep binary compatibility, which is easier with Set* calls than with this scheme. softpipe, nvfx, nv50, r300 and probably others already do this internally, and having the state tracker itself construct the data would remove a lot of redundant copying code and increase performance.

Having drivers capable of doing "send-to-hardware-and-forget-about-it" on arbitrary state setting could be a nice thing instead, but unfortunately a lot of hardware fundamentally can't do this, since for instance:
1. Shaders all need to be seen together to be linked, possibly modifying the shaders themselves (nv30)
2. Constants need to be written directly into the fragment program (nv30-nv40)
3. Fragment programs depend on the viewport to implement fragment.position (r300)
4. Fragment programs depend on bound textures to specify normalization type and emulate NPOT (r300, r600?, nv30)
5. Sometimes sampler state and textures must be seen together since the hardware mixes them
and so on...

> The only really new information provided by the driver to the state
> tracker by transfer_create + transfer_map is:
> - the pointer to the data
> - stride
> - slice stride

There is also the 3D box, unless transfers start covering the whole resource, which seems really suboptimal for stuff like glTexSubImage.
This needs to be provided to the driver unless a buffer-specialized interface is made (then a 1D box is enough).

> Thanks for the summary. I'd add that there is also some information
> available publicly about the D3D10 DDI, which follows a slightly
> different interface to the API. In that world, there is a single
> create resource function:

It is indeed extremely interesting, and it looks like it should be the first place to look for inspiration for the Gallium interface. I added a comparison of the D3D11 DDI and Gallium to src/gallium/docs.

> There is however clearly concern about the possible need for
> specialized transfer mechanisms for particular buffer types. It seems
> like they've taken an approach that leaves the choice to the driver
> whether to specialize or not -- basically there are a number of
> specialized map/unmap entrypoints, but all with the same function
> prototype so that a driver could if it wanted to point them all to a
> single generic implementation, or if it preferred, provide specialized
> implementations for some of them. There is some discussion of these
> choices in the page below:

Yes, indeed; I'm not sure why they do it, though, since the user-visible API appears to be unified.

> I'm really keen to get gallium-resources merged - probably combined
> with the buffer_usage_cleanup branch. I suspect there are some
> lingering bugs in -resources that are addressed by the cleanup branch.
> Have you had a chance to do any testing of the changes I made on
> -resources or -cleanup?

I added a commit to fix nvfx and make nv50 compile on gallium-resources; some of the remaining issues seem to be addressed by buffer-usage-cleanup. It may be useful to merge gallium-resources into -cleanup again so the latest fixes there are picked up.
_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev