On Mon, Jul 17, 2017 at 4:35 AM, Samuel Pitoiset <samuel.pitoi...@gmail.com> wrote:
>
> On 07/15/2017 02:54 AM, Marek Olšák wrote:
>>
>> On Wed, Jul 5, 2017 at 1:42 PM, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
>>>
>>> On 04.07.2017 15:05, Samuel Pitoiset wrote:
>>>>
>>>> Using VRAM addresses as bindless handles is not a good idea because
>>>> we have to use LLVMIntToPtr and the LLVM CSE pass can't optimize
>>>> because it has no information about the pointer.
>>>>
>>>> Instead, use slot indexes like the existing descriptors.
>>>>
>>>> This improves performance with DOW3 by +7%.
>>>
>>> Wow.
>>>
>>> The thing is, burning a pair of user SGPRs for this seems a bit
>>> overkill, especially since it also hurts apps that don't use bindless
>>> at all.
>>>
>>> Do you have some examples of how LLVM fails here? Could we perhaps
>>> avoid most of the performance issues by casting 0 to an appropriate
>>> pointer type once, and then using the bindless handle as an index
>>> relative to that pointer?
>>
>> The problem is that inttoptr doesn't support noalias, so LLVM passes
>> assume it's a generic pointer and therefore don't optimize it.
>> radeonsi loads descriptors before each use and relies on CSE to unify
>> all equivalent loads that are close to each other. Without CSE, the
>> resulting code is very bad.
>>
>> Another interesting aspect of having the bindless descriptor array in
>> user SGPRs is that we can do buffer invalidations easily by
>> reuploading the whole array. That, however, adds a lot of overhead,
>> because the array is usually huge (64 bytes * 1000 slots), so it's
>> usually worse than the current solution (partial flushes +
>> WRITE_DATA). The bindless array could be packed better, though.
>> Textures need 12 dwords, images need 8 dwords, and buffers need 4
>> dwords. Right now, all slots have 16 dwords.
>>
>> Samuel, sorry I haven't had time to look at these patches yet.
>
> No worries, but are you fine with this solution? If yes, I will fix up
> patch 1.

Yes, I'm OK with the solution. The user SGPR usage should decrease when we add support for 32-bit pointers (actually, we'll just need one opcode to work with 32-bit pointers: the 32-bit->64-bit address space cast).

Marek
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev