On Sat, Aug 6, 2016 at 3:01 AM, Eric Anholt <e...@anholt.net> wrote:
> Rob Clark <robdcl...@gmail.com> writes:
>
>> On Fri, Aug 5, 2016 at 8:42 PM, Jan Ziak <0xe2.0x9a.0...@gmail.com> wrote:
>>> Mesa source code prior to this patch uses both RTLD_NOW and RTLD_LAZY.
>>> This patch removes all RTLD_NOW in favor of RTLD_LAZY.
>>>
>>> In comparison to early binding, lazy binding reduces CPU instruction count
>>> of small GL apps (e.g: glxinfo) by 6 million instructions.
>>> Larger apps won't notice the difference.
>>
>> tbh, I don't know the background of existing places that use RTLD_LAZY
>> instead of RTLD_NOW (but my experience w/ xserver using LAZY has not
>> been positive, so I think going the other direction seems like a good
>> idea).. But I'm not sure that optimizing for glxinfo is the best goal.
>> I know that at least for freedreno a lot of the startup time for small
>> real gl apps (ie. something that mostly matters for piglit runs) goes
>> to constructing regalloc interference graph.. maybe there is some way
>> to leverage what is being done for on-disk shader cache to cache some
>> of this up-front work and make a meaningful reduction in startup cost
>> for things that actually do a bit more than glxinfo. (Plus speeding
>> up piglit runs is actually a real world benefit..)
>
> I do think that RTLD_LAZY makes sense, and there's no reason to waste
> the CPU time if we don't need it. If nothing else, we all run a lot of
> piglit processes that all create contexts. As far as "what if there are
> unresolved symbols or something?", I think if we have symbols not being
> covered by piglit even once, we've already lost.

well, for something like shader_runner, I wonder if there is some way
to tell what % of symbols actually get resolved? Maybe it is lower
than I was expecting.

> For your regalloc, have you looked at i965's direct q value calculation
> in brw_fs_reg_allocate.cpp? That might save you a ton of time. That
> said, I was skimming a paper recently that seemed to be saying that if
> you can assume a not-completely-general set of register classes, you can
> do the equivalent of the pq test without the giant table.

I do actually compute the q values, like i965. I do have more regs
(but have restricted things to fewer classes). Oh, and a bunch of
half-precision regs too, but fewer classes there since I need to use
full precision for args to texture sample instructions so that removes
a couple permutations. Anyways, I haven't looked at it for a while,
but probably just comes down to overhead being more noticeable on
slower devices ;-)

I wouldn't mind having a look at that paper if you can find it again.

BR,
-R
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
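
For reference, the two binding modes discussed in this thread differ only in the flag passed to dlopen(). With RTLD_NOW, every undefined symbol in the library is resolved before dlopen() returns, and the call fails if one cannot be resolved. With RTLD_LAZY, function symbols are resolved the first time they are called, so an unresolved function only shows up when that code path runs. The sketch below is not taken from the Mesa patch; load_symbol() is a hypothetical helper name used only for illustration.

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a Mesa function): open `lib` with the given
 * binding mode (RTLD_LAZY or RTLD_NOW) and return the address of `sym`,
 * or NULL if the library cannot be opened or the symbol is not found.
 * Passing lib == NULL asks dlopen() for a handle to the main program.
 */
void *load_symbol(const char *lib, const char *sym, int mode)
{
    void *handle = dlopen(lib, mode | RTLD_LOCAL);
    if (handle == NULL)
        return NULL;   /* dlerror() would describe the failure */

    dlerror();         /* clear any stale error state before dlsym() */
    void *addr = dlsym(handle, sym);

    /* The handle is deliberately left open so that `addr` stays valid. */
    return addr;
}
```

A caller would pass RTLD_LAZY or RTLD_NOW as the third argument, for example load_symbol(NULL, "malloc", RTLD_LAZY). On the question of how many symbols actually get resolved: on glibc systems, setting LD_DEBUG=statistics (or LD_DEBUG=bindings) in the environment is one way to have the dynamic linker report the relocations and bindings it performed, though the exact output depends on the glibc version.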