On Wed, Feb 12, 2014 at 9:10 AM, Roland Scheidegger <srol...@vmware.com> wrote:
> On 11.02.2014 22:58, Dave Airlie wrote:
>>>>   dst.z = texture_depth(unit, lod)
>>>>
>>>> +.. opcode:: TG4 - Texture Gather (as per ARB_texture_gather)
>>>> +   Gathers the four texels to be used in a bi-linear
>>>> +   filtering operation and packs them into a single register.
>>>> +   Only works with 2D, 2D array, cubemaps, and cubemaps arrays.
>>>> +   For 2D textures, only the addressing modes of the sampler and
>>>> +   the top level of any mip pyramid are used. Set W to zero.
>>>> +   It behaves like the TEX instruction, but a filtered
>>>> +   sample is not generated. The four samples that contribute
>>>> +   to filtering are placed into xyzw in clockwise order,
>>>> +   starting with the (u,v) texture coordinate delta at the
>>>> +   following locations (-, +), (+, +), (+, -), (-, -), where
>>>> +   the magnitude of the deltas are half a texel.
>>>> +
>>>> +   PIPE_CAP_TEXTURE_SM5 enhances this instruction to support
>>>> +   shadow per-sample depth compares, single component selection,
>>>> +   and a non-constant offset. It doesn't allow support for the
>>>> +   GL independent offset to get i0,j0. This would require another
>>>> +   CAP is hw can do it natively. For now we lower that before
>>>> +   TGSI.
>>>> +
>>>> +.. math::
>>>> +
>>>> +   coord = src0
>>>> +
>>>> +   component = src1
>>>> +
>>>> +   dst = texture_gather4 (unit, coord, component)
>>>> +
>>>> +(with SM5 - cube array shadow)
>>>> +
>>>> +   coord = src0
>>>> +
>>>> +   compare = src1
>>>> +
>>>> +   dst = texture_gather (uint, coord, compare)
>>>> +
>>> So how does component selection work with the latter version?
>>> I think it would be nice if you wouldn't really need two versions (so if
>>> you don't support comparisons, the src would just be unused).
>>
>> That's docs not being clear enough if you read it like that. The
>> second version is only for cube array shadow compares, which have no
>> components. The first version is the same for non-shadow compares.
>
> Ah right that works, I forgot you don't need the channel select with
> shadow comparisons (not that I'm a big fan of such "overloaded" sources
> but that's nothing new really).
>
>>
>>> Also, FWIW for llvmpipe you'd probably wanted a native 4 offsets
>>> versions, I don't think llvm could eliminate the huge amount of
>>> duplicated code completely if you generate 4 texture lookups. Of course,
>>> someone would need to implement it first (shouldn't be too difficult).
>>
>> Yeah llvmpipe might be in the category for using the extra CAP, I'm
>> really hoping nvidia hw does do this, but the interface is kinda
>> arbitrary and maybe we should consider another opcode,
>>
>> Since we have for SM5 nonconstant ones something like,
>>
>> TG4 TEMP[1], TEMP[1], SAMP[0] , TEMP[2].xyz
>> which will sample around temp[1] i0,j0 - i1, j1 at the offset in temp[2]
>>
>> and
>> TG4 TEMP[1], TEMP[1], SAMP[0], TEMP[2].xyz, TEMP[3].xyz, TEMP[4].xyz,
>> TEMP[5].xyz
>> which will sample i0,j0 from TEMP[1] and the respective offsets.
>>
>
> Yes since the offsets are in separate offset structure and the amount of
> offsets is indicated I think it should just work actually if a driver
> wants to implement multiple offsets natively.

So are you okay with this version? I think it covers everything, and we can add a CAP if/when someone works out hw/llvmpipe for the 4 offset case.

Dave.