El 30/10/17 a las 07:44, Pohjolainen, Topi escribió: > On Sun, Oct 29, 2017 at 11:17:11PM +0100, Chema Casanova wrote: >> On 29/10/17 19:55, Pohjolainen, Topi wrote: >>> On Thu, Oct 12, 2017 at 08:38:08PM +0200, Jose Maria Casanova Crespo wrote: >>>> We enable the use of 16-bit values in push constants >>>> modifying the assign_constant_locations function to work >>>> with 16-bit types. >>>> >>>> The API to access buffers in Vulkan use multiples of 4-byte for >>>> offsets and sizes. Current accountability of uniforms based on 4-byte >>>> slots will work for 16-bit values if they are allowed to use 32-bit >>>> slots. For that, we replace the division by 4 by a DIV_ROUND_UP, so >>>> 2-byte elements will use 1 slot instead of 0. >>>> >>>> We aligns the 16-bit locations after assigning the 32-bit >>>> ones. >>>> --- >>>> src/intel/compiler/brw_fs.cpp | 30 +++++++++++++++++++++++------- >>>> 1 file changed, 23 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp >>>> index a1d49a63be..8da16145dc 100644 >>>> --- a/src/intel/compiler/brw_fs.cpp >>>> +++ b/src/intel/compiler/brw_fs.cpp >>>> @@ -1909,8 +1909,9 @@ set_push_pull_constant_loc(unsigned uniform, int >>>> *chunk_start, >>>> if (!contiguous) { >>>> /* If bitsize doesn't match the target one, skip it */ >>>> if (*max_chunk_bitsize != target_bitsize) { >>>> - /* FIXME: right now we only support 32 and 64-bit accesses */ >>>> - assert(*max_chunk_bitsize == 4 || *max_chunk_bitsize == 8); >>>> + assert(*max_chunk_bitsize == 4 || >>>> + *max_chunk_bitsize == 8 || >>>> + *max_chunk_bitsize == 2); >>>> *max_chunk_bitsize = 0; >>>> *chunk_start = -1; >>>> return; >>>> @@ -1987,8 +1988,9 @@ fs_visitor::assign_constant_locations() >>>> int constant_nr = inst->src[i].nr + inst->src[i].offset / 4; >>> Did you test this with, for example, vec4? >> CTS has 16bit scalar, vec2 (uint,sint), vec4 (float) and matrix tests >> for push constants for compute and graphics pipelines. For vec4 you can try: >> >> dEQP-VK.spirv_assembly.instruction.compute.16bit_storage.push_constant_16_to_32.vector_float >> >> For push constant tests in general there are 42 tests, but vec3 aren't >> tested: >> >> dEQP-VK.*16bit_storage.*push_constant. >> >> >>> I've been toying with a glsl >>> lowering pass changing mediump floats into float16. I was curious to know >>> how >>> much is needed as you have addressed most of the things from NIR onwards. >>> Here I'm seeing offsets 0,2,4,6 which result into 0,0,1,1 when divided by >>> four. Don't we need something of this sort in addition? >> If i remember correctly, tests were testing to use push constants with >> 64 16bit values, to use the minimum spec maximum available as >> max_push_constants_size that is 128 bytes. So at the end the generated >> intrinsic was: >> >> vec4 16 ssa_4 = intrinsic load_uniform (ssa_3) () (0, 128) /* base=0 */ >> /* range=128 */ >> >> As the calculus here is to calculate the number of location used, and >> taking into account that the Vulkan API restrictions for push constants >> that says that push constant ranges that say that offset must be >> multiple of 4 and size must be multiple of 4, maintain the use of >> 4-bytes slots was ok for supporting the feature. Our code changes just >> take the accountability in the number of 32-bits location needed, mainly >> changing the divisions by 4 using DIV_ROUND_UP( , 4) to calculate sizes. > I'm probably misunderstanding something. Let me ask a few clarifying > questions. > > I'm reading that the incoming 16-bit values are given in 32-bit slots, and for > the same reason we place them in the push/pull buffers in 32-bits slots. In > other words a vec4 would take 16-bytes and each component would 32-bits apart?
Probably I explained quite bad. A f16vec4 would use 8-bytes, and each component is going to be 16-bits apart. The 32-bit multiple offset only applies to the first element. > If that is the case, then don't we need to adjust the register offsets > somewhere the way I did in the fragment below? Otherwise the offsets will > point to locations in the register that are simply 16-bits apart? Yes components are 16-bits apart. Because of that we don't need anything especial to adjust the offsets. At least we didn't needed for existing tests. > >>> commit 1a6d2bf3302f6e4305e383da0f27712dc5c20a67 >>> Author: Topi Pohjolainen <topi.pohjolai...@intel.com> >>> Date: Sun Oct 29 20:28:03 2017 +0200 >>> >>> fix alignment of 16-bit uniforms on 32-bit slots >>> >>> diff --git a/src/intel/compiler/brw_fs_nir.cpp >>> b/src/intel/compiler/brw_fs_nir.cpp >>> index 2f5443958a..586eb9d9ff 100644 >>> --- a/src/intel/compiler/brw_fs_nir.cpp >>> +++ b/src/intel/compiler/brw_fs_nir.cpp >>> @@ -4007,7 +4007,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder >>> &bld, nir_intrinsic_instr *instr >>> src.offset = const_offset->u32[0]; >>> >>> for (unsigned j = 0; j < instr->num_components; j++) { >>> - bld.MOV(offset(dest, bld, j), offset(src, bld, j)); >>> + const unsigned src_offset = >>> + src.type == BRW_REGISTER_TYPE_HF ? 2 * j : j; >>> + >>> + bld.MOV(offset(dest, bld, j), offset(src, bld, src_offset)); >>> >>> >>> >>> Then about the change of using 32-bit slots. This is now unconditional and >>> would require revisiting if we wanted to pack 16-bits tighter and possibly >>> increase the amount of uniforms that can be pushed. Similarly to Vulkan, in >>> GL the core stores uniforms as floats and I think we should keep it that >>> way. >>> I added support in the i965 backend to keep track of the types of the >>> uniforms and to convert 32-bit presentation to 16-bits on the fly in >>> gen6_constant_state.c::brw_param_value(). I don't like it that much but I >>> had >>> to start from somewhere. >>> My thinking is that we'd want to decouple the storage of the values and the >>> packing used in the compiler backend. Ideally keeping the mesa gl core and >>> the >>> api working with full 32-bit floats but using tight 16-bit slots in the >>> push/pull constant buffers. >>> This requires quite a bit more changes as we have structured >>> param[]/pull_param[] to work with 32-bit slots. >> At the beginning we though that we need to change to do that for VK >> 16-bit values on push constants, but in this case is client of the API >> the one that has control of how values are pushed. And it is the >> limitation related to offset multiples of 4 the one that simplifies the >> logic enabling the use of 32-bit slots. >> >> But as you comment in the case of uniforms, it would make sense to have >> an accountability to a level of byte or two byte-slot. But I think that >> it will only improve the situation for 16-bit scalar uniforms. Current >> push model will use two 32-bit slots for two scalar 16-bit uniforms when >> we could use half of that space with a byte level accountability. But in >> the case of vec2 and vec4 current model is working fine. Also in the >> case of array-of 16-bit scalars the consequence is just losing half-slot >> in the case of a last odd element. > Such as GL, I'm seeing how Vulkan tells how the storage and manipulation of > push constants should be. But that doesn't dictate how driver backend > chooses to compile the shader and how to construct the push/pull buffers > given to the hardware. > > In Anvil anv_cmd_buffer.c::anv_cmd_buffer_push_constants() allocates space > for the upload, consults the storage and can place the values in the > upload buffer independent of the storage. Such as I described for the > GL backend, see gen6_constant_state.c::brw_param_value(). > > Or am I missing something else? I think we have the same picture here. The backend can decide where values are placed. > >>> My current work can be found in: >>> >>> git://people.freedesktop.org/~tpohjola/mesa 16_bit_gles >> I've bisected the code because of a regression of the 16bit_storage >> push_constant test is caused at this commit. > Right, that may easily break things - all that I have is just enough to > run a few piglit tests in tests/spec/glsl-es-1.00/execution/ that I modified > to use uniforms/varyings/math and a couple of benchmarks. I'm trying to get > a general idea how much work there is on top of yours to support gles. > >> commit a8d43602174fa6062839c568690b06c6fab4f2d1 (HEAD, refs/bisect/bad) >> Author: Topi Pohjolainen <topi.pohjolai...@intel.com> >> Date: Thu Oct 26 20:16:42 2017 +0300 >> >> i965: Add support for uploading 16-bit uniforms from 32-bit store >> >> At this point 16-bit uniforms still take full 32-bit slots in the >> pull/push constant buffers and in shader deployment payload. >> >> Signed-off-by: Topi Pohjolainen <topi.pohjolai...@intel.com> >> >> >> >> You would need to include in your branch to test the 16bit_storage push >> constant tests the last commit of our series. >> >> anv: SPV_KHR_16bit_storage/VK_KHR_16bit_storage for gen8+ > I haven't been running Vulkan with my tree - I only focused on gles shaders. > There are a few benchmarks which may benefit from 16-bits. > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev