forgot to add mesa-dev when I sent (again).

---------- Forwarded message ----------
From: "Jacob Lifshay" <programmerj...@gmail.com>
Date: Feb 13, 2017 8:27 AM
Subject: Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc
To: "Nicolai Hähnle" <nhaeh...@gmail.com>
Cc:
On Feb 13, 2017 7:54 AM, "Nicolai Hähnle" <nhaeh...@gmail.com> wrote:

> On 13.02.2017 03:17, Jacob Lifshay wrote:
>
>> On Feb 12, 2017 5:34 PM, "Dave Airlie" <airl...@gmail.com> wrote:
>>
>> > I'm assuming that control barriers in Vulkan are identical to
>> > barriers across a work-group in opencl. I was going to have a
>> > work-group be a single OS thread, with the different work-items
>> > mapped to SIMD lanes. If we need to have additional scheduling, I
>> > have written a javascript compiler that supports generator
>> > functions, so I mostly know how to write an llvm pass to implement
>> > that. I was planning on writing the shader compiler using llvm,
>> > using the whole-function-vectorization pass I will write, and using
>> > the pre-existing spir-v to llvm translation layer. I would also
>> > write some llvm passes to translate from texture reads and stuff to
>> > basic vector ops.
>>
>> Well, the problem is that the number of work-groups that get launched
>> could be quite high, and this can cause a large overhead in the number
>> of host threads that have to be launched. There was some discussion on
>> this in the mesa-dev archives back when I added softpipe compute
>> shaders.
>>
>> I would start a thread for each cpu, then have each thread run the
>> compute shader a number of times instead of having a thread per shader
>> invocation.
>
> This will not work.
>
> Please, read again what the barrier() instruction does: when the
> barrier() call is reached, _all_ threads within the workgroup are
> supposed to be run until they reach that barrier() call.

to clarify, I had meant that each os thread would run the sections of the
shader between the barriers for all the invocations in a work group, then,
when it finished the work group, it would go to the next work group
assigned to the os thread.
so, if our shader is:

a = b + tid;
barrier();
d = e + f;

and our simd width is 4, our work-group size is 128, and we have 16 os
threads, then each os thread will run:

for(workgroup = os_thread_index; workgroup < workgroup_count;
    workgroup += os_thread_count) // stride so os threads don't share work-groups
{
    for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
    {
        ivec4 tid = ivec4(0, 1, 2, 3) + ivec4(tid_in_workgroup + workgroup * 128);
        a[tid_in_workgroup / 4] = ivec_add(b[tid_in_workgroup / 4], tid);
    }
    memory_fence(); // if needed
    for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
    {
        d[tid_in_workgroup / 4] = vec_add(e[tid_in_workgroup / 4],
                                          f[tid_in_workgroup / 4]);
    }
}
// after this, we run the next rendering or compute job

>> > I have a prototype rasterizer, however I haven't implemented binning
>> > for triangles yet or implemented interpolation. currently, it can
>> > handle triangles in 3D homogeneous coordinates and calculate edge
>> > equations.
>> > https://github.com/programmerjake/tiled-renderer
>> > A previous 3d renderer that doesn't implement any vectorization and
>> > has opengl 1.x level functionality:
>> > https://github.com/programmerjake/lib3d/blob/master/softrender.cpp
>>
>> Well, I think we already have a completely fine rasterizer and binning
>> and whatever else in the llvmpipe code base. I'd much rather any Mesa
>> based project didn't throw all of that away; there is no reason the
>> same swrast backend couldn't be abstracted to be used for both GL and
>> Vulkan, and introducing another just because it's interesting isn't a
>> great fit for long term project maintenance.
>>
>> If there are improvements to llvmpipe that need to be made, then that
>> is something to possibly consider, but I'm not sure why a swrast
>> vulkan needs a from-scratch rasterizer implemented.
>> For a project that is so large in scope, I'd think reusing that code
>> would be of some use, since most of the fun stuff is all the texture
>> sampling etc.
>>
>> I actually think implementing the rasterization algorithm is the best
>> part. I wanted the rasterization algorithm to be included in the
>> shaders, e.g. triangle setup and binning would be tacked on to the end
>> of the vertex shader, parameter interpolation and early z tests would
>> be tacked on to the beginning of the fragment shader, and blending on
>> to the end. That way, llvm could do more specialization and
>> instruction scheduling than is possible in llvmpipe now.
>>
>> so the tile rendering function would essentially be:
>>
>> for(i = 0; i < triangle_count; i += vector_width)
>>     jit_functions[i](tile_x, tile_y, &triangle_setup_results[i]);
>>
>> as opposed to the current llvmpipe code, where there is a large amount
>> of fixed code that isn't optimized with the shaders.
>>
>> > The scope that I intended to complete is the bare minimum to be
>> > vulkan conformant (i.e. no tessellation and no geometry shaders),
>> > so: implementing a loadable ICD for linux and windows that
>> > implements a single queue, vertex, fragment, and compute shaders;
>> > implementing events, semaphores, and fences; implementing images
>> > with the minimum requirements; supporting a f32 depth buffer or a
>> > f24 with 8-bit stencil; and supporting a yet-to-be-determined
>> > compressed format. For the image optimal layouts, I will probably
>> > use the same chunked layout I use in
>> > https://github.com/programmerjake/tiled-renderer/blob/master2/image.h#L59
>> > , where I have a linear array of chunks where each chunk has a
>> > linear array of texels.
>> > If you think that's too big, we could leave out all of the image
>> > formats except the two depth-stencil formats, the 8-bit and 32-bit
>> > integer and 32-bit float formats.
>>
>> Seems like quite a large scope, possibly a bit big for a GSoC though,
>> esp one that intends to not use any existing Mesa code.
>>
>> most of the vulkan functions have a simple implementation when we
>> don't need to worry about building stuff for a gpu and synchronization
>> (because we have only one queue), and llvm implements most of the rest
>> of the needed functionality. If we leave out most of the image
>> formats, that would probably cut the amount of code by a third.
>>
>> Dave.
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev