On Tue, 2016-05-03 at 20:43 -0400, Tom Stellard wrote:
> On Mon, May 02, 2016 at 01:11:18AM -0400, Jan Vesely wrote:
> > From: Jan Vesely <jan.ves...@rutgers.edu>
> >
> > reserve buffer id 2
> >
> > Signed-off-by: Jan Vesely <jan.ves...@rutgers.edu>
> > ---
> > needs llvm patches to be of use:
> > https://github.com/jvesely/llvm/tree/eg-const
> >
> > passes program-scope-arrays piglit and fixes all builtin functions that
> > are implemented using large tables (AMD Turks)
> >
> >  src/gallium/drivers/r600/evergreen_compute.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
> > index 334897e..f498007 100644
> > --- a/src/gallium/drivers/r600/evergreen_compute.c
> > +++ b/src/gallium/drivers/r600/evergreen_compute.c
> > @@ -259,9 +259,11 @@ static void *evergreen_create_compute_state(struct pipe_context *ctx,
> >  	radeon_elf_read(code, header->num_bytes, &shader->binary);
> >  	r600_create_shader(&shader->bc, &shader->binary, &use_kill);
> >
> > +	/* Upload code + ROdata */
> >  	shader->code_bo = r600_compute_buffer_alloc_vram(rctx->screen,
> >  							shader->bc.ndw * 4);
> >  	p = r600_buffer_map_sync_with_rings(&rctx->b, shader->code_bo, PIPE_TRANSFER_WRITE);
> > +	//TODO: use util_memcpy_cpu_to_le32 ?
> >  	memcpy(p, shader->bc.bytecode, shader->bc.ndw * 4);
> >  	rctx->b.ws->buffer_unmap(shader->code_bo->buf);
> >  #endif
> > @@ -612,9 +614,9 @@ static void evergreen_set_compute_resources(struct pipe_context *ctx,
> >  					start, count);
> >
> >  	for (unsigned i = 0; i < count; i++) {
> > -		/* The First two vertex buffers are reserved for parameters and
> > +		/* The First three vertex buffers are reserved for parameters and
> >  		 * global buffers. */
> > -		unsigned vtx_id = 2 + i;
> > +		unsigned vtx_id = 3 + i;
> >  		if (resources[i]) {
> >  			struct r600_resource_global *buffer =
> >  				(struct r600_resource_global*)
> > @@ -681,9 +683,15 @@ static void evergreen_set_global_binding(struct pipe_context *ctx,
> >  		*(handles[i]) = util_cpu_to_le32(handle);
> >  	}
> >
> > +	/* globals for writing */
> >  	evergreen_set_rat(rctx->cs_shader_state.shader, 0, pool->bo, 0, pool->size_in_dw * 4);
> > +	/* globals for reading */
> >  	evergreen_cs_set_vertex_buffer(rctx, 1, 0,
> >  				(struct pipe_resource*)pool->bo);
> > +
> > +	/* constants for reading, LLVM puts them in text segment */
> > +	evergreen_cs_set_vertex_buffer(rctx, 2, 0,
> > +				(struct pipe_resource*)rctx->cs_shader_state.shader->code_bo);
>
> I see now you are binding the whole shader to the vertex buffer rather
> than just the readonly data, which is why you need to emit the
> GlobalAddress in LLVM.

(Phabricator did not send out an email for D19792. Here's my reasoning;
I'm not certain I understand everything.)

My first attempt (a year ago) used a separate rodata section and constant
0. However, I could not find a way to get the address of the second and
later global variables (the first is at zero), so it only worked with the
first global variable and failed the extended program-scope-arrays
piglit. LLVM also used multiple rodata sections by default, which caused
additional problems (I don't remember whether fixing this was enough to
get the piglit to pass).

> If you were to just bind the read-only data you could generate better
> code in LLVM, because all the global address would just be constant
> offsets from the start of the buffer and they could be folded into
> other instructions.

I don't think we'll see loads from constant addresses. Since all global
variables are initialized, such loads get eliminated anyway. Variable
indices should produce just as good code with the current approach.

> However, this would be a little more involved because you would have
> to change llvm to emit the read-only data for R600 into a separate
> section.
> I think your approach is fine, since it is what radeonsi is doing, and
> that makes it easier to share code for this in LLVM. If you want to
> optimize this in the future you can always do it in a follow up commit.

SI uses relative addressing, so it needs both runtime computation
(getpc + X) and MCExpr+fixup. I don't know if the code is always loaded
at a fixed address; if so, I think it can use the static approach,
sharing even more code.

> Reviewed-by: Tom Stellard <thomas.stell...@amd.com>

thank you,
Jan

> >  }
> >
> >  /**
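
For reference, the slot layout and the two addressing schemes discussed above can be sketched in standalone C. This is only an illustration, not driver code: the enum and function names are hypothetical (the patch uses bare integers), slot 0 holding kernel parameters is inferred from the patch comment, and `rodata_start` stands for wherever the read-only data happens to sit inside the uploaded code buffer.

```c
#include <assert.h>

/* Hypothetical names for the reserved compute vertex-buffer slots.
 * Slot 0 = parameters is an assumption based on the patch comment. */
enum cs_vtx_slot {
	CS_VTX_PARAMS = 0,         /* kernel parameters (assumed) */
	CS_VTX_GLOBAL = 1,         /* global pool, read path */
	CS_VTX_CONST = 2,          /* code_bo: shader code + read-only data */
	CS_VTX_FIRST_RESOURCE = 3  /* first slot left for compute resources */
};

/* Slot for the i-th compute resource, mirroring "vtx_id = 3 + i". */
static unsigned resource_vtx_id(unsigned i)
{
	return CS_VTX_FIRST_RESOURCE + i;
}

/* Whole shader bound to slot 2: a constant's address is its offset
 * within the read-only data plus the offset of that data in the buffer. */
static unsigned const_addr_whole_shader(unsigned rodata_start,
					unsigned rodata_size,
					unsigned sym_offset)
{
	assert(sym_offset < rodata_size);
	return rodata_start + sym_offset;
}

/* Only the read-only data bound: the address is the symbol offset
 * itself, which a compiler could fold as a plain constant. */
static unsigned const_addr_rodata_only(unsigned sym_offset)
{
	return sym_offset;
}
```

With these helpers, the first resource lands in slot 3, and a symbol at offset 16 in read-only data that starts 256 bytes into the code buffer is addressed at 272 in the first scheme and at 16 in the second.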
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev