Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On Tue, Feb 23, 2016 at 7:12 AM, Pierre Moreau wrote: > On 11:43 AM - Feb 23 2016, Hans de Goede wrote: >> Hi, >> > > [snip] > >> >> >You may have to add LOAD64/STORE64 for 64-bit >> >addresses though. Or we could decree that all addressing on global >> >memory shall be 64-bit (and thus read the .xy components of the >> >address source). >> >> I would prefer to keep LOAD / STORE semantics the same as with >> other LOAD / STORE -s to / from 1d buffers. >> >> I think that in the end the tgsi backend for llvm will get both >> a 32 bit and a 64 bit mode, like the nvptx backend already has. >> >> And then the 64 bit backend will use a new LOAD64 / STORE64 >> also do not forget that keeping 64 bit pointers takes twice as >> much registers, so 32 bit will likely be optimal in a lot of >> cases. I guess since OpenCL does not give the user a way >> to select which mode to use we will end up with some sort >> of heuristic based on the amount of memory on the card or >> some such. >> >> After all using 64 bit pointers does not make a lot of sense >> on a card with only 1 GB of RAM (yes I know we're talking virtual >> address space here). >> >> Anyways this all really is too soon to tell. Maybe the performance >> impact of using 64 bit pointers is negligible. But I think it would >> be good (and consistent) to keep LOAD / STORE taking 32 bit addresses >> even for MEMORY and add a LOAD64 / STORE64 when I get around to >> implementing a 64 bit mode for the llvm tgsi backend (or when others >> need them). >> >> >>>Another way of looking at it is that instead of having the hacky >> >>>RES[12345] being hardcoded to mean something special, you now have a >> >>>dedicated file called 'MEMORY', which has identical semantics. >> >> >> >> >> >>I'm all for getting rid of the RES[12345] hack :) >> >> >> >>I guess where you write "you now have a dedicated file called 'MEMORY'" >> >>You mean up to X dedicated MEMORY[#] files, one for each of GLOBAL, SHARED >> >>and LOCAL at least, and probably as discussed one for INPUT ? >> >> >> >>This all sounds good to me, as said my worry was that MEMORY would have >> >>an implied base address like BUFFER has, now that you've >> >>made clear that MEMORY does not have this I'm happy :) >> > >> >There's a bit of a wrinkle here, and it's questionable whether we want >> >to allow for this somehow, but... Tesla actually has no way to address >> >global memory. It's always done with a base offset (which can be set >> >to 0). The trick is that it can only address 32 bits at a time, >> >there's no 64-bit addressing. But it has *16* such "global" memory >> >spaces, i.e. which are each base + up to 32-bit offset [and ultimately >> >only 40 bits of addressability]. I don't know if OpenCL provides >> >something good for that, if it does we can use semantic indices on the >> >GLOBAL to make it like >> > >> >DCL MEMORY[0], GLOBAL[0] >> >DCL MEMORY[1], GLOBAL[1] >> > >> >etc. But again, this is pretty optional. >> >> I think that for Tesla we can just only support the tgsi32 target >> and not the tgsi64 target, at least that is how I envision things >> today, who knows what tomorrow will bring :) > > Fermi is the first family supporting 64 bit addresses, on top of 32 bit > addresses, while Kepler can only do 64 bit IIRC (or is it Maxwell which > dropped > the 32 bit support?). I've only ever seen the blob do 64-bit addressing in OpenGL shaders on Fermi+. However Hans assures me that 32-bit addressing actually works fine, presumably the upper 8 bits are set to 0 or some constant supplied via some register somewhere. All of the SM20+ ISA's have a separate "wide address" bit in their gmem-using instructions. (aka "E" in nvdisasm) Anyways, the only reason to add LOAD64 is not to muddy LOAD semantics. LOAD should remain 32-bit (single-component), while LOAD64 would be defined only for 1d entities and combine the first 2 components of the address source. I think Hans and I are in agreement on most points now... the only issue is that we don't really control where in VA space buffers are allocated, so you even if you have 1MB of vram, it might end up being somewhere in far-off 40-bit land. Ben has promised to provide such control in the future, but who knows when that work will land. -ilia ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On 11:43 AM - Feb 23 2016, Hans de Goede wrote: > Hi, > [snip] > > >You may have to add LOAD64/STORE64 for 64-bit > >addresses though. Or we could decree that all addressing on global > >memory shall be 64-bit (and thus read the .xy components of the > >address source). > > I would prefer to keep LOAD / STORE semantics the same as with > other LOAD / STORE -s to / from 1d buffers. > > I think that in the end the tgsi backend for llvm will get both > a 32 bit and a 64 bit mode, like the nvptx backend already has. > > And then the 64 bit backend will use a new LOAD64 / STORE64 > also do not forget that keeping 64 bit pointers takes twice as > much registers, so 32 bit will likely be optimal in a lot of > cases. I guess since OpenCL does not give the user a way > to select which mode to use we will end up with some sort > of heuristic based on the amount of memory on the card or > some such. > > After all using 64 bit pointers does not make a lot of sense > on a card with only 1 GB of RAM (yes I know we're talking virtual > address space here). > > Anyways this all really is too soon to tell. Maybe the performance > impact of using 64 bit pointers is negligible. But I think it would > be good (and consistent) to keep LOAD / STORE taking 32 bit addresses > even for MEMORY and add a LOAD64 / STORE64 when I get around to > implementing a 64 bit mode for the llvm tgsi backend (or when others > need them). > > >>>Another way of looking at it is that instead of having the hacky > >>>RES[12345] being hardcoded to mean something special, you now have a > >>>dedicated file called 'MEMORY', which has identical semantics. > >> > >> > >>I'm all for getting rid of the RES[12345] hack :) > >> > >>I guess where you write "you now have a dedicated file called 'MEMORY'" > >>You mean up to X dedicated MEMORY[#] files, one for each of GLOBAL, SHARED > >>and LOCAL at least, and probably as discussed one for INPUT ? > >> > >>This all sounds good to me, as said my worry was that MEMORY would have > >>an implied base address like BUFFER has, now that you've > >>made clear that MEMORY does not have this I'm happy :) > > > >There's a bit of a wrinkle here, and it's questionable whether we want > >to allow for this somehow, but... Tesla actually has no way to address > >global memory. It's always done with a base offset (which can be set > >to 0). The trick is that it can only address 32 bits at a time, > >there's no 64-bit addressing. But it has *16* such "global" memory > >spaces, i.e. which are each base + up to 32-bit offset [and ultimately > >only 40 bits of addressability]. I don't know if OpenCL provides > >something good for that, if it does we can use semantic indices on the > >GLOBAL to make it like > > > >DCL MEMORY[0], GLOBAL[0] > >DCL MEMORY[1], GLOBAL[1] > > > >etc. But again, this is pretty optional. > > I think that for Tesla we can just only support the tgsi32 target > and not the tgsi64 target, at least that is how I envision things > today, who knows what tomorrow will bring :) Fermi is the first family supporting 64 bit addresses, on top of 32 bit addresses, while Kepler can only do 64 bit IIRC (or is it Maxwell which dropped the 32 bit support?). Regards, Pierre > > Regards, > > Hans > ___ > Nouveau mailing list > Nouveau@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau signature.asc Description: PGP signature ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Hi, On 22-02-16 17:59, Ilia Mirkin wrote: On Mon, Feb 22, 2016 at 11:50 AM, Hans de Goede wrote: Hi, On 22-02-16 17:13, Ilia Mirkin wrote: On Mon, Feb 22, 2016 at 11:00 AM, Ilia Mirkin wrote: On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede wrote: But assuming I'm right, what I'm proposing is that instead of passing the input in as a global buffer, to instead pass it in as a const buffer. As such instead of sticking it into ->set_global_binding, you'd stick it into ->set_constant_buffer, and then you'll be able to refer to it as CONST[0], CONST[1], etc. (Which are, implicitly, CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when it's 0.) You don't even have to load these, you can use them as args directly anywhere you like (except as indirect addresses). The old code would actually take the supplied inputs, stick them into a constbuf, and then lower RINPUT accesses to load from that constbuf. I'm suggesting we cut out the middleman. By the way, another term for "constant buffer" is "uniform buffer", on the off chance it helps. Basically it's super-cached by the shader for values that never change across shader invocations. [And there's special stuff in the hw to allow running multiple sets of shader invocations with different "constant" values... or so we think.] I'm fine with using constant buffers for the input, it is not the mechanism I'm worried about it is the tgsi syntax to express things, I think it would be beneficial for the tgsi syntax to be abstract, and not worry about the underlying mechanism, this will i.e. allow us to use shared memory for input on tesla and const bufs on later generations without the part generating the tgsi code needing to worry about this. Yeah, I think you're right. I didn't realize that tesla had a special form of input for user params, I assumed it was just the usual thing. So forget about constbufs, go with the INPUT thing. Which is great, since we had one value left over in that (future) 2-bit field :) ### Somewhat unrelated to the input problem, I'm also somewhat worried about the addressing method for MEMORY type registers. Looking at the old RES stuff then the "index" passed into say a LOAD was not as much an index as it was simply a 32 bit GPU virtual memory address, which fits well with the OpenCL ways of doing things (the register number as in the 55 in RES[55] was more or less ignored). Where as, e.g. the new BUFFER style "registers" the index really is an index, e.g. doing: LOAD TEMP[0].x, BUFFER[0], IMM[0] resp. LOAD TEMP[0].x, BUFFER[1], IMM[0] Will read from a different memory address, correct ? Correct -- BUFFER[0] refers to the buffer at binding point 0, and BUFFER[1] refers to the buffer at binding point 1. They might, in fact, overlap, or even be the same buffer. But the code doesn't know about that. Ack. So how will this work for MEMORY type registers ? For OpenCL having the 1-dimensional behavior of RES really is quite useful, and having the address be composed of a hidden base address which gets determined under the hood from the register number, and then adding an index on top of it does not fit so well. Not sure what the question is... you have code like int *foo = [pointer value from user input]; *foo = *(foo + 5); right? So that'd just become MOV TEMP[0].x, ADD TEMP[0].y, TEMP[0].x, 5 * 4 LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y STORE MEMORY[0], TEMP[0].x, TEMP[1].x or perhaps I'm misunderstanding something? MEMORY, GLOBAL == the global virtual memory address space, not some specific buffer. Trying to load address 0 from it will likely lead to sadness, unless you happen to have something mapped there. BUFFER has an implied base address, based on the binding point, but MEMORY has no such thing. OK, that answers my questions / worries, I was worried that MEMORY too would have an implied base address, which would more or less only get in the way with opencl, but if the memory register file takes a virtual memory address as second operand to LOAD then I'm happy. So I guess that if we mix in say TGSI-shared / OpenCL-local memory them I would do: DCL MEMORY[0], GLOBAL DCL MEMORY[1], SHARED And then to load something from global mem at offset TEMP[0].y: LOAD TEMP[0].x, MEMORY[0], TEMP[0]. And to load something from the shared mem at offset TEMP[0].y: LOAD TEMP[0].x, MEMORY[1], TEMP[0]. Correct ? And the shared mem to will take shared virtual memory addresses, just like global takes global virtual memory addresses ? That's how I see it. Good. You may have to add LOAD64/STORE64 for 64-bit addresses though. Or we could decree that all addressing on global memory shall be 64-bit (and thus read the .xy components of the address source). I would prefer to keep LOAD / STORE semantics the same as with other LOAD / STORE -s to / from 1d buffers. I think that in the end the tgsi backend for llvm will get both a 32 bit and a 64 bit mode, like the nvptx backe
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On Mon, Feb 22, 2016 at 11:50 AM, Hans de Goede wrote: > Hi, > > > On 22-02-16 17:13, Ilia Mirkin wrote: >> >> On Mon, Feb 22, 2016 at 11:00 AM, Ilia Mirkin >> wrote: >>> >>> On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede >>> wrote: > > But assuming I'm right, what I'm proposing is that instead of passing > the input in as a global buffer, to instead pass it in as a const > buffer. As such instead of sticking it into ->set_global_binding, > you'd stick it into ->set_constant_buffer, and then you'll be able to > refer to it as CONST[0], CONST[1], etc. (Which are, implicitly, > CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when > it's 0.) You don't even have to load these, you can use them as args > directly anywhere you like (except as indirect addresses). > > The old code would actually take the supplied inputs, stick them into > a constbuf, and then lower RINPUT accesses to load from that constbuf. > I'm suggesting we cut out the middleman. > > By the way, another term for "constant buffer" is "uniform buffer", on > the off chance it helps. Basically it's super-cached by the shader for > values that never change across shader invocations. [And there's > special stuff in the hw to allow running multiple sets of shader > invocations with different "constant" values... or so we think.] I'm fine with using constant buffers for the input, it is not the mechanism I'm worried about it is the tgsi syntax to express things, I think it would be beneficial for the tgsi syntax to be abstract, and not worry about the underlying mechanism, this will i.e. allow us to use shared memory for input on tesla and const bufs on later generations without the part generating the tgsi code needing to worry about this. >>> >>> >>> Yeah, I think you're right. I didn't realize that tesla had a special >>> form of input for user params, I assumed it was just the usual thing. >>> So forget about constbufs, go with the INPUT thing. Which is great, >>> since we had one value left over in that (future) 2-bit field :) >>> ### Somewhat unrelated to the input problem, I'm also somewhat worried about the addressing method for MEMORY type registers. Looking at the old RES stuff then the "index" passed into say a LOAD was not as much an index as it was simply a 32 bit GPU virtual memory address, which fits well with the OpenCL ways of doing things (the register number as in the 55 in RES[55] was more or less ignored). Where as, e.g. the new BUFFER style "registers" the index really is an index, e.g. doing: LOAD TEMP[0].x, BUFFER[0], IMM[0] resp. LOAD TEMP[0].x, BUFFER[1], IMM[0] Will read from a different memory address, correct ? >>> >>> >>> Correct -- BUFFER[0] refers to the buffer at binding point 0, and >>> BUFFER[1] refers to the buffer at binding point 1. They might, in >>> fact, overlap, or even be the same buffer. But the code doesn't know >>> about that. > > > Ack. > So how will this work for MEMORY type registers ? For OpenCL having the 1-dimensional behavior of RES really is quite useful, and having the address be composed of a hidden base address which gets determined under the hood from the register number, and then adding an index on top of it does not fit so well. >>> >>> >>> Not sure what the question is... you have code like >>> >>> int *foo = [pointer value from user input]; >>> *foo = *(foo + 5); >>> >>> right? >>> >>> So that'd just become >>> >>> MOV TEMP[0].x, >>> ADD TEMP[0].y, TEMP[0].x, 5 * 4 >>> LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y >>> STORE MEMORY[0], TEMP[0].x, TEMP[1].x >>> >>> or perhaps I'm misunderstanding something? >>> >>> MEMORY, GLOBAL == the global virtual memory address space, not some >>> specific buffer. Trying to load address 0 from it will likely lead to >>> sadness, unless you happen to have something mapped there. BUFFER has >>> an implied base address, based on the binding point, but MEMORY has no >>> such thing. > > > OK, that answers my questions / worries, I was worried that MEMORY > too would have an implied base address, which would more or less only > get in the way with opencl, but if the memory register file takes > a virtual memory address as second operand to LOAD then I'm happy. > > So I guess that if we mix in say TGSI-shared / OpenCL-local memory > them I would do: > > DCL MEMORY[0], GLOBAL > DCL MEMORY[1], SHARED > > And then to load something from global mem at offset TEMP[0].y: > > LOAD TEMP[0].x, MEMORY[0], TEMP[0]. > > And to load something from the shared mem at offset TEMP[0].y: > > LOAD TEMP[0].x, MEMORY[1], TEMP[0]. > > Correct ? And the shared mem to will take shared virtual memory > addresses, just like global takes global virtual memory > addresses ? That's how I see it. You may have to
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Hi, On 22-02-16 17:13, Ilia Mirkin wrote: On Mon, Feb 22, 2016 at 11:00 AM, Ilia Mirkin wrote: On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede wrote: But assuming I'm right, what I'm proposing is that instead of passing the input in as a global buffer, to instead pass it in as a const buffer. As such instead of sticking it into ->set_global_binding, you'd stick it into ->set_constant_buffer, and then you'll be able to refer to it as CONST[0], CONST[1], etc. (Which are, implicitly, CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when it's 0.) You don't even have to load these, you can use them as args directly anywhere you like (except as indirect addresses). The old code would actually take the supplied inputs, stick them into a constbuf, and then lower RINPUT accesses to load from that constbuf. I'm suggesting we cut out the middleman. By the way, another term for "constant buffer" is "uniform buffer", on the off chance it helps. Basically it's super-cached by the shader for values that never change across shader invocations. [And there's special stuff in the hw to allow running multiple sets of shader invocations with different "constant" values... or so we think.] I'm fine with using constant buffers for the input, it is not the mechanism I'm worried about it is the tgsi syntax to express things, I think it would be beneficial for the tgsi syntax to be abstract, and not worry about the underlying mechanism, this will i.e. allow us to use shared memory for input on tesla and const bufs on later generations without the part generating the tgsi code needing to worry about this. Yeah, I think you're right. I didn't realize that tesla had a special form of input for user params, I assumed it was just the usual thing. So forget about constbufs, go with the INPUT thing. Which is great, since we had one value left over in that (future) 2-bit field :) ### Somewhat unrelated to the input problem, I'm also somewhat worried about the addressing method for MEMORY type registers. Looking at the old RES stuff then the "index" passed into say a LOAD was not as much an index as it was simply a 32 bit GPU virtual memory address, which fits well with the OpenCL ways of doing things (the register number as in the 55 in RES[55] was more or less ignored). Where as, e.g. the new BUFFER style "registers" the index really is an index, e.g. doing: LOAD TEMP[0].x, BUFFER[0], IMM[0] resp. LOAD TEMP[0].x, BUFFER[1], IMM[0] Will read from a different memory address, correct ? Correct -- BUFFER[0] refers to the buffer at binding point 0, and BUFFER[1] refers to the buffer at binding point 1. They might, in fact, overlap, or even be the same buffer. But the code doesn't know about that. Ack. So how will this work for MEMORY type registers ? For OpenCL having the 1-dimensional behavior of RES really is quite useful, and having the address be composed of a hidden base address which gets determined under the hood from the register number, and then adding an index on top of it does not fit so well. Not sure what the question is... you have code like int *foo = [pointer value from user input]; *foo = *(foo + 5); right? So that'd just become MOV TEMP[0].x, ADD TEMP[0].y, TEMP[0].x, 5 * 4 LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y STORE MEMORY[0], TEMP[0].x, TEMP[1].x or perhaps I'm misunderstanding something? MEMORY, GLOBAL == the global virtual memory address space, not some specific buffer. Trying to load address 0 from it will likely lead to sadness, unless you happen to have something mapped there. BUFFER has an implied base address, based on the binding point, but MEMORY has no such thing. OK, that answers my questions / worries, I was worried that MEMORY too would have an implied base address, which would more or less only get in the way with opencl, but if the memory register file takes a virtual memory address as second operand to LOAD then I'm happy. So I guess that if we mix in say TGSI-shared / OpenCL-local memory them I would do: DCL MEMORY[0], GLOBAL DCL MEMORY[1], SHARED And then to load something from global mem at offset TEMP[0].y: LOAD TEMP[0].x, MEMORY[0], TEMP[0]. And to load something from the shared mem at offset TEMP[0].y: LOAD TEMP[0].x, MEMORY[1], TEMP[0]. Correct ? And the shared mem to will take shared virtual memory addresses, just like global takes global virtual memory addresses ? Another way of looking at it is that instead of having the hacky RES[12345] being hardcoded to mean something special, you now have a dedicated file called 'MEMORY', which has identical semantics. I'm all for getting rid of the RES[12345] hack :) I guess where you write "you now have a dedicated file called 'MEMORY'" You mean up to X dedicated MEMORY[#] files, one for each of GLOBAL, SHARED and LOCAL at least, and probably as discussed one for INPUT ? This all sounds good to me, as said my worry was that MEMORY would have an implied base address li
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On Mon, Feb 22, 2016 at 11:07 AM, Pierre Moreau wrote: >> On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede >> wrote: >> >> But assuming I'm right, what I'm proposing is that instead of >> >> passing >> >> the input in as a global buffer, to instead pass it in as a const >> >> buffer. As such instead of sticking it into ->set_global_binding, >> >> you'd stick it into ->set_constant_buffer, and then you'll be able >> >> to >> >> refer to it as CONST[0], CONST[1], etc. (Which are, implicitly, >> >> CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim >> >> when >> >> it's 0.) You don't even have to load these, you can use them as >> >> args >> >> directly anywhere you like (except as indirect addresses). >> >> >> >> The old code would actually take the supplied inputs, stick them >> >> into >> >> a constbuf, and then lower RINPUT accesses to load from that >> >> constbuf. >> >> I'm suggesting we cut out the middleman. >> >> >> >> By the way, another term for "constant buffer" is "uniform >> >> buffer", on >> >> the off chance it helps. Basically it's super-cached by the shader >> >> for >> >> values that never change across shader invocations. [And there's >> >> special stuff in the hw to allow running multiple sets of shader >> >> invocations with different "constant" values... or so we think.] >> > >> > >> > I'm fine with using constant buffers for the input, it is not the >> > mechanism I'm worried about it is the tgsi syntax to express >> > things, >> > I think it would be beneficial for the tgsi syntax to be abstract, >> > and >> > not worry about the underlying mechanism, this will i.e. allow us >> > to use shared memory for input on tesla and const bufs on later >> > generations >> > without the part generating the tgsi code needing to worry about >> > this. >> >> Yeah, I think you're right. I didn't realize that tesla had a special >> form of input for user params, I assumed it was just the usual thing. >> So forget about constbufs, go with the INPUT thing. Which is great, >> since we had one value left over in that (future) 2-bit field :) > > I can have a try at using constbufs for user inputs on Tesla. It's just > that the blob was using shared for them, so we kept shared. I suspect there are hw advantages to doing it the way it was being done. We should keep doing that. ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On Mon, Feb 22, 2016 at 11:00 AM, Ilia Mirkin wrote: > On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede wrote: >>> But assuming I'm right, what I'm proposing is that instead of passing >>> the input in as a global buffer, to instead pass it in as a const >>> buffer. As such instead of sticking it into ->set_global_binding, >>> you'd stick it into ->set_constant_buffer, and then you'll be able to >>> refer to it as CONST[0], CONST[1], etc. (Which are, implicitly, >>> CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when >>> it's 0.) You don't even have to load these, you can use them as args >>> directly anywhere you like (except as indirect addresses). >>> >>> The old code would actually take the supplied inputs, stick them into >>> a constbuf, and then lower RINPUT accesses to load from that constbuf. >>> I'm suggesting we cut out the middleman. >>> >>> By the way, another term for "constant buffer" is "uniform buffer", on >>> the off chance it helps. Basically it's super-cached by the shader for >>> values that never change across shader invocations. [And there's >>> special stuff in the hw to allow running multiple sets of shader >>> invocations with different "constant" values... or so we think.] >> >> >> I'm fine with using constant buffers for the input, it is not the >> mechanism I'm worried about it is the tgsi syntax to express things, >> I think it would be beneficial for the tgsi syntax to be abstract, and >> not worry about the underlying mechanism, this will i.e. allow us >> to use shared memory for input on tesla and const bufs on later generations >> without the part generating the tgsi code needing to worry about this. > > Yeah, I think you're right. I didn't realize that tesla had a special > form of input for user params, I assumed it was just the usual thing. > So forget about constbufs, go with the INPUT thing. Which is great, > since we had one value left over in that (future) 2-bit field :) > >> >> ### >> >> Somewhat unrelated to the input problem, I'm also somewhat worried >> about the addressing method for MEMORY type registers. >> >> Looking at the old RES stuff then the "index" passed into say a LOAD >> was not as much an index as it was simply a 32 bit GPU virtual memory >> address, which fits well with the OpenCL ways of doing things (the >> register number as in the 55 in RES[55] was more or less ignored). >> >> Where as, e.g. the new BUFFER style "registers" the index really >> is an index, e.g. doing: >> LOAD TEMP[0].x, BUFFER[0], IMM[0] >> resp. >> LOAD TEMP[0].x, BUFFER[1], IMM[0] >> >> Will read from a different memory address, correct ? > > Correct -- BUFFER[0] refers to the buffer at binding point 0, and > BUFFER[1] refers to the buffer at binding point 1. They might, in > fact, overlap, or even be the same buffer. But the code doesn't know > about that. > >> >> So how will this work for MEMORY type registers ? For OpenCL having the >> 1-dimensional behavior of RES really is quite useful, and having the >> address be composed of a hidden base address which gets determined under >> the hood from the register number, and then adding an index on top of >> it does not fit so well. > > Not sure what the question is... you have code like > > int *foo = [pointer value from user input]; > *foo = *(foo + 5); > > right? > > So that'd just become > > MOV TEMP[0].x, > ADD TEMP[0].y, TEMP[0].x, 5 * 4 > LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y > STORE MEMORY[0], TEMP[0].x, TEMP[1].x > > or perhaps I'm misunderstanding something? > > MEMORY, GLOBAL == the global virtual memory address space, not some > specific buffer. Trying to load address 0 from it will likely lead to > sadness, unless you happen to have something mapped there. BUFFER has > an implied base address, based on the binding point, but MEMORY has no > such thing. Another way of looking at it is that instead of having the hacky RES[12345] being hardcoded to mean something special, you now have a dedicated file called 'MEMORY', which has identical semantics. -ilia ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
- Mail original - > On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede > wrote: > >> But assuming I'm right, what I'm proposing is that instead of > >> passing > >> the input in as a global buffer, to instead pass it in as a const > >> buffer. As such instead of sticking it into ->set_global_binding, > >> you'd stick it into ->set_constant_buffer, and then you'll be able > >> to > >> refer to it as CONST[0], CONST[1], etc. (Which are, implicitly, > >> CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim > >> when > >> it's 0.) You don't even have to load these, you can use them as > >> args > >> directly anywhere you like (except as indirect addresses). > >> > >> The old code would actually take the supplied inputs, stick them > >> into > >> a constbuf, and then lower RINPUT accesses to load from that > >> constbuf. > >> I'm suggesting we cut out the middleman. > >> > >> By the way, another term for "constant buffer" is "uniform > >> buffer", on > >> the off chance it helps. Basically it's super-cached by the shader > >> for > >> values that never change across shader invocations. [And there's > >> special stuff in the hw to allow running multiple sets of shader > >> invocations with different "constant" values... or so we think.] > > > > > > I'm fine with using constant buffers for the input, it is not the > > mechanism I'm worried about it is the tgsi syntax to express > > things, > > I think it would be beneficial for the tgsi syntax to be abstract, > > and > > not worry about the underlying mechanism, this will i.e. allow us > > to use shared memory for input on tesla and const bufs on later > > generations > > without the part generating the tgsi code needing to worry about > > this. > > Yeah, I think you're right. I didn't realize that tesla had a special > form of input for user params, I assumed it was just the usual thing. > So forget about constbufs, go with the INPUT thing. Which is great, > since we had one value left over in that (future) 2-bit field :) I can have a try at using constbufs for user inputs on Tesla. It's just that the blob was using shared for them, so we kept shared. Pierre > > > > > ### > > > > Somewhat unrelated to the input problem, I'm also somewhat worried > > about the addressing method for MEMORY type registers. > > > > Looking at the old RES stuff then the "index" passed into say a > > LOAD > > was not as much an index as it was simply a 32 bit GPU virtual > > memory > > address, which fits well with the OpenCL ways of doing things (the > > register number as in the 55 in RES[55] was more or less ignored). > > > > Where as, e.g. the new BUFFER style "registers" the index really > > is an index, e.g. doing: > > LOAD TEMP[0].x, BUFFER[0], IMM[0] > > resp. > > LOAD TEMP[0].x, BUFFER[1], IMM[0] > > > > Will read from a different memory address, correct ? > > Correct -- BUFFER[0] refers to the buffer at binding point 0, and > BUFFER[1] refers to the buffer at binding point 1. They might, in > fact, overlap, or even be the same buffer. But the code doesn't know > about that. > > > > > So how will this work for MEMORY type registers ? For OpenCL having > > the > > 1-dimensional behavior of RES really is quite useful, and having > > the > > address be composed of a hidden base address which gets determined > > under > > the hood from the register number, and then adding an index on top > > of > > it does not fit so well. > > Not sure what the question is... you have code like > > int *foo = [pointer value from user input]; > *foo = *(foo + 5); > > right? > > So that'd just become > > MOV TEMP[0].x, > ADD TEMP[0].y, TEMP[0].x, 5 * 4 > LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y > STORE MEMORY[0], TEMP[0].x, TEMP[1].x > > or perhaps I'm misunderstanding something? > > MEMORY, GLOBAL == the global virtual memory address space, not some > specific buffer. Trying to load address 0 from it will likely lead to > sadness, unless you happen to have something mapped there. BUFFER has > an implied base address, based on the binding point, but MEMORY has > no > such thing. > > -ilia > ___ > Nouveau mailing list > Nouveau@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau > ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede wrote: >> But assuming I'm right, what I'm proposing is that instead of passing >> the input in as a global buffer, to instead pass it in as a const >> buffer. As such instead of sticking it into ->set_global_binding, >> you'd stick it into ->set_constant_buffer, and then you'll be able to >> refer to it as CONST[0], CONST[1], etc. (Which are, implicitly, >> CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when >> it's 0.) You don't even have to load these, you can use them as args >> directly anywhere you like (except as indirect addresses). >> >> The old code would actually take the supplied inputs, stick them into >> a constbuf, and then lower RINPUT accesses to load from that constbuf. >> I'm suggesting we cut out the middleman. >> >> By the way, another term for "constant buffer" is "uniform buffer", on >> the off chance it helps. Basically it's super-cached by the shader for >> values that never change across shader invocations. [And there's >> special stuff in the hw to allow running multiple sets of shader >> invocations with different "constant" values... or so we think.] > > > I'm fine with using constant buffers for the input, it is not the > mechanism I'm worried about it is the tgsi syntax to express things, > I think it would be beneficial for the tgsi syntax to be abstract, and > not worry about the underlying mechanism, this will i.e. allow us > to use shared memory for input on tesla and const bufs on later generations > without the part generating the tgsi code needing to worry about this. Yeah, I think you're right. I didn't realize that tesla had a special form of input for user params, I assumed it was just the usual thing. So forget about constbufs, go with the INPUT thing. Which is great, since we had one value left over in that (future) 2-bit field :) > > ### > > Somewhat unrelated to the input problem, I'm also somewhat worried > about the addressing method for MEMORY type registers. > > Looking at the old RES stuff then the "index" passed into say a LOAD > was not as much an index as it was simply a 32 bit GPU virtual memory > address, which fits well with the OpenCL ways of doing things (the > register number as in the 55 in RES[55] was more or less ignored). > > Where as, e.g. the new BUFFER style "registers" the index really > is an index, e.g. doing: > LOAD TEMP[0].x, BUFFER[0], IMM[0] > resp. > LOAD TEMP[0].x, BUFFER[1], IMM[0] > > Will read from a different memory address, correct ? Correct -- BUFFER[0] refers to the buffer at binding point 0, and BUFFER[1] refers to the buffer at binding point 1. They might, in fact, overlap, or even be the same buffer. But the code doesn't know about that. > > So how will this work for MEMORY type registers ? For OpenCL having the > 1-dimensional behavior of RES really is quite useful, and having the > address be composed of a hidden base address which gets determined under > the hood from the register number, and then adding an index on top of > it does not fit so well. Not sure what the question is... you have code like int *foo = [pointer value from user input]; *foo = *(foo + 5); right? So that'd just become MOV TEMP[0].x, ADD TEMP[0].y, TEMP[0].x, 5 * 4 LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y STORE MEMORY[0], TEMP[0].x, TEMP[1].x or perhaps I'm misunderstanding something? MEMORY, GLOBAL == the global virtual memory address space, not some specific buffer. Trying to load address 0 from it will likely lead to sadness, unless you happen to have something mapped there. BUFFER has an implied base address, based on the binding point, but MEMORY has no such thing. -ilia ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Hi, On 22-02-16 16:24, Ilia Mirkin wrote: On Mon, Feb 22, 2016 at 9:50 AM, Hans de Goede wrote: Hi, On 22-02-16 15:22, Ilia Mirkin wrote: On Mon, Feb 22, 2016 at 9:17 AM, Hans de Goede wrote: Hi, On 22-02-16 14:47, Ilia Mirkin wrote: On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin wrote: INPUT is for shader inputs which come from fixed function loaders. This is not what you want. You want CONST. Stick the input params into a constbuf, and you're done. Oh, and in case it's not clear, I think this should be done by the st, not by the driver. Not a big fan of the current interface where the driver is responsible for uploading the kernel input parameters. Moving this to the state-tracker will likely break clover for amd cards, also what is the right place to stick the input params my Not really. Could just have a PIPE_CAP. Could make it part of the whole TGSI logic in the first place. This functionality isn't used for the OpenGL compute shader, and it'd be nice to bring OpenCL in line with the OpenGL stuff. differ from one gpu to the other, also the opencl -> tgsi compiler will need to know how to "address" input params without it needing to know too much details of the targetted gpu. So of INPUT is not suitable, then I think we are going to need MEMORY[#], INPUT for this, which nouveau can then just treat as CONST. That doesn't make sense... MEMORY is for... memory. Perhaps there's something I don't understand about kernel inputs that would make my suggestion unworkable? The way I see it is that you stick them into a user constbuf (i.e. pipe->set_constant_buffer({cb.user = pointer to your thing})), and then launch the grid. Your TGSI would have been composed such that it would expect the user params to show up in the first constbuf. Keep in mind the flat address space issue opencl has, there is no such thing as the first constbuf, there is a const address space, which is shared by all the const buffers, and the kernel gets passed in offsets in the single address space to find different const buffers. I'm not saying that this cannot work with your suggestion, but it is something to keep in mind. I've not yet looked closely at const bufs, I've only been working with global buffers so far, here is how I understand those to work: -clover calls pipe->set_global_binding() before calling launch_grid() -clover has set up the uint32_t **handles pointer array to point to one uint32_t in the buffer which it is going to pass as "input" to launch_grid for each global buffer -In the tgsi code for the kernel I can get to the global buffer pointed to by the first uint32_t in the input data by doing: LOAD TEMP[#].x, RES[#], IMM[0] # IMM[0] == 0 But inputs are somehow special in OpenCL, right? In other words, you know when you want to load an input vs when you want to load not-an-input, right? Correct, the best way to view them is as them being in their own address-space, which is why I suggested using MEMORY[#], INPUT. And those inputs aren't in the flat, global memory space, they're just a list of 32-bit values (or at least expressible as such). If not, then ignore everything I've said and I'll come up with another alternative. More or less, the address-space is typically addressed with byte addresses, so the second 32 bit value is addresses with address 4, etc. But assuming I'm right, what I'm proposing is that instead of passing the input in as a global buffer, to instead pass it in as a const buffer. As such instead of sticking it into ->set_global_binding, you'd stick it into ->set_constant_buffer, and then you'll be able to refer to it as CONST[0], CONST[1], etc. (Which are, implicitly, CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when it's 0.) You don't even have to load these, you can use them as args directly anywhere you like (except as indirect addresses). The old code would actually take the supplied inputs, stick them into a constbuf, and then lower RINPUT accesses to load from that constbuf. I'm suggesting we cut out the middleman. By the way, another term for "constant buffer" is "uniform buffer", on the off chance it helps. Basically it's super-cached by the shader for values that never change across shader invocations. [And there's special stuff in the hw to allow running multiple sets of shader invocations with different "constant" values... or so we think.] I'm fine with using constant buffers for the input, it is not the mechanism I'm worried about it is the tgsi syntax to express things, I think it would be beneficial for the tgsi syntax to be abstract, and not worry about the underlying mechanism, this will i.e. allow us to use shared memory for input on tesla and const bufs on later generations without the part generating the tgsi code needing to worry about this. ### Somewhat unrelated to the input problem, I'm also somewhat worried about the addressing method for MEMORY type registers. Looking at the old RES stuff then
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On Mon, Feb 22, 2016 at 10:15 AM, Pierre Moreau wrote: > Apart from Tesla which uses shared memory for passing the OpenCL user params, > Fermi+ use const memory for that, and this is what Samuel's patches are doing > (at least from what I remember). Hmmm that puts a bit of a hole in my theory of "let's just use constbufs". That said, constbufs *will* work on Tesla as well. Well, in that case, perhaps a separate input memory *is* the right thing to do? Ugh :( This definitely needs to be discussed with the wider gallium/tgsi community. -ilia ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On Mon, Feb 22, 2016 at 9:50 AM, Hans de Goede wrote: > Hi, > > > On 22-02-16 15:22, Ilia Mirkin wrote: >> >> On Mon, Feb 22, 2016 at 9:17 AM, Hans de Goede >> wrote: >>> >>> Hi, >>> >>> On 22-02-16 14:47, Ilia Mirkin wrote: On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin wrote: > > > INPUT is for shader inputs which come from fixed function loaders. > This is not what you want. You want CONST. Stick the input params into > a constbuf, and you're done. Oh, and in case it's not clear, I think this should be done by the st, not by the driver. Not a big fan of the current interface where the driver is responsible for uploading the kernel input parameters. >>> >>> >>> >>> Moving this to the state-tracker will likely break clover for amd >>> cards, also what is the right place to stick the input params my >> >> >> Not really. Could just have a PIPE_CAP. Could make it part of the >> whole TGSI logic in the first place. This functionality isn't used for >> the OpenGL compute shader, and it'd be nice to bring OpenCL in line >> with the OpenGL stuff. >> >>> differ from one gpu to the other, also the opencl -> tgsi compiler >>> will need to know how to "address" input params without it needing to >>> know too much details of the targetted gpu. So of INPUT is not suitable, >>> then I think we are going to need MEMORY[#], INPUT for this, which >>> nouveau can then just treat as CONST. >> >> >> That doesn't make sense... MEMORY is for... memory. Perhaps there's >> something I don't understand about kernel inputs that would make my >> suggestion unworkable? The way I see it is that you stick them into a >> user constbuf (i.e. pipe->set_constant_buffer({cb.user = pointer to >> your thing})), and then launch the grid. Your TGSI would have been >> composed such that it would expect the user params to show up in the >> first constbuf. > > > Keep in mind the flat address space issue opencl has, there is no > such thing as the first constbuf, there is a const address space, > which is shared by all the const buffers, and the kernel gets > passed in offsets in the single address space to find different > const buffers. I'm not saying that this cannot work with your suggestion, > but it is something to keep in mind. > > I've not yet looked closely at const bufs, I've only been working with > global buffers so far, here is how I understand those to work: > > -clover calls pipe->set_global_binding() before calling launch_grid() > -clover has set up the uint32_t **handles pointer array to point to > one uint32_t in the buffer which it is going to pass as "input" to > launch_grid for each global buffer > -In the tgsi code for the kernel I can get to the global buffer pointed > to by the first uint32_t in the input data by doing: > LOAD TEMP[#].x, RES[#], IMM[0] # IMM[0] == 0 But inputs are somehow special in OpenCL, right? In other words, you know when you want to load an input vs when you want to load not-an-input, right? And those inputs aren't in the flat, global memory space, they're just a list of 32-bit values (or at least expressible as such). If not, then ignore everything I've said and I'll come up with another alternative. But assuming I'm right, what I'm proposing is that instead of passing the input in as a global buffer, to instead pass it in as a const buffer. As such instead of sticking it into ->set_global_binding, you'd stick it into ->set_constant_buffer, and then you'll be able to refer to it as CONST[0], CONST[1], etc. (Which are, implicitly, CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when it's 0.) You don't even have to load these, you can use them as args directly anywhere you like (except as indirect addresses). The old code would actually take the supplied inputs, stick them into a constbuf, and then lower RINPUT accesses to load from that constbuf. I'm suggesting we cut out the middleman. By the way, another term for "constant buffer" is "uniform buffer", on the off chance it helps. Basically it's super-cached by the shader for values that never change across shader invocations. [And there's special stuff in the hw to allow running multiple sets of shader invocations with different "constant" values... or so we think.] -ilia ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Hello, > On 22 Feb 2016, at 15:22, Ilia Mirkin wrote: > >> On Mon, Feb 22, 2016 at 9:17 AM, Hans de Goede wrote: >> Hi, >> >>> On 22-02-16 14:47, Ilia Mirkin wrote: >>> On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin wrote: INPUT is for shader inputs which come from fixed function loaders. This is not what you want. You want CONST. Stick the input params into a constbuf, and you're done. >>> >>> >>> Oh, and in case it's not clear, I think this should be done by the st, >>> not by the driver. Not a big fan of the current interface where the >>> driver is responsible for uploading the kernel input parameters. >> >> >> Moving this to the state-tracker will likely break clover for amd >> cards, also what is the right place to stick the input params my > > Not really. Could just have a PIPE_CAP. Could make it part of the > whole TGSI logic in the first place. This functionality isn't used for > the OpenGL compute shader, and it'd be nice to bring OpenCL in line > with the OpenGL stuff. > >> differ from one gpu to the other, also the opencl -> tgsi compiler >> will need to know how to "address" input params without it needing to >> know too much details of the targetted gpu. So of INPUT is not suitable, >> then I think we are going to need MEMORY[#], INPUT for this, which >> nouveau can then just treat as CONST. > > That doesn't make sense... MEMORY is for... memory. Perhaps there's > something I don't understand about kernel inputs that would make my > suggestion unworkable? The way I see it is that you stick them into a > user constbuf (i.e. pipe->set_constant_buffer({cb.user = pointer to > your thing})), and then launch the grid. Your TGSI would have been > composed such that it would expect the user params to show up in the > first constbuf. Apart from Tesla which uses shared memory for passing the OpenCL user params, Fermi+ use const memory for that, and this is what Samuel's patches are doing (at least from what I remember). Regards, Pierre > > You *really* want to be using constbufs here -- they're designed for this. > > -ilia > ___ > Nouveau mailing list > Nouveau@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Hi, On 22-02-16 15:42, Samuel Pitoiset wrote: Well the pipe_loader stuff is buggy in compute.c, I can't even create a screen object... That's sad. It fails in pipe_loader_probe() & co. It works for me on a kepler1 (GT740), at least before the dropping of the RES support some of the tests from compute.c worked. Regards, Hans ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Hi, On 22-02-16 15:22, Ilia Mirkin wrote: On Mon, Feb 22, 2016 at 9:17 AM, Hans de Goede wrote: Hi, On 22-02-16 14:47, Ilia Mirkin wrote: On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin wrote: INPUT is for shader inputs which come from fixed function loaders. This is not what you want. You want CONST. Stick the input params into a constbuf, and you're done. Oh, and in case it's not clear, I think this should be done by the st, not by the driver. Not a big fan of the current interface where the driver is responsible for uploading the kernel input parameters. Moving this to the state-tracker will likely break clover for amd cards, also what is the right place to stick the input params my Not really. Could just have a PIPE_CAP. Could make it part of the whole TGSI logic in the first place. This functionality isn't used for the OpenGL compute shader, and it'd be nice to bring OpenCL in line with the OpenGL stuff. differ from one gpu to the other, also the opencl -> tgsi compiler will need to know how to "address" input params without it needing to know too much details of the targetted gpu. So of INPUT is not suitable, then I think we are going to need MEMORY[#], INPUT for this, which nouveau can then just treat as CONST. That doesn't make sense... MEMORY is for... memory. Perhaps there's something I don't understand about kernel inputs that would make my suggestion unworkable? The way I see it is that you stick them into a user constbuf (i.e. pipe->set_constant_buffer({cb.user = pointer to your thing})), and then launch the grid. Your TGSI would have been composed such that it would expect the user params to show up in the first constbuf. Keep in mind the flat address space issue opencl has, there is no such thing as the first constbuf, there is a const address space, which is shared by all the const buffers, and the kernel gets passed in offsets in the single address space to find different const buffers. I'm not saying that this cannot work with your suggestion, but it is something to keep in mind. I've not yet looked closely at const bufs, I've only been working with global buffers so far, here is how I understand those to work: -clover calls pipe->set_global_binding() before calling launch_grid() -clover has set up the uint32_t **handles pointer array to point to one uint32_t in the buffer which it is going to pass as "input" to launch_grid for each global buffer -In the tgsi code for the kernel I can get to the global buffer pointed to by the first uint32_t in the input data by doing: LOAD TEMP[#].x, RES[#], IMM[0] # IMM[0] == 0 And then using TEMP[#].x as offset into RES[#] If I understand your proposal correctly then you're suggestion that getting the offset would become: LOAD TEMP[#].x, CONST[0], IMM[0] # IMM[0] == 0 That would work fine for me, the question then becomes what do we do with const input parameters ? use CONST[1] for those ? It is also not entirely clear to me yet how things are going to work with regards to the handles returned by pipe->set_global_binding() with RES these are offsets which can be used directly to address the "RES" address-space. I assume that the intent with the MEMORY register file is that each global buffer gets its own MEMORY[#] and that to address say the first byte of the second buffer allocated one would use: LOAD TEMP[#].x, MEMORY[1], IMM[0] # IMM[0] == 0 But I cannot do that in OpenCL generated TGSI, as at the tgsi llvm backend level I've no idea about which global buffer it is, I only get a base-offset for the global buffer and all global buffers are assumed to be in one memory space, e.g. I will always be using MEMORY[0] for all the global buffers. Does this mean that the state-tracker now needs to figure out the size of all global buffers combined, and then request a single global memory pool and set the offsets which it passes as input to the kernel itself, dividing that memory pool into multiple buffers itself ? Or will there still be a way to pass "flat" addresses in tgsi ? Regards, Hans ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Well the pipe_loader stuff is buggy in compute.c, I can't even create a screen object... That's sad. It fails in pipe_loader_probe() & co. On 02/22/2016 02:08 PM, Hans de Goede wrote: Hi, On 22-02-16 14:04, Samuel Pitoiset wrote: On 02/22/2016 01:46 PM, Hans de Goede wrote: Hi, On 22-02-16 13:41, Samuel Pitoiset wrote: Hi there, On 02/22/2016 12:26 PM, Hans de Goede wrote: So back to the problem of getting OpenCL(ish) code to work again with the recent mesa changes. For starters I would like to get: src/gallium/tests/trivial/compute.c and then the test with mask 8, test_input_global() to work again, when that is working I should be able to adjust my llvm work (and if necessary clover) to start to work again. Currently the test_input_global() test uses the following bit of TGSI code: COMP DCL SV[0], THREAD_ID[0] DCL TEMP[0], LOCAL DCL TEMP[1], LOCAL IMM UINT32 { 8, 0, 0, 0 } BGNSUB\n" UMUL TEMP[0], SV[0], IMM[0] LOAD TEMP[1].xy, RINPUT, TEMP[0] LOAD TEMP[0].x, RGLOBAL, TEMP[1]. UADD TEMP[1].x, TEMP[0], -TEMP[1] STORE RGLOBAL.x, TEMP[1]., TEMP[1] RET ENDSUB Where by RINPUT and RGLOBAL get replaces by processing the code with cpp and the following defines: #define RGLOBALRES[32767] #define RLOCAL RES[32766] #define RPRIVATE RES[32765] #define RINPUT RES[32764] If I understand how memory is supposed to work, then I would need to change the TGSI as follows: COMP DCL SV[0], THREAD_ID[0] DCL MEMORY[0] DCL TEMP[0], LOCAL DCL TEMP[1], LOCAL IMM UINT32 { 8, 0, 0, 0 } BGNSUB\n" UMUL TEMP[0], SV[0], IMM[0] LOAD TEMP[1].xy, RINPUT, TEMP[0] LOAD TEMP[0].x, MEMORY[0], TEMP[1]. UADD TEMP[1].x, TEMP[0], -TEMP[1] STORE MEMORY[0].x, TEMP[1]., TEMP[1] RET ENDSUB Nope, this won't work because RINPUT is RES[32764]. And you have to remove all occurrences to RES because it's not longer supported. In my opinion, using BUFFER[0] in a first time should work. Currently, only SHARED with MEMORY is supported. Right, as I say below "This only solves the accessing of the global memory, it does not solve getting to the kernel input kernel parameters" This assumes, that as discussed declaring memory without a , SHARED or other flag means the memory is global. So 2 questions: 1) Do the above changes for using the new MEMORY keyword look as intended to you? 2) This only solves the accessing of the global memory, it does not solve getting to the kernel input kernel parameters, how would I deal with those ? The input kernel parameters are directly passed through a call to pipe_context::launch_grid. You just have to fill the pipe_grid_info::input array with your parameters and they will be uploaded by nvXX_compute_upload_input(). Right, the uploading side I understand, the question is how to get to them from the compute kernel's tgsi code ? Right, I wonder if there is already a DECL INPUT or something like that for input parameters of shaders. Oh yeah, there is TGSI_FILE_INPUT, maybe this is what you want? Yes that sounds right, so now "all" we need to do is make nvXX_compute_upload_input() and TGSI_FILE_INPUT work together. Regards, Hans -- -Samuel ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On Mon, Feb 22, 2016 at 9:17 AM, Hans de Goede wrote: > Hi, > > On 22-02-16 14:47, Ilia Mirkin wrote: >> >> On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin wrote: >>> >>> INPUT is for shader inputs which come from fixed function loaders. >>> This is not what you want. You want CONST. Stick the input params into >>> a constbuf, and you're done. >> >> >> Oh, and in case it's not clear, I think this should be done by the st, >> not by the driver. Not a big fan of the current interface where the >> driver is responsible for uploading the kernel input parameters. > > > Moving this to the state-tracker will likely break clover for amd > cards, also what is the right place to stick the input params my Not really. Could just have a PIPE_CAP. Could make it part of the whole TGSI logic in the first place. This functionality isn't used for the OpenGL compute shader, and it'd be nice to bring OpenCL in line with the OpenGL stuff. > differ from one gpu to the other, also the opencl -> tgsi compiler > will need to know how to "address" input params without it needing to > know too much details of the targetted gpu. So of INPUT is not suitable, > then I think we are going to need MEMORY[#], INPUT for this, which > nouveau can then just treat as CONST. That doesn't make sense... MEMORY is for... memory. Perhaps there's something I don't understand about kernel inputs that would make my suggestion unworkable? The way I see it is that you stick them into a user constbuf (i.e. pipe->set_constant_buffer({cb.user = pointer to your thing})), and then launch the grid. Your TGSI would have been composed such that it would expect the user params to show up in the first constbuf. You *really* want to be using constbufs here -- they're designed for this. -ilia ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Hi, On 22-02-16 14:47, Ilia Mirkin wrote: On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin wrote: INPUT is for shader inputs which come from fixed function loaders. This is not what you want. You want CONST. Stick the input params into a constbuf, and you're done. Oh, and in case it's not clear, I think this should be done by the st, not by the driver. Not a big fan of the current interface where the driver is responsible for uploading the kernel input parameters. Moving this to the state-tracker will likely break clover for amd cards, also what is the right place to stick the input params my differ from one gpu to the other, also the opencl -> tgsi compiler will need to know how to "address" input params without it needing to know too much details of the targetted gpu. So of INPUT is not suitable, then I think we are going to need MEMORY[#], INPUT for this, which nouveau can then just treat as CONST. Regards, Hans ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin wrote: > INPUT is for shader inputs which come from fixed function loaders. > This is not what you want. You want CONST. Stick the input params into > a constbuf, and you're done. Oh, and in case it's not clear, I think this should be done by the st, not by the driver. Not a big fan of the current interface where the driver is responsible for uploading the kernel input parameters. -ilia ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On Mon, Feb 22, 2016 at 8:08 AM, Hans de Goede wrote: > Hi, > > > On 22-02-16 14:04, Samuel Pitoiset wrote: >> >> >> On 02/22/2016 01:46 PM, Hans de Goede wrote: >>> >>> Hi, >>> >>> On 22-02-16 13:41, Samuel Pitoiset wrote: Hi there, On 02/22/2016 12:26 PM, Hans de Goede wrote: >>> >>> >>> >>> > So back to the problem of getting OpenCL(ish) code to work again with > the recent mesa changes. For starters I would like to get: > > src/gallium/tests/trivial/compute.c and then the test with mask 8, > test_input_global() to work again, when that is working I should be > able to adjust my llvm work (and if necessary clover) to start to > work again. > > Currently the test_input_global() test uses the following bit of > TGSI code: > > COMP > DCL SV[0], THREAD_ID[0] > DCL TEMP[0], LOCAL > DCL TEMP[1], LOCAL > IMM UINT32 { 8, 0, 0, 0 } > > BGNSUB\n" > UMUL TEMP[0], SV[0], IMM[0] > LOAD TEMP[1].xy, RINPUT, TEMP[0] > LOAD TEMP[0].x, RGLOBAL, TEMP[1]. > UADD TEMP[1].x, TEMP[0], -TEMP[1] > STORE RGLOBAL.x, TEMP[1]., TEMP[1] > RET > ENDSUB > > > Where by RINPUT and RGLOBAL get replaces by processing the > code with cpp and the following defines: > > #define RGLOBALRES[32767] > #define RLOCAL RES[32766] > #define RPRIVATE RES[32765] > #define RINPUT RES[32764] > > If I understand how memory is supposed to work, then I would need to > change the TGSI as follows: > > COMP > DCL SV[0], THREAD_ID[0] > DCL MEMORY[0] > DCL TEMP[0], LOCAL > DCL TEMP[1], LOCAL > IMM UINT32 { 8, 0, 0, 0 } > > BGNSUB\n" > UMUL TEMP[0], SV[0], IMM[0] > LOAD TEMP[1].xy, RINPUT, TEMP[0] > LOAD TEMP[0].x, MEMORY[0], TEMP[1]. > UADD TEMP[1].x, TEMP[0], -TEMP[1] > STORE MEMORY[0].x, TEMP[1]., TEMP[1] > RET > ENDSUB Nope, this won't work because RINPUT is RES[32764]. And you have to remove all occurrences to RES because it's not longer supported. In my opinion, using BUFFER[0] in a first time should work. Currently, only SHARED with MEMORY is supported. >>> >>> >>> Right, as I say below "This only solves the accessing of the global >>> memory, it does not solve >>> getting to the kernel input kernel parameters" >>> > This assumes, that as discussed declaring memory without a , SHARED or > other > flag means the memory is global. > > So 2 questions: > > 1) Do the above changes for using the new MEMORY keyword look as > intended > to you? > > 2) This only solves the accessing of the global memory, it does not > solve > getting to the kernel input kernel parameters, how would I deal with > those ? The input kernel parameters are directly passed through a call to pipe_context::launch_grid. You just have to fill the pipe_grid_info::input array with your parameters and they will be uploaded by nvXX_compute_upload_input(). >>> >>> >>> Right, the uploading side I understand, the question is how to get to >>> them from >>> the compute kernel's tgsi code ? >> >> >> Right, I wonder if there is already a DECL INPUT or something like that >> for input parameters of shaders. Oh yeah, there is TGSI_FILE_INPUT, maybe >> this is what you want? > > > Yes that sounds right, so now "all" we need to do is make > nvXX_compute_upload_input() and TGSI_FILE_INPUT work together. INPUT is for shader inputs which come from fixed function loaders. This is not what you want. You want CONST. Stick the input params into a constbuf, and you're done. -ilia ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Hi, On 22-02-16 14:04, Samuel Pitoiset wrote: On 02/22/2016 01:46 PM, Hans de Goede wrote: Hi, On 22-02-16 13:41, Samuel Pitoiset wrote: Hi there, On 02/22/2016 12:26 PM, Hans de Goede wrote: So back to the problem of getting OpenCL(ish) code to work again with the recent mesa changes. For starters I would like to get: src/gallium/tests/trivial/compute.c and then the test with mask 8, test_input_global() to work again, when that is working I should be able to adjust my llvm work (and if necessary clover) to start to work again. Currently the test_input_global() test uses the following bit of TGSI code: COMP DCL SV[0], THREAD_ID[0] DCL TEMP[0], LOCAL DCL TEMP[1], LOCAL IMM UINT32 { 8, 0, 0, 0 } BGNSUB\n" UMUL TEMP[0], SV[0], IMM[0] LOAD TEMP[1].xy, RINPUT, TEMP[0] LOAD TEMP[0].x, RGLOBAL, TEMP[1]. UADD TEMP[1].x, TEMP[0], -TEMP[1] STORE RGLOBAL.x, TEMP[1]., TEMP[1] RET ENDSUB Where by RINPUT and RGLOBAL get replaces by processing the code with cpp and the following defines: #define RGLOBALRES[32767] #define RLOCAL RES[32766] #define RPRIVATE RES[32765] #define RINPUT RES[32764] If I understand how memory is supposed to work, then I would need to change the TGSI as follows: COMP DCL SV[0], THREAD_ID[0] DCL MEMORY[0] DCL TEMP[0], LOCAL DCL TEMP[1], LOCAL IMM UINT32 { 8, 0, 0, 0 } BGNSUB\n" UMUL TEMP[0], SV[0], IMM[0] LOAD TEMP[1].xy, RINPUT, TEMP[0] LOAD TEMP[0].x, MEMORY[0], TEMP[1]. UADD TEMP[1].x, TEMP[0], -TEMP[1] STORE MEMORY[0].x, TEMP[1]., TEMP[1] RET ENDSUB Nope, this won't work because RINPUT is RES[32764]. And you have to remove all occurrences to RES because it's not longer supported. In my opinion, using BUFFER[0] in a first time should work. Currently, only SHARED with MEMORY is supported. Right, as I say below "This only solves the accessing of the global memory, it does not solve getting to the kernel input kernel parameters" This assumes, that as discussed declaring memory without a , SHARED or other flag means the memory is global. So 2 questions: 1) Do the above changes for using the new MEMORY keyword look as intended to you? 2) This only solves the accessing of the global memory, it does not solve getting to the kernel input kernel parameters, how would I deal with those ? The input kernel parameters are directly passed through a call to pipe_context::launch_grid. You just have to fill the pipe_grid_info::input array with your parameters and they will be uploaded by nvXX_compute_upload_input(). Right, the uploading side I understand, the question is how to get to them from the compute kernel's tgsi code ? Right, I wonder if there is already a DECL INPUT or something like that for input parameters of shaders. Oh yeah, there is TGSI_FILE_INPUT, maybe this is what you want? Yes that sounds right, so now "all" we need to do is make nvXX_compute_upload_input() and TGSI_FILE_INPUT work together. Regards, Hans ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On 02/22/2016 01:46 PM, Hans de Goede wrote: Hi, On 22-02-16 13:41, Samuel Pitoiset wrote: Hi there, On 02/22/2016 12:26 PM, Hans de Goede wrote: So back to the problem of getting OpenCL(ish) code to work again with the recent mesa changes. For starters I would like to get: src/gallium/tests/trivial/compute.c and then the test with mask 8, test_input_global() to work again, when that is working I should be able to adjust my llvm work (and if necessary clover) to start to work again. Currently the test_input_global() test uses the following bit of TGSI code: COMP DCL SV[0], THREAD_ID[0] DCL TEMP[0], LOCAL DCL TEMP[1], LOCAL IMM UINT32 { 8, 0, 0, 0 } BGNSUB\n" UMUL TEMP[0], SV[0], IMM[0] LOAD TEMP[1].xy, RINPUT, TEMP[0] LOAD TEMP[0].x, RGLOBAL, TEMP[1]. UADD TEMP[1].x, TEMP[0], -TEMP[1] STORE RGLOBAL.x, TEMP[1]., TEMP[1] RET ENDSUB Where by RINPUT and RGLOBAL get replaces by processing the code with cpp and the following defines: #define RGLOBALRES[32767] #define RLOCAL RES[32766] #define RPRIVATE RES[32765] #define RINPUT RES[32764] If I understand how memory is supposed to work, then I would need to change the TGSI as follows: COMP DCL SV[0], THREAD_ID[0] DCL MEMORY[0] DCL TEMP[0], LOCAL DCL TEMP[1], LOCAL IMM UINT32 { 8, 0, 0, 0 } BGNSUB\n" UMUL TEMP[0], SV[0], IMM[0] LOAD TEMP[1].xy, RINPUT, TEMP[0] LOAD TEMP[0].x, MEMORY[0], TEMP[1]. UADD TEMP[1].x, TEMP[0], -TEMP[1] STORE MEMORY[0].x, TEMP[1]., TEMP[1] RET ENDSUB Nope, this won't work because RINPUT is RES[32764]. And you have to remove all occurrences to RES because it's not longer supported. In my opinion, using BUFFER[0] in a first time should work. Currently, only SHARED with MEMORY is supported. Right, as I say below "This only solves the accessing of the global memory, it does not solve getting to the kernel input kernel parameters" This assumes, that as discussed declaring memory without a , SHARED or other flag means the memory is global. So 2 questions: 1) Do the above changes for using the new MEMORY keyword look as intended to you? 2) This only solves the accessing of the global memory, it does not solve getting to the kernel input kernel parameters, how would I deal with those ? The input kernel parameters are directly passed through a call to pipe_context::launch_grid. You just have to fill the pipe_grid_info::input array with your parameters and they will be uploaded by nvXX_compute_upload_input(). Right, the uploading side I understand, the question is how to get to them from the compute kernel's tgsi code ? Right, I wonder if there is already a DECL INPUT or something like that for input parameters of shaders. Oh yeah, there is TGSI_FILE_INPUT, maybe this is what you want? If I understand you correctly you are suggesting to use BUFFER[0] for this, that is fine from a nouveau point-of-view, but might be a bit nouveau centric way of looking at things, I think a better approach would be a separate input register-file for this, as that will be more flexible when people try to do opencl via clang->llvm->tgsi on other GPUs. I will have a look at the test_input_global(). Thanks! Regards, Hans -- -Samuel ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Hi, On 22-02-16 13:41, Samuel Pitoiset wrote: Hi there, On 02/22/2016 12:26 PM, Hans de Goede wrote: So back to the problem of getting OpenCL(ish) code to work again with the recent mesa changes. For starters I would like to get: src/gallium/tests/trivial/compute.c and then the test with mask 8, test_input_global() to work again, when that is working I should be able to adjust my llvm work (and if necessary clover) to start to work again. Currently the test_input_global() test uses the following bit of TGSI code: COMP DCL SV[0], THREAD_ID[0] DCL TEMP[0], LOCAL DCL TEMP[1], LOCAL IMM UINT32 { 8, 0, 0, 0 } BGNSUB\n" UMUL TEMP[0], SV[0], IMM[0] LOAD TEMP[1].xy, RINPUT, TEMP[0] LOAD TEMP[0].x, RGLOBAL, TEMP[1]. UADD TEMP[1].x, TEMP[0], -TEMP[1] STORE RGLOBAL.x, TEMP[1]., TEMP[1] RET ENDSUB Where by RINPUT and RGLOBAL get replaces by processing the code with cpp and the following defines: #define RGLOBALRES[32767] #define RLOCAL RES[32766] #define RPRIVATE RES[32765] #define RINPUT RES[32764] If I understand how memory is supposed to work, then I would need to change the TGSI as follows: COMP DCL SV[0], THREAD_ID[0] DCL MEMORY[0] DCL TEMP[0], LOCAL DCL TEMP[1], LOCAL IMM UINT32 { 8, 0, 0, 0 } BGNSUB\n" UMUL TEMP[0], SV[0], IMM[0] LOAD TEMP[1].xy, RINPUT, TEMP[0] LOAD TEMP[0].x, MEMORY[0], TEMP[1]. UADD TEMP[1].x, TEMP[0], -TEMP[1] STORE MEMORY[0].x, TEMP[1]., TEMP[1] RET ENDSUB Nope, this won't work because RINPUT is RES[32764]. And you have to remove all occurrences to RES because it's not longer supported. In my opinion, using BUFFER[0] in a first time should work. Currently, only SHARED with MEMORY is supported. Right, as I say below "This only solves the accessing of the global memory, it does not solve getting to the kernel input kernel parameters" This assumes, that as discussed declaring memory without a , SHARED or other flag means the memory is global. So 2 questions: 1) Do the above changes for using the new MEMORY keyword look as intended to you? 2) This only solves the accessing of the global memory, it does not solve getting to the kernel input kernel parameters, how would I deal with those ? The input kernel parameters are directly passed through a call to pipe_context::launch_grid. You just have to fill the pipe_grid_info::input array with your parameters and they will be uploaded by nvXX_compute_upload_input(). Right, the uploading side I understand, the question is how to get to them from the compute kernel's tgsi code ? If I understand you correctly you are suggesting to use BUFFER[0] for this, that is fine from a nouveau point-of-view, but might be a bit nouveau centric way of looking at things, I think a better approach would be a separate input register-file for this, as that will be more flexible when people try to do opencl via clang->llvm->tgsi on other GPUs. I will have a look at the test_input_global(). Thanks! Regards, Hans ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Hi there, On 02/22/2016 12:26 PM, Hans de Goede wrote: Hi, On 19-02-16 20:43, Ilia Mirkin wrote: On Fri, Feb 19, 2016 at 5:36 AM, Hans de Goede wrote: Hi, On 18-02-16 17:39, Ilia Mirkin wrote: On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede wrote: But this does not seem to be hooked up yet for nouveau. Samuel has patches. See https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3 Cool, I will take a look at those. So some questions: -The commit by Samual says: This introduces TGSI_FILE_MEMORY for shared, global and local memory. Only shared memory is currently supported. The commit introduces MEMORY[x] and MEMORY[x],SHARED so in reality it also introduces a second option next to shared, so what are we going to use plain MEMORY[x] for? I suggest using it for global memory but we need to be in agreement on this. That sounds fine to me. However what I had in mind was switching the SHARED field into a 2-bit field and making it 1 = SHARED 2 = GLOBAL 3 = LOCAL (since for OpenCL you also need to be able to address local or private memory). I sorta wanted Samuel to do it, but since I had no idea where you were at, or if you were even still working on this, I figured it should be fixed up by the first person who needed it. Sounds good, only the naming is somewhat unfortunate since opencl uses different naming. I.e. it has no "shared" Sad. Well, "shared" is what OpenGL compute shaders use, which is why I proposed it. OpenCL has: -global: accessible by all worker-groups as well as by the CPU -const: read-only global -local: shared by worker-items in the same worker-group, not shared between worker-groups -private: accessible only to a single worker-item So how do these map to the TGSI: 1 = SHARED 2 = GLOBAL 3 = LOCAL OpenCL global = TGSI global OpenCL const = TGSI global OpenCL local = TGSI shared OpenCL private = TGSI local Not sure what the distinction is between OpenCL const and global is. If the const stuff is actually just user-supplied uniforms (and doesn't need to be in a particular place in memory), then those should go into TGSI CONST somehow. AFAIK OpenCL const is really read-only global, so the data is filled in by the CPU, then passed to the opencl-kernel running on the GPU where all worker-items have access to it. I think that TGSI CONST might indeed be usable for this, but it is probably easiest to treat it as GLOBAL for now. -What about kernel input parameters, so far these have been using RES[32764] I must admit that I do not understand where the file_index of 32764 comes from (or where any of the file indexes come from for src/gallium/tests/trivial/compute.c ? I have the feeling that these are not used at all, and everything simply goes to a flat (virtual) memory space, with the params at address 0, correct ? It was never particularly well-specified, which was one of the reasons it went away. It also didn't map nicely onto the OpenGL model. There is a remaining question of how to do addressing in memory... there's 40 bits of address space. Should these implicitly be U64 (dual-component in TGSI) addresses that are passed around? Not sure what the OpenCL position on all this is. So far I've been using U32 for addresses as that is what Francisco's original code was using. And this also is what things like the tgsi LOAD instruction take. If you're doing a LOAD on a 1d buffer then you will use TEMP[#].x to specify the index, and the way how this currently works with OpenCL is that clCreateBuffer() will return a cl_mem type which then gets passed into the kernel as input parameter and gets treated as a pointer by the compiler, so e.g. global mem gets treated as a single address space even if there are multiple global buffers and TEMP[#].x contains the value passed in via cl_mem as start offset for the buffer + the index into the buffer. So this means that currently we are limited to U32 since TEMP[#].x is only 32 bits wide. Internally 40 bits addresses can and should probably be used so that at least the different memory spaces each have the full 32 bits available. Note that we could fix this by adding some sort of LOAD64 opcode, which uses TEMP[#].x and TEMP[#].y as address for 1d buffers, I'm not sure how this would work for 3d buffers though. I foresee the llvm backend eventually getting a 64 bit mode where it will use 64 bits for all pointers and use something like a LOAD64 opcode to indicate that the indexes (which it effectively uses as addresses / pointers) are 64 bit wide. Well, this LOAD64/STORE64 would just only be defined for MEMORY[] src/dest, so you don't need to worry about 3d or anything like that. I believe this is a good solution to the problem. Right for MEMORY this will work fine. I've no clue yet how images will work with OpenCL though, hopefully we can avoid the one flat address space thing there, but I simply don't know yet. Also it would be highly preferable to avoid using GLOBAL a
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Hi, On 19-02-16 20:43, Ilia Mirkin wrote: On Fri, Feb 19, 2016 at 5:36 AM, Hans de Goede wrote: Hi, On 18-02-16 17:39, Ilia Mirkin wrote: On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede wrote: But this does not seem to be hooked up yet for nouveau. Samuel has patches. See https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3 Cool, I will take a look at those. So some questions: -The commit by Samual says: This introduces TGSI_FILE_MEMORY for shared, global and local memory. Only shared memory is currently supported. The commit introduces MEMORY[x] and MEMORY[x],SHARED so in reality it also introduces a second option next to shared, so what are we going to use plain MEMORY[x] for? I suggest using it for global memory but we need to be in agreement on this. That sounds fine to me. However what I had in mind was switching the SHARED field into a 2-bit field and making it 1 = SHARED 2 = GLOBAL 3 = LOCAL (since for OpenCL you also need to be able to address local or private memory). I sorta wanted Samuel to do it, but since I had no idea where you were at, or if you were even still working on this, I figured it should be fixed up by the first person who needed it. Sounds good, only the naming is somewhat unfortunate since opencl uses different naming. I.e. it has no "shared" Sad. Well, "shared" is what OpenGL compute shaders use, which is why I proposed it. OpenCL has: -global: accessible by all worker-groups as well as by the CPU -const: read-only global -local: shared by worker-items in the same worker-group, not shared between worker-groups -private: accessible only to a single worker-item So how do these map to the TGSI: 1 = SHARED 2 = GLOBAL 3 = LOCAL OpenCL global = TGSI global OpenCL const = TGSI global OpenCL local = TGSI shared OpenCL private = TGSI local Not sure what the distinction is between OpenCL const and global is. If the const stuff is actually just user-supplied uniforms (and doesn't need to be in a particular place in memory), then those should go into TGSI CONST somehow. AFAIK OpenCL const is really read-only global, so the data is filled in by the CPU, then passed to the opencl-kernel running on the GPU where all worker-items have access to it. I think that TGSI CONST might indeed be usable for this, but it is probably easiest to treat it as GLOBAL for now. -What about kernel input parameters, so far these have been using RES[32764] I must admit that I do not understand where the file_index of 32764 comes from (or where any of the file indexes come from for src/gallium/tests/trivial/compute.c ? I have the feeling that these are not used at all, and everything simply goes to a flat (virtual) memory space, with the params at address 0, correct ? It was never particularly well-specified, which was one of the reasons it went away. It also didn't map nicely onto the OpenGL model. There is a remaining question of how to do addressing in memory... there's 40 bits of address space. Should these implicitly be U64 (dual-component in TGSI) addresses that are passed around? Not sure what the OpenCL position on all this is. So far I've been using U32 for addresses as that is what Francisco's original code was using. And this also is what things like the tgsi LOAD instruction take. If you're doing a LOAD on a 1d buffer then you will use TEMP[#].x to specify the index, and the way how this currently works with OpenCL is that clCreateBuffer() will return a cl_mem type which then gets passed into the kernel as input parameter and gets treated as a pointer by the compiler, so e.g. global mem gets treated as a single address space even if there are multiple global buffers and TEMP[#].x contains the value passed in via cl_mem as start offset for the buffer + the index into the buffer. So this means that currently we are limited to U32 since TEMP[#].x is only 32 bits wide. Internally 40 bits addresses can and should probably be used so that at least the different memory spaces each have the full 32 bits available. Note that we could fix this by adding some sort of LOAD64 opcode, which uses TEMP[#].x and TEMP[#].y as address for 1d buffers, I'm not sure how this would work for 3d buffers though. I foresee the llvm backend eventually getting a 64 bit mode where it will use 64 bits for all pointers and use something like a LOAD64 opcode to indicate that the indexes (which it effectively uses as addresses / pointers) are 64 bit wide. Well, this LOAD64/STORE64 would just only be defined for MEMORY[] src/dest, so you don't need to worry about 3d or anything like that. I believe this is a good solution to the problem. Right for MEMORY this will work fine. I've no clue yet how images will work with OpenCL though, hopefully we can avoid the one flat address space thing there, but I simply don't know yet. Also it would be highly preferable to avoid using GLOBAL at all in the first place, and just use BUFFER[] or IMAGE[
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On Fri, Feb 19, 2016 at 5:36 AM, Hans de Goede wrote: > Hi, > > On 18-02-16 17:39, Ilia Mirkin wrote: >> >> On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede >> wrote: >>> >>> But this does not seem to be hooked up yet for nouveau. >> >> >> Samuel has patches. See >> https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3 > > > Cool, I will take a look at those. > >>> So some questions: >>> -The commit by Samual says: >>> This introduces TGSI_FILE_MEMORY for shared, global and local memory. >>> Only >>> shared memory is currently supported. >>> >>> The commit introduces MEMORY[x] and MEMORY[x],SHARED so in reality it >>> also >>> introduces >>> a second option next to shared, so what are we going to use plain >>> MEMORY[x] >>> for? >>> I suggest using it for global memory but we need to be in agreement on >>> this. >> >> >> That sounds fine to me. However what I had in mind was switching the >> SHARED field into a 2-bit field and making it >> >> 1 = SHARED >> 2 = GLOBAL >> 3 = LOCAL >> >> (since for OpenCL you also need to be able to address local or private >> memory). I sorta wanted Samuel to do it, but since I had no idea where >> you were at, or if you were even still working on this, I figured it >> should be fixed up by the first person who needed it. > > > Sounds good, only the naming is somewhat unfortunate since opencl uses > different > naming. I.e. it has no "shared" Sad. Well, "shared" is what OpenGL compute shaders use, which is why I proposed it. > > OpenCL has: > -global: accessible by all worker-groups as well as by the CPU > -const: read-only global > -local: shared by worker-items in the same worker-group, not shared > between worker-groups > -private: accessible only to a single worker-item > > So how do these map to the TGSI: > >> 1 = SHARED >> 2 = GLOBAL >> 3 = LOCAL OpenCL global = TGSI global OpenCL const = TGSI global OpenCL local = TGSI shared OpenCL private = TGSI local Not sure what the distinction is between OpenCL const and global is. If the const stuff is actually just user-supplied uniforms (and doesn't need to be in a particular place in memory), then those should go into TGSI CONST somehow. >> >>> >>> -What about kernel input parameters, so far these have been using >>> RES[32764] >>> I must admit that I do not understand where the file_index of 32764 >>> comes >>> from (or where any of the file indexes come from for >>> src/gallium/tests/trivial/compute.c ? >>> I have the feeling that these are not used at all, and everything >>> simply >>> goes >>> to a flat (virtual) memory space, with the params at address 0, correct >>> ? >> >> >> It was never particularly well-specified, which was one of the reasons >> it went away. It also didn't map nicely onto the OpenGL model. There >> is a remaining question of how to do addressing in memory... there's >> 40 bits of address space. Should these implicitly be U64 >> (dual-component in TGSI) addresses that are passed around? Not sure >> what the OpenCL position on all this is. > > > So far I've been using U32 for addresses as that is what Francisco's > original > code was using. And this also is what things like the tgsi LOAD instruction > take. If you're doing a LOAD on a 1d buffer then you will use TEMP[#].x to > specify the index, and the way how this currently works with OpenCL is that > clCreateBuffer() will return a cl_mem type which then gets passed into > the kernel as input parameter and gets treated as a pointer by the compiler, > so e.g. global mem gets treated as a single address space even if there > are multiple global buffers and TEMP[#].x contains the value passed in > via cl_mem as start offset for the buffer + the index into the buffer. > > So this means that currently we are limited to U32 since TEMP[#].x is > only 32 bits wide. Internally 40 bits addresses can and should probably > be used so that at least the different memory spaces each have the full > 32 bits available. > > Note that we could fix this by adding some sort of LOAD64 opcode, which > uses TEMP[#].x and TEMP[#].y as address for 1d buffers, I'm not sure > how this would work for 3d buffers though. I foresee the llvm backend > eventually getting a 64 bit mode where it will use 64 bits for all > pointers and use something like a LOAD64 opcode to indicate that > the indexes (which it effectively uses as addresses / pointers) > are 64 bit wide. Well, this LOAD64/STORE64 would just only be defined for MEMORY[] src/dest, so you don't need to worry about 3d or anything like that. I believe this is a good solution to the problem. > >> Also it would be highly preferable to avoid using GLOBAL at all in the >> first place, and just use BUFFER[] or IMAGE[] or whatever. However >> after having a look at how OpenCL works, I don't think that's >> possible. But perhaps I missed something? > > > OpenCL is very C-like and as such uses a flat address space per memory > space, getting around that is going to be very tricky, if
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Hi, On 18-02-16 17:39, Ilia Mirkin wrote: On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede wrote: But this does not seem to be hooked up yet for nouveau. Samuel has patches. See https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3 Cool, I will take a look at those. So some questions: -The commit by Samual says: This introduces TGSI_FILE_MEMORY for shared, global and local memory. Only shared memory is currently supported. The commit introduces MEMORY[x] and MEMORY[x],SHARED so in reality it also introduces a second option next to shared, so what are we going to use plain MEMORY[x] for? I suggest using it for global memory but we need to be in agreement on this. That sounds fine to me. However what I had in mind was switching the SHARED field into a 2-bit field and making it 1 = SHARED 2 = GLOBAL 3 = LOCAL (since for OpenCL you also need to be able to address local or private memory). I sorta wanted Samuel to do it, but since I had no idea where you were at, or if you were even still working on this, I figured it should be fixed up by the first person who needed it. Sounds good, only the naming is somewhat unfortunate since opencl uses different naming. I.e. it has no "shared" OpenCL has: -global: accessible by all worker-groups as well as by the CPU -const: read-only global -local: shared by worker-items in the same worker-group, not shared between worker-groups -private: accessible only to a single worker-item So how do these map to the TGSI: > 1 = SHARED > 2 = GLOBAL > 3 = LOCAL ? -What about kernel input parameters, so far these have been using RES[32764] I must admit that I do not understand where the file_index of 32764 comes from (or where any of the file indexes come from for src/gallium/tests/trivial/compute.c ? I have the feeling that these are not used at all, and everything simply goes to a flat (virtual) memory space, with the params at address 0, correct ? It was never particularly well-specified, which was one of the reasons it went away. It also didn't map nicely onto the OpenGL model. There is a remaining question of how to do addressing in memory... there's 40 bits of address space. Should these implicitly be U64 (dual-component in TGSI) addresses that are passed around? Not sure what the OpenCL position on all this is. So far I've been using U32 for addresses as that is what Francisco's original code was using. And this also is what things like the tgsi LOAD instruction take. If you're doing a LOAD on a 1d buffer then you will use TEMP[#].x to specify the index, and the way how this currently works with OpenCL is that clCreateBuffer() will return a cl_mem type which then gets passed into the kernel as input parameter and gets treated as a pointer by the compiler, so e.g. global mem gets treated as a single address space even if there are multiple global buffers and TEMP[#].x contains the value passed in via cl_mem as start offset for the buffer + the index into the buffer. So this means that currently we are limited to U32 since TEMP[#].x is only 32 bits wide. Internally 40 bits addresses can and should probably be used so that at least the different memory spaces each have the full 32 bits available. Note that we could fix this by adding some sort of LOAD64 opcode, which uses TEMP[#].x and TEMP[#].y as address for 1d buffers, I'm not sure how this would work for 3d buffers though. I foresee the llvm backend eventually getting a 64 bit mode where it will use 64 bits for all pointers and use something like a LOAD64 opcode to indicate that the indexes (which it effectively uses as addresses / pointers) are 64 bit wide. Also it would be highly preferable to avoid using GLOBAL at all in the first place, and just use BUFFER[] or IMAGE[] or whatever. However after having a look at how OpenCL works, I don't think that's possible. But perhaps I missed something? OpenCL is very C-like and as such uses a flat address space per memory space, getting around that is going to be very tricky, if possible at all. This seems like a nvidia implementation detail though, and we should IMHO still implement a specific TGSI register file for accessing kernel input params, agreed ? I believe that kernel input params should be passed in via user uniforms. i.e. CONST. It doesn't have to be a separate UBO, it can use the user buffer thing, which is basically what it is now. If we can agree on the TGSI syntax for all this, then I'm pretty sure I can hack something together to get things to work with the latest master again, and then we can use this as a starting point for properly implementing this. Regards, Hans p.s. I must say I was somewhat unpleasantly surprised by the RES breakage, yes I know some changes were coming but still. In the future when my llvm->tgsi work is more stable and I'll be pushing it to llvm upstream we really cannot make breaking tgsi changes like this. I thought I had sufficiently explained to y
Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede wrote: > But this does not seem to be hooked up yet for nouveau. Samuel has patches. See https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3 > > So some questions: > -The commit by Samual says: > This introduces TGSI_FILE_MEMORY for shared, global and local memory. Only > shared memory is currently supported. > > The commit introduces MEMORY[x] and MEMORY[x],SHARED so in reality it also > introduces > a second option next to shared, so what are we going to use plain MEMORY[x] > for? > I suggest using it for global memory but we need to be in agreement on > this. That sounds fine to me. However what I had in mind was switching the SHARED field into a 2-bit field and making it 1 = SHARED 2 = GLOBAL 3 = LOCAL (since for OpenCL you also need to be able to address local or private memory). I sorta wanted Samuel to do it, but since I had no idea where you were at, or if you were even still working on this, I figured it should be fixed up by the first person who needed it. > > -What about kernel input parameters, so far these have been using RES[32764] > I must admit that I do not understand where the file_index of 32764 comes > from (or where any of the file indexes come from for > src/gallium/tests/trivial/compute.c ? > I have the feeling that these are not used at all, and everything simply > goes > to a flat (virtual) memory space, with the params at address 0, correct ? It was never particularly well-specified, which was one of the reasons it went away. It also didn't map nicely onto the OpenGL model. There is a remaining question of how to do addressing in memory... there's 40 bits of address space. Should these implicitly be U64 (dual-component in TGSI) addresses that are passed around? Not sure what the OpenCL position on all this is. Also it would be highly preferable to avoid using GLOBAL at all in the first place, and just use BUFFER[] or IMAGE[] or whatever. However after having a look at how OpenCL works, I don't think that's possible. But perhaps I missed something? > > This seems like a nvidia implementation detail though, and we should IMHO > still > implement a specific TGSI register file for accessing kernel input params, > agreed ? I believe that kernel input params should be passed in via user uniforms. i.e. CONST. It doesn't have to be a separate UBO, it can use the user buffer thing, which is basically what it is now. > > If we can agree on the TGSI syntax for all this, then I'm pretty sure I can > hack > something together to get things to work with the latest master again, and > then we > can use this as a starting point for properly implementing this. > > Regards, > > Hans > > > p.s. > > I must say I was somewhat unpleasantly surprised by the RES breakage, yes I > know > some changes were coming but still. In the future when my llvm->tgsi work is > more > stable and I'll be pushing it to llvm upstream we really cannot make > breaking tgsi > changes like this. I thought I had sufficiently explained to you that the resource thing was going away -- it was never spec'd, and there were no users. Perhaps I didn't make it clear enough though -- sorry. The reality is that TGSI is a living thing, and things change every so often. I don't think people would be comfortable with locking down TGSI in such a way. -ilia ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
[Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone
Hi Ilia and Samuel, I rebased my mesa git tree today (it was getting a bit stale) and after that src/gallium/tests/trivial/compute.c stopped working as well as any opencl programs with the kernel in TGSI (as clover on nouveau currently only supports having the kernel in TGSI). The problem is that RES no longer is a valid register-file name in TGSI, specifically this is caused by this commit: https://cgit.freedesktop.org/mesa/mesa/commit?id=8cc9a8aa2a97ca9e7a36a993954a3480d44c13d3 (no surprises there) I noticed that Samual adds TGSI_FILE_MEMORY support here: https://cgit.freedesktop.org/mesa/mesa/commit?id=a8328e3a50169c3c74656df7f63f56f061d9e751 But this does not seem to be hooked up yet for nouveau. So some questions: -The commit by Samual says: This introduces TGSI_FILE_MEMORY for shared, global and local memory. Only shared memory is currently supported. The commit introduces MEMORY[x] and MEMORY[x],SHARED so in reality it also introduces a second option next to shared, so what are we going to use plain MEMORY[x] for? I suggest using it for global memory but we need to be in agreement on this. -What about kernel input parameters, so far these have been using RES[32764] I must admit that I do not understand where the file_index of 32764 comes from (or where any of the file indexes come from for src/gallium/tests/trivial/compute.c ? I have the feeling that these are not used at all, and everything simply goes to a flat (virtual) memory space, with the params at address 0, correct ? This seems like a nvidia implementation detail though, and we should IMHO still implement a specific TGSI register file for accessing kernel input params, agreed ? If we can agree on the TGSI syntax for all this, then I'm pretty sure I can hack something together to get things to work with the latest master again, and then we can use this as a starting point for properly implementing this. Regards, Hans p.s. I must say I was somewhat unpleasantly surprised by the RES breakage, yes I know some changes were coming but still. In the future when my llvm->tgsi work is more stable and I'll be pushing it to llvm upstream we really cannot make breaking tgsi changes like this. ___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau