Re: Radeon state handling
Roland Scheidegger wrote:

Jerome Glisse wrote:

How the state will be stored is yet to be determined, but the idea is that finding a state with a given id has to be fast, very fast. Each state class will have at most 64 dwords, and I think there will be around 30 different classes, so this isn't much memory at all. I don't expect more than 100 entries per class; if there are more, then this class idea is wasteful and I will go the way of reuploading all state each time. Btw, this will be stored in system memory, as storing it in VRAM would kill performance (the commands are submitted through system memory, though I think we could use VRAM). Right now my target cards are r300, r400 and r500, all of which have plenty of memory. That said, the infrastructure for supporting older cards is there, and most of the initialization code should work on older cards (down to r100); I haven't tested this yet but will do so when I see the need.

I fail to see the benefit of that sort of inter-client optimization too. Or rather, I think you're trying to solve a problem which probably doesn't exist, and thus just adding unnecessary complexity. Some state will always differ between two clients anyway (for instance because all your buffer addresses will be different), and quite a lot of other state changes all the time even within one application (hence you excluded shaders from this scheme yourself, and they are a pretty big chunk already and will be even more so in the future). And if some state really is the same between two clients, maybe that's just because no one has changed it from the default, in which case it might not be relevant at all and you should rather avoid uploading it in the first place (as a simplified example, something like fog color if fog isn't enabled). I'd agree though that the current state management isn't really ideal and the grouping of state is probably not optimal (and state is never reused).
Nothing gallium3D couldn't fix :-).

Roland

You are all certainly right about this: this state scheme would be hard to do and would only save a few dwords of uploading, and uploading those is unlikely to kill performance or even affect it. I will go the way of full state submission, where it's up to the driver to make sure the card is in the state it wants for drawing.

Cheers,
Jerome

- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: Radeon state handling
Gentlemen, let me see if I understand this properly. You want to create a finite number of states, store them in a list and switch amongst them on a whim, right?

If my understanding is correct, I am wondering how much data you are going to store in these states, and how many states you wish to create. I mean, if you have only twenty states, each containing a few K of data, it doesn't seem like much, but if you want to put a megabyte in each state, that would restrict you to 64MB and greater for graphics cards - not really much of an issue with newer hardware, but there are a few 1MB cards still floating around in Linux boxes. (I know, I have one or two.) Also, having a large number of states to sort through might end up degrading performance if you have to search for the proper state each time, depending on the fetching algorithm.

These concerns are, of course, purely due to my not understanding what information you want to store in the state. I can conceive of a single state which stores all pixel data - hardly useful - all the way down to one that stores only which VGA mode you are rendering the image with. Can you point out where in this scale you are working?

Garry

On Nov 19, 2007 4:10 PM, Jerome Glisse [EMAIL PROTECTED] wrote:

Keith Whitwell wrote:

Jerome Glisse wrote:

Hi all,

While playing with modesetting ttm I have put some thought into how we send things to the card, and I would like to test the following scheme:

- Split the card state into a bunch of separate chunks (z state, fog state, ...).
- The driver builds the state it wants and registers each state chunk with the drm.
- The drm gives a unique id for this state chunk. The id has two parts: one is the chunk class (z state, fog state, ...), the other is a unique id identifying this particular state within the chunk class. These ids are shared between all drm clients, i.e. if another program registers the exact same state then the drm returns the same id.
- The driver can now use this id through the superioctl by providing a list of state ids.
- The drm keeps a list of the latest state ids uploaded to the card, uploads only the state chunks which differ, and updates its list of state ids.
- A few things won't be part of this state, such as vertex programs or fragment programs. I believe there might be too many different ones for it to be efficient to cache which program was last uploaded, so I think it's better to reupload programs each time (they aren't big anyway).

So what good things do we get with this:

- From user space it's as if the card had contexts :)
- We can save a lot of state reuploading, the assumption being that most programs share most of the state (i.e. if state chunks are well sliced there won't be many different state ids in each state class).
- The drm is the only place where we can have a coherent, up-to-date view of the state currently uploaded on the card.
- It's a lot easier than asking userspace to resend all its state.
- Most of the checking is done at state registration (not a big win, I think).
- In the future we could even schedule requests in the order which triggers the fewest state changes.

There are likely other good things on the bright side...

Bad things:

- Backward compatibility, if we want to change how state is sliced or what we accept or not (I think that by designing the interface carefully we can minimize problems in this area).
- If we split state badly then we might end up with too many ids for some state chunks, which will slow down state registration (as this involves searching through all previous states of the same class to see if the state being registered already exists).

Maybe other bad things?

I think we can work around this by putting some time into testing real usage, to see how best to split state and what should be cached as state or reuploaded at each call.
So the superioctl will look like this:

- drm drawable (where we draw, DRI2 world :))
- list of state ids
- cmd buffer (cmd stream with vertex and fragment programs and other state not cached by the above mechanism)
- list of reloc buffers
  - reloc position in the cmd buffer
  - buffer

The list of reloc buffers is there to supply texture buffers, vertex buffers or other buffers of this kind. In this scheme you can draw in only one context per call; this could be extended, even though I believe it's better that way. I believe such a scheme has already been proposed in the past.

So what do you think about it? I will soon start a sample program (named r300_demo ;)) to test this scheme before doing any driver work and see how it behaves.

This is more or less extending the constant state object idea as described by GL3, Gallium3D and other 3D APIs to multiple applications. The biggest issue I see is that we expect individual applications to create a moderately large number of these states and to rapidly switch between them.
Re: Radeon state handling
Garry Hurley wrote:

Gentlemen, let me see if I understand this properly. You want to create a finite number of states, store them in a list and switch amongst them on a whim, right?

If my understanding is correct, I am wondering how much data you are going to store in these states, and how many states you wish to create. I mean, if you have only twenty states, each containing a few K of data, it doesn't seem like much, but if you want to put a megabyte in each state, that would restrict you to 64MB and greater for graphics cards - not really much of an issue with newer hardware, but there are a few 1MB cards still floating around in Linux boxes. (I know, I have one or two.) Also, having a large number of states to sort through might end up degrading performance if you have to search for the proper state each time, depending on the fetching algorithm.

These concerns are, of course, purely due to my not understanding what information you want to store in the state. I can conceive of a single state which stores all pixel data - hardly useful - all the way down to one that stores only which VGA mode you are rendering the image with. Can you point out where in this scale you are working?

Garry

How the state will be stored is yet to be determined, but the idea is that finding a state with a given id has to be fast, very fast. Each state class will have at most 64 dwords, and I think there will be around 30 different classes, so this isn't much memory at all. I don't expect more than 100 entries per class; if there are more, then this class idea is wasteful and I will go the way of reuploading all state each time. Btw, this will be stored in system memory, as storing it in VRAM would kill performance (the commands are submitted through system memory, though I think we could use VRAM). Right now my target cards are r300, r400 and r500, all of which have plenty of memory.
That said, the infrastructure for supporting older cards is there, and most of the initialization code should work on older cards (down to r100); I haven't tested this yet but will do so when I see the need.

Cheers,
Jerome Glisse
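For a rough feel of the numbers in this message, the worst case can be worked out directly. The sketch below is only an illustration: the constant and function names are made up, and it assumes the 64 dword limit applies to each stored chunk rather than to a whole class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Figures quoted in the message; the names are hypothetical. */
enum {
    STATE_MAX_DWORDS  = 64,   /* at most 64 dwords per stored chunk (assumed) */
    STATE_NUM_CLASSES = 30,   /* around 30 state classes */
    STATE_MAX_ENTRIES = 100   /* no more than about 100 entries per class */
};

/* Upper bound, in bytes, on the payload of a completely full state cache. */
static size_t state_cache_max_bytes(void)
{
    size_t bytes_per_chunk;

    assert(sizeof(uint32_t) == 4);   /* a dword is 32 bits */
    bytes_per_chunk = (size_t)STATE_MAX_DWORDS * sizeof(uint32_t);
    return bytes_per_chunk * STATE_MAX_ENTRIES * STATE_NUM_CLASSES;
}
```

With these figures a chunk is 256 bytes, so a full cache holds 256 * 100 * 30 = 768000 bytes of state, i.e. 750 KiB of system memory before any bookkeeping overhead.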
Re: Radeon state handling
Jerome Glisse wrote:

How the state will be stored is yet to be determined, but the idea is that finding a state with a given id has to be fast, very fast. Each state class will have at most 64 dwords, and I think there will be around 30 different classes, so this isn't much memory at all. I don't expect more than 100 entries per class; if there are more, then this class idea is wasteful and I will go the way of reuploading all state each time. Btw, this will be stored in system memory, as storing it in VRAM would kill performance (the commands are submitted through system memory, though I think we could use VRAM). Right now my target cards are r300, r400 and r500, all of which have plenty of memory. That said, the infrastructure for supporting older cards is there, and most of the initialization code should work on older cards (down to r100); I haven't tested this yet but will do so when I see the need.

I fail to see the benefit of that sort of inter-client optimization too. Or rather, I think you're trying to solve a problem which probably doesn't exist, and thus just adding unnecessary complexity. Some state will always differ between two clients anyway (for instance because all your buffer addresses will be different), and quite a lot of other state changes all the time even within one application (hence you excluded shaders from this scheme yourself, and they are a pretty big chunk already and will be even more so in the future). And if some state really is the same between two clients, maybe that's just because no one has changed it from the default, in which case it might not be relevant at all and you should rather avoid uploading it in the first place (as a simplified example, something like fog color if fog isn't enabled). I'd agree though that the current state management isn't really ideal and the grouping of state is probably not optimal (and state is never reused).
Nothing gallium3D couldn't fix :-).

Roland
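The fog example could look roughly like the sketch below: the driver keeps a shadow copy of what the card currently holds and skips the fog color entirely while fog is disabled. This is only an illustration of the idea; the register tokens, the struct and the function name are invented here and do not correspond to any real Radeon register or driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register tokens, for illustration only. */
enum { REG_FOG_ENABLE = 0x1, REG_FOG_COLOR = 0x2 };

struct fog_state {
    bool     enabled;
    uint32_t color;
};

/*
 * Append to cmd only the fog state that is both changed and relevant.
 * hw is the driver's shadow of what the card currently holds.
 * Returns the number of dwords written (cmd needs room for 4).
 */
static uint32_t emit_fog(struct fog_state *hw, const struct fog_state *want,
                         uint32_t *cmd)
{
    uint32_t n = 0;

    assert(hw != NULL && want != NULL && cmd != NULL);

    if (hw->enabled != want->enabled) {
        cmd[n++] = REG_FOG_ENABLE;
        cmd[n++] = want->enabled ? 1u : 0u;
        hw->enabled = want->enabled;
    }
    /* The color is irrelevant while fog is off, so it is not uploaded. */
    if (want->enabled && hw->color != want->color) {
        cmd[n++] = REG_FOG_COLOR;
        cmd[n++] = want->color;
        hw->color = want->color;
    }
    return n;
}
```

Because the shadow color is left untouched while fog is off, a color set during that time is still emitted once fog gets enabled.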
Radeon state handling
Hi all,

While playing with modesetting ttm I have put some thought into how we send things to the card, and I would like to test the following scheme:

- Split the card state into a bunch of separate chunks (z state, fog state, ...).
- The driver builds the state it wants and registers each state chunk with the drm.
- The drm gives a unique id for this state chunk. The id has two parts: one is the chunk class (z state, fog state, ...), the other is a unique id identifying this particular state within the chunk class. These ids are shared between all drm clients, i.e. if another program registers the exact same state then the drm returns the same id.
- The driver can now use this id through the superioctl by providing a list of state ids.
- The drm keeps a list of the latest state ids uploaded to the card, uploads only the state chunks which differ, and updates its list of state ids.
- A few things won't be part of this state, such as vertex programs or fragment programs. I believe there might be too many different ones for it to be efficient to cache which program was last uploaded, so I think it's better to reupload programs each time (they aren't big anyway).

So what good things do we get with this:

- From user space it's as if the card had contexts :)
- We can save a lot of state reuploading, the assumption being that most programs share most of the state (i.e. if state chunks are well sliced there won't be many different state ids in each state class).
- The drm is the only place where we can have a coherent, up-to-date view of the state currently uploaded on the card.
- It's a lot easier than asking userspace to resend all its state.
- Most of the checking is done at state registration (not a big win, I think).
- In the future we could even schedule requests in the order which triggers the fewest state changes.

There are likely other good things on the bright side...
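To make the registration step concrete, here is a minimal sketch of how a shared registry might hand out ids. Nothing in it is real drm code: the type and function names, the 8 bit / 24 bit split of the id, and the size limits are all assumptions made for illustration, and the lookup is a plain linear search.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All names and limits below are hypothetical. */
#define STATE_MAX_DWORDS   64u
#define STATE_MAX_ENTRIES  100u
#define STATE_ID_INVALID   0xffffffffu

enum state_class { STATE_CLASS_Z, STATE_CLASS_FOG, STATE_CLASS_COUNT };

struct state_chunk {
    uint32_t ndwords;
    uint32_t dwords[STATE_MAX_DWORDS];
};

struct state_registry {
    uint32_t count[STATE_CLASS_COUNT];
    struct state_chunk chunk[STATE_CLASS_COUNT][STATE_MAX_ENTRIES];
};

/* Two-part id: chunk class in the top 8 bits, per-class index below it. */
static uint32_t state_id_make(uint32_t cls, uint32_t index)
{
    return (cls << 24) | (index & 0x00ffffffu);
}

/*
 * Register a chunk and return its id. An identical chunk registered
 * earlier, by any client, yields the same id again.
 */
static uint32_t state_register(struct state_registry *reg, uint32_t cls,
                               const uint32_t *dwords, uint32_t ndwords)
{
    assert(reg != NULL && dwords != NULL);
    if (cls >= STATE_CLASS_COUNT || ndwords == 0 || ndwords > STATE_MAX_DWORDS)
        return STATE_ID_INVALID;

    /* Search all previously registered states of the same class. */
    for (uint32_t i = 0; i < reg->count[cls]; i++) {
        const struct state_chunk *c = &reg->chunk[cls][i];
        if (c->ndwords == ndwords &&
            memcmp(c->dwords, dwords, ndwords * sizeof(uint32_t)) == 0)
            return state_id_make(cls, i);
    }

    if (reg->count[cls] == STATE_MAX_ENTRIES)
        return STATE_ID_INVALID;   /* class is full */

    uint32_t index = reg->count[cls]++;
    struct state_chunk *slot = &reg->chunk[cls][index];
    slot->ndwords = ndwords;
    memcpy(slot->dwords, dwords, ndwords * sizeof(uint32_t));
    return state_id_make(cls, index);
}
```

The linear search is the simplest thing that works; its cost grows with the number of distinct states registered in a class.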
Bad things:

- Backward compatibility, if we want to change how state is sliced or what we accept or not (I think that by designing the interface carefully we can minimize problems in this area).
- If we split state badly then we might end up with too many ids for some state chunks, which will slow down state registration (as this involves searching through all previous states of the same class to see if the state being registered already exists).

Maybe other bad things?

I think we can work around this by putting some time into testing real usage, to see how best to split state and what should be cached as state or reuploaded at each call.

So the superioctl will look like this:

- drm drawable (where we draw, DRI2 world :))
- list of state ids
- cmd buffer (cmd stream with vertex and fragment programs and other state not cached by the above mechanism)
- list of reloc buffers
  - reloc position in the cmd buffer
  - buffer

The list of reloc buffers is there to supply texture buffers, vertex buffers or other buffers of this kind. In this scheme you can draw in only one context per call; this could be extended, even though I believe it's better that way. I believe such a scheme has already been proposed in the past.

So what do you think about it? I will soon start a sample program (named r300_demo ;)) to test this scheme before doing any driver work and see how it behaves.

Cheers,
Jerome Glisse
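The superioctl layout listed in this message might map onto a structure like the one below, together with the "upload only what differs" step. This is a sketch under assumptions, not an actual drm interface: every field and function name is hypothetical, plain pointers stand in for whatever user-pointer representation a real ioctl would need, and state ids are assumed to carry their class in the top 8 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATE_ID_NONE 0xffffffffu   /* nothing uploaded yet for this class */

/* One relocation: where in the cmd buffer a buffer address is patched in. */
struct superioctl_reloc {
    uint32_t cmd_offset;   /* reloc position in the cmd buffer, in dwords */
    uint32_t buffer;       /* handle of the texture/vertex/other buffer */
};

/* Hypothetical argument block following the list in the message. */
struct superioctl_args {
    uint32_t drawable;                       /* drm drawable to draw to */
    const uint32_t *state_ids;               /* list of state ids */
    uint32_t num_state_ids;
    const uint32_t *cmd;                     /* cmd stream, uncached state */
    uint32_t cmd_dwords;
    const struct superioctl_reloc *relocs;   /* list of reloc buffers */
    uint32_t num_relocs;
};

/*
 * Compare the requested ids with the ids last uploaded to the card
 * (current[], one slot per class), record the ones that differ in
 * to_upload[] and update current[]. Returns how many need uploading.
 * to_upload must have room for args->num_state_ids entries.
 */
static uint32_t superioctl_pick_uploads(uint32_t *current, uint32_t num_classes,
                                        const struct superioctl_args *args,
                                        uint32_t *to_upload)
{
    uint32_t n = 0;

    assert(current != NULL && args != NULL && to_upload != NULL);
    for (uint32_t i = 0; i < args->num_state_ids; i++) {
        uint32_t id = args->state_ids[i];
        uint32_t cls = id >> 24;

        if (cls >= num_classes)
            continue;                /* unknown class: ignored in this sketch */
        if (current[cls] != id) {
            current[cls] = id;
            to_upload[n++] = id;
        }
    }
    return n;
}
```

A second submission that repeats most of the ids of the previous one then results in only the changed chunks being uploaded, whichever client sent it.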
Re: Radeon state handling
Keith Whitwell wrote:

Jerome Glisse wrote:

Hi all,

While playing with modesetting ttm I have put some thought into how we send things to the card, and I would like to test the following scheme:

- Split the card state into a bunch of separate chunks (z state, fog state, ...).
- The driver builds the state it wants and registers each state chunk with the drm.
- The drm gives a unique id for this state chunk. The id has two parts: one is the chunk class (z state, fog state, ...), the other is a unique id identifying this particular state within the chunk class. These ids are shared between all drm clients, i.e. if another program registers the exact same state then the drm returns the same id.
- The driver can now use this id through the superioctl by providing a list of state ids.
- The drm keeps a list of the latest state ids uploaded to the card, uploads only the state chunks which differ, and updates its list of state ids.
- A few things won't be part of this state, such as vertex programs or fragment programs. I believe there might be too many different ones for it to be efficient to cache which program was last uploaded, so I think it's better to reupload programs each time (they aren't big anyway).

So what good things do we get with this:

- From user space it's as if the card had contexts :)
- We can save a lot of state reuploading, the assumption being that most programs share most of the state (i.e. if state chunks are well sliced there won't be many different state ids in each state class).
- The drm is the only place where we can have a coherent, up-to-date view of the state currently uploaded on the card.
- It's a lot easier than asking userspace to resend all its state.
- Most of the checking is done at state registration (not a big win, I think).
- In the future we could even schedule requests in the order which triggers the fewest state changes.

There are likely other good things on the bright side...
Bad things:

- Backward compatibility, if we want to change how state is sliced or what we accept or not (I think that by designing the interface carefully we can minimize problems in this area).
- If we split state badly then we might end up with too many ids for some state chunks, which will slow down state registration (as this involves searching through all previous states of the same class to see if the state being registered already exists).

Maybe other bad things?

I think we can work around this by putting some time into testing real usage, to see how best to split state and what should be cached as state or reuploaded at each call.

So the superioctl will look like this:

- drm drawable (where we draw, DRI2 world :))
- list of state ids
- cmd buffer (cmd stream with vertex and fragment programs and other state not cached by the above mechanism)
- list of reloc buffers
  - reloc position in the cmd buffer
  - buffer

The list of reloc buffers is there to supply texture buffers, vertex buffers or other buffers of this kind. In this scheme you can draw in only one context per call; this could be extended, even though I believe it's better that way. I believe such a scheme has already been proposed in the past.

So what do you think about it? I will soon start a sample program (named r300_demo ;)) to test this scheme before doing any driver work and see how it behaves.

This is more or less extending the constant state object idea as described by GL3, Gallium3D and other 3D APIs to multiple applications. The biggest issue I see is that we expect individual applications to create a moderately large number of these states and to rapidly switch between them. The likelihood that two unrelated applications would happen to be sharing any large number of these at context switch time seems fairly low. Also, for current drivers, the number of context switches compared to the number of other state changes is actually pretty low.
I guess I'd have to question whether even a very large speed saving in a fairly rare case (context switch) is going to make a noticeable difference overall. I think that this approach makes sense within a driver context - i.e. as a way to avoid the same app repeatedly emitting the same piece of state, hence the thinking behind constant state objects in GL3 and elsewhere - but it is maybe less exciting for sharing between unrelated contexts.

Keith

The states I am thinking of are a lot smaller and won't have as many possibilities. For instance, in fog I don't want to cache the fog color, and in z I don't want to cache the z clear color. I am only interested in state which has a configuration impact on the card and which might need special treatment, like waiting for the 3D part to go idle or syncing with something. To sum up, none of the state classes I am thinking of should have more than 1000 different possible combinations (and likely a lot fewer). I need to build some of these classes to get an idea of how many different possibilities their