Re: Radeon state handling

2007-11-21 Thread Jerome Glisse
Roland Scheidegger wrote:
 Jerome Glisse wrote:
 How storing state will done is yet to be determined but the idea is that
 finding state with a given id would have to be fast, very fast. Each
 state class will have at much 64dword and i think that there will be
 somethings around 30 differents class so this isn't much memory and all.
 And i don't expect to have more 100 entries per class, at least if i have
 more then this class idea is wastefull and i will go the way of reuploading
 all state each time.

 Btw, this will be stored in system memory as there storing them in vram
 would kill performance (the cmd are submited through system memory though i
 think we can use vram).

 Right now my target cards are r300, r400, r500 so all of them have
 quite enough memory. That being said the infrastructure for supporting
 older card is there and most of initialization code should work on
 older card (down to r100) haven't tested yet but will do when i see
 the need.

 
 I fail to see the benefits of doing that sort of inter-client
 optimization too. Or rather, I think you're trying to solve a problem
 which probably doesn't exist, and thus just add unnecessary complexity.
 There is some state which will be different anyway always between two
 clients (for instance because all your buffer addresses will be
 different), and quite a lot of other state is changing a lot anyway all
 the time even within one application (hence you excluded shaders
 yourself from this scheme, which are a pretty big junk already now and
 even more so in the future) . And, if some state is really the same
 between two clients, maybe it's just because noone has changed it really
 from the default but in this case it might not be relevant at all and
 you should just rather avoid uploading it in the first place (as a
 simplified example, something like fog color if fog isn't enabled).
 I'd agree though that the current state management isn't really ideal,
 the grouping of state probably not optimal (and state is never reused).
 Nothing gallium3D couldn't fix :-).
 
 Roland

You are all certainly right about this: this state scheme would be hard
to do and would only save a few dwords of uploading, which likely wouldn't
kill performance or even affect it. I will go the way of full state
submission, where it's up to the driver to make sure that the card is in
the state it wants for drawing.

Cheers,
Jerome

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: Radeon state handling

2007-11-20 Thread Garry Hurley
Gentlemen, let me see if I understand this properly.  You want to
create a finite number of states, store them in a list and switch
amongst them on a whim, right?  If my understanding is correct, I am
wondering how much data you are going to store in these states, and
how many states you wish to create.  I mean, if you have only twenty
states, each containing a few K of data, it doesn't seem like much,
but if you want to put a megabyte in each state, that would restrict
you to 64MB and greater for graphics cards - not really much of an
issue with newer hardware, but there are a few 1MB cards still
floating around in Linux boxes.  (I know, I have one or two.)  Also,
having a large number of states to sort through might end up degrading
performance if you have to search for the proper state each time,
depending on the fetching algorithm.

These concerns are, of course, purely due to my not understanding what
information you want to store in the state.  I can conceive of a
single state which stores all pixel data - hardly useful - all the way
down to one that stores only which VGA mode you are rendering the
image with.  Can you point out where in this scale you are working?

Garry

On Nov 19, 2007 4:10 PM, Jerome Glisse [EMAIL PROTECTED] wrote:

 Keith Whitwell wrote:
  Jerome Glisse wrote:
  Hi all,
 
  While playing with modesetting  ttm i have put some thought on how
  we send
  things to card. And i would like to test the following scheme:
  -split card state into a bunch of separate chunk (z state, fog state,
  ...)
  -the driver build the state it want and register each state chunk to
  the drm
  -drm give an unique id for this state chunk the id have two part one
  is the
chunk class (z state, fog state, ...) the other is an unique id
  identifying
this particular state id in the chunk class. These are shared btw
  all drm
client ie if another program register exact same stage then drm
  return the
same id
  -driver can no use this id by using superioctl and providing a list
  of state id
  -drm keep a list of lastest state id uploaded to card an upload only
  state chunk
which differ and update its list of state id
  -few things won't be in this state things like vertex program or
  fragment program
(i believe that there might be too much different of them that this
  won't be
 efficient to cache which program was lastly uploaded; so i think
  its better to
 reupload program each time (they ain't big anyway).
 
  So what good things do we got with this:
  - from user space its like the card have context :)
  - we can save a lot of state reuploading the assumption being that most
 program share most of the state (ie if state chunk are well sliced
  there won't
 be many different state id in each state class).
  - drm is the only place where we can have a coherent  up to date
  view of
 current state uploaded on card
  - its lot easier  than asking for the userspace to resend all its state
  - most of the checking is done at state registration (ain't big win i
  think)
  - in the future we can even schedule request in the order which will
  trigger
 the less number of state change
 
  There is likely others good things on the bright side...
 
  Bad things:
  - backward compat if we want to change how state are sliced or what
  we accept
 or not (i think by cleverly thinking the interface we might
  minimize problems
 in this area)
  - if we badly split state than we might end up having to much id in
  for some
 state chunk which will slow down state registration (as this
  involve searching
 to all previous state of same class see if the states registered
  already exist)
 
  Maybe others bad things ? I think we can work around this by putting
  some time
  into real test usage of this to see how best we can split state  and
  what might
  be cached by state or reuploaded at each call.
 
  So the superioctl will looks like this:
  - drm drawable (where we draw dri 2 world :))
  - list of state id
  - cmd buffer (cmd stream with vert, frag prog  other state not
  cached by the above
 mechanism)
  - list of reloc buffer
  -reloc pos into the cmd buffer
  -buffer
 
  The list of reloc buffer will be there to supply texture buffer,
  vertex buffer or
  others buffer of this kind. In this scheme you can draw only in one
  context by call
  but this could be extended even though i believe its better that way.
 
  I believe such scheme were already proposed in the past. So what do
  you think
  about it ? I will start soon a sample program (named r300_demo ;)) to
  test this scheme
  before doing any driver works and see how it behave.
 
  This is more or less extending the constant state object idea as
  described by GL3, Gallium3D and other 3D APIs to multiple applications.
 
  The biggest issue I see is that we expect individual applications to
  create a moderately large number of these states and to rapidly switch
  between them.

Re: Radeon state handling

2007-11-20 Thread Jerome Glisse
Garry Hurley wrote:
 Gentlemen, let me see if I understand this properly.  You want to
 create a finite number of states, store them in a list and switch
 amongst them on a whim, right?  If my understanding is correct, I am
 wondering how much data you are going to store in these states, and
 how many states you wish to create.  I mean, if you have only twenty
 states, each containing a few K of data, it doesn't seem like much,
 but if you want to put a megabyte in each state, that would restrict
 you to 64MB and greater for graphics cards - not really much of an
 issue with newer hardware, but there are a few 1MB cards still
 floating around in Linux boxes.  (I know, I have one or two.)  Also,
 having a large number of states to sort through might end up degrading
 performance if you have to search for the proper state each time,
 depending on the fetching algorithm.
 
 These concerns are, of course, purely due to my not understanding what
 information you want to store in the state.  I can conceive of a
 single state which stores all pixel data - hardly useful - all the way
 down to one that stores only which VGA mode you are rendering the
 image with.  Can you point out where in this scale you are working?
 
 Garry
 

How the state will be stored is yet to be determined, but the idea is that
finding a state with a given id has to be fast, very fast. Each
state class will hold at most 64 dwords and I think there will be
around 30 different classes, so this isn't much memory at all.
And I don't expect to have more than 100 entries per class; if I end up with
more, then this class idea is wasteful and I will go the way of reuploading
all state each time.
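
As a quick check of the "isn't much memory" estimate, here is a small sketch that multiplies out the figures quoted above (30 classes, 100 entries per class, 64 dwords per entry). The helper name state_cache_bytes is made up for illustration, and the result ignores any bookkeeping overhead a real cache would add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Upper bound on the raw payload of the state cache, in bytes.
 * A dword is 4 bytes (sizeof(uint32_t)). Hypothetical helper, only
 * meant to make the arithmetic from the mail explicit.
 */
size_t state_cache_bytes(size_t classes, size_t entries, size_t dwords)
{
    assert(classes > 0 && entries > 0 && dwords > 0);
    return classes * entries * dwords * sizeof(uint32_t);
}
```

With the numbers from the mail, state_cache_bytes(30, 100, 64) gives 768000 bytes, i.e. 750 KiB of system memory in the worst case.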

Btw, this will be stored in system memory, as storing it in vram
would kill performance (the cmds are submitted through system memory, though I
think we can use vram).

Right now my target cards are r300, r400 and r500, so all of them have
quite enough memory. That being said, the infrastructure for supporting
older cards is there and most of the initialization code should work on
older cards (down to r100). I haven't tested that yet but will do so when I
see the need.

Cheers,
Jerome Glisse



Re: Radeon state handling

2007-11-20 Thread Roland Scheidegger
Jerome Glisse wrote:
 How storing state will done is yet to be determined but the idea is that
 finding state with a given id would have to be fast, very fast. Each
 state class will have at much 64dword and i think that there will be
 somethings around 30 differents class so this isn't much memory and all.
 And i don't expect to have more 100 entries per class, at least if i have
 more then this class idea is wastefull and i will go the way of reuploading
 all state each time.
 
 Btw, this will be stored in system memory as there storing them in vram
 would kill performance (the cmd are submited through system memory though i
 think we can use vram).
 
 Right now my target cards are r300, r400, r500 so all of them have
 quite enough memory. That being said the infrastructure for supporting
 older card is there and most of initialization code should work on
 older card (down to r100) haven't tested yet but will do when i see
 the need.
 

I fail to see the benefits of doing that sort of inter-client
optimization too. Or rather, I think you're trying to solve a problem
which probably doesn't exist, and thus just add unnecessary complexity.
There is some state which will be different anyway always between two
clients (for instance because all your buffer addresses will be
different), and quite a lot of other state is changing a lot anyway all
the time even within one application (hence you excluded shaders
yourself from this scheme, which are a pretty big chunk already now and
even more so in the future). And, if some state is really the same
between two clients, maybe it's just because no one has really changed it
from the default, but in this case it might not be relevant at all and
you should rather just avoid uploading it in the first place (as a
simplified example, something like fog color if fog isn't enabled).
I'd agree though that the current state management isn't really ideal,
the grouping of state probably not optimal (and state is never reused).
Nothing gallium3D couldn't fix :-).

Roland



Radeon state handling

2007-11-19 Thread Jerome Glisse
Hi all,

While playing with modesetting and ttm I have put some thought into how we send
things to the card, and I would like to test the following scheme:
- split the card state into a bunch of separate chunks (z state, fog state, ...)
- the driver builds the state it wants and registers each state chunk with the
  drm
- the drm gives a unique id to each state chunk. The id has two parts: one is
  the chunk class (z state, fog state, ...), the other is a unique id
  identifying this particular state within the chunk class. Ids are shared
  between all drm clients, i.e. if another program registers the exact same
  state then the drm returns the same id
- the driver can now use these ids through the superioctl by providing a list
  of state ids
- the drm keeps a list of the latest state ids uploaded to the card, uploads
  only the state chunks which differ, and updates its list of state ids
- a few things won't be part of this state, like vertex programs or fragment
  programs (I believe there might be too many different ones for it to be
  efficient to cache which program was last uploaded, so I think it's better
  to reupload programs each time; they aren't big anyway).
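
The registration step in the list above could be sketched roughly as follows. This is an illustration only, not code from any driver: every name here (state_registry, state_register, the STATE_ID macros) is hypothetical, the 16/16 bit split of the id is an assumption, and the size limits are placeholder values. Lookup is a plain linear scan with memcmp, which is the simplest thing that deduplicates identical chunks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder limits, chosen only for this sketch. */
#define STATE_MAX_DWORDS   64
#define STATE_MAX_CLASSES  30
#define STATE_MAX_ENTRIES  100

/* Assumed id layout: chunk class in the high 16 bits, index in the low 16. */
#define STATE_ID(cls, idx)  (((uint32_t)(cls) << 16) | (uint32_t)(idx))
#define STATE_ID_CLASS(id)  ((uint32_t)(id) >> 16)
#define STATE_ID_INDEX(id)  ((uint32_t)(id) & 0xffffu)
#define STATE_ID_INVALID    0xffffffffu

struct state_chunk {
    uint32_t ndw;                       /* number of valid dwords */
    uint32_t dw[STATE_MAX_DWORDS];      /* register values for this chunk */
};

struct state_class {
    uint32_t count;                     /* entries registered so far */
    struct state_chunk entries[STATE_MAX_ENTRIES];
};

struct state_registry {
    struct state_class classes[STATE_MAX_CLASSES];
};

/*
 * Register a chunk and return its id. If an identical chunk already
 * exists in the same class, its existing id is returned, so two clients
 * registering the same state share one id.
 */
uint32_t state_register(struct state_registry *reg, uint32_t cls,
                        const uint32_t *dw, uint32_t ndw)
{
    struct state_class *c;
    uint32_t i;

    assert(reg != NULL && dw != NULL);
    if (cls >= STATE_MAX_CLASSES || ndw == 0 || ndw > STATE_MAX_DWORDS)
        return STATE_ID_INVALID;

    c = &reg->classes[cls];
    for (i = 0; i < c->count; i++) {
        if (c->entries[i].ndw == ndw &&
            memcmp(c->entries[i].dw, dw, ndw * sizeof(*dw)) == 0)
            return STATE_ID(cls, i);    /* already known: reuse the id */
    }

    if (c->count == STATE_MAX_ENTRIES)
        return STATE_ID_INVALID;        /* class is full */

    i = c->count++;
    c->entries[i].ndw = ndw;
    memcpy(c->entries[i].dw, dw, ndw * sizeof(*dw));
    return STATE_ID(cls, i);
}
```

Registering the same dwords twice in one class returns the same id both times, while the same dwords in a different class get a different id because the class is part of the id. The linear scan makes registration cost grow with the number of entries per class; a hash of the chunk contents would be the obvious alternative.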

So what good things do we get with this:
- from user space it's as if the card has contexts :)
- we can save a lot of state reuploading, the assumption being that most
   programs share most of the state (i.e. if state chunks are well sliced there
   won't be many different state ids in each state class)
- the drm is the only place where we can have a coherent and up-to-date view of
   the current state uploaded to the card
- it's a lot easier than asking userspace to resend all its state
- most of the checking is done at state registration (not a big win, I think)
- in the future we could even schedule requests in the order which triggers
   the fewest state changes
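
The "upload only what differs" part of the scheme amounts to keeping a shadow copy of the ids currently on the card and comparing each request against it. A minimal sketch, assuming one id slot per state class; the names hw_shadow, hw_shadow_diff and ID_NONE and the class count of 30 are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CLASSES 30           /* assumed number of state classes */
#define ID_NONE     0xffffffffu  /* "nothing uploaded" or "don't care" */

/* Shadow of what the card currently holds: one state id per class. */
struct hw_shadow {
    uint32_t current[NUM_CLASSES];
};

/* Mark every class as unknown, e.g. after a reset. */
void hw_shadow_init(struct hw_shadow *hw)
{
    size_t cls;

    assert(hw != NULL);
    for (cls = 0; cls < NUM_CLASSES; cls++)
        hw->current[cls] = ID_NONE;
}

/*
 * Compare the ids a client wants against the shadow. Classes whose id
 * differs are written to to_upload[] and the shadow is updated, as if
 * the upload had been emitted. Returns how many classes need uploading.
 */
size_t hw_shadow_diff(struct hw_shadow *hw,
                      const uint32_t wanted[NUM_CLASSES],
                      uint32_t to_upload[NUM_CLASSES])
{
    size_t cls, n = 0;

    assert(hw != NULL && wanted != NULL && to_upload != NULL);
    for (cls = 0; cls < NUM_CLASSES; cls++) {
        if (wanted[cls] == ID_NONE || wanted[cls] == hw->current[cls])
            continue;                   /* unchanged: skip the upload */
        to_upload[n++] = (uint32_t)cls;
        hw->current[cls] = wanted[cls];
    }
    return n;
}
```

Two clients submitting the same id list back to back cost one upload pass and then nothing, which is the saving the scheme is counting on. How often that happens in practice is exactly what the test program is meant to find out.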

There are likely other good things on the bright side...

Bad things:
- backward compat if we want to change how state is sliced or what we accept
   or not (I think by carefully designing the interface we might minimize
   problems in this area)
- if we split state badly then we might end up with too many ids in some
   state class, which will slow down state registration (as this involves
   searching through all previous states of the same class to see if the
   state being registered already exists)

Maybe other bad things? I think we can work around this by putting some time
into real test usage of this, to see how best we can split state and what might
be cached as state or reuploaded at each call.

So the superioctl will look like this:
- drm drawable (where we draw, dri 2 world :))
- list of state ids
- cmd buffer (cmd stream with vert and frag progs and other state not cached
   by the above mechanism)
- list of reloc buffers
   - reloc pos into the cmd buffer
   - buffer
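
To make the argument layout above concrete, here is one way it could be written down as a C structure. This is only a sketch: none of these names (demo_superioctl, demo_reloc, demo_superioctl_check) exist in the drm, the field widths are guesses, and plain pointers are used for brevity where a real ioctl would more likely carry user addresses as 64-bit integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One relocation: where in the cmd buffer to patch, and with which buffer. */
struct demo_reloc {
    uint32_t cmd_offset_dw;   /* reloc pos into the cmd buffer, in dwords */
    uint32_t buffer_handle;   /* texture, vertex or similar buffer */
};

/* Hypothetical argument block mirroring the list above. */
struct demo_superioctl {
    uint32_t drawable;                 /* drm drawable to render to */
    uint32_t num_state_ids;
    const uint32_t *state_ids;         /* ids from state registration */
    uint32_t cmd_ndw;
    const uint32_t *cmd;               /* shaders and other uncached state */
    uint32_t num_relocs;
    const struct demo_reloc *relocs;
};

/*
 * Minimal sanity check a kernel side might start with: the cmd buffer
 * must be non-empty and every reloc must point inside it.
 * Returns 0 if the arguments look consistent, -1 otherwise.
 */
int demo_superioctl_check(const struct demo_superioctl *args)
{
    uint32_t i;

    assert(args != NULL);
    if (args->cmd == NULL || args->cmd_ndw == 0)
        return -1;
    if (args->num_state_ids > 0 && args->state_ids == NULL)
        return -1;
    if (args->num_relocs > 0 && args->relocs == NULL)
        return -1;
    for (i = 0; i < args->num_relocs; i++) {
        if (args->relocs[i].cmd_offset_dw >= args->cmd_ndw)
            return -1;                 /* reloc outside the cmd buffer */
    }
    return 0;
}
```

A call with a 4-dword cmd buffer and a reloc at dword 1 passes the check, while a reloc at dword 4 is rejected because it falls past the end of the buffer.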

The list of reloc buffers is there to supply texture buffers, vertex buffers
or other buffers of this kind. In this scheme you can draw in only one context
per call, but this could be extended, even though I believe it's better that
way.

I believe such a scheme was already proposed in the past. So what do you think
about it? I will soon start a sample program (named r300_demo ;)) to test this
scheme before doing any driver work and see how it behaves.

Cheers,
Jerome Glisse



Re: Radeon state handling

2007-11-19 Thread Jerome Glisse
Keith Whitwell wrote:
 Jerome Glisse wrote:
 Hi all,

 While playing with modesetting  ttm i have put some thought on how 
 we send
 things to card. And i would like to test the following scheme:
 -split card state into a bunch of separate chunk (z state, fog state, 
 ...)
 -the driver build the state it want and register each state chunk to 
 the drm
 -drm give an unique id for this state chunk the id have two part one 
 is the
   chunk class (z state, fog state, ...) the other is an unique id 
 identifying
   this particular state id in the chunk class. These are shared btw 
 all drm
   client ie if another program register exact same stage then drm 
 return the
   same id
 -driver can no use this id by using superioctl and providing a list 
 of state id
 -drm keep a list of lastest state id uploaded to card an upload only 
 state chunk
   which differ and update its list of state id
 -few things won't be in this state things like vertex program or 
 fragment program
   (i believe that there might be too much different of them that this 
 won't be
efficient to cache which program was lastly uploaded; so i think 
 its better to
reupload program each time (they ain't big anyway).

 So what good things do we got with this:
 - from user space its like the card have context :)
 - we can save a lot of state reuploading the assumption being that most
program share most of the state (ie if state chunk are well sliced 
 there won't
be many different state id in each state class).
 - drm is the only place where we can have a coherent  up to date 
 view of
current state uploaded on card
 - its lot easier  than asking for the userspace to resend all its state
 - most of the checking is done at state registration (ain't big win i 
 think)
 - in the future we can even schedule request in the order which will 
 trigger
the less number of state change

 There is likely others good things on the bright side...

 Bad things:
 - backward compat if we want to change how state are sliced or what 
 we accept
or not (i think by cleverly thinking the interface we might 
 minimize problems
in this area)
 - if we badly split state than we might end up having to much id in 
 for some
state chunk which will slow down state registration (as this 
 involve searching
to all previous state of same class see if the states registered 
 already exist)

 Maybe others bad things ? I think we can work around this by putting 
 some time
 into real test usage of this to see how best we can split state  and 
 what might
 be cached by state or reuploaded at each call.

 So the superioctl will looks like this:
 - drm drawable (where we draw dri 2 world :))
 - list of state id
 - cmd buffer (cmd stream with vert, frag prog  other state not 
 cached by the above
mechanism)
 - list of reloc buffer
 -reloc pos into the cmd buffer
 -buffer

 The list of reloc buffer will be there to supply texture buffer, 
 vertex buffer or
 others buffer of this kind. In this scheme you can draw only in one 
 context by call
 but this could be extended even though i believe its better that way.

 I believe such scheme were already proposed in the past. So what do 
 you think
 about it ? I will start soon a sample program (named r300_demo ;)) to 
 test this scheme
 before doing any driver works and see how it behave.

 This is more or less extending the constant state object idea as 
 described by GL3, Gallium3D and other 3D APIs to multiple applications.

 The biggest issue I see is that we expect individual applications to 
 create a moderately large number of these states and to rapidly switch 
 between them.

 The likelyhood that two unrelated applications would happen to be 
 sharing any large number of these at context switch time seems fairly 
 low.

 Also, for current drivers, the number of context switches compared to 
 the number of other state changes is actually pretty low.  I guess I'd 
 have to question whether even a very large speed savings in a fairly 
 rare case (context switch) is going to make a noticable difference 
 overall.

 I think that this approach makes sense within a driver context - ie as 
 a way to avoid the same app repeatedly emitting the same piece of 
 state, hence the thinking behind constant state objects in GL3 and 
 elsewhere, but maybe less exciting for sharing between unrelated 
 contexts.

 Keith

The states I am thinking of are a lot smaller and won't have as many
possibilities. For instance, in fog I don't want to cache the fog color, and
in z I don't want to cache the z clear color. I am only interested in caching
state which has a configuration impact on the card and which might need
special treatment, like waiting for the 3d part to go idle or syncing with
something.

To sum up, none of the state classes I am thinking of should have more than
1000 different possible combinations (and likely a lot fewer). I need to
build some of these classes to get an idea of how many different possibilities
their