Re: [Mesa-dev] vertex array regression
Hi, On Sunday, December 25, 2011 10:03:34 Marek Olšák wrote: > I did not fix Torcs. I think optimizing apps is useless if the same > optimization can be done in Mesa too. Ok, so any torcs version should show a performance difference to check against. > > Do you have any test program that could be used to see the impact of this > > kind of change? > > There's a piglit test called draw-batch, which I was using to monitor > what Mesa was doing. > > > Also your comment mentions that the _Tnl program already cares for > > invalidating the required state. Do you remember where this happend? May > > be we could utilize this place to trigger state invalidation then? > > Sorry, I no longer remember the details. Thanks anyway. Mathias ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] vertex array regression
2011/12/20 Mathias Fröhlich : > Ok, back to topic, so have I understood right? You have fixed torcs not to use > the slow path anymore? Which old version is the one still having this issue? I did not fix Torcs. I think optimizing apps is useless if the same optimization can be done in Mesa too. > Do you have any test program that could be used to see the impact of this kind > of change? There's a piglit test called draw-batch, which I was using to monitor what Mesa was doing. > > Also your comment mentions that the _Tnl program already cares for > invalidating the required state. Do you remember where this happend? May be we > could utilize this place to trigger state invalidation then? Sorry, I no longer remember the details. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] vertex array regression
On 12/19/2011 01:12 PM, Mathias Fröhlich wrote: Brian, On Monday, December 19, 2011 17:01:07 you wrote: I still don't get what the purpose of the special mapping of the last four/five elements is all about. Can you explain that? As far as I can see the last ones should not map to anything. The map is used only in fixed function mode. And there these arrays should not show up as inputs to any fixed function shader. I think that the purpose of the previous mapping was just to fill in anything. The changes I did serve exactly this purpose. But I agree that here should not be any mapping of these attributes in this mode. Attached is a small change that is probably a more clean solution to this mapping problem. The change passes a quick r600g piglit test. Yeah, I see how the incorrect varying_inputs values was causing _NEW_ARRAY to get set, thus hiding the problem. The solution looks correct, but I'm a little worried about performance, as is mentioned in the comment. Marek, made that change back in March. Maybe he can take a look at this too. I cannot comment on this, since I do not see a performance regression on my use cases. Marek? I dug a bit deeper into this. When the isosurface corruption is visible it's basically because the glVertex and glNormal values are transposed. With glBegin/End we're sending interleaved (normal3f, vertex3f) values. But with glDrawArrays we're sending interleaved (vertex3f, normal3f) values. If we don't signal _NEW_ARRAY when switching from one method to the other, the state tracker is skipping array validation and we're drawing with normals and positions transposed. The glBegin/End code is building vertex arrays but never directly sets _NEW_ARRAY as a side-effect. I've implemented a solution to this that simply sets _NEW_ARRAY whenever we transition between glBegin/End and glDrawArrays drawing. This fixes the isosurf bug and removes all occurances of NewState |= _NEW_ARRAY in the recalculate_input_bindings() function to avoid a performance regression. I think this fixes the regression while preserving Marek's performance fix. I'll post a patch. -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] vertex array regression
Marek, On Tuesday, December 20, 2011 01:40:46 Marek Olšák wrote: > The problem back in March and earlier was that the vertex array state > was completely recomputed for each draw call even if the user did not > change any state at all! There was the validation in the vbo module > and then in st/mesa, which were basically needless. It turned out to > be the major bottleneck in the game Torcs (in the track called Forza). > I checked the source code of Torcs and it confirmed my findings. There > was a loop that looked like this: > > for (i = 0; i < very large number; i++) glDrawElements(...); > > From that point, it became clear that Mesa is underperforming and that > it's fixable. Then I came up with a set of patches which did not > address the issue completely, but some cases (like with Torcs) were > optimized (well maybe not cleanly as it caused a few regressions too). > There still is a lot of cases where the vertex array state is > recomputed needlessly, so it still is an open issue. I did not fully > understand how the entire vbo module works and why so much state > validation was being done there. Even today, drivers have almost no > way to know in draw_prims whether the vertex array state is _really_ > changed, because core Mesa almost always says "yes, it changed of > course! why wouldn't it?" Sadly, performance is usually being put > aside and energy is put into more important stuff like new features. I > was profiling Mesa quite extensively because there was simply nothing > better to do for r300g. I don't expect that's the case with the other > drivers. > > Back on topic. The reason why you don't see a performance regression may > be: 1) The vertex array state management is not the bottleneck. > 2) You already hit the slowest path, that is recomputing the state > completely in every draw call. Well, in general, to me performance *is* a feature. At least for this kind of software like an OpenGL stack. But correctness is not just a feature, correctness is a hard requirement. So I believe I am all with you... Let me explain, the reason why I am starting work on the vertex arrays is that I think this is one of the major bottlenecks in the draw path that we have today. I think that the array object should behave more like a state that could be used to do just the required things to be ready for the draw code path. The first step towards this was to make the array object an array of client arrays. Then start exploiting more of bitmask/ffs driven loops to reduce the overhead of iterating over all client arrays on almost every draw. Being able to accumulate changed arrays in this kind of flag values is something that could be done now. I have a proof of concept implementation sitting around here that updates the traditional client array pointers used in vbo_draw in a lazy way based on an enabled bitmasks and iterating only on the changed values instead of all. Also we could really derive from the generic array object implementation for the old style vbo_draw and an other implementation for a gallium driver path where we are really able to track changes in the array cso thats currently unconditional application to the gallium state could then probably be pulled out of the draw path for the gallium drivers, which I expect a major win for plenty of use cases... So far to what could be done. But at least to me, experience shows that I better do this kind of stuff, which is on the first front to the OpenGL api specs and also used in so different drivers in so different ways, with very small increments. So this current increment is trying to get back to a stable state past the array change. Ok, back to topic, so have I understood right? You have fixed torcs not to use the slow path anymore? Which old version is the one still having this issue? Do you have any test program that could be used to see the impact of this kind of change? Also your comment mentions that the _Tnl program already cares for invalidating the required state. Do you remember where this happend? May be we could utilize this place to trigger state invalidation then? Thanks Mathias ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] vertex array regression
2011/12/19 Mathias Fröhlich : > > Brian, > > On Monday, December 19, 2011 17:01:07 you wrote: >> I still don't get what the purpose of the special mapping of the last >> four/five elements is all about. Can you explain that? > > As far as I can see the last ones should not map to anything. > The map is used only in fixed function mode. And there these arrays should not > show up as inputs to any fixed function shader. I think that the purpose of > the > previous mapping was just to fill in anything. The changes I did serve exactly > this purpose. > > But I agree that here should not be any mapping of these attributes in this > mode. Attached is a small change that is probably a more clean solution to > this mapping problem. > > The change passes a quick r600g piglit test. > >> Yeah, I see how the incorrect varying_inputs values was causing >> _NEW_ARRAY to get set, thus hiding the problem. The solution looks >> correct, but I'm a little worried about performance, as is mentioned >> in the comment. >> >> Marek, made that change back in March. Maybe he can take a look at >> this too. > I cannot comment on this, since I do not see a performance regression on my > use cases. > Marek? The problem back in March and earlier was that the vertex array state was completely recomputed for each draw call even if the user did not change any state at all! There was the validation in the vbo module and then in st/mesa, which were basically needless. It turned out to be the major bottleneck in the game Torcs (in the track called Forza). I checked the source code of Torcs and it confirmed my findings. There was a loop that looked like this: for (i = 0; i < very large number; i++) glDrawElements(...); From that point, it became clear that Mesa is underperforming and that it's fixable. Then I came up with a set of patches which did not address the issue completely, but some cases (like with Torcs) were optimized (well maybe not cleanly as it caused a few regressions too). There still is a lot of cases where the vertex array state is recomputed needlessly, so it still is an open issue. I did not fully understand how the entire vbo module works and why so much state validation was being done there. Even today, drivers have almost no way to know in draw_prims whether the vertex array state is _really_ changed, because core Mesa almost always says "yes, it changed of course! why wouldn't it?" Sadly, performance is usually being put aside and energy is put into more important stuff like new features. I was profiling Mesa quite extensively because there was simply nothing better to do for r300g. I don't expect that's the case with the other drivers. Back on topic. The reason why you don't see a performance regression may be: 1) The vertex array state management is not the bottleneck. 2) You already hit the slowest path, that is recomputing the state completely in every draw call. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] vertex array regression
Brian, On Monday, December 19, 2011 17:01:07 you wrote: > I still don't get what the purpose of the special mapping of the last > four/five elements is all about. Can you explain that? As far as I can see the last ones should not map to anything. The map is used only in fixed function mode. And there these arrays should not show up as inputs to any fixed function shader. I think that the purpose of the previous mapping was just to fill in anything. The changes I did serve exactly this purpose. But I agree that here should not be any mapping of these attributes in this mode. Attached is a small change that is probably a more clean solution to this mapping problem. The change passes a quick r600g piglit test. > Yeah, I see how the incorrect varying_inputs values was causing > _NEW_ARRAY to get set, thus hiding the problem. The solution looks > correct, but I'm a little worried about performance, as is mentioned > in the comment. > > Marek, made that change back in March. Maybe he can take a look at > this too. I cannot comment on this, since I do not see a performance regression on my use cases. Marek? Greetings Mathias From 67f3e817dfd6cc0e937e95c9fc8b671da9bf406c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mathias=20Fr=C3=B6hlich?= Date: Mon, 19 Dec 2011 20:53:51 +0100 Subject: [PATCH] vbo: Set unused arrays in fixed function mode to invalid. The vbo module uses index maps to map VBO arrays into VERT_ATTRIB* arrays. The fixed function map contains some entries that are basically unused and are curretly mapped to any array. This change will explicitly map these unused arrays to an invalid array index which is then filtered out. Signed-off-by: Mathias Froehlich --- src/mesa/vbo/vbo_context.c |2 +- src/mesa/vbo/vbo_exec_draw.c |2 ++ src/mesa/vbo/vbo_save_draw.c |2 ++ 3 files changed, 5 insertions(+), 1 deletions(-) diff --git a/src/mesa/vbo/vbo_context.c b/src/mesa/vbo/vbo_context.c index b2e6bbc..2542c5d 100644 --- a/src/mesa/vbo/vbo_context.c +++ b/src/mesa/vbo/vbo_context.c @@ -185,7 +185,7 @@ GLboolean _vbo_CreateContext( struct gl_context *ctx ) vbo->map_vp_none[VERT_ATTRIB_GENERIC(i)] = VBO_ATTRIB_MAT_FRONT_AMBIENT + i; for (i = NR_MAT_ATTRIBS; i < VERT_ATTRIB_GENERIC_MAX; i++) - vbo->map_vp_none[VERT_ATTRIB_GENERIC(i)] = i; + vbo->map_vp_none[VERT_ATTRIB_GENERIC(i)] = ~((GLuint)0); for (i = 0; i < Elements(vbo->map_vp_arb); i++) vbo->map_vp_arb[i] = i; diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c index dd5363b..ac3ada3 100644 --- a/src/mesa/vbo/vbo_exec_draw.c +++ b/src/mesa/vbo/vbo_exec_draw.c @@ -218,6 +218,8 @@ vbo_exec_bind_arrays( struct gl_context *ctx ) */ for (attr = 0; attr < VERT_ATTRIB_MAX ; attr++) { const GLuint src = map[attr]; + if (VBO_ATTRIB_MAX <= src) + continue; if (exec->vtx.attrsz[src]) { GLsizeiptr offset = (GLbyte *)exec->vtx.attrptr[src] - diff --git a/src/mesa/vbo/vbo_save_draw.c b/src/mesa/vbo/vbo_save_draw.c index fa93ca4..8e84c2e 100644 --- a/src/mesa/vbo/vbo_save_draw.c +++ b/src/mesa/vbo/vbo_save_draw.c @@ -185,6 +185,8 @@ static void vbo_bind_vertex_list(struct gl_context *ctx, for (attr = 0; attr < VERT_ATTRIB_MAX; attr++) { const GLuint src = map[attr]; + if (VBO_ATTRIB_MAX <= src) + continue; if (node_attrsz[src]) { /* override the default array set above */ -- 1.7.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] vertex array regression
On 12/16/2011 11:28 AM, Mathias Fröhlich wrote: Brian, On Thursday, December 15, 2011 22:36:24 you wrote: I found the problem. It's this chunk in vbo_context.c: [...] For fixed function, the point is to simply place the per-vertex material attributes in the generic attribute arrays. There are 12 such material attributes. So there's four slots left over. Yep, and these map with the old scheme to often used attributes like the vertex position. I still don't get what the purpose of the special mapping of the last four/five elements is all about. Can you explain that? Thanks to your hints I believe that I found the underlying problem: If you look at the resulting bitmasks for the enabled vertex program inputs in isosurf by commenting out the printf in _mesa_set_varying_vp_inputs you will see surprising results with the old numbering. The current checked in numbering looks much more plausible. It turns out that this wrong varying_vp_inputs mask sets the _NEW_ARRAY bit through _mesa_set_varying_vp_inputs. That in turn cares for some (by now in my debug session untracked) state updates. This helps in the end for isosurf. If the varying_vp_inputs mask looks plausible the _NEW_ARRAY bit is not set and isosurf fails. This also explaines that only draw paths going through vbo_exec_array.c are affected, since the imm variants in vbo_exec_{safe,draw} always set the _NEW_ARRAY bit on any draw. Yeah, I see how the incorrect varying_inputs values was causing _NEW_ARRAY to get set, thus hiding the problem. The solution looks correct, but I'm a little worried about performance, as is mentioned in the comment. Marek, made that change back in March. Maybe he can take a look at this too. So for me this change diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c index a6e41e9..a2851c4 100644 --- a/src/mesa/vbo/vbo_exec_array.c +++ b/src/mesa/vbo/vbo_exec_array.c @@ -446,7 +446,7 @@ recalculate_input_bindings(struct gl_context *ctx) * to revalidate vertex arrays. Not marking the state as dirty also * improves performance (quite significantly in some apps). */ - if (!ctx->VertexProgram._MaintainTnlProgram) + /* if (!ctx->VertexProgram._MaintainTnlProgram) */ ctx->NewState |= _NEW_ARRAY; break; makes isosurf work reliable. I will prepare a patch for that. Greetings Mathias -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] vertex array regression
Brian, On Thursday, December 15, 2011 22:36:24 you wrote: > I found the problem. It's this chunk in vbo_context.c: [...] > For fixed function, the point is to simply place the per-vertex > material attributes in the generic attribute arrays. There are 12 > such material attributes. So there's four slots left over. Yep, and these map with the old scheme to often used attributes like the vertex position. Thanks to your hints I believe that I found the underlying problem: If you look at the resulting bitmasks for the enabled vertex program inputs in isosurf by commenting out the printf in _mesa_set_varying_vp_inputs you will see surprising results with the old numbering. The current checked in numbering looks much more plausible. It turns out that this wrong varying_vp_inputs mask sets the _NEW_ARRAY bit through _mesa_set_varying_vp_inputs. That in turn cares for some (by now in my debug session untracked) state updates. This helps in the end for isosurf. If the varying_vp_inputs mask looks plausible the _NEW_ARRAY bit is not set and isosurf fails. This also explaines that only draw paths going through vbo_exec_array.c are affected, since the imm variants in vbo_exec_{safe,draw} always set the _NEW_ARRAY bit on any draw. So for me this change diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c index a6e41e9..a2851c4 100644 --- a/src/mesa/vbo/vbo_exec_array.c +++ b/src/mesa/vbo/vbo_exec_array.c @@ -446,7 +446,7 @@ recalculate_input_bindings(struct gl_context *ctx) * to revalidate vertex arrays. Not marking the state as dirty also * improves performance (quite significantly in some apps). */ - if (!ctx->VertexProgram._MaintainTnlProgram) + /* if (!ctx->VertexProgram._MaintainTnlProgram) */ ctx->NewState |= _NEW_ARRAY; break; makes isosurf work reliable. I will prepare a patch for that. Greetings Mathias ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] vertex array regression
2011/12/15 Mathias Fröhlich : > > Brian, > > On Thursday, December 15, 2011 17:08:57 Brian Paul wrote: >> There's a regression in vertex array drawing with this commit: >> >> commit ed42c2580717527b2005580940fc766d95bb6b0b >> Author: Mathias Fröhlich >> Date: Mon Oct 31 16:23:40 2011 +0100 >> >> vbo: Use The VERT_{ATTRIB,BIT} defines. >> >> Signed-off-by: Mathias Froehlich >> Reviewed-by: Brian Paul >> Reviewed-by: Eric Anholt >> >> >> To see the problem, run mesa/demos/src/demos/isosurf, choose >> "glDrawArrays" or "glDrawElements" from the pop-up menu (right mouse >> button). I see the problem (random/missing vertices or failed >> assertion) with all gallium drivers. The swrast/i965 drivers seem >> uneffected. >> >> I'll try to debug it further, but maybe you could double-check your work. > > I will look into that. Probably not today but latest at the weekend. I found the problem. It's this chunk in vbo_context.c: @@ -182,14 +177,15 @@ GLboolean _vbo_CreateContext( struct gl_context *ctx ) GLuint i; /* When no vertex program, pull in the material attributes in - * the 16..32 generic range. + * the generic range. */ - for (i = 0; i < 16; i++) + for (i = 0; i < VERT_ATTRIB_FF_MAX; i++) vbo->map_vp_none[i] = i; - for (i = 0; i < 12; i++) -vbo->map_vp_none[16+i] = VBO_ATTRIB_MAT_FRONT_AMBIENT + i; - for (i = 0; i < 4; i++) -vbo->map_vp_none[28+i] = i; + for (i = 0; i < NR_MAT_ATTRIBS; i++) +vbo->map_vp_none[VERT_ATTRIB_GENERIC(i)] += VBO_ATTRIB_MAT_FRONT_AMBIENT + i; + for (i = NR_MAT_ATTRIBS; i < VERT_ATTRIB_GENERIC_MAX; i++) +vbo->map_vp_none[VERT_ATTRIB_GENERIC(i)] = i; for (i = 0; i < Elements(vbo->map_vp_arb); i++) vbo->map_vp_arb[i] = i; Or more precisely: - for (i = 0; i < 4; i++) -vbo->map_vp_none[28+i] = i; + for (i = NR_MAT_ATTRIBS; i < VERT_ATTRIB_GENERIC_MAX; i++) +vbo->map_vp_none[VERT_ATTRIB_GENERIC(i)] = i; This change resulted in a different mapping in the map_vp_none[] array. The original code produced: vbo->map_vp_none = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 0, 1, 2, 3} The new code produces: vbo->map_vp_none = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 12, 13, 14, 15} The last four/five elements are different. Though, I don't really know what their purpose is. It seems to me that they could/should just be identity entries, ex: map_vp_none[i]=i as is the case for the map_vp_arb[] array. And I found that simply removing those lines fixes the problem: @@ -184,8 +184,6 @@ GLboolean _vbo_CreateContext( struct gl_context *ctx ) for (i = 0; i < NR_MAT_ATTRIBS; i++) vbo->map_vp_none[VERT_ATTRIB_GENERIC(i)] = VBO_ATTRIB_MAT_FRONT_AMBIENT + i; - for (i = NR_MAT_ATTRIBS; i < VERT_ATTRIB_GENERIC_MAX; i++) -vbo->map_vp_none[VERT_ATTRIB_GENERIC(i)] = i; for (i = 0; i < Elements(vbo->map_vp_arb); i++) vbo->map_vp_arb[i] = i; For fixed function, the point is to simply place the per-vertex material attributes in the generic attribute arrays. There are 12 such material attributes. So there's four slots left over. -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] vertex array regression
Brian, On Thursday, December 15, 2011 17:08:57 Brian Paul wrote: > There's a regression in vertex array drawing with this commit: > > commit ed42c2580717527b2005580940fc766d95bb6b0b > Author: Mathias Fröhlich > Date: Mon Oct 31 16:23:40 2011 +0100 > > vbo: Use The VERT_{ATTRIB,BIT} defines. > > Signed-off-by: Mathias Froehlich > Reviewed-by: Brian Paul > Reviewed-by: Eric Anholt > > > To see the problem, run mesa/demos/src/demos/isosurf, choose > "glDrawArrays" or "glDrawElements" from the pop-up menu (right mouse > button). I see the problem (random/missing vertices or failed > assertion) with all gallium drivers. The swrast/i965 drivers seem > uneffected. > > I'll try to debug it further, but maybe you could double-check your work. I will look into that. Probably not today but latest at the weekend. Mathias ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev