Hmm.
Forgive my ignorance, but isn't memcmp() on structs pretty prone to give incorrect != results, given that there may be padding between members in structs and that IIRC gcc struct assignment is member-wise.

What happens if there's padding between the jit_context and variant members of struct lp_rast_state?

I seem to recall hitting similar issues a number of times in the past.

/Thomas

On 06/30/2011 03:36 AM, Roland Scheidegger wrote:
Ok in fact there's a gcc bug about memcmp:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
In short gcc's memcmp builtin is totally lame and loses to glibc's
memcmp (including call overhead, no knowledge about alignment etc.) even
when comparing only very few bytes (and loses BIG time for lots of bytes
to compare). Oops. Well at least if the strings are the same (I'd guess
if the first byte is different it's hard to beat the gcc builtin...).
So this is really a gcc bug. The bug is quite old though with no fix in
sight apparently so might need to think about some workaround (but just
not doing the comparison doesn't look like the right idea, since
apparently it would be faster with the comparison if gcc's memcmp got
fixed).


Roland



Am 30.06.2011 01:47, schrieb Roland Scheidegger:
I didn't even look at that was just curious why the memcmp (which is
used a lot in other places) is slow. However, none of the other memcmp
seem to show up prominently (cso functions are quite low in profiles,
_mesa_search_program_cache uses memcmp too but it's not that high
neither). So I guess those functions either aren't called that often or
the sizes they compare are small.
So should maybe just file a gcc bug for memcmp and look at that
particular llvmpipe issue again :-).

Roland

Am 30.06.2011 01:16, schrieb Corbin Simpson:
Okay, so maybe I'm failing to recognize the exact situation here, but
wouldn't it be possible to mark the FS state with a serial number and
just compare those? Or are these FS states not CSO-cached?

~ C.

On Wed, Jun 29, 2011 at 3:44 PM, Roland Scheidegger<srol...@vmware.com>  wrote:
Actually I ran some numbers here and tried out a optimized struct compare:
original ipers: 12.1 fps
ajax patch: 15.5 fps
optimized struct compare: 16.8 fps


This is the function I used for that (just enabled in that lp_setup
function):

static INLINE int util_cmp_struct(const void *src1, const void *src2,
unsigned count)
{
  /* hmm pointer casting is evil */
  const uint32_t *src1_ptr = (uint32_t *)src1;
  const uint32_t *src2_ptr = (uint32_t *)src2;
  unsigned i;
  assert(count % 4 == 0);
  for (i = 0; i<  count/4; i++) {
    if (*src1_ptr != *src2_ptr) {
      return 1;
    }
    src1_ptr++;
    src2_ptr++;
  }
  return 0;
}

(And no this doesn't use repz cmpsd here.)

So, unless I made some mistake, memcmp is just dead slow (*), most of
the slowness probably coming from the bytewise comparison (and
apparently I was wrong in assuming the comparison there might never be
the same for ipers).
Of course, the optimized struct compare relies on structs really being
dword aligned (I think this is always the case), and additionally it
relies on the struct size being a whole multiple of dwords - likely
struct needs padding to ensure that (at least I don't think this is
always guaranteed for all structs).
But since memcmp is used extensively (cso for one) maybe some
optimization along these lines might be worth it (though of course for
small structs the win isn't going to be as big - and can't beat the repz
cmps in code size...).

Roland

(*) I actually found some references some implementations might be
better they don't just use repz cmpsb but they split this up in parts
which do dword (or qword even - well for really large structs could use
sse2) comparisons for the parts where it's possible and only byte
comparisons for the remaining bytes (and if the compiler does that it
probably would know the size at compile time here hence could leave out
much of the code). Of course memcmp requires that the return value isn't
just a true or false value, hence there's more code needed once an
unequal dword is found, though the compiler could optimize that away too
in case it's not needed. Much the same as memcpy is optimized usually
really, so blame gcc :-).



Am 29.06.2011 20:33, schrieb Roland Scheidegger:
Ohh that's interesting, you'd think the comparison shouldn't be that
expensive (though I guess in ipers case the comparison is never true).
memcmp is quite extensively used everywhere. Maybe we could replace that
with something faster (since we only ever care if the blocks are the
same but not care about the lexographic ordering and always compare
whole structs, should compare dwords instead of bytes for a 4 time
speedup)? Or isn't that the reason cmpsb instead of cmpsd is used?
Also I guess it would help if the values which are more likely to be
unequal are first in the struct (if we can tell that).
Of course though if it's unlikely to be the same as the compared value
anyway not comparing at all still might be a win (here).

Roland

Am 29.06.2011 19:19, schrieb Adam Jackson:
Perversely, do this by eliminating the comparison between stored and
current fs state.  On ipers, a perf trace showed try_update_scene_state
using 31% of a CPU, and 98% of that was in 'repz cmpsb', ie, the memcmp.
Taking that out takes try_update_scene_state down to 6.5% of the
profile; more importantly, ipers goes from 10 to 14fps and gears goes
from 790 to 830fps.

Signed-off-by: Adam Jackson<a...@redhat.com>
---
  src/gallium/drivers/llvmpipe/lp_setup.c |   61 ++++++++++++++-----------------
  1 files changed, 27 insertions(+), 34 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c 
b/src/gallium/drivers/llvmpipe/lp_setup.c
index cbe06e5..9118db5 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup.c
@@ -839,42 +839,35 @@ try_update_scene_state( struct lp_setup_context *setup )
        setup->dirty |= LP_SETUP_NEW_FS;
     }

-
     if (setup->dirty&  LP_SETUP_NEW_FS) {
-      if (!setup->fs.stored ||
-          memcmp(setup->fs.stored,
-&setup->fs.current,
-                 sizeof setup->fs.current) != 0)
-      {
-         struct lp_rast_state *stored;
-         uint i;
-
-         /* The fs state that's been stored in the scene is different from
-          * the new, current state.  So allocate a new lp_rast_state object
-          * and append it to the bin's setup data buffer.
-          */
-         stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof 
*stored);
-         if (!stored) {
-            assert(!new_scene);
-            return FALSE;
-         }
+      struct lp_rast_state *stored;
+      uint i;
+
+      /* The fs state that's been stored in the scene is different from
+       * the new, current state.  So allocate a new lp_rast_state object
+       * and append it to the bin's setup data buffer.
+       */
+      stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof *stored);
+      if (!stored) {
+         assert(!new_scene);
+         return FALSE;
+      }

-         memcpy(stored,
-&setup->fs.current,
-                sizeof setup->fs.current);
-         setup->fs.stored = stored;
-
-         /* The scene now references the textures in the rasterization
-          * state record.  Note that now.
-          */
-         for (i = 0; i<  Elements(setup->fs.current_tex); i++) {
-            if (setup->fs.current_tex[i]) {
-               if (!lp_scene_add_resource_reference(scene,
-                                                    setup->fs.current_tex[i],
-                                                    new_scene)) {
-                  assert(!new_scene);
-                  return FALSE;
-               }
+      memcpy(stored,
+&setup->fs.current,
+             sizeof setup->fs.current);
+      setup->fs.stored = stored;
+
+      /* The scene now references the textures in the rasterization
+       * state record.  Note that now.
+       */
+      for (i = 0; i<  Elements(setup->fs.current_tex); i++) {
+         if (setup->fs.current_tex[i]) {
+            if (!lp_scene_add_resource_reference(scene,
+                                                 setup->fs.current_tex[i],
+                                                 new_scene)) {
+               assert(!new_scene);
+               return FALSE;
              }
           }
        }
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev



_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to