Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

2011-11-15 Thread Michel Dänzer
On Fri, 2011-11-11 at 21:25 +0100, Theiss, Ingo wrote: 
 On Friday, 11 November 2011 14:53 CET, Brian Paul brian.e.p...@gmail.com wrote: 
  
  Ingo, if you could find out what the format/type parameters to
  glReadPixels are, we could look into some optimization in the state
  tracker.  I wouldn't be surprised if there's some channel swizzling or
  format conversion going on.
  
 Hi Brian,
 
 I have dug around in the VirtualGL source code and hope I have found
 what you requested. Don't blame me if I am wrong, as I have very
 limited knowledge of C programming :-(

I suspect Brian might have been thinking of running your app in gdb,
setting a breakpoint in _mesa_ReadPixels and reporting the actual values
that are passed in there from glXSwapBuffers.
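
Something like this should do it (an illustrative session, not from an
actual run; the values shown are simply what the VirtualGL trace
elsewhere in this thread suggests):

$ vglrun gdb glxspheres64       # so librrfaker.so is preloaded as usual
(gdb) break _mesa_ReadPixels
(gdb) run
...
Breakpoint 1, _mesa_ReadPixels (x=0, y=0, width=1240, height=900,
    format=32993, type=5121, pixels=...)
(gdb) print/x format
$1 = 0x80e1     <-- GL_BGRA
(gdb) print/x type
$2 = 0x1401     <-- GL_UNSIGNED_BYTE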


 Here is the (overriding?) glReadPixels function from VirtualGL:
 
 void glReadPixels(GLint x, GLint y, GLsizei width, GLsizei height,
 GLenum format, GLenum type, GLvoid *pixels)
 {
 TRY();
 if(format==GL_COLOR_INDEX && !ctxh.overlaycurrent() && type!=GL_BITMAP)
 {
 format=GL_RED;
 if(type==GL_BYTE || type==GL_UNSIGNED_BYTE) 
 type=GL_UNSIGNED_BYTE;
 else
 {
 int rowlen=-1, align=-1;  GLubyte *buf=NULL;
 _glGetIntegerv(GL_PACK_ALIGNMENT, &align);
 _glGetIntegerv(GL_PACK_ROW_LENGTH, &rowlen);
 newcheck(buf=new unsigned char[width*height])
 if(type==GL_SHORT) type=GL_UNSIGNED_SHORT;
 if(type==GL_INT) type=GL_UNSIGNED_INT;
 glPushClientAttrib(GL_CLIENT_PIXEL_STORE_BIT);
 glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
 glPixelStorei(GL_UNPACK_ROW_LENGTH, 1);
 _glReadPixels(x, y, width, height, format, 
 GL_UNSIGNED_BYTE, buf);
 glPopClientAttrib();
 _rpixelconvert(unsigned short, GL_UNSIGNED_SHORT, 2)
 _rpixelconvert(unsigned int, GL_UNSIGNED_INT, 4)
 _rpixelconvert(float, GL_FLOAT, 4)
 delete [] buf;
 return;
 }
 }
 _glReadPixels(x, y, width, height, format, type, pixels);
 CATCH();
 }
 
 One more noob question regarding this code: Is the glReadPixels
 function with the underscore the call to the parent glReadPixels
 function of libGL.so? I have not found the _glReadPixels function
 inside the VirtualGL code, so this seems like the only explanation to me.

The above procedure should answer this question as well. Otherwise, it
would rather be a question for VirtualGL developers, though it looks
like it is as you say.
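
FWIW, interposers like this typically resolve the real libGL entry
points via dlsym(RTLD_NEXT, ...) and stash them in underscore-prefixed
function pointers. A minimal sketch of that pattern (illustrative, not
the actual VirtualGL code):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <GL/gl.h>

typedef void (*ReadPixelsFn)(GLint, GLint, GLsizei, GLsizei,
                             GLenum, GLenum, GLvoid *);

/* The real glReadPixels from the next object in symbol lookup order,
 * i.e. libGL.so itself when this library is LD_PRELOAD'ed. */
static ReadPixelsFn _glReadPixels;

__attribute__((constructor))
static void load_real_symbols(void)
{
    _glReadPixels = (ReadPixelsFn)dlsym(RTLD_NEXT, "glReadPixels");
}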


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer


Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

2011-11-11 Thread Michel Dänzer
On Fri, 2011-11-11 at 07:35 +0100, Mathias Fröhlich wrote: 
 On Thursday, November 10, 2011 18:42:13 Michel Dänzer wrote:
  On Thu, 2011-11-10 at 11:01 +0100, Theiss, Ingo wrote:
   The function calls of mesa/state_tracker/st_cb_readpixels.c:382 -
   st_readpixels and mesa/main/pack.c:552 - _mesa_pack_rgba_span_float
   clearly stand out when comparing the 32-bit and 64-bit profiles.
  
  I'm afraid that's a red herring, as I don't think glXSwapBuffers calls
  glReadPixels under any circumstances. I suspect the time inside
  glXSwapBuffers isn't spent on CPU cycles but rather waiting for some
  kind of event(s). Offhand I don't know any better way to debug that than
  attaching gdb, interrupting execution every now and then and hoping to
  catch it somewhere inside glXSwapBuffers. Maybe others have better
  ideas.
 
 Well, he is using VirtualGL, which intercepts glXSwapBuffers by
 LD_PRELOAD'ing a shared library containing this and several other functions.
 The aim of VirtualGL is to render on the application side even for a remote
 display. To make this work they create an application-local accelerated GL
 context, render into this one, and on glXSwapBuffers read the back buffer
 and send it over the X connection the same way the software renderer does.
 
 So it makes sense to find a glReadPixels in VirtualGL's glXSwapBuffers.

Ah. I thought the time measurements in Ingo's original post were for the
Mesa glXSwapBuffers, not the VirtualGL one. If it's the latter, then
this makes sense.

Ingo, I noticed that your 64-bit and 32-bit drivers were built from
slightly different Git snapshots. Is the problem still the same if you
build both from the same, current snapshot?

If yes, have you compared the compiler flags that end up being used in
both cases? E.g., in 64-bit mode SSE is always available, so there might
be some auto-vectorization going on in that case.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer


Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

2011-11-11 Thread Theiss, Ingo
On Friday, 11 November 2011 12:09 CET, Michel Dänzer mic...@daenzer.net wrote: 

 So it makes sense to find a glReadPixels in VirtualGL's glXSwapBuffers.
 
 Ah. I thought the time measurements in Ingo's original post were for the
 Mesa glXSwapBuffers, not the VirtualGL one. If it's the latter, then

 this makes sense.
 
 Ingo, I noticed that your 64-bit and 32-bit drivers were built from
 slightly different Git snapshots. Is the problem still the same if you
 build both from the same, current snapshot?
 
 If yes, have you compared the compiler flags that end up being used in
 both cases? E.g., in 64-bit mode SSE is always available, so there might
 be some auto-vectorization going on in that case.

I've rebuilt my 64-bit and 32-bit drivers from a fresh Git snapshot and turned 
on all processor optimizations in both builds.
Nevertheless, the 32-bit readback performance measured inside VirtualGL is 
still only half of the 64-bit readback performance, and of course the rendered 
window scene is noticeably slower too :-(

Here are the compiler flags used.

32-bit:

CFLAGS: -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10 
-fno-omit-frame-pointer -Wall -Wmissing-prototypes -std=c99 -ffast-math 
-fno-strict-aliasing -fno-builtin-memcmp -m32 -O2 -Wall -g -m32 -march=amdfam10 
-mtune=amdfam10 -fno-omit-frame-pointer -fPIC -m32

CXXFLAGS: -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10 -Wall 
-fno-strict-aliasing -fno-builtin-memcmp -m32 -O2 -Wall -g -m32 -march=amdfam10 
-mtune=amdfam10 -fPIC -m32

Macros: -D_GNU_SOURCE -DPTHREADS -DTEXTURE_FLOAT_ENABLED -DHAVE_POSIX_MEMALIGN 
-DUSE_XCB -DGLX_INDIRECT_RENDERING -DGLX_DIRECT_RENDERING -DGLX_USE_TLS 
-DPTHREADS -DUSE_EXTERNAL_DXTN_LIB=1 -DIN_DRI_DRIVER -DHAVE_ALIAS 
-DHAVE_MINCORE -DHAVE_LIBUDEV -DHAVE_XCB_DRI2 
-DXCB_DRI2_CONNECT_DEVICE_NAME_BROKEN -D__STDC_CONSTANT_MACROS -DUSE_X86_ASM 
-DUSE_MMX_ASM -DUSE_3DNOW_ASM -DUSE_SSE_ASM

64-bit:

CFLAGS: -O2 -Wall -g -march=amdfam10 -mtune=amdfam10 -fno-omit-frame-pointer 
-Wall -Wmissing-prototypes -std=c99 -ffast-math -fno-strict-aliasing 
-fno-builtin-memcmp -m64 -O2 -Wall -g -march=amdfam10 -mtune=amdfam10 
-fno-omit-frame-pointer -fPIC

CXXFLAGS: -O2 -Wall -g -march=amdfam10 -mtune=amdfam10 -Wall 
-fno-strict-aliasing -fno-builtin-memcmp -m64 -O2 -Wall -g -march=amdfam10 
-mtune=amdfam10 -fPIC

Macros: -D_GNU_SOURCE -DPTHREADS -DTEXTURE_FLOAT_ENABLED -DHAVE_POSIX_MEMALIGN 
-DUSE_XCB -DGLX_INDIRECT_RENDERING -DGLX_DIRECT_RENDERING -DGLX_USE_TLS 
-DPTHREADS -DUSE_EXTERNAL_DXTN_LIB=1 -DIN_DRI_DRIVER -DHAVE_ALIAS 
-DHAVE_MINCORE -DHAVE_LIBUDEV -DHAVE_XCB_DRI2 
-DXCB_DRI2_CONNECT_DEVICE_NAME_BROKEN -D__STDC_CONSTANT_MACROS -DUSE_X86_64_ASM

Enclosed you can see some VirtualGL internal performance tracing:

64-bit:

Polygons in scene: 62464
[VGL] Shared memory segment ID for vglconfig: 6848522
[VGL] VirtualGL v2.2.90 64-bit (Build 20110813)
[VGL] Opening local display :0
[VGL] NOTICE: Replacing dlopen(libGL.so.1) with dlopen(librrfaker.so)
[VGL] NOTICE: Replacing dlopen(libGL.so.1) with dlopen(librrfaker.so)
Visual ID of window: 0x21
Context is Direct
OpenGL Renderer: Gallium 0.4 on AMD BARTS
[VGL] Using synchronous readback (GL format = 0x80e1)
Readback-   68.18 Mpixels/sec-   61.10 fps
Blit-  307.79 Mpixels/sec-  275.80 fps
Total   -   55.67 Mpixels/sec-   49.88 fps

32-bit:

Polygons in scene: 62464
[VGL] Shared memory segment ID for vglconfig: 6946826
[VGL] VirtualGL v2.2.90 32-bit (Build 20110815)
[VGL] Opening local display :0
[VGL] NOTICE: Replacing dlopen(libGL.so.1) with dlopen(librrfaker.so)
[VGL] NOTICE: Replacing dlopen(libGL.so.1) with dlopen(librrfaker.so)
Visual ID of window: 0x21
Context is Direct
OpenGL Renderer: Gallium 0.4 on AMD BARTS
[VGL] Using synchronous readback (GL format = 0x80e1)
Readback-   33.80 Mpixels/sec-   30.29 fps
Blit-  307.46 Mpixels/sec-  275.51 fps
Total   -   30.44 Mpixels/sec-   27.27 fps
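
(For reference, the GL format = 0x80e1 that VirtualGL reports for its
readback above is GL_BGRA.)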


The VirtualGL developer says the slow readback performance in 32-bit mode is 
out of his scope and driver-related.

Regards,

Ingo
 
 
 


Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

2011-11-11 Thread Michel Dänzer
On Fri, 2011-11-11 at 14:15 +0100, Theiss, Ingo wrote: 
 On Friday, 11 November 2011 12:09 CET, Michel Dänzer mic...@daenzer.net wrote: 
 
  So it makes sense to find a glReadPixels in VirtualGL's glXSwapBuffers.
  
  Ah. I thought the time measurements in Ingo's original post were for the
  Mesa glXSwapBuffers, not the VirtualGL one. If it's the latter, then
 
  this makes sense.
  
  Ingo, I noticed that your 64-bit and 32-bit drivers were built from
  slightly different Git snapshots. Is the problem still the same if you
  build both from the same, current snapshot?
  
  If yes, have you compared the compiler flags that end up being used in
  both cases? E.g., in 64-bit mode SSE is always available, so there might
  be some auto-vectorization going on in that case.
 
 I've rebuilt my 64-bit and 32-bit drivers from a fresh Git snapshot
 and turned on all processor optimizations in both builds.
 Nevertheless, the 32-bit readback performance measured inside VirtualGL is
 still only half of the 64-bit readback performance, and of course the
 rendered window scene is noticeably slower too :-( 
 
 Here are the compiler flags used.
 
 32-bit:
 
 CFLAGS: -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10 
 -fno-omit-frame-pointer -Wall -Wmissing-prototypes -std=c99 -ffast-math 
 -fno-strict-aliasing -fno-builtin-memcmp -m32 -O2 -Wall -g -m32 
 -march=amdfam10 -mtune=amdfam10 -fno-omit-frame-pointer -fPIC -m32

Have you tried adding -mfpmath=sse to the 32-bit CFLAGS? According to my
gcc documentation, that option is enabled by default in 64-bit mode but
disabled in 32-bit mode.
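
A quick way to check the defaults on a given box (illustrative output;
-Q --help=target needs a reasonably recent gcc):

$ gcc -m32 -Q --help=target | grep mfpmath
  -mfpmath=                             387
$ gcc -m64 -Q --help=target | grep mfpmath
  -mfpmath=                             sse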

Anyway, I guess there's room for optimization in glReadPixels...


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer


Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

2011-11-11 Thread Brian Paul
2011/11/11 Michel Dänzer mic...@daenzer.net:

 Anyway, I guess there's room for optimization in glReadPixels...

Ingo, if you could find out what the format/type parameters to
glReadPixels are, we could look into some optimization in the state
tracker.  I wouldn't be surprised if there's some channel swizzling or
format conversion going on.

-Brian


Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

2011-11-11 Thread Jose Fonseca


----- Original Message -----
 
 On Friday, 11 November 2011 14:33 CET, Michel Dänzer
 mic...@daenzer.net wrote:
  
  On Fri, 2011-11-11 at 14:15 +0100, Theiss, Ingo wrote:
   On Friday, 11 November 2011 12:09 CET, Michel Dänzer
   mic...@daenzer.net wrote:
   
So it makes sense to find a glReadPixels in VirtualGL's
glXSwapBuffers.

Ah. I thought the time measurements in Ingo's original post
were for the
Mesa glXSwapBuffers, not the VirtualGL one. If it's the latter,
then
   
this makes sense.

Ingo, I noticed that your 64-bit and 32-bit drivers were built
from
slightly different Git snapshots. Is the problem still the same
if you
build both from the same, current snapshot?

If yes, have you compared the compiler flags that end up being
used in
both cases? E.g., in 64-bit mode SSE is always available, so
there might
be some auto-vectorization going on in that case.
   
   I've rebuilt my 64-bit and 32-bit drivers from a fresh Git snapshot
   and turned on all processor optimizations in both builds.
   Nevertheless, the 32-bit readback performance measured inside
   VirtualGL is still only half of the 64-bit readback performance,
   and of course the rendered window scene is noticeably slower too :-(
   
   Here are the compiler flags used.
   
   32-bit:
   
   CFLAGS: -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10
   -fno-omit-frame-pointer -Wall -Wmissing-prototypes -std=c99
   -ffast-math -fno-strict-aliasing -fno-builtin-memcmp -m32 -O2
   -Wall -g -m32 -march=amdfam10 -mtune=amdfam10
   -fno-omit-frame-pointer -fPIC -m32
  
  Have you tried adding -mfpmath=sse to the 32-bit CFLAGS? According
  to my
  gcc documentation, that option is enabled by default in 64-bit mode
  but
  disabled in 32-bit mode.
  
  Anyway, I guess there's room for optimization in glReadPixels...
 
 OK, I have added -mfpmath=sse to the 32-bit CFLAGS and the readback
 performance increased from 30.44 Mpixels/sec to 48.92 Mpixels/sec. We
 are getting closer to the 64-bit performance.

Hmm, you should try -msse2 too. It's implied on 64 bits, and I'm not sure if 
-march/-mfpmath=sse by itself will enable the intrinsics.
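
One way to check is to look at the predefined macros; if __SSE2__ shows
up, the SSE2 code paths should be available (illustrative output;
amdfam10 should imply SSE through SSE4a):

$ echo | gcc -m32 -march=amdfam10 -dM -E - | grep -i sse
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSE4A__ 1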

Jose


Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

2011-11-11 Thread Michel Dänzer
On Fri, 2011-11-11 at 06:52 -0800, Jose Fonseca wrote: 
 
 - Original Message -
  
  On Friday, 11 November 2011 14:33 CET, Michel Dänzer
  mic...@daenzer.net wrote:
   
   On Fri, 2011-11-11 at 14:15 +0100, Theiss, Ingo wrote:

Here are the compiler flags used.

32-bit:

CFLAGS: -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10
-fno-omit-frame-pointer -Wall -Wmissing-prototypes -std=c99
-ffast-math -fno-strict-aliasing -fno-builtin-memcmp -m32 -O2
-Wall -g -m32 -march=amdfam10 -mtune=amdfam10
-fno-omit-frame-pointer -fPIC -m32
   
   Have you tried adding -mfpmath=sse to the 32-bit CFLAGS? According
   to my
   gcc documentation, that option is enabled by default in 64-bit mode
   but
   disabled in 32-bit mode.
   
   Anyway, I guess there's room for optimization in glReadPixels...
  
  OK, I have added -mfpmath=sse to the 32-bit CFLAGS and the readback
  performance increased from 30.44 Mpixels/sec to 48.92 Mpixels/sec. We
  are getting closer to the 64-bit performance.
 
 Hmm, you should try -msse2 too. It's implied on 64 bits, and I'm not
 sure if -march/-mfpmath=sse by itself will enable the intrinsics.

From my reading of the gcc docs, it's implied by -march=amdfam10.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer


Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

2011-11-11 Thread Theiss, Ingo

  
  OK, I have added -mfpmath=sse to the 32-bit CFLAGS and the readback
  performance increased from 30.44 Mpixels/sec to 48.92 Mpixels/sec. We
  are getting closer to the 64-bit performance.
 
 Hmm, you should try -msse2 too. It's implied on 64 bits, and I'm not sure if 
 -march/-mfpmath=sse by itself will enable the intrinsics.
 
 Jose
 

Added -msse2 to the CFLAGS but that did not increase the performance any 
further.

Thanks for your help.

Ingo



Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

2011-11-11 Thread Theiss, Ingo
 
On Friday, 11 November 2011 14:53 CET, Brian Paul brian.e.p...@gmail.com wrote: 
 
 2011/11/11 Michel Dänzer mic...@daenzer.net:
 
  Anyway, I guess there's room for optimization in glReadPixels...
 
 Ingo, if you could find out what the format/type parameters to
 glReadPixels are, we could look into some optimization in the state
 tracker.  I wouldn't be surprised if there's some channel swizzling or
 format conversion going on.
 
 -Brian
 
 
Hi Brian,

I have dug around in the VirtualGL source code and hope I have found what 
you requested. Don't blame me if I am wrong, as I have very limited knowledge 
of C programming :-(

Here is the (overriding?) glReadPixels function from VirtualGL:

void glReadPixels(GLint x, GLint y, GLsizei width, GLsizei height,
GLenum format, GLenum type, GLvoid *pixels)
{
TRY();
if(format==GL_COLOR_INDEX && !ctxh.overlaycurrent() && type!=GL_BITMAP)
{
format=GL_RED;
if(type==GL_BYTE || type==GL_UNSIGNED_BYTE) 
type=GL_UNSIGNED_BYTE;
else
{
int rowlen=-1, align=-1;  GLubyte *buf=NULL;
_glGetIntegerv(GL_PACK_ALIGNMENT, &align);
_glGetIntegerv(GL_PACK_ROW_LENGTH, &rowlen);
newcheck(buf=new unsigned char[width*height])
if(type==GL_SHORT) type=GL_UNSIGNED_SHORT;
if(type==GL_INT) type=GL_UNSIGNED_INT;
glPushClientAttrib(GL_CLIENT_PIXEL_STORE_BIT);
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glPixelStorei(GL_UNPACK_ROW_LENGTH, 1);
_glReadPixels(x, y, width, height, format, 
GL_UNSIGNED_BYTE, buf);
glPopClientAttrib();
_rpixelconvert(unsigned short, GL_UNSIGNED_SHORT, 2)
_rpixelconvert(unsigned int, GL_UNSIGNED_INT, 4)
_rpixelconvert(float, GL_FLOAT, 4)
delete [] buf;
return;
}
}
_glReadPixels(x, y, width, height, format, type, pixels);
CATCH();
}

One more noob question regarding this code: Is the glReadPixels function with 
the underscore the call to the parent glReadPixels function of libGL.so? I 
have not found the _glReadPixels function inside the VirtualGL code, so this 
seems like the only explanation to me.

Regards,

Ingo


Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

2011-11-10 Thread Theiss, Ingo
Hi Michel,

thanks for the reply and your suggestions.

It took me a while to figure out how to use and run oprofile, but finally I 
was able to produce some hopefully usable output.

The functions mesa/state_tracker/st_cb_readpixels.c:382 - st_readpixels and 
mesa/main/pack.c:552 - _mesa_pack_rgba_span_float clearly stand out when 
comparing the 32-bit and 64-bit profiles.
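
For context, _mesa_pack_rgba_span_float converts a span of float RGBA
pixels into the caller's packed format. Roughly, it does work of this
shape per pixel (an illustrative loop, not the actual Mesa code), and
the per-channel float-to-integer conversions are exactly where 32-bit
x87 math tends to be much slower than SSE:

#define CLAMP(x, lo, hi) ((x) < (lo) ? (lo) : ((x) > (hi) ? (hi) : (x)))

/* Illustrative only: pack a float RGBA span as 8-bit BGRA, the
 * format/type VirtualGL reads back with. */
static void pack_span_bgra8(const float (*rgba)[4], unsigned char *dst,
                            int n)
{
    for (int i = 0; i < n; i++) {
        dst[4*i + 0] = (unsigned char)(CLAMP(rgba[i][2], 0.0f, 1.0f) * 255.0f);
        dst[4*i + 1] = (unsigned char)(CLAMP(rgba[i][1], 0.0f, 1.0f) * 255.0f);
        dst[4*i + 2] = (unsigned char)(CLAMP(rgba[i][0], 0.0f, 1.0f) * 255.0f);
        dst[4*i + 3] = (unsigned char)(CLAMP(rgba[i][3], 0.0f, 1.0f) * 255.0f);
    }
}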

You can take a look at the complete reports and callgraph images at:

http://www.i-matrixx.de/oreport_glxspheres64.txt
https://www.i-matrixx.de/oprofile_glxspheres64.png

https://www.i-matrixx.de/oreport_glxspheres32.txt
https://www.i-matrixx.de/oprofile_glxspheres32.png

I hope this helps to find the cause and improve the driver.
Too bad I have no knowledge of C programming, as this is getting interesting. 

Let me know if you need anything else.

Thanks for your time.

Regards,

Ingo
 
On Monday, 07 November 2011 16:10 CET, Michel Dänzer mic...@daenzer.net wrote: 
 
 On Fri, 2011-11-04 at 13:38 +0100, Theiss, Ingo wrote:
  
  I am using VirtualGL (http://www.virtualgl.org) for fully hardware-accelerated
  3D remote OpenGL applications with the latest Mesa from Git (compiled for
  both 32 bit and 64 bit) on my 64-bit Debian Wheezy box.
  
  When I run a 32-bit application with VirtualGL I suffer a nearly 50%
  performance drop compared to running the same 64-bit application with
  VirtualGL. I first contacted the VirtualGL developer, and he said that the
  performance drop is not a VirtualGL problem but related to the underlying
  3D driver. The performance drop seems related to the function
  glXSwapBuffers, which can be seen in VirtualGL's function call tracing:
  
  64-bit application with VirtualGL
  ---------------------------------
  [VGL] glXSwapBuffers (dpy=0x00deb900(:0) drawable=0x00a2 
  pbw->getglxdrawable()=0x0082 ) 28.770924 ms
  [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 
  pbw->getglxdrawable()=0x0082 ) 0.005960 ms
  [VGL] glViewport (x=0 y=0 width=1240 height=900 ) 0.003099 ms
  [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 
  pbw->getglxdrawable()=0x0082 ) 0.002861 ms
  [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 
  pbw->getglxdrawable()=0x0082 ) 0.002861 ms
  [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 
  pbw->getglxdrawable()=0x0082 ) 0.00 ms
  [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 
  pbw->getglxdrawable()=0x0082 ) 0.000954 ms
  [VGL] glXSwapBuffers (dpy=0x00deb900(:0) drawable=0x00a2 
  pbw->getglxdrawable()=0x0082 ) 29.365063 ms
  [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 
  pbw->getglxdrawable()=0x0082 ) 0.006914 ms
  
  32-bit application with VirtualGL
  ---------------------------------
  [VGL] glXSwapBuffers (dpy=0x087f7458(:0.0) drawable=0x00a2 
  pbw->getglxdrawable()=0x0082 ) 65.419075 ms
  [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 
  pbw->getglxdrawable()=0x0082 ) 0.005930 ms
  [VGL] glViewport (x=0 y=0 width=1240 height=900 ) 0.003049 ms
  [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 
  pbw->getglxdrawable()=0x0082 ) 0.002989 ms
  [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 
  pbw->getglxdrawable()=0x0082 ) 0.004064 ms
  [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 
  pbw->getglxdrawable()=0x0082 ) 0.001051 ms
  [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 
  pbw->getglxdrawable()=0x0082 ) 0.001044 ms
  [VGL] glXSwapBuffers (dpy=0x087f7458(:0.0) drawable=0x00a2 
  pbw->getglxdrawable()=0x0082 ) 65.005891 ms
  [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 
  pbw->getglxdrawable()=0x0082 ) 0.004926 ms
  
  
  Is this performance drop normal or expected behaviour when running a
  32-bit application on a 64-bit OS, or some kind of bug?
 
 Probably the latter. You should try to find out where the time is spent
 inside glXSwapBuffers in both cases. If the function is (at least
 roughly) CPU bound, this should be relatively easy with a profiler such
 as sysprof, perf or oprofile. 
 
 
 -- 
 Earthling Michel Dänzer   |   http://www.amd.com
 Libre software enthusiast |  Debian, X and DRI developer
 



Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

2011-11-10 Thread Michel Dänzer
On Thu, 2011-11-10 at 11:01 +0100, Theiss, Ingo wrote: 
 
 The functions mesa/state_tracker/st_cb_readpixels.c:382 -
 st_readpixels and mesa/main/pack.c:552 - _mesa_pack_rgba_span_float
 clearly stand out when comparing the 32-bit and 64-bit profiles.

I'm afraid that's a red herring, as I don't think glXSwapBuffers calls
glReadPixels under any circumstances. I suspect the time inside
glXSwapBuffers isn't spent on CPU cycles but rather waiting for some
kind of event(s). Offhand I don't know any better way to debug that than
attaching gdb, interrupting execution every now and then and hoping to
catch it somewhere inside glXSwapBuffers. Maybe others have better
ideas.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer


Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

2011-11-10 Thread Mathias Fröhlich

Michel,

On Thursday, November 10, 2011 18:42:13 Michel Dänzer wrote:
 On Thu, 2011-11-10 at 11:01 +0100, Theiss, Ingo wrote:
  The function calls of mesa/state_tracker/st_cb_readpixels.c:382 -
  st_readpixels and mesa/main/pack.c:552 - _mesa_pack_rgba_span_float
   clearly stand out when comparing the 32-bit and 64-bit profiles.
 
 I'm afraid that's a red herring, as I don't think glXSwapBuffers calls
 glReadPixels under any circumstances. I suspect the time inside
 glXSwapBuffers isn't spent on CPU cycles but rather waiting for some
 kind of event(s). Offhand I don't know any better way to debug that than
 attaching gdb, interrupting execution every now and then and hoping to
 catch it somewhere inside glXSwapBuffers. Maybe others have better
 ideas.

Well, he is using VirtualGL, which intercepts glXSwapBuffers by LD_PRELOAD'ing 
a shared library containing this and several other functions.
The aim of VirtualGL is to render on the application side even for a remote 
display. To make this work they create an application-local accelerated GL 
context, render into this one, and on glXSwapBuffers read the back buffer and 
send it over the X connection the same way the software renderer does.

So it makes sense to find a glReadPixels in VirtualGL's glXSwapBuffers.
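
Schematically, the interposed swap amounts to something like this (a
rough sketch only; send_frame_over_x is a made-up stand-in for
VirtualGL's actual image transport, and error handling is omitted):

#include <stdlib.h>
#include <GL/glx.h>

/* Hypothetical transport that ships the pixels over the X connection,
 * much like a software renderer would (e.g. via XPutImage). */
extern void send_frame_over_x(Display *dpy, GLXDrawable drawable,
                              const unsigned char *bgra, int w, int h);

void glXSwapBuffers(Display *dpy, GLXDrawable drawable)
{
    unsigned int w = 0, h = 0;
    glXQueryDrawable(dpy, drawable, GLX_WIDTH, &w);
    glXQueryDrawable(dpy, drawable, GLX_HEIGHT, &h);

    unsigned char *bgra = malloc((size_t)w * h * 4);

    /* Read back the frame just rendered into the local accelerated
     * context; GL_BGRA (0x80e1) matches what VGL reports elsewhere in
     * this thread. */
    glReadBuffer(GL_BACK);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    glReadPixels(0, 0, (GLsizei)w, (GLsizei)h, GL_BGRA, GL_UNSIGNED_BYTE,
                 bgra);

    send_frame_over_x(dpy, drawable, bgra, (int)w, (int)h);
    free(bgra);
}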

What VirtualGL provides is application-side *accelerated* rendering, which 
makes a lot of sense, for example, if your visualization application runs in a 
computation center far from the display and close to the actual simulation 
producing really *huge* amounts of data.

Greetings

Mathias


Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit

2011-11-07 Thread Michel Dänzer
On Fri, 2011-11-04 at 13:38 +0100, Theiss, Ingo wrote:
 
 I am using VirtualGL (http://www.virtualgl.org) for fully hardware-accelerated
 3D remote OpenGL applications with the latest Mesa from Git (compiled for
 both 32 bit and 64 bit) on my 64-bit Debian Wheezy box.
 
 When I run a 32-bit application with VirtualGL I suffer a nearly 50%
 performance drop compared to running the same 64-bit application with
 VirtualGL. I first contacted the VirtualGL developer, and he said that the
 performance drop is not a VirtualGL problem but related to the underlying
 3D driver. The performance drop seems related to the function glXSwapBuffers,
 which can be seen in VirtualGL's function call tracing:
 
 64-bit application with VirtualGL
 ---------------------------------
 [VGL] glXSwapBuffers (dpy=0x00deb900(:0) drawable=0x00a2 
 pbw->getglxdrawable()=0x0082 ) 28.770924 ms
 [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 
 pbw->getglxdrawable()=0x0082 ) 0.005960 ms
 [VGL] glViewport (x=0 y=0 width=1240 height=900 ) 0.003099 ms
 [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 
 pbw->getglxdrawable()=0x0082 ) 0.002861 ms
 [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 
 pbw->getglxdrawable()=0x0082 ) 0.002861 ms
 [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 
 pbw->getglxdrawable()=0x0082 ) 0.00 ms
 [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 
 pbw->getglxdrawable()=0x0082 ) 0.000954 ms
 [VGL] glXSwapBuffers (dpy=0x00deb900(:0) drawable=0x00a2 
 pbw->getglxdrawable()=0x0082 ) 29.365063 ms
 [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 
 pbw->getglxdrawable()=0x0082 ) 0.006914 ms
 
 32-bit application with VirtualGL
 ---------------------------------
 [VGL] glXSwapBuffers (dpy=0x087f7458(:0.0) drawable=0x00a2 
 pbw->getglxdrawable()=0x0082 ) 65.419075 ms
 [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 
 pbw->getglxdrawable()=0x0082 ) 0.005930 ms
 [VGL] glViewport (x=0 y=0 width=1240 height=900 ) 0.003049 ms
 [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 
 pbw->getglxdrawable()=0x0082 ) 0.002989 ms
 [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 
 pbw->getglxdrawable()=0x0082 ) 0.004064 ms
 [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 
 pbw->getglxdrawable()=0x0082 ) 0.001051 ms
 [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 
 pbw->getglxdrawable()=0x0082 ) 0.001044 ms
 [VGL] glXSwapBuffers (dpy=0x087f7458(:0.0) drawable=0x00a2 
 pbw->getglxdrawable()=0x0082 ) 65.005891 ms
 [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 
 pbw->getglxdrawable()=0x0082 ) 0.004926 ms
 
 
 Is this performance drop normal or expected behaviour when running a
 32-bit application on a 64-bit OS, or some kind of bug?

Probably the latter. You should try to find out where the time is spent
inside glXSwapBuffers in both cases. If the function is (at least
roughly) CPU bound, this should be relatively easy with a profiler such
as sysprof, perf or oprofile. 
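
For example, with perf (an illustrative invocation; glxspheres64 is the
VirtualGL demo used for the measurements in this thread):

$ perf record -g vglrun glxspheres64
$ perf report

If glXSwapBuffers really is CPU bound, the report should show which
functions under it burn the cycles.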


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer