Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit
On Fre, 2011-11-11 at 21:25 +0100, Theiss, Ingo wrote:
> Am Freitag, 11. November 2011 14:53 CET, Brian Paul brian.e.p...@gmail.com schrieb:
> > Ingo, if you could find out what the format/type parameters to glReadPixels are, we could look into some optimization in the state tracker. I wouldn't be surprised if there's some channel swizzling or format conversion going on.
>
> Hi Brian,
>
> I have dug around in the VirtualGL source code and hope I have found what you requested. Don't blame me if I am wrong, as I have very limited knowledge of C programming :-(

I suspect Brian might have been thinking of running your app in gdb, setting a breakpoint in _mesa_ReadPixels and reporting the actual values that are passed in there from glXSwapBuffers.

> Here is the (overriding?) glReadPixels function from VirtualGL:
>
> void glReadPixels(GLint x, GLint y, GLsizei width, GLsizei height,
>                   GLenum format, GLenum type, GLvoid *pixels)
> {
>     TRY();
>     if(format==GL_COLOR_INDEX && !ctxh.overlaycurrent() && type!=GL_BITMAP)
>     {
>         format=GL_RED;
>         if(type==GL_BYTE || type==GL_UNSIGNED_BYTE) type=GL_UNSIGNED_BYTE;
>         else
>         {
>             int rowlen=-1, align=-1;
>             GLubyte *buf=NULL;
>             _glGetIntegerv(GL_PACK_ALIGNMENT, &align);
>             _glGetIntegerv(GL_PACK_ROW_LENGTH, &rowlen);
>             newcheck(buf=new unsigned char[width*height])
>             if(type==GL_SHORT) type=GL_UNSIGNED_SHORT;
>             if(type==GL_INT) type=GL_UNSIGNED_INT;
>             glPushClientAttrib(GL_CLIENT_PIXEL_STORE_BIT);
>             glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
>             glPixelStorei(GL_UNPACK_ROW_LENGTH, 1);
>             _glReadPixels(x, y, width, height, format, GL_UNSIGNED_BYTE, buf);
>             glPopClientAttrib();
>             _rpixelconvert(unsigned short, GL_UNSIGNED_SHORT, 2)
>             _rpixelconvert(unsigned int, GL_UNSIGNED_INT, 4)
>             _rpixelconvert(float, GL_FLOAT, 4)
>             delete [] buf;
>             return;
>         }
>     }
>     _glReadPixels(x, y, width, height, format, type, pixels);
>     CATCH();
> }
>
> One more noob question regarding this code: Is the glReadPixels function with the underscore the call to the parent glReadPixels function of libGL.so? I have not found the _glReadPixels function inside the VirtualGL code, so this seems the only explanation to me.

The above procedure should answer this question as well. Otherwise, it would rather be a question for the VirtualGL developers, though it looks like it is as you say.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Debian, X and DRI developer

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit
On Fre, 2011-11-11 at 07:35 +0100, Mathias Fröhlich wrote:
> On Thursday, November 10, 2011 18:42:13 Michel Dänzer wrote:
> > On Don, 2011-11-10 at 11:01 +0100, Theiss, Ingo wrote:
> > > The function calls of mesa/state_tracker/st_cb_readpixels.c:382 - st_readpixels and mesa/main/pack.c:552 - _mesa_pack_rgba_span_float clearly stand out when comparing the 32 bit and 64 bit profiles.
> >
> > I'm afraid that's a red herring, as I don't think glXSwapBuffers calls glReadPixels under any circumstances. I suspect the time inside glXSwapBuffers isn't spent on CPU cycles but rather waiting for some kind of event(s). Offhand I don't know any better way to debug that than attaching gdb, interrupting execution every now and then and hoping to catch it somewhere inside glXSwapBuffers. Maybe others have better ideas.
>
> Well, he is using VirtualGL, which intercepts glXSwapBuffers by LD_PRELOAD'ing a shared library containing this and several other functions. The aim of VirtualGL is to render on the application side even for a remote display. To make this work, they create an application-local accelerated GL context, render into it, and on glXSwapBuffers read back the back buffer and send it over the X connection the same way the software renderer does. So it makes sense to find a glReadPixels in VirtualGL's glXSwapBuffers.

Ah. I thought the time measurements in Ingo's original post were for the Mesa glXSwapBuffers, not the VirtualGL one. If it's the latter, then this makes sense.

Ingo, I noticed that your 64-bit and 32-bit drivers were built from slightly different Git snapshots. Is the problem still the same if you build both from the same, current snapshot? If yes, have you compared the compiler flags that end up being used in both cases? E.g., in 64-bit mode SSE is always available, so there might be some auto-vectorization going on in that case.
-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Debian, X and DRI developer
Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit
Am Freitag, 11. November 2011 12:09 CET, Michel Dänzer mic...@daenzer.net schrieb:
> > So it makes sense to find a glReadPixels in VirtualGL's glXSwapBuffers.
>
> Ah. I thought the time measurements in Ingo's original post were for the Mesa glXSwapBuffers, not the VirtualGL one. If it's the latter, then this makes sense.
>
> Ingo, I noticed that your 64-bit and 32-bit drivers were built from slightly different Git snapshots. Is the problem still the same if you build both from the same, current snapshot? If yes, have you compared the compiler flags that end up being used in both cases? E.g., in 64-bit mode SSE is always available, so there might be some auto-vectorization going on in that case.

I've rebuilt my 64-bit and 32-bit drivers from a fresh Git snapshot and turned on all processor optimizations in both builds. But nevertheless, the 32-bit readback performance measured inside VirtualGL is still only half of the 64-bit readback performance, and of course the rendered window scene is noticeably slower too :-(

Here are the compiler flags used.

32-bit:
CFLAGS: -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10 -fno-omit-frame-pointer -Wall -Wmissing-prototypes -std=c99 -ffast-math -fno-strict-aliasing -fno-builtin-memcmp -m32 -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10 -fno-omit-frame-pointer -fPIC -m32
CXXFLAGS: -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10 -Wall -fno-strict-aliasing -fno-builtin-memcmp -m32 -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10 -fPIC -m32
Macros: -D_GNU_SOURCE -DPTHREADS -DTEXTURE_FLOAT_ENABLED -DHAVE_POSIX_MEMALIGN -DUSE_XCB -DGLX_INDIRECT_RENDERING -DGLX_DIRECT_RENDERING -DGLX_USE_TLS -DPTHREADS -DUSE_EXTERNAL_DXTN_LIB=1 -DIN_DRI_DRIVER -DHAVE_ALIAS -DHAVE_MINCORE -DHAVE_LIBUDEV -DHAVE_XCB_DRI2 -DXCB_DRI2_CONNECT_DEVICE_NAME_BROKEN -D__STDC_CONSTANT_MACROS -DUSE_X86_ASM -DUSE_MMX_ASM -DUSE_3DNOW_ASM -DUSE_SSE_ASM

64-bit:
CFLAGS: -O2 -Wall -g -march=amdfam10 -mtune=amdfam10 -fno-omit-frame-pointer -Wall -Wmissing-prototypes -std=c99 -ffast-math -fno-strict-aliasing -fno-builtin-memcmp -m64 -O2 -Wall -g -march=amdfam10 -mtune=amdfam10 -fno-omit-frame-pointer -fPIC
CXXFLAGS: -O2 -Wall -g -march=amdfam10 -mtune=amdfam10 -Wall -fno-strict-aliasing -fno-builtin-memcmp -m64 -O2 -Wall -g -march=amdfam10 -mtune=amdfam10 -fPIC
Macros: -D_GNU_SOURCE -DPTHREADS -DTEXTURE_FLOAT_ENABLED -DHAVE_POSIX_MEMALIGN -DUSE_XCB -DGLX_INDIRECT_RENDERING -DGLX_DIRECT_RENDERING -DGLX_USE_TLS -DPTHREADS -DUSE_EXTERNAL_DXTN_LIB=1 -DIN_DRI_DRIVER -DHAVE_ALIAS -DHAVE_MINCORE -DHAVE_LIBUDEV -DHAVE_XCB_DRI2 -DXCB_DRI2_CONNECT_DEVICE_NAME_BROKEN -D__STDC_CONSTANT_MACROS -DUSE_X86_64_ASM

Enclosed you can see some VirtualGL-internal performance tracing:

64-bit:
Polygons in scene: 62464
[VGL] Shared memory segment ID for vglconfig: 6848522
[VGL] VirtualGL v2.2.90 64-bit (Build 20110813)
[VGL] Opening local display :0
[VGL] NOTICE: Replacing dlopen(libGL.so.1) with dlopen(librrfaker.so)
[VGL] NOTICE: Replacing dlopen(libGL.so.1) with dlopen(librrfaker.so)
Visual ID of window: 0x21
Context is Direct
OpenGL Renderer: Gallium 0.4 on AMD BARTS
[VGL] Using synchronous readback (GL format = 0x80e1)
Readback - 68.18 Mpixels/sec - 61.10 fps
Blit     - 307.79 Mpixels/sec - 275.80 fps
Total    - 55.67 Mpixels/sec - 49.88 fps

32-bit:
Polygons in scene: 62464
[VGL] Shared memory segment ID for vglconfig: 6946826
[VGL] VirtualGL v2.2.90 32-bit (Build 20110815)
[VGL] Opening local display :0
[VGL] NOTICE: Replacing dlopen(libGL.so.1) with dlopen(librrfaker.so)
[VGL] NOTICE: Replacing dlopen(libGL.so.1) with dlopen(librrfaker.so)
Visual ID of window: 0x21
Context is Direct
OpenGL Renderer: Gallium 0.4 on AMD BARTS
[VGL] Using synchronous readback (GL format = 0x80e1)
Readback - 33.80 Mpixels/sec - 30.29 fps
Blit     - 307.46 Mpixels/sec - 275.51 fps
Total    - 30.44 Mpixels/sec - 27.27 fps

The VirtualGL developer says the slow readback performance in 32-bit mode is out of his scope and driver-related.

Regards, Ingo
Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit
On Fre, 2011-11-11 at 14:15 +0100, Theiss, Ingo wrote:
> Am Freitag, 11. November 2011 12:09 CET, Michel Dänzer mic...@daenzer.net schrieb:
> > > So it makes sense to find a glReadPixels in VirtualGL's glXSwapBuffers.
> >
> > Ah. I thought the time measurements in Ingo's original post were for the Mesa glXSwapBuffers, not the VirtualGL one. If it's the latter, then this makes sense.
> >
> > Ingo, I noticed that your 64-bit and 32-bit drivers were built from slightly different Git snapshots. Is the problem still the same if you build both from the same, current snapshot? If yes, have you compared the compiler flags that end up being used in both cases? E.g., in 64-bit mode SSE is always available, so there might be some auto-vectorization going on in that case.
>
> I've rebuilt my 64-bit and 32-bit drivers from a fresh Git snapshot and turned on all processor optimizations in both builds. But nevertheless, the 32-bit readback performance measured inside VirtualGL is only half of the 64-bit readback performance, and of course the rendered window scene is noticeably slower too :-(
>
> Here are the compiler flags used.
>
> 32-bit:
> CFLAGS: -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10 -fno-omit-frame-pointer -Wall -Wmissing-prototypes -std=c99 -ffast-math -fno-strict-aliasing -fno-builtin-memcmp -m32 -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10 -fno-omit-frame-pointer -fPIC -m32

Have you tried adding -mfpmath=sse to the 32-bit CFLAGS? According to my gcc documentation, that option is enabled by default in 64-bit mode but disabled in 32-bit mode.

Anyway, I guess there's room for optimization in glReadPixels...

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Debian, X and DRI developer
Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit
2011/11/11 Michel Dänzer mic...@daenzer.net:
> Anyway, I guess there's room for optimization in glReadPixels...

Ingo, if you could find out what the format/type parameters to glReadPixels are, we could look into some optimization in the state tracker. I wouldn't be surprised if there's some channel swizzling or format conversion going on.

-Brian
Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit
----- Original Message -----
> Am Freitag, 11. November 2011 14:33 CET, Michel Dänzer mic...@daenzer.net schrieb:
> > On Fre, 2011-11-11 at 14:15 +0100, Theiss, Ingo wrote:
> > > I've rebuilt my 64-bit and 32-bit drivers from a fresh Git snapshot and turned on all processor optimizations in both builds. But nevertheless, the 32-bit readback performance measured inside VirtualGL is only half of the 64-bit readback performance, and of course the rendered window scene is noticeably slower too :-(
> > >
> > > Here are the compiler flags used.
> > >
> > > 32-bit:
> > > CFLAGS: -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10 -fno-omit-frame-pointer -Wall -Wmissing-prototypes -std=c99 -ffast-math -fno-strict-aliasing -fno-builtin-memcmp -m32 -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10 -fno-omit-frame-pointer -fPIC -m32
> >
> > Have you tried adding -mfpmath=sse to the 32-bit CFLAGS? According to my gcc documentation, that option is enabled by default in 64-bit mode but disabled in 32-bit mode.
> >
> > Anyway, I guess there's room for optimization in glReadPixels...
>
> Ok, I have added -mfpmath=sse to the 32-bit CFLAGS and the readback performance increased from 30.44 Mpixels/sec to 48.92 Mpixels/sec. We are getting closer to the 64-bit performance.

hmm, you should try -msse2 too. It's implied on 64 bits, and I'm not sure if -march/-mfpmath=sse by itself will enable the intrinsics.

Jose
Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit
On Fre, 2011-11-11 at 06:52 -0800, Jose Fonseca wrote:
> ----- Original Message -----
> > Am Freitag, 11. November 2011 14:33 CET, Michel Dänzer mic...@daenzer.net schrieb:
> > > On Fre, 2011-11-11 at 14:15 +0100, Theiss, Ingo wrote:
> > > > Here are the compiler flags used.
> > > >
> > > > 32-bit:
> > > > CFLAGS: -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10 -fno-omit-frame-pointer -Wall -Wmissing-prototypes -std=c99 -ffast-math -fno-strict-aliasing -fno-builtin-memcmp -m32 -O2 -Wall -g -m32 -march=amdfam10 -mtune=amdfam10 -fno-omit-frame-pointer -fPIC -m32
> > >
> > > Have you tried adding -mfpmath=sse to the 32-bit CFLAGS? According to my gcc documentation, that option is enabled by default in 64-bit mode but disabled in 32-bit mode.
> > >
> > > Anyway, I guess there's room for optimization in glReadPixels...
> >
> > Ok, I have added -mfpmath=sse to the 32-bit CFLAGS and the readback performance increased from 30.44 Mpixels/sec to 48.92 Mpixels/sec. We are getting closer to the 64-bit performance.
>
> hmm, you should try -msse2 too. It's implied on 64 bits, and I'm not sure if -march/-mfpmath=sse by itself will enable the intrinsics.

From my reading of the gcc docs, it's implied by -march=amdfam10.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Debian, X and DRI developer
Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit
> > Ok, I have added -mfpmath=sse to the 32-bit CFLAGS and the readback performance increased from 30.44 Mpixels/sec to 48.92 Mpixels/sec. We are getting closer to the 64-bit performance.
>
> hmm, you should try -msse2 too. It's implied on 64 bits, and I'm not sure if -march/-mfpmath=sse by itself will enable the intrinsics.
>
> Jose

Added -msse2 to the CFLAGS, but that did not increase the performance any further.

Thanks for your help.

Ingo
Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit
Am Freitag, 11. November 2011 14:53 CET, Brian Paul brian.e.p...@gmail.com schrieb:
> 2011/11/11 Michel Dänzer mic...@daenzer.net:
> > Anyway, I guess there's room for optimization in glReadPixels...
>
> Ingo, if you could find out what the format/type parameters to glReadPixels are, we could look into some optimization in the state tracker. I wouldn't be surprised if there's some channel swizzling or format conversion going on.
>
> -Brian

Hi Brian,

I have dug around in the VirtualGL source code and hope I have found what you requested. Don't blame me if I am wrong, as I have very limited knowledge of C programming :-(

Here is the (overriding?) glReadPixels function from VirtualGL:

void glReadPixels(GLint x, GLint y, GLsizei width, GLsizei height,
                  GLenum format, GLenum type, GLvoid *pixels)
{
    TRY();
    if(format==GL_COLOR_INDEX && !ctxh.overlaycurrent() && type!=GL_BITMAP)
    {
        format=GL_RED;
        if(type==GL_BYTE || type==GL_UNSIGNED_BYTE) type=GL_UNSIGNED_BYTE;
        else
        {
            int rowlen=-1, align=-1;
            GLubyte *buf=NULL;
            _glGetIntegerv(GL_PACK_ALIGNMENT, &align);
            _glGetIntegerv(GL_PACK_ROW_LENGTH, &rowlen);
            newcheck(buf=new unsigned char[width*height])
            if(type==GL_SHORT) type=GL_UNSIGNED_SHORT;
            if(type==GL_INT) type=GL_UNSIGNED_INT;
            glPushClientAttrib(GL_CLIENT_PIXEL_STORE_BIT);
            glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
            glPixelStorei(GL_UNPACK_ROW_LENGTH, 1);
            _glReadPixels(x, y, width, height, format, GL_UNSIGNED_BYTE, buf);
            glPopClientAttrib();
            _rpixelconvert(unsigned short, GL_UNSIGNED_SHORT, 2)
            _rpixelconvert(unsigned int, GL_UNSIGNED_INT, 4)
            _rpixelconvert(float, GL_FLOAT, 4)
            delete [] buf;
            return;
        }
    }
    _glReadPixels(x, y, width, height, format, type, pixels);
    CATCH();
}

One more noob question regarding this code: Is the glReadPixels function with the underscore the call to the parent glReadPixels function of libGL.so? I have not found the _glReadPixels function inside the VirtualGL code, so this seems the only explanation to me.

Regards, Ingo
Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit
Hi Michel,

thanks for the reply and your suggestions. It took me a while to figure out how to use and run oprofile, but finally I was able to produce some hopefully usable output. The function calls of mesa/state_tracker/st_cb_readpixels.c:382 - st_readpixels and mesa/main/pack.c:552 - _mesa_pack_rgba_span_float clearly stand out when comparing the 32 bit and 64 bit profiles.

You can take a look at the complete reports and callgraph images at:

http://www.i-matrixx.de/oreport_glxspheres64.txt
https://www.i-matrixx.de/oprofile_glxspheres64.png
https://www.i-matrixx.de/oreport_glxspheres32.txt
https://www.i-matrixx.de/oprofile_glxspheres32.png

I hope this helps to find the cause and improve the driver. Too bad I have no knowledge of C programming; this is getting interesting. Let me know if you need anything else. Thanks for your time.

Regards, Ingo

Am Montag, 07. November 2011 16:10 CET, Michel Dänzer mic...@daenzer.net schrieb:
> On Fre, 2011-11-04 at 13:38 +0100, Theiss, Ingo wrote:
> > I am using VirtualGL (http://www.virtualgl.org) for fully 3D-hardware-accelerated remote OpenGL applications with the latest Mesa from Git (compiled for both 32 bit and 64 bit) on my 64-bit Debian Wheezy box. When I run a 32-bit application with VirtualGL, I suffer a nearly 50% performance drop compared to running the same 64-bit application with VirtualGL. In the first place I contacted the VirtualGL developer, and he said that the performance drop is not a VirtualGL problem but related to the underlying 3D driver.
> >
> > The performance drop seems related to the function glXSwapBuffers, which can be seen in the function call tracing of VirtualGL:
> >
> > 64-bit application with VirtualGL:
> > [VGL] glXSwapBuffers (dpy=0x00deb900(:0) drawable=0x00a2 pbw->getglxdrawable()=0x0082 ) 28.770924 ms
> > [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.005960 ms
> > [VGL] glViewport (x=0 y=0 width=1240 height=900 ) 0.003099 ms
> > [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.002861 ms
> > [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.002861 ms
> > [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.00 ms
> > [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.000954 ms
> > [VGL] glXSwapBuffers (dpy=0x00deb900(:0) drawable=0x00a2 pbw->getglxdrawable()=0x0082 ) 29.365063 ms
> > [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.006914 ms
> >
> > 32-bit application with VirtualGL:
> > [VGL] glXSwapBuffers (dpy=0x087f7458(:0.0) drawable=0x00a2 pbw->getglxdrawable()=0x0082 ) 65.419075 ms
> > [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.005930 ms
> > [VGL] glViewport (x=0 y=0 width=1240 height=900 ) 0.003049 ms
> > [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.002989 ms
> > [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.004064 ms
> > [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.001051 ms
> > [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.001044 ms
> > [VGL] glXSwapBuffers (dpy=0x087f7458(:0.0) drawable=0x00a2 pbw->getglxdrawable()=0x0082 ) 65.005891 ms
> > [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.004926 ms
> >
> > Is this performance drop a normal or expected behaviour when running a 32-bit application on a 64-bit OS, or some kind of bug?
>
> Probably the latter. You should try to find out where the time is spent inside glXSwapBuffers in both cases. If the function is (at least roughly) CPU bound, this should be relatively easy with a profiler such as sysprof, perf or oprofile.
>
> -- 
> Earthling Michel Dänzer | http://www.amd.com
> Libre software enthusiast | Debian, X and DRI developer
Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit
On Don, 2011-11-10 at 11:01 +0100, Theiss, Ingo wrote:
> The function calls of mesa/state_tracker/st_cb_readpixels.c:382 - st_readpixels and mesa/main/pack.c:552 - _mesa_pack_rgba_span_float clearly stand out when comparing the 32 bit and 64 bit profiles.

I'm afraid that's a red herring, as I don't think glXSwapBuffers calls glReadPixels under any circumstances. I suspect the time inside glXSwapBuffers isn't spent on CPU cycles but rather waiting for some kind of event(s). Offhand I don't know any better way to debug that than attaching gdb, interrupting execution every now and then and hoping to catch it somewhere inside glXSwapBuffers. Maybe others have better ideas.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Debian, X and DRI developer
Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit
Michel,

On Thursday, November 10, 2011 18:42:13 Michel Dänzer wrote:
> On Don, 2011-11-10 at 11:01 +0100, Theiss, Ingo wrote:
> > The function calls of mesa/state_tracker/st_cb_readpixels.c:382 - st_readpixels and mesa/main/pack.c:552 - _mesa_pack_rgba_span_float clearly stand out when comparing the 32 bit and 64 bit profiles.
>
> I'm afraid that's a red herring, as I don't think glXSwapBuffers calls glReadPixels under any circumstances. I suspect the time inside glXSwapBuffers isn't spent on CPU cycles but rather waiting for some kind of event(s). Offhand I don't know any better way to debug that than attaching gdb, interrupting execution every now and then and hoping to catch it somewhere inside glXSwapBuffers. Maybe others have better ideas.

Well, he is using VirtualGL, which intercepts glXSwapBuffers by LD_PRELOAD'ing a shared library containing this and several other functions. The aim of VirtualGL is to render on the application side even for a remote display. To make this work, they create an application-local accelerated GL context, render into it, and on glXSwapBuffers read back the back buffer and send it over the X connection the same way the software renderer does. So it makes sense to find a glReadPixels in VirtualGL's glXSwapBuffers.

What VirtualGL provides is application-side *accelerated* rendering, which makes a lot of sense, for example, if your visualization application runs in a computation center far from the display and close to the actual simulation producing really *huge* amounts of data.

Greetings
Mathias
Re: [Mesa-dev] Performance glxSwapBuffers 32 bit vs. 64 bit
On Fre, 2011-11-04 at 13:38 +0100, Theiss, Ingo wrote:
> I am using VirtualGL (http://www.virtualgl.org) for fully 3D-hardware-accelerated remote OpenGL applications with the latest Mesa from Git (compiled for both 32 bit and 64 bit) on my 64-bit Debian Wheezy box. When I run a 32-bit application with VirtualGL, I suffer a nearly 50% performance drop compared to running the same 64-bit application with VirtualGL. In the first place I contacted the VirtualGL developer, and he said that the performance drop is not a VirtualGL problem but related to the underlying 3D driver.
>
> The performance drop seems related to the function glXSwapBuffers, which can be seen in the function call tracing of VirtualGL:
>
> 64-bit application with VirtualGL:
> [VGL] glXSwapBuffers (dpy=0x00deb900(:0) drawable=0x00a2 pbw->getglxdrawable()=0x0082 ) 28.770924 ms
> [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.005960 ms
> [VGL] glViewport (x=0 y=0 width=1240 height=900 ) 0.003099 ms
> [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.002861 ms
> [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.002861 ms
> [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.00 ms
> [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.000954 ms
> [VGL] glXSwapBuffers (dpy=0x00deb900(:0) drawable=0x00a2 pbw->getglxdrawable()=0x0082 ) 29.365063 ms
> [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.006914 ms
>
> 32-bit application with VirtualGL:
> [VGL] glXSwapBuffers (dpy=0x087f7458(:0.0) drawable=0x00a2 pbw->getglxdrawable()=0x0082 ) 65.419075 ms
> [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.005930 ms
> [VGL] glViewport (x=0 y=0 width=1240 height=900 ) 0.003049 ms
> [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.002989 ms
> [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.004064 ms
> [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.001051 ms
> [VGL] glPopAttrib (pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.001044 ms
> [VGL] glXSwapBuffers (dpy=0x087f7458(:0.0) drawable=0x00a2 pbw->getglxdrawable()=0x0082 ) 65.005891 ms
> [VGL] glDrawBuffer (mode=0x0405 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0082 ) 0.004926 ms
>
> Is this performance drop a normal or expected behaviour when running a 32-bit application on a 64-bit OS, or some kind of bug?

Probably the latter. You should try to find out where the time is spent inside glXSwapBuffers in both cases. If the function is (at least roughly) CPU bound, this should be relatively easy with a profiler such as sysprof, perf or oprofile.

-- 
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Debian, X and DRI developer