[Dri-devel] Streaming video through glTexSubImage2D

2003-01-31 Thread Morten Hustveit
Currently, the performance when streaming video through glTexSubImage2D is 
very low.  In my test program and with MPlayer, I get approximately 8 fps at 
720x576 on my Radeon 7500 with the texmem branch from a couple of weeks ago.  
glDrawPixels is equally slow.  I assume glTexSubImage2D is supposed to be 
able to handle realtime video, since it supports extensions like 
EXT_422_pixels (for 4:2:2 Y'CbCr) and EXT_interlace.

Using OpenGL for streaming video is useful for creating nonlinear video 
editing applications (I think Apple's Shake uses OpenGL), because you will be 
able to preview many of the most common effects in real time.

Is there any work in progress to make texture sub-image uploading faster?  
Which changes need to be done?


---
This SF.NET email is sponsored by:
SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
http://www.vasoftware.com
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Dri-devel] Streaming video through glTexSubImage2D

2003-01-31 Thread Jens Owen
Morten Hustveit wrote:

Currently, the performance when streaming video through glTexSubImage2D is 
very low.  In my test program and with mplayer, I get approximately 8 fps in 
720x576 on my Radeon 7500 with texmem-branch from a couple of weeks ago.  
glDrawPixels is equally slow.  I assume glTexSubImage2D is supposed to be 
able to process realtime video, since it handles extensions like 
EXT_422_pixels (for 4:2:2 Y'CbCr) and EXT_interlace.

Using OpenGL for streaming video is useful for creating nonlinear video 
editing applications (I think Apple's Shake use OpenGL), because you will be 
able to preview many of the most common effects in realtime.

Is there any work in progress to make texture sub-image uploading faster?  
Which changes need to be done?

Morten,

The R200 driver supports an AGP allocator, but that's for the Radeon 
8500 and 9000.  You would need to port the allocator 
(APPLE_client_storage) to the Radeon driver if you wanted to use it on 
the Radeon 7500.

Regards,
Jens

--
   /\
 Jens Owen/  \/\ _
  [EMAIL PROTECTED]  /\ \ \   Steamboat Springs, Colorado





Re: [Dri-devel] Streaming video through glTexSubImage2D

2003-01-31 Thread Ian Romanick
Morten Hustveit wrote:

Currently, the performance when streaming video through glTexSubImage2D is 
very low.  In my test program and with mplayer, I get approximately 8 fps in 
720x576 on my Radeon 7500 with texmem-branch from a couple of weeks ago.  
glDrawPixels is equally slow.  I assume glTexSubImage2D is supposed to be 
able to process realtime video, since it handles extensions like 
EXT_422_pixels (for 4:2:2 Y'CbCr) and EXT_interlace.

Using OpenGL for streaming video is useful for creating nonlinear video 
editing applications (I think Apple's Shake use OpenGL), because you will be 
able to preview many of the most common effects in realtime.

Is there any work in progress to make texture sub-image uploading faster?  
Which changes need to be done?

There are two typical ways to go about improving texture upload 
performance in OpenGL applications.  One is through the use of OpenGL 
extensions.  There are several extensions available (or available any 
day now) to help this process.  NV_pixel_data_range and 
APPLE_client_storage are the two most directly applicable.  Neither of 
these two is /generally/ available in DRI.  There is a version of 
NV_vertex_array_range in the R200 (and R100?) driver that can be used 
with APPLE_client_storage for texture data.

http://oss.sgi.com/projects/ogl-sample/registry/NV/pixel_data_range.txt
http://oss.sgi.com/projects/ogl-sample/registry/APPLE/client_storage.txt

Jeff Hartmann and I are in the process of designing a COMPLETE 
replacement of the memory management system for DRI.  This re-work 
should allow for a full, proper implementation of APPLE_client_storage. 
 It's going to take a lot of work, though.  The way 
APPLE_client_storage is implemented in MacOS X is that the application 
mallocs memory for textures and the system dynamically maps those pages 
into the AGP aperture.  This would be very difficult on x86, but I think 
Jeff has thought of a different way to get the same effect.

There is another extension from the ARB that should be available, 
literally, any day now to accelerate the process of uploading vertex 
data (it's a replacement for NV_vertex_array_range and 
ATI_vertex_array_object).  John Carmack made brief mention of it in his 
recent plan update.  As a follow on, there will likely be a version for 
texture data very soon.

I plan to have both these extensions implemented in DRI as part of the 
memory management re-write.  My personal opinion is that NV_*_range will 
universally go away after ARB_vertex_buffer_object gains ground.  There 
are too many pitfalls with them for general use, especially WRT software 
fallbacks.  The slow software path becomes even slower if the 
application optimizes by putting data in AGP or on-card memory. :P

The other way to speed up texture upload performance is to double-buffer 
the textures inside the driver.  The straightforward way to implement 
texture updates is to wait for rendering that may be using the texture 
to finish, then modify the texture data in place.  If I'm not mistaken, 
this is how DRI works.  The optimization is to allocate a new texture 
buffer if the texture has in-flight rendering.

This should be doable in the current implementation, but the 
implementation would be non-trivial.  Basically, you'd have to add a way 
to track whether a texture has in-flight rendering.  In the TexSubImage 
functions for each driver you'd need to add code to detect this case. 
When it is detected, the old driTextureObject would need to be added to 
a list of dead texture objects (to be released when their rendering is 
done), and a new driTextureObject would need to be allocated.  
Periodically, objects in the dead list would need to be checked and, if 
their rendering is complete, freed.
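In plain C, that dead-list bookkeeping might look something like the
following sketch.  All names here are hypothetical; a real driver would
hang this off driTextureObject and tie last_fence to its actual fencing
mechanism rather than a bare sequence number.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a retired texture object: it remembers the
 * fence (a monotonically increasing sequence number) of the last draw
 * command that referenced it. */
typedef struct dead_tex {
    unsigned last_fence;     /* rendering that must finish before free */
    struct dead_tex *next;
} dead_tex;

static dead_tex *dead_list = NULL;

/* Called from TexSubImage when the texture has in-flight rendering:
 * push the old object onto the dead list; the caller then allocates a
 * fresh buffer and uploads the new data into that instead. */
void retire_texture(unsigned last_fence)
{
    dead_tex *t = malloc(sizeof *t);
    t->last_fence = last_fence;
    t->next = dead_list;
    dead_list = t;
}

/* Called periodically: free every retired object whose rendering has
 * completed, i.e. whose fence is at or below the last fence the card
 * is known to have passed.  Returns the number of objects freed. */
int reap_dead_textures(unsigned completed_fence)
{
    dead_tex **p = &dead_list;
    int freed = 0;

    while (*p) {
        if ((*p)->last_fence <= completed_fence) {
            dead_tex *t = *p;
            *p = t->next;    /* unlink and free the finished object */
            free(t);
            freed++;
        } else {
            p = &(*p)->next;
        }
    }
    return freed;
}
```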

That's the 10,000 mile overview.  There are probably some other cases I'm 
missing.  It might also be possible to implement most of this in a 
device independent way, but I would do it in a single driver first.  I 
think the tough part will be getting the fencing right.  If you (or 
anyone else!!!) would be interested in working on this, we can talk 
about it more in next Monday's #dri-devel meeting.





Re: [Dri-devel] Streaming video through glTexSubImage2D

2003-01-31 Thread Leif Delgass
On Fri, 31 Jan 2003, Arkadi Shishlov wrote:

 On Fri, Jan 31, 2003 at 10:26:13AM -0800, Ian Romanick wrote:
  There are two typical ways to go about improving texture upload 
  performance in OpenGL applications.  One is through the use of OpenGL 
  extensions.  There are several extensions available (or available any 
 
 You are talking about extensions here, but my P3 600MHz Radeon 8500 box
 with ATI binary drivers is able to push normal frame rates in MPlayer
 with 720x480 movies with the OpenGL output driver at 80% CPU load.
 30% with XVideo.
 It uses regular glTexSubImage2D, so it is either the R100 or DRI being slow
 in this case (if the CPU is powerful enough).
 I don't know much about the extensions you mentioned, but how much would you
 save with MPlayer? One memcpy() (assuming it doesn't wait for the texture
 upload)?

Actually, IIRC, all the drivers implement glTexSubImage2D the
same way as glTexImage2D.  They always upload the entire texture image --
there was a comment I remember seeing about the subimage index calculations
being wrong.  Fixing this to only upload the subimage would help the
performance of glTexSubImage2D.
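For illustration, uploading only the subimage boils down to per-row
offset arithmetic like this.  This is a hypothetical sketch, not the
actual driver code, and it assumes a tightly packed backing store with
no row padding:

```c
#include <string.h>

/* Copy a w x h sub-rectangle at (xoffset, yoffset) from src into a
 * texture backing store that is tex_width texels wide, instead of
 * re-uploading the whole image.  bpp is bytes per texel (e.g. 4 for
 * an RGBA8888 texture). */
void upload_subimage(unsigned char *tex, int tex_width,
                     const unsigned char *src,
                     int xoffset, int yoffset,
                     int w, int h, int bpp)
{
    for (int row = 0; row < h; row++) {
        /* Start of the destination row, offset into the full image. */
        unsigned char *dst = tex
            + ((yoffset + row) * tex_width + xoffset) * bpp;
        memcpy(dst, src + row * w * bpp, (size_t)(w * bpp));
    }
}
```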

-- 
Leif Delgass 
http://www.retinalburn.net






Re: [Dri-devel] Streaming video through glTexSubImage2D

2003-01-31 Thread Arkadi Shishlov
On Fri, Jan 31, 2003 at 04:33:36PM -0500, Leif Delgass wrote:
 Actually, IIRC, all the drivers implement glTexSubImage2D the
 same way as glTexImage2D.  They always upload the entire texture image --
 there was a comment I remember seeing about the subimage index calculations
 being wrong.  Fixing this to only upload the subimage would help the
 performance of glTexSubImage2D.

I don't think it makes any difference with MPlayer; it replaces the whole
texture for every frame (there is draw_slice() in libvo/vo_gl.c, but I doubt
it is used much; a possible source of low performance with DRI?).  It always
uploads in RGB format, so probably much of the CPU time is spent in yuv2rgb().
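For reference, the per-pixel work in a yuv2rgb() conversion looks
roughly like this integer BT.601 approximation for video-range input.
This is an illustrative scalar version only, not MPlayer's actual
routine, which is heavily optimized with tables and MMX/SSE:

```c
/* Clamp an intermediate result into the 0..255 byte range. */
static int clamp255(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

/* Convert one Y'CbCr sample (BT.601, Y in 16..235) to 8-bit RGB using
 * the common fixed-point coefficients scaled by 256. */
void yuv2rgb_pixel(int y, int cb, int cr, int *r, int *g, int *b)
{
    int c = y - 16, d = cb - 128, e = cr - 128;

    *r = clamp255((298 * c + 409 * e + 128) >> 8);
    *g = clamp255((298 * c - 100 * d - 208 * e + 128) >> 8);
    *b = clamp255((298 * c + 516 * d + 128) >> 8);
}
```

Doing this for every pixel of a 720x576 frame, 25 times a second, is a
nontrivial CPU cost on a 600MHz machine.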

How can glTexSubImage2D upload the full texture?  The original source is
gone; does it keep a copy internally?


arkadi.





Re: [Dri-devel] Streaming video through glTexSubImage2D

2003-01-31 Thread Arkadi Shishlov
On Fri, Jan 31, 2003 at 10:26:13AM -0800, Ian Romanick wrote:
 There are two typical ways to go about improving texture upload 
 performance in OpenGL applications.  One is through the use of OpenGL 
 extensions.  There are several extensions available (or available any 

You are talking about extensions here, but my P3 600MHz Radeon 8500 box
with ATI binary drivers is able to push normal frame rates in MPlayer
with 720x480 movies with the OpenGL output driver at 80% CPU load.
30% with XVideo.
It uses regular glTexSubImage2D, so it is either the R100 or DRI being slow
in this case (if the CPU is powerful enough).
I don't know much about the extensions you mentioned, but how much would you
save with MPlayer? One memcpy() (assuming it doesn't wait for the texture
upload)?


arkadi.





Re: [Dri-devel] Streaming video through glTexSubImage2D

2003-01-31 Thread Leif Delgass
On Fri, 31 Jan 2003, Arkadi Shishlov wrote:

 On Fri, Jan 31, 2003 at 04:33:36PM -0500, Leif Delgass wrote:
  Actually, IIRC, all the drivers implement glTexSubImage2D the
  same way as glTexImage2D.  They always upload the entire texture image --
  there was a comment I remember seeing about the subimage index calculations
  being wrong.  Fixing this to only upload the subimage would help the
  performance of glTexSubImage2D.
 
 I don't think it makes any difference with MPlayer; it replaces the whole
 texture for every frame (there is draw_slice() in libvo/vo_gl.c, but I
 doubt it is used much; a possible source of low performance with
 DRI?).  It always uploads in RGB format, so probably much of the CPU is
 spent in yuv2rgb().

You're probably right; in most apps it likely wouldn't have a large
impact.  The extensions that Ian described are going to have more of an
effect.
 
 How can glTexSubImage2D upload the full texture?  The original source is
 gone; does it keep a copy internally?

Yes, the Mesa drivers currently keep a copy of all textures in system
memory, but this is one of the things that could change with a new
AGP/texture management scheme.

-- 
Leif Delgass 
http://www.retinalburn.net








Re: [Dri-devel] Streaming video through glTexSubImage2D

2003-01-31 Thread Ian Romanick
Arkadi Shishlov wrote:

On Fri, Jan 31, 2003 at 10:26:13AM -0800, Ian Romanick wrote:


There are two typical ways to go about improving texture upload 
performance in OpenGL applications.  One is through the use of OpenGL 
extensions.  There are several extensions available (or available any 

You are talking about extensions here, but my P3 600MHz Radeon 8500 box
with ATI binary drivers is able to push normal frame rates in MPlayer
with 720x480 movies with the OpenGL output driver at 80% CPU load.
30% with XVideo.
It uses regular glTexSubImage2D, so it is either the R100 or DRI being slow
in this case (if the CPU is powerful enough).


I'm 99% sure that the ATI driver multi-buffers textures.  This was the 
second technique that I mentioned in my post to improve texture upload 
performance.  There are probably other ways to pipeline texture uploads, 
but the DRI doesn't use any of them.  My guess is that if you profiled 
it you would see that most of the wall clock time is spent waiting for 
the rendering pipe to flush.

I believe that this problem is the reason the guys at Tungsten 
implemented NV_vertex_array_range and the simplified version of 
APPLE_client_storage.  The real fix is going to be a LOT of work.  The 
current solution is a good stop-gap method, though.  I would suggest 
modifying MPlayer to use the var+client_storage work-around, and then 
helping us implement the long-term fix. :)

I don't know much about the extensions you mentioned, but how much would you
save with MPlayer? One memcpy() (assuming it doesn't wait for the texture
upload)?


It depends.  Using a real implementation of APPLE_client_storage, your 
main loop would look something like the following, and there would be 
little or no waiting and no copies.  This loop would actually require 
APPLE_fence, but adding that would be fairly trivial.

The trick is that when you use a texture the driver uses pages from your 
memory space as pages for the AGP aperture.  I don't know exactly what 
they've done, but I know that Apple has gone to some great lengths to 
optimize this path.

struct {
    GLuint   texture_id;
    GLuint   fence_id;
    void   * buffer;
} texture_ring[ MAX_TEXTURES ];

foo( ... )
{
    /* Allocate memory, texture IDs, and fence IDs for the ring. */

    i = 0;
    while ( ! done ) {
        /* Wait until the card has finished reading this ring slot
         * before the decoder writes into its buffer again. */
        glFinishFenceAPPLE( texture_ring[i].fence_id );
        decode_video_frame( texture_ring[i].buffer );

        glBindTexture( GL_TEXTURE_2D, texture_ring[i].texture_id );

        /* Render with the texture. */

        /* Set the fence so a later pass through the ring can tell
         * when this slot's rendering has completed. */
        glSetFenceAPPLE( texture_ring[i].fence_id );

        i = (i + 1) % MAX_TEXTURES;
    }

    /* Destroy textures, free memory, etc. */
}





Re: [Dri-devel] Streaming video through glTexSubImage2D

2003-01-31 Thread Arkadi Shishlov
If the texture is locked in DRI while rendering, the straightforward solution
is to allocate a new texture for every frame or reuse an old one from the ring.
The new texture will be free, so one application thread can be dedicated to
pushing the texture to AGP and then into the card, while other threads decode
the video, eliminating the stall.
The method you described is just async I/O; the only advantage is that
there is no copy from app memory to AGP memory.
So multi-buffering is not an advantage over a multithreaded approach (aside
from scheduling overhead), but using DMA'able memory directly is.
Did I understand that correctly?
Perhaps you could advise the MPlayer folks to use multiple textures in a ring
to speed up MPlayer on DRI (in case the original poster's system is not CPU
bound)?


arkadi.

