On Sun, 2003-08-10 at 03:22, Carsten Haitzler wrote:
> On Sat, 9 Aug 2003 22:07:21 -0700 (PDT) Mark Vojkovich <[EMAIL PROTECTED]>
> babbled:
> 
> > On Sun, 10 Aug 2003, Carsten Haitzler wrote:
> > 
> > > Would I be correct in the assumption that the only accelerated path for
> > > xrender is the identity transform (1:1 scale)? all other transforms are done
> > > in software? (my initial tests here with xfree86 4.3.0 & nvidia's latest
> > > drivers(as of about a month ago) seem to indicate as much...) (and yes... my
> > > drivers are using acceleration... GL definitely is). ?
> > > 
> > 
> >    The NVIDIA drivers fall back to software if the source or mask
> > have any transform.
> 
> really? very interesting as i'm getting my own mmx asm blending routines blending
> @ 32bpp being 35 times faster than xrender (blending at 1:1 scaling, no
> transform, nearest filter for scaling).
> 
> display is 24bpp (src picture is 32bpp, with alpha, component alpha set, repeat
> set to true, dither true). i am not sure.. but a 35 TIMES speed difference with
> software being the faster... sounds wrong to me.

If you are getting 35x faster, it probably means that you are hitting a
code path that is not accelerated in the NVIDIA drivers. One common case
where this used to happen is if the *source* is in video RAM. I know Mark
has been working on fixing this (at least as part of the XAA rewrite for
XFree86 5), but it may still be a limitation of the current NVIDIA
drivers.

Making the source a SHM pixmap can be a big win, because it forces it
into system RAM. Or it might be the repeat or the dither that triggers
the fallback.

As for why your software fallbacks are 35x faster than the RENDER
fallbacks, there are three causes, in rough order of importance:

 - In your case the image is being pushed/pulled over the bus (AGP bus,
   likely) both ways by the video card, instead of having 2 PCI
   transactions per pixel. The NVIDIA drivers in particular do a good
   job at optimizing GetImage.
 - Special casing of the particular formats in your compositing
   routines. The RENDER code is very general, and very slow.
 - MMX acceleration
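
To make the second point a bit more concrete, here is a minimal sketch
of what special-casing one format pair can look like: OVER of a
premultiplied ARGB32 source onto an xRGB32 destination, in plain C with
no MMX. This is not the RENDER or libic code; the function names
(mul_div_255, over_argb32_on_xrgb32, composite_over_scanline) are made
up for illustration, and it assumes no mask, no transform and no repeat.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a * b / 255 with rounding, for 8-bit channels. */
static inline uint8_t mul_div_255(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/*
 * OVER for one pixel. Assumes src is premultiplied ARGB32 (each colour
 * channel <= alpha), so the per-channel sums cannot exceed 255.
 * The unused top byte of the xRGB32 result is left as zero.
 */
static inline uint32_t over_argb32_on_xrgb32(uint32_t src, uint32_t dst)
{
    uint8_t inv_a = (uint8_t)(255 - (src >> 24));
    uint32_t r = ((src >> 16) & 0xff) + mul_div_255((dst >> 16) & 0xff, inv_a);
    uint32_t g = ((src >> 8) & 0xff) + mul_div_255((dst >> 8) & 0xff, inv_a);
    uint32_t b = (src & 0xff) + mul_div_255(dst & 0xff, inv_a);
    return (r << 16) | (g << 8) | b;
}

/*
 * Composite n pixels of one scanline. Because the formats are fixed at
 * compile time there is no per-pixel format dispatch, and fully opaque
 * or fully transparent source pixels skip the arithmetic entirely.
 */
void composite_over_scanline(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t s = src[i];
        uint8_t a = (uint8_t)(s >> 24);

        if (a == 0xff)
            dst[i] = s & 0x00ffffff;   /* opaque: plain copy */
        else if (a != 0)
            dst[i] = over_argb32_on_xrgb32(s, dst[i]);
        /* a == 0: destination pixel is left untouched */
    }
}
```

A general compositor has to work out formats, masks and operators at
run time; a loop like this fixes all of that up front, and it is also
the natural place to drop in an MMX version later.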

The current plan for addressing the last two is to do the work in the
'libic' library, so the same code can be used in the X server and in
the software fallbacks for Cairo. I think a couple of people have taken
a look at doing that optimization work, though I'm not sure whether
anybody is actively working on it at this time.

Regards,
                                                Owen


_______________________________________________
Devel mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/devel
