Arch specific optimizations?

2003-09-03 Thread emmanuel ALLAUD
  Hi all,
in the thread about the RENDER extension, it was
mentioned that XFree86 was performing much slower (i.e. 2
or 3 times slower) than imlib2 (sorry, I don't really
remember in which tasks). The reason seemed to boil
down to the fact that imlib2 has arch-specific asm
instructions (I think mostly for x86 via MMX or SSE
or whatever) for certain crucial functions.
My question is: why not do that in XFree86 as well
(borrowing/adapting code from image manipulation
libs)?
By carefully choosing the functions to optimize (that
would mean only a few small functions, so that
maintenance is as easy as possible) we should avoid
too much mess, keeping portability by using the "old"
functions for arches with no specific optimizations.
Does that sound reasonable?
Bye
Manu
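
A minimal sketch of the scheme proposed above, assuming the choice between an arch-specific routine and the portable "old" one is made once at startup through a function pointer. Every name here (`fill_span_generic`, `fill_span_fast`, `cpu_has_fast_path`, `select_fill_span`) is hypothetical and not taken from XFree86 or imlib2, and the "fast" routine is only a plain-C stand-in for where MMX/SSE code would go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signature shared by every implementation of the hot function. */
typedef void (*fill_span_fn)(uint32_t *dst, uint32_t pixel, size_t n);

/* Portable "old" version: fill n 32-bit pixels with a solid value. */
static void fill_span_generic(uint32_t *dst, uint32_t pixel, size_t n)
{
    size_t i;

    assert(dst != NULL);
    for (i = 0; i < n; i++)
        dst[i] = pixel;
}

/* Stand-in for an arch-specific version (hypothetical). Real code would
 * use asm or intrinsics here and must produce identical results. */
static void fill_span_fast(uint32_t *dst, uint32_t pixel, size_t n)
{
    fill_span_generic(dst, pixel, n);
}

/* Hypothetical capability check. A real one would query the CPU;
 * this stub returns 0, so the portable path is always selected. */
static int cpu_has_fast_path(void)
{
    return 0;
}

/* Pick the implementation once; callers only ever see the pointer. */
static fill_span_fn select_fill_span(void)
{
    return cpu_has_fast_path() ? fill_span_fast : fill_span_generic;
}
```

Callers would store the result of `select_fill_span()` and call through it, so arches without an optimized routine keep running the generic code unchanged.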

___
Devel mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/devel


Re: Arch specific optimizations?

2003-09-03 Thread The Rasterman
On Thu, 4 Sep 2003 00:52:41 +0200 (CEST) emmanuel ALLAUD <[EMAIL PROTECTED]>
babbled:

>   Hi all,
> in the thread about the RENDER extension, it was
> mentioned that XFree86 was performing much slower (i.e. 2
> or 3 times slower) than imlib2 (sorry, I don't really
> remember in which tasks). The reason seemed to boil
> down to the fact that imlib2 has arch-specific asm
> instructions (I think mostly for x86 via MMX or SSE
> or whatever) for certain crucial functions.
> My question is: why not do that in XFree86 as well
> (borrowing/adapting code from image manipulation
> libs)?
> By carefully choosing the functions to optimize (that
> would mean only a few small functions, so that
> maintenance is as easy as possible) we should avoid
> too much mess, keeping portability by using the "old"
> functions for arches with no specific optimizations.
> Does that sound reasonable?
> Bye
> Manu

no - it wasn't the mmx/sse that did it. WHEN xrender was going through a
hardware accelerated path it was performing only 2-5 times faster than imlib2
(using the cpu & mmx to do blends), which is rather slow compared to the speed
gfx hardware can actually achieve. in almost all cases xrender was slower - by
30-50 times. this isn't a matter of mmx/sse. this is things like image data not
being in video ram across a bus. on top of that mmx/sse would speed things up
too - but the main issue at hand is that even the software fallbacks don't get
a fair go at doing a decent job, and hardware acceleration doesn't seem to have
been used much - if at all. thus it's always using software. this varies
from driver to driver - but none accelerated any transforms at all, and the ones
that did accelerate 1:1 blending were not what i would deem significantly faster
than imlib2 (compared to what the hardware i was testing at the time was capable
of in terms of blends with opengl).

so summary:

the problem is twofold.

1. the software fallbacks in xrender - whether they have mmx/sse or not - are
simply not being given a chance to perform at full cpu capacity. adding mmx/sse
won't help until this is solved. once this is solved mmx/sse/altivec etc. could
get you up to a double speedup again (in my experience), maybe even more. but
first this needs to be fixed.

2. only 1 of all my tests ever went through acceleration at all. the other 6
tests were all software only. even the first test sometimes wasn't hardware
accelerated either. when it was hardware accelerated it could have been a LOT
LOT LOT faster in the cases i saw.
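
To make "software fallback" concrete, here is a minimal sketch of the kind of inner loop at stake: a plain-C "over" blend of premultiplied ARGB32 pixels. It is an illustration only, not xrender's or imlib2's actual code, and the names `mul_div255` and `over_span` are made up for it. A loop like this is what mmx/sse/altivec would later speed up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded x * a / 255 without a division. */
static uint8_t mul_div255(uint8_t x, uint8_t a)
{
    unsigned t = (unsigned)x * a + 128u;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* dst = src OVER dst for n premultiplied ARGB32 pixels:
 * each channel becomes src + dst * (255 - src_alpha) / 255. */
static void over_span(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i;
    int shift;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++) {
        uint32_t s = src[i];
        uint32_t d = dst[i];
        uint8_t inv_alpha = (uint8_t)(255u - (s >> 24));
        uint32_t out = 0;

        /* Walk the four 8-bit channels: B, G, R, A. */
        for (shift = 0; shift < 32; shift += 8) {
            unsigned c = ((s >> shift) & 0xffu)
                       + mul_div255((uint8_t)((d >> shift) & 0xffu), inv_alpha);
            if (c > 255u)
                c = 255u;   /* clamp in case the input was not premultiplied */
            out |= (uint32_t)c << shift;
        }
        dst[i] = out;
    }
}
```

The sketch processes a whole span per call and says nothing about where `dst` and `src` live in memory.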

i will also point out that my performance tests were measuring only a small
subset of xrender's operations - but they were the operations i personally use
the most, and to me they definitely appear to be, in general, some of the most
commonly used ones.

-- 
--- Codito, ergo sum - "I code, therefore I am" 
The Rasterman (Carsten Haitzler)    [EMAIL PROTECTED]

Re: Arch specific optimizations?

2003-09-04 Thread emmanuel ALLAUD
 --- Carsten Haitzler <[EMAIL PROTECTED]> wrote:
> On Thu, 4 Sep 2003 00:52:41 +0200 (CEST) emmanuel ALLAUD <[EMAIL PROTECTED]>
> babbled:
> 
> so summary:
> 
> the problem is twofold.
> 
> 1. the software fallbacks in xrender - whether they have mmx/sse or not - are
> simply not being given a chance to perform at full cpu capacity. adding mmx/sse
> won't help until this is solved. once this is solved mmx/sse/altivec etc. could
> get you up to a double speedup again (in my experience), maybe even more. but
> first this needs to be fixed.
> 
> 2. only 1 of all my tests ever went through acceleration at all. the other 6
> tests were all software only. even the first test sometimes wasn't hardware
> accelerated either. when it was hardware accelerated it could have been a LOT
> LOT LOT faster in the cases i saw.
> 
> i will also point out that my performance tests were measuring only a small
> subset of xrender's operations - but they were the operations i personally use
> the most, and to me they definitely appear to be, in general, some of the most
> commonly used ones.
> 

OK, that is for the xrender software fallbacks, and
basically you are saying that even the generic code
could be substantially faster even without using
arch-specific asm code? That would certainly be the first
thing to fix.
I was also pointing at other functions (not only in
xrender) that could benefit from some arch-specific
optimizations, but I would like people who actually
know the code (I'm not in that population ;-) to
speak up and say whether this is doable without
compromising portability (I mean without too much
hassle), and whether it is worth doing.
Bye
Manu
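
One way this could stay portable without much hassle, sketched under assumptions: guard the arch-specific path with a compiler-defined macro and leave the plain C loop as the default. The function name `add_sat_u8` is hypothetical and not taken from XFree86. `__SSE2__` is the macro GCC and Clang define when SSE2 is enabled; other compilers may need a different test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Saturating add of two byte spans: dst[i] = min(255, dst[i] + src[i]). */
static void add_sat_u8(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);

#if defined(__SSE2__)
    /* Arch-specific path: 16 bytes per iteration, unaligned loads. */
    for (; i + 16 <= n; i += 16) {
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(d, s));
    }
#endif

    /* Portable path: the whole span on other arches, and only the
     * leftover tail on SSE2 builds. */
    for (; i < n; i++) {
        unsigned sum = (unsigned)dst[i] + src[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

With this layout the optimized code is confined to one small `#if` block per function, and a build for an arch without the macro compiles exactly the generic loop.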
