On Sat, May 08, 2004 at 01:55:37PM +0900, Carsten Haitzler wrote:
> 
> > So after that long talk on python and evas, let's talk about my second project:
> > 
> > altivec optimizations for evas.
> 
> aaah all good - you should talk with nathan (rbdpgn) - he's done some altivec
> optimisations for evas already :)

Guess I missed this one on the first pass. My attention span for long messages
is a bit short these days.

> > I recently got a G4 iBook and wanted to test evas on it; after some
> > compiling problems, I got it to run and ... well, it's not bad, but I was
> > once more disappointed by the result (the 800MHz G4 in this computer
> > doesn't seem to be very powerful, be warned...). Well, the OpenGL version
> > of the evas test program ran far better (thanks to the Radeon 9200), but I
> > had long wanted to study the altivec instruction set a bit; I had found my
> > guinea pig.
> > So, inspired by the mmx functions, I began to code some functions using
> > altivec, learning from Apple's tutorials. I first looked at the top 5
> > that gprof gave me on evas_software_x11_test, and I got quite good
> > results, to give just a few: a 60% gain in blending pixels on pixels,
> > and a 33% gain in copying pixels (I also made an altivec-optimized version
> > for blending colors and colors with alpha). For now they're not perfect,
> > and surely need some corrections and enhancements, but I think I could
> > soon send them to this list...
> > I've got one little thing to say about blending, as the precision concern
> > was discussed on this list (about the division by 255): the mmx version
> > doesn't give at all the same result as the C one ("not at all" doesn't
> > mean they're totally different, but lots of results are off by 1), and
> > neither does my altivec version.

Awesome! This takes one item off my TODO list. I did a YUV->RGB
colorspace conversion routine a while back and intended to do the
alpha-blending code next.

If you've got some issues to iron out with it still, feel free to send
it my way and I'll do what I can to lend a hand.

> > So maybe a common algorithm should be decided on to avoid the effects that
> > could arise when lots of partially transparent layers get stacked (I
> > didn't test it, but it would be a good visual accuracy test); I read in
> > the evas manual that the software renderer is the reference for all the
> > engines regarding rendering quality. If the software engine gives
> > different results depending on the optimization activated... But I know
> > that algorithms for SIMD may be very different from the classical ones, so
> > I can't find a good solution.
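
[A minimal sketch of the stacked-layer accuracy test suggested above. It is
not code from evas: the function names are hypothetical, the "reference"
path assumes a truncating division by 255, and the "fast" path assumes the
common shift-by-8 shortcut, which may not be what the mmx or altivec
routines actually do.]

```c
#include <stdint.h>
#include <stdlib.h>

/* Reference blend of one 8-bit channel: truncating division by 255.
 * Assumption: this stands in for the plain C path, it is not copied
 * from evas. */
static uint8_t blend_div255(uint8_t src, uint8_t dst, uint8_t a)
{
    return (uint8_t)((src * a) / 255 + (dst * (255 - a)) / 255);
}

/* Fast blend: replaces /255 with >>8, a typical SIMD-friendly shortcut.
 * Assumption: illustrative only, not the actual mmx/altivec code. */
static uint8_t blend_shift8(uint8_t src, uint8_t dst, uint8_t a)
{
    return (uint8_t)(((src * a) >> 8) + ((dst * (255 - a)) >> 8));
}

/* Stack `layers` copies of the same translucent source over dst0 with
 * both blends and return how far the two results have drifted apart. */
static int stacked_drift(uint8_t src, uint8_t a, uint8_t dst0, int layers)
{
    uint8_t ref = dst0, fast = dst0;
    for (int i = 0; i < layers; i++) {
        ref = blend_div255(src, ref, a);
        fast = blend_shift8(src, fast, a);
    }
    return abs((int)ref - (int)fast);
}
```

Calling stacked_drift() over a range of alpha values and layer counts would
show whether the per-blend error stays bounded or grows with the number of
layers for a given approximation.
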
> 
> well in my experience, doing things with SIMD involves getting "off by 1" errors
> in return for speed. if we didn't "live with this" we wouldn't get the speedups
> we do. mmx2 and altivec can do better with 128bit registers - but we have no
> code to make use of that at the moment. i personally am willing to live with
> this minor error in return for the speedup. you CAN disable mmx, sse (and
> altivec) to get the raw C version which is pretty accurate :)
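
[A small sketch of where an "off by 1" can come from, assuming the fast
path replaces the division by 255 with a shift by 8. The helper names are
hypothetical, and the real mmx code in evas may use a different
approximation.]

```c
#include <stdint.h>

/* Scale channel c by alpha a with a truncating division by 255. */
static uint8_t mul_div255(uint8_t c, uint8_t a)
{
    return (uint8_t)((c * a) / 255);
}

/* Same scaling, but dividing by 256 with a shift (assumed SIMD shortcut). */
static uint8_t mul_shift8(uint8_t c, uint8_t a)
{
    return (uint8_t)((c * a) >> 8);
}

/* Largest difference between the two over every (c, a) pair. */
static int max_scale_diff(void)
{
    int worst = 0;
    for (int c = 0; c < 256; c++) {
        for (int a = 0; a < 256; a++) {
            int d = mul_div255((uint8_t)c, (uint8_t)a)
                  - mul_shift8((uint8_t)c, (uint8_t)a);
            if (d < 0) d = -d;
            if (d > worst) worst = d;
        }
    }
    return worst;
}
```

Since c*a never exceeds 65025, dividing by 256 instead of 255 can lower the
truncated result by at most 1, which matches the kind of error described
here: for c = a = 255 the division gives 255 and the shift gives 254.
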
> 
> > There are still some problems that make me not very happy with this
> > altivec version; these aren't related to the altivec optimizations, but to
> > Darwin, I think (oh, I forgot to mention that I'm working on MacOSX, not
> > Linux for PPC; but these optimizations should work there as well, as
> > they're not using Apple's library but gcc pseudo-C routines that produce
> > altivec assembly code).
> > The first problem is the memory consumption: I know that evas doesn't
> > leak memory, I tried every Apple tool for finding memory leaks, and I
> > didn't find anything, but the evas test app is *filling up all the
> > memory* !!! Not even

Depending on which version of MacOSX you're using, there is a utility
called Shark (built into Xcode now IIRC) that is quite useful for
identifying hot spots in your code. There is also MallocDebug, which is
quite nice for monitoring memory use. It might be useful to run the
test app through those.

As for the pseudo-C, there is a lib on Linux to provide the same
functionality. You have to #include <altivec.h> and link to -laltivec,
but it's supposed to be fairly compatible. That being said, I have not
tested the current altivec optimizations on Linux yet. I think we need
to add a couple configure checks for this to work. My hardware running
Linux is too old and would not benefit from altivec.
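
[A hedged sketch of what the pseudo-C style looks like for a pixel copy,
guarded so that it still builds where altivec is not available. The
function name copy_pixels is hypothetical and this is not the routine
discussed in this thread. It assumes the compiler defines __ALTIVEC__ when
altivec is enabled, which with GCC usually means building with -maltivec.]

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Copy n 32-bit pixels from src to dst. The vector path moves four
 * pixels per iteration; it is only taken when both pointers are
 * 16-byte aligned, because vec_ld/vec_st ignore the low address bits. */
static void copy_pixels(const uint32_t *src, uint32_t *dst, size_t n)
{
    size_t i = 0;
#ifdef __ALTIVEC__
    if ((((uintptr_t)src | (uintptr_t)dst) & 15) == 0) {
        for (; i + 4 <= n; i += 4) {
            vector unsigned int v =
                vec_ld(0, (const unsigned int *)(src + i));
            vec_st(v, 0, (unsigned int *)(dst + i));
        }
    }
#endif
    /* Scalar tail, and the whole copy when altivec is unavailable. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

A configure check along the lines mentioned above could then simply decide
whether the altivec flag gets passed to the compiler; the scalar loop keeps
the function usable either way.
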

> > graphic intensive, at least for the first part, and maybe that's why X
> > drawing routines come first, but this may be a problem once the others
> > are solved.
> 
> basically x uses up about 15-20% of the display pipeline doing copies from
> client memory to the framebuffer. that is about the fastest i have managed to
> get it. that's how it is on native x86 and native xservers.
> 
> > If someone has a linux ppc machine to test this on, and could tell me if
> > these problems arise on a native X display, it would be great (I plan to
> > install linux on mine one day, but for now it's at quite an early stage of
> > development, even if it already works on most of the recent G4s).

FYI, I saw a tremendous drop in performance under OS X at one point
about a year ago. After upgrading X, my evas test dropped from somewhere
around 1.0 down to around 0.500. I think fink may install some X libs
that are not optimal for the Apple X server.

Good luck!
Nathan

-- 
------------------------------------------------------------------------
| Nathan Ingersoll          \\  Computer Systems & Network Coordinator |
| [EMAIL PROTECTED]   \\  http://www.ruralcenter.org            |
| http://ningerso.atmos.org/  \\  Rural Health Resource Center         |
------------------------------------------------------------------------
