On Tue, 13 Dec 2005 01:03:13 -0500 Jose O Gonzalez <[EMAIL PROTECTED]> babbled:
> On Tue, 13 Dec 2005 12:42:46 +0900 Carsten writes:
>
> > On Mon, 12 Dec 2005 20:41:12 -0500 Jose O Gonzalez
> > <[EMAIL PROTECTED]> babbled:
> >
> > > Ummmm.... Yeah, definitely something amiss here :(
> >
> > yeah. that was a good start in debugging, finding the algorithm
> > issues, then noting they were in many places, then finally reverting.
> >
> > > I'll take a look at what's going on and get back to you on it.
>
> Ok. Let me address this first and then pre-mul alpha :)
>
> It's fairly clear what's going on -- the approximations used
> aren't good enough to keep us within the [0-255] cube we want, so
> every once in a while the rgb computations overflow and bump the
> alpha up by one.. if the alpha was 255, it'll bump it 'up' to 0 :(
>
> Fortunately, there are simple enough solution(s):
>
> 1. We can do the computations with the components separately,
>    as has been done so far... This is fairly painless, just replace
>    occurrences of the RGB_JOIN, ARGB_JOIN macros by the set of
>    corresponding component-wise macros...
>
> Or, we can keep the new approach, which is somewhat faster,
> by noting that:
>
> 2. Since the only case where we need to worry about the dst
>    alpha being overflowed is when it's 255, we can simply add one line
>    wherever there's a switch statement over dst alpha and the case
>    is 255.. namely we add the line
>        A_VAL(dst) = 255;
>    immediately after the RGB_JOIN macro that occurs there.
>
> This should solve the problem for rgba dst images.
>
> [ For rgb dst images, if we leave things as they are then
> their alphas may actually be changed, which is contrary to the
> spirit of things (the current code does this sometimes too), so
> if we want to be 'honest' then we should preserve whatever alpha
> the dst has.. and again we can first save its alpha and reset it,
> or whatever. ]
>
> I tried this just a bit ago and... the ecore_evas_test shaped-evas
> demo is back to being on par with the mmx version :)
> (you only need to do it for the pixel-pixel rgba to rgba blend func
> for the demo)
>
> Either way, for the time being, if you feel comfortable with
> the mmx versions (and I still haven't really tested those!), then
> either of these two approaches, for the c versions, would be a good
> idea to put back in :) (the new blend routines do give some *good*
> speedups in many cases, not just from the mmx dst-alpha routines).

yup. 2's complement overflow etc. :) for now let's stick to the reverted
code. i'd rather make a big leap later :)

> > sure. though i think what we need is to instead break out a small
> > routine set outside of evas for optimising on its own first. i have
> > a small routine set of more optimised pre-mul alpha c/mmx/sse/sse2
> > routines (sse3 provides nothing of use i can see). they all work on
> > x86 and amd64. it has a good speed and correctness test harness
> > (write out resulting png's of every op to see if it matches). :)
>
> Yeah, and that's pretty much what we've been doing :) I also
> wrote some c/mmx pre-mul alpha routines.. back when we first discussed
> it and such.. :)
>
> But moving *this* evas to pre-mul alpha is a big, *radical* change!
> Not only do all the blending/scaling/pixel-import, etc functions, have
> to be checked/changed/adapted, etc. that's just a lot of internals to
> be redone..
> But every program and lib that's been getting/setting image data is
> going to be either broken, or it's going to suffer a major performance
> hit!

sure.. BUT a lot of things don't set pixel values externally - and then a
lot don't set ALPHA too :) yes. it'll be a break. this will come though as
i want to introduce colorspace support to set yuv, rgb, rgba, alpha mask
only etc. data for an image. thus premultiplied alpha changes would bring
these small api changes (ie add a way to set/get the colorspace).
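To make the alpha overflow Jose describes concrete, here is a minimal sketch in C. It assumes a 32-bit pixel with alpha in the top byte and rgb below it; the function names are hypothetical and are not the evas RGB_JOIN / A_VAL macros. It only shows how a carry out of the top colour byte lands in alpha, and how forcing alpha back to 255 (option 2) hides it for an opaque dst.

```c
#include <stdint.h>

/* Hypothetical helper: extract the alpha byte of a 0xAARRGGBB pixel. */
static uint8_t alpha_of(uint32_t p)
{
    return (uint8_t)(p >> 24);
}

/* Add an rgb increment to a pixel with one 32-bit add, as a packed
 * (joined) computation would. If the red byte overshoots 255, the carry
 * spills into the alpha byte; an alpha of 255 then wraps around to 0. */
static uint32_t add_rgb_packed(uint32_t dst, uint32_t rgb)
{
    return dst + (rgb & 0x00ffffffu);
}

/* Same packed add for the "dst alpha is 255" case, followed by the
 * equivalent of A_VAL(dst) = 255. Only alpha is repaired here; the
 * colour byte that overshot still holds its wrapped value. */
static uint32_t add_rgb_packed_opaque(uint32_t dst, uint32_t rgb)
{
    uint32_t out = dst + (rgb & 0x00ffffffu);
    return out | 0xff000000u;
}
```

With an opaque dst whose red is already 255, an overshoot of one in red turns alpha into 0 in the first function and leaves it at 255 in the second. Option 1 (component-wise computation) avoids the carry altogether, at the cost of handling each channel separately.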

> > i actually get some interesting performance profiles where speedups
> > vary. on amd64 sse2 is faster than sse by a bit (for copies) but on
> > my p4 sse is faster than sse2 - by a bigger margin. in the end i
> > imagine we will need a runtime routine check to see which is faster
> > on that particular cpu (likely cache results unless cpu changes).
>
> You're a nut!!
>
> But yes, I think a formal test/performance suite is an excellent
> idea.. clearly :)

indeed. then we can check correctness, speed etc. in a nice isolated test
suite. :) for now the only things i have done are copy (blit) and alpha
blend (with or without destination alpha). for C, mmx, sse and sse2 i have
done:

done:
* solid pixel copy forwards
* pixel blend
* pixel blend dst alpha

to do:
* solid pixel copy backwards
* solid color copy
* solid color blend
* solid color blend dst alpha
* color mul pixel copy
* color mul pixel blend
* color mul pixel blend dst alpha
* alpha mask color copy
* alpha mask color blend
* alpha mask color blend dst alpha
* alpha mask mul pixel copy
* alpha mask mul pixel blend
* alpha mask mul pixel blend dst alpha
* pixel argb mask mul pixel copy
* pixel argb mask mul pixel blend
* pixel argb mask mul pixel blend dst alpha
* yuv(yv12) to rgb
* yuva(yv12+a plane) to argb
* scale image (nearest)
* scale image (filtered)
* scale & rotate (transform) image non-repeat (nearest)
* scale & rotate (transform) image repeat (nearest)
* scale & rotate (transform) image non-repeat (smooth)
* scale & rotate (transform) image repeat (smooth)
* pixel box filter blur copy
* pixel gaussian blur copy
* alpha mask box filter blur copy
* alpha mask gaussian blur copy

ok - so why do all of these separately? build a set of really really
really really fast routines to build "evas 2" software on top of. some of
these listed are not in current evas. i think we can remove the cmod
routines as they aren't used and are just a pain.
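
As a rough illustration of the kind of isolated correctness check discussed above, here is a small C sketch of a pre-mul alpha pixel blend. It is not taken from the routine set mentioned in this thread: all names are hypothetical, the pixel layout is assumed to be 32-bit ARGB with alpha in the top byte, and the division-free multiply is the widely used shift-and-add rounding trick, checked exhaustively against a plain division.

```c
#include <stdint.h>

/* Reference: a * b / 255, rounded to nearest, using a real division. */
static uint32_t mul255_ref(uint32_t a, uint32_t b)
{
    return (a * b + 127u) / 255u;
}

/* Division-free variant for a, b in [0, 255]. */
static uint32_t mul255_fast(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128u;
    return (t + (t >> 8)) >> 8;
}

/* Blend a pre-multiplied src pixel over dst, channel by channel:
 *   out = src + dst * (255 - src_alpha) / 255
 * Alpha is treated like any other channel. */
static uint32_t blend_premul(uint32_t src, uint32_t dst)
{
    uint32_t inv = 255u - (src >> 24);
    uint32_t out = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xffu;
        uint32_t d = (dst >> shift) & 0xffu;
        uint32_t c = s + mul255_fast(d, inv);

        if (c > 255u)
            c = 255u; /* guard against src that is not truly pre-multiplied */
        out |= c << shift;
    }
    return out;
}

/* Tiny correctness harness: count the inputs where the fast multiply
 * disagrees with the reference over the whole 8-bit x 8-bit range. */
static unsigned count_mul255_mismatches(void)
{
    unsigned bad = 0;
    uint32_t a, b;

    for (a = 0; a < 256; a++)
        for (b = 0; b < 256; b++)
            if (mul255_ref(a, b) != mul255_fast(a, b))
                bad++;
    return bad;
}
```

Working per channel keeps every intermediate value inside its own byte, so nothing can carry into a neighbouring channel. An mmx/sse version of the same blend could be checked against blend_premul in the same way as the multiply is checked here.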

some of the above (yuv->rgb) exist already in highly optimised format -
it'll be hard to beat them - and an altivec one to boot.

some other routines i want to create massively optimal subsystems for:

* detecting blit regions (this means making a very fast rect list region
  implementation that merges rects quickly on the fly and can do boolean
  logic (set, get, union, intersection, difference/cut) WITH motion vector
  tags).
* better gradient fills (jose - you have this well in hand)

this combined with the basic routines as above should be enough to
implement much more, like arbitrary clipping and in-canvas blur filters
(filter objects to blur anything they "filter" like clip objects filter
anything they clip).

we should put these all into an external test harness and make it work,
then work on merging it in later. :)

--
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    [EMAIL PROTECTED]
裸好多
Tokyo, Japan (東京 日本)

_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel