On Thu, 13 Nov 2008 18:20:31 +0000 Chris Lord <[EMAIL PROTECTED]> babbled:
> This clearly isn't a fair test - it's apples to apples, but Clutter
> isn't an apple, it's more like a 10-course banquet. Also, comparing a
> dual-core 3ghz machine vs 8600GT probably isn't of the same order as,
> say, an Atom 1.6ghz vs. a GM965-based GPU either. You don't mention
> anything about RAM or video RAM either, which are two important factors,
> given that a lot of your tests will probably be restricted by how fast
> memory can be read/written.

clutter doesn't have a comparison you can make - not until you write a properly optimised software rasteriser for it to compare against. evas has a software engine as well as gl etc. etc. and a 3ghz core 2 desktop cpu vs the nv8600gt desktop gpu is very fair - same ballpark in power consumption and target. right now atom is a very immature cpu.

> Also, Matthew's '100x' comment was clearly meant to just give the
> general impression 'much faster'. Even if it were just, say, 10x faster,
> that still means it can do something 10 times that you could do only
> once in software in the same time, and it can do it without lumbering
> the CPU too.

sure - it is faster - or can be. but you then pay a price by setting a hardware bar below which the software just won't work at all (is entirely unusable). that is what some of the initial comments that started this thread were about - they'd like something that works on their non-gpu hardware. i don't much care what intel pushes/uses. you make your own hardware. but if you want moblin to be something more general that works across a wider breadth of hardware, then you'll want to take the non-gpu world seriously.

as i said - i have clients just this past week pushing their way into a lower and lower performance envelope where the only thing they have is a dumb fb and a cpu. they are cutting costs and silicon space. they want something that works on their high-end gpu-carrying socs as well as their low end, and if they can get bling on both they are most happy. but as i said - this is simply a world clutter doesn't care about. gpu or nothing. that's a design choice and is fair enough. what is not fair is characterising software rendering as so incredibly slow compared to "hardware" that it's unusable, which is entirely NOT the case. empirically it is otherwise - numbers and actual usage show it.

> That aside, I'm a little suspicious of the numbers too;
>
> > let me quote some "interesting results".
> >
> > 1. alpha blending a whole bunch of non-scaled images (1:1) gl ONLY
> > managed to be 4x faster than software... not 100's of times.
>
> I outright just don't believe this. Alpha-blending is inherently a slow
> operation and you'd definitely see a larger speed up doing this in
> hardware, unless your test is very limited, or you're taking some
> quality-affecting or constraining short-cuts.

incorrect. in fact the alpha blending runs at FSB speed. the software engine works like SLI - it splits rendering between as many cpus+cores as you have. i have sat down and benchmarked this in gory detail. a dual-thread blend routine does not beat a single-threaded one - which surprised me, until i did the math. it was FSB limited. the cpu couldn't read/write memory any faster. believe it or not, i've learnt to think about my routines and write them with some level of optimisation in mind. i care about my cycle usage.

the test is apples vs apples. it's a 1:1 scaled image, so neither gpu nor cpu need to do any scale calculations. on the gl side it's an ARGB texture mapped onto a quad and drawn 128 times per frame in various positions into the backbuffer, then swapped to the front each frame. in software it's the exact same test with the exact same pixel output - ARGB32 pixels (same image) alpha blended onto an ARGB32 destination tmp buffer and finally copied to the screen per frame. image size is the same, image pixel positions are the same, colorspace is the same (both use ARGB32 premul). both use non-alpha destination buffers and both target an x window for output. fyi the alpha blender is mmx/sse and, as i mentioned above, is limited by memory access speed :)
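to make the "memory bound" point concrete, here is roughly the shape of a premultiplied ARGB32 blend loop in plain c. this is a simplified sketch for illustration only - the function names are made up and it is not the actual evas mmx/sse code. per pixel it's one src read, one dst read, a multiply-add and one dst write, so once the FSB is saturated a second thread (or a much faster cpu) buys you nothing:

#include <stdint.h>

/* sketch of a premultiplied ARGB32 "over" blend - illustrative only */
static inline uint32_t
blend_premul(uint32_t s, uint32_t d)
{
   uint32_t ia = 256 - (s >> 24); /* inverse source alpha, 0..256 */
   uint32_t rb = (((d & 0x00ff00ff) * ia) >> 8) & 0x00ff00ff;
   uint32_t ag = (((d >> 8) & 0x00ff00ff) * ia) & 0xff00ff00;
   return s + (rb | ag); /* dst = src + dst * (1 - src_alpha) */
}

void
blend_image(uint32_t *dst, int dst_stride, const uint32_t *src,
            int src_stride, int w, int h)
{
   /* ~12 bytes of memory traffic per pixel (src read, dst read, dst
    * write) and almost no alu work - this is why it ends up limited by
    * how fast memory can be read/written, not by the cpu core */
   for (int y = 0; y < h; y++)
     {
        uint32_t *d = dst + (y * dst_stride);
        const uint32_t *s = src + (y * src_stride);
        for (int x = 0; x < w; x++) d[x] = blend_premul(s[x], d[x]);
     }
}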
> Could you elaborate on what this test actually does? Perhaps you only
> see a 4x speed up if you blit alpha-blended rectangles, one at a time,
> per rectangle... Try alpha-blending 20 reasonable-sized (say 256x256)
> textures, on a reasonable-sized (say 1024x768) buffer, with varying
> alpha over a background image. I'd be hard-pushed to believe, even with
> your skewed setup, that you'd only see a 4x speed-up.

the images are 120x160 and 128 of them are drawn per frame. the output window is 720x420. so it's reasonable. the images have varying alpha (it's the E logo with a soft shadow, rendered in 3d with anti-aliasing). it's not skewed. it's very much fair.

> > 2. at BEST when doing "smooth scaling" (that means for GL GL_LINEAR vs
> > full super/sub sampling in software (which is much higher quality
> > especially on down-scale) gl manages at BEST to be 30x faster than
> > software. different algorithms here so software is at a major
> > disadvantage due to its higher quality.
>
> Well, 30x is quite a lot, I'm not sure why you say it as if it isn't a
> big deal - if you're doing basic UI things, why do you even need
> something better than bi-linear filtering? Saying software is at a
> disadvantage because it's higher quality is silly. Either compare it
> with anisotropic filtering, or write a fast bilinear software filter. I
> think you'll find that the hardware does this a lot better.

it's not an apples to apples comparison. when downscaling, gl just samples points - software literally reads the entire source region and computes from it. example: i scale a 1000x1000 image down to 10x10. gl with GL_LINEAR will choose at most 4 points per output pixel (and there are only 100 output pixels) and interpolate. software will sample a 100x100 region of the source per output pixel, compute a sum, average it and use that. the output quality is massively different - it's very noticeable when scaling down icons and other detailed images. the software scaler could be simplified and gain a significant speedup (several times) if such super-sampling were dropped. imagine i threw anisotropic filtering at gl at, let's say, a level of 100 (which i haven't even seen supported anywhere) - then you'd have something more comparable. gl gets a speedup here due to a quality drop. i know the quality difference - it is noticeable and i prefer the full sampling.
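to show what i mean by "reads the entire source region", here's a rough sketch of a full box-average downscale in plain c. it's illustrative only (made-up function name, assumes dw <= sw and dh <= sh), not the actual evas scaler:

#include <stdint.h>

/* downscale an ARGB32 image from sw x sh to dw x dh by averaging every
 * source pixel under each output pixel - sketch only */
void
downscale_box(const uint32_t *src, int sw, int sh,
              uint32_t *dst, int dw, int dh)
{
   for (int dy = 0; dy < dh; dy++)
     {
        int sy0 = (dy * sh) / dh, sy1 = ((dy + 1) * sh) / dh;
        for (int dx = 0; dx < dw; dx++)
          {
             int sx0 = (dx * sw) / dw, sx1 = ((dx + 1) * sw) / dw;
             uint32_t a = 0, r = 0, g = 0, b = 0, n = 0;
             /* for a 1000x1000 -> 10x10 scale this inner region is
              * 100x100 source reads per output pixel, vs the ~4 texel
              * reads GL_LINEAR does */
             for (int sy = sy0; sy < sy1; sy++)
               for (int sx = sx0; sx < sx1; sx++)
                 {
                    uint32_t p = src[(sy * sw) + sx];
                    a += p >> 24;         r += (p >> 16) & 0xff;
                    g += (p >> 8) & 0xff; b += p & 0xff;
                    n++;
                 }
             dst[(dy * dw) + dx] = ((a / n) << 24) | ((r / n) << 16) |
                                   ((g / n) << 8) | (b / n);
          }
     }
}

cut that inner loop down to a fixed 4 reads per output pixel and you have roughly what GL_LINEAR does - that's where the quality (and a big chunk of the 30x) goes.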
> > 3. apples vs apples... scaling an image with "GL_NEAREST" in software
> > and gl... gl is 5.8 times faster.
>
> Yes, but scaling a 2d, axis-aligned texture using nearest-neighbour
> scaling looks AWFUL and you'd never do this. This test is pointless.

same as above with downscaling, except there you didn't seem to care about it being awful. :) also, depending on your source data, nearest-neighbour can be perfectly nice - in some cases the results are visually BETTER than linear/bi-linear etc. interpolation. i use it in specific cases as a speedup when i know the image data will not suffer from being scaled this way.

> > 4. software is actually FASTER for alpha blending a whole bunch of
> > rectangles - 25% faster. gl uses just a quad. software is at an
> > advantage - it's smarter and can calculate deltas better
>
> I can believe this, for your setup, but only because you have a very
> powerful CPU and probably have very fast memory. Do you still get a 25%
> faster result if you count the time it takes to copy the buffer from
> system memory to video memory though? Also, 25% faster than what, a
> microsecond? I'd contend that alpha blending rectangles isn't that
> common an operation, vs. say scaling, rotation, or alpha-blending a
> texture, and it happens so fast using either method, that quoting a 25%
> speed gain isn't actually as big a deal as it sounds.

yes, ALL of the above tests INCLUDE display. they include getting the pixels to the window so they're visible, on both sides. gl has the major advantage here, as its renderings are already in video ram and display for it is a much faster path than the copy from the system-ram ARGB buffer up to video ram. despite this, software wins :)

> > 5. gl shows overhead on having to upload ARGB pixel textures - software
> > manages to do the argb data change test about 25% faster than gl (that
> > includes also drawing the image to the screen after upload of new data).
>
> I'm not sure what this test is? If you mean it takes longer to upload
> ARGB data to video memory than it does to system memory, I'm surprised
> the result is only that much faster. I guess that's a good thing though,
> it's always good to find out that system->video memory bandwidth isn't
> as big an issue as you think it is :)

both tests involve writing new ARGB pixel data (simple fixed values in a loop), which is of course done in software. the software engine avoids an "upload to texture" step because you write directly into the image data itself. the gl engine needs to upload the texture data. then this data is rendered on both sides (gl vs software) and displayed. so in fact in both cases data needs uploading - in fact the SAME data. software has to upload its resultant rendered buffer; gl has to upload the texture source data. it just happens at different pipeline stages (one at the end, the other at the start). despite the fact both have to upload - in this case the exact same amount of data - software wins.
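the per-frame work in that test looks roughly like this. the two gl calls are standard gl api; everything else (function names, the fill pattern) is made-up illustration, not the actual evas engine code:

#include <stdint.h>
#include <GL/gl.h>

void
update_software(uint32_t *image_data, int w, int h, int frame)
{
   /* software: write new pixel values straight into the image's own
    * buffer - no separate upload step before rendering. the "upload"
    * cost is paid later, once, when the rendered frame goes to the
    * window. */
   for (int i = 0; i < w * h; i++)
     image_data[i] = 0xff000000 | (uint32_t)((i + frame) & 0xffffff);
}

void
update_gl(GLuint tex, uint32_t *staging, int w, int h, int frame)
{
   /* gl: write the same pixel values into a system-ram staging
    * buffer... */
   for (int i = 0; i < w * h; i++)
     staging[i] = 0xff000000 | (uint32_t)((i + frame) & 0xffffff);
   /* ...then upload the whole thing into the (already allocated)
    * texture before it can be drawn. GL_BGRA/GL_UNSIGNED_BYTE matches
    * ARGB32 on little-endian. */
   glBindTexture(GL_TEXTURE_2D, tex);
   glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                   GL_BGRA, GL_UNSIGNED_BYTE, staging);
}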
> > i'm just saying this above as i believe that there are bogus facts
> > being floated about (efl starves your cpu - incorrect. it uses just as
> > little/much as clutter would.. if you use the gl engine or any other
> > accelerated back end - xrender (and the acceleration is working) etc.)
> > and the numbers on "gl is 100x faster" i would say is entirely bogus.
> > i am pitting a very very very fast gpu against a very fast laptop cpu.
> > and the cpu is doing a stellar job of keeping up... considering. and
> > both engines can be optimised - i know just what i could do to improve
> > them, and i don't see that your numbers will change so drastically if
> > i did it on both ends. :)
>
> Yes, if you use EFL's GL engine you might get the same performance as
> Clutter, at the expense of an API that doesn't integrate as well with
> the rest of the Moblin libraries, and is arguably not as nice to use
> (but then, I would say this being a GTK and Clutter hacker, so take this
> with a bowl of salt).

you are right - and i'd never argue there. if you want to live in a gtk and gnome world, clutter integrates nicely. it's "g-compliant". in our world we have simply created the technology we need where comparable technology just doesn't exist, or is inadequate, or comes with baggage we don't want. we have recycled where appropriate and useful. yes, evas is more limited in functionality - though as with all things, that is not a static thing. it has served us quite well so far and will expand - don't worry. it's not a fixed "will forever do only what it currently does" thing.

don't get me wrong. i am not bagging clutter, nor do i want to change any decision to use clutter in moblin. i just want to correct false and misleading statements of "fact" - for example that "efl cpu-starves you", which is not true (if you use the gl engine). if clutter is on non-gpu hardware, it won't just cpu-starve you - compared to efl it will be like watching continental drift. so you need to be factual and accurate. the above tests are apples vs apples. i am not trying to make gl look bad! i use it for my media center, for example - i get slightly better framerates for hdtv (1920x1080) video than with software :) i know what it can do and how it does it. but the speedup numbers compared to well-engineered software are not so great. they are better, but you are not talking about such massive factors.

of course there is a major shift happening in the gpu world these days - just look at larrabee. that's not a gpu! it's a multi-core cpu with some extra gpu-helpful instructions. something like evas's software engine would actually run nicely on it, as it's already multi-core (as above - if it saw 16 or 32 cores, it'd split the rendering between them just like your gpu splits your texel calculation/output between its multiple pipelines). inside, the future of gpus is software - it's just got more direct access to pixel input and output data, as well as a lot of cores.
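the "split the rendering" bit is nothing exotic, by the way - conceptually each core just takes a horizontal band of the destination. a toy pthreads sketch (made-up names, not the actual evas pipeline code):

#include <pthread.h>
#include <stdint.h>

typedef struct
{
   uint32_t *dst;
   int w, y0, y1;
   void (*render_band)(uint32_t *dst, int w, int y0, int y1);
} Band;

static void *
band_thread(void *data)
{
   Band *b = data;
   b->render_band(b->dst, b->w, b->y0, b->y1); /* render this band only */
   return NULL;
}

/* split a frame into ncores horizontal bands and render them in
 * parallel - as long as the work is cpu-bound (scaling, filtering) it
 * scales with cores; a pure blend stops scaling once memory bandwidth
 * is saturated, as described above */
void
render_threaded(uint32_t *dst, int w, int h, int ncores,
                void (*render_band)(uint32_t *, int, int, int))
{
   pthread_t th[ncores];
   Band bands[ncores];
   for (int i = 0; i < ncores; i++)
     {
        bands[i] = (Band){ dst, w, (h * i) / ncores,
                           (h * (i + 1)) / ncores, render_band };
        pthread_create(&th[i], NULL, band_thread, &bands[i]);
     }
   for (int i = 0; i < ncores; i++) pthread_join(th[i], NULL);
}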
--
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    [EMAIL PROTECTED]