On Thu, 17 Jan 2013 13:11:47 -0200 Leandro Pereira <lean...@profusion.mobi> said:
> Hey,
>
> Here are some numbers that show that the async renderer is actually
> saving us time. These numbers can be obtained by yourself, by building
> Evas with EVAS_RENDER_DEBUG_TIMING defined and setting/unsetting the
> ECORE_EVAS_FORCE_SYNC_RENDER environment variable.
>
> This number is the measure, in ms, of the time spent during the Evas
> rendering function. The average value is calculated every 100 calls.
>
> One can clearly see that the asynchronous renderer frees up a lot of
> time in the main thread, in some cases using 10% of the time the
> synchronous renderer needed:
>
> $ export ECORE_EVAS_FORCE_SYNC_RENDER=1
> $ elementary_test Animation
> *** sync render: avg 2.978395ms min 0.947041ms max 30.379951ms
> *** sync render: avg 1.957078ms min 0.474035ms max 4.538885ms
> *** sync render: avg 1.872930ms min 0.760945ms max 4.978016ms
> *** sync render: avg 1.724530ms min 0.906000ms max 3.591879ms
> *** sync render: avg 1.902350ms min 0.627998ms max 14.931986ms
> *** sync render: avg 1.692699ms min 0.940035ms max 3.616943ms
> *** sync render: avg 1.754144ms min 0.834012ms max 5.288051ms
> *** sync render: avg 1.699393ms min 0.899117ms max 3.489920ms
> *** sync render: avg 1.887736ms min 0.519943ms max 9.911051ms
> *** sync render: avg 1.775922ms min 0.989104ms max 3.373019ms
> *** sync render: avg 1.712771ms min 0.714912ms max 4.344922ms
> *** sync render: avg 1.752106ms min 0.918924ms max 4.158897ms
> *** sync render: avg 1.754684ms min 0.486945ms max 3.843055ms
> *** sync render: avg 1.853960ms min 0.900002ms max 7.829068ms
> *** sync render: avg 2.597078ms min 0.549895ms max 25.936982ms
> ^C
> $ unset ECORE_EVAS_FORCE_SYNC_RENDER
> $ elementary_test Animation
> *** async render: avg 1.705852ms min 0.105016ms max 32.549012ms
> *** async render: avg 0.687507ms min 0.153924ms max 1.679043ms
> *** async render: avg 0.364832ms min 0.146887ms max 1.703969ms
> *** async render: avg 0.369488ms min 0.175889ms max 1.931037ms
> *** async render: avg 0.367418ms min 0.151098ms max 2.927121ms
> *** async render: avg 0.351177ms min 0.156086ms max 1.636885ms
> *** async render: avg 0.353286ms min 0.163990ms max 1.528031ms
> *** async render: avg 0.338158ms min 0.126063ms max 1.547922ms
> *** async render: avg 0.355310ms min 0.168943ms max 1.684109ms
> *** async render: avg 0.344575ms min 0.149928ms max 1.368967ms
> *** async render: avg 0.351807ms min 0.101041ms max 1.538111ms
> *** async render: avg 0.356258ms min 0.113084ms max 1.723979ms
> *** async render: avg 0.357304ms min 0.174066ms max 1.776023ms
> *** async render: avg 0.339344ms min 0.165109ms max 1.570926ms
> ^C
>
> Less dramatic but still interesting results are visible with Terminology
> running a redraw-intensive program such as the `aafire` demo from aalib:
>
> $ unset ECORE_EVAS_FORCE_SYNC_RENDER
> $ terminology -e aafire -driver curses
> *** async render: avg 1.949910ms min 0.320094ms max 6.811932ms
> *** async render: avg 2.314617ms min 0.566074ms max 8.457115ms
> *** async render: avg 2.440161ms min 0.544033ms max 8.315109ms
> *** async render: avg 2.503508ms min 0.610092ms max 8.186047ms
> *** async render: avg 2.339103ms min 0.586006ms max 9.791975ms
> *** async render: avg 1.971868ms min 0.570879ms max 9.002959ms
> *** async render: avg 2.698837ms min 0.558963ms max 11.941055ms
> *** async render: avg 2.236830ms min 0.541988ms max 11.201896ms
> ^C
> $ export ECORE_EVAS_FORCE_SYNC_RENDER=1
> $ terminology -e aafire -driver curses
> *** sync render: avg 4.777478ms min 0.809918ms max 12.482078ms
> *** sync render: avg 5.043696ms min 2.887004ms max 11.873881ms
> *** sync render: avg 4.810856ms min 2.514057ms max 11.527033ms
> *** sync render: avg 4.873759ms min 2.718049ms max 9.166908ms
> *** sync render: avg 5.406739ms min 2.945019ms max 28.283098ms
> *** sync render: avg 5.224728ms min 2.833047ms max 12.698107ms
> *** sync render: avg 5.378743ms min 2.787025ms max 13.941910ms
> *** sync render: avg 4.993303ms min 2.939022ms max 11.360896ms
> *** sync render: avg 5.065144ms min 2.356000ms max 13.610107ms
> ^C

argh! you confuzesedz me by ordering them differently per test.. had to look
closely. :)

i see you have found a good test/measuring stick to show how good async render
is... :) ie time in evas_render() itself should be a fair bit less on average
and spikes are hopefully reduced a lot. basically we have 16.6ms to generate a
frame and get it to the screen. what we now have is much less dependence on the
app using minimal/no time in the mainloop, AND the frame generation logic uses
up less time from the mainloop. for trivial examples (animation unfortunately
is one) we actually shouldn't see much gain as time spent in "mainloop" is not
much anyway... but for more involved apps this will basically not result in
more fps really.. or not much.. but it will result in SMOOTHER fps... LESS
frame-drops. :) this is good. nicely done.

from here we now need to continue down this path:

1. try to farm out more and more of what is in the inline render func into
   async threads.
2. make more threads. thread 1 feeds to thread 2. this requires breaking up the
   rest of rendering into well defined stages. this simply means we can overlap
   more and more (there will come a point where it's just not worth it anymore
   though - but i think we can still get gains if we break it up into maybe 3
   or so stages/pipelines instead of just 1).

- example:
  1. prepare thread: this loads images that WILL be needed. it pre-scales image
     data that WILL be needed into buffers for when such scaled data is
     expensive to compute. it could upload textures in gl, pre-compute clipout
     rect lists too.
     note - #1 may be able to be threaded further - eg the actual smooth
     scaling routines could work on several images at once. i'd leave this up
     to an implementation detail at this stage.
  2. actually "do the work". blend/blit/generate vertices - manage the gl pipes
     etc.
     for software rendering in theory we could thread #2 further too. since we
     now have mostly pre-scaled data, we are mostly copying or blending. for
     copying we may be memory bound, but blending "depends" so another thread
     or 2 may work here.
  3. "swap" (get data to the screen either via copies, or do buffer swaps which
     sometimes may block on "swapbuffers" when it tries to vsync). this does
     lead to a slight problem with software and xshmputimage... we'd be putting
     FROM a thread. that means x needs to be thread-ready. for xlib that means
     ecore_x HAS to do XInitThreads(). ecore_evas needs to know if this has
     been done or not, and then needs to indicate to evas that it can punt this
     off to a thread. this creates another nasty - order of operation kind of
     gets a bit futzed around... but i think we can manage this.

this can further break down to things like making scalecache not a cache that
is part of the rendering pipeline, but a whole specialized scale data
preparation stage. we also realistically want to make a saner shared cache
mechanism that can cover more of these cases too :) kind of a prerequisite for
implementing the rest actually. also simplifies things a bit given the async
setup now.

we ALSO need to do this for gl as we don't use mipmaps (due to being 2d we
can't rely on regular mipmaps - they consume about 33% more memory, but we need
full blown generic mipmaps and this means we actually have to "generate as you
go"). so reality is we need a similar scalecache setup for gl as we have for
software, just the REASONING and "should i cache it or not" points are vastly
different. we get smooth scaling up (and "down" to 1/2 the width and height)
for free in gl. the problem is with extremes. eg 4000x3000 digital photo scaled
down to 800x600 - BUT it's on a portrait screen where we only SEE 320x600 of
the image.. so why keep 600x600 of it around? how does evas figure this out?
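
To put rough numbers on the memory side of this, here is a small C sketch. It assumes 4 bytes per pixel (RGBA8888), which the mail does not state, and the names `tex_bytes` and `tex_bytes_mipmapped` are made up for illustration; none of this is Evas code. It only shows where the "about 33% more" for a regular mipmap chain comes from and how far apart the full photo and the visible crop are.

```c
#include <stddef.h>

/* Bytes for a w x h texture, ASSUMING 4 bytes per pixel (RGBA8888). */
size_t tex_bytes(size_t w, size_t h)
{
    return w * h * 4;
}

/* Bytes for the same texture plus a full regular mipmap chain: each level
 * halves both dimensions (rounding down, never below 1) until 1x1. */
size_t tex_bytes_mipmapped(size_t w, size_t h)
{
    size_t total = tex_bytes(w, h);

    while (w > 1 || h > 1) {
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
        total += tex_bytes(w, h);
    }
    return total;
}
```

Under that assumption `tex_bytes(4000, 3000)` is 48000000 bytes for the full photo, `tex_bytes(800, 600)` is 1920000 for the scaled copy, and `tex_bytes(320, 600)` is 768000 for the part that is actually seen. For a 1024x1024 texture the mipmapped total is 5592404 bytes against 4194304 for the base level, i.e. roughly a third extra.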
at what point does it crop and scale to save texture memory and memory
bandwidth, instead of always scaling it down on the fly? thus why i think we
need a special prepare stage - each engine implements its own as they have
different reasons and "switch points". :)

> Cheers,
> Leandro
>
> _______________________________________________
> enlightenment-devel mailing list
> enlightenment-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/enlightenment-devel

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    ras...@rasterman.com
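
For reference, the "thread 1 feeds to thread 2" idea from the mail (prepare, then do the work, then swap) can be sketched with plain pthreads. This is only an illustration under stated assumptions: `Queue`, `Stage`, `stage_run` and `run_pipeline` are hypothetical names, frames are just integer ids, and the per-stage work is left as a comment. It is not how Evas implements its async renderer.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 4
#define STOP (-1)   /* sentinel frame id telling a stage to shut down */

/* Bounded blocking queue of frame ids, handing work from one stage to the next. */
typedef struct {
    int items[QUEUE_CAP];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} Queue;

void queue_init(Queue *q)
{
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

void queue_push(Queue *q, int frame)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)            /* producer waits if consumer lags */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = frame;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

int queue_pop(Queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int frame = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return frame;
}

/* One pipeline stage: pop a frame, work on it, pass it downstream. */
typedef struct {
    Queue *in, *out;   /* out is NULL for the last stage */
    int processed;     /* number of real frames this stage handled */
} Stage;

void *stage_run(void *arg)
{
    Stage *s = arg;
    for (;;) {
        int frame = queue_pop(s->in);
        if (frame != STOP)
            s->processed++;   /* real prepare / draw / swap work would go here */
        if (s->out)
            queue_push(s->out, frame);   /* STOP is forwarded too */
        if (frame == STOP)
            return NULL;
    }
}

/* Push nframes through prepare -> draw -> swap; returns frames that reached swap. */
int run_pipeline(int nframes)
{
    Queue q[3];
    Stage st[3];
    pthread_t th[3];

    for (int i = 0; i < 3; i++)
        queue_init(&q[i]);
    for (int i = 0; i < 3; i++) {
        st[i] = (Stage){ &q[i], i < 2 ? &q[i + 1] : NULL, 0 };
        pthread_create(&th[i], NULL, stage_run, &st[i]);
    }
    for (int f = 0; f < nframes; f++)   /* stands in for the main loop */
        queue_push(&q[0], f);
    queue_push(&q[0], STOP);
    for (int i = 0; i < 3; i++)
        pthread_join(th[i], NULL);
    for (int i = 0; i < 3; i++) {
        pthread_mutex_destroy(&q[i].lock);
        pthread_cond_destroy(&q[i].not_empty);
        pthread_cond_destroy(&q[i].not_full);
    }
    return st[2].processed;
}
```

Calling `run_pipeline(10)` pushes ten frame ids through the three stages and returns how many reached the last one. The bounded queues are a design choice of this sketch: they let the stages overlap while stopping the main loop from running arbitrarily far ahead of the swap stage. Build with `-pthread`.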