Re: [E-devel] Evas Sync vs Async: numbers!

The Rasterman Thu, 17 Jan 2013 18:59:50 -0800

On Thu, 17 Jan 2013 13:11:47 -0200 Leandro Pereira <lean...@profusion.mobi>
said:


> Hey,
> 
> Here are some numbers that show that the async renderer is actually
> saving us time. You can reproduce them yourself by building Evas with
> EVAS_RENDER_DEBUG_TIMING defined and setting/unsetting the
> ECORE_EVAS_FORCE_SYNC_RENDER environment variable.
> 
> Each number is the time, in ms, spent in the Evas rendering function.
> The average value is calculated every 100 calls.
> 
> One can clearly see that the asynchronous renderer frees up a lot of 
> time in the main thread, in some cases using 10% of the time the 
> synchronous renderer needed:
> 
> ⮀ export ECORE_EVAS_FORCE_SYNC_RENDER=1
> ⮀ elementary_test Animation
> *** sync render: avg 2.978395ms min 0.947041ms max 30.379951ms
> *** sync render: avg 1.957078ms min 0.474035ms max 4.538885ms
> *** sync render: avg 1.872930ms min 0.760945ms max 4.978016ms
> *** sync render: avg 1.724530ms min 0.906000ms max 3.591879ms
> *** sync render: avg 1.902350ms min 0.627998ms max 14.931986ms
> *** sync render: avg 1.692699ms min 0.940035ms max 3.616943ms
> *** sync render: avg 1.754144ms min 0.834012ms max 5.288051ms
> *** sync render: avg 1.699393ms min 0.899117ms max 3.489920ms
> *** sync render: avg 1.887736ms min 0.519943ms max 9.911051ms
> *** sync render: avg 1.775922ms min 0.989104ms max 3.373019ms
> *** sync render: avg 1.712771ms min 0.714912ms max 4.344922ms
> *** sync render: avg 1.752106ms min 0.918924ms max 4.158897ms
> *** sync render: avg 1.754684ms min 0.486945ms max 3.843055ms
> *** sync render: avg 1.853960ms min 0.900002ms max 7.829068ms
> *** sync render: avg 2.597078ms min 0.549895ms max 25.936982ms
> ^C
> ⮀ unset ECORE_EVAS_FORCE_SYNC_RENDER
> ⮀ elementary_test Animation
> *** async render: avg 1.705852ms min 0.105016ms max 32.549012ms
> *** async render: avg 0.687507ms min 0.153924ms max 1.679043ms
> *** async render: avg 0.364832ms min 0.146887ms max 1.703969ms
> *** async render: avg 0.369488ms min 0.175889ms max 1.931037ms
> *** async render: avg 0.367418ms min 0.151098ms max 2.927121ms
> *** async render: avg 0.351177ms min 0.156086ms max 1.636885ms
> *** async render: avg 0.353286ms min 0.163990ms max 1.528031ms
> *** async render: avg 0.338158ms min 0.126063ms max 1.547922ms
> *** async render: avg 0.355310ms min 0.168943ms max 1.684109ms
> *** async render: avg 0.344575ms min 0.149928ms max 1.368967ms
> *** async render: avg 0.351807ms min 0.101041ms max 1.538111ms
> *** async render: avg 0.356258ms min 0.113084ms max 1.723979ms
> *** async render: avg 0.357304ms min 0.174066ms max 1.776023ms
> *** async render: avg 0.339344ms min 0.165109ms max 1.570926ms
> ^C
> 
> 
> 
> Less dramatic but still interesting results are visible with Terminology 
> running a redraw-intensive program such as the `aafire` demo from aalib:
> 
> ⮀ unset ECORE_EVAS_FORCE_SYNC_RENDER
> ⮀ terminology -e aafire -driver curses
> *** async render: avg 1.949910ms min 0.320094ms max 6.811932ms
> *** async render: avg 2.314617ms min 0.566074ms max 8.457115ms
> *** async render: avg 2.440161ms min 0.544033ms max 8.315109ms
> *** async render: avg 2.503508ms min 0.610092ms max 8.186047ms
> *** async render: avg 2.339103ms min 0.586006ms max 9.791975ms
> *** async render: avg 1.971868ms min 0.570879ms max 9.002959ms
> *** async render: avg 2.698837ms min 0.558963ms max 11.941055ms
> *** async render: avg 2.236830ms min 0.541988ms max 11.201896ms
> ^C
> ⮀ export ECORE_EVAS_FORCE_SYNC_RENDER=1
> ⮀ terminology -e aafire -driver curses
> *** sync render: avg 4.777478ms min 0.809918ms max 12.482078ms
> *** sync render: avg 5.043696ms min 2.887004ms max 11.873881ms
> *** sync render: avg 4.810856ms min 2.514057ms max 11.527033ms
> *** sync render: avg 4.873759ms min 2.718049ms max 9.166908ms
> *** sync render: avg 5.406739ms min 2.945019ms max 28.283098ms
> *** sync render: avg 5.224728ms min 2.833047ms max 12.698107ms
> *** sync render: avg 5.378743ms min 2.787025ms max 13.941910ms
> *** sync render: avg 4.993303ms min 2.939022ms max 11.360896ms
> *** sync render: avg 5.065144ms min 2.356000ms max 13.610107ms
> ^C

argh! you confuzesedz me by ordering them differently per test.. had to look
closely. :) i see you have found a good test/measuring stick to show how good
async render is... :) ie time in evas_render() itself should be a fair bit less
on average and spikes are hopefully reduced a lot. basically we have 16.6ms (at
60fps) to generate a frame and get it to the screen. what we now have relies
much less on the app using minimal/no time in the mainloop, AND frame
generation logic uses up less time from the mainloop. for trivial examples
(animation unfortunately is one) we actually shouldn't see much gain as time
spent in "mainloop" is not much anyway... but for more involved apps this will
basically not result in more fps really.. or not much.. but result in SMOOTHER
fps... LESS frame-drops. :)
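
A minimal sketch of the kind of measurement discussed here: per-call render
times collected in windows of 100 calls (as in the quoted mail) and compared
against the 16.6ms frame budget. The names render_stats, render_stats_add and
friends are hypothetical, the 60Hz refresh is an assumption, and this is not
the actual EVAS_RENDER_DEBUG_TIMING code.

```c
#include <assert.h>
#include <float.h>

#define WINDOW_CALLS 100                 /* stats window, as in the quoted mail */
#define FRAME_BUDGET_MS (1000.0 / 60.0)  /* ~16.6ms, assuming a 60Hz display */

/* Hypothetical accumulator; not the actual EVAS_RENDER_DEBUG_TIMING code. */
typedef struct {
    double sum_ms, min_ms, max_ms;
    int calls;        /* samples in the current window */
    int over_budget;  /* samples that alone exceeded one frame budget */
} render_stats;

static void render_stats_reset(render_stats *s)
{
    s->sum_ms = 0.0;
    s->min_ms = DBL_MAX;
    s->max_ms = 0.0;
    s->calls = 0;
    s->over_budget = 0;
}

/* Record one render duration; returns 1 once the window holds WINDOW_CALLS
 * samples, at which point the caller reads the stats and resets. */
static int render_stats_add(render_stats *s, double ms)
{
    assert(ms >= 0.0);
    s->sum_ms += ms;
    if (ms < s->min_ms) s->min_ms = ms;
    if (ms > s->max_ms) s->max_ms = ms;
    if (ms > FRAME_BUDGET_MS) s->over_budget++;
    s->calls++;
    return s->calls >= WINDOW_CALLS;
}

static double render_stats_avg(const render_stats *s)
{
    return s->calls > 0 ? s->sum_ms / s->calls : 0.0;
}
```

Fed with the quoted logs, the 30.379951ms sync spike would count as over
budget on its own, while the steady-state async maxima of roughly 1.4ms to
2.9ms leave most of the frame for the mainloop.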

this is good. nicely done.

from here we now need to continue down this path:

1. try to farm out more and more from what is in the inline render func into
async threads.
2. make more threads. thread 1 feeds to thread 2. this requires breaking up the
rest of rendering into well defined stages. this simply means we can overlap
more and more (there will come a point where it's just not worth it anymore
though - but i think we can still get gains if we break it up into maybe 3 or
so stages/pipelines instead of just 1). - example:

  1. prepare thread: this loads images that WILL be needed. it pre-scales
image data that WILL be needed into buffers for when such scaled data is
expensive to compute. it could upload textures in gl, pre-compute clipout rect
lists too. note - #1 may be able to be threaded further - eg the actual smooth
scaling routines could work on several images at once. i'd leave this up to an
implementation detail at this stage.
  2. actually "do the work". blend/blit/generate vertices - manage the gl
pipes etc. for software rendering in theory we could further thread #2 too,
since we now have mostly pre-scaled data, we are mostly copying or blending.
for copying we may be memory bound, but blending "depends" so another thread or
2 may work here.
  3. "swap" (get data to the screen either via copies, or do buffer swaps which
sometimes may block on "swapbuffers" when it tries to vsync). this does lead to
a slight problem with software and XShmPutImage... we'd be putting FROM a
thread. that means x needs to be thread-ready. for xlib that means ecore_x HAS
to do XInitThreads(). ecore_evas needs to know if this has been done or not,
and then needs to indicate to evas that it can punt this off to a thread. this
creates another nasty - order of operation kind of gets a bit futzed around...
but i think we can manage this.
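
The three stages above (prepare, do the work, swap) could be wired together
roughly like this. It is only a sketch under stated assumptions: POSIX
threads, frames reduced to integer ids, a small bounded queue between stages,
and placeholder "work" that just counts frames. frame_queue, stage and
pipeline_run are hypothetical names, not Evas APIs.

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_CAP 4          /* frames allowed in flight between two stages */
#define END_OF_STREAM (-1)   /* sentinel id that shuts the pipeline down */

/* Bounded FIFO of frame ids connecting two stages (hypothetical helper). */
typedef struct {
    int items[QUEUE_CAP];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} frame_queue;

static void queue_init(frame_queue *q)
{
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

static void queue_push(frame_queue *q, int frame)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)              /* block while the queue is full */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = frame;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static int queue_pop(frame_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                      /* block while the queue is empty */
        pthread_cond_wait(&q->not_empty, &q->lock);
    int frame = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return frame;
}

/* One stage: pops from `in`, "works" on the frame, forwards it to `out`. */
typedef struct {
    frame_queue *in, *out;   /* out is NULL for the last (swap) stage */
    int processed;           /* frames handled by this stage */
    int last;                /* id of the previous frame seen */
    int in_order;            /* stays 1 while frames arrive in order */
} stage;

static void *stage_run(void *arg)
{
    stage *st = arg;
    for (;;) {
        int frame = queue_pop(st->in);
        if (frame != END_OF_STREAM) {
            if (frame != st->last + 1) st->in_order = 0;
            st->last = frame;
            st->processed++;                   /* stand-in for real work */
        }
        if (st->out) queue_push(st->out, frame);
        if (frame == END_OF_STREAM) return NULL;
    }
}

/* Push nframes through prepare -> render -> swap. Returns the number of
 * frames the swap stage saw, or -1 if it saw them out of order. */
static int pipeline_run(int nframes)
{
    frame_queue q[3];
    stage st[3];
    pthread_t th[3];
    for (int i = 0; i < 3; i++) queue_init(&q[i]);
    for (int i = 0; i < 3; i++) {
        st[i] = (stage){ &q[i], i < 2 ? &q[i + 1] : NULL, 0, -1, 1 };
        int rc = pthread_create(&th[i], NULL, stage_run, &st[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int f = 0; f < nframes; f++) queue_push(&q[0], f);
    queue_push(&q[0], END_OF_STREAM);
    for (int i = 0; i < 3; i++) pthread_join(th[i], NULL);
    return st[2].in_order ? st[2].processed : -1;
}
```

Each queue has a single producer and a single consumer, so frames stay in
order through every stage, which is one simple way to keep the "order of
operation" concern contained. The bounded capacity also keeps the prepare
stage from running arbitrarily far ahead of swap.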

this can further break down to things like making scalecache not a cache that
is part of the rendering pipeline, but a whole specialized scale data
preparation stage. we also realistically want to make a saner shared cache
mechanism that can cover more of these cases too :) kind of a prerequisite for
implementing the rest actually. also simplifies things a bit given the async
setup now. we ALSO need to do this for gl as we don't use mipmaps (due to being
2d we can't rely on regular mipmaps - they consume about 33% more memory, but we
need full blown generic mipmaps and this means we actually have to "generate as
you go"). so reality is we need a similar scalecache setup for gl as we have for
software, just the REASONING and "should i cache it or not" points are vastly
different. we get smooth scaling up (and "down" to 1/2 the width and height)
for free in gl. the problem is with extremes. eg 4000x3000 digital photo scaled
down to 800x600 - BUT it's on a portrait screen where we only SEE 320x600 of
the image.. so why keep 800x600 of it around? how does evas figure this out? at
what point does it crop and scale to save texture memory and memory bandwidth,
rather than always scaling it down on the fly? thus why i think we need a
special prepare stage - each engine implements their own as they have different
reasons and "switch points". :)

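The memory figures in the paragraph above can be checked with a little
arithmetic. This is a sketch assuming 4 bytes per pixel (32-bit ARGB) and a
conventional mipmap chain that halves each dimension down to 1x1;
texture_bytes and mipmap_chain_bytes are hypothetical helpers, not Evas or GL
calls.

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_PIXEL 4   /* assumption: 32-bit ARGB pixels */

/* Bytes needed for one uncompressed w x h image or texture level. */
static size_t texture_bytes(size_t w, size_t h)
{
    return w * h * BYTES_PER_PIXEL;
}

/* Bytes for a full mipmap chain: the base level plus every level obtained
 * by halving width and height (never below 1) until 1x1 is reached. */
static size_t mipmap_chain_bytes(size_t w, size_t h)
{
    assert(w > 0 && h > 0);
    size_t total = texture_bytes(w, h);
    while (w > 1 || h > 1) {
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
        total += texture_bytes(w, h);
    }
    return total;
}
```

For a 1024x1024 texture the chain adds 1398100 bytes on top of the 4194304
byte base level, i.e. the "about 33%" mentioned above. For the photo example,
the full 4000x3000 source is 48000000 bytes, the 800x600 scaled copy is
1920000 bytes, and the visible 320x600 region is 768000 bytes, 40% of the
scaled copy.
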
> Cheers,
>      Leandro
> 
> _______________________________________________
> enlightenment-devel mailing list
> enlightenment-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/enlightenment-devel


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    ras...@rasterman.com

