On 25/10/15 03:05, Carsten Haitzler wrote:
> i've been spending a bit of time profiling eo.
>
> SETUP:
>
> here is my test. frankly this is a COMMON CASE test of scrolling genlist
> around. this is incredibly common, and if it's slow people notice. so here is
> the case:
>
>    export ELM_ENGINE=gl
>    export ELM_TEST_AUTOBOUNCE=1
>
> then every time:
>
>    elementary_test -to genlist
>
> use perf. use valgrind/callgrind/cachegrind - whatever. the results are
> similar. this removes any rendering (sw rendering) from the equation.
>
> RESULTS:
>
> eo is using about 25-30% of ALL CPU TIME ... just to find objects, resolve
> functions, go in and out of eo do, call a callback (finding the callback to
> call then calling). so about 27% is what i get from callgrind.
>
> 1% of all cpu time is JUST "eina_main_loop_is()" which is getting the eo call
> stack. i've tried __thread. it's no better actually. you would think it would
> be - but no. compiler+ld+glibc hasn't found a more efficient way of having
> thread local vars than we have.
>
> but THIS IS 1% of that 27... so 1/27th of eo overhead is just this eo call
> stack design. i think it's time we look at eo now not from a "oh but thats not
> clean" perspective but "this is going to be faster" perspective. at this point
> eo1 looks better because at least we didnt need an eo call stack and could
> pass any context on the stack of the thread itself. we need to reconsider this
> callstack and pass this into functions.
>
> now _eo_call_resolve uses about 7.8-8% out of our 27% cpu. this needs some
> real looking at. i cut it down from about 10% by adding a call cache that
> stores the last call that was looked up for that klass + op.
>
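
For anyone following along, the "last call" cache described above can be sketched roughly like this. It is only a minimal illustration under my own assumptions: Klass, Op_Func, Call_Cache, resolve_slow and resolve_cached are hypothetical names, not the real Eo internals, and the real cache may be keyed or stored differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Eo's internal types; NOT the real Eo API. */
typedef void (*Op_Func)(void *obj);
typedef struct { Op_Func funcs[8]; } Klass;   /* per-class op table */

/* One-entry cache: the last (klass, op) pair that was resolved. */
typedef struct {
   const Klass *klass;
   unsigned int op;
   Op_Func      func;
} Call_Cache;

static unsigned int slow_lookups = 0;  /* counts full resolves, for illustration */

/* Example op implementation so the table has something to point at. */
static void example_op(void *obj) { (void)obj; }

/* Full lookup: stands in for the expensive resolve through the class. */
static Op_Func
resolve_slow(const Klass *klass, unsigned int op)
{
   assert(klass != NULL);
   slow_lookups++;
   if (op >= 8) return NULL;
   return klass->funcs[op];
}

/* Cached lookup: reuse the last result on a hit, else resolve and remember. */
static Op_Func
resolve_cached(Call_Cache *cache, const Klass *klass, unsigned int op)
{
   if (cache->func && (cache->klass == klass) && (cache->op == op))
     return cache->func;
   cache->func = resolve_slow(klass, op);
   cache->klass = klass;
   cache->op = op;
   return cache->func;
}
```

A failed lookup is not cached here (func stays NULL), so repeated misses still take the slow path. Whether that matters depends on how often invalid ops show up in practice.
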
> it's crazy but within this func, 0.45% of our cpu time seems to simply be
> checking if the eo op id is valid: the compare + branch... alone...
>
> _eo_do_start uses about 6-6.5% of our cpu time. eo_data_scope_get is 5%.
> _eo_do_end even is about 2.9%.
>
> these all add up and every pass through an eo interface is costing the above.
> but we need to stand back and look at eo from a performance perspective. this
> MAY mean making decisions and changes that are not as "elegant" in the name of
> cutting this overhead down to less than 5%. i would say that should be the
> goal.
>
> but we need to talk here.
>
> one thing that is causing a lot of eo chatter is a lot of:
>
>     blah_xxx_set()
>
> and some
>
>     blah_xxx_get()
>
> and in most of these cases the values are the same, i.e. same x, y, same r, g,
> b, a etc. from a design perspective it'd have value to "teach" eo about at
> least some basic property types, e.g. an int, a pair of ints, a double, a set
> of 4 ints etc. eo then KNOWS where in memory this property is stored in the
> object and can avoid resolving anything if the values are already the same. so
> think of a "pure" property that simply stores the values u give it and IF they
> are different - possibly triggers an action. these cases mean that it could be
> optimized outside of the object code. what we would need is a way to map N
> input values to N pointer offsets and types in the object. eo would just get,
> compare, and move onto the next one if the same. if all same - return. if any
> changes, call real call.
>
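
To make the "map N input values to N offsets and types" idea concrete, here is a rough sketch of the compare step. Everything in it is an assumption for illustration: Prop_Field, Obj_Data, position_fields and prop_values_differ are hypothetical names, and a real version would need Eo to know the private data layout of each class.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: where one property field lives inside the
 * object's private data, and how many bytes it occupies. */
typedef struct {
   size_t offset;
   size_t size;
} Prop_Field;

/* Hypothetical object data with a position and a colour. */
typedef struct { int x, y; int r, g, b, a; } Obj_Data;

/* Field map for a "position" property: two ints at known offsets. */
static const Prop_Field position_fields[] = {
   { offsetof(Obj_Data, x), sizeof(int) },
   { offsetof(Obj_Data, y), sizeof(int) },
};

/* Returns 0 if all n incoming values equal what is already stored (the
 * real setter can be skipped), 1 as soon as any value differs.
 * Bytewise compare is fine for ints; doubles would need care
 * (e.g. -0.0 vs 0.0 and NaN). */
static int
prop_values_differ(const void *data, const Prop_Field *fields,
                   const void *const *values, size_t n)
{
   assert(data && fields && values);
   for (size_t i = 0; i < n; i++)
     {
        const char *stored = (const char *)data + fields[i].offset;
        if (memcmp(stored, values[i], fields[i].size) != 0)
          return 1;
     }
   return 0;
}
```

The caller would only go through the full resolve and call the real setter when prop_values_differ() returns 1, which is the "if all same - return. if any changes, call real call" flow from the paragraph above.
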
> this would be easier with varargs imho. ie - eo1.
>
> we do things like try and resolve calls for null objects where near the start
> of the resolve after getting stack - we return if its not valid.
>
>    if (EINA_UNLIKELY(!fptr->o.obj))
>      return EINA_FALSE;
>
> like that. we could check before we resolve....
>
> anyway.
>
> i am inviting people to look into the guts of eo and think up ways to speed it
> up - by design or any other means. i suspect the speedups we can get now that
> are meaty enough will all be design and abi break changes. so let's get on
> with this now.
>

Hey,

A lot of it is already optimised in my devs/tasn/eo_optimisations 
branch. I think it's already down to 20% (or less? not sure) if I 
remember correctly. Hopefully, if your modifications help on top of 
mine, we'll get to 18%. I have an idea (which I've already shared with 
you on IRC) that could reduce it drastically more, and I have other 
ideas that may help in that regard too.

The main idea, which may prove a bit controversial, is to increase our
dependency on Eolian. That is, add more boilerplate, but that
boilerplate will increase speed. The plan is, for every function,
e.g. efl_text_set, to create these definitions:

EOAPI void _EO_efl_text_set(Eo_Context *ctx, const char *part,
                            const char *text);
#define efl_text_set(part, text) _EO_efl_text_set(__eo_ctx, part, text)

This way, I could have a local variable in eo_do that is the context.
I'm not sure if that could work in eo_do_ret (it might; I have an idea
how, but it would be a bit slower than normal eo_do). Kolesa has already
put in the Eolian support I needed, and I'll get into implementing it
early next week.
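
As a rough, self-contained illustration of the mechanism (not the actual implementation): the names below (Demo_Context, Demo_Obj, demo_do, demo_text_set, demo_text_set_ctx, demo_ctx_) are all made up for this sketch, and the real Eo_Context contents and eo_do internals will differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical context; the real Eo_Context layout is not shown here. */
typedef struct { void *obj; } Demo_Context;

/* Hypothetical object with a single text property. */
typedef struct { const char *text; } Demo_Obj;

/* The "real" function: receives the context explicitly as an argument,
 * so no thread-local call stack lookup is needed to find the object. */
static void
demo_text_set_ctx(Demo_Context *ctx, const char *text)
{
   Demo_Obj *o;

   assert(ctx != NULL);
   o = ctx->obj;
   if (!o) return;           /* invalid object: nothing to do */
   o->text = text;
}

/* What the user writes: the macro silently passes the local context. */
#define demo_text_set(text) demo_text_set_ctx(&demo_ctx_, text)

/* The do-block declares the context as a plain local variable that the
 * function macros used inside it refer to by name. */
#define demo_do(object, ...) \
   do { \
      Demo_Context demo_ctx_ = { (object) }; \
      __VA_ARGS__; \
   } while (0)
```

Usage would look like `demo_do(obj, demo_text_set("hello"));`. The function macro only compiles inside a demo_do block, because that is where the context variable exists.
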

My plan is to come back with some stats for all the proposed changes. 
None of those are API changes, but they are ABI changes.

The biggest concern about this change is that we'll lose type annotation
for autocompletion in IDEs, or in simpler terms, when you
autocomplete in an IDE you'll now see: "efl_text_set(part, text)"
instead of "efl_text_set(const char *part, const char *text)".
Compilation errors will still work as expected.

One more thing to keep in mind is that a lot of this code is
SIGNIFICANTLY faster with -O2 than -O0. That is because I split a lot
of the code I write into inlined functions, or similar things, for
clarity, which any decent compiler will optimise, but which without
optimisations is just damn slow.

--
Tom.

------------------------------------------------------------------------------
_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
