On Wed, 4 Nov 2015 01:04:58 -0200 Felipe Magno de Almeida
<felipe.m.alme...@gmail.com> said:

> On Wed, Nov 4, 2015 at 12:38 AM, Carsten Haitzler <ras...@rasterman.com>
> wrote:
> > On Sun, 1 Nov 2015 22:22:47 -0200 Felipe Magno de Almeida
> > <felipe.m.alme...@gmail.com> said:
> >
> >> OK,
> >>
> >> So, I tried to take a stab at it during the weekend.
> >>
> >> I think all the optimizations are actually hurting performance. I
> >> wanted to test removing eo_do and the whole machinery for stacks
> >> etc., and just use _eo_obj_pointer_get. However, for some reason,
> >> mixins and composites stopped working and I don't have much time to
> >> investigate.
> >
> > but... did you get any perf numbers?
> 
> Unfortunately no. Not enough time. Just adding the object to each
> function took a lot of time. Unfortunately this is just a guess,
> but if we are going to have this much trouble, we should at least
> prove that eo_do really brings any benefit. Otherwise, we might
> just be doing pessimizations instead of optimizations.

eoid -> obj lookup is a cost. my recent poking in eo_* literally shows a single
if () has real impact in this hot path. splitting the ptr up into its
components, then looking up the table ptr, then the row, getting the ptr,
checking the gen count matches, then returning that... is going to be a
significant cost. doing it every func instead of every 3rd func... is a real
cost. we don't take advantage of eo_do batching enough yet to really measure
it, thus this would require a synthetic benchmark to show it.
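
a minimal sketch of the lookup steps i mean - the bit layout, table sizes and
names here are made up for illustration, not the real eo internals:

#include <stdint.h>
#include <stddef.h>

typedef struct _Obj { uint32_t generation; /* ... payload ... */ } Obj;
typedef struct _Table { Obj *rows[1024]; } Table; /* hypothetical size */

static Table *tables[1024]; /* hypothetical table count */

static Obj *
_eoid_lookup(uint64_t id)
{
   uint32_t tbl = (id >> 42) & 0x3ff;     /* split id into its components */
   uint32_t row = (id >> 32) & 0x3ff;
   uint32_t gen = (uint32_t)id;           /* generation count */
   Table *t = tables[tbl];                /* look up the table ptr */
   if (!t) return NULL;                   /* each if () costs in this path */
   Obj *o = t->rows[row];                 /* then the row, getting the ptr */
   if (!o) return NULL;
   if (o->generation != gen) return NULL; /* check gen count matches */
   return o;
}

paying that whole walk on every single call instead of once per eo_do batch
is the cost i mean.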

so you propose adding a cost... for the purpose of syntactic sugar? ie

efl_text_set(obj, "text");
efl_color_set(obj, 10, 20, 50, 255);
efl_show(obj);

vs.

eo_do(obj,
  efl_text_set("text");
  efl_color_set(10, 20, 50, 255);
  efl_show());

ie - you don't "like" the eo_do() bit. eo_do is, and will definitely remain,
more efficient.

> Which seems to me _very_ likely: by trying to be faster than
> C++, an impossible goal given our requirements, we're running
> even more code. Besides, we have Eolian generation to help
> us with optimizations. For example, maybe we should look into
> devirtualization of function calls instead of caching results
> in TLS stacks with multiple allocations.

we are not allowed to use eolian for generating calling code. we went over
this repeatedly at the start of eo. i wanted us to have a preprocessor to pass
c through that would have happily solved all these issues, allowed the syntax
we want, and let us move optimizations into the preprocessor. everyone was
against it, so we use macros and tls stacks and "look this up inside eo
somewhere at runtime" code that of course costs us. :) eolian can only
generate one side of the problem - not the other side, the calling code.
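
to make the devirtualization idea concrete - a hypothetical sketch of the
calling-side code a generator could emit, not anything eolian does today:

typedef struct _Method_Table
{
   void (*text_set)(void *obj, const char *text);
} Method_Table;

typedef struct _Obj2 { const Method_Table *mt; } Obj2;

static void
_button_text_set(void *obj, const char *text)
{ (void)obj; (void)text; /* concrete implementation */ }

/* dynamic path: resolve through the object's method table on every call */
static void
efl_text_set_dynamic(Obj2 *obj, const char *text)
{
   obj->mt->text_set(obj, text);
}

/* devirtualized path, when the generator knows the concrete class at the
 * call site: a direct call, no resolution at all */
static void
efl_text_set_button(Obj2 *obj, const char *text)
{
   _button_text_set(obj, text);
}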

i think we can remove the tls thing by putting the context on the local stack
instead - so no more tls.

> > because what you propose is that now we
> > have to do the eoid -> obj ptr lookup via the table every call, and we
> > can't batch and share it within an eo_do chunk.
> 
> The caching is more expensive because of synchronization. The
> lookup is actually a hash-like lookup, so it should be faster than
> TLS, which does a hash lookup too, and even more.

already said the eo stack frame should stop being tls and be local on the
stack. but the eo call resolve is not in tls or in a hash for the call and eo
data - it's in the actual method itself. that's a different thing.

> >> I think this test should be done. After all, if we write a lock-free
> >> table_ids data structure, then we would be _much_ better off than
> >> using TLS and allocations all over the place.
> >
> > indeed lock-free tables might help too, but we can use spinlocks on
> > those, which is ok as table accesses for read or write should be
> > extremely short-lived. that's MUCH better than tls.
> 
> spinlocks are not lock-free. But even that is likely to be faster than
> the amount of code we need to run to try to optimize.

we still need them on the eoid tables tho... no matter what.
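
e.g. a c11 spinlock sketch for guarding the eoid tables - hypothetical names,
just to show why short critical sections keep this cheap:

#include <stdatomic.h>

static atomic_flag _table_lock = ATOMIC_FLAG_INIT;

static inline void
_table_lock_take(void)
{
   while (atomic_flag_test_and_set_explicit(&_table_lock,
                                            memory_order_acquire))
     ; /* spin - ok because table reads/writes are extremely short-lived */
}

static inline void
_table_lock_release(void)
{
   atomic_flag_clear_explicit(&_table_lock, memory_order_release);
}

/* usage around a lookup:
 *   _table_lock_take();
 *   obj = _eoid_lookup(id);
 *   _table_lock_release();
 */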

> [snip]
> 
> > this means the eoid lookup each and every time... that's not pretty.
> > :( thus... perf numbers? sure - atm we do a lot of eo_do(obj, func1());
> > ie only 1 func, we don't use the eo_do batches much yet...
> 
> Unfortunately I won't have time for that. It sure looks bad. However, the
> table is likely to be in cache if we design it correctly. Besides,

l2 cache? maybe. not l1. well, unlikely. :)

> the stack maintenance is not cheap and requires allocations from
> time to time. We could probably make a table lookup in much less than
> 100 instructions and a dozen data accesses. And, if we can make it truly
> lock-free, then we will have very little synchronization overhead. I don't
> think we can do the same with eo_do, which _requires_ us to go through
> some kind of global to fetch the object we are calling (which creates
> our synchronization problems).

given what i have been looking at... 100 instr is a big cost. 50 is a big
cost. 10 is worth worrying about. :)

and with eo_do we don't HAVE to use the TLS thing - the tls lookup is
expensive. we can just put a ctx on the current stack. eo_do begins by filling
the ctx with the looked-up obj ptr and anything else, then calls the funcs
passing in the ctx. we can hide the pass-in with a macro so we don't have to
change any code in efl - just rebuild. eolian can generate the macro.
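
roughly like this - a sketch of the idea with made-up names, not the real eo
api:

#include <stdint.h>

typedef struct _Eo_Ctx
{
   void *obj; /* obj ptr looked up once per batch */
   /* ... anything else resolved up front: class ptr, scope data ... */
} Eo_Ctx;

extern void *_eoid_resolve(uint64_t id);              /* table lookup, once */
extern void  _text_set(Eo_Ctx *ctx, const char *txt); /* method impl */

/* eo_do fills a stack-local ctx - no tls anywhere */
#define eo_do(id, ...) \
   do { Eo_Ctx _ctx = { _eoid_resolve(id) }; __VA_ARGS__; } while (0)

/* per-method macro (eolian could generate these) hides the ctx pass-in,
 * so a call site like eo_do(obj, efl_text_set("text")); compiles
 * unchanged - just a rebuild */
#define efl_text_set(txt) _text_set(&_ctx, txt)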

> >> I think that eo_do is very likely hurting performance. So we should at
> >> least prove that it does give better performance before we start using
> >> macros all over the place, which will be necessary to avoid some TLS.
> >
> > my actual profiling shows it's the call resolve and fetching of the class
> > data (scope data) that really are costing a lot. those are the huge things
> > - and those have to be done one way or the other. so dropping eo_do
> > doesn't help at all here.
> 
> It does if we just kill eo_do and start optimizing that. Right now, just
> making eo work right is not an easy task.
> 
> Unfortunately I won't be able to prove it either way and will only be able
> to get back to this by the end of November. However, if we do not freeze the
> Eo interface right now then we could have more time to bring data to
> the discussion, or someone else might be willing to try.

it's going to break between 1.16 and 1.17 for sure. the eo abi, that is.

> 
> Kind regards,
> -- 
> Felipe Magno de Almeida
> 


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    ras...@rasterman.com

