So, I was just remembering a trick I used in C++ for eina_log, and it hit me: we use eo_do because we need to run something _before_ and _after_ we call functions. Why don't we use a fake for loop?
Let's say we do (simplified):

#define eo_prepare(obj) \
   for (struct _Eo_Object *__obj = _eo_obj_pointer_get(obj); __obj; \
        eo_finish(__obj), __obj = NULL)

And then people can do:

   eo_prepare(obj)
     efl_gfx_base_size_set(x, y);

or:

   eo_prepare(obj)
     {
        efl_gfx_base_size_set(x, y);
        efl_gfx_base_visible_set(true);
     }

The naming is awful (eo_prepare), and so is the way it creates a new scope implicitly. However, with this trick in mind, we might just be able to create a better syntax and still use an optimization that requires setting up a context and finishing it. No return statement will be allowed in the body, though, and break will break out of the implicit for loop, not out of any outer loop that might happen to exist.

Regards,

On Wed, Nov 4, 2015 at 12:55 PM, Carsten Haitzler <ras...@rasterman.com> wrote:
> On Wed, 4 Nov 2015 01:04:58 -0200 Felipe Magno de Almeida
> <felipe.m.alme...@gmail.com> said:
>
>> On Wed, Nov 4, 2015 at 12:38 AM, Carsten Haitzler <ras...@rasterman.com>
>> wrote:
>> > On Sun, 1 Nov 2015 22:22:47 -0200 Felipe Magno de Almeida
>> > <felipe.m.alme...@gmail.com> said:
>> >
>> >> OK,
>> >>
>> >> So, I tried to take a stab at it during the weekend.
>> >>
>> >> I think all the optimizations are actually hurting performance. I
>> >> wanted to test removing eo_do and the whole machinery for stacks etc.,
>> >> and just use _eo_obj_pointer_get. However, for some reason, mixins
>> >> and composites stopped working and I don't have much time to
>> >> investigate.
>> >
>> > but... did you get any perf numbers?
>>
>> Unfortunately no. Not enough time. Just adding the object to each
>> function took a lot of time. Unfortunately this is just a guess,
>> but if we are going to have this much trouble, we should at least
>> prove that eo_do really brings any benefit. Or else, we might just
>> be doing pessimizations instead of optimizations.
>
> eoid -> obj lookup is a cost. my recent poking in eo_* literally shows a
> single if () has real impact in this hot path.
> splitting up the ptr into its components,
> then looking up the table ptr, then the row, getting the ptr, checking the
> gen count matches, then returning that... is going to be a significant
> cost. doing it every func instead of every 3rd func... is a real cost. we
> don't take advantage of eo_do enough yet to really measure, thus this
> would require a synthetic benchmark to show it.
>
> so you propose adding a cost... for the purpose of syntactic sugar? ie
>
>   efl_text_set(obj, "text");
>   efl_color_set(obj, 10, 20, 50, 255);
>   efl_show(obj);
>
> vs.
>
>   eo_do(obj,
>         efl_text_set("text");
>         efl_color_set(10, 20, 50, 255);
>         efl_show());
>
> ie - you don't "like" the eo_do() bit. eo_do is and will definitely be
> more efficient.
>
>> Which seems to me _very_ likely: by trying to be faster than
>> C++, an impossible goal given our requirements, we're running
>> even more code. Besides, we have Eolian generation to help
>> us with optimizations. For example, maybe we should look into
>> devirtualization of function calls instead of caching results
>> in TLS stacks with multiple allocations.
>
> we are not allowed to use eolian for generating calling code. we went over
> this repeatedly at the start of eo. i wanted us to have a preprocessor to
> pass c thru that would have happily solved all these issues and allowed
> the syntax we want, and then to move optimizations into the preprocessor.
> everyone was against it, so we use macros and tls stacks and "look this up
> inside eo somewhere at runtime" code that costs us, of course. :) eolian
> can only generate one side of the problem - not the other side, the
> calling code.
>
> i think we can remove the tls thing by putting it on the local stack as a
> context, so no more tls.
>
>> > because what you propose is that now we
>> > have to do the eoid -> obj ptr lookup via table on every call, and we
>> > can't batch and share within an eo_do chunk.
>>
>> The caching is more expensive because of synchronization; the
>> lookup is actually a hash-like lookup, so it should be faster than
>> TLS, which actually does a hash lookup too, and even more.
>
> already said the eo stack frame should stop being tls and be local on the
> stack. but the eo call resolve is not in a tls or in a hash for the call
> and eo data - it's in the actual method itself. that's a different thing.
>
>> >> I think this test should be done. After all, if we write a lock-free
>> >> table_ids data structure, then we would be _much_ better off than
>> >> using TLS and allocations all over the place.
>> >
>> > indeed lock-free tables might help too, but we can use spinlocks on
>> > those, which is ok as table accesses for read or write should be
>> > extremely short-lived. that's MUCH better than tls.
>>
>> spinlocks are not lock-free. But even that is likely to be faster than
>> the amount of code we need to run to try to optimize.
>
> we still need them on the eoid tables tho... no matter what.
>
>> [snip]
>>
>> > this means the oid lookup each and every time... that's not pretty. :(
>> > thus... perf numbers? sure - atm we do a lot of eo_do(obj, func1());
>> > ie only 1 func, we don't use the eo_do batches much yet...
>>
>> Unfortunately I won't have time for that. It sure looks bad. However,
>> the table is likely going to be in cache if we design it correctly.
>> Besides,
>
> l2 cache? maybe. not l1. well, unlikely. :)
>
>> the stack maintenance is not cheap and requires allocations from
>> time to time. We could probably make a table lookup in much less than
>> 100 instructions and a dozen data accesses. And, if we can make it truly
>> lock-free, then we will have very little synchronization overhead. I
>> don't think we can do the same with eo_do, which _requires_ us to go
>> through some kind of global to fetch the object we are calling (which
>> creates our synchronization problems).
>
> given what i have been looking at... 100 instr is a big cost.
> 50 is a big cost.
> 10 is worth worrying about. :)
>
> and with eo_do we don't HAVE to use the TLS thing - the tls lookup is
> expensive. we can just put ctx on the current stack. eo_do begins and
> fills the ctx with a looked-up obj ptr and anything else, and then calls
> the funcs passing in ctx. we can hide the pass-in with a macro so we don't
> have to change any code in efl. just rebuild. eolian can generate the
> macro.
>
>> >> I think that eo_do is very likely hurting performance. So we should
>> >> at least prove that it does give better performance before we start
>> >> using macros all over the place, which will be necessary to avoid
>> >> some TLS.
>> >
>> > my actual profiling shows it's the call resolve and fetching of the
>> > class data (scope data) that really are costing a lot. those are the
>> > huge things - and those have to be done one way or the other. so
>> > dropping eo_do doesn't help at all here.
>>
>> It does if we just kill eo_do and start optimizing that. Right now, just
>> making eo work right is not an easy task.
>>
>> Unfortunately I won't be able to prove it either way and will only be
>> able to get back to this by the end of November. However, if we do not
>> freeze the Eo interface right now, then we could have more time to bring
>> data to the discussion, or someone else might be willing to try.
>
> it's going to break between 1.16 and 1.17 for sure. eo abi, that is.
>
>> >> Best regards,
>> >> --
>> >> Felipe Magno de Almeida
>>
>> Kind regards,
>> --
>> Felipe Magno de Almeida
>
>
> --
> ------------- Codito, ergo sum - "I code, therefore I am" --------------
> The Rasterman (Carsten Haitzler)    ras...@rasterman.com

--
Felipe Magno de Almeida

------------------------------------------------------------------------------
_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel