So,

I was just remembering a trick I used in C++ for eina_log,
and it hit me: we use eo_do because we need to run something
_before_ and _after_ we call functions. Why don't we use a fake
for loop?

Let's say we do (simplified):

/* resolve the object once, run the body once, then call eo_finish() after */
#define eo_prepare(obj) \
  for (struct _Eo_Object *__obj = _eo_obj_pointer_get(obj); \
       __obj; eo_finish(__obj), __obj = NULL)

And then people can do:

eo_prepare(obj)
  efl_gfx_base_size_set(x, y);

or:

eo_prepare(obj)
  {
     efl_gfx_base_size_set(x, y);
     efl_gfx_base_visible_set(true);
  }


The naming (eo_prepare) is awful, and so is the way it implicitly
creates a new scope. However, with this trick in mind, we might be
able to create a better syntax and still use an optimization that
requires setting up a context and finishing it. No return statement
would be allowed inside the block, though, and a break would exit the
implicit for loop rather than any outer loop that happens to enclose it.
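For example (a minimal sketch; items_remaining() and some_error() are
made-up placeholders), a break inside the block only leaves the hidden
loop, and with the simplified macro above it also skips eo_finish():

while (items_remaining())
  {
     eo_prepare(obj)
       {
          if (some_error())
            break; /* exits the hidden for loop only, and eo_finish() is skipped */
          efl_gfx_base_visible_set(true);
       }
     /* the outer while keeps iterating here */
  }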

Regards,


On Wed, Nov 4, 2015 at 12:55 PM, Carsten Haitzler <ras...@rasterman.com> wrote:
> On Wed, 4 Nov 2015 01:04:58 -0200 Felipe Magno de Almeida
> <felipe.m.alme...@gmail.com> said:
>
>> On Wed, Nov 4, 2015 at 12:38 AM, Carsten Haitzler <ras...@rasterman.com>
>> wrote:
>> > On Sun, 1 Nov 2015 22:22:47 -0200 Felipe Magno de Almeida
>> > <felipe.m.alme...@gmail.com> said:
>> >
>> >> OK,
>> >>
>> >> So, I tried to take a stab at it during the weekend.
>> >>
>> >> I think all the optimizations are actually hurting performance. I
>> >> wanted to test removing eo_do and the whole machinery for stacks etc.
>> >> And just use the _eo_obj_pointer_get. However, for some reason, mixins
>> >> and composites stopped working and I don't have much time to
>> >> investigate.
>> >
>> > but... did you get any perf numbers?
>>
>> Unfortunately no. Not enough time. Just adding the object to each
>> function took a lot of time. Unfortunately this is just a guess,
>> but if we are going to have this much trouble, we should at least
>> prove that eo_do really brings any benefit. Or else, we
>> might just be doing pessimizations instead of optimizations.
>
> eoid -> obj lookup is a cost. my recent poking in eo_* literally shows a single
> if () has real impact in this hot path. splitting up ptr to its components,
> then looking up table ptr then row, getting ptr, checking gen count matches
> then returning that... is going to be a significant cost. doing it every func
> instead of every 3rd func... is a real cost. we dont take advantage of eo_do
> enough yet to really measure, thus this would require a synthetic benchmark to
> show it.
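
Roughly the kind of lookup being described there, just as an illustration
(all names and the bit layout below are made up for the example, this is
not the actual Eo code):

#include <stdint.h>

/* arbitrary example split of the packed id bits */
#define TABLE_SHIFT 22
#define TABLE_MASK  0x3ff
#define ROW_SHIFT   12
#define ROW_MASK    0x3ff
#define GEN_MASK    0xfff

typedef struct
{
   void        *ptr; /* the real object pointer */
   unsigned int gen; /* generation count, bumped when the slot is reused */
} Eo_Id_Entry;

extern Eo_Id_Entry *_eo_id_tables[];

static void *
_eo_id_lookup(uintptr_t id)
{
   /* split the packed id into its components */
   unsigned int table = (id >> TABLE_SHIFT) & TABLE_MASK;
   unsigned int row   = (id >> ROW_SHIFT) & ROW_MASK;
   unsigned int gen   = id & GEN_MASK;

   Eo_Id_Entry *t = _eo_id_tables[table]; /* look up the table ptr */
   if (!t) return NULL;
   Eo_Id_Entry *e = &t[row];              /* then the row */
   if (e->gen != gen) return NULL;        /* check the gen count matches */
   return e->ptr;                         /* and return the obj ptr */
}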
>
> so you propose adding a cost... for the purpose of syntactic sugar? ie
>
> efl_text_set(obj, "text");
> efl_color_set(obj, 10, 20, 50, 255);
> efl_show(obj);
>
> vs.
>
> eo_do(obj,
>   efl_text_set("text");
>   efl_color_set(10, 20, 50, 255);
>   efl_show());
>
> ie - you don't "like" the eo_do() bit. eo_do is and will definitely be more
> efficient.
>
>> Which seems _very_ likely to me: by trying to be faster than
>> C++, an impossible goal given our requirements, we're running
>> even more code. Besides, we have Eolian generation to help
>> us in optimizations. For example, maybe we should look into
>> devirtualization of function calls instead of caching results
>> in TLS stacks with multiple allocations.
>
> we are not allowed to use eolian for generating calling code. we went over this
> repeatedly at the start of eo. i wanted us to have a preprocessor to pass c thru
> that would have happily solved all these issues and allows the syntax we want
> and then to move optimizations into the preprocessor. everyone was against it,
> so we use macros and tls stacks and "look this up inside eo somewhere at
> runtime" code that costs us of course. :) eolian can only generate one side of
> the problem - not the other side - the calling code.
>
> i think we can remove the tls thing by putting it on the local stack as a
> context so no more tls.
>
>> > because what you propose is that now we
>> > have to do the eoid -> obj ptr lookup via table every call and we can't batch and
>> > share within an eo_do chunk.
>>
>> The caching is more expensive because of synchronization. The
>> lookup is actually a hash-like lookup, so it should be faster than
>> TLS, which does a hash lookup too plus extra work on top.
>
> already said the eo stack frame should stop being tls and be local on stack.
> but the eo call resolve is not in a tls or in a hash for the call and eo data
> - it's in the actual method itself. thats a different thing.
>
>> >> I think this test should be done. After all, if we write a lock-free
>> >> table_ids data structure, then we would be _much_ better off than
>> >> using TLS and allocations all over the place.
>> >
>> > indeed lock-free tables might help too but we can use spinlocks on those
>> > which is ok as table accesses for read or write should be extremely short
>> > lived. thats MUCH better than tls.
>>
>> spinlocks are not lock-free. But even that is likely to be faster than
>> the amount of code we need to run to try to optimize.
>
> we still need them on the eoid tables tho... no matter what.
>
>> [snip]
>>
>> > this means the eoid lookup each and every time... thats not pretty. :(
>> > thus .. perf numbers? sure - atm we do a lot of eo_do(obj, func1()); ie
>> > only 1 func we don't use the eo_do batches much yet...
>>
>> Unfortunately I won't have time for that. It sure looks bad. However, the
>> table is likely going to be in cache if we design it correctly. Besides,
>
> l2 cache? maybe. not l1. well unlikely. :)
>
>> the stack maintenance is not cheap and requires allocations from
>> time to time. We could probably make a table lookup in much less than
>> 100 instructions and a dozen data accesses. And, if we can make it really
>> lock-free, then we will have very little synchronization overhead. I don't
>> think we can do the same with eo_do, which _requires_ us to go around
>> some kind of global to fetch the object we are calling (which creates
>> our synchronization problems).
>
> given what i have been looking at.. 100 instr is a big cost. 50 is a big cost.
> 10 is worth worrying about. :)
>
> and with eo_do we don't HAVE to use the TLS thing - the tls lookup is
> expensive. we can just put ctx on the current stack. eo_do begins and fills the
> ctx with a looked up obj ptr and anything else and then calls the funcs passing
> in ctx. we can hide the pass in with a macro so we dont have to change any code
> in efl. just rebuild. eolian can generate the macro.
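
Something like this, just as an illustrative sketch (Eo_Call_Ctx,
_efl_text_set and friends are invented names here, not the real API):

typedef struct
{
   struct _Eo_Object *obj; /* obj ptr resolved once from the eoid */
} Eo_Call_Ctx;

/* resolve the eoid once, keep the ctx on the caller's stack, no TLS */
#define eo_do(eoid, ...) \
  do \
    { \
       Eo_Call_Ctx __ctx = { _eo_obj_pointer_get(eoid) }; \
       __VA_ARGS__; \
    } \
  while (0)

/* generated per-function wrapper hides the ctx being passed in */
void _efl_text_set(Eo_Call_Ctx *ctx, const char *text);
#define efl_text_set(text) _efl_text_set(&__ctx, text)

so the existing

  eo_do(obj, efl_text_set("text"));

would keep compiling unchanged.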
>
>> >> I think that eo_do is very likely hurting performance. So we should at
>> >> least prove that it does give better performance before we start using
>> >> macros all over the place, which will be necessary to avoid some TLS.
>> >
>> > my actual profiling shows its the call resolve and fetching of the class
>> > data (scope data) that really are costing a lot. those are the huge things
>> > - and those have to be done one way or the other. so dropping eo_do doesn't
>> > help at all here.
>>
>> It does if we just kill eo_do and start optimizing that. Right now, just
>> making eo work right is not an easy task.
>>
>> Unfortunately I won't be able to prove either way and will only be able to
>> get back to this by the end of November. However, if we do not freeze
>> Eo interface right now then we could have more time to bring data to
>> the discussion, or if someone else might be willing to try.
>
> it's going to break between 1.16 and 1.17 for sure. eo abi that is.
>
>> >> Best regards,
>> >> --
>> >> Felipe Magno de Almeida
>>
>> Kind regards,
>> --
>> Felipe Magno de Almeida
>>
>
>
> --
> ------------- Codito, ergo sum - "I code, therefore I am" --------------
> The Rasterman (Carsten Haitzler)    ras...@rasterman.com
>



-- 
Felipe Magno de Almeida
