On Wed, Aug 24, 2016 at 2:24 AM, Tom Hacohen <t...@osg.samsung.com> wrote:
> On 23/08/16 18:51, Cedric BAIL wrote:
>> On Tue, Aug 23, 2016 at 3:31 AM, Tom Hacohen <t...@osg.samsung.com> wrote:

<snip>

>>> However, while they provide a nice memory improvement, they have been
>>> hampering many optimisation strategies that would make callback
>>> invocation significantly faster. Furthermore, maybe (not sure), we can
>>> automatically de-duplicate event lists internally (more on that in a
>>> moment). With that being said, there is a way we can maybe keep array
>>> callbacks with some limitations.
>>
>> Do you have a case where performance is impacted by callbacks today?
>> I have found that we usually have a very small number of callbacks
>> (likely in an array these days), and when speed really mattered it was
>> just best to not trigger the callback at all (that's why we have this
>> code in many places that counts whether any callback has been
>> registered).
>
> It always showed up in callgrind. Obviously it shows up less after you
> did your changes that improved things, because you essentially just
> don't call that code, but having to do this everywhere is a bit of a
> pain, especially if we can just make callbacks fast on their own.
>
> Callback_call takes around 1.5% in the efl atm, though if we removed
> the not-call optimisations it would be much more again. I wonder if we
> can reach good results without them.

When genlist is scrolling, just calling a function is costly, as we end
up calling it millions of times, literally. I seriously doubt it is
possible to reach good results without that optimisation.
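
For reference, the trick being referred to is nothing more than keeping
a per-object counter of registered callbacks and bailing out before
doing any work. A minimal sketch of the idea (hypothetical names, not
the actual EFL code):

/* Sketch of the "don't even call" optimisation: keep a per-object count
 * of registered callbacks for an event and skip the whole dispatch path
 * when it is zero. My_Object, move_cb_count and emit_move are made-up
 * names, not EFL API. */
#include <stdio.h>

typedef struct {
   int move_cb_count; /* incremented on callback add, decremented on del */
} My_Object;

static void
emit_move(My_Object *obj, int x, int y)
{
   if (obj->move_cb_count == 0) return; /* hot path: no walk, no call */

   /* ... otherwise walk the callback list/array and invoke handlers ... */
   printf("dispatching move to %d callback(s): %d,%d\n",
          obj->move_cb_count, x, y);
}

int
main(void)
{
   My_Object obj = { 0 };

   emit_move(&obj, 10, 20); /* nothing registered: returns immediately */
   obj.move_cb_count = 2;
   emit_move(&obj, 30, 40); /* now the dispatch path actually runs */
   return 0;
}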

>  From my tests back when I was optimising callback invocation, we had
> around 5 callbacks on average on objects with a non-zero number of
> registered callbacks, with a maximum of around 12 if my memory serves,
> so this could potentially make callback calls so fast that any such
> optimisation won't matter.

Those numbers were from before callback arrays. I am seriously
interested to know today's numbers. An improved statistic would also be
to know how many callbacks are walked over in the most-called case and
how many of those callbacks are actually in an array already.

<snip>

>>> We can also store a pointer to the array in a hash table with the key
>>> being some sort of a hash of the array in order to do some deduplication
>>> afterwards (point to the same arrays, but obviously different private
>>> data, so that would still be duplicated) if we feel it's needed. It
>>> probably won't save as much though and will have some running costs.
>>
>> For anything < 16 entries, I bet that a hash table will be slower than
>> walking an array. Don't forget you need to compute the hash key, jump
>> into an array, walk down an rbtree and finally iterate over a list.
>> Hashes are good for a very large number of objects, not for a small
>> number.
>
> That was an optimisation that I just threw out there to the world, but I
> believe you misunderstood me. I didn't mean we create a hash table for
> calling events, it was for saving memory and deduplicating event
> callbacks (essentially callback arrays automatically). This is only done
> on callback add/del.

Indeed, I misunderstood your intent. Still, this will increase the cost
of insertion for no benefit in my opinion. See below.
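
To make that cost concrete, here is roughly what every callback
registration would have to do under such a scheme (purely illustrative,
hypothetical names, not a proposed EFL API): hash the (event, function)
pairs, then look the result up in a global table before anything can be
shared:

/* Sketch of deduplicating callback sets at add/del time. The hash
 * computation below is the extra per-registration cost; a real version
 * would follow it with a table lookup and possibly an allocation. */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

typedef void (*Event_Cb)(void *data);

typedef struct {
   int      event_id;
   Event_Cb func;
} Cb_Entry;

/* FNV-style mixing of the (event, function) pairs. */
static uint64_t
cb_array_hash(const Cb_Entry *entries, size_t count)
{
   uint64_t h = 1469598103934665603ULL;

   for (size_t i = 0; i < count; i++)
     {
        h ^= (uint64_t)entries[i].event_id;
        h *= 1099511628211ULL;
        h ^= (uint64_t)(uintptr_t)entries[i].func;
        h *= 1099511628211ULL;
     }
   return h;
}

static void cb_a(void *data) { (void)data; }
static void cb_b(void *data) { (void)data; }

int
main(void)
{
   /* Two objects registering the same pairs hash to the same key and
    * could then share one stored copy of the array. */
   const Cb_Entry first[]  = { { 1, cb_a }, { 2, cb_b } };
   const Cb_Entry second[] = { { 1, cb_a }, { 2, cb_b } };

   printf("%llx == %llx\n",
          (unsigned long long)cb_array_hash(first, 2),
          (unsigned long long)cb_array_hash(second, 2));
   return 0;
}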

>>> The last idea is to keep callback arrays, but kind of limit their scope.
>>> The problem (or at least one of them) is that callback arrays support
>>> setting a priority which means calling them needs to be in between the
>>> calls to normal callbacks. This adds a lot of complexity (this is a very
>>> hot path, even a simple if is complexity, but this adds more). If we
>>> define that all callback arrays are always the lowest priority (called
>>> last), which in practice will have almost zero impact if at all, we can
>>> just keep them, and just call them after we do the normal callback calls
>>> (if they exist). We can even optimise further by not making the arrays
>>> constant, and thus letting us sort them and then run the same algorithm
>>> mentioned above for searching. This is probably the most acceptable
>>> compromise, though I'm not sure if it'll block any future optimisation
>>> attempts that I'm not able to foresee.
>>
>> No! Arrays are only useful if they are constant! That is the only way
>> to share them across all instances of an object. Their size being
>> ridiculously small, I bet you won't win anything by reordering them.
>> And if you really want to reorder them, you can do that once at
>> creation time in the inline function that creates them, as defined in
>> Eo.h.
>
> That is absolutely untrue. You can reorder them where they are created
> (like you suggested), or reorder them when they are added and still
> share them. You'll only need to reorder once; after that, when they are
> in order, that's it. Const doesn't matter or help at all. Obviously
> you're expected not to change them.

If the array is not const, then you have to allocate it every time you
register it. This has a direct cost. Add to that the fact that you then
have to sort it, hash it, compare it and maybe free it. I seriously
doubt the wisdom of doing so.

As said above, sort it at creation, add debug code that warns when an
unsorted array is registered (code that will be disabled in production),
and just improve the walk over those sorted arrays. I bet that will be
enough of a speedup for our real use cases, if there are any (see below).
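
Something along these lines, sketched with hypothetical names (this is
not the actual Eo code): sort once at creation, warn in debug builds if
an unsorted array shows up at registration, and dispatch with a
dichotomic search over the shared const array:

/* Sketch of sorted, constant callback arrays. Cb_Entry, cb_array_register
 * and cb_array_call are made-up names for illustration only. */
#include <stdio.h>
#include <stddef.h>

typedef void (*Event_Cb)(void *data);

typedef struct {
   int      event_id; /* sort key */
   Event_Cb func;
} Cb_Entry;

static void on_move(void *data)   { (void)data; puts("move");   }
static void on_resize(void *data) { (void)data; puts("resize"); }

/* Sorted once at creation and shared (const) across every instance. */
static const Cb_Entry MY_CALLBACKS[] = {
   { 1, on_move   },
   { 2, on_resize },
};

/* Debug-only sanity check, compiled out in production builds. */
static void
cb_array_register(const Cb_Entry *arr, size_t count)
{
#ifndef NDEBUG
   for (size_t i = 1; i < count; i++)
     if (arr[i - 1].event_id > arr[i].event_id)
       fprintf(stderr, "warning: callback array is not sorted\n");
#endif
   (void)arr; (void)count;
   /* ... store the pointer on the object: no copy, no allocation ... */
}

/* Dispatch using a dichotomic search over the sorted array. */
static void
cb_array_call(const Cb_Entry *arr, size_t count, int event_id, void *data)
{
   size_t lo = 0, hi = count;

   while (lo < hi)
     {
        size_t mid = lo + (hi - lo) / 2;

        if (arr[mid].event_id < event_id) lo = mid + 1;
        else hi = mid;
     }
   if (lo < count && arr[lo].event_id == event_id) arr[lo].func(data);
}

int
main(void)
{
   size_t n = sizeof(MY_CALLBACKS) / sizeof(MY_CALLBACKS[0]);

   cb_array_register(MY_CALLBACKS, n);
   cb_array_call(MY_CALLBACKS, n, 2, NULL); /* prints "resize" */
   return 0;
}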

>>> I'm not a huge fan of callback arrays, but if they do save the memory
>>> they claim to be saving, I see no problem with keeping a more limited
>>> version of them that lets us optimise everything in the manner
>>> described above.
>>
>> I am not a huge fan of optimization without a clear real-life case.
>> Please share numbers and a scenario of when it does matter. I have
>> seen enough people waste their time optimizing things that don't
>> matter that I really take it with a grain of salt if you are not
>> showing a real-life scenario. Sharing a callgrind trace or something
>> along that line would really help make your point here.
>
> As I said, it's ~1.5% of the efl cpu usage when scrolling around
> genlist. It also wastes our memory to have them support priority. And
> as your changes proved, there is a reason to minimise callback calls,
> so we already have a case; instead of letting everyone reimplement that
> counting, it's better to just make callback calls fast. As I said, the
> price is very small: all I'm asking for is removing priority from
> callback arrays and always assuming they are the lowest priority.

You realize that, as an optimization, you are fighting against not
calling a function and not walking an array at all, while your path
still has to fetch and compare (even with a dichotomic search). I am
pretty sure the benefit of not triggering the event at all will remain.
Oh, and there are plenty of cases where you will still do the optional
propagation anyway, like for animator.

As for benchmarking, I did a quick run of 'ELM_TEST_AUTOBOUNCE=300
valgrind --tool=callgrind elementary_test -to genlist'. I see 0.90% of
the time spent in efl_event_callback_call (~400 000 calls) and 0.35% in
evas_object_event_callback_call (~500 000 calls). It is going to be
very, very hard to win anything on that.

I also see way bigger fish to fry:
 - _efl_object_call_resolve 12.53%
 - efl_data_scope_get 7.75%
 - efl_isa 3.54%
 - _efl_object_call_end 2.26%

If you manage to win 10% on any of those, you will have gained more
than if you reduced the cost of calling efl_event_callback_call to
zero. I am really not convinced that you are focusing on the right
problem here.
-- 
Cedric BAIL
