On Tue, 23 Aug 2016 10:51:05 -0700 Cedric BAIL <cedric.b...@free.fr> said:

> Hello,
> 
> On Tue, Aug 23, 2016 at 3:31 AM, Tom Hacohen <t...@osg.samsung.com> wrote:
> > Callback arrays were an idea introduced a while back to save memory.
> > The idea came from the observation that in many cases we add the same
> > few callbacks to every object of the same type. For example, for an
> > elm widget we may add "move, resize, hidden, mouse_down, mouse_up", so
> > instead of manually adding all of them separately and having eo
> > allocate memory for each (and a separate private data pointer for
> > each) we would just create a static array and pass that.
> > This leads (at least in theory) to quite significant savings in memory.
> > I haven't actually tested the change, as I didn't do it. Cedric, do you
> > have some numbers to share?
> 
> I remember something along the lines of a few hundred kB saved on
> elementary_test. Edje using it was a real saver; it was ahead of any
> other memory allocation at that point.

that's significant. it's still kind of a problem for api and optimizing call
overhead.

> > However, while they provide a nice memory improvement, they have been
> > hampering many optimisation strategies that would make callback
> > invocation significantly faster. Furthermore, maybe (not sure), we can
> > automatically de-duplicate event lists internally (more on that in a
> > moment). With that being said, there is a way we can maybe keep array
> > callbacks with some limitations.
> 
> Do you have a case where performance is impacted by callbacks today?
> I have found that we usually have a very small number of callbacks
> (likely in an array these days), and when speed really mattered it was
> just best to not trigger the callback at all (that's why we have code
> in many places that counts whether any callback has been registered).
> 
> > I discussed my ideas with Carsten on IRC and I believe we have reached
> > something that would be significantly faster and better.
> >
> > Assuming callback arrays are no more:
> > First of all we will change the internal callbacks structure from a
> > list to an array. The array will be sorted based on the address of the
> > event structure first, and priority second. The array will only include
> > the pointer to the callback structure, nothing else. All of the rest of
> > the information needed will be stored in a second array of the same
> > size and the same order.
> > This means that we will now be able to walk the array and stop once the
> > address of the event we are looking for is higher than the address of
> > the current event. So on average we will walk only 50% of the array,
> > and the array itself will most likely be in the same cache line. After
> > we find the correct one, we will also fetch the rest of the data based
> > on the index. This will lead to blazing fast iteration of callbacks
> > without the need to optimise at all.
> 
> Callback arrays are already likely to be in the same cache line. Them

well 4 items on 64bit can fit in a cacheline - i think if we restructured it we
could fit 8 items in, so it likely would be more optimal if we redid the data
arrangement. so either we change api and abi OR we make copies of callback
arrays internally and we can hash/memcmp and share them across all objects.
this is also slightly cleaner as the array doesn't have to be kept in memory by
the caller for all objects possibly using it. efl can take care of it. if we
duplicate it internally we can also reformat it internally to be more optimal
like tasn suggests - stuffing cb arrays at the end of the regular cbs.
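
roughly, the kind of internal layout + early-exit walk being described would
look something like this (just a sketch - the names and structs are made up,
not the actual eo internals):

/* rough sketch - made-up names and structs, NOT the real eo internals.
 * the idea: keep a compact "hot" array of (event desc ptr, func ptr)
 * sorted by the event desc address so the call loop can bail out early,
 * and keep the colder per-callback bits (user data, priority) in a
 * second array with the same order/indices */

typedef struct _Event_Desc Event_Desc; /* stand-in for the event description */
typedef void (*Event_Cb)(void *data, const Event_Desc *desc, void *info);

typedef struct
{
   const Event_Desc *desc; /* sort key: address of the event description */
   Event_Cb          func;
} Cb_Hot;  /* 16 bytes on 64bit, so 4 entries per 64 byte cacheline */

typedef struct
{
   void  *data;     /* user data - only touched when we actually call */
   short  priority;
} Cb_Cold;

typedef struct
{
   Cb_Hot       *hot;   /* sorted by desc address first, priority second */
   Cb_Cold      *cold;  /* same size and order as hot */
   unsigned int  count;
} Cb_Set;

static void
cb_set_call(const Cb_Set *set, const Event_Desc *desc, void *info)
{
   unsigned int i;

   for (i = 0; i < set->count; i++)
     {
        /* sorted ascending here: once the current entry's desc address
         * is past the one we want, nothing later can match, so stop -
         * on average this walks only half of the (already tiny) array */
        if (set->hot[i].desc > desc) break;
        if (set->hot[i].desc == desc)
          set->hot[i].func(set->cold[i].data, desc, info);
     }
}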

> being const means they are automatically shared with all instances. It
> doesn't only impact our memory usage, but also our cache usage. For
> example in edje, we have an object inside an object that triggers a
> callback which gets propagated to its parent. With array callbacks it is
> exactly the same memory being used in both objects. So it doesn't only
> save memory; I am pretty sure it also results in the same kind of speed
> up you are looking for.

yeah. but as above. we can get the same benefits if we de-dup internally and
make copies of the incoming array from the api but reformat it to have better
l1 cache coherency. :)

> > We can also store a pointer to the array in a hash table, with the key
> > being some sort of hash of the array, in order to do some deduplication
> > afterwards (objects would point to the same arrays, but obviously have
> > different private data, so that part would still be duplicated) if we
> > feel it's needed. It probably won't save as much though, and will have
> > some running costs.
> 
> For anything < 16 entries, I bet that a hash table will be slower than
> walking an array. Don't forget you need to compute the hash key, jump
> into an array, walk down an rbtree and finally iterate over a list.
> Hashes are good for a very large number of objects, not for a small
> number.

no - he's talking about taking a list of callbacks and de-duplicating them into
a single instance in memory then sharing that instance across multiple objects
(refcount it etc.) like callback arrays, but implicitly behind the scenes.

so if you have a list of callbacks:

event_a func1 data1
event_b func2 data2
event_b func3 data3
event_c func4 data1

... etc. - take the data, hash it, then look at any existing stored arrays with
the same hash. if one exists and an EXACT "memcmp" match is found, just re-use
the existing stored array of cbs and ref++ it. the hash just cuts down the
search scope to find a match.
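
in rough c the sharing side of that would be something like this (sketch only -
the names, the toy hash and the toy bucket table are made up; the point is just
"hash to narrow the search, memcmp to confirm, then share + refcount"):

#include <string.h>
#include <stdlib.h>

typedef struct _Shared_Cbs Shared_Cbs;
struct _Shared_Cbs
{
   Shared_Cbs   *next;      /* next entry in the same hash bucket */
   unsigned int  hash;
   unsigned int  refcount;
   size_t        len;       /* size of the items blob below */
   unsigned char items[];   /* the (event, func) tuples, internal layout */
};

static Shared_Cbs *buckets[256]; /* toy hash table, keyed on hash & 0xff */

static unsigned int
cbs_hash(const void *blob, size_t len)
{
   /* any decent hash would do - this one is just a placeholder */
   const unsigned char *p = blob;
   unsigned int h = 5381;
   while (len--) h = (h * 33) ^ *p++;
   return h;
}

static Shared_Cbs *
cbs_share(const void *blob, size_t len)
{
   unsigned int h = cbs_hash(blob, len);
   Shared_Cbs *cbs;

   /* the hash only narrows the search - an EXACT memcmp() match is what
    * lets us reuse an existing stored copy and just ref++ it */
   for (cbs = buckets[h & 0xff]; cbs; cbs = cbs->next)
     if ((cbs->hash == h) && (cbs->len == len) &&
         (!memcmp(cbs->items, blob, len)))
       {
          cbs->refcount++;
          return cbs;
       }
   /* no match: store a new copy with refcount 1 */
   cbs = malloc(sizeof(*cbs) + len);
   cbs->hash = h;
   cbs->refcount = 1;
   cbs->len = len;
   memcpy(cbs->items, blob, len);
   cbs->next = buckets[h & 0xff];
   buckets[h & 0xff] = cbs;
   return cbs;
}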

> > The last idea is to keep callback arrays, but kind of limit their scope.
> > The problem (or at least one of them) is that callback arrays support
> > setting a priority, which means calling them needs to be interleaved
> > with the calls to normal callbacks. This adds a lot of complexity (this
> > is a very hot path, so even a simple if is added complexity, but this
> > adds more). If we
> > define that all callback arrays are always the lowest priority (called
> > last), which in practice will have almost zero impact if at all, we can
> > just keep them, and just call them after we do the normal callback calls
> > (if they exist). We can even optimise further by not making the arrays
> > constant, and thus letting us sort them and then run the same algorithm
> > mentioned above for searching. This is probably the most acceptable
> > compromise, though I'm not sure if it'll block any future optimisation
> > attempts that I'm not able to foresee.
> 
> No! Arrays are only useful if they are constant! That is the only way
> to share them across all instances of an object. Their size being

actually it isn't. you can duplicate them internally and refcount that
duplicated instance. it makes it easier on the api user to not HAVE to keep a
"const" blob of data around. it actually helps bindings too, as people can then
use cb arrays from bindings to save memory. the current api doesn't allow that
to work well for bindings.
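
from the caller's side that could look something like this (hypothetical -
none of these names exist in the current api; it just shows that the caller's
array can be a temporary because efl keeps its own shared copy):

/* hypothetical sketch - these names are made up and do NOT exist in the
 * current api. the point: the caller (or a binding) can build the array
 * in a temporary buffer, because efl copies it and dedups/refcounts the
 * copy internally as described above - nothing has to stay const and
 * alive on the caller's side */

typedef struct _Obj     Obj;     /* stand-in for Eo */
typedef struct _Ev_Desc Ev_Desc; /* stand-in for an event description */
typedef void (*Ev_Cb)(void *data, Obj *obj, void *info);

typedef struct { const Ev_Desc *desc; Ev_Cb func; } Cb_Item;

/* hypothetical: takes a copy of 'items' internally and shares that copy
 * with any other object that registered a byte-identical array */
void obj_callback_array_copy_add(Obj *obj, const Cb_Item *items,
                                 unsigned int count, void *data);

static void _moved_cb(void *data, Obj *obj, void *info)   { /* ... */ }
static void _resized_cb(void *data, Obj *obj, void *info) { /* ... */ }

static void
setup_callbacks(Obj *obj, const Ev_Desc *ev_move, const Ev_Desc *ev_resize,
                void *priv)
{
   Cb_Item items[] = {
      { ev_move,   _moved_cb   },
      { ev_resize, _resized_cb },
   };
   /* 'items' can go away right after this call - efl owns its copy */
   obj_callback_array_copy_add(obj, items, 2, priv);
}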

> ridiculously small, I bet you won't win anything by reordering them.
> And if you really want to reorder them, you can do that once at
> creation time in the inline function that creates them, as defined in
> Eo.h.
> 
> > I'm not a huge fan of callback arrays, but if they do save the memory
> > they claim to be saving, I see no problem with keeping a more limited
> > version of them that lets us optimise everything in the manner described
> > above.
> 
> I am not a huge fan of optimization without a clear real life case.
> Please share numbers and a scenario of when it actually matters. I have
> seen enough people wasting their time optimizing things that don't
> matter that I really take it with a grain of salt if you are not showing
> a real life scenario. Sharing a callgrind trace or something along those
> lines would really help make your point here.

i have done callgrind traces and prof too and eo_callback_call has turned up
using like a constant ~1-3% before. since it is such a hot path it is worth
worrying about as optimizations MAY AFFECT api and if we want to lock api in
stone for 1.19 then this is important.

i personally think we either remove cb arrays and find a way to implicitly
optimize INSIDE the api by finding common data sets and de-duplicating them OR
at least we stop using the cb arrays "in place" and we make copies of them and
then share that single internal copy of THAT array and refcount it. that would
help bindings and improve the api a bit AND it'd not tie INTERNAL data
arrangement to external api's as now we can freely re-order/layout data
internally to optimize LATER without hurting api.

> -- 
> Cedric BAIL
> 


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    ras...@rasterman.com


------------------------------------------------------------------------------
_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
