[E-devel] memory allocation perf

The Rasterman Thu, 29 Sep 2016 17:12:11 -0700

so for a while now i've been mumbling about possibly pulling in a custom memory
allocator (generic one intended to replace malloc/calloc/realloc/free) because
of various reasons.

1. move memory allocated for efl away from "apps" so the chances of walking
over your memory segment into an efl one and corrupting it go down (at least
app vs efl becomes a bigger divide).

2. pointer recycling from libc now won't interfere with efl. i mean if app
mallocs something, then frees it, then keeps using it and happens to not crash,
but an efl malloc now recycles it then app scribbles over efl memory - we crash
and hunt bugs that are not ours. to be 100% fair this applies the other way too
(ie efl is bad and messes up app memory) and also applies to #1 above

3. we can drop memory usage significantly on 64bit by actually having 32bit
pointers *IF* we have our own allocator (for a lot of our memory usage). if we
force memory to be 8 byte aligned we can still allocate up to 32gb of data.
64gb if we force 16 byte alignment. how do we do this? realptr = (smallptr << 3)
+ mem_base; or ... realptr = (smallptr << 4) + mem_base; ... make a small macro
that just does this to resolve a "full" pointer from the "small pointer". in
fact if we have multiple domains/regions as long as you know the region the
"smallptr" came from you can access as much mem as u like by splitting. tbh 32
or 64gb of ram just for our data structures is "enough". i'd still use libc for
large chunks ... like images or fonts etc. - though these days we dont even use
malloc for most of those. we use mmap. we can extend smallptr even more. 16bit
ptrs? (any single domain can access then 512k or 1mb of data, so as long as our
overall needs for that type are within that size then we can cut ptrs down even
more). and as we all know - smaller mem footprint actually tends to increase
speed too thanks to better cache locality. we can still have full "64bit"
pointers too. we can easily move over just by changing 1 ptr's type to a new
typedef and then watch the compile errors and "fix them" with a macro. so
moving over should be straightforward and safe if done "one thing at a time".
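
a minimal sketch of the "small pointer" idea from point 3, assuming 8 byte
alignment (shift of 3) and one global mem_base. every name here (smallptr_t,
smallptr_resolve, smallptr_from, region) is made up for illustration and is
not an existing efl or jemalloc api. note that in this sketch small pointer 0
resolves to mem_base itself, so a real version would need to reserve a value
for NULL.

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical: 8 byte alignment, so 32 bits can address 2^32 * 8 = 32gb */
#define SMALLPTR_SHIFT 3

typedef uint32_t smallptr_t;

/* hypothetical base address of the region all small pointers refer into */
static unsigned char *mem_base;

/* realptr = (smallptr << 3) + mem_base */
static inline void *smallptr_resolve(smallptr_t sp)
{
   return mem_base + ((uintptr_t)sp << SMALLPTR_SHIFT);
}

/* inverse: only valid for 8 byte aligned addresses inside the region */
static inline smallptr_t smallptr_from(const void *p)
{
   uintptr_t off = (uintptr_t)((const unsigned char *)p - mem_base);
   return (smallptr_t)(off >> SMALLPTR_SHIFT);
}

/* stand-in region; uint64_t elements guarantee 8 byte alignment */
static uint64_t region[128];

/* round-trips one object through a small pointer; returns 0 on success */
static int smallptr_demo(void)
{
   mem_base = (unsigned char *)region;

   uint64_t *obj = &region[5];             /* byte offset 40 from mem_base */
   *obj = 42;

   smallptr_t sp = smallptr_from(obj);     /* 40 >> 3 == 5 */
   uint64_t *back = smallptr_resolve(sp);  /* back to the full pointer */

   return (sp == 5 && back == obj && *back == 42) ? 0 : 1;
}
```

a struct that stores smallptr_t instead of a full pointer saves 4 bytes per
pointer on 64bit, at the cost of a shift and an add on every dereference.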

4. as a bonus round, looking around on how to build such an allocator i ran
across jemalloc. http://jemalloc.net/ ... specifically i saw:

https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919

and now really recently:

https://lists.freedesktop.org/archives/mesa-dev/2016-September/130009.html

we are actually very "malloc heavy" in efl. we malloc and free LOTS of things.
we've moved many things to mempools but we still are malloc heavy. so moving to
jemalloc COULD really help us. we cant just wholesale move all of efl over at
once. i think we have to move over very carefully one "thing" at a time. also
there is the issue of relying on an external jemalloc or shipping a copy.

i think we'd need to ship a copy. why? if we want to do 32bit ptrs or 16bit
ptrs on a 64bit system (or 32bit too where the 32bit ptrs would just compile
out to nothing, and 16bit ptrs still work), we'd need to change jemalloc a bit.
we'd need to NOT have just malloc/calloc/realloc/free replacements but need
something like:

// some time at startup/init efl would internally make some domains for itself:
domain = eina_mem_domain_new(EINA_MEM_DOMAIN_FULL);
domain2 = eina_mem_domain_new(EINA_MEM_DOMAIN_32);
domain3 = eina_mem_domain_new(EINA_MEM_DOMAIN_16);

//...
// and later on when we allocate memory we determine which domain to allocate
// from depending on needs
fullptr = eina_mem_malloc(domain, size);
eina_mem_free(domain, fullptr);
//...
smallptr = eina_mem_malloc32(domain2, size);
eina_mem_free32(domain2, smallptr);
//...
tinyptr = eina_mem_malloc16(domain3, size);
eina_mem_free16(domain3, tinyptr);

so we'd always have to pass in the domain the mem comes from. i only see the
need for a single "32bit" domain for general efl allocation EXCEPT where we
explicitly expose data through the api that the api user should be calling
free() on. so since we only need one domain across efl in general we can bury
this as a macro so it becomes:

smallptr = EFL_MALLOC(size);
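
a rough sketch of how one default "32bit" domain could hide behind a macro.
this is a toy bump allocator that never recycles memory, NOT jemalloc and not
the eina_mem_* api proposed above. all demo_* / DEMO_* names are hypothetical,
and handle 0 is reserved to mean NULL as an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t demo_ptr32;   /* hypothetical 32bit handle, 0 == null */

typedef struct
{
   unsigned char *base;  /* start of the region backing this domain */
   size_t         size;  /* region size in bytes */
   size_t         used;  /* bump offset, always a multiple of 8 */
} demo_domain;

/* returns 0 on success, -1 if the region is unusable or too big for 32bit */
static int demo_domain_init(demo_domain *d, size_t size)
{
   if (size < 8 || (uint64_t)size > ((uint64_t)1 << 35)) return -1;
   d->base = malloc(size);
   if (!d->base) return -1;
   d->size = size;
   d->used = 8;          /* skip the first slot so handle 0 can mean null */
   return 0;
}

/* hands out 8 byte aligned chunks; returns 0 when the domain is full */
static demo_ptr32 demo_malloc32(demo_domain *d, size_t size)
{
   size_t need = (size + 7) & ~(size_t)7;   /* round up to 8 bytes */
   if (need < size || need > d->size - d->used) return 0;
   demo_ptr32 h = (demo_ptr32)(d->used >> 3);
   d->used += need;
   return h;
}

/* full pointer = (handle << 3) + base of the domain it came from */
static void *demo_resolve32(const demo_domain *d, demo_ptr32 h)
{
   return h ? d->base + ((size_t)h << 3) : NULL;
}

/* one default domain buried behind macros, as suggested in the mail */
static demo_domain demo_default_domain;
#define DEMO_MALLOC(size) demo_malloc32(&demo_default_domain, (size))
#define DEMO_PTR(h)       demo_resolve32(&demo_default_domain, (h))

/* allocates two chunks and checks their handles; returns 0 on success */
static int demo_domain_run(void)
{
   if (demo_domain_init(&demo_default_domain, 4096) != 0) return 1;

   demo_ptr32 a = DEMO_MALLOC(10);   /* rounded up to 16 bytes */
   demo_ptr32 b = DEMO_MALLOC(24);

   int ok = a == 1 && b == 3 && DEMO_PTR(0) == NULL
      && (unsigned char *)DEMO_PTR(b) - (unsigned char *)DEMO_PTR(a) == 16;

   free(demo_default_domain.base);
   return ok ? 0 : 1;
}
```

the point is only that callers never see the domain: they hold a 4 byte
handle and go through one macro to allocate and one to dereference.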

to make moving over easier. yes. every ptr has to be moved one at a time. the
16bit domains are more tricky due to the small total amount of memory they
can address. i suspect we might create several 16bit domains and
split data between them as needed. e.g. imagine a single 16bit domain JUST for
eo callback entries/nodes. will all our callback needs for any app fit in a
single 1mb memory chunk? ... to be thought about.

anyway. in order to be able to have domains, we will have to likely dig into
jemalloc and maybe change a few things and structure it so we can access
internal code paths to implement the above, thus needing to ship a copy.

--
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler) [email protected]

