On Fri, 4 Nov 2016 10:18:33 -0200 Gustavo Sverzut Barbieri <barbi...@gmail.com>
said:

> On Thu, Nov 3, 2016 at 9:27 PM, Carsten Haitzler <ras...@rasterman.com> wrote:
> > On Thu, 3 Nov 2016 11:24:14 -0200 Gustavo Sverzut Barbieri
> > <barbi...@gmail.com> said:
> >
> >> I guessed mempool and eina_trash did that
> >
> > nah - mempool i don't think has a "purgatory" for pointers.
> > they are released back into the pool.
> 
> well, it could... OTOH it's just for "empty blocks", since if it's in
> a mempool that has memory blocks and they're still in use, it will
> just flag as unused.
> 
> also, it simplifies bookkeeping of the memory if they are all of the
> same size, like you said Eina_List, it knows the size of each entry,
> thus just need to mark each position that is usable, not try to
> allocate based on size or similar -- much more efficient.

yah. that's what mempool does... but it doesn't have 2 free states for an
allocation. it doesn't have "in use", "freed but not able to be re-used yet"
and "free and able to be re-used". it just has one state: in use or not.

> > trash is actually a cache for storing ptrs but it never
> > actually frees anything. it doesn't know how to. you have to manually clean
> > trash yourself and call some kind of free func when you do the clean. trash
> > doesn't store free funcs at all.
> 
> I don't see why it couldn't.

but it doesn't, and eina_trash is all static inlines with the structs exposed,
so we'd break the struct definition, memory layout and api to do this. if an
eina_trash is exposed from a lib compiled against efl 1.18 to other code
compiled against 1.19 - it'd break. even worse, eina_trash is a singly linked
list, so walking it jumps through scattered memory - basically a likely cache
miss on every node.

> but I find this is trying to replace malloc's internal structures,
> which is not so nice. As you know, malloc implementation can
> postpone/defer actual flushes, it's not 1:1 with brk() and munmap()
> since like our mempools the page or stack may have used bits that
> prevents that to be given back to the kernel.

i know. but it's out of our control. we can't change what malloc does or how
it does it. we can't do smarter overwrite detection. malloc has options for
filling freed memory with a pattern - but it will do it for any sized
allocation, 1 byte or 1 gigabyte. with a custom implementation WE can decide
to e.g. only fill allocations up to 256 bytes, as this is what might be used
for small objects/list nodes, but leave big allocations untouched, or... only
fill the FIRST N bytes of an allocation with a pattern. if the pattern has
been overwritten between submission to a free queue AND when it is actually
freed, then we have a bug somewhere in code scribbling over freed memory. at
least we know it and know what to be looking for. malloc is far more limited
in this way.

also we can defer freeing until WE want, e.g. after having gone idle, when we
would otherwise sleep. malloc really doesn't have any way to do this nicely.
it's totally non-portable, libc specific (eg glibc) etc. and even then very
"uncontrollable". a free queue of our own is portable AND controllable.

> what usually adds overhead are mutexes and the algorithms trying to
> find an empty block... if we say freeq/trash are TLS/single-thread,
> then we could avoid the mutex (but see malloc(3) docs on how they try
> to minimize that contention), but adding a list of entries to look for
> a free spot is likely worse than malloc's own tuned algorithm.

no no. i'm not talking about making a CACHE of memory blocks. simply a fifo.
put a ptr on the queue with a free func. it sits there for some time and then
something walks the queue from beginning to end actually freeing, e.g. once we
have reached an idle sleep state. THEN the frees really happen. once on the
free queue there is no way off. you are freed, or about to be freed. it's only
a question of when.

if there is buggy code that does something like:

char *x = malloc(10);
x[2] = 10;
free(x);
char *y = malloc(10); /* very likely recycles x's address */
y[2] = 10;
x[2] = 5;             /* use-after-free: silently corrupts y[2] */

... there is a very good chance y is a recycled pointer - the same mem
location as x. when we do x[2] = 5 we overwrite y[2] with 5 even though it now
should be 10. yes, valgrind can catch these... but you HAVE to catch them
while running - maybe it only happens in certain logic paths. yes, coverity
can sometimes find these through static analysis too, but not always. and then
there are the cases where this behaviour is split across 2 different projects:
one is efl, the other is some 3rd party app/binary that does something bad.
the "y" malloc is in efl. the "x" one is in an app. the app now scribbles over
memory owned by efl. this is bad. so efl now crashes with corrupt data
structures and we can never fix this at all, as the app is a 3rd party project
simply complaining that a crash is happening in efl.

we can REDUCE these issues by ensuring the x pointer is not recycled so
aggressively, by having a free queue. have a few hundred or a few thousand
pointers sit on that queue for a while and HOPE this means the buggy code will
write to that memory while it's still allocated but not in use... thus
REDUCING the bugs/crashes at the expense of latency in freeing memory. it
doesn't fix the bug but it mitigates the worst side effects.

of course i'd actually like to replace all our allocations with our own
special allocator that keeps pointers and allocations used in efl separated
out into different domains. e.g. eo can have a special "eo object data" domain
and all eo object data is allocated from there. pointers from there can never
be recycled for a strdup() or a general malloc() or an eina_list_append()
(that already uses a mempool anyway), etc. - the idea being that it's HARDER
to accidentally stomp over a completely unrelated data structure because
pointers are not re-cycled from the same pool. e.g. efl will have its own pool
of memory and, at least if pointers are re-used, they are re-used only within
that domain/context. if we are even smarter we can start using 32bit pointers
on 64bit by returning unsigned ints that are an OFFSET into a single 4gb
mmap()ed region. even better, bit-shifting could give us 16, 32 or even 64gb
of available address space for these allocations if we force alignment to 4,
8 or 16 bytes (probably a good idea anyway). so you access such ptrs with:

#define P(dom, ptr) \
  ((void *)(((unsigned char *)((dom)->base)) + (((size_t)(ptr)) << 4)))

so as long as you KNOW the domain it comes from you can compress pointers down
to 1/2 the size... or even 1/4 the size and use 16bit ptrs, like above (with
a 4-bit shift that gives you 1mb of memory space per domain, so it might be
useful for smallish data sets). this relies on you knowing the domain source
in advance and getting it right. we can still do full ptrs too. but this would
quarantine memory and pointers from each other (libc vs efl) and help isolate
bugs/problems.

but this is a hell of a lot more work. it needs a whole malloc implementation.
i'm not talking about that. this is far simpler: a queue of pointers to free
at a future point. not a cache. not trash. not to be dug out and re-used later
- that is the job of the free func and its implementation to worry about,
whether it's free() or some other free function. only put memory into the free
queue where you can get some sensible benefit out of it. it's voluntary. just
replace an existing free call with one that queues the free with the free
func, ptr and size, for example. do it one place at a time - totally
voluntary. it doesn't hurt to do it.

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    ras...@rasterman.com


_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
