On Fri, 4 Nov 2016 10:18:33 -0200 Gustavo Sverzut Barbieri <barbi...@gmail.com> said:
> On Thu, Nov 3, 2016 at 9:27 PM, Carsten Haitzler <ras...@rasterman.com> wrote:
> > On Thu, 3 Nov 2016 11:24:14 -0200 Gustavo Sverzut Barbieri
> > <barbi...@gmail.com> said:
> >
> >> I guessed mempool and eina_trash did that
> >
> > nah - mempool i don't think has a "purgatory" for pointers.
> > they are released back into the pool.
>
> well, it could... OTOH it's just for "empty blocks", since if it's in
> a mempool that has memory blocks and they're still in use, it will
> just flag as unused.
>
> also, it simplifies bookkeeping of the memory if they are all of the
> same size, like you said Eina_List, it knows the size of each entry,
> thus just need to mark each position that is usable, not try to
> allocate based on size or similar -- much more efficient.

yah, that's what mempool does... but it doesn't have 2 states for an
allocation. it doesn't have "in use", "freed but not able to be re-used
yet" and "free and able to be re-used". it just has one: in use or not.

> > trash is actually a cache for storing ptrs but it never
> > actually frees anything. it doesn't know how to. you have to manually clean
> > trash yourself and call some kind of free func when you do the clean. trash
> > doesn't store free funcs at all.
>
> I don't see why it couldn't.

but it doesn't, and eina_trash is all static inlines with its structs
exposed, so we'd break the struct definition, memory layout and api to do
this. if an eina_trash is exposed from a lib compiled against efl 1.18 to
other code compiled against 1.19 - it'd break. even worse, eina_trash is a
singly linked list, so walking through it scatters through memory and is
thus likely a cache miss on each step.

> but I find this is trying to replace malloc's internal structures,
> which is not so nice. As you know, malloc implementation can
> postpone/defer actual flushes, it's not 1:1 with brk() and munmap()
> since like our mempools the page or stack may have used bits that
> prevents that to be given back to the kernel.

i know.
but it's out of our control. we can't change what and how malloc does this.
we can't do smarter overwrite detection. malloc has options for filling
freed memory with a pattern - but it will do it for any sized allocation,
1 byte or 1 gigabyte. with a custom implementation WE can decide, e.g. only
fill allocations up to 256 bytes, as that is what might be used for small
objects/list nodes, but leave big allocations untouched, or... only fill
the FIRST N bytes of an allocation with a pattern. if the pattern has been
overwritten between submission to a free queue AND when it is actually
freed, then we have a bug in code somewhere scribbling over freed memory.
at least we know it and know what to look for. malloc is far more limited
in this way. also we can defer freeing until when WE want, e.g. after
having gone idle, when we would otherwise sleep. malloc really doesn't have
any nice way to do this. it's totally non-portable, libc specific (e.g.
glibc) etc., and even then very "uncontrollable". a free queue of our own
is portable AND controllable.

> what usually adds overhead are mutexes and the algorithms trying to
> find an empty block... if we say freeq/trash are TLS/single-thread,
> then we could avoid the mutex (but see malloc(3) docs on how they try
> to minimize that contention), but adding a list of entries to look for
> a free spot is likely worse than malloc's own tuned algorithm.

no no. i'm not talking about making a CACHE of memory blocks. simply a
fifo. put a ptr on the queue with a free func. it sits there for some time
and then something walks the queue from beginning to end, actually freeing,
e.g. once we have reached an idle sleep state. THEN the frees really
happen. once on the free queue there is no way off. you are freed - or to
be freed. only a question of when.

if there is buggy code that does something like:

  x = malloc(10);
  x[2] = 10;
  free(x);
  y = malloc(10);
  y[2] = 10;
  x[2] = 5;
  ...

there is a very good chance y is a recycled pointer - the same mem location
as x.
when we do x[2] = 5 we overwrite y[2] with 5 even though it now should be
10. yes, valgrind can catch these... but you HAVE to catch them while
running - maybe it only happens in certain logic paths. yes, coverity can
sometimes find these too through static analysis, but not always. and then
there are the cases where this behaviour is split across 2 different
projects. one is efl, the other is some 3rd party app/binary that does
something bad. the "y" malloc is in efl. the "x" one is in an app. the app
now scribbles over memory owned by efl. this is bad. so efl now crashes
with corrupt data structures, and we can never fix this at all, as the app
is a 3rd party project simply complaining that a crash is happening in efl.

we can REDUCE these issues by ensuring the x pointer is not recycled so
aggressively, by having a free queue. have a few hundred or a few thousand
pointers sit on that queue for a while and HOPE this means the buggy code
will write to this memory while it's still allocated but not in use...
thus REDUCING the bugs/crashes at the expense of latency on freeing memory.
it doesn't fix the bug but it mitigates the worst side effects.

of course i'd actually like to replace all our allocations with our own
special allocator that keeps pointers and allocations used in efl separated
out into different domains. e.g. eo can have a special "eo object data"
domain and all eo object data is allocated from there. pointers from there
can never be recycled for a strdup() or a general malloc() or an
eina_list_append() (that already uses a mempool anyway), etc. - the idea
being that it's HARDER to accidentally stomp over a completely unrelated
data structure because pointers are not recycled from the same pool. e.g.
efl will have its own pool of memory, and at least if pointers are re-used,
they are re-used only within that domain/context.

if we are even smarter we can start using 32bit pointers on 64bit by
returning unsigned ints that are an OFFSET into a single 4gb mmapped
region.
even better, bitshifting could give us 16 or 32 or even 64gb of available
address space for these allocations if we force alignment to 4, 8 or 16
bytes (probably a good idea). so you access such ptrs with:

  #define P(dom, ptr) \
    ((void *)(((unsigned char *)((dom)->base)) + (((size_t)(ptr)) << 4)))

so as long as you KNOW the domain it comes from, you can compress pointers
down to 1/2 the size... even 1/4 the size and use 16bit ptrs like above
(that would give you 1mb of memory space per domain, so it might be useful
for smallish data sets). this relies on you knowing the domain source in
advance and getting it right. we can still do full ptrs too. but this would
quarantine memory and pointers from each other (libc vs efl) and help
isolate bugs/problems.

but this is a hell of a lot more work. it needs a whole malloc
implementation. i'm not talking about that. far simpler: a queue of
pointers to free at a future point. not a cache. not trash. not to then be
dug out and re-used - that is the job of the free func and its
implementation to worry about, whether it's free() or some other free
function. only put memory into the free queue where you can get some
sensible benefit out of it. it's voluntary. just replace the existing free
call with the one that queues the free, with the free func and ptr and size
for example. do it one place at a time - totally voluntary. doesn't hurt to
do it.

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    ras...@rasterman.com
_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel