Re: why allocators are not discussed here
On Thu, 27 Jun 2013 01:59:00 +0200, Adam D. Ruppe destructiona...@gmail.com wrote:
void fillBuffer(lent char[] buffer) {} would be disallowed and that is something I would definitely want.

Isn't that what scope is for?

-- Marco
Re: why allocators are not discussed here
On Friday, 28 June 2013 at 07:07:39 UTC, Marco Leise wrote:
On Thu, 27 Jun 2013 01:59:00 +0200, Adam D. Ruppe destructiona...@gmail.com wrote:
void fillBuffer(lent char[] buffer) {} would be disallowed and that is something I would definitely want.
Isn't that what scope is for?

Reading dlang.org makes you guess so, but the official position is that 'scope' does not exist, so it is hard to say what it is really for.
Re: why allocators are not discussed here
On Thursday, 27 June 2013 at 22:50:47 UTC, John Colvin wrote: Old but perhaps relevant? http://www.linkedin.com/news?viewArticle=articleID=-1gid=86782type=memberitem=253295471articleURL=http%3A%2F%2Fwww%2Eallendowney%2Ecom%2Fss08%2Fhandouts%2Fberger02reconsidering%2Epdfurlhash=96TJgoback=%2Egmr_86782%2Egde_86782_member_253295471 (It's an academic article about memory allocation from 2002) Interesting paper. Still concurrency isn't really addressed, which is a problem to be future proof.
Re: why allocators are not discussed here
On Friday, 28 June 2013 at 07:07:39 UTC, Marco Leise wrote: Isn't that what scope is for? I don't really know. In practice, it does something else (usually nothing, but suppresses heap closure allocation on delegates). The DIPs relating to it all talk about returning refs from functions and I'm not sure if they relate to the built ins or not- I don't think it would quite work for what I have in mind.
Re: why allocators are not discussed here
On Friday, 28 June 2013 at 11:55:46 UTC, Adam D. Ruppe wrote:
On Friday, 28 June 2013 at 07:07:39 UTC, Marco Leise wrote:
Isn't that what scope is for?
I don't really know. In practice, it does something else (usually nothing, but suppresses heap closure allocation on delegates). The DIPs relating to it all talk about returning refs from functions and I'm not sure if they relate to the built ins or not- I don't think it would quite work for what I have in mind.

It is a no-op keyword in the current implementation for everything but delegates. The DIP speculation was based on http://dlang.org/attribute.html#scope and Parameter Storage Classes in http://dlang.org/function.html but that info is obviously outdated.
Re: why allocators are not discussed here
On Friday, 28 June 2013 at 10:57:45 UTC, deadalnix wrote: On Thursday, 27 June 2013 at 22:50:47 UTC, John Colvin wrote: Old but perhaps relevant? http://www.linkedin.com/news?viewArticle=articleID=-1gid=86782type=memberitem=253295471articleURL=http%3A%2F%2Fwww%2Eallendowney%2Ecom%2Fss08%2Fhandouts%2Fberger02reconsidering%2Epdfurlhash=96TJgoback=%2Egmr_86782%2Egde_86782_member_253295471 (It's an academic article about memory allocation from 2002) Interesting paper. Still concurrency isn't really addressed, which is a problem to be future proof. http://en.wikipedia.org/wiki/Hoard_memory_allocator
Re: why allocators are not discussed here
On Friday, June 28, 2013 13:55:45 Adam D. Ruppe wrote:
On Friday, 28 June 2013 at 07:07:39 UTC, Marco Leise wrote:
Isn't that what scope is for?
I don't really know. In practice, it does something else (usually nothing, but suppresses heap closure allocation on delegates). The DIPs relating to it all talk about returning refs from functions and I'm not sure if they relate to the built ins or not- I don't think it would quite work for what I have in mind.

Per the spec, all scope is supposed to do is prevent references in a parameter from escaping. To be specific, it says

---
references in the parameter cannot be escaped (e.g. assigned to a global variable)
---

So, in theory, if you had something like

    auto foo(scope int[] i) {...}

it would prevent i or anything referring to it from being returned or assigned to any variable which will outlive the function call. However, scope currently does _nothing_ for anything other than delegates - which is why I think that using the in attribute is such an incredibly bad idea. Using either in or scope on anything other than delegates could result in all kinds of code breakage if/when scope is ever implemented for types other than delegates.

For delegates, it has the advantage of telling the compiler that it doesn't need to allocate a closure (since the delegate won't be used past the point where its calling scope still exists, as could occur if the delegate escaped the function it was passed to), but I'm not sure that even that works 100% correctly right now.

We really should sort out exactly what we're going to do with scope one of these days soon. But the stuff that some of the DIPs do with scope (e.g. returning with scope - which is completely against the spec at this point) are suggestions and not at all how it currently works.

- Jonathan M Davis
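To make the delegate case above concrete, here is a minimal sketch of a function taking a scope delegate parameter. The names applyTwice and demo are made up for illustration; the only thing taken from the message is the idea that the callee promises not to let the delegate escape. Whether the compiler actually skips the closure allocation is left open here, as it is in the message.

```d
import std.stdio : writeln;

// `scope` on the delegate parameter: applyTwice promises not to store `dg`
// anywhere that outlives this call.
int applyTwice(scope int delegate(int) dg, int x)
{
    return dg(dg(x));
}

int demo()
{
    int offset = 3; // local variable captured by the delegate below
    return applyTwice((int v) => v + offset, 1);
}

void main()
{
    writeln(demo());
}
```

The delegate is only called inside applyTwice and never stored, which is exactly the usage pattern the spec wording describes.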
Re: why allocators are not discussed here
On Friday, 28 June 2013 at 17:43:21 UTC, Jonathan M Davis wrote:
it would prevent i or anything referring to it from being returned or assigned to any variable which will outlive the function call. However,

That's fairly close to what I'd want. But there are two cases I'm not sure it would cover:

1:

    struct Unique(T) {
        scope T borrow();
    }

If the unique pointer decides to let its reference slip, it wouldn't want it going somewhere else and escaping, since that breaks the unique need. This is important for a few cases. Here's one:

    int* foo;
    {
        Unique!(int*) bar;
        foo = bar.borrow;
        int* ok = bar.borrow; // this should be ok, because this never exists outside the same scope as the Unique
    }
    // foo now talks to a freed *bar, so that shouldn't be allowed

Similarly, if bar were reassigned, this could cause trouble, but what we might do is just disallow such reassignments, but maybe it could work if it always goes down in scope. I'd have to think about that.

(I'm thinking my borrowed thing might have to be a type constructor rather than a storage class. Otherwise, you could get around it by:

    int* bar(scope int* foo) {
        int* b = foo;
        return b;
    }

Unless the compiler is very smart about following where it goes.)

But if scope works on the return value too, it might be ok. maybe

2:

    void bar(scope int* foo, int** bar) {
        *bar = foo;
    }

Actually, I'm reasonably clear the spec's scope words would work for this one. But we'd need to be sure - this is one case where pure wouldn't help (pure generally would help, since it disallows assignments to the outside world, but there are enough holes that you could leak a reference).

To be memory safe, these would all have to be guaranteed.
Re: why allocators are not discussed here
On Friday, June 28, 2013 19:56:44 Adam D. Ruppe wrote: struct Unique(T) { scope T borrow(); } Per the current spec, this would not be a valid use of scope, as scope is specifically a parameter storage class and can only be used on function parameters (just like in, out, ref, and lazy). scope seems to be specifically intended for guaranteeing that an argument passed to a function does not escape that function. - Jonathan M Davis
Re: why allocators are not discussed here
On Wednesday, 26 June 2013 at 13:16:25 UTC, Jason House wrote: Bloomberg released an STL alternative called BSL which contains an alternate allocator model. In a nutshell object supporting custom allocators can optionally take an allocator pointer as an argument. Containers will save the pointer and use it for all their allocations. It seems simple enough and does not embed the allocator in the type. https://github.com/bloomberg/bsl/wiki/BDE-Allocator-model There is also EASTL's (Electronic Arts version of STL for gamedev) take on allocators. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html#eastl_allocator
Re: why allocators are not discussed here
On Wednesday, 26 June 2013 at 23:59:01 UTC, Adam D. Ruppe wrote:
On Wednesday, 26 June 2013 at 23:02:47 UTC, H. S. Teoh wrote:
Maybe a type distinction akin to C++'s auto_ptr might help?
It might not be so bad if we modified D to add a lent storage class, or something, similar to some discussions about scope in the past. These would be values you may work with, but never keep; assigning them to anything is not allowed and you may only pass them to a function or return them from a function if that is also marked lent. Any regular reference would be implicitly usable as lent.

Something along those lines would probably be a good solution. It seems that we're working with three types of objects:

1. Objects that are owned by a scope (can be stack-allocated)
2. Objects that are owned by another object (C/C++-like memory management)
3. Objects that have no single owner (GC memory management)

The first two would probably operate under semantics like lent or scope, although I'd like to propose an extension to the rules: it should be possible to store a weak reference to these types (or at least to #2) once we have weak reference support. The third type seems to be pretty much solved, seeing as we have a (mostly) working GC.

Something like this might be a nice way to implement it:

    class Thing {}

    void doSomething(scope Thing t);     // Takes #1, #2, or #3 by reference
    void doSomethingElse(owned Thing t); // Takes only #2 or #3

    void main() {
        scope Thing t1;              // stack-allocated
        doSomething(t1);
        owned Thing t2 = new Thing;  // heap-allocated but freed at end of scope
        doSomething(t2);
    }
Re: why allocators are not discussed here
On Tuesday, 25 June 2013 at 22:22:09 UTC, cybervadim wrote:
I know Andrei mentioned he was going to work on Allocators a year ago. In DConf 2013 he described the problems he needs to solve with Allocators. But I wonder if I am missing the discussion around that - I tried searching this forum, found a few threads that were not actually a brainstorm for Allocators design. Please point me in the right direction or is there a reason it is not discussed or should we open the discussion?

The easiest approach for Allocators design I can imagine would be to let the user specify which Allocator operator new should get the memory from (introducing a new keyword allocator). This gives total control, but assumes the user knows what he is doing. Example:

    CustomAllocator ca;
    allocator(ca) {
        auto a = new A; // operator new will use ScopeAllocator::malloc()
        auto b = new B;
        free(a); // that should call ScopeAllocator::free()
        // if free() is missing for allocated area, it is a user responsibility to make sure custom Allocator can handle that
    }

By default the allocator is the druntime using GC, free(a) does nothing for it. If some library defines its allocator (e.g. specialized container), there should be the ability to:
1. override allocator
2. get access to the allocator used

I understand that I spent 5 mins thinking about the way Allocators may look. My point is - if somebody is working on it, can you please share your ideas?

Old but perhaps relevant? http://www.linkedin.com/news?viewArticle=articleID=-1gid=86782type=memberitem=253295471articleURL=http%3A%2F%2Fwww%2Eallendowney%2Ecom%2Fss08%2Fhandouts%2Fberger02reconsidering%2Epdfurlhash=96TJgoback=%2Egmr_86782%2Egde_86782_member_253295471 (It's an academic article about memory allocation from 2002)
Re: why allocators are not discussed here
On 06/26/2013 12:50 AM, Adam D. Ruppe wrote:
On Tuesday, 25 June 2013 at 22:22:09 UTC, cybervadim wrote:
(introducing a new keyword allocator)
It would be easier to just pass an allocator object that provides the necessary methods and don't use new at all. (I kinda wish new wasn't in the language. It'd make this a little more consistent.)

I did think about this as well, but then I came up with something that IMHO is even simpler. Imagine we have two delegates:

    void* delegate(size_t); // this one allocs
    void delegate(void*);   // this one frees

You pass both to a function that constructs your object. The first is used for allocating the memory, the second gets attached to the TypeInfo and is used by the gc to free the object. This would be completely transparent to the user.

The use in a container is similar. Just use the alloc delegate to construct the objects and attach the free delegate to the typeinfo. You could even mix allocator strategies in the middle of the lifetime of the container.
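A rough sketch of the two-delegate idea, under loud assumptions: construct, AllocDg, FreeDg and Point are hypothetical names, and the free delegate is called by hand here because attaching it to the TypeInfo so that the GC calls it is not something today's runtime offers. It only shows how an object can be built in memory handed out by an arbitrary alloc delegate.

```d
import core.stdc.stdlib : free, malloc;
import std.conv : emplace;
import std.stdio : writeln;

alias AllocDg = void* delegate(size_t); // this one allocs
alias FreeDg = void delegate(void*);    // this one frees

struct Point { int x, y; }

// Hypothetical helper: constructs a T in memory obtained from `alloc`.
T* construct(T, Args...)(AllocDg alloc, Args args)
{
    void* mem = alloc(T.sizeof);
    assert(mem !is null, "allocation failed");
    return emplace(cast(T*) mem, args);
}

int demo()
{
    int live = 0; // number of outstanding allocations
    AllocDg alloc = (size_t n) { ++live; return malloc(n); };
    FreeDg release = (void* p) { --live; free(p); };

    Point* p = construct!Point(alloc, 3, 4);
    int sum = p.x + p.y;
    release(p); // done manually; the proposal would have the GC do this
    assert(live == 0);
    return sum;
}

void main()
{
    writeln(demo());
}
```

Because the pair of delegates is passed per call, a container could switch to a different pair halfway through its lifetime, which is the mixing of strategies the message mentions.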
Re: why allocators are not discussed here
On 2013-06-26 01:16, Adam D. Ruppe wrote:
You'd want it to be RAII or delegate based, so the scope is clear.

    with_allocator(my_alloc, {
        do whatever here
    });

or

    {
        ChangeAllocator!my_alloc dummy;
        do whatever here
    } // dummy's destructor ends the allocator scope

I think the former is a bit nicer, since the dummy variable is a bit silly. We'd hope that delegate can be inlined.

It won't be inlined. You would need to make it a template parameter to have it inlined.

-- /Jacob Carlborg
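For comparison, a small sketch of both spellings: one taking the block as a runtime delegate and one taking it as an alias template parameter. Everything here is hypothetical (currentAllocator is just a thread-local string standing in for a real allocator handle), and whether a given compiler inlines either form is not something this sketch can promise.

```d
import std.stdio : writeln;

// Stand-in for "the active allocator"; module-level variables are
// thread-local by default in D.
string currentAllocator = "gc";

// Runtime version: the block is an ordinary delegate argument.
void withAllocatorDg(string name, scope void delegate() dg)
{
    string saved = currentAllocator;
    currentAllocator = name;
    scope (exit) currentAllocator = saved; // restored even if dg throws
    dg();
}

// Compile-time version: the block is an alias template parameter, so each
// call site gets its own instantiation.
void withAllocator(alias block)(string name)
{
    string saved = currentAllocator;
    currentAllocator = name;
    scope (exit) currentAllocator = saved;
    block();
}

string demo()
{
    string seen;
    withAllocatorDg("pool", () { seen ~= currentAllocator; });
    withAllocator!(() { seen ~= "," ~ currentAllocator; })("region");
    return seen ~ "/" ~ currentAllocator;
}

void main()
{
    writeln(demo());
}
```

Note how the template form puts the allocator argument after the block, which is the readability cost raised elsewhere in the thread.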
Re: why allocators are not discussed here
Bloomberg released an STL alternative called BSL which contains an alternate allocator model. In a nutshell, an object supporting custom allocators can optionally take an allocator pointer as an argument. Containers will save the pointer and use it for all their allocations. It seems simple enough and does not embed the allocator in the type. https://github.com/bloomberg/bsl/wiki/BDE-Allocator-model

On Tuesday, 25 June 2013 at 22:22:09 UTC, cybervadim wrote:
I know Andrei mentioned he was going to work on Allocators a year ago. In DConf 2013 he described the problems he needs to solve with Allocators. But I wonder if I am missing the discussion around that - I tried searching this forum, found a few threads that were not actually a brainstorm for Allocators design. Please point me in the right direction or is there a reason it is not discussed or should we open the discussion?

The easiest approach for Allocators design I can imagine would be to let the user specify which Allocator operator new should get the memory from (introducing a new keyword allocator). This gives total control, but assumes the user knows what he is doing. Example:

    CustomAllocator ca;
    allocator(ca) {
        auto a = new A; // operator new will use ScopeAllocator::malloc()
        auto b = new B;
        free(a); // that should call ScopeAllocator::free()
        // if free() is missing for allocated area, it is a user responsibility to make sure custom Allocator can handle that
    }

By default the allocator is the druntime using GC, free(a) does nothing for it. If some library defines its allocator (e.g. specialized container), there should be the ability to:
1. override allocator
2. get access to the allocator used

I understand that I spent 5 mins thinking about the way Allocators may look. My point is - if somebody is working on it, can you please share your ideas?
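A minimal sketch of what "the container saves an allocator reference and uses it for all its allocations" could look like in D. This is not the BDE API; Allocator, CountingMallocator and IntStack are names invented for the example, and the container is deliberately tiny.

```d
import core.stdc.stdlib : free, malloc;
import std.stdio : writeln;

// Hypothetical run-time allocator interface; the allocator is not part of
// the container's type.
interface Allocator
{
    void* allocate(size_t bytes);
    void deallocate(void* p);
}

final class CountingMallocator : Allocator
{
    size_t live; // blocks currently handed out
    void* allocate(size_t bytes) { ++live; return malloc(bytes); }
    void deallocate(void* p) { if (p) { --live; free(p); } }
}

struct IntStack
{
    private Allocator alloc; // saved once, used for every allocation
    private int* data;
    private size_t len, cap;

    this(Allocator a) { alloc = a; }

    void push(int v)
    {
        if (len == cap)
        {
            size_t newCap = cap ? cap * 2 : 4;
            auto fresh = cast(int*) alloc.allocate(newCap * int.sizeof);
            assert(fresh !is null, "allocation failed");
            fresh[0 .. len] = data[0 .. len]; // copy existing elements
            alloc.deallocate(data);
            data = fresh;
            cap = newCap;
        }
        data[len++] = v;
    }

    int pop() { return data[--len]; }

    void release() { alloc.deallocate(data); data = null; len = cap = 0; }
}

size_t demo()
{
    auto m = new CountingMallocator;
    auto s = IntStack(m);
    foreach (i; 0 .. 10) s.push(i);
    assert(s.pop() == 9);
    s.release();
    return m.live; // blocks still outstanding after release
}

void main()
{
    writeln(demo());
}
```

Two IntStack values using different allocators have the same type here, which is the property the message highlights.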
Re: why allocators are not discussed here
On Wednesday, 26 June 2013 at 13:16:25 UTC, Jason House wrote:
Bloomberg released an STL alternative called BSL which contains an alternate allocator model. In a nutshell, an object supporting custom allocators can optionally take an allocator pointer as an argument. Containers will save the pointer and use it for all their allocations. It seems simple enough and does not embed the allocator in the type. https://github.com/bloomberg/bsl/wiki/BDE-Allocator-model

I think the problem with such an approach is that you have to maniacally add support for custom allocators to every class if you want them to be on a custom allocator. If we were simply able to say - all memory allocated in this area {} should use my custom allocator, that would simplify the code and there would be no need to change the std lib. The next step is to notify the allocator when the memory should be released. But for the stack based allocator that is not required.

Moreover, if we introduce access to different GCs (e.g. mark-n-sweep, semi-copy, ref counted), we should be able to say this {} piece of code is my temporary, so use semi-copy GC, the other code is long lived and not many objects are created, so use ref counted. That is, it is all runtime support and no need for changing the library code.
Re: why allocators are not discussed here
26-Jun-2013 14:03, Robert Schadek wrote:
On 06/26/2013 12:50 AM, Adam D. Ruppe wrote:
On Tuesday, 25 June 2013 at 22:22:09 UTC, cybervadim wrote:
(introducing a new keyword allocator)
It would be easier to just pass an allocator object that provides the necessary methods and don't use new at all. (I kinda wish new wasn't in the language. It'd make this a little more consistent.)
I did think about this as well, but then I came up with something that IMHO is even simpler. Imagine we have two delegates:

    void* delegate(size_t); // this one allocs
    void delegate(void*);   // this one frees

You pass both to a function that constructs your object. The first is used for allocating the memory, the second gets attached to the TypeInfo and is used by the gc to free the object.

Then it's just GC but with an extra complication.

This would be completely transparent to the user. The use in a container is similar. Just use the alloc delegate to construct the objects and attach the free delegate to the typeinfo. You could even mix allocator strategies in the middle of the lifetime of the container.

-- Dmitry Olshansky
Re: why allocators are not discussed here
26-Jun-2013 02:22, cybervadim wrote:
I know Andrei mentioned he was going to work on Allocators a year ago. In DConf 2013 he described the problems he needs to solve with Allocators. But I wonder if I am missing the discussion around that - I tried searching this forum, found a few threads that were not actually a brainstorm for Allocators design. Please point me in the right direction or is there a reason it is not discussed or should we open the discussion?

The easiest approach for Allocators design I can imagine would be to let the user specify which Allocator operator new should get the memory from (introducing a new keyword allocator). This gives total control, but assumes the user knows what he is doing. Example:

    CustomAllocator ca;
    allocator(ca) {
        auto a = new A; // operator new will use ScopeAllocator::malloc()
        auto b = new B;
        free(a); // that should call ScopeAllocator::free()
        // if free() is missing for allocated area, it is a user responsibility to make sure custom Allocator can handle that
    }

Awful. What has that extra syntax brought you? Except that now new is unsafe by design? Other questions involve how this allocation scope goes inside of functions, and what the mechanism is for passing it up and down the call stack. Last but not least, I fail to see how scoped allocators alone (as presented) solve even half of the problem.

-- Dmitry Olshansky
Re: why allocators are not discussed here
On Wed, Jun 26, 2013 at 04:10:49PM +0200, cybervadim wrote:
On Wednesday, 26 June 2013 at 13:16:25 UTC, Jason House wrote:
Bloomberg released an STL alternative called BSL which contains an alternate allocator model. In a nutshell, an object supporting custom allocators can optionally take an allocator pointer as an argument. Containers will save the pointer and use it for all their allocations. It seems simple enough and does not embed the allocator in the type. https://github.com/bloomberg/bsl/wiki/BDE-Allocator-model
I think the problem with such an approach is that you have to maniacally add support for custom allocators to every class if you want them to be on a custom allocator.

Yeah, that's a major inconvenience with the C++ allocator model. There's no way to say "switch to allocator A within this block of code"; if you're given a binary-only library that doesn't support allocators, you're out of luck. And even if you have the source code, you have to manually modify every single line of code that performs allocation to take an additional parameter -- not a very feasible approach.

If we were simply able to say - all memory allocated in this area {} should use my custom allocator, that would simplify the code and there would be no need to change the std lib. The next step is to notify the allocator when the memory should be released. But for the stack based allocator that is not required. Moreover, if we introduce access to different GCs (e.g. mark-n-sweep, semi-copy, ref counted), we should be able to say this {} piece of code is my temporary, so use semi-copy GC, the other code is long lived and not many objects are created, so use ref counted. That is, it is all runtime support and no need for changing the library code.

Yeah, I think the best approach would be one that doesn't require changing a whole mass of code to support. Also, one that doesn't require language changes would be far more likely to be accepted, as the core D devs are leery of adding yet more complications to the language.

That's why I proposed that gc_alloc and gc_free be made into thread-global function pointers that can be swapped with a custom allocator's version. This doesn't have to be visible to user code; it can just be an implementation detail in std.allocator, for example. It allows us to implement custom allocators across a block of code that doesn't know (and doesn't need to know) what allocator will be used.

T

-- Fact is stranger than fiction.
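The swappable-function-pointer idea can be sketched roughly as follows. The pointers currentAlloc and currentFree are stand-ins invented for this example, not druntime's actual gc_alloc/gc_free hooks, and malloc is used in place of the GC to keep the sketch self-contained.

```d
import core.stdc.stdlib : free, malloc;
import std.stdio : writeln;

alias AllocFn = void* function(size_t);
alias FreeFn = void function(void*);

void* defaultAlloc(size_t n) { return malloc(n); }
void defaultFree(void* p) { free(p); }

// Module-level variables are thread-local by default in D, so each thread
// gets its own pair of pointers.
AllocFn currentAlloc = &defaultAlloc;
FreeFn currentFree = &defaultFree;

size_t countedBytes; // bytes requested through the custom allocator
void* countingAlloc(size_t n) { countedBytes += n; return malloc(n); }

// "Library" code that neither knows nor cares which allocator is active.
void* libraryRoutine() { return currentAlloc(64); }

size_t demo()
{
    countedBytes = 0;
    {
        auto saved = currentAlloc;
        currentAlloc = &countingAlloc;
        scope (exit) currentAlloc = saved; // scope guard reverts the swap
        currentFree(libraryRoutine());     // goes through countingAlloc
    }
    currentFree(libraryRoutine());         // back on the default allocator
    return countedBytes;
}

void main()
{
    writeln(demo());
}
```

Only the first call to libraryRoutine is counted, since the scope guard restores the default pointer when the inner block exits.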
Re: why allocators are not discussed here
On Wednesday, 26 June 2013 at 14:17:03 UTC, Dmitry Olshansky wrote:
Awful. What has that extra syntax brought you? Except that now new is unsafe by design? Other questions involve how this allocation scope goes inside of functions, and what the mechanism is for passing it up and down the call stack. Last but not least, I fail to see how scoped allocators alone (as presented) solve even half of the problem.

Extra syntax lets me avoid touching the existing code. Imagine you have stateless event processing. That is, an event comes, you do some calculation, prepare the answer and send it back. It will look like:

    void onEvent(Event event) {
        process();
    }

Because it is stateless, you know all the memory allocated during processing will not be required afterwards. So the syntax I suggested requires very little change in code. process() may be implemented using the std lib, doing several news and resizing. With the new syntax:

    void onEvent(Event event) {
        ScopedAllocator alloc;
        allocator(alloc) {
            process();
        }
    }

So now you do not use GC for all that is created inside process(). ScopedAllocator is a simple stack that will free all memory in one go. It is up to the runtime implementation to make sure all memory that is allocated inside the allocator{} scope is actually allocated using ScopedAllocator and not GC. Does it make sense?
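As an illustration of "a simple stack that will free all memory in one go", here is a hedged sketch of a bump (region) allocator. Region and its members are hypothetical names; this is not the proposed ScopedAllocator, and it does nothing to redirect new. It only shows the allocation pattern: hand out slices of one block and release the whole block when the region goes out of scope.

```d
import core.stdc.stdlib : free, malloc;
import std.stdio : writeln;

struct Region
{
    private ubyte* base;
    private size_t capacity, used;

    @disable this(this); // copying would lead to a double free

    this(size_t bytes)
    {
        base = cast(ubyte*) malloc(bytes);
        assert(base !is null, "out of memory");
        capacity = bytes;
    }

    // Everything handed out by this region is released here, in one go.
    ~this() { free(base); }

    void[] allocate(size_t bytes)
    {
        enum size_t alignment = 16;
        // Round the reservation up so the next block stays aligned.
        size_t rounded = (bytes + alignment - 1) & ~(alignment - 1);
        if (rounded > capacity - used) return null; // region exhausted
        void[] block = base[used .. used + bytes];
        used += rounded;
        return block;
    }

    size_t bytesUsed() const { return used; }
}

size_t demo()
{
    auto r = Region(1024);
    auto a = cast(int[]) r.allocate(4 * int.sizeof);
    a[] = 7;
    auto b = r.allocate(100);
    assert(a.length == 4 && b.length == 100);
    return r.bytesUsed;
} // r's destructor frees both blocks together

void main()
{
    writeln(demo());
}
```

Any slice that outlives the region dangles, which is the safety concern raised in the replies.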
Re: why allocators are not discussed here
Imagine we have two delegates:

    void* delegate(size_t); // this one allocs
    void delegate(void*);   // this one frees

You pass both to a function that constructs your object. The first is used for allocating the memory, the second gets attached to the TypeInfo and is used by the gc to free the object.
Then it's just GC but with an extra complication.

IMHO, not really, as the place you get the memory from is not managed by the GC, or at least not directly. The GC algorithm would see that there is a free delegate attached to the object and would use this to free the memory. The same should hold true for calling GC.free. Or are you talking about ref counting and such?
Re: why allocators are not discussed here
On Wednesday, 26 June 2013 at 14:26:03 UTC, H. S. Teoh wrote: Yeah, I think the best approach would be one that doesn't require changing a whole mass of code to support. Also, one that doesn't require language changes would be far more likely to be accepted, as the core D devs are leery of adding yet more complications to the language. That's why I proposed that gc_alloc and gc_free be made into thread-global function pointers, that can be swapped with a custom allocator's version. This doesn't have to be visible to user code; it can just be an implementation detail in std.allocator, for example. It allows us to implement custom allocators across a block of code that doesn't know (and doesn't need to know) what allocator will be used. Yes, being able to change gc_alloc, gc_free would do the work. If runtime remembers the stack of gc_alloc/gc_free functions like pushd, popd, that would simplify its usage. I think this is a very nice and simple solution to the problem.
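The pushd/popd analogy could look something like the sketch below. All names (allocStack, pushAllocator, popAllocator, activeAllocator) are invented for illustration, the stack itself is an ordinary GC-managed array, and malloc again stands in for the real gc_alloc.

```d
import core.stdc.stdlib : free, malloc;
import std.stdio : writeln;

alias AllocFn = void* function(size_t);

void* plainAlloc(size_t n) { return malloc(n); }

int tracedCalls; // how many allocations went through tracedAlloc
void* tracedAlloc(size_t n) { ++tracedCalls; return malloc(n); }

// Thread-local stack of allocators; the last element is the active one.
AllocFn[] allocStack;

void pushAllocator(AllocFn fn) { allocStack ~= fn; }

void popAllocator()
{
    assert(allocStack.length > 0, "pop without matching push");
    allocStack.length = allocStack.length - 1;
}

AllocFn activeAllocator()
{
    return allocStack.length ? allocStack[$ - 1] : &plainAlloc;
}

int demo()
{
    tracedCalls = 0;
    free(activeAllocator()(8));  // default allocator
    pushAllocator(&tracedAlloc);
    free(activeAllocator()(8));  // traced allocator
    popAllocator();
    free(activeAllocator()(8));  // default allocator again
    return tracedCalls;
}

void main()
{
    writeln(demo());
}
```

Pairing each push with a pop in a scope guard or destructor would give the nesting behaviour described in the message.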
Re: why allocators are not discussed here
26-Jun-2013 18:27, cybervadim wrote:
On Wednesday, 26 June 2013 at 14:17:03 UTC, Dmitry Olshansky wrote:
Awful. What has that extra syntax brought you? Except that now new is unsafe by design? Other questions involve how this allocation scope goes inside of functions, and what the mechanism is for passing it up and down the call stack. Last but not least, I fail to see how scoped allocators alone (as presented) solve even half of the problem.
Extra syntax lets me avoid touching the existing code. Imagine you have stateless event processing. That is, an event comes, you do some calculation, prepare the answer and send it back. It will look like:

    void onEvent(Event event) {
        process();
    }

Because it is stateless, you know all the memory allocated during processing will not be required afterwards.

Here is the chief problem - the assumption that is required to make it magically work. Now what I see is:

    T arr[]; // TLS
    // somewhere down the line
    arr = ... ;
    else {
        ...
        allocator(myAlloc) {
            arr = array(filter!);
        }
        ...
    }
    return arr;

Having an unsafe magic wand that may transmogrify some code to switch allocation strategy I consider naive and dangerous. Who ever told you that process does return before allocating a few Gigs of RAM (and hoping for a GC collection)? Right, nobody. Maybe it's an event loop that may run forever. What is missing is that code to date assumes new == GC and works _like that_.

So the syntax I suggested requires very little change in code. process() may be implemented using the std lib, doing several news and resizing. With the new syntax:

    void onEvent(Event event) {
        ScopedAllocator alloc;
        allocator(alloc) {
            process();
        }
    }

So now you do not use GC for all that is created inside process(). ScopedAllocator is a simple stack that will free all memory in one go. It is up to the runtime implementation to make sure all memory that is allocated inside the allocator{} scope is actually allocated using ScopedAllocator and not GC. Does it make sense?

Yes, but it's horribly broken.

-- Dmitry Olshansky
Re: why allocators are not discussed here
26-Jun-2013 03:16, Adam D. Ruppe wrote:
On Tuesday, 25 June 2013 at 22:50:55 UTC, H. S. Teoh wrote:
And maybe (b) can be implemented by making gc_alloc / gc_free overridable function pointers? Then we can override their values and use scope guards to revert them back to the values they were before.
Yea, I was thinking this might be a way to go. You'd have a global (well, thread-local) allocator instance that can be set and reset through stack calls. You'd want it to be RAII or delegate based, so the scope is clear.

    with_allocator(my_alloc, {
        do whatever here
    });

or

    {
        ChangeAllocator!my_alloc dummy;
        do whatever here
    } // dummy's destructor ends the allocator scope

Both suffer from
a) being totally unsafe and in fact bug prone, since all references obtained in there are now dangling (and there is no indication where they came from)
b) imagine you need to use an allocator for a stateful object. Say a forward range of some other ranges (e.g. std.regex), both scoped/stacked to allocate their internal stuff. The 2nd one may handle it but not the 1st one.
c) transfer of objects allocated differently up the call graph (scope graph?) is pretty much neglected, I see.

I'm kind of wondering how our knowledgeable community has come to this. (Must have been starving w/o allocators way too long.)

    {
        malloced_string str;
        auto got = to!string(10, str);
    } // str is out of scope, so it gets free()'d. unsafe though: if you stored a copy of got somewhere, it is now a pointer to freed memory. I'd kinda like language support of some sort to help mitigate that though, like being a borrowed pointer that isn't allowed to be stored, but that's another discussion.

In contrast, 'container as an output range' works safely and would still be customizable. IMHO the only place for allocators is in containers; other kinds of code may just ignore allocators completely.

std.algorithm and friends should imho be customized on 2 things only:
a) containers to use (instead of array)
b) optionally a memory source (or allocator) if the container is temporary (scoped), to tie its lifetime to something.

Want temporary stuff? Use temporary arrays, hashmaps and whatnot, i.e. types tailored for a particular use case (e.g. with a temporary/scoped allocator in mind). These would all be unsafe though. The alternative is ref-counting pointers to an allocator. With word on the street about ARC it could be a nice direction to pursue.

Allocators (as Andrei points out in his video) have many kinds:
a) persistence: infinite, manual, scoped
b) size: unlimited vs fixed
c) block-size: any, fixed, or *any* up to some maximum size

Most of these ARE NOT interchangeable! Yet some are composable; however, I'd argue that allocators are not composable but have some reusable parts that in turn are composable. Code would still have to cater for specific flavors of allocators, so we'd better reduce this problem to the selection of containers.

-- Dmitry Olshansky
Re: why allocators are not discussed here
26-Jun-2013 05:24, Adam D. Ruppe wrote:
I was just quickly skimming some criticism of C++ allocators, since my thought here is similar to what they do. On one hand, maybe D can do it right by tweaking C++'s design rather than discarding it.

Criticisms are:
A) Was defined to not have any state (as noted in the standard)
B) Parametrized on type (T), yet a container that is parametrized on it may need to allocate something else completely (a node with T).
C) Containers are parametrized on allocators, so say 2 lists with different allocators are incompatible in the sense that e.g. you can't splice pieces of them together.

From the above IMHO we can deduce that
a) We should support stateful allocators, but we have to make sure we don't pay storage space for stateless ones (global ones, e.g. mallocator).
b) Allocators should preferably be typeless and let containers define what they allocate
c) Hardly solvable unless we require a way to reassign objects between allocators (at least of similar kinds)

Anyway, bottom line is I don't think that criticism necessarily applies to D. But there's surely many others and I'm more or less a n00b re c++'s allocators so idk yet.

-- Dmitry Olshansky
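Point a) above (stateful allocators without paying storage for stateless ones) can be sketched with static if. This is only an illustration under assumptions: Mallocator, BumpAllocator and Box are hypothetical types written for this example, and the allocators are untyped (they deal in bytes) in the spirit of point b).

```d
import core.stdc.stdlib : free, malloc;
import std.stdio : writeln;

// Stateless: every member is static, so no instance needs to be stored.
struct Mallocator
{
    static void* allocate(size_t n) { return malloc(n); }
    static void deallocate(void* p) { free(p); }
}

// Stateful: carves blocks out of a caller-supplied buffer.
struct BumpAllocator
{
    void[] storage;
    size_t used;

    void* allocate(size_t n)
    {
        if (n > storage.length - used) return null;
        void* p = cast(ubyte*) storage.ptr + used;
        used += n;
        return p;
    }

    void deallocate(void* p) {} // individual frees are a no-op
}

struct Box(T, A)
{
    // Store an allocator instance only when it actually has fields.
    static if (A.tupleof.length == 0)
        alias alloc = A;
    else
        A alloc;

    T* item;

    void set(T value)
    {
        if (item is null) item = cast(T*) alloc.allocate(T.sizeof);
        assert(item !is null, "allocation failed");
        *item = value;
    }

    void clear() { alloc.deallocate(item); item = null; }
}

// The stateless allocator adds nothing to the container's size.
static assert(Box!(int, Mallocator).sizeof == (int*).sizeof);
static assert(Box!(int, BumpAllocator).sizeof > (int*).sizeof);

int demo()
{
    Box!(int, Mallocator) a;
    a.set(20);

    size_t[8] words; // word-aligned backing store for the bump allocator
    Box!(int, BumpAllocator) b;
    b.alloc = BumpAllocator(words[]);
    b.set(22);

    int sum = *a.item + *b.item;
    a.clear();
    b.clear();
    return sum;
}

void main()
{
    writeln(demo());
}
```

The two Box instantiations are still different types, so criticism C) about incompatible containers is untouched by this sketch.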
Re: why allocators are not discussed here
On Wed, Jun 26, 2013 at 01:16:31AM +0200, Adam D. Ruppe wrote: On Tuesday, 25 June 2013 at 22:50:55 UTC, H. S. Teoh wrote: And maybe (b) can be implemented by making gc_alloc / gc_free overridable function pointers? Then we can override their values and use scope guards to revert them back to the values they were before. Yea, I was thinking this might be a way to go. You'd have a global (well, thread-local) allocator instance that can be set and reset through stack calls. You'd want it to be RAII or delegate based, so the scope is clear. with_allocator(my_alloc, { do whatever here }); or { ChangeAllocator!my_alloc dummy; do whatever here } // dummy's destructor ends the allocator scope I think the former is a bit nicer, since the dummy variable is a bit silly. We'd hope that delegate can be inlined. Actually, D's frontend leaves something to be desired when it comes to inlining delegates. It *is* done sometimes, but not as often as one may like. For example, opApply generally doesn't inline its delegate, even when it's just a thin wrapper around a foreach loop. But yeah, I think the former has nicer syntax. Maybe we can help the compiler with inlining by making the delegate a compile-time parameter? But it forces a switch of parameter order, which is Not Nice (hurts readability 'cos the allocator argument comes after the block instead of before). But, the template still has a big advantage: you can change the type. And I think that is potentially enormously useful. True. It can use different types for different allocators that does (or doesn't) do cleanups at the end of the scope, depending on what the allocator needs to do. Another question is how to tie into output ranges. Take std.conv.to. auto s = to!string(10); // currently, this hits the gc What if I want it to go on a stack buffer? 
One option would be to rewrite it to use an output range, and then call it like:

    char[20] buffer;
    auto s = to!string(10, buffer); // it returns the slice of the buffer it actually used

(and we can do overloads so to!string(10, radix) still works, as well as to!string(10, radix, buffer). Hassle, I know...)

I think supporting the multi-argument version of to!string() is a good thing, but what to do with library code that calls to!string()? It'd be nice if we could somehow redirect those GC calls without having to comb through the entire Phobos codebase for stray calls to to!string(). [...]

The fun part is the output range works for that, and could also work for something like this:

    struct malloced_string {
        char* ptr;
        size_t length;
        size_t capacity;
        void put(char c) {
            if(length >= capacity) {
                // grow geometrically; start at 16 since capacity begins at 0
                capacity = capacity ? capacity * 2 : 16;
                ptr = cast(char*) realloc(ptr, capacity);
            }
            ptr[length++] = c;
        }
        char[] slice() { return ptr[0 .. length]; }
        alias slice this;
        mixin RefCounted!this; // pretend this works
    }

    {
        malloced_string str;
        auto got = to!string(10, str);
    } // str is out of scope, so it gets free()'d. unsafe though: if you stored a copy of got somewhere, it is now a pointer to freed memory.

I'd kinda like language support of some sort to help mitigate that though, like being a borrowed pointer that isn't allowed to be stored, but that's another discussion.

Nice! And that should work.

So then what we might do is provide these little output range wrappers for various allocators, and use them on many functions. So we'd write:

    import std.allocators;
    import std.range;
    // mallocator is provided in std.allocators and offers the goods
    OutputRange!(char, mallocator) str;
    auto got = to!string(10, str);

I like this. However, it still doesn't address how to override the default allocator in, say, Phobos functions.

What's nice here is the output range is useful for more than just allocators. You could also to!string(10, my_file) or a delegate, blah blah blah.
So it isn't too much of a burden, it is something you might naturally use anyway. Now *that* is a very nice idea. I like having a way of bypassing using a string buffer, and just writing the output directly to where it's intended to go. I think to() with an output range parameter definitely should be implemented. It doesn't address all of the issues, but it's a very big first step IMO. Also, we may have the problem of the wrong allocator being used to free the object. Another reason why encoding the allocator into the type is so nice. For the minimal D I've been playing with, the idea I'm running with is all allocated memory has some kind of special type, and then naked pointers are always assumed to be borrowed, so you should never store or free them. Interesting idea. So basically you can tell which allocator was used to allocate an object just by looking at its type? That's not a bad idea, actually. auto foo = HeapArray!char(capacity); void bar(char[] lol){} bar(foo); // allowed, foo has an alias this on slice
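The output-range idea discussed in this message can be tried today without the proposed to!string(10, buffer) overload, which remains hypothetical. The sketch below uses `std.format.formattedWrite`, which already accepts any output range; `StackSink` is an illustrative name, and the code assumes a Phobos version in which `formattedWrite` takes the writer by `auto ref` (older releases needed a pointer to the sink).

```d
import std.format : formattedWrite;

// A fixed-size output range living entirely on the stack.
struct StackSink
{
    char[20] data;
    size_t length;

    void put(char c)
    {
        assert(length < data.length, "StackSink overflow");
        data[length++] = c;
    }

    void put(const(char)[] s)
    {
        foreach (c; s) put(c);
    }

    // A borrowed view of the used part: valid only while the sink is alive.
    char[] slice() { return data[0 .. length]; }
}

void main()
{
    StackSink sink;
    formattedWrite(sink, "%s", 10);    // written straight into the stack buffer
    assert(sink.slice == "10");

    formattedWrite(sink, "-%x", 255);  // further writes append to the same buffer
    assert(sink.slice == "10-ff");
}
```

As noted in the message, the returned slice is borrowed: storing it past the lifetime of `sink` would leave a dangling reference, and nothing in the type system prevents that here.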
Re: why allocators are not discussed here
On Wed, Jun 26, 2013 at 04:31:40PM +0200, cybervadim wrote: On Wednesday, 26 June 2013 at 14:26:03 UTC, H. S. Teoh wrote: Yeah, I think the best approach would be one that doesn't require changing a whole mass of code to support. Also, one that doesn't require language changes would be far more likely to be accepted, as the core D devs are leery of adding yet more complications to the language. That's why I proposed that gc_alloc and gc_free be made into thread-global function pointers, that can be swapped with a custom allocator's version. This doesn't have to be visible to user code; it can just be an implementation detail in std.allocator, for example. It allows us to implement custom allocators across a block of code that doesn't know (and doesn't need to know) what allocator will be used. Yes, being able to change gc_alloc, gc_free would do the work. If runtime remembers the stack of gc_alloc/gc_free functions like pushd, popd, that would simplify its usage. I think this is a very nice and simple solution to the problem. Adam's idea does this: tie each replacement of gc_alloc/gc_free to some stack-based object, that automatically cleans up in the dtor. 
So something along these lines:

    struct CustomAlloc(A) {
        void* function(size_t size) old_alloc;
        void function(void* ptr) old_free;

        this(A alloc) {
            old_alloc = gc_alloc;
            old_free = gc_free;
            gc_alloc = A.alloc;
            gc_free = A.free;
        }

        ~this() {
            gc_alloc = old_alloc;
            gc_free = old_free;

            // Cleans up, e.g., region allocator deletes the
            // region
            A.cleanup();
        }
    }

    class C {}

    void main() {
        auto c = new C(); // allocates using default allocator (GC)
        {
            CustomAlloc!MyAllocator _;

            // Everything from here on until end of block
            // uses MyAllocator
            auto d = new C(); // allocates using MyAllocator
            {
                CustomAlloc!AnotherAllocator _;
                auto e = new C(); // allocates using AnotherAllocator

                // End of scope: auto cleanup, gc_alloc and
                // gc_free reverts back to MyAllocator
            }
            auto f = new C(); // allocates using MyAllocator

            // End of scope: auto cleanup, gc_alloc and
            // gc_free reverts back to default values
        }
        auto g = new C(); // allocates using default allocator
    }

So you effectively have an allocator stack, and user code never has to directly manipulate gc_alloc/gc_free (which would be dangerous).

T

-- Almost all proofs have bugs, but almost all theorems are true. -- Paul Pedersen
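A self-contained approximation of this allocator stack is sketched below. It cannot actually redirect `new`, since druntime exposes no overridable gc_alloc/gc_free pointers as proposed; instead it assumes hypothetical thread-local hooks `currentAlloc` and `currentFree` that cooperating code calls explicitly. `ScopedAllocator`, `countingAlloc` and the other names are illustrative only.

```d
import core.stdc.stdlib : malloc, free;

alias AllocFn = void* function(size_t);
alias FreeFn  = void function(void*);

void* mallocAlloc(size_t n) { return malloc(n); }
void  mallocFree(void* p)   { free(p); }

// Module-level variables are thread-local by default in D,
// so each thread gets its own "current allocator".
AllocFn currentAlloc = &mallocAlloc;
FreeFn  currentFree  = &mallocFree;

// Instrumented allocator, used only to observe which hook is active.
int countedCalls;
void* countingAlloc(size_t n) { ++countedCalls; return malloc(n); }

// RAII guard: swaps the hooks in, restores the previous ones in the destructor.
struct ScopedAllocator
{
    private AllocFn oldAlloc;
    private FreeFn  oldFree;

    @disable this();       // must be constructed with explicit hooks
    @disable this(this);   // copying would restore the hooks twice

    this(AllocFn a, FreeFn f)
    {
        oldAlloc = currentAlloc;
        oldFree  = currentFree;
        currentAlloc = a;
        currentFree  = f;
    }

    ~this()
    {
        currentAlloc = oldAlloc;
        currentFree  = oldFree;
    }
}

void main()
{
    auto p = currentAlloc(16);           // default hook
    assert(countedCalls == 0);
    {
        auto guard = ScopedAllocator(&countingAlloc, &mallocFree);
        auto q = currentAlloc(16);       // goes through countingAlloc
        assert(countedCalls == 1);
        currentFree(q);
    }                                    // guard's destructor restores the default
    auto r = currentAlloc(16);
    assert(countedCalls == 1);
    currentFree(r);
    currentFree(p);
}
```

Because the guards nest, the saved hooks behave like the pushd/popd stack mentioned earlier. The sketch does nothing to stop a pointer obtained inside the block from escaping it, which is the safety objection raised in the replies.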
Re: why allocators are not discussed here
Some type system help is required to guarantee that references to such scope-allocated data won't escape.
Re: why allocators are not discussed here
On Wed, Jun 26, 2013 at 06:51:54PM +0400, Dmitry Olshansky wrote: 26-Jun-2013 03:16, Adam D. Ruppe wrote: On Tuesday, 25 June 2013 at 22:50:55 UTC, H. S. Teoh wrote: And maybe (b) can be implemented by making gc_alloc / gc_free overridable function pointers? Then we can override their values and use scope guards to revert them back to the values they were before. Yea, I was thinking this might be a way to go. You'd have a global (well, thread-local) allocator instance that can be set and reset through stack calls. You'd want it to be RAII or delegate based, so the scope is clear. with_allocator(my_alloc, { do whatever here }); or { ChangeAllocator!my_alloc dummy; do whatever here } // dummy's destructor ends the allocator scope Both suffer from a) being totally unsafe and in fact bug prone since all references obtained in there are now dangling (and there is no indication where they came from) How is this different from using malloc() and free() manually? You have no indication of where a void* came from either, and the danger of dangling references is very real, as any C/C++ coder knows. And I assume that *some* people will want to be defining custom allocators that wrap around malloc/free (e.g. the game engine guys who want total control). b) imagine you need to use an allocator for a stateful object. Say forward range of some other ranges (e.g. std.regex) both scoped/stacked to allocate its internal stuff. 2nd one may handle it but not the 1st one. Yeah this is a complicated area. A container basically needs to know how to allocate its elements. So somehow that information has to be somewhere. c) transfer of objects allocated differently up the call graph (scope graph?), is pretty much neglected I see. They're incompatible. You can't safely make a linked list that contains both GC-allocated nodes and malloc() nodes. That's just a bomb waiting to explode in your face. 
So in that sense, Adam's idea of using a different type for differently-allocated objects makes sense. A container has to declare what kind of allocation its members are using; any other way is asking for trouble. I'm kind of wondering how our knowledgeable community has come to this. (must have been starving w/o allocators way too long) We're just trying to provoke Andrei into responding. ;-) [...] IMHO the only place for allocators is in containers; other kinds of code may just ignore allocators completely. But some people clamoring for allocators are doing so because they're bothered by Phobos using ~ for string concatenation, which implicitly uses the GC. I don't think we can just ignore that. std.algorithm and friends should imho be customized on 2 things only: a) containers to use (instead of array) b) optionally a memory source (or allocator) if the container is temporary (scoped), to tie its life-time to smth. Want temporary stuff? Use temporary arrays, hashmaps and whatnot i.e. types tailored for a particular use case (e.g. with a temporary/scoped allocator in mind). These would all be unsafe though. Alternative is ref-counting pointers to an allocator. With word on street about ARC it could be nice direction to pursue. Ref-counting is not fool-proof, though. There's always cycles to mess things up. Allocators (as Andrei points out in his video) have many kinds: a) persistence: infinite, manual, scoped b) size: unlimited vs fixed c) block-size: any, fixed, or *any* up to some maximum size Most of these ARE NOT interchangeable! Yet some are composable however I'd argue that allocators are not composable but have some reusable parts that in turn are composable. I was listening to Andrei's talk this morning, but I didn't quite understand what he means by composable allocators. Is he talking about nesting, say, a GC inside a region allocated by a region allocator? 
Code would have to cater for specific flavors of allocators still, so we'd better reduce this problem to the selection of containers. [...] Hmm. Sounds like we have two conflicting things going on here: 1) En masse replacement of gc_alloc/gc_free in a certain block of code (which may be the entire program), e.g., for the avoidance of GC in game engines, etc. Basically, the code is allocator-agnostic, but at some higher level we want to control which allocator is being used. 2) Specific customization of containers, etc., as to which allocator(s) should be used, with (hopefully) some kind of support from the type system to prevent mistakes like dangling pointers, escaping references, etc. Here, the code is NOT allocator-agnostic; it has to be written with the specific allocation model in mind. You can't just replace the allocator with another one without introducing bugs or problems. These two may interact in complex ways... e.g., you might want to use malloc to allocate a pool, then use a custom gc_alloc/gc_free to allocate from this pool in order to support language
Re: why allocators are not discussed here
By the way, while this topic gets some attention, I want to note that there are actually two orthogonal entities that arise when speaking about configurable allocation - allocators themselves and global allocation policies. I think good design should address both of those. For example, changing the global allocator for a custom one has limited usability - you are anyway limited by the language design, which makes only GC or ref-counting viable general options. However, some way to prohibit automatic allocations at runtime while still allowing manual ones may be useful - and it does not matter what allocator is actually used to get that memory. Once such an API is designed, tighter classification and control may be added with time.
Re: why allocators are not discussed here
On Wednesday, 26 June 2013 at 17:25:24 UTC, H. S. Teoh wrote: I was listening to Andrei's talk this morning, but I didn't quite understand what he means by composable allocators. Is he talking about nesting, say, a GC inside a region allocated by a region allocator? Maybe he was talking about a freelist allocator over a reap, as described by the HeapLayers project http://heaplayers.org/ in the paper from 2001 titled 'Composing High-Performance Memory Allocators'. I'm pretty sure that web site was referenced in the talk. A few publications there are from Andrei. I agree that D should support programming without a GC, with different GCs than the default one, and custom allocators, and that features which demand a GC will be troublesome. -- Brian
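One reading of "composable allocators", in the spirit of the layered designs mentioned above, is that each allocator is a thin layer parametrized on a parent and falling through to it. The sketch below is an assumption-laden illustration, not the HeapLayers code or Andrei's design: `FreeList` and `Mallocator` are hypothetical names, and the parent is assumed to be stateless (static methods only).

```d
import core.stdc.stdlib : malloc, free;

// Bottom layer: plain malloc/free.
struct Mallocator
{
    static void* allocate(size_t n) { return malloc(n); }
    static void deallocate(void* p) { free(p); }
}

// A layer that recycles blocks of one fixed size and
// forwards every other request to Parent.
struct FreeList(Parent, size_t blockSize)
{
    static assert(blockSize >= (void*).sizeof, "block must hold a link pointer");

    private static struct Node { Node* next; }
    private Node* head;
    size_t reused;   // instrumentation only

    void* allocate(size_t n)
    {
        if (n == blockSize && head !is null)
        {
            auto p = head;        // pop a cached block
            head = head.next;
            ++reused;
            return p;
        }
        return Parent.allocate(n);
    }

    void deallocate(void* p, size_t n)
    {
        if (n == blockSize)
        {
            auto node = cast(Node*) p;   // push onto the free list
            node.next = head;
            head = node;
        }
        else
            Parent.deallocate(p);
    }

    // Return cached blocks to the parent when the layer goes away.
    ~this()
    {
        while (head !is null)
        {
            auto next = head.next;
            Parent.deallocate(head);
            head = next;
        }
    }
}

void main()
{
    FreeList!(Mallocator, 32) fl;
    auto a = fl.allocate(32);
    fl.deallocate(a, 32);        // cached, not freed
    auto b = fl.allocate(32);    // served from the free list
    assert(b is a);
    assert(fl.reused == 1);
    fl.deallocate(b, 32);
}
```

Swapping `Mallocator` for another bottom layer (a region, say) changes the behavior without touching `FreeList`, which is the sense in which the parts are reusable.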
Re: why allocators are not discussed here
On Wednesday, 26 June 2013 at 16:40:20 UTC, H. S. Teoh wrote: I think supporting the multi-argument version of to!string() is a good thing, but what to do with library code that calls to!string()? It'd be nice if we could somehow redirect those GC calls without having to comb through the entire Phobos codebase for stray calls to to!string(). Let's consider what kinds of allocations we have. We can break them up into two broad groups: internal and visible. Internal allocations, in theory, don't matter. These can be on the stack, the gc heap, malloc/free, whatever. The function itself is responsible for their entire lifetime. Changing these either optimize, in the case of reusing a region, or leak if you switch it to manual and the function doesn't know it. Visible allocations are important because the caller is responsible for freeing them. Here, I really think we want the type system's help: either it should return something that we know we're responsible for, or take a buffer/output range from us to receive the data in the first place. Either way, the function signature should reflect what's going on with visible allocations. It'd possibly return a wrapped type and it'd take an output range/buffer/allocator. With internals though, the only reason I can see why you'd want to change them outside the function is to give them a region of some sort to work with, especially since you don't know for sure what it is doing - these are all local variables to the function/call stack. And here, I don't think we want to change the allocator wholesale. At most, we'd want to give it hints that what we're doing are short lived. (Or, better yet, have it figure this out on its own, like a generational gc.) 
So I think this is more about tweaking the gc than replacing it, at most adding a couple new functions to it: GC.hint_short_lived // returns a helper struct with a static refcount: TempGcAllocator { static int tempCount = 0; static void* localRegion; this() { tempCount++; } // pretend this works ~this() { tempCount--; if(tempCount == 0) gc.tryToCollect(localRegion); } T create(T, Args...)(Args args) { return GC.new_short_lived T(args); } } and gc.tryToCollect() does a quick scan for anything into the local region. If there's nothing in there, it frees the whole thing. If there is, in the name of memory safety, it just reintegrates that local region into the regular memory and gc's its components normally. The reason the count is static is that you don't have to pass this thing down the call stack. Any function that wants to adapt to this generational hint system just calls hint_short_lived. If you're a leaf function, that's ok, the static count means you'll inherit the region from the function above you. You would NOT use this in main(), as that defeats the purpose. I think to() with an output range parameter definitely should be implemented. No doubt about it, we should aim for most phobos functions not to allocate at all, if given an output range they can use. Interesting idea. So basically you can tell which allocator was used to allocate an object just by looking at its type? Right, then you'll know if you have to free() it. (Or it can free itself with its destructor.) This is a bit inconvenient. So your member variables will have to know what allocation type is being used. Not the end of the world, of course, but not as pretty as one would like. Yeah, you'd need to know if you own them or not too (are you responsible for freeing that string you just got passed? If no, are you sure it won't be freed while you're still using it?), but I just think that's a part of memory management you can't sidestep. 
There's two easy answers: 1) always make a private copy of anything you store (and perhaps write to) or 2) use a gc and trust it to always be the owner. In any other case, I think you *have* to think about it, and the type telling you can help you make that decision. and allows you to mix differently-allocated objects without having to Important to remember though that you are borrowing these references, not taking ownership. I think the rule of all pointers/slices are borrowed is fairly workable though. With the gc, that's ok, you don't own anything. The garbage collector is responsible for it all, so store away. (Though if it is mutable, you might want to idup it so you don't get overwritten by someone else. But that's a separate question from allocation method and already encoded in D's type system). So never free() a naked pointer, unless you know what you're doing like interfacing with a C library, prefer to only free a ManuallyAllocated!(pointer). hell a C library binding could change the type too, it'd still be binary compatible. RefCounted!T wouldn't be, but ManuallyAllocated!T would just be a wrapper around T*. I think I'm starting to ramble!
Re: why allocators are not discussed here
On Wednesday, 26 June 2013 at 17:25:24 UTC, H. S. Teoh wrote: malloc to allocate a pool, then use a custom gc_alloc/gc_free to allocate from this pool in order to support language built-ins like ~ and ~= without needing to rewrite every function that uses strings. Blargh, I forgot about operator ~ on built ins. For custom types it is easy enough to manage, just overload it. You can even do ~= on types that aren't allowed to allocate, if they have a certain capacity set up ahead of time (like a stack buffer) But for built ins, blargh, I don't even think we can hint on them to the gc. Maybe we should just go ahead and make the gc generational. (If you aren't using gc, I say leave binary ~ unimplemented in all cases. Use ~= on a temporary instead whenever you would do that. It is easier to follow the lifetime if you explicitly declare your temporary.)
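The suggestion above (support ~= on a custom type with preset capacity, leave binary ~ unimplemented) might look like the following sketch. `FixedString` is a hypothetical name; overflow is handled with a plain assert for brevity.

```d
// A string buffer with fixed, stack-allocated capacity.
struct FixedString(size_t capacity)
{
    private char[capacity] data;
    private size_t len;

    // ~= appends in place and never allocates.
    ref FixedString opOpAssign(string op)(const(char)[] s)
        if (op == "~")
    {
        assert(len + s.length <= capacity, "FixedString overflow");
        data[len .. len + s.length] = s[];
        len += s.length;
        return this;
    }

    // Borrowed view of the contents.
    const(char)[] slice() const { return data[0 .. len]; }
}

void main()
{
    FixedString!16 s;
    s ~= "foo";
    s ~= "bar";
    assert(s.slice == "foobar");

    // Binary ~ is deliberately not defined, so this does not compile:
    static assert(!__traits(compiles, s ~ "x"));
}
```

Leaving out opBinary means every temporary has to be declared explicitly, which keeps its lifetime visible at the use site.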
Re: why allocators are not discussed here
On Wednesday, 26 June 2013 at 14:59:41 UTC, Dmitry Olshansky wrote: Here is a chief problem - the assumption that is required to make it magically work. Now what I see is: T arr[];//TLS //somewhere down the line arr = ... ; else{ ... alloctor(myAlloc){ arr = array(filter!); } ... } return arr; Having an unsafe magic wand that may transmogrify some code to switch allocation strategy I consider naive and dangerous. Who ever told you process does return before allocating a few Gigs of RAM (and hoping on GC collection)? Right, nobody. Maybe it's an event loop that may run forever. What is missing is that code up to date assumes new == GC and works _like that_. Not magic, but the tool which is quite powerful and thus it may shoot your leg. This is unsafe, but if you want it safe, don't use allocators, stay with GC. In the example above, you get first arr freed by GC, second arr may point to nothing if myAlloc was implemented to free it before. Or you may get a proper arr reference if myAlloc used malloc and didn't free it. The fact that you may write bad code does not make the language (or concept) bad.
Re: why allocators are not discussed here
26-Jun-2013 23:04, cybervadim wrote: On Wednesday, 26 June 2013 at 14:59:41 UTC, Dmitry Olshansky wrote: Having an unsafe magic wand that may transmogrify some code to switch allocation strategy I consider naive and dangerous. Who ever told you process does return before allocating a few Gigs of RAM (and hoping on GC collection)? Right, nobody. Maybe it's an event loop that may run forever. What is missing is that code up to date assumes new == GC and works _like that_. Not magic, but the tool which is quite powerful and thus it may shoot your leg. I know what kind of thing you are talking about. It ain't powerful, it's just a hack that doesn't quite do what's advertised. This is unsafe, but if you want it safe, don't use allocators, stay with GC. BTW you were talking about changing allocation of code you didn't write. There is not even a single fact that makes the thing safe. It's all working by chance or because the thing was designed to work with a scoped allocator to begin with. I believe the 2nd case (designed to use scoped allocation) is where a) the behavior is guaranteed (determinism vs GC etc.) b) safety is assured by the designer, not by pure luck (and a reasonable assumption that may not hold). In the example above, you get first arr freed by GC, second arr may point to nothing if myAlloc was implemented to free it before. Or you may get a proper arr reference if myAlloc used malloc and didn't free it. Yeah I know, hence I showed it. BTW forget about malloc, I'm not talking about explicit malloc being an alternative to your scheme. The fact that you may write bad code does not make the language (or concept) bad. It does. Because it introduces easy, unreliable and bug-prone usage. -- Dmitry Olshansky
Re: why allocators are not discussed here
26-Jun-2013 21:35, Dicebot wrote: By the way, while this topic gets some attention, I want to note that there are actually two orthogonal entities that arise when speaking about configurable allocation - allocators themselves and global allocation policies. I think good design should address both of those. Sadly I believe that global allocators would still have to be compatible with GC (to not break code in hard to track ways) thus basically being a GC. Hence we can easily stop talking about them ;) -- Dmitry Olshansky
Re: why allocators are not discussed here
26-Jun-2013 21:23, H. S. Teoh wrote: Both suffer from a) being totally unsafe and in fact bug prone since all references obtained in there are now dangling (and there is no indication where they came from) How is this different from using malloc() and free() manually? You have no indication of where a void* came from either, and the danger of dangling references is very real, as any C/C++ coder knows. And I assume that *some* people will want to be defining custom allocators that wrap around malloc/free (e.g. the game engine guys who want total control).
Why the heck do you people think I propose to use malloc directly as an alternative to whatever hackish allocator stack is proposed? Use the darn container. For starters I'd make allocation strategy a parameter of each container. At least they do OWN memory. Then refactor out common pieces into a framework of allocation helpers. I'd personally, in the end, separate concerns into 3 entities:
1. Memory area objects - think of them as allocators but without the circuitry to do the allocation, e.g. a chunk of memory returned by malloc/alloca can be wrapped into a memory area object.
2. Allocators (Policies) - a potentially nested combination of such circuitry that makes use of memory areas. Free-lists, pools, stacks etc. Safe ones have ref-counting on memory areas, unsafe ones don't. (Though safety largely depends on the way you got that chunk of memory)
3. Containers/Wrappers - as above, objects that handle the life-cycle of objects and make use of allocators. In fact allocators are part of the type but memory area objects are not.
b) imagine you need to use an allocator for a stateful object. Say forward range of some other ranges (e.g. std.regex) both scoped/stacked to allocate its internal stuff. 2nd one may handle it but not the 1st one. Yeah this is a complicated area. A container basically needs to know how to allocate its elements. So somehow that information has to be somewhere. 
c) transfer of objects allocated differently up the call graph (scope graph?), is pretty much neglected I see. They're incompatible. You can't safely make a linked list that contains both GC-allocated nodes and malloc() nodes. What I mean is that if types are the same as built-ins it would be a horrible mistake. If not then we are talking about containers anyway. And if these have a ref-counted pointer to their allocator then the whole thing is safe albeit at the cost of performance. Sadly alias this to some built-in (=e.g. slice) allows squirreling away underlying reference too easily. As such I don't believe in any of the 2 *lies*: a) built-ins can be refurbished to use custom allocators b) we can add opSlice/alias this or whatever to our custom type to get access to the underlying built-ins safely and transparently Both are just nuclear bombs waiting a good time to explode. That's just a bomb waiting to explode in your face. So in that sense, Adam's idea of using a different type for differently-allocated objects makes sense. Yes, but one should be careful here as not to have exponential explosion in the code size. So some allocators have to be compatible and if there is a way to transfer ownership it'd be bonus points (and a large pot of these mind you). A container has to declare what kind of allocation its members are using; any other way is asking for trouble. Hence my thoughts to move this piece of circuitry to containers proper. The whole idea that by swapping malloc with myMalloc you can translate to a wildly different allocation scheme doesn't quite hold. I think it may be interesting to try and put a wall in different place namely in between allocation strategy and memory areas it works on. I kind of wondering how our knowledgeable community has come to this. (must have been starving w/o allocators way too long) We're just trying to provoke Andrei into responding. ;-) Cool, then keep it coming but ... safety and other holes has to be taken care of. 
[...] IMHO the only place for allocators is in containers; other kinds of code may just ignore allocators completely. But some people clamoring for allocators are doing so because they're bothered by Phobos using ~ for string concatenation, which implicitly uses the GC. I don't think we can just ignore that. ~= would work with any sensible array-like container. ~ is sadly only a convenience for scripts and/or apps that are not performance (determinism) critical. std.algorithm and friends should imho be customized on 2 things only: a) containers to use (instead of array) b) optionally a memory source (or allocator) if the container is temporary (scoped), to tie its life-time to smth. Want temporary stuff? Use temporary arrays, hashmaps and whatnot i.e. types tailored for a particular use case (e.g. with a temporary/scoped allocator in mind). These would all be unsafe though. Alternative is ref-counting pointers to an allocator. With word on street about ARC it
Re: why allocators are not discussed here
On Wed, 26 Jun 2013 16:30:50 +0200, Robert Schadek realbur...@gmx.de wrote: Imagine we have two delegates: void* delegate(size_t); // this one allocs void delegate(void*); // this one frees You pass both to a function that constructs your object. The first is used for allocating the memory, the second gets attached to the TypeInfo and is used by the gc to free the object. Does it mean 16 extra bytes for every allocation? -- Marco
Re: why allocators are not discussed here
On 06/26/2013 10:06 PM, Marco Leise wrote: Does it mean 16 extra bytes for every allocation? Yes, or wrap it, and you have 4 or 8 bytes, but yes, you would have to save it somewhere.
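The size arithmetic in this exchange can be checked directly: a D delegate is a context pointer plus a function pointer, so storing one per allocation costs two machine words (16 bytes on a 64-bit target), whereas a pointer to a shared record costs one (4 or 8 bytes). `AllocatorPair` and `BlockHeader` below are hypothetical names used only to illustrate the "wrap it" option; nothing here attaches anything to TypeInfo.

```d
import core.stdc.stdlib : malloc, free;

alias AllocDg = void* delegate(size_t);
alias FreeDg  = void delegate(void*);

// A delegate is two words: context pointer + function pointer.
static assert(FreeDg.sizeof == 2 * (void*).sizeof);

// "Wrap it": keep both delegates once, in a shared record...
struct AllocatorPair
{
    AllocDg allocate;
    FreeDg  release;
}

// ...so each allocation only carries one pointer back to that record.
struct BlockHeader
{
    AllocatorPair* owner;
}
static assert(BlockHeader.sizeof == (void*).sizeof);

void main()
{
    size_t frees;
    AllocatorPair pair;
    pair.allocate = (size_t n) => malloc(n);
    pair.release  = (void* p) { ++frees; free(p); };

    // Header sits in front of a 32-byte payload.
    auto block = cast(BlockHeader*) pair.allocate(BlockHeader.sizeof + 32);
    assert(block !is null);
    block.owner = &pair;

    // Whoever frees the block finds the right deallocator through the header.
    block.owner.release(block);
    assert(frees == 1);
}
```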
Re: why allocators are not discussed here
On Wednesday, 26 June 2013 at 19:40:54 UTC, Dmitry Olshansky wrote: Sadly I believe that global allocators would still have to be compatible with GC (to not break code in hard to track ways) thus basically being a GC. Hence we can easily stop talking about them ;) Nice way to say we don't really need those embedded, kernel and gamedev guys. GC, as a safe and obvious approach, should be the default, but druntime needs to provide means for tight and dangerous control upon explicit request.
Re: why allocators are not discussed here
27-Jun-2013 00:53, Dicebot wrote: On Wednesday, 26 June 2013 at 19:40:54 UTC, Dmitry Olshansky wrote: Sadly I believe that global allocators would still have to be compatible with GC (to not break code in hard to track ways) thus basically being a GC. Hence we can easily stop talking about them ;) Nice way to say we don't really need those embedded, kernel and gamedev guys. GC, as a safe and obvious approach, should be the default, but druntime needs to provide means for tight and dangerous control upon explicit request. Just don't use certain built-ins. Stub them out in run-time if you like. The only problematic point I see is closures allocated on heap. Frankly I see embedded, kernel and gamedev guys using ref-counting and custom data structures all the time. They all want that level of control and determinism anyway or are so resource constrained that GC is too much code space or run-time overhead anyway. Needless to say that custom run-time for the first 2 categories is required anyway so just hack the druntime. It would be nice to have hooks readily available (and documented?) to do so but hardly beyond that. -- Dmitry Olshansky
Re: why allocators are not discussed here
On Wednesday, 26 June 2013 at 21:00:54 UTC, Dmitry Olshansky wrote: Needless to say that custom run-time for the first 2 categories is required anyway so just hack the druntime. It would be nice to have hooks readily available (and documented?) to do so but hardly beyond that. It is an API issue. Hacking druntime is, unfortunately, inevitable, but keeping the ability to swap those two with no code changes simplifies the development process and makes it less tempting to forget about this use case when doing std lib / runtime stuff - it has been a second-class citizen for a rather long time.
Re: why allocators are not discussed here
On Wednesday, 26 June 2013 at 21:00:54 UTC, Dmitry Olshansky wrote: Just don't use certain built-ins. Stub them out in run-time if you like. The only problematic point I see is closures allocated on heap.

Actually, I was kinda sorta able to solve this in my minimal d.

    // this would be used for automatic heap closures, but there's no way to free it...
    ///*
    extern(C) void* _d_allocmemory(size_t bytes) {
        auto ptr = manual_malloc(bytes);
        debug(allocations) {
            char[16] buffer;
            write("warning: automatic memory allocation ", intToString(cast(size_t) ptr, buffer));
        }
        return ptr;
    }

    struct HeapClosure(T) if(is(T == delegate)) {
        mixin SimpleRefCounting!(T, q{
            char[16] buffer;
            write("\nfreeing closure ", intToString(cast(size_t) payload.ptr, buffer), "\n");
            manual_free(payload.ptr);
        });
    }

    HeapClosure!T makeHeapClosure(T)(T t) { // if(__traits(isNested, T)) {
        return HeapClosure!T(t);
    }

    void closureTest2(HeapClosure!(void delegate()) test) {
        write("\nptr is ", cast(size_t) test.ptr, "\n");
        test();
        auto b = test;
    }

    void closureTest() {
        string a = "whoa";
        scope(exit) write("\n\nexit\n\n");
        //throw new Exception("test");
        closureTest2( makeHeapClosure({ write(a); }) );
    }

It worked in my toy tests. The trick would be though to never store or use a non-scope builtin delegate. Using RTInfo, I believe I can statically verify you don't do this in the whole program, but haven't actually tried yet.

I also left built in append unimplemented, but did custom types with ~= that are pretty convenient. Binary ~ is a loss though, too easy to lose pointers with that.
Re: why allocators are not discussed here
27-Jun-2013 01:05, Adam D. Ruppe wrote: On Wednesday, 26 June 2013 at 21:00:54 UTC, Dmitry Olshansky wrote: Just don't use certain built-ins. Stub them out in run-time if you like. The only problematic point I see is closures allocated on heap. Actually, I was kinda sorta able to solve this in my minimal d. // this would be used for automatic heap closures, but there's no way to free it... [snip a cool hack] Yeah, I suspected something like this might work. Basically defining your own ref-counted closure type and forgoing the delegate keyword in your codebase (except in the file that defines the heap closure). That still leaves chasing code like auto dg = (...){ ... } though. Maybe having it as a template Closure!(ret-type, arg types...) and an instantiator function called simply closure could be more aesthetically pleasing (this is IMHO). It worked in my toy tests. The trick would be though to never store or use a non-scope builtin delegate. Using RTInfo, I believe I can statically verify you don't do this in the whole program, but haven't actually tried yet. I also left built in append unimplemented, but did custom types with ~= that are pretty convenient. Binary ~ is a loss though, too easy to lose pointers with that. -- Dmitry Olshansky
Re: why allocators are not discussed here
So to try some ideas, I started implementing a simple container with replaceable allocators: a singly linked list. All was going kinda well until I realized the forward range it offers to iterate its contents makes it possible to escape a reference to a freed node. auto range = list.range; auto range2 = range; range.removeFront(); After this, range2 refers to a freed node. Maybe the nodes could be refcounted, though a downside there is that even the range won't be sharable: it would be a different type based on allocation method. (I was hoping to make the range be a sharable component, even as the list itself changed type with allocators.) I guess we could @disable copy construction, and make it an input range instead of a forward one, but that takes some of the legitimate usefulness away. Interestingly though, opApply would be ok here, since all it would expose is the payload. (Though if the payload is a reference type, does the container take ownership of it? How do we indicate that? Perhaps more interestingly, how do we indicate the /lack/ of ownership at the transfer point?) This is all fairly easy if we just decide we're going to do this with GC, or we're going to do this C style and do the whole program like that, libraries and all. But trying to mix and match just gets more complicated the more I think about it :( It makes the question of allocators look trivial.
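The opApply remark can be illustrated with a minimal sketch. It assumes a hypothetical MallocList of ints backed directly by malloc/free (not the replaceable-allocator container from the post), and it only shows that internal iteration hands out payloads rather than node pointers. In @system code a caller could still take the address of the ref payload, so this narrows the problem rather than solving it.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical singly linked list of ints; the names are illustrative only.
struct MallocList
{
    private static struct Node { int payload; Node* next; }
    private Node* head;

    @disable this(this); // the list owns its nodes, so forbid implicit copies

    ~this()
    {
        while (head !is null) removeFront();
    }

    void insertFront(int value)
    {
        auto n = cast(Node*) malloc(Node.sizeof);
        assert(n !is null, "out of memory");
        n.payload = value;
        n.next = head;
        head = n;
    }

    void removeFront()
    {
        assert(head !is null);
        auto n = head;
        head = n.next;
        free(n);
    }

    // Internal iteration: the callback sees each payload, never a Node*.
    int opApply(scope int delegate(ref int) dg)
    {
        for (auto n = head; n !is null; n = n.next)
            if (auto r = dg(n.payload)) return r;
        return 0;
    }
}
```

With this, foreach (x; list) works as usual, but there is no range object left over that could outlive a removed node.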
Re: why allocators are not discussed here
On Thu, Jun 27, 2013 at 12:43:54AM +0200, Adam D. Ruppe wrote: So to try some ideas, I started implementing a simple container with replaceable allocators: a singly linked list. All was going kinda well until I realized the forward range it offers to iterate its contents makes it possible to escape a reference to a freed node. [...] (though if the payload is a reference type, does the container take ownership of it? How do we indicate that? Perhaps more interestingly, how do we indicate the /lack/ of ownership at the transfer point?) Maybe a type distinction akin to C++'s auto_ptr might help? Say we introduce OwnedRef!T vs. plain old T*. So something returning OwnedRef!T will need to assume ownership of the object, whereas something returning T* would just be returning a reference, but the container continues to hold ownership over the object. This is all fairly easy if we just decide we're going to do this with GC or we're going to do this C style and do the whole program like that, libraries and all. But trying to mix and match just gets more complicated the more I think about it :( It makes the question of allocators look trivial. Heh. Yeah, I'm starting to wonder if it even makes sense to try to mix-n-match GC-based and non-GC-based allocators. It seems that maybe we just have to settle for the fact of life that a GC-based object is fundamentally incompatible with a pool-allocated object, and both are also fundamentally incompatible with malloc-allocated objects, 'cos you need the code to be aware in each instance of what needs to be done to clean up, etc. T -- GEEK = Gatherer of Extremely Enlightening Knowledge
Re: why allocators are not discussed here
On Wednesday, 26 June 2013 at 23:02:47 UTC, H. S. Teoh wrote:
Maybe a type distinction akin to C++'s auto_ptr might help?

Yeah, that's what I'm thinking, but I don't really like it. Perhaps I'm trying too hard to cover everything, and should be happier with just doing what C++ does. Full memory safety is prolly out the window anyway. In std.typecons, there's a Unique!T, but it doesn't look complete. A lot of the code is commented out, maybe it was started back in the days of bug city.

Yeah, I'm starting to wonder if it even makes sense to try to mix-n-match GC-based and non-GC-based allocators.

It might not be so bad if we modified D to add a lent storage class, or something, similar to some discussions about scope in the past. These would be values you may work with, but never keep; assigning them to anything is not allowed and you may only pass them to a function or return them from a function if that is also marked lent. Any regular reference would be implicitly usable as lent.

    int* ptr;
    void bar(int* a) {
        foo(a); // ok
    }
    int* foo(lent int* a) {
        bar(a); // error, cannot call bar with lent pointer
        ptr = a; // error, cannot assign lent value to non-lent field
        foo2(a); // ok
        foo(foo2(a)); // ok
        return a; // error, cannot return a lent value
    }
    lent int* foo2(lent int* a) {
        return a; // ok
    }
    foo(ptr); // ok (if foo actually compiled)

And finally, if you take the address of a lent reference, that itself is lent; &(lent int*) == lent int**. Then, if possible, it would be cool if:

    lent int* a;
    {
        int* b;
        a = b;
    }

That was an error, because a outlived b. But since you can't store a anywhere, the only time this would happen would be something like here. And hell maybe we could hammer around that by making lent variables head const and say they must be initialized at declaration, so lent int* a; is illegal as well as a = b;.
But we wouldn't want it transitively const, because then: void fillBuffer(lent char[] buffer) {} would be disallowed and that is something I would definitely want. Part of me thinks pure might help with this too but eh maybe not because even a pure function could in theory escape a reference via its other parameters. But with this kind of thing, we could do a nicer pointer type that does: lent T getThis() { return _this; } alias getThis this; and thus implicitly convert our inner pointer to something we can use on the outside world with some confidence that they won't sneak away any references to it. If combined with @disabling the address of operator on the container itself, we could really lock down ownership.
Re: why allocators are not discussed here
On Wed, Jun 26, 2013 at 12:22:04AM +0200, cybervadim wrote: I know Andrey mentioned he was going to work on Allocators a year ago. In DConf 2013 he described the problems he needs to solve with Allocators. But I wonder if I am missing the discussion around that - I tried searching this forum, found a few threads that was not actually a brain storm for Allocators design. Please point me in the right direction or is there a reason it is not discussed or should we open the discussion? That would be nice to get things going. :) Ever since I found D and subscribed to this mailing list, I've been hearing rumors of allocators, but they seem to be rather lacking in the department of concrete evidence. They're like the Big Foot or Swamp Ape of D. Maybe it's time we got out into the field and produced some real evidence of these mythical beasts. :-P The easiest approach for Allocators design I can imagine would be to let user specify which Allocator operator new should get the memory from (introducing a new keyword allocator). This gives a total control, but assumes user knows what he is doing. Example: CustomAllocator ca; allocator(ca) { auto a = new A; // operator new will use ScopeAllocator::malloc() auto b = new B; free(a); // that should call ScopeAllocator::free() // if free() is missing for allocated area, it is a user responsibility to make sure custom Allocator can handle that } By default allocator is the druntime using GC, free(a) does nothing for it. I believe the current direction is to avoid needing new language features / syntax. So the above probably won't happen. if some library defines its allocator (e.g. specialized container), there should be ability to: 1. override allocator 2. get access to the allocator used I understand that I spent 5 mins thinking about the way Allocators may look. My point is - if somebody is working on it, can you please share your ideas? Well, thanks for getting the ball rolling. 
Maybe Andrei can pipe up about any experimental designs he's currently considering. But barring that, I'm thinking about how allocators would be used in user code. I think it's pretty much a given that the C++ way of sticking it to the end of template arguments doesn't really fly: it's just too much of a hassle to keep having to worry about passing allocators around template arguments, that people just don't bother. So coming back to square one, how would allocators be used? 1) Usually, the user would just be content with the GC, and not ever have to worry about allocators. So this means that whatever allocator design we adopt, it should be practically invisible to ordinary users unless they're specifically looking to change how memory is allocated. 2) Furthermore, it's unlikely that in the same piece of code, you'd want to use 3 or 4 different allocators for different objects; while such cases may exist, it seems to me to be more likely that you want either (a) a very specific object (say a class instance or container) to use a particular allocator, or (b) you want to transitively block off an entire section of code (which may be the entire program in some cases) to use a particular allocator. As a first stab at it, I'd say (a) can be implemented by a static class member reference to an allocator, that can be set from user code. And maybe (b) can be implemented by making gc_alloc / gc_free overridable function pointers? Then we can override their values and use scope guards to revert them back to the values they were before. This allows us to use the runtime stack to manage which allocator is currently active. This lets *all* memory allocations be rerouted through the custom allocator without needing to hand-edit every call to new down the call graph. This is just a very crude first stab at the problem, though. In particular, (a) isn't very satisfactory. 
And also the interaction of allocated objects with the call stack: if any custom-allocated objects in (b) survive past the containing function which sets/resets the function pointers, there could be problems: if a member function of such an object needs to allocate memory, it will pick up the ambient allocator instead of the custom allocator in effect when the object was first created. Also, we may have the problem of the wrong allocator being used to free the object. Anyone has better ideas? T -- All problems are easy in retrospect.
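Option (b) can be sketched roughly as follows. This is an illustration under assumptions, not druntime's actual interface: currentAlloc, currentFree and withAllocator are hypothetical names, and the real gc_alloc / gc_free hooks are not touched. The sketch only shows thread-local function pointers being swapped and then restored by a scope guard.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

alias AllocFn = void* function(size_t);
alias FreeFn  = void function(void*);

void* gcAlloc(size_t n) { return GC.malloc(n); }
void  gcFree(void* p)   { /* no-op: the collector reclaims it later */ }
void* cAlloc(size_t n)  { return malloc(n); }
void  cFree(void* p)    { free(p); }

// Module-level variables are thread-local by default in D,
// so each thread carries its own "ambient" allocator.
AllocFn currentAlloc = &gcAlloc;
FreeFn  currentFree  = &gcFree;

// Hypothetical helper: run `work` with a different ambient allocator,
// restoring the previous one even if `work` throws.
void withAllocator(AllocFn a, FreeFn f, scope void delegate() work)
{
    auto oldAlloc = currentAlloc;
    auto oldFree  = currentFree;
    currentAlloc = a;
    currentFree  = f;
    scope (exit) { currentAlloc = oldAlloc; currentFree = oldFree; }
    work();
}
```

The sketch also makes the stated weakness visible: memory obtained inside withAllocator and freed after it returns would go through the restored currentFree, which is the wrong-allocator problem described above.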
Re: why allocators are not discussed here
On Tuesday, 25 June 2013 at 22:22:09 UTC, cybervadim wrote: (introducing a new keyword allocator) It would be easier to just pass an allocator object that provides the necessary methods and don't use new at all. (I kinda wish new wasn't in the language. It'd make this a little more consistent.) The allocator's create function could also return wrapped types, like RefCounted!T or NotNull!T depending on what it does. Though the devil is in the details here and I don't think I can say more without trying to actually do it.
Re: why allocators are not discussed here
On Wed, Jun 26, 2013 at 12:50:36AM +0200, Adam D. Ruppe wrote:
On Tuesday, 25 June 2013 at 22:22:09 UTC, cybervadim wrote: (introducing a new keyword allocator)
It would be easier to just pass an allocator object that provides the necessary methods and don't use new at all. (I kinda wish new wasn't in the language. It'd make this a little more consistent.)

It's not too late to introduce a default allocator object that maps to built-in GC primitives. Maybe something like:

    struct DefaultAllocator {
        T* alloc(T, A...)(A args) {
            return new T(args);
        }
        void free(T)(T* p) {
            // no-op
        }
    }

We can then change Phobos to always use allocator.alloc and allocator.free, which it gets from user code somehow, and in the default case it would do the Right Thing.

The allocator's create function could also return wrapped types, like RefCounted!T or NotNull!T depending on what it does.

So maybe something like:

    struct RefCountedAllocator {
        RefCounted!T alloc(T, A...)(A args) {
            return allocRefCounted(args);
        }
        void free(T)(RefCounted!T r) {
            dotDotDotMagic(r);
        }
    }

etc.

Though the devil is in the details here and I don't think I can say more without trying to actually do it.

The main issue I see is how *not* to get stuck in C++'s situation where you have to specify allocator objects everywhere, which is highly inconvenient, so people tend to avoid it, which defeats the purpose of having allocators. It would be nice, IMO, if we can somehow let the user specify a custom allocator for, say, the whole of Phobos, so that people who care about this sorta thing can just replace the GC wholesale and then use Phobos to their hearts' content without having to manually specify allocator objects everywhere and risk forgetting a single case that eventually leads to memory leakage.

T -- Computers shouldn't beep through the keyhole.
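A compilable variant of the allocator-object idea might look like the sketch below. All names here (GCAllocator, MallocAllocator, make, dispose, Point, sumWith) are hypothetical and chosen for illustration; this is not a Phobos API. The point is only that generic code can be written against "some allocator" and behave the same with either one.

```d
import core.stdc.stdlib : malloc, free;
import std.conv : emplace;

// Maps onto the built-in GC: dispose is deliberately a no-op.
struct GCAllocator
{
    T* make(T, A...)(A args) { return new T(args); }
    void dispose(T)(T* p) { }
}

// Same interface, but backed by the C heap.
struct MallocAllocator
{
    T* make(T, A...)(A args)
    {
        auto p = cast(T*) malloc(T.sizeof);
        assert(p !is null, "out of memory");
        return emplace(p, args); // construct the object in the raw memory
    }

    void dispose(T)(T* p)
    {
        destroy(*p); // run the destructor, if any
        free(p);
    }
}

struct Point
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
}

// Generic code only relies on make/dispose being present.
int sumWith(Alloc)(ref Alloc alloc)
{
    auto p = alloc.make!Point(3, 4);
    scope (exit) alloc.dispose(p);
    return p.x + p.y;
}
```

Note that sumWith still has to receive the allocator from its caller, which is exactly the "specify allocator objects everywhere" burden the post wants to avoid.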
Re: why allocators are not discussed here
On Tuesday, 25 June 2013 at 22:50:55 UTC, H. S. Teoh wrote: On Wed, Jun 26, 2013 at 12:22:04AM +0200, cybervadim wrote: That would be nice to get things going. :) Ever since I found D and subscribed to this mailing list, I've been hearing rumors of allocators, but they seem to be rather lacking in the department of concrete evidence. They're like the Big Foot or Swamp Ape of D. Maybe it's time we got out into the field and produced some real evidence of these mythical beasts. :-P Well, thanks for getting the ball rolling. Maybe Andrei can pipe up about any experimental designs he's currently considering. But barring that, I'm thinking about how allocators would be used in user code. I think it's pretty much a given that the C++ way of sticking it to the end of template arguments doesn't really fly: it's just too much of a hassle to keep having to worry about passing allocators around template arguments, that people just don't bother. So coming back to square one, how would allocators be used? 1) Usually, the user would just be content with the GC, and not ever have to worry about allocators. So this means that whatever allocator design we adopt, it should be practically invisible to ordinary users unless they're specifically looking to change how memory is allocated. 2) Furthermore, it's unlikely that in the same piece of code, you'd want to use 3 or 4 different allocators for different objects; while such cases may exist, it seems to me to be more likely that you want either (a) a very specific object (say a class instance or container) to use a particular allocator, or (b) you want to transitively block off an entire section of code (which may be the entire program in some cases) to use a particular allocator. As a first stab at it, I'd say (a) can be implemented by a static class member reference to an allocator, that can be set from user code. And maybe (b) can be implemented by making gc_alloc / gc_free overridable function pointers? 
Then we can override their values and use scope guards to revert them back to the values they were before. This allows us to use the runtime stack to manage which allocator is currently active. This lets *all* memory allocations be rerouted through the custom allocator without needing to hand-edit every call to new down the call graph. This is just a very crude first stab at the problem, though. In particular, (a) isn't very satisfactory. And also the interaction of allocated objects with the call stack: if any custom-allocated objects in (b) survive past the containing function which sets/resets the function pointers, there could be problems: if a member function of such an object needs to allocate memory, it will pick up the ambient allocator instead of the custom allocator in effect when the object was first created. Also, we may have the problem of the wrong allocator being used to free the object. Anyone has better ideas? T From my experience all objects may be divided into 2 categories 1. temporaries. Program usually have some kind of event loop. During one iteration of this loop some temporary objects are created and then discarded. The ideal case for stack (or ranged or area) allocator, where you define allocator at the beginning of the loop cycle, use it for all temporaries, then free all the memory in one go at the end of iteration. 2. containers. Program receives an event from the outside and puts some data into container OR update the data if the record already exists. The important thing here is - when updating the data in container, you may want to resize the existing area. If you are working with temporary which should be placed into container, a copy can be made (with corresponding memory allocation from container allocator). Not sure if there is anything better than stack/area allocator for the first class. For the second class user should be able to choose default GC or more precise memory handling (e.g. explicit malloc/free for resizing). 
Am I missing anything in this categorization? Even if we only get allocators that let us deal with temporaries, that will be a huge benefit.
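The "temporaries" category can be sketched as a bump-pointer region that is reset once per loop iteration. This is a minimal illustration with a hypothetical Region type; offsets are rounded up to 16 bytes relative to the start of the buffer, and no destructors are run on reset.

```d
// Hypothetical fixed-size region: allocation bumps an offset,
// and reset() releases everything at once.
struct Region(size_t capacity)
{
    private ubyte[capacity] store;
    private size_t used;

    // Returns null when the region is exhausted.
    void[] allocate(size_t n)
    {
        enum size_t alignment = 16;
        // Round the current offset up to the next multiple of `alignment`.
        immutable start = (used + alignment - 1) & ~(alignment - 1);
        if (start > capacity || n > capacity - start) return null;
        used = start + n;
        return store[start .. used];
    }

    // Discard all temporaries in one go, e.g. at the end of a loop iteration.
    void reset() { used = 0; }

    size_t bytesUsed() const { return used; }
}
```

In an event loop, one would declare the region before the loop, allocate temporaries from it while handling an event, and call reset() at the end of each iteration. Anything that must survive the iteration has to be copied into a container with its own allocator, as described above.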
Re: why allocators are not discussed here
On Tuesday, 25 June 2013 at 22:50:55 UTC, H. S. Teoh wrote:
And maybe (b) can be implemented by making gc_alloc / gc_free overridable function pointers? Then we can override their values and use scope guards to revert them back to the values they were before.

Yea, I was thinking this might be a way to go. You'd have a global (well, thread-local) allocator instance that can be set and reset through stack calls. You'd want it to be RAII or delegate based, so the scope is clear.

    with_allocator(my_alloc, {
        do whatever here
    });

or

    {
        ChangeAllocator!my_alloc dummy;
        do whatever here
    } // dummy's destructor ends the allocator scope

I think the former is a bit nicer, since the dummy variable is a bit silly. We'd hope that the delegate can be inlined. But the template still has a big advantage: you can change the type. And I think that is potentially enormously useful.

Another question is how to tie into output ranges. Take std.conv.to.

    auto s = to!string(10); // currently, this hits the gc

What if I want it to go on a stack buffer? One option would be to rewrite it to use an output range, and then call it like:

    char[20] buffer;
    auto s = to!string(10, buffer); // it returns the slice of the buffer it actually used

(and we can do overloads so to!string(10, radix) still works, as well as to!string(10, radix, buffer). Hassle, I know...) Naturally, the default argument is to use the 'global' allocator, whatever that is, which does nothing special. The fun part is the output range works for that, and could also work for something like this:

    struct malloced_string {
        char* ptr;
        size_t length;
        size_t capacity;
        void put(char c) {
            if(length >= capacity) {
                capacity = capacity ? capacity * 2 : 16;
                ptr = cast(char*) realloc(ptr, capacity);
            }
            ptr[length++] = c;
        }
        char[] slice() { return ptr[0 .. length]; }
        alias slice this;
        mixin RefCounted!this; // pretend this works
    }

    {
        malloced_string str;
        auto got = to!string(10, str);
    } // str is out of scope, so it gets free()'d.
unsafe though: if you stored a copy of got somewhere, it is now a pointer to freed memory. I'd kinda like language support of some sort to help mitigate that though, like being a borrowed pointer that isn't allowed to be stored, but that's another discussion. And that should work. So then what we might do is provide these little output range wrappers for various allocators, and use them on many functions. So we'd write: import std.allocators; import std.range; // mallocator is provided in std.allocators and offers the goods OutputRange!(char, mallocator) str; auto got = to!string(10, str); What's nice here is the output range is useful for more than just allocators. You could also to!string(10, my_file) or a delegate, blah blah blah. So it isn't too much of a burden, it is something you might naturally use anyway. Also, we may have the problem of the wrong allocator being used to free the object. Another reason why encoding the allocator into the type is so nice. For the minimal D I've been playing with, the idea I'm running with is all allocated memory has some kind of special type, and then naked pointers are always assumed to be borrowed, so you should never store or free them. auto foo = HeapArray!char(capacity); void bar(char[] lol){} bar(foo); // allowed, foo has an alias this on slice // but struct A { char[] lol; // not allowed, because you don't know when lol is going to be freed } foo frees itself with refcounting.
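The stack-buffer case can already be approximated with std.format.formattedWrite, which accepts any output range of characters. The sketch below is an illustration, not the proposed to!string overload: FixedSink and intToBuffer are hypothetical names, and the sink simply asserts if the buffer is too small.

```d
import std.format : formattedWrite;

// Hypothetical output range that writes into a caller-supplied buffer.
struct FixedSink
{
    char[] buffer;
    size_t length;

    void put(char c)
    {
        assert(length < buffer.length, "buffer too small");
        buffer[length++] = c;
    }

    void put(const(char)[] s)
    {
        foreach (c; s) put(c);
    }

    // The part of the buffer that was actually written.
    char[] slice() { return buffer[0 .. length]; }
}

// Formats `value` into `buffer` and returns the used slice.
char[] intToBuffer(int value, char[] buffer)
{
    auto sink = FixedSink(buffer);
    formattedWrite(sink, "%s", value);
    return sink.slice();
}
```

Usage would be along the lines of char[20] buf; auto s = intToBuffer(10, buf[]);. The returned slice is borrowed from buf, so the same caveat as above applies: storing it past the lifetime of the buffer leaves a dangling reference.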
Re: why allocators are not discussed here
cybervadim: From my experience all objects may be divided into 2 categories 1. temporaries. Program usually have some kind of event loop. During one iteration of this loop some temporary objects are created and then discarded. The ideal case for stack (or ranged or area) allocator, where you define allocator at the beginning of the loop cycle, use it for all temporaries, then free all the memory in one go at the end of iteration. 2. containers. Program receives an event from the outside and puts some data into container OR update the data if the record already exists. The important thing here is - when updating the data in container, you may want to resize the existing area. Many garbage collectors use the same idea (and manage it automatically), with two or three different generations: http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Generational_GC_.28ephemeral_GC.29 Bye, bearophile
Re: why allocators are not discussed here
Many garbage collectors use the same idea (and manage it automatically), with two or three different generations: http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Generational_GC_.28ephemeral_GC.29 Bye, bearophile The problem with GC is that it doesn't know which is temporary and which is not, so it has to traverse tree to determine that. Allocators in my opinion should let user specify explicitly the temporaries.
Re: why allocators are not discussed here
I was just quickly skimming some criticism of C++ allocators, since my thought here is similar to what they do. On one hand, maybe D can do it right by tweaking C++'s design rather than discarding it. On the other hand, with all the C++ I've done, I have never actually used STL allocators, which could say something about me or could say something about them. One thing I saw said making the differently allocated object a different type sucks. ...but must it? The complaint there was so much for just doing a function that takes a std::string. But, the way I'd want to do it in D is the function would take a char[] instead, and our special allocated type provides that via opSlice and/or alias this. So you'd only have to worry about the different type if you intend to take ownership of the container yourself. Which we already kinda think about in D: if you store a char[], someone else could overwrite it, so we prefer to store an immutable(char)[] aka string. If you're given a char[] and want to store it, you might idup. So I don't think doing a private copy with some other allocation scheme is any more of a hassle. (BTW immutable objects IMO should *always* be garbage collected, because part of immutability is infinite lifetime. So we might want to be careful with implicit conversions to immutable based on allocation method, which I believe we can protect through member functions.) Anyway, bottom line is I don't think that criticism necessarily applies to D. But there's surely many others and I'm more or less a n00b re c++'s allocators so idk yet.
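The "function takes char[], the owning type converts via alias this" idea can be sketched as follows. HeapChars, countSpaces and upcaseFirst are hypothetical names, and the type uses unique ownership with copying disabled instead of refcounting to keep the example short. Nothing here stops a callee from storing the borrowed slice; that part remains a convention.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical malloc-backed character buffer with a single owner.
struct HeapChars
{
    private char* ptr;
    private size_t len;

    @disable this(this); // unique ownership: no implicit copies

    this(const(char)[] text)
    {
        ptr = cast(char*) malloc(text.length ? text.length : 1);
        assert(ptr !is null, "out of memory");
        ptr[0 .. text.length] = text[];
        len = text.length;
    }

    ~this() { free(ptr); }

    // Borrowed view: valid only while this HeapChars is alive.
    char[] slice() { return ptr[0 .. len]; }
    alias slice this;
}

// Ordinary functions take slices and never see the allocation scheme.
size_t countSpaces(const(char)[] text)
{
    size_t n = 0;
    foreach (c; text) if (c == ' ') ++n;
    return n;
}

void upcaseFirst(char[] text)
{
    if (text.length && text[0] >= 'a' && text[0] <= 'z')
        text[0] = cast(char)(text[0] - 32);
}
```

Only code that wants to take ownership has to name HeapChars; everything else keeps its char[] or const(char)[] signature, which is the point made above about the C++ std::string complaint.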