Re: Success! (Precisely)
dsimcha wrote: 2. I'm thinking about how to write the bitmask templates. In the next release of DMD, when static arrays are value types and returnable from functions, will they be returnable from functions in CTFE? Yes.
Re: Success! (Precisely)
On 10/30/09 06:08, dsimcha wrote: After a few evenings of serious hacking, I've integrated precise heap scanning into the GC. Right now, I still need to test it better and debug it, but it at least basically works. I also still need to write the templates to generate bit masks at compile time, but this is a simple matter of programming. A few things: 1. Who knows how to write some good stress tests to make sure this works? 2. I'm thinking about how to write the bitmask templates. In the next release of DMD, when static arrays are value types and returnable from functions, will they be returnable from functions in CTFE? 3. new only takes RTTI. It is not a template. Unless RTTI gets bitmasks in the format I created (which I'll document once I clean things up and release, but has only deviated slightly from what I had talked about here), stuff allocated using it won't be able to take advantage of precise heap scanning. The default bitmask, if none is provided, uses good (bad) old-fashioned conservative scanning unless the entire block has no pointers, in which case it isn't scanned. This means that we have all the more incentive to replace new with a template of some kind. 4. I solved the static array problem, but the solution required using up one of the high-order bits. We have at least one more to play with in my bitmask scheme, because I'm storing things by word offsets, not byte offsets, since the GC isn't supposed to work with misaligned pointers anyhow. This leaves one more bit before we start limiting T.sizeof to less than full address space (on 32-bit, where a word is 4 bytes). I think it needs to be reserved for pinning, in case a copying collector ever gets implemented. If we're willing to not let any precisely scanned object have a T.sizeof of more than half the address space (a ridiculously minor limitation; this does not limit the size of arrays, only the size of classes and the elements of an array), we could throw in a third bit for weak references. Would this be possible to use with D1 ?
Re: Success! (Precisely)
== Quote from Jacob Carlborg (d...@me.com)'s article > On 10/30/09 06:08, dsimcha wrote: > > After a few evenings of serious hacking, I've integrated precise heap > > scanning > > into the GC. Right now, I still need to test it better and debug it, but it > > at least basically works. I also still need to write the templates to > > generate bit masks at compile time, but this is a simple matter of > > programming. > > > > A few things: > > > > 1. Who knows how to write some good stress tests to make sure this works? > > > > 2. I'm thinking about how to write the bitmask templates. In the next > > release of DMD, when static arrays are value types and returnable from > > functions, will they be returnable from functions in CTFE? > > > > 3. new only takes RTTI. It is not a template. Unless RTTI gets bitmasks > > in > > the format I created (which I'll document once I clean things up and > > release, > > but has only deviated slightly from what I had talked about here), stuff > > allocated using it won't be able to take advantage of precise heap scanning. > > The default bitmask, if none is provided, uses good (bad) old-fashioned > > conservative scanning unless the entire block has no pointers, in which case > > it isn't scanned. This means that we have all the more incentive to replace > > new with a template of some kind. > > > > 4. I solved the static array problem, but the solution required using up > > one > > of the high-order bits. We have at least one more to play with in my > > bitmask > > scheme, because I'm storing things by word offsets, not byte offsets, since > > the GC isn't supposed to work with misaligned pointers anyhow. This leaves > > one more bit before we start limiting T.sizeof to less than full address > > space > > (on 32-bit, where a word is 4 bytes). I think it needs to be reserved for > > pinning, in case a copying collector ever gets implemented. If we're > > willing > > to not let any precisely scanned object have a T.sizeof of more than half > > the > > address space (a ridiculously minor limitation; this does not limit the size > > of arrays, only the size of classes and the elements of an array), we could > > throw in a third bit for weak references. > Would this be possible to use with D1 ? The precise heap scanning would probably work, if the bit masks were generated manually, but I don't know if D1's templates are powerful enough to generate them. I had D2 in mind as a target, but I'll document the format once everything's cleaned up, tested, etc. and if someone wants to try to make it work on D1, they can. The only thing, though, is that D1 is supposed to be stable, and adding bit masks as an argument o GC.malloc might not fly. Then again, it would only be a lib change, probably in Tango. Furthermore, I made old-fashioned conservative scanning the default (a bit mask called conservativeBitMask is stored in the static data segment and is the default argument to GC.malloc) specifically to avoid breaking any compatibility at the source level.
Re: Success! (Precisely)
dsimcha wrote: After a few evenings of serious hacking, I've integrated precise heap scanning into the GC. Right now, I still need to test it better and debug it, but it at least basically works. I also still need to write the templates to generate bit masks at compile time, but this is a simple matter of programming. [snip] This is great news. Hopefully you'll be willing to integrate your code with druntime. I forwarded Sean your post to make sure your work doesn't slip unnoticed. Andrei
Re: Success! (Precisely)
dsimcha, el 30 de octubre a las 05:08 me escribiste: > After a few evenings of serious hacking, I've integrated precise heap scanning > into the GC. Right now, I still need to test it better and debug it, but it > at least basically works. I also still need to write the templates to > generate bit masks at compile time, but this is a simple matter of > programming. > > A few things: > > 1. Who knows how to write some good stress tests to make sure this works? If somebody have this, I'm very interested too. Being D2 it will be much harder to find programs to test. I found Dil to be a good candidate for testing, at least it failed with some bugs other smaller, simpler test didn't. But it's for D1. And congratulations! Great news. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ -- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) -- Cada movimiento que no se hace, es un movimiento que se pierde. Y cada movimiento que se pierde, se transforma en una mochila. Y las mochilas nos alejan, de nuestros amigos y nuestras amigas. Y nuestros amigos se transforman, en enemigos y en enemigas. Cada movimiento que se hace, es una mochila que se deja.
Re: Success! (Precisely)
== Quote from Andrei Alexandrescu (seewebsiteforem...@erdani.org)'s article > This is great news. Hopefully you'll be willing to integrate your code > with druntime. That was kind of the point all along. I'll file it in Bugzilla after I finish the bitmask generation, testing and debugging.
Re: Success! (Precisely)
Leandro Lucarella: > If somebody have this, I'm very interested too. Two D translations of the Olden Benchmarks, more to come: http://www.fantascienza.net/leonardo/js/index.html Bye, bearophile
Re: Success! (Precisely)
== Quote from bearophile (bearophileh...@lycos.com)'s article > Leandro Lucarella: > > If somebody have this, I'm very interested too. > Two D translations of the Olden Benchmarks, more to come: > http://www.fantascienza.net/leonardo/js/index.html > Bye, > bearophile These might be worth a try, but they're more benchmarks for performance than tests of correctness, if I understand correctly. I was looking for the latter. The improvements I'm working on have nothing to do with speed (though I may throw in a few optimizations I found while I'm at it). They have to do with preventing the GC from retaining memory unnecessarily.
Re: Success! (Precisely)
dsimcha: > These might be worth a try, but they're more benchmarks for performance than > tests > of correctness, if I understand correctly.< Yes, I understand. What kind of code are you looking for then? Bye, bearophile
Re: Success! (Precisely)
== Quote from bearophile (bearophileh...@lycos.com)'s article > dsimcha: > > These might be worth a try, but they're more benchmarks for performance > > than tests > > of correctness, if I understand correctly.< > Yes, I understand. What kind of code are you looking for then? > Bye, > bearophile I was just thinking maybe someone else (I mostly had Sean and Walter in mind, but would certainly accept solutions from others) had written a GC test suite and never released it. If not, I'll probably just write my own and include it with the patches.
Re: Success! (Precisely)
> Yes, I understand. What kind of code are you looking for then? You have already answered, in a way. You need complex code that runs for a lot of time, while keeping the total amount of memory used constant. You also surely need some synthetic tests, to spot eventual bugs better. I don't know if I have code for you, probably not. Indeed the Olden benchmarks stress the GC, but only its performance, not its precision... Bye, bearophile
Re: Success! (Precisely)
On Fri, 30 Oct 2009 08:08:10 +0300, dsimcha wrote: After a few evenings of serious hacking, I've integrated precise heap scanning into the GC. Right now, I still need to test it better and debug it, but it at least basically works. I also still need to write the templates to generate bit masks at compile time, but this is a simple matter of programming. A few things: 1. Who knows how to write some good stress tests to make sure this works? 2. I'm thinking about how to write the bitmask templates. In the next release of DMD, when static arrays are value types and returnable from functions, will they be returnable from functions in CTFE? 3. new only takes RTTI. It is not a template. Unless RTTI gets bitmasks in the format I created (which I'll document once I clean things up and release, but has only deviated slightly from what I had talked about here), stuff allocated using it won't be able to take advantage of precise heap scanning. The default bitmask, if none is provided, uses good (bad) old-fashioned conservative scanning unless the entire block has no pointers, in which case it isn't scanned. This means that we have all the more incentive to replace new with a template of some kind. 4. I solved the static array problem, but the solution required using up one of the high-order bits. We have at least one more to play with in my bitmask scheme, because I'm storing things by word offsets, not byte offsets, since the GC isn't supposed to work with misaligned pointers anyhow. This leaves one more bit before we start limiting T.sizeof to less than full address space (on 32-bit, where a word is 4 bytes). I think it needs to be reserved for pinning, in case a copying collector ever gets implemented. If we're willing to not let any precisely scanned object have a T.sizeof of more than half the address space (a ridiculously minor limitation; this does not limit the size of arrays, only the size of classes and the elements of an array), we could throw in a third bit for weak references. Blaze (http://www.dsource.org/projects/blaze) is often suggested for stress-testing the GC. Probably, because it does huge amount of dynamic allocations, while total amount of memory consumed is about the same. Worth a note, it's for D1/Tango, but you said you were going to port it to Tango, too, so it may be better to start with Tango (because there are a lot more code written against Tango and you get instant user feedback) and then port it to druntime. If not a performance test, it may be a good correctness test (so that you don't collect memory which is still referenced).
Re: Success! (Precisely)
== Quote from Denis Koroskin (2kor...@gmail.com)'s article > On Fri, 30 Oct 2009 08:08:10 +0300, dsimcha wrote: > > After a few evenings of serious hacking, I've integrated precise heap > > scanning > > into the GC. Right now, I still need to test it better and debug it, > > but it > > at least basically works. I also still need to write the templates to > > generate bit masks at compile time, but this is a simple matter of > > programming. > > > > A few things: > > > > 1. Who knows how to write some good stress tests to make sure this > > works? > > > > 2. I'm thinking about how to write the bitmask templates. In the next > > release of DMD, when static arrays are value types and returnable from > > functions, will they be returnable from functions in CTFE? > > > > 3. new only takes RTTI. It is not a template. Unless RTTI gets > > bitmasks in > > the format I created (which I'll document once I clean things up and > > release, > > but has only deviated slightly from what I had talked about here), stuff > > allocated using it won't be able to take advantage of precise heap > > scanning. > > The default bitmask, if none is provided, uses good (bad) old-fashioned > > conservative scanning unless the entire block has no pointers, in which > > case > > it isn't scanned. This means that we have all the more incentive to > > replace > > new with a template of some kind. > > > > 4. I solved the static array problem, but the solution required using > > up one > > of the high-order bits. We have at least one more to play with in my > > bitmask > > scheme, because I'm storing things by word offsets, not byte offsets, > > since > > the GC isn't supposed to work with misaligned pointers anyhow. This > > leaves > > one more bit before we start limiting T.sizeof to less than full address > > space > > (on 32-bit, where a word is 4 bytes). I think it needs to be reserved > > for > > pinning, in case a copying collector ever gets implemented. If we're > > willing > > to not let any precisely scanned object have a T.sizeof of more than > > half the > > address space (a ridiculously minor limitation; this does not limit the > > size > > of arrays, only the size of classes and the elements of an array), we > > could > > throw in a third bit for weak references. > Blaze (http://www.dsource.org/projects/blaze) is often suggested for > stress-testing the GC. Probably, because it does huge amount of dynamic > allocations, while total amount of memory consumed is about the same. > Worth a note, it's for D1/Tango, but you said you were going to port it to > Tango, too, so it may be better to start with Tango (because there are a > lot more code written against Tango and you get instant user feedback) and > then port it to druntime. If not a performance test, it may be a good > correctness test (so that you don't collect memory which is still > referenced). Clarification: I didn't say **I** was going to port this to Tango. I've been targeting D2/druntime all along. A Tango port is not out of the question if the port ends up being fairly trivial and the Tango devs are ok with breaking backwards compatibility, at least at the binary level. However, if the port is non-trivial then it's unlikely that I'll do it since I don't use D1 and, if Tango ever gets ported to D2 it will run on top of druntime. I just said that if **someone else** cares enough to port this to Tango, it shouldn't be terribly hard, since druntime is basically the Tango runtime.
Re: Success! (Precisely)
On 10/30/09 14:29, dsimcha wrote: == Quote from Jacob Carlborg (d...@me.com)'s article On 10/30/09 06:08, dsimcha wrote: After a few evenings of serious hacking, I've integrated precise heap scanning into the GC. Right now, I still need to test it better and debug it, but it at least basically works. I also still need to write the templates to generate bit masks at compile time, but this is a simple matter of programming. A few things: 1. Who knows how to write some good stress tests to make sure this works? 2. I'm thinking about how to write the bitmask templates. In the next release of DMD, when static arrays are value types and returnable from functions, will they be returnable from functions in CTFE? 3. new only takes RTTI. It is not a template. Unless RTTI gets bitmasks in the format I created (which I'll document once I clean things up and release, but has only deviated slightly from what I had talked about here), stuff allocated using it won't be able to take advantage of precise heap scanning. The default bitmask, if none is provided, uses good (bad) old-fashioned conservative scanning unless the entire block has no pointers, in which case it isn't scanned. This means that we have all the more incentive to replace new with a template of some kind. 4. I solved the static array problem, but the solution required using up one of the high-order bits. We have at least one more to play with in my bitmask scheme, because I'm storing things by word offsets, not byte offsets, since the GC isn't supposed to work with misaligned pointers anyhow. This leaves one more bit before we start limiting T.sizeof to less than full address space (on 32-bit, where a word is 4 bytes). I think it needs to be reserved for pinning, in case a copying collector ever gets implemented. If we're willing to not let any precisely scanned object have a T.sizeof of more than half the address space (a ridiculously minor limitation; this does not limit the size of arrays, only the size of classes and the elements of an array), we could throw in a third bit for weak references. Would this be possible to use with D1 ? The precise heap scanning would probably work, if the bit masks were generated manually, but I don't know if D1's templates are powerful enough to generate them. I had D2 in mind as a target, but I'll document the format once everything's cleaned up, tested, etc. and if someone wants to try to make it work on D1, they can. The only thing, though, is that D1 is supposed to be stable, and adding bit masks as an argument o GC.malloc might not fly. Then again, it would only be a lib change, probably in Tango. Furthermore, I made old-fashioned conservative scanning the default (a bit mask called conservativeBitMask is stored in the static data segment and is the default argument to GC.malloc) specifically to avoid breaking any compatibility at the source level. Ok, well I didn't think phobos would change so I was think about the possibility to add this to tango. It seems it's mostly a runtime thing but I may be wrong.
Re: Success! (Precisely)
Denis Koroskin Wrote: > On Fri, 30 Oct 2009 08:08:10 +0300, dsimcha wrote: > > > After a few evenings of serious hacking, I've integrated precise heap > > scanning > > into the GC. Right now, I still need to test it better and debug it, > > but it > > at least basically works. I also still need to write the templates to > > generate bit masks at compile time, but this is a simple matter of > > programming. > > > > A few things: > > > > 1. Who knows how to write some good stress tests to make sure this > > works? > > > > 2. I'm thinking about how to write the bitmask templates. In the next > > release of DMD, when static arrays are value types and returnable from > > functions, will they be returnable from functions in CTFE? > > > > 3. new only takes RTTI. It is not a template. Unless RTTI gets > > bitmasks in > > the format I created (which I'll document once I clean things up and > > release, > > but has only deviated slightly from what I had talked about here), stuff > > allocated using it won't be able to take advantage of precise heap > > scanning. > > The default bitmask, if none is provided, uses good (bad) old-fashioned > > conservative scanning unless the entire block has no pointers, in which > > case > > it isn't scanned. This means that we have all the more incentive to > > replace > > new with a template of some kind. > > > > 4. I solved the static array problem, but the solution required using > > up one > > of the high-order bits. We have at least one more to play with in my > > bitmask > > scheme, because I'm storing things by word offsets, not byte offsets, > > since > > the GC isn't supposed to work with misaligned pointers anyhow. This > > leaves > > one more bit before we start limiting T.sizeof to less than full address > > space > > (on 32-bit, where a word is 4 bytes). I think it needs to be reserved > > for > > pinning, in case a copying collector ever gets implemented. If we're > > willing > > to not let any precisely scanned object have a T.sizeof of more than > > half the > > address space (a ridiculously minor limitation; this does not limit the > > size > > of arrays, only the size of classes and the elements of an array), we > > could > > throw in a third bit for weak references. > > Blaze (http://www.dsource.org/projects/blaze) is often suggested for > stress-testing the GC. Probably, because it does huge amount of dynamic > allocations, while total amount of memory consumed is about the same. > Worth a note, it's for D1/Tango, but you said you were going to port it to > Tango, too, so it may be better to start with Tango (because there are a > lot more code written against Tango and you get instant user feedback) and > then port it to druntime. If not a performance test, it may be a good > correctness test (so that you don't collect memory which is still > referenced). Blaze is for D1, yes, but it's not only for Tango - I made the initial port for Phobos, and the author decided to keep it (all further changes were Phobos-compatiable). It's been available for both Tango and Phobos for a long while now.
Re: Success! (Precisely)
"dsimcha" wrote in message news:hcdsbq$4i...@digitalmars.com... After a few evenings of serious hacking, I've integrated precise heap scanning into the GC. Awesome! Thank you so much for doing this. Does the GC have knowledge of pointers on both the stack as well as the heap? 3. new only takes RTTI. It is not a template. Unless RTTI gets bitmasks in the format I created (which I'll document once I clean things up and release, but has only deviated slightly from what I had talked about here), stuff allocated using it won't be able to take advantage of precise heap scanning. The default bitmask, if none is provided, uses good (bad) old-fashioned conservative scanning unless the entire block has no pointers, in which case it isn't scanned. This means that we have all the more incentive to replace new with a template of some kind. I'm surprised nobody commented on this. Andrei said Walter decided that new shouldn't be a template. But it seems like a good idea to me. Templated new has received positive feedback from the community, and here is another advantage of templated new. -Craig
Re: Success! (Precisely)
== Quote from Craig Black (craigbla...@cox.net)'s article > "dsimcha" wrote in message > news:hcdsbq$4i...@digitalmars.com... > > After a few evenings of serious hacking, I've integrated precise heap > > scanning > > into the GC. > Awesome! Thank you so much for doing this. Does the GC have knowledge of > pointers on both the stack as well as the heap? No. Precise stack and static data segment scanning would be an order of magnitude harder to implement because it would require going deep into compiler hacking. Precise heap scanning, to the extent that I can get the mask information to the GC somehow, can be implemented by just hacking the runtime. > > 3. new only takes RTTI. It is not a template. Unless RTTI gets bitmasks > > in > > the format I created (which I'll document once I clean things up and > > release, > > but has only deviated slightly from what I had talked about here), stuff > > allocated using it won't be able to take advantage of precise heap > > scanning. > > The default bitmask, if none is provided, uses good (bad) old-fashioned > > conservative scanning unless the entire block has no pointers, in which > > case > > it isn't scanned. This means that we have all the more incentive to > > replace > > new with a template of some kind. > I'm surprised nobody commented on this. Andrei said Walter decided that new > shouldn't be a template. But it seems like a good idea to me. Templated > new has received positive feedback from the community, and here is another > advantage of templated new. > -Craig What's the rationale for not templating new? IMHO it makes a lot of sense b/c then you have access to all the compile time features of the language while new'ing stuff, such as the ability to generate bitmasks.
Re: Success! (Precisely)
== Quote from Craig Black (craigbla...@cox.net)'s article Does the GC have knowledge of > pointers on both the stack as well as the heap? Also, probably most of the problem with false pointers is on the heap anyhow. The stack is usually on the order of a few 10s of kilobytes, the static data segment maybe another few 10s of kilobytes, whereas the heap can be hundreds of megabytes. I think that, given how little progress has been made on improving the current GC implementation and how hard a problem it is, adding precise heap scanning to the current GC is a good 80/20 solution until D becomes popular enough to earn hordes of money for research into improving its GC.
Re: Success! (Precisely)
"dsimcha" wrote in message news:hcfu9t$9d...@digitalmars.com... == Quote from Craig Black (craigbla...@cox.net)'s article "dsimcha" wrote in message news:hcdsbq$4i...@digitalmars.com... > After a few evenings of serious hacking, I've integrated precise heap > scanning > into the GC. Awesome! Thank you so much for doing this. Does the GC have knowledge of pointers on both the stack as well as the heap? No. Precise stack and static data segment scanning would be an order of magnitude harder to implement because it would require going deep into compiler hacking. Precise heap scanning, to the extent that I can get the mask information to the GC somehow, can be implemented by just hacking the runtime. > 3. new only takes RTTI. It is not a template. Unless RTTI gets > bitmasks > in > the format I created (which I'll document once I clean things up and > release, > but has only deviated slightly from what I had talked about here), > stuff > allocated using it won't be able to take advantage of precise heap > scanning. > The default bitmask, if none is provided, uses good (bad) old-fashioned > conservative scanning unless the entire block has no pointers, in which > case > it isn't scanned. This means that we have all the more incentive to > replace > new with a template of some kind. I'm surprised nobody commented on this. Andrei said Walter decided that new shouldn't be a template. But it seems like a good idea to me. Templated new has received positive feedback from the community, and here is another advantage of templated new. -Craig What's the rationale for not templating new? IMHO it makes a lot of sense b/c then you have access to all the compile time features of the language while new'ing stuff, such as the ability to generate bitmasks. You would have to ask Walter. The only downside I see is the extra exclaimation point, but perhaps he has another reason. -Craig
Re: Success! (Precisely)
"dsimcha" wrote in message news:hcfur2$av...@digitalmars.com... == Quote from Craig Black (craigbla...@cox.net)'s article Does the GC have knowledge of pointers on both the stack as well as the heap? Also, probably most of the problem with false pointers is on the heap anyhow. The stack is usually on the order of a few 10s of kilobytes, the static data segment maybe another few 10s of kilobytes, whereas the heap can be hundreds of megabytes. I think that, given how little progress has been made on improving the current GC implementation and how hard a problem it is, adding precise heap scanning to the current GC is a good 80/20 solution until D becomes popular enough to earn hordes of money for research into improving its GC. Yeah, that would be great. Maybe some day. One step at a time I suppose. -Craig