Re: GC: what is better, reuse or avoid cloning?
At 04:57 PM 2/12/2001 -0300, Branden wrote:
>Dan Sugalski wrote:
> > ...
> > doing software copy-on-write stuff, along with having to make the garbage
> > collector smart enough to deal with multiple PMCs pointing to identical
> > memory.
>
>???
>
>I thought that was the big deal of GC! Dealing with many references to the
>same thing and free the thing when there's none more...

Having multiple pointers to the same piece of memory makes many garbage collectors more complex, since it means that when you move that chunk of memory you need to change all the pointers to it. If the moveable piece of memory has only a single pointer to it, it makes things easier and faster, which is generally a good thing.

I didn't say it wasn't possible, just that it makes things more difficult.

Dan

--"it's like this"---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
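The fix-up cost described in the message above can be sketched in C. This is only an illustration under stated assumptions: the `block` type, its `owners` list, and both functions are hypothetical names, not Perl or Parrot internals, and a real collector would usually find the pointers by tracing rather than by keeping an explicit owner list. The list is here only to make the per-pointer work visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_OWNERS 4

/* Hypothetical heap block: it records the address of every pointer
 * variable that refers to it, so a moving collector can patch them. */
typedef struct {
    char  *data;                 /* current location of the bytes */
    size_t size;
    char **owners[MAX_OWNERS];   /* addresses of pointers to data */
    size_t n_owners;
} block;

/* Register one more pointer that refers to the block.
 * Returns 0 on success, -1 if the owner list is full. */
int block_add_owner(block *b, char **slot)
{
    if (b->n_owners == MAX_OWNERS) return -1;
    *slot = b->data;
    b->owners[b->n_owners++] = slot;
    return 0;
}

/* Relocate the block, as a compacting collector would, and fix up every
 * registered pointer. The loop does one store per owner; with exactly
 * one owner it is a single store. Returns 0 on success, -1 on failure. */
int block_move(block *b)
{
    char *fresh = malloc(b->size);
    if (fresh == NULL) return -1;
    memcpy(fresh, b->data, b->size);
    free(b->data);
    b->data = fresh;
    for (size_t i = 0; i < b->n_owners; i++)
        *b->owners[i] = fresh;
    return 0;
}
```

With a single owner the loop in `block_move` runs once; every additional PMC-like owner adds another pointer that has to be located and rewritten, which is the extra complexity the message refers to.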
Re: GC: what is better, reuse or avoid cloning?
Dan Sugalski wrote:
> ...
> doing software copy-on-write stuff, along with having to make the garbage
> collector smart enough to deal with multiple PMCs pointing to identical
> memory.

???

I thought that was the big deal of GC! Dealing with many references to the same thing and freeing the thing when there are none left...

- Branden
Re: GC: what is better, reuse or avoid cloning?
At 04:21 PM 2/12/2001 -0300, Branden wrote:
>Jan Dubois wrote:
>You point out two disadvantages:
>
> > - It steal 2 bits from the SvTYPE flags. Flags are a *very* scarce
> >   resource and shouldn't be used up unless there are very good reasons
> >   for it.
> >
> > - Using shared strings is not totally backward compatible: Extensions
> >   *must* check if a SV* is shared and naturalize it if it intends to
> >   change the contents. Note that I had to patch one occurrence of this
> >   in the bundled Data::Dumper. This could be improved *some* by adding
> >   XSUBPP for it, but wouldn't help in case the extension accesses SvPVX
> >   directly.
>
>Considering Perl 6 will be built from scratch, I think these are not an
>issue anymore, right?

Wrong. Flags will always be in scarce supply (nature of the beast). Shared strings, at least inside perl, are reasonably problematic, as it means doing software copy-on-write stuff, along with having to make the garbage collector smart enough to deal with multiple PMCs pointing to identical memory.

Neither is an insurmountable problem, but I don't think either is worth tackling for the first cut.

Dan

--"it's like this"---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
Re: GC: what is better, reuse or avoid cloning?
Jan Dubois wrote:
> >Perl 5 basically clones on every assignment. As it uses refcounting, it
> >knows it doesn't need to clone a string if its refcount=1 and it's marked as
> >temporary, i.e., only a temporary that will go away anyway knows about this
> >string, so it's guaranteed no other reference to it will exist.
>
> I did experiment with this idea just for fun some time ago:
>
> http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-07/msg00550.html

That's exactly what I was looking for. You point out two disadvantages:

> - It steal 2 bits from the SvTYPE flags. Flags are a *very* scarce
>   resource and shouldn't be used up unless there are very good reasons
>   for it.
>
> - Using shared strings is not totally backward compatible: Extensions
>   *must* check if a SV* is shared and naturalize it if it intends to
>   change the contents. Note that I had to patch one occurrence of this
>   in the bundled Data::Dumper. This could be improved *some* by adding
>   XSUBPP for it, but wouldn't help in case the extension accesses SvPVX
>   directly.

Considering Perl 6 will be built from scratch, I think these are not an issue anymore, right?

Another consideration I see is the ``no more refcounts'' plan, which would make it hard to see whether something is SHARED (it's shared if refcount > 1). This would mean we have no way to find out whether something is shared, so every modification would have to `naturalize' (or clone) the data instead of reusing it. Would this be too big an overhead?

Actually, I've thought of a scheme that would allow reusing data, but it would require the opcodes to tell the values explicitly that they want them to reuse the data. That would require more complexity in the implementation, and optimizers that can find this kind of situation in the source code. That means: 1) I avoid it, if I can; or 2) I'll go for it, if it's a big win.

- Branden
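To make the `naturalize' step concrete, here is a minimal C sketch of a shared flag without any refcount. Everything in it is an assumption for illustration: `str_val`, `is_shared`, and the three functions are hypothetical names and are not Perl's SV layout or API. It also assumes a garbage collector reclaims buffers that nobody points to any more, since without a count the code cannot free them itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string value; is_shared stands in for the flag bit
 * discussed in the thread. It is NOT Perl's real SV layout. */
typedef struct {
    char  *buf;         /* NUL-terminated bytes */
    size_t len;
    int    is_shared;   /* nonzero: buf may be referenced elsewhere */
} str_val;

/* dst takes src's buffer without copying; both are marked shared. */
void str_share(str_val *dst, str_val *src)
{
    src->is_shared = 1;
    *dst = *src;
}

/* Give v a private copy of its buffer if it is marked shared.
 * The old buffer is left for the (assumed) garbage collector.
 * Returns 0 on success, -1 on allocation failure. */
int str_naturalize(str_val *v)
{
    if (!v->is_shared) return 0;
    char *copy = malloc(v->len + 1);
    if (copy == NULL) return -1;
    memcpy(copy, v->buf, v->len + 1);
    v->buf = copy;
    v->is_shared = 0;
    return 0;
}

/* Overwrite one character; every writer must naturalize first. */
int str_set_char(str_val *v, size_t i, char c)
{
    if (i >= v->len || str_naturalize(v) != 0) return -1;
    v->buf[i] = c;
    return 0;
}
```

Note the cost of having only a flag: once one variable naturalizes, the other is still marked shared even though it is now the sole user, so it will copy again on its first write. That is the overhead the message above asks about.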
Re: GC: what is better, reuse or avoid cloning?
On Mon, 12 Feb 2001 15:18:47 -0300, "Branden" <[EMAIL PROTECTED]> wrote:
>Alan Burlison wrote:
>> Branden wrote:
>> > Any suggestions?
>> Yes, but none of them polite.

I do think this rudeness is uncalled for.

>> You might do well to study the way perl5 handles these issues.
>
>Perl 5 basically clones on every assignment. As it uses refcounting, it
>knows it doesn't need to clone a string if its refcount=1 and it's marked as
>temporary, i.e., only a temporary that will go away anyway knows about this
>string, so it's guaranteed no other reference to it will exist.

I did experiment with this idea just for fun some time ago:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-07/msg00550.html

If you follow the follow-up links, you will also find some benchmarks, e.g. here:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-07/msg00773.html

This approach could be valuable for *large* or [inout] style foreign buffers (shared memory, mmaped files etc). But it is questionable if the overall gain in a few special cases is worth the added complexity in the internals.

-Jan
Re: GC: what is better, reuse or avoid cloning?
Alan Burlison wrote:
> Branden wrote:
> > Any suggestions?
> Yes, but none of them polite.
> You might do well to study the way perl5 handles these issues.

Perl 5 basically clones on every assignment. As it uses refcounting, it knows it doesn't need to clone a string if its refcount=1 and it's marked as temporary, i.e., only a temporary that will go away anyway knows about this string, so it's guaranteed no other reference to it will exist.

- Branden
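The rule described above (skip the clone when the source is a temporary with refcount 1) might look roughly like this in C. This is a sketch of the idea as stated in the message, not Perl 5's actual code: the `value` struct, the `is_temp` marker, and `value_assign` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical value with a reference count and a "temporary" marker. */
typedef struct {
    char  *buf;        /* NUL-terminated bytes, or NULL when empty */
    size_t len;
    int    refcount;
    int    is_temp;    /* nonzero: value will be discarded soon anyway */
} value;

/* dst = src. Steals src's buffer when src is a temporary nobody else
 * refers to; otherwise clones it.
 * Returns 1 if stolen, 0 if cloned (or self-assigned), -1 on failure. */
int value_assign(value *dst, value *src)
{
    if (dst == src) return 0;
    if (src->refcount == 1 && src->is_temp) {
        free(dst->buf);
        dst->buf = src->buf;      /* take over the buffer, no copy */
        dst->len = src->len;
        src->buf = NULL;          /* the temporary no longer owns it */
        src->len = 0;
        return 1;
    }
    char *copy = malloc(src->len + 1);
    if (copy == NULL) return -1;
    memcpy(copy, src->buf, src->len + 1);
    free(dst->buf);
    dst->buf = copy;
    dst->len = src->len;
    return 0;
}
```

The steal is safe only because the refcount says no other reference exists; without a count, that guarantee has to come from somewhere else.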
Re: GC: what is better, reuse or avoid cloning?
At 12:51 AM 2/10/2001 -0200, Branden wrote:
>Back to the GC issue, I was wondering something.

Okay, I snipped all of this. After reading it, I'm pretty sure it makes no sense at all.

Branden, I'd recommend picking up a copy of _Garbage Collection_ and reading it. The ISBN's in the perl reading list. (My copy's in the office or I'd dig it out for you)

Dan

--"it's like this"---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
Re: GC: what is better, reuse or avoid cloning?
Branden wrote:
> Any suggestions?

Yes, but none of them polite.

You might do well to study the way perl5 handles these issues.

Alan Burlison
Re: GC: what is better, reuse or avoid cloning?
Buddha M Buck wrote:
> > > I see two ways of doing this: one is allowing a string value to be shared by
> > > two or more variables, and the other one not.
> >
> > Why would you want to share the string value? Why did you assign the
> > value of $foo to $bar if you really wanted to:
> >
> >     $bar = \$foo;
>
> I think what he's thinking (in C terms) would be more like the following:
>
>     typedef struct { int length; char *s; } string;
>
>     // $foo = "xyzzy";
>     string foo;
>     foo.length = 5;
>     foo.s = strdup("xyzzy---blank---buffer---space---");
>
>     // $bar = $foo;
>     string bar;
>     bar.length = foo.length;
>     bar.s = foo.s;
>
>     // $foo .= "xyzzy";
>     strncpy(foo.s+foo.length,"xyzzy",5);
>     foo.length += 5;
>
>     // $foo and $bar share string buffers, but $bar only sees the first 5
>     // characters while $foo sees the first 10.
>
> I don't see that as quite the same as the implicit references or
> type-globs you suggested.
>
> But it's late, and I might not know what I'm talking about...

That's exactly what I mean. But actually doing that second $foo .= "xyzzy" step without allocating a new string would be problematic: if I then did $bar .= "abccb" the same way it was done for $foo, it would overwrite the "xyzzy" in $foo, right?

- Branden
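The overwrite asked about above can be reproduced in a few lines of C. This sketch reuses the `string` struct from the quoted example; `unsafe_append` and `demo_clobber` are hypothetical helper names added for the illustration, and the buffer is a fixed array the caller knows to be large enough.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    size_t length;
    char  *s;
} string;

/* Append in place with no sharing check: the unsafe operation the
 * message above worries about. The caller guarantees there is room. */
void unsafe_append(string *v, const char *tail)
{
    size_t n = strlen(tail);
    memcpy(v->s + v->length, tail, n);
    v->length += n;
}

/* Returns 1 if $bar's append clobbered $foo's data, 0 otherwise. */
int demo_clobber(void)
{
    char buffer[64] = "xyzzy";
    string foo = { 5, buffer };
    string bar = foo;                /* $bar = $foo; buffer is shared */
    unsafe_append(&foo, "xyzzy");    /* $foo .= "xyzzy"; */
    unsafe_append(&bar, "abccb");    /* $bar .= "abccb"; */
    /* foo still claims 10 characters, but the last 5 are now bar's. */
    return memcmp(foo.s, "xyzzyabccb", foo.length) == 0;
}
```

Both variables write at their own idea of the end of the buffer, so the second append lands on top of the first. Sharing the buffer is only safe if one of the two appends allocates a fresh buffer first.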
Re: GC: what is better, reuse or avoid cloning?
> On Sat, 10 Feb 2001, Branden wrote:
> > Suppose I have a string stored in $foo, say, "abcbca", and then I do:
> >
> >     $bar = $foo;
> >     $foo .= "xyzyzx";
> >
> > I see two ways of doing this: one is allowing a string value to be shared by
> > two or more variables, and the other one not.
>
> Why would you want to share the string value? Why did you assign the
> value of $foo to $bar if you really wanted to:
>
>     $bar = \$foo;
>
> Or actually closer to what you seem to want:
>
>     *bar = \$foo;
>
> Although a little birdy told me we're dropping globs for Perl6. Don't
> most programmers do assignment for a reason? Why should we second-guess
> them?

I think what he's thinking (in C terms) would be more like the following:

    typedef struct { int length; char *s; } string;

    // $foo = "xyzzy";
    string foo;
    foo.length = 5;
    foo.s = strdup("xyzzy---blank---buffer---space---");

    // $bar = $foo;
    string bar;
    bar.length = foo.length;
    bar.s = foo.s;

    // $foo .= "xyzzy";
    strncpy(foo.s+foo.length,"xyzzy",5);
    foo.length += 5;

    // $foo and $bar share string buffers, but $bar only sees the first 5
    // characters while $foo sees the first 10.

I don't see that as quite the same as the implicit references or type-globs you suggested.

But it's late, and I might not know what I'm talking about...
Re: GC: what is better, reuse or avoid cloning?
On Sat, 10 Feb 2001, Buddha M Buck wrote:
> I think what he's thinking (in C terms) would be more like the following:

Right. It already has a technical name - copy-on-write.

I should have made it more clear that I recognized the intended mechanism. I was trying to demonstrate that Perl-level mechanisms already existed to do value aliasing. I was also trying to show that what he is suggesting is a lot like aliasing with some simple copy-on-write STORE magic.

For some reason I thought that by pointing that out I could relieve him of his bizarre worries about garbage collecting things with references. I probably should have just gone to bed instead.

-sam
Re: GC: what is better, reuse or avoid cloning?
On Sat, 10 Feb 2001, Branden wrote:
> Suppose I have a string stored in $foo, say, "abcbca", and then I do:
>
>     $bar = $foo;
>     $foo .= "xyzyzx";
>
> I see two ways of doing this: one is allowing a string value to be shared by
> two or more variables, and the other one not.

Why would you want to share the string value? Why did you assign the value of $foo to $bar if you really wanted to:

    $bar = \$foo;

Or actually closer to what you seem to want:

    *bar = \$foo;

Although a little birdy told me we're dropping globs for Perl6. Don't most programmers do assignment for a reason? Why should we second-guess them?

> Given mark-and-sweep or other advanced GC, which of them is better? Sharing
> the value or cloning on each assignment?

I don't believe implementing copy-on-write for scalars has anything to do with garbage collection. Any garbage collector that will work for Perl will need to work with references. All you're suggesting is a beneath-the-covers reference, right?

What's the point? You seem to be engaging in some extreme and bizarre form of premature optimization.

-sam
GC: what is better, reuse or avoid cloning?
Back to the GC issue, I was wondering something.

Suppose I have a string stored in $foo, say, "abcbca", and then I do:

    $bar = $foo;
    $foo .= "xyzyzx";

I see two ways of doing this: one is allowing a string value to be shared by two or more variables, and the other one not.

If I allow sharing a string value between two variables, $bar = $foo only points $bar to the same string $foo is pointing to. But when I do $foo .= "xyzyzx", I cannot use the same buffer that $foo points to, because I don't know whether the string pointed to by $foo is shared or not (in this case I do, but consider that the two statements may be separated by many others, and also by sub calls; suppose $foo is actually a parameter to a sub, passed in with its value already shared with another variable). So I must allocate a new buffer, copy the buffer of $foo, and do the `concat' operation with the string "xyzyzx".

If I don't allow sharing a string value between two variables, $foo .= "xyzyzx" could use $foo's buffer, if there's room in it, because there is guaranteed to be no other variable pointing to it. However, $bar = $foo needs to clone the string of $foo, since it cannot be shared between $foo and $bar.

In this case there would be little problem either in cloning or in not reusing the buffer, because the strings are short, but imagine $foo actually holds a 10KB string, or something even bigger... And imagine the = or .= is being done in a loop that will potentially run some hundreds of times...

Given mark-and-sweep or other advanced GC, which of them is better? Sharing the value or cloning on each assignment? (If there's not much difference, I'd stick with sharing the value, since it seems simpler to implement and doesn't require a clone operator, which can be messy.)

I actually see two other approaches to this problem. One is using refcounted GC. Why? Because if the refcount is 1, I know this is the only object with the buffer, no other object is looking at it, and I could use the same buffer on .= without any problem at all.

The other way would be putting the burden on the compiler. If the compiler sees something like

    foreach $i (1..$big_loop) {
        $foo .= ", $i";
    }

it would clone $foo's value _before_ entering the loop. This would ensure $foo's value is not shared with any other variable. This way, $foo's buffer could be used to do the .= operation, having to create a new one only if that one overflows.

There's actually a problem with reusing the buffer if the value of $foo has the `concat' and `destroy' operations overloaded. If I'm reusing $foo's buffer, on $foo .= "xyzyzx" I would call the (overloaded) concat operation of $foo's value, telling it to use $foo's buffer if possible. This would probably invalidate $foo's old value. But then I should call $foo's `destroy', since $foo's old value is no longer valid, and the overloaded action of $foo's value should take place. But `destroy' would encounter an invalid value, or would even try to destroy the data in the new value of $foo. And not calling `destroy' wouldn't be the right behaviour, since it may do some other kind of cleanup that is actually necessary.

My call would be `not reusing values at all', but I'm quite unsure what the performance of that would be. Maybe allow the compiler-optimized version if $foo's value has a simple `destroy' that doesn't need to be called when the buffer is reused... Or split `destroy' in two, one part that always gets called and another that is called only when the value is cleaned up by the GC...

Any suggestions?

Thanks,

- Branden
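The refcount idea from the message above (reuse the buffer on .= only when the refcount is 1, otherwise build a fresh one) can be sketched in C as follows. This is a hedged illustration, not a proposal for the real internals: `strbuf`, `strbuf_new`, and `strbuf_append` are hypothetical names, and the doubling growth policy is an arbitrary choice made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted string buffer. */
typedef struct {
    char  *data;       /* NUL-terminated bytes */
    size_t len;        /* bytes in use, excluding the NUL */
    size_t cap;        /* bytes allocated */
    int    refcount;   /* number of variables pointing at this buffer */
} strbuf;

/* Allocate a buffer holding init, with at least cap bytes of capacity. */
strbuf *strbuf_new(const char *init, size_t cap)
{
    size_t len = strlen(init);
    if (cap < len + 1) cap = len + 1;
    strbuf *b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->data = malloc(cap);
    if (b->data == NULL) { free(b); return NULL; }
    memcpy(b->data, init, len + 1);
    b->len = len;
    b->cap = cap;
    b->refcount = 1;
    return b;
}

/* $var .= tail. *var is updated to point at the resulting buffer.
 * Returns 0 on success, -1 on allocation failure. */
int strbuf_append(strbuf **var, const char *tail)
{
    strbuf *b = *var;
    size_t n = strlen(tail);
    if (b->refcount == 1 && b->len + n + 1 <= b->cap) {
        memcpy(b->data + b->len, tail, n + 1);   /* sole owner: reuse */
        b->len += n;
        return 0;
    }
    /* Shared or too small: build a fresh private buffer. */
    strbuf *fresh = strbuf_new(b->data, 2 * (b->len + n + 1));
    if (fresh == NULL) return -1;
    memcpy(fresh->data + fresh->len, tail, n + 1);
    fresh->len += n;
    if (--b->refcount == 0) { free(b->data); free(b); }
    *var = fresh;
    return 0;
}
```

In the loop example, only the first append on a shared $foo pays for a copy; after that $foo holds a private buffer with refcount 1 and later appends reuse it until it overflows, which has much the same effect as the compiler hoisting a clone out of the loop.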