Re: GC: what is better, reuse or avoid cloning?

2001-02-12 Thread Dan Sugalski

At 04:57 PM 2/12/2001 -0300, Branden wrote:
>Dan Sugalski wrote:
> > ...
> > doing software copy-on-write stuff, along with having to make the garbage
> > collector smart enough to deal with multiple PMCs pointing to identical
> > memory.
>
>???
>
>I thought that was the big deal of GC! Dealing with many references to the
>same thing and freeing it when there are none left...

Having multiple pointers to the same piece of memory makes many garbage 
collectors more complex, since it means that when you move that chunk of 
memory you need to change all the pointers to it. If the moveable piece of 
memory has only a single pointer to it, it makes things easier and faster, 
which is generally a good thing.

I didn't say it wasn't possible, just that it makes things more difficult.
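
A minimal sketch in C of the difference described above, assuming a collector that relocates chunks by copying them. The names str_header, move_single_owner and move_shared are hypothetical and are not taken from Perl or from any real collector; the point is only that the shared case has to find and patch every pointer, while the single-owner case patches exactly one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header, one per variable; buf points at a movable chunk. */
typedef struct {
    size_t length;
    char  *buf;     /* NUL-terminated, length + 1 bytes */
} str_header;

/* Single owner: copy the chunk elsewhere and patch the one pointer. */
static int move_single_owner(str_header *owner)
{
    char *moved;
    assert(owner != NULL && owner->buf != NULL);
    moved = malloc(owner->length + 1);
    if (moved == NULL)
        return -1;
    memcpy(moved, owner->buf, owner->length + 1);
    free(owner->buf);
    owner->buf = moved;
    return 0;
}

/* Shared chunk: every header has to be scanned, and each one that points
   at the old location has to be patched. Returns the number of pointers
   patched, or -1 if the new chunk could not be allocated. */
static long move_shared(str_header *headers, size_t count,
                        char *old_chunk, size_t chunk_size)
{
    long patched = 0;
    size_t i;
    char *moved = malloc(chunk_size);
    if (moved == NULL)
        return -1;
    memcpy(moved, old_chunk, chunk_size);
    for (i = 0; i < count; i++) {
        if (headers[i].buf == old_chunk) {
            headers[i].buf = moved;
            patched++;
        }
    }
    free(old_chunk);
    if (patched == 0)
        free(moved);    /* nobody referenced the chunk after all */
    return patched;
}
```

The scan in move_shared is linear in the number of headers here; a real collector would more likely use forwarding pointers or an extra level of indirection, which is exactly the added complexity being discussed.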

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: GC: what is better, reuse or avoid cloning?

2001-02-12 Thread Branden

Dan Sugalski wrote:
> ...
> doing software copy-on-write stuff, along with having to make the garbage
> collector smart enough to deal with multiple PMCs pointing to identical
> memory.

???

I thought that was the big deal of GC! Dealing with many references to the
same thing and freeing it when there are none left...

- Branden




Re: GC: what is better, reuse or avoid cloning?

2001-02-12 Thread Dan Sugalski

At 04:21 PM 2/12/2001 -0300, Branden wrote:
>Jan Dubois wrote:
>You point out two disadvantages:
>
> > - It steals 2 bits from the SvTYPE flags.  Flags are a *very* scarce
> >   resource and shouldn't be used up unless there are very good reasons
> >   for it.
> >
> > - Using shared strings is not totally backward compatible:  Extensions
> >   *must* check if a SV* is shared and naturalize it if it intends to
> >   change the contents.  Note that I had to patch one occurrence of this
> >   in the bundled Data::Dumper.  This could be improved *some* by adding
> >   XSUBPP for it, but wouldn't help in case the extension accesses SvPVX
> >   directly.
>
>Considering Perl 6 will be built from scratch, I think these are not an
>issue anymore, right?

Wrong. Flags will always be in scarce supply (nature of the beast). Shared 
strings, at least inside perl, are reasonably problematic, as it means 
doing software copy-on-write stuff, along with having to make the garbage 
collector smart enough to deal with multiple PMCs pointing to identical memory.

Neither is an insurmountable problem, but I don't think either is worth
tackling for the first cut.


Dan





Re: GC: what is better, reuse or avoid cloning?

2001-02-12 Thread Branden

Jan Dubois wrote:
> >Perl 5 basically clones on every assignment. As it uses refcounting, it
> >knows it doesn't need to clone a string if its refcount=1 and it's marked as
> >temporary, i.e., only a temporary that will go away anyway knows about this
> >string, so it's guaranteed no other reference to it will exist.
>
> I did experiment with this idea just for fun some time ago:
>
>
> http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-07/msg00550.html

That's exactly what I was looking for.

You point out two disadvantages:

> - It steals 2 bits from the SvTYPE flags.  Flags are a *very* scarce
>   resource and shouldn't be used up unless there are very good reasons
>   for it.
>
> - Using shared strings is not totally backward compatible:  Extensions
>   *must* check if a SV* is shared and naturalize it if it intends to
>   change the contents.  Note that I had to patch one occurrence of this
>   in the bundled Data::Dumper.  This could be improved *some* by adding
>   XSUBPP for it, but wouldn't help in case the extension accesses SvPVX
>   directly.

Considering Perl 6 will be built from scratch, I think these are not an
issue anymore, right?

Another consideration is the ``no more refcounts'' plan, which would make it
hard to tell whether something is SHARED (it's shared if refcount > 1). With
no way to find out whether something is shared, every modification would
have to `naturalize' (or clone) the data instead of reusing it. Would that be
too big an overhead?
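
One possible middle ground, sketched in C purely as an assumption and not something proposed in this thread: keep a single sticky "shared" bit instead of a count. Assignment sets the bit, nothing ever clears it (without a count there is no way to know when the sharing ends), and a write naturalizes only when the bit is set. The names buffer, variable, buf_new, var_assign and var_append are hypothetical, and old buffers are simply left behind for an imagined collector.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned shared : 1;    /* sticky: set on first aliasing, never cleared */
    size_t   length;
    size_t   capacity;
    char    *data;
} buffer;

typedef struct { buffer *buf; } variable;

static buffer *buf_new(const char *text, size_t length, size_t capacity)
{
    buffer *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    if (capacity < length)
        capacity = length;
    if (capacity == 0)
        capacity = 1;
    b->data = malloc(capacity);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->data, text, length);
    b->shared = 0;
    b->length = length;
    b->capacity = capacity;
    return b;
}

/* $dst = $src: alias the buffer and remember that it may be aliased. */
static void var_assign(variable *dst, const variable *src)
{
    src->buf->shared = 1;
    dst->buf = src->buf;
}

/* $v .= suffix: naturalize only if the buffer was ever shared (or is full). */
static int var_append(variable *v, const char *suffix)
{
    size_t n = strlen(suffix);
    buffer *b = v->buf;
    if (b->shared || b->length + n > b->capacity) {
        buffer *own = buf_new(b->data, b->length, 2 * (b->length + n));
        if (own == NULL)
            return -1;
        v->buf = b = own;   /* old buffer is left to the collector */
    }
    memcpy(b->data + b->length, suffix, n);
    b->length += n;
    return 0;
}
```

The cost of the sticky bit is one unnecessary clone: the last variable still pointing at a once-shared buffer cannot know it is now the only user, so its first write copies anyway.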

Actually, I've thought of a scheme that would allow reusing data, but it
would require the opcodes to tell the values explicitly that they should
reuse their data. That means more complexity in the implementation, plus
optimizers that can spot this kind of situation in the source code. So:
1) I avoid it, if I can; or
2) I'll go for it, if it's a big win.

- Branden




Re: GC: what is better, reuse or avoid cloning?

2001-02-12 Thread Jan Dubois

On Mon, 12 Feb 2001 15:18:47 -0300, "Branden" <[EMAIL PROTECTED]>
wrote:

>Alan Burlison wrote:
>> Branden wrote:
>> > Any suggestions?
>> Yes, but none of them polite.

I do think this rudeness is uncalled for.

>> You might do well to study the way perl5 handles these issues.
>
>Perl 5 basically clones on every assignment. As it uses refcounting, it
>knows it doesn't need to clone a string if its refcount=1 and it's marked as
>temporary, i.e., only a temporary that will go away anyway knows about this
>string, so it's guaranteed no other reference to it will exist.

I did experiment with this idea just for fun some time ago:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-07/msg00550.html

If you follow the follow-up links, you will also find some benchmarks,
e.g. here:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-07/msg00773.html

This approach could be valuable for *large* or [inout] style foreign
buffers (shared memory, mmaped files etc).  But it is questionable if the
overall gain in a few special cases is worth the added complexity in the
internals.

-Jan




Re: GC: what is better, reuse or avoid cloning?

2001-02-12 Thread Branden

Alan Burlison wrote:
> Branden wrote:
> > Any suggestions?
> Yes, but none of them polite.
> You might do well to study the way perl5 handles these issues.

Perl 5 basically clones on every assignment. As it uses refcounting, it
knows it doesn't need to clone a string if its refcount=1 and it's marked as
temporary, i.e., only a temporary that will go away anyway knows about this
string, so it's guaranteed no other reference to it will exist.

- Branden




Re: GC: what is better, reuse or avoid cloning?

2001-02-10 Thread Dan Sugalski

At 12:51 AM 2/10/2001 -0200, Branden wrote:
>Back to the GC issue, I was wondering something.

Okay, I snipped all of this. After reading it, I'm pretty sure it makes no 
sense at all.

Branden, I'd recommend picking up a copy of _Garbage Collection_ and 
reading it. The ISBN's in the perl reading list. (My copy's in the office 
or I'd dig it out for you)

Dan





Re: GC: what is better, reuse or avoid cloning?

2001-02-10 Thread Alan Burlison

Branden wrote:

> Any suggestions?

Yes, but none of them polite.

You might do well to study the way perl5 handles these issues.

Alan Burlison



Re: GC: what is better, reuse or avoid cloning?

2001-02-10 Thread Filipe Brandenburger

Buddha M Buck wrote:

> > > I see two ways of doing this: one is allowing a string value to be shared by
> > > two or more variables, and the other one not.
> >
> > Why would you want to share the string value?  Why did you assign the
> > value of $foo to $bar if you really wanted to:
> >
> >    $bar = \$foo;
>
> I think what he's thinking (in C terms) would be more like the following:
>
> typedef struct { int length; char *s; } string;
>
> // $foo = "xyzzy";
> string foo; foo.length = 5; foo.s = strdup("xyzzy---blank---buffer---space---");
>
> // $bar = $foo;
> string bar; bar.length = foo.length; bar.s = foo.s;
>
> // $foo .= "xyzzy";
> strncpy(foo.s+foo.length,"xyzzy",5); foo.length += 5;
>
> // $foo and $bar share string buffers, but $bar only sees the first 5
> // characters while $foo sees the first 10.
>
> I don't see that as quite the same as the implicit references or
> type-globs you suggested.
>
> But it's late, and I might not know what I'm talking about...
>

That's exactly what I mean.

But actually doing that second $foo .= "xyzzy" thing without allocating a
new string would be problematic, since if I did $bar .= "abccb" after that
in the same way that it's done for $foo, it would overwrite the "xyzzy" in
$foo, right?
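
To make the overwrite concrete, here is a minimal C sketch built on the struct quoted above. The helpers append_in_place and demo_clobber are made up for illustration, and the buffer is assumed to have enough spare room for both appends.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { size_t length; char *s; } string;

/* Append without checking whether anyone else sees the same buffer.
   Assumes the buffer has room for the suffix. */
static void append_in_place(string *str, const char *suffix)
{
    size_t n = strlen(suffix);
    memcpy(str->s + str->length, suffix, n);
    str->length += n;
}

/* Returns 1 if $foo's appended text was overwritten by $bar's append,
   0 if it survived, and -1 if the buffer could not be allocated. */
static int demo_clobber(void)
{
    string foo, bar;
    int clobbered;

    foo.s = malloc(32);
    if (foo.s == NULL)
        return -1;
    memcpy(foo.s, "xyzzy", 5);          /* $foo = "xyzzy";  */
    foo.length = 5;

    bar = foo;                          /* $bar = $foo;  (same buffer) */
    append_in_place(&foo, "xyzzy");     /* $foo .= "xyzzy"; */
    append_in_place(&bar, "abccb");     /* $bar .= "abccb"; */

    /* $foo still claims 10 characters, but the last 5 are now $bar's. */
    clobbered = (foo.length == 10 && memcmp(foo.s, "xyzzyabccb", 10) == 0);
    free(foo.s);
    return clobbered;
}
```

Both variables start their append at offset 5 of the same buffer, so the second append lands on top of the first.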

- Branden






Re: GC: what is better, reuse or avoid cloning?

2001-02-09 Thread Buddha M Buck

> 
> On Sat, 10 Feb 2001, Branden wrote:
> 
> > Suppose I have a string stored in $foo, say, "abcbca", and then I do:
> >
> > $bar  = $foo;
> > $foo .= "xyzyzx";
> >
> > I see two ways of doing this: one is allowing a string value to be shared by
> > two or more variables, and the other one not.
> 
> Why would you want to share the string value?  Why did you assign the
> value of $foo to $bar if you really wanted to:
> 
>    $bar = \$foo;
> 
> Or actually closer to what you seem to want:
> 
>    *bar = \$foo;
> 
> Although a little birdy told me we're dropping globs for Perl6.  Don't
> most programmers do assignment for a reason?  Why should we second-guess
> them?

I think what he's thinking (in C terms) would be more like the following:

typedef struct { int length; char *s; } string;

// $foo = "xyzzy";
string foo; foo.length = 5; foo.s = strdup("xyzzy---blank---buffer---space---");

// $bar = $foo;
string bar; bar.length = foo.length; bar.s = foo.s;

// $foo .= "xyzzy";
strncpy(foo.s+foo.length,"xyzzy",5); foo.length += 5;

// $foo and $bar share string buffers, but $bar only sees the first 5
// characters while $foo sees the first 10.

I don't see that as quite the same as the implicit references or
type-globs you suggested.

But it's late, and I might not know what I'm talking about...




Re: GC: what is better, reuse or avoid cloning?

2001-02-09 Thread Sam Tregar

On Sat, 10 Feb 2001, Buddha M Buck wrote:

> I think what he's thinking (in C terms) would be more like the following:

Right.  It already has a technical name - copy-on-write.  I should have
made it more clear that I recognized the intended mechanism.  I was trying
to demonstrate that Perl-level mechanisms already existed to do value
aliasing.  I was also trying to show that what he is suggesting is a lot
like aliasing with some simple copy-on-write STORE magic.  For some reason
I thought that by pointing that out I could relieve him of his bizarre
worries about garbage collecting things with references.

I probably should have just gone to bed instead.

-sam





Re: GC: what is better, reuse or avoid cloning?

2001-02-09 Thread Sam Tregar

On Sat, 10 Feb 2001, Branden wrote:

> Suppose I have a string stored in $foo, say, "abcbca", and then I do:
>
> $bar  = $foo;
> $foo .= "xyzyzx";
>
> I see two ways of doing this: one is allowing a string value to be shared by
> two or more variables, and the other one not.

Why would you want to share the string value?  Why did you assign the
value of $foo to $bar if you really wanted to:

   $bar = \$foo;

Or actually closer to what you seem to want:

   *bar = \$foo;

Although a little birdy told me we're dropping globs for Perl6.  Don't
most programmers do assignment for a reason?  Why should we second-guess
them?

> Given mark-and-sweep or other advanced GC, which of them is better? Sharing
> the value or cloning on each assignment?

I don't believe implementing copy-on-write for scalars has anything to do
with garbage collection.  Any garbage collector that will work for Perl
will need to work with references.  All you're suggesting is a
beneath-the-covers reference, right?

What's the point?  You seem to be engaging in some extreme and bizarre
form of premature optimization.

-sam






GC: what is better, reuse or avoid cloning?

2001-02-09 Thread Branden

Back to the GC issue, I was wondering something.

Suppose I have a string stored in $foo, say, "abcbca", and then I do:

$bar  = $foo;
$foo .= "xyzyzx";

I see two ways of doing this: one is allowing a string value to be shared by
two or more variables, and the other one not.

If I allow sharing a string value between two variables, $bar = $foo only
points $bar at the same string $foo points to. But when I do
$foo .= "xyzyzx", I cannot reuse the buffer $foo points to, because I don't
know whether that string is shared (in this case I can tell, but consider
that the two statements may be separated by many others, including sub
calls; suppose $foo is actually a parameter to a sub, passed in with its
value already shared with another variable). So I must allocate a new
buffer, copy $foo's buffer into it, and do the `concat' operation with the
string "xyzyzx".

If I don't allow sharing a string value between two variables, $foo .=
"xyzyzx" can use $foo's buffer, if there's room in it, because it's
guaranteed that no other variable points to it. However, $bar = $foo then
needs to clone $foo's string, since it cannot be shared between $foo and
$bar.
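
The two strategies side by side, as a minimal C sketch. All names (str_value, str_init, assign_shared, append_copying, assign_cloning, append_in_place) are hypothetical, and buffers that are no longer referenced are simply left for an imagined collector.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t length;
    size_t capacity;
    char  *buf;
} str_value;

static int str_init(str_value *v, const char *text, size_t capacity)
{
    size_t n = strlen(text);
    if (capacity < n)
        capacity = n;
    if (capacity == 0)
        capacity = 1;
    v->buf = malloc(capacity);
    if (v->buf == NULL)
        return -1;
    memcpy(v->buf, text, n);
    v->length = n;
    v->capacity = capacity;
    return 0;
}

/* Strategy 1, sharing allowed: assignment is a cheap alias ... */
static void assign_shared(str_value *dst, const str_value *src)
{
    *dst = *src;
}

/* ... but append must always build a fresh buffer, since the old one
   may still be visible through another variable. */
static int append_copying(str_value *v, const char *suffix)
{
    size_t n = strlen(suffix);
    size_t cap = v->length + n;
    char *fresh;
    assert(v->buf != NULL);
    fresh = malloc(cap ? cap : 1);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, v->buf, v->length);
    memcpy(fresh + v->length, suffix, n);
    v->buf = fresh;             /* old buffer is left to the collector */
    v->length = cap;
    v->capacity = cap ? cap : 1;
    return 0;
}

/* Strategy 2, no sharing: assignment pays for a clone ... */
static int assign_cloning(str_value *dst, const str_value *src)
{
    char *copy = malloc(src->capacity);
    if (copy == NULL)
        return -1;
    memcpy(copy, src->buf, src->length);
    dst->buf = copy;
    dst->length = src->length;
    dst->capacity = src->capacity;
    return 0;
}

/* ... and append can then write into the buffer it already owns. */
static int append_in_place(str_value *v, const char *suffix)
{
    size_t n = strlen(suffix);
    if (v->length + n > v->capacity) {
        size_t cap = 2 * (v->length + n);
        char *grown = realloc(v->buf, cap);
        if (grown == NULL)
            return -1;
        v->buf = grown;
        v->capacity = cap;
    }
    memcpy(v->buf + v->length, suffix, n);
    v->length += n;
    return 0;
}
```

Either way one full copy happens somewhere; the strategies differ only in whether the assignment or the modification pays for it.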

In this example neither the cloning nor the failure to reuse the buffer
costs much, because the strings are short, but imagine $foo actually holds
a 10KB string, or something even bigger... And imagine the = or .= is being
done in a loop that may run some hundreds of times...

Given mark-and-sweep or other advanced GC, which of them is better? Sharing
the value or cloning on each assignment? (If there's not much difference,
I'd stick with sharing the value, since it seems simpler to implement and
doesn't require a clone operator, which can be messy).



I actually see two other approaches to this problem. One is using refcounted
GC. Why? Because if the refcount is 1, I know this is the only object holding
the buffer, nothing else is looking at it, and I can reuse the same buffer
on .= without any problem at all.
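
A minimal C sketch of that refcount check, under the assumption that each variable holds a pointer to a counted buffer. The names rc_buffer, variable, rc_new, rc_release, var_assign and var_append are hypothetical and do not correspond to Perl 5's SV internals.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t refcount;
    size_t length;
    size_t capacity;
    char  *data;
} rc_buffer;

typedef struct { rc_buffer *buf; } variable;

static rc_buffer *rc_new(const char *text, size_t length, size_t capacity)
{
    rc_buffer *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    if (capacity < length)
        capacity = length;
    if (capacity == 0)
        capacity = 1;
    b->data = malloc(capacity);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    memcpy(b->data, text, length);
    b->refcount = 1;
    b->length = length;
    b->capacity = capacity;
    return b;
}

static void rc_release(rc_buffer *b)
{
    if (--b->refcount == 0) {
        free(b->data);
        free(b);
    }
}

/* $dst = $src (dst not yet holding a buffer): share and bump the count. */
static void var_assign(variable *dst, const variable *src)
{
    src->buf->refcount++;
    dst->buf = src->buf;
}

/* $v .= suffix: reuse the buffer only when this variable is its sole user. */
static int var_append(variable *v, const char *suffix)
{
    size_t n = strlen(suffix);
    rc_buffer *b = v->buf;
    if (b->refcount > 1 || b->length + n > b->capacity) {
        /* shared or full: build a private, roomier copy */
        rc_buffer *copy = rc_new(b->data, b->length, 2 * (b->length + n));
        if (copy == NULL)
            return -1;
        rc_release(b);
        v->buf = b = copy;
    }
    memcpy(b->data + b->length, suffix, n);
    b->length += n;
    return 0;
}
```

In a loop of appends, only the first one can hit the shared case; after that the count is 1 and every further append writes into the same buffer until it fills up.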

The other way would be to put the burden on the compiler. If the compiler
sees something like

foreach $i (1..$big_loop) {
    $foo .= ", $i";
}

it would clone $foo's value _before_ entering the loop. This would ensure
that $foo's value is not shared with any other variable. That way, $foo's
buffer could be used for the .= operation, and a new one would have to be
created only if that one overflows.
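
Roughly what the compiler-generated code for that loop might look like, as a C sketch under the assumption that values carry no sharing information at all. The names str_value, make_private, append_in_place and run_loop are hypothetical; the possibly shared original buffer is left untouched for whoever else points at it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t length;
    size_t capacity;
    char  *buf;
} str_value;

/* Clone v's buffer so that v is known to be its only user. */
static int make_private(str_value *v, size_t extra)
{
    size_t cap = v->length + extra + 1;
    char *copy = malloc(cap);
    if (copy == NULL)
        return -1;
    memcpy(copy, v->buf, v->length);
    v->buf = copy;          /* the old buffer may still be shared: leave it */
    v->capacity = cap;
    return 0;
}

/* Append into a buffer known to be private, growing it on overflow. */
static int append_in_place(str_value *v, const char *suffix)
{
    size_t n = strlen(suffix);
    if (v->length + n > v->capacity) {
        size_t cap = 2 * (v->length + n);
        char *grown = realloc(v->buf, cap);
        if (grown == NULL)
            return -1;
        v->buf = grown;
        v->capacity = cap;
    }
    memcpy(v->buf + v->length, suffix, n);
    v->length += n;
    return 0;
}

/* foreach $i (1..$big_loop) { $foo .= ", $i"; } */
static int run_loop(str_value *foo, int big_loop)
{
    char piece[32];
    int i;
    if (make_private(foo, 64) != 0)     /* the clone hoisted out of the loop */
        return -1;
    for (i = 1; i <= big_loop; i++) {
        snprintf(piece, sizeof piece, ", %d", i);
        if (append_in_place(foo, piece) != 0)
            return -1;
    }
    return 0;
}
```

The loop pays for one clone up front instead of one per iteration; the price is that the compiler has to prove nothing else aliases $foo's new buffer inside the loop body.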



There's actually a problem with reusing the buffer if $foo's value has the
`concat' and `destroy' operations overloaded. If I'm reusing $foo's buffer,
then on $foo .= "xyzyzx" I would call the (overloaded) concat operation of
$foo's value, telling it to use $foo's buffer if possible. This would
probably invalidate $foo's old value. But then I should call `destroy' on
it, since $foo's old value is no longer valid and its overloaded action
should take place. But `destroy' would find an invalid value, or might even
try to destroy the data in $foo's new value. And not calling `destroy'
wouldn't be the right behaviour either, since it may do some other kind of
cleanup that is actually necessary.



My call would be `not reusing values at all', but I'm quite unsure how such
an approach would perform. Maybe allow the compiler-optimized version only
if $foo's value has a simple `destroy' that doesn't need to be called when
the buffer is reused... Or split `destroy' in two, one part that always gets
called and another that runs only when the value is cleaned up by the GC...

Any suggestions?

Thanks,

- Branden