Re: [perl #131492] Camelia produces different error message from commandline

2017-06-07 Thread Benjamin Goldberg via RT
On Monday, June 05, 2017 5:05 PM, Will Coleda via RT  wrote:
> On Fri, 02 Jun 2017 23:29:40 -0700, ben-goldb...@hotmail.com wrote:
> > On #perl6 IRC, I typed this:
> >
> >  m: my \foo = Callable but role :: { };
> > <+camelia> rakudo-moar ef9872: OUTPUT: «X::Method::NotFound exception
> > produced no message␤  in block <unit> at <tmp> line 1␤␤»
> >
> > If, at my command prompt, I type perl6 -e "my \foo = Callable but role
> > :: { };"
> > I get instead:
> >
> > No such method 'mixin' for invocant of type
> > 'Perl6::Metamodel::ParametricRoleGroupHOW'
> >   in block <unit> at -e line 1
> >
> >
> > Including, oddly, an extra blank line after the error and before my
> > prompt.
>
>
> Please provide the version of rakudo you're running locally so we can rule 
> out any version skew.
>

C:\Users\Ben (user)>perl6 -v
This is Rakudo version 2017.01 built on MoarVM version 2017.01
implementing Perl 6.c.



RE: Lessons to learn from ithreads (was: threads?)

2010-10-16 Thread Benjamin Goldberg

 Date: Tue, 12 Oct 2010 23:46:48 +0100
 From: tim.bu...@pobox.com
 To: faw...@gmail.com
 CC: ben-goldb...@hotmail.com; perl6-langu...@perl.org
 Subject: Lessons to learn from ithreads (was: threads?)
 
 On Tue, Oct 12, 2010 at 03:42:00PM +0200, Leon Timmermans wrote:
  On Mon, Oct 11, 2010 at 12:32 AM, Ben Goldberg ben-goldb...@hotmail.com 
  wrote:
   If thread-unsafe subroutines are called, then something like ithreads
   might be used.
  
  For the love of $DEITY, let's please not repeat ithreads!
 
 It's worth remembering that ithreads are far superior to the older
 5005threads model, where multiple threads ran within a single interpreter.
 [Shudder]
 
 It's also worth remembering that real O/S level threads are needed to
 work asynchronously with third-party libraries that would block.
 Database client libraries that don't offer async support are an
 obvious example.
 
 I definitely agree that threads should not be the dominant form of
 concurrency, and I'm certainly no fan of working with O/S threads.
 They do, however, have an important role and can't be ignored.
 
 So I'd like to use this sub-thread to try to identify what lessons we
 can learn from ithreads. My initial thoughts are:
 
 - Don't clone a live interpreter.
 Start a new thread with a fresh interpreter.
 
 - Don't try to share mutable data or data structures.
 Use message passing and serialization.
 
 Tim.

If the starting subroutine for a thread is reentrant, then no message passing
is needed, and the only serialization that might be needed is for the initial
arguments and for the return values (which the main thread collects via
join).

As for starting a new thread in a fresh interpreter, I think that it might be
necessary to populate that fresh interpreter with (copies of) data which is
reachable from the subroutine that the thread calls.  Reachability can probably
be identified by using the same technique the garbage collector uses.  This
would provide an effect similar to ithreads, but copy only what's really
needed.

To minimize copying, we would treat things as reachable only when we have to:
for example, if no eval-string is used in a given sub, then the sub reaches
only those scopes (lexical and global) which it actually uses, not every
scope that it could use.
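
The reachability pass described above can be sketched as an ordinary mark
phase.  This is only an illustration under assumptions of my own: the `obj`
type, `MAX_OBJS`, and `mark_reachable` are hypothetical names, objects are
modelled as array slots holding up to two references, and nothing here
reflects how ithreads or any Perl implementation actually stores data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJS 32   /* hypothetical upper bound for this sketch */

/* Hypothetical object: up to two references, stored as heap indices
 * (-1 means "no reference"). */
typedef struct { int ref[2]; } obj;

/* Marks every object reachable from `root` and returns how many were
 * marked.  Only the marked objects would need copying into the fresh
 * interpreter. */
size_t mark_reachable(const obj *heap, size_t n, int root, bool *marked)
{
    int stack[MAX_OBJS];
    size_t top = 0, found = 0;

    assert(n <= MAX_OBJS);
    for (size_t i = 0; i < n; i++)
        marked[i] = false;
    if (root < 0 || (size_t)root >= n)
        return 0;

    marked[root] = true;
    stack[top++] = root;
    while (top > 0) {
        int cur = stack[--top];
        found++;
        for (int k = 0; k < 2; k++) {
            int next = heap[cur].ref[k];
            /* Mark before pushing, so each object is pushed at most once
             * and cycles terminate. */
            if (next >= 0 && (size_t)next < n && !marked[next]) {
                marked[next] = true;
                stack[top++] = next;
            }
        }
    }
    return found;
}
```

Objects that only reference each other, but are not reachable from the
thread's starting subroutine, stay unmarked and would not be copied.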
  

Re: Junctions Set Theory

2003-09-02 Thread Benjamin Goldberg
Luke Palmer wrote:
 
 Wow, what an old thread...
 
 Jonadab the Unsightly One writes:
  Abhijit A. Mahabal [EMAIL PROTECTED] writes:
 
   On the other hand, if you wanted to say true for all except exactly
   one value, I can't think of a way.
 
  Easy.  The following two statements are equivalent:
 
  F(x) is true for all but exactly one x
  (not F(x)) is true for exactly one x
 
  The only additional possibility that can't be phrased in terms of the
  four given is the complex generalised case:
 
  F(x) is true for at least n but not more than m values of x
 
 Oh, I wouldn't say that
 
 use HypotheticalModules::List::Combinations 'combinations';
 
 sub between($low, $high, [EMAIL PROTECTED]) {
 any(
 $low..$high == map -> $num {
 any( map { all(@$_) } combinations($num, @values) )
 }
 )
 }
 
 Efficient?  No.  Blows out memory for (20, 30, 1..50)?  Yes.
 Works?  Yes.
 
 We also have:
 
 class Betwunction is Junction {
 submethod BUILD($.low, $.high, @.values) { }

 method evaluate() {
 my Int $n = grep { ? $_ } @.values;
 $.low <= $n <= $.high;
 }
 }
 sub between($low, $high, @values) {
 new Betwunction: $low, $high, @values;
 }
 
 Or something.  But I guess this one breaks the constraint of "in terms
 of the four given".

Also, it tests more values for truth than it has to.

If there are fewer elements in the array than $.low, then obviously, we
should fail immediately.  Actually I should say "elements remaining"
instead of just "elements", and "$.low minus the number of true
elements so far" instead of "$.low".  In other words, this bit of
short-circuiting could be done on every iteration, so as to test fewer
elements of the array for truth, if we can.

If there are fewer elements in the array than $.high, then (assuming we
already know that there are at least $.low elements) we can succeed
immediately.  Likewise, this short-circuiting can be done on every
iteration.

   method evaluate() {
      my Iterator $i = Iterator.new($.values);
      my Int $need = $.low;
      while( $need > 0 ) {
         return false if $i < $need;
         --$need if ? $i.shift;
      }
      my Int $limit = $.high - $.low;
      while( $limit >= 0 ) {
         return true if $i < $limit;
         --$limit if ? $i.shift;
      }
      return false;
   }

This *should* do no more boolean tests on the elements of $.values than
absolutely necessary.
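
For anyone who wants to try the short-circuiting outside of Perl 6, here is
a minimal sketch in C.  The function name `between`, its parameters, and the
`tested` counter are my own assumptions for illustration.  One deliberate
difference: the upper-bound check uses `<=`, so the second loop also stops
when the remaining elements exactly fit the spare allowance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the number of true entries in vals[0..n) lies in
 * [low, high].  If `tested` is non-NULL it receives the number of
 * entries that were actually examined. */
bool between(size_t low, size_t high, const bool *vals, size_t n,
             size_t *tested)
{
    size_t pos = 0;        /* next entry to examine */
    size_t need = low;     /* true entries still required */
    bool result = false;
    bool decided = false;

    assert(low <= high);

    /* Phase 1: collect `low` true entries, failing as soon as too few
     * entries remain to reach that count. */
    while (need > 0 && !decided) {
        if (n - pos < need)
            decided = true;            /* cannot reach low: result stays false */
        else if (vals[pos++])
            --need;
    }

    /* Phase 2: allow at most (high - low) further true entries. */
    if (!decided) {
        size_t spare = high - low;
        for (;;) {
            if (n - pos <= spare) {    /* cannot exceed high any more */
                result = true;
                break;
            }
            if (vals[pos++]) {
                if (spare == 0)
                    break;             /* exceeded high: result stays false */
                --spare;
            }
        }
    }

    if (tested)
        *tested = pos;
    return result;
}
```

With five true values and bounds (1, 2), the sketch stops after the third
element, because that is the first point at which the upper bound is known
to be violated.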

  So for example you could have a list of fifty values for x and test
  whether the condition is true for at least ten but not more than
  forty of them.  (Or, x could be the condition; you could have a list
  of fifty conditions and test whether between twenty and thirty of them
  were true.)  My guess is, however, that the frequency with which
  anyone would use such a capability would not be overwhelming.
 
  It would be great for obfuscation, though, particularly if some of the
  conditions had side effects with an impact on the value of the other
  conditions to be tested...  That would be sufficiently interesting
  that it's almost a shame I can't think of a real reason to request
  such a feature, since I rather doubt anyone's going to much fancy
  implementing it just for obfuscatory value ;-)
 
 Who needs reasons?

It's at least as useful as the proposed fortytwo op for parrot. :)

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: [RfC] vtable-dump

2003-09-02 Thread Benjamin Goldberg
Leopold Toetsch wrote:
[snip]
 [1] when we want to thaw/clone big structures, we should have some means
 to estimate the amount of needed headers. If we will not have enough, we
 do a DOD run before clone/thaw and then turn DOD off - it will not yield
 any more free headers anyway. This can avoid a couple of DOD runs that
 do just nothing except burning a lot of cycles and massive cache
 pollution.
 To achieve this, we might call aggregates.elements() first by means of
 the iterator again or with some depth restriction and returning, when we
 reach the free-header limit.

Even with a depth restriction, a recursive estimate can produce
misleading results due to circular references.  Only actually walking
the structure can get the number right.  However, walking the structure
*twice* would be silly.  So, begin with an estimate of "unknown"
(serialize an integer -1), and then, after the whole thing has been
frozen, we seek backwards (if that's possible) and replace that
"unknown" with the actual number of pmcs that were serialized.
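
A minimal sketch of the "write -1, then seek back and patch" idea, using
plain stdio rather than anything from parrot.  The names `write_counted`
and `read_count` are hypothetical, the items are plain 32-bit integers
standing in for frozen pmcs, and the stream is assumed to be seekable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Writes a placeholder count (-1), then the items, then seeks back and
 * replaces the placeholder with the real count.
 * Returns 0 on success, -1 on an I/O or seek failure. */
int write_counted(FILE *out, const int32_t *items, size_t n)
{
    int32_t count = -1;                    /* "unknown" marker */
    long header_pos, end_pos;

    assert(out != NULL);
    header_pos = ftell(out);
    if (header_pos < 0)
        return -1;                         /* stream is not seekable */
    if (fwrite(&count, sizeof count, 1, out) != 1)
        return -1;

    count = 0;
    for (size_t i = 0; i < n; i++) {
        if (fwrite(&items[i], sizeof items[i], 1, out) != 1)
            return -1;
        count++;                           /* counted while walking, once */
    }

    end_pos = ftell(out);
    if (end_pos < 0 || fseek(out, header_pos, SEEK_SET) != 0)
        return -1;
    if (fwrite(&count, sizeof count, 1, out) != 1)
        return -1;
    return fseek(out, end_pos, SEEK_SET);  /* leave the stream at the end */
}

/* Reads the count stored at the start of the stream (-1 if unreadable). */
int32_t read_count(FILE *in)
{
    int32_t count = -1;
    rewind(in);
    if (fread(&count, sizeof count, 1, in) != 1)
        return -1;
    return count;
}
```

If the output is a pipe or socket, `ftell` fails and the caller is left
with the "unknown" case, which is why the reader has to tolerate -1.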

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


PIO Questions.

2003-09-02 Thread Benjamin Goldberg

I'm looking for, but not finding, information regarding the character
type and encoding on parrot io objects.

As an example of why... I found this in io.ops :

op write(in PMC) {
  PMC *p = $1;
  STRING *s = (VTABLE_get_string(interpreter, p));
  if (s) {
    PIO_write(interpreter, PIO_STDOUT(interpreter),
              s->strstart, s->bufused);
  }
  goto NEXT();
}

Surely, blindly writing the bytes that are in a string, without
checking that the string's encoding and type match that of the stream,
is wrong.

I would expect something like:

op write(in PMC) {
   PMC *p = $1;
   STRING *s = VTABLE_get_string(interpreter, p);
   if( s ) {
  PMC *o = PIO_STDOUT(interpreter);
  string_transcode(interpreter, s,
 PIO_encoding(interpreter, o),
 PIO_chartype(interpreter, o), &s);
  PIO_write(interpreter, o, s->strstart, s->bufused);
   }
   goto NEXT();
}

Except that I can't seem to find any PIO_encoding and PIO_chartype
functions.

#

Actually... I think that the op print(in PMC) and write(in PMC) are
designed wrong.  Instead of asking for a string, and printing that
string, they should call a print_self and/or write_self vtable method on
the PMC, passing in the PIO object.  In turn, the default
implementations of those methods should print or write the results of
DYNSELF.get_string() to that PIO object.

This way, a PMC whose string representation is very large doesn't need
to serialize itself to a string before it can be printed -- it can print
itself directly to the PIO object, thus avoiding allocating memory for
that big string, and probably lots of copying.

To avoid loss of synchronization between the get_string form of a pmc
and the print_self/write_self form of a pmc, one should be able to
define a string's get_string as creating a new stringstream PIO object,
printing itself to that stream, and returning the corresponding string. 
However, there doesn't (yet) seem to be a stringstream layer.  When do
we expect to have one?
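
To make the print_self proposal concrete, here is a small sketch in plain C
with stdio standing in for PIO.  Everything in it is hypothetical: the `Obj`
struct, the two function-pointer slots, and the helper names are my own
illustration, not parrot's vtable layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct Obj Obj;
struct Obj {
    const char *(*get_string)(const Obj *self);          /* may be NULL */
    int         (*print_self)(const Obj *self, FILE *out);
    const void  *data;    /* payload; meaning depends on the class */
    size_t       count;   /* repeat count for the streaming class */
};

/* Default behaviour: stringify, then print the string. */
int default_print_self(const Obj *self, FILE *out)
{
    assert(self->get_string != NULL);
    return fputs(self->get_string(self), out) < 0 ? -1 : 0;
}

/* A class with a large representation streams itself piece by piece and
 * never builds the whole string in memory. */
int repeat_print_self(const Obj *self, FILE *out)
{
    for (size_t i = 0; i < self->count; i++)
        if (fputs((const char *)self->data, out) < 0)
            return -1;
    return 0;
}

const char *scalar_get_string(const Obj *self)
{
    return (const char *)self->data;
}

/* What the print op would do: dispatch instead of asking for a string. */
int obj_print(const Obj *o, FILE *out)
{
    return o->print_self(o, out);
}
```

The op itself stays trivial; whether an object stringifies first or streams
directly is decided entirely by which print_self its class installs.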

#

PIO_putps converts to a cstring, then calls PIO_puts.

Since PIO_puts doesn't take a length, obviously it must be determining
the length of the string based on the presence of a nul character in
it.  Thus, we cannot use PIO_puts to print binary data.  This means that
PIO_write must be used.  Since there's no op write(in STR), the only way
to do it from parrot is to create a PerlString, store our Sreg into it,
then call write.  Blech.
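
The truncation problem is easy to demonstrate with stdio standing in for
PIO.  The two helpers below are hypothetical stand-ins (they are not
PIO_puts or PIO_write); the first finds the length by looking for a nul,
the second takes an explicit length.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length is discovered with strlen, so output stops at the first nul. */
size_t write_cstring(FILE *out, const char *s)
{
    assert(out != NULL && s != NULL);
    return fwrite(s, 1, strlen(s), out);
}

/* Length is supplied by the caller, so embedded nuls are written too. */
size_t write_bytes(FILE *out, const char *buf, size_t len)
{
    assert(out != NULL && buf != NULL);
    return fwrite(buf, 1, len, out);
}
```

For the five bytes 'a', 'b', nul, 'c', 'd', the first helper writes two
bytes and the second writes all five.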

#

Shouldn't everything in io_unix.c (except for pio_unix_layer) be
static?  This isn't just about namespace pollution -- it slows down
linking and dynamic loading.  I think.

Hmm, the same applies to the other io_foo.c files.

#

Why does PIO_unix_seek clear errno before calling lseek?

#

Why does PIO_unix_tell have a temp variable pos?

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: serialisation (was Re: [RfC] vtable-dump)

2003-09-01 Thread Benjamin Goldberg


Nicholas Clark wrote:
 
 On Sat, Aug 30, 2003 at 10:13:02PM -0400, Benjamin Goldberg wrote:
  Nicholas Clark wrote:
 
   The attacker can craft a bogus CGITempFile object that refers to any
   file on the system, and when this object is destroyed it will attempt to
   delete that file at whatever privilege level the CGI runs at. And
   because that object is getting destroyed inside the deserialise routine
   of Storable, this all happens without the user written code getting any
   chance to inspect the data. And even Storable can't do anything about
   it, because by the time it encounters the repeated hash key, it has
   already deserialised this time-bomb. How does it defuse it?
 
  The simplest solution *I* can think of, is to have storable copy the
  taint
  flag from the input string/stream onto every single string that it
  produces.
 
  Taint checking doesn't solve *all* security problems, of course, but it
  can catch many of them, and it certainly would catch this one (if $$self
  were tainted).
 
 I don't believe that it would. A quick test suggests that in perl5
 destructors get to run after a taint violation is detected:
 
 $ perl -Tle 'sub DESTROY {print "We still get to run arbitrary code after taint violations"}; bless ($a = []); `cat /etc/motd`'
 Insecure $ENV{PATH} while running with -T switch at -e line 1.
 We still get to run arbitrary code after taint violations
 $

That wasn't what I meant.

The taint violation in your example does *not* correspond to Storable
turning on the taint flag in the strings it produces.  At best, it
corresponds to Storable dying due to a malformed input file, resulting
in destructors being called.

If sub DESTROY tries to unlink a file, and that filename is a tainted
string, then the DESTROY will die.

Currently, Storable *doesn't* turn on the taint flag in the SVPVs that it
produces; because of this, $$self in CGITempFile::DESTROY isn't tainted.

If $$self in CGITempFile::DESTROY *were* tainted, then obviously that
DESTROY would die, and the file wouldn't get unlinked.  Thus, we would
avoid that security hole.
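
The mechanism I have in mind can be modelled in a few lines of C.  This is
a toy model, not Perl's taint implementation: `tstring`, `thaw_string`,
`guarded_unlink`, and the counting stand-in for unlink are all hypothetical
names, and the real unlink is never called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A string that remembers whether it came from an untrusted source. */
typedef struct {
    const char *text;
    bool tainted;
} tstring;

typedef int (*unlink_fn)(const char *path);

/* The deserialiser copies the taint of its input onto every string it
 * produces. */
tstring thaw_string(const char *text, bool source_tainted)
{
    tstring s = { text, source_tainted };
    return s;
}

/* A destructor-style helper: refuses to act on tainted file names. */
int guarded_unlink(const tstring *path, unlink_fn do_unlink)
{
    assert(path != NULL && do_unlink != NULL);
    if (path->tainted)
        return -1;                 /* the "die" in the Perl version */
    return do_unlink(path->text);
}

/* Stand-in for unlink(2) so the sketch never touches the filesystem. */
int unlink_calls = 0;
int counting_unlink(const char *path)
{
    (void)path;
    unlink_calls++;
    return 0;
}
```

A file name thawed from tainted input is rejected before the unlink
stand-in is ever reached; a name from trusted input goes through.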

  One defense against following a bomb with malformed data, might be to
  have Storable save up the SV*s and the names with which to bless them,
  and only do the blessing *after* the data is fully deserialized, as a
  last step before returning it to the user.  This way, if there's
  malformed data, no destructors get called.  The user still needs to
  validate the returned data, though, and rebless anything which might
  result in an evil destructor being called.
 
  Another defense is to run deserialization and validation inside of a
  Safe object.  Make sure that if the object fails to validate, it is
  completely destructed before we exit the Safe compartment.
 
 I think that these could work well.
 
  For backwards compatibility with perl5, parrot will quite likely support
  taint checking, safe.pm, and ops.pm.
 
 I don't see why support is needed at core parrot level. I believe that
 the plan is to provide XS-level Perl 5 compatibility via ponie, in which
 case much of this stuff would be up to ponie, not core parrot.
 
 Tainting sounds to me like the sort of things that would be a property on
 a scalar, checked by custom ponie ops, which would be as parrot core as
 python's ops. A brief inspection suggests that ops.pm is entirely a parser
 level thing, so I don't see the need for core parrot support there.
 
 Safe.pm as implemented is an unreliable mess. It's also problematic as
 it describes actions solely in terms of perl 5 ops. I've no idea quite
 how close Arthur, Rafael and the other ponie conspirators think that
 ponie-generated parrot bytecode will be to the perl 5 optree structure,
 but I wouldn't like to place any bets right now. I think that safe
 execution compartments are part of Dan's plan, but I'm not sure if
 anyone yet knows how any of this fits together (even Dan or Arthur)
 
 Nicholas Clark

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: [RfC] vtable-dump

2003-09-01 Thread Benjamin Goldberg
Leopold Toetsch wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] wrote:
 class freezer {
 class thawer {
 class cloner {
 
 [ big snip ]
 
 Do you expect that these are overridden by some languages using parrot?
 I.e. that ponie tries to implement a freezer that writes output for
 Storable?

I'm not entirely sure...

For ponie to implement something that deals with Storable, then of
course it needs to output data in the same file format as Storable
does.  I haven't looked at the guts of Storable, so I don't know what
that format is.

 Further: having clone implemented in terms of freeze + thaw needs
 additional memory for the intermediate frozen image. Isn't that
 suboptimal?

Only slightly -- It's just *one single* PMC's data that's stored in that
additional memory.

And if we have a separate clone vtable method, then there's a chance
that our cloning procedure will drift apart from our freezing/thawing
procedure.

 A general traverse routine or iterator seems to be more flexible.
 
 leo

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}



Re: [RfC] vtable-dump

2003-08-31 Thread Benjamin Goldberg
Leopold Toetsch wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] wrote:
  Actually, I think the following interface would be better:
 
 void freeze(PMC *freezer);
 void thaw  (PMC *thawer);
 
 I'm thinking of (in horrible pseudo code ;-):
[snip]
 Above functionality does IMHO match your description. The only
 difference is, that I'd like to use that not only for freeze/thaw, but
 for clone and dump too.

Well, I wasn't precisely thinking of my interface being *solely* for
freeze/thaw, but for clone and dump, too.

Cloning would be like freezing followed immediately by thawing, without
writing to disk.

That is, I was thinking of something like the following pseudocode.

   class freezer {
  fifo<PMC*> queue;
  set<INTVAL> seen;
  outputer o;
  void push_integer(INTVAL i) { o.print_int(i); }
  void push_number(NUMVAL n) { o.print_num(n); }
  void push_string(STRING*s) { o.print_str(s); }
  void push_bigint(BIGINT*b) { o.print_big(b); }
  void push_pmc(PMC *p) {
 INTVAL i = (INTVAL)p;
 o.print_int(i);
 if( !seen.contains(i) ) { seen.add(i); queue.push(p); }
  }
   }
   void do_freeze(PMC *p, outputer o) {
  PMC * f = new_freezer_dumper(o);
  f.push_pmc(p);
  while( f.queue.notempty() ) {
 p = f.queue.pop();
 o.print_int( p.getclass() );
 p.freeze(f);
  }
   }

   class thawer {
  fifo<PMC*> queue;
  map<INTVAL, PMC*> prep;
  inputter i;
  INTVAL shift_integer() { return i.read_int(); }
  NUMVAL shift_number() { return i.read_num(); }
  STRING*shift_string() { return i.read_str(); }
  BIGINT*shift_bigint() { return i.read_big(); }
  PMC*   shift_pmc() {
 INTVAL j = i.read_int();
 if( prep.contains(j) ) { return prep.get(j); }
 PMC * n = new_placeholder();
 prep.put(j, n);
 queue.push(n);
 return n;
  }
   }
   PMC* do_thaw(inputter i) {
  PMC * t = new_thawer(i);
  PMC * ret = t.shift_pmc();   // also queues the root placeholder
  while( t.queue.notempty() ) {
 PMC * p = t.queue.pop();
 p.setclass( i.read_int() );
 p.thaw(t);
  }
  return ret;
   }

   class cloner {
  fifo<pair<PMC*,PMC*> > queue;
  map<PMC*,PMC*> prep;
  fifo<PMC*> tempdata;
  void push_integer(INTVAL i) { tempdata.push_integer(i); }
  void push_number (NUMVAL n) { tempdata.push_number (n); }
  void push_string (STRING*s) { tempdata.push_string (s); }
  void push_bigint (BIGINT*b) { tempdata.push_bigint (b); }
  void push_pmc(PMC *p) {
 if( prep.contains(p) ) {
tempdata.push_pmc(prep.get(p));
 } else {
PMC *p2 = new_placeholder();
prep.put(p, p2);
queue.push( pair(p, p2) );
tempdata.push_pmc(p2);
 }
  }
  INTVAL shift_integer() { return tempdata.shift_integer(); }
  NUMVAL shift_number () { return tempdata.shift_number (); }
  STRING*shift_string () { return tempdata.shift_string (); }
  BIGINT*shift_bigint () { return tempdata.shift_bigint (); }
  PMC*   shift_pmc() { return tempdata.shift_pmc(); }
   }
   PMC * do_clone(PMC *p) {
  PMC *c = new_cloner();
  PMC *r = new_placeholder();
  c.prep.put(p, r);
  p.freeze(c);
  r.setclass( p.getclass() );
  r.thaw(c);
  assert( c.tempdata.isempty() );
  while( c.queue.notempty() ) {
 pair<PMC*,PMC*> x = c.queue.pop();
 x.first.freeze(c);
 x.second.setclass( x.first.getclass() );
 x.second.thaw(c);
 assert( c.tempdata.isempty() );
  }
  return r;
   }

I've omitted the pseudocode for dumping/pretty-printing since I'm not
sure how best it would be done.  It does require quite a bit more
information than freezing (since the computer can tell when/where a pmc
class starts and ends, but a human wouldn't be able to), so merely
calling do_freeze() with an outputer which sprintfs intvals and numvals
nicely wouldn't be sufficient.

However, if you replaced the o.print_int inside of push_pmc with a
o.print_pmc_pointer, and o.print_int in the main loop with
o.print_pmc_class, then the outputer *would* have enough info to make
something vaguely human-readable.
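
For the cloning case, the placeholder map plus FIFO queue boils down to the
following C sketch.  It is a simplified model under my own assumptions: the
`node` type, `MAX_NODES`, and all function names are hypothetical, there is
no intermediate frozen image, and each object has just a value and two
references.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_NODES 64   /* hypothetical capacity for this sketch */

typedef struct node {
    int value;
    struct node *kid[2];
} node;

/* The map from originals to placeholders doubles as the FIFO queue:
 * entries at index >= next have a placeholder but are not filled in. */
typedef struct {
    const node *src[MAX_NODES];
    node       *dst[MAX_NODES];
    size_t      count;
    size_t      next;
} cloner;

/* Returns the placeholder for `orig`, creating and queueing one the
 * first time `orig` is seen. */
static node *placeholder_for(cloner *c, const node *orig)
{
    node *fresh;

    if (orig == NULL)
        return NULL;
    for (size_t i = 0; i < c->count; i++)
        if (c->src[i] == orig)
            return c->dst[i];

    assert(c->count < MAX_NODES);
    fresh = calloc(1, sizeof *fresh);
    assert(fresh != NULL);
    c->src[c->count] = orig;
    c->dst[c->count] = fresh;
    c->count++;
    return fresh;
}

/* Clones everything reachable from `root`, including cycles. */
node *clone_graph(const node *root)
{
    cloner c = { .count = 0, .next = 0 };
    node *result = placeholder_for(&c, root);

    while (c.next < c.count) {
        const node *orig = c.src[c.next];
        node *copy = c.dst[c.next];
        c.next++;
        copy->value = orig->value;
        /* Referenced nodes are only queued here, not cloned yet. */
        copy->kid[0] = placeholder_for(&c, orig->kid[0]);
        copy->kid[1] = placeholder_for(&c, orig->kid[1]);
    }
    return result;
}
```

Because a node is given its placeholder before its contents are copied, a
reference back to an already-seen node resolves to the same copy, which is
what keeps circular structures from recursing forever.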
 
-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: serialisation (was Re: [RfC] vtable-dump)

2003-08-31 Thread Benjamin Goldberg
Nicholas Clark wrote:
 
 On Fri, Aug 29, 2003 at 05:30:37PM +0200, Leopold Toetsch wrote:
 I think, we need a general solution for freeze, dump and clone. As
 shown
 
 I don't know if this is relevant here, but I'll mention it in case.
 For perl5 there isn't a single good generic clone system. Probably the
 best (in terms of quality of what it can cope with, if not speed) is
 Storable.
 Storable provides a dclone method which internally serialises the passed
 in reference, and then deserialises a new reference. This is a lot of
 extra effort, but it does work. Obviously this fails parrot's aim -
 speed.
 
 I don't know whether this is relevant either:
 The internal representation of Storable's serialisation is stream of
 tagged data. There was a discussion on p5p a while back about whether it
 would be possible to make a callback interface to the tags, so that
 people could write custom deserialisers (eg vetting the data as it came
 in) This sort of interface to hooking deserialisation would be useful.
 Likewise being able to hook into the traversal/serialisation interface
 to provide custom output (eg XML when we're serialising as standard to
 compressed YAML) would save people having to reinvent
 traversal/introspection routines. (eg perl has both Data::Dumper and
 Storable in core doing separate traversal routines)
 
 I suspect that this is relevant:
 You can't trust your data deserialiser. It can do evil on you before it
 returns.
 
 I forget who worked this attack out, but Jonathan Stowe presented a talk
 on it at YAPC::EU in Amsterdam. The specific attack form on Storable is
 as follows, but it should generalise to any system capable of
 deserialising objects:
 
 You create serialisation which holds a hash. The hash happens to have 2
 identical keys (the serialisation format does not prevent this)
 The second key is innocent. When Storable writes this key into the hash
 it's deserialising, perl's hash API automatically frees any previous
 value for that key. This is how hashes are supposed to work.
 
 Obviously while deserialising there should never have been a previous
 key (a legitimate file generated from a real hash could not have had
 repeating keys) The problem is that things like perl let you
 (de)serialise objects, and objects have destructors that can run
 arbitrary code. So your attacker puts an object as the value associated
 with the first copy of the key which is an object with attributes that
 cause it to do something interesting. The suggestion for attacking a
 Perl 5 CGI was to use a CGITempFile object. The destructor looks like
 this:
 
 sub DESTROY {
 my($self) = @_;
 unlink $$self;  # get rid of the file
 }
 
 The attacker can craft a bogus CGITempFile object that refers to any
 file on the system, and when this object is destroyed it will attempt to
 delete that file at whatever privilege level the CGI runs at. And
 because that object is getting destroyed inside the deserialise routine
 of Storable, this all happens without the user written code getting any
 chance to inspect the data. And even Storable can't do anything about
 it, because by the time it encounters the repeated hash key, it has
 already deserialised this time-bomb. How does it defuse it?

The simplest solution *I* can think of is to have Storable copy the
taint flag from the input string/stream onto every single string that
it produces.

Taint checking doesn't solve *all* security problems, of course, but it
can catch many of them, and it certainly would catch this one (if $$self
were tainted).

 Simple solutions such as checking the hash keys are unique don't work
 either. All the attacker does then is increase the abstraction slightly.
 Put the time-bomb as the first element in an array. Put one of these
 bogus hashes as the second object. Fine, you can realise that you've
 got bad keys in the bogus hash and never build it. But at this
 point the time-bomb object already exists. You'd have to validate the
 entire serialised stream before continuing. And if deserialisers for
 objects are allowed to fail, then you're still stuffed, because the
 attacker then crafts a time-bomb object, and a second malformed object
 that is known to cause its class's deserialiser to fail.

One defense against following a bomb with malformed data, might be to
have Storable save up the SV*s and the names with which to bless them,
and only do the blessing *after* the data is fully deserialized, as a
last step before returning it to the user.  This way, if there's
malformed data, no destructors get called.  The user still needs to
validate the returned data, though, and rebless anything which might
result in an evil destructor being called.

Another defense is to run deserialization and validation inside of a
Safe object.  Make sure that if the object fails to validate, it is
completely destructed before we exit the Safe compartment.

These defenses still aren't perfect:  One could embed a time-bomb inside
of a 

Re: lvalue cast warnings

2003-08-31 Thread Benjamin Goldberg


Leopold Toetsch wrote:
 
 Nicholas Clark [EMAIL PROTECTED] wrote:
  -#define PMC_sub(pmc) ((parrot_sub_t)((pmc)->cache.pmc_val))
  +#define PMC_sub(pmc) (*((parrot_sub_t *)&((pmc)->cache.pmc_val)))
 
 This seems to work. Thanks for the patch.
 (the tcc tinderbox seems to be missing a make realclean/Configure or
 such though)

Hmm, maybe we should define a new macro, to make this easier in the
future:

#ifdef __GNUC__
#define LVALUE_CAST(type, val) ((type)(val))
#else
#define LVALUE_CAST(type, val) (*((type *)&(val)))
#endif

Then, PMC_sub becomes:

#define PMC_sub(pmc) LVALUE_CAST( parrot_sub_t, (pmc)->cache.pmc_val )

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: [RfC] vtable-dump

2003-08-30 Thread Benjamin Goldberg
Dan Sugalski wrote:
 
 On Fri, 29 Aug 2003, Leopold Toetsch wrote:
 
 Dan Sugalski [EMAIL PROTECTED] wrote:
 I think we'd be better served getting the freeze/thaw stuff in and

 We were just discussing this in the f'up.
 
 I read those, but I wanted to make sure the discussion went this way. :)
 
 If we make the initial dump scheme something mildly human readable
 (XML, YAML, whatever) then the debugger doesn't have to worry about
 formatting it for a while. (And we're going to want a pluggable dump
 scheme anyway, to make experimentation easier)

 Pluggable implies a dump() vtable method, doesn't it?
 
 Nope. Pluggable implies freezethaw.c provides all the functionality to
 freeze or thaw a PMC, with code in there to handle the right final
 encoding or decoding of the data.
 
 I thought we had a freeze and thaw entry in the vtables, but I see not.
 The signatures (since I don't have source handy to check them in) should
 be:
 
   STRING *freeze()
   PMC* thaw(STRING*)
 
 With thaw ignoring its initial PMC parameter. (It's a class method,
 essentially)

Actually, I think the following interface would be better:

   void freeze(PMC *freezer);
   void thaw  (PMC *thawer);

When you freeze a pmc, it's passed a handle into the freezing mechanism,
which it can then use to record the data that will be necessary to
reconstitute itself later.  This is done using the VTABLE_push_type
methods.

Pushing a pmc does *not* instantly result in that other pmc being
frozen; that only happens *after* the freeze() function returns.  In
effect, it marks the other pmc as needing serialization, which then
happens later.

During thawing, a pmc header is created for you, morphed into your
class, and then has VTABLE_thaw called on it.  It can then reconstitute
itself using VTABLE_shift_type on the handle to the thawer that's been
passed in to it.

Just as serialization of external pmcs in freeze wasn't immediate, any
pmc which you shift_ out of the Thawer object might not be fully
thawed... it might be a pmc header which has been created as a
placeholder, and which has not had VTABLE_thaw called on it yet.

Freezing and thawing would have algorithms similar to the mark-and-sweep
of the DoD (but simple fifos, or simple stacks, not any kind of priority
queue;)).

I'm not *entirely* sure, but I think we might need/want to allow
STRING*s returned by shift_ to be placeholders too.  This allows the
freezer to print out the string data after the pmcs if it chooses to.  If
it does, it can perhaps write the strings in a more compact manner. 
Almost certainly moving the strings to the end would make compression by
gzip or whatever easier.

(I vaguely recall hearing that the following problem is NP-hard, but I
don't remember.  Does anyone know for sure?  Given a set of N strings,
construct a single larger string, of which all N strings are
substrings.  This allows all N strings to be represented on disk using
nothing more than the large string, and an offset and length for each
substring.)
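
As far as I know, finding the *shortest* such string is the shortest common
superstring problem, which is NP-hard.  A much weaker but cheap variant is
easy, though: append each string to a pool unless it already occurs there.
The sketch below does only that; `pool`, `span`, and `pool_add` are
hypothetical names, the pool has a fixed size, and overlaps between the end
of the pool and the start of a new string are not exploited.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fixed-size pool; must start zero-initialised so buf is a valid,
 * empty C string. */
typedef struct {
    char   buf[256];
    size_t used;
} pool;

/* Where a string lives inside the pool. */
typedef struct {
    size_t off;
    size_t len;
} span;

/* Records `s` in the pool, reusing an existing occurrence when there is
 * one.  Returns 0 on success, -1 if the pool is full. */
int pool_add(pool *p, const char *s, span *out)
{
    size_t len;
    const char *hit;

    assert(p != NULL && s != NULL && out != NULL);
    len = strlen(s);
    hit = strstr(p->buf, s);
    if (hit != NULL) {
        out->off = (size_t)(hit - p->buf);
        out->len = len;
        return 0;
    }
    if (p->used + len >= sizeof p->buf)
        return -1;                       /* keep room for the final nul */
    memcpy(p->buf + p->used, s, len + 1);
    out->off = p->used;
    out->len = len;
    p->used += len;
    return 0;
}
```

Adding "foobar", then "oba", then "baz" stores nine bytes: "oba" is found
inside "foobar", while "baz" is appended whole even though it shares "ba"
with the end of the pool.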

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: [RfC] constant PMCs and classes

2003-08-29 Thread Benjamin Goldberg


Leopold Toetsch wrote:
 
 We currently have constant Key and Sub PMCs both created from the
 packfile at load time. They live in the constant_table pointing to a
 constant PMC pool. But we need more.
 
 We have code like this all over the core:
 
string_from_cstring(interpreter, pIt, 0)

Don't do that.  Instead, do:

 string_from_cstring(interpreter, pIt, PObj_external_FLAG);

This will allow parrot to avoid making a copy of pIt.

Actually, IMHO, we should disallow (both for this, and for string_make)
a simple default of 0 for 'flags'.  Instead, I think we should require
that the programmer explicitly indicate exactly what the pointer we're
passing in is.  It could be any of:

   1. A buffer which might be reused by something else, so we should
allocate space, and copy the data into it.
   2. A const static buffer which won't (can't) ever change.  We can
capture it and use it, but shouldn't [ever] try altering it.
   3. An external buffer which won't change on us, but which is not
constant memory ... like non-const static memory, or perhaps a string
from a program in which parrot is embedded.  We can freely use it
(without copying it) and even change it, but growing it requires that we
allocate from our own arenas.  Also, when we no longer need it, just let
go of it.
   4. A buffer allocated by mem_sys_allocate, which the calling function
wants to *give up* to the string.  We can use it directly without
copying it, reallocate it as needed, and free it to our memory arenas
when we're done with it.  Indeed, by indicating this, it would now be
the string's responsibility to free up the memory when it's gone,
instead of being the caller's responsibility.
   If there are any other scenarios, I'm not aware of them.
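
A minimal sketch of what "the caller must say what kind of pointer this is" could look like. The enum, the struct and both functions are hypothetical; Parrot's real flags (such as PObj_external_FLAG) and its arena allocator are not modelled, and plain malloc/free stand in for them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical enum and struct, for illustration only. */
typedef enum {
    BUF_COPY,          /* 1: caller may reuse the buffer, so copy it    */
    BUF_CONST_STATIC,  /* 2: read-only forever, share but never write   */
    BUF_EXTERNAL,      /* 3: stable but not ours, share and never free  */
    BUF_GIVE_UP        /* 4: malloc'ed by the caller, we take ownership */
} buf_kind;

typedef struct {
    char    *buf;
    size_t   len;
    buf_kind kind;
} toy_string;

/* Build a toy_string from buf according to the caller's declaration.
 * Returns 0 on success, -1 if the copy could not be allocated. */
static int toy_string_make(toy_string *s, char *buf, size_t len, buf_kind kind)
{
    assert(s != NULL && buf != NULL);
    s->len = len;
    s->kind = kind;
    if (kind == BUF_COPY) {
        s->buf = malloc(len + 1);
        if (!s->buf) return -1;
        memcpy(s->buf, buf, len);
        s->buf[len] = '\0';
    } else {
        s->buf = buf;          /* kinds 2, 3 and 4 use the pointer as-is */
    }
    return 0;
}

/* Only memory the string owns (a copy, or a given-up buffer) is freed. */
static void toy_string_destroy(toy_string *s)
{
    if (s->kind == BUF_COPY || s->kind == BUF_GIVE_UP)
        free(s->buf);
    s->buf = NULL;
    s->len = 0;
}
```

With no default of 0, every call site has to pick one of the four kinds, which is the point of the proposal.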

The majority of calls from user-written C code will be of type 2: A
constant static buffer which won't (can't) ever change (like in your
example).  Some user-land code will be type 1 (for example, turning the
result of inet_ntoa into a string).

I'm not sure if type 3 has a use -- one can claim that the string is
type 1, without harm.

There wouldn't be too many times that type 4 would need to be used,
except as an optimization (change places where we make a STRING then
free the memory that the cstring was in, into making a STRING and giving
up ownership of the cstring-pointer to the STRING).  As I recall, there
are places in perl5 where buffers are stolen instead of copied... this
would be the same sort of thing.

Strings whose buffers are of type 2 would have a big advantage with
respect to COW: nothing needs to be done to make them COW (we copy the
pointer, and copy the flags), and nothing needs to be done to make them
non-COW.

In addition, I think we need *another* argument: One to tell what type
of STRING* object we want back.  In particular, we might want back a
constant one, for which appending or truncating is illegal.  In addition
to that, we might (or might not) be willing to accept a STRING which
isn't newly allocated, but was cached from an earlier call.

key = key_new_cstring(interpreter, _message);

IMHO, key_new_cstring should allow a flags argument, so that we can
pass in either a PObj_external_FLAG or 0; otherwise, it won't know
whether or not it needs to make a copy.

 to create some STRINGs or entries in hashes. The keys should be constant
 PMCs and they should be shared as well as the STRINGs.
 We need this in objects.c (\0\0anonymous), for setting standard
 property names internally and for the current hash based implementation
 of Exceptions.
 
 This leads to my proposal:
 * const_string_from_cstring - return cached constant string or create
   one

Instead of a new entry point, how about adding whatever behavior you
want here to be an added behavior of string_from_cstring, dependent on
what flags were passed as a third argument?

 * const_key_new_cstring - return cached PMC or create a constant PMC

If 

 * constant_pmc_new* - ditto (called from above)
 
 Further we would need for some classes a Const$Class variant, where the
 set-like vtables throw an exception. These classes should be
 autogenerated from *.pmc for all classes that have a const_too or such
 in their classes $flags.
 
 This leads to changes in parsing the vtable.tbl - which we need anyway
 to do the proposed var/value split of vtables.
 
 e.g.
 [FETCH]
 get_integer
 ...
 [STORE]
 set_integer
 ...
 [PUSH]
 push_integer
 ...
 and so on.
 The section names conform (where applicable) to methods described in
 Tie::*(3pm).





Re: [RfC] constant PMCs and classes

2003-08-29 Thread Benjamin Goldberg


Juergen Boemmels wrote:
 
 Leopold Toetsch [EMAIL PROTECTED] writes:
 
  Benjamin Goldberg [EMAIL PROTECTED] wrote:
 
   #define PARROT_DECLARE_STATIC_STRING(name, cstring) \
 
  [ big snip ]
 
  While Juergen's original or your proposal are fine, they don't work
  (or not as proposed). First there are compiler issues:
 
[snip]
  The declaration looks ok at first sight, my gcc 2.95.2 might be wrong, but
  anyway, a bigger problem is here:
 
   PIO_printf(interpreter, "%S\n", mystr);
 
  Program received signal SIGSEGV, Segmentation fault.
  0x400f8725 in make_COW_reference (interpreter=0x8049828, s=0x80486e0)
  at string.c:95
  95  PObj_constant_CLEAR(s);
  (gdb) bac
  #0  0x400f8725 in make_COW_reference (interpreter=0x8049828, s=0x80486e0)
  at string.c:95
  #1  0x400f91b6 in string_copy (interpreter=0x8049828, s=0x80486e0)
  at string.c:497
 
 Ah it seems that make_COW_reference wants to change a constant
 string. As my code puts the static strings in .rodata (this was one of
 the targets of my experiments) it is not possible in any way to change
 this item.

The problem is that as far as the COW logic knows, only the *buffer* is
constant.  In this (peculiar?) case, both the string buffer and the
string header are constant.

 I even think PObj_constant_CLEAR(s) is evil. One reason for
 setting a PObj_constant_FLAG is to declare that an object will not
 change. Unsetting this flag means breaking this contract.

It's clearing the constant flag, then making the copy, then turning the
constant flag back on.  If it were only the buffer that were in const
memory, and not the header, this would be perfectly fine.  I think.

  snippet from objdump:
 
   .rodata0011  _Parrot_static_mystr_cstring.15
   .rodata0024  _Parrot_static_mystr_STRING.16
 
  If we get a general compiler independent solution for declaring static
  STRINGs we still have nothing for static keys.
 
 The main problem of all this code is the union initialiser. Static
 keys could also be created with an initialiser, but I think this needs
 another union-val to be initialized. There are gcc extensions to init
 arbitrary members, but ANSI-C allows only the initialization of the
 first member.
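
To make the union point concrete, here is a small sketch. C89 only lets an initialiser set the first member of a union, while C99 standardises designated initialisers that can name any member, so the second form needs a C99 compiler. The union and its member names are invented for the example and are not Parrot's.

```c
#include <assert.h>

/* Toy union; the member names are made up and are not Parrot's. */
typedef union {
    long  int_val;   /* first member: the only one C89 can initialise */
    void *ptr_val;
} toy_val;

static int answer = 42;

/* C89 style: the initialiser always applies to the first member. */
static const toy_val as_int = { 7 };

/* C99 designated initialiser: any member can be named explicitly. */
static const toy_val as_ptr = { .ptr_val = &answer };

static long read_int(void) { return as_int.int_val; }
static int  read_ptr(void) { return *(int *)as_ptr.ptr_val; }
```

Both objects are const and statically initialised, so a compiler is free to place them in read-only data, which is what the static-string experiment is after.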



Re: What the heck is active data?

2003-08-29 Thread Benjamin Goldberg
Dan Sugalski wrote:
 
 I've talked about this before and generally I've assumed that people
 know what I'm talking about, but that's not true anymore, so an
 explanation of this is in order.
 
 Active Data is data that takes some active role in its
 use--reading, writing, modifying, deleting, or using in some activity.
 
 As a point of reference, data in the C sense, ints, floats, pointers,
 and so forth, are passive. It doesn't matter what's in them, your
 program has complete control and it doesn't matter what data is in
 those variables, things always behave the same. What's nice here is
 that the compiler can predict, solely from the code, what will happen
 when an operation occurs. It may lose track of the referents to data
 (if, for example, a pointer escapes the code visible to the compiler)
 but everything's nicely static, and the only thing actually *doing*
 anything is the code.
 
 Active data is data that in some way affects program behavior
 independent (or semi-independent) of the code being executed.
 
 The common case, one most folks are familiar with (if they're
 familiar with this stuff at all) is operator overloading. Addition
 may not really be addition, if one or the other of the data in the
 operation overloads the addition operator. (Whether both sides, or
 just the left, is checked, and whether the types of both sides
 matters depends on the language) The compiler can't necessarily tell
 whether addition is really addition any more--it might really be
 subtraction, or multiplication, or some bizarre way to encrypt the
 contents of your hard drive.

Be a bit verbose in saying why this is so.  Something like:

The compiler can always tell that + between variables known to be
numeric types will always be addition; further, it can always tell that
+ on variables known to be objects with an overloaded + will always
be a subroutine call.  What it *can't* always tell, is whether +
between two variables whose types it doesn't know will be addition or a
subroutine call.

A language like C++ never has to deal with the uncertainty of the latter
situation, since it always knows the types of the variables.

In a language like perl5, the types of the variables aren't ever known,
so it generally can't *know* whether + will be addition or an
overloaded operator.  (In the few cases where it *can* know that it's
numeric addition, it can usually constant fold.  After constant
folding, we're left back in the ambiguous case.)
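
A rough sketch of the ambiguous case: when the compiler does not know the operand types, the generated code has to look at the data at run time to find out what + means. The value struct and the function names are hypothetical and far simpler than a PMC vtable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy value type, not a Parrot PMC. */
typedef struct value value;
typedef value (*add_fn)(const value *lhs, const value *rhs);

struct value {
    long   num;        /* numeric payload */
    add_fn overload;   /* NULL means plain numeric addition */
};

/* An "overloaded +" that actually subtracts, to show that nothing can be
 * assumed about what + does for an unknown type. */
static value odd_add(const value *lhs, const value *rhs)
{
    value r = { lhs->num - rhs->num, odd_add };
    return r;
}

/* What the runtime must do when operand types are unknown: inspect the
 * data to decide what + means.  The left operand is checked first here;
 * which side wins is a language decision. */
static value generic_add(const value *lhs, const value *rhs)
{
    value r;
    if (lhs->overload) return lhs->overload(lhs, rhs);
    if (rhs->overload) return rhs->overload(lhs, rhs);
    r.num = lhs->num + rhs->num;
    r.overload = NULL;
    return r;
}
```

If the compiler knew both operands were plain numbers it could emit the addition directly and skip generic_add altogether.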

 In some languages the compiler can at least make some predictions of
 what operations will do, but we're not really interested in those
 languages.

Why not?  You mean we won't be implementing them?

 Or, rather, it doesn't matter if we are or not, since
 perl, python, and ruby are all untyped so there's not much to be
 done, and full type inferencing's a halting-problem sort of thing.
 (Though a useful subset of many programs can be analyzed enough to do
 some optimization)

If it's a halting-problem sort of thing, then how does smalltalk work?

 Perl, at least, and because of it Parrot, goes one step further. Not
 only are operations overloaded, but assignment is overloaded.

In C++, overloading = overloads assignment.

In perl, overloading = doesn't overload assignment (it's still just a
pointer copy).  Instead, it overloads how perl reacts to doing an
overloaded mutating operation (+=, -=, *=, /=, .=, etc.) on an object
whose refcount is greater than 1.  (Obviously, that refcount only *got*
above 1 due to an assignment.  But it's the later overloaded operation,
not the assignment itself, which triggers the subroutine which we passed
to 'use overload "=" => ...'.)  The overloaded = is used to make a clone
of the original object, and that clone is put into the variable... thus,
your += (or whatever) changes the clone, not any of the other variables
which formerly shared that pointer with it.
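
The perl5 behaviour described above can be sketched as copy-on-mutate driven by a refcount. This is only an illustration of the idea, with invented names; it is not how perl5's overload machinery is implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object; the field names are illustrative. */
typedef struct {
    int  refcount;
    long num;
} obj;

static obj *obj_new(long num)
{
    obj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->refcount = 1;
    o->num = num;
    return o;
}

/* Assignment is only a pointer copy plus a refcount bump. */
static obj *assign(obj *src)
{
    src->refcount++;
    return src;
}

/* Plays the role of the sub given to the "=" overload: clone the object. */
static obj *clone(const obj *src)
{
    return obj_new(src->num);
}

/* A mutating operator such as +=.  If the object is shared, clone it
 * first and put the clone in the variable, so the other variables that
 * shared the pointer are unaffected. */
static void add_assign(obj **var, long n)
{
    if ((*var)->refcount > 1) {
        obj *copy = clone(*var);
        (*var)->refcount--;
        *var = copy;
    }
    (*var)->num += n;
}
```

After "b = a; b += 5" in this model, a still holds the old value and b points at a fresh clone, and the clone was only made at the += and not at the assignment.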

I suspect that in perl6, overloading = will overload assignment.

 If you're coming from a pure OO sort of language where assignment is
 just the association of a name and pointer to an object, this may
 seem kind of strange, but it is really useful.
 
 The canonical example is associating a hash with a database.

*Blink*.  Now we're switching discussion from overloading to tying.

Or were we really talking about tying, and I was confused and thinking
of overloading?

 Reading or writing from the hash does a lookup in the backing database
 and reads or writes from it. A handy thing, with the variable getting in
 the way of reads or writes to itself. However, you can do this with
 plain scalar variables.
 
 For example, if you had code:
 
 int Foo;
 Dog Bar = new();
 Foo = Bar;
 
 you wouldn't want there to be a Dog in Foo--you've declared it an
 integer, so it can only hold an integer. The assignment can't do a
 plain pointer copy the way it would in the case where both sides can
 hold objects.

Indeed ... the compiler needs to detect that a conversion is necessary.


Re: What the heck is active data?

2003-08-29 Thread Benjamin Goldberg
Piers Cawley wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] writes:
[snip]
 Reading or writing from the hash does a lookup in the backing database
 and reads or writes from it. A handy thing, with the variable getting
 in the way of reads or writes to itself. However, you can do this with
 plain scalar variables.

 For example, if you had code:

 int Foo;
 Dog Bar = new();
 Foo = Bar;

 you wouldn't want there to be a Dog in Foo--you've declared it an
 integer, so it can only hold an integer. The assignment can't do a
 plain pointer copy the way it would in the case where both sides can
 hold objects.

 Indeed ... the compiler needs to detect that a conversion is necessary.

 Fortunately, since we've told the compiler the types of the variables,
 it can do that.
 
 How about this then:
 
sub wibble($arg is rw) {
   $arg = Dog.new;
}
 
my int $foo;
my Cat $bar;
 
wibble(rand  0.5 ?? $foo :: $bar);

 The compiler does not and cannot know the type of thing passed
 into wibble until runtime,

It knows that it's one(Cat, int).  (OT for this thread, but I think
perl6 ought to have a oneof alias for the one superposition).

 so assignment must be handled by the target thing. Yes, this is a
 pathological case, but it is a case that must be handled.

But the target pmc doesn't know what kind of variable it's in!

Consider: At what point should the following code die?

   sub wibble($arg is rw) {
  $arg = Dog.new;
   }
   sub wobble(Mammal $arg is rw) {
  my Mammal $m = $arg;
  wibble($m);
  wibble($arg);
   }
   my Mammal $foo;
   my Cat $bar;
   wobble( int(rand(2)) ?? $foo :: $bar );

Should wibble($m) always work without dying?  What about wibble($arg)?

Does wibble($m) cause $m to contain a new pmc (does the assignment
inside of wibble have set semantics to the containing variable), or does
it cause the existing pmc to be changed (does the assignment inside
wibble have assign semantics to the passed in pmc)?

  As an alternate example, try this:
 
   TempMeter kitchentemp = new('kitchenroom');
   TempMeter bedroomtemp = new('bedroom');
   bedroomtemp = kitchentemp;
 
  Where in this case kitchentemp *is* an object, but one of type
  TempMeter, and bound to an object that represents the temperature in
  your kitchen, while bedroomtemp is a TempMeter object that represents
  the temperature in your bedroom. (Code I'd not recommend using in CGI
  programs) The assignment does *not* bind the bedroomtemp name to
  the object for the kitchen temp, rather it sets the bedroom
  temperature to be the same as that in the kitchen. Even being an
  object, the object's methods still intercept read and write requests
  and instead of assignment being rebinding, it's instead a get or set
  call on the object.
 
  Again, we've told the compiler the types, so it can do that.
 
  The part that affects us is that we can't tell at compiletime whether
  assignment is rebinding or is a get/set, because we could have code
  like:
 
   kitchentemp = new('kitchenroom');
   bedroomtemp = new('bedroom');
   bedroomtemp = kitchentemp;
 
  which is typeless,
 
  Actually, we *could* try to infer the types of the variables, based
  on our knowledge of the types of the values assigned to them.  I'll
  ignore that, and pretend that we don't know the return type of new()
  at compile time.
 
 There's no pretending required. In the vast majority of cases you
 don't *know* the types returned by new. After all, something could
 have altered the behaviour of Class::new after the code in question
 is compiled but before it gets run.

It's a good thing we (will) have a Notification system, then, eh?

We can know the return type of new() at compile time, and if it
changes, be notified and respond accordingly.

If the new return type of new() is sufficiently compatible with the old
type, no changes need to be made to the sub which is calling new().  If
the new type is sufficiently different, then the calling sub needs to be
recompiled (not from scratch, but from the point just before the type
inferencing engine got at it).  This might be expensive, but how often
do the return types of functions change?  Also, recompilation doesn't
need to happen right away -- we could set a flag, and defer until
necessary.
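
A minimal sketch of "set a flag, and defer until necessary". The struct and functions are hypothetical, and string equality stands in for a real type-compatibility test; Parrot's notification system is not modelled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structures, for illustration only. */
typedef struct {
    const char *assumed_type;  /* return type of new() the code was built for */
    int         stale;         /* set when that assumption no longer holds */
    int         compiles;      /* how many times we (re)compiled */
} compiled_sub;

static void compile(compiled_sub *sub, const char *current_type)
{
    sub->assumed_type = current_type;
    sub->stale = 0;
    sub->compiles++;
}

/* Notification handler: the return type of new() has changed.  A real
 * system would test compatibility; string equality stands in for that. */
static void on_return_type_changed(compiled_sub *sub, const char *new_type)
{
    if (strcmp(sub->assumed_type, new_type) != 0)
        sub->stale = 1;        /* defer the work, just remember it */
}

/* Calling the sub recompiles only if a notification marked it stale. */
static void call_sub(compiled_sub *sub, const char *current_type)
{
    if (sub->stale)
        compile(sub, current_type);
    /* ... run the compiled code here ... */
}
```

A sub that is never called again after the change never pays for the recompile.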

(Actually, it would be pretty cool if by default, perl only compiled
subs into ASTs, and didn't produce parrot bytecode until the sub first
gets called.  This way, we can get the metainfo of a sub right away, as
soon as it's been seen, which is useful for code analysis and
inter-subroutine optimization, but we don't have to do the extra work if
it doesn't ever get called... which is good for big modules/libraries).

  and yet should still respect the fact that the
  objects bound to the name intercept assignment (basically overloading
  it) rather than having the compiler or runtime doing the rebinding
  for you.
 
  Since, of course, we're dealing with basically

Re: [RfC] constant PMCs and classes

2003-08-26 Thread Benjamin Goldberg
Juergen Boemmels wrote:
 Leopold Toetsch wrote:
 
  We currently have constant Key and Sub PMCs both created from the
  packfile at load time. They live in the constant_table pointing to a
  constant PMC pool. But we need more.
 
  We have allover the core code like this:
 
 string_from_cstring(interpreter, pIt, 0)
 key = key_new_cstring(interpreter, _message);
 
  to create some STRINGs or entries in hashes. The keys should be
  constant PMCs and they should be shared as well as the STRINGs.
 
  We need this in objects.c (\0\0anonymous), for setting standard
  property names internally and for the current hash based
  implementation of Exceptions.
 
 [...]
 
 Some time ago I did some experiments with initialising strings at
 compile time. With some preprocessor magic and a perl program scanning
 the c-file I got it running.
 
 The changes to an existing c-file are minimal: At the beginning add a
 #include "FILE.str" and replace all these string_from_cstring with
 _S("text"). The _S macros are replaced by static string_structures
 with static initialisers. These string structures are in the ro-data
 segment of the executable and should load really fast. No calls to
 any functions.

I can think of a slightly easier method, one which would not depend on
running a helper perl program.

#define PCONCAT(b,c) _Parrot_static_##b##c
#define PARROT_DECLARE_STATIC_STRING(name, cstring) \
   static const char PCONCAT(name,_cstring) [] = cstring; \
   static const struct parrot_string_t \
      PCONCAT(name,_STRING) = { \
         { /* pobj_t */ \
            {{ \
               (void*)PCONCAT(name,_cstring), \
               sizeof(PCONCAT(name,_cstring)) \
            }}, \
            PObj_constant_FLAG \
            GC_DEBUG_VERSION \
         }, \
         sizeof(PCONCAT(name,_cstring)), \
         (void*)PCONCAT(name,_cstring), \
         sizeof(PCONCAT(name,_cstring)) - 1, \
         NULL, \
         NULL, \
         0 \
      }, * const name = &PCONCAT(name,_STRING);
[untested]

Then, of course, one can simply declare the names and values of one's
constant strings in any place where variables may be declared, using:

   PARROT_DECLARE_STATIC_STRING(mystr, "some string here");

Now, there's a const STRING * const mystr variable, whose contents are
"some string here".  Actually, I'm not sure that there's a need for the
PCONCAT(name,_cstring) variable -- it might be possible to use cstring
directly when initializing the struct.
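
Since the macro above is untested and depends on Parrot's struct layout, here is a cut-down, self-contained version of the same trick that does compile on its own. The toy_string struct, the TOY_ names and get_mystr are stand-ins invented for this sketch; Parrot's real STRING header has more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy header; Parrot's real STRING struct has more fields than this. */
typedef struct {
    const char *start;   /* points at the character data */
    size_t      buflen;  /* bytes in the buffer, including the NUL */
    size_t      len;     /* characters in the string */
    unsigned    flags;
} toy_string;

#define TOY_CONSTANT_FLAG 1u

#define TOY_CONCAT(b, c) toy_static_##b##c
#define TOY_DECLARE_STATIC_STRING(name, cstring)                 \
    static const char TOY_CONCAT(name, _cstring)[] = cstring;    \
    static const toy_string TOY_CONCAT(name, _STRING) = {        \
        TOY_CONCAT(name, _cstring),                              \
        sizeof(TOY_CONCAT(name, _cstring)),                      \
        sizeof(TOY_CONCAT(name, _cstring)) - 1,                  \
        TOY_CONSTANT_FLAG                                        \
    }, * const name = &TOY_CONCAT(name, _STRING)

/* Both the characters and the header are initialised at compile time. */
TOY_DECLARE_STATIC_STRING(mystr, "some string here");

static const toy_string *get_mystr(void) { return mystr; }
```

Note that name is a pointer, so it has to be initialised with the address of the static header. No function is called to build the string; everything is laid out by the compiler.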



Re: String API

2003-08-25 Thread Benjamin Goldberg


Dan Sugalski wrote:
 
 At 12:07 AM -0400 8/19/03, Benjamin Goldberg wrote:
 There are a number of shortcomings in the API, which I'd like to address
 here, and propose improvements for.
 
 You're conflating language level strings with low-level strings. Don't.
 
 STRINGs, the parrot structure and what S registers point to, are
 single-encoding, single-character set entities. They're designed for
 as fast access as feasible while maintaining the bare minimum of
 language/set/encoding abstraction. Parts of the core *will* assume
 they are concrete and, once transformed into a fixed-width format,
 can be accessed directly while avoiding all the overhead of the
 encoding (and even character set) functions.
 
 PMCs are full-blown language level variables that can do whatever the
 heck they want when accessed. Everything you do can be mediated by C,
 Parrot, or perl/python/ruby/scheme/forth/BASIC/whatever code. Lazy
 strings, multi-encoding strings, tree structures with string reps,
 whatever.
 
 Most, nearly all, high-level language level functionality should use
 PMCs. If a STRING is insufficiently flexible for what you want,
 that's a sign you shouldn't use it.

Ok, I'm convinced.



Applying regexen/grammars to foo objects (was Re: String API)

2003-08-25 Thread Benjamin Goldberg
Gordon Henriksen wrote:
 
 Now, I don't really have much of an opinion on compound strings in
 general. I do want to address one particular argument, though—the lazily
 slurped file string.
 
 On Thursday, August 21, 2003, at 07:22 , Benjamin Goldberg wrote:
 
  A foolish question: can you imagine strings which are lazily read from
  a file?
 
  If so, could you imagine such a string, sitting in front of a really
  really big file, bigger than could fit into memory?
 
 Having a lazily slurped file string simply delays disaster, and opens
 the door for Very Big Mistakes. Such strings would have to be treated
 very delicately, or the program would behave very inefficiently or
 crash.

Although Dan's convinced me that STRING*s don't need to be anything
other than concrete, wholly-in-memory, non-active buffers of data, (for
various and sundry reasons), I'm not sure why a lazily slurped file
string would need to be treated delicately.

In particular, what would make the program crash?

(I'll assume that you're thinking the program might become inefficient
due to someone grabbing bytes out of arbitrary offsets, due to having to
seek() about here and there all the time.  While it's true that this
might be inefficient, it's not much worse than skipping around inside of
a utf8 string.)

 (And let's be frank, a lazily concatenated STRING* is just a
 tie()d string value—I thought that was leaving the core.) There's power
 in such strings, no doubt. There's also TERROR of passing the string to
 anything lest your program explode because some CPAN module's author
 wasn't also TERRIFIED of your input being something not-just-a-string.
 If I'm going to have the potential to load the entire file into memory
 if I'm the least bit careless,

Why would you have the potential to load the entire file into memory if
you're careless?

 I'd prefer to be up front about it.
 Anti-action-at-a-distance. I don't need to be deluded that my code is
 efficient because it reads lazily. (Fact is, it's probably faster if it
 buffers the file all at once, if it's going to buffer it at all.
 Certainly more memory-efficient (!). Fewer chunks. Less overhead. But
 probably faster still to mmap() it.)

Well, I'll certainly agree that mmaping is almost certainly faster than
any other way of bringing whole files into strings.

Hmm, does parrot's memory system use mmap when really big chunks are
requested?  IIRC, perl5's does.

 And what if your admittedly huge file is larger than 2**32 bytes? (A
 very real possibility! You said it was too big to fit in memory!) Are
 you going to suggest that all STRING* consumers on 32-bit platforms
 emulate 64-bit arithmetic whenever manipulating STRING* lengths?

Blech.  Yeah, that *would* be annoying.  OTOH, they're already emulating
64-bit arithmetic whenever they deal with file offsets.  Or perhaps I
should be saying, bad enough that they're already ... with file
offsets, we don't want to have to do it with string lengths, too.

 To efficiently process a Very Large String, you need to *stream* through
 it, not buffer it. Same applies to infinite strings (generators) or
 indeterminate strings (generators and sockets). Such strings don't have
 representable or knowable lengths. STRING*'s *really* *really* should
 reliably have lengths, I think.
 
 IMAGINE, if you will, something absolutely crazy:
 
 grammar HTTPServer {
 rule http {
 (request commit)*
 }
 rule request {
 get_request | post_request | ...
 }
 rule get_request {
 GET path version crlf
 header
[snip]

You should have a <commit> after that CRLF there :)

 If perl's using a stream rather than buffering to a STRING*, then
 $sock =~ /HTTPServer::http/ could actually work—and quite efficiently.
 [1]

 [1] Of course, this requires that the regex engine be coded to think in
 sequences.

Well, PerlString already supports the Iterator interface, and if the
regex engine were redesigned to *use* the Iterator interface, and if
streams were designed to support the Iterator interface, then passing a
stream instead of a string to the regex engine would be easy.
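
As a rough illustration of an engine that "thinks in sequences", the sketch below matches the pattern a+b using nothing but a next() call on an abstract character source. The char_source type and its functions are hypothetical and much simpler than Parrot's Iterator vtable; only an in-memory source is provided, but a socket-backed one would plug in the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical character source: anything that can hand out one
 * character at a time, or -1 at the end.  Not Parrot's Iterator vtable. */
typedef struct char_source {
    int  (*next)(struct char_source *self);
    const char *text;   /* used by the in-memory source below */
    size_t      pos;
} char_source;

static int string_next(char_source *self)
{
    unsigned char c = (unsigned char)self->text[self->pos];
    if (c == '\0') return -1;
    self->pos++;
    return c;
}

static char_source source_from_string(const char *text)
{
    char_source s = { string_next, text, 0 };
    return s;
}

/* Matches the pattern a+b at the start of the source, using only next().
 * It never indexes or seeks.  Returns the number of characters consumed
 * by a successful match, or 0 on failure. */
static size_t match_a_plus_b(char_source *src)
{
    size_t n = 0;
    int c = src->next(src);
    if (c != 'a') return 0;
    while (c == 'a') {
        n++;
        c = src->next(src);
    }
    return c == 'b' ? n + 1 : 0;
}
```

On failure the characters already read are gone from the source, so any engine that needs to backtrack over a pure stream has to buffer what it has consumed.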

The way that I would envision a stream's support of the iterator
interface would be as follows:

   PMC * get_integer_keyed(PMC *key) {
      return key_integer(INTERP, key);
   }

   PMC * nextkey_keyed(PMC *key, INTVAL what) {
      PMC * ret;
      switch( what ) {
         case ITERATE_FROM_START:
            key_set_integer(INTERP, key, DYNSELF->shift_integer());
            return key;
         case ITERATE_GET_NEXT:
            ret = key_next(INTERP, key);
            if( ret ) return ret;
            ret = key_new_integer(INTERP, DYNSELF->shift_integer());
            key_set_next( key, ret );
            return key;
         case ITERATE_GET_PREV:
         case ITERATE_FROM_END:
            internal_exception

Re: Applying regexen/grammars to foo objects (was Re: String API)

2003-08-25 Thread Benjamin Goldberg
Gordon Henriksen wrote:
 
 Benjamin Goldberg wrote:
 
  Gordon Henriksen wrote:
[snip]
   [3] Unshift hack #1: Where commit appears in the above, exit the
   grammar, trim the beginning of the string, and re-enter. (But that
   forces the grammar author to discard the regex state, whereas commit
   would offer no such restriction.) Unshift hack #2: Tell =~ that
   commit can trim the beginning of the string. (DWIM departs;
   /cgxism returns.)
 
  Trimming off the beginning of the string is the job of the cut
  operator, not the commit operator.
 
 Indeed, my bad--been a while since I read the apocalypse.
 
  Hmm... I wonder how cut would be done with an iterator.  Bleh.
 
 Equivalent to commit, I say

Not necessarily.  I mean, consider if someone does:

   my @ary := unpack "U*", $string;

Then, doing:

   @ary =~ /regex/

And:

   $string =~ /regex/

*Should* be equivalent, ne?

And a <cut> on the string should trim the front of the string, and a
<cut> on the array should splice off the front of the array.

Maybe we should provide some vtable entries for altering a pmc through
an iterator, so that <cut> could say:

   VTABLE_trim_front_keyed(INTERP, pmc, key);

 Then your grammar rule can work on an
 iterator, or on a string that's being used as a buffer.
 
 Here's a question: How does $iter =~ /a+b/ work on an iterator which
 returns aaack!? Requires a putback op.

Have you read how perl6's regex engine works?

 I'm not sure about cut vs. commit. They seem so orthogonal, and they
 pervasively tie a grammar to an implementation choice. It seems more
 like an m:option.

They are orthogonal:  A <commit> says, "throw away the backtracking stack
before this point, because if we *try* to backtrack before here, then
the entire match fails."  A <cut> says, "throw away the backtracking
stack, *AND* throw away the front of the string/aggregate we're matching
against."

However, I'm not so sure that they tie a grammar to a choice: if we
provide an interface to trim off the front of an aggregate using an
iterator, (and we can already trim off the front of a string, using
string_replace), then we have both <commit> and <cut> with both.

Of course, it's quite possible that some pmc classes won't *provide* a
method for trimming themselves, in which case <cut> will throw an exception.
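
The difference between the two operators can be sketched with a toy matcher state. Everything here (the struct, the fixed sizes, the function names) is invented for illustration and says nothing about how the Perl 6 regex engine is really laid out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher state, for illustration only. */
typedef struct {
    char   subject[64];   /* what we are matching against */
    size_t pos;           /* current match position */
    size_t bt_stack[16];  /* saved positions we could backtrack to */
    size_t bt_depth;
} match_state;

static void state_init(match_state *m, const char *subject)
{
    assert(strlen(subject) < sizeof m->subject);
    strcpy(m->subject, subject);
    m->pos = 0;
    m->bt_depth = 0;
}

static void save_point(match_state *m)
{
    assert(m->bt_depth < 16);
    m->bt_stack[m->bt_depth++] = m->pos;
}

/* commit: forget every backtracking point; the subject is untouched. */
static void do_commit(match_state *m)
{
    m->bt_depth = 0;
}

/* cut: forget the backtracking points AND drop the matched prefix. */
static void do_cut(match_state *m)
{
    size_t rest = strlen(m->subject + m->pos);
    memmove(m->subject, m->subject + m->pos, rest + 1);
    m->pos = 0;
    m->bt_depth = 0;
}
```

For an aggregate or a stream, do_cut is the step that would need a trimming entry point; do_commit needs nothing from the subject at all.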



Re: Embedding Parrot in Perl

2003-08-25 Thread Benjamin Goldberg


Luke Palmer wrote:
 
 I started working on some XS code for embedding a Parrot interpreter in
 Perl.  I ran into a few problems:
 
 1) I don't know XS :-)   (good way to learn, though)
 
 2) What do I put as stacktop in Parrot_init()?  I can't just use a
local variable in the calling function, because it will return and
there might be something lower than that later...

If our DoD is no longer walking the C stack, then stacktop should no
longer be needed.  Until the code is changed, just pass NULL.

 3) I'd like to have the PIR compiler available to this embedded
interpreter.  What would be the best way to do that?
 
 Thanks,
 Luke



Re: PerlHash.get_pmc_keyed of non existing key

2003-08-24 Thread Benjamin Goldberg


Gordon Henriksen wrote:
 
 Taking a thread from Perl 6 Internals. Will Perl 6 support this behavior?
 
 $ perl <<'EOT'
 my @ary;
 my $ref = \$ary[0];
 $$ref = "value";
 print '$ary[0] : ', $ary[0], "\n";
 EOT
 $ary[0] : value
 
 Presumably the Perl 6 would be:
 
 my @ary;
 my $ref = [EMAIL PROTECTED];
 $$ref = "value";
 print '@ary[0] : ', @ary[0], "\n";   # -> @ary[0] : value
 
 If that's supported, then how will this behave?
 
 my @cows of Cow;
 my $slot = [EMAIL PROTECTED];
 $$slot = new Dog;

Well, Perl6 can infer (at compile time) that $slot is a reference to a
Cow, and from that, it can infer that $$slot may only contain a Cow. 
Thus, it should be catchable at compile time.

OTOH, the following is not catchable at compile time:

   my @cows of Cow;
   my @dogs of Dog;
   my $ref = int(rand(2)) ? [EMAIL PROTECTED] : [EMAIL PROTECTED];
   $$ref = new Dog;

All that Perl6 can infer at compile time about $ref is that it is a
reference to any(Cow, Dog).  So, the assignment is legal at compile
time.

Thus, *some* sort of runtime check needs to be made.

I suppose that instead of $ref being a PerlRef (or however we manage
references), it would be a PerlTypeConstrainedRef, which throws an
exception if you try to store a value of the wrong type into it.

 Do we wind up with Clarus or an exception? (Never mind whether you take
 exception with Clarus.)

Clarus would be if we had an object whose type was all(Cow, Dog), eh?:)

 Hopefully an exception, from a user's POV.

An exception at compile time for the code you presented, an exception at
run time for the code I presented.

 And this?
 
 my @cows of Cow;
 my @cats of Cat;
 my $ref = [EMAIL PROTECTED];

Ok.

 @cats[0] := $$ref;

Ok, although this is mostly a no-op.

 @cats[1] := @cows[0];   # Just to hammer the point home

A compile time error.

 $$ref = new Dog;

A compile time error due to type inferencing.
(If type inferencing were insufficient, it would be a run-time error).

 But then there's a question for p6i as to how all the above happens.

I suppose that a PerlRef would be a PMC which has another PMC stored in
its ->cache.struct_value component; storing and fetching its contents
would be done via VTABLE_set_pmc and VTABLE_get_pmc.

I suppose that a PerlTypeConstrainedRef would, in addition, have a
PerlClass PMC in its PMC_data, and check for the appropriate isa
relationship before allowing new data to be stored into it.
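
A small sketch of the run-time check such a constrained reference would make. The toy_class, toy_object and toy_ref types are hypothetical and do not reflect Parrot's PMC layout; a return code stands in for throwing an exception.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy classes and refs; not Parrot's PMC layout. */
typedef struct toy_class {
    const char             *name;
    const struct toy_class *parent;   /* NULL at the root */
} toy_class;

typedef struct {
    const toy_class *cls;
} toy_object;

typedef struct {
    toy_object     **slot;        /* the container being referenced */
    const toy_class *constraint;  /* NULL means unconstrained */
} toy_ref;

/* Walk the parent chain to answer "obj isa cls". */
static int isa(const toy_object *obj, const toy_class *cls)
{
    const toy_class *c;
    for (c = obj->cls; c != NULL; c = c->parent)
        if (c == cls) return 1;
    return 0;
}

/* Store through the ref.  Returns 0 on success and -1 where a real
 * implementation would throw an exception. */
static int ref_store(toy_ref *ref, toy_object *value)
{
    if (ref->constraint && !isa(value, ref->constraint))
        return -1;
    *ref->slot = value;
    return 0;
}
```

Storing a Dog through a ref constrained to Cow is rejected and the slot is left alone, while the same store through a ref constrained to a common parent class succeeds.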



Re: [CVS ci] PackFile-15: print warning location

2003-08-24 Thread Benjamin Goldberg


Robert Spier wrote:
 
   The HLL doesn't know, how many ops one source line will need.
 
  Not *normally*, but if it's including code which is already literal
  assembler, it does: Imagine a version of lex/yacc wherein the
  blocks of code you give are imcc or pasm (instead of C).  Clearly,
  there's one op per line of source.
 
 Insert optimizer here.

Blech.

For that matter, won't the optimizer wreak havoc with all of the
various other uses of .setline?

Anyway, how about this semantic: .setline_i would associate an
increasing series of line numbers to the ops which follow it, as if each
one had been preceded by its own individual .setline.

Thus, whatever solution we come up with for the optimizer's interactions
with regular .setline automagically applies to .setline_i, too.
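
The proposed semantic is simple enough to sketch. The toy_op record and both functions are made up for the example; imcc's real data structures are not modelled, and .setline_i itself is only a proposal in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op record, for illustration only. */
typedef struct {
    const char *text;   /* the op as written */
    int         line;   /* source line reported in warnings */
} toy_op;

/* Plain .setline: every following op gets the same line. */
static void setline(toy_op *ops, size_t n, int line)
{
    size_t i;
    for (i = 0; i < n; i++)
        ops[i].line = line;
}

/* Proposed .setline_i: as if each following op had its own .setline,
 * with the line number increasing by one per op. */
static void setline_i(toy_op *ops, size_t n, int first_line)
{
    size_t i;
    for (i = 0; i < n; i++)
        ops[i].line = first_line + (int)i;
}
```

Because the result is just a line number stored per op, anything an optimizer does to ops carrying a plain .setline line applies unchanged to these.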



Re: Method call parameters

2003-08-24 Thread Benjamin Goldberg


Togos wrote:
 
 What's the reasoning behind putting the object a
 method is being called on in P2 instead of in the
 first parameter of the method? I have a feeling that
 putting it as the first parameter of the method would
 make the lives of the python folks a little bit
 easier. Would it make it harder for someone else?

While I've no idea about that, I don't see why you can't have it both
ways... put it in P2 *and* as the first param, if that's the way your
language wants it.

I think this would be useful for both Python and Perl5.



Re: Weak References?

2003-08-24 Thread Benjamin Goldberg


Gordon Henriksen wrote:
 
 On Saturday, August 23, 2003, at 08:17 , Benjamin Goldberg wrote:
 
  The reason this is safe to do is because it's *only* created by the
  dod,
 
 Allocating memory during garbage collection?

Why not?  Or at least, why not for sys_mem_allocate()d memory?



Re: Weak References?

2003-08-24 Thread Benjamin Goldberg


Dan Sugalski wrote:
 
 At 9:27 PM -0400 8/21/03, Benjamin Goldberg wrote:
 I would like for Parrot to have some way of creating Weak References; I
 think that this is probably a vital feature.
 
 No, it isn't, and we've discussed this before. (You were involved, as
 I recall) Weak references can be done entirely with notifications,
 there doesn't have to be any special core functionality for them
 outside a weak reference PMC type and general notification support.
 
 I'll wade through the rest of this thread to see if anything else has
 come up from it, but if not then dig through the archives for the
 last go-round and read what came of it.

Googling doesn't seem to find it:
   http://groups.google.com/groups?q=group:perl.perl6.internals+weak+references
Shows only this thread.  Searching for notifications:
   http://groups.google.com/groups?q=group:perl.perl6.internals+notifications
Shows this thread, and an older, apparently unrelated thread.



Re: Weak References?

2003-08-24 Thread Benjamin Goldberg

I dislike replying to myself, however, it seems I haven't been clear
enough in describing how I think this (c|sh)ould be implemented.

Let's abstract DoD to the following pseudocode:

   function markalive(p) {
      if(!p.is_alive)
         interpreter->dod_queue.enqueue(p);
   }
   for p in all pmcs
      p.is_alive = false
   for p in rootset
      interpreter->dod_queue.enqueue(p)
   while interpreter->dod_queue.size()
      p = interpreter->dod_queue.dequeue()
      if(p.is_alive) continue
      p.is_alive = true
      if(p.has_special_mark)
         p.mark()
   for p in all pmcs
      if !p.is_alive
         if p.has_destruct
            p.destruct();
         add p to list of free pmcs
   # end

With my proposal, we'd change it to:

   function markalive(p) {
      if(!p.is_alive)
         interpreter->dod_queue.enqueue(p);
   }
   function weakmark(asker, asked_of, cb, cb_info) {
      if(!asked_of.is_alive) {
         interpreter->weakref_cnt ++;
         interpreter->weakref_list.push(cb);
         interpreter->weakref_list.push(cb_info);
         interpreter->weakref_list.push(asked_of);
      }
   }
   interpreter->weakref_cnt = 0;
   interpreter->weakref_list = new fast_unsafe_stack;
   for p in all pmcs
      p.is_alive = false
   for p in rootset
      interpreter->dod_queue.enqueue(p)
   q = interpreter's list of objs to sweep
   while( interpreter->dod_queue.size() )
      p = interpreter->dod_queue.dequeue()
      p.is_alive = true;
      if(p.has_special_mark)
         p.mark()
         if( interpreter->weakref_cnt )
            interpreter->weakref_list.push(interpreter->weakref_cnt);
            interpreter->weakref_list.push(p);
            interpreter->weakref_cnt = 0;
   while( interpreter->weakref_list.size() ) {
      p = interpreter->weakref_list.pop();
      i = interpreter->weakref_list.pop();
      if( !p.is_alive ) {
         while( i-- ) {
            interpreter->weakref_list.pop();
            interpreter->weakref_list.pop();
            interpreter->weakref_list.pop();
         }
      } else {
         while( i-- ) {
            deadobj = interpreter->weakref_list.pop();
            if( deadobj.is_alive ) {
               interpreter->weakref_list.pop();
               interpreter->weakref_list.pop();
            } else {
               cb_extra = interpreter->weakref_list.pop();
               callback = interpreter->weakref_list.pop();
               callback->(p, deadobj, cb_extra);
            }
         }
      }
   }
   free up memory used by interpreter->weakref_list
   interpreter->weakref_list = NULL;
   for p in all pmcs
      if(!p.is_alive) {
         if( p.has_destruct ) p.destruct();
         add p to list of free pmcs
      }
   # end

Notice that the marking phase of dod now does a single extra integer
check in the common case of objects not having weak references; I
believe that this won't be a significant speed penalty.
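
The same idea as a compilable C sketch, under simplifying assumptions:
marking is recursive instead of queue-driven, each object holds at most
one weak reference, and the pending callbacks sit in a flat
fixed-size array rather than the packed stack used above. Obj,
Collector, Pending, mark, run_weak_callbacks and clear_weak are
hypothetical names for this illustration, not Parrot API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_PENDING 16

typedef struct Obj Obj;
typedef void (*died_cb)(Obj *asker, Obj *dead, void *info);

struct Obj {
    int is_alive;
    Obj *strong[MAX_CHILDREN];   /* strong references, followed by marking */
    size_t n_strong;
    Obj *weak;                   /* at most one weak reference, for brevity */
};

typedef struct { Obj *asker; Obj *asked_of; died_cb cb; void *info; } Pending;

typedef struct {
    Pending pending[MAX_PENDING];
    size_t n_pending;
} Collector;

/* Called while marking asker: remember the callback only if the
 * target has not (yet) been marked alive. */
static void weakmark(Collector *gc, Obj *asker, Obj *asked_of,
                     died_cb cb, void *info)
{
    if (asked_of->is_alive)
        return;
    assert(gc->n_pending < MAX_PENDING);
    gc->pending[gc->n_pending++] = (Pending){ asker, asked_of, cb, info };
}

/* Mark p and everything strongly reachable from it; weak targets are
 * registered with weakmark instead of being marked. */
static void mark(Collector *gc, Obj *p, died_cb cb)
{
    if (p == NULL || p->is_alive)
        return;
    p->is_alive = 1;
    for (size_t k = 0; k < p->n_strong; k++)
        mark(gc, p->strong[k], cb);
    if (p->weak != NULL)
        weakmark(gc, p, p->weak, cb, NULL);
}

/* After marking is complete, run the callbacks whose target really
 * ended up dead; targets that came alive later in the pass are skipped. */
static void run_weak_callbacks(Collector *gc)
{
    for (size_t k = 0; k < gc->n_pending; k++) {
        Pending *w = &gc->pending[k];
        if (w->asker->is_alive && !w->asked_of->is_alive)
            w->cb(w->asker, w->asked_of, w->info);
    }
    gc->n_pending = 0;   /* the list lives for one collection pass only */
}

/* Example callback: clear the dangling weak reference. */
static void clear_weak(Obj *asker, Obj *dead, void *info)
{
    (void)info;
    if (asker->weak == dead)
        asker->weak = NULL;
}
```

The point of the second check in run_weak_callbacks is the case raised
in this thread: an object that was not yet marked when weakmark was
called may still be reached strongly before marking ends, and then its
callback must not fire.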

Further, note that the weakref_list doesn't need to know (record)
anything about the type of the data being stored in it -- push can take
a void*, and pop can return a void*, and if we want an int, we can just
use a cast.

The reason this is safe to do is because the list is *only* created by
the dod, and *only* filled through the weakmark function; since it's
never exposed to *arbitrary* code, there's no chance of it being
accessed incorrectly.

If we wanted, it could be defined as follows:
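   #include <assert.h>
   #include <stddef.h>
   #include <stdlib.h>

   /* Blocks of 1024 slots: b[0] links to the previous block, b[1023] */
   /* to the next one, and slots 1..1022 hold the pushed values.      */
   typedef struct {
      ptrdiff_t * b;   /* current block */
      ptrdiff_t * i;   /* last pushed slot; i == b means block is empty */
   } weakreflist;
   weakreflist * new_weakreflist() {
      weakreflist * list = malloc(sizeof(weakreflist));
      list->b = malloc(sizeof(ptrdiff_t) * 1024);
      list->b[0] = 0;
      list->b[1023] = 0;
      list->i = list->b;
      return list;
   }
   void weakref_list_push(weakreflist * list, void* val) {
      if( ++list->i == list->b+1023 ) {
         /* current block is full: reuse the next block, or add one */
         ptrdiff_t * newb = (ptrdiff_t*)list->b[1023];
         if( !newb ) {
            newb = malloc(sizeof(ptrdiff_t)*1024);
            newb[0] = (ptrdiff_t)list->b;
            newb[1023] = 0;
            list->b[1023] = (ptrdiff_t)newb;
         }
         list->b = newb;
         list->i = newb + 1;
      }
      *(list->i) = (ptrdiff_t)val;
   }
   void* weakref_list_pop(weakreflist * list) {
      if( list->i == list->b ) {
         /* current block is empty: step back to the previous one */
         list->b = (ptrdiff_t*)list->b[0];
         list->i = list->b + 1022;
      }
      return (void*)*(list->i--);
   }
   void free_weakreflist(weakreflist * list) {
      ptrdiff_t * b = list->b, *n;
      /* the list must be empty, so we are back in the first block */
      assert(list->i == list->b);
      assert(list->b[0] == 0);
      do {
         n = (ptrdiff_t*)b[1023];
         free(b);
         b = n;
      } while(b);
      free(list);
   }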
[untested, might contain syntax and logical errors]

I used ptrdiff_t instead of void*, since dealing with pointers to void**
always confuses me and makes my head hurt, not to mention making my code
harder for me to read.



Re: [CVS ci] PackFile-15: print warning location

2003-08-24 Thread Benjamin Goldberg


Leopold Toetsch wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] wrote:
 
  In other words, setline and setfile ops in source don't translate to
  actual ops in the bytecode, but instead translate to additions/changes
  to the debugging segment?
 
 Exactly. (+ C<setpackage>, which isn't done yet)
 
  I like the ideas of a range of characters, and of variable amount of
  information.  So, how about multiple setline variants?
 
   setline \d+
   setline \d+ '[' \d+ (',' \d+)* ']'
 
 The brackets have the char (range) info, the first one is used to count
 dimensions.

Ok.

 setline_i Ix # the next line is x, each succeeding line increases.
 
 The HLL doesn't know, how many ops one source line will need.

Not *normally*, but if it's including code which is already literal
assembler, it does: Imagine a version of lex/yacc wherein the blocks
of code you give are imcc or pasm (instead of C).  Clearly, there's one
op per line of source.

  There'd be a corresponding get* function for each of these except for
  setline_i (for which it wouldn't make sense), which would get
  translated at compile time to set Ix, 12 or whatever.  There should
  be a C-code level interface to go (at runtime) from a pointer to
  bytecode (or from a bytecode offset) to a file, line, or range of
  lines, or ... with columns; this would be useful for debuggers.
 
 Yep.
 
 leo



Re: [CVS ci] PackFile-15: print warning location

2003-08-24 Thread Benjamin Goldberg


Juergen Boemmels wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] writes:
 
   Normal processors also don't have setline and setfile operations. They
   use an extra segment in the *.o file, which is only used by the
   debugger. This could also be done in parrot.
 
  In other words, setline and setfile ops in source don't translate to
  actual ops in the bytecode, but instead translate to additions/changes
  to the debugging segment?
 
 In principle yes. But as they are no opcodes any more they should not
 look like ones. They should be written .setline or #setline

Ok.

 #line 17 sourcefile.p6
 
  I don't like this syntax -- it sounds too easy for someone to write a
  comment like:
 
  #When this was in the original foobar language, it was on
  #line 17
 
  And have it interpreted as a directive, when the author meant for it
  to be just a comment.
 
 From the point of view of the bytecode this is just a comment. No
 different bytecode is generated with or without this line. Only the
 debug segment gets other information.

I wasn't suggesting that it might produce different bytecode, but it
could conceivably produce incorrect debugging info.

  There's no reason not to have the directives look like ops (setline,
  setfile).
 
 No they should not look like ops. They are no ops.
 
 They might look like macros .setfile, macros can evaluate to nothing.

Ok.

  Oh, and you could have get{line,file} directives, which end up
  translated as being simple set ops, using the info generated by the
  set{line,file} directives.
 
 Same here. Use .getline macros which are expanded to a set
 operation. Or better use a .currentline macro which expands to the
 current line. Much like the __LINE__ macro in C.

Hmm.  Definitely better.  Then it can be used as an argument to any op,
not just to set.

So, what values does .currentline have inside of another macro?

Do macros have their own line number context, or do they get the context
from the code they're being called from?

Is this a whole bucket of worms, or just a small can of worms?
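
For comparison, here is how C itself answers the question for
__LINE__: a macro has no line context of its own, and __LINE__ in a
macro body is replaced where the macro is used. Whether .currentline
should follow the same rule is an open choice; this sketch only shows
the C precedent. HERE, MACRO_DEFINED_AT and line_seen_through_macro are
names made up for the example.

```c
#include <assert.h>

/* Records the line on which the macro below is defined. */
enum { MACRO_DEFINED_AT = __LINE__ + 1 };
#define HERE() (__LINE__)

/* __LINE__ inside HERE() is substituted at the point of use, so this
 * returns the line of the return statement, not the #define's line. */
static int line_seen_through_macro(void)
{
    return HERE();
}
```

Under that rule a warning raised from inside a macro points at the code
that invoked the macro, which is usually what the person reading the
warning wants.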



Re: Weak References?

2003-08-24 Thread Benjamin Goldberg


Juergen Boemmels wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] writes:
 
  But suppose that at the end of DoD, the object we're weakly referring
  to gets marked as alive?  Now, we would need to look through that
  object's list of destroy-functions, and remove all of the
  weakref-callbacks.
 
 At the end of DoD nobody gets marked alive anymore. The calling of
 destroy-functions is done at sweep-time. There might be a problem when
 you destroy referent and referee at the same DoD run. Then the
 weakref-callback can be called on a dead but not destroyed object. It
 is a matter of destruction ordering to destroy the referee first,
 which destroys the weakref.

Erm, I had meant to say, suppose that *by* the time DoD has finished
sweeping, the object we're weakly referring to *has been* marked as
alive at some point (after we've done pobject_weakref), not what happens
if it gets marked as alive after the end of DoD.

  But if the weakref-callbacks are stored in a separate data structure,
  created during DoD, then there's no problem; this data structure will
  get re-created for each DoD pass, and thus always start empty.
 
 This is unnecessarily complicated. And it slows down the DoD run by
 creating a datastructure.

As opposed to attaching another destructor (which is, after all, a data
structure) to the object being weakly referred to?

Plus, since the data structure for the weakref callbacks gets created at
the start of the DoD function, and we're finished with it at the end of
the DoD function, then it doesn't need to be a PMC or any other
generic structure; its allocation and deallocation will be quite
explicit, and it can be a special-purpose data structure written for
compactness and speed instead of generality.

  Also, by keeping them separate, we can walk all the callback functions
  before destructing the dead objects.
 
 This is a problem of destruction ordering. You try to solve it by
 introducing a separate step, which works in this special case. But
 destruction ordering is a much harder problem.

Well, it wouldn't be *harmful* to call the callbacks after the dead
objects are destructed (or even when *some* but not all of them are
destructed) ... however, to now be safe, we'd have to make it illegal to
operate on the Pobj pointer as anything other than an opaque pointer,
with which we may only perform equality comparisons.  Caching the information
about what the Pobj *used* to be would become necessary, if we needed to
know that info.

  (But you're right -- it is too complicated to do a lookup table.  A
  simple linked list should do fine.)
 
   But one other thing, what happens if the object holding the weakref
   dies before the referenced object? Then the callback-function will be
   called for a dead object.
 
  Each callback-function belongs to a pmc.  The DoD should be able to
  know this, and act on it.  So if the pmc which registered the callback
  is dead, (or if the object weakly referred has since then come alive),
  then the callback isn't called.
 
   So pobject_weakref() needs to return a handle
   for the weakref and there needs to be a function
   weakref_destroy(weakref_handle *handle).
  
   Other issue is who owns the data_structure of the weakref? The
   referent, the referee, or will this be garbage-collected (which
   makes the weakref_handle a PObj* and weakref_destroy its custom
   destroy function.
 
  The garbage collector owns everything except for callback_info, which
  belongs to the pmc which registered the weakref-callback.
 
 This is one possibility. Therefore the destroy-function of the
 registering PMC must be extended with freeing the callback_info. As we
 already extend the destroy function of the PMC referenced by the
 weakref this needs no new mechanics.

? Where have we extended the destroy function of ... ?  Hmm, oh, in your
proposed modification of ... .  Ok.  I suppose that works, though it
sounds like it's getting to be rather more complicated than my idea.
And bits of the weakref stuff are now scattered all over the place, stuck
onto each pmc being dealt with, instead of collected all in one nicely
manageable location.

After DOD finishes, the lookup table is walked; for each entry
whose Pobj* hasn't been marked as alive, the callbacks are called.
   
The effect of this of course is that a WeakRef has no cost except
during Dead Object Detection.
  
   It only has a cost at object destroy-time. (If the weakrefs are
   garbagecollected they have an effect on DOD in the way that there
   are more objects to trace)
 
  *blink* More objects?  Oh, you're assuming that pobject_weakref is
  returning a Pobj* handle.
 
 Returning a PObj* handle is the other possibility. The registering PMC
 holds a hard reference to the callback_info, and the callback_info
 deregisters itself when it gets destroyed. The advantage of this
 approach is there is no need to malloc/free the memory for
 callback_info, it just uses the standard gc-allocator.

Either you're

Re: Should I target Parrot?

2003-08-22 Thread Benjamin Goldberg


Leopold Toetsch wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] wrote:
 
 inline op mthread_create(inconst INT) {
opcode_t *dest = PTR2OPCODE_T(CUR_OPCODE + $1));
PMC * p = new_ret_continuation_pmc(interpreter, dest)
 
 Please note that the ret_continuation is intended for returning only,
 not for executing code on. (it just has pointers to the original
 interpreter context that get swapped in on invoke() - no COW copies like
 the normal Continuation class)

Oops.  I was writing this mostly by looking at core.ops, and I saw this
and thought that it was just a shortcut for creating a Continuation (one
line instead of two).  Foolish me.



Re: Should I target Parrot?

2003-08-22 Thread Benjamin Goldberg


Michal Wallace wrote:
 
 On Thu, 21 Aug 2003, Benjamin Goldberg wrote:
 
  I hope you aren't planning on serializing just a single isolated
  microthread... that wouldn't work well with what I've got in mind due
  to how much stuff comes along when you serialize a continuation --
  you'd get almost the whole interpreter serialized.
 
  If you want, instead, to serialize interpreter-microthreads,
  however...
  well, you'd *still* get almost the whole interpreter serialized, but
  you're getting more bang for your buck :)
 
 Well how else are we going to implement squeak? :)

Squeak uses synchronous communication channels, right?  Writing blocks
until there's a reader, and reading blocks until there's a writer.

I just thought of a way of implementing such things for microthreads :)

Something like the following:

   enum {
      enum_state_none = 0,
      enum_state_ready = 1,
      enum_state_readers = 2,
      enum_state_writers = 4,
   };
   pmclass SynchronousChannel {
      void init() {
         SELF->cache.int_val = enum_state_none;
         PMC_data(SELF) = pmc_new(INTERP, enum_class_PerlArray);
         PObj_custom_mark_SET(SELF);
      }
      void mark() {
         pobject_lives(INTERP, PMC_data(SELF));
      }
      void push_pmc(PMC *the_val) {
         if( SELF->cache.int_val & enum_state_ready )
            internal_exception(???, "Illegal state\n");
         SELF->cache.int_val |= enum_state_ready;
         VTABLE_push(INTERP, PMC_data(SELF), the_val);
      }
      PMC * shift_pmc() {
         if( !(SELF->cache.int_val & enum_state_ready) )
            internal_exception(???, "Illegal_state");
         SELF->cache.int_val &= ~enum_state_ready;
         return VTABLE_pop(INTERP, PMC_data(SELF));
      }
      void* invoke(void* next) {
         PMC * data = (PMC*)PMC_data(SELF);

         PMC * cont;
         switch( SELF->cache.int_val ) {
            case enum_state_ready:
            case enum_state_ready|enum_state_writers:
            case enum_state_ready|enum_state_readers:
            case enum_state_none:
            case enum_state_readers:
               cont = pmc_new(INTERP, enum_class_Continuation);
               VTABLE_set_integer_native(INTERP, cont, (INTVAL)next);
               break;
         }

         switch( SELF->cache.int_val ) {
            /* ready means we're trying to write data */
            case enum_state_ready:
            case enum_state_ready|enum_state_writers:
               /* if there are no readers, suspend ourself */
               SELF->cache.int_val = enum_state_writers;
               VTABLE_push_pmc(INTERP, data, cont);
               break;
            case enum_state_ready|enum_state_readers:
               /* otherwise, stay awake, and wake up */
               /* (and switch to) a reader. */
               VTABLE_push_pmc(INTERP, interpreter->microthreads, cont);
               cont = VTABLE_shift(INTERP, data);
               if( VTABLE_get_integer(INTERP, data) == 1 )
                  /* Remove _readers flag */
                  SELF->cache.int_val = enum_state_ready;
               return VTABLE_invoke(INTERP, cont);
            /* not-ready means we're trying to read data */
            case enum_state_writers:
               /* if data has been written, we stay awake, */
               /* and we wake up the writer who wrote. */
               SELF->cache.int_val = enum_state_ready;
               VTABLE_push_pmc(INTERP, interpreter->microthreads,
                  VTABLE_shift(INTERP, data));
               if( VTABLE_get_integer(INTERP, data) > 1 )
                  SELF->cache.int_val |= enum_state_writers;
               return next;
            case enum_state_none:
               SELF->cache.int_val = enum_state_readers;
               /* fallthrough */
            case enum_state_readers:
               /* If no data has been written, we suspend ourself */
               VTABLE_push_pmc(INTERP, data, cont);
               break;
            case enum_state_readers|enum_state_writers:
            case enum_state_readers|enum_state_writers|enum_state_ready:
               internal_exception(???, "Illegal state");
               break;
            default:
               internal_exception(???, "Illegaller state");
               break;
         }
         if(!VTABLE_get_integer(INTERP, interpreter->microthreads))
            internal_exception(???, "Deadlock!");
         cont = VTABLE_shift(INTERP, interpreter->microthreads);
         return VTABLE_invoke(INTERP, cont, next);
      }
   }

Writing to a channel is done with:
   push $Pchannel, $Pdata
   invoke $Pchannel
Reading from a channel is done with:
   invoke $Pchannel
   shift $Pchannel, $Pdata

We can detect when the whole system is deadlocked, since
interpreter->microthreads becomes empty.

A partial deadlock, though, is undetectable, since there's no way to
know whether or not one of the 'live' threads might happen to write to
one of the channels that the deadlocked threads are reading from, or
might happen

Re: Weak References?

2003-08-22 Thread Benjamin Goldberg
Juergen Boemmels wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] writes:
 
  I would like for Parrot to have some way of creating Weak References;
  I think that this is probably a vital feature.
 
  The way I envision this is as follows.  The following typedef and new
  function would be added:
 
  typedef void (*pobject_died_cb)(INTERP, PMC* who_asked,
 Pobj* weakref, void *callback_info);
  void pobject_weakref(INTERP, pobject_died_cb callback,
 Pobj* weakref, void *callback_info);
 
 This might be sometimes useful. Keeping a container of active objects
 like filehandles or windows for example. In this case it can't mark
 the reference because this would make the object active forever.
 
  Inside of a PMC*'s mark method, it registers callbacks (using the
  above function) so that it will be informed about what to do if the
  object to which it weakly refers to is found to not be alive at the
  end of the DOD sweep.
 
 This does not need to go into the mark function. The weakref_callback
 code can be called inside the destroy function. I'm not sure if it
 should be called before or after the custom destroy-function is
 run. And is the order of the weakref_callbacks defined?
 
  The pobject_weakref function first checks if the 'weakref' argument
  has been marked as alive -- if so, nothing happens.  Then, it adds the
  Pobj* to a lookup table, pointing from the Pobj*, to a list of
  registered callbacks for that Pobj*.
 
 This is far too complicated. Each object has a list of
 destroy-functions, one of them is the custom-destroy function, the
 others are the weakref-callbacks.

But suppose that at the end of DoD, the object we're weakly referring to
gets marked as alive?  Now, we would need to look through that object's
list of destroy-functions, and remove all of the weakref-callbacks.

But if the weakref-callbacks are stored in a separate data structure,
created during DoD, then there's no problem; this data structure will
get re-created for each DoD pass, and thus always start empty.

Also, by keeping them separate, we can walk all the callback functions
before destructing the dead objects.

(But you're right -- it is too complicated to do a lookup table.  A
simple linked list should do fine.)

 But one other thing, what happens if the object holding the weakref
 dies before the referenced object? Then the callback-function will be
 called for a dead object.

Each callback-function belongs to a pmc.  The DoD should be able to
know this, and act on it.  So if the pmc which registered the callback
is dead, (or if the object weakly referred has since then come alive),
then the callback isn't called.

 So pobject_weakref() needs to return a handle
 for the weakref and there needs to be a function
 weakref_destroy(weakref_handle *handle).
 
 Other issue is who owns the data_structure of the weakref? The
 referent, the referee, or will this be garbage-collected (which makes
 the weakref_handle a PObj* and weakref_destroy its custom destroy
 function.

The garbage collector owns everything except for callback_info, which
belongs to the pmc which registered the weakref-callback.

  After DOD finishes, the lookup table is walked; for each entry whose
  Pobj* hasn't been marked as alive, the callbacks are called.
 
  The effect of this of course is that a WeakRef has no cost except
  during Dead Object Detection.
 
 It only has a cost at object destroy-time. (If the weakrefs are
 garbagecollected they have an effect on DOD in the way that there are
 more objects to trace)

*blink* More objects?  Oh, you're assuming that pobject_weakref is
returning a Pobj* handle.

Keeping the callback data in a separate list which only exists for the
duration of the dod prevents this.  Or rather, you do have to clean up
the linked list, of course, but there's no extra bookkeeping.

  The first, perhaps most important use, would be to implement a
  string-interning table.
 
  You'd have a global (per-interpreter) pmc which contains a hashtable
  of cstrings to perlstrings; if the cstring is present, the
  corresponding perlstring is returned; otherwise a new perlstring would
  be created and added to the table, then returned.  If any of the
  perlstrings go out of scope, then their hash entries should disappear
  from the table.  Obviously, if the table contained normal references
  to the strings, then they won't ever go out of scope.  And if the
  table simply doesn't mark them as alive, it wouldn't know when they're
  dead.  But with weakrefs, this is easy.
 
 this is only useful if a hashlookup is fast compared with
 string_make.

Well, it might be.  Hashing can be quite fast, ya know.

Here's a better idea, one you'll have more difficulty arguing with --
imagine a debugger, written in parrot.

We are going to have one, right?  Hmm, p6tkdb :)

It needs to keep references to objects it's interested in, but if
they're strong references, then we would have trouble debugging objects
with custom destroys (or worse, objects


Re: [CVS ci] PackFile-15: print warning location

2003-08-22 Thread Benjamin Goldberg


Juergen Boemmels wrote:
 
 Leopold Toetsch [EMAIL PROTECTED] writes:
 
  Further:
  The C<setline> and C<setfile> opcodes are suboptimal, they impose
  runtime penalty on each run core, so they will go finally. The
  C<getline> and C<getfile> can map to the functionality used in
  warnings.c.
 
 Normal processors also don't have setline and setfile operations. They
 use an extra segment in the *.o file, which is only used by the
 debugger. This could also be done in parrot.

In other words, setline and setfile ops in source don't translate to
actual ops in the bytecode, but instead translate to additions/changes
to the debugging segment?

 We just need to settle on a format for the line-info bytecode segment.
 The only question is reinvent the wheel, or use an already established
 format (stabs or DWARF).
 
  And finally: Parrot will (again[2]) track HLL source line info like:
 
 #line 17 sourcefile.p6
 
  So that warnings and errors can be located in HLL source too.

I don't like this syntax -- it sounds too easy for someone to write a
comment like:

#When this was in the original foobar language, it was on
#line 17

And have it interpreted as a directive, when the author meant for it to
be just a comment.

There's no reason not to have the directives look like ops (setline,
setfile).

Oh, and you could have get{line,file} directives, which end up
translated as being simple set ops, using the info generated by the
set{line,file} directives.

 It might be nice to have column information too. This would make
 debugging of Befunge programs a lot more easy. Also it would be nice
 to have blocks of code (many pasm-lines) with just one linenumber, and
 otherwise have a possibility to increase the source-line with each line
 of pasmcode.

I like the ideas of a range of characters, and of variable amount of
information.  So, how about multiple setline variants?

   setline Ix # all code from here to the next set{line,file} op is line x
   setline Ix, Iy # set line,col number from here to next set* op.

   setline_i Ix # the next line is x, each succeeding line increases.

   setlinerange Ix, Iy # the following represents lines x..y of hll code.
   setlinerange Ix, Iy, Iz, Iw # line x, col y .. line z, col w.

   setfile Sx # sets filename from here to next setfile op.

There'd be a corresponding get* function for each of these except for
setline_i (for which it wouldn't make sense), which would get translated
at compile time to set Ix, 12 or whatever.  There should be a C-code
level interface to go (at runtime) from a pointer to bytecode (or from a
bytecode offset) to a file, line, or range of lines, or ... with
columns; this would be useful for debuggers.
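
A rough sketch of what such a C-level lookup might look like, assuming
the debug segment boils down to a table sorted by bytecode offset. The
LineEntry layout and the lookup_line name are assumptions made for this
example; the thread has not settled on any format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical debug-segment entry: ops from `offset` up to the next
 * entry's offset belong to (file, line). */
typedef struct {
    size_t offset;
    const char *file;
    int line;
} LineEntry;

/* Binary search for the last entry whose offset is <= pc.
 * The table must be sorted by offset.  Returns NULL if the table is
 * empty or pc lies before the first entry. */
static const LineEntry *lookup_line(const LineEntry *tab, size_t n, size_t pc)
{
    size_t lo = 0, hi = n;           /* search window is [lo, hi) */
    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].offset <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &tab[lo - 1];
}
```

Since nothing here runs unless a debugger or a warning asks for a
location, the table costs nothing in the run cores, which is the
motivation for moving this info out of the op stream.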



Re: String API

2003-08-21 Thread Benjamin Goldberg
Leopold Toetsch wrote:
 
 Benjamin Goldberg wrote:
 
 
  Leopold Toetsch wrote:
  Not having an INTERP argument severely limits us, even in other ways.
 
 The INTERP argument is fine. The user defined encoding is/was my
 problem.

As in, you think we shouldn't have any, at all?

  Similarly, that would eliminate the chance of a STRING* which is
  actually a lazily concatenated list of other STRING*s; we'd only be
  able to do this as a PerlString derivative.
 
 I have problems imaginating such kind of STRINGs.

You lack sufficient imagination -- Larry's suggested that Perl6 strings
may consist of a list of chunks.  I can easily imagine each of those
chunks being full-fledged STRING* objects.

A foolish question: can you imagine strings which are lazily read from a
file?

If so, could you imagine such a string, sitting in front of a really
really big file, bigger than could fit into memory?

Not only can I imagine all of that, I can imagine the pain that would
be caused, if such file-strings are only implemented at the PMC level,
not at the STRING level, and someone unknowingly converts such a string
from a PMC into a STRING, and forces the entire file to be loaded from
disk into memory.

 They need an attached PMC doing the work + an attached list containing
 the string chunks.

Not necessarily.  If we could have str->strstart as a pointer to a
vector of STRING*s, we wouldn't need any PMC to contain the chunks.  And
the str->encoding api is (already) sufficient for doing the work.  The
only lack is a custom mark, to keep the sub-strings alive.
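
A toy C sketch of the chunk-vector idea, with an iterator made of two
integers (which chunk, and the position inside it). Chunk,
ChunkedString, ChunkIter and the two functions are hypothetical
stand-ins; real STRING* structures and the encoding vtable are not
modelled, and the sketch iterates bytes rather than characters.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const char *data; size_t len; } Chunk;  /* stand-in for a STRING* */
typedef struct { const Chunk *chunks; size_t n_chunks; } ChunkedString;
typedef struct { size_t chunk; size_t pos; } ChunkIter;  /* the two integers */

/* Total length is the sum of the chunk lengths; nothing is copied. */
static size_t chunked_length(const ChunkedString *s)
{
    size_t total = 0;
    for (size_t k = 0; k < s->n_chunks; k++)
        total += s->chunks[k].len;
    return total;
}

/* Return the next byte and advance the iterator, or -1 at the end.
 * Exhausted and empty chunks are skipped. */
static int chunked_next(const ChunkedString *s, ChunkIter *it)
{
    assert(s != NULL && it != NULL);
    while (it->chunk < s->n_chunks && it->pos >= s->chunks[it->chunk].len) {
        it->chunk++;
        it->pos = 0;
    }
    if (it->chunk >= s->n_chunks)
        return -1;
    return (unsigned char)s->chunks[it->chunk].data[it->pos++];
}
```

Sequential access stays cheap because the iterator never rescans
earlier chunks; only random access by character index would have to
walk the chunk lengths.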

 You need a PMC anyway. Why not have this in a PerlString derived class.

If we have it in a PerlString derived class, and do not make it part of
STRING*, then we cannot pass such strings to C functions defined to
accept strings in STRING* parameters, or to Parrot subroutines which are
defined to accept strings in S-registers, or which move the strings from
P-registers to S-registers.

We would lose the magic, similar to how moving from a PerlInt to an
INTVAL loses magic.

Well, except that when a PerlInt loses magic going to an INTVAL, the
resulting integer generally takes *less* memory than it did as a PMC,
whereas losing magic by changing from a PMC to a STRING could very
easily result in using *more* memory.  (And doing lots of work, which we
wouldn't need if our string kept its magic).

 So you don't have an overhead on average strings.

How much speed overhead is there?

  In particular, if we make a STRING* encoding which is a lazily
  concatted list of other strings (yes I keep going back to it, but
  Larry said we'll have them.  Theoretically he might have only meant as
  a high level type (a PerlString derivative) but *I* think it would be
  nice to be able to have this as a STRING* type) we'd want our
  iterator to be two integers: the first one being the integer into our
  array, the second being the iterator into the string we're currently
  iterating through.
 
 How many strings in JAPP[]1 might need that?

That depends.  Does concatenation in Perl6, by default, produce a lazy
concatenation, or an immediate actual concatenation?

Furthermore, consider if we allowed something like:

   my str $slurp = File.new($filename).slurp(); # = File.slurp($filename)?

Sure, we could have this read in the whole file, but wouldn't it be
nicer if it would *lazily* fill in $slurp?

 Do you really want to slow down all string access, just for one very
 special corner case?

I don't believe that it *would* slow down all string access.

 ... provide wrappers which take character index
 arguments, and converts them into string iterators relative to those
 particular strings.
 
 Ok. I see. That's fine - except for utf8 strings.
 
 
  Why wouldn't it work for utf8 strings?
 
 The wrapper is O(n) for utf8 strings. So converting once might be
 cheaper during the first character-index access.

For the current string code, we already take O(n) to get a void* pointer
into an appropriate part of a utf8 string, for each character-index.

If we factor the current code into functions taking iterators, an
index-to-iterator converter, and wrappers taking indices, then it
shouldn't be significantly slower than the current code (except for the
overhead of entering/leaving a function, which might be eliminated by
the C compiler inlining the wrapper and conversion function, if they're
small enough... or by us providing macro versions of the wrappers and
converter).

And if use of the wrappers is discouraged in favor of the iterator
versions, except in cases where random access to the string truly is
needed, then some speed improvements can be gotten.
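
To make the proposed factoring concrete, here is a minimal sketch in plain C. It is not Parrot's string API: every name in it is made up for illustration, the iterator is assumed to be a byte offset from the start of the buffer, and the input is assumed to be valid UTF-8. The core routines take iterators; the index-taking wrapper is a thin layer on top and is the only place that pays the O(n) conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is a sketch, not Parrot's API. */

/* Core routine: advance a byte-offset iterator by n characters.
 * UTF-8 continuation bytes have the bit pattern 10xxxxxx. */
size_t utf8_skip_forward(const unsigned char *buf, size_t len,
                         size_t iter, size_t n)
{
    while (n > 0 && iter < len) {
        iter++;                                   /* past the lead byte */
        while (iter < len && (buf[iter] & 0xC0) == 0x80)
            iter++;                               /* past continuations */
        n--;
    }
    return iter;
}

/* Core routine: decode the code point at an iterator position.
 * Assumes the buffer holds valid UTF-8. */
unsigned long utf8_decode_at(const unsigned char *buf, size_t iter)
{
    unsigned char b = buf[iter];
    unsigned long cp;
    int extra;

    if (b < 0x80)      { cp = b;        extra = 0; }
    else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
    else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
    else               { cp = b & 0x07; extra = 3; }

    while (extra-- > 0)
        cp = (cp << 6) | (buf[++iter] & 0x3F);
    return cp;
}

/* Converter: character index to iterator.  This is the O(n) step. */
size_t utf8_index_to_iter(const unsigned char *buf, size_t len, size_t index)
{
    return utf8_skip_forward(buf, len, 0, index);
}

/* Wrapper for callers that really do need random access by index. */
unsigned long utf8_char_at_index(const unsigned char *buf, size_t len,
                                 size_t index)
{
    size_t iter = utf8_index_to_iter(buf, len, index);
    assert(iter < len);
    return utf8_decode_at(buf, iter);
}
```

A loop that walks the string calls utf8_skip_forward once per character and never touches the converter, which is where the speed improvement would come from.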

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: Should I target Parrot?

2003-08-21 Thread Benjamin Goldberg


Tom Locke wrote:
 
 OK, here's what I'm hoping Parrot can provide for the language I'm building.
 
 My big requirement is for lightweight microthreads (hopefully *very*
 lightweight - I'm considering one scheduler that can handle *millions*
 of threads on a single machine).

Hmm, I can envision a microthread implementation wherein each
microthread is nothing more than a Continuation PMC.

Add an interpreter->microthreads PMC* pointer.  (This could be a global
stored in interpreter->perl_stash->stash_hash, but that wouldn't be as
fast).  Then, add the following ops:

   inline op init_microthreads() {
      /* Create the run queue on first use. */
      if( !interpreter->microthreads )
         interpreter->microthreads =
            pmc_new(interpreter, enum_class_PerlArray);
      goto NEXT();
   }

=item B<mthread_create>(in PMC)
Create a new microthread, which will start with the Continuation in $1.
=cut

   inline op mthread_create(in PMC) {
      VTABLE_push(interpreter, interpreter->microthreads, $1);
      goto NEXT();
   }

=item B<mthread_create>(inconst INT)
Create a new Continuation, with the address set to $1, and schedule it
as a new microthread.
=cut

   inline op mthread_create(inconst INT) {
      opcode_t *dest = PTR2OPCODE_T(CUR_OPCODE + $1);
      PMC * p = new_ret_continuation_pmc(interpreter, dest);
      VTABLE_push(interpreter, interpreter->microthreads, p);
      goto NEXT();
   }

=item B<mthread_yieldcc>()
Create a new Continuation (which resumes after this op), schedule
it to activate later, and then switch to the next microthread.
=cut

   inline op mthread_yieldcc() {
      opcode_t *dest = expr NEXT();
      PMC * p = new_ret_continuation_pmc(interpreter, dest);
      /* This op takes no arguments, so push the new continuation. */
      VTABLE_push(interpreter, interpreter->microthreads, p);
      p = VTABLE_shift(interpreter, interpreter->microthreads);
      dest = (opcode_t *)VTABLE_invoke(interpreter, p, dest);
      goto ADDRESS(dest);
   }

=item B<mthread_yield>(in PMC)
Schedule the Continuation in $1, then switch to the next microthread.
=cut

   inline op mthread_yield(in PMC) {
      opcode_t *dest = expr NEXT();
      PMC * p;
      VTABLE_push(interpreter, interpreter->microthreads, $1);
      p = VTABLE_shift(interpreter, interpreter->microthreads);
      dest = (opcode_t *)VTABLE_invoke(interpreter, p, dest);
      goto ADDRESS(dest);
   }

=item B<mthread_yield>(inconst INT)
Create a new Continuation, which resumes at address $1, then switch
to the next microthread.
=cut

   inline op mthread_yield(inconst INT) {
      opcode_t *dest = PTR2OPCODE_T(CUR_OPCODE + $1);
      PMC * p = new_ret_continuation_pmc(interpreter, dest);
      VTABLE_push(interpreter, interpreter->microthreads, p);
      p = VTABLE_shift(interpreter, interpreter->microthreads);
      dest = (opcode_t *)VTABLE_invoke(interpreter, p, dest);
      goto ADDRESS(dest);
   }

   inline op mthread_fork() {
      PMC * p = new_ret_continuation_pmc(interpreter, expr NEXT());
      VTABLE_push(interpreter, interpreter->microthreads, p);
      goto NEXT();
   }
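
Outside of Parrot, the scheduling idea behind these ops (a FIFO of resumable things, push on yield, shift to pick the next one) can be sketched in plain C. This is only an illustration under stated assumptions: a "continuation" is modelled as a resume function plus saved state, and Micro, RunQueue, rq_push, rq_shift, count_step and run_all are all hypothetical names, not Parrot functions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Stand-in for a Continuation: a resume function plus its saved state.
 * resume() returns nonzero to yield (be rescheduled), zero when done. */
typedef struct Micro {
    int (*resume)(struct Micro *self);
    int counter;
    int limit;
} Micro;

/* Fixed-size FIFO playing the role of the microthreads array. */
typedef struct {
    Micro *slots[MAX_THREADS];
    size_t head, count;
} RunQueue;

/* Schedule a microthread at the back of the queue; -1 if the queue is full. */
int rq_push(RunQueue *q, Micro *m)
{
    if (q->count == MAX_THREADS)
        return -1;
    q->slots[(q->head + q->count) % MAX_THREADS] = m;
    q->count++;
    return 0;
}

/* Take the next microthread from the front; NULL if none are waiting. */
Micro *rq_shift(RunQueue *q)
{
    Micro *m;
    if (q->count == 0)
        return NULL;
    m = q->slots[q->head];
    q->head = (q->head + 1) % MAX_THREADS;
    q->count--;
    return m;
}

/* Example body: count up to the limit, yielding after every step. */
int count_step(Micro *self)
{
    self->counter++;
    return self->counter < self->limit;
}

/* Round-robin until every microthread has finished.
 * Returns the number of times a microthread was resumed. */
size_t run_all(RunQueue *q)
{
    size_t switches = 0;
    Micro *m;
    while ((m = rq_shift(q)) != NULL) {
        switches++;
        if (m->resume(m))
            rq_push(q, m);   /* cannot fail: a slot was just freed */
    }
    return switches;
}
```

Each queue entry here costs one pointer plus the saved state, which is the sense in which such microthreads are lightweight; a real continuation carries more than two integers.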

 Oh and I will need them to be serializable.

I hope you aren't planning on serializing just a single isolated
microthread... that wouldn't work well with what I've got in mind due to
how much stuff comes along when you serialize a continuation -- you'd
get almost the whole interpreter serialized.

If you want, instead, to serialize interpreter->microthreads, however...
well, you'd *still* get almost the whole interpreter serialized, but
you're getting more bang for your buck :)

 I can live with co-operative scheduling, perhaps running
 'regular' threads over the top to provide a layer of preemptive
 scheduling.

Remember, each thread has its own interpreter.  This means that one
would have to share ->microthreads amongst multiple interpreters.
Besides the very minor nuisance of locking for ->push and ->shift,
there's the very big headache of figuring out what happens when you
invoke a continuation created in one interpreter in another interpreter.

An easier solution might be to create a timer event which, when handled,
does a yieldcc.

[PS: All code in this message is UNTESTED.  If you'd *like* to test, be
my guest! :)]

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Weak References?

2003-08-21 Thread Benjamin Goldberg

I would like for Parrot to have some way of creating Weak References; I
think that this is probably a vital feature.

The way I envision this is as follows.  The following typedef and new
function would be added:

typedef void (*pobject_died_cb)(INTERP, PMC* who_asked,
   Pobj* weakref, void *callback_info);
void pobject_weakref(INTERP, pobject_died_cb callback,
   Pobj* weakref, void *callback_info);

Inside a PMC's mark method, it registers callbacks (using the above
function) so that it will be informed if the object to which it weakly
refers is found not to be alive at the end of the DOD sweep.

The pobject_weakref function first checks whether the 'weakref' argument
has been marked as alive -- if so, nothing happens.  Otherwise, it adds
the Pobj* to a lookup table, mapping from the Pobj* to a list of
registered callbacks for that Pobj*.

After DOD finishes, the lookup table is walked; for each entry whose
Pobj* hasn't been marked as alive, the callbacks are called.

The effect of this of course is that a WeakRef has no cost except during
Dead Object Detection.
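
The registration and post-DOD walk can be sketched in standalone C as follows. This is a simplification under stated assumptions, not Parrot code: the lookup table is a flat fixed-size array rather than a map from Pobj* to callback lists, the 'live' field stands in for the DOD mark bit, and all names (Obj, WeakTable, weakref_register, weakref_finish, clear_slot) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WEAKREFS 64

typedef struct Obj { int live; } Obj;          /* 'live' = DOD mark bit */
typedef void (*died_cb)(Obj *weakref, void *info);

typedef struct { Obj *target; died_cb cb; void *info; } WeakEntry;
typedef struct { WeakEntry entries[MAX_WEAKREFS]; size_t count; } WeakTable;

/* Called from a holder's mark routine.
 * Returns 1 if registered, 0 if the target is already marked alive
 * (nothing to do), -1 if the table is full. */
int weakref_register(WeakTable *t, Obj *target, died_cb cb, void *info)
{
    if (target->live)
        return 0;
    if (t->count == MAX_WEAKREFS)
        return -1;
    t->entries[t->count].target = target;
    t->entries[t->count].cb = cb;
    t->entries[t->count].info = info;
    t->count++;
    return 1;
}

/* Called once marking has finished: fire the callback for every
 * registered target that never got marked, then reset the table.
 * Returns the number of callbacks fired. */
size_t weakref_finish(WeakTable *t)
{
    size_t i, fired = 0;
    for (i = 0; i < t->count; i++) {
        WeakEntry *e = &t->entries[i];
        if (!e->target->live) {
            e->cb(e->target, e->info);
            fired++;
        }
    }
    t->count = 0;
    return fired;
}

/* Example callback: 'info' points at the holder's slot; clear it so the
 * holder no longer refers to the dead object. */
void clear_slot(Obj *weakref, void *info)
{
    (void)weakref;
    *(Obj **)info = NULL;
}
```

Note that an object registered while unmarked may still be marked later in the same DOD run, which is why the liveness check is repeated in weakref_finish rather than trusted from registration time.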


The first, perhaps most important use, would be to implement a
string-interning table.

You'd have a global (per-interpreter) pmc which contains a hashtable of
cstrings to perlstrings; if the cstring is present, the corresponding
perlstring is returned; otherwise a new perlstring would be created and
added to the table, then returned.  If any of the perlstrings go out of
scope, then their hash entries should disappear from the table.
Obviously, if the table contained normal references to the strings, then
they won't ever go out of scope.  And if the table simply doesn't mark
them as alive, it wouldn't know when they're dead.  But with weakrefs,
this is easy.

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: [PATCH 2]Build system

2003-08-21 Thread Benjamin Goldberg
Juergen Boemmels wrote:
 
 Vladimir Lipskiy [EMAIL PROTECTED] writes:
 
 [...]
 
  sub runstep {
  my $inc = 'include/parrot';
 
  my @headers=(
  sort
  map  { m{\./$inc/(.*)} }
  glob "./$inc/*.h"
  );
 
 in a non clean tree the generated files are picked up also.
 
  $_ = "\$(INC)/$_" for @headers;
  my $nongen_headers = join("\\\n   ", @headers);
 
 So the name nongen_headers is wrong here
 
  Configure::Data->set(
  inc => $inc,
  nongen_headers => $nongen_headers,
  );
  }
 
 Non-generated headers should be found in MANIFEST so using maniread
 might help here
 
  use ExtUtils::Manifest qw(maniread);
 
  my $inc = 'include/parrot';
 
  my @headers=(
  sort
  map  { m{$inc/(.*)} }
  keys %{maniread()}
  );

ITYM:

  my @headers=(
  sort
  map  { m{^$inc/(.*\.h)\z} }
  keys %{maniread()}
  );

Otherwise, someone might at some future date, write:

   languages/mylang/include/parrot/oops.txt

And that would get picked up ;)

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: PerlHash.get_pmc_keyed of non existing key

2003-08-21 Thread Benjamin Goldberg


Leopold Toetsch wrote:
 
 IMHO is
 
$a = \$h{a};
print $$a;
$$a = "xxx\n";
$a = $h{a};
print $a;
 
 the same as:
 
new P1, .PerlHash
set P0, P1["a"]
print P0
set P0, "xxx\n"
set P2, P1["a"]
print P2
end
 
 (PMCs have reference semantics[1])
 Shouldn't that print "xxx" as perl5 does? I.e. store the returned
 PerlUndef in the hash.

If you'd done:

 assign P0, "xxx\n"

Instead of set, then yes.  However, "set Px, Py" merely stores Py into
the register Px, without touching the PMC that was in it.
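
The distinction is the same as rebinding a pointer versus writing through it.  A tiny C model of it, with a made-up Pmc struct and made-up function names (this only models the semantics described above, it is not how the ops are implemented):

```c
#include <assert.h>

typedef struct { const char *value; } Pmc;     /* stand-in for a PMC */

/* Models "set Px, Py": the register now refers to a different PMC.
 * The PMC it used to refer to is left untouched. */
void op_set(Pmc **reg, Pmc *src)
{
    *reg = src;
}

/* Models "assign Px, ...": the PMC that the register refers to
 * receives a new value, so every other holder of it sees the change. */
void op_assign(Pmc *dest, const char *value)
{
    assert(dest != 0);
    dest->value = value;
}
```

With op_set the object stored in the hash never changes; with op_assign it does, which is what the Perl 5 snippet expects.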

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: Timely Destruction: An efficient, complete solution

2003-08-20 Thread Benjamin Goldberg


Dave Mitchell wrote:
 
 On Sun, Aug 17, 2003 at 05:48:14AM -0600, Luke Palmer wrote:
  Here comes that ever-reincarnating thread again, sorry.
 
  This is a proposal for an efficient solution to the timely destruction
  problem, which doesn't use refcounting, and fits in to the current
  scheme pretty well.
 
 I don't quite follow all the below (probably mainly due to my
 unfamiliarity with parrot's DOD/GC stuff).
 
 Would it be possible to illuminate it by explaining how the following
 Perl5 code (presumably being executed by Ponie/Parrot) would ensure that
 the two destructors are called at the places marked:
 
 {
 my $ref;
 {
 my $obj1 = Foo->new(...);
 my $obj2 = Foo->new(...);
 $ref = $obj1;
 } # $obj2->DESTROY called here
 # ...
 } # $obj1->DESTROY called here
 # ...

At each of the two end-of-scope braces, a "sweep 0" instruction would be
emitted.  This performs Dead Object Detection, which at the first "}"
detects that $obj2 is dead, and at the second "}" detects that $obj1 is
unreachable.

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: String API

2003-08-20 Thread Benjamin Goldberg


Leopold Toetsch wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] wrote:
  Leopold Toetsch wrote:
 
  Benjamin Goldberg [EMAIL PROTECTED] wrote:
   There are a number of shortcomings in the API, which I'd like to
   address here, and propose improvements for.
 
   To allow user-defined encodings, and user-defined transcoding,
   (written in parrot) the first parameter of all of the function
   pointers in the ENCODING and TYPE structures should be INTERP.
 
  This belongs IMHO into PerlString (or better a class derived from
  that).
 
  Then how do we pass a user-defined string to a function which expects
  an argument in an Sreg?
 
 We have here IMHO the same relationship as with ints:
 int (INTVAL) - Int (PerlInt)
 str (STRING) - Str (PerlString)
 You can't tie native types, you can't attach properties on these and so
 on. And you can't pass some kind of active native strings in the SReg.
 The user-defined (written in parrot) implies, that these are
 PerlString PMCs.

Not having an INTERP argument severely limits us, even in other ways.

Even ignoring the problem of an encoding being an active string, what
about if it needs to allocate memory (perhaps some temporary buffer to
do some computation needed to decode bytes of a string)? 
sys_mem_allocate and sys_mem_free all over the place?  Blech.  I want to
be able to allocate a garbage collectible Buffer :(

This also eliminates the chance of having a STRING* which is attached to
a file -- we'd only be able to do such a thing as a PerlString
derivative.

Similarly, that would eliminate the chance of a STRING* which is
actually a lazily concatenated list of other STRING*s; we'd only be able
to do this as a PerlString derivative.

  Now, suppose that instead of a pointer, we had an integer describing
  the number of bytes from strstart to where we're looking... *now* most
  of the problems go away.
 
 So let's change the encoding->skip_{for,back}ward to take/return an
 INTVAL being the byte-position relative to strstart.

And they need to take str->strstart! :)

I said "most", not "all".  It solves the problems incurred with the
string buffer getting moved by gc (which is good), but it doesn't solve
everything.

In particular, if we make a STRING* encoding which is a lazily concatted
list of other strings (yes I keep going back to it, but Larry said we'll
have them.  Theoretically he might have only meant as a high level type
(a PerlString derivative) but *I* think it would be nice to be able
to have this as a STRING* type) we'd want our iterator to be two
integers: the first one being the integer into our array, the second
being the iterator into the string we're currently iterating through.
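
A sketch of such a two-integer iterator, in standalone C rather than against Parrot's STRING struct.  The assumptions are mine: chunks hold single-byte characters, the chunk list is a plain array, and every type and function name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const char *data; size_t len; } Chunk;
typedef struct { const Chunk *chunks; size_t nchunks; } ChunkedString;

/* The iterator: which chunk we are in, and where we are inside it. */
typedef struct { size_t chunk; size_t offset; } ChunkIter;

int chunked_at_end(const ChunkedString *s, ChunkIter it)
{
    return it.chunk >= s->nchunks;
}

/* Position on the first character, skipping any empty leading chunks. */
ChunkIter chunked_begin(const ChunkedString *s)
{
    ChunkIter it = { 0, 0 };
    while (it.chunk < s->nchunks && s->chunks[it.chunk].len == 0)
        it.chunk++;
    return it;
}

char chunked_get(const ChunkedString *s, ChunkIter it)
{
    assert(!chunked_at_end(s, it));
    return s->chunks[it.chunk].data[it.offset];
}

/* Step to the next character, moving to the next non-empty chunk
 * when the current one is exhausted. */
ChunkIter chunked_next(const ChunkedString *s, ChunkIter it)
{
    assert(!chunked_at_end(s, it));
    it.offset++;
    while (it.chunk < s->nchunks && it.offset >= s->chunks[it.chunk].len) {
        it.chunk++;
        it.offset = 0;
    }
    return it;
}
```

Because neither integer is a pointer, the iterator stays valid if any chunk's buffer is moved, and stepping is O(1) apart from skipping empty chunks.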

  11/ Any string_ function which takes a character index as a
   parameter, should be able to take a string iterator.
 
  Bloat IMHO. While this abstraction is flexible, it IMHO doesn't
  belong into the string subsystem but into a string class, that
  implements these functions.
 
  The bloat can be avoided if the primary string_ implementations *only*
  took string iterators.  Then, to satisfy those who want to use
  character indices, provide wrappers which take character index
  arguments, and converts them into string iterators relative to those
  particular strings.
 
 Ok. I see. That's fine - except for utf8 strings.

Why wouldn't it work for utf8 strings?

   INTVAL string_index_to_iterator(INTERP, STRING *s, INTVAL index) {
      INTVAL start = s->encoding->iterator_start(interpreter, s);
      return s->encoding->skip_forward(interpreter, s, start, index);
   }

Converting an index to an iterator for a utf8 string crawls through the
string and finds the byte offset.

 But these could be converted to utf32 as soon as they are seen.

For a long string, that could be quite a bit of bloat.

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: Timely Destruction: An efficient, complete solution

2003-08-20 Thread Benjamin Goldberg
Dave Mitchell wrote:
 
 On Wed, Aug 20, 2003 at 06:40:51PM -0400, Benjamin Goldberg wrote:
  Dave Mitchell wrote:
  
   On Sun, Aug 17, 2003 at 05:48:14AM -0600, Luke Palmer wrote:
Here comes that ever-reincarnating thread again, sorry.
   
This is a proposal for an efficient solution to the timely
destruction problem, which doesn't use refcounting, and fits in to
the current scheme pretty well.
  
   I don't quite follow all the below (probably mainly due to my
   unfamiliarity with parrot's DOD/GC stuff).
  
   Would it be possible to illuminate it by explaining how the
   following Perl5 code (presumably being executed by Ponie/Parrot)
   would ensure that the two destructors are called at the places
   marked:
  
   {
   my $ref;
   {
   my $obj1 = Foo->new(...);
   my $obj2 = Foo->new(...);
   $ref = $obj1;
   } # $obj2->DESTROY called here
   # ...
   } # $obj1->DESTROY called here
   # ...
 
  At each of the two end-of-scope braces, a "sweep 0" instruction would
  be emitted.  This performs Dead Object Detection, which at the first
  "}" detects that $obj2 is dead, and at the second "}" detects that
  $obj1 is unreachable.
 
 Yeah, I understood that bit. It was all the stuff about lists of
 high-priority and low-priority objects etc that I got lost in.

(For the sake of my fingers, please read any "ntd" as "needing timely
destruction", and "owntd" below as "object which needs timely
destruction".)

Well, *most* of the time, when we reach a }, *nothing* has gone out of
scope.  Or at least, no owntds have gone out of scope.

Thus, on *those* occasions (when nothing ntd has gone out of scope), we
want the "sweep 0" instruction to execute really really fast, not
slowing our interpreter down at all.

When we normally do dead object detection, when we discover an object
which is reachable, which we hadn't previously marked as reachable, we
would just add it to a simple queue -- a linked list.  I forget if this
list is a fifo or stack, but it doesn't really matter... the point is,
the order we scan our objects depends solely on the order in which we see
them.

Luke's proposal is that we change to *two* queues -- one queue
containing objects which are believed to be able to reach owntds (this
is the high priority queue), and a second queue for objects which we
don't think are able to reach any owntds (this is the low priority
queue).

*Assuming* that no owntds have gone out of scope, then by scanning the
high priority queue first, we *should* (we hope) be able to (quickly)
mark all of the owntds as being alive (just by scanning the high
priority queue), and then quickly abort the DOD.

 In particular how does it detect that $obj2 should be kept alive at the
 end of the inner block without doing a *full* DoD sweep?

ITYM, s/kept alive/destructed/.

Alas, in those situations when an owntd goes out of scope, a full DoD
sweep will be needed.

After sweeping the high priority queue, and discovering there are some
owntds not yet marked as alive, we continue with the low priority queue,
and if they're *still* not all marked as alive, we continue with garbage
collection and firing off of destructors.

But this situation *isn't* the majority case.

 Since I presume that's what the proposal was intended to avoid.

It's intended to speed up the majority case, when no owntds have gone
out of scope, and thus no destructors need be called.
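
For anyone who wants to see the shape of it, here is a rough standalone C sketch of the two-queue scan with the early abort.  It is my reading of the idea, not Luke's actual proposal and not Parrot's DOD: objects live in a small array, the "believed able to reach an owntd" hint is a plain flag set by hand, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJS 32
#define MAX_KIDS 4

typedef struct {
    int ntd;             /* object needs timely destruction */
    int high_priority;   /* believed able to reach an owntd */
    int marked;
    size_t nkids;
    size_t kids[MAX_KIDS];
} Obj;

/* Each object is queued at most once, so MAX_OBJS slots suffice. */
typedef struct { size_t items[MAX_OBJS]; size_t head, tail; } Queue;

static void   q_push(Queue *q, size_t v) { q->items[q->tail++] = v; }
static int    q_empty(const Queue *q)    { return q->head == q->tail; }
static size_t q_pop(Queue *q)            { return q->items[q->head++]; }

/* Mark an object on first sight and queue it by priority. */
static void visit(Obj *objs, size_t idx, Queue *high, Queue *low,
                  size_t *ntd_seen)
{
    Obj *o = &objs[idx];
    if (o->marked)
        return;
    o->marked = 1;
    if (o->ntd)
        (*ntd_seen)++;
    q_push(o->high_priority ? high : low, idx);
}

/* Returns 1 as soon as every owntd has been marked alive (early abort),
 * or 0 if the whole reachable graph was scanned and some owntd is still
 * unmarked, in which case destructors need to run.
 * *scanned receives the number of objects whose children were examined. */
int quick_sweep(Obj *objs, size_t nobjs,
                const size_t *roots, size_t nroots, size_t *scanned)
{
    Queue high = { {0}, 0, 0 }, low = { {0}, 0, 0 };
    size_t i, ntd_total = 0, ntd_seen = 0;

    assert(nobjs <= MAX_OBJS);
    *scanned = 0;
    for (i = 0; i < nobjs; i++) {
        objs[i].marked = 0;
        if (objs[i].ntd)
            ntd_total++;
    }
    for (i = 0; i < nroots; i++)
        visit(objs, roots[i], &high, &low, &ntd_seen);

    for (;;) {
        size_t cur;
        if (ntd_seen == ntd_total)
            return 1;                      /* the common, fast case */
        if (!q_empty(&high))      cur = q_pop(&high);
        else if (!q_empty(&low))  cur = q_pop(&low);
        else                      return 0;
        (*scanned)++;
        for (i = 0; i < objs[cur].nkids; i++)
            visit(objs, objs[cur].kids[i], &high, &low, &ntd_seen);
    }
}
```

In the fast case only the objects on the path to the owntds get scanned; in the slow case the cost is that of an ordinary full mark phase.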

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: String API

2003-08-19 Thread Benjamin Goldberg
Luke Palmer wrote:
 
 Benjamin Goldberg writes:
[snip]
 9/ New ops which provide access to the string iterator API.
 
 Yes.  What is going to be used to store an iterator.  An I reg, a P reg?
 If it's a PMC, would it be possible to just implement the iterator
 itself as a PMC, and use the standard iterator vtable methods (which
 are?) for motion and dereferencing?  Again, that involves a vtable
 overhead and doesn't lend itself to JIT very well (which is very, very
 important).

I envision that an iterator for a STRING* would be an INTVAL, but an
iterator for a PerlString pmc would be an Iterator pmc.

There are a number of reasons why INTVALs are best -- the most important
ones being that they're *small*, and that we need do nothing at all in C
code to prevent them from being garbage collected.

Also, if STRING*s use INTVALs as their iterators, then PerlString can
work with the Iterator.pmc class simply by wrapping these INTVALs inside
Key.pmc objects, using key_new_integer.

There are, alas, a few drawbacks to this.
   1/ The algorithm needs to be designed so that these INTVALs are
agnostic to the string's location in memory -- if garbage collection
moves the string's buffer, the iterator needs to remain valid.  Thus,
you can't use a pointer into the buffer which has simply been cast to an
integer.  (You *can*, however, subtract the start of the buffer from a
pointer into the buffer, and then cast this ptrdiff_t into an INTVAL).
   2/ If you need data more complex than an integer -- e.g., if your
string's data is actually in some sort of tree, and you want to say,
"the node at the end of the fourth branch from the twentieth branch from
the fifth branch from the root", then you're in trouble.

Hmm, considering how much of a problem 2/ is, maybe we should use
Key.pmc objects as our iterators?  This would make PerlString's
interface with Iterator even simpler.

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: String API

2003-08-19 Thread Benjamin Goldberg
Leopold Toetsch wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] wrote:
  There are a number of shortcomings in the API, which I'd like to
  address here, and propose improvements for.
 
  To allow user-defined encodings, and user-defined transcoding,
  (written in parrot) the first parameter of all of the function
  pointers in the ENCODING and TYPE structures should be INTERP.
 
 This belongs IMHO into PerlString (or better a class derived from that).

Then how do we pass a user-defined string to a function which expects an
argument in an Sreg?  In particular, consider if the string is Really
Really big, and actually resides on disk instead of in memory.  If we
had to convert from a PerlString derivative to a STRING*, then we'd have
to load that whole file into memory.  Ugh.

  I *really* *really* want string iterators.  The current API for
  iterating through the characters of a string is, IMHO, vastly
  insufficient.
 
 encoding->skip_forward(.., by_n) doesn't look that insufficient. A
 skip_one() function wouldn't harm though.

That wasn't precisely what I was speaking of.

 1/ Iterators won't become invalid if the string gets moved in
  memory.
 
  Currently, all we've got is a void* pointer which points into the
  buffer of the string; during GC, strings can get reallocated, making
  the pointer invalid.
 
 You are not allowed to cache the pointer.

That's my point.  I want an iterator value which I *can* cache.

 string->strstart + idx is always your actual character in the string.

The pointer can get invalidated even without storing it for any length
of time.

Consider:

   PMC * array = pmc_new( interpreter, enum_class_Sarray );
   void * iter = str->strstart;
   INTVAL i = 0, len = string_length( str );
   VTABLE_set_integer( array, string_length( str ) );
   for( i = 0; i < len; ++i ) {
      INTVAL c = str->encoding->decode( iter );
      VTABLE_set_integer_keyed_int( array, i, c );
      iter = str->encoding->skip_forward( iter, 1 );
   }

What happens if VTABLE_set_integer on that sarray causes a gc to get
run, and if that gc causes the string's buffer to get moved?  Oops.

Is the code construct I have forbidden?

 To satisfy 1/ we would have to mark the string as immobile (which we
 have a flag for) *but* you can't grow such strings, the copying
 collector can't cleanup the block, where the string is in (and worse,
 the collector currently just frees the block).

Indeed.  That's why using a pointer into the string is so very
insufficient.

Now, suppose that instead of a pointer, we had an integer describing the
number of bytes from strstart to where we're looking... *now* most of
the problems go away.  It would no longer matter if the string got
moved, would it?  The drawback of course is that we'd need to add it to
the str->strstart pointer every time we want to use it... but
that's not especially painful.
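
A small standalone C demonstration of why the offset survives a move where the pointer would not.  MovableString and both functions are hypothetical stand-ins; the "collector" is simulated by copying the buffer to a fresh allocation and freeing the old one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a string whose buffer the collector is free to move. */
typedef struct { char *strstart; size_t buflen; } MovableString;

/* Simulate a copying collector: copy the buffer elsewhere and release
 * the old block.  Returns 0 on success, -1 if allocation fails. */
int simulate_gc_move(MovableString *s)
{
    char *fresh = malloc(s->buflen);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, s->strstart, s->buflen);
    free(s->strstart);
    s->strstart = fresh;
    return 0;
}

/* The iterator is a byte offset; it is added to strstart on every use,
 * so it never holds a stale address. */
char byte_at(const MovableString *s, size_t iter)
{
    assert(iter < s->buflen);
    return s->strstart[iter];
}
```

A cached `char *` taken before simulate_gc_move would point into freed memory afterwards; the offset is unaffected.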

 10/ Add methods to PerlString to make it compatible with Iterator.
 
 Yep. That was in my iterator proposal.
 
 11/ Any string_ function which takes a character index as a
  parameter, should be able to take a string iterator.
 
 Bloat IMHO. While this abstraction is flexible, it IMHO doesn't belong
 into the string subsystem but into a string class, that implements these
 functions.

The bloat can be avoided if the primary string_ implementations *only*
took string iterators.  Then, to satisfy those who want to use character
indices, provide wrappers which take character index arguments, and
converts them into string iterators relative to those particular
strings.

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: String API

2003-08-19 Thread Benjamin Goldberg


Tim Bunce wrote:
 
 On Tue, Aug 19, 2003 at 12:07:22AM -0400, Benjamin Goldberg wrote:
  There are a number of shortcomings in the API, which I'd like to
  address here, and propose improvements for.
 
 Just to be sure people are keeping it in mind, I'll repost this from
 Larry:
 
 On Wed, Jan 30, 2002 at 10:47:36AM -0800, Larry Wall wrote:
 
  For various reasons, some of which relate to the sequence-of-integer
  abstraction, and some of which relate to infinite strings and
  arrays, I think Perl 6 strings are likely to be represented by a list
  of chunks, where each chunk is a sequence of integers of the same size
  or representation, but different chunks can have different integer
  sizes or representations.  The abstract string interface must hide
  this from any module that wishes to work at the abstract string level.
  In particular, it must hide this from the regex engine, which works on
  pure sequences in the abstract.
 
 Tim.

*I* was certainly keeping it in mind ;).

Just for the curious, the *reasoning* behind my proposed requirements
are as follows:

   1/ The regex engine uses string_index all over the place.  This is an
O(n) operation for the utf8 encoding.  This is bad.

   If we had real string iterators, then this would be an O(1)
operation.

   My requirements 4..7 are a description of the time complexity which
most normal people can expect of iterators.

   2/ There's no way for a string to refer to other strings or pmcs,
making all sorts of things (including what Larry mentioned) impossible.

   Or at least, if we tried, there's no way to prevent the things we're
pointing to from being cleaned up out from underneath us, since we've no
way of marking them as alive.

   This is my requirement 8, and, to a lesser degree, requirement 3.

   /***/

   Most of everything else assumes that the solution to failing 1/ of
the current API will actually be a string iterator.

   3/ String iterator usage should be *simple*.

   Making them pmcs would mean that we'd need to temporarily anchor
them.  Having them as void* pointers to gc-relocatable memory means that
they can become invalid at unexpected times.  Obviously if we
temporarily disable DOD/GC, then these can be avoided, but that has
other drawbacks.

   This is my requirements 1 and 2.  I fear that we're going to lose the
requirement that iterators be simple objects, since letting them be pmcs
gives us so much more flexibility.  (Which, I fear, we need, if the
encoding is a not-so-simple data structure (like a tree, or a lazily
concatenated sequence of substrings)).

   It's also my requirements 9..12: if we're going to have them, use
them.

   /***/

   4/ If we've got fast string iterators, then an Iterator.pmc object
for a PerlString object won't be significantly slower than using
iterators for strings directly.  So... why not make the rx engine work
on Iterator.pmc objects?

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: What the heck is: timely destruction

2003-08-18 Thread Benjamin Goldberg
Michael G Schwern wrote:
[snip stuff, including a mention of refcounting and its
(dis)advantages]
 So Parrot is going with something else.  Don't ask me what it is, I
 don't know.

Parrot will do it like Java -- a mark-and-sweep garbage collector --
with the difference that garbage collection will generally happen more
often.

In Java, it happens when you run out of memory, or when someone does
java.lang.Runtime.getRuntime().gc().  There's no timely destruction,
because you'd need to be psychic to know ahead of time when you're
going to run out of memory.  And, since gc() isn't especially fast,
calling it often would slow your program down unnecessarily.

In Parrot, it happens when you run out of memory, or any time that a
"sweep" opcode is called.  And "sweep" might be called at the end of
every scope, and maybe whenever you undef() a variable, and sometimes
even more often.  Thus, chances to clean up objects in Parrot are almost
as frequent as they are in perl5.

 With this other garbage collecting technique it's more involved than ref
 counting

Whether it's more involved or less involved depends on from whose point
of view you're looking.

From the POV of the person implementing the garbage collector, it's more
involved.

From the POV of the program using the system, it's less involved.

 to guarantee timely destruction as we desire in Perl. Apparently
 someone's figured a way to do it and do it efficiently.

Well... mostly.

As a memory manager, it's certainly more efficient.  That is, if you
only call it when you need to free up memory, then on average, it uses
less time than refcounting does.

However, regarding the timely destruction part... this could use some
improvement.

Every time we come to a scope end, we do a garbage collection run.  This
is often inefficient.  We can sometimes statically determine that no
objects needing timely destruction could possibly have gone out of
scope, and leave out the sweep code.

But this still isn't enough.

What we'd like is a way (and there've been a couple proposed) to make it
so that the sweep at the end of scope can *quickly* determine that all
objects needing timely destruction are still alive/reachable/in-scope,
and abort early (and thus use less time).

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


String API

2003-08-18 Thread Benjamin Goldberg
There are a number of shortcomings in the API, which I'd like to address
here, and propose improvements for.

Not so much the string_* functions, but rather with how they work (the
encoding API, the transcoding functions).

To allow user-defined encodings, and user-defined transcoding, (written
in parrot) the first parameter of all of the function pointers in the
ENCODING and TYPE structures should be INTERP.

This, of course means that the first parameter to all of the string_*
functions needs to be INTERP, too.  Preferably, all of them, not just
the ones which actually use ->encoding or ->type... so that we'll have
the freedom to change them so they *do* use ->encoding and ->type in the
future.  (And for consistency).  Note that currently, *not* all of the
string_ functions have INTERP as their first param.

For other encodings than the built-in ones, the string data should not
*need* to be inside of the string buffer itself (though of course it
still *can* be); we should be able to build it lazily, or read it from a
disk, or by calling methods on a PMC, or whatever.  This will likely
mean having one or more pObjs (PMCs, generally) stored in the buffer. 
This, in turn, means we need a 'mark' entry in the encoding vtable, to
prevent them from being cleaned up from under us.  We should also have
the option of allocating non-gced memory from the system and freeing it
when the string is freed; this means we need a 'destroy' entry in the
encoding vtable.

(For the builtin string types, 'mark' and 'destroy' will of course be
NULL.  For custom strings, of course, they might not be.)

(For simplicity, we might allow the buffer to *only* consist of either
raw data, *or* of pointers to pObjs; then we only need a single flag to
tell the gc which we have, and don't need a 'mark' or 'destroy')


I *really* *really* want string iterators.  The current API for
iterating through the characters of a string is, IMHO, vastly
insufficient.

The following are what I want for string iterators:

   1/ Iterators won't become invalid if the string gets moved in memory.

Currently, all we've got is a void* pointer which points into the buffer
of the string; during GC, strings can get reallocated, making the
pointer invalid.  For that matter, if you grow a string, while iterating
forward through it, there's a good chance that your iterator will become
invalid.

   2/ Iterators should be integers, structs, or pointers into immobile
memory (memory which won't be moved during GC).  They should not need to
be anchored to avoid being GCed, nor freed to avoid memory leakage.

If, for the builtin types, we changed that 'void*' pointer to an
integer, indicating a number of bytes from strstart, then conditions 1
and 2 would be satisfied for them.

   3/ The encoding functions (iterator creation, advancement, etc.)
should be able to call into parrot code; thus, they need an INTERP as
the first parameter.

   4/ It should take O(1) time to get an iterator to the start or end of
a string.

   5/ It should take O(n) time to advance an iterator n characters
(either forwards or backwards).  It would be nice if it took O(1) time,
but it's not necessary.

   6/ It should take O(1) time to decode whatever characters are at the
iterator.

   7/ If two iterators are N characters apart, it should take O(N) time
to measure that distance.

   8/ The encoding/iterator API should be sufficiently complete to allow
someone to write a character-rope string type, and have it work
seamlessly with other strings.

   9/ New ops which provide access to the string iterator API.

   10/ Add methods to PerlString to make it compatible with Iterator.

   11/ Any string_ function which takes a character index as a
parameter, should be able to take a string iterator.

   12/ The rx engine should use the new ops.

   12a/ We should be able to use the rx engine to match a stream of
values from an Iterator PMC.  Whether this Iterator is crawling over a
PerlString, or PerlArray, or something else, shouldn't matter to the rx
engine.
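
As an illustration of conditions 1 and 2 above, here is a small Python
sketch of an "iterator" that is nothing more than an integer byte offset
into a UTF-8 buffer.  The helper names are invented for this example and
are not part of any Parrot API; it only shows that such an offset stays
valid when the buffer itself is copied elsewhere, and that advancing by one
character takes constant time.

```python
def utf8_advance(buf, offset):
    """Return the offset of the character after the one at `offset`."""
    first = buf[offset]
    if first < 0x80:
        width = 1
    elif first >> 5 == 0b110:
        width = 2
    elif first >> 4 == 0b1110:
        width = 3
    elif first >> 3 == 0b11110:
        width = 4
    else:
        raise ValueError("offset is not at a character boundary")
    return offset + width


def utf8_decode_at(buf, offset):
    """Decode the single character that starts at `offset`."""
    end = utf8_advance(buf, offset)
    return bytes(buf[offset:end]).decode("utf-8")


buf = bytearray("a\u00e9\u20ac".encode("utf-8"))
it = 0                        # an iterator to the start is just offset 0
it = utf8_advance(buf, it)    # step over the one-byte 'a'
moved = bytearray(buf)        # simulate the GC relocating the buffer
assert utf8_decode_at(moved, it) == "\u00e9"
```

Because the iterator is a plain integer, it needs neither anchoring nor
freeing, which is what condition 2 asks for.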



Re: Parrot 0.1.0 -- what's left?

2003-08-18 Thread Benjamin Goldberg
Steve Fink wrote:
 
 In light of the insane amount of work that's gone into Parrot
 recently, I'd say it's about time to cut another release. What else
 would people like to slip in? This is not a freeze announcement yet --
 I want to know what people think of the state of things they're
 working on first.
 
 As for the version number -- Dan, if it's ok with you, I'd like to
 call this 0.1. Leo's got some form of exceptions in, which was the
 stated gate to 0.1, and with the EXEC stuff and a real live Python
 port, it seems to me that it's more than earned the name.

Here's some suggestions, not for 0.1, but for 0.2 :)

Finish objects.  Add a vtable isa method?

Document somewhere the difference between set_pmc and clone.  (Leo says
that it's shallow vs deep copy.  Ok, but which is which? :))

Allow .macro in imcc, for when we're parsing pir code.

Macro-ize the stub functions in default.pmc, so that throwing an
exception takes one line of C code.

Profiling and coverage tools.

Get more functions which explicitly declare ParrotInterpreter *
interpreter to use the INTERP macro.

Recategorize various set/assign/clone ops to alias/mutate/create.

A new macro, INTERP_as_arg (or INTERPa, or...), which would currently
just be defined as "interpreter", and gradually change calls to
functions which take an interpreter as their first argument to use this
macro.  This way, if in the future we want to change the INTERP macro
to be "ParrotInterpreter * interpreter, struct gc_frame * frame", we can
prevent most breakage simply by changing INTERPa to "interpreter,
frame".

String stuff.  (See my other post).

Allow imcc to generate C code (a la pbc2c.pl, but one subroutine at a
time, so as not to overload gcc/whatever's optimizer.)



Re: set vs. assign, continued: 'add' vs. 'add!'

2003-08-17 Thread Benjamin Goldberg
Benjamin Goldberg wrote:
 Brent Dax wrote:
  TOGoS:
  # When I say in IMCC:
  #
  #   $P0 = $P1 + $P2
  #
  # , I expect it to create a new value and store it in
  # $P0, not give me a segfault because I didn't say
  #
  #   $P0 = new figure out what the heck type
  #  $P0 is supposed to be based
  #  on the types of $P1 and $P2
  #
  # first.
 
  Then your expectations are wrong.
 
  I think you may be losing sight of the fact that most users will *never*
  interact with PASM or PIL directly.  Most users will have a
  parser/compiler/optimizer between them and PIL--one that was written,
  tested and debugged by experts.  The few people who work with raw
  assembly or IL should know what they're doing.  While I have no problem
  with IMCC-time warnings about constructs like this (Use of
  uninitialized register $P0 in add), I don't think we should coddle the
  user at this level.
 
  Moreover, there really isn't any way to do what you want to do here.
  There is no way for the add opcode to intuit what type should be put
  into $P0.
 
 No one is saying that the add opcode *should* try to intuit that -- he's
 trying to say that there should be two versions of the add opcode, one
 of which uses assign semantics, and the other of which uses set
 semantics.
 
  That's ultimately the host language's decision, not Parrot's,
  and the host language may have complicated rules to decide it.
 
 It would be the host language's decision about which opcode to use.
 
 I suppose that if one writes:
$P0 = $P1 + $P2
 , then it might be parrot's (imcc's) decision about which opcode to
 compile it to... but of course, the host language can avoid that
 ambiguity by writing it out explicitly, as add_set( $P0, $P1, $P2 ) or
 add_assign( $P0, $P1, $P2 ).  Or perhaps, require some pragma
 indicating to imcc that all $Px = $Py + $Pz should use the set or assign
 versions of the ops.

Hmm... I just thought of something.  Since 'set' semantics can be easily
simulated when we have only ops for 'assign' semantics, maybe imcc
itself could do this for us.

That is, by default,
   $P0 = $P1 + $P2
will be translated by imcc into
   add $P0, $P1, $P2
but, if some command is given to imcc, then it will be translated into
   new $P0, .PerlUndef
   add $P0, $P1, $P2
Or, instead of PerlUndef, the command to change to set semantics would
indicate what to put into $P0 prior to using it as the target of an add
(so we can initialize $P0 with any of PerlInt, PerlNum, BigInt,
BigFloat, BigRat, ContinuedFraction, etc.)
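
A toy version of that translation step, written in Python purely as a
sketch.  The function and its `set_semantics_type` parameter are
hypothetical stand-ins for whatever command imcc would actually accept;
only the emitted `new` and `add` lines come from the text above.

```python
def translate_add(dest, lhs, rhs, set_semantics_type=None):
    """Translate 'dest = lhs + rhs' into a list of PASM-like lines.

    set_semantics_type is a hypothetical stand-in for the imcc command:
    None keeps assign semantics; a type name asks for a fresh PMC first.
    """
    ops = []
    if set_semantics_type is not None:
        # Set semantics: allocate the target before adding into it.
        ops.append(f"new {dest}, .{set_semantics_type}")
    ops.append(f"add {dest}, {lhs}, {rhs}")
    return ops


print(translate_add("$P0", "$P1", "$P2"))
print(translate_add("$P0", "$P1", "$P2", "PerlUndef"))
```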



Re: there's no undef!

2003-08-17 Thread Benjamin Goldberg


Michal Wallace wrote:
 
 Uh-oh. I just went to implement del x
 and there's no op to remove a variable
 from a lexical pad! :)

Why would you want to remove a variable from a lexical pad?

Surely the right thing to do would be to create a new pad (scope),
then add your 'x' variable which you plan to 'delete' in the future,
then when you want to delete that variable, pop off that pad.
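
In case the pad-per-scope idea isn't clear, here is a minimal Python model
of it.  The PadStack class and its method names are invented for this
sketch and do not correspond to actual Parrot ops; a pad is just a dict,
and "deleting" a variable amounts to popping the pad that holds it.

```python
class PadStack:
    """A stack of lexical pads; each pad maps names to values."""

    def __init__(self):
        self.pads = [{}]

    def push_pad(self):
        self.pads.append({})

    def pop_pad(self):
        if len(self.pads) == 1:
            raise IndexError("cannot pop the outermost pad")
        self.pads.pop()

    def store(self, name, value):
        # New variables always go into the innermost pad.
        self.pads[-1][name] = value

    def find(self, name):
        # Search from the innermost pad outwards.
        for pad in reversed(self.pads):
            if name in pad:
                return pad[name]
        raise NameError(name)


pads = PadStack()
pads.store("y", 2)
pads.push_pad()        # new scope for the variable we plan to delete
pads.store("x", 1)
pads.pop_pad()         # 'x' is gone, 'y' is untouched
```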



Re: Calling parrot from C?

2003-08-17 Thread Benjamin Goldberg


Togos wrote:
 
 --- Sean O'Rourke [EMAIL PROTECTED] wrote:
  Luke Palmer [EMAIL PROTECTED]
  writes:
 
   How does one call a parrot Sub from
   C and get the
 
  I'd vote for stuffing args into the
  interpreter, calling the sub's invoke()
  method, then digging through the registers
  to pull out the return values (see e.g.
  Parrot_pop_argv in method_util.c, which
  may be outdated). Then again, it would be
  _your_ pain, not mine ;).
 
 The Parrot-C interface could simply come with
 some wrapper functions that would do this for
 you. I suppose this could be done with a couple
 printf/scanf-like functions, except they would
 work with pcc parameters instead of strings.

Would that be a printf-like to put args onto the stack, with the format
string being the subroutine's prototype, and a scanf-like to get return
values from the stack, with the format string being the return context?
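
Something like the following, sketched in Python just to show the shape of
the interface.  Everything here is an assumption made for illustration: the
function names, the use of a plain list as the stack, and the signature
letters I, N, S, P (borrowed from the four register types).

```python
# Hypothetical signature letters, one per Parrot register type.
CHECKS = {"I": int, "N": float, "S": str, "P": object}


def push_args(stack, signature, *args):
    """printf-like: push args so that the first one ends up on top."""
    if len(signature) != len(args):
        raise TypeError("signature and argument count differ")
    for code, arg in reversed(list(zip(signature, args))):
        if not isinstance(arg, CHECKS[code]):
            raise TypeError(f"argument {arg!r} does not match '{code}'")
        stack.append((code, arg))


def pop_returns(stack, signature):
    """scanf-like: pop values off the top, in signature order."""
    values = []
    for code in signature:
        got, value = stack.pop()
        if got != code:
            raise TypeError(f"expected '{code}' but found '{got}'")
        values.append(value)
    return values
```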

If we didn't have NCI, using these two functions in the other order (to
get the arguments passed in, then to return values) might be useful for
functions which want to be called from Parrot.

Hmm... even with the existence of NCI, doing that might be useful --
having a function process its arguments itself, rather than having NCI
do it, might be more efficient.

Oh, and maybe this idea could replace pxs.c?

 (This would probably tie in to the subroutine
 signatures discussed in 'calling conventions,
 variable-length parameter lists'.)



Re: Timely Destruction: An efficient, complete solution

2003-08-17 Thread Benjamin Goldberg
Luke Palmer wrote:
 
 Here comes that ever-reincarnating thread again, sorry.
 
 This is a proposal for an efficient solution to the timely destruction
 problem, which doesn't use refcounting, and fits in to the current
 scheme pretty well.
 
 It is based on the fact that 90% of the time (or more), all objects
 needing timely destruction will be found live.  It uses a priority
 method to try to find these objects quickly, and ceases if it does.
 This behavior is only carried out if the DOD was invoked by sweep 0.
 
 There is an external list of objects needing timely destruction, which
 is not walked by DOD.  Each object has a DOD_high_priority_FLAG.  Each
 time an impatient object is created, its high_priority_FLAG is set.
 
 As everything is walked, if an object with this flag set is encountered,
 whichever thing referenced it also gets the flag set[1].

 The DOD queue has two segments:  high_priority and low_priority.
 high_priority is always drained and processed first.

All this, I understand so far.

 When this portion of the queue is completely drained, the external list
 of impatient objects is checked.  If every object has been marked live,
 DOD terminates

This, I don't understand.  Why do we stop DoD here?  Shouldn't we try
and free up some of the non-impatient objects?

[insert sound of gears turning]

OIC... if this is only for sweep 0, then we haven't really *asked* to
try and free up the non-impatient objects; we've only asked to free up
those objects needing timely destruction.

 (and GC is not allowed to run, because that would result
 in a lot of wrongly murdered objects).

I don't understand this, though.  Why would GC murder something, if that
something has been marked as live?  Or do you mean that if we stopped DoD
in the middle, then we obviously shouldn't follow it with the GC that
normally follows a DOD?  That makes sense, though I think that skipping GC
after an *incomplete* DOD is a sufficiently obvious thing that it needn't
be mentioned ;)

 If there are dead objects in that list,
 DOD continues and does a full sweep.

When you say, DOD continues, I assume you mean that we now drain the
low priority queue.  I assume that it's *this* part of the DOD, when,
if we now encounter something with a high_priority flag set, we set the
high_priority flag of the object which can reach it.  (Certainly not
when we're draining the high priority queue: all the things doing the
reaching there already have the high_priority flag set on them).

 When an impatient object is destructed, it might be good to reset all
 high_priority_FLAGs, except for other impatient objects, so there is
 nothing being walked at high priority that doesn't deserve it.

Or, set a global (per-interpreter) flag indicating that an impatient
object has been destructed, and all high_priority_FLAGs need to be reset
at the beginning of the next DoD run.

 That's it.  The first few times sweep 0 is run, it will take just as
 long as sweep 1, but the priorities will propagate down to things in
 the root set, and it will begin to speed up.  And the algorithmic
 complexity never exceeds that of a regular sweep[2].

I do see a potential problem:  Suppose that a highly visible PMC (one
reachable through lots of objects) is able to reach a PMC needing timely
destruction, and then a sweep 0 is done, and afterwards that highly
visible PMC changes so it no longer can reach the object needing timely
destruction.  It still has its high_priority flag set, and this will
propagate into the many things which will reach it, eventually going
into the root set.

And the more things which (incorrectly) end up in the high priority
queue instead of the low priority queue, the closer its speed will be
to a regular sweep.

I can think of a workaround, but it would cost more memory. 
Specifically, make the priority an actual number, which decreases with
the distance away from an object needing timely destruction (e.g., one
less than the max of the priorities of the objects we can reach).  If an
object needs timely destruction, its priority is fixed at MAX_INT. 
PMCs which used to be able to reach things needing timely destruction,
but no longer can, will thus no longer have high priorities.

An advantage of this is that if an object needing timely destruction is
destructed, then we no longer need to clear the priorities of all PMCs;
most of the priorities (the ones leading to other objects needing timely
destruction) are still good, and the priorities going to the destructed
PMC will eventually decrease.

An improvement that can be made, if numeric priorities are used, would be
to make the high priority queue into an actual priority queue.
Although the algorithmic complexity will be higher than that of a
regular sweep, the objects needing timely destruction will be reached
faster, since the search is directed straight at them.
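
Here is a rough Python sketch of the priority-queue variant.  The names,
the dict-based graph, and the MAX_PRIORITY constant (standing in for
MAX_INT) are assumptions made for illustration, and object ids are plain
strings so that heap entries stay comparable.  Objects with higher
priority are walked first, and the walk stops once every impatient object
has been found.

```python
import heapq

MAX_PRIORITY = 1_000_000  # stands in for MAX_INT


def directed_search(roots, children, priority, impatient):
    """Walk highest-priority objects first.

    Returns the number of objects visited before every impatient object
    was found (or before the walk ran out of objects).
    """
    remaining = set(impatient)
    seen = set()
    # heapq is a min-heap, so priorities are negated.
    heap = [(-priority.get(r, 0), r) for r in roots]
    heapq.heapify(heap)
    visited = 0
    while heap and remaining:
        _, obj = heapq.heappop(heap)
        if obj in seen:
            continue
        seen.add(obj)
        visited += 1
        remaining.discard(obj)
        for child in children.get(obj, ()):
            if child not in seen:
                heapq.heappush(heap, (-priority.get(child, 0), child))
    return visited


graph = {
    "root": ["a_junk", "b_junk", "wrapper"],
    "a_junk": ["c_junk"],
    "wrapper": ["zhandle"],
}
prio = {
    "zhandle": MAX_PRIORITY,       # needs timely destruction
    "wrapper": MAX_PRIORITY - 1,   # one step away from it
    "root": MAX_PRIORITY - 2,
}
print(directed_search(["root"], graph, prio, {"zhandle"}))
```

With the priorities in place only root, wrapper and zhandle are visited;
the junk objects are never touched.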



For any scheme (your scheme, my suggested modification, or anything
else), a potential improvement to the speed of 

Re: set vs. assign, continued: 'add' vs. 'add!'

2003-08-17 Thread Benjamin Goldberg
Luke Palmer wrote:
 
 Benjamin Goldberg writes:
  Hmm... I just thought of something.  Since 'set' semantics can be
  easily simulated when we have only ops for 'assign' semantics, maybe
  imcc itself could do this for us.
 
  That is, by default,
 $P0 = $P1 + $P2
  will be translated by imcc into
 add $P0, $P1, $P2
  but, if some command is given to imcc, then it will be translated into
 new $P0, .PerlUndef
 add $P0, $P1, $P2
  Or, instead of PerlUndef, the command to change to set semantics would
  indicate what to put into $P0 prior to using it as the target of an
  add (so we can initialize $P0 with any of PerlInt, PerlNum, BigInt,
  BigFloat, BigRat, ContinuedFraction, etc.)
 
 Hmm, I wonder whether this is the right approach.  What about operator
 overloading?  Let's say Python overloads + (can it do that?) and passes
 that variable to Perl.  If Perl creates a .PerlUndef and assigns to it,
 you lose the overloaded semantics.

Eh?  I would think that if Python overloads +, then that means it's
creating a PMC with a vtable->add different from a normal addition
operation.  If we assign that pmc to a pmc which was PerlUndef, then
the vtable (pointer) gets assigned, too.  Doesn't it?

Unless you mean, let's say Python makes a pmc with an unusual vtable,
and overloads vtable->add in such a way that it expects its target to
be some particular type of pmc, but instead, we pass in a PerlUndef.  If
vtable->add doesn't bother checking that the target is the right
type, then it's broken.  It should either throw an exception if it's not
what's wanted, or else mutate/promote that PerlUndef to the right type.

 I don't know... I think using vtable multimethods would be the best way
 to go wrt these things.

Umm, I didn't think that I was suggesting something else -- I was
suggesting merely that imcc can be told to pretend that +, -, and
other mathematical operations have set semantics, and have it
produce parrot ops to support that pretense.  Within parrot, the add
op would still be the same as it always has been (merely calling the
vtable->add method).  And I'm not suggesting that the vtables be
changed. The only difference would be, *if* that command had been given
to imcc, then any add op in the bytecode which had originally been
spelled as + in the pir code would be preceded by a new op (put
there by imcc).



Re: there's no undef!

2003-08-17 Thread Benjamin Goldberg


Michal Wallace wrote:
 
 On Sun, 17 Aug 2003, Benjamin Goldberg wrote:
 
  Michal Wallace wrote:
 
   Uh-oh. I just went to implement del x
   and there's no op to remove a variable
   from a lexical pad! :)
 
  Why would you want to remove a variable from a lexical pad?
 
  Surely the right thing to do would be to create a new pad (scope),
  then add your 'x' variable which you plan to 'delete' in the future,
  then when you want to delete that variable, pop off that pad.
 
 Hmm. Do you mean
 
   if for stmt in block:
  if stmt.type == undef:
 flag_as_going_to_delet(stmt.varname)
 
 So I can create a new pad when it's assigned?

Right.  You'd create a new pad just before the for, and put stmt
into it.  After the for loop ends, you pop off that pad.

 It can't be a simple pop though, can it?

Why not?

 #!/usr/bin/perl
 $x = cat;
 $y = pop $x?;
 undef $x;
 print x = $x \n;
 print y = $y \n;
 
 Because here, $x has to be defined before $y, so
 I'd have to delete the -2nd scope. Unless the
 code was smart enough to work out that y is in
 -2 and x is in -1...

These are dynamic variables here; could you give an example of what
you're trying to do with lexicals (my variables)?  And remember, undef
doesn't get rid of a variable; it merely stores an undef value into it.

I mean, consider:

   #!/usr/bin/perl
   use strict;
   { my $x = "cat";
     { my $y = "pop $x";
       # $x and $y are both in scope.
       undef $x;
       # $x and $y are still both in scope.
       $x = 2; # no error here!
     } # $y goes out of scope.
     $x = 42; # ok.
     $y = 6*9; # but this is an error.
   } # $x goes out of scope now.
   $x = 13; # this is an error.
   __END__

There's no way, in this program, for $x to be out of scope while $y is
in scope.

 In any case, what does it buy me? it seems like a
 lot of work when there's already a delete_keyed
 op, and leo just made an implementation for pads. :)
 
 I guess in perl it's not bad, since that's the
 whole point of undef, you can just $x = PerlUndef
 it... But in python, this throws a NameError:
 
x = 1
del x
x
 
 So the easiest thing to me is just to translate
 del x as  remove this from the current pad.

Maybe.  But, what happens with:

   x = 1
   y = lambda: x
   del x
   z = y()

Does/should this also throw a NameError?
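
For reference, the question can be checked directly.  The snippet below
runs the four lines as module-level code via exec() in a fresh namespace.
In CPython 3, as far as I can tell, the lambda looks x up by name at call
time, so the call fails with a NameError once x has been deleted.  The
`outcome` and `namespace` names are just scaffolding for the check.

```python
source = """
x = 1
y = lambda: x
del x
z = y()
"""

namespace = {}
try:
    # Run the lines as if they were the body of a module.
    exec(source, namespace)
    outcome = "no error"
except NameError:
    outcome = "NameError"

print(outcome)
```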



Re: set vs. assign, continued: 'add' vs. 'add!'

2003-08-15 Thread Benjamin Goldberg


Brent Dax wrote:
 
 TOGoS:
 # When I say in IMCC:
 #
 #   $P0 = $P1 + $P2
 #
 # , I expect it to create a new value and store it in
 # $P0, not give me a segfault because I didn't say
 #
 #   $P0 = new figure out what the heck type
 #  $P0 is supposed to be based
 #  on the types of $P1 and $P2
 #
 # first.
 
 Then your expectations are wrong.
 
 I think you may be losing sight of the fact that most users will *never*
 interact with PASM or PIL directly.  Most users will have a
 parser/compiler/optimizer between them and PIL--one that was written,
 tested and debugged by experts.  The few people who work with raw
 assembly or IL should know what they're doing.  While I have no problem
 with IMCC-time warnings about constructs like this (Use of
 uninitialized register $P0 in add), I don't think we should coddle the
 user at this level.
 
 Moreover, there really isn't any way to do what you want to do here.
 There is no way for the add opcode to intuit what type should be put
 into $P0.

No one is saying that the add opcode *should* try to intuit that -- he's
trying to say that there should be two versions of the add opcode, one
of which uses assign semantics, and the other of which uses set
semantics.

 That's ultimately the host language's decision, not Parrot's,
 and the host language may have complicated rules to decide it.

It would be the host language's decision about which opcode to use.

I suppose that if one writes:
   $P0 = $P1 + $P2
, then it might be parrot's (imcc's) decision about which opcode to
compile it to... but of course, the host language can avoid that
ambiguity by writing it out explicitly, as add_set( $P0, $P1, $P2 ) or
add_assign( $P0, $P1, $P2 ).  Or perhaps, require some pragma
indicating to imcc that all $Px = $Py + $Pz should use the set or assign
versions of the ops.

 And even if we did do something like this, we'd be penalizing calls to
 add that *do* have correct arguments.

Define correct.

If add only has assign semantics, then any language which wants set
semantics must first allocate a new pmc, then add with that as the
target.

If add were to only have set semantics, then any language which wants
assign semantics must first add, then assign (and then, I suppose, null
out the pmc which add created for its target so it can be GC'd.)

(Note that if add were to have set semantics, it wouldn't be parrot's
decision about what type of pmc the target should have; that decision
would belong to the add vtable entry of the first of the two things
being added together.)

Clearly, having *only* an add with set semantics would be wrong, since
there'd be no way to avoid having the extra pmc created.  (Not to
mention, even if we want to assign the result into one of the pmcs being
added together, our result still has to go into a register other than
one of them; this might require spilling registers first.)

However, having *only* an add with assign semantics isn't perfect
either, since to simulate set semantics, we have to allocate a target
pmc beforehand, which means we must decide what type the new pmc should
be.  Whether we allocate a PerlUndef or some other fixed type, it might
just possibly be the *wrong* type.  This puts the burden of the decision
about what type the target should be allocated as completely on the host
language's shoulders, and doesn't let the pmcs involved decide.  Well, I
suppose one could do:
   result_type_for_add $I0, $P1, $P2
   $P0 = new $I0
   $P0 = $P1 + $P2
, but this is rather clumsy.  I'd rather have two versions of add, one
with set/create semantics and the other with assign/mutate semantics.



Re: [PATCH] Win32 env.pmc fixes

2003-08-15 Thread Benjamin Goldberg

In the second chunk of the patch [classes/env.pmc], what is free_it set
to if the else branch is taken?  IOW, I think you should have free_it
initialized to 0 when you declare it.

In the last part of the patch [config/gen/platform/win32.c], why do you
set *free_it to 0 even when you return NULL?  Wouldn't it be better to
have it as:

   if( size == 0 ) {
  *free_it = 0;
  return NULL;
   } else
  *free_it = 1;

(Of course, the unnecessary mem_sys_free(NULL) is harmless, but
still...)



perlnum.pmc

2003-08-14 Thread Benjamin Goldberg

Shouldn't the get_string() method use string_from_num, instead of
calling sprintf/snprintf?



Re: Packfile stuff

2003-08-14 Thread Benjamin Goldberg


Luke Palmer wrote:
 
 Leopold Toetsch writes:
  Juergen Boemmels [EMAIL PROTECTED] wrote:
   what still fails is pbc2c.pl (This needs Parrot::Packfile, which can
   only read format 0 (old assemble.pl) bytecodes).
 
  This is obsoleted by Daniel's exec patches.
 
 Sadly.  I mean, the exec patches are great, but I found no faster way to
 run parrot bytecode than to run it through pbc2c.pl and compile it with
 gcc -O3.  That was still about twice as fast as the JIT (and thus the
 exec).

What is different about the resulting executable code from the code
produced by jit?  If we could determine what the differences are, then
we might be able to change our jit to produce the same code as gcc -O3
is producing, thus getting the same speed.

 But, as previously pointed out, pbc2c.pl's idea doesn't scale well and
 has a few other drawbacks, so I suppose this was destined to happen.

Is there any chance that we could try and make pbc2c.pl turn each pir
subroutine into a separate function?

If not enough information is available in the pbc to do that (though
AFAIK, there should be), then imcc itself could output the C code.


As another alternative, can't gcc optimize assembly, at least a little? 
Suppose we ran a disassembler on the program produced by exec, then
compiled that using gcc -O3?  Obviously, it likely won't be quite as
good as if it compiled C code with -O3, but it still might be able to
make *some* optimizations.



pmc.c ([constant_]pmc_new_noinit)

2003-08-14 Thread Benjamin Goldberg

In pmc_new_noinit, I see a switch() which decides whether or not to do
add_pmc_ext.  Why not have that information (whether or not an ext is
needed) defined in each of the .pmc files, and stuck in the vtable?

That would allow other cache-only pmcs to avoid getting an unnecessary
chunk of bytes allocated for them.

Also, why does constant_pmc_new_noinit always add an ext?



Re: Perl 6's for() signature

2003-08-14 Thread Benjamin Goldberg


Jonadab The Unsightly One wrote:
 
 John Siracusa [EMAIL PROTECTED] writes:
 
  Did this ever get resolved to anyone's satisfaction?  While reading
  EX6, I found myself wonder exactly what for() would look like in
  Perl 6 code...
 
 A for loop[1] is basically syntax sugar for a while loop.  In general,
 where foo, bar, baz, and quux are expressions, the following are equivalent:
 
 for (foo; bar; baz) { quux }
 foo; while (bar) { quux; baz }

Well, except that in the second, any variables declared in foo will leak
into the scope after the end of the for loop.  Also, in the second,
baz is inside the same scope as quux, whereas in the normal C-style
for loop, it's not.

Thus, for (foo; bar; baz) { quux } is really more like:

   {
  foo;
  start_for_loop:
  if( !bar ) { goto end_for_loop }
  { quux }
  baz;
  goto start_for_loop;
  end_for_loop:
   };

 If Perl6 has enough syntax-sugar ability to let you turn the former
 into the latter, then you don't need to worry about for's signature.

Well, yes.  But the point of this discussion is, precisely what *kind*
of syntactic sugar will be used.

 [1]  Of course I mean a C-style for loop.



Re: Set vs. Assign..?

2003-08-14 Thread Benjamin Goldberg

Having read this thread, I think that the real problem is not just that
there are multiple assignment semantics -- it's that the names 'set' and
'assign' are not really meaningful -- that is, they *look* to English
speakers as if they are synonyms... their names alone aren't enough info
to know what precisely they're doing.

Furthermore, the current division of assignments to those three types is
not what *I* would consider intui^H^H^H^H^H the most logical and
sensible of arrangements.

IMHO, we should rename the ops, dividing them up more clearly by
behavior.

I would suggest the opnames/categories mutate, alias, and create.

The mutate ops would all re-use the string or pmc headers, changing the
contents of the first argument to match the contents of the second
argument.  The (two) alias ops would simply copy pointers.  The create
ops would create new strings or new pmcs.  The ops which target ints and
nums would be lumped with the mutate ops, since they neither create
aliases, nor allocate new pmcs or new strings.
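
The three categories can be modelled in a few lines of Python.  This is
only an analogy: the Box class stands in for a string or pmc header, and
the three functions are illustrative names for the proposed categories,
not real ops.

```python
class Box:
    """Stands in for a string or pmc header holding some contents."""

    def __init__(self, value):
        self.value = value


def alias(src):
    """Alias: copy the pointer; both names refer to one header."""
    return src


def mutate(dest, src):
    """Mutate: reuse dest's header, changing only its contents."""
    dest.value = src.value
    return dest


def create(src):
    """Create: allocate a new header with the same contents."""
    return Box(src.value)


a = Box("hello")
b = alias(a)
c = create(a)
d = Box("old")
mutate(d, a)
a.value = "changed"
# b sees the change (same header); c and d do not (separate headers)
```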


Yes, I realize that changing the names of so many ops will require
changing lots of code, and may temporarily introduce breakage, but I
believe that it would be good in the long term.  It would make the
different assignment ops *much* easier to remember and understand, and
make it easier for someone looking at parrot code to understand which
semantics a particular assignment operation is using.  (Not changing the
names would have the same bad effects as breaking a chain letter :-)


The changes in ops that I am proposing would be as follows:

Alias would be the existing set(out STR, invar STR) and set(out PMC,
in PMC).

Mutate would be all the existing assign opcodes, plus most of the
other set opcodes (except the ones which have out STR as their first
argument), plus the following new ops:

   inline op mutate(in str, in int) {
      string_set(INTERP, $1, string_from_int(INTERP, $2));
   }
   inline op mutate(in str, in num) {
      string_set(INTERP, $1, string_from_num(INTERP, $2));
   }
   inline op mutate(in str, in pmc) {
      string_set(INTERP, $1, VTABLE_get_string(INTERP, $2));
   }

And lastly, Create would be the remaining set ops (which target
strings, creating new strings), and the two clone ops, plus the
following new ops:

   inline op create(out pmc, in int) {
      $1 = pmc_new(INTERP, enum_class_PerlInt);
      VTABLE_set_integer_native(INTERP, $1, $2);
   }
   inline op create(out pmc, in num) {
      $1 = pmc_new(INTERP, enum_class_PerlNumber);
      VTABLE_set_number_native(INTERP, $1, $2);
   }
   inline op create(out pmc, in str) {
      $1 = pmc_new(INTERP, enum_class_PerlString);
      VTABLE_set_string_native(INTERP, $1, $2);
   }

(I realize that each of these would be the same as new $Px, .SomeClass,
followed by mutate $Px, $foo... however, the combined ops here would be
easier to write and read (one line instead of two), and in non-jit
environments, faster to run, as well.)

I believe that the most in^H^H logical shorthand spellings of these
opcodes would be :=, =, and ==, for alias, mutate, and create,
respectively.

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: [perl #23276] Prefixing #define names

2003-08-14 Thread Benjamin Goldberg
Michael G Schwern wrote:
 
 On Mon, Aug 11, 2003 at 01:45:35AM -0700, Ask Bjoern Hansen wrote:
  We totally need to have Parrot running on this thing when it comes
  out.  :-)
 
http://www.xgamestation.com/
 
 Great idea, shame the hardware is crap. :(
 
 Third-generation Motorola 68HCS12 16-bit processor @ 25 MHz.
  Graphics architecture similar to Commodore 64, Atari 800 and Apple II.

From reading the webpage, the xgs seems not to be intended to be a
serious gaming platform, but rather, a hands-on way of teaching
enthusiasts about how game programming was done in the good old days.

The purpose of having a small memory, and slow cpu speed, would seem to
be that programmers for it are forced to hand-optimize algorithms and
code, something they mightn't bother with in a resource abundant
environment.  Whether this is a good thing or a bad thing depends on why
you're writing code for the xgs :)

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: Set vs. Assign..?

2003-08-14 Thread Benjamin Goldberg
Leopold Toetsch wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] wrote:
  Leopold Toetsch wrote:
 
  Benjamin Goldberg [EMAIL PROTECTED] wrote:
 
   I would suggest the opnames/categories mutate, alias, and
   create.
 
  IMHO, we could leave PASM syntax as it is and create opcode aliases
  inside the assembler ...
 
  I would rather rename the ops, but make their old names aliases to the
  new names.
 
 Yes.
 
  But that would tempt people to do:
 
 $P0 = new SomethingElse (42)
 
  Which sadly wouldn't work, since neither the create ops, nor the new
  op, generalize that way.
 
 That's fine for Arrays (setting the initial size) and for Subs (setting
 branch location).

Hmm, yes.

 Its kind of the current init_pmc vtable.

Err, not quite -- the proposed create ops first create a pmc, then call
set_integer_native; this is not in any way related to init_pmc.

Although, I suppose it could be -- if there existed an init_integer,
then it would be more logical for $P0 = new SomethingElse (42) to use
that, rather than create then set_integer_native.

Maybe we should have 5 init functions: init, init_pmc, init_integer,
init_double, init_string.  (Hmm, what's that init_pmc_props thingy?)

  But if the syntax is:
 
 $P0 == 42
 
  , we would not be implying a generality which doesn't exist.  Also, it
  removes redundancy, since 42 already implies PerlInt, so we don't
  need to spell that name out.
 
 PieThonInt, RubyInt?

Hmm.  I think that parrot should DTRT, and use CurrentLanguageInt. :)

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: #define name collisions -- yet another small project

2003-08-14 Thread Benjamin Goldberg


Dan Sugalski wrote:
 
 Well, it turns out that at least some compilers (AIX's) are really
 unhappy about redefined #defines in the C source. This turns out to
 be a problem with things like HAS_STDLIB_H, which is common enough to
 cause collisions. So, we need to go name-prefix all the #defines.

Why not just change them from:

  #define HAS_FOO_H

To:

   #ifndef HAS_FOO_H
   #define HAS_FOO_H
   #endif

 So, the project. Someone needs to go through the configure procedure
 and the headers and throw a PARROT_ prefix in front of all the HAS_
 defines we define, so we can avoid this problem.

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: QUERIES: Questions about Unanswered Elderly or Recent Issues Eventually Solvable

2003-08-14 Thread Benjamin Goldberg
Dan Sugalski wrote:
 
 At 11:06 AM +0200 8/8/03, Leopold Toetsch wrote:
[snip]
 PMC methods
 ---
 ParrotIO has methods via find_method/invoke. Should that be a general
 mechanism in default.pmc with one vtable slot for the meth hash?
 
 We're going to want lexically nested method caches, so I don't think
 we're going to want to do this. The method hash needs to be living in
 the interpreter struct, alas.

There's no reason why the find_method in default.pmc can't do lookups in
the interpreter struct, and then of course the normal method would be to
use find_method.

This way, if a pmc class wants to override the normal find_method, it
can.

For example, PerlObject's find_method could be:

   PMC * find_method(STRING *name) {
      PMC * perlclass = SELF.getprop(string_from_cstring("class"));
      return VTABLE_find_method(INTERP, perlclass, name);
   }

Then, PerlClass's find_method might be something like:

   PMC * find_method(PMC *key, STRING *name) {
      STRING * classname = SELF.get_string();
      PMC * lookup = scratchpad_get_current(INTERP);
      PMC * method = NULL;
      {
         STRING * lex_name = string_from_cstring(INTERP, "&", 1);
         string_append(INTERP, lex_name, classname);
         string_append(INTERP, lex_name,
            string_from_cstring(INTERP, "::", 2));
         string_append(INTERP, lex_name, name);
         method = scratchpad_get(INTERP, lookup, lex_name, 0);
         if( method ) return method;
      }
      lookup = SELF.getprop(string_from_cstring(INTERP, "methods"));
      method = VTABLE_get_pmc_keyed_string(INTERP, lookup, name);
      if( method ) return method;
      lookup = SELF.getprop(string_from_cstring(INTERP, "parent"));
      if( !lookup ) return NULL;
      return VTABLE_find_method(INTERP, lookup, name);
   }
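
The lookup order in that second find_method can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Parrot's API: classes are plain dicts with hypothetical "name", "methods" and "parent" keys, the scratchpad is a dict, and the "&Class::method" naming of lexical entries is an assumption.

```python
def find_method(cls, name, pad):
    """Resolve `name` for class `cls`, or return None.

    cls is a dict with hypothetical keys "name", "methods", "parent";
    pad is a dict standing in for the current lexical scratchpad.
    """
    while cls is not None:
        # 1. A lexically scoped override, stored as "&Class::method" (assumed).
        lexical = pad.get("&" + cls["name"] + "::" + name)
        if lexical is not None:
            return lexical
        # 2. The class's own method table.
        method = cls["methods"].get(name)
        if method is not None:
            return method
        # 3. Otherwise retry the same steps in the parent class, if any.
        cls = cls.get("parent")
    return None


animal = {"name": "Animal",
          "methods": {"speak": "Animal.speak", "eat": "Animal.eat"},
          "parent": None}
dog = {"name": "Dog", "methods": {"speak": "Dog.speak"}, "parent": animal}
pad = {"&Dog::eat": "lexical Dog.eat"}
```

With this pad, a lookup of eat on dog finds the lexical entry first; with an empty pad the same lookup falls through to Animal's method table.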

 Exceptions
 --
 I presume internal_exception() should throw a real exception. So the
 exception reason should be classified (severity/reason).
 
 Yep. Internal exceptions are generally pretty severe, but they
 certainly aren't all that bad.

Indeed -- consider how default.pmc throws ILL_INHERIT all over the
place... those could be caught.

(Hmm, I think that default.pmc would be smaller, and no less clear, if we
defined a macro:

#define THROW_ILL_INHERIT(METH) \
   internal_exception(ILL_INHERIT, \
      METH "() not implemented in class '%s'\n", \
      caller(INTERP, SELF))

And then use
   THROW_ILL_INHERIT("get_number_keyed");

Instead of the more verbose code that's currently there.)

[snip]
 Vtables 2
 -
 Are there any design hints on var/value split of vtables?
 Currently in a lot of places the address of the vtable is used to
 check if a class is e.g. a PerlInt. This will not work if that is a
 tied PerlInt. Other checks are done with vtable->base_type, but a tied
 PerlInt will have a different class enum too.

That's why encapsulation is so important.  If we had a vtable->isa
method, then it could be overridden however we want.

 So how will we check for some PerlInt class?
 What is the subtype? And what are int_type, float_type, num_type,
 string_type in the vtable?
 
 I'm still working on these, and as part of it I'm trying to come up
 with a good way to allow layered vtables that aren't horribly
 expensive in terms of memory and time.

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: assign opcodes

2003-08-11 Thread Benjamin Goldberg


Brent Dax wrote:
 
 TOGoS:
 # Personally, I would like = to mean 'set', and
 # maybe - do 'assign'.
 
 I usually think of registers as variables with fixed names, so the Perl
 6 part of my brain suggests:
 
 $P0  = $P1  #assign
 $P0 := $P1  #set

Which is why I suggested, when proposing renaming assign, set, and clone
to alias, mutate, create, that shorthand for alias (which is today's set
Px, Py and set Sx, Sy) be :=, and shorthand for mutate (which is
today's assign) be =.


-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: parrot bug: continuations/multiple return

2003-08-10 Thread Benjamin Goldberg
Michal Wallace wrote:
[snip]
   def f():
   return g()
[snip]
 # f from line 3
 .pcc_sub _sub1 non_prototyped
 .local object res1# (visitReturn:528)
 find_lex $P2, 'g' # (callingExpression:325)
 newsub $P3, .Continuation, ret0# (callingExpression:331)
 .pcc_begin non_prototyped # (callingExpression:332)
 .pcc_call $P2, $P3# (callingExpression:335)
 ret0:
 .result res1  # (callingExpression:338)
 .pcc_end  # (callingExpression:339)
 .pcc_begin_return # (visitReturn:530)
 .return res1  # (visitReturn:531)
 .pcc_end_return   # (visitReturn:532)
 .end

Does python allow tail calls to be optimized?  That is, would it be
legal python semantics if f were to become:

.pcc_sub _sub1 non_prototyped
find_lex $P2, 'g'
.pcc_begin non_prototyped
.pcc_call $P2, P1
.pcc_end
.end

(untested)
Also... why is $P2 merely an imcc temporary, without a real name?  That
is, why not do:

.pcc_sub _sub1 non_prototyped
.local object sub1
find_lex sub1, 'g'
.pcc_begin non_prototyped
.pcc_call sub1, P1
.pcc_end
.end

The more different prefixes you use (res, sub, $P), the lower the
numbers you'll need to append to them to make the names unique.  Not to
mention, it'll make the generated code more meaningful.

Another advantage is that with '.local' names, if you accidentally use a
name declared in one subroutine, in another, imcc will consider this to
be an error.  Such a mistake might be otherwise difficult to spot.

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: parrot bug: continuations/multiple return

2003-08-10 Thread Benjamin Goldberg
Michal Wallace wrote:
 Benjamin Goldberg wrote:
  Michal Wallace wrote:
[snip]
  Also... why is $P2 merely an imcc temporary, without a real name? 
  That is, why not do:
 
  .pcc_sub _sub1 non_prototyped
  .local object sub1
  find_lex sub1, 'g'
  .pcc_begin non_prototyped
  .pcc_call sub1, P1
  .pcc_end
  .end
 
  The more different prefixes you use (res, sub, $P), the lower
  the numbers you'll need to append to them to make the names unique. 
  Not to mention, it'll make the generated code more meaningful.
 
  Another advantage is that with '.local' names, if you accidentally use
  a name declared in one subroutine, in another, imcc will consider this
  to be an error.  Such a mistake might be otherwise difficult to spot.
 
 Hmmm. The counters are global so there shouldn't be any conflicts.
 I'd actually been trying to move away from .local now that I started
 using lexicals... If there's going to be two sets of names for
 everything I'd rather one of them was just $Pxx...

Ehh, why would there be two sets of names for everything?  You mean, one
of them the name to lookup in the pad, the other the imcc name of the
register to use?  There's no reason not to use the same name for both.

(Except that using a unique .local name is still good for preventing
accidental leakage of a name from one sub to another)

 Now, if I could say:
 
   .lexical g
 
 That would be really nice... :)

This is where .macro comes in handy :)

   .macro lexical(name_in_pad, register_name)
  local .register_name
  find_lex .register_name, .name_in_pad
   .endm

And in pirate.py, you'd do:
   self.append('.lexical %s %s', lexname, lexname)

Ok, so it's not *quite* what you were asking for, but it's close enough,
eh? :)

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: Set vs. Assign..?

2003-08-09 Thread Benjamin Goldberg
Leopold Toetsch wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] wrote:
 
  I would suggest the opnames/categories mutate, alias, and
  create.
 
 IMHO, we could leave PASM syntax as it is and create opcode aliases
 inside the assembler ...

You mean leave the old ops with their names, but merely make new aliases
for them with the scheme I suggested?

I would rather rename the ops, but make their old names aliases to the
new names.

Of course, in theory, there should be no visible difference whatsoever
between these two ideas :)

(Well, maybe for decompiling.)

  I believe that the most in^H^H logical shorthand spellings of these
  opcodes would be :=, =, and ==, for alias, mutate, and create,
  respectively.
 
 But what I really want to have is these in the PIR code (except for the
 create one, which is:
 
   $P0 = new PerlInt
 
 With an underlying create op, that does assign a value too, it could
 then be
 
   $P0 = new PerlInt (42)
 
 or something.

But that would tempt people to do:

   $P0 = new SomethingElse (42)

Which sadly wouldn't work, since neither the create ops, nor the new
op, generalize that way.

But if the syntax is:

   $P0 == 42

, we would not be implying a generality which doesn't exist.  Also, it
removes redundancy, since 42 already implies PerlInt, so we don't need
to spell that name out.

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: assign opcodes

2003-08-09 Thread Benjamin Goldberg
Togos wrote:
 
  Anyway:
 
assign Px, {Iy,Sy,Ny}
 
  are not needed IMHO, these end up as
  set_type_native and are identical
  to set Px, {Iy,Sy,Ny}.
 
 Yes, but as we were discussing in the
 Set vs. Assign thread, it makes more sense
 to call them 'assign', as it morphs the
 existing value (as 'assign Px, Py' does),
 instead of simply copying a pointer
 (as 'set Px, Py' does). Yay for consistency.
 If you want to get rid of opcode aliases,
 perhaps it would be better to get rid of
 the extra 'set's.

Out of curiosity, how does the word "assign" imply that it morphs an
existing value, and how does the word "set" imply that it copies a
pointer?

Obviously, in parrot as it is now, this is how it is, but what precisely
led to the choice of those two names?

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: assign opcodes

2003-08-09 Thread Benjamin Goldberg


Leopold Toetsch wrote:
 
 I hopefully got the semantics of assign Px,Py right now. The LHS gets
 the value of RHS, eventually morphing itself to the source type.
 
 Anyway:
 
assign Px, {Iy,Sy,Ny}
 
 are not needed IMHO, these end up as set_type_native and are identical
 to set Px, {Iy,Sy,Ny}.

Or else, set Px, {Iy,Sy,Ny} ought to create a new pmc, then use
set_type_native, with that new pmc as the target, and store that new
pmc into Px.

 But we are missing keyed variants to set the value of some aggregate
 member:
 
assign(in PMC, in KEY, in PMC)
assign(in PMC, in INTKEY, in PMC)
 
 with the 2 vtables assign_keyed and assign_keyed_int.

We can always emulate assign semantics with set followed by assign.  That
is:

   $Ptmp = $Px[ y ]
   assign $Pz, $Ptmp

Bleh.

 Comments welcome

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


set_pmc vs clone?

2003-08-09 Thread Benjamin Goldberg

What's the difference between VTABLE_set_pmc and VTABLE_clone?

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: Infant mortality

2003-08-07 Thread Benjamin Goldberg
Juergen Boemmels wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] writes:
[snip]
  Except that generational_dod_helper is much simpler and faster -- it
  doesn't mark anything as alive or free, it only adjusts the generation
  of those pmcs that were created in C functions which we have since
  returned from.
 
 Its still one full sweep.

Did you read my other message in this thread?

It could/should be much simpler: only a sweep of the stack of newly
allocated pmcs; and since the stack only grows at one end, it's *only*
those newly allocated pmcs whose depths are greater than our current
depth.

  The generation count doesn't *force* intermediate DOD-runs... or at
  least, not a *full* DOD, anyway.
 
 But it needs intermediate sweeps.

But with the generation-number kept separate from the pmc in a stack,
these intermediate sweeps can be very fast.

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: Infant mortality

2003-08-04 Thread Benjamin Goldberg
Juergen Boemmels wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] writes:
 
  I was recently reading the following:
 
 http://www.parrotcode.org/docs/dev/infant.dev.html
 
  It's missing some things.  One of which is the (currently used?) way
  of preventing infant mortality: anchor right away, or else turn off
  DoD until the new object isn't needed.
 
 This is Solution 3: Explicit root set augmentation
 
 It's not explained in as much detail as the other solutions.
 
  This document doesn't mention another technique, which was mentioned
  recently:
 
 http://groups.google.com/groups?
selm=m2k7asg87i.fsf%40helium.physik.uni-kl.de
 
  , at the "use a linked list of frames" part.
 
 In principle this is a variant of the explicit root set
 augmentation. The linked list of frames is part of the root set. The
 big advantages of this list is that return and longjmp automatically
 drop the now unused objects, because each partial list is stored in
 the C-stack-frames.

  Another similar idea (one which I thought of myself, so feel free to
  shoot it down!) is to use a generational system, with the current
  generation as a value on the C stack, passed as an argument after the
  interpreter.  That is, something like:
 
  foo(struct ParrotInterp *interpreter, int generation, ...)
{
  PMC * temp = bar(interpreter, generation);
  baz(interpreter, generation+1);
}
 
  Because inside baz(), generation is a higher value than it was when
  temp was created, a DOD run inside of baz() won't kill temp.
 
 This is a solution similar to Solution 2 / Variant 4: Generation
 count.
 
  During a DOD run, any PMC with a generation less than or equal to the
  current generation is considered live.
 
 You can even mark the current generation as free. If a function
 wants to keep data it uses Parrot_do_DOD(interp, generation + 1). If
 it does not have any data to keep it calls Parrot_do_DOD(interp,
 generation).
 
  Any PMC with a generation
  greater than the current generation gets its generation set to 0.
 
 You mean is marked as free. An anchored object should be set to 0.

Actually, I meant generation set to MAX_INT, not 0.

Marking pmcs as free happens at the end of DOD.  Marking pmcs as live or
dead happens earlier.  I was thinking of something like:

   foreach(pmc in all_pmcs) {
      if( pmc->generation <= current_generation ) {
         mark pmc as live
         if( pmc is an aggregate )
            add pmc to list of pmcs to be traced
         /* don't alter its generation */
      } else {
         mark pmc as dead
         /* and if it becomes alive again, then */
         /* on the next DOD, it shouldn't be considered */
         /* alive due to being neonate/on the stack */
         pmc->generation = MAX_INT;
      }
   }
   foreach(pmc in root set) {
      if( pmc is marked as live ) next;
      mark pmc as live
      if( pmc is an aggregate )
         add pmc to list of pmcs to be traced
   }
   trace the pmcs in the list of pmcs to be traced
   foreach(pmc in all_pmcs) {
      if( pmc isn't live )
         add pmc to the free list
   }
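
For anyone who wants to poke at it, the same three passes can be written as a small runnable Python sketch. It is a toy model under assumptions, not Parrot's collector: the PMC class, its fields, and dod_run are invented names, and every live object is queued for tracing rather than only aggregates.

```python
MAX_INT = 2**31 - 1


class PMC:
    def __init__(self, name, generation, children=()):
        self.name = name
        self.generation = generation
        self.children = list(children)   # non-empty only for aggregates
        self.live = False


def dod_run(all_pmcs, root_set, current_generation):
    """Return the PMCs that would be put on the free list."""
    to_trace = []
    # Pass 1: neonates still in scope are live; everything else starts dead.
    for pmc in all_pmcs:
        if pmc.generation <= current_generation:
            pmc.live = True
            to_trace.append(pmc)
        else:
            pmc.live = False
            pmc.generation = MAX_INT     # never treated as a neonate again
    # Pass 2: the root set is live regardless of generation.
    for pmc in root_set:
        if not pmc.live:
            pmc.live = True
            to_trace.append(pmc)
    # Pass 3: trace everything reachable from the live objects.
    while to_trace:
        for child in to_trace.pop().children:
            if not child.live:
                child.live = True
                to_trace.append(child)
    return [pmc for pmc in all_pmcs if not pmc.live]


temp = PMC("temp", 1)                    # neonate of a caller still on the stack
stale = PMC("stale", 3)                  # left behind by a call that returned
kid = PMC("kid", MAX_INT)
anchored = PMC("anchored", MAX_INT, [kid])
garbage = PMC("garbage", MAX_INT)
freed = dod_run([temp, stale, kid, anchored, garbage], [anchored], 2)
```

Here temp survives because its generation is at most the current one, kid survives through the anchored aggregate, and stale and garbage end up on the free list.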

  Like the linked list scheme, this works through longjmps and recursive
  run_cores, and it's much simpler for the user, too: just add one to
  the  generation to prevent all temporaries in scope from being freed.
 
  It similarly has the drawback of altering the signature of every
  parrot function.
 
 If this will lead to exact and fast DOD we might bite the bullet.

Actually, I just thought of a seriously cool idea.  If the C stack
always grows up, or always grows down, then we can use the address of
the local variable holding the interpreter, as the generation number. 
Thus, we can avoid changing the prototype of all our functions.  Sort of
like how one writes an alloca() library function.

  There's another drawback I can think of... consider:
 
  foo(struct ParrotInterp *interpreter, int generation, ...)
{
  PMC * temp = bar(interpreter, generation);
  baz(interpreter, generation+1);
  qux(interpreter, generation+1);
}
 
  If baz creates a temporary object and returns, then qux performs a
  DOD, baz's (dead) object won't get cleaned up.
 
 If we assume the current generation is free (less than instead of less
 than or equal), this is not a problem. Only if qux calls another function with an
 increased generation count without a prior call to Parrot_do_DOD the
 temporaries of baz will survive. But this can be solved by:
 
 foo(struct ParrotInterp *interpreter, int generation, ...)
   {
 PMC * temp = bar(interpreter, generation);
 baz(interpreter, generation+1);
 Parrot_do_DOD(interpreter, generation+1);
 /* free the temporaries baz */
 qux(interpreter, generation+1);
   }

  This could be solved by keeping a stack of newly created objects, and
  providing some sort of generational_dod_helper() function, which would
  do something like:
   while( neonates && neonates->top->generation > current_generation ) {
  neonates->top->generation

Re: Infant mortality

2003-08-04 Thread Benjamin Goldberg

Having applied a bit more thought, having the generation field as part
of the PMC isn't all that great -- it makes PMCs larger, but it's really
only needed for new/neonate pmcs.

Instead of attaching the generation directly to the pmc, have a global
(per-interpreter) stack of neonate pmcs.  Each stack entry contains both
the generation it was added, and the pointer to the pmc.

When a pmc is created, it does something like:
   while( size(neonate_stack) && top(neonate_stack).gen > cur_gen )
      pop(neonate_stack);
   push(neonate_stack, { self, cur_gen });

During DOD, we do:

   foreach(pmc in all_pmcs)
      mark pmc as dead
   while( size(neonate_stack) && top(neonate_stack).gen > cur_gen )
      pop(neonate_stack)
   foreach( pmc in root_set and on the neonate_stack ) {
      if( pmc is alive ) next;
      mark pmc as live
      if( pmc is an aggregate )
         add pmc to the list of pmcs to be traced
   }
   trace the pmcs in the list of pmcs to be traced
   foreach( pmc in all_pmcs )
      if( pmc is dead )
         add pmc to free list;
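
A minimal runnable sketch of the neonate stack, in Python, assuming PMCs are plain dicts and the interpreter is a small object holding the stack. All names here (Interp, pop_stale, new_pmc, dod_run) are hypothetical and only illustrate the scheme; tracing follows every child rather than only aggregates.

```python
class Interp:
    def __init__(self):
        self.neonates = []      # stack of (pmc, generation) pairs
        self.all_pmcs = []
        self.root_set = []


def pop_stale(interp, cur_gen):
    """Drop entries pushed by deeper calls, which must have returned by now."""
    while interp.neonates and interp.neonates[-1][1] > cur_gen:
        interp.neonates.pop()


def new_pmc(interp, name, cur_gen):
    """Allocate a PMC and record it as a neonate of the current generation."""
    pop_stale(interp, cur_gen)
    pmc = {"name": name, "children": [], "live": False}
    interp.neonates.append((pmc, cur_gen))
    interp.all_pmcs.append(pmc)
    return pmc


def dod_run(interp, cur_gen):
    """Mark from the root set plus surviving neonates; return the freed PMCs."""
    for pmc in interp.all_pmcs:
        pmc["live"] = False
    pop_stale(interp, cur_gen)
    to_trace = list(interp.root_set) + [p for p, _ in interp.neonates]
    while to_trace:
        pmc = to_trace.pop()
        if not pmc["live"]:
            pmc["live"] = True
            to_trace.extend(pmc["children"])
    freed = [p for p in interp.all_pmcs if not p["live"]]
    interp.all_pmcs = [p for p in interp.all_pmcs if p["live"]]
    return freed


interp = Interp()
temp = new_pmc(interp, "temp", 1)         # created in foo, at depth 1
scratch = new_pmc(interp, "scratch", 2)   # created in baz, which then returns
freed = dod_run(interp, 1)                # DOD back at foo's depth
```

Because entries are only pushed after everything deeper has been popped, the stack stays ordered by generation, so the cleanup loop only ever has to look at the top.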

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: string.c questions

2003-08-03 Thread Benjamin Goldberg


Luke Palmer wrote:
 
 Benjamin Goldberg writes:
  Actually, these are mostly questions about the string_str_index
  function.
 
 Uh oh...
 
  I've some questions about bufstart, strstart, bufused, strlen and
  encoding->characters?
 
  In string_str_index_multibyte, the lastmatch variable is calculated as:
 
  const void* const lastmatch =
     str->encoding->skip_backward((char*)str->strstart + str->strlen,
        find->encoding->characters(find, find->strlen));
 
  There seems to be quite a bit of confusion on this line about bytes and
  characters... the goal here seems to be to find a pointer to the last
  place where it would be possible to begin a match.
 
 Yep.
 
 You're right, there is a bit of confusion about characters and bytes
 in this statement -- mostly because I'm confused about characters and
 bytes in Parrot.  So... str->strlen is the number of *characters* in
 the string?  Hmm.. that changes things.
 
 Maybe someone else should fix this -- who knows what they're doing :-)
 Do we have tests for multibyte string operations in the test suite?
 
  What's with find and ->characters?  Shouldn't find->strlen be
  sufficient, without all that other stuff around it?  Next...
 
 If find->strlen represents the number of characters as you say, then
 yes.
 
  If these weren't multibyte strings, then this would be (str->strstart +
  str->strlen - find->strlen), right?  Or, translating that literally (and
  doing the subtraction first):
 
  const void* const lastmatch = str->encoding->skip_forward(
     str->strstart, str->strlen - find->strlen );
 
 Yeah, the thing about that is, for strings in UTF formats,
 skip_forward is a linear time operation, which is pretty expensive
 when there's a lot of data.  That's why I used pointers in this
 function instead of string_index as the previous implementation did.

Except, of course, that the pointer arithmetic version was wrong :(


  Or, if we can do that trick for finding the end of a string:
 
  const void* const lastmatch = str->encoding->skip_backward(
     (char*)str->bufstart + str->bufused, find->strlen );
 
  Similarly, the lastfind variable should either be:
 
  const void* const lastfind = find->encoding->skip_forward(
     find->strlen );
 
 skip_forward takes 2 args, I assume you mean:
 
 const void* const lastfind = find->encoding->skip_forward(
    find, find->strlen);

Actually, I think that I meant:

 const void* const lastfind = find->encoding->skip_forward(
    find->strstart, find->strlen );

Since I assume that the functions of encoding objects operate on pointers
into a buffer's data area, *not* on STRING* objects.

 Again, that's linear time.  But usually the string to find won't be
 that long, so it's not so important in this case.  But your shortcut
 would still be faster.
 
  Or:
 
  const void* const lastfind = (char*)find->bufstart + find->bufused;

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: string.c questions

2003-08-03 Thread Benjamin Goldberg


Leopold Toetsch wrote:
 
 Benjamin Goldberg [EMAIL PROTECTED] wrote:
 
  Also, although we're told at the top of string.c to not look at
  s->bufstart or s->buflen, I'd like to know if we are allowed to
  assume/assert that for all strings, the following is true:
 
     s->encoding->skip_forward( s->strstart, s->strlen ) ==
        (char*)s->bufstart + s->bufused
 
 No. res_lea.c e.g. is using a reference count at bufstart. But with
 s/bufstart/strstart/ above equation should be true.

To what does 'bufused' refer?  The number of bytes from where to where?

I *thought* that it was from bufstart to the end of the string... no?

And where is all this documented?

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Ultra bootstrapping :)

2003-08-03 Thread Benjamin Goldberg

Considering that parrot is now emitting an executable (on some
platforms)... and IIRC, C will be one of the languages we plan to have
parrot support for... will parrot be able to compile itself? :)

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Infant mortality

2003-08-03 Thread Benjamin Goldberg
I was recently reading the following:

   http://www.parrotcode.org/docs/dev/infant.dev.html

It's missing some things.  One of which is the (currently used?) way of
preventing infant mortality: anchor right away, or else turn off DoD
until the new object isn't needed.

This document doesn't mention another technique, which was mentioned
recently:

   http://groups.google.com/groups?
  selm=m2k7asg87i.fsf%40helium.physik.uni-kl.de

, at the "use a linked list of frames" part.


Another similar idea (one which I thought of myself, so feel free to
shoot it down!) is to use a generational system, with the current
generation as a value on the C stack, passed as an argument after the
interpreter.  That is, something like:

foo(struct ParrotInterp *interpreter, int generation, ...)
  {
PMC * temp = bar(interpreter, generation);
baz(interpreter, generation+1);
  }

Because inside baz(), generation is a higher value than it was when temp
was created, a DOD run inside of baz() won't kill temp.

During a DOD run, any PMC with a generation less than or equal to the
current generation is considered live.  Any PMC with a generation
greater than the current generation gets its generation set to 0.

Like the linked list scheme, this works through longjmps and recursive
run_cores, and it's much simpler for the user, too: just add one to the
generation to prevent all temporaries in scope from being freed.

It similarly has the drawback of altering the signature of every parrot
function.

There's another drawback I can think of... consider:

foo(struct ParrotInterp *interpreter, int generation, ...)
  {
PMC * temp = bar(interpreter, generation);
baz(interpreter, generation+1);
qux(interpreter, generation+1);
  }

If baz creates a temporary object and returns, then qux performs a DOD,
baz's (dead) object won't get cleaned up.

This could be solved by keeping a stack of newly created objects, and
providing some sort of generational_dod_helper() function, which would
do something like:
   while( neonates && neonates->top->generation > current_generation ) {
      neonates->top->generation = 0;
      neonates = neonates->next;
   }
, and calling that in foo between baz and qux.  (And maybe sometimes at
toplevel, between opcodes... at times when the generation count in a
normal generation count scheme (with a global counter) would be
incremented)  You lose a bit of simplicity by having to call this
function occasionally, but it can save a bit of memory.

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


Re: As soon as she walked through my door...

2003-08-03 Thread Benjamin Goldberg
Miko O Sullivan wrote:
 
 Congratulations to Damian on a great opening in Ex 6.  Anybody can spoof
 the classic detective novel setup, but it takes real talent to have it
 actually make sense in the context of a technical document.

How long till Ex 6 is online, for those of us who weren't there? :)

-- 
$a=24;split//,240513;s/\B/ = /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print [EMAIL PROTECTED]
]\n;((6=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))redo;}


string.c questions

2003-08-02 Thread Benjamin Goldberg

Actually, these are mostly questions about the string_str_index
function.

I've some questions about bufstart, strstart, bufused, strlen and
encoding->characters?

I *think* that ->characters is a function which gets passed a pointer to
the start of a buffer, and the number of bytes in the buffer, and
returns the number of characters in the buffer... right?

Can we assert that for any string s, s->strlen ==
s->encoding->characters( s->strstart, s->bufused)?  If so, where would
this be documented?

Also, although we're told at the top of string.c to not look at
s->bufstart or s->buflen, I'd like to know if we are allowed to
assume/assert that for all strings, the following is true:

   s->encoding->skip_forward( s->strstart, s->strlen ) ==
      (char*)s->bufstart + s->bufused

If so, this makes finding the end of a string significantly easier.

In string_str_index_multibyte, the lastmatch variable is calculated as:

    const void* const lastmatch =
       str->encoding->skip_backward((char*)str->strstart + str->strlen,
          find->encoding->characters(find, find->strlen));

There seems to be quite a bit of confusion on this line about bytes and
characters... the goal here seems to be to find a pointer to the last
place where it would be possible to begin a match.

What's with find and ->characters?  Shouldn't find->strlen be
sufficient, without all that other stuff around it?  Next...

If these weren't multibyte strings, then this would be (str->strstart +
str->strlen - find->strlen), right?  Or, translating that literally (and
doing the subtraction first):

    const void* const lastmatch = str->encoding->skip_forward(
       str->strstart, str->strlen - find->strlen );

Or, if we can do that trick for finding the end of a string:

    const void* const lastmatch = str->encoding->skip_backward(
       (char*)str->bufstart + str->bufused, find->strlen );

Similarly, the lastfind variable should either be:

    const void* const lastfind = find->encoding->skip_forward(
       find->strstart, find->strlen );

Or:

    const void* const lastfind = (char*)find->bufstart + find->bufused;
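
To make the byte/character distinction concrete, here is a toy version
of the two skip functions for UTF-8.  The names utf8_skip_forward and
utf8_skip_backward are hypothetical stand-ins for an encoding's
skip_forward/skip_backward, and the sketch assumes valid, NUL-terminated
UTF-8; it is not the code in Parrot's encodings.

```c
#include <assert.h>
#include <stddef.h>

/* True for UTF-8 continuation bytes (10xxxxxx). */
static int is_continuation(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

/* Advance p by n characters.  Cost is proportional to n, not to the
   position of p within the string. */
const char *utf8_skip_forward(const char *p, size_t n) {
    assert(p != NULL);
    while (n--) {
        p++;
        while (is_continuation((unsigned char)*p))
            p++;
    }
    return p;
}

/* Move p back by n characters.  The caller must guarantee that at
   least n characters precede p in the same buffer. */
const char *utf8_skip_backward(const char *p, size_t n) {
    assert(p != NULL);
    while (n--) {
        p--;
        while (is_continuation((unsigned char)*p))
            p--;
    }
    return p;
}
```

With a 10-character, 12-byte haystack and a 4-character needle, skipping
forward 10 characters from the start lands on the end of the buffer, and
both "skip backward 4 from the end" and "skip forward 10 - 4 from the
start" give the same last possible match position.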

In the string_str_index function, I see:

    if (!s || !string_length(s))
        return -1;
    if (!s2 || !string_length(s2))
        return -1;

I would think that the function should start:

   /* the empty string is a substring of *every* string. */
   if( !s2 || !s2->strlen )
      return s ? MIN( s->strlen, start ) : 0;
   /* you can't find a big string inside a little one. */
   if( !s || s->strlen - start < s2->strlen )
      return -1;

If we left this as it originally was... consider what happens if these
are multibyte strings, and s2 is much larger than s... our 'lastmatch'
variable will be well before the beginning of the buffer... and
*reaching* it, either through skip_forward or skip_backward, would mean
walking through memory that wasn't our own.
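
For comparison, a minimal single-byte version with those two guards in
place might look like the following.  The name str_index and its
signature are made up for this sketch (plain byte buffers with explicit
lengths, and start assumed non-negative); it is not string.c's API.

```c
#include <assert.h>
#include <string.h>

/* Index of the first occurrence of s2 in s at or after start, or -1. */
long str_index(const char *s, long slen,
               const char *s2, long s2len, long start)
{
    long i;
    assert(start >= 0);
    /* The empty string is a substring of every string. */
    if (!s2 || s2len == 0)
        return s ? (start < slen ? start : slen) : 0;
    /* A needle longer than what remains can never match, so we never
       compute a last-match position before the start of the buffer. */
    if (!s || slen - start < s2len)
        return -1;
    /* i stops at the last offset where a full match still fits. */
    for (i = start; i <= slen - s2len; i++)
        if (memcmp(s + i, s2, (size_t)s2len) == 0)
            return i;
    return -1;
}
```

The second guard is what keeps the loop bound (the analogue of
lastmatch) inside the buffer when s2 is longer than s.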

Next Q: Where's the docs on what functions are range checked, and what
functions will seg fault if called with bad params?  If string_index(s,
1) is called on a short string, it'll segfault.  This isn't
mentioned in string_index's docs, afaiks.

Why aren't there string_(chr_r?index|str_rindex) functions?  ENOTUITS on
the part of whoever's in charge of string.{c,ops}?  These would be quite
useful in making the rx_ ops faster.  (Note that string_str_index is
useful for /foo/ or for /^.*?foo/, but not as good for /.*foo/ ... for
that, you'd want to use a string_str_rindex).

I'd offer some impls for these, but sadly, I've a case of ENOTUITS
myself.

More possible additions to the string_ repertoire might be
string_(starts|ends)with, to speed up /^foo/ and /bar$/ matches.  Or a
limit argument to string_whatever_r?index, for things like
/^.{0,5}foo/.
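
As a sketch of what such helpers could look like for single-byte
strings: the names starts_with, ends_with and index_limited are
hypothetical, the comparison is bytewise (so both strings are assumed to
share an encoding), and Perl's rule that $ may match before a trailing
newline is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 1 if s begins with prefix (the /^foo/ case). */
int starts_with(const char *s, size_t slen,
                const char *prefix, size_t plen) {
    assert(s != NULL && prefix != NULL);
    return plen <= slen && memcmp(s, prefix, plen) == 0;
}

/* 1 if s ends with suffix (the /bar$/ case). */
int ends_with(const char *s, size_t slen,
              const char *suffix, size_t xlen) {
    assert(s != NULL && suffix != NULL);
    return xlen <= slen && memcmp(s + (slen - xlen), suffix, xlen) == 0;
}

/* Leftmost match of needle that starts at an offset in [0, limit],
   or -1.  With limit = 5 this mirrors /^.{0,5}foo/. */
long index_limited(const char *s, size_t slen,
                   const char *needle, size_t nlen, size_t limit) {
    size_t i;
    assert(s != NULL && needle != NULL);
    for (i = 0; i <= limit && i + nlen <= slen; i++)
        if (memcmp(s + i, needle, nlen) == 0)
            return (long)i;
    return -1;
}
```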



rx.ops

2003-08-02 Thread Benjamin Goldberg

Since I don't see anything to save/restore the intstack on subroutine
calls, I am wondering what happens if a regex has a (?{ CODE }), and
that CODE calls a regex.  Are we guaranteed that after a regex completes
(either succeeds or fails) that the intstack is in the same state it
started?  If not (and remember, exceptions can leave things in odd
states), what keeps things from being fubared?

Throughout the rx opcode definitions, I see much use of string_index to
find the character at a particular index.  If the string's encoding uses
multiple bytes per character, this can be O(N) for each call.  Not good.

Since str->encoding->skip_forward is supposed to be O(N) in terms of how
many chars are skipped forwards, and since most of those string_index()s
are one character-index away from an index which was recently accessed,
we should be able to switch to that for a great speed improvement.
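
A small sketch of the difference, assuming valid UTF-8.  The Cursor
struct and the function names are invented for illustration: one
function rescans from the start on every lookup (the cost profile of
repeated indexed access), the other carries a (byte offset, character
offset) pair forward one character at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of the UTF-8 sequence whose lead byte is c. */
size_t utf8_char_len(unsigned char c) {
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    return 4;
}

/* O(char_index): rescans from the start of the string on every call. */
size_t byte_offset_of(const char *s, size_t char_index) {
    size_t off = 0;
    assert(s != NULL);
    while (char_index--)
        off += utf8_char_len((unsigned char)s[off]);
    return off;
}

/* Position kept as offsets rather than a pointer, so it stays
   meaningful if the buffer is relocated. */
typedef struct {
    size_t byte_off;
    size_t char_off;
} Cursor;

/* O(1) per step: move the cursor to the next character. */
void cursor_advance(const char *s, Cursor *c) {
    assert(s != NULL && c != NULL);
    c->byte_off += utf8_char_len((unsigned char)s[c->byte_off]);
    c->char_off++;
}
```

Walking a string with the cursor costs O(N) in total, where calling
byte_offset_of for each index costs O(N^2).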

Alas, this only works for within a single op, since there's a chance
that intervening ops will cause memory to be allocated, potentially
causing a GC, potentially re-locating the string's buffer, and thus
invalidating the void* pointer.  (Well, unless we use string_pin to keep
the string from getting moved.)

Is there any chance of us ever getting a real string iterator type, for
which this isn't so much of a concern?

Actually, if we did our string->skip_forward type stuff, but only saved
indices relative to the start of the *buffer*, maybe it would be ok...

However, we'd still need an additional int register, for the character
as well as byte position.  Given the growing number of things that each
regex subroutine needs to keep track of (string being searched, start of
search as a character offset, start as a byte offset, current search
point as a character offset, current search point as a byte offset),
IMHO, this would be a good use for a regex state struct.

   op rx_advance(in pmc, inconst int) {
      RXState * state = (RXState*)PMC_data($1);
      STRING * str = state->str;
      if( state->char_off >= str->strlen ) {
         goto OFFSET($2);
      }
      ++state->char_off;
      state->byte_off = (char*)str->encoding->skip_forward(
         (void*)((char*)str->strstart + state->byte_off), 1 ) -
            (char*)str->strstart;
      goto NEXT();
   }

   op rx_literal(in pmc, in str, inconst int) {
      RXState * state = (RXState*)PMC_data($1);
      STRING * str = state->str;
      void * a = (void*)( (char*)str->strstart + state->byte_off );
      void * b = $2->strstart;
      int num_chars = 0;
      /* XXX should also check that enough characters remain in str */
      while( num_chars < $2->strlen ) {
         INTVAL a_code = str->encoding->decode(a);
         INTVAL b_code = $2 ->encoding->decode(b);
         /* XXX should do chartypes, too */
         if( a_code != b_code ) goto OFFSET($3);
         a = str->encoding->skip_forward(a, 1);
         b = $2 ->encoding->skip_forward(b, 1);
         ++num_chars;
      }
      state->byte_off = (char*)a - (char*)str->strstart;
      state->char_off += num_chars;
      goto NEXT();
   }

   op rx_char(in pmc, in int, inconst int) {
      RXState * state = (RXState*)PMC_data($1);
      STRING * str = state->str;
      void * p;
      if( state->char_off >= str->strlen ) {
         goto OFFSET($3);
      }
      p = state->byte_off + (char*)str->strstart;
      if( str->encoding->decode(p) != $2 ) {
         goto OFFSET($3);
      }
      ++state->char_off;
      state->byte_off = (char*)str->encoding->skip_forward( p, 1 )
         - (char*)str->strstart;
      goto NEXT();
   }

   

   op rx_search(in pmc, in str, inconst int) {
      RXState * state = (RXState*)PMC_data($1);
      STRING * str = state->str;
      INTVAL idx = string_str_index( str, $2, state->char_off );
      if( idx == -1 ) {
         goto OFFSET($3);
      } else {
         char * p = (char*)str->strstart + state->byte_off;
         state->byte_off = (char*)str->encoding->skip_forward( p,
            idx - state->char_off + $2->strlen ) - (char*)str->strstart;
         state->char_off = idx + $2->strlen;
      }
      goto NEXT();
   }

Blech, this is really making me wish for some sort of Real string
iterator type.



Re: Macro arguments themselves

2003-08-02 Thread Benjamin Goldberg
Larry Wall wrote:
[snip]
> Nope.  $x and $p are syntax trees.

blink

Macros are passed syntax trees as arguments, but return coderefs?

That's... odd.

I would expect that a macro would be expected to *return* a syntax
tree... which could then undergo (more) macro-expansion.

Sort of like how in lisp, a macro receives lists as arguments (those
lists being un-evaluated code) and then returns a list, which then has
more macro expansion done on it, and then gets parsed and evaluated.



Re: approaching python

2003-08-01 Thread Benjamin Goldberg
Joseph F. Ryan wrote:
> Benjamin Goldberg wrote:
>> Joseph Ryan wrote:
>>> Benjamin Goldberg wrote:
[snip]
>>>> Hmm...  If imcc is smart enough, (or perhaps I should say, when the
>>>> flow control is simple/clear enough) it should be able to see when a
>>>> value is pushed onto the stack, and later popped off, when there are
>>>> enough registers free that we could have stored it in a spare
>>>> register instead of on the stack.  That is, a sort of register
>>>> *un*spilling.
>>>
>>> Doesn't IMCC already do this?
>
> I think I misunderstood you.  However, I still don't see this would be
> helpful.

If someone's code emits something like:

   save $P1
   restore $P2

Then IMCC should be able to optimize that to:

   $Ptemp = $P1
   $P2 = $Ptemp

True, a human writing code wouldn't do a save/restore like that, but a
naive compiler might (the 'save' being the last instruction of one of
the language's opcodes, and the 'restore' being the first instruction
of the next opcode).  And I'd rather we make imcc smart enough to make
such code fast, than have to make my compiler smarter.

> How would IMCC know which save/restore are associated with
> the stack, and which are normal save/restore? (i.e., associated
> with parameters/results from a subroutine)  Unless I'm missing
> something, you'd have to differentiate between those two sets of use...

That's why I said, "when the control flow is simple/clear enough" ...

The following isn't the only way to do it, merely the first that comes
to mind, and I *could* be quite mistaken about whether or not it would
work.

We can analyze each basic block, and try and figure out how much it
changes the stack -- how many items from the stack it will pop, and how
many it will push.  We can say "unknown" for these if we have to, if
it's beyond our ability to analyse.  Also, code can make assertions,
saying that even though we're doing this weird sequence of pushes and
pops, (e.g., doing saves in one loop, followed by restores in another
loop), that it will always pop a total of X, and push a total of Y.

For each stretch of ops, repeatedly scan for a save op, then look for
the following restore op.  If we *know* that the value that was just
saved wasn't popped off the stack by the ops and blocks in between (we
should know this due to the analysis we just did), then we can replace
that save...restore pair with $Ptemp = $P1...$P2 = $Ptemp.



Note that no differentiation needs to be made about whether a
save/restore is for a subroutine argument/return value, or a normal
save/restore.  We only count *how many* saves/restores occur.
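
Here is a toy version of that pass over a single basic block, just to
show the counting idea.  The Instr layout, the opcode names and the
unspill function are all invented for this sketch, and it assumes every
OP_OTHER instruction leaves the stack alone; it says nothing about how
imcc represents code.

```c
#include <assert.h>

typedef enum { OP_SAVE, OP_RESTORE, OP_MOVE, OP_OTHER } OpKind;

typedef struct {
    OpKind kind;
    int dst;   /* register written (OP_RESTORE, OP_MOVE) */
    int src;   /* register read    (OP_SAVE, OP_MOVE)    */
} Instr;

#define MAX_PENDING 64

/* Rewrite each save that is matched by a later restore in the same
   block into a pair of moves through a fresh temp register.
   Temps are numbered from first_temp.  Returns the number of pairs
   rewritten.  Saves or restores with no partner in the block are left
   as real stack operations. */
int unspill(Instr *code, int n, int first_temp) {
    int pending[MAX_PENDING];   /* indices of saves not yet matched */
    int depth = 0, rewritten = 0, i;
    assert(code != 0 && n >= 0);
    for (i = 0; i < n; i++) {
        if (code[i].kind == OP_SAVE) {
            if (depth == MAX_PENDING)
                return rewritten;          /* too deep: stop rewriting */
            pending[depth++] = i;
        } else if (code[i].kind == OP_RESTORE && depth > 0) {
            int s = pending[--depth];
            int temp = first_temp + rewritten++;
            code[s].kind = OP_MOVE;        /* $Ptemp = $P1 */
            code[s].dst = temp;
            code[i].kind = OP_MOVE;        /* $P2 = $Ptemp */
            code[i].src = temp;
        }
    }
    return rewritten;
}
```

Only the nesting depth of saves and restores is tracked; the pass never
needs to know what a given save was "for".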



Re: approaching python

2003-08-01 Thread Benjamin Goldberg
Benjamin Goldberg wrote:
[snip]
> If someone's code emits something like:
>
>    save $P1
>    restore $P2
>
> Then IMCC should be able to optimize that to:
>
>    $Ptemp = $P1
>    $P2 = $Ptemp


Actually, that (sometimes) should be able to be changed to:

   $P2 = $P1
   ...
   noop

or:

   noop
   ...
   $P2 = $P1

or even removed entirely, rewriting everything after the ... to refer
to $P1 instead of $P2.  Does imcc do anything like this?



Re: [perl #23039] [PATCH] event handling-2

2003-08-01 Thread Benjamin Goldberg
Leopold Toetsch wrote:
> OK here it is.
> Again the description for the record:
>
> 1) Initialization:
> - normal core: build op_func_table with all opcode #4 [1]
> - CG core: build ops_addr[] filled with this opcode
> - prederef cores: build a list of (backward) branch instructions
>   and the opcode at that offset

I'm not sure, but I think you should also include invoke, call, and
invokeccs in that list.



Re: approaching python

2003-07-27 Thread Benjamin Goldberg


K Stol wrote:
 
> - Original Message -
> From: Michal Wallace [EMAIL PROTECTED]
> To: Luke Palmer [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Sent: Thursday, July 24, 2003 12:01 PM
> Subject: Re: approaching python
>
>
> > On 24 Jul 2003, Luke Palmer wrote:
> > > Klaas-Jan Stol writes:
> > > > module, right? I don't know Python, and I've a little experience
> > > > with IMC, but it seems to me only a new code generator module
> > > > should
> >
> > ...[snip]
> >
> > > Well... sortof.  It's definitely going to take writing a whole new
> > > code generator module; it's not just a matter of getting the right
> > > instructions.  Python's interpreter is stack-based, while Parrot's
> > > is register-based, which are two very different kinds of data.
> >
> > Right. The implementation would be completely yanked out, but
> > if we keep the code generator's interface, then that's the only
> > file we have to touch.
> >
> > The other idea i had was a little crazy, but might make
> > life a lot easier: if perl 6 can use pmcs directly,
> > why not just compile it down to perl and let the perl
> > compiler handle all the register stuff? [I honestly
> > have no idea which way would be easier, since I don't
> > really know much about IMCC]
>
> The register stuff, I presume, is register allocation and the like? When
> targeting IMCC, you can use an infinite amount of registers. Just keep a
> counter in the code generator, each time a new register is needed, just
> increment the counter and add a ${S|N|I|P} in front of it (example:
> $P1). IMCC does register allocation and spilling.
> So this is not really difficult.

Nono, the problem isn't that python uses *more* registers than
whatever, but rather, that it doesn't use registers at all.  Instead,
it uses a stack.  So, for example, python's add instruction might get
translated into the following pir or pasm code:

   restore P0
   restore P1
   clone P0, P0
   add P0, P1
   save P0

Needless to say, this is not efficient, due to how slow parrot's push
and pop operations are.

Hmm...  If imcc is smart enough, (or perhaps I should say, when the flow
control is simple/clear enough) it should be able to see when a value is
pushed onto the stack, and later popped off, when there are enough
registers free that we could have stored it in a spare register instead
of on the stack.  That is, a sort of register *un*spilling.

> It may be that this is not as efficient, because it adds an extra layer
> of translation:
>
> Python->Perl6->IMCC->PASM[1]

?  Why would we translate Python into Perl6 code?

Also, don't you mean IMCC->PBC ?  AFAIK, the parrot assembly language is
a mostly vestigial creation whose purpose was to allow us to write code
for parrot while imcc was being developed.  Now that imcc exists and
works, we no longer need nor use a separate assembly language.

> instead of
>
> Python->IMCC->PASM
>
> [1]. IMCC->PASM integrated in Parrot exec.
>
> Klaas-Jan
 
> > Also, there's an older project called rattlesnake that
> > never really got off the ground. I don't know if any
> > code was produced, but the point was to make a register
> > based python vm. Maybe someone from that camp could help
> > us out.
> >
> > > I think it would be good design to have the python binary parse for
> > > us, but it's not likely practical.  Python has eval, so unless we
> > > want to link with python, we should probably write our own
> > > parser.[1]
> >
> > But there's already a parser written in python (in the
> > pypython project), and it makes trees that work with the
> > code generator. :) So I'm saying, use those tools, and
> > just run the compiler from python until it's good enough
> > to compile itself.
> >
> > That just seems like the quickest way to get python
> > working.  (Of course, then there's all the C-based
> > python modules, but that's another story)

Well, we're already writing a parrot <-> perl5/XS compatibility layer...
if necessary, we could probably (if the python/C interface has
sufficient encapsulation of python objects) write a compatibility layer
for parrot <-> python/C, if there's sufficient demand for it.  But such
a thing would have to wait until we get python running on parrot in the
first place :)



Slightly foolish question... :)

2003-07-24 Thread Benjamin Goldberg

What kind of semantics do we want for perl6, if we have:

   my $fh1 = fdopen( $n );
   do {
  my $fh2 = fdopen( $n );
   };
   # is $fh1 valid or not at this point?

For that matter, what about:

   for(1..2) {
      my $fh = fdopen( $n ); # does this succeed the second time?
   }

Should these semantics be different depending on whether 0 <= $n <= 2 ?



Re: approaching python

2003-07-24 Thread Benjamin Goldberg
K Stol wrote:
 
> - Original Message -
> From: Michal Wallace [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Sent: Wednesday, July 23, 2003 4:48 PM
> Subject: approaching python
>
>
> > Hey all,
> >
> > I've been thinking about the compiling python to
> > parrot concept. Right now it looks like the
> > approach is to start from scratch, but I'm
> > wondering if it might make more sense to
> > leverage python itself, at least for now?
> >
> > Python has a compiler module (written in python
> > and standard with the distribution) that can
> > take a python parse tree and produce python
> > byte code. It basically just walks the tree
> > and has a method for each python structure.
> >
> > The actual parser is written in c, but there's
> > a drop-in replacement (or at least a partial one)
> > written in python described here:
> >
> >   http://codespeak.net/moin/pypy/moin.cgi/BytecodeCompiler
> >
> > Would it make sense to use this to bootstrap the
> > python parrot compiler? I was thinking about taking
> > a shot next week at replacing compiler.pycodegen.CodeGenerator
> > with something that produced IMCC.
>
> Correct me if I'm wrong, but there's no bootstrap problem, just a matter
> of reimplementing the code generator.

Except that you'd have to reimplement the code generator *in python* to
produce parrot bytecode.

Then run it once (on the python parser and code generator themselves)
using the python executable, *then* you can run them using parrot.

Sounds like bootstrapping to me. :P



Re: Events

2003-07-23 Thread Benjamin Goldberg
Dan Sugalski wrote:
[snip]
> Here's what we *are* doing.
>
> We are going to do all I/O under the hood asynchronously. Completed
> async IO puts an event in the event queue.
>
> We are going to have a single unified event queue. All IO, timer,
> signal, and UI events go in it. All of 'em. We aren't going to do any
> of this "we have N different places to look for data" nonsense, where
> N is greater than three.

Don't forget, flock(), fcntl(), ioctl(), wait()/waitpid(), all the SysV
stuff, all the (get|set|end)(net|gr|pw)(ent|.id|name?) functions,
sendmsg, recvmsg, ... :)

I've occasionally wanted to do IO while waiting for an exclusive lock on
a filehandle, but I don't think it's possible without threads.

The closest you could come would be to intersperse the IO calls with
calls to flock(...,LOCK_EX|LOCK_NB).  Sadly, this runs the risk of
never acquiring the lock, due to other processes always having it.  If
you lock without the NB, then waiters will generally get the lock in
fifo order (well, it's at the whim of the OS, but most OSs try to
schedule unlocking in a way that avoids resource starvation).
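
The polling approach looks roughly like this in C.  flock(), LOCK_EX,
LOCK_NB and EWOULDBLOCK are the real BSD/Linux interface from
<sys/file.h>; the function name, the do_io callback and the retry cap
are assumptions made up for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/file.h>

/* Try to take an exclusive lock on fd without blocking, doing one unit
   of other work between attempts.  Returns 0 once the lock is held,
   or -1 on a real error or after max_tries failed attempts. */
int lock_with_interleaved_io(int fd, int max_tries,
                             void (*do_io)(void *), void *ctx)
{
    int i;
    for (i = 0; i < max_tries; i++) {
        if (flock(fd, LOCK_EX | LOCK_NB) == 0)
            return 0;               /* got the lock */
        if (errno != EWOULDBLOCK)
            return -1;              /* real error, not contention */
        if (do_io != NULL)
            do_io(ctx);             /* other IO, then try again */
    }
    return -1;                      /* starved: others always held it */
}
```

The final return is the starvation case: nothing queues this caller
behind the current holder, so another process can take the lock between
any two attempts.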



Re: The Perl 6 Summary -- preprocessors

2003-07-23 Thread Benjamin Goldberg
Luke Palmer wrote:
 
> > grammar Grammars::Languages::C::Preprocessor {
> >   rule CompilationUnit {
> >     ( <Directive> | <UnprocessedStuff> )*
> >   }
> >
> >   rule Directive {
> >     <Hash> ( <Include>
> >            | <Line>
> >            | <Conditional>
> >            | <Define>
> >            ) <Continuation>*
> >   }
> >
> >   rule Hash { /^\s*#\s*/ }
> >   rule Include {...}
> >   rule Line {...}
> >   rule Conditional {...}
> >   rule Define {...}
> >   rule Continuation {...}
> >   rule UnprocessedStuff {...}
> > }
>
> We're not quite in the world of ACME::DWIM, so you can't just replace
> the important stuff with "..." .  :-)
>
> You're not outputting a parse tree, you're just outputting more text
> to be parsed with another, text-based, grammar.  It seems to me like
> it's a big s//ubstitution, of sorts.

Hmm... well, think about yacc for a moment.  You could either have the
handlers for rules assign to $$, based on $1, etc., and thus build a
huge structure, OR, you could have the handlers for rules print stuff to
stdout (which might possibly be a pipe to another process).

ISTM that we also want our grammars to be able to do things like that...

We want to be able to produce a tree, *and*, we want to be able to act
as a filter.  And probably also some combination thereof -- some rules
produce trees, and other rules within the same grammar (which contain
the tree-producing ones) somehow process and output those trees.

> Or maybe not, maybe we make an output stream which will be fed to
> Grammars::Languages::C.  Here's an implementation of a #include
> processor, using as little made-up syntax as possible.
>
>     use Pattern::Common;
>
>     grammar Preprocessor {
>         rule include {
>             :w(/\h*/)
>             \# include <Pattern::Common::filename> $$
>             { $0 := ( open  $filename) ~~ /<main>/ }
>         }
>
>         rule main {
>             $0 := ( [<include> | .]* )
>         }
>     }

Alas, this doesn't work right with a fairly common idiom:

   #ifdef HAVE_FOO_H
   #include <foo.h>
   #endif

Nor the even more common:

   #ifndef SYS_SOMEFILE_H
   #define SYS_SOMEFILE_H
   lots of stuff, possibly including some #includes, and some typedefs.
   #endif /* SYS_SOMEFILE_H */



Re: [perl #23025] [PATCH] env.t doesn't test the env ops on solaris (and others)

2003-07-23 Thread Benjamin Goldberg


Lars Balker Rasmussen wrote:
 
> # New Ticket Created by  Lars Balker Rasmussen
> # Please include the string:  [perl #23025]
> # in the subject line of all future correspondence about this issue.
> # URL: http://rt.perl.org/rt2/Ticket/Display.html?id=23025
>
> There's no reason to test for the presence of setenv/unsetenv in libc
> - these functions are emulated if not present.
>
> However, now the 4th test fails on Solaris (and most likely other OS's
> without setenv/unsetenv).  This is because the test relies on a key
> disappearing from %ENV when it's been unsetenv'ed - this doesn't
> happen when using putenv("key=") as the current unsetenv emulation.
>
> I'm not sure what the best portable way to handle this is.

a) Have config test for the existence of "extern char **environ", or
possibly for "extern char **_environ", and then emulate unsetenv by
mangling the contents of that variable.  We might even try and get an
envp as a third argument to main(), and mangle that... I'm not sure
whether or not this would actually affect getenv(), though.

b) Use a hash to keep track of which keys we have deleted from %ENV.
When we want to know if a key exists in %ENV, first try to getenv() it,
and if it's an empty string (""), see if it's mentioned in the hash of
things we deleted from %ENV: if so, *claim* that the key doesn't exist.

Although (a) is unportable (and maybe a bit evil), sufficient testing
with Config might allow us to better fake unsetenv() on some systems.

Option (b) is quite portable, but is *definitely* slightly evil -- we'd
really only be deluding ourselves into thinking that the keys had been
deleted, but child processes, and anything in our own process which does
getenv directly (bypassing %ENV), will know that the deleted key is
really there.
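
A sketch of option (b), to show exactly where the self-delusion lives.
putenv() and getenv() are the standard calls; fake_unsetenv, env_exists
and the fixed-size table standing in for the hash are assumptions made
up for this example.  It relies on putenv("KEY=") leaving KEY set to
the empty string, which is the behaviour the ticket describes.

```c
#define _XOPEN_SOURCE 700   /* for putenv() under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DELETED 32
#define MAX_KEY     64

/* Stand-in for the "hash of keys we deleted from %ENV". */
static char deleted_keys[MAX_DELETED][MAX_KEY];
static int  n_deleted = 0;

static int find_deleted(const char *key) {
    int i;
    for (i = 0; i < n_deleted; i++)
        if (strcmp(deleted_keys[i], key) == 0)
            return i;
    return -1;
}

/* Emulate unsetenv with putenv("KEY=") and remember that we did. */
int fake_unsetenv(const char *key) {
    size_t len = strlen(key);
    char *entry;
    if (len == 0 || len >= MAX_KEY || strchr(key, '=') != NULL)
        return -1;
    if (find_deleted(key) < 0 && n_deleted == MAX_DELETED)
        return -1;
    entry = malloc(len + 2);
    if (entry == NULL)
        return -1;
    memcpy(entry, key, len);
    entry[len] = '=';
    entry[len + 1] = '\0';
    if (putenv(entry) != 0) {
        free(entry);
        return -1;
    }
    /* entry now belongs to the environment and must not be freed. */
    if (find_deleted(key) < 0)
        strcpy(deleted_keys[n_deleted++], key);
    return 0;
}

/* %ENV-style existence test: an empty value counts as "gone" only if
   we were the ones who emptied it. */
int env_exists(const char *key) {
    const char *v = getenv(key);
    if (v == NULL)
        return 0;
    if (*v == '\0' && find_deleted(key) >= 0)
        return 0;
    return 1;
}
```

After fake_unsetenv, env_exists reports the key as missing while a raw
getenv() still returns a non-NULL empty string.  One limitation of the
sketch: a key that is later deliberately set to "" will still be
reported as deleted.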



Re: Small perl task for the interested

2003-07-19 Thread Benjamin Goldberg


Josh Wilmes wrote:
 
> At 12:48 on 07/14/2003 +0200, Lars Balker Rasmussen [EMAIL PROTECTED] wrote:
>
> > I've taken this very simple approach to the problem.  A perl-wrapper
> > for the CC lines in makefiles/root.in
> >
> > .c$(O) :
> >         $(PERL) tools/dev/cc_flags.pl $(CC) $(CFLAGS) ${cc_o_out}$@ -c $<
>
> I would go a bit further, and create a tools/build/compile, tools/build/
> link_executable, tools/build/link_library, etc.

That would be silly.  Instead, specify the file to read the flags from
as the first argument to cc_flags.pl.  That is, change:

   if (open F, "CFLAGS") {

To:

   if (open F, shift @ARGV) {

Then:

.c$(O) :
        $(PERL) tools/dev/flags.pl CFLAGS $(CC) $(CFLAGS) \
            ${cc_o_out}$@ -c $<

And for linking, flags.pl gets an argument of LINKFLAGS, and for making
a shared library, it gets an argument of SHAREFLAGS, etc..  In each of
those files are rules for the per-file flags for that type of step.

 Take all the flags out of the makefile altogether.  Just a thought.
 
 --Josh



Re: Events

2003-07-18 Thread Benjamin Goldberg
Damien Neil wrote:
[snip]
> Also, given that asynchronous IO is a fairly unpopular programming
> technique these days (non-blocking event-loop IO and blocking
> threaded IO are far more common), I would think long and hard before
> placing support for it as a core design goal of the VM.  If there
> is a compelling reason to use AIO, the implementation may better
> be handled at a lower level than Parrot; even if Parrot itself does
> not support AIO at the opcode level, Parrot programs could still
> use it by calling down to a C library.

AIO is unpopular because it's not widely/portably supported, whereas
non-blocking event-loop IO is, (one generally does have either select or
poll), as is blocking threaded IO (even if the thread starting stuff may
need to be different on some platform, it's *mostly* portable).

If we make it a core part of parrot, it will become more popular, simply
because of parrot's availability.

> > It's not just signals, there's a lot of stuff that falls into this
> > category. We've got to deal with it, and deal with it properly, since
> > not dealing with it gets you an 80% solution.
>
> Outside of signals and AIO, what requires async event dispatch?
> User events, as you pointed out above, are better handled through
> an explicit request for the next event.

Inter-thread notification and timers?  True, these *could* be considered
to be "user events", but IMHO, there are times when we want a "user
event" to have the same (high) priority as a system signal.



Re: Aliasing an array slice

2003-07-18 Thread Benjamin Goldberg


David Storrs wrote:
 
> Thinking about it, I'd rather see lvalue slices become a nicer version
> of C<splice()>.
>
>      my @start = (0..5);
>      my @a = @start;
>
>      @a[1..3] = qw/ a b c d e /;
>      print @a;   #  0 a b c d e 4 5

What would happen if I used 1,2,3 instead of 1..3?  Would it do the same
thing?  I wanna know what happens if I do:

   @a[0,2,4] = qw/ a b c d e /;



Re: Aliasing an array slice

2003-07-18 Thread Benjamin Goldberg


Luke Palmer wrote:
 
> > David Storrs wrote:
> > >
> > > Thinking about it, I'd rather see lvalue slices become a nicer version
> > > of C<splice()>.
> > >
> > >      my @start = (0..5);
> > >      my @a = @start;
> > >
> > >      @a[1..3] = qw/ a b c d e /;
> > >      print @a;   #  0 a b c d e 4 5
> >
> > What would happen if I used 1,2,3 instead of 1..3?  Would it do the
> > same thing?
>
> Of course.
>
> > I wanna know what happens if I do:
> >
> >    @a[0,2,4] = qw/ a b c d e /;
>
> It would probably do the same as in Perl 5; the same thing as:
>
>     @a[0,2,4] =  a b c ;
>
> (those   brackets are new shorthand for qw, not that qw is going
> anywhere)

Hmm... so, if I were to do:

   @x = map int(rand(6)), 1..3;
   @a[@x] = "a" .. "e";
   print @a.length;

Then, if the three integers in @x are consecutive, @a grows, but
otherwise it doesn't?



Re: Aliasing an array slice

2003-07-18 Thread Benjamin Goldberg
Dave Whipp wrote:
> Luke Palmer wrote:
> > Benjamin Goldberg wrote:
> > > David Storrs wrote:
> > > >   @a[1..3] = qw/ a b c d e /;
> > > >   print @a;   #  0 a b c d e 4 5
> > >
> > > What would happen if I used 1,2,3 instead of 1..3?
> > > Would it do the same  thing?
> >
> > Of course.
>
> I tend to agree, I think. But see below
>
> > > I wanna know what happens if I do:
> > >
> > >    @a[0,2,4] = qw/ a b c d e /;
> >
> > It would probably do the same as in Perl 5; the same thing as:
> >
> >    @a[0,2,4] =  a b c ;
>
> But that would be awful:
>
>    @a[$x,$y] = a b c d e
>
> would insert all 5 elements if ($y == $x+1); but only the first two
> otherwise. Belch.
>
> If we wanted the array-splice thing to resize arrays for us, then either
> we trigger the behavior only when rx/ \[ scalar \.\. scalar \]/, or
AFAIK, in perl6, $a..$b doesn't create an expanded list... it creates a
lazy list, which gets expanded as-needed.  Most likely, this list
object, and the list object representing the values being stored there,
would be handed directly to the @a array object, just as if we'd done
code like:

   @a[ new LazyIntRange(1,3) ] = new LazyStrRange("a","e");

So, @a can say, "hey, this is a Range, not merely a list of 1,2,3... I
should instead splice() the values in."

> we need to define the spilling algorithm in a way that makes sense (e.g.
> all remaining items go into the element of the rightmost index).

> But perhaps there's a better way to meet the desire. Perhaps ellipses
> could be employed to indicate that the splice is allowed to resize the
> array:
>
>   @a[1...3] = qw(a b c d e);

That might work, too.

My way is more invisible, but then again, if someone *wants* to have
@a[1,2,3,4,5,6,7,8,9] = @x, and writes it as @a[1..9] = @x, and is
surprised by perl doing a splice, it would be bad; with ... it would
only splice when the user asks for a splice.



Re: [WIP PATCH] core.ops split-up

2003-07-18 Thread Benjamin Goldberg

With this increase in the number of files, I'm going to ask for
something I've thought of for a while, but not had the guts to ask for.

Could we try and clean up the parrot/ directory?  Specifically, I'd like
all of the source code itself moved into a single subdirectory, leaving
at the toplevel only directories, the all-caps files, and a few others
(ChangeLog, Configure.pl, parrot.spec, and whatever else is really
needed to build and distribute parrot (make.pl?)).


