Re: [perl #24030] [PATCH] hash.t fails on OS X

2003-09-25 Thread Jeff Clites
Hi:

In case it helps, it looks like it's crashing at string.c:552, because 
it's trying to call src->encoding->decode() but src->encoding is NULL.

(gdb) f 0
#0  0x6104 in string_transcode (interpreter=0x616400, src=0x623440, 
encoding=0x19e43c, type=0x19a6fc, dest_ptr=0x0) at string.c:552
552 UINTVAL c = src->encoding->decode(srcstart);
(gdb) l
547 srcend = srcstart + src->bufused;
548 deststart = dest->strstart;
549 destend = deststart;
550
551 while (srcstart < srcend) {
552 UINTVAL c = src->encoding->decode(srcstart);
553
554 if (transcoder1)
555 c = transcoder1(src->type, dest->type, c);
556 if (transcoder2)
(gdb) p encoding
$1 = (const struct parrot_encoding_t *) 0x19e43c
(gdb) p src
$2 = (struct parrot_string_t *) 0x623440
(gdb) p src->encoding
$3 = (const struct parrot_encoding_t *) 0x0

Here's another backtrace, with a little more info:

Program received signal EXC_BAD_ACCESS, Could not access memory.
0x6104 in string_transcode (interpreter=0x616400, src=0x623440, 
encoding=0x19e43c, type=0x19a6fc, dest_ptr=0x0) at string.c:552
552 UINTVAL c = src->encoding->decode(srcstart);
(gdb) bt
#0  0x6104 in string_transcode (interpreter=0x616400, src=0x623440, 
encoding=0x19e43c, type=0x19a6fc, dest_ptr=0x0) at string.c:552
#1  0x6fbc in string_compare (interpreter=0x616400, s1=0x625cd8, 
s2=0x623440) at string.c:949
#2  0x45b0 in find_bucket (interpreter=0x616400, hash=0x6223e0, 
head=0, key=0x6816b0) at hash.c:281
#3  0x4a4c in hash_put (interpreter=0x616400, hash=0x6223e0, 
key=0x6816b0, value=0xbb50) at hash.c:406
#4  0x2b5c in main (argc=1, argv=0xbc2c) at CrashingTest.c:36
#5  0x27f8 in _start (argc=1, argv=0xbc2c, envp=0xbc34) at 
/SourceCache/Csu/Csu-45/crt.c:267
#6  0x2678 in start ()

JEff

On Thursday, September 25, 2003, at 08:22  AM, Michael Scott wrote:

On Thursday, Sep 25, 2003, at 16:06 Europe/Berlin, Michael Scott wrote:

On Thursday, Sep 25, 2003, at 13:20 Europe/Berlin, Leopold Toetsch 
wrote:

Michael Scott <[EMAIL PROTECTED]> wrote:

t/src/hash.t

test 7 fails on Mac OS X 10.2.6 (gcc 3.3) because BIGLEN is too big:
9.
100_000 chars for the key doesn't seem to be very big.
Where does it fail?
Can you debug/back-trace it?
(gdb) r
Starting program: /Users/mike/Developer/Parrot/Tests/mem_test
Reading symbols for shared libraries . done
Program received signal EXC_BAD_ACCESS, Could not access memory.
0x5f30 in string_transcode ()
(gdb) bt
#0  0x5f30 in string_transcode ()
#1  0x6de8 in string_compare ()
#2  0x43e4 in find_bucket ()
#3  0x4880 in hash_put ()
#4  0x2998 in main ()
#5  0x25a4 in _start (argc=1, argv=0xbdd4, envp=0xbddc) 
at /SourceCache/Csu/Csu-45/crt.c:267
#6  0x2424 in start ()

And just as I was going to get started on t/src/string.t too.
I'm running tests on string_compare and string_transcode with 999 
byte strings without complaint.



I've made it a bit smaller: 65536.

This raises the questions:

 What is the maximum hash key size?
 What is the largest STRING?
There are no limits except those imposed by UINTVAL (2^32-1) AFAIK.

Mike
leo






Re: [perl #24030] [PATCH] hash.t fails on OS X

2003-09-26 Thread Jeff Clites
I did a bit more digging on this test failure, and I think it's an 
"infant mortality" case--it looks like creating the large string might 
be triggering a DOD run which is freeing the hash. Just dumping the 
hash before and after creating the string demonstrates (by crashing 
during the second dump):

big = calloc(BIGLEN, sizeof(char));
big = memset(big, 'x', BIGLEN - 1);
new_hash(interpreter, &hash);

dump_hash(interpreter, hash);

key = string_from_cstring(interpreter, big, NULL);

dump_hash(interpreter, hash);

Output:

Hashtable[0/16]  Bucket 16: type(0) -> type(0) -> type(0) -> type(0) -> 
type(0) -> type(0) -> type(0) -> type(0) -> type(0) -> type(0) -> 
type(0) -> type(0)
Hashtable[0/16]  Bucket 16: type(524288) -> Segmentation fault

What's the preferred way to prevent this in tests? I know there are 
various approaches possible, but I don't know if there is a consensus 
for how to deal with this in test cases--disabling DOD works but seems 
a bit heavy handed.
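For reference, the heavy-handed version I mean looks something like this 
(untested sketch; I'm assuming the Parrot_block_DOD()/Parrot_unblock_DOD() 
pair--if those aren't the actual names, substitute whatever dod.c really 
exports):

    /* Untested sketch: keep DOD from running while the only reference to
     * the hash is a raw C pointer that the collector can't see. */
    Parrot_block_DOD(interpreter);

    new_hash(interpreter, &hash);
    key = string_from_cstring(interpreter, big, NULL);
    hash_put(interpreter, hash, key, value);

    Parrot_unblock_DOD(interpreter);

Anchoring the hash somewhere the root set can see it would be less of a 
sledgehammer, which is why I'm asking what people prefer.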

JEff

On Thursday, September 25, 2003, at 09:23  AM, Jeff Clites wrote:

Hi:

In case it helps, it looks like it's crashing at string.c:552, because 
it's trying to call src->encoding->decode() but src->encoding is NULL.

(gdb) f 0
#0  0x6104 in string_transcode (interpreter=0x616400, 
src=0x623440, encoding=0x19e43c, type=0x19a6fc, dest_ptr=0x0) at 
string.c:552
552 UINTVAL c = src->encoding->decode(srcstart);
(gdb) l
547 srcend = srcstart + src->bufused;
548 deststart = dest->strstart;
549 destend = deststart;
550
551 while (srcstart < srcend) {
552 UINTVAL c = src->encoding->decode(srcstart);
553
554 if (transcoder1)
555 c = transcoder1(src->type, dest->type, c);
556 if (transcoder2)
(gdb) p encoding
$1 = (const struct parrot_encoding_t *) 0x19e43c
(gdb) p src
$2 = (struct parrot_string_t *) 0x623440
(gdb) p src->encoding
$3 = (const struct parrot_encoding_t *) 0x0

Here's another backtrace, with a little more info:

Program received signal EXC_BAD_ACCESS, Could not access memory.
0x6104 in string_transcode (interpreter=0x616400, src=0x623440, 
encoding=0x19e43c, type=0x19a6fc, dest_ptr=0x0) at string.c:552
552 UINTVAL c = src->encoding->decode(srcstart);
(gdb) bt
#0  0x6104 in string_transcode (interpreter=0x616400, 
src=0x623440, encoding=0x19e43c, type=0x19a6fc, dest_ptr=0x0) at 
string.c:552
#1  0x6fbc in string_compare (interpreter=0x616400, s1=0x625cd8, 
s2=0x623440) at string.c:949
#2  0x45b0 in find_bucket (interpreter=0x616400, hash=0x6223e0, 
head=0, key=0x6816b0) at hash.c:281
#3  0x4a4c in hash_put (interpreter=0x616400, hash=0x6223e0, 
key=0x6816b0, value=0xbb50) at hash.c:406
#4  0x2b5c in main (argc=1, argv=0xbc2c) at CrashingTest.c:36
#5  0x27f8 in _start (argc=1, argv=0xbc2c, envp=0xbc34) at 
/SourceCache/Csu/Csu-45/crt.c:267
#6  0x2678 in start ()

JEff

On Thursday, September 25, 2003, at 08:22  AM, Michael Scott wrote:

On Thursday, Sep 25, 2003, at 16:06 Europe/Berlin, Michael Scott 
wrote:

On Thursday, Sep 25, 2003, at 13:20 Europe/Berlin, Leopold Toetsch 
wrote:

Michael Scott <[EMAIL PROTECTED]> wrote:

t/src/hash.t

test 7 fails on Mac OS X 10.2.6 (gcc 3.3) because BIGLEN is too 
big:
9.
100_000 chars for the key doesn't seem to be very big.
Where does it fail?
Can you debug/back-trace it?
(gdb) r
Starting program: /Users/mike/Developer/Parrot/Tests/mem_test
Reading symbols for shared libraries . done
Program received signal EXC_BAD_ACCESS, Could not access memory.
0x5f30 in string_transcode ()
(gdb) bt
#0  0x5f30 in string_transcode ()
#1  0x6de8 in string_compare ()
#2  0x43e4 in find_bucket ()
#3  0x4880 in hash_put ()
#4  0x2998 in main ()
#5  0x25a4 in _start (argc=1, argv=0xbdd4, envp=0xbddc) 
at /SourceCache/Csu/Csu-45/crt.c:267
#6  0x2424 in start ()

And just as I was going to get started on t/src/string.t too.
I'm running tests on string_compare and string_transcode with 999 
byte strings without complaint.



I've made it a bit smaller: 65536.

This raises the questions:

 What is the maximum hash key size?
 What is the largest STRING?
There are no limits except those imposed by UINTVAL (2^32-1) AFAIK.

Mike
leo







Re: Q: vtable - can, isa, does

2003-09-27 Thread Jeff Clites
On Saturday, September 27, 2003, at 07:46  AM, Dan Sugalski wrote:

The problem with checking for a single method by name is that it's 
possible (potentially likely) that the method you find isn't the 
method you want, since names short enough to be useful are short 
enough to be confusing. ("rotate" is a perfectly good name for a 
method that does matrix manipulation, image manipulation, and UI 
element manipulation -- just because an object says it can rotate 
doesn't mean that it actually does what you want)

It's certainly possible to have multiple interfaces with the same name 
that do different things, but that's less likely, and it still gives 
the programmer another way to make sure code'll behave the way 
he/she/it wants.
Another place where interfaces are useful is in a distributed computing 
context, because you can inquire about the capabilities of a remote 
object in one round trip by asking if it implements a particular 
interface, rather than asking about each method individually.

JEff



Re: CVS checkout hints for the bandwidth-limited

2003-09-27 Thread Jeff Clites
On Friday, September 26, 2003, at 07:14  PM, Dan Sugalski wrote:

At 8:54 PM +0200 9/26/03, Leopold Toetsch wrote:
Dan Sugalski <[EMAIL PROTECTED]> wrote:

   cvs -z9 update
  cvs -z9 update -dP parrot

to get rid of deleted files too.
Which is a good point, and one I forgot. (Which is annoying, as it's 
how I update)
Actually, I'm pretty sure that cvs will delete local copies of files 
removed from the repository with just "cvs update". The "-dP" just 
affects directories--the "-d" says to pull down and update new 
directories, and the "-P" tells it to prune (remove) empty directories 
(this is necessary because cvs doesn't really version directories, and 
you end up not really having a way to fully remove them from the 
repository).

My canonical invocation is:

	cvs -q -z6 update -dP

(I use -z6 rather than -z9 because I remember reading somewhere that 
the higher compression numbers don't save much space, but use a lot 
more CPU, and level 6 is the default for gzip.)

JEff



Re: MANIFEST verification

2003-09-29 Thread Jeff Clites
There's already a test in the form of t/src/manifest.t, which you might 
be able to adapt (or invoke directly).

JEff

On Monday, September 29, 2003, at 01:36  PM, Robert Spier wrote:

There's been a little bit of hubbub recently about people checking
things into the languages/ directories and breaking the MANIFEST.
I can setup the CVS repository to notify the user after a checkin where
something has gone awry.  But, I'm low on tuits, so I need to farm out
a little of the work.
I need a script that meets the following specs:

  Input (via command line)
1)  file representing MANIFEST
2)  file representing MANIFEST.SKIP
3)  file representing list of all files found
  Output (via stdout)
error message if manifest check fails
  Restrictions:
The script cannot access any other files/filesystems/directory
listings besides the data in the files provided on the command
line.
I think the easiest way to implement this is by subclassing
ExtUtils::Manifest, and overriding manifiles() and maybe one or two
other subroutines.
-R (CVS TimeLord)




Re: PIO tests

2003-09-30 Thread Jeff Clites
On Tuesday, September 30, 2003, at 01:24  PM, Michael Scott wrote:

When I was writing the PIO_parse_open_flags() test it did seem to me 
rather Perlish to have string flags in the first place. But I'm a new 
cockroach in town, so I kept my mouth shut, not wanting to get stomped 
on.

If the PIO_F_* flags used by open() were moved up to io.h then they 
could be exported as .constants by config/gen/parrot_include.pl and 
used explicitly.

Various composite flags could be added for convenience.

PIO_F_WRITE_TRUNC   (">")
PIO_F_WRITE_APPEND  (">>")
PIO_F_READ_WRITE("+<", "+>")
Also, add an INT flag version of open.

	inline op open(out PMC, in STR, in INT)
It does seem to make more sense to have the open op take an INT mode, à 
la the Unix open(2), and leave the translation from a flags string to 
an INT to the language level. This of course leaves the PIO layer less 
language-specific, which is good, and it also gets rid of ambiguities 
such as whether "<+" is a valid alternative to "+<", or whether "<" 
is valid, or what to do with a mode string such as ";)-", since a bit 
in an INT is either set or not, making an INT flag-set unambiguous and 
(probably) never invalid.
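
For comparison, this is the model that open(2) already gives C programs--a 
flag word is just bits, so there's nothing to mis-parse:

    #include <fcntl.h>

    /* No mode-string parsing, no ">>" vs. "+>" ambiguity: each mode is an
     * unambiguous combination of bits. */
    int
    open_log(void)
    {
        return open("out.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    }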

Here is the list of modes from the open(2) man page on my system; I'm 
not sure if this is a subset or a superset of what PIO should support:

   O_RDONLY    open for reading only
   O_WRONLY    open for writing only
   O_RDWR      open for reading and writing
   O_NONBLOCK  do not block on open or for data to become available
   O_APPEND    append on each write
   O_CREAT     create file if it does not exist
   O_TRUNC     truncate size to 0
   O_EXCL      error if create and file exists
   O_SHLOCK    atomically obtain a shared lock
   O_EXLOCK    atomically obtain an exclusive lock

JEff



LF needed at end of pasm files?

2003-10-04 Thread Jeff Clites
The pasm files in examples/benchmarks give an error when run with 
parrot, because they lack a final line feed character (at the very end 
of the files):

	error:imcc:parse error, unexpected EOM, expecting '\n'

	in file 'examples/benchmarks/gc_waves_sizeable_data.pasm' line 85

I'd imagine this indicates a behavior change in imcc at some point. 
It's easy enough to add the LFs, but it might be better to not require 
the final line end character.

JEff



Re: You too can direct the Attack of the Unicode Monster!

2003-10-07 Thread Jeff Clites
I've been poking at it a little bit--nothing finished yet, but I've 
been making some headway.

JEff

On Tuesday, October 7, 2003, at 08:51  AM, Dan Sugalski wrote:

Which ought to be some sort of really bad '50s era black and white 
giant
atomic monster movie. But anyway.

ICU now configures and builds on at least some platforms. That means 
it's
time to build an encoding and chartype library for it. This'd be a nice
little project for someone looking to get their feet wet in the parrot
source, so...

Takers?

	Dan




Re: An evil task for the interested

2003-10-13 Thread Jeff Clites
On Oct 9, 2003, at 8:43 AM, Dan Sugalski wrote:

We've got ordered destruction on the big list 'o things to do, and it
looks like we need to get that done sooner rather than later.
...
Not a huge task, we just need to order PMCs before destroying them, to
make sure we don't call the destructor for a PMC before we destroy the
things that depend on it. A quick run through the PMC pools to build 
up a
dependency chain if there's more than one PMC that needs destruction
should do it.
I have a partial implementation of this now. (Need to clean it up 
before submitting it.) There are a couple of thorny points that may 
need some discussion, but I'll start with the sharpest thorn: For PMCs 
with a custom mark method, the idea is that it's opaque exactly what 
other object they have references to, right? (That is, the PMC knows 
how to mark them alive, but doesn't reveal what they are, right?)

I'm asking because my approach is to collect the list of PMCs which 
need their destroys called, and trace through this set (like a DOD run) 
to see which ones are referred to by others on the list (and more 
importantly, which ones are not), destroy() those that are unreferenced 
and take them off the list (or do this to an arbitrary one if none are 
unreferenced), and repeat until done. This works out fine, except that I 
don't think I can use PObj_live_FLAG for this marking without messing up 
the subsequent work in free_unused_pobjects(), and there isn't a way to 
ask a mark() method to set a different flag. That leaves me stuck for PMCs 
with custom mark methods (it works for the other ways PMCs refer to 
things).
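
In rough pseudo-C, the loop looks like this (just a sketch--
needs_destroy_list, trace_from(), and the flag name are all invented for 
illustration, not code from the patch):

    /* Sketch of the ordering loop. */
    while (list_length(needs_destroy_list) > 0) {
        /* Mark everything reachable *from* the listed PMCs, using some
         * flag other than PObj_live_FLAG (that's the open question). */
        trace_from(interpreter, needs_destroy_list, SOME_OTHER_FLAG);

        /* A listed PMC that did not get the flag set is not referenced by
         * the others, so it's safe to destroy it first. */
        PMC *victim = first_unflagged(needs_destroy_list);
        if (!victim)    /* a cycle: destroy an arbitrary one */
            victim = list_item(needs_destroy_list, 0);

        victim->vtable->destroy(interpreter, victim);
        list_remove(needs_destroy_list, victim);
        clear_flag(needs_destroy_list, SOME_OTHER_FLAG);
    }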

So, I wanted to see if I'm missing something obvious, or if we need for 
mark() to take a parameter to indicate the flag to set.

(For a second I was about to decide that I could re-use PObj_live_FLAG--by 
definition it will be unset for objects we are going to destroy(), and I'd 
just need to clear the flag again before getting to the rest of 
free_unused_pobjects(). But since there's no unmark() method I'm stuck 
again: my tracing could inadvertently appear to bring things back to life. 
Though I suppose a subsequent DOD run would get them.)

Thoughts?

JEff



Re: Instantiating objects

2003-10-15 Thread Jeff Clites
On Oct 15, 2003, at 8:36 AM, Dan Sugalski wrote:

I'm poking around in the object stuff today, to try and get at least
single-inheritance objects up and running.
At the moment, I'm torn between just having a new method of some sort 
in
the class and having a vtable method on the class PMC that returns an
object of that class. (get_pmc on the class returns a new object, or
something of the sort)
What do you mean by "new method in the class" above?

To use the vocabulary of other languages, if you're trying to decide 
between a static method (a.k.a. a class method) and an instance 
method on the class object, I'd say they amount to the same thing. 
(Either involves calling an implementation which depends on the class, 
and arranging for that implementation to have access to class-specific 
data, in the form of a class object or static class data.) Here the 
distinction is mostly just a matter of viewpoint--C++ has static 
methods, ObjC has class objects.

If you're trying to decide whether the creation method should have 
access to a class object, then I think it should--that makes it easier 
to implement things such as singletons, since you have a place to hang 
state, such as a cache. And calling this an instance method on the 
class object is conceptually simpler (I would argue).

Or did I totally miss your point?

JEff



Re: Instantiating objects

2003-10-15 Thread Jeff Clites
On Oct 15, 2003, at 1:48 PM, Melvin Smith wrote:

At 03:49 PM 10/15/2003 -0400, Dan Sugalski wrote:
On Wed, 15 Oct 2003, Jeff Clites wrote:

> On Oct 15, 2003, at 8:36 AM, Dan Sugalski wrote:
>
> > I'm poking around in the object stuff today, to try and get at 
least
> > single-inheritance objects up and running.
> >
> > At the moment, I'm torn between just having a new method of some 
sort
> > in
> > the class and having a vtable method on the class PMC that 
returns an
> > object of that class. (get_pmc on the class returns a new object, 
or
> > something of the sort)
>
> What do you mean by "new method in the class" above?

In this case, the "new method" is a named method in the class 
namespace
that we look up and call. We'd look it up and dispatch to it.

The vtable method, on the other hand, would be a function in the 
vtable
that we'd dispatch to like any other vtable method. It'd be a class
method, as we don't really have object methods as such. (I think, 
hard to
say, the terminology escapes me at the moment)

Personally, of the two I prefer a vtable entry, as it's a lot faster.
...
instead, basically eliminating the differences between "objects" and
generic PMCs. I'm not 100% sure I want to do that, though.
...
For some reason I'm getting the feeling that PMCs and "Classes" should
be one and the same. Otherwise it seems we are making Parrot more
complex than it needs to be.
My gut feeling is that everything should be a Class, whether 
implemented
in the C API or in bytecode, and a PMC vfun should be equivalent to a
class method. That stands for PerlUnder, PerlHash, etc. as well.

If we have class support, native call support and an extension API, I
see no reason why we even need the concept of a PMC.
My brain was in PMC == class mode when I wrote what I wrote above, but 
now I need to rethink, since that's not a given (yet?). I was probably 
thinking that because currently the *.pmc files live in directories 
called "classes" and "dynclasses" (So I was thinking of PMCs as 
classes at the parrot implementation level--that is, if Parrot were 
written in C++, they'd probably be C++ classes.)

So at the moment I'm a bit fuzzy on the terminology we're using--I 
guess there are "classes" and "objects" at the HLL level (the 
terminology for which may differ between HLLs), and PMC definitions 
(the source code) and instances (the allocated in-memory "thing") at 
the parrot level. And I guess a fundamental design choice for parrot is 
that a "Perl object" can be treated as an object from Python code (and 
vice versa), although this is hard to reconcile (at least 
terminologically) with the Perl (Perl 5 at least) notion that 
strings/arrays/hashes/etc. aren't considered to be objects, but they 
become objects if they are blessed into a class.

Hmm, I need to think this over a bit more.

JEff



Re: [perl #24224] [PATCH] IMCC: Macros are handled via hash

2003-10-16 Thread Jeff Clites
On Oct 16, 2003, at 7:21 AM, Dan Sugalski wrote:

On Thu, 16 Oct 2003, Leopold Toetsch wrote:

Juergen Boemmels <[EMAIL PROTECTED]> wrote:
Leopold Toetsch <[EMAIL PROTECTED]> writes:

The IMCC_INFO does not need to be part of the interpreter.
I don't like to have global state variables around. I don't know yet,
what happens if 2 interpreter threads both do eval()...
If it's per-interpreter, nothing, since each thread should have its own
intepreter. (Though I can see some potential deadlock issues, though
that's not specific to this)
The lex-generated stuff that Juergen mentioned could be more of a 
problem, since it seems to store its state in globals (yytext, etc.), 
and that might be pretty tricky to "fix". I've always worried about the 
apparent lack of thread-safety of such generated lexers/parsers when 
I've run across them before, and wondered why they don't pass around a 
context structure instead. (I suppose the tools are old enough that 
they may actually pre-date threading as a widespread concept)

JEff



Re: mmd_dispatch_numval

2003-10-16 Thread Jeff Clites
On Oct 16, 2003, at 2:40 PM, Simon Glover wrote:

On Thu, 16 Oct 2003, Dan Sugalski wrote:

On Thu, 16 Oct 2003, Simon Glover wrote:

From mmd.pod:
=item FLOATVAL mmd_dispatch_numval(interp, left, right, func_num)

Like C, only it returns a FLOATVAL.

Wouldn't it be more sensible to call the function 
mmd_dispatch_floatval ?
Yep. The terminology's mixed in a number of places--we should 
probably go
fix it everywhere and be done with it. (Unless I'm just behind the 
times
and its already been done :)
 OK, I've changed it over to mmd_dispatch_floatval in mmd.c, mmd.h and
 mmd.pod; as far as I can tell, it's not yet being used anywhere else.
Although I think "float" is clearer than "number" (in that it's clearly 
different than "integer"), it might be easier to go with 
"number"--since we have set_number vtable methods, and "N" registers 
rather than "F" registers

JEff



Re: Lexical Pads

2003-10-16 Thread Jeff Clites
On Oct 16, 2003, at 10:53 PM, Will Coleda wrote:

I'm trying to write "global" for tcl, and trying to pull out a 
variable from the outermost pad, and failing to find it.
Globals aren't stored in the lexical pads--there are find_global and 
store_global ops (see var.ops)--is that what you are looking for?

JEff



Ordered destruction and object graph traversal

2003-10-20 Thread Jeff Clites
As mentioned in a previous post, I've been working on getting ordered 
destruction working. I actually have a mostly-working version of this 
now, which will merit its own discussion, but there's a particular 
aspect which I wanted to call out in light of Dan's recent post on 
"Object freezing".

A key aspect of getting destruction-ordering right is doing an object 
graph traversal to determine which objects are reachable from objects 
which need to have their vtable->destroy() methods called--these 
objects can't safely go away until those destroys have been called. 
This amounts to doing exactly what trace_children() does, except that 
(1) you start with a different base set of objects to trace out from, 
and (2) you need to set a different flag on the objects you reach.

For PMC's with custom vtable->mark() methods, this posed a 
problem--that method sets the live flag, but now I needed to repeat 
that logic almost exactly, but setting a different flag. Hmm--that was 
telling me something, namely that we needed to generalize. In 
particular, what was needed in both cases was simply a way to find out 
what other objects are referenced by a particular object.

Since this is to be used in the middle of a DOD run, I thought it 
crucial that this not require the allocation of additional memory 
(which rules out an iterator object or method which returns something 
like a flat array of referenced objects).

My solution was to define a new vtable method--I've called it visit(), 
though the name's not the important part--to which you pass a callback 
(plus an optional context argument). It's job is to invoke that 
callback on each of it's referenced "children" (also passing the 
context object with each invocation), and that's it. The PMC doesn't 
know or care what the callback does--in a GC context the callback may 
set a flag on what's passed in and stash it for later processing; in 
another context it may be used inside of an iterator which vends out 
objects one-by-one and guards against loops.
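
Concretely, the shape I have in mind is something like this (the names and 
the exact prototype are provisional, and ARRAY_SIZE/ARRAY_ITEM are made-up 
stand-ins for however a given PMC actually stores its elements):

    /* Provisional: call 'cb' once per directly-referenced child. */
    typedef void (*visit_callback)(Parrot_Interp, PObj *child, void *context);

    void visit(Parrot_Interp interpreter, PMC *self,
               visit_callback cb, void *context);

    /* What an array-like PMC's visit() body might look like: */
    static void
    Array_visit(Parrot_Interp interpreter, PMC *self,
                visit_callback cb, void *context)
    {
        INTVAL i;
        for (i = 0; i < ARRAY_SIZE(self); i++)
            cb(interpreter, (PObj *)ARRAY_ITEM(self, i), context);
    }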

With this method, you end up not needing vtable->mark() -- all you need 
to do is call vtable->visit() and pass it a callback which wraps 
pobject_lives(). To do the destruction ordering which I need, you pass 
it a different callback, which sets a different flag. And finally (the 
reason I bring this up now), I expect that this method will serve as 
the primitive which a vtable->traverse() implementation would leverage.

To be clear, this vtable->visit() isn't an object traversal 
method--it's a tell-me-what-you-directly-point-to method, and often a 
building block for traversal strategies or for iterator 
implementations. I'm fairly convinced that a separate vtable->mark() is 
unnecessary; at most, I'd say that the default implementation (which 
calls visit() to do its work) would rarely need to be overridden. 
Similarly for vtable->traverse(), although it's too soon to tell for 
sure.

Again, I'm bringing this up now so that anyone diving into freeze/thaw 
or traversal can leverage the work/thought I've put into this so far. 
(I have some other thoughts on how to do traversals, but I'll save 
those for another post.)

(My ordered-destruction implementation is not quite ready for general 
consumption yet--there are quite a few existing PMCs with custom mark 
methods which need to be converted to custom visit methods. It's 
straightforward to do but is taking a bit of time, and I want to finish 
that before submitting this to make sure that I haven't overlooked 
anything. All you really have to do to turn a mark() method into a 
visit() method is change the prototype and change the call to 
pobject_lives into a call to the callback.)

JEff



Re: Object freezing

2003-10-20 Thread Jeff Clites
On Oct 20, 2003, at 10:09 PM, Gregor N. Purdy wrote:

Here's an example of a wrapper:

  

  
That's a bit better, although bear in mind that if the intent is that
you could throw the entire chunk at an XML parser and have it not
complain, you are going to have to take some care in generating the
guts. Binary data is in general right out (where does it end? What if
it contains fragments that look like XML markup?). Sure, you could
slap it in a CDATA section, but you'd still have to watch
out for the possibility that the body might want the sequence "]]>"
in it somewhere...
A slight aside, but just to build on what you said since this is an 
often-misunderstood facet of XML, there are a bunch of other reasons 
you can't just throw binary data into an arbitrary XML document inside 
of a CDATA section:

1) If you declare the encoding of the XML to be UTF-8, then your 
document won't be well-formed XML if your binary data doesn't look like 
legitimate UTF-8 data (which it won't in the general case--many bytes 
and byte sequences can't occur in UTF-8).

2) XML parsers are free to transcode--so a document declared in one 
encoding may pass data on to its client application in another 
encoding, which will mangle your binary data. For instance, the expat 
parser can consume documents in a variety of encodings, but always 
passes text to its callbacks in UTF-8.

3) XML parsers are required to line-ending normalize, so anything which 
looks like a CR or CRLF will turn into an LF, even inside of a CDATA 
section.

That said, you can get around all of this by base-64 encoding your 
binary data for storage in XML, since that turns binary data into text. 
On the other hand, it wastes space and takes more CPU cycles to consume 
than a more obvious binary format would.

And that said, I read Dan's email as just meaning an XML header, but I 
didn't quite understand exactly what he had in mind either.

JEff



Re: Object instantiation

2003-10-21 Thread Jeff Clites
On Oct 21, 2003, at 7:14 AM, Dan Sugalski wrote:

After thinking about this a bit, it became glaringly obvious that the
right way to instantiate an object for class "Foo" is to do:
  new P5, .Foo

Or whatever the constant value assigned to the Foo class upon its 
creation
is. When a class is created, it should be assigned a number, and for 
most
things PMC-only classes or full-on HLL classes should behave 
identically.
Duh.
That makes sense. What I keep wondering is what about things with the 
semantics of Perl5, in which new objects aren't instantiated 
directly--already-allocated things later become associated with a 
class. This doesn't seem quite like a case of morphing, since for 
instance a Perl array can be blessed into a class, but it's still a 
Perl array.

JEff



Re: Object freezing

2003-10-21 Thread Jeff Clites
On Oct 21, 2003, at 5:53 AM, Dan Sugalski wrote:

Note that I do *not* want to have multiple object traversal systems 
in
parrot! We have one for DOD, and proposals have ranged upwards from 
there.
No. That is *not* happening--the chance for error is significant, the
side-effects of the error annoying and tough to track down for 
complex
cases (akin to the trouble with tracking down GC issues), and just 
not
necessary. (Perhaps desirable for speed/space reasons, but desirable
isn't necessary)
DOD's mark() routine has different requirements than a general
traverse() for freeze(), chill(), clone(), and destruction ordering.
Using just mark() will have these side effects that you want to avoid.
The only thing that mark does that the general traversal doesn't, in 
the
abstract, is flip the object's live flag. Everything else is an
optimization of code which we can, if we need, discard.
I don't believe that is quite true. There are a couple of important 
differences between traversal-for-GC and traversal-for-serialization, 
which will be a challenge to reconcile in the one-true-traversal:

1) Serialization traversals need to "take note" of logical int and 
float slots (e.g., as used in perlint.pmc and perlnum.pmc) so that they 
can be serialized, but for GC you only need to worry about GC-able 
objects. It's difficult to come up with a reasonable callback which can 
take either int, float, or PObj arguments.

2) It's reasonable for an object to have a pointer to some sort of 
cache object, which is not logically part of the object, and shouldn't 
be serialized along with it. This needs to be traversed for GC 
purposes, but needs to not be traversed for serialization. (Situations 
such as this--physical but not logical membership--are the origin of 
the "mutable" keyword in C++.)

3) Traversal for GC needs to do loop detection, but can just stop going 
down a particular branch of the object graph once it encounters an 
object it's seen before. Serialization traversals would need to have a 
way, upon encountering an object seen before, to include in the 
serialization stream an indication that the current object has already 
been serialized, and enough information to enable deserialization code 
to go find it and recreate the loop. The only options I see here are 
either for serialization to involve the allocation of unbounded 
additional memory, or to expand the PObj structure to include a slot 
for a UUID which can be used as a back-reference in a stream, or to 
have serialization break loops (so that deserialized structures never 
have loops).

I'm not 100% convinced that a single approach can't handle both 
applications, but it's looking as though their requirements are 
different enough that it may not work well.
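
If we did try to unify them anyway, the least-bad shape I can picture is a 
single callback handed a small "why are we traversing" record, something 
like this (entirely hypothetical--nothing like it exists yet):

    /* Hypothetical unified visit record. */
    typedef enum { VISIT_FOR_DOD, VISIT_FOR_FREEZE } visit_purpose;

    typedef struct visit_info {
        visit_purpose  purpose;   /* why we are traversing            */
        void          *context;   /* GC state, or the output stream   */
    } visit_info;

Each PMC would then decide per slot what to report: a cache pointer gets 
reported for VISIT_FOR_DOD but skipped for VISIT_FOR_FREEZE, and raw 
int/float slots only matter for VISIT_FOR_FREEZE. That works, but it pushes 
the reconciliation problem into every PMC rather than solving it.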

A few other questions/concerns/comments/issues:

1) I assume that ultimately a user-space iterator would end up calling 
the traversal code, right? If so, you can't reasonably mandate that 
only one traversal be in progress at one time. That would be the 
canonical way to compare two ordered collections--get an iterator for 
each, and compare element-by-element.

2) I don't see it as a huge problem that serialization code could end 
up creating additional objects if called from a destroy() method. 
(Though yes, it would be a problem for GC infrastructure code to do so.) I 
say that for two reasons: (a) destroy() methods can really do anything 
they want, and if that task involves allocating additional memory, that 
just makes it a risk to perform that task in a destroy() method--it may 
fail due to out-of-memory conditions. I think that Java design experts 
tend to argue against doing things like serialization in finalization 
methods. It sounds elegant, but it's problematic. One reason for this 
is that you tend to want to serialize structures as a whole, not 
piece-by-piece as they are garbage-collected. The second reason it is 
not always a problem in practice is that (b) a DOD run may be triggered 
by an out-of-headers condition, but that doesn't mean that an 
additional chunk of memory for headers can't be allocated. If it can't 
be, then this is no more problematic than it would be in other user 
code--think of the case where I have some big tree of objects I want to 
make some sort of copy of, with the intention of then letting go of the 
original when I'm done. I'll be freeing up headers at the end of that 
process, but if I run out of memory part-way-through, then I'm just 
stuck.

3) I assume that not every object is expected to be serializable? For 
instance, an object representing a filehandle can't really be 
serialized in a useful way. So I'm not sure of what sort of "fidelity" 
is required of a generic serialization method--that is, how similar a 
deserialized structure is guaranteed to be to the original.

JEff



Re: Object freezing

2003-10-21 Thread Jeff Clites
On Oct 21, 2003, at 6:12 AM, Dan Sugalski wrote:

On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote:

At 08:21 -0400 10/21/03, Dan Sugalski wrote:
I find the notion of an "XML header" a bit confusing, given Dan's
 statement to the effect that it was a throw to XML folks.
 I think anything "XML folks" will be interested in will entail
 *wrapping* stuff, not *prefixing* it.
Nah, I expect what they'll want is for the entire data stream of
serialized objects to be in XML format. Which is fine--they can have 
that.
(It's why I mentioned the serialization routines can be overridden)

For an XML stream the header might be 
version=1.0> with the rest of the stream in XML. A YAML stream would 
start
with the rest in YAML, and the
binary format as . Or 
something
like that, modulo actual correct XML.
If you want that to look like valid XML, it would have to be different:

error: Specification mandate value for attribute parrot

   ^
Better in my opinion would be something like:
data yadda yadda yadda
I'm not an XML guy, and I'm making all this up as I go along. If that's
better, fine with me. :)
Yeah, you can't put extra things in the XML declaration itself.

So are we talking about a header or a wrapper?  If it is really a
header, it's not XML and then it's prettyy useless from an XML point
of view.
We're talking about the first thing in a file (or stream, or 
whatever). I
was under the impression that XML files should be entirely composed of
valid XML, hence the need for the stream type marker being valid XML.
No, XML _documents_ must be XML, but that doesn't mean that document == 
file. (For another example where this comes up, consider an XML 
document transmitted over HTTP. There are headers and other textual 
things in the stream along with the xml, and it's the HTTP protocol 
which determines where the document begins and ends, not the XML itself.) You 
can certainly have more than one XML document in a single file, but 
something needs to decide where an xml document begins and ends, and 
hand only that data to the xml parser.

YAML doesn't care as much, so far as I understand, and for our own 
internal
binary format we can do whatever we want. If that's not true, then we 
can
go for a more compact header.
Yes, if you want the whole serialized stream to count as a well-formed 
xml document, then you can't put arbitrary binary data in the middle. 
See my previous post for why.

Once again, modulo my limited and inevitably incorrect YAML knowledge. 
So
if the header says it's XML the whole thing is valid XML, while if it
doesn't the rest of the stream doesn't have to be. (Just enough of the
header so that an XML processing program can examine the stream and 
decide
that the valid XML chunk at the beginning says that the rest of the
stream's not XML)
Most XML parsers aren't expecting to handle this. That is, there's no 
such thing as a valid half-of-an-xml document, from the perspective of 
the xml spec, and in many cases you'd have trouble getting a parser to 
stop before hitting something problematic and blowing up. In other 
words, you can't rely on an xml parser to process something which 
starts out looking like xml, but isn't.

Basically we want some nice, fixed (mostly) thing at the head of the
stream that doesn't vary regardless of the way the stream is encoded, 
and
XML seemed to be the most restrictive of the forms I know people will
clamor for. (I know, it means the stream can't be valid Lisp-style 
sexprs,
but XML's more widespread :)
Yeah, if you're just needing to tag the stream with a label to indicate 
the type plus a version number, then xml's on the one hand overkill and 
on the other hand not necessarily a big help to xml proponents.

JEff



Re: Object freezing

2003-10-21 Thread Jeff Clites
On Oct 21, 2003, at 10:41 AM, Elizabeth Mattijsen wrote:

At 12:53 -0400 10/21/03, Dan Sugalski wrote:
 > Yeah, if you're just needing to tag the stream with a label to 
indicate
 the type plus a version number, then xml's on the one hand overkill 
and
 > on the other hand not necessarily a big help to xml proponents.
So, in a nutshell, throwing an XML format type tag at the beginning 
buys
us nothing regardless of whether it's an XML stream or not?
Yep.  But mainly I think because you'll need to encode binary data to 
make it valid XML.  That's an overhead you don't want to suffer for those 
serializations that don't need it.

If you ask me, you could do easy with a simple header line like:

  parrot xml 1.0
  \0
basically magic word ('parrot')
 followed by a space
 followed by the type
 followed by a space
 followed by version
 followed by a CRLF (not sure about this one, but could be nice)
 followed by a null byte
Yep, that's the sort of thing that I was thinking, though I'd actually 
leave the CRLF (or just an LF or CR, whatever), and take out the null 
byte. My reason for that is that this way, if your serialization format 
always spits out vanilla ASCII w/o control characters, suitable for 
consumption by some foreign C program, then the header won't change 
this. (That's one of the nice features of the tar format--a tar archive 
of ASCII text files is itself an ASCII text file, if I recall 
correctly.)
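
In other words, emitting and checking the header could be about this 
trivial (sketch only, with the field layout Elizabeth suggested minus the 
trailing NUL; the buffer sizes are arbitrary):

    #include <stdio.h>

    static void
    write_header(FILE *out, const char *type, const char *version)
    {
        fprintf(out, "parrot %s %s\n", type, version);
    }

    /* type and version must each point to at least 64 bytes. */
    static int
    check_header(FILE *in, char *type, char *version)
    {
        char line[256];
        if (!fgets(line, sizeof line, in))
            return 0;
        return sscanf(line, "parrot %63s %63s", type, version) == 2;
    }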

It could also be handy to allow additional "comment" text after the 
version (ignored by the deserialization, restricted to be ASCII w/o any 
CR or LF), because that would let you put in some human-readable 
comment to help out people trying to figure out what this file is. Some 
other formats do this, which is nice. Just another thought.

JEff



Re: Object freezing

2003-10-21 Thread Jeff Clites
On Oct 21, 2003, at 10:49 AM, Dan Sugalski wrote:

On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote:

At 12:53 -0400 10/21/03, Dan Sugalski wrote:
Yeah, if you're just needing to tag the stream with a label to 
indicate
 the type plus a version number, then xml's on the one hand 
overkill and
on the other hand not necessarily a big help to xml proponents.
So, in a nutshell, throwing an XML format type tag at the beginning 
buys
us nothing regardless of whether it's an XML stream or not?
Yep.  But mainly I think because you'll need to encode binary data to
make it valid XML.  That's an overhead you don't want to suffer for those
serializations that don't need it.
I had it in mind that the XML parsers were all event driven so they'd 
read
the header and stop until prodded, and wouldn't be prodded on if it 
wasn't
a real parrot XML serialization stream, so binary data wouldn't matter.
The event-based parsers (such as expat and other SAX parsers) tend to 
be push instead of pull, so you hand them your bytes and they invoke 
your callbacks (as opposed to pull-style in which you'd ask for the 
next event). Sometimes you can hand them your bytes in chunks (and 
they'll process what they can and save up the rest to include with your 
next chunk), so with expat (for example) you could probably do what you 
wanted, but you'd have to hand it data one byte at a time until your 
first callback was invoked, then stop. So it could probably be done 
with some parsers, but it would be an unusual usage, and more overhead 
than it's worth to just parse out a couple of strings. :)
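
For what it's worth, the byte-at-a-time trick would look roughly like this 
with expat (untested sketch):

    #include <expat.h>

    static void
    on_start(void *userData, const XML_Char *name, const XML_Char **atts)
    {
        (void)name; (void)atts;
        *(int *)userData = 1;    /* only the very first element matters */
    }

    /* Return 1 if a start tag shows up within the first 'len' bytes. */
    int
    sniff_header(const char *buf, int len)
    {
        int seen_root = 0;
        int i;
        XML_Parser p = XML_ParserCreate(NULL);

        XML_SetUserData(p, &seen_root);
        XML_SetStartElementHandler(p, on_start);

        for (i = 0; i < len && !seen_root; i++)
            XML_Parse(p, buf + i, 1, 0);    /* non-final, one-byte chunks */

        XML_ParserFree(p);
        return seen_root;
    }

Workable, but as I said, it's a lot of machinery just to pull a type and a 
version number off the front of a stream.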

JEff



Re: [RfC] and [PATCH]: Libraries

2003-10-21 Thread Jeff Clites
On Oct 15, 2003, at 4:52 AM, Juergen Boemmels wrote:

I spent the last day getting parrot running under Borland. The
attached patch is whats need to get linking and running make test on
both Windows/Borland and Linux/gcc. I'm not sure if its ready for
inclusion in the tree, but I want some feedback on the approach.
The main problem is that Borland can't build a single static library
(at least I did not find out how) with two files of the same name. But
there are some name clashes: intlist.o and classes/intlist.o, or
stacks.o and languages/imcc/stacks.o. I solved this by separating
libparrot into three partial libs: classes/classes.a containing all
object-files of classes/ ; languages/imcc/imcc.a containing all
object-files of imcc; and blib/lib/libparrot.a for all the rest. (These
names need cleanup; shouldn't they all go to blib/lib?). classes/ is
still built by its own Makefile; this should be integrated in the
root-Makefile, but that's another story.
Next problem is library interdependence. classes.a depends on
libparrot.a and libparrot.a depends on classes.a. This complicates
linking a bit. The GNU linker does not revisit previous files, so the
link line has to contain something like
libparrot.a classes.a libparrot.a
A new configure variable, parrot_libs, takes care of this.
Since no one else commented, I'll give you my two cents. I think there 
are 4 other options in addition to your proposal:

1) Find out for sure if Borland has a way to build this into a single 
library, if you're not 100% certain. Don't know if there are any 
Borland experts on the list.

2) Build separate libs, but only under Borland.

3) Rename the files with duplicate names so that they don't 
conflict--this might be worth doing anyway.

4) When building under Borland, make copies of the offending files 
under different names, and build those. (e.g., make a copy of 
languages/imcc/stacks.c called imcc-stacks.c, and build that instead)

I'd say that (4) is the cleanest (it only affects the one problematic 
environment, and you get to keep a single library everywhere), but as I 
said I think (3) might be useful anyway (just to make it easier to 
refer to files in conversation). Option (3) is also dead-easy, but we'd 
lose cvs history for the renamed files.

Having multiple libs isn't the end of the world, but it would be a 
shame to have to do it because of a particular compiler/linker quirk.

JEff



Re: Object freezing

2003-10-22 Thread Jeff Clites
On Oct 21, 2003, at 8:11 AM, Juergen Boemmels wrote:

If you think about it: The call to the destructors is done after
free_unused_pobjects completed. The memory of the objects without
destructors is already freed. If you are still in an out of memory
situation when the destructors are run, then it's also very likely that
you are in a not much better situation afterwards.
That's not quite right. You can't reclaim the memory of any object 
which is reachable from any object with a destructor which has not yet 
been called. But you're right that some memory can be reclaimed before 
calling destructors.

JEff



Re: Ordered destruction and object graph traversal

2003-10-27 Thread Jeff Clites
On Oct 27, 2003, at 6:21 AM, Dan Sugalski wrote:

On Fri, 24 Oct 2003, Gordon Henriksen wrote:

On Monday, October 20, 2003, at 11:40 , Jeff Clites wrote:

My solution was to define a new vtable method--I've called it 
visit(),
though the name's not the important part--to which you pass a 
callback
(plus an optional context argument). It's job is to invoke that
callback on each of it's referenced "children" (also passing the
context object with each invocation), and that's it. The PMC doesn't
know or care what the callback does--in a GC context the callback may
set a flag on what's passed in and stash it for later processing; in
another context it may be used inside of an iterator which vends out
objects one-by-one and guards against loops.
Great! This is exactly the fundamental operation I'd been musing would
be the best building block to enable serialization and pretty-print
operations, and I'm sad to see that your post has been Warnocked. :(
Allow me to haul out this bucket of ice-water I keep around for just 
such
an eventuality. :)

There's a low limit to the complexity of any sort of traversal we can
provide. We *can't* go recursive in a traversal, if it crosses the
C->Parrot or Parrot->C boundary as part of the recursion, or stays
entirely in C. We can only count on a 20-40K stack *total*, and it 
doesn't
take too many calls into a recursive procedure to eat that up. That 
limits
the mechanisms we can use to traverse the graph of objects pretty 
badly.
The linked list mechanism the DOD system is using is one of the few 
that
isn't recursive.

There are other mechanisms that can be used for traversal that avoid
recursion, but they generally trade off pretty heavy memory usage for 
that
lack, which makes them unsuitable for general use.
Who said anything about recursion?

JEff



Re: NULL Px proposal (was Re: [BUG] IMCC looking in P3[0] for 1st arg)

2003-10-27 Thread Jeff Clites
On Oct 26, 2003, at 10:39 AM, Melvin Smith wrote:

I think a compromise would be to define an interpreter-global PMCNull
and point (or init) all Px registers to it.
...
The downside is fast initialization of register blocks. memsetting 
with NULL (0)
will not be possible, but I'd have to actually go check and see if that
is really all that common.
If we determine that it makes a difference, we can store away a 
register block filled with our PMCNull, and memcpy (or memmove) from 
that to initialize a block. Should be nearly as fast.
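
Something along these lines (sketch; the template setup and the pmc_null 
field are stand-ins for wherever the shared PMCNull would actually live):

    #include <string.h>

    /* Filled once at startup with the interpreter's shared PMCNull. */
    static PMC *pmc_template[NUM_REGISTERS];

    void
    init_pmc_template(Interp *interpreter)
    {
        int i;
        for (i = 0; i < NUM_REGISTERS; i++)
            pmc_template[i] = interpreter->pmc_null;   /* placeholder name */
    }

    void
    init_pmc_registers(PMC **regs)
    {
        /* One memcpy instead of a loop or a memset to 0. */
        memcpy(regs, pmc_template, sizeof pmc_template);
    }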

JEff



Re: Garbage-collecting classes [was: This week's Summary]

2003-10-28 Thread Jeff Clites
On Oct 28, 2003, at 3:56 PM, Luke Palmer wrote:

  Object Instantiation
Dan had a moment of clarity and declared that the Parrot Way to
instantiate an object in class Foo will be:
  new P5, .Foo

All we need now is a working implementation. And, apparently, 
knowing
what class a class is a member of might be handy, but Dan's 
punting on
("ignoring the heck out of") that one.
(Yeah, I read the thread, but just thought of this now)

So classes will be integers?  Fair enough, as there's just as many of
those as there are memory locations.
I do worry about some long-running daemon that uses anonymous classes
heavily.  It's possible that after quite some time, we'd run out of
integers, even when there are far fewer than 4 billion classes in
current existence.
So, shouldn't classes instead be objects that can be garbage-collected?
I know that violates that nice equivalency between PMCs and classes, 
but
I'm sure there's a way to finesse the convenience of that equivalence
somehow...
I'd say it's more like class names are integers, so anonymous classes 
don't use them up since they don't have names. But that points out that 
there should be a way to instantiate a member of a class given the 
class object (itself a PMC), with the above syntax serving as a bit of 
a shorthand (possibly an optimized one) for finding the class object 
based on its name and then instantiating it.

(And it's not really that a class name is an int, but rather that the 
representation of the name spit out by a compiler would be an int.)

Just a thought.

JEff



Re: HASH changes

2003-11-06 Thread Jeff Clites
On Nov 6, 2003, at 10:05 AM, Juergen Boemmels wrote:

But I still have one problem: hash_put and hash_get are not
symmetric. hash_put puts a void *value, but hash_get returns a
HASHBUCKET. This looks wrong to me.
When returning the value directly, we couldn't have NULL values--or
rather, a NULL value and no entry at all would have the same return
result.
...
From the principle of least surprise, I suggest to let hash_get return
a value pointer directly and have the ambiguity of NULL value and no
entry, and have a hash_get_bucket which returns a pointer to the
bucket.
void * hash_get(Interp * interpreter, HASH *hash, void *key);
HASHBUCKET * hash_get_bucket(Interp * interpreter, HASH *hash, void 
*key);
And to amplify what you are saying, I think it's okay for hash_get to 
be ambiguous as to whether or not the key exists, as long as there is a 
separate method for that--hash_key_exists or something. (I can't think 
of a case in user code where you'd need to do both in one call--in 
perl5 for example "exists" is a separate operator from hash lookup.)
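
For example, with the split Juergen suggests, caller code would read 
something like this (sketch; hash_exists and the two handler names are 
hypothetical):

    static void
    lookup_example(Interp *interpreter, HASH *hash, void *key)
    {
        void *value = hash_get(interpreter, hash, key);

        if (value == NULL) {
            /* Ambiguous on its own: missing key, or a stored NULL?  When
             * the distinction matters, ask explicitly. */
            if (hash_exists(interpreter, hash, key))
                handle_null_value();
            else
                handle_missing_key();
        }
    }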

An alternative (which may or may not be a good one) would be to say that 
you can never actually store a NULL in a hash--at most you could store 
a PerlUndef or Null or whatever null-ish PMC. Then a NULL return 
unambiguously means that the key wasn't found. I could probably argue 
either for or against this.

JEff



Re: Re[2]: NCI Broken On Win32

2003-11-06 Thread Jeff Clites
On Nov 6, 2003, at 3:59 PM, Jonathan Worthington wrote:

From: "Mattia Barbon" <[EMAIL PROTECTED]>
Il Tue, 28 Oct 2003 18:51:56 +0100 Leopold Toetsch <[EMAIL PROTECTED]> ha
scritto:

Jonathan Worthington <[EMAIL PROTECTED]> wrote:
Hi,

loadlib P1, "user32.dll"
There are 2 errors in that, one is mine :)
- I didn't use the configure define of PARROT_DLL_EXTENSION
- Yours is: don't append ".dll" - parrot does it
  Somebody might wish to load something not named ".dll". For example
MS HTMLHelp API is located in a file suffixed ".ocx" (which exports
both an ActiveX control and raw C API).
This is true, and IMHO needs some thought.  I'm working on a library 
that
gives Parrot access to the entire Win32 API.  Some of them (related to
printing) are in a library called winspool.drv (note .drv, not .dll).
Parrot adds .dll onto the end of this and, of course, it cannot load 
it.
I'd be surprised if Win32 was the only OS where there was more than one
possible extension for a shared library.
Yeah, this would likely be a problem on Mac OS X as well, where 
loadable bundles can have a variety of extensions. I'm of the opinion 
that there's no very good reason to not just have the API take a real 
file name (including the extension).

I can imagine enforcing a per-platform extension for parrot bytecode 
libs specifically, so that you can write platform-independent user code 
which doesn't have to worry about how a parrot add-on lib is named, but 
then again it would be just as sensible to have a cross-platform 
convention for the extension, and just use the file name even here.

JEff



Re: Failing tests: t/pmc/eval.t 1-3 and t/pmc/sub.t 42-44

2003-11-08 Thread Jeff Clites
Pretty much the exact same results on Mac OS X. Here's the contents of  
myconfig:

Summary of my parrot 0.0.13 configuration:
  configdate='Sat Nov  8 18:40:54 2003'
  Platform:
osname=darwin, archname=darwin
jitcapable=1, jitarchname=ppc-darwin,
jitosname=DARWIN, jitcpuarch=ppc
execcapable=1
perl=perl
  Compiler:
cc='cc', ccflags='-g -pipe -pipe -fno-common -no-cpp-precomp  
-DHAS_TELLDIR_PROTOTYPE  -pipe -fno-common -Wno-long-double ',
  Linker and Libraries:
ld='cc', ldflags='  -flat_namespace ',
cc_ldflags='',
libs='-lm'
  Dynamic Linking:
so='.so', ld_shared=' -flat_namespace -bundle -undefined suppress',
ld_shared_flags=''
  Types:
iv=long, intvalsize=4, intsize=4, opcode_t=long, opcode_t_size=4,
ptrsize=4, ptr_alignment=4 byteorder=4321,
nv=double, numvalsize=8, doublesize=8

On Nov 8, 2003, at 6:22 PM, Bryan Donlan wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
I'm using Gentoo Linux on an intel celeron processor. When I do make  
test from
latest CVS, I get:

/usr/bin/perl t/harness --gc-debug --running-make-test  -b t/op/*.t  
t/pmc/*.t
t/native_pbc/*.t
t/op/00ff-dos..ok
t/op/00ff-unix.ok
t/op/arithmetics...ok
t/op/basic.ok
t/op/bitwise...ok
t/op/comp..ok
t/op/conv..ok
t/op/debuginfo.ok
3/3 skipped: getline/setline changes not finished
t/op/gcok
t/op/globals...ok
t/op/hacks.ok
2/2 skipped: no events yet
t/op/ifunless..ok
t/op/info..ok
t/op/integer...ok
t/op/interpok
t/op/jit...ok
t/op/jitn..ok
t/op/lexicals..ok
t/op/macro.ok
1/16 skipped: Await exceptions
t/op/numberok
t/op/rxok
1/23 skipped: Pending some sort of lowercasing op
t/op/stacksok
t/op/stringok
t/op/time..ok
t/op/trans.ok
t/op/types.ok
t/pmc/arrayok
t/pmc/boolean..ok
t/pmc/coroutineok
t/pmc/env..ok
t/pmc/eval.# Failed test (t/pmc/eval.t at line 8)
#  got: 'back again
# '
# expected: 'in eval
# back again
# '
# Failed test (t/pmc/eval.t at line 20)
#  got: '40
# '
# expected: '42
# '
# Failed test (t/pmc/eval.t at line 33)
#  got: 'hello
# '
# expected: 'hello parrot
# '
# Looks like you failed 3 tests of 3.
dubious
	Test returned status 3 (wstat 768, 0x300)
DIED. FAILED tests 1-3
	Failed 3/3 tests, 0.00% okay
t/pmc/exceptionok
t/pmc/floatok
t/pmc/intlist..ok
t/pmc/io...ok
1/21 skipped: clone not finished yet
t/pmc/iter.ok
1/9 skipped: N/Y: get_keyed_int gets rest of array
t/pmc/key..ok
t/pmc/managedstructok
t/pmc/mmd..ok
t/pmc/multiarray...ok
t/pmc/nci..i386 JIT CPU
.so SO extension
ok
t/pmc/objects..ok
t/pmc/orderedhash..ok
t/pmc/perlarrayok
t/pmc/perlhash.ok
t/pmc/perlint..ok
t/pmc/perlnum..ok
t/pmc/perlstring...ok
t/pmc/pmc..ok
t/pmc/prop.ok
t/pmc/ref..ok
t/pmc/sarray...ok
t/pmc/scratchpad...ok
t/pmc/sub..# Failed test (t/pmc/sub.t at line 512)
#  got: 'main
# loaded
# Global '_sub1' not found
# '
# expected: 'main
# loaded
# found sub
# in sub1
# '
# './parrot --gc-debug -b t/pmc/sub_42.pasm' failed with exit code 1
# Failed test (t/pmc/sub.t at line 541)
#  got: 'main
# loaded
# Global '_sub1' not found
# '
# expected: 'main
# loaded
# found sub
# in sub1
# back
# '
# './parrot --gc-debug -b t/pmc/sub_43.pasm' failed with exit code 1
# Failed test (t/pmc/sub.t at line 574)
#  got: 'main
# loaded
# Global '_sub1' not found
# '
# expected: 'main
# loaded
# found sub1
# in sub1
# back
# found sub2
# in sub2
# back
# in sub1
# back
# '
# './parrot --gc-debug -b t/pmc/sub_44.pasm' failed with exit code 1
# Looks like you failed 3 tests of 47.
dubious
	Test returned status 3 (wstat 768, 0x300)
DIED. FAILED tests 42-44
	Failed 3/47 tests, 93.62% okay
t/pmc/timerok
t/native_pbc/numberok
1/3 skipped: core ops changes
Failed 2/56 test scripts, 96.43% okay. 6/954 subtests failed, 99.37%  
okay.
Failed Test  Stat Wstat Total Fail  Failed  List of Failed
-  
--- 

t/pmc/eval.t3   768 33 100.00%  1-3
t/pmc/sub.t 3   768473   6.38%  42-44
10 subtests skipped.
make: *** [testb] Error 29

Updated from cvs with:
 cvs -z3 update -dP && make realclean && perl Configure.pl && make
Compiler flags are reported as:
- -DPERL5 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -g -Dan_Sugalski  
-Larry
- -Wall -Wstrict-prototypes -Wmissing-prototypes -Winline -Wshadow
- -Wpointer-arith -Wcast-qual -Wcast-align -W

Re: HASH changes

2003-11-09 Thread Jeff Clites
On Nov 6, 2003, at 11:56 PM, Leopold Toetsch wrote:

Jeff Clites <[EMAIL PROTECTED]> wrote:

An alternative (which may or may not a good one) would be to say that
you can never actually store a NULL in a hash
Not good. The Hash (now) may take almost arbitrary key/value pairs.
Value can be plain ints ...
Okay, that makes sense then. So is the current implementation 
incomplete? I ask because it looks like the mark functionality won't 
yet quite deal with non-PObj values, etc. Also, since key and value are 
now void*'s, there'll be a problem on platforms where sizeof(void *) != 
sizeof(other types), right? (In particular, doubles will usually be 
larger than pointers.)

JEff



Re: [CVS ci] hash compare

2003-11-13 Thread Jeff Clites
On Nov 13, 2003, at 2:21 PM, Nicholas Clark wrote:

On Wed, Nov 12, 2003 at 02:07:52PM -0800, Mark A. Biggar wrote:

And even when the sequence of Unicode code-points is the same, some
encodings have multiple byte sequences for the same code-point.  For
example, UTF-8 has two ways to encode a code-point that is larger than
0xFFFF (Unicode has code-points up to 0x10FFFF), as either two 16 bit
surrogate code points encoded as two 3 byte UTF-8 code sequences or as
a single value encoded as a single 4 or 5 byte UTF-8 code sequence.
Is it legal to encode surrogate pairs as UTF8? Or does that count as
malformed UTF8?
No, it's not legal. As of Unicode 3.2, it's not permissible to encode a 
non-BMP (that is, code point > 0xFFFF) character in UTF-8 via two 
3-byte UTF-8 sequences. There is another encoding which does this, 
called CESU-8, but I don't think it's really ever used.
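
For reference, a strict decoder can spot that pattern directly: a 3-byte 
UTF-8 sequence with lead byte 0xED and a second byte in 0xA0-0xBF decodes 
into the surrogate range U+D800-U+DFFF. A minimal standalone C sketch, 
purely illustrative and not tied to string.c:

    #include <stdio.h>

    /* Minimal sketch: a 3-byte UTF-8 sequence whose lead byte is 0xED and
     * whose second byte is 0xA0..0xBF decodes into the surrogate range
     * U+D800..U+DFFF, so a strict UTF-8 decoder rejects it. */
    static int is_utf8_encoded_surrogate(const unsigned char *p) {
        return p[0] == 0xED
            && p[1] >= 0xA0 && p[1] <= 0xBF
            && (p[2] & 0xC0) == 0x80;
    }

    int main(void) {
        const unsigned char cesu8_half[] = { 0xED, 0xA0, 0x81 };  /* U+D801 */
        const unsigned char legit[]      = { 0xE6, 0x97, 0xA5 };  /* U+65E5 */
        printf("%d %d\n", is_utf8_encoded_surrogate(cesu8_half),
                          is_utf8_encoded_surrogate(legit));      /* 1 0 */
        return 0;
    }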

JEff



Re: Darwin issues

2003-11-14 Thread Jeff Clites
Hi Chris:

I haven't had any problems such as this on Mac OS X--either 10.2.6 or 
10.3. What are the contents of your "myconfig" file? Here are the 
contents of mine, for comparison:

Summary of my parrot 0.0.13 configuration:
  configdate='Fri Nov 14 18:23:39 2003'
  Platform:
osname=darwin, archname=darwin
jitcapable=1, jitarchname=ppc-darwin,
jitosname=DARWIN, jitcpuarch=ppc
execcapable=1
perl=perl
  Compiler:
cc='cc', ccflags='-g -pipe -pipe -fno-common -no-cpp-precomp 
-DHAS_TELLDIR_PROTOTYPE  -pipe -fno-common -Wno-long-double ',
  Linker and Libraries:
ld='cc', ldflags='  -flat_namespace ',
cc_ldflags='',
libs='-lm'
  Dynamic Linking:
so='.so', ld_shared=' -flat_namespace -bundle -undefined suppress',
ld_shared_flags=''
  Types:
iv=long, intvalsize=4, intsize=4, opcode_t=long, opcode_t_size=4,
ptrsize=4, ptr_alignment=4 byteorder=4321,
nv=double, numvalsize=8, doublesize=8

I'm thinking that there must be something else causing your problem, as 
there are, in general, no problems building parrot on Mac OS X--for me, 
parrot builds without modification on Mac OS X, and (currently) passes 
all tests via the standard "perl Configure.pl; make; make test" 
sequence. (I'm currently using gcc 3.3 on Panther, but I believe I was 
using gcc 3.1 previously on Jaguar. I see your gcc is marked 
prerelease--it may be that you need an updated version of the dev. 
tools.)

JEff

On Nov 14, 2003, at 5:29 PM, [EMAIL PROTECTED] wrote:

Apple shipped a linker that doesn't work well with a lot of projects 
unless they recognize it.  It requires that the link phase of any 
c/c++ compilation add a -lcc_dynamic flag.  I was able to do a manual 
compilation of many things by adding that flag, but that gets to be 
tedious.  When I was looking for a good spot to type the flag in once 
and then have it just work, I tried the main Makefile's LD_FLAGS setting 
and others to no avail...  I have just started with parrot so I don't 
know my way around well enough yet to get a feel for that.

After finishing the main `make` process, I went on to try `make 
test`... It passed most of them except the ones in t/src/ because they 
also needed -lcc_dynamic

If you need more information about my system so you can write tests 
with Darwin in mind, just holler.

btw:
gcc (GCC) 3.1 20020420 (prerelease)
Mac OS version 10.2.8 (iBook 800Mhz G3, 512MB RAM)
Chris




IMCC problems with library loading

2003-11-16 Thread Jeff Clites
I've run into a couple of issues with library loading which have their 
origin down inside the IMCC code:

1) External libraries are being loaded at parse time.

Inside of INS() in imcc/parser_util.c, Parrot_load_lib() is called at 
parse-time when loadlib is encountered. This is causing libraries to be 
loaded twice (once at parse-time and once at run-time), which is a 
problem in its own right, but it also just seems like generally the 
wrong behavior.

2) Code which tries to instantiate a dynamically-loaded PMC fails to 
parse.

Code such as this:

loadlib P1, "foo"
new P0, .Foo
fails to parse, because ".Foo" is being interpreted as a macro. For 
example:

% ./parrot dynclasses/dynfoo.pasm
error:imcc:unknown macro '.Foo'
in file 'dynclasses/dynfoo.pasm' line 7
The problem seems to be the following code, which interprets ".Blah" as 
a macro if "Blah" isn't an already-registered type:

imcc/imcc.l line 337:

{DOT}{LETTER}{LETTERDIGIT}* {
    int type = pmc_type(interp, string_from_cstring(interp, yytext+1, 0));

    if (type > 0) {
        char *buf = malloc(16);
        sprintf(buf, "%d", type);
        valp->s = buf;
        return INTC;
    }
    if (!expand_macro(valp, interp, yytext+1))
        fataly(1, sourcefile, line, "unknown macro '%s'", yytext);
}
I can't find any definitive docs on the macro syntax, so I don't know 
if this is pointing out a true ambiguity in the grammar (since 
dynamically-loaded PMC types won't be known at parse-time, before the 
library defining them is loaded), or if macro usages are always 
supposed to have argument-syntax, such as ".Blah()", which would mean 
that this is just a problem with the parser. There's no example in 
t/op/macro.t without parentheses, so I'm thinking it's the latter.

JEff



Re: IMCC problems with library loading

2003-11-17 Thread Jeff Clites
On Nov 17, 2003, at 1:59 AM, Leopold Toetsch wrote:

Jeff Clites <[EMAIL PROTECTED]> wrote:
I've run into a couple of issues with library loading which have their
origin down inside the IMCC code:

1) External libraries are being loaded at parse time.

Inside of INS() in imcc/parser_util.c, Parrot_load_lib() is called at
parse-time when loadlib is encountered. This is causing libraries to 
be
loaded twice (once at parse-time and once at run-time), which is a
problem in its own right, but it also just seems like generally the
wrong behavior.
If the library registers itself correctly, its loaded once only. E.g. a
PMC library calls pmc_register() so that the classname is known.
Actually, Parrot_load_lib calls Parrot_dlopen (inside of the call to 
get_path), then later calls is_loaded and closes the library if it was 
already loaded before. This is loading the library twice, from the 
perspective of the operating system. But that's really a side issue.

My main point is that you can't do conditional library loading. This 
code will try to load the "doesnt_exist" library, and I don't think it 
should:

    branch  HERE
    loadlib P1, "doesnt_exist"
HERE:
    end
It's not because the branch fails--it's because the library is loaded 
before the script starts running.

Or is this expected--that a loadlib instruction will cause a library to 
be loaded, even if the instruction is never reached?

2) Code which tries to instantiate a dynamically-loaded PMC fails to
parse.

Code such as this:

 loadlib P1, "foo"
 new P0, .Foo
How does "foo" look like? dynclasses/dynfoo.pasm has an example of
loading dynamic classes (and it works here).
dynclasses/dynfoo.pasm is the example I was referring to. It actually 
works for me also--if I leave the code in imcc/parser_util.c which 
causes the parse-time library loading mentioned above. (I had commented 
that out.)

But rearranging the assembly slightly so that the execution order is 
logically the same but the loadlib instruction is further down in the 
source causes it to fail. That is:

    null P0
    branch one
two:
    find_type I0, "Foo"
    print I0
    print "\n"
    new P0, .Foo
    print "ok 2\n"
    branch three
one:
    loadlib P1, "foo"
    print "ok 1\n"
    branch two
three:
    set I0, P0
    print I0
    print "\n"
    end
This again fails with the error:

	error:imcc:unknown macro '.Foo'

This is because the parser is encountering ".Foo" before it has loaded 
the lib at parse-time.

This brings me back to my original question of whether the parser 
should think ".Foo" is a macro at all, or if rather that's not legal 
macro syntax.

JEff



Re: Proposal: parrot-compilers list

2003-11-18 Thread Jeff Clites
On Nov 17, 2003, at 11:22 AM, Melvin Smith wrote:

In the past couple of years we've seen several sub-projects pop-up
and subsequently fizzle out (maybe due to Parrot slow
progress or maybe due to lack of critical mass).
I propose creating 'parrot-compilers' as a general
purpose list for any and all language development
...

So I'll be one of the few nay-sayers. I'm not definitely against it, 
but here are two counter-arguments:

1) It's likely that we've seen very little traffic (at least recently) 
about language development because either as you said Parrot isn't 
quite ready, or because of lack of interest. If so, a new list won't 
help.

2) Forking a list almost always leads to an increase of messages of the 
form "you should be asking this on the other list...", and/or of 
cross-posting, neither of which is very interesting to deal with. I'm 
of the opinion that unless traffic is unmanageable, fewer lists is 
better. So I think a fair question to ask is: How many people currently 
subscribed to this list would not subscribe to both? For those who 
subscribe to both, splitting wouldn't be an improvement.

I doubt that the name of the current list is actually a big problem, 
because no matter what the name is, you'll find out about the list by 
reading a blurb about it somewhere, which will explain why it's named 
this way. Those who are put off because they hate Perl are probably not 
going to like the content of the discussion either, because it will 
constantly come up.

Again, I'm not dead-set against it--I just don't think it would improve 
anything.

JEff



Re: IMCC problems with library loading

2003-11-18 Thread Jeff Clites
On Nov 17, 2003, at 11:07 AM, Leopold Toetsch wrote:

Jeff Clites <[EMAIL PROTECTED]> wrote:

My main point is that you can't do conditional library loading. This
code will try to load the "doesnt_exist" library, and I don't think it
should:

 branch  HERE
 loadlib P1, "doesnt_exist"
HERE:
 end

It's not because the branch fails--it's because the library is loaded
before the script starts running.
It depends. When you run it immediately, you are right: the library is
loaded at compile time. When you compile it to .pbc and run that, the lib
isn't loaded until runtime.
*But*, when writing that stuff, one premise was that the compiler sees
all libs in the same sequence as the runtime does. So the above code (or
your second example) is illegal.
Okay for now then. But long-term it seems pretty obvious that this will 
be too restrictive. Here are some specific problems:

1) For non-parrot libs (like the ncurses lib. used in Dan's example), 
it's not necessary to load them at parse-time. It's a waste in terms of 
performance, but also not all libraries can be unloaded on all 
platforms, and some libraries will cause behavior changes that aren't 
intended to take effect until later. This is an argument for the 
distinction we used to have with different ops to load different types 
of libraries. We may have been a bit hasty in unifying them all.

2) A program may act differently when run from source v. run from .pbc, 
because different libraries may be loaded.

3) You can't do conditional loading of the form "if some condition is 
met, load library A which implements the PMC I need, else load a 
different library B which implements it". For instance, you might want 
to choose at runtime between loading a debug and a non-debug version of 
some library. Currently I'd imagine that we'd end up loading both at 
compile-time, which is probably guaranteed not to work correctly.

This brings me back to my original question of whether the parser
should think ".Foo" is a macro at all, or if rather that's not legal
macro syntax.
This doesn't change the problem per se.
It does in that the current parser behavior blocks finding out what the 
real issue is.

Also, it's still a legitimate question, so for the third time I'll ask 
if ".Foo" is valid macro syntax? If not, then the current parser 
behavior is misleading. I'm of the opinion that it should not be valid 
macro syntax--that ".Foo()" should be necessary instead, because 
otherwise the grammar is ambiguous.

JEff



Re: Proposal: parrot-compilers list

2003-11-18 Thread Jeff Clites
On Nov 18, 2003, at 9:07 AM, Sterling Hughes wrote:

The reason I think parrot-compilers would be useful, is that its 
dedicated to helping people (like me) write compilers for parrot, 
whereas (in my understanding), perl6-internals@ is really about the 
development of the vm itself (I would subscribe to both).  I see 
parrot-compilers@ as opening up more questions from people struggling 
with the current apis than perl6-internals@ - at least I'll post more. 
:)
Okay, fair enough--I could see how currently people might think 
compiler-development questions are off-topic.

JEff



Re: Freeze checkin

2003-11-19 Thread Jeff Clites
On Nov 19, 2003, at 9:04 AM, Dan Sugalski wrote:

Just a quick heads-up--I checked in the preliminary patch for 
freeze/thaw
that Leo sent me for review. It'll change internally a fair amount, and
the vtable/low-level API is going to change, but the op-level interface
will be stable. I wanted it in before things diverged any further, 
though,
so feel free to play but expect it to change rather a lot.
Two initial concerns:

1) I have a patch which I've been assembling to do ordered destruction. 
That needs to use the next_for_GC pointer (and I think any alternate 
implementation would need to as well). But Parrot_freeze_at_destruct() 
uses next_for_GC. I assume Parrot_freeze_at_destruct is intended to be 
called from a destroy() method; if so, there's a problem, and we'd need 
2 next_for_GC-ish pointers. We certainly can't have a method intended 
to be called only at destruction-time which is incompatible with proper 
ordered destruction

2) For the same patch, I'm using something very similar to the 
vtable->visit method used in the new freeze/thaw code (and actually my 
method was called vtable->visit as well, which may be where Leo got the 
name, since I had mentioned this before...). But the ordered-GC code 
needs to know about "children" in the sense of any referenced PObj (ie, 
children defined by what needs to be marked in a GC run), whereas the 
freeze code needs to know about only "logical children" (ie, only those 
which are intended to be frozen when the "parent" is frozen). So I 
would modify my ordered GC code to use Leo's vtable->visit, but it has 
a different notion of what children to visit, so I think we are either 
going to end up with two very-similar-but-different vtable methods, or 
we'll need to extend vtable->visit to allow it to serve both purposes 
(either by calling visit with a flag which tells it which type of 
visitation to do, or by instead having a flag which is passed to the 
visit_child_function to tell it whether this is a "logical" child or 
merely a referenced object). Off the top of my head I think that 
"freeze-children" will always be a subset of "mark-children", so 
passing a flag to vtable->visit is probably cleaner.
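
A minimal standalone C sketch of the flag idea--one visit routine, with 
the caller saying whether it is traversing for GC or for freeze. All the 
names here are made up for illustration; this is not the actual vtable 
code:

    #include <stdio.h>

    /* Minimal sketch: the caller says why it is traversing, and a PMC's
     * visit() reports cache-like references only for GC, not for freeze. */
    typedef enum { VISIT_FOR_GC, VISIT_FOR_FREEZE } visit_what;

    typedef void (*visit_child_fn)(void *child, visit_what what);

    struct fake_array {
        void *elements[2];   /* logical contents: frozen and marked  */
        void *lookup_cache;  /* referenced only: marked, not frozen  */
    };

    static void array_visit(struct fake_array *self, visit_what what,
                            visit_child_fn cb) {
        int i;
        for (i = 0; i < 2; i++)
            cb(self->elements[i], what);
        if (what == VISIT_FOR_GC)        /* freeze skips the cache */
            cb(self->lookup_cache, what);
    }

    static void count_cb(void *child, visit_what what) {
        (void)child;
        puts(what == VISIT_FOR_GC ? "visited for GC" : "visited for freeze");
    }

    int main(void) {
        struct fake_array a = { { (void *)1, (void *)2 }, (void *)3 };
        array_visit(&a, VISIT_FOR_GC, count_cb);      /* 3 children visited */
        array_visit(&a, VISIT_FOR_FREEZE, count_cb);  /* 2 children visited */
        return 0;
    }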

JEff



Re: Freeze checkin

2003-11-19 Thread Jeff Clites
On Nov 19, 2003, at 1:34 PM, Leopold Toetsch wrote:

Jeff Clites <[EMAIL PROTECTED]> wrote:
On Nov 19, 2003, at 9:04 AM, Dan Sugalski wrote:

Two initial concerns:

1) I have a patch which I've been assembling to do ordered 
destruction.
That needs to use the next_for_GC pointer (and I think any alternate
implementation would need to as well). But Parrot_freeze_at_destruct()
uses next_for_GC. I assume Parrot_freeze_at_destruct is intended to be
called from a destroy() method; if so, there's a problem, and we'd 
need
2 next_for_GC-ish pointers.
I don't know yet, when and from where freeze_at_destruct() is run.
Yeah, that's not clear to me either. (And I think that many other 
languages/environments have concluded that trying to archive at 
destruction time turns out not to be a good idea, but that's a separate 
issue.)

It seems, that objects are already dead then. That could mean, that
destruction ordering is run before.
But, even if these 2 get run at the same time, it could be possible to
share the next_for_GC pointer - ordering could (if possible) skip some
PMCs that don't need ordering.
It's possible that they could be made to work in parallel--that for a 
particular object we run the freeze right before we run the destroy. I 
don't think we could archive after destruction (since destruction by 
design will be tearing down the object and may destroy the state we are 
trying to archive).

... So I
would modify my ordered GC code to use Leo's vtable->visit, but it has
a different notion of what children to visit,
... which, AFAIK isn't carved in stone yet. We don't have a notion for
logical children that need ordered destruction. An array holding some
scalar PMCs can get destructed in any order.
I'm not thinking of an array controlling the destruction ordering of 
its children--I'm talking about the need to destroy() the array itself 
before destroy()ing its children (that's just what "ordered 
destruction" means). What I'm really worried about here is the case 
where something like an array may have a reference to some PMC which 
its using as a cache, but which is not supposed to be archived along 
with the array (because it's not part of the "contents" of the array). 
The GC traversals need to know about this reference, but freezing does 
not.

We first should separate destruction (i.e. mainly freeing system memory
or maybe closing other resources) and the more HLL-like finalizing,
which might depend on other objects to get finalized first. This needs
IMHO another vtable, that if present (not default), is a sign for
"destruction ordering" or finalizing.
I've been assuming that these two are tied--that for an HLL object, 
whatever PMC is used to implement it would be responsible for calling 
the HLL DESTROY() from the PMC's vtable->destroy. If not, then I'm 
pretty confused--because any recycling of objects is going to need a 
global view of what is reachable, and of what things reference what 
other things. But I may be missing something, so I'm anxious to hear 
your thoughts about this.

... so I think we are either
going to end up with two very-similar-but-different vtable methods, or
we'll need extend vtable->visit to allow it to serve both purposes
(either by calling visit with a flag which tells it which type of
visitation to do,
The visit vtable passes all "visited" PMCs on to a callback, which puts
these on some kind of a TODO list. visit, when called via freeze is
different to visit called from thaw. So destruction ordering can setup
just its on visit_todo_list() and work on from there. If that's not
enough, the visit vtable has all the info, that may be needed in the
visit_info structure.
So to take my "cache" example from above, destruction ordering would 
need for the visit_child_function to be called on the cache object, and 
freeze would need for it to _not_ be. Or, it could always be passed, 
but then the visit_child_function would need to be able to know "this 
object is referenced but not logically contained". Either way, it's the 
vtable->visit which has the information--the question will just be how 
it "communicates" it to the visit_child_function.

... or by instead having a flag which is passed to the
visit_child_function to tell it whether this is a "logical" child or
merely a referenced object).
I need for freeze/thaw another visit_info structure member called
possibly "void *extra" anyway, to be able to e.g. call back to the
original PMC to freeze/thaw native data or to handle sparse arrays.
While I'd rather have the visit vtable free of such functionality, it
seems not achievable. So if destruction ordering has to pass some info,
we'll just do it.
As I had originally thought of it, the signature was going to be:

	vtable->visit(visit_child

Re: cvs commit: parrot/config/gen/makefiles root.in

2003-12-01 Thread Jeff Clites
On Dec 1, 2003, at 8:11 AM, Andy Dougherty wrote:

On Mon, 1 Dec 2003, Leopold Toetsch wrote:

Andy Dougherty <[EMAIL PROTECTED]> wrote:

I can think of one other potential problem:  What if the user wants 
to
mount the source read-only?  Last I checked, except for this issue, 
it was
possible to do so.  For perl5, this ability was often requested.
We are generating all kinds of files inside the parrot directory. I
don't know what you mean by "mount the source read-only". If you
mean the filesystem that the parrot source is on, you can't build
it.
Sorry, I guess I should have been more explicit.  What folks 
apparently do
is have the sources on an NFS server mounted read-only.  They then 
build a
symlink tree onto a local read-write filesystem and run the build 
there.
Thus they can create all the new files they want, but they can't touch 
or
change the source files.  This is presumably helpful because other 
users
may be simultaneously mounting those same source files and may be
expecting them to remain unchanged from the distributed version.

perl5's build system is supposed to support this in two ways:  First,
Configure contains code to detect when it's being run from a different
directory than that which contains the source.  The Makefiles are 
supposed
to then do the right thing.  I haven't touched that stuff in a long 
time,
and it would surprise me if it actually worked.  Second, perl5's 
Configure
supports a -Dmksymlinks option which makes the symlink tree for you.

These features were added as a result of a number of user requests. I
vaguely recall it showing up in the archives under the description of
"self-modifying source". I have since found it useful and have used the
-Dmksymlinks feature regularly to test builds on multiple architectures
(or with different options).
As an extension of this, I think it would be useful to have a "build" 
directory (configurable, defaulting to a "build" subdirectory of the 
distro root) as a location for all intermediate (*.o, etc.) and 
generated files to live. That's even simpler than a symlink tree, and 
in addition to allowing read-only source has some other nice features: 
you can "make clean" by deleting a single dir, and you can grep the 
sources easily without having to take steps to avoid the generated 
files. I've found this sort of thing useful in the context of other 
projects.

JEff



Re: [RfC] Fix assigned PMC enum_class_xxx

2003-12-02 Thread Jeff Clites
On Dec 1, 2003, at 9:01 PM, Melvin Smith wrote:

At 10:40 AM 11/28/2003 +0100, Leopold Toetsch wrote:
As outlined some time ago, when ops.num made it into the core, we 
need fix assigned PMC class enums too. (Changed class enums 
invalidate existing PBC files).

1) lib/Parrot/PMC.pm is the canonical source of PMC class => enum 
mapping.
2) the class enums should be numbered so that "base" classes come 
first and dependent classes later.
3) If we keep the current scheme for Const$class generation and 
switching, we have to use this numbering scheme:
   default ...  implicitly #0
   odd enums ... plain class
   even enums ... Const variant of class
4) Where config/* now have $pmc_names, %pmc_types is used.
5) Adding a new class starts with editing PMC.pm. If the current 
numbering can not be kept, PBC_COMPAT gets an entry too and thus 
invalidates existing PBCs.
6) The interactive step of selecting PMCs is IMHO unneeded. It's error 
prone, rarely used PMCs can be dynamically loaded.
7) We probably have to reserve some slots for Perl5-classes

Comments welcome
An additional thought. I feel there is no need to expose the enumerated
values to user-code (bytecode or native methods). Looking up PMCs 
isn't really
any different than looking up other symbols and could be fixed up at 
load time.

This does away with any ordering or numeric range reservation issues.

If I recall, Java stores class references as UTF encoded strings
of the full path to the class [java/lang/foo]
Yes, ObjC does something similar. At compile-time, class names in the 
code become references into a per-library lookup table (stored in a 
separate segment alongside the constant string segment), and that 
lookup table is filled in a library-load-time with a reference to the 
actual class object (or a unique int representing it). (This is roughly 
the approach, at least.) So 3 attractive options here for parrot for 
PMCs (and also for methods and classes) are (1) do the same thing, or 
(2) at compile-time represent PMC names as strings in the bytecode, and 
do a fixup run to replace these with ints (or whatever) at load time, 
or (3) have an op which does the lookup by name, and which morphs 
itself into an op which holds a direct reference, the first time it is 
called.

To me, the 1 -> 2 -> 3 progression is very much like the evolution of 
predereferencing.
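
Here's a minimal standalone C sketch of option 3--an op that resolves the 
class name once and caches the result. The lookup function and the 
cached-op struct are purely illustrative, not Parrot's real data 
structures:

    #include <stdio.h>
    #include <string.h>

    /* Stand-in for a real by-name class table lookup. */
    static int lookup_class_id(const char *name) {
        return (int)strlen(name);   /* pretend ids, for illustration */
    }

    typedef struct {
        const char *name;   /* from the constant table      */
        int         id;     /* resolved on first execution  */
        int         known;  /* 0 until the first lookup     */
    } find_type_op;

    static int run_find_type(find_type_op *op) {
        if (!op->known) {                 /* slow path: first hit only */
            op->id = lookup_class_id(op->name);
            op->known = 1;
        }
        return op->id;                    /* fast path afterwards */
    }

    int main(void) {
        find_type_op op = { "Foo", 0, 0 };
        printf("%d %d\n", run_find_type(&op), run_find_type(&op));
        return 0;
    }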

JEff



Re: More object stuff

2003-12-04 Thread Jeff Clites
On Dec 3, 2003, at 12:03 PM, Dan Sugalski wrote:

At 12:17 PM +0100 12/3/03, Leopold Toetsch wrote:
Dan Sugalski <[EMAIL PROTECTED]> wrote:

 > *) Exceptions we're happy with

Create an Exception class hierarchy?
I'm not 100% sure I want to go with a real OO style of exceptions, but 
that might just be the neo-luddite in me.
I'm of the opinion that it's a conceptual mistake to model exceptions 
as objects. My main concrete argument is that it's an abuse of 
inheritance--you need a hierarchy of exception reasons, but using 
inheritance just to get a hierarchy is silly. (That is, it doesn't make 
sense to subclass if you are not changing behavior in some way.) In 
addition, I think it's overly restrictive.

Fundamentally, I think raising an exception requires two things: (1) 
some indication of "reason" for the exception (used by the exception 
infrastructure to select the correct handler), and (2) a way for the 
originator of the exception to pass some arbitrary information to the 
handler (this information being opaque to the exception 
infrastructure).

In C terms, that would mean that the signature is conceptually like:

	raise_exception(Reason reason, void* info);

which in Parrot terms would probably shake down to:

	raise_exception(Reason reason, PMC* info);

The "Reason" data type could either be a structured string, such as 
"parrot.io.staleFileHandle", or some sort of array of strings--just 
enough to indicate a hierarchy to be used for selecting a handler.

The key point here is that the "reason" parameter is what is needed by 
the exception infrastructure to choose the correct handler (and to 
decide what to do if no handler is found), and "info" just has to be of 
a form agreed upon by the "thrower" and the "catcher".
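
A minimal standalone C sketch of what handler selection against such a 
dotted reason string could look like--matching on whole components, so 
"parrot.io" catches "parrot.io.staleFileHandle" but not 
"parrot.iox.other". The names are illustrative only, not a proposed API:

    #include <stdio.h>
    #include <string.h>

    /* A handler registered for "parrot.io" matches any reason in that
     * subtree, but only on whole dot-separated components. */
    static int reason_matches(const char *handler, const char *reason) {
        size_t n = strlen(handler);
        if (strncmp(handler, reason, n) != 0)
            return 0;
        return reason[n] == '\0' || reason[n] == '.';
    }

    int main(void) {
        printf("%d\n", reason_matches("parrot.io",
                                      "parrot.io.staleFileHandle")); /* 1 */
        printf("%d\n", reason_matches("parrot.io",
                                      "parrot.iox.other"));          /* 0 */
        return 0;
    }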

This sort of approach would work with languages such as Java which 
expect exceptions to be objects--in this case, the Java exception 
object would be passed as the "info", and the "reason" would be derived 
from the type of that object.

For simpler languages (and possibly even for Perl6), "info" could just 
be a hash--no real need for a specialized type of object.

I think the core of my thinking here is that exceptions don't really 
have behavior--at most they are a bag of data (which is what hashes are 
usually good for), and really they are a process for jumping the flow 
of execution, and sending along a bag of data to indicate why it just 
happened.

That was a bit long-winded, but basically I'm saying that I'd like to 
see the pasm for raising an exception take the form:

	raise Px, .parrot.io.StaleFileHandle

where Px may actually be null.

No matter what approach we take, we're going to have issues throwing 
exceptions across language boundaries (ie, Perl exception handled by 
Python code), since different languages aren't going to agree on how to 
classify exceptions--and having exceptions be HLL objects seems to make 
this worse. Actually, in a sense this _might_ not be a problem: If we 
supply a decent reason hierarchy, then using our supplied parrot.* 
exception "reasons" would allow an exception thrown in one language to 
be handled in another; if a language-specific reason is used (e.g., 
perl.blah), then it's probably going to only end up being handled by 
code in the same language. (And that seems fine--even in Java if I 
create my own Exception subclass, then code I haven't written isn't 
expected to be catching that type of exception.) But having some 
standard types will make it possible to cross language boundaries 
meaningfully.

Just some thoughts.

One other thing: Long-term, it would be good to avoid having an 
exception handler stack which we push and pop as we do now, and instead 
use a variant of the zero-overhead approach used by C++ (and probably 
Java), wherein try-catch blocks incur zero execution overhead--CPU time 
is only consumed when an exception is actually thrown, and not when a 
"try" block is entered or exited.

JEff



Re: [RFC] IMCC pending changes request for comments

2003-12-04 Thread Jeff Clites
On Dec 2, 2003, at 8:49 PM, Melvin Smith wrote:

At 07:59 PM 12/2/2003 -0800, Steve Fink wrote:
On Dec-02, Melvin Smith wrote:

Do you really want to generate the extra unused code at the end of all
the subroutines? I don't think you want to autodetect whether the code
is guaranteed to return.
Good point. It's easy to detect returns if the code uses high-level IMC
directives, but in cases where it is all low-level PASM, it could get
a little troublesome. It would also add a few dead instructions here 
and there.
Nix that idea.
You could take the approach of putting in the return code unless you 
can tell for certain that it couldn't be reached, the idea being that 
this would lead to some unreachable code at first but as the compiler 
(or optimizer) becomes more sophisticated in its flow analysis this 
would gradually improve over time.

JEff



Re: [CVS ci] object stuff

2003-12-10 Thread Jeff Clites
On Dec 9, 2003, at 3:40 PM, Dan Sugalski wrote:
At 5:46 PM +0100 12/5/03, Leopold Toetsch wrote:
Melvin Smith <[EMAIL PROTECTED]> wrote:
 At 05:14 PM 12/5/2003 +0100, Leopold Toetsch wrote:
 set I2, P1["Foo\x00i"]   # I1 == I2

gets currently the attribute idx (0) of "$Foo::i".
Q: Should the assembler mangle the "Foo::i" to "Foo\0i"
Why was this even needed in this example? Since P1 was the class Foo, 
why do we need scoping here--shouldn't the above just be:

	set I2, P1["i"]   # I1 == I2

 Something about this embedded \0 character
 bugs me. I know its what Dan has in the design doc but
 it just seems too cryptic.
Yeah, it's ugly. OTOH it should be really transparent for all possible
HLLs. And the NUL most certainly is.
I don't like it either, but the alternative is to impose an external 
hierarchic structure. Which we certainly could do, though it may make 
some things more difficult.

I'm more than willing to entertain arguments for and against the 
hierarchic structure, but we're closing in on the point where we fix 
the scheme and don't change it any more.
I don't see a big advantage of using a null byte over any other 
character--a null may seem unlikely to conflict with an HLL, but that's 
just hoping. If we're going to define a syntax, then we may as well 
make it easy to print out in the debugger, and use "." or ":", and 
either restrict identifiers to be underscore-letter-number 
combinations, or define an escaping syntax so that "foo\.bar.baz" means 
"baz" in the "foo.bar" namespace (ugly, but workable).

If we _really_ want these to be strings and reserve some special 
character, then we should pick something in the Unicode private use 
area. But either way, I think we need an escaping syntax, so that I can 
really have a null byte (or whatever) in the middle of something.

If we really want to allow the most freedom possible to HLLs, then 
these should be specified as arrays--no special characters needed. That 
is, "foo:bar:baz" in Perl code would assemble into an array of "foo", 
"bar", and "baz" (and "foo.bar.baz" in Java would assemble to the same 
thing). It would be natural then to store these in nested hashes, but 
not 100% necessary. I think this is what you meant by a hierarchic 
structure. It seems straightforward, though with some overhead. We 
could get around the need for hashes by defining an escaping scheme as 
mentioned above, but as an internal implementation detail--that is, the 
API uses arrays, but internally uses strings. Hiding that detail lets 
us change it later.
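
A minimal standalone C sketch of the escaping scheme mentioned above, as 
an internal detail behind an array-based API: components are joined with 
'.', and a literal '.' or '\' inside a component is backslash-escaped. 
Illustrative only; none of this is existing Parrot code:

    #include <stdio.h>
    #include <string.h>

    /* Append one component to the key, escaping '.' and '\' so the joined
     * string can be split back into the original components. */
    static void append_escaped(char *out, size_t outsize, const char *part) {
        size_t len = strlen(out);
        for (; *part && len + 2 < outsize; part++) {
            if (*part == '.' || *part == '\\')
                out[len++] = '\\';
            out[len++] = *part;
        }
        out[len] = '\0';
    }

    int main(void) {
        const char *parts[] = { "foo.bar", "baz" };
        char key[64] = "";
        size_t i;
        for (i = 0; i < 2; i++) {
            if (i)
                strcat(key, ".");
            append_escaped(key, sizeof key, parts[i]);
        }
        printf("%s\n", key);   /* prints: foo\.bar.baz */
        return 0;
    }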

JEff



Re: [CVS ci] object stuff

2003-12-11 Thread Jeff Clites
On Dec 10, 2003, at 12:37 AM, Luke Palmer wrote:

Dan Sugalski writes:
At 05:14 PM 12/5/2003 +0100, Leopold Toetsch wrote:
set I2, P1["Foo\x00i"]   # I1 == I2

gets currently the attribute idx (0) of "$Foo::i".
Q: Should the assembler mangle the "Foo::i" to "Foo\0i"
I don't like it either, but the alternative is to impose an external
hierarchic structure. Which we certainly could do, though it may make
some things more difficult.
I'm more than willing to entertain arguments for and against the
hierarchic structure, but we're closing in on the point where we fix
the scheme and don't change it any more.
I'm not sure mangling is a good idea: think about anonymous classes.  
If
we assign a name to each lexical class, then there are problems when 
one
instance of a class inherits from another instance of the same class.
That is:

    sub make_class(Class $parent) {
        return class is $parent {
            has $.stuff;
        }
    }
    my $class1 = make_class(Object);
    my $class2 = make_class($class1);
So, if you name that lexical class _class_0001 or something,
"_class_0001\0$.stuff" refers to both attributes.
Melvin pointed out another reason why this shouldn't actually conflict, 
but I'll also say: I would think that anonymous classes wouldn't have 
names (pretty much by definition), not even generated ones. So to 
access class attributes of anonymous classes, you'd have to use 
something like:

	$class1->get_attribute(".stuff")

I don't know what actual syntax Perl6 would use, but what I mean to say 
is that for an anonymous class you'd have to access anything about it 
via a reference to the class object, not via name (since it won't have 
one).

If you generate a name for each instance of a lexical class, you're
setting yourself up for some major runtime cost -- concatenating the
class name to the attribute on each access (or memory cost if you want
to cache).
I would think that we'd want to do it so that we don't have to do 
by-name lookups behind the scenes if we're not doing that up front. 
That is, I'd think that

	$Foo::Bar::Baz::egg = 4;

would be doing a lookup to find the namespace "Foo::Bar::Baz" (may be 
hierarchical or not), and then a lookup of "egg" in that. But that:

package Foo::Bar::Baz

sub something
{
$egg = 4;
}
would be able to do a more direct lookup. (That is, it would look up 
"egg" in the package namespace without having to go find the package 
namespace by name.)

I think a heirarchy is a good idea for namespacing in general.  I've
always wanted to be able to tie namespaces in Perl 5.  It would only
make sense that if I tie Foo::, that Foo::anything:: would also go
through that tie to get the anything:: stash.
What do you mean by "tie" here? Are you talking about namespace 
aliasing, so that I can alias "Foo" to "A::B::C::D::E", so that I can 
say "Foo::bar" rather than "A::B::C::D::E::bar"? If so, it seems that 
this would work with or without a hierarchical structure. Definitely 
useful, though.

JEff



Re: Mac OS X Dynaloading less broken

2003-12-11 Thread Jeff Clites
I have some other fixes for this--I'll clean them up and send them in. 
I got something working which doesn't crash, and which can find 
libraries in standard locations w/o knowing the path. It uses the 
native dyld API rather than dlopen--the dlopen which shipped with 
Panther is just the third-party dlcompat lib which was on Fink before, 
I believe.

So I have ncurses.pasm working, although something is amiss--it's 
definitely invoking ncurses (I get the green generation count on the 
black background), but I don't see the life animation. But I suspect 
that the problem there isn't related to library loading.

JEff

On Dec 11, 2003, at 8:46 AM, Dan Sugalski wrote:

If you're on OS X 10.3, I unbroke (sort of) the dynaloading code, so 
it now uses the platform dlopen call. This handles .dylib files like, 
say, libncurses.dylib. That's good. The bad news, such as it is, is:

*) Still crashes. Ick. a "ulimit -c unlimited" in the terminal will 
generate gdb-able core files if you want them. (They go in /cores, 
which I recommend cleaning out occasionally)

*) The dlopen routine is really primitive compared to the rest of the 
known unix universe. You must specify the *exact* filename of the 
library. No library search paths or anything like that. (I've also 
fixed the dynaloading code to try the exact filename, so this does 
work now)

If anyone wants to try figuring out why ncurses.pasm fails, go for it 
and good luck.
--
Dan

--"it's like 
this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: [perl #24682] [BUG] parrot compile fails on MacOS 10.3.1 - possibly dynaloading patch?

2003-12-19 Thread Jeff Clites
On Dec 17, 2003, at 11:24 AM, Leopold Toetsch wrote:

Allison Randal <[EMAIL PROTECTED]> wrote:

$ parrot t/op/hacks_1.pasm
not reached
...
Does F have an entry for SIGFPE?
Is PARROT_HAS_HEADER_SIGNAL defined?
"Yes" to both questions.

The issue turns out to be that SIGFPE isn't raised on Mac OS X on 
divide by zero. If I hack src/core_ops.c to explicitly raise(SIGFPE) in 
the case of zero divisor then the tests pass, so the exception handling 
code seems to be working correctly.

I'm not sure of the best way to fix the tests--long-term I'd expect 
we'd want parrot to explicitly guard against divide-by-zero (as perl5 
seems to do), but I know that's not the real point of these tests.

JEff



Re: [perl #24682] [BUG] parrot compile fails on MacOS 10.3.1 - possibly dynaloading patch?

2003-12-20 Thread Jeff Clites
On Dec 20, 2003, at 1:54 AM, Leopold Toetsch wrote:

Jeff Clites <[EMAIL PROTECTED]> wrote:

The issue turns out to be that SIGFPE isn't raised on Mac OS X on
divide by zero. If I hack src/core_ops.c to explicitly raise(SIGFPE) 
in
the case of zero divisor then the tests pass, so the exception 
handling
code seems to be working correctly.
Ah, good take. As Perl5 obviously catches div by 0, we probably need
some Configure test, if a platform signals int div/0.
I was told that int divide by zero has undefined behavior in C (though 
it's well-defined for floating-point operations, via IEEE 754), so it 
might be safest to always do an actual check in parrot (though that's 
an extra conditional branch around every int divide). What I'm thinking 
is that for a particular platform the behavior may not be consistent, 
so an accurate Configure check might be difficult. (Also, having this 
always directly raise a parrot exception should leave us with less 
platform-dependent code--e.g. I assume the Windows case wouldn't use 
signals.)
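
A minimal standalone C sketch of the always-guard option--check the 
divisor explicitly instead of relying on SIGFPE. raise_zero_divide() here 
is just a hypothetical stand-in for whatever exception mechanism parrot 
ends up using:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical stand-in for raising a parrot-level exception. */
    static void raise_zero_divide(void) {
        fprintf(stderr, "Divide by zero\n");
        exit(1);
    }

    static long int_div(long dividend, long divisor) {
        if (divisor == 0)
            raise_zero_divide();     /* never reach the C '/' with 0 */
        return dividend / divisor;
    }

    int main(void) {
        printf("%ld\n", int_div(42, 6));   /* 7 */
        printf("%ld\n", int_div(1, 0));    /* reports the error, no UB */
        return 0;
    }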

I'm not sure of the best way to fix the tests
Could your try float division?
That doesn't cause SIGFPE either. And actually this:

set N10, 2.1
div N10, 0
print N10
end
prints "inf".

Some more data: The behavior in these 2 cases matches the behavior of 
analogous C code on Mac OS X, as you'd expect. Also, on Intel--don't 
remember if this was Windows or Linux--float divide by zero in a C test 
case also gave "inf" via printf. I believe that the IEEE default 
behavior is to not raise a signal for invalid operations.

JEff



Re: threads and shared interpreter data structures

2003-12-21 Thread Jeff Clites
It sounds like an assumption here is that separate threads get separate 
interpreter instances. I would have thought that a typical 
multithreaded program would have one interpreter instance and multiple 
threads (sharing that instance). I would think of separate interpreter 
instances as the analog of separate independent processes (at the Unix 
level), and that threads would be something more lightweight than that. 
There would be _some_ structure which is per-thread, but not logically 
the whole interpreter.

JEff



Re: threads and shared interpreter data structures

2003-12-21 Thread Jeff Clites
On Dec 21, 2003, at 5:44 PM, Dan Sugalski wrote:

At 5:31 PM -0800 12/21/03, Jeff Clites wrote:
It sounds like an assumption here is that separate threads get 
separate interpreter instances. I would have thought that a typical 
multithreaded program would have one interpreter instance and 
multiple threads (sharing that instance). I would think of separate 
interpreter instances as the analog of separate independent processes 
(at the Unix level), and that threads would be something more 
lightweight than that. There would be _some_ structure which is 
per-thread, but not logically the whole interpreter.
Been there, done that, got the scars to prove it. Doesn't work well in 
a system with guarantees of internal consistency and core data 
elements that are too large for atomic access by the processor. (And, 
while you can lock things, it gets really, really, *really* slow...)
1) It seems like you've made a leap here. I don't see how the need to 
guarantee the internal consistency of things such as strings directly 
implies per-thread allocation pools or file-descriptor tables. Having 
non-shared HLL-accessible data doesn't imply non-shared internal data 
structures.

2) Separate issue really, but: How can a language such as Java, which 
doesn't have inter-thread data-access restrictions, be implemented on 
top of Parrot?

JEff



Re: ParrotIO objects

2003-12-22 Thread Jeff Clites
On Dec 22, 2003, at 2:48 PM, Melvin Smith wrote:

At 11:02 AM 12/22/2003 -0700, Cory Spencer wrote:
The program I'm writing is passing a ParrotIO object about to various
functions (the ParrotIO object being either stdin or a file handle to 
a
regular file).  Each function can read as many bytes as it likes from 
the
descriptor.  There are times, however, where I might decide that I 
don't
want the byte I just read, and need to push it back onto the stream.  
This
brings up (I think) two ways around the problem:

  1) I'd like to be able to peek ahead at a byte, before actually
 consuming it.  ie)
Yes, Parrot will get an "unget" and "peek". Now that the buffering
layer is mostly working it should be quite easy.
I assume this would only be allowed to work if a buffering layer is 
topmost, right? That is, I shouldn't be able to do either to an io_unix 
layer.
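
To illustrate why peek and unget want a buffering layer on top: with a 
read buffer, peek is just "return the current byte without advancing" and 
unget steps the position back. A minimal standalone C sketch, not the 
actual io layer code:

    #include <stdio.h>
    #include <string.h>

    typedef struct {
        char   buf[16];
        size_t pos, len;
    } toy_buf_layer;

    /* Fill the toy buffer from a string (stands in for a real read). */
    static void toy_fill(toy_buf_layer *b, const char *src) {
        b->len = strlen(src) < sizeof b->buf ? strlen(src) : sizeof b->buf;
        memcpy(b->buf, src, b->len);
        b->pos = 0;
    }

    static int toy_peek(toy_buf_layer *b) {
        return b->pos < b->len ? b->buf[b->pos] : -1;   /* don't consume */
    }

    static int toy_read1(toy_buf_layer *b) {
        return b->pos < b->len ? b->buf[b->pos++] : -1;
    }

    static void toy_unget(toy_buf_layer *b) {
        if (b->pos > 0)
            b->pos--;                                   /* push back one */
    }

    int main(void) {
        toy_buf_layer b;
        toy_fill(&b, "hi");
        printf("%c %c ", toy_peek(&b), toy_read1(&b));  /* h h */
        toy_unget(&b);
        printf("%c\n", toy_read1(&b));                  /* h */
        return 0;
    }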

I've had a bad bout with the flu for the last 7 days. As soon as I can 
sit
up again I'll see about patching it in.
Hope you feel better!

JEff



Re: [perl #24682] [BUG] parrot compile fails on MacOS 10.3.1 - possibly dynaloading patch?

2003-12-22 Thread Jeff Clites
On Dec 22, 2003, at 6:57 AM, Dan Sugalski wrote:

At 11:44 AM -0800 12/20/03, Jeff Clites wrote:
On Dec 20, 2003, at 1:54 AM, Leopold Toetsch wrote:

Jeff Clites <[EMAIL PROTECTED]> wrote:

The issue turns out to be that SIGFPE isn't raised on Mac OS X on
divide by zero. If I hack src/core_ops.c to explicitly 
raise(SIGFPE) in
the case of zero divisor then the tests pass, so the exception 
handling
code seems to be working correctly.
Ah, good take. As Perl5 obviously catches div by 0, we probably need
some Configure test, if a platform signals int div/0.
I was told that int divide by zero has undefined behavior in C 
(though it's well-defined for floating-point operations, via IEEE 
754), so it might be safest to always do an actual check in parrot 
(though that's an extra conditional branch around every int divide). 
What I'm thinking is that for a particular platform the behavior may 
not be consistent, so an accurate Configure check might be difficult. 
(Also, having this always directly raise a parrot exception should 
leave us with less platform-dependent code--e.g. I assume the Windows 
case wouldn't use signals.)
I don't want to use signals to catch math errors. (Though, amusingly, 
fatal hardware-type errors like math issues were what signals were 
originally for. But that's something for another time) It's 
platform-dependent, as we've seen some platforms don't do it, and 
worse yet some platforms are somewhat indeterminate as to when the 
errors get thrown. (The Alpha has fuzzy math exceptions. The "math 
screwed up" status bit does get set, but it isn't generally checked 
immediately, and the massively out-of-order execution engine tends to 
throw them at what looks like odd times)

Math operations that can throw errors should have a wrapper around 
them to catch the error, either a macro or some ops preprocessing 
keyword that we can wedge in platform-specific checks for.
Another good reason to not use signals to catch math errors is that 
it's bad practice for libraries to install signal handlers as part of 
their basic operation (since signal handlers are global to the 
application, you unexpectedly affect the global behavior of the app).

(In this particular case, a math error in some non-parrot-related part 
of an application would end up trying to throw a parrot exception)

JEff



Re: When is enough too much?

2003-12-23 Thread Jeff Clites
On Dec 23, 2003, at 4:08 PM, Uri Guttman wrote:

"DS" == Dan Sugalski <[EMAIL PROTECTED]> writes:

Speaking personally, being able to automatically convert a Parrot 
array
to an array of ints or floats would be very useful, but that's 
because I
do fairly hard-core number crunching in my day job. What are the
arguments against putting something like this in the NCI interface?

Mainly it's two more things for the interface, which grows ever
larger. Past that, well... there's no real argument against other than
it means more funky assembly code for the JIT building call frames.
but it is (just about) one time only work and will save tons of 
repeated
tricky work down the line for those who will embed c libs in parrot. 
the
richer this api is, the less problems for the nci users and that is
important IMO. but as i said in my other reply, handling hashes is a 
big
issue and that can't deal with all possible cases (while you can deal
with most of the cases for arrays).
Yeah, the hash case is less of a clear win--in addition to the 
complexity issues you mentioned, they are also much less used in pure C 
code (almost never, really)--since there's no standard hash type in C, 
people tend to not write C-based interfaces which use them.

JEff



Re: Python running fast on .NET

2003-12-24 Thread Jeff Clites
I don't think that Dan meant that Python-on-Parrot would be 20x faster. 
He's saying that it's easy to speed up Python if you leave out the slow 
parts, and that the Python-on-.NET implementation has left out the slow 
parts so far. So when it's complete, Python-on-.NET will end up slower 
than "regular" Python. (That's my interpretation of what he's saying.)

JEff

On Dec 23, 2003, at 10:28 PM, Joe Wilson wrote:

In order to get the 20x speed gain you seek I assume
that Parrot would have to perform some sort of variable
type inference to distinguish, for example, when a
scalar is really just an integer and use an integer register.
Otherwise, the PMCs in Parrot would perform much the same
as the Python scalars (or whatever Python calls them).
The question is when would Parrot would perform this type
inference and subsequent bytecode transformation?  At bytecode
load time or at runtime?
--- Dan Sugalski <[EMAIL PROTECTED]> wrote:
 Yeah, but alas Miguel's mis-informed. A reasonable reimplementation
 of core python (without all the grotty bits that arguably throw in
 the huge speed hit) should run that benchmark at about 20x python's
 base speed, and the parts of Python that will give .NET serious 
fits
 haven't been implemented. Python's method semantics don't match
 .NET's semantics in a number of performance-unpleasant ways.



Re: Slow vararg subroutines in Parrot, jit bug found as well.

2003-12-28 Thread Jeff Clites
On Dec 28, 2003, at 1:42 AM, Leopold Toetsch wrote:

Joe Wilson <[EMAIL PROTECTED]> wrote:
Perl's arrays do indeed accept mixed data types (see example below).
Perl's Arrays take SV's. Please use a PerlArray instead of SArray.

Parrot (still built unoptimized) is significantly faster than perl5 on 
this
test.
If I take the f4.pasm that Joe Wilson originally posted, and just 
change SArray to PerlArray, then memory usage grows very quickly. 
If I comment out the "set P5, 5", then it doesn't happen.

Based on your 'warnocked "Q: Array vs SArray" from Dec 11th' (though 
now that we are talking about it, I suppose it's not warnocked any 
more...), I'd expect this "set P5, 5" to fill the PerlArray with 5 
nulls, but I don't see right off why I'd get a memory explosion. Do you 
have any idea what might be going on?

Thanks,

JEff



Re: [perl #24769] [PATCH] mem_sys_allocate_executable - initial draft

2003-12-28 Thread Jeff Clites
On Dec 28, 2003, at 6:35 PM, Mark A. Biggar wrote:

Leopold Toetsch wrote:
Jonathan Worthington <[EMAIL PROTECTED]> wrote:
The other question is does Parrot care about the memory being zero'd 
out?
Isn't necessary. Executable mem is filled with ops anyway. Currently 
it
is zeroed to aid debugging a bit. It should be filled up with trap
operations of some kind eventually, I think.
Wouldn't that suggest that zero ought to be a trap code?
This memory is being filled with JITed code (so, native executable code 
for the platform), so we don't get to determine how zero is 
interpreted--it's up to the CPU.

JEff



Re: Garbage Collection Tasks

2003-12-29 Thread Jeff Clites
On Dec 29, 2003, at 8:13 AM, Dan Sugalski wrote:

2) We need a more traditional, non-moving GC as an alternative to the 
copying collector. A modified mark & sweep'd be fine, but as it'll be 
living behind the API it doesn't much matter.
This is really only for the chunks of memory backing strings and such, 
right? We're already using mark-and-sweep for PMCs and strings and such 
(that is, their headers), right?

As I see it, it's really the allocation that is more complicated with a 
mark-and-sweep collector (since you have to search for a correct-sized 
free chunk, efficiently)--the collection itself is the easy part. 
Actually, it seems like this is just the GC_IS_MALLOC case--that 
already gives us non-moving GC, and lets malloc (the system version or 
the Lea version) deal with managing the bookkeeping.

#2 is a bit more interesting and, if we do it right, means that we'll 
end up with a pluggable garbage collection system and may (possibly) 
be able to have type 3 threads share object arenas and memory pools, 
which'd be rather nice. Even if not, it leaves a good window for 
experimentation in allocation and collection, which isn't bad either.
This, combined with the mention of needing to drive this off of a 
vtable, means being able to determine the GC algorithm at runtime 
instead of compile-time (i.e., that was part of your point). I'm all 
for that--it will mean less #if's in the code, and it will make the 
code a bit clearer.

JEff



Re: Garbage Collection Tasks

2003-12-30 Thread Jeff Clites
On Dec 29, 2003, at 11:48 AM, Dan Sugalski wrote:

As I see it, it's really the allocation that is more complicated with 
a mark-and-sweep collector (since you have to search for a 
correct-sized free chunk, efficiently)--the collection itself is the 
easy part. Actually, it seems like this is just the GC_IS_MALLOC 
case--that already gives us non-moving GC, and lets malloc (the 
system version or the Lea version) deal with managing the 
bookkeeping.
Allocation's more complex, but so is deallocation. Properly 
maintaining a free list with adjacent piece coalescing can be 
non-trivial, and there's the added complication of COW so multiple 
string headers may be pointing to the same chunk of memory, so freeing 
up one's not a sufficient reason to return the memory to the free 
pool.
Yep, I think the Lea allocator takes care of the tricky parts of the 
free-list management, and the GC_IS_MALLOC code is already handling the 
COW cases.

#2 is a bit more interesting and, if we do it right, means that 
we'll end up with a pluggable garbage collection system and may 
(possibly) be able to have type 3 threads share object arenas and 
memory pools, which'd be rather nice. Even if not, it leaves a good 
window for experimentation in allocation and collection, which isn't 
bad either.
This, combined with the mention of needing to drive this off of a 
vtable, means being able to determine the GC algorithm at runtime 
instead of compile-time (i.e., that was part of your point). I'm all 
for that--it will mean less #if's in the code, and it will make the 
code a bit clearer.
Yep. Care to take a shot at it?
I'll add it to my to-do list. I'm trying to finish up a few things 
which I've left sitting 90% done, so if someone else wants to tackle 
this before I get to it then go ahead--just post to the list when you 
start so that we don't duplicate efforts (and I'll do the same, if I 
get to it and no one else has started).

JEff



Re: Threading design

2003-12-30 Thread Jeff Clites
On Dec 30, 2003, at 11:18 AM, Dan Sugalski wrote:

At 11:27 AM -0500 12/30/03, Gordon Henriksen wrote:
I wish the threading design for parrot would look more toward 
successful, performant multithreaded systems,
I'm going to be really grumpy here, though it's not directed at 
Gordon. What *I* wish is that people who've not had any experience 
trying to build threaded interpreters for languages with data as 
heavyweight as perl's with a POSIXy "share everything" requirement 
that guarantee user threading problems won't crash the interpreter 
would stop pronouncing judgement on threading designs. It's getting 
really tiresome and I'm going to start getting viciously rude about 
it.

If you *have* experience with this sort of thing, *please* share. 
Otherwise stop telling me the design sucks--I *know* that already. 
What I don't have is a better answer, nor the ability to throw out the 
troublesome requirements.

If you want to help, then great. Specifics are a wonderful thing--X 
worked because of Y and Z, or Q didn't work because of R and/or S. 
Details are great, generalities are OK if details aren't available for 
whatever reason.
Please understand that people are not specifically criticizing you--a 
lot of people care about the design and success of Parrot, and it is 
valuable for everyone to hear their opinions, even if they are seeing 
the problems and not the solutions. It's important (psychologically) 
for people to be able to feel heard, and also (statistically) to find 
out what the community feels is important (this slice of the community, 
at least). Please try not to take it personally--it's just a 
discussion, and people are talking just as much to each other as they 
are to you, even if a design decision on your part sparked the 
discussion. I realize it can be frustrating for you, but everyone 
really does seem to have the same goal at heart, and we'll end up with 
a better product in the end if discussion is encouraged.

Also, please share specifics that you have about design decisions that 
others seem to disagree with. With more information on the table, the 
rationale will be clearer, and that will either help convince people 
that it really is the best design, or it will allow them to identify 
specific flaws--places where some bit of data may have been overlooked. 
And also, if you feel stumped by something, put that out there too--I 
think it sets a very different tone if a (seemingly questionable) 
design decision is one which you think is great (or as good as it can 
get), versus just the best which you've been able to come up with so 
far. This will help steer the discussion, and make it more productive.

I hope you find this feedback useful.

JEff



Re: Problem during "make test"

2003-12-30 Thread Jeff Clites
On Dec 29, 2003, at 2:12 PM, Harry Jackson wrote:

During

[EMAIL PROTECTED] parrot]$ make test
echo imcc/imcc.y -d -o imcc/imcparser.c
imcc/imcc.y -d -o imcc/imcparser.c
perl -e 'open(A,qq{>>$_}) or die foreach @ARGV' imcc/imcc.y.flag 
imcc/imcparser.c imcc/imcparser.h
perl t/harness --gc-debug --running-make-test  -b t/op/*.t t/pmc/*.t 
t/native_pbc/*.t
t/op/00ff-dos...

This is as far as it gets. I am assuming since no one else has noticed 
this that it is a problem with my set up but I am at a bit of a loss 
as to what has happened to cause it.

It gets even stranger. If I do a make clean and make test again it 
does not necessarily stop in the same place each time ie.
Here are 3 things to try:

1) When it hangs there, check with 'top' to see if it is using CPU (ie, 
is it blocking, or in an infinite loop).

2) Try running one of the tests which blocks, individually. If you can 
get it to happen this way, then run it in gdb and see what it's doing. 
(Or, attach to an already blocked one from 'make test'--this is 
assuming it's parrot that's actually blocking, and not t/harness.)

3) Try building from a clean checkout, and see if that shows the 
problem. If not, it's probably something you've changed and don't 
realize.

JEff



Re: Problem during "make test"

2003-12-30 Thread Jeff Clites
On Dec 30, 2003, at 3:11 PM, Harry Jackson wrote:

2) Try running one of the tests which blocks, individually. If you 
can get it to happen this way, then run it in gdb and see what it's 
doing. (Or, attach to an already blocked one from 'make test'--this 
is assuming it's parrot that's actually blocking, and not t/harness.)
When run individually I get the same error. Complete freeze at what 
appears to be an arbitrary point.

Running gdb

0x080a7625 in Perl_sv_gets ()
I meant try running parrot in the debugger--Perl is probably hanging 
b/c it's waiting for parrot to exit. For instance, see if the following 
hangs, and if so run it in gdb:

./parrot --gc-debug -b t/op/arithmetics_4.pasm

This is where gdb freezes execution. CTL-C then frees it up to 
continue until the next one freezes.
The ctrl-C is probably killing the subprocess (parrot), which does 
imply that it's parrot that's hanging.

3) Try building from a clean checkout, and see if that shows the 
problem. If not, it's probably something you've changed and don't 
realize.
I have tried something a bit more drastic. Deleted the entire tree and 
downloaded it again (sorry about the bandwidth).
Yep, that's what I meant.

I have also tried strace and got the following.
Try this on parrot rather than Perl.

Doing a "ps ax" reveals the following (ignore the test number it keeps 
changing)

20802 pts/11   S  0:00 make test
21598 pts/11   S  0:00 perl t/harness --gc-debug 
--running-make-test -b t/op/00ff-dos.t t/op/00ff-unix.t 
t/op/arithmetics.t t/op/basic.t
21610 pts/11   S  0:00 perl -w t/op/arithmetics.t
21620 pts/11   S  0:00 ./parrot --gc-debug -b 
t/op/arithmetics_4.pasm
21621 pts/11   S  0:00 ./parrot --gc-debug -b 
t/op/arithmetics_4.pasm
21622 pts/11   S  0:00 ./parrot --gc-debug -b 
t/op/arithmetics_4.pasm
It does seem odd that it appears to be running the same test 3 
times...

JEff



Patch submission gone missing?

2004-01-01 Thread Jeff Clites
I submitted a patch yesterday to [EMAIL PROTECTED], 
and received the automated response (the ID is [perl #24789]), but it 
doesn't seem to have been forwarded to the mailing list. Is something 
up with the tracking system?

JEff



Re: Threads. Design. Go for it

2004-01-01 Thread Jeff Clites
On Jan 1, 2004, at 9:43 AM, Josh Wilmes wrote:

At 16:15 on 12/30/2003 EST, Dan Sugalski <[EMAIL PROTECTED]> wrote:

Your constraints:

2) A perl 5 iThreads "it's not a thread, it's a fork. Well, sorta..."
mode must be available
For those of us who aren't particularly familiar with ithreads, what 
does
this imply?   What's different, and why must it be done at the level of
parrot, not perl?
The unique feature of ithreads is that data is not shared between 
threads by default, but rather has to be marked as shared to be visible 
from multiple threads. (If you have Perl 5.8, you can read more about 
this in the perlthrtut man page; in particular, the section "Threads 
And Data".)

As far as what level needs to implement them, I'd say that parrot has 
to do enough to make it possible for an HLL to expose ithreads-style 
threading. Due to the cross-language nature of parrot, practically 
speaking this probably means that the semantic needs to exist at the 
parrot level. (For instance, an ithread created in Perl code could load 
a parrot bytecode library implemented in Python; calling into code 
inside this library shouldn't allow access to "unshared" data from 
other ithreads.)

JEff



Re: Problem during "make test"

2004-01-01 Thread Jeff Clites
On Dec 30, 2003, at 5:19 PM, Harry Jackson wrote:

I have also tried strace and got the following.
Try this on parrot rather than Perl.
strace on parrot gets to

rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 8423
--- SIGCHLD (Child exited) ---
sigreturn() = ? (mask now [])
write(1, ": blib/lib/libparrot.a\n", 23: blib/lib/libparrot.a
) = 23
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT TERM XCPU XFSZ], NULL, 8) = 0
vfork() = 8424
--- SIGCHLD (Child exited) ---
sigreturn() = ? (mask now [HUP INT QUIT 
TERM XCPU XFSZ])
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 8424
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT TERM XCPU XFSZ], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat64("blib/lib/libparrot.a", {st_mode=S_IFREG|0664, 
st_size=18102092, ...}) = 0
write(1, "cc -o parrot -Wl,-E  -g  imcc/ma"..., 98cc -o parrot -Wl,-E 
-g  imcc/main.o blib/lib/libparrot.a -lnsl -ldl -lm -lcrypt -lutil 
-lpthread
) = 98
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT TERM XCPU XFSZ], NULL, 8) = 0
vfork() = 8425
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 8425
--- SIGCHLD (Child exited) ---
sigreturn() = ? (mask now [])
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT TERM XCPU XFSZ], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
stat64("parrot", {st_mode=S_IFREG|0775, st_size=4133629, ...}) = 0
stat64("test_prep", 0xbfffddf0) = -1 ENOENT (No such file or 
directory)
stat64("testb", 0xbfffddf0) = -1 ENOENT (No such file or 
directory)
write(1, "/usr/local/bin/perl5.8.2 t/harne"..., 
106/usr/local/bin/perl5.8.2 t/harness --gc-debug --running-make-test  
-b t/op/*.t t/pmc/*.t t/native_pbc/*.t
) = 106
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT TERM XCPU XFSZ], NULL, 8) = 0
vfork() = 8428
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
t/op/00ff-dos...ok
t/op/00ff-unix..ok
t/op/arithmeticsok 15/18
That looks like you ran strace on 'make'.

Here's one more thing to investigate: When you get to the point where 
it freezes, run ps with the -j option, to display the parent pid of 
each process (so maybe 'ps -jax'). (Do this in a separate terminal 
session, of course.) Find whatever process is at the bottom of the 
"tree" of processes descending from 'make' (that is, 'make' should be 
the parent of 'perl t/harness', which should be the parent of another 
perl process running a ".t" script, which should be the parent of some 
parrot process running an individual test), then try to 'gdb attach' to 
that pid, and do a backtrace to see where it is hanging.

JEff



Re: Patch submission gone missing?

2004-01-02 Thread Jeff Clites
On Jan 1, 2004, at 11:31 PM, Robert Spier wrote:

I submitted a patch yesterday to [EMAIL PROTECTED],
and received the automated response (the ID is [perl #24789]), but it
doesn't seem to have been forwarded to the mailing list. Is something
up with the tracking system?
Nothing is up.  It's in the moderation queue.  The list moderator is
probably still recovering -- it is a holiday.  Please give him a
break.
I didn't realize there was moderation. (How was spam getting through?)

JEff



Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Jeff Clites
On Jan 3, 2004, at 11:19 AM, Elizabeth Mattijsen wrote:

At 01:48 -0500 1/3/04, Uri Guttman wrote:
 > "NS" == Nigel Sandever <[EMAIL PROTECTED]> writes:
  NS> All that is required to protect an object from corruption 
through
  NS> concurrent access and state change is to prevent two (or more) 
VMs
  NS> trying to access the same object simultaneously. In order for 
the VM to
  NS> address the PMC and must load it into a VM register, it must 
know where
  NS> it is. Ie. It must have a pointer.  It can use this pointer to 
access
  NS> the PMC with a Bit Test and Set operation.  (x86 BTS)
  NS> [http://www.itis.mn.it/linux/quarta/x86/bts.htm] This is a CPU 
atomic
  NS> operation. This occurs before the VM enters the VM operations 
critical
  NS> section.
ding! ding! ding! you just brought in a cpu specific instruction which
is not guaranteed to be on any other arch. in fact many have such a
beast but again, it is not accessible from c.
I just _can't_ believe I'm hearing this.  So what if it's not 
accessible from C?  Could we just not build a little C-program that 
would create a small in whatever loadable library?  Or have a 
post-processing run through the binary image inserting the right 
machine instructions in the right places?  Not being from a *nix 
background, but more from a MS-DOS background, I've been used to 
inserting architecture specific machine codes from higher level 
languages into executable streams since 1983!  Don't tell me that's 
not "done" anymore?  ;-)
Yes, you are correct--we are already using bits of assembly in parrot. 
C compilers tend to allow you to insert bits of inline assembly, and we 
are taking advantage of that--for instance, look for "__asm__" in the 
following files:

jit/arm/jit_emit.h
jit/ppc/jit_emit.h
src/list.c
src/malloc.c
Also, JIT is all about generating platform-specific machine 
instructions at runtime. So it's certainly doable, and right along the 
lines of what we are already doing.

JEff



Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Jeff Clites
On Jan 3, 2004, at 10:08 AM, Elizabeth Mattijsen wrote:

At 12:15 -0500 1/3/04, Uri Guttman wrote:
 > "LT" == Leopold Toetsch <[EMAIL PROTECTED]> writes:
  LT> These are platform specific details. We will use whatever the
  LT> platform/OS provides. In the source code its a LOCK() UNLOCK() 
pair.
  LT> The LOCK() can be any atomic operation and doesn't need to call 
the
  LT> kernel, if the lock is aquired.
if it doesn't call the kernel, how can a thread be blocked?
Think out of the box!

Threads can be blocked in many ways.  My forks.pm module uses sockets 
to block threads  ;-).
IO operations which block like that end up calling into the kernel.

But I believe it is usually possible to acquire an uncontested lock 
without calling into the kernel. When you do need to block (when trying 
to acquire a lock which is already held by another thread) you may need 
to enter the kernel. But I think that Leo's point was that in the 
common case of a successful lock operation, it may not be necessary.
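
To make the fast path concrete, here is a minimal sketch of the idea 
using C11-style atomics. It is illustrative only--not how pthreads, 
Win32, or Parrot's LOCK()/UNLOCK() are actually implemented--and 
block_until_released() is a hypothetical placeholder for the 
kernel-level wait:

    #include <stdatomic.h>

    typedef struct {
        atomic_int state;                 /* 0 = free, 1 = held */
    } vm_lock;

    /* hypothetical placeholder for the kernel-level wait (futex etc.) */
    extern void block_until_released(vm_lock *l);

    static void vm_lock_acquire(vm_lock *l)
    {
        int expected = 0;
        /* fast path: one atomic compare-and-swap, no system call */
        if (atomic_compare_exchange_strong(&l->state, &expected, 1))
            return;
        /* slow path: the lock is contended, so pay for a kernel call */
        do {
            block_until_released(l);
            expected = 0;
        } while (!atomic_compare_exchange_strong(&l->state, &expected, 1));
    }

The uncontested acquire never leaves user space; only the contended 
case costs a system call.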

JEff



Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Jeff Clites
On Jan 3, 2004, at 12:26 PM, Uri Guttman wrote:

  LT> These are platform specific details. We will use whatever the
  LT> platform/OS provides. In the source code its a LOCK() UNLOCK() 
pair.
  LT> The LOCK() can be any atomic operation and doesn't need to call 
the
  LT> kernel, if the lock is aquired.

if it doesn't call the kernel, how can a thread be blocked?
  LT> I wrote, *if* the lock is aquired. That's AFAIK the fast path of 
a futex
  LT> or of the described Win32 behavior. The slow path is always a 
kernel
  LT> call (or a some rounds spinning before ...)
  LT> But anyway, we don't reinvent these locking primitives.

ok, i missed the 'if' there. :)

that could be workable and might be faster. it does mean that locks are
two step as well, user space test/set and fallback to kernel lock. we
can do what nigel said and wrap the test/set in macros and use 
assembler
to get at it on platforms that have it
This is probably already done inside of pthread (and Win32) locking 
implementations--a check in userspace before a kernel call. It's also 
important to remember that it's not a two-step process most of the 
time, since most of the locking we are talking about is likely to be 
uncontested (ie, the lock acquisition will succeed most of the time, 
and no blocking will be needed). If we _do_ have major lock contention 
in our internal locking, those will be areas calling for a redesign.

JEff



Re: Threads Design. A Win32 perspective.

2004-01-03 Thread Jeff Clites
On Jan 3, 2004, at 2:59 PM, Leopold Toetsch wrote:

Nigel Sandever <[EMAIL PROTECTED]> wrote:
Only duplicating shared data on demand (COW) may work well on systems
that support COW in the kernel.
No, we are dealing with VM objects and structures here - no kernel is
involved for COWed copies of e.g. strings.
And also, COW at the OS level (that is, of memory pages) doesn't help, 
because we have complex data structures filled with pointers, so 
copying them involves more than just duplicating a block of memory. We 
can use an approach similar to what we do for strings to make a COW 
copy of, for instance, the globals stash, but overall that will only be 
a speed improvement if the data structure is rarely modified. (That is, 
once it's modified, we will have paid the price--unless we have a 
clever data structure which can be COWed in sections.)
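
For reference, here is a rough sketch of the string-style COW idea 
being described. The field names are assumed for illustration and are 
not the actual parrot_string_t layout:

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char   *bufstart;
        size_t  buflen;
        int     is_cow;   /* set on both headers while the buffer is shared */
    } cow_str;

    static void cow_clone(cow_str *dest, cow_str *src)
    {
        src->is_cow = 1;  /* both headers now share one buffer */
        *dest = *src;     /* copy only the header, not the data */
    }

    static void cow_before_write(cow_str *s)
    {
        if (s->is_cow) {  /* first write: now we pay for the real copy */
            char *priv = malloc(s->buflen);   /* error handling omitted */
            memcpy(priv, s->bufstart, s->buflen);
            s->bufstart = priv;
            s->is_cow   = 0;
        }
    }

The clone is cheap; the full copy is only paid for on the first write 
to either side, which is why the approach only wins for rarely-modified 
structures.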

Just adding to what Leo already said.

JEff



Re: Thread Question and Suggestion -- Matt

2004-01-03 Thread Jeff Clites
On Jan 3, 2004, at 5:24 PM, Matt Fowles wrote:

All~

I have a naive question:

Why must each thread have its own interpreter?
The short answer is that the bulk of the state of the virtual machine 
(including, and most importantly, its registers and register stacks) 
needs to be per-thread, since it represents the "execution context" 
which is logically thread-local. Stuff like the globals stash may or 
may not be shared (depending on the thread semantics we want), but as I 
understand it the "potentially shared" stuff is actually only a small 
part of the bits making up the VM.

That said, I do think we have a terminology problem, since I initially 
had the same question you did, and I think my confusion mostly stems 
from there being no clear terminology to distinguish between 2 
"interpreters" which are completely independent, and 2 "interpreters" 
which represent 2 threads of the same program. In the latter case, 
those 2 "interpreters" are both part of one "something", and we don't 
have a name for that "something". It would be clearer to say that we 
have two "threads" in one "interpreter", and just note that almost all 
of our state lives in the "thread" structure. (That would mean that the 
thing which is being passed into all of our API would be called the 
thread, not the interpreter, since it's a thread which represents an 
execution context.) It's just (or mostly) terminology, but it's causing 
confusion.
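
As a sketch of that terminology (with made-up struct names--these are 
not actual Parrot structures, and the real per-thread state is of 
course much larger):

    /* one per "program": the small, potentially shared part */
    struct vm_program {
        void *globals_stash;   /* shared or not, depending on thread model */
        void *gc_pools;
        void *bytecode;
    };

    /* one per thread: the execution context--registers, register backing
     * stacks, control stack, and so on.  This is what each API call
     * would take as its first argument. */
    struct vm_thread {
        struct vm_program *program;
        long    int_regs[32];
        double  num_regs[32];
        void   *reg_backing_stacks;
        void   *control_stack;
    };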

Why not have the threads that share everything share interpreters.  We 
can have these threads be within the a single interpreter thus 
eliminating the need for complicated GC locking and resource sharing 
complexity.  Because all of these threads will be one kernel level 
thread, they will not actually run concurrently and there will be no 
need to lock them.  We will have to implement a rudimentary scheduler 
in the interpreter, but I don't think that is actually that hard.
There are 2 main problems with trying to emulate threads this way:

1) It would likely kill the performance gains of JIT.

2) Calls into native libraries could block the entire VM. (We can't 
manually timeslice external native code.) Even things such as regular 
expressions can take an unbounded amount of time, and the internals of 
the regex engine will be in C--so we couldn't timeslice without slowing 
them down.

And basically, people are going to want "real" threads--they'll want 
access to the full richness and power afforded by an API such as 
pthreads, and the "real" threading libraries (and the OS) have already 
done all of the really hard work.

This allows threads to have completely shared state, at the cost of 
not being quite as efficient on SMP (they might be more efficient on 
single processors as there are fewer kernel traps necessary).
Not likely to be more efficient even on a single processor, since even 
if a process is single threaded it is being preempted by other 
processes. (On Mac OS X, I've not been able to create a case where 
being multithreaded is a slowdown in the absence of locking--even for 
pure computation on a single processor machine, being multithreaded is 
actually a slight performance gain.)

Programs that want to run faster on an SMP will use threads without 
shared that use events to communicate.
It's nice to have speed gains on MP machines without having to redesign 
your application, especially as MP machines are quickly becoming the 
norm.

(which probably provides better performance, as there will be fewer 
faults to main memory because of cache misses and shared data).
Probably not faster actually, since you'll end up with more data 
copying (and more total data).

I understand if this suggestion is dismissed for violating the rules, 
but I would like an answer to the question simply because I do not 
know the answer.
I hope my answers are useful. I think it's always okay to ask questions.

JEff



Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Jeff Clites
On Jan 4, 2004, at 5:47 AM, Leopold Toetsch wrote:

Elizabeth Mattijsen <[EMAIL PROTECTED]> wrote:

When you use an external library in Perl, such as e.g. libxml, you
have Perl data-structures and libxml data-structures.  The Perl
data-structures contain pointers to the libxml data-structures.

In comes the starting of an ithread and Perl clones all of the Perl
data-structures.  But it copies _only_ does things it knows about.
And thus leaves the pointers to the libxml data-structures untouched.
Now you have 2 Perl data-structures that point to the _same_ libxml
data-structures.  Voila, instant sharing.
I see. Our library loading code should take care of that. On thread
creation we call again the _init code, so that the external lib can
prepare itself to be used from multiple threads. But don't ask me about
details ;)
But I think we'll never be able to make this work as the user would 
initially expect. For instance if we have a DBI implementation, and 
some PMC is holding an external reference to a database cursor for an 
open transaction, then we can't properly duplicate the necessary state 
to make the copy of the PMC work correctly (that is, independently). 
(And I'm not saying just that we can't do it from within parrot, I'm 
saying the native database libraries can't do this.)

So some objects such as this would always have to end up shared (or 
else non-functional in the new thread), which is bad for users because 
they have to be concerned with what objects are backed by native 
libraries and which ones cannot be made to conform to each of our 
thread styles.

That seems like a major caveat.

JEff



Re: Threads Design. A Win32 perspective.

2004-01-04 Thread Jeff Clites
On Jan 3, 2004, at 8:59 PM, Gordon Henriksen wrote:

On Saturday, January 3, 2004, at 04:32 , Nigel Sandever wrote:

Transparent interlocking of VHLL fat structures performed 
automatically by the VM itself. No need for :shared or lock().
Completely specious, and repeatedly proven unwise. Shouldn't even be 
pursued.

Atomic guarantees on collections (or other data structures) are rarely 
meaningful; providing them is simply a waste of time. Witness the 
well-deserved death of Java's synchronized Vector class in favor of 
ArrayList. The interpreter absolutely shouldn't crash due to threading 
errors—it should protect itself using standard techniques—but it would 
be a mistake for parrot to mandate that all ops and PMCs be 
thread-safe.
What are these standard techniques? The JVM spec does seem to guarantee 
that even in the absence of proper locking by user code, things won't 
go completely haywire, but I can't figure out how this is possible 
without actual locking. (That is, I'm wondering if Java is doing 
something clever.) For instance, inserting something into a collection 
will often require updating more than one memory location (especially 
if the collection is out of space and needs to be grown), and I can't 
figure out how this could be guaranteed not to completely corrupt 
internal state in the absence of locking. (And if it _does_ require 
locking, then it seems that the insertion method would in fact then be 
synchronized.)

So my question is, how do JVMs manage to protect internal state in the 
absence of locking? Or do they?
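
For concreteness, here is the kind of operation the question is 
about--a generic grow-on-append in C (no real API, illustrative only, 
error handling omitted). Two threads calling this on the same vector 
without a lock can leave one of them reading or writing through the 
freed, pre-growth buffer:

    #include <stdlib.h>

    typedef struct {
        void  **items;
        size_t  used;
        size_t  capacity;
    } vec;

    static void vec_push(vec *v, void *item)
    {
        if (v->used == v->capacity) {
            /* growing touches several fields; another thread can observe
             * (or keep using) the old, now-freed items pointer in between */
            size_t  ncap = v->capacity ? v->capacity * 2 : 4;
            void  **nbuf = realloc(v->items, ncap * sizeof *nbuf);
            v->items    = nbuf;
            v->capacity = ncap;
        }
        v->items[v->used++] = item;   /* and used++ itself isn't atomic */
    }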

JEff


Re: Thread Question and Suggestion -- Matt

2004-01-04 Thread Jeff Clites
On Jan 4, 2004, at 1:58 PM, Matt Fowles wrote:

Dave Mitchell wrote:

Why on earth would they be all one kernel-level thread?


Truth to tell I got the idea from Ruby.  As I said it make 
syncronization easier, because the interpreter can dictate when 
threads context switch, allowing them to only switch at safe points.  
There are some tradeoffs to this though.  I had forgotten about 
threads calling into C code.  Although the example of regular 
expressions doesn't work as I think those are supposed to compile to 
byte code...
Ah yes, I think you are right about regexes, judging from 'perldoc 
ops/rx.ops'. I was thinking of a Perl5-style regex engine, in which 
regex application is a call into compiled code (I believe...).

JEff



Crash when joining multiple threads

2004-01-05 Thread Jeff Clites
If I run this code, which should just be a loop which 
creates-then-joins a thread (looping twice), I get a crash at the end:

set I16, 0
again:
inc I16
new P5, .ParrotThread
find_global P6, "_foo"
find_method P0, P5, "thread3"
invoke
set I5, P5
getinterp P2
find_method P0, P2, "join"
invoke
lt I16, 2, again
print "done\n"
end
.pcc_sub _foo:
invoke P1
The problem is that when joining the _second_ time, in pt_thread_join() 
you get a return value from joining the thread--which happens to 
contain a thread-interpreter PMC:

(gdb) p (char*)((PMC*)retval)->vtable->isa_str->strstart
$2 = 0x1dc40c "ParrotThread ParrotInterpreter"
This then gets cloned, which ultimately ends up messing up the 
interpreter_array[] (things end up in the wrong slot), and 
pt_join_threads() ends up trying to join a bogus thread, and you get a 
crash.

So, the bug seems to be that the second time through, you get a return 
value from the thread. I don't know why this is happening--haven't 
tried digging yet.

JEff



Re: [perl #24808] [PATCH] Fix for remaining PPC JIT failures

2004-01-05 Thread Jeff Clites
On Jan 5, 2004, at 5:45 AM, Leopold Toetsch wrote:

Jeff Clites <[EMAIL PROTECTED]> wrote:
Attached are patches to fix the remaining "make testj" failures on Mac
OS X (i.e., ppc jit). These fixes are relative to my previously
submitted patch with other ppc jit fixes, [perl #24789].
... which still seems to be missing.
Yeah, I inquired about that ("Patch submission gone missing?" of 
1/1/2004), and Robert Spier indicated that it was in the RT moderation 
queue. If it doesn't show up in a bit, I'll re-send that patch 
submission directly to the list.

JEff



Re: [PATCH] The Return of the Priority DOD

2004-01-05 Thread Jeff Clites
On Jan 5, 2004, at 5:47 AM, Luke Palmer wrote:

After many months of lying dormant, I figured I'd get my act together
and adapt this patch to the few recent modifications.  And this time,
I'm posting a benchmark!
My results get about 5% slowdown in the eager case, and the usual
10,000% speedup in the lazy case.
Sounds cool; do you have a quick high-level description of what it's 
all about? (I had seen that needs_early_DOD flag in the source, and 
didn't know what it was intended to do.)

Thanks!

JEff



Re: Crash when joining multiple threads

2004-01-05 Thread Jeff Clites
On Jan 5, 2004, at 5:32 AM, Leopold Toetsch wrote:

Jeff Clites <[EMAIL PROTECTED]> wrote:
If I run this code, which should just be a loop which
creates-then-joins a thread (looping twice), I get a crash at the end:

The problem is that when joining the _second_ time, in 
pt_thread_join()
you get a return value from joining the thread
The return value is only returned, when I3 != 0. For your example that
shouldn't be the case (I3 is unused aka zero). So there isn't any 
return
value passed back.
Yep, the second time through I find REG_INT(3) (that is, 
interpreter->int_reg.registers[3]) set to 1 in thread_func().

Are you running the latest source? It doesn't crash here.
Yep, I just verified, using a clean checkout and build:

% gdb parrot
...
(gdb) b src/thread.c:238
Breakpoint 1 at 0xf890c: file src/thread.c, line 238.
(gdb) r thrspeed.pasm
Starting program: 
/Users/jac/Projects/parrot/parrot-thread-join-crash/parrot 
thrspeed.pasm
Reading symbols for shared libraries . done

Breakpoint 1, pt_thread_join (parent=0x1000200, tid=1) at 
src/thread.c:238
238 if (retval) {
(gdb) p retval
$1 = (void *) 0x0
(gdb) c
Continuing.

Breakpoint 1, pt_thread_join (parent=0x1000200, tid=1) at 
src/thread.c:238
238 if (retval) {
(gdb) p retval
$2 = (void *) 0x10163d0
(gdb) p (char*)((PMC*)retval)->vtable->isa_str->strstart
$3 = 0x1da410 "ParrotThread ParrotInterpreter"
(gdb) c
Continuing.
done

Program received signal EXC_BAD_ACCESS, Could not access memory.
0x90039138 in pthread_join ()
(gdb) bt
#0  0x90039138 in pthread_join ()
#1  0x000f8b78 in pt_join_threads (interpreter=0x1000200) at 
src/thread.c:309
#2  0xaba0 in Parrot_really_destroy (exit_code=0, 
vinterp=0x1000200) at src/interpreter.c:1123
#3  0xc120 in Parrot_exit (status=0) at src/exit.c:48
#4  0x3544 in main (argc=1, argv=0xbbd4) at imcc/main.c:555
(gdb) f 1
#1  0x000f8b78 in pt_join_threads (interpreter=0x1000200) at 
src/thread.c:309
309 JOIN(thread_interp->thread_data->thread, retval);
(gdb) p thread_interp->thread_data
$4 = (struct _Thread_data *) 0x903520
(gdb) p thread_interp->thread_data->thread
$5 = 0x0

And here's myconfig:

Summary of my parrot 0.0.13 configuration:
  configdate='Mon Jan  5 08:33:02 2004'
  Platform:
osname=darwin, archname=darwin
jitcapable=1, jitarchname=ppc-darwin,
jitosname=DARWIN, jitcpuarch=ppc
execcapable=1
perl=perl
  Compiler:
cc='cc', ccflags='-g -pipe -pipe -fno-common -no-cpp-precomp 
-DHAS_TELLDIR_PROTOTYPE  -pipe -fno-common -Wno-long-double ',
  Linker and Libraries:
ld='cc', ldflags='  -flat_namespace ',
cc_ldflags='',
libs='-lm'
  Dynamic Linking:
so='.dylib', ld_shared=' -flat_namespace -bundle -undefined 
suppress',
ld_shared_flags=''
  Types:
iv=long, intvalsize=4, intsize=4, opcode_t=long, opcode_t_size=4,
ptrsize=4, ptr_alignment=4 byteorder=4321,
nv=double, numvalsize=8, doublesize=8

JEff



[PATCH] Re: Crash when joining multiple threads

2004-01-06 Thread Jeff Clites
On Jan 5, 2004, at 9:37 AM, Leopold Toetsch wrote:

Jeff Clites <[EMAIL PROTECTED]> wrote:

On Jan 5, 2004, at 5:32 AM, Leopold Toetsch wrote:

The return value is only returned, when I3 != 0. For your example 
that
shouldn't be the case (I3 is unused aka zero). So there isn't any
return
value passed back.

Yep, the second time through I find REG_INT(3) (that is,
interpreter->int_reg.registers[3]) set to 1 in thread_func().
Wherever that comes from, just insert a line

   set I3, 0

before you invoke the return continutation. That's the official way, to
inidicate a void return value.
I tracked down how I3 was getting set, even though from the pasm it 
looks like nothing should be changing it. It turns out to be a side 
effect of the use of NCI in the threading implementation; the problem 
wasn't showing up for you since you have CAN_BUILD_CALL_FRAMES defined 
(but I don't). So I have a patch below which avoids setting I3 to 1 if 
the return value (in P5) is null--the generated NCI stubs were setting 
I3 to 1 for functions whose signatures indicate that they return PMCs, 
even if there was no PMC returned. (With this patch, the script I 
posted works, and the nci.t tests continue to pass.)

Also, looking at the i386 Parrot_jit_build_call_func() code I noticed 
that in the case of a "P" return type, it was setting a PMC register 
but incrementing the int register count. I have a patch below to fix 
this also; I don't have a way to test it, though.

JEff

Index: build_tools/build_nativecall.pl
===
RCS file: /cvs/public/parrot/build_tools/build_nativecall.pl,v
retrieving revision 1.35
diff -u -r1.35 build_nativecall.pl
--- build_tools/build_nativecall.pl 3 Jan 2004 20:03:38 -   
1.35
+++ build_tools/build_nativecall.pl 6 Jan 2004 08:35:33 -
@@ -314,8 +314,14 @@

 sub set_return_count {
 my ($stack, $int, $string, $pmc, $num) = @_;
+
+   my $pmc_string;
+
+   if( $pmc ) { $pmc_string = "return_data ? $pmc : 0" }
+   else { $pmc_string = 0 }
+
 print NCI <
-set_return_val(interpreter, $stack, $int, $string, $pmc, $num);
+set_return_val(interpreter, $stack, $int, $string, $pmc_string, 
$num);
 return;
 }

Index: jit/i386/jit_emit.h
===
RCS file: /cvs/public/parrot/jit/i386/jit_emit.h,v
retrieving revision 1.99
diff -u -r1.99 jit_emit.h
--- jit/i386/jit_emit.h 3 Jan 2004 13:22:45 -   1.99
+++ jit/i386/jit_emit.h 6 Jan 2004 08:37:22 -
@@ -3050,7 +3050,7 @@
 case 'v': /* void - do nothing */
 break;
 case 'P':
-jit_emit_mov_mr_i(pc, &PMC_REG(next_i++), emit_EAX);
+jit_emit_mov_mr_i(pc, &PMC_REG(next_p++), emit_EAX);
 break;
 case 'p':   /* make a new unmanaged struct */
 /* save return value on stack */


[PATCH] PPC JIT fixes [re-send] (Modified by Jeff Clites)

2004-01-06 Thread Jeff Clites
{This is a re-send of my submission [perl #24789] from 12/31, which has  
not been forwarded from RT. I wanted to keep this separate from my  
subsequent [perl #24808] in order to preserve the change history.}

Attached is a patch which fixes most of the "make testj" failures on  
Mac OS X (ie, ppc jit). Essentially all of the tests were failing;  
after the patch, these still fail:

Failed Test          Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------
t/op/arithmetics.t      1   256    18    1   5.56%  7
t/op/interp.t           5  1280    16    5  31.25%  2 8-11
t/pmc/exception.t       4  1024    22    4  18.18%  15-16 18 20
t/pmc/io.t              1   256    21    1   4.76%  6
t/pmc/pmc.t             1   256    92    1   1.09%  62
t/pmc/sub.t             4  1024    47    4   8.51%  42-45
t/pmc/tqueue.t          1   256     2    1  50.00%  2
33 subtests skipped.
Failed 7/61 test scripts, 88.52% okay. 17/1017 subtests failed, 98.33%  
okay.

(The arithmetics.t failure is due to "-0.0"; I have a fix for that as  
well, which I'll submit separately, assuming we want to suppress  
negative zero.)

Notes:

1) Now using the bctrl instruction instead of the bl instruction in  
Parrot_jit_normal_op, to branch to the op function using an absolute  
address rather than a relative address. This fixes the bulk of the  
failures. It's not completely clear why the previous implementation was  
failing; the generated code looked correct when disassembled in the  
debugger, but then again it also worked when running in the debugger  
(but not when run outside of the debugger). I suspect that the issue  
may have been problems with large branch offsets. The new  
implementation also has the nice side effects that it avoids the need  
for a jit_fixup here, and also matches the code which gcc generates in  
the case of a branch to a literal address.

2) Changed the frame entry and exit code to save the nonvolatile  
registers that we are modifying.

3) Fixed typo in Parrot_if_n_ic and Parrot_unless_n_ic.

4) Fixed typo in jit_emit_bcctrx.

5) Changed add_disp to handle large offsets. This was causing failures  
in t/op/stacks.t which had nothing to do with stacks, but rather were  
problems with large bytecode. Added a test to t/op/jit.t to explicitly  
test behavior with large bytecode, and modified the output of a test in  
stacks.t to make it easier to determine the point of failure.

6) Changed dumpcore() to be raise(SIGQUIT) instead of kill(0, SIGQUIT)  
for the Unix platforms. This is because the latter actually sends the  
signal to the entire current process group (not just the current  
process), so a failing test was taking down t/harness.

7) I don't expect anything here to break AIX users--the Mac OS X and  
AIX calling conventions differ somewhat, but I don't think anything  
here should run into those differences. But I'd like to hear the test  
results on AIX after application of the patch, and also I'd like to  
know if most tests were failing before its application.



ppc-jit.patch
Description: Binary data


JEff

Re: References to hash elements?

2004-01-06 Thread Jeff Clites
On Jan 6, 2004, at 9:05 AM, Simon Cozens wrote:

Arthur Bergman:
I am wondering how the references to hash elements are planned to be
done? The call to set_ must somehow be delayed until the time is 
right.
I would have thought that a hash element would itself be a PMC rather
than an immediate value, so a reference to that should be treated just
like any other reference to a PMC.
But there's a semantic difference between a "reference to a hash 
element" and a "reference to something which happens to have come out 
of a hash". Consider:

# example 1
$a = "foo";
$b = \$a;
# example 2
$hash{bar} = "foo";
$a = $hash{bar};
$b = \$a;
# example 3
$hash{bar} = "foo";
$b = \$hash{bar};
Examples 1 and 2 should be identical in terms of what $a and $b 
contain; in particular, modifying $b doesn't affect the hash in example 
2. In example 3, $b contains something different (a hash element 
reference), and assigning to it does change the hash. (That is, 
assuming we have hash element references of this sort.) That is to say, 
reference-to-hash-element-for-key(bar) is not the same as 
reference-to(hash-element-for-key(bar)).

JEff



Re: References to hash elements?

2004-01-06 Thread Jeff Clites
[ apologies for the duplicate email message from my last post--mail 
client problems... ]

On Jan 6, 2004, at 11:50 AM, Simon Cozens wrote:

Jeff Clites:
$a = $hash{bar};
Here you used the copy constructor before taking the reference. It 
might look
like an assignment operator, but it isn't. You're better off thinking 
that
assignment doesn't exist. It's a copy constructor. It makes the PMC 
referred
to by $a a copy of the PMC in $hash{bar}. Their values may be "equal" 
but
they're two different PMCs.
But here what I'm copying is the _contents_ of the hash slot--so I'm 
copying a PerlString PMC, for instance.

$b = \$hash{bar};
Here you didn't make a copy before taking the reference. No copy, only 
one
PMC. It all works.
And here I'm not making a copy, but also the thing I'm taking a 
reference to is not the same thing I copied above. Here, it's a 
reference to a hash slot.

So maybe the way to think about it is:

$a = $hash{bar};
#This finds the hash slot, pulls out its contents, and copies that 
contents into $a.

$b = \$hash{bar};
#This finds the hash slot, takes a reference to it, and places that 
reference in $b.

So the first case involves copying a string, and the second involves 
creating a reference to a hash slot. In particular, if the hash is 
initially empty, the first case should end up with undef in $a (so 
nothing actually copied, if undef is a singleton), but $b would 
(presumably) still contain a real reference (to a hash slot which 
doesn't contain anything yet), which would create a hash entry if you 
"$$b = 'goo'". And that reference-to-a-hash-slot is probably a separate 
special PMC (it needs to keep track of the hash and key, and work even 
if the hash is empty).
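
In other words, such a PMC would carry the aggregate and the key 
rather than a pointer to a value. Roughly (hypothetical types and a 
stand-in store function--not an existing Parrot PMC or API):

    typedef struct {
        void       *hash;   /* the aggregate itself                      */
        const char *key;    /* kept even when no entry exists for it yet */
    } hash_slot_ref;

    /* stand-in for whatever the real hash-store entry point is */
    extern void store_in_hash(void *hash, const char *key, void *value);

    /* assigning through the reference creates the entry on demand */
    static void slot_ref_assign(hash_slot_ref *r, void *value)
    {
        store_in_hash(r->hash, r->key, value);
    }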

JEff



Re: References to hash elements?

2004-01-06 Thread Jeff Clites
On Jan 6, 2004, at 9:25 AM, Leopold Toetsch wrote:

Arthur Bergman <[EMAIL PROTECTED]> wrote:
Hi,

I am wondering how the references to hash elements are planned to be
done? The call to set_ must somehow be delayed until the time is 
right.

$foo = \$hash{key};

$$foo = "bar";

$has{key} is now "bar"
There was some discussion WRT that issue months ago. Currently the
implementation is wrong IMHO. When $hash{key} doesn't exist a new
PerlUndef is returned (which is fine) but it should be stored in the
hash too, so that assigning to $foo does The Rigth Thing.
(Parrot aggregates have reference semantics, so ...)
This may have been discussed previously, but I would think that reading 
a hash should never auto-vivify elements. (That would be contrary to 
the Perl5 semantics, at least.) You may need to return a special 
reference-style PMC which knows how to get back to the original hash if 
you assign to it, but the hash itself shouldn't grow entries.

JEff



Re: References to hash elements?

2004-01-06 Thread Jeff Clites
On Jan 6, 2004, at 1:31 PM, Simon Cozens wrote:

Jeff Clites:
But here what I'm copying is the _contents_ of the hash slot.
True, but irrelevant. :)

And here I'm not making a copy, but also the thing I'm taking a
reference to is not the same thing I copied above. Here, it's a
reference to a hash slot.
No, it isn't. It's a reference to a PMC. The fact that that PMC happens
to be pointed to by a hash key is incidental. It's still just another 
PMC.
No special case at all.
Oh, I see what you are saying. I was misinterpreting what \$hash{bar} 
is supposed to do. (I was thinking there was supposed to be some new 
way to reference a hash slot, such that assigning to the reference [or 
dereferencing and assigning] would be equivalent to using $hash{bar} as 
an l-value.)

(But saying something is a PMC doesn't clarify anything--basically 
everything ends up being a PMC. PerlStrings are PMCs, hashes are PMCs; 
we have reference PMCs... If we had a reference-to-a-hash-slot, that 
would probably be a PMC also. So your original post threw me off track 
when you referred to "a PMC rather than an immediate value".)

But I get it now--thanks!

JEff



Re: Continuations don't close over register stacks

2004-01-06 Thread Jeff Clites
On Jan 6, 2004, at 3:41 PM, Luke Palmer wrote:

Leopold Toetsch writes:
Good. Pass it over to me :) COW copy of stacks and of other 
buffer-based
items is still broken. These need distinct headers so that they work
like COWed strings.
Alright, I've got a pretty big incomplete patch here (see, when one has
a deadline on a compiler written with the assumption of working
continuations, one can be pretty determined :-).
It makes each chunk into a subclass of Buffer like so:

struct RegisterChunkBuf {
size_t used;
PObj* next;
};
To do that properly, I think you need a pobj_t as the first struct 
member, like string has:

struct parrot_string_t {
pobj_t obj;
UINTVAL bufused;
void *strstart;
UINTVAL strlen;
const ENCODING *encoding;
const CHARTYPE *type;
INTVAL language;
};
so maybe something like:

struct RegisterChunkBuf {
pobj_t obj;
UINTVAL bufused;
RegisterChunkBuf* next;
};
But also take a look at list.h and see if it's already doing what you 
want to do; you may be able to do it directly.

And then, for example:

struct PRegChunkBuf {
struct RegisterChunkBuf buf;
struct PRegFrame PRegFrame[FRAMES_PER_CHUNK];
};
I want these things to be garbage collected, but DOD doesn't trace the
buffer.  I can't seem to find a way to mark the frames without making
the chunks into PMCs (yuck).   Is there a way to do this?
I think you'll need to add something to the root set tracing code 
(probably trace_active_buffers() in src/dod.c). I'm not sure what all 
of your stuff hangs off of, though.

Just some thoughts--I'm a little fuzzy on where these items are rooted.

JEff



Re: Continuations don't close over register stacks

2004-01-06 Thread Jeff Clites
On Jan 6, 2004, at 6:42 PM, Luke Palmer wrote:

Jeff Clites writes:
On Jan 6, 2004, at 3:41 PM, Luke Palmer wrote:

It makes each chunk into a subclass of Buffer like so:

   struct RegisterChunkBuf {
   size_t used;
   PObj* next;
   };
To do that properly, I think you need a pobj_t as the first struct
member, like string has:
struct parrot_string_t {
pobj_t obj;
UINTVAL bufused;
void *strstart;
UINTVAL strlen;
const ENCODING *encoding;
const CHARTYPE *type;
INTVAL language;
};
Ah, no.  That's how you create a new type of header.  I need nothing
more than the simple buffer header.  So I make a PObj and stick this
struct in its bufstart.
Hmm, okay; I'm mildly confused then, but that's okay--I need to stare 
at this a bit more. But if you put your struct inside the memory 
pointed to by the bufstart of a simple buffer header, then I think that 
compact_pool() is going to expect you to have a Buffer_Tail and such. 
Since your struct is fixed-size, I would think it would make more sense 
to make it a new type of header (like Stack_Chunk in stacks.h); but on 
the other hand, I don't think we COW headers, and we need major COW 
here. Hmm, still fuzzy.

I want these things to be garbage collected, but DOD doesn't trace 
the
buffer.  I can't seem to find a way to mark the frames without making
the chunks into PMCs (yuck).   Is there a way to do this?
I think you'll need to add something to the root set tracing code
(probably trace_active_buffers() in src/dod.c). I'm not sure what all
of your stuff hangs off of, though.
Did that.  That works for the main part, but continuations and
who-knows-what-else are going to be holding references to parts of
these, so I'd like to mark these automatically.
Yeah, I was on crack--what I said made no sense; of course these are 
going to be hanging off of individual continuation PMCs, not directly 
accessible from the interpreter struct, so these won't be part of the 
root set.

Unless... hey!  You just gave me an idea.  I'll make a mark_regstack
that anything that holds on to one of these has to call as part of its
mark routine.
Yep, at least from the vtable->mark() of the Continuation PMC. I see 
there's already a mark_stack() being called there (its impl. is in 
src/method_util.c). That's the right spot--I should have thought of 
that before.

I know, it seems like a no-brainer.  Mustn't have had a brain earlier 
:-)
Apparently, I didn't either! It must be going around.

JEff



Re: [perl #24829] RE: [PATCH] PPC JIT fixes [re-send] (Modified by Jeff Clites)

2004-01-07 Thread Jeff Clites
On Jan 6, 2004, at 3:28 PM, Adam Thomason (via RT) wrote:

-Original Message-
From: Jeff Clites [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 31, 2003 4:30 PM
To: [EMAIL PROTECTED]
Subject: [PATCH] PPC JIT fixes [re-send] (Modified by Jeff Clites)
7) I don't expect anything here to break AIX users--the Mac OS X and
AIX calling conventions differ somewhat, but I don't think anything
here should run into those differences. But I'd like to hear
the test
results on AIX after application of the patch, and also I'd like to
know if most tests were failing before its application.
It never worked to begin with, so you haven't broken anything :).  On 
actinium, the initial jump into jit_code from runops_jit promptly 
segfaults.  I've yet to successfully convince gdb to import the stabs 
object (made either with GNU STABS and gas or XCOFF STABS and the AIX 
as) to see what parrot instruction is causing the failure.  Since even 
basic_1.pasm fails (i.e., noop,end), I'm tempted to think the problem 
is occurring before it gets to the PASM ops, but I'm not familiar 
enough with PPC ASM or gdb to track it down.  If there's anyone more 
knowledgable I'd be glad to take some direction and poke around.
Try this, to give you a lead:

1) In gdb, break on runops_jit, then step until jit_code is defined 
(but not jumped to yet).

2) Dump the assembly by typing "x/40i jit_code". (That is "x" for 
examine memory at "jit_code", and the "/40i" says to format the output 
as instructions, and print out 40 of them.) That will at least give you 
the first chunk of the JIT-produced assembly.

3) Continue running. When you crash, use "info reg" to output the 
registers--the "pc" listed is the program counter, which should tell 
you the instruction on which you crashed. If the number matches one of 
the lines printed in (2), then that's where you crashed. If the number 
is in a completely different range, you probably jumped somewhere from 
the JIT code before crashing. If so, do (4):

(4) Start over, but after step (2), instead of just continuing, step 
one instruction at a time with "nexti" until you crash. Each time you 
step it will print out the program counter, so you'll be able to see at 
which point you jump from the JIT code into the beginning-of-the-end. 
That will tell us at least where to start looking.

(Also, as a side note, I believe that before my patches the 
basic_1.pasm test was working on Mac OS X--about the only one 
working--so it may be that something completely different is going 
wrong on AIX. And actually, there's some flush-the-processor-cache code 
which may be conditionally compiled, so maybe that isn't building in 
your case, and your jump could end up trying to execute stale data. Or 
it may be an issue with calling convention differences between AIX and 
Mac OS X.)

JEff



Re: Continuations don't close over register stacks

2004-01-07 Thread Jeff Clites
On Jan 7, 2004, at 1:46 AM, Leopold Toetsch wrote:

Luke Palmer <[EMAIL PROTECTED]> wrote:
It makes each chunk into a subclass of Buffer like so:

struct RegisterChunkBuf {
size_t used;
PObj* next;
};
That part is already answered: create a buffer_like structure.
*But* again register backing stacks are *not* in the interpreter 
context.
I don't understand what you are getting at. They are not physically 
part of Parrot_Interp.ctx, but it holds pointers to them, right? So, 
they need to be copied when the context is being duplicated. Is that 
your point, or are you trying to say that they are not _logically_ part 
of the context, or are not supposed to be?

JEff



Re: Continuations don't close over register stacks

2004-01-08 Thread Jeff Clites
On Jan 7, 2004, at 8:15 PM, Melvin Smith wrote:
Leopold Toetsch writes:
> Jeff Clites <[EMAIL PROTECTED]> wrote:
> > On Jan 7, 2004, at 1:46 AM, Leopold Toetsch wrote:
> Exactly the latter:
> That was AFAIK a design decision, when Dan did introduce CPS. At 
this
> time register backing stacks went out of the continuation or 
whatelse
> context - IIRC did Dan commit that to CVS himself.
I tend to agree, but maybe Dan can explain. I looked back at the
CVS history and when I put continuations in, I did originally have
register stacks in the Parrot_Context (although they weren't yet
garbage collected). Dan since reverted that and put them back to
the top level interpreter object.
I think I'm just being dense, but looking at 
include/parrot/interpreter.h it appears that they are in the context:

typedef struct Parrot_Context {
struct IRegChunk *int_reg_top;/* Current top chunk of int reg 
stack */
struct NRegChunk *num_reg_top;/* Current top chunk of the float 
reg
   * stack */
struct SRegChunk *string_reg_top; /* Current top chunk of the string
   * stack */
struct PRegChunk *pmc_reg_top;/* Current top chunk of the PMC 
stack */

struct Stack_Chunk *pad_stack;  /* Base of the lex pad stack */
struct Stack_Chunk *user_stack; /* Base of the scratch stack */
struct Stack_Chunk *control_stack;  /* Base of the flow control 
stack */
IntStack intstack;  /* Base of the regex stack */
Buffer * warns; /* Keeps track of what warnings
 * have been activated */
} parrot_context_t;

and

typedef struct Parrot_Interp {
struct IReg int_reg;
struct NReg num_reg;
struct SReg string_reg;
struct PReg pmc_reg;
struct Parrot_Context ctx;  /* All the registers and stacks 
that
   matter when context 
switching */
...

and in src/register.c we have:

void
Parrot_push_i(struct Parrot_Interp *interpreter, void *where)
{
/* Do we have any space in the current savestack? If so, memcpy
 * down */
if (interpreter->ctx.int_reg_top->used < FRAMES_PER_CHUNK) {
memcpy(&interpreter->ctx.int_reg_top->
   IRegFrame[interpreter->ctx.int_reg_top->used],
   where, sizeof(struct IRegFrame));
interpreter->ctx.int_reg_top->used++;
}
...
The continuation code only saves interpreter->ctx, and not the stuff 
that hangs off of it, but it seems pretty clear that the register save 
stacks are attached to the interpreter context in the current sources.

So what am I missing?

JEff



Re: [perl #24829] RE: [PATCH] PPC JIT fixes [re-send] (Modified by Jeff Clites)

2004-01-08 Thread Jeff Clites
On Jan 7, 2004, at 10:39 AM, Adam Thomason wrote:

The 40 ops range from 0x200fca28 to 0x200fca90, with 0x200fca94 onward 
being
".long 0x0".  The pc at failure is 0x7c0802a4.  So it's probably safe 
to
assume the trouble is pre-pasm.
Yep, you're right. Thanks for all of the great information--I'm pretty 
sure I now understand what's going on. (But I don't know that I have a 
fix.) So here's the scoop:

On AIX, function calls via pointer expect that the supplied address 
isn't a pointer to where the code starts, but rather a pointer to a 
structure which contains a pointer to where the code starts, as well as 
some other information (a TOC pointer, and an optional environment 
pointer). Since that's not what jit_code is, it blows up. How it blows 
up is that rather than jumping to jit_code, it jumps to *jit_code--that 
is, it dereferences jit_code and interprets that value as a function 
pointer. But that value is actually going to be the numeric value of 
the first instruction:

(gdb) x/i jit_code
0x1024c00:  mflr    r0
(gdb) x/w jit_code
0x1024c00:  0x7c0802a6
You see above that the numeric value of that instruction is 0x7c0802a6. 
But, instructions have to be on 4-byte boundaries, and the branch 
instructions for this reason ignore the 2 low-order bits (ie, they 
round down to a multiple of 4), so branching to 0x7c0802a6 really lands 
you at 0x7c0802a4, which is where you are crashing.

(This extra level of indirection for function pointers is part of the 
AIX scheme for doing indirect function calls, and the _ptrgl is some 
boilerplate code for pulling out the pieces of that struct and putting 
them into the correct registers. If you use "stepi" instead of "nexti", 
which I almost suggested before but then didn't, you should see that 
you get into the _ptrgl code okay, but then crash at the bctr 
instruction. "nexti" steps over function calls, so it's being 
misleading in this case--"stepi" will continue to step an instruction 
at a time across branches.)

So the problem is that the calling conventions differ between AIX and 
Mac OS X, and I think that we might need (at least) a different 
Parrot_jit_begin() and Parrot_jit_normal_op() for AIX. The above 
problem might not matter once we enter the JIT-generated code, but AIX 
also uses r2 as a TOC pointer and we'll probably need to maintain that 
properly. (At least, the NCI code will need to worry about that, once 
we are using a JIT-ish scheme on ppc to generate the needed stubs, 
since those will be jumping between libraries.) The AIX calling 
conventions seem to be closer to the Mac OS 9 conventions than to those 
for Mac OS X.

Just for fun, try changing runops_jit() to something like the 
following, and see what happens. I expect it to still blow up 
somewhere, but it might get further:

static opcode_t *
runops_jit(struct Parrot_Interp *interpreter, opcode_t *pc)
{
    struct { jit_f functPtr; void *toc; void *env; } temp;
#if JIT_CAPABLE
    jit_f jit_code = (jit_f) D2FPTR(init_jit(interpreter, pc));
    temp.functPtr = jit_code;
    temp.toc = NULL; /* should fill with the current TOC, but I don't
                        know how to find out what that is */
    temp.env = NULL;
    ((jit_f)&temp)(interpreter, pc);  /* use the struct's address as the
                                         "function pointer" */
#endif
    return NULL;
}
I didn't compile that to make sure that the syntax is okay, but the 
idea is to make jit_code the first member of a struct big enough to 
hold 3 pointers, and to use a pointer to that as the function pointer.

One of the first things I did to get the port to work was introduce the
'#ifndef __IBMC__' guard in ppc_sync_cache in jit/ppc/jit_emit.h, just 
to get
the file to compile (xlC doesn't support any form of inlined asm).
I saw that, and a couple of things in CVS comments, and jumped to the 
conclusion that it was working on AIX, but now I'm thinking that it 
must never have worked there.

 Might the
absence of the sync cause the trouble?  If so, now is probably the 
time to
patch up the build system to support assembling a separate .s file 
containing
the necessary snippet.  I can look into doing that, and calling it from
ppc_sync_cache, but I've no idea if there's any trouble w.r.t not 
inlining
that piece, so help would be appreciated.
I think it should be okay for it not to be inline, and alternatively it 
might make sense to just implement ppc_sync_cache entirely in assembly 
(though I guess that's a little more work). But that seems not to be 
the problem in this case.

JEff



Re: Continuations don't close over register stacks

2004-01-08 Thread Jeff Clites
On Jan 8, 2004, at 4:24 AM, Leopold Toetsch wrote:

Jeff Clites <[EMAIL PROTECTED]> wrote:

I think I'm just being dense, but looking at
include/parrot/interpreter.h it appears that they are in the context:
Sorry, yes. They are in the context but not saved. I mixed that up with
the registers themselves, which went out of the context.
Okay--thanks for the confirmation.

So it looks like currently if a continuation is activated after the 
stack frame in which it was created has exited, then 
interpreter->ctx.int_reg_top may point to garbage (since the frame 
_pointers_ are inside of the saved/restored context), so we'll end up 
with indeterminate behavior. I don't think we have any guards against 
that in place.

JEff



Re: BUG: coroutine + exception = stack_height segfault

2004-01-09 Thread Jeff Clites
On Jan 9, 2004, at 12:24 AM, Leopold Toetsch wrote:

Michal Wallace <[EMAIL PROTECTED]> wrote:
#!/bin/env parrot
#
# yieldbug.imc
#
# This program should print dots forever.
# Instead it prints a few dots and then segfaults.
It does print dots forever here.
It does for me too. But based on a previous email, I assume you are 
getting the crash after applying Luke's continuations patch? (I haven't 
tried your script with that applied.)

JEff



Meaning of "restart" in *.ops files?

2004-01-09 Thread Jeff Clites
What does "restart" mean in op files, as in "restart ADDRESS(resume);"? 
Also, why does the find_global op use it? I couldn't find it explained 
anywhere.

Also, on a vaguely related topic, in "Parrot_jit_cpcf_op", what does 
"cpcf" stand for?

JEff



Re: nci

2004-01-10 Thread Jeff Clites
On Jan 10, 2004, at 6:50 AM, Harry Jackson wrote:

 # New in libpq.so.3.1
 t   pt
 p   ptit33i
 i   ptit33i
 i   ptiit33i
 c   pi
I have jsut added the above signatures to my call_list due to an 
upgrade to postgres and was wondering what plans are there for 
managing the call_list. If every time there is a minor update to a lib 
and several new signatures get added this file is going to get quite 
large.
I assume the plan is to get on-the-fly building of NCI stubs working 
"everywhere". Platforms where that works don't need the functions 
generated by build_nativecall.pl, but right now that's only i386; it's 
a JIT-like and pretty difficult piece of code to write, but it should 
eventually work on every platform which supports JIT, at least.

JEff



Re: [PATCH] Fix for remaining PPC JIT failures

2004-01-10 Thread Jeff Clites
On Jan 9, 2004, at 11:05 AM, Robert wrote:

Yeah, I inquired about that ("Patch submission gone missing?" of 
1/1/2004), and Robert Spier indicated that it was in the RT 
moderation queue. If it doesn't show up in a bit, I'll re-send that 
patch submission directly to the list.
It was in the -list- submission queue.  I'm pretty sure I saw it show 
up.. but since I don't have full archives on my notebook, and I'm 
somewhere over Arizona, I can't be of much use right now.

Leo, you can grab it directly from RT too.
I don't think it ever showed up on the mailing list, so I re-sent it 
directly there myself, and the patch has been applied, so I think it's 
all taken care of now.

Thanks,

JEff



Re: [PATCH] The Return of the Priority DOD

2004-01-10 Thread Jeff Clites
On Jan 10, 2004, at 11:54 AM, Nicholas Clark wrote:

On Sat, Jan 10, 2004 at 08:39:51PM +0100, Leopold Toetsch wrote:
Nicholas Clark <[EMAIL PROTECTED]> wrote:

src/dod.c: In function `clear_live_bits':
src/dod.c:755: `cur_arena' undeclared (first use in this function)

The appended patch cures it (and all tests pass) but I'm not sure if 
it
is correct.
Ah yep, thanks. I didn't test the patch with ARENA_DOD_FLAGS disabled
and finally diceded that "arena" is as meaningful and readable as
"cur_arena" and changed that before submitting - a policy which almost
always causes an error somewhere :)
How come most tinderboxes kept going without failing? What's making the
choice on the value of ARENA_DOD_FLAGS ?
It gets set in include/parrot/pobj.h, and is basically set to true if 
your platform has some flavor of memalign(), which allows you to 
allocate chunks of memory with arbitrary power-of-2 alignment. So all 
the platforms being tested on the tinders probably have this. (Of 
course, you can manually set ARENA_DOD_FLAGS to false in the source, 
for testing.)
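
As a sketch of why the alignment matters (illustrative only--this is 
not the actual pobj.h code, and ARENA_SIZE is an assumed value): when 
every arena sits on a power-of-two boundary, the arena owning any 
given object can be recovered just by masking the object's address, 
which is what makes keeping per-object flags at the arena level cheap:

    #define _POSIX_C_SOURCE 200112L   /* for posix_memalign */
    #include <stdlib.h>
    #include <stdint.h>

    #define ARENA_SIZE (1u << 16)     /* assumed power-of-two arena size */

    static void *new_arena(void)
    {
        void *p = NULL;
        /* posix_memalign is one of the memalign() flavors a Configure
         * probe might find */
        if (posix_memalign(&p, ARENA_SIZE, ARENA_SIZE) != 0)
            return NULL;
        return p;
    }

    /* any object pointer leads back to its arena with a single mask */
    static void *arena_of(void *obj)
    {
        return (void *)((uintptr_t)obj & ~(uintptr_t)(ARENA_SIZE - 1));
    }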

JEff



Re: [perl #24866] [PATCH] Unified harness for IMCC tests

2004-01-11 Thread Jeff Clites
On Jan 11, 2004, at 3:10 AM, Leopold Toetsch wrote:

Bernhard Schmalhofer <[EMAIL PROTECTED]> wrote:

'make fulltest' gives 6 reports. 5 reports for different parrot 
options plus
1 report for t/src/*.t.
Dou you see a simple way, to run all tests in one harness for
"fulltest". That would need some configure hints, which run-loops are
available and then run one test X times in a loop with the appropriate
run-loop switch - or something like that.
I have a patch which lets you run multiple sets of tests in t/harness, 
and get one summary at the end. (I was working a bit on the same 
problem as Bernhard, but from a different direction, so I think we have 
met in the middle.)

Essentially, it lets you do things such as this:

	perl t/harness -b t/op/*.t t/pmc/*.t t/src/*.t -j t/op/*.t t/pmc/*.t

instead of having to do:

perl t/harness -b t/op/*.t t/pmc/*.t t/src/*.t
perl t/harness -j t/op/*.t t/pmc/*.t
and gives you a single summary at the end.

Unfortunately it requires modification of Test::Harness to allow 
multiple sets of tests to be summarized together, so we'd have to add 
that to the repository along with our other Test:: modules. But it's a 
straightforward modification.

I'll clean up my t/harness changes to make them relative to Bernhard's 
patch, and send it all in.

JEff



JVM as a threading example (threads proposal)

2004-01-12 Thread Jeff Clites
This is a threading proposal, in the form of a collection of notes.

Rationale:

We need to constantly compare parrot to the JVM, and in particular have  
a deep understanding of cases where the JVM performs well (or better  
than parrot). The reason for this is obvious: the JVM is the product of  
several years of R&D, and we need to benefit from that where we can. Of  
course, there are places where the two will necessarily differ (due to  
differences such as static v. dynamic typing of their target  
languages), but these necessary differences don't account for  
everything, and it's important for us to understand if individual  
performance differences are fundamental, or incidental. In particular,  
it's instructive to look at Java's threading strategy, and its impact  
on GC.

I have 2 fundamental assumptions:

1) As stated by others before, Parrot needs to guarantee that its  
internal state is protected, even in the absence of proper  
synchronization by user code, though user-visible data structures may  
become corrupted from the user's perspective. (That is, a hash might  
suddenly lose some entries, but it can't become corrupted in such a way  
that a crash or memory leak might result.) To put it formally, no  
sequence of byte code can have the possibility of causing an attempt to  
read from, or write to, unallocated memory. [That may not cover all  
cases of corruption, but it gets at the main point.] This is  
particularly important in an embedded context--we want to be able to  
guarantee that bad bytecode can't crash the process in which parrot is  
embedded. Note that Java seems to provide similar guarantees (more on  
this below).

2) My second assumption is that if we can provide a safe and performant  
implementation which allows "full-access" threads (that is, without  
data-access restrictions imposed between threads), then  
access-restricted threads (such as Perl 5's ithreads) can be  
implemented on top of this. Note that thread creation in the  
access-restricted case will necessarily be slower, due to the need to  
copy (or COW-copy) additional data structures.

So the theme is taking lessons from the JVM wherever we can. The focus  
is on how we can provide safe and high-performance threading without  
data-access restrictions between threads (in light of assumption 2  
above).

Notes and considerations:

1) As mentioned above, Java seems to guarantee robustness in the  
absence of proper user-level synchronization. In particular, section  
2.19 of the JVM spec states the following:

"When a thread uses the value of a variable, the value it obtains is in  
fact a value stored into the variable by that thread or by some other  
thread. This is true even if the program does not contain code for  
proper synchronization. For example, if two threads store references to  
different objects into the same reference value, the variable will  
subsequently contain a reference to one object or the other, not a  
reference to some other object or a corrupted reference value. (There  
is a special exception for long and double values; see §8.4.)"

(It also further states, "In the absence of explicit synchronization,  
an implementation is free to update the main memory in an order that  
may be surprising. Therefore, the programmer who prefers to avoid  
surprises should use explicit synchronization.")

(Section 8.1 also clarifies, "A variable is any location 
within a program that may be stored into. This includes not only class  
variables and instance variables, but also components of arrays.")

2) It seems that Java provides these guarantees in part by relying on  
the atomicity of stores of 32-bit sized data. (This is "atomic" in the  
sense of "no word tearing", meaning that when such a value is written  
to a memory location, any thread reading that location will see either  
the old or the new value, and not some other value.) Section 8.4 of the  
JVM spec implies this, albeit indirectly. Due to the 
ubiquity of the JVM, it seems reasonable for Parrot to similarly rely  
on atomic stores, without compromising the range of platforms on which  
parrot can run. In practice, modern processors likely provide such a  
guarantee natively, without the need for explicit locking (though  
explicit locking could be employed in the seemingly rare case of a  
platform which provides threads but not atomic memory updates). If Java  
is relying on some other mechanism here, we need to understand what  
that is.
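
To make that property concrete, here's a minimal sketch of what "no word 
tearing" buys us. It uses C11 atomics (a newer facility, used here just 
as a portable way to spell an aligned, word-sized store); all names are 
invented for the example:

/* Two threads race on one reference-sized slot.  The reader must only
 * ever see one of the two valid references, never a mixture of their
 * bits -- that's the guarantee the proposal wants to lean on. */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static _Atomic(void *) shared_ref;     /* e.g. a PMC or STRING pointer slot */
static int obj_a, obj_b;

static void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        /* Relaxed store: no ordering promise, but never a torn value. */
        atomic_store_explicit(&shared_ref,
                              (i & 1) ? (void *)&obj_a : (void *)&obj_b,
                              memory_order_relaxed);
    }
    return NULL;
}

int main(void)
{
    atomic_store(&shared_ref, (void *)&obj_a);
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);

    for (int i = 0; i < 100000; i++) {
        void *p = atomic_load_explicit(&shared_ref, memory_order_relaxed);
        if (p != (void *)&obj_a && p != (void *)&obj_b) {
            puts("torn read (should never happen)");
            return 1;
        }
    }
    pthread_join(t, NULL);
    return 0;
}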

3) There seem to be 3 basic categories of things which need  
thread-safety: data structures internal to the VM (eg, opcode arrays,  
register stacks), variables (globals, registers, etc.; most  
importantly, those which hold references

Re: More object stuff

2004-01-12 Thread Jeff Clites
On Jan 12, 2004, at 8:07 AM, Harry Jackson wrote:

Dan Sugalski wrote:

Having a fetchrow_hash that returned a Hash where the keys are the 
column names and the values are the column values would be most of 
the rest.
I read somewhere that accessing a hash was slightly slower than 
accessing an array, which is one reason why I never used it. The other 
reason is that if I name the fields in the hash then the user needs to 
know the names to access them or perform some magic to get them.
The names will be known--they are the column names (or aliases) used in 
the query.

With an array they come out in the order they are asked for.
The nice thing about a hash (that I've found when using DBI) is that 
you don't have a problem if you end up inadvertently changing the order 
in which the columns are returned, when modifying a query.

Another reason not to use the hash method above is that you are moving 
around column names that will not change throughout the transaction 
(isn't this bulkier than using arrays?).
Hopefully the actual strings used as the keys can be re-used. (That is, 
each hash can use the same string instance for a given key. I believe 
that Perl5 does this for hash keys.)
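
As a rough sketch of that idea (illustrative only--this is neither 
Parrot's nor Perl5's actual mechanism), a tiny intern table is enough to 
let every row hash share one instance of each column-name string:

/* Hypothetical intern table: the first request for a name allocates it,
 * later requests get back the same pointer. */
#include <string.h>
#include <stdlib.h>

#define INTERN_MAX 64

static const char *interned[INTERN_MAX];
static size_t      interned_count;

static const char *intern(const char *name)
{
    for (size_t i = 0; i < interned_count; i++)
        if (strcmp(interned[i], name) == 0)
            return interned[i];           /* already interned: share it */
    if (interned_count < INTERN_MAX)
        return interned[interned_count++] = strdup(name);
    return name;                          /* table full: just fall back */
}

Each fetched row would then key its hash on intern("column_name"), so 
the per-row cost is a shared pointer rather than a fresh copy of every 
column name.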

JEff



Some namespace notes

2004-01-13 Thread Jeff Clites
Here are some notes on namespaces, picking up a thread from about a 
month ago:

On Dec 11, 2003, at 8:57 AM, Dan Sugalski wrote:

That does, though, argue that we need to revisit the global access 
opcodes. If we're going hierarchic, and we want to separate out the 
name from the namespace, that would seem to argue that we'd want it to 
look  like:

  find_global P1, ['global', 'namespace', 'hierarchy'], "thingname"

That is, split the namespace path from the name of the thing, and make 
the namespace path a multidimensional key.
I definitely agree that we should have separate slots for namespace and 
name, as you have above. So I think the discussion boils down to 
whether a namespace specifier is logically a string or an array of 
strings.

Short version: I was originally going to argue for fully hierarchical 
namespaces, identified as above, but after turning this over in my head 
for a while, I came to the conclusion that namespaces are not 
conceptually hierarchical (especially as used in languages such as 
Perl5 and Java, at least), so I'm going to argue for a single string 
(rather than an array) as a namespace identifier.

Here's my framing of the general problem. I think there are 3 basic 
options:

1) No namespaces. That would mean we might have variables named 
"Foo::Bar::baz", but to parrot that would just be a variable with a 
funny name (it wouldn't infer any sort of nesting of variables or 
namespaces).

2) Two-level namespaces. That would mean that parrot has the concept of 
"look up variable 'baz' in namespace 'Foo::Bar'", but no concept of 
nested namespaces--"Foo::Bar" is just a funny namespace name (no 
nesting of namespaces inferred).

3) Full hierarchical namespaces. So parrot knows how to "look up 
variable baz inside namespace Bar inside namespace Foo". Parrot would 
never need to see the syntax ("::" v. "." v. whatever) used by 
different languages to specify nested namespaces--their compilers would 
assemble these as arrays, as in Dan's example above.

(Also, (2) v. (1) is what Tim was indicating with:

*{"Foo\0Bar\0Baz"}->{var};
or
*{"Foo\0Bar\0Baz\0var"};
in his post in the previous thread, I believe.)

I think that probably most agree that (1) is out--so the question is 
(2) v. (3).

I think there are 2 considerations:

A) What does a hierarchy give us?

B) What kind of cross-language compatibility do we need?

As to (A), I don't think the hierarchy actually matters much. What I 
mean is, that I don't think it's actually significant to say that the 
namespace A::B::C is "inside" of the namespace A::B. For instance, 
$A::B::C::var won't "fall back" to finding $A::B::var -- they're really 
just separate namespaces which would have worked just the same if 
they'd been called "ABC" and "AB" or "Foo" and "Bar". The hierarchy is 
only used to conceptually organize things (for humans), not really at 
runtime. Notably, this is the viewpoint taken by Java and I believe by 
Perl5. (For instance, Java has a "com.sun.media.sound" namespace, but 
not a "com" namespace.)

So unless I'm missing some uses of a hierarchy, I think that (A) 
doesn't argue for (3) over (2), so it boils down to consideration (B).

For (B), what I mean is: Do we want the following to refer to the same 
package/namespace:

in Perl:
use Foo::Bar::Baz;
in Java:
import Foo.Bar.Baz;
If we do, then I say we should go with (3), and use the array-based 
method of specifying a namespace which Dan indicated above. Then, it's 
up to the individual compilers to pick apart these syntaxes into the 
same array: ['Foo', 'Bar', 'Baz']

On the other hand, maybe we don't want this. Maybe we want these to 
refer to different packages/namespaces. In Perl, if I want to actually 
instantiate a java.lang.String, maybe it's clearer to just really treat 
the class name as "java.lang.String". I actually think it should be up 
to the individual language implementers to decide if they want to 
"normalize" during compilation to a common syntax for specifying 
package names, but I think it makes more sense for them to _not_ 
normalize, and in Perl just "use java.lang.String" to pull in that Java 
package.

So I'm arguing for (2), which says: Namespaces don't conceptually nest.
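
To make (2) concrete, here's a rough sketch (illustrative C, not Parrot 
code, with linear scans standing in for real hash lookups) of what the 
lookup amounts to: one flat table keyed by the namespace string, each 
entry holding its own name table, and no nesting anywhere:

#include <stdio.h>
#include <string.h>

typedef struct Entry     { const char *name; const char *value; } Entry;
typedef struct Namespace {
    const char *name;          /* one flat key: "Foo::Bar", "java.lang", ... */
    Entry       entries[4];
    int         n_entries;
} Namespace;

/* The two-level lookup: find the namespace by its flat name, then find
 * the thing by its name inside that namespace. */
static const char *find_global(const Namespace *nss, int n_nss,
                               const char *ns_name, const char *thing)
{
    for (int i = 0; i < n_nss; i++) {
        if (strcmp(nss[i].name, ns_name) != 0)
            continue;
        for (int j = 0; j < nss[i].n_entries; j++)
            if (strcmp(nss[i].entries[j].name, thing) == 0)
                return nss[i].entries[j].value;
    }
    return NULL;               /* no such namespace, or no such name */
}

int main(void)
{
    Namespace nss[] = {
        { "Foo::Bar", { { "baz", "a value" } }, 1 },
    };
    /* "Foo::Bar" is just a funny name; there is no "Foo" namespace. */
    printf("%s\n", find_global(nss, 1, "Foo::Bar", "baz"));
    return 0;
}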

Now, that said, this really just argues that most languages actually 
use 2-level namespaces in their syntax--that "Foo::Bar::Baz" doesn't 
really indicate nesting. But, we can certainly _allow_ namespace 
nesting--it just wouldn't have a one-line syntax. What I'm thinking of 
here is having ops like this:

-
# shortcut for lookup of "thingname" in global namespace
find_global P1, "thingname"
-
# lookup "thingname" in namespace "MyPackage::Foo"; really, this is: 
find namespace "MyPackage::Foo" inside the global namespace, and lookup 
"thingname" in that, so it's still a shortcut

find_global P1, "MyPackage::Foo", "thingname"
-
# here's an alternate, more explicit way to do the same thing. This 
might be slower for a

Re: JVM as a threading example (threads proposal)

2004-01-14 Thread Jeff Clites
On Jan 12, 2004, at 10:03 AM, Gordon Henriksen wrote:

On Monday, January 12, 2004, at 04:29 , Jeff Clites wrote:

5) Java seems to use a check-in/check-out model for access to global  
data, in which global data "lives" in a central store, but is copied  
back-and-forth to thread-local storage for modification. I don't 
fully  understand the performance benefit which I assume this 
provides, but  this is something else we need to understand. It's 
discussed in detail  in the threads section of the JVM spec,  
<http://java.sun.com/docs/books/vmspec/2nd-edition/html/Threads.doc.html>.
The passage in question being:

Every thread has a working memory in which it keeps its own working 
copy of variables that it must use or assign. As the thread executes a 
program, it operates on these working copies. The main memory contains 
the master copy of every variable. There are rules about when a thread 
is permitted or required to transfer the contents of its working copy 
of a variable into the master copy or vice versa.
There's that passage, but actually the following sections go into depth 
about the semantic constraints on how this is supposed to work.

This is very obliquely phrased, but it refers to the register file and 
stack frame. Or, since the Java bytecode is stack-based, it refers to 
the stack.
You might be right, but that's not exactly how I read it, because later 
it says, "A use action (by a thread) transfers the contents of the 
thread's working copy of a variable to the thread's execution engine. 
This action is performed whenever a thread executes a virtual machine 
instruction that uses the value of a variable."

I assume that the stack would correspond to the "execution engine" 
here, rather than to the "thread's working copy", but I could read it 
either way.

In any case, I thought this sounded like an interesting model, but 
based on the rules in that section about how accesses to data in the 
thread-local store have to correspond to accesses of the main store, it 
wasn't apparent to me how there was an advantage over just accessing 
things directly out of the main store. But I assume I'm overlooking 
some subtlety, and that there may be an idea here that we could use to 
our advantage.
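
For what it's worth, my reading of the model boils down to something like 
this sketch (purely illustrative, invented names; the real transfer rules 
in the spec are considerably more subtle):

/* Each thread keeps a working copy of a variable and transfers it
 * to/from the master copy at defined points; in between, the thread's
 * "execution engine" uses and assigns only the working copy.  The mutex
 * is only here to keep the sketch well-defined in C; the spec's rules
 * are not literally phrased in terms of locks. */
#include <pthread.h>

static int             main_copy;                      /* master copy */
static pthread_mutex_t main_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct ThreadMemory { int working_copy; } ThreadMemory;

static void check_out(ThreadMemory *tm)      /* main memory -> working copy */
{
    pthread_mutex_lock(&main_lock);
    tm->working_copy = main_copy;
    pthread_mutex_unlock(&main_lock);
}

static void check_in(const ThreadMemory *tm) /* working copy -> main memory */
{
    pthread_mutex_lock(&main_lock);
    main_copy = tm->working_copy;
    pthread_mutex_unlock(&main_lock);
}

The question above is essentially why that round trip is better than 
reading and writing main_copy directly under the same rules.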

JEff


