library assumptions

2002-04-08 Thread Roman Hunt

Hello all:
  I was just beginning work on the string API and was wondering what
  libraries are allowed for use inside the interpreter.  Mainly,
  I want to know if I can use stdarg.h.


--Roman




Re: library assumptions

2002-04-08 Thread Melvin Smith

At 06:32 PM 4/7/2002 -0400, Roman Hunt wrote:
Hello all:
   I was just beginning work on the string API and was wondering what
   libraries are allowed for use inside the interpreter.  Mainly,
   I want to know if I can use stdarg.h.

I would expect that to be fine; stdarg.h is one of the four headers
guaranteed by ANSI C89 even in a freestanding environment
(read: embedded targets, etc.).

It's integral to C, and if a platform doesn't have it, I suppose the question
would be why we are porting to that platform at all.

-Melvin





Re: library assumptions

2002-04-08 Thread Dan Sugalski

At 6:32 PM -0400 4/7/02, Roman Hunt wrote:
Hello all:
   I was just beginning work on the string API and was wondering what
   libraries are allowed for use inside the interpreter.  Mainly,
   I want to know if I can use stdarg.h.

As Melvin said, that's fine. Pretty much everything else needs a
Configure test, of course. :(
-- 
 Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



Re: library assumptions

2002-04-08 Thread Russ Allbery

Melvin Smith [EMAIL PROTECTED] writes:

 I would expect that to be fine; stdarg.h is one of the four headers
 guaranteed by ANSI C89 even in a freestanding environment (read:
 embedded targets, etc.).

 It's integral to C, and if a platform doesn't have it, I suppose the
 question would be why we are porting to that platform at all.

Basically, whether you can use stdarg.h is directly tied to whether you
want to support K&R compilers.  If you want to support K&R, you have to
allow for the possibility of varargs.h instead.  If you are willing to
require an ANSI C compiler (which I believe was the decision already
made), stdarg.h is safe.
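
For reference, a minimal sketch of the stdarg.h idiom the string code would
rely on; the function name and signature here are made up for illustration,
not Parrot's actual API (and vsnprintf itself is C99, so a strict C89 build
would fall back to vsprintf):

#include <stdarg.h>
#include <stdio.h>

/* Illustrative only: a printf-style wrapper built on stdarg.h. */
int example_nprintf(char *buf, size_t len, const char *fmt, ...)
{
    va_list args;
    int written;

    va_start(args, fmt);              /* begin walking the ... arguments */
    written = vsnprintf(buf, len, fmt, args);
    va_end(args);                     /* every va_start needs a va_end */

    return written;
}

Under a K&R compiler the same function would have to be written against
varargs.h, with the old va_alist/va_dcl declaration style instead.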

-- 
Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/



string api

2002-04-08 Thread Roman Hunt

Hello:
I am interested in contributing to the project.  (Thank Dan's
cross-country tour :)  This is my first project of this size
and importance, but I feel up to the task.  (Read: please be
patient with the newbie.)  I have begun work on
string_nprintf(), since strings.pod says it is still
unimplemented.  I have run into a few questions, though: I can't
find the definition of string_vtable; it is not in
string.h as the pod states.  Also, what is the standard
way to map a C string into a STRING?  Would one just call
string_make, passing a pointer to the char buffer along with
the correct encoding, and the string's length in len?



Re: string api

2002-04-08 Thread Melvin Smith

At 05:49 PM 4/8/2002 -0400, Roman Hunt wrote:
hello:
 and importance, but I feel up to the task.  (Read: please be
 patient with the newbie.)  I have begun work on

The more the merrier; it's been too quiet this last week.

 find the definition of string_vtable; it is not in

Try classes/perlstring.pmc

Keep in mind there is the primitive STRING type which is the S* registers,
and then there is the PMC (PerlString) which uses vtables.

 string.h as the pod states.  Also, what is the standard
 way to map a C string into a STRING?  Would one just call
 string_make, passing a pointer to the char buffer along with
 the correct encoding, and the string's length in len?

Looks correct, except make sure you watch where you stash
the STRING while you are working on it.

If you make calls to subroutines that may trigger a GC_collect()
then the STRING you had might be moved or collected.

For now the safest approach is the 'immortal' bit, or stashing the STRING
in a register so the root set can see it.

The standard way to deal with this (as far as Parrot goes) is
still the topic of debate.

However, if all you are doing is allocating the STRING and then doing a lot of
known ops and/or system calls that don't call back into the GC,
there is nothing to worry about.
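
As a rough sketch of the above (the exact string_make() arguments, the
encoding handle, and the flag handling are assumptions for illustration,
not necessarily what's in the tree):

#include "parrot/parrot.h"   /* assumed umbrella header */
#include <string.h>

/* Hypothetical: wrap a NUL-terminated C string in a STRING header. */
STRING *
cstring_to_string(struct Parrot_Interp *interpreter, const char *cstr)
{
    STRING *s = string_make(interpreter, (void *)cstr, strlen(cstr),
                            NULL /* default encoding */, 0 /* flags */, 0);

    /* Before calling anything that might allocate (and so trigger a
     * collection), either stash s in an S register or set the
     * 'immortal' bit so the collector leaves it alone. */
    return s;
}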

Hope this helps,

-Melvin




Re: string api

2002-04-08 Thread Steve Fink

On Mon, Apr 08, 2002 at 07:01:44PM -0400, Melvin Smith wrote:
 At 05:49 PM 4/8/2002 -0400, Roman Hunt wrote:
 find the definition of string_vtable; it is not in
 
 Try classes/perlstring.pmc
 
 Keep in mind there is the primitive STRING type which is the S* registers,
 and then there is the PMC (PerlString) which uses vtables.

The primitive STRING also uses a vtable (for the different encodings).
That's in include/parrot/encoding.h.

 string.h as the pod states.  Also, what is the standard
 way to map a C string into a STRING?  Would one just call
 string_make, passing a pointer to the char buffer along with
 the correct encoding, and the string's length in len?
 
 Looks correct, except make sure you watch where you stash
 the STRING while you are working on it.
 
 If you make calls to subroutines that may trigger a GC_collect()
 then the STRING you had might be moved or collected.
 
 For now the safest is the 'immortal' bit or stashing the STRING in a
 register so the root set can see it.

And if the C string belongs to someone else, you may need to set the
BUFFER_external_FLAG flag.
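
Roughly like this, that is (the 'flags' field spelling is a guess on my
part, but that's the idea):

/* If the char buffer is owned elsewhere, tell the GC not to copy or
 * reclaim the memory the STRING points at. */
s->flags |= BUFFER_external_FLAG;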

 However, if all you are doing is allocating the STRING then doing a lot of
 known ops and/or system calls that don't thread into the GC,
 there is nothing to bother with.

This message does remind me of how empty the TODO list is. Surely we
can think of many more things to be done?



Re: string api

2002-04-08 Thread Melvin Smith

At 06:10 PM 4/8/2002 -0700, Steve Fink wrote:
On Mon, Apr 08, 2002 at 07:01:44PM -0400, Melvin Smith wrote:
  At 05:49 PM 4/8/2002 -0400, Roman Hunt wrote:
  find the definition of string_vtable; it is not in
 
  Try classes/perlstring.pmc
 
  Keep in mind there is the primitive STRING type which is the S* registers,
  and then there is the PMC (PerlString) which uses vtables.

The primitive STRING also uses a vtable (for the different encodings).
That's in include/parrot/encoding.h.

Don't mind me, I'm 75% fact and 25%... well.

  If you make calls to subroutines that may trigger a GC_collect()
  then the STRING you had might be moved or collected.
 
  For now the safest is the 'immortal' bit or stashing the STRING in a
  register so the root set can see it.

And if the C string belongs to someone else, you may need to set the
BUFFER_external_FLAG flag.

  However, if all you are doing is allocating the STRING then doing a lot of
  known ops and/or system calls that don't thread into the GC,
  there is nothing to bother with.

This message does remind me of how empty the TODO list is. Surely we
can think of many more things to be done?

Speaking of..

1) Bugfix release, please; we banged quite a few stack and GC bugs out.
 Don't we get any dessert?

2) I'm thinking of an internal stack not visible to user code that we use
 for temporary PMCs and Buffers, and a simple macro for entry and
 exit of GC-sensitive routines. I think I might have mentioned this.


 p = gcsaveframe();

 yada
 yada
 yada

 gcrestoreframe(p);

This scribble-pad stack is part of the root set, so I think it's
self-explanatory.

Even if messy code scribbles too much on the stack, as long as the outer
scopes restore the stack frame, it'll be kept in check.

So..

foo_alloc() {
 marker = gcsaveframe();
 bar_alloc();
 gcrestoreframe(marker);
 /* All cleaned up */
}

/* bar_alloc might be messy and return without restoring. */
bar_alloc() {
 mymarker = gcsaveframe();
 yada();
 return;
}

There isn't anything really innovative here; it's the same way we handle normal
stacks, it's just implicit because the pushes are hidden in the PMC and
buffer allocators.
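
To make that concrete, here's a rough sketch of what the scratch-pad stack
could look like; none of this exists yet, and the names just follow the
gcsaveframe()/gcrestoreframe() example above:

#include <stddef.h>

#define GC_PAD_SIZE 4096

typedef struct GCPad {
    void  *entries[GC_PAD_SIZE];  /* PMC / Buffer pointers, scanned as roots */
    size_t top;
} GCPad;

static GCPad gc_pad;   /* per-interpreter (or in TLS) in a real version */

/* Remember the current depth so an outer scope can clean up after inner ones. */
size_t gcsaveframe(void) { return gc_pad.top; }

/* The PMC/buffer allocators would push each new header here, hidden from
 * the caller.  (Bounds checking omitted for brevity.) */
void gcpad_push(void *obj) { gc_pad.entries[gc_pad.top++] = obj; }

/* Drop everything pushed since the matching gcsaveframe(); those objects
 * become collectable again unless they are now reachable from the root set. */
void gcrestoreframe(size_t marker) { gc_pad.top = marker; }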

I'm not a GC design guru, but the limited reading I've done on the JVM hints
that it does something similar.

Then again, I haven't thought about how this works with threads; I suppose
the stack would have to live in TLS.

I'd like something like this rather than hoping all developers can
systematically set bits or handle references correctly, because in reality
we'd probably never catch all the cases.

-Melvin





[netlabs #500] disassemble fails with errors and garbage

2002-04-08 Thread Clinton A. Pierce

# New Ticket Created by  Clinton A. Pierce 
# Please include the string:  [netlabs #500]
# in the subject line of all future correspondence about this issue. 
# URL: http://bugs6.perl.org/rt2/Ticket/Display.html?id=500 


Compiling BASIC into out.pbc:

C:\projects\parrot\parrot>basic.pl  [produces out.pbc]
Including stackops.pasm
Including alpha.pasm
Including dumpstack.pasm
Including tokenize.pasm
Including basicvar.pasm
Including basic.pasm
Including instructions.pasm
Including expr.pasm
   4026 lines

Ready
QUIT
C:\projects\parrot\parrot>disassemble.pl out.pbc
Use of uninitialized value in modulus (%) at lib/Parrot/Types.pm line 82, <GEN0> line 12.
Use of uninitialized value in addition (+) at lib/Parrot/Types.pm line 83, <GEN0> line 12.
Use of uninitialized value in substr at lib/Parrot/Types.pm line 85, <GEN0> line 12.
PackFile::ConstTable: Internal error: Unpacked Constant returned bad byte count '52'! at lib/Parrot/PackFile/ConstTable.pm line 73
 Parrot::PackFile::ConstTable::unpack('Parrot::PackFile::ConstTable=HASH(0x1d48340)', 'm^@^^s^@^^@^T^^@^^@^^@^^@^^@^^@^^@^^A^@^^@#^^@^s^@^^@^T^^@^^@^^@^^@^^@^^@^^@^^A^@^^@-^^@^s^@^^@...') called at lib/Parrot/PackFile.pm line 149
 Parrot::PackFile::unpack('Parrot::PackFile=HASH(0x1d48358)', 'M-!U1^A^@^^@^4^O^@^m^@^^@s^@^^@^T^^@^^@^^@^^@^^@^^@^^@^...') called at lib/Parrot/PackFile.pm line 206
 Parrot::PackFile::unpack_filehandle('Parrot::PackFile=HASH(0x1d48358)', 'FileHandle=GLOB(0x1d483e8)') called at lib/Parrot/PackFile.pm line 222
 Parrot::PackFile::unpack_file('Parrot::PackFile=HASH(0x1d48358)', 'out.pbc') called at C:\projects\parrot\parrot\disassemble.pl line 248
 main::disassemble_file('out.pbc') called at C:\projects\parrot\parrot\disassemble.pl line 276




Re: string api

2002-04-08 Thread Michel J Lambert

 This message does remind me of how empty the TODO list is. Surely we
 can think of many more things to be done?

 Speaking of..

 1) Bugfix release please, we banged quite a few stack and GC bugs out.
  Don't we get any dessert?

Peter has already stated he'd like his parrot_reallocate_buffer patch to
go in first, as it does fix a reproducible bug with Clint's evaluator. And
there are still a bunch of GC bugs. I know of three types:

1) The problem that's been brought up before (and below), of CPU-stack
vars not being traceable.
2) The GC initialization stuff could potentially trigger a GC (woops!). My
fix was to disable the GC during interpreter initialization, but I'm not
sure what we want to do for this.
3) Non-string buffers (i.e., stuff in last_Buffer_Arena, not
last_String_arena) are pretty broken, I think. They aren't marked,
unmarked, freed, or copied. I could be mistaken on this one, as I don't
have any test cases that break it yet, just my understanding of the code.
This should be a lot easier to fix, with a little copy and paste, but I'd
like to get a valid test case before I attempt to fix it.

 2) I'm thinking of an internal stack not visible to user code that we use
  for temporary PMCs and Buffers and a simple macro for entry and
  exit of GC sensitive routines. I think I might have mentioned this.

What defines a GC-sensitive routine? Is anything that does string
manipulation, PMC manipulation, or any allocation marked GC-sensitive?

 I'd like something like this rather than hoping all developers can
 systematically set bits or handle references correctly because in
 reality we'd probably never catch all the cases.

Two things:

First, we now have a GC_DEBUG define that we can turn on to find all the
places where the GC could cause problems. In its current state, I think it
covers 90% of the problems (one problem is that if I call string_make only
conditionally, it isn't guaranteed to trigger a GC run in a test case the
way it should).

However, if we can't find all the places we do buffer manipulation to mark
them immortal, how are we going to properly identify all the GC-sensitive
functions?

Secondly, setting a flag should be much quicker, speed-wise. We'd only be
setting flags on buffer headers that are already in the CPU cache, as
opposed to writing to memory in this stack, pushing and popping all the
time. And if we macro-ize the setting of the flags, I don't think it
should look nearly as bad. GC_MARK(Buffer), GC_UNMARK(Buffer), etc.
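
For instance (the flag name and the flags field are assumptions, just to
show the shape of it):

#define GC_MARK(b)    ((b)->flags |=  BUFFER_immortal_FLAG)
#define GC_UNMARK(b)  ((b)->flags &= ~BUFFER_immortal_FLAG)

extern void something_that_may_collect(void);

void do_something(Buffer *b)
{
    GC_MARK(b);                    /* b survives any collection below */
    something_that_may_collect();
    GC_UNMARK(b);                  /* back to normal collection rules */
}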


I know there's been little activity in the past week... as far as my
activity goes, I'm waiting for Dan to come back tomorrow and tell us
minions what the plan is for the GC stuff. Peter and I fixed most of the GC
bugs that are easily fixed, but the rest require a more architectural fix,
something I think we are all deferring to Dan on.

Mike Lambert




Re: string api

2002-04-08 Thread Steve Fink

On Mon, Apr 08, 2002 at 11:40:28PM -0400, Michel J Lambert wrote:
 However, if we can't find all the places we do buffer manipulation to mark
 them immortal, how are we going to properly identify all the GC-sensitive
 functions?

Ack! Sorry for being anal, but I finally decided the 'immortal' name
just bugged me too much, and renamed it to 'immune'. :-)




Re: string api

2002-04-08 Thread Melvin Smith

At 11:40 PM 4/8/2002 -0400, Michel J Lambert wrote:
  2) I'm thinking of an internal stack not visible to user code that we use
   for temporary PMCs and Buffers and a simple macro for entry and
   exit of GC sensitive routines. I think I might have mentioned this.

What defines a GC-sensitive routine? Anything that does string manip, pmc
manip, or any allocations, is marked GC-sensitive?

OK, that's a really general phrase I used. :)

I agree we need an overall architectural solution. Setting and clearing
bits manually is error-prone but fast, as you said. It's identical to
the malloc()/free() situation, which is one of the primary reasons we
use garbage collection in the first place, so why reinvent the same
situation with different syntax?

malloc/free is vulnerable to:
1) leakage (forgot to free)
2) double deallocation (freed an already freed buffer)

So is setting/clearing GC bits.

I was thinking of a solution that didn't require tracking every single
allocation. Keep in mind I'm just tossing about an alternate point of
view for the sake of discussion.

I suppose a variation of the scratch pad, more in line with the
performance you are thinking of, could be similar to the
scope tracking that compilers do when gathering symbols into
symbol tables.

Keep track of a global (or interpreter-local) scope with a macro
upon entry.

#define GC_NEWPAD() cur_interp->scope++
#define GC_CLEARPAD(s)  cur_interp->scope = s

So a GC-able buffer gets created with an initial scope of cur_interp->scope,
hidden in the allocator, and the collector skips collecting any
buffer with scope <= the current scope.
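
A sketch of how that might look (cur_interp, the scope counter, and the
gc_scope field on the buffer header are all hypothetical, following the
macros above):

/* Allocation: tag the new buffer with the scope in effect right now. */
Buffer *pad_allocate_buffer(size_t size)
{
    Buffer *b = allocate_buffer_header(size);  /* stand-in for the real allocator */
    b->gc_scope = cur_interp->scope;
    return b;
}

/* Collection: skip any buffer whose recorded scope is at or below the
 * current scope, i.e. it was allocated under a pad that is still open. */
int buffer_is_pad_protected(const Buffer *b)
{
    return b->gc_scope <= cur_interp->scope;
}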

Two things:

First, we now have a GC_DEBUG define that we can turn on to find all
places the GC could cause problems. In the current state, I think it
covers 90% of the problems (one problem is that if I conditionally
call string_make, this function isn't guaranteed of triggering GC in
a test case, like it should.)

However, if we can't find all the places we do buffer manipulation to mark
them immortal, how are we going to properly identify all the GC-sensitive
functions?

Secondly, setting a flag should be much quicker, speed-wise. We'd only be
setting flags on buffer headers that are already in the CPU cache, as
opposed to writing to memory in this stack, pushing and popping all the
time. And if we macro-ize the setting of the flags, I don't think it
should look nearly as bad. GC_MARK(Buffer), GC_UNMARK(Buffer), etc.

Fair enough on the speed point; however, you have to remember, for
every object you are handling, to (1) mark it and (2) unmark it after
attaching it to the root set.

On the other hand, what if we could say

{
 orig = GC_NEWPAD();

 x();
 y();
 z();

 GC_CLEARPAD(orig);
}

and know that anything allocated between the newpad and the clearpad would
be safe, and be free to write normal code even with the GC churning.

And there is no stack churn.

I know there's been little activity in the past week...as far as my
activity, I'm waiting for the Dan to come back tomorrow, and tell us
minions what the plan is for GC stuff. Peter and I fixed most of the GC
bugs that are easily fixed, but the rest require a more architectural fix,
something I think we all are deferring to Dan on.

Agreed. However, more discussion around here is a good thing. :)

-Melvin





Worst-case GC Behavior?

2002-04-08 Thread Michel J Lambert

I think I know of two potential performance problems with the GC code.
They could be problems in my head, or real problems, as I haven't done any
profiling. We also don't have any real test cases. :)



The first example is the following code, which calls parrot_allocate to
create the string each time.

for(1..1) {
  push a, "a";
}

If we start out with no room, it calls Parrot_go_collect for each push,
but the go_collect does nothing because there's no free memory. This then
requires another allocation, fitted exactly to the size of the block: one
character.

Repeat.

Since we're never freeing any memory, it continually allocates a block
of size 56 (memory pool) + 1 (character) from the underlying system API.



The second example involves headers. Say we have the following code:
loop:
new P0, PerlString
branch loop

Which might correspond to:

while(1) {
  my $dummy;
}

Each time through the loop, it has to allocate a PMC header. When we
allocate the header, we store it into P0, and the old header is
essentially freed.

The next time through the loop, entries_in_pool is 0, and it triggers
alloc_more_string_Headers, and a dod run. This finds the PMC we just
freed, and uses it. Repeat. Each time through the loop, it triggers a dod
run.

The above example might be a bit contrived, since the compiler could pull
'my $dummy' outside of the loop, assuming it isn't tied. (If it is
tied, we need to do it each time through, since it could be counting the
number of times we set the variable, or some such.)



Now, I know that any memory management system can have cases which cause
worst-case behavior. I'm not sure if the cases I presented are that kind
of case, or whether they are common enough that we need to worry about
them.

The first problem can probably be solved by enforcing a minimum block
allocation size. I'm not sure of a good solution to the second problem,
however.
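
The first fix could be as simple as a floor on block sizes (the constant
and helper name here are invented for illustration):

#include <stddef.h>

#define MINIMUM_BLOCK_SIZE 8192   /* arbitrary illustrative floor */

/* Never ask the system for less than the floor, so a one-character
 * string doesn't cost a full trip to the underlying allocator for
 * every single push. */
size_t block_alloc_size(size_t requested)
{
    return requested < MINIMUM_BLOCK_SIZE ? MINIMUM_BLOCK_SIZE : requested;
}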

If we do have a minimum block allocation size, it will perform horribly
memory-wise on something like:

while (1) {
  push a, "a";
  push a, "aa" x 200;
}

This example destroys any implementation that has a minimum block
allocation size. This could be alleviated with a linked list of blocks
that have available memory at the tail of the block. This could give very
bad performance whenever we have lots of half-filled blocks. (Say, when we
continually allocate blocks of size 0.51*minimum_block_allocation_size.)
I recall Dan saying he didn't like traversing linked lists when searching
for memory, but it shouldn't be that bad since it all gets cleaned up on a
call to parrot_go_collect.
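
A rough sketch of that block list (names invented; 'used' tracks how much
of each block's tail is still free):

#include <stddef.h>

typedef struct MemBlock {
    struct MemBlock *next;
    size_t           size;     /* usable bytes in data[] */
    size_t           used;     /* bytes handed out so far */
    char             data[1];  /* actual storage follows the header */
} MemBlock;

/* First-fit walk over partially filled blocks; this is the linked-list
 * traversal Dan dislikes.  Returns NULL if no block has room, in which
 * case the caller allocates a fresh block. */
void *block_list_alloc(MemBlock *head, size_t need)
{
    MemBlock *b;
    for (b = head; b != NULL; b = b->next) {
        if (b->size - b->used >= need) {
            void *p = b->data + b->used;
            b->used += need;
            return p;
        }
    }
    return NULL;
}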

Finally, another approach is to randomize things. Lots of algorithms
randomize their behavior so that no fixed input can reliably trigger
worst-case behavior. Of course, I'm not sure if a non-deterministic
interpreter is a good thing, since it'll just make GC bugs that much more
annoying to track down.

These might all be things that were considered, and discarded as not
important enough. But I didn't see these potential problems mentioned
anywhere, so I figured I'd bring them up here, just in case.

Mike Lambert




macros (was Re: string api)

2002-04-08 Thread Robert Spier


 Keep track of global (or interpreter local) scope with a macro
 upon entry.

I shudder every time someone says "macro" on p6i.

perl5 has several thousand macros defined (grep for ^#define): over
8000 if you include all the embedding macros, down to ~4000 if you
cut out embedding and config, and closer to ~1500-2000 if you rip out
more things.

This makes it wonderfully challenging to debug.

Macros are a useful feature of the C language, but we should be very 
careful in how we use them.  (I'm not saying don't use them at all.) 
I'm sure there's a happy medium somewhere between no macros and perl5. 
We should look for it.

-R




Re: macros (was Re: string api)

2002-04-08 Thread Melvin Smith

At 10:30 PM 4/8/2002 -0700, Robert Spier wrote:

Keep track of global (or interpreter local) scope with a macro
upon entry.

I shudder every time someone says macro on p6i.

perl5 has several thousand macros defined.  (grep for ^#define) (over 8000 
if you include all the embedding macros.  it's down to ~4000 if you cut 
out embedding, config.. and closer to ~1500/2000 if you rip out more things.)

Are you counting literals and things like bit values in your grep?

This makes it wonderfully challenging to debug.

That might be a bit unfair; I'd argue that it makes things _easier_
to debug in many cases, particularly with constants.

Macros are a useful feature of the C language, but we should be very 
careful in how we use them.  (I'm not saying don't use them at all.) I'm 
sure there's a happy medium somewhere between no macros and perl5. We 
should look for it.

'macro' here is a choice of words... call it an inline function if you want.

I'd be more worried about debugging that computed goto core than a macro. :)

-Melvin




Re: Worst-case GC Behavior?

2002-04-08 Thread Melvin Smith

At 01:17 AM 4/9/2002 -0400, Michel J Lambert wrote:
The first example is the following code, which calls parrot_allocate to
create the string each time.

Might both of these be solved by using arenas?

-Melvin




Re: string api

2002-04-08 Thread Michel J Lambert

 I agree we need an overall architectural solution. Setting and clearing
 bits manually is error-prone but fast, as you said. Its identical to
 the malloc()/free() situation which is one of the primary reasons we
 use garbage collection in the first place, so why reinvent the same
 situation with different syntax?

Generally, malloc/free are used in more complex situations than just
stack-based memory management. But I see your point.

 malloc/free is vulnerable to:
 1) leakage (forgot to free)

If you remember to mark it as used, you're pretty much guaranteed to mark
it as unused at the end of the same function.

 2) double deallocation (freed an already freed buffer)

In general, this can't happen with setting bits. We can't unset a bit
twice, since we're only doing this on stuff returned by a function, for
the duration of the function that got it. If we return it again,
they'll set it, and free it on their own. Agreed, it's confusing, and thus
the reason for this whole discussion. :)

 I suppose a variation of the scratch-pad that might be more on the
 performance line that you are thinking could be similar to the
 scope tracking that compilers do when gathering symbols into
 symbol tables.

Ahhh, that's what I missed. I was assuming that you'd have to push
variables onto this stack either in buffer_allocate or in the place that's
allocating them, and pop them all off with an end_GC_function, which I
considered to be 'just as much work'.

 So a GC-able buffer gets created with an initial scope of cur_interp->scope,
 hidden in the allocator, and the collector skips collecting any
 buffer with scope <= the current scope.

I'm a bit confused. Say I have function A call B and then C. Function C
will have the same scope as B did. If C triggers a GC run, then anything
allocated in B will have the same scope as C. How will the GC system
know that it can mark those as dead? Granted, your system is safe, but it
seems a little *too* safe.

 And there is no stack churn.

I like that part, tho. :)

Mike Lambert





Re: string api

2002-04-08 Thread Melvin Smith

At 01:48 AM 4/9/2002 -0400, Michel J Lambert wrote:
  the malloc()/free() situation which is one of the primary reasons we
  use garbage collection in the first place, so why reinvent the same
  situation with different syntax?

Generally, malloc/free are used in more complex situations than just
stack-based memory management. But I see your point.

  malloc/free is vulnerable to:
  1) leakage (forgot to free)

If you remember to mark it as used, you're pretty much guaranteed to mark
it as unused at the end of the same function.

As long as you leave the function from one place. There is no way, using
this method, to say, "Whatever you do inside X(), I'm going to clean it up,
even if you jump out of X() with longjmp()."

In general, this can't happen with setting bits. We can't unset a bit
twice, since we're only doing this on stuff returned by a function, for

I may be using a hammer where a nail file would do here.

I was thinking of the dangling-buffer situation, where the bit gets cleared
after the original has been moved out from under it, but I think
it may be getting too late for me to continue this thing called thinking... :)

You are probably right: if it's a bug, it's not the clearing of the bit, it's
keeping the invalid reference around where it isn't reachable from the root set.

  I suppose a variation of the scratch-pad that might be more on the
  performance line that you are thinking could be similar to the
  scope tracking that compilers do when gathering symbols into
  symbol tables.

Ahhh, that's what I missed. I was assuming that you'd either have to push
variables on to this stack in buffer_allocate, or in the place that's
allocating them, and pop them all off with a end_GC_function. Which I
considered to be 'just as much work'.

You interpreted correctly; my first mail had that in mind, but then I saw
your point about the churn killing the cache benefit.

  So a GC-able buffer gets created with an initial scope of cur_interp->scope,
  hidden in the allocator, and the collector skips collecting any
  buffer with scope <= the current scope.

I'm a bit confused. Say I have function A call B and C. Function C will
have the same scope as B will. If C triggers a GC run, then anything
allocated in B will have the same scope as C. How will the GC system
know that it can mark those as dead? Granted your system is safe, but it
seems a little *too* safe.

After return from C, it will be collected.

Granted, this is controlled usage for internals programmers.

I wouldn't expect Parrot to do:

int main(void) {
 int save = GC_NEWPAD();

 runops();

 GC_RSTPAD(save);
 return 0;
}


I'm targeting the situation where A() creates an aggregate and calls
B() and C(), which might be recursive, etc. A() sets a marker, calls
B()/C(), and pops the marker before returning.

It's basically setting up a GC 'transaction', to use a SQL analogy, but
I don't expect the internals guys to set a single large transaction over the
whole interpreter.

A caveat: scope 0 might need to be exempt from this rule, else everything in
the base scope would live forever. Easy enough to handle.
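
In code, the pattern I have in mind is just:

/* Sketch: GC_NEWPAD/GC_CLEARPAD as above; B() and C() are whatever
 * helpers build the aggregate and may allocate or recurse freely. */
void A(void)
{
    int save = GC_NEWPAD();   /* open the GC 'transaction' */

    B();
    C();

    GC_CLEARPAD(save);        /* temporaries from B()/C() are collectable
                                 again, unless the aggregate now holds them */
}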

-Melvin