library assumptions
Hello all: I was just beginning work on the string API and was wondering which libraries are allowed for use inside the interpreter. Mainly I want to know if I can use stdarg.h. --Roman
Re: library assumptions
At 06:32 PM 4/7/2002 -0400, Roman Hunt wrote:
> Hello all: I was just beginning work on the string API and was wondering which libraries are allowed for use inside the interpreter. Mainly I want to know if I can use stdarg.h

I would expect that to be fine. stdarg.h is one of the four headers that ANSI C89 guarantees even in a freestanding environment (read: embedded targets, etc.). It's integral to C, and if a platform doesn't have it, I suppose the question would be why we should port to that platform.

-Melvin
Re: library assumptions
At 6:32 PM -0400 4/7/02, Roman Hunt wrote:
> Hello all: I was just beginning work on the string API and was wondering which libraries are allowed for use inside the interpreter. Mainly I want to know if I can use stdarg.h

As Melvin said, that's fine. Pretty much everything else needs a Configure test, of course. :(

--
Dan

--it's like this---
Dan Sugalski [EMAIL PROTECTED]
even samurai have teddy bears and even teddy bears get drunk
Re: library assumptions
Melvin Smith [EMAIL PROTECTED] writes:
> I would expect that should be fine, stdarg is one of the 4 headers that are guaranteed by ANSI C89 even on a free standing environment (read embedded targets, etc.) It's integral to C, and if you don't have it, I suppose the question would be why we should port to it.

Basically, whether you can use stdarg.h is directly tied to whether you want to support K&R compilers. If you want to support K&R, you have to allow for the possibility of varargs.h instead. If you are willing to require an ANSI C compiler (which I believe was the decision already made), stdarg.h is safe.

--
Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/
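For readers who have not used stdarg.h, here is a minimal sketch of the kind of variadic helper a printf-style string function relies on. The name total_length and its calling convention are assumptions made up for illustration; nothing here is Parrot's actual string API. It uses only C89 facilities, which is the point of the thread above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Parrot): returns the combined length
 * of `count` NUL-terminated C strings passed as variadic arguments. */
size_t total_length(int count, ...)
{
    va_list args;
    size_t total = 0;
    int i;

    assert(count >= 0);
    va_start(args, count);      /* start reading after the last named parameter */
    for (i = 0; i < count; i++) {
        const char *s = va_arg(args, const char *);
        total += strlen(s);
    }
    va_end(args);               /* every va_start needs a matching va_end */
    return total;
}
```

A call such as total_length(3, "foo", "", "quux") walks the three arguments in order. The caller has to pass the count explicitly because stdarg.h gives the callee no way to discover how many arguments were supplied.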
string api
Hello: I am interested in contributing to the project. (Thank Dan's cross-country tour :) This is my first project of this size and importance, but I feel up to the task. (Read: please be patient with the newbie.) I have begun work on string_nprintf(), as strings.pod says that it is still unimplemented. I have run into a few questions, though. I can't find the definition of the string_vtable; it is not in string.h as the pod states. Also, what would be the standard way to map a C string into a STRING? Would one just call string_make, passing a pointer to the char buffer, the correct encoding, and the string's length as len?
Re: string api
At 05:49 PM 4/8/2002 -0400, Roman Hunt wrote:
> hello:
> and importance, but I feel up to the task. (Read: Please, be patient with the newbie). I have begun work on

The more the merrier, it's been too quiet this last week.

> find the definition for the string_vtable it is not in

Try classes/perlstring.pmc

Keep in mind there is the primitive STRING type, which is the S* registers, and then there is the PMC (PerlString), which uses vtables.

> string.h as the pod states. Also, what would be the standard way to map a C string into a STRING would one just call string_make passing a pointer to the char buffer with the correct encoding passed in, and the strings length into len?

Looks correct, except make sure you watch where you stash the STRING while you are working on it. If you make calls to subroutines that may trigger a GC_collect(), then the STRING you had might be moved or collected. For now the safest is the 'immortal' bit, or stashing the STRING in a register so the root set can see it. The standard way to deal with this (as far as Parrot goes) is still the topic of debate.

However, if all you are doing is allocating the STRING and then doing a lot of known ops and/or system calls that don't thread into the GC, there is nothing to bother with.

Hope this helps,

-Melvin
Re: string api
On Mon, Apr 08, 2002 at 07:01:44PM -0400, Melvin Smith wrote:
> At 05:49 PM 4/8/2002 -0400, Roman Hunt wrote:
> > find the definition for the string_vtable it is not in
>
> Try classes/perlstring.pmc Keep in mind there is the primitive STRING type which is the S* registers, and then there is the PMC (PerlString) which uses vtables.

The primitive STRING also uses a vtable (for the different encodings). That's in include/parrot/encoding.h.

> > string.h as the pod states. Also, what would be the standard way to map a C string into a STRING would one just call string_make passing a pointer to the char buffer with the correct encoding passed in, and the strings length into len?
>
> Looks correct, except make sure you watch where you stash the STRING while you are working on it. If you make calls to subroutines that may trigger a GC_collect() then the STRING you had might be moved or collected. For now the safest is the 'immortal' bit or stashing the STRING in a register so the root set can see it.

And if the C string belongs to someone else, you may need to set the BUFFER_external_FLAG flag.

> However, if all you are doing is allocating the STRING then doing a lot of known ops and/or system calls that don't thread into the GC, there is nothing to bother with.

This message does remind me of how empty the TODO list is. Surely we can think of many more things to be done?
Re: string api
At 06:10 PM 4/8/2002 -0700, Steve Fink wrote:
> On Mon, Apr 08, 2002 at 07:01:44PM -0400, Melvin Smith wrote:
> > At 05:49 PM 4/8/2002 -0400, Roman Hunt wrote:
> > > find the definition for the string_vtable it is not in
> >
> > Try classes/perlstring.pmc Keep in mind there is the primitive STRING type which is the S* registers, and then there is the PMC (PerlString) which uses vtables.
>
> The primitive STRING also uses a vtable (for the different encodings). That's in include/parrot/encoding.h.

Don't mind me, I'm 75% fact and 25%..well

> > If you make calls to subroutines that may trigger a GC_collect() then the STRING you had might be moved or collected. For now the safest is the 'immortal' bit or stashing the STRING in a register so the root set can see it.
>
> And if the C string belongs to someone else, you may need to set the BUFFER_external_FLAG flag.
>
> > However, if all you are doing is allocating the STRING then doing a lot of known ops and/or system calls that don't thread into the GC, there is nothing to bother with.
>
> This message does remind me of how empty the TODO list is. Surely we can think of many more things to be done?

Speaking of..

1) Bugfix release please, we banged quite a few stack and GC bugs out. Don't we get any dessert?

2) I'm thinking of an internal stack, not visible to user code, that we use for temporary PMCs and Buffers, and a simple macro for entry and exit of GC-sensitive routines. I think I might have mentioned this.

    p = gcsaveframe();
    yada yada yada
    gcrestoreframe(p);

This scribble-pad stack is part of the root set, so I think it's self-explanatory. Even if messy code scribbles too much on the stack, as long as the outer scopes restore the stack frame, it'll be kept in check. So..

    foo_alloc() {
        marker = gcsaveframe();
        bar_alloc();
        gcrestoreframe(marker);  # All cleaned up
    }

    # bar_alloc might be messy and return without restoring.
    bar_alloc() {
        mymarker = gcsaveframe();
        yada();
        return;
    }

There isn't anything really innovative here; it's the same way we handle normal stacks, yet it's implicit because the pushes are hidden in the PMC and buffer allocators. I'm not a GC design guru, but the limited reading I've done on the JVM hints that they do something similar. Then again, I haven't thought about how this works with threads; I suppose the stack would have to exist in TLS.

I'd like something like this rather than hoping all developers can systematically set bits or handle references correctly, because in reality we'd probably never catch all the cases.

-Melvin
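The save/restore idea in the message above can be sketched in plain C. This is only an illustration under assumptions: the fixed-size array, the scratch_alloc helper, and the size_t marker type are invented here, and a real collector would have to scan the live slots as roots, which this toy does not do. The names gcsaveframe, gcrestoreframe, foo_alloc, and bar_alloc come from the message.

```c
#include <assert.h>
#include <stddef.h>

#define SCRATCH_MAX 256             /* arbitrary capacity for this sketch */

/* Hypothetical scratch stack. A real collector would treat every slot
 * below scratch_top as part of the root set. */
static void  *scratch[SCRATCH_MAX];
static size_t scratch_top = 0;

/* Remember the current depth so it can be restored later. */
size_t gcsaveframe(void)
{
    return scratch_top;
}

/* Drop everything pushed since the matching gcsaveframe(). */
void gcrestoreframe(size_t marker)
{
    assert(marker <= scratch_top);
    scratch_top = marker;
}

/* Stand-in for the push hidden inside the PMC and buffer allocators. */
void *scratch_alloc(void *obj)
{
    assert(scratch_top < SCRATCH_MAX);
    scratch[scratch_top++] = obj;
    return obj;
}

static int dummy[3];                /* placeholder "objects" */

/* Messy callee: pushes temporaries and returns without restoring. */
void bar_alloc(void)
{
    (void)gcsaveframe();
    scratch_alloc(&dummy[0]);
    scratch_alloc(&dummy[1]);
}

/* Well-behaved caller: its restore also cleans up after bar_alloc(). */
size_t foo_alloc(void)
{
    size_t marker = gcsaveframe();
    size_t depth_inside;

    scratch_alloc(&dummy[2]);
    bar_alloc();
    depth_inside = scratch_top - marker;  /* temporaries live at this point */
    gcrestoreframe(marker);               /* all cleaned up */
    return depth_inside;
}
```

The design point is that correctness depends only on the outer frame restoring its marker; an inner routine that forgets to restore leaks nothing past the caller's gcrestoreframe().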
[netlabs #500] disassemble fails with errors and garbage
# New Ticket Created by Clinton A. Pierce
# Please include the string: [netlabs #500]
# in the subject line of all future correspondence about this issue.
# URL: http://bugs6.perl.org/rt2/Ticket/Display.html?id=500

Compiling BASIC into out.pbc:

C:\projects\parrot\parrot>basic.pl
[produces out.pbc]
Including stackops.pasm
Including alpha.pasm
Including dumpstack.pasm
Including tokenize.pasm
Including basicvar.pasm
Including basic.pasm
Including instructions.pasm
Including expr.pasm
4026 lines

Ready
QUIT

C:\projects\parrot\parrot>disassemble.pl out.pbc
Use of uninitialized value in modulus (%) at lib/Parrot/Types.pm line 82, <GEN0> line 12.
Use of uninitialized value in addition (+) at lib/Parrot/Types.pm line 83, <GEN0> line 12.
Use of uninitialized value in substr at lib/Parrot/Types.pm line 85, <GEN0> line 12.
PackFile::ConstTable: Internal error: Unpacked Constant returned bad byte count '52'! at lib/Parrot/PackFile/ConstTable.pm line 73
Parrot::PackFile::ConstTable::unpack('Parrot::PackFile::ConstTable=HASH(0x1d48340)', 'm^@^^ s^@^^@^T^^@^^@^^@^^@^^@^^@^^@^^A^@^^@#^^@^s^@^^@^T^^@^^@^^@^^@^^@^^@^^@^^A^@^ ^@-^^@^s^@^^@...') called at lib/Parrot/PackFile.pm line 149
Parrot::PackFile::unpack('Parrot::PackFile=HASH(0x1d48358)', 'M-!U1^A^@^^@^4^O^@^m^@^^@s ^@^^@^T^^@^^@^^@^^@^^@^^@^^@^...') called at lib/Parrot/PackFile.pm line 206
Parrot::PackFile::unpack_filehandle('Parrot::PackFile=HASH(0x1d48358)', 'FileHandle=GLOB(0x1d483e8)') called at lib/Parrot/PackFile.pm line 222
Parrot::PackFile::unpack_file('Parrot::PackFile=HASH(0x1d48358)', 'out.pbc') called at C:\projects\parrot\parrot\disassemble.pl line 248
main::disassemble_file('out.pbc') called at C:\projects\parrot\parrot\disassemble.pl line 276
Re: string api
> > This message does remind me of how empty the TODO list is. Surely we can think of many more things to be done?
>
> Speaking of.. 1) Bugfix release please, we banged quite a few stack and GC bugs out. Don't we get any dessert?

Peter has already stated he'd like his parrot_reallocate_buffer patch to go in first, as it does fix a reproducible bug with Clint's evaluator. And there's still a bunch of GC bugs. I know of three types:

1) The problem that's been brought up before (and below), of CPU-stack vars not being traceable.

2) The GC initialization stuff could potentially trigger a GC (whoops!). My fix was to disable the GC during interpreter initialization, but I'm not sure what we want to do for this.

3) Non-string buffers (i.e., stuff in last_Buffer_Arena, not last_String_arena) are pretty broken, I think. They aren't marked, unmarked, freed, or copied. I could be mistaken on this one, as I don't have any test cases that break it yet, just my understanding of the code. This should be a lot easier to fix, with a little copy and paste. But I'd like to get a valid test case before I attempt to fix this.

> 2) I'm thinking of an internal stack not visible to user code that we use for temporary PMCs and Buffers and a simple macro for entry and exit of GC sensitive routines. I think I might have mentioned this.

What defines a GC-sensitive routine? Is anything that does string manip, PMC manip, or any allocations marked GC-sensitive?

> I'd like something like this rather than hoping all developers can systematically set bits or handle references correctly because in reality we'd probably never catch all the cases.

Two things:

First, we now have a GC_DEBUG define that we can turn on to find all places the GC could cause problems. In the current state, I think it covers 90% of the problems (one problem is that if I conditionally call string_make, this function isn't guaranteed to trigger GC in a test case, like it should). However, if we can't find all the places we do buffer manipulation to mark them immortal, how are we going to properly identify all the GC-sensitive functions?

Secondly, setting a flag should be much quicker, speed-wise. We'd only be setting flags on buffer headers that are already in the CPU cache, as opposed to writing to memory in this stack, pushing and popping all the time. And if we macro-ize the setting of the flags, I don't think it should look nearly as bad: GC_MARK(Buffer), GC_UNMARK(Buffer), etc.

I know there's been little activity in the past week...as far as my activity, I'm waiting for the Dan to come back tomorrow, and tell us minions what the plan is for GC stuff. Peter and I fixed most of the GC bugs that are easily fixed, but the rest require a more architectural fix, something I think we all are deferring to Dan on.

Mike Lambert
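A rough sketch of the flag-based alternative mentioned above, assuming a buffer header with a flags word. The Buffer layout, the bit value, and the may_collect predicate are invented for illustration and are not Parrot's real definitions; only the macro names GC_MARK and GC_UNMARK come from the message.

```c
#include <assert.h>
#include <stddef.h>

#define BUFFER_IMMUNE_FLAG 0x1u     /* assumed bit value, not Parrot's real one */

/* Hypothetical, heavily simplified buffer header. */
typedef struct Buffer {
    unsigned int flags;
    void        *bufstart;
} Buffer;

#define GC_MARK(b)      ((b)->flags |=  BUFFER_IMMUNE_FLAG)
#define GC_UNMARK(b)    ((b)->flags &= ~BUFFER_IMMUNE_FLAG)
#define GC_IS_IMMUNE(b) (((b)->flags & BUFFER_IMMUNE_FLAG) != 0)

/* Toy collector predicate: a buffer may be reclaimed only if it is
 * unreachable from the root set and not flagged immune. */
int may_collect(const Buffer *b, int reachable)
{
    assert(b != NULL);
    return !reachable && !GC_IS_IMMUNE(b);
}

/* Encodes the predicate before, during, and after the marked region
 * as three decimal digits. */
int demo_flags(void)
{
    Buffer tmp = { 0u, NULL };
    int before, during, after;

    before = may_collect(&tmp, 0);  /* unprotected temporary */
    GC_MARK(&tmp);
    during = may_collect(&tmp, 0);  /* protected while the flag is set */
    GC_UNMARK(&tmp);
    after = may_collect(&tmp, 0);   /* unprotected again */
    return before * 100 + during * 10 + after;
}
```

Setting and clearing the flag touches only the header the code is already working with, which is the cache argument made above; the cost is that every temporary needs its own mark and unmark.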
Re: string api
On Mon, Apr 08, 2002 at 11:40:28PM -0400, Michel J Lambert wrote:
> However, if we can't find all the places we do buffer manipulation to mark them immortal, how are we going to properly identify all the GC-sensitive functions?

Ack! Sorry for being anal, but I finally decided the 'immortal' name just bugged me too much, and renamed it to 'immune'. :-)
Re: string api
At 11:40 PM 4/8/2002 -0400, Michel J Lambert wrote:
> > 2) I'm thinking of an internal stack not visible to user code that we use for temporary PMCs and Buffers and a simple macro for entry and exit of GC sensitive routines. I think I might have mentioned this.
>
> What defines a GC-sensitive routine? Anything that does string manip, pmc manip, or any allocations, is marked GC-sensitive?

Ok, that's a really general phrase I used. :)

I agree we need an overall architectural solution. Setting and clearing bits manually is error-prone but fast, as you said. It's identical to the malloc()/free() situation, which is one of the primary reasons we use garbage collection in the first place, so why reinvent the same situation with different syntax?

malloc/free is vulnerable to:
1) leakage (forgot to free)
2) double deallocation (freed an already freed buffer)

So is setting/clearing GC bits.

I was thinking of a solution that didn't require tracking every single allocation. Keep in mind I'm just tossing about an alternate point of view for the sake of discussion.

I suppose a variation of the scratch-pad that might be more along the performance lines you are thinking of could be similar to the scope tracking that compilers do when gathering symbols into symbol tables. Keep track of global (or interpreter-local) scope with a macro upon entry.

    #define GC_NEWPAD()     cur_interp->scope++
    #define GC_CLEARPAD(s)  cur_interp->scope = s

So a GC-able buffer gets created with an initial scope of cur_interp->scope, hidden in the allocator, and the collector skips collect on any buffer with scope = the cur_scope.

> Two things: First, we now have a GC_DEBUG define that we can turn on to find all places the GC could cause problems. In the current state, I think it covers 90% of the problems (one problem is that if I conditionally call string_make, this function isn't guaranteed of triggering GC in a test case, like it should.) However, if we can't find all the places we do buffer manipulation to mark them immortal, how are we going to properly identify all the GC-sensitive functions?
>
> Secondly, setting a flag should be much quicker, speed-wise. We'd only be setting flags on buffer headers that are already in the CPU cache, as opposed to writing to memory in this stack, pushing and popping all the time. And if we macro-ize the setting of the flags, I don't think it should look nearly as bad. GC_MARK(Buffer), GC_UNMARK(Buffer), etc.

Fair enough on the speed point; however, for every object you are handling you have to remember to (1) mark it, and (2) unmark it after attaching it to the root set.

On the other hand, what if we could say

    {
        orig = GC_NEWPAD();
        x();
        y();
        z();
        GC_CLEARPAD(orig);
    }

and know that anything between newpad and clearpad would be safe, and be free to write normal code even with GC churning? And there is no stack churn.

> I know there's been little activity in the past week...as far as my activity, I'm waiting for the Dan to come back tomorrow, and tell us minions what the plan is for GC stuff. Peter and I fixed most of the GC bugs that are easily fixed, but the rest require a more architectural fix, something I think we all are deferring to Dan on.

Agreed. However, more discussion around here is a good thing. :)

-Melvin
Worst-case GC Behavior?
I think I know of two potential performance problems with the GC code. They could be problems in my head, or real problems, as I haven't done any profiling. We also don't have any real test cases. :)

The first example is the following code, which calls parrot_allocate to create the string each time.

    for(1..1) { push a, a; }

If we start out with no room, it calls Parrot_go_collect for each push, but the go_collect does nothing because there's no free memory. This then requires another allocation, fit exactly to the size of the block, one character. Repeat. Since we're never freeing any memory, it is continually allocating a block of size 56 (memory pool) + 1 (character) from the underlying system API.

The second example involves headers. Say we have the following code:

    loop:
        new P0, PerlString
        branch loop

Which might correspond to:

    while(1) { my $dummy; }

Each time through the loop, it has to alloc a PMC header. When we allocate the header, we store it into P0, and the old header is essentially freed. The next time through the loop, entries_in_pool is 0, and it triggers alloc_more_string_Headers, and a DOD run. This finds the PMC we just freed, and uses it. Repeat. Each time through the loop, it triggers a DOD run.

The above example might be a bit contrived, due to the fact that it could pull 'my $dummy' outside of the loop, assuming it isn't tied. (If it is tied, we need to do it each time through, since it could be counting the number of times we set the variable, or somesuch.)

Now, I know that any memory management system can have cases which cause worst-case behavior. I'm not sure if the cases I presented are those kinds of cases, or whether they are common enough that we need to worry about them.

The first problem can probably be solved by enforcing a minimum block allocation size. I'm not sure of a good solution to the second problem, however.

If we do have a minimum block allocation size, it will perform horribly memory-wise on something like:

    while (1) { push a, a; push a, aax200; }

This example destroys any implementation that has a minimum block allocation size. This could be alleviated with a linked list of blocks that have available memory at the tail of the block. That in turn could give very bad performance whenever we have lots of half-filled blocks. (Say, when we continually allocate blocks of size 0.51*minimum_block_allocation_size.) I recall Dan saying he didn't like traversing linked lists when searching for memory, but it shouldn't be that bad, since it all gets cleaned up on a call to parrot_go_collect.

Finally, another approach is to randomize things. Lots of algorithms randomize their behavior to prevent test cases that exhibit worst-case behavior. Of course, I'm not sure if a non-deterministic interpreter is a good thing, since it'll just make GC bugs that much more annoying to track down.

These might all be things that were considered, and discarded as not important enough. But I didn't see these potential problems mentioned anywhere, so I figured I'd bring them up here, just in case.

Mike Lambert
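To make the "minimum block allocation size" idea concrete, here is a small sketch of a bump allocator that never asks the system for less than a fixed minimum. Everything in it is an assumption for illustration: the 4096-byte minimum, the Block layout, and the pool_allocate name are invented, alignment is ignored, and blocks are never freed or compacted. It is not how Parrot's parrot_allocate works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_BLOCK_SIZE 4096         /* assumed minimum, chosen arbitrarily */

/* Hypothetical block descriptor for this sketch. */
typedef struct Block {
    struct Block *prev;
    size_t        size;             /* usable bytes in data */
    size_t        used;             /* bytes already handed out */
    char         *data;
} Block;

static Block *current_block = NULL;
static size_t system_allocs = 0;    /* how often the system was asked for a block */

/* Carve `size` bytes from the tail of the current block, requesting a
 * new block of at least MIN_BLOCK_SIZE only when the request does not fit. */
void *pool_allocate(size_t size)
{
    void *p;

    if (current_block == NULL ||
        current_block->size - current_block->used < size) {
        size_t block_size = size > MIN_BLOCK_SIZE ? size : MIN_BLOCK_SIZE;
        Block *b = malloc(sizeof *b);

        if (b == NULL)
            return NULL;
        b->data = malloc(block_size);
        if (b->data == NULL) {
            free(b);
            return NULL;
        }
        b->prev = current_block;
        b->size = block_size;
        b->used = 0;
        current_block = b;
        system_allocs++;
    }
    p = current_block->data + current_block->used;
    current_block->used += size;
    return p;
}

/* Performs n one-byte allocations and reports how many system-level
 * block requests they caused. */
size_t demo_small_allocs(size_t n)
{
    size_t before = system_allocs;
    size_t i;

    for (i = 0; i < n; i++) {
        void *p = pool_allocate(1);
        assert(p != NULL);
    }
    return system_allocs - before;
}
```

With this scheme a run of one-character allocations shares a block instead of hitting the system each time. The weakness described above still applies: a request that does not fit in the remaining tail abandons that tail, so a mix of tiny and just-too-large requests wastes most of every block.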
macros (was Re: string api)
> Keep track of global (or interpreter local) scope with a macro upon entry.

I shudder every time someone says macro on p6i.

perl5 has several thousand macros defined. (grep for ^#define) (over 8000 if you include all the embedding macros. it's down to ~4000 if you cut out embedding, config.. and closer to ~1500/2000 if you rip out more things.)

This makes it wonderfully challenging to debug.

Macros are a useful feature of the C language, but we should be very careful in how we use them. (I'm not saying don't use them at all.) I'm sure there's a happy medium somewhere between no macros and perl5. We should look for it.

-R
Re: macros (was Re: string api)
At 10:30 PM 4/8/2002 -0700, Robert Spier wrote:
> > Keep track of global (or interpreter local) scope with a macro upon entry.
>
> I shudder every time someone says macro on p6i. perl5 has several thousand macros defined. (grep for ^#define) (over 8000 if you include all the embedding macros. it's down to ~4000 if you cut out embedding, config.. and closer to ~1500/2000 if you rip out more things.)

Are you counting literals and things like bit values in your grep?

> This makes it wonderfully challenging to debug.

That might be a bit unfair; I'd argue that it makes it _easier_ to debug in many cases, particularly with constants.

> Macros are a useful feature of the C language, but we should be very careful in how we use them. (I'm not saying don't use them at all.) I'm sure there's a happy medium somewhere between no macros and perl5. We should look for it.

'macro' here is a choice of words... call it an inline function if you want.

I'd be more worried about debugging that computed goto core than a macro. :)

-Melvin
Re: Worst-case GC Behavior?
At 01:17 AM 4/9/2002 -0400, Michel J Lambert wrote:
> The first example is the following code, which calls parrot_allocate to create the string each time.

Might both of these be solved by using arenas?

-Melvin
Re: string api
> I agree we need an overall architectural solution. Setting and clearing bits manually is error-prone but fast, as you said. Its identical to the malloc()/free() situation which is one of the primary reasons we use garbage collection in the first place, so why reinvent the same situation with different syntax?

Generally, malloc/free are used in more complex situations than just stack-based memory management. But I see your point.

> malloc/free is vulnerable to: 1) leakage (forgot to free)

If you remember to mark it as used, you're pretty much guaranteed to mark it as unused at the end of the same function.

> 2) double deallocation (freed an already freed buffer)

In general, this can't happen with setting bits. We can't unset a bit twice, since we're only doing this on stuff returned by a function, for the duration of the function that got it. If we return it again, they'll set it, and free it on their own. Agreed, it's confusing, and thus the reason for this whole discussion. :)

> I suppose a variation of the scratch-pad that might be more on the performance line that you are thinking could be similar to the scope tracking that compilers do when gathering symbols into symbol tables.

Ahhh, that's what I missed. I was assuming that you'd either have to push variables onto this stack in buffer_allocate, or in the place that's allocating them, and pop them all off with an end_GC_function. Which I considered to be 'just as much work'.

> So a GC-able buffer gets created with an initial scope of cur_interp->scope, hidden in the allocator, and the collector skips collect on any buffer with scope = the cur_scope.

I'm a bit confused. Say I have function A call B and C. Function C will have the same scope as B will. If C triggers a GC run, then anything allocated in B will have the same scope as C. How will the GC system know that it can mark those as dead? Granted your system is safe, but it seems a little *too* safe.

> And there is no stack churn.

I like that part, tho. :)

Mike Lambert
Re: string api
At 01:48 AM 4/9/2002 -0400, Michel J Lambert wrote:
> > the malloc()/free() situation which is one of the primary reasons we use garbage collection in the first place, so why reinvent the same situation with different syntax?
>
> Generally, malloc/free are used in more complex situations than just stack-based memory management. But I see your point.
>
> > malloc/free is vulnerable to: 1) leakage (forgot to free)
>
> If you remember to mark it as used, you're pretty much guaranteed to mark it as unused at the end of the same function.

As long as you leave the function from one place. There is no way using this method to say, "Whatever you do inside X(), I'm going to clean it up, even if you jump out of X() with longjmp()."

> In general, this can't happen with setting bits. We can't unset a bit twice, since we're only doing this on stuff returned by a function, for

I may be using a hammer where a nail file would do here. I was thinking of the dangling buffer situation getting the bit cleared after the original had been moved out from under it, but I think it may be getting too late for me to continue this thing called thinking... :) You are probably right: if it's a bug, it's not the clearing of the bit, it's keeping the invalid reference around, not reachable from root.

> > I suppose a variation of the scratch-pad that might be more on the performance line that you are thinking could be similar to the scope tracking that compilers do when gathering symbols into symbol tables.
>
> Ahhh, that's what I missed. I was assuming that you'd either have to push variables on to this stack in buffer_allocate, or in the place that's allocating them, and pop them all off with a end_GC_function. Which I considered to be 'just as much work'.

You interpreted correctly; my first mail had that in mind, then I saw your point on churn and killing the cache benefit.

> > So a GC-able buffer gets created with an initial scope of cur_interp->scope, hidden in the allocator, and the collector skips collect on any buffer with scope = the cur_scope.
>
> I'm a bit confused. Say I have function A call B and C. Function C will have the same scope as B will. If C triggers a GC run, then anything allocated in B will have the same scope as C. How will the GC system know that it can mark those as dead? Granted your system is safe, but it seems a little *too* safe.

After return from C, it will be collected. Granted, this is controlled usage for internals programmers. I wouldn't expect Parrot to do:

    int main() {
        save = GC_NEWPAD();
        runops();
        GC_RSTPAD(save);
    }

I'm targeting the situation where A() creates an aggregate and calls B() and C(), which might be recursive, etc. A() sets a marker, calls B()/C(), and pops the marker before returning. It's basically setting a GC transaction, to use a SQL reference, but I don't expect internals guys to set a single large transaction over the whole interpreter.

A caveat: scope 0 might be immune to this rule, else everything in the base scope might live forever. Easy enough to handle.

-Melvin
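One possible reading of the scope-pad rule discussed in this thread, sketched in C. The macro names GC_NEWPAD and GC_CLEARPAD and the cur_interp->scope field follow the messages; everything else is an assumption for illustration. In particular, the exact comparison the collector would use is not settled in the discussion, so the rule below (a buffer is pad-protected when its scope is non-zero and no deeper than the current scope) is this sketch's guess, with scope 0 exempt as the caveat above suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified interpreter and buffer header. */
typedef struct Interp { unsigned int scope; } Interp;
typedef struct Buffer { unsigned int scope; int reachable; } Buffer;

static Interp  interp_storage = { 0u };
static Interp *cur_interp = &interp_storage;

/* Post-increment: the expression yields the scope to restore later. */
#define GC_NEWPAD()     (cur_interp->scope++)
#define GC_CLEARPAD(s)  (cur_interp->scope = (s))

/* Stand-in for the tagging hidden inside the allocator. */
void buffer_init(Buffer *b)
{
    assert(b != NULL);
    b->scope = cur_interp->scope;
    b->reachable = 0;
}

/* Assumed rule: skip buffers created inside a pad that is still open.
 * Scope 0 is never pad-protected, so base-scope garbage can be freed. */
int gc_may_collect(const Buffer *b)
{
    int pad_protected = b->scope != 0u && b->scope <= cur_interp->scope;
    return !b->reachable && !pad_protected;
}

/* Returns inside * 10 + outside, where each digit is the collector's
 * answer for an unreachable temporary created inside a pad. */
int demo_pads(void)
{
    Buffer tmp;
    unsigned int orig = GC_NEWPAD();
    int inside, outside;

    buffer_init(&tmp);
    inside = gc_may_collect(&tmp);   /* pad still open */
    GC_CLEARPAD(orig);
    outside = gc_may_collect(&tmp);  /* pad closed */
    return inside * 10 + outside;
}
```

Under this reading, a temporary allocated between GC_NEWPAD() and GC_CLEARPAD() survives any collection triggered in between, and becomes ordinary garbage once the pad is cleared. It also shows the "too safe" concern: a buffer left over from an earlier sibling call at the same depth is protected again while a later call has that depth open.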