Re: [perl #17731] [PATCH] Integration of Lea like allocators
> And additionally, for + 10 % more generations in life.pasm
> - tossed one instruction in the fast path of Buffer_headers

I don't believe this is valid. bufstart needs to be set to 0 when you free an object. When the stackwalk runs, it could "liven" a dead buffer. When the copying collector runs, it could see a dead buffer and want to copy it. So we'd better make sure that buffer's not pointing to garbage memory, or it'll be copying random stuff from who-knows-where and likely GPF. :)

Mike Lambert
Re: RFC: library entry name collision
> I was beating my head on the wall yesterday trying to figure out why
> an intlist test was failing on a freshly updated source tree. (I
> rarely use 'make clean', because that's almost always just covering up
> dependency problems.) I'll leave out the gory details, but the problem
> boiled down to parrot/intlist.o and parrot/classes/intlist.o being
> treated as identical by ar. Upon further reading, it appears that for
> portability, we can only depend on the basename of ar entries.

Two things...

First: one dependency problem that comes up all the time is that classes/Makefile doesn't have any dependencies upon GENERAL_H_FILES. These .o files aren't updated if I change parrot headers, etc. The best way to solve this is to put the logic into the base parrot Makefile, although that could make makefile generation a bit more difficult.

Second: intlist is not the only culprit. ./classes/key.c and ./key.c have a similar problem.

Mike Lambert
Re: Of PMCs Buffers and memory management
It is possible, however, to have a pool of unsized-header *pointers*, and that's exactly what extra_buffer_headers is. Currently, we group all headers of the same size in the same header pool, although only constant string headers currently have their own pool. Namely, because we don't really have constants of anything else implemented yet. :)

> > ... But there is
> > no compelling reason to do so, at this point in time. (I have some ideas
> > that would require it, tho)
>
> Could you elaborate on these ideas?

I guess I will need to write up those ideas. :)

> > ... I don't think we want interpreters appearing and
> > disappearing with references...they should be explicitly created and
> > destroyed.
>
> Actually, it's not a big difference, how they are destroyed, but we have
> already a "newinterp" opcode, so an interpreter PMC class just needs a
> custom destroy method - that gets called too ;-)
> Though, if nested structures inside the interpreter are all buffers,
> destroying them would neatly fit into the framework.

Yes, it would. But a lot of the interpreter's structures have data fields, and those don't work too well as buffer data. They could work as part of a sized buffer header, I suppose. I think it would be much easier to make the interpreter PMC-ish, or at least have a PMC wrapper. Then this PMC can have an active-destroy method, which would properly clean up everything that needed to be cleaned up. Since the interpreter memory would be malloc-allocated, it wouldn't be copied or cleaned on its own. The PMC would become an interface for the GC system to control the lifetime of the allocated interpreter memory, since the GC system would control the PMC.

Mike Lambert
Re: Of PMCs Buffers and memory management
method, so that fields of the sized buffer interpreter header could be marked() and buffer_lives() themselves. (Currently, this is done in dod.c.) If they were unified, the PMC would be an interpreter referencing a sized buffer header. Or if we had sized PMCs, the fields could be part of it, avoiding the need for a buffer.

However, as far as leaking memory goes, there is no reason that interpreters have to be PMCs/buffers. Just as we have a make_interpreter to create an interpreter, we can have an unmake_interpreter that destroys the interpreter. I don't think we want interpreters appearing and disappearing with references...they should be explicitly created and destroyed. But that's a discussion for another thread. My point is that all things don't need to be traced, and some stuff can be handled manually, as long as the perl programmer doesn't see it directly.

Hope this helps answer your questions,
Mike Lambert
Re: [INFO] parrot with Lea malloc - 50% faster life
> The whole resource.c is replaced by calls to calloc/realloc (res.c),
> add_free_buffer does free().

I think that's the problem right there. What exactly are you changing to use this new calloc/malloc implementation? One approach is to modify memory.c to use the new versions, although since parrot does a lot of its own memory management of large chunks, it's not likely to give any speed improvement over the OS. If you're replacing resources.c and headers.c with this calloc/malloc, you have to make sure to get every occurrence, which can be difficult.

What level are you trying to hook in this new implementation? You can bind every buffer's contents to calloc/realloc calls, but then there will be no copying or collection going on, because we're not allocating out of pools. You'll need to change compact_pool to do nothing, and update the checks in mem_allocate so that they aren't dependent upon it being called.

If you want the actual headers to be allocated from calloc/realloc, you'll need to change add_free_object and get_free_object in smallobject.c, and (add|get)_free_(buffer|pmc) in headers.c. Then you'll need to disable DOD, because all of the headers will no longer be consecutive in large pools like they were before. At this point, we're screwed because we will never free any memory. You could reimplement DOD to use a pool of pointers to headers, but that's just going to be diminishing returns, especially with the random memory dereferences (we had relatively good cache coherency before).

> make test shows currently gc_2-4 broken (no statistics) and 2 tests from
> src/intlist.t, which is probably my fault.

I'm not surprised, to be honest. :) gc tests some GC behaviors that aren't tested through the normal code (i.e., it will actually trigger a DOD and/or collection run), so if they're broken, you've likely done something wrong.
(Changing just add_free_buffer without get_free_buffer or DOD is one thing that classifies as "something wrong". ;)

> I didn't look into these further, but this is probably due to more
> broken string/COW code + continuations in 8.t.
>
> string_substr / unmake_COW currently does highly illegal (WRT malloc)
> things, which might or might not cause problems for the current GC
> implementation. s. patch. There are probably more things like this.

Those illegal things are likely illegal for malloc, but they work perfectly fine for our pool-based system. Feel free to remove some of that code (it's mostly optimizations) if you like. However, I sincerely doubt that unmake_COW or string_substr is the cause of the GC bugs you mentioned, since they shouldn't be used by those tests at all. (Those tests merely verify that collections and DODs are being run. They don't even really check that the GC runs don't destroy important data. So gc_2 failing is basically saying that the pool compaction is failing.)

> I didn't look further into memory usage or such, though top seems to
> show ~double the footprint of CVS.

I think Dan would disallow such a patch for this reason alone. We're already taking a 2x hit (peak) by using a copying collector. No need to make it worse. :)

Mike Lambert
Re: [perl #17495] [PATCH] for a faster life
> Now, trace_system_stack walks a ~1300 entries deeper stack in CGoto
> run mode, because of the jump table in cg_core. Don't ask me about this
> difference to 900 ops, gdb says so.

Ahh, good observation. (I'm more of a non-cgoto person myself. ;)

> Attached patch now sets interpreter->lo_var_ptr beyond this jump table,
> reducing the time of trace_system_stack to 0.04s for above case.

Unfortunately, this doesn't work too well as a solution. There are a few PMCs and buffers that appear before the call to runops. These must be traced by trace_system_stack, or else they get prematurely freed (think ARGV, pbc filename, etc). Try running with GC_DEBUG to see this happen.

What I don't understand, however: the ops_addr jump table appears to be a static variable. Shouldn't the contents of this static variable be stored somewhere other than the stack? Alternately, is it possible to allocate this jump table in the system heap (malloc et al), and store only a pointer to it on the stack?

Mike Lambert
Re: Tinderbox "TD-ParkAvenue" not working
> First, a thank you to whoever it is who is running these test-drive
> machines (there's no name in the build log). Also, a thanks to Compaq
> for setting them up.

You're welcome. It's basically just a script on my linux box that uploads a tar file (the servers don't have gzip, dammit! ;) to the remote machine, then telnets in and manually builds them. This is because the machines don't have any outgoing connections (so they can't grab ftp or cvs or rsh stuff).

> There's a problem with the NetBSD machine. There's no 'perl' in the
> $PATH being used, so the log file looks like this:
>
> ...
>
> Obviously it's not going to work. If there is a perl installed on
> that machine, then perhaps the full pathname to perl should be used.
> If there's no perl, then perhaps the machine should be removed from the
> tinderbox.

Yes, I had noticed that. And that struck me as strange, particularly because that machine had worked before, but isn't working now. I'll remove it from the list of machines it connects to.

Mike Lambert
Re: Current Perl6 on MS Win32 status
> Perl6 on Win32 MS VC++ gives:
>
> Failed Test     Status Wstat Total Fail  Failed  List of Failed
> ---------------------------------------------------------------
> t/compiler/8.t       1   256     6    1  16.67%  6
> t/compiler/a.t       1   256     3    1  33.33%  2
> t/rx/call.t          1   256     2    1  50.00%  1

After the recent BUFFER_external_FLAG fixes, I now get:

Failed Test     Status Wstat Total Fail  Failed  List of Failed
---------------------------------------------------------------
t/compiler/1.t       1   256    12    1   8.33%  11
t/compiler/3.t       1   256     7    1  14.29%  7
t/rx/call.t          1   256     2    1  50.00%  1

Specifically:

t/compiler/1........NOK 11
# got: ''
# expected: '1003.10
# 1031.00
# 1310.00
# 4100.00
# '

and:

t/compiler/3........NOK 7
# got: 'Wrong type on top of stack!
# '
# expected: '678910
# 1112131415
# '

The first was a GPF, the second was just incorrect output. I'm not sure if this is progress or not, but I believe it might adversely affect other platforms. I don't have time to look into the issue now, but I'll try to do so tomorrow.

Mike Lambert
Re: [perl #16855] [PATCH] uselessly optimize print()
> > In tracking down a gc bug, I realized that the current throwaway
> > implementation of the print op could be replaced with a faster
> > throwaway implementation that avoids doing a string_to_cstring.
> >
> > Note that both the original and new implementations are still buggy
> > with respect to supporting different encodings. I don't know if
> > printf("%s") is any better than fwrite in terms of at least vaguely
> > paying attention to your locale or whatever. If so, don't apply it.
> >
> > (all tests pass)

Applied, thanks,
Mike Lambert
Re: [perl #16852] [PATCH] Eliminate empty extension
> > This patch trims off the period at the end of executable filenames for
> > C-based tests on unix. (It compiled "t/src/basic_1.c" ->
> > "t/src/basic_1."; this patch makes that "t/src/basic_1")
>
> This patch should also update languages/perl6/P6C/TestCompiler.pm, since
> it hijacks lib/Parrot/Test.pm to get its functionality. I'll probably
> apply this after the code opens up again, but if someone beats me to it,
> please be sure to update the affected file above.

Applied, thanks,
Mike Lambert
Current Perl6 on MS Win32 status
Perl6 on Win32 MS VC++ gives:

Failed Test     Status Wstat Total Fail  Failed  List of Failed
---------------------------------------------------------------
t/compiler/8.t       1   256     6    1  16.67%  6
t/compiler/a.t       1   256     3    1  33.33%  2
t/rx/call.t          1   256     2    1  50.00%  1

t/compiler/8........NOK 6
# got: 'Wrong type on top of stack!
# ed 1
# 1
# 2
# a.1: 3
# b.1
# foo
# '
# expected: '1
# 2
# a.1: 3
# b
# 4
# 5
# Survived 1
# 1
# 2
# a.1: 3
# b.1
# foo
# '
# Looks like you failed 1 tests of 6.

This one is known, and is waiting on a BUFFER_external patch. Now that parrot works on win32 again, I'll try to clear out my patch queue.

t/compiler/a........ok 1/3
Couldn't find global label '__setup' at line 1.
Couldn't find global label '_main' at line 3.
Couldn't find operator 'bsr' on line 1.
Couldn't find operator 'bsr' on line 3.
# Failed test (t/compiler/a.t at line 51)
t/compiler/a........NOK 2
# got: ''
# expected: '1
# 1.1
# 2
# --
# 1.1
# 2.1
# --
# 1
# 1.1
# 2
# 2.1
# 3.1
# 4
# 4.1
# 5.1
# 6.1
# --
# 1
# 1.1
# 2.1
# 3.1
# 4
# 4.1
# 5.1
# '

This error was in imc->pasm, specifically:

last token = [] (error)
line 63: parse error
Didn't create output asm.

t/rx/call...........NOK 1
# got: 'ok 1
# ok 2
# ok 3
# ok 4
# ok 5
# ok 6
# ok 7
# ok 8
# ok 9
# '
# expected: 'ok 1
# ok 2
# ok 3
# ok 4
# ok 5
# ok 6
# ok 7
# ok 8
# ok 9
# ok 10
# '

No idea on where the missing "ok 10" went.

If people would like the p6/imcc/pasm/pbc files, I can provide them. Just let me know.

Mike Lambert
Re: Conditional makefile generation (Was Re: [perl #16856] [PATCH]various changes to imcc)
> > Is there any fundamental reason why we *cannot* just enter a generated
> > imcparser.c and imcparser.h into CVS and save users the step of building
> > them on these platforms?
>
> Ack, so we should just delete the lines:
> imclexer.c
> imcparser.c
> imcparser.h
>
> from .cvsignore

Yep, although one also needs to adjust the makefile to avoid performing such rules. Attached patch gets IMCC building on MSVC without cygwin (lex/bison/yacc/etc). It assumes you add the generated imclexer.c, imcparser.c, and imcparser.h to cvs as well.

Current perl6 test gives:

Failed Test     Status Wstat Total Fail  Failed  List of Failed
---------------------------------------------------------------
t/compiler/8.t       1   256     6    1  16.67%  6
t/rx/basic.t         2   512     5    2  40.00%  3-4
t/rx/call.t          1   256     2    1  50.00%  2

Any idea on how to go about fixing the rx ones? They're failing on imc->pasm, with msgs like "NO Op rx_pushmark_ic (rx_pushmark<1>)".

Mike Lambert

Index: config/gen/makefiles/imcc.in
===================================================================
RCS file: /cvs/public/parrot/config/gen/makefiles/imcc.in,v
retrieving revision 1.1
diff -u -r1.1 imcc.in
--- config/gen/makefiles/imcc.in	27 Aug 2002 05:02:28 -	1.1
+++ config/gen/makefiles/imcc.in	2 Sep 2002 02:57:49 -
@@ -33,10 +33,12 @@
 all : imcc
 	cd ../.. && $(MAKE) shared && $(RM_F) parrot${exe} && $(MAKE)
 
-imcparser.c imcparser.h : imcc.y
+grammar : yacc_file lex_file
+
+yacc_file : imcc.y
 	$(YACC) -d -o imcparser.c imcc.y
 
-imclexer.c : imcc.l $(HEADERS)
+lex_file : imcc.l $(HEADERS)
 	$(LEX) imcc.l
 
 .c$(O):
Index: languages/imcc/.cvsignore
===================================================================
RCS file: /cvs/public/parrot/languages/imcc/.cvsignore,v
retrieving revision 1.2
diff -u -r1.2 .cvsignore
--- languages/imcc/.cvsignore	27 Aug 2002 08:07:45 -	1.2
+++ languages/imcc/.cvsignore	2 Sep 2002 02:58:00 -
@@ -1,6 +1,3 @@
 imcc
-imclexer.c
-imcparser.c
-imcparser.h
 imcparser.output
 Makefile
Re: [perl #16895] [PATCH] core.ops, ops2c.pl
> 4) set P[k], i
>
> Here probably P should be an IN argument, because P itself is neither
> created nor changed, just the array/hash contents is changed.
> Currently only the JITters and imcc are using these flags, so it should
> be discussed what OUT really means.

I disagree. If P contains no keys, then set P[k] will create a new key or pair element in the P hashtable. This means that P is being modified.

Of course, our meaning of IN and OUT is not completely nailed down just yet, especially since the best meaning of them will probably relate to how the JIT and IMCC want them to act. As such, the above argument could be correct or incorrect depending upon exactly how they are defined. :)

Mike Lambert
Re: [PATCH] in makefile, move libparrot.a from "test" to "all"
Mr. Nobody wrote:
> Date: Fri, 30 Aug 2002 18:13:27 -0700 (PDT)
> From: Mr. Nobody <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Subject: [PATCH] in makefile, move libparrot.a from "test" to "all"
>
> libparrot.a is not really related to testing, it should belong in "all". This
> patch does so, and as a side effect, t/src/basic will now work with "make testj".

I thought so as well, at first. And currently, that might be an okay thing to do. However, it might help if I explain the purpose of the t/src/* tests. They originate from ticket 468:
http://bugs6.perl.org/rt2/Ticket/Display.html?id=468

I believe the eventual intent is to set up the t/src/* tests to test:
a) functions in parrot which aren't testable via opcodes, and thus can't be tested with our pasm files.
b) the embedding system, to ensure that a static interface doesn't change behavior on us, etc.

Currently, however, neither a nor b is implemented, and so the t/src/* tests have no direct dependency upon libparrot.a/lib and libparrot.so/dll, and so can probably be removed. If it helps make 0.0.8 build on more platforms, it might be a "good thing" to do.

At least, that's my understanding of the situation.

Mike Lambert
Re: [BUG] GC collects argv aka P0
> > $ perl6 -k examples/life-ar.p6 5
> > Running generations
>
> This problem is due to the fact that the argument strings are
> created with the external flag set, which is not properly
> supported by the string and GC modules. Steve posted
> some patches recently that might well fix the problem,
> but I have been leaving those for Mike to look at.

Yes yes, I've been a lazy bum. :) It's currently the summer->school transition for me, so I might be a bit spotty for the next few days, too.

You're right in that this bug is probably just a result of unimplemented external-ness, in which case applying Steve's fixes should make this problem go away.

Mike Lambert
Re: Conditional makefile generation (Was Re: [perl #16856] [PATCH]various changes to imcc)
> > However, the intermediate filename 'y.tab.c' isn't necessarily portable,
> > if I remember my Windows and VMS lore correctly. However, those platforms
> > probably have bison, which understands the simpler -o imcparser.c output
> > option.
>
> So the first question actually is: is there a platform parrot will
> support where there is/will be no bison?

The better question to ask is: is there any platform where we will need to run bison/yacc on that platform in *order* to compile Parrot? I believe the answer is no.

Is there any fundamental reason why we *cannot* just enter a generated imcparser.c and imcparser.h into CVS and save users the step of building them on these platforms? It's just an additional parrot dependency which doesn't need to be there, and may come in handy when trying to build on a lot of the more arcane platforms.

Mike Lambert
Re: [perl #16874] [BUG] Concatenation failing
> > I have a weird bug where concatenation is sometimes failing, and I
> > have no idea why. See the attached pasm. I fully expect both works and
> > weird to output "foo", "bar", "quux" with various levels of spacing,
> > but weird doesn't output quux.
>
> Patch below should fix the problem. This is not an optimal solution,
> as the unmake_COW is probably not required if the string_grow is
> going to happen anyway, but it seems to follow the general spirit of
> the current code.

Ah, this would be my bug. Thanks for finding it, Peter.

Unfortunately, I fail to see why this actually fixes any bug. string_grow should unmake_COW itself. So the old code essentially looked like this:

    /* make sure A's big enough for both */
    if (a->buflen < a->bufused + b->bufused) {
        unmake_COW(interpreter, a);
        /* Don't check buflen, if we are here, we already checked. */
        Parrot_reallocate_string(interpreter, a,
                                 a->bufused + b->bufused + EXTRA_SIZE);
    }
    unmake_COW(interpreter, a);

While the new code, with your patch, should look like:

    unmake_COW(interpreter, a);
    /* make sure A's big enough for both */
    if (a->buflen < a->bufused + b->bufused) {
        unmake_COW(interpreter, a);
        /* Don't check buflen, if we are here, we already checked. */
        Parrot_reallocate_string(interpreter, a,
                                 a->bufused + b->bufused + EXTRA_SIZE);
    }

Since unmake_COW is a no-op if the string is not COW, I fail to see what the functional difference is between these two snippets of code, although I don't doubt that there is one if your fix solves Leon's code.

Also, I agree that unmake_COW is not an optimal function call if you're going to grow the string afterwards. I wanted to get a simple interface implemented for using COW, such that it would be easy to understand, and then later optimize it for actual usage scenarios. I imagine that unmake_COW could be extended to take 'pre' and 'post' byte arguments, and would pad the resulting string by that much on either side when uncowifying it.
This would help string_grow optimally uncowify things. I just haven't gotten around to that yet. Thanks, Mike Lambert
Re: [perl #16859] [PATCH] Fix BUFFER_external_FLAG for strings
> This patch is the real fix for strings with the BUFFER_external_FLAG.
> It requires the previous dod.c and resources.c patches to be applied.
>
> Strings marked with the BUFFER_external_FLAG point to the external
> memory rather than making a copy for themselves. Side effects of this
> are (1) less memory usage, (2) external updates to the string are
> reflected in the Parrot STRING (except for length changes!), and (3)
> these strings are skipped over when memory is getting moved around
> during a compaction.

Fixing BUFFER_external_FLAG is probably a good thing, and I'm up for applying them after 008 goes out the door.

However, BUFFER_external_FLAG and BUFFER_selfpoolptr_FLAG almost seem to have complementary purposes now. (The latter was something that I introduced along with the current COW version.) selfpoolptr indicates that the header references memory in the local pool. This allows a non-constant-header string to point cowingly towards a constant buffer in the constant pool. The selfpoolptr flag would be false, and it would avoid collecting/copying/destroying the data.

This is rather similar in nature to external's behavior, and I imagine we could either:
a) make them act identically, like this patch does
b) just have people who want external data to unset selfpoolptr.

Thoughts?

Mike Lambert
Re: [perl #16857] [PATCH] minor refactoring in dod.c
> Small cleanups, but also a piece of my attempted fixes to the
> BUFFER_external_FLAG. (The CVS version currently doesn't work anyway,
> so don't worry about only getting parts of my fixes in; nothing
> depends on it.)

I'm curious about your dod/external "fix". If I understood the purpose of BUFFER_external_FLAG correctly, it indicates that the memory pointed to by this header is external to our local memory pool, and thus should not be collected, etc. However, if I understand your patch correctly, it makes all external buffers immune to being collected. I agree with the resources.c patch to fix external, but I'm not sure about this one.

Mike Lambert
Re: [perl #16855] [PATCH] uselessly optimize print()
> In tracking down a gc bug, I realized that the current throwaway
> implementation of the print op could be replaced with a faster
> throwaway implementation that avoids doing a string_to_cstring.
>
> Note that both the original and new implementations are still buggy
> with respect to supporting different encodings. I don't know if
> printf("%s") is any better than fwrite in terms of at least vaguely
> paying attention to your locale or whatever. If so, don't apply it.
>
> (all tests pass)

Yay, I like this one. I was looking for a way to get rid of string_to_cstring at one point, due to its ugly habit of uncowifying strings when we go to print them. I tried a temporary solution that used parrot io, which worked, except that the rest of the print ops in parrot still used stdio, and so I ended up with out-of-order printing.

I'll apply this once Parrot 009 starts up again. Although at some point, we should go 100% in converting the IO in core.ops to use parrot io, or convert the tests/programs over to using io.ops. (Not sure which way we want to go.)

Mike Lambert
Re: [perl #16852] [PATCH] Eliminate empty extension
> This patch trims off the period at the end of executable filenames for
> C-based tests on unix. (It compiled "t/src/basic_1.c" ->
> "t/src/basic_1."; this patch makes that "t/src/basic_1")

This patch should also update languages/perl6/P6C/TestCompiler.pm, since it hijacks lib/Parrot/Test.pm to get its functionality. I'll probably apply this after the code opens up again, but if someone beats me to it, please be sure to update the affected file above.

Thanks,
Mike Lambert
Re: [perl #16820] [PATCH] Build libparrot.a with ar and ranlib
> The use of ar rcs to build a library and reconstruct the symbol table
> is non-portable. (Mac OS X, for example, doesn't support the
> 's' option to ar.)

Yes, I had noticed that, and was hoping someone more knowledgeable would help out with our build problems. :)

> The following patch changes the main makefile to use ar and ranlib, just
> as perl5 has successfully done for years.

Applied, thanks.
Mike Lambert
Re: [perl #16818] [PATCH] Build cleanup
> I discovered 'make languages' yesterday. The enclosed patch cleans up a
> lot of small nits I found in the build process. In a number of cases, the
> Makefiles were running perl scripts as
> ./script
> rather than as
> $(PERL) script
>
> A few other places called a plain 'perl' instead of $(PERL).

Thanks for cleaning these up. Applied.

> Second, Configure.pl was putting the wrong flags in to build a shared
> library. (Or, more precisely, it was apparently unconditionally using
> flags that work for GNU binutils.) I have replaced ld_shared by what I
> suspect is the appropriate perl5 Config variable. I left ld_shared_flags
> empty because I don't know what is supposed to go there, but the value
> Configure.pl used to use is definitely not right for Solaris's linker.

I applied the ld_shared_flags portion of this. When I went to get things working on win32/cygwin, I didn't know what "-Wl,-soname,libparrot$(SO)" was for, so I left it in. Taking it out, as your patch had, seems to work fine.

However, I did not apply the following:

> -ld_shared => '-shared',
> +ld_shared => $Config{lddlflags},

With that bit applied, I get the following error during "make shared":

make[1]: Leaving directory `/cygdrive/d/p/parrot-manfree/parrot/classes'
gcc -s -L/usr/local/lib -s -L/usr/local/lib -o blib/lib/libparrot.so exceptions.o ...(lots of .o files)... chartypes/usascii.o -lcrypt
/usr/lib/libcygwin.a(libcmain.o)(.text+0x6a): undefined reference to `WinMain@16'
collect2: ld returned 1 exit status

Where LD_SHARED = -shared -L/usr/local/lib

It seems that cygwin GCC does not like the -s, and would much rather prefer -shared to work properly. The Makefile was built using cygwin perl (that's why it's using cygwin GCC), so perhaps cygwin perl's $Config{lddlflags} is incorrect? Any ideas on how to resolve this?

Thanks,
Mike Lambert
Re: DOD etc
Let me ask a somewhat obvious question here. Why is deterministic destruction needed?

The most often-used example is that of objects with external resources like filehandles or network sockets. Let me take that argument for the duration of this email, but please feel free to bring up other reasons that deterministic destruction is needed.

For the most part, the programmer should be perfectly aware of when a deterministic-destruction object should be destructed. 90% of the cases involve the object being on the stack, and going out of scope. The remaining 10%, in my mind, are the ones where the programmer passes on a filehandle to some code which will do stuff with the filehandle later in the program, and it needs to hold a reference to it. This tells me that if we make an attribute stack_collected, the user could use that when they are sure they are done with the filehandle.

    {
        my $fh is stack_collected = new IO::FileHandle(..);
        print $fh whatever;
    } # $fh is collected here

The other reason for ref-counted (I think) objects is to avoid pushing certain system limits, like 64 filehandles, etc. This mirrors the situation of headers, where we have a limited number of headers, and try to avoid allocating new ones. If we are able to define a new type of precious resource, we can make the GC handle them efficiently. On allocation of a new PMC with type PRECIOUS_filehandle, we can check how many PRECIOUS_filehandles exist, and if there's no room to allocate any more, we can trigger a DOD run to attempt to free some up. This particular system would allow us to avoid over-allocating certain system resources like filehandles and network sockets, while not placing a burden upon the code that doesn't care for such precious resources.

Is there still a need for deterministic destruction, even in light of the alternative approaches mentioned above?

Thanks,
Mike Lambert
Re: [perl #16755] imcc requires Parrot_dlopen but HAS_DLOPEN is never defined
> It currently works on my version of MSVC with nmake and friends. A few
> minutes ago, it worked on cygwin/GCC as well. Unfortunately, I broke
> something, I'm not sure what, and it doesn't work on cygwin anymore. I'm
> going to sleep now, and will probably pick up again on this tomorrow
> night.

Okay, with a bit more rejiggering tonight of some defines that were being misused, I got dlopen to work on cygwin. I've also committed the revised patch, in an attempt to help force out any remaining issues before we release 0.0.8.

So currently, if one does a CVS checkout on win32, and is using cygwin or msvc, they can do:

    Configure.pl && cd languages\perl6 && make && make test

And it should proceed to properly pass all of the compiler tests, aside from 8_5 and 8_6, which are a bug with the perl6 compiler somewhere (verified by sean and leo).

Mike Lambert
Re: [perl #16755] imcc requires Parrot_dlopen but HAS_DLOPEN is never defined
> > "make shared" dies with 'missing .h files' > > More competent and/or Windows-savvy hands than mine are working on this as > we speak. I believe the proper term is stubbornly persistent. Attached is a patch to fix up parrot shared-libraries, imcc, and perl6 all to work on win32. It currently works on my version of MSVC with nmake and friends. A few minutes ago, it worked on cygwin/GCC as well. Unfortunately, I broke something, I'm not sure what, and it doesn't work on cygwin anymore. I'm going to sleep now, and will probably pick up again on this tomorrow night. Is getting perl6 working on win32 a priority for 0.0.8? I wouldn't want to commit code to fix known problems the night before 0.0.8 ships, since there are some people who are adamantly against that. This code will probably only work on unix/win32 when it is done. From what I can tell, it only worked on unix to begin with (due to shared-library/dynamic-loading usage), so I believe this is an improvement. :) Some of the steps taken in this patch could be deemed hacks. Some people are of the opinion that this is okay if it gets perl6 working on win32 for 0.0.8. I'm rather unfamiliar with how cross-platform makefile weirdness should be resolved, so I'd appreciate any advice on how to fix up some of the issues. (see imcc.y MSC_VER ifdef's, root.in's LD_SHARED_FLAGS, root.in's ${blib_lib_libparrot_a, and libparrot.def}) Any of the win32 folk out there want to try this patch and see if it resolves any issues for you? 
Thanks,
Mike Lambert

Index: config/gen/makefiles.pl
===================================================================
RCS file: /cvs/public/parrot/config/gen/makefiles.pl,v
retrieving revision 1.2
diff -u -r1.2 makefiles.pl
--- config/gen/makefiles.pl	29 Jul 2002 04:41:24 -	1.2
+++ config/gen/makefiles.pl	26 Aug 2002 08:14:15 -
@@ -17,6 +17,7 @@
     genfile('config/gen/makefiles/miniperl.in', 'languages/miniperl/Makefile');
     genfile('config/gen/makefiles/scheme.in','languages/scheme/Makefile');
     genfile('config/gen/makefiles/perl6.in', 'languages/perl6/Makefile');
+    genfile('config/gen/makefiles/imcc.in', 'languages/imcc/Makefile');
 }
 
 1;
Index: config/gen/makefiles/root.in
===================================================================
RCS file: /cvs/public/parrot/config/gen/makefiles/root.in,v
retrieving revision 1.24
diff -u -r1.24 root.in
--- config/gen/makefiles/root.in	25 Aug 2002 23:39:15 -	1.24
+++ config/gen/makefiles/root.in	26 Aug 2002 08:14:16 -
@@ -1,12 +1,13 @@
 O = ${o}
-SO = .so
-A = .a
+SO = ${so}
+A = ${a}
 RM_F = ${rm_f}
 RM_RF = ${rm_rf}
 AR_CRS = ar crs
 LD = ${ld}
 LD_SHARED = ${ld_shared}
 LD_OUT = ${ld_out}
+LD_SHARED_FLAGS=${ld_shared_flags}
 
 INC=include/parrot
 
@@ -158,10 +159,6 @@
 
 mops : examples/assembly/mops${exe} examples/mops/mops${exe}
 
-# XXX Unix-only for now
-libparrot$(A) : $(O_DIRS) $(O_FILES)
-	$(AR_CRS) $@ $(O_FILES)
-
 $(TEST_PROG) : test_main$(O) $(GEN_HEADERS) $(O_DIRS) $(O_FILES) lib/Parrot/OpLib/core.pm lib/Parrot/PMC.pm
 	$(LD) ${ld_out}$(TEST_PROG) $(LDFLAGS) $(O_FILES) test_main$(O) $(C_LIBS)
@@ -180,50 +177,60 @@
 #
 # Shared Library Targets:
 #
-# XXX This target is not portable to Win32
-#
 ###
 blib :
-	mkdir -p blib
-
-blib_lib :
-	mkdir -p blib/lib
+	-mkdir blib
 
-shared : blib_lib blib/lib/libparrot$(SO) blib/lib/libcore_prederef$(SO) $(TEST_PROG_SO)
+blib_lib : blib
+	-mkdir blib${slash}lib
 
-blib/lib/libparrot$(SO).${VERSION} : blib_lib $(O_DIRS) $(O_FILES)
-	$(LD) $(LD_SHARED) -Wl,-soname,libparrot$(SO).${MAJOR} $(LDFLAGS) $(LD_OUT)blib/lib/libparrot$(SO).${VERSION} $(O_FILES)
+shared : blib_lib blib/lib/libparrot$(SO) ${blib_lib_libparrot_a} $(TEST_PROG_SO)
 
-blib/lib/libparrot$(SO).${MAJOR}.${MINOR} : blib/lib/libparrot$(SO).${VERSION}
-	$(RM_F) $@
-	cd blib/lib; ln -s libparrot$(SO).${VERSION} libparrot$(SO).${MAJOR}.${MINOR}
-
-blib/lib/libparrot$(SO).${MAJOR} : blib/lib/libparrot$(SO).${MAJOR}.${MINOR}
-	$(RM_F) $@
-	cd blib/lib; ln -s libparrot$(SO).${MAJOR}.${MINOR} libparrot$(SO).${MAJOR}
-
-blib/lib/libparrot$(SO) : blib/lib/libparrot$(SO).${MAJOR}
-	$(RM_F) $@
-	cd blib/lib; ln -s libparrot$(SO).${MAJOR} libparrot$(SO)
-
-blib/lib/libcore_prederef$(SO).${VERSION} : blib_lib core_ops_prederef$(O)
-	$(LD) $(LD_SHARED) -Wl,-soname,libparrot$(SO).${MAJOR} $(LDFLAGS) $(LD_OUT)blib/lib/libcore_prederef$(SO).${VERSION} core_ops_prederef$(O)
+# XXX Unix-only for now
+blib/lib/libparrot$
Re: [perl #16269] [PATCH] COW...Again and Again
> On Wed, Aug 21, 2002 at 04:17:30AM -0400, Mike Lambert wrote: > > Just to complete this thread, I have committed the current version of my > > COW code, as I promised earlier this week. > > Did you try running tests with GC_DEBUG on? I get numerous failures. > Here's a patch with a couple of fixes (not all of them gc-related), > though I should warn you that it is rather roughly carved out of my > local copy, which has far too many modifications at the moment. Look at the hypocrite! He writes GC_DEBUG code to make others fix GC problems, then forgets to do the same for his own GC problems. I have another GC_DEBUG patch in the wings which should make this easier to test with if you compile with --debugging, when I get around to properly cleaning it up. > With this patch, things work for me, but I punted in one place: if you > look at unmake_COW() in string.c, I just disabled garbage collection > around the reallocation. The problem seems to be that you change where > s->bufstart points to, then call Parrot_reallocate_string() on s. But > that can trigger a collection, and it gets confused by the > inconsistent state. Hrm, yeah. When I did that in unmake_COW, it seemed like a neat hack. Unfortunately, all hacks are bad hacks with GC. :) Your patch looks good for now, although I'll have to think about a better way to solve the problem than blocking GC. > This patch also contains a debugging aid that I was speculating on > earlier. Whenever a buffer is marked as being live, it checks to see > if the buffer is on the free list. If so, it whines and complains. > (This is for finding premature killing of newborn Buffers; they'll go > through one sweep and get put on the free list, then get anchored to > the root set somehow. The next sweep will find them.) I see your clever use of a version tag to find the source of the problem. This should certainly help with buffers, and I imagine something similar can be done to PMCs as well. 
I'll take care of applying this patch, as I've other changes to submit. (See below.) > This is not 100% accurate, because the stackwalk is conservative: it > assumes that anything whose pointer is in the appropriate range and > otherwise smells right must be a live PMC or Buffer. I thought that > finding old Buffers on the stack would be rare, but I was wrong -- the > problem is that if they were buried in some deeply nested call chain, > then because stack frames are not zeroed out upon allocation (which > would be horribly slow), a later call chain will dig them back up. And > the stackwalking code itself is deep enough and has large enough stack > frames to dig up quite a bit of junk. This problem is theoretically identical to there being perfectly valid data on the stack that appears to be a pointer to a freed header. Even if we could memset(0), it would not eliminate this particular problem. > Note that this is a real bug; we really shouldn't be poking into dead > Buffers. There's no telling what the current state of decomposition > is. A seg fault might jump out and bite us, or worse, because that > pointer may have been used by something else for its own twisted > purposes. And that something else could get very upset when it returns > to find that we've jammed flags into the corpse and used its liver for > a link in the ->next_for_GC chain. I would mostly disagree about it being a bug. I originally thought it was a problem as well, until I talked it out loud on IRC once. First, free buffer/pmcs have one field you can't touch: bufstart/vtable. This first field of the structure is used as a pointer to create the freed-header linked list. However, every other bit of data in the header is part of allocated, valid memory. Parrot will dole out this memory in their current form to code requesting headers later. We can't possibly segfault by messing with the flag fields. 
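The "one untouchable field" scheme described here can be sketched in a few lines of C (struct and function names are illustrative, not Parrot's actual ones): the first pointer-sized field of a dead header doubles as the free-list link, while every other field remains valid, writable memory.

```c
#include <stddef.h>

/* Illustrative buffer header: only the first field is off-limits once
 * the header is on the free list, because it doubles as the linked-list
 * pointer. The flags remain valid, writable memory. */
typedef struct Buffer {
    void  *bufstart;   /* reused as the free-list "next" pointer */
    size_t buflen;
    int    flags;
} Buffer;

static Buffer *free_list = NULL;

/* Put a dead header on the free list: only bufstart is overwritten. */
static void add_free_buffer(Buffer *b)
{
    b->bufstart = free_list;   /* first field becomes the link */
    free_list   = b;
}

/* Hand a header back out; every field is still allocated memory, so a
 * stray stack pointer that flagged it earlier did no real harm. */
static Buffer *get_free_buffer(void)
{
    Buffer *b = free_list;
    if (b)
        free_list = b->bufstart;
    return b;
}
```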
Other than that, dead headers just sit around as part of this linked list waiting for someone to request them. So...if we find a buffer pointer on the stack, we modify its flags in buffer_lives. This is perfectly harmless. When we perform the DOD free_unused_buffers, we only add it to the free list (and modify bufstart) if neither BUFFER_on_free_list_FLAG nor BUFFER_live_FLAG is set. So stuff already on the free list will stay that way. > It is unlikely to cause problems accidentally, I suppose -- pointers > are checked to make sure they're within one of the pools, so the only > way to run into problems is to have a pointer on the stack to > somewhere within a Buffer's memory, kill off the Buffer by forgetting > about it, then have DOD add the bogus Buffer back onto the free list. > Or have the COW code chase down a bogus tail pointer. Or use a bogus > PMC instead -- then you have ->vtable->mar
Re: [perl #16269] [PATCH] COW...Again and Again
> Some final 5000 life results from my system, and a few improvements > I believe are still possible: > > Before COW: 172 seconds > After COW: 121 seconds > A 30% improvement in performance is not too bad, I suppose. > Well done Mike! Thanks! > CVS/COW with stack pointer alignment = four: 93 seconds > Above plus pre-mask for PMC/Buffer alignment = four: 90 seconds > > The first of these improvements is achieved by determining > the alignment with which pointers are actually placed on the > stack, versus PARROT_PTR_ALIGNMENT, which is the > minimum alignment permitted by the underlying system. > On an Intel x86 platform running linux, I have been unable to > persuade any pointer to live on the stack other than on a > four-byte alignment, except by placing it in a struct, and > telling the compiler to pack structs. A simple C program is > included below which illustrates this point. Jason Gloudon has also said that x86 has a four-byte pointer alignment. I seem to recall a pointer aligned to an odd value that I found in a stack walk once, but I'm unable to reproduce it in extensive fiddling with your test program. As such, it's probably worthwhile to implement such a change, although I'm not quite sure the best way to do it. Should this be a configure.pl-determined constant? Should we hardcode it to sizeof(void*)? Is this behavior guaranteed by the C spec? Can we assume it across all platforms even if it is not guaranteed? > > If you don't mind, please feel free to continue your work on parrot-grey. > The problem arises with trying to do new experimental development, > which still keeping sufficiently in sync with cvs parrot that I can do a > 'cvs update' from time to time without getting dozens of conflicts. 
> A case in point is the new 'strstart' field - grey doesn't need it, but to > leave it out would create a large number of differences between the > two versions, with code having to be changed every time somebody > writes a new reference to it - therefore if I do continue with grey, I will > probably just leave strstart in, and ignore the memory overhead. > The next item on the list for grey was paged memory allocation - this > may be usable to some extent without the buffer linked lists; so I will > probably give that a spin anyway. I think a union in the string header might do quite nicely in your case. I had the chance to look into your next/prev buffer linking code the other night. Interesting approach, but I have a few questions. :) In your collection phase, you give up header pool cache-coherency in favor of the memory pool. Your headers are organized by bufstart, essentially. Likewise, your use of the circular linked list of headers to add stuff to the front and ends of the header list as necessary is also interesting, and threw me for a loop for a little while. :) The current CVS approach is mostly cache-coherent. It iterates over ALL (not just live ones, as you do) buffers in header pools. And since the last collection, we can assume that most of the data hasn't changed (a harder assumption if we have a generational collector), and so the pool locality should follow the header locality, due to the nature of the copying. I'm not trying to argue which one is better, but merely trying to state the differences in implementations to see if I got it straight. Might I ask what your motivation was for the header linked list? I can see that it solves the problem of: set S0, some_large_data_file_contents substr S1, S0, 0, 1 #get first character as COW set S0, "" sweep collect In current CVS, the large data file is kept around, whereas in your implementation, it would only copy the single character. 
However, there is an easy way to achieve nearly the same behavior as above in the current CVS. When we copy a COW string, it's initially marked as non-COW. In the subsequent collection, we have a really large buffer with a strstart and bufused that are quite small in total usage. If we only copy necessary data for non-COW strings, then the second sweep performed would eliminate the wasted memory copy. Not quite as fast in eliminating the memory usage as the above solution, but since we are guaranteed of collections happening throughout the lifetime of any program that does something with strings, I think it's an okay tradeoff. Were there any other reasons for implementing the above linked list technique that I missed? Thanks, Mike Lambert
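The "copy only the necessary data" idea might look roughly like this (a sketch with made-up field names, not the actual resources.c code): during a copy-collection, a non-COW string contributes only its strstart..strstart+bufused window to the new pool, so slack on either side of the live data is reclaimed for free.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative string header fields, not Parrot's real layout. */
typedef struct String {
    char  *bufstart;  /* start of the allocated block */
    size_t buflen;    /* allocated size in bytes */
    char  *strstart;  /* first byte actually in use */
    size_t bufused;   /* bytes in use starting at strstart */
} String;

/* Copy only the live window into the new pool; the slack before
 * strstart and after strstart + bufused is dropped on the floor. */
static void compact_string(String *s, char **pool_top)
{
    memcpy(*pool_top, s->strstart, s->bufused);
    s->bufstart = *pool_top;
    s->strstart = *pool_top;
    s->buflen   = s->bufused;
    *pool_top  += s->bufused;
}
```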
Re: Possible bug in new string COW code
> Reading through the latest version of string.c, pondering the > best way to integrate the grey and (what colour is cvs parrot?) > versions, I came across the following line in unmake_COW: > s->buflen = s->strlen; > which got me a little confused - I seem to recall buflen as being > in bytes, and strlen as being in encoding-defined characters. > Did something change when I wasn't looking, or is this a bug > just waiting for somebody to actually implement Unicode? Yep, you're right, that's definitely a bug waiting for unicode. My intention there was to only copy as much data as was needed when we uncowify a buffer. I believe changing strlen to bufused is the proper fix, and have committed said change. Thanks, Mike Lambert
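A minimal sketch of the committed fix (illustrative field names; the real code lives in unmake_COW in string.c): the copy must be sized in bytes (bufused), not in encoding-defined characters (strlen), or any multi-byte encoding would get truncated.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative string header, not Parrot's real layout. */
typedef struct String {
    void  *bufstart;
    size_t buflen;   /* bytes allocated */
    size_t bufused;  /* bytes in use */
    size_t strlen;   /* characters -- NOT bytes once encodings vary */
} String;

/* De-COW by copying bufused *bytes*; sizing the copy with strlen
 * would undercount for any multi-byte encoding. */
static void unmake_cow(String *s)
{
    void *copy = malloc(s->bufused);
    memcpy(copy, s->bufstart, s->bufused);
    s->bufstart = copy;
    s->buflen   = s->bufused;   /* the committed fix: bufused, not strlen */
}
```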
Re: DOD etc
> In this Brave New World of DOD and GCC, what guarantees (if any) > will we be making at the Perl 6 language level for the timely calling of > object destructors at scope exit? From the straight GC perspective, none. There might be workarounds at higher levels, however. > ie the classic > > { my $fh = IO::File->new(...); } > > I know there's been lots of discussion on this over the months, > but I'm slightly confused as to what the outcome was. I'm not sure if there ever was a consensus. A few ideas that I recall being brought up were: a) allow the ability to force a DOD run at block exit. This would emulate perl5 behavior, and would be necessary when porting perl5 code with DESTROY methods. I can imagine having a "block-exit-var-in-scope" flag somewhere, that's set when we create a magic filehandle var, and possibly unset with each dod run if the variable goes out of scope. When this flag is set, the interpreter can force a DOD on some block_exit() opcode, or whatever the interface. b) We can make a special property for these variables: my $fh is stack_freed = IO::File->new(...); When this variable's stack frame goes out of scope, it automatically has its destructor called, regardless of other references, since it can't detect them. It would leave the actual PMC header as "live" until the next DOD pass, when it would be truly freed. If the next DOD pass finds it alive, it could barf. This isn't entirely safe, but it does offer the best performance, I think. c) similar to b, but more programmer-directed. I believe .NET has two concepts of destruction. IO filehandles can have an active destroy method called directly on them with 'delete someobject', leaving the actual memory hanging around until the next GC (dod) run, at which point it really deletes the header. 
.NET improves upon Java's inability to give timely destructors to objects, by allowing the user to manually delete things when they know there are no other references to things that need to free resources. Other than the above, I'm not sure what other methods could be used to force destruction. And I'm not sure if a decree has been made about what Perl6 will do. Mike Lambert
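Option (a) is easy to sketch (all names here are hypothetical, not a committed interface): a flag on the interpreter makes block exits force a DOD run only when a timely-destroy object was created, so ordinary blocks pay nothing.

```c
#include <stdbool.h>

/* Illustrative interpreter state for option (a) above. */
typedef struct Interp {
    bool needs_timely_dod;  /* set when a "needs DESTROY at scope exit"
                               object is created */
    int  dod_runs;
} Interp;

static void do_dod_run(Interp *i) { i->dod_runs++; }

/* Creating a magic var (e.g. a filehandle) sets the flag... */
static void new_timely_object(Interp *i) { i->needs_timely_dod = true; }

/* ...and block exit only pays for a DOD run when the flag is set,
 * so blocks without such variables stay cheap. */
static void block_exit(Interp *i)
{
    if (i->needs_timely_dod) {
        do_dod_run(i);
        i->needs_timely_dod = false;  /* cleared once nothing is in scope */
    }
}
```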
Re: [perl #16269] [PATCH] COW...Again and Again
Just to complete this thread, I have committed the current version of my COW code, as I promised earlier this week. Below is my response to Peter's most recent email. > > Note that the comparison against parrot-grey is not > > exactly fair, because it didn't use system stackwalking. > Note that I have only commented out the call to the stackwalk > function - for COW benchmarking purposes you could always > reinstate it. But that is beside the point now - your COW has > been fixed, and the benchmarks confirm that gc_generations > is equally unfriendly to all cows. There will always be programs > that don't benefit and therefore only get the overhead - but in > typical perl usage, I would expect that the majority of programs > will benefit significantly, for example regex capture will be able > to use COWed substrings. Yeah, regex capture should benefit *big* from COW. It also technically helps make strings act more perl5-like, where you may easily chop characters off the front and end of the string without reallocation. We could even have the non-COW copy collection use strstart and strlen to compact memory usage, giving the best of both worlds for those kinds of applications. > This should finally bring about the demise of grey, as I don't > believe there is room for two totally different implementations > of COW, and my buffer linked list, which is already expensive, > gets absurdly so with the addition of strstart also. This saddens me, and I hope it's not a permanent death. Grey was a very good sanity check for me, at least. It caused us to get a 20% performance improvement in stackwalking (I think), motivated me to improve parrot's cow abilities and performance, and was in general a good wake-up call that some of our decisions were having a negative impact, and should be reconciled. 
In reality, all that differs between grey's COW and mine is that mine allows for COWing of substrings with constant strings, and has a modified string.c interface that improves clarity, imo. Fundamentally, it's the original COW you provided a long time ago. I'd hate to make you discontinue your side project because I committed a different implementation of COW that wasn't directly compatible. If you don't mind, please feel free to continue your work on parrot-grey. I'd love to see the other ideas you had mentioned in your previous emails that hadn't yet made it to grey, as some of them didn't sound entirely illegal. You said that parrot-grey was a fun project to play around with performance numbers, and I'd hate to be the reason you stopped having fun. :) Thanks, Mike Lambert
Re: [perl #16274] [PATCH] Keyed access
Tom Hughes wrote: > Index: basicvar.pasm > === ... > Index: instructions.pasm > === ... Fixes the bug, and wumpus plays yet again. Applied, thanks. Mike Lambert
Re: GC generation?
> At 6:16 PM -0400 8/20/02, John Porter wrote: > >Dan Sugalski wrote: > >> I expect a UINTVAL should be sufficient to hold the counter. > > > >Why? Because you don't expect a perl process to run longer > >than a couple hours? Or because rollover won't matter? > > Rollover won't really matter much, if we're careful with how we > document things. Still, a UINTVAL should be at least 2^32--do you > really think we'll have that many GC generations in a few hours? Currently, 5000 iterations of life execute in 6 seconds, with 42K DOD runs. At that rate, we have a rollover every week. Not really a problem, but if we have code which doesn't allow for rollover, it is a problem. I can see using the generations value to handle code that is dependent upon things "changing". However, as Steve mentioned, it's probably easiest and fastest to just always re-dereference the bufstart. It might be useful to specify the generation within a pool, with the assumption that the GC would track it and promote it to a different generational pool before an overflow occurs. But it'd make more sense to use a byte/short for this, and reset it to 0 with each promotion. (Or in the case of the final generation, ignore rollover.) A dod generation count doesn't buy us much. Because we don't track inter-pool pointers, we need to do a full dod every time we need to determine the root set. However, copy collection can be localized to a given pool, and as long as we copy every header into that pool, we can avoid copying a lot of data. If we have more DOD's than collections, it would make sense to just iterate over the header list with each collection to search for pool pointers, and hope the generational overhead is outweighed by the ability to avoid re-copying stuff. This will probably be more apparent with real-world programs than test programs that can keep every bit of memory in the cache. 
If collections are more frequent than DODs, we might be able to set up lists of pointers on a DOD run, organized by generational pool, and just use those during collection. That's effectively one additional pointer per header, however. And there are better uses for such space. Finally, it's possible to do wholesale generational promotion of an entire pool, and avoid a generations count altogether. I forget the details exactly, but it involves separating each generation into two pools, and storing the generation count in the pool itself. It introduces some error into the promotion rates (some stuff is promoted too early, some too late), but it avoids the extra generational count. So in conclusion, generational systems can be done using at most a byte or a short, and it's even possible to do them with nothing at all. So until the need arises, I don't think the generations count would be worth it. Especially since I plan to try and prove the need for a header pool pointer at some point. :) Mike Lambert
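The byte-sized, reset-on-promotion counter suggested above could look like this (illustrative types and thresholds, not a real Parrot design): the counter can never overflow because promotion zeroes it, and the final pool simply stops counting.

```c
#include <stdint.h>

/* Sketch: a one-byte generation count per header, reset to zero
 * whenever the header is promoted to an older pool. */
typedef struct Header {
    uint8_t generation;
    int     pool;        /* 0 = youngest */
} Header;

enum { PROMOTE_AFTER = 3, LAST_POOL = 2 };

/* Called once per DOD pass for each surviving header. */
static void survive_dod(Header *h)
{
    if (h->pool == LAST_POOL) {
        /* final generation: rollover is harmless, so just ignore it */
        return;
    }
    if (++h->generation >= PROMOTE_AFTER) {
        h->pool++;           /* promote to the next generational pool */
        h->generation = 0;   /* reset before the byte can ever overflow */
    }
}
```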
Re: [perl #16274] [PATCH] Keyed access
> I have a clean version that's up to date, and as everybody seems to > be happy with it I'm going to go ahead and commit it now. Ah-ha! I found a showstopper! Oh, it's a little late for that, isn't it? :) Anyways, cd to languages/BASIC, run basic.pl, type "LOAD wumpus", and watch it die on "Not a string!". It could be that basic is using keys in weird ways, or it could be that the key patch is borked...I haven't looked into it enough to determine the true cause here. Thanks, Mike Lambert
Re: [perl #16308] [PATCH] logical right shift
> This adds logical shift right opcodes. They are essential for bit shifting > negative values without sign extension getting in the way. Applied, thanks. Mike Lambert
Re: [perl #16300] [BUG] hash clone hangs
> recent changes in hash.c seem to hang[1] hash_clone. > This patch works, but I don't know if it is the correct way to solve > the problem. Even if it is the correct way to solve the problem (which I don't know), it uses C++-style comments which are a no-no for Parrot's C target. Secondly, can you please turn off strip-trailing-whitespace in your editor? Your patches are reflecting the stripped spaces, which makes it hard to discern intentional changes from accidental ones. Thanks, Mike Lambert
Re: [perl #16269] [PATCH] COW...Again and Again
asm at label getout, so the reported > active buffers and memory use are as accurate as we can > make them. Which makes me think of something - grey is Yes, this solved some of the problem. Adding a sweep&collect dropped the active header usage down to reasonable levels, which helped me realize that I wasn't actually allocating any additional headers due to COW...the GC was just inefficient in its management of them. > ignoring reclaimable to get the size to allocate for the > post-compaction pool, therefore the memory usage is always > going to be higher than is actually needed - are we simply > looking at excess allocation here, rather than excess usage? > If so, grey will fix it in the next release with paged memory > allocation; and I'm sure you'll think of a solution also. That's also a distinct possibility. The current COW implementation ignores reclaimable altogether, piecing together a proper total_size using the code I posted in a previous email. Anyways, from my current benchmarks, a lot of my worries about COW have been nullified. BASIC wumpus loading now takes place in 1/3 the time, and the worst case is a 20% hit on gc_generations. And that test exclusively uses "repeat" to create lots of strings of varying lifetimes, so it's unreasonable to expect any better performance on it. So, now that the major objections to the previous patch have been addressed, does anyone have any reasons against this patch going in? 
Thanks, Mike Lambert Index: core.ops === RCS file: /cvs/public/parrot/core.ops,v retrieving revision 1.199 diff -u -r1.199 core.ops --- core.ops18 Aug 2002 23:57:37 - 1.199 +++ core.ops19 Aug 2002 02:28:35 - @@ -166,9 +166,9 @@ } $1 = string_make(interpreter, NULL, 65535, NULL, 0, NULL); - memset(($1)->bufstart, 0, 65535); - fgets(($1)->bufstart, 65534, file); - ($1)->strlen = ($1)->bufused = strlen(($1)->bufstart); + memset(($1)->strstart, 0, 65535); + fgets(($1)->strstart, 65534, file); + ($1)->strlen = ($1)->bufused = strlen(($1)->strstart); goto NEXT(); } @@ -354,7 +354,7 @@ UINTVAL len = $3; s = string_make(interpreter, NULL, len, NULL, 0, NULL); - read($2, s->bufstart, len); + read($2, s->strstart, len); s->bufused = len; $1 = s; goto NEXT(); @@ -418,7 +418,7 @@ op write(in INT, in STR) { STRING * s = $2; UINTVAL count = string_length(s); - write($1, s->bufstart, count); + write($1, s->strstart, count); goto NEXT(); } @@ -2256,7 +2256,7 @@ t = string_make(interpreter, buf, (UINTVAL)(len - s->buflen), NULL, 0, NULL); $1 = string_concat(interpreter, $1, s, 1); } else { -t = string_make(interpreter, s->bufstart, (UINTVAL)len, NULL, 0, NULL); +t = string_make(interpreter, s->strstart, (UINTVAL)len, NULL, 0, NULL); } $1 = string_concat(interpreter, $1, t, 1); @@ -2281,7 +2281,7 @@ } /* XXX this is EVIL, use string_replace */ -n = $1->bufstart; +n = $1->strstart; t = string_to_cstring(interpreter, s); for (i = $4; i < $4 + $2; i++) n[i] = t[i - $4]; @@ -3891,7 +3891,7 @@ switch ($3) { case STRINGINFO_HEADER: $1 = PTR2UINTVAL($2); break; -case STRINGINFO_BUFSTART: $1 = PTR2UINTVAL($2->bufstart); +case STRINGINFO_STRSTART: $1 = PTR2UINTVAL($2->strstart); break; case STRINGINFO_BUFLEN: $1 = $2->buflen; break; @@ -4162,13 +4162,13 @@ void (*func)(void); string_to_cstring(interpreter, ($2)); string_to_cstring(interpreter, ($1)); - p = Parrot_dlopen($1->bufstart); + p = Parrot_dlopen($1->strstart); if(p == NULL) { const char * err = Parrot_dlerror(); fprintf(stderr, 
"%s\n", err); PANIC("Failed to load native library"); } - func = D2FPTR(Parrot_dlsym(p, $2->bufstart)); + func = D2FPTR(Parrot_dlsym(p, $2->strstart)); if (NULL == func) { PANIC("Failed to find symbol in native library"); } Index: debug.c === RCS file: /cvs/public/parrot/debug.c,v retrieving revision 1.25 diff -u -r1.25 debug.c --- debug.c 18 Aug 2002 23:57:37 - 1.25 +++ debug.c 19 Aug 2002 02:28:37 - @@ -692,7 +692,7 @@ constants[pc[j]]->string->strlen) { escaped = PDB_escape(interpreter->code->const_table-> - constants[pc[j]]->str
Re: [perl #16283] parrot dandruff
> > > Tru64 finds the following objectionable spots from a fresh CVS checkout: > > > > Does this patch fix it? (Though even if it does, I wouldn't be at all > > surprised if some other compiler choked on it.) > > Works okay in Tru64 and IRIX which are known for their pointer pickiness. Applied, thanks. > On IRIX, though, I get these, where probably NO_STACK_ENTRY_TYPE is > meant instead. Applied as well. Mike Lambert
Re: [perl #15907] [PATCH] Make warnings configurable
> In the quest for removing warnings, I added an option --ccwarn to > Configure.pl. With this option I could selectively turn on and off > warnings, and especially compile with -Werror, so I don't miss any > warnings. The simple warnings (the missing return values) were already > fixed before I was able to submit a patch. Looking at the patch, it seems rather GCC-specific. The checking for "no-X" versus "X" in the warnings flags seems to be rather non-portable to compilers like MSVC. Unfortunately, I don't believe this is easily fixable. Mike Lambert
Re: [perl #16048] [PATCH] Eliminate alignment warning in packfile.c
> The following patch eliminates an alignment warning in packfile.c, and > adds a comment to packfile.h about alignment assumptions underlying the > size of the packfile header. Applied, thanks. > I wonder if we ought to have a Configure "sanity section" wherein various > assumptions are tested prior to build time. Two candidates for such a > section would be > > sizeof(INTVAL) >= sizeof(void *) > PACKFILE_HEADER_BYTES % sizeof(opcode_t) == 0 > > I'm sure there are other assumptions too. Anyone else have any ideas on the best place to put the above? Configure currently doesn't know about PACKFILE_HEADER_BYTES, since it's a macro in packfile.h. We could check them in Parrot's initialization, but I don't know if that's a good idea. We could create a C file which contains the above assumptions as asserts, and includes parrot.h. Then the main() function could assert on all of the necessary conditions. Configure would compile and run this program to ensure correctness. Thoughts? Anyone want to take a crack at it? Mike Lambert
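A sketch of the proposed sanity-check program (the typedefs and macro here are stand-ins so the example is self-contained; a real version would include parrot.h and pick up the actual INTVAL, opcode_t, and PACKFILE_HEADER_BYTES from the build):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the real Parrot definitions; Configure would supply
 * the actual types and the real PACKFILE_HEADER_BYTES macro. */
typedef intptr_t INTVAL;
typedef int32_t  opcode_t;
#define PACKFILE_HEADER_BYTES 16

/* Configure compiles and runs this; an assert abort (or non-zero
 * exit) means the platform violates a build-time assumption. */
int check_assumptions(void)
{
    assert(sizeof(INTVAL) >= sizeof(void *));
    assert(PACKFILE_HEADER_BYTES % sizeof(opcode_t) == 0);
    return 0;
}
```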
Re: [perl #16269] [PATCH] COW...Again and Again
o CVS levels, with memory usage lower, but still higher than CVS. It seems that while COW might save memory due to sharing, it also breaks our when-to-collect logic, upsetting the balance of collection frequency and new-block-size, leading to an apparent memory usage increase. I can't really think of any other cause. Personally, I find that COW logic makes things a bit more complex, and somewhat harder to debug. And it certainly requires some more discipline to be sure you copy data before modifying it, etc. So while I've been pushing for COW for a long time, if it turns out to be horribly broken in memory usage, I'm going to have to sideline my work on it and continue with other stuff. :| Thanks, Mike Lambert
Re: [perl #16269] [PATCH] COW...Again and Again
> Elapsed times for 'time parrot hanoi.pbc 14 > /dev/null' are: > CVS: 52.81, 52.05, 52.33 > CVS + grey COW: 51.53, 52.06, 51.67 > CVS + Mike's COW: 44.31, 44.48, 44.55 > CVS + grey1: 35.89, 36.48, 36.60 (+COW +cyclecount -stackwalk) > End June grey: 30.14, 29.35, 29.53 > > And 5000 generations of life tested again: > CVS: 170.22, 169.01, 168.70 > CVS + grey COW: 162.65, 161.44, 163.61 > CVS + Mike's COW: 156.86, 157.78, 157.67 > CVS + grey1: 80.38, 80.74, 80.69 > End June grey: 59.21, 59.41, 59.42 > CVS 14th July: 81.22, 81.17 (last timings I recorded before stack walking) > > So I get an improvement on Hanoi of about 15% using your > COW patch, and your COW is better on both tests than mine. Wow, that's cool, if strange. COW+Hanoi was definitely slower for me. I have another interesting test to try. Run languages/basic/basic.pl. Type "LOAD WUMPUS", and hit return. Type "RUN", and hit return. Type "N" and hit return. It then builds a wumpus maze and does other intensive stuff in the basic interpreter. The patch I sent out runs 4x slower than CVS on the above test. It has a peak memory usage of 22MB, versus straight CVS's 2MB, and your parrot-grey's usage of 6MB. (I didn't compare parrot-grey speed because it wouldn't be fair. ;) My current theory is that it is due to its conservative increase of reclaimable (only if it's guaranteed to be reclaimable), versus grey's always-increment-reclaimable. Technically, mine is correct, since in theory you could make a bunch of COW strings, free them all, have reclaimable be quite large, and have the total_size calculation in resources.c come out negative. :) One idea I had was that because I don't have an accurate reclaimable figure, the asymptotic behavior of the collection pool size was growing each collection due to the under-estimating of reclaimable. Changing this to use a non-increasing number calculated from the pool's free space, brought my usage down to 6MB, closer, but not nearly there. 
In addition, my COW code was getting roughly 4x slower times after I hit "N". I'm assuming it's related to the memory usage. I was able to get it down to 1.5x the straight CVS code when I dropped the memory usage, and I'd like to hope we could make it drop even further by fixing this memory leak. The fact that both COWs require roughly 3x *more* memory is quite surprising. If you (or anyone else) feels like attempting to figure out where our memory is going, I would greatly appreciate it. I'm stumped over here, and am getting frustrated. (That's why I'm writing this email and going to sleep ;) Thanks, Mike Lambert
Re: [perl #16274] [PATCH] Keyed access
> Attached is my first cut of a patch to address the keyed access issues. First, thanks for spending the time to implement and clean up the keyed code. Hopefully this'll clean the floor so that when this list has key discussions, we'll all be arguing about the same thing. :) > This patch doesn't do everything, but it does bring things more or less > in line with Dan's recent specification I hope. I'm sure there are also > problems with it so if we can get some eyeballs on it before I commit it > that would be good. Here are the things I noticed when going through your patch. - assemble.pl: shouldn't the code : elsif ($_->[0] =~ /^([snpk])c$/) { # String/Num/PMC/Key constant include support for "kic" somewhere? The magic numbers in _key_constant, I'm assuming, are supposed to map to the constants in key.h? Perhaps a note mentioning that correspondence would be useful. Also, it seems the number usage is broken. You use 1,1,1,2,4,7. Shouldn't it be 1,1,1,2,3,5? And shouldn't s/inps/ be s/insp/? Or maybe the constants in key.h need rearranging? - dod.c: Near the comment, "Mark the key constants as live". Constants shouldn't need to be marked live, as constants are prevented from being GC'ed, if PMC_constant_FLAG is set. At least, in theory. Did it not work for you? - core.ops: Looking at the set functions, shouldn't the "Px[ KEY ] = Bx" set of functions have $1 defined as inout instead of out in most circumstances? In your definition of the groups of set functions, can you change "Ax = Px[ KEY ]" to "Ax = Px[ INTKEY ]" where appropriate? - key.pmc: the mark() function needs to return a value. Namely, the return value of key_mark. - random: Your use of registers for key atom values is interesting, and I think it will create problems. It's not a problem with your patch as much as a problem with an aspect of the key design, I think. The plan is to allow parrot functions to implement vtable methods in parrot. 
If I have a key [I0,I1], and pass it to this vtable method, it could be passed to a function implemented in parrot, with all of parrot's calling conventions. This means that by the time it gets to the person implementing the key, it's extremely possible that the registers have been overwritten. I'm not sure how to resolve this one. Alternatives are: a) don't allow register references in keys. Instead, force people to use the key modification ops to reset the key to the correct values each time they want to use it. b) handle auto-generated .ops files, such that if they receive a KEY as a parameter, it calls key_fixup_registers, which grabs the current values of the registers and sticks them into the key structure. This could cause problems with constant keys, so you might need to create a key copy. c) any other ideas? Or should we mark this as a 'known limitation'? Overall, tho, the patch looks extremely complete. Tracing support, disassemble.pl support, debug.c support, etc. You even reduced macro usage. Rather impressive. :) Thanks, Mike Lambert
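Option (b) might look something like this (all structs and names here are hypothetical; the real key structures live in key.h/key.pmc): snapshot the current register values into a constant-only copy of the key before it crosses a boundary that can clobber registers.

```c
#include <stdlib.h>

/* Illustrative key atoms: either an integer constant or a reference
 * to an integer register. */
enum atom_type { ATOM_INT_CONST, ATOM_INT_REG };

typedef struct KeyAtom {
    enum atom_type type;
    long           value;   /* the constant, or the register number */
} KeyAtom;

/* Resolve register references against the current register file,
 * returning a constant-only copy that survives register reuse.
 * The original (possibly constant) key is left untouched. */
static KeyAtom *key_fixup_registers(const KeyAtom *key, size_t n,
                                    const long *int_regs)
{
    KeyAtom *copy = malloc(n * sizeof *copy);
    size_t i;
    if (!copy)
        return NULL;
    for (i = 0; i < n; i++) {
        copy[i].type  = ATOM_INT_CONST;
        copy[i].value = (key[i].type == ATOM_INT_REG)
                            ? int_regs[key[i].value]  /* snapshot I-reg */
                            : key[i].value;
    }
    return copy;
}
```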
Re: [perl #16219] [PATCH] stack direction probe
Applied, thanks. Mike Lambert > This is a config test for the direction of stack growth that makes > the direction a compile time constant. > > -- > Jason
Re: [perl #16278] [PATCH] Quicker pointer checking for stack walking
Applied, thanks. > Moved the static prototype to dod.c > > Jason
Stack Walk Speedups?
As Peter has pointed out, our stackwalk code is rather slow. The code
that's in there was my first attempt at stack walking code. There's one
optimization in place, but the algorithm behind the optimization could
use some work.

Basically, it finds the min and max values of all headers. It does a
check (for quick failure purposes) to see if the data on the stack is
in that range, and then proceeds to do the accurate check. The accurate
check consists of walking through each header pool in an attempt to see
if this pointer-sized data could be interpreted as being in that pool.
Currently, this is a linear walk over the header pools.

I imagine there are many better algorithms for determining a root set
from a stack. The Boehm collector probably has decent code in this
regard. However, given that we have O(N) with the size of the stack,
I'm not sure how we'll be able to alleviate this in the long run.

Anyone feeling adventuresome and want to attempt to speed this up? It
should be an easy introduction to the GC code in general. Just start
out in trace_system_stack, and work your way down.

Mike Lambert
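For anyone taking this up, the two-phase check described above boils
down to something like the following sketch. The Pool layout and
function names are hypothetical (the real code lives in dod.c and
friends); this just shows the global min/max fast-reject followed by
the linear walk that wants speeding up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool descriptor: one contiguous arena of fixed-size
 * headers per pool. */
typedef struct Pool {
    char  *start;        /* first byte of the arena */
    size_t object_size;  /* size of each header in this pool */
    size_t total;        /* number of headers in the arena */
} Pool;

/* Accurate check: is ptr a valid header address inside this pool? */
static int pool_contains(const Pool *p, const void *ptr)
{
    ptrdiff_t off = (const char *)ptr - p->start;
    if (off < 0 || (size_t)off >= p->object_size * p->total)
        return 0;
    /* must land exactly on a header boundary */
    return off % p->object_size == 0;
}

/* Quick-failure range check over all pools, then the O(npools)
 * linear walk the text complains about. */
static const Pool *find_pool(const Pool *pools, size_t npools,
                             const char *min, const char *max,
                             const void *ptr)
{
    size_t i;
    if ((const char *)ptr < min || (const char *)ptr >= max)
        return NULL;                 /* fast reject */
    for (i = 0; i < npools; i++)     /* linear accurate check */
        if (pool_contains(&pools[i], ptr))
            return &pools[i];
    return NULL;
}
```

An obvious improvement would be to keep the pools sorted by start
address and binary-search instead of the linear loop, which is roughly
what the Boehm collector does with its block table.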
Re: [INFO] The first pirate parrot takes to the air
Peter Gibbs wrote:
> > How much of the speed win is from the cycle count instead of stack
> > walking? Unless you've solved the problem of recursive interpreter
> > calls and setjmp, it's not a valid solution, no matter what the
> > speed win might be.
> According to my notes the progression (for 5000 lives) was:
> CVS: 172 seconds
> Cycle count instead of stack walk: 97 seconds
> COW with stack walk: 158 seconds
> Cycle count + COW: 81 seconds

Just for fun, can you run Hanoi on CVS versus CVS+COW? I just got COW
implemented here, and while I get a 17% speedup on life, I get a 5%
loss on hanoi. Since you only posted life, it's a bit hard to see if
the drop on hanoi is just my fault, or the fault of COW in general.
(More benchmarks will appear in my soon-to-be-sent COW patch email.)

Thanks,
Mike Lambert
Re: [INFO] The first pirate parrot takes to the air
> For purely academic purposes, I have re-synchronised some of my
> forbidden code with the latest CVS version of Parrot. All tests pass
> without gc debug; and with gc_debug plus Steve Fink's patch.
> Benchmarks follow, on a 166MHz Pentium running linux 2.2.18.
>
>                            Parrot        African Grey
> life (5000 generations)    172 seconds   81 seconds
> reverse /dev/null          193 seconds   130 seconds
> hanoi 14 >/dev/null        51 seconds    37 seconds

Rather impressive. Except that it makes me look bad. :)

> The differences between the two versions are:
> 1) Use of the interpreter cycle-counter instead of stack walking.
> 2) Linked lists of buffer headers sorted by bufstart
> 3) COW-supporting code in GC (for all buffer objects)
> 4) Implementation of COW for string_copy and string_substr

1) Yeah, the cycle-counter approach is a nice one. I also had a similar
solution involving neonate flag usage, somewhere in the archives. Both
have *significant* speed advantages over the current codebase's stack
walking. I tried to convince Dan of their merit, but they failed for
various reasons:

Your solution (ignoring the extra cycle-counter byte for now) cannot
handle vtable methods implemented in Parrot. The current system to
implement this involves the interpreter recursively calling runops_core
to handle the vtable method. If you increment cycles in the inner loop,
you risk pre-collection of stuff on the stack of the code calling the
vtable method. If you don't increment cycles, you prevent any of the
memory allocated inside the vtable method from ever being collected
during the method's execution... bad stuff when your vtable methods are
multiplying gigantic matrices or somesuch.

My neonate-buffers solution fails only in the presence of longjmp.

Granted, we don't do any of this yet, so these solutions will mop the
floor with my current stackwalk code, and pass tests to boot. But the
planned introduction of these requirements is the reason for making
these solutions 'forbidden'.
One of Nick's solutions was to fall back on stack-walking to handle the
cases where our faster solutions fail. I can definitely see that
working with neonate buffers to handle the fixups needed after a
longjmp call. But it doesn't seem as attractive in the presence of your
solution, which would require stackwalking for all re-entrant runops
calls. Do you have another solution in mind for handling re-entrant
runops calls?

As far as the extra byte in the buffer, I don't mind that one at all.
There are a lot of restrictions on the GC code in the interest of
making stuff "lightweight". Unfortunately, GC code takes a significant
portion of the execution time in any realistic application. Hopefully
we can convince Dan to allow extra fields in the buffers in the
interest of speed, but I don't think we can reduce parrot/perl6's
feature set in the interest of speed... might as well use C if that's
what you want. :)

2) Currently, we use linked lists of buffer headers for freeing and
allocating headers. I'm not sure what you mean by saying that they are
sorted by bufstart? What does this buy over the current system?

3) Definitely a good one. I've been trying to merge your original COW
patch into my code here. Without GC_DEBUG, it fails one test. With
GC_DEBUG, it fails the traditional set plus that one test. The test
case is rather large, unfortunately; I haven't been able to narrow down
the problem further or I'd have committed it.

4) Isn't this really the same thing as item 3? I'm basing my knowledge
on your old COW patches. Has additional work been done on the string
function integration since then, or do #3 and #4 both come from those
patches?

> Some of the changes I made before the memory management
> code was totally reorganised have not yet been re-integrated.
> My last version prior to that reorganisation ran 5000 lives in
> 61 seconds, and I hope to get back to somewhere close to
> that again.

I'm not sure how much of the new code you've merged with.
Which of the new files are you planning to integrate/merge with, and
which have you thrown out in favor of older versions? I'm specifically
referring to any of resources/dod/smallobject/headers.c.

Regardless of whether or not it goes in, I'd be interested in seeing a
patch. I can work on integrating a lot of your non-forbidden code into
the current codebase.

Thanks for spending the time to generate these numbers... they're a
nice eye-opener on what can be done without the current restrictions.
Hopefully they'll allow us to reconsider each restriction in the
context of the speed of our GC.

Mike Lambert
Re: [COMMIT] GC_DEBUG, Some GC Fixes, and Remaining GC Bugs
> Somebody gimme a cookie.

/me hands Steve a cookie.

> If the rx info object is going away, then obviously those parts of
> the patch need not be applied. But in the meantime, it's nice to have
> a Parrot that doesn't crash.

I agree. My disclaimer about the regex code in my original email was to
suggest that we didn't need to focus on the rx issues, but if you've
already done it... :)

> I'm not going to apply this patch yet because I'm sure someone will
> disagree with how it fixes some or all of these bugs. So would that
> someone please speak up? Thanks.

I suppose that someone is me, although there might be other someones.

> In summary, this patch
>
> - Adds an OUT parameter to new_hash() so the hash is anchored to the
>   root set while it is being constructed.
> - Adds an OUT parameter to hash_clone() for early anchoring.
> - Adds an OUT parameter to rx_allocate_info() for early anchoring.
> - Briefly disables DOD while a stack is being created so allocating
>   the contents of the stack buffer doesn't destroy the unanchored
>   buffer header.

These are needed for now. However, when we get that buffer/PMC
unification, we should be able to make mark() methods in the header
pools. Then, with support for non-PMC-wrapped buffers, we can find
references to them on the system stack and call their mark() method
directly, avoiding the above hoops. At least, that's my hope. Is it
possible to mark the above code with some XXX tag so that we can
re-address it when we get the unification in place?

> - Makes a major change to the Pointer PMC: the previously unused
>   ->cache area is now used to hold a pointer to a custom mark routine
>   that will get fired during PMC traversal. Previously, Pointers had
>   the PMC_private_GC_FLAG set, but nothing ever looked at it.
>   With this change, Pointers behave as they always did unless
>   something externally sets the ->cache.struct_val field (in other
>   words, there is no vtable entry for setting the mark routine, and
>   the PMC's custom mark routine does nothing if that field is NULL.)
>
> - Reorders the rx_allocinfo opcode to assign things in the correct
>   order and fill in the ->cache.struct_val field of the Pointer PMC
>   it creates.

These are a bit hackish, but I agree they are needed to solve our
GC_DEBUG problems (and by extension, "real-world Parrot programs" ;).
Both of these should also be able to "go away" with the unification, so
see the previous paragraph. :)

I think I'm going to make GC_DEBUG a parameter of the interpreter, and
allow it to be turned on/off via opcodes. Then we could force our test
suite to use GC_DEBUG to root out GC problems a lot sooner than they
otherwise would be. Fixing all GC_DEBUG problems would help allow this
kind of testing to be part of the standard test suite.

> - In interpreter.c, asserts that a few of the early buffer creations
>   do not return the same buffer (provides early warning of GC
>   mischief)

Oooh, nice! :)

The rest of the things you listed, which I didn't comment on, are, imo,
perfectly fine. In conclusion, I don't have any objections to this
patch, although it would be nice if "XXX Unification" markers were
included in places that need to be addressed later.

Mike Lambert
Re: [COMMIT] GC_DEBUG, Some GC Fixes, and Remaining GC Bugs
> > Anyone more well-versed in these departments than I care to take a
> > look at the potential problems? Just change GC_DEBUG in parrot.h,
> > and you can be on your way. :)
>
> I can't get to it because parrot doesn't survive past initialization
> for me. When it creates the Array PMC for userargv, it allocates the
> PMC first and then the buffer for the array's data. During the
> buffer's creation, it does a collection that wipes out the PMC. My
> lo_var_ptr and hi_var_ptr are set to reasonable-sounding values at
> the top of trace_system_stack(), but I haven't been able to track it
> farther yet. Oh, and I do have your recent patch to set
> interpreter->lo_var_ptr early.
>
> The userargv PMC is not anchored other than in the C stack, because
> it dies in the pmc_new() creation process before the assignment to P0
> can run.

Weird. I had to move the lo_var_ptr initialization code to runcode
instead of runops, in order to avoid collecting the ARGV pmc. The new
code looks like:

    void *dummy_ptr;
    PMC *userargv;

Is it possible that some systems might put dummy_ptr higher in memory
than userargv, thus causing userargv to become prematurely collected?
If so, there are three options:

1. make two dummy ptrs, and choose the lesser of the two.
2. set the dummy ptr to userargv, and hope we don't add two header
   variables. ;)
3. force the setting of lo_var_ptr upon the 'main' code in
   test_main.c, above all possible functions.

I think 1 is easiest, but 3 does have the advantage of allowing the
user to do GC stuff outside of the parrot execution loop, like
allocating global variables (like argv, but app-specific), etc. Of
course, it also imposes additional coding overhead on the embedding
programmer.

Mike Lambert
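Option 1 above is small enough to sketch. The names here (ptr_min,
fake_lo_var_ptr, runcode_sketch) are hypothetical, and note the sketch
relies on a flat address space, which the stack-walking code already
assumes anyway:

```c
#include <assert.h>
#include <stddef.h>

/* Take the lower of two addresses, so lo_var_ptr bounds the frame
 * from below regardless of how the compiler orders locals on the
 * stack.  (Comparing unrelated object pointers is technically
 * implementation-defined, but so is conservative stack walking.) */
static void *ptr_min(void *a, void *b)
{
    return (char *)a < (char *)b ? a : b;
}

/* Stand-in for interpreter->lo_var_ptr in this sketch. */
static void *fake_lo_var_ptr;

/* What runcode's prologue might do under option 1: */
static void runcode_sketch(void)
{
    void *dummy1;
    void *dummy2;
    fake_lo_var_ptr = ptr_min(&dummy1, &dummy2);
    /* ... anything between fake_lo_var_ptr and hi_var_ptr is now a
     * potential root, including userargv declared after this point. */
}
```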
[COMMIT] GC_DEBUG, Some GC Fixes, and Remaining GC Bugs
Hey,

I re-added the GC_DEBUG define today, and weeded out a bunch of issues.
For those who don't remember, GC_DEBUG (currently in parrot.h) causes
various limits and settings and logic to be set up such that GC bugs
occur relatively soon after the offending code. It allocates one header
at a time, and performs DOD and collection runs extremely frequently
(effectively, anywhere they could possibly occur if GC_DEBUG weren't
defined). Its goal is to make GC bugs which appear only in complex
programs... appear in simpler ones as well.

Check the cvs-commit traffic if you're interested in what issues I've
fixed already. From what I can tell, a few things remain:

- regexes (these are known to be broken. angel's latest patch should
  fix these in theory. Probably not worth spending time on fixing
  these.)
- hashes (these were recently rewritten to use indices, a step
  forward, but they aren't 100% clean yet)
- lexicals (there's one remaining issue on the last test I didn't look
  into)
- subs (likely includes all variety of them. Basically, I got the
  wrong result on one test, instead of GPF failures like I received on
  the above bugs.)
- possibly others that got lost in the noise of the above issues

Anyone more well-versed in these departments than I care to take a look
at the potential problems? Just change GC_DEBUG in parrot.h, and you
can be on your way. :)

Thanks,
Mike Lambert
Re: Unifying PMCs and Buffers for GC
Peter Gibbs wrote:
> I am very much in agreement with this concept in principle. I would
> like you to consider adding a name/tag/id field to all pool headers,
> containing a short text description of the pool, for debugging
> purposes.

I don't have a problem with that. And yes, it'd definitely help
debugging (as opposed to printing out the various pool addresses and
comparing them ;)

> > One idea, which is most closely in line with the current semantics,
> > is to add a pool pointer to every header. I've found a few times in
> > the past where such a pointer would have come in handy. This would
> > allow us to call the pool's mark() function, to handle stuff like
> > pointing-to-buffers, etc.
> This is something I have done in my personal version, for buffer
> headers only at present (I have been mainly ignoring PMCs, as I
> believe they are still immature). I use it for my latest version of
> the COW code, as well as to allow buffer headers to be returned to
> the correct pool when they are detected as free in code that is not
> resource-pool driven.

Re: DOD immaturity: Yeah, I agree to some extent. It's somewhat
difficult to test DOD efficiency because every string is directly
traceable from the root, thus avoiding mark_used for the most part.
Perhaps some GC-PMC benchmarks are needed to weed out remaining issues.

Re: COW code: Ooohh! You've kept it up to date with the current code? I
was working on applying your old patch (ticket 607 at
http://bugs6.perl.org/rt2/Ticket/Display.html?id=607), but if you've
got COW code in the current build, that's even better. One question:
does your current code use bufstart as the beginning of the buffer, or
the beginning of the string?

> > b) it allows us to make new types of buffer-like headers on par
> > with existing structures.
> On this subject, I would like to see the string structure changed to
> include a buffer header structure, rather than duplicating the
> fields. This would mean a lot of changes (e.g.
> all s->bufstart to s->buffer.bufstart), but would be safer and more
> consistent. Of course, strings may not even warrant existence outside
> of a generic String pmc any more.

Again, I agree. If the COW code forces all the string usage to use
strstart and strlen, then bufstart and buflen are essentially used a
*lot* less. This should make the mental transition easier.

> One option would be to use a limited set of physical sizes (only
> multiples of 16 bytes or something) and have free lists per physical
> size, rather than per individual pool. This would waste some space in
> each header, but may be more efficient overall.

I suppose this allows us to mix and match entries of different types in
the same pools, since each header would have a pointer to its own pool,
regardless of its neighbors. However, the number 16 could be tuned to 4
or 1 to achieve slightly better mem usage. (Or even POINTER_ALIGNMENT.)

> > Finally, the unification of buffers and PMCs means that buffers can
> > now point to things of their own accord, without requiring that
> > they be surrounded by an accompanying PMC type.
> How about the other way round? If the one-size-fits-all PMCs were to
> be replaced by custom structures, then everything could be a PMC, and
> buffer headers as a separate resource could just disappear!

I think you misunderstood me here. I agree that making the buffer
headers a distinct resource is unnecessary. However, this does mean
that all headers need to be traced now. For pure strings, this can hurt
performance, although one can argue that it helps performance in the
general case of the PMC containing buffer data (a couple fewer
indirections needed on usage).

We could make a new header flag, BUFFER_has_pointers_FLAG, which
specifies that this buffer contains pointers to other data structures
and should be traced. If this is unset, the buffer doesn't get added
onto the free list.
Since adding it to the free list requires adjusting next_for_GC, it's already going to reference memory there. Checking the flag would merely prevent traversing the memory again in the 'process' portion. Thanks for the quick reply, Mike Lambert
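A minimal sketch of that flag check follows. BUFFER_has_pointers_FLAG
is the flag proposed above, not an existing Parrot name, and the toy
Buffer layout and mark_buffer function are illustrative only:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits and header layout for this sketch. */
#define BUFFER_live_FLAG         (1u << 0)
#define BUFFER_has_pointers_FLAG (1u << 1)

typedef struct Buffer {
    unsigned       flags;
    void          *bufstart;
    struct Buffer *next_for_GC;
} Buffer;

/* During DOD, mark a buffer live; chain it onto the to-be-processed
 * list only if it claims to contain pointers to other headers.  Plain
 * data (e.g. pure string buffers) skips the 'process' traversal. */
static Buffer *mark_buffer(Buffer *b, Buffer *gc_list)
{
    b->flags |= BUFFER_live_FLAG;
    if (b->flags & BUFFER_has_pointers_FLAG) {
        b->next_for_GC = gc_list;   /* traverse its contents later */
        return b;
    }
    return gc_list;                 /* no traversal needed */
}
```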
Re: Unifying PMCs and Buffers for GC
Mike Lambert wrote:
> One idea, which is most closely in line with the current semantics,
> is to add a pool pointer to every header. I've found a few times in
> the past where such a pointer would have come in handy. This would
> allow us to call the pool's mark() function, to handle stuff like
> pointing-to-buffers, etc.

Oh, I meant to mention an alternative to the pool pointer, but
forgot...

At one point, we had a mem_alloc_aligned, which guaranteed the start of
a block of memory given any pointer into the contents of the block. If
we store a pointer to the pool at the beginning of each set of headers,
then we avoid the need for a per-header pool pointer, at the cost of a
bit more math and an additional dereference to get at it.

The benefits to this are the drawbacks of the aforementioned approach,
but the drawbacks include:

- additional cpu and/or cache misses in getting to the pool. For DOD,
  this might be very inefficient.
- it imposes additional memory requirements in order to align the block
  of memory, and imposes a bit more in this 'header header' at the
  beginning of the block of headers.

Mike Lambert
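The "bit more math" is just a mask. Here's a sketch of the aligned-block
trick, with illustrative names and sizes (BLOCK_SIZE, BlockHeader, and
pool_of are hypothetical, not Parrot's):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Every arena starts on a BLOCK_SIZE boundary and begins with a
 * 'header header' pointing back to its pool, so any interior pointer
 * can be masked down to recover the pool. */
#define BLOCK_SIZE 4096u   /* must be a power of two */

typedef struct Pool Pool;  /* opaque for this sketch */

typedef struct BlockHeader {
    Pool *pool;            /* stored at the start of every block */
} BlockHeader;

static Pool *pool_of(const void *interior_ptr)
{
    /* round down to the block boundary... */
    uintptr_t base = (uintptr_t)interior_ptr & ~(uintptr_t)(BLOCK_SIZE - 1);
    /* ...and dereference the header header (the extra math + load). */
    return ((BlockHeader *)base)->pool;
}
```

This is exactly the per-lookup cost traded against the per-header
pointer: one mask and one extra memory access, versus one word in every
header.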
Re: Unifying PMCs and Buffers for GC
ecific memory to handle the various pointers that are required for DOD,
but the point remains that this further increases the memory footprint
of buffers, and I wanted to verify that it was okay.

Comments and/or suggestions, please?

Thanks,
Mike Lambert
Re: [perl #15942] UNICOS/mk new unhappiness: hash.c
Hey,

I was going through the RT system looking to resolve issues. It looks
like the offending lines of code are still there. A quick look at the
problem, and I see the following patch:

Index: hash.c
===================================================================
RCS file: /cvs/public/parrot/hash.c,v
retrieving revision 1.10
diff -u -r1.10 hash.c
--- hash.c  2 Aug 2002 02:58:27 -  1.10
+++ hash.c  4 Aug 2002 07:09:33 -
@@ -437,7 +437,7 @@
         HASHBUCKET * b = table[i];
         while (b) {
             /* XXX: does b->key need to be copied? */
-            hash_put(interp, ret, b->key, key_clone(interp, &(b->value)));
+            hash_put(interp, ret, b->key, b->value);
             b = b->next;
         }
     }

Unfortunately, this causes different semantics depending on whether you
are storing primitives or pointers (primitives copy, whereas pointers
are shallow). Of course, one could argue that the previous code didn't
work at all. :) Thoughts?

Mike Lambert

Sean O'Rourke wrote:
> Date: Fri, 2 Aug 2002 08:20:58 -0700 (PDT)
> From: Sean O'Rourke <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: [perl #15942] UNICOS/mk new unhappiness: hash.c
>
> That's me. Will fix.
>
> /s
>
> On Fri, 2 Aug 2002, Jarkko Hietaniemi wrote:
>
> > # New Ticket Created by Jarkko Hietaniemi
> > # Please include the string: [perl #15942]
> > # in the subject line of all future correspondence about this issue.
> > # http://rt.perl.org/rt2/Ticket/Display.html?id=15942
> >
> > The subroutine.pmc and sub.pmc problems ([perl #15920]) are gone now
> > that Dan checked in the patches but now new discontent has appeared:
> >
> > CC-167 cc: ERROR File = hash.c, Line = 440
> >   Argument of type "KEY_ATOM *" is incompatible with parameter of
> >   type "KEY *".
> >
> >   hash_put(interp, ret, b->key, key_clone(interp, &(b->value)));
> >
> > CC-167 cc: ERROR File = hash.c, Line = 440
> >   Argument of type "KEY *" is incompatible with parameter of type
> >   "KEY_ATOM *".
> >
> >   hash_put(interp, ret, b->key, key_clone(interp, &(b->value)));
> >
> > 2 errors detected in the compilation of "hash.c".
> >
> > --
> > $jhi++; # http://www.iki.fi/jhi/
> > # There is this special biologist word we use for 'stable'.
> > # It is 'dead'. -- Jack Cohen
Re: [perl #15943] [PATCH] UNICOS/mk vs dynaloading continues
Applied, thanks. Mike Lambert Jarkko Hietaniemi wrote: > Date: Fri, 02 Aug 2002 15:03:21 GMT > From: Jarkko Hietaniemi <[EMAIL PROTECTED]> > Reply-To: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > Subject: [perl #15943] [PATCH] UNICOS/mk vs dynaloading continues > Resent-Date: 2 Aug 2002 15:03:21 - > Resent-From: [EMAIL PROTECTED] > Resent-To: [EMAIL PROTECTED] > > # New Ticket Created by Jarkko Hietaniemi > # Please include the string: [perl #15943] > # in the subject line of all future correspondence about this issue. > # http://rt.perl.org/rt2/Ticket/Display.html?id=15943 > > > > Sorry, I missed this patch hunk from #15880 (but I still think > eventually the dynaloading should be separated from the generic > "platform"): > > --- config/gen/platform/generic.c.dist2002-08-02 17:58:47.0 +0300 > +++ config/gen/platform/generic.c 2002-08-02 17:59:24.0 +0300 > @@ -4,7 +4,9 @@ > > #include > #include > -#include > +#ifdef HAS_HEADER_DLFCN > +# include > +#endif > > #include "parrot/parrot.h" > > -- > $jhi++; # http://www.iki.fi/jhi/ > # There is this special biologist word we use for 'stable'. > # It is 'dead'. -- Jack Cohen > > >
Re: [perl #15953] [PATCH] More GC tests
Applied, thanks. Mike Lambert Simon Glover wrote: > Date: Fri, 02 Aug 2002 21:40:51 GMT > From: Simon Glover <[EMAIL PROTECTED]> > Reply-To: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > Subject: [perl #15953] [PATCH] More GC tests > Resent-Date: 2 Aug 2002 21:40:52 - > Resent-From: [EMAIL PROTECTED] > Resent-To: [EMAIL PROTECTED] > > # New Ticket Created by Simon Glover > # Please include the string: [perl #15953] > # in the subject line of all future correspondence about this issue. > # http://rt.perl.org/rt2/Ticket/Display.html?id=15953 > > > > > A few more tests for the GC ops. > > Simon > > --- t/op/gc.t.old Fri Aug 2 17:03:13 2002 > +++ t/op/gc.t Fri Aug 2 17:39:17 2002 > @@ -1,6 +1,70 @@ > #! perl -w > > -use Parrot::Test tests => 1; > +use Parrot::Test tests => 5; > + > +output_is( <<'CODE', '1', "sweep" ); > + interpinfo I1, 2 # How many DOD runs have we done already? > + sweep > + interpinfo I2, 2 # Should be one more now > + sub I3, I2, I1 > + print I3 > + end > +CODE > + > +output_is( <<'CODE', '1', "collect" ); > + interpinfo I1, 3 # How many garbage collections have we done already? > + collect > + interpinfo I2, 3 # Should be one more now > + sub I3, I2, I1 > + print I3 > + end > +CODE > + > +output_is( <<'CODE', <<'OUTPUT', "collectoff/on" ); > + interpinfo I1, 3 > + collectoff > + collect > + interpinfo I2, 3 > + sub I3, I2, I1 > + print I3 > + print "\n" > + > + collecton > + collect > + interpinfo I4, 3 > + sub I6, I4, I2 > + print I6 > + print "\n" > + > + end > +CODE > +0 > +1 > +OUTPUT > + > +output_is( <<'CODE', <<'OUTPUT', "Nested collectoff/collecton" ); > + interpinfo I1, 3 > + collectoff > + collectoff > + collecton > + collect # This shouldn't do anything... > + interpinfo I2, 3 > + sub I3, I2, I1 > + print I3 > + print "\n" > + > + collecton > + collect # ... but this should > + interpinfo I4, 3 > + sub I6, I4, I2 > + print I6 > + print "\n" > + > + end > +CODE > +0 > +1 > +OUTPUT > > output_is(<<'CODE', < print "starting\n" > > > >
Re: [perl #15952] [PATCH] Minor doc fix in core.ops
Applied, thanks. Mike Lambert Simon Glover wrote: > Date: Fri, 02 Aug 2002 21:39:13 GMT > From: Simon Glover <[EMAIL PROTECTED]> > Reply-To: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > Subject: [perl #15952] [PATCH] Minor doc fix in core.ops > Resent-Date: 2 Aug 2002 21:39:13 - > Resent-From: [EMAIL PROTECTED] > Resent-To: [EMAIL PROTECTED] > > # New Ticket Created by Simon Glover > # Please include the string: [perl #15952] > # in the subject line of all future correspondence about this issue. > # http://rt.perl.org/rt2/Ticket/Display.html?id=15952 > > > > > mem_allocs_since_last_collect is the number of new blocks allocated, > not the total memory allocated. > > Simon > > --- core.ops.old Fri Aug 2 17:32:26 2002 > +++ core.ops Fri Aug 2 17:33:32 2002 > @@ -3797,7 +3797,7 @@ structures. > =item 8 The number of headers (PMC or buffer) that have been allocated > since the last DOD run. > > -=item 9 The amount of memory allocated since the last GC run. > +=item 9 The number of new blocks of memory allocated since the last GC run. > > =item 10 The total amount of memory copied during garbage collections. > > > > > > >
Re: [perl #15951] [BUG] header_allocs_since_last_collect neverupdated
Fixed, thanks. Mike Lambert Simon Glover wrote: > Date: Fri, 02 Aug 2002 21:19:29 GMT > From: Simon Glover <[EMAIL PROTECTED]> > Reply-To: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > Subject: [perl #15951] [BUG] header_allocs_since_last_collect never > updated > Resent-Date: 2 Aug 2002 21:19:29 - > Resent-From: [EMAIL PROTECTED] > Resent-To: [EMAIL PROTECTED] > > # New Ticket Created by Simon Glover > # Please include the string: [perl #15951] > # in the subject line of all future correspondence about this issue. > # http://rt.perl.org/rt2/Ticket/Display.html?id=15951 > > > > > The title says it all really: the counter in the interpreter structure > that tracks recent header allocations is initialized to 0 when the > interpreter is set up, but isn't incremented when headers are allocated. > Consequently, this: > > interpinfo I1, 8 > print I1 > > always prints zero. > > Simon > > > >
Re: [perl #15949] [PATCH] Silence warning in hash clone
Applied, thanks. Mike Lambert Simon Glover wrote: > Date: Fri, 02 Aug 2002 21:00:19 GMT > From: Simon Glover <[EMAIL PROTECTED]> > Reply-To: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > Subject: [perl #15949] [PATCH] Silence warning in hash clone > Resent-Date: 2 Aug 2002 21:00:19 - > Resent-From: [EMAIL PROTECTED] > Resent-To: [EMAIL PROTECTED] > > # New Ticket Created by Simon Glover > # Please include the string: [perl #15949] > # in the subject line of all future correspondence about this issue. > # http://rt.perl.org/rt2/Ticket/Display.html?id=15949 > > > > > hash->num_buckets is unsigned, so we were getting a "comparison between > signed and unsigned" warning. Patch below fixes. > > Simon > > --- hash.c.oldFri Aug 2 16:51:05 2002 > +++ hash.cFri Aug 2 16:52:28 2002 > @@ -432,7 +432,7 @@ HASH * > hash_clone(struct Parrot_Interp * interp, HASH * hash) { > HASH * ret = new_hash(interp); > HASHBUCKET ** table = (HASHBUCKET **)hash->buffer.bufstart; > -int i; > +UINTVAL i; > for (i = 0; i < hash->num_buckets; i++) { > HASHBUCKET * b = table[i]; > while (b) { > > > > > >
Re: [perl #15948] [PATCH] Configure broken on windows 9x
Applied, thanks. Mr. Nobody wrote: > Date: Fri, 02 Aug 2002 20:57:57 GMT > From: Mr. Nobody <[EMAIL PROTECTED]> > Reply-To: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > Subject: [perl #15948] [PATCH] Configure broken on windows 9x > Resent-Date: 2 Aug 2002 20:57:57 - > Resent-From: [EMAIL PROTECTED] > Resent-To: [EMAIL PROTECTED] > > # New Ticket Created by "Mr. Nobody" > # Please include the string: [perl #15948] > # in the subject line of all future correspondence about this issue. > # http://rt.perl.org/rt2/Ticket/Display.html?id=15948 > > > > I sent this patch before but it got the wordwraps > messed up, its enclosed as an attachment this time so > it will be unchanged. > > __ > Do You Yahoo!? > Yahoo! Health - Feel better, live better > http://health.yahoo.com > > -- attachment 1 -- > url: http://rt.perl.org/rt2/attach/32707/26971/8b1cd1/diff > >
Re: [perl #15845] [BUG] GC segfault
Applied with some modification, thanks. Mike Lambert Richard Cameron wrote: > Date: Wed, 31 Jul 2002 22:24:55 +0100 > From: Richard Cameron <[EMAIL PROTECTED]> > To: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED] > Subject: Re: [perl #15845] [BUG] GC segfault > > > On Tuesday, July 30, 2002, at 07:20 PM, Simon Glover (via RT) wrote: > > > This code segfaults: > > > > sweepoff > > set I0, 0 > > > > LOOP: new P0, .PerlString > > set P0, "ABC" > > save P0 > > inc I0 > > lt I0, 127, LOOP > > > > end > > This is a fairly straightforward fix. > > Parrot_do_dod_run() ordinarily updates pool->num_free_objects as a side > effect of looking for unused headers. If dod is disabled with sweepoff, > then num_free_objects doesn't get updated. This confuses a piece of code > later on which decides that it doesn't need to allocate any new buffers > after all (although all other evidence point to the contrary). > > Parrot segfaults soon after. > > I've attached two patches, the first fixes the problem by telling the > allocator to ignore the value of num_free_objects if it's unknown; the > second adds the erstwhile crashing code to the test suite (although I'm > not convinced I've put it in the best place). > > Richard. > > >
Re: [perl #15731] [PATCH] Silence warning
> Patch below kills a couple of warnings that cropped up because > alloc_more_objects was renamed to alloc_objects in the code but > not the headers. Also updates the comments. > > Simon Applied, thanks. Mike Lambert
Re: [perl #15730] [PATCH] Fix typos
> Fixes a few typos and tidies up capitalization in dod.dev > > Simon Applied, thanks. Mike Lambert
Re: [perl #15724] [BUG] GC bugs
> 1. As far as I can make out, start_arena_memory & end_arena_memory
> are never initialized before being used in alloc_objects (in dod.c).
> Consequently, there's no guarantee that they ever do get initialized
> properly, and hence any or all of pmc_min, pmc_max, buffer_min &
> buffer_max in trace_system_stack (also in dod.c) may contain garbage.

Ahh, good catch, thanks. Although it won't cause any problems, it still
is a bug. This code was just an optimization which made the
trace_system_stack code MUCH faster, as compared to checking every
buffer pool for *each* potential pointer on the stack. Since
alloc_objects includes sanity checks to update *_arena_memory, it's
guaranteed to contain all potential buffers, although it could be
wildly over-zealous with its definition of the min and max.

> 2. In get_min_buffer_address (in header.c): shouldn't all of the
> references to end_arena_memory actually be to start_arena_memory ?

Ah yes, now that one could cause problems. :) The only reason I believe
it didn't cause problems in the GC benchmark suite is that all buffers
on the system stack were "recently" allocated, and thus likely to be
later in system memory, *after* the end of the first allocated buffer
pool (which is what the code would currently return).

The redundancy in the code for the min/max stuff on pmc/header pools
does bother me (and look, it's prone to error ;), but I'm not really
sure of any cleaner way to do it. If you have any suggestions, please
feel free to mention them.

Anyways, I've committed fixes for both of these issues. Thanks again.

Mike Lambert
Keyed multiplication
Can I propose a simply-phrased question? I have an IntArray in @P1 and a NumArray in @P2. How would I do the equivalent of: S1 = P1[5] * P2[5] I'm not asking about how to do it currently, but rather how it should be done in the 'final keyed interface'. When explaining, I'd appreciate sample pasm code and/or rough pseudo-code for the unimplemented ops and methods you're using. Also, it prevents any magical hand-waving. :) Beware, as I believe this is a very tricky question that will delve into areas of mm-dispatch. However, if answered, I think it will give great insights into "the way things (should) work". Of course, the possibility exists that I am completely missing something, and this truly is a simple question. :) Thanks, Mike Lambert
Re: [PATCH] Reduce array.pmc keyed code
Scott Walters wrote:

> Part of the beauty of PMCs is that you can have very compact
> storage given a dedicated eg int array type. Generating these
> would not be a bad thing. The typical case still remains, that
> arrays will contain mixtures of all datatypes.

Yep, I agree. Thus, array.pmc would be 'the typical case', since it's 100% PMCs. Stored integers and strings would be converted to PMCs for storage, much like how perl 5 works.

> I proposed several approaches. Taking just one of them:
>
> Requests to operate on PMCs should not be propagated down to the PMC
> through one of dozens of methods. Instead, the PMC should be fetched
> into a scratch register and operated on directly. In the case of
> primitive atomic datatypes, the recursive multiply, add, div, etc
> operations should be disposed of, and references to the atomic types
> should be fetched to a scratch register, where they are operated on
> directly (as a reference).

Okay, I'll try and think this through. For normal aggregate PMCs, nothing more is required. The ops to fetch and store in scratch registers are available. And the ops to operate on them are there, too.

For atomic types, it gets a bit stranger. We need to return references to these atomic types. This is problematic because: We don't have anywhere to store them. We can store them in INT registers (which I believe are guaranteed as large as pointers, but not sure), but then this gives us the ability to break the nonexistent "safe" interpreter by operating on the pointer. Finally, a whole complementary set of ops needs to be generated. For every op like 'add N, N', we need an 'add Np, Np', where Np is NUM pointer. Or we need to "get" the NUM reference into an Nx register, operate on it, and copy it back into the INT pointer to the one in the aggregate.

However, returning references does have one major advantage: indicating the *lack* of something. If I retrieve a PMC which doesn't exist, I can get NULL back.
But if I retrieve the 5th index of an integer array for which the length is 3, what am I supposed to get back? NaN? In this context, retrieving a pointer to the referred element makes sense, and allows us to "do things" to the keyed element that aren't necessarily supported by the vtable methods. I'm beginning to agree with you, here. :)

> Restating, if you're going to work on this, please work with me. I'd be
> glad to help whatever you're doing, but I *hate* duplication of effort.
> I've got plenty of other things I should be working on.

Oh, I hate duplication of effort, too. But the patch I sent in took 10-15 minutes to do, and I wanted to try it out and see how it reduced the code size. Besides, if you're complaining about how to do something, and you provide patches that start on it, I think people take you more seriously. :)

> The 33k implementation of Array is less than 1/4 complete. This beast
> will be 125k before it's done, mark my words. Low power chips have
> quarter meg or half meg L2 caches. I'm a firm believer that a VM should fit
> in cache and leave room for some data to be cached, too. A separate
> fetch and operation would add a few more instructions to the implementation,
> but compared to the cost of a cache stall, this is beans. The best thing we
> could do is remove some of this bloat now.

I won't argue that it'll be 125K before it's done. In fact, I think it will be more. At least, if we continue on our current path of keyed versions of the various array methods.

In terms of convincing Dan, I think we just need to be clearer in the argument:
- the keyed approach is fine
- get_keyed, set_keyed, are fine
- the existing .ops for keys are fine, although more are needed

The main changes that I see are:
- the elimination of the keyed versions for all the mathematical vtable methods
- the addition of a get_keyed_ref method
- the addition of mathematical keyed *ops* that use get_keyed_ref to "do their thing" ?
- perhaps some method for storing the keyed_ref result into a register? The mathematical keyed ops might make this unnecessary, however. Thoughts on this hopefully more concrete explanation of what could be changed? Thanks, Mike Lambert > Mike, I can't mail you directly. I'm sharing a netblock with a known spammer > and you're using spews.org. I didn't mean to send my rant to the entire list > again. Sorry, folks =( Sorry about that. I don't have control over the machine receiving email for me, and won't be back to serving my own mail until the fall semester starts up again.
Re: [COMMIT] GC Speedup
> > If performance has to halve in order to implement such features, I hope
> > somebody plans to write Parrot::Lite!
>
> I'm not sure if I understand the problem properly, but is the problem with
> using exceptions (or using longjmp) and the like to unwind that we can't
> trust external extensions to play ball with our rules when we need to unwind
> through them? And that if we didn't have to call out to external code, we
> could use the faster methods without needing stack walking?

Well, I only knew about the first problem, but I suppose the "external code" problem is another valid one. :)

> If so, is it possible to make a hybrid scheme, whereby if we know that
> between the two marks on the stack we've not called out to any external
> code we use the faster mechanism to check for leaks, but if we know that
> we entered external code (and must have come back in because we're now back
> in the parrot garbage collection system called by another parrot call) use
> a stack walk. Obviously we'd carry the overhead of more bookkeeping, but it
> might win if it saves the stackwalk. (and thrashing the cache in the process)

I do like this idea, although we'll have to run it by Dan first. I know that both Peter and I had different solutions to the pointers-on-the-stack problem, and all such ones were rejected. I'd have to do benchmarks comparing my own (neonate buffers) to the current stackwalk code, but I'm sure both Peter's and mine would win, hands down. That's mainly because we:
a) don't call external code
b) don't use longjmp

So our solutions, and the bookkeeping they entail, would never be wasted, because there is nothing to invalidate them yet. To really determine their worth, we need real-world programs, and figure out how often they would be using longjmp, and external code, to determine how often the bookkeeping is wasted. But that brings me back around to the point in my previous email, about the mythical "real world program". :)

Mike Lambert
Re: [COMMIT] GC Speedup
> With the call to trace_system_stack commented out in dod.c, I get 48.5
> generations per second. The full stats are:
> 5000 generations in 103.185356 seconds. 48.456488 generations/sec
> A total of 36608 bytes were allocated
> A total of 42386 DOD runs were made
> A total of 6005 collection runs were made
> Copying a total of 72819800 bytes
> There are 21 active Buffer structs
> There are 1024 total Buffer structs
>
> This compares to the 14th July CVS version:
> 5000 generations in 81.172149 seconds. 61.597482 generations/sec
> A total of 58389 bytes were allocated
> A total of 160793 DOD runs were made
> A total of 1752 collection runs were made
> Copying a total of 1228416 bytes
> There are 81 active Buffer structs
> There are 192 total Buffer structs

I guess this means the examples/benchmarks I was using to test were not too representative of real-world programs. Or maybe that's the case for life.pasm. :) Looking at the above results, I think I can see part of the problem. What's really annoying is that the more I play with the benchmarks, the more I realize they are useless.

The new parrot has an initial buffer count of 256, which helped performance on my system, when compared to the pre-GC commit. The old version has 64 or so. Just this small tuning difference means:
a) more buffers to DOD and collect
b) less of a need to DOD since we can "live" longer without it
c) more memory usage because we can't collect data in old PMCs until they've been DOD'ed

Doing minor adjustments like inlining functions, etc (which I did the other day) can give maybe a 1-4% performance improvement across the board, each. However, changing a number like HEADERS_PER_ALLOC can affect performance +/-8%, depending on the program. This makes it rather difficult to optimize the GC, since optimizing for one program *easily* messes up the performance on other programs.
Setting *_HEADERS_PER_ALLOC back to the original of 16 improves performance on life.pasm by 5%, although it causes a corresponding hit on the examples/benchmarks. Changing UNITS_PER_ALLOC_GROWTH_FACTOR either way causes a big speed hit. Changing REPLENISH_LEVEL_FACTOR either way causes a big speed hit. Changing the logic on when we DOD relative to collection, in any manner, causes a speed hit.

This leads me to believe that we have a GC that's tuned for life.pasm, which makes a lot of sense. Before examples/benchmarks, there was only life, and all GC performance changes were compared on that. In my attempts to tune for examples/benchmarks, I undoubtedly caused life performance to suffer.

Parrot doesn't have any real-world programs, which makes it difficult to do any sort of worthwhile tuning. Hopefully, with Sean's (and everyone else's) work on the Perl6 grammar, we can start taking these perl programs (like qsort), running them through, and benchmarking them against the parrot VM. Unfortunately, until we have a wide test suite of programs, or start implementing adaptive adjustment of GC parameters, I have a feeling we're just going to travel around in circles.

Mike Lambert
Re: [PATCH] Reduce array.pmc keyed code
> This patch is rather questionable, and thus I did not commit it
> directly. However, it illustrates a point I wanted to make.

Doh! Hopefully my previous post will make a bit more sense now. :)

Mike Lambert

Index: array.pmc
===
RCS file: /cvs/public/parrot/classes/array.pmc,v
retrieving revision 1.28
diff -u -r1.28 array.pmc
--- array.pmc	24 Jul 2002 07:32:46 -	1.28
+++ array.pmc	25 Jul 2002 03:24:31 -
@@ -146,46 +146,16 @@
 }

 INTVAL get_integer_keyed (KEY* key) {
-KEY_ATOM* kp;
-INTVAL ix;
-PMC* value;
-
-if (!key) {
-return 0;
-}
-
-kp = &key->atom;
-ix = atom2int(INTERP, kp);
+PMC *value = SELF->vtable->get_pmc_keyed(INTERP, SELF, key);
-if (ix >= SELF->cache.int_val || ix < 0) {
-internal_exception(OUT_OF_BOUNDS, "Array element out of bounds!\n");
-}
-
-value = ((PMC**)(((Buffer *)SELF->data)->bufstart))[ix];
-
-if(key->next != NULL) {
+if(key->next != NULL)
 return value->vtable->get_integer_keyed(INTERP, value, key->next);
-}
-else {
+else
 return value->vtable->get_integer(INTERP, value);
-}
 }

 INTVAL get_integer_keyed_int (INTVAL* key) {
-INTVAL ix;
-PMC* value;
-
-if (!key) {
-return 0;
-}
-
-ix = *key;
-
-if (ix >= SELF->cache.int_val || ix < 0) {
-internal_exception(OUT_OF_BOUNDS, "Array element out of bounds!\n");
-}
-
-value = ((PMC**)(((Buffer *)SELF->data)->bufstart))[ix];
+ PMC *value = SELF->vtable->get_pmc_keyed_int(INTERP, SELF, key);
 return value->vtable->get_integer(INTERP, value);
 }
@@ -194,46 +164,16 @@
 }

 FLOATVAL get_number_keyed (KEY* key) {
-KEY_ATOM* kp;
-INTVAL ix;
-PMC* value;
+PMC *value = SELF->vtable->get_pmc_keyed(INTERP, SELF, key);
-if (!key) {
-return 0;
-}
-
-kp = &key->atom;
-ix = atom2int(INTERP, kp);
-
-if (ix >= SELF->cache.int_val || ix < 0) {
-internal_exception(OUT_OF_BOUNDS, "Array element out of bounds!\n");
-}
-
-value = ((PMC**)(((Buffer *)SELF->data)->bufstart))[ix];
-
-if(key->next != NULL) {
+if(key->next != NULL)
 return value->vtable->get_number_keyed(INTERP, value, key->next);
-}
-else {
+else
 return value->vtable->get_number(INTERP, value);
-}
 }

 FLOATVAL get_number_keyed_int (INTVAL* key) {
-INTVAL ix;
-PMC* value;
-
-if (!key) {
-return 0;
-}
-
-ix = *key;
-
-if (ix >= SELF->cache.int_val || ix < 0) {
-internal_exception(OUT_OF_BOUNDS, "Array element out of bounds!\n");
-}
-
-value = ((PMC**)(((Buffer *)SELF->data)->bufstart))[ix];
+ PMC *value = SELF->vtable->get_pmc_keyed_int(INTERP, SELF, key);
 return value->vtable->get_number(INTERP, value);
 }
@@ -243,46 +183,16 @@
 }

 BIGNUM* get_bignum_keyed (KEY* key) {
-KEY_ATOM* kp;
-INTVAL ix;
-PMC* value;
+PMC *value = SELF->vtable->get_pmc_keyed(INTERP, SELF, key);
-if (!key) {
-return 0;
-}
-
-kp = &key->atom;
-ix = atom2int(INTERP, kp);
-
-if (ix >= SELF->cache.int_val || ix < 0) {
-internal_exception(OUT_OF_BOUNDS, "Array element out of bounds!\n");
-}
-
-value = ((PMC**)(((Buffer *)SELF->data)->bufstart))[ix];
-
-if(key->next != NULL) {
+if(key->next != NULL)
 return value->vtable->get_bignum_keyed(INTERP, value, key->next);
-}
-else {
+else
 return value->vtable->get_bignum(INTERP, value);
-}
 }

 BIGNUM* get_bignum_keyed_int (INTVAL* key) {
-INTVAL ix;
-PMC* value;
-
-if (!key) {
-return 0;
-}
-
-ix = *key;
-
-if (ix >= SELF->cache.int_val || ix < 0) {
-internal_exception(OUT_OF_BOUNDS, "Array element out of bounds!\n");
-}
-
-value = ((PMC**)(((Buffer *)SELF->data)->bufstart))[ix];
+ PMC *va
[PATCH] Reduce array.pmc keyed code
This patch is rather questionable, and thus I did not commit it directly. However, it illustrates a point I wanted to make. As mentioned in my recent PARROT QUESTIONS email, a lot of the clutter in the PMC aggregates can be removed with the use of redirecting functions. The below patch reduces the resulting array.c from 40K to 20K, and the .obj file from 25K to 33K (not that much). It introduces another layer of recursion into the code, while at the same time eliminating lots of duplicate code.

If we disallow subclassing of the PMC, a lot of the vtable redirects in this code could be replaced with straight function calls (and those function calls could subsequently be inlined within the same .c file). After doing this, I wonder if it's not useful to allow an aggregate PMC to declare its inner type (in this case, PMC). The pmc2c.pl would then generate stub functions which converted from that base type to each of the requested types. This would allow PerlIntArray to be a base of INT, yet perform auto-conversions to num, string, pmc, etc in the generated code.

Is the patch here too recursive to be efficient, despite the reduction in actual code? Should all SELF->vtable methods on *.pmc files be made to call the appropriate function directly, and assume no subclassing? Another possibility is to regenerate the functions for the subclasses, so that parent inlinings of SELF->vtable->get_pmc_keyed will not interfere with a child's redefinition of their own get_pmc_keyed?

Thoughts, comments?

Mike Lambert
Re: PARROT QUESTIONS: Keyed access: PROPOSAL
keys.

> > >Given your objectives of speed, generality and elegance,
> >
> > I should point out here that elegance appears third in your list
> > here. (It's fourth or fifth on mine)
>
> Ooops.

Yes, Dan's coding objectives are somewhat of a mystery to me as well. :)

> > > * function calls consume resources
> > Generally incorrect. Function calls are, while not free, damned cheap
> > on most semi-modern chips.
>
> Your inner loop is a few lines of code. If every inner loop execution triggers
> a cascade of function calls, this is lost. It may be small, but certain cases
> do warrant changing extremely frequently used recursive structures to
> iterative structures. I'm not saying this happens - I'm just saying that there
> is a certain point when this value does become significant.

However, if that inner loop references a multi-dim array, a standard implementation of a recursive keyed access would fail to work, no? And if you're that concerned about the recursive key lookup on a heavily nested loop, I'm sure you could hoist some of the key lookups out of the appropriate loops (or maybe not, depending upon the Perl code in question). However, given that Perl is hard to optimize, there's not much you can do to optimize [$a][$b][$c] access because any one of the PMCs might be tied, changing the behavior of the system. As such, you might very well need to perform the full keyed lookup each time.

> I agree with Dan now that I understand better. My complaints have been addressed,
> with the one exception of refactoring code bloat. I feel this is a small change
> in implementation, and shouldn't impact design. I hope Dan will (pending time)
> consider it, and I'll be happy to hash it out with him on IRC to make sure
> both parties understand exactly what is being said and that I don't continue
> to miss things ;)

Hopefully I've helped to explain some of the things you said you were unsure about.
Of course, since I'm not Dan, you might very well hear something completely different when he gets back from TPC. :) Mike Lambert
Re: [patch] win32 io
> * win32 can flush its file buffers (FlushFileBuffers())
> * SetFilePointer knows about whence; win32 constants (values, not names) are the
> same as in linux.

Applied, thanks.

Mike Lambert
Re: [COMMIT] GC Speedup
Hey Peter,

Sorry about not replying to your earlier message...I completely forgot until I saw this message. :)

> Thanks Mike, those changes do indeed help. Current numbers on my system for
> 5000 generations of life with various versions of Parrot (using CVS tags)
> are:
> 0.0.5    47.99 generations per second
> 0.0.6    57.41
> 0.0.7    20.18
> current  21.18 (an improvement of 4.7% over 0.0.7)

These do look pretty bad. Unfortunately, these numbers are not directly comparable. Between 0.0.6 and 0.0.7, two major things changed in the GC code:
- addition of stack-walk code to avoid child collection
- the GC refactoring I committed

I suspect the former is what is causing your speed hit, although I'm not ruling out the possibility that my changes caused a problem as well. Can you disable the trace_system_stack call and re-run these numbers?

I know that there are faster solutions to the problem of child collection, but Dan doesn't want to use them due to the problems that occur when we start using exceptions (and longjmp, etc). Perhaps, if the above performance hit is due to trace_system_stack, it might give reason to reconsider the chosen solution?

Thanks,
Mike Lambert
Re: [perl #15317] [PATCH] Recursive keyed lookups for array.pmc
Applied, thanks. If someone wants to mark this ticket as resolved, I'd appreciate it. Mike Lambert Scott Walters wrote: > Date: Mon, 22 Jul 2002 08:49:33 GMT > From: Scott Walters <[EMAIL PROTECTED]> > Reply-To: [EMAIL PROTECTED] > To: [EMAIL PROTECTED] > Subject: [perl #15317] [PATCH] Recursive keyed lookups for array.pmc > Resent-Date: 22 Jul 2002 08:49:33 - > Resent-From: [EMAIL PROTECTED] > Resent-To: [EMAIL PROTECTED] > > # New Ticket Created by Scott Walters > # Please include the string: [perl #15317] > # in the subject line of all future correspondence about this issue. > # http://rt.perl.org/rt2/Ticket/Display.html?id=15317 > > > > > When a KEY * key datastructure is passed to a keyed method in array.pmc, > and key->next is true...: > array.pmc will recurse into the keyed lookup method of the PMC that it > contains, passing it key->next. > This implements the recursive indexing behavior as described in PDD08. > > -scott > > > -- attachment 1 -- > url: http://rt.perl.org/rt2/attach/30940/25927/407316/array.pmc.diff > >
More Keyed Questions
Heya,

After seeing all the brouhaha on the list about keyed-ness, I thought I'd try my hand at it, to see if I could get any farther before running up into the wall. :) Here's my own list of questions...first, the main problem with keys, as I see it, is that there is no guiding PDD or spec that describes how they should work. As such, people can only learn from the code. And that's a mess. While basic key support is there, lots of things are half-implemented, or incorrectly implemented, and it's quite hard to get a coherent picture. At least, imho.

Where are keys going to be stored? Currently, they exist only on the system stack, and key_new and key_clone are unused. Will we eventually have free-floating keys? Should we create support that causes them to be stored in small-object pools?

I see that certain ops accept a type called KEY, which acts exactly like INT. And I mean *exactly*. It pulls data from INT registers, and even the 'k' and 'kc' types translate into accesses that look exactly like 'i' and 'ic'. Are we planning on having key creation and mutation operations? Where will these keys be stored in order to operate on them? INT registers? (Sorta how it is now, although it looks wrong). STR registers? (Means they need to be headered, DOD'ed, and a larger per-key size). Will they get their own registers, maybe only 8 deep? What about the possibility of constructing/operating on keys using a Key PMC? We could convert to/from real KEYs by using the PMC. This is basically just a sidestep of the above problem.

Currently, I see:
set_keyed_integer: PK* PI* *PI *PK
set_keyed: PS* *PS PN* *PN
set:

What's the point of set_keyed versus set_keyed_integer naming? There doesn't seem to be any overlap at all, so set_keyed_integer could safely be named set_keyed. Can we safely remove "set ", due to the relative inefficiency in constructing dummy PMCs to call it? Wouldn't it be more efficient to split the call into two "set PP*" and "set P*P" calls?

Thanks,
Mike Lambert
Re: [PATCH] genclass.pl patch
Josef Höök wrote: > I've added an if case in genclass so it will print > "return whoami;" for "name" function so that no one need to grep parrot > source for an hour or two trying to figure out why it segfaults when > registering pmc class in init_world... ( grumble :-) ) Applied, thanks. Mike Lambert
[COMMIT] GC Speedup
I've just committed some changes tonight that improved performance on the GC-heavy tests in examples/benchmarks/ by about 8%. Results on each of the GC benchmark tests, scaled against 1.0 as the old version, are:

                               old     new
gc_alloc_new.pbc               1.000   0.969
gc_alloc_reuse.pbc             1.000   0.957
gc_generations.pbc             1.000   0.899
gc_header_new.pbc              1.000   0.991
gc_header_reuse.pbc            1.000   0.871
gc_waves_headers.pbc           1.000   0.867
gc_waves_sizeable_data.pbc     1.000   0.987
gc_waves_sizeable_headers.pbc  1.000   0.855

Overall:                       1.000   0.925

Details of what was done to accomplish this can be found in my email to the cvs-parrot list. It was pretty much 4 or 5 distinct things that each gave a couple percentage points' improvement.

Thanks,
Mike Lambert
[COMMIT] Major GC Refactoring
Last night I committed the GC refactoring I submitted the other day, then spent a couple hours putting out fires on the tinderbox. The last thing I attempted was to align my pointer accesses, because Tru64 was giving lots of warnings about:

Unaligned access pid=246428 va=0x1400b7364 pc=0x12005e408 ra=0x120037228 inst=0xb52c0010

After attempting to solve them for myself unsuccessfully, I went to: http://csa.compaq.com/Dev_Tips/unalign.htm and http://csa.compaq.com/Dev_Tips/unalign_example.htm which give instructions on tracking them down. Turns out set_keyed_string, and plenty of other parrot code, has the same problems I did. I believe there's a way to turn this off in the compilation, but I'm not sure if we want to do that.

Finally, it appears that there are still 64-bit issues with the code I committed last night, mostly in regards to the GC failing on the more intensive tests. I will try to look into this tomorrow night, but I'm not sure how much progress I'll be able to make, since I'm quite unfamiliar with gdb, and 64-bit platforms (and each individually, for that matter. :) Worst comes to worst, and DrForr needs to make 0.0.7, I can undo the changes to get the tests passing on all platforms, again. And then try it with JUST the stackwalking code to avoid neonate problems.

Thanks,
Mike Lambert
Re: [perl #823] [PATCH] put numeric comparisons in the right order
> > Um, I don't think it's right to *always* do the comparison
> > floating point. Specifically, if both operands are ints,
> > the comparison should be int.
>
> I thought about this, but all it buys you is a few bits of precision when
> your ints and floats are both 64-bit, and slower comparisons all the time.
> IMHO it's a wash, so I did it this way.

If A = 2^30 and B = 2^30+1, won't they be the identical value when converted to IEEE floats on a 32-bit platform? IEEE floats have a 23-bit mantissa, which isn't enough to differentiate between 2^30 and 2^30+2^0, since 30 - 0 > 23. Am I missing something here?

Thanks,
Mike Lambert
Re: [perl #814] [PATCH] fix PMC type morphing
Food for thought...

There currently is a 'morph' vtable entry, which I believe is intended to morph from one vtable type to another. I think it'd be better to implement this function properly than to use macros (talk to Robert ;), especially considering that certain vtables might have special morphing requirements, such as setting PMC_is_buffer_ptr_FLAG. Of course, morph seems to be unimplemented, and my attempt at implementing it ran into a problem, which I brought up here: http:[EMAIL PROTECTED]/msg09317.html

There are two problems:
a) morph will break horribly when we deal with tied variables, since it will have to reimplement *every* PMC method to avoid any morphing.
b) Since it's possible that dest == src, we need to make a copy of our data (be it a buffer ptr, or regular number) on the local stack, call morph() to morph the PMC and initialize data, and then set the new value. This pattern is currently utilized in the string PMC methods, but not with the number-related methods.

So in conclusion, while I don't have any reservations about your patch, I do have a preference that it be done differently. :)

Mike Lambert
Re: Adding the system stack to the GC
> >a) Can I assume the stack always extends into larger-addressed memory, or
> >must I handle both cases?
>
> Both cases. If you want to add configure probing to determine
> direction, go ahead.

I'm currently doing it dynamically. Get it working, then someone can do nice configure probing. :) Turns out win32 does it 'backwards', which I found interesting, at least.

> >b) What's the largest alignment guaranteed for pointers? Byte-level?
>
> I think we can safely assume natural alignment, so you can treat the
> stack as an array of pointers. If the start and end point are both
> pointers you can just iterate that way.

Seems to be byte-level, as I've had pointers between two pointers that weren't pointer-aligned. Not sure what kind of padding msvc is doing, but it seems that from runops to a given DOD call in a string* function, there's about 1KB of stack. That's 1000 checks to see if any of them are pointers to pmcs/buffers. Luckily, this number shouldn't grow with program size, but it might be a cause for worry.

Also, I think I've discovered a situation which might not be resolvable. How do we determine if a given buffer is alive? We walk the stack and check for potential pointers into buffer memory. If the stack contains garbage pointers, we might have bad references into buffer memory. I can check for alignment within the buffer pool, and so it should be safe to set the BUFFER_live_FLAG on them. However, when we perform a collection, that means that we could be taking a garbage pointer in the buffer, and attempting to copy the memory it pointed to, into new memory. This could give us GPFs if we access bad memory, I think. Even if we check to ensure the buffer points into valid collectable memory (making it slower), we still have the issue of buflen being set to MAX_INT or something, and killing the system. :| The same caveats apply to pmc headers which happen to have PMC_buffer_ptr_FLAG set.
How should we get around this particular problem, or is it spelling the doom of this particular solution? Thanks, Mike Lambert
Re: coders: add doco
> Melvin Smith wrote: > > What parts particularly bug you? Maybe we can address a few. > > Well, basically, AFAICT, virtually none of the parrot code > is adequately documented. So, pick a random entry point. :-) First, you have to understand that what you are saying is quite inflammatory, regardless of its veracity. Saying "I'm not flaming here" does not make it so. :) There are certainly many places in your original email where you could have been less inflammatory towards the people that have contributed code and documentation to the Parrot project. There have been many requests for additional documentation in the past. Patches have even been refused for lack of documentation. Did you check the p6i mail archive to see if this issue has been brought up before? I'm sure you won't find anyone arguing the point that parrot needs more documentation. However, if it was as simple a matter as telling people that more documentation was needed, Parrot would have had ample documentation looong ago. If you want to have a request listened to, you should be direct in what you request. When Melvin asked you for particular places that we could improve upon, you responded with 'all of the above'. That's quite a big task to address, and not telling anyone any more than they already knew. Did you have problems learning any particular aspect of parrot? If so, that might be a good area to request additional documentation. From my vantage point, documentation is bad only if someone attempts to learn a particular area of the code and has trouble because of the lack of, or inadequacy of, the documentation for that task. Have you attempted to learn every aspect of parrot, such that you can verifiably say that all of parrot's documentation is lacking? Thanks for understanding, Mike Lambert
Re: Adding the system stack to the GC
I'll take a stab at it. Got a few questions, tho: a) Can I assume the stack always extends into larger-addressed memory, or must I handle both cases? b) What's the largest alignment guaranteed for pointers? Byte-level? c) Where should this code go, such that it can be replaced for the OS/platforms which need it differently? resources.c? .c? Maybe in resources.c with each .c calling the generic one in resources.c (since win32, generic, darwin, etc are all likely to share the same logic.) Mike Lambert Dan Sugalski wrote: > Date: Fri, 12 Jul 2002 13:05:54 -0400 > From: Dan Sugalski <[EMAIL PROTECTED]> > To: [EMAIL PROTECTED] > Subject: Adding the system stack to the GC > > Okay, anyone up for this? Should be fairly trivial--take the address > of an auto variable in runops, store it in the interpreter, take the > address of *another* auto variable in the GC, and walk the contiguous > chunk of memory between, looking for valid PMC and Buffer pointers. > > Anyone? > -- > Dan > > --"it's like this"--- > Dan Sugalski even samurai > [EMAIL PROTECTED] have teddy bears and even >teddy bears get drunk > >
Re: vtables and multimethod dispatch
> We need a multimethod dispatch for vtable calls. Right now we're
> working on a "left side wins" scheme and, while we're going to keep
> that (sort of) we really need a way to choose the method to call
> based on the types on both sides of binary operators. (Unary
> operators, luckily, are easier)

Woot. Woot. I'm glad to see that we're going to get multi-dispatch in the parrot core. There's a few main methodologies.

Lookup logic:
a) Have the dispatch logic be intelligent, and look up the appropriate dispatch in order of increasing generality.
b) Have the dispatch just look up in a table, and generate that table at load/bind/etc time with the logic of inheritance, generality, etc. You can generate a large table that's N^2 for N = number of PMCs. Easy on lookup logic, bad on cache and memory usage.

With option a) above, there's a few techniques you can use:
a) Dan's preferring (I think he is, anyway) a two-level lookup (so that if you don't need the multi-dispatch, it can be nearly as fast as regular dispatch). So we dispatch to the left side. It can be greedy and handle it, or perform the second-level lookup itself, giving full multi-method dispatch. (Extending to trinary dispatch and more, should we want that.)
b) You can probably use a hashtable as a sparse matrix. Perform repeated lookups until you get a non-null method. Most PMCs will only have specially-designed interactions with a small subset of family PMCs, if anything at all.

I'm sure there are other techniques...I posted links to a bunch of 'make multi-dispatch fast' links back when I was arguing about this topic awhile ago. But I admit to not having read them, so there are likely to be many other techniques that my current brain dump doesn't cover. :)

> We can do this with the current vtable scheme as it is, since we
> already have a slot to put this in, and I think we're going to have
> languages that still do a left-side-win scheme.
Well, for one, the current vtable scheme does a lot more than operators,
so I don't think anyone would argue for it going away. And even
left-side-win schemes are merely a special case of generic
multi-dispatch, with the right side being a '*' to match all PMCs.

Of course, if Perl is going to be multi-dispatch to the core, is there a
valid argument for trying to optimize the single-dispatch case? Granted
we already have that implemented, but some multi-dispatch schemes impose
the same penalty for single- as they do for multi-. Would these schemes
be allowed, or explicitly disallowed?

Finally, my last item that I'd like to see included in any multi-dispatch
scheme that gets implemented, is the ability to register methods to be
called that aren't in either PMC. While this is infringing a bit on
p6-language territory, I still believe we need a mechanism for it
internally. It would give a way for different mathematical libraries to
interoperate in code, by merely writing operators which could handle the
conversion from one type to another (or, faster, dealing with the
internals of both).

Finally, I'd like to be able to use multi-dispatch for the purpose of
conversion/casting. While the _as_int methods handle the simple ones
fine, PMC->PMC conversions are essentially multi-dispatch, and imo should
be treated as such. This might only matter with strong typing, but it
might also help with the differently-organized mathematical libraries:
assuming no binary operators are written, one only needs to write
conversions to allow them to interoperate, if slowly.

Thoughts? Am I taking it too far?

Mike Lambert
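The "sparse table with a wildcard" idea above can be sketched concretely. This is a hypothetical toy, not Parrot's vtable code: the type ids, `ANY_TYPE`, and the `find_add` lookup order are all illustrative. Note how the left-side-wins case falls out naturally as a right-side wildcard entry.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative type ids; a real system would use PMC class enums. */
enum { ANY_TYPE = 0, INT_TYPE = 1, BIGNUM_TYPE = 2, N_TYPES = 3 };

typedef int (*binop_fn)(int left, int right);

static int add_int_int(int a, int b) { return a + b; }          /* specialized */
static int add_int_any(int a, int b) { return a + b; }          /* left-side-wins fallback */

/* Sparse N x N method table; NULL means "no method registered". */
static binop_fn add_table[N_TYPES][N_TYPES] = {
    [INT_TYPE] = { [INT_TYPE] = add_int_int, [ANY_TYPE] = add_int_any },
};

/* Lookup in order of increasing generality: (L,R), then (L,*),
 * then (*,R), then (*,*).  Left-side-wins is just the (L,*) row. */
binop_fn find_add(int lt, int rt)
{
    if (add_table[lt][rt])        return add_table[lt][rt];
    if (add_table[lt][ANY_TYPE])  return add_table[lt][ANY_TYPE];
    if (add_table[ANY_TYPE][rt])  return add_table[ANY_TYPE][rt];
    return add_table[ANY_TYPE][ANY_TYPE];
}
```

The full N^2 table trades memory for a single lookup; the sparse version above pays up to four probes but only stores what was registered, matching the observation that most PMCs interact specially with only a few other types.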
Re: GC Benchmarking Tests
> > > gc_alloc_new.pbc
> > > gc_alloc_reuse.pbc
> >
> > I don't think these two tests are very interesting. They allocate
> > quite large strings, so they don't put much strain on the GC.
> > Instead, they measure how fast Parrot is at copying strings.
>
> I believe that's a very good thing to be testing. If the pool allocates
> more memory than it thinks it will need, it will perform fewer overall
> copies at the expense of larger callocs and worse performance in other
> cases.

Erg. Seems that for every email I send, I send out another one
correcting/clarifying it. :(

I will agree with you that gc_alloc_new isn't really useful as it
currently stands. While what it is testing is good, it currently only has
five iterations due to the huge amount of memory it is allocating. As
such, I plan to give it a slower rampup that should allow for more
iterations, while still testing the same thing.

Mike Lambert
Re: GC Benchmarking Tests
> > gc_alloc_new.pbc
> > gc_alloc_reuse.pbc
>
> I don't think these two tests are very interesting. They allocate
> quite large strings, so they don't put much strain on the GC.
> Instead, they measure how fast Parrot is at copying strings.

I believe that's a very good thing to be testing. If the pool allocates
more memory than it thinks it will need, it will perform fewer overall
copies at the expense of larger callocs and worse performance in other
cases.

If the GC is generational, it will be able to detect the early
gc_alloc_new headers as 'old', and promote them into an older pool. It
should exhibit better performance.

Finally, gc_alloc_reuse has a header turnover. So while it allocates tons
of memory, the old memory is able to be marked unused and it has a
mostly-constant 'total_used' at any given time. If DOD runs aren't done,
this will demonstrate poor performance due to the old headers not getting
marked as unused, and it needing to allocate more memory blocks.

I think it is *because* they measure how fast parrot is at copying
strings that these are good tests. GCs which avoid copying will perform
better on these.

Mike Lambert
Re: [netlabs #636] GC Bench: out-of-pool-memory logic
> Results are:
>                     before      after
> gc_alloc_new.pbc    4.155999    0.18
>
> gc_alloc_new seems to have improved a *lot*. This is because
> gc_alloc_new allocates a lot of memory using the same headers. It
> doesn't tear through headers quickly enough to trigger any dod runs on
> its own, so these headers stay live and allocate tons of memory that
> gets continually copied between generations.

Okay. I guess I was a bit optimistic with that. The gc_alloc_new
statistic above is false. After getting confused when applying this patch
to my local safe GC and not seeing an equivalent speedup, I investigated
a bit more.

Normal allocation sizes are:
  45 980 19600 392000 784 15680
With this patch, the allocation sizes are:
  980 19600 392000 784 0

Obviously, not allocating 156 MB of memory is quite efficient. It also
makes me realize that a slower rampup here would probably be a better
test. I guess it also raises some suspicion about the other test results.
Ah well, we need a fix for our GC problems anyway...

Mike Lambert

PS: I'm currently operating under the assumption that these kinds of
emails are a good thing. There will probably be a lot more in this style,
so if you want to change something about how I'm doing this, please let
me know.
Re: [netlabs #642] GC Bench: Collection Pool Bounds
> I've modified his patch to remove some unnecessary calculations.
>
>                                before      after
> gc_alloc_new.pbc               4.155999    3.756002
> gc_alloc_reuse.pbc             16.574      9.423002
> gc_generations.pbc             4.025       5.278002
> gc_header_new.pbc              3.686       3.615
> gc_header_reuse.pbc            5.577999    4.908003
> gc_waves_headers.pbc           3.815002    3.675001
> gc_waves_sizeable_data.pbc     8.383002    9.403999
> gc_waves_sizeable_headers.pbc  5.668       6.268999

Yet another correction to my results... the 'after' benchmarks are from a
completely different build of parrot. Unfortunately, the new results
aren't any easier to explain. Correct results are:

                               before      after
gc_alloc_new.pbc               4.155999    3.836001
gc_alloc_reuse.pbc             16.574      12.318001
gc_generations.pbc             4.025       4.186
gc_header_new.pbc              3.686       4.166
gc_header_reuse.pbc            5.577999    4.345999
gc_waves_headers.pbc           3.815002    3.796001
gc_waves_sizeable_data.pbc     8.383002    7.27
gc_waves_sizeable_headers.pbc  5.668       5.617998

gc_waves_sizeable_data improves by 1.1
gc_header_reuse improves 1.2
gc_alloc_new improves 0.3
gc_alloc_reuse improves 4.2 (as it does for every benchmark)
gc_header_new worsens 0.5

gc_waves_sizeable_data improves because it closely follows the shape of
the curve, instead of allocating lots of extra memory all the time. (Not
sure how closely... depends upon when it runs out of pool memory and
compacts: at the bottom of the curve or at the top.)

Not really sure how to explain the other results, unfortunately. My
imagination has blown its cover and been exposed for what it is, and it's
having some trouble recovering. :)

With these new stats, this patch looks like it *does* provide an
improvement, and so I think this one is worthwhile (although my comments
still stand about looking for an adaptive pool sizing system).

Thanks for bearing with me on this,
Mike Lambert
GC Benchmarking Tests
Hey all,

After finding out that life.pasm only does maybe 1KB per collection, and
Sean reminding me that there's more to GC than life, I decided to create
some pasm files testing specific behaviors.

Attached is what I've been using to test and compare running times for
different GC systems. It's given a list of builds of parrot, a list of
tests to run, and runs each four times and takes the sum of them as the
value for that test. Then it prints out a simple table for comparing the
results. It's not really robust or easily workable in a CVS checkout
(since it operates on multiple parrot checkouts).

Included are eight tests of certain memory behaviors. They are:

gc_alloc_new.pbc
  allocates more and more memory
  checks collection speed, and the ability to grow the heap
gc_alloc_reuse.pbc
  allocates more memory, but discards the old
  checks collection speed, and the ability to reclaim the heap
gc_header_new.pbc
  allocates more and more headers
  checks DOD speed, and the ability to allocate new headers
gc_header_reuse.pbc
  allocates more headers, but discards the old
  checks DOD speed, and the ability to pick up old headers
gc_waves_headers.pbc
  total headers (contain no data) allocated is wave-like
  no data, so collection is not tested
  tests ability to handle wavelike header usage patterns
gc_waves_sizeable_data.pbc
  buffer data (pointed to by some headers) is wave-like
  a few headers, so some DOD is tested
  mainly tests ability to handle wavelike buffer usage patterns
gc_waves_sizeable_headers.pbc
  total headers (and some memory) allocated is wave-like
  sort of a combination of the previous two
  each header points to some data, so it tests the collector's ability
  to handle changing header and small-sized memory usage
gc_generations.pbc
  me trying to simulate behavior which should perform exceptionally well
  under a generational collector, even though we don't have one :)
  each memory allocation lasts either a long time, a medium time, or a
  short time

Please let me know if there are any
other specific behaviors which could use benchmarking, to help compare
every aspect of our GCs. Real-world programs are too hard to come by. :)

Results of the above test suite on my machine comparing my local GC work
and the current parrot GC are coming soon...

Enjoy!
Mike Lambert

PS: If you get bouncing emails from me because my email server is down, I
apologize, and I do know about it. My email server is behind cox's
firewall which prevents port 25 access. It should be relocated and online
again in a few days.

[attachment: gc_bench.zip]
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
> STRING * concat (STRING* a, STRING* b, STRING* c) {
>     PARROT_start();
>     PARROT_str_params_3(a, b, c);
>     PARROT_str_local_2(d, e);
>
>     d = string_concat(a, b);
>     e = string_concat(d, c);
>
>     PARROT_return(e);
> }

Yet more ideas. Woohoo! :)

I considered this kind of approach myself, but discarded it due to the
ton of extraneous code you have to write to do the simplest of things. :(
I'm not sure if the other people have considered it, discarded it, or are
still considering it.

As far as the pros/cons... First, it requires you write in a
pseudo-language to define your local PMC headers and how to return data.
I'm sure the macro freaks that have been scarred by perl5 will jump on
here and beat you down in a few hours or so. :)

Can you provide an implementation of the macros you described above? I
have a few concerns which I'm not sure are addressed. For example:

PARROT_str_local(d)
I'm assuming it puts a reference to d onto the rooted stack. It would
also need to initialize d to NULL to avoid pointing at garbage buffers.

PARROT_str_params_3(a, b, c);
What's the point of this? With rule 5 that prevents function call
nesting, you're guaranteed of all your arguments being rooted. I think
you can lose either the nesting requirement or the str_params
requirement.

PARROT_return(e);
I'm assuming this backs the stack up to the place pointed to by
PARROT_start(), right? This means during a longjmp, the stack won't be
backed up properly until another PARROT_return() is called, somewhere
farther up the chain, right?

Finally, I think Dan has already outlawed longjmp due to problems with
threading, but he'll have to elaborate on that. I agree my most recently
stated approach is not longjmp safe since it could leave neonate set on
certain buffers/pmcs.

Finally, in response to my original post, you asked:

> Suppose your C code builds a nested datastructure. For instance,
> it creates some strings and add them to a hash-table. The hash-table is
> then returned.
> Should it clear the neonate flag of the strings?

I think I'd have to say... don't do that. Ops and functions shouldn't be
building large data structures, imo. Stuff like building large hashes
and/or arrays of data should be done in opcode, in perl code, or whatever
language is operating on parrot.

If you *really* need to operate on a nested datastructure, and you're
going to hold it against my proposal, then there are two options.

a) write code like:

   base = newbasepmc     # neonate pmc
   other = newchildpmc   # also neonate
   base->add(other)      # even if collecting/dod'ing, can't collect above two
   done_with_pmc(other)  # un-neonates it, since it's attached to a root
   repeat...

It works, and then you just need to worry about what to do with your
'base' at the end of the function (to un-neonate it or not).

b) make a done_with_children_of_pmc() style function. It hijacks onto the
tracing functionality inherent in the DOD code, and searches for a
contiguous selection of neonate buffers and pointers emanating from the
pmc we pass in, and un-neonates them, leaving the passed-in pmc neonated.
Since everything we do in the function is neonate, everything we
construct into this base pmc should be contiguously neonate, if that
makes sense. Granted, it's a little bit expensive to do the tracing, but
you shouldn't need to trace too deep at all, and its time is proportional
to the size of the nested data structure you are creating.

Does that help?

Mike Lambert

PS: Oh, and I forgot to mention in my previous proposal the need for
neonating pmc headers, and to look into what functions need to un-neonate
pmc headers. That should be localized to the vtable methods, which are
sort of a mess right now anyway with the transmogrification of vtables,
and have other GC problems.
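Since the mail above asks for an implementation of the macros, here is one plausible sketch built on a shadow stack of rooted slots. Everything here is hypothetical: `STRING`, the global `root_stack`, and the macro bodies are invented to show the mechanics, not the proposed patch's actual code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct STRING STRING;   /* opaque stand-in for Parrot's STRING */

/* A shadow stack of addresses of rooted local STRING* variables; the
 * GC would treat every slot below root_top as part of the root set. */
#define ROOT_STACK_MAX 128
static STRING **root_stack[ROOT_STACK_MAX];
static size_t   root_top = 0;

/* Remember where this function's roots begin. */
#define PARROT_start()      size_t parrot_base_ = root_top

/* Declare a local, NULL it (so GC never sees a garbage pointer),
 * and push its address onto the shadow stack. */
#define PARROT_str_local(v) STRING *v = NULL; root_stack[root_top++] = &(v)

/* Pop this function's roots, then return a value. */
#define PARROT_return(v)    do { root_top = parrot_base_; return (v); } while (0)

/* Usage sketch, mirroring the quoted concat() example. */
static STRING *demo(void)
{
    PARROT_start();
    PARROT_str_local(d);
    PARROT_str_local(e);
    (void)d;               /* d and e would receive string_concat results */
    PARROT_return(e);      /* unwinds both roots, returns NULL here */
}
```

This also makes the longjmp concern in the mail concrete: a longjmp past `demo()` would skip the `root_top = parrot_base_` unwind, leaving stale entries on the shadow stack until some caller's `PARROT_return()` runs.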
Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision
Okay. I have yet another idea for solving our infant mortality problem,
which I think Dan might like. :)

The neonate idea originally was intended to be set on *all* headers
returned by the memory system, and they'd be reset by a clear_neonate op.
At least, that's how I understood it. A straightforward implementation of
the above is about 50% slower than it was before, so I think that rules
this option out.

The current code (without this patch) adds neonate wherever it discovers
that it is needed, and turns it off when it is done. This was quite
efficient, but required the user to constantly think about what functions
could cause GC, etc. It was rather error-prone.

If I understood Dan correctly on IRC yesterday, he was proposing that our
current approach of handling infant mortality everywhere it can occur is
the 'correct' approach. It definitely buys us speed, but as mentioned
above, it's somewhat error prone. The below is an attempt to try and
convince Dan that in lieu of hardcore GC-everywhere programming, there is
a middle ground. I believe we need a middle ground because forcing users
to learn the quirks of our GC system makes parrot programming less fun,
and raises Parrot's barrier to entry.

As I was working on my revised GC system, I came up with a relaxation of
the above that should be easier on programmers, and yet still be fast.
It's not revolutionary by any means, but rather grabs bits and pieces of
different people's solutions.

When you call new_*_header, the neonate flag is automatically turned on
for you. As a programmer writing a function, you explicitly turn off the
neonate flag when you attach it to the root set, or let it die on the
stack. If you return it, you don't do anything, as it becomes the
caller's job to handle. Neonate guarantees that it won't be collected,
avoiding infant mortality. The programmer does not have to explicitly
turn it on. Just turn it off.
From a cursory glance over string.c, only string_concat and
string_compare create strings which die within the scope of that
function, and thus need to be modified.

This approach would complicate many of our string .ops, however. Stuff
like "$1 = s" needs to turn off the neonate flag. Perhaps we can encode
logic into the ops2c converter to turn off the neonate flag for things
that it can detect, or perhaps we can require the user to do it because
automated converters are guaranteed to fail. Core.ops requires a lot of
such modifications, however. Things like err, open, readline, print,
read, write, clone, set, set_keyed, the various string ops (substr, pack,
etc), and savec, all require modification.

I think these guidelines make it easy for non-GC-programmers to write
GC-safe code, since they do not need to be aware of what allocates
memory, and what does not.

What do people think of this approach?

Mike Lambert
Re: Hashtable+GC problems
> > Something about the whole setup just feels wrong. GC shouldn't be
> > this intrusive. And it definitely shouldn't slow down the common case
> > by making the compiler defensively reload a buffer pointer every few
> > instructions (it'll be cached, but it fouls up reordering.)
>
> Alright. Today I discovered tracked headers. :)
>
> What is wrong with these for hashtable buckets? These are headers, and
> so are immobile. You can allocate lots of them without having to pay
> much of a price in terms of instructions.

Well, for one, this isn't the intended use of these tracked headers. From
my recent understanding of how they should work:
- They must be larger than a Buffer, and be a buffer header.
- They will eventually get collected like regular headers.
- They will be DOD'ed in the same manner as regular headers.

All of which combines to mean that the above proposed use for tracked
headers is incorrect. But perhaps there is a place for a small object
allocator? Guidelines for it would be:
- non-moving (non-copying, non-compacting)
- one size per pool
- similar to tracked header support
- headers can be implemented on top of the small object allocator

Hashtables, with a bunch of small buckets with pointers between them,
could be implemented below buffer headers. They would avoid the overhead
of a full buffer header per bucket, but would require the hashtable to
maintain them manually (which the current hashtable code does anyway,
iirc).

Pros:
- not stuck with using headers for everything. Keys could use a small
  object allocator (SOA)
- see any traditional SOA for their advantages over generic memory pools

Cons:
- see previous email... cache coherency
- see previous email... lack of automatic GC

Any others? Thoughts on why we should/shouldn't implement this kind of
thing in parrot, below the buffer level?

Thanks,
Mike Lambert

PS: In case you're confused... yes, I was replying to myself. :)
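The small object allocator proposed above (non-moving, one size per pool) is a classic free-list pool; a minimal sketch follows. This is illustrative only, with invented names (`SmallPool`, `pool_alloc`, etc.), and assumes each object is at least pointer-sized so the free list can be threaded through unused slots.

```c
#include <assert.h>
#include <stddef.h>

/* One pool per object size; slots are never moved, so pointers into
 * them (e.g. hashtable bucket links) stay valid for the slot's life. */
typedef struct SmallPool {
    size_t obj_size;   /* must be >= sizeof(void *) */
    void  *free_list;  /* next free slot, or NULL */
} SmallPool;

void pool_init(SmallPool *p, size_t obj_size)
{
    p->obj_size  = obj_size;
    p->free_list = NULL;
}

/* Hand the pool a raw block; thread its nobjs slots onto the free list. */
void pool_grow(SmallPool *p, void *block, size_t nobjs)
{
    char *slot = block;
    size_t i;
    for (i = 0; i < nobjs; i++, slot += p->obj_size) {
        *(void **)slot = p->free_list;
        p->free_list = slot;
    }
}

void *pool_alloc(SmallPool *p)
{
    void *slot = p->free_list;
    if (slot)
        p->free_list = *(void **)slot;   /* pop */
    return slot;
}

void pool_free(SmallPool *p, void *slot)
{
    *(void **)slot = p->free_list;       /* push back */
    p->free_list = slot;
}

/* Tiny self-check: alloc two distinct slots, free one, get it back. */
int pool_selftest(void)
{
    static void *storage[8];             /* pointer-aligned backing store */
    SmallPool p;
    void *a, *b;
    pool_init(&p, 2 * sizeof(void *));
    pool_grow(&p, storage, 4);
    a = pool_alloc(&p);
    b = pool_alloc(&p);
    pool_free(&p, a);
    return a != NULL && b != NULL && a != b && pool_alloc(&p) == a;
}
```

Allocation and freeing are O(1) pointer pushes and pops, which is why buckets managed this way stay cheap even when a hashtable churns through thousands of them.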
Re: Hashtable+GC problems
> Ok, I'll finish off the original conversion to indexed access that I
> began once, before giving up in disgust. The problem is not just that
> you have to use indices instead of pointers; it's also that you have
> to constantly go back to the buffer header before you can get
> anywhere. That needs to be hidden by a macro or (my preference) an
> inline function, and slows down the common case. Also, you lose the
> clean sentinel value NULL (index 0 is definitely valid; index -1
> introduces signedness problems.)

Dan says it won't be slow. So nyah! :P

> Let me know if you've already started a rewrite, though, so I don't
> just redo it.

Sorry, I forgot to reply earlier... no, I hadn't started work on a
rewrite.

> Something about the whole setup just feels wrong. GC shouldn't be this
> intrusive. And it definitely shouldn't slow down the common case by
> making the compiler defensively reload a buffer pointer every few
> instructions (it'll be cached, but it fouls up reordering.)

Alright. Today I discovered tracked headers. :)

What is wrong with these for hashtable buckets? These are headers, and so
are immobile. You can allocate lots of them without having to pay much of
a price in terms of instructions.

One drawback of tracked headers is the loss of cache coherency over time
as the tracked headers end up getting spread out over memory, and then
large allocations get interspersed into the various holes, with no
locality. Hopefully we can get away with this due to the studies which
have shown that objects tend to live and die in groups (and thus allocate
and free up lots of memory all at once).

Another problem is that these tracked headers aren't DOD'ed at all. This
means you have to explicitly free them with add_to_free_pool. (I'm not
sure what the design of tracked headers is supposed to be. Who is the
'tracked' referring to? Is user code or GC code supposed to be tracking
them?)
Since all of the buckets in your hashtable should be available from the
hash itself, it should be easy to manage them yourself. In addition to
not being DOD'ed themselves, they don't mark other objects as live
themselves. So you'd have to handle all your tracked headers in your PMC,
going through them yourself and handling any buffer/pmcs they might point
at.

Now you have immobile memory that's efficient to allocate, good at
avoiding memory fragmentation, and good for you to do with what you
please.

Once we figure out how hashes are implemented well, we should probably
write up some guidelines on when to use what kinds of headers, etc.

Thoughts?

Mike Lambert
Re: GC design
> Add a counter to the interpreter structure, which is incremented every
> opcode (field size would not be particularly important)
> Store this counter value in every new object created, and set the 'new
> object' flag (by doing this on every object, we remove the requirement
> for the creating function to be aware of what is happening)
> If an object is encountered during DOD that claims to be new, but was
> not created during the current opcode, dispute the claim.
> If the counter has exactly wrapped in the meantime, an object might
> survive longer than it should.

I know Dan's proposed solution has already been committed, but I'd like
to add my support for this. In addition to providing a counter per opcode
to ensure that we don't prematurely GC data, this field could also be
reused to calculate the generation of a particular buffer, to help sort
it into a correct generational pool (if/when we do get that).

Another proposal is to walk the C stack. In runops (or some similar
high-level function) we implement a dummy variable, and store a reference
to it in the interpreter. In do_dod, we create another dummy stack
variable. We then walk the memory byte by byte (or maybe some larger
amount, if that's guaranteed), check to see if it passes 'the three
rules', and then mark the buffer it points to.

This is on the conservative side in that we might accidentally mark
things we shouldn't. Then again, with our registers, it's very possible
to reference old data which the program never bothered to clear, which
also would be overly conservative.

The three rules were (as defined in Jones and Lins' Garbage Collection,
pg 233):

- Does p refer to a heap? (Is it within the low and high marks of all
  the header pools)
- Has the heap block been allocated? (Go through the heaps, and check to
  ensure that this pointer points into one of our header blocks)
- Is the offset a multiple of the object size of that block?
  (So we don't get random memory pointers into the header list, but only
  aligned ones)

As long as the C stack is guaranteed to be contiguous, this should be
portable. I'm not sure if that is guaranteed by ANSI C, however. Has this
already been considered and explicitly rejected?

Thanks,
Mike Lambert
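The three-rule filter above is easy to sketch for a single header block. This is a hypothetical illustration: `HeaderBlock` and `could_be_header` are invented names, and a real conservative scan would apply this test against every block of every pool (which is rule 2's "has the block been allocated" walk).

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* One block of same-sized headers carved from a pool. */
typedef struct HeaderBlock {
    uintptr_t start;     /* first byte of the block's storage */
    size_t    obj_size;  /* size of one header in this block */
    size_t    n_objs;    /* headers allocated from this block */
} HeaderBlock;

int could_be_header(const HeaderBlock *b, const void *p)
{
    uintptr_t addr = (uintptr_t)p;

    /* Rule 1: does p land inside this block's storage at all? */
    if (addr < b->start || addr >= b->start + b->obj_size * b->n_objs)
        return 0;
    /* Rule 2 is the pool/block walk that brought us to this block. */
    /* Rule 3: is p aligned to an object boundary, not mid-object? */
    return (addr - b->start) % b->obj_size == 0;
}

/* Self-check with a fake block: 10 headers of 24 bytes at "address" 1000. */
int three_rules_selftest(void)
{
    HeaderBlock b = { 1000, 24, 10 };
    return  could_be_header(&b, (void *)(uintptr_t)1024)   /* second object */
        && !could_be_header(&b, (void *)(uintptr_t)1025)   /* mid-object */
        && !could_be_header(&b, (void *)(uintptr_t)999)    /* below block */
        && !could_be_header(&b, (void *)(uintptr_t)1240);  /* one past end */
}
```

Rule 3 is what keeps random integers that happen to point into a block from being mistaken for headers; only exact object boundaries pass.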
Re: [netlabs #607] [PATCH] COW strings (again)
> Actually, we don't. (Sez the man catching up on altogether too much
> mail) Since we're putting the COW stuff at the tail end, substrings
> of COW strings are fine. You set the bufstart to where the substring
> starts, buflen set so it goes to the end of the original buffer, and
> the string length bit holds the real string length. That way you
> start where you need to, but you can still find the COW marker off
> the end of the original buffer.
> [End quote]

I see one problem with this kind of approach...

We have two string headers, A and B. A points to the second half of
string B, and its bufstart points into the middle of B. We start the
collection/compacting process. We first find header A, and copy buflen
chars after bufstart. We copied half the string. Now header B is
traversed, and it can't point to the same memory because only half of the
required amount was copied.

We could 'fix' this up by including the true buffer length in the buffer
footer, so that we ignore the header's buflen during the collection
process. But I think the strstart is a better idea regardless. It's what
perl5 did anyway, isn't it?

Mike Lambert
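The strstart alternative can be shown in miniature. This is a simplified, hypothetical layout (`MiniString` is not Parrot's STRING struct): bufstart/buflen always describe the whole shared allocation, so every header agrees about what the collector must copy, and strstart/strlen carry the substring view.

```c
#include <assert.h>
#include <string.h>

typedef struct MiniString {
    char  *bufstart;   /* start of the shared buffer (same for all COW peers) */
    size_t buflen;     /* full length of the shared buffer */
    char  *strstart;   /* where this header's string data begins */
    size_t strlen_;    /* length of this header's string data */
} MiniString;

/* A zero-copy substring: share the buffer, move only strstart. */
MiniString cow_substr(const MiniString *s, size_t offset, size_t len)
{
    MiniString sub = *s;
    sub.strstart = s->strstart + offset;
    sub.strlen_  = len;
    return sub;
}

int cow_selftest(void)
{
    static char buf[] = "hello world";
    MiniString s   = { buf, sizeof buf - 1, buf, sizeof buf - 1 };
    MiniString sub = cow_substr(&s, 6, 5);

    /* Both headers still name the same full allocation, so a compacting
     * collector sees one buffer and copies it consistently, avoiding
     * the half-copied-string problem described above. */
    return sub.bufstart == s.bufstart
        && sub.buflen   == s.buflen
        && strncmp(sub.strstart, "world", sub.strlen_) == 0;
}
```

Contrast with the quoted scheme, where A's bufstart pointing mid-buffer makes A and B disagree about the allocation's extent, which is exactly what trips the collector.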
Re: GC vtable method limitations?
> At 12:06 AM -0400 5/19/02, Mike Lambert wrote:
> >Is there a plan to make a freed method for when a pmc header gets put
> >back onto the free list? (This would require we call this method on
> >all pmc's before moving anything to the freelist, in case of
> >dependencies between pmcs and buffers)
>
> Nope. I don't see a need--once the PMC's been destroyed, it belongs
> to the system.

Um. I see we have a destroy() vtable method, but it's only called when
one calls the destroy() op, and the PMC has PMC_active_destroy_FLAG. I
don't get this. What's the point of actively-destroying things? I thought
since we had a GC, we don't need to worry about this kind of stuff.

I'd argue that destroy() should get called when the PMC gets put back
onto the free list (similar to destructors in C++). That was the behavior
I was documenting when I discussed destruct below, at least.

> Collect's dead, I think. I'm not seeing the point anymore, and since
> we do collect runs through the buffers and not the PMCs, there's no
> place to find what needs calling.

Well, the hashtable could certainly use it. :) There is a hashtable pmc,
which stores a bunch of pointers into some internal buffer data. Every
time functions get called, it calls restore_invariants to fix them up. It
might be better to do those fixups in the collect() method, so that they
could update their internal data pointers.

Or perhaps it should be rewritten to use indices. :)

Mike Lambert
Re: GC design
> >Most vtable methods, and/or people that call vtable methods, will end
> >up making themselves critical. This overhead will be imposed on most
> >function calls, etc. Lots of the string api will require the users to
> >mark themselves as critical.
>
> I don't think this is accurate. People calling vtable methods have no
> need to mark themselves as critical. The things that mark themselves
> critical are internals that are allocating and holding onto objects. I
> think very few vtable methods even fall into this category, but I'd
> have to survey the .pmc files before continuing this discussion.

Perl strings, arrays, and hashes all require buffer manipulation, and
will probably fall prey to this. I agree that I was probably generalizing
a bit, and that in theory PMCs can criticalize their own methods.

> >If I remember correctly, this did get hammered out with a directive
> >from Dan. ;)
>
> I've seen no evidence of that hammering. I still think we are having GC
> crashes on this issue.

I said Dan gave a directive. I didn't say anyone listened to Dan, or
implemented what he suggested. ;)

> >The advantages of this are that nobody needs to worry about the GC
> >implications of their code.
>
> Yes they do, they have to call an explicit routine, clear_uncollectable

Ah, but as internals designers, we don't need to worry about that. We get
to push it into the compiler writers. Isn't it fun? :)

I initially had the same opposition to Dan's idea that you have. I'm not
sure why I eventually gave in... perhaps the lack of any other solution?
Don't really recall. :)

> I'd just like to see someone implement a solution, and give benchmarks
> to back it up.
>
> I'd like to see both approaches compared, personally, and I think
> neither requires a whole lot of thought to implement. I also think we
> should reference existing research and implementations, since we aren't
> the first to do this.

That's true.
I believe most implementations have taken advantage of the ability to
access the C stack, something we don't have the liberty of with our
wide-reaching compatibility goals.

I think this will be similar to the cost of a reference counting solution
versus a tracing system, where the former amortizes the cost over the
entire system, but ends up being slower in terms of total time used. Your
approach would involve lots of computation in lots of little functions
over parrot's execution, whereas Dan's would involve a full trace
(equivalent to a DOD) to be performed every now and again.

But that's just hypothetical posturing because I don't have any real
benchmarks, of course.

Mike Lambert
Dynamic register frames?
I may be approaching semi-radical territory here with this idea. I've
read all the FAQs and reasons on why we chose a register architecture
versus a stack architecture. However, I recently thought of a combination
idea which (although it was probably discovered in the 70s sometime) I
think provides the best of both worlds, and would like to propose it
before I shove it off to the dust-bin.

Problems with register architecture:
- with caller-save, we're saving 0.5KB (4 types * 32 registers * 4 bytes)
  of data per function call, which might add up with deeply-nested
  functions.
- If we want more than 32 elements, we need to start doing stack-pushing
  to get around limitations. One thing that I wanted to do in a regex
  implementation was use the full set of registers to store off certain
  points in the compiled regex. With 32 registers, longer regexes will
  require stack pushing at a certain point, and that will make a certain
  transition of the regex slower than the other portions of the regex.

Problems with stack architecture:
- time spent pushing/popping ops (also happens with parrot registers if
  we use >32 elements)
- stack grows at runtime

Now, my proposal is simply that instead of hardcoding to 32 elements, we
allow the function to determine the number of elements in each register
type. This gives us:
- each function uses a minimal amount of space, since it only uses as
  many registers as it requires
- leaf accessor functions have 0 or 1 registers for the most part, so we
  don't need to allocate a whole new set of registers for them.
  Caller-save is just pushing the current register frame onto the stack,
  and allocating a properly-sized register frame for the current leaf
  function, which is very small. Should be efficient.
- the register stack only gets used for functions. For the most part,
  functions allocate all the space they know they'll need, and that's it.
  (Of course, I suppose there are probably legit reasons for functions to
  use the register stack frames, however.)
- We can use more than 32 elements. This means a 120-node regex can
  allocate 120 int registers for its operation and execution. (Not saying
  that it's 1:1, more like 1:1 for non-greedy regex ops only, but still.)
- No need to worry about register liveness/allocation to make things
  work. We now only need to do it if we want to make things run faster by
  using *less* than one register per C/perl stack variable.

It's probably a bit late in the parrot development cycle to be proposing
an idea like this, but I suppose since this idea couldn't have been
implemented until we had functions which could store register requirement
information, this might actually be a good time to suggest it. :)

Thoughts or comments?

Thanks,
Mike Lambert
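The proposal above can be sketched as a linked chain of variably-sized frames. This is a hypothetical toy, not Parrot's interpreter code: `RegFrame`, `frame_push`, and `frame_pop` are invented, and a real VM would allocate frames from an arena rather than malloc per call. Only int and num register files are shown for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/* Each function declares how many registers of each kind it needs;
 * the call sequence allocates exactly that much. */
typedef struct RegFrame {
    int     n_int;
    int     n_num;
    long   *int_regs;
    double *num_regs;
    struct RegFrame *caller;  /* caller-save is just this link */
} RegFrame;

RegFrame *frame_push(RegFrame *caller, int n_int, int n_num)
{
    RegFrame *f = malloc(sizeof *f);
    f->n_int    = n_int;
    f->n_num    = n_num;
    f->int_regs = calloc((size_t)n_int, sizeof *f->int_regs);
    f->num_regs = calloc((size_t)n_num, sizeof *f->num_regs);
    f->caller   = caller;
    return f;
}

RegFrame *frame_pop(RegFrame *f)
{
    RegFrame *caller = f->caller;
    free(f->int_regs);
    free(f->num_regs);
    free(f);
    return caller;
}

/* Demo: a "120-node regex" frame calling a 1-register leaf function. */
int frame_selftest(void)
{
    RegFrame *outer = frame_push(NULL, 120, 0);
    RegFrame *leaf  = frame_push(outer, 1, 0);
    int ok;
    leaf->int_regs[0] = 42;
    ok = leaf->caller == outer && leaf->n_int == 1 && outer->n_int == 120;
    ok = ok && frame_pop(leaf) == outer;
    frame_pop(outer);
    return ok;
}
```

The caller-save win is visible in the demo: entering the leaf allocates one register's worth of storage instead of copying a fixed 128-register set, at the cost of one allocation per call.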
Re: GC design
> I would like an elegant, easy to use solution for making the GC
> play nicely.

So would we all. :)

> This creates a sliding scope window that GC must not peep through,
> and provides a clean interface for internals writers.

I think you've explained this idea before, but I complained about it
because I thought that the bottom_gen never got set to top_gen, and
figured a lot of stuff would end up permanently allocated. Now that I see
how it works, it seems to make a lot of sense.

Problems with your approach:
- GC-sensitive functions must remember to mark themselves as critical.
  This will be a source of bugs (whether that's a big enough of a
  complaint is up for debate. ;)
- Most vtable methods, and/or people that call vtable methods, will end
  up making themselves critical. This overhead will be imposed on most
  function calls, etc. Lots of the string api will require the users to
  mark themselves as critical.

> Lets hammer this one design issue out for good because I'm tired of
> worrying about it and I think its hindering current Parrot developers
> and confusing potential newcomers.
> If it is not what I propose, lets at least discuss alternatives.

If I remember correctly, this did get hammered out with a directive from
Dan. ;) His approach was:
- the live flag is valid only within GC.
- all newly-allocated headers are marked as uncollectable
- there is a clear-uncollectable op, which iterates over the headers, and
  marks them all as collectable

Basically, you need to have assigned all your headers to something
traceable by the root set before your current op ends.

The advantages of this are that nobody needs to worry about the GC
implications of their code. The disadvantages are:
- very expensive ops can allocate lots of uncollectable headers
- you must explicitly allow for marking headers as collectable in your
  opcode, at strategically placed locations. Otherwise, nothing gets
  collected and you have no DOD results, although collection will still
  occur normally.
Any other contenders to the ring? Anyone have any other major dis/advantages they'd like to contribute about the above approaches? FWIW, I feel confident enough about my understanding of Dan's idea to implement that, should we choose it. Melvin's idea would require that much more work on the multitude of functions, and so I can't imagine it being as easy to implement. :) Mike Lambert
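Dan's scheme as relayed in this thread is small enough to sketch end to end. This is an invented miniature, not the committed Parrot code: the flag name, the fixed-size pool, and `clear_uncollectable` as a plain function are all illustrative.

```c
#include <assert.h>
#include <stddef.h>

enum { FLAG_UNCOLLECTABLE = 1 << 0 };

typedef struct MiniHeader { unsigned flags; } MiniHeader;

#define POOL_SIZE 8
static MiniHeader pool[POOL_SIZE];
static size_t next_free = 0;

/* Every header is uncollectable from the instant it exists, so nothing
 * the op is still building can be swept out from under it. */
MiniHeader *new_header(void)
{
    MiniHeader *h = &pool[next_free++];
    h->flags = FLAG_UNCOLLECTABLE;
    return h;
}

/* The op a compiler emits at a safe point (e.g. between statements),
 * once everything live is reachable from the root set. */
void clear_uncollectable(void)
{
    size_t i;
    for (i = 0; i < next_free; i++)
        pool[i].flags &= ~FLAG_UNCOLLECTABLE;
}

int uncollectable_selftest(void)
{
    MiniHeader *a = new_header();
    MiniHeader *b = new_header();
    int ok = (a->flags & FLAG_UNCOLLECTABLE)
          && (b->flags & FLAG_UNCOLLECTABLE);
    clear_uncollectable();
    return ok && !(a->flags & FLAG_UNCOLLECTABLE)
              && !(b->flags & FLAG_UNCOLLECTABLE);
}
```

The disadvantage noted above is visible here too: until something calls `clear_uncollectable`, every header ever allocated in the current window is pinned, so a very expensive op can accumulate a lot of uncollectable memory.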