Re: GMC for dummies
On Mon, Jul 25, 2005 at 10:33:37PM -0400, Bob Rogers wrote:
[snip]
>
> This is sounding more and more like the CMUCL gencgc algorithm, which
> uses what I understand is a classic approach. Instead of an IGP list,
> it write-protects all oldspace pages (hence my earlier question),
> unprotecting them transparently when one is stored into, so that it only
> needs to scan writable pages to look for newspace pointers. It is my
> intuition that this would be less overhead than an IGP list, but I
> suspect this is data-dependent, and would take benchmarking to prove.
>

On a POSIX-ish OS, this approach involves a system call to change the
protection on each page, plus a signal handler that gets invoked whenever
such a page is stored into, and then another system call to unprotect the
page.

[snip]
>
> That's OK; if Leo believes it will work, then I'm sure it will. My
> quibbles were about speed and complexity, and I don't want to distract
> you with unproven assertions about things that might not matter.

System calls aren't cheap, and page table manipulations are not
necessarily cheap either. Whether this performance tradeoff is worth it
is going to be both OS- and processor-specific. It also lurches into the
realm of signal handlers, where POSIX guarantees very little portable
behavior; operating systems may allow much more, but the allowed
behaviors form an ever-changing and largely disjoint set.

In summary, just about any algorithm that avoids page table manipulations
and signal handlers is likely to be more portable, and will quite likely
be faster.
-- 
Ed M
Re: Stacks, registers, and bytecode. (Oh, my!)
On Tue, May 29, 2001 at 06:20:40PM -0400, Dan Sugalski wrote:
>
> I really think we'll win if we have support for at least integers as well
> as PMCs. There's potentially a lot of integer work that'll be generated by
> the optimizer and, while integer opcodes might not make the interpreter
> much faster, they'll speed up TIL and generated C code significantly since
> they won't need to call opcodes.

How much integer arithmetic does the perl interpreter actually do? I've
profiled quite a few perl5 versions on a lot of different perl programs,
and about the only programs where integer ops made it above the
background noise were things like Ackermann, iterative Fibonacci, and
some Q&D image processing on PBM files. Most of the programs I looked at
didn't do much integer arithmetic at all.

Figuring out where the hot spots are in an interpreter for a
general-purpose programming language is hard. I'd recommend against
special cases in the registers, since it's not clear how much they'd
help.
-- 
Ed Mooring ([EMAIL PROTECTED])
Re: Stacks, registers, and bytecode. (Oh, my!)
On Wed, May 30, 2001 at 12:14:29PM -0400, Uri Guttman wrote:
> >>>>> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes:
>
>   NI> The "overhead of op dispatch" is a self-proving issue - if you
>   NI> have complex ops they are expensive to dispatch.
>
> but as someone else said, we can design our own ops to be as high level
> as we want. lowering the number of op calls is the key. that loop will
> be a bottleneck as it is in perl5 unless we optimize it now.

In my experience, the opcode dispatch loop has not been the performance
bottleneck in perl5. I profiled a whole bunch of different perl programs,
using a lot of different versions of perl5, and the runops loop was very
rarely among the top CPU users. Many times, the opcode routines
themselves weren't the hot spots either. It was the support routines
like hash key calculation and lookup, string comparisons, or the regular
expression code.

Profiling is almost always counter-intuitive, but if I had to guess, I'd
say that most of the per-opcode cost in perl5 was due to
setup/initialization as each opcode was entered, and that devious/clever
data structure design could avoid most of this. Also, opcode dispatch
might not be the right tree up which to be barking in seeking
performance.
-- 
Ed Mooring ([EMAIL PROTECTED])
Re: Meta-design
On Thu, Dec 07, 2000 at 07:07:27PM +0000, Simon Cozens wrote:
> On Thu, Dec 07, 2000 at 12:24:50PM +0000, David Mitchell wrote:
> > In a Perl context, I find it hard to believe that reference counting takes
> > up more than a tiny fraction of total cycles.
>
> On a vaguely related note, here's the flat profile from gprof run
> cumulatively on the test suite. (I haven't seen some hard data like
> this in a while) Freeing SVs does appear to be inexpensive but called
> often. What the *hell* is wrong with modulo?

This isn't directed at Simon, who isn't making any claims about what his
data means. He just offered a convenient opening for me to pass on some
experience: relying on isolated measurements of Perl 5 to shape the
implementation of Perl 6 is probably not going to be a good idea.

I've run a lot of gprof's on Perl 5, and one of the things I found was
that the results varied widely from program to program. Any given
program had consistent results, but the hot spots in each program were
different. Worse yet, the results varied widely when I changed
compilers. Sun's C compiler is reputedly a lot better than gcc, but when
I tried both with max optimization on perlbench and some other programs,
some benchmarks ran up to 15% faster with one compiler or the other, yet
the perlbench average was identical. Various attempted
micro-optimizations in the Perl 5 source were met with roughly identical
results.

Then I tried checking some of my results on platforms other than Sparc.
Adding x86 and PowerPC made it even more confusing: I got about the same
15% variation, in random directions, across more code.

Benchmarking complex programs on multiple platforms is hard. When the
complex program is itself a programming language, it's really hard.
-- 
Ed Mooring ([EMAIL PROTECTED])
Re: Profiling
On Sat, Sep 02, 2000 at 07:22:08PM +0000, Nick Ing-Simmons wrote:
>
> This is from a perl5.7.0 (well the current perforce depot) compiled
> with -pg and then run on a smallish example of my heavy OO day job app.
>
> The app reads 7300 lines of "verilog" and parses it with (tweaked) Parse-Yapp
> into a tree of perl objects, messes with the parse tree and then calls
> a method to write verilog back out again.
>
> It isn't your typical perl app but it is one I am interested in speeding
> up. (Maybe even in perl5.)
>
> Anyone surprised by the top few entries:

Nope. It looks close to what I saw when I profiled perl 5.004 and 5.005
running over innlog.pl and cleanfeed. The only difference is the method
stuff, since neither of those was an OO app. The current Perl seems to
spend most of its time in the op dispatch loop and in dealing with
internal data structures.
-- 
Ed Mooring ([EMAIL PROTECTED])
Re: RFC 146 (v1) Remove socket functions from core
On Fri, Aug 25, 2000 at 09:12:19AM -0400, Dan Sugalski wrote:
> At 10:08 PM 8/24/00 -0600, Nathan Torkington wrote:
> >Isn't dynamic loading really slow?
>
> Not particularly, at least not as far as I know. There's some extra cost in
> finding the library and loading it that you wouldn't pay if you were linked
> directly to it, but AFAIK that's about it.

Dynamic loading can be noticeably slow if you are loading something via
NFS. In addition, the PIC code and jump tables used for dynamic linking
result in a 10-15% slowdown in execution speed on SunOS and Solaris (at
least in my experiments). Not what I'd call really slow, but we've
complained vigorously about smaller slowdowns.
-- 
Ed Mooring