Re: GMC for dummies

2005-07-29 Thread Ed Mooring
On Mon, Jul 25, 2005 at 10:33:37PM -0400, Bob Rogers wrote:
[snip]
> 
> This is sounding more and more like the CMUCL gencgc algorithm, which
> uses what I understand is a classic approach.  Instead of an IGP list,
> it write-protects all oldspace pages (hence my earlier question),
> unprotecting them transparently when one is stored into, so that it only
> needs to scan writable pages to look for newspace pointers.  It is my
> intuition that this would be less overhead than an IGP list, but I
> suspect this is data-dependent, and would take benchmarking to prove.
> 

On a POSIX-ish OS, this approach involves a system call to change the
protection on each page, plus a signal handler that gets invoked whenever
such a page is stored into, and then another system call to unprotect the
page.
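
To make that concrete, here's a minimal sketch of the write-barrier dance
that implies on a POSIX-ish system: write-protect a page with mprotect(),
catch the SIGSEGV when something stores into it, mark the page dirty, and
unprotect it so the store can retry. This is just an illustration, not
Parrot or CMUCL code, and it glosses over details such as mprotect() not
being formally async-signal-safe:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *page;                        /* stand-in for an oldspace page  */
static long  pagesize;
static volatile sig_atomic_t page_dirty;  /* set when a write hits the page */

static void barrier_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr < page || addr >= page + pagesize) {
        /* Not our page: a genuine crash, so re-raise with the default
         * handler installed. */
        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);
        return;
    }
    page_dirty = 1;                                    /* remember to rescan */
    mprotect(page, pagesize, PROT_READ | PROT_WRITE);  /* let the store retry */
}

int main(void)
{
    pagesize = sysconf(_SC_PAGESIZE);

    /* A page-aligned region standing in for "oldspace". */
    page = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) { perror("mmap"); return 1; }
    memset(page, 0, pagesize);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = barrier_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    /* "End of a collection": write-protect oldspace. */
    mprotect(page, pagesize, PROT_READ);

    /* A mutator store into oldspace: one syscall, one signal delivery,
     * one more syscall, and then the write goes through. */
    page[42] = 1;

    printf("page_dirty = %d, page[42] = %d\n", (int)page_dirty, page[42]);
    munmap(page, pagesize);
    return 0;
}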

[snip]
> 
> That's OK; if Leo believes it will work, then I'm sure it will.  My
> quibbles were about speed and complexity, and I don't want to distract
> you with unproven assertions about things that might not matter.

System calls aren't cheap, and page table manipulations are not
necessarily cheap either. Whether this performance tradeoff is worth it
is going to be both OS- and processor-specific. It also lurches into the
realm of signal handlers, where POSIX guarantees very little portable
behavior; individual operating systems may allow considerably more, but
the allowed behaviors form an ever-changing and largely disjoint set.

In summary, just about any algorithm that avoids page table manipulations
and signal handlers is likely to be more portable, and will quite likely
be faster.
-- 
Ed M


Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-01 Thread mooring

On Tue, May 29, 2001 at 06:20:40PM -0400, Dan Sugalski wrote:
> 
> I really think we'll win if we have support for at least integers as well 
> as PMCs. There's potentially a lot of integer work that'll be generated by 
> the optimizer and, while integer opcodes might not make the interpreter 
> much faster, they'll speed up TIL and generated C code significantly since 
> they won't need to call opcodes.

How much integer arithmetic does the perl interpreter actually do?

I've profiled quite a few perl5 versions on a lot of different perl
programs, and about the only programs where integer ops made it above
the background noise were things like Ackerman, iterative Fibonacci, and
some Q&D image processing on PBM files. Most of the programs I looked
at didn't do much of any integer arithmetic.

Figuring out where the hot spots are in an interpreter for a general-
purpose programming language is hard. I'd recommend against special-casing
integers in the registers, since it's not clear how much it would buy us.
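
To make the tradeoff concrete, here's a toy sketch (not any actual
proposed Parrot design) of the difference between an integer-register add
and a PMC-register add. The integer case compiles to a bare machine add,
which is the win being claimed for TIL and generated C code; the question
is how often real programs would actually hit it:

#include <stdio.h>

typedef struct PMC PMC;
typedef struct {
    long (*get_int)(PMC *);
    void (*set_int)(PMC *, long);
} PMCVtable;
struct PMC { const PMCVtable *vt; long payload; };

static long scalar_get(PMC *p)         { return p->payload; }
static void scalar_set(PMC *p, long v) { p->payload = v; }
static const PMCVtable scalar_vtable = { scalar_get, scalar_set };

/* Integer-register opcode: one machine add, no calls. */
static void op_add_i(long *ireg, int dst, int a, int b)
{
    ireg[dst] = ireg[a] + ireg[b];
}

/* PMC-register opcode: two indirect calls to fetch, one to store. */
static void op_add_p(PMC **preg, int dst, int a, int b)
{
    long result = preg[a]->vt->get_int(preg[a]) + preg[b]->vt->get_int(preg[b]);
    preg[dst]->vt->set_int(preg[dst], result);
}

int main(void)
{
    long ireg[4] = { 0, 2, 3, 0 };
    PMC  pa = { &scalar_vtable, 2 }, pb = { &scalar_vtable, 3 },
         pc = { &scalar_vtable, 0 };
    PMC *preg[4] = { NULL, &pa, &pb, &pc };

    op_add_i(ireg, 3, 1, 2);
    op_add_p(preg, 3, 1, 2);
    printf("int add: %ld, pmc add: %ld\n", ireg[3], preg[3]->payload);
    return 0;
}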
-- 
Ed Mooring ([EMAIL PROTECTED])



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-01 Thread mooring

On Wed, May 30, 2001 at 12:14:29PM -0400, Uri Guttman wrote:
> >>>>> "NI" == Nick Ing-Simmons <[EMAIL PROTECTED]> writes:
> 
>   NI> The "overhead of op dispatch" is a self-proving issue - if you
>   NI> have complex ops they are expensive to dispatch.
> 
> but as someone else said, we can design our own ops to be as high level
> as we want. lowering the number of op calls is the key. that loop will
> be a bottleneck as it is in perl5 unless we optimize it now.
> 

In my experience, perl opcodes have not been the performance bottleneck
in perl5.

It seems it isn't actually the runops loop that's the bottleneck in perl5
either. I profiled a whole bunch of different perl programs, using a lot
of different versions of perl5, and the runops loop was very rarely among
the top CPU users. Many times the opcode routines themselves weren't the
hot spots either; it was the support routines, like hash key calculation
and lookup, string comparisons, or the regular expression code.

Profiling is almost always counter-intuitive, but if I had to
guess, I'd say that most of the per-opcode cost in perl5 was due to
setup/initialization as each opcode was entered, and that devious/clever
data structure design could avoid most of this. Also, opcode dispatch
might not be the right tree up which to be barking in seeking performance.
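
For reference, here's a schematic of the kind of runops-style dispatch
loop I'm talking about (not the real perl5 code). The loop itself costs
only a few instructions per opcode; in the profiles, the time went into
the work the op bodies call out to:

#include <stdio.h>

typedef struct op OP;
struct op {
    OP *(*pp)(OP *, long *);   /* opcode body: do work, return next op */
    OP  *next;                 /* successor op                         */
    long arg;
};

static OP *pp_const(OP *o, long *acc) { *acc = o->arg;          return o->next; }
static OP *pp_add  (OP *o, long *acc) { *acc += o->arg;         return o->next; }
static OP *pp_print(OP *o, long *acc) { printf("%ld\n", *acc);  return o->next; }

/* The dispatch loop itself: rarely the hot spot in profiles. */
static void runops(OP *o, long *acc)
{
    while (o != NULL)
        o = o->pp(o, acc);
}

int main(void)
{
    OP print_op = { pp_print, NULL,      0  };
    OP add_op   = { pp_add,   &print_op, 4  };
    OP const_op = { pp_const, &add_op,   38 };
    long acc = 0;

    runops(&const_op, &acc);   /* prints 42 */
    return 0;
}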

-- 
Ed Mooring ([EMAIL PROTECTED])



Re: Meta-design

2000-12-10 Thread mooring

On Thu, Dec 07, 2000 at 07:07:27PM +, Simon Cozens wrote:
> On Thu, Dec 07, 2000 at 12:24:50PM +, David Mitchell wrote:
> > In a Perl context, I find it hard to believe that reference counting takes
> > up more than tiny fraction of total cycles.
>  
> On a vaguely related note, here's the flat profile from gprof run
> cumulatively on the test suite. (I haven't seen some hard data like
> this in a while) Freeing SVs does appear to be inexpensive but called
> often. What the *hell* is wrong with modulo?
> 

This isn't directed at Simon, who isn't making any claims about what
his data means. He just offered a convenient opening for me to pass
on some experience.

Relying on isolated measurements of Perl 5 to shape the implementation of
Perl6 is probably not going to be a good idea.

I've run a lot of gprof profiles on Perl5, and one of the things I found
was that the results varied widely from program to program. Any given
program gave consistent results from run to run, but the hot spots
differed from one program to the next.

Worse yet, the results varied widely when I changed compilers. Sun's C
compiler is reputedly a lot better than gcc, but when I tried both with
max optimization on perlbench and some other programs, some benchmarks
ran up to 15% faster with one compiler or the other, yet the perlbench
average was identical. Various attempted micro-optimizations in the Perl
5 source met with roughly the same outcome.

Then I tried checking some of my results on platforms other than Sparc.
Adding x86 and PowerPC made it even more confusing: I got about the same
15% variation, in random directions, across more of the code.

Benchmarking complex programs on multiple platforms is hard. When
the complex program is a programming language itself, it's really
hard. 
-- 
Ed Mooring ([EMAIL PROTECTED])



Re: Profiling

2000-09-04 Thread mooring

On Sat, Sep 02, 2000 at 07:22:08PM +, Nick Ing-Simmons wrote:
> 
> This is from a perl5.7.0 (well the current perforce depot) compiled
> with -pg and then run on a smallish example of my heavy OO day job app.
> 
> The app reads 7300 lines of "verilog" and parses it with (tweaked) Parse-Yapp
> into tree of perl objects, messes with the parse tree and then calls
> a method to write verilog back out again.
> 
> It isn't your typical perl app but it is one I am interested in speeding 
> up. (Maybe even in perl5.)  
> 
> Anyone surprised by the top few entries:

Nope. It looks close to what I saw when I profiled perl 5.004 and 5.005
running over innlog.pl and cleanfeed. The only difference is the method
stuff, since neither of those were OO apps. The current Perl seems to
spend most of its time in the op dispatch loop and in dealing with
internal data structures.

-- 
Ed Mooring ([EMAIL PROTECTED])



Re: RFC 146 (v1) Remove socket functions from core

2000-08-26 Thread mooring

On Fri, Aug 25, 2000 at 09:12:19AM -0400, Dan Sugalski wrote:
> At 10:08 PM 8/24/00 -0600, Nathan Torkington wrote:
> >Isn't dynamic loading really slow?
> 
> Not particularly, at least not as far as I know. There's some extra cost in 
> finding the library and loading it that you wouldn't pay if you were linked 
> directly to it, but AFAIK that's about it.

Dynamic loading can be noticeably slow if you are loading something
via NFS. In addition, the PIC code and jump tables used for dynamic
linking result in a 10-15% slowdown in execution speed on SunOS and
Solaris (at least in my experiments). Not what I'd call really slow, but
we've complained vigorously about smaller slowdowns.
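
For what it's worth, here's a minimal dlopen()/dlsym() sketch of the
runtime loading in question. The library search and mapping is the part
that hurts over NFS, and every call into the loaded object goes through
the PIC/PLT indirection mentioned above. (libm.so.6 is just a convenient
example library here; you'd link with -ldl on most systems.)

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* Finding and mapping the library is the part that hurts over NFS. */
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Symbol lookup, then an indirect call through the returned pointer;
     * statically linked code would have resolved this at link time. */
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cosine == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    printf("cos(0.0) = %f\n", cosine(0.0));
    dlclose(handle);
    return 0;
}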
-- 
Ed Mooring