Re: Transferring control between code segments, eval, and suchlike things

2003-01-22 Thread Benjamin Stuhl
At 03:00 PM 1/22/2003 -0500, you wrote:

Okay, since this has all come up, here's the scoop from a design perspective.

First, the branch opcodes (branch, bsr, and the conditionals) are all 
meant for movement within a segment of bytecode. They are *not* supposed 
to leave a segment. To do so was arguably a bad idea, now it's officially 
an error. If you need to do so, branch to an op that can transfer across 
boundaries.

Design Edict #1: Branches, which is any transfer of control that takes an 
offset, may *not* escape the current bytecode segment.

Seems reasonable. Especially when the bytecode loader may not guarantee 
the relative placement of segments (think mmap()). Although,
all this would seem to suggest that we'd need/want a special-purpose 
allocator for bytecode segments, since every sub has to fit within precisely
one segment (and I know _I'd_ like to keep bytecode segments on their own 
memory pages, to e.g. maximize sharing on fork()).
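
A rough sketch of what Edict #1 could look like in a loader or
interpreter, written in Python for brevity. The Segment class, the
BranchError name and the op list are all made up for illustration; none
of them are Parrot APIs.

```python
class BranchError(Exception):
    """Raised when a relative branch would leave its segment."""


class Segment:
    """A hypothetical bytecode segment: a flat list of op words."""

    def __init__(self, ops):
        self.ops = list(ops)

    def branch_target(self, pc, offset):
        """Return pc + offset, refusing any target outside this segment."""
        target = pc + offset
        if not 0 <= target < len(self.ops):
            raise BranchError(
                f"branch from {pc} by {offset} leaves a segment of "
                f"{len(self.ops)} ops"
            )
        return target


seg = Segment(["set", "add", "branch", "end"])
print(seg.branch_target(2, -2))      # target stays inside the segment
try:
    seg.branch_target(2, 5)          # target would escape the segment
except BranchError as err:
    print("error:", err)
```

Because the check only needs the segment's own length, it does not care
where the loader happened to place the segment in memory.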

Next, jumps. Jumps take absolute addresses, so either need fixup at load 
time (blech), are only valid in dynamically generated code (okay, but 
limiting), or can only jump to values in registers (that's fine). Jumps 
aren't a problem in general.

Fixups aren't so bad if we make the jump opcode itself take an index into a 
table of fixups (thus letting the bytecode stream stay read-only). Register 
jumps are dangerous, since parrot can't control what the user code loads 
into the register (while we can theoretically protect the fixup table from 
anything short of native code).
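
To make the fixup-index idea concrete, here is a minimal Python sketch.
The op encoding, the load() helper and the addresses are assumptions for
illustration only: the bytecode is an immutable tuple, and only the fixup
table is filled in at load time.

```python
def load(base_address, relative_targets):
    """Build the fixup table once, when the segment is loaded."""
    return tuple(base_address + rel for rel in relative_targets)


def run_jump(bytecode, fixups, pc):
    """Execute one hypothetical ('jump', fixup_index) op; return the new pc."""
    op, index = bytecode[pc]
    if op != "jump":
        raise ValueError(f"not a jump op at {pc}: {op}")
    # The operand is an index, never an address, so the bytecode itself
    # never has to be patched and can stay read-only (or shared).
    return fixups[index]


bytecode = (("jump", 0), ("noop", None), ("jump", 1))
fixups = load(0x4000, [0x10, 0x20])
print(hex(run_jump(bytecode, fixups, 0)))
```

An out-of-range index fails with an IndexError here, which is the kind of
check a register jump cannot get for free.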

Design Edict #2: Jumps may go anywhere.

Destinations. These are a pain, since if we can go anywhere then the JIT 
has to do all sorts of nasty and unpleasant things to compensate, and to 
make every op a valid destination. Yuck.

Design Edict #3: All destinations *must* be marked as such in the bytecode 
metadata segment. (I am officially nervous about this, as I can see a 
number of ways to subvert this for evil)

Marked destinations are very important; as for evil subversion, how about 
just saying untrusted code only gets pure interpretation, and the 
untrusting interpreter bounds-checks everything?
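
One way the "untrusting interpreter" check might look, as a Python
sketch. The function name and the MARKED set are hypothetical; the real
metadata segment format is not defined anywhere in this thread.

```python
def checked_jump(target, segment_length, marked_destinations):
    """Validate a jump target the way a cautious interpreter might."""
    if not 0 <= target < segment_length:
        raise ValueError(f"target {target} is outside the segment")
    if target not in marked_destinations:
        raise ValueError(f"target {target} is not a marked destination")
    return target


# Hypothetical metadata: op 0 is the first op of a sub (always valid per
# Edict #6), op 7 is some other marked destination.
MARKED = frozenset({0, 7})
print(checked_jump(7, 10, MARKED))
```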

[snip]
Calling actual routines--subs, methods, functions, whatever--at the high 
level isn't done with branches or jumps. It is, instead, done with the 
call series of ops. (call, callmeth, callcc, tailcall, tailcallmeth, 
tailcallcc (though that one makes my head hurt), invoke) These are 
specifically for calling code that's potentially in other segments, and to 
call into them at fixed points. I think these need to be hashed out a bit 
to make them more JIT-friendly, but they're the primary transfer 
destination point

Design Edict #6: The first op in a sub is always a valid 
jump/branch/control transfer destination

Wouldn't make much sense if you had a sub but couldn't call it, now would 
it? :-D

Now. Eval. The compile opcode going in is phenomenally cool (thanks, Leo!) 
but has pointed out some holes in the semantics. I got handwavey and, 
well, it shows. No cookie for me.

The compreg op should compile the passed code in the language that is 
indicated and should load that bytecode into the current interpreter. That 
means that if there are any symbols that get installed because someone's 
defined a sub then, well, they should get installed into the interpreter's 
symbol tables.

Compiled code is an interesting thing. In some cases it should return a 
sub PMC, in some cases it should execute and return a value, and in some 
cases  it should install a bunch of stuff in a symbol table and then 
return a value. These correspond to:


   eval "print 12";

   $foo = eval "sub bar{return 1;}";

   require "foo.pm";

respectively. It's sort of a mixed bag, and unfortunately we can't count 
on the code doing the compilation to properly handle the semantics of the 
language being compiled. So...

Design Edict #7: the compreg opcode will execute the compiled code, 
calling in with parrot's calling conventions. If it should return 
something, then it had darned well better build it and return it.

How does this play with

eval 'sub bar { change_foo(); } BEGIN { bar(); }  (...stuff that depends on 
foo...)';

? The semantics of BEGIN{} would seem to require that bar be installed into 
the symbol table immediately... but then how do we reproduce that if we're 
e.g. loading
precompiled bytecode?

Oh, and:

Design Edict #8: compreg is prototyped. It takes a single string and must 
return a single PMC. The compiler may cheat as need be. (No need to check 
and see if it returned a string, or an int)

Yes, this does mean that for plain assembly that we want to compile and 
return a sub ref for we need to do extra in the assembly we pass in. 
Tough, we can deal. If it was dead-simple it wouldn't be assembly. :)

That makes sense.

-- BKS




interpreter passing (was Re: Large string patch)

2002-01-01 Thread Benjamin Stuhl

--- Dan Sugalski [EMAIL PROTECTED] wrote:
 At 07:30 AM 12/30/2001 -1000, David & Lisa Jacobs wrote:
 
 From: Dan Sugalski [EMAIL PROTECTED]
   At 08:33 PM 12/29/2001 -1000, David & Lisa Jacobs
 wrote:
   GC will manage all the memory. Everything managed
 should either be hung
 off
   a PMC or an internal structure. (There are GC hooks
 in the vtable for
   complex things)
 
 So does that mean I can get rid of passing around the
 interpreter?
 
 Sort of. Memory and structure (pmc header & string
 header) allocation must 
 be from interpreter-local pools. There's a patch to use
 TLS for the 
 interpreter pointer rather than passing it as an
 argument--I've pretty much 
 decided it's The Way To Go, so I'm going to dig it out
 and apply it.
 
 So you still need the interpreter pointer, you just don't
 have to pass it.

Are you really sure about this? The reason perl5 threads
are MULTIPLICITY-based (pass around an interpreter pointer)
is that Sarathy got a noticeable speedup from not having to
call pthread_getspecific() every time he needed to allocate
memory or look up a symbol. It can be good to have
_nocontext functions that fetch the interpreter when
it's really needed (e.g. to throw an error), but do we want
to have to make an extra library call of unknown efficiency
on _every_ call to string_make()?

-- BKS




Re: Request for comments

2001-12-26 Thread Benjamin Stuhl

--- Dan Sugalski [EMAIL PROTECTED] wrote:
 At 08:03 PM 12/18/2001 -0800, Benjamin Stuhl wrote:
 --- Melvin Smith [EMAIL PROTECTED] wrote:
   3) Perl IO has conditional compilation for using
 stdio.
   Dan has said no
   STDIO
but are we going to abandon conditional support
 for
   Parrot?
(I vote for ditching conditional STDIO support
   because then its easier
 to stop thinking in STDIO terms...)
 
 Unfortunately, I don't think we can completely do
 without
 stdio in parrot for one reason: miniparrot. Without
 doing a
 full configure.pl run, the _only_ I/O API we're
 guaranteed
 is a basic stdio.
 
 Ah, but we can abandon stdio completely, unless you file
 read/write in with 
 stdio. Which is reasonable, but I don't generally count
 them.

I was going to argue that unix-ish
open()/read()/write()/close() aren't portable, but even
Win32's runtime provides an emulation of that much (as does
VMS's). Are there any platforms that don't provide this
API? (I argued for stdio because it's in the ANSI spec, and
so _must_ be there, as opposed to us simply assuming it'll
be there.)
 
   I want comments now or else I threaten to post
 replies to
   myself in a
   creepy third
   person way.
 
 No! Anything but that. BKS hates things like that!
 
 Suckered into those freshmen harmless Psych 101
 experiments, I see :)

Way too many friends in Prof. Moss's cult^Hclass... ;-)

-- BKS




Re: Hello? Win32 on fire?

2001-12-12 Thread Benjamin Stuhl

--- Andy Dougherty [EMAIL PROTECTED] wrote:
 One idiom which might work is
 
   cd foo && $(MAKE)
 
 Since lines in makefiles are handed off to the native
 shell, this will
 be dependent upon the user's native shell.  I don't know
 any details,
 but I gather the various shells in Win95, Win98, WinNT,
 and WinXP are
 not necessarily identical.  I *think* the above idiom
 works in NT, but
 not Win98.  I'm hopeful it will work in XP.
 (Of course, the user may well have installed a different
 command
 shell, in which case who knows what will happen.)

It should work, IIRC, since XP's shell is the latest
version of cmd.exe, the NT shell.
 
 If the user is using dmake, but is stuck with Win95(?)'s
 command.com,
 then he or she can still use perl5's win32/genmk95.pl. 
 Here are the
 comments from it:
 
 # genmk95.pl - uses miniperl to generate a makefile that
 command.com will
 #  understand given one that cmd.exe will
 understand
 
 # Author: Benjamin K. Stuhl
 # Date: 10-16-1999
 
 # how it works:
 #dmake supports an alternative form for its recipes,
 called group
 #recipes, in which all elements of a recipe are run
 with only one shell.
 #This program converts the standard dmake makefile.mk
 to one using group
 #recipes. This is done so that lines using && or ||
 (which command.com
 #doesn't understand) may be split into two lines that
 will still be run
 #with one shell.
 
 We would need permission from the author to include the
 script itself in
 parrot.

Consider permission granted, but I probably won't get
around to rigging it up to work with parrot myself. (I
finally got a newer system with Win2k, so I don't need to
arm-wrestle command.com anymore, woohoo!)

Of course, this idiom won't AFAIK work on VMS, but that
shouldn't surprise anyone.

-- BKS




Re: Key stuff for aggregates

2001-12-05 Thread Benjamin Stuhl

--- Dan Sugalski [EMAIL PROTECTED] wrote:
 At 10:28 AM 12/5/2001 -0500, Jason Gloudon wrote:
 Using the aggregate's vtable is another way of getting
 the job done that 
 avoids all the extra reference PMCs. However, references
 will have to be 
 supported.
 
 References are interesting. I'm currently thinking that:
 
 *) PMCs should have a get_reference vtable entry
 *) Accessing a reference should be just like accessing
 the referent. (i.e. 
 you pass in the same key stuff and the reference vtable
 does the indirect 
 lookup for you)
 *) Some references will need to be 'smart', so if you do:
 
 $foo = \@bar[4];
 
 and @bar's a packed array, $foo's actually a fancy ref
 that knows it points 
 to @bar[4] and calls @bar's vtables when you access it.
 Or something like that.

This looks interesting, as far as it goes, but how will
parrot support the perl5ish 

use overload 
   '@{}' => \&deref_as_array,
   '%{}' => \&deref_as_hash;

? Do we pass in the PMC type that we want it to come back
as? But then how do we tell between the various types of
e.g. arrays. (To put it simply, how do you say "I want an
array. No, I don't _care_ if it's a PerlPMCArray or a
PerlIntArray or a PerlWhatHaveYouArray!"?)

-- BKS




Re: Opcode numbers

2001-11-03 Thread Benjamin Stuhl

--- Gregor N. Purdy [EMAIL PROTECTED] wrote:
 Brian --
 
   None of these are issues with the approach I've been
 working on /
   advocating. I'm hoping we can avoid these altogether.
   
  
  I think this is a cool concept, but it seems like a lot
 of overhead with
  the string lookups.  
 
 I'm hoping we can keep the string lookups in order to
 sidestep the
 versioning issue. They can be made pretty cheap with a
 hashtable or search
 tree, and the lookups only happen once when we load. And,
 we may even be
 able to create the tree or hash table structure as part
 of the oplib.so,
 so we don't even have to pay to construct it at run time.
 I guess I'm
 making the provisional assumption that by the time we go
 out and
 dynamically load the oplib, a few op lookups by name
 won't be too big a
 deal if we are smart about it. Of course, I could be
 wrong, but I'd like
 to see it in action before passing judgement on it.
[snip]

Better than doing two string lookups for every op we use
(library, op_name), we can vector the library through
the fixup section. This is sort of how I at least envision
accessing global variables: the fixup has an entry
(PAR_FIXUP_GLOBVAR, strtab_ref($foo)), where strtab_ref()
is the index of a string in the string table. So loading
another oplib becomes as simple as (PAR_FIXUP_OPLIB,
core). The individual op descriptors then simply
reference the fixup for that library, which after fixup
contains the global index of the library. Actually, if
libraries are good about not reusing op numbers, we don't
have to do _any_ string lookups. Since we're building each
module's op table at load time anyway, we don't lose any
cache space by not reusing op numbers, since unused ops
will never show up in any module's table.

e.g.

.use core
.use perl
set I0, 2
set I1, 3
add I3, I0, I1
fetch P0, $foo
inc P0

produces

# .section .fixup
PAR_FIXUP_OPLIB, 1
PAR_FIXUP_OPLIB, 2
PAR_FIXUP_OPTABLE_SIZE, 4  # put this here so .optable can
                           # be processed w/o any special
                           # cases and we can prealloc the
                           # table

PAR_FIXUP_VARREF, 3        # this becomes the pointer to the
                           # entry in the symbol table (not
                           # the variable itself - its slot
                           # in the table so that aliasing
                           # works right)

# .section .strtab
core
perl
$foo

# .section .optable
1, 54 # core::set_i_ic
1, 33 # core::add_i_i_i
2, 1  # perl::fetch_p_ic
2, 23 # perl::inc_p

# .section .text
# numbers are the actual opcode/operand values
1 0 2
1 1 3
2 3 0 1
3 0 4
4 0

By using this scheme we manage to not do _any_ string
lookups, and if we pick our base set of ops well enough
that we don't end up obsoleting many of them, we also won't
be using as much memory as the string lookups would
require.
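
The load-time resolution described above can be sketched in Python as
follows. The section contents mirror the example; the OPLIBS dictionary
and build_op_table() are stand-ins invented for this sketch, and the
1-based indexing is an assumption read off the example's numbers.

```python
# Hypothetical loaded op libraries: op number -> handler (named by a
# string here so the result is easy to inspect).
OPLIBS = {
    "core": {54: "core::set_i_ic", 33: "core::add_i_i_i"},
    "perl": {1: "perl::fetch_p_ic", 23: "perl::inc_p"},
}


def build_op_table(strtab, oplib_fixups, optable):
    """Resolve every (library fixup, op number) pair once, at load time."""
    # One name lookup per library, not one per op.
    libs = {
        fixup: OPLIBS[strtab[str_index - 1]]
        for fixup, str_index in oplib_fixups.items()
    }
    # Opcode numbers in .text are 1-based indexes into this list.
    return [libs[lib_fixup][op_number] for lib_fixup, op_number in optable]


strtab = ["core", "perl", "$foo"]
oplib_fixups = {1: 1, 2: 2}        # fixup entry -> string table index
optable = [(1, 54), (1, 33), (2, 1), (2, 23)]

table = build_op_table(strtab, oplib_fixups, optable)
print(table[1 - 1])                # handler for opcode 1 in .text
```

After this step the runloop only ever indexes into the per-module table;
the library names are touched once per .use line.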

-- BKS




Re: Revamping the build system

2001-10-10 Thread Benjamin Stuhl

--- Dan Sugalski [EMAIL PROTECTED] wrote:
 Okay, I think it's time to abstract out how the build
 system's handled a 
 bit. I'm not sure how much we need, but filling in a
 template makefile's 
 not going to cut it, I think.
 
 We've a couple of things we need to do generically:
 
 *) Compile C code to an object module and put that module
 in a library

We'll also need to be able to apply specific compiler
options to specific source files from at least the
platform-specific hints files (e.g. on platforms whose
optimizer breaks regexec.c).

-- BKS




Re: [PATCH] Big patch to have DO_OP as optional switch() statment

2001-10-09 Thread Benjamin Stuhl

--- Paolo Molaro [EMAIL PROTECTED] wrote:
[snip, snip]
 The problem here is to make sure we really need the
 opcode swap
 functionality, it's really something that is going to
 kill
 dispatch performance.
 If a module wants to change the meaning of, eg the +
 operator,
 it can simply request the compiler to insert a call to a
 subroutine, instead of changing the meaning assigned to
 the
 VM opcode. The compiler is free to inline the sub, of
 course,
 just don't cripple the normal case with unnecessary
 overhead
 and let the special case pay the price of flexibility.
 Of course, if the special case is not so special, a _new_
 opcode can be introduced, but there is really no reason
 to
 change the meaning of an opcode on the fly, IMHO.
 Comment, or flame, away.

Unfortunately, compiler tricks only work at compile time.
They're great for static languages like C++ or C#, but Perl
supports doing

%CORE::GLOBAL::{'print'} = \&myprint;

at _runtime_. This is much too late to be going back and
patching up any occurrences of print_p in the opstream, so
we need a level of indirection on every overridable opcode.
(Note that the _overloadable_ ones like the math routines
don't need that level of indirection - they get it by
vectoring through the PMC vtables).
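
A small Python sketch of that level of indirection. OVERRIDABLE,
op_print() and the handler signatures are invented for illustration; the
point is only that the opstream keeps calling the same op while the slot
behind it changes at run time.

```python
def builtin_print(text, out):
    """Default behaviour for the hypothetical print op."""
    out.append(text)


# One slot per overridable op; the opstream never refers to the handler
# directly, only to the slot.
OVERRIDABLE = {"print": builtin_print}


def op_print(text, out):
    """What the compiled code calls: look up the slot, then dispatch."""
    OVERRIDABLE["print"](text, out)


def myprint(text, out):
    """A run-time replacement, installed after compilation."""
    out.append(text.upper())


out = []
op_print("a", out)                 # uses the builtin
OVERRIDABLE["print"] = myprint     # the override happens at run time
op_print("b", out)                 # same op, new behaviour
print(out)
```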

-- BKS




RE: Parrot 0.0.2

2001-10-03 Thread Benjamin Stuhl

--- Brent Dax [EMAIL PROTECTED] wrote:
 
 
 --Brent Dax
 [EMAIL PROTECTED]
 Configure pumpking for Perl 6
 
 They *will* pay for what they've done.
 
 # -Original Message-
 # From: Simon Cozens [mailto:[EMAIL PROTECTED]]
 # Sent: Wednesday, October 03, 2001 09:51
 # To: Brent Dax
 # Cc: [EMAIL PROTECTED]
 # Subject: Re: Parrot 0.0.2
 #
 #
 # OK, let's try and clear this up.
 #
 # On Wed, Oct 03, 2001 at 09:39:32AM -0700, Brent Dax
 wrote:
 #  #  got: 'Seem to have negative Nx
 #  not ok
 #  '
 #  # expected: 'Seem to have negative Nx
 #  Seem to have positive Nx after pop
 #  '
 #
 # Don't know what's going on here.
 #
 #  t/op/string.NOK 4# Failed test
 (Parrot/Test.pm at line 74)
 #  #  got: 'Error: Control left bounds of
 byte-code
 # block (now at
 #  location
 #  31)!
 #
 # There isn't an end on that test. Fixed.
 #
 #  #  got: 'failure
 #  '
 #
 # Since there was no other output, this failed:
 # timeI0
 # ge  I0, 0, OK1
 #
 # Now that's anyone's guess.
 #
 # I've added some debugging prints, can you try a resync?

I resynced at 3:05pm EDT today, and I'm seeing the same
errors (Pentium III, Win2k):

C:\parrot>perl t/harness
t/op/basic..ok
t/op/bitwiseok
t/op/integerok
t/op/number.ok
t/op/stacks.NOK 5# Failed test (Parrot/Test.pm
at line 74)
#  got: 'Seem to have negative Nx
not ok
'
# expected: 'Seem to have negative Nx
Seem to have positive Nx after pop
'
t/op/stacks.ok 9/9# Looks like you failed 1 tests
of 9.
t/op/stacks.dubious
Test returned status 1 (wstat 256, 0x100)
DIED. FAILED test 5
Failed 1/9 tests, 88.89% okay (-3 skipped tests: 5
okay, 55.56%)
t/op/string.ok, 1/11 skipped: TODO: printing empty
string reg segfaults
t/op/time...NOK 2# Failed test (Parrot/Test.pm
at line 74)
#  got: 'failure
'
# expected: 'ok, (!= 1970) Grateful Dead not
ok, (nowbefore) timelords need not apply
'
# Looks like you failed 1 tests of 2.
t/op/time...dubious
Test returned status 1 (wstat 256, 0x100)
DIED. FAILED test 2
Failed 1/2 tests, 50.00% okay
t/op/trans..ok
Failed Test    Status Wstat Total Fail  Failed  List of Failed

t/op/stacks.t       1   256     9    1  11.11%  5
t/op/time.t         1   256     2    1  50.00%  2
4 subtests skipped.
Failed 2/8 test scripts, 75.00% okay. 2/100 subtests failed, 98.00% okay.

-- BKS




Re: SV: Parrot multithreading?

2001-09-29 Thread Benjamin Stuhl

--- Alan Burlison [EMAIL PROTECTED] wrote:
 
   or have entered a mutex,
  
  If they're holding a mutex over a function call without
 a
  _really_ good reason, it's their own fault.
 
 Rubbish.  It is common to take out a lock in an outer
 functions and then
 to call several other functions under the protection of
 the lock.

Let me be more specific: if you're holding a mutex over a
call back into parrot, it's your own fault. Parrot itself
knows which functions may croak() and which won't, so it
can use utility functions that return a status in places
where it'd be unsafe to croak(). (And true panics probably
should not be croak()s the way they are in perl5 - there's
not much an application can do with "Bizarre copy of
ARRAY".)
 
The alternative is that _every_ function simply
 return
   a status, which
is fundamentally expensive (your real retval has to
 be
   an out
parameter, to start with).
 
 Are we talking 'expensive in C' or 'expensive in parrot?'

Expensive in C (wasted memory bandwidth, code bloat -
cache waste), which translates to a slower parrot.

  It is also slow, and speed is priority #1.
 
 As far as I'm aware, trading correctness for speed is not
 an option.

This is true, which is why I asked if there were any
platforms that have a nonfunctional (set|long)jmp.

-- BKS




Re: SV: Parrot multithreading?

2001-09-28 Thread Benjamin Stuhl

Thus did the Illustrious Dan Sugalski [EMAIL PROTECTED]
write:
 Croak's going to throw an interpreter exception. There's
 a little bit of 
 documentation about the exception handling opcodes in 
 docs/parrot_assembly.pod, with more to come soonish.

This is fine at the target language level (e.g. perl6,
python, jako, whatever), but how do we throw catchable
exceptions up through six or eight levels of C code?
AFAICS, this is more of why perl5 uses the JMP_BUF stuff -
so that XS and functions like sv_setsv() can Perl_croak()
without caring about who's above them in the call stack.
The alternative is that _every_ function simply return a
status, which is fundamentally expensive (your real retval
has to be an out parameter, to start with).
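
The shape of the two styles can be sketched in Python, with the caveat
that Python exceptions only stand in for the setjmp/longjmp approach and
say nothing about its cost in C. All function names are hypothetical.

```python
class Croak(Exception):
    """Stand-in for a croak() that unwinds to whoever set the jump point."""


# Style 1: every function returns a status; the real result goes through
# an out parameter, and every level must check and forward the status.
def inner_status(value, out):
    if value < 0:
        return "negative value"
    out.append(value * 2)
    return None


def middle_status(value, out):
    err = inner_status(value, out)
    if err is not None:
        return err
    return None


# Style 2: non-local exit; intermediate levels carry no error plumbing
# and the return value is the real result.
def inner_croak(value):
    if value < 0:
        raise Croak("negative value")
    return value * 2


def middle_croak(value):
    return inner_croak(value)


result = []
print(middle_status(3, result), result)
print(middle_croak(3))
```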

-- BKS




RE: SV: Parrot multithreading?

2001-09-28 Thread Benjamin Stuhl

--- Hong Zhang [EMAIL PROTECTED] wrote:
 
   This is fine at the target language level (e.g.
 perl6, python, jako,
   whatever), but how do we throw catchable exceptions
 up through six or
   eight levels of C code? AFAICS, this is more of why
 perl5 uses the
   JMP_BUF stuff - so that XS and functions like
 sv_setsv() can
   Perl_croak() without caring about who's above them in
 the call stack.
  
  This is my point exactly.
 
 This is the wrong assumption. If you don't care about the
 call stack, 
 how can you expect the [sig]longjmp can successfully
 unwind stack?
 The caller may have a malloc memory block, 

Irrelevant with a GC.

 or have entered a mutex,

If they're holding a mutex over a function call without a
_really_ good reason, it's their own fault.

 or acquire the file lock of Perl cvs directory. You
 probably have
 to call Dan or Simon for the last case.
 
  The alternative is that _every_ function simply return
 a status, which
  is fundamentally expensive (your real retval has to be
 an out
  parameter, to start with).
 
 This is the only right solution generally. If you really
 really really
 know everything between setjmp and longjmp, you can use
 it. However,
 the chance is very low.

It is also slow, and speed is priority #1.

[snip, snip]
 code. The problem is they can not be used inside signal
 handler under
 MT, and it is (almost) impossible to write a thread-safe
 version.

Signals are an event, and so don't need jumps. Under MT,
it's not like there would be a lot of contention for
PAR_jump_lock.

-- BKS




Re: 0.0.2 needs what?

2001-09-25 Thread Benjamin Stuhl

--- Dan Sugalski [EMAIL PROTECTED] wrote:
 At 06:07 PM 9/25/2001 -0700, Benjamin Stuhl wrote:
 Just to make sure that it's making the _right_ sense,
 the
 fixup section is basically our single level of
 indirection
 so that we can make the bytecode itself be
 position-independent, right?
 
 Yup.
 
 But why store it in this
 format? What we really need to store is the list of what
 we
 expect in the table and where.
 
 We have 8 bytes per entry. We can store a lot in there.
 :)

But not enough to allow us to vector Perl-level variable
references - to support run-time aliasing, we need to have
the fixup table entry be built from the symbol names:

e.g. for accessing $Foo::bar, we need

index in bytecode (offset in current fixup table) -> fixup
table entry (constructed by looking up $Foo::bar in the
symbol table and getting the address of its symtab entry)
-> the PMC * from the symtab entry

This extra level of indirection lets us be sure that all
the references to a variable are updated when someone does 
%Foo::{'$bar'} = \$glock;
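
A Python sketch of why the fixup entry should point at the symbol table
slot rather than at the PMC. The Slot class, the symtab dictionary and
the string "PMCs" are placeholders invented for this example.

```python
class Slot:
    """A symbol table entry: a cell that holds the current PMC."""

    def __init__(self, pmc):
        self.pmc = pmc


symtab = {
    "$Foo::bar": Slot("bar's PMC"),
    "$main::glock": Slot("glock's PMC"),
}


def build_fixups(names):
    """One symbol lookup per name, done when the bytecode is loaded."""
    return [symtab[name] for name in names]


fixups = build_fixups(["$Foo::bar"])


def fetch(fixup_index):
    """bytecode index -> fixup entry (the slot) -> PMC in that slot."""
    return fixups[fixup_index].pmc


before = fetch(0)
# Run-time aliasing: repoint the slot, not the fixup table.
symtab["$Foo::bar"].pmc = symtab["$main::glock"].pmc
after = fetch(0)
print(before, "->", after)
```

Had the fixup table cached the PMC itself, the alias would have been
invisible to already-loaded code.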

 Can't we just have the bytecode header
 have
 
 int32 *data_template;
 int32 fixup_space_needed;
 
 and build the final table as needed?
 
 If we do that, the fixup section won't be part of the
 same section of 
 memory as the bytecode, which means we'll need to touch
 at least some of 
 the actual bytecode so we can set the absolute address of
 the fixups. 
 Sticking it on the end means we can access it relatively,
 and padding to 8k 
 means the section should start on its own memory page so
 we won't be making 
 a private copy of anything but the fixup section.

But it should be just as cheap to have a pointer to the
current fixup section in the interpreter structure and just
vector off of that, rather than doing relative lookups.
Besides, for non-constant entries in the fixup table (PMCs,
again), the fixup section needs to be per-interpreter,
since different interpreters will want to reference their
own PMCs, not someone else's.

-- BKS




RE: [PATCH] assemble.pl registers go from 0-31

2001-09-24 Thread Benjamin Stuhl

--- Hong Zhang [EMAIL PROTECTED] wrote:
 Just curious, do we need a dedicated zero register and
 sink register?
 The zero register always reads zero, and can not be
 written. The sink
 register can not be read, and write to it can be ignored.

Those, probably not - we have a real nop, and it takes the
same number of bits to encode a register as it does a
literal integral zero. What we may want (and I've brought
up before) is special PMC registers, so our constant
tables aren't clogged up with undefs or the like.

-- BKS




Re: pack(d) packs floats, I think.

2001-09-22 Thread Benjamin Stuhl

--- Simon Cozens [EMAIL PROTECTED] wrote:
 On Sun, Sep 23, 2001 at 02:17:40AM +0300, Jarkko
 Hietaniemi wrote:
  unaligned access
 
 Bother. It is as I feared. 
 
 Dan, we need to do something about this. The choices are:
 put floats into the
 constant section, or ensure instructions are assigned on
 an appropriate
 boundary. I can see pros and cons of both.

What I ended up doing in my work on the perl5 B::Bytecode
stuff was actually storing floats as their string
representation and restoring them back into native floats
at load time. Yes, it's slow and hackish, but strings are
the only *portable* format for floating-point numbers, and
they have the added benefit of degrading gracefully when
bytecode from a machine with more bits of precision is
loaded on a machine with fewer.
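
The round trip can be sketched in Python, where a float is a C double.
The helper names are made up, and packing to IEEE single precision is
only a way to simulate "a machine with fewer bits".

```python
import struct


def dump_float(value):
    """Store the float as text; repr() round-trips a Python float."""
    return repr(value)


def load_float(text):
    """Restore a native float from its stored text at load time."""
    return float(text)


def load_float_single(text):
    """Simulate a narrower platform by rounding to single precision."""
    return struct.unpack("<f", struct.pack("<f", float(text)))[0]


stored = dump_float(0.1)
same = load_float(stored)            # identical on an equal-width machine
narrow = load_float_single(stored)   # nearest value the narrow type has
print(stored, same == 0.1, narrow)
```

No alignment or byte-order question arises, since the stored form is
just characters.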

-- BKS




Re: Patch to add string_nprintf

2001-09-17 Thread Benjamin Stuhl

--- Simon Cozens [EMAIL PROTECTED] wrote:
 On Mon, Sep 17, 2001 at 09:33:56AM +0100, Tom Hughes
 wrote:
  The attached patch adds string_nprintf, the last
 unimplemented
  function listed in strings.pod as far as I can see.
 
 Thanks; but I think I'm going to wait for the portability
 police to
 comment. There's every likelihood we want to write our
 own sprintf-like
 function.

I'm quite sure we will. There are so many unportabilities
in *printf (like, what's the format for an IV? an NV?), as
well as lots of potential buffer overruns (not all platforms
have vsnprintf). And besides, we may not want a vsnprintf -
we may want something that autogrows the string. Take a
look at sv_vcatpvf in perl5's sv.c for inspiration.

-- BKS




RE: [PATCH] testsuite and Win32 compilation

2001-09-15 Thread Benjamin Stuhl

--- Brent Dax [EMAIL PROTECTED] wrote:
 Gibbs Tanton - tgibbs:
 # ## +#if defined(WIN32)
 # ## +program_code = malloc( file_stat.st_size );
 # ## +#else
 #
 # Also, since more than win32 is not going to have mmap,
 # perhaps you could add
 # a Configure #define for HAS_MMAP or something like
 that.
 # Then you could
 # test the cc compiler to check for mmap availability.
 
 Configure sets up a bunch of HAS_HEADER_FOO macros in
 parrot/config.h,
 including HAS_HEADER_MEMORY (undef on my Win32 system). 
 Would this be
 the correct file?

I'd recommend HAS_HEADER_SYSMMAN (and if anyone saw it, I
posted a patch yesterday that started making header
includes actually be dependent on the configure macros).

-- BKS




[PATCH] Win32 build

2001-09-14 Thread Benjamin Stuhl

As promised, here's the patch that gets Parrot building out
of the box on Win32. I had to comment out the mmap(), I'll
start working on doing that portably after class.

-- BKS

 patch


Re: Call/savestack popping semantics

2001-09-14 Thread Benjamin Stuhl

--- Dan Sugalski [EMAIL PROTECTED] wrote:
 So, I'm currently working on the stack system for Parrot.
 I've got the 
 following issue here.
 
 Assuming there's one general stack to save stuff on,
 where stuff is:

Out of curiosity, why only one stack? Perl 5 has at least
four or five that I can think of offhand (and actually
several times that, since it stack-switches when it's
calling a tie or a signal handler or a hook or...).
 
 * Scope entries
 * Return addresses for JSRs
 * Saved individual registers
 * Local() calls

These really ought to be separate stacks - not every scope
has a return address and not every return address has a
scope (e.g. scopeless subs). Why conflate them when we
don't need to? (similar arguments hold true for register
saving and dynamic variables)
 
 Should plain returns at the parrot level clean things
 up? Which is to 
 say, when I walk back up the stack looking for a return
 address, if I come 
 across any scope entry markers, shall I clean 'em up,
 toss them, or pitch a 
 fit? (Currently I'm going to pitch a fit, but that's a
 temporary solution)
 
 Having two types of returns, one that cleans up and one
 that doesn't, is 
 also an option. (I can see both being useful and
 reasonable)

I would say go with this. It will probably be quite common
that we'll need to do cleanup on return, but there are
situations where we won't need it, and so shouldn't pay
the overhead.
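
A Python sketch of separate stacks with two kinds of return. The Interp
class and its method names are hypothetical, and recording the scope
depth alongside each return address is just one possible way to let the
cleaning return know how far to unwind.

```python
class Interp:
    """Toy interpreter state with separate return and scope stacks."""

    def __init__(self):
        self.return_stack = []   # (return address, scope depth at call)
        self.scope_stack = []    # scope entries only

    def call(self, return_address):
        self.return_stack.append((return_address, len(self.scope_stack)))

    def enter_scope(self, name):
        self.scope_stack.append(name)

    def ret(self):
        """Plain return: pops the address and touches nothing else."""
        address, _depth = self.return_stack.pop()
        return address

    def ret_cleanup(self):
        """Cleaning return: also drops scopes entered since the call."""
        address, depth = self.return_stack.pop()
        del self.scope_stack[depth:]
        return address


interp = Interp()
interp.enter_scope("outer")
interp.call(100)
interp.enter_scope("sub body")
interp.enter_scope("inner block")
print(interp.ret_cleanup(), interp.scope_stack)
```

A scopeless sub would use ret() and never pay for the unwind.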

-- BKS




Re: RFC: Bytecode file format

2001-09-14 Thread Benjamin Stuhl

--- Brian Wheeler [EMAIL PROTECTED] wrote:
 I've been thinking a lot about the bytecode file format
 lately.  It's
 going to get really gross really fast when we start
 adding other
 (optional) sections to the code.
 
 So, with that in mind, here's what I propose:
 
 * All data sizes are in longwords (4 bytes) because
 that's just the way
 things are :)

We can't do that. There are platforms on both ends that
have _no_ native 32-bit data formats (Crays, some 16-bit
CPUs?). They still need to be able to load and generate
bytecode without ridiculous CPU penalties (your Palm III
is not running on a 700MHz Pentium III, after all!)
 
 * The file is composed of a header (which is really just
 a magic 
 cookie) , a series of data chunks, and a directory (of
 sorts)
 
 
 Offset  Length  Description
 0       1       Magic Cookie (0x013155a1)
 1       n       Data
 n+1     m       Directory Table
 m+n+1   1       Offset of beginning of directory table (i.e. n+1)
 

No, we _really_ need some versioning info (either
major/minor or just a single integer; if we go through even
16000 revisions of the bytecode, we've screwed up
somewhere). As Dan said, we also need a BOM of some sort
and a word size indicator.

I think we certainly want the directory at the front,
especially since we will likely end up with data segments
that we might not want/need to load (e.g. the original
source code, debugging info (?), the optimized parse tree,
etc.). If we have to map in the entire file to find the end
so we can find what parts we don't want to map, that sort
of defeats the purpose.
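
To make the directory-at-the-front idea concrete, here is a rough C sketch. Every name in it (pbc_header, pbc_dir_entry, pbc_find_segment, PBC_BYTE_ORDER) is made up for illustration, and the fixed 32-bit fields are only a convenience for the example; as noted earlier in this thread, the real format can't assume a native 32-bit type. The point is just that a loader can decide which segments to map after reading only the header and the directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: none of these names come from the thread. */
#define PBC_MAGIC      0x013155a1u  /* magic cookie from the quoted proposal */
#define PBC_BYTE_ORDER 0x01020304u  /* written in the producer's native order */

typedef struct {
    uint32_t type;    /* what the segment holds (bytecode, source, ...) */
    uint32_t offset;  /* where the segment starts in the file */
    uint32_t length;  /* how many bytes it occupies */
} pbc_dir_entry;

typedef struct {
    uint32_t magic;
    uint32_t byte_order;  /* BOM: reads back as PBC_BYTE_ORDER if orders match */
    uint8_t  word_size;   /* size of the producer's native word, in bytes */
    uint8_t  pad;
    uint16_t version;     /* a single revision counter */
    uint32_t dir_count;   /* number of directory entries that follow */
} pbc_header;

/* Return the directory entry for `type`, or NULL if the file is not
 * ours, was written with the other byte order, or has no such segment. */
static const pbc_dir_entry *
pbc_find_segment(const pbc_header *h, const pbc_dir_entry *dir, uint32_t type)
{
    uint32_t i;

    assert(h != NULL && dir != NULL);
    if (h->magic != PBC_MAGIC || h->byte_order != PBC_BYTE_ORDER)
        return NULL;
    for (i = 0; i < h->dir_count; i++)
        if (dir[i].type == type)
            return &dir[i];
    return NULL;
}
```

A loader built this way would call pbc_find_segment() for the bytecode segment and simply never map the source or debugging segments unless someone asks for them.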

-- BKS




Re: RFC: Bytecode file format

2001-09-14 Thread Benjamin Stuhl

--- Brian Wheeler [EMAIL PROTECTED] wrote:
 Ok, what if we did IFF with these caveats:
   * all chunks must be padded to 4 bytes (instead of IFF's 2)
   * no nesting of FORMs
 
 Chunks we'd need are:
 
 Name: 'PINF' - Parrot Information
 Size: 28 bytes + size of directory
 Optional: No
 Data:
   long:        magic cookie (or will PINF be enough?)
   8-byte word: endianness (magic value 0x123456789abcdef0)
   byte:        word size
   byte[7]:     empty
   word:        major version
   word:        minor version
   long:        count of directory entries
   --- directory goes here ---
   -- each entry as follows --
   long:        type of chunk
   long:        offset
 
 
 Name: 'PBYT' - Parrot Bytecode
 Size: Varies
 Optional: Sure. :)
 Data:
   bytes of the bytecode
 
 
 Name: 'PSTR' - Parrot String Table 
 Size: Varies
 Optional: Yes
 Data:
   long:          count of string entries
   --- each string as follows ---
   long:          byte length
   n bytes + pad: string data
 
 
 Name: 'PFIX' - Parrot Fixup Table
 Size: Varies
 Optional: Yes
 Data:
   --- beats me...how are we doing fixups? ---
 
 
 Name: 'PNOT' - Parrot Notes Block
 Size: Varies
 Optional: Yes
 Data:
   free-form text for 'notes' about the file.
 
 
 
 
 How's this?
 

A few more chunks:

Name: 'PCON' - Parrot Constants Block
Size: Varies
Optional: Yes
Data: the constants section (you know, all those
  initialized PMCs... :-)

Name: 'PSOU' - Parrot Source Block
Size: Varies
Optional: Yes
Data: the source code for the program

Name: 'PMOD' - Parrot Module list
Size: Varies
Optional: Yes
Data: the list of all the modules that need to
  be loaded for this module to work (since
  there may be a lot of time between BEGIN{}
  and execution, we need to record just what
  was loaded so we can reload and initialize
  it)

Name: 'PSYM' - Parrot symbol table
Size: Varies
Optional: Yes
Data: an offset-based hash table for every 
  symbol defined in the module (dynamic
  symbol table manipulation is two-level -
  an in-memory table that is queried first
  for dynamic overrides and the static per-
  module one for compile-time definitions)

Name: 'PSPS' - Parrot Special Subroutines
Size: Varies
Optional: Yes
Data: word: number of special routines
  (followed by a list of
     word: type
     word: bytecode offset
  pairs) (the point of this is that multiple
  BEGIN/INIT/CHECK/END subs are allowed, so
  they can't be simply stored in the symbol
  table)


There are probably some other sections I can't think of,
but these are a start.
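
As an illustration of why a list of (type, offset) pairs suits the special subroutines better than the symbol table, here is a small C sketch. The type codes and the names sps_entry and sps_collect are hypothetical; the only thing taken from the description above is that each entry is a type word plus a bytecode offset word, and that several entries may share a type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes; the post only says each entry has a type. */
enum { SPS_BEGIN = 1, SPS_INIT = 2, SPS_CHECK = 3, SPS_END = 4 };

typedef struct {
    uint16_t type;    /* one of the SPS_* codes */
    uint16_t offset;  /* bytecode offset of the routine */
} sps_entry;

/* Copy the offsets of every routine of `type` into out[], in table
 * order. Returns how many entries matched; at most `max` are stored. */
static size_t
sps_collect(const sps_entry *tab, size_t n, uint16_t type,
            uint16_t *out, size_t max)
{
    size_t i, found = 0;

    assert(tab != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (tab[i].type != type)
            continue;
        if (found < max)
            out[found] = tab[i].offset;
        found++;
    }
    return found;
}
```

Two BEGIN blocks simply show up as two entries with the same type, which a name-keyed symbol table could not hold without inventing unique names for them.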

-- BKS




Quick success report for Win32

2001-09-13 Thread Benjamin Stuhl

I had to hand-apply the NV patch and some of the casting
patches to get VC++ to shut up and compile, but Parrot
works on Win32 (Win2k, VC++ 6.0SP5). (It takes 1 sec to
count to ten million on my PIII 1GHz.) I'll post the
hacked-up Makefile that I fed through nmake to get it to
work when I get back from classes.

-- BKS




Re: #include config.h or #include parrot/config.h

2001-09-13 Thread Benjamin Stuhl

--- Dave Mitchell [EMAIL PROTECTED] wrote:
 Andy Dougherty [EMAIL PROTECTED] wrote:
  On Wed, 12 Sep 2001, Dan Sugalski wrote:
  
 changing parrot.h to do  #include
 parrot/config.h and
 then changing
 Makefile to add -I./include to CCFLAGS.
   
   One thing to keep in mind is that the directory may
 not be sufficient on 
   some platforms. VMS, specifically, ignores the
 directory portion of the 
   include filename. (And the suffix, generally, but
 that's separate)
  
  Hmm.  So would you suggest adding -I[.include]
 -I[.include.parrot] for VMS
  as well?  (My VMS days were a very long time ago.)
  
   Not, mind, that I'm proposing prepending parrot_ to
 all the filenames, 
   though that's an option certainly.
  
  That would be fun on 8.3 filesystems :-).
 
 Perhaps I'm missing something here, but I always thought
 that
 
 #include "config.h"
 rather than
 #include <config.h>
 
 would ensure that the local Perl version would always
 get picked up in 
 preference.
 

The point is not us finding our config.h; the problem is if
we are embedded, we don't want to clobber our embedder's
config.h. (Besides, when we're being embedded, there're no
promises about what order the -I flags will be in when we're
compiled.) E.g.:

#include <parrot/embed.h> /* includes parrot/config.h */
#include "project.h" /* includes config.h */

int call_parrot(...)
{
    IV i;
    ...
#ifdef USE_FUNCTION_FOO
    embedders_foo();
#endif
    ...
}

-- BKS




Re: patch: assembly listings from assembler

2001-09-13 Thread Benjamin Stuhl

--- Brian Wheeler [EMAIL PROTECTED] wrote:
  Index: assemble.pl

===
 RCS file: /home/perlcvs/parrot/assemble.pl,v
 retrieving revision 1.14
 diff -r1.14 assemble.pl
 7a8
  use Getopt::Long;
 9,12c10,33
  my $opt_c;
  if (@ARGV and $ARGV[0] eq "-c") {
  shift @ARGV;
  $opt_c = 1;
 ---
  my %options;
  GetOptions(\%options,('checksyntax',
'help',
'version',
'verbose',
'output=s',
'listing=s'));
  

Could we please get in the habit of adding a -c or a -u to
our CVS diffs, just as we would with normal patches?

Many thanks,

-- BKS




Re: #include config.h or #include parrot/config.h

2001-09-12 Thread Benjamin Stuhl

--- Andy Dougherty [EMAIL PROTECTED]
wrote:
 In perl5, we've had occasional header file name conflicts
 over the
 years.  One common example is someone putting a file
 named config.h
 in /usr/local/include.  Other conflicts with string.h
 and memory.h
 are also conceivable.
 
 I'd suggest 
 
   cd parrot
   mkdir include
   mkdir include/parrot
   mv *.h include/parrot
 
 changing parrot.h to do  #include parrot/config.h and
 then changing
 Makefile to add -I./include to CCFLAGS.

YES!!! This is something I've wanted to do to Perl5 for
years, but we can't because every XS module in the world
expects the headers to be at the root. We should
_definitely_ do this -- the fewer namespaces we pollute,
the better.

-- BKS




Re: Muddled Boundaries - Perl 6 vs Parrot

2001-09-11 Thread Benjamin Stuhl

--- Simon Cozens [EMAIL PROTECTED] wrote:
 On Mon, Sep 10, 2001 at 11:26:03AM -0700, Benjamin Stuhl
 wrote:
   It's not a priority, but it's so much easier to walk
 the
   correct path from the start.  Since it's all Parrot,
 it's even easier.
  
  Hear, hear! I remember the pain in 5.005_5* of turning
 off
  PERL_POLLUTE. I expect that there may still be CPAN
 modules
  that won't build without manually defining it.
 
 You are, of course, correct; I back down. However, if you
 care that
 much, I'm going to make you prove it by implementing it.
 :)

On my personal todo list for this afternoon is to go
through each of the source files and enforce the coding
PDD. However, before I do this, I would like to bring up
the question of prefixes once again. Do we really need a 7
letter, mixed-case prefix (Parrot_)? If Apache can do ap_,
why can't we do par_ (and maybe parp_ for private stuff)?
Also, I am planning to go through the structs and prefix
them (STRING -> PAR_STRING, for instance) and clean up the
subsystem naming. What do you think of par_gc_* for memory
management (yes, that means that uncollected memory is
gotten by par_gc_memalloc_nogc(), but if you really want
it, you deserve to type in all those letters) and par_io_*
for I/O?

-- BKS




Feature testing API?

2001-09-11 Thread Benjamin Stuhl

It seems to me that Parrot should expose a feature
testing API. Something on the order of

int [read: boolean] par_has_feature(PAR_STRING *feature);

(yes, I _do_ think that claiming STRING is unnecessary
namespace pollution - especially as ANSI compilers, AFAIK,
aren't required to be case-sensitive)

This would be very useful for any language running on
Parrot, so that they can just test (via some language
binding) its presence at the beginning of a program, rather
than bombing out in the middle when they hit an
unimplemented call.

Example features might be async I/O, run-time compilation
(eval) of different languages, dynamic loading, etc.

Basically, I would like to be able to say:

use features qw[asyncio eval(perl6)];

and know that if my program loads, it's not going to panic
when it gets to an

eval 'Async::queueio_out($out_fh, $text);';

What d'y'all think?
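
For the sake of discussion, a minimal C sketch of what the check could look like. It takes a plain C string instead of a PAR_STRING to keep the example self-contained, and the feature names, the table, and par_first_missing() are all invented here, not an existing Parrot API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical list of features this build supports. */
static const char *const par_features[] = {
    "asyncio", "dynload", "eval(perl6)"
};

/* Return 1 if the named feature is available, 0 otherwise. */
static int par_has_feature(const char *feature)
{
    size_t i;

    assert(feature != NULL);
    for (i = 0; i < sizeof par_features / sizeof par_features[0]; i++)
        if (strcmp(par_features[i], feature) == 0)
            return 1;
    return 0;
}

/* Return the first wanted feature that is missing, or NULL if all
 * are present. A language binding could call this at load time. */
static const char *par_first_missing(const char *const *wanted, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (!par_has_feature(wanted[i]))
            return wanted[i];
    return NULL;
}
```

A "use features" pragma would then boil down to one par_first_missing() call before any user code runs, with the returned name going straight into the error message.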

-- BKS




Re: Muddled Boundaries - Perl 6 vs Parrot

2001-09-10 Thread Benjamin Stuhl

--- Bryan C. Warnock [EMAIL PROTECTED] wisely wrote:
 On Monday 10 September 2001 01:08 pm, Simon Cozens wrote:
  And in addition - why are we worrying about namespace
 collision RIGHT NOW?
  Sure, when Parrot can be embedded, then we should
 ensure that our names
  aren't going to clash. But who in their right minds is
 going to embed
  Parrot in anything in its current state? (Leon, I said
 in their right
  minds)
 
  It's not a priority, compared to getting working code
 out there. We can
  sort it out later.
 
 Oh, how many times have I heard that before?
 
 It's not a priority, but it's so much easier to walk the
 correct path from 
 the start.  Since it's all Parrot, it's even easier.

Hear, hear! I remember the pain in 5.005_5* of turning off
PERL_POLLUTE. I expect that there may still be CPAN modules
that won't build without manually defining it.

It's much better if we get it right from the start so that
there's less that we need to go back and fix.

-- BKS




Re: Math functions? (Particularly transcendental ones)

2001-09-08 Thread Benjamin Stuhl

--- Dan Sugalski [EMAIL PROTECTED] wrote:
 Okay, I'm whipping together the fancy math section of
 the interpreter 
 assembly language. I've got:
 
 sin, cos, tan : Plain ones
 asin, acos, atan  : arc-whatevers
 sinh, cosh, tanh  : Hyperbolic whatevers
 log2, log10, log  : Base 2, base 10, and explicit base
 logarithms
 pow   : Raise x to the y power
 
 Can anyone think of things I've forgotten? It's been a
 while since I've 
 done numeric work.

ln, asinh, acosh, atanh2?

-- BKS




language agnosticism and internal naming

2001-09-06 Thread Benjamin Stuhl

I had a thought this morning on function/struct/global
prefixes for Parrot. If we really plan to also run
Python/Ruby/whatever on it, it does not look good for the
entire API to be prefixed with perl_. We really (IMHO)
ought to pick something else so that we don't give people a
convenient target for FUD.

For lack of anything better, I propose par_ for
functions. We might still be able to get away with PL_ 
for globals (Parrot Library?), but I doubt it.

Just something else to consider. (But hopefully a topic
that won't make Dan's brain hurt any more than it probably
does. :-)

-- BKS




Re: An overview of the Parrot interpreter

2001-09-04 Thread Benjamin Stuhl

--- Dan Sugalski [EMAIL PROTECTED] wrote:
 At 03:48 PM 9/4/2001 -0400, Uri Guttman wrote:
   DS == Dan Sugalski [EMAIL PROTECTED] writes:
 
 
DS Ah. I've always wanted to do that with tied
 hashes. Okay, even
DS more reason to pass the data in! (We're going to
 end up with a
DS WANT register by the time we're done...)
 
 that is not a bad idea. we could allocate a PMC register
 (e.g. #31)
 permanently to store WANT info (in a hash i assume like
 the RFC
 implies).
 
 I don't think I'd want to soak up a PMC register that
 way. Maybe an integer 
 one.

Maybe not a general purpose PMC register, but what about a
special one? Since the proposal was to lazily update it, it
doesn't need to be part of the standard register frame.
Besides, I thought we were going with having a few special
PMC registers (PL_sv_yes, PL_sv_no, PL_sv_undef, etc.) to
reduce the size of the constants section?

-- BKS




-g vs. -O

2001-07-06 Thread Benjamin Stuhl

Alright, here's an issue I was musing on after dinner
yesterday: There are huge sets of optimizations that could
be made *if* the user promises not to do certain things.
For instance, who needs a symbol table when the user has
promised not to do any symbolic lookups? (Yes, I know, the
debugger, but only _if_ the user actually wants to debug
the program.) Thus, I propose 3 command-line switches
(highly reminiscent of gcc...):

-g : include all information required for debugging (symbol
 tables, etc.) and do not perform optimizations

-O# : controls just how complex the optimizations we try to
  make are, given the constraints we're under from what
  the user did _not_ promise to abstain from

-fpromise : this is a bit different from gcc, where it
  controls individual optimizations - instead, in
  Parrot it enumerates the promises the user makes
  (e.g. I solemnly swear to never use symbolic
  references, count on specific op patterns, or
  use any number large enough to require bignums.)

If certain promises become very common, they could possibly
get their own flag or something.
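
One way to picture the promises is as a bit set that the optimizer consults. The sketch below is only that, a sketch: the promise names ("no-symrefs" and friends), the enum, and both functions are invented for the example and are not proposed flag spellings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical promise bits, one per thing the user swears off. */
enum {
    PROMISE_NO_SYMREFS     = 1 << 0,  /* no symbolic references */
    PROMISE_NO_OP_PATTERNS = 1 << 1,  /* no reliance on op patterns */
    PROMISE_NO_BIGNUMS     = 1 << 2   /* no numbers needing bignums */
};

static const struct { const char *name; unsigned bit; } promise_names[] = {
    { "no-symrefs",     PROMISE_NO_SYMREFS },
    { "no-op-patterns", PROMISE_NO_OP_PATTERNS },
    { "no-bignums",     PROMISE_NO_BIGNUMS }
};

/* Map one "-f<name>" argument to its bit; 0 means "not a promise". */
static unsigned promise_from_arg(const char *arg)
{
    size_t i;

    assert(arg != NULL);
    if (strncmp(arg, "-f", 2) != 0)
        return 0;
    for (i = 0; i < sizeof promise_names / sizeof promise_names[0]; i++)
        if (strcmp(arg + 2, promise_names[i].name) == 0)
            return promise_names[i].bit;
    return 0;
}

/* The symbol table may go only if the user is not debugging (-g)
 * and has promised not to do symbolic lookups. */
static int can_drop_symbol_table(unsigned promises, int debug)
{
    return !debug && (promises & PROMISE_NO_SYMREFS) != 0;
}
```

Each optimization then becomes a predicate over the promise set plus -g, which keeps -O# free to mean only "how hard do we try".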

Thoughts?

-- BKS




Re: new event loop

2001-07-05 Thread Benjamin Stuhl

Thus spake the enlightened Uri Guttman [EMAIL PROTECTED]:
 
 i am going to make a proposal that we ('we' to be defined
 later) develop
 a new common event loop with two major goals in mind:
 
   1. the event loop should be fully portable over all
 modern unix OS's
  and the win32 server flavors (nt, 2k).
 

VMS! We must have VMS! Oh, and it should probably be
modular enough that one can use a stripped down version of
it to write Palm apps too. (Perl6 for the TI-89, anyone?)

-- BKS




Re: ~ for concat / negation (Re: The Perl 6 Emulator)

2001-06-22 Thread Benjamin Stuhl

 In summary:
 
1. I don't like ~ for concat 
 
2. But if it does become concat, then we still
 shouldn't
   change ~'s current unary meaning
 
 
 Thanks for listening.
 
 -Nate

I agree completely. However, this is no longer really a
topic for -internals; it's purely a language thing.

-- BKS





Re: A quick sketch of the interpreter

2001-06-15 Thread Benjamin Stuhl

--- Dan Sugalski [EMAIL PROTECTED] wrote:
 =head1 Stacks
[snip]
 The stacks are at least:
 
 =over 4
 
 =item Temp stack
 
 for squirreling away the contents of individual registers
 
 =item Register stack
 
 For pushing the entire register file at once. There are
 four sets, one
 for each register type.
 
 =item state stack
 
 For the interpreter's internal state
 
 =back

Perl 5-ish save stack for dynamic scoping? (whatever term
replaces 'local')
 
What is the subroutine calling convention? Caller cleans or
callee cleans?

 =head1 Registers
 
 We have four sets. Each set has 64 members

Do we really need 64 ints and 64 floats? 64 stringish ones
I can understand (sort of) - the RE engine could use them.
Maybe only 32 each of ints and floats? Also, what about the
suggestion to have the various special values
(PL_sv_undef, PL_sv_yes, PL_sv_no) be registers (so
undef $foo becomes 'st sp_reg0, $foo' or somesuch)? Also,
what about having one or more of the registers be the
'lexical state register(s)' to implement pragmas (or is
this the state stack?)?
 
 =head1 Opcodes
 
 Opcodes are all dispatched indirectly via an opcode
 function
 table. Each segment of bytecode (a segment roughly
 corresponding to a
 compilation unit--a precompiled module would be in its
 own segment,
 for example) has its own opcode function table.

Be wary of this. I tried this in Perl 5 (on an old sun4c,
granted), and I came out with something like a 5% slowdown
over having the function pointer actually stored in the op,
IIRC.
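
For anyone who wants to see the two dispatch styles side by side, here is a toy C sketch. The vm, op, and op_fn types and both loops are invented for the comparison and are nothing like the real interpreter structures; the sketch also says nothing about where the slowdown I measured came from, it only shows where the extra table load sits.

```c
#include <assert.h>
#include <stddef.h>

typedef struct vm { long acc; } vm;
typedef struct op op;
typedef const op *(*op_fn)(vm *, const op *);

struct op {
    int   code;  /* index into a per-segment function table */
    op_fn fn;    /* function pointer stored directly in the op */
    long  arg;
};

static const op *op_add(vm *m, const op *o) { m->acc += o->arg; return o + 1; }
static const op *op_end(vm *m, const op *o) { (void)m; (void)o; return NULL; }

/* Style A: every dispatch goes through the segment's table. */
static void run_via_table(vm *m, const op *pc, const op_fn *table)
{
    assert(table != NULL);
    while (pc != NULL)
        pc = table[pc->code](m, pc);
}

/* Style B: the op already carries its function pointer. */
static void run_via_op(vm *m, const op *pc)
{
    while (pc != NULL)
        pc = pc->fn(m, pc);
}
```

Both loops compute the same thing; style A pays one more dependent memory load per op, while style B makes every op a pointer wider and ties the bytecode to one process's addresses.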

 =head1 The opcode loop
 
 This is a tight loop. All it does is call an opcode
 function, get back
 a pointer to the next opcode to execute, and check the
 event dispatch
 flag. Lather, rinse, repeat ad infinitum.

How does this port to a TIL form?
 
 =head1 Bytecode
Looks good from here (and a _lot_ prettier than
B::Bytecode/ByteLoader!).

-- BKS




Re: Should the op dispatch loop decode?

2001-06-12 Thread Benjamin Stuhl

--- Dan Sugalski [EMAIL PROTECTED] wrote:
 'Kay, here's a question to ponder. Should the op dispatch
 loop handle 
 argument decoding, or should that be left to the opcode
 functions?

[good analysis of trade-offs snipped]

 At the moment I'm leaning towards the functions doing
 their own decoding, 
 as it seems likely to be faster. (Though we'd be
 duplicating the decoding 
 logic everywhere, and bigger's reasonably bad) Possibly
 mandating shadow 
 functions for each opcode function, where the shadow does
 the decoding and 
 calls the real functions which take real things rather
 than our registers.
 
 Opinions anyone?

I don't see where shadow functions are really necessary -
after all, no one has ever complained that you can't do 

pp_chomp(sv); /* or pp_add(sv1, sv2), for that matter */

in Perl 5. Quite frankly the shadow thing sounds like a
bundle of unnecessary function calls.

But that's just my opinion, feel free to disagree.

-- BKS




vtbl-based SVs and sv_setsv()

2001-01-13 Thread Benjamin Stuhl

How is setting one SV from another going to be implemented?
My (admittedly vague) recollection was that it would be
something like

void sv_setsv(SV* dest, SV* src)
{
   dest->sv_vtbl->delete(dest); /* clear the old value */
   dest->sv_vtbl = src->sv_vtbl;
   dest->sv_vtbl->dupfrom(dest, src); /* and copy in the new */
}

That is, in $a = $b, $a would get a new vtbl, the one from
$b. My question is how does this work with the need for
assign-by-value, which is required for things like ties and
overloads? 
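
To make the worry concrete, here is a toy C model of the two behaviours. It is not a proposed answer: the struct layout, the assign slot, the stores_seen counter (standing in for a tie's STORE hook), and both setsv functions are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sv SV;

typedef struct {
    const char *name;
    void (*assign)(SV *dest, const SV *src);  /* store src's value in dest */
} SV_VTBL;

struct sv {
    const SV_VTBL *sv_vtbl;
    long value;
    int  stores_seen;  /* stand-in for a tie's STORE side effect */
};

static void plain_assign(SV *dest, const SV *src) { dest->value = src->value; }

static void tied_assign(SV *dest, const SV *src)
{
    dest->stores_seen++;  /* the "magic" that must not be lost */
    dest->value = src->value;
}

static const SV_VTBL plain_vtbl = { "plain", plain_assign };
static const SV_VTBL tied_vtbl  = { "tied",  tied_assign };

/* As recalled above: dest takes over src's vtbl, then copies. */
static void setsv_replace_vtbl(SV *dest, const SV *src)
{
    assert(dest != NULL && src != NULL);
    dest->sv_vtbl = src->sv_vtbl;
    dest->sv_vtbl->assign(dest, src);
}

/* Assign-by-value: dest keeps its own vtbl and handles the store. */
static void setsv_by_value(SV *dest, const SV *src)
{
    assert(dest != NULL && src != NULL);
    dest->sv_vtbl->assign(dest, src);
}
```

With a tied destination and a plain source, the first variant silently turns the destination into a plain scalar and the tie never sees the store; the second keeps the tie and runs its hook.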

-- BKS




Re: standard representations

2000-12-27 Thread Benjamin Stuhl

--- Dan Sugalski [EMAIL PROTECTED] wrote:
 At 08:02 AM 12/26/00 -0800, Benjamin Stuhl wrote:
 Thus spake the illustrious Dan Sugalski [EMAIL PROTECTED]:
   For integers, we have two types, platform native, and
   bigint. No guarantees
   are made as to the size of a native int. bigints can
 be
   of any size.
 
 I'm not sure about the wisdom of not making any guarantees
 about int size, since that means that extensions have to go
 through the same hoops perl5 has, dealing with
 "unspecified" behaviors (cf. fun with ANSI stdio). To make
 life easy, we might want to ordain sizeof(p6int) >=
 sizeof(void *) && sizeof(p6int) >= 4.
 
 Perl will (well, should at least) automagically upgrade
 to bigints if a 
 regular int overflows (Assuming that conversion's not
 been forbidden by a 
 particular variable), so that's not going to be an issue
 inside variables. 
 As for types presented to extensions, we can certainly
 provide I8, I16, 
 I32, and friends.
 
 On the other hand,
 this makes a port of the PVM to Palms and the like
 somewhat
 harder (but would it be much easier to wedge them into
 the
 standard PVM?). Also, can we please mandate
 2s-complement
 integral math? Perl 5 really always has, but can we
 please
 make it official?
 
 Why? For variables, math is math--2+2=4 regardless of
 whether you're one or 
 two's complement, or BCD-encoded, or use the EBCDIC
 signed characters, 
 or... Mandating representations seems rather too
 low-level to me, though if 
 you've got a good argument I'm OK with it.

Mostly because it seems to be a requirement for intelligent
integer-preserving maths (cf. PERL_PRESERVE_IVUV in perl5).
 
   For floats, we also have two types, C double and
   bigfloat. No guarantees to
   the size or accuracy of the double. bigfloats can be
 of
   any size.
 
 Floating point is even harder, and will require a lot of
 build-time checks anyway.
 
 The big issue I have with floats is that bigfloats will
 be more precise 
 than regular floats/doubles, and so downconverting will
 lose data, which I 
 don't like. Other than that, because floats should
 autoconvert like ints, I 
 don't see any problem. (Not to say there isn't one, just
 that I don't see 
 it... :)

My question here is whether each supported platform is
going to need to provide its own overflow
detection/autoconversion decision routines, since the
portable part of perl6 will have no idea how far it can go
with native numbers.
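
For the integer case at least, a check can be written against <limits.h> alone, so the portable core would not need per-platform code for it. A minimal sketch, assuming the native int is a C long; the function name add_fits_native is made up, and floats are a different story that this does not address.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Return 1 and store a + b in *out when the sum fits in a long.
 * Return 0 and leave *out untouched when it does not, i.e. when the
 * caller should upgrade to a bigint. The comparisons never overflow
 * themselves, whatever the integer representation. */
static int add_fits_native(long a, long b, long *out)
{
    assert(out != NULL);
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}
```

The test is done before the addition because signed overflow is undefined in C; checking the result afterwards is the part that would need platform knowledge.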

   Strings can be of three types--binary data, platform
   native, and UTF-32.
 
 "platform native"?
 
 ASCII, EBCDIC, 16-bit chars, whatever. I'd rather not
 deal with 
 variable-length characters at all, so things like UTF-8
 and friends aren't 
 really on the list. (Though they could be with the regex
 engine dealing 
 with them in UTF-32 format)

But why is perl6 messing with them, since it has no idea
what they mean?

   No, we are not messing around with UTF-8 or 16, nor
 are
   we messing with
   EBCDIC, shift-JIS, or any of that stuff. Strings can
 be
   stored internally
   that way (and the native form might be one of them)
 but
   as far as the
   interface is concerned we have only three. Yes, this
 does
   mean if we mess
   with strings in UTF-8 format on a non-UTF-8 system
   they'll need to be fed
   out in UTF-32. It's bigger, but we can deal.
 
 The issue with UTF-32 is that we'd need to write an
 entire
 string-handling library, while quite a few modern
 platforms
 have _wstr* or equivalent.
 
 I'm not sure there's much in the way of string handling
 that we need to do 
 that's not perl-specific. It's also not all that much
 work anyway and, 
 while it is stuff we'll need to do, the benefits seem
 worth it.

The only issue is that the CRTL's versions may be in
hand-tuned assembler, and perl6 will be doing a _lot_ of 
strlen()s, I expect. But with good compilers, I suppose,
it's an open question how much performance one really
gains from hand assembly.
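
For reference, the plain C version of the routine in question is tiny. This sketch assumes a zero-terminated buffer of 32-bit code points; the name p6_utf32_len is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the code points before the terminating 0 in a UTF-32 buffer. */
static size_t p6_utf32_len(const uint32_t *s)
{
    const uint32_t *p = s;

    assert(s != NULL);
    while (*p != 0)
        p++;
    return (size_t)(p - s);
}
```

If strings carry their length alongside the data, which is an assumption and not something settled in this thread, the scan disappears entirely and the hand-assembly question matters much less.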

-- BKS




Re: standard representations

2000-12-26 Thread Benjamin Stuhl

Thus spake the illustrious Dan Sugalski [EMAIL PROTECTED]:
 Okay, here's what I'm currently thinking of for standard
 representations of 
 integers, numbers, strings, and (possibly) complex data.
 These are not 
 necessarily indicative of how the data's stored in
 scalars (or hashes or 
 arrays), merely the types that will need to be dealt with
 in the vtables.
 
 In addition to each of the types below, each vtable will
 have a 'same type' 
 entry that'll be used if the optimizer can guarantee that
 the scalars 
 involved in an operation are of the identical type.
 (Presumably things can 
 be faster that way)
 
 For integers, we have two types, platform native, and
 bigint. No guarantees 
 are made as to the size of a native int. bigints can be
 of any size.

I'm not sure about the wisdom of not making any guarantees
about int size, since that means that extensions have to go
through the same hoops perl5 has, dealing with
"unspecified" behaviors (cf. fun with ANSI stdio). To make
life easy, we might want to ordain sizeof(p6int) >=
sizeof(void *) && sizeof(p6int) >= 4. On the other hand,
this makes a port of the PVM to Palms and the like somewhat
harder (but would it be much easier to wedge them into the
standard PVM?). Also, can we please mandate 2s-complement
integral math? Perl 5 really always has, but can we please
make it official? 

 For floats, we also have two types, C double and
 bigfloat. No guarantees to 
 the size or accuracy of the double. bigfloats can be of
 any size.

Floating point is even harder, and will require a lot of
build-time checks anyway. 

 Strings can be of three types--binary data, platform
 native, and UTF-32. 

"platform native"?

 No, we are not messing around with UTF-8 or 16, nor are
 we messing with 
 EBCDIC, shift-JIS, or any of that stuff. Strings can be
 stored internally 
 that way (and the native form might be one of them) but
 as far as the 
 interface is concerned we have only three. Yes, this does
 mean if we mess 
 with strings in UTF-8 format on a non-UTF-8 system
 they'll need to be fed 
 out in UTF-32. It's bigger, but we can deal.

The issue with UTF-32 is that we'd need to write an entire
string-handling library, while quite a few modern platforms
have _wstr* or equivalent.

 Finally, complex numbers, if we deal with them, will be
 either double or 
 bigfloat complexes. (I don't see any reason to mess with
 integer versions, 
 nor with mixed double/bigfloat types)
 
 And, unless Larry objects, I feel that all vtable methods
 should have the 
 option of going with a 'scalar native' form of the
 operation if it's 
 determined at runtime that two scalars are the same type,
 though this is 
 optional and may be skipped for cost reasons. (Doing it
 with, for example, 
 complex numbers might be worth it, or when expensive
 conversions might be 
 avoided)

This part sounds good.

 Comments? I'm trying to balance out accuracy and DWIMmery
 with cost here, 
 and I'm not 100% sure things are quite right yet.
 
   Dan
 

-- BKS




Re: [not quite an RFC] shared bytecode/optree

2000-10-25 Thread Benjamin Stuhl

--- Chaim Frenkel [EMAIL PROTECTED] wrote:
  "BS" == Benjamin Stuhl [EMAIL PROTECTED] writes:
 
 BS 1. Bytecode can just be mmap'ed or read in, no
 playing
 BS around with relocations on loading or games with RVAs
 BS (which can't be used anyway, since variable RVAs vary
 based
 BS on what's been allocated or freed earlier).
 
 (What is an RVA?)

relative virtual address
 
 And how does the actual runtime use a relocatable
 pointer?  If it is
 an offset, then any access becomes an add. And depending
 upon the
 source of the pointer, it would either be a real address
 or an offset.
 
 Or if everything is a handle, then each access requires
 two fetches.
 And I don't see where you avoided the relocation. The
 handle table
 that would come in with the bytecode would need to be
 adjusted to
 reflect the real address.
 
 I vaguely can see a TIL that uses machine code linkage
 (real machine code
 jumps) that perhaps could use relative addressing as not
 needing
 relocation. But I'm not sure that all architectures
 support long enough
 relative jumps/calls.
 
 Doing the actual relocation should be quite fast. I
 believe that all
 current executables have to be relocated upon loading.
 Not to mention
 the calls to shared modules/dlls.
 
 chaim
 -- 
 Chaim Frenkel  Nonlinear Knowledge, Inc.
 [EMAIL PROTECTED] +1-718-236-0183

My primary goal (it may not have come across strongly
enough) in this proposal was sharing bytecode between
threads even with an ithreadsish model (variables are
thread-private, except when explicitly shared). This
requires that the bytecode not contain direct pointers to
variables, but rather references with at least one level of
indirection. Avoiding fixups/relocations and allowing
bytecode to be mmap()ed are additional potential benefits.
But my first goal was to not have one copy of each
subroutine in File::Spec::Functions for each thread I run.
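
A toy C sketch of the indirection I have in mind. All of it is hypothetical (the op_incr op, thread_ctx, resolve, run_incr); the only claim is the shape: the code is const and shared, and each thread brings its own table from handle to variable.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long value; } SV;

/* Shared and read-only: ops name variables by handle, never by pointer. */
typedef struct { int handle; long add; } op_incr;

/* Per-thread: maps each handle to that thread's own variable. */
typedef struct { SV **slots; size_t nslots; } thread_ctx;

static SV *resolve(const thread_ctx *t, int handle)
{
    assert(t != NULL && handle >= 0 && (size_t)handle < t->nslots);
    return t->slots[handle];
}

/* Run the same code against whichever thread context is passed in. */
static void run_incr(const thread_ctx *t, const op_incr *code, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        resolve(t, code[i].handle)->value += code[i].add;
}
```

The cost is one extra load per variable access, as Chaim points out; what it buys is that the op array never has to be copied or patched per thread and can stay in read-only, mmap()ed pages.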

-- BKS





[not quite an RFC] shared bytecode/optree

2000-10-24 Thread Benjamin Stuhl

Firstly, by "bytecode" I mean a .pmc and by "optree" I mean
the perl6 VM's internal form that it goes through
executing.

It seems to me that one thing that the perl6 bytecode
implementation _should_ do (in the interests of being light
and fast, as well as meshing well with MT) is be
position-independent. What do I mean? That all direct
references to SV*'s or regexes or anything else in the
bytecode _and_ the optree should actually be handles of
some sort. This has several benefits:

1. Bytecode can just be mmap'ed or read in, no playing
around with relocations on loading or games with RVAs
(which can't be used anyway, since variable RVAs vary based
on what's been allocated or freed earlier).

2. (more importantly, IMHO) Bytecode and the optree are
shareable between threads. My primary reason for opposing
the RFC proposing that modules must be reloaded in each
thread is the immense amount of memory that would be wasted
without bytecode/optree sharing.

3. With a good slab allocator and possibly some mprotect()
calls (and a good OS) bytecode/optree suddenly becomes
_completely_ shared between child processes. No more
needing to restart httpd and mod_perl6 because the mixing
of code and data has doubled the core usage of each
process!

I don't have the background to seriously argue
implementation, but I might suggest a "handle table" of
sorts which defines for each thread and CV which variable
goes with which handle. This sort of ties in with my
(vague) idea that CVs should carry around instructions for
building their scratchpad, rather than the pad itself (IOW,
scratchpads become purely part of the stack frame, rather
than the subroutine's carrier variable). This is all for
the purpose of reducing the required locking around
subroutine calls to nil or almost nil (perhaps one to make
sure that no-one's changed the subroutine out from under us
via eval("*foo = \&bar;"); or the like).

At any rate, I'm just spouting off ideas sparked by various
recent discussions (I probably need a higher blood sugar or
something). It's probably too early to seriously argue
technical merits, but on the other hand, basic VM design
can start before we know the precise grammar.

-- BKS




Re: RFCs for thread models

2000-09-10 Thread Benjamin Stuhl

--- Chaim Frenkel [EMAIL PROTECTED] wrote:
  "SWM" == Steven W McDougall [EMAIL PROTECTED]
 writes:
 
 SWM If you actually compile a Perl program, like
 
 SWM  $a = $b
   
 SWM and then look at the op tree, you won't find the
 symbol "$b", or "b"
 SWM anywhere in it. The fetch() op does not have the
 name of the variable
 SWM $b; rather, it holds a pointer to the value for $b.
 
 Where did you get this idea from? P5 currently does many
 lookups for
 names. All globals. Lexicals live elsewhere.

Globals whose names can be resolved at compile time are
resolved then, with the SV* stuck into o->op_sv.

 SWM If each thread is to have its own value for $b, then the fetch() op
 SWM can't hold a pointer to *the* value. Instead, it must hold a pointer
 SWM to a map that indexes from thread ID to the value of $b for that
 SWM thread. Thread IDs tend to be sparse, so the map can't be implemented
 SWM as an array. It will have to be a hash, or a B*-tree, or a balanced
 SWM B-tree, or the like.

Or, say, a hash table keyed by pointer value that only contains
thread-local-ified globals; the rest just use the stored
pointer (so for only a few thread-local globals, there is
very little overhead). I.e.

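OP* PERL_FASTCALL p6_pp_fetch (perl_thread *t)
{
    SV *real_sv = ((SVOP*)PL_op)->op_sv, *tsv;

    /* use the thread-local copy if this global has one */
    if ((tsv = p6_ptrtbl_fetch(t->t_localsvs, real_sv)) != NULL)
        real_sv = tsv;
    p6_extend_stack(t->t_stack, 1);
    p6_push(t->t_stack, real_sv);
    return PL_op->op_next; /* assumed: continue with the next op,
                              as perl5's NORMAL does */
}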
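
p6_ptrtbl_fetch is not spelled out above, so here is a minimal sketch of what a pointer-keyed table could look like. All names (ptrtbl, ptrtbl_store, ptrtbl_fetch, resolve) are made up for illustration, and the table is a fixed-size open-addressing hash purely to keep the example short; a real one would grow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRTBL_SIZE 64                 /* power of two; fixed for brevity */

typedef struct {
    const void *key[PTRTBL_SIZE];      /* shared SV pointer, NULL = empty */
    void       *val[PTRTBL_SIZE];      /* this thread's private copy */
    size_t      used;
} ptrtbl;

static size_t ptrtbl_hash(const void *p)
{
    /* Drop the low alignment bits, then mask into the table. */
    return (size_t)(((uintptr_t)p >> 3) & (PTRTBL_SIZE - 1));
}

/* Insert or update. Returns 0 on success, -1 if the table is full. */
static int ptrtbl_store(ptrtbl *t, const void *key, void *val)
{
    size_t i, n;
    assert(key != NULL);
    i = ptrtbl_hash(key);
    for (n = 0; n < PTRTBL_SIZE; n++, i = (i + 1) & (PTRTBL_SIZE - 1)) {
        if (t->key[i] == NULL || t->key[i] == key) {
            if (t->key[i] == NULL)
                t->used++;
            t->key[i] = key;
            t->val[i] = val;
            return 0;
        }
    }
    return -1;
}

/* Linear probe; returns NULL when the key has no thread-local copy. */
static void *ptrtbl_fetch(const ptrtbl *t, const void *key)
{
    size_t i = ptrtbl_hash(key), n;
    for (n = 0; n < PTRTBL_SIZE; n++, i = (i + 1) & (PTRTBL_SIZE - 1)) {
        if (t->key[i] == NULL)
            return NULL;
        if (t->key[i] == key)
            return t->val[i];
    }
    return NULL;
}

/* Core of the fetch op: prefer the thread-local copy, else the shared one. */
static void *resolve(const ptrtbl *locals, void *shared)
{
    void *local = ptrtbl_fetch(locals, shared);
    return local ? local : shared;
}
```

A global that was never made thread-local costs one hash and one probe that hits an empty slot, which is the "very little overhead" case.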

 Now where
   sub recursive() { my $a :shared; ; return recursive() }
 would put $a or even which $a is meant, is left as an exercise
 for someone brighter than me.

%P6-E-MEANINGLESS, "my $a : shared" is a meaningless
construct.

-- BKS





YAVTBL: yet another vtbl scheme

2000-09-05 Thread Benjamin Stuhl

All -
I fail to see the reason for requiring that all variables
"know" how to perform ops upon themselves. An operation is
separate from the data it operates on. Therefore, I propose
the following vtbl scheme, with two goals:
  1. that the minimal vtbl be just that, minimal
  2. that it be possible (convenient) to override ops as
     needed
First, a few basic types (these are samples only, and should
be beaten on for cache-friendliness, etc. once a design is
formalized).

typedef struct _ovl {
    U32          ov_type;
    U32          ov_flags;
    void        *ov_vtbl;
    void        *ov_data;
    struct _ovl *ov_next;
} OVERLOAD;

typedef union {
    SCALAR_VTBL s;
    ARRAY_VTBL  a;
    HASH_VTBL   h;
} SV_VTBL;

typedef struct sv {
    void     *sv_data;
    OVERLOAD *sv_magic;
    SV_VTBL  *sv_vtbl;
    U32       sv_flags; /* and type (SV, AV, HV) */
    /* ... GC stuff ... MT-safe stuff ... */
} SV, *PMC;

SV_VTBL, then, supports basic operations on perlish data
types (get, store, and a few housekeeping things). Since
no one (outside perl and libperl.so) should be directly
calling vtbl functions, this makes it easy to put checks in
that a variable is the appropriate type (i.e., av_fetch will
die if the variable is really a scalar).
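
A minimal sketch of such a check, under heavy simplification: only the array flavor of the vtbl is modeled, elements are plain longs, and the wrapper returns an error code where perl would die. The names sv_kind, array_vtbl, plain_array and av_fetch_checked are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_SCALAR, T_ARRAY, T_HASH } sv_kind;   /* assumed type tags */

struct sv;

/* Array flavor only; the real ARRAY_VTBL would have more entries. */
typedef struct {
    long   (*get_at)(const struct sv *av, size_t ix);
    size_t (*size)(const struct sv *av);
} array_vtbl;

typedef struct sv {
    void             *sv_data;
    const array_vtbl *sv_vtbl;
    sv_kind           kind;
} sv;

/* One possible array body: a plain C array of longs. */
typedef struct { const long *elems; size_t n; } plain_array;

static long plain_get_at(const struct sv *av, size_t ix)
{
    const plain_array *a = av->sv_data;
    return ix < a->n ? a->elems[ix] : 0;   /* out of range reads as 0 here */
}

static size_t plain_size(const struct sv *av)
{
    const plain_array *a = av->sv_data;
    return a->n;
}

static const array_vtbl plain_array_vtbl = { plain_get_at, plain_size };

/* Public entry point: check the type first, then dispatch through the vtbl.
 * Returns 0 on success, -1 on a type mismatch (where perl would die). */
static int av_fetch_checked(const sv *v, size_t ix, long *out)
{
    assert(out != NULL);
    if (v->kind != T_ARRAY || v->sv_vtbl == NULL)
        return -1;
    *out = v->sv_vtbl->get_at(v, ix);
    return 0;
}
```

Because callers only ever go through av_fetch_checked, the vtbl functions themselves can skip the type test.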

Here is what each data type should support (each get/set
may require an argument giving a bit more detail (e.g., U16
vs. I64, UTF8 vs. UTF16-bigendian, etc.)):

SCALAR_VTBL:
    get_int
    get_string
    get_real
    get_ref
    num_sign /* positive or negative (or zero?) */
    num_is_integral
    set_int
    set_string
    set_real
    set_ref
    set_multival /* == perl5ish
                        sv_setpv(sv, ...);
                        sv_setiv(sv, ...);
                        SvPOK_on(sv); (esp. this part)
                  */
    undef
    construct
    finalize

ARRAY_VTBL:
    get_at
    set_at
    grow /* a hint on where we plan to put values, i.e.
            av->sv_vtbl->a.grow(bottom_ix, top_ix) */
    size
    clear /* @av = (); */
    undef /* undef @av; */
    get_iterator
    construct
    finalize

HASH_VTBL:
    fetch
    store
    get_iterator
    /* not sure if these two are needed */
    get_iterator_keys
    get_iterator_values
    clear
    undef
    size
    construct
    finalize

In order to allow overriding of opcodes for, say, BigInts,
four basic types of OVERLOAD are defined, based on what
flavor of vtbl is in ov_vtbl (flags in the bottom byte of
ov_type?). These are OV_GET, OV_SET, OV_RANDOM, and OV_OPS,
and are denoted in sv->sv_flags. The first three correspond
to the perl5 GMG, SMG, and RMG. The last marks that the vtbl
is an overload of one or more opcodes. Every op checks to
see if it is overloaded, and if it is, calls the overload.
Some ops don't need to (e.g., vec() can just do a set_string
and add an OVERLOAD for the bitwise ops).

If necessary, additional subclasses of OV_OPS may be
defined (e.g., OV_NUMERIC, OV_STRING, OV_IO).
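
As a rough illustration of "every op checks to see if it is overloaded", here is a sketch of an add op walking the overload chain. The flag values, the cut-down overload and sv structs, ops_vtbl, find_overload, op_add and the add_mod10 demo overload are all assumptions made for the example; values are plain longs.

```c
#include <assert.h>
#include <stddef.h>

enum { OV_GET = 1, OV_SET = 2, OV_RANDOM = 4, OV_OPS = 8 };  /* assumed values */

/* Hypothetical vtbl for OV_OPS: one slot per overloadable op. */
typedef struct {
    long (*add)(long lhs, long rhs);     /* NULL = this op is not overloaded */
} ops_vtbl;

typedef struct overload {
    unsigned         ov_type;
    const void      *ov_vtbl;
    struct overload *ov_next;
} overload;

typedef struct {
    long      value;
    unsigned  sv_flags;                  /* OR of the OV_* types in sv_magic */
    overload *sv_magic;
} sv;

static const overload *find_overload(const sv *v, unsigned type)
{
    const overload *o;
    if (!(v->sv_flags & type))
        return NULL;                     /* fast path: one flag test, no walk */
    for (o = v->sv_magic; o != NULL; o = o->ov_next)
        if (o->ov_type == type)
            return o;
    return NULL;
}

/* The op itself: use the overload if there is one, else the builtin. */
static long op_add(const sv *lhs, const sv *rhs)
{
    const overload *o;
    assert(lhs != NULL && rhs != NULL);
    o = find_overload(lhs, OV_OPS);
    if (o != NULL) {
        const ops_vtbl *vt = o->ov_vtbl;
        if (vt->add != NULL)
            return vt->add(lhs->value, rhs->value);
    }
    return lhs->value + rhs->value;
}

/* Demo overload: addition modulo 10. */
static long add_mod10(long a, long b)
{
    return (a + b) % 10;
}
```

The point of mirroring the overload types in sv_flags is that the common, un-overloaded case costs a single bit test.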

-- BKS




Re: RFC 146 (v1) Remove socket functions from core

2000-08-25 Thread Benjamin Stuhl

--- "Stephen P. Potter" [EMAIL PROTECTED] wrote:
 Lightning flashed, thunder crashed and Tom Christiansen
 [EMAIL PROTECTED] whispered:
 | Unless that's done completely transparently, you'll pretty much screw the
 | pooch as far as "Perl is the Cliff Notes of Unix" notion.  Not to
 | mention running a very strong risk of butchering the performance.
 
 I don't think there is any ruling from Larry that perl must remain the
 "Cliff Notes of Unix."  In fact, there seems to be a bit of a concerted
 effort (partly suggested by Larry, IIRC) to make perl *less* Unix-centric
 and more friendly for other environments.
 
 I'm not concerned with performance, per se.  I have confidence in the
 people who will actually write the code to take care of that issue.
 Performance will be a factor in deciding whether this can be implemented or
 not.  If performance will suffer unacceptably, then this won't get
 implemented.

It probably would. Dynamic loading is not cheap, and having
to do a dlopen() and a dlsym() (or a LoadLibrary() and a
GetProcAddress()) to find out the square root of 2 is not
my idea of a _useful_ lightweight programming language.

 | I don't understand this desire to eviscerate Perl's guts.  Having
 | everything you want just *there* is part of what's made Perl fast,
 | fun, and successful.  Good luck on preserving all three.
 
 This desire stems from having a wonderful mechanism for making the core
 more lightweight (hopefully improving performance) called loadable
 modules.  Larry designed this feature for a reason, and has been saying
 since the early perl5 alphas that we could/should migrate some things out
 of the core.  I'm simply suggesting all the parts that I think reasonably
 go together that could be migrated.  They can still be "there", just in a
 module.  If the AUTOLOAD stuff that is being discussed works out, you won't
 even know the internals have changed.

AUTOLOAD searches are not cheap either. It can take a lot
of stat() calls to even _find_ the correct module, much
less load it. The average math function in the perl5 core
is about 13 lines of C code. Eviscerating it out of the
core would accomplish nothing.
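
To give a feel for the search cost, here is a small POSIX-only sketch that walks a list of include directories the way a module lookup has to, counting the stat() calls along the way. find_module and its parameters are invented for the example; it is not how perl's require is actually implemented, only the same general shape.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Search dirs[0..n_dirs) for relpath. Returns the index of the first
 * directory that contains it as a regular file, or -1 if none does.
 * *n_stats receives the number of stat() calls that were needed. */
static int find_module(const char **dirs, int n_dirs,
                       const char *relpath, int *n_stats)
{
    char path[4096];
    struct stat st;
    int i;

    assert(n_stats != NULL);
    *n_stats = 0;
    for (i = 0; i < n_dirs; i++) {
        int len = snprintf(path, sizeof path, "%s/%s", dirs[i], relpath);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;                    /* path too long: skip this dir */
        ++*n_stats;
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
            return i;
    }
    return -1;
}
```

A miss costs one system call per directory on the search path, and that is before any file is opened, parsed, or linked.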
 
 I don't understand this desire to not want anything to change.  This is an
 opportunity to clean up the language, make it more useable, and more fun.

Slowing perl down and forcing everyone to add 5 "use"
statements to the top of every program to get any useful
features would make it neither more useful nor more fun.

 I would have a lot more fun if perl were a better performer and if it was
 easy for me to expand it, contract it, reshape it, improve it, etc.
 
 -spp

-- BKS




Re: Avoid memory copy and redundant loops in reduce/fold

2000-08-04 Thread Benjamin Stuhl

 The normal problem with this type of structure is that the previous
 statement would create 2 array copies, and 3 loops for most compilers. In
 perl speak, it might look like:
 $dummy1[$_] = $b[$_]*$c[$_] for (0..$#b-1);
 $dummy2[$_] = $d[$_]+$dummy1[$_] for (0..$#dummy1-1);
 $sum+=$_ for (@dummy2);
 (Sorry if this isn't very idiomatic perl--it's not really my native
 language.)
 
 Progressive C++ numeric programming libraries like POOMA and Blitz++ use
 template meta-programming techniques to implement 'expression templates'.
 Templates are used to create the parse tree for these kinds of array
 expressions at compile time, and the compiler then optimises out the extra
 loops and array copies to create something like:
 $sum+=$b[$_]*$c[$_]+$d[$_] for (0..$#b-1);
 
 Without this optimisation, array semantics become next to useless for
 numeric programming, because their overhead is just so high. But writing
 numerically intensive programs without array semantics is messy--they become
 littered with control structures and loops (which is particularly
 unintuitive for mathematicians used to the compact notation of mathematics).
 
 So, could perl 6 do this optimisation (assuming that the array
 notation/folding stuff makes its way into the language)? Given that some
 amount of compilation or interpretation will presumably still be done at
 run-time, perhaps this is much easier in perl than in C++...

This actually leads to a much more general question, namely
passing of arrays to functions. For pp code at least and
probably any code using the perl API, it should be possible
and IMHO desirable to push the AV* (or equivalent), rather
than expanding the array and pushing each of its elements.
Furthermore, if we do this, it would make passing named
array arguments (sub foo (@baz, @qux) { ... }) much
simpler.
If a subroutine asks for @_, then we can go the old way and
push everything, but reducing the number of stack pushes on
list operators could be a major win.
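
A toy model of the difference, with made-up types (av, slot, arg_stack) standing in for the real AV and argument stack: push_flat does what perl5 does today, push_ref pushes a single pointer, and sum_args is a list operator written to accept either.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 256

typedef struct { const long *elems; size_t n; } av;   /* stand-in for an AV */

typedef struct {
    enum { SLOT_VALUE, SLOT_ARRAY } kind;
    union { long value; const av *array; } u;
} slot;

typedef struct { slot slots[STACK_MAX]; size_t top; } arg_stack;

/* perl5 style: expand the array, one push per element. */
static void push_flat(arg_stack *s, const av *a)
{
    size_t i;
    assert(s->top + a->n <= STACK_MAX);
    for (i = 0; i < a->n; i++) {
        s->slots[s->top].kind = SLOT_VALUE;
        s->slots[s->top].u.value = a->elems[i];
        s->top++;
    }
}

/* Proposed style: one push, however long the array is. */
static void push_ref(arg_stack *s, const av *a)
{
    assert(s->top < STACK_MAX);
    s->slots[s->top].kind = SLOT_ARRAY;
    s->slots[s->top].u.array = a;
    s->top++;
}

/* A list operator that copes with both kinds of slot. */
static long sum_args(const arg_stack *s)
{
    long total = 0;
    size_t i, j;
    for (i = 0; i < s->top; i++) {
        const slot *sl = &s->slots[i];
        if (sl->kind == SLOT_VALUE) {
            total += sl->u.value;
        } else {
            for (j = 0; j < sl->u.array->n; j++)
                total += sl->u.array->elems[j];
        }
    }
    return total;
}
```

The result is the same either way; what changes is that the by-reference call does one push where the flattening call does one per element, and the callee can still tell where each array begins and ends.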

-- BKS
