Re: RFC: Foreign objects in perl

2000-08-12 Thread Larry Wall

Benjamin Stuhl writes:
: --- Bart Schuller <[EMAIL PROTECTED]> wrote:
: > Larry knew what he was doing when he decided on utf8.
: 
: It has also led to the perl5 internals being, to put it
: bluntly, a horrible mess. And forget about the regex
: engine.

That's a vast oversimplification.  It has very little to do with
choosing utf8 over utf16.  The internals were already a mess, from the
standpoint of not using vtables, and the standpoint of assuming that
characters are 8 bits.  It's those lingering assumptions that infest
the regex optimizer.  (And the fact that we didn't actually *finish*
the utf8 support for 5.6.0.)

: Perhaps if it was designed in from the beginning things
: would be better, but this is something that needs serious 
: discussion.

Certainly, but I still think that utf8 must be supported as the default
string datatype--at least in any Perl east of the Pacific Ocean.  We
can of course polymorphically support utf16 and utf32 as
well--the language abstraction is such that there's little problem with
that.  The only place we should have to worry about it is in the
interfaces to the outside world.  As far as Perl is concerned (these
days), a string is a sequence of integers.  And utf8 supports that
rather more smoothly than utf16!

What you have to realize is that going to utf16 does not solve your
variable-width character problems.  I consider it a requirement that
Perl handle plane 1 characters as smoothly as it handles plane 0
characters.  To my American mind, the only value of utf16 is that it is
a poorly compressed form of utf32.  People from Japan may differ, of
course.  :-)  But as long as we can allow for those differences in the
design through string polymorphism and lazy conversion, we can
hopefully make everyone happy.
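
For illustration (a sketch that is not part of the original message, and
written in Python rather than Perl): the snippet below treats a string as
a sequence of integers and counts how many code units each character
needs in each encoding. U+10400 is an arbitrary plane 1 example, and the
helper name unit_widths is made up.

```python
# U+0041 is a plane 0 character, U+10400 a plane 1 character.
s = "A\U00010400"

# The abstract view: a string is a sequence of integers.
code_points = [ord(ch) for ch in s]


def unit_widths(text, encoding, unit_size):
    """Number of code units each character occupies in the given encoding."""
    return [len(ch.encode(encoding)) // unit_size for ch in text]


print(code_points)                       # [65, 66560]
print(unit_widths(s, "utf-8", 1))        # [1, 4]
print(unit_widths(s, "utf-16-le", 2))    # [1, 2]  (surrogate pair)
print(unit_widths(s, "utf-32-le", 4))    # [1, 1]
```

In this example utf16 needs two 16-bit units for the plane 1 character,
so it is variable-width just as utf8 is; only utf32 is fixed-width.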

Whether or not strings appear to be objects in Perl, they will
certainly need vtables in perl.

Larry



Re: Ramblings on "base class" for SV etc.

2000-08-12 Thread Larry Wall

Nick Ing-Simmons writes:
: Chaim Frenkel <[EMAIL PROTECTED]> writes:
: >Hmm, will vtbl get rid of all the magic hacks?
: 
: The "mg.c" 'magic hacks' are in essence applying vtable semantics (they 
: are even called vtables in the sources) to a subset of "values".

Well, yes, but there's a linked list of them to allow for MI of
magical methods.

: So yes vtables mean everything is "magic" so nothing needs "special magic"...

Er, you still have to think about whether something could be tied and
tainted at the same time.

Larry



Re: RFC 35 (v1) A proposed internal base format for perl

2000-08-12 Thread Larry Wall

[EMAIL PROTECTED] writes:
: This way perl doesn't, for example, have to know how to access an
: individual element of an array of integers--it just asks the array to
: return it a particular element. Code MUST use the vtable functions to
: get or set values from variables. They MUST NOT directly access the data.

Fine, though we'll have to decide how indexes and keys get passed in to
these functions (vfuncs?).  We could pass such parameters on the stack
or as function arguments.  At a guess, we'll end up providing both
APIs.  Which API will be more basic is anyone's guess.  'Course, this
is outside the scope of this RFC anyway.

: This base structure should be considered immobile, so it's safe to
: maintain pointers to it. The data portion of a variable should be
: considered moveable, and may be shuffled around if a variable changes
: its type, or the garbage collector needs to compact the heap.

If it's only compacting the heap, I don't know whether I'd call it "the
garbage collector".  Catching cycles has to involve the PMCs too.

: =item GC_data
: 
: Random data for the garbage collector, whichever one is used. This
: could be a marker for M&S GC, or a refcount for refcount GC, or just
: nothing at all if we get a really clever GC.

We'll need to decide if we're really going to make people release their
PMC pointers explicitly.  This is required for refcount GC, and can
help some forms of "real" GC do their work more efficiently, but slows
down other forms of GC.

: =item vtable
: 
: The vtable field holds a pointer to the vtable for a variable. Each
: variable type has its own vtable, holding pointers to functions for
: the variable. Vtables are shared between variables of the same
: type. (All integer arrays have the same vtable, as do all string
: scalars and so on)

I think that last statement is false.  I think strings will have
several vtables depending on their format.  (So might integers, if we
decide to stitch in bigints.)
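
One way to picture several string vtables behind a single abstract
string type, as a sketch that is not from the thread. It is written in
Python for brevity; PMC here is only a two-field stand-in, and
UTF8_VTABLE, UTF32_VTABLE, new_string and str_eq are hypothetical names.
A real implementation would presumably be C structs of function
pointers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A "vtable" is modelled as a dict of functions shared by all strings
# that use the same storage format.
Vtable = Dict[str, Callable]


def _utf8_codepoints(data: bytes) -> List[int]:
    return [ord(ch) for ch in data.decode("utf-8")]


def _utf32_codepoints(data: bytes) -> List[int]:
    return [ord(ch) for ch in data.decode("utf-32-le")]


UTF8_VTABLE: Vtable = {"codepoints": _utf8_codepoints}
UTF32_VTABLE: Vtable = {"codepoints": _utf32_codepoints}


@dataclass
class PMC:
    vtable: Vtable  # selects the functions for this storage format
    data: bytes     # opaque payload; only vtable functions look inside


def new_string(text: str, fmt: str) -> PMC:
    if fmt == "utf8":
        return PMC(UTF8_VTABLE, text.encode("utf-8"))
    if fmt == "utf32":
        return PMC(UTF32_VTABLE, text.encode("utf-32-le"))
    raise ValueError(f"unknown format: {fmt}")


def codepoints(s: PMC) -> List[int]:
    # Callers never touch s.data directly; they go through the vtable.
    return s.vtable["codepoints"](s.data)


def str_eq(a: PMC, b: PMC) -> bool:
    # Naive fallback: compare through the abstract integer-sequence view.
    return codepoints(a) == codepoints(b)
```

Two strings with the same characters but different formats have
different vtables and different bytes, yet compare equal through the
abstract view.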

Larry



Re: Imrpoving tie() (Re: RFC 15 (v1) Stronger typing through tie.)

2000-08-12 Thread Larry Wall

Dan Sugalski writes:
: Yup. It's an issue for things that implement any non-standard semantics for 
: existing ops, especially if those ops are overridden at runtime so the 
: optimizer doesn't know. It's one thing to mess with tied variables, it's 
: another entirely to make + behave differently.
: 
: I think we'll need to get a ruling from Larry at some point on this one.

I haven't been terribly happy with tie for some time.  I'd rather we had a
more type-based approach, which could:

1) factor the work out to one spot so you wouldn't have to call tie
on every element of an array, for instance.
2) let us tie lexically scoped variables that don't leak out to the
surrounding program.

And even if we keep the current tie interface (and we probably have to,
even if we add a better way), we can probably limit the damage other
ways.  If we see a declaration like:

my int $foo;

we can decide either that $foo will never be tied, or that it will never
be tied to an implementation that violates the standard meaning of +.

Larry



Re: Ramblings on "base class" for SV etc.

2000-08-13 Thread Larry Wall

Dan Sugalski writes:
: I'm not sure the vtable's the place for this sort of thing. (Plus then we 
: start getting a zillion alternate functions--we'd have taint/notaint and 
: thread/nothread right now for four, add another and that brings us to 
: eight, then sixteen, then...) Besides, generally the code would want to 
: check taint status before accessing the data.

Which penalizes every operation.  I'd suggest vtable wrappers for
exceptional circumstances.  Or at least do like Perl 5 and bunch
multiple things under one "think about it" test.  But one of the
realizations of Perl 5 over Perl 4 was that we didn't need a separate
executable to keep tainting efficient if we just made it part of the
magic system.  I'd hate to lose that in Perl 6.  It'd be really nice if
ordinary data didn't have to interrogate any bits.  If magic data
requires an indirection through whatever you're calling sv_any these
days, then you can keep magic magical without penalizing the mundane
data.  I expect MI of magical methods could even be emulated by nested
SI vtables, as long as you keep them in the right order.  Then you
don't get a combinatorial explosion of vtables.
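
A rough sketch of the wrapper idea, not from the thread and in Python
rather than C. All class and function names (Variable, PlainScalar,
TaintWrapper, TieWrapper, is_tainted) are invented for illustration. The
header holds one pointer to an implementation; adding magic swaps in a
wrapper that delegates to the previous one, so plain data never tests a
flag.

```python
class PlainScalar:
    """Mundane data: the fast path checks no flags at all."""

    def __init__(self, value):
        self.value = value

    def get(self):
        return self.value

    def set(self, value):
        self.value = value


class TaintWrapper:
    """Marks the wrapped value as tainted; otherwise just delegates."""

    def __init__(self, inner):
        self.inner = inner

    def get(self):
        return self.inner.get()

    def set(self, value):
        self.inner.set(value)


class TieWrapper:
    """Records FETCH/STORE calls in a log, then delegates inward."""

    def __init__(self, inner, log):
        self.inner = inner
        self.log = log

    def get(self):
        value = self.inner.get()
        self.log.append(("FETCH", value))
        return value

    def set(self, value):
        self.log.append(("STORE", value))
        self.inner.set(value)


class Variable:
    """Immobile header; impl plays the role of the vtable/data pointer."""

    def __init__(self, value):
        self.impl = PlainScalar(value)

    def get(self):
        return self.impl.get()

    def set(self, value):
        self.impl.set(value)

    def add_magic(self, wrapper_cls, *args):
        # Nest a new single-inheritance-style wrapper around the old one.
        self.impl = wrapper_cls(self.impl, *args)


def is_tainted(var):
    """Walk the wrapper chain; only magical variables pay for this."""
    impl = var.impl
    while impl is not None:
        if isinstance(impl, TaintWrapper):
            return True
        impl = getattr(impl, "inner", None)
    return False
```

A variable can be tied and tainted at the same time by nesting both
wrappers, and the number of wrapper classes grows linearly with the
kinds of magic rather than as their cross-product.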

'Course, if you nest sv_any pointers, you'd probably need a pointer in
the last one back to the first one, or you might lose track of the
PMC.  That's why Perl 5 inverted the whole thing and ran the linked
list of magics out of the data area.  But then you have to test bits
everywhere.

Tradeoffs, tradeoffs...

Larry



Re: RFC 35 (v1) A proposed internal base format for perl

2000-08-14 Thread Larry Wall

Chaim Frenkel writes:
: >>>>> "LW" == Larry Wall <[EMAIL PROTECTED]> writes:
: 
: LW> : =item vtable
: LW> : 
: LW> : The vtable field holds a pointer to the vtable for a variable. Each
: LW> : variable type has its own vtable, holding pointers to functions for
: LW> : the variable. Vtables are shared between variables of the same
: LW> : type. (All integer arrays have the same vtable, as do all string
: LW> : scalars and so on)
: 
: LW> I think that last statement is false.  I think strings will have
: LW> several vtables depending on their format.  (So might integers, if we
: LW> decide to stitch in bigints.)
: 
: Actually, it isn't false, just not a 100% accurate statement. During some
: interchange with Dan, we discussed saving various flag checks by
: swapping vtbls. (Hmm, might this be a state machine?)

I was talking about utf8 strings having a completely different vtable
than a utf16 or a utf32 string.  The statement in question seems to be
incompatible with that, but I'm not here to argue about that.

What I'd like to figure out about these vtables is how we do something
as simple as string comparison if two strings have different vtables.
Basically it's the old overloading dispatch problem all over again.
It's not clear to me whether the intrinsic types should have a different
solution to this than the extrinsic types.

Doubtless there are already proposals out there that I just haven't
read yet.  I'm just letting you know where my thinking is at, even if I
look ignorant of current discussions elsewhere.  I only just got caught
up reading the bootstrap mailing list this weekend.  And I'm still 1300
articles behind in perl6-language, sigh...

Larry



Re: RFC 35 (v1) A proposed internal base format for perl

2000-08-14 Thread Larry Wall

Nick Ing-Simmons writes:
: >It's not clear to me whether the intrinsic types should have a different
: >solution to this than the extrinsic types.
: 
: _This_ thread is about using vtables for intrinsic types. If we cannot 
: make them work there then the proposed innermost SV * replacement is flawed.

Sure, but we may have to warp our ideas of what a vtable is to encompass
the notion of a vtable that is the cross-product of two vtables.

Larry



Re: RFC 35 (v1) A proposed internal base format for perl

2000-08-14 Thread Larry Wall

Nick Ing-Simmons writes:
: Larry Wall <[EMAIL PROTECTED]> writes:
: >Nick Ing-Simmons writes:
: >: >It's not clear to me whether the intrinsic types should have a different
: >: >solution to this than the extrinsic types.
: >: 
: >: _This_ thread is about using vtables for intrinsic types. If we cannot 
: >: make them work there then the proposed innermost SV * replacement is flawed.
: >
: >Sure, but we may have to warp our ideas of what a vtable is to encompass
: >the notion of a vtable that is the cross-product of two vtables.
: 
: That wouldn't be a 'vector' table but a 'matrix' table ! only 1/2 ;-)

Well, it's not clear that I know what I'm talking about.  And vice versa.  :-)

Larry



Re: pramgas as compile-time-only

2000-08-14 Thread Larry Wall

Dan Sugalski writes:
: That's what I was thinking about, and it also makes the ops smaller.

Er, I don't think so, if you're talking about what I think you're talking
about.

: (We got a warning flag field in every op as of 5.6.0, IIRC)

No, they're only stored once per statement, as far as I recall.  This
is a great way to handle all sorts of lexically scoped things, provided
they don't require finer specificity than a statement.  Each new
statement just rams a new cop pointer into curcop and you're done with
it.  Think of it as a funny kind of vtbl pointer.  You potentially
change a whole bunch of semantics by one pointer assignment.  Any
opcode within the statement can look up anything it likes in the
current lexical context merely by following the curcop pointer back.
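
A toy model of the per-statement pointer, not from the thread and in
Python rather than C. Cop, Interp and op_use_undef are illustrative
names, and the "uninitialized" warning category is only an example
setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cop:
    """Per-statement control op: line number plus lexical settings."""
    line: int
    warnings: frozenset = frozenset()


class Interp:
    def __init__(self):
        self.curcop = None
        self.messages = []

    def run(self, statements):
        for cop, ops in statements:
            # One pointer assignment switches the whole lexical context.
            self.curcop = cop
            for op in ops:
                op(self)


def op_use_undef(interp):
    """An op that stores no flags itself; it asks the current cop."""
    if "uninitialized" in interp.curcop.warnings:
        interp.messages.append(
            f"uninitialized value at line {interp.curcop.line}"
        )


program = [
    (Cop(line=1, warnings=frozenset({"uninitialized"})), [op_use_undef]),
    (Cop(line=2), [op_use_undef, op_use_undef]),
]
interp = Interp()
interp.run(program)
print(interp.messages)  # ['uninitialized value at line 1']
```

The ops on line 2 carry no warning bits of their own; they stay quiet
only because the cop for that statement says so.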

Larry



Re: Multi-object locks (was Re: RFC 35 / Re: perl6-internals-gc sublist)

2000-08-14 Thread Larry Wall

Dan Sugalski writes:
: A language issue. Being able to require multiple locks upon entering a sub, 
: along with timeouts and retries and such, would be very nice, and something 
: for the language people. (Which probably means some of us over there, since 
: I don't know that we have that much thread experience in the pure perl side 
: of the world)

Be careful what you ask for from us language designers.  If you're not
careful, we'll take away your low-level primitives and give you something
like Ada's rendezvous model.

Larry



Re: RFC 35 (v1) A proposed internal base format for perl

2000-08-14 Thread Larry Wall

Dan Sugalski writes:
: The big problem here is the large number of operators that need to
: be supported in every vtable. On the other hand, it means we whittle
: ourselves down to only one operator opcode. ;-)

I don't care if the program is half vtables, as long as it runs fast.

Larry



Re: RFC 35 (v1) A proposed internal base format for perl

2000-08-14 Thread Larry Wall

Dan Sugalski writes:
: On Mon, 14 Aug 2000, Larry Wall wrote:
: 
: > Dan Sugalski writes:
: > : The big problem here is the large number of operators that need to
: > : be supported in every vtable. On the other hand, it means we whittle
: > : ourselves down to only one operator opcode. ;-)
: > 
: > I don't care if the program is half vtables, as long as it runs fast.
: 
: It shall run fast if I have to chase it with a stick. (maybe I should just
: give up now and go with the single opcode "dwim" and be done with it...)
: 
: How much of the current base of target ports are you willing to give up in
: the first cut for fast? The TIL suggestion, amongst others, has the
: potential to speed things up rather a lot, but it has the disadvantage of
: requiring intimate knowledge of each target port. My preference is to get
: a snappy interpreter and leave the Java JIT-equivalents to the various
: chip/OS vendors, but I'd bet the TIL style would be faster.

I *thought* I was taking TIL into account with the design of Perl 5.
In fact, I think Ilya almost got it to work.  But there were a lot of
things that snuck in as the design evolved that made it difficult.  My
current thinking is that if we're going to seriously consider TIL then
we'd durn well better do at least one TIL implementation in parallel or
we'll inadvertently design ourselves out of the capability.  Again.

On the other hand, targeting JVM and IL.NET might keep us honest enough.

Larry



Re: RFC 35 (v1) A proposed internal base format for perl

2000-08-15 Thread Larry Wall

Chaim Frenkel writes:
: >>>>> "LW" == Larry Wall <[EMAIL PROTECTED]> writes:
: 
: LW> On the other hand, targeting JVM and IL.NET might keep us honest enough.
: 
: What is IL.NET? 

Sorry, just my shorthand for the IL of .NET, which is to say, Microsoft's
new intermediate language.

Larry



Re: Welcome to Perl Vtbls

2000-08-15 Thread Larry Wall

Chaim Frenkel writes:
: I can't see how objectA's vtbl can handle a cross-operation to objectB's
: vtbl.
: 
: Enlightenment sought.

I'm mostly just trying to bust us out of conventional thinking by
throwing random words around.  I don't know offhand whether multimethod
dispatch would make any sense here as a way to let us shrink the core
further while keeping our dynamic loading options open.  But it seems
slightly bogus to me to use vtbls to concatenate two strings.  That
could be the "core" way, given we only define

concat(utf8, *)
concat(utf16, *)
concat(utf32, *)

but suppose for performance reasons you don't want to view the second
string as dynamically typed.  You could dynamically load in some compile-time
typed routines:

concat(utf8, utf8)
concat(utf8, utf16)
concat(utf8, utf32)
concat(utf16, utf8)
concat(utf16, utf16)
concat(utf16, utf32)
concat(utf32, utf8)
concat(utf32, utf16)
concat(utf32, utf32)

in which case it would be *nice* if concat(utf8,utf8) could simply
slam the two strings together without any of this second vtbl business.
But that implies multimethod dispatch, recalculated when the new routines
are loaded in.  I'm not claiming we want this--I'm just stirring the pot
to keep things from gelling prematurely.
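
To make the idea concrete, here is a small sketch that is not from the
thread, in Python rather than C. The Str class, the DISPATCH table and
the load_routine/lookup helpers are hypothetical. The table starts with
only the concat(X, *) entries, and a specialized concat(utf8, utf8) can
be loaded later.

```python
CODECS = {"utf8": "utf-8", "utf16": "utf-16-le", "utf32": "utf-32-le"}


class Str:
    def __init__(self, enc, data):
        self.enc = enc    # storage format name, a key of CODECS
        self.data = data  # raw bytes in that format

    @classmethod
    def from_text(cls, enc, text):
        return cls(enc, text.encode(CODECS[enc]))

    def text(self):
        return self.data.decode(CODECS[self.enc])


def concat_generic(a, b):
    """concat(X, *): keep a's format; b is converted via its own decoder."""
    return Str.from_text(a.enc, a.text() + b.text())


def concat_utf8_utf8(a, b):
    """concat(utf8, utf8): same format, so just join the raw bytes."""
    return Str("utf8", a.data + b.data)


# The "core" table: one generic routine per left-hand format.
DISPATCH = {(enc, "*"): concat_generic for enc in CODECS}


def load_routine(left, right, routine):
    """Dynamically add a routine specialized on both argument formats."""
    DISPATCH[(left, right)] = routine


def lookup(left, right):
    # Prefer an exact two-argument match, else fall back to (left, *).
    return DISPATCH.get((left, right)) or DISPATCH[(left, "*")]


def concat(a, b):
    return lookup(a.enc, b.enc)(a, b)
```

Because lookup consults the table on every call, a routine loaded later
is picked up immediately; a faster implementation would presumably cache
the result and invalidate the cache on load.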

By the way, don't take this as a final design of string types either.  :-)

Larry



Re: Char encoding

2000-08-15 Thread Larry Wall

Dan Sugalski writes:
: >I don't see why. I don't think we should be dealing with *multiple* internal
: >encodings. That would be Bad and Wrong.
: 
: Why not? We're going to have two already, binary and UTF-something, and if 
: we provide an option for UTF-8, -16, and -32 we're going to need the code 
: *anyway*, so what's wrong with having them all available?

A small perl will force everything to one form, and a large perl will
have code to handle all permutations lazily.  But in any case, the
abstract model as viewed from the Perl language will make this
transparent.  Violating that abstract model would be Bad and Wrong.
Anything else is fair game.

Larry



Re: Char encoding

2000-08-15 Thread Larry Wall

Dan Sugalski writes:
: > iii) Never assume bytes.
: 
: What, never? Not even in vectors and bitmaps? :)
: 
: I agree, though. Character and byte are separate constructs and need to be
: dealt with separately.

Not sure what you guys mean.  A string is a sequence of integers.
A sequence of integers can have many useful representations, where
usefulness can be defined in any of several different ways.

Just to tweak everyone's brain again, we wouldn't necessarily have to
support *any* variable length encoding in the core.  That would throw
utf8 and utf16 right out the window.  Under this model, you just store
strings as integer arrays, and use the smallest size integers that will
hold all the characters of the string.  You hit a character that won't
fit into your current representation, you just change the
representation on the fly from 8 to 16 to 32 bits.

That's one definition of useful.  It combines the representation of
integer arrays and strings.  It's optimized for substr(), but
deoptimized for I/O conversions.  Choose your poison...
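
A minimal sketch of that model, not from the thread and in Python rather
than C. WideningString and its methods are invented names. Storage uses
the standard array module; the "L" typecode is at least 32 bits wide and
may be wider on some platforms, so the bits property reports the nominal
width.

```python
from array import array

# (largest value, array typecode, nominal bits per character)
_LEVELS = [(0xFF, "B", 8), (0xFFFF, "H", 16), (0xFFFFFFFF, "L", 32)]


def _level_for(code_point):
    for level, (limit, _code, _bits) in enumerate(_LEVELS):
        if code_point <= limit:
            return level
    raise ValueError("code point does not fit in 32 bits")


class WideningString:
    def __init__(self, text=""):
        self.level = 0
        self.units = array("B")
        for ch in text:
            self.append(ord(ch))

    @property
    def bits(self):
        return _LEVELS[self.level][2]

    def append(self, code_point):
        needed = _level_for(code_point)
        if needed > self.level:
            # Change representation on the fly: copy into wider integers.
            self.units = array(_LEVELS[needed][1], self.units.tolist())
            self.level = needed
        self.units.append(code_point)

    def substr(self, offset, length):
        # Fixed-width units mean plain indexing, with no scanning.
        return self.units[offset:offset + length].tolist()


s = WideningString("abc")
print(s.bits)            # 8
s.append(0x263A)
print(s.bits)            # 16
s.append(0x10400)
print(s.bits)            # 32
print(s.substr(3, 2))    # [9786, 66560]
```

The widening copy is the price paid once per string; every substr()
afterwards is a direct slice.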

: > > Perhaps the regex engine should always force UF8 form ?
: > 
: > I think we really want to store data internally in a common, Unicode format.
: 
: Maybe we should just abstract it, though the more abstract it gets the
: slower the regex engine's likely to be, as it does prefer to rip through
: raw data buffers.

Again, a small perl wants one regex engine that follows vtbl pointers
character by character.  (Or it might want to force a single representation
on all regex strings.)  A large perl may want three or four different
regex engines tuned for each representation.  Memory is getting cheaper,
after all.

The point being that we write an abstract generic regex engine that
can be instantiated either once for small perls or multiple times for
large perls.

Larry



Re: Char encoding

2000-08-15 Thread Larry Wall

Dan Sugalski writes:
: Is a statement like "All X comparisons treated as the 
: platform-native X" OK (for X in string, integer, float) in the 'small perl' 
: model? (Assuming then that there's no core knowledge of BigInts, BigRats, 
: or Complex numbers in small perl)

Depends on what you mean by "small", and "no core knowledge".  One
could have a small perl that can still load in all sorts of emulation
routines on the fly as needed.  These might well be written in Perl,
for compactness.  It just wouldn't be as efficient as using something
more hard-wired.  But you'd be able to do anything in Palm Perl that
you can do in Big Perl.  Eventually.

On the other hand, one could also strip out all the emulation routines
if you were targeting a microcontroller, say.

Larry



Re: RFC 99 (v1) Maintain internal time in Modified Julian (not epoch)

2000-08-15 Thread Larry Wall

[EMAIL PROTECTED] writes:
: Yep.  Or more generally "Standardize Perl on all platforms to one
: common time epoch" and recommend the Unix epoch since it's so
: widespread.  :-)

Oh, gee, where's your sense of history?  (As in creating our own. :-)
Maybe we should invent our own epoch, like the year 2000.  Or use a
really standard one, like the year 0 AD (aka 1 BC).

I have this horror that people will still be using 1970 as the epoch in
the year 31,536.

Larry



Re: RFC 131 (v1) Internal String Storage to be Opaque

2000-08-18 Thread Larry Wall

[EMAIL PROTECTED] writes:
: A single internal encoding which is opaque to the programmer Would Be Nice.

We seem to be asking for contradictory things here.  If it's truly
opaque, the programmer shouldn't care whether it's polymorphic or
monomorphic.  I'm inclined to think the polymorphic solution will be
more efficient.  I could be wrong, of course.  But if we write the
interface right, we can swap in a monomorphic implementation at any
point.

Larry



Re: RFC 131 (v1) Internal String Storage to be Opaque

2000-08-18 Thread Larry Wall

Simon Cozens writes:
: On Fri, Aug 18, 2000 at 09:57:59AM -0700, Larry Wall wrote:
: > Because we don't lose much efficiency to polymorphism, since we need it
: > anyway to support generic scalars, and we gain some efficiency whenever
: > we procrastinate conversions out of existence.
: 
: Surely we do, because we have to add in something that says what
: representation we're in and, if we're going for a vtable design, how to
: transform it into anything else.

With vtables you add more code, but since the code is accessed in
parallel, you don't add any steps, so the only way polymorphism slows
you down is if it blows out your cache.

: I agree that procrastinated conversions are the way to go; this actually brings
: up another interesting thought I had:
: 
: We can tell, for each evaluated block, from the optree which parts of each
: variable are going to be used in that run. This means we can, for instance,
: completely avoid using an SV if a variable is always treated as an NV no
: matter what. 
: 
: Does this buy us anything?

Certainly it can, at the expense of a slower compiler.  Any sort of
type inferencing can only help speed up the run time.  Type
inferencing works better the more real type info you feed it.
In particular, we will need a way to declare the return types of
subroutines.  (Optionally, of course!  Don't panic!  I'm not trying
to turn Perl into a strongly typed language--at least, not by default.)

But as long as we're feeding the compiler more type info, even if we
rule out fancy optimizations for load-and-go programs, we can still
say that

my int $foo;
my str $bar;

lets the compiler assume that the variable is restricted to integer or
string operations (and related conversions).  We could go farther than
that, and let the programmer request particular representations:

my int $foo :bits(16);
my str $bar :enc(utf8);

I don't know how far we want to push that for ordinary Perl programming,
but if we decide that Perl6 is the PI for Perl6, such specificity would
likely be required to produce efficient C/Java/C# code.  It might also
allow us to write the interfaces to external libraries in Perl.

Larry



Re: RFC 76 (v1) Builtin: reduce

2000-08-21 Thread Larry Wall

Jeremy Howard writes:
: How much hand-waving can we do with implementation efficiency of anonymous
: subs and higher order functions? How much can we expect Perl to optimise
: away at compile time? For instance, if:
: 
:   $sum = reduce ^_+^_, @list;
: 
: has any substantial overhead on each iteration it would be useless for any
: decent sized number crunching. Other areas where this is a huge issue are
: lazily generated lists (RFCs 81, 90, and 91), and implicit array loops (RFC
: 82). I've been kind of assuming that functions that act on whole lists without
: mutating them (RFC 82 operators, map, grep, reduce, ...) would be called in
: a 'special way' that avoided the overhead of "real" sub calls. As I've
: mentioned before, I've also put in various RFCs that this kind of stuff
: should be evaluated lazily...
: 
: So anyway, if any of this is just so out of the question that we shouldn't
: even consider the possibility, now is a great time to let us know!

I think, even if we relegate currying to some kind of high-powered
macro system, as long as the operator in question has a good enough
prototype, we can be pretty efficient in how we rewrite things.  For
instance, it seems to me that if we somehow know that the first
argument to a reduce should be a sub that wants two arguments, we could
count the placeholders and rewrite the curried expression accordingly
without the nested ?: of the naive rewrite.
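
A toy version of "count the placeholders and rewrite", not from the
thread and in Python rather than Perl. curry_placeholders is a made-up
helper, and the expression body is evaluated as Python, so this only
demonstrates the counting step, not Perl semantics.

```python
import re
from functools import reduce


def curry_placeholders(expr):
    """Turn each '^_' into its own parameter; return (function, arity)."""
    names = []

    def next_name(_match):
        names.append(f"_p{len(names)}")
        return names[-1]

    body = re.sub(r"\^_", next_name, expr)
    # Build one flat function of N arguments instead of N nested closures.
    fn = eval(f"lambda {', '.join(names)}: {body}")
    return fn, len(names)


add, arity = curry_placeholders("^_ + ^_")
print(arity)                        # 2
print(reduce(add, [1, 2, 3, 4]))    # 10
```

Knowing from the prototype that reduce wants a two-argument sub, a
compiler could check the arity once and then call the flat function
directly on each iteration.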

Larry



Re: RFC 127 (v1) Sane resolution to large function returns

2000-08-23 Thread Larry Wall

Dan Sugalski writes:
: The core will already know.

Especially if we add return types.

: Whether this justifies exposing the information's for someone else to 
: judge, but the core will know what context something is in. This is for 
: optimization reasons. While it's straightforward enough to know that this 
: is a hash copy:
: 
:     %foo = %bar;
: 
: which can be optimized, it's less easy to optimize this:
: 
:     sub foo {
:       my %hash;
:       %hash = (1..1);
:       return %hash;
:     }
: 
:     %bar = foo();
: 
: without return knowing its argument's in list(hash) context. If we know 
: that, though, the function return can be quicker than it would be if we 
: flatten and reconstitute the hash.

I expect that we'll get more compile-time benefit from

my HASH sub foo {
...
}

%bar = foo();

Larry



Re: RFC 136 (v1) Implementation of hash iterators

2000-08-23 Thread Larry Wall

Dan Sugalski writes:
: I have had the "Well, Duh!" flash, though, and now do realize that having 
: multiple iterators over a hash or array simultaneously could be rather handy.

You can also have the opposite "Well, Duh!" flash and realize that most
DBM implementations only support a single iterator at a time.  For some
definition of support.  That's the main reason for Perl's current
limitation.

Larry



Re: RFC 127 (v1) Sane resolution to large function returns

2000-08-23 Thread Larry Wall

Buddha Buck writes:
: At 11:26 AM 8/23/00 -0700, Larry Wall wrote:
: 
: >I expect that we'll get more compile-time benefit from
: >
: > my HASH sub foo {
: > ...
: > }
: >
: > %bar = foo();
: 
: So how would you fill in the type in:
: 
: my TYPE sub foo {
:   ...
:   if (wanthash())   { return %bar;  }
:   if (wantarray())  { return @baz;  }
:   if (wantscalar()) { return $quux; };
: }
: 
: $scalar = foo();
: @array  = foo();
: %hash   = foo();

I don't yet know whether built-in types will derive from UNIVERSAL.  But
certainly 

my sub foo {

will work at least as well as it does now.  :-)

Larry



Re: RFC 127 (v1) Sane resolution to large function returns

2000-08-24 Thread Larry Wall

Bart Lateur writes:
: On Thu, 24 Aug 2000 09:38:28 +0100, Hildo Biersma wrote:
: 
: >> I expect that we'll get more compile-time benefit from
: >> 
: >> my HASH sub foo {
: >> ...
: >> }
: >> 
: >> %bar = foo();
: >
: >Ah, the Return Value Optimization so loved in C++...
: >
: >For those who haven't seen it before, you can optimize this by passing
: >in a reference to %bar to foo() and then use that in the function.
: 
: Just a remark: this is only safe if all other references to the hash
: returned are abandoned. Otherwise you'd have an alias where you should
: have gotten a copy.

I wasn't actually considering the RVO optimization--I was only thinking
about the fact that you can do more type inferencing at compile time
from the function's signature.

However, you couldn't actually do the aliased return right now if you
wanted to--at least, not without chicanery.  Just as with the list
assignment, it might be nice to have some way to request an object
pointer assignment rather than a hash copy:

\%bar = foo();

or some such.

Larry



Re: RFC 127 (v1) Sane resolution to large function returns

2000-08-24 Thread Larry Wall

Dan Sugalski writes:
: Chicanery's on the big To Do list. I'm really wanting to defer list 
: flattening as long as possible, or skip it altogether.

And I'm wondering whether it's better in general to explicitly force a
context in which we treat @foo and %bar as objects, rather than trying
to intuit when we can get away with it under the list flattening regime.
Not that we can't do that too, but I have a sneaking suspicion we could
easily confuse everybody if we try to get too fancy in using context to
differentiate between pass-by-ref and pass-by-value.  We could certainly
confuse ourselves--we'd have to distinguish explicit references from
implicit references internally.  Consider how you'd implement

push(@foo, @bar);

vs

push(@foo, \@bar);

If you defer the decision to flatten into the function, then you have
to distinguish those two kinds of reference.
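
A sketch of that distinction, not from the thread and in Python rather
than Perl. ArrayRef and its explicit flag are invented for illustration;
push() here stands in for a function that receives unflattened arguments
and decides for itself what to flatten.

```python
class ArrayRef:
    """A reference to an array, tagged with how the caller wrote it."""

    def __init__(self, array, explicit):
        self.array = array
        self.explicit = explicit  # True for \@bar, False for a bare @bar


def push(target, *args):
    for arg in args:
        if isinstance(arg, ArrayRef) and not arg.explicit:
            # Implicit reference: flattening was deferred to the callee.
            target.extend(arg.array)
        elif isinstance(arg, ArrayRef):
            # Explicit reference: store the array itself as one element.
            target.append(arg.array)
        else:
            target.append(arg)
    return len(target)


foo = [0]
bar = [1, 2]
push(foo, ArrayRef(bar, explicit=False))   # like push(@foo, @bar)
push(foo, ArrayRef(bar, explicit=True))    # like push(@foo, \@bar)
print(foo)                                 # [0, 1, 2, [1, 2]]
```

Without the explicit flag the two calls would be indistinguishable
inside push(), which is the bookkeeping cost the message points at.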

I'm probably not making much sense, given that I made my coffee this
morning but haven't been awake enough to realize I actually need to go
back into the kitchen to get some of it...

Or I could just be getting old.  The thing about getting old is that
it's not only later than you think, it's later than you *can* think...

Larry



Re: RFC 155 - Remove geometric functions from core

2000-08-25 Thread Larry Wall

Dan Sugalski writes:
: Or, more succinctly, we're not going to screw with perl without a *darned* 
: good reason.

Er, perl is already screwed--it's Perl we're trying to preserve and grow.

Larry



Re: RFC 155 - Remove geometric functions from core

2000-08-25 Thread Larry Wall

Tom Christiansen writes:
: >Or, more succinctly, we're not going to screw with perl without a *darned* 
: >good reason.
: 
: This is the most beautiful thing I've read in days.

Bear in mind there are lots of darned good reasons.  :-)

Larry



Re: RFC 155 - Remove geometric functions from core

2000-08-25 Thread Larry Wall

Tom Christiansen writes:
: More of this nonsense, eh?

Please don't use fighting words in here.

: I just fail to understand the urge to eviscerate.  Why don't we just
: say that Perl isn't for systems work anymore, and remove everything
: that diddles $!, or $?, or anything that might call anything from
: the C library.

This is confusing Perl and perl again.  This is not terribly useful,
especially in -internals.  The degree to which Perl 6 (the language)
will be useful for systems work has almost nothing to do with the method
by which perl6 (the program) implements the given functionality.  In fact,
we're trying to *decouple* those ideas further so that we can more easily
add such functionality to Perl without forcing people to read miles of
camel entrails before they begin.  If we can make it run faster in
the process, that'll be even better.

We're redesigning everything.  And we'll do the best job we can with
the brains we've got.  We will bear people's fears in mind, but we will
not be controlled by such fears.  Rather, we will try to give fear its
proper weight--which is neither too much, nor too little.

Fasten your seat belts, folks.

Larry



Re: RFC 146 (v1) Remove socket functions from core

2000-08-25 Thread Larry Wall

Joe McMahon writes:
: On Thu, 24 Aug 2000, Stephen P. Potter wrote:
: 
: > I have several RFCs I need to write about removing certain functionality
: > out of the core (math functions, IPC, networking, "user").  I don't want to
: > go too overboard.  I don't know that we want to go so far as to remove
: > printing and such.  It might be nice to generalize some functions (like the
: > discussion with open() that happened awhile back).
: 
: Hard things should be easy, easy things should be trivial. We should try
: to keep the stuff that is commonly used in the core (excluding OS
: dependent stuff, perhaps? Non-Unix folks don't see the use for getpwent(),
: for instance). In my opinion, Perl6 should still be a language in which
: it is easy to write powerful and useful programs on the command line.

We're fighting multiple definitions of "core" here.  Please distinguish
the core of the language from the core of the implementation--they're
two entirely different things.  What we're attempting to do here is
make it *not matter* whether something is in the core implementation or
not.  This gives us a great deal of freedom in how to implement the
core of the language.

For instance, if I'm running Perl on my Palm, I'd just as soon that
index() were implemented in Perl using repeated substr() comparisons.
Yes, that's slower than the C implementation, but the representation is
much more compact, and might have a hope of getting shoehorned into a
small memory.  A Perl distribution might implement index() much as it
does now in portable (more-or-less) C.  A distribution intended for a
high-powered server of known architecture might substitute in a
hand-coded assembler version of index().  A JVM implementation might
call out to some core JVM routine.  A .NET implementation might
delegate indexing to some object running on index.microsoft.com.  :-)

This should all be transparent at the language level.  Certainly
index() will continue to be supported in the core language.  But it
almost certainly belongs outside the core implementation.
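
A minimal sketch of the pluggable index() idea, in C.  Every name here
(index_fn, index_compact, index_libc, perl_index, use_fast_index) is
hypothetical and not taken from any Perl source; the only point is that
callers go through one pointer and never learn which implementation is
behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature: offset of needle in haystack, or -1 if absent. */
typedef long (*index_fn)(const char *haystack, const char *needle);

/* Compact version: compare a needle-sized window at each position,
 * much like repeated substr() comparisons would. */
static long index_compact(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    size_t hlen = strlen(haystack), nlen = strlen(needle);
    if (nlen > hlen)
        return -1;
    for (size_t i = 0; i + nlen <= hlen; i++) {
        if (memcmp(haystack + i, needle, nlen) == 0)
            return (long)i;
    }
    return -1;
}

/* Alternative version: defer to whatever the platform's libc provides. */
static long index_libc(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    const char *hit = strstr(haystack, needle);
    return hit ? (long)(hit - haystack) : -1;
}

/* The language-level index() only ever goes through this pointer. */
static index_fn perl_index = index_compact;

/* A distribution that prefers the libc routine swaps it in once. */
static void use_fast_index(void)
{
    perl_index = index_libc;
}
```

Both versions have to agree on every input, including the empty
needle, or the substitution stops being transparent.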

Larry



Re: RFC 146 (v1) Remove socket functions from core

2000-08-25 Thread Larry Wall

Tom Christiansen writes:
: >Hard things should be easy, easy things should be trivial. We should try
: >to keep the stuff that is commonly used in the core (excluding OS
: >dependent stuff, perhaps? Non-Unix folks don't see the use for getpwent(),
: >for instance). 
: 
: That's their problem.  Perl is extremely useful to Unix systems
: programmers and administrators.  They are the target audience
: that Perl was initially written for, whom it was made famous by,
: and you will find that it continues to be very important to us.
: If you relegate us to take a back seat behind a mob of Billduhs,
: then you have betrayed your history and really pissed a lot of
: people off.  

Please don't monger these fears.

I am a Unix systems administrator.  I don't see much of a problem
leaving getpwent in the core language but out of the core implementation.
The issue of preserving the semantics of one-liners is a known
design constraint.

We are now beating a dead unicorn.  Please move it to the -language list.

Larry



Re: RFC 146 (v1) Remove socket functions from core

2000-08-25 Thread Larry Wall

Fisher Mark writes:
: > For instance, if I'm running Perl on my Palm, I'd just as soon that
: > index() were implemented in Perl using repeated substr() comparisons.
: 
: How small do we really need to go?

It's not so much a matter of small as a matter of pluggable.  But small
will continue to be important.  I think microcontrollers will keep getting
smaller and smaller physically, which will tend to keep down the amount
of memory they can have.  In 20 years we may have nanocomputers that
are limited to 2 megs...

But the pluggable is important for scaling up as well.  Consider
pluggable string methods for multiple string encodings.  A small Perl
might just coerce all encodings into utf8 or utf16 or whatever.  A
medium Perl might put separate implementations for several different
string types, and use a common intermediate type such as utf8 or utf16
for conversions.  A large Perl might put in a complete crossbar that
can convert any string type directly to any other.  All hand coded in
assembler for the Pentium 18.
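
The difference between the "medium" and "large" designs can be
sketched in C.  This assumes UTF-8 as the common intermediate type and
only handles code points below U+0800 (one- or two-byte UTF-8); the
function names are made up for illustration.  With N encodings, a
pivot needs about 2N converters, while a full crossbar needs N*(N-1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each converter returns the number of output units written.  Callers
 * must supply output buffers that are large enough. */

static size_t latin1_to_utf8(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[o++] = in[i];
        } else {
            /* two-byte sequence: 110xxxxx 10xxxxxx */
            out[o++] = (uint8_t)(0xC0 | (in[i] >> 6));
            out[o++] = (uint8_t)(0x80 | (in[i] & 0x3F));
        }
    }
    return o;
}

static size_t utf8_to_utf16(const uint8_t *in, size_t n, uint16_t *out)
{
    size_t o = 0, i = 0;
    assert(in != NULL && out != NULL);
    while (i < n) {
        if (in[i] < 0x80) {
            out[o++] = in[i];
            i += 1;
        } else if (i + 1 < n) {
            out[o++] = (uint16_t)(((in[i] & 0x1F) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            break;  /* truncated sequence: stop rather than read past the end */
        }
    }
    return o;
}

/* "Medium" style: Latin-1 to UTF-16 by way of the UTF-8 pivot. */
static size_t latin1_to_utf16_via_pivot(const uint8_t *in, size_t n,
                                        uint16_t *out, uint8_t *scratch)
{
    size_t mid = latin1_to_utf8(in, n, scratch);
    return utf8_to_utf16(scratch, mid, out);
}

/* "Large" style: a dedicated crossbar entry that skips the pivot. */
static size_t latin1_to_utf16_direct(const uint8_t *in, size_t n,
                                     uint16_t *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
    return n;
}
```

The pivot route costs an extra pass and a scratch buffer; the direct
route costs one more routine to write and maintain per pair.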

Larry



Re: Episode 4 - A New Version, part 2

2000-08-25 Thread Larry Wall

[EMAIL PROTECTED] writes:
: "I'm Nathan, captain of the Metaperl Falcon. Tom Christian-bacca here
: is my first mate."
: "RRRWW!" Tom roars.
: Dan looks shocked.
: "Does he speak english?"
: Nathan shrugged.
: "Yeah, but he mostly prefers to just scream and shout."

This is not terribly useful either.

Larry



Re: RFC 155 - Remove geometric functions from core

2000-08-25 Thread Larry Wall

Tom Christiansen writes:
: >Please act like a grown-up.  Stephen cast the
: >first stone, but that's no excuse for you to reply with a boulder.
: 
: Sure it is: when a hoodlum jumps you with a knife, there's no reason
: to roll over and quietly submit to the death of a thousand cuts.
: No, you pull an Indy by responding with overwhelming firepower to
: dispatch the cretin forthwith before he gets cocky.  Otherwise
: you're a willing victim with an FMH sign on his butt waiting for
: further abuse as the next bandit decides to chew on you.  As nobody
: else said mum about that scat, I took care of it myself.

I despise escalation.  You seem to enjoy it.

Larry



Re: RFC 127 (v1) Sane resolution to large function returns

2000-08-24 Thread Larry Wall

Chaim Frenkel writes:
: LW> P.S. I think we *could* let @foo and %bar return an object ref in scalar
: LW> context, as long as the object returned overloads itself to behave as
: LW> arrays and hashes currently do in scalar context.
: 
: Isn't this an internals issue?

Not completely.  The scalar value would visibly be a built-in object:

    @bar = (0,1,2);
    $foo = @bar;    # now means \@bar, not (\@bar)->num
    print ref $foo, $foo->num, $foo->str, ($foo->bool ? "true" : "false");
    ^D
    ARRAY3(0,1,2)true

One implication of this approach is that we'd break the rule that says
references are always true.  Not clear if that's a problem.  It's basically
already broken with bool overloading, and defined still works.
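
At the C level this might look something like the sketch below.  The
ArrayObj and ArrayVtable types and the three slot names are
assumptions made for illustration, mirroring the num/str/bool methods
in the example above; they are not an actual Perl structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct ArrayObj ArrayObj;

/* Hypothetical per-type vtable with the three conversions shown above. */
typedef struct {
    long (*num)(const ArrayObj *self);
    void (*str)(const ArrayObj *self, char *buf, size_t buflen);
    bool (*as_bool)(const ArrayObj *self);
} ArrayVtable;

struct ArrayObj {
    const ArrayVtable *vt;
    const int *items;
    size_t count;
};

/* Numeric value: the element count. */
static long array_num(const ArrayObj *self)
{
    assert(self != NULL);
    return (long)self->count;
}

/* Truth value: an empty array is false even though the ref exists. */
static bool array_bool(const ArrayObj *self)
{
    assert(self != NULL);
    return self->count != 0;
}

/* String value: render as "(0,1,2)", truncating if buf is too small. */
static void array_str(const ArrayObj *self, char *buf, size_t buflen)
{
    size_t used = 0;
    assert(self != NULL);
    if (buflen == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < self->count && used < buflen; i++) {
        int n = snprintf(buf + used, buflen - used, "%s%d",
                         i ? "," : "(", self->items[i]);
        if (n < 0)
            return;
        used += (size_t)n;
    }
    if (used < buflen)
        snprintf(buf + used, buflen - used, "%s", self->count ? ")" : "()");
}

static const ArrayVtable array_vtable = { array_num, array_str, array_bool };
```

An ArrayObj with no elements answers false through as_bool, which is
exactly the "references are always true" rule being broken.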

Larry



Re: A tentative list of vtable functions

2000-09-01 Thread Larry Wall

Dan Sugalski writes:
: Anyone got anything to add before I throw together the base vtable RFC?

So how do you call a generic method?

Larry



Re: A tentative list of vtable functions

2000-09-01 Thread Larry Wall

Dan Sugalski writes:
: Type returns a magic cookie value of some sort (Not sure what sort yet), 
: name returns a string with the name of the type of the variable.

Why can't the type object just stringify to the name of the type?

From a language level, I'm inclined to say that any bare identifier
that is known to be a type name should be compiled to a type object
that stringifies to the name of its type.  Then class methods don't
have to do an extra symbol table lookup.
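
A rough C sketch of the saving, with made-up names (TypeObj, type_str,
type_lookup, same_type).  If the compiler has already resolved the
identifier to a type object, stringifying is a field read and identity
is a pointer comparison; only a bare string would need the search in
type_lookup at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type object: one static instance per known type. */
typedef struct {
    const char *name;
} TypeObj;

static const TypeObj type_Cat = { "Cat" };
static const TypeObj type_Dog = { "Dog" };

static const TypeObj *const known_types[] = { &type_Cat, &type_Dog };

/* Stringifying a type object just hands back its name. */
static const char *type_str(const TypeObj *t)
{
    assert(t != NULL);
    return t->name;
}

/* The lookup a bare string would need at run time: search by name. */
static const TypeObj *type_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof known_types / sizeof known_types[0]; i++) {
        if (strcmp(known_types[i]->name, name) == 0)
            return known_types[i];
    }
    return NULL;
}

/* With type objects, identity is a pointer comparison. */
static bool same_type(const TypeObj *a, const TypeObj *b)
{
    return a == b;
}
```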

Larry



Re: RFC 136 (v1) Implementation of hash iterators

2000-08-23 Thread Larry Wall

Dan Sugalski writes:
: At 11:30 AM 8/23/00 -0700, Larry Wall wrote:
: >Dan Sugalski writes:
: >: I have had the "Well, Duh!" flash, though, and now do realize that having
: >: multiple iterators over a hash or array simultaneously could be rather 
: >handy.
: >
: >You can also have the opposite "Well, Duh!" flash and realize that most
: >DBM implementations only support a single iterator at a time.  For some
: >definition of support.  That's the main reason for Perl's current
: >limitation.
: 
: Fair enough. Removing the limit makes sense, though, both from a 
: flexibility and a thread-safing standpoint. Might make sense for the 
: hash/array slices the PDL folks want too, if that's how they get 
: implemented, since I can see wanting to have many different hash or array 
: slices.

No problem with that.  We can always catch the DBM limitation at
runtime, and we're no worse off than we are now, unless people expect
to be able to turn on hash persistence transparently by tying to DBM.

But in actual fact, we almost always use keys rather than each, so it
probably doesn't matter.
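
For the in-memory case, removing the limit mostly means moving the
cursor out of the container.  The sketch below uses a plain array of
entries in place of a real hash, and all names (Table, TableIter,
iter_begin, iter_next, count_pairs) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *key;
    int value;
} Entry;

typedef struct {
    const Entry *entries;
    size_t count;
} Table;

/* All iteration state lives here rather than in the table itself. */
typedef struct {
    const Table *table;
    size_t pos;
} TableIter;

static TableIter iter_begin(const Table *t)
{
    assert(t != NULL);
    TableIter it = { t, 0 };
    return it;
}

/* Next entry, or NULL once the iterator is exhausted. */
static const Entry *iter_next(TableIter *it)
{
    assert(it != NULL);
    if (it->pos >= it->table->count)
        return NULL;
    return &it->table->entries[it->pos++];
}

/* Two live iterators over one table: the inner loop does not disturb
 * the outer one because each carries its own position. */
static size_t count_pairs(const Table *t)
{
    size_t pairs = 0;
    TableIter outer = iter_begin(t);
    while (iter_next(&outer) != NULL) {
        TableIter inner = iter_begin(t);
        while (iter_next(&inner) != NULL)
            pairs++;
    }
    return pairs;
}
```

A DBM-backed table that can only hold one cursor would have to reject
the second iter_begin at run time, as described above.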

Larry



Re: RFC 127 (v1) Sane resolution to large function returns

2000-08-23 Thread Larry Wall

Dan Sugalski writes:
: And do we want to consider making this (and its ilk) Do The Right Thing?
: 
:     (@foo, @bar) = (@bar, @foo);

We certainly want to consider it, though perhaps not in -internals.
You can talk about passing @bar and @foo around as lazy lists, and
maybe even do lazy list-flattening, but I don't see how that works yet,
even in the absence of overlap.  The basic issue here may come
down to whether the LHS of an assignment can supply a prototype for the
entire assignment that forces everything to be treated as objects
rather than lists.

That is, right now, we can only have a scalar assignment prototype of ($),
and a list assignment prototype of (@).  We need a prototype (not just
for assignment) that says "all the rest of these arguments are objects",
so we don't have to use prototypes like (;\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@).
Or (\@*) for short.

Though if we let @foo and %bar automatically return refs in a scalar context
rather than booleans, we might write that as (;$).  ($*) for
short.  (Presuming * to be available if typeglobs go away.)

However we force non-flattening object lists in prototypes, there's
still the question of how you specify the use of an object list
prototype on assignment.  Hmm.

(@foo, @bar) \= (@bar, @foo);

Dunno if I like that or not.

Larry

P.S. I think we *could* let @foo and %bar return an object ref in scalar
context, as long as the object returned overloads itself to behave as
arrays and hashes currently do in scalar context.

Larry



Re: stackless python

2000-10-21 Thread Larry Wall

Joshua N Pritikin writes:
: http://www.oreillynet.com/pub/a/python/2000/10/04/stackless-intro.html

Perl 5 is already stackless in that sense, though we never implemented
continuations.  The main impetus for going stackless was to make it
possible to implement a Forth-style threaded code interpreter, though
we never put one of those into production either.

Larry



Re: Threaded Perl bytecode (was: Re: stackless python)

2000-10-23 Thread Larry Wall

Adam Turoff writes:
: If Perl bytecode were to become threaded, it would be rather troublesome.

Wasn't actually suggesting it, though similar issues also arise for
compiling down to efficient C, JVM, or C# IL.  Optimizing for Least
Surprise means different things in different contexts, but I'd hate
for Perl 6 to be astonishingly slow at anything...  :-)

Larry



Re: Unicode handling

2001-03-23 Thread Larry Wall

Jarkko Hietaniemi writes:
: *cough* \C *is* taken.
: 
: > >also \U has a meaning in double quotish strings.
: 
: "\Uindeed."

Bear in mind we are redesigning the language.  If there's a botch we
can think about fixing it.

Though maybe not on -internals...   :-)

Larry



Re: Unicode handling

2001-03-27 Thread Larry Wall

Dan Sugalski writes:
: Fair enough. I think there are some cases where there's a base/combining 
: pair of codepoints that don't map to a single combined-character code 
: point. Not matching on a glyph boundary could make things really odd, but 
: I'd hate to have the checking code on by default, since that'd slow down 
: the common case where the string in NFC won't have those.

Assume that in practice most of the normalization will be done by the
input disciplines.  Then we might have a pragma that says to try to
enforce level 1, level 2, level 3 if your data doesn't match your
expectations.  Then hopefully the expected semantics of the operators
will usually (I almost said "normally" :-) match the form of the data
coming in, and forced conversions will be rare.

That's how I see it currently.  But the smarter I get the less I know.

Larry



Re: Unicode handling

2001-03-27 Thread Larry Wall

Garrett Goebel writes:
: Someone please clue me in. A pointer to an RFC which defines the use of
: colons in Perl6 among other things would help.

Heh.  If you read the RFCs, you'll discover one of the basic rules of
language redesign: everybody wants the colon.  And it never seems to
occur to people that we'll actually have to break Perl 5's ?: operator
in order to give them the colon.  :-)

Larry



Re: Unicode handling

2001-03-27 Thread Larry Wall

Dan Sugalski writes:
: At 07:21 AM 3/27/2001 -0800, Larry Wall wrote:
: >Dan Sugalski writes:
: >Assume that in practice most of the normalization will be done by the
: >input disciplines.  Then we might have a pragma that says to try to
: >enforce level 1, level 2, level 3 if your data doesn't match your
: >expectations.  Then hopefully the expected semantics of the operators
: >will usually (I almost said "normally" :-) match the form of the data
: >coming in, and forced conversions will be rare.
: 
: The only problem with that is it means we'll be potentially altering the 
: data as it comes in, which leads back to the problem of input and output 
: files not matching for simple filter programs. (Plus it means we spend CPU 
: cycles altering data that we might not actually need to)

I think the programmer will often know which files are already
normalized, and can just be slurped in with a :raw discipline or some
such.  Whether that can be a default filter policy is really a matter
that depends on how the OS handles things.

It's almost more important to know what form the programmer wants for
the output.  I don't think, by and large, that people will be interested
in producing files that are of mixed normalization.  On input you take
what you're given, but on output, you have to make a policy decision.
That's likelier to be consistent for a whole program (or at least a whole
lexical scope), so we can probably have a declaration for the preferred
output form, and default everything that way.

This is somewhat orthogonal to the issue of laziness, however.  To the
first approximation it doesn't really matter when we do the conversion, as
long as the user sees a consistent semantics.  (Real life intrudes in
the case of when exceptions are thrown, however.)

: It might turn out that deferred conversions don't save anything, and if 
: that's so then I can live with that. And we may feel comfortable declaring 
: that we preserve equivalency in Unicode data only, and that's OK too. 
: (Though *you* get to call that one... :)

I think we can have our cake and eat it too if we are very careful to
distinguish semantics from representation.  In the extreme view, you
can have an EBCDIC representation and make it look to the program like
you're processing Unicode, as long as you don't go outside the subset
that corresponds to EBCDIC.  It's just a small matter of programming.  :-)

That being said, I don't think we can easily predict how many passes
we're going to make over the data, and if we're going to be making many
passes over the same data, it's more efficient to convert once at the
beginning than to emulate one character set in another each time.  Emulation
has the advantage of keeping the old representation around as long as
possible, however.

Larry



Re: Unicode handling

2001-03-27 Thread Larry Wall

Dan Sugalski writes:
: I'm not sure that raw's the right word, given that the data is really 
: Unicode. It's not raw in the sense that a JPEG image or executable is raw data.

I'm suggesting it might be raw in that very sense, and simultaneously
be perfectly valid "internal" Unicode.  Otherwise you couldn't "slurp"
it.  To me, "raw" means "I know exactly what I'm doing, so keep your
cotton-picken' fingers off it until I tell you to put your cotton-picken'
fingers on it."

: I'm half-tempted to implement a 'touch count' in the scalars somewhere to 
: track the number of times something's been dealt with in a non-native way 
: to use as an indicator of whether we should just up and convert things. I 
: can't shake off the feeling that it'll be more expensive than not doing it, 
: though.

My feelings agree with your feelings there.  My guess is we have to
glue a big switch on the side of something so the programmer can tell
us whether to be pessimistic or optimistic.  My guess is we want to
attach that big switch to each of the input stacks, not to the current
lexical scope, since in a single scope there may be several data paths,
determined primarily by where the data came from.  (I expect that
attaching such a big switch to each variable would be overkill, but
some people seem to like overkill.)

Remember also that the scalability of Perl will depend on allowing
different policy decisions on this matter.  A tiny, slow Perl would
probably force everything to one representation immediately.  A large,
fast Perl might have code to do regex matching in anything from Big-5
to KOI-8.  (So it behooves us to write the basic regex algorithm in an
encoding-agnostic form, and then find some way to efficiently tie that
to particular encodings.  Indeed, the regex engine itself had better
be easily portable to Java and C#.)

It doesn't matter how fast the CPU or how big the memory--I think we'll
always need to be able to trade CPU and memory off for each other.  Part
of the endearing quality of Perl 5 is that it generally tries to outsmart
the programmer at every turn on this issue, but that might not be the
best approach in the long run.  "use less" wasn't intended to be useless.

Larry



Re: Tying & Overloading

2001-04-20 Thread Larry Wall

: At 06:20 PM 4/20/2001 -0300, Filipe Brandenburger wrote:
: >Please tell me if there really is a use for overloading && and || that 
: >would not be better done with source filtering, then I will (maybe) 
: >reconsider my opinion.

I think it's a category error to talk about overloading && and ||,
which are not really operators so much as they are control flow
constructs.  You really have to talk about overloading boolean context
in general.

Dan Sugalski writes:
: @foo = @bar  && @baz;

That will probably be written with a colon, something like:

@foo := @bar && @baz;

and do everything in scalar context.  Think of it as doing something
roughly equivalent to a Perl 5-like

*foo = \@bar && \@baz;

As to what

@foo = @foo && @bar;

(no colon) will mean in Perl 6, I confess I'm not yet sure.  Perhaps
the && will force scalar context on its arguments, then the assignment
to @foo will have to figure out whether to flatten the return array
reference, perhaps based on some hint supplied by the fact that &&
was itself in list context.  (Much as any lazy list might come in as
a reference that is marked to be expanded when read out.)

But in general, yes, we do need to be careful to distinguish the vtable
of the variable from the vtable of the value.  Consider:

my Cat %chases : = { ... };

This is much like a method:

my Cat &chases (Dog $spot) : = { ... };

In either case, Cat is the type of the return value, and really has
little to do with the implementation of the function (or hash) itself.
$spot.chases is a Dog method, not a Cat method.  In the same way,
%chases is a Catalog method, not a Cat method.

Larry



Re: Tying & Overloading

2001-04-20 Thread Larry Wall

Jarkko Hietaniemi writes:
: What if someone wants to define matrices and have both cross product
: and dot product?

At some point, there aren't enough operators, and new ones have to
be named somehow, or old ones usurped.  In any event, new ops either
have to be declared with a lexical scope, or use an existing syntactic
rule.  But the underlying operation can still be dispatched like a method.

: The whole slew of other math functions: abs, sqrt, sin, log?  Where do
: they fit in?  (See why multilevel vtables might be a good idea?)

I don't see how a multilevel vtable saves us either time or memory.

: >   // bit operations
: >   void  (*BITAND) (SVAL *result, SVAL *this, SVAL *value);
: >   void  (*BITOR)  (SVAL *result, SVAL *this, SVAL *value);
: >   void  (*BITXOR) (SVAL *result, SVAL *this, SVAL *value);
: >   void  (*BITNOT) (SVAL *result, SVAL *this);
: >   void  (*BITSHL) (SVAL *result, SVAL *this, SVAL *value);
: >   void  (*BITSHR) (SVAL *result, SVAL *this, SVAL *value);
: 
: Many new bitops have been suggested over the years: bit roll,
: bit reverse.  I'm not saying they should be blindly added here,
: I'm asking what if somebody wants to?

I think it's a mistake to think of the vtable format as fixed.  We need
ways to install sets of methods such that they'll dispatch just as fast
as any built-in.  This probably involves setting up some kind of a
registry for mapping of names to vtable offsets at compile time, for
those method names we know about at compile time.
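
One way such a registry could work, sketched in C under heavy
assumptions: a single global table hands out slot numbers on first
sight of a name, and dispatch afterwards is a plain array index.  The
names slot_for, Vtable, int_bitand and int_bitrol are invented here,
and BITROL stands in for any operator added after the fact.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 16

typedef long (*method_fn)(long self, long arg);

static const char *slot_names[MAX_SLOTS];
static size_t slot_count = 0;

/* Return the slot for a method name, allocating one the first time the
 * name is seen.  Returns -1 if the registry is full. */
static int slot_for(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < slot_count; i++) {
        if (strcmp(slot_names[i], name) == 0)
            return (int)i;
    }
    if (slot_count == MAX_SLOTS)
        return -1;
    slot_names[slot_count] = name;
    return (int)slot_count++;
}

typedef struct {
    method_fn slots[MAX_SLOTS];
} Vtable;

/* A "built-in" operation. */
static long int_bitand(long self, long arg)
{
    return self & arg;
}

/* A user-added operation: rotate left within the low 8 bits. */
static long int_bitrol(long self, long arg)
{
    unsigned v = (unsigned)self & 0xFFu;
    unsigned r = (unsigned)arg % 8u;
    return (long)(((v << r) | (v >> (8u - r))) & 0xFFu);
}
```

The name lookup happens once, when the code is compiled; after that a
call through vt.slots[slot] costs the same whether the method shipped
with the interpreter or was installed later.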

: >   // numeric comparisons
: >   int   (*NUMCMP) (SVAL *this, SVAL *value);
: >   int   (*NUMEQ)  (SVAL *this, SVAL *value);
: >   int   (*NUMNE)  (SVAL *this, SVAL *value);
: >   int   (*NUMLT)  (SVAL *this, SVAL *value);
: >   int   (*NUMGT)  (SVAL *this, SVAL *value);
: >   int   (*NUMLE)  (SVAL *this, SVAL *value);
: >   int   (*NUMGE)  (SVAL *this, SVAL *value);
: 
: I'm not certain about the utility of having separate EQ NE LT GT LE GE.
: If those are not deducible from CMP, we don't have a consistent comparison
: function (we might get loops).

Nevertheless, there are machines that can implement EQ faster than they
can implement CMP, or that would rather not pay the overhead of the
extra recursive method call.  A given program on a given architecture
might want to have it either one way or the other.  This is yet another
indication that nailing down the entries of the vtables is a form of
premature optimization we can't afford to indulge in.  Every class
should have its own vtable, constructed from available metadata.  If
some of that metadata is available before we even start, fine, we can
cheat some based on that.  But let's not reimplement Perl 5's
intellectual limitations.  There's no reason to relegate user-defined
functions to second-class status.
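
As a sketch of "constructed from available metadata": a class that
supplies only CMP gets an EQ derived from it, while a class that has a
cheaper direct EQ installs that instead.  The Class type and every
function name below are hypothetical, and values are plain longs to
keep the example short.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Class Class;
typedef int (*cmp_fn)(const Class *cls, long a, long b);
typedef int (*eq_fn)(const Class *cls, long a, long b);

struct Class {
    cmp_fn numcmp;
    eq_fn  numeq;
};

/* Three-way comparison: negative, zero, or positive. */
static int generic_cmp(const Class *cls, long a, long b)
{
    (void)cls;
    return (a > b) - (a < b);
}

/* Fallback EQ: pays for an extra call through the CMP slot. */
static int eq_via_cmp(const Class *cls, long a, long b)
{
    return cls->numcmp(cls, a, b) == 0;
}

/* Direct EQ for classes or machines where equality is cheaper. */
static int direct_eq(const Class *cls, long a, long b)
{
    (void)cls;
    return a == b;
}

/* Build a class from whatever is supplied; eq may be NULL. */
static Class make_class(cmp_fn cmp, eq_fn eq)
{
    Class c;
    assert(cmp != NULL);
    c.numcmp = cmp;
    c.numeq  = eq ? eq : eq_via_cmp;
    return c;
}
```

Either way the caller dispatches through numeq and never needs to know
which of the two it got.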

Larry



Re: Tying & Overloading

2001-04-23 Thread Larry Wall

Nick Ing-Simmons writes:
: >You really have to talk about overloading boolean context
: >in general.
: 
: Only if you are going to execute the result in the normal perl realm.
: Consider using the perl parser to build a parse tree - e.g. one to 
: read perl5 and write perl 6. This works for all expressions except
: &&, || and ?: because perl5 cannot overload those - so 
: 
: $c = ($a && &b) ? $d : $e;
: 
: calls the bool-ness of $a and in the deferred execution mode of a translator
: it wants to return not true/false but "it depends on what $a is at run-time".
: It cannot do that and is not passed $b so cannot return 

I think using overloading to write a parser is going to be a relic of
Perl 5's limitations, not Perl 6's.

Larry



Re: Just in case you were wondering if alignment matters...

2001-04-23 Thread Larry Wall

As a general rule of thumb, if you sort your structs into decreasing
size, it usually comes out right.  That is, put all your 64-bit items
first, then all your 32-bit items, then 16-bit, then 8-bit.  Then there
are no "holes" except the one at the end, which most compilers are
pretty good at keeping track of.
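
A small C illustration of the rule of thumb.  The two structs carry
the same 16 bytes of payload; the member names are arbitrary.  Actual
sizes and offsets are implementation-defined, so the sketch prints
them rather than assuming any particular numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Members in arbitrary order: the compiler may need padding before
 * each member whose alignment is larger than the one before it. */
struct Scattered {
    uint8_t  tag;
    uint64_t wide;
    uint8_t  kind;
    uint32_t count;
    uint16_t flags;
};

/* Same members, sorted by decreasing size. */
struct Sorted {
    uint64_t wide;
    uint32_t count;
    uint16_t flags;
    uint8_t  tag;
    uint8_t  kind;
};

/* Report what this particular compiler and ABI actually did. */
static void print_layout(void)
{
    assert(offsetof(struct Sorted, wide) == 0);
    printf("scattered: %zu bytes, sorted: %zu bytes\n",
           sizeof(struct Scattered), sizeof(struct Sorted));
    printf("scattered.wide at %zu, sorted.count at %zu\n",
           offsetof(struct Scattered, wide),
           offsetof(struct Sorted, count));
}
```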

But yes, you could still certainly misalign the whole struct if it's
embedded in a stream of something with smaller alignment constraints.

Larry



Re: Tying & Overloading

2001-04-24 Thread Larry Wall

Dan Sugalski writes:
: Resizing the vtable at runtime is a really dodgy thing. There are some 
: rather huge threading implications here--changing their size (as opposed to 
: using up a limited number of "uncommitted" spots we leave at the end) means 
: potentially having to move all the vtables around, which means updating the 
: vtable pointers already stuck into variables. This, one could assume, falls 
: firmly in the "yuck" category.

I think we definitely have to be able to resize vtables at compile time,
which is a form of run time.  It's vaguely possible we could restrict
multithreading during compile phase.

On the other hand, if objects point at their class, and the class points
at the vtable, I don't see any big deal in resizing.  When you're ready
to commit a new vtable it's a single pointer swap.
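
The indirection can be sketched in C as follows.  Object, Class,
Vtable, call_method and commit_vtable are illustrative names only, and
the sketch is single-threaded; with threads the pointer store in
commit_vtable would presumably need to be atomic.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 8

typedef long (*method_fn)(long value);

typedef struct {
    size_t    nslots;
    method_fn slots[MAX_SLOTS];
} Vtable;

typedef struct {
    const Vtable *vtable;   /* the only pointer that changes on resize */
} Class;

typedef struct {
    Class *cls;             /* objects never point at a vtable directly */
    long   value;
} Object;

static long m_negate(long v) { return -v; }
static long m_twice(long v)  { return 2 * v; }

static const Vtable vt_small = { 1, { m_negate } };
static const Vtable vt_big   = { 2, { m_negate, m_twice } };

/* Dispatch goes object -> class -> vtable -> slot. */
static long call_method(const Object *o, size_t slot)
{
    const Vtable *vt = o->cls->vtable;
    assert(slot < vt->nslots);
    return vt->slots[slot](o->value);
}

/* "Resizing" builds the larger table elsewhere, then swaps one pointer. */
static void commit_vtable(Class *cls, const Vtable *replacement)
{
    cls->vtable = replacement;
}
```

Objects created before the swap pick up the new slots without being
touched, at the price of one extra pointer hop on every call.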

Larry



Re: Tying & Overloading

2001-04-24 Thread Larry Wall

Nick Ing-Simmons writes:
: Larry Wall <[EMAIL PROTECTED]> writes:
: >I think using overloading to write a parser is going to be a relic of
: >Perl 5's limitations, not Perl 6's.
: 
: I am _NOT_ using overloading to write a parser. 
: Parse::Yapp is just fine for writing parsers. I am trying to re-use
: a parser that already exists - perl5's parser. 

I understand that, even if I was unclear.

: What I _really_ want to do is a dynamically scoped peep-hole "optimize"
: (actually a rewrite) of the op tree - written in perl.

Sure, but that's not really overloading the way I think of it, it's a
different kind of hook into the parser/code-generator.  That's more or
less what I was trying to say, poorly.

Larry



Re: Split PMCs

2001-04-24 Thread Larry Wall

Dan Sugalski writes:
: Unless Larry says otherwise, this:
: 
:     my num @foo;
: 
: will have the data portion of the @foo PMC point off to a block of memory 
: with floats jammed end-to-end in it.

I'm not going to say otherwise.

Larry



Re: Tying & Overloading

2001-04-30 Thread Larry Wall

Dan Sugalski writes:
: At 03:08 PM 4/25/2001 -0300, Branden wrote:
: 
: >At 01:52 PM 25/04/2001 -0400, Dan Sugalski wrote:
: >>Seriously, I don't see why this should be a scary thing. So, the opcode 
: >>table's extendable. So what? It'll make language X mode simpler, for some 
: >>value of X, if that language can load in its own set of extended opcodes. 
: >>Perhaps someone'll want to do functional programming with the Parrot 
: >>runtime, and it makes the most sense from a speed and parser simplicity 
: >>standpoint to treat certain activities as atomic things and emit a single 
: >>bytecode for them. Extending the opcode table makes sense there.

Possibly, but I just looked through the opcode table, and found only
about 50 opcodes that weren't obvious candidates for OO treatment.
Which basically means that the other 300 or so of them *are* candidates,
and don't belong in the opcode table at all.

: >I agree, although I would argue that for functional programming you don't 
: >need anything but Perl
: 
: Oh, sure, that's true and I'll give no argument.

We don't have continuations yet...

: That doesn't mean that you 
: can't do things to make something easier, however.

We just need to be careful that in making something easier we don't make
something else much harder.  Those 300 non-opcode methods still need to be
blazing fast...

: You could, for example, argue that perl has no real need at the C level for 
: opcodes with underlying support for arrays and hashes as data types, as you 
: can emulate them with the scalar operations. Doesn't make them any less useful.
: 
: >Anyway, I would say opcode extendability should be analysed with a ROI 
: >optics, because I really don't know if having new opcodes (instead of 
: >faking them with subs -- if we have appropriate ways of locking & 
: >synchronizing thread data) would be really necessary.
: 
: Providing the capability is simple and straightforward, and will piggyback 
: on top of the overridable opcode stuff.

Just make sure it's lexically scoped so my new opcodes don't clobber yours.

: Besides, what makes you think we won't be doing at least some class of subs 
: this way? :)

Why not all of them?  Why not simply say that opcodes and subs are
indistinguishable in how they're stored in the symbol tables?  Some
subs may be implemented in Perl, some in C, some in Java, some in C#,
some in P4 assembler...

Larry



Re: PDD: Conventions and Guidelines for Perl Source Code

2001-05-09 Thread Larry Wall

Dave Mitchell writes:
: | anyone know precisely what the following means?
: 
: "K&R" style for indenting control constructs

Strictly speaking, it means you always put the opening bracket on the
same line as the keyword, and only worry about lining up the closing
bracket:

: | my personal pet peeve: death to dSP and friends !!
: 
: Macros must never define or implicitly use auto variables unless it
: is essential for extensibility. In this case, defining macros should
: be prefixed with C<DEFVAR_>, and macros which use said variables should
: be prefixed with C<VAR_>, eg
: 
:   #define DEFVAR_save_stack   struct Stack *oldsp = sp;
:   #define VAR_restore_stack   sp = oldsp;
: 
: This then at least provides some warning to the programmer that things
: are being done behind his/her/its back.

I think that's silly.  You misuse a variable that requires an auto, the
compile dies, that's all.  And macros can be very useful for an abstraction
layer that is intended to *hide* the implementation.  Hoisting implementation
details into the name defeats that abstraction.

The whole point of dSP and friends was to make it easier to switch to
inline threaded code if we felt like it, in which case all bets are off
as to which variables are auto.

: =head2 Extensibility
: 
: If Perl 5 is anything to go by, the lifetime of Perl 6 will be at least
: seven years. During this period, the source code will undergo many major
: changes never envisaged by its original authors - cf threads, unicode
: in perl 5. To this end, your code should make as few assumptions as
: possible.

Any statement that contains the phrase "as X as possible" is very
likely one-sided.  Balance is usually necessary.  The fact of the
matter is that code that makes as few assumptions as possible runs
incredibly slow.  So it's not possible to make as few assumptions as
possible.  (Well, it is, but we don't want to do that.)

: For example, if your struct eventually needs more than
: 32 flags, can it be gracefully expanded to more than a single word of
: flags? Bear in mind that there may be code in other people's Perl
: extensions and code that Perl itself is embedded in, all of which
: may be using your stuff. Or there may be other distributions of Perl
: using your code. You may find it rather difficult to persuade all these
: other programmers to modify their code due to your lack of foresight.

Given this sort of directive, many programmers would go off and invent
a general property system that is completely general and runs like a
dog.  I guess the real problem here is that we aren't saying what
*kinds* of assumptions are bad.  One of the assumptions we need to
avoid is that abstractions in the compiler are necessarily reflected in
complications at runtime.  Using the example given, if you need more
than 32 flags, and you've abstracted your flags with macros, then it's
pretty simple to redirect some of the flags to a different flag word
without introducing a level of indirection at run-time.
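
A sketch of what that looks like in C.  The Thing struct, the flag
names, and the accessor macros are all invented for illustration.
Callers only ever use the Thing_*() macros, so moving a flag into the
second word is a change to one macro definition, and each test still
compiles to a single mask of a known field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t flags;    /* the original 32 flags */
    uint32_t flags2;   /* overflow word, added once the first filled up */
} Thing;

#define THING_F_READONLY  0x00000001u   /* lives in flags  */
#define THING_F2_TAINTED  0x00000001u   /* lives in flags2 */

/* Callers see only these; which word holds a flag is hidden here. */
#define Thing_readonly(t)      (((t)->flags  & THING_F_READONLY) != 0)
#define Thing_readonly_on(t)   ((t)->flags  |= THING_F_READONLY)
#define Thing_readonly_off(t)  ((t)->flags  &= ~THING_F_READONLY)
#define Thing_tainted(t)       (((t)->flags2 & THING_F2_TAINTED) != 0)
#define Thing_tainted_on(t)    ((t)->flags2 |= THING_F2_TAINTED)
#define Thing_tainted_off(t)   ((t)->flags2 &= ~THING_F2_TAINTED)

/* Start with every flag cleared. */
static void thing_init(Thing *t)
{
    assert(t != NULL);
    t->flags  = 0;
    t->flags2 = 0;
}
```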

Sorry, I know this point is actually balanced out later in the document,
but it hit a hot button.

: If you do put an optimisation in, time it on as many architectures
: as you can, and reject it if it slows down on any of them! And remember
: to document it.

Or disable the optimization on that architecture, to get the benefit of
it elsewhere, balancing the benefits of the small code fork against
the benefits of not forking.  When in Rome, do as the Romans would do
if they weren't Roman and just visiting.

Larry



Re: PDD: Conventions and Guidelines for Perl Source Code

2001-05-09 Thread Larry Wall

Larry Wall writes:
: Dave Mitchell writes:
: : | anyone know precisely what the following means?
: : 
: : "K&R" style for indenting control constructs
: 
: Strictly speaking, it means you always put the opening bracket on the
: same line as the keyword, and only worry about lining up the closing
: bracket:

That's funny, my examples disappeared, leaving only a colon.  Here:

    mumble (natter, gromish) {
        ...
    }

I then went on to point out that if there are multiple lines in the
front matter, I like to line up the first bracket as well:

    mumble (natter, gromish,
            natter, gromish,
            natter, gromish,
            natter, gromish)
    {
        ...
    }

Larry



Re: PDD: Conventions and Guidelines for Perl Source Code

2001-05-09 Thread Larry Wall

Dave Mitchell writes:
: My thinking behind "if fails on one, avoid on all" was that if it failed
: on at least one, then it may well fail on others that you don't have access
: to - either now or in the future, and thus perhaps isn't as good an optimisation
: as you figured. The other way would be to only enable for those architectures
: that experience a speedup.

Makes sense.

Larry



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-29 Thread Larry Wall

Dan Sugalski writes:
: Nah, bytecode'll have an endianness marker at the top. If you load in 
: bytecode with the wrong endianness, the loader will have to swap for you.

Er.  If they're not bytes, we can't call it bytecode.
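
For reference, the swap-on-load scheme in the quoted text might look
roughly like this in C, assuming 32-bit code words and a made-up
marker value; nothing here reflects a decided file format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker: reads back as 0x04030201 if the file was
 * written on a machine of the opposite endianness. */
#define CODE_MAGIC 0x01020304u

static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

/* Fix up a freshly loaded image in place.  words[0] is the marker.
 * Returns 0 on success, -1 if the marker is missing or unrecognised. */
static int fixup_code(uint32_t *words, size_t n)
{
    assert(words != NULL || n == 0);
    if (n == 0)
        return -1;
    if (words[0] == CODE_MAGIC)
        return 0;                       /* already native order */
    if (words[0] != swap32(CODE_MAGIC))
        return -1;                      /* not our format at all */
    for (size_t i = 0; i < n; i++)
        words[i] = swap32(words[i]);
    return 0;
}
```

A stream of true single bytes would need no such pass, which is the
point of the objection above.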

Larry



Re: PDD 2nd go: Conventions and Guidelines for Perl Source Code

2001-05-29 Thread Larry Wall

Dan Sugalski writes:
: 1) The indentation should be all tabs or all spaces. No mix, it's a pain. 

This will devolve into an editor war, and I don't think it's a real issue.

Larry



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Larry Wall

Dan Sugalski writes:
: Right, but in this case we have the advantage of tailoring the instruction 
: set to the language, and given the overhead inherent in op dispatch we also 
: have an incentive to hoist opcodes up to as high a level as we can manage.

We basically tried this experiment with Perl 5, and it's only a partial
success.  Yes, you end up with a Perl VM that can run Perl pretty fast,
but it tends to complicate the mapping to other virtual machines.
(Enough so that we still don't have a reasonable way to run Perl on a
JVM, despite several attempts.)

I guess the real question is to what extent the world of the future will
use interpreters, and to what extent it'll settle on JIT compiling instead.
And that's a big enough dog that we can't wag it very easily.

By the way, have you folks considered how to unify the regex opcodes
with the normal Perl opcodes?  I suspect we might want to do that.

Larry



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Larry Wall

Dan Sugalski writes:
: Are you speaking of the nodes in regnode.h? I hadn't considered them as 
: regular perl opcodes--I figured they'd stay internal to the regex engine so 
: we could keep it reasonably modular.

I don't think that's a terribly strong argument--one could justify any
number of unfortunate architectural distinctions on the basis of
modularity.  Plus, I'd argue you can still retain modularity of your
code while unifying implementational philosophy.

It seems to me that the main reason for not considering such a
unification is that We've Never Done It That Way Before.  It's as if
regular expressions have always been second-class programs, so they'll
always be second-class programs, world without end, amen, amen.

But there is precedent for turning second-class code into first-class
code.  After all, that's just what we did for ordinary quotes in the
transition from Perl 4 to Perl 5.  Perl 4 had a string interpolation
engine, and it was a royal pain to deal with.

The fact that Perl 5's regex engine is a royal pain to deal with should
be a warning to us.

Much of the pain of dealing with the regex engine in Perl 5 has to do
with allocation of opcodes and temporary values in a non-standard
fashion, and dealing with the resultant non-reentrancy on an ad hoc
basis.  We've already tried that experiment, and it sucks.  I don't
want to see the regex engine get swept back under the complexity carpet
for Perl 6.  It will come back to haunt us if we do:

  "Sure, you can download the object code for this 5 line Perl program
  into your toaster...but you'll also have to download this 5 gigabyte
  regex interpreter before it'll run."

That's a scenario I'd love to avoid.  And if we can manage to store
regex opcodes and state using mechanisms similar to ordinary opcodes,
maybe we'll not fall back into the situation where the regex engine is
understood by only three people, plus or minus four.

Larry



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Larry Wall

Dan Sugalski writes:
: At 11:24 AM 6/4/2001 -0700, Larry Wall wrote:
: >Dan Sugalski writes:
: >: Are you speaking of the nodes in regnode.h? I hadn't considered them as
: >: regular perl opcodes--I figured they'd stay internal to the regex engine so
: >: we could keep it reasonably modular.
: >
: >I don't think that's a terribly strong argument--one could justify any
: >number of unfortunate architectural distinctions on the basis of
: >modularity.
: 
: Yeah I know. I mean, look at how those darned bricks have held back 
: architecture all these years! :-P

Hey, I come from the coast where we try to avoid bricks, especially
when they're falling.

: >Plus, I'd argue you can still retain modularity of your
: >code while unifying implementational philosophy.
: 
: I'm not entirely sure of that one--processing a full regex requires the 
: perl interpreter, it's not all that modular.

These days I'm trying to see the regex as just a funny-looking kind of
Perl code.

: Though whether being able to 
: yank out the RE engine and treat it as a standalone library is important 
: enough to warrant being treated as a design goal or not is a separate 
: issue. (I think so, as it also means I can treat it as a black box for the 
: moment so there's less to try and stuff in my head at once)

As a fellow bear of very little brain, I'm just trying to point out that
we already have a good example of the dangers to that approach.

: >It seems to me that the main reason for not considering such a
: >unification is that We've Never Done It That Way Before.  It's as if
: >regular expressions have always been second-class programs, so they'll
: >always be second-class programs, world without end, amen, amen.
: 
: No, not really. The big reasons I wasn't planning on unification are:
: 
: *) It makes the amount of mental space the core interpreter takes up smaller

It may certainly be valuable to (not) think of it that way, but just
don't be surprised if the regex folks come along and borrow a lot of
your opcodes to make things that look like (in C):

while (s < send && isdigit(*s)) s++;

: *) It can make performance tradeoffs separately from the main perl engine

The option of doing its own thing its own way is always open to an
opcode, but when you do that the option of making efficient use of the
core infrastructure goes away.  As an honorary member of the regex
hacking team, I covet registers.  :-)

: *) We can probably snag the current perl 5 source without much change

Cough, cough.

: *) The current RE engine's scared (or is that scarred?) me off enough that 
: I'd as soon leave it to someone who's more temperamentally suited for such 
: things.

As an honorary member of the temperamentally suited team, allow me to
repeat myself.  Cough, cough.  We can certainly borrow the ideas from
Perl 5's regex engine, but even us temperamentally suited veterans are
sufficiently scared/scarred to want something that works better.

: *) Treating regexes as non-atomic operations brings some serious threading 
: issues into things.

Eh, treating regexes as atomic is the root of the re-entrancy problem.
If the regex has access to local storage, the re-entrancy and threading
problems pretty much solve themselves.

: >The fact that Perl 5's regex engine is a royal pain to deal with should
: >be a warning to us.
: 
: I can think of a couple of reasons that the current engine's a royal pain, 
: and they don't have much to do with it as a separate entity...

Sure, I'm just saying that at least two of those couple reasons are
that 1) it invents its own opcode storage mechanism, and 2) it uses
globals for efficiency when it should be using some efficient variety
of locals.

: >Much of the pain of dealing with the regex engine in Perl 5 has to do
: >with allocation of opcodes and temporary values in a non-standard
: >fashion, and dealing with the resultant non-reentrancy on an ad hoc
: >basis.  We've already tried that experiment, and it sucks.  I don't
: >want to see the regex engine get swept back under the complexity carpet
: >for Perl 6.
: 
: Yeah, but those are mostly issues with the implementation, not with the 
: separation.

That sounds suspiciously like what I'm trying to say.

: >That's a scenario I'd love to avoid.  And if we can manage to store
: >regex opcodes and state using mechanisms similar to ordinary opcodes,
: >maybe we'll not fall back into the situation where the regex engine is
: >understood by only three people, plus or minus four.
: 
: While I'm not sure I agree with it, if that's what you want, then that's 
: what we'll do. Threading will complicate this some, since we'll need to 
: guarantee atomicity across multiple opcodes, something I'd not planned on 
: doing.

How will we guarantee that my $foo stays "my" under threading?  Surely
the same mechanism could serve to keep "my" regex state variables sane
under the same circumstances.  (I oversimplify, of course...)

Larry



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Larry Wall

Dan Sugalski writes:
: Yeah, a lot of that's definitely a problem, as is the manipulation of the 
: return stack and some assignment bits. (You can cut the time split takes in 
: half by having the destination array presized, for example)

That's why the current version of Perl goes to a great deal of trouble to
hang onto its allocations from previous iterations/invocations, on the
theory that if you needed N elements last time, you'll probably need
about N elements again this time.  In changing how locals are generated
for Perl 6, it would be easy to lose such optimizations accidentally,
and that might be unfortunate.  It may be that there's a middle ground,
where by saving only the information and not the actual allocation, we
can estimate how much to reallocate the next time through without
actually hanging on to the memory in question.
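
A minimal sketch of that middle ground, in Python rather than in anything resembling the real internals.  The SizeHint class and split_with_hint function are hypothetical names made up for illustration; the only thing kept between calls is the element count, never the buffer itself.

```python
class SizeHint:
    """Remembers how many elements were needed last time, not the memory."""

    def __init__(self):
        self.last_count = 0  # the only state kept between calls

    def new_buffer(self):
        # Presize a fresh buffer from the previous count (assumed policy).
        return [None] * self.last_count

    def record(self, used):
        self.last_count = used


def split_with_hint(text, hint):
    """Split on whitespace into a buffer presized from the hint."""
    out = hint.new_buffer()
    n = 0
    for word in text.split():
        if n < len(out):
            out[n] = word      # fits in the presized part
        else:
            out.append(word)   # grow only when the estimate was too low
        n += 1
    del out[n:]                # drop any presized slots that went unused
    hint.record(n)
    return out
```

Each call allocates and releases its own buffer, so nothing is pinned between invocations, yet the second and later calls start out at roughly the right size.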

Larry



Re: Stacks, registers, and bytecode. (Oh, my!)

2001-06-04 Thread Larry Wall

Jarkko Hietaniemi writes:
: > : Though whether being able to 
: > : yank out the RE engine and treat it as a standalone library is important 
: > : enough to warrant being treated as a design goal or not is a separate 
: > : issue. (I think so, as it also means I can treat it as a black box for the 
: > : moment so there's less to try and stuff in my head at once)
: > 
: > As a fellow bear of very little brain, I'm just trying to point out that
: > we already have a good example of the dangers to that approach.
: 
: Still, having the regexen as "reusable component" (i.e. library)
: wouldn't be a bad idea.  The current regex code most definitely isn't
: detachable or reusable, so we can't be said to have explored that
: option.

Well, other languages have explored that option, and I think that makes
for an unnatural interface.  If you think of regexes as part of a
larger language, you really want them to be as incestuous as possible,
just as any other part of the language is incestuous with the rest of
the language.  That's part of what I mean when I say that I'm trying to
look at regular expressions as just a strange variant of Perl code.

Looking at it from a slightly different angle, regular expressions are
in great part control syntax, and library interfaces are lousy at
implementing control.

Larry



Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Larry Wall

Dan Sugalski writes:
: Have they changed that again? Last I checked, UTF-8 was capped at 4 bytes, 
: but that's in the Unicode 3.0 standard.

Doesn't really matter where they install the artificial cap, because
for philosophical reasons Perl is gonna support larger values anyway.
It's just that 4 bytes of UTF-8 happens to be large enough to represent
anything UTF-16 can represent with surrogates.  So they refuse to
believe in anything longer than 4 bytes, even though the representation
can be extended much further.  (Perl 5 extends it all the way to 64-bit
values, represented in 13 bytes!)
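
The arithmetic behind the 4-byte claim can be checked with a rough sketch.  The function names are made up for illustration, and the sketch only covers the original lead-byte scheme up to a lead byte of 0xFE; the 13-byte form is outside its scope.

```python
def utf8_payload_bits(nbytes):
    """Payload bits in an nbytes-long sequence of the original UTF-8 scheme."""
    if nbytes == 1:
        return 7                      # 0xxxxxxx
    if not 2 <= nbytes <= 7:
        raise ValueError("this sketch only covers lead bytes up to 0xFE")
    # The lead byte spends nbytes one-bits plus a zero bit on the length
    # marker; each continuation byte (10xxxxxx) carries 6 payload bits.
    return (7 - nbytes) + 6 * (nbytes - 1)


def max_surrogate_pair_value():
    """Largest value a UTF-16 high/low surrogate pair can encode."""
    high, low = 0xDBFF, 0xDFFF
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

Four bytes give 21 payload bits, and the largest surrogate pair value needs exactly 21 bits, which is why the cap lands where it does.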

They also arbitrarily define UTF-32 to not use higher values than
0x10FFFF, but that doesn't mean we're gonna send in the high-bit Nazis
if people want higher values for their own purposes.

But since the names UTF-8 and UTF-32 are becoming associated with those
arbitrary restrictions, it's getting even more important to refer to
Perl's looser style as utf8 (and, potentially, utf32).  I don't know
if Perl will have a utf16 that is distinguished from UTF-16.

Larry



Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Larry Wall

Russ Allbery writes:
: Particularly since extending UTF-8 to more
: than 31 bits requires breaking some of the guarantees that UTF-8 makes,
: unless I'm missing how you're encoding the first byte so as not to give it
: a value of 0xFE.

The UTF-16 BOMs, 0xFEFF and 0xFFFE, both turn out to be illegal UTF-8
in any case, so it doesn't much matter, assuming BOMs are used on
UTF-16 that has to be auto-distinguished from UTF-8.  (Doing any kind of
auto-recognition on 16-bit data without BOMs is problematic in any case.)
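
A rough sketch of that kind of BOM-based auto-distinguishing, using Python's standard codecs constants.  The helper names are hypothetical, and Python's strict UTF-8 decoder stands in here for "legal UTF-8"; it is not Perl's looser utf8.

```python
import codecs


def is_continuation(byte):
    """True for a UTF-8 continuation byte (10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def sniff_encoding(data):
    """Guess an encoding from the first two bytes (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le"
    return "utf-8"                             # no BOM: assume UTF-8


def is_strict_utf8(data):
    """True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Neither 0xFE nor 0xFF has the 10xxxxxx shape of a continuation byte, so neither can legally follow the other inside a multi-byte sequence, whichever of the two is taken as the lead byte.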

Larry



Re: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Larry Wall

Dan Sugalski writes:
: At 04:44 PM 6/5/2001 -0700, Larry Wall wrote:
: >(Perl 5 extends it all the way to 64-bit values, represented in 13 bytes!)
: 
: I know we can, but is it really a good idea? 32 bits is really stretching 
: it for character encoding, and 64 seems rather excessive.

Such large values would not typically be used for standard characters, but
as a means of embedding an inline chunk of non-character data, such as a
pointer, or a set of metadata bits.

: Really 
: space-wasteful as well, if we maintain a character type with a fixed width 
: large enough to hold the largest decoded variable-width character.

True 'nuff.  I suspect most people would want to stick within 32 bits,
which is sufficiently wasteful for most purposes.

: And I 
: really, *really* want to do as little as possible internally with 
: variable-width encodings. Yech.

Mmm, the difficulty of that is overrated.  Very seldom do you want to
do anything other than find the next character, or the previous
character, and those are pretty easy to do in utf8.
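
A minimal sketch of those two operations over a byte buffer, assuming well-formed input.  The function names are made up for illustration.

```python
def is_continuation(byte):
    """True for a UTF-8 continuation byte (10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def next_char(buf, i):
    """Index where the character after the one starting at i begins."""
    i += 1
    while i < len(buf) and is_continuation(buf[i]):
        i += 1
    return i


def prev_char(buf, i):
    """Index where the character before position i begins (requires i > 0)."""
    i -= 1
    while i > 0 and is_continuation(buf[i]):
        i -= 1
    return i
```

Both loops skip continuation bytes only, so each step costs at most a handful of byte tests, and neither needs to decode the character it steps over.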

: >They also arbitrarily define UTF-32 to not use higher values than
: >0x10FFFF, but that doesn't mean we're gonna send in the high-bit Nazis
: >if people want higher values for their own purposes.
: 
: Well, that'd be inappropriate since a good chunk of the rest of the set's 
: been dedicated to future expansion. I think it might be a reasonable idea 
: for -w to grumble if someone's used a character in the unassigned range, 
: though. (IIRC there's a piece set aside for folks to do whatever they want 
: with)

Certainly, but it's easy to come up with reasons to want to stuff more
bits inline than the private use areas will support.  Rather than have
-w grumble about such characters, I'd rather see an optional output
discipline that enforces strict Unicode output.

: >But since the names UTF-8 and UTF-32 are becoming associated with those
: >arbitrary restrictions, it's getting even more important to refer to
: >Perl's looser style as utf8 (and, potentially, utf32).  I don't know
: >if Perl will have a utf16 that is distinguished from UTF-16.
: 
: I'd as soon not do UTF-16 at all, or at least no more than we need to 
: convert to UTF-32 or UTF-8.

Well, as you pointed out above, we might not use any kind of "UTF"
internally, but just arrays of properly sized integers, which are never
variable length.  (UTF-32 is the only UTF that's not a variable-length
encoding.)

On the other hand, maybe there's some use for a data structure that is
a sequence of integers of various sizes, where the representation of
different chunks of the array/string might be different sizes.  Would
make some aspects of copy-on-write more efficient to be able to chunk
strings and integer arrays.  And of course this would all be transparent
at the language level, in the absence of explicit syntax to treat an
array as a string or a string as an array.

Larry



Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread Larry Wall

Russ Allbery writes:
: Yeah, but one of the guarantees of UTF-8 is:
: 
: - The octet values FE and FF never appear.
: 
: I can see that this property may not be that important, but it makes me
: feel like things that don't have this property aren't really UTF-8.

Which is one of the reasons I call it "utf8" instead.  I think of utf8
as a nice way to compactly store a sequence of arbitrarily sized
integers.  And you know I've never been particularly interested in
having Perl enforce arbitrary limits.  (Admittedly, in the particular
case of integer size, Perl has historically accepted some arbitrary
2**n limits to gain performance.)

I'm much more interested in the clean abstraction of "a string is a
sequence of integers" than I am in the fact that those integers happen
to represent particular characters under Unicode.  To be sure, it's
quite handy that those integers do represent characters, but (as has
been pointed out redundantly and repetitiously) the definition of
Unicode changes over time.

In contrast, the definition of integers doesn't change.  (At least, it
hadn't changed last time I checked...)

Larry



Re: Should we care much about this Unicode-ish criticism?

2001-06-06 Thread Larry Wall

Dan Sugalski writes:
: At 06:59 PM 6/5/2001 -0700, Larry Wall wrote:
: >Such large values would not typically be used for standard characters, but
: >as a means of embedding an inline chunk of non-character data, such as a
: >pointer, or a set of metadata bits.
: 
: Ah. In that case, perhaps extended utf-8 processing isn't really the most 
: appropriate way to go. If the intent is to do embedded binary bits in a 
: text stream, maybe we should build input and output filters to do that instead.

I see the issue of filters as orthogonal to the issue of representation.
In other words, I don't understand what you're saying.

: >Mmm, the difficulty of that is overrated.  Very seldom do you want to
: >do anything other than find the next character, or the previous
: >character, and those are pretty easy to do in utf8.
: 
: As Hong pointed out to me on more than one occasion. I'm not sure I buy 
: that, and I have serious reservations about the speed of dealing with 
: variable length characters instead of fixed-length ones.

Whether you buy it or not, I wasn't offering it as a mere conjecture.
That is precisely what Perl 5.6+ is already doing for Unicode data.
It's not a big deal unless your program is full of substr().  And it
saves on input processing, if the input is already known to be UTF-8.

: >Certainly, but it's easy to come up with reasons to want to stuff more
: >bits inline than the private use areas will support.
: 
: Maybe. That trips my "way too clever" reflex, though, and makes me think 
: that perhaps it's not the best way to go about that sort of thing. Rather 
: than making non-text things look like text, maybe we'd be better off coming 
: up with a better way to intermingle text and non-text things. It'd be more 
: space-efficient as well, since utf-8 encoding random binary things will 
: tend to expand them more than would seem necessary.

True 'nuff.  There are many valid reasons to want to annotate
substrings with various kinds of out-of-band metadata.  The trick
is to keep the out-of-band data in sync with the in-band data.

: >On the other hand, maybe there's some use for a data structure that is
: >a sequence of integers of various sizes, where the representation of
: >different chunks of the array/string might be different sizes.  Would
: >make some aspects of copy-on-write more efficient to be able to chunk
: >strings and integer arrays.  And of course this would all be transparent
: >at the language level, in the absence of explicit syntax to treat an
: >array as a string or a string as an array.
: 
: I think that'd be a better solution than fibbing about what a piece of a 
: data stream is.

If you want to attach the label "fib" to an intentional violation of
cultural convention, I suppose I can't stop you.  But Rosa Parks wasn't
pretending to be white when she sat in the front of the bus.  :-)

Larry



Re: Butt-ugliness reduction

2001-11-16 Thread Larry Wall

Dan Sugalski writes:
: At 06:52 PM 11/15/2001 +, Simon Cozens wrote:
: >I've hit upon quite a major problem with implementing
: >Perl scalar PMCs. The problem is that the code is just
: >too damned ugly for words.
: >
: >Remember that PMCs have a data area which is a void
: >pointer; I'm connecting that to a structure which has
: >integer, number and string slots. Those of you familiar
: >with Perl SVs will know exactly where I'm coming from.
: 
: Ah, perl 5 SVs. Exactly what I was trying to avoid. :)
: 
: >Any suggestions for cleaning up this crap and making it a bit
: >more maintainable?
: 
: Couple of things:
: 
: *) Use the cache entry in the PMC header. It'll mean for ints and floats 
: you won't need to bother with a separate structure or dereference
: 
: *) Swap out vtables as the pmc internals change, rather than vectoring off 
: flags.
: 
: Yeah, some of the code will be a little grotty, but not as bad as you might 
: think.

I think that as soon as the concept of a cache starts producing bit flags
and unions and cascaded if/then/elses, it's probably a bad concept.

Larry



Re: [ID 20020130.001] Unicode broken for 0x10FFFF

2002-01-30 Thread Larry Wall

Jarkko Hietaniemi writes:
: > What I notice, though, is that the current code does not warn for
: > characters beyond 0x10FFFF, which is definitely a bug.
: 
: Ahh, it's all coming back now... warning about such characters
: causes pain in the complementing tr///... have to look at this later.

I think the general policy of Perl should be that it is allowed to
think about bad thoughts, because that is the only way to understand
what's bad about the bad thoughts Perl receives on input.  If there is
to be any self-censorship, it should be on the output, I believe.
That's why they're called "disciplines", after all. :-) So it's fine if
the default output discipline enforces that the internal representation
is transformed to well-formed UTF-8.  It's even okay if the default
input discipline enforces well-formedness, as long as there's a way
to get at the raw badness.

But within Perl, character strings are simply sequences of integers.
The internal representation must be optimized for this concept, not for
any particular Unicode representation, whether UTF-8 or UTF-16 or
UTF-32.  Any of these could be used as underlying representations, but
the abstraction of sequences of integers must be there explicitly in
the internal high-level string API.  To oversimplify, the high-level
API must not have any parameters whose type contains the string "UTF".

In the absence of other type information, these integers are assumed
to be Unicode code points.  Additional strictures are possible and even
useful, but should not be the default (except for certain operations that
are explicitly designed for Unicode.)

For various reasons, some of which relate to the sequence-of-integer
abstraction, and some of which relate to "infinite" strings and arrays,
I think Perl 6 strings are likely to be represented by a list of
chunks, where each chunk is a sequence of integers of the same size or
representation, but different chunks can have different integer sizes
or representations.  The abstract string interface must hide this from
any module that wishes to work at the abstract string level.  In
particular, it must hide this from the regex engine, which works on
pure sequences in the abstract.
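
A rough sketch of what such a chunked representation might look like, with the chunking hidden behind ordinary sequence operations.  The ChunkedString class and its methods are hypothetical, and the choice of Python's array module for storage is an assumption made purely for illustration.

```python
from array import array


def _typecode_for(values):
    """Pick the narrowest unsigned array typecode that holds every value."""
    top = max(values, default=0)
    for code in ("B", "H", "L"):          # 1, 2, and at least 4 bytes wide
        if top < 1 << (8 * array(code).itemsize):
            return code
    raise OverflowError("value too large for this sketch")


class ChunkedString:
    """A sequence of integers stored as chunks of differing widths."""

    def __init__(self):
        self.chunks = []

    def append_chunk(self, values):
        values = list(values)
        self.chunks.append(array(_typecode_for(values), values))

    def __len__(self):
        return sum(len(chunk) for chunk in self.chunks)

    def __getitem__(self, index):
        if index < 0:
            raise IndexError(index)
        for chunk in self.chunks:
            if index < len(chunk):
                return chunk[index]
            index -= len(chunk)
        raise IndexError(index)

    def __iter__(self):
        # Callers see one flat sequence of integers, never the chunks.
        for chunk in self.chunks:
            yield from chunk
```

A consumer that only iterates or indexes cannot tell whether a given integer came from a one-byte chunk or a four-byte chunk, which is the property the abstract interface is supposed to guarantee.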

Note that I did not use the phrase "pure sequences of integers" in the
last sentence.  The regex engine must not care if it is matching
characters from a string of known length, or token objects from an
array that is being grown arbitrarily on demand.  Matching on UTF-32
is not good enough.

This is just a heads up for some of the stuff in Apocalypse 5.
Backtracking behavior will not necessarily be limited to regexes in
Perl 6, and if so, we have to consider very carefully how regex
backtracking, continuations, and temp variable unifications all work
together.  (This is part of the reason I pushed earlier for the regex
opcodes to be meshed with the normal opcodes.)

I seriously intend that it be trivial to write a Perl parser (or any
other parser) in Perl, and that changing a grammar rule be as simple as
swapping in a different qr// (or a sub equivalent to a qr//).  More
generally, I want logic programming to be one of the paradigms that
Perl supports.  And as usual, I want to support it without forcing it
on people who aren't interested.

Sorry I can't be more clear yet.  Story of my life.  That's the basic
problem with the bear-of-very-little-brain approach.  So please "bear"
with me.

[I've cross-posted because of the wide interest, but I don't want to
start a general frenzy cross-posted to all the lists.  Please answer
specific points in separate messages, and please direct each followup
to the appropriate list.  Thanks.]

Larry



Re: Parrot is very (intentionally) broken

2002-02-08 Thread Larry Wall

Simon Cozens writes:
: Gregor N. Purdy:
: > I was only involved in a small amount of 'key' discussion. FWIW, I
: > would have thought the KEY_PAIR thingee was for (array) slice ranges,
: > not multidimensional indexing...
: 
: Then it's doubly mis-named, because KEY_PAIR holds a single key, not a
: pair of anything, and KEY holds a bunch of KEY_PAIRs.

I just think of multidimensionality as another "list of" dimension on
top of the slices.  Alternately, you can think of it as another
dimension on each leaf that turns each scalar into a list.  But the
extra dimension has to sneak in there somewhere if we're to allow
multidimensional slicing.

Larry



Re: Parrot is very (intentionally) broken

2002-02-08 Thread Larry Wall

Gregor N. Purdy writes:
: I think of slicing as a shortcut for map.
: 
:     foo[1,2,3]   ===   map { foo[$_] } (1,2,3)
: 
: I think of multidimensionality as arrays-of-arrays:
: 
:     foo[1][2]
: 
: As for combining the two, I guess that would be
: 
:     foo[1,2][3,4]  =~=  temp = map { foo[$_] } (1,2);
:                         map { temp[$_] } (3,4)

I think the math folks want to write that:

foo[1,2;3,4]

And I don't see why we can't support that.  As I said somewhere, that's
probably just shorthand for

foo[[1,2],[3,4]]

So I guess that's actually the "leaf" view of the extra dimension.

: I think of ranges as being lazy lists. Under flattening, they
: remain generators and cause the flattened list to also be lazy
: so that when the ranges are encountered they DTRT. Optimization
: might cause short ranges to explode so we don't have too much
: time overhead for (1..5), while still avoiding space overhead for
: (1...100).

Yes, ranges are just one kind of generator.  Any generator turns a
list into a lazy list.  In fact, most lists will be lazy under the
hood, I expect.  Consider something like this:

@lines = <$in>;
print @lines;

That may well be doing the actual input during the print.  On the other
hand, if you take the length of the array, it'd have to slurp it all
in right then and there.
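
A minimal sketch of that behavior in Python.  The LazyLines class is hypothetical; it only illustrates the difference between iterating, which reads on demand, and asking for the length, which forces everything in.

```python
import io


class LazyLines:
    """List-like view of a file handle that reads only when forced."""

    def __init__(self, handle):
        self._handle = handle
        self._cache = []        # lines read so far
        self._done = False      # True once the handle is exhausted

    def _read_one(self):
        line = self._handle.readline()
        if line == "":
            self._done = True
            return False
        self._cache.append(line)
        return True

    def __iter__(self):
        i = 0
        while True:
            if i < len(self._cache):
                yield self._cache[i]
                i += 1
            elif self._done or not self._read_one():
                return

    def __len__(self):
        # Taking the length has to slurp the rest of the input.
        while not self._done:
            self._read_one()
        return len(self._cache)
```

Usage, with io.StringIO standing in for a real file handle:

```python
lines = LazyLines(io.StringIO("a\nb\nc\n"))
for line in lines:        # input happens here, one line at a time
    print(line, end="")
```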

Larry



Re: Parrot is very (intentionally) broken

2002-02-08 Thread Larry Wall

Dan Sugalski writes:
: At 5:17 PM + 2/8/02, Simon Cozens wrote:
: >Dan Sugalski:
: >>  Can't. Needs to be a linked list. Otherwise we can't nest data structures
: >>  well.
: >
: >Thanks; I knew there had to be a reason, couldn't remember what it was.
: 
: Now all we need to do is figure out whether keys at the lowest levels 
: will deal with slices and ranges, or whether we should emit chunks of 
: bytecode to explicitly iterate through things.

The object at the bottom has to be able to process slices and ranges
in case it wants to return a lazy list of its own.  You can't just
iterate it on behalf of the object.

Larry



Re: VM, closures, continuation

2002-02-09 Thread Larry Wall

raptor writes:
: I was just reading this :
: 
: http://www.javalobby.com/clr.html
: 
: and a question raised to me. Will Parrot have some optimisation
: (features) that will speed up closures & continuation ?

Well, we understand closures pretty well already, since Perl 5 has
'em.  Most of the optimizations there are what you'd like to do for any
subroutine call.  There just happen to be references back to lexicals
outside your current scope, so it really depends a lot on whether your
GC model is efficient.

Continuations are more problematical, because they essentially have
references to their entire dynamic context, including the call stack.
But we'll need something like continuations if we're to generalize
pattern matching outside of regular expressions.  The current regex
engine does backtracking by recursing deeper when you think you're
returning from a nested () construct.

The more general alternative is to "fork" the call stack.  One obvious
optimization there is to do a virtual fork using copy-on-write
semantics.  I suspect COW is going to be important in Perl 6 on other
grounds as well, whenever we want to preserve the appearance of value
passing when we're actually doing reference passing.
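
A rough sketch of the copy-on-write idea on a plain list, not on a call stack.  The CowList class is hypothetical, and the single "shared" flag is a simplification (a real implementation would more likely count references).

```python
class CowList:
    """A list that shares storage with its forks until one of them writes."""

    def __init__(self, items=()):
        self._items = list(items)
        self._shared = False

    def fork(self):
        """Cheap 'copy': both sides keep pointing at the same storage."""
        other = CowList.__new__(CowList)
        other._items = self._items
        self._shared = other._shared = True
        return other

    def __getitem__(self, i):
        return self._items[i]

    def __setitem__(self, i, value):
        if self._shared:
            # The real copy is deferred until the first write.
            self._items = list(self._items)
            self._shared = False
        self._items[i] = value

    def shares_storage_with(self, other):
        return self._items is other._items
```

The caller sees value semantics (a fork never observes the other side's writes), while reads after a fork cost nothing beyond the shared reference.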

Larry



Re: Please rename 'but' to 'has'.

2002-04-26 Thread Larry Wall

Tim Bunce writes:
: For perl at least I thought Larry has said that you'll be able to
: create new ops but only give them the same precedence as any one
: of the existing ops.

Close, but not quite.  What I think I said was that you can't specify
a raw precedence--you can only specify a precedence relative to an
existing operator.  That way it doesn't matter what the initial
precedence assignments are.  We can always change them internally.

: Why not use a 16 bit int and specify that languages should use
: default precedence levels spread through the range but keeping the
: bottom 8 bits all zero. That gives 255 levels between '3' and '4'.
: Seems like enough to me!
: 
: Floating point seems like over-egging the omelette.

It's also under-egging the omelette, and not just because you
eventually run out of bits.  I don't think either integer or floating
point is the best solution, because in either case you have to remember
separately how many levels of derivation from the standard precedence
levels you are, so you know which bit to flip, or which increment to
add or subtract from the floater.

In an approach vaguely reminiscent of surreal numbers, I'd just use a
string that does trinary ordering.  Suppose you have 26 initial
precedence levels.  Call these A-Z.  Subsequent characters can just be
0-2.  Skip A for the moment, call the lowest precedence level B, the
next lowest C, and so on.

  * To make a new operator at the same precedence, simply copy the base
    precedence string.

  * To make a new operator at a higher precedence level, copy the base
    precedence and append a "1" to it.

  * To make a new operator at a lower precedence level, copy the base
    precedence, decrement the last character (that's why we skipped A)
    and append a "2".

This is now extensible to any number of precedence levels, and you can
now use simple string comparison to compare any two precedences.  It even
short circuits the comparison as soon as it finds a character that
differs.
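
The three rules can be sketched in a few lines of Python.  The function names are made up for illustration, and the letters used in the usage note are arbitrary placeholders for two adjacent base levels.

```python
def same_as(base):
    """New operator at the same precedence: copy the base string."""
    return base


def tighter_than(base):
    """New operator at a higher precedence: append a "1"."""
    return base + "1"


def looser_than(base):
    """New operator at a lower precedence: decrement the last character
    and append a "2"."""
    last = base[-1]
    if last in ("A", "0"):
        raise ValueError("cannot decrement %r" % last)
    return base[:-1] + chr(ord(last) - 1) + "2"
```

With "R" and "S" as two adjacent base levels, tighter_than("R") gives "R1" and looser_than("S") gives "R2", so plain string comparison orders them "R" < "R1" < "R2" < "S".  Going looser than a derived level works the same way: looser_than("S1") gives "S02", which sorts between "S" and "S1".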

Gee, maybe I should patent this.

: p.s. I missed the start of this thread so I'm not sure why this is
: a parrot issue rather than a language one. I also probably don't
: know what I'm talking about :)

It's a language issue insofar as the language specifies that the
implementation should avoid arbitrary limits.

Larry



Re: Please rename 'but' to 'has'.

2002-04-26 Thread Larry Wall

Buddha Buck writes:
: So you'd have something like:
: 
: sub operator:mult($a, $b) is looser('*') is inline {...}
: sub operator:add($a, $b) is tighter("+") is inline {...}
: sub operator:div($a,$b) is looser("/") is inline {...}
: 
: assuming default Perl5 precedences for *, +, and / you would have the 
: precedence strings for *, +, /, mult, add, and div to be "S", "R", "S", 
: "S2", "S1", "S2" respectively?  So mult and div would have the same 
: precedences?

Yes.  This seems the most sensical approach to me.  If you base two
operators on the same precedence level with the same pedigree, they
should have the same precedence.  One can always differentiate them
explicitly.  I could even see people setting up a system of "virtual"
operators that are just there for deriving precedence from, so you
could have 20 standard levels of precedence between * and + if you
wanted.

sub operator:PREC1($a, $b) is tighter('+') {...}
sub operator:PREC2($a, $b) is tighter('PREC1') {...}
sub operator:PREC3($a, $b) is tighter('PREC2') {...}
...

sub operator:funky($a, $b) is like("PREC13") { ... }

: Hmmm  What problems would be caused by:
: 
: sub operator:radd($a,$b) is tighter("+") is inline is rightassociative {...}
: sub operator:ladd($a,$b) is tighter("+") is inline is leftassociative {...}
: 
: Right now, all the operator precedence levels in Perl5 have either right, 
: left, or no associativity, but they do not mix right and left associative 
: operators.  Will that be allowed in Perl6?

Well, associativity is mostly just a tie-breaking rule, so we'd just
want to refine the tie-breaking rule to say what happens in that case.
It's possible the right thing to do is to treat conflicting associativity
as non-associative, and force parentheses.

Larry



Re: Apoc 5 questions/comments

2002-06-10 Thread Larry Wall

On Sun, 9 Jun 2002 [EMAIL PROTECTED] wrote:
: The parsing of perl 6 is the application of a huge, compiled, regex, correct? 

No, it's a system of compiled regexes which we're calling a grammar.

: In order to parse the new syntax, perl6 is going to have to compile the
: new rule, and stick it in the place of the old one, for the duration of the 
: scope, right?

Doesn't exactly "stick it in place of" except in an abstract sense.
It uses ordinary method overriding to hide the old rule.

: Now what happens to the parser at large if you have dependencies on what has
: changed - ex: if you change the rule for brackets, say so that all '[' are now
: actually '[[' and all ']' are now  ']]'. Won't the whole regex for parsing
: perl need to be recompiled for the duration of the block, or at least the
: dependencies on the things that you changed? And won't *that* be slow and/or
: memory intensive?

No, only the rule in question is compiled, and that only happens once regardless
of how often you invoke the rule.  There might possibly be a compilation phase
when you derive a new grammar from an old one, but that's a tradeoff we can
make when we get to it.  It'd still only happen once for a given grammar.

: And if the rules are somehow abstracted in the perl6 parser/parrot/regex engine
: ,so that each 'rule' is in essence a pointer to the real code corresponding to 
: interpreting that rule (so it can be replaced easily by user defined ones) - 
: well won't that abstraction hurt the performance of parsing regular perl?

If that becomes an issue we can always install a hard-wired lexer/parser as the
base grammar.

: And finally, if the regular expressions are in bytecode to get this flexibility
: as opposed to native machine code, what sort of overhead will this impose on 
: the regex engine?

Er, Perl 5's regexes are in their own peculiar bytecode, not in
native machine code.  If anything, we'll be better off with Perl 6's
JITable bytecode.

: I know the above might be a bit simplistic, and since its an implementation 
: question I'm posting to perl6-internals instead, but the post is more for the 
: point of clarification about what's going on than anything else. I'd love to 
: see this happen, would use it all the time..

That's our dream.

Larry




Re: [JIT] bsr/ret in native code

2002-06-14 Thread Larry Wall

On Fri, 14 Jun 2002, Dan Sugalski wrote:
: At 9:54 AM +0200 6/14/02, Aldo Calpini wrote:
: >you would
: >not be able, for example, to inspect the call stack from inside a Parrot
: >program anymore.
: 
: That, unfortunately, makes it untenable, since we need to be able to 
: do this in the general case. Also, we'll fill up the thread stack 
: pretty quickly. Not hugely fast, mind, but it's still an issue when 
: we have a potentially small stack on hand. (20-40K won't be unusual, 
: unfortunately)
: 
: Believe me, I'd love to get the speed this way, but it'll make some 
: code untenable, and the lack of stack inspection may be a problem. 
: (If it turns out later to not be a problem, well, we can do it then. 
: I like the idea, I just think the limits'll be a problem. Hopefully 
: I'm wrong :)

Hmm.  The routines called from tight loops tend to be leaf nodes.
It might very well be useful to keep track of which routines don't
inspect the stack.  It might even be worthwhile to make a language
rule saying that any routine that uses caller or want must so
indicate in the declaration somehow, via a superpositional return
type or a property.

Larry




Re: [JIT] bsr/ret in native code

2002-06-14 Thread Larry Wall

On Fri, 14 Jun 2002, Nicholas Clark wrote:
: But surely a routine that calls another routine can potentially have its
: stack inspected by the caller?

Certainly.

: So it would only make sense for leaf nodes, and even then they might
: get inspected by overloaded values or methods on objects that were passed
: as parameters?

Yes.

: So is it possible to make it useful in a general case, or were you meaning
: that a subroutine can declare "I don't need to be on the stack", document
: itself as such, and then any indirect calls it makes don't get to see it
: (but at their own risk). It's still a form of action-at-a-distance, so
: is it that good?

Probably can't make the optimization unless we have the body and can
tell either that there are no indirect calls or that any indirect
calls made are known safe.  I can see some routines that could use
this optimization that couldn't use inlining (such as when we have
no guarantee against redefinition, except in that case you still have
to go indirect through the header).
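
A conservative version of that check can be sketched in Python, assuming the
compiler has a call graph in hand.  The dictionary format and the routine
names are hypothetical: a routine counts as stack-blind only if its body is
known, it calls neither caller nor want, and everything it calls is already
known to be stack-blind.

```python
STACK_INSPECTORS = {"caller", "want"}

def stack_blind_routines(calls):
    """calls maps a routine name to the set of names it calls directly,
    or to None when the body is unavailable or makes indirect calls."""
    safe = set()
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for name, callees in calls.items():
            if name in safe or callees is None:
                continue
            if callees & STACK_INSPECTORS:
                continue
            # Callees missing from the graph are never in `safe`, so they block.
            if all(c in safe for c in callees):
                safe.add(name)
                changed = True
    return safe

# Hypothetical call graph.
calls = {
    "add": set(),                    # leaf routine
    "norm": {"add"},                 # calls only known-safe code
    "debug": {"caller"},             # inspects the stack
    "log_and_add": {"add", "debug"},
    "dispatch": None,                # indirect calls, body not analyzable
}
```

The analysis errs on the side of caution: a routine that recurses into
itself is never marked safe by this sketch.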

: Or would the property of "I don't use caller or want" still be useful on a
: subroutine, because the run-time could determine that it would be
: inline-able (or whatever) inside a loop at run time, based on parameters
: passed to it? (and call it non-inline if the parameters were not base perl
: types)

Maybe.  I'm not an expert on run-time optimizations.  I just know that
the more info you have, the easier it is to know when you can get
away with a particular optimization.  And that there are advantages
and disadvantages to knowing anything at any particular stage.  And I
really like optional declarations, because then the programmer gets
to make the tradeoff.

Larry




Re: Irrational fear of macros

2002-06-18 Thread Larry Wall

On Tue, 18 Jun 2002, Melvin Smith wrote:
: 2) In fact, there are MANY funny named macros in Perl5.

That is precisely *why* they had to have funny names.  Perl 5's
macro naming schemes were a vast improvement over Perl 4's.  In
Perl 4 it was impossible to tell at a glance what kind of macro
you were looking at.

: Don't you agree that code should document itself as much as possible?

Sure, but there's more than one kind of self-documentation.
The self-documenting nature of Perl 5's macros is intended for people
who can see patterns, not for people who have to have it all spelled
out to them every time.  After all, when Perl 5's code was originally
written, it only had to be readable by one person, which was me.  :-)

It was only after 5.000 came out that I figured out I needed to
delegate internals work.

: Now, take an example from Perl5.
: 
: #define BOop(f) return ao((yylval.ival=f, REPORT("bitorop",f) PL_expect = 
: XTERM,PL_bufptr = s,(int)BITOROP))
: #define BAop(f) return ao((yylval.ival=f, REPORT("bitandop",f) PL_expect = 
: XTERM,PL_bufptr = s,(int)BITANDOP))
: 
: Reading these, I know what they mean. When dealing with Lex/Yacc interface
: it is useful to hide yylval usage and state checks/changes inside an interface,
: however if I mean BitwiseOrOp() why do I write BOop() when, in fact, the macro
: is only used _twice_? When I see BOop, I'm looking for BEtty()! :)

Doesn't matter how often it's used--it fit into the system.  I don't see this
as a good place for Huffman coding.  In particular, macros defined in
the same file they're used are not much of a problem here.  If you can't
remember what they mean, you just search for them.  It's the macros that
are defined in header files that have to be very carefully named.

: >If Parrot is to avoid perl 5's legacy, a dictionary, a work to explain the 
: >usage of
: >each element of the API.  Parrot needs a rosetta stone, through which 
: >future implementors
: 
: I agree mostly, but this goes back to what I said above, in fact macros are
: often named badly. Just open up toke.c in Perl5 and hop to line 152.
: Now try to read toke.c remembering what each of those macros does;
: even if you have a dictionary, you can only retain so many terms.
: 4 character macros obfuscate as much as they help productivity.

As I say, those aren't a very good example, being local macros rather
than global.  It was much more important to systematize the macros in,
say, pp.h.  After all these years, I still know exactly what a
dPOPTOPnnrl does, and it's not because I remember the name.

:  >If you knew why the macros existed, you wouldn't be confused now would you?
: 
:  >( who is really praying for XS support in parrot,
:  > cause he's got 11,000 lines of it he doesn't want to rewrite )
: 
: Wait, I thought you said we don't want to repeat history. ;)

With apologies to Henry Spencer, those who do not understand Perl 5
macros are doomed to re-invent them, poorly.

Larry




Re: Perl 6 Summary

2002-07-02 Thread Larry Wall

Are you sure Ruby isn't just using dynamic variables?  My information may
be old, but that's all it seemed like to me.  A certain amount of confusion
naturally arises in the Ruby world because of the absence of explicit
declaration, so the name binding rules get to be rather complicated.

In fact, that's the basic underlying problem with Ruby, as far as I can
tell.  In pursuing the principle of least surprise, they've merely swept
the surprises elsewhere.   Waterbed theory of surprise, if you will...

Larry




Re: Parrot Glossary - COW

2002-07-05 Thread Larry Wall

On Fri, 5 Jul 2002, Nicholas Clark wrote:
: On Fri, Jul 05, 2002 at 02:54:18PM +0200, Aldo Calpini wrote:
: > this approach saves memory, because you can create as many copies of a
: > string as you want, without allocating it many times. unless you modify
: > them, at least. it's also usually a great speed boost, because copying a
: > string is performed in O(1) time, instead of O(n) - where n is the size of
: > the string.
: 
: I suspect that the speed boost is going to be more noticeable for most
: applications than the memory saving. It doesn't matter if I'm wrong on this :-)

It's a real win for regexes that want to map $1, $2, etc. onto an existing
string.  Not to mention $`, $& and $'.
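
For readers new to the term, a toy copy-on-write string in Python.  This is
a sketch under assumed names (CowString, _Buffer), not Parrot's string
implementation.  Copies share one buffer and the O(n) copy is deferred until
somebody writes.  A capture variable such as $1 would additionally carry an
offset and length into the shared buffer, which is not shown here.

```python
class _Buffer:
    def __init__(self, data):
        self.data = bytearray(data)   # always a private copy of `data`
        self.owners = 1               # how many strings share this buffer

class CowString:
    def __init__(self, text):
        self._buf = _Buffer(text.encode("utf-8"))

    def copy(self):
        """O(1): share the buffer and bump the owner count."""
        clone = object.__new__(CowString)
        clone._buf = self._buf
        self._buf.owners += 1
        return clone

    def set_byte(self, index, value):
        """Write one byte, unsharing first if anyone else owns the buffer."""
        if self._buf.owners > 1:
            self._buf.owners -= 1
            self._buf = _Buffer(self._buf.data)   # the O(n) copy happens only here
        self._buf.data[index] = value

    def shares_buffer_with(self, other):
        return self._buf is other._buf

    def __str__(self):
        return self._buf.data.decode("utf-8")
```

Usage: after b = a.copy() the two strings share storage; b.set_byte(0,
ord("j")) gives b its own buffer and leaves a untouched.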

Larry




Re: Ruby iterators and blocks (was: Perl 6 Summary)

2002-07-04 Thread Larry Wall

On 4 Jul 2002, Erik Bågfors wrote:
: On Thu, 2002-07-04 at 11:19, Andy Wardley wrote:
: > I personally believe this approach is flawed, especially considering the fact 
: > that there is no way (that I know of) to force block parameters to be truly
: > lexically scoped or temporary (i.e. 'my' or 'local' in Perlspeak).  Much too 
: > easy to mangle existing variables like this.
: 
: Most people agree. In the future there will be a way of doing that. 
: Matz himself has said so.

Indeed, Ruby is the main reason I decided to keep "my" explicit in Perl 6.  :-)

Larry




Re: Perl6 grammar (take IV)

2002-07-06 Thread Larry Wall

On Sat, 6 Jul 2002, Trey Harris wrote:
: In a message dated Sat, 6 Jul 2002, Sean O'Rourke writes:
: > - Implicit currying variables ($^a etc) are in.  I thought I had read
: >   somewhere they were gone in favor of closure args, but people seem
: >   to be using them, and they're not hard to put in.
: 
: My understanding is that they still exist as placeholder variables, but
: no longer implicitly curry.  I.e.,
: 
: {$^a + $^b}
: 
: is a synonym for
: 
: -> $a, $b {$a + $b}

That is correct.  Currying will be done via some explicit method.
Sentiment at yapc favored something resembling:

&incr := &add.assuming(x => 1);
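
For comparison, a rough Python analogue of explicit currying, using
functools.partial in place of the proposed .assuming method.  The function
and parameter names simply mirror the line above.

```python
from functools import partial

def add(x, y):
    return x + y

# Rough analogue of: &incr := &add.assuming(x => 1);
incr = partial(add, x=1)

# x was bound by keyword, so the remaining argument must be passed by keyword;
# incr(41) would try to bind 41 to x a second time and raise TypeError.
result = incr(y=41)

# The placeholder block {$^a + $^b} corresponds to an ordinary two-argument
# anonymous function, with no currying implied.
plus = lambda a, b: a + b
```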

Larry




Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Larry Wall
I would like to express my sincere gratitude to all of you for working
through these issues.  I bent my brain on the Perl 5 regex engine,
and that was just a "simple" recurse-on-success engine--and I'm not
the only person it drove mad.  I deeply appreciate that Perl 6's
regex engine may drive you even madder.  But such sacrifices are at
the heart of why people love Perl.  Thanks!

Larry


Re: String formatting and transformation

2003-11-27 Thread Larry Wall
On Thu, Nov 27, 2003 at 10:16:50PM +, Pete Lomax wrote:
: Of the above (IMO), up & downcase are core functions, the rest not.

It's not so simple.  Upcasing the first letter should really use the
Unicode title case mapping, not upper case.  At least that's how
Perl 5 does it.
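
The difference shows up with digraph characters that have three case forms.
A small Python illustration (Python rather than Perl, and the helper name
ucfirst is only borrowed as a label):

```python
dz_small = "\u01c6"          # LATIN SMALL LETTER DZ WITH CARON

upper = dz_small.upper()     # U+01C4, both letters of the digraph capitalized
title = dz_small.title()     # U+01C5, capital D with small z

def ucfirst(s):
    """Title-case the first character and leave the rest alone."""
    return s[:1].title() + s[1:]
```

Upper-casing the first letter would produce U+01C4 at the start of a word,
where the title case form U+01C5 is the one wanted.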

Larry


Re: Some namespace notes

2004-01-16 Thread Larry Wall
I've used non-hierarchical file systems in the distant past, and
it wasn't pleasant.  I think aliases (symlinks) work much better in
a hierarchy.  So do inner packages, modules, and classes, which we
plan to have in Perl 6.  And package aliasing will be the basis for
allowing different versions of the same module to coexist.  And if
Parrot makes people put /perl/parrot/java on the front of Java names,
the first thing people will do is to alias them all to /java.

Larry


Re: Some namespace notes

2004-02-02 Thread Larry Wall
On Fri, Jan 30, 2004 at 06:16:06PM +, Tim Bunce wrote:
: In Java you would write "java.lang.String", naturally, and in Perl
: you'd write "parrot::java::java.lang.String".

That's okay if it's a string being interpreted by the appropriate
code, but as a Perl 6 name it won't wash.  That's gonna try to call
the .lang method on the parrot::java::java class, and the String
method on the result of that.

(Unless, of course, you define a parrot::java::java macro to mangle
subsequent Perl 6 syntax.  But that seems awfully hackish.  And the
parrot::java namespace might not let you define the macro there in
the first place...)

Larry


Re: Patch vaporized?

2004-02-05 Thread Larry Wall
On Thu, Feb 05, 2004 at 11:25:22AM -0500, Gordon Henriksen wrote:
: I've submitted a patch to bugs-parrot, and it didn't seem to get posted
: to RT or otherwise handled. Anyone know where it might've gone?

Did it have an executable attachment?  :-)

Larry


Re: Need some help with object tests

2004-02-25 Thread Larry Wall
On Wed, Feb 25, 2004 at 11:59:21AM -0500, Simon Glover wrote:
: 
:  One question: there doesn't appear to be any way to generate a list of
:  the existing attributes of a class or even to determine how many
:  attributes a particular class has. Should there be ops for one or both
:  of these things?

From the language point of view that would be a method call on the
metaclass instance.  Of course, maybe *that* method needs a special
opcode to get at the real metaclass data...

Larry


Re: Ladies and gentlemen, I give you... objects!

2004-02-27 Thread Larry Wall
On Fri, Feb 27, 2004 at 09:08:31AM -0500, Dan Sugalski wrote:
: Nope. If a language wants to provide get/set methods for class 
: attributes it needs to create those methods at compilation time.

For Perl 6 it's a single method that might be lvaluable depending on
the declaration of the attribute.

Larry


Re: Ladies and gentlemen, I give you... objects!

2004-02-28 Thread Larry Wall
On Fri, Feb 27, 2004 at 10:08:33PM -0500, Joseph Ryan wrote:
: Larry Wall wrote:
: 
: >On Fri, Feb 27, 2004 at 09:08:31AM -0500, Dan Sugalski wrote:
: >: Nope. If a language wants to provide get/set methods for class 
: >: attributes it needs to create those methods at compilation time.
: >
: >For Perl 6 it's a single method that might be lvaluable depending on
: >the declaration of the attribute.
: > 
: >
: 
: Right, but the compiler should be able to figure that out and emit
: the proper code.

Yes, but it'd be nice not to force the compiler to generate a heavy
proxy object for every lvaluable accessor to something that already
knows how to fetch and store.  We just need to make sure that the
attribute can serve as its own proxy.  That means the interface to an
attribute should be identical to the interface to a variable, once
you've taken a reference to it.  We're trying to make a big deal of
the notion that you can use an "is rw" method call anywhere you can
use a variable.  It'd just be nice if that didn't suck overly much.

Larry


Re: Dates and Times

2004-03-03 Thread Larry Wall
On Wed, Mar 03, 2004 at 11:37:09AM -0500, Dan Sugalski wrote:
: FWIW, if we start getting into the "What should our base time for the 
: epoch be" arguments, I'll warn you that the answer if I have to make 
: one is probably Nov 17, 1858 at midnight, give or take a bad memory, 
: and our time value'll be a 64-bit integer. So think carefully before 
: you go there. :)

Well, you can do whatever you like with Parrot, but I want Perl 6's
standard interface to be floating point seconds since 2000.  Floating
point will almost always have enough precision for the task at hand,
and by the time it doesn't, it will.  :-)

But the overriding consideration is that normal users should
never have to remember the units of the fractional seconds.
Is it nanoseconds this week?

That's the sort of arbitrary complexity that doesn't belong in Perl 6.
Solving the real problems is hard enough.

Larry


Re: Dates and Times

2004-03-03 Thread Larry Wall
On Wed, Mar 03, 2004 at 10:21:37AM -1000, Joshua Hoblitt wrote:
: On Wed, 3 Mar 2004, Larry Wall wrote:
: 
: > Well, you can do whatever you like with Parrot, but I want Perl 6's
: > standard interface to be floating point seconds since 2000.  Floating
: > point will almost always have enough precision for the task at hand,
: > and by the time it doesn't, it will.  :-)
: 
: Aren't there enough epochs already? :)

Aren't there enough programming languages already?  :-)

: Anyways, I recall some discussion on p6l from years ago about using
: TAI (and I think specifically libtai) as the internal time format
: for p6.  Is this still the case?

As I said, I don't care what the internal time format is.  I'm just
sticking up for the average user of the next millennium.  It's my
gut-level feeling that 99% of users would prefer that continuous
time be represented by a pseudo-continuous type like floating point,
and that we should all settle on an epoch that will still be easy to
remember in the year 2149.  So I think the default public interface
needs to be floating point seconds since 2000.
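
A small Python sketch of what such an interface might hand back, assuming
the value is derived from a Unix timestamp.  The function names are
hypothetical.  Unix time does not count leap seconds, so this only
approximates a scale that tracks TAI.

```python
import math
from datetime import datetime, timezone

# Unix time of 2000-01-01T00:00:00 UTC (946684800 seconds).
EPOCH_2000 = datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp()

def since_2000(unix_seconds):
    """Convert a Unix timestamp to float seconds since the start of 2000 (UTC)."""
    return unix_seconds - EPOCH_2000

def resolution_at(seconds):
    """Spacing between adjacent doubles near `seconds` (needs Python 3.9+)."""
    return math.ulp(seconds)
```

Around 4.7e9 seconds, roughly the year 2149 on this scale, adjacent doubles
are still less than a microsecond apart, and nobody has to remember a unit
for the fractional part.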

: It would be *really* nice to be able to rely on getting TAI from Parrot.

It would also be really nice to be able to rely on not having to
write FAQs telling people how to deal with interfaces that are more
complicated than they need to be.  :-)

Don't get me wrong--I think the concept of TAI time is great.
It's just always going to be a fixed number of seconds different than
Perl 6 time, is all, whatever the TAI time is for Jan 1, 2000, UTC.

Yes, I'm a megalomaniac to think that I can set a better standard than
the French...  :-)

Larry


Re: Dates and Times

2004-03-03 Thread Larry Wall
On Wed, Mar 03, 2004 at 04:18:14PM -0500, Dan Sugalski wrote:
: >Don't get me wrong--I think the concept of TAI time is great.
: >It's just always going to be a fixed number of seconds different than
: >Perl 6 time, is all, whatever the TAI time is for Jan 1, 2000, UTC.
: 
: That, as they say, turns out not to be the case. UTC has leap 
: seconds, TAI doesn't. The two are slowly diverging--off by 32 seconds 
: right now, and probably off by 33 this year or next, with extra 
: seconds added irregularly. (It's why the decoded time array's seconds 
: goes from 0-60 rather than 0-59)

No, I *am* assuming that Perl 6 time tracks TAI time accurately with
a constant offset, and drifts with respect to UTC.  That does mean
that the translations from internal time to real datetimes have to
be groomed periodically, and people who don't upgrade their Perl for
years on end might find its clock off by a second or two.  But just
as the cable company forces periodic upgrades into your cable box
without your keeping track, any networked Perl ought to be able
to install a new time map automatically over the net without much
user intervention--but only if we design with that in mind in the
first place.  But for sanity's sake, to keep the time continuum flat,
we have to abandon the notion that Dec 31 is always 86400 seconds long,
or any length predictable several years in advance.  We can't afford
to have off-by-one errors when you subtract two standard Perl 6 times.
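
A sketch of the "time map" idea in Python, assuming a continuous count of
seconds from 2000-01-01 00:00 UTC and a table of days that end with an
inserted leap second.  The table format and function names are hypothetical,
and the table carries a single entry (the end of 2005-12-31) for
illustration.

```python
SECONDS_PER_DAY = 86400

# Day indexes (2000-01-01 is day 0) whose last UTC minute has a 61st second.
LEAP_DAYS = [2191]   # 2005-12-31

def utc_midnight(day_index):
    """Continuous seconds since 2000-01-01 00:00 UTC at which the day begins."""
    leaps_before = sum(1 for d in LEAP_DAYS if d < day_index)
    return day_index * SECONDS_PER_DAY + leaps_before

def utc_day_length(day_index):
    """Length of a UTC day in continuous seconds: 86400, or 86401 on a leap day."""
    return utc_midnight(day_index + 1) - utc_midnight(day_index)

def elapsed(t1, t2):
    """On the continuous scale elapsed time is plain subtraction."""
    return t2 - t1
```

Only the map from continuous seconds to calendar labels needs periodic
updating; subtracting two timestamps never does.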

Larry


Re: OO benchmarks

2004-03-04 Thread Larry Wall
On Thu, Mar 04, 2004 at 09:58:02AM -0500, Dan Sugalski wrote:
: Damn. Okay, I'm going to spend today digging into the object stuff to 
: try and track down the leaks. Something's not right in there, as the 
: DOD and GC ought to be reclaiming the dead memory.

Can I hit you with a cream pie at OSCON if Perl 5 runs faster than Parrot?

:-)

Larry


Re: Dates and Times

2004-03-04 Thread Larry Wall
On Thu, Mar 04, 2004 at 09:12:47AM -0500, Dan Sugalski wrote:
: At 7:30 PM -0800 3/3/04, TOGoS wrote:
: > > Interesting -- so the planet's finally gotten
: >> its act together and settled on a rotational
: >> speed, huh? Cool. :)
: >
: >Nobody said anything about a planet.
: 
: Actually, they did. UTC (which was the original reference) is defined 
: such that noon is within .9 seconds of the sun being as directly 
: overhead as it can be, and is thus directly tied to the behaviour of 
: the planet.

The basic problem is that there are times you need to know exactly
how many seconds have passed between two timestamps, and times you
need to know exactly how many days have passed between two timestamps,
and the two are in conflict.  However, I think the more basic problem
of the two is the first one.  If you ignore that problem, you get
all sorts of breakage down in random routines that you don't want
breakage in, and this will only get worse as computers get faster,
and try to synchronize more low level things.

In contrast, the other problem tends to manifest in user interfaces,
where it can typically be solved by some snap-to-grid semantics at the
minute, hour, and day boundaries, at least until the second timer and
the Earth drift apart by 30 seconds or so.  Which will take a while.

And anyone who needs greater UTC accuracy than that should probably
be tracking TAI-UTC offsets anyway, or have some other way of resyncing
their clock at least semiyearly.

So anyway, I don't care whether Parrot builds in support for
complicated time systems like UTC.  (Well, I do care some--it should
probably be a library in any event.)  But I do care that time
be accurate.  (In fact, I'd like $^T to change to a floater too, on
systems that can support it.  It's way past time for hi-res timing
to be the default, I think.)

Larry


Re: [DOCS] Documentation tools

2004-03-04 Thread Larry Wall
On Thu, Mar 04, 2004 at 03:40:27PM +0100, Michael Scott wrote:
: I'd like to remove non-modified, non-parrot Perl modules from lib and 
: install them via CPAN.pm. I have a version here which works, but I 
: remember from experience it can be tricky to set up CPAN.pm to work 
: behind firewalls, so I'm wondering what collective wisdom has to say. 
: Should we run CPAN.pm from Configure.pl or rely on "prominent notes"?

The only sane course forward is to separate the notion of developer
distribution from user distribution.  The developer codebase should
have no "fat", and can have external dependencies out the wazoo.
The user distribution (and there can be more than one--see Linux)
provides all the bells and whistles that the distributor sees fit,
and should generally shield the user from having to download anything
commonly in use.

Ideally, the developer codebase should be completely *unusable* by
mere mortals, to prevent ISPs from installing that and claiming they
"support Perl".  They need to be forced to install a user-oriented
distribution.  Just like you can't merely install the Linux kernel
and say you're done.

I think this is one of those situations where having our cake and
eating it too is not just an option, but a requirement.

Larry


Re: [PROPOSAL] C opcode and interface

2004-03-10 Thread Larry Wall
On Wed, Mar 10, 2004 at 10:58:14AM -0500, Dan Sugalski wrote:
:   *) Times (create, modify, access)

Just a reminder that ctime on Unix is not "create" time, but time of
last inode change.  I wish there were a create time on Unix, but there
ain't.

Larry

