Re: S28ish [was: [Pugs] A couple of string interpolation edge cases]
Larry Wall wrote:

On Thu, Mar 31, 2005 at 03:03:09PM +0200, Thomas Sandlaß wrote:
: BTW, will bidirectionality be supported? Does it make sense to reflect
: it in the StrPos type such that $pos_start < $pos_end means a non-empty
: left to right string, $pos_start > $pos_end is a non-empty right to left
: string and $pos_start == $pos_end delimit an empty (sub)string? As a
: natural consequence the sign indicates direction with negative length
: being right to left. And that leads to two times two types of iterators:
: left to right, right to left, start to end and end to start.

Offhand I'd rather have end < start be undefined, I think, but I suppose we could give it a meaning if it turns out not to be an easily generated degenerate case like 0..-1. On the other hand, I think right-to-left might deserve more Huffman visibility than an itty-bitty sign that might be hidden down in a variable.

But then, we've played games with signs in substr and splice before. It's not clear that people would want substr($x, -3) to return the characters in reversed order, though.

I don't see how rtl vs ltr changes how we process strings. It's purely a display problem. I seriously doubt that someone working with a rtl language would ever wish to count the characters ltr.

And note that we are calling the positions "start" and "end", not "left" and "right".

If I'm missing something basic here, let me know.

-- Rod Adams
Re: S28ish [was: [Pugs] A couple of string interpolation edge cases]
On Thu, Mar 31, 2005 at 03:03:09PM +0200, Thomas Sandlaß wrote: : Larry Wall wrote: : >On Sat, Mar 26, 2005 at 02:37:24PM -0600, Rod Adams wrote: : >: How can you have a level independent position? : > : >By not confusing positions with numbers. They're just pointers into : >a particular string. : : I'm not the Unicode guru but my understanding is that all composition : sequences are finite and stateless with respect to everything before : and after them in the string. Which brings me to the question if these : positions are defined like positions in Emacs as lying *between* the : chars? Then the set of positions of a higher level is a subset of the : positions of lower levels. Yes, that's how I've been thinking of them. Thanks for making that explicit. : With defining position as between chars many operations on strings are : downwards compatible between levels, e.g. splitting. If one determines : e.g. an insert position on a higher level there's no problem in letting : the actual insertion being handled by a lower level. With fractional : positions on higher levels some degree of upward or tunneling : compatibility can be achieved. That's my feeling. : BTW, will bidirectionality be supported? Does it make sense to reflect : it in the StrPos type such that $pos_start < $pos_end means a non-empty : left to right string, $pos_start > $pos_end is a non-empty right to left : string and $pos_start == $pos_end delimit an empty (sub)string? As a : natural consequence the sign indicates direction with negative length : being right to left. And that leads to two times two types of iterators: : left to right, right to left, start to end and end to start. Offhand I'd rather have end < start be undefined, I think, but I suppose we could give it a meaning if it turns out not to be an easily generated degenerate case like 0..-1. On the other hand, I think right-to-left might deserve more Huffman visibility than an itty-bitty sign that might be hidden down in a variable.
But then, we've played games with signs in substr and splice before. It's not clear that people would want substr($x, -3) to return the characters in reversed order, though. : All the above leads me to rant about an array like type. Please forgive : me if the following is not proper Perl6. My point is to illustrate how : I imagine the future communication between implementor and user of such : a class. Actually some POD support for extracting the type information : into the documentation would be great, too! : : And yes, the :analyse should be made lazy. The distinction between the : first and second index method could be even more specific by using : type 'Index ^ List of Str where { $_.elems == 1 }' to convey the : information that indexing with a list of one element doesn't result : in a List of Str but a plain Str. OTOH this will incur a performance : penalty and violate the intuitive notion "list in, list out". MEGO. : class StrPosArray does Array where { ::Index does StrPos } : { :has Str$:data; :has StrPos @:pos; : :multi method postcircumfix:<[ ]> :(: Index $i ) returns Str {...} :multi method postcircumfix:<[ ]> :(: List of Index $i ) returns List of Str {...} :multi method postcircumfix:<[ ]> :(: Range of Index $i ) returns List of Str {...} :multi method postcircumfix:<[ ]> :(:Int $i ) returns Str {...} : :# more stuff here for push, pop, shift etc. : :method infix:<=> (: Str $rhs ) returns ::?CLASS :{ : $:data = $rhs; : :analyse; :} : :method :analyse () :{ : # scan $:data for all between char positions : # and store them into @:pos :} : } : : Question: : does the compiler go over this source in multiple passes : such that the declaration of :analyse is known before its : usage in infix:<=>? No, you just throw in a forward declaration with {...} in that case. Larry
Re: S28ish [was: [Pugs] A couple of string interpolation edge cases]
Larry Wall wrote: On Sat, Mar 26, 2005 at 02:37:24PM -0600, Rod Adams wrote: : How can you have a level independent position? By not confusing positions with numbers. They're just pointers into a particular string. I'm not the Unicode guru but my understanding is that all composition sequences are finite and stateless with respect to everything before and after them in the string. Which brings me to the question if these positions are defined like positions in Emacs as lying *between* the chars? Then the set of positions of a higher level is a subset of the positions of lower levels. With defining position as between chars many operations on strings are downwards compatible between levels, e.g. splitting. If one determines e.g. an insert position on a higher level there's no problem in letting the actual insertion being handled by a lower level. With fractional positions on higher levels some degree of upward or tunneling compatibility can be achieved. BTW, will bidirectionality be supported? Does it make sense to reflect it in the StrPos type such that $pos_start < $pos_end means a non-empty left to right string, $pos_start > $pos_end is a non-empty right to left string and $pos_start == $pos_end delimit an empty (sub)string? As a natural consequence the sign indicates direction with negative length being right to left. And that leads to two times two types of iterators: left to right, right to left, start to end and end to start. All the above leads me to rant about an array like type. Please forgive me if the following is not proper Perl6. My point is to illustrate how I imagine the future communication between implementor and user of such a class. Actually some POD support for extracting the type information into the documentation would be great, too! And yes, the :analyse should be made lazy.
The distinction between the first and second index method could be even more specific by using type 'Index ^ List of Str where { $_.elems == 1 }' to convey the information that indexing with a list of one element doesn't result in a List of Str but a plain Str. OTOH this will incur a performance penalty and violate the intuitive notion "list in, list out".

    class StrPosArray does Array where { ::Index does StrPos }
    {
        has Str$:data;
        has StrPos @:pos;

        multi method postcircumfix:<[ ]> (: Index $i ) returns Str {...}
        multi method postcircumfix:<[ ]> (: List of Index $i ) returns List of Str {...}
        multi method postcircumfix:<[ ]> (: Range of Index $i ) returns List of Str {...}
        multi method postcircumfix:<[ ]> (:Int $i ) returns Str {...}

        # more stuff here for push, pop, shift etc.

        method infix:<=> (: Str $rhs ) returns ::?CLASS
        {
            $:data = $rhs;
            :analyse;
        }

        method :analyse ()
        {
            # scan $:data for all between char positions
            # and store them into @:pos
        }
    }

Question: does the compiler go over this source in multiple passes such that the declaration of :analyse is known before its usage in infix:<=>?

-- TSa (Thomas Sandlaß)
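The "positions lie between the chars" idea in this message can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: the function names are made up for illustration, byte offsets into the UTF-8 encoding stand in for the lowest level, and a "grapheme" is approximated as a base character followed by combining marks (real grapheme segmentation has more rules).

```python
import unicodedata


def codepoint_boundaries(s):
    """UTF-8 byte offsets of every position lying between two code points."""
    offsets = [0]
    for ch in s:
        offsets.append(offsets[-1] + len(ch.encode("utf-8")))
    return offsets


def grapheme_boundaries(s):
    """UTF-8 byte offsets of positions that do not split a base character
    from the combining marks that follow it (a rough grapheme approximation)."""
    offsets = codepoint_boundaries(s)
    result = []
    for i, off in enumerate(offsets):
        # A position directly in front of a combining mark is inside a grapheme.
        if 0 < i < len(s) and unicodedata.combining(s[i]) != 0:
            continue
        result.append(off)
    return result


text = "e\u0301a"  # 'e' + COMBINING ACUTE ACCENT + 'a'
byte_level = set(range(len(text.encode("utf-8")) + 1))
code_level = set(codepoint_boundaries(text))
graph_level = set(grapheme_boundaries(text))
print(sorted(graph_level), sorted(code_level), sorted(byte_level))
```

For this input each higher level's set of positions is a subset of the level below it, which is the property that makes an insert position chosen at a high level usable by a lower-level operation.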
Re: S28ish [was: [Pugs] A couple of string interpolation edge cases]
Larry Wall wrote: On Sat, Mar 26, 2005 at 02:37:24PM -0600, Rod Adams wrote: : Please convince me your view works in practice. I'm not seeing it work : well when I attempt to define the relevant parts of S29. But I might : just be dense on this. Well, let's work through an example. multi method substr(Str $s: Ptr $start, PtrDiff ?$len, Str ?$repl) Depending on the typology of Ptr and PtrDiff, we can either coerce various dimensionalities into an appropriate Ptr and PtrDiff type within those classes, or we could rely on MMD to dispatch to a suite of substr implementations with more explicit classes. Interestingly, since Ptrs aren't integers, we might also allow multi method substr(Str $s: Ptr $start, Ptr ?$end, Str ?$repl) which might be a more natural way to deal with variable length encodings, and we just leave the "lengthy" version in there for old times' sake. ...snip... for the non-destructive slicing of a string, and leave substr() with Perl 5 semantics, in which case it's just a SMOP to coerce the user's substr($a, 5, 10); to something that effectively means substr($a, Ptr.new($a, 5, $?UNI_LEVEL), PtrDiff.new(10, $?UNI_LEVEL)); Actually, in this case, I expect we're actually calling into multi method substr(Str $s: PtrDiff $start, PtrDiff ?$len, Str ?$repl) where $start will be counted from the beginning of the string, so the call is effectively substr($a, PtrDiff.new(5, $?UNI_LEVEL), PtrDiff.new(10, $?UNI_LEVEL)); Okay, that looks scary, but if as in my previous message we define "chars" as the highest Unicode level allowed by the context and the string, then we can just write that in some notation resembling: substr($a, 5`Chars, 10`Chars); or whatever notation we end up with for labeling units on numbers. Even if we don't define "chars" that way, they just end up labeled with the current level (here assuming "Codes"): substr($a, 5`Codes, 10`Codes); or whatever. But this is all implicit, which is why you can just write substr($a, 5, 10); and have it DWYM.
I see some danger here. In particular, there is a huge difference between a Ptr (position), and a PtrDiff (length). I'm going to rename these classes StrPos and StrLen for the time being. A StrPos can have multiple char units associated with it, and has the ability to morph between them. However, it is also strictly bound to a given string. A StrLen can only have one char unit associated with it, since there is no binding string and anchors with which to reliably map how many cpts there are to so many lchars. I see the following operations being possible at a logical level: StrPos = StrPos + StrLen StrLen = StrPos - StrPos # must specify units (else implied), and must be same base Str StrLen = StrLen + StrLen # if same units. StrLen = StrLen + Int So I see the following cases of Substr happening: multi sub substr(Str $s, StrPos $start : StrPos ?$end, ?$replace) Where $start and $end must be anchored to $s multi sub substr(Str $s, StrPos $start, StrLen $length : ?$replace) Same restriction on $start, multi sub substr(Str $s, StrLen $offset : StrLen ?$length, ?$replace) Where $offset gets used as C<$s.start + $offset> and kicked over to case #2. Hmm. Okay, that's not dangerous, just a lot to look at. What gets dangerous is letting users think of a StrPos as a number, since it's not. Only StrLen's get to pretend to be numbers. StrPos should have some nifty methods to return StrLen's relative to its base Str's .start, and those StrLens can look like a number, but the StrPos never gets to ever look like a number. Make it where StrLen "does Int", and there's a C«coerce:(Int,StrLen)» with default units of your "Chars as highest supported by string applied to", and I think we're getting somewhere. We need to define what happens to a StrPos when its base Str goes away. Having it assume some nifty flavor of undef would do the trick. This implies that a Str knows all the StrPos's hanging off it, so the destructor can undef them. But that shouldn't pose a problem for p6c.
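The StrPos/StrLen algebra listed above can be tried out with a rough Python sketch. Everything here is an assumption made for illustration: the class layout, the single "codes" unit, the code-point offset stored inside StrPos, and the use of object identity to mean "anchored to the same string" are not from the thread, and the 2005 Perl 6 design never settled on a concrete API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrLen:
    """A length in one unit; the only type here allowed to act like a number."""
    count: int
    unit: str = "codes"  # hypothetical unit label

    def __add__(self, other):
        if isinstance(other, int):       # StrLen = StrLen + Int
            return StrLen(self.count + other, self.unit)
        if isinstance(other, StrLen):    # StrLen = StrLen + StrLen, same units only
            if other.unit != self.unit:
                raise TypeError("cannot add StrLen values with different units")
            return StrLen(self.count + other.count, self.unit)
        return NotImplemented

    def __int__(self):
        return self.count


@dataclass(frozen=True)
class StrPos:
    """A position anchored to one string; deliberately defines no __int__."""
    base: str
    offset: int  # code-point offset from the start of base (sketch-only detail)

    def __add__(self, other):
        if isinstance(other, StrLen):    # StrPos = StrPos + StrLen
            if other.unit != "codes":
                raise TypeError("this sketch only understands code-point lengths")
            return StrPos(self.base, self.offset + other.count)
        return NotImplemented            # StrPos + plain int is rejected

    def __sub__(self, other):
        if isinstance(other, StrPos):    # StrLen = StrPos - StrPos, same base only
            if other.base is not self.base:
                raise ValueError("positions are anchored to different strings")
            return StrLen(self.offset - other.offset, "codes")
        return NotImplemented


def substr(s, start, end):
    """Slice s between two StrPos values that must both be anchored to s."""
    if start.base is not s or end.base is not s:
        raise ValueError("position is not anchored to this string")
    return s[start.offset:end.offset]
```

With `s = "hello world"`, `a = StrPos(s, 2)` and `b = a + StrLen(3)`, the call `substr(s, a, b)` yields the slice between the two anchors, while `a + 3` and `int(a)` both raise `TypeError`, which is the "never gets to look like a number" behaviour argued for above.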
Now, I admit that I've handwaved the tricksy bit, which is, "How do you know, Larry, that substr() wants 5`Codes rather than 5`Meters? It's all very well if you have a single predeclared subroutine and can look at the signature at compile time, but you wrote those as multi methods up above, so we don't know the signature at compile time." Well, that's correct, we don't know it at compile time. But what *do* we know? We know we have a number, and that it was generated in a context where, if you use it like a string position, it should turn into a number of code points, and if you use it like a weight, it should turn into a number of kilograms (or pounds, if you're NASA). I don't see the need for all this. Make a C«coerce:(Int,StrLen)» as mentioned above, and the MMD should be able to figure out that it can take the Int peg and hammer it into the StrLen hole. Then leave it up to the coerce sub to complain if the I
Re: S28ish [was: [Pugs] A couple of string interpolation edge cases]
On Sat, Mar 26, 2005 at 02:37:24PM -0600, Rod Adams wrote: : Larry Wall wrote: : : >%+ and %- are gone. $0, $1, $2, etc. are all objects that know : >where they .start and .end. (Mind you, those methods return magical : >positions that are Unicode level independent.) : > : How can you have a level independent position? By not confusing positions with numbers. They're just pointers into a particular string. : The matching itself happens at a specified level. (Note that which level : the match happens at can change what is matched.) So it makes sense that : all the positions that come out of it are in terms of that level. When we're dealing with mostly variable length encodings, it makes more sense that the positions come out as string pointers that only convert to numbers grudgingly under duress. If you're just going to feed a position back into a substr() or as the start position of the next index(), there's no reason to translate it to a number and back to a pointer. It's a lot more efficient if you don't. : Now, that position can be translated to a lower level, but not to an : upper level, since you can happily land in the middle of a char. I talked about this problem in one of the As. I think the fail soft approach is to round to the next "ceiling" boundary and issue a warning. : This is part of what I'm having trouble with your concept of a Str being : at several levels at once: There's no reliable way to have a notion of : "position", except to have it as attached to the highest possible level, : and the second someone does something at lower level, you void the : position, and possibly the ability to remain at that high level. A position that is a pointer can be true for all levels simultaneously. It has the additional benefit of a type that is subtype constrained to operate with other values from the same string, so if you subtract two pointers from different strings, you can actually detect the error.
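The fail-soft rule mentioned above (round to the next "ceiling" boundary and issue a warning) might look roughly like this in Python. This is a sketch under assumptions: the function name is hypothetical, positions are plain code-point indexes rather than opaque pointers, and a higher-level character is approximated as a base character plus trailing combining marks.

```python
import unicodedata
import warnings


def ceil_to_grapheme(s, cp_index):
    """Round a code-point index up to the next position that does not fall
    between a base character and its combining marks; warn if it moved."""
    i = cp_index
    while 0 < i < len(s) and unicodedata.combining(s[i]) != 0:
        i += 1
    if i != cp_index:
        warnings.warn(
            "position %d rounded up to %d to reach a character boundary" % (cp_index, i),
            RuntimeWarning,
        )
    return i


text = "e\u0301a"                  # 'e' + COMBINING ACUTE ACCENT + 'a'
print(ceil_to_grapheme(text, 2))   # already on a boundary, no warning
```

Calling `ceil_to_grapheme(text, 1)` lands between the `e` and its accent, so the sketch moves the position forward past the accent and emits a `RuntimeWarning` instead of returning an undefined value.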
: I still see my notion of a Str having only one level and encoding at a : time as being preferable. Having the ability to recast a string to other : levels/encoding should be easy, and many builtins should do that : recasting for you. And I still see that you can have your view if you install a pragma that forces all incoming strings to a single level. But I think we can do that lazily, or not at all, in many cases. The basic underlying problem is that there is no simple mapping from math to Unicode. The language that lets people express their solution in terms of Unicode instead of in terms of math is going to have a leg up on the future, at least in the Unicode problem space. Strings were never arrays in Perl, and they're only getting further apart as the world makes greater demands on strings to represent human language. So I'd much rather introduce an abstraction like "string position" now that is not a number. It's a dimensional value, where the scaling of the dimensionality is bound to a particular string. You can have a pragma that says, "Untyped numbers are assumed to be meters, kilograms, and seconds", and a different lexical scope might have a pragma that says "Untyped numbers are assumed to be centimeters, grams, and seconds." These scopes can get along as long as they don't try to exchange untyped integers. Or if they do, they have some way of ascertaining what an untyped integer meant when it was generated. : I do _not_ see $/ & friends getting ported across a recasting. .pos can : be translated if new level <= old level, otherwise gets set to undef. The interesting thing about a pointer is that you can pass it through a higher level transparently as long as you don't actually try to use it. But if you do try to use it, I think undef is overkill. Just as a float stuffed into an int truncates, we should just pick a direction to find the next boundary and go from there, maybe with a loss of precision warning. 
The right way to suppress the warning would be to install an explicit function that rounds up or down. : Please convince me your view works in practice. I'm not seeing it work : well when I attempt to define the relevant parts of S29. But I might : just be dense on this. Well, let's work through an example. multi method substr(Str $s: Ptr $start, PtrDiff ?$len, Str ?$repl) Depending on the typology of Ptr and PtrDiff, we can either coerce various dimensionalities into an appropriate Ptr and PtrDiff type within those classes, or we could rely on MMD to dispatch to a suite of substr implementations with more explicit classes. Interestingly, since Ptrs aren't integers, we might also allow multi method substr(Str $s: Ptr $start, Ptr ?$end, Str ?$repl) which might be a more natural way to deal with variable length encodings, and we just leave the "lengthy" version in there for old times' sake. We could go as far as to allow a range as the second argument: $x = substr($a, $start..^$end); or its evil twin: $x = $a[$start..^$
Re: S28ish [was: [Pugs] A couple of string interpolation edge cases]
On Sat, 2005-03-26 at 12:48 -0800, Larry Wall wrote: > On Sat, Mar 26, 2005 at 09:59:10AM -0500, Aaron Sherman wrote: > Well, there is a process object, but it actually exists inside the > operating system. It's a little silly to force people to name their > own process all the time. I think we can assume that global variables > belong to the current process, sort of on the "you're soaking in it" > principle. That seems to be a self-limiting position. It leads (as it did in Perl 5) to a desire to reduce the number of times you add access to new OS features (as it requires global namespace suckage, though not as bad as in Perl 5), and you'll still split out an object, module or data structure to contain all of the information that's not in Perl proper because it's platform specific (e.g. current drive letter context under DOS). I agree that $*PID is a useful alias for $*PROC.pid (though the extra * still bothers me), but providing a unified API for interacting with myself as an OS-level construct seems to make sense. That's perhaps just my preference. I'm a hybrid OO/procedural guy, so I tend to reach into the OO toolbox whenever I think it will make my life easier. > : If you think of the OS-level shell around a Perl interpreter as an > : object[...] > We can certainly have various objects proxying for various contexts. > It's not clear how those should be broken out though. To me, an OS > isn't a process, and there's not necessarily going to be a one-to-one > correspondence. True enough, and you would certainly NOT: my $sock = $*PROC.socket; That makes no sense at all. However, things like "what IO layer am I using" or "am I a thread" are perfectly valid questions to pose of a process abstraction. 
> : If we consider $*PROC to be the invocant of the implicit "main", then: > : > : say "I am number {.pid}, who is number 1?"; > That's an interesting idea, the more so now that we're leaning away > from .foo ever assuming the current topic unless it also happens to > be the invocant. But it probably wouldn't do to have one common name for > the .pid outside of methods and force people to use a different name > inside methods. Here's where $*PID works much better, because it can > be the same everywhere. Well, it's always: $*PROC.pid The invocant goodness is just handy in a certain circumstance (what *is* main's invocant, out of curiosity? I guess it could be the interpreter context, but that should probably have some relationship to your process info anyway (either is or does ... probably does.) If I were writing Learning Perl 6, I would teach "$*PID" and/or "$*PROC.pid", but not ".pid".
Re: S28ish [was: [Pugs] A couple of string interpolation edge cases]
On Sat, Mar 26, 2005 at 09:59:10AM -0500, Aaron Sherman wrote: : On Sat, 2005-03-26 at 00:27 -0800, Larry Wall wrote: : : > $$ is now $*PID. ($$foo is now unambiguous.) : > : > $0 is gone in favor of $*PROGRAM_NAME or some such. : : You know, Java did one thing in this respect that I liked, and managed : to do it in a way that I couldn't stand. The idea of program as object : was nice, but they made the programmer manage it, which was really kind : of silly. Well, there is a process object, but it actually exists inside the operating system. It's a little silly to force people to name their own process all the time. I think we can assume that global variables belong to the current process, sort of on the "you're soaking in it" principle. : If you think of the OS-level shell around a Perl interpreter as an : object, and make perl manage that for you, then this falls out rather : nicely: : : $*PID := $*PROC.pid; : $*PPID := $*PROC.ppid; : $*PROGRAM_NAME := ~$*PROC; : : Perhaps even some often-used data could be shoved in there: : : $life = time() - $*PROC.start_time; : : In fact, it seems like a good place for any OS-level globals: : : $*IN := $*PROC.pio_in // $*PROC.stdin; We can certainly have various objects proxying for various contexts. It's not clear how those should be broken out though. To me, an OS isn't a process, and there's not necessarily going to be a one-to-one correspondence. : If we consider $*PROC to be the invocant of the implicit "main", then: : : say "I am number {.pid}, who is number 1?"; : : works just fine in global context. This also gives you a nice simple way : to drill down into your interpreter / runtime / VM / whatever state: : : say "I'm {.name} running under {.interp.name}"; That's an interesting idea, the more so now that we're leaning away from .foo ever assuming the current topic unless it also happens to be the invocant.
But it probably wouldn't do to have one common name for the .pid outside of methods and force people to use a different name inside methods. Here's where $*PID works much better, because it can be the same everywhere. Larry
Re: S28ish [was: [Pugs] A couple of string interpolation edge cases]
Larry Wall wrote: %+ and %- are gone. $0, $1, $2, etc. are all objects that know where they .start and .end. (Mind you, those methods return magical positions that are Unicode level independent.) How can you have a level independent position? The matching itself happens at a specified level. (Note that which level the match happens at can change what is matched.) So it makes sense that all the positions that come out of it are in terms of that level. Now, that position can be translated to a lower level, but not to an upper level, since you can happily land in the middle of a char. This is part of what I'm having trouble with your concept of a Str being at several levels at once: There's no reliable way to have a notion of "position", except to have it as attached to the highest possible level, and the second someone does something at lower level, you void the position, and possibly the ability to remain at that high level. I still see my notion of a Str having only one level and encoding at a time as being preferable. Having the ability to recast a string to other levels/encoding should be easy, and many builtins should do that recasting for you. I do _not_ see $/ & friends getting ported across a recasting. .pos can be translated if new level <= old level, otherwise gets set to undef. Please convince me your view works in practice. I'm not seeing it work well when I attempt to define the relevant parts of S29. But I might just be dense on this. -- Rod Adams
Re: S28ish [was: [Pugs] A couple of string interpolation edge cases]
On Sat, Mar 26, 2005 at 03:37:41AM -0700, Luke Palmer wrote: : > $! will be a legal variable name. $/ is going away, : : By which you mean that $/ is turning into a special $0. I'd say that $0 is a specialization of $/, but yes, basically, they both represent the current match result, albeit differently. $0 is explicitly what would have been returned by $1 if you'd put parens around the entire match, which is not quite the same as the complete match result. : > Anything that varied with the selected output filehandle like $| : > is now a method on that filehandle, and the variables don't exist. : > (The p5-to-p6 translator will probably end up depending on some : > $Perl5ish::selected_output_filehandle variable to emulate Perl 5's : > single-arg select().) : : I think $| et al. could just translate to methods on $*OUT, and select : would look like this: : : sub perl5_select($fh) { : $*OUT = $fh; : } : : Is there some subtlety that that doesn't cover? Like, it renders standard output nameless? In Perl 5, the selected output handle is a level of indirection above the standard names for the streams attached to fd 0, 1, and 2. Saying select(FH) doesn't change the meaning of STDOUT. : > %+ and %- are gone. $0, $1, $2, etc. are all objects that know : > where they .start and .end. (Mind you, those methods return magical : > positions that are Unicode level independent.) : : Uh, it might be a bad idea to make $# objects. It might not, but it : might. I think it would be fine if they turned into regular strings : upon assignment (and to pass their full objecthood around, you'd have to : backwhack them). But the problem with keeping them objects is that if : you put them somewhere else and change them, they turn back into regular : strings without .start and .end, which may be a hard-to-track-down bug : if you're thinking that they stay objects... haven't really thought : about this much (and my head is irritatingly foggy at the moment).
My head is always irritatingly foggy. :-) Anyway, I'm thinking of them more as COW objects, and they'd have to know if their original string was yanked out from under them in any case, so that's probably the correct moment to invalidate .start and .end, if we even bother. : > $; is gone because the multidim hash hack is gone. : : Funny, I never used the multidim hash hack, I just emulated it: : : $hash{"$foo$;$bar"} = $value; Well, guess how we'll emulate it in Perl 6. :-) : > We never did find a use for $}, thank goodness. : : Isn't that the "enable all of Damian's unpublished modules" variable? Shh. Impressionable people are listening. : > $^W is too blunt an instrument even in Perl 5, so it's probably gone. : : Well, almost. When writing a recent module, I found that one of the : modules I was using was spitting out an error from its own internal code : on one of my calls, and there was nothing wrong with the call. I : submitted a bug report to the author, and searched for a way to shut it : up so my users wouldn't complain at me. It ended up having to use $^W : at compile time (and it looks very hackish). We ought to have a : (perhaps not quite as hackish) ability to say "there's no reason for : that warning, but I can't modify your code, so just be quiet". Yes, we need to be able to suppress warnings in dynamic scopes as well as lexical, but that's probably not a scalar proposition anymore, unless the replacement for $^W is taken as a pointer to a hash of potential warnings. Presumably you could temporize the whole hash to suppress all warnings, or individual elements to suppress individual warnings. But maybe that's a good place for temporized methods instead, and then we could name sets of warnings. Or maybe there's yet some other approach that makes more sense. We want to encourage people to suppress only the exact warnings they want to suppress, and not just cudgel other modules into silence. : > I'm not quite sure what to do with $^N or $^R yet.
Most likely they : > end up as something $ish, if they stay. : : For $^N, how about $/[-1]? I guess that makes some sense. I was thinking of $/[-$n] as relative to the current match position, but hadn't thought it through to the point of deciding how to count those. $^N mandates counting based on right parentheses rather than left, which I guess makes sense. So let's say that $/[-2] means (one) rather than the incomplete ((three)two): /(one)((three) { $/[-2] } two) I note that this is another difference between $/ and $0, since $/ is representing the current state of the match, while $0 isn't bound till the match succeeds (unless you explicitly bind it earlier, which is yet another difference between $0 and $/, since you can't bind $/ to mean a portion of itself). Larry
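The goal stated earlier in this message, suppressing only the exact warnings you mean to suppress within a dynamic scope, has a close analogue in Python's standard `warnings` module. The sketch below is an analogy rather than a proposal for Perl 6, and the warning classes and functions in it are hypothetical names invented for the example.

```python
import warnings


class NoisyWarning(UserWarning):
    """Stands in for a spurious warning raised inside someone else's module."""


class ImportantWarning(UserWarning):
    """Stands in for a warning the caller still wants to see."""


def third_party_call():
    # Pretend this lives in a module we cannot modify.
    warnings.warn("harmless internal complaint", NoisyWarning)
    warnings.warn("something actually wrong", ImportantWarning)
    return 42


def quiet_call():
    # The filter change is undone when the with-block exits, so it covers
    # exactly the dynamic extent of the call and names one warning class only.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=NoisyWarning)
        return third_party_call()
```

Calling `quiet_call()` still lets `ImportantWarning` through, while code outside the `with` block sees both warnings as before.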
Re: S28ish [was: [Pugs] A couple of string interpolation edge cases]
On Sat, 2005-03-26 at 00:27 -0800, Larry Wall wrote: > $$ is now $*PID. ($$foo is now unambiguous.) > > $0 is gone in favor of $*PROGRAM_NAME or some such. You know, Java did one thing in this respect that I liked, and managed to do it in a way that I couldn't stand. The idea of program as object was nice, but they made the programmer manage it, which was really kind of silly. If you think of the OS-level shell around a Perl interpreter as an object, and make perl manage that for you, then this falls out rather nicely: $*PID := $*PROC.pid; $*PPID := $*PROC.ppid; $*PROGRAM_NAME := ~$*PROC; Perhaps even some often-used data could be shoved in there: $life = time() - $*PROC.start_time; In fact, it seems like a good place for any OS-level globals: $*IN := $*PROC.pio_in // $*PROC.stdin; If we consider $*PROC to be the invocant of the implicit "main", then: say "I am number {.pid}, who is number 1?"; works just fine in global context. This also gives you a nice simple way to drill down into your interpreter / runtime / VM / whatever state: say "I'm {.name} running under {.interp.name}";
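The "process as an object the runtime manages for you" idea can be mocked up in Python for comparison. This is a sketch only: the `Proc` class and the `PROC`, `PID`, `PPID` and `PROGRAM_NAME` names are hypothetical stand-ins for `$*PROC` and friends, and `start_time` records when the object was created, which merely approximates the real process start time.

```python
import os
import sys
import time


class Proc:
    """A thin object wrapper around facts the OS knows about this process."""

    def __init__(self):
        # Assumption: object creation time is used in place of true process start.
        self.start_time = time.time()

    @property
    def pid(self):
        return os.getpid()

    @property
    def ppid(self):
        return os.getppid()

    def __str__(self):
        # Stringifying the process object yields the program name.
        return sys.argv[0]


PROC = Proc()                 # plays the role of $*PROC
PID = PROC.pid                # $*PID as an alias for $*PROC.pid
PPID = PROC.ppid              # $*PPID
PROGRAM_NAME = str(PROC)      # $*PROGRAM_NAME := ~$*PROC

life = time.time() - PROC.start_time
print("I am number %d, child of %d" % (PID, PPID))
```

The short global aliases stay available everywhere, while anything platform-specific can hang off the one object instead of claiming more global names.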
Re: S28ish [was: [Pugs] A couple of string interpolation edge cases]
Larry Wall creates Sish28: > On Sat, Mar 26, 2005 at 02:11:29PM +0800, Autrijus Tang wrote: > : On Fri, Mar 25, 2005 at 10:03:45PM -0800, Larry Wall wrote: > : > Hmm, well, if it got that far. Given strict being on by default, > : > this particular example should probably just die on the fact that $" > : > isn't declared, since there's no $" in Perl 6. > : > : Is $" okay as a variable name? Is everything from perlvar.pod legal? :) > > Considering nobody's written perlvar.pod for Perl 6 yet, yeah, everything > in that pod is legal. :-) > > : my $" = 3; > : > : Pugs parses that because it only considers $! and $/ as legal > : symbolic variable names. > > $! will be a legal variable name. $/ is going away, By which you mean that $/ is turning into a special $0. > Anything that varied with the selected output filehandle like $| > is now a method on that filehandle, and the variables don't exist. > (The p5-to-p6 translator will probably end up depending on some > $Perl5ish::selected_output_filehandle variable to emulate Perl 5's > single-arg select().) I think $| et al. could just translate to methods on $*OUT, and select would look like this: sub perl5_select($fh) { $*OUT = $fh; } Is there some subtlety that that doesn't cover? > %+ and %- are gone. $0, $1, $2, etc. are all objects that know > where they .start and .end. (Mind you, those methods return magical > positions that are Unicode level independent.) Uh, it might be a bad idea to make $# objects. It might not, but it might. I think it would be fine if they turned into regular strings upon assignment (and to pass their full objecthood around, you'd have to backwhack them). But the problem with keeping them objects is that if you put them somewhere else and change them, they turn back into regular strings without .start and .end, which may be a hard-to-track-down bug if you're thinking that they stay objects... haven't really thought about this much (and my head is irritatingly foggy at the moment).

> $; is gone because the multidim hash hack is gone.

Funny, I never used the multidim hash hack, I just emulated it:

    $hash{"$foo$;$bar"} = $value;

> We never did find a use for $}, thank goodness.

Isn't that the "enable all of Damian's unpublished modules" variable?

> $^W is too blunt an instrument even in Perl 5, so it's probably gone.

Well, almost. When writing a recent module, I found that one of the
modules I was using was spitting out an error from its own internal
code on one of my calls, and there was nothing wrong with the call. I
submitted a bug report to the author, and searched for a way to shut it
up so my users wouldn't complain at me. I ended up having to use $^W
at compile time (and it looks very hackish). We ought to have a
(perhaps not quite as hackish) ability to say "there's no reason for
that warning, but I can't modify your code, so just be quiet".

> I'm not quite sure what to do with $^N or $^R yet. Most likely they
> end up as something $ish, if they stay.

For $^N, how about $/[-1]?

Luke
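
For readers who have not met the Perl 5 "multidim hash hack" discussed in this message, a minimal Perl 5 sketch of it follows. It only illustrates existing Perl 5 behaviour (a comma-separated subscript is joined with $; into one flat key); the helper name multidim_key and the sample values are hypothetical, and nothing here is proposed Perl 6 syntax.

```perl
use strict;
use warnings;

# Perl 5 only: a comma-separated subscript is joined with $; (the
# subscript separator, "\034" by default) to form one flat hash key.
# multidim_key is a hypothetical helper that spells the join out by hand.
sub multidim_key {
    my @parts = @_;
    return join($;, @parts);
}

my %hash;
my ($foo, $bar) = ('x', 'y');

$hash{$foo, $bar} = 'builtin';                   # the multidim hash hack
$hash{ multidim_key($foo, $bar) } = 'emulated';  # same key, built manually

# Both stores land on the same single key, so the second overwrites the first.
print scalar(keys %hash), "\n";
print $hash{$foo, $bar}, "\n";
```

Because the "dimensions" are just one joined string, a key part that itself contains the separator would collide with other keys, which is one reason the hack is being retired.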
S28ish [was: [Pugs] A couple of string interpolation edge cases]
On Sat, Mar 26, 2005 at 02:11:29PM +0800, Autrijus Tang wrote:
: On Fri, Mar 25, 2005 at 10:03:45PM -0800, Larry Wall wrote:
: > Hmm, well, if it got that far. Given strict being on by default,
: > this particular example should probably just die on the fact that $"
: > isn't declared, since there's no $" in Perl 6.
:
: Is $" okay as a variable name? Is everything from perlvar.pod legal? :)

Considering nobody's written perlvar.pod for Perl 6 yet, yeah, everything
in that pod is legal. :-)

:     my $" = 3;
:
: Pugs parses that because it only considers $! and $/ as legal
: symbolic variable names.

$! will be a legal variable name. $/ is going away, as is $", which
means they fail under "use strict", but they'd still autocreate globals
under laxity as Perl 5 does. (I know Perl 5 exempted all special
variables from strict, but I don't see why we have to do that for
Perl 6. Merely having $_ in the lexical scope or $*! in the global
scope should be sufficient declaration to get around strict. Though
perhaps we can exempt people from having to write $*! under strict.
In fact, that probably goes for all predeclared $* names, so $IN is
legal for $*IN as long as you don't have "my $IN" hiding it. Another
way to look at it is that * variables are basically autodeclared "our"
implicitly in the outermost lexical scope.)

Sigh, I'd better rough it all in here, even if I don't have time to
do a good job on it. Maybe somebody can beat this into a real S28 pod.

$? and $@ are gone, merged in with $!. (Frees up ? twigil for $?FOO
syntax.) $^E is merged too. $! is an object with as much info as
you'd like on the current exception (unthrown outside of CATCH, thrown
inside). Unthrown exceptions are typically interesting values of undef.

$$ is now $*PID. ($$foo is now unambiguous.)

$0 is gone in favor of $*PROGRAM_NAME or some such.

Anything that varied with the selected output filehandle like $|
is now a method on that filehandle, and the variables don't exist.
(The p5-to-p6 translator will probably end up depending on some
$Perl5ish::selected_output_filehandle variable to emulate Perl 5's
single-arg select().)

Likewise $/ and $. should be attached to a particular input filehandle.
(In fact, $/ is now the result of the last regular expression match,
though we might keep the idea of $. around in some form or other just
because it's awfully handy for error messages. But the localizing $.
business is yucky. We have to clean that up.)

All the special format variables ($%, $=, $-, $:, $~, $^, $^A, $^L)
are gone. (Frees up the = twigil for %= POD doc structures and old
__DATA__ stream, the : twigil for private attributes, and the ~ twigil
for autodeclared parameters.)

$`, $', and $+ don't exist any more, but you can dig that info out
of $/'s structures. Shortcuts into $/ include $1, $2, and such, and
the newfangled $ things. Also, $& is changed to $0 for the whole
matched string. $` and $' may be $ and $, but you probably have to
explicitly match and to get them remembered, so we don't have a repeat
of the Perl 5 sawampersand fiasco. and would automatically exclude
themselves from $0. Or you need some special flag to remember them,
maybe.

%+ and %- are gone. $0, $1, $2, etc. are all objects that know
where they .start and .end. (Mind you, those methods return magical
positions that are Unicode level independent.)

$* and $# have been deprecated half of forever and are gone. $[
is a fossil that I suppose could turn into an evil pragma, if we
try to translate it at all. (Frees up * twigil for $*FOO syntax.)

$(, $), $<, and $> should all change to various $*FOO names. $] is
either something in $* or a trait of the Perl namespace. Likewise $^V,
if they aren't in fact merged.

${...} is reserved for hard refs only now. ($::(...) must be used
for symbolic refs.) ${^foo} should just change to $*foo or $*_foo
or some such.

$; is gone because the multidim hash hack is gone. $" is gone,
replaced by @foo.join(":") or some such.
Likewise for $, in print statements.

We never did find a use for $}, thank goodness.

And we still are keeping $_ around, though it's lexically scoped.

Let's see, what other damage can we do to perlvar. $a and $b are
no longer special. No bareword filehandles. $*IN, $*OUT, $*ERR.
Args come in @*ARGS rather than @ARGV. (Environment still in %ENV,
will wonders never cease.) I don't know whether @INC and %INC will
make as much sense when we're looking up installed modules in a
database, though I suppose you still have to let the user add places
to look.

%SIG is now %*SIG. The __DIE__ and __WARN__ hooks should be brought
out as separate &*ON_DIE and &*ON_WARN variables--they really have
nothing to do with signals. I suppose we could even do away with %SIG
and replace it with &*ON_SIGINT and such, though then we'd lose a bit
of signal introspection which would have to be provided some other way.
Oh, and we probably ought to split out &?ON_PARSEERROR from $*ON_DIE to