On Mon, Apr 12, 2010 at 2:23 PM, Larry Wall <la...@wall.org> wrote:
> On Sun, Apr 11, 2010 at 03:47:18PM -0700, Darren Duncan wrote:
> : I see that making underscores
> : and hyphens to be equivalent is akin to having case-insensitive
> : identifiers, where "Perl", "PERL", "perl" mean the same thing. Rather
> : what I want is to be everything-sensitive, as AFAIK Perl 6 currently
> : is; if something looks different, it is different. -- Darren Duncan
>
> ...
> As for the horror of people having to memorize lists again or
> Risk Using The Wrong Character... I'm afraid I just don't buy it.

Larry, I'm curious what you think of this example: a web page of Perl 6 documentation suggests that you should call time-local. Unfortunately, in the font that my browser uses, the height of that single stroke is ambiguous. Of course, we could have no sympathy and just say, "get a better font," but this problem will likely crop up over and over, will it not? I agree with you that this doesn't really help the person writing code from scratch, but that's not the same situation as a developer trying to interact with potentially dozens of libraries, with sources of documentation ranging from comments to Web pages.

I'd suggest the following, in decreasing order of urgency:

- Choose a single character (hyphen or underscore) to use in standard library code to separate the component words of an identifier (remember that underscore is only special in C-like code because it stands in for a space).
- Never use dash versus underscore notationally (e.g., a-b indicating that an identifier is to be used one way, while a_b indicates otherwise).
- Allow only one such character in any given identifier.

That last item rolls into a whole rant of mine against ambiguity in identifiers. Most often the problem stems from Unicode, which puts the programmer in the position of needing good enough font support to tell ambiguous names apart (and in cases like "Αpple" or "Рerl" or "Ρerl", you're just doomed regardless), but dashes and underscores are a good example of the same problem cropping up elsewhere. On the more general point, I really feel that no language should ever allow identifiers to mix Unicode blocks without strong reason.
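To make the "you're just doomed" cases concrete, here is a small illustration (in Python rather than Perl 6, purely for demonstration) showing that two identifiers which render identically in most fonts are nonetheless distinct code points:

```python
import unicodedata

# Two "identical-looking" identifiers: one begins with
# LATIN CAPITAL LETTER P, the other with CYRILLIC CAPITAL LETTER ER.
latin = "Perl"
cyrillic = "\u0420erl"  # displays as "Рerl" in most fonts

print(latin == cyrillic)  # False: different code points
for ident in (latin, cyrillic):
    # Show the Unicode name of the first character of each
    print(ident, "->", unicodedata.name(ident[0]))
```

No font, however good, can help a reader here; only tooling that inspects the code points can.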
Specifically:

- Underscore (or dash, or whatever your notational separator is) should be the only exceptional character allowed in identifiers.
- Identifiers should never include c̈ombining m̅arks.
- Upon scanning the first alphabetic character in an identifier, no further alphabetic characters should be allowed unless they come from the same code block or a related supplemental block ("related" might be expanded to include first-class blocks in some cases, to allow combinations like Kanji (unified with Chinese in Unicode) + Hiragana, etc.).
- Upon scanning the first numeric character in an identifier, no further numeric characters should be allowed unless they come from the same code block (again, there might be some wiggle room in exceptional cases, but the goal is to avoid counting in more than one system at a time).

Should all of these be hard errors? Perhaps not, but they should at least yield warnings by default.

PS: While I never finished the implementation of Sand, its simplistic guide to Unicode identifiers might be useful in illuminating what I'm describing above:
http://www.ajs.com/ajswiki/Sand:_Syntax_and_Structure#Identifiers

--
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs
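The scanning rules above can be sketched as a small checker. This is an illustrative Python sketch, not part of Sand or Perl 6; all names are mine, and since Python's unicodedata module does not expose Unicode blocks directly, the first word of each character's Unicode name (LATIN, GREEK, CYRILLIC, ...) is used as a crude stand-in for its block:

```python
import unicodedata

SEPARATOR = "_"  # the single sanctioned word separator

def char_group(ch):
    # Crude proxy for a Unicode block: the first word of the
    # character's Unicode name, e.g. "LATIN" or "CYRILLIC".
    return unicodedata.name(ch).split()[0]

def check_identifier(ident):
    """Return a list of warnings for an identifier, per the rules above."""
    warnings = []
    alpha_group = None  # group of the first alphabetic character seen
    digit_group = None  # group of the first numeric character seen
    for ch in ident:
        if unicodedata.combining(ch):
            warnings.append("combining mark: %r" % ch)
        elif ch == SEPARATOR:
            continue  # the one exceptional character
        elif ch.isalpha():
            g = char_group(ch)
            if alpha_group is None:
                alpha_group = g
            elif g != alpha_group:
                warnings.append("mixed alphabets: %s vs %s" % (alpha_group, g))
        elif ch.isdigit():
            g = char_group(ch)
            if digit_group is None:
                digit_group = g
            elif g != digit_group:
                warnings.append("mixed number systems: %s vs %s" % (digit_group, g))
    return warnings
```

For example, check_identifier("time_local") reports nothing, while check_identifier("\u0420erl") ("Рerl" with a Cyrillic first letter) warns about mixed alphabets. A real implementation would use the actual block or script properties and handle unnamed code points; this only shows the shape of the scan.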