On Mon, Apr 12, 2010 at 2:23 PM, Larry Wall <la...@wall.org> wrote:

> On Sun, Apr 11, 2010 at 03:47:18PM -0700, Darren Duncan wrote:
> :
> : I see that making underscores
> : and hyphens to be equivalent is akin to having case-insensitive
> : identifiers, where "Perl","PERL","perl" mean the same thing.  Rather
> : what I want is to be everything-sensitive, as AFAIK Perl 6 currently
> : is; if something looks different, it is different. -- Darren Duncan
>
> ...

> As for the horror of people having to memorize lists again or
> Risk Using The Wrong Character...I'm afraid I just don't buy it.
>

Larry, I'm curious what you think of this example: A web page of Perl 6
documentation suggests that you should call time-local. Unfortunately, in the
font my browser uses, the vertical position of that single separator stroke is
ambiguous, so I can't tell whether it's a hyphen or an underscore. Of course,
we could have no sympathy and just say, "get a better font," but this problem
will likely crop up over and over, will it not?

I agree with you that this doesn't really help the person writing code from
scratch, but that person is in a different position from a developer trying to
interact with potentially dozens of libraries, with documentation coming from
sources as varied as comments and web pages.

I'd suggest the following in decreasing order of urgency:


   - Choose a single character (hyphen or underscore) to use in standard
   library code to separate the component words of an identifier (remember
   that underscore is only special in C-like code because it stands in for a
   space).
   - Never give dash and underscore distinct notational meanings (e.g., where
   a-b signals that an identifier is to be used one way and a_b signals
   another).
   - Allow only one such character in any given identifier.
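
Here's a rough sketch of that last rule, reading it as "never mix the two
separators in one identifier"; sane-separators is a hypothetical checker, not
anything proposed for the actual language:

    sub sane-separators(Str $id --> Bool) {
        # Reject any identifier that contains both separator characters.
        not ($id.contains('-') && $id.contains('_'));
    }

    say sane-separators('time-local');    # True
    say sane-separators('time_local');    # True
    say sane-separators('time-local_t');  # False: mixes both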


That last item rolls into a whole rant of mine against ambiguity in
identifiers. Most often the problem stems from Unicode, which puts the
programmer in the position of needing font support good enough to tell
ambiguous names apart (and in cases like "Αpple" or "Рerl" or "Ρerl", you're
just doomed regardless), but dashes and underscores are a good example of the
same problem cropping up elsewhere.
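
The lookalikes are invisible on the page but trivial for the machine to spot.
A minimal sketch, assuming the implementation exposes the Unicode Script
property through a uniprop call; the sample names are the ones above:

    for 'Apple', 'Αpple', 'Рerl' -> $name {
        # Ask each character which Unicode script it belongs to.
        say "$name: ", $name.comb.map({ .uniprop('Script') }).join(' ');
    }
    # Apple: Latin Latin Latin Latin Latin
    # Αpple: Greek Latin Latin Latin Latin
    # Рerl: Cyrillic Latin Latin Latin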

On the more general point, I really feel that no language should ever allow
identifiers to mix Unicode blocks without a strong reason. Specifically:


   - Underscore (or dash, or whatever your notational separator is) should be
   the only exceptional character allowed in all identifiers.
   - Identifiers should never include c̈ombining m̅arks.
   - Upon scanning the first alphabetic character in an identifier, no further
   alphabetic characters should be allowed except those from the same code
   block or a related supplemental block ("related" might be expanded to
   include first-class blocks in some cases, allowing combinations like Kanji
   (CJK ideographs in Unicode) + Hiragana, etc.).
   - Upon scanning the first numeric character in an identifier, no further
   numeric characters should be allowed except those from the same code block
   (again, there might be some wiggle room in exceptional cases, but the goal
   is to avoid counting in more than one system at a time).


Should all of these be hard errors? Perhaps not, but certainly they should
at least yield warnings by default.
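
For concreteness, here's a rough sketch of those scanning rules in Perl 6. It
uses the Script property as a stand-in for the block test above, treats
underscore as the lone exceptional character, and skips the Kanji + Hiragana
carve-out; check-identifier is hypothetical, not a proposal for the actual
lexer:

    sub check-identifier(Str $id --> Bool) {
        my $letter-script;
        my $digit-script;
        for $id.comb -> $c {
            next if $c eq '_';                      # the sole exceptional character
            my $cat = $c.uniprop;                   # General_Category, e.g. 'Lu'
            return False if $cat.starts-with('M');  # no combining marks
            if $cat.starts-with('L') {              # a letter
                $letter-script //= $c.uniprop('Script');  # first letter fixes the script
                return False if $c.uniprop('Script') ne $letter-script;
            }
            elsif $cat eq 'Nd' {                    # a decimal digit
                $digit-script //= $c.uniprop('Script');   # first digit fixes the system
                return False if $c.uniprop('Script') ne $digit-script;
            }
        }
        True;
    }

    say check-identifier('time_local');  # True
    say check-identifier('Рerl');        # False: Cyrillic Р followed by Latin erl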

PS: While I never finished the implementation of Sand, its simplistic guide
to Unicode identifiers might be useful in illuminating what I'm describing
above:

http://www.ajs.com/ajswiki/Sand:_Syntax_and_Structure#Identifiers


-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs
