On Fri, May 20, 2016 at 09:39:30AM -0400, yary wrote:
: On Tue, Apr 12, 2016 at 6:12 PM, Brandon Allbery <allber...@gmail.com> wrote:
: > I was explaining why some "symbols" are acceptable to the parser. Which one
: > is more appropriate is not my call,
:
: I was thinking about what exactly are valid identifiers in Perl6/rakudo's
: implementation. The docs <http://docs.perl6.org/language/syntax#Identifiers>
: say:
:
: An identifier is a primitive name, and must start with an alphabetic
: character (or an underscore), followed by zero or more word characters
: (alphabetic, underscore or number). You can also embed dashes - or single
: quotes ' in the middle, but not two in a row.

At this point, "number" means only characters with a GeneralCategory of Nd.
We could talk about generalizing that, but there are potential issues.
We can't simply extend it to No characters, because then pi² would
misparse as a 3-character identifier.

: Experimenting with some of the numeric codes from Wikipedia
: <https://en.wikipedia.org/wiki/Numerals_in_Unicode>, some of the numeric
: codes seem inconsistent-

Note that, even if we used this table, we could not distinguish ² from ②
and such.

: > my $_६೬𝟨 = ६೬𝟨 # "De" Devanagari, Kannada, Mathematical. "De" is all good.
: 666

That's fine, those work because of the Nd general property, so they're
equivalent to 0..9 as far as we're concerned.

: > my $x六 = 6 # "Nu" Han number 6
: 6
: > say 六
: ===SORRY!=== ...

Note that 六 works in identifiers by virtue of being not numeric at all,
but by being in general category Lo, that is, it's a "letter other", so
considered alphabetic.

: > say ௰ # "Nu" Tamil number 10
: 10
: > my $x௰ = 5
: ===SORRY!=== Error ...

Excluded because it's No, not Nd.

: > say ① + 3 # "Di" 1 in typographic context has value 1
: 4
: > my $b① = 44 "Di" 1 not valid in identifier
: ===SORRY!=== Error ...

① is indistinguishable from superscripts, even by "Di", and falls into the
No general category, so excluded.

: Some numeric codepoints are recognized as such, yet Rakudo isn't allowing
: them in identifiers. Especially confounding is the treatment of the "Han
: number 6" and "Tamil number 10", both of which are unicode "Nu" numeric.
: The Tamil is recognized as a number on its own but not as an identifier;
: the Han is allowed in an identifier but isn't recognized as a number!

We currently rely only on GeneralCategory. I don't believe we use
NumericType anywhere in parsing Perl 6.

: Is there some deeper rule at work here- which could be added to the
: documentation? Or are these bugs?

Not a bug, but potentially negotiable. It simply comes down to Nd vs No
at the moment.
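
For anyone who wants to check the categories above outside of Rakudo, here is a minimal sketch in Python (not Perl 6) that asks the Unicode database directly. It is only a model of the rule as described here, not Rakudo's actual grammar. The helper names are made up for illustration, "alphabetic" is assumed to mean underscore or any L* category, and embedded dashes and single quotes are ignored.

```python
import unicodedata

def is_alpha(ch):
    # Assumption: "alphabetic" is modeled as underscore or any L* category.
    return ch == "_" or unicodedata.category(ch).startswith("L")

def is_word(ch):
    # Word characters: alphabetic, or a decimal digit (Nd) only.
    # Characters in No (superscripts, circled digits, ...) are excluded.
    return is_alpha(ch) or unicodedata.category(ch) == "Nd"

def is_simple_identifier(name):
    # Hypothetical helper: one identifier segment, with no embedded
    # dashes or single quotes.
    return bool(name) and is_alpha(name[0]) and all(is_word(c) for c in name[1:])

# Print each sample, the category of every character in it, and whether
# the whole string passes the simplified identifier check.
for name in ["_६೬𝟨", "x六", "x௰", "b①", "pi²"]:
    cats = " ".join(unicodedata.category(c) for c in name)
    print(name, cats, is_simple_identifier(name))
```

Under this model the first two samples pass (the digits are Nd, and 六 is Lo), while the ones containing ௰, ① or ² fail because those characters are No. That pi² fails as a whole is the point: the identifier ends at pi, leaving ² free to act as a postfix.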

One could argue that we could notice superscripts as a separate category
and treat them differently, but there are two arguments against that.

The first is that we'd like to keep the basic identifier rules fairly
simple. We're already pushing the state of the art here, and I don't see
much benefit in making the rules more arcane than they are.

The second argument is that we should probably reserve syntax for the user
here. Once we get slangs fully hooked up, we can easily let users define
identifiers to include ① and such. But it's just as likely, perhaps more
likely, that the user will want to use ① for a postfix, just like we
currently treat superscripts as powers. We can't guess (well, we *could*
guess, but can't know) which way the user will want to use these, so the
conservative approach is to make neither of them work, and let the user
take an additive approach, rather than forcing them to use a subtractive
approach if we guessed wrong.

Larry