tokenizer hints, supporting delimited identifiers or symbols

Darren Duncan Tue, 07 Feb 2006 15:22:37 -0800

All,

I would like there to be a simple and terse way for Perl 6 identifiers or symbols, including variable, subroutine, and other identifier names, to be composed of any characters whatsoever, even whitespace, as is possible in some other languages like SQL, and as is possible when naming filesystem files.

I also want to emphasize that what I'm looking for is simply a compile time feature; the delimited identifiers are always literal constants resolvable at compile time, so there is no possible deferral to runtime like with symbolic references that can come from variables.

This would assist in having a closer mapping when porting code from a language like PL/SQL to Perl, or invoking code in such languages, while also gaining that native ability internally. Simply remapping characters, like spaces to underscores, won't work, partly because of clashes, such as when the source already has both a "the var" and a "the_var". And certain other workarounds, like hex-escaping all source identifiers, would cause obfuscation, which is bad for understanding the result.
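
As a rough illustration of that trade-off, here is a small Python sketch (not Perl 6, and not part of the proposal). The function names underscore_remap and hex_escape and the "_xNN_" escape format are made up for this example; they only show why one workaround collides and the other obscures the name.

```python
def underscore_remap(name: str) -> str:
    """Hypothetical workaround 1: turn spaces into underscores."""
    return name.replace(" ", "_")


def hex_escape(name: str) -> str:
    """Hypothetical workaround 2: hex-escape everything that is not an
    ASCII letter or digit. The underscore is escaped too, so that the
    mapping stays collision-free."""
    out = []
    for ch in name:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("_x%02X_" % ord(ch))
    return "".join(out)


# Two distinct source identifiers, as in the example above.
names = ["the var", "the_var"]

# Remapping collapses both onto the same target name.
remapped = {underscore_remap(n) for n in names}

# Hex-escaping keeps them apart, but the results are hard to read.
escaped = [hex_escape(n) for n in names]
```

With these definitions, remapped holds a single name, while escaped holds two distinct but much less readable names.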

In a way, this would be a wider application of the fact that hash keys can already contain any characters, or that named parameter arguments can be string-quoted, though the latter are akin to identifiers in the method declarations.

Unless it's already done, I see that support for this is something that only the tokenizer, and perhaps the wider parser, of Perl 6 code has to be concerned with; all other parts of the Perl 6 runtime don't have to care. Really, one main reason it isn't commonplace to, say, have space characters in variable names is that doing so could make the parser's job more difficult when determining the boundaries of a symbol name in code.

I propose that this can be accomplished with a simple and optional de-sugaring of the language that simply provides clues to the tokenizer in the form of special delimiters.

For example, if Perl 6 doesn't currently have back-tick (`) delimiters reserved (I forget) like Perl 5 does for invoking the Unix shell, we could use that; literal occurrences of the delimiter characters in the identifier would be backslash-escaped as usual, like with the single-quote (') delimited strings. Or if you consider this being used rarely, we could Huffman-code it to have a longer delimiter like "qi()" or "qs()" or something.

If the delimited identifier would be valid as a non-delimited identifier (since it only contains alphanums for example), which Perl 6 code is composed of by default, then delimited and non-delimited versions of the same can be intermixed as equivalent; otherwise (eg, if they contain whitespace), they appear only in delimited form.
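
To make the tokenizer's side of this concrete, here is a minimal sketch in Python (not Perl 6) of how a scanner might read either form and arrive at the same symbol name. Everything in it is an assumption made for illustration: the read_identifier function, the ASCII-only pattern for bare identifiers, and the choice of back-ticks with single-quote-style escaping (only \` and \\ are escapes). It is not how any Perl 6 implementation works.

```python
import re

# Assumed shape of a plain (non-delimited) identifier, ASCII only.
BARE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def read_identifier(src: str, pos: int = 0):
    """Read one identifier starting at src[pos].

    Returns (symbol_name, next_pos). A back-tick starts a delimited
    identifier, which runs to the next unescaped back-tick."""
    if pos < len(src) and src[pos] == "`":
        chars = []
        i = pos + 1
        while i < len(src):
            ch = src[i]
            if ch == "\\" and i + 1 < len(src) and src[i + 1] in "`\\":
                # Escaped delimiter or backslash: keep the literal character.
                chars.append(src[i + 1])
                i += 2
            elif ch == "`":
                # Closing delimiter: the symbol name is complete.
                return "".join(chars), i + 1
            else:
                chars.append(ch)
                i += 1
        raise SyntaxError("unterminated delimited identifier")

    m = BARE.match(src, pos)
    if m is None:
        raise SyntaxError("expected an identifier at position %d" % pos)
    return m.group(0), m.end()


# Delimited and non-delimited spellings of a plain name give the same symbol.
plain, _ = read_identifier("foo = 3")
quoted, _ = read_identifier("`foo` = 3")

# A name with whitespace is only reachable through the delimited form.
spaced, _ = read_identifier("`the bar` = 5")
```

The point of the sketch is that the delimiters exist only at the token level: once the name has been read, the rest of the compiler sees "foo" or "the bar" as an ordinary symbol name.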


Using the back-ticks as an example, we could say:

  my $baz = 7; # parsed symbol is "baz"
  say $baz;    # parsed symbol is "baz"

  my $`foo` = 3; # parsed symbol is "foo"
  say $`foo`;    # parsed symbol is "foo"

  my $`the bar` = 5; # parsed symbol is "the bar"
  say $`the bar`;    # parsed symbol is "the bar"

Similarly, with subroutine or method names:

  method `do it` (:$`with this`) { ... }

  $myobj.`do it`( 'with this' => 17 );
  $myobj.`do it`( :`with this`<44> );

Note that named arguments can already have string-quoted key names, I think; this is sort of an extension of that.

Of course, the exact syntax can be different, but I want to not lose functionality that I have in other languages and environments when in Perl 6.

Unless we have this feature, I would have to resort to either storing all symbols in hashes, or hex-escaping them all to ensure usable characters without name collisions, and that makes the resulting code obfuscated and hard to understand; I don't want to obfuscate.


Thank you. -- Darren Duncan
