More not-quite-FlightGear subject matter ahead.  But I need advice:

Nasal needs a "character constant" syntax.  That is, the ability to
write an ASCII character as a numerical constant.  In C/C++, you use
single quotes to do this (e.g. the token 'A' is just a synonym for the
integer value 65).

   (Brief background: I just added the ability to read and write
   individual bytes in a string with the [] operator, just like you
   can for elements of a vector.  Hence the need for character
   constants to compare those bytes against.)
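
For reference, here is the C idiom this borrows from, as a minimal
sketch (the function name upcase_ascii is made up for illustration).
Each s[i] is read and written as a plain number, and 'a', 'z' and 'A'
are ordinary integers that can be used in comparisons and arithmetic:

```c
#include <stddef.h>

/* Upper-case the ASCII letters of s in place.  s[i] is read and written
   as a plain number; the character constants 'a', 'z' and 'A' are just
   the integers 97, 122 and 65. */
void upcase_ascii(char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] >= 'a' && s[i] <= 'z')
            s[i] = (char)(s[i] - 'a' + 'A');   /* write one byte back */
    }
}
```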

But Nasal can't do that, because it already uses single quotes for
unescaped strings, something that will be really useful in the regular
expression interface I am putting together.

Perl and Python get away without having character constants at all.
They do string indexing by making substrings at runtime.  But
substrings are garbage-collected, which makes them a little expensive.
I don't want to thrash the heap just to iterate through a single
string (not being able to do this really annoys me in Perl).  So I
can't simply emulate C, Perl, or Python here; I need to "invent" a
new syntax.

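A rough sketch of the cost difference, in C.  The first function
imitates substring-style indexing by allocating a fresh one-character
string per step (a stand-in for what a garbage-collected substring
costs, not how Perl or Python are actually implemented); the second
reads bytes directly and allocates nothing.  Both function names are
hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Substring style: every step builds a new one-character string, as a
   scripting language would if s[i] returned a string.  *allocs counts
   the heap allocations made. */
size_t count_by_substring(const char *s, const char *target, size_t *allocs)
{
    size_t n = 0, len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        char *sub = malloc(2);
        if (sub == NULL)
            abort();
        sub[0] = s[i];
        sub[1] = '\0';
        (*allocs)++;
        if (strcmp(sub, target) == 0)
            n++;
        free(sub);
    }
    return n;
}

/* Byte style: s[i] is just a number compared against a character
   constant, so the loop never touches the heap. */
size_t count_by_byte(const char *s, int target)
{
    size_t n = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        if ((unsigned char)s[i] == target)
            n++;
    return n;
}
```

Counting the A's in "BANANA" the first way costs six allocations; the
second way costs none.
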
One possibility is to use the backquote to do this, so `A` would be a
synonym for 65 in Nasal.  This would be nice because I could
re-purpose the lexer for double-quoted strings, and then throw an
error if the resulting string was not a single character (single byte
for now, single UTF-8 character in the future).  But the backquote is
hard to type, and in some fonts hard to distinguish from a regular
single quote.
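
A minimal sketch of that lexer reuse, in C (the language the
interpreter itself is written in).  lex_quoted is a hypothetical
stand-in for the real double-quoted string lexer, parameterized on the
delimiter and handling only a few escapes; it is not Nasal's actual
code.  The character-constant lexer returns the byte value, or -1 if
the constant is empty, longer than one byte, or malformed:

```c
/* Decode the character after a backslash.  Only a handful of escapes
   are handled in this sketch; returns -1 for anything else. */
static int decode_escape(char c)
{
    switch (c) {
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    case '\\': return '\\';
    case '"':  return '"';
    case '`':  return '`';
    default:   return -1;
    }
}

/* Hypothetical string lexer: decodes the body between two `delim`
   characters into out[0..cap) and returns its length, or -1 on a bad
   escape, overflow, or missing delimiter.  On success *end points just
   past the closing delimiter. */
static int lex_quoted(const char *p, char delim, char *out, int cap,
                      const char **end)
{
    int n = 0;
    if (*p++ != delim)
        return -1;
    while (*p != '\0' && *p != delim) {
        int c = (unsigned char)*p++;
        if (c == '\\') {
            if (*p == '\0' || (c = decode_escape(*p++)) < 0)
                return -1;
        }
        if (n >= cap)
            return -1;
        out[n++] = (char)c;
    }
    if (*p != delim)
        return -1;                     /* unterminated */
    *end = p + 1;
    return n;
}

/* `A` -> 65: reuse the string lexer, then reject anything that did not
   decode to exactly one byte. */
int lex_backquote_char(const char *p, const char **end)
{
    char buf[8];
    if (lex_quoted(p, '`', buf, (int)sizeof buf, end) != 1)
        return -1;
    return (unsigned char)buf[0];
}
```

Escapes like `\t` come for free, which is the main attraction of this
option.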

Some languages do this by prefixing a single token to the constant
instead of enclosing it in quotes.  Examples: Ruby expresses 65 as
?A, while Lisp uses #\A.  (Nasal can't use ? because of the ternary ?:
operator, but it might use something like @, %, or &, or maybe a
single backquote).  This is nice because it's easy to type and easy to
read.  But the syntax makes it hard to support the same escapes as a
"" string, so it wouldn't be natural to write things like @\t for a
tab (syntax highlighting in the Emacs Nasal mode, for example, freaks
out when it sees the lonely backslash).

Or we could do a combination: prefix a normal string constant with a
special token indicating to the lexer that this is a character
constant.  Something like c"A" or @"A" for 65, for example.  This is
easy to read and type, and natural to implement.  But it's different
from other languages, and Nasal is trying really hard to stick to
common, proven, universally-understood features in its design.
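
A sketch of how the prefixed form could slot into a lexer, again in C
and again with a hypothetical, cut-down string decoder standing in for
the real one.  The only new logic is checking for the prefix byte
immediately followed by a double quote, so that plain identifiers
starting with c still lex as identifiers:

```c
/* Hypothetical decoder for a double-quoted string starting at p (which
   must point at the opening quote).  Handles only \n, \t, \\ and \" in
   this sketch.  Returns the decoded length, or -1 on error. */
static int lex_dq_string(const char *p, char *out, int cap, const char **end)
{
    int n = 0;
    if (*p++ != '"')
        return -1;
    while (*p != '\0' && *p != '"') {
        char c = *p++;
        if (c == '\\') {
            switch (*p++) {
            case 'n':  c = '\n'; break;
            case 't':  c = '\t'; break;
            case '\\': c = '\\'; break;
            case '"':  c = '"';  break;
            default:   return -1;   /* unknown escape or end of input */
            }
        }
        if (n >= cap)
            return -1;
        out[n++] = c;
    }
    if (*p != '"')
        return -1;
    *end = p + 1;
    return n;
}

/* Prefixed form such as c"A" or @"A": the prefix only introduces a
   character constant when a double quote follows immediately. */
int lex_prefixed_char(const char *p, char prefix, const char **end)
{
    char buf[8];
    if (p[0] != prefix || p[1] != '"')
        return -1;                       /* not a character constant */
    if (lex_dq_string(p + 1, buf, (int)sizeof buf, end) != 1)
        return -1;
    return (unsigned char)buf[0];
}
```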

So anyway, which of the following are good/bad choices for a character
constant syntax:

   `A`   @A   $A   %A   &A   @"A"   $"A"   %"A"   &"A"   c"A"

Finally, should I even have a character constant syntax at all?  Note
that there is a potential gotcha with this feature: Nasal is
dynamically typed, so there is nothing "incorrect" about writing code
like:

   for(var i=0; i<size(str); i+=1) {
       if(str[i] == "A") { foundA(); }
   }

Except that this code is WRONG.  The str[i] expression returns a
number, not a string.  You can legally (!) compare it with the string
"A", but the result will be false in all cases, even when the number
is 65*.  Is this kind of mistake common enough to justify dropping
the idea of indexed strings returning numbers?
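
To see why the comparison is legal but useless, here is a toy model
of the comparison rules in C.  The rule that a numeric string like "9"
equals the number 9 is an assumption taken from the reasoning in the
footnote, and the Value type and function names are hypothetical; this
is not a description of the interpreter's actual code:

```c
#include <stdlib.h>
#include <string.h>

/* A toy dynamically typed value: either a number or a string. */
typedef enum { T_NUM, T_STR } Tag;
typedef struct { Tag tag; double num; const char *str; } Value;

static Value make_num(double d)      { Value v = { T_NUM, d, NULL }; return v; }
static Value make_str(const char *s) { Value v = { T_STR, 0.0, s }; return v; }

/* Succeeds only if the whole string parses as a number. */
static int parse_num(const char *s, double *out)
{
    char *end;
    *out = strtod(s, &end);
    return end != s && *end == '\0';
}

/* Numbers compare numerically and strings bytewise.  A number and a
   string are equal only if the string parses as that same number, so
   65 == "A" is legal but always false. */
int values_equal(Value a, Value b)
{
    double d;
    if (a.tag == T_NUM && b.tag == T_NUM)
        return a.num == b.num;
    if (a.tag == T_STR && b.tag == T_STR)
        return strcmp(a.str, b.str) == 0;
    if (a.tag == T_STR)
        return parse_num(a.str, &d) && d == b.num;
    return parse_num(b.str, &d) && d == a.num;
}
```

In this model the buggy loop's str[i] == "A" is false for every byte,
while str[i] == 'A'-style comparisons (number against number) work.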

  [* I thought briefly about having strings match numbers when the
     string was a single character with the same value, but that
     doesn't work.  A world where 9 is equal to "\t" but "9" is not
     equal to "\t" is just too weird to contemplate.]
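
To make the footnote concrete, here is a sketch of the rejected rule
in C (function names hypothetical).  Under it both 9 == "\t" and
9 == "9" hold, yet "9" != "\t", so equality stops being transitive:

```c
#include <stdlib.h>
#include <string.h>

/* The rejected rule: a number equals a string if the string parses as
   that number OR is a single byte whose value is that number. */
static int num_eq_str(double n, const char *s)
{
    char *end;
    double d = strtod(s, &end);
    if (end != s && *end == '\0' && d == n)
        return 1;                                       /* 9 == "9"  */
    return strlen(s) == 1 && (unsigned char)s[0] == n;  /* 9 == "\t" */
}

/* Strings still compare bytewise with each other. */
static int str_eq_str(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```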

Anyway, feel free to kibitz.  I'd back out the string indexing
feature, but I really, really like it...

Andy

_______________________________________________
Flightgear-devel mailing list
Flightgear-devel@flightgear.org
http://mail.flightgear.org/mailman/listinfo/flightgear-devel