I'm wondering if we should explicitly break out the languages that
make up Perl today. That'd be at least top-level Perl, regular
expressions, and pack templates. Maybe tr// and the replacement half of
s/// are sufficiently different too. If nothing else, it would highlight
the problems in switching languages midstream. Like:

*) Can you always know that you're definitely switching languages, or
might you have to back out of a regular expression parse when you
realize you've been reading a comment all along? Related: will you
sometimes need a lookahead "token" or twelve to figure out what's about
to happen from the outer language's perspective?

*) When you suspend the parse, you may need to suspend a whole tree of
language parsers.

*) Tokenization can be totally different for different languages.

*) Tokenization and parsing can depend on the interpreter's state. (Good
example? Indirect objects?) We need either an API for accessing such
state in whatever wacky structures the sub-languages store it in, or
some rules about what cannot affect a parse. That gets into language
definition.

*) The end of a mini-language may have to be determined by agreement
between the outer and inner languages (the inner language has to agree
that it's at a stopping point; the outer language has to agree that
whatever comes next can end an inner language region).

*) These languages recurse quite a bit. Think s///e, or just variable
interpolation: m!$x{foo(3**1e6, "An entire program")}!. Though that
example will fail in perl5 if the string is "An entire program!",
because the "!" inside the string is taken as the closing delimiter of
the match; should it?
(If the answer is yes, then the outer language either has to tell the
inner "I know that normally you're okay with an unescaped exclamation
point, but don't be, please?", or it must impose its own notions of
escaping on the inner language to be able to scan ahead for the ending.
If the answer is no, then highlighting editors will be incrementally
more unhappy.)
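
To make that last point concrete, here is a rough Python sketch of the
two ways an outer language could look for the end of the m!...! region.
It is not how perl5 is implemented; the names naive_end and nested_end
are made up for illustration, and the "inner language" is reduced to
just balanced braces and double-quoted strings inside them.

```python
def naive_end(src, start, delim):
    """Outer language imposes its own escaping rule: the region ends at
    the first delimiter not preceded by a backslash."""
    i = start
    while i < len(src):
        if src[i] == "\\":
            i += 2          # skip the escaped character
            continue
        if src[i] == delim:
            return i
        i += 1
    return -1


def nested_end(src, start, delim):
    """Toy inner-aware scan: a delimiter only counts when we are outside
    any {...} block; double-quoted strings are recognized inside blocks."""
    depth = 0
    in_string = False
    i = start
    while i < len(src):
        c = src[i]
        if c == "\\":
            i += 2          # skip the escaped character
            continue
        if in_string:
            if c == '"':
                in_string = False
        elif depth > 0 and c == '"':
            in_string = True
        elif c == "{":
            depth += 1
        elif c == "}" and depth > 0:
            depth -= 1
        elif c == delim and depth == 0:
            return i
        i += 1
    return -1


src = 'm!$x{foo(3**1e6, "An entire program!")}!'
body_start = 2  # just past the opening "m!"
print(src[body_start:naive_end(src, body_start, "!")])
print(src[body_start:nested_end(src, body_start, "!")])
```

The naive scan stops at the "!" inside the string literal, which is the
"answer is yes" behaviour; the inner-aware scan only accepts the final
"!", which is the "answer is no" behaviour and is what makes life harder
for anything that wants to find the end without a full inner parse.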
