On Sun, Dec 17, 2000 at 12:43:15PM +0000, Simon Cozens wrote:
> On Sun, Dec 17, 2000 at 01:20:07AM +0000, Nicholas Clark wrote:
> > I'm assuming we're all sort of thinking that input is certainly
> > [good stuff]

Thanks, but you were supposed to tell me what I'd missed :-)

> > I don't think you can do that with eval in perl5, can you?
> > If not, it represents something new the parser will have to be able to
> > communicate with the outside world.
>  
> You can't, but it's a simple matter of trapping end-of-source in the parser.
> (Look for "case 0" in toke.c)

Ah. So it's not that hard; it's just a matter of exposing this ability
through the API.
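Here is a toy C sketch of the "case 0" idea. On hitting end of source, the scanner returns a "need more input" status instead of a syntax error, which is what an interactive shell or an incremental eval would want. The names parse_status and check_complete are hypothetical and are not perl API. The scanner only tracks {} nesting and "..." strings; the real tokenizer has to handle far more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes: not part of any real perl API. */
typedef enum {
    PARSE_COMPLETE,   /* source forms a whole unit */
    PARSE_NEED_MORE,  /* hit end of source inside an open construct */
    PARSE_ERROR       /* e.g. a '}' with no matching '{' */
} parse_status;

/* Toy scanner: tracks only {} nesting and "..." strings with \ escapes.
   It exists to show where the end-of-source check sits, not to parse Perl. */
parse_status check_complete(const char *src, size_t len)
{
    assert(src != NULL || len == 0);
    int depth = 0;
    int in_string = 0;

    for (size_t i = 0; i < len; i++) {
        char c = src[i];
        if (in_string) {
            if (c == '\\')
                i++;              /* skip the escaped character */
            else if (c == '"')
                in_string = 0;
        } else if (c == '"') {
            in_string = 1;
        } else if (c == '{') {
            depth++;
        } else if (c == '}') {
            if (--depth < 0)
                return PARSE_ERROR;
        }
    }
    /* End of source (the analogue of toke.c's "case 0"): rather than
       reporting an error, say whether more input could complete it. */
    return (in_string || depth > 0) ? PARSE_NEED_MORE : PARSE_COMPLETE;
}
```

A read-eval loop could keep appending lines to its buffer for as long as this returns PARSE_NEED_MORE.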

I had a hairier thought last night.
Lots of people (well, certainly at least me and the person who wrote it)
use syntax-highlighting editors.
The remarks I've seen in comments in several syntax-highlighting modes for
perl say that perl is pretty much the hardest language to parse.
For example, the syntax highlighter I find myself using at the moment
doesn't like:

    $foo = \"abc";

It thinks that \" escapes that ". Oops.
Other nice things like s###, pod, __DATA__, here-docs, and not needing to
escape some otherwise "interesting" characters inside regexps all seem prone
to confuse them. So you get irritated, and either spend time trying to fix
the mode or "fix" your perl to make the mode happy.
Neither gets your real job done (nor lets you spend your spare time fixing
perl instead of the mode).
There's only one thing that really knows how to parse perl: perl :-)
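The `$foo = \"abc";` failure can be reproduced with a minimal C sketch. It assumes a highlighter that looks for the first double quote, and the function first_string_open is hypothetical. In naive mode every backslash escapes the next character. In the other mode, a backslash outside a string is treated as Perl's reference operator, so the quote after it really does open a string.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the '"' that opens the first double-quoted string,
   or -1 if there is none.  naive != 0 mimics a highlighter that treats
   every backslash as an escape. */
long first_string_open(const char *src, size_t len, int naive)
{
    assert(src != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\\') {
            if (naive)
                i++;          /* naive: backslash always escapes the next char */
            /* otherwise: we are outside any string here, so in Perl the
               backslash is the reference operator and escapes nothing */
            continue;
        }
        if (src[i] == '"')
            return (long)i;
    }
    return -1;
}
```

For `$foo = \"abc";` the naive scan skips the quote at offset 8 and takes the closing quote at offset 12 as an opening one, so everything after it gets coloured as a string.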

Would it be sane to get the parser to return suitable information about the
source, so that a syntax analyser (such as a highlighting editor) knows that
character positions 5123 to 5146 are a qq() string, and can change the
font or the colour or whatever?
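Here is one possible shape for such an interface, sketched in C. It assumes the tokenizer can call back into the embedding application once per lexical span, passing half-open byte offsets [start, end). span_kind, span_hook, scan_spans and collect_span are all hypothetical names. The toy scanner only understands comments, "..." strings, whitespace and everything else; qq(), s###, pod, here-docs and __DATA__ would each need their own kinds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical span kinds; a real hook would need many more. */
typedef enum { SPAN_CODE, SPAN_STRING, SPAN_COMMENT, SPAN_SPACE } span_kind;

/* Hypothetical hook: called once per span, [start, end) in bytes. */
typedef void (*span_hook)(span_kind kind, size_t start, size_t end, void *ctx);

static int is_special(char c)
{
    return c == '#' || c == '"' || c == ' ' || c == '\t' || c == '\n';
}

/* Toy tokenizer: walks the buffer and reports every span to the hook. */
void scan_spans(const char *src, size_t len, span_hook hook, void *ctx)
{
    size_t i = 0;
    while (i < len) {
        size_t start = i;
        span_kind kind;
        char c = src[i];
        if (c == '#') {                        /* comment to end of line */
            while (i < len && src[i] != '\n')
                i++;
            kind = SPAN_COMMENT;
        } else if (c == '"') {                 /* "..." with \ escapes */
            i++;
            while (i < len && src[i] != '"') {
                if (src[i] == '\\' && i + 1 < len)
                    i++;
                i++;
            }
            if (i < len)
                i++;                           /* include the closing quote */
            kind = SPAN_STRING;
        } else if (c == ' ' || c == '\t' || c == '\n') {
            while (i < len && (src[i] == ' ' || src[i] == '\t' || src[i] == '\n'))
                i++;
            kind = SPAN_SPACE;
        } else {                               /* anything else counts as code */
            while (i < len && !is_special(src[i]))
                i++;
            kind = SPAN_CODE;
        }
        hook(kind, start, i, ctx);
    }
}

/* Example hook: record spans so an editor can colour them later. */
typedef struct { span_kind kind; size_t start, end; } span;
typedef struct { span items[64]; size_t n; } span_list;

void collect_span(span_kind kind, size_t start, size_t end, void *ctx)
{
    span_list *list = ctx;
    assert(start <= end);
    if (list->n < sizeof list->items / sizeof list->items[0]) {
        list->items[list->n].kind = kind;
        list->items[list->n].start = start;
        list->items[list->n].end = end;
        list->n++;
    }
}
```

Scanning `$foo = \"abc"; # hi` this way reports the backslash as a one-byte code span and `"abc"` as a string span covering offsets 8 to 13.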

I don't think this is quite the same job as rebuilding the script from
the syntax tree, as you need to preserve (at least)
whitespace, comments, pod, whether they wrote

   die "Erk" if $foo;

or

   if ($foo) {
     die "Erk";
   }

that "Hello World\n" isn't "Hello World
" isn't "\uhello \Uw\LORLD\012"
(even though the last one constant-folds to the same string),
useless constants in void context, and probably some more things that
disappear during compilation.
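One way to state the "preserve everything" requirement is that the reported spans must tile the source: in order, with no gaps and no overlaps, from offset 0 to the end of the buffer. When that holds, concatenating the span texts gives back the original bytes. Whitespace, comments, pod, the choice between a statement modifier and an if block, and the exact spelling of each string literal all survive, whatever the compiler later folds away. Here is a small C check with a hypothetical span type:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { size_t start, end; } span;   /* [start, end) byte offsets */

/* Return 1 if the spans, in order, cover [0, len) exactly once. */
int spans_tile_source(const span *spans, size_t n, size_t len)
{
    assert(spans != NULL || n == 0);
    size_t pos = 0;
    for (size_t k = 0; k < n; k++) {
        /* each span must start where the previous one ended */
        if (spans[k].start != pos || spans[k].end < spans[k].start)
            return 0;
        pos = spans[k].end;
    }
    return pos == len;   /* and the last one must reach the end of the buffer */
}
```

A syntax-tree dump would fail this check by design, which is the difference between the two jobs.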

I'm assuming that, for performance reasons, a parser-beast written in C would
have the code for this conditionally compiled (like the -DDEBUGGING stuff),
so that serious production perl wouldn't suffer the slowdown, but the perl
you embed in your favourite editor would have it. It could even tell you the
precise character (rather than just the line number) that it barfed on, and
you could run perl -c on your script without it ever leaving your editor's buffer.
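The conditional-compilation idea could look something like this C sketch, which follows the spirit of -DDEBUGGING. PERL_SPAN_HOOKS, REPORT_SPAN, span_hook_calls and the toy scan_words are hypothetical names. With the switch off, REPORT_SPAN only casts its arguments to void, so the optimiser can drop it and a production build makes no hook calls. offset_to_pos shows that reporting a character offset loses nothing, because an editor can always turn it back into a line and column.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a real editor callback: just counts calls. */
unsigned long span_hook_calls = 0;

/* Hypothetical build switch in the spirit of -DDEBUGGING.  Only a perl
   built with -DPERL_SPAN_HOOKS makes hook calls; otherwise the macro just
   casts its arguments to void and the optimiser can drop it. */
#ifdef PERL_SPAN_HOOKS
#  define REPORT_SPAN(start, end) ((void)(start), (void)(end), span_hook_calls++)
#else
#  define REPORT_SPAN(start, end) ((void)(start), (void)(end))
#endif

/* Toy scanner: treats each run of non-space characters as one span. */
size_t scan_words(const char *src)
{
    size_t words = 0, i = 0;
    while (src[i] != '\0') {
        while (src[i] == ' ')
            i++;
        if (src[i] == '\0')
            break;
        size_t start = i;
        while (src[i] != '\0' && src[i] != ' ')
            i++;
        REPORT_SPAN(start, i);
        words++;
    }
    return words;
}

/* 1-based line and column, so an editor can jump straight to an error. */
typedef struct { size_t line, column; } src_pos;

src_pos offset_to_pos(const char *src, size_t offset)
{
    assert(src != NULL);
    src_pos p = {1, 1};
    for (size_t i = 0; i < offset && src[i] != '\0'; i++) {
        if (src[i] == '\n') {
            p.line++;
            p.column = 1;
        } else {
            p.column++;
        }
    }
    return p;
}
```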

I'm not sure how you'd get such information back out through the API, but I
suspect that making this possible would need some new hooks in it.

Nicholas Clark
