Re: Syntax explainer, phase 2: planning

jerry gay Wed, 30 Jan 2008 10:57:45 -0800

On Jan 30, 2008 10:36 AM, Moritz Lenz <[EMAIL PROTECTED]> wrote:
>
> Larry Wall wrote:
> > On Wed, Jan 30, 2008 at 04:08:04PM +0100, Moritz Lenz wrote:
> > : Do you have any idea how I may circumvent the problem?
> > :
> > : I had some thoughts, but none appear to be a good solution:
> > :  * build two trees, one normal AST for the BEGIN block evaluation, and
> > : one parse tree for the markup output.
> > :  * subclass the normal action class, and annotate the AST with enough
> > : information, and as a second step, after all BEGIN blocks were executed,
> > : filter out the interesting information.
> > :  * parse the BEGIN blocks with the normal grammar and action class, and
> > : the rest with the modified action class that emits the markup.
> > :
> > : Actually I have no idea if any of these could work. Any thoughts?
> >
> > From my MAD experiences, I'd say the only guaranteed correct way is to
> > annotate the existing AST, and to make sure that the standard grammar
> > mechanism has all the hooks you need to do that.
>
> Ok, then I'll do that.
>
> Question to the rakudo hackers: are the hooks there yet?
> Start position and end position of the token + token name + key would be
> enough, or start position + a unique key should work as well.
>
well, you may have to dive into PIR to get at it, but it's all there.
for example, see the ws and afterws rules in the rakudo perl 6 grammar
file.


> > The big evil in the
> > Perl 5 parser is that it was continually forgetting things.  It does
> > this by lying to itself about what it saw.  Or in more moderate terms
> > "replace this AST with that AST".  So when you talk about trying to
> > maintain a separate AST, I shudder with horror.  It's impossible.
> > So never replace.  Always augment and annotate.  It will save your
> > sanity, and stop the flame wars about forcing people to program in
> > the One True Language.  Perl 6 is not about that.  It's about being a
> > metalanguage in which you can express many languages, and doing so in
> > a sufficiently controlled fashion that we always know what language
> > any given lexical scope is expressed in.  And if we truly know what
> > language we're parsing at any moment, we can do everything PPI does
> > without much extra work, and without enforcing arbitrary linguistic
> > restrictions.
> >
> > If the current {*} hack is insufficiently powerful for you to
> > annotate the AST correctly, then we need to negotiate a better hack.  :)
>
> I think the {*} hack can be made sufficiently powerful, but it requires
> additional work, for example currently you can't know from looking at $/
>  which token/regex/rule it comes from.
> You can work around it by adding that information in every action
> method, but that's boring work and no fun.
> Maybe a modifier :trace could annotate that automatically?
>
yes, pge is missing the ability to know the name of the rule it's
currently inside. i'd like it because it'd make debugging and error
message generation easier, so it's on my list of things to implement,
but it's not there yet. i suppose a TODO ticket in RT couldn't
hurt....

> > : A second problem is that the information should be accessible for
> > : perldoc. Since the documentation synopsis is indefinitely pending, I
> > : don't really want to rely on perldoc syntax, especially because the data
> > : has to be accessible from the action class.
> > : This could be circumvented by another abstraction layer (for example a
> > : text based DB that contains unique token names and the description, and
> > : that DB could be used both by the action class and to emit some perldoc).
> > : Are there better ideas, perhaps even some that don't introduce more
> > : layers? ;-)
> > :
> > : Any comments are welcome.
> >
> > This seems to me to primarily be a naming problem, and the AST gives
> > you the naming path to get to any particular node.
>
> Not in the detail level that I want, no. At least not in the general case.
>
> You can't know from the AST if something was matched by <foo=bar> or by
> <bar>, and any closure can make() $/ something completely different.
> And (<.foo>) leaves no trace that could be used to identify the matching
> regex.
>
> I don't know if that's a problem in reality, or just an academic one.
>
> I just ran
> ../../parrot perl6.pbc --target=past t/01-sanity/02-counter.t
>  and it seems that I'm able to reconstruct the basic structure (I can
> identify operators and variables and their position in the source code,
> for example), but it stores variables this way:
>
> PMC 'PAST::Var'  {
>     <name> => "$counter"
>     <viviself> => "Undef"
>     <source> => "$counter"
>     <pos> => 192
> }
>
> That's probably all you need for the compiler, but it doesn't go into
> the details, for example that '$counter' is made of a sigil and an
> identifier.
> Is it overkill for a normal compilation to keep that information? Or
> could we add that?
> Or is such a detail level overkill even for a syntax explainer?
>
your problem here is too much abstraction. by the time you're dealing
with an abstract syntax tree, you've lost some syntactic info. you want
--target=parse instead, which will give you a parse tree including
sigil-twigil info, etc.

~jerry
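
The thread's two recurring points (keep sigil-level detail in the parse
tree, and annotate nodes rather than replace them) can be illustrated
with a small sketch. This is Python rather than PIR or Perl 6, and it is
not PGE's or rakudo's API: the Node class, its field names, the
simplified VARIABLE pattern, and the annotate() helper are all
hypothetical, chosen only to show the shape of the data a syntax
explainer would need (rule name, start position, end position).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One parse-tree node. All field names here are hypothetical."""
    rule: str    # name of the rule that matched
    start: int   # offset of the first matched character
    end: int     # offset one past the last matched character
    text: str    # the matched source text
    children: list = field(default_factory=list)
    notes: dict = field(default_factory=dict)  # annotations added later


# Deliberately simplified stand-in for a "variable" rule:
# a sigil, an optional twigil, and an identifier.
VARIABLE = re.compile(
    r"(?P<sigil>[$@%&])(?P<twigil>[*?!^.]?)(?P<identifier>[A-Za-z_]\w*)"
)


def parse_variable(source, pos=0):
    """Match a variable at pos and keep its parts as child nodes."""
    m = VARIABLE.match(source, pos)
    if m is None:
        return None
    node = Node("variable", m.start(), m.end(), m.group(0))
    for name in ("sigil", "twigil", "identifier"):
        if m.group(name):  # skip the twigil when it is absent
            node.children.append(
                Node(name, m.start(name), m.end(name), m.group(name))
            )
    return node


def annotate(node, descriptions):
    """Attach explanations in place; nodes are augmented, never replaced."""
    if node.rule in descriptions:
        node.notes["doc"] = descriptions[node.rule]
    for child in node.children:
        annotate(child, descriptions)
    return node
```

Calling parse_variable("say $counter;", 4) yields a "variable" node
whose children are a "sigil" node and an "identifier" node, each with
its own start and end offsets. An AST-level node like the PAST::Var dump
quoted above keeps only the name and a single position. Because
annotate() only writes into the notes dictionary, the tree a compiler
would consume stays untouched while the explainer layers its own
information on top.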
