Hi Kilon,

2015-02-11 8:24 GMT+01:00 kilon alios <kilon.al...@gmail.com>:

> Ok, so after rereading the tutorial and testing again and again, I think I
> have finally managed to understand how SmaCC really works, and I was
> successful in converting simple Python lists to Pharo arrays and ordered
> collections.
>
> The tricky part now is to apply this knowledge to complex Python types
> like multi-dimensional lists, tuples and dictionaries. I understand that
> the visitor allows me to visit specific object instances each time they
> are found in the AST. But because I want to walk the AST in order to build
> multi-dimensional ordered collections, I need something more, or maybe my
> understanding of the visitor pattern is flawed.
>
> The problem I am having here is that each Python type I parse is not
> necessarily represented by a different kind of node. For example, whether
> it's a list, a tuple or a dictionary, the same class, PyAtomNode, is used.
> In order to differentiate between those different Python types, PyAtomNode
> has instance variables for the left and right brackets, parentheses, and
> curly braces. So my initial thinking is to check whether those instance
> variables are nil, and from that conclude which Python type I am parsing.
>
> So I can perform simple ifs that check whether the instance variable is nil
> or not, but the question is whether my strategy is a good one or a bad one.
>

Well, I see three things in what you want to achieve.

The first one is that checking the node (whether it has parentheses,
brackets, etc.) is a good way of determining its type during the visitor
traversal. You could extend PyAtomNode directly with selectors such as
isArray, isDictionary, etc., to cope with the fact that the parser puts
different objects under the PyAtomNode umbrella.
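In Pharo this would be a handful of extension methods on PyAtomNode. As a language-neutral sketch (written in Python, since that is the language being parsed), assuming a hypothetical node class whose delimiter slots hold the token or None (nil) when absent, the predicates could look like this. The slot and method names here are illustrative, not the generated SmaCC ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PyAtomNode:
    """Hypothetical stand-in for SmaCC's PyAtomNode: only the delimiter
    token slots are modelled; a slot is None when the token is absent."""
    lparen: Optional[str] = None
    rparen: Optional[str] = None
    lbrack: Optional[str] = None
    rbrack: Optional[str] = None
    lcurly: Optional[str] = None
    rcurly: Optional[str] = None

    def is_list(self) -> bool:
        # '[ ... ]': the left bracket token is present
        return self.lbrack is not None

    def is_tuple(self) -> bool:
        # '( ... )': simplification, since '(x)' is only a parenthesized
        # expression in Python; a real check would also look for a comma
        return self.lparen is not None

    def is_dictionary(self) -> bool:
        # '{ ... }': simplification, since set displays share the braces
        return self.lcurly is not None
```

Usage: `PyAtomNode(lbrack="[", rbrack="]").is_list()` is true, and the visitor can branch on these predicates instead of testing the slots inline.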

The second one is that I did most of the AST generation with a crude,
get-rid-of-the-warnings approach (and it took me long enough). Now that you
have a better understanding of your requirements, it may be a good idea to
revisit the grammar and ensure that the right type of node is generated. For
example, take the current atom productions:

atom:
    <lparen> <rparen> {{}}
    | <lparen> yield_expr 'list' <rparen> {{}}
    | <lparen> testlist_comp 'list' <rparen> {{}}
    | <lbrack> <rbrack> {{}}
    | <lbrack> listmaker 'list' <rbrack> {{}}
    | <lcurly> dictorsetmaker 'list' <rcurly> {{}}
    | <lcurly>  <rcurly> {{}}
    | "`" testlist1 'list' "`" {{BackTick}}
    | <name>  {{Symbol}}
    | <number> {{Number}}
    | strings
    ;

What you see is that, with the {{}}, I create PyAtomNode instances for all
productions, even where that isn't appropriate. Maybe this should be changed
like this for lists:

    | <lbrack> <rbrack> {{List}}
    | <lbrack> listmaker 'list' <rbrack> {{List}}

That way I get a PyListNode when I parse '[ ]'. It is just a matter of
tuning the AST generation code so that it produces the nodes you need, and at
the moment you are the right person to do so, since you're molding it to
your needs.

And the last one is about the visitor. For complex processing like the
transformations you intend, I see two strategies: a builder inside the
visitor with a stack/context strategy, so that you can recurse in your visit
of the AST and add elements to the right collection; or a simple recursion
that merges the results of the lower visits (when in a List node, collect
the visits of all the children as an Array or as an OrderedCollection).
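A minimal sketch of both strategies, again in Python and assuming hypothetical node classes (NumberNode, ListNode, DictNode) that stand in for the PyListNode-style nodes a tuned grammar would produce. The dispatch by class name is only a simple stand-in for the double dispatch a SmaCC visitor performs; Python lists and dicts stand in for Pharo Arrays/OrderedCollections and Dictionaries.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


# Hypothetical AST node classes (not the generated SmaCC ones).
@dataclass
class NumberNode:
    value: int


@dataclass
class ListNode:
    items: List[Any] = field(default_factory=list)


@dataclass
class DictNode:
    pairs: List[Tuple[Any, Any]] = field(default_factory=list)


class MergingVisitor:
    """Recurse and merge: every visit returns a value, and container
    visits combine the values returned by visiting their children."""

    def visit(self, node):
        return getattr(self, "visit_" + type(node).__name__)(node)

    def visit_NumberNode(self, node):
        return node.value

    def visit_ListNode(self, node):
        return [self.visit(child) for child in node.items]

    def visit_DictNode(self, node):
        return {self.visit(k): self.visit(v) for k, v in node.pairs}


class StackBuilderVisitor:
    """Builder with a stack: visits return nothing; the top of the stack
    is the collection currently being filled."""

    def __init__(self):
        self.stack = [[]]  # bottom entry receives the top-level result

    def add(self, value):
        self.stack[-1].append(value)

    def visit(self, node):
        getattr(self, "visit_" + type(node).__name__)(node)

    def visit_NumberNode(self, node):
        self.add(node.value)

    def visit_ListNode(self, node):
        self.stack.append([])          # open a new collection
        for child in node.items:
            self.visit(child)
        self.add(self.stack.pop())     # close it and add it to its parent

    def visit_DictNode(self, node):
        self.stack.append([])          # collect keys and values alternately
        for k, v in node.pairs:
            self.visit(k)
            self.visit(v)
        flat = self.stack.pop()
        self.add(dict(zip(flat[0::2], flat[1::2])))

    def result(self):
        return self.stack[0][0]
```

For a tree such as `ListNode([NumberNode(1), ListNode([NumberNode(2), NumberNode(3)])])`, both visitors yield the nested result `[1, [2, 3]]`. The merging version is usually the simpler one; the stack version is handy when a visit needs to know which enclosing collection it is contributing to.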


>
> I could define my own syntax to simplify the AST, including different
> nodes for different Python types, because from the looks of it, it seems
> a bit too verbose for my needs. On the other hand, I am not so sure,
> because in the future my needs may become more verbose too.
>

I believe this is approach two above: you can mold the grammar to your
needs. As John Brant told me: an AST is not a parse tree. So you can, and
should, adapt the AST generation in the way that makes sense to you.


>
> So I am very close and ready to create my full python types converter for
> pharo but I wanted some good advice before wasting time on something that
> is not efficient.
>

I believe you are on the right path, if my explanations made sense :)


>
>
> By the way, Thierry, I have to agree with you: SmaCC is a very capable
> parser. I also like the use of regex syntax; it makes it uglier compared to
> PetitParser, but I prefer the compact regex syntax to having to define and
> browse tons of classes and send tons of messages. Also, the Python support
> is very good, and I am impressed by how easily SmaCC can parse whole Python
> applications, since some of the tests are very complex. Well done, great work!
>

Thanks to the help of all who have worked on Python parsing before and with
me: having the large test base I inherited from the previous parser is a
huge benefit. Happy that you like it: implementing the indent/dedent tokens
was the most interesting part in there.

If you start changing the grammar as suggested above, fork it and send pull
requests on GitHub :)

Thierry
