I'm not surprised by this.  I found the same thing in developing my 
bookmarks manager.  Conceptually the structure of the bookmarks collection 
is tree-like, but it can be much easier to flatten that structure into text 
and use string methods on the flattened structure.

OTOH, there has been theoretical and (somewhat?) practical work on regular 
expressions for trees.  E.g.,

Implementing Regular Tree Expressions 
<https://theory.stanford.edu/~aiken/publications/papers/fpca91.pdf> (very 
tough going)
Tregex, Tsurgeon and Semgrex 
<https://nlp.stanford.edu/software/tregex.shtml>

If you can convert the AST trees to XML form, then there are XSLT, XPATH, 
and perhaps best of all, XQuery.  And if you can convert trees into RDF 
graphs, some implementations of SPARQL 
<https://www.w3.org/TR/rdf-sparql-query/> can efficiently query and 
transform documents with millions of RDF triplets.


On Monday, May 8, 2023 at 9:16:08 AM UTC-4 Edward K. Ream wrote:

> Yesterday I discovered a dead simple *design* pattern for finding *code* 
> patterns in python parse trees.
>
> This pattern is a milestone. It changes forever how I'll use python's ast 
> <https://docs.python.org/3/library/ast.html> module. Old techniques now 
> seem like trying to swat flies with a sledgehammer!
>
> The Aha is as surprising as the Aha that created @clean. I'll never forget 
> it.
>
> *tl;dr: *Convert *parts* of parse trees to strings using ast.unparse.
>
> *Background*
>
> I have been revisiting an old problem: how to analyze Leo's code. The 
> previous state of the art was unsatisfactory. There were two approaches, 
> each flawed:
>
> 1. Use regex patterns. This approach works for prototyping, but there is 
> no way to ignore false matches in strings, docstrings, and comments.
>
> 2. Use parse trees. This approach is sound, but finding patterns in ast 
> nodes was cumbersome.
>
> How to get the best of both approaches?
>
> Finally, my work with the rope project made me familiar with a new design 
> pattern: small, special-purpose tree traversers based on ast.NodeVisitor 
> <https://docs.python.org/3/library/ast.html#ast.NodeVisitor>. 
>
> *The Problem*
>
> I wanted a program to find all chains of ast Attribute nodes within Leo. 
> Examples are g.app.gui and c.frame.body.wrapper.widget.
>
> I packaged this program as the TestChains.test_all_paths unit test in 
> leo/unittests/test_design.py. This is a new file: see PR #3319 
> <https://github.com/leo-editor/leo-editor/pull/3319>.
>
> Early attempts were clumsy, as before. ast.Attribute trees can contain 
> much more than ast.Name and ast.Attribute nodes.
>
>
> *The Aha: use ast.unparse*
>
> I had vaguely planned to experiment with ast.unparse 
> <https://docs.python.org/3/library/ast.html#ast.unparse>. Given an ast 
> node/tree, ast.unparse returns a string with code that would produce an 
> equivalent node/tree.
>
> Aha: This string is exactly what we need to discover patterns in parse 
> trees!!!
>
> Here is the *complete* traverser class that discovers all chains:
>
> class ChainsTraverser(NodeVisitor):
>     chains_set = set()
>
>     def visit_Attribute(self, node):
>         """
>         Add only top-level Attribute chains to chains_set.
>         Do *not* call generic_visit!
>         """
>         chain = ast.unparse(node)
>         self.chains_set.add(chain)
>
> OMG! This is amazing:
>
> - unparse returns the *entire* chain, no matter how complex.
> - visit_Attribute handles only top-level attributes because it *doesn't* 
> call ast.generic_visit 
> <https://docs.python.org/3/library/ast.html#ast.NodeVisitor.generic_visit>
> .
>
> A straightforward unit test (TestChains.test_all_paths) instantiates 
> ChainsTraverser once, calls the traverser for most of Leo's core files, and 
> prints/tests ChainsTraverser.chains_set. Nothing could be simpler.
>
> *Summary*
>
> Finding patterns in ast trees is straightforward:
>
> - Use ast visitors to find the *roots* of ast trees containing potential 
> patterns.
> - Convert those roots to strings with ast.unparse.
> - Use regexs to filter the results. See the PostScript.
>
> This Aha changes my whole approach to analyzing ast trees.
>
> Edward
>
> P.S. TestChains.test_all_paths filters the resulting chains as follows:
>
> array_pat = re.compile(r'(\[).*?(\])')
> call_pat = re.compile(r'(\().*?(\))')
> string_pat1 = re.compile(r"(\').*?(\')")
> string_pat2 = re.compile(r'(\").*?(\")')
> patterns = (array_pat, call_pat, string_pat1, string_pat2)
>
> def filter_chain(s: str) -> str:
>         
>     def repl(m):
>         return m.group(1) + m.group(2)
>
>     for pattern in patterns:
>         s = re.sub(pattern, repl, s)
>     return s
>
> This kind of post-processing would be impossible if the operands were ast 
> trees.
>
> Edward
>

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to leo-editor+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/8dd552b3-b315-4e79-bf87-4eb85e93370dn%40googlegroups.com.

Reply via email to