This is more or less what I was thinking of when I brought up Python's tokenizer on the earlier thread. The tokenizer seems to function much like a SAX parser in the XML world. It basically emits a stream of events as it works through a Python file. Most of them wouldn't be of interest for Leo-izing a file, but the whitespace detection and the events of defs and classes, etc., are just what are wanted here.
On Wednesday, December 1, 2021 at 5:26:48 PM UTC-5 vitalije wrote: > Idea: > use tokenize python module to find where all function/method and class > definitions start, > and then use this data to find lines where top level children should > start. After creating the top level children, the process can be repeated > for all nodes which have more than certain threshold number of lines, > generating the second level children. > > A little bit of background story (feel free to skip if you just want to > see the example code): > A long ago I've tried to solve this problem in more efficient way for > importing JavaScript files. I remember looking in the Importer class and > the way Leo did imports at the time and feeling that it was too > complicated, much more than necessary. I can't say that I've solved this > problem in general, but for a very specific case, it worked pretty well. > > Recent posts about improving Leo in this area, especially regarding > Python, made me think again about this problem. > > I strongly feel that the main problem with the current implementation is > insisting on the use of scan_line. This is maybe suitable for unification > of all other source languages, but it is far from the optimal when we talk > about the python source files. > > The way I see this problem is to search and find the lines where a new > node should start. Whether this node should be indented or not, I would > rather leave for the next phase. First of all, the outline structure of my > python files which I start from the scratch in Leo usually have in the top > level node a few lines, then comes at-others and usually after at-others > comes the block with `if __name__ == '__main__':. If I have a lot of > imports, then I usually put all imports in one section node `<<imports>>`. > > The first line where I would introduce the first child node is actually > first function definition or first class definition. Everything before > should go directly in the root node. Imports can be extracted later. > > Attached to this message is a python script which can be executed inside > Leo. > It imports any python module from the standard python library and checks > to see if the import is perfect or not. At the moment it just extracts top > level definitions in separate nodes, the direct children of root. > -- You received this message because you are subscribed to the Google Groups "leo-editor" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/leo-editor/e2dde3c1-eb0a-4945-a947-5f684b2eb2d6n%40googlegroups.com.
