This is more or less what I had in mind when I brought up Python's 
tokenizer on the earlier thread.  The tokenizer works much like a SAX 
parser in the XML world: it emits a stream of events (tokens) as it works 
through a Python file.  Most of them wouldn't be of interest for 
Leo-izing a file, but the whitespace detection and the events for defs, 
classes, etc., are just what is wanted here.
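
To make the "stream of events" point concrete, here is a minimal sketch using the standard tokenize module. The sample source and the helper name definition_events are made up for illustration; this is not code from Leo or from the script mentioned in the quoted message.

```python
import io
import tokenize

# Hypothetical sample input, only for illustration.
SOURCE = '''\
import os

def spam(x):
    return x + 1

class Eggs:
    def method(self):
        pass
'''

def definition_events(source):
    """Yield (kind, name, line, column) for each def/class found in source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    prev = None
    for tok in tokens:
        # A NAME token 'def' or 'class' followed by another NAME token
        # marks the start of a definition.
        if (prev is not None
                and prev.type == tokenize.NAME
                and prev.string in ('def', 'class')
                and tok.type == tokenize.NAME):
            yield prev.string, tok.string, prev.start[0], prev.start[1]
        prev = tok

events = list(definition_events(SOURCE))
for kind, name, line, col in events:
    print(f"{kind} {name}: line {line}, column {col}")
```

The column of the def/class keyword tells top-level definitions (column 0) from nested ones. The same stream also carries INDENT and DEDENT tokens, which is where the whitespace information would come from.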

On Wednesday, December 1, 2021 at 5:26:48 PM UTC-5 vitalije wrote:

> Idea:
> use the Python tokenize module to find where all function/method and 
> class definitions start,
> and then use this data to find the lines where top level children should 
> start. After creating the top level children, the process can be repeated 
> for all nodes which have more than a certain threshold number of lines, 
> generating the second level children.
>
> A little bit of background (feel free to skip if you just want to 
> see the example code):
> A long time ago I tried to solve this problem in a more efficient way for 
> importing JavaScript files. I remember looking at the Importer class and 
> the way Leo did imports at the time and feeling that it was too 
> complicated, much more than necessary. I can't say that I solved this 
> problem in general, but for a very specific case, it worked pretty well.
>
> Recent posts about improving Leo in this area, especially regarding 
> Python, made me think again about this problem.
>
> I strongly feel that the main problem with the current implementation is 
> insisting on the use of scan_line. This may be suitable for unifying the 
> handling of all other source languages, but it is far from optimal when 
> it comes to Python source files.
>
> The way I see this problem is to search and find the lines where a new 
> node should start. Whether this node should be indented or not, I would 
> rather leave for the next phase. First of all, the outline structure of the 
> Python files which I start from scratch in Leo usually has a few lines in 
> the top level node, then comes at-others, and usually after at-others 
> comes the block with `if __name__ == '__main__':`. If I have a lot of 
> imports, then I usually put all imports in one section node `<<imports>>`.
>
> The first line where I would introduce the first child node is actually 
> the first function definition or the first class definition. Everything 
> before it should go directly in the root node. Imports can be extracted later.
>
> Attached to this message is a Python script which can be executed inside 
> Leo.
> It imports any Python module from the standard library and checks 
> to see if the import is perfect or not. At the moment it just extracts top 
> level definitions into separate nodes, the direct children of the root.
>
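
The first step of the quoted idea can be sketched roughly as follows. This is not the attached script, only a minimal illustration under simplifying assumptions: the names top_level_starts and split_top_level are made up, decorators and "async def" are ignored, and whatever follows the last definition (such as an `if __name__ == '__main__':` block) simply stays in the last chunk.

```python
import io
import tokenize

def top_level_starts(source):
    """Return 1-based line numbers of def/class keywords found at column 0."""
    starts = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if (tok.type == tokenize.NAME
                and tok.string in ('def', 'class')
                and tok.start[1] == 0):
            starts.append(tok.start[0])
    return starts

def split_top_level(source):
    """Split source into (head, chunks).

    head is everything before the first top-level def/class (root node text);
    each chunk runs from one top-level definition to the line before the next.
    """
    # Split lines the same way the tokenizer reads them.
    lines = io.StringIO(source).readlines()
    starts = top_level_starts(source)
    if not starts:
        return source, []
    head = ''.join(lines[:starts[0] - 1])
    bounds = starts + [len(lines) + 1]
    chunks = [''.join(lines[a - 1:b - 1]) for a, b in zip(bounds, bounds[1:])]
    return head, chunks

# Hypothetical sample input, only for illustration.
SOURCE = '''\
import os

def spam(x):
    return x + 1

class Eggs:
    def method(self):
        pass

if __name__ == '__main__':
    print(spam(1))
'''

head, chunks = split_top_level(SOURCE)
# A "perfect import" check: the pieces must add up to the original text.
round_trip_ok = (head + ''.join(chunks) == SOURCE)
```

The second level could presumably be produced by applying the same kind of split, at a deeper column, to any chunk longer than the chosen threshold.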
