Terry J. Reedy <tjre...@udel.edu> added the comment:

As noted in the test for find_good_parse_start and the PR5755 discussion, a 
single-line header on a line by itself in a multiline string before a multiline 
header can prevent recognition of the latter.
 
>>> P.set_str("'somethn=ig'\ndef g(a,\nb\n")        
>>> P.find_good_parse_start(lambda i: False) 
13
>>> P.set_str("'''\ndef f():\n pass'''\ndef g(a,\nb\n")     
>>> P.find_good_parse_start(lambda i:i < 15)
>>>

One alternative to the current algorithm would be to search the beginning of 
every line for a compound statement keyword, not just lines ending with ':'.  I 
believe the concern is that this would require uselessly checking more lines 
within strings.  I believe that the same concern is why 'if' and 'for' are 
missing from the keyword list.
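
A rough sketch of that alternative, to make the idea concrete.  This is not 
the pyparse code: the keyword tuple, the regex and the name candidate_starts 
are made up for illustration, and is_char_in_string stands in for the callback 
that find_good_parse_start receives.

```python
import re

# Hypothetical keyword list for illustration; not the list pyparse uses.
KEYWORDS = ("class", "def", "elif", "else", "except", "finally",
            "for", "if", "try", "while", "with")

# Match a keyword at the start of any line, whether or not the line ends in ':'.
_line_start_kw = re.compile(
    r"^[ \t]*(?:%s)\b" % "|".join(KEYWORDS), re.MULTILINE)

def candidate_starts(code, is_char_in_string):
    """Return offsets of lines that begin with a compound-statement keyword.

    Lines whose first character is reported as being inside a string
    are skipped.
    """
    starts = []
    for m in _line_start_kw.finditer(code):
        if not is_char_in_string(m.start()):
            starts.append(m.start())
    return starts
```

With the second example above and the same in-string function, this reports 
the start of 'def g(a,' even though that line does not end with ':'.  The cost 
is one in-string check per keyword line, which is the concern mentioned.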

When the window is an editor rather than a shell, find_good_parse_start is called 
in EditorWindow.newline_and_indent_event and HyperParser.__init__.  The 
call-specific in-string function is returned by EW._build_char_in_string_func. 
It calls EW.is_char_in_string, which returns False only if the char in the text 
widget has been examined by the colorizer and not tagged with STRING.

The call to find_good_parse_start is always followed by a call to set_lo and 
then _study1 (via a call to another function).  _study1 replaces runs of 
non-essential chars with 'x', which means that string literals within the code 
string are mostly reduced to a single x per line.  (It would be fine if they 
were emptied except for newlines.)  This suggests starting 
find_good_parse_start with a partial reduction, of just string literals, saved 
for further reduction by _study, so that keywords would never occur within the 
reduced literal.
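
A minimal sketch of such a partial reduction, assuming the scan starts outside 
any string literal.  The function name blank_strings is hypothetical and this 
is not how _study1 is written.  It keeps the string length and the newlines, 
so offsets into the reduced string match offsets into the original.  Quotes 
nested inside f-string replacement fields are not handled.

```python
def blank_strings(code):
    """Replace the contents of string literals with 'x', keeping newlines.

    Assumes code[0] is not inside a string literal.
    """
    out = []
    i, n = 0, len(code)
    while i < n:
        ch = code[i]
        if ch == "#":
            # Copy a comment unchanged, up to but not including the newline.
            j = code.find("\n", i)
            j = n if j < 0 else j
            out.append(code[i:j])
            i = j
        elif ch in "'\"":
            # Opening delimiter: triple quote if present, else single.
            quote = code[i:i+3] if code[i:i+3] in ("'''", '"""') else ch
            out.append(quote)
            i += len(quote)
            while i < n and not code.startswith(quote, i):
                if code[i] == "\\" and i + 1 < n:
                    # Blank an escape pair, but keep an escaped newline.
                    out.append("x")
                    out.append("\n" if code[i+1] == "\n" else "x")
                    i += 2
                elif code[i] == "\n":
                    if len(quote) == 1:
                        break  # unterminated one-line string ends here
                    out.append("\n")
                    i += 1
                else:
                    out.append("x")
                    i += 1
            if code.startswith(quote, i):
                # Copy the closing delimiter.
                out.append(quote)
                i += len(quote)
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

After this pass, 'def f():' inside the triple-quoted string of the second 
example is gone, so a keyword search cannot match it.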

The problem is that one cannot tell for sure whether ''' or """ is the 
beginning or end of a multiline literal without parsing from the beginning of 
the code (which colorizer does).  An alternate way to reuse the colorizer work 
might be to use splitlines on the code and then get all string tag ranges.
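
One way the splitlines idea might look, as a sketch only.  It assumes the 
STRING tag ranges have already been fetched from the text widget and 
converted to pairs of Tk-style "line.column" strings (1-based lines, 0-based 
columns, end exclusive), and that the code string starts at line 1 of the 
widget.  The name make_in_string_func is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

def make_in_string_func(code, string_ranges):
    """Build an offset -> bool function from Tk-style string ranges.

    string_ranges is a list of ("line.col", "line.col") pairs with the
    end index exclusive.
    """
    lines = code.splitlines(keepends=True)
    # line_starts[k] is the offset of the first char of line k+1.
    line_starts = [0] + list(accumulate(len(line) for line in lines))

    def to_offset(index):
        line, col = index.split(".")
        return line_starts[int(line) - 1] + int(col)

    spans = sorted((to_offset(a), to_offset(b)) for a, b in string_ranges)
    starts = [s for s, _ in spans]

    def is_char_in_string(offset):
        # Find the last span starting at or before offset.
        k = bisect_right(starts, offset) - 1
        return k >= 0 and offset < spans[k][1]

    return is_char_in_string
```

The resulting function needs no widget access per call, only a bisect over 
the precomputed spans.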

The code-context option picks out compound-statement header lines.  When 
enabled, I believe that its last line may be the desired good parse start line.

Any proposed speedup should be tested by parsing multiple chunks of multiple 
stdlib modules.
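
A sketch of what such a test harness might look like.  The names chunks and 
time_finder are hypothetical, and finder is any callable taking a code string, 
for instance a wrapper around the current or a proposed find_good_parse_start. 
Each chunk is a prefix of the module source, to imitate parsing up to some 
cursor position.

```python
import inspect
import time

def chunks(source, size=50):
    """Yield prefixes of source that grow by `size` lines each time."""
    lines = source.splitlines(keepends=True)
    for end in range(size, len(lines) + size, size):
        yield "".join(lines[:end])

def time_finder(finder, modules, size=50):
    """Return (number of chunks, total seconds) for finder over modules."""
    count = 0
    total = 0.0
    for mod in modules:
        source = inspect.getsource(mod)
        for chunk in chunks(source, size):
            t0 = time.perf_counter()
            finder(chunk)
            total += time.perf_counter() - t0
            count += 1
    return count, total
```

For example, time_finder(f, [textwrap, argparse]) after importing those 
modules would give one total per candidate f to compare.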

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32880>
_______________________________________