Re: What to use for finding as many syntax errors as possible.

2022-11-08 Thread Alex Hall

On Sunday, October 9, 2022 at 12:09:45 PM UTC+2, Antoon Pardon wrote:
> I would like a tool that tries to find as many syntax errors as possible
> in a Python file. I know there is the risk of false positives when a
> tool tries to recover from a syntax error and proceeds, but I would
> prefer that over the current Python strategy of quitting after the first
> syntax error. I just want a tool for syntax errors. No style
> enforcement. Any recommendations? -- Antoon Pardon

Bit late here, coming from the PyCoder's Weekly email newsletter, but I'm
surprised that I don't see any mentions of 
[parso](https://parso.readthedocs.io/en/latest/):

> Parso is a Python parser that supports error recovery and round-trip parsing 
> for different Python versions (in multiple Python versions). Parso is also 
> able to list multiple syntax errors in your python file.
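
For what it's worth, here is a minimal sketch of how that looks in practice, assuming parso is installed (`pip install parso`). `load_grammar()`, `Grammar.parse()` and `Grammar.iter_errors()` come from parso's documented API; the helper name `list_syntax_errors` is made up for illustration.

```python
def list_syntax_errors(source):
    """Return (line, column, message) for every issue parso reports."""
    import parso  # third-party; imported lazily so the sketch loads without it

    grammar = parso.load_grammar()   # grammar for the running Python version
    module = grammar.parse(source)   # error recovery is on by default
    return [
        (issue.start_pos[0], issue.start_pos[1], issue.message)
        for issue in grammar.iter_errors(module)
    ]
```

Calling `list_syntax_errors("foo +\nbar\ncontinue\n")` should give one tuple per issue rather than stopping at the first. As the original poster expected, some of the later reports may be consequences of earlier ones.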

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-12 Thread Peter J. Holzer

On 2022-10-11 14:11:56 -0400, Thomas Passin wrote:
> To bring things back to the context of the original post, actual web
> browsers are extremely tolerant of HTML syntax errors (including incorrect
> nesting of tags) in the documents they receive.

HTML5 actually specifies exactly how to recover from errors. So since
every sequence of bytes results in a well-defined DOM tree you might
argue (a bit tongue in cheek) that there are no syntax errors in HTML5.
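
A small illustration of that tolerance, using Python's standard `html.parser`. Note the assumption: `html.parser` is a lenient event-based parser, not an implementation of the HTML5 tree-construction rules, so this only shows that malformed nesting is accepted without complaint. The `TagLogger` class and `tag_events` helper are hypothetical names for this sketch.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record start/end tags as they are seen, without validating nesting."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    """Feed markup to the parser and return the recorded tag events."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


# Misnested <b>/<i> and a never-closed <p>: no exception is raised.
events = tag_events("<p><b>bold <i>both</b> italic</i>")
```

A spec-following HTML5 library would go one step further and build the well-defined DOM tree for the same input.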

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"



Re: What to use for finding as many syntax errors as possible.

2022-10-12 Thread Peter J. Holzer

On 2022-10-13 11:23:40 +1100, Chris Angelico wrote:
> On Thu, 13 Oct 2022 at 11:19, Peter J. Holzer  wrote:
> > On 2022-10-11 09:47:52 +1100, Chris Angelico wrote:
> > > On Tue, 11 Oct 2022 at 09:18, Cameron Simpson  wrote:
> > > >
> > > Consider:
> > >
> > > if condition # no colon
> > >     code
> > > else:
> > >     code
> > >
> > > To actually "restart" parsing, you have to make a guess of some sort.
> >
> > Right. At least one of the papers on parsing I read over the last few
> > years (yeah, I really should try to find them again) argued that the
> > vast majority of syntax errors is either a missing token, a superfluous
> > token or a combination of the two. So one strategy with good results
> > is to heuristically try to insert or delete single tokens and check
> > which results in the longest distance to the next error.
> >
> > Checking multiple possible fixes has its cost, especially since you have
> > to do that at every error. So you can argue that it is better for
> > productivity if you discover one error in 0.1 seconds than 10 errors in
> > 5 seconds.
> 
> Maybe; but what if you report 10 errors in 5 seconds, but 8 of them
> are spurious? You've reported two useful errors in a sea of noise.
> Even if it's the other way around (8 where you nailed it and correctly
> reported the error, 2 that are nonsense), is it actually helpful?

Humans are pattern-matching animals. It is quite possible that seeing a
bunch of related errors makes the fix more obvious than seeing them in
isolation.

No, I haven't done any studies on this. Yes, it is possible that all
those compiler writers who spent lots of work on error recovery over the
last 50 years (or longer) are delusional.


> > > > I grew up with C and Pascal compilers which would _happily_ produce many
> > > > complaints, usually accurate, and all manner of syntactic errors. They
> > > > didn't stop at the first syntax error.
> > >
> > > Yes, because they work with a much simpler grammar.
> >
> > I very much doubt that. Python doesn't have a particularly complicated
> > grammar, and C certainly doesn't have a particularly simple one.
> >
> > The argument that it's impossible in Python (unlike any other language),
> > because Python is oh so special doesn't hold water.
> >
> 
> Never said it's because Python is special; there are a LOT of
> languages that are at least as complicated.

And almost all of their compilers do try to recover from errors.

> But I do think that Pascal, especially, has a significantly simpler
> grammar than Python does.

Incidentally, Turbo Pascal was the one other example of a compiler which
*didn't* try to recover.

hp


Re: What to use for finding as many syntax errors as possible.

2022-10-12 Thread Chris Angelico

On Thu, 13 Oct 2022 at 11:23, dn  wrote:
> # add an extra character within identifier, as if 'new' identifier
> 28  assert expected_value == fyibonacci_number
> UUU
>
> # these are all trivial SYNTAX errors - could have tried leaving-out a
> # keyword, but ...

Just to be clear, this last one is not actually a *syntax* error -
it's a misspelled name, but contextually, that is clearly a name and
nothing else. These are much easier to report multiples of, and
typical syntax highlighters will do so.

Your other two examples were both syntactic discrepancies though.
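
To make the distinction concrete, here is a deliberately naive sketch: the file parses fine, and the misspelled name only shows up when you look at which names are read but never bound. The helper `unresolved_names` is hypothetical and ignores imports, function parameters and def/class names, so it is nowhere near what a real linter does.

```python
import ast
import builtins


def unresolved_names(source):
    """Return (line, name) for names that are read but never bound in the module.

    Only Store-context names (assignments, loop targets) count as bindings.
    """
    tree = ast.parse(source)  # raises SyntaxError only for real syntax errors
    names = [node for node in ast.walk(tree) if isinstance(node, ast.Name)]
    bound = {n.id for n in names if isinstance(n.ctx, ast.Store)}
    bound.update(dir(builtins))
    return sorted(
        (n.lineno, n.id)
        for n in names
        if isinstance(n.ctx, ast.Load) and n.id not in bound
    )


sample = (
    "expected_value = 1\n"
    "fibonacci_number = 1\n"
    "assert expected_value == fyibonacci_number\n"
)
problems = unresolved_names(sample)
```

Because the parse succeeds, every such name in the file can be reported in one pass, which is why this class of problem is easy to list in bulk.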

ChrisA

Re: What to use for finding as many syntax errors as possible.

2022-10-12 Thread Chris Angelico

On Thu, 13 Oct 2022 at 11:19, Peter J. Holzer  wrote:
>
> On 2022-10-11 09:47:52 +1100, Chris Angelico wrote:
> > On Tue, 11 Oct 2022 at 09:18, Cameron Simpson  wrote:
> > >
> > Consider:
> >
> > if condition # no colon
> >     code
> > else:
> >     code
> >
> > To actually "restart" parsing, you have to make a guess of some sort.
>
> Right. At least one of the papers on parsing I read over the last few
> years (yeah, I really should try to find them again) argued that the
> vast majority of syntax errors is either a missing token, a superfluous
> token or a combination of the two. So one strategy with good results
> is to heuristically try to insert or delete single tokens and check
> which results in the longest distance to the next error.
>
> Checking multiple possible fixes has its cost, especially since you have
> to do that at every error. So you can argue that it is better for
> productivity if you discover one error in 0.1 seconds than 10 errors in
> 5 seconds.

Maybe; but what if you report 10 errors in 5 seconds, but 8 of them
are spurious? You've reported two useful errors in a sea of noise.
Even if it's the other way around (8 where you nailed it and correctly
reported the error, 2 that are nonsense), is it actually helpful? Bear
in mind that, if you can discover one syntax error in 0.1 seconds, you
can do that check *the moment the user types a key* in the editor
(which is more-or-less what happens with most syntax highlighting
editors - some have a small delay to avoid being too noisy with error
reporting, but same difference). Why report false errors when you can
report errors one by one and know that they're true?

> > > I grew up with C and Pascal compilers which would _happily_ produce many
> > > complaints, usually accurate, and all manner of syntactic errors. They
> > > didn't stop at the first syntax error.
> >
> > Yes, because they work with a much simpler grammar.
>
> I very much doubt that. Python doesn't have a particularly complicated
> grammar, and C certainly doesn't have a particularly simple one.
>
> The argument that it's impossible in Python (unlike any other language),
> because Python is oh so special doesn't hold water.
>

Never said it's because Python is special; there are a LOT of
languages that are at least as complicated. Try giving multiple useful
errors when there's a syntactic problem in SQL, for instance. But I do
think that Pascal, especially, has a significantly simpler grammar
than Python does.

ChrisA

Re: What to use for finding as many syntax errors as possible.

2022-10-12 Thread dn


On 09/10/2022 23.09, Antoon Pardon wrote:
> I would like a tool that tries to find as many syntax errors as possible
> in a Python file. I know there is the risk of false positives when a
> tool tries to recover from a syntax error and proceeds, but I would
> prefer that over the current Python strategy of quitting after the first
> syntax error. I just want a tool for syntax errors. No style
> enforcement. Any recommendations? -- Antoon Pardon



I am not sure I have really understood the problem being addressed, because
it seems 'answered' - perhaps the question says more about the tool-set
being utilised...



As someone who used to manually check and re-check code before 
submitting (first punched-cards, and later edited files) source to a 
compiler, it took some re-education to learn what to expect from a 
modern/language-intelligent IDE.


The topic was a major interest back in the days of batch-compilers. Plus 
we had other tools, eg CREF/XREF utilities which produced 
cross-references of identifier usage - and illustrated typos in 
identifiers, usage before value-assignment, etc (per request from one 
respondent).



Using an IDE which is inspecting source-code as it is being typed (or
when an existing file is opened) will firstly suggest what might/should be
typed 'next' (a mixed blessing IMHO!), and secondly highlight errors until
they are noticed and dealt-with. Some, especially warnings, can be
safely ignored - and yes, some are spurious and SHOULD be ignored!


PyCharm* displays a number of indicators. The least intrusive appears in 
the top-right corner of the editor-tab listing, eg 8 errors, 2 warnings. 
So, apparently not 'stopping' at first error found.


Within the source-code itself, there are high-lights and under-lines (in 
and amongst the syntax highlighting presentation/theme) - which I 
suppose are easier to notice during data-entry if one is a touch-typist. 
Accordingly, not much of a context for multiple errors to be committed 
during a single coding-session, but remaining un-noticed until 'the end'.



For illustration, I took a simple tutorial* routine and deliberately 
introduced some/many of the types of error discussed within this thread. 
It would have been ideal to attach a graphic but here are some lines of 
code, under which I have attempted to represent a highlighted character 
(related to the line above) with an "H", and a (red) under-lined token 
with a "U". So, this is a feeble-attempt to show how the source is 
displayed and annotated by the IDE:


# mis-type the tuple-assignment by adding semi-colon
# which might also confuse Python into thinking of a second instruction
17 i, j = 0;, 1
  H  UH

# replace under-line/under-score with space: s/b expected_value
25 for expected value, fibonacci_number in \
   UU  

# mis-type the name of the zip built-in function
26 z ip( SERIES, fibonacci_generator() ):
   U 

# add an extra character within identifier, as if 'new' identifier
28  assert expected_value == fyibonacci_number
   UUU

# these are all trivial SYNTAX errors - could have tried leaving-out a
# keyword, but ...



Assuming the problem is not noticed/handled as the text is being typed, 
and in addition to the coder reviewing the work, recognising problems, 
and dealing with them him-/her-self; the IDE offers two follow-up 
mechanisms:


1 a means to jump 'focus' from the site of one error to the next, 
whereupon a pop-up will describe the error, eg (line 28) "Unresolved 
reference 'expected_value'"; which illustrates one problem in-isolation. 
In this case, line 28 is 'at fault' despite the fact that the 'error' is 
a consequence of THE problem on line 25!


2 a "Problems" Tool Window can be displayed, which will list every error 
and warning, with pretty, colored, icons, and the same message per 
example above, together with the relevant line-number, (the first two 
entries, as-listed, are 'warnings', and the rest are described as "errors"):


Need more values to unpack:17
Statement seems to have no effect:17
# so it has picked-up both of my nefarious intentions

Statement expected, found Py:COMMA:17
# as above
# NB the "Py:COMMA" is from tokenize (per @Chris contribution(s))
'in' expected:25
# logical, but confused by the space
Unresolved reference 'value':25
# pretty-much had no chance with so many faults in one statement!
Unresolved reference 'fibonacci_number':25
# ditto
Unresolved reference 'z':26
# absolutely!
':' expected:26
# evidently re-started after the "in" and did what it could with the "z"
Unresolved reference 'expected_value':28
# it would be "resolved" but for the first error on line 25
Unresolved reference 'fyibonacci_number':28
# ahah! Apparently trying to use an identifier before declaring/defining
# in reality, just another typo
# that said, I created the issue by inserting the "y"
# if I'd mistyped the ent

Re: What to use for finding as many syntax errors as possible.

2022-10-12 Thread Peter J. Holzer

On 2022-10-11 09:47:52 +1100, Chris Angelico wrote:
> On Tue, 11 Oct 2022 at 09:18, Cameron Simpson  wrote:
> >
> Consider:
> 
> if condition # no colon
>     code
> else:
>     code
> 
> To actually "restart" parsing, you have to make a guess of some sort.

Right. At least one of the papers on parsing I read over the last few
years (yeah, I really should try to find them again) argued that the
vast majority of syntax errors is either a missing token, a superfluous
token or a combination of the two. So one strategy with good results
is to heuristically try to insert or delete single tokens and check
which results in the longest distance to the next error.

Checking multiple possible fixes has its cost, especially since you have
to do that at every error. So you can argue that it is better for
productivity if you discover one error in 0.1 seconds than 10 errors in
5 seconds.
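
A rough sketch of that insert-or-delete heuristic, to show where the cost comes from. This is not taken from any of the papers; it is a toy built on `compile()`, it only tries edits on the line where the error is reported, and it splits "tokens" on whitespace. All names here (`CANDIDATE_TOKENS`, `first_error`, `candidate_repairs`, `best_repair`) are made up for illustration.

```python
CANDIDATE_TOKENS = (":", ")", "]", "}", ",")


def first_error(source):
    """Return the SyntaxError for the first problem, or None if it compiles."""
    try:
        compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return exc
    return None


def candidate_repairs(lines, index):
    """Yield (description, patched_lines) for single-token edits on one line."""
    line = lines[index]
    # Insertions: append one plausible missing token to the line.
    for tok in CANDIDATE_TOKENS:
        yield f"append {tok!r}", lines[:index] + [line + tok] + lines[index + 1:]
    # Deletions: drop one whitespace-separated word (a crude stand-in for a token).
    indent = line[: len(line) - len(line.lstrip())]
    words = line.split()
    for i, word in enumerate(words):
        kept = words[:i] + words[i + 1:]
        patched = lines[:index] + [indent + " ".join(kept)] + lines[index + 1:]
        yield f"delete {word!r}", patched


def best_repair(source):
    """Return (line_number, description) of the edit that pushes the next error furthest."""
    err = first_error(source)
    if err is None:
        return None
    lines = source.splitlines()
    index = min(max((err.lineno or 1) - 1, 0), len(lines) - 1)
    best_desc, best_distance = None, -1
    for desc, patched in candidate_repairs(lines, index):
        nxt = first_error("\n".join(patched) + "\n")
        # No further error counts as an infinitely distant one.
        distance = float("inf") if nxt is None else (nxt.lineno or 0)
        if distance > best_distance:
            best_desc, best_distance = desc, distance
    return index + 1, best_desc


broken = "if condition\n    x = 1\nelse:\n    x = 2\n"
suggestion = best_repair(broken)
```

Every candidate costs one full re-parse, and that is repeated at every error, which is exactly the trade-off described above.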

> > I grew up with C and Pascal compilers which would _happily_ produce many
> > complaints, usually accurate, and all manner of syntactic errors. They
> > didn't stop at the first syntax error.
> 
> Yes, because they work with a much simpler grammar.

I very much doubt that. Python doesn't have a particularly complicated
grammar, and C certainly doesn't have a particularly simple one.

The argument that it's impossible in Python (unlike any other language),
because Python is oh so special doesn't hold water.

hp


Re: What to use for finding as many syntax errors as possible.

2022-10-11 Thread Cameron Simpson


On 11Oct2022 17:45, Thomas Passin  wrote:
> Personally, I'd most likely go for a decent programming editor that you
> can set up to run a program on your file, use that to run a checker,
> like pyflakes for instance, and run that from time to time.  You could
> run it when you save a file.  Even if it only showed one error at a
> time, it would make quick work of correcting mistakes.  And it wouldn't
> need to trigger an entire tool chain each time.


Aye.

I've got my editor (vim) configured to run an autoformatter on my code 
when I save (this can be turned off, and parse errors prevent any 
reformatting).


Linters I run by hand from the adjacent shell window, via a small script 
which runs my preferred linters with their preferred options.


My current workplace triggers the CI workflow when you push commits 
upstream, and you can make branch names which do not trigger the CI 
stuff.


So there's a decent separation between saving (and testing or locally 
running the dev code) from the CI cycle.


Cheers,
Cameron Simpson 

Re: What to use for finding as many syntax errors as possible.

2022-10-11 Thread Thomas Passin

On 10/11/2022 5:09 PM, Thomas Passin wrote:

> The OP wants to get help with problems in
> his files even if it isn't perfect, and I think that's reasonable to
> wish for. The link to a post about the lezer parser in a recent message
> on this thread is partly about how a real, practical parser can do some
> error correction in mid-flight, for the purposes of a programming editor
> (as opposed to one that has to build a correct program).

One editor that seems to do what the OP wants is Visual Studio Code. It
will mark apparent errors - not just syntax errors - not limited to one
per page. Sometimes it can even suggest corrections. I personally
dislike the visual clutter the markings impose, but I imagine I could
get used to it.

VSC uses a Microsoft system they call "Pylance" - see

https://devblogs.microsoft.com/python/announcing-pylance-fast-feature-rich-language-support-for-python-in-visual-studio-code/

Of course, you don't get something complex for free, and in this case
the cost is having to run a separate server to do all this analysis on
the fly. However, VSC handles all of that behind the scenes so you
don't have to.

Personally, I'd most likely go for a decent programming editor that you
can set up to run a program on your file, use that to run a checker,
like pyflakes for instance, and run that from time to time. You could
run it when you save a file. Even if it only showed one error at a
time, it would make quick work of correcting mistakes. And it wouldn't
need to trigger an entire tool chain each time.

My editor of choice for setting up helper "tools" like this on Windows
is Editplus (non-free but cheap and very worth it), and I have both
py_compile and pyflakes set up this way in it. However, as I mentioned
in an earlier post, the Leo Editor
(https://github.com/leo-editor/leo-editor) does this for you
automatically when you save, so it's very convenient. That's what I
mostly work in.
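
For anyone who wants to wire this up in their own editor, a minimal sketch of the kind of helper an "on save" hook could call. It uses the standard `py_compile` module and so reports only the first syntax error; the function name `check_file` and the scratch-directory trick (to avoid leaving a .pyc next to the source) are my own choices, not anything an editor requires.

```python
import py_compile
import tempfile
from pathlib import Path


def check_file(path):
    """Return None if the file compiles, else the message for its first error."""
    with tempfile.TemporaryDirectory() as scratch:
        try:
            py_compile.compile(
                str(path),
                cfile=str(Path(scratch) / "out.pyc"),  # throwaway bytecode file
                doraise=True,                          # raise instead of printing
            )
        except py_compile.PyCompileError as exc:
            return exc.msg
    return None
```

An editor tool entry would then just run this on the current file and show the returned message, if any.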


Re: What to use for finding as many syntax errors as possible.

2022-10-11 Thread Thomas Passin


On 10/11/2022 4:00 PM, Chris Angelico wrote:
> On Wed, 12 Oct 2022 at 05:23, Thomas Passin  wrote:
> > On 10/11/2022 3:10 AM, avi.e.gr...@gmail.com wrote:
> > > I see resemblances to something like how a web page is loaded and operated.
> > > I mean very different but at some level not so much.
> > >
> > > I mean a typical web page is read in as HTML with various keyword regions
> > > expected such as  ...  or  ...  with things
> > > often cleanly nested in others. The browser makes nodes galore in some kind
> > > of tree format with an assortment of objects whose attributes or methods
> > > represent aspects of what it sees. The resulting treelike structure has
> > > names like DOM.
> >
> > To bring things back to the context of the original post, actual web
> > browsers are extremely tolerant of HTML syntax errors (including
> > incorrect nesting of tags) in the documents they receive.  They usually
> > recover silently from errors and are able to display the rest of the
> > page.  Usually they manage this correctly.
>
> Having had to debug tiny errors in HTML pages that resulted in
> extremely weird behaviour, I'm not sure that I agree that they usually
> manage correctly. Fundamentally, they guess, and guesswork is never
> reliable.


Still, browsers generally do a very decent job of recovery, even though 
perfection isn't possible.  The OP wants to get help with problems in 
his files even if it isn't perfect, and I think that's reasonable to 
wish for.  The link to a post about the lezer parser in a recent message 
on this thread is partly about how a real, practical parser can do some 
error correction in mid-flight, for the purposes of a programming editor 
(as opposed to one that has to build a correct program).



Re: What to use for finding as many syntax errors as possible.

2022-10-11 Thread Chris Angelico

On Wed, 12 Oct 2022 at 05:23, Thomas Passin  wrote:
>
> On 10/11/2022 3:10 AM, avi.e.gr...@gmail.com wrote:
> > I see resemblances to something like how a web page is loaded and operated.
> > I mean very different but at some level not so much.
> >
> > I mean a typical web page is read in as HTML with various keyword regions
> > expected such as  ...  or  ...  with things
> > often cleanly nested in others. The browser makes nodes galore in some kind
> > of tree format with an assortment of objects whose attributes or methods
> > represent aspects of what it sees. The resulting treelike structure has
> > names like DOM.
>
> To bring things back to the context of the original post, actual web
> browsers are extremely tolerant of HTML syntax errors (including
> incorrect nesting of tags) in the documents they receive.  They usually
> recover silently from errors and are able to display the rest of the
> page.  Usually they manage this correctly.

Having had to debug tiny errors in HTML pages that resulted in
extremely weird behaviour, I'm not sure that I agree that they usually
manage correctly. Fundamentally, they guess, and guesswork is never
reliable.

ChrisA

Re: What to use for finding as many syntax errors as possible.

2022-10-11 Thread Thomas Passin


On 10/11/2022 3:10 AM, avi.e.gr...@gmail.com wrote:

> I see resemblances to something like how a web page is loaded and operated.
> I mean very different but at some level not so much.
>
> I mean a typical web page is read in as HTML with various keyword regions
> expected such as  ...  or  ...  with things
> often cleanly nested in others. The browser makes nodes galore in some kind
> of tree format with an assortment of objects whose attributes or methods
> represent aspects of what it sees. The resulting treelike structure has
> names like DOM.


To bring things back to the context of the original post, actual web 
browsers are extremely tolerant of HTML syntax errors (including 
incorrect nesting of tags) in the documents they receive.  They usually 
recover silently from errors and are able to display the rest of the 
page.  Usually they manage this correctly.  The OP would like to have a 
parser or checker that could do the same, plus giving an output showing 
where each of the errors happened.


I can imagine such a parser also reporting which lines it had to skip 
before it was able to recover.
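
A toy version of that idea, just to show the shape of it: compile, and on each `SyntaxError` record the offending line, replace it with a same-indent `pass`, and try again. This is a sketch under heavy assumptions (line-level recovery only, hypothetical name `collect_syntax_errors`), and it will happily produce spurious follow-on errors, for example when the skipped line was a block header.

```python
def collect_syntax_errors(source, max_errors=20):
    """Return (line_number, message, skipped_text) for each error found."""
    lines = source.splitlines()
    blanked = set()
    errors = []
    while len(errors) < max_errors:
        try:
            compile("\n".join(lines) + "\n", "<input>", "exec")
            break  # nothing left to report
        except SyntaxError as exc:
            index = min(max((exc.lineno or 1) - 1, 0), len(lines) - 1)
            if index in blanked:
                break  # recovery failed on this line; give up rather than loop
            errors.append((index + 1, exc.msg, lines[index]))
            # Recover by replacing the offending line with a same-indent no-op.
            text = lines[index]
            indent = text[: len(text) - len(text.lstrip())]
            lines[index] = indent + "pass"
            blanked.add(index)
    return errors


report = collect_syntax_errors("a = 1 +\nb = 2\nc = = 3\nd = 4\n")
```

Each entry carries the text of the line that had to be skipped, so the output shows both where the error was and what the checker threw away to keep going.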


Re: What to use for finding as many syntax errors as possible.

2022-10-11 Thread Chris Angelico

On Tue, 11 Oct 2022 at 18:12,  wrote:
>
> Thanks for a rather detailed explanation of some of what we have been
> discussing, Chris. The overall outline is about what I assumed was there but
> some of the details were, to put it politely, fuzzy.
>
> I see resemblances to something like how a web page is loaded and operated.
> I mean very different but at some level not so much.
>
> I mean a typical web page is read in as HTML with various keyword regions
> expected such as  ...  or  ...  with things
> often cleanly nested in others. The browser makes nodes galore in some kind
> of tree format with an assortment of objects whose attributes or methods
> represent aspects of what it sees. The resulting treelike structure has
> names like DOM.

Yes. The basic idea of "tokenize, parse, compile" can be used for
pretty much any language - even English, although its grammar is a bit
more convoluted than most programming languages, with many weird
backward compatibility features! I'll parse your last sentence above:

LETTERS The
SPACE
LETTERS resulting
SPACE
... you get the idea
LETTERS like
SPACE
LETTERS DOM
FULLSTOP # or call this token PERIOD if you're American

Now, we can group those tokens into meaningful sets.

Sentence(type=Statement,
    subject=Noun(name="structure", addenda=[
        Article(type=The),
        Adjective(name="treelike"),
    ]),
    verb=Verb(type=Being, name="has", addenda=[]),
    object=Noun(name="name", plural=True, addenda=[
        Adjective(phrase=Phrase(verb=Verb(name="like"), object=Noun(name="DOM"))),
    ]),
)

Grammar nerds will probably dispute some of the awful shorthanding I
did here, but I didn't want to devise thousands of AST nodes just for
this :)
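
The same three stages can be poked at for Python itself with the standard library. A small sketch, with the caveat that `ast.parse` does its own tokenizing internally rather than consuming the output of the `tokenize` module; the helper names `token_stream` and `run_stages` are made up here.

```python
import ast
import io
import token
import tokenize


def token_stream(source):
    """Stage 1: split the text into (token_name, text) pairs."""
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.exact_type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


def run_stages(source):
    """Tokenize, parse and compile source, returning all three results."""
    tokens = token_stream(source)
    tree = ast.parse(source)                # stage 2: text -> syntax tree
    code = compile(tree, "<demo>", "exec")  # stage 3: tree -> code object
    return tokens, tree, code


tokens, tree, code = run_stages("total = 1 + 2\n")
namespace = {}
exec(code, namespace)  # runs the compiled module; binds namespace["total"]
```

`print(ast.dump(tree))` shows the real counterpart of the hand-written tree above; the exact output varies between Python versions.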

> To a certain approximation, this tree starts a certain way but is regularly
> being manipulated (or perhaps a copy is) as it regularly is looked at to see
> how to display it on the screen at the moment based on the current tree
> contents and another set of rules in Cascading Style Sheets.

Yep; the DOM tree is initialized from the HTML (usually - it's
possible to start a fresh tree with no HTML) and then can be
manipulated afterwards.

> These are not at all the same thing but share a certain set of ideas and
> methods and can be very powerful as things interact.

Oh absolutely. That's why there are languages designed to help you
define other languages.

> In effect the errors in the web situation have such analogies too as in what
> happens if a region of HTML is not well-formed or uses a keyword not
> recognized.

And they're horribly horribly messy, due to a few decades of
sloppy HTML programmers and the desire to still display the page even
if things are messed up :) But, again, there's a huge difference
between syntactic errors (like omitting a matching angle bracket) and
semantic errors (a keyword not known, like using  when you
should have used ). In the latter case, you can still build a
DOM tree, but you have an unknown element; in the former case, you
have to guess at what the author meant, just to get anything going at
all.

> There was a guy around a few years ago who suggested he would create a
> system where you could create a series of some kind of configuration files
> for ANY language and his system would then compile or run programs for each
> and every such language? Was that on this forum? What ever happened to him?

That was indeed on this forum, and I have no idea what happened to
him. Maybe he realised that all he'd invented was the Unix shebang?

ChrisA

Re: What to use for finding as many syntax errors as possible.

2022-10-11 Thread Weatherby,Gerard

Sure it does. They’re optional and not enforced at runtime, but I find them 
useful when writing code in PyCharm:

import os
from os import DirEntry

de: DirEntry  # annotation only; no value is bound yet
for de in os.scandir('/tmp'):
    print(de.name)

de = 7  # still legal at runtime; annotations are not enforced
print(de)

Predeclaring de allows me to do the tab completion thing with DirEntry fields / 
methods

From: Python-list  on 
behalf of avi.e.gr...@gmail.com 
Date: Monday, October 10, 2022 at 10:11 PM
To: python-list@python.org 
Subject: RE: What to use for finding as many syntax errors as possible.

Michael,

A reasonable question. Python lets you initialize variables but has no
explicit declarations.

Re: What to use for finding as many syntax errors as possible.

2022-10-11 Thread Antoon Pardon





Op 10/10/2022 om 19:08 schreef Robert Latest via Python-list:

> Antoon Pardon wrote:
> > I would like a tool that tries to find as many syntax errors as possible
> > in a python file.
>
> I'm puzzled as to when such a tool would be needed. How many syntax errors can
> you realistically put into a single Python file before compiling it for the
> first time?


Why are you puzzled? I don't need to make that many syntax errors to find
such a tool useful.

--
Antoon Pardon

Re: What to use for finding as many syntax errors as possible.

2022-10-11 Thread Roel Schroeven

Op 10/10/2022 om 19:08 schreef Robert Latest via Python-list:

> Antoon Pardon wrote:
> > I would like a tool that tries to find as many syntax errors as possible
> > in a python file.
>
> I'm puzzled as to when such a tool would be needed. How many syntax errors can
> you realistically put into a single Python file before compiling it for the
> first time?

I've been following the discussion from a distance and the whole time 
I've been wondering the same thing. Especially when you have unit tests, 
as Antoon said he has, I can't really imagine a situation where you add 
so much code in one go without running it that you introduce a painful 
amount of syntax errors.

My solution would be to use a modern IDE with a linter, possibly with
style warnings disabled, which will flag syntax errors as soon as you
type them. Possibly combined with a TDD-style tactic which also prevents
large numbers of errors (any errors) from building up. But I have the
impression that none of those fits in Antoon's workflow.

--
"Peace cannot be kept by force. It can only be achieved through understanding."
-- Albert Einstein


RE: What to use for finding as many syntax errors as possible.

2022-10-11 Thread avi.e.gross

Thanks for a rather detailed explanation of some of what we have been
discussing, Chris. The overall outline is about what I assumed was there but
some of the details were, to put it politely, fuzzy.

I see resemblances to something like how a web page is loaded and operated.
I mean very different but at some level not so much.

I mean a typical web page is read in as HTML with various keyword regions
expected such as  ...  or  ...  with things
often cleanly nested in others. The browser makes nodes galore in some kind
of tree format with an assortment of objects whose attributes or methods
represent aspects of what it sees. The resulting treelike structure has
names like DOM.

To a certain approximation, this tree starts a certain way but is regularly
being manipulated (or perhaps a copy is) as it regularly is looked at to see
how to display it on the screen at the moment based on the current tree
contents and another set of rules in Cascading Style Sheets. But bits and
pieces of JavaScript are also embedded or imported that can read aspects of
the tree (and more) and modify the contents and arrange for all kinds of
asynchronous events when bits of code are invoked such as when you click a
button or hover or when an image finishes loading or every 100 milliseconds.
It can insert new objects into the DOM too. And of course there can be
interactions with restricted local storage as well as with servers and code
running there.

It is quite a mess but in some ways I see analogies. Your program reads a
stream of data and looks for tokens and eventually turns things into a tree
of sorts that represents relationships to a point. Additional structures
eventually happen at run time that let you store collections of references
to variables such as environments or namespaces and the program derived from
the trees makes changes as it goes and in a language like Python can even
possibly change the running program in some ways.

These are not at all the same thing but share a certain set of ideas and
methods and can be very powerful as things interact. In the web case, the
CSS may search for regions with some class or ID or that are the third
element of a bullet list and more, using powerful tools like jQuery, and
make changes. A CSS rule that previously ignored some region as not having a
particular class, might start including it after a JavaScript segment is
aroused while waiting on an event listener for say a mouse hovering over an
area and then changes that part of the DOM (like a node) to be in that
class. Suddenly the area on your screen changes background or whatever the
CSS now dictates. We have multiple systems written in an assortment of
"languages" that complement each other. Some running programs, especially
ones that use asynchronous methods like threads or callbacks on events, such
as a GUI, can effectively do similar things. 

In effect the errors in the web situation have such analogies too as in what
happens if a region of HTML is not well-formed or uses a keyword not
recognized. This becomes even more interesting in XML where anything can be
a keyword and you often need other kinds of files (often also in ML) to
define what the XML can be like and what restrictions it may have such as
can a given element have multiple authors but only one optional publication date
and so on. It can be fascinating and highly technical. So I am up for a
challenge of studying anything from early compilers for languages of my
youth to more recent ways including some like what you show.

I have time to kill and this might be more fun than other things, for a
while.

There was a guy around a few years ago who suggested he would create a
system where you could create a series of some kind of configuration files
for ANY language and his system would then compile or run programs for each
and every such language. Was that on this forum? Whatever happened to him?

But although what he promised seemed a bit too much, I can see from your
comments below how in some ways a limited amount of that might be done for
some subset of languages which can be parsed and manipulated as described. 

-Original Message-
From: Python-list  On
Behalf Of Chris Angelico
Sent: Monday, October 10, 2022 11:55 PM
To: python-list@python.org
Subject: Re: What to use for finding as many syntax errors as possible.

On Tue, 11 Oct 2022 at 14:26,  wrote:
>
> I stand corrected Chris, and others, as I pay the sin tax.
>
> Yes, there are many kinds of errors that logically fall into different 
> categories or phases of evaluation of a program and some can be 
> determined by a more static analysis almost on a line by line (or 
> "statement" or "expression", ...)  basis and others need to sort of 
> simulate some things and look back and forth to detect possible 
> incompatibilities and yet others can only be detected at run time and 
> likely way more categories depending on the language.
>
> But

What to use for finding as many syntax errors as possible.

2022-10-10 Thread avi.e.gross

I think we are in agreement here, Chris. My point is that the error
detection and correction is now done at levels where there is not much need
to use earlier and inefficient methods like parity bits set aside. We use
protocols like TCP and IP and layers above them and above those to maintain
the integrity of packets and sessions and forms of encryption allowing
things like authentication. There is tons of overhead, even when some is
fairly efficient, but we hardly notice it unless things go wrong.

So written language sent (as in this email/post) does not need lots of
redundancy and all the extra effort is, IMNSHO, largely wasted. If I
see a bear, I do not wish to check their genitals or DNA to determine their
irrelevant gender before asking someone to run from it. If I happen to know
the gender, as in a zoo, gender only matters for things like breeding
purposes. I do not want to memorize terms in languages that have not only
words like lion and lioness or duck and drake and goose and gander, but for
EVERYTHING in some sense so I can say the equivalent of ANIMAL-male and
ANIMAL-female with unique words. Life would be so much simpler if I could
say your dog was nice and not be corrected that it was a bitch and I used
the wrong word endings. If I really wanted to say it was a female dog, well
I could just add a qualifier. Most of the time, who cares?

The same applies to so much grammatical nonsense which is also usually
riddled with endless exceptions to the many rules. Make the languages simple
with little redundancy and thus far easier to learn.

I can say similar things about some programming languages that either have
way too many rules or too few of the right ones.

There are tradeoffs and if you want a powerful language it will likely not
be easy to control. If you want a very regulated language, you may find it
not very useful as many things are hard to do and others not possible. I know
that strongly typed languages often have to allow some method of cheating
such as unions of data types, or using a parent class as the sort of
object-type to allow disparate objects to live together. Python is far from
the most complex but as noted, it is not trivial to evaluate even the syntax
past errors.

But I admit it is fun and a challenge to learn both kinds and I spent much
of my time doing so. I like the flexibility of seeing different approaches
and holding contradictions in my mind while accepting both and yet neither!
LOL!

-Original Message-
From: Python-list  On
Behalf Of Chris Angelico
Sent: Monday, October 10, 2022 11:24 PM
To: python-list@python.org
Subject: Re: What to use for finding as many syntax errors as possible.

On Tue, 11 Oct 2022 at 14:13,  wrote:
> With the internet today, we are used to expecting error correction to 
> come for free. Do you really need one of every 8 bits to be a parity 
> bit, which only catches maybe half of the errors...

Fortunately, we have WAY better schemes than simple parity, which was only
really a thing in the modem days. (Though I would say that there's still a
pretty clear distinction between a good message where everything has correct
parity, and line noise where half of them
don't.) Hamming codes can correct one-bit errors (and detect two-bit
errors) at a price of log2(size)+1 bits of space. Here's a great
rundown:

https://www.youtube.com/watch?v=X8jsijhllIA

There are other schemes too, but Hamming codes are beautifully elegant and
easy to understand.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Chris Angelico

On Tue, 11 Oct 2022 at 14:26,  wrote:
>
> I stand corrected Chris, and others, as I pay the sin tax.
>
> Yes, there are many kinds of errors that logically fall into different
> categories or phases of evaluation of a program and some can be determined
> by a more static analysis almost on a line by line (or "statement" or
> "expression", ...)  basis and others need to sort of simulate some things
> and look back and forth to detect possible incompatibilities and yet others
> can only be detected at run time and likely way more categories depending on
> the language.
>
> But when I run the Python interpreter on code, aren't many such phases done
> interleaved and at once as various segments of code are parsed and examined
> and perhaps compiled into block code and eventually executed?

Hmm, depends what you mean. Broadly speaking, here's how it goes:

0) Early pre-parse steps that don't really matter to most programs,
like checking character set. We'll ignore these.
1) Tokenize the text of the program into a sequence of
potentially-meaningful units.
2) Parse those tokens into some sort of meaningful "sentence".
3) Compile the syntax tree into actual code.
4) Run that code.

Example:
>>> code = """def f():
...     print("Hello, world", 1>=2)
...     print(Ellipsis, ...)
...     return True
... """
>>>

In step 1, all that happens is that a stream of characters (or bytes,
depending on your point of view) gets broken up into units.

>>> import tokenize
>>> for t in tokenize.tokenize(iter(code.encode().split(b"\n")).__next__):
...     print(tokenize.tok_name[t.exact_type], t.string)

It's pretty spammy, but you can see how the compiler sees the text.
Note that, at this stage, there's no real difference between the NAME
"def" and the NAME "print" - there are no language keywords yet.
Basically, all you're doing is figuring out punctuation and stuff.
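
To make the "no keywords yet" point concrete, here is a minimal sketch. It feeds the tokenizer through io.StringIO rather than the byte-splitting trick above (my choice, not the original poster's), and the sample source is made up for illustration.

```python
import io
import tokenize

# A made-up sample, similar in spirit to the example in the post.
sample = 'def f():\n    print("Hello", ...)\n    return True\n'

tokens = list(tokenize.generate_tokens(io.StringIO(sample).readline))

# Keywords, function names and True all arrive as plain NAME tokens.
names = [t.string for t in tokens if t.type == tokenize.NAME]
print(names)

# The "..." is an operator token whose exact type is ELLIPSIS.
ellipsis_tokens = [t for t in tokens if t.exact_type == tokenize.ELLIPSIS]
print(len(ellipsis_tokens))
```

At this stage "def", "print", "return" and "True" are indistinguishable by type; only the parser gives them different roles.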

Step 2 is what we'd normally consider "parsing". (It may well happen
concurrently and interleaved with tokenizing, and I'm giving a
simplified and conceptualized pipeline here, but this is broadly what
Python does.) This compares the stream of tokens to the grammar of a
Python program and attempts to figure out what it means. At this
point, the linear stream turns into a recursive syntax tree, but it's
still very abstract.

>>> import ast
>>> ast.dump(ast.parse(code))
"Module(body=[FunctionDef(name='f', args=arguments(posonlyargs=[],
args=[], kwonlyargs=[], kw_defaults=[], defaults=[]),
body=[Expr(value=Call(func=Name(id='print', ctx=Load()),
args=[Constant(value='Hello, world'), Compare(left=Constant(value=1),
ops=[GtE()], comparators=[Constant(value=2)])], keywords=[])),
Expr(value=Call(func=Name(id='print', ctx=Load()),
args=[Name(id='Ellipsis', ctx=Load()), Constant(value=Ellipsis)],
keywords=[])), Return(value=Constant(value=True))],
decorator_list=[])], type_ignores=[])"

(Side point: I would rather like to be able to
pprint.pprint(ast.parse(code)) but that isn't a thing, at least not
currently.)
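
On the side point: as far as I know, ast.dump() accepts an indent argument from Python 3.9 onward, which gets close to a pretty-printed tree. A small sketch, assuming a 3.9+ interpreter:

```python
import ast

tree = ast.parse('print("Hello, world", 1 >= 2)')

# indent= makes ast.dump() emit one node per line (Python 3.9 and later).
pretty = ast.dump(tree, indent=4)
print(pretty)
```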

This is where the vast majority of SyntaxErrors come from. Your code
is a sequence of tokens, but those tokens don't mean anything. It
doesn't make sense to say "print(def f[return)]" even though that'd
tokenize just fine. The trouble with the notion of "keeping going
after finding an error" is that, when you find an error, there are
almost always multiple possible ways that this COULD have been
interpreted differently. It's as likely to give nonsense results as
actually useful ones.
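
A rough illustration of "tokenizes fine, means nothing". The sample below is my own variation on the nonsense line above with the brackets left out, because how strictly the tokenizer checks bracket matching may differ between Python versions (that is an assumption on my part, so I sidestep it).

```python
import ast
import io
import tokenize

nonsense = "print def return f\n"

# Tokenizing succeeds: these are just four NAME tokens in a row.
token_strings = [
    t.string
    for t in tokenize.generate_tokens(io.StringIO(nonsense).readline)
    if t.type == tokenize.NAME
]

# Parsing is where the complaint comes from.
try:
    ast.parse(nonsense)
    parsed = True
except SyntaxError as exc:
    parsed = False
    print("SyntaxError:", exc.msg)
```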

(Note that, in contrast to the tokenization stage, this version
distinguishes between the different types of word. The "def" has
resulted in a FunctionDef node, the "print" is a Name lookup, and both
"..." and "True" have now become Constant nodes - previously, "..."
was a special Ellipsis token, but "True" was just a NAME.)

Step 3: the abstract syntax tree gets compiled into actual runnable
code. This is where that small handful of other SyntaxErrors come
from. With these errors, you absolutely _could_ carry on and report
multiple; but it's not very likely that there'll actually *be* more
than one of them in a file. Here's some perfectly valid AST parsing:

>>> ast.dump(ast.parse("from __future__ import the_past"))
"Module(body=[ImportFrom(module='__future__',
names=[alias(name='the_past')], level=0)], type_ignores=[])"
>>> ast.dump(ast.parse("from __future__ import braces"))
"Module(body=[ImportFrom(module='__future__',
names=[alias(name='braces')], level=0)], type_ignores=[])"
>>> ast.dump(ast.parse("def f():\n\tdef g():\n\t\tnonlocal x\n"))
"Module(body=[FunctionDef(name='f', args=arguments(posonlyargs=[],
args=[], kwonlyargs=[], kw_defaults=[], defaults=[]),
body=[FunctionDef(name='g', args=arguments(posonlyargs=[], args=[],
kwonlyargs=[], kw_defaults=[], defaults=[]),
body=[Nonlocal(names=['x'])], decorator_list=[])],
decorator_list=[])], type_ignores=[])"

If you were to try to actually compile those to bytecode, they would fail:

>>> compile(ast.parse("from __future__ import braces"), "-", "exec")
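
A small sketch of the parse-versus-compile split described above. The helper name stage_of_failure is hypothetical; it only reports which step raised SyntaxError and never runs the code.

```python
import ast

def stage_of_failure(source):
    """Return "parse", "compile", or None depending on where SyntaxError is raised."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return "parse"
    try:
        compile(tree, "<example>", "exec")
    except SyntaxError:
        return "compile"
    return None

samples = {
    "from __future__ import braces": None,
    "def f():\n    def g():\n        nonlocal x\n": None,
    "def f():\n    await q\n": None,
    "def f(:\n    pass\n": None,
    "x = 1\n": None,
}
for source in samples:
    samples[source] = stage_of_failure(source)
    print(repr(source), "->", samples[source])
```

The first three samples get through ast.parse() and only fail in compile(); the fourth is an ordinary grammar error and fails at the parse step.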

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Chris Angelico

On Tue, 11 Oct 2022 at 14:13,  wrote:
> With the internet today, we are used to expecting error correction to come
> for free. Do you really need one of every 8 bits to be a parity bit, which
> only catches maybe half of the errors...

Fortunately, we have WAY better schemes than simple parity, which was
only really a thing in the modem days. (Though I would say that
there's still a pretty clear distinction between a good message where
everything has correct parity, and line noise where half of them
don't.) Hamming codes can correct one-bit errors (and detect two-bit
errors) at a price of log2(size)+1 bits of space. Here's a great
rundown:

https://www.youtube.com/watch?v=X8jsijhllIA

There are other schemes too, but Hamming codes are beautifully elegant
and easy to understand.
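
For anyone who prefers code to video, here is a minimal sketch of the classic Hamming(7,4) layout: parity bits at positions 1, 2 and 4, data bits at 3, 5, 6 and 7. The function names are made up for illustration. This basic form only corrects single-bit errors; detecting two-bit errors needs the extra overall parity bit, which is the "+1" in the price quoted above.

```python
def hamming74_encode(data):
    """Encode four data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(code):
    """Correct at most one flipped bit and return the four data bits."""
    code = list(code)
    syndrome = 0
    for position, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= position  # XOR of the 1-based positions of set bits
    if syndrome:
        code[syndrome - 1] ^= 1  # a nonzero syndrome names the bad position
    return [code[2], code[4], code[5], code[6]]

word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[5] ^= 1  # simulate line noise on one bit
print(hamming74_decode(damaged))
```

The trick is that in a valid codeword the XOR of the positions of all set bits is zero, so after a single flip that XOR is exactly the position of the flipped bit.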

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

RE: What to use for finding as many syntax errors as possible.

2022-10-10 Thread avi.e.gross

I stand corrected Chris, and others, as I pay the sin tax.

Yes, there are many kinds of errors that logically fall into different
categories or phases of evaluation of a program and some can be determined
by a more static analysis almost on a line by line (or "statement" or
"expression", ...)  basis and others need to sort of simulate some things
and look back and forth to detect possible incompatibilities and yet others
can only be detected at run time and likely way more categories depending on
the language.

But when I run the Python interpreter on code, aren't many such phases done
interleaved and at once as various segments of code are parsed and examined
and perhaps compiled into block code and eventually executed? 

So is the OP asking for something other than a Python Interpreter that
normally halts after some kind of error? Tools like a linter may indeed fit
that mold. 

This may limit some of the objections of when an error makes it hard for the
parser to find some recovery point to continue from as no code is being run
and no harmful side effects happen by continuing just an analysis. 

Time to go read some books about modern ways to evaluate a language based on
more mathematical rules including more precisely what is syntax versus ...

Suggestions?

-Original Message-
From: Python-list  On
Behalf Of Chris Angelico
Sent: Monday, October 10, 2022 10:42 PM
To: python-list@python.org
Subject: Re: What to use for finding as many syntax errors as possible.

On Tue, 11 Oct 2022 at 13:10,  wrote:
> If the above is:
>
> import grumpy as np
>
> Then what happens if the code tries to find a file named "grumpy" 
> somewhere and cannot locate it and this is considered a syntax error 
> rather than a run-time error for whatever reason? Can you continue 
> when all kinds of functionality is missing and code asking to make a 
> np.array([1,2,3]) clearly fails?

That's not a syntax error. Syntax is VERY specific. It is an error in Python
to attempt to add 1 to "one", it is an error to attempt to look up the
upper() method on None, it is an error to try to use a local variable you
haven't assigned to yet, and it is an error to open a file that doesn't
exist. But not one of these is a *syntax* error.

Syntax errors are detected at the parsing stage, before any code gets run.
The vast majority of syntax errors are grammar errors, where the code
doesn't align with the parseable text of a Python program.
(Non-grammatical parsing errors include using a "nonlocal" statement with a
name that isn't found in any surrounding scope, using "await"
in a non-async function, and attempting to import braces from the
future.)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

RE: What to use for finding as many syntax errors as possible.

2022-10-10 Thread avi.e.gross

Cameron, or OP if you prefer,

I think by now you have seen a suggestion that languages make choices and
highly structured ones can be easier to "recover" from errors and try to
continue than some with way more complex possibilities that look rather
unstructured.

What is the error in code like this?

A,b,c,d = 1,2,

Or is it an error at all?

Many languages have no concept of doing anything like the above and some
tolerate a trailing comma and some set anything not found to some form of
NULL or uninitialized and some ...

If you look at human language, some are fairly simple and some are way too
organized. But in a way it can make sense. Languages with gender will often
ask you to change the spelling and often how you pronounce things not only
based on whether a noun is male/female or even neuter but also insist you
change the form of verbs or adjectives and so on that in effect give
multiple signals that all have to line up to make a valid and understandable
sentence. Heck, in conversations, people can often leave out parts of  a
sentence such as whether you are talking about "I" or "you" or "she" or "we"
because the rest of the words in the sentence redundantly force only one
choice to be possible. 

So some such annoying grammars (in my opinion) are error
detection/correction codes in disguise. In days before microphones and
speakers, it was common to not hear people well, like on a stage a hundred
feet away with other ambient noises. Missing a word or two might still allow
you to get the point as other parts of the sentence did such redundancies.
Many languages have similar strictures letting you know multiple times if
something is singular or plural. And I think another reason was what I call
stranger detection. People who learn some vocabulary might still not speak
correctly and be identifiable as strangers, as in spies.

Do we need this in the modern age? Who knows! But it makes me prefer some
languages over others albeit other reasons may ...

With the internet today, we are used to expecting error correction to come
for free. Do you really need one of every 8 bits to be a parity bit, which
only catches maybe half of the errors, when the internals of your computer are
relatively error free and even the outside is protected by things like
various protocols used in making and examining packets and demanding some be
sent again if some checksum does not match? Tons of checking is built in so
at your level you rarely think about it. If you get a message, it usually is
either 99.% accurate, or you do not have it shown to you at all. I am
not talking about SPAM but about errors of transmission.

So my analogies are that if you want a very highly structured language that
can recover somewhat from errors, Python may not be it.

And over the years as features are added or modified, the structure tends to
get more complex. And R is not alone. Many surviving languages continue to
evolve and borrow from each other and any program that you run today that
could partially recover and produce pages of possible errors, may blow up
when new features are introduced.

And with UNICODE, the number of possible "errors" in what is placed in code
for languages like Julia that allow them in most places ...

-Original Message-
From: Python-list  On
Behalf Of Cameron Simpson
Sent: Monday, October 10, 2022 6:17 PM
To: python-list@python.org
Subject: Re: What to use for finding as many syntax errors as possible.

On 11Oct2022 08:02, Chris Angelico  wrote:
>There's a huge difference between non-fatal errors and syntactic 
>errors. The OP wants the parser to magically skip over a fundamental 
>syntactic error and still parse everything else correctly. That's never 
>going to work perfectly, and the OP is surprised at this.

The OP is not surprised by this, and explicitly expressed awareness that
resuming a parse had potential for "misparsing" further code.

I remain of the opinion that one could resume a parse at the next unindented
line and get reasonable results a lot of the time.

In fact, I expect that one could resume tokenising at almost any line which
didn't seem to be inside a string and often get reasonable results.

I grew up with C and Pascal compilers which would _happily_ produce many
complaints, usually accurate, and all manner of syntactic errors. They
didn't stop at the first syntax error.

All you need in principle is a parser which goes "report syntax error here,
continue assuming ". For Python that might mean "pretend a
missing final colon" or "close open brackets" etc, depending on the context.
If you make conservative implied corrections you can get a reasonable
continued parse, enough to find further syntax errors.

I remember the Pascal compiler in particular had a really good "you missed a
semicolon _back there_" mode which was almost alwa

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Chris Angelico

On Tue, 11 Oct 2022 at 13:10,  wrote:
> If the above is:
>
> import grumpy as np
>
> Then what happens if the code tries to find a file named "grumpy" somewhere
> and cannot locate it and this is considered a syntax error rather than a
> run-time error for whatever reason? Can you continue when all kinds of
> functionality is missing and code asking to make a np.array([1,2,3]) clearly
> fails?

That's not a syntax error. Syntax is VERY specific. It is an error in
Python to attempt to add 1 to "one", it is an error to attempt to look
up the upper() method on None, it is an error to try to use a local
variable you haven't assigned to yet, and it is an error to open a
file that doesn't exist. But not one of these is a *syntax* error.
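
A rough sketch of that distinction: each snippet is compiled first and only then executed, and the hypothetical helper outcome() reports which stage complained. The file path is made up and assumed not to exist, and I am assuming no module called grumpy is installed.

```python
def outcome(snippet):
    """Compile, then run; report the failing stage and the exception name."""
    try:
        code = compile(snippet, "<example>", "exec")
    except SyntaxError as exc:
        return "syntax", type(exc).__name__
    try:
        exec(code, {})
    except Exception as exc:
        return "runtime", type(exc).__name__
    return "ok", None

examples = [
    '1 + "one"',
    'None.upper()',
    'def f():\n    print(x)\n    x = 1\nf()',
    'open("/no/such/dir/no-such-file.txt")',  # hypothetical path
    'import grumpy as np',                    # assumed not installed
    'x = = 1',
]
results = [outcome(snippet) for snippet in examples]
for snippet, result in zip(examples, results):
    print(repr(snippet), "->", result)
```

Only the last snippet is rejected before anything runs; the others compile cleanly and fail, if at all, during execution.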

Syntax errors are detected at the parsing stage, before any code gets
run.  The vast majority of syntax errors are grammar errors, where the
code doesn't align with the parseable text of a Python program.
(Non-grammatical parsing errors include using a "nonlocal" statement
with a name that isn't found in any surrounding scope, using "await"
in a non-async function, and attempting to import braces from the
future.)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

RE: What to use for finding as many syntax errors as possible.

2022-10-10 Thread avi.e.gross

Michael,

A reasonable question. Python lets you initialize variables but has no
explicit declarations. Languages differ and I juggle attributes of many in
my mind and am reacting to the original question NOT about whether and how
Python should report many possible errors all at once but how ANY language
can be expected to do this well. Many others do have a variable declaration
phase or an optional declaration or perhaps just a need to declare a
function prototype so it can be used by others even if the formal function
creation will happen later in the code.

But what I meant in a Python context was something like this:

Wronk = who cares # this should fail
...
If (Wronk > 5): ...
...
Wronger = Wronk + 1
...
X = minimum(Wronk, Wronger, 12)

The first line does not parse well so you have an error. But in any case as
the line makes no sense, Wronk is not initialized to anything. Later code
may use it  in various ways and some of those may be seen as errors for an
assortment of reasons, then at one point the code does provide a value for
Wronk and suddenly code beyond that has no seeming errors. The above
examples are not meant to be real but just give a taste that programs with
holes in them for any reason may not be consistent. The only relatively
guaranteed test for sanity has to start at the top and encounter no errors
or missing parts based on anything such as I/O errors. 

And I suggest there are some things sort of declared in python such as:

import numpy as np

Yes, that brings in code from a module if it works and initializes a
variable called np to sort of point at the module or its namespace or
whatever, depending on the language. It is an assignment but also a way to
let the program know things. If the above is:

import grumpy as np

Then what happens if the code tries to find a file named "grumpy" somewhere
and cannot locate it and this is considered a syntax error rather than a
run-time error for whatever reason? Can you continue when all kinds of
functionality is missing and code asking to make a np.array([1,2,3]) clearly
fails?

Many of us here are talking past each other.

Yes, it would be nice to get lots of info and arguably we may eventually
have machine-learning or AI programs a bit more like SPAM detectors that
look for patterns commonly found and try to fix your program from common
errors or at least do a temporary patch so they can continue searching for
more errors. This could result in the best case in guessing right every
time. If you allowed it to actually fix your code, it might be like people
who let their spelling be corrected and do not proofread properly and send
out something embarrassing or just plain wrong!

And it will compile or be interpreted without complaint albeit not do
exactly what it is supposed to!

-Original Message-
From: Python-list  On
Behalf Of Michael F. Stemper
Sent: Monday, October 10, 2022 9:22 AM
To: python-list@python.org
Subject: Re: What to use for finding as many syntax errors as possible.

On 09/10/2022 10.49, Avi Gross wrote:
> Anton
> 
> There likely are such programs out there but are there universal 
> agreements on how to figure out when a new safe zone of code starts 
> where error testing can begin?
> 
> For example a file full of function definitions might find an error in 
> function 1 and try to find the end of that function and resume 
> checking the next function.  But what if a function defines local
> functions within it?
> What if the mistake in one line of code could still allow checking the 
> next line rather than skipping it all?
> 
> My guess is that finding 100 errors might turn out to be misleading. 
> If you fix just the first, many others would go away. If you spell a 
> variable name wrong when declaring it, a dozen uses of the right name may
> cause errors.
> Should you fix the first or change all later ones?

How does one declare a variable in python? Sometimes it'd be nice to be able
to have declarations and any undeclared variable be flagged.

When I was writing F77 for a living, I'd (temporarily) put:
   IMPLICIT CHARACTER*3
at the beginning of a program or subroutine that I was modifying, in order
to have any typos flagged.

I'd love it if there was something similar that I could do in python.

--
Michael F. Stemper
87.3% of all statistics are made up by the person giving them.
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Thomas Passin


On 10/10/2022 9:21 AM, Michael F. Stemper wrote:
> On 09/10/2022 10.49, Avi Gross wrote:
>> Anton
>>
>> There likely are such programs out there but are there universal agreements
>> on how to figure out when a new safe zone of code starts where error
>> testing can begin?
>>
>> For example a file full of function definitions might find an error in
>> function 1 and try to find the end of that function and resume checking the
>> next function.  But what if a function defines local functions within it?
>> What if the mistake in one line of code could still allow checking the next
>> line rather than skipping it all?
>>
>> My guess is that finding 100 errors might turn out to be misleading. If you
>> fix just the first, many others would go away. If you spell a variable name
>> wrong when declaring it, a dozen uses of the right name may cause errors.
>> Should you fix the first or change all later ones?
>
> How does one declare a variable in python? Sometimes it'd be nice to
> be able to have declarations and any undeclared variable be flagged.
>
> When I was writing F77 for a living, I'd (temporarily) put:
>    IMPLICIT CHARACTER*3
> at the beginning of a program or subroutine that I was modifying,
> in order to have any typos flagged.
>
> I'd love it if there was something similar that I could do in python.


The Leo editor (https://github.com/leo-editor/leo-editor) will notify 
you of undeclared variables (and some syntax errors) each time you save 
your (Python) file.


--
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Chris Angelico

On Tue, 11 Oct 2022 at 09:18, Cameron Simpson  wrote:
>
> On 11Oct2022 08:02, Chris Angelico  wrote:
> >There's a huge difference between non-fatal errors and syntactic
> >errors. The OP wants the parser to magically skip over a fundamental
> >syntactic error and still parse everything else correctly. That's
> >never going to work perfectly, and the OP is surprised at this.
>
> The OP is not surprised by this, and explicitly expressed awareness that
> resuming a parse had potential for "misparsing" further code.
>
> I remain of the opinion that one could resume a parse at the next
> unindented line and get reasonable results a lot of the time.

The next line at the same indentation level as the line with the
error, or the next flush-left line? Either way, there's a weird and
arbitrary gap before you start parsing again, and you still have no
indication of what could make sense. Consider:

if condition # no colon
    code
else:
    code

To actually "restart" parsing, you have to make a guess of some sort.
Maybe you can figure out what the user meant to do, and parse
accordingly; but if that's the case, keep going immediately, don't
wait for an unindented line. If you wait for a blank line followed by
an unindented line, that might help with a notion of "next logical
unit of code", but it's very much dependent on the coding style, and
if you have a codebase that's so full of syntax errors that you
actually want to see more than one, you probably don't have a codebase
with pristine and beautiful code layout.

> In fact, I expect that one could resume tokenising at almost any line
> which didn't seem to be inside a string and often get reasonable
> results.

"Seem to be"? On what basis?

> I grew up with C and Pascal compilers which would _happily_ produce many
> complaints, usually accurate, and all manner of syntactic errors. They
> didn't stop at the first syntax error.

Yes, because they work with a much simpler grammar. But even then,
most syntactic errors (again, this is not to be confused with semantic
errors - if you say "char *x = 1.234;" then there's no parsing
ambiguity but it's not going to compile) cause a fair degree of
nonsense afterwards.

The waters are a bit muddied by some things being called "syntax
errors" when they're actually nothing at all to do with the parser.
For instance:

>>> def f():
...     await q
...
  File "", line 2
SyntaxError: 'await' outside async function

This is not what I'm talking about; there's no parsing ambiguity here,
and therefore no difficulty whatsoever in carrying on with the
parsing. You could ast.parse() this code without an error. But
resuming after a parsing error is fundamentally difficult, impossible
without guesswork.

> All you need in principle is a parser which goes "report syntax error
> here, continue assuming ". For Python that might mean
> "pretend a missing final colon" or "close open brackets" etc, depending
> on the context. If you make conservative implied corrections you can get
> a reasonable continued parse, enough to find further syntax errors.

And, more likely, you'll generate a lot of nonsense. Take something like this:

items = [
    item[1],
    item2],
    item[3],
]

As a human, you can easily see what the problem is. Try teaching a
parser how to handle this. Most likely, you'll generate a spurious
error - maybe the indentation, maybe the intended end of the list -
but there's really only one error here. Reporting multiple errors
isn't actually going to be at all helpful.

> I remember the Pascal compiler in particular had a really good "you
> missed a semicolon _back there_" mode which was almost always correct, a
> nice boon when correcting mistakes.
>

Ahh yes. Design a language with strict syntactic requirements, and
it's not too hard to find where the programmer has omitted them. Thing
is Python just doesn't HAVE those semicolons. Let's say that a
variant Python required you to put a U+251C ├ at the start of every
statement, and U+2524 ┤ at the end of the statement. A whole lot of
classes of error would be extremely easy to notice and correct, and
thus you could resume parsing; but that isn't benefiting the
programmer any. When you don't have that kind of information
duplication, it's a lot harder to figure out how to cheat the fix and
go back to parsing.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Cameron Simpson


On 09/10/2022 10.49, Avi Gross wrote:
> My guess is that finding 100 errors might turn out to be misleading. If you
> fix just the first, many others would go away. If you spell a variable name
> wrong when declaring it, a dozen uses of the right name may cause errors.
> Should you fix the first or change all later ones?


Just to this, these are semantic errors, not syntax errors. Linters do 
an ok job of spotting these. Antoon is after _syntax errors_.


On 10Oct2022 08:21, Michael F. Stemper  wrote:

How does one declare a variable in python? Sometimes it'd be nice to
be able to have declarations and any undeclared variable be flagged.


Linters do pretty well at this. They can trace names and their use 
compared to their first definition/assignment (often - there are of 
course some constructs which are correct but unclear to a static 
analysis - certainly one of my linters occasionally says "possible 
undefine use" to me because there may be a path to use before set). This 
is particularly handy for typos, which often make for "use before set" 
or "set and not used".
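
As a rough illustration of the kind of name tracing described above, here is a minimal sketch built on the standard `ast` module. It is not how pyflakes or pylint work internally; the function name `undefined_names` is made up for this example, and it ignores scoping entirely (it only asks whether a name is ever bound anywhere in the file), so treat it as a toy.

```python
import ast
import builtins


def undefined_names(source):
    """Return (line, name) pairs for names that are read but never bound.

    Deliberately naive: scopes are ignored, so a name bound anywhere in
    the file counts as defined everywhere.
    """
    tree = ast.parse(source)
    bound = set(dir(builtins))
    loaded = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                bound.add(node.id)
            else:
                loaded.append((node.lineno, node.id))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.alias):
            # "import a.b" binds "a"; "import a as b" binds "b"
            bound.add((node.asname or node.name).split(".")[0])
    return sorted((line, name) for line, name in loaded if name not in bound)


SAMPLE = "total = 0\nfor item in [1, 2]:\n    totl = total + item\nprint(tota)\n"
print(undefined_names(SAMPLE))
```

Note that the file has to parse before any of this can run, which is exactly why linters of this kind stop at the first syntax error.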



I'd love it if there was something similar that I could do in python.


Have you used any lint programmes? My "lint" script runs pyflakes and 
pylint.


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Cameron Simpson


On 11Oct2022 08:02, Chris Angelico  wrote:

There's a huge difference between non-fatal errors and syntactic
errors. The OP wants the parser to magically skip over a fundamental
syntactic error and still parse everything else correctly. That's
never going to work perfectly, and the OP is surprised at this.


The OP is not surprised by this, and explicitly expressed awareness that 
resuming a parse had potential for "misparsing" further code.


I remain of the opinion that one could resume a parse at the next 
unindented line and get reasonable results a lot of the time.


In fact, I expect that one could resume tokenising at almost any line 
which didn't seem to be inside a string and often get reasonable 
results.


I grew up with C and Pascal compilers which would _happily_ produce many 
complaints, usually accurate, and all manner of syntactic errors. They 
didn't stop at the first syntax error.


All you need in principle is a parser which goes "report syntax error 
here, continue assuming ". For Python that might mean 
"pretend a missing final colon" or "close open brackets" etc, depending 
on the context. If you make conservative implied corrections you can get 
a reasonable continued parse, enough to find further syntax errors.


I remember the Pascal compiler in particular had a really good "you 
missed a semicolon _back there_" mode which was almost always correct, a 
nice boon when correcting mistakes.


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Robert Latest via Python-list

Antoon Pardon wrote:
> I would like a tool that tries to find as many syntax errors as possible 
> in a python file.

I'm puzzled as to when such a tool would be needed. How many syntax errors can
you realistically put into a single Python file before compiling it for the
first time?

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Robert Latest via Python-list

Michael F. Stemper wrote:
> How does one declare a variable in python? Sometimes it'd be nice to
> be able to have declarations and any undeclared variable be flagged.

To my knowledge, the closest to that is using __slots__ in class definitions.
Many a time have I assigned to misspelled class members until I discovered
__slots__.
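
A small sketch of the `__slots__` behaviour mentioned above. The class `Point` and the misspelled attribute `z` are invented for the example; note that the check happens at run time, when the assignment executes, not at compile time.

```python
class Point:
    # Only these attribute names may be assigned on instances.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
caught = None
try:
    p.z = 5  # typo for an existing attribute: rejected because of __slots__
except AttributeError as exc:
    caught = exc

print("assignment rejected:", caught is not None)
```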
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Robert Latest via Python-list

 wrote:
> Cameron,
>
> Your suggestion makes me shudder!

Me, too

> Removing all earlier lines of code is often guaranteed to generate errors as
> variables you are using are not declared or initiated, modules are not
> imported and so on.

all of which aren't syntax errors, so the method should still work. Ugly as
hell though. I can't think of a reason to want to find multiple syntax errors
in a file.

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Michael F. Stemper


On 09/10/2022 10.49, Avi Gross wrote:

Anton

There likely are such programs out there but are there universal agreements
on how to figure out when a new safe zone of code starts where error
testing can begin?

For example a file full of function definitions might find an error in
function 1 and try to find the end of that function and resume checking the
next function.  But what if a function defines local functions within it?
What if the mistake in one line of code could still allow checking the next
line rather than skipping it all?

My guess is that finding 100 errors might turn out to be misleading. If you
fix just the first, many others would go away. If you spell a variable name
wrong when declaring it, a dozen uses of the right name may cause errors.
Should you fix the first or change all later ones?


How does one declare a variable in python? Sometimes it'd be nice to
be able to have declarations and any undeclared variable be flagged.

When I was writing F77 for a living, I'd (temporarily) put:
  IMPLICIT CHARACTER*3
at the beginning of a program or subroutine that I was modifying,
in order to have any typos flagged.

I'd love it if there was something similar that I could do in python.

--
Michael F. Stemper
87.3% of all statistics are made up by the person giving them.
--
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Chris Angelico

On Tue, 11 Oct 2022 at 06:34, Peter J. Holzer  wrote:
>
> On 2022-10-10 09:23:27 +1100, Chris Angelico wrote:
> > On Mon, 10 Oct 2022 at 06:50, Antoon Pardon  wrote:
> > > I just want a parser that doesn't give up on encountering the first syntax
> > > error. Maybe do some semantic checking like checking the number of 
> > > parameters.
> >
> > That doesn't make sense though.
>
> I think you disagree with most compiler authors here.
>
> > It's one thing to keep going after finding a non-syntactic error, but
> > an error of syntax *by definition* makes parsing the rest of the file
> > dubious.
>
> Dubious but still useful.

There's a huge difference between non-fatal errors and syntactic
errors. The OP wants the parser to magically skip over a fundamental
syntactic error and still parse everything else correctly. That's
never going to work perfectly, and the OP is surprised at this.

> > What would it even *mean* to not give up?
>
> Read the blog post on Lezer for some ideas:
> https://marijnhaverbeke.nl/blog/lezer.html
>
> This is in the context of an editor.

Incidentally, that's actually where I would expect to see that kind of
feature show up the most - syntax highlighters will often be designed
to "carry on, somehow" after a syntax error, even though it often
won't make any sense (just look at what happens to your code
highlighting when you omit a quote character). It still won't always
be any use, but you do see *some* attempt at it.

But if the OP would be satisfied with that, I rather doubt that this
thread would even have happened. Unless, of course, the OP still lives
in the dark ages when no text editor available had any suitable
features for code highlighting.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Peter J. Holzer

On 2022-10-10 09:23:27 +1100, Chris Angelico wrote:
> On Mon, 10 Oct 2022 at 06:50, Antoon Pardon  wrote:
> > I just want a parser that doesn't give up on encountering the first syntax
> > error. Maybe do some semantic checking like checking the number of 
> > parameters.
> 
> That doesn't make sense though.

I think you disagree with most compiler authors here.

> It's one thing to keep going after finding a non-syntactic error, but
> an error of syntax *by definition* makes parsing the rest of the file
> dubious.

Dubious but still useful.

> What would it even *mean* to not give up?

Read the blog post on Lezer for some ideas:
https://marijnhaverbeke.nl/blog/lezer.html

This is in the context of an editor. But the same problem applies to
compilers. It's not very important if a compile run only takes a second
or so but even then it might be helpful to see several error messages
and not only one at a time. It becomes much more important as compile
times get longer (as an extreme[1] example, when I worked on a largeish
cobol program in the 1980s, compiling the thing took about half an hour.
I really wanted to fix *everything* before starting the compiler again.)

Marijn isn't the only person who revisited this problem recently[2].
I've read a few other blog posts and papers on that topic at about the
same time.

hp

[1] Yes, there are programs where a full compile takes much longer than
that. But you can usually get away with recompiling only a small
part, so you don't have to wait that long during normal development.
That cobol compiler couldn't do that.

[2] "Recently" means "in the last 10 years or so".

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Cameron Simpson


On 10Oct2022 09:04, Antoon Pardon  wrote:
>> It is easy to get the syntax right before submitting to such a
>> pipeline.  I usually run a linter on my code for serious commits, and
>> I've got a `lint1` alias which basically runs the short fast flavour of
>> that which does a syntax check and the very fast less thorough lint
>> phase.
>
> If you have a linter that doesn't quit after the first syntax error,
> please provide a link. I already tried pylint and it also quits after
> the first syntax error.


I don't have such a linter. I did outline an approach for you to write 
one of your own by wrapping an existing parser program.


I have a personal "lint" script which runs a few linters. The first 
check is `py_compile` which quits at the first syntax error. The other 
linters are not even tried if that fails.
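
For reference, a first-stage check of that kind can be sketched with the standard `py_compile` module. This is only an assumption about what such a script might look like, not the actual "lint" script; the helper name `syntax_ok` is hypothetical. The compiled output is written to a throwaway directory so no `__pycache__` entry is left behind.

```python
import os
import py_compile
import tempfile


def syntax_ok(path):
    """Return (True, "") if the file compiles, else (False, error message).

    Like the interpreter itself, this reports only the first syntax error.
    """
    with tempfile.TemporaryDirectory() as tmp:
        try:
            py_compile.compile(path, cfile=os.path.join(tmp, "out.pyc"), doraise=True)
        except py_compile.PyCompileError as exc:
            return False, exc.msg
    return True, ""
```

A wrapper would call `syntax_ok(path)` first and only go on to the slower linters when it returns `True`.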


I do not know what your editing environment is; I'd have thought that 
some IDEs should make the first syntax error very obvious and easy to go 
to, and an obvious indication that the file as a whole is syntactically 
good/bad. If you have such, between them you could fairly easily resolve 
syntax errors rapidly, perhaps rapidly enough to make up for a 
stop-at-the-first-fail syntax check.


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-10 Thread Antoon Pardon




Op 10/10/2022 om 00:45 schreef Cameron Simpson:

> On 09Oct2022 21:46, Antoon Pardon  wrote:
>>> Is it that onerous to fix one thing and run it again? It was once when
>>> you handed in punch cards and waited a day or on very busy machines.
>>
>> Yes I find it onerous, especially since I have a pipeline with unit
>> tests and other tools that all have to redo their work each time a bug
>> is corrected.
>
> It is easy to get the syntax right before submitting to such a
> pipeline.  I usually run a linter on my code for serious commits, and
> I've got a `lint1` alias which basically runs the short fast flavour of
> that which does a syntax check and the very fast less thorough lint phase.

If you have a linter that doesn't quit after the first syntax error,
please provide a link. I already tried pylint and it also quits after
the first syntax error.


--
Antoon Pardon
--
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Cameron Simpson


On 10Oct2022 00:41, avi.e.gr...@gmail.com  wrote:

Your suggestion makes me shudder!


And fair enough too. I don't do this for me, I'm just suggesting an 
approach which might bring something to Antoon's objective.



Removing all earlier lines of code is often guaranteed to generate errors as
variables you are using are not declared or initiated, modules are not
imported and so on.


Antoon's interested in syntax errors.


Removing just the line or three where the previous error happened would also
have a good chance of invalidating something.


Doubtless. He accepts that any such resume-the-parse can bring 
misleading error messages. Antoon is not expecting magic, just getting 
several complaints instead of just the first syntax error.


I must admit I sympathise a bit, as one of my own major irks is command 
line tools which moan about the first bad option instead of noting it 
and moving on to complain about other things as well, then quitting 
after the command line parse. Pure laziness a lot of the time IMO; I've 
done it myself, but do like to make multiple complaints when it's 
feasible.


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list

RE: What to use for finding as many syntax errors as possible.

2022-10-09 Thread avi.e.gross

ackets? 

The compiler or interpreter often cannot fix it so it often tries to skip
forward till it finds something unambiguous that marks the beginning of a new
section. That might be something like an unquoted semicolon at the end of a
line or a matching close bracket. Depending on such choices, again, varying
amounts of the program may be ignored in evaluating what follows. But this
is not the same as a human speedreading or daydreaming who misses a bit here
and there and just hopes it was not crucial and that what follows probably
remains worthy and valid. I have sometimes missed something like a name and
then seen pages of pronouns like "she" and eventually give up as no more
hints arrive and I have to go back or ask someone lest a big bunch of the
text makes no sense to me. 

Someone is wanting to treat code from a spelling checker perspective and
wants all possible mistakes thrown at them at once. As I pointed out, in
real life many kinds of context can matter and a really good checker might
even consult a personal list of words it has learned you want ignored, like
people's names or some abbreviations like LOL. It may even read marked-up
text in say HTML or XML or similar formats that is marked with the language
they supposedly contain and calls up a spell-checker appropriate for each
region. 

But if they want a really intelligent program that recovers enough from
errors to reliably continue, maybe not easy.

They have explained and amended that they understand some of these issues
and are willing to get lots of false negatives or red herrings and their
real goal is to have a chance to detect and maybe fix a few things per round
rather than just one. Not a bad wish. Just not a trivial wish to grant and
satisfy.

-----Original Message-----
From: Python-list  On
Behalf Of Cameron Simpson
Sent: Sunday, October 9, 2022 6:45 PM
To: python-list@python.org
Subject: Re: What to use for finding as many syntax errors as possible.

On 09Oct2022 21:46, Antoon Pardon  wrote:
>>Is it that onerous to fix one thing and run it again? It was once when 
>>you handed in punch cards and waited a day or on very busy machines.
>
>Yes I find it onerous, especially since I have a pipeline with unit 
>tests and other tools that all have to redo their work each time a bug 
>is corrected.

It is easy to get the syntax right before submitting to such a pipeline.  
I usually run a linter on my code for serious commits, and I've got a
`lint1` alias which basically runs the short fast flavour of that which does a
syntax check and the very fast less thorough lint phase.

I say this just to ease your write/run-tests cycle.

Regarding your main request, had you considered writing your own wrapper
tool? Something which ran something like:

 python -We:invalid -m py_compile your_python_file.py

If there's an error, report it, then make a new file commencing with the
next unindented line after the error, with all preceding lines commented
out (to keep the line numbers the same). Then run the check again. Repeat
until the file's empty or there are no errors.

This doesn't sound very complex.

Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Thomas Passin




On 10/9/2022 1:29 PM, Peter J. Holzer wrote:
> On 2022-10-09 12:59:09 -0400, Thomas Passin wrote:
>> 
https://stackoverflow.com/questions/4284313/how-can-i-check-the-syntax-of-python-script-without-executing-it

>>
>> People seemed especially enthusiastic about the one-liner from jmd_dk.
>
> I don't think that one-liner solves Antoon's requirement of continuing
> after an error. It uses just the normal python parser so it has exactly
> the same limitations.

Yes, of course. Interesting, though. py_compile tends to be what I use 
for a quick check. I linked to the page mostly for the other 
possibilities, as you mentioned below:


> Some of the mentioned tools may do what Antoon wants, though.
>
>  hp
>
>

--
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Cameron Simpson


On 09Oct2022 21:46, Antoon Pardon  wrote:
>> Is it that onerous to fix one thing and run it again? It was once when
>> you handed in punch cards and waited a day or on very busy machines.
>
> Yes I find it onerous, especially since I have a pipeline with unit tests
> and other tools that all have to redo their work each time a bug is
> corrected.

It is easy to get the syntax right before submitting to such a pipeline.  
I usually run a linter on my code for serious commits, and I've got a 
`lint1` alias which basically runs the short fast flavour of that which 
does a syntax check and the very fast less thorough lint phase.


I say this just to ease your write/run-tests cycle.

Regarding your main request, had you considered writing your own wrapper 
tool? Something which ran something like:


python -We:invalid -m py_compile your_python_file.py

If there's an error, report it, then make a new file commencing with the 
next unindented line after the error, with all preceding lines 
commented out (to keep the line numbers the same). Then run the check 
again. Repeat until the file's empty or there are no errors.


This doesn't sound very complex.
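
A minimal sketch of that loop, under some loud assumptions: it calls the built-in `compile()` in-process instead of spawning `python -m py_compile`, and the helper names `is_top_level` and `find_syntax_errors` are invented for this example. "Next unindented line" is approximated as a non-empty line that does not start with whitespace, `#`, or a closing bracket. Everything after the first reported error should be read as a hint, since resuming can produce spurious complaints.

```python
def is_top_level(line):
    """Heuristic: does this line look like the start of an unindented statement?"""
    return bool(line) and line[0] not in " \t#)]}"


def find_syntax_errors(source, filename="<input>", max_errors=50):
    """Collect (line number, message) pairs by repeatedly recompiling.

    After each error, every line up to the next unindented line is replaced
    by a bare comment so that line numbers stay the same.
    """
    lines = source.splitlines()
    errors = []
    start = 0  # number of leading lines already commented out
    while len(errors) < max_errors and start < len(lines):
        try:
            compile("\n".join(lines) + "\n", filename, "exec")
        except SyntaxError as exc:  # also covers IndentationError and TabError
            lineno = exc.lineno or start + 1
            errors.append((lineno, exc.msg))
            # Resume at the first unindented line after the error line,
            # always moving forward by at least one line.
            resume = max(lineno, start + 1)
            while resume < len(lines) and not is_top_level(lines[resume]):
                resume += 1
            resume = min(resume, len(lines))
            for i in range(start, resume):
                lines[i] = "#"
            start = resume
        else:
            break
    return errors


SAMPLE = "x = 1 +\ny = 2\nz = = 3\nw = 4\n"
for lineno, message in find_syntax_errors(SAMPLE):
    print(f"line {lineno}: {message}")
```

An error inside a multi-line construct (an unclosed bracket or triple-quoted string) will usually make the resume point land in the wrong place, which is the false-positive risk already acknowledged in this thread.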

Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Chris Angelico

On Mon, 10 Oct 2022 at 06:50, Antoon Pardon  wrote:
> I just want a parser that doesn't give up on encountering the first syntax
> error. Maybe do some semantic checking like checking the number of parameters.

That doesn't make sense though. It's one thing to keep going after
finding a non-syntactic error, but an error of syntax *by definition*
makes parsing the rest of the file dubious. What would it even *mean*
to not give up? How should it interpret the following lines of code?
All it can do is report the error.

You know, if you'd not made this thread, the time you saved would have
been enough for quite a few iterations of "fix one syntactic error,
run it again to find the next".

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Karsten Hilbert

Am Sun, Oct 09, 2022 at 07:51:12PM +0200 schrieb Antoon Pardon:

> >But the point is: you can't (there is no way to) be sure the
> >9+ errors really are errors.
> >
> >Unless you further constrict what sorts of errors you are
> >looking for and what margin of error or leeway for false
> >positives you want to allow.
>
> Look when I was at the university we had to program in Pascal and
> the compiler we used continued parsing until the end. Sure there
> were times that after a number of reported errors the number of
> false positives became so high it was useless trying to find the
> remaining true ones, but it still was more efficient to correct the
> obvious ones, than to only correct the first one.
>
> I don't need to be sure. Even the occasional wrong correction
> is probably still more efficient than quitting after the first
> syntax error.

A-ha, so you further defined your context.

Under which I can agree to the objective :-)

Best,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Barry



> On 9 Oct 2022, at 18:54, Antoon Pardon  wrote:
> 
> 
> 
> Op 9/10/2022 om 19:23 schreef Karsten Hilbert:
>> Am Sun, Oct 09, 2022 at 06:59:36PM +0200 schrieb Antoon Pardon:
>> 
>>> Op 9/10/2022 om 17:49 schreef Avi Gross:
>>>> My guess is that finding 100 errors might turn out to be misleading. If you
>>>> fix just the first, many others would go away.
>>> At this moment I would prefer a tool that reported 100 errors, which would
>>> allow me to easily correct 10 real errors, over the python strategy which
>>> quits after having found one syntax error.
>> But the point is: you can't (there is no way to) be sure the
>> 9+ errors really are errors.
>> 
>> Unless you further constrict what sorts of errors you are
>> looking for and what margin of error or leeway for false
>> positives you want to allow.
> 
> Look when I was at the university we had to program in Pascal and
> the compiler we used continued parsing until the end. Sure there
> were times that after a number of reported errors the number of
> false positives became so high it was useless trying to find the
> remaining true ones, but it still was more efficient to correct the
> obvious ones, than to only correct the first one.

If it’s very fast to syntax check then one at a time is fine.
Python is very fast to syntax check so I personally do not need the
multi-error version.
My editor has syntax check on a key and it’s instant to drop me a syntax error.

Barry

> 
> I don't need to be sure. Even the occasional wrong correction
> is probably still more efficient than quitting after the first
> syntax error.
> 
> -- 
> Antoon.
> -- 
> https://mail.python.org/mailman/listinfo/python-list
> 

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Antoon Pardon





Op 9/10/2022 om 21:44 schreef Avi Gross:

But an error like setting the size of a fixed length data structure to the
right size may result in oodles of errors about being out of range that
magically get fixed by one change. Sometimes too much info just gives you a
headache.


So? The user of such a tool doesn't need to go through all the provided
information. If, after correcting a few errors, the user finds that the rest
of the information gives him a headache, he can just ignore all that and
just run a new iteration.

--
Antoon Pardon
--
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Peter J. Holzer

On 2022-10-09 15:18:19 -0400, Avi Gross wrote:
> Antoon,  it may also relate to an interpreter versus compiler issue.
> 
> Something like a compiler for C does not do anything except write code in
> an assembly language. It can choose to keep going after an error and start
> looking some more from a less stable place.
> 
> Interpreters for Python have to catch interrupts as they go and often run
> code in small batches. Continuing to evaluate after an error could cause
> weird effects.

I don't think this is really an issue. A python file is completely
compiled to byte code before execution starts.
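
This is easy to check with a short experiment. In the sketch below (the variable names are made up for the example), the first line of the source is perfectly valid, but it never runs because the syntax error on the second line is detected while the whole text is being compiled.

```python
source = "ran = True\nvalue = = 1\n"

namespace = {}
error_line = None
try:
    # exec() compiles the entire string before executing any of it.
    exec(source, namespace)
except SyntaxError as exc:
    error_line = exc.lineno

print("first line executed:", "ran" in namespace)
print("syntax error reported on line:", error_line)
```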

It's true that a syntax error before an import prevents that import, but
since imports are usually at the start of a file, a syntax error will
only rarely prevent the import (and files intended to be imported
generally don't have weird side effects anyway).

One issue could be that compilers which generate executables are
generally thorough and slow, while the compilers which generate
byte-code for immediate consumption by an interpreter are generally
simple and fast. So there is more incentive for the former to discover
as many errors as possible and they are also better equipped to do this.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Antoon Pardon





Op 9/10/2022 om 21:18 schreef Avi Gross:

Antoon,  it may also relate to an interpreter versus compiler issue.

Something like a compiler for C does not do anything except write code in
an assembly language. It can choose to keep going after an error and start
looking some more from a less stable place.

Interpreters for Python have to catch interrupts as they go and often run
code in small batches. Continuing to evaluate after an error could cause
weird effects.

So what you want is closer to a lint program that does not run code at all,
or merely writes pseudocode to a file to be run faster later.


I just want a parser that doesn't give up on encountering the first syntax
error. Maybe do some semantic checking like checking the number of parameters.


I will say that often enough a program could report more possible errors.
Putting your code into multiple files and modules may mean you could
cleanly evaluate the code and return multiple errors from many modules as
long as they are distinct. Finding all errors is not possible if recovery
from one is not guaranteed.


I don't need it to find all errors. As long as it reasonably accurately
finds a significant number of them.


Is it that onerous to fix one thing and run it again? It was once when you
handed in punch cards and waited a day or on very busy machines.


Yes I find it onerous, especially since I have a pipeline with unit tests
and other tools that all have to redo their work each time a bug is corrected.

--
Antoon.
--
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Avi Gross

I will say that those of us, meaning me, who express reservations are not
arguing it is a bad idea to get more info in one sweep. Many errors come in
bunches.

If I keep calling some function with the wrong number or type of arguments,
it may be the same in a dozen places in my code. The first error report may
make me search for the other places so I fix it all at once. Telling me
where some instances are might speed that a bit.

As long as it is understood that further errors are a heuristic and
possibly misleading,  fine.

But an error like setting the size of a fixed length data structure to the
right size may result in oodles of errors about being out of range that
magically get fixed by one change. Sometimes too much info just gives you a
headache.

But a tool like you described could have uses even if imperfect. If you are
teaching a course and students submit programs, could you grade the one
with a single error higher than one with 5 errors shown imperfectly and
fail the one with 600?

On Sun, Oct 9, 2022, 1:53 PM Antoon Pardon  wrote:

>
>
> Op 9/10/2022 om 19:23 schreef Karsten Hilbert:
> > Am Sun, Oct 09, 2022 at 06:59:36PM +0200 schrieb Antoon Pardon:
> >
> >> Op 9/10/2022 om 17:49 schreef Avi Gross:
> >>> My guess is that finding 100 errors might turn out to be misleading. If you
> >>> fix just the first, many others would go away.
> >> At this moment I would prefer a tool that reported 100 errors, which would
> >> allow me to easily correct 10 real errors, over the python strategy which
> >> quits after having found one syntax error.
> > But the point is: you can't (there is no way to) be sure the
> > 9+ errors really are errors.
> >
> > Unless you further constrict what sorts of errors you are
> > looking for and what margin of error or leeway for false
> > positives you want to allow.
>
> Look when I was at the university we had to program in Pascal and
> the compiler we used continued parsing until the end. Sure there
> were times that after a number of reported errors the number of
> false positives became so high it was useless trying to find the
> remaining true ones, but it still was more efficient to correct the
> obvious ones, than to only correct the first one.
>
> I don't need to be sure. Even the occasional wrong correction
> is probably still more efficient than quitting after the first
> syntax error.
>
> --
> Antoon.
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Avi Gross

Antoon,  it may also relate to an interpreter versus compiler issue.

Something like a compiler for C does not do anything except write code in
an assembly language. It can choose to keep going after an error and start
looking some more from a less stable place.

Interpreters for Python have to catch interrupts as they go and often run
code in small batches. Continuing to evaluate after an error could cause
weird effects.

So what you want is closer to a lint program that does not run code at all,
or merely writes pseudocode to a file to be run faster later.

Many languages now have blocks of code that are not really evaluated
till later. Some code is built on the fly. And some errors are not errors
at first. Many languages let you not declare a variable before using it or
allow it to change types. In some, the text is lazily evaluated as late as
possible.

I will say that often enough a program could report more possible errors.
Putting your code into multiple files and modules may mean you could
cleanly evaluate the code and return multiple errors from many modules as
long as they are distinct. Finding all errors is not possible if recovery
from one is not guaranteed.
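
As a rough illustration of the per-module point above, the sketch below
compiles several sources independently with the built-in compile() and
collects one SyntaxError per failing module. The module names and contents
in MODULES are made up for the example, and this still yields at most one
error per file.

```python
def check_modules(sources):
    """Compile each module on its own; return {name: SyntaxError} for failures."""
    errors = {}
    for name, text in sources.items():
        try:
            # compile() only parses and builds a code object; nothing is executed.
            compile(text, name, "exec")
        except SyntaxError as exc:
            errors[name] = exc
    return errors


# Hypothetical modules, kept in memory so the example is self-contained.
MODULES = {
    "good.py": "x = 1\n",
    "bad_paren.py": "print((1, 2)\n",
    "bad_assign.py": "y = = 2\n",
}

for name, exc in check_modules(MODULES).items():
    print(f"{name}: line {exc.lineno}: {exc.msg}")
```

An error in one module does not hide the error in another, which is the
"as long as they are distinct" case; two errors inside the same module are
still reported one at a time.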

Take a language that uses a semicolon to end a statement. If one is missing,
there would usually be some error, but it is often reported on the next line.
Your evaluator could do an experiment: add a semicolon and try again. This
might work 90% of the time, but sometimes the real error was not ending the
line with a backslash to make it continue properly, or an indentation issue,
or even a spelling error. No guarantees.
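
A crude Python analogue of that "experiment and try again" loop might look
like the sketch below. It is only an assumption about how such a tool could
work, not how any existing checker does it: instead of inserting a semicolon,
it blanks out the line the parser complained about and reparses. The helper
name collect_errors and the sample SOURCE are invented for the example.

```python
def collect_errors(source, filename="<example>", limit=50):
    """Reparse repeatedly, blanking each offending line; return what was reported."""
    lines = source.splitlines()
    reported = []
    for _ in range(limit):  # hard cap so a pathological input cannot loop forever
        try:
            compile("\n".join(lines) + "\n", filename, "exec")
        except SyntaxError as exc:
            index = (exc.lineno or 0) - 1
            # Stop if the parser points outside the file or at a line that is
            # already blank; blanking it again would make no progress.
            if not 0 <= index < len(lines) or not lines[index].strip():
                break
            reported.append((exc.lineno, exc.msg))
            lines[index] = ""  # the "experiment": drop the line and try again
        else:
            break  # the remaining text parses cleanly
    return reported


SOURCE = """\
def f(:
    return 1

x = = 2
"""

for lineno, msg in collect_errors(SOURCE):
    print(f"line {lineno}: {msg}")
```

Blanking the broken "def" line leaves its body dangling, so the next pass
may well complain about the indented "return": exactly the kind of
follow-on false positive described above. The real error on the last line
is still found in the same run.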

Is it that onerous to fix one thing and run it again? It was once, when you
handed in punch cards and waited a day, or longer on very busy machines.

On Sun, Oct 9, 2022, 1:03 PM Antoon Pardon  wrote:

> On 9/10/2022 at 17:49, Avi Gross wrote:
> > My guess is that finding 100 errors might turn out to be misleading. If you
> > fix just the first, many others would go away.
>
> At this moment I would prefer a tool that reported 100 errors, which would
> allow me to easily correct 10 real errors, over the python strategy which
> quits after having found one syntax error.
>
> --
> Antoon.
>

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread MRAB


On 2022-10-09 18:51, Antoon Pardon wrote:
> On 9/10/2022 at 19:23, Karsten Hilbert wrote:
> > On Sun, Oct 09, 2022 at 06:59:36PM +0200, Antoon Pardon wrote:
> > > On 9/10/2022 at 17:49, Avi Gross wrote:
> > > > My guess is that finding 100 errors might turn out to be misleading. If you
> > > > fix just the first, many others would go away.
> > >
> > > At this moment I would prefer a tool that reported 100 errors, which would
> > > allow me to easily correct 10 real errors, over the python strategy which quits
> > > after having found one syntax error.
> >
> > But the point is: you can't (there is no way to) be sure the
> > 9+ errors really are errors.
> >
> > Unless you further restrict what sorts of errors you are
> > looking for and what margin of error or leeway for false
> > positives you want to allow.
>
> Look, when I was at university we had to program in Pascal, and
> the compiler we used continued parsing until the end. Sure, there
> were times when, after a number of reported errors, the number of
> false positives became so high that it was useless trying to find the
> remaining true ones, but it was still more efficient to correct the
> obvious ones than to correct only the first one.
>
> I don't need to be sure. Even the occasional wrong correction
> is probably still more efficient than quitting after the first
> syntax error.

When I did some programming in COBOL, a single omitted "." would 
completely confuse the compiler and it was best to fix that one error 
and then try again.


On the other hand, Turbo Pascal would also stop on the first error and 
put the cursor at the error position in the IDE, but as it compiled 
quickly, it wasn't a problem. It was no slower than it would've been if 
it had found multiple errors and you pressed a key to advance to the 
next error.


Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Weatherby,Gerard

PyCharm.

It does a good job of separating "these are really errors" from "do you
really mean that?" warnings and from "is this word spelled right?" hints.

https://www.jetbrains.com/pycharm/

From: Python-list  on 
behalf of Antoon Pardon 
Date: Sunday, October 9, 2022 at 6:11 AM
To: python-list@python.org 
Subject: What to use for finding as many syntax errors as possible.

I would like a tool that tries to find as many syntax errors as possible
in a Python file. I know there is the risk of false positives when a
tool tries to recover from a syntax error and proceeds, but I would
prefer that over the current Python strategy of quitting after the first
syntax error. I just want a tool for syntax errors. No style
enforcement. Any recommendations? -- Antoon Pardon

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Antoon Pardon





On 9/10/2022 at 19:23, Karsten Hilbert wrote:
> On Sun, Oct 09, 2022 at 06:59:36PM +0200, Antoon Pardon wrote:
> > On 9/10/2022 at 17:49, Avi Gross wrote:
> > > My guess is that finding 100 errors might turn out to be misleading. If you
> > > fix just the first, many others would go away.
> >
> > At this moment I would prefer a tool that reported 100 errors, which would
> > allow me to easily correct 10 real errors, over the python strategy which quits
> > after having found one syntax error.
>
> But the point is: you can't (there is no way to) be sure the
> 9+ errors really are errors.
>
> Unless you further restrict what sorts of errors you are
> looking for and what margin of error or leeway for false
> positives you want to allow.

Look, when I was at university we had to program in Pascal, and
the compiler we used continued parsing until the end. Sure, there
were times when, after a number of reported errors, the number of
false positives became so high that it was useless trying to find the
remaining true ones, but it was still more efficient to correct the
obvious ones than to correct only the first one.

I don't need to be sure. Even the occasional wrong correction
is probably still more efficient than quitting after the first
syntax error.

--
Antoon.

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Peter J. Holzer

On 2022-10-09 19:23:41 +0200, Karsten Hilbert wrote:
> On Sun, Oct 09, 2022 at 06:59:36PM +0200, Antoon Pardon wrote:
> > On 9/10/2022 at 17:49, Avi Gross wrote:
> > > My guess is that finding 100 errors might turn out to be misleading. If you
> > > fix just the first, many others would go away.
> >
> > At this moment I would prefer a tool that reported 100 errors, which would
> > allow me to easily correct 10 real errors, over the python strategy which
> > quits after having found one syntax error.
> 
> But the point is: you can't (there is no way to) be sure the
> 9+ errors really are errors.

As a human who knows Python, in many cases you can be sure. Sometimes you
aren't sure; then you leave that one for the next iteration. No big
deal. This isn't the 1960s, when you sent your punched cards in and got
the result back next week. So neither the parser nor you need to be
perfect. Just better than one error at a time.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"



Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Peter J. Holzer

On 2022-10-09 12:59:09 -0400, Thomas Passin wrote:
> https://stackoverflow.com/questions/4284313/how-can-i-check-the-syntax-of-python-script-without-executing-it
> 
> People seemed especially enthusiastic about the one-liner from jmd_dk.

I don't think that one-liner meets Antoon's requirement of continuing
after an error. It just uses the normal Python parser, so it has exactly
the same limitation.
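
To make that limitation concrete, here is a minimal sketch using only the
standard library's ast.parse(), which goes through the same stock parser.
The sample SOURCE contains two separate mistakes, but the parser raises a
single SyntaxError and stops there. The helper name first_syntax_error is
just something chosen for the example.

```python
import ast

# Two independent mistakes: a broken parameter list and a doubled "=".
SOURCE = """\
def f(:
    return 1

x = = 2
"""


def first_syntax_error(source, filename="<example>"):
    """Return the one SyntaxError the stock parser reports, or None if it parses."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return exc
    return None


err = first_syntax_error(SOURCE)
print(f"line {err.lineno}: {err.msg}")
```

Only the problem in the "def" line is reported; the second mistake stays
invisible until the first one has been fixed and the file is parsed again.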

Some of the mentioned tools may do what Antoon wants, though.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"



Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Karsten Hilbert

On Sun, Oct 09, 2022 at 06:59:36PM +0200, Antoon Pardon wrote:

> On 9/10/2022 at 17:49, Avi Gross wrote:
> > My guess is that finding 100 errors might turn out to be misleading. If you
> > fix just the first, many others would go away.
>
> At this moment I would prefer a tool that reported 100 errors, which would
> allow me to easily correct 10 real errors, over the python strategy which
> quits after having found one syntax error.

But the point is: you can't (there is no way to) be sure the
9+ errors really are errors.

Unless you further restrict what sorts of errors you are
looking for and what margin of error or leeway for false
positives you want to allow.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B

Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Thomas Passin


https://stackoverflow.com/questions/4284313/how-can-i-check-the-syntax-of-python-script-without-executing-it

People seemed especially enthusiastic about the one-liner from jmd_dk.

On 10/9/2022 12:17 PM, Peter J. Holzer wrote:
> On 2022-10-09 12:09:17 +0200, Antoon Pardon wrote:
> > I would like a tool that tries to find as many syntax errors as possible in
> > a Python file. I know there is the risk of false positives when a tool tries
> > to recover from a syntax error and proceeds, but I would prefer that over the
> > current Python strategy of quitting after the first syntax error. I just want
> > a tool for syntax errors. No style enforcement. Any recommendations?
>
> There seems to have been increased interest in good error recovery over
> the last few years. I thought I had bookmarked a bunch of projects, but the
> only one I can find right now is Lezer
> (https://marijnhaverbeke.nl/blog/lezer.html) which is part of the
> CodeMirror (https://codemirror.net/) editor. Python is listed as a
> currently supported language, so you might want to check that out.
>
> Disclaimer: I haven't used CodeMirror, so I can't say anything about
> its quality. The blog entry about Lezer was interesting, though.
>
> hp





Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Antoon Pardon





On 9/10/2022 at 17:49, Avi Gross wrote:
> My guess is that finding 100 errors might turn out to be misleading. If you
> fix just the first, many others would go away.

At this moment I would prefer a tool that reported 100 errors, which would
allow me to easily correct 10 real errors, over the python strategy which quits
after having found one syntax error.

--
Antoon.


Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Peter J. Holzer

On 2022-10-09 12:09:17 +0200, Antoon Pardon wrote:
> I would like a tool that tries to find as many syntax errors as possible in
> a Python file. I know there is the risk of false positives when a tool tries
> to recover from a syntax error and proceeds, but I would prefer that over the
> current Python strategy of quitting after the first syntax error. I just want
> a tool for syntax errors. No style enforcement. Any recommendations?

There seems to have been increased interest in good error recovery over
the last few years. I thought I had bookmarked a bunch of projects, but the
only one I can find right now is Lezer
(https://marijnhaverbeke.nl/blog/lezer.html) which is part of the
CodeMirror (https://codemirror.net/) editor. Python is listed as a
currently supported language, so you might want to check that out.

Disclaimer: I haven't used CodeMirror, so I can't say anything about
its quality. The blog entry about Lezer was interesting, though.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


Re: What to use for finding as many syntax errors as possible.

2022-10-09 Thread Avi Gross

Antoon,

There likely are such programs out there, but is there any universal
agreement on how to figure out where a new safe zone of code starts, so that
error checking can resume?

For example, a checker working through a file full of function definitions
might find an error in function 1, try to find the end of that function, and
resume checking at the next function. But what if a function defines local
functions within it? What if the mistake in one line of code would still
allow checking the next line rather than skipping it all?
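
One naive way to pick such "safe zones" is sketched below, purely as an
assumption for illustration: treat every unindented, non-blank line as the
start of a new chunk and compile each chunk separately with the built-in
compile(). The helper names and the sample SOURCE are invented. The
heuristic is easily fooled: a top-level "else:", a decorator on its own
line, or a multi-line string with unindented content would each be split
off and reported as a false error.

```python
def split_top_level(source):
    """Split source into (first_line_number, text) chunks at unindented lines."""
    chunks, current, start = [], [], 1
    for number, line in enumerate(source.splitlines(keepends=True), start=1):
        unindented = bool(line.strip()) and not line[0].isspace()
        if unindented and current:
            chunks.append((start, "".join(current)))
            current, start = [], number
        current.append(line)
    if current:
        chunks.append((start, "".join(current)))
    return chunks


def check_chunks(source, filename="<example>"):
    """Compile each chunk on its own; return (line_in_file, message) pairs."""
    errors = []
    for start, text in split_top_level(source):
        try:
            compile(text, filename, "exec")
        except SyntaxError as exc:
            # exc.lineno is relative to the chunk, so shift it back to the file.
            errors.append((start + (exc.lineno or 1) - 1, exc.msg))
    return errors


SOURCE = """\
def f(:
    return 1

def g():
    return 2

x = = 3
"""

for lineno, msg in check_chunks(SOURCE):
    print(f"line {lineno}: {msg}")
```

Because each chunk is parsed independently, the broken first function no
longer hides the mistake in the last line. Nested functions stay inside
their enclosing chunk, so an error there still costs the rest of that chunk.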

My guess is that finding 100 errors might turn out to be misleading. If you
fix just the first, many others would go away. If you spell a variable name
wrong when declaring it, a dozen uses of the right name may cause errors.
Should you fix the first or change all later ones?

On Sun, Oct 9, 2022, 6:11 AM Antoon Pardon  wrote:

> I would like a tool that tries to find as many syntax errors as possible
> in a Python file. I know there is the risk of false positives when a
> tool tries to recover from a syntax error and proceeds, but I would
> prefer that over the current Python strategy of quitting after the first
> syntax error. I just want a tool for syntax errors. No style
> enforcement. Any recommendations? -- Antoon Pardon

What to use for finding as many syntax errors as possible.

2022-10-09 Thread Antoon Pardon

I would like a tool that tries to find as many syntax errors as possible 
in a Python file. I know there is the risk of false positives when a 
tool tries to recover from a syntax error and proceeds, but I would 
prefer that over the current Python strategy of quitting after the first 
syntax error. I just want a tool for syntax errors. No style 
enforcement. Any recommendations? -- Antoon Pardon
