[issue40661] The new parser segfaults when parsing invalid input

2020-05-18 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:


New changeset 7b7a21bc4fd063b26a2d1882fddc458861497812 by Lysandros Nikolaou in 
branch 'master':
bpo-40661: Fix segfault when parsing invalid input (GH-20165)
https://github.com/python/cpython/commit/7b7a21bc4fd063b26a2d1882fddc458861497812


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-18 Thread Pablo Galindo Salgado


Change by Pablo Galindo Salgado :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-18 Thread Guido van Rossum


Guido van Rossum  added the comment:

Okay, deferring the blocker. But I'll still land Lysandros' fix ASAP.

--
priority: release blocker -> deferred blocker

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-18 Thread STINNER Victor


STINNER Victor  added the comment:

I don't think that such bug should block Python 3.9 beta1 (I'm talking about 
the "release blocker" priority). There is still time before 3.9 final to fix it.

--
nosy: +vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-18 Thread Guido van Rossum


Guido van Rossum  added the comment:

Unfortunately, I do not understand enough about how Hypothes* works to be 
helpful here, other than by offering encouragement. (And please don't try to 
educate me. I have too many other things. Sorry.)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-18 Thread Zac Hatfield-Dodds


Zac Hatfield-Dodds  added the comment:

I know what else it might find either, but I still think it's worth running 
property-based tests in CI to find out!  The demo I wrote for my language 
summit talk doesn't have any parser tests, but still would have caught this bug 
in the pull request that introduced it.


The specific reproducer here is odd, because it's reported as an internal error 
in Hypothesmith - I use the `compile()` builtin to check that strings valid 
against an approximate grammar are actually valid.

It's structurally less complex than typical outputs because it's only a 
fragment of the tree being generated, but because shrinking doesn't run for 
generation-time errors it's also much harder to interpret than usual.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-17 Thread Guido van Rossum


Guido van Rossum  added the comment:

Zac: The reproducer here apparently uses a long string of weird accented 
characters. I'm not sure how to generalize from that to other things that 
Hyothes* could find. But maybe this helps: 
https://github.com/python/cpython/pull/20106#issuecomment-629742075

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-17 Thread Zac Hatfield-Dodds


Zac Hatfield-Dodds  added the comment:

I understand from Paul Ganssle that this bug was found using Hypothesmith in my 
stdlib property tests (reported at 
https://github.com/Zac-HD/stdlib-property-tests/issues/14).

As discussed in https://github.com/we-like-parsers/cpython/issues/91 and 
https://pyfound.blogspot.com/2020/05/property-based-testing-for-python.html I'm 
keen to help out how I can, so if there's anything more specific than "write 
tools, write test, and wait" please let me know!

Best,
Zac

--
nosy: +Zac Hatfield-Dodds

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-17 Thread Lysandros Nikolaou


Change by Lysandros Nikolaou :


--
pull_requests: +19464
pull_request: https://github.com/python/cpython/pull/20165

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-17 Thread Guido van Rossum


Change by Guido van Rossum :


--
keywords: +patch
pull_requests: +19462
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/20159

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-17 Thread Guido van Rossum


Guido van Rossum  added the comment:

I see almost no time difference for 'make time_stdlib': before 3.471, after 
3.451. But I see a serious difference for 'make time_compile': before 3.474, 
after 4.996. That's over 40% slower (on the extreme case, xxl.py).

I'll prepare a PR just in case.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-17 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

A quick benchmark using xxl.py:

Base time (master):
Time: 9.386 seconds on an average of 20 runs

With the patch in this issue:
Time: 9.498 seconds on an average of 20 runs

Sadly I could not test with PGO/LTO and without CPU isolation, so it would be 
great if someone could double-check these numbers.

Also, I will be unable to do a PR until this night/tomorrow morning (London 
time) :(

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-17 Thread Guido van Rossum

Guido van Rossum  added the comment:

How costly is PyErr_Occurred()? That worries me most, otherwise I’d accept this 
right away.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-17 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

Indeed, that diff solves the problem

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-17 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

I think we may need to test for the error indicator (and maybe PyErr_Ocurred 
for safety) before every alternative. Something like:

diff --git a/Tools/peg_generator/pegen/c_generator.py 
b/Tools/peg_generator/pegen/c_generator.py
index 8f9972bb41..61cb694628 100644
--- a/Tools/peg_generator/pegen/c_generator.py
+++ b/Tools/peg_generator/pegen/c_generator.py
@@ -468,10 +468,6 @@ class CParserGenerator(ParserGenerator, GrammarVisitor):
 memoize = self._should_memoize(node)

 with self.indent():
-self.print("if (p->error_indicator) {")
-with self.indent():
-self.print("return NULL;")
-self.print("}")
 self.print(f"{result_type} _res = NULL;")
 if memoize:
 self.print(f"if (_PyPegen_is_memoized(p, {node.name}_type, 
&_res))")
@@ -685,6 +681,12 @@ class CParserGenerator(ParserGenerator, GrammarVisitor):
 def visit_Alt(
 self, node: Alt, is_loop: bool, is_gather: bool, rulename: 
Optional[str]
 ) -> None:
+self.print("if (p->error_indicator == 1 || PyErr_Occurred()) {")
+with self.indent():
+self.print("p->error_indicator = 1;")
+self.print("return NULL;")
+self.print("}")
+
 self.print(f"{{ // {node}")
 with self.indent():
 # Prepare variable declarations for the alternative

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40661] The new parser segfaults when parsing invalid input

2020-05-17 Thread Pablo Galindo Salgado


New submission from Pablo Galindo Salgado :

The new peg parser segfaults when parsing the attached reproducer.

this is due to the fact that the exception set by `tokenizer_error` is not 
properly propagated.

--
components: Interpreter Core
files: reproducer.py
messages: 369132
nosy: gvanrossum, lys.nikolaou, pablogsal
priority: release blocker
severity: normal
stage: needs patch
status: open
title: The new parser segfaults when parsing invalid input
type: crash
versions: Python 3.9
Added file: https://bugs.python.org/file49169/reproducer.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com