[issue47117] repl segfaults on non utf-8 input

2022-03-26 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

Thanks for the report, Jon!

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47117] repl segfaults on non utf-8 input

2022-03-26 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:


New changeset 27ee43183437c473725eba00def0ea7647688926 by Pablo Galindo Salgado 
in branch '3.10':
[3.10] bpo-47117: Don't crash if we fail to decode characters when the 
tokenizer buffers are uninitialized (GH-32129) (GH-32130)
https://github.com/python/cpython/commit/27ee43183437c473725eba00def0ea7647688926
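One rough way to check whether a given interpreter is affected is to feed the offending byte to the REPL in a subprocess and see whether the process is killed by a signal. This is a minimal sketch and not the regression test added in GH-32129. The helper name repl_survives is hypothetical. It assumes that running the interpreter with -i and piped stdin goes through the same interactive loop that crashed here (on CPython, -i forces interactive mode even when stdin is not a tty; prompts and errors then go to stderr).

```python
import os
import subprocess
import sys


def repl_survives(payload: bytes, timeout: float = 30.0) -> bool:
    """Hypothetical helper: feed payload to an interactive interpreter.

    Returns False if the child was terminated by a signal (for example
    SIGSEGV on an unpatched 3.10), True otherwise.
    """
    # Mimic the report's non-Unicode locale (LC_ALL=C).
    env = dict(os.environ, LC_ALL="C")
    # -i forces the interactive loop even though stdin is a pipe.
    proc = subprocess.run(
        [sys.executable, "-i"],
        input=payload,
        capture_output=True,
        env=env,
        timeout=timeout,
    )
    # On POSIX, a negative return code means "killed by signal -returncode".
    return proc.returncode >= 0


if __name__ == "__main__":
    print("survived" if repl_survives(b"\xb6\n") else "crashed")
```

On a fixed interpreter the REPL should report a SyntaxError on stderr and exit normally at end of input.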


--




[issue47117] repl segfaults on non utf-8 input

2022-03-26 Thread Pablo Galindo Salgado


Change by Pablo Galindo Salgado :


--
pull_requests: +30210
pull_request: https://github.com/python/cpython/pull/32130




[issue47117] repl segfaults on non utf-8 input

2022-03-26 Thread miss-islington


miss-islington  added the comment:


New changeset 26cca8067bf5306e372c0e90036d832c5021fd90 by Pablo Galindo Salgado 
in branch 'main':
bpo-47117: Don't crash if we fail to decode characters when the tokenizer 
buffers are uninitialized (GH-32129)
https://github.com/python/cpython/commit/26cca8067bf5306e372c0e90036d832c5021fd90


--
nosy: +miss-islington




[issue47117] repl segfaults on non utf-8 input

2022-03-26 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

Ah yes, we have been defeated by half an emoji :)

--




[issue47117] repl segfaults on non utf-8 input

2022-03-26 Thread Pablo Galindo Salgado


Change by Pablo Galindo Salgado :


--
keywords: +patch
pull_requests: +30209
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/32129




[issue47117] repl segfaults on non utf-8 input

2022-03-25 Thread Jon Åslund

Jon Åslund  added the comment:

Very similar backtrace, too:

(gdb) run
Starting program: /home/jon/.pyenv/versions/3.10.4/bin/python3.10 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Python 3.10.4 (main, Mar 24 2022, 14:20:44) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> _ 

Program received signal SIGSEGV, Segmentation fault.
__strchr_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:57
57  ../sysdeps/x86_64/multiarch/strchr-avx2.S: No such file or directory.
(gdb) bt
#0  __strchr_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:57
#1  0x557d4a7a in get_error_line (lineno=lineno@entry=0, p=, p=) at Parser/pegen.c:443
#2  0x557d541b in _PyPegen_raise_error_known_location (p=0x77885ed0, errtype=0x558fe420 <_PyExc_SyntaxError>, lineno=0, col_offset=0, end_lineno=0, end_col_offset=-1, errmsg=0x558a2dd3 "(%s) %U", va=0x7fffd410) at Parser/pegen.c:499
#3  0x557d5646 in _PyPegen_raise_error (p=p@entry=0x77885ed0, errtype=, errmsg=errmsg@entry=0x558a2dd3 "(%s) %U") at Parser/pegen.c:422
#4  0x557d5839 in raise_decode_error (p=p@entry=0x77885ed0) at Parser/pegen.c:271
#5  0x557d6193 in initialize_token (token_type=60, end=0x0, start=, token=0x77a55d10, p=0x77885ed0) at Parser/pegen.c:720
#6  _PyPegen_fill_token (p=p@entry=0x77885ed0) at Parser/pegen.c:793
#7  0x557fec00 in statement_newline_rule (p=0x77885ed0) at Parser/parser.c:1080
#8  interactive_rule (p=0x77885ed0) at Parser/parser.c:1002
#9  _PyPegen_parse (p=p@entry=0x77885ed0) at Parser/parser.c:34508
#10 0x557d6c60 in _PyPegen_run_parser (p=0x77885ed0) at Parser/pegen.c:1342
#11 0x557d718f in _PyPegen_run_parser_from_file_pointer (fp=fp@entry=0x77e29980 <_IO_2_1_stdin_>, start_rule=start_rule@entry=256, filename_ob=filename_ob@entry=0x77a85670, enc=enc@entry=0x77a7c1a0 "utf-8", ps1=, ps1@entry=0x1e00160 , ps2=ps2@entry=0xe001a0 , flags=0x7fffd7f8, errcode=0x7fffd724, arena=0x7792cc70) at Parser/pegen.c:1448
#12 0x5575661c in _PyParser_ASTFromFile (fp=fp@entry=0x77e29980 <_IO_2_1_stdin_>, filename_ob=filename_ob@entry=0x77a85670, enc=enc@entry=0x77a7c1a0 "utf-8", mode=mode@entry=256, ps1=0x1e00160 , ps1@entry=0x77acf960 ">>> ", ps2=0xe001a0 , ps2@entry=0x77af02e0 "... ", flags=, errcode=, arena=) at Parser/peg_api.c:26
#13 0x556cad97 in PyRun_InteractiveOneObjectEx (fp=fp@entry=0x77e29980 <_IO_2_1_stdin_>, filename=filename@entry=0x77a85670, flags=flags@entry=0x7fffd7f8) at Python/pythonrun.c:257
#14 0x556cba26 in _PyRun_InteractiveLoopObject (fp=fp@entry=0x77e29980 <_IO_2_1_stdin_>, filename=filename@entry=0x77a85670, flags=flags@entry=0x7fffd7f8) at Python/pythonrun.c:148
#15 0x556cc5ce in _PyRun_AnyFileObject (flags=, closeit=, filename=0x77a85670, fp=) at Python/pythonrun.c:84
#16 PyRun_AnyFileExFlags (fp=0x77e29980 <_IO_2_1_stdin_>, filename=filename@entry=0x55802103 "", closeit=closeit@entry=0, flags=flags@entry=0x7fffd7f8) at Python/pythonrun.c:116
#17 0x555bb5c7 in pymain_run_stdin (config=0x55932ce0) at Modules/main.c:502
#18 pymain_run_python (exitcode=exitcode@entry=0x7fffd930) at Modules/main.c:590
#19 0x555bba1f in Py_RunMain () at Modules/main.c:666
#20 pymain_main (args=0x7fffd8f0) at Modules/main.c:696
#21 Py_BytesMain (argc=, argv=) at Modules/main.c:720
#22 0x77c610b3 in __libc_start_main (main=0x555aedb0 , argc=1, argv=0x7fffda58, init=, fini=, rtld_fini=, stack_end=0x7fffda48) at ../csu/libc-start.c:308
#23 0x555ba57e in _start () at ./Include/internal/pycore_pyerrors.h:14

--




[issue47117] repl segfaults on non utf-8 input

2022-03-25 Thread Jon Åslund

Jon Åslund  added the comment:

Yes, I think they are the same. I can reproduce the emoji crash, and it is much
easier to reproduce: no need for a Swedish keyboard layout.

1. Copy _
2. Start Python with a non-Unicode locale: LC_ALL=C python3.10
3. Paste in _
4. Press backspace once. It will look like the 2-character-wide emoji has been
replaced by a 1-character-wide space.
5. Press return.
6. Crash.
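Pablo's "half an emoji" remark suggests what the steps above do to the input: a typical emoji is four bytes in UTF-8, and under the C locale the line editor apparently deletes one byte per backspace, leaving a truncated sequence that no longer decodes. A small sketch of that, assuming a four-byte emoji such as U+1F600 (the emoji in the report was lost in the archive) and assuming a single byte is removed; decode_error_reason is a hypothetical helper.

```python
def decode_error_reason(data: bytes):
    """Return the UnicodeDecodeError reason for data, or None if it decodes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# Assumption: the pasted emoji was U+1F600; the original one was lost.
emoji = "\N{GRINNING FACE}"
encoded = emoji.encode("utf-8")
print(encoded)  # b'\xf0\x9f\x98\x80'

# Assumption: one backspace under LC_ALL=C removes a single byte, and
# pressing return then appends the newline that the REPL reads.
line = encoded[:-1] + b"\n"
print(decode_error_reason(encoded))  # None
print(decode_error_reason(line))     # invalid continuation byte
```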

--




[issue47117] repl segfaults on non utf-8 input

2022-03-25 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan  added the comment:

This looks similar to https://bugs.python.org/issue46206

--
nosy: +pablogsal, xtreak




[issue47117] repl segfaults on non utf-8 input

2022-03-25 Thread Jon Åslund

New submission from Jon Åslund :

Some non-UTF-8 bytes segfault the Python REPL in 3.10 and later on Linux.
Example:

$ python3.10
Python 3.10.4 (main, Mar 24 2022, 14:20:44) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> �
Segmentation fault (core dumped)

The same input is handled correctly in Python 3.9 and earlier:

$ python3.9
Python 3.9.12 (main, Mar 24 2022, 14:21:53) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> �
  File "", line 0

SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte
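The text inside the 3.9 SyntaxError is the message of the UnicodeDecodeError that the UTF-8 codec raises for this byte. A sketch reproducing just the codec part (it does not go through the REPL; utf8_error_message is a hypothetical helper):

```python
def utf8_error_message(data: bytes) -> str:
    """Return str() of the UnicodeDecodeError for data, or "" if it decodes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""


# The byte from the report, followed by the newline the REPL would read.
print(utf8_error_message(b"\xb6\n"))
# 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte
```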

How to reproduce:

In GNOME on Ubuntu 20.04 with the Swedish keyboard layout, holding Left Alt and
pressing the ö key enters the byte 0xb6 into the terminal.
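Why "invalid start byte": in UTF-8, bytes of the form 10xxxxxx (0x80 to 0xBF) may only appear as continuation bytes inside a multi-byte sequence, and 0xb6 falls in that range. A small classifier written for illustration (utf8_byte_role is a hypothetical name):

```python
def utf8_byte_role(b: int) -> str:
    """Classify a single byte by the role it can play in well-formed UTF-8."""
    if b < 0x80:
        return "ascii"         # 0xxxxxxx: a complete one-byte sequence
    if b < 0xC0:
        return "continuation"  # 10xxxxxx: never valid as the first byte
    if b in (0xC0, 0xC1) or b >= 0xF5:
        return "invalid"       # overlong leads, or beyond U+10FFFF
    return "lead"              # 110xxxxx, 1110xxxx or 11110xxx


print(utf8_byte_role(0xB6))  # continuation
```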

I have only been able to crash the REPL. I can't make the parser crash by other
means, for instance by trying to eval the byte.
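This fits the fix description, which points at the tokenizer buffers being uninitialized in the REPL path: evaluating the same byte from a bytes object goes through a path where the decode error is reported normally. A quick sketch of that contrast, assuming current CPython raises SyntaxError for undecodable source bytes (the exact message has varied between versions, so only the exception type is checked; eval_outcome is a hypothetical helper):

```python
def eval_outcome(source: bytes) -> str:
    """Evaluate raw source bytes and name the outcome instead of raising."""
    try:
        eval(source)
    except SyntaxError:
        return "SyntaxError"
    return "ok"


print(eval_outcome(b"\xb6"))   # SyntaxError
print(eval_outcome(b"1 + 1"))  # ok
```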

--
messages: 415992
nosy: jooon
priority: normal
severity: normal
status: open
title: repl segfaults on non utf-8 input
type: crash
versions: Python 3.10, Python 3.11
