[issue25880] codecs should raise specific UnicodeDecodeError/UnicodeEncodeError rather than just UnicodeError

2021-11-26 Thread Irit Katriel

Change by Irit Katriel :


--
title: u'..'.encode('idna') → UnicodeError: label empty or too long -> codecs 
should raise specific UnicodeDecodeError/UnicodeEncodeError rather than just 
UnicodeError
versions: +Python 3.11 -Python 2.7, Python 3.4
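
For illustration, a minimal sketch of what the retitled issue means for callers: because the codec may raise the bare UnicodeError, code that catches only UnicodeEncodeError can miss the failure. The helper name `to_idna` is hypothetical and not part of any library.

```python
def to_idna(hostname):
    """Encode a hostname with the 'idna' codec, returning None on failure.

    Catching the base class UnicodeError covers both the plain UnicodeError
    described in this issue and the more specific UnicodeEncodeError.
    """
    try:
        return hostname.encode("idna")
    except UnicodeError:
        return None


print(to_idna("example.com"))        # a valid name encodes to ASCII bytes
print(to_idna("a" * 64 + ".com"))    # a label longer than 63 characters is rejected
```

Catching the base class keeps such code working whether or not the codec is later changed to raise the more specific subclass.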

___
Python tracker 
<https://bugs.python.org/issue25880>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45461] UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

2021-10-14 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue45461] UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

2021-10-14 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset 7c722e32bf582108680f49983cf01eaed710ddb9 by Serhiy Storchaka in 
branch '3.9':
[3.9] bpo-45461: Fix IncrementalDecoder and StreamReader in the 
"unicode-escape" codec (GH-28939) (GH-28945)
https://github.com/python/cpython/commit/7c722e32bf582108680f49983cf01eaed710ddb9


--




[issue45461] UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

2021-10-14 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset 0bff4ccbfd3297b0adf690655d3e9ddb0033bc69 by Miss Islington (bot) 
in branch '3.10':
[3.10] bpo-45461: Fix IncrementalDecoder and StreamReader in the 
"unicode-escape" codec (GH-28939) (GH-28943)
https://github.com/python/cpython/commit/0bff4ccbfd3297b0adf690655d3e9ddb0033bc69


--




[issue45461] UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

2021-10-14 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
pull_requests: +27233
pull_request: https://github.com/python/cpython/pull/28945




[issue45461] UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

2021-10-14 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset c96d1546b11b4c282a7e21737cb1f5d16349656d by Serhiy Storchaka in 
branch 'main':
bpo-45461: Fix IncrementalDecoder and StreamReader in the "unicode-escape" 
codec (GH-28939)
https://github.com/python/cpython/commit/c96d1546b11b4c282a7e21737cb1f5d16349656d


--




[issue45461] UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

2021-10-14 Thread miss-islington


Change by miss-islington :


--
nosy: +miss-islington
nosy_count: 5.0 -> 6.0
pull_requests: +27231
pull_request: https://github.com/python/cpython/pull/28943




[issue45461] UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

2021-10-13 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
keywords: +patch
pull_requests: +27228
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/28939




[issue45461] UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

2021-10-13 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
assignee:  -> serhiy.storchaka
versions: +Python 3.10, Python 3.11, Python 3.9 -Python 3.8




[issue45461] UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

2021-10-13 Thread STINNER Victor


Change by STINNER Victor :


--
nosy: +serhiy.storchaka




[issue45461] UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

2021-10-13 Thread Matthew Barnett


Matthew Barnett  added the comment:

It can be shortened to this:

buffer = b"a" * 8191 + b"\\r\\n"

with open("bug_csv.csv", "wb") as f:
    f.write(buffer)

with open("bug_csv.csv", encoding="unicode_escape", newline="") as f:
    f.readline()

To me it looks like it's reading in blocks of 8K and then decoding them, but 
it isn't correctly handling an escape sequence that happens to cross a block 
boundary.
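
The block-boundary explanation can be checked without a file. The following is a rough sketch, assuming an 8192-byte chunk size to mirror the 8K blocks mentioned above; `decode_in_chunks` is a hypothetical helper. It feeds the same bytes to the incremental decoder in chunks so that the first backslash is the last byte of the first chunk.

```python
import codecs

# 8191 filler bytes followed by the four bytes \ r \ n, so the first
# backslash is the last byte of an 8192-byte block.
data = b"a" * 8191 + b"\\r\\n"


def decode_in_chunks(raw, chunk_size=8192):
    """Decode raw with the incremental 'unicode_escape' decoder, chunk by chunk.

    Returns the decoded text, or None if the decoder raises UnicodeDecodeError
    (expected on interpreters that lack the fix for this issue).
    """
    decoder = codecs.getincrementaldecoder("unicode_escape")()
    pieces = []
    try:
        for start in range(0, len(raw), chunk_size):
            pieces.append(decoder.decode(raw[start:start + chunk_size]))
        pieces.append(decoder.decode(b"", final=True))
    except UnicodeDecodeError:
        return None
    return "".join(pieces)


whole = data.decode("unicode_escape")   # one-shot decode sees the full escape
chunked = decode_in_chunks(data)        # chunked decode splits "\r" across blocks
print(len(whole), chunked == whole)
```

On an interpreter with the fix the two results should match; on an affected one the helper returns None instead.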

--
nosy: +mrabarnett




[issue45461] UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

2021-10-13 Thread Anatoly Myachev


Anatoly Myachev  added the comment:

Hello!

I can reduce it a little.
The buffer shouldn't be made smaller, as there seems to be some kind of 
relation to the buffer size used for I/O operations.

buffer = 
b'col1,col2,col3,col4,col5,col6\\r\\n0,2000-01-01,0,00:00:00,DuBFsyerJU,1809.3924826424557\\r\\n10,2000-01-01,10,01:00:00,AlwGHbVPpB,2853.2392617952996\\r\\n20,2000-01-01,20,02:00:00,TEkGgsYXYz,9933.278931158615\\r\\n30,2000-01-01,30,03:00:00,tfvnynVSfp,8574.917426248916\\r\\n40,2000-01-01,40,04:00:00,YOGjhztMWe,3768.71871233428\\r\\n50,2000-01-01,50,05:00:00,vkTOJSeQmU,6330.252072351792\\r\\n60,2000-01-01,60,06:00:00,LeolDfaGyv,5052.618993456892\\r\\n70,2000-01-01,70,07:00:00,OcyrbYVtyr,4287.371622852719\\r\\n80,2000-01-01,80,08:00:00,VUwDPNhcFV,3589.697826814614\\r\\n90,2000-01-01,90,09:00:00,KOadtzcNyK,4794.158259020925\\r\\n100,2000-01-01,100,10:00:00,rdSOjXJBWC,8826.736894397129\\r\\n110,2000-01-01,110,11:00:00,qzwVBOklhk,8086.105782454443\\r\\n120,2000-01-01,120,12:00:00,UTRlqVfKoD,1012.5061461339624\\r\\n130,2000-01-01,130,13:00:00,wKqEkRhkfw,2511.3137510933934\\r\\n140,2000-01-01,140,14:00:00,LxklWJbgxo,406.7116346419042\\r\\n150,2000-01-01,150,15:00:00,SxmZkdUgHv,84
 
24.978062284761\\r\\n160,2000-01-01,160,16:00:00,nEvzypASGb,9890.252156059063\\r\\n170,2000-01-01,170,17:00:00,xiFkkjoDPB,2728.8359201479675\\r\\n180,2000-01-01,180,18:00:00,boMmgpBXgL,4231.680208002166\\r\\n190,2000-01-01,190,19:00:00,dXLJXWiXZI,7757.44902751916\\r\\n200,2000-01-01,200,20:00:00,PBdjwKoCMD,4915.090357003991\\r\\n210,2000-01-01,210,21:00:00,zGWLALpmoA,359.5243650158153\\r\\n220,2000-01-01,220,22:00:00,CfpZJoOqGZ,704.7990862762942\\r\\n230,2000-01-01,230,23:00:00,DrkxpLhpEN,520.3290677592321\\r\\n240,2000-01-02,240,00:00:00,TDKEBbZAzQ,5218.671660857721\\r\\n250,2000-01-02,250,01:00:00,gULwzvNeWO,4218.66872701774\\r\\n260,2000-01-02,260,02:00:00,ogSyzHWmNY,9026.657391329585\\r\\n270,2000-01-02,270,03:00:00,NetmmthtzN,2027.8312539582244\\r\\n280,2000-01-02,280,04:00:00,PoYiHipTzR,7667.627476518046\\r\\n290,2000-01-02,290,05:00:00,MjHIRGmsoq,4144.001792539834\\r\\n300,2000-01-02,300,06:00:00,qESRSNnNnO,5348.024681284471\\r\\n310,2000-01-02,310,07:00:00,sSIjcXWhLC,3622.46
 
73907599413\\r\\n320,2000-01-02,320,08:00:00,IvjrlljbeB,7500.419388155823\\r\\n330,2000-01-02,330,09:00:00,aVWVRXZjZy,3686.5972529264213\\r\\n340,2000-01-02,340,10:00:00,QKeTjcNlCG,1228.9751449454411\\r\\n350,2000-01-02,350,11:00:00,phEdHCVsbe,4254.15983968718\\r\\n360,2000-01-02,360,12:00:00,ursHJjQxRK,6099.131673115221\\r\\n370,2000-01-02,370,13:00:00,JvjcRlYcYG,1503.3586866746164\\r\\n380,2000-01-02,380,14:00:00,gzCyqHPRRb,7816.898213939008\\r\\n390,2000-01-02,390,15:00:00,lQZmobRwzt,8295.113759829599\\r\\n400,2000-01-02,400,16:00:00,qspiYGfTou,1987.8215069414816\\r\\n410,2000-01-02,410,17:00:00,mcqWMMzomf,15.878728570531964\\r\\n420,2000-01-02,420,18:00:00,fiPsxulpGU,5380.485947841902\\r\\n430,2000-01-02,430,19:00:00,gTAyTkpeez,4720.7159908343565\\r\\n440,2000-01-02,440,20:00:00,hzFbhAPvFX,946.5797295044975\\r\\n450,2000-01-02,450,21:00:00,NYNcYxsyVl,7333.850198973723\\r\\n460,2000-01-02,460,22:00:00,wvgMmIxLzo,7399.341315026157\\r\\n470,2000-01-02,470,23:00:00,bZoyzAGgEC,5464.0
 
53510955946\\r\\n480,2000-01-03,480,00:00:00,jZNaceUYyr,1390.8829937709977\\r\\n490,2000-01-03,490,01:00:00,sbfLgcCpru,9626.900131786555\\r\\n500,2000-01-03,500,02:00:00,MHpAkHfnmV,9406.471079089133\\r\\n510,2000-01-03,510,03:00:00,ENdFBGtRCq,3740.8773019724517\\r\\n520,2000-01-03,520,04:00:00,FzqXhMLHLY,4270.3585910905\\r\\n530,2000-01-03,530,05:00:00,wWinjEGhAj,8548.152649813675\\r\\n540,2000-01-03,540,06:00:00,LcxAImCvxt,4097.693176523874\\r\\n550,2000-01-03,550,07:00:00,sDhzGBYKpt,1673.7466277500146\\r\\n560,2000-01-03,560,08:00:00,jhagjcZhGU,4103.702089490347\\r\\n570,2000-01-03,570,09:00:00,ZIkRwPWyWP,9368.662605679918\\r\\n580,2000-01-03,580,10:00:00,uphgoCQwZY,3321.0096306747137\\r\\n590,2000-01-03,590,11:00:00,jEKaqqScLF,8442.084614664149\\r\\n600,2000-01-03,600,12:00:00,kSIJFBHVnL,4065.19226287942\\r\\n610,2000-01-03,610,13:00:00,YRhoANskYn,5089.668482943252\\r\\n620,2000-01-03,620,14:00:00,SnlwCSdkWf,5738.46737129545\\r\\n630,2000-01-03,630,15:00:00,ANfpLOiJTV,393.7754525
 

[issue45461] UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

2021-10-13 Thread STINNER Victor


STINNER Victor  added the comment:

Can you please try to write a simpler (shorter) reproducer?

--




[issue45461] UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

2021-10-13 Thread Anatoly Myachev


New submission from Anatoly Myachev :

Expected behavior: if the `read()` function works correctly, then `readline()` 
should also work.

Reproducer in file - just run: `python test.py`.

Traceback (most recent call last):
  File "test.py", line 11, in <module>
    f.readline()
  File "C:\Users\amyachev\Miniconda3\envs\modin\lib\encodings\unicode_escape.py", line 26, in decode
    return codecs.unicode_escape_decode(input, self.errors)[0]
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 8191: \ at end of string

--
components: Unicode
files: test.py
messages: 403837
nosy: anmyachev, ezio.melotti, vstinner
priority: normal
severity: normal
status: open
title: UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in 
position 8191: \ at end of string
type: behavior
versions: Python 3.8
Added file: https://bugs.python.org/file50354/test.py

___
Python tracker 
<https://bugs.python.org/issue45461>
___



[issue34550] UnicodeDecodeError when invoke method configure() of Menu instance

2021-07-09 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
resolution:  -> out of date
stage:  -> resolved
status: pending -> closed




[issue44510] file.read() UnicodeDecodeError with UTF-8 BOM in files on Windows

2021-06-25 Thread Eryk Sun


Eryk Sun  added the comment:

> On Windows we currently still default to your console encoding

In Windows, the default encoding for open() is the ANSI code page of the 
current process [1], from GetACP(), which is based on the system locale, unless 
it's overridden to UTF-8 in the application manifest. The console encoding is 
unrelated and not something we use much anymore since io._WindowsConsoleIO was 
introduced in Python 3.6.
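
To see which encoding applies on a given machine, a small sketch along these lines may help. It only inspects settings and assumes nothing beyond the standard library.

```python
import locale
import sys

# Encoding that open() falls back to in text mode when no encoding is given.
# On Windows this reflects the ANSI code page unless UTF-8 mode is enabled.
default_encoding = locale.getpreferredencoding(False)

print("open() default encoding:", default_encoding)
print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
print("stdout encoding:", sys.stdout.encoding)
```

The last line is printed only for comparison, since the console encoding is a separate setting from the one open() uses.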

--
nosy: +eryksun
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed
versions: +Python 3.6, Python 3.9 -Python 3.11




[issue44510] file.read() UnicodeDecodeError with UTF-8 BOM in files on Windows

2021-06-25 Thread Steve Dower


Steve Dower  added the comment:

The file that fails contains a UTF-8 BOM at the start, which is a multibyte 
character indicating that the file is definitely UTF-8.

Unfortunately, none of Python's default settings will handle this, because it's 
a convention that only really exists on Windows.

On Windows we currently still default to your console encoding, since that is 
what we have always done and changing it by default is very complex. Apparently 
your console encoding does not include the character represented by the first 
byte of the BOM - in any case, it's not a character you'd ever want to see, so 
if it _had_ worked, you'd just have garbage in your read data.

The immediate fix for your scenario is to use "open(filename, 'r', 
encoding='utf-8-sig')" which will handle the BOM correctly.
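
A minimal sketch of the difference between the two codecs, using a temporary file created only for this illustration:

```python
import codecs
import os
import tempfile

# Write a small file that starts with the UTF-8 BOM, as some Windows editors do.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(codecs.BOM_UTF8 + "hello".encode("utf-8"))
    path = tmp.name

try:
    with open(path, "r", encoding="utf-8") as f:
        plain = f.read()        # BOM is kept as a leading U+FEFF character
    with open(path, "r", encoding="utf-8-sig") as f:
        stripped = f.read()     # BOM is consumed by the codec
finally:
    os.remove(path)

print(repr(plain))
print(repr(stripped))
```

Files without a BOM can also be read with "utf-8-sig", so it is a reasonable choice when the input may come from either kind of editor.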

For the core team, I still think it's worth having the default encoding be able 
to read and drop the UTF-8 BOM from the start of a file. Since we shouldn't do 
it for any arbitrary operation (which may not be at the start of a file), it'd 
have to be a special default object for the TextIOWrapper case, but it would 
have solved this issue. If the BOM is there, it can switch to UTF-8 (or UTF-16, 
if that BOM exists); if not, it can use whatever the default would have been 
(based on all the other available settings).

--
nosy: +methane
title: file.read() UnicodeDecodeError with large files on Windows -> 
file.read() UnicodeDecodeError with UTF-8 BOM in files on Windows
versions: +Python 3.11 -Python 3.6, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue44510>
___



[issue44510] file.read() UnicodeDecodeError with large files on Windows

2021-06-25 Thread Jason Yundt


Change by Jason Yundt :


--
nosy: +jayman




[issue44510] file.read() UnicodeDecodeError with large files on Windows

2021-06-25 Thread Rohan Amin


New submission from Rohan Amin :

When using file.read() with a large text file, 
there is a UnicodeDecodeError. I expected file.read(1) to read one character 
from the file. It works with a smaller text file. I experienced this bug on 
Windows 10 version 20H2. My teacher couldn't reproduce this bug on Linux.

--
components: IO, Unicode, Windows
files: Bug Reproduction Code.zip
messages: 396532
nosy: RohanA, ezio.melotti, paul.moore, steve.dower, tim.golden, vstinner, 
zach.ware
priority: normal
severity: normal
status: open
title: file.read() UnicodeDecodeError with large files on Windows
type: behavior
versions: Python 3.6, Python 3.9
Added file: https://bugs.python.org/file50126/Bug Reproduction Code.zip

___
Python tracker 
<https://bugs.python.org/issue44510>
___



[issue35427] logging UnicodeDecodeError from undecodable strftime output

2021-05-28 Thread Mark Dickinson


Mark Dickinson  added the comment:

Agreed. Thank you!

--
stage:  -> resolved
status: pending -> closed




[issue35427] logging UnicodeDecodeError from undecodable strftime output

2021-05-28 Thread Irit Katriel


Irit Katriel  added the comment:

Since this is not relevant to python 3, I think this issue can be closed.

--
nosy: +iritkatriel
resolution:  -> out of date
status: open -> pending




[issue32983] UnicodeDecodeError 'ascii' codec can't decode byte in position - ordinal not in range(128)

2021-05-28 Thread Irit Katriel


Irit Katriel  added the comment:

Jiri, if you are still having this problem in 3.9+, and Glenn's suggestion to 
escape the error is not helpful, please create a new issue and include code to 
reproduce it.

Python 2.7 is no longer maintained.

--
nosy: +iritkatriel
resolution:  -> out of date
stage:  -> resolved
status: open -> closed




[issue26740] tarfile: accessing (listing and extracting) tarball fails with UnicodeDecodeError

2021-05-28 Thread Irit Katriel


Irit Katriel  added the comment:

Python 2.7 is no longer maintained. There aren't enough details here to tell 
whether the issue was fixed in python 3.

If you are having this problem with python 3.9+, please create a new issue.

--
nosy: +iritkatriel
resolution:  -> out of date
stage:  -> resolved
status: open -> closed




[issue43340] json.load() can raise UnicodeDecodeError, but this is not documented

2021-04-08 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
assignee: rhettinger -> 




[issue43340] json.load() can raise UnicodeDecodeError, but this is not documented

2021-04-03 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
keywords: +patch
pull_requests: +23914
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/25173




[issue43214] site: Potential UnicodeDecodeError when handling pth file

2021-03-17 Thread STINNER Victor


STINNER Victor  added the comment:

Since it's a Python script, the default encoding should be UTF-8, as for any 
Python script. I guess that most pth files don't use characters outside ASCII, 
so it's fine.

I think that distutils made a few changes to switch to UTF-8 in recent years, 
so it's possible.

--
nosy: +vstinner




[issue43214] site: Potential UnicodeDecodeError when handling pth file

2021-03-17 Thread Inada Naoki


Inada Naoki  added the comment:

A locale-specific encoding is not good, especially on Windows.
But we have used it for a long time. Changing the encoding for pth files would 
be a breaking change.

--
resolution:  -> not a bug
stage: patch review -> resolved
status: open -> closed




[issue43214] site: Potential UnicodeDecodeError when handling pth file

2021-03-12 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +23602
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/24837




[issue43214] site: Potential UnicodeDecodeError when handling pth file

2021-03-12 Thread Inada Naoki


Change by Inada Naoki :


--
superseder:  -> Use io.open_code for .pth files




[issue43340] json.load() can raise UnicodeDecodeError, but this is not documented

2021-03-03 Thread Kamil Turek


Change by Kamil Turek :


--
nosy: +kamilturek




[issue43340] json.load() can raise UnicodeDecodeError, but this is not documented

2021-03-01 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

json.loads() also accepts data encoded in UTF-16 and UTF-32.
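
A short sketch of that behaviour, using a made-up document:

```python
import json

document = '{"answer": 42}'

# Bytes input is accepted in UTF-8, UTF-16 or UTF-32; the encoding is detected
# from the leading bytes.
for encoding in ("utf-8", "utf-16", "utf-32"):
    print(encoding, json.loads(document.encode(encoding)))
```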

--
nosy: +serhiy.storchaka




[issue43340] json.load() can raise UnicodeDecodeError, but this is not documented

2021-02-27 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Normally, we don't (or can't) enumerate all possible exceptions. But
in this case, it is worth expanding the documentation so that a person can know 
which of two common input errors they need to catch:

"If the data being deserialized is not valid UTF-8, a UnicodeDecodeError will be 
raised, and if the decoded file is not a valid JSON document, a JSONDecodeError 
will be raised".
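
The two failure modes can be told apart as in the sketch below; `parse_json_bytes` is a hypothetical helper written for this illustration.

```python
import json


def parse_json_bytes(raw):
    """Return (value, error_name); error_name is None on success."""
    try:
        return json.loads(raw), None
    except UnicodeDecodeError:
        return None, "UnicodeDecodeError"
    except json.JSONDecodeError:
        return None, "JSONDecodeError"


print(parse_json_bytes(b'{"key": "value"}'))
print(parse_json_bytes(b'{"key": "\xff"}'))   # 0xff is never valid in UTF-8
print(parse_json_bytes(b'{"key": }'))         # decodes fine, but is not JSON
```

Both exception types derive from ValueError, so code that does not need to distinguish them can catch that instead.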

--
assignee: docs@python -> rhettinger
nosy: +rhettinger

___
Python tracker 
<https://bugs.python.org/issue43340>
___



[issue43340] json.load() can raise UnicodeDecodeError, but this is not documented

2021-02-27 Thread Eric V. Smith


Eric V. Smith  added the comment:

As a rule we don't try and document every exception that can be raised. I could 
go either way on documenting encoding errors with the json module, although it 
seems pretty clear that an encoding error would be possible in this case.

--
assignee:  -> docs@python
components: +Documentation -Library (Lib)
nosy: +docs@python, eric.smith
versions:  -Python 3.6, Python 3.7




[issue43340] json.load() can raise UnicodeDecodeError, but this is not documented

2021-02-27 Thread Matthew Woodcraft

New submission from Matthew Woodcraft :

The documentation for json.load() and json.loads() says:

« If the data being deserialized is not a valid JSON document, a 
JSONDecodeError will be raised. »

But this is not currently entirely true: if the data is provided in bytes form 
and is not properly encoded in one of the three accepted encodings, 
UnicodeDecodeError is raised instead.

(I have no opinion on whether the documentation or the behaviour should be 
changed.)

--
components: Library (Lib)
messages: 387780
nosy: mattheww
priority: normal
severity: normal
status: open
title: json.load() can raise UnicodeDecodeError, but this is not documented
type: behavior
versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue43340>
___



[issue20140] UnicodeDecodeError in ntpath.py when home dir contains non-ascii signs

2021-02-25 Thread Eryk Sun


Change by Eryk Sun :


--
resolution:  -> out of date
stage: needs patch -> resolved
status: open -> closed




[issue43214] site: Potential UnicodeDecodeError when handling pth file

2021-02-13 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +easy




[issue43214] site: Potential UnicodeDecodeError when handling pth file

2021-02-13 Thread Inada Naoki


New submission from Inada Naoki :

https://github.com/python/cpython/blob/4230bd52e3f9f289f02e41ab17a95f50ed4db5a6/Lib/site.py#L160

```
f = io.TextIOWrapper(io.open_code(fullname))
```

When the default text encoding is not UTF-8 and the pth file contains a 
non-ASCII character, it will raise UnicodeDecodeError.
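
The failure can be sketched without touching site.py or the locale. In this illustration an in-memory buffer stands in for the pth file, and an explicit "ascii" encoding stands in for a non-UTF-8 default; the path and the `read_pth` helper are made up.

```python
import io

# Bytes of a hypothetical .pth line with a non-ASCII character, saved as UTF-8.
pth_bytes = "/opt/café/site-packages\n".encode("utf-8")


def read_pth(raw, encoding):
    """Decode raw through TextIOWrapper using the given encoding."""
    with io.TextIOWrapper(io.BytesIO(raw), encoding=encoding) as f:
        return f.read()


print(ascii(read_pth(pth_bytes, "utf-8")))
try:
    read_pth(pth_bytes, "ascii")    # stands in for a non-UTF-8 default encoding
except UnicodeDecodeError as exc:
    print("failed:", exc.reason)
```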

--
components: Library (Lib)
keywords: 3.8regression
messages: 386916
nosy: methane
priority: normal
severity: normal
status: open
title: site: Potential UnicodeDecodeError when handling pth file
type: behavior
versions: Python 3.10, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue43214>
___



[issue34550] UnicodeDecodeError when invoke method configure() of Menu instance

2020-12-21 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

There should not be any UnicodeDecodeError now. Tkinter on Windows now uses the 
UTF-16 encoding with the surrogatepass error handler, which should never fail.
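
As a rough illustration of the error handler itself (plain codec calls, not Tkinter's internal code), a lone surrogate that the strict handler rejects survives a round trip with surrogatepass:

```python
# A lone surrogate is not valid text, so the strict error handler rejects it.
text = "\udc00A"

try:
    text.encode("utf-16-le")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# surrogatepass encodes the surrogate code point as-is and decodes it back.
raw = text.encode("utf-16-le", "surrogatepass")
round_tripped = raw.decode("utf-16-le", "surrogatepass")

print(strict_ok, raw, round_tripped == text)
```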

Could you please confirm that the issue is gone?

--
status: open -> pending

___
Python tracker 
<https://bugs.python.org/issue34550>
___



[issue42351] Setup.py: UnicodeDecodeError in grep_headers_for

2020-11-14 Thread E. Paine


Change by E. Paine :


--
status: open -> closed




[issue42351] Setup.py: UnicodeDecodeError in grep_headers_for

2020-11-14 Thread Ronald Oussoren


Ronald Oussoren  added the comment:

Thanks for testing!

--
resolution:  -> fixed
stage: patch review -> resolved




[issue42351] Setup.py: UnicodeDecodeError in grep_headers_for

2020-11-14 Thread Ronald Oussoren


Ronald Oussoren  added the comment:


New changeset 7a27c7ed4b2b45bb9ea27d3f5c4f423495d6e939 by Ronald Oussoren in 
branch 'master':
bpo-42351: Avoid error when opening header with non-UTF8 encoding (GH-23279)
https://github.com/python/cpython/commit/7a27c7ed4b2b45bb9ea27d3f5c4f423495d6e939


--




[issue42351] Setup.py: UnicodeDecodeError in grep_headers_for

2020-11-14 Thread Ronald Oussoren


Ronald Oussoren  added the comment:

I've created a PR. Could you please check if that fixes the problem?

--




[issue42351] Setup.py: UnicodeDecodeError in grep_headers_for

2020-11-14 Thread Ronald Oussoren


Change by Ronald Oussoren :


--
keywords: +patch
pull_requests: +22173
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/23279




[issue42351] Setup.py: UnicodeDecodeError in grep_headers_for

2020-11-14 Thread Ronald Oussoren


Ronald Oussoren  added the comment:

That's annoying. A quick workaround is to patch setup.py:grep_headers_for and 
add "encoding='latin1'" to the arguments of open.
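
The workaround relies on Latin-1 mapping every byte value to a code point, so reading a header with it cannot raise UnicodeDecodeError. A quick sketch of that property (not the eventual fix):

```python
# Every possible byte value maps to exactly one code point in Latin-1.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin1")

print(len(decoded), decoded.encode("latin1") == all_bytes)
```

The trade-off is that non-ASCII text in a UTF-8 header would be decoded to the wrong characters, which does not matter when searching for an ASCII function name.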

I'll look into a better fix later this weekend.

--




[issue42325] UnicodeDecodeError executing ./setup.py during build

2020-11-13 Thread Chih-Hsuan Yen


Chih-Hsuan Yen  added the comment:

I got a similar issue on Arch Linux - see issue42351.

--
nosy: +yan12125




[issue42351] Setup.py: UnicodeDecodeError in grep_headers_for

2020-11-13 Thread Chih-Hsuan Yen


Chih-Hsuan Yen  added the comment:

I can also confirm the issue on our Arch Linux server [1]. The problematic file 
is also /usr/include/OMX_Other.h.

Looks like it is a regression from https://github.com/python/cpython/pull/22855 
(https://bugs.python.org/issue41100). Ronald Oussoren, would you mind having a look?

[1] 
https://build.archlinuxcn.org/~imlonghao/log/python-git/2020-11-14T01:17:02.html

--
nosy: +ronaldoussoren, yan12125




[issue42351] Setup.py: UnicodeDecodeError in grep_headers_for

2020-11-13 Thread E. Paine


New submission from E. Paine :

When compiling the master branch (i.e. running 'make'), I get a 
UnicodeDecodeError as follows:
Traceback (most recent call last):
  File "/home/elisha/Documents/Python/cp0/cpython/./setup.py", line 2619, in <module>
    main()
  File "/home/elisha/Documents/Python/cp0/cpython/./setup.py", line 2589, in main
    setup(# PyPI Metadata (PEP 301)
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/home/elisha/Documents/Python/cp0/cpython/./setup.py", line 471, in build_extensions
    self.detect_modules()
  File "/home/elisha/Documents/Python/cp0/cpython/./setup.py", line 1825, in detect_modules
    self.detect_ctypes()
  File "/home/elisha/Documents/Python/cp0/cpython/./setup.py", line 2205, in detect_ctypes
    if grep_headers_for('ffi_prep_cif_var', ffi_headers):
  File "/home/elisha/Documents/Python/cp0/cpython/./setup.py", line 246, in grep_headers_for
    if function in f.read():
  File "/home/elisha/Documents/Python/cp0/cpython/Lib/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 4210: invalid start byte

The problematic file it is trying to read is /usr/include/OMX_Other.h which is 
part of the libomxil-bellagio package (a copy of this package can be downloaded 
from 
https://www.archlinux.org/packages/extra/x86_64/libomxil-bellagio/download/). 
More specifically, there are several characters in the comments which cannot be 
decoded correctly (the first of these is on line 93).

The fix is a very simple one and is just to add errors='replace' to line 244 of 
setup.py (I cannot see this having any ill-effects).
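
A minimal sketch of the proposed change, for illustration only. It assumes
grep_headers_for() has the shape quoted in issue42325; the explicit
encoding='utf-8' and the temporary demo header are assumptions added here, not
part of setup.py.

```python
import os
import tempfile


def grep_headers_for(function, headers):
    """Return True if any of the header files mentions `function`."""
    for header in headers:
        # errors='replace' maps undecodable bytes to U+FFFD instead of raising.
        # encoding='utf-8' is spelled out here for a reproducible demo (assumption).
        with open(header, 'r', encoding='utf-8', errors='replace') as f:
            if function in f.read():
                return True
    return False


# Throwaway header with a stray 0x92 byte in a comment (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.h')
    with open(path, 'wb') as f:
        f.write(b"/* don\x92t care */\nint ffi_prep_cif_var(void);\n")
    found = grep_headers_for('ffi_prep_cif_var', [path])
    missing = grep_headers_for('no_such_function', [path])
```

Since the function only searches for an ASCII identifier, replacing stray bytes
in comments should not change the result of the search.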

I couldn't find who to nosy for this so apologies about that.

--
components: Build
messages: 380913
nosy: epaine
priority: normal
severity: normal
status: open
title: Setup.py: UnicodeDecodeError in grep_headers_for
type: compile error
versions: Python 3.10, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue42351>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42325] UnicodeDecodeError executing ./setup.py during build

2020-11-11 Thread Skip Montanaro

New submission from Skip Montanaro :

I recently replaced Ubuntu 20.04 with Manjaro 20.2. In the process my Python 
builds broke in the sharedmods target of the Makefile. The tail end of the 
traceback is:

  File "/home/skip/src/python/cpython/./setup.py", line 246, in grep_headers_for
    if function in f.read():
  File "/home/skip/src/python/cpython/Lib/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 1600: 
invalid start byte

The grep_headers_for() function in setup.py appeared to be the culprit, so I 
added a print statement to its loop:

def grep_headers_for(function, headers):
    for header in headers:
        print("***", header, file=sys.stderr)
        with open(header, 'r') as f:
            if function in f.read():
                return True
    return False

which printed these lines:

*** /usr/include/umfpack_report_perm.h
*** /usr/include/dbstl_dbc.h
*** /usr/include/itclTclIntStubsFcn.h
*** /usr/include/dbstl_vector.h
*** /usr/include/cholmod_blas.h
*** /usr/include/amd.h
*** /usr/include/m17n-X.h

Sure enough, that m17n-X.h file (attached) doesn't contain utf-8 (my 
environment's encoding). According to the Emacs coding cookie at the end, the 
file is euc-japan encoded. Would simply catching the exception in 
grep_headers_for() be the correct way to deal with this?
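
For comparison, a sketch of the "catch the exception" variant asked about
above. The explicit encoding='utf-8' and the two demo headers are assumptions
for the sake of a reproducible example. Note the trade-off: a header that is
not valid UTF-8 is skipped entirely, so a match inside it goes unnoticed.

```python
import os
import tempfile


def grep_headers_for(function, headers):
    """Return True if any decodable header file mentions `function`."""
    for header in headers:
        try:
            with open(header, 'r', encoding='utf-8') as f:
                if function in f.read():
                    return True
        except UnicodeDecodeError:
            # Not valid UTF-8 (for example EUC-JP): skip this header.
            continue
    return False


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical EUC-JP header that also declares the function.
    eucjp_header = os.path.join(tmp, 'eucjp.h')
    with open(eucjp_header, 'wb') as f:
        f.write(b"/* " + "日本語".encode("euc_jp") + b" */\n"
                b"int ffi_prep_cif_var(void);\n")
    utf8_header = os.path.join(tmp, 'utf8.h')
    with open(utf8_header, 'wb') as f:
        f.write(b"int ffi_prep_cif_var(void);\n")

    # The EUC-JP header is skipped, so its declaration is not seen.
    skipped = grep_headers_for('ffi_prep_cif_var', [eucjp_header])
    found = grep_headers_for('ffi_prep_cif_var', [eucjp_header, utf8_header])
```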

--
components: Build
files: m17n-X.h
messages: 380761
nosy: skip.montanaro
priority: normal
severity: normal
status: open
title: UnicodeDecodeError executing ./setup.py during build
versions: Python 3.10
Added file: https://bugs.python.org/file49593/m17n-X.h

___
Python tracker 
<https://bugs.python.org/issue42325>
___



[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-14 Thread Inada Naoki


Inada Naoki  added the comment:

Thank you for finding/fixing.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-14 Thread miss-islington


miss-islington  added the comment:


New changeset f07448bef48d645c8cee862b1f25a99003a6140e by Miss Skeleton (bot) 
in branch '3.9':
bpo-41894: Fix UnicodeDecodeError while loading native module (GH-22466)
https://github.com/python/cpython/commit/f07448bef48d645c8cee862b1f25a99003a6140e


--

___
Python tracker 
<https://bugs.python.org/issue41894>
___



[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-14 Thread miss-islington


miss-islington  added the comment:


New changeset 47ca6799725bb4c40953bb26ebcd726d1d766361 by Miss Skeleton (bot) 
in branch '3.8':
bpo-41894: Fix UnicodeDecodeError while loading native module (GH-22466)
https://github.com/python/cpython/commit/47ca6799725bb4c40953bb26ebcd726d1d766361


--

___
Python tracker 
<https://bugs.python.org/issue41894>
___



[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-14 Thread miss-islington


Change by miss-islington :


--
pull_requests: +21675
pull_request: https://github.com/python/cpython/pull/22705




[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-14 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 2d2af320d94afc6561e8f8adf174c9d3fd9065bc by Kevin Adler in branch 
'master':
bpo-41894: Fix UnicodeDecodeError while loading native module (GH-22466)
https://github.com/python/cpython/commit/2d2af320d94afc6561e8f8adf174c9d3fd9065bc


--

___
Python tracker 
<https://bugs.python.org/issue41894>
___



[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-14 Thread miss-islington


Change by miss-islington :


--
nosy: +miss-islington
nosy_count: 3.0 -> 4.0
pull_requests: +21674
pull_request: https://github.com/python/cpython/pull/22704




[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-08 Thread Inada Naoki


Inada Naoki  added the comment:

Yes, please.

--




[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-08 Thread Kevin


Kevin  added the comment:

Ok, so should I switch the PR back from PyUnicode_DecodeFSDefault?

--




[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-08 Thread Inada Naoki


Inada Naoki  added the comment:

OK. Let's use PyUnicode_DecodeLocale() with surrogateescape for consistency.

--




[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-08 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

In os.strerror() and PyErr_SetFromErrnoWithFilenameObjects() we use 
PyUnicode_DecodeLocale(s, "surrogateescape") for decoding the result of 
strerror().

--




[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-08 Thread Inada Naoki


Inada Naoki  added the comment:

> So the main problem is: should we allow surrogateescape in error message?

Note that the error message may be written to a file, a stream, or a
structured log (JSON). These may use UTF-8 with the strict error handler, and
we cannot write a surrogateescaped string to them.
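
A small Python-level illustration of that concern (a sketch; the sample bytes
are made up for the demo). A string decoded with surrogateescape carries lone
surrogates, and those cannot be encoded by a strict UTF-8 writer.

```python
# 0xe8 is 'è' in ISO-8859-1, but it is not valid UTF-8 in this position.
raw = b"non \xe8 stato caricato"

text = raw.decode("utf-8", errors="surrogateescape")
# The undecodable byte is smuggled through as the lone surrogate U+DCE8.
has_surrogate = "\udce8" in text

try:
    text.encode("utf-8")  # the default error handler is "strict"
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Encoding succeeds only if the writer also uses surrogateescape,
# which then restores the original bytes.
round_trip = text.encode("utf-8", errors="surrogateescape")
```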

--




[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-08 Thread Inada Naoki


Inada Naoki  added the comment:

> I think that it is more correct to use the locale encoding. If error messages 
> are translated for readability, we should not ruin this by outputting \xXX.

* PyUnicode_DecodeLocale() doesn't support the "backslashreplace" error
handler.
* The error message is usually encoded in the locale encoding, but that is not
guaranteed.
* The error message may contain a path, which may not be in the locale encoding
either.
* \xXX is far better than UnicodeDecodeError, anyway. We need to fix the 
UnicodeDecodeError first.
* Non-UTF-8 locales are rare. We have used this code for a long time, but this
issue was not reported until now.

I am not against adding "backslashreplace" support to PyUnicode_DecodeLocale().
But to backport the bugfix for UnicodeDecodeError, the change should be minimal.

So the main problem is: should we allow surrogateescape in error message?

For the record, PyUnicode_DecodeLocale() uses mbstowcs(). I don't know how 
reliable that function is on various platforms. That is why I had suggested 
PyUnicode_DecodeFSDefault() at first.

--

___
Python tracker 
<https://bugs.python.org/issue41894>
___



[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-08 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

I think that it is more correct to use the locale encoding. If error messages 
are translated for readability, we should not ruin this by outputting \xXX.

--
nosy: +serhiy.storchaka




[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-07 Thread Inada Naoki

Inada Naoki  added the comment:

> I have since changed the PR to use PyUnicode_DecodeFSDefault based on review 
> feedback. I was going to say that you will have to fight it out with @methane 
> on GH, but I see that that's you. :D Would have been nice if you would have 
> left the updated feedback there as well so people who aren't familiar would 
> know it's one person adjusting their recommendation vs two different people 
> with conflicting recommendations.

OK, I changed my b.p.o username.


> The only issue I see with using backslashreplace is that users of non-UTF-8 
> locales would see message text that contains non-ASCII characters only as 
> escape codes. eg, the message above would show "Il modulo dipendente 
> libbz2.so non \xe8 stato caricato." instead of "Il modulo dipendente 
> libbz2.so non è stato caricato."

The issue is not caused by backslashreplace, but by using UTF-8 instead of the
locale encoding. I noticed that, of course, but:

* Using UTF-8 is the status quo. UTF-8:backslashreplace is the simplest fix.
* There is no guarantee that the error message can be decoded with the locale 
encoding. A backslash escape is much better than "ignore" or "surrogateescape".


> By using PyUnicode_DecodeFSDefault instead, the message should be properly 
> decoded but any encoding errors (such as utf-8 paths, etc) would be handled 
> by surrogateescape.
> 

There is no guarantee that the message is properly decoded with the filesystem
encoding. And surrogateescape is for round-tripping bytes paths, not for
human-readable text. Error messages should be human readable, so
backslashreplace is better than surrogateescape.

Additionally, non-UTF-8 locales are quite rare on Unix systems, and users of
such systems should be able to handle backslash-escaped messages, because they
probably see such messages often.
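
To make the comparison concrete, here is a sketch in Python of how the three
error handlers mentioned in this thread treat the same bytes. The message is
the Italian example quoted above, assumed here to be ISO-8859-1 encoded; the C
code under discussion is not shown.

```python
# The quoted message as ISO-8859-1 bytes (assumption): 'è' is the byte 0xe8.
raw = b"Il modulo dipendente libbz2.so non \xe8 stato caricato."

# "ignore" silently drops the byte, so information is lost.
ignored = raw.decode("utf-8", errors="ignore")

# "backslashreplace" keeps the byte visible as a printable escape.
escaped = raw.decode("utf-8", errors="backslashreplace")

# "surrogateescape" keeps the byte as a lone surrogate: it round-trips,
# but the result is not printable text.
surrogate = raw.decode("utf-8", errors="surrogateescape")

printable = (escaped.isprintable(), surrogate.isprintable())
```

Only the surrogateescape result can be turned back into the original bytes,
which is why it suits paths; only the backslashreplace result is safe to show
or log as ordinary text.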

--




[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-07 Thread Kevin

Kevin  added the comment:

Glad you were able to reproduce on Linux.

I have since changed the PR to use PyUnicode_DecodeFSDefault based on review 
feedback. I was going to say that you will have to fight it out with @methane 
on GH, but I see that that's you. :D Would have been nice if you would have 
left the updated feedback there as well so people who aren't familiar would 
know it's one person adjusting their recommendation vs two different people 
with conflicting recommendations.


The only issue I see with using backslashreplace is that users of non-UTF-8 
locales would see message text that contains non-ASCII characters only as 
escape codes. eg, the message above would show "Il modulo dipendente libbz2.so 
non \xe8 stato caricato." instead of "Il modulo dipendente libbz2.so non è 
stato caricato." By using PyUnicode_DecodeFSDefault instead, the message should 
be properly decoded but any encoding errors (such as utf-8 paths, etc) would be 
handled by surrogateescape.

I guess the question comes to: what's more important to be decoded, the message 
text or the path?

--




[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-05 Thread Inada Naoki


Inada Naoki  added the comment:

I managed to reproduce it on Ubuntu 20.04.

$ sudo vi /var/lib/locales/supported.d/ja # add "ja_JP.EUC-JP EUC-JP"
$ sudo locale-gen ja_JP.EUC-JP
Generating locales (this might take a while)...
ja_JP.EUC-JP... done
Generation complete.
$ chmod -r ./build/lib.linux-x86_64-3.10/_sha3.cpython-310-x86_64-linux-gnu.so
$ LC_ALL=ja_JP.eucjp ./python
Python 3.10.0a0 (heads/master:fbf43f051e, Aug 17 2020, 15:13:52)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "")
'ja_JP.eucjp'
>>> import _sha3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 101: invalid start byte

The error message contains a file path (a byte string, probably encoded with
the filesystem encoding) and a translated error message (encoded with the
locale encoding).

I want to use the "backslashreplace" error handler, but neither 
PyUnicode_DecodeLocale() nor PyUnicode_DecodeFSDefault() supports it.

After thinking about this for several minutes, I now prefer 
PyUnicode_DecodeUTF8(msg, strlen(msg), "backslashreplace").
It fixes the issue with minimal behavior change, although the error message is 
still backslash-escaped.
It might be the best practice for creating a Unicode object from a C error
message like strerror(3).

--
nosy: +inada.naoki

___
Python tracker 
<https://bugs.python.org/issue41894>
___



[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-10-02 Thread Tal Einat


Change by Tal Einat :


--
versions:  -Python 3.5, Python 3.6, Python 3.7




[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-09-30 Thread Kevin


Change by Kevin :


--
keywords: +patch
pull_requests: +21490
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/22466




[issue41894] UnicodeDecodeError during load failure in non-UTF-8 locale

2020-09-30 Thread Kevin

New submission from Kevin :

If a native module fails to load, the dynload code will call 
PyUnicode_FromString on the error message to give back to the user. This can 
cause a UnicodeDecodeError if the locale is not a UTF-8 locale and the error 
message contains non-ASCII code points.
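
The failure can be imitated at the Python level. This is a sketch, not the
dynload code: a strict UTF-8 decode stands in for PyUnicode_FromString, and
ISO-8859-1 is an assumed encoding for the non-UTF-8 Italian locale.

```python
# The loader's message as bytes in the assumed locale encoding.
message = "Il modulo dipendente libbz2.so non è stato caricato.".encode(
    "iso-8859-1"
)

# Strict UTF-8 decoding fails on the 0xe8 byte that encodes 'è'.
try:
    message.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding the bytes were written in succeeds.
decoded = message.decode("iso-8859-1")
```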

While Linux systems almost always use a UTF-8 locale by default nowadays, AIX 
systems typically use non-UTF-8 locales by default. We encountered an issue 
where a customer did not have libbz2 installed, causing a load failure when bz2 
tried to import _bz2 when running in an Italian locale:

$ LC_ALL=it_IT python3 -c 'import bz2'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/QOpenSys/pkgs/lib/python3.6/bz2.py", line 21, in <module>
    from _bz2 import BZ2Compressor, BZ2Decompressor
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 161: invalid continuation byte

After switching to a UTF-8 locale, the problem goes away:

$ LC_ALL=IT_IT python3 -c 'import bz2'   
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/QOpenSys/pkgs/lib/python3.6/bz2.py", line 21, in <module>
    from _bz2 import BZ2Compressor, BZ2Decompressor
ImportError:0509-022 Impossibile caricare il modulo 
/QOpenSys/pkgs/lib/python3.6/lib-dynload/_bz2.so. 
   0509-150   Il modulo dipendente libbz2.so non è stato caricato. 
   0509-022 Impossibile caricare il modulo libbz2.so. 
   0509-026 Errore di sistema: Un file o una directory nel nome percorso 
non esiste. 
   0509-022 Impossibile caricare il modulo 
/QOpenSys/pkgs/lib/python3.6/lib-dynload/_bz2.so. 
   0509-150   Il modulo dipendente 
/QOpenSys/pkgs/lib/python3.6/lib-dynload/_bz2.so non è stato caricato.


While this conceivably affects any Unix-like platform, the only systems I can 
recreate it on are AIX and IBM i PASE. As far as I can tell, on Linux you will 
always get something like "error while loading shared libraries: libbz2.so.1.0: 
cannot open shared object file: No such file or directory". Even though there 
seem to be some translations in GLIBC, I have been unable to get them to be 
used on either Fedora or Ubuntu.

--
components: Interpreter Core
messages: 377713
nosy: kadler
priority: normal
severity: normal
status: open
title: UnicodeDecodeError during load failure in non-UTF-8 locale
type: behavior
versions: Python 3.10, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 
3.9

___
Python tracker 
<https://bugs.python.org/issue41894>
___



[issue41497] Potential UnicodeDecodeError in dis

2020-08-07 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue41497] Potential UnicodeDecodeError in dis

2020-08-07 Thread miss-islington


miss-islington  added the comment:


New changeset d9106434f77fa84c8a59f8e60dc9c14cdd989b35 by Miss Islington (bot) 
in branch '3.9':
bpo-41497: Fix potential UnicodeDecodeError in dis CLI (GH-21757)
https://github.com/python/cpython/commit/d9106434f77fa84c8a59f8e60dc9c14cdd989b35


--

___
Python tracker 
<https://bugs.python.org/issue41497>
___



[issue41497] Potential UnicodeDecodeError in dis

2020-08-07 Thread miss-islington


miss-islington  added the comment:


New changeset 66c899661902edc18df96a5c3f22639310700491 by Miss Islington (bot) 
in branch '3.8':
bpo-41497: Fix potential UnicodeDecodeError in dis CLI (GH-21757)
https://github.com/python/cpython/commit/66c899661902edc18df96a5c3f22639310700491


--

___
Python tracker 
<https://bugs.python.org/issue41497>
___



[issue41497] Potential UnicodeDecodeError in dis

2020-08-07 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset a4084b9d1e40c1c9259372263d1fe8c8a562b093 by Konge in branch 
'master':
bpo-41497: Fix potential UnicodeDecodeError in dis CLI (GH-21757)
https://github.com/python/cpython/commit/a4084b9d1e40c1c9259372263d1fe8c8a562b093


--
nosy: +inada.naoki

___
Python tracker 
<https://bugs.python.org/issue41497>
___



[issue41497] Potential UnicodeDecodeError in dis

2020-08-07 Thread miss-islington


Change by miss-islington :


--
nosy: +miss-islington
nosy_count: 3.0 -> 4.0
pull_requests: +20923
pull_request: https://github.com/python/cpython/pull/21782




[issue41497] Potential UnicodeDecodeError in dis

2020-08-07 Thread miss-islington


Change by miss-islington :


--
pull_requests: +20924
pull_request: https://github.com/python/cpython/pull/21783




[issue41497] Potential UnicodeDecodeError in dis

2020-08-06 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Good catch. Yes, when reading Python source files you should either open them
in binary mode if bytes are enough, or open them with tokenize.open() if you
need string data, or use tokenize.detect_encoding() and pass the result to
open().
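
The three options can be sketched as follows. The file name, its contents, and
the latin-1 coding cookie are made up for the demonstration.

```python
import os
import tempfile
import tokenize

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.py")
    # A source file with an explicit, non-UTF-8 coding cookie.
    with open(path, "wb") as f:
        f.write("# -*- coding: latin-1 -*-\nname = 'caf\u00e9'\n".encode("latin-1"))

    # Option 1: binary mode. compile() accepts bytes and honors the cookie.
    with open(path, "rb") as f:
        code = compile(f.read(), path, "exec")
    namespace = {}
    exec(code, namespace)

    # Option 2: tokenize.open() detects the encoding and returns a text stream.
    with tokenize.open(path) as f:
        text = f.read()

    # Option 3: detect the encoding first, then pass it to open().
    with open(path, "rb") as f:
        encoding, _ = tokenize.detect_encoding(f.readline)
    with open(path, encoding=encoding) as f:
        same_text = f.read()
```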

--
nosy: +serhiy.storchaka




[issue41497] Potential UnicodeDecodeError in dis

2020-08-06 Thread Inada Naoki


Change by Inada Naoki :


--
versions:  -Python 3.5, Python 3.6, Python 3.7




[issue41497] Potential UnicodeDecodeError in dis

2020-08-06 Thread JIanqiu Tao


JIanqiu Tao  added the comment:

I searched the whole Lib folder and found a lot of code that uses
"open(filename, 'r')" without specifying an encoding.

Should we open another issue for these problems?

--




[issue41497] Potential UnicodeDecodeError in dis

2020-08-06 Thread JIanqiu Tao


Change by JIanqiu Tao :


--
keywords: +patch
pull_requests: +20902
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/21757




[issue41497] Potential UnicodeDecodeError in dis

2020-08-06 Thread JIanqiu Tao

New submission from JIanqiu Tao :

A UnicodeDecodeError can be raised when running "python -m dis" in a 
non-UTF-8 environment.

Assume there is a file named "a.py" that contains "print('喵')" and is saved
with UTF-8 encoding.

Run "python -m dis ./a.py" in a non-UTF-8 environment, for example a 
Windows PC whose default language is Chinese.

A UnicodeDecodeError is raised:

Traceback (most recent call last):
  File "C:\Program Files\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python38\lib\dis.py", line 553, in <module>
    _test()
  File "C:\Program Files\Python38\lib\dis.py", line 548, in _test
    source = infile.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xb5 in position 9: illegal multibyte sequence

That is because Windows' default encoding is determined by the system language.
Chinese systems use cp936 (GBK) as the default encoding, so the UTF-8 file
cannot be decoded.

The file just needs to be read in "rb" mode instead of "r".
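
A sketch of why binary mode helps, using the a.py example from this report.
This is not the dis.py patch itself; the temporary directory and the explicit
GBK decode (standing in for a Chinese Windows default) are assumptions.

```python
import dis
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.py")
    with open(path, "w", encoding="utf-8") as f:
        f.write("print('喵')\n")

    # Reading bytes bypasses the locale's default text encoding entirely.
    with open(path, "rb") as f:
        source = f.read()

    # Decoding the same bytes as GBK reproduces the reported failure.
    try:
        source.decode("gbk")
        gbk_error = None
    except UnicodeDecodeError as exc:
        gbk_error = exc

    # compile() treats source bytes as UTF-8 unless a coding cookie says otherwise.
    code = compile(source, path, "exec")
    out = io.StringIO()
    dis.dis(code, file=out)
    listing = out.getvalue()
```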

--
components: Library (Lib)
messages: 374961
nosy: zkonge
priority: normal
severity: normal
status: open
title: Potential UnicodeDecodeError in dis
type: behavior
versions: Python 3.10, Python 3.5, Python 3.6, Python 3.7, Python 3.8, Python 
3.9

___
Python tracker 
<https://bugs.python.org/issue41497>
___



[issue41115] Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError

2020-06-26 Thread utkarsh


utkarsh  added the comment:

@thatiparthy These were the most logical changes: standard error messages
that were already in the existing code, which I just edited as mentioned
here. What part of your "work" do you think I copied?
I sent this PR mostly to get familiar with the process. I will close it if you
feel insecure. No need to be rude.
Thanks.

--




[issue41115] Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError

2020-06-26 Thread శ్రీనివాస్ రెడ్డి తాటిపర్తి

Srinivas  Reddy Thatiparthy(శ్రీనివాస్ రెడ్డి తాటిపర్తి) 
 added the comment:

@utk You could have taken some other easy issue from 
https://bugs.python.org/issue?status=1&@sort=-activity&@columns=id%2Cactivity%2Ctitle%2Ccreator%2Cstatus&@dispname=Easy%20issues&@startwith=0&@group=priority=6&@action=search&@filter=&@pagesize=50
 instead of copy pasting my work.

--




[issue41115] Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError

2020-06-26 Thread utkarsh


Change by utkarsh :


--
nosy: +utk
nosy_count: 8.0 -> 9.0
pull_requests: +20329
pull_request: https://github.com/python/cpython/pull/21170




[issue41115] Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError

2020-06-26 Thread శ్రీనివాస్ రెడ్డి తాటిపర్తి

Change by Srinivas  Reddy Thatiparthy(శ్రీనివాస్ రెడ్డి తాటిపర్తి) 
:


--
keywords: +patch
pull_requests: +20323
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/21165




[issue41115] Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError

2020-06-25 Thread STINNER Victor


Change by STINNER Victor :


--
nosy:  -vstinner




[issue41115] Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError

2020-06-25 Thread Walter Dörwald

Walter Dörwald  added the comment:

UnicodeEncodeError and UnicodeDecodeError are used to report un(en|de)codable 
ranges in the source object, so it wouldn't make sense to use them for errors 
that have nothing to do with problems in the source object. Their constructor 
requires 5 arguments (encoding, object, start, end, reason), not just a simple 
message: e.g. UnicodeEncodeError("utf-8", "foo", 17, 23, "bad string").

But for reporting e.g. missing BOMs at the start it would be useful to use
(0, 0) as the offending range.
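
The constructor signature can be checked directly. This is a small sketch; the
sample byte string and the BOM wording are only illustrative.

```python
# The five-argument form from the message above.
err = UnicodeEncodeError("utf-8", "foo", 17, 23, "bad string")
fields = (err.encoding, err.object, err.reason)

# A single message string is rejected by the constructor.
try:
    UnicodeEncodeError("bad string")
    simple_message_ok = True
except TypeError:
    simple_message_ok = False

# A missing-BOM report with (0, 0) as the offending range.
# For UnicodeDecodeError the object must be bytes-like.
bom_err = UnicodeDecodeError(
    "utf-16", b"abcd", 0, 0, "stream does not start with BOM"
)
```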

--
nosy: +doerwalter

___
Python tracker 
<https://bugs.python.org/issue41115>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41115] Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError

2020-06-25 Thread Emmanuel Arias


Emmanuel Arias  added the comment:

Hi,

IMO this can be marked as an easy issue.

@thatiparthy please, go ahead

--
nosy: +eamanu




[issue41115] Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError

2020-06-25 Thread శ్రీనివాస్ రెడ్డి తాటిపర్తి

Srinivas  Reddy Thatiparthy(శ్రీనివాస్ రెడ్డి తాటిపర్తి) 
 added the comment:

This looks like an easy task. Shall I create a PR?

--
nosy: +thatiparthy




[issue41115] Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError

2020-06-25 Thread Antoine Pitrou


New submission from Antoine Pitrou :

A number of codecs raise bare UnicodeError, rather than 
Unicode{Decode,Encode}Error. Example:

  File "/home/antoine/miniconda3/envs/pyarrow/lib/python3.7/encodings/utf_16.py", line 67, in _buffer_decode
    raise UnicodeError("UTF-16 stream does not start with BOM")

A more complete list can be found here:
https://gist.github.com/pitrou/60594b28d8e47edcdb97d9b15d5f9866
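
Why this matters to callers, as a sketch: decode_stream() below is a
hypothetical helper that imitates a codec raising a bare UnicodeError. It is
not the utf_16 codec. A handler written for UnicodeDecodeError does not catch
the bare error, because UnicodeDecodeError is the subclass.

```python
def decode_stream(data):
    """Hypothetical helper: reports a missing BOM with a bare UnicodeError."""
    if not data.startswith((b"\xff\xfe", b"\xfe\xff")):
        raise UnicodeError("UTF-16 stream does not start with BOM")
    return data.decode("utf-16")


def classify(data):
    """Report which handler ends up dealing with the input."""
    try:
        decode_stream(data)
        return "ok"
    except UnicodeDecodeError:
        return "decode error"
    except UnicodeError:
        return "bare UnicodeError"


results = [
    classify(b"\xff\xfea\x00"),  # BOM plus one complete code unit
    classify(b"\xff\xfea"),      # BOM plus truncated data
    classify(b"ab"),             # no BOM at all
]
```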

--
components: Library (Lib)
keywords: easy
messages: 372367
nosy: benjamin.peterson, ezio.melotti, lemburg, pitrou, serhiy.storchaka, 
vstinner
priority: normal
severity: normal
stage: needs patch
status: open
title: Codecs should raise precise UnicodeDecodeError or UnicodeEncodeError
type: behavior
versions: Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue41115>
___



[issue40416] Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError

2020-05-03 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

The commit referenced above is for #33578.  The symptoms for that issue were 
very similar, including involving a cjk codec.  The change was not backported 
because it was seen as an enhancement.  Rob, if you try 3.8.2 or 3.8.3 (the 
release candidate was out Wednesday, the final probably next week or so) and 
still have the same problem, re-open this.

--
resolution:  -> duplicate
stage:  -> resolved
status: open -> closed
superseder:  -> cjkcodecs missing getstate and setstate implementations




[issue40416] Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError

2020-05-03 Thread Inada Naoki


Inada Naoki  added the comment:

I think this is not a bug but a limitation of Python 3.7, which was improved
in 3.8.

--
nosy: +inada.naoki




[issue40416] Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError

2020-05-02 Thread Ma Lin


Ma Lin  added the comment:

I did a git bisect, this commit fixed the bug:

https://github.com/python/cpython/commit/ac22f6aa989f18c33c12615af1c66c73cf75d5e7

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40416] Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError

2020-05-02 Thread Ma Lin


Ma Lin  added the comment:

On Windows 10 with Python 3.7, I get the same error as in the reply above.

With Python 3.8, it works well.

--
nosy: +Ma Lin




[issue40416] Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError

2020-05-02 Thread Rob Malouf


Rob Malouf  added the comment:

Same results on macOS 10.15.4 (both the system Python and the Intel/Anaconda 
version) and on CentOS 7.8.

Here's the output with print(...):

13
71
72
392
393
399
536
537
761
762
879
880
933
934
1146
1147
1254
1255
1359
1360
1760
1761
1772
1895
1897
1906
2105
2107
2338
2339
2348
2398
2399
2408
2509
2510
2519
2612
2614
2622
2682
2684
2693
2898
2900
2909
3050
3052
3061
3113
3115
3124
3295
3297
3309
3445
3632
3644
3814
3816
3828
3882
3967
3979
4048
4184
4196
4226
4308
4320
4492
4559
4641
4653
4728
4770
4782
4999
5001
5013
5202
5204
5216
5270
5318
5333
5411
5465
5672
5687
5953
5954
5969
6082
6137
6307
6373
6388
6494
6496
6511
6786
6913
6928
7148
7371
7447
7462
7569
7704
7719
7847
7848
7863
7972
8238
8342
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    print(f.tell())
UnicodeDecodeError: 'gb2312' codec can't decode byte 0xb5 in position 0: illegal multibyte sequence

--

___
Python tracker 
<https://bugs.python.org/issue40416>
___



[issue40416] Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError

2020-05-01 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

Change the line to 'print(f.tell())'.  Are any lines printed before the error?

--




[issue40416] Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError

2020-05-01 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

OS? in case it matters

--
nosy: +terry.reedy




[issue40435] IDLE should catch user config file UnicodeDecodeError

2020-04-29 Thread 左迟

左迟  added the comment:

Well, I have uploaded my ~/.idlerc/config-main.cfg. Appending 
"encoding=utf-8" was the first time I edited the config-main.cfg file manually.
The content of config-main.cfg is below:

[EditorWindow]
font-size = 16
font-bold = False
encoding = utf-8
font = courier new

Just let me know if I can help further.

--
Added file: https://bugs.python.org/file49101/config-main.cfg




[issue40435] IDLE should catch user config file UnicodeDecodeError

2020-04-29 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

I want this left open to fix IDLE exiting instead of continuing.  The original 
IDLE authors could not anticipate all the things that users around the world 
(and OS developers) might do, and we maintainers are still plugging holes as 
they are reported.
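
As a rough sketch of the "continue instead of exiting" behavior, a config 
loader could catch the decode error and fall back to defaults. This is not 
IDLE's actual code: the function name load_user_config and the fallback 
policy are assumptions made for illustration, using only the standard 
configparser module.

```python
import configparser
import sys


def load_user_config(path, encoding="utf-8"):
    """Read a user .cfg file; fall back to an empty config if it cannot be decoded.

    Hypothetical helper for illustration, not part of idlelib.
    """
    parser = configparser.ConfigParser()
    try:
        # read() skips files it cannot open, but a decode failure propagates.
        parser.read(path, encoding=encoding)
    except UnicodeDecodeError as err:
        print(f"Warning: cannot decode {path!r} ({err}); using defaults.",
              file=sys.stderr)
        # Return a fresh parser so partially read data is discarded.
        return configparser.ConfigParser()
    return parser
```

With this shape, a user config file saved in an unexpected encoding produces 
a warning and an empty configuration rather than a failure at startup.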

From the original report, I am slightly surprised that your fix worked.  
Please either upload your revised config-main.cfg or paste it into a reply.  
Changing it should not, that I know of, affect the reading of other .cfg 
files, only the subsequent loading of .py files.  Even then, it would have to 
end with an "[EditorWindow]" section, so that "encoding=utf-8" would be in the 
proper section.

(I should also check the error messaging for unreadable .py files.)

--
resolution: not a bug -> 
stage: resolved -> test needed
status: closed -> open




[issue40435] IDLE should catch user config file UnicodeDecodeError

2020-04-29 Thread 左迟

左迟  added the comment:

Hi!
Thanks for your useful comment. I'm sorry for uploading an image instead of 
pasting the text into the comment.
When I append "encoding=utf-8" to ~/.idlerc/config-main.cfg, IDLE starts and 
works well.
Yes, the "[] Beta: Use Unicode UTF8 for worldwide language support" option is 
what I mean.
The reason I raised this issue is that there were no search results for this 
error message.
In fact, my OS language is Chinese, and I ticked the beta UTF8 option manually. 
That seems to be what causes the IDLE crash.
Many thanks for your help!
Best Regards, Chi.

--
resolution:  -> not a bug
stage: test needed -> resolved
status: open -> closed
versions:  -Python 3.8, Python 3.9




[issue40435] IDLE should catch user config file UnicodeDecodeError

2020-04-29 Thread Terry J. Reedy


Change by Terry J. Reedy :


--
title: Failed to launch IDLE in a UTF-8 code page terminal environment -> IDLE 
should catch user config file UnicodeDecodeError

___
Python tracker 
<https://bugs.python.org/issue40435>
___



[issue40416] Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError

2020-04-28 Thread STINNER Victor


Change by STINNER Victor :


--
nosy:  -vstinner




[issue40416] Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded file causes UnicodeDecodeError

2020-04-27 Thread Rob Malouf


New submission from Rob Malouf :

Calling TextIOWrapper.tell() while reading the attached gb2312-encoded file 
like this:

with open('udhr-gb2312.txt', encoding='GB2312') as f:
    while True:
        line = f.readline()
        t = f.tell()  # raises UnicodeDecodeError partway through the file
        if not line:
            break

gives this result:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    t = f.tell()
UnicodeDecodeError: 'gb2312' codec can't decode byte 0xb5 in position 0: illegal multibyte sequence

The file seems to be well-formed and can be read without any problem.  It's 
only the call to tell() that raises an issue.
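
One possible workaround, sketched under assumptions, is to avoid 
TextIOWrapper.tell() entirely: open the file in binary mode, track byte 
offsets by hand, and decode each line separately. The helper name 
read_lines_with_offsets is hypothetical. Splitting on b"\n" is safe here 
because both bytes of a GB2312 double-byte character have the high bit set, 
so a newline byte never appears inside one. Unlike text mode, this does not 
translate "\r\n" line endings.

```python
def read_lines_with_offsets(path, encoding="gb2312"):
    """Yield (byte_offset, text) for each line without calling tell().

    Hypothetical helper for illustration only.
    """
    offset = 0
    with open(path, "rb") as f:
        for raw in f:              # binary iteration splits on b"\n"
            yield offset, raw.decode(encoding)
            offset += len(raw)     # byte offset where the next line starts
```

The yielded offsets are plain byte positions, so they can be passed to 
seek() on a file opened in binary mode.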

--
components: IO, Unicode
files: udhr-gb2312.txt
messages: 367494
nosy: ezio.melotti, rmalouf, vstinner
priority: normal
severity: normal
status: open
title: Calling TextIOWrapper.tell() in the middle of reading a gb2312-encoded 
file causes UnicodeDecodeError
type: crash
versions: Python 3.7
Added file: https://bugs.python.org/file49096/udhr-gb2312.txt

___
Python tracker 
<https://bugs.python.org/issue40416>
___


