[issue27797] ASCII file with UNIX line conventions and enough lines throws SyntaxError when ASCII-compatible codec is declared

2019-03-29 Thread Inada Naoki


Change by Inada Naoki:


--
resolution:  -> duplicate
stage: needs patch -> resolved
status: open -> closed
superseder:  -> SyntaxError: encoding problem: iso-8859-1 on Windows

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27797] ASCII file with UNIX line conventions and enough lines throws SyntaxError when ASCII-compatible codec is declared

2016-08-19 Thread STINNER Victor

Changes by STINNER Victor:


--
nosy: +haypo




[issue27797] ASCII file with UNIX line conventions and enough lines throws SyntaxError when ASCII-compatible codec is declared

2016-08-19 Thread Eryk Sun

Eryk Sun added the comment:

In issue 20844 I suggested opening the file in binary mode, i.e. change the 
call to _Py_wfopen(filename, L"rb") in Modules/main.c. That would also entail 
documenting that PyRun_SimpleFileExFlags requires a FILE pointer that's opened 
in binary mode. After making this change, there's no problem parsing 
"encoding-problem-cp1252.py":

>python --version
Python 3.6.0a4+

>python encoding-problem-cp1252.py
ok

When fp_setreadl is called while parsing "encoding-problem-cp1252.py", 47 bytes 
in the FILE buffer have been read -- up to the end of the coding spec. Let's 
verify this in the debugger:

0:000> bp python35_d!fp_setreadl
0:000> g
Breakpoint 0 hit
python35_d!fp_setreadl:
`662bee00 4889542410  mov qword ptr [rsp+10h],rdx ss:00d7`6cfeead8=00d76cfeeaf8
0:000> ;as /x fp @@(((python35_d!tok_state *)@rcx)->fp)
0:000> ;as /x ptr @@(((ucrtbased!__crt_stdio_stream_data *)${fp})->_ptr)
0:000> ;as /x base @@(((ucrtbased!__crt_stdio_stream_data *)${fp})->_base)
0:000> ?? ${ptr} - ${base}
int64 0n47
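
As a rough cross-check that does not need a debugger, the 47-byte figure can be 
reproduced from Python. This is only a sketch: it uses tokenize.detect_encoding, 
the pure-Python counterpart of the coding-spec detection, not the C tokenizer's 
fp_setreadl, and the `source` bytes are an assumed stand-in for the first lines 
of the demo file.

```python
import io
import tokenize

# Assumed stand-in for the start of encoding-problem-cp1252.py (LF endings).
source = b"#!/usr/bin/env python\n# -*- coding: cp1252 -*-\nprint('ok')\n"

# A binary stream performs no newline translation, so positions are exact.
stream = io.BytesIO(source)

# detect_encoding reads at most two lines while looking for the coding spec.
encoding, consumed_lines = tokenize.detect_encoding(stream.readline)
consumed = sum(len(line) for line in consumed_lines)

print(encoding, consumed, stream.tell())
```

The shebang line is 22 bytes and the coding line is 25 bytes including their 
LFs, so both the consumed byte count and the stream position come to 47.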

ftell() should return 47, but instead it returns -1. You can see this by 
opening the file in Python 2 on Windows, which uses FILE streams:

>>> f = open('encoding-problem-cp1252.py')
>>> f.read(47)
'#!/usr/bin/env python\n# -*- coding: cp1252 -*-\n'
>>> f.tell()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 0] Error

ftell starts by getting the file position from the OS and then subtracts the 
unread bytes in the buffer. The buffer has already undergone CRLF => LF 
translation, so ftell assumes that the file uses CRLF line endings and thus 
subtracts 2 bytes for each unread LF. In this case the buffer happens to hold 
48 unread LFs, so ftell returns -1; the only actual error is the fundamentally 
flawed design of the CRT's text mode.
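
The arithmetic can be sketched in Python. This is a model of the calculation as 
described above, not the CRT's source; the function name and the synthetic 
710-byte file contents are assumptions chosen to match the numbers in this 
report (47 bytes read, 48 unread LFs).

```python
def model_text_mode_ftell(os_position, unread):
    """Model of the described calculation (hypothetical helper, not CRT code)."""
    # Start from the OS file position, drop the bytes still unread in the
    # buffer, then drop one extra byte per unread LF on the assumption that
    # every LF in the buffer was a CRLF pair on disk.
    return os_position - len(unread) - unread.count(b"\n")

header = b"#!/usr/bin/env python\n# -*- coding: cp1252 -*-\n"   # 47 bytes
# Synthetic remainder: 663 bytes containing exactly 48 LFs.
rest = b"#" + b"x" * 614 + b"\n" + b"\n" * 47

# LF-only file: the whole 710-byte file is buffered, so the OS position is 710.
lf_file = header + rest
lf_result = model_text_mode_ftell(len(lf_file), rest)

# Same content with CRLF endings on disk; the buffer still holds `rest`
# because text mode has already translated CRLF to LF.
crlf_file = lf_file.replace(b"\n", b"\r\n")
crlf_result = model_text_mode_ftell(len(crlf_file), rest)

print(lf_result)     # -1: the CRLF assumption over-subtracts 48 bytes
print(crlf_result)   # 49: the on-disk length of the two CRLF header lines
```

With CRLF endings the assumption holds and the model lands on the correct disk 
offset; with LF endings the same subtraction overshoots 47 by 48 and goes 
negative.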

--
nosy: +eryksun




[issue27797] ASCII file with UNIX line conventions and enough lines throws SyntaxError when ASCII-compatible codec is declared

2016-08-19 Thread SilentGhost

Changes by SilentGhost:


--
stage:  -> needs patch
type:  -> behavior
versions:  -Python 3.4




[issue27797] ASCII file with UNIX line conventions and enough lines throws SyntaxError when ASCII-compatible codec is declared

2016-08-19 Thread Martijn Pieters

New submission from Martijn Pieters:

To reproduce, create an ASCII file with > io.DEFAULT_BUFFER_SIZE bytes (can be 
blank lines) and *UNIX line endings*, with the first two lines reading:

  #!/usr/bin/env python
  # -*- coding: cp1252 -*-

Try to run this as a script on Windows:

C:\Python35\python.exe encoding-problem-cp1252.py
  File "encoding-problem-cp1252.py", line 2
SyntaxError: encoding problem: cp1252

Converting the file to use CRLF (Windows) line endings makes the problem go 
away.
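
A minimal sketch for generating such a file, assuming blank-line padding and a 
trailing print statement. The helper names build_repro_script and write_repro 
are hypothetical, and the size threshold simply follows the description above.

```python
import io

def build_repro_script(min_size=io.DEFAULT_BUFFER_SIZE, newline=b"\n"):
    """Return script bytes longer than min_size, using the given line ending."""
    header = [b"#!/usr/bin/env python", b"# -*- coding: cp1252 -*-"]
    data = newline.join(header) + newline
    tail = b'print("ok")' + newline
    # Pad with blank lines until the finished script exceeds min_size bytes.
    while len(data) + len(tail) <= min_size:
        data += newline
    return data + tail

def write_repro(path, script):
    # Binary mode, so Python itself does not translate the line endings.
    with open(path, "wb") as f:
        f.write(script)

lf_script = build_repro_script()                   # UNIX line endings
crlf_script = build_repro_script(newline=b"\r\n")  # Windows line endings
```

Calling write_repro("encoding-problem-cp1252.py", lf_script) and running the 
result on an affected Windows build should match the failing case above, while 
crlf_script corresponds to the converted file that runs cleanly.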

This appears to be fallout from issue #20731.

Demo file that reproduces this issue at 710 bytes: 
https://github.com/techtonik/testbin/raw/fbb8aec3650b45f690c4febfd621fe5d6892b14a/python/encoding-problem-cp1252.py

First reported by anatoly techtonik at 
https://stackoverflow.com/questions/39032416/python-3-5-syntaxerror-encoding-prob-em-cp1252

--
components: Interpreter Core, Windows
messages: 273087
nosy: mjpieters, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: ASCII file with UNIX line conventions and enough lines throws 
SyntaxError when ASCII-compatible codec is declared
versions: Python 3.4, Python 3.5, Python 3.6
