[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-

2007-10-19 Thread Guido van Rossum

Changes by Guido van Rossum:


--
resolution:  -> fixed
status: open -> closed

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue1278
__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-

2007-10-19 Thread Christian Heimes

Christian Heimes added the comment:

The bug was fixed in r58553 together with
http://bugs.python.org/issue1267. Please close this bug.




[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-

2007-10-15 Thread Guido van Rossum

Guido van Rossum added the comment:

Can you suggest a patch?

Adding Brett Cannon to the list, possibly his import-in-python would
supersede this?

--
nosy: +brett.cannon, gvanrossum




[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-

2007-10-15 Thread Christian Heimes

Christian Heimes added the comment:

> Can you suggest a patch?
>
> Adding Brett Cannon to the list, possibly his import-in-python would
> supersede this?

No, I can't suggest a patch. I don't know how we could get the encoding
from the tokenizer or AST.

Brett is obviously the best man to fix the problem. :)

Christian




[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-

2007-10-15 Thread Guido van Rossum

Guido van Rossum added the comment:

> No, I can't suggest a patch. I don't know how we could get the encoding
> from the tokenizer or AST.

Try harder. :-) Look at the code that accomplishes this feat in the
regular parser...




[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-

2007-10-15 Thread Christian Heimes

Christian Heimes added the comment:

> Try harder. :-) Look at the code that accomplishes this feat in the
> regular parser...

I've already found the methods that find the encoding in
Parser/tokenizer.c: check_coding_spec() and friends.

But it seems like a waste of time to use PyTokenizer_FromFile() just to
find the encoding. *reading* Mmh ... It's not a waste of time if I can
stop the tokenizer. I think it may be possible to use the tokenizer to
get the encoding efficiently. I could read until
tok_state->read_coding_spec or tok_state->indent != 0.

Do you know a better way to stop the tokenizer when the line isn't a
special comment line # -*-?

Christian
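
For readers following along, the idea of stopping after the first couple
of lines can be sketched in Python. This is only an illustration, not the
code in Parser/tokenizer.c: the helper name find_coding_cookie and its
default argument are made up for this example, the pattern is the one
given in PEP 263, and BOM handling is left out.

```python
import io
import re

# Pattern from PEP 263 for recognising a coding declaration in a comment line.
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_coding_cookie(readline, default="utf-8"):
    """Return the encoding named in the first two lines, else `default`.

    `readline` is a callable returning one line of bytes per call, for
    example the readline method of a file opened in binary mode.
    Hypothetical helper: it ignores BOMs and other tokenizer subtleties.
    """
    for _ in range(2):  # a cookie is only honoured on line 1 or 2
        line = readline()
        if not line:
            break  # end of file reached before two lines were read
        match = COOKIE_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default


# Usage: only the first two lines are ever consumed.
sample = io.BytesIO(b"# -*- coding: Latin-1 -*-\nimport heapq\n")
print(find_coding_cookie(sample.readline))
```

Because the loop never reads past the second line, the cost stays small
no matter how large the module is.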




[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-

2007-10-15 Thread Guido van Rossum

Guido van Rossum added the comment:

Call PyTokenizer_Get until the line number is > 2?





[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-

2007-10-15 Thread Brett Cannon

Brett Cannon added the comment:

No, my work has the exact same problem.  Actually, this bug report has
confirmed for me why heapq could not be imported when I accidentally
forced all open text files to use UTF-8.  I just have not gotten around
to trying to solve this issue yet.  But since importlib just uses open()
directly it has the same problems.

Since it looks like TextIOWrapper does not let one change the encoding
after it has been set, some subclass might need to be written that reads
and looks for the stanza, or else immediately stops and uses the expected
encoding (UTF-8 in the case of Py3K or ASCII for 2.6).  That or expose
some C function that takes a file path or open file that returns a code
object.
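
A rough sketch of the first option, written against today's standard
library rather than what existed when this was posted: open the file in
binary mode, let tokenize.detect_encoding() look at the first lines, then
rewind and wrap the buffer in a TextIOWrapper with the detected encoding.
The function name open_source and the sample file are hypothetical, and
this is not necessarily how the eventual fix was implemented.

```python
import io
import os
import tempfile
import tokenize


def open_source(path):
    """Open a Python source file as text, honouring its coding cookie."""
    buffer = open(path, "rb")
    try:
        # detect_encoding() reads at most two lines looking for a BOM or cookie.
        encoding, _ = tokenize.detect_encoding(buffer.readline)
        buffer.seek(0)  # rewind so the text wrapper sees the whole file
        return io.TextIOWrapper(buffer, encoding=encoding)
    except BaseException:
        buffer.close()
        raise


# Usage with a throwaway Latin-1 file (a stand-in for a real module).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "latin1_module.py")
    with open(path, "wb") as f:
        f.write("# -*- coding: Latin-1 -*-\nauthor = 'Fran\u00e7ois'\n".encode("latin-1"))
    with open_source(path) as f:
        detected = f.encoding
        text = f.read()

print(detected)
```

The encoding is chosen before the wrapper is created, which sidesteps the
fact that it cannot be changed afterwards. In later Python 3 releases
tokenize.open() packages the same steps.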

But I have bigger fish to fry as my attempt to get around open() being
defined in site.py is actually failing once I clobbered my .pyc files as
codecs requires importing modules, even for ASCII encoding.




[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-

2007-10-15 Thread Alexandre Vassalotti

Changes by Alexandre Vassalotti:


--
nosy: +alexandre.vassalotti




[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-

2007-10-14 Thread Christian Heimes

New submission from Christian Heimes:

imp.find_module() returns an io.TextIOWrapper instance as its first value.
The encoding of the TextIOWrapper isn't set from a -*- coding: Latin-1 -*-
line.

>>> import imp
>>> imp.find_module("heapq")
(<io.TextIOWrapper object at 0xb7c8f50c>,
'/home/heimes/dev/python/py3k/Lib/heapq.py', ('.py', 'U', 1))
>>> imp.find_module("heapq")[0].read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/heimes/dev/python/py3k/Lib/io.py", line 1224, in read
    res += decoder.decode(self.buffer.read(), True)
  File "/home/heimes/dev/python/py3k/Lib/codecs.py", line 291, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position
1428-1430: invalid data
>>> imp.find_module("heapq")[0].encoding
'UTF-8'
>>> imp.find_module("heapq")[0].readline()
'# -*- coding: Latin-1 -*-\n'
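
The same failure can be shown without imp at all. The sketch below is an
illustration only: it writes a small throwaway Latin-1 file (a
hypothetical stand-in for Lib/heapq.py) and reads it back with a fixed
UTF-8 encoding, which is effectively what happens when the cookie is
ignored.

```python
import os
import tempfile

# Hypothetical stand-in for Lib/heapq.py: Latin-1 bytes behind a coding cookie.
SOURCE = "# -*- coding: Latin-1 -*-\nauthor = 'Fran\u00e7ois'\n"


def write_latin1_module(directory):
    """Write SOURCE as Latin-1 bytes and return the file's path."""
    path = os.path.join(directory, "latin1_module.py")
    with open(path, "wb") as f:
        f.write(SOURCE.encode("latin-1"))
    return path


def read_with(path, encoding):
    """Read the whole file using a fixed encoding, ignoring any cookie."""
    with open(path, encoding=encoding, newline="") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    module_path = write_latin1_module(tmp)
    try:
        read_with(module_path, "utf-8")
        utf8_failed = False
    except UnicodeDecodeError:
        # The Latin-1 byte for the accented letter is not valid UTF-8 here.
        utf8_failed = True
    latin1_text = read_with(module_path, "latin-1")

print("UTF-8 read failed:", utf8_failed)
print("Latin-1 read matches source:", latin1_text == SOURCE)
```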

--
components: Interpreter Core
messages: 56431
nosy: tiran
severity: normal
status: open
title: imp.find_module() ignores -*- coding: Latin-1 -*-
type: behavior
versions: Python 3.0
