[issue41504] Add links to asttokens, leoAst, LibCST and Parso to ast.rst

2020-08-11 Thread Edward K Ream


Edward K Ream  added the comment:

You're welcome. It was a pleasure working with you all on this issue.

I enjoyed learning the PR workflow, and I enjoyed the discussion of the merits.

One last comment. Like everything in life, links and their implied endorsements 
are provisional. If a link ever becomes problematic, I would expect the python 
devs to remove it.

--

___
Python tracker 
<https://bugs.python.org/issue41504>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41504] Add links to asttokens, leoAst, LibCST and Parso to ast.rst

2020-08-07 Thread Edward K Ream


New submission from Edward K Ream :

These links were added with the provisional approval of GvR, pending approval of 
the PR.

--
assignee: docs@python
components: Documentation
messages: 375019
nosy: docs@python, edreamleo
priority: normal
severity: normal
status: open
title: Add links to asttokens, leoAst, LibCST and Parso to ast.rst
type: enhancement
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue41504>
___



[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

2020-07-25 Thread Edward K Ream


Edward K Ream  added the comment:

Hello all,

This is a "sideways" response to this issue. I have been dithering about 
whether to give you a heads up. I hope you won't mind...

I have just announced leoAst.py on python-announce-list. You can read the 
announcement here: 

https://github.com/leo-editor/leo-editor/issues/1565#issuecomment-654904747

Imo, leoAst.py solves many of the concerns mentioned in the first comment of 
this thread. leoAst.py is certainly a different approach.

Also imo, the TOG and TOG in leoAst.py plug significant holes in python's ast 
and tokenize modules. These classes might be candidates for python's ast 
module. If you're interested, I will be willing to do further work. If not, I 
completely understand.

As shown in the project's history, a significant amount of invention and 
discovery was required. The root of much of my initial confusion and 
difficulties was the notion that "real programmers don't use tokens". In fact, 
I discovered that the reverse is true. Tokens contain the ground truth. In many 
cases, the parse tree doesn't.
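The claim that tokens hold information the tree lacks can be shown with the standard ast and tokenize modules alone. This is a minimal illustration, not code from leoAst.py. The comment and the redundant parentheses survive in the token stream but not in the tree. ast.unparse assumes Python 3.9 or later.

```python
import ast
import io
import tokenize

source = "x = (1)  # the answer\n"

# The AST keeps the structure but drops the comment and the redundant parentheses.
tree = ast.parse(source)
print(ast.unparse(tree))

# The token stream still contains the comment text and both parentheses.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
comments = [t.string for t in tokens if t.type == tokenize.COMMENT]
parens = [t.string for t in tokens if t.type == tokenize.OP and t.string in ("(", ")")]
print(comments, parens)
```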

I would be interested in your reactions.

--
nosy: +edreamleo

___
Python tracker 
<https://bugs.python.org/issue7>
___



[issue38663] Untokenize does not round-trip ws before bs-nl

2019-11-03 Thread Edward K Ream


Edward K Ream  added the comment:

This post:
https://groups.google.com/d/msg/leo-editor/DpZ2cMS03WE/5X8IDzpgEAAJ

discusses unit testing. The summary states:

"I've done the heavy lifting on issue 38663. Python devs should handle the 
details of testing and packaging."

I'll leave it at that. In some ways this issue is very minor, and of almost no 
interest to anyone :-) Do with it as you will.  The ball is in python's court.

--

___
Python tracker 
<https://bugs.python.org/issue38663>
___



[issue38663] Untokenize does not round-trip ws before bs-nl

2019-11-03 Thread Edward K Ream


Edward K Ream  added the comment:

This post
https://groups.google.com/d/msg/leo-editor/DpZ2cMS03WE/VPqtB9lTEAAJ
discusses a complete rewrite of tokenize.untokenize.

To quote from the post:

I have "discovered" a spectacular replacement for Untokenizer.untokenize in 
python's tokenize library module. The wretched, buggy, and impossible-to-fix 
add_whitespace method is gone. The new code has no significant 'if' statements, 
and knows almost nothing about tokens!  This is the way untokenize is written 
in The Book.

The new code should put an end to a long series of issues against untokenize 
code in python's tokenize library module.  Some closed issues were blunders 
arising from dumbing-down the TestRoundtrip.check_roundtrip method in 
test_tokenize.py. 

Imo, the way is now clear for proper unit testing of python's Untokenizer class.

--

___
Python tracker 
<https://bugs.python.org/issue38663>
___



[issue38663] Untokenize does not round-trip ws before bs-nl

2019-11-03 Thread Edward K Ream


Edward K Ream  added the comment:

The original bug report used a Leo-only function, g.toUnicode. To fix this, 
replace:

result = g.toUnicode(tokenize.untokenize(tokens))

by:

result_b = tokenize.untokenize(tokens)
result = result_b.decode('utf-8', 'strict')
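Wrapped into a self-contained check, the corrected round trip might look like the sketch below. The helper name roundtrips is hypothetical. It relies on tokenize.tokenize emitting an ENCODING token first, which makes tokenize.untokenize return bytes rather than str.

```python
from io import BytesIO
import tokenize

def roundtrips(source: str) -> bool:
    """Return True if tokenize followed by untokenize reproduces source exactly."""
    tokens = list(tokenize.tokenize(BytesIO(source.encode('utf-8')).readline))
    # The leading ENCODING token makes untokenize return bytes, so decode explicitly.
    result_b = tokenize.untokenize(tokens)
    result = result_b.decode('utf-8', 'strict')
    return result == source

print(roundtrips("x = 1\n"))
```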

--

___
Python tracker 
<https://bugs.python.org/issue38663>
___



[issue38663] Untokenize does not round-trip ws before bs-nl

2019-11-01 Thread Edward K Ream


New submission from Edward K Ream :

Tested on 3.6.

tokenize.untokenize does not round-trip whitespace before backslash-newlines 
outside of strings:

from io import BytesIO
import tokenize

# Round tripping fails on the second string.
table = (
r'''
print\
("abc")
''',
r'''
print \
("abc")
''',
)
for s in table:
    tokens = list(tokenize.tokenize(
        BytesIO(s.encode('utf-8')).readline))
    result = g.toUnicode(tokenize.untokenize(tokens))
    print(result==s)
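One way to see why this case is hard is to compare the positional fields of the tokens for the two sources. The sketch below does that, using the same two strings as the report minus the leading blank line. The helper token_summary is hypothetical. It ignores each token's line attribute and keeps only (type, string, start, end). The space before the backslash belongs to no token, so an untokenizer that looks only at these fields cannot tell the two sources apart.

```python
from io import BytesIO
import tokenize

without_space = 'print\\\n("abc")\n'
with_space = 'print \\\n("abc")\n'

def token_summary(source):
    """Return (type, string, start, end) for each token, ignoring the line attribute."""
    readline = BytesIO(source.encode('utf-8')).readline
    return [(t.type, t.string, t.start, t.end) for t in tokenize.tokenize(readline)]

# The whitespace before the backslash-newline is not part of any token.
print(token_summary(without_space) == token_summary(with_space))
```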

I have an important use case that would benefit from a proper untokenize. After 
considerable study, I have not found a proper fix for tokenize.add_whitespace.

I would be happy to work with anyone to rewrite tokenize.untokenize so that 
unit tests pass without fudges in TestRoundtrip.check_roundtrip.

--
messages: 355827
nosy: edreamleo
priority: normal
severity: normal
status: open
title: Untokenize does not round-trip ws before bs-nl
type: behavior
versions: Python 3.6

___
Python tracker 
<https://bugs.python.org/issue38663>
___



[issue22616] Allow connecting AST nodes with corresponding source ranges

2019-01-14 Thread Edward K Ream


Edward K Ream  added the comment:

On Mon, Jan 14, 2019 at 5:24 AM Ivan Levkivskyi 
wrote:

Adding endline and endcolumn to every ast node will be a big improvement.
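Python 3.8 added end positions (end_lineno and end_col_offset) to AST nodes, together with ast.get_source_segment. A small sketch of mapping a node back to its source text follows. The example source is made up for illustration.

```python
import ast

source = "total = price * qty\n"
tree = ast.parse(source)
node = tree.body[0].value  # the BinOp node for "price * qty"

# Start and end positions of the node; col offsets are UTF-8 byte offsets.
print(node.lineno, node.col_offset, node.end_lineno, node.end_col_offset)
# Recover the exact source text covered by the node.
print(ast.get_source_segment(source, node))
```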

Edward
--
Edward K. Ream: edream...@gmail.com Leo: http://leoeditor.com/
--

--
nosy: +edreamleo

___
Python tracker 
<https://bugs.python.org/issue22616>
___



[issue25778] winreg.EnumValue does not truncate strings correctly

2016-12-29 Thread Edward K. Ream

Edward K. Ream added the comment:

Thank you, Steve, et al., for resolving this issue.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25778>
___



[issue25778] winreg.EnumValue does not truncate strings correctly

2016-12-03 Thread Edward K. Ream

Edward K. Ream added the comment:

On Sat, Dec 3, 2016 at 1:37 PM, Steve Dower <rep...@bugs.python.org> wrote:

Thanks, Steve and David, for your replies. Getting this issue fixed
eventually will do. Glad to hear it was a mistake, and not policy ;-)

EKR

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25778>
___



[issue25778] winreg.EnumValue does not truncate strings correctly

2016-12-03 Thread Edward K. Ream

Edward K. Ream added the comment:

The last message on this thread was in January, and this item is Open. 
According to PEP 478, 3.5.2 final was released Sunday, June 26, 2016.

How is this issue not a release blocker?

Why does there appear to be no urgency in fixing this bug?

This bug bites for 64-bit versions of Python 3. When it bit, it caused Leo to 
crash during startup. When it bit, it was reason to recommend Python 2 over 
Python 3.

I have just released an ugly workaround in Leo. So now Leo itself can start up, 
but there is no guarantee that user plugins and scripts will work.

Imo, no future version of Python 3 should go out the door until this bug is 
fixed, for sure, and for all time. If you want people to use Python 3, it can 
NOT have this kind of bug in it.

--
nosy: +Edward.K..Ream

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25778>
___



[issue22819] Python3.4: xml.sax.saxutils.XMLGenerator.__init__ fails with pythonw.exe

2014-11-08 Thread Edward K. Ream

New submission from Edward K. Ream:

In Python 3.2, xml.sax.saxutils.XMLGenerator.__init__ succeeds if the out 
keyword argument is not given and sys.stdout is None, which will typically be 
the case when using pythonw.exe.

Alas, on Python 3.4, the ctor throws an exception in this case.

This is a major compatibility issue, and is completely unnecessary: the ctor 
should work as before.  An easy fix: allocate a file-like object as the out 
stream, or just do what is done in Python 3.2 ;-)
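Until the library behaves as it did in 3.2, a caller can avoid the problem by never letting XMLGenerator fall back to sys.stdout. The sketch below is one defensive option, not the stdlib fix. The helper make_generator is hypothetical, and it assumes an in-memory buffer is an acceptable sink when no console exists.

```python
import io
import sys
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

def make_generator(out=None, encoding="utf-8"):
    """Hypothetical helper: always hand XMLGenerator a real stream."""
    if out is None:
        # Under pythonw.exe sys.stdout can be None; use an in-memory buffer instead.
        out = sys.stdout if sys.stdout is not None else io.StringIO()
    return XMLGenerator(out, encoding)

buf = io.StringIO()
gen = make_generator(buf)
gen.startDocument()
gen.startElement("root", AttributesImpl({}))
gen.characters("hi")
gen.endElement("root")
gen.endDocument()
print(buf.getvalue())
```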

--
components: Library (Lib)
messages: 230844
nosy: Edward.K..Ream
priority: normal
severity: normal
status: open
title: Python3.4: xml.sax.saxutils.XMLGenerator.__init__ fails with pythonw.exe
type: crash
versions: Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22819
___



[issue22616] Allow connecting AST nodes with corresponding source ranges

2014-10-12 Thread Edward K. Ream

Edward K. Ream added the comment:

I urge the Python development team to fix this and the related bugs given in 
the Post Script. The lack of an easy way of associating ast nodes with text 
ranges in the original sources is arguably the biggest hole in the Python API.

These bugs have immediate, severe, practical consequences for any tool that 
attempts to regularize (PEP 8) or beautify Python code.

Consider the code for PythonTidy:
http://lacusveris.com/PythonTidy/PythonTidy-1.23.python

Every version has had bugs in this area arising from difficult workarounds to 
the hole in the API.  The entire Comments class is a horror directly related to 
these issues.

Consider Aivar's workaround to these bugs:
https://bitbucket.org/plas/thonny/src/8cdaa41aca7a5cc0b31618b6f1631d360c488196/src/ast_utils.py?at=default
See the docstring for def fix_ast_problems.  This is an absurdly difficult 
solution to what should be a trivial problem.

It's impossible to build reliable software using such heroic hacks.  The 
additional bugs listed below further complicate a nightmarish task.

In short, these bugs are *not* minor little nits.  They are preventing the 
development of reliable source-code tools.

Edward K. Ream

P.S. Here are the related bugs:

http://bugs.python.org/issue10769
Allow connecting AST nodes with corresponding source ranges

http://bugs.python.org/issue21295
Python 3.4 gives wrong col_offset for Call nodes returned from ast.parse

http://bugs.python.org/issue18374
ast.parse gives wrong position (col_offset) for some BinOp-s

http://bugs.python.org/issue16806
col_offset is -1 and lineno is wrong for multiline string expressions

EKR

--
nosy: +Edward.K..Ream

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22616
___



[issue17850] unicode_escape encoding fails for '\\Upsilon'

2013-04-26 Thread Edward K. Ream

New submission from Edward K. Ream:

On both windows and Linux the following fails on Python 2.7:

    s = '\\Upsilon'
    unicode(s, "unicode_escape")

UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-7: 
end of string in escape sequence

BTW, the six.py package uses this call.  If this call doesn't work, six is 
broken.

--
components: Library (Lib)
messages: 187852
nosy: Edward.K..Ream
priority: normal
severity: normal
status: open
title: unicode_escape encoding fails for '\\Upsilon'
type: crash
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17850
___



[issue17850] unicode_escape encoding fails for '\\Upsilon'

2013-04-26 Thread Edward K. Ream

Edward K. Ream added the comment:

Thanks for your quick reply.

If this is not a bug, why does six define six.u as unicode(s, "unicode_escape") 
for *all* u constants??

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17850
___



[issue17850] unicode_escape encoding fails for '\\Upsilon'

2013-04-26 Thread Edward K Ream

Edward K Ream added the comment:

On Fri, Apr 26, 2013 at 8:51 AM, Edward K. Ream rep...@bugs.python.org wrote:


 If this is not a bug, why does six define six.u as
 unicode(s, "unicode_escape") for *all* u constants??


Oops.  The following works::

s = r'\\Upsilon'
unicode(s, "unicode_escape")

My apologies for the noise.
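The same distinction carries over to Python 3, where unicode_escape decodes bytes. The sketch below is a Python 3 analogue, not the Python 2.7 call from the report. The helper decode_escapes is hypothetical. With one backslash, \U must be followed by eight hex digits, so decoding fails. With two backslashes, the input is an escaped backslash followed by plain text.

```python
import codecs

def decode_escapes(raw: bytes) -> str:
    """Decode Python-style backslash escapes; report failures instead of raising."""
    try:
        return codecs.decode(raw, "unicode_escape")
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"

print(decode_escapes(b"\\Upsilon"))    # one backslash: truncated \U escape
print(decode_escapes(b"\\\\Upsilon"))  # two backslashes: a literal backslash
```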

Edward

--
nosy: +edreamleo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17850
___



[issue4531] Deprecation warnings in lib\compiler\ast.py

2008-12-04 Thread Edward K Ream

New submission from Edward K Ream [EMAIL PROTECTED]:

Python 2.6 final on Windows XP gives the following warnings with the -3 option:

c:\python26\lib\compiler\ast.py:54: SyntaxWarning: tuple parameter
unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:434: SyntaxWarning: tuple parameter
unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:488: SyntaxWarning: tuple parameter
unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:806: SyntaxWarning: tuple parameter
unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:896: SyntaxWarning: tuple parameter
unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:926: SyntaxWarning: tuple parameter
unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:998: SyntaxWarning: tuple parameter
unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:1098: SyntaxWarning: tuple parameter
unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:1173: SyntaxWarning: tuple parameter
unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\pycodegen.py:903: SyntaxWarning: tuple
parameter unpacking has been removed in 3.x
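PEP 3113 removed tuple parameters in Python 3. The usual replacement is to take a single parameter and unpack it in the body, as in the sketch below. The class name Pair is hypothetical. It only mirrors the shape of the flagged signature, def __init__(self, (left, right), lineno=None).

```python
class Pair:
    """Hypothetical node with the same constructor shape as the flagged classes."""

    def __init__(self, leftright, lineno=None):
        # Python 3 has no tuple parameters, so unpack the pair inside the body.
        self.left, self.right = leftright
        self.lineno = lineno

node = Pair((1, 2), lineno=7)
print(node.left, node.right, node.lineno)
```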

Edward

Edward K. Ream email: [EMAIL PROTECTED]
Leo: http://webpages.charter.net/edreamleo/front.html


--
components: Library (Lib)
messages: 76904
nosy: edreamleo
severity: normal
status: open
title: Deprecation warnings in lib\compiler\ast.py
type: compile error
versions: Python 2.6

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4531
___



[issue4531] Deprecation warnings in lib\compiler\ast.py

2008-12-04 Thread Edward K Ream

Edward K Ream [EMAIL PROTECTED] added the comment:

On Thu, Dec 4, 2008 at 12:33 PM, Brett Cannon [EMAIL PROTECTED] wrote:

 Brett Cannon [EMAIL PROTECTED] added the comment:

 Considering the entire compiler package is not in 3.0 it is not worth
 fixing this. Closing as wont fix.

Thanks for this clarification.

Edward

Edward K. Ream email: [EMAIL PROTECTED]
Leo: http://webpages.charter.net/edreamleo/front.html


___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue4531
___



[issue3590] sax.parser hangs on byte streams

2008-08-18 Thread Edward K Ream

New submission from Edward K Ream [EMAIL PROTECTED]:

While porting Leo to Python 3.0, I found that passing any byte stream to
xml.sax.parser.parse will hang the parser.  My quick fix was to change:

while buffer != :

to:

while buffer !=  and buffer != b:

at line 123 of xmlreader.py

Here is the entire function:

def parse(self, source):
from . import saxutils
source = saxutils.prepare_input_source(source)

self.prepareParser(source)
file = source.getByteStream()
buffer = file.read(self._bufsize)
### while buffer != :
while buffer !=  and buffer != b: ### EKR
self.feed(buffer)
buffer = file.read(self._bufsize)
self.close()

For reference, here is the code in Leo that was hanging::

  parser = xml.sax.make_parser()
  parser.setFeature(xml.sax.handler.feature_external_ges,1)
  handler = saxContentHandler(c,inputFileName,silent,inClipboard)
  parser.setContentHandler(handler)
  parser.parse(theFile)

Looking at the test_expat_file function in test_sax.py, it appears that
the essential difference between the code that hangs and the successful
unit test is that Leo opens the file in 'rb' mode. (code not shown)
It's doubtful that 'rb' mode is correct--from the unit test I deduce
that the default 'r' mode would be better.  Anyway, it would be nice if
parser.parse didn't hang on dubious streams.
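A read loop that ends on either kind of stream only needs the fact that "" and b"" are both falsy. The sketch below rests on that assumption. It is not the xmlreader code, and the names feed_all and NameCollector are hypothetical. A tiny bufsize forces several feed() calls on the default expat parser.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler

class NameCollector(ContentHandler):
    """Records the name of every start tag."""
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

def feed_all(parser, stream, bufsize=65536):
    # Empty str and empty bytes are both falsy, so one test ends the loop for either.
    buffer = stream.read(bufsize)
    while buffer:
        parser.feed(buffer)
        buffer = stream.read(bufsize)
    parser.close()

parser = xml.sax.make_parser()
handler = NameCollector()
parser.setContentHandler(handler)
feed_all(parser, io.BytesIO(b"<root><a>hi</a></root>"), bufsize=4)
print(handler.names)
```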

HTH.

Edward

--
components: Library (Lib)
messages: 71339
nosy: edreamleo
severity: normal
status: open
title: sax.parser hangs on byte streams
type: behavior
versions: Python 3.0

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3590
___



[issue3590] sax.parser hangs on byte streams

2008-08-18 Thread Edward K Ream

Edward K Ream [EMAIL PROTECTED] added the comment:

On Mon, Aug 18, 2008 at 10:09 AM, Benjamin Peterson
[EMAIL PROTECTED] wrote:


 Benjamin Peterson [EMAIL PROTECTED] added the comment:

 It should probably be changed to just while buffer != b"" since it
 requests a byte stream.

That was my guess as well.  I added the extra test so as not to remove a
test that might, under some circumstances, be important.

Just to be clear, I am at present totally confused about io streams :-)
Especially as used by the sax parsers.  In particular, opening a file in 'r'
mode, that is, passing a *non*-byte stream to parser.parse, works, while
opening a file in 'rb' mode, that is, passing a *byte* stream to
parser.parse, hangs.

Anyway, opening the file passed to parser.parse with 'r' mode looks like the
(only) way to go when using Python 3.0.  In Python 2.5, opening files passed
to parser.parse in 'rb' mode works.  I don't recall whether I had any reason
for 'rb' mode: it may have been an historical accident, or just a lucky
accident :-)

Edward

Edward K. Ream email: [EMAIL PROTECTED]
Leo: http://webpages.charter.net/edreamleo/front.html


Added file: http://bugs.python.org/file11145/unnamed

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3590
___



[issue3590] sax.parser considers XML as text rather than bytes

2008-08-18 Thread Edward K Ream

Edward K Ream [EMAIL PROTECTED] added the comment:

On Mon, Aug 18, 2008 at 1:51 PM, Antoine Pitrou [EMAIL PROTECTED] wrote:


 Antoine Pitrou [EMAIL PROTECTED] added the comment:

 From the discussion on the python-3000, it looks like it would be nice
 if sax.parser handled both bytes and unicode streams.


 Edward, does your simple fix make sax.parser work entirely well with
 byte streams?

No. The sax.parser seems to have other problems.  Here is what I *think* I
know ;-)

1. A smallish .leo file (an xml file) containing a single non-ascii (utf-8)
encoded character appears to have been read correctly with Python 3.0.

2. A larger .leo file fails as follows (it's possible that the duplicate
error messages are a Leo problem):

Traceback (most recent call last):
Traceback (most recent call last):

  File "C:\leo.repo\leo-30\leo\core\leoFileCommands.py", line 1283, in parse_leo_file
    parser.parse(theFile) # expat does not support parseString
  File "C:\leo.repo\leo-30\leo\core\leoFileCommands.py", line 1283, in parse_leo_file
    parser.parse(theFile) # expat does not support parseString

  File "c:\python30\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "c:\python30\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)

  File "c:\python30\lib\xml\sax\xmlreader.py", line 121, in parse
    buffer = file.read(self._bufsize)
  File "c:\python30\lib\xml\sax\xmlreader.py", line 121, in parse
    buffer = file.read(self._bufsize)

  File "C:\Python30\lib\io.py", line 1670, in read
    eof = not self._read_chunk()
  File "C:\Python30\lib\io.py", line 1670, in read
    eof = not self._read_chunk()

  File "C:\Python30\lib\io.py", line 1499, in _read_chunk
    self._set_decoded_chars(self._decoder.decode(input_chunk, eof))
  File "C:\Python30\lib\io.py", line 1499, in _read_chunk
    self._set_decoded_chars(self._decoder.decode(input_chunk, eof))

  File "C:\Python30\lib\io.py", line 1236, in decode
    output = self.decoder.decode(input, final=final)
  File "C:\Python30\lib\io.py", line 1236, in decode
    output = self.decoder.decode(input, final=final)

  File "C:\Python30\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
  File "C:\Python30\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 74: character maps to <undefined>
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 74: character maps to <undefined>

The same calls to sax read the file correctly on Python 2.5.

It would be nice to have a message pinpoint the line and character offset of
the problem.

My vote would be for the code to work on both kinds of input streams. This
would save the users considerable confusion if sax does the (tricky)
conversions automatically.
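The traceback shows the text-mode file being decoded with cp1252. Cp1252 has no mapping for byte 0x81, a byte that appears in some UTF-8 sequences. The sketch below reproduces that failure in memory and then hands the same bytes to the parser, which lets expat decode according to the XML declaration. The sample document is made up. It assumes the .leo file is UTF-8 and that cp1252 was the default text encoding, as the trace suggests. TextCollector is a hypothetical handler.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# U+0441 (Cyrillic small es) encodes to b"\xd1\x81" in UTF-8; cp1252 has no mapping for 0x81.
data = '<?xml version="1.0" encoding="utf-8"?><leo>\u0441</leo>'.encode("utf-8")

# Roughly what a text-mode open() does when the default encoding is cp1252.
try:
    data.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError as exc:
    cp1252_failed = True
    print("cp1252:", exc.reason)

class TextCollector(ContentHandler):
    """Collects character data reported by the parser."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)

# Passing bytes lets expat decode according to the XML declaration.
handler = TextCollector()
xml.sax.parseString(data, handler)
print("".join(handler.chunks))
```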

Imo, now would be the most convenient time to attempt this--there is a
certain freedom in having everything be partially broken :-)

Edward

Edward K. Ream email: [EMAIL PROTECTED]
Leo: http://webpages.charter.net/edreamleo/front.html


Added file: http://bugs.python.org/file11147/unnamed

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3590
___

[issue3590] sax.parser considers XML as text rather than bytes

2008-08-18 Thread Edward K Ream

Edward K Ream [EMAIL PROTECTED] added the comment:

On Mon, Aug 18, 2008 at 11:00 AM, Antoine Pitrou [EMAIL PROTECTED] wrote:


 Antoine Pitrou [EMAIL PROTECTED] added the comment:

  Just to be clear, I am at present totally confused about io streams :-)

 Python 3.0 distincts more clearly between unicode strings (called "str"
 in 3.0) and bytes strings (called "bytes" in 3.0). The most important
 point being that there is no more any implicit conversion between the
 two: you must explicitly use .encode() or .decode().

 Files opened in binary ("rb") mode returns byte strings, but files
 opened in text ("r") mode return unicode strings, which means you can't
 give a text file to 3.0 library expecting a binary file, or vice-versa.

 What is more worrying is that XML, until decoded, should be considered a
 byte stream, so sax.parser should accept binary files rather than text
 files. I took a look at test_sax and indeed it considers XML as text
 rather than bytes :-(

Thanks for these remarks.  They confirm what I suspected, but was unsure of,
namely that it seems strange to be passing something other than a byte
stream to parser.parse.
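The distinction Antoine describes can be shown with in-memory streams standing in for real files. Using streams instead of files is an assumption made so the sketch needs nothing on disk. BytesIO behaves like a file opened with "rb", and a TextIOWrapper with an explicit encoding behaves like one opened with "r".

```python
import io

raw = "caf\u00e9".encode("utf-8")  # bytes as they would sit on disk

binary = io.BytesIO(raw)                                     # like open(path, "rb")
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")  # like open(path, "r", encoding="utf-8")

b = binary.read()
s = text.read()
print(type(b).__name__, type(s).__name__)

# There is no implicit conversion: mixing the two types raises TypeError.
try:
    s + b
except TypeError:
    print("str + bytes raises TypeError")

# Conversion must be explicit in both directions.
print(b.decode("utf-8") == s, s.encode("utf-8") == b)
```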


 Bumping this as critical because it needs a decision very soon (ideally
 before beta3).

Thanks for taking this seriously.

Edward

P.S.  I love the new unicode plans.  They are going to cause some pain at
first for everyone (Python team and developers), but in the long run they
are going to be a big plus for Python.

EKR

Edward K. Ream email: [EMAIL PROTECTED]
Leo: http://webpages.charter.net/edreamleo/front.html


Added file: http://bugs.python.org/file11148/unnamed

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3590
___



[issue3590] sax.parser considers XML as text rather than bytes

2008-08-18 Thread Edward K Ream

Edward K Ream [EMAIL PROTECTED] added the comment:

On Mon, Aug 18, 2008 at 4:15 PM, Antoine Pitrou [EMAIL PROTECTED] wrote:


 Antoine Pitrou [EMAIL PROTECTED] added the comment:

  The same calls to sax read the file correctly on Python 2.5.

 What are those calls exactly?

  parser = xml.sax.make_parser()
  parser.setFeature(xml.sax.handler.feature_external_ges,1)
  handler = saxContentHandler(c,inputFileName,silent,inClipboard)
  parser.setContentHandler(handler)
  parser.parse(theFile)

As discussed in http://bugs.python.org/issue3590

theFile is a file opened with 'rb' attributes

Edward


Edward K. Ream email: [EMAIL PROTECTED]
Leo: http://webpages.charter.net/edreamleo/front.html


Added file: http://bugs.python.org/file11151/unnamed

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3590
___