[issue41504] Add links to asttokens, leoAst, LibCST and Parso to ast.rst
Edward K Ream added the comment: You're welcome. It was a pleasure working with you all on this issue. I enjoyed learning the PR workflow, and I enjoyed the discussion of the merits. One last comment. Like everything in life, links and their implied endorsements are provisional. If a link ever becomes problematic, I would expect the python devs to remove it. -- ___ Python tracker <https://bugs.python.org/issue41504> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41504] Add links to asttokens, leoAst, LibCST and Parso to ast.rst
New submission from Edward K Ream: These links were added with the provisional approval of GvR, pending approval of the PR. -- assignee: docs@python components: Documentation messages: 375019 nosy: docs@python, edreamleo priority: normal severity: normal status: open title: Add links to asttokens, leoAst, LibCST and Parso to ast.rst type: enhancement versions: Python 3.9 ___ Python tracker <https://bugs.python.org/issue41504> ___
[issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library
Edward K Ream added the comment: Hello all,

This is a "sideways" response to this issue. I have been dithering about whether to give you a heads up. I hope you won't mind...

I have just announced leoAst.py on python-announce-list. You can read the announcement here: https://github.com/leo-editor/leo-editor/issues/1565#issuecomment-654904747

Imo, leoAst.py solves many of the concerns mentioned in the first comment of this thread. leoAst.py is certainly a different approach.

Also imo, the TOG and TOT classes in leoAst.py plug significant holes in python's ast and tokenize modules. These classes might be candidates for python's ast module. If you're interested, I will be willing to do further work. If not, I completely understand.

As shown in the project's history, a significant amount of invention and discovery was required. The root of much of my initial confusion and difficulties was the notion that "real programmers don't use tokens". In fact, I discovered that the reverse is true. Tokens contain the ground truth. In many cases, the parse tree doesn't.

I would be interested in your reactions.

-- nosy: +edreamleo ___ Python tracker <https://bugs.python.org/issue33337> ___
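One small way to see the "tokens contain the ground truth" point is to compare what the tokenizer and the parser each retain for a line that carries a comment. This is only an illustrative sketch using the standard tokenize and ast modules; it is not code from leoAst.py, and the helper name comment_tokens is made up for the example.

```python
import ast
import io
import tokenize

SOURCE = "x = 1  # keep me\n"

def comment_tokens(source):
    """Return the text of every COMMENT token found in source."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.COMMENT]

comments = comment_tokens(SOURCE)          # the token stream keeps the comment
tree_dump = ast.dump(ast.parse(SOURCE))    # the parse tree has no node for it

print(comments)
print("keep me" in tree_dump)
```

The token list still holds the comment text, while the dumped tree does not mention it, which is why tools that must reproduce source text exactly tend to need the tokens as well as the tree.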
[issue38663] Untokenize does not round-trip ws before bs-nl
Edward K Ream added the comment: This post: https://groups.google.com/d/msg/leo-editor/DpZ2cMS03WE/5X8IDzpgEAAJ discusses unit testing. The summary states: "I've done the heavy lifting on issue 38663. Python devs should handle the details of testing and packaging." I'll leave it at that. In some ways this issue is very minor, and of almost no interest to anyone :-) Do with it as you will. The ball is in python's court. -- ___ Python tracker <https://bugs.python.org/issue38663> ___
[issue38663] Untokenize does not round-trip ws before bs-nl
Edward K Ream added the comment: This post https://groups.google.com/d/msg/leo-editor/DpZ2cMS03WE/VPqtB9lTEAAJ discusses a complete rewrite of tokenizer.untokenize. To quote from the post: I have "discovered" a spectacular replacement for Untokenizer.untokenize in python's tokenize library module. The wretched, buggy, and impossible-to-fix add_whitespace method is gone. The new code has no significant 'if' statements, and knows almost nothing about tokens! This is the way untokenize is written in The Book. The new code should put an end to a long series of issues against untokenize code in python's tokenize library module. Some closed issues were blunders arising from dumbing-down the TestRoundtrip.check_roundtrip method in test_tokenize.py. Imo, the way is now clear for proper unit testing of python's Untokenize class. -- ___ Python tracker <https://bugs.python.org/issue38663> ___
[issue38663] Untokenize does not round-trip ws before bs-nl
Edward K Ream added the comment: The original bug report used a Leo-only function, g.toUnicode. To fix this, replace:

    result = g.toUnicode(tokenize.untokenize(tokens))

by:

    result_b = tokenize.untokenize(tokens)
    result = result_b.decode('utf-8', 'strict')

-- ___ Python tracker <https://bugs.python.org/issue38663> ___
[issue38663] Untokenize does not round-trip ws before bs-nl
New submission from Edward K Ream: Tested on 3.6. tokenize.untokenize does not round-trip whitespace before backslash-newlines outside of strings:

    from io import BytesIO
    import tokenize

    # Round tripping fails on the second string.
    table = (
    r'''
    print\
        ("abc")
    ''',
    r'''
    print \
        ("abc")
    ''',
    )
    for s in table:
        tokens = list(tokenize.tokenize(
            BytesIO(s.encode('utf-8')).readline))
        result = g.toUnicode(tokenize.untokenize(tokens))
        print(result==s)

I have an important use case that would benefit from a proper untokenize. After considerable study, I have not found a proper fix for tokenize.add_whitespace.

I would be happy to work with anyone to rewrite tokenize.untokenize so that unit tests pass without fudges in TestRoundtrip.check_roundtrip.

-- messages: 355827 nosy: edreamleo priority: normal severity: normal status: open title: Untokenize does not round-trip ws before bs-nl type: behavior versions: Python 3.6 ___ Python tracker <https://bugs.python.org/issue38663> ___
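For anyone who wants to try the report without Leo installed, here is a minimal sketch of the same check with g.toUnicode replaced by a plain decode, as suggested in the follow-up comment. The helper name round_trips is hypothetical, and the result for the second string depends on the Python version in use, so no output is claimed for it.

```python
import io
import tokenize

def round_trips(source: str) -> bool:
    """Return True if tokenize.untokenize reproduces source exactly."""
    readline = io.BytesIO(source.encode("utf-8")).readline
    tokens = list(tokenize.tokenize(readline))
    # untokenize returns bytes here because the first token is ENCODING.
    result = tokenize.untokenize(tokens).decode("utf-8", "strict")
    return result == source

table = (
    'print\\\n    ("abc")\n',   # no space before the backslash-newline
    'print \\\n    ("abc")\n',  # one space before the backslash-newline
)
for s in table:
    print(repr(s), round_trips(s))
```

Running this on several interpreter versions is a quick way to see whether a given release still shows the behavior described in the report.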
[issue22616] Allow connecting AST nodes with corresponding source ranges
Edward K Ream added the comment: On Mon, Jan 14, 2019 at 5:24 AM Ivan Levkivskyi wrote: Adding endline and endcolumn to every ast node will be a big improvement. Edward -- Edward K. Ream: edream...@gmail.com Leo: http://leoeditor.com/ -- -- nosy: +edreamleo ___ Python tracker <https://bugs.python.org/issue22616> ___
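As a brief illustration of what end positions make possible: in Python 3.8 and later, ast nodes carry end_lineno and end_col_offset, and ast.get_source_segment uses them to recover a node's text. The sample source below is invented for the example.

```python
import ast

SOURCE = "result = compute(1,\n        2)\n"

tree = ast.parse(SOURCE)
call = tree.body[0].value  # the Call node on the right-hand side

# Start and end positions: lines are 1-based, columns are 0-based.
span = (call.lineno, call.col_offset, call.end_lineno, call.end_col_offset)
segment = ast.get_source_segment(SOURCE, call)

print(span)
print(repr(segment))
```

With both ends known, a tool can slice the exact text of a multi-line call out of the original source instead of guessing where the node stops.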
[issue25778] winreg.EnumValue does not truncate strings correctly
Edward K. Ream added the comment: Thank you, Steve, et al. for resolving this issue. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue25778> ___
[issue25778] winreg.EnumValue does not truncate strings correctly
Edward K. Ream added the comment: On Sat, Dec 3, 2016 at 1:37 PM, Steve Dower <rep...@bugs.python.org> wrote: Thanks, Steve and David, for your replies. Getting this issue fixed eventually will do. Glad to hear it was a mistake, and not policy ;-) EKR -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue25778> ___
[issue25778] winreg.EnumValue does not truncate strings correctly
Edward K. Ream added the comment: The last message on this thread was in January, and this item is Open. According to PEP 478, 3.5.2 final was released Sunday, June 26, 2016. How is this issue not a release blocker? Why does there appear to be no urgency in fixing this bug? This bug bites for 64-bit versions of Python 3. When it bit, it caused Leo to crash during startup. When it bit, it was reason to recommend Python 2 over Python 3. I have just released an ugly workaround in Leo. So now Leo itself can start up, but there is no guarantee that user plugins and scripts will work. Imo, no future version of Python 3 should go out the door until this bug is fixed, for sure, and for all time. If you want people to use Python 3, it can NOT have this kind of bug in it. -- nosy: +Edward.K..Ream ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue25778> ___
[issue22819] Python3.4: xml.sax.saxutils.XMLGenerator.__init__ fails with pythonw.exe
New submission from Edward K. Ream: In Python3.2 xml.sax.saxutils.XMLGenerator.__init__ succeeds if the out keyword argument is not given and sys.stdout is None, which will typically be the case when using pythonw.exe. Alas, on Python3.4, the ctor throws an exception in this case. This is a major compatibility issue, and is completely unnecessary: the ctor should work as before. An easy fix: allocate a file-like object as the out stream, or just do what is done in Python 3.2 ;-) -- components: Library (Lib) messages: 230844 nosy: Edward.K..Ream priority: normal severity: normal status: open title: Python3.4: xml.sax.saxutils.XMLGenerator.__init__ fails with pythonw.exe type: crash versions: Python 3.4 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22819 ___
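A caller-side workaround that does not depend on sys.stdout at all is to pass an explicit out stream. The sketch below assumes nothing about how the tracker issue was eventually resolved; it only shows the "allocate a file-like object" idea from the report, with a made-up helper name and element name.

```python
import io
from xml.sax.saxutils import XMLGenerator

def render_greeting() -> str:
    """Write a tiny document to an explicit in-memory text stream."""
    out = io.StringIO()
    gen = XMLGenerator(out=out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("greeting", {})
    gen.characters("hello")
    gen.endElement("greeting")
    gen.endDocument()
    return out.getvalue()

print(render_greeting())
```

Because the generator never touches sys.stdout in this form, it behaves the same under python.exe and pythonw.exe.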
[issue22616] Allow connecting AST nodes with corresponding source ranges
Edward K. Ream added the comment: I urge the Python development team to fix this and the related bugs given in the Post Script. The lack of an easy way of associating ast nodes with text ranges in the original sources is arguably the biggest hole in the Python api.

These bugs have immediate, severe, practical consequences for any tool that attempts to regularize (pep 8) or beautify Python code.

Consider the code for PythonTidy: http://lacusveris.com/PythonTidy/PythonTidy-1.23.python Every version has had bugs in this area arising from difficult workarounds to the hole in the API. The entire Comments class is a horror directly related to these issues.

Consider Aivar's workaround to these bugs: https://bitbucket.org/plas/thonny/src/8cdaa41aca7a5cc0b31618b6f1631d360c488196/src/ast_utils.py?at=default See the docstring for def fix_ast_problems. This is an absurdly difficult solution to what should be a trivial problem. It's impossible to build reliable software using such heroic hacks. The additional bugs listed below further complicate a nightmarish task.

In short, these bugs are *not* minor little nits. They are preventing the development of reliable source-code tools.

Edward K. Ream

P.S. Here are the related bugs:

http://bugs.python.org/issue10769 Allow connecting AST nodes with corresponding source ranges
http://bugs.python.org/issue21295 Python 3.4 gives wrong col_offset for Call nodes returned from ast.parse
http://bugs.python.org/issue18374 ast.parse gives wrong position (col_offset) for some BinOp-s
http://bugs.python.org/issue16806 col_offset is -1 and lineno is wrong for multiline string expressions

EKR

-- nosy: +Edward.K..Ream ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22616 ___
[issue17850] unicode_escape encoding fails for '\\Upsilon'
New submission from Edward K. Ream: On both Windows and Linux the following fails on Python 2.7:

    s = '\\Upsilon'
    unicode(s,"unicode_escape")

    UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-7: end of string in escape sequence

BTW, the six.py package uses this call. If this call doesn't work, six is broken.

-- components: Library (Lib) messages: 187852 nosy: Edward.K..Ream priority: normal severity: normal status: open title: unicode_escape encoding fails for '\\Upsilon' type: crash versions: Python 2.7 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17850 ___
[issue17850] unicode_escape encoding fails for '\\Upsilon'
Edward K. Ream added the comment: Thanks for your quick reply. If this is not a bug, why does six define six.u as unicode(s,"unicode_escape") for *all* u constants?? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17850 ___
[issue17850] unicode_escape encoding fails for '\\Upsilon'
Edward K Ream added the comment: On Fri, Apr 26, 2013 at 8:51 AM, Edward K. Ream rep...@bugs.python.org wrote:

> If this is not a bug, why does six define six.u as unicode(s,"unicode_escape") for *all* u constants??

Oops. The following works::

    s = r'\\Upsilon'
    unicode(s,"unicode_escape")

My apologies for the noise.

Edward

-- nosy: +edreamleo ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17850 ___
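The difference between the failing and the working call comes down to how many backslashes reach the codec. The thread uses Python 2's unicode(); the sketch below shows the same distinction with Python 3's bytes.decode, so the exact error text will differ from the one quoted in the report. The helper name try_decode is hypothetical.

```python
RAW = rb"\Upsilon"        # one backslash: "\U" starts an 8-hex-digit escape
ESCAPED = rb"\\Upsilon"   # two backslashes: an escaped literal backslash

def try_decode(data: bytes):
    """Decode with unicode_escape, returning None when the escape is invalid."""
    try:
        return data.decode("unicode_escape")
    except UnicodeDecodeError:
        return None

print(try_decode(RAW))      # the codec rejects the incomplete \U escape
print(try_decode(ESCAPED))  # decodes to a backslash followed by "Upsilon"
```

In other words, text handed to the unicode_escape codec needs its literal backslashes doubled, which is what the raw-string version in the reply achieves.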
[issue4531] Deprecation warnings in lib\compiler\ast.py
New submission from Edward K Ream [EMAIL PROTECTED]: Python 2.6 final on Windows XP gives following warnings with -3 option:

c:\python26\lib\compiler\ast.py:54: SyntaxWarning: tuple parameter unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:434: SyntaxWarning: tuple parameter unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:488: SyntaxWarning: tuple parameter unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:806: SyntaxWarning: tuple parameter unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:896: SyntaxWarning: tuple parameter unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:926: SyntaxWarning: tuple parameter unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:998: SyntaxWarning: tuple parameter unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:1098: SyntaxWarning: tuple parameter unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\ast.py:1173: SyntaxWarning: tuple parameter unpacking has been removed in 3.x
  def __init__(self, (left, right), lineno=None):
c:\python26\lib\compiler\pycodegen.py:903: SyntaxWarning: tuple parameter unpacking has been removed in 3.x

Edward

Edward K. Ream email: [EMAIL PROTECTED] Leo: http://webpages.charter.net/edreamleo/front.html

-- components: Library (Lib) messages: 76904 nosy: edreamleo severity: normal status: open title: Deprecation warnings in lib\compiler\ast.py type: compile error versions: Python 2.6 ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue4531 ___
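For context on what the warning is about: Python 3 no longer accepts a tuple pattern such as (left, right) in a parameter list, so code of this shape has to unpack the pair inside the function body. The class below is a hypothetical stand-in, not a class from the compiler package.

```python
class BinaryNode:
    """Hypothetical stand-in for a node class that takes a (left, right) pair."""

    def __init__(self, left_right, lineno=None):
        # Python 3 removed "def __init__(self, (left, right), ...)",
        # so the pair is unpacked explicitly in the body instead.
        left, right = left_right
        self.left = left
        self.right = right
        self.lineno = lineno

node = BinaryNode(("a", "b"), lineno=54)
print(node.left, node.right, node.lineno)
```

Callers keep passing a single two-item tuple, so the call sites do not change; only the signature and the first line of the body do.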
[issue4531] Deprecation warnings in lib\compiler\ast.py
Edward K Ream [EMAIL PROTECTED] added the comment: On Thu, Dec 4, 2008 at 12:33 PM, Brett Cannon [EMAIL PROTECTED] wrote:

> Brett Cannon [EMAIL PROTECTED] added the comment:
>
> Considering the entire compiler package is not in 3.0 it is not worth fixing this. Closing as won't fix.

Thanks for this clarification.

Edward

Edward K. Ream email: [EMAIL PROTECTED] Leo: http://webpages.charter.net/edreamleo/front.html

___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue4531 ___
[issue3590] sax.parser hangs on byte streams
New submission from Edward K Ream [EMAIL PROTECTED]: While porting Leo to Python 3.0, I found that passing any byte stream to xml.sax.parser.parse will hang the parser. My quick fix was to change:

    while buffer != "":

to:

    while buffer != "" and buffer != b"":

at line 123 of xmlreader.py

Here is the entire function:

    def parse(self, source):
        from . import saxutils
        source = saxutils.prepare_input_source(source)

        self.prepareParser(source)
        file = source.getByteStream()
        buffer = file.read(self._bufsize)
        ### while buffer != "":
        while buffer != "" and buffer != b"": ### EKR
            self.feed(buffer)
            buffer = file.read(self._bufsize)
        self.close()

For reference, here is the code in Leo that was hanging::

    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_external_ges,1)
    handler = saxContentHandler(c,inputFileName,silent,inClipboard)
    parser.setContentHandler(handler)
    parser.parse(theFile)

Looking at the test_expat_file function in test_sax.py, it appears that the essential difference between the code that hangs and the successful unit test is that Leo opens the file in 'rb' mode. (code not shown) It's doubtful that 'rb' mode is correct--from the unit test I deduce that the default 'r' mode would be better.

Anyway, it would be nice if parser.parse didn't hang on dubious streams.

HTH.

Edward

-- components: Library (Lib) messages: 71339 nosy: edreamleo severity: normal status: open title: sax.parser hangs on byte streams type: behavior versions: Python 3.0 ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3590 ___
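The hang follows from a Python 3 rule: an empty bytes object never compares equal to an empty str, so a loop that waits for read() to return "" never sees the end of a byte stream. The sketch below is a standalone illustration with a made-up function name, not the change that was made to xmlreader.py; it ends the loop with a truth test, which covers both stream types.

```python
import io

def read_chunks(stream, bufsize=4):
    """Collect chunks until read() returns an empty str or bytes object."""
    chunks = []
    buffer = stream.read(bufsize)
    # An empty str and an empty bytes object are both falsy, so a plain
    # truth test ends the loop for text streams and byte streams alike.
    while buffer:
        chunks.append(buffer)
        buffer = stream.read(bufsize)
    return chunks

print(b"" != "")  # the comparison that kept the original loop running
print(read_chunks(io.BytesIO(b"<a><b/></a>")))
print(read_chunks(io.StringIO("<a><b/></a>")))
```

The truth test avoids having to name either sentinel, at the cost of also stopping on any other falsy return value, which a well-behaved read() does not produce.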
[issue3590] sax.parser hangs on byte streams
Edward K Ream [EMAIL PROTECTED] added the comment: On Mon, Aug 18, 2008 at 10:09 AM, Benjamin Peterson [EMAIL PROTECTED] wrote:

> Benjamin Peterson [EMAIL PROTECTED] added the comment:
>
> It should probably be changed to just while buffer != b"" since it requests a byte stream.

That was my guess as well. I added the extra test so as not to remove a test that might, under some circumstance, be important.

Just to be clear, I am at present totally confused about io streams :-) Especially as used by the sax parsers. In particular, opening a file in 'r' mode, that is, passing a *non*-byte stream to parser.parse, works, while opening a file in 'rb' mode, that is, passing a *byte* stream to parser.parse, hangs.

Anyway, opening the file passed to parser.parse with 'r' mode looks like the (only) way to go when using Python 3.0. In Python 2.5, opening files passed to parser.parse in 'rb' mode works. I don't recall whether I had any reason for 'rb' mode: it may have been an historical accident, or just a lucky accident :-)

Edward

Edward K. Ream email: [EMAIL PROTECTED] Leo: http://webpages.charter.net/edreamleo/front.html

Added file: http://bugs.python.org/file11145/unnamed ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3590 ___
[issue3590] sax.parser considers XML as text rather than bytes
Edward K Ream [EMAIL PROTECTED] added the comment: On Mon, Aug 18, 2008 at 1:51 PM, Antoine Pitrou [EMAIL PROTECTED] wrote:

> Antoine Pitrou [EMAIL PROTECTED] added the comment:
>
> From the discussion on the python-3000, it looks like it would be nice if sax.parser handled both bytes and unicode streams.
>
> Edward, does your simple fix make sax.parser work entirely well with byte streams?

No. The sax.parser seems to have other problems. Here is what I *think* I know ;-)

1. A smallish .leo file (an xml file) containing a single non-ascii (utf-8) encoded character appears to have been read correctly with Python 3.0.

2. A larger .leo file fails as follows (it's possible that the duplicate error messages are a Leo problem):

Traceback (most recent call last):
Traceback (most recent call last):
  File "C:\leo.repo\leo-30\leo\core\leoFileCommands.py", line 1283, in parse_leo_file
    parser.parse(theFile) # expat does not support parseString
  File "C:\leo.repo\leo-30\leo\core\leoFileCommands.py", line 1283, in parse_leo_file
    parser.parse(theFile) # expat does not support parseString
  File "c:\python30\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "c:\python30\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "c:\python30\lib\xml\sax\xmlreader.py", line 121, in parse
    buffer = file.read(self._bufsize)
  File "c:\python30\lib\xml\sax\xmlreader.py", line 121, in parse
    buffer = file.read(self._bufsize)
  File "C:\Python30\lib\io.py", line 1670, in read
    eof = not self._read_chunk()
  File "C:\Python30\lib\io.py", line 1670, in read
    eof = not self._read_chunk()
  File "C:\Python30\lib\io.py", line 1499, in _read_chunk
    self._set_decoded_chars(self._decoder.decode(input_chunk, eof))
  File "C:\Python30\lib\io.py", line 1499, in _read_chunk
    self._set_decoded_chars(self._decoder.decode(input_chunk, eof))
  File "C:\Python30\lib\io.py", line 1236, in decode
    output = self.decoder.decode(input, final=final)
  File "C:\Python30\lib\io.py", line 1236, in decode
    output = self.decoder.decode(input, final=final)
  File "C:\Python30\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
  File "C:\Python30\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 74: character maps to <undefined>
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 74: character maps to <undefined>

The same calls to sax read the file correctly on Python 2.5.

It would be nice to have a message pinpoint the line and character offset of the problem.

My vote would be for the code to work on both kinds of input streams. This would save the users considerable confusion if sax does the (tricky) conversions automatically. Imo, now would be the most convenient time to attempt this--there is a certain freedom in having everything be partially broken :-)

Edward

Edward K. Ream email: [EMAIL PROTECTED] Leo: http://webpages.charter.net/edreamleo/front.html

Added file: http://bugs.python.org/file11147/unnamed ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3590 ___
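The traceback is consistent with UTF-8 bytes being decoded by a text-mode stream that defaulted to cp1252, a code page in which byte 0x81 has no mapping. That reading is an assumption about the failing file, not something the thread confirms. The sketch below reproduces the same class of error on invented data and shows that the exception carries the byte offset; the helper name decode_or_report is hypothetical.

```python
TEXT = "\u0141\u00f3d\u017a"   # U+0141 encodes to the bytes C5 81 in UTF-8
DATA = TEXT.encode("utf-8")

def decode_or_report(data: bytes, encoding: str):
    """Return (text, None) on success or (None, byte offset) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return None, err.start

print(decode_or_report(DATA, "utf-8"))
print(decode_or_report(DATA, "cp1252"))
```

The offset is a position in the byte data rather than a line and column, so turning it into the kind of message asked for above would still require counting newlines up to that offset.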
[issue3590] sax.parser considers XML as text rather than bytes
Edward K Ream [EMAIL PROTECTED] added the comment: On Mon, Aug 18, 2008 at 11:00 AM, Antoine Pitrou [EMAIL PROTECTED] wrote:

> Antoine Pitrou [EMAIL PROTECTED] added the comment:
>
> > Just to be clear, I am at present totally confused about io streams :-)
>
> Python 3.0 distinguishes more clearly between unicode strings (called "str" in 3.0) and byte strings (called "bytes" in 3.0). The most important point is that there is no longer any implicit conversion between the two: you must explicitly use .encode() or .decode().
>
> Files opened in binary ("rb") mode return byte strings, but files opened in text ("r") mode return unicode strings, which means you can't give a text file to a 3.0 library expecting a binary file, or vice versa.
>
> What is more worrying is that XML, until decoded, should be considered a byte stream, so sax.parser should accept binary files rather than text files. I took a look at test_sax and indeed it considers XML as text rather than bytes :-(

Thanks for these remarks. They confirm what I suspected, but was unsure of, namely that it seems strange to be passing something other than a byte stream to parser.parse.

> Bumping this as critical because it needs a decision very soon (ideally before beta3).

Thanks for taking this seriously.

Edward

P.S. I love the new unicode plans. They are going to cause some pain at first for everyone (Python team and developers), but in the long run they are going to be a big plus for Python.

EKR

Edward K. Ream email: [EMAIL PROTECTED] Leo: http://webpages.charter.net/edreamleo/front.html

Added file: http://bugs.python.org/file11148/unnamed ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3590 ___
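The str/bytes distinction described in the quoted reply can be checked in a few lines. The sketch below writes a small invented document to a temporary file and reads it back in both modes; the file contents and variable names are illustrative only.

```python
import os
import tempfile

XML_TEXT = "<note>caf\u00e9</note>"

# Write the document as UTF-8 bytes to a temporary file.
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(XML_TEXT.encode("utf-8"))   # explicit str -> bytes

with open(path, "rb") as f:
    raw = f.read()                      # binary mode yields bytes
with open(path, "r", encoding="utf-8") as f:
    text = f.read()                     # text mode yields str

os.remove(path)
print(type(raw).__name__, type(text).__name__)
print(raw.decode("utf-8") == text)      # explicit bytes -> str
```

Passing encoding="utf-8" in text mode matters here: without it the platform default is used, which is how a UTF-8 file can end up being decoded as cp1252 on Windows.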
[issue3590] sax.parser considers XML as text rather than bytes
Edward K Ream [EMAIL PROTECTED] added the comment: On Mon, Aug 18, 2008 at 4:15 PM, Antoine Pitrou [EMAIL PROTECTED] wrote:

> Antoine Pitrou [EMAIL PROTECTED] added the comment:
>
> > The same calls to sax read the file correctly on Python 2.5.
>
> What are those calls exactly?

    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_external_ges,1)
    handler = saxContentHandler(c,inputFileName,silent,inClipboard)
    parser.setContentHandler(handler)
    parser.parse(theFile)

As discussed in http://bugs.python.org/issue3590, theFile is a file opened with 'rb' attributes.

Edward

Edward K. Ream email: [EMAIL PROTECTED] Leo: http://webpages.charter.net/edreamleo/front.html

Added file: http://bugs.python.org/file11151/unnamed ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3590 ___
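For readers who want a runnable version of the calls quoted above, here is a minimal sketch against a current Python 3, where parser.parse accepts a binary file-like object. ElementCounter is a hypothetical stand-in for Leo's saxContentHandler, the setFeature call is left out, and the sample document is invented, so this shows the shape of the calls rather than Leo's actual code.

```python
import io
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

def element_names(data: bytes):
    """Parse XML held in a bytes object through a binary file-like stream."""
    parser = xml.sax.make_parser()
    handler = ElementCounter()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))  # plays the role of a file opened with 'rb'
    return handler.names

print(element_names(b"<leo_file><vnodes/><tnodes/></leo_file>"))
```

Handing the parser bytes lets expat apply the encoding declared in the XML itself, which matches the point made earlier in the thread that XML should be treated as a byte stream until it is decoded.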