Re: [Python-3000] Recursive str
2008/4/16, Michael Urman <[EMAIL PROTECTED]>:
> I'll miss this, as I suspect the case of printing a list of unicode
> strings will be fairly common. Given Unicode identifiers, even print
> locals() could hit this. But perhaps tools for printing better
> summaries of the contents of lists and dicts, or shell quoting (repr
> as is makes a passable hack for quotes and spaces, but not unicode
> characters), etc., can alleviate the pain well enough.
>

Such tools would help, but I'm not sure they are enough. Using repr() to build output strings is common practice in the Python world, so repr() is called everywhere in the Python core and in third-party applications to print objects, emit logs, etc.

For example,

>>> f = open("日本語")
Traceback (most recent call last):
  File "", line 1, in
  File "c:\ww\Python-3.0a4-orig\lib\io.py", line 212, in __new__
    return open(*args, **kwargs)
  File "c:\ww\Python-3.0a4-orig\lib\io.py", line 151, in open
    closefd)
IOError: [Errno 2] No such file or directory: '\u65e5\u672c\u8a9e'

This is an annoying error message. Or, in Python 2,

>>> f = open(u"日本語", "w")
>>> f

This repr()ed form is difficult to read. When Japanese (or Chinese) programmers see u'\u65e5\u672c\u8a9e', they'll get the strong impression that Python is not intended to be used in their country.

___
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] Recursive str
Hello. Sorry for being a bit late in the discussion - my sysadmin has problems setting up our DNS server so I could not send mail.

On Tue, Apr 15, 2008 at 06:07:46PM -0400, Terry Reedy wrote:
> import unirep
> print(*map(unirep.russian, objects))
>
> or even
>
> from unirep import rus_print
>
> rus_print(objects) # does same as above, with **kwds passed on

First, this doesn't help, because that form of print would have to be recursive if "objects" is a container that contains other objects.

Second, I am satisfied with how repr(objects) works - it calls repr() recursively and that's ok. What I was complaining about in the original post is that str(objects) calls repr() for items. This is especially problematic when I use repr() and str() semi-explicitly. For example, compare

   logging.debug("objects: %r", objects)

and

   logging.debug("objects: %s", objects)

In the first call I expect and get repr(objects), fine. But in the second case I again get repr(), and even

   logging.debug("objects: %s", str(objects))

doesn't help.

Do I understand it right that str(objects) calls repr() on items to properly quote strings? (str([1, '1']) must give "[1, '1']" as the result.) Is it the only reason?

PS. atsuo ishimoto has shown that repr() is called in tracebacks. I agree that's a problem, but that's another problem, not "recursive str".

Oleg.
-- 
Oleg Broytmann    http://phd.pp.ru/    [EMAIL PROTECTED]
Programmers don't die, they just GOSUB without RETURN.
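[Editor's illustration] The behaviour Oleg describes can be reproduced with a few lines. This is only a minimal sketch, meant to be run on a current Python 3; the variable names are made up for the example and do not come from the thread.

```python
# str() of a built-in container formats every item with repr(),
# so str(items) and repr(items) come out identical.
items = [1, '1', 'тест']

as_str = str(items)
as_repr = repr(items)

print(as_str)
print(as_str == as_repr)

# %s formatting goes through str(), so the items still appear in repr() form.
formatted = "objects: %s" % (items,)
print(formatted)
```

The quotes around '1' are what keep the integer 1 and the string '1' distinguishable inside the list.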
[Python-3000] Displaying strings containing unicode escapes at the interactive prompt (was Re: Recursive str)
atsuo ishimoto wrote: > Using repr() to build output string is common practice in Python world, > so repr() is called everywhere in Python-core and third-party applications > to print objects, emitting logs, etc.,. > > For example, > f = open("日本語") > Traceback (most recent call last): > File "", line 1, in > File "c:\ww\Python-3.0a4-orig\lib\io.py", line 212, in __new__ > return open(*args, **kwargs) > File "c:\ww\Python-3.0a4-orig\lib\io.py", line 151, in open > closefd) > IOError: [Errno 2] No such file or directory: '\u65e5\u672c\u8a9e' > > This is annoying error message. Or, in Python 2, > f = open(u"日本語", "w") f > > > This repr()ed form is difficult to read. When Japanese (or Chinise) > programmers look u'\u65e5\u672c\u8a9e', they'll have strong > impression that Python is not intended to be used in their country. This is starting to seem to me more like something to be addressed through sys.displayhook/excepthook at the interactive interpreter level than it is to be dealt with through changes to any __repr__() implementations. 
Given the following setup code:

    def replace_escapes(escaped_str):
        return escaped_str.encode('latin-1').decode('unicode_escape')

    def displayhook_unicode(expr_result):
        if expr_result is not None:
            __builtins__._ = expr_result
            print(replace_escapes(repr(expr_result)))

    from traceback import format_exception
    def excepthook_unicode(*exc_details):
        msg = ''.join(format_exception(*exc_details))
        print(replace_escapes(msg), end='')

    import sys
    sys.displayhook = displayhook_unicode
    sys.excepthook = excepthook_unicode

I get the following behaviour:

    >>> "\u65e5\u672c\u8a9e"
    '日本語'
    >>> print("\u65e5\u672c\u8a9e")
    日本語
    >>> '日本語'
    '日本語'
    >>> print('日本語')
    日本語
    >>> 日本語 = 1
    >>> 日本語
    1
    >>> dir()
    ['__builtins__', '__doc__', '__name__', '__package__', 'displayhook_unicode', 'excepthook_unicode', 'format_exception', 'replace_escapes', 'sys', '日本語']
    >>> b"\u65e5\u672c\u8a9e"
    b'\u65e5\u672c\u8a9e'
    >>> print(b"\u65e5\u672c\u8a9e")
    b'\\u65e5\\u672c\\u8a9e'
    >>> f = open("\u65e5\u672c\u8a9e")
    Traceback (most recent call last):
      File "", line 1, in
      File "/home/ncoghlan/devel/py3k/Lib/io.py", line 212, in __new__
        return open(*args, **kwargs)
      File "/home/ncoghlan/devel/py3k/Lib/io.py", line 151, in open
        closefd)
    IOError: [Errno 2] No such file or directory: '日本語'
    >>> f = open("\u65e5\u672c\u8a9e", 'w')
    >>> f.name
    '日本語'

Note that even though the bytes object representation is slightly different from that for the normal displayhook (which doubles up on the backslashes, just like the bytes printing example above), the two different representations are equivalent because \u isn't a valid escape sequence for bytes literals.

Cheers,
Nick.

-- 
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt
On Wed, Apr 16, 2008 at 10:11:13PM +1000, Nick Coghlan wrote: > atsuo ishimoto wrote: > > IOError: [Errno 2] No such file or directory: '\u65e5\u672c\u8a9e' > > This is starting to seem to me more like something to be addressed > through sys.displayhook/excepthook at the interactive interpreter level The problem manifests itself in scripts, too: Traceback (most recent call last): File "./ttt.py", line 4, in open("тест") # filename is in koi8-r encoding IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4' Oleg. -- Oleg Broytmannhttp://phd.pp.ru/[EMAIL PROTECTED] Programmers don't die, they just GOSUB without RETURN. ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt
Oleg Broytmann wrote:
> On Wed, Apr 16, 2008 at 10:11:13PM +1000, Nick Coghlan wrote:
>> atsuo ishimoto wrote:
>>> IOError: [Errno 2] No such file or directory: '\u65e5\u672c\u8a9e'
>> This is starting to seem to me more like something to be addressed
>> through sys.displayhook/excepthook at the interactive interpreter level
>
>    The problem manifests itself in scripts, too:
>
> Traceback (most recent call last):
>   File "./ttt.py", line 4, in
>     open("тест") # filename is in koi8-r encoding
> IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4'

Hmm, the io module along with sys.stdout/err may be a better way to attack the problem then. Given:

    import sys, io

    class ParseUnicodeEscapes(io.TextIOWrapper):
        def write(self, text):
            super().write(text.encode('latin-1').decode('unicode_escape'))

    args = (sys.stdout.buffer, sys.stdout.encoding, sys.stdout.errors,
            None, sys.stdout.line_buffering)

    sys.stdout = ParseUnicodeEscapes(*args)

    args = (sys.stderr.buffer, sys.stderr.encoding, sys.stderr.errors,
            None, sys.stderr.line_buffering)

    sys.stderr = ParseUnicodeEscapes(*args)

You get:

    >>> "тест"
    'тест'
    >>> open("тест")
    Traceback (most recent call last):
      File "", line 1, in
      File "/home/ncoghlan/devel/py3k/Lib/io.py", line 212, in __new__
        return open(*args, **kwargs)
      File "/home/ncoghlan/devel/py3k/Lib/io.py", line 151, in open
        closefd)
    IOError: [Errno 2] No such file or directory: 'тест'

Cheers,
Nick.

-- 
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
Re: [Python-3000] Displaying strings containing unicode escapes
On Wed, Apr 16, 2008 at 11:21:26PM +1000, Nick Coghlan wrote: > Hmm, the io module along with sys.stdout/err may be a better way to > attack the problem then. Given: > > import sys, io > > class ParseUnicodeEscapes(io.TextIOWrapper): >def write(self, text): > super().write(text.encode('latin-1').decode('unicode_escape')) > > args = (sys.stdout.buffer, sys.stdout.encoding, sys.stdout.errors, > None, sys.stdout.line_buffering) > > sys.stdout = ParseUnicodeEscapes(*args) > > args = (sys.stderr.buffer, sys.stderr.encoding, sys.stderr.errors, > None, sys.stderr.line_buffering) > > sys.stderr = ParseUnicodeEscapes(*args) > > You get: > > >>> "тест" > 'тест' > >>> open("тест") > Traceback (most recent call last): >File "", line 1, in >File "/home/ncoghlan/devel/py3k/Lib/io.py", line 212, in __new__ > return open(*args, **kwargs) >File "/home/ncoghlan/devel/py3k/Lib/io.py", line 151, in open > closefd) > IOError: [Errno 2] No such file or directory: 'тест' Very well, then. Thank you! The code should be put in a cookbook or the wiki, if not in the library. Oleg. -- Oleg Broytmannhttp://phd.pp.ru/[EMAIL PROTECTED] Programmers don't die, they just GOSUB without RETURN. ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt
2008/4/16 Oleg Broytmann <[EMAIL PROTECTED]>: >The problem manifests itself in scripts, too: > > Traceback (most recent call last): > File "./ttt.py", line 4, in > open("тест") # filename is in koi8-r encoding > IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4' Note that this can be a feature too! You might have a filename that *looks* normal but contains a character from a different language -- the \u encoding will show you the problem. $ ls *.py mc.py x.py guido-van-rossums-imac:~ guido$ python Python 2.5.2 (release25-maint:60953, Feb 25 2008, 09:38:08) [GCC 4.0.1 (Apple Inc. build 5465)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> open('mс.py') Traceback (most recent call last): File "", line 1, in IOError: [Errno 2] No such file or directory: 'm\xd1\x81.py' >>> -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] Displaying strings containing unicode escapes
On Wed, Apr 16, 2008 at 07:26:36AM -0700, Guido van Rossum wrote: > 2008/4/16 Oleg Broytmann <[EMAIL PROTECTED]>: > >The problem manifests itself in scripts, too: > > > > Traceback (most recent call last): > > File "./ttt.py", line 4, in > > open("тест") # filename is in koi8-r encoding > > IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4' > > Note that this can be a feature too! You might have a filename that > *looks* normal but contains a character from a different language -- > the \u encoding will show you the problem. > > $ ls *.py > mc.py x.py > guido-van-rossums-imac:~ guido$ python > Python 2.5.2 (release25-maint:60953, Feb 25 2008, 09:38:08) > [GCC 4.0.1 (Apple Inc. build 5465)] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> open('mс.py') > Traceback (most recent call last): > File "", line 1, in > IOError: [Errno 2] No such file or directory: 'm\xd1\x81.py' This can be a feature only for people who always have all-ascii file names and never expect non-ascii characters in the file names. Those of us who regularly use non-ascii filenames are too accustomed to that brok^H^H^H^H escaped repr's to spot a difference. Oleg. -- Oleg Broytmannhttp://phd.pp.ru/[EMAIL PROTECTED] Programmers don't die, they just GOSUB without RETURN. ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] Displaying strings containing unicode escapes
Oleg Broytmann wrote:
> On Wed, Apr 16, 2008 at 11:21:26PM +1000, Nick Coghlan wrote:
>> You get:
>>
>> >>> "тест"
>> 'тест'
>> >>> open("тест")
>> Traceback (most recent call last):
>>    File "", line 1, in
>>    File "/home/ncoghlan/devel/py3k/Lib/io.py", line 212, in __new__
>>      return open(*args, **kwargs)
>>    File "/home/ncoghlan/devel/py3k/Lib/io.py", line 151, in open
>>      closefd)
>> IOError: [Errno 2] No such file or directory: 'тест'
>
>    Very well, then. Thank you! The code should be put in a cookbook or the
> wiki, if not in the library.
>

Unfortunately, it turns out that the trick also breaks display of strings containing any other escape codes. For example:

    >>> '\n'
    '
    '
    >>> '\t'
    ' '

The unicode_escape codec is interpreting all of the escape sequences recognised in Python strings, not just the \u sequences we're interested in.

I can't see an easy way around this at the moment, but I'm still reasonably convinced that the issue of Unicode escapes for non-ASCII users is best attacked as a display problem rather than an internal representation problem.

Cheers,
Nick.

-- 
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
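[Editor's illustration] The failure mode Nick reports is easy to see in isolation. The sketch below compares the unicode_escape round trip from the thread with a hypothetical regex-based variant (decode_u_only is an invented name, not something proposed in the thread) that only rewrites \uXXXX sequences. The variant is still a heuristic: a literal backslash followed by "u0041" in the original data would be rewritten too.

```python
import re

def decode_all(text):
    # The approach from the thread: every escape sequence is interpreted,
    # including \n and \t, not just \uXXXX.
    return text.encode('latin-1').decode('unicode_escape')

# Matches a backslash, 'u', and exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def decode_u_only(text):
    # Hypothetical alternative: replace only \uXXXX, leave other escapes alone.
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

# A repr()-style string; the raw literal keeps the backslashes as characters.
escaped = r"'\u65e5\u672c\u8a9e\n'"

print(repr(decode_all(escaped)))     # the \n has become a real newline
print(repr(decode_u_only(escaped)))  # the \n is still backslash + n
```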
Re: [Python-3000] Displaying strings containing unicode escapes
I just had a shower, and I think it's cleared my thoughts a bit. :-) Clearly this is an important problem to those in countries where ASCII doesn't cut it. And just like in Python 3000 we're using UTF-8 as the default source encoding and allowing Unicode letters in identifiers, I think we should bite the bullet and allow repr() of a string to pass through all characters that the Unicode standard considers printable. For those of us with less capable IO devices, setting the error flag for stdout and stderr to backslashreplace is probably the best solution (and it solves more problems than just repr()). I will have another look at Atsuo's patch. I do think we should use some kind of Unicode-standard-endorsed definition of "printable" (as long as it excludes all ASCII escapes), since there are plenty of undefined code points that even Japanese people would probably prefer to see rendered as \u rather than completely invisible. I'm also not sure what people would want to happen for surrogate pairs. (OTOH an unpaired surrogate should be rendered as \u.) I expect that this will require some more research and agreement. Perhaps someone can produce a draft PEP and attempt to sort out the details of specification and implementation? It would also be nice if it could be friendly to Jython, IronPython and PyPy. -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
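[Editor's illustration] The two mechanisms mentioned in this message can be tried directly on a current Python 3: repr() passing printable characters through, and the backslashreplace error handler escaping only what a given encoding cannot represent. This is a minimal sketch using built-ins only; it says nothing about how any particular patch implemented the change.

```python
text = '日本語'

# repr() keeps printable non-ASCII characters; ascii() escapes them all.
print(repr(text))
print(ascii(text))

# backslashreplace escapes only the characters the target encoding lacks.
ascii_bytes = text.encode('ascii', errors='backslashreplace')
latin1_bytes = 'naïve 日本語'.encode('latin-1', errors='backslashreplace')
print(ascii_bytes)
print(latin1_bytes)

# str.isprintable() is False for control characters and lone surrogates,
# which repr() keeps escaping.
checks = ('日'.isprintable(), '\n'.isprintable(), '\ud800'.isprintable())
print(checks)
```

For an output stream, the same handler can be selected per stream, for example through the errors argument of io.TextIOWrapper.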
Re: [Python-3000] Recursive str
On Tue, Apr 15, 2008 at 10:30 PM, Greg Ewing <[EMAIL PROTECTED]> wrote: > Terry Reedy wrote: > > import unirep > > print(*map(unirep.russian, objects)) > > That's okay if the objects are strings, but what about > non-string objects that contain strings? > > We'd need another protocol, such as __unirep__. Or have str.__repr__() respect per-thread settings, the way decimal arithmetic does. Default settings would be in force most of the time; the interactive prompt would apply the user's settings when repr-ing a result. This approach solves the nested-strings problem quite nicely. But it does not catch error/warning/log messages when they are generated, unless the program does *everything* under custom repr settings (dangerous). There really are two use cases here: a human-readable repr for error/warning/log messages; and a machine-readable, always-the-same, ASCII-only repr. Users want to be able to tweak the former. -j ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] Recursive str
[Jason Orendorff] > Or have str.__repr__() respect per-thread settings, the way decimal > arithmetic does. I don't think that's a very compelling example. I have serious issues with having global or per-thread state that can change the outcome of repr(); it would make it impossible to write correct code involving repr() because you can never know what it will do the next time. -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] Recursive str
On Wed, Apr 16, 2008 at 11:05 AM, Jason Orendorff <[EMAIL PROTECTED]> wrote: > There really are two use cases here: a human-readable repr for > error/warning/log messages; and a machine-readable, always-the-same, > ASCII-only repr. Users want to be able to tweak the former. Does machine-readable require ASCII-only, and does repr() guarantee this? It sounded like the worries about not escaping Unicode characters were related to it not visually distinguishing between different encodings for the same visual results (as their machine-readable Unicode strings, or encoded UTF-8 bytestreams, would already differ). -- Michael Urman ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt
2008/4/16, Nick Coghlan <[EMAIL PROTECTED]>:
> Oleg Broytmann wrote:
> > On Wed, Apr 16, 2008 at 10:11:13PM +1000, Nick Coghlan wrote:
> >> atsuo ishimoto wrote:
> >>> IOError: [Errno 2] No such file or directory: '\u65e5\u672c\u8a9e'
> >> This is starting to seem to me more like something to be addressed
> >> through sys.displayhook/excepthook at the interactive interpreter level
> >
> >    The problem manifests itself in scripts, too:
> >
> > Traceback (most recent call last):
> >   File "./ttt.py", line 4, in
> >     open("тест") # filename is in koi8-r encoding
> > IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4'
>
>
> Hmm, the io module along with sys.stdout/err may be a better way to
> attack the problem then. Given:
>
> import sys, io
>
> class ParseUnicodeEscapes(io.TextIOWrapper):
>     def write(self, text):
>         super().write(text.encode('latin-1').decode('unicode_escape'))
>
> args = (sys.stdout.buffer, sys.stdout.encoding, sys.stdout.errors,
>         None, sys.stdout.line_buffering)
>
> sys.stdout = ParseUnicodeEscapes(*args)
>
> args = (sys.stderr.buffer, sys.stderr.encoding, sys.stderr.errors,
>         None, sys.stderr.line_buffering)
>
> sys.stderr = ParseUnicodeEscapes(*args)
>
> You get:
>
> >>> "тест"
> 'тест'
> >>> open("тест")
>

I got:

>>> print("あ")
Traceback (most recent call last):
  File "", line 1, in
  File "", line 3, in write
UnicodeEncodeError: 'latin-1' codec can't encode character 'あ' in position 0: ordinal not in range(256)
>>> print('\\'+'u0041')
A

Your hack doesn't work. The displayhook hack doesn't work either.

Question: Are you happy if you are forced to live with these hacks forever? If not, why do you think I'll accept your suggestion?
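[Editor's illustration] Both failures shown above can be reproduced without replacing sys.stdout. The sketch below applies the same encode/decode step to plain strings; replace_escapes mirrors the helper name used earlier in the thread, and the other names are illustrative.

```python
def replace_escapes(text):
    # Same transformation as the write() method of the quoted stream wrapper.
    return text.encode('latin-1').decode('unicode_escape')

# 1. Text that already holds characters outside Latin-1 cannot be encoded.
try:
    replace_escapes('あ')
    failed = False
except UnicodeEncodeError:
    failed = True
print(failed)

# 2. A literal backslash followed by "u0041" is data, not an escape,
#    yet it is silently rewritten.
literal = '\\' + 'u0041'
rewritten = replace_escapes(literal)
print(literal, '->', rewritten)
```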
Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt
2008/4/16, Guido van Rossum <[EMAIL PROTECTED]>: > Note that this can be a feature too! You might have a filename that > *looks* normal but contains a character from a different language -- > the \u encoding will show you the problem. You won't call it a feature, if your *normal* encoding was koi8-r. ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt
I changed my mind already. :-) See my post of this morning in another thread. On Wed, Apr 16, 2008 at 4:09 PM, atsuo ishimoto <[EMAIL PROTECTED]> wrote: > 2008/4/16, Guido van Rossum <[EMAIL PROTECTED]>: > > > Note that this can be a feature too! You might have a filename that > > *looks* normal but contains a character from a different language -- > > the \u encoding will show you the problem. > > You won't call it a feature, if your *normal* encoding was koi8-r. > -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt
2008/4/17, Guido van Rossum <[EMAIL PROTECTED]>: > I changed my mind already. :-) See my post of this morning in another thread. Ah, I missed the mail! Thank you. ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] Displaying strings containing unicode escapes
I've reordered Guido's words. Guido van Rossum writes: > For those of us with less capable IO devices, setting the error flag > for stdout and stderr to backslashreplace is probably the best > solution (and it solves more problems than just repr()). True. But it doesn't solve the ambiguity problem on capable displays. > And just like in Python 3000 we're using UTF-8 as the default > source encoding and allowing Unicode letters in identifiers, I > think we should bite the bullet and allow repr() of a string to > pass through all characters that the Unicode standard considers > printable. The problem is that this doesn't display the representation of strings and identifier names in an unambiguous way. "AKMOT" could be all-ASCII, it could be all-Cyrillic, or it could be a mixture of ASCII, Cyrillic, and Greek. Odds are quite good that there are other scripts that could be mixed in, too. This kind of mixing happens all the time in Japanese, where people mix half-width and full-width ASCII with abandon (especially when altering digits in dates). I could easily see a Russian using Cyrillic 'A' to uppercase an ASCII 'a' in the same way. How about choosing a standard Python repertoire (based on the Unicode standard, of course) of which characters get a graphic repr and which ones get \u-escaped, and have a post-hook for repr which gets passed the string repr proposes to print out? This hook would always be identity in Python-distributed stuff, of course, but on the consenting adults principle applications and modules outside of the stdlib could use it. Would that be acceptable? The standard repertoire would grandfather ASCII, I suppose, because for the foreseeable future most identifiers are going to be ASCII, and all Python implementations will contain a lot of ASCII identifiers and strings indefinitely. 
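[Editor's illustration] The "AKMOT" ambiguity can be made concrete with the unicodedata module. In this sketch scripts_of is a hypothetical helper that uses the first word of each character's Unicode name as a crude stand-in for its script; that shortcut is an assumption made for the example, not a real script lookup.

```python
import unicodedata

latin = 'AKMOT'
cyrillic = '\u0410\u041a\u041c\u041e\u0422'   # renders much like "AKMOT"
mixed = 'A\u041aMOT'                          # one Cyrillic letter among Latin ones

def scripts_of(text):
    # Crude heuristic: the first word of the Unicode character name.
    return {unicodedata.name(ch).split()[0] for ch in text}

print(latin == cyrillic)   # the strings look alike but compare unequal
print(ascii(cyrillic))     # an escaped form makes the difference visible
print(scripts_of(latin), scripts_of(cyrillic), scripts_of(mixed))
```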
[Python-3000] end scope of iteration variables after loop
hello all,

A few times in practice I have been tripped up by how Python keeps variables in scope after a loop--and it wasn't immediately obvious what the problem was. I think it is one of the ugliest and least intuitive features, and hope some others agree that it should be changed in py3k.

>>> for a in range(11): pass
...
>>> print(a)
10

Thanks,
Nicholas
Re: [Python-3000] Recursive str
Guido van Rossum wrote: > The more I think about this, the more I believe that repr() should > *not* be changed, and that instead we should give people who like to > see '日本語' instead of '\u1234\u5678\u9abc' other tools to help > themselves. This seems to be a rather ASCII-centric way of thinking about things, though, which I thought py3k was trying to get away from, with unicode being the one and only string type. Maybe it really is the only practical option, but I can understand non-ASCII speakers feeling disappointed. -- Greg ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] sizeof(size_t) < sizeof(long)
Martin v. Löwis wrote: > 3.6 byte > addressable unit of data storage large enough to hold any > member of the basic character set of the execution > environment Blarg. Well, I think the wording of that part of the standard is braindamaged. The word "byte" already has a pre-existing meaning outside of C, and the C standard shouldn't be redefining it for its own purposes. This is like a financial document that defines "dollar" as "the unit of currency in use in the country concerned". Thoroughly confusing and unnecessary. Particularly since they seem to just be defining "byte" to mean the same thing as "char". Why not just use the term "char" in the first place? -- Greg ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] end scope of iteration variables after loop
Previous discussion at http://mail.python.org/pipermail/python-dev/2005-September/056677.html

I don't agree with the author that

>>> i = 3
>>> for i in range(11): pass
...
>>> i
10

is much less confusing than i returning 3. Furthermore, his C example makes it obvious that "i" will be available in the scope after the loop. There's no way to know now, but I think mistakes would be less frequent.

Additionally, what are others' opinions about this "pseudo-namespace" (i.e. scoping) being slow? Admittedly, I don't know much about the current parser's implementation, but it doesn't seem like scoping necessitates slow parsing -- considering it's done in other languages, and python functions have reasonable scope.

>>> def do_nothing(i): i = 3
...
>>> do_nothing(1)
>>> i
10

Nicholas

On Wed, Apr 16, 2008 at 5:52 PM, Nicholas T <[EMAIL PROTECTED]> wrote:
> hello all,
>
>    A few times in practice I have been tripped up by how Python keeps
> variables in scope after a loop--and it wasn't immediately obvious what the
> problem was. I think it is one of the ugliest and non-intuitive features,
> and hope some others agree that it should be changed in py3k.
>
> >>> for a in range(11): pass
> ...
> >>> print(a)
> 10
>
> Thanks,
> Nicholas
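[Editor's illustration] The scoping behaviour under discussion, written out as a script. The comprehension at the end is an addition for contrast: in Python 3 a comprehension's iteration variable lives in its own scope, while a for statement rebinds the name in the enclosing scope. Variable names are illustrative.

```python
i = 3
for i in range(11):
    pass
after_loop = i          # the for statement rebound i in the enclosing scope

def do_nothing(i):
    i = 3               # rebinds only the function's local i

do_nothing(1)
after_call = i          # the module-level i is untouched by the call

j = 3
squares = [j * j for j in range(11)]
after_comprehension = j # the comprehension's j did not leak out

print(after_loop, after_call, after_comprehension)
```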
Re: [Python-3000] Recursive str
Oleg Broytmann wrote: > Do I understand it right that str(objects) calls repr() on items to > properly quote strings? (str([1, '1']) must give "[1, '1']" as the result). > Is it the only reason? In the case of strings, yes. More generally, there can be any kind of object in the list, and repr(x) is more likely to give an unambiguous idea of what x is than str(x) when it's embedded in a comma- separated list. Python has no way of guessing the most appropriate way to display your list of objects when you use str(), so it doesn't try. You have to tell it by writing code to do what you want. -- Greg ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
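[Editor's illustration] "Writing code to do what you want" might look like the sketch below. deep_str is a hypothetical helper, not anything proposed in the thread; it applies str() recursively to the items of lists, tuples and dicts. Its output also shows the ambiguity Greg mentions: once the quotes are gone, 1 and '1' look the same.

```python
def deep_str(obj):
    """Hypothetical helper: like str(), but uses str() for nested items too."""
    if isinstance(obj, list):
        return '[' + ', '.join(deep_str(x) for x in obj) + ']'
    if isinstance(obj, tuple):
        return '(' + ', '.join(deep_str(x) for x in obj) + ')'
    if isinstance(obj, dict):
        pairs = (deep_str(k) + ': ' + deep_str(v) for k, v in obj.items())
        return '{' + ', '.join(pairs) + '}'
    return str(obj)

objects = [1, '1', ['тест', ('a', 2)]]

print(str(objects))       # items shown with repr(): quoted strings
print(deep_str(objects))  # items shown with str(): no quotes, 1 and '1' look alike
```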
Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt
Oleg Broytmann wrote:
> Traceback (most recent call last):
>   File "./ttt.py", line 4, in
>     open("тест") # filename is in koi8-r encoding
> IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4'

In that particular case, I'd say the IOError constructor is doing the wrong thing -- it should be using something like

   "No such file or directory: '%s'" % filename

instead of

   "No such file or directory: %r" % filename

i.e. %r shouldn't be used as a quick and dirty way to get a string quoted.

-- 
Greg
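[Editor's illustration] The difference between the two formats, tried on a current Python 3. Since repr() there already keeps printable characters, the sketch uses %a (the ascii() conversion) to stand in for the escaped output seen in the thread. It also shows the trade-off with '%s': quotes and newlines inside the value are not escaped.

```python
filename = 'тест'

with_s = "No such file or directory: '%s'" % filename
with_a = "No such file or directory: %a" % filename   # ascii()-style escaping
print(with_s)
print(with_a)

# The trade-off: manual quoting does not escape anything inside the value.
tricky = "it's\n"
manual = "'%s'" % tricky   # unbalanced quotes and a raw newline
quoted = "%r" % tricky     # repr() picks safe quotes and escapes the newline
print(repr(manual))
print(quoted)
```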
Re: [Python-3000] Displaying strings containing unicode escapes
Nick Coghlan wrote: > Unfortunately, it turns out that the trick also breaks display of > strings containing any other escape codes. There's also the worry that it could trigger falsely on something that happened to look like \u but didn't originate from the repr of a unicode char. > I'm still > reasonably convinced that the issue of Unicode escapes for non-ASCII > users is best attacked as a display problem It can only ever be a heuristic, though, not an exact solution, since there isn't enough information left by the time it's a string to undo the escaping correctly in all cases. I'm currently thinking there are too many use cases overloaded onto repr() at the moment. -- Greg ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] sizeof(size_t) < sizeof(long)
David Cournapeau wrote: > They are totally different concepts: byte is not a (C) type, but a unit, > the one returned by the sizeof operator. If a word is needed for this concept, then invent a new one, e.g. "size unit", rather than reusing "byte", which everyone already understands as meaning 8 bits. > C impose that sizeof(unsigned type) == sizeof(signed type) for any type, > so if one byte is one char, unsigned char would be a byte too, and so > unsigned char and char would be the same, which is obviously wrong. No, "char" and "unsigned char" can still be different types. You just need to say that sizeof(char) == sizeof(unsigned char) == 1, and leave bytes out of the discussion altogether. -- Greg ___ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
Re: [Python-3000] sizeof(size_t) < sizeof(long)
Greg Ewing wrote:
>
> Blarg. Well, I think the wording of that part of the
> standard is braindamaged. The word "byte" already has
> a pre-existing meaning outside of C, and the C standard
> shouldn't be redefining it for its own purposes.
>
> This is like a financial document that defines "dollar"
> as "the unit of currency in use in the country concerned".
> Thoroughly confusing and unnecessary.
>
> Particularly since they seem to just be defining "byte"
> to mean the same thing as "char". Why not just use the
> term "char" in the first place?
>

They are totally different concepts: byte is not a (C) type, but a unit,
the one returned by the sizeof operator. One char occupies one byte of
memory, and in memory, they are the same, but conceptually, they are
totally different, from the C point of view at least. For example, C
imposes that sizeof(unsigned type) == sizeof(signed type) for any type,
so if one byte is one char, unsigned char would be a byte too, and so
unsigned char and char would be the same, which is obviously wrong.

cheers,

David
Re: [Python-3000] sizeof(size_t) < sizeof(long)
Greg Ewing wrote:
>
> If a word is needed for this concept, then invent a new
> one, e.g. "size unit", rather than reusing "byte", which
> everyone already understands as meaning 8 bits.
>

Maybe everyone understands it as 8 bits, but it has always been wrong.
Byte is a unit of storage, which often contains 8 bits, but not always.
This definition of a byte as a unit of storage certainly predates the
convention that one byte = 8 bits; even if it always contained 8 bits,
it would still be wrong to say that one byte is 8 bits, BTW: the notion
of a byte (a unit of storage) and its actual size are totally different
concepts.

>
> No, "char" and "unsigned char" can still be different types.
> You just need to say that sizeof(char) == sizeof(unsigned char) == 1,
> and leave bytes out of the discussion altogether.
>

I was merely answering the question "why not use char in the first
place": because they are totally different concepts. If you assume char
and byte are the same thing because sizeof(char) == 1 byte, then you
should assume that unsigned char is the same as a byte, and thus that
unsigned char and char are the same. This was a proof by contradiction :)

cheers,

David
Re: [Python-3000] sizeof(size_t) < sizeof(long)
On Apr 16, 2008, at 11:00 PM, Greg Ewing wrote:
> If a word is needed for this concept, then invent a new
> one, e.g. "size unit", rather than reusing "byte", which
> everyone already understands as meaning 8 bits.

Nope. Everyone understands "octet" to be 8 bits. Bytes being exactly
8 bits is itself the redefinition! In the not-too-distant past, some
hardware had 9-bit bytes.

Common Lisp also uses the term "byte" to mean an arbitrary (specified)
number of bits. E.g.
http://www.lisp.org/HyperSpec/Body/typ_unsigned-byte.html

See also http://dictionary.die.net/byte

James
Re: [Python-3000] sizeof(size_t) < sizeof(long)
David Cournapeau wrote:
> Maybe everyone understands it as 8 bits, but it has always been wrong.

It may not be officially written down anywhere, but
almost everyone in the world understands a byte to mean
8 bits. When you go into a computer store and ask for
256MB of RAM, you don't expect to be asked "What size
bytes would that be, then, sir?"

So it's a de facto standard, and one that works perfectly
well. Going against it is both futile and unnecessary,
as far as I can see.

-- 
Greg
Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt
On Wed, Apr 16, 2008 at 6:53 PM, Greg Ewing <[EMAIL PROTECTED]> wrote:
   ...
> > open("тест") # filename is in koi8-r encoding
> > IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4'
>
> In that particular case, I'd say the IOError constructor
> is doing the wrong thing -- it should be using something
> like
>
>    "No such file or directory: '%s'" % filename
>
> instead of
>
>    "No such file or directory: %r" % filename
>
> i.e. %r shouldn't be used as a quick and dirty way to
> get a string quoted.

I disagree: I always recommend using %r, NOT '%s', to display (in an
error message, log entry, etc.) a string that may be in error, because
the cause of the error can often be that the string mistakenly
contains otherwise-invisible characters. %r will show them clearly
(as escape sequences), while %s could hide them and lead anybody but
the most experienced developer to a long and frustrating debugging
session.

Alex
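A small sketch of the invisible-character case described above. The filenames and message text are hypothetical; the stray carriage return is the kind of thing that sneaks in when a name is read from a file with Windows line endings.

```python
# Hypothetical filenames: the second carries a stray carriage return.
intended = "data.txt"
actual = "data.txt\r"

# With '%s' the stray character goes to the terminal raw and is easy to miss.
shown_with_s = "No such file: '%s'" % actual

# With '%r' it becomes a visible escape sequence.
shown_with_r = "No such file: %r" % actual

print(shown_with_r)  # prints: No such file: 'data.txt\r'
```

The %s message contains the raw control character, while the %r message spells it out as a backslash followed by r.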
Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt
Alex Martelli wrote:
> I disagree: I always recommend using %r to display (in an error
> message, log entry, etc), a string that may be in error,

For debugging messages, yes, but not output produced
in the normal course of operation. And "File Not Found"
I consider to be in the latter category -- the user
typed in the wrong file name, but it's still a string,
and should be displayed to him as such.

If it's not a string, the program will most likely fall
over with a TypeError trying to open the file before
it gets as far as constructing an IOError.

-- 
Greg
Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt
On Wed, Apr 16, 2008 at 10:20 PM, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Alex Martelli wrote:
> > I disagree: I always recommend using %r to display (in an error
> > message, log entry, etc), a string that may be in error,
>
> For debugging messages, yes, but not output produced
> in the normal course of operation. And "File Not Found"
> I consider to be in the latter category -- the user
> typed in the wrong file name, but it's still a string,
> and should be displayed to him as such.

I respectfully disagree. Control characters and such in the string
should *definitely* be escaped.

Regarding printable characters outside the ASCII range, see my post in
another thread (which somehow nearly everybody appears to have
missed); in Py3k I propose to pass printable Unicode characters
unchanged through repr(). stdout/stderr will set their errors
attribute to backslashreplace so that if their encoding is ASCII or
some such, out-of-range characters will be printed as \u rather than
raising an exception during printing. But as I said, please follow up
to my other post.

Another reason to use %r is that if someone manages to include \n in a
filename, with %s the log message might be spread across two lines,
possibly confusing log parsers and even providing ways to hide illegal
activities from log scanners.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
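Both points above can be tried out in a few lines. This is only a sketch: the in-memory stream stands in for an ASCII-only stdout/stderr configured with backslashreplace, and the filename and log text are hypothetical.

```python
import io

# An ASCII-only text stream using the backslashreplace error handler.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii", errors="backslashreplace")
stream.write("日本語")
stream.flush()
written = raw.getvalue()
print(written)  # escapes are written instead of raising UnicodeEncodeError

# Hypothetical hostile filename containing a newline.
filename = "x\nINFO all checks passed"
lines_with_s = ("ERROR cannot open %s" % filename).splitlines()
lines_with_r = ("ERROR cannot open %r" % filename).splitlines()
print(len(lines_with_s), len(lines_with_r))  # prints: 2 1
```

With %s the forged second line would look like a separate log record to a line-oriented parser, while %r keeps the whole message on one line.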
Re: [Python-3000] sizeof(size_t) < sizeof(long)
On Wed, Apr 16, 2008 at 10:32 PM, Greg Ewing <[EMAIL PROTECTED]> wrote:
> David Cournapeau wrote:
>
> > Maybe everyone understands it as 8 bits, but it has always been wrong.
>
> It may not be officially written down anywhere, but
> almost everyone in the world understands a byte to mean
> 8 bits. When you go into a computer store and ask for
> 256MB of RAM, you don't expect to be asked "What size
> bytes would that be, then, sir?"
>
> So it's a de facto standard, and one that works perfectly
> well. Going against it is both futile and unnecessary,
> as far as I can see.

Sure, *now*, but C inherited its definition from a day when it
wasn't so clear cut. It may be obsolete today, but good luck getting
them to change the standard.

-- 
Adam Olsen, aka Rhamphoryncus