Re: [Python-3000] Recursive str

2008-04-16 Thread atsuo ishimoto
2008/4/16, Michael Urman <[EMAIL PROTECTED]>:
> I'll miss this, as I suspect the case of printing a list of unicode
>  strings will be fairly common. Given Unicode identifiers, even print
>  locals() could hit this. But perhaps tools for printing better
>  summaries of the contents of lists and dicts, or shell quoting (repr
>  as is makes a passable hack for quotes and spaces, but not unicode
>  characters), etc., can alleviate the pain well enough.
>
Such tools would help, but I'm not sure they would be enough.
Using repr() to build output strings is common practice in the Python world,
so repr() is called everywhere in the Python core and in third-party
applications to print objects, emit logs, and so on.

For example,

>>> f = open("日本語")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\ww\Python-3.0a4-orig\lib\io.py", line 212, in __new__
    return open(*args, **kwargs)
  File "c:\ww\Python-3.0a4-orig\lib\io.py", line 151, in open
    closefd)
IOError: [Errno 2] No such file or directory: '\u65e5\u672c\u8a9e'

This is an annoying error message. Or, in Python 2,

>>> f = open(u"日本語", "w")
>>> f


This repr()ed form is difficult to read. When Japanese (or Chinese)
programmers see u'\u65e5\u672c\u8a9e', they get the strong
impression that Python is not intended to be used in their country.
___
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: 
http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com


Re: [Python-3000] Recursive str

2008-04-16 Thread Oleg Broytmann
Hello. Sorry for being a bit late in the discussion - my sysadmin has
problems setting up our DNS server so I could not send mail.

On Tue, Apr 15, 2008 at 06:07:46PM -0400, Terry Reedy wrote:
> import unirep
> print(*map(unirep.russian, objects))
> 
> or even
> 
> from unirep import rus_print
> 
> rus_print(objects) # does same as above, with **kwds passed on

   First, this doesn't help anything because that form of print must be
recursive if "objects" is a container that contains other objects.

   Second, I am satisfied with how repr(objects) works - it calls repr()
recursively and that's ok. What I was complaining about in the original post is
that str(objects) calls repr() for items. This is especially problematic
when I use repr() and str() semi-explicitly. For example, compare

logging.debug("objects: %r", objects)

   and

logging.debug("objects: %s", objects)

   In the first call I expect and get repr(objects), fine. But in the
second case I again get repr(), and even

logging.debug("objects: %s", str(objects))

   doesn't help.

   Do I understand it right that str(objects) calls repr() on items to
properly quote strings? (str([1, '1']) must give "[1, '1']" as the result).
Is it the only reason?
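
To make the behaviour under discussion concrete, here is a minimal sketch. The Name class is hypothetical and exists only so that str() and repr() differ visibly; the point is that str() of a list formats every item with repr(), so "%s" and "%r" give the same text for a container.

```python
class Name:
    """Hypothetical class whose str() and repr() differ visibly."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value

    def __repr__(self):
        return "Name(%r)" % self.value


objects = [1, '1', Name('x')]

# str() of the list still uses repr() for each item.
as_str = str(objects)
as_repr = repr(objects)

# "%s" on a container therefore gives the same text as "%r".
as_format = "objects: %s" % (objects,)
```

The quotes around '1' are what keep the integer and the string apart in the output.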

PS. atsuo ishimoto has shown that repr() is called in tracebacks. I agree
that's a problem, but that's another problem, not "recursive str".

Oleg.
-- 
 Oleg Broytmann            http://phd.pp.ru/            [EMAIL PROTECTED]
   Programmers don't die, they just GOSUB without RETURN.


[Python-3000] Displaying strings containing unicode escapes at the interactive prompt (was Re: Recursive str)

2008-04-16 Thread Nick Coghlan
atsuo ishimoto wrote:
> Using repr() to build output strings is common practice in the Python world,
> so repr() is called everywhere in the Python core and in third-party
> applications to print objects, emit logs, and so on.
> 
> For example,
> 
> >>> f = open("日本語")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "c:\ww\Python-3.0a4-orig\lib\io.py", line 212, in __new__
>     return open(*args, **kwargs)
>   File "c:\ww\Python-3.0a4-orig\lib\io.py", line 151, in open
>     closefd)
> IOError: [Errno 2] No such file or directory: '\u65e5\u672c\u8a9e'
> 
> This is an annoying error message. Or, in Python 2,
> 
> >>> f = open(u"日本語", "w")
> >>> f
> 
> 
> This repr()ed form is difficult to read. When Japanese (or Chinese)
> programmers see u'\u65e5\u672c\u8a9e', they get the strong
> impression that Python is not intended to be used in their country.

This is starting to seem to me more like something to be addressed 
through sys.displayhook/excepthook at the interactive interpreter level 
than it is to be dealt with through changes to any __repr__() 
implementations.

Given the following setup code:

def replace_escapes(escaped_str):
    # Re-interpret backslash escapes in text that has already been repr()ed
    return escaped_str.encode('latin-1').decode('unicode_escape')

def displayhook_unicode(expr_result):
    if expr_result is not None:
        # Keep the usual "_" shortcut for the last result
        __builtins__._ = expr_result
        print(replace_escapes(repr(expr_result)))

from traceback import format_exception
def excepthook_unicode(*exc_details):
    msg = ''.join(format_exception(*exc_details))
    print(replace_escapes(msg), end='')

import sys
sys.displayhook = displayhook_unicode
sys.excepthook = excepthook_unicode

I get the following behaviour:

 >>> "\u65e5\u672c\u8a9e"
'日本語'
 >>> print("\u65e5\u672c\u8a9e")
日本語
 >>> '日本語'
'日本語'
 >>> print('日本語')
日本語
 >>> 日本語 = 1
 >>> 日本語
1
 >>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 
'displayhook_unicode', 'excepthook_unicode', 'format_exception', 
'replace_escapes', 'sys', '日本語']
 >>> b"\u65e5\u672c\u8a9e"
b'\u65e5\u672c\u8a9e'
 >>> print(b"\u65e5\u672c\u8a9e")
b'\\u65e5\\u672c\\u8a9e'
 >>> f = open("\u65e5\u672c\u8a9e")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ncoghlan/devel/py3k/Lib/io.py", line 212, in __new__
    return open(*args, **kwargs)
  File "/home/ncoghlan/devel/py3k/Lib/io.py", line 151, in open
    closefd)
IOError: [Errno 2] No such file or directory: '日本語'
 >>> f = open("\u65e5\u672c\u8a9e", 'w')
 >>> f.name
'日本語'

Note that even though the bytes object representation is slightly 
different from that for the normal displayhook (which doubles up on the 
backslashes, just like the bytes printing example above), the two 
different representations are equivalent because \u isn't a valid escape 
sequence for bytes literals.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt

2008-04-16 Thread Oleg Broytmann
On Wed, Apr 16, 2008 at 10:11:13PM +1000, Nick Coghlan wrote:
> atsuo ishimoto wrote:
> > IOError: [Errno 2] No such file or directory: '\u65e5\u672c\u8a9e'
> 
> This is starting to seem to me more like something to be addressed 
> through sys.displayhook/excepthook at the interactive interpreter level 

   The problem manifests itself in scripts, too:

Traceback (most recent call last):
  File "./ttt.py", line 4, in <module>
    open("тест") # filename is in koi8-r encoding
IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4'

Oleg.
-- 
 Oleg Broytmann            http://phd.pp.ru/            [EMAIL PROTECTED]
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt

2008-04-16 Thread Nick Coghlan
Oleg Broytmann wrote:
> On Wed, Apr 16, 2008 at 10:11:13PM +1000, Nick Coghlan wrote:
>> atsuo ishimoto wrote:
>>> IOError: [Errno 2] No such file or directory: '\u65e5\u672c\u8a9e'
>> This is starting to seem to me more like something to be addressed 
>> through sys.displayhook/excepthook at the interactive interpreter level 
> 
>    The problem manifests itself in scripts, too:
> 
> Traceback (most recent call last):
>   File "./ttt.py", line 4, in <module>
>     open("тест") # filename is in koi8-r encoding
> IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4'

Hmm, the io module along with sys.stdout/err may be a better way to 
attack the problem then. Given:

import sys, io

class ParseUnicodeEscapes(io.TextIOWrapper):
    def write(self, text):
        super().write(text.encode('latin-1').decode('unicode_escape'))

args = (sys.stdout.buffer, sys.stdout.encoding, sys.stdout.errors,
 None, sys.stdout.line_buffering)

sys.stdout = ParseUnicodeEscapes(*args)

args = (sys.stderr.buffer, sys.stderr.encoding, sys.stderr.errors,
 None, sys.stderr.line_buffering)

sys.stderr = ParseUnicodeEscapes(*args)

You get:

 >>> "тест"
'тест'
 >>> open("тест")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ncoghlan/devel/py3k/Lib/io.py", line 212, in __new__
    return open(*args, **kwargs)
  File "/home/ncoghlan/devel/py3k/Lib/io.py", line 151, in open
    closefd)
IOError: [Errno 2] No such file or directory: 'тест'

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-3000] Displaying strings containing unicode escapes

2008-04-16 Thread Oleg Broytmann
On Wed, Apr 16, 2008 at 11:21:26PM +1000, Nick Coghlan wrote:
> Hmm, the io module along with sys.stdout/err may be a better way to 
> attack the problem then. Given:
> 
> import sys, io
> 
> class ParseUnicodeEscapes(io.TextIOWrapper):
>     def write(self, text):
>         super().write(text.encode('latin-1').decode('unicode_escape'))
> 
> args = (sys.stdout.buffer, sys.stdout.encoding, sys.stdout.errors,
>  None, sys.stdout.line_buffering)
> 
> sys.stdout = ParseUnicodeEscapes(*args)
> 
> args = (sys.stderr.buffer, sys.stderr.encoding, sys.stderr.errors,
>  None, sys.stderr.line_buffering)
> 
> sys.stderr = ParseUnicodeEscapes(*args)
> 
> You get:
> 
>  >>> "тест"
> 'тест'
>  >>> open("тест")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/ncoghlan/devel/py3k/Lib/io.py", line 212, in __new__
>     return open(*args, **kwargs)
>   File "/home/ncoghlan/devel/py3k/Lib/io.py", line 151, in open
>     closefd)
> IOError: [Errno 2] No such file or directory: 'тест'

   Very well, then. Thank you! The code should be put in a cookbook or the
wiki, if not in the library.

Oleg.
-- 
 Oleg Broytmann            http://phd.pp.ru/            [EMAIL PROTECTED]
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt

2008-04-16 Thread Guido van Rossum
2008/4/16 Oleg Broytmann <[EMAIL PROTECTED]>:
>    The problem manifests itself in scripts, too:
>
>  Traceback (most recent call last):
>    File "./ttt.py", line 4, in <module>
>      open("тест") # filename is in koi8-r encoding
>  IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4'

Note that this can be a feature too! You might have a filename that
*looks* normal but contains a character from a different language --
the \u encoding will show you the problem.

$ ls *.py
mc.py   x.py
guido-van-rossums-imac:~ guido$ python
Python 2.5.2 (release25-maint:60953, Feb 25 2008, 09:38:08)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> open('mс.py')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'm\xd1\x81.py'
>>>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-3000] Displaying strings containing unicode escapes

2008-04-16 Thread Oleg Broytmann
On Wed, Apr 16, 2008 at 07:26:36AM -0700, Guido van Rossum wrote:
> 2008/4/16 Oleg Broytmann <[EMAIL PROTECTED]>:
> >    The problem manifests itself in scripts, too:
> >
> >  Traceback (most recent call last):
> >    File "./ttt.py", line 4, in <module>
> >      open("тест") # filename is in koi8-r encoding
> >  IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4'
> 
> Note that this can be a feature too! You might have a filename that
> *looks* normal but contains a character from a different language --
> the \u encoding will show you the problem.
> 
> $ ls *.py
> mc.py x.py
> guido-van-rossums-imac:~ guido$ python
> Python 2.5.2 (release25-maint:60953, Feb 25 2008, 09:38:08)
> [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> open('mс.py')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IOError: [Errno 2] No such file or directory: 'm\xd1\x81.py'

   This can be a feature only for people who always have all-ascii file
names and never expect non-ascii characters in the file names. Those of us
who regularly use non-ascii filenames are too accustomed to that
brok^H^H^H^H escaped repr's to spot a difference.

Oleg.
-- 
 Oleg Broytmann            http://phd.pp.ru/            [EMAIL PROTECTED]
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-3000] Displaying strings containing unicode escapes

2008-04-16 Thread Nick Coghlan
Oleg Broytmann wrote:
> On Wed, Apr 16, 2008 at 11:21:26PM +1000, Nick Coghlan wrote:
>> You get:
>>
>>  >>> "тест"
>> 'тест'
>>  >>> open("тест")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/home/ncoghlan/devel/py3k/Lib/io.py", line 212, in __new__
>>     return open(*args, **kwargs)
>>   File "/home/ncoghlan/devel/py3k/Lib/io.py", line 151, in open
>>     closefd)
>> IOError: [Errno 2] No such file or directory: 'тест'
> 
>    Very well, then. Thank you! The code should be put in a cookbook or the
> wiki, if not in the library.
> 

Unfortunately, it turns out that the trick also breaks display of 
strings containing any other escape codes. For example:

 >>> '\n'
'
'
 >>> '\t'
'   '

The unicode_escape codec is interpreting all of the escape sequences 
recognised in Python strings, not just the \u sequences we're interested in.

I can't see an easy way around this at the moment, but I'm still 
reasonably convinced that the issue of Unicode escapes for non-ASCII 
users is best attacked as a display problem rather than an internal 
representation problem.
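
One way to narrow the substitution to \u sequences only is a regular expression instead of the unicode_escape codec. This is a rough sketch, not a fix for the general problem: the helper name replace_u_escapes is made up for illustration, and ascii() is used here to produce the escaped form. It leaves \n and \t alone, but it is still a heuristic, as the second case shows.

```python
import re

# Matches a backslash, "u", and exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def replace_u_escapes(text):
    # Hypothetical helper: turn only \uXXXX sequences back into characters,
    # leaving other escapes such as \n and \t exactly as they were.
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# ascii() gives the fully escaped form of the string.
escaped = ascii('日本語\n\t')
shown = replace_u_escapes(escaped)

# Known limitation: a string that really contains a backslash followed by
# "u0041" is escaped as \\u0041, and the regex still rewrites part of it.
false_positive = replace_u_escapes(ascii('\\' + 'u0041'))
```

Handling the doubled backslash correctly would need a small tokenizer for the escaped text rather than a single pattern.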

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-3000] Displaying strings containing unicode escapes

2008-04-16 Thread Guido van Rossum
I just had a shower, and I think it's cleared my thoughts a bit. :-)

Clearly this is an important problem to those in countries where ASCII
doesn't cut it. And just like in Python 3000 we're using UTF-8 as the
default source encoding and allowing Unicode letters in identifiers, I
think we should bite the bullet and allow repr() of a string to pass
through all characters that the Unicode standard considers printable.
For those of us with less capable IO devices, setting the error flag
for stdout and stderr to backslashreplace is probably the best
solution (and it solves more problems than just repr()).
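
As a small illustration of the backslashreplace idea, the sketch below wraps an in-memory byte buffer (standing in for a limited terminal, which is an assumption of this example) in a text layer that can only encode ASCII. Characters the encoding cannot represent come out as escapes instead of raising an error.

```python
import io

# An ASCII-only text stream over an in-memory buffer; newline='\n' keeps
# line endings untranslated so the result is the same on every platform.
raw = io.BytesIO()
limited = io.TextIOWrapper(raw, encoding='ascii',
                           errors='backslashreplace', newline='\n')

limited.write('No such file or directory: 日本語\n')
limited.flush()

output = raw.getvalue()
```

In later Python 3 releases the same error handler can be selected for the real streams, for example through the PYTHONIOENCODING environment variable or sys.stdout.reconfigure(errors='backslashreplace') on 3.7 and newer.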

I will have another look at Atsuo's patch.

I do think we should use some kind of Unicode-standard-endorsed
definition of "printable" (as long as it excludes all ASCII escapes),
since there are plenty of undefined code points that even Japanese
people would probably prefer to see rendered as \u rather than
completely invisible. I'm also not sure what people would want to
happen for surrogate pairs. (OTOH an unpaired surrogate should be
rendered as \u.)
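
A rough sketch of what such a rule could look like, using str.isprintable() from released Python 3 as the definition of "printable". The function name readable_repr is hypothetical, and this is only an illustration of the idea, not the specification asked for here: it always uses single quotes and makes no attempt to match every detail of the real repr().

```python
def readable_repr(s):
    # Hypothetical helper: pass printable characters through unchanged and
    # escape everything else (control characters, lone surrogates, ...).
    parts = []
    for ch in s:
        if ch in ("'", "\\"):
            parts.append("\\" + ch)
        elif ch.isprintable():
            parts.append(ch)
        else:
            parts.append(ch.encode('unicode_escape').decode('ascii'))
    return "'" + ''.join(parts) + "'"


japanese = readable_repr('日本語')
with_newline = readable_repr('a\nb')
lone_surrogate = readable_repr('\ud800')
```

ASCII escapes such as \n stay escaped because control characters are not printable, and an unpaired surrogate is rendered as a \u escape.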

I expect that this will require some more research and agreement.
Perhaps someone can produce a draft PEP and attempt to sort out the
details of specification and implementation? It would also be nice if
it could be friendly to Jython, IronPython and PyPy.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-3000] Recursive str

2008-04-16 Thread Jason Orendorff
On Tue, Apr 15, 2008 at 10:30 PM, Greg Ewing
<[EMAIL PROTECTED]> wrote:
> Terry Reedy wrote:
>  > import unirep
>  > print(*map(unirep.russian, objects))
>
>  That's okay if the objects are strings, but what about
>  non-string objects that contain strings?
>
>  We'd need another protocol, such as __unirep__.

Or have str.__repr__() respect per-thread settings, the way decimal
arithmetic does.

Default settings would be in force most of the time; the interactive
prompt would apply the user's settings when repr-ing a result.  This
approach solves the nested-strings problem quite nicely.  But it does
not catch error/warning/log messages when they are generated, unless
the program does *everything* under custom repr settings (dangerous).

There really are two use cases here: a human-readable repr for
error/warning/log messages; and a machine-readable, always-the-same,
ASCII-only repr.  Users want to be able to tweak the former.
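
For readers unfamiliar with the decimal analogy, this short sketch shows the mechanism being referred to: the decimal module keeps a context per thread, and the same expression gives a different result while a different context is in force. It only illustrates the decimal side; no such setting exists for repr().

```python
import decimal

# Result under whatever context is currently active.
one_third_default = decimal.Decimal(1) / decimal.Decimal(3)

# Temporarily switch to a context with five digits of precision.
with decimal.localcontext() as ctx:
    ctx.prec = 5
    one_third_short = decimal.Decimal(1) / decimal.Decimal(3)

# Outside the block the previous context is back in force.
one_third_after = decimal.Decimal(1) / decimal.Decimal(3)
```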

-j


Re: [Python-3000] Recursive str

2008-04-16 Thread Guido van Rossum
[Jason Orendorff]
>  Or have str.__repr__() respect per-thread settings, the way decimal
>  arithmetic does.

I don't think that's a very compelling example. I have serious issues
with having global or per-thread state that can change the outcome of
repr(); it would make it impossible to write correct code involving
repr() because you can never know what it will do the next time.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-3000] Recursive str

2008-04-16 Thread Michael Urman
On Wed, Apr 16, 2008 at 11:05 AM, Jason Orendorff
<[EMAIL PROTECTED]> wrote:
>  There really are two use cases here: a human-readable repr for
>  error/warning/log messages; and a machine-readable, always-the-same,
>  ASCII-only repr.  Users want to be able to tweak the former.

Does machine-readable require ASCII-only, and does repr() guarantee
this? It sounded like the worries about not escaping Unicode
characters were related to it not visually distinguishing between
different encodings for the same visual results (as their
machine-readable Unicode strings, or encoded UTF-8 bytestreams, would
already differ).
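
On the first question, a quick check in released Python 3 suggests that machine-readability does not depend on the output being ASCII-only. Both the escaped form from ascii() and the readable form from repr() parse back to the same string; ast.literal_eval is used here as a safe stand-in for "a machine reading the literal".

```python
import ast

s = '日本語\n'

ascii_form = ascii(s)      # ASCII-only, \u escapes
readable_form = repr(s)    # printable characters passed through

# Both forms are valid string literals that evaluate to the original.
from_ascii = ast.literal_eval(ascii_form)
from_readable = ast.literal_eval(readable_form)
```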

-- 
Michael Urman


Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt

2008-04-16 Thread atsuo ishimoto
2008/4/16, Nick Coghlan <[EMAIL PROTECTED]>:
> Oleg Broytmann wrote:
>  > On Wed, Apr 16, 2008 at 10:11:13PM +1000, Nick Coghlan wrote:
>  >> atsuo ishimoto wrote:
>  >>> IOError: [Errno 2] No such file or directory: '\u65e5\u672c\u8a9e'
>  >> This is starting to seem to me more like something to be addressed
>  >> through sys.displayhook/excepthook at the interactive interpreter level
>  >
>  >    The problem manifests itself in scripts, too:
>  >
>  > Traceback (most recent call last):
>  >   File "./ttt.py", line 4, in <module>
>  >     open("тест") # filename is in koi8-r encoding
>  > IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4'
>
>
> Hmm, the io module along with sys.stdout/err may be a better way to
>  attack the problem then. Given:
>
>  import sys, io
>
>  class ParseUnicodeEscapes(io.TextIOWrapper):
>      def write(self, text):
>          super().write(text.encode('latin-1').decode('unicode_escape'))
>
>  args = (sys.stdout.buffer, sys.stdout.encoding, sys.stdout.errors,
>  None, sys.stdout.line_buffering)
>
>  sys.stdout = ParseUnicodeEscapes(*args)
>
>  args = (sys.stderr.buffer, sys.stderr.encoding, sys.stderr.errors,
>  None, sys.stderr.line_buffering)
>
>  sys.stderr = ParseUnicodeEscapes(*args)
>
>  You get:
>
>   >>> "тест"
>  'тест'
>   >>> open("тест")
>

I got:

>>> print("あ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in write
UnicodeEncodeError: 'latin-1' codec can't encode character 'あ'
in position 0: ordinal not in range(256)
>>> print('\\'+'u0041')
A

Your hack doesn't work. The displayhook hack doesn't work, either.
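
The two failures can be reproduced without replacing sys.stdout. In the sketch below the function names are made up for illustration. The strict variant is the expression from the quoted code; the lenient variant, which is only an assumption about how one might patch it, avoids the UnicodeEncodeError but still turns an innocent backslash sequence into a different character.

```python
def unescape_strict(text):
    # The expression from the quoted write() method.
    return text.encode('latin-1').decode('unicode_escape')


def unescape_lenient(text):
    # Hypothetical variant: escape what latin-1 cannot encode, then decode.
    return text.encode('latin-1', 'backslashreplace').decode('unicode_escape')


try:
    unescape_strict('あ')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

lenient_result = unescape_lenient('あ')

# Text that really contains a backslash followed by "u0041" is still mangled.
false_positive = unescape_lenient('\\' + 'u0041')
```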

Question: Would you be happy if you were forced to live with these hacks forever?
If not, why do you think I'll accept your suggestion?


Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt

2008-04-16 Thread atsuo ishimoto
2008/4/16, Guido van Rossum <[EMAIL PROTECTED]>:
> Note that this can be a feature too! You might have a filename that
>  *looks* normal but contains a character from a different language --
>  the \u encoding will show you the problem.

You wouldn't call it a feature if your *normal* encoding were koi8-r.


Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt

2008-04-16 Thread Guido van Rossum
I changed my mind already. :-) See my post of this morning in another thread.

On Wed, Apr 16, 2008 at 4:09 PM, atsuo ishimoto <[EMAIL PROTECTED]> wrote:
> 2008/4/16, Guido van Rossum <[EMAIL PROTECTED]>:
>
> > Note that this can be a feature too! You might have a filename that
>  >  *looks* normal but contains a character from a different language --
>  >  the \u encoding will show you the problem.
>
>  You wouldn't call it a feature if your *normal* encoding were koi8-r.
>



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt

2008-04-16 Thread atsuo ishimoto
2008/4/17, Guido van Rossum <[EMAIL PROTECTED]>:
> I changed my mind already. :-) See my post of this morning in another thread.

Ah, I missed the mail!  Thank you.


Re: [Python-3000] Displaying strings containing unicode escapes

2008-04-16 Thread Stephen J. Turnbull
I've reordered Guido's words.

Guido van Rossum writes:

 > For those of us with less capable IO devices, setting the error flag
 > for stdout and stderr to backslashreplace is probably the best
 > solution (and it solves more problems than just repr()).

True.  But it doesn't solve the ambiguity problem on capable displays.

 > And just like in Python 3000 we're using UTF-8 as the default
 > source encoding and allowing Unicode letters in identifiers, I
 > think we should bite the bullet and allow repr() of a string to
 > pass through all characters that the Unicode standard considers
 > printable.

The problem is that this doesn't display the representation of strings
and identifier names in an unambiguous way.  "AKMOT" could be
all-ASCII, it could be all-Cyrillic, or it could be a mixture of
ASCII, Cyrillic, and Greek.  Odds are quite good that there are other
scripts that could be mixed in, too.  This kind of mixing happens all
the time in Japanese, where people mix half-width and full-width ASCII
with abandon (especially when altering digits in dates).  I could
easily see a Russian using Cyrillic 'A' to uppercase an ASCII 'a' in
the same way.
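
The ambiguity is easy to demonstrate with the standard unicodedata module. The three characters below usually render alike, yet they are different code points from different scripts, and only the escaped form (or the character name) tells them apart.

```python
import unicodedata

# Latin, Cyrillic and Greek capital letters that look like "A".
lookalikes = ['A', '\u0410', '\u0391']

names = [unicodedata.name(ch) for ch in lookalikes]
escaped = [ascii(ch) for ch in lookalikes]
all_distinct = len(set(lookalikes)) == 3
```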

How about choosing a standard Python repertoire (based on the Unicode
standard, of course) of which characters get a graphic repr and which
ones get \u-escaped, and have a post-hook for repr which gets passed
the string repr proposes to print out?  This hook would always be
identity in Python-distributed stuff, of course, but on the consenting
adults principle applications and modules outside of the stdlib could
use it.  Would that be acceptable?

The standard repertoire would grandfather ASCII, I suppose, because
for the foreseeable future most identifiers are going to be ASCII, and
all Python implementations will contain a lot of ASCII identifiers and
strings indefinitely.


[Python-3000] end scope of iteration variables after loop

2008-04-16 Thread Nicholas T
hello all,

   A few times in practice I have been tripped up by how Python keeps
variables in scope after a loop--and it wasn't immediately obvious what the
problem was. I think it is one of the ugliest and least intuitive features,
and hope some others agree that it should be changed in py3k.

>>> for a in range(11): pass
...
>>> print(a)
10
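
For comparison, a short sketch of how the two kinds of loop variable behave in released Python 3: the variable of a for statement stays bound after the loop, while the variable of a list comprehension lives in its own scope and does not leak.

```python
for a in range(11):
    pass

# The for-loop variable is still bound after the loop ends.
leaked = a

squares = [b * b for b in range(3)]

# The comprehension variable is not visible outside the comprehension.
try:
    b
    comprehension_leaks = True
except NameError:
    comprehension_leaks = False
```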

Thanks,
Nicholas


Re: [Python-3000] Recursive str

2008-04-16 Thread Greg Ewing
Guido van Rossum wrote:

> The more I think about this, the more I believe that repr() should
> *not* be changed, and that instead we should give people who like to
> see '日本語' instead of '\u1234\u5678\u9abc' other tools to help
> themselves.

This seems to be a rather ASCII-centric way of thinking
about things, though, which I thought py3k was trying
to get away from, with unicode being the one and only
string type.

Maybe it really is the only practical option, but I
can understand non-ASCII speakers feeling disappointed.

-- 
Greg


Re: [Python-3000] sizeof(size_t) < sizeof(long)

2008-04-16 Thread Greg Ewing
Martin v. Löwis wrote:

> 3.6 byte
> addressable unit of data storage large enough to hold any
> member of the basic character set of the execution
> environment

Blarg. Well, I think the wording of that part of the
standard is braindamaged. The word "byte" already has
a pre-existing meaning outside of C, and the C standard
shouldn't be redefining it for its own purposes.

This is like a financial document that defines "dollar"
as "the unit of currency in use in the country concerned".
Thoroughly confusing and unnecessary.

Particularly since they seem to just be defining "byte"
to mean the same thing as "char". Why not just use the
term "char" in the first place?

-- 
Greg


Re: [Python-3000] end scope of iteration variables after loop

2008-04-16 Thread Nicholas T
previous discussion at
http://mail.python.org/pipermail/python-dev/2005-September/056677.html

I don't agree with the author that
>>> i = 3
>>> for i in range(11): pass
...
>>> i
10
is much less confusing than i returning 3. Furthermore, his C example makes
it obvious that "i" will be available in the scope after the loop. There's
no way to know now, but I think mistakes would be less frequent.

Additionally, what are others' opinions about this "pseudo-namespace" (i.e.
scoping) being slow? Admittedly, I don't know much about the current
parser's implementation, but it doesn't seem like scoping necessitates slow
parsing -- considering it's done in other languages, and python functions
have reasonable scope.

>>> def do_nothing(i): i = 3
...
>>> do_nothing(1)
>>> i
10

Nicholas

On Wed, Apr 16, 2008 at 5:52 PM, Nicholas T <[EMAIL PROTECTED]> wrote:

> hello all,
>
>    A few times in practice I have been tripped up by how Python keeps
> variables in scope after a loop--and it wasn't immediately obvious what the
> problem was. I think it is one of the ugliest and least intuitive features,
> and hope some others agree that it should be changed in py3k.
>
> >>> for a in range(11): pass
> ...
> >>> print(a)
> 10
>
> Thanks,
> Nicholas


Re: [Python-3000] Recursive str

2008-04-16 Thread Greg Ewing
Oleg Broytmann wrote:
> Do I understand it right that str(objects) calls repr() on items to
> properly quote strings? (str([1, '1']) must give "[1, '1']" as the result).
> Is it the only reason?

In the case of strings, yes. More generally, there
can be any kind of object in the list, and repr(x)
is more likely to give an unambiguous idea of what
x is than str(x) when it's embedded in a comma-
separated list.

Python has no way of guessing the most appropriate
way to display your list of objects when you use
str(), so it doesn't try. You have to tell it by
writing code to do what you want.
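
As one example of "writing code to do what you want", here is a minimal sketch of a recursive str() for lists and dicts. The name deep_str is hypothetical, and the output also shows the cost described above: once items are formatted with str(), the integer 1 and the string '1' can no longer be told apart.

```python
def deep_str(obj):
    # Hypothetical helper: like str(), but containers format their items
    # with deep_str() instead of repr().
    if isinstance(obj, list):
        return '[' + ', '.join(deep_str(item) for item in obj) + ']'
    if isinstance(obj, dict):
        pairs = ('%s: %s' % (deep_str(k), deep_str(v)) for k, v in obj.items())
        return '{' + ', '.join(pairs) + '}'
    return str(obj)


nested = deep_str([1, '1', ['日本語', 'b']])
mapping = deep_str({'k': ['v']})
```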

-- 
Greg


Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt

2008-04-16 Thread Greg Ewing
Oleg Broytmann wrote:

> Traceback (most recent call last):
>   File "./ttt.py", line 4, in <module>
>     open("тест") # filename is in koi8-r encoding
> IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4'

In that particular case, I'd say the IOError constructor
is doing the wrong thing -- it should be using something
like

   "No such file or directory: '%s'" % filename

instead of

   "No such file or directory: %r" % filename

i.e. %r shouldn't be used as a quick and dirty way to
get a string quoted.
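
The difference between the two spellings can be seen in released Python 3, where %a is the ASCII-only conversion (today's %r passes printable characters through). The sketch also shows the trade-off of quoting by hand: a name that itself contains a quote is no longer delimited unambiguously.

```python
filename = 'тест'

# Quoting by hand keeps the characters readable.
quoted_with_s = "No such file or directory: '%s'" % filename

# The ASCII-only conversion escapes them.
quoted_with_a = "No such file or directory: %a" % filename

# Hand quoting does not protect against quotes inside the value.
tricky = "it's"
tricky_with_s = "'%s'" % tricky
tricky_with_r = "%r" % tricky
```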

-- 
Greg



Re: [Python-3000] Displaying strings containing unicode escapes

2008-04-16 Thread Greg Ewing
Nick Coghlan wrote:

> Unfortunately, it turns out that the trick also breaks display of 
> strings containing any other escape codes.

There's also the worry that it could trigger falsely
on something that happened to look like \u but
didn't originate from the repr of a unicode char.

> I'm still 
> reasonably convinced that the issue of Unicode escapes for non-ASCII 
> users is best attacked as a display problem

It can only ever be a heuristic, though, not an
exact solution, since there isn't enough information
left by the time it's a string to undo the escaping
correctly in all cases.
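
A small sketch of the information loss, assuming Python 3 and using
ascii() to produce the escaped form: once the quotes are gone, the
escaped text of a one-character string is identical to a six-character
string that really contains a backslash followed by "u65e5".

```python
one_char = "\u65e5"     # a single character, U+65E5
six_chars = "\\u65e5"   # backslash, 'u', '6', '5', 'e', '5'

# Escape the one-character string and strip the surrounding quotes,
# as happens when the escaped form ends up inside a larger message.
escaped = ascii(one_char)[1:-1]

# The displayed text alone cannot tell the two originals apart.
print(escaped == six_chars)
```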

I'm currently thinking there are too many use cases
overloaded onto repr() at the moment.

-- 
Greg



Re: [Python-3000] sizeof(size_t) < sizeof(long)

2008-04-16 Thread Greg Ewing
David Cournapeau wrote:
> They are totally different concepts: byte is not a (C) type, but a unit, 
> the one returned by the sizeof operator.

If a word is needed for this concept, then invent a new
one, e.g. "size unit", rather than reusing "byte", which
everyone already understands as meaning 8 bits.

> C requires that sizeof(unsigned type) == sizeof(signed type) for any type, 
> so if one byte is one char, unsigned char would be a byte too, and so 
> unsigned char and char would be the same, which is obviously wrong.

No, "char" and "unsigned char" can still be different types.
You just need to say that sizeof(char) == sizeof(unsigned char) == 1,
and leave bytes out of the discussion altogether.
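
As a rough illustration from the Python side (a sketch using the
standard ctypes module, where c_char wraps C's char and c_ubyte wraps
unsigned char): the two are distinct types that both report a size
of 1.

```python
import ctypes

size_char = ctypes.sizeof(ctypes.c_char)     # C char
size_uchar = ctypes.sizeof(ctypes.c_ubyte)   # C unsigned char

# Same size, but not the same type object.
same_type = ctypes.c_char is ctypes.c_ubyte

print(size_char, size_uchar, same_type)
```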

-- 
Greg


Re: [Python-3000] sizeof(size_t) < sizeof(long)

2008-04-16 Thread David Cournapeau
Greg Ewing wrote:
>
> Blarg. Well, I think the wording of that part of the
> standard is braindamaged. The word "byte" already has
> a pre-existing meaning outside of C, and the C standard
> shouldn't be redefining it for its own purposes.
>
> This is like a financial document that defines "dollar"
> as "the unit of currency in use in the country concerned".
> Thoroughly confusing and unnecessary.
>
> Particularly since they seem to just be defining "byte"
> to mean the same thing as "char". Why not just use the
> term "char" in the first place?
>   
They are totally different concepts: byte is not a (C) type, but a unit, 
the one returned by the sizeof operator. One char occupies one byte of 
memory, and in memory, they are the same, but conceptually, they are 
totally different, from the C point of view at least. For example, C 
requires that sizeof(unsigned type) == sizeof(signed type) for any type, 
so if one byte is one char, unsigned char would be a byte too, and so 
unsigned char and char would be the same, which is obviously wrong.

cheers,

David



Re: [Python-3000] sizeof(size_t) < sizeof(long)

2008-04-16 Thread David Cournapeau
Greg Ewing wrote:
>
> If a word is needed for this concept, then invent a new
> one, e.g. "size unit", rather than reusing "byte", which
> everyone already understands as meaning 8 bits.
>   

Maybe everyone understands it as 8 bits, but it has always been wrong. 
Byte is a unit of storage, which often contains 8 bits, but not always. 
This definition of a byte as a unit of storage certainly precludes the 
convention that one byte = 8 bits; even if it always contained 8 bits, 
it would still be wrong to say that one byte is 8 bits, BTW: the byte 
notion (unit of storage) and its actual size are totally different 
concepts.

>
> No, "char" and "unsigned char" can still be different types.
> You just need to say that sizeof(char) == sizeof(unsigned char) == 1,
> and leave bytes out of the discussion altogether.
>   

I was merely answering the question "why not use char in the first 
place": because they are totally different concepts. If you assume char 
and byte are the same thing because sizeof(char) == 1 byte, then you 
should assume that unsigned char is the same as a byte, and thus that 
unsigned char and char are the same. This was a proof by contradiction :)

cheers,

David



Re: [Python-3000] sizeof(size_t) < sizeof(long)

2008-04-16 Thread James Y Knight

On Apr 16, 2008, at 11:00 PM, Greg Ewing wrote:

> If a word is needed for this concept, then invent a new
> one, e.g. "size unit", rather than reusing "byte", which
> everyone already understands as meaning 8 bits.

Nope. Everyone understands "octet" to be 8 bits.

Bytes being exactly 8 bits is itself the redefinition! In the
not-too-distant past, some hardware had 9-bit bytes. Common Lisp also uses the
term "byte" to mean an arbitrary (specified) number of bits. E.g. 
http://www.lisp.org/HyperSpec/Body/typ_unsigned-byte.html

See also http://dictionary.die.net/byte

James


Re: [Python-3000] sizeof(size_t) < sizeof(long)

2008-04-16 Thread Greg Ewing
David Cournapeau wrote:

> Maybe everyone understands it as 8 bits, but it has always been wrong.

It may not be officially written down anywhere, but
almost everyone in the world understands a byte to mean
8 bits. When you go into a computer store and ask for
256MB of RAM, you don't expect to be asked "What size
bytes would that be, then, sir?"

So it's a de facto standard, and one that works perfectly
well. Going against it is both futile and unnecessary,
as far as I can see.

-- 
Greg


Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt

2008-04-16 Thread Alex Martelli
On Wed, Apr 16, 2008 at 6:53 PM, Greg Ewing <[EMAIL PROTECTED]> wrote:
   ...
>  > open("тест") # filename is in koi8-r encoding
>  > IOError: [Errno 2] No such file or directory: '\xd4\xc5\xd3\xd4'
>
>  In that particular case, I'd say the IOError constructor
>  is doing the wrong thing -- it should be using something
>  like
>
>"No such file or directory: '%s'" % filename
>
>  instead of
>
>"No such file or directory: %r" % filename
>
>  i.e. %r shouldn't be used as a quick and dirty way to
>  get a string quoted.

I disagree: I always recommend using %r to display (in an error
message, log entry, etc), a string that may be in error, NOT '%s',
because the cause of the error can often be that the string mistakenly
contains otherwise-invisible characters -- %r will show them clearly
(as escape sequences), while %s could hide them and lead anybody but
the most experienced developer to a long and frustrating debugging
session.
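
A short sketch of the failure mode described above, with a made-up
filename that carries a trailing space and tab: %s prints them
invisibly, while %r shows the tab as an escape and puts the space
inside the quotes.

```python
# Hypothetical filename with trailing whitespace the user never saw.
filename = "data.txt \t"

with_s = "cannot open %s" % filename
with_r = "cannot open %r" % filename

print(with_s)   # looks like a perfectly good name
print(with_r)   # the trailing space and \t are visible
```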


Alex


Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt

2008-04-16 Thread Greg Ewing
Alex Martelli wrote:
> I disagree: I always recommend using %r to display (in an error
> message, log entry, etc), a string that may be in error,

For debugging messages, yes, but not output produced
in the normal course of operation. And "File Not Found"
I consider to be in the latter category -- the user
typed in the wrong file name, but it's still a string,
and should be displayed to him as such.

If it's not a string, the program will most likely
fall over with a TypeError trying to open the file
before it gets as far as constructing an IOError.

-- 
Greg



Re: [Python-3000] Displaying strings containing unicode escapes at the interactive prompt

2008-04-16 Thread Guido van Rossum
On Wed, Apr 16, 2008 at 10:20 PM, Greg Ewing
<[EMAIL PROTECTED]> wrote:
> Alex Martelli wrote:
>  > I disagree: I always recommend using %r to display (in an error
>  > message, log entry, etc), a string that may be in error,
>
>  For debugging messages, yes, but not output produced
>  in the normal course of operation. And "File Not Found"
>  I consider to be in the latter category -- the user
>  typed in the wrong file name, but it's still a string,
>  and should be displayed to him as such.

I respectfully disagree. Control characters and such in the string
should *definitely* be escaped. Regarding printable characters outside
the ASCII range, see my post in another thread (which somehow nearly
everybody appears to have missed); in Py3k I propose to pass printable
Unicode characters unchanged through repr(). stdout/stderr will set
their error attribute to backslashreplace so that if their encoding is
ASCII or some such, out-of-range characters will be printed as \u
rather than raising an exception during printing. But as I said,
please follow up to my other post.
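
For reference, a minimal sketch of what the backslashreplace error
handler does when text is encoded to ASCII. It uses str.encode()
directly rather than a stream, so it only illustrates the handler, not
the proposed stdout/stderr setup itself.

```python
text = "日本語"

# With the default "strict" handler this would raise
# UnicodeEncodeError; backslashreplace emits escapes instead.
encoded = text.encode("ascii", errors="backslashreplace")

print(encoded)
```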

Another reason to use %r is that if someone manages to include \n in a
filename, with %s the log message might be spread across two lines,
possibly confusing log parsers and even providing ways to hide illegal
activities from log scanners.
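
A small sketch of the log-splitting problem, using a made-up filename
and log format: with %s the embedded newline starts a second,
forged-looking log line, while %r keeps the whole entry on one line.

```python
# Hypothetical hostile filename containing a newline.
filename = "x.txt\nINFO all checks passed"

line_s = "ERROR no such file: %s" % filename
line_r = "ERROR no such file: %r" % filename

print(len(line_s.splitlines()))   # the entry spans two lines
print(len(line_r.splitlines()))   # the entry stays on one line
```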

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-3000] sizeof(size_t) < sizeof(long)

2008-04-16 Thread Adam Olsen
On Wed, Apr 16, 2008 at 10:32 PM, Greg Ewing
<[EMAIL PROTECTED]> wrote:
> David Cournapeau wrote:
>
>  > Maybe everyone understands it as 8 bits, but it has always been wrong.
>
>  It may not be officially written down anywhere, but
>  almost everyone in the world understands a byte to mean
>  8 bits. When you go into a computer store and ask for
>  256MB of RAM, you don't expect to be asked "What size
>  bytes would that be, then, sir?"
>
>  So it's a de facto standard, and one that works perfectly
>  well. Going against it is both futile and unnecessary,
>  as far as I can see.

Sure, *now*, but C inherited its definition from a day when it
wasn't so clear-cut.  It may be obsolete today, but good luck getting
them to change the standard.

-- 
Adam Olsen, aka Rhamphoryncus