[issue8529] subclassing builtin class (str, unicode, list...) needs to override __getslice__

2010-04-25 Thread Florent Xicluna

New submission from Florent Xicluna florent.xicl...@gmail.com:

It looks like a bug, because __getslice__ has been deprecated since 2.0.

If you subclass a builtin type and override the __getitem__ method, you need to 
override the (deprecated) __getslice__ method too.
And if you run your program with python -3, it 

Example script:


class Upper(unicode):

    def __getitem__(self, index):
        return unicode.__getitem__(self, index).upper()

    #def __getslice__(self, i, j):
    #    return self[i:j:]


if __name__ == '__main__':
    text = Upper('Lorem ipsum')

    print text[:]
    print text[::]

--
components: Interpreter Core
messages: 104148
nosy: flox
priority: normal
severity: normal
status: open
title: subclassing builtin class (str, unicode, list...) needs to override 
__getslice__
type: behavior
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8529
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8529] subclassing builtin class (str, unicode, list...) needs to override __getslice__

2010-04-25 Thread Benjamin Peterson

Benjamin Peterson benja...@python.org added the comment:

This is because unicode implements __getslice__.
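
For comparison, here is a minimal sketch of the same subclass under Python 3, where __getslice__ no longer exists and every slice reaches __getitem__ as a slice object. It assumes Python 3 and substitutes str for unicode; on Python 2.x the commented-out __getslice__ override from the report is still needed.

```python
# Sketch for Python 3 (assumption): str replaces unicode, and there is no
# __getslice__, so plain slices are routed to __getitem__ as slice objects.
class Upper(str):

    def __getitem__(self, index):
        # index is an int for text[0] and a slice object for text[:] or text[1:4]
        return str.__getitem__(self, index).upper()


text = Upper('Lorem ipsum')
print(text[:])    # LOREM IPSUM
print(text[::])   # LOREM IPSUM
```

Both forms print the upper-cased text here because neither one bypasses the overridden __getitem__.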

--
nosy: +benjamin.peterson
resolution:  -> invalid
status: open -> closed




[issue8529] subclassing builtin class (str, unicode, list...) needs to override __getslice__

2010-04-25 Thread Florent Xicluna

Florent Xicluna florent.xicl...@gmail.com added the comment:

OK, but it yields a Python 3 DeprecationWarning in the subclass.
And there's no workaround to get rid of the deprecation. 

If it is the correct behaviour, maybe some words could be added about 
subclassing builtin types:
http://docs.python.org/reference/datamodel.html#additional-methods-for-emulation-of-sequence-types

--




[issue8529] subclassing builtin class (str, unicode, list...) needs to override __getslice__

2010-04-25 Thread Florent Xicluna

Florent Xicluna florent.xicl...@gmail.com added the comment:

OK, never mind, it is already in the docs.

:-)

--




Re: Unicode list

2007-04-01 Thread Georg Brandl
Rehceb Rotkiv schrieb:
 Hello,
 
 I have this little grep-like program:
 
 ++snip++
 #!/usr/bin/python
 
 import sys
 import re
 
 pattern = sys.argv[1]
 inputfile = file(sys.argv[2], 'r')
 
 for line in inputfile:
     matches = re.findall(pattern, line)
     if matches:
         print matches
 ++snip++
 
 Like this, the program prints some characters as strange escape 
 sequences, which is due to the input file being encoded in utf-8

As Paul said, your terminal is likely set to iso-8859 encoding, which
is why it doesn't display UTF-8 correctly. The above program produces
correct UTF-8 output.

What you could do is:
1. read the file in as unicode
2. print the unicode to the terminal (will use the terminal encoding) or
convert the unicode to strings with an explicit encoding before printing

codecs.open() is very helpful for step 1, BTW.
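
The two steps might look like the sketch below. It is written for Python 3 and uses io.open(), which accepts an encoding argument in the same way as codecs.open(); the function name grep_unicode and the UTF-8 default are assumptions made for illustration.

```python
import io
import re


def grep_unicode(pattern, path, encoding='utf-8'):
    """Return one list of matches per matching line, as unicode text.

    Assumes the pattern has at most one capturing group, so that
    re.findall() returns strings rather than tuples.
    """
    results = []
    # Step 1: decode while reading, so every line is already unicode.
    with io.open(path, 'r', encoding=encoding) as inputfile:
        for line in inputfile:
            matches = re.findall(pattern, line)
            if matches:
                results.append(matches)
    return results
```

For step 2, each inner list can be joined into one string and printed, which leaves the encoding to the terminal, or encoded explicitly with .encode() before it is written out.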

Georg

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Unicode list

2007-04-01 Thread Rehceb Rotkiv
 When printing a list, the individual elements are converted with repr(),
 not with str(). For a string object, repr() adds escape codes for all
 bytes that are not printable ASCII characters.

Thanks Martin, you're right, it was the repr() calls that messed up the 
output. Iterating over the list as you proposed is even 1/100s faster ;)

Regards,
Rehceb


Unicode list

2007-03-31 Thread Rehceb Rotkiv
Hello,

I have this little grep-like program:

++snip++
#!/usr/bin/python

import sys
import re

pattern = sys.argv[1]
inputfile = file(sys.argv[2], 'r')

for line in inputfile:
    matches = re.findall(pattern, line)
    if matches:
        print matches
++snip++

Like this, the program prints some characters as strange escape 
sequences, which is due to the input file being encoded in utf-8: When I 
convert re.findall... to a string and wrap a unicode() around it, 
the matches get printed correctly. Is it possible to make matches 
unicode without saving it as a single string first? The function 
unicode() seems only to work for strings. Or is there a general way of telling 
Python to abandon the ancient and evil land of iso-8859 for good and use 
utf-8 only?

Regards,
Rehceb


Re: Unicode list

2007-03-31 Thread Paul Boddie
Rehceb Rotkiv wrote:
 Hello,

 I have this little grep-like program:

 ++snip++
 #!/usr/bin/python

 import sys
 import re

 pattern = sys.argv[1]
 inputfile = file(sys.argv[2], 'r')

 for line in inputfile:
     matches = re.findall(pattern, line)
     if matches:
         print matches
 ++snip++

 Like this, the program prints some characters as strange escape
 sequences, which is due to the input file being encoded in utf-8:

So the UTF-8 data gets printed to your terminal which isn't configured
for UTF-8, right?

 When I convert re.findall... to a string and wrap a unicode() around it,
 the matches get printed correctly.

How do you meaningfully convert it to a string? The matches variable
refers to a list, but you surely don't want to be dealing with the
list's string representation.

 Is it possible to make matches unicode without saving it as a single string 
 first?

Why not convert your input into Unicode and then, for the benefit of
certain kinds of character classes, use re.findall in Unicode mode (by
specifying re.U as a flag)? Then, each match will be produced as a
Unicode object.
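
A minimal sketch of that suggestion, assuming Python 3 syntax and UTF-8 input; the sample bytes are made up for illustration. On Python 3, re.U is already the default for str patterns, so the flag is written out only to mirror the advice above.

```python
import re

# Hypothetical sample input: UTF-8 encoded bytes, as read from a file.
raw = b'Gr\xc3\xbc\xc3\x9fe aus K\xc3\xb6ln'

# Decode first, so the regular expression sees characters, not bytes.
text = raw.decode('utf-8')

# In Unicode mode, \w also matches non-ASCII letters such as the umlauts here.
words = re.findall(r'\w+', text, re.U)
```

Each element of words is then a Unicode object covering one whole word, including the non-ASCII letters.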

 The function unicode() seems only to work for strings. Or is there a 
 general way of telling
 Python to abandon the ancient and evil land of iso-8859 for good and use 
 utf-8 only?

The only refuge from ancient and evil lands is found by climbing the
mountain of Unicode: convert from encoded text as soon as you can,
work only with Unicode objects, produce encoded text only when
necessary.
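
That rule (decode early, work in Unicode, encode late) can be sketched as follows. The function name convert and the ISO-8859-1 target are assumptions chosen to show the two boundaries; they are not part of the original program.

```python
def convert(raw, source='utf-8', target='iso-8859-1'):
    """Re-encode bytes, doing all text processing on Unicode in between."""
    text = raw.decode(source)        # input boundary: bytes -> Unicode
    text = text.upper()              # work only with Unicode objects
    return text.encode(target)       # output boundary: Unicode -> bytes


utf8_bytes = b'K\xc3\xb6ln'          # a city name with an umlaut, as UTF-8
latin1_bytes = convert(utf8_bytes)   # same name upper-cased, as ISO-8859-1
```

Upper-casing works on the umlaut only because it happens between the two boundaries; applied to the raw UTF-8 bytes it would leave the non-ASCII letter untouched.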

Paul



Re: Unicode list

2007-03-31 Thread Martin v. Löwis
 Like this, the program prints some characters as strange escape 
 sequences, which is due to the input file being encoded in utf-8: When I 
 convert re.findall... to a string and wrap a unicode() around it, 
 the matches get printed correctly. Is it possible to make matches 
 unicode without saving it as a single string first? The function 
 unicode() seems only to work for strings. Or is there a general way of telling 
 Python to abandon the ancient and evil land of iso-8859 for good and use 
 utf-8 only?

Python does not live in the ancient and evil land of iso-8859; it lives
in the ancient and evil land of ASCII.

When printing a list, the individual elements are converted with repr(),
not with str(). For a string object, repr() adds escape codes for all
bytes that are not printable ASCII characters. To avoid this call to
repr, you need to iterate over the list yourself, and print it:

if matches:
    for m in matches:
        print m,
    print
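
The difference can be reproduced with a short sketch. It assumes Python 3, where a bytes object plays the role of the Python 2 byte string discussed here; the sample value is hypothetical.

```python
# Hypothetical match list holding one UTF-8 encoded byte string.
matches = [b'Gr\xc3\xbc\xc3\x9fe']

# Converting the whole list to text applies repr() to each element, which
# escapes every byte that is not printable ASCII.
as_list = str(matches)

# Handling the elements one by one avoids repr(); decoding yields the text.
as_text = ' '.join(m.decode('utf-8') for m in matches)
```

Here as_list contains backslash escapes such as \xc3, while as_text holds the five characters of the original word.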

HTH,
Martin