Re: [Python-Dev] str.translate vs unicode.translate

2006-02-17 Thread M.-A. Lemburg
Bengt Richter wrote:
> If str becomes unicode for PY 3000, and we then have bytes as our
> coding-agnostic byte data, then I think bytes should have the str
> translation method, with a tweak that I would hope could also be done
> to str now.
> 
> BTW, str.translate will presumably become unicode.translate, so
> perhaps unicode.translate should grow a compatible deletechars parameter.

I'd much rather see the .translate() method deprecated.

Writing a codec for the task is much more effective - the
builtin charmap codec will do all the mapping for you,
if you have a need to go from bytes to Unicode and vice versa.

We could also have a bytemap codec for doing bytes to bytes
conversions.
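
As a rough illustration of the charmap approach, here is a minimal sketch
written in Python 3 terms (bytes and str as separate types). The 256-entry
table is a made-up example (identity, except that byte 0x80 maps to the euro
sign), and the names decode_custom and encode_custom are hypothetical; only
the codecs.charmap_* functions are the real machinery used by the stdlib
charmap-based encodings.

```python
import codecs

# Hypothetical decoding table: byte n -> code point n, except 0x80 -> U+20AC.
_identity = "".join(chr(i) for i in range(256))
DECODING_TABLE = _identity[:0x80] + "\u20ac" + _identity[0x81:]

# Reverse lookup structure for encoding, built from the decoding table.
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def decode_custom(data: bytes) -> str:
    """Map each byte through DECODING_TABLE (bytes -> Unicode)."""
    return codecs.charmap_decode(data, "strict", DECODING_TABLE)[0]


def encode_custom(text: str) -> bytes:
    """Map each character back to its byte (Unicode -> bytes)."""
    return codecs.charmap_encode(text, "strict", ENCODING_TABLE)[0]
```

With this table, decode_custom(b"\x80 5") gives "\u20ac 5", and
encode_custom reverses it.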

-- 
Marc-Andre Lemburg
eGenix.com

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] str.translate vs unicode.translate (was: Re: str object going in Py3K)

2006-02-17 Thread Bengt Richter
If str becomes unicode for PY 3000, and we then have bytes as our
coding-agnostic byte data, then I think bytes should have the str
translation method, with a tweak that I would hope could also be done
to str now.

BTW, str.translate will presumably become unicode.translate, so
perhaps unicode.translate should grow a compatible deletechars parameter.

But that's not the tweak. The tweak is to eliminate the currently
unavoidable pre-conversion to unicode in
str(something).translate(u'...', delchars)
(and, preemptively, in bytes(something).translate(u'...', delchars)).

E.g. suppose you now want to write:

s_str.translate(table, delch).encode('utf-8')

Note that s_str has no encoding information, and translate is conceptually
just a 1:1 substitution minus characters in delch. But if we want to do
one-chr:one-unichr substitution by specifying a 256-long table of unicode
characters, we cannot. It would be simple to allow it, and that's the
tweak I would like. It would allow easy custom decodes.

At the moment, if you want to write the above, you have to introduce a
phony latin-1 decoding and write it as (not typo-proof)

s_str.translate(table, delch).decode('latin-1').encode('utf-8')  # use str.translate
or
s_str.decode('latin-1').translate(mapping).encode('utf-8')  # use unicode.translate also for delch

to avoid exceptions if you have non-ascii in your s_str (even if delch
would have removed them!!)
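
For comparison, the latin-1 detour can be sketched in Python 3 terms as
follows. This is only an illustration: the function name decode_via_latin1
and the sample table are invented here, and it relies on the fact that
latin-1 maps byte n to code point n, so the intermediate decode is a pure
re-labelling step.

```python
def decode_via_latin1(data: bytes, table: str, delete: bytes = b"") -> str:
    """Custom decode using the latin-1 detour.

    `table` is assumed to be a 256-character string; byte n ends up as
    table[n]. `delete` lists bytes to drop before mapping.
    """
    if len(table) != 256:
        raise ValueError("table must have exactly 256 characters")
    pruned = data.translate(None, delete)      # drop unwanted bytes first
    as_text = pruned.decode("latin-1")         # byte n -> code point n
    return as_text.translate({i: table[i] for i in range(256)})


# Sample table (an assumption for the example): identity, except 0x80 -> euro sign.
_identity = "".join(chr(i) for i in range(256))
SAMPLE_TABLE = _identity[:0x80] + "\u20ac" + _identity[0x81:]
```

decode_via_latin1(b"\x80 100\r\n", SAMPLE_TABLE, b"\r").encode("utf-8")
then yields the UTF-8 bytes for the euro sign followed by " 100\n".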

It seems s_str.translate(table, delchars) wants to convert the s_str to unicode
if table is unicode, and then use unicode.translate (which bombs on delchars!)
instead of just effectively defining str.translate as

def translate(self, table, deletechars=None):
    # An empty table means identity: uidentity if it is unicode, else sidentity.
    lookup = table or (isinstance(table, unicode) and uidentity or sidentity)
    return ''.join(lookup[ord(x)] for x in self
                   if not deletechars or x not in deletechars)

# For convenience in just pruning with deletechars,
# s_str.translate('', deletechars) deletes without translating, and
# s_str.translate(u'', deletechars) does the same and then maps to
# same-ord unicode characters, given
#     sidentity = ''.join(chr(i) for i in xrange(256))
# and
#     uidentity = u''.join(unichr(i) for i in xrange(256)).
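
The same semantics can be sketched as a standalone function in Python 3
terms, where byte strings and text are distinct types. This is a sketch of
the proposal, not of any existing method: translate_bytes, SIDENTITY and
UIDENTITY are names chosen for the example, and the result type (bytes or
str) follows the type of the table, as described above.

```python
SIDENTITY = bytes(range(256))                      # byte n -> byte n
UIDENTITY = "".join(chr(i) for i in range(256))    # byte n -> code point n


def translate_bytes(data: bytes, table, delete: bytes = b""):
    """Delete the bytes in `delete`, then map each remaining byte via `table`.

    `table` is a 256-entry bytes or str object; an empty table of either
    type means "identity" in that type. No implicit decoding takes place:
    only the ordinal of each byte is used as the index.
    """
    if len(table) not in (0, 256):
        raise ValueError("table must be empty or have exactly 256 entries")
    kept = data.translate(None, delete)
    if isinstance(table, str):
        lookup = table or UIDENTITY
        return "".join(lookup[b] for b in kept)    # iterating bytes yields ints
    return kept.translate(table or SIDENTITY)
```

So translate_bytes(b"abc\xf6", b"", b"\xf6") just prunes and returns bytes,
while translate_bytes(b"abc\xf6", "", b"") returns the same-ordinal text.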

IMO, if you want unicode.translate, then it doesn't hurt to write
unicode(s_str).translate and use that.

Let str.translate just use the str ords, so simple custom decodes can be
written without the annoyance of e.g.,

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 3: ordinal not in range(128)

Can we change this for bytes? And why couldn't we change this for
str.translate now? Or what am I missing? I certainly would like to miss
the above message for str.translate :-(

BTW This would also allow taking advantage of features of both translates
if desired, e.g. by
s_str.translate(unichartable256, strdelchrs).translate(uniord_to_ustr_or_uniord_mapping).
(e.g., the latter permits single to multiple-character substitution)
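
A small sketch of that two-stage chain, again in Python 3 terms. The first
stage is written out by hand because the tweaked translate is only a
proposal; the second stage is the ordinary text translate with a mapping
from ordinals to strings. The function name and sample data are
assumptions made for the example.

```python
def decode_then_expand(data: bytes, table256: str, delete: bytes,
                       expansions: dict) -> str:
    """Stage 1: drop `delete` bytes and map each byte n to table256[n].
    Stage 2: apply an ordinal -> string mapping, which may expand one
    character into several.
    """
    stage1 = "".join(table256[b] for b in data.translate(None, delete))
    return stage1.translate(expansions)


IDENTITY_TABLE = "".join(chr(i) for i in range(256))
# Single-to-multiple substitutions, keyed by code point.
EXPANSIONS = {ord("\u00f6"): "oe", ord("\u00df"): "ss"}
```

For instance, decode_then_expand(b"sch\xf6n\r", IDENTITY_TABLE, b"\r",
EXPANSIONS) gives "schoen".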

I think at least a tweaked translate method for bytes would be good for py3k,
and I hope we can do it for str.translate now.
It is just too handy a high-speed conversion goodie to forgo IMO.

Regards,
Bengt Richter
