Re: [Python-Dev] Backporting PEP 3101 to 2.6
André Malo wrote:
> * Eric Smith wrote:
>> But now that I look at time.strftime in py3k, it's converting the
>> entire unicode string to a char string with PyUnicode_AsString, then
>> converting back with PyUnicode_Decode.
>
> Looks wrong to me, too... :-)
>
> nd

I don't understand Unicode encoding/decoding well enough to describe
this bug, but I admit it looks suspicious. Could someone who does
understand it open a bug against 3.0 (hopefully with an example that
fails)?

The bug should also mention that 2.6 avoids this problem entirely by not
supporting unicode with strftime or datetime.__format__, but 2.6 could
probably leverage whatever solution is developed for 3.0.

Thanks.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Backporting PEP 3101 to 2.6
* Eric Smith wrote:
> André Malo wrote:
>> I guess, a clean and complete solution (besides re-implementing the
>> whole thing) would be to resolve each single format character with
>> strftime, decode according to the locale and re-assemble the result
>> string piece by piece. Doh!
>
> That's along the lines of what I was thinking. strftime already does
> some of this to support %[zZ].
>
> But now that I look at time.strftime in py3k, it's converting the
> entire unicode string to a char string with PyUnicode_AsString, then
> converting back with PyUnicode_Decode.

Looks wrong to me, too... :-)

nd
-- 
André Malo
http://www.perlig.de/
Re: [Python-Dev] Backporting PEP 3101 to 2.6
Eric Smith wrote:
> Guido van Rossum wrote:
>> For data types whose output uses only ASCII, would it be acceptable
>> if they always returned an 8-bit string and left it up to the caller
>> to convert it to Unicode? This would apply to all numeric types. (The
>> date/time types have a strftime() style API which means the user must
>> be able to specify Unicode.)

I'm finally getting around to finishing this up. The approach I've taken
for int, long, and float, is that they take either unicode or str format
specifiers, and always return str results. The builtin format() deals
with converting str to unicode, if the format specifier was originally
unicode. This all works great. It allows me to easily implement both
''.format and u''.format taking int, long, and float parameters.

I'm now working on datetime. The __format__ method is really just a
wrapper around strftime. I was assuming (or rather hoping) that strftime
does the right thing with unicode and str (unicode in => unicode out,
str in => str out). But it turns out strftime doesn't accept unicode:

    $ ./python
    Python 2.6a0 (trunk:60845M, Feb 15 2008, 21:09:57)
    [GCC 4.1.2 20070626 (Red Hat 4.1.2-13)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import datetime
    >>> datetime.date.today().strftime('%y')
    '08'
    >>> datetime.date.today().strftime(u'%y')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: strftime() argument 1 must be str, not unicode

As part of this task, I'm really not up to the job of changing strftime
to support both str and unicode inputs. So I think I'll put all of the
__format__ code in place to support it if and when strftime supports
unicode. In the meantime, it won't be possible for u''.format to work
with datetime objects.

    >>> 'year: {0:%y}'.format(datetime.date.today())
    'year: 08'
    >>> u'year: {0:%y}'.format(datetime.date.today())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: strftime() argument 1 must be str, not unicode

The bad error message is a result of __format__ passing on unicode to
strftime.

There are, of course, various ugly ways to work around this involving
nested format calls.

Maybe I'll extend strftime to unicode for the PyCon sprint.

Eric.
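For comparison, here is a minimal sketch of the behaviour Eric is aiming for, written for Python 3, where every string is unicode and the str/unicode mismatch in the tracebacks above does not arise. The fixed date is an arbitrary choice for the example, not something taken from the thread.

```python
import datetime

# Fixed date so the result does not depend on when the example runs.
d = datetime.date(2008, 2, 15)

# A non-empty format spec is handed on to strftime ...
via_format = format(d, '%y')
via_strftime = d.strftime('%y')

# ... so str.format can embed strftime directives after the colon.
message = 'year: {0:%y}'.format(d)

# An empty spec falls back to str(d).
default = format(d, '')

print(via_format, via_strftime, message, default)
```

The point of the sketch is only that `__format__` on a date is a thin wrapper: whatever strftime accepts as a format is what the text after the colon may contain.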
Re: [Python-Dev] Backporting PEP 3101 to 2.6
Eric Smith wrote:
> The bad error message is a result of __format__ passing on unicode to
> strftime.
>
> There are, of course, various ugly ways to work around this involving
> nested format calls.

I don't know if this fits your definition of "ugly workaround", but what
if datetime.__format__ did something like:

    def __format__(self, spec):
        encoding = None
        if isinstance(spec, unicode):
            encoding = 'utf-8'
            spec = spec.encode(encoding)
        result = strftime(spec, self)
        if encoding is not None:
            result = result.decode(encoding)
        return result

Cheers,
Nick.

-- 
Nick Coghlan | Brisbane, Australia
http://www.boredomandlaziness.org
Re: [Python-Dev] Backporting PEP 3101 to 2.6
* Nick Coghlan wrote:
> Eric Smith wrote:
>> The bad error message is a result of __format__ passing on unicode to
>> strftime.
>>
>> There are, of course, various ugly ways to work around this involving
>> nested format calls.
>
> I don't know if this fits your definition of "ugly workaround", but
> what if datetime.__format__ did something like:
>
>     def __format__(self, spec):
>         encoding = None
>         if isinstance(spec, unicode):
>             encoding = 'utf-8'
>             spec = spec.encode(encoding)
>         result = strftime(spec, self)
>         if encoding is not None:
>             result = result.decode(encoding)
>         return result

Note that hardcoding utf-8 is a bad guess here as strftime(3) emits
locale strings, so decoding will easily fail.

I guess, a clean and complete solution (besides re-implementing the
whole thing) would be to resolve each single format character with
strftime, decode according to the locale and re-assemble the result
string piece by piece. Doh!

nd
-- 
[...] does anyone happen to know what the DIV tag stands for when
spelled out?
DIVerse. Named after all the unstructured stuff people pack into it
and then position absolutely ...
                 -- Florian Hartig and Lars Kasper in dciwam
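The piece-by-piece idea can be sketched as follows. This is only an illustration under stated assumptions, written for Python 3: `piecewise_strftime`, `raw_strftime` and `fake_latin1_strftime` are hypothetical names, the regular expression covers only single-letter directives, and the byte-oriented C strftime is simulated by a stand-in that pretends the locale is French with Latin-1 encoding (Python 3's own `time.strftime` already returns decoded text).

```python
import datetime
import re
import time

# Matches one strftime directive such as %Y or %B, or an escaped %%.
# Assumption: single-letter directives only, no platform-specific flags.
_DIRECTIVE = re.compile(r'%[A-Za-z%]')


def piecewise_strftime(fmt, t, raw_strftime, encoding):
    """Format fmt one directive at a time (hypothetical helper).

    raw_strftime(directive, t) stands in for a byte-oriented strftime:
    it takes one directive and returns locale-encoded bytes.
    """
    pieces = []
    pos = 0
    for match in _DIRECTIVE.finditer(fmt):
        # Literal text is copied through untouched, so it never has to
        # be representable in the locale's encoding.
        pieces.append(fmt[pos:match.start()])
        pieces.append(raw_strftime(match.group(), t).decode(encoding))
        pos = match.end()
    pieces.append(fmt[pos:])
    return ''.join(pieces)


def fake_latin1_strftime(directive, t):
    # Hypothetical stand-in for a French Latin-1 locale.
    if directive == '%B':
        return 'f\u00e9vrier'.encode('latin-1')
    return time.strftime(directive, t).encode('latin-1')


t = datetime.date(2008, 2, 15).timetuple()
result = piecewise_strftime('Date: %d %B %Y \u2713 100%%', t,
                            fake_latin1_strftime, 'latin-1')

# Decoding the same locale bytes as UTF-8 fails, which is the objection
# to hardcoding 'utf-8'.
utf8_guess_failed = False
try:
    fake_latin1_strftime('%B', t).decode('utf-8')
except UnicodeDecodeError:
    utf8_guess_failed = True
```

Because only the directive output is decoded, the check mark in the literal part of the format survives even though Latin-1 cannot represent it, which a whole-string encode/decode round trip would not allow.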
Re: [Python-Dev] Backporting PEP 3101 to 2.6
André Malo wrote:
> I guess, a clean and complete solution (besides re-implementing the
> whole thing) would be to resolve each single format character with
> strftime, decode according to the locale and re-assemble the result
> string piece by piece. Doh!

That's along the lines of what I was thinking. strftime already does
some of this to support %[zZ].

But now that I look at time.strftime in py3k, it's converting the entire
unicode string to a char string with PyUnicode_AsString, then converting
back with PyUnicode_Decode.
Re: [Python-Dev] Backporting PEP 3101 to 2.6
Guido van Rossum wrote:
> For data types whose output uses only ASCII, would it be acceptable if
> they always returned an 8-bit string and left it up to the caller to
> convert it to Unicode? This would apply to all numeric types. (The
> date/time types have a strftime() style API which means the user must
> be able to specify Unicode.)

To elaborate on this a bit (and handwaving a lot of important details
out of the way) do you mean something like the following for the builtin
format?:

    def format(obj, fmt_spec=None):
        if fmt_spec is None: fmt_spec=''
        result = obj.__format__(fmt_spec)
        if isinstance(fmt_spec, unicode):
            if isinstance(result, str):
                result = unicode(result)
        return result

Cheers,
Nick.

-- 
Nick Coghlan | Brisbane, Australia
http://www.boredomandlaziness.org
Re: [Python-Dev] Backporting PEP 3101 to 2.6
Nick Coghlan wrote:
> Guido van Rossum wrote:
>> For data types whose output uses only ASCII, would it be acceptable
>> if they always returned an 8-bit string and left it up to the caller
>> to convert it to Unicode? This would apply to all numeric types. (The
>> date/time types have a strftime() style API which means the user must
>> be able to specify Unicode.)
>
> To elaborate on this a bit (and handwaving a lot of important details
> out of the way) do you mean something like the following for the
> builtin format?:
>
>     def format(obj, fmt_spec=None):
>         if fmt_spec is None: fmt_spec=''
>         result = obj.__format__(fmt_spec)
>         if isinstance(fmt_spec, unicode):
>             if isinstance(result, str):
>                 result = unicode(result)
>         return result

That's the approach I'm taking. The builtin format is the only caller of
__format__ that I know of, so it's the only place this would need to be
done.
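Since the 2.x `unicode` type no longer exists, the promotion step can only be emulated today. The sketch below does so in Python 3 under a loudly stated assumption: `bytes` plays the role of the 2.x 8-bit `str` and `str` plays the role of 2.x `unicode`. `format_2x_style` and `AsciiInt` are hypothetical names made up for the illustration, not part of any Python release.

```python
def format_2x_style(obj, fmt_spec=None):
    """Emulate the proposed 2.6 builtin format() with Python 3 types.

    Assumption: bytes stands in for the 2.x 8-bit str, and str stands
    in for 2.x unicode.
    """
    if fmt_spec is None:
        fmt_spec = b''
    result = obj.__format__(fmt_spec)
    # Promote only when the caller passed a "unicode" spec but the type
    # answered with an 8-bit string.
    if isinstance(fmt_spec, str) and isinstance(result, bytes):
        result = result.decode('ascii')
    return result


class AsciiInt:
    """Hypothetical numeric type that always returns an 8-bit result."""

    def __init__(self, value):
        self.value = value

    def __format__(self, fmt_spec):
        # Accept either kind of spec, as Eric describes for int, long
        # and float, and always answer with the 8-bit type.
        if isinstance(fmt_spec, bytes):
            fmt_spec = fmt_spec.decode('ascii')
        return format(self.value, fmt_spec).encode('ascii')


promoted = format_2x_style(AsciiInt(5), '>4')
unpromoted = format_2x_style(AsciiInt(5), b'>4')
default = format_2x_style(AsciiInt(5))
```

The design choice this illustrates is that each numeric `__format__` stays simple and ASCII-only, while the single wrapper decides whether the caller should get the wider string type back.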
Re: [Python-Dev] Backporting PEP 3101 to 2.6
Steve Holden wrote:
> Nick Coghlan wrote:
>> To elaborate on this a bit (and handwaving a lot of important details
>> out of the way) do you mean something like the following for the
>> builtin format?:
>>
>>     def format(obj, fmt_spec=None):
>>         if fmt_spec is None: fmt_spec=''
>>         result = obj.__format__(fmt_spec)
>>         if isinstance(fmt_spec, unicode):
>>             if isinstance(result, str):
>>                 result = unicode(result)
>>         return result
>
> Isn't unicode idempotent? Couldn't
>
>     if isinstance(result, str):
>         result = unicode(result)
>
> avoid repeating in Python a test already made in C by re-spelling it
> as
>
>     result = unicode(result)
>
> or have you hand-waved away important details that mean the test
> really is required?

This code is written in C. It already has a check to verify that the
return from __format__ is either str or unicode, and another check that
fmt_spec is str or unicode. So doing the conversion only if result is
str and fmt_spec is unicode would be a cheap decision.

Good catch, though. I wouldn't have thought of it, and there are parts
that are written in Python, so maybe I can leverage this elsewhere.
Thanks!

Eric.
Re: [Python-Dev] Backporting PEP 3101 to 2.6
Nick Coghlan wrote:
> Guido van Rossum wrote:
>> For data types whose output uses only ASCII, would it be acceptable
>> if they always returned an 8-bit string and left it up to the caller
>> to convert it to Unicode? This would apply to all numeric types. (The
>> date/time types have a strftime() style API which means the user must
>> be able to specify Unicode.)
>
> To elaborate on this a bit (and handwaving a lot of important details
> out of the way) do you mean something like the following for the
> builtin format?:
>
>     def format(obj, fmt_spec=None):
>         if fmt_spec is None: fmt_spec=''
>         result = obj.__format__(fmt_spec)
>         if isinstance(fmt_spec, unicode):
>             if isinstance(result, str):
>                 result = unicode(result)
>         return result

Isn't unicode idempotent? Couldn't

    if isinstance(result, str):
        result = unicode(result)

avoid repeating in Python a test already made in C by re-spelling it as

    result = unicode(result)

or have you hand-waved away important details that mean the test really
is required?

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/
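Steve's idempotence point can be checked directly with the Python 3 `str` type, which is the successor of the 2.x `unicode` type. This is a small sketch rather than a statement about the 2.6 C code; the identity check at the end reflects CPython behaviour for exact `str` instances and is an implementation detail, not a language guarantee.

```python
text = 'year: 08'

# Converting text that is already text changes nothing ...
converted = str(text)
same_value = converted == text

# ... and in CPython an exact str instance is handed back as-is, so the
# unconditional call costs no copy.
same_object = converted is text
```

This is why dropping the `isinstance` test would be harmless in pure-Python code, while in C, as Eric notes, the type is already known and the explicit test is cheap.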
Re: [Python-Dev] Backporting PEP 3101 to 2.6
On 2008-01-10 14:31, Eric Smith wrote:
> (I'm posting to python-dev, because this isn't strictly 3.0 related.
> Hopefully most people read it in addition to python-3000).
>
> I'm working on backporting the changes I made for PEP 3101 (Advanced
> String Formatting) to the trunk, in order to meet the pre-PyCon
> release date for 2.6a1.
>
> I have a few questions about how I should handle str/unicode. 3.0 was
> pretty easy, because everything was unicode.

Since this is a new feature, why bother with strings at all (even in
2.6)? Use Unicode throughout and be done with it.

> 1: How should the builtin format() work? It takes 2 parameters, an
> object o and a string s, and returns o.__format__(s). If s is None, it
> returns o.__format__(empty_string). In 3.0, the empty string is of
> course unicode. For 2.6, should I use u'' or ''?
>
> 2: In 3.0, object.__format__() is essentially this:
>
>     class object:
>         def __format__(self, format_spec):
>             return format(str(self), format_spec)
>
> In 2.6, I assume it should be the equivalent of:
>
>     class object:
>         def __format__(self, format_spec):
>             if isinstance(format_spec, str):
>                 return format(str(self), format_spec)
>             elif isinstance(format_spec, unicode):
>                 return format(unicode(self), format_spec)
>             else:
>                 error
>
> Does that seem right?
>
> 3: Every overridden __format__() method is going to have to check for
> string or unicode, just like object.__format__() does, and return
> either a string or unicode object, appropriately. I don't see any way
> around this, but I'd like to hear any thoughts. I guess there aren't
> all that many __format__ methods that will be implemented, so this
> might not be a big burden. I'll of course implement the built in ones.
>
> Thanks in advance for any insights.

-- 
Marc-Andre Lemburg
eGenix.com
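The delegation pattern in question 2 (convert the object to a string, then let the string type interpret the format spec) is easiest to see on a user-defined type. A minimal sketch for Python 3 follows; `Point` is a hypothetical class used only for illustration, and the single branch reflects that Python 3 has one string type. Note that in current Python 3 releases `object.__format__` itself rejects non-empty format specs, so a type wanting this behaviour has to spell it out as below.

```python
class Point:
    """Hypothetical user type, used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return '({0}, {1})'.format(self.x, self.y)

    def __format__(self, format_spec):
        # Same shape as the 3.0 default quoted above: convert to a
        # string first, then let str handle width, fill and alignment.
        return format(str(self), format_spec)


padded = format(Point(1, 2), '>10')
plain = format(Point(1, 2))
centered = '{0:*^12}'.format(Point(1, 2))
```

In the 2.6 variant Eric describes, the only difference would be a second branch choosing `unicode(self)` when the spec is unicode.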
Re: [Python-Dev] Backporting PEP 3101 to 2.6
On Jan 10, 2008, at 9:07 AM, M.-A. Lemburg wrote:
> On 2008-01-10 14:31, Eric Smith wrote:
>> (I'm posting to python-dev, because this isn't strictly 3.0 related.
>> Hopefully most people read it in addition to python-3000).
>>
>> I'm working on backporting the changes I made for PEP 3101 (Advanced
>> String Formatting) to the trunk, in order to meet the pre-PyCon
>> release date for 2.6a1.
>>
>> I have a few questions about how I should handle str/unicode. 3.0 was
>> pretty easy, because everything was unicode.
>
> Since this is a new feature, why bother with strings at all (even in
> 2.6)? Use Unicode throughout and be done with it.

+1

-Barry
Re: [Python-Dev] Backporting PEP 3101 to 2.6
M.-A. Lemburg wrote:
> On 2008-01-10 14:31, Eric Smith wrote:
>> (I'm posting to python-dev, because this isn't strictly 3.0 related.
>> Hopefully most people read it in addition to python-3000).
>>
>> I'm working on backporting the changes I made for PEP 3101 (Advanced
>> String Formatting) to the trunk, in order to meet the pre-PyCon
>> release date for 2.6a1.
>>
>> I have a few questions about how I should handle str/unicode. 3.0 was
>> pretty easy, because everything was unicode.
>
> Since this is a new feature, why bother with strings at all (even in
> 2.6)? Use Unicode throughout and be done with it.

I was hoping someone would say that! It would certainly make things much
easier.

But for my own selfish reasons, I'd like to have str.format() work in
2.6. Other than the issues I raised here, I've already done the vast
majority of the work for the code to support either string or unicode.
For example, I put most of the implementation in Objects/stringlib, so I
can include it either as string or unicode.

But I can live with unicode only if that's the consensus.

Eric.