Re: [Python-Dev] Backporting PEP 3101 to 2.6

2008-02-21 Thread Eric Smith
André Malo wrote:
> * Eric Smith wrote:
>> But now that I look at time.strftime in py3k, it's converting the entire
>> unicode string to a char string with PyUnicode_AsString, then converting
>> back with PyUnicode_Decode.
>
> Looks wrong to me, too... :-)
>
> nd

I don't understand Unicode encoding/decoding well enough to describe 
this bug, but I admit it looks suspicious.  Could someone who does 
understand it open a bug against 3.0 (hopefully with an example that fails)?

The bug should also mention that 2.6 avoids this problem entirely by not 
supporting unicode with strftime or datetime.__format__, but 2.6 could 
probably leverage whatever solution is developed for 3.0.

Thanks.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Backporting PEP 3101 to 2.6

2008-02-17 Thread André Malo
* Eric Smith wrote:

> André Malo wrote:
>> I guess, a clean and complete solution (besides re-implementing the
>> whole thing) would be to resolve each single format character with
>> strftime, decode according to the locale and re-assemble the result
>> string piece by piece. Doh!
>
> That's along the lines of what I was thinking.  strftime already does
> some of this to support %[zZ].
>
> But now that I look at time.strftime in py3k, it's converting the entire
> unicode string to a char string with PyUnicode_AsString, then converting
> back with PyUnicode_Decode.

Looks wrong to me, too... :-)

nd
-- 
$_=q?tvc!uif)%*|#Bopuifs!A`#~tvc!Xibu)%*|qsjou#Kvtu!A`#~tvc!KBQI!)*|~
tvc!ifmm)%*|#Qfsm!A`#~tvc!jt)%*|(Ibdlfs(~  # What the hell is JAPH? ;
@_=split/\s\s+#/;$_=(join''=>map{chr(ord(  # André Malo ;
$_)-1)}split//=>$_[0]).$_[1];s s.*s$_see;  #  http://www.perlig.de/ ;


Re: [Python-Dev] Backporting PEP 3101 to 2.6

2008-02-16 Thread Eric Smith
Eric Smith wrote:
> Guido van Rossum wrote:
>> For data types whose output uses only ASCII, would it be acceptable if
>> they always returned an 8-bit string and left it up to the caller to
>> convert it to Unicode? This would apply to all numeric types. (The
>> date/time types have a strftime() style API which means the user must
>> be able to specify Unicode.)

I'm finally getting around to finishing this up.  The approach I've 
taken for int, long, and float, is that they take either unicode or str 
format specifiers, and always return str results.  The builtin format() 
deals with converting str to unicode, if the format specifier was 
originally unicode.  This all works great.  It allows me to easily 
implement both ''.format and u''.format taking int, long, and float 
parameters.

I'm now working on datetime.  The __format__ method is really just a 
wrapper around strftime.  I was assuming (or rather hoping) that 
strftime does the right thing with unicode and str (unicode in => unicode
out, str in => str out).  But it turns out strftime doesn't accept unicode:

$ ./python
Python 2.6a0 (trunk:60845M, Feb 15 2008, 21:09:57)
[GCC 4.1.2 20070626 (Red Hat 4.1.2-13)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import datetime
>>> datetime.date.today().strftime('%y')
'08'
>>> datetime.date.today().strftime(u'%y')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: strftime() argument 1 must be str, not unicode

As part of this task, I'm really not up to the job of changing strftime 
to support both str and unicode inputs.  So I think I'll put all of the 
__format__ code in place to support it if and when strftime supports 
unicode.  In the meantime, it won't be possible for u''.format to work 
with datetime objects.

>>> 'year: {0:%y}'.format(datetime.date.today())
'year: 08'
>>> u'year: {0:%y}'.format(datetime.date.today())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: strftime() argument 1 must be str, not unicode

The bad error message is a result of __format__ passing on unicode to 
strftime.

There are, of course, various ugly ways to work around this involving 
nested format calls.
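
One possible reading of the nested-call workaround, as a minimal sketch: 
format the date with a str spec first, then substitute the finished text 
into the unicode template.  The fixed date is only there to make the result 
reproducible, and the snippet is written so that it should also run 
unchanged on Python 3, where every string literal is already unicode.

```python
import datetime

d = datetime.date(2008, 2, 16)

# Inner call: the format spec is a plain str literal, so strftime accepts it.
year = format(d, '%y')

# Outer call: the already-formatted text is substituted with an empty spec,
# so the unicode template never reaches strftime.
message = u'year: {0}'.format(year)
print(message)
```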

Maybe I'll extend strftime to unicode for the PyCon sprint.

Eric.



Re: [Python-Dev] Backporting PEP 3101 to 2.6

2008-02-16 Thread Nick Coghlan
Eric Smith wrote:
> The bad error message is a result of __format__ passing on unicode to
> strftime.
>
> There are, of course, various ugly ways to work around this involving
> nested format calls.

I don't know if this fits your definition of ugly workaround, but what 
if datetime.__format__ did something like:

   def __format__(self, spec):
       encoding = None
       if isinstance(spec, unicode):
           encoding = 'utf-8'
           spec = spec.encode(encoding)
       result = strftime(spec, self)
       if encoding is not None:
           result = result.decode(encoding)
       return result

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] Backporting PEP 3101 to 2.6

2008-02-16 Thread André Malo
* Nick Coghlan wrote:

> Eric Smith wrote:
>> The bad error message is a result of __format__ passing on unicode to
>> strftime.
>>
>> There are, of course, various ugly ways to work around this involving
>> nested format calls.
>
> I don't know if this fits your definition of ugly workaround, but what
> if datetime.__format__ did something like:
>
>    def __format__(self, spec):
>        encoding = None
>        if isinstance(spec, unicode):
>            encoding = 'utf-8'
>            spec = spec.encode(encoding)
>        result = strftime(spec, self)
>        if encoding is not None:
>            result = result.decode(encoding)
>        return result

Note that hardcoding utf-8 is a bad guess here as strftime(3) emits locale 
strings, so decoding will easily fail.
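
A small illustration of that failure, as a sketch only: the bytes below are 
an assumed example of what a C-level strftime("%B") might emit for March 
under a German Latin-1 locale.  They are simulated here, and no locale is 
actually changed.

```python
# Simulated locale-encoded output (an assumption for illustration).
raw = u"M\u00e4rz".encode("iso-8859-1")

try:
    text = raw.decode("utf-8")
except UnicodeDecodeError:
    # 0xE4 followed by 'r' is not a valid UTF-8 sequence, so the hardcoded
    # guess fails and the locale's own encoding has to be used instead.
    text = raw.decode("iso-8859-1")

print(text)
```

In real code the fallback encoding would have to come from the active 
locale (for instance via locale.getpreferredencoding()) rather than being 
spelled out.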

I guess, a clean and complete solution (besides re-implementing the whole 
thing) would be to resolve each single format character with strftime, 
decode according to the locale and re-assemble the result string piece by 
piece. Doh!
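
A minimal sketch of the piece-by-piece idea, under stated assumptions: it 
targets Python 3, where time.strftime already returns text (so the "decode 
according to the locale" step happens inside that call), and 
strftime_piecewise is a hypothetical helper name, not an existing API.  
Literal text never goes through the C library; only the individual 
directives do.

```python
import re
import time

# One directive: '%' followed by a single character (e.g. %y, %Z, %%).
# Platform extensions such as glibc's %Ey are NOT handled by this sketch.
_DIRECTIVE = re.compile(r"%.")


def strftime_piecewise(fmt, t):
    """Expand each directive in fmt on its own and re-assemble the result.

    t is a time.struct_time; fmt is a text format string.
    """
    pieces = []
    pos = 0
    for match in _DIRECTIVE.finditer(fmt):
        pieces.append(fmt[pos:match.start()])           # literal text, untouched
        pieces.append(time.strftime(match.group(), t))  # one directive only
        pos = match.end()
    pieces.append(fmt[pos:])                            # trailing literal text
    return "".join(pieces)


print(strftime_piecewise("year: %Y", time.gmtime(0)))
```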

nd
-- 
> [...] does anyone happen to know what the DIV tag stands for when spelled out?
DIVerse stuff. Named after all the unstructured material that people
cram in there and then position absolutely ...
   -- Florian Hartig and Lars Kasper in dciwam


Re: [Python-Dev] Backporting PEP 3101 to 2.6

2008-02-16 Thread Eric Smith
André Malo wrote:

> I guess, a clean and complete solution (besides re-implementing the whole
> thing) would be to resolve each single format character with strftime,
> decode according to the locale and re-assemble the result string piece by
> piece. Doh!

That's along the lines of what I was thinking.  strftime already does 
some of this to support %[zZ].

But now that I look at time.strftime in py3k, it's converting the entire 
unicode string to a char string with PyUnicode_AsString, then converting 
back with PyUnicode_Decode.



Re: [Python-Dev] Backporting PEP 3101 to 2.6

2008-01-11 Thread Nick Coghlan
Guido van Rossum wrote:
> For data types whose output uses only ASCII, would it be acceptable if
> they always returned an 8-bit string and left it up to the caller to
> convert it to Unicode? This would apply to all numeric types. (The
> date/time types have a strftime() style API which means the user must
> be able to specify Unicode.)

To elaborate on this a bit (and handwaving a lot of important details 
out of the way) do you mean something like the following for the builtin 
format?:

def format(obj, fmt_spec=None):
    if fmt_spec is None: fmt_spec=''
    result = obj.__format__(fmt_spec)
    if isinstance(fmt_spec, unicode):
        if isinstance(result, str):
            result = unicode(result)
    return result

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] Backporting PEP 3101 to 2.6

2008-01-11 Thread Eric Smith
Nick Coghlan wrote:
> Guido van Rossum wrote:
>> For data types whose output uses only ASCII, would it be acceptable if
>> they always returned an 8-bit string and left it up to the caller to
>> convert it to Unicode? This would apply to all numeric types. (The
>> date/time types have a strftime() style API which means the user must
>> be able to specify Unicode.)
>
> To elaborate on this a bit (and handwaving a lot of important details
> out of the way) do you mean something like the following for the builtin
> format?:
>
> def format(obj, fmt_spec=None):
>     if fmt_spec is None: fmt_spec=''
>     result = obj.__format__(fmt_spec)
>     if isinstance(fmt_spec, unicode):
>         if isinstance(result, str):
>             result = unicode(result)
>     return result

That's the approach I'm taking.  The builtin format is the only caller 
of __format__ that I know of, so it's the only place this would need to 
be done.


Re: [Python-Dev] Backporting PEP 3101 to 2.6

2008-01-11 Thread Eric Smith
Steve Holden wrote:
> Nick Coghlan wrote:
>> To elaborate on this a bit (and handwaving a lot of important details
>> out of the way) do you mean something like the following for the builtin
>> format?:
>>
>> def format(obj, fmt_spec=None):
>>     if fmt_spec is None: fmt_spec=''
>>     result = obj.__format__(fmt_spec)
>>     if isinstance(fmt_spec, unicode):
>>         if isinstance(result, str):
>>             result = unicode(result)
>>     return result
>
> Isn't unicode idempotent? Couldn't
>
>    if isinstance(result, str):
>        result = unicode(result)
>
>
> avoid repeating in Python a test already made in C by re-spelling it as
>
>    result = unicode(result)
>
> or have you hand-waved away important details that mean the test really
> is required?

This code is written in C.  It already has a check to verify that the 
return from __format__ is either str or unicode, and another check that 
fmt_spec is str or unicode.  So doing the conversion only if result is 
str and fmt_spec is unicode would be a cheap decision.
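
The decision described above can be modeled roughly as follows.  This is a 
sketch for Python 3, not the actual C code: bytes stands in for the 2.x str 
type, str stands in for 2.x unicode, and format_compat and AsciiNumber are 
hypothetical names invented for the illustration.

```python
def format_compat(obj, fmt_spec=None):
    """Hypothetical model of the 2.6 builtin format() type handling."""
    if fmt_spec is None:
        fmt_spec = b""
    # Check 1: the spec must be one of the two string types.
    if not isinstance(fmt_spec, (bytes, str)):
        raise TypeError("format spec must be bytes or str")
    result = obj.__format__(fmt_spec)
    # Check 2: so must the value returned by __format__.
    if not isinstance(result, (bytes, str)):
        raise TypeError("__format__ must return bytes or str")
    # Convert only in the one case that needs it: narrow result, wide spec.
    if isinstance(fmt_spec, str) and isinstance(result, bytes):
        result = result.decode("ascii")
    return result


class AsciiNumber:
    """Toy numeric type that always returns the narrow (bytes) type."""

    def __init__(self, value):
        self.value = value

    def __format__(self, fmt_spec):
        if isinstance(fmt_spec, bytes):
            fmt_spec = fmt_spec.decode("ascii")
        return format(self.value, fmt_spec).encode("ascii")


print(format_compat(AsciiNumber(42), "05d"))
print(format_compat(AsciiNumber(42), b"05d"))
```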

Good catch, though.  I wouldn't have thought of it, and there are parts 
that are written in Python, so maybe I can leverage this elsewhere.  Thanks!

Eric.


Re: [Python-Dev] Backporting PEP 3101 to 2.6

2008-01-11 Thread Steve Holden
Nick Coghlan wrote:
> Guido van Rossum wrote:
>> For data types whose output uses only ASCII, would it be acceptable if
>> they always returned an 8-bit string and left it up to the caller to
>> convert it to Unicode? This would apply to all numeric types. (The
>> date/time types have a strftime() style API which means the user must
>> be able to specify Unicode.)
>
> To elaborate on this a bit (and handwaving a lot of important details
> out of the way) do you mean something like the following for the builtin
> format?:
>
> def format(obj, fmt_spec=None):
>     if fmt_spec is None: fmt_spec=''
>     result = obj.__format__(fmt_spec)
>     if isinstance(fmt_spec, unicode):
>         if isinstance(result, str):
>             result = unicode(result)
>     return result
>
 
Isn't unicode idempotent? Couldn't

  if isinstance(result, str):
      result = unicode(result)


avoid repeating in Python a test already made in C by re-spelling it as

 result = unicode(result)

or have you hand-waved away important details that mean the test really 
is required?

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC      http://www.holdenweb.com/



Re: [Python-Dev] Backporting PEP 3101 to 2.6

2008-01-10 Thread M.-A. Lemburg
On 2008-01-10 14:31, Eric Smith wrote:
> (I'm posting to python-dev, because this isn't strictly 3.0 related.
> Hopefully most people read it in addition to python-3000).
>
> I'm working on backporting the changes I made for PEP 3101 (Advanced
> String Formatting) to the trunk, in order to meet the pre-PyCon release
> date for 2.6a1.
>
> I have a few questions about how I should handle str/unicode.  3.0 was
> pretty easy, because everything was unicode.

Since this is a new feature, why bother with strings at all
(even in 2.6) ?

Use Unicode throughout and be done with it.

> 1: How should the builtin format() work?  It takes 2 parameters, an
> object o and a string s, and returns o.__format__(s).  If s is None, it
> returns o.__format__(empty_string).  In 3.0, the empty string is of
> course unicode.  For 2.6, should I use u'' or ''?
>
>
> 2: In 3.0, object.__format__() is essentially this:
>
>     class object:
>         def __format__(self, format_spec):
>             return format(str(self), format_spec)
>
> In 2.6, I assume it should be the equivalent of:
>
>     class object:
>         def __format__(self, format_spec):
>             if isinstance(format_spec, str):
>                 return format(str(self), format_spec)
>             elif isinstance(format_spec, unicode):
>                 return format(unicode(self), format_spec)
>             else:
>                 raise TypeError('format_spec must be str or unicode')
>
> Does that seem right?
>
>
> 3: Every overridden __format__() method is going to have to check for
> string or unicode, just like object.__format__() does, and return either a
> string or unicode object, appropriately.  I don't see any way around
> this, but I'd like to hear any thoughts.  I guess there aren't all that
> many __format__ methods that will be implemented, so this might not be a
> big burden.  I'll of course implement the built in ones.
>
> Thanks in advance for any insights.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 10 2008)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


 Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611


Re: [Python-Dev] Backporting PEP 3101 to 2.6

2008-01-10 Thread Barry Warsaw
On Jan 10, 2008, at 9:07 AM, M.-A. Lemburg wrote:

> On 2008-01-10 14:31, Eric Smith wrote:
>> (I'm posting to python-dev, because this isn't strictly 3.0 related.
>> Hopefully most people read it in addition to python-3000).
>>
>> I'm working on backporting the changes I made for PEP 3101 (Advanced
>> String Formatting) to the trunk, in order to meet the pre-PyCon release
>> date for 2.6a1.
>>
>> I have a few questions about how I should handle str/unicode.  3.0 was
>> pretty easy, because everything was unicode.
>
> Since this is a new feature, why bother with strings at all
> (even in 2.6) ?
>
> Use Unicode throughout and be done with it.

+1
-Barry



Re: [Python-Dev] Backporting PEP 3101 to 2.6

2008-01-10 Thread Eric Smith
M.-A. Lemburg wrote:
> On 2008-01-10 14:31, Eric Smith wrote:
>> (I'm posting to python-dev, because this isn't strictly 3.0 related.
>> Hopefully most people read it in addition to python-3000).
>>
>> I'm working on backporting the changes I made for PEP 3101 (Advanced
>> String Formatting) to the trunk, in order to meet the pre-PyCon release
>> date for 2.6a1.
>>
>> I have a few questions about how I should handle str/unicode.  3.0 was
>> pretty easy, because everything was unicode.
>
> Since this is a new feature, why bother with strings at all
> (even in 2.6) ?
>
> Use Unicode throughout and be done with it.

I was hoping someone would say that!  It would certainly make things 
much easier.

But for my own selfish reasons, I'd like to have str.format() work in 
2.6.  Other than the issues I raised here, I've already done the vast 
majority of the work for the code to support either string or unicode. 
For example, I put most of the implementation in Objects/stringlib, so I 
can include it either as string or unicode.

But I can live with unicode only if that's the consensus.

Eric.