Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread Shane Hathaway
Martin v. Löwis wrote:
> Define correctly. Python, in ucs2 mode, will allow you to address individual
> surrogate codes, e.g. in indexing. So you get
> 
> >>> u"\U00012345"[0]

When Python encodes characters internally in UCS-2, I would expect
u"\U00012345" to produce a UnicodeError("character can not be encoded in
UCS-2").

> u'\ud808'
> 
> This will never work "correctly", and never should, because an efficient
> implementation isn't possible. If you want "safe" indexing and slicing,
> you need ucs4.

I agree that UCS4 is needed.  There is a balancing act here; UTF-16 is
widely used and takes less space, while UCS4 is easier to treat as an
array of characters.  Maybe we can have both: unicode objects start with
an internal representation in UTF-16, but get promoted automatically to
UCS4 when you index or slice them.  The difference will not be visible
to Python code.  A compile-time switch will not be necessary.  What do
you think?
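
A rough sketch of the promotion idea, written in present-day Python 3 rather than as a change to the C implementation. The class name LazyText and its attributes are hypothetical, and a list of code points stands in for a UCS4 buffer; it only shows that the switch can stay invisible to the caller.

```python
class LazyText:
    """Hypothetical container: UTF-16 storage until first index or slice."""

    def __init__(self, text):
        self._utf16 = text.encode("utf-16-le")  # compact storage form
        self._ucs4 = None                        # filled in on promotion

    def _promote(self):
        # Decode once, then keep one integer per code point.
        if self._ucs4 is None:
            decoded = self._utf16.decode("utf-16-le")
            self._ucs4 = [ord(ch) for ch in decoded]
            self._utf16 = None
        return self._ucs4

    @property
    def promoted(self):
        return self._ucs4 is not None

    def __len__(self):
        return len(self._promote())

    def __getitem__(self, index):
        points = self._promote()
        if isinstance(index, slice):
            return "".join(chr(p) for p in points[index])
        return chr(points[index])


text = LazyText("a\U00012345b")
first_lookup = text[1]  # triggers the promotion
```

Indexing returns the whole character, never half of a surrogate pair, at the cost of a one-time conversion.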

Shane
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread Shane Hathaway
Martin v. Löwis wrote:
> Shane Hathaway wrote:
>>More generally, how should a non-unicode-expert writing Python extension
>>code find out the minimum they need to know about unicode to use the
>>Python unicode API?  The API reference [1] ought to at least have a list
>>of background links.  I had to hunt everywhere.
> 
> That, of course, depends on what your background is. Did you know what
> Latin-1 is, when you started? How it relates to code page 1252? What
> UTF-8 is? What an abstract character is, as opposed to a byte sequence
> on the one hand, and to a glyph on the other hand?
>
> Different people need different background, especially if they are
> writing different applications.

Yes, but the first few steps are the same for nearly everyone, and
people need more help taking the first few steps.  In particular:

- The Python docs link to unicode.org, but unicode.org is complicated,
long-winded, and leaves many questions unanswered.  The Wikipedia
article is far better.  I wish I had thought to look there instead.

  http://en.wikipedia.org/wiki/Unicode

- The docs should say what to expect to happen when a large unicode
character winds up in a Py_UNICODE array.  For instance, what is
len(u'\U00012345')?  1 or 2?  Does the answer depend on the UCS4
compile-time switch?

- The docs should help developers evaluate whether they need the UCS4
compile-time switch.  Is UCS2 good enough for Asia?  For math?  For
hieroglyphics? 
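
To make the len() question concrete, here is a small sketch in present-day Python 3. The helper names and the sample characters are my own choices, not anything from the docs. On a UCS2 build of the interpreter discussed here, len(u'\U00012345') is 2 (one per surrogate); on a UCS4 build it is 1. The helpers below compute both views explicitly.

```python
def utf16_code_units(text):
    """Number of 16-bit units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def fits_in_bmp(text):
    """True if every code point is at most U+FFFF (no surrogates needed)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Illustrative samples, chosen by me from current Unicode.
samples = {
    "common CJK ideograph U+4E2D": "\u4e2d",
    "CJK Extension B ideograph U+20000": "\U00020000",
    "mathematical script capital A U+1D49C": "\U0001d49c",
    "the U+12345 example from this thread": "\U00012345",
}

for label, ch in samples.items():
    print(label, "-> code units:", utf16_code_units(ch),
          "fits in BMP:", fits_in_bmp(ch))
```

Anything that does not fit in the BMP needs two Py_UNICODE slots on a UCS2 build.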

Shane


Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread Martin v. Löwis
Nicholas Bastin wrote:
> Yes, but the important question here is why would we want that?  Why
> doesn't Python just have *one* internal representation of a Unicode
> character?  Having more than one possible definition just creates
> problems, and provides no value.

It does provide value, there are good reasons for each setting. Which
of the two alternatives do you consider useless?

Regards,
Martin


Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread Martin v. Löwis
Nicholas Bastin wrote:
> --enable-unicode=ucs2
> 
> be replaced with:
> 
> --enable-unicode=utf16
> 
> and the docs be updated to reflect more accurately the variance of the
> internal storage type.

-1. This breaks existing documentation and usage, and provides only
minimum value.

With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start
supporting the full Unicode ccs the same way it supports UCS-2.
Individual surrogate values remain accessible, and supporting
non-BMP characters is left to the application (with the exception
of the UTF-8 codec).
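
For readers who have not met surrogates before, this is the arithmetic that splits a non-BMP code point into the two 16-bit values a UCS2 build stores. It is a sketch in Python with function names of my choosing; the constants are the standard UTF-16 ones.

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into (high, low) surrogate values."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x12345)
print(hex(high), hex(low))  # 0xd808 0xdf45
```

The high value 0xD808 is what indexing position 0 of u"\U00012345" exposes on a UCS2 build.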

Regards,
Martin




Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread Martin v. Löwis
Shane Hathaway wrote:
> I agree that UCS4 is needed.  There is a balancing act here; UTF-16 is
> widely used and takes less space, while UCS4 is easier to treat as an
> array of characters.  Maybe we can have both: unicode objects start with
> an internal representation in UTF-16, but get promoted automatically to
> UCS4 when you index or slice them.  The difference will not be visible
> to Python code.  A compile-time switch will not be necessary.  What do
> you think?

This breaks backwards compatibility with existing extension modules.
Applications that do PyUnicode_AsUnicode get a Py_UNICODE*, and
can use that to directly access the characters.

Regards,
Martin


Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread Martin v. Löwis
> Yes, but the first few steps are the same for nearly everyone, and
> people need more help taking the first few steps.

Contributions to the documentation are certainly welcome.

Regards,
Martin


Re: [Python-Dev] Proposed alternative to __next__ and __exit__

2005-05-07 Thread Oren Tirosh
I suggest using a variation on the consumer interface, as described by
Fredrik Lundh at http://effbot.org/zone/consumer.htm :

.next() -- stays .next()
.__next__(arg) --  becomes .feed(arg)
.__exit__(StopIteration, ...) -- becomes .close()
.__exit__(..,..,..) -- becomes .feed(exc_info=(..,..,..))   

Extensions to effbot's original consumer interface:
1. The .feed() method may return a value 
2. Some way to raise an exception other than StopIteration inside the
generator/consumer function.  The use of a keyword argument to .feed
is just an example. I'm looking for other suggestions on this one.
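
A minimal sketch of what such a consumer could look like, in present-day Python. LineConsumer is a made-up example class, not part of effbot's article or of any PEP; it only illustrates feed() returning a value (extension 1) and close() signalling the end of input.

```python
class LineConsumer:
    """Hypothetical consumer: feed() takes text chunks, close() flushes."""

    def __init__(self):
        self._buffer = ""
        self.lines = []

    def feed(self, data):
        # Keep the trailing partial line in the buffer, store complete ones.
        self._buffer += data
        *complete, self._buffer = self._buffer.split("\n")
        self.lines.extend(complete)
        return len(complete)  # extension 1: feed() may return a value

    def close(self):
        # End of input: whatever is left counts as a final line.
        if self._buffer:
            self.lines.append(self._buffer)
            self._buffer = ""


consumer = LineConsumer()
consumer.feed("ab\ncd")
consumer.feed("ef\n")
consumer.close()
print(consumer.lines)  # ['ab', 'cdef']
```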

No new builtins. No backward-compatibility methods and wrappers.

Yes, it would have been nicer if .next() had been called __next__() in
the first place. But at this stage I feel that the cost of "fixing" it
far outweighs any perceived benefit.

so much for "uncontroversial" parts!  :-)

  Oren


On 5/6/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> [Steven Bethard]
> > So, just to make sure, if we had another PEP that contained from PEP 340[1]:
> >  * Specification: the __next__() Method
> >  * Specification: the next() Built-in Function
> >  * Specification: a Change to the 'for' Loop
> >  * Specification: the Extended 'continue' Statement
> >  * the yield-expression part of Specification: Generator Exit Handling
> > would that cover all the pieces you're concerned about?
> >
> > I'd be willing to break these off into a separate PEP if people think
> > it's a good idea.  I've seen very few complaints about any of these
> > pieces of the proposal.  If possible, I'd like to see these things
> > approved now, so that the discussion could focus more directly on the
> > block-statement issues.
> 
> I don't think it's necessary to separate this out into a separate PEP;
> that just seems busy-work. I agree these parts are orthogonal and
> uncontroversial; a counter-PEP can suffice by stating that it's not
> countering those items nor repeating them.
> 
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread Shane Hathaway
Martin v. Löwis wrote:
> Shane Hathaway wrote:
> 
>>I agree that UCS4 is needed.  There is a balancing act here; UTF-16 is
>>widely used and takes less space, while UCS4 is easier to treat as an
>>array of characters.  Maybe we can have both: unicode objects start with
>>an internal representation in UTF-16, but get promoted automatically to
>>UCS4 when you index or slice them.  The difference will not be visible
>>to Python code.  A compile-time switch will not be necessary.  What do
>>you think?
> 
> 
> This breaks backwards compatibility with existing extension modules.
> Applications that do PyUnicode_AsUnicode get a Py_UNICODE*, and
> can use that to directly access the characters.

Py_UNICODE would always be 32 bits wide.  PyUnicode_AsUnicode would
cause the unicode object to be promoted automatically.  Extensions that
break as a result are technically broken already, aren't they?  They're
not supposed to depend on the size of Py_UNICODE.

Shane


Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread Martin v. Löwis
Shane Hathaway wrote:
> Py_UNICODE would always be 32 bits wide.

This would break PythonWin, which relies on Py_UNICODE being
the same as WCHAR_T. PythonWin is not broken, it just hasn't
been ported to UCS-4, yet (and porting this is difficult and
will cause a performance loss).

Regards,
Martin




Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread M.-A. Lemburg
Shane Hathaway wrote:
> Martin v. Löwis wrote:
> 
>>Shane Hathaway wrote:
>>
>>
>>>I agree that UCS4 is needed.  There is a balancing act here; UTF-16 is
>>>widely used and takes less space, while UCS4 is easier to treat as an
>>>array of characters.  Maybe we can have both: unicode objects start with
>>>an internal representation in UTF-16, but get promoted automatically to
>>>UCS4 when you index or slice them.  The difference will not be visible
>>>to Python code.  A compile-time switch will not be necessary.  What do
>>>you think?
>>
>>
>>This breaks backwards compatibility with existing extension modules.
>>Applications that do PyUnicode_AsUnicode get a Py_UNICODE*, and
>>can use that to directly access the characters.
> 
> 
> Py_UNICODE would always be 32 bits wide.  PyUnicode_AsUnicode would
> cause the unicode object to be promoted automatically.  Extensions that
> break as a result are technically broken already, aren't they?  They're
> not supposed to depend on the size of Py_UNICODE.

-1.

You are free to compile Python with --enable-unicode=ucs4
if you prefer this setting.

I don't see any reason why we should force users to invest 4 bytes
of storage for each Unicode code point - 2 bytes work just fine
and can represent all Unicode characters that are currently
defined (using surrogates if necessary). As more and more
Unicode objects are used in a process, choosing UCS2 vs. UCS4
does make a huge difference in terms of used memory.

All this talk about UTF-16 vs. UCS-2 is not very useful
and strikes me as purely academic.

The concern about possible breakage from slicing a Unicode object
in the middle of a surrogate pair is valid, but the idea that UCS-4
is less prone to breakage is a myth:

Unicode has many code points that are meant only for composition
and don't have any standalone meaning, e.g. a combining acute
accent (U+0301), yet they are perfectly valid code points -
regardless of UCS-2 or UCS-4. It is easily possible to break
such a combining sequence using slicing, so the most
often presented argument for using UCS-4 instead of UCS-2
(+ surrogates) is rather weak if seen by daylight.
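
A short illustration of that point in present-day Python, using only the standard unicodedata module. The variable names are mine. No surrogates are involved, yet the slice still cuts a user-visible character in two.

```python
import unicodedata

decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
head = decomposed[:1]    # slicing drops the accent: just 'e'

# Normalizing to NFC folds the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 2 1
print(head == "e")                     # True
```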

Some may now say that combining sequences are not used
all that often. However, they play a central role in Unicode
normalization (http://www.unicode.org/reports/tr15/),
which is needed whenever you want to semantically
compare Unicode objects and are

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 07 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
> 
>>Hmm, looking at the configure.in script, it seems you're right.
>>I wonder why this weird dependency on TCL was added.
> 
> 
> If Python is configured for UCS-2, and Tcl for UCS-4, then
> Tkinter would not work out of the box. Hence the weird dependency.

I believe that it would be more appropriate to adjust the _tkinter
module to adapt to the TCL Unicode size rather than
forcing the complete Python system to adapt to TCL - I don't
really see the point in an optional extension module
defining the default for the interpreter core.

At the very least, this should be a user controlled option.

Otherwise, we might as well use sizeof(wchar_t) as basis
for the default Unicode size. This at least, would be
a much more reasonable choice than whatever TCL uses.

-
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread Nicholas Bastin

On May 7, 2005, at 9:24 AM, Martin v. Löwis wrote:

> Nicholas Bastin wrote:
>> Yes, but the important question here is why would we want that?  Why
>> doesn't Python just have *one* internal representation of a Unicode
>> character?  Having more than one possible definition just creates
>> problems, and provides no value.
>
> It does provide value, there are good reasons for each setting. Which
> of the two alternatives do you consider useless?

I don't consider either alternative useless (well, I consider UCS-2 to 
be largely useless in the general case, but as we've already discussed 
here, Python isn't really UCS-2).  However, I would be a lot happier if 
we just chose *one*, and all Pythons used that one.  This would make 
extension module distribution a lot easier.

I'd prefer UTF-16, but I would be perfectly happy with UCS-4.

--
Nick



Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread Nicholas Bastin

On May 7, 2005, at 9:29 AM, Martin v. Löwis wrote:

> Nicholas Bastin wrote:
>> --enable-unicode=ucs2
>>
>> be replaced with:
>>
>> --enable-unicode=utf16
>>
>> and the docs be updated to reflect more accurately the variance of the
>> internal storage type.
>
> -1. This breaks existing documentation and usage, and provides only
> minimum value.

Have you been missing this conversation?  UTF-16 is *WHAT PYTHON 
CURRENTLY IMPLEMENTS*.  The current documentation is flat out wrong.  
Breaking that isn't a big problem in my book.

It provides more than minimum value - it provides the truth.


> With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start
> supporting the full Unicode ccs the same way it supports UCS-2.
> Individual surrogate values remain accessible, and supporting
> non-BMP characters is left to the application (with the exception
> of the UTF-8 codec).

I can't understand what you mean by this.  My point is that if you 
configure python to support UCS-2, then it SHOULD NOT support surrogate 
pairs.  Supporting surrogate pairs is the purview of variable width 
encodings, and UCS-2 is not among them.

--
Nick



Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)

2005-05-07 Thread Eric Nieuwland
Nick Coghlan wrote:

> [...]
> The whole PEP draft can be found here:
> http://members.iinet.net.au/~ncoghlan/public/pep-3XX.html
> [...]
> Used as follows::
>
>  for del auto_retry(3, IOError):
>      f = urllib.urlopen("http://python.org/")
>      print f.read()

I don't know. Using 'del' in that place seems awkward to me.
Why not use the following rule:

    for [VAR in] EXPR:
        SUITE

If EXPR is an iterator, no finalisation is done.
If EXPR is not an iterator, it is created at the start and destroyed at 
the end of the loop.

--eric



Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread M.-A. Lemburg
Nicholas Bastin wrote:
> On May 7, 2005, at 9:29 AM, Martin v. Löwis wrote:
>>With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start
>>supporting the full Unicode ccs the same way it supports UCS-2.
>>Individual surrogate values remain accessible, and supporting
>>non-BMP characters is left to the application (with the exception
>>of the UTF-8 codec).
> 
> I can't understand what you mean by this.  My point is that if you 
> configure python to support UCS-2, then it SHOULD NOT support surrogate 
> pairs.  Supporting surrogate pairs is the purview of variable width 
> encodings, and UCS-2 is not among them.

Surrogate pairs are only supported by the UTF-8 and UTF-16
codecs (and a few others), not the Python Unicode
implementation itself - this treats surrogate code
points just like any other Unicode code point.

This allows us to be flexible and efficient in the implementation
while guaranteeing the round-trip safety of Unicode data processed
through Python.
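
As an illustration of the round trip, here is a sketch in present-day Python 3, where the str type is no longer tied to a 2-byte or 4-byte build. The helper name round_trips is my own.

```python
def round_trips(text, encoding):
    """True if encoding and then decoding gives back the same text."""
    return text.encode(encoding).decode(encoding) == text


text = "\U00012345"

utf8 = text.encode("utf-8")       # one 4-byte sequence
utf16 = text.encode("utf-16-le")  # surrogate pair D808 DF45, little-endian

print(utf8.hex(), utf16.hex())    # f0928d85 08d845df
print(round_trips(text, "utf-8"), round_trips(text, "utf-16-le"))
```

The codec does the pairing and unpairing; the string object just carries whatever code units or code points it was handed.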

Your complaint about the documentation (which started this
thread) is valid.

However, I don't understand all the excitement
about Py_UNICODE: if you don't like the way this Python
typedef works, you are free to interface to Python using
any of the supported encodings using PyUnicode_Encode()
and PyUnicode_Decode(). I'm sure you'll find one that
fits your needs and if not, you can even write your
own codec and register it with Python, e.g. UTF-32
which we currently don't support ;-)

Please upload your doc-patch to SF.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)

2005-05-07 Thread Josiah Carlson

Eric Nieuwland <[EMAIL PROTECTED]> wrote:
> 
> Nick Coghlan wrote:
> 
> > [...]
> > The whole PEP draft can be found here:
> > http://members.iinet.net.au/~ncoghlan/public/pep-3XX.html
> > [...]
> > Used as follows::
> >
> >  for del auto_retry(3, IOError):
> >      f = urllib.urlopen("http://python.org/")
> >      print f.read()
> 
> I don't know. Using 'del' in that place seems awkward to me.
> Why not use the following rule:
>   for [VAR in] EXPR:
>       SUITE
> If EXPR is an iterator, no finalisation is done.
> If EXPR is not an iterator, it is created at the start and destroyed at 
> the end of the loop.

You should know why that can't work.  If I pass a list, is a list an
iterator?  No, but it should neither be created nor destroyed before or
after.
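
To spell out the distinction, here is a quick check in present-day Python using collections.abc. The variable names are mine. A list is iterable but is not itself an iterator, so a rule keyed on "EXPR is an iterator" would treat every plain list as something to create and destroy.

```python
from collections.abc import Iterable, Iterator

items = [1, 2, 3]
it = iter(items)

print(isinstance(items, Iterable), isinstance(items, Iterator))  # True False
print(isinstance(it, Iterator))                                  # True
print(iter(it) is it)        # an iterator returns itself from iter()
print(iter(items) is items)  # a list hands out a fresh iterator instead
```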

The discussion has been had in regards to why re-using 'for' is a
non-starter; re-read the 200+ messages in the thread.

 - Josiah



Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread Nicholas Bastin

On May 7, 2005, at 5:09 PM, M.-A. Lemburg wrote:

> However, I don't understand all the excitement
> about Py_UNICODE: if you don't like the way this Python
> typedef works, you are free to interface to Python using
> any of the supported encodings using PyUnicode_Encode()
> and PyUnicode_Decode(). I'm sure you'll find one that
> fits your needs and if not, you can even write your
> own codec and register it with Python, e.g. UTF-32
> which we currently don't support ;-)

My concerns about Py_UNICODE are completely separate from my 
frustration that the documentation is wrong about this type.  It is 
much more important that the documentation be correct, first, and then 
we can discuss the reasons why it can be one of two values, rather than 
just a uniform value across all python implementations.  This makes 
distributing binary extension modules hard.  It has become clear to me 
that no one on this list gives a *%&^ about people attempting to 
distribute binary extension modules, or they would have cared about 
this problem, so I'll just drop that point.

However, somehow, what keeps getting lost in the mix is that 
--enable-unicode=ucs2 is a lie, and we should change what this 
configure option says.  Martin seems to disagree with me, for reasons 
that I don't understand.  I would be fine with calling the option 
utf16, or just 2 and 4, but not ucs2, as that means things that Python 
doesn't intend it to mean.

--
Nick



Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread Nicholas Bastin

On May 7, 2005, at 5:09 PM, M.-A. Lemburg wrote:

> Please upload your doc-patch to SF.

All of my proposals for what to change the documentation to have been 
shot down by Martin.  If someone has better verbiage that they'd like 
to see, I'd be perfectly happy to patch the doc.

My last suggestion was:

"This type represents the storage type which is used by Python 
internally as the basis for holding Unicode ordinals.  Extension module 
developers should make no assumptions about the size of this type on 
any given platform."

--
Nick



Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Thread M.-A. Lemburg
Nicholas Bastin wrote:
> On May 7, 2005, at 5:09 PM, M.-A. Lemburg wrote:
> 
> 
>>However, I don't understand all the excitement
>>about Py_UNICODE: if you don't like the way this Python
>>typedef works, you are free to interface to Python using
>>any of the supported encodings using PyUnicode_Encode()
>>and PyUnicode_Decode(). I'm sure you'll find one that
>>fits your needs and if not, you can even write your
>>own codec and register it with Python, e.g. UTF-32
>>which we currently don't support ;-)
> 
> 
> My concerns about Py_UNICODE are completely separate from my 
> frustration that the documentation is wrong about this type.  It is 
> much more important that the documentation be correct, first, and then 
> we can discuss the reasons why it can be one of two values, rather than 
> just a uniform value across all python implementations.  This makes 
> distributing binary extension modules hard.  It has become clear to me 
> that no one on this list gives a *%&^ about people attempting to 
> distribute binary extension modules, or they would have cared about 
> this problem, so I'll just drop that point.

Actually, many of us know about the problem of having to
ship UCS2 and UCS4 builds of binary extensions and the
troubles this causes with users.

It just adds one more dimension to the number of builds
you have to make - one for the Python version, another
for the platform and in the case of Linux another one for
the Unicode width. Nowadays most Linux distros ship UCS4
builds (after RedHat started this quest), so things start
to normalize again.
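
For what it is worth, a build's Unicode width can be detected at run time through sys.maxunicode, which is 0xFFFF on a UCS2 build and 0x10FFFF on a UCS4 build. The helper below is a sketch with a name of my choosing. On Python 3.3 and later the distinction is gone and it always reports 4.

```python
import sys


def unicode_width():
    """Bytes per code unit implied by sys.maxunicode (2 or 4)."""
    return 4 if sys.maxunicode > 0xFFFF else 2


print("maxunicode:", hex(sys.maxunicode), "width:", unicode_width())
```

An installer or import-time check could use this to pick the matching binary, assuming both variants are shipped.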

> However, somehow, what keeps getting lost in the mix is that 
> --enable-unicode=ucs2 is a lie, and we should change what this 
> configure option says.  Martin seems to disagree with me, for reasons 
> that I don't understand.  I would be fine with calling the option 
> utf16, or just 2 and 4, but not ucs2, as that means things that Python 
> doesn't intend it to mean.

It's not a lie: the Unicode implementation does work with
UCS2 code points (surrogate values are Unicode code points as
well - they happen to live in a special zone of the BMP).

Only the codecs add support for surrogates in a way that
allows round-trip safety regardless of whether you used UCS2
or UCS4 as compile time option.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)

2005-05-07 Thread Ron Adam

Josiah Carlson wrote:

 > You should know why that can't work.  If I pass a list, is a list an
 > iterator?  No, but it should neither be created nor destroyed before or
 > after.
 >
 > The discussion has been had in regards to why re-using 'for' is a
 > non-starter; re-read the 200+ messages in the thread.
 >
 >  - Josiah


I agree, re-using or extending 'for' doesn't seem like a good idea to me.

I wonder how much effect adding, 'for-next' and the 'StopIteration' 
exception check as proposed in PEP340, will have on 'for''s performance.

And why this isn't just as good:

 try:
     for value in iterator:
         BLOCK1
 except StopIteration:
     BLOCK2

Is one extra line that bad?


I think a completely separate looping or non-looping construct would be 
better for the finalization issue, and maybe can work with classes with 
__exit__ as well as generators.

Having it loop has the advantage of making it break out in a better 
behaved way.  So maybe Nick's PEP would work better with a different 
keyword?

Hint: 'do'

Cheers,
Ron_Adam



Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340

2005-05-07 Thread Bob Ippolito
On May 7, 2005, at 1:45 AM, Michele Simionato wrote:

> On 5/6/05, Steven Bethard <[EMAIL PROTECTED]> wrote:
>
>> FWIW, I'm +1 on this.  Enhanced Iterators
>>  * updates the iterator protocol to use .__next__() instead of .next()
>>  * introduces a new builtin next()
>>  * allows continue-statements to pass values to iterators
>>  * allows generators to receive values with a yield-expression
>> The first two are, I believe, how the iterator protocol probably
>> should have been in the first place.  The second two provide a simple
>> way of passing values to generators, something I got the impression
>> that the co-routiney people would like a lot.
>>
>
> Thank you for splitting the PEP. Conceptually, the "coroutine" part
> has nothing to do with blocks and it stands on its own, it is right
> to discuss it separately from the block syntax.
>
> Personally, I do not see an urgent need for the block syntax (most of
> the use cases can be managed with decorators) nor for the "coroutine"
> syntax (you can already use Armin Rigo's greenlets for that).

While Armin's greenlets are really cool they're also really dangerous  
when you're integrating with C code, especially event loops and  
such.  Language support would be MUCH better.

-bob



Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)

2005-05-07 Thread Nick Coghlan
Ron Adam wrote:
> I agree, re-using or extending 'for' doesn't seem like a good idea to me.

I agree that re-using a straight 'for' loop is out, due to performance and 
compatibility issues with applying finalisation semantics to all such iterative 
loops (there's a reason the PEP redraft doesn't suggest this).

However, it makes sense to me that a "for loop with finalisation" should 
actually *be* a 'for' loop - just with some extra syntax to indicate that the 
iterator is finalised at the end of the loop.

An option other than the one in my PEP draft would be to put 'del' at the end 
of the line instead of before EXPR:

   for [VAR in] EXPR [del]:
       BLOCK1
   else:
       BLOCK2

However, as you say, 'del' isn't great for the purpose, but I was trying to 
avoid introducing yet another keyword. An obvious alternative is to use 
'finally' instead:

   for [finally] [VAR in] EXPR:
       BLOCK1
   else:
       BLOCK2

It still doesn't read all that well, but at least the word more accurately 
reflects the semantics involved.

If a new keyword is used to request iterator finalisation, it should probably 
include the word 'for' since it *is* a for loop:

   foreach [VAR in] EXPR:
       BLOCK1
   else:
       BLOCK2

That is, a basic 'for' loop wouldn't finalise the iterator, but a 'foreach' 
loop would. The other difference is that the iterator in the 'foreach' loop 
has the chance to suppress exceptions other than TerminateBlock/StopIteration 
(by refusing to be finalised in response to the exception).

The last option is to leave finalisation out of the 'for' loop syntax, and 
introduce a user defined statement to handle the finalisation:

   def consuming(iterable):
       itr = iter(iterable)
       try:
           yield itr
       finally:
           itr_exit = getattr(itr, "__exit__", None)
           if itr_exit is not None:
               try:
                   itr_exit(TerminateBlock)
               except TerminateBlock:
                   pass

   stmt consuming(iterable) as itr:
       for item in itr:
           process(item)
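
For comparison only, here is how the same shape can be written in present-day Python. This is not the syntax proposed in this thread: it assumes the 'with' statement and contextlib.contextmanager that Python gained later, and it assumes finalisation means calling the iterator's close() method rather than __exit__(TerminateBlock). The generator numbers() and the list log are made-up test fixtures.

```python
from contextlib import contextmanager


@contextmanager
def consuming(iterable):
    """Hand out an iterator and finalise it when the block ends."""
    itr = iter(iterable)
    try:
        yield itr
    finally:
        # Assumption: "finalise" means calling close() when it exists.
        close = getattr(itr, "close", None)
        if close is not None:
            close()


log = []


def numbers():
    try:
        yield 1
        yield 2
    finally:
        log.append("finalised")  # runs when the generator is closed


with consuming(numbers()) as itr:
    for item in itr:
        break  # leave early; the generator is still suspended here

print(log)  # ['finalised']
```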

With this approach, iterators can't swallow exceptions. This means that 
something like auto_retry() would once again have to be written as a class:

   class auto_retry(object):
       def __init__(self, times, exc=Exception):
           self.times = xrange(times-1)
           self.exc = exc
           self.succeeded = False

       def __iter__(self):
           attempt = self.attempt
           for i in self.times:
               yield attempt()
               if self.succeeded:
                   break
           else:
               yield self.last_attempt()

       def attempt(self):
           try:
               yield
               self.succeeded = True
           except self.exc:
               pass

       def last_attempt(self):
           yield


   for attempt in auto_retry(3, IOError):
       stmt attempt:
           # Do something!
           # Including break to give up early
           # Or continue to try again without raising IOError
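
A rough runnable sketch of the same class-based approach in present-day Python follows. It is an approximation under stated assumptions: 'with' stands in for the hypothetical 'stmt', and a small helper class (here called _Attempt, a name invented for this sketch) replaces the generator-based attempt() and last_attempt() methods. The flaky() function is also made up to exercise it.

```python
class _Attempt:
    """Hypothetical helper: one attempt, optionally swallowing the error."""

    def __init__(self, owner, swallow):
        self.owner = owner
        self.swallow = swallow

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.owner.succeeded = True
            return False
        # Suppress the exception only on non-final attempts.
        return self.swallow and issubclass(exc_type, self.owner.exc)


class auto_retry:
    def __init__(self, times, exc=Exception):
        self.times = times
        self.exc = exc
        self.succeeded = False

    def __iter__(self):
        for _ in range(self.times - 1):
            yield _Attempt(self, swallow=True)
            if self.succeeded:
                break
        else:
            # Last attempt: let the exception propagate to the caller.
            yield _Attempt(self, swallow=False)


calls = []


def flaky():
    # Made-up operation that fails twice and then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise IOError("transient failure")
    return "ok"


result = None
for attempt in auto_retry(3, IOError):
    with attempt:
        result = flaky()

print(result, len(calls))
```

As in the class above, the first times-1 attempts swallow the named exception, and only the final attempt lets it escape.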

> I wonder how much effect adding, 'for-next' and the 'StopIteration' 
> exception check as proposed in PEP340, will have on 'for''s performance.

I'm not sure what you mean here - 'for' loops already use a StopIteration 
raised by the iterator to indicate that the loop is complete. The code you 
posted can't work, since it also intercepts a StopIteration raised in the body 
of the loop.
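
To illustrate the point, here is a small sketch (the run() helper is hypothetical, written only for this demonstration). The 'except StopIteration' clause never fires when the iterator is simply exhausted, because the 'for' loop consumes that exception itself; it fires only when the loop body raises StopIteration.

```python
def run(values, raise_in_body=False):
    """Record what happens when a for loop is wrapped in try/except."""
    events = []
    try:
        for value in iter(values):
            events.append(value)
            if raise_in_body:
                # Raised by the loop body, not by the iterator.
                raise StopIteration
    except StopIteration:
        events.append("except")
    return events


print(run([1, 2]))        # normal exhaustion: the except clause is skipped
print(run([1, 2], True))  # body raises: the except clause intercepts it
```

So the try/except form cannot act as a substitute for 'else', and it also hides a StopIteration that escapes from the body.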

> I think a completely separate looping or non-looping construct would be 
> better for the finalization issue, and maybe can work with class's with 
> __exit__ as well as generators.

The PEP redraft already proposes a non-looping version as a new statement. 
However, since generators are likely to start using the new non-looping 
statement, it's important to be able to ensure timely finalisation of normal 
iterators as well. Tim and Greg's discussion the other day convinced me of this 
- that's why the idea of using 'del' to mark a finalised loop made its way into 
the draft. It can be done using a user defined statement (as shown above), but 
it would be nicer to have something built into the 'for' loop syntax to handle 
it.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com


Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)

2005-05-07 Thread Josiah Carlson

Ron Adam <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
> 
>  > You should know why that can't work.  If I pass a list, is a list an
>  > iterator?  No, but it should neither be created nor destroyed before or
>  > after.
>  >
>  > The discussion has been had in regards to why re-using 'for' is a
>  > non-starter; re-read the 200+ messages in the thread.
>  >
>  >  - Josiah
> 
> 
> I agree, re-using or extending 'for' doesn't seem like a good idea to me.

Now that I've actually stopped to read Nick's PEP, my concern is that
'del', while being a keyword, would not be easy to spot embedded in the
rest of the line, and a large number of these 'statements' will only be
executed once, so the 'for' may confuse people.


> I wonder how much effect adding, 'for-next' and the 'StopIteration' 
> exception check as proposed in PEP340, will have on 'for''s performance.

For is already tuned to be as fast as possible, which makes sense; it is
used 4,523 times in Python 2.4.0's standard library, and easily hundreds
of thousands of times in user code.  Changing the standard for loop is
not to be done lightly.


> And why this isn't just as good:
> 
>  try:
>      for value in iterator:
>          BLOCK1
>  except StopIteration:
>      BLOCK2
> 
> Is one extra line that bad?

I don't know what line you are referring to.

> I think a completely separate looping or non-looping construct would be 
> better for the finalization issue, and maybe can work with class's with 
> __exit__ as well as generators.

From what I understand, the entire conversation has always stated that
class-based finalized objects and generator-based finalized objects will
both work, and that any proposal that works for one, but not the other,
is not sufficient.


> Having it loop has the advantage of making it break out in a better 
> behaved way.

What you have just typed is nonsense.  Re-type it and be explicit.


> Hint: 'do'

'do' has been previously mentioned in the thread.

 - Josiah



Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)

2005-05-07 Thread Jp Calderone
On Sun, 08 May 2005 14:16:40 +1000, Nick Coghlan <[EMAIL PROTECTED]> wrote:
>Ron Adam wrote:
>> I agree, re-using or extending 'for' doesn't seem like a good idea to me.
>
>I agree that re-using a straight 'for' loop is out, due to performance and
>compatibility issues with applying finalisation semantics to all such iterative
>loops (there's a reason the PEP redraft doesn't suggest this).
>
>However, it makes sense to me that a "for loop with finalisation" should
>actually *be* a 'for' loop - just with some extra syntax to indicate that the
>iterator is finalised at the end of the loop.
>
>An option other than the one in my PEP draft would be to put 'del' at the end
>of the line instead of before EXPR:
>
>   for [VAR in] EXPR [del]:
>       BLOCK1
>   else:
>       BLOCK2
>
>However, as you say, 'del' isn't great for the purpose, but I was trying to
>avoid introducing yet another keyword. An obvious alternative is to use
>'finally' instead:
>
>   for [finally] [VAR in] EXPR:
>       BLOCK1
>   else:
>       BLOCK2
>
>It still doesn't read all that well, but at least the word more accurately
>reflects the semantics involved.

  If such a construct is to be introduced, the ideal spelling would seem to be:

    for [VAR in] EXPR:
        BLOCK1
    finally:
        BLOCK2

  Jp


Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)

2005-05-07 Thread Ron Adam
Nick Coghlan wrote:
> Ron Adam wrote:
> 
>>I agree, re-using or extending 'for' doesn't seem like a good idea to me.
> 
> 
> I agree that re-using a straight 'for' loop is out, due to performance and 
> compatibility issues with applying finalisation semantics to all such 
> iterative loops (there's a reason the PEP redraft doesn't suggest this).
> 
> However, it makes sense to me that a "for loop with finalisation" should 
> actually *be* a 'for' loop - just with some extra syntax to indicate that the 
> iterator is finalised at the end of the loop.

Question: is the 'for' in your case iterating over a sequence, or is it 
testing for an assignment to determine whether it should continue?

The difference is slight I admit, and both views can be said to be true 
for 'for' loops iterating over lists also.  But maybe looking at it as a 
truth test of getting something instead of an iteration over a sequence 
would fit better?  When a variable to assign is not supplied then the 
test would be of a private continue-stop variable in the iterator or a 
StopIteration exception.


> However, as you say, 'del' isn't great for the purpose, but I was trying to 
> avoid introducing yet another keyword. 

I didn't say that; it was Josiah. But I agree 'del' is not good.

> An obvious alternative is to use 
> 'finally' instead:
> 
>    for [finally] [VAR in] EXPR:
>        BLOCK1
>    else:
>        BLOCK2
> 
> It still doesn't read all that well, but at least the word more accurately 
> reflects the semantics involved.

How about:

   [VAR from] EXPR:

Could 'from' be reused in this context?

If the keyword chosen is completely different from 'for' or 'while', 
then it doesn't need a 'del' or 'finally' as that can be part of the new 
definition of whatever keyword is chosen.

I suggested reusing 'while' a few days ago because it fit the situation 
well, but I have come to the conclusion that reusing either 'for' or 
'while' should be avoided.

So you might consider 'do'. Guido responded with the following the other 
day:

#quote

 >[Greg Ewing]

 >> How about 'do'?
 >>
 >>do opening(filename) as f:
 >>  ...
 >>
 >>do locking(obj):
 >>  ...
 >>
 >>do carefully(): #  :-)
 >>  ...

I've been thinking of that too. It's short, and in a nostalgic way
conveys that it's a loop, without making it too obvious. (Those too
young to get that should Google for do-loop.  :-)

I wonder how many folks call their action methods do() though.

#endquote

So it's not been ruled out, or followed through with, as far as I know. 
And I think it will work for both looping and non-looping situations.


> The last option is to leave finalisation out of the 'for' loop syntax, and 
> introduce a user defined statement to handle the finalisation:

Yes, leaving it out of 'for' loop syntax is good.

I don't have an opinion on user defined statements yet.  But I think 
they would be somewhat slower than a built in block that does the same 
thing.  Performance will be an issue because these things will be nested 
and possibly quite deeply.

>>I wonder how much effect adding, 'for-next' and the 'StopIteration' 
>>exception check as proposed in PEP340, will have on 'for''s performance.
> 
> I'm not sure what you mean here - 'for' loops already use a StopIteration 
> raised by the iterator to indicate that the loop is complete. The code you 
> posted can't work, since it also intercepts a StopIteration raised in the 
> body of the loop.

Oops, meant that to say 'for-else' above ...

The 'else' is new, isn't it?  I was thinking that putting a try-except 
around the loop does the same thing as the else.  Unless I misunderstand 
its use.

But you are right, it wouldn't work if the loop catches the StopIteration.
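
For reference, a minimal sketch of how 'for-else' behaves in Python as it stands (the else clause on loops already exists in the language and is not something PEP 340 adds). The find() helper is made up for illustration: the else suite runs only when the loop finishes without hitting 'break'.

```python
def find(needle, haystack):
    """Report whether needle occurs in haystack, using for-else."""
    for item in haystack:
        if item == needle:
            result = "found"
            break
    else:
        # Reached only when the loop ran to completion without a break.
        result = "not found"
    return result


print(find(2, [1, 2, 3]))
print(find(9, [1, 2, 3]))
```

A try-except around the loop is therefore not equivalent: the else suite depends on how the loop ended, not on an exception reaching the caller.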


>>I think a completely separate looping or non-looping construct would be 
>>better for the finalization issue, and maybe can work with class's with 
>>__exit__ as well as generators.
> 
> 
> The PEP redraft already proposes a non-looping version as a new statement. 
> However, since generators are likely to start using the new non-looping 
> statement, it's important to be able to ensure timely finalisation of normal 
> iterators as well. 

Huh?  I thought a normal iterator or generator doesn't need 
finalization?  If it does, then it's not normal.  Has a word been coined 
for iterators with try-finally's in them yet?

Ron_Adam  :-)




Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)

2005-05-07 Thread Ron Adam
Josiah Carlson wrote:

> For is already tuned to be as fast as possible, which makes sense; it is
> used 4,523 times in Python 2.4.0's standard library, and easily hundreds
> of thousands of times in user code.  Changing the standard for loop is
> not to be done lightly.

Agreed!

>>And why this isn't just as good:
>>
>> try:
>>     for value in iterator:
>>         BLOCK1
>> except StopIteration:
>>     BLOCK2
>>
>>Is one extra line that bad?
> 
> 
> I don't know what line you are referring to.

I was referring to the 'try'; the 'except' would be in place of the else.

Nick pointed out this wouldn't work as the 'for' already catches the 
StopIteration exception.


>>I think a completely separate looping or non-looping construct would be 
>>better for the finalization issue, and maybe can work with class's with 
>>__exit__ as well as generators.
> 
> From what I understand, the entire conversation has always stated that
> class-based finalized objects and generator-based finalized objects will
> both work, and that any proposal that works for one, but not the other,
> is not sufficient.

That's good to hear.  There seems to be some confusion as to whether or 
not 'for' loops will do finalizing.  So I was trying to stress that I think 
regular 'for' loops should not finalize. They should probably give an 
error if given an object with a try-finally in it or an __exit__ method. 
I'm not sure what the current opinion on that is.  But I didn't see it 
in any of the PEPs.

>>Having it loop has the advantage of making it break out in a better 
>>behaved way.
> 
> What you have just typed is nonsense.  Re-type it and be explicit.

It was a bit brief, sorry about that. :-)

To get a non-looping block to loop, you will need to put it in a loop or 
put a loop in it.

In the first case, doing a 'break' in the block doesn't exit the loop, 
so you need to add an extra test for that.

In the second case, doing a 'break' in the loop does exit the block, but 
finishes any code after the loop.  So you may need an extra test in that 
case too.

Having a block that loops can simplify these conditions, in that a break 
always exits the body of the block and stops the loop.  A 'continue' can 
be used to skip the end of the block and start the next loop early.

And you still have the option to put the block in a loop or loops in the 
block and they will work as they do now.

I hope that clarifies what I was thinking a bit better.


Ron_Adam
