Re: [Python-Dev] New Py_UNICODE doc
Martin v. Löwis wrote:
> Define correctly. Python, in ucs2 mode, will allow you to address individual
> surrogate codes, e.g. in indexing. So you get
>
> >>> u"\U00012345"[0]
When Python encodes characters internally in UCS-2, I would expect
u"\U00012345" to produce a UnicodeError("character can not be encoded in
UCS-2").
> u'\ud808'
>
> This will never work "correctly", and never should, because an efficient
> implementation isn't possible. If you want "safe" indexing and slicing,
> you need ucs4.
I agree that UCS4 is needed. There is a balancing act here; UTF-16 is
widely used and takes less space, while UCS4 is easier to treat as an
array of characters. Maybe we can have both: unicode objects start with
an internal representation in UTF-16, but get promoted automatically to
UCS4 when you index or slice them. The difference will not be visible
to Python code. A compile-time switch will not be necessary. What do
you think?
Shane
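(A sketch of the behaviour Martin describes, assuming a narrow (UCS-2)
Python 2.x build; the session below is illustrative, not taken from the
original mail. On a UCS-4 build the same string is a single code point.)

    >>> s = u"\U00012345"
    >>> len(s)       # stored as a surrogate pair on a narrow build
    2
    >>> s[0]         # indexing exposes the individual surrogate code units
    u'\ud808'
    >>> s[1]
    u'\udf45'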
Re: [Python-Dev] New Py_UNICODE doc
Martin v. Löwis wrote:
> Shane Hathaway wrote:
>> More generally, how should a non-unicode-expert writing Python extension
>> code find out the minimum they need to know about unicode to use the
>> Python unicode API? The API reference [1] ought to at least have a list
>> of background links. I had to hunt everywhere.
>
> That, of course, depends on what your background is. Did you know what
> Latin-1 is, when you started? How it relates to code page 1252? What
> UTF-8 is? What an abstract character is, as opposed to a byte sequence
> on the one hand, and to a glyph on the other hand?
>
> Different people need different background, especially if they are
> writing different applications.

Yes, but the first few steps are the same for nearly everyone, and people
need more help taking the first few steps. In particular:

- The Python docs link to unicode.org, but unicode.org is complicated,
  long-winded, and leaves many questions unanswered. The Wikipedia article
  is far better. I wish I had thought to look there instead.
  http://en.wikipedia.org/wiki/Unicode

- The docs should say what to expect to happen when a large unicode
  character winds up in a Py_UNICODE array. For instance, what is
  len(u'\U00012345')? 1 or 2? Does the answer depend on the UCS4
  compile-time switch?

- The docs should help developers evaluate whether they need the UCS4
  compile-time switch. Is UCS2 good enough for Asia? For math? For
  hieroglyphics?

Shane
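(For what it's worth, a sketch of the answer to the len() question,
assuming Python 2.x; the result does depend on the compile-time switch.)

    # Illustrative only: the answer depends on how Python was built.
    s = u'\U00012345'
    print len(s)    # prints 2 on a UCS-2 (narrow) build, where the character
                    # is stored as a surrogate pair, and 1 on a UCS-4 build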
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote:
> Yes, but the important question here is why would we want that? Why
> doesn't Python just have *one* internal representation of a Unicode
> character? Having more than one possible definition just creates
> problems, and provides no value.

It does provide value, there are good reasons for each setting. Which
of the two alternatives do you consider useless?

Regards,
Martin
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote:
> --enable-unicode=ucs2
>
> be replaced with:
>
> --enable-unicode=utf16
>
> and the docs be updated to reflect more accurately the variance of the
> internal storage type.

-1. This breaks existing documentation and usage, and provides only
minimum value.

With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start
supporting the full Unicode ccs the same way it supports UCS-2.
Individual surrogate values remain accessible, and supporting non-BMP
characters is left to the application (with the exception of the UTF-8
codec).

Regards,
Martin
Re: [Python-Dev] New Py_UNICODE doc
Shane Hathaway wrote:
> I agree that UCS4 is needed. There is a balancing act here; UTF-16 is
> widely used and takes less space, while UCS4 is easier to treat as an
> array of characters. Maybe we can have both: unicode objects start with
> an internal representation in UTF-16, but get promoted automatically to
> UCS4 when you index or slice them. The difference will not be visible
> to Python code. A compile-time switch will not be necessary. What do
> you think?

This breaks backwards compatibility with existing extension modules.
Applications that do PyUnicode_AsUnicode get a Py_UNICODE*, and can use
that to directly access the characters.

Regards,
Martin
Re: [Python-Dev] New Py_UNICODE doc
> Yes, but the first few steps are the same for nearly everyone, and
> people need more help taking the first few steps.

Contributions to the documentation are certainly welcome.

Regards,
Martin
Re: [Python-Dev] Proposed alternative to __next__ and __exit__
I suggest using a variation on the consumer interface, as described by
Fredrik Lundh at http://effbot.org/zone/consumer.htm :

  .next()                       -- stays .next()
  .__next__(arg)                -- becomes .feed(arg)
  .__exit__(StopIteration, ...) -- becomes .close()
  .__exit__(..,..,..)           -- becomes .feed(exc_info=(..,..,..))

Extensions to effbot's original consumer interface:

1. The .feed() method may return a value
2. Some way to raise an exception other than StopIteration inside the
   generator/consumer function. The use of a keyword argument to .feed
   is just an example. I'm looking for other suggestions on this one.

No new builtins. No backward-compatibility methods and wrappers. Yes, it
would have been nicer if .next() had been called __next__() in the first
place. But at this stage I feel that the cost of "fixing" it far
outweighs any perceived benefit.

so much for "uncontroversial" parts! :-)

Oren

On 5/6/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> [Steven Bethard]
> > So, just to make sure, if we had another PEP that contained from PEP 340[1]:
> > * Specification: the __next__() Method
> > * Specification: the next() Built-in Function
> > * Specification: a Change to the 'for' Loop
> > * Specification: the Extended 'continue' Statement
> > * the yield-expression part of Specification: Generator Exit Handling
> > would that cover all the pieces you're concerned about?
> >
> > I'd be willing to break these off into a separate PEP if people think
> > it's a good idea. I've seen very few complaints about any of these
> > pieces of the proposal. If possible, I'd like to see these things
> > approved now, so that the discussion could focus more directly on the
> > block-statement issues.
>
> I don't think it's necessary to separate this out into a separate PEP;
> that just seems busy-work. I agree these parts are orthogonal and
> uncontroversial; a counter-PEP can suffice by stating that it's not
> countering those items nor repeating them.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
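(For reference, the consumer interface effbot describes is just an
object with feed()/close() methods. A minimal, hypothetical example in
Python 2.x follows; LineCollector is an illustrative name, not part of
any library.)

    # Minimal sketch of the feed()/close() consumer protocol.
    class LineCollector:
        def __init__(self):
            self.parts = []
        def feed(self, data):
            # under the proposal above, feed() may also return a value
            self.parts.append(data)
        def close(self):
            return "".join(self.parts)

    c = LineCollector()
    c.feed("hello ")
    c.feed("world")
    print c.close()    # prints: hello world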
Re: [Python-Dev] New Py_UNICODE doc
Martin v. Löwis wrote:
> Shane Hathaway wrote:
>
>> I agree that UCS4 is needed. There is a balancing act here; UTF-16 is
>> widely used and takes less space, while UCS4 is easier to treat as an
>> array of characters. Maybe we can have both: unicode objects start with
>> an internal representation in UTF-16, but get promoted automatically to
>> UCS4 when you index or slice them. The difference will not be visible
>> to Python code. A compile-time switch will not be necessary. What do
>> you think?
>
> This breaks backwards compatibility with existing extension modules.
> Applications that do PyUnicode_AsUnicode get a Py_UNICODE*, and
> can use that to directly access the characters.

Py_UNICODE would always be 32 bits wide. PyUnicode_AsUnicode would cause
the unicode object to be promoted automatically. Extensions that break
as a result are technically broken already, aren't they? They're not
supposed to depend on the size of Py_UNICODE.

Shane
Re: [Python-Dev] New Py_UNICODE doc
Shane Hathaway wrote:
> Py_UNICODE would always be 32 bits wide.

This would break PythonWin, which relies on Py_UNICODE being the same as
WCHAR_T. PythonWin is not broken, it just hasn't been ported to UCS-4,
yet (and porting this is difficult and will cause a performance loss).

Regards,
Martin
Re: [Python-Dev] New Py_UNICODE doc
Shane Hathaway wrote:
> Martin v. Löwis wrote:
>
>> Shane Hathaway wrote:
>>
>>> I agree that UCS4 is needed. There is a balancing act here; UTF-16 is
>>> widely used and takes less space, while UCS4 is easier to treat as an
>>> array of characters. Maybe we can have both: unicode objects start with
>>> an internal representation in UTF-16, but get promoted automatically to
>>> UCS4 when you index or slice them. The difference will not be visible
>>> to Python code. A compile-time switch will not be necessary. What do
>>> you think?
>>
>> This breaks backwards compatibility with existing extension modules.
>> Applications that do PyUnicode_AsUnicode get a Py_UNICODE*, and
>> can use that to directly access the characters.
>
> Py_UNICODE would always be 32 bits wide. PyUnicode_AsUnicode would
> cause the unicode object to be promoted automatically. Extensions that
> break as a result are technically broken already, aren't they? They're
> not supposed to depend on the size of Py_UNICODE.

-1.

You are free to compile Python with --enable-unicode=ucs4 if you prefer
this setting. I don't see any reason why we should force users to invest
4 bytes of storage for each Unicode code point - 2 bytes work just fine
and can represent all Unicode characters that are currently defined
(using surrogates if necessary). As more and more Unicode objects are
used in a process, choosing UCS2 vs. UCS4 does make a huge difference in
terms of used memory.

All this talk about UTF-16 vs. UCS-2 is not very useful and strikes me
as purely academic. The reference to possible breakage by slicing a
Unicode string and breaking a surrogate pair is valid, but the idea of
UCS-4 being less prone to breakage is a myth:

Unicode has many code points that are meant only for composition and
don't have any standalone meaning, e.g. a combining acute accent
(U+0301), yet they are perfectly valid code points - regardless of UCS-2
or UCS-4. It is easily possible to break such a combining sequence using
slicing, so the most often presented argument for using UCS-4 instead of
UCS-2 (+ surrogates) is rather weak if seen by daylight.

Some may now say that combining sequences are not used all that often.
However, they play a central role in Unicode normalization
(http://www.unicode.org/reports/tr15/), which is needed whenever you
want to semantically compare Unicode objects and are

--
Marc-Andre Lemburg
eGenix.com
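(To illustrate the combining-sequence point: a sketch, assuming Python
2.x with the unicodedata module; the behaviour is the same on UCS-2 and
UCS-4 builds.)

    # u"e" followed by COMBINING ACUTE ACCENT is canonically equivalent
    # to u"\xe9", yet naive slicing splits the pair on any build.
    import unicodedata
    s = u"e\u0301"
    print len(s)                                       # 2 code points
    print repr(s[:1])                                  # u'e' -- accent sliced off
    print unicodedata.normalize("NFC", s) == u"\xe9"   # True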
Re: [Python-Dev] New Py_UNICODE doc
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>
>> Hmm, looking at the configure.in script, it seems you're right.
>> I wonder why this weird dependency on TCL was added.
>
> If Python is configured for UCS-2, and Tcl for UCS-4, then
> Tkinter would not work out of the box. Hence the weird dependency.

I believe that it would be more appropriate to adjust the _tkinter
module to adapt to the TCL Unicode size rather than forcing the complete
Python system to adapt to TCL - I don't really see the point in an
optional extension module defining the default for the interpreter core.

At the very least, this should be a user controlled option. Otherwise,
we might as well use sizeof(wchar_t) as basis for the default Unicode
size. This at least would be a much more reasonable choice than whatever
TCL uses.

--
Marc-Andre Lemburg
eGenix.com
Re: [Python-Dev] New Py_UNICODE doc
On May 7, 2005, at 9:24 AM, Martin v. Löwis wrote:
> Nicholas Bastin wrote:
>> Yes, but the important question here is why would we want that? Why
>> doesn't Python just have *one* internal representation of a Unicode
>> character? Having more than one possible definition just creates
>> problems, and provides no value.
>
> It does provide value, there are good reasons for each setting. Which
> of the two alternatives do you consider useless?

I don't consider either alternative useless (well, I consider UCS-2 to
be largely useless in the general case, but as we've already discussed
here, Python isn't really UCS-2). However, I would be a lot happier if
we just chose *one*, and all Pythons used that one. This would make
extension module distribution a lot easier. I'd prefer UTF-16, but I
would be perfectly happy with UCS-4.

--
Nick
Re: [Python-Dev] New Py_UNICODE doc
On May 7, 2005, at 9:29 AM, Martin v. Löwis wrote:
> Nicholas Bastin wrote:
>> --enable-unicode=ucs2
>>
>> be replaced with:
>>
>> --enable-unicode=utf16
>>
>> and the docs be updated to reflect more accurately the variance of the
>> internal storage type.
>
> -1. This breaks existing documentation and usage, and provides only
> minimum value.

Have you been missing this conversation? UTF-16 is *WHAT PYTHON
CURRENTLY IMPLEMENTS*. The current documentation is flat out wrong.
Breaking that isn't a big problem in my book. It provides more than
minimum value - it provides the truth.

> With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start
> supporting the full Unicode ccs the same way it supports UCS-2.
> Individual surrogate values remain accessible, and supporting
> non-BMP characters is left to the application (with the exception
> of the UTF-8 codec).

I can't understand what you mean by this. My point is that if you
configure python to support UCS-2, then it SHOULD NOT support surrogate
pairs. Supporting surrogate pairs is the purview of variable width
encodings, and UCS-2 is not among them.

--
Nick
Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)
Nick Coghlan wrote:
> [...]
> The whole PEP draft can be found here:
> http://members.iinet.net.au/~ncoghlan/public/pep-3XX.html
> [...]
> Used as follows::
>
> for del auto_retry(3, IOError):
> f = urllib.urlopen("http://python.org/")
> print f.read()
I don't know. Using 'del' in that place seems awkward to me.
Why not use the following rule:
for [VAR in] EXPR:
    SUITE
If EXPR is an iterator, no finalisation is done.
If EXPR is not an iterator, it is created at the start and destroyed at
the end of the loop.
--eric
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote:
> On May 7, 2005, at 9:29 AM, Martin v. Löwis wrote:
>> With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start
>> supporting the full Unicode ccs the same way it supports UCS-2.
>> Individual surrogate values remain accessible, and supporting
>> non-BMP characters is left to the application (with the exception
>> of the UTF-8 codec).
>
> I can't understand what you mean by this. My point is that if you
> configure python to support UCS-2, then it SHOULD NOT support surrogate
> pairs. Supporting surrogate pairs is the purview of variable width
> encodings, and UCS-2 is not among them.

Surrogate pairs are only supported by the UTF-8 and UTF-16 codecs (and a
few others), not the Python Unicode implementation itself - this treats
surrogate code points just like any other Unicode code point. This
allows us to be flexible and efficient in the implementation while
guaranteeing the round-trip safety of Unicode data processed through
Python.

Your complaint about the documentation (which started this thread) is
valid. However, I don't understand all the excitement about Py_UNICODE:
if you don't like the way this Python typedef works, you are free to
interface to Python using any of the supported encodings using
PyUnicode_Encode() and PyUnicode_Decode(). I'm sure you'll find one that
fits your needs and if not, you can even write your own codec and
register it with Python, e.g. UTF-32 which we currently don't support ;-)

Please upload your doc-patch to SF.

Thanks,
--
Marc-Andre Lemburg
eGenix.com
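(A sketch of the round-trip safety mentioned above, assuming a narrow
(UCS-2) Python 2.x build; the session is illustrative only.)

    # The UTF-8 codec knows about surrogate pairs, so non-BMP text
    # survives an encode/decode round trip even though a narrow build
    # stores it as two code units.
    u = u"\U00012345"
    data = u.encode("utf-8")
    print repr(data)                  # '\xf0\x92\x8d\x85'
    print data.decode("utf-8") == u   # True
    print len(u)                      # 2 on a UCS-2 build, 1 on a UCS-4 build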
Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)
Eric Nieuwland <[EMAIL PROTECTED]> wrote:
>
> Nick Coghlan wrote:
>
> > [...]
> > The whole PEP draft can be found here:
> > http://members.iinet.net.au/~ncoghlan/public/pep-3XX.html
> > [...]
> > Used as follows::
> >
> > for del auto_retry(3, IOError):
> > f = urllib.urlopen("http://python.org/")
> > print f.read()
>
> I don't know. Using 'del' in that place seems awkward to me.
> Why not use the following rule:
> for [VAR in] EXPR:
>     SUITE
> If EXPR is an iterator, no finalisation is done.
> If EXPR is not an iterator, it is created at the start and destroyed at
> the end of the loop.
You should know why that can't work. If I pass a list, is a list an
iterator? No, but it should neither be created nor destroyed before or
after.
The discussion has been had in regards to why re-using 'for' is a
non-starter; re-read the 200+ messages in the thread.
- Josiah
Re: [Python-Dev] New Py_UNICODE doc
On May 7, 2005, at 5:09 PM, M.-A. Lemburg wrote:
> However, I don't understand all the excitement
> about Py_UNICODE: if you don't like the way this Python
> typedef works, you are free to interface to Python using
> any of the supported encodings using PyUnicode_Encode()
> and PyUnicode_Decode(). I'm sure you'll find one that
> fits your needs and if not, you can even write your
> own codec and register it with Python, e.g. UTF-32
> which we currently don't support ;-)

My concerns about Py_UNICODE are completely separate from my frustration
that the documentation is wrong about this type. It is much more
important that the documentation be correct, first, and then we can
discuss the reasons why it can be one of two values, rather than just a
uniform value across all python implementations. This makes distributing
binary extension modules hard. It has become clear to me that no one on
this list gives a *%&^ about people attempting to distribute binary
extension modules, or they would have cared about this problem, so I'll
just drop that point.

However, somehow, what keeps getting lost in the mix is that
--enable-unicode=ucs2 is a lie, and we should change what this configure
option says. Martin seems to disagree with me, for reasons that I don't
understand. I would be fine with calling the option utf16, or just 2 and
4, but not ucs2, as that means things that Python doesn't intend it to
mean.

--
Nick
Re: [Python-Dev] New Py_UNICODE doc
On May 7, 2005, at 5:09 PM, M.-A. Lemburg wrote:
> Please upload your doc-patch to SF.

All of my proposals for what to change the documentation to have been
shot down by Martin. If someone has better verbiage that they'd like to
see, I'd be perfectly happy to patch the doc. My last suggestion was:

"This type represents the storage type which is used by Python
internally as the basis for holding Unicode ordinals. Extension module
developers should make no assumptions about the size of this type on any
given platform."

--
Nick
Re: [Python-Dev] New Py_UNICODE doc
Nicholas Bastin wrote:
> On May 7, 2005, at 5:09 PM, M.-A. Lemburg wrote:
>
>> However, I don't understand all the excitement
>> about Py_UNICODE: if you don't like the way this Python
>> typedef works, you are free to interface to Python using
>> any of the supported encodings using PyUnicode_Encode()
>> and PyUnicode_Decode(). I'm sure you'll find one that
>> fits your needs and if not, you can even write your
>> own codec and register it with Python, e.g. UTF-32
>> which we currently don't support ;-)
>
> My concerns about Py_UNICODE are completely separate from my
> frustration that the documentation is wrong about this type. It is
> much more important that the documentation be correct, first, and then
> we can discuss the reasons why it can be one of two values, rather than
> just a uniform value across all python implementations. This makes
> distributing binary extension modules hard. It has become clear to me
> that no one on this list gives a *%&^ about people attempting to
> distribute binary extension modules, or they would have cared about
> this problem, so I'll just drop that point.

Actually, many of us know about the problem of having to ship UCS2 and
UCS4 builds of binary extensions and the troubles this causes with
users. It just adds one more dimension to the number of builds you have
to make - one for the Python version, another for the platform and, in
the case of Linux, another one for the Unicode width. Nowadays most
Linux distros ship UCS4 builds (after RedHat started this quest), so
things start to normalize again.

> However, somehow, what keeps getting lost in the mix is that
> --enable-unicode=ucs2 is a lie, and we should change what this
> configure option says. Martin seems to disagree with me, for reasons
> that I don't understand. I would be fine with calling the option
> utf16, or just 2 and 4, but not ucs2, as that means things that Python
> doesn't intend it to mean.

It's not a lie: the Unicode implementation does work with UCS2 code
points (surrogate values are Unicode code points as well - they happen
to live in a special zone of the BMP). Only the codecs add support for
surrogates in a way that allows round-trip safety regardless of whether
you used UCS2 or UCS4 as compile time option.

--
Marc-Andre Lemburg
eGenix.com
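(For anyone shipping binary extensions, a quick way to tell which build
an interpreter is - a sketch, assuming a Python 2.x where sys.maxunicode
is available.)

    import sys
    if sys.maxunicode == 0xFFFF:
        print "narrow build: 2-byte Py_UNICODE (UCS-2 storage + surrogates)"
    else:   # sys.maxunicode == 0x10FFFF
        print "wide build: 4-byte Py_UNICODE (UCS-4)"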
Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)
Josiah Carlson wrote:
> You should know why that can't work. If I pass a list, is a list an
> iterator? No, but it should neither be created nor destroyed before or
> after.
>
> The discussion has been had in regards to why re-using 'for' is a
> non-starter; re-read the 200+ messages in the thread.
>
> - Josiah

I agree, re-using or extending 'for' doesn't seem like a good idea to me.

I wonder how much effect adding 'for-next' and the 'StopIteration'
exception check as proposed in PEP 340 will have on 'for''s performance.

And why this isn't just as good:

    try:
        for value in iterator:
            BLOCK1
    except StopIteration:
        BLOCK2

Is one extra line that bad?

I think a completely separate looping or non-looping construct would be
better for the finalization issue, and maybe can work with classes with
__exit__ as well as generators.

Having it loop has the advantage of making it break out in a better
behaved way.

So maybe Nick's PEP would work better with a different keyword?

Hint: 'do'

Cheers, Ron_Adam
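(A small sketch, Python 2.x, of why the try/except spelling is not
equivalent to for/else: the for loop itself consumes the iterator's
StopIteration, so the except clause only fires for a StopIteration
raised inside the loop body.)

    def gen():
        yield 1
        yield 2

    try:
        for value in gen():
            print value
    except StopIteration:
        # never reached on normal exhaustion of gen(); only a
        # StopIteration raised inside the loop body would land here
        print "caught"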
Re: [Python-Dev] Breaking off Enhanced Iterators PEP from PEP 340
On May 7, 2005, at 1:45 AM, Michele Simionato wrote:
> On 5/6/05, Steven Bethard <[EMAIL PROTECTED]> wrote:
>
>> FWIW, I'm +1 on this. Enhanced Iterators
>> * updates the iterator protocol to use .__next__() instead of .next()
>> * introduces a new builtin next()
>> * allows continue-statements to pass values to iterators
>> * allows generators to receive values with a yield-expression
>> The first two are, I believe, how the iterator protocol probably
>> should have been in the first place. The second two provide a simple
>> way of passing values to generators, something I got the impression
>> that the co-routiney people would like a lot.
>
> Thank you for splitting the PEP. Conceptually, the "coroutine" part
> has nothing to do with blocks and it stands on its own, it is right
> to discuss it separately from the block syntax.
>
> Personally, I do not see an urgent need for the block syntax (most of
> the use cases can be managed with decorators) nor for the "coroutine"
> syntax (you can already use Armin Rigo's greenlets for that).

While Armin's greenlets are really cool they're also really dangerous
when you're integrating with C code, especially event loops and such.
Language support would be MUCH better.

-bob
Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)
Ron Adam wrote:
> I agree, re-using or extending 'for' doesn't seem like a good idea to me.

I agree that re-using a straight 'for' loop is out, due to performance
and compatibility issues with applying finalisation semantics to all
such iterative loops (there's a reason the PEP redraft doesn't suggest
this).

However, it makes sense to me that a "for loop with finalisation" should
actually *be* a 'for' loop - just with some extra syntax to indicate
that the iterator is finalised at the end of the loop.

An option other than the one in my PEP draft would be to put 'del' at
the end of the line instead of before EXPR:

    for [VAR in] EXPR [del]:
        BLOCK1
    else:
        BLOCK2

However, as you say, 'del' isn't great for the purpose, but I was trying
to avoid introducing yet another keyword. An obvious alternative is to
use 'finally' instead:

    for [finally] [VAR in] EXPR:
        BLOCK1
    else:
        BLOCK2

It still doesn't read all that well, but at least the word more
accurately reflects the semantics involved.

If a new keyword is used to request iterator finalisation, it should
probably include the word 'for' since it *is* a for loop:

    foreach [VAR in] EXPR:
        BLOCK1
    else:
        BLOCK2

That is, a basic 'for' loop wouldn't finalise the iterator, but a
'foreach' loop would. The other difference is that the iterator in the
'foreach' loop has the chance to suppress exceptions other than
TerminateBlock/StopIteration (by refusing to be finalised in response to
the exception).

The last option is to leave finalisation out of the 'for' loop syntax,
and introduce a user defined statement to handle the finalisation:

    def consuming(iterable):
        itr = iter(iterable)
        try:
            yield itr
        finally:
            itr_exit = getattr(itr, "__exit__", None)
            if itr_exit is not None:
                try:
                    itr_exit(TerminateBlock)
                except TerminateBlock:
                    pass

    stmt consuming(iterable) as itr:
        for item in itr:
            process(item)

With this approach, iterators can't swallow exceptions. This means that
something like auto_retry() would once again have to be written as a
class:

    class auto_retry(object):
        def __init__(self, times, exc=Exception):
            self.times = xrange(times-1)
            self.exc = exc
            self.succeeded = False

        def __iter__(self):
            attempt = self.attempt
            for i in self.times:
                yield attempt()
                if self.succeeded:
                    break
            else:
                yield self.last_attempt()

        def attempt(self):
            try:
                yield
                self.succeeded = True
            except self.exc:
                pass

        def last_attempt(self):
            yield

    for attempt in auto_retry(3, IOError):
        stmt attempt:
            # Do something!
            # Including break to give up early
            # Or continue to try again without raising IOError

> I wonder how much effect adding, 'for-next' and the 'StopIteration'
> exception check as proposed in PEP340, will have on 'for''s performance.

I'm not sure what you mean here - 'for' loops already use a
StopIteration raised by the iterator to indicate that the loop is
complete. The code you posted can't work, since it also intercepts a
StopIteration raised in the body of the loop.

> I think a completely separate looping or non-looping construct would be
> better for the finalization issue, and maybe can work with classes with
> __exit__ as well as generators.

The PEP redraft already proposes a non-looping version as a new
statement. However, since generators are likely to start using the new
non-looping statement, it's important to be able to ensure timely
finalisation of normal iterators as well. Tim and Greg's discussion the
other day convinced me of this - that's why the idea of using 'del' to
mark a finalised loop made its way into the draft.

It can be done using a user defined statement (as shown above), but it
would be nicer to have something built into the 'for' loop syntax to
handle it.

Cheers, Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
http://boredomandlaziness.blogspot.com
Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)
Ron Adam <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
>
>> You should know why that can't work. If I pass a list, is a list an
>> iterator? No, but it should neither be created nor destroyed before or
>> after.
>>
>> The discussion has been had in regards to why re-using 'for' is a
>> non-starter; re-read the 200+ messages in the thread.
>>
>> - Josiah
>
> I agree, re-using or extending 'for' doesn't seem like a good idea to me.

Now that I've actually stopped to read Nick's PEP, my concern is that
'del', while being a keyword, would not be easy to spot embedded in the
rest of the line, and a large number of these 'statements' will only be
executed once, so the 'for' may confuse people.

> I wonder how much effect adding, 'for-next' and the 'StopIteration'
> exception check as proposed in PEP340, will have on 'for''s performance.

For is already tuned to be as fast as possible, which makes sense; it is
used 4,523 times in Python 2.4.0's standard library, and easily hundreds
of thousands of times in user code. Changing the standard for loop is
not to be done lightly.

> And why this isn't just as good:
>
>     try:
>         for value in iterator:
>             BLOCK1
>     except StopIteration:
>         BLOCK2
>
> Is one extra line that bad?

I don't know what line you are referring to.

> I think a completely separate looping or non-looping construct would be
> better for the finalization issue, and maybe can work with classes with
> __exit__ as well as generators.

From what I understand, the entire conversation has always stated that
class-based finalized objects and generator-based finalized objects will
both work, and that any proposal that works for one, but not the other,
is not sufficient.

> Having it loop has the advantage of making it break out in a better
> behaved way.

What you have just typed is nonsense. Re-type it and be explicit.

> Hint: 'do'

'do' has been previously mentioned in the thread.

- Josiah
Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)
On Sun, 08 May 2005 14:16:40 +1000, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Ron Adam wrote:
>> I agree, re-using or extending 'for' doesn't seem like a good idea to me.
>
> I agree that re-using a straight 'for' loop is out, due to performance
> and compatibility issues with applying finalisation semantics to all
> such iterative loops (there's a reason the PEP redraft doesn't suggest
> this).
>
> However, it makes sense to me that a "for loop with finalisation" should
> actually *be* a 'for' loop - just with some extra syntax to indicate
> that the iterator is finalised at the end of the loop.
>
> An option other than the one in my PEP draft would be to put 'del' at
> the end of the line instead of before EXPR:
>
>     for [VAR in] EXPR [del]:
>         BLOCK1
>     else:
>         BLOCK2
>
> However, as you say, 'del' isn't great for the purpose, but I was trying
> to avoid introducing yet another keyword. An obvious alternative is to
> use 'finally' instead:
>
>     for [finally] [VAR in] EXPR:
>         BLOCK1
>     else:
>         BLOCK2
>
> It still doesn't read all that well, but at least the word more
> accurately reflects the semantics involved.

If such a construct is to be introduced, the ideal spelling would seem
to be:

    for [VAR in] EXPR:
        BLOCK1
    finally:
        BLOCK2

Jp
Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)
Nick Coghlan wrote:
> Ron Adam wrote:
>
>> I agree, re-using or extending 'for' doesn't seem like a good idea to me.
>
> I agree that re-using a straight 'for' loop is out, due to performance
> and compatibility issues with applying finalisation semantics to all
> such iterative loops (there's a reason the PEP redraft doesn't suggest
> this).
>
> However, it makes sense to me that a "for loop with finalisation" should
> actually *be* a 'for' loop - just with some extra syntax to indicate
> that the iterator is finalised at the end of the loop.

Question: Is the 'for' in your case iterating over a sequence? Or is it
testing for an assignment to determine if it should continue?

The difference is slight, I admit, and both views can be said to be true
for 'for' loops iterating over lists also. But maybe looking at it as a
truth test of getting something, instead of an iteration over a
sequence, would fit better? When a variable to assign is not supplied,
the test would be of a private continue-stop variable in the iterator or
a StopIteration exception.

> However, as you say, 'del' isn't great for the purpose, but I was trying
> to avoid introducing yet another keyword.

I didn't say that, that was Josiah, but I agree 'del' is not good.

> An obvious alternative is to use 'finally' instead:
>
>     for [finally] [VAR in] EXPR:
>         BLOCK1
>     else:
>         BLOCK2
>
> It still doesn't read all that well, but at least the word more
> accurately reflects the semantics involved.

How about:

    [VAR from] EXPR:

Could 'from' be reused in this context?

If the keyword chosen is completely different from 'for' or 'while',
then it doesn't need a 'del' or 'finally', as that can be part of the
new definition of whatever keyword is chosen.

I suggested reusing 'while' a few days ago because it fit the situation
well, but have come to the conclusion that reusing either 'for' or
'while' should be avoided. So you might consider 'do'. Guido responded
with the following the other day:

#quote
> [Greg Ewing]
>> How about 'do'?
>>
>>     do opening(filename) as f:
>>         ...
>>
>>     do locking(obj):
>>         ...
>>
>>     do carefully():    # :-)
>>         ...

I've been thinking of that too. It's short, and in a nostalgic way
conveys that it's a loop, without making it too obvious. (Those too
young to get that should Google for do-loop. :-) I wonder how many folks
call their action methods do() though.
#endquote

So it's not been ruled out, or followed through with, as far as I know.
And I think it will work for both looping and non-looping situations.

> The last option is to leave finalisation out of the 'for' loop syntax,
> and introduce a user defined statement to handle the finalisation:

Yes, leaving it out of 'for' loop syntax is good.

I don't have an opinion on user defined statements yet. But I think they
would be somewhat slower than a built-in block that does the same thing.
Performance will be an issue because these things will be nested and
possibly quite deeply.

>> I wonder how much effect adding, 'for-next' and the 'StopIteration'
>> exception check as proposed in PEP340, will have on 'for''s performance.
>
> I'm not sure what you mean here - 'for' loops already use a
> StopIteration raised by the iterator to indicate that the loop is
> complete. The code you posted can't work, since it also intercepts a
> StopIteration raised in the body of the loop.

Oops, I meant that to say 'for-else' above... The 'else' is new, isn't
it? I was thinking that putting a try-except around the loop does the
same thing as the else. Unless I misunderstand its use.

But you are right, it wouldn't work if the loop catches the
StopIteration.

>> I think a completely separate looping or non-looping construct would be
>> better for the finalization issue, and maybe can work with classes with
>> __exit__ as well as generators.
>
> The PEP redraft already proposes a non-looping version as a new
> statement. However, since generators are likely to start using the new
> non-looping statement, it's important to be able to ensure timely
> finalisation of normal iterators as well.

Huh? I thought a normal iterator or generator doesn't need finalization?
If it does, then it's not normal. Has a word been coined for iterators
with try-finally's in them yet?

Ron_Adam :-)
Re: [Python-Dev] PEP 340: Deterministic Finalisation (new PEP draft, either a competitor or update to PEP 340)
Josiah Carlson wrote:
> For is already tuned to be as fast as possible, which makes sense; it is
> used 4,523 times in Python 2.4.0's standard library, and easily hundreds
> of thousands of times in user code. Changing the standard for loop is
> not to be done lightly.

Agreed!

>> And why this isn't just as good:
>>
>>     try:
>>         for value in iterator:
>>             BLOCK1
>>     except StopIteration:
>>         BLOCK2
>>
>> Is one extra line that bad?
>
> I don't know what line you are referring to.

I was referring to the 'try'; the 'except' would be in place of the
else. Nick pointed out this wouldn't work, as the 'for' already catches
the StopIteration exception.

>> I think a completely separate looping or non-looping construct would be
>> better for the finalization issue, and maybe can work with classes with
>> __exit__ as well as generators.
>
> From what I understand, the entire conversation has always stated that
> class-based finalized objects and generator-based finalized objects will
> both work, and that any proposal that works for one, but not the other,
> is not sufficient.

That's good to hear. There seems to be some confusion as to whether or
not 'for' loops will do finalizing. So I was trying to stress that I
think regular 'for' loops should not finalize. They should probably give
an error if given an object with a try-finally in it or an __exit__
method. I'm not sure what the current opinion on that is, but I didn't
see it in any of the PEPs.

>> Having it loop has the advantage of making it break out in a better
>> behaved way.
>
> What you have just typed is nonsense. Re-type it and be explicit.

It was a bit brief, sorry about that. :-)

To get a non-looping block to loop, you will need to put it in a loop or
put a loop in it.

In the first case, doing a 'break' in the block doesn't exit the loop,
so you need to add an extra test for that. In the second case, doing a
'break' in the loop does exit the block, but finishes any code after the
loop, so you may need an extra test in that case too.

Having a block that loops can simplify these conditions, in that a break
always exits the body of the block and stops the loop. A 'continue' can
be used to skip the end of the block and start the next loop early. And
you still have the option to put the block in a loop, or loops in the
block, and they will work as they do now.

I hope that clarifies what I was thinking a bit better.

Ron_Adam
