Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Neil Hodgson wrote:

M.-A. Lemburg:

Unicode has the concept of combining code points, e.g. you can store an é (e with an accent) as e + '. Now if you slice off the accent, you'll break the character that you encoded using combining code points.

...

next_indextype(u, index) -> integer

Returns the Unicode object index for the start of the next indextype found after u[index] or -1 in case no next element of this type exists.

Should entity breakage be further discouraged by returning a slice here rather than an object index?

You mean a slice that slices out the next indextype?

Something like:

    i = first_grapheme(u)
    x = 0
    while x < width and u[i] != "\n":
        x, _ = draw(u[i], (x, y))
        i = next_grapheme(u, i)

This sounds a lot like you'd want iterators for the various index types. Should be possible to implement on top of the proposed APIs, e.g. itergraphemes(u), itercodepoints(u), etc.

Note that what most people refer to as "character" is a "grapheme" in Unicode speak. Given that interpretation, breaking Unicode characters is something you won't ever work around by using larger code units such as UCS4 compatible ones.

Furthermore, you should also note that surrogates (two code units encoding one code point) are part of Unicode life. While you don't need them when storing Unicode in UCS4 code units, they can still be part of the Unicode data and the programmer has to be aware of these.

I personally don't think that slicing Unicode is such a big issue. If you know what you are doing, things tend not to break - which is true for pretty much everything you do in programming ;-)

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Oct 25 2005)
Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
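The iterator idea from the message above can be sketched in a few lines of present-day Python. This is only an approximation under a stated assumption: a "grapheme" is taken to be a base code point plus any combining marks that follow it, which is much cruder than full Unicode text segmentation. The name itergraphemes is borrowed from the message; the implementation is hypothetical and is not the API that was proposed.

```python
import unicodedata


def itergraphemes(u):
    """Yield approximate graphemes: a base code point plus trailing combining marks.

    Assumption: anything with a non-zero canonical combining class attaches
    to the preceding code point. Real grapheme segmentation has more rules.
    """
    cluster = ""
    for ch in u:
        if cluster and unicodedata.combining(ch) == 0:
            # A new base character starts, so the previous cluster is complete.
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster


# Decomposed e-acute (e + U+0301), then "t", then precomposed e-acute.
text = "e\u0301t\u00e9"
graphemes = list(itergraphemes(text))
print(len(text), len(graphemes))

# Slicing by code point splits the first grapheme, which is the breakage
# discussed in the thread; slicing the grapheme list does not.
broken = text[:1]
intact = graphemes[0]
print(repr(broken), repr(intact))
```

Iterating over clusters instead of indexing keeps the accent attached to its base letter, at the cost of losing constant-time indexing.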
Re: [Python-Dev] AST branch is in?
Simon Burton [EMAIL PROTECTED] wrote:

Is there a python interface ?

Not yet.

Neil
Re: [Python-Dev] New codecs checked in
Martin v. Löwis wrote:

M.-A. Lemburg wrote:

I just left them in because I thought they wouldn't do any harm and might be useful in some applications. Removing them where not directly needed by the codec would not be a problem.

I think memory usage caused is measurable (I estimated 4KiB per dictionary). More importantly, people apparently currently change the dictionaries we provide and expect the codecs to automatically pick up the modified mappings. It would be better if the breakage is explicit (i.e. they get an AttributeError on the variable) instead of implicit (their changes to the mapping simply have no effect anymore).

Agreed. I've already checked in the changes, BTW.

KOI8-U is not available as mapping on ftp.unicode.org and I only recreated codecs from the mapping files available there.

I think we should come up with mapping tables for the additional codecs as well, and maintain them in the CVS. This also applies to things like rot13.

Agreed. I'll rerun the creation with the above changes sometime this week.

I hope I can finish my encoding routine shortly, which again results in changes to the codecs (replacing the encoding dictionaries with other lookup tables).

Having seen the decode tables written as a long Unicode string, I think that this may indeed also be a good solution for encoding - the major improvement here is that the parser and compiler will do the work of creating the table. At module load time, the .pyc file will only contain a long string which is very fast to create and load (unlike dictionaries which are set up dynamically at load time).

In general, it's better to do all the work up-front when creating the codecs, rather than having run-time code repeat these tasks over and over again.

--
Marc-Andre Lemburg
eGenix.com
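To illustrate the "decoding table as one long string" idea discussed above, here is a small sketch in present-day Python. The table itself is made up for the example (Latin-1 with byte 0xA4 remapped to the euro sign); it is not one of the tables that was checked in. The codecs functions used are the charmap helpers the standard library exposes.

```python
import codecs

# Hypothetical 256-entry decoding table: position N holds the character
# that byte N decodes to. Identity for Latin-1, except 0xA4 -> euro sign.
decoding_table = "".join(
    "\u20ac" if byte == 0xA4 else chr(byte) for byte in range(256)
)

# Decoding indexes straight into the string; no dictionary is involved.
text, consumed = codecs.charmap_decode(b"10 \xa4", "strict", decoding_table)
print(text, consumed)

# The reverse lookup structure is derived once, up front, from the same table.
encoding_table = codecs.charmap_build(decoding_table)
data, _ = codecs.charmap_encode("10 \u20ac", "strict", encoding_table)
print(data)
```

Because the table is a plain string constant, it is stored in the compiled module as-is, which is the load-time advantage described in the message.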
Re: [Python-Dev] PEP 351, the freeze protocol
Josiah Carlson wrote:

Nick Coghlan [EMAIL PROTECTED] wrote:

I think having dicts and sets automatically invoke freeze would be a mistake, because at least one of the following two cases would behave unexpectedly:

I'm pretty sure that the PEP was only asking if one would freeze the contents of dicts IF the dict was being frozen. That is, which of the following should be the case:

    freeze({1:[2,3,4]}) -> {1:[2,3,4]}
    freeze({1:[2,3,4]}) -> xdict(1=(2,3,4))

I believe the choices you intended are:

    freeze({1:[2,3,4]}) -> imdict(1=[2,3,4])
    freeze({1:[2,3,4]}) -> imdict(1=(2,3,4))

Regardless, that question makes a lot more sense (and looking at the PEP again, I realised I simply read it wrong the first time).

For containers where equality depends on the contents of the container (i.e., all the builtin ones), I don't see how it is possible to implement a sensible hash function without freezing the contents as well - otherwise your immutable isn't particularly immutable.

Consider what would happen if list __freeze__ simply returned a tuple version of itself - you have a __freeze__ method which returns a potentially unhashable object!

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
http://boredomandlaziness.blogspot.com
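The point about shallow versus deep freezing can be made concrete with a short sketch. Everything here is hypothetical: freeze() below is a plain function written for this illustration, not the builtin proposed in PEP 351, and a frozenset of (key, value) pairs stands in for the imdict type mentioned in the thread.

```python
def freeze(obj):
    """Hypothetical deep freeze: return a hashable snapshot of obj."""
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(freeze(item) for item in obj)
    if isinstance(obj, dict):
        # Stand-in for "imdict": keys are already hashable, values get frozen.
        return frozenset((key, freeze(value)) for key, value in obj.items())
    hash(obj)  # raises TypeError if obj is still unhashable
    return obj


# Shallow conversion: the tuple still contains a list, so it cannot be hashed.
shallow = tuple([1, [2, 3]])
try:
    hash(shallow)
    shallow_hashable = True
except TypeError:
    shallow_hashable = False

# Deep conversion: the contents are frozen too, so hashing works.
deep = freeze({1: [2, 3, 4]})
print(shallow_hashable, deep, hash(deep) == hash(freeze({1: [2, 3, 4]})))
```

The shallow result is exactly the "potentially unhashable object" the message warns about; recursing into the contents is what makes the frozen value usable as a dict key.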
Re: [Python-Dev] AST branch is in?
On Tue, Oct 25, 2005 at 01:36:26PM +1000, Simon Burton wrote:

Is there a python interface ?

Not yet, as far as I know.

FYI, all: please see the following weblog entry for a description of the AST branch:

http://www.amk.ca/diary/2005/10/the_ast_branch_lands_1

If I got anything wrong, please offer corrections in the comments for that post.

--amk
[Python-Dev] Reminder: PyCon 2006 submissions due in a week
The submission deadline for PyCon 2006 is now a week away. PyCon 2006 will be in Dallas, Texas, February 24-26 2006.

For 2006, I'd like to see more tutorial-style talks on the program. This means that your talk doesn't have to be about something entirely new; you can show how to use a particular language feature or standard library module, examine some aspect of a Python implementation, or compare the available libraries in an application domain.

For example, the most popular talk in 2005 was Michelle Levesque's PyWeboff, which compared various web development tools. The next most popular (ignoring a few keynotes and the lightning talks) were Alex Martelli's talks on iterators and generators, and on OOP. Partly that's because it's Alex, of course, but I think attendees want help in deciding which tools are good/helpful/safe to use.

If you need an idea, http://wiki.python.org/moin/PyCon2005/Feedback lists some topics that 2005's attendees were interested in.

CFP: http://www.python.org/pycon/2006/cfp
Proposal submission site: http://submit.python.org/

--amk
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Bengt Richter wrote:

At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote:

Bengt Richter wrote:

Please bear with me for a few paragraphs ;-)

Please note that source code encoding doesn't really have anything to do with the way the interpreter executes the program - it's merely a way to tell the parser how to convert string literals (currently only the Unicode ones) into constant Unicode objects within the program text. It's also a nice way to let other people know what kind of encoding you used to write your comments ;-) Nothing more.

I think somehow I didn't make things clear, sorry ;-) As I tried to show in the example of module_a.cs vs module_b.cs, the source encoding currently results in two different str-type strings representing the source _character_ sequence, which is the _same_ in both cases.

I don't follow you here. The source code encoding is only applied to Unicode literals (you are using string literals in your example). String literals are passed through as-is.

Whether or not your editor will use the source code encoding marker is really up to your editor and not within the scope of Python. If you open the two module files in Emacs, you'll see identical renderings of the string literals. With other editors, you may have to explicitly tell the editor which encoding to assume. Ditto for shell printouts.
To make it more clear, try the following little program (untested except on NT4 with Python 2.4b1 (#56, Nov 3 2004, 01:47:27) [GCC 3.2.3 (mingw special 20030504-1)] on win32 ;-):

    # t_srcenc.py
    import os

    def test():
        open('module_a.py','wb').write(
            "# -*- coding: latin-1 -*-" + os.linesep +
            "cs = '\xfcber-cool'" + os.linesep)
        open('module_b.py','wb').write(
            "# -*- coding: utf-8 -*-" + os.linesep +
            "cs = '\xc3\xbcber-cool'" + os.linesep)
        # show that we have two modules differing only in encoding:
        print ''.join(line.decode('latin-1') for line in open('module_a.py'))
        print ''.join(line.decode('utf-8') for line in open('module_b.py'))
        # see how results are affected:
        import module_a, module_b
        print module_a.cs + ' =?= ' + module_b.cs
        print module_a.cs.decode('latin-1') + ' =?= ' + module_b.cs.decode('utf-8')

    if __name__ == '__main__':
        test()

The result copied from NT4 console to clipboard and pasted into eudora:

    [17:39] C:\pywk\python-dev>py24 t_srcenc.py
    # -*- coding: latin-1 -*-
    cs = 'über-cool'

    # -*- coding: utf-8 -*-
    cs = 'über-cool'

    nber-cool =?= ++ber-cool
    über-cool =?= über-cool

(I'd say NT did the best it could, rendering the copied cp437 superscript n as the 'n' above, and the '++' coming from the cp437 box characters corresponding to the '\xc3\xbc'. Not sure how it will show on your screen, but try the program to see ;-)

Once a module is compiled, there's no distinction between a module using the latin-1 source code encoding or one using the utf-8 encoding.

ISTM module_a.cs and module_b.cs can readily be distinguished after compilation, whereas the sources displayed according to their declared encodings as above (or as e.g. different editors using different native encoding might) cannot (other than the encoding cookie itself) ;-) Perhaps you meant something else?

What your editor displays to you is not within the scope of Python, e.g. if you open the files in Emacs you'll see something different than in Notepad.
I guess that's the price you have to pay for being able to write programs that can include Unicode literals using the complete range of possible Unicode characters without having to revert to escapes.

--
Marc-Andre Lemburg
eGenix.com
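The disagreement in this exchange comes down to characters versus bytes, which a few lines of present-day Python can show. This is a sketch written for illustration, not the program from the thread: it encodes the same text in the two encodings under discussion and compares the results. The cp437 step mimics what a console using that code page would display for the raw bytes.

```python
text = "\u00fcber-cool"  # the same character sequence in both "modules"

latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")

# The byte strings differ, which is what a Python 2 str literal preserved.
print(latin1_bytes, utf8_bytes, latin1_bytes == utf8_bytes)

# Decoding each with its declared encoding recovers identical text.
same_text = latin1_bytes.decode("latin-1") == utf8_bytes.decode("utf-8")
print(same_text)

# Interpreting the raw bytes as cp437 gives two different garbled renderings.
print(latin1_bytes.decode("cp437"), utf8_bytes.decode("cp437"))
```

The byte-level comparison corresponds to the "readily distinguished after compilation" observation, and the decoded comparison to the "identical renderings" one; both statements are true at different layers.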
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
M.-A. Lemburg wrote:

I don't follow you here. The source code encoding is only applied to Unicode literals (you are using string literals in your example). String literals are passed through as-is.

however, for Python 3000, it would be nice if the source-code encoding applied to the *entire* file (XML-style), rather than just unicode string literals and (hopefully) comments and docstrings.

/F
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Fredrik Lundh wrote:

M.-A. Lemburg wrote:

I don't follow you here. The source code encoding is only applied to Unicode literals (you are using string literals in your example). String literals are passed through as-is.

however, for Python 3000, it would be nice if the source-code encoding applied to the *entire* file (XML-style), rather than just unicode string literals and (hopefully) comments and docstrings.

Actually, the encoding is applied to the complete source file: the file is transcoded into UTF-8 and then parsed by the Python parser. Unicode literals are then decoded from the UTF-8 into Unicode. String literals are transcoded back into the source code encoding, thus making the (rather long due to technical constraints) round-trip source code encoding -> Unicode -> UTF-8 -> Unicode -> source code encoding.

Python 3k should have a fully Unicode based parser to reduce this additional transcoding overhead.

Since Py3k will only have Unicode literals, the problems with string literals will go away all by themselves :-)

--
Marc-Andre Lemburg
eGenix.com
Re: [Python-Dev] New codecs checked in
M.-A. Lemburg wrote:

Martin v. Löwis wrote:

M.-A. Lemburg wrote:

I had to create three custom mapping files for cp1140, koi8-u and tis-620.

Can you please publish the files you have used somewhere? They best go into the Python CVS.

Sure; I'll check in the whole build machinery I'm using for this.

Done. In order to rebuild the codecs,

    cd Tools/unicode; make

then check the codecs in the created build/ subdir (e.g. using comparecodecs.py) and copy them over to the Lib/encodings/ directory.

--
Marc-Andre Lemburg
eGenix.com
Re: [Python-Dev] Proposed resolutions for open PEP 343 issues
Almost there - this is the only issue I have left on my list :)

Guido van Rossum wrote:

On 10/24/05, Nick Coghlan [EMAIL PROTECTED] wrote:

However, those resolutions bring up the following issues:

5 a. What exception is raised when EXPR does not have a __context__ method?
  b. What about when the returned object is missing __enter__ or __exit__?

I suggest raising TypeError in both cases, for symmetry with for loops. The slot check is made in C code, so I don't see any difficulty in raising TypeError instead of AttributeError if the relevant slots aren't filled.

Why are you so keen on TypeError? I find AttributeError totally appropriate. I don't see symmetry with for-loops as a valuable property here. AttributeError and TypeError are often interchangeable anyway.

The reason I'm keen on TypeError is because 'abstract.c' uses it consistently when it fails to find a method to support a requested protocol.

None of the abstract object methods currently raise AttributeError, and this property is fairly visible at the Python level because the abstract APIs are used to implement many of the bytecodes and various builtin functions. Both for loops and the iter function, for example, get their current exception behaviour from PyObject_GetIter and PyIter_Next.

Having had a look at mwh's patch, however, I've realised that going that way would only be possible if there were dedicated bytecodes for GET_CONTEXT, ENTER_CONTEXT and EXIT_CONTEXT (similar to the dedicated GET_ITER and FOR_ITER).

Leaving the exception as AttributeError means that level of bytecode hacking isn't necessary (mwh's patch just emits a fairly normal try/finally statement, although it still modifies the bytecode to include LOAD_EXIT_ARGS).

So, the inconsistency with other syntactic protocols still bothers me, but I can live with AttributeError if you don't want to add three new bytecodes just to support PEP 343.

Cheers,
Nick.
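The behaviour being compared can be observed directly. The sketch below is written for illustration only: NoProtocol and error_name are made-up names, and the code simply reports which exception type the running interpreter raises when an object supports neither protocol. The iteration case raises TypeError; which of the two the with statement raises has differed between Python versions, so the sketch accepts either rather than asserting one.

```python
class NoProtocol:
    """An object with no __iter__, __enter__ or __exit__."""


def error_name(action):
    """Run action() and return the name of the protocol error it raises."""
    try:
        action()
    except (TypeError, AttributeError) as exc:
        return type(exc).__name__
    return None


def use_with():
    with NoProtocol():
        pass


iter_error = error_name(lambda: iter(NoProtocol()))
with_error = error_name(use_with)
print("iter():", iter_error)
print("with:  ", with_error)
```

Running this on the interpreter at hand shows whether its with statement follows the "symmetry with for loops" position or the AttributeError position from the thread.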
Re: [Python-Dev] New codecs checked in
M.-A. Lemburg wrote:

Done. In order to rebuild the codecs,

    cd Tools/unicode; make

then check the codecs in the created build/ subdir (e.g. using comparecodecs.py) and copy them over to the Lib/encodings/ directory.

Thanks!

Martin
Re: [Python-Dev] MinGW and libpython24.a
David Abrahams wrote:

Is the instruction at http://www.python.org/dev/doc/devel/inst/tweak-flags.html#SECTION000622000 still relevant? I am not 100% certain I didn't make one myself, but it looks to me as though my Windows Python 2.4.1 distro came with a libpython24.a.

I am asking here because it seems only the person who prepares the installer would know.

That impression might be incorrect: I can tell you when I started including libpython24.a, but I have no clue whether the instructions you refer to are correct - I don't use the file myself at all.

If this is true, in which version was it introduced?

It was introduced in 1.20/1.16.2.4 of Tools/msi/msi.py in response to patch #1088716; this in turn was first used to release r241c1.

Regards,
Martin
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Bill Janssen wrote:

I just got mail this morning from a researcher who wants exactly what Martin described, and wondered why the default MacPython 2.4.2 didn't provide it by default. :-)

If all he wants is to represent Deseret, he can do so in a 16-bit Unicode type, too: Python supports UTF-16.

Regards,
Martin
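For readers unfamiliar with how a 16-bit representation holds a Deseret letter, the following sketch shows the surrogate arithmetic in present-day Python. It is an illustration only: U+10400 is used as the sample code point because it is the first position of the Deseret block, and the helper name surrogate_pair is made up for this example.

```python
def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


deseret = "\U00010400"
high, low = surrogate_pair(ord(deseret))
print(hex(high), hex(low))

# The codec produces the same two 16-bit code units.
utf16 = deseret.encode("utf-16-be")
print(utf16.hex())

# Current Python 3 counts code points, so the string has length 1 even
# though its UTF-16 form needs two code units (four bytes).
print(len(deseret), len(utf16) // 2)
```

On the 16-bit builds being discussed in the thread, the same character occupied two positions of the unicode object, which is why the length question came up at all.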
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Fredrik Lundh wrote:

however, for Python 3000, it would be nice if the source-code encoding applied to the *entire* file (XML-style), rather than just unicode string literals and (hopefully) comments and docstrings.

As MAL explains, the encoding currently does apply to the entire file. However, because of the Python syntax, you are restricted to ASCII in many places, such as keywords, number literals, and (unfortunately) identifiers. Lifting the restriction on identifiers is on my agenda.

Regards,
Martin
Re: [Python-Dev] PEP 351, the freeze protocol
Nick Coghlan [EMAIL PROTECTED] wrote:

Josiah Carlson wrote:

Nick Coghlan [EMAIL PROTECTED] wrote:

I think having dicts and sets automatically invoke freeze would be a mistake, because at least one of the following two cases would behave unexpectedly:

I'm pretty sure that the PEP was only asking if one would freeze the contents of dicts IF the dict was being frozen. That is, which of the following should be the case:

    freeze({1:[2,3,4]}) -> {1:[2,3,4]}
    freeze({1:[2,3,4]}) -> xdict(1=(2,3,4))

I believe the choices you intended are:

    freeze({1:[2,3,4]}) -> imdict(1=[2,3,4])
    freeze({1:[2,3,4]}) -> imdict(1=(2,3,4))

Regardless, that question makes a lot more sense (and looking at the PEP again, I realised I simply read it wrong the first time).

For containers where equality depends on the contents of the container (i.e., all the builtin ones), I don't see how it is possible to implement a sensible hash function without freezing the contents as well - otherwise your immutable isn't particularly immutable.

Consider what would happen if list __freeze__ simply returned a tuple version of itself - you have a __freeze__ method which returns a potentially unhashable object!

I agree completely, hence my original statement on 10/23: it is of my opinion that a container which is frozen should have its contents frozen as well.

- Josiah
Re: [Python-Dev] MinGW and libpython24.a
Martin v. Löwis [EMAIL PROTECTED] writes:

David Abrahams wrote:

Is the instruction at http://www.python.org/dev/doc/devel/inst/tweak-flags.html#SECTION000622000 still relevant? I am not 100% certain I didn't make one myself, but it looks to me as though my Windows Python 2.4.1 distro came with a libpython24.a.

I am asking here because it seems only the person who prepares the installer would know.

That impression might be incorrect: I can tell you when I started including libpython24.a, but I have no clue whether the instructions you refer to are correct - I don't use the file myself at all.

If this is true, in which version was it introduced?

It was introduced in 1.20/1.16.2.4 of Tools/msi/msi.py in response to patch #1088716; this in turn was first used to release r241c1.

Thanks!

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Martin v. Löwis [EMAIL PROTECTED] wrote:

Fredrik Lundh wrote:

however, for Python 3000, it would be nice if the source-code encoding applied to the *entire* file (XML-style), rather than just unicode string literals and (hopefully) comments and docstrings.

As MAL explains, the encoding currently does apply to the entire file. However, because of the Python syntax, you are restricted to ASCII in many places, such as keywords, number literals, and (unfortunately) identifiers. Lifting the restriction on identifiers is on my agenda.

It seems that removing this restriction may cause serious issues, at least in the case when using cyrillic characters in names. See recent security issues in regards to web addresses in web browsers for the confusion (and/or name errors) that could result in their use.

While I agree in principle that people should be able to use the entirety of one's own natural language in writing software in programming languages, I think that it is an ugly can of worms that perhaps shouldn't be opened.

- Josiah
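The confusion described above is easy to reproduce now that Python 3 accepts non-ASCII identifiers. The sketch below is an illustration written for this purpose: it defines two variables whose names render identically in most fonts, one spelled with Latin "a" and one with Cyrillic "а" (U+0430), and shows that the interpreter treats them as distinct names.

```python
import unicodedata

latin_name = "a"
cyrillic_name = "\u0430"

# The two one-letter names look alike but are different code points.
print(unicodedata.name(latin_name))
print(unicodedata.name(cyrillic_name))
print(latin_name == cyrillic_name)

# Both are valid identifiers, and they bind separate variables.
namespace = {}
exec("a = 1\n\u0430 = 2", namespace)
print(namespace[latin_name], namespace[cyrillic_name])
```

A reader of the source sees the same letter assigned twice, while the program holds two values; that is the kind of name error the message anticipates.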
Re: [Python-Dev] Proposed resolutions for open PEP 343 issues
Guido van Rossum wrote:

It is true though that AttributeError is somewhat special. There are lots of places (perhaps too many?) where an operation is defined using something like "if the object has attribute __foo__, use it, otherwise use some other approach". Some operations explicitly check for AttributeError in their attribute check, and let a different exception bubble up the stack. Presumably this is done so that a bug in somebody's __getattr__ implementation doesn't get masked by the "otherwise use some other approach" branch. But this is relatively rare; most calls to PyObject_GetAttr just clear the error if they have a different approach available.

In any case, I don't see any of this as supporting the position that TypeError is somehow more appropriate. An AttributeError complaining about a missing __enter__, __exit__ or __context__ method sounds just fine. (Oh, and please don't go checking for the existence of __exit__ before calling __enter__. That kind of bug is found with even the most cursory testing.)

Hmmm... Would it be reasonable to introduce a ProtocolError exception?

--eric
Re: [Python-Dev] Proposed resolutions for open PEP 343 issues
On 10/25/05, Eric Nieuwland [EMAIL PROTECTED] wrote:

Hmmm... Would it be reasonable to introduce a ProtocolError exception?

And which perceived problem would that solve? The problem of Nick & Guido disagreeing in public?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Proposed resolutions for open PEP 343 issues
Guido van Rossum wrote:

On 10/25/05, Eric Nieuwland [EMAIL PROTECTED] wrote:

Hmmm... Would it be reasonable to introduce a ProtocolError exception?

And which perceived problem would that solve? The problem of Nick & Guido disagreeing in public?

;-) No, that will go on in other fields, I guess.

It was meant to be a bit more informative about what is wrong.

    ProtocolError: lacks __enter__ or __exit__

--eric
Re: [Python-Dev] Proposed resolutions for open PEP 343 issues
[Eric "are all your pets called Eric?" Nieuwland]
Hmmm... Would it be reasonable to introduce a ProtocolError exception?

[Guido]
And which perceived problem would that solve?

[Eric]
It was meant to be a bit more informative about what is wrong.

    ProtocolError: lacks __enter__ or __exit__

That's exactly what I'm trying to avoid. :)

I find "AttributeError: __exit__" just as informative. In either case, if you know what __exit__ means, you'll know what you did wrong. And if you don't know what it means, you'll have to look it up anyway. And searching for ProtocolError doesn't do you any good -- you'll have to learn about what __exit__ is and where it is required.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
[Python-Dev] PEP 343 - multiple context managers in one statement
I have a deep suspicion that this has been done to death already, but my searching ability isn't up to finding the reference. So I'll simply ask the question, and not offer a long discussion:

Has the option of letting the with statement admit multiple context managers been considered (and presumably rejected)?

I'm thinking of

    with expr1, expr2, expr3:
        # whatever

In some ways, this doesn't even need an extension to the PEP - giving tuples suitable __enter__ and __exit__ methods would do it. Or, I suppose a user-defined manager which combined a list of others:

    class combining:
        def __init__(self, *mgrs):
            self.mgrs = mgrs
        def __with__(self):
            return self
        def __enter__(self):
            return tuple(mgr.__enter__() for mgr in self.mgrs)
        def __exit__(self, type, value, tb):
            # first in, last out
            for mgr in reversed(self.mgrs):
                mgr.__exit__(type, value, tb)

Would that be worth using as an example in the PEP?

Sorry - it got a bit long anyway...

Paul.

PS The signature of __with__ in example 4 in the PEP is wrong - it has an incorrect "lock" parameter.
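A runnable variant of the combining manager from this message, adapted to the context manager protocol as it exists in current Python (no __with__ method), might look like the sketch below. Combining and Traced are illustrative names. The sketch keeps the simplifications of the original idea: it does not clean up already-entered managers if a later __enter__ fails, and it ignores the return values of the inner __exit__ calls. The standard library's contextlib.ExitStack is a different, more complete tool for the same job.

```python
class Combining:
    """Enter several context managers in order and exit them in reverse."""

    def __init__(self, *mgrs):
        self.mgrs = mgrs

    def __enter__(self):
        return tuple(mgr.__enter__() for mgr in self.mgrs)

    def __exit__(self, exc_type, exc_value, tb):
        # First in, last out.
        for mgr in reversed(self.mgrs):
            mgr.__exit__(exc_type, exc_value, tb)
        return False  # never suppress an exception from the body


class Traced:
    """Helper for the demo: records enter/exit events in a shared log."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append("enter " + self.name)
        return self.name

    def __exit__(self, exc_type, exc_value, tb):
        self.log.append("exit " + self.name)
        return False


log = []
with Combining(Traced("a", log), Traced("b", log)) as names:
    log.append("body")

print(names)
print(log)
```

The log shows the nesting order: both managers are entered left to right, the body runs, and the exits happen right to left.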
Re: [Python-Dev] AST branch is in?
On 10/20/05, Neal Norwitz [EMAIL PROTECTED] wrote:

The Grammar is (was at one point at least) shared between Jython and would allow more tools to be able to share infrastructure. The idea is to eventually be able to have [JP]ython output the same AST to tools.

Hello Python-dev,

My name is Frank Wierzbicki and I'm working on the Jython project. Does anyone on this list know more about the history of this Grammar sharing between the two projects? I've heard about some Grammar sharing between Jython and Python, and I've noticed that (most of) the jython code in /org/python/parser/ast is commented "Autogenerated AST node". I would definitely like to look at (eventually) coordinating with this effort.

I've cross-posted to the Jython-dev list in case someone there has some insight.

Thanks,

Frank
Re: [Python-Dev] AST branch is in?
On 10/25/05, Frank Wierzbicki [EMAIL PROTECTED] wrote:

My name is Frank Wierzbicki and I'm working on the Jython project. Does anyone on this list know more about the history of this Grammar sharing between the two projects? I've heard about some Grammar sharing between Jython and Python, and I've noticed that (most of) the jython code in /org/python/parser/ast is commented "Autogenerated AST node". I would definitely like to look at (eventually) coordinating with this effort.

I've cross-posted to the Jython-dev list in case someone there has some insight.

Your best bet is to track down Jim Hugunin and see if he remembers. He's jimhug at microsoft.com or jim at hugunin.net.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
I think he was more interested in the invariant Martin proposed, that len(\U0001) should always be the same and should always be 1.

Bill
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
On 10/25/05, Bill Janssen [EMAIL PROTECTED] wrote: I think he was more interested in the invariant Martin proposed, that len(\U0001) should always be the same and should always be 1. Yes but why? What does this invariant do for him? -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] [Jython-dev] Re: AST branch is in?
Frank Wierzbicki wrote: On 10/20/05, Neal Norwitz [EMAIL PROTECTED] wrote: The Grammar is (was at one point at least) shared between Jython and would allow more tools to be able to share infrastructure. The idea is to eventually be able to have [JP]ython output the same AST to tools. Hello Python-dev, My name is Frank Wierzbicki and I'm working on the Jython project. Does anyone on this list know more about the history of this Grammar sharing between the two projects? I've heard about some Grammar sharing between Jython and Python, and I've noticed that (most of) the jython code in /org/python/parser/ast is commented Autogenerated AST node. I would definitely like to look at (eventually) coordinating with this effort. I've cross-posted to the Jython-dev list in case someone there has some insight. As far as I understand, Python trunk now contains generated C code for the AST representation, created by the asdl_c.py script from an updated Python.asdl; these files live in http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Parser/ A parallel asdl_java.py existed in the Python CVS sandbox (where the AST effort started). It was last updated when Jython's own AST classes were generated from the then-current version of Python.asdl (this was done by me, if I remember correctly, at some point in the Jython 2.2 evolution, I think when the PyDev guys wanted a more up-to-date Jython parser to reuse): http://cvs.sourceforge.net/viewcvs.py/*checkout*/python/python/nondist/sandbox/ast/asdl_java.py?content-type=text%2Fplainrev=1.7 Basically, the new Python.asdl needs to be used, asdl_java.py possibly updated, and our compiler changed as necessary. regards.
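For readers who want to see what "the same AST" looks like from the tool side, here is a minimal sketch. It assumes a current CPython 3 interpreter, whose standard `ast` module exposes node classes generated from Python.asdl; it says nothing about the 2005 branch or about Jython's generated Java classes.

```python
import ast

# Parse a small statement into the ASDL-described tree.
tree = ast.parse("x = 1 + 2")
assign = tree.body[0]

# Each node class carries the field names declared for it in the ASDL file.
print(type(assign).__name__)
print(assign._fields)

# ast.dump gives a tool-independent textual view of the subtree.
print(ast.dump(assign.value))
```

A tool written against these node names and fields would, in principle, not care which implementation produced the tree, which is the sharing the thread is after.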
Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).
Josiah Carlson wrote: It seems that removing this restriction may cause serious issues, at least in the case when using cyrillic characters in names. See recent security issues in regards to web addresses in web browsers for the confusion (and/or name errors) that could result in their use. That impression is deceiving. We are talking about source code here; people type in identifiers explicitly rather than receiving them through linking, and they scope identifiers (by module or object). If somebody manages to get look-alike identifiers into your Python libraries, you have bigger problems than these look-alikes: anybody capable of doing so could just as well replace the real thing in the first place. As always in computer security: define your threat model before reasoning about the risks. Regards, Martin
Re: [Python-Dev] AST branch is in?
Guido van Rossum wrote: On 10/25/05, Frank Wierzbicki [EMAIL PROTECTED] wrote: My name is Frank Wierzbicki and I'm working on the Jython project. Does anyone on this list know more about the history of this Grammar sharing between the two projects? I've heard about some Grammar sharing between Jython and Python, and I've noticed that (most of) the jython code in /org/python/parser/ast is commented Autogenerated AST node. I would definitely like to look at (eventually) coordinating with this effort. I've cross-posted to the Jython-dev list in case someone there has some insight. Your best bet is to track down Jim Hugunin and see if he remembers. He's jimhug at microsoft.com or jim at hugunin.net. No, this is all after Jim; it's indeed a derived effort from CPython's own AST effort, just that we started using it quite a while ago. This was all after Jim was no longer involved with Jython; Finn Bock started this.
Re: [Python-Dev] AST branch is in?
On 10/25/05, Samuele Pedroni [EMAIL PROTECTED] wrote: Your best bet is to track down Jim Hugunin and see if he remembers. He's jimhug at microsoft.com or jim at hugunin.net. No, this is all after Jim; it's indeed a derived effort from CPython's own AST effort, just that we started using it quite a while ago. This was all after Jim was no longer involved with Jython; Finn Bock started this. Oops! Sorry for the misinformation. Shows how much I know. :( -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
Guido van Rossum wrote: Yes but why? What does this invariant do for him?
I don't know about this person, but there are a few things that don't work properly in UTF-16 mode:
- The Unicode character database fails to look things up. u"\U0001D670".isupper() gives false, but should give true (since it denotes MATHEMATICAL MONOSPACE CAPITAL A). It gives true in UCS-4 mode.
- As a result, normalization on these doesn't work, either. It should normalize to LATIN CAPITAL LETTER A under NFKC, but doesn't.
- Regular expressions only have limited support. In particular, adding non-BMP characters to character classes is not possible. [\U0001D670] will match any character that is either \uD835 or \uDE70, whereas it only matches MATHEMATICAL MONOSPACE CAPITAL A in UCS-4 mode.
There might be more limitations, but those are the ones that come to mind easily. While I could imagine fixing the first two with some effort, the third one is really tricky (unless you would accept a wide representation of a character class even if the Unicode representation is only narrow). Regards, Martin
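The three points above can be checked on an interpreter where strings are indexed by code point. The following is a sketch only, assuming a current Python 3 (where there is no separate UTF-16 build); it shows the behaviour Martin describes as correct, plus the surrogate pair that a UTF-16 representation would use for the same character.

```python
import re
import unicodedata

ch = "\U0001D670"  # MATHEMATICAL MONOSPACE CAPITAL A, outside the BMP

# Character database lookup and compatibility normalization.
is_upper = ch.isupper()
name = unicodedata.name(ch)
nfkc = unicodedata.normalize("NFKC", ch)

# In UTF-16 the same code point is stored as two 16-bit code units.
units = ch.encode("utf-16-be")
surrogates = [units[i:i + 2].hex() for i in range(0, len(units), 2)]

# A character class containing the non-BMP character matches only that
# character, not a lone surrogate half.
matches_char = re.fullmatch("[\U0001D670]", ch) is not None
matches_half = re.fullmatch("[\U0001D670]", "\ud835") is not None

print(len(ch), is_upper, name, nfkc, surrogates, matches_char, matches_half)
```

On a narrow (UTF-16) build of the kind discussed in the thread, the same string would have had length 2, which is where the lookup, normalization, and character-class problems come from.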
Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).
Martin v. Löwis [EMAIL PROTECTED] wrote: Josiah Carlson wrote: It seems that removing this restriction may cause serious issues, at least in the case when using cyrillic characters in names. See recent security issues in regards to web addresses in web browsers for the confusion (and/or name errors) that could result in their use. That impression is deceiving. We are talking about source code here; people type in identifiers explicitly rather than receiving them through linking, and they scope identifiers (by module or object). If somebody manages to get look-alike identifiers into your Python libraries, you have bigger problems than these look-alikes: anybody capable of doing so could just as well replace the real thing in the first place. As always in computer security: define your threat model before reasoning about the risks. I should have been more explicit. I did not mean to imply that I was concerned about the security implications of inserting arbitrary identifiers in Python (I was mentioning the web browser case for an example of how such characters have been confusing previously), I am concerned about confusion involved with using: Greek Capital: Alpha, Beta, Epsilon, Zeta, Eta, Iota, Kappa, Mu, Nu, Omicron, Rho, and Tau. Cyrillic Capital: Dze, Je, A, Ve, Ie, Em, En, O, Er, Es, Te, Ha, ... And how users could say, name error? But I typed in window.draw(PEN) as I was told to, and it didn't work! Identically drawn glyphs are a problem, and pretending that they aren't a problem, doesn't make it so. Right now, all possible name glyphs are visually distinct, which would not be the case if any unicode character could be used as a name (except for numerals). Speaking of which, would we then be offering support for arabic/indic numeric literals, and/or support it in int()/float()? Ideally I would like to say yes, but I could see the confusion if such were allowed. 
- Josiah
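To make the concern concrete, here is a small sketch, assuming a Python 3 interpreter (which does accept non-ASCII identifiers). The `lookup` helper and the `namespace` dictionary are hypothetical names used only for illustration.

```python
import unicodedata

# Three capital letters that are typically drawn identically.
lookalikes = ["E", "\u0395", "\u0415"]
names = [unicodedata.name(c) for c in lookalikes]

namespace = {"PEN": "latin pen"}
cyrillic_pen = "P\u0415N"  # middle letter is CYRILLIC CAPITAL LETTER IE


def lookup(identifier, namespace):
    """Evaluate a bare identifier; return None if it raises NameError."""
    try:
        return eval(identifier, {}, namespace)
    except NameError:
        return None


print(names)
print(lookup("PEN", namespace))         # found
print(lookup(cyrillic_pen, namespace))  # not found, despite looking the same
```

The second lookup fails even though both spellings may render identically on screen, which is the "But I typed in window.draw(PEN)" situation described above.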
Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c
On Monday 24 October 2005 7:39 pm, Guido van Rossum wrote: On 10/24/05, M.-A. Lemburg [EMAIL PROTECTED] wrote: Guido van Rossum wrote: A concern I'd have with fixing this is that Unicode objects also support the buffer API. In any situation where either str or unicode is accepted I'd be reluctant to guess whether a buffer object was meant to be str-like or Unicode-like. I think this covers all the cases you mention here. This situation is a little better than that: the buffer interface has a slot called getcharbuffer which is what the string methods use in case they find that a string argument is not of type str or unicode. I stand corrected! As first step, I'd suggest to implement the getcharbuffer slot. That will already go a long way. Phil, if anything still doesn't work after doing what Marc-Andre says, those would be good candidates for fixes! The patch is now on SF, #1337876. Phil
Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).
Josiah Carlson wrote: Martin v. Löwis [EMAIL PROTECTED] wrote: Fredrik Lundh wrote: however, for Python 3000, it would be nice if the source-code encoding applied to the *entire* file (XML-style), rather than just unicode string literals and (hopefully) comments and docstrings. As MAL explains, the encoding currently does apply to the entire file. However, because of the Python syntax, you are restricted to ASCII in many places, such as keywords, number literals, and (unfortunately) identifiers. Lifting the restriction on identifiers is on my agenda. It seems that removing this restriction may cause serious issues, at least in the case when using cyrillic characters in names. See recent security issues in regards to web addresses in web browsers for the confusion (and/or name errors) that could result in their use. While I agree in principle that people should be able to use the entirety of one's own natural language in writing software in programming languages, I think that it is an ugly can of worms that perhaps shouldn't be opened. I agree with Josiah. A few years ago we had a discussion about this on python-dev and agreed to stick with ASCII identifiers for Python. I still think that's the right way to go. -- Marc-Andre Lemburg eGenix.com
Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).
On 10/25/05, Josiah Carlson [EMAIL PROTECTED] wrote: Identically drawn glyphs are a problem, and pretending that they aren't a problem, doesn't make it so. Right now, all possible name glyphs are visually distinct, which would not be the case if any unicode character could be used as a name (except for numerals). Speaking of which, would we then be offering support for arabic/indic numeric literals, and/or support it in int()/float()? Ideally I would like to say yes, but I could see the confusion if such were allowed. This problem isn't new. There are plenty of fonts where 1 and l are hard to distinguish, or l and I for that matter, or O and 0. Yes, we need better tools to diagnose this. No, we shouldn't let this stop us from adding such a feature if it is otherwise a good feature. I'm not so sure about this for other reasons -- it hampers code sharing, and as soon as you add right-to-left character sets to the mix (or top-to-bottom, for that matter), displaying source code is going to be near impossible for most tools (since the keywords and standard library module names will still be in the Latin alphabet). This actually seems a killer even for allowing Unicode in comments, which I'd otherwise favor. What do Unicode-aware apps generally do with right-to-left characters? -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).
Josiah Carlson wrote: And how users could say, name error? But I typed in window.draw(PEN) as I was told to, and it didn't work! Ah, so the serious issues you are talking about are not security issues, but usability issues. I don't think extending the range of acceptable characters will cause any additional confusion. Users are already getting surprising NameErrors/AttributeErrors in the following cases: - they just misspell the identifier, and then, when the error message is printed, fail to recognize the difference, as they read over the typo just like they read over it when mistyping it in the first place. - they run into confusions with different things having the same names in different contexts. For example, they wonder why they get TypeError for passing the wrong number of arguments to a function, when the call matches exactly what the source code in front of them tells them - only that they were calling a different function which just happened to have the same name. In the light of these common mistakes, your example with an identifier named PEN, where the P might be a cyrillic letter or the E a greek one is just made up: For window.draw, people will readily understand that they are supposed to use Latin letters. More generally, they will know what script to use just from looking at the identifier. Identically drawn glyphs are a problem, and pretending that they aren't a problem, doesn't make it so. Right now, all possible name glyphs are visually distinct Not at all: Just compare Fool and Foo1 (and perhaps FooI) In the font in which I'm typing this, these are slightly different - but there are fonts in which the difference is really difficult to recognize. Speaking of which, would we then be offering support for arabic/indic numeric literals, and/or support it in int()/float()? No. None of the Arabic users have ever requested such a feature, so it would be stupid to provide it. 
We provide extended identifiers not for the fun of it, but because users are requesting them. Regards, Martin
Re: [Python-Dev] Proposed resolutions for open PEP 343 issues
Guido van Rossum wrote: On 10/25/05, Nick Coghlan [EMAIL PROTECTED] wrote: Almost there - this is the only issue I have left on my list :) [...] Why are you so keen on TypeError? I find AttributeError totally appropriate. I don't see symmetry with for-loops as a valuable property here. AttributeError and TypeError are often interchangeable anyway. The reason I'm keen on TypeError is because 'abstract.c' uses it consistently when it fails to find a method to support a requested protocol. Hm. abstract.c well predates the new type system. Slots and methods weren't really unified back then, so TypeError made obvious sense at the time. Ah, I hadn't considered that, because I never made significant use of any Python versions before 2.2. Maybe there's a design principle in there somewhere: Failed duck-typing -> AttributeError (or TypeError for complex checks); failed instance or subtype check -> TypeError. Most of the functions in abstract.c handle complex protocols, so a simple attribute error wouldn't convey the necessary meaning. The context protocol, on the other hand, is fairly simple, and an AttributeError tells you everything you really need to know. Cheers, Nick. -- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://boredomandlaziness.blogspot.com
Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).
M.-A. Lemburg wrote: A few years ago we had a discussion about this on python-dev and agreed to stick with ASCII identifiers for Python. I still think that's the right way to go. I don't think there ever was such an agreement. Regards, Martin
Re: [Python-Dev] Proposed resolutions for open PEP 343 issues
On 10/25/05, Nick Coghlan [EMAIL PROTECTED] wrote: Maybe there's a design principle in there somewhere: Failed duck-typing -> AttributeError (or TypeError for complex checks); failed instance or subtype check -> TypeError. Doesn't convince me. If there are principles at work here (and not just coincidences), they are (a) don't lightly replace an exception by another, and (b) don't raise AttributeError; the getattr operation raises it for you. (a) says that we should let the AttributeError bubble up in the case of the with-statement; (b) explains why you see TypeError when a slot isn't filled. Most of the functions in abstract.c handle complex protocols, so a simple attribute error wouldn't convey the necessary meaning. The context protocol, on the other hand, is fairly simple, and an AttributeError tells you everything you really need to know. That's what I've been saying all the time. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/)
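Principles (a) and (b) can be illustrated with a hand-written expansion of the with-statement. This is only a sketch: `run_with` and `Tracked` are hypothetical names, the expansion is simplified from the translation in PEP 343, and the exception type reported by the real with-statement may differ between CPython releases.

```python
def run_with(manager, body):
    """Hand-written stand-in for ``with manager as value: body(value)``."""
    # Plain attribute lookups: if a method is missing, the lookup itself
    # raises AttributeError and we simply let it propagate.
    enter = type(manager).__enter__
    exit_ = type(manager).__exit__
    value = enter(manager)
    try:
        result = body(value)
    except BaseException as exc:
        # Re-raise unless __exit__ asks to suppress the exception.
        if not exit_(manager, type(exc), exc, exc.__traceback__):
            raise
        return None
    exit_(manager, None, None, None)
    return result


class Tracked:
    """Minimal context manager that records protocol calls."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False
```

Calling `run_with(object(), ...)` fails with the AttributeError from the lookup of `__enter__`; no code in the helper raises or translates an exception on its own.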
Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).
Martin v. Löwis [EMAIL PROTECTED] wrote: Josiah Carlson wrote: And how users could say, name error? But I typed in window.draw(PEN) as I was told to, and it didn't work! Ah, so the serious issues you are talking about are not security issues, but usability issues. Indeed, it was a misunderstanding, as the email stated: I did not mean to imply that I was concerned about the security implications of inserting arbitrary identifiers in Python (I was mentioning the web browser case for an example of how such characters have been confusing previously), I am concerned about confusion involved with using: [glyphs which are identical] I don't think extending the range of acceptable characters will cause any additional confusion. Users are already getting surprising NameErrors/AttributeErrors in the following cases: - they just misspell the identifier, and then, when the error message is printed, fail to recognize the difference, as they read over the typo just like they read over it when mistyping it in the first place. In this case it's not just a misreading, the characters look identical! When is an 'E' not an 'E'? When it is an Epsilon or Ie. Saying what characters will or will not be used as identifiers, when those characters are keys on a keyboard of a specific type, is pretty presumptuous. - they run into confusions with different things having the same names in different contexts. For example, they wonder why they get TypeError for passing the wrong number of arguments to a function, when the call matches exactly what the source code in front of them tells them - only that they were calling a different function which just happened to have the same name. Right, and users should be reading the documentation for the functions and methods they are calling. 
In the light of these common mistakes, your example with an identifier named PEN, where the P might be a cyrillic letter or the E a greek one is just made up: For window.draw, people will readily understand that they are supposed to use Latin letters. More generally, they will know what script to use just from looking at the identifier. Sure, that example was made up, but there are words which have been stolen from various languages by English, and you are discounting the case of single-letter temporary variables. Saying what will and won't happen over the course of using unicode identifiers is quite the prediction. Identically drawn glyphs are a problem, and pretending that they aren't a problem, doesn't make it so. Right now, all possible name glyphs are visually distinct Not at all: Just compare Fool and Foo1 (and perhaps FooI) In the font in which I'm typing this, these are slightly different - but there are fonts in which the difference is really difficult to recognize. Indeed, they are similar, but _different_ in my font as well. The trick is that the glyphs are not different in the case of certain greek or cyrillic letters. They don't just /look/ similar they /are identical/. - Josiah
Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).
On 10/25/05, Josiah Carlson [EMAIL PROTECTED] wrote: Indeed, they are similar, but _different_ in my font as well. The trick is that the glyphs are not different in the case of certain greek or cyrillic letters. They don't just /look/ similar they /are identical/. Well, in the font I'm using to read this email, I and l are /identical/. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).
Guido van Rossum [EMAIL PROTECTED] wrote: On 10/25/05, Josiah Carlson [EMAIL PROTECTED] wrote: Indeed, they are similar, but _different_ in my font as well. The trick is that the glyphs are not different in the case of certain greek or cyrillic letters. They don't just /look/ similar they /are identical/. Well, in the font I'm using to read this email, I and l are /identical/. In all fonts I've seen, E/Epsilon/Ie are /always identical/. - Josiah
Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).
Martin v. Löwis: This aspect of rendering is often not implemented, though. Web browsers do it correctly, see ... GUI frameworks sometimes do it correctly, sometimes don't; most notably, Tk has no good support for RTL text. Scintilla does a rough job with this. RTL text is displayed correctly as the underlying platform libraries (Windows or GTK+/Pango) handle this aspect when called to draw text. However editing is not performed correctly with the caret not being placed correctly within RTL text and other visual glitches. There is interest in the area and even a funding proposal this week. Neil
Re: [Python-Dev] Divorcing str and unicode (no more implicitconversions).
Martin v. Löwis wrote: For window.draw, people will readily understand that they are supposed to use Latin letters. More generally, they will know what script to use just from looking at the identifier. Would it help if an identifier were required to be made up of letters from the same alphabet, e.g. all Latin or all Greek or all Cyrillic, but not a mixture? Then you'd get an immediate error if you accidentally slipped in a letter from the wrong alphabet. Greg
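A rough sketch of such a single-alphabet check follows. It is an assumption-laden approximation: `scripts_in` and `is_single_script` are hypothetical helpers, and the first word of the Unicode character name is used as a stand-in for the script, since the standard `unicodedata` module has no direct script property.

```python
import unicodedata


def scripts_in(identifier):
    """Return the set of (approximate) scripts used by the letters."""
    scripts = set()
    for ch in identifier:
        if ch.isalpha():  # ignore digits and underscores
            # First word of the name, e.g. "LATIN", "GREEK", "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_single_script(identifier):
    """True if all letters appear to come from one alphabet."""
    return len(scripts_in(identifier)) <= 1


print(scripts_in("PEN"))
print(scripts_in("P\u0415N"))        # Latin P and N, Cyrillic Ie
print(is_single_script("P\u0415N"))
```

A check like this would flag the mixed "PEN" spelling, but it would not help with an identifier written entirely in look-alike letters from another alphabet.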
[Python-Dev] make testall hanging on HEAD?
At the moment, I see make testall hanging in test_timeout. In addition, test_curses is leaving the tty in a hosed state: test_crypt test_csv test_curses test_datetime test_dbm test_decimal test_decorators test_deque test_descr This is on Ubuntu Breezy, [GCC 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu9)] on linux2 Anyone else see this? -- Anthony Baxter [EMAIL PROTECTED] It's never too late to have a happy childhood.
Re: [Python-Dev] make testall hanging on HEAD?
ditto on the curses problem, but test_timeout completed just fine, at least the first time around. fedora core 4, x86_64 [GCC 4.0.1 20050727 (Red Hat 4.0.1-5)] on linux2 Jeff
Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).
M.-A. Lemburg: You mean a slice that slices out the next indextype ? Yes. This sounds a lot like you'd want iterators for the various index types. Should be possible to implement on top of the proposed APIs, e.g. itergraphemes(u), itercodepoints(u), etc. Iterators may be helpful, but can also be too restrictive when the processing is not completely iterative, such as peeking ahead or looking behind to wrap at a word boundary in the display example. It was more that there may be less scope for error if there was a move away from indexes to slices. The PEP provides ways to specify what you want to examine or modify, but it looks to me like returning indexes will lead to code repetition or additional variables, with an increase in fragility. Note that what most people refer to as character is a grapheme in Unicode speak. A grapheme-oriented string type may be worthwhile although you'd probably have to choose a particular normalisation form to ease processing. Given that interpretation, breaking Unicode characters is something you won't ever work around with by using larger code units such as UCS4 compatible ones. I still think we can reduce the scope for errors. Furthermore, you should also note that surrogates (two code units encoding one code point) are part of Unicode life. While you don't need them when storing Unicode in UCS4 code units, they can still be part of the Unicode data and the programmer has to be aware of these. Many programmers can and will ignore surrogates. One day that may bite them but we can't close off text processing to those who have no idea of what surrogates are, or directional marks, or that sorting is locale dependent, or have no understanding of the difference between NFC and NFKD normalization forms. Neil
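As a sketch of what a slice-returning itergraphemes(u) might look like: the function below is not the API proposed in the PEP, and it uses a deliberately simplified rule (a base character plus any following characters with a non-zero combining class). Real grapheme cluster segmentation has more cases than this.

```python
import unicodedata


def itergraphemes(u):
    """Yield substrings of u, each a base character plus combining marks.

    Simplified approximation; assumes Python 3 str (code point indexed).
    """
    cluster = ""
    for ch in u:
        # A non-combining character starts a new cluster.
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ""
        cluster += ch
    if cluster:
        yield cluster


text = "e\u0301te\u0301"   # "été" written with combining acute accents
print(list(itergraphemes(text)))
print(text[:1])            # index-based slicing cuts off the accent
```

Because each item is already a complete slice, the caller never holds an index that could point between a base character and its accent, which is the reduction in scope for error argued for above.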