Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Neil Hodgson wrote:
> M.-A. Lemburg:
> 
> 
>>Unicode has the concept of combining code points, e.g. you can
>>store an "é" (e with an accent) as "e" + "'". Now if you slice
>>off the accent, you'll break the character that you encoded
>>using combining code points.
>>...
>>next_(u, index) -> integer
>>
>>Returns the Unicode object index for the start of the next
>> found after u[index] or -1 in case no next element
>>of this type exists.
> 
> 
> Should entity breakage be further discouraged by returning a slice
> here rather than an object index?

You mean a slice that slices out the next  ?

> Something like:
> 
> i = first_grapheme(u)
> x = 0
> while x < width and u[i] != "\n":
>     x, _ = draw(u[i], (x, y))
>     i = next_grapheme(u, i)

This sounds a lot like you'd want iterators for the various
index types. Should be possible to implement on top of the
proposed APIs, e.g. itergraphemes(u), itercodepoints(u), etc.
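
A minimal sketch of what such iterators might look like, written for
Python 3, where every str item is already one code point. The names
itercodepoints and itergraphemes come from the paragraph above; the
bodies are an assumption. In particular, the grapheme rule used here
(attach combining marks to the preceding base character) is a
simplification and not the full Unicode segmentation algorithm.

```python
import unicodedata


def itercodepoints(u):
    """Yield (index, code point) pairs for a str."""
    for i, ch in enumerate(u):
        yield i, ch


def itergraphemes(u):
    """Yield (start_index, cluster) pairs.

    Simplified rule (an assumption): a cluster is a character followed
    by any run of combining marks. Real segmentation has more cases.
    """
    start = 0
    for i in range(1, len(u)):
        if unicodedata.combining(u[i]) == 0:
            yield start, u[start:i]
            start = i
    if u:
        yield start, u[start:]


text = "e\u0301te\u0301"  # "été" spelled with combining acute accents
clusters = [cluster for _, cluster in itergraphemes(text)]
print(len(text), len(clusters))
print(repr(text[:1]))  # slicing by code point drops the accent
```

Iterating by cluster keeps each accent with its base letter, whereas
the plain slice at the end separates them.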

Note that what most people refer to as "character" is a
grapheme in Unicode speak. Given that interpretation,
"breaking" Unicode "characters" is something you won't
ever work around by using larger code units such
as UCS4-compatible ones.

Furthermore, you should also note that surrogates (two
code units encoding one code point) are part of Unicode
life. While you don't need them when storing Unicode
in UCS4 code units, they can still be part of the
Unicode data and the programmer has to be aware of
these.
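
To illustrate the surrogate mechanism, here is a small Python 3 sketch
that encodes one code point outside the Basic Multilingual Plane as
UTF-16 and inspects the resulting code units. The choice of U+10400 is
only an example; the arithmetic at the end is the standard UTF-16
decoding formula.

```python
ch = "\U00010400"  # DESERET CAPITAL LETTER LONG I, outside the BMP

# Encode to big-endian UTF-16 and split into 16-bit code units.
utf16 = ch.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

print(len(ch))                  # number of code points
print([hex(x) for x in units])  # number and value of UTF-16 code units

# Recombine the surrogate pair into the original code point.
high, low = units
decoded = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
print(hex(decoded))
```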

Personally, I don't think that slicing Unicode is
such a big issue. If you know what you are doing,
things tend not to break - which is true for pretty
much everything you do in programming ;-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 25 2005)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] AST branch is in?

2005-10-25 Thread Neil Schemenauer
Simon Burton <[EMAIL PROTECTED]> wrote:
> Is there a python interface ?

Not yet.

  Neil



Re: [Python-Dev] New codecs checked in

2005-10-25 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
> 
> 
>>I had to create three custom mapping files for cp1140, koi8-u
>>and tis-620.
> 
> 
> Can you please publish the files you have used somewhere? They
> best go into the Python CVS.

Sure; I'll check in the whole build machinery I'm using for this.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] New codecs checked in

2005-10-25 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
> 
>>I just left them in because I thought they wouldn't do any harm
>>and might be useful in some applications.
>>
>>Removing them where not directly needed by the codec would not
>>be a problem.
> 
> 
> I think memory usage caused is measurable (I estimated 4KiB per
> dictionary). More importantly, people apparently currently change
> the dictionaries we provide and expect the codecs to automatically
> pick up the modified mappings. It would be better if the breakage
> is explicit (i.e. they get an AttributeError on the variable) instead
> of implicit (their changes to the mapping simply have no effect
> anymore).

Agreed. I've already checked in the changes, BTW.

>>KOI8-U is not available as mapping on ftp.unicode.org and
>>I only recreated codecs from the mapping files available
>>there.
> 
> 
> I think we should come up with mapping tables for the additional
> codecs as well, and maintain them in the CVS. This also applies
> to things like rot13.

Agreed.

>>I'll rerun the creation with the above changes sometime this
>>week.
> 
> 
> I hope I can finish my encoding routine shortly, which again
> results in changes to the codecs (replacing the encoding dictionaries
> with other lookup tables).

Having seen the decode tables written as a long Unicode string,
I think that this may indeed also be a good solution for
encoding - the major improvement here is that the parser
and compiler will do the work of creating the table. At
module load time, the .pyc file will only contain a long
string which is very fast to create and load (unlike dictionaries
which are set up dynamically at load time).

In general, it's better to do all the work up-front when
creating the codecs, rather than having run-time code
repeat these tasks over and over again.
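
A rough sketch of the table-as-string idea, in Python 3. The table
below is a toy (it simply mirrors Latin-1) and is built with a
comprehension to keep the example short; a generated codec would
presumably spell it out as one long literal so that the compiler does
the work. The names DECODING_TABLE, ENCODING_MAP, decode and encode
are illustrative and not taken from the codecs that were checked in.

```python
# Toy 256-entry decoding table: position = byte value, entry = character.
# Assumption for illustration: the mapping is identical to Latin-1.
DECODING_TABLE = "".join(chr(i) for i in range(256))

# The reverse lookup is derived once at import time, not on every call.
ENCODING_MAP = {ch: byte for byte, ch in enumerate(DECODING_TABLE)}


def decode(data):
    """Decode bytes by indexing into the table string."""
    return "".join(DECODING_TABLE[b] for b in data)


def encode(text):
    """Encode text using the precomputed reverse mapping."""
    return bytes(ENCODING_MAP[ch] for ch in text)


print(decode(b"\xfcber-cool"))
print(encode("\xfcber-cool"))
```

Decoding needs nothing more than string indexing, which is why a plain
string can stand in for a dictionary here.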

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-25 Thread Nick Coghlan
Josiah Carlson wrote:
> Nick Coghlan <[EMAIL PROTECTED]> wrote:
>> I think having dicts and sets automatically invoke freeze would be a
>> mistake, because at least one of the following two cases would behave
>> unexpectedly:
> 
> I'm pretty sure that the PEP was only asking if one would freeze the
> contents of dicts IF the dict was being frozen.
> 
> That is, which of the following should be the case:
> freeze({1:[2,3,4]}) -> {1:[2,3,4]}
> freeze({1:[2,3,4]}) -> xdict(1=(2,3,4))

I believe the choices you intended are:
  freeze({1:[2,3,4]}) -> imdict(1=[2,3,4])
  freeze({1:[2,3,4]}) -> imdict(1=(2,3,4))

Regardless, that question makes a lot more sense (and looking at the PEP 
again, I realised I simply read it wrong the first time).

For containers where equality depends on the contents of the container (i.e., 
all the builtin ones), I don't see how it is possible to implement a sensible 
hash function without freezing the contents as well - otherwise your immutable 
isn't particularly immutable.

Consider what would happen if list "__freeze__" simply returned a tuple 
version of itself - you have a __freeze__ method which returns a potentially 
unhashable object!
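
The problem can be sketched in a few lines of Python 3. The freeze
function below is hypothetical (it is not the PEP's reference
implementation), and a frozenset of key/value pairs merely stands in
for the "imdict" type mentioned above.

```python
def freeze(obj):
    """Recursively convert builtin containers to hashable equivalents.

    Hypothetical helper for illustration only.
    """
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(freeze(item) for item in obj)
    if isinstance(obj, dict):
        # Stand-in for an immutable dict: a frozenset of frozen pairs.
        return frozenset((freeze(k), freeze(v)) for k, v in obj.items())
    hash(obj)  # raises TypeError early for anything else that is unhashable
    return obj


# A shallow conversion yields a tuple that still holds a mutable list.
shallow = tuple([[2, 3, 4]])
try:
    hash(shallow)
    shallow_hashable = True
except TypeError:
    shallow_hashable = False

deep = freeze({1: [2, 3, 4]})
print(shallow_hashable, deep)
```

Only the recursive version gives a result that can safely be used as a
dictionary key or set member.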

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com


Re: [Python-Dev] AST branch is in?

2005-10-25 Thread A.M. Kuchling
On Tue, Oct 25, 2005 at 01:36:26PM +1000, Simon Burton wrote:
> Is there a python interface ?

Not yet, as far as I know.

FYI, all: please see the following weblog entry for a description of
the AST branch:  
http://www.amk.ca/diary/2005/10/the_ast_branch_lands_1

If I got anything wrong, please offer corrections in the comments for
that post.

--amk


[Python-Dev] Reminder: PyCon 2006 submissions due in a week

2005-10-25 Thread A.M. Kuchling
The submission deadline for PyCon 2006 is now a week away.  PyCon 2006
will be in Dallas, Texas, February 24-26 2006.

For 2006, I'd like to see more tutorial-style talks on the program.
This means that your talk doesn't have to be about something entirely
new; you can show how to use a particular language feature, standard
library module, examine some aspect of a Python implementation, or
compare the available libraries in an application domain.

For example, the most popular talk at 2005 was Michelle Levesque's
PyWeboff, which compared various web development tools.  The next most
popular (ignoring a few keynotes and the lightning talks) were Alex
Martelli's talks on iterators & generators, and on OOP.  Partly that's
because it's Alex, of course, but I think attendees want help in
deciding which tools are good/helpful/safe to use.

If you need an idea, http://wiki.python.org/moin/PyCon2005/Feedback
lists some topics that 2005's attendees were interested in.

CFP:
http://www.python.org/pycon/2006/cfp

Proposal submission site:
http://submit.python.org/

--amk


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Bengt Richter wrote:
> At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote:
> 
>>Bengt Richter wrote:
>>
>>>Please bear with me for a few paragraphs ;-)
>>
>>Please note that source code encoding doesn't really have
>>anything to do with the way the interpreter executes the
>>program - it's merely a way to tell the parser how to
>>convert string literals (currently only the Unicode ones)
>>into constant Unicode objects within the program text.
>>It's also a nice way to let other people know what kind of
>>encoding you used to write your comments ;-)
>>
>>Nothing more.
> 
> I think somehow I didn't make things clear, sorry ;-)
> As I tried to show in the example of module_a.cs vs module_b.cs,
> the source encoding currently results in two different str-type
> strings representing the source _character_ sequence, which is the
> _same_ in both cases. 

I don't follow you here. The source code encoding
is only applied to Unicode literals (you are using string
literals in your example). String literals are passed
through as-is.

Whether or not your editor will use the source
code encoding marker is really up to your editor
and not within the scope of Python.

If you open the two module files in Emacs, you'll
see identical renderings of the string literals.
With other editors, you may have to explicitly tell
the editor which encoding to assume. Ditto for shell
printouts.
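
What "passed through as-is" means for the two modules can be sketched
in Python 3 terms, where bytes and text are separate types. This is an
illustration of the byte-level difference only, not a reproduction of
the Python 2.4 behaviour under discussion.

```python
text = "\xfcber-cool"  # the character sequence shown in both source files

# The same characters stored under two different source encodings.
as_latin1 = text.encode("latin-1")
as_utf8 = text.encode("utf-8")

print(as_latin1)
print(as_utf8)

# The byte strings differ, yet each decodes back to identical text.
same_bytes = as_latin1 == as_utf8
same_text = as_latin1.decode("latin-1") == as_utf8.decode("utf-8")
print(same_bytes, same_text)
```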

> To make it more clear, try the following little
> program (untested except on NT4 with
> Python 2.4b1 (#56, Nov  3 2004, 01:47:27)
> [GCC 3.2.3 (mingw special 20030504-1)] on win32 ;-):
> 
> < t_srcenc.py >
> import os
> def test():
>     open('module_a.py','wb').write(
>         "# -*- coding: latin-1 -*-" + os.linesep +
>         "cs = '\xfcber-cool'" + os.linesep)
>     open('module_b.py','wb').write(
>         "# -*- coding: utf-8 -*-" + os.linesep +
>         "cs = '\xc3\xbcber-cool'" + os.linesep)
>     # show that we have two modules differing only in encoding:
>     print ''.join(line.decode('latin-1') for line in open('module_a.py'))
>     print ''.join(line.decode('utf-8') for line in open('module_b.py'))
>     # see how results are affected:
>     import module_a, module_b
>     print module_a.cs + ' =?= ' + module_b.cs
>     print module_a.cs.decode('latin-1') + ' =?= ' + module_b.cs.decode('utf-8')
> 
> if __name__ == '__main__':
>     test()
> ---
> The result copied from NT4 console to clipboard and pasted into eudora:
> __
> 
> [17:39] C:\pywk\python-dev>py24 t_srcenc.py
> # -*- coding: latin-1 -*-
> cs = 'über-cool'
> 
> # -*- coding: utf-8 -*-
> cs = 'über-cool'
> 
> nber-cool =?= ++ber-cool
> über-cool =?= über-cool
> __
> (I'd say NT did the best it could, rendering the copied cp437
> superscript n as the 'n' above, and the '++' coming from the
> cp437 box characters corresponding to the '\xc3\xbc'. Not sure
> how it will show on your screen, but try the program to see ;-)
>
>>Once a module is compiled, there's no distinction between
>>a module using the latin-1 source code encoding or one using
>>the utf-8 encoding.
> 
> ISTM module_a.cs and module_b.cs can readily be distinguished after
> compilation, whereas the sources displayed according to their declared
> encodings as above (or as e.g. different editors using different native
> encoding might) cannot (other than the encoding cookie itself) ;-)
> Perhaps you meant something else?

What your editor displays to you is not within the scope
of Python, e.g. if you open the files in Emacs you'll see
something different than in Notepad.

I guess that's the price you have to pay for being able to write
programs that can include Unicode literals using the complete range
of possible Unicode characters without having to revert to
escapes.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Fredrik Lundh
M.-A. Lemburg wrote:

> I don't follow you here. The source code encoding
> is only applied to Unicode literals (you are using string
> literals in your example). String literals are passed
> through as-is.

however, for Python 3000, it would be nice if the source-code encoding applied
to the *entire* file (XML-style), rather than just unicode string literals and
(hopefully) comments and docstrings.

 





Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Fredrik Lundh wrote:
> M.-A. Lemburg wrote:
> 
> 
>>I don't follow you here. The source code encoding
>>is only applied to Unicode literals (you are using string
>>literals in your example). String literals are passed
>>through as-is.
> 
> 
> however, for Python 3000, it would be nice if the source-code encoding applied
> to the *entire* file (XML-style), rather than just unicode string literals
> and (hopefully) comments and docstrings.

Actually, the encoding is applied to the complete source file:
the file is transcoded into UTF-8 and then parsed by the
Python parser.

Unicode literals are then decoded from the UTF-8 into Unicode.
String literals are transcoded back into the source code encoding,
thus making the (rather long due to technical constraints) round-trip
source code encoding -> Unicode -> UTF-8 -> Unicode -> source code encoding.

Python 3k should have a fully Unicode based parser to reduce this
additional transcoding overhead.

Since Py3k will only have Unicode literals, the problems with
string literals will go away all by themselves :-)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] New codecs checked in

2005-10-25 Thread M.-A. Lemburg
M.-A. Lemburg wrote:
> Martin v. Löwis wrote:
> 
>>M.-A. Lemburg wrote:
>>
>>
>>
>>>I had to create three custom mapping files for cp1140, koi8-u
>>>and tis-620.
>>
>>
>>Can you please publish the files you have used somewhere? They
>>best go into the Python CVS.
> 
> 
> Sure; I'll check in the whole build machinery I'm using for this.

Done.

In order to rebuild the codecs, cd Tools/unicode; make
then check the codecs in the created build/ subdir (e.g.
using comparecodecs.py) and copy them over to the
Lib/encodings/ directory.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Nick Coghlan
Almost there - this is the only issue I have left on my list :)

Guido van Rossum wrote:
> On 10/24/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
>> However, those resolutions bring up the following issues:
>>
>>   5 a. What exception is raised when EXPR does not have a __context__
>>        method?
>>     b. What about when the returned object is missing __enter__ or
>>        __exit__?
>> I suggest raising TypeError in both cases, for symmetry with for loops.
>> The slot check is made in C code, so I don't see any difficulty in
>> raising TypeError instead of AttributeError if the relevant slots
>> aren't filled.
> 
> Why are you so keen on TypeError? I find AttributeError totally
> appropriate. I don't see symmetry with for-loops as a valuable
> property here. AttributeError and TypeError are often interchangeable
> anyway.

The reason I'm keen on TypeError is because 'abstract.c' uses it consistently
when it fails to find a method to support a requested protocol.

None of the abstract object methods currently raise AttributeError, and this
property is fairly visible at the Python level because the abstract APIs are 
used to implement many of the bytecodes and various builtin functions. Both 
for loops and the iter function, for example, get their current exception 
behaviour from PyObject_GetIter and PyIter_Next.
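
The two exception styles being compared are easy to observe from
Python code. In this Python 3 sketch, NoProtocol is a made-up class
with no special methods. What a with statement itself raises has
differed between Python versions, so the sketch uses an explicit
attribute lookup for the second case.

```python
class NoProtocol:
    """Hypothetical class that implements no iteration or context methods."""


obj = NoProtocol()

# Protocol lookup through the abstract API (here via iter()).
try:
    iter(obj)
    iter_error = None
except TypeError as exc:
    iter_error = exc

# Plain attribute lookup for a missing method.
try:
    obj.__enter__
    attr_error = None
except AttributeError as exc:
    attr_error = exc

print(type(iter_error).__name__, type(attr_error).__name__)
```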

Having had a look at mwh's patch, however, I've realised that going that way 
would only be possible if there were dedicated bytecodes for GET_CONTEXT, 
ENTER_CONTEXT and EXIT_CONTEXT (similar to the dedicated GET_ITER and FOR_ITER).

Leaving the exception as AttributeError means that level of bytecode hacking 
isn't necessary (mwh's patch just emits a fairly normal try/finally statement, 
although it still modifies the bytecode to include LOAD_EXIT_ARGS).

So, the inconsistency with other syntactic protocols still bothers me, but I 
can live with AttributeError if you don't want to add three new bytecodes just 
to support PEP 343.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com


[Python-Dev] MinGW and libpython24.a

2005-10-25 Thread David Abrahams

Is the instruction at
http://www.python.org/dev/doc/devel/inst/tweak-flags.html#SECTION000622000
still relevant?  I am not 100% certain I didn't make one myself, but
it looks to me as though my Windows Python 2.4.1 distro came with a
libpython24.a.  I am asking here because it seems only the person who
prepares the installer would know.  If this is true, in which version
was it introduced?

Thanks,

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com



Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Guido van Rossum
On 10/25/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Almost there - this is the only issue I have left on my list :)
[...]
> > Why are you so keen on TypeError? I find AttributeError totally
> > appropriate. I don't see symmetry with for-loops as a valuable
> > property here. AttributeError and TypeError are often interchangeable
> > anyway.
>
> The reason I'm keen on TypeError is because 'abstract.c' uses it consistently
> when it fails to find a method to support a requested protocol.

Hm. abstract.c well predates the new type system. Slots and methods
weren't really unified back then, so TypeError made obvious sense at
the time.

But with the new unified types/classes, those TypeErrors are really
just delayed (or precomputed? depends on your POV) AttributeErrors.

> None of the abstract object methods currently raise AttributeError, and this
> property is fairly visible at the Python level because the abstract APIs are
> used to implement many of the bytecodes and various builtin functions. Both
> for loops and the iter function, for example, get their current exception
> behaviour from PyObject_GetIter and PyIter_Next.
>
> Having had a look at mwh's patch, however, I've realised that going that way
> would only be possible if there were dedicated bytecodes for GET_CONTEXT,
> ENTER_CONTEXT and EXIT_CONTEXT (similar to the dedicated GET_ITER and 
> FOR_ITER).
>
> Leaving the exception as AttributeError means that level of bytecode hacking
> isn't necessary (mwh's patch just emits a fairly normal try/finally statement,
> although it still modifies the bytecode to include LOAD_EXIT_ARGS).

Let's definitely not introduce new bytecodes just so we can raise a
different exception.

> So, the inconsistency with other syntactic protocols still bothers me, but I
> can live with AttributeError if you don't want to add three new bytecodes just
> to support PEP 343.

I think the consistency you are seeking is a mirage. The TypeErrors
stem from the pre-computation of the slot population, not from some
requirements to raise TypeError for failing to implement some required
built-in protocol. I wouldn't hold it against other implementations of
Python if they raised AttributeError in more situations.

It is true though that AttributeError is somewhat special. There are
lots of places (perhaps too many?) where an operation is defined using
something like "if the object has attribute __foo__, use it, otherwise
use some other approach".  Some operations explicitly check for
AttributeError in their attribute check, and let a different exception
bubble up the stack. Presumably this is done so that a bug in
somebody's __getattr__ implementation doesn't get masked by the
"otherwise use some other approach" branch. But this is relatively
rare; most calls to PyObject_GetAttr just clear the error if they have
a different approach available. In any case, I don't see any of this
as supporting the position that TypeError is somehow more appropriate.
An AttributeError complaining about a missing __enter__, __exit__ or
__context__ method sounds just fine. (Oh, and please don't go checking
for the existence of __exit__ before calling __enter__. That kind of
bug is found with even the most cursory testing.)
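
The "use __foo__ if present, otherwise fall back" pattern, and the
reason for catching only AttributeError, can be sketched as follows.
The __describe__ hook and the describe function are invented for this
example; no such protocol exists in Python.

```python
def describe(obj):
    """Use obj.__describe__() if available, else fall back to repr().

    Only AttributeError triggers the fallback, so other errors raised
    during the lookup still propagate to the caller.
    """
    try:
        hook = obj.__describe__
    except AttributeError:
        return repr(obj)
    return hook()


class WithHook:
    def __describe__(self):
        return "custom description"


class Buggy:
    def __getattr__(self, name):
        # A bug in __getattr__: the wrong exception type is raised.
        raise ValueError("broken __getattr__ while looking up " + name)


print(describe(3))
print(describe(WithHook()))
try:
    describe(Buggy())
    bug_masked = True
except ValueError:
    bug_masked = False
print(bug_masked)
```

Had the lookup been wrapped in a bare except, the broken __getattr__
would have been hidden behind the repr() fallback.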

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] New codecs checked in

2005-10-25 Thread Martin v. Löwis
M.-A. Lemburg wrote:

> Done.
> 
> In order to rebuild the codecs, cd Tools/unicode; make
> then check the codecs in the created build/ subdir (e.g.
> using comparecodecs.py) and copy them over to the
> Lib/encodings/ directory.

Thanks!

Martin


Re: [Python-Dev] MinGW and libpython24.a

2005-10-25 Thread Martin v. Löwis
David Abrahams wrote:
> Is the instruction at
> http://www.python.org/dev/doc/devel/inst/tweak-flags.html#SECTION000622000
> still relevant?  I am not 100% certain I didn't make one myself, but
> it looks to me as though my Windows Python 2.4.1 distro came with a
> libpython24.a.  I am asking here because it seems only the person who
> prepares the installer would know.

That impression might be incorrect: I can tell you when I started
including libpython24.a, but I have no clue whether the instructions
you refer to are correct - I don't use the file myself at all.

> If this is true, in which version was it introduced?

It was introduced in 1.20/1.16.2.4 of Tools/msi/msi.py in response to
patch #1088716; this in turn was first used to release r241c1.

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Bill Janssen wrote:
> I just got mail this morning from a researcher who wants exactly what
> Martin described, and wondered why the default MacPython 2.4.2 didn't
> provide it by default. :-)

If all he wants is to represent Deseret, he can do so in a 16-bit
Unicode type, too: Python supports UTF-16.

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Fredrik Lundh wrote:
> however, for Python 3000, it would be nice if the source-code encoding applied
> to the *entire* file (XML-style), rather than just unicode string literals
> and (hopefully) comments and docstrings.

As MAL explains, the encoding currently does apply to the entire file.
However, because of the Python syntax, you are restricted to ASCII
in many places, such as keywords, number literals, and (unfortunately)
identifiers. Lifting the restriction on identifiers is on my agenda.

Regards,
Martin


Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-25 Thread Josiah Carlson

Nick Coghlan <[EMAIL PROTECTED]> wrote:
> 
> Josiah Carlson wrote:
> > Nick Coghlan <[EMAIL PROTECTED]> wrote:
> >> I think having dicts and sets automatically invoke freeze would be a
> >> mistake, because at least one of the following two cases would behave
> >> unexpectedly:
> > 
> > I'm pretty sure that the PEP was only asking if one would freeze the
> > contents of dicts IF the dict was being frozen.
> > 
> > That is, which of the following should be the case:
> > freeze({1:[2,3,4]}) -> {1:[2,3,4]}
> > freeze({1:[2,3,4]}) -> xdict(1=(2,3,4))
> 
> I believe the choices you intended are:
>   freeze({1:[2,3,4]}) -> imdict(1=[2,3,4])
>   freeze({1:[2,3,4]}) -> imdict(1=(2,3,4))
> 
> Regardless, that question makes a lot more sense (and looking at the PEP 
> again, I realised I simply read it wrong the first time).
> 
> For containers where equality depends on the contents of the container (i.e., 
> all the builtin ones), I don't see how it is possible to implement a sensible 
> hash function without freezing the contents as well - otherwise your
> immutable isn't particularly immutable.
> 
> Consider what would happen if list "__freeze__" simply returned a tuple 
> version of itself - you have a __freeze__ method which returns a potentially 
> unhashable object!

I agree completely, hence my original statement on 10/23: "it is of my
opinion that a container which is frozen should have its contents frozen
as well."

 - Josiah



Re: [Python-Dev] MinGW and libpython24.a

2005-10-25 Thread David Abrahams
"Martin v. Löwis" <[EMAIL PROTECTED]> writes:

> David Abrahams wrote:
>> Is the instruction at
>> http://www.python.org/dev/doc/devel/inst/tweak-flags.html#SECTION000622000
>> still relevant?  I am not 100% certain I didn't make one myself, but
>> it looks to me as though my Windows Python 2.4.1 distro came with a
>> libpython24.a.  I am asking here because it seems only the person who
>> prepares the installer would know.
>
> That impression might be incorrect: I can tell you when I started
> including libpython24.a, but I have no clue whether the instructions
> you refer to are correct - I don't use the file myself at all.
>
>> If this is true, in which version was it introduced?
>
> It was introduced in 1.20/1.16.2.4 of Tools/msi/msi.py in response to
> patch #1088716; this in turn was first used to release r241c1.

Thanks!

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Josiah Carlson

"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> 
> Fredrik Lundh wrote:
> > however, for Python 3000, it would be nice if the source-code encoding
> > applied to the *entire* file (XML-style), rather than just unicode string
> > literals and (hopefully) comments and docstrings.
> 
> As MAL explains, the encoding currently does apply to the entire file.
> However, because of the Python syntax, you are restricted to ASCII
> in many places, such as keywords, number literals, and (unfortunately)
> identifiers. Lifting the restriction on identifiers is on my agenda.

It seems that removing this restriction may cause serious issues, at
least in the case when using cyrillic characters in names.  See recent
security issues with regard to web addresses in web browsers for the
confusion (and/or name errors) that could result in their use.
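
The kind of confusion meant here can be shown with two look-alike
letters. The sketch is written for Python 3, which does accept
non-ASCII identifiers; the variable names are chosen only for the
example.

```python
import unicodedata

latin_name = "a"          # U+0061
cyrillic_name = "\u0430"  # U+0430, usually rendered identically

for name in (latin_name, cyrillic_name):
    print(repr(name), unicodedata.name(name), name.isidentifier())

# Both are valid identifiers, yet they are distinct names.
namespace = {latin_name: 1, cyrillic_name: 2}
print(latin_name == cyrillic_name, len(namespace))
```

A reader of the source would see two identical-looking names bound to
different values, which is where the potential for name errors comes
from.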

While I agree in principle that people should be able to use the
entirety of their own natural language in writing software in
programming languages, I think that it is an ugly can of worms that
perhaps shouldn't be opened.

 - Josiah



Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Eric Nieuwland
Guido van Rossum wrote:
> It is true though that AttributeError is somewhat special. There are
> lots of places (perhaps too many?) where an operation is defined using
> something like "if the object has attribute __foo__, use it, otherwise
> use some other approach".  Some operations explicitly check for
> AttributeError in their attribute check, and let a different exception
> bubble up the stack. Presumably this is done so that a bug in
> somebody's __getattr__ implementation doesn't get masked by the
> "otherwise use some other approach" branch. But this is relatively
> rare; most calls to PyObject_GetAttr just clear the error if they have
> a different approach available. In any case, I don't see any of this
> as supporting the position that TypeError is somehow more appropriate.
> An AttributeError complaining about a missing __enter__, __exit__ or
> __context__ method sounds just fine. (Oh, and please don't go checking
> for the existence of __exit__ before calling __enter__. That kind of
> bug is found with even the most cursory testing.)
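The masking problem described in the quoted paragraph can be sketched in a few lines. This is only an illustration in present-day Python 3: the class names, the `describe` helper and the `__foo__` hook are hypothetical and stand in for any "use it if present, otherwise fall back" check.

```python
class Buggy:
    """Defines __foo__, but its implementation contains a bug."""

    @property
    def __foo__(self):
        # Bug: 'helper' does not exist, so this raises AttributeError
        return self.helper


class Broken:
    """Defines __foo__ with a bug that is not an AttributeError."""

    @property
    def __foo__(self):
        return 1 // 0


def describe(obj):
    # "If the object has attribute __foo__, use it, otherwise
    # use some other approach."
    hook = getattr(obj, "__foo__", None)
    if hook is None:
        return "fallback"
    return "used __foo__"


masked = describe(Buggy())    # the internal AttributeError is swallowed
plain = describe(object())    # genuinely has no __foo__

try:
    describe(Broken())
    surfaced = None
except ZeroDivisionError as exc:
    surfaced = type(exc).__name__

print(masked, plain, surfaced)
```

Both `Buggy()` and a plain `object()` end up on the fallback path, so the bug inside `Buggy.__foo__` goes unnoticed, while the non-AttributeError bug in `Broken` bubbles up the stack.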

Hmmm... Would it be reasonable to introduce a ProtocolError exception?

--eric



Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Guido van Rossum
On 10/25/05, Eric Nieuwland <[EMAIL PROTECTED]> wrote:
> Hmmm... Would it be reasonable to introduce a ProtocolError exception?

And which perceived problem would that solve? The problem of Nick &
Guido disagreeing in public?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Eric Nieuwland
Guido van Rossum wrote:

> On 10/25/05, Eric Nieuwland <[EMAIL PROTECTED]> wrote:
>> Hmmm... Would it be reasonable to introduce a ProtocolError exception?
>
> And which perceived problem would that solve? The problem of Nick &
> Guido disagreeing in public?

;-)

No, that will go on in other fields, I guess.

It was meant to be a bit more informative about what is wrong.

ProtocolError: lacks __enter__ or __exit__

--eric



Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Guido van Rossum
[Eric "are all your pets called Eric?" Nieuwland]
> >> Hmmm... Would it be reasonable to introduce a ProtocolError exception?

[Guido]
> > And which perceived problem would that solve?

[Eric]
> It was meant to be a bit more informative about what is wrong.
>
> ProtocolError: lacks __enter__ or __exit__

That's exactly what I'm trying to avoid. :)

I find "AttributeError: __exit__" just as informative. In either case,
if you know what __exit__ means, you'll know what you did wrong. And
if you don't know what it means, you'll have to look it up anyway. And
searching for ProtocolError doesn't do you any good -- you'll have to
learn about what __exit__ is and where it is required.
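For readers who want to see what such a failure looks like in practice, here is a small sketch. It assumes a current Python 3 interpreter rather than the 2005 draft of PEP 343; as far as I know, releases before 3.11 report an AttributeError here and later ones a TypeError, so the sketch accepts either. `NoProtocol` is a made-up class.

```python
class NoProtocol:
    """Has neither __enter__ nor __exit__."""


try:
    with NoProtocol():
        pass
except (AttributeError, TypeError) as exc:
    error_kind = type(exc).__name__
    message = str(exc)

print(error_kind, "-", message)
```

Either way, the message names the missing piece of the protocol, which is the information the discussion above is about.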

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


[Python-Dev] PEP 343 - multiple context managers in one statement

2005-10-25 Thread Paul Moore
I have a deep suspicion that this has been done to death already, but
my searching ability isn't up to finding the reference. So I'll simply
ask the question, and not offer a long discussion:

Has the option of letting the with statement admit multiple context
managers been considered (and presumably rejected)?

I'm thinking of

    with expr1, expr2, expr3:
        # whatever

In some ways, this doesn't even need an extension to the PEP - giving
tuples suitable __enter__ and __exit__ methods would do it. Or, I
suppose a user-defined manager which combined a list of others:

    class combining:
        def __init__(self, *mgrs):
            self.mgrs = mgrs
        def __with__(self):
            return self
        def __enter__(self):
            return tuple(mgr.__enter__() for mgr in self.mgrs)
        def __exit__(self, type, value, tb):
            # first in, last out
            for mgr in reversed(self.mgrs):
                mgr.__exit__(type, value, tb)

Would that be worth using as an example in the PEP?

Sorry - it got a bit long anyway...

Paul.

PS The signature of __with__ in example 4 in the PEP is wrong - it has
an incorrect "lock" parameter.


Re: [Python-Dev] AST branch is in?

2005-10-25 Thread Frank Wierzbicki
On 10/20/05, Neal Norwitz <[EMAIL PROTECTED]> wrote:
> The Grammar is (was at one point at least) shared between Jython and
> would allow more tools to be able to share infrastructure.  The idea
> is to eventually be able to have [JP]ython output the same AST to
> tools.

Hello Python-dev,

My name is Frank Wierzbicki and I'm working on the Jython project.  Does
anyone on this list know more about the history of this Grammar sharing
between the two projects?  I've heard about some Grammar sharing between
Jython and Python, and I've noticed that (most of) the jython code in
/org/python/parser/ast is commented "Autogenerated AST node".  I would
definitely like to look at (eventually) coordinating with this effort.

I've cross-posted to the Jython-dev list in case someone there has some insight.

Thanks,

Frank


Re: [Python-Dev] AST branch is in?

2005-10-25 Thread Guido van Rossum
On 10/25/05, Frank Wierzbicki <[EMAIL PROTECTED]> wrote:
>  My name is Frank Wierzbicki and I'm working on the Jython project.  Does
> anyone on this list know more about the history of this Grammar sharing
> between the two projects?  I've heard about some Grammar sharing between
> Jython and Python, and I've noticed that (most of) the jython code in
> /org/python/parser/ast is commented "Autogenerated AST node".  I would
> definitely like to look at (eventually) coordinating with this effort.
>
>  I've cross-posted to the Jython-dev list in case someone there has some
> insight.

Your best bet is to track down Jim Hugunin and see if he remembers.
He's jimhug at microsoft.com or jim at hugunin.net.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Bill Janssen
I think he was more interested in the invariant Martin proposed, that

 len("\U0001")

should always be the same and should always be 1.

Bill


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Guido van Rossum
On 10/25/05, Bill Janssen <[EMAIL PROTECTED]> wrote:
> I think he was more interested in the invariant Martin proposed, that
>
>  len("\U0001")
>
> should always be the same and should always be 1.

Yes but why? What does this invariant do for him?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] [Jython-dev] Re: AST branch is in?

2005-10-25 Thread Samuele Pedroni
Frank Wierzbicki wrote:
> On 10/20/05, *Neal Norwitz* <[EMAIL PROTECTED]> wrote:
> 
> The Grammar is (was at one point at least) shared between Jython and
> would allow more tools to be able to share infrastructure.  The idea
> is to eventually be able to have [JP]ython output the same AST to
> tools.
> 
> 
> Hello Python-dev,
> 
> My name is Frank Wierzbicki and I'm working on the Jython project.  Does 
> anyone on this list know more about the history of this Grammar sharing 
> between the two projects?  I've heard about some Grammar sharing between 
> Jython and Python, and I've noticed that (most of) the jython code in 
> /org/python/parser/ast is commented "Autogenerated AST node".  I would 
> definitely like to look at (eventually) coordinating with this effort.
> 
> I've cross-posted to the Jython-dev list in case someone there has some 
> insight.

As far as I understand, the Python trunk now contains generated C code
for the AST representation, created by the asdl_c.py script from an
updated Python.asdl. These files live in

http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Parser/

A parallel asdl_java.py existed in the Python CVS sandbox (where the
AST effort started). It was last updated when Jython's own AST classes
were generated from the then-current version of Python.asdl (I did this,
if I remember correctly, at some point in the Jython 2.2 evolution, I
think when the PyDev guys wanted a more up-to-date Jython parser to
reuse):

http://cvs.sourceforge.net/viewcvs.py/*checkout*/python/python/nondist/sandbox/ast/asdl_java.py?content-type=text%2Fplain&rev=1.7

Basically, the new Python.asdl needs to be used, asdl_java.py may need
to be updated, and our compiler changed as necessary.
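To give a feel for what the generated node classes look like from the Python side, here is a minimal sketch using the `ast` module of current CPython, whose node types come from Python.asdl. This shows today's standard library, not the 2005 trunk or Jython's Java classes, and the source string is just an example.

```python
import ast

# Parse a one-line module and look at the node for the assignment
tree = ast.parse("total = price * 2")
assign = tree.body[0]

print(type(assign).__name__)   # node class for the statement
print(assign._fields)          # field names declared for that node type
print(ast.dump(assign.value))  # the right-hand side as a nested node
```

A tool that consumes this kind of tree only depends on the node names and fields, which is why sharing one ASDL description between implementations is attractive.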

regards.


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Josiah Carlson wrote:
> It seems that removing this restriction may cause serious issues, at
> least in the case when using cyrillic characters in names.  See recent
> security issues in regards to web addresses in web browsers for the
> confusion (and/or name errors) that could result in their use.

That impression is deceiving. We are talking about source code here;
people type in identifiers explicitly rather than receiving them
through linking, and they scope identifiers (by module or object).

If somebody manages to get look-alike identifiers into your Python
libraries, you have bigger problems than these look-alikes: anybody
capable of doing so could just as well replace the real thing in
the first place.

As always in computer security: define your threat model before
reasoning about the risks.

Regards,
Martin


Re: [Python-Dev] AST branch is in?

2005-10-25 Thread Samuele Pedroni
Guido van Rossum wrote:
> On 10/25/05, Frank Wierzbicki <[EMAIL PROTECTED]> wrote:
> 
>> My name is Frank Wierzbicki and I'm working on the Jython project.  Does
>>anyone on this list know more about the history of this Grammar sharing
>>between the two projects?  I've heard about some Grammar sharing between
>>Jython and Python, and I've noticed that (most of) the jython code in
>>/org/python/parser/ast is commented "Autogenerated AST node".  I would
>>definitely like to look at (eventually) coordinating with this effort.
>>
>> I've cross-posted to the Jython-dev list in case someone there has some
>>insight.
> 
> 
> Your best bet is to track down Jim Hugunin and see if he remembers.
> He's jimhug at microsoft.com or jim at hugunin.net.
> 

No, this is all after Jim. It is indeed an effort derived from CPython's
own AST effort; it's just that we started using it quite a while ago.
Jim was no longer involved with Jython by then; Finn Bock started this.


Re: [Python-Dev] AST branch is in?

2005-10-25 Thread Guido van Rossum
On 10/25/05, Samuele Pedroni <[EMAIL PROTECTED]> wrote:
> > Your best bet is to track down Jim Hugunin and see if he remembers.
> > He's jimhug at microsoft.com or jim at hugunin.net.

> no. this is all after Jim, its indeed a derived effort from the CPython
> own AST effort, just that we started using it quite a while ago.
> This is all after Jim was not involved with Jython anymore, Finn Bock
> started this.

Oops! Sorry for the misinformation. Shows how much I know. :(

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Guido van Rossum wrote:
> Yes but why? What does this invariant do for him?

I don't know about this person, but there are a few things that
don't work properly in UTF-16 mode:

- the Unicode character database fails to look things up.
   u"\U0001D670".isupper() gives False, but should give True
   (since it denotes MATHEMATICAL MONOSPACE CAPITAL A).
   It gives True in UCS-4 mode.
- As a result, normalization on these doesn't work, either.
   It should normalize to "LATIN CAPITAL LETTER A" under
   NFKC, but doesn't.
- regular expressions only have limited support. In
   particular, adding non-BMP characters to character classes
   is not possible. [\U0001D670] will match any character
   that is either \uD835 or \uDE70, whereas it only matches
   MATHEMATICAL MONOSPACE CAPITAL A in UCS-4 mode.
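The three points in the list above can be checked with a short sketch. It assumes Python 3.3 or later, where strings no longer have separate narrow and wide builds, so it shows the behaviour described for UCS-4 mode; the surrogate pair is obtained by encoding to UTF-16 explicitly.

```python
import re
import struct
import unicodedata

ch = "\U0001D670"

print(unicodedata.name(ch))
print(len(ch))                             # one code point
print(ch.isupper())                        # character database lookup
print(unicodedata.normalize("NFKC", ch))   # compatibility normalization

# The two UTF-16 code units a narrow build would have stored
units = struct.unpack("<2H", ch.encode("utf-16-le"))
print([hex(u) for u in units])

# A character class containing the non-BMP character
pattern = re.compile("[\U0001D670]")
matches_char = pattern.fullmatch(ch) is not None
matches_half = pattern.fullmatch("\ud835") is not None
print(matches_char, matches_half)
```

The two code units are the \uD835 and \uDE70 mentioned in the message; here the character class matches the full character and not a lone half of the pair.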

There might be more limitations, but those are the ones that
come to mind easily. While I could imagine fixing the first
two with some effort, the third one is really tricky (unless
you would accept a "wide" representation of a character
class even if the Unicode representation is only narrow).

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Josiah Carlson

"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> 
> Josiah Carlson wrote:
> > It seems that removing this restriction may cause serious issues, at
> > least in the case when using cyrillic characters in names.  See recent
> > security issues in regards to web addresses in web browsers for the
> > confusion (and/or name errors) that could result in their use.
> 
> That impression is deceiving. We are talking about source code here;
> people type in identifiers explicitly rather than receiving them
> through linking, and they scope identifiers (by module or object).
> 
> If somebody manages to get look-alike identifiers into your Python
> libraries, you have bigger problems than these look-alikes: anybody
> capable of doing so could just as well replace the real thing in
> the first place.
> 
> As always in computer security: define your threat model before
> reasoning about the risks.

I should have been more explicit.  I did not mean to imply that I was
concerned about the security implications of inserting arbitrary
identifiers in Python (I was mentioning the web browser case for
an example of how such characters have been confusing previously), I am
concerned about confusion involved with using:
Greek Capital: Alpha, Beta, Epsilon, Zeta, Eta, Iota, Kappa, Mu, Nu,
Omicron, Rho, and Tau.
Cyrillic Capital: Dze, Je, A, Ve, Ie, Em, En, O, Er, Es, Te, Ha, ...

And how users could say, "name error? But I typed in window.draw(PEN) as
I was told to, and it didn't work!"
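The scenario can be reproduced as a sketch in a Python 3 interpreter, where non-ASCII identifiers are accepted. The `namespace` dictionary and the name `PEN` are taken from the example above purely for illustration; the lookalike replaces the Latin E with CYRILLIC CAPITAL LETTER IE.

```python
import unicodedata

namespace = {"PEN": "latin pen"}

latin = "PEN"
lookalike = "P\u0415N"   # middle letter is Cyrillic, not Latin

# Per-character names show the difference that the glyphs hide
names = [unicodedata.name(c) for c in lookalike]

try:
    eval(lookalike, namespace)
    outcome = "found"
except NameError:
    outcome = "NameError"

print(latin == lookalike, outcome)
print(names)
```

The two spellings compare unequal and the lookup fails, even though most fonts render them identically.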


Identically drawn glyphs are a problem, and pretending that they aren't
a problem, doesn't make it so.  Right now, all possible name glyphs are
visually distinct, which would not be the case if any unicode character
could be used as a name (except for numerals).  Speaking of which, would
we then be offering support for arabic/indic numeric literals, and/or
support it in int()/float()?  Ideally I would like to say yes, but I
could see the confusion if such were allowed.
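As a point of reference for the int()/float() question, this sketch shows what a current Python 3 interpreter does; it says nothing about what was decided in this thread. The digit strings are arbitrary examples.

```python
arabic_indic = "\u0661\u0662\u0663"   # ARABIC-INDIC DIGIT ONE, TWO, THREE
devanagari = "\u0967\u0968\u0969"     # DEVANAGARI DIGIT ONE, TWO, THREE

# The conversion functions accept Unicode decimal digits
as_int = int(arabic_indic)
as_float = float(devanagari)

# The same digits are not accepted as a numeric literal in source code
try:
    compile(arabic_indic, "<example>", "eval")
    literal_ok = True
except SyntaxError:
    literal_ok = False

print(as_int, as_float, literal_ok)
```

So conversion of text input and the grammar of literals are treated separately: the former is permissive, the latter stays ASCII.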

 - Josiah



Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

2005-10-25 Thread Phil Thompson
On Monday 24 October 2005 7:39 pm, Guido van Rossum wrote:
> On 10/24/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> > Guido van Rossum wrote:
> > > A concern I'd have with fixing this is that Unicode objects also
> > > support the buffer API. In any situation where either str or unicode
> > > is accepted I'd be reluctant to guess whether a buffer object was
> > > meant to be str-like or Unicode-like. I think this covers all the
> > > cases you mention here.
> >
> > This situation is a little better than that: the buffer
> > interface has a slot called getcharbuffer which is what
> > the string methods use in case they find that a string
> > argument is not of type str or unicode.
>
> I stand corrected!
>
> > As a first step, I'd suggest implementing the getcharbuffer
> > slot. That will already go a long way.
>
> Phil, if anything still doesn't work after doing what Marc-Andre says,
> those would be good candidates for fixes!

The patch is now on SF, #1337876.

Phil


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Josiah Carlson wrote:
> "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> 
>>Fredrik Lundh wrote:
>>
>>>however, for Python 3000, it would be nice if the source-code encoding
>>>applied to the *entire* file (XML-style), rather than just unicode string
>>>literals and (hopefully) comments and docstrings.
>>
>>As MAL explains, the encoding currently does apply to the entire file.
>>However, because of the Python syntax, you are restricted to ASCII
>>in many places, such as keywords, number literals, and (unfortunately)
>>identifiers. Lifting the restriction on identifiers is on my agenda.
> 
> 
> It seems that removing this restriction may cause serious issues, at
> least in the case when using cyrillic characters in names.  See recent
> security issues in regards to web addresses in web browsers for the
> confusion (and/or name errors) that could result in their use.
> 
> While I agree in principle that people should be able to use the
> entirety of one's own natural language in writing software in
> programming languages, I think that it is an ugly can of worms that
> perhaps shouldn't be opened.

I agree with Josiah.

A few years ago we had a discussion about this on python-dev
and agreed to stick with ASCII identifiers for Python. I still
think that's the right way to go.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Guido van Rossum
On 10/25/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> Identically drawn glyphs are a problem, and pretending that they aren't
> a problem, doesn't make it so.  Right now, all possible name glyphs are
> visually distinct, which would not be the case if any unicode character
> could be used as a name (except for numerals).  Speaking of which, would
> we then be offering support for arabic/indic numeric literals, and/or
> support it in int()/float()?  Ideally I would like to say yes, but I
> could see the confusion if such were allowed.

This problem isn't new. There are plenty of fonts where 1 and l are
hard to distinguish, or l and I for that matter, or O and 0.

Yes, we need better tools to diagnose this.

No, we shouldn't let this stop us from adding such a feature if it is
otherwise a good feature.
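One possible shape for such a diagnostic tool is sketched below. It is a rough heuristic, not an established algorithm: it guesses the script of each letter from the first word of its Unicode character name, and the function names `scripts_in` and `looks_suspicious` are invented for this example.

```python
import unicodedata


def scripts_in(identifier):
    """Rough guess at the scripts used: first word of each letter's name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }


def looks_suspicious(identifier):
    """Flag identifiers that mix letters from more than one script."""
    return len(scripts_in(identifier)) > 1


for name in ["PEN", "P\u0395N", "draw_pen1"]:
    print(ascii(name), sorted(scripts_in(name)), looks_suspicious(name))
```

A linter built along these lines would flag only mixed-script names, so an identifier written entirely in Greek or Cyrillic would pass.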

I'm not so sure about this for other reasons -- it hampers code
sharing, and as soon as you add right-to-left character sets to the
mix (or top-to-bottom, for that matter), displaying source code is
going to be near impossible for most tools (since the keywords and
standard library module names will still be in the Latin alphabet).
This actually seems a killer even for allowing Unicode in comments,
which I'd otherwise favor. What do Unicode-aware apps generally do
with right-to-left characters?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Josiah Carlson wrote:
> And how users could say, "name error? But I typed in window.draw(PEN) as
> I was told to, and it didn't work!"

Ah, so the "serious issues" you are talking about are not security 
issues, but usability issues.

I don't think extending the range of acceptable characters will
cause any additional confusion. Users are already getting "surprising"
NameErrors/AttributeErrors in the following cases:
- they just misspell the identifier, and then, when the error message
   is printed, fail to recognize the difference, as they read over the
   typo just like they read over it when mistyping it in the first place.

- they run into confusions with different things having the same names
   in different contexts. For example, they wonder why they get TypeError
   for passing the wrong number of arguments to a function, when the
   call matches exactly what the source code in front of them tells
   them - only that they were calling a different function which just
   happened to have the same name.

In the light of these common mistakes, your example with an identifier
named PEN, where the "P" might be a cyrillic letter or the E a greek one
is just made up: For window.draw, people will readily understand that
they are supposed to use Latin letters. More generally, they will know
what script to use just from looking at the identifier.

> Identically drawn glyphs are a problem, and pretending that they aren't
> a problem, doesn't make it so.  Right now, all possible name glyphs are
> visually distinct

Not at all: Just compare Fool and Foo1 (and perhaps FooI)


In the font in which I'm typing this, these are slightly different - but
there are fonts in which the difference is really difficult to
recognize.

> Speaking of which, would
> we then be offering support for arabic/indic numeric literals, and/or
> support it in int()/float()?

No. None of the Arabic users have ever requested such a feature, so
it would be stupid to provide it. We provide extended identifiers not
for the fun of it, but because users are requesting them.

Regards,
Martin


Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Nick Coghlan
Guido van Rossum wrote:
> On 10/25/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
>> Almost there - this is the only issue I have left on my list :)
> [...]
>>> Why are you so keen on TypeError? I find AttributeError totally
>>> appropriate. I don't see symmetry with for-loops as a valuable
>>> property here. AttributeError and TypeError are often interchangeable
>>> anyway.
>> The reason I'm keen on TypeError is because 'abstract.c' uses it consistently
>> when it fails to find a method to support a requested protocol.
> 
> Hm. abstract.c well predates the new type system. Slots and methods
> weren't really unified back then, so TypeError made obvious sense at
> the time.

Ah, I hadn't considered that, because I never made significant use of any 
Python versions before 2.2.

Maybe there's a design principle in there somewhere:

   Failed duck-typing -> AttributeError (or TypeError for complex checks)
   Failed instance or subtype check -> TypeError

Most of the functions in abstract.c handle complex protocols, so a simple 
attribute error wouldn't convey the necessary meaning. The context protocol, 
on the other hand, is fairly simple, and an AttributeError tells you 
everything you really need to know.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
M.-A. Lemburg wrote:
> A few years ago we had a discussion about this on python-dev
> and agreed to stick with ASCII identifiers for Python. I still
> think that's the right way to go.

I don't think there ever was such an agreement.

Regards,
Martin



Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Guido van Rossum
On 10/25/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Maybe there's a design principle in there somewhere:
>
>    Failed duck-typing -> AttributeError (or TypeError for complex checks)
>    Failed instance or subtype check -> TypeError

Doesn't convince me.

If there are principles at work here (and not just coincidences), they
are (a) don't lightly replace an exception by another, and (b) don't
raise AttributeError; the getattr operation raises it for you. (a) says
that we should let the AttributeError bubble up in the case of the
with-statement; (b) explains why you see TypeError when a slot isn't
filled.

> Most of the functions in abstract.c handle complex protocols, so a simple
> attribute error wouldn't convey the necessary meaning. The context protocol,
> on the other hand, is fairly simple, and an AttributeError tells you
> everything you really need to know.

That's what I've been saying all the time. :-)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Guido van Rossum wrote:
> This actually seems a killer even for allowing Unicode in comments,
> which I'd otherwise favor. What do Unicode-aware apps generally do
> with right-to-left characters?

The Unicode standard has an elaborate definition of what should happen.
There are many rules to it, but essentially, there is the notion of a
"primary" direction, which then is toggled based on the directionality
of each character (unicodedata.bidirectional). There are also formatting
characters which toggle the direction.
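The per-character directionality mentioned above can be inspected directly. This is a small sketch in Python 3; the sample text is the Hebrew spelling of "Python" followed by a Latin fragment, and U+202E is used only as an example of a formatting character.

```python
import unicodedata

# Six Hebrew letters, a space, and a parenthesised Latin word
text = "\u05e4\u05d9\u05d9\u05ea\u05d5\u05df (Python)"

# Bidirectional class of every character, in logical (storage) order
classes = [unicodedata.bidirectional(ch) for ch in text]
print(classes)

# A formatting character that forces a direction: RIGHT-TO-LEFT OVERRIDE
override_class = unicodedata.bidirectional("\u202e")
print(override_class)
```

The string is stored in logical order; a renderer that implements the Unicode rules uses these classes (R for the Hebrew letters, L for the Latin ones, neutral classes for the space and parentheses) to decide the visual order.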

This aspect of rendering is often not implemented, though. Web browsers
do it correctly, see

http://he.wikipedia.org/wiki/Python

where all text should come out right-adjusted, yet the Latin fragments
are still left to right (such as "Guido van Rossum")

Integrating it into this text looks like this: פייתון (Python).

GUI frameworks sometimes do it correctly, sometimes don't; most
notably, Tk has no good support for RTL text.

Regards,
Martin



Re: [Python-Dev] PEP 343 - multiple context managers in one statement

2005-10-25 Thread Nick Coghlan
Paul Moore wrote:
> I have a deep suspicion that this has been done to death already, but
> my searching ability isn't up to finding the reference. So I'll simply
> ask the question, and not offer a long discussion:
> 
> Has the option of letting the with statement admit multiple context
> managers been considered (and presumably rejected)?
> 
> I'm thinking of
> 
>     with expr1, expr2, expr3:
>         # whatever

Not rejected - deliberately left as a future option (this is the reason why 
the RHS of an as clause has to be parenthesised if you want tuple unpacking).
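The parenthesised form, and the multi-manager form it leaves room for, can be sketched as follows. This assumes Python 2.7 or 3.1 and later, where the with statement does accept several managers; `point` is a hypothetical manager whose __enter__ returns a 2-tuple.

```python
from contextlib import contextmanager


@contextmanager
def point():
    # Hypothetical manager that yields a 2-tuple as its "as" value
    yield (3, 4)


# Tuple unpacking of a single manager's value needs parentheses
with point() as (x, y):
    unpacked = x + y

# Without parentheses, the comma separates two managers
with point() as p, point() as q:
    pair_of_tuples = (p, q)

print(unpacked, pair_of_tuples)
```

Because the unparenthesised comma was never given a meaning as tuple unpacking, it could later be used to separate context managers without breaking existing code.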

> In some ways, this doesn't even need an extension to the PEP - giving
> tuples suitable __enter__ and __exit__ methods would do it. Or, I
> suppose a user-defined manager which combined a list of others:
> 
>     class combining:
>         def __init__(self, *mgrs):
>             self.mgrs = mgrs
>         def __with__(self):
>             return self
>         def __enter__(self):
>             return tuple(mgr.__enter__() for mgr in self.mgrs)
>         def __exit__(self, type, value, tb):
>             # first in, last out
>             for mgr in reversed(self.mgrs):
>                 mgr.__exit__(type, value, tb)
> 
> Would that be worth using as an example in the PEP?

The issue with that implementation is that the semantics are wrong - it 
doesn't actually mirror *nested* with statements. If one of the later 
__enter__ methods, or one of the first-executed __exit__ methods throws an 
exception, there are a lot of __exit__ methods that get skipped.
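The difference can be demonstrated with a small sketch in present-day Python 3 (so without the draft `__with__` method). `Tracked` is a made-up manager that logs its calls, and `Combining` follows the quoted class; the second manager is made to fail on entry.

```python
class Tracked:
    """Toy manager that records enter/exit calls in a shared log."""

    def __init__(self, name, log, fail_on_enter=False):
        self.name = name
        self.log = log
        self.fail_on_enter = fail_on_enter

    def __enter__(self):
        if self.fail_on_enter:
            raise RuntimeError(self.name + " failed to enter")
        self.log.append("enter " + self.name)
        return self.name

    def __exit__(self, exc_type, exc, tb):
        self.log.append("exit " + self.name)
        return False


class Combining:
    """The naive combination: no cleanup if a later __enter__ fails."""

    def __init__(self, *mgrs):
        self.mgrs = mgrs

    def __enter__(self):
        return tuple(mgr.__enter__() for mgr in self.mgrs)

    def __exit__(self, exc_type, exc, tb):
        for mgr in reversed(self.mgrs):
            mgr.__exit__(exc_type, exc, tb)


def run_combined():
    log = []
    try:
        with Combining(Tracked("a", log), Tracked("b", log, fail_on_enter=True)):
            log.append("body")
    except RuntimeError:
        pass
    return log


def run_nested():
    log = []
    try:
        with Tracked("a", log):
            with Tracked("b", log, fail_on_enter=True):
                log.append("body")
    except RuntimeError:
        pass
    return log


combined_log = run_combined()
nested_log = run_nested()
print(combined_log)
print(nested_log)
```

In the combined case manager "a" is entered but never exited, because the with statement does not call __exit__ when __enter__ itself raises; the nested statements do exit "a".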

Getting it right is more complicated (and this probably still has mistakes):

  import sys
  from collections import deque

  class nested(object):
      def __init__(self, *mgrs):
          self.mgrs = mgrs
          self.entered = None

      def __context__(self):
          return self

      def __enter__(self):
          self.entered = deque()
          vars = []
          try:
              for mgr in self.mgrs:
                  var = mgr.__enter__()
                  # Newest manager goes to the front, so iterating
                  # self.entered later exits in reverse order
                  self.entered.appendleft(mgr)
                  vars.append(var)
          except:
              self.__exit__(*sys.exc_info())
              raise
          return vars

      def __exit__(self, *exc_info):
          # first in, last out
          # Behave like nested with statements
          ex = exc_info
          for mgr in self.entered:
              try:
                  mgr.__exit__(*ex)
              except:
                  # A failing __exit__ replaces the exception in flight
                  ex = sys.exc_info()
          if ex is not exc_info:
              raise ex[0], ex[1], ex[2]
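For comparison, current Python 3 ships a helper that covers this use case, `contextlib.ExitStack`. The sketch below is not the code from this thread: it relies on that standard-library class, and the `tracked` manager and `enter_all` helper are invented for the demonstration.

```python
from contextlib import ExitStack, contextmanager

events = []


@contextmanager
def tracked(name, fail_on_enter=False):
    # Hypothetical manager that logs when it is entered and exited
    if fail_on_enter:
        raise RuntimeError(name + " failed to enter")
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)


def enter_all(stack, managers):
    """Enter each manager in turn; the stack remembers what to unwind."""
    return [stack.enter_context(mgr) for mgr in managers]


# Normal case: exits happen in reverse order of entry
with ExitStack() as stack:
    values = enter_all(stack, [tracked("a"), tracked("b")])
    events.append("body")
normal_events = list(events)

# Failure while entering the second manager: the first is still exited
events.clear()
try:
    with ExitStack() as stack:
        enter_all(stack, [tracked("a"), tracked("b", fail_on_enter=True)])
except RuntimeError:
    pass
failure_events = list(events)

print(normal_events)
print(failure_events)
```

The stack records each manager only after its __enter__ succeeds, which is the same bookkeeping the deque in the class above performs by hand.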

> PS The signature of __with__ in example 4 in the PEP is wrong - it has
> an incorrect "lock" parameter.

Thanks - I'll fix that when I incorporate the resolutions of the open issues 
(which will be post the SVN migration).

Cheers,
Nick.


-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Josiah Carlson

"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> 
> Josiah Carlson wrote:
> > And how users could say, "name error? But I typed in window.draw(PEN) as
> > I was told to, and it didn't work!"
> 
> Ah, so the "serious issues" you are talking about are not security 
> issues, but usability issues.

Indeed, it was a misunderstanding, as the email stated:
I did not mean to imply that I was concerned about the security
implications of inserting arbitrary identifiers in Python (I was
mentioning the web browser case for an example of how such
characters have been confusing previously), I am concerned about
confusion involved with using: [glyphs which are identical]


> I don't think extending the range of acceptable characters will
> cause any additional confusion. Users are already getting "surprising"
> NameErrors/AttributeErrors in the following cases:
> - they just misspell the identifier, and then, when the error message
>is printed, fail to recognize the difference, as they read over the
>typo just like they read over it when mistyping it in the first place.

In this case it's not just a misreading, the characters look identical! 
When is an 'E' not an 'E'?  When it is an Epsilon or Ie.  Saying what
characters will or will not be used as identifiers, when those
characters are keys on a keyboard of a specific type, is pretty
presumptuous.


> - they run into confusions with different things having the same names
>in different contexts. For example, they wonder why they get TypeError
>for passing the wrong number of arguments to a function, when the
>call matches exactly what the source code in front of them tells
>them - only that they were calling a different function which just
>happened to have the same name.

Right, and users should be reading the documentation for the functions
and methods they are calling.


> In the light of these common mistakes, your example with an identifier
> named PEN, where the "P" might be a cyrillic letter or the E a greek one
> is just made up: For window.draw, people will readily understand that
> they are supposed to use Latin letters. More generally, they will know
> what script to use just from looking at the identifier.

Sure, that example was made up, but there are words which have been
stolen from various languages by english, and you are discounting the
case of single-letter temporary variables.  Saying what will and won't
happen over the course of using unicode identifiers is quite the
prediction.


> > Identically drawn glyphs are a problem, and pretending that they aren't
> > a problem, doesn't make it so.  Right now, all possible name glyphs are
> > visually distinct
> 
> Not at all: Just compare Fool and Foo1 (and perhaps FooI)
> 
> In the font in which I'm typing this, these are slightly different - but
> there are fonts in which the difference is really difficult to
> recognize.

Indeed, they are similar, but _different_ in my font as well.  The trick
is that the glyphs are not different in the case of certain greek or
cyrillic letters.  They don't just /look/ similar they /are identical/.

 - Josiah



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Guido van Rossum
On 10/25/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> Indeed, they are similar, but _different_ in my font as well.  The trick
> is that the glyphs are not different in the case of certain greek or
> cyrillic letters.  They don't just /look/ similar they /are identical/.

Well, in the font I'm using to read this email, I and l are /identical/.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Josiah Carlson

Guido van Rossum <[EMAIL PROTECTED]> wrote:
> 
> On 10/25/05, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> > Indeed, they are similar, but _different_ in my font as well.  The trick
> > is that the glyphs are not different in the case of certain greek or
> > cyrillic letters.  They don't just /look/ similar they /are identical/.
> 
> Well, in the font I'm using to read this email, I and l are /identical/.

In all fonts I've seen, E/Epsilon/Ie are /always identical/.

 - Josiah



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Neil Hodgson
Martin v. Löwis:

> This aspect of rendering is often not implemented, though. Web browsers
> do it correctly, see
> ...
> GUI frameworks sometimes do it correctly, sometimes don't; most
> notably, Tk has no good support for RTL text.

   Scintilla does a rough job with this. RTL text is displayed
correctly as the underlying platform libraries (Windows or GTK+/Pango)
handle this aspect when called to draw text. However, editing is not
handled correctly: the caret is misplaced within RTL text and there
are other visual glitches. There is interest in the area and even a
funding proposal this week.

   Neil


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Greg Ewing
Martin v. Löwis wrote:

> For window.draw, people will readily understand that
> they are supposed to use Latin letters. More generally, they will know
> what script to use just from looking at the identifier.

Would it help if an identifier were required to be
made up of letters from the same alphabet, e.g. all
Latin or all Greek or all Cyrillic, but not a mixture.
Then you'd get an immediate error if you accidentally
slipped in a letter from the wrong alphabet.
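
A rough sketch of what such a check could look like. Python's unicodedata module does not expose the Unicode Script property, so this guesses the script from the first word of each character's name (LATIN, GREEK, CYRILLIC, ...). That heuristic, the function names and the error text are assumptions made for illustration only, not a proposal for the actual lexer rule. Digits and underscores are skipped, so they can appear in any identifier.

```python
import unicodedata


def scripts_used(identifier):
    """Return the set of (guessed) scripts of the letters in identifier."""
    scripts = set()
    for ch in identifier:
        if not ch.isalpha():
            continue  # digits and "_" are allowed alongside any script
        name = unicodedata.name(ch, "")
        # Heuristic: the first word of the character name is the script.
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def check_single_script(identifier):
    """Raise SyntaxError if the letters come from more than one script."""
    scripts = scripts_used(identifier)
    if len(scripts) > 1:
        raise SyntaxError(
            "identifier %r mixes scripts: %s"
            % (identifier, ", ".join(sorted(scripts)))
        )


latin_pen = "PEN"
mixed_pen = "P\u0395N"  # middle letter is GREEK CAPITAL LETTER EPSILON

check_single_script(latin_pen)   # passes silently
try:
    check_single_script(mixed_pen)
    mixed_rejected = False
except SyntaxError:
    mixed_rejected = True
```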

Greg



[Python-Dev] make testall hanging on HEAD?

2005-10-25 Thread Anthony Baxter
At the moment, I see make testall hanging in test_timeout. In 
addition, test_curses is leaving the tty in a hosed state:

test_crypt
test_csv
test_curses
test_datetime
 test_dbm
 test_decimal
 test_decorators
test_deque
  test_descr

This is on Ubuntu Breezy, 
[GCC 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu9)] on linux2

Anyone else see this?

-- 
Anthony Baxter <[EMAIL PROTECTED]>
It's never too late to have a happy childhood.


Re: [Python-Dev] make testall hanging on HEAD?

2005-10-25 Thread jepler
ditto on the "curses" problem, but test_timeout completed just fine, at least
the first time around.

fedora core 4, x86_64
[GCC 4.0.1 20050727 (Red Hat 4.0.1-5)] on linux2

Jeff




Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Neil Hodgson
M.-A. Lemburg:

> You mean a slice that slices out the next  ?

   Yes.

> This sounds a lot like you'd want iterators for the various
> index types. Should be possible to implement on top of the
> proposed APIs, e.g. itergraphemes(u), itercodepoints(u), etc.

   Iterators may be helpful, but can also be too restrictive when the
processing is not completely iterative, such as peeking ahead or
looking behind to wrap at a word boundary in the display example.
There should be

   It was more that there may be less scope for error if there were a
move away from indexes to slices. The PEP provides ways to specify
what you want to examine or modify, but it looks to me like returning
indexes will lead to code repetition or additional variables, with an
increase in fragility.
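
   To make the indexes-versus-slices point concrete, here is a small sketch that hands back slices. It approximates a grapheme as a base code point followed by any combining marks (unicodedata.combining(ch) != 0), which is much cruder than real grapheme segmentation; the grapheme_slices name and that approximation are mine, not part of the proposed API.

```python
import unicodedata


def grapheme_slices(u):
    """Yield one slice object per (approximate) grapheme in u."""
    start = 0
    for i in range(1, len(u)):
        # A non-combining code point starts a new grapheme.
        if unicodedata.combining(u[i]) == 0:
            yield slice(start, i)
            start = i
    if u:
        yield slice(start, len(u))


text = "cafe\u0301!"  # "e" followed by COMBINING ACUTE ACCENT
slices = list(grapheme_slices(text))
pieces = [text[s] for s in slices]
```

Because the result is a list of slices, the caller can still peek ahead or look behind (slices[k + 1], slices[k - 1]) without recomputing any index arithmetic.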

> Note that what most people refer to as "character" is a
> grapheme in Unicode speak.

   A grapheme-oriented string type may be worthwhile although you'd
probably have to choose a particular normalisation form to ease
processing.

> Given that interpretation,
> "breaking" Unicode "characters" is something you won't
> ever work around with by using larger code units such
> as UCS4 compatible ones.

   I still think we can reduce the scope for errors.

> Furthermore, you should also note that surrogates (two
> code units encoding one code point) are part of Unicode
> life. While you don't need them when storing Unicode
> in UCS4 code units, they can still be part of the
> Unicode data and the programmer has to be aware of
> these.

   Many programmers can and will ignore surrogates. One day that may
bite them but we can't close off text processing to those who have no
idea of what surrogates are, or directional marks, or that sorting is
locale dependent, or have no understanding of the difference between
NFC and NFKD normalization forms.
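
   For readers who have not met these terms, a short sketch of what they look like in practice. It assumes a current Python 3, where a str is a sequence of code points; on the builds discussed in this thread the lengths can differ.

```python
import unicodedata

composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT, two code points

# The two spellings render the same but compare unequal until normalized.
same_before = composed == decomposed
nfc = unicodedata.normalize("NFC", decomposed)   # composes to U+00E9
nfd = unicodedata.normalize("NFD", composed)     # decomposes to e + U+0301

# Slicing the decomposed form silently drops the accent.
stripped = decomposed[:1]

# NFKD additionally folds compatibility characters such as ligatures;
# NFC leaves them alone.
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI
nfkd_lig = unicodedata.normalize("NFKD", ligature)
nfc_lig = unicodedata.normalize("NFC", ligature)

# A code point outside the BMP needs a surrogate pair in UTF-16:
clef = "\U0001D11E"      # MUSICAL SYMBOL G CLEF
utf16_units = len(clef.encode("utf-16-le")) // 2
```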

   Neil


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Josiah Carlson wrote:
> In this case it's not just a misreading, the characters look identical! 
> When is an 'E' not an 'E'?  When it is an Epsilon or Ie.  Saying what
> characters will or will not be used as identifiers, when those
> characters are keys on a keyboard of a specific type, is pretty
> presumptuous.

Why is that rude and disrespectful? I'm certainly respecting developers
who want to use their scripts for identifiers, or else I would not have
suggested that they could do so.

However, from the experience with my own language, and the three or so
foreign languages I know, I can tell you that people normally
don't mix identifiers of different scripts.

> Sure, that example was made up, but there are words which have been
> stolen from various languages by english, and you are discounting the
> case of single-letter temporary variables.  Saying what will and won't
> happen over the course of using unicode identifiers is quite the
> prediction.

Sure, people can make mistakes. They get an error, and then will
need to find the cause of the problem. Sometimes, this will be easy,
and sometimes, it will not.

> Indeed, they are similar, but _different_ in my font as well.  The trick
> is that the glyphs are not different in the case of certain greek or
> cyrillic letters.  They don't just /look/ similar they /are identical/.

This string: "EΕ" is the LATIN CAPITAL LETTER E, followed by the GREEK
CAPITAL LETTER EPSILON. In the font my email composer uses, the E is
slightly larger than the Epsilon - so there /is/ a visual difference.

But even if there isn't: if this was a frequent problem, the name
error could include an alternative representation (say, with Unicode
ordinals for non-ASCII characters) which would give an easy visual
clue.
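
A minimal sketch of such an alternative representation, assuming a current Python 3. The helper names and the exact wording of the message are invented for illustration; nothing here is what the interpreter actually prints.

```python
def ascii_repr(identifier):
    """Spell out non-ASCII characters as \\uXXXX escapes."""
    return identifier.encode("ascii", "backslashreplace").decode("ascii")


def describe_name_error(identifier):
    """Build a hypothetical NameError message with an ASCII-only hint."""
    plain = ascii_repr(identifier)
    if plain == identifier:
        return "name %r is not defined" % identifier
    return "name %r (%s) is not defined" % (identifier, plain)


latin = "PEN"
lookalike = "P\u0395N"  # GREEK CAPITAL LETTER EPSILON in the middle

message_latin = describe_name_error(latin)
message_lookalike = describe_name_error(lookalike)
```

The second message carries the escape \u0395 next to the name, so the stray Epsilon is visible even in a font where it is drawn exactly like a Latin E.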

I still doubt that this is a frequent problem, and I don't see any
better grounds for claiming that it is than for claiming that it
is not.

Regards,
Martin


Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Eric Nieuwland
Guido van Rossum wrote:

> [Eric "are all your pets called Eric?" Nieuwland]
>>>> Hmmm... Would it be reasonable to introduce a ProtocolError
>>>> exception?
>
> [Guido]
>>> And which perceived problem would that solve?
>
> [Eric]
>> It was meant to be a bit more informative about what is wrong.
>>
>> ProtocolError: lacks __enter__ or __exit__
>
> That's exactly what I'm trying to avoid. :)
>
> I find "AttributeError: __exit__" just as informative. In either case,
> if you know what __exit__ means, you'll know what you did wrong. And
> if you don't know what it means, you'll have to look it up anyway. And
> searching for ProtocolError doesn't do you any good -- you'll have to
> learn about what __exit__ is and where it is required.

I see. Then why don't we unify *Error into Error?
Just read the message and know what it means.
And we could then drop the burden of exception classes and only use the 
message.
A sense of deja-vu comes over me somehow ;-)



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Greg Ewing wrote:
> Would it help if an identifier were required to be
> made up of letters from the same alphabet, e.g. all
> Latin or all Greek or all Cyrillic, but not a mixture.
> Then you'd get an immediate error if you accidentally
> slipped in a letter from the wrong alphabet.

Not in the literal sense: you certainly want to allow
"latin" digits in, say, a cyrillic identifier. See

http://www.unicode.org/reports/tr31/

for what the Unicode consortium recommends to do.
In addition to the strict specification, they envision
usage guidelines. This seems Pythonic: just because
you could potentially shoot yourself in the foot doesn't
mean it should be banned from the language.

IOW, whether it would help largely depends on whether
the problem is real in the first place. Just because
you *can* come up with look-alike identifiers doesn't
mean that people will use them, or that they will mistake
the scripts (except for deliberately doing so, of
course).

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Stephen J. Turnbull
> "Josiah" == Josiah Carlson <[EMAIL PROTECTED]> writes:

Josiah> Indeed, they are similar, but _different_ in my font as
Josiah> well.  The trick is that the glyphs are not different in
Josiah> the case of certain greek or cyrillic letters.  They don't
Josiah> just /look/ similar they /are identical/.

But these problems are going to arise in _any_ multilingual context;
it's not at all specific to identifiers.  It's just that computers
lexing identifiers are kinda picky about those things compared to
humans.  I think you can reasonably classify it as a new breed of
typo, and develop UIs to deal with it in that way.

To handle cases where glyphs are (nearly) identical, UIs that visually
flag "foreign" characters, at least in contexts where cross-block
punning is unacceptable, will be developed, and users will learn to
pay attention to those flags.


-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
   Ask not how you can "do" free software business;
  ask what your business can "do for" free software.