Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Neil Hodgson wrote:
 M.-A. Lemburg:
 
 
Unicode has the concept of combining code points, e.g. you can
store an é (e with an accent) as e + a combining accent. Now if
you slice off the accent, you'll break the character that you
encoded using combining code points.
...
next_indextype(u, index) -> integer

Returns the Unicode object index for the start of the next
indextype found after u[index] or -1 in case no next element
of this type exists.
 
 
Should entity breakage be further discouraged by returning a slice
 here rather than an object index?

You mean a slice that slices out the next indextype ?

Something like:
 
 i = first_grapheme(u)
 x = 0
 while x < width and u[i] != "\n":
     x, _ = draw(u[i], (x, y))
     i = next_grapheme(u, i)

This sounds a lot like you'd want iterators for the various
index types. Should be possible to implement on top of the
proposed APIs, e.g. itergraphemes(u), itercodepoints(u), etc.
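
The iterator idea can be sketched in present-day Python. This is only an approximation under stated assumptions: the name itergraphemes comes from the message above, but the body is hypothetical, and it treats a grapheme as a base code point plus any combining marks that follow it (as reported by unicodedata.combining). Full Unicode grapheme-cluster segmentation has more rules than this.

```python
import unicodedata


def itergraphemes(u):
    """Yield approximate graphemes: a base code point plus trailing combining marks."""
    cluster = ""
    for ch in u:
        # A non-combining code point starts a new cluster once one is in progress.
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster


# "e" + COMBINING ACUTE ACCENT stays together; "a" is its own cluster.
clusters = list(itergraphemes("e\u0301a"))
```

Slicing on the boundaries this iterator produces would keep the accent attached to its base letter, which is the breakage the original message warns about.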

Note that what most people refer to as character is a
grapheme in Unicode speak. Given that interpretation,
breaking Unicode characters is something you won't
ever work around by using larger code units such
as UCS4 compatible ones.

Furthermore, you should also note that surrogates (two
code units encoding one code point) are part of Unicode
life. While you don't need them when storing Unicode
in UCS4 code units, they can still be part of the
Unicode data and the programmer has to be aware of
these.
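
As an illustration of "two code units encoding one code point", here is a minimal sketch of the UTF-16 surrogate arithmetic. The helper name to_surrogates is made up for this example, and the code point U+10400 is just a convenient value above U+FFFF.

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = to_surrogates(0x10400)

# The built-in encoder produces the same two 16-bit code units.
encoded = "\U00010400".encode("utf-16-be")
```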

Personally, I don't think that slicing Unicode is
such a big issue. If you know what you are doing,
things tend not to break - which is true for pretty
much everything you do in programming ;-)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 25 2005)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] AST branch is in?

2005-10-25 Thread Neil Schemenauer
Simon Burton [EMAIL PROTECTED] wrote:
 Is there a python interface ?

Not yet.

  Neil



Re: [Python-Dev] New codecs checked in

2005-10-25 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 
I just left them in because I thought they wouldn't do any harm
and might be useful in some applications.

Removing them where not directly needed by the codec would not
be a problem.
 
 
 I think memory usage caused is measurable (I estimated 4KiB per
 dictionary). More importantly, people apparently currently change
 the dictionaries we provide and expect the codecs to automatically
 pick up the modified mappings. It would be better if the breakage
 is explicit (i.e. they get an AttributeError on the variable) instead
 of implicit (their changes to the mapping simply have no effect
 anymore).

Agreed. I've already checked in the changes, BTW.

KOI8-U is not available as mapping on ftp.unicode.org and
I only recreated codecs from the mapping files available
there.
 
 
 I think we should come up with mapping tables for the additional
 codecs as well, and maintain them in the CVS. This also applies
 to things like rot13.

Agreed.

I'll rerun the creation with the above changes sometime this
week.
 
 
 I hope I can finish my encoding routine shortly, which again
 results in changes to the codecs (replacing the encoding dictionaries
 with other lookup tables).

Having seen the decode tables written as a long Unicode string,
I think that this may indeed also be a good solution for
encoding - the major improvement here is that the parser
and compiler will do the work of creating the table. At
module load time, the .pyc file will only contain a long
string which is very fast to create and load (unlike dictionaries
which are set up dynamically at load time).

In general, it's better to do all the work up-front when
creating the codecs, rather than having run-time code
repeat these tasks over and over again.
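
A small sketch of the string-table approach, written against present-day CPython. The table contents are hypothetical (Latin-1 with byte 0x80 remapped to the euro sign), chosen only to keep the example short. The codecs.charmap_build, codecs.charmap_decode and codecs.charmap_encode calls are the ones the generated single-byte codec modules in Lib/encodings use today, which may differ from what was checked in at the time of this thread.

```python
import codecs

# Hypothetical 256-entry decoding table: Latin-1, except byte 0x80 maps to the euro sign.
decoding_table = "".join(
    "\u20ac" if i == 0x80 else chr(i) for i in range(256)
)

# Built once, up front; the encoder reuses this lookup structure on every call.
encoding_table = codecs.charmap_build(decoding_table)

text, consumed = codecs.charmap_decode(b"\x80 5", "strict", decoding_table)
data, written = codecs.charmap_encode("\u20ac 5", "strict", encoding_table)
```

Because the table is a plain string constant, loading the module only has to unmarshal one object instead of populating a dictionary entry by entry.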

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-25 Thread Nick Coghlan
Josiah Carlson wrote:
 Nick Coghlan [EMAIL PROTECTED] wrote:
 I think having dicts and sets automatically invoke freeze would be a 
 mistake, 
 because at least one of the following two cases would behave unexpectedly:
 
 I'm pretty sure that the PEP was only asking if one would freeze the
 contents of dicts IF the dict was being frozen.
 
 That is, which of the following should be the case:
 freeze({1:[2,3,4]}) -> {1:[2,3,4]}
 freeze({1:[2,3,4]}) -> xdict(1=(2,3,4))

I believe the choices you intended are:
  freeze({1:[2,3,4]}) -> imdict(1=[2,3,4])
  freeze({1:[2,3,4]}) -> imdict(1=(2,3,4))

Regardless, that question makes a lot more sense (and looking at the PEP 
again, I realised I simply read it wrong the first time).

For containers where equality depends on the contents of the container (i.e., 
all the builtin ones), I don't see how it is possible to implement a sensible 
hash function without freezing the contents as well - otherwise your immutable 
isn't particularly immutable.

Consider what would happen if list __freeze__ simply returned a tuple 
version of itself - you have a __freeze__ method which returns a potentially 
unhashable object!
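
To make the point concrete, here is a minimal sketch of a recursive freeze. It is not the PEP 351 implementation: the function is hypothetical, there is no imdict type in Python, and a frozenset of (key, frozen value) pairs stands in for an immutable dict.

```python
def freeze(obj):
    """Return a hashable, deeply immutable stand-in for obj (illustrative only)."""
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(freeze(item) for item in obj)
    if isinstance(obj, dict):
        # Stand-in for an immutable dict: a frozenset of (key, frozen value) pairs.
        return frozenset((key, freeze(value)) for key, value in obj.items())
    hash(obj)  # raises TypeError for anything else that is unhashable
    return obj


deep = freeze({1: [2, 3, 4]})

# A shallow conversion is not enough: the tuple still contains a mutable list,
# so calling hash(shallow) would raise TypeError.
shallow = tuple([1, [2, 3]])
```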

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com


Re: [Python-Dev] AST branch is in?

2005-10-25 Thread A.M. Kuchling
On Tue, Oct 25, 2005 at 01:36:26PM +1000, Simon Burton wrote:
 Is there a python interface ?

Not yet, as far as I know.

FYI, all: please see the following weblog entry for a description of
the AST branch:  
http://www.amk.ca/diary/2005/10/the_ast_branch_lands_1

If I got anything wrong, please offer corrections in the comments for
that post.

--amk


[Python-Dev] Reminder: PyCon 2006 submissions due in a week

2005-10-25 Thread A.M. Kuchling
The submission deadline for PyCon 2006 is now a week away.  PyCon 2006
will be in Dallas, Texas, February 24-26 2006.

For 2006, I'd like to see more tutorial-style talks on the program.
This means that your talk doesn't have to be about something entirely
new; you can show how to use a particular language feature, standard
library module, examine some aspect of a Python implementation, or
compare the available libraries in an application domain.

For example, the most popular talk at 2005 was Michelle Levesque's
PyWeboff, which compared various web development tools.  The next most
popular (ignoring a few keynotes and the lightning talks) were Alex
Martelli's talks on iterators & generators, and on OOP.  Partly that's
because it's Alex, of course, but I think attendees want help in
deciding which tools are good/helpful/safe to use.

If you need an idea, http://wiki.python.org/moin/PyCon2005/Feedback
lists some topics that 2005's attendees were interested in.

CFP:
http://www.python.org/pycon/2006/cfp

Proposal submission site:
http://submit.python.org/

--amk


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Bengt Richter wrote:
 At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote:
 
Bengt Richter wrote:

Please bear with me for a few paragraphs ;-)

Please note that source code encoding doesn't really have
anything to do with the way the interpreter executes the
program - it's merely a way to tell the parser how to
convert string literals (currently only the Unicode ones)
into constant Unicode objects within the program text.
It's also a nice way to let other people know what kind of
encoding you used to write your comments ;-)

Nothing more.
 
 I think somehow I didn't make things clear, sorry ;-)
 As I tried to show in the example of module_a.cs vs module_b.cs,
 the source encoding currently results in two different str-type
 strings representing the source _character_ sequence, which is the
 _same_ in both cases. 

I don't follow you here. The source code encoding
is only applied to Unicode literals (you are using string
literals in your example). String literals are passed
through as-is.

Whether or not your editor will use the source
code encoding marker is really up to your editor
and not within the scope of Python.

If you open the two module files in Emacs, you'll
see identical renderings of the string literals.
With other editors, you may have to explicitly tell
the editor which encoding to assume. Ditto for shell
printouts.

 To make it more clear, try the following little
 program (untested except on NT4 with
 Python 2.4b1 (#56, Nov  3 2004, 01:47:27)
 [GCC 3.2.3 (mingw special 20030504-1)] on win32 ;-):
 
 ---- t_srcenc.py ----
 # Python 2 script: string literals here are byte strings.
 import os
 def test():
     # write two modules whose source text differs only in encoding
     open('module_a.py','wb').write(
         "# -*- coding: latin-1 -*-" + os.linesep +
         "cs = '\xfcber-cool'" + os.linesep)
     open('module_b.py','wb').write(
         "# -*- coding: utf-8 -*-" + os.linesep +
         "cs = '\xc3\xbcber-cool'" + os.linesep)
     # show that we have two modules differing only in encoding:
     print ''.join(line.decode('latin-1') for line in open('module_a.py'))
     print ''.join(line.decode('utf-8') for line in open('module_b.py'))
     # see how results are affected:
     import module_a, module_b
     print module_a.cs + ' =?= ' + module_b.cs
     print module_a.cs.decode('latin-1') + ' =?= ' + module_b.cs.decode('utf-8')
 
 if __name__ == '__main__':
     test()
 ---
 The result copied from NT4 console to clipboard and pasted into eudora:
 __
 
 [17:39] C:\pywk\python-dev>py24 t_srcenc.py
 # -*- coding: latin-1 -*-
 cs = 'über-cool'
 
 # -*- coding: utf-8 -*-
 cs = 'über-cool'
 
 nber-cool =?= ++ber-cool
 über-cool =?= über-cool
 __
 (I'd say NT did the best it could, rendering the copied cp437
 superscript n as the 'n' above, and the '++' coming from the
 cp437 box characters corresponding to the '\xc3\xbc'. Not sure
 how it will show on your screen, but try the program to see ;-)

Once a module is compiled, there's no distinction between
a module using the latin-1 source code encoding or one using
the utf-8 encoding.
 
 ISTM module_a.cs and module_b.cs can readily be distinguished after
 compilation, whereas the sources displayed according to their declared
 encodings as above (or as e.g. different editors using different native
 encoding might) cannot (other than the encoding cookie itself) ;-)
 Perhaps you meant something else?

What your editor displays to you is not within the scope
of Python, e.g. if you open the files in Emacs you'll see
something different than in Notepad.

I guess that's the price you have to pay for being able to write
programs that can include Unicode literals using the complete range
of possible Unicode characters without having to revert to
escapes.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Fredrik Lundh
M.-A. Lemburg wrote:

 I don't follow you here. The source code encoding
 is only applied to Unicode literals (you are using string
 literals in your example). String literals are passed
 through as-is.

however, for Python 3000, it would be nice if the source-code encoding applied
to the *entire* file (XML-style), rather than just unicode string literals and
(hopefully) comments and docstrings.

/F 





Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Fredrik Lundh wrote:
 M.-A. Lemburg wrote:
 
 
I don't follow you here. The source code encoding
is only applied to Unicode literals (you are using string
literals in your example). String literals are passed
through as-is.
 
 
 however, for Python 3000, it would be nice if the source-code encoding applied
 to the *entire* file (XML-style), rather than just unicode string literals
 and (hopefully) comments and docstrings.

Actually, the encoding is applied to the complete source file:
the file is transcoded into UTF-8 and then parsed by the
Python parser.

Unicode literals are then decoded from the UTF-8 into Unicode.
String literals are transcoded back into the source code encoding,
thus making the (rather long due to technical constraints) round-trip
source code encoding -> Unicode -> UTF-8 -> Unicode -> source code encoding.
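
The round-trip can be modelled in a few lines of present-day Python. This is only a sketch of the description above, not the interpreter's tokenizer code; the variable names and the sample line are assumptions made for the example.

```python
source_encoding = "latin-1"
source_bytes = b"cs = '\xfcber-cool'"  # one line as stored on disk in a latin-1 module

# Step 1: the whole file is decoded and re-encoded as UTF-8 for the parser.
utf8_bytes = source_bytes.decode(source_encoding).encode("utf-8")

# Step 2: a plain (byte) string literal is transcoded back to the source encoding,
# so the program ends up with the bytes that were in the file.
round_tripped = utf8_bytes.decode("utf-8").encode(source_encoding)
```

This is why the two modules in the earlier example end up with different byte strings: each literal comes back in its own file's encoding.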

Python 3k should have a fully Unicode based parser to reduce this
additional transcoding overhead.

Since Py3k will only have Unicode literals, the problems with
string literals will go away all by themselves :-)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] New codecs checked in

2005-10-25 Thread M.-A. Lemburg
M.-A. Lemburg wrote:
 Martin v. Löwis wrote:
 
M.-A. Lemburg wrote:



I had to create three custom mapping files for cp1140, koi8-u
and tis-620.


Can you please publish the files you have used somewhere? They
best go into the Python CVS.
 
 
 Sure; I'll check in the whole build machinery I'm using for this.

Done.

In order to rebuild the codecs, cd Tools/unicode; make
then check the codecs in the created build/ subdir (e.g.
using comparecodecs.py) and copy them over to the
Lib/encodings/ directory.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Nick Coghlan
Almost there - this is the only issue I have left on my list :)

Guido van Rossum wrote:
 On 10/24/05, Nick Coghlan [EMAIL PROTECTED] wrote:
 However, those resolutions bring up the following issues:

5 a. What exception is raised when EXPR does not have a __context__ 
 method?
  b.  What about when the returned object is missing __enter__ or 
 __exit__?
 I suggest raising TypeError in both cases, for symmetry with for loops.
 The slot check is made in C code, so I don't see any difficulty in 
 raising
 TypeError instead of AttributeError if the relevant slots aren't filled.
 
 Why are you so keen on TypeError? I find AttributeError totally
 appropriate. I don't see symmetry with for-loops as a valuable
 property here. AttributeError and TypeError are often interchangeable
 anyway.

The reason I'm keen on TypeError is because 'abstract.c' uses it consistently
when it fails to find a method to support a requested protocol.

None of the abstract object methods currently raise AttributeError, and this
property is fairly visible at the Python level because the abstract APIs are 
used to implement many of the bytecodes and various builtin functions. Both 
for loops and the iter function, for example, get their current exception 
behaviour from PyObject_GetIter and PyIter_Next.

Having had a look at mwh's patch, however, I've realised that going that way 
would only be possible if there were dedicated bytecodes for GET_CONTEXT, 
ENTER_CONTEXT and EXIT_CONTEXT (similar to the dedicated GET_ITER and FOR_ITER).

Leaving the exception as AttributeError means that level of bytecode hacking 
isn't necessary (mwh's patch just emits a fairly normal try/finally statement, 
although it still modifies the bytecode to include LOAD_EXIT_ARGS).

So, the inconsistency with other syntactic protocols still bothers me, but I 
can live with AttributeError if you don't want to add three new bytecodes just 
to support PEP 343.
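
The difference can be observed from Python code. The sketch below uses hypothetical helper names and only records which exception type is raised. iter() on a non-iterable raises TypeError; for the with statement the result is left unchecked here because it has varied between Python versions.

```python
def error_name(action):
    """Run action() and return the name of the exception type it raises, if any."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None


class NotAContextManager:
    """Deliberately lacks __enter__ and __exit__."""


def use_with():
    with NotAContextManager():
        pass


iter_error = error_name(lambda: iter(5))   # protocol failure via the abstract API
with_error = error_name(use_with)          # AttributeError or TypeError, by version
```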

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com


Re: [Python-Dev] New codecs checked in

2005-10-25 Thread Martin v. Löwis
M.-A. Lemburg wrote:

 Done.
 
 In order to rebuild the codecs, cd Tools/unicode; make
 then check the codecs in the created build/ subdir (e.g.
 using comparecodecs.py) and copy them over to the
 Lib/encodings/ directory.

Thanks!

Martin


Re: [Python-Dev] MinGW and libpython24.a

2005-10-25 Thread Martin v. Löwis
David Abrahams wrote:
 Is the instruction at
 http://www.python.org/dev/doc/devel/inst/tweak-flags.html#SECTION000622000
 still relevant?  I am not 100% certain I didn't make one myself, but
 it looks to me as though my Windows Python 2.4.1 distro came with a
 libpython24.a.  I am asking here because it seems only the person who
 prepares the installer would know.

That impression might be incorrect: I can tell you when I started
including libpython24.a, but I have no clue whether the instructions
you refer to are correct - I don't use the file myself at all.

 If this is true, in which version was it introduced?

It was introduced in 1.20/1.16.2.4 of Tools/msi/msi.py in response to
patch #1088716; this in turn was first used to release r241c1.

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Bill Janssen wrote:
 I just got mail this morning from a researcher who wants exactly what
 Martin described, and wondered why the default MacPython 2.4.2 didn't
 provide it by default. :-)

If all he wants is to represent Deseret, he can do so in a 16-bit
Unicode type, too: Python supports UTF-16.

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Fredrik Lundh wrote:
 however, for Python 3000, it would be nice if the source-code encoding applied
 to the *entire* file (XML-style), rather than just unicode string literals
 and (hopefully) comments and docstrings.

As MAL explains, the encoding currently does apply to the entire file.
However, because of the Python syntax, you are restricted to ASCII
in many places, such as keywords, number literals, and (unfortunately)
identifiers. Lifting the restriction on identifiers is on my agenda.

Regards,
Martin


Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-25 Thread Josiah Carlson

Nick Coghlan [EMAIL PROTECTED] wrote:
 
 Josiah Carlson wrote:
  Nick Coghlan [EMAIL PROTECTED] wrote:
  I think having dicts and sets automatically invoke freeze would be a 
  mistake, 
  because at least one of the following two cases would behave unexpectedly:
  
  I'm pretty sure that the PEP was only asking if one would freeze the
  contents of dicts IF the dict was being frozen.
  
  That is, which of the following should be the case:
  freeze({1:[2,3,4]}) -> {1:[2,3,4]}
  freeze({1:[2,3,4]}) -> xdict(1=(2,3,4))
 
 I believe the choices you intended are:
   freeze({1:[2,3,4]}) -> imdict(1=[2,3,4])
   freeze({1:[2,3,4]}) -> imdict(1=(2,3,4))
 
 Regardless, that question makes a lot more sense (and looking at the PEP 
 again, I realised I simply read it wrong the first time).
 
 For containers where equality depends on the contents of the container (i.e., 
 all the builtin ones), I don't see how it is possible to implement a sensible 
 hash function without freezing the contents as well - otherwise your 
 immutable 
 isn't particularly immutable.
 
 Consider what would happen if list __freeze__ simply returned a tuple 
 version of itself - you have a __freeze__ method which returns a potentially 
 unhashable object!

I agree completely, hence my original statement on 10/23: "it is of my
opinion that a container which is frozen should have its contents frozen
as well."

 - Josiah



Re: [Python-Dev] MinGW and libpython24.a

2005-10-25 Thread David Abrahams
Martin v. Löwis [EMAIL PROTECTED] writes:

 David Abrahams wrote:
 Is the instruction at
 http://www.python.org/dev/doc/devel/inst/tweak-flags.html#SECTION000622000
 still relevant?  I am not 100% certain I didn't make one myself, but
 it looks to me as though my Windows Python 2.4.1 distro came with a
 libpython24.a.  I am asking here because it seems only the person who
 prepares the installer would know.

 That impression might be incorrect: I can tell you when I started
 including libpython24.a, but I have no clue whether the instructions
 you refer to are correct - I don't use the file myself at all.

 If this is true, in which version was it introduced?

 It was introduced in 1.20/1.16.2.4 of Tools/msi/msi.py in response to
 patch #1088716; this in turn was first used to release r241c1.

Thanks!

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Josiah Carlson

Martin v. Löwis [EMAIL PROTECTED] wrote:
 
 Fredrik Lundh wrote:
  however, for Python 3000, it would be nice if the source-code encoding
  applied to the *entire* file (XML-style), rather than just unicode string
  literals and (hopefully) comments and docstrings.
 
 As MAL explains, the encoding currently does apply to the entire file.
 However, because of the Python syntax, you are restricted to ASCII
 in many places, such as keywords, number literals, and (unfortunately)
 identifiers. Lifting the restriction on identifiers is on my agenda.

It seems that removing this restriction may cause serious issues, at
least in the case when using cyrillic characters in names.  See recent
security issues in regards to web addresses in web browsers for the
confusion (and/or name errors) that could result in their use.
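
The concern is easy to reproduce in Python 3, which does accept non-ASCII identifiers. A minimal sketch: the two letters below look alike in most fonts but are different code points, so they name different variables. The identifier is written with an escape inside the exec() string only to make the distinction visible in this message.

```python
import unicodedata

latin_a = "a"
cyrillic_a = "\u0430"  # renders almost identically to the Latin letter

names = [unicodedata.name(ch) for ch in (latin_a, cyrillic_a)]

# Both are valid identifiers, and they bind two separate names.
namespace = {}
exec("a = 1\n\u0430 = 2", namespace)
```

A typo that swaps one letter for the other would therefore produce a NameError, or silently create a second variable, with nothing visibly wrong in the source.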

While I agree in principle that people should be able to use the
entirety of one's own natural language in writing software in
programming languages, I think that it is an ugly can of worms that
perhaps shouldn't be opened.

 - Josiah



Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Eric Nieuwland
Guido van Rossum wrote:
 It is true though that AttributeError is somewhat special. There are
 lots of places (perhaps too many?) where an operation is defined using
 something like "if the object has attribute __foo__, use it, otherwise
 use some other approach".  Some operations explicitly check for
 AttributeError in their attribute check, and let a different exception
 bubble up the stack. Presumably this is done so that a bug in
 somebody's __getattr__ implementation doesn't get masked by the
 "otherwise use some other approach" branch. But this is relatively
 rare; most calls to PyObject_GetAttr just clear the error if they have
 a different approach available. In any case, I don't see any of this
 as supporting the position that TypeError is somehow more appropriate.
 An AttributeError complaining about a missing __enter__, __exit__ or
 __context__ method sounds just fine. (Oh, and please don't go checking
 for the existence of __exit__ before calling __enter__. That kind of
 bug is found with even the most cursory testing.)

Hmmm... Would it be reasonable to introduce a ProtocolError exception?

--eric



Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Guido van Rossum
On 10/25/05, Eric Nieuwland [EMAIL PROTECTED] wrote:
 Hmmm... Would it be reasonable to introduce a ProtocolError exception?

And which perceived problem would that solve? The problem of Nick &
Guido disagreeing in public?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Eric Nieuwland
Guido van Rossum wrote:

 On 10/25/05, Eric Nieuwland [EMAIL PROTECTED] wrote:
 Hmmm... Would it be reasonable to introduce a ProtocolError exception?

 And which perceived problem would that solve? The problem of Nick &
 Guido disagreeing in public?

;-)

No, that will go on in other fields, I guess.

It was meant to be a bit more informative about what is wrong.

ProtocolError: lacks __enter__ or __exit__

--eric



Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Guido van Rossum
[Eric "are all your pets called Eric?" Nieuwland]
  Hmmm... Would it be reasonable to introduce a ProtocolError exception?

[Guido]
  And which perceived problem would that solve?

[Eric]
 It was meant to be a bit more informative about what is wrong.

 ProtocolError: lacks __enter__ or __exit__

That's exactly what I'm trying to avoid. :)

I find "AttributeError: __exit__" just as informative. In either case,
if you know what __exit__ means, you'll know what you did wrong. And
if you don't know what it means, you'll have to look it up anyway. And
searching for "ProtocolError" doesn't do you any good -- you'll have to
learn about what __exit__ is and where it is required.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


[Python-Dev] PEP 343 - multiple context managers in one statement

2005-10-25 Thread Paul Moore
I have a deep suspicion that this has been done to death already, but
my searching ability isn't up to finding the reference. So I'll simply
ask the question, and not offer a long discussion:

Has the option of letting the with statement admit multiple context
managers been considered (and presumably rejected)?

I'm thinking of

    with expr1, expr2, expr3:
        # whatever

In some ways, this doesn't even need an extension to the PEP - giving
tuples suitable __enter__ and __exit__ methods would do it. Or, I
suppose a user-defined manager which combined a list of others:

    class combining:
        def __init__(self, *mgrs):
            self.mgrs = mgrs
        def __with__(self):
            return self
        def __enter__(self):
            return tuple(mgr.__enter__() for mgr in self.mgrs)
        def __exit__(self, type, value, tb):
            # first in, last out
            for mgr in reversed(self.mgrs):
                mgr.__exit__(type, value, tb)

Would that be worth using as an example in the PEP?
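
For comparison, later Python releases accept "with expr1, expr2:" directly, and the standard library's contextlib.ExitStack covers the case of a variable number of managers. The sketch below shows the same first-in, last-out behaviour; the tracked() helper and the events list are hypothetical names used only to record the order of calls.

```python
from contextlib import ExitStack, contextmanager

events = []


@contextmanager
def tracked(name):
    """Toy context manager that logs when it is entered and exited."""
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)


with ExitStack() as stack:
    # Enter the managers in order; ExitStack unwinds them in reverse.
    values = tuple(stack.enter_context(tracked(n)) for n in ("a", "b", "c"))
```

Unlike the hand-written class, ExitStack also unwinds the managers that were already entered if a later __enter__ fails.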

Sorry - it got a bit long anyway...

Paul.

PS The signature of __with__ in example 4 in the PEP is wrong - it has
an incorrect lock parameter.


Re: [Python-Dev] AST branch is in?

2005-10-25 Thread Frank Wierzbicki
On 10/20/05, Neal Norwitz [EMAIL PROTECTED] wrote:
 The Grammar is (was at one point at least) shared between Jython and
 would allow more tools to be able to share infrastructure. The idea
 is to eventually be able to have [JP]ython output the same AST to
 tools.

Hello Python-dev,


My name is Frank Wierzbicki and I'm working on the Jython project. Does
anyone on this list know more about the history of this Grammar sharing
between the two projects? I've heard about some Grammar sharing between
Jython and Python, and I've noticed that (most of) the jython code in
/org/python/parser/ast is commented "Autogenerated AST node". I would
definitely like to look at (eventually) coordinating with this effort.

I've cross-posted to the Jython-dev list in case someone there has some insight.



Thanks,

Frank


Re: [Python-Dev] AST branch is in?

2005-10-25 Thread Guido van Rossum
On 10/25/05, Frank Wierzbicki [EMAIL PROTECTED] wrote:
  My name is Frank Wierzbicki and I'm working on the Jython project.  Does
 anyone on this list know more about the history of this Grammar sharing
 between the two projects?  I've heard about some Grammar sharing between
 Jython and Python, and I've noticed that (most of) the jython code in
 /org/python/parser/ast is commented Autogenerated AST node.  I would
 definitely like to look at (eventually) coordinating with this effort.

  I've cross-posted to the Jython-dev list in case someone there has some
 insight.

Your best bet is to track down Jim Hugunin and see if he remembers.
He's jimhug at microsoft.com or jim at hugunin.net.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Bill Janssen
I think he was more interested in the invariant Martin proposed, that

 len(\U0001)

should always be the same and should always be 1.
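
A small sketch of what the invariant means in practice, assuming Python 3.3 or later, where str is indexed by code point regardless of how the interpreter was built. The code point U+1D670 is taken from elsewhere in this thread purely as an example of a non-BMP character, and utf16_units is a hypothetical helper.

```python
import struct


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


ch = "\U0001D670"           # one non-BMP code point
code_points = len(ch)       # what the invariant is about
units = utf16_units(ch)     # what a UTF-16 representation has to store
```

The string holds one code point, while its UTF-16 form needs a surrogate pair; a build that exposes UTF-16 code units as string items is where the two counts diverge.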

Bill


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Guido van Rossum
On 10/25/05, Bill Janssen [EMAIL PROTECTED] wrote:
 I think he was more interested in the invariant Martin proposed, that

  len(\U0001)

 should always be the same and should always be 1.

Yes but why? What does this invariant do for him?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] [Jython-dev] Re: AST branch is in?

2005-10-25 Thread Samuele Pedroni
Frank Wierzbicki wrote:
 On 10/20/05, Neal Norwitz [EMAIL PROTECTED] wrote:
 
 The Grammar is (was at one point at least) shared between Jython and
 would allow more tools to be able to share infrastructure.  The idea
 is to eventually be able to have [JP]ython output the same AST to
 tools.
 
 
 Hello Python-dev,
 
 My name is Frank Wierzbicki and I'm working on the Jython project.  Does 
 anyone on this list know more about the history of this Grammar sharing 
 between the two projects?  I've heard about some Grammar sharing between 
 Jython and Python, and I've noticed that (most of) the jython code in 
 /org/python/parser/ast is commented Autogenerated AST node.  I would 
 definitely like to look at (eventually) coordinating with this effort.
 
 I've cross-posted to the Jython-dev list in case someone there has some 
 insight.

As far as I understand, the Python trunk now contains generated C code
for the AST representation, created by the asdl_c.py script from an
updated Python.asdl. These files live in

http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Parser/

A parallel asdl_java.py existed in the Python CVS sandbox (where the
AST effort started). It was last updated when Jython's own AST classes
were generated from the then-current version of Python.asdl (I did
this, if I remember correctly, at some point in the Jython 2.2
evolution, I think when the PyDev guys wanted a more up-to-date
Jython parser to reuse):

http://cvs.sourceforge.net/viewcvs.py/*checkout*/python/python/nondist/sandbox/ast/asdl_java.py?content-type=text%2Fplain&rev=1.7

Basically, the new Python.asdl needs to be used, asdl_java.py may need
to be updated, and our compiler changed as necessary.

regards.










Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Josiah Carlson wrote:
 It seems that removing this restriction may cause serious issues, at
 least in the case when using cyrillic characters in names.  See recent
 security issues in regards to web addresses in web browsers for the
 confusion (and/or name errors) that could result in their use.

That impression is deceiving. We are talking about source code here;
people type in identifiers explicitly rather than receiving them
through linking, and they scope identifiers (by module or object).

If somebody manages to get look-alike identifiers into your Python
libraries, you have bigger problems than these look-alikes: anybody
capable of doing so could just as well replace the real thing in
the first place.

As always in computer security: define your threat model before
reasoning about the risks.

Regards,
Martin


Re: [Python-Dev] AST branch is in?

2005-10-25 Thread Samuele Pedroni
Guido van Rossum wrote:
 On 10/25/05, Frank Wierzbicki [EMAIL PROTECTED] wrote:
 
 My name is Frank Wierzbicki and I'm working on the Jython project.  Does
anyone on this list know more about the history of this Grammar sharing
between the two projects?  I've heard about some Grammar sharing between
Jython and Python, and I've noticed that (most of) the jython code in
/org/python/parser/ast is commented Autogenerated AST node.  I would
definitely like to look at (eventually) coordinating with this effort.

 I've cross-posted to the Jython-dev list in case someone there has some
insight.
 
 
 Your best bet is to track down Jim Hugunin and see if he remembers.
 He's jimhug at microsoft.com or jim at hugunin.net.
 

No. This is all after Jim; it is indeed derived from CPython's own AST
effort, just that we started using it quite a while ago. Jim was no
longer involved with Jython by then; Finn Bock started this.


Re: [Python-Dev] AST branch is in?

2005-10-25 Thread Guido van Rossum
On 10/25/05, Samuele Pedroni [EMAIL PROTECTED] wrote:
  Your best bet is to track down Jim Hugunin and see if he remembers.
  He's jimhug at microsoft.com or jim at hugunin.net.

 no. this is all after Jim, its indeed a derived effort from the CPython
 own AST effort, just that we started using it quite a while ago.
 This is all after Jim was not involved with Jython anymore, Finn Bock
 started this.

Oops! Sorry for the misinformation. Shows how much I know. :(

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Guido van Rossum wrote:
 Yes but why? What does this invariant do for him?

I don't know about this person, but there are a few things that
don't work properly in UTF-16 mode:

- the Unicode character database fails to look things up.
   u"\U0001D670".isupper() gives false, but should give true
   (since it denotes MATHEMATICAL MONOSPACE CAPITAL A).
   It gives true in UCS-4 mode.
- As a result, normalization on these doesn't work, either.
   It should normalize to LATIN CAPITAL LETTER A under
   NFKC, but doesn't.
- regular expressions only have limited support. In
   particular, adding non-BMP characters to character classes
   is not possible. [\U0001D670] will match any character
   that is either \uD835 or \uDE70, whereas it only matches
   MATHEMATICAL MONOSPACE CAPITAL A in UCS-4 mode.

There might be more limitations, but those are the ones that
come to mind easily. While I could imagine fixing the first
two with some effort, the third one is really tricky (unless
you would accept a wide representation of a character
class even if the Unicode representation is only narrow).
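
The three points above can be checked on an interpreter whose strings are indexed by code point. The sketch below assumes Python 3.4 or later, which behaves like the UCS-4 mode described here; it only shows the expected results and does not reproduce the UTF-16 failures.

```python
import re
import unicodedata

ch = "\U0001D670"  # MATHEMATICAL MONOSPACE CAPITAL A

# 1. Character database lookups see the whole code point.
name = unicodedata.name(ch)
is_upper = ch.isupper()

# 2. NFKC normalization folds the compatibility character to plain "A".
nfkc = unicodedata.normalize("NFKC", ch)

# 3. A character class containing the non-BMP character matches only that
#    character, not the individual surrogates \uD835 or \uDE70.
pattern = re.compile("[" + ch + "]")
matches_char = pattern.fullmatch(ch) is not None
matches_high_surrogate = pattern.fullmatch("\ud835") is not None
```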

Regards,
Martin


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Josiah Carlson

Martin v. Löwis [EMAIL PROTECTED] wrote:
 
 Josiah Carlson wrote:
  It seems that removing this restriction may cause serious issues, at
  least in the case when using cyrillic characters in names.  See recent
  security issues in regards to web addresses in web browsers for the
  confusion (and/or name errors) that could result in their use.
 
 That impression is deceiving. We are talking about source code here;
 people type in identifiers explicitly rather than receiving them
 through linking, and they scope identifiers (by module or object).
 
 If somebody manages to get look-alike identifiers into your Python
 libraries, you have bigger problems than these look-alikes: anybody
 capable of doing so could just as well replace the real thing in
 the first place.
 
 As always in computer security: define your threat model before
 reasoning about the risks.

I should have been more explicit.  I did not mean to imply that I was
concerned about the security implications of inserting arbitrary
identifiers in Python (I mentioned the web browser case as an example
of how such characters have been confusing previously).  I am concerned
about the confusion involved with using:
Greek Capital: Alpha, Beta, Epsilon, Zeta, Eta, Iota, Kappa, Mu, Nu,
Omicron, Rho, and Tau.
Cyrillic Capital: Dze, Je, A, Ve, Ie, Em, En, O, Er, Es, Te, Ha, ...

And how users could say, "Name error? But I typed in window.draw(PEN) as
I was told to, and it didn't work!"


Identically drawn glyphs are a problem, and pretending that they aren't
a problem, doesn't make it so.  Right now, all possible name glyphs are
visually distinct, which would not be the case if any unicode character
could be used as a name (except for numerals).  Speaking of which, would
we then be offering support for arabic/indic numeric literals, and/or
support it in int()/float()?  Ideally I would like to say yes, but I
could see the confusion if such were allowed.
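
To make the window.draw(PEN) scenario concrete, here is a sketch that assumes an interpreter accepting non-ASCII identifiers (Python 3 does). The names latin_pen, lookalike_pen and lookup are hypothetical. The look-alike replaces the Latin P with CYRILLIC CAPITAL LETTER ER (U+0420) and the Latin E with GREEK CAPITAL LETTER EPSILON (U+0395).

```python
latin_pen = "PEN"
lookalike_pen = "\u0420\u0395N"  # Cyrillic Er, Greek Epsilon, Latin N

# Bind the all-Latin name in a scratch namespace.
namespace = {}
exec(latin_pen + " = 'a pen object'", namespace)


def lookup(identifier):
    """Evaluate identifier in the scratch namespace, reporting NameError."""
    try:
        return eval(identifier, namespace)
    except NameError as exc:
        return "NameError: %s" % exc
```

Looking up latin_pen finds the object, while looking up lookalike_pen reports a NameError even though the two spellings may render identically.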

 - Josiah



Re: [Python-Dev] Inconsistent Use of Buffer Interface in stringobject.c

2005-10-25 Thread Phil Thompson
On Monday 24 October 2005 7:39 pm, Guido van Rossum wrote:
 On 10/24/05, M.-A. Lemburg [EMAIL PROTECTED] wrote:
  Guido van Rossum wrote:
   A concern I'd have with fixing this is that Unicode objects also
   support the buffer API. In any situation where either str or unicode
   is accepted I'd be reluctant to guess whether a buffer object was
   meant to be str-like or Unicode-like. I think this covers all the
   cases you mention here.
 
  This situation is a little better than that: the buffer
  interface has a slot called getcharbuffer which is what
  the string methods use in case they find that a string
  argument is not of type str or unicode.

 I stand corrected!

  As first step, I'd suggest to implement the getcharbuffer
  slot. That will already go a long way.

 Phil, if anything still doesn't work after doing what Marc-Andre says,
 those would be good candidates for fixes!

The patch is now on SF, #1337876.

Phil


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Josiah Carlson wrote:
 Martin v. Löwis [EMAIL PROTECTED] wrote:
 
Fredrik Lundh wrote:

however, for Python 3000, it would be nice if the source-code encoding
applied to the *entire* file (XML-style), rather than just unicode string
literals and (hopefully) comments and docstrings.

As MAL explains, the encoding currently does apply to the entire file.
However, because of the Python syntax, you are restricted to ASCII
in many places, such as keywords, number literals, and (unfortunately)
identifiers. Lifting the restriction on identifiers is on my agenda.
 
 
 It seems that removing this restriction may cause serious issues, at
 least in the case when using cyrillic characters in names.  See recent
 security issues in regards to web addresses in web browsers for the
 confusion (and/or name errors) that could result in their use.
 
 While I agree in principle that people should be able to use the
 entirety of one's own natural language in writing software in
 programming languages, I think that it is an ugly can of worms that
 perhaps shouldn't be opened.

I agree with Josiah.

A few years ago we had a discussion about this on python-dev
and agreed to stick with ASCII identifiers for Python. I still
think that's the right way to go.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Guido van Rossum
On 10/25/05, Josiah Carlson [EMAIL PROTECTED] wrote:
 Identically drawn glyphs are a problem, and pretending that they aren't
 a problem, doesn't make it so.  Right now, all possible name glyphs are
 visually distinct, which would not be the case if any unicode character
 could be used as a name (except for numerals).  Speaking of which, would
 we then be offering support for arabic/indic numeric literals, and/or
 support it in int()/float()?  Ideally I would like to say yes, but I
 could see the confusion if such were allowed.

This problem isn't new. There are plenty of fonts where 1 and l are
hard to distinguish, or l and I for that matter, or O and 0.

Yes, we need better tools to diagnose this.

No, we shouldn't let this stop us from adding such a feature if it is
otherwise a good feature.

I'm not so sure about this for other reasons -- it hampers code
sharing, and as soon as you add right-to-left character sets to the
mix (or top-to-bottom, for that matter), displaying source code is
going to be near impossible for most tools (since the keywords and
standard library module names will still be in the Latin alphabet).
This actually seems a killer even for allowing Unicode in comments,
which I'd otherwise favor. What do Unicode-aware apps generally do
with right-to-left characters?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Josiah Carlson wrote:
 And how users could say, name error? But I typed in window.draw(PEN) as
 I was told to, and it didn't work!

Ah, so the serious issues you are talking about are not security 
issues, but usability issues.

I don't think extending the range of acceptable characters will
cause any additional confusion. Users are already getting surprising
NameErrors/AttributeErrors in the following cases:
- they just misspell the identifier, and then, when the error message
   is printed, fail to recognize the difference, as they read over the
   typo just like they read over it when mistyping it in the first place.

- they run into confusions with different things having the same names
   in different contexts. For example, they wonder why they get TypeError
   for passing the wrong number of arguments to a function, when the
   call matches exactly what the source code in front of them tells
   them - only that they were calling a different function which just
   happened to have the same name.

In the light of these common mistakes, your example with an identifier
named PEN, where the P might be a cyrillic letter or the E a greek one
is just made up: For window.draw, people will readily understand that
they are supposed to use Latin letters. More generally, they will know
what script to use just from looking at the identifier.

 Identically drawn glyphs are a problem, and pretending that they aren't
 a problem, doesn't make it so.  Right now, all possible name glyphs are
 visually distinct

Not at all: Just compare Fool and Foo1 (and perhaps FooI)


In the font in which I'm typing this, these are slightly different - but
there are fonts in which the difference is really difficult to
recognize.

 Speaking of which, would
 we then be offering support for arabic/indic numeric literals, and/or
 support it in int()/float()?

No. None of the Arabic users have ever requested such a feature, so
it would be stupid to provide it. We provide extended identifiers not
for the fun of it, but because users are requesting them.

Regards,
Martin


Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Nick Coghlan
Guido van Rossum wrote:
 On 10/25/05, Nick Coghlan [EMAIL PROTECTED] wrote:
 Almost there - this is the only issue I have left on my list :)
 [...]
 Why are you so keen on TypeError? I find AttributeError totally
 appropriate. I don't see symmetry with for-loops as a valuable
 property here. AttributeError and TypeError are often interchangeable
 anyway.
 The reason I'm keen on TypeError is because 'abstract.c' uses it consistently
 when it fails to find a method to support a requested protocol.
 
 Hm. abstract.c well predates the new type system. Slots and methods
 weren't really unified back then, so TypeError made obvious sense at
 the time.

Ah, I hadn't considered that, because I never made significant use of any 
Python versions before 2.2.

Maybe there's a design principle in there somewhere:

   Failed duck-typing -> AttributeError (or TypeError for complex checks)
   Failed instance or subtype check -> TypeError

Most of the functions in abstract.c handle complex protocols, so a simple 
attribute error wouldn't convey the necessary meaning. The context protocol, 
on the other hand, is fairly simple, and an AttributeError tells you 
everything you really need to know.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://boredomandlaziness.blogspot.com


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
M.-A. Lemburg wrote:
 A few years ago we had a discussion about this on python-dev
 and agreed to stick with ASCII identifiers for Python. I still
 think that's the right way to go.

I don't think there ever was such an agreement.

Regards,
Martin



Re: [Python-Dev] Proposed resolutions for open PEP 343 issues

2005-10-25 Thread Guido van Rossum
On 10/25/05, Nick Coghlan [EMAIL PROTECTED] wrote:
 Maybe there's a design principle in there somewhere:

Failed duck-typing -> AttributeError (or TypeError for complex checks)
Failed instance or subtype check -> TypeError

Doesn't convince me.

If there are principles at work here (and not just coincidences), they
are (a) don't lightly replace an exception by another, and (b) don't
raise AttributeError; the getattr operation raises it for you. (a) says
that we should let the AttributeError bubble up in the case of the
with-statement; (b) explains why you see TypeError when a slot isn't
filled.
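
The two rules can be illustrated with a hand-written expansion of the with-statement. This is a sketch only: run_with, Noop and error_name are hypothetical helpers, not the interpreter's implementation, and the exception that the real with-statement raises for a non-manager has varied between Python versions. The len() call stands in for an operation that checks a type slot.

```python
import sys


class Noop:
    """Trivial manager used only to show the normal path."""

    def __enter__(self):
        return "entered"

    def __exit__(self, exc_type, exc_value, tb):
        return False


def run_with(mgr, body):
    # Plain attribute lookups: if a method is missing, the lookup itself
    # raises AttributeError and we simply let it propagate (rules a and b).
    exit_ = type(mgr).__exit__
    value = type(mgr).__enter__(mgr)
    try:
        result = body(value)
    except BaseException:
        if not exit_(mgr, *sys.exc_info()):
            raise
        return None
    exit_(mgr, None, None, None)
    return result


def error_name(func, *args):
    """Return the name of the exception raised by func(*args), if any."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc).__name__
    return None
```

Here error_name(run_with, object(), str) reports the AttributeError that bubbled up from the lookup, while error_name(len, object()) reports the TypeError raised for the unfilled slot.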

 Most of the functions in abstract.c handle complex protocols, so a simple
 attribute error wouldn't convey the necessary meaning. The context protocol,
 on the other hand, is fairly simple, and an AttributeError tells you
 everything you really need to know.

That's what I've been saying all the time. :-)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Josiah Carlson

Martin v. Löwis [EMAIL PROTECTED] wrote:
 
 Josiah Carlson wrote:
  And how users could say, name error? But I typed in window.draw(PEN) as
  I was told to, and it didn't work!
 
 Ah, so the serious issues you are talking about are not security 
 issues, but usability issues.

Indeed, it was a misunderstanding, as the email stated:
I did not mean to imply that I was concerned about the security
implications of inserting arbitrary identifiers in Python (I was
mentioning the web browser case for an example of how such
characters have been confusing previously), I am concerned about
confusion involved with using: [glyphs which are identical]


 I don't think extending the range of acceptable characters will
 cause any additional confusion. Users are already getting surprising
 NameErrors/AttributeErrors in the following cases:
 - they just misspell the identifier, and then, when the error message
is printed, fail to recognize the difference, as they read over the
typo just like they read over it when mistyping it in the first place.

In this case it's not just a misreading, the characters look identical! 
When is an 'E' not an 'E'?  When it is an Epsilon or Ie.  Saying what
characters will or will not be used as identifiers, when those
characters are keys on a keyboard of a specific type, is pretty
presumptuous.


 - they run into confusions with different things having the same names
in different contexts. For example, they wonder why they get TypeError
for passing the wrong number of arguments to a function, when the
call matches exactly what the source code in front of them tells
them - only that they were calling a different function which just
happened to have the same name.

Right, and users should be reading the documentation for the functions
and methods they are calling.


 In the light of these common mistakes, your example with an identifier
 named PEN, where the P might be a cyrillic letter or the E a greek one
 is just made up: For window.draw, people will readily understand that
 they are supposed to use Latin letters. More generally, they will know
 what script to use just from looking at the identifier.

Sure, that example was made up, but there are words which have been
borrowed from various languages by English, and you are discounting the
case of single-letter temporary variables.  Saying what will and won't
happen over the course of using unicode identifiers is quite the
prediction.


  Identically drawn glyphs are a problem, and pretending that they aren't
  a problem, doesn't make it so.  Right now, all possible name glyphs are
  visually distinct
 
 Not at all: Just compare Fool and Foo1 (and perhaps FooI)
 
 In the font in which I'm typing this, these are slightly different - but
 there are fonts in which the difference is really difficult to
 recognize.

Indeed, they are similar, but _different_ in my font as well.  The trick
is that the glyphs are not different in the case of certain greek or
cyrillic letters.  They don't just /look/ similar they /are identical/.

 - Josiah



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Guido van Rossum
On 10/25/05, Josiah Carlson [EMAIL PROTECTED] wrote:
 Indeed, they are similar, but _different_ in my font as well.  The trick
 is that the glyphs are not different in the case of certain greek or
 cyrillic letters.  They don't just /look/ similar they /are identical/.

Well, in the font I'm using to read this email, I and l are /identical/.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Josiah Carlson

Guido van Rossum [EMAIL PROTECTED] wrote:
 
 On 10/25/05, Josiah Carlson [EMAIL PROTECTED] wrote:
  Indeed, they are similar, but _different_ in my font as well.  The trick
  is that the glyphs are not different in the case of certain greek or
  cyrillic letters.  They don't just /look/ similar they /are identical/.
 
 Well, in the font I'm using to read this email, I and l are /identical/.

In all fonts I've seen, E/Epsilon/Ie are /always identical/.

 - Josiah



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Neil Hodgson
Martin v. Löwis:

 This aspect of rendering is often not implemented, though. Web browsers
 do it correctly, see
 ...
 GUI frameworks sometimes do it correctly, sometimes don't; most
 notably, Tk has no good support for RTL text.

   Scintilla does a rough job with this. RTL text is displayed
correctly as the underlying platform libraries (Windows or GTK+/Pango)
handle this aspect when called to draw text. However, editing is not
handled correctly: the caret is not placed correctly within RTL text,
and there are other visual glitches. There is interest in the area and
even a funding proposal this week.

   Neil


Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Greg Ewing
Martin v. Löwis wrote:

 For window.draw, people will readily understand that
 they are supposed to use Latin letters. More generally, they will know
 what script to use just from looking at the identifier.

Would it help if an identifier were required to be
made up of letters from the same alphabet, e.g. all
Latin or all Greek or all Cyrillic, but not a mixture?
Then you'd get an immediate error if you accidentally
slipped in a letter from the wrong alphabet.
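
A rough sketch of such a check, under a loud assumption: Python's standard library does not expose the Unicode Script property, so the first word of each letter's character name (LATIN, GREEK, CYRILLIC, ...) stands in for its alphabet here. The function names are hypothetical, and a real implementation would need proper script data.

```python
import unicodedata


def alphabets(identifier):
    """Return the set of alphabet names used by the letters of identifier.

    Assumption: the first word of the Unicode character name approximates
    the alphabet. Digits and underscores are ignored.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }


def check_single_alphabet(identifier):
    """Raise SyntaxError if identifier mixes letters from several alphabets."""
    found = alphabets(identifier)
    if len(found) > 1:
        raise SyntaxError(
            "identifier mixes alphabets: %s" % ", ".join(sorted(found))
        )
    return identifier
```

With this rule, an all-Latin PEN passes, while a PEN spelled with a Cyrillic Er and a Greek Epsilon is rejected at once instead of surfacing later as a confusing name error.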

Greg



[Python-Dev] make testall hanging on HEAD?

2005-10-25 Thread Anthony Baxter
At the moment, I see make testall hanging in test_timeout. In 
addition, test_curses is leaving the tty in a hosed state:

test_crypt
test_csv
test_curses
test_datetime
 test_dbm
 test_decimal
 test_decorators
test_deque
  test_descr

This is on Ubuntu Breezy, 
[GCC 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu9)] on linux2

Anyone else see this?

-- 
Anthony Baxter [EMAIL PROTECTED]
It's never too late to have a happy childhood.


Re: [Python-Dev] make testall hanging on HEAD?

2005-10-25 Thread jepler
ditto on the curses problem, but test_timeout completed just fine, at least
the first time around.

fedora core 4, x86_64
[GCC 4.0.1 20050727 (Red Hat 4.0.1-5)] on linux2

Jeff




Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Neil Hodgson
M.-A. Lemburg:

 You mean a slice that slices out the next indextype ?

   Yes.

 This sounds a lot like you'd want iterators for the various
 index types. Should be possible to implement on top of the
 proposed APIs, e.g. itergraphemes(u), itercodepoints(u), etc.

   Iterators may be helpful, but can also be too restrictive when the
processing is not completely iterative, such as peeking ahead or
looking behind to wrap at a word boundary in the display example.
There should be

   It was more that there may be less scope for error if there was a
move away from indexes to slices. The PEP provides ways to specify
what you want to examine or modify, but it looks to me like returning
indexes will lead to code repetition or additional variables, with an
increase in fragility.
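
A sketch of the slice-returning style: instead of handing back an index, each step yields the slice itself. The grouping rule is a deliberate simplification (a base code point plus any following code points with a non-zero combining class), not the full grapheme cluster rules of the Unicode standard. itergraphemes is only the name floated in this thread, not an existing API.

```python
import unicodedata


def itergraphemes(u):
    """Yield slices of u, each a base code point plus its combining marks.

    Simplified rule: a new slice starts at every code point whose
    canonical combining class is zero.
    """
    start = 0
    for i in range(1, len(u) + 1):
        if i == len(u) or unicodedata.combining(u[i]) == 0:
            yield u[start:i]
            start = i


text = "e\u0301x"                    # e + COMBINING ACUTE ACCENT, then x
pieces = list(itergraphemes(text))   # slices that keep the accent attached
broken = text[:1]                    # index-based slicing drops the accent
```

Because the caller only ever receives whole slices, there is no index arithmetic that could cut between a base character and its accent.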

 Note that what most people refer to as character is a
 grapheme in Unicode speak.

   A grapheme-oriented string type may be worthwhile although you'd
probably have to choose a particular normalisation form to ease
processing.

 Given that interpretation,
 breaking Unicode characters is something you won't
 ever work around with by using larger code units such
 as UCS4 compatible ones.

   I still think we can reduce the scope for errors.

 Furthermore, you should also note that surrogates (two
 code units encoding one code point) are part of Unicode
 life. While you don't need them when storing Unicode
 in UCS4 code units, they can still be part of the
 Unicode data and the programmer has to be aware of
 these.

   Many programmers can and will ignore surrogates. One day that may
bite them but we can't close off text processing to those who have no
idea of what surrogates are, or directional marks, or that sorting is
locale dependent, or have no understanding of the difference between
NFC and NFKD normalization forms.
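
For readers who have not met the normalization forms mentioned above, a short sketch using the standard unicodedata module. The sample strings are chosen purely for illustration.

```python
import unicodedata

decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI

# NFC composes canonically equivalent sequences; NFD decomposes them.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)

# NFKD also applies compatibility mappings, so the ligature is split up,
# whereas NFC leaves it alone.
nfkd_ligature = unicodedata.normalize("NFKD", ligature)
nfc_ligature = unicodedata.normalize("NFC", ligature)
```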

   Neil