Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread Anders J. Munch
On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
>>> try:
>>>  files = os.listdir(somedir, errors = strict)
>>> except OSError as e:
>>>  log()
>>>  files = os.listdir(somedir)

Instead of a codecs error handler name, how about a callback for
converting bytes to str?

os.listdir(somedir, decoder=bytes.decode)
os.listdir(somedir, decoder=lambda b: b.decode(preferredencoding, 
errors='xmlcharrefreplace'))
os.listdir(somedir, decoder=repr)

ISTM that would be simpler and more flexible than going over the
codecs registry.  One caveat though is that there's no obvious way of
telling listdir to skip a name.  But if the default behaviour for
decoder=None is to skip with a warning, then the need to explicitly
ask for files to be skipped would be small.

Terry's example would then be:

>>> try:
>>>  files = os.listdir(somedir, decoder=bytes.decode)
>>> except UnicodeDecodeError as e:
>>>  log()
>>>  files = os.listdir(somedir)
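
A minimal sketch of how such a decoder callback could behave, written as a wrapper rather than as a change to os.listdir itself. The name listdir_decoded and its decoder parameter are hypothetical, and the sketch assumes a Python in which os.listdir returns bytes names when it is given a bytes path:

```python
import os

def listdir_decoded(path, decoder=bytes.decode):
    """Hypothetical wrapper: list `path` and pass each raw name to `decoder`."""
    raw_names = os.listdir(os.fsencode(path))  # bytes path in, bytes names out
    # A strict decoder raises UnicodeDecodeError on the first bad name.
    return [decoder(name) for name in raw_names]
```

With decoder=repr every name survives, at the price of getting text such as "b'a.txt'" back instead of the name itself.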

- Anders
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread Nick Coghlan
Glenn Linderman wrote:
> On approximately 12/8/2008 9:30 AM, came the following characters from
> the keyboard of [EMAIL PROTECTED]:
>> PS: I'd like to see a similar warning issued when an access attempt
>> is made through os.environ to a variable that cannot be decoded.
> 
> 
> And argv ?  Seems like the warning technique could be useful for _any_
> interface that has been traditionally bytes, because that's the kind of
> characters that were, but now should move to (Unicode) characters.
> 
> The warnings could be the same, or very similar.
> 
> The question is if one global control should handle all types of bytes
> problems, or if there should be individual controls for each bytes
> problem, or both.  I tend to believe in both; the paranoid can set
> exactly the ones they've coded for, the aggressive can set the global
> one.  In this manner, new cases can be added to the global settings over
> time, if more are discovered -- it should be documented to handle future
> similar issues in a similar manner.

The warnings system provides that level of granularity for 'free' (so
long as we set the stack level appropriately in the C-API warnings call).
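
A rough sketch of that granularity using only the existing warnings module. The three warning classes are hypothetical (nothing like them exists in the standard library); the point is only that a filter on a base category acts as the global control while filters on subclasses act as the individual ones:

```python
import warnings

# Hypothetical categories, invented for this sketch.
class UndecodableBytesWarning(UnicodeWarning):
    """Base class: some bytes from the OS could not be decoded."""

class UndecodableFilenameWarning(UndecodableBytesWarning):
    """A file name could not be decoded."""

class UndecodableEnvironWarning(UndecodableBytesWarning):
    """An environment variable could not be decoded."""

def demo():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("ignore")  # start from a quiet state
        # Global control: report every kind of undecodable-bytes problem ...
        warnings.simplefilter("always", UndecodableBytesWarning)
        # ... except environment variables, which this program silences.
        warnings.simplefilter("ignore", UndecodableEnvironWarning)
        warnings.warn("bad file name", UndecodableFilenameWarning)
        warnings.warn("bad variable", UndecodableEnvironWarning)
    return [w.category.__name__ for w in caught]
```

Later simplefilter calls take precedence over earlier ones, so the subclass filter overrides the base-class filter for environment variables only.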

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---


Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-09 Thread Nick Coghlan
Antoine Pitrou wrote:
> Nick Coghlan  gmail.com> writes:
>> No, you misunderstand what I meant. Py_buffer doesn't need to be changed
>> at all. The *issuing type* would define a new structure with the
>> additional fields, such as:
> 
> With the current buffer API, this is not possible. It's the caller who
> allocates the Py_buffer struct (usually on the stack), not the callee.
> Therefore the callee (e.g. the getbufferproc of the issuing type) cannot
> choose to allocate a different structure.
> 
> (of course complex schemes can be devised where the callee maintains its own
> separate storage for shape and strides, but I don't think we want to go there)

In that case, as Greg noted, this is exactly what the callee should be
doing. Maintaining a PyDict instance to map from view pointers to shapes
and strides info doesn't strike me as a "complex scheme" though.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread M.-A. Lemburg
On 2008-12-09 09:41, Anders J. Munch wrote:
> On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
> >>> try:
> >>>  files = os.listdir(somedir, errors = strict)
> >>> except OSError as e:
> >>>  log()
> >>>  files = os.listdir(somedir)
> 
> Instead of a codecs error handler name, how about a callback for
> converting bytes to str?
> 
> os.listdir(somedir, decoder=bytes.decode)
> os.listdir(somedir, decoder=lambda b: b.decode(preferredencoding, 
> errors='xmlcharrefreplace'))
> os.listdir(somedir, decoder=repr)
> 
> ISTM that would be simpler and more flexible than going over the
> codecs registry.  One caveat though is that there's no obvious way of
> telling listdir to skip a name.  But if the default behaviour for
> decoder=None is to skip with a warning, then the need to explicitly
> ask for files to be skipped would be small.
> 
> Terry's example would then be:
> 
> >>> try:
> >>>  files = os.listdir(somedir, decoder=bytes.decode)
> >>> except UnicodeDecodeError as e:
> >>>  log()
> >>>  files = os.listdir(somedir)

Well, this is not too far away from just putting the whole decoding
logic into the application directly:

files = [filename.decode(filesystemencoding, errors='warnreplace')
 for filename in os.listdir(dir)]

(or os.listdirb() if that's where the discussion is heading)

... and that also tells us something about this discussion: we're
trying to come up with some magic to work around writing two
lines of Python code.

I'd just have all the os APIs return bytes and leave whatever
conversion to Unicode might be necessary to a higher level API.

Think of it: You really only need the Unicode values if you
ever want to output those values in text form somewhere.

In those cases, it's usually a human reading a log file or
screen output. Most other cases just care about getting
some form of file identifier in order to open the file
and don't really care about the encoding of the file name
at all.

It's probably better to have two helper functions in the os module
that take care of the conversion on demand rather than trying
to force this conversion even in cases where the application
never really needs to write the filename somewhere, e.g.
os.decodefilename() and os.encodefilename().

These should then provide some reasonable default logic, e.g.
use a 'warnreplace' error handler. Applications are then
free to use these converters or implement their own.
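
A minimal sketch of what such a helper might look like. codecs.register_error is the existing hook for custom error handlers; the handler name 'warnreplace' comes from the message above, but its behaviour here (warn, then substitute U+FFFD) and the helper decodefilename are assumptions made for illustration:

```python
import codecs
import warnings

def warnreplace(exc):
    """Hypothetical error handler: warn, then substitute U+FFFD."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # this sketch only handles decoding
    bad = exc.object[exc.start:exc.end]
    warnings.warn("undecodable bytes %r replaced" % bad, UnicodeWarning)
    return ("\ufffd", exc.end)  # replacement text, position to resume at

codecs.register_error("warnreplace", warnreplace)

def decodefilename(raw, encoding="utf-8"):
    """Hypothetical helper in the spirit of the proposed os.decodefilename()."""
    return raw.decode(encoding, errors="warnreplace")
```

An application that wants different behaviour can ignore the helper and call bytes.decode with its own handler.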

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread André Malo
* M.-A. Lemburg wrote: 


> On 2008-12-09 09:41, Anders J. Munch wrote:
> > On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
> > >>> try:
> > >>>  files = os.listdir(somedir, errors = strict)
> > >>> except OSError as e:
> > >>>  log()
> > >>>  files = os.listdir(somedir)
> >
> > Instead of a codecs error handler name, how about a callback for
> > converting bytes to str?
> >
> > os.listdir(somedir, decoder=bytes.decode)
> > os.listdir(somedir, decoder=lambda b: b.decode(preferredencoding,
> > errors='xmlcharrefreplace'))
> > os.listdir(somedir, decoder=repr)
> >
> > ISTM that would be simpler and more flexible than going over the
> > codecs registry.  One caveat though is that there's no obvious way of
> > telling listdir to skip a name.  But if the default behaviour for
> > decoder=None is to skip with a warning, then the need to explicitly
> > ask for files to be skipped would be small.
> >
> > Terry's example would then be:
> > >>> try:
> > >>>  files = os.listdir(somedir, decoder=bytes.decode)
> > >>> except UnicodeDecodeError as e:
> > >>>  log()
> > >>>  files = os.listdir(somedir)
>
> Well, this is not too far away from just putting the whole decoding
> logic into the application directly:
>
> files = [filename.decode(filesystemencoding, errors='warnreplace')
>  for filename in os.listdir(dir)]
>
> (or os.listdirb() if that's where the discussion is heading)
>
> ... and that also tells us something about this discussion: we're
> trying to come up with some magic to work around writing two
> lines of Python code.
>
> I'd just have all the os APIs return bytes and leave whatever
> conversion to Unicode might be necessary to a higher level API.

[...]

What I'm saying ;-)

+1.

nd


Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-09 Thread Nick Coghlan
Antoine Pitrou wrote:
> Alexander Belopolsky  gmail.com> writes:
>> I did not follow numpy development for the last year or more, so I
>> won't qualify as "the numpy folks," but my understanding is that numpy
>> does exactly what Nick recommended: the viewed object owns shape and
>> strides just as it owns the data.  The viewing object increases the
>> reference count of the viewed object and thus assures that data, shape
>> and strides don't go away prematurely.
> 
> That doesn't work if e.g. you take a slice of a memoryview object, since the
> shape changes in the process.
> See http://bugs.python.org/issue4580

Note that the PEP is unambiguous as to who owns the pointers in the view
object:
"The exporter is responsible for making sure that any memory pointed to
by buf, format, shape, strides, and suboffsets is valid until
releasebuffer is called. If the exporter wants to be able to change an
object's shape, strides, and/or suboffsets before releasebuffer is
called then it should allocate those arrays when getbuffer is called
(pointing to them in the buffer-info structure provided) and free them
when releasebuffer is called."

The problem with memoryview appears to be related to the way it
calculates its own length (since that is the check that is failing when
the view blows up):

>>> from array import array
>>> a = array('i', range(10))
>>> m = memoryview(a)
>>> len(m) # This is the length in bytes, which is WRONG!
40
>>> m2 = memoryview(a)[2:8]
>>> len(m2) # This is correct
6
>>> a2 = array('i', range(6))
>>> m[:] = a     # But this works
>>> m2[:] = a2   # and this does not
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot modify size of memoryview object
>>> len(memoryview(a2)) # Ah, 24 != 6 is our problem!
24

Looks to me like there are a couple of bugs here:

The first is that memoryview is treating the len field in the Py_buffer
struct as the number of objects in the view in a few places instead of
as the total number of bytes being exposed (it is actually the latter,
as defined in PEP 3118).

The second is that the getbuf implementation in array.array is broken.
It is ONLY OK for shape to be null when ndim=0 (i.e. a scalar value). An
array is NOT a scalar value, so the array objects should be setting the
shape pointer to point to a single-item array (where shape[0] is the
length of the array).

memoryview can then be fixed to use shape[0] instead of len to get the
number of objects in the view.

memoryview also currently gets the shape wrong on slices:

>>> m.shape
(10,)
>>> m2.shape
(10,)
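
For comparison, a short check that can be run on a current CPython. It assumes version 3.3 or later, where len() of a memoryview counts items and the byte count lives in a separate nbytes attribute; the helper name describe is only for illustration:

```python
from array import array

def describe(view):
    """Return (item count, byte count, shape) for a one-dimensional view."""
    return len(view), view.nbytes, view.shape

a = array('i', range(10))
m = memoryview(a)
m2 = m[2:8]

items, nbytes, shape = describe(m)
assert items == 10 and shape == (10,)
assert nbytes == 10 * a.itemsize        # bytes exposed, not items

items2, nbytes2, shape2 = describe(m2)
assert items2 == 6 and shape2 == (6,)   # the slice has its own shape
assert nbytes2 == 6 * a.itemsize
```

The item size of 'i' is platform dependent, which is why the byte counts are written in terms of a.itemsize rather than as literal numbers.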


Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---


Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-09 Thread Antoine Pitrou
Alexander Belopolsky  gmail.com> writes:
> 
> I did not follow numpy development for the last year or more, so I
> won't qualify as "the numpy folks," but my understanding is that numpy
> does exactly what Nick recommended: the viewed object owns shape and
> strides just as it owns the data.  The viewing object increases the
> reference count of the viewed object and thus assures that data, shape
> and strides don't go away prematurely.

That doesn't work if e.g. you take a slice of a memoryview object, since the
shape changes in the process.
See http://bugs.python.org/issue4580





Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-09 Thread Nick Coghlan
Antoine Pitrou wrote:
> Alexander Belopolsky  gmail.com> writes:
>> I did not follow numpy development for the last year or more, so I
>> won't qualify as "the numpy folks," but my understanding is that numpy
>> does exactly what Nick recommended: the viewed object owns shape and
>> strides just as it owns the data.  The viewing object increases the
>> reference count of the viewed object and thus assures that data, shape
>> and strides don't go away prematurely.
> 
> That doesn't work if e.g. you take a slice of a memoryview object, since the
> shape changes in the process.
> See http://bugs.python.org/issue4580

I have zero problem whatsoever if slice assignment TO a memoryview
object is permitted only if the shape stays the same (i.e. I think that
issue should be closed as "not a bug").

The buffer protocol permits you to edit the DATA held by another object.
It doesn't let you edit the *structure* of that object (which is what
would be implied by changing the shape of the object).
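
The data-versus-structure distinction can be illustrated with a bytearray, assuming a current CPython 3 in which a size-changing slice assignment to a memoryview raises ValueError. The helper try_assign is a name made up for this sketch:

```python
def try_assign(view, start, stop, data):
    """Assign `data` to view[start:stop]; report whether it was accepted."""
    try:
        view[start:stop] = data
        return True
    except ValueError:
        # Raised when the right-hand side does not match the slice's size.
        return False

buf = bytearray(b"abcdef")
view = memoryview(buf)
same_size = try_assign(view, 0, 2, b"XY")   # edits the data in place
resized = try_assign(view, 0, 2, b"XYZ")    # would change the structure
```

After this runs, buf holds b"XYcdef": the same-size assignment went through and the size-changing one was refused.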

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread Anders J. Munch
M.-A. Lemburg wrote:
> 
> Well, this is not too far away from just putting the whole decoding
> logic into the application directly:
> 
> files = [filename.decode(filesystemencoding, errors='warnreplace')
>  for filename in os.listdir(dir)]
> 
> (or os.listdirb() if that's where the discussion is heading)

I see what you mean, and yes, I think os.listdirb will do just as
well.  There is no need for any extra parameters to os.listdir.  The
typical application will just obliviously use os.listdir(dir) and get
the default elide-and-warn behaviour for un-decodable names.  That
rare special application that needs more control can use os.listdirb
and handle decoding itself.

Using a global registry of error handlers would just get in the way of
an application that needs more control.
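
A sketch of the elide-and-warn default described above, expressed over a list of raw names so that it does not depend on any particular os API. The function name elide_undecodable, the utf-8 default and the choice of UnicodeWarning are assumptions for illustration:

```python
import warnings

def elide_undecodable(raw_names, encoding="utf-8"):
    """Decode each bytes name strictly; warn about and skip the failures."""
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            warnings.warn("skipping undecodable name %r" % raw, UnicodeWarning)
    return decoded
```

An application that needs more control would work on the raw names directly and never call such a function.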

- Anders


Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-09 Thread Antoine Pitrou
On Tuesday 09 December 2008 at 22:33 +1000, Nick Coghlan wrote:
> I have zero problem whatsoever if slice assignment TO a memoryview
> object is permitted only if the shape stays the same (i.e. I think that
> issue should be closed as "not a bug").

I'm not even talking about slice /assignment/ here, just read-only
slicing.
Slicing a memoryview must produce another memoryview with a different
shape but with the same underlying object. That's why I have to modify
the shape field /after/ the new Py_buffer is initialized.
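
What "a different shape but the same underlying object" looks like from Python, as a small sketch. It assumes a current CPython 3 (3.3 or later), where memoryview has an obj attribute naming the exporter:

```python
from array import array

a = array('i', range(10))
m = memoryview(a)
m2 = m[2:8]                       # plain slicing, no slice assignment

shares_object = m2.obj is a       # both views refer to the same exporter
m2[0] = 99                        # a write through the slice ...
visible_in_array = (a[2] == 99)   # ... lands in the original array
```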

> The buffer protocol permits you to edit the DATA held by another
> object. It doesn't let you edit the *structure* of that object

Perhaps, but it's necessary for slicing.

> The first is that memoryview is treating the len field in the
> Py_buffer struct as the number of objects in the view in a few places
> instead of as the total number of bytes being exposed (it is actually
> the latter, as defined in PEP 3118).

I don't understand the difference between "the number of objects in the
view" and "the total number of bytes being exposed". For me it should be
the same and the "buf" and "len" fields in the Py_buffer should be
usable by any other C function, otherwise they are useless.

> memoryview also currently gets the shape wrong on slices:

I know, that's what I'm trying to fix...





Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-09 Thread Antoine Pitrou
Antoine Pitrou  pitrou.net> writes:
> 
> > The first is that memoryview is treating the len field in the
> > Py_buffer struct as the number of objects in the view in a few places
> > instead of as the total number of bytes being exposed (it is actually
> > the latter, as defined in PEP 3118).
> 
> I don't understand the difference between "the number of objects in the
> view" and "the total number of bytes being exposed". For me it should be
> the same and the "buf" and "len" fields in the Py_buffer should be
> usable by any other C function, otherwise they are useless.

Sorry, I had misread your message. Yes, indeed "len" should be the number of
bytes, not the number of objects. This is also solved as part of the patch I
proposed in the aforementioned bug entry.

Regards

Antoine.




Re: [Python-Dev] RELEASED Python 3.0 final

2008-12-09 Thread rdmurray

On Sat, 6 Dec 2008 at 20:19, [EMAIL PROTECTED] wrote:
> On 05:54 pm, [EMAIL PROTECTED] wrote:
>> On Fri, Dec 5, 2008 at 9:28 PM,  <[EMAIL PROTECTED]> wrote:
>> Whenever someone asks me which version to use, I always respond with
>> a question -- what do you want to use it for?
>
> In the longer term, I think that you should look at this as a symptom of a
> problem.  If you learn Java, you learn the most recent version.  If you need
> your software to work with an older version, you just pass a special option

Sometimes this even works.  But it isn't always easy to get it right,
and if you are mixing libraries... well, in my real-world experience we
wound up upgrading the VM.

> to the compiler.  If you want your *old* software to work with a *new*
> version, it basically just does (at least, 99% of the time).

If you specify the source option correctly.

It seems to me that 3to2 and 2to3 are the python equivalent to the javac
'target' and 'source' options.  Like Guido said, the python community
just doesn't have the resources to make them perfect :(.

Based on a quick google, the Java community appears to be grappling
with these same issues:

http://blog.adjective.org/post/2008/02/21/Java-Backwards-Compatability

the poster seems intent on maintaining more backward compatibility
than we have with python2/3, until you remember that java uses a
compile-and-distribute-binaries paradigm and python does not.  Once you
realize that, the differences in backward compatibility don't
seem so large...at least to me.

--RDM


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread James Y Knight

On Dec 9, 2008, at 6:04 AM, Anders J. Munch wrote:
> The typical application will just obliviously use os.listdir(dir)
> and get the default elide-and-warn behaviour for un-decodable names.
> That rare special application


I guess this is a new definition of rare special application: "an  
application which deals with user-specified files".


This is the problem I see in having two parallel APIs: people keep  
saying "most applications can just go ahead and use the [broken]  
unicode string API". If there was a unicode API and a bytes API, but  
everyone was clear that "always use the bytes API" is the right thing  
to do, that'd be okay... But, since even python-dev members are saying  
that only a rare special app needs to care about working with users'  
existing files, I'm rather worried this API design will cause most  
programs written in python to be broken. Which seems a shame.
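
A sketch of what "always use the bytes API" can mean in practice: names are never decoded, only passed back to the system. It assumes a Python in which the os and os.path functions accept bytes paths; total_size is a made-up example function:

```python
import os

def total_size(directory):
    """Sum the sizes of regular files, using bytes names throughout."""
    raw_dir = os.fsencode(directory)
    total = 0
    for raw_name in os.listdir(raw_dir):        # bytes path -> bytes names
        raw_path = os.path.join(raw_dir, raw_name)
        if os.path.isfile(raw_path):
            total += os.path.getsize(raw_path)  # no decoding needed to use the file
    return total
```

Decoding only becomes necessary at the point where a name has to be shown to a person.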


> that needs more control can use os.listdirb and handle decoding
> itself.


James


[Python-Dev] Floating-point implementations

2008-12-09 Thread Steve Holden
Is anyone aware of any implementations that use other than 64-bit
floating-point? I'd be particularly interested in any that use greater
precision than the usual 56-bit mantissa. Do modern 64-bit systems
implement anything wider than the normal double?

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC      http://www.holdenweb.com/



Re: [Python-Dev] Floating-point implementations

2008-12-09 Thread Mark Dickinson
On Tue, Dec 9, 2008 at 5:15 PM, Steve Holden <[EMAIL PROTECTED]> wrote:
> Is anyone aware of any implementations that use other than 64-bit
> floating-point? I'd be particularly interested in any that use greater
> precision than the usual 56-bit mantissa. Do modern 64-bit systems
> implement anything wider than the normal double?

I don't know of any.  There are certainly places in the codebase that
assume 56 bits are enough.  (I seem to recall it's something like
56 bits for IBM, 53 bits for IEEE 754, 48 for Cray, and 52 or 56 for VAX.)
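
The width in use on a given build can be checked from Python itself. A small sketch, assuming a binary floating-point format; on an IEEE 754 platform sys.float_info.mant_dig is expected to be 53. The helper name is made up:

```python
import sys

mant_bits = sys.float_info.mant_dig   # significand bits of the C double in use

def first_unrepresentable_integer(bits):
    """Smallest positive integer a float with `bits` significand bits cannot hold."""
    return 2 ** bits + 1

n = first_unrepresentable_integer(mant_bits)
round_trips = (int(float(n)) == n)     # False: the lowest bit is rounded away
```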

Many systems have a "long double" type, which usually seems to
be either 80-bit (with a 64-bit mantissa) or 128-bit.  The latter is
sometimes implemented as a pair of doubles, effectively giving
a 106-bit mantissa, and sometimes as an IEEE extended precision
type;  I don't know how many bits the mantissa would have in that
case, but surely not more than 117.

I asked a related question a while ago:

http://mail.python.org/pipermail/python-dev/2008-February/076680.html

Mark


Re: [Python-Dev] Floating-point implementations

2008-12-09 Thread Mark Dickinson
On Tue, Dec 9, 2008 at 5:15 PM, Steve Holden <[EMAIL PROTECTED]> wrote:
> precision than the usual 56-bit mantissa. Do modern 64-bit systems
> implement anything wider than the normal double?

I may have misinterpreted your question.  Are you asking simply
about what the hardware provides, or about what the C compiler
and library support?  Or something else entirely?

It looks like IEEE-conforming 128-bit floats would have a 113-bit
mantissa (including the implicit leading '1' bit).

Mark


Re: [Python-Dev] Floating-point implementations

2008-12-09 Thread Steve Holden
Mark Dickinson wrote:
> On Tue, Dec 9, 2008 at 5:15 PM, Steve Holden <[EMAIL PROTECTED]> wrote:
>> precision than the usual 56-bit mantissa. Do modern 64-bit systems
>> implement anything wider than the normal double?
> 
> I may have misinterpreted your question.  Are you asking simply
> about what the hardware provides, or about what the C compiler
> and library support?  Or something else entirely?
> 
> It looks like IEEE-conforming 128-bit floats would have a 113-bit
> mantissa (including the implicit leading '1' bit).
> 
I was actually asking about Python implementations, and read your
original answer as meaning "no, there aren't any". I had assumed,
correctly or otherwise, that the C library would have to offer
well-integrated support to enable its use in Python. In fact I had
assumed it would need to be pretty much a drop-in replacement, but it
sounds as though there are some hard-coded assumptions about float size
that would not allow that.

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC      http://www.holdenweb.com/



Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread Ulrich Eckhardt
On Monday 08 December 2008, Adam Olsen wrote:
> At this point someone suggests we have a type that can store an
> arbitrary mix of unicode and bytes, so the undecodable portions stay
> in their original form. :P

Well, not an arbitrary mix, but a type that just stores whatever comes from 
the system without further specifying it as either bytes or Unicode:

* If you want a string for displaying it, you first have to extract a string 
from that thing and there you optionally specify the encoding and error 
behaviour.
* If you want to append a string to it, it is automatically encoded in the 
default encoding, which obviously can fail.
* Similarly, e.g. globbing is done on the underlying representation's level, 
so "*.py" will first have to be converted according to the default encoding.
* If you just print it, you will get something that you can make out the 
decodable parts from, but it will probably be like "{Unicode:u'abcde'}" 
or "{bytes:b'ab\xf0\x0fcd'}".
* If you don't want to display it, but just want to pass it to the system, 
just use it as is.
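
The list above could be sketched roughly as follows. The class RawName, its method names and the utf-8 default are all hypothetical; this is one possible reading of the description, not an existing type:

```python
class RawName:
    """Hypothetical wrapper: holds a name exactly as the OS returned it."""

    def __init__(self, raw, encoding="utf-8"):
        self.raw = raw              # untouched bytes, passed to the system as is
        self.encoding = encoding    # stand-in for "the default encoding"

    def to_text(self, encoding=None, errors="strict"):
        """Extract a str for display; encoding and error behaviour are optional."""
        return self.raw.decode(encoding or self.encoding, errors)

    def __add__(self, text):
        # Appending a str encodes it first, which can raise UnicodeEncodeError.
        return RawName(self.raw + text.encode(self.encoding), self.encoding)

    def __repr__(self):
        try:
            return "{Unicode:%r}" % self.raw.decode(self.encoding)
        except UnicodeDecodeError:
            return "{bytes:%r}" % self.raw
```

Printing RawName(b"abcde") gives {Unicode:'abcde'}, while a name that cannot be decoded falls back to the {bytes:...} form.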

Yes, this puts an inconvenience on application programmers that up to now 
always assumed that they received a list of strings from os.listdir(), but 
that's the way with false assumptions. In any case, they will be aware (from 
reading the docs) of what the problem is and why there is no way to return 
text. Further, they will get tools to convert these paths or environment vars 
to text, so it will simply be a matter of replacing "os.listdir()" 
with "map(to_unicode, os.listdir())".


Uli

-- 
Sator Laser GmbH
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932




[Python-Dev] Forking and pipes

2008-12-09 Thread Lars Kotthoff
Dear list,

 I recently noticed a python program which uses forks and pipes for
communication between the processes not behaving as expected. The minimal
example program:


#!/usr/bin/python

import os, sys

r, w = os.pipe()
write = os.fdopen(w, 'w')
print >> write, "foo"
pid = os.fork()
if pid:
    os.waitpid(pid, 0)
else:
    sys.exit(0)
write.close()
read = os.fdopen(r)
print read.read()
read.close()


This prints out "foo" twice although it's only written once to the pipe. It
seems that python doesn't flush file descriptors before copying them to the
child process, thus resulting in the duplicate message. The equivalent C
program behaves as expected,


/* Header names were lost in the archive; these are the ones the calls below need. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>  /* for waitpid() */

int main(void) {
    int fds[2];
    pid_t pid;
    char* buf = (char*) calloc(4, sizeof(char));

    pipe(fds);
    write(fds[1], "foo", 3);

    pid = fork();
    if(pid) {
        waitpid(pid, NULL, 0);
    } else {
        return EXIT_SUCCESS;
    }

    close(fds[1]);

    read(fds[0], buf, 3);
    printf("%s\n", buf);
    close(fds[0]);

    free(buf);

    return EXIT_SUCCESS;
}


Is this behaviour intentional? I've tested both python and C on Linux, OpenBSD
and Solaris (python versions 2.5.2 and 2.3.3), the behaviour was the same
everywhere.

Thanks,

Lars


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread Adam Olsen
On Tue, Dec 9, 2008 at 11:31 AM, Ulrich Eckhardt
<[EMAIL PROTECTED]> wrote:
> On Monday 08 December 2008, Adam Olsen wrote:
>> At this point someone suggests we have a type that can store an
>> arbitrary mix of unicode and bytes, so the undecodable portions stay
>> in their original form. :P
>
> Well, not an arbitrary mix, but a type that just stores whatever comes from
> the system without further specifying it as either bytes or Unicode:
>
> * If you want a string for displaying it, you first have to extract a string
> from that thing and there you optionally specify the encoding and error
> behaviour.
> * If you want to append a string to it, it is automatically encoded in the
> default encoding, which obviously can fail.

So the 2.x str, but with a more interesting default encoding than
ASCII.  It'll work fine on the developer's system, but one day a user
will present it with strange input, and boom.

You have to be pessimistic here.  The default operations should either
always work or never work.  Using unicode internally and skipping
garbage input means the operations always work.  Using a bytes API
means mixing with unicode never works, unless the programmer
explicitly converts, in which case the onus is on them to use proper
error handling.

The only thing separating this from a bikeshed discussion is that a
bikeshed has many equally good solutions, while we have no good
solutions.  Instead we're trying to find the least-bad one.  The
unicode/bytes separation is pretty close to that.  Adding a warning
gets even closer.  Adding magic makes it worse.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] Forking and pipes

2008-12-09 Thread James Y Knight


On Dec 9, 2008, at 2:26 PM, Lars Kotthoff wrote:
> Dear list,
>
> I recently noticed a python program which uses forks and pipes for
> communication between the processes not behaving as expected. The
> minimal example program:

[snip]

> This prints out "foo" twice although it's only written once to the
> pipe. It seems that python doesn't flush file descriptors before
> copying them to the child process, thus resulting in the duplicate
> message. The equivalent C program behaves as expected,

[snip]

> Is this behaviour intentional? I've tested both python and C on
> Linux, OpenBSD and Solaris (python versions 2.5.2 and 2.3.3), the
> behaviour was the same everywhere.



Yes, it's intentional. And, no, your programs aren't equivalent.

Rewrite your C program to use fdopen, and fread/fwrite. *Then* it will
be equivalent and have the same behavior as the python program.

Alternatively, you can change your python program to use
os.read/os.write instead of fdopen and fileobject.read/fileobject.write,
if you want your python program to work like the C program.


James


Re: [Python-Dev] Forking and pipes

2008-12-09 Thread Alexander Shigin
On Tue, 09/12/2008 at 19:26 +, Lars Kotthoff wrote:
> Dear list,
> 
>  I recently noticed a python program which uses forks and pipes for
> communication between the processes not behaving as expected. The minimal
> example program:

If you write 

r, w = os.pipe()
os.write(w, 'foo')
pid = os.fork()


you'll get the same result as the C program. Or, if you use fdopen in
the C program, you'll get the same result as the Python one.

The problem with the example is libc buffering. If you call
write.flush() before forking, the buffer is already empty when it is
copied into the child process, and you'll see only one 'foo'.



Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-09 Thread Toshio Kuratomi
James Y Knight wrote:
> On Dec 9, 2008, at 6:04 AM, Anders J. Munch wrote:
>> The typical application will just obliviously use os.listdir(dir) and
>> get the default elide-and-warn behaviour for un-decodable names. That
>> rare special application
> 
> I guess this is a new definition of rare special application: "an
> application which deals with user-specified files".
> 
> This is the problem I see in having two parallel APIs: people keep
> saying "most applications can just go ahead and use the [broken] unicode
> string API". If there was a unicode API and a bytes API, but everyone
> was clear that "always use the bytes API" is the right thing to do,
> that'd be okay... But, since even python-dev members are saying that
> only a rare special app needs to care about working with users' existing
> files, I'm rather worried this API design will cause most programs
> written in python to be broken. Which seems a shame.
> 
I agree with you, which was part of why I raised this subject, but I also
think that using the warnings module to issue a warning and ignore the
entire problematic entry is a reasonable compromise.  Hopefully it will
become obvious to people at some point that it's a python3 wart, and
we'll re-examine the default.  But until then, a printed warning that
individual apps can turn into an exception seems less broken than the
other alternatives that the "rare special application" people can live
with :-)
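
A rough sketch of what that compromise could look like from the application side. `decode_names` is a hypothetical stand-in for the proposed listdir behaviour, not an existing API, and the choice of `UnicodeWarning` as the category is my assumption; the `warnings` calls themselves are the standard ones.

```python
import warnings


def decode_names(raw_names, encoding="utf-8"):
    """Hypothetical helper: decode each name, warn about and skip failures."""
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            warnings.warn("skipping undecodable name %r" % (raw,),
                          UnicodeWarning)
    return decoded


names = [b"readme.txt", b"caf\xe9.txt"]   # second name is not valid UTF-8

# Default-style use: the bad entry is dropped and a warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lenient = decode_names(names)

# An application that cares turns the same warning into an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", UnicodeWarning)
    try:
        decode_names(names)
        strict_failed = False
    except UnicodeWarning:
        strict_failed = True
```

The point of the design is that the strict behaviour is a one-line filter change in the application, with no second API to learn.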

-Toshio





Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-09 Thread Nick Coghlan
Antoine Pitrou wrote:
> Le mardi 09 décembre 2008 à 22:33 +1000, Nick Coghlan a écrit :
>> memoryview also currently gets the shape wrong on slices:
> 
> I know, that's what I'm trying to fix...

Yes, I was slightly misled by your use of slice assignment to
demonstrate the problem. It also turns out that while assignment to
memoryviews has issues, and so does slicing, there is a fundamental
problem with the length calculation when a memoryview is first created
which is further confusing matters.

For the slicing problem in particular, memoryview is currently trying to
get away with only one Py_buffer object when it needs TWO.

The first Py_buffer object needs to describe the view the memoryview has
of the target object (i.e. it describes the entire data area of the
target). The shape/strides/etc pointers in that struct are owned by the
target object. The existing self->view tends to fill this role fairly well.

The *second* (currently nonexistent) Py_buffer object needs to describe
the memory layout that the memoryview exposes to the rest of the world.
The pointers in *this* struct will be owned by the memoryview object and
accurately reflect any changes in shape due to slicing operations.

Currently, memoryview is trying to make the first Py_buffer also fill
the role of the second one, and that obviously isn't going to work for
subviews.
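
For reference, this is the behaviour a slice ought to have, shown as a small sketch at the Python level. It reflects how memoryview behaves in current CPython releases, not the 3.0 code under discussion, and the variable names are only illustrative.

```python
data = bytearray(b"abcdefgh")

view = memoryview(data)      # describes the whole data area of the target
sub = view[2:6]              # a subview needs its own shape
strided = view[::2]          # and, for stepped slices, its own strides

full_shape = view.shape      # shape of the original view
sub_shape = sub.shape        # shape after slicing
sub_bytes = sub.tobytes()
strided_layout = (strided.shape, strided.strides)
strided_bytes = strided.tobytes()

# Writes through the subview land in the right place in the target.
sub[0] = ord("X")
```

The original view keeps reporting the full length while each slice reports its own, which is exactly the split between the two descriptions above.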

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---


Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-09 Thread Greg Ewing

Nick Coghlan wrote:

> Maintaining a PyDict instance to map from view pointers to shapes
> and strides info doesn't strike me as a "complex scheme" though.


I don't see why a given buffer provider should ever need
more than one set of shape/strides arrays at a time. It
can allocate them on creation, reallocate them as needed
if the shape of its internal data changes, and deallocate
them when it goes away.

If you are creating view objects that present slices or
some other alternative perspective, then the view object
itself is a buffer provider and should maintain shape/stride
arrays for its particular view of the underlying object.

--
Greg


Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-09 Thread Greg Ewing

Antoine Pitrou wrote:

> That doesn't work if e.g. you take a slice of a memoryview object, since the
> shape changes in the process.
> See http://bugs.python.org/issue4580


I haven't looked in detail at how memoryview is currently
implemented, but it seems to me that the way it should work
is that whenever you access a slice, it obtains a fresh
Py_buffer from the underlying object, and does the right
thing based on the shape/strides from that together with
the slice ranges.

The only time it should need to allocate its own shape/strides
is if you request a Py_buffer from the memoryview itself,
at which time it should obtain a Py_buffer from the underlying
object, update its own shape/strides and pass them to the
caller. The underlying Py_buffer lock should be held until
the caller releases the memoryview's Py_buffer, ensuring
that its shape/strides remain valid for as long as they're
needed.

--
Greg



Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-09 Thread Greg Ewing

Nick Coghlan wrote:

> [from the PEP] "If the exporter wants to be able to change an
> object's shape, strides, and/or suboffsets before releasebuffer is
> called then it should allocate those arrays when getbuffer is called
> (pointing to them in the buffer-info structure provided) and free them
> when releasebuffer is called."


Even allowing this seems rather dubious to me. I suppose
there's no serious danger as long as the block of memory
ultimately holding the data doesn't move or change size,
but changing the shape could confuse a buffer user that's
iterating over the data.

--
Greg


Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-09 Thread Antoine Pitrou
Nick Coghlan writes:
> 
> For the slicing problem in particular, memoryview is currently trying to
> get away with only one Py_buffer object when it needs TWO.

Why should it need two? Why couldn't the embedded Py_buffer fulfill all the
needs of the memoryview object? If the memoryview can't be a relatively thin
object-oriented wrapper around a Py_buffer, then this all screams failure to me.



In all honesty, I admit I am annoyed by all the problems with the buffer API /
memoryview object, many of which are caused by its utterly bizarre design (and
the fact that the design team went missing in action after imposing such a
bizarre and complex design on us), and I'm reluctant to add yet another level of
byzantine complexity in order to solve those problems. That explains why I may
sound a bit angry at times :-)

If we really need to change things a lot to make them work, we should re-work
the buffer API from the ground up, make the Py_buffer struct a true PyObject
(that is, a true variable-length object so as to solve the shape and strides
allocation issue) and merge it with the current memoryview implementation. It
would make things both simpler and more flexible.

But of course it would destroy C-level compatibility with 2.6 / 3.0.

Regards

Antoine.




Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-09 Thread Greg Ewing

Antoine Pitrou wrote:

> Why should it need two? Why couldn't the embedded Py_buffer fulfill all the
> needs of the memoryview object?


Two things here:

  1) The memoryview should *not* be holding onto a Py_buffer
 in between calls to its getitem and setitem methods. It
 should request one from the underlying object when needed
 and release it again as soon as possible.

  2) The "second" Py_buffer referred to above only needs to
 be materialized when someone makes a GetBuffer request on
 the memoryview itself. It's not needed for Python getitem
 and setitem calls. (The implementation might choose to
 implement these by creating a temporary Py_buffer, but
 again, it would only last as long as the call.)


> If the memoryview can't be a relatively thin
> object-oriented wrapper around a Py_buffer, then this all screams failure to me.


It shouldn't be a wrapper around a Py_buffer, it should be a
wrapper around the buffer *interface* of the underlying object.


> In all honesty, I admit I am annoyed by all the problems with the buffer API /
> memoryview object, many of which are caused by its utterly bizarre design


It sounds to me like whoever wrote the memoryview implementation
didn't understand how the buffer interface is meant to be used.
That doesn't mean there's anything wrong with the buffer interface.

I have some doubts myself about whether it needs to be as
complicated as it is, but I think the basic idea is sound:
that Py_buffer objects are ephemeral, to be obtained when
needed and not kept for any longer than necessary.

--
Greg


Re: [Python-Dev] Forking and pipes

2008-12-09 Thread Greg Ewing

Lars Kotthoff wrote:

> This prints out "foo" twice although it's only written once to the pipe. It
> seems that python doesn't flush file descriptors before copying them to the
> child process, thus resulting in the duplicate message. The equivalent C
> program behaves as expected,


Your Python and C programs are not equivalent -- the C one is
writing directly to the file descriptor, whereas the Python one
is effectively using a buffered stdio stream. The unflushed stdio
buffer is getting copied by the fork, hence the duplicate output.

Solution: either (a) flush the Python file object before forking
or (b) use os.write() directly on the fd to avoid the buffering.
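
Since the original program was snipped, here is a small self-contained sketch of the effect and of both fixes, written for Python 3 on a POSIX system. The function name `pipe_contents` and its `strategy` argument are invented for this illustration.

```python
import os


def pipe_contents(strategy):
    """Write b"foo" once, fork, and return every byte that reached the pipe.

    strategy: "buffered" (file object, no flush),
              "flush"    (file object, flushed before the fork),
              "raw"      (os.write straight to the descriptor).
    """
    r, w = os.pipe()
    writer = os.fdopen(w, "wb")      # buffered, much like a C stdio stream
    if strategy == "raw":
        os.write(w, b"foo")          # goes to the kernel immediately
    else:
        writer.write(b"foo")         # stays in the user-space buffer for now
        if strategy == "flush":
            writer.flush()           # buffer is empty before it gets copied
    pid = os.fork()                  # the child inherits a copy of the buffer
    if pid == 0:
        try:
            writer.close()           # flushes the child's copy, if any
        finally:
            os._exit(0)              # exit the child without further cleanup
    writer.close()                   # flushes the parent's copy, if any
    os.waitpid(pid, 0)
    with os.fdopen(r, "rb") as reader:
        return reader.read()         # EOF: every write end is closed by now


if __name__ == "__main__":
    for strategy in ("buffered", "flush", "raw"):
        print(strategy, pipe_contents(strategy))
```

With "buffered" the pipe receives b"foofoo", because parent and child each flush their own copy of the buffer; with "flush" or "raw" it receives b"foo" once.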

--
Greg


Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-09 Thread Antoine Pitrou
Greg Ewing writes:
> 
>1) The memoryview should *not* be holding onto a Py_buffer
>   in between calls to its getitem and setitem methods. It
>   should request one from the underlying object when needed
>   and release it again as soon as possible.

If the memoryview wasn't holding onto a Py_buffer, one couldn't rely on its
length or anything else because the underlying object could be mutated at any
moment (even by another thread). It would make memoryview objects basically
unusable for anything except bytes objects (which are immutable).
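
A short sketch of what holding the buffer buys you, as seen from Python. This shows the behaviour of current CPython releases (memoryview.release() did not exist in 3.0), so take it as an illustration of the principle rather than of the code being debated.

```python
data = bytearray(b"hello")
view = memoryview(data)          # the view now holds a buffer on data

# While the buffer is held, the exporter refuses to change size,
# so len(view) stays trustworthy.
try:
    data.extend(b" world")
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

length_while_held = len(view)

# Once the view lets go of its buffer, resizing works again.
view.release()
data.extend(b" world")
```

In-place item assignment is still allowed while the view exists; only operations that would move or resize the underlying memory are rejected.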

>2) The "second" Py_buffer referred to above only needs to
>   be materialized when someone makes a GetBuffer request on
>   the memoryview itself.

It's already what is being done, but that's got nothing to do with the problem
at hand. We are talking about slicing the memoryview, not taking a (non-sliced)
buffer of it.

>   It's not needed for Python getitem
>   and setitem calls.

What is needed for Python getitem and setitem calls is proper shape information
in the embedded Py_buffer struct, otherwise memoryview slices are buggy. In the
case of a memoryview slice, the proper shape information can only be computed
*after* the Py_buffer is obtained.

> It sounds to me like whoever wrote the memoryview implementation
> didn't understand how the buffer interface is meant to be used.

Perhaps, perhaps not, but without any concrete suggestion we won't go anywhere.

As I said, I don't think it would be foolish to revamp the current spec and/or
implementation /if we have a precise plan of how to do better/. The /if/ part is
important :-)

Regards

Antoine.




Re: [Python-Dev] Floating-point implementations

2008-12-09 Thread Martin v. Löwis
> Is anyone aware of any implementations that use other than 64-bit
> floating-point?

As I understand it, you are asking about Python implementations:
sure, the gmpy package supports arbitrary-precision floating point.

> I'd be particularly interested in any that use greater
> precision than the usual 56-bit mantissa. 

Nit-pickingly: it's usual that the mantissa is 53-bit.
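
This is easy to check from Python itself; a minimal sketch using sys.float_info (variable names are mine, and the value 53 assumes the platform's C double is an IEEE 754 binary64):

```python
import sys

# Number of base-2 digits in the significand of a Python float.
mantissa_bits = sys.float_info.mant_dig

# Above 2**mant_dig, adding 1.0 is lost to rounding ...
limit = 2.0 ** mantissa_bits
lost_increment = (limit + 1.0 == limit)

# ... while just below it, integers are still represented exactly.
exact_below = ((limit - 1.0) + 1.0 == limit)

if __name__ == "__main__":
    print(mantissa_bits, lost_increment, exact_below)
```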

> Do modern 64-bit systems implement anything wider
> than the normal double?

As Mark said: sure. x86 systems have supported 80-bit
"extended" precision for ages. Some architectures have
architectural support for 128-bit floats (e.g. Itanium, SPARC v9);
it's not clear to me whether they actually implement the
long double operations in hardware, or whether they trap
and get software-emulated.

Regards,
Martin