[Python-Dev] Mini Path object

2006-11-01 Thread Mike Orr
Posted to python-dev and python-3000.  Follow-ups to python-dev only please.

On 10/31/06, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>   here's mine; it's fully backwards compatible, can go right into 2.6,
> and can be incrementally improved in future releases:
>
> 1) add a pathname wrapper to "os.path", which lets you do basic
>    path "algebra".  this should probably be a subclass of unicode,
>    and should *only* contain operations on names.
>
> 2) make selected "shutil" operations available via the "os"
>    namespace; the old POSIX API vs. POSIX SHELL distinction is pretty
>    irrelevant.  also make the os.path predicates available via the
>    "os" namespace.
>
> this gives a very simple conceptual model for the user; to manipulate
> path *names*, use "os.path.(string)" functions or the ""
> wrapper.  to manipulate *objects* identified by a path, given either as
> a string or a path wrapper, use "os.(path)".  this can be taught in
> less than a minute.

Given the widely-diverging views on what, if anything, should be done
to os.path, how about we make a PEP and a standalone implementation of
(1) for now, and leave (2) and everything else for a later PEP.  This
will make people who want a reasonably forward-compatible object NOW
for their Python 2.4/2.5 programs happy, provide a common seed for
more elaborate libraries that may be proposed for the standard library
later (and eliminate the possibility of moving the other functions and
later deprecating them), and provide a module that will be well tested
by the time 2.6 is ready for finalization.

There's already a reference implementation in PEP 355; we'd just have
to strip out the non-pathname features.  There's a copy here
(http://wiki.python.org/moin/PathModule) that looks reasonably recent
(constructors are self.__class__() to make it subclassable), although
I wonder why the class is called path instead of Path.  There was
another copy in the Python CVS, although I can't find it now; was it
deleted in the move to Subversion?  (I thought it was in
/sandbox/trunk/: http://svn.python.org/view/sandbox/trunk/).

So, let's say we strip this Path class to:

class Path(unicode):
    Path("foo")
    Path(Path("directory"), "subdirectory", "file")  # Replaces .joinpath().
    Path()
    Path.cwd()
    Path("ab") + "c"  => Path("abc")
    .abspath()
    .normcase()
    .normpath()
    .realpath()
    .expanduser()
    .expandvars()
    .expand()
    .parent
    .name      # Full filename without path
    .namebase  # Filename without extension
    .ext
    .drive
    .splitpath()
    .stripext()
    .splitunc()
    .uncshare
    .splitall()
    .relpath()
    .relpathto()

Would this offend anyone?  Are there any attribute renames or method
enhancements people just can't live without?  'namebase' is the only
name I hate but I could live with it.

The multi-argument constructor is a replacement for joining paths.
(The PEP says .joinpath was "problematic" without saying why.)  This
could theoretically go either way, doing either the same thing as
os.path.join, getting a little smarter, or doing "safe" joins by
disallowing "/" embedded in string arguments.
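
A minimal sketch of the multi-argument constructor, assuming plain os.path.join semantics. It is written for Python 3, so it subclasses str where the outline above says unicode, and it uses posixpath so the results do not depend on the host platform. The class body, including what Path() with no arguments returns, is an illustrative assumption and not the PEP 355 code.

```python
import posixpath


class Path(str):
    """Illustrative pathname wrapper: operations on names only."""

    def __new__(cls, *parts):
        # Assumption: Path() with no arguments means the current-directory
        # marker ("."); the outline above does not say what it returns.
        plain = [str(part) for part in parts]
        joined = posixpath.join(*plain) if plain else posixpath.curdir
        return super().__new__(cls, joined)

    def __add__(self, other):
        # Plain string concatenation, but the result stays a Path.
        return self.__class__(str(self) + str(other))

    @property
    def parent(self):
        return self.__class__(posixpath.dirname(str(self)))

    @property
    def name(self):
        return posixpath.basename(str(self))


p = Path(Path("directory"), "subdirectory", "file")
print(p)                  # directory/subdirectory/file
print(p.parent)           # directory/subdirectory
print(Path("ab") + "c")   # abc
```

Because the constructor simply defers to join, an absolute later argument still discards the earlier ones; a "safe" variant would have to check each argument before joining.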

I would say that a directory-tuple Path object with these features
could be maintained in parallel, but since the remaining functions
require string arguments you'd have to use unicode() a lot.

-- 
Mike Orr <[EMAIL PROTECTED]>
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Martin v. Löwis
Travis E. Oliphant wrote:
>> 2. Should primitive type codes be characters or integers (from an enum) at
>> C level?
>> - I prefer integers
> 
>> 3. Should size be expressed in bits or bytes?
>> - I prefer bits
>>
> 
> So, you want an integer enum for the "kind" and an integer for the 
> bitsize?   That's fine with me.
> 
> One thing I just remembered.  We have T_UBYTE and T_BYTE, etc. defined 
> in structmember.h already.  Should we just re-use those #defines while 
> adding to them to make an easy to use interface for primitive types?

Notice that those type codes imply sizes, namely the platform sizes
(where "platform" always means "what the C compiler does"). So if
you want to have platform-independent codes as well, you shouldn't
use the T_ codes.
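
The difference between platform sizes and platform-independent sizes can be seen from Python with the struct module, used here only as a stand-in for the T_ codes: its native mode follows the C compiler, while its standard mode fixes the sizes. This is a sketch of the distinction, not of the proposed data-type object.

```python
import struct

CODES = "hilq"  # short, int, long, long long

# Native ("@") sizes follow the C compiler, so they can differ by platform.
native = {code: struct.calcsize("@" + code) for code in CODES}

# Standard ("=") sizes are fixed by the struct module on every platform.
standard = {code: struct.calcsize("=" + code) for code in CODES}

print("native:  ", native)
print("standard:", standard)
```

On a typical 64-bit Linux build the native size of "l" is 8, while on 64-bit Windows it is 4; the standard size is 4 everywhere.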

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Alexander Belopolsky
Travis E. Oliphant  ieee.org> writes:

> 
> Alexander Belopolsky wrote:
> > ...
> > 1. Should primitive types be associated with simple type codes
> > (short, int, long, float, double) or type/size pairs [(int,16),
> > (int, 32), (int, 64), (float, 32), (float, 64)]?
> >  - I prefer pairs
> > 
> > 2. Should primitive type codes be characters or integers (from an
> > enum) at C level?
> > - I prefer integers
> 
> Are these orthogonal?
> 

Do you mean, are my questions 1 and 2 orthogonal?  I guess they are.

> > 
> > 3. Should size be expressed in bits or bytes?
> > - I prefer bits
> > 
> 
> So, you want an integer enum for the "kind" and an integer for the 
> bitsize?   That's fine with me.
> 
> One thing I just remembered.  We have T_UBYTE and T_BYTE, etc. defined 
> in structmember.h already.  Should we just re-use those #defines while 
> adding to them to make an easy to use interface for primitive types?
> 

I was thinking about using something like the NPY_TYPES enum, but the T_*
codes would work as well.  Let me just present both options for the
record:

--- numpy/ndarrayobject.h ---

enum NPY_TYPES {NPY_BOOL=0,
                NPY_BYTE, NPY_UBYTE,
                NPY_SHORT, NPY_USHORT,
                NPY_INT, NPY_UINT,
                NPY_LONG, NPY_ULONG,
                NPY_LONGLONG, NPY_ULONGLONG,
                NPY_FLOAT, NPY_DOUBLE, NPY_LONGDOUBLE,
                NPY_CFLOAT, NPY_CDOUBLE, NPY_CLONGDOUBLE,
                NPY_OBJECT=17,
                NPY_STRING, NPY_UNICODE,
                NPY_VOID,
                NPY_NTYPES,
                NPY_NOTYPE,
                NPY_CHAR,        /* special flag */
                NPY_USERDEF=256  /* leave room for characters */
};

--- structmember.h ---

/* Types */
#define T_SHORT          0
#define T_INT            1
#define T_LONG           2
#define T_FLOAT          3
#define T_DOUBLE         4
#define T_STRING         5
#define T_OBJECT         6
/* XXX the ordering here is weird for binary compatibility */
#define T_CHAR           7   /* 1-character string */
#define T_BYTE           8   /* 8-bit signed int */
/* unsigned variants: */
#define T_UBYTE          9
#define T_USHORT         10
#define T_UINT           11
#define T_ULONG          12

/* Added by Jack: strings contained in the structure */
#define T_STRING_INPLACE 13

#define T_OBJECT_EX      16  /* Like T_OBJECT, but raises AttributeError
                                when the value is NULL, instead of
                                converting to None. */
#ifdef HAVE_LONG_LONG
#define T_LONGLONG       17
#define T_ULONGLONG      18
#endif /* HAVE_LONG_LONG */






Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis E. Oliphant
Alexander Belopolsky wrote:
> Travis Oliphant  ieee.org> writes:
>> Don't lump those ideas together.  Shapes and strides are necessary for 
>> N-dimensional arrays (it's essentially what *defines* the N-dimensional 
>> array).   I really don't want to sacrifice those in the extended buffer 
>> protocol.  If you want to separate them into different functions then 
>> that is a possibility.
>>
> 
> I don't understand.  Do you want to discuss shapes and strides separately
> from the datatype or not? Note that in ctypes shape is a property of 
> datatype (as in c_int*2*3).   In your proposal, shapes and strides are
> communicated separately.  This presents a unique memory management
> challenge: if the object does not contain shape information in a ready to
> be pointed to form, who is responsible for deallocating the shape array?  
>  

Perhaps a "view object" should be returned like /F suggests and it 
manages the shape, strides, and data-format.


>>> If we manage to agree on the standard way to pass primitive type
>>> information, it will be a big achievement and immediately useful
>>> because simple arrays are already in the standard library.
>>>
>> We could start there, I suppose.  Especially if it helps us all get on 
>> the same page.
> 
> Let's start:
> 
> 1. Should primitive types be associated with simple type codes (short,
> int, long, float, double) or type/size pairs [(int,16), (int, 32),
> (int, 64), (float, 32), (float, 64)]?
>  - I prefer pairs
> 

> 2. Should primitive type codes be characters or integers (from an enum) at
> C level?
> - I prefer integers

Are these orthogonal?

> 
> 3. Should size be expressed in bits or bytes?
> - I prefer bits
> 

So, you want an integer enum for the "kind" and an integer for the 
bitsize?   That's fine with me.

One thing I just remembered.  We have T_UBYTE and T_BYTE, etc. defined 
in structmember.h already.  Should we just re-use those #defines while 
adding to them to make an easy to use interface for primitive types?

-Travis




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Alexander Belopolsky
Travis Oliphant  ieee.org> writes:
>
> Don't lump those ideas together.  Shapes and strides are necessary for 
> N-dimensional arrays (it's essentially what *defines* the N-dimensional 
> array).   I really don't want to sacrifice those in the extended buffer 
> protocol.  If you want to separate them into different functions then 
> that is a possibility.
>

I don't understand.  Do you want to discuss shapes and strides separately
from the datatype or not? Note that in ctypes shape is a property of 
datatype (as in c_int*2*3).   In your proposal, shapes and strides are
communicated separately.  This presents a unique memory management
challenge: if the object does not contain shape information in a ready to
be pointed to form, who is responsible for deallocating the shape array?  
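
To make the ctypes point concrete, here is a small sketch showing that the shape of c_int*2*3 can be read back from the type object itself, with no separately allocated shape array. The helper name shape_of is hypothetical; _length_ and _type_ are the attributes ctypes sets on array types.

```python
import ctypes

# c_int * 2 * 3 is an array type holding 3 arrays of 2 ints each.
Grid = ctypes.c_int * 2 * 3


def shape_of(array_type):
    """Walk nested ctypes array types and collect their lengths."""
    dims = []
    while issubclass(array_type, ctypes.Array):
        dims.append(array_type._length_)
        array_type = array_type._type_
    return tuple(dims), array_type


shape, base = shape_of(Grid)
print(shape)                                              # (3, 2)
print(ctypes.sizeof(Grid) == 6 * ctypes.sizeof(base))     # True
```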
 
> > 
> > If we manage to agree on the standard way to pass primitive type
> > information, it will be a big achievement and immediately useful
> > because simple arrays are already in the standard library.
> > 
> 
> We could start there, I suppose.  Especially if it helps us all get on 
> the same page.

Let's start:

1. Should primitive types be associated with simple type codes (short,
int, long, float, double) or type/size pairs [(int,16), (int, 32),
(int, 64), (float, 32), (float, 64)]?
 - I prefer pairs

2. Should primitive type codes be characters or integers (from an enum) at
C level?
- I prefer integers

3. Should size be expressed in bits or bytes?
- I prefer bits
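
The three preferences above (pairs, integer kinds, sizes in bits) could be sketched in Python as follows. The names Kind and TypeCode and the particular integer values are hypothetical; the real question is about the C-level representation, which this only mirrors.

```python
from enum import IntEnum
from typing import NamedTuple


class Kind(IntEnum):
    # Hypothetical integer codes for the "kind" half of a type code.
    INT = 0
    UINT = 1
    FLOAT = 2


class TypeCode(NamedTuple):
    kind: Kind
    bits: int   # size expressed in bits, not bytes

    @property
    def nbytes(self):
        # Whole-byte types only; sub-byte sizes would need more care.
        return self.bits // 8


INT16 = TypeCode(Kind.INT, 16)
FLOAT64 = TypeCode(Kind.FLOAT, 64)
print(INT16, INT16.nbytes)
print(FLOAT64, FLOAT64.nbytes)
```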




Re: [Python-Dev] Path object design

2006-11-01 Thread Mike Orr
On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On 01:46 am, [EMAIL PROTECTED] wrote:
> >On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> >This is ironic coming from one of Python's celebrity geniuses.  "We
> >made this class but we don't know how it works."  Actually, it's
> >downright alarming coming from someone who knows Twisted inside and
> >out yet still can't make sense of path platform oddities.
>
> Man, it is going to be hard being ironically self-deprecating if people keep
> going around calling me a "celebrity genius".  My ego doesn't need any help,
> you know? :)

I respect Twisted in the same way I respect a loaded gun.  It's
powerful, but approach with caution.

> If you ever think I'm suggesting breaking something in Python, you're
> misinterpreting me ;).  I am as cagey as they come about this.  No matter
> what else happens, the behavior of os.path should not really change.

The point is, what *should* a join-like method do in a future improved
path module?  os.path.join should not change because too many programs
depend on its current behavior, in ways we can't necessarily predict.
But a new function/method is not bound by these constraints, as long
as the boundary cases are well documented.  All the os.path and
file-related os/shutil functions need to be reexamined in this
context.  Maybe the existing behavior is best, maybe we'll keep it
even if it's sub-optimal, but we should document why we're making
these choices.

> >The user didn't call normpath, so should we normalize it anyway?
>
> That's really the main point here.
>
> What is a path that hasn't been "normalized"?  Is it a path at all, or is it
> some random garbage with slashes (or maybe other things) in it?  os.path
> performs correct path algebra on correct inputs, and it's correct (as far as
> one can be correct) on inputs that have weird junk in them.

I'm tempted to say Path("/a/b").join("c", "d") should do the same
thing your .child method does, but allow multiple levels in one step.

But on the other hand, there will always be people with prebuilt
"path/fragments" to join to other fragments, and I'm not sure we
should force them to split the fragment just to rejoin it again.
Maybe we need a .join_unsafe method for this, haha.
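
A rough sketch of the two behaviours discussed above, written as plain functions over posixpath. The names child, join and join_unsafe follow the message, but the bodies are assumptions and are not Twisted's FilePath.child.

```python
import posixpath


def child(base, segment):
    """Append exactly one path segment, refusing anything that could escape."""
    if not segment or "/" in segment or segment in (".", ".."):
        raise ValueError("not a single path segment: %r" % (segment,))
    return posixpath.join(base, segment)


def join(base, *segments):
    """Apply child() once per segment, so every level is checked."""
    for segment in segments:
        base = child(base, segment)
    return base


def join_unsafe(base, fragment):
    """Accept a prebuilt fragment as-is, with os.path.join semantics."""
    return posixpath.join(base, fragment)


print(join("/a/b", "c", "d"))        # /a/b/c/d
print(join_unsafe("/a/b", "c/d"))    # /a/b/c/d
```

With this split, join("/a/b", "c/d") raises ValueError, while the caller who really does hold a prebuilt fragment has to say so by calling join_unsafe.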

-- 
Mike Orr <[EMAIL PROTECTED]>


Re: [Python-Dev] Path object design

2006-11-01 Thread glyph
On 01:46 am, [EMAIL PROTECTED] wrote:
>On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

>This is ironic coming from one of Python's celebrity geniuses.  "We
>made this class but we don't know how it works."  Actually, it's
>downright alarming coming from someone who knows Twisted inside and
>out yet still can't make sense of path platform oddities.

Man, it is going to be hard being ironically self-deprecating if people
keep going around calling me a "celebrity genius".  My ego doesn't need
any help, you know? :)

In some sense I was being serious; part of the point of abstraction is
embedding some of your knowledge in your code so you don't have to keep
it around in your brain all the time.  I'm sure that my analysis of
path-based problems wasn't exhaustive because I don't really use os.path
for path manipulation.  I use static.File and it _works_, I only
remember these os.path flaws from the process of writing it, not daily
use.

>>  * This is confusing as heck:
>>    >>> os.path.join("hello", "/world")
>>    '/world'
>
>That's in the documentation.  I'm not sure it's "wrong".  What should
>it do in this situation?  Pretend the slash isn't there?

You can document anything.  That doesn't really make it a good idea.

The point I was trying to make wasn't really that os.path is *wrong*.
Far from it, in fact, it defines some useful operations and they are
basically always correct.  I didn't even say "wrong", I said
"confusing".  FilePath is implemented strictly in terms of os.path
because it _does_ do the right thing with its inputs.  The question is,
how hard is it to remember what its inputs should be?

>>    >>> os.path.join("hello", "slash/world")
>>    'hello/slash/world'
>
>That has always been a loophole in the function, and many programs
>depend on it.

If you ever think I'm suggesting breaking something in Python, you're
misinterpreting me ;).  I am as cagey as they come about this.  No
matter what else happens, the behavior of os.path should not really
change.

>The user didn't call normpath, so should we normalize it anyway?

That's really the main point here.

What is a path that hasn't been "normalized"?  Is it a path at all, or
is it some random garbage with slashes (or maybe other things) in it?
os.path performs correct path algebra on correct inputs, and it's
correct (as far as one can be correct) on inputs that have weird junk in
them.

In the strings-and-functions model of paths, this all makes perfect
sense, and there's no particular sensibility associated with defining
ideas like "equivalency" for paths, unless that's yet another function
you pass some strings to.  I definitely prefer this:

    path1 == path2

to this:

    os.path.abspath(pathstr1) == os.path.abspath(pathstr2)

though.

You'll notice I used abspath instead of normpath.  As a side note, I've
found interpreting relative paths as always relative to the current
directory is a bad idea.  You can see this when you have a daemon that
daemonizes and then opens files: the user thinks they're specifying
relative paths from wherever they were when they ran the program, the
program thinks they're relative paths from /var/run/whatever.  Relative
paths, if they should exist at all, should have to be explicitly linked
as relative to something *else* (e.g. made absolute) before they can be
used.  I think that sequences of strings might be sufficient though.

>Good point, but exactly what functionality do you want to see for zip
>files and URLs?  Just pathname manipulation?  Or the ability to see
>whether a file exists and extract it, copy it, etc?

The latter.  See
http://twistedmatrix.com/trac/browser/trunk/twisted/python/zippath.py

This is still _really_ raw functionality though.  I can't claim that it
has the same "it's been used in real code" endorsement as the rest of
the FilePath stuff I've been talking about.  I've never even tried to
hook this up to a Twisted webserver, and I've only used it in one
environment.

>>  * you have to care about unicode sometimes.
>
>This is a Python-wide problem.

I completely agree, and this isn't the thread to try to solve it.  The
absence of a path object, however, and the path module's reliance on
strings, exacerbates the problem.  The fact that FilePath doesn't deal
with this either, however, is a fairly good indication that the problem
is deeper than that.

>>  * the documentation really can't emphasize enough how bad using
>> 'os.path.exists/isfile/isdir', and then assuming the file continues to exist
>> when it is a contended resource, is.  It can be handy, but it is _always_ a
>> race condition.
>
>What else can you do?  It's either os.path.exists()/os.remove() or "do
>it anyway and catch the exception".  And sometimes you have to check
>the filetype in order to determine *what* to do.

You have to catch the exception anyway in many cases.  I probably
shouldn't have mentioned it though, it's starting to get a bit far
afield of even this ridiculously far-ranging discussion.  A more
accurate criticism might be that "the absence of a file locking syste

Re: [Python-Dev] Path object design

2006-11-01 Thread Mike Orr
On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On 08:14 pm, [EMAIL PROTECTED] wrote:

> >(...) people have had to spend five years putting hard-to-read
> >os.path functions in the code, or reinventing the wheel with their own
> >libraries that they're not sure they can trust.  I started to use
> >path.py last year when it looked like it was emerging as the basis of
> >a new standard, but yanked it out again when it was clear the API
> >would be different by the time it's accepted.  I've gone back to
> >os.path for now until something stable emerges but I really wish I
> >didn't have to.
>
> You *don't* have to.  This is a weird attitude I've encountered over and
> over again in the Python community, although sometimes it masquerades as
> resistance to Twisted or Zope or whatever.  It's OK to use libraries.  It's
> OK even to use libraries that Guido doesn't like!  I'm pretty sure the first
> person to tell you that would be Guido himself.  (Well, second, since I just
> told you.)  If you like path.py and it solves your problems, use path.py.
> You don't have to cram it into the standard library to do that.  It won't be
> any harder to migrate from an old path object to a new path object than from
> os.path to a new path object, and in fact it would likely be considerably
> easier.

Oh, I understand it's OK to use libraries.  It's just that a path
library needs to be widely tested and well supported so you know it
won't scramble your files.  A bug in a date library affects only
datetimes. A bug in a database library affects only that
database.  A bug in a template library affects only the page being
output.  But a bug in a path library could ruin your whole day.  "Um,
remember those important files in that other project directory you
weren't working in? They were just overwritten."

Also, I train several programmers new to Python at work. I want to
make them learn *one* path library that we'll be sure to stick with
for several years.  Every path library has subtle quirks, and
switching from one to another may not be just a matter of renaming
methods.

> >- the "secure" features may not be necessary.  If they are, this
> >should be a separate discussion, and perhaps implemented as a
> >subclass.
>
> The main "secure" feature is "child" and it is, in my opinion, the best part
> about the whole class.  Some of the other stuff (rummaging around for
> siblings with extensions, for example) is probably extraneous.  child,
> however, lets you take a string from arbitrary user input and map it into a
> path segment, both securely and quietly.  Here's a good example (and this
> actually happened, this is how I know about that crazy windows 'special
> files' thing I wrote in my other recent message): you have a decision-making
> program that makes two files to store information about a process: "pro" and
> "con".  It turns out that "con" is shorthand for "fall in a well and die" in
> win32-ese.  A "secure" path manipulation library would alert you to this
> problem with a traceback rather than having it inexplicably freeze.
> Obscure, sure, but less obscure would be getting deterministic errors from a
> user entering slashes into a text field that shouldn't accept them.

Perhaps you're right.  I'm not saying it *should not* be a basic
feature, just that unless the Python community as a whole is ready for
this, users should have a choice to use it or not.

I learned about DOS device files from the manuals back in the 80s.
But I had completely forgotten them when I made several "aux"
directories in a Subversion repository on Linux.  People tried to
check it out on Windows and... got some kind of error.  "CON" means
console: its input comes from the keyboard and its output goes to the
screen.  Since this is a device file, I'm not sure a path library has
any responsibility to treat it specially.  We don't treat
"/dev/stdout" specially unless the user specifically calls a device
function. I have no idea why Microsoft thought it was a good idea to
put the seven-odd device files in every directory. Why not force
people to type the colon ("CON:")?  If they've memorized what CON
means they should have no trouble with the colon, especially since
it's required with "A:" and "C:" anyway.

For trivia, these are the ones I remember:
CON          Console (keyboard input, screen output)
KBRD         Keyboard input
???          Screen output
LPT1/2/3     Parallel ports
COM1/2/3/4   Serial ports
PRN          Alias for default printer port (normally LPT1)
NUL          Bit bucket
AUX          Game port?

COPY CON FILENAME.TXT   # Unix: "cat >filename.txt".
COPY FILENAME.TXT PRN   # Unix: "lp filename.txt" or "cat filename.txt | lp".
TYPE FILENAME.TXT       # Unix: "cat filename.txt".
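
If a path library did decide to treat these names specially, the check itself would be small. The sketch below assumes the commonly documented set of reserved names (CON, PRN, AUX, NUL, COM1-9, LPT1-9) and assumes that case and any extension are ignored, which holds on many but not necessarily all Windows versions. The function name is hypothetical.

```python
# Assumed set of reserved DOS device names; not taken from the list above.
RESERVED = {"CON", "PRN", "AUX", "NUL"}
RESERVED |= {"COM%d" % n for n in range(1, 10)}
RESERVED |= {"LPT%d" % n for n in range(1, 10)}


def is_dos_device_name(segment):
    """Return True if a single path segment names a DOS device."""
    # Assumption: "con.txt" and "Con" are treated like "CON".
    stem = segment.split(".", 1)[0].rstrip(" ")
    return stem.upper() in RESERVED


for name in ("pro", "con", "aux", "CON.txt", "console"):
    print(name, is_dos_device_name(name))
```

A "secure" child() method could call such a check and raise instead of returning a path, which is the traceback-rather-than-freeze behaviour described in the quoted message.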

> >Where have all the proponents of non-OO or limited-OO strategies been?
>
> This continuum doesn't make any sense to me.  Where would you

Re: [Python-Dev] Path object design

2006-11-01 Thread Mike Orr
On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> On 06:14 pm, [EMAIL PROTECTED] wrote:
> >[EMAIL PROTECTED] wrote:
> >
> >> I assert that it needs a better[1] interface because the current
> >> interface can lead to a variety of bugs through idiomatic, apparently
> >> correct usage.  All the more because many of those bugs are related to
> >> critical errors such as security and data integrity.
>
> >instead of referring to some esoteric knowledge about file systems that
> >us non-twisted-using mere mortals may not be evolved enough to
> >understand,
>
> On the contrary, twisted users understand even less, because (A) we've been
> demonstrated to get it wrong on numerous occasions in highly public and
> embarrassing ways and (B) we already have this class that does it all for us
> and we can't remember how it works :-).

This is ironic coming from one of Python's celebrity geniuses.  "We
made this class but we don't know how it works."  Actually, it's
downright alarming coming from someone who knows Twisted inside and
out yet still can't make sense of path platform oddities.

>  * This is confusing as heck:
>    >>> os.path.join("hello", "/world")
>    '/world'

That's in the documentation.  I'm not sure it's "wrong".  What should
it do in this situation?  Pretend the slash isn't there?

This came up in the directory-tuple proposal.  I said there was no
reason to change the existing behavior of join.  Noam favored an
exception.

>    >>> os.path.join("hello", "slash/world")
>    'hello/slash/world'

That has always been a loophole in the function, and many programs
depend on it.  Again, is it "wrong"?  Should an embedded separator in
an argument be an error?  Obviously this depends on the user's
knowledge that the separator happens to be slash.

>    >>> os.path.join("hello", "slash//world")
>    'hello/slash//world'

Again a case of what "should" it do?  The filesystem treats it as a
single slash.  The user didn't call normpath, so should we normalize
it anyway?
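
For reference, the three cases can be reproduced on any platform with posixpath, which is what os.path resolves to on POSIX systems. The last line shows what an explicit normpath call would add; whether a new join should do that implicitly is the open question.

```python
import posixpath

print(posixpath.join("hello", "/world"))         # /world
print(posixpath.join("hello", "slash/world"))    # hello/slash/world
print(posixpath.join("hello", "slash//world"))   # hello/slash//world

# Only an explicit normpath collapses the doubled separator.
print(posixpath.normpath("hello/slash//world"))  # hello/slash/world
```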

>  * Sometimes a path isn't a path; the zip "paths" in sys.path are a good
> example.  This is why I'm a big fan of including a polymorphic interface of
> some kind: this information is *already* being persisted in an ad-hoc and
> broken way now, so it needs to be represented; it would be good if it were
> actually represented properly.  URL
> manipulation-as-path-manipulation is another; the recent
> perforce use-case mentioned here is a special case of that, I think.

Good point, but exactly what functionality do you want to see for zip
files and URLs?  Just pathname manipulation?  Or the ability to see
whether a file exists and extract it, copy it, etc?

>  * you have to care about unicode sometimes.  rarely enough that none of
> your tests will ever account for it, but often enough that _some_ users will
> notice breakage if your code is ever widely distributed.

This is a Python-wide problem.  The move to universal unicode will
lessen this, or at least move the problem to *one* place (creating the
unicode object), where every Python programmer will get bitten by it
and we'll develop a few standard strategies to deal with it.

(The problem is that if str and unicode are mixed in expressions,
Python will promote the str to unicode and you'll get a
UnicodeDecodeError if it contains non-ASCII characters.  Figuring out
all the ways such strings can slip into a program is difficult if
you're dealing with user strings from an unknown charset, or your
MySQL server is configured differently than you thought it was, or the
string contains Windows curly quotes et al which are undefined in
Latin-1.)
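
The failure mode can be sketched in Python 3 terms, where the implicit promotion described above no longer exists and the decode step has to be written out. The byte strings are made-up examples; the point is that decoding happens in one place, with a charset chosen explicitly.

```python
raw = b"caf\xc3\xa9"  # UTF-8 bytes for "café"

try:
    raw.decode("ascii")  # what the implicit str-to-unicode promotion did
except UnicodeDecodeError as exc:
    print("ascii decode failed:", exc.reason)

# Decoding once, at the boundary, with the charset the data really uses.
text = raw.decode("utf-8")

# Windows curly quotes are defined in cp1252; Python's latin-1 codec maps
# the same bytes to C1 control characters instead.
curly = b"\x93quoted\x94".decode("cp1252")
control = b"\x93".decode("latin-1")
print(text, curly, repr(control))
```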

>  * the documentation really can't emphasize enough how bad using
> 'os.path.exists/isfile/isdir', and then assuming the file continues to exist
> when it is a contended resource, is.  It can be handy, but it is _always_ a
> race condition.

What else can you do?  It's either os.path.exists()/os.remove() or "do
it anyway and catch the exception".  And sometimes you have to check
the filetype in order to determine *what* to do.
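
The "do it anyway and catch the exception" option looks roughly like this in current Python. The helper name is hypothetical; FileNotFoundError is the Python 3 exception, whereas code of this era would catch OSError and inspect errno.

```python
import os
import tempfile


def remove_if_present(path):
    """Remove path; report whether a file was actually removed."""
    try:
        os.remove(path)
    except FileNotFoundError:
        # Someone else removed it first, or it never existed.
        return False
    return True


fd, tmp = tempfile.mkstemp()
os.close(fd)
print(remove_if_present(tmp))   # True
print(remove_if_present(tmp))   # False
```

There is no window between a check and the removal here, which is the race the quoted point is about; checking the file type first still has that window.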

-- 
Mike Orr <[EMAIL PROTECTED]>


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis Oliphant
Paul Moore wrote:
> 
> 
> Enough of the abstract. As a concrete example, suppose I have a (byte)
> string in my program containing some binary data - an ID3 header, or a
> TCP packet, or whatever. It doesn't really matter. Does your proposal
> offer anything to me in how I might manipulate that data (assuming I'm
> not using NumPy)? (I'm not insisting that it should, I'm just trying
> to understand the scope of the PEP).
> 

What do you mean by "manipulate the data"?  The proposal for a
data-format object would help you describe that data in a standard way
and therefore share that data between several libraries that would be
able to understand the data (because they all use and/or understand the
default Python way to handle data-formats).

It would be up to the other packages to "manipulate" the data.

So, what you would be able to do is take your byte-string and create a 
buffer object which you could then share with other packages:

Example:

b = buffer(bytestr, format=data_format_object)

Now.

a = numpy.frombuffer(b)
a['field1']  # prints data stored in the field named "field1"

etc.

Or.

cobj = ctypes.frombuffer(b)

# Now, cobj is a ctypes object that is basically a "structure" that can
# be passed directly to your C-code.
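
The buffer(..., format=...) and ctypes.frombuffer calls above are proposals, not existing APIs. The nearest thing that can be run with ctypes as it exists is to declare the layout as a Structure and copy the byte string into it with from_buffer_copy. The Header layout below is a made-up example, not a real packet format.

```python
import ctypes


class Header(ctypes.BigEndianStructure):
    # Hypothetical layout: two 16-bit big-endian port numbers.
    _fields_ = [("src_port", ctypes.c_uint16),
                ("dst_port", ctypes.c_uint16)]


bytestr = b"\x00\x50\x1f\x90"

# from_buffer_copy copies the bytes into a new Header instance.
cobj = Header.from_buffer_copy(bytestr)
print(cobj.src_port, cobj.dst_port)   # 80 8080
```

Here the format description lives in the ctypes class; the proposal is about attaching an equivalent description to the buffer itself so that other packages can read it.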

Does this help?

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis Oliphant
Alexander Belopolsky wrote:
> Travis Oliphant  ieee.org> writes:
> 
> >>> b = buffer(array('d', [1,2,3]))
> 
> there is not much that I can do with b.  For example, if I want to pass it to
> numpy, I will have to provide the type and shape information myself:
> 
> >>> numpy.ndarray(shape=(3,), dtype=float, buffer=b)
> array([ 1.,  2.,  3.])
> 
> With the extended buffer protocol, I should be able to do
> 
> >>> numpy.array(b)

or just

numpy.array(array.array('d',[1,2,3]))

and leave out the buffer object altogether.
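
For comparison, in current Python 3 a memoryview over a standard-library array already carries the element format and shape along with the bytes, which is the kind of information being asked for here. This sketch uses only the standard library and says nothing about how numpy consumes it.

```python
from array import array

m = memoryview(array("d", [1, 2, 3]))

# The view describes its own contents: element format, size and shape.
print(m.format, m.itemsize, m.shape)   # d 8 (3,)
print(m.tolist())                      # [1.0, 2.0, 3.0]
```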


> 
> 
> So let's start by solving this problem and limit it to data that can be found
> in a standard library array.  This way we can postpone the discussion of 
> shapes,
> strides and nested structs.

Don't lump those ideas together.  Shapes and strides are necessary for 
N-dimensional arrays (it's essentially what *defines* the N-dimensional 
array).   I really don't want to sacrifice those in the extended buffer 
protocol.  If you want to separate them into different functions then 
that is a possibility.

> 
> If we manage to agree on the standard way to pass primitive type information,
> it will be a big achievement and immediately useful because simple arrays are
> already in the standard library.
> 

We could start there, I suppose.  Especially if it helps us all get on 
the same page.  But, we already see the applications beyond this simple 
case so I would like to have at least an "eye" for the more difficult 
case which we already have a working solution for in the "array interface".

-Travis



Re: [Python-Dev] PEP: Extending the buffer protocol to share array information.

2006-11-01 Thread Travis Oliphant
Fredrik Lundh wrote:
> Chris Barker wrote:
> 
> 
>>While /F suggested we get off the PIL bandwagon
> 
> 
> I suggest we drop the obsession with pointers to memory areas that are 
> supposed to have a specific format; modern data access API:s don't work 
> that way for good reasons, so I don't see why Python should grow a 
> standard based on that kind of model.
> 
> the "right solution" for things like this is an *API* that lets you do 
> things like:
> 
>  view = object.acquire_view(region, supported formats)
>  ... access data in view ...
>  view.release()
> 
> and, for advanced users
> 
>  format = object.query_format(constraints)

So, if the extended buffer protocol were enhanced to enforce this kind 
of viewing and release, then would you support it?

Basically, the extended buffer protocol would, at the same time as 
providing *more* information about the "view", require the implementer to 
understand the idea of "holding" and "releasing" the view.

Would this basically require the object supporting the extended buffer 
protocol to keep some kind of list of who has views (or at least a 
number indicating how many views there are)?
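
A rough sketch of the "number indicating how many views there are" idea, assuming a purely hypothetical exporter class. `ViewableBuffer`, `View`, `acquire_view` and `resize` are illustrative names only and do not correspond to any existing or proposed C-level API.

```python
class View:
    """A handle on the exporter's data that must be released explicitly."""

    def __init__(self, owner):
        self._owner = owner
        self._released = False

    def __getitem__(self, i):
        if self._released:
            raise ValueError("view has been released")
        return self._owner._data[i]

    def release(self):
        # Releasing twice is harmless; the count drops only once.
        if not self._released:
            self._released = True
            self._owner._views -= 1


class ViewableBuffer:
    """Hypothetical exporter that counts its outstanding views."""

    def __init__(self, data):
        self._data = bytearray(data)
        self._views = 0

    def acquire_view(self):
        self._views += 1
        return View(self)

    def resize(self, new_size):
        # Refuse to move or shrink the memory while anyone holds a view.
        if self._views:
            raise BufferError("cannot resize while views are held")
        if new_size < len(self._data):
            del self._data[new_size:]
        else:
            self._data.extend(bytes(new_size - len(self._data)))
```

A plain counter is enough for the exporter to refuse a resize; a full list of view holders would only be needed if the exporter had to notify them.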


-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Paul Moore
On 11/1/06, Alexander Belopolsky <[EMAIL PROTECTED]> wrote:
> Let's just start with that.  The way I see the problem is that buffer protocol
> is fine as long as your data is an array of bytes, but if it is an array of
> doubles, you are out of luck. So, while I can do
>
> >>> b = buffer(array('d', [1,2,3]))
>
> there is not much that I can do with b.  For example, if I want to pass it to
> numpy, I will have to provide the type and shape information myself:
>
> >>> numpy.ndarray(shape=(3,), dtype=float, buffer=b)
> array([ 1.,  2.,  3.])
>
> With the extended buffer protocol, I should be able to do
>
> >>> numpy.array(b)

As a data point, this is the first posting that has clearly explained
to me what the two PEPs are attempting to achieve. That may be my
blindness to what others find self-evident, but equally, I may not be
the only one who needed this example...

Paul.


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Alexander Belopolsky
Travis Oliphant  ieee.org> writes:
> Frankly, I'd be happy enough to start with 
> "typecodes" in the extended buffer protocol (that's where the array 
> module is now) and then move up to something more complete later.
> 

Let's just start with that.  The way I see the problem is that buffer protocol
is fine as long as your data is an array of bytes, but if it is an array of
doubles, you are out of luck. So, while I can do

>>> b = buffer(array('d', [1,2,3]))

there is not much that I can do with b.  For example, if I want to pass it to
numpy, I will have to provide the type and shape information myself:

>>> numpy.ndarray(shape=(3,), dtype=float, buffer=b)
array([ 1.,  2.,  3.])

With the extended buffer protocol, I should be able to do

>>> numpy.array(b)

So let's start by solving this problem and limit it to data that can be found
in a standard library array.  This way we can postpone the discussion of shapes,
strides and nested structs.

I propose a simple bf_gettypeinfo(PyObject *obj, int* type, int* bitsize) method
that would return a type code and the size of the data item.

I believe it is better to have type codes free from size information for
several reasons:

1. Generic code can use size information directly without having to know
that int is 32 and double is 64 bits.

2. Odd sizes can be easily described without having to add a new type code.

3. I assume that the existing bf_ functions would still return size in bytes,
so having item size available as an int will help to get number of items.
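
The proposal above can be mimicked at the Python level as a sketch. The `KIND_*` constants and the functions `gettypeinfo` and `item_count` are hypothetical stand-ins for the proposed `bf_gettypeinfo`, not an existing API; only `array.array` and its `typecode` and `itemsize` attributes are real.

```python
from array import array

# Hypothetical type-kind codes that carry no size information.
KIND_INT, KIND_UINT, KIND_FLOAT = range(3)

_KINDS = {
    'b': KIND_INT, 'h': KIND_INT, 'i': KIND_INT, 'l': KIND_INT,
    'B': KIND_UINT, 'H': KIND_UINT, 'I': KIND_UINT, 'L': KIND_UINT,
    'f': KIND_FLOAT, 'd': KIND_FLOAT,
}


def gettypeinfo(arr):
    """Return (kind, bitsize) for an array.array, like the proposed call."""
    return _KINDS[arr.typecode], arr.itemsize * 8


def item_count(nbytes, bitsize):
    """Number of items in a buffer of nbytes bytes (reason 3 above)."""
    return nbytes * 8 // bitsize


a = array('d', [1, 2, 3])
kind, bitsize = gettypeinfo(a)
nbytes = len(a.tobytes())
```

Generic code can then compute `item_count(nbytes, bitsize)` without knowing which C type sits behind the kind code.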

If we manage to agree on the standard way to pass primitive type information,
it will be a big achievement and immediately useful because simple arrays are
already in the standard library.

 







Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis Oliphant
Martin v. Löwis wrote:
> Travis E. Oliphant schrieb:
> 
>>2) complex-valued types (you might argue that it's just a 2-array of 
>>floats, but you could say the same thing about int as an array of 
>>bytes).  The point is how do people interpret the data.  Complex-valued 
>>data-types are very common.  It is one reason Fortran is still used by 
>>scientists.
> 
> 
> Well, by the same reasoning, you could argue that pixel values (RGBA)
> are missing in the PEP. It's a convenience, sure, and it may also help
> interfacing with the platform's FORTRAN implementation - however, are
> you sure that NumPy's complex layout is consistent with the platform's
> C99 _Complex definition?
> 

I think so (it is on gcc).  And yes, where you draw the line between 
fundamental and "derived" data-type is somewhat arbitrary.  I'd rather 
include complex-numbers than not given their prevalence in the 
data-streams I'm trying to make compatible with each other.

> 
>>3) Unicode characters
>>
>>4) What about floating-point representations that are not IEEE 754 
>>4-byte or 8-byte.
> 
> 
> Both of these are available in a platform-dependent way: if the
> platform uses non-IEEE754 formats for C float and C double, ctypes
> will interface with that just fine. It is actually vice versa:
> IEEE-754 4-byte and 8-byte is not supported in ctypes.

That's what I meant.  The 'f' kind in the data-type description is also 
intended to mean "platform float" whatever that is.  But, a complete 
data-format representation would have a way to describe other 
bit-layouts for floating point representation.  Even if you can't 
actually calculate directly with them without conversion.

> Same for Unicode: the platform's wchar_t is supported (as you said),
> but not a platform-independent (say) 4-byte little-endian.

Right.

It's a matter of scope.  Frankly, I'd be happy enough to start with 
"typecodes" in the extended buffer protocol (that's where the array 
module is now) and then move up to something more complete later.

But, since we already have an array interface for record-arrays to share 
information and data with each other, and ctypes showing all of its 
power, then why not be more complete?



-Travis



Re: [Python-Dev] Path object design

2006-11-01 Thread glyph
On 08:14 pm, [EMAIL PROTECTED] wrote:
>Argh, it's difficult to respond to one topic that's now spiraling into
>two conversations on two lists.
>[EMAIL PROTECTED] wrote:
>(...) people have had to spend five years putting hard-to-read
>os.path functions in the code, or reinventing the wheel with their own
>libraries that they're not sure they can trust.  I started to use
>path.py last year when it looked like it was emerging as the basis of
>a new standard, but yanked it out again when it was clear the API
>would be different by the time it's accepted.  I've gone back to
>os.path for now until something stable emerges but I really wish I
>didn't have to.

You *don't* have to.  This is a weird attitude I've encountered over and
over again in the Python community, although sometimes it masquerades as
resistance to Twisted or Zope or whatever.  It's OK to use libraries.
It's OK even to use libraries that Guido doesn't like!  I'm pretty sure
the first person to tell you that would be Guido himself.  (Well, second,
since I just told you.)  If you like path.py and it solves your problems,
use path.py.  You don't have to cram it into the standard library to do
that.  It won't be any harder to migrate from an old path object to a new
path object than from os.path to a new path object, and in fact it would
likely be considerably easier.

>> *It is already used in a large body of real, working code, and
>> therefore its limitations are known.*
>
>This is an important consideration.  However, to me a clean API is more
>important.

It's not that I don't think a "clean" API is important.  It's that I think
that "clean" is a subjective assessment that is hard to back up, and it
helps to have some data saying "we think this is clean because there are
very few bugs in this 100,000 line program written using it".  Any code
that is really easy to use right will tend to have *some* aesthetic
appeal.

>I took a quick look at filepath.  It looks similar in concept to PEP
>355.  Four concerns:
>    - unfamiliar method names (createDirectory vs mkdir, child vs join)

Fair enough, but "child" really means child, not join.  It is explicitly
for joining one additional segment, with no slashes in it.

>    - basename/dirname/parent are methods rather than properties:
>leads to () overproliferation in user code.

The () is there because every invocation returns a _new_ object.  I think
that this is correct behavior but I also would prefer that it remain
explicit.

>    - the "secure" features may not be necessary.  If they are, this
>should be a separate discussion, and perhaps implemented as a
>subclass.

The main "secure" feature is "child" and it is, in my opinion, the best
part about the whole class.  Some of the other stuff (rummaging around for
siblings with extensions, for example) is probably extraneous.  child,
however, lets you take a string from arbitrary user input and map it into
a path segment, both securely and quietly.  Here's a good example (and
this actually happened, this is how I know about that crazy windows
'special files' thing I wrote in my other recent message): you have a
decision-making program that makes two files to store information about a
process: "pro" and "con".  It turns out that "con" is shorthand for "fall
in a well and die" in win32-ese.  A "secure" path manipulation library
would alert you to this problem with a traceback rather than having it
inexplicably freeze.  Obscure, sure, but less obscure would be getting
deterministic errors from a user entering slashes into a text field that
shouldn't accept them.

>    - stylistic objection to verbose camelCase names like createDirectory

There is no accounting for taste, I suppose.  Obviously if it violates the
stdlib's naming conventions it would have to be adjusted.

>> Path representation is a bike shed.  Nobody would have proposed
>> writing an entirely new embedded database engine for Python: python
>> 2.5 simply included SQLite because its utility was already proven.
>
>There's a quantum level of difference between path/file manipulation
>-- which has long been considered a requirement for any full-featured
>programming language -- and a database engine which is much more
>complex.

"quantum" means "the smallest possible amount", although I don't think
you're using it like that, so I think I agree with you.  No, it's not as
hard as writing a database engine.  Nevertheless it is a non-trivial
problem, one worthy of having its own library and clearly capable of
generating a fair amount of its own discussion.

>Fredrik has convinced me that it's more urgent to OOize the pathname
>conversions than the filesystem operations.

I agree in the relative values.  I am still unconvinced that either is
"urgent" in the sense that it needs to be in the standard library.

>Where have all the proponents of non-OO or limited-OO strategies been?

This continuum doesn't make any sense to me.  Where would you place
Twisted's solution on it?
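
The "child" behaviour described in this message (join exactly one extra segment, and fail loudly on anything else) might be sketched roughly as follows. This is not Twisted's code: the standalone `child` function, the `InsecurePath` exception defined here and the short device-name list are assumptions made for illustration, and rejecting device names on every platform is a simplification.

```python
import os


class InsecurePath(Exception):
    """Raised when a segment would escape or alias the parent directory."""


# Names that Windows treats as devices in every directory (partial list).
_WINDOWS_DEVICES = {"con", "nul", "prn", "aux"}


def child(parent, segment):
    """Join exactly one extra segment onto parent, refusing anything else."""
    if segment in ("", ".", ".."):
        raise InsecurePath(segment)
    if "/" in segment or "\\" in segment or "\0" in segment:
        raise InsecurePath(segment)
    # "con" and "CON.txt" both name the console device on Windows.
    if segment.split(".")[0].lower() in _WINDOWS_DEVICES:
        raise InsecurePath(segment)
    return os.path.join(parent, segment)
```

With this sketch, `child("decisions", "pro")` returns a joined path, while `child("decisions", "con")` raises `InsecurePath` instead of opening the console device.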

Re: [Python-Dev] PEP: Extending the buffer protocol to share array information.

2006-11-01 Thread Travis Oliphant
Fredrik Lundh wrote:
> Chris Barker wrote:
> 
> 
>>While /F suggested we get off the PIL bandwagon
> 
> 
> I suggest we drop the obsession with pointers to memory areas that are 
> supposed to have a specific format; modern data access API:s don't work 
> that way for good reasons, so I don't see why Python should grow a 
> standard based on that kind of model.
> 

Please give us an example of a modern data-access API (i.e. an 
application that uses one)?

I presume you are not fundamentally opposed to sharing memory given the 
example you gave.

> the "right solution" for things like this is an *API* that lets you do 
> things like:
> 
>  view = object.acquire_view(region, supported formats)
>  ... access data in view ...
>  view.release()
> 
> and, for advanced users
> 
>  format = object.query_format(constraints)
> 

It sounds like you are concerned about the memory-area-not-current 
problem.  Yeah, it can be a problem (but not an unsolvable one). 
Objects that share memory through the buffer protocol just have to be 
careful about resizing themselves or eliminating memory.

Anyway, it's a problem not solved by the buffer protocol.  I have no 
problem with trying to fix that in the buffer protocol, either.

It's all completely separate from what I'm talking about as far as I can 
tell.

-Travis



Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-01 Thread Alexander Belopolsky
On 11/1/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:

> That's also an interesting issue for the datatypes PEP: are datatype
> objects meant to be immutable?
>
That's a question for Travis, but I would think that they would be
immutable at the Python level, but mutable at the C level.  In Travis'
approach array size  is not stored in the datatype, so I don't see
much need to modify datatype objects in-place. It may be reasonable to
allow adding fields to a record, but I don't have enough experience
with that to comment.


> This is particularly interesting for the extended buffer protocol:
> how long can one keep the data you get from bt_getarrayinfo?
>
I think your question is limited to shape and strides outputs because
dataformat is a reference counted PyObject (and PEP should specify
whether it is a borrowed reference).

And the answer is the same as for the data from
bf_getreadbuffer/bf_getwritebuffer.  AFAIK, the existing buffer protocol
does not answer this question, delegating it to the extension module
writers who provide objects exporting their buffers.


> Also, how does the memory management work for the results?

I think it is implied that all pointers are borrowed references.  I
could not find any discussion of memory management in the current
buffer protocol documentation.

This is a good question.  It may be the case that the shape or stride
information is not available as Py_intptr_t array inside the object
that wants to export its memory buffer.  This is not theoretical, I
have a 64-bit application that uses objects that keep their size
information in a 32-bit int.

BTW, I think the memory management issues with the buffer objects have
been resolved at some point.  Any lessons to learn from that?


Re: [Python-Dev] Path object design

2006-11-01 Thread glyph
On 06:14 pm, [EMAIL PROTECTED] wrote:
>[EMAIL PROTECTED] wrote:
>
>> I assert that it needs a better[1] interface because the current
>> interface can lead to a variety of bugs through idiomatic, apparently
>> correct usage.  All the more because many of those bugs are related to
>> critical errors such as security and data integrity.
>
>instead of referring to some esoteric knowledge about file systems that
>us non-twisted-using mere mortals may not be evolved enough to understand,

On the contrary, twisted users understand even less, because (A) we've
been demonstrated to get it wrong on numerous occasions in highly public
and embarrassing ways and (B) we already have this class that does it all
for us and we can't remember how it works :-).

>maybe you could just make a list of common bugs that may arise
>due to idiomatic use of the existing primitives?

Here are some common gotchas that I can think of off the top of my head.
Not all of these are resolved by Twisted's path class:

Path manipulation:

 * This is confusing as heck:

   >>> os.path.join("hello", "/world")
   '/world'
   >>> os.path.join("hello", "slash/world")
   'hello/slash/world'
   >>> os.path.join("hello", "slash//world")
   'hello/slash//world'

   Trying to formulate a general rule for what the arguments to
   os.path.join are supposed to be is really hard.  I can't really figure
   out what it would be like on a non-POSIX/non-win32 platform.

 * it seems like slashes should be more aggressively converted to
   backslashes on windows, because it's near impossible to do anything
   with os.sep in the current situation.

 * "C:blah" does not mean what you think it means on Windows.  Regardless
   of what you think it means, it is not that.  I thought I understood it
   once as the current process having a current directory on every mapped
   drive, but then I had to learn about UNC paths of network mapped drives
   and it stopped making sense again.

 * There are special files on windows such as "CON" and "NUL" which exist
   in _every_ directory.  Twisted does get around this, by looking at the
   result of abspath:

   >>> os.path.abspath("c:/foo/bar/nul")
   'nul'

 * Sometimes a path isn't a path; the zip "paths" in sys.path are a good
   example.  This is why I'm a big fan of including a polymorphic
   interface of some kind: this information is *already* being persisted
   in an ad-hoc and broken way now, so it needs to be represented; it
   would be good if it were actually represented properly.  URL
   manipulation-as-path-manipulation is another; the recent perforce
   use-case mentioned here is a special case of that, I think.

 * paths can have spaces in them and there's no convenient, correct way to
   quote them if you want to pass them to some gross function like
   os.system - and a lot of the code that manipulates paths is
   shell-script-replacement crud which wants to call gross functions like
   os.system.  Maybe this isn't really the path manipulation code's fault,
   but it's where people start looking when they want properly quoted path
   arguments.

 * you have to care about unicode sometimes.  rarely enough that none of
   your tests will ever account for it, but often enough that _some_ users
   will notice breakage if your code is ever widely distributed.  this is
   an even more obscure example, but pygtk always reports pathnames in
   utf8-encoded *byte* strings, regardless of your filesystem encoding.
   If you forget to decode/encode it, hilarity ensues.  There's no
   consistent error reporting (as far as I can tell, I have encountered
   this rarely) and no real way to detect this until you have an actual
   insanely-configured system with an insanely-named file on it to test
   with.  (Polymorphic interfaces might help a *bit* here.  At worst, they
   would at least make it possible to develop a canonical "insanely
   encoded filesystem" test-case backend.  At best, you'd absolutely have
   to work in terms of unicode all the time, and no implicit encoding
   issues would leak through to application code.)  Twisted's thing
   doesn't deal with this at all, and it really should.

 * also *sort* of an encoding issue, although basically only for
   webservers or other network-accessible paths: thanks to some of these
   earlier issues as well as %2e%2e, there are effectively multiple ways
   to spell "..".  Checking for all of them is impossible, you need to use
   the os.path APIs to determine if the paths you've got really relate in
   the ways you think they do.

 * os.pathsep can be, and actually sometimes is, embedded in a path.
   (again, more  of a general path problem, not really python's fault)

 * relative path manipulation is difficult.  ever tried to write the
   function to iterate two separate trees of files in parallel?  shutil
   re-implements this twice completely differently via recursion, and it's
   harder to do with a generator (which is what you really want).  you
   can't really split on os.sep and have it be correct due to the
   aforementioned windows-path issue, but that's what everybody does
   anyway.

 * os.path.split doesn't work anything like str.split.

FS ma
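
Two of the gotchas listed in this message can be reproduced with the standard library alone. The sketch below uses `posixpath` directly so the results do not depend on the host platform; `is_inside` is a hypothetical helper name, and its check is purely lexical (it does not resolve symlinks).

```python
import posixpath
from urllib.parse import unquote

# An absolute second argument silently discards the first one.
joined_absolute = posixpath.join("hello", "/world")
joined_relative = posixpath.join("hello", "slash/world")
joined_doubled = posixpath.join("hello", "slash//world")

# Percent-encoding gives a second spelling of "..", so decode before checking.
decoded = unquote("%2e%2e/secret")


def is_inside(base, candidate):
    """True if candidate, once joined and normalized, is base or lies under it."""
    base = posixpath.normpath(base)
    target = posixpath.normpath(posixpath.join(base, candidate))
    return target == base or target.startswith(base + "/")
```

Checking the normalized result, rather than scanning the input for "..", is one way to follow the advice above about using the os.path APIs to determine how two paths relate.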

Re: [Python-Dev] PEP: Extending the buffer protocol to share array information.

2006-11-01 Thread Fredrik Lundh
Chris Barker wrote:

> While /F suggested we get off the PIL bandwagon

I suggest we drop the obsession with pointers to memory areas that are 
supposed to have a specific format; modern data access API:s don't work 
that way for good reasons, so I don't see why Python should grow a 
standard based on that kind of model.

the "right solution" for things like this is an *API* that lets you do 
things like:

 view = object.acquire_view(region, supported formats)
 ... access data in view ...
 view.release()

and, for advanced users

 format = object.query_format(constraints)





Re: [Python-Dev] PEP: Extending the buffer protocol to share array information.

2006-11-01 Thread Chris Barker
Martin v. Löwis  v.loewis.de> writes:

> Can you please give examples for real-world applications of this
> interface, preferably examples involving multiple
> independently-developed libraries?

OK -- here's one I haven't seen in this thread yet:

wxPython has a lot of code to translate between various Python data types 
and wx data types. An example is PointListHelper. This code examines the 
input Python data, and translates it to a wxList of wxPoints. It is used in 
a bunch of the drawing functions, for instance. It has some nifty 
optimizations so that if a Python list of (x,y) tuples is passed in, then 
the code uses PyList_GetItem() to access the tuples, for instance.

If an Nx2 numpy array is passed in, it defaults to PySequence_GetItem() to get 
the (x,y) pair, and then again to get the values, which are converted to 
Python numbers, then checked and converted again to C ints.

The result is an awful lot of processing, even though the data in the numpy 
array already exists in a C array that could be exactly the same as the wxList 
of wxPoints (in fact, many of the drawing methods take a pointer to a 
correctly formatted C array of data).

Right now, it is faster to convert your numpy array of points to a python list 
of tuples first, then pass it in to wx.

However, were there a standard way to describe a buffer (pointer to a C array 
of data), then the PointListHelper code could look to see if the data is 
already correctly formatted, and pass the pointer right through. If it was 
not, it could still do the translation (like from doubles to ints, for 
instance) far more efficiently.
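
As a rough illustration of that check-then-pass-through idea: the sketch below uses the `format` attribute that `memoryview` objects expose in later Python versions as a stand-in for the standard buffer description being asked for here. `as_int_points` is a hypothetical name, and this is not wxPython's PointListHelper.

```python
from array import array


def as_int_points(obj):
    """Return a flat buffer of C ints laid out as x0, y0, x1, y1, ...

    If obj already exports native C ints it is passed through without
    copying; otherwise each value is converted.
    """
    try:
        view = memoryview(obj)
    except TypeError:
        view = None                      # obj does not export a buffer
    if view is not None and view.ndim == 1 and view.format == 'i':
        return view                      # already in the wanted layout
    if view is not None and view.ndim == 1:
        # A typed buffer of some other kind: convert in one pass.
        return array('i', [int(v) for v in view.tolist()])
    # Fall back to a plain sequence of (x, y) pairs.
    return array('i', [int(c) for point in obj for c in point])
```

Only the last branch has to touch individual Python tuples; the first branch hands the existing memory straight through.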

When I get the chance, I do intend to contribute code to support this in 
wxPython, using the numpy array interface. However, wouldn't it be better for 
it to support a generic interface that was in the standard lib, rather than 
only numpy?

While /F suggested we get off the PIL bandwagon, I do have code that has to 
pass data around between numpy, PIL and wx.Images ( and matplotlib AGG 
buffers, and GDAL geo-referenced image buffers, and ...). Most do support the 
current buffer protocol, so it can be done, but I'd be much happier if there 
was a little more checking going on, rather than my python code having to make 
sure the data is all arranged in memory the right way.

Oh, there is also the Python Cartographic Library, which can take a Python 
list of tuples as coordinates, and do a Projection on them, but which can't 
take a numpy array holding that same data.

-Chris






Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-01 Thread Martin v. Löwis
Alexander Belopolsky schrieb:
> I would also like to mention one more difference between NumPy datatypes
> and ctypes that I did not see discussed.  In ctypes arrays of different
> shapes are represented using different types.  As a result, if the object
> exporting its buffer is resized, the datatype object cannot be reused, it
> has to be replaced.

That's also an interesting issue for the datatypes PEP: are datatype
objects meant to be immutable?

This is particularly interesting for the extended buffer protocol:
how long can one keep the data you get from bt_getarrayinfo?

Also, how does the memory management work for the results?

Regards,
Martin


Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-01 Thread Alexander Belopolsky
Martin v. Löwis  v.loewis.de> writes:

> 
> I'm afraid of "dead" specifications, things whose only motivation is
> that they look nice. They are just clutter. There are a few examples
> of this already in Python, like the character buffer interface or
> the multi-segment buffers.
> 

Multi-segment buffers are only dead because standard library modules
do not support them.  I often work with text data that is represented
as an array of strings.  I would love to implement a multi-segment
buffer interface on top of that data and be able to do a full text
regular expression search without having to concatenate into one big
string, but python's re module would not take a multi-segment buffer.



Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-01 Thread Martin v. Löwis
Travis Oliphant schrieb:
>>> r_field = PyDict_GetItemString(dtype,'r');
>>> 
> Actually it should read PyDict_GetItemString(dtype->fields).  The
> r_field is a tuple (data-type object, offset).  The fields attribute is
> (currently) a Python dictionary.

Ok. This seems to be missing in the PEP. The section titled "Attributes"
seems to talk about Python-level attributes. Apparently, you are
suggesting that there is also a C-level API, lower than
PyObject_GetAttrString, so that you can write dtype->fields, instead
of having to write PyObject_GetAttrString(dtype, "fields").

If it is indeed the intent that this kind of access is available
for datatype objects, then the PEP should specify it. Notice that
it would be uncommon for a type in Python: Most types have getter
functions (such as PyComplex_RealAsDouble, rather than specifying
direct access through obj->cval.real).

Going now back to your original code (and assuming proper adjustments):

dtype = img->descr;
r_field = PyDict_GetItemString(dtype,'r');
g_field = PyDict_GetItemString(dtype,'g');
r_field_dtype = PyTuple_GET_ITEM(r_field, 0);
r_field_offset = PyTuple_GET_ITEM(r_field, 1);
g_field_dtype = PyTuple_GET_ITEM(g_field, 0);
g_field_offset = PyTuple_GET_ITEM(g_field, 1);
obj = PyArray_GetField(img, g_field, g_field_offset);
Py_INCREF(r_field)
PyArray_SetField(img, r_field, r_field_offset, obj);

In this code, where is PyArray_GetField coming from? What does
it do? If I wanted to write this code from scratch, what
should I write instead? Since this is all about a flat
memory block, I'm surprised I need "true" Python objects
for the field values in there.

> But, the other option (especially for code already written) would be to
> just convert the data-format specification into its own internal
> representation.

Ok, so your assumption is that consumers already have their own
machinery, in which case ease-of-use would be the question how
difficult it is to convert datatype objects into the internal
representation.

Regards,
Martin


Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-01 Thread Alexander Belopolsky
Travis E. Oliphant  ieee.org> writes:


> What if we look at this from the angle of trying to communicate 
> data-formats between different libraries (not change the way anybody 
> internally deals with data-formats).
> 
> For example, ctypes has one way to internally deal with data-formats 
> (using type objects).
> 
> NumPy/Numeric has a way to internally deal with data-formats (using 
> PyArray_Descr * structure -- in Numeric it's just a C-structure but in 
> NumPy it's fleshed out further and also a Python object called the 
> data-type).
> 

Ctypes and NumPy's Array Interface address two different needs.
When using ctypes, producers of type information
are at the Python level, but Array Interface information is
produced in C code. It is very convenient to write c_int*2*3 to
specify a 2x3 integer matrix in Python, but it is much easier to
set type code to 'i' and populate the shape array with integers
in C.

Consumers of type information are at the C level in both ctypes
and Array Interface applications, but in the case of ctypes, users
are not expected to write C code. It is typical for an array
interface consumer to switch on the type code.  Single character
(or numeric) type codes are much more convenient than verbose type
names in this case.

I have used Array Interface extensively, but only for simple types
and I have studied ctypes from Python level, but not from C level.

I think the standard data type description object should build on
the strengths of both approaches.

I believe the first step should be to agree on a representation of
simple types.  Just an agreement on the standard type codes that
every module could use would be a great improvement. (Personally,
I don't need anything else from array interface.)

I don't like letter codes, however. I would prefer to use an enum
at the C level and verbose names at Python level.

I would also like to mention one more difference between NumPy datatypes
and ctypes that I did not see discussed.  In ctypes arrays of different
shapes are represented using different types.  As a result, if the object
exporting its buffer is resized, the datatype object cannot be reused, it
has to be replaced.
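
The difference can be seen from the Python level with a short sketch using only `ctypes` and `array`. The variable names are illustrative; the `array` module's type codes stand in here for the NumPy-style description, where the length is not part of the data type.

```python
import ctypes
from array import array

# In ctypes the length is part of the type, so each shape is a distinct type.
Pair = ctypes.c_int * 2
Triple = ctypes.c_int * 3
Grid = ctypes.c_int * 2 * 3      # an array of 3 arrays of 2 ints

# With a type code, the element description does not depend on the length,
# so it survives a resize of the underlying object unchanged.
values = array('i', [1, 2])
code_before = values.typecode
values.append(3)
code_after = values.typecode
```

A ctypes object that grew from two to three ints would have to be described by `Triple` instead of `Pair`, whereas the type code stays `'i'` throughout.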












Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-01 Thread Travis Oliphant
Martin v. Löwis wrote:
> Travis E. Oliphant schrieb:
>   
>>> Or, if it does have uses independent of the buffer extension: what
>>> are those uses?
>>>   
>> So that NumPy and ctypes and audio libraries and video libraries and 
>> database libraries and image-file format libraries can communicate about 
>> data-formats using the same expressions (in Python).
>> 
>
> I find that puzzling. In what way can the specification of a data type
> enable communication? Don't you need some kind of protocol for it
> (i.e. operations to be invoked)? Also, do you mean that these libraries
> can communicate with each other? Or with somebody else? If so, with
> whom?
>   
What is puzzling?  I've just specified the extended buffer protocol as 
something concrete that data-format objects are shared through.   That's 
on the C-level.  I gave several examples of where such sharing would be 
useful.

Then, I gave examples in Python of how sharing data-formats would also 
be useful so that modules could support the same means to construct 
data-formats (instead of struct using strings, array using typecodes, 
ctypes using its type-objects, and NumPy using dtype objects).
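
To make the "same concept, three spellings" point concrete, here is a small sketch limited to the standard-library modules named above (NumPy is left out so the example stays self-contained). All three lines describe a native C double.

```python
import ctypes
import struct
from array import array

# Three standard-library spellings of "native C double".
struct_size = struct.calcsize('d')              # struct: format string
array_size = array('d').itemsize                # array: type code
ctypes_size = ctypes.sizeof(ctypes.c_double)    # ctypes: type object

# The same bytes can be read back through any of the three modules.
raw = struct.pack('d', 1.5)
from_struct = struct.unpack('d', raw)[0]
from_array = array('d')
from_array.frombytes(raw)
from_ctypes = ctypes.c_double.from_buffer_copy(raw).value
```

Each module can consume the other's bytes, but none of them can consume the other's *description* of those bytes, which is the gap a shared data-format object would fill.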
>   
>> What problem do you have in defining a standard way to communicate about 
>> binary data-formats (not just images)?  I still can't figure out why you 
>> are so resistant to the idea.  MPI had to do it.
>> 
>
> I'm afraid of "dead" specifications, things whose only motivation is
> that they look nice. They are just clutter. There are a few examples
> of this already in Python, like the character buffer interface or
> the multi-segment buffers.
>   
O.K.  I can understand that concern.  But, all you do is make struct, 
array, and ctypes support the same data-format specification (by support 
I mean have a way to "consume" and "produce" the data-format object to 
the natural representation that they have internally) and you are 
guaranteed it won't "die."   In fact, what would be ideal is for the 
PIL, NumPy, CVXOpt, PyMedia, PyGame, pyre, pympi, PyVoxel, etc., etc. 
(there really are many modules that should be able to talk to each other 
more easily) to all support the same data-format representations. Then, 
you don't have to learn everybody's  re-invention of the same concept 
whenever you encounter a new library that does something with binary data.

How much time do you actually spend with binary data (sound, video, 
images, just plain numbers from a scientific experiment) and trying to 
use multiple Python modules to manipulate it?  If you don't spend much 
time, then I can understand why you don't understand the need.
> As for MPI: It didn't just independently define a data types system.
> Instead, it did that, *and* specified the usage of the data types
> in operations such as MPI_SEND. It is very clear what the scope of
> this data description is, and what the intended usage is.
>
> Without specifying an intended usage, it is impossible to evaluate
> whether the specification meets its goals.
>   
What is not understood about the intended usage in the extended buffer 
protocol?  What is not understood about the intended usage of giving the 
array and struct modules a uniform way to represent binary data?
> Ok, that would be a new usage: I expected that datatype instances
> always come in pairs with memory allocated and filled according to
> the description. 
To me that is the most important usage, but it's not the *only* one. 

> If you are proposing to modify/extend the API
> of the struct and array modules, you should say so somewhere (in
> a PEP).
>   
Sure, I understand that.  But, if there is no data-format object, then 
there is no PEP to "extend the struct and array modules" to support it.  
Chicken before the egg, and all that.
> I expect that the primary readers/users of the PEP would be people who
> have to write libraries: i.e. people implementing NumPy, struct, array,
> and people who implement algorithms that operate on data.

Yes, but not only them.  If it's a default way to represent data,  then 
*users* of those libraries that "consume" the representation would also 
benefit by learning a standard.

>  So usability
> of the specification is a matter of how easy it is to *write* a library
> that does perform the image manipulation.
>
>   
>> If you really want to know.  In NumPy it might look like this:
>>
>> Python code:
>>
>> img['r'] = img['g']
>> img['b'] = img['g']
>> 
>
> That's not what I'm asking. Instead, what does the NumPy code look
> like that gets invoked on these read-and-write operations? Does it
> only use the void* pointing to the start of the data, and the
> datatype object? If not, how would C code look like that only has
> the void* and the datatype object?
>
>   
>> dtype = img->descr;
>> 
>
> In this code, is descr a datatype object? ...
>   
Yes.  But, I have a mistake later...
>   
>> r_field = PyDict_GetItemString(dtype,'r');
>> 
Actually it should read PyDict_GetItemString(dtype

Re: [Python-Dev] Path object design

2006-11-01 Thread Mike Orr
Argh, it's difficult to respond to one topic that's now spiraling into
two conversations on two lists.

[EMAIL PROTECTED] wrote:
> On 03:14 am, [EMAIL PROTECTED] wrote:
>
> >One thing is sure -- we urgently need something better than os.path.
> >It functions well but it makes hard-to-read and unpythonic code.
>
> I'm not so sure.  The need is not any more "urgent" today than it was
> 5 years ago, when os.path was equally "unpythonic" and unreadable.
> The problem is real but there is absolutely no reason to hurry to a
> premature solution.

Except that people have had to spend five years putting hard-to-read
os.path functions in the code, or reinventing the wheel with their own
libraries that they're not sure they can trust.  I started to use
path.py last year when it looked like it was emerging as the basis of
a new standard, but yanked it out again when it was clear the API
would be different by the time it's accepted.  I've gone back to
os.path for now until something stable emerges but I really wish I
didn't have to.

> I've already recommended Twisted's twisted.python.filepath module as a
> possible basis for the implementation of this feature

> *It is already used in a large body of real, working code, and
> therefore its limitations are known.*

This is an important consideration.  However, to me a clean API is more
important.  Since we haven't agreed on an API there is no widely-used
module that implements it... it's a chicken-and-egg problem since it
takes significant time to write and test an implementation.  So I'd
like to start from the standpoint of an ideal API rather than just
taking the API of the most widely-used implementation.  os.path is
clearly the most widely-used implementation, but that doesn't mean
that OOizing it as-is would be my favorite choice.

I took a quick look at filepath.  It looks similar in concept to PEP
355.  Four concerns:
- unfamiliar method names (createDirectory vs mkdir, child vs join)
- basename/dirname/parent are methods rather than properties:
leads to () overproliferation in user code.
- the "secure" features may not be necessary.  If they are, this
should be a separate discussion, and perhaps implemented as a
subclass.
- stylistic objection to verbose camelCase names like createDirectory


> Proposals for extending the language are contentious and it is very
> difficult to do experimentation with non-trivial projects because
> nobody wants to do that and then end up with a bunch of code written
> in a language that is no longer supported when the experiment fails.

True.

> Path representation is a bike shed.  Nobody would have proposed
> writing an entirely new embedded database engine for Python: python
> 2.5 simply included SQLite because its utility was already proven.

There's a quantum level of difference between path/file manipulation
-- which has long been considered a requirement for any full-featured
programming language -- and a database engine which is much more
complex.

Georg Brandl <[EMAIL PROTECTED]> wrote:
> I have been a supporter of the full-blown Path object in the past, but the
> recent discussions have convinced me that it is just too big and too
> confusing, and that you can't kill too many birds with one stone in this
> respect.  Most of the ugliness really lies in the path name manipulation
> functions, which nicely map to methods on a path name object.

Fredrik has convinced me that it's more urgent to OOize the pathname
conversions than the filesystem operations.  Pathname conversions are
the ones that frequently get nested or chained, whereas filesystem
operations are usually done at the top level of a program statement,
or return a different "kind" of value (stat, true/false, etc).
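
As a rough illustration of why nested pathname conversions are the painful part, here is a sketch of a name-only wrapper. The class `PathName` and its members are hypothetical, not the PEP 355 API; it subclasses `str` (the Python 3 counterpart of the `unicode` suggestion) and uses `posixpath` so the result does not depend on the platform.

```python
import posixpath

class PathName(str):
    """Hypothetical name-only wrapper; it never touches the filesystem."""

    @property
    def parent(self):
        return PathName(posixpath.dirname(self))

    @property
    def name(self):
        return posixpath.basename(self)

    @property
    def ext(self):
        return posixpath.splitext(self)[1]

    def joinpath(self, *parts):
        return PathName(posixpath.join(self, *parts))

src = "/srv/app/static/css/site.css"

# Function style: the calls nest and have to be read inside out.
nested = posixpath.join(
    posixpath.dirname(posixpath.dirname(src)), "js", "site.js")

# Wrapper style: the same conversion reads left to right.
chained = PathName(src).parent.parent.joinpath("js", "site.js")
print(chained)
```

Because the wrapper is still a string, it can be passed unchanged to functions that operate on the file itself, such as `os.stat` or `open`.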

However, it's interesting that all the proposals I've seen in the past
three years have been a "monolithic" OO class.  Clearly there are a
lot of people who prefer this way, or at least have never heard of
anything different.  Where have all the proponents of non-OO or
limited-OO strategies been?  The first proposal of that sort I've seen
was Nick Coghlan's on October 1.  Have y'all just been ignoring the
monolithic OO efforts without offering any alternatives?


Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> > This is fully backwards compatible, can go right into 2.6 without
> > breaking anything, allows people to update their code as they go,
> > and can be incrementally improved in future releases:
> >
> >  1) Add a pathname wrapper to "os.path", which lets you do basic
> > path "algebra".  This should probably be a subclass of unicode,
> > and should *only* contain operations on names.
> >
> >  2) Make selected "shutil" operations available via the "os"
> > namespace; the old POSIX API vs. POSIX SHELL distinction is pretty
> > irrelevant.  Also make the os.path predicates available via the
> > "os" namespace.
> >
> > This gives a very simple conceptual model for the user; to manipu

Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Martin v. Löwis
Travis E. Oliphant schrieb:
> I was too hasty.  There are some things actually missing from ctypes:

I think Thomas can correct me if I'm wrong: I think endianness is
supported (although this support seems undocumented). There seems
to be code that checks for the presence of a _byteswapped_ attribute
on fields of a struct; presence of this field is then interpreted
as data having the "other" endianness.
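
For reference, current Python versions expose byte order in ctypes through the documented `BigEndianStructure` and `LittleEndianStructure` base classes. The sketch below uses them rather than the `_byteswapped_` attribute mentioned above; the class names `BE` and `LE` are made up for the example.

```python
import ctypes

class BE(ctypes.BigEndianStructure):
    _fields_ = [("value", ctypes.c_uint32)]

class LE(ctypes.LittleEndianStructure):
    _fields_ = [("value", ctypes.c_uint32)]

# Same logical value, opposite byte order in memory.
be_bytes = bytes(BE(0x01020304))
le_bytes = bytes(LE(0x01020304))
print(be_bytes.hex(), le_bytes.hex())
```

Whichever of the two matches the host's native order behaves like a plain `Structure`; the other one swaps bytes on each field access.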

> 1) long double (this is not the same across platforms, but it is a 
> data-type).

That's indeed missing.

> 2) complex-valued types (you might argue that it's just a 2-array of 
> floats, but you could say the same thing about int as an array of 
> bytes).  The point is how do people interpret the data.  Complex-valued 
> data-types are very common.  It is one reason Fortran is still used by 
> scientists.

Well, by the same reasoning, you could argue that pixel values (RGBA)
are missing in the PEP. It's a convenience, sure, and it may also help
interfacing with the platform's FORTRAN implementation - however, are
you sure that NumPy's complex layout is consistent with the platform's
C99 _Complex definition?

> 3) Unicode characters
> 
> 4) What about floating-point representations that are not IEEE 754 
> 4-byte or 8-byte.

Both of these are available in a platform-dependent way: if the
platform uses non-IEEE754 formats for C float and C double, ctypes
will interface with that just fine. It is actually vice versa:
IEEE-754 4-byte and 8-byte is not supported in ctypes.
Same for Unicode: the platform's wchar_t is supported (as you said),
but not a platform-independent (say) 4-byte little-endian.
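
The distinction between platform-dependent and platform-independent descriptions can be sketched with the standard library alone. The ctypes sizes below are whatever the local C compiler uses, so no particular values are assumed for them; the `struct` formats with an explicit byte-order prefix and the UTF-32 codec are fixed by definition.

```python
import ctypes
import struct

# Platform-dependent: sizes follow the local C ABI.
native_sizes = {
    "float": ctypes.sizeof(ctypes.c_float),
    "double": ctypes.sizeof(ctypes.c_double),
    "wchar_t": ctypes.sizeof(ctypes.c_wchar),
}

# Platform-independent: "<" and ">" select struct's standard sizes
# and IEEE 754 encodings regardless of the host.
ieee_single_le = struct.pack("<f", 1.0)
ieee_double_be = struct.pack(">d", 1.0)

# A 4-byte little-endian Unicode code unit, independent of wchar_t.
utf32_le = "A".encode("utf-32-le")

print(native_sizes)
print(ieee_single_le.hex(), ieee_double_be.hex(), utf32_le.hex())
```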

Regards,
Martin


Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-01 Thread Martin v. Löwis
Travis E. Oliphant schrieb:
>> Or, if it does have uses independent of the buffer extension: what
>> are those uses?
> 
> So that NumPy and ctypes and audio libraries and video libraries and 
> database libraries and image-file format libraries can communicate about 
> data-formats using the same expressions (in Python).

I find that puzzling. In what way can the specification of a data type
enable communication? Don't you need some kind of protocol for it
(i.e. operations to be invoked)? Also, do you mean that these libraries
can communicate with each other? Or with somebody else? If so, with
whom?

> What problem do you have in defining a standard way to communicate about 
> binary data-formats (not just images)?  I still can't figure out why you 
> are so resistant to the idea.  MPI had to do it.

I'm afraid of "dead" specifications, things whose only motivation is
that they look nice. They are just clutter. There are a few examples
of this already in Python, like the character buffer interface or
the multi-segment buffers.

As for MPI: It didn't just independently define a data types system.
Instead, it did that, *and* specified the usage of the data types
in operations such as MPI_SEND. It is very clear what the scope of
this data description is, and what the intended usage is.

Without specifying an intended usage, it is impossible to evaluate
whether the specification meets its goals.

> Absolutely --- if something is to be made useful across packages and 
> from Python.   This is where the discussion should take place.  The 
> struct module and array modules would both be consumers also so that in 
> the struct module you could specify your structure in terms of the 
> standard data-representation and in the array module you could specify 
> your array in terms of the standard representation instead of using 
> "character codes".

Ok, that would be a new usage: I expected that datatype instances
always come in pairs with memory allocated and filled according to
the description. If you are proposing to modify/extend the API
of the struct and array modules, you should say so somewhere (in
a PEP).

>> Suppose I wanted to change all RGB values to a gray value (i.e. R=G=B),
>> what would the C code look like that does that? (it seems now that the
>> primary purpose of this machinery is image manipulation)
>>
> 
> For me it is definitely not image manipulation that is the only purpose 
> (or even the primary purpose).  It's just an easy one to explain --- 
> most people understand images).   But, I think this question is actually 
> irrelevant (IMHO).  To me, how you change all RGB values to gray would 
> depend on the library you are using not on how data-formats are expressed.
> 
> Maybe we are still mis-understanding each other.

I expect that the primary readers/users of the PEP would be people who
have to write libraries: i.e. people implementing NumPy, struct, array,
and people who implement algorithms that operate on data. So usability
of the specification is a matter of how easy it is to *write* a library
that does perform the image manipulation.

> If you really want to know.  In NumPy it might look like this:
> 
> Python code:
> 
> img['r'] = img['g']
> img['b'] = img['g']

That's not what I'm asking. Instead, what does the NumPy code look
like that gets invoked on these read-and-write operations? Does it
only use the void* pointing to the start of the data, and the
datatype object? If not, how would C code look like that only has
the void* and the datatype object?

> dtype = img->descr;

In this code, is descr a datatype object? ...

> r_field = PyDict_GetItemString(dtype,'r');

... I guess not, because apparently, it is a dictionary, not
a datatype object.

> But, I still don't see how that is relevant to the question of how to 
> represent the data-format to share that information across two extensions.

Well, if NumPy gets the data from a different module, it can't assume
there is a descr object that is a dictionary. Instead, it must
perform these operations just by using the datatype object. What
else is the purpose of sharing the information, if not to use it
to access the data?
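
One way to picture the requirement stated above is a consumer that is given only a writable buffer and a description, with no library-specific object behind it. The sketch below is an assumption-laden stand-in: the dictionary `RGB` and the function `copy_field` are hypothetical and are not the format proposed in the PEP; they only show that field names, offsets and element codes are enough to perform the R=G=B operation.

```python
import struct

# Hypothetical shared description: item size plus, for each named
# field, its byte offset and a struct-module element code.
RGB = {
    "itemsize": 3,
    "fields": {"r": (0, "B"), "g": (1, "B"), "b": (2, "B")},
}

def copy_field(buf, fmt, src, dst):
    """Copy field `src` onto field `dst` in every item of `buf`."""
    src_off, src_code = fmt["fields"][src]
    dst_off, dst_code = fmt["fields"][dst]
    if src_code != dst_code:
        raise TypeError("fields have different element formats")
    for base in range(0, len(buf), fmt["itemsize"]):
        (value,) = struct.unpack_from(src_code, buf, base + src_off)
        struct.pack_into(dst_code, buf, base + dst_off, value)

pixels = bytearray([10, 20, 30, 40, 50, 60])   # two RGB pixels
copy_field(pixels, RGB, "g", "r")
copy_field(pixels, RGB, "g", "b")
print(list(pixels))
```

The C equivalent would follow the same shape: look up offset and element type in the description, then read and write relative to the `void*`.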

Regards,
Martin



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis E. Oliphant
Jim Jewett wrote:
> I'm still not sure exactly what is missing from ctypes.  To make this 
> concrete:

I was too hasty.  There are some things actually missing from ctypes:

1) long double (this is not the same across platforms, but it is a 
data-type).
2) complex-valued types (you might argue that it's just a 2-array of 
floats, but you could say the same thing about int as an array of 
bytes).  The point is how do people interpret the data.  Complex-valued 
data-types are very common.  It is one reason Fortran is still used by 
scientists.
3) Unicode characters (there is w_char support but I mean a way to 
describe what kind of unicode characters you have in a cross-platform 
way).  I actually think we have a way to describe encodings in the 
data-format representation as well.

4) What about floating-point representations that are not IEEE 754 
4-byte or 8-byte.   There should be a way to at least express the 
data-format in these cases (this is actually how long double should be 
handled as well since it varies across platforms what is actually done 
with the extra bits).

So, we can't "just use ctypes" as a complete data-format representation 
because it's also missing some things.

What we need is a standard way for libraries that deal with data-formats 
to communicate with each other.  I need help with a PEP like this and 
that's what I'm asking for.  It's all I've really been after all along.

A couple of points:

* One reason to support the idea of the Python object approach (versus a 
string-syntax) is that it "is already parsed".  A list-syntax approach 
(perhaps built from strings for fundamental data-types) might also be 
considered "already parsed" as well.

* One advantage of using "kind" versus a character for every type (like 
struct and array do) is that it helps consumers and producers speed up 
the parser (a fuller branching tree).


-Travis



Re: [Python-Dev] [Tracker-discuss] Getting Started

2006-11-01 Thread Brett Cannon
On 11/1/06, Erik Forsberg <[EMAIL PROTECTED]> wrote:
> "Brett Cannon" <[EMAIL PROTECTED]> writes:
> > On 11/1/06, Stefan Seefeld <[EMAIL PROTECTED]> wrote:
> >> Brett Cannon wrote:
> >> > On 11/1/06, Stefan Seefeld <[EMAIL PROTECTED]> wrote:
> >> >> Right. Brett, do we need accounts on python.org for this ?
> >> >
> >> > Yep.  It just requires SSH 2 keys from each of you.  You can then email
> >> > python-dev with those keys and your first.last name and someone there
> >> > will install the keys for you.
> >>
> >> My key is at http://www3.sympatico.ca/seefeld/ssh.txt, I'm Stefan Seefeld.
> >>
> >> Thanks !
> >
> > Just to clarify, this is not for pydotorg but the svn.python.org.  The
> > admins for our future Roundup instance are going to keep their Roundup code
> > in svn so they need commit access.
>
> Now when that's clarified, here's my data:
>
> Public SSH key: http://efod.se/about/ptkey.pub
> First.Lastname: erik.forsberg
>
> I'd appreciate if someone with good taste could tell us where in the
> tree we should add our code :-).

Right at the root: ``svn+ssh://pythondev@svn.python.org/tracker`` (or
replace "tracker" with whatever name you guys want to go with).  This
is because the tracker code is conceptually its own project.

-Brett


[Python-Dev] Path object design

2006-11-01 Thread Jim Jewett
On 10:06 am, g.brandl at gmx.net wrote:
>> What a successor to os.path needs is not security, it's a better
>> (more pythonic, if you like) interface to the old functionality.

Glyph:

> Why?

> Rushing ... could exacerbate a very real problem, e.g.
> the security and data-integrity implications of idiomatic usage.

The proposed Path object (or new path module) is intended to replace
os.path.  If it can't do the equivalent of "cd ..", then it isn't a
replacement; it is just another similar alternative to confuse
beginners.

If you're saying that a webserver should use a more restricted
subclass (or even the existing FilePath alternative), then I agree.
I'll even agree that a restricted version would ideally be available
out of the box.  I don't think it should be the only option.
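
The two behaviours being contrasted can be sketched as a pair of functions. Both names (`parent`, `restricted_child`) are hypothetical; the check is purely lexical, works on POSIX-style names only, and does not account for symlinks, so it should be read as an illustration of the idea rather than as a hardened implementation.

```python
import posixpath

def parent(path):
    """Unrestricted: the equivalent of "cd ..", computed on the name."""
    return posixpath.dirname(posixpath.normpath(path))

def restricted_child(root, name):
    """Join `name` under `root`, refusing results that escape `root`."""
    root = posixpath.normpath(root)
    candidate = posixpath.normpath(posixpath.join(root, name))
    inside = candidate.startswith(root.rstrip("/") + "/")
    if candidate != root and not inside:
        raise ValueError("%r escapes %r" % (name, root))
    return candidate

print(parent("/var/www/img"))
print(restricted_child("/var/www", "img/logo.png"))
```

A general-purpose path type would offer the first operation; a webserver could layer the second on top as a subclass or helper.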

-jJ


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis Oliphant
Jim Jewett wrote:
> I'm still not sure exactly what is missing from ctypes.  To make this 
> concrete:

I think the only thing missing from ctypes "expressiveness" as far as I 
can tell in terms of what you "can" do is the byte-order representation.

What is missing is ease of use for producers and consumers in 
interpreting the data-type.   When I speak of producers and consumers, 
I'm largely talking about writers of C code (or Java or .NET code).

Producers must basically use Python code to create classes of various 
types.   This is going to be slow in 'C'.  Probably slower than the 
array interface (which is what we have now informally).

Consumers are going to have a hard time interpreting the result.  I'm 
not even sure how to do that, in fact.  I'd like NumPy to be able to 
understand ctypes as a means to specify data.  Would I have to check 
against all the sub-types of CDataType, pull out the fields, check the 
tp_name of the type object?  I'm not sure.
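
For a sense of what such a consumer has to do, here is a sketch written in Python rather than C. The function `flatten` is hypothetical; it relies only on documented ctypes attributes (`_fields_` on the class and `offset`/`size` on the field descriptors), uses fixed-width types so the layout is predictable, and ignores bit fields, unions and byte-swapped structures.

```python
import ctypes

def flatten(ctype, base=0, prefix=""):
    """Yield (dotted_name, byte_offset, byte_size) for each leaf field."""
    for name, ftype, *_ in ctype._fields_:
        desc = getattr(ctype, name)          # field descriptor
        offset = base + desc.offset
        if issubclass(ftype, ctypes.Structure):
            # Recurse into nested structures, carrying the offset along.
            yield from flatten(ftype, offset, prefix + name + ".")
        else:
            yield (prefix + name, offset, desc.size)

class Nested(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char * 30),
                ("addr", ctypes.c_char * 45),
                ("amount", ctypes.c_int32)]

class Outer(ctypes.Structure):
    _fields_ = [("simple", ctypes.c_int32),
                ("nested", Nested)]

layout = list(flatten(Outer))
for row in layout:
    print(row)
```

The walk is short in Python; the concern raised above is that doing the same from C means calling back into the interpreter for every attribute lookup.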

It seems like a string with the C-structure would be better as a 
data-representation, but then a third-party library would want to parse 
that, so Python might as well have its own parser for data-types. 

So, Python might as well have its own way to describe data.  My claim 
is that this default way should *not* be overloaded by using Python 
type-objects (the ctypes way).  I'm arguing for the NumPy way of 
using a different Python object to describe data-types.  I'm not saying 
the NumPy object should be used.  I'm saying we should come up with a 
single DataFormatType whose instances express the data formats in ways 
that other packages can produce and consume (or even use internally).  

It would be easy for NumPy to "use" the default Python object in its 
PyArray_Descr * structure.  It would also be easy for ctypes to "use" 
the default Python object in its StgDict object that is the tp_dict of 
every ctypes type object.

It would be easy for the struct module to allow for this data-format 
object (instead of just strings) in its methods. 

It would be easy for the array module to accept this data-format object 
(instead of just typecodes) in its constructor.

Lots of things would suddenly be more consistent throughout both the 
Python and C-Python user space.

Perhaps after discussion, it becomes clear that the ctypes approach is 
sufficient to be "that thing" that all modules use to share data-format 
information.  It's definitely expressive enough.   But, my argument is 
that NumPy data-type objects are also "pretty close," so why should they 
be rejected?  We could also make a "string-syntax" do it.

>
> You have said that creating whole classes is too much overhead, and
> the description should only be an instance.  To me, that particular
> class (arrays of 500 structs) still looks pretty lightweight.  So
> please clarify when it starts to be a problem.
>

> (1)  For simple types -- mapping
>   char name[30];  ==> ("name", c_char*30)
>
> Do you object to using the c_char type?
> Do you object to the array-of-length-30 class, instead of just having
> a repeat or shape attribute?
> Do you object to naming the field?
>
> (2)  For the complex types, nested and struct
>
> Do you object to creating these two classes even once?   For example,
> are you expecting to need different classes for each buffer, and to
> have many buffers created quickly?
I object to the way I "consume" and "produce" the ctypes interface.  
It's much too slow to be used on the C-level for sharing many small 
buffers quickly.
>
> Is creating that new class a royal pain, but frequent (and slow)
> enough that you can't just make a call into python (or ctypes)?
>
> (3)  Given that you will describe X, is X*500 (==> a type describing
> an array of 500 Xs) a royal pain in C?  If so, are you expecting to
> have to do it dynamically for many sizes, and quickly enough that you
> can't just let ctypes do it for you?

That pretty much sums it up (plus the pain of having to basically write 
Python code from "C").

-Travis



Re: [Python-Dev] [Tracker-discuss] Getting Started

2006-11-01 Thread Brett Cannon
On 11/1/06, Stefan Seefeld <[EMAIL PROTECTED]> wrote:
> Brett Cannon wrote:
> > On 11/1/06, Stefan Seefeld <[EMAIL PROTECTED]> wrote:
> >> Right. Brett, do we need accounts on python.org for this ?
> >
> > Yep.  It just requires SSH 2 keys from each of you.  You can then email
> > python-dev with those keys and your first.last name and someone there will
> > install the keys for you.
>
> My key is at http://www3.sympatico.ca/seefeld/ssh.txt, I'm Stefan Seefeld.
>
> Thanks !

Just to clarify, this is not for pydotorg but the svn.python.org.  The
admins for our future Roundup instance are going to keep their Roundup
code in svn so they need commit access.

-Brett


Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-01 Thread Travis E. Oliphant
Martin v. Löwis wrote:
> Travis E. Oliphant schrieb:
>> What if we look at this from the angle of trying to communicate 
>> data-formats between different libraries (not change the way anybody 
>> internally deals with data-formats).
> 
> ISTM that this is not the right approach. If the purpose of the datatype
> object is just to communicate the layout in the extended buffer
> interface, then it should be specified in that PEP, rather than being
> stand-alone, and it should not pretend to serve any other purpose.

I'm actually quite fine with that.  If that is the consensus, then I 
will just go that direction.   ISTM though that since we are putting 
forth the trouble inside the extended buffer protocol we might as well 
be as complete as we know how to be.

> Or, if it does have uses independent of the buffer extension: what
> are those uses?

So that NumPy and ctypes and audio libraries and video libraries and 
database libraries and image-file format libraries can communicate about 
data-formats using the same expressions (in Python).

Maybe we decide that ctypes-based expressions are a very good way to 
communicate about those things in Python for all other packages.  If 
that is the case, then I argue that we ought to change the array module, 
and the struct module to conform (of course keeping the old ways for 
backward compatibility) and set the standard for other packages to follow.

What problem do you have in defining a standard way to communicate about 
binary data-formats (not just images)?  I still can't figure out why you 
are so resistant to the idea.  MPI had to do it.

> 
>> 1) We could define a special string-syntax (or list syntax) that covers 
>> every special case.  The array interface specification goes this 
>> direction and it requires no new Python types.  This could also be seen 
>> as an extension of the "struct" module to allow for nested structures, etc.
>>
>> 2) We could define a Python object that specifically carries data-format 
>> information.
> 
> To distinguish between these, convenience of usage (and of construction)
> should have to be taken into account. At least for the preferred
> alternative, but better for the runners-up, too, there should be a
> demonstration on how existing modules have to be changed to support it
> (e.g. for the struct and array modules as producers; not sure what
> good consumer code would be).

Absolutely --- if something is to be made useful across packages and 
from Python.   This is where the discussion should take place.  The 
struct module and array modules would both be consumers also so that in 
the struct module you could specify your structure in terms of the 
standard data-representation and in the array module you could specify 
your array in terms of the standard representation instead of using 
"character codes".

> 
> Suppose I wanted to change all RGB values to a gray value (i.e. R=G=B),
> what would the C code look like that does that? (it seems now that the
> primary purpose of this machinery is image manipulation)
> 

For me it is definitely not image manipulation that is the only purpose 
(or even the primary purpose).  It's just an easy one to explain 
(most people understand images).   But, I think this question is actually 
irrelevant (IMHO).  To me, how you change all RGB values to gray would 
depend on the library you are using not on how data-formats are expressed.

Maybe we are still mis-understanding each other.


If you really want to know.  In NumPy it might look like this:

Python code:

img['r'] = img['g']
img['b'] = img['g']

C-code:

use the Python C-API to do essentially the same thing as above or

to do
img['r'] = img['g']

dtype = img->descr;
r_field = PyDict_GetItemString(dtype,'r');
g_field = PyDict_GetItemString(dtype,'g');
r_field_dtype = PyTuple_GET_ITEM(r_field, 0);
r_field_offset = PyTuple_GET_ITEM(r_field, 1);
g_field_dtype = PyTuple_GET_ITEM(g_field, 0);
g_field_offset = PyTuple_GET_ITEM(g_field, 1);
obj = PyArray_GetField(img, g_field, g_field_offset);
Py_INCREF(r_field)
PyArray_SetField(img, r_field, r_field_offset, obj);

But, I still don't see how that is relevant to the question of how to 
represent the data-format to share that information across two extensions.


>> The problem with 2b is that what works inside an extension module may 
>> not be the best option when it comes to communicating across multiple 
>> extension modules.   Certainly none of the extension modules have argued 
>> that case effectively.
> 
> I think there are two ways in which one option could be "better" than
> the other: it might be more expressive, and it might be easier to use.
> For the second aspect (ease of use), there are two subways: it might
> be easier to produce, or it might be easier to consume.

I like this as a means to judge a data-format representation. Let me 
summarize to see if I understand:

1) Expressive (does it express every data-format you might want or need)
2) Ease of use
a) Production

[Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Jim Jewett
I'm still not sure exactly what is missing from ctypes.  To make this concrete:

You have an array of 500 elements meeting

struct {
  int  simple;
  struct nested {
   char name[30];
   char addr[45];
   int  amount;
  } nested;
};

ctypes can describe this as

from ctypes import Structure, c_char, c_int

class nested(Structure):
    _fields_ = [("name", c_char*30),
                ("addr", c_char*45),
                ("amount", c_int)]

class struct(Structure):
    _fields_ = [("simple", c_int), ("nested", nested)]

desc = struct * 500

You have said that creating whole classes is too much overhead, and
the description should only be an instance.  To me, that particular
class (arrays of 500 structs) still looks pretty lightweight.  So
please clarify when it starts to be a problem.

(1)  For simple types -- mapping
   char name[30];  ==> ("name", c_char*30)

Do you object to using the c_char type?
Do you object to the array-of-length-30 class, instead of just having
a repeat or shape attribute?
Do you object to naming the field?

(2)  For the complex types, nested and struct

Do you object to creating these two classes even once?   For example,
are you expecting to need different classes for each buffer, and to
have many buffers created quickly?

Is creating that new class a royal pain, but frequent (and slow)
enough that you can't just make a call into python (or ctypes)?

(3)  Given that you will describe X, is X*500 (==> a type describing
an array of 500 Xs) a royal pain in C?  If so, are you expecting to
have to do it dynamically for many sizes, and quickly enough that you
can't just let ctypes do it for you?

-jJ


Re: [Python-Dev] Path object design

2006-11-01 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

> I assert that it needs a better[1] interface because the current 
> interface can lead to a variety of bugs through idiomatic, apparently 
> correct usage.  All the more because many of those bugs are related to 
> critical errors such as security and data integrity.

instead of referring to some esoteric knowledge about file systems that 
us non-twisted-using mere mortals may not be evolved enough to under- 
stand, maybe you could just make a list of common bugs that may arise 
due to idiomatic use of the existing primitives?

I promise to make a nice FAQ entry out of it, with proper attribution.
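
One frequently cited example of the kind of bug being asked about, added here as an illustration and not taken from the thread: joining a trusted base directory with an untrusted name. The sketch uses `posixpath` (the POSIX flavour of `os.path`) so the results are the same on every platform; the paths are made up.

```python
import posixpath

base = "/srv/www"

# An absolute second argument silently discards the base directory.
absolute = posixpath.join(base, "/etc/passwd")

# A relative argument containing ".." walks back out of the base.
escaped = posixpath.normpath(posixpath.join(base, "../../etc/passwd"))
```

Both calls are idiomatic and raise no error, yet both results name `/etc/passwd` instead of something under `/srv/www`.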





Re: [Python-Dev] Path object design

2006-11-01 Thread Georg Brandl
[EMAIL PROTECTED] wrote:
> On 10:06 am, [EMAIL PROTECTED] wrote:
>  >What a successor to os.path needs is not security, it's a better (more 
> pythonic,
>  >if you like) interface to the old functionality.
> 
> Why?
> 
> I assert that it needs a better[1] interface because the current 
> interface can lead to a variety of bugs through idiomatic, apparently 
> correct usage.  All the more because many of those bugs are related to 
> critical errors such as security and data integrity.

AFAICS, people just want an interface that is easier to use and feels more...
err... (trying to avoid the p-word). I've never seen security arguments
being made in this discussion.

> If I felt the current interface did a good job at doing the right thing 
> in the right situation, but was cumbersome to use, I would strenuously 
> object to _any_ work taking place to change it.  This is a hard API to 
> get right.

Well, it's hard to change any running system with that attitude. It doesn't
have to be changed if nobody comes up with something that's agreed (*) to
be better.

(*) agreed in the c.l.py sense, of course

Georg



Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-01 Thread Martin v. Löwis
Travis E. Oliphant schrieb:
> What if we look at this from the angle of trying to communicate 
> data-formats between different libraries (not change the way anybody 
> internally deals with data-formats).

ISTM that this is not the right approach. If the purpose of the datatype
object is just to communicate the layout in the extended buffer
interface, then it should be specified in that PEP, rather than being
stand-alone, and it should not pretend to serve any other purpose.
Or, if it does have uses independent of the buffer extension: what
are those uses?

> 1) We could define a special string-syntax (or list syntax) that covers 
> every special case.  The array interface specification goes this 
> direction and it requires no new Python types.  This could also be seen 
> as an extension of the "struct" module to allow for nested structures, etc.
> 
> 2) We could define a Python object that specifically carries data-format 
> information.

To distinguish between these, convenience of usage (and of construction)
should have to be taken into account. At least for the preferred
alternative, but better for the runners-up, too, there should be a
demonstration on how existing modules have to be changed to support it
(e.g. for the struct and array modules as producers; not sure what
good consumer code would be).

Suppose I wanted to change all RGB values to a gray value (i.e. R=G=B),
what would the C code look like that does that? (it seems now that the
primary purpose of this machinery is image manipulation)
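
The question asks for C. As a language-neutral illustration of the operation itself, here is a Python sketch. It assumes tightly packed 8-bit RGB triples in a flat, writable byte buffer and uses a plain average as the gray value; both are assumptions, and `to_gray` is a hypothetical name.

```python
def to_gray(pixels):
    """Set R=G=B in place for a flat buffer of packed 8-bit RGB triples."""
    if len(pixels) % 3:
        raise ValueError("buffer length must be a multiple of 3")
    for i in range(0, len(pixels), 3):
        # Unweighted average of the three channels, rounded down.
        gray = (pixels[i] + pixels[i + 1] + pixels[i + 2]) // 3
        pixels[i] = pixels[i + 1] = pixels[i + 2] = gray
    return pixels

image = bytearray([30, 60, 90, 0, 0, 255])
to_gray(image)
```

The loop only works because the consumer already knows the stride (3 bytes), the channel order and the sample width. That is the information a shared data-format description would have to carry.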

> The problem with 2b is that what works inside an extension module may 
> not be the best option when it comes to communicating across multiple 
> extension modules.   Certainly none of the extension modules have argued 
> that case effectively.

I think there are two ways in which one option could be "better" than
the other: it might be more expressive, and it might be easier to use.
For the second aspect (ease of use), there are two subways: it might
be easier to produce, or it might be easier to consume.

Regards,
Martin


Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-01 Thread Travis E. Oliphant
Travis E. Oliphant wrote:
> Thanks for all the comments that have been given on the data-type 
> (data-format) PEP.  I'd like opinions on an idea for revising the PEP I 
> have.

> 
> 1) We could define a special string-syntax (or list syntax) that covers 
> every special case.  The array interface specification goes this 
> direction and it requires no new Python types.  This could also be seen 
> as an extension of the "struct" module to allow for nested structures, etc.
> 
> 2) We could define a Python object that specifically carries data-format 
> information.
> 
> 
> Does that explain the goal of what I'm trying to do better?

In other words, what I'm saying is I really want a PEP that does this.
Could we have a discussion about what the best way to communicate
data-format information across multiple extension modules would look
like?  I'm not saying my (pre-)PEP is best.  The point of putting it out
there in its infant state is to get the discussion rolling, not to
claim I've got all the answers.

It seems like there are enough people who have dealt with this issue 
that we ought to be able to put something very useful together that 
would make Python much better glue.


-Travis



[Python-Dev] idea for data-type (data-format) PEP

2006-11-01 Thread Travis E. Oliphant

Thanks for all the comments that have been given on the data-type 
(data-format) PEP.  I'd like opinions on an idea for revising the PEP I 
have.

What if we look at this from the angle of trying to communicate 
data-formats between different libraries (not change the way anybody 
internally deals with data-formats).

For example, ctypes has one way to internally deal with data-formats 
(using type objects).

NumPy/Numeric has a way to internally deal with data-formats (using 
PyArray_Descr * structure -- in Numeric it's just a C-structure but in 
NumPy it's fleshed out further and also a Python object called the 
data-type).

Numarray has a way to internally deal with data-formats (using type 
objects).

The array module has a way to internally deal with data-formats (using a 
PyArray_Descr * structure -- and character codes to select one).

The struct module deals with data-formats using character codes.

The PIL deals with data-formats using image modes.

PyVTK deals with data-formats using its own internal objects.

MPI deals with data-formats using its own MPI_DataType structures.

This list goes on and on.

What I claim is needed in Python (to make it better glue) is to have a 
standard way to communicate data-format information between these 
extensions.  Then, you don't have to build in support for all the 
different ways data-formats are represented by different libraries.  The 
library only has to be able to translate their representation to the 
standard way that Python uses to represent data-format.

How is this goal going to be achieved?  That is the real purpose of the 
data-type object I previously proposed.

Nick showed that there are two (non-orthogonal) ways to think about this 
goal.

1) We could define a special string-syntax (or list syntax) that covers 
every special case.  The array interface specification goes this 
direction and it requires no new Python types.  This could also be seen 
as an extension of the "struct" module to allow for nested structures, etc.

2) We could define a Python object that specifically carries data-format 
information.
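
To make option 1 concrete, this is roughly what the existing "struct" module character codes can already express. It is a sketch only: the record layout (an int, two fixed-length strings, an int) and the sample values are assumptions chosen for illustration.

```python
import struct

# "<" selects little-endian standard sizes with no padding; "i" is a
# 4-byte int and "30s"/"45s" are fixed-length byte strings.
RECORD = struct.Struct("<i30s45si")

packed = RECORD.pack(1, b"Ada", b"12 Example Street", 250)
simple, name, addr, amount = RECORD.unpack(packed)

# Fixed-length string fields come back NUL-padded to their full width.
name = name.rstrip(b"\x00")
```

The format string is flat: it has no field names and no way to mark where a nested structure begins or ends, which is the gap an extended syntax would have to fill.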


There is also a third way (or really 2b) that has been mentioned:  take 
one of the extensions and use what it does to communicate data-format 
between objects and require all other extensions to conform to that 
standard.

The problem with 2b is that what works inside an extension module may 
not be the best option when it comes to communicating across multiple 
extension modules.   Certainly none of the extension modules have argued 
that case effectively.

Does that explain the goal of what I'm trying to do better?







Re: [Python-Dev] Path object design

2006-11-01 Thread glyph
On 10:06 am, [EMAIL PROTECTED] wrote:
>What a successor to os.path needs is not security, it's a better (more pythonic,
>if you like) interface to the old functionality.

Why?

I assert that it needs a better[1] interface because the current
interface can lead to a variety of bugs through idiomatic, apparently
correct usage.  All the more because many of those bugs are related to
critical errors such as security and data integrity.

If I felt the current interface did a good job at doing the right thing
in the right situation, but was cumbersome to use, I would strenuously
object to _any_ work taking place to change it.  This is a hard API to
get right.

[1]: I am rather explicitly avoiding the word "pythonic" here.  It seems
to have grown into a shibboleth (and its counterpart, "unpythonic", into
an expletive).  I have the impression it used to mean something a bit
more specific, maybe adherence to Tim Peters' "Zen" (although that was
certainly vague enough by itself and not always as self-evidently true
as some seem to believe).  More and more, now, though, I hear it used to
mean 'stuff should be more betterer!' and then everyone nods sagely
because we know that no filthy *java* programmer wants things to be more
betterer; *we* know *they* want everything to be horrible.  Words like
this are a pet peeve of mine though, so perhaps I am overstating the
case.

Anyway, moving on... as long as I brought up the Zen, perhaps a
particular couplet is appropriate here:

  Now is better than never.
  Although never is often better than *right* now.

Rushing to a solution to a non-problem, e.g. the "pythonicness" of the
interface, could exacerbate a very real problem, e.g. the security and
data-integrity implications of idiomatic usage.  Granted, it would be
hard to do worse than os.path, but it is by no means impossible (just
look at any C program!), and I can think of a couple of kinds of API
which would initially appear more convenient but actually prove more
problematic over time.

That brings me back to my original point: the underlying issue here is
too important a problem to get wrong *again* on the basis of a
superficial "need" for an API that is "better" in some unspecified way.
os.path is at least possible to get right if you know what you're doing,
which is no mean feat; there are many path-manipulation libraries in
many languages which cannot make that claim (especially portably).  Its
replacement might not be.  Getting this wrong outside the standard
library might create problems for some people, but making it worse _in_
the standard library could create a total disaster for everyone.

I do believe that this wouldn't get past the dev team (least of all the
release manager) but it would waste a lot less of everyone's time if we
focused the inevitable continuing bike-shed discussion along the lines
of discussing the known merits of widely deployed alternative path
libraries, or at least an approach to *get* that data on some new code
if there is consensus that existing alternatives are in some way
inadequate.

If for some reason it _is_ deemed necessary to go with an untried
approach, I can appreciate the benefits that /F has proposed of trying
to base the new interface entirely and explicitly off the old one.  At
least that way it will still definitely be possible to get right.  There
are problems with that too, but they are less severe.


Re: [Python-Dev] PEP: Extending the buffer protocol to share array information.

2006-11-01 Thread Travis E. Oliphant
Fredrik Lundh wrote:
> Terry Reedy wrote:
> 
>> I believe that at present PyGame can only work with external images that it 
>> is programmed to know how to import.  My guess is that if image source 
>> program X (such as PIL) described its data layout in a way that NumPy could 
>> read and act on, the import/copy step could be eliminated.
> 
> I wish you all stopped using PIL as an example in this discussion;
> for PIL 2, I'm moving towards an entirely opaque data model, with a 
> "data view"-style client API.

That's an unreasonable request.  The point of the buffer protocol is to
allow people to represent their data in whatever way they like
internally but still share it in a standard way.  The extended buffer
protocol allows sharing of the shape of the data and its format in a
standard way as well.

We just want to be able to convert the data in PIL objects to other 
Python objects without having to write special "converter" functions. 
It's not important how PIL or PIL 2 stores the data as long as it 
participates in the buffer protocol.

Of course if the memory layout were compatible with the model of NumPy, 
then data-copies would not be required, but that is really secondary.
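
For readers on later Python versions, the built-in `memoryview` type gives a feel for what sharing format and shape through the buffer protocol looks like from Python. The sketch below assumes Python 3 and uses `array.array` as the exporting object; it is an illustration and not code from this thread.

```python
from array import array

samples = array("d", [1.0, 2.0, 3.0])
view = memoryview(samples)

# The consumer learns the element format, size and shape from the view
# without knowing anything about the exporter's internals.
layout = (view.format, view.itemsize, view.ndim, view.shape)

# Writing through the view changes the exporter's data: no copy was made.
view[0] = 10.0
```

The consumer never imports or inspects the `array` module; everything it needs arrives through the view.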

-Travis




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Bill Baxter
Martin v. Löwis  v.loewis.de> writes:

> 
> Bill Baxter schrieb:
> > Basically in my code I want to be able to take the binary data descriptor 
> > and
> > say "give me the 'r' field of this pixel as an integer".
> > 
> > Is either one (the PEP or c-types) clearly easier to use in this case? 
> > What
> > would the code look like for handling both formats generically?
> 
> The PEP, as specified, does not support accessing individual fields from
> Python. OTOH, ctypes, as implemented, does. This comparison is not fair,
> though: an *implementation* of the PEP (say, NumPy) might also give you
> Python-level access to the fields.

I see.  So at the Python-user convenience level it's pretty much a wash.  Are
there significant differences in memory usage and/or performance?  ctypes 
sounds to be more heavyweight from the discussion.  If I have a lot of image
formats I want to support is that going to mean lots of overhead with ctypes?
Do I pay for it whether or not I actually end up having to handle an image in a
given format?
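
For the ctypes side of the comparison, the "give me the 'r' field of this pixel as an integer" request might look like the sketch below at the Python level. The pixel layout (three unsigned bytes named r, g, b) and the sample data are assumptions made for illustration.

```python
from ctypes import Structure, c_ubyte, sizeof

class RGB(Structure):
    # One unsigned byte per channel, so the structure has no padding.
    _fields_ = [("r", c_ubyte), ("g", c_ubyte), ("b", c_ubyte)]

raw = bytes([10, 20, 30, 40, 50, 60])   # two packed RGB pixels

# Build an array type sized to the data, then copy the bytes into it.
Pixels = RGB * (len(raw) // sizeof(RGB))
pixels = Pixels.from_buffer_copy(raw)

r_values = [p.r for p in pixels]
```

A second image format would need a second `Structure` subclass, which is where the per-format overhead asked about above would come from.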

> With the PEP, you can get access to the 'r' field from C code.
> Performing this access is quite tedious; as I'm uncertain whether you
> actually wanted to see C code, I refrain from trying to formulate it.

Actually this is more what I was after.  I've written C code to interface with
Numpy arrays and found it to be not so bad.  But the data I was passing around
was just a plain N-dimensional array of doubles.  Very basic.  It *sounds* like
what Travis is saying is that handling a less simple case, like the one above
of supporting a variety of RGB image formats, would be easier with the PEP than
with ctypes.  Or maybe it's generating the data in my C code that's trickier,
as opposed to consuming it?

I'm just trying to understand what the deal is, and at the same time perhaps
inject a more concrete example into the discussion. Travis has said several
times that working with ctypes, which requires a Python type per 'element', is
more complicated from the C side, and I'd like to see more concretely how so,
as someone who may end up needing to write such code.

And I'm ok without seeing the actual code if someone can actually answer my
question.  The question is not whether it is tedious or not -- everything about
the Python C API is tedious from what I've seen.  The question is which is
*more* tedious, and how significant is the difference in tediousness to the guy
whose job it is to actually write the code.

--bb




Re: [Python-Dev] Path object design

2006-11-01 Thread Jean-Paul Calderone
On Wed, 01 Nov 2006 11:06:14 +0100, Georg Brandl <[EMAIL PROTECTED]> wrote:
>[EMAIL PROTECTED] wrote:
>> On 03:14 am, [EMAIL PROTECTED] wrote:
>>
>>  >One thing is sure -- we urgently need something better than os.path.
>>  >It functions well but it makes hard-to-read and unpythonic code.
>>
>> I'm not so sure.  The need is not any more "urgent" today than it was 5
>> years ago, when os.path was equally "unpythonic" and unreadable.  The
>> problem is real but there is absolutely no reason to hurry to a
>> premature solution.
>>
>> I've already recommended Twisted's twisted.python.filepath module as a
>> possible basis for the implementation of this feature.  I'm sorry I
>> don't have the time to pursue that.  I'm also sad that nobody else seems
>> to have noticed.  Twisted's implementation has an advantage that it
>> doesn't seem that these new proposals do, an advantage I would really
>> like to see in whatever gets seriously considered for adoption:
>
>Looking at
>,
>it seems as if FilePath was made to serve a different purpose than what we're
>trying to discuss here:
>
>"""
>I am a path on the filesystem that only permits 'downwards' access.
>
>Instantiate me with a pathname (for example,
>FilePath('/home/myuser/public_html')) and I will attempt to only provide access
>to files which reside inside that path. [...]
>
>The correct way to use me is to instantiate me, and then do ALL filesystem
>access through me.
>"""
>
>What a successor to os.path needs is not security, it's a better (more 
>pythonic,
>if you like) interface to the old functionality.

No.  You've misunderstood the code you looked at.  FilePath serves exactly
the purpose being discussed here.  Take a closer look.

Jean-Paul


Re: [Python-Dev] Path object design

2006-11-01 Thread Fredrik Lundh
Jonathan Lange wrote:

> Then let us discuss that.

Glyph's references to bike sheds went right over your head, right?





Re: [Python-Dev] Path object design

2006-11-01 Thread Jonathan Lange
On 11/1/06, Georg Brandl <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > On 03:14 am, [EMAIL PROTECTED] wrote:
> >
> >  >One thing is sure -- we urgently need something better than os.path.
> >  >It functions well but it makes hard-to-read and unpythonic code.
> >
> > I'm not so sure.  The need is not any more "urgent" today than it was 5
> > years ago, when os.path was equally "unpythonic" and unreadable.  The
> > problem is real but there is absolutely no reason to hurry to a
> > premature solution.
> >
> > I've already recommended Twisted's twisted.python.filepath module as a
> > possible basis for the implementation of this feature.  I'm sorry I
> > don't have the time to pursue that.  I'm also sad that nobody else seems
> > to have noticed.  Twisted's implementation has an advantage that it
> > doesn't seem that these new proposals do, an advantage I would really
> > like to see in whatever gets seriously considered for adoption:
>
> Looking at
> ,
> it seems as if FilePath was made to serve a different purpose than what we're
> trying to discuss here:
>
> """
> I am a path on the filesystem that only permits 'downwards' access.
>
> Instantiate me with a pathname (for example,
> FilePath('/home/myuser/public_html')) and I will attempt to only provide 
> access
> to files which reside inside that path. [...]
>
> The correct way to use me is to instantiate me, and then do ALL filesystem
> access through me.
> """
>
> What a successor to os.path needs is not security, it's a better (more 
> pythonic,
> if you like) interface to the old functionality.
>

Then let us discuss that. Is FilePath actually a better interface to
the old functionality? Even if it was designed to solve a security
problem, it might prove to be an extremely useful general interface.
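
To make the "only permits 'downwards' access" idea from the quoted docstring concrete, here is a minimal sketch of such a check. It is not Twisted's implementation: `child` and `InsecurePathError` are hypothetical names, the check works on path names only (it does not resolve symlinks), and `posixpath` is used so the behaviour is the same on every platform.

```python
import posixpath

class InsecurePathError(ValueError):
    """Raised when a requested child would fall outside the base directory."""

def child(base, name):
    """Join name onto base, refusing any result that escapes base."""
    base = posixpath.normpath(base)
    candidate = posixpath.normpath(posixpath.join(base, name))
    # Compare against "base/" so that "/a/bc" is not mistaken for a
    # child of "/a/b".
    prefix = base.rstrip("/") + "/"
    if candidate != base and not candidate.startswith(prefix):
        raise InsecurePathError("%r escapes %r" % (name, base))
    return candidate

BASE = "/home/myuser/public_html"
safe = child(BASE, "index.html")
```

With this helper, `child(BASE, "../.ssh/id_rsa")` and `child(BASE, "/etc/passwd")` both raise instead of quietly returning a path outside the base, which is the difference in default behaviour that the thread is arguing about.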

jml


[Python-Dev] RELEASED Python 2.3.6, FINAL

2006-11-01 Thread Anthony Baxter
On behalf of the Python development team and the Python
community, I'm happy to announce the release of Python 2.3.6
(FINAL).

Python 2.3.6 is a security bug-fix release. While Python 2.5
is the latest version of Python, we're making this release for
people who are still running Python 2.3. Unlike the recently
released 2.4.4, this release only contains a small handful of
security-related bugfixes. See the website for more.

*  Python 2.3.6 contains a fix for PSF-2006-001, a buffer overrun
*  in repr() of unicode strings in wide unicode (UCS-4) builds.
*  See http://www.python.org/news/security/PSF-2006-001/ for more.

This is a **source only** release. The Windows and Mac binaries
of 2.3.5 were built with UCS-2 unicode, and are therefore not
vulnerable to the problem outlined in PSF-2006-001. The PCRE fix
is for a long-deprecated module (you should use the 're' module
instead) and the email fix can be obtained by downloading the
standalone version of the email package.
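
To check which kind of build a given interpreter is, `sys.maxunicode` can be inspected. This is a small sketch and `is_wide_unicode_build` is a hypothetical helper name. On the interpreters this announcement concerns, the value distinguishes UCS-2 from UCS-4 builds; on Python 3.3 and later it is always 0x10FFFF, so the function always returns True there.

```python
import sys

def is_wide_unicode_build():
    """Return True when code points above U+FFFF fit in one character."""
    return sys.maxunicode > 0xFFFF

build_kind = "UCS-4 (wide)" if is_wide_unicode_build() else "UCS-2 (narrow)"
```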

Most vendors who ship Python should have already released a
patched version of 2.3.5 with the above fixes, this release is
for people who need or want to build their own release, but don't
want to mess around with patch or svn.

There have been no changes (apart from the version number) since the
release candidate of 2.3.6.

Python 2.3.6 will complete python.org's response to PSF-2006-001.
If you're still on Python 2.2 for some reason and need to work
with UCS-4 unicode strings, please obtain the patch from the
PSF-2006-001 security advisory page. Python 2.4.4 and Python 2.5
have both already been released and contain the fix for this
security problem.

For more information on Python 2.3.6, including download links
for source archives, release notes, and known issues, please see:

http://www.python.org/2.3.6

Highlights of this new release include:

  - A fix for PSF-2006-001, a bug in repr() for unicode strings 
on UCS-4 (wide unicode) builds.
  - Two other, less critical, security fixes.

Enjoy this release,
Anthony

Anthony Baxter
[EMAIL PROTECTED]
Python Release Manager
(on behalf of the entire python-dev team)




Re: [Python-Dev] patch 1462525 or similar solution?

2006-11-01 Thread Nick Coghlan
Paul Jimenez wrote:
> I submitted patch 1462525 awhile back to
> solve the problem described even longer ago in
> http://mail.python.org/pipermail/python-dev/2005-November/058301.html
> and I'm wondering what my appropriate next steps are. Honestly, I don't
> care if you take my patch or someone else's proposed solution, but I'd
> like to see something go into the stdlib so that I can eventually stop
> having to ship custom code for what is really a standard problem.

Something that has been lurking on my to-do list for the past year(!) is to 
get the urischemes module I wrote based on your uriparse module off the Python 
patch tracker [1] and into the cheese shop somewhere.

It already has limited documentation in the form of docstrings with doctest
examples (although the non-doctest examples in the module docstring still need
to be fixed), and there is a whole barrel of tests in the _test() function which
could be converted to unittest fairly easily.

The reason I'd like to see something in the cheese shop rather than going 
straight into the standard library is that:
   1. It may help people now, rather than in 18-24 months when 2.6 comes out
   2. The module can see some real world usage to firm up the API before we 
commit to it for the standard lib (if it gets added at all)

That said, I don't see myself finding the roundtuits to publish and promote 
this anytime soon :(

Cheers,
Nick.

[1]
http://sourceforge.net/tracker/?func=detail&aid=1500504&group_id=5470&atid=305470


-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] Path object design

2006-11-01 Thread Georg Brandl
[EMAIL PROTECTED] wrote:
> On 03:14 am, [EMAIL PROTECTED] wrote:
> 
>  >One thing is sure -- we urgently need something better than os.path.
>  >It functions well but it makes hard-to-read and unpythonic code.
> 
> I'm not so sure.  The need is not any more "urgent" today than it was 5 
> years ago, when os.path was equally "unpythonic" and unreadable.  The 
> problem is real but there is absolutely no reason to hurry to a 
> premature solution.
> 
> I've already recommended Twisted's twisted.python.filepath module as a 
> possible basis for the implementation of this feature.  I'm sorry I 
> don't have the time to pursue that.  I'm also sad that nobody else seems 
> to have noticed.  Twisted's implementation has an advantage that it
> doesn't seem that these new proposals do, an advantage I would really 
> like to see in whatever gets seriously considered for adoption:

Looking at 
,
it seems as if FilePath was made to serve a different purpose than what we're
trying to discuss here:

"""
I am a path on the filesystem that only permits 'downwards' access.

Instantiate me with a pathname (for example, 
FilePath('/home/myuser/public_html')) and I will attempt to only provide access 
to files which reside inside that path. [...]

The correct way to use me is to instantiate me, and then do ALL filesystem 
access through me.
"""

What a successor to os.path needs is not security, it's a better (more pythonic,
if you like) interface to the old functionality.

Georg



Re: [Python-Dev] Path object design

2006-11-01 Thread Georg Brandl
Fredrik Lundh wrote:
> Talin wrote:
> 
>> I'm right in the middle of typing up a largish post to go on the 
>> Python-3000 mailing list about this issue. Maybe we should move it over 
>> there, since it's likely that any path reform will have to be targeted at
>> Py3K...?
> 
> I'd say that any proposal that cannot be fit into the current 2.X design 
> is simply too disruptive to go into 3.0.  So here's my proposal for 2.6 
> (reposted from the 3K list).
> 
> This is fully backwards compatible, can go right into 2.6 without 
> breaking anything, allows people to update their code as they go,
> and can be incrementally improved in future releases:
> 
>  1) Add a pathname wrapper to "os.path", which lets you do basic
> path "algebra".  This should probably be a subclass of unicode,
> and should *only* contain operations on names.
> 
>  2) Make selected "shutil" operations available via the "os" name-
> space; the old POSIX API vs. POSIX SHELL distinction is pretty
> irrelevant.  Also make the os.path predicates available via the
> "os" namespace.
> 
> This gives a very simple conceptual model for the user; to manipulate
> path *names*, use "os.path.(string)" functions or the ""
> wrapper.  To manipulate *objects* identified by a path, given either as
> a string or a path wrapper, use "os.(path)".  This can be taught in
> less than a minute.

+1. This is really straightforward and easy to learn.

I have been a supporter of the full-blown Path object in the past, but the
recent discussions have convinced me that it is just too big and too confusing,
and that you can't kill too many birds with one stone in this respect.
Most of the ugliness really lies in the path name manipulation functions, which
nicely map to methods on a path name object.
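
As a rough illustration of point (1) in the quoted proposal, a names-only wrapper could look like the sketch below. The class name `PathName` and its method names are hypothetical, it subclasses `str` (the Python 3 counterpart of `unicode`), and it delegates to `posixpath` so the example behaves the same on every platform; a real wrapper would presumably delegate to `os.path`.

```python
import posixpath

class PathName(str):
    """Hypothetical path-name wrapper: string algebra only, no file access."""

    def joinpath(self, *parts):
        return type(self)(posixpath.join(self, *parts))

    @property
    def basename(self):
        return type(self)(posixpath.basename(self))

    @property
    def dirname(self):
        return type(self)(posixpath.dirname(self))

    @property
    def ext(self):
        return posixpath.splitext(self)[1]

p = PathName("/home/user").joinpath("docs", "notes.txt")
```

Because `p` is still a string, it can be handed unchanged to any function that takes a path, which keeps operations on the *objects* a path identifies out of the wrapper, as the proposal suggests.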

Georg



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Nick Coghlan
Travis Oliphant wrote:
> Nick Coghlan wrote:
>> In fact, it may make sense to just use the lists/strings directly as the 
>> data 
>> exchange format definitions, and let the various libraries do their own 
>> translation into their private format descriptions instead of creating a new 
>> one-type-to-describe-them-all.
> 
> Yes, I'm open to this possibility.   I basically want two things in the 
> object passed through the extended buffer protocol:
> 
> 1) It's fast on the C-level
> 2) It covers all the use-cases.
> 
> If just a particular string or list structure were passed, then I would 
> drop the data-format PEP and just have the dataformat argument of the 
> extended buffer protocol be that thing.
> 
> Then, something that converts ctypes objects to that special format 
> would be very nice indeed.

It may make sense to have a couple distinct sections in the datatype PEP:
  a. describing data formats with basic Python types
  b. a lightweight class for parsing these data format descriptions

It's most of the way there already - part A would just be the various styles 
of arguments accepted by the datatype constructor, and part B would be the 
datatype object itself.

I personally think it makes the most sense to do both, but separating the two 
would make it clear that the descriptions can be standardised without 
*necessarily* defining a new class.
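
A sketch of how the two parts could relate. Everything here is hypothetical and not the format proposed in the PEP: the description (part a) is a list of (name, spec) pairs whose spec is either a "struct" module character code or a nested list of the same shape, and `DataFormat` (part b) is a small class that parses it.

```python
import struct

# Part (a): a description built only from basic Python types.
PIXEL = [("r", "B"), ("g", "B"), ("b", "B")]
RECORD = [("id", "H"), ("pixel", PIXEL)]

# Part (b): a lightweight class that parses such a description.
class DataFormat:
    def __init__(self, description, byteorder="<"):
        self.description = description
        self._struct = struct.Struct(byteorder + self._codes(description))

    def _codes(self, description):
        # Flatten nested descriptions into one struct format string.
        return "".join(spec if isinstance(spec, str) else self._codes(spec)
                       for _, spec in description)

    @property
    def itemsize(self):
        return self._struct.size

    def unpack(self, data):
        values = iter(self._struct.unpack(data))
        return self._build(self.description, values)

    def _build(self, description, values):
        # Each leaf code is assumed to yield exactly one value.
        return {name: next(values) if isinstance(spec, str)
                else self._build(spec, values)
                for name, spec in description}

fmt = DataFormat(RECORD)
record = fmt.unpack(bytes([1, 0, 10, 20, 30]))
```

The lists `PIXEL` and `RECORD` are usable as an exchange format on their own; a library that prefers its own internal representation can ignore `DataFormat` entirely and translate the lists directly.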

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] Path object design

2006-11-01 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

> I am not addressing this message to the py3k list because its general 
> message of extreme conservatism on new features is more applicable to 
> python-dev.  However, py3k designers might also take note: if py3k is 
> going to do something in this area and drop support for the "legacy" 
> os.path, it would be good to choose something that is known to work and 
> have few gotchas, rather than just choosing the devil we don't know over 
> the devil we do.  The weaknesses of os.path are at least well-understood.

that's another reason why a new design might as well be defined in
terms of the old design -- especially if the main goal is call-site 
convenience, rather than fancy new algorithms.
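
A minimal sketch of what "defined in terms of the old design" could look
like: a string subclass whose methods only delegate to the existing
path functions. The PathName class and its method names are hypothetical,
not taken from any of the proposals in this thread, and posixpath is used
instead of os.path only so that the results are the same on every
platform.

```python
import posixpath


class PathName(str):
    """Hypothetical path-name wrapper: pure name algebra, no file access."""

    def child(self, *parts):
        # Same semantics as posixpath.join, but returns a PathName.
        return type(self)(posixpath.join(self, *parts))

    @property
    def parent(self):
        return type(self)(posixpath.dirname(self))

    @property
    def name(self):
        return posixpath.basename(self)

    @property
    def ext(self):
        return posixpath.splitext(self)[1]


script = PathName("/usr/lib").child("python", "os.py")
```

Because every operation is a thin call into the old functions, the
wrapper inherits their well-understood behaviour and still works
anywhere a plain string is accepted.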





Re: [Python-Dev] Path object design

2006-11-01 Thread glyph
On 03:14 am, [EMAIL PROTECTED] wrote:

>One thing is sure -- we urgently need something better than os.path.
>It functions well but it makes hard-to-read and unpythonic code.

I'm not so sure.  The need is not any more "urgent" today than it was
5 years ago, when os.path was equally "unpythonic" and unreadable.  The
problem is real but there is absolutely no reason to hurry to a
premature solution.

I've already recommended Twisted's twisted.python.filepath module as a
possible basis for the implementation of this feature.  I'm sorry I
don't have the time to pursue that.  I'm also sad that nobody else
seems to have noticed.  Twisted's implementation has an advantage that
these new proposals do not seem to have, an advantage I would really
like to see in whatever gets seriously considered for adoption:

*It is already used in a large body of real, working code, and
therefore its limitations are known.*

If I'm wrong about this, and I can't claim to really know about the
relative levels of usage of all of these various projects when they're
not mentioned, please cite actual experiences using them vs. using
os.path.

Proposals for extending the language are contentious and it is very
difficult to do experimentation with non-trivial projects because
nobody wants to do that and then end up with a bunch of code written
in a language that is no longer supported when the experiment fails.
I understand, therefore, that language-change proposals are going to
be very contentious no matter what.

However, there is no reason that library changes need to follow this
same path.  It is perfectly feasible to write a library, develop some
substantial applications with it, tweak it based on that experience,
and *THEN* propose it for inclusion in the standard library.  Users of
the library can happily continue using the library, whether it is
accepted or not, and users of the language and standard library get a
new feature for free.

For example, I plan to continue using FilePath regardless of the
outcome of this discussion, although perhaps some conversion methods or
adapters will be in order if a new path object makes it into the
standard library.

I specifically say "library" and not "recipe".  This is not a useful
exercise if every user of the library has a subtly incompatible and
manually tweaked version for their particular application.

Path representation is a bike shed.  Nobody would have proposed writing
an entirely new embedded database engine for Python: python 2.5 simply
included SQLite because its utility was already proven.

I also believe it is important to get this issue right.  It might be a
bike shed, but it's a *very important* bike shed.  Google for "web
server url filesystem path vulnerability" and you'll see what I mean.
Getting it wrong (or passing strings around everywhere) means potential
security gotchas lurking around every corner.  Even Twisted, with no C
code at all, got its only known arbitrary-code-execution vulnerability
from a path manipulation bug.  That was even after we'd switched to an
OO path-manipulation layer specifically to avoid bugs like this!

I am not addressing this message to the py3k list because its general
message of extreme conservatism on new features is more applicable to
python-dev.  However, py3k designers might also take note: if py3k is
going to do something in this area and drop support for the "legacy"
os.path, it would be good to choose something that is known to work and
have few gotchas, rather than just choosing the devil we don't know
over the devil we do.  The weaknesses of os.path are at least
well-understood.
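
To make the security concern about URL-to-filesystem path handling
concrete, here is a rough sketch of the kind of check a path layer can
perform before serving a file.  It is not Twisted's implementation: the
safe_child function and InsecurePathError exception are hypothetical
names, the root is assumed to be an absolute POSIX path, and the check
is purely lexical (it does not resolve symlinks).

```python
import posixpath


class InsecurePathError(ValueError):
    """Raised when an untrusted segment would escape the root directory."""


def safe_child(root, untrusted):
    # Normalise both paths so that ".." segments are collapsed lexically.
    root = posixpath.normpath(root)
    candidate = posixpath.normpath(posixpath.join(root, untrusted))
    # The candidate is acceptable only if it still lies under the root.
    if posixpath.commonpath([root, candidate]) != root:
        raise InsecurePathError(
            "%r escapes %r" % (untrusted, root))
    return candidate


served = safe_child("/srv/www", "img/logo.png")
```

Passing plain strings through posixpath.join alone would happily return
"/etc/passwd" for an input such as "../../etc/passwd"; centralising the
check in one place is the point of using a path object.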


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Martin v. Löwis
Bill Baxter schrieb:
> Basically in my code I want to be able to take the binary data descriptor and
> say "give me the 'r' field of this pixel as an integer".
> 
> Is either one (the PEP or c-types) clearly easier to use in this case?  What
> would the code look like for handling both formats generically?

The PEP, as specified, does not support accessing individual fields from
Python. OTOH, ctypes, as implemented, does. This comparison is not fair,
though: an *implementation* of the PEP (say, NumPy) might also give you
Python-level access to the fields.

With the PEP, you can get access to the 'r' field from C code.
Performing this access is quite tedious; as I'm uncertain whether you
actually wanted to see C code, I refrain from trying to formulate it.
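
For the ctypes side of the comparison, Python-level field access looks
roughly like this.  The Pixel layout (three unsigned bytes named r, g
and b) is an assumption made for illustration; the thread does not
specify the actual pixel format.

```python
import ctypes


class Pixel(ctypes.Structure):
    # Assumed layout: one unsigned byte per colour channel.
    _fields_ = [
        ("r", ctypes.c_ubyte),
        ("g", ctypes.c_ubyte),
        ("b", ctypes.c_ubyte),
    ]


raw = bytes([200, 100, 50])
pixel = Pixel.from_buffer_copy(raw)  # copy the raw bytes into a structure
red = pixel.r                        # the 'r' field, as a Python integer
```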

Regards,
Martin


Re: [Python-Dev] patch 1462525 or similar solution?

2006-11-01 Thread Martin v. Löwis
Paul Jimenez schrieb:
> I submitted patch 1462525 a while back to
> solve the problem described even longer ago in
> http://mail.python.org/pipermail/python-dev/2005-November/058301.html
> and I'm wondering what my appropriate next steps are. Honestly, I don't
> care if you take my patch or someone else's proposed solution, but I'd
> like to see something go into the stdlib so that I can eventually stop
> having to ship custom code for what is really a standard problem.

The problem, as I see it, is that we cannot afford to include an
"incorrect" library *again*. urllib may be ill-designed, but can't
be changed for backwards compatibility reasons. The same should
not happen to urilib: it has to be "right" from the start.

So the question is: are you willing to work on it until it is right?

I just reviewed it a bit, and have a number of questions:
- Can you please sign a contributor form, from
  http://www.python.org/psf/contrib/
  and then add the magic words ("Licensed to PSF under
  a Contributor Agreement.") to this code?

- I notice there is no documentation. Can you please come
  up with a patch to Doc/lib?

- Also, there are no test cases. Can you please come up with
  a test suite?

- Is this library also meant to support creation of URIs?
  If so, shouldn't it also do percent-encoding if the
  input contains reserved characters? Also, shouldn't
  it perform percent-undecoding when the URI contains
  percent-encoded unreserved characters?

- Should this library support RFC 3987 also?

- Why does the code still name things "URL"? The RFC
  avoids this name throughout (except for explaining
  that the fact that the URI is a locator is really
  irrelevant)
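
The percent-encoding question above can be illustrated with a small
sketch.  It uses urllib.parse.quote from present-day Python 3, not the
patch under review; the decode_unreserved helper is hypothetical and
follows the RFC 3986 normalisation rule of decoding only escapes that
stand for unreserved characters.

```python
import re
import string
from urllib.parse import quote

# RFC 3986 "unreserved" characters: letters, digits, "-", ".", "_", "~".
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def decode_unreserved(uri):
    """Decode %XX escapes only when they stand for unreserved characters."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        # Leave other escapes in place, with upper-case hex digits.
        return match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", replace, uri)


# Building a URI component: reserved characters in the input are escaped.
encoded = quote("a b&c", safe="")               # "a%20b%26c"
# Normalising a URI: "%7E" becomes "~", "%2f" stays escaped as "%2F".
normalised = decode_unreserved("/%7Euser/a%2fb")
```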

Regards,
Martin