subject:"\[Python\-Dev\] Path object design"

Steve Holden wrote:
> Greg Ewing wrote:
> 
>>Having said that, I can see there could be an
>>element of confusion in calling it "join".
>>
> 
> Good point. "relativise" might be appropriate,

Sounds like something to make my computer go at
warp speed, which would be nice, but I won't
be expecting a patch any time soon. :-)

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

2006-11-02 Thread Steve Holden

Greg Ewing wrote:
> Mike Orr wrote:
> 
> 
>>>* This is confusing as heck:
>>>  >>> os.path.join("hello", "/world")
>>>  '/world'
> 
> 
> It's only confusing if you're not thinking of
> pathnames as abstract entities.
> 
> There's a reason for this behaviour -- it's
> so you can do things like
> 
>full_path = os.path.join(default_dir, filename_from_user)
> 
> where filename_from_user can be either a relative
> or absolute path at his discretion.
> 
> In other words, os.path.join doesn't just mean "join
> these two paths together", it means "interpret the
> second path in the context of the first".
> 
> Having said that, I can see there could be an
> element of confusion in calling it "join".
> 
Good point. "relativise" might be appropriate, though something shorter 
would be better.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

2006-11-02 Thread glyph

On 01:04 am, [EMAIL PROTECTED] wrote:>[EMAIL PROTECTED] wrote:>If you're serious about writing platform-agnostic>pathname code, you don't put slashes in the arguments>at all. Instead you do>>   os.path.join("hello", "slash", "world")>>Many of the other things you mention are also a>result of not treating pathnames as properly opaque>objects.Of course nobody who cares about these issues is going to put constant forward slashes into pathnames.  The point is not that you'll forget you're supposed to be dealing with pathnames; the point is that you're going to get input from some source that you've got very little control over, and *especially* if that source is untrusted (although sometimes just due to mistakes) there are all kinds of ways it can trip you up.  Did you accidentally pass it through something that doubles or undoubles all backslashes, etc.  Sometimes these will result in harmless errors anyway, sometimes it's a critical error that will end up trying to delete /usr instead of /home/user/installer-build/ROOT/usr.  If you have the path library catching these problems for you then a far greater percentage fall into the former category.>If you're saying that the fact they're strings makes>it easy to forget that you're supposed to be treating>them opaquely,That's exactly what I'm saying.>>  * although individual operations are atomic, shutil.copytree and friends >>aren't.  I've often seen python programs confused by partially-copied trees >>of files.>I can't see how this can be even remotely regarded>as a pathname issue, or even a filesystem interface>issue. It's no different to any other situation>where a piece of code can fall over and leave a>partial result behind.It is a bit of a stretch, I'll admit, but I included it because it is a weakness of the path library that it is difficult to do the kind of parallel iteration required to implement tree-copying yourself.  If that were trivial, then you could write your own file-copying loop and cope with errors yourself.___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

[EMAIL PROTECTED] wrote:
> Relative 
> paths, if they should exist at all, should have to be explicitly linked 
> as relative to something *else* (e.g. made absolute) before they can be 
> used.

If paths were opaque objects, this could be enforced
by not having any way of constructing a path that
wasn't rooted in some existing absolute path.

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

Mike Orr wrote:
> I have no idea why Microsoft thought it was a good idea to
> put the seven-odd device files in every directory. Why not force
> people to type the colon ("CON:").

Yes, this is a particularly stupid piece of braindamage
on the part of the designers of MS-DOS. As far as I
remember, even CP/M (which was itself a severely
warped and twisted version of RT11) had the good
sense to put colons on the end of such things.

But maybe "design" is too strong a word to apply
to MS-DOS...

Anyhow, I think I agree that there's really nothing
a path library can do about this. Whatever it tries
to do, the fact will remain that it's impossible to
have a regular file called "con", and users will
have to live with that.

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

[EMAIL PROTECTED] wrote:

>>>> os.path.join("hello", "slash/world")
>'hello/slash/world'
>>>> os.path.join("hello", "slash//world")
>'hello/slash//world'
>Trying to formulate a general rule for what the arguments to 
> os.path.join are supposed to be is really hard.

If you're serious about writing platform-agnostic
pathname code, you don't put slashes in the arguments
at all. Instead you do

   os.path.join("hello", "slash", "world")

Many of the other things you mention are also a
result of not treating pathnames as properly opaque
objects.

If you're saying that the fact they're strings makes
it easy to forget that you're supposed to be treating
them opaquely, there may be merit in that view. It
would be an argument for making path objects a
truly opaque type instead of a subclass of string
or tuple.

>  * although individual operations are atomic, shutil.copytree and 
> friends aren't.  I've often seen python programs confused by 
> partially-copied trees of files.

I can't see how this can be even remotely regarded
as a pathname issue, or even a filesystem interface
issue. It's no different to any other situation
where a piece of code can fall over and leave a
partial result behind. As always, the cure is
defensive coding (clean up a partial result on error,
or be prepared to tolerate the presence of a previous
partial result when re-trying).

It could be argued that shutil.copytree should clean
up after itself if there is an error, but that might
not be what you want -- e.g. you might want to find
out how far it got, and maybe carry on from there
next time. It's probably better to leave things like
that to the caller.

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

Mike Orr wrote:

>> * This is confusing as heck:
>>   >>> os.path.join("hello", "/world")
>>   '/world'

It's only confusing if you're not thinking of
pathnames as abstract entities.

There's a reason for this behaviour -- it's
so you can do things like

   full_path = os.path.join(default_dir, filename_from_user)

where filename_from_user can be either a relative
or absolute path at his discretion.

In other words, os.path.join doesn't just mean "join
these two paths together", it means "interpret the
second path in the context of the first".

Having said that, I can see there could be an
element of confusion in calling it "join".

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On 01:46 am, [EMAIL PROTECTED] wrote:
> >On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> >This is ironic coming from one of Python's celebrity geniuses.  "We
> >made this class but we don't know how it works."  Actually, it's
> >downright alarming coming from someone who knows Twisted inside and
> >out yet still can't make sense of path patform oddities.
>
> Man, it is going to be hard being ironically self-deprecating if people keep
> going around calling me a "celebrity genius".  My ego doesn't need any help,
> you know? :)

I respect Twisted in the same way I respect a loaded gun.  It's
powerful, but approach with caution.

> If you ever think I'm suggesting breaking something in Python, you're
> misinterpreting me ;).  I am as cagey as they come about this.  No matter
> what else happens, the behavior of os.path should not really change.

The point is, what *should* a join-like method do in a future improved
path module?  os.path.join should not change because too many programs
depend on its current behavior, in ways we can't necessarily predict.
But a new function/method is not bound by these constraints, as long
as the boundary cases are well documented.  All the os.path and
file-related os/shutil functions need to be reexamined in this
context.  Maybe the existing behavior is best, maybe we'll keep it
even if it's sub-optimal, but we should document why we're making
these choices.

> >The user didn't call normpath, so should we normalize it anyway?
>
> That's really the main point here.
>
> What is a path that hasn't been "normalized"?  Is it a path at all, or is it
> some random garbage with slashes (or maybe other things) in it?  os.path
> performs correct path algebra on correct inputs, and it's correct (as far as
> one can be correct) on inputs that have weird junk in them.

I'm tempted to say Path("/a/b").join("c", "d") should do the same
thing your .child method does, but allow multiple levels in one step.

But on the other hand, there will always be people with prebuilt
"path/fragments" to join to other fragments, and I'm not sure we
should force them to split the fragment just to rejoin it again.
Maybe we need a .join_unsafe method for this, haha.

-- 
Mike Orr <[EMAIL PROTECTED]>
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

On 01:46 am, [EMAIL PROTECTED] wrote:>On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:>This is ironic coming from one of Python's celebrity geniuses.  "We>made this class but we don't know how it works."  Actually, it's>downright alarming coming from someone who knows Twisted inside and>out yet still can't make sense of path patform oddities.Man, it is going to be hard being ironically self-deprecating if people keep going around calling me a "celebrity genius".  My ego doesn't need any help, you know? :)In some sense I was being serious; part of the point of abstraction is embedding some of your knowledge in your code so you don't have to keep it around in your brain all the time.  I'm sure that my analysis of path-based problems wasn't exhaustive because I don't really use os.path for path manipulation.  I use static.File and it _works_, I only remember these os.path flaws from the process of writing it, not daily use.>>  * This is confusing as heck:>>    >>> os.path.join("hello", "/world")>>    '/world'>>That's in the documentation.  I'm not sure it's "wrong".  What should>it do in this situation?  Pretend the slash isn't there?You can document anything.  That doesn't really make it a good idea.The point I was trying to make wasn't really that os.path is *wrong*.  Far from it, in fact, it defines some useful operations and they are basically always correct.  I didn't even say "wrong", I said "confusing".  FilePath is implemented strictly in terms of os.path because it _does_ do the right thing with its inputs.  The question is, how hard is it to remember what its inputs should be?>>    >>> os.path.join("hello", "slash/world")>>    'hello/slash/world'>>That has always been a loophole in the function, and many programs>depend on it.If you ever think I'm suggesting breaking something in Python, you're misinterpreting me ;).  I am as cagey as they come about this.  No matter what else happens, the behavior of os.path should not really change.>The user didn't call normpath, so should we normalize it anyway?That's really the main point here.What is a path that hasn't been "normalized"?  Is it a path at all, or is it some random garbage with slashes (or maybe other things) in it?  os.path performs correct path algebra on correct inputs, and it's correct (as far as one can be correct) on inputs that have weird junk in them.In the strings-and-functions model of paths, this all makes perfect sense, and there's no particular sensibility associated with defining ideas like "equivalency" for paths, unless that's yet another function you pass some strings to.  I definitely prefer this:    path1 == path2to this:    os.path.abspath(pathstr1) == os.path.abspath(pathstr2)though.You'll notice I used abspath instead of normpath.  As a side note, I've found interpreting relative paths as always relative to the current directory is a bad idea.  You can see this when you have a daemon that daemonizes and then opens files: the user thinks they're specifying relative paths from wherever they were when they ran the program, the program thinks they're relative paths from /var/run/whatever.  Relative paths, if they should exist at all, should have to be explicitly linked as relative to something *else* (e.g. made absolute) before they can be used.  I think that sequences of strings might be sufficient though.>Good point, but exactly what functionality do you want to see for zip>files and URLs?  Just pathname manipulation?  Or the ability to see>whether a file exists and extract it, copy it, etc?The latter.  See http://twistedmatrix.com/trac/browser/trunk/twisted/python/zippath.pyThis is still _really_ raw functionality though.  I can't claim that it has the same "it's been used in real code" endorsement as the rest of the FilePath stuff I've been talking about.  I've never even tried to hook this up to a Twisted webserver, and I've only used it in one environment.>>  * you have to care about unicode sometimes.>This is a Python-wide problem.I completely agree, and this isn't the thread to try to solve it.  The absence of a path object, however, and the path module's reliance on strings, exacerbates the problem.  The fact that FilePath doesn't deal with this either, however, is a fairly good indication that the problem is deeper than that.>>  * the documentation really can't emphasize enough how bad using>> 'os.path.exists/isfile/isdir', and then assuming the file continues to exist>> when it is a contended resource, is.  It can be handy, but it is _always_ a>> race condition.>>What else can you do?  It's either os.path.exists()/os.remove() or "do>it anyway and catch the exception".  And sometimes you have to check>the filetype in order to determine *what* to do.You have to catch the exception anyway in many cases.  I probably shouldn't have mentioned it though, it's starting to get a bit far afield of even this ridiculously far-ranging discussion.  A more accurate criticism might be that "the absence of a file locking syste

Re: [Python-Dev] Path object design

On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On 08:14 pm, [EMAIL PROTECTED] wrote:

> >(...) people have had to spend five years putting hard-to-read
> >os.path functions in the code, or reinventing the wheel with their own
> >libraries that they're not sure they can trust.  I started to use
> >path.py last year when it looked like it was emerging as the basis of
> >a new standard, but yanked it out again when it was clear the API
> >would be different by the time it's accepted.  I've gone back to
> >os.path for now until something stable emerges but I really wish I
> >didn't have to.
>
> You *don't* have to.  This is a weird attitude I've encountered over and
> over again in the Python community, although sometimes it masquerades as
> resistance to Twisted or Zope or whatever.  It's OK to use libraries.  It's
> OK even to use libraries that Guido doesn't like!  I'm pretty sure the first
> person to tell you that would be Guido himself.  (Well, second, since I just
> told you.)  If you like path.py and it solves your problems, use path.py.
> You don't have to cram it into the standard library to do that.  It won't be
> any harder to migrate from an old path object to a new path object than from
> os.path to a new path object, and in fact it would likely be considerably
> easier.

Oh, I understand it's OK to use libraries.  It's just that a path
library needs to be widely tested and well supported so you know it
won't scramble your files.  A bug in a date library affects only
datetimes. A bug in a database database library affects only that
database.  A bug in a template library affects only the page being
output.  But a bug in a path library could ruin your whole day.  "Um,
remember those important files in that other project directory you
weren't working in? They were just overwritten."

Also, I train several programmers new to Python at work. I want to
make them learn *one* path library that we'll be sure to stick with
for several years.  Every path library has subtle quirks, and
switching from one to another may not be just a matter of renaming
methods.

> >- the "secure" features may not be necessary.  If they are, this
> >should be a separate discussion, and perhaps implemented as a
> >subclass.
>
> The main "secure" feature is "child" and it is, in my opinion, the best part
> about the whole class.  Some of the other stuff (rummaging around for
> siblings with extensions, for example) is probably extraneous.  child,
> however, lets you take a string from arbitrary user input and map it into a
> path segment, both securely and quietly.  Here's a good example (and this
> actually happened, this is how I know about that crazy windows 'special
> files' thing I wrote in my other recent message): you have a decision-making
> program that makes two files to store information about a process: "pro" and
> "con".  It turns out that "con" is shorthand for "fall in a well and die" in
> win32-ese.  A "secure" path manipulation library would alert you to this
> problem with a traceback rather than having it inexplicably freeze.
> Obscure, sure, but less obscure would be getting deterministic errors from a
> user entering slashes into a text field that shouldn't accept them.

Perhaps you're right.  I'm not saying it *should not* be a basic
feature, just that unless the Python community as a whole is ready for
this, users should have a choice to use it or not.

I learned about DOS device files from the manuals back in the 80s.
But I had completely forgotten them when I made several "aux"
directories in a Subversion repository on Linux.  People tried to
check it out on Windows and... got some kind of error.  "CON" means
console: its input comes from the keyboard and its output goes to the
screen.  Since this is a device file, I'm not sure a path library has
any responsibility to treat it specially.  We don't treat
"/dev/stdout" specially unless the user specifically calls a device
function. I have no idea why Microsoft thought it was a good idea to
put the seven-odd device files in every directory. Why not force
people to type the colon ("CON:").  If they've memorized what CON
means they should have no trouble with the colon, especially since
it's required with "A:" and "C:" anyway

For trivia, these are the ones I remember:
CON   Console  (keyboard input, screen output)
KBRD  Keyboard input.
???  screen output
LPT1/2/3parallel ports
COM 1/2/3/4  serial ports
PRN  alias for default printer port (normally LPT1)
NUL  bit bucket
AUX  game port?

COPY CON FILENAME.TXT # Unix: "cat >filename.txt".
COPY FILENAME.TXT PRN  # Unix: "lp filename.txt"  or "cat
filename.txt | lp".
TYPE FILENAME.TXT   # Unix: "cat filename.txt".

> >Where have all the proponents of non-OO or limited-OO strategies been?
>
> This continuum doesn't make any sense to me.  Where would you

Re: [Python-Dev] Path object design

On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> On 06:14 pm, [EMAIL PROTECTED] wrote:
> >[EMAIL PROTECTED] wrote:
> >
> >> I assert that it needs a better[1] interface because the current
> >> interface can lead to a variety of bugs through idiomatic, apparently
> >> correct usage.  All the more because many of those bugs are related to
> >> critical errors such as security and data integrity.
>
> >instead of referring to some esoteric knowledge about file systems that
> >us non-twisted-using mere mortals may not be evolved enough to under-
> >stand,
>
> On the contrary, twisted users understand even less, because (A) we've been
> demonstrated to get it wrong on numerous occasions in highly public and
> embarrassing ways and (B) we already have this class that does it all for us
> and we can't remember how it works :-).

This is ironic coming from one of Python's celebrity geniuses.  "We
made this class but we don't know how it works."  Actually, it's
downright alarming coming from someone who knows Twisted inside and
out yet still can't make sense of path patform oddities.

>  * This is confusing as heck:
>>>> os.path.join("hello", "/world")
>'/world'

That's in the documentation.  I'm not sure it's "wrong".  What should
it do in this situation?  Pretend the slash isn't there?

This came up in the directory-tuple proposal.  I said there was no
reason to change the existing behavior of join.  Noam favored an
exception.

>>>> os.path.join("hello", "slash/world")
>'hello/slash/world'

That has always been a loophole in the function, and many programs
depend on it.  Again, is it "wrong"?  Should an embedded separator in
an argument be an error?  Obviously this depends on the user's
knowledge that the separator happens to be slash.

>>>> os.path.join("hello", "slash//world")
>'hello/slash//world'

Again a case of what "should" it do?  The filesystem treats it as a
single slash.  The user didn't call normpath, so should we normalize
it anyway?

>  * Sometimes a path isn't a path; the zip "paths" in sys.path are a good
> example.  This is why I'm a big fan of including a polymorphic interface of
> some kind: this information is *already* being persisted in an ad-hoc and
> broken way now, so it needs to be represented; it would be good if it were
> actually represented properly.  URL
> manipulation-as-path-manipulation is another; the recent
> perforce use-case mentioned here is a special case of that, I think.

Good point, but exactly what functionality do you want to see for zip
files and URLs?  Just pathname manipulation?  Or the ability to see
whether a file exists and extract it, copy it, etc?

>  * you have to care about unicode sometimes.  rarely enough that none of
> your tests will ever account for it, but often enough that _some_ users will
> notice breakage if your code is ever widely distributed.

This is a Python-wide problem.  The move to universal unicode will
lessen this, or at least move the problem to *one* place (creating the
unicode object), where every Python programmer will get bitten by it
and we'll develop a few standard strategies to deal with it.

(The problem is that if str and unicode are mixed in expressions,
Python will promote the str to unicode and you'll get a
UnicodeDecodeError if it contains non-ASCII characters.  Figuring out
all the ways such strings can slip into a program is difficult if
you're dealing with user strings from an unknown charset, or your
MySQL server is configured differently than you thought it was, or the
string contains Windows curly quotes et al which are undefined in
Latin-1.)

>  * the documentation really can't emphasize enough how bad using
> 'os.path.exists/isfile/isdir', and then assuming the file continues to exist
> when it is a contended resource, is.  It can be handy, but it is _always_ a
> race condition.

What else can you do?  It's either os.path.exists()/os.remove() or "do
it anyway and catch the exception".  And sometimes you have to check
the filetype in order to determine *what* to do.

-- 
Mike Orr <[EMAIL PROTECTED]>
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

On 08:14 pm, [EMAIL PROTECTED] wrote:>Argh, it's difficult to respond to one topic that's now spiraling into>two conversations on two lists.>[EMAIL PROTECTED] wrote:>(...) people have had to spend five years putting hard-to-read>os.path functions in the code, or reinventing the wheel with their own>libraries that they're not sure they can trust. I started to use>path.py last year when it looked like it was emerging as the basis of>a new standard, but yanked it out again when it was clear the API>would be different by the time it's accepted. I've gone back to>os.path for now until something stable emerges but I really wish I>didn't have to.You *don't* have to. This is a weird attitude I've encountered over and over again in the Python community, although sometimes it masquerades as resistance to Twisted or Zope or whatever. It's OK to use libraries. It's OK even to use libraries that Guido doesn't like! I'm pretty sure the first person to tell you that would be Guido himself. (Well, second, since I just told you.) If you like path.py and it solves your problems, use path.py. You don't have to cram it into the standard library to do that. It won't be any harder to migrate from an old path object to a new path object than from os.path to a new path object, and in fact it would likely be considerably easier.>> *It is already used in a large body of real, working code, and>> therefore its limitations are known.*>>This is an important consideration.However, to me a clean API is more>important.It's not that I don't think a "clean" API is important. It's that I think that "clean" is a subjective assessment that is hard to back up, and it helps to have some data saying "we think this is clean because there are very few bugs in this 100,000 line program written using it". Any code that is really easy to use right will tend to have *some* aesthetic appeal.>I took a quick look at filepath. It looks similar in concept to PEP>355. Four concerns:> - unfamiliar method names (createDirectory vs mkdir, child vs join)Fair enough, but "child" really means child, not join. It is explicitly for joining one additional segment, with no slashes in it.> - basename/dirname/parent are methods rather than properties:>leads to () overproliferation in user code.The () is there because every invocation returns a _new_ object. I think that this is correct behavior but I also would prefer that it remain explicit.> - the "secure" features may not be necessary. If they are, this>should be a separate discussion, and perhaps implemented as a>subclass.The main "secure" feature is "child" and it is, in my opinion, the best part about the whole class. Some of the other stuff (rummaging around for siblings with extensions, for example) is probably extraneous. child, however, lets you take a string from arbitrary user input and map it into a path segment, both securely and quietly. Here's a good example (and this actually happened, this is how I know about that crazy windows 'special files' thing I wrote in my other recent message): you have a decision-making program that makes two files to store information about a process: "pro" and "con". It turns out that "con" is shorthand for "fall in a well and die" in win32-ese. A "secure" path manipulation library would alert you to this problem with a traceback rather than having it inexplicably freeze. Obscure, sure, but less obscure would be getting deterministic errors from a user entering slashes into a text field that shouldn't accept them.> - stylistic objection to verbose camelCase names like createDirectoryThere is no accounting for taste, I suppose. Obviously if it violates the stlib's naming conventions it would have to be adjusted.>> Path representation is a bike shed. Nobody would have proposed>> writing an entirely new embedded database engine for Python: python>> 2.5 simply included SQLite because its utility was already proven.>>There's a quantum level of difference between path/file manipulation>-- which has long been considered a requirement for any full-featured>programming language -- and a database engine which is much more>complex."quantum" means "the smallest possible amount", although I don't think you're using like that, so I think I agree with you. No, it's not as hard as writing a database engine. Nevertheless it is a non-trivial problem, one worthy of having its own library and clearly capable of generating a fair amount of its own discussion.>Fredrik has convinced me that it's more urgent to OOize the pathname>conversions than the filesystem operations.I agree in the relative values. I am still unconvinced that either is "urgent" in the sense that it needs to be in the standard library.>Where have all the proponents of non-OO or limited-OO strategies been?This continuum doesn't make any sense to me. Where would you place Twisted's solution on it?___
Python-Dev mailing list
Python-Dev@python.org

Re: [Python-Dev] Path object design

On 06:14 pm, [EMAIL PROTECTED] wrote:>[EMAIL PROTECTED] wrote:>>> I assert that it needs a better[1] interface because the current>> interface can lead to a variety of bugs through idiomatic, apparently>> correct usage.  All the more because many of those bugs are related to>> critical errors such as security and data integrity.>instead of referring to some esoteric knowledge about file systems that>us non-twisted-using mere mortals may not be evolved enough to under->stand,On the contrary, twisted users understand even less, because (A) we've been demonstrated to get it wrong on numerous occasions in highly public and embarrassing ways and (B) we already have this class that does it all for us and we can't remember how it works :-).>maybe you could just make a list of common bugs that may arise>due to idiomatic use of the existing primitives?Here are some common gotchas that I can think of off the top of my head.  Not all of these are resolved by Twisted's path class:Path manipulation: * This is confusing as heck:   >>> os.path.join("hello", "/world")   '/world'   >>> os.path.join("hello", "slash/world")   'hello/slash/world'   >>> os.path.join("hello", "slash//world")   'hello/slash//world'   Trying to formulate a general rule for what the arguments to os.path.join are supposed to be is really hard.  I can't really figure out what it would be like on a non-POSIX/non-win32 platform. * it seems like slashes should be more aggressively converted to backslashes on windows, because it's near impossible to do anything with os.sep in the current situation. * "C:blah" does not mean what you think it means on Windows.  Regardless of what you think it means, it is not that.  I thought I understood it once as the current process having a current directory on every mapped drive, but then I had to learn about UNC paths of network mapped drives and it stopped making sense again. * There are special files on windows such as "CON" and "NUL" which exist in _every_ directory.  Twisted does get around this, by looking at the result of abspath:   >>> os.path.abspath("c:/foo/bar/nul")   'nul' * Sometimes a path isn't a path; the zip "paths" in sys.path are a good example.  This is why I'm a big fan of including a polymorphic interface of some kind: this information is *already* being persisted in an ad-hoc and broken way now, so it needs to be represented; it would be good if it were actually represented properly.  URL manipulation-as-path-manipulation is another; the recent perforce use-case mentioned here is a special case of that, I think. * paths can have spaces in them and there's no convenient, correct way to quote them if you want to pass them to some gross function like os.system - and a lot of the code that manipulates paths is shell-script-replacement crud which wants to call gross functions like os.system.  Maybe this isn't really the path manipulation code's fault, but it's where people start looking when they want properly quoted path arguments. * you have to care about unicode sometimes.  rarely enough that none of your tests will ever account for it, but often enough that _some_ users will notice breakage if your code is ever widely distributed.  this is an even more obscure example, but pygtk always reports pathnames in utf8-encoded *byte* strings, regardless of your filesystem encoding.  If you forget to decode/encode it, hilarity ensues.  There's no consistent error reporting (as far as I can tell, I have encountered this rarely) and no real way to detect this until you have an actual insanely-configured system with an insanely-named file on it to test with.  (Polymorphic interfaces might help a *bit* here.  At worst, they would at least make it possible to develop a canonical "insanely encoded filesystem" test-case backend.  At best, you'd absolutely have to work in terms of unicode all the time, and no implicit encoding issues would leak through to application code.)  Twisted's thing doesn't deal with this at all, and it really should. * also *sort* of an encoding issue, although basically only for webservers or other network-accessible paths: thanks to some of these earlier issues as well as %2e%2e, there are effectively multiple ways to spell "..".  Checking for all of them is impossible, you need to use the os.path APIs to determine if the paths you've got really relate in the ways you think they do. * os.pathsep can be, and actually sometimes is, embedded in a path.  (again, more  of a general path problem, not really python's fault) * relative path manipulation is difficult.  ever tried to write the function to iterate two separate trees of files in parallel?  shutil re-implements this twice completely differently via recursion, and it's harder to do with a generator (which is what you really want).  you can't really split on os.sep and have it be correct due to the aforementioned windows-path issue, but that's what everybody does anyway. * os.path.split doesn't work anything like str.split.FS ma

Re: [Python-Dev] Path object design

Argh, it's difficult to respond to one topic that's now spiraling into
two conversations on two lists.

[EMAIL PROTECTED] wrote:
> On 03:14 am, [EMAIL PROTECTED] wrote:
>
> >One thing is sure -- we urgently need something better than os.path.
> >It functions well but it makes hard-to-read and unpythonic code.
>
> I'm not so sure.  The need is not any more "urgent" today than it was
> 5 years ago, when os.path was equally "unpythonic" and unreadable.
> The problem is real but there is absolutely no reason to hurry to a
> premature solution.

Except that people have had to spend five years putting hard-to-read
os.path functions in the code, or reinventing the wheel with their own
libraries that they're not sure they can trust.  I started to use
path.py last year when it looked like it was emerging as the basis of
a new standard, but yanked it out again when it was clear the API
would be different by the time it's accepted.  I've gone back to
os.path for now until something stable emerges but I really wish I
didn't have to.

> I've already recommended Twisted's twisted.python.filepath module as a
> possible basis for the implementation of this feature

> *It is already used in a large body of real, working code, and
> therefore its limitations are known.*

This is an important consideration.However, to me a clean API is more
important.  Since we haven't agreed on an API there is no widely-used
module that implements it... it's a chicken-and-egg problem since it
takes significant time to write and test an implementation.  So I'd
like to start from the standpoint of an ideal API rather than just
taking the API of the most widely-used implementation.  os.path is
clearly the most widely-used implementation, but that doesn't mean
that OOizing it as-is would be my favorite choice.

I took a quick look at filepath.  It looks similar in concept to PEP
355.  Four concerns:
- unfamiliar method names (createDirectory vs mkdir, child vs join)
- basename/dirname/parent are methods rather than properties:
leads to () overproliferation in user code.
- the "secure" features may not be necessary.  If they are, this
should be a separate discussion, and perhaps implemented as a
subclass.
- stylistic objection to verbose camelCase names like createDirectory

> Proposals for extending the language are contentious and it is very
> difficult to do experimentation with non-trivial projects because
> nobody wants to do that and then end up with a bunch of code written
> in a language that is no longer supported when the experiment fails.

True.

> Path representation is a bike shed.  Nobody would have proposed
> writing an entirely new embedded database engine for Python: python
> 2.5 simply included SQLite because its utility was already proven.

There's a quantum level of difference between path/file manipulation
-- which has long been considered a requirement for any full-featured
programming language -- and a database engine which is much more
complex.

Georg Brandl <[EMAIL PROTECTED]> wrote:
> I have been a supporter of the full-blown Path object in the past, but the
> recent discussions have convinved me that it is just too big and too 
> confusing,
> and that you can't kill too many birds with one stone in this respect.
> Most of the ugliness really lies in the path name manipulation functions, 
> which
> nicely map to methods on a path name object.

Fredrik has convinced me that it's more urgent to OOize the pathname
conversions than the filesystem operations.  Pathname conversions are
the ones that frequently get nested or chained, whereas filesystem
operations are usually done at the top level of a program statement,
or return a different "kind" of value (stat, true/false, etc).

However, it's interesting that all the proposals I've seen in the past
three years have been a "monolithic" OO class.  Clearly there are a
lot of people who prefer this way, or at least have never heard of
anything different.  Where have all the proponents of non-OO or
limited-OO strategies been?  The first proposal of that sort I've seen
was Nich Cochlan's October 1.  Have y'all just been ignoring the
monolithic OO efforts without offering any alternatives?

Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> > This is fully backwards compatible, can go right into 2.6 without
> > breaking anything, allows people to update their code as they go,
> > and can be incrementally improved in future releases:
> >
> >  1) Add a pathname wrapper to "os.path", which lets you do basic
> > path "algebra".  This should probably be a subclass of unicode,
> > and should *only* contain operations on names.
> >
> >  2) Make selected "shutil" operations available via the "os" name-
> > space; the old POSIX API vs. POSIX SHELL distinction is pretty
> > irrelevant.  Also make the os.path predicates available via the
> > "os" namespace.
> >
> > This gives a very simple conceptual model for the user; to manipu

[Python-Dev] Path object design

2006-11-01 Thread Jim Jewett

On 10:06 am, g.brandl at gmx.net wrote:
>> What a successor to os.path needs is not security, it's a better
(more pythonic,
>> if you like) interface to the old functionality.

Glyph:

> Why?

> Rushing ... could exacerbate a very real problem, e.g.
> the security and data-integrity implications of idiomatic usage.

The proposed Path object (or new path module) is intended to replace
os.path.  If it can't do the equivalent of "cd ..", then it isn't a
replacement; it is just another similar alternative to confuse
beginners.

If you're saying that a webserver should use a more restricted
subclass (or even the existing FilePath alternative), then I agree.
I'll even agree that a restricted version would ideally be available
out of the box.  I don't think it should be the only option.

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

2006-11-01 Thread Fredrik Lundh

[EMAIL PROTECTED] wrote:

> I assert that it needs a better[1] interface because the current 
> interface can lead to a variety of bugs through idiomatic, apparently 
> correct usage.  All the more because many of those bugs are related to 
> critical errors such as security and data integrity.

instead of referring to some esoteric knowledge about file systems that 
us non-twisted-using mere mortals may not be evolved enough to under- 
stand, maybe you could just make a list of common bugs that may arise 
due to idiomatic use of the existing primitives?

I promise to make a nice FAQ entry out of it, with proper attribution.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

2006-11-01 Thread Georg Brandl

[EMAIL PROTECTED] wrote:
> On 10:06 am, [EMAIL PROTECTED] wrote:
>  >What a successor to os.path needs is not security, it's a better (more 
> pythonic,
>  >if you like) interface to the old functionality.
> 
> Why?
> 
> I assert that it needs a better[1] interface because the current 
> interface can lead to a variety of bugs through idiomatic, apparently 
> correct usage.  All the more because many of those bugs are related to 
> critical errors such as security and data integrity.

AFAICS, people just want an interface that is easier to use and feels more...
err... (trying to avoid the p-word). I've never seen security arguments
being made in this discussion.

> If I felt the current interface did a good job at doing the right thing 
> in the right situation, but was cumbersome to use, I would strenuously 
> object to _any_ work taking place to change it.  This is a hard API to 
> get right.

Well, it's hard to change any running system with that attitude. It doesn't
have to be changed if nobody comes up with something that's agreed (*) to
be better.

(*) agreed in the c.l.py sense, of course

Georg

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

On 10:06 am, [EMAIL PROTECTED] wrote:>What a successor to os.path needs is not security, it's a better (more pythonic,>if you like) interface to the old functionality.Why?I assert that it needs a better[1] interface because the current interface can lead to a variety of bugs through idiomatic, apparently correct usage.  All the more because many of those bugs are related to critical errors such as security and data integrity.If I felt the current interface did a good job at doing the right thing in the right situation, but was cumbersome to use, I would strenuously object to _any_ work taking place to change it.  This is a hard API to get right.[1]: I am rather explicitly avoiding the word "pythonic" here.  It seems to have grown into a shibboleth (and its counterpart, "unpythonic", into an expletive).  I have the impression it used to mean something a bit more specific, maybe adherence to Tim Peters' "Zen" (although that was certainly vague enough by itself and not always as self-evidently true as some seem to believe).  More and more, now, though, I hear it used to mean 'stuff should be more betterer!' and then everyone nods sagely because we know that no filthy *java* programmer wants things to be more betterer; *we* know *they* want everything to be horrible.  Words like this are a pet peeve of mine though, so perhaps I am overstating the case.  Anyway, moving on... as long as I brought up the Zen, perhaps a particular couplet is appropriate here:  Now is better than never.  Although never is often better than *right* now.Rushing to a solution to a non-problem, e.g. the "pythonicness" of the interface, could exacerbate a very real problem, e.g. the security and data-integrity implications of idiomatic usage.  Granted, it would be hard to do worse than os.path, but it is by no means impossible (just look at any C program!), and I can think of a couple of kinds of API which would initially appear more convenient but actually prove more problematic over time.That brings me back to my original point: the underlying issue here is too important a problem to get wrong *again* on the basis of a superficial "need" for an API that is "better" in some unspecified way.  os.path is at least possible to get right if you know what you're doing, which is no mean feat; there are many path-manipulation libraries in many languages which cannot make that claim (especially portably).  Its replacement might not be.  Getting this wrong outside the standard library might create problems for some people, but making it worse _in_ the standard library could create a total disaster for everyone.I do believe that this wouldn't get past the dev team (least of all the release manager) but it would waste a lot less of everyone's time if we focused the inevitable continuing bike-shed discussion along the lines of discussing the known merits of widely deployed alternative path libraries, or at least an approach to *get* that data on some new code if there is consensus that existing alternatives are in some way inadequate.If for some reason it _is_ deemed necessary to go with an untried approach, I can appreciate the benefits that /F has proposed of trying to base the new interface entirely and explicitly off the old one.  At least that way it will still definitely be possible to get right.  There are problems with that too, but they are less severe.___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

2006-11-01 Thread Jean-Paul Calderone

On Wed, 01 Nov 2006 11:06:14 +0100, Georg Brandl <[EMAIL PROTECTED]> wrote:
>[EMAIL PROTECTED] wrote:
>> On 03:14 am, [EMAIL PROTECTED] wrote:
>>
>>  >One thing is sure -- we urgently need something better than os.path.
>>  >It functions well but it makes hard-to-read and unpythonic code.
>>
>> I'm not so sure.  The need is not any more "urgent" today than it was 5
>> years ago, when os.path was equally "unpythonic" and unreadable.  The
>> problem is real but there is absolutely no reason to hurry to a
>> premature solution.
>>
>> I've already recommended Twisted's twisted.python.filepath module as a
>> possible basis for the implementation of this feature.  I'm sorry I
>> don't have the time to pursue that.  I'm also sad that nobody else seems
>> to have noticed.  Twisted's implemenation has an advantage that it
>> doesn't seem that these new proposals do, an advantage I would really
>> like to see in whatever gets seriously considered for adoption:
>
>Looking at
>,
>it seems as if FilePath was made to serve a different purpose than what we're
>trying to discuss here:
>
>"""
>I am a path on the filesystem that only permits 'downwards' access.
>
>Instantiate me with a pathname (for example,
>FilePath('/home/myuser/public_html')) and I will attempt to only provide access
>to files which reside inside that path. [...]
>
>The correct way to use me is to instantiate me, and then do ALL filesystem
>access through me.
>"""
>
>What a successor to os.path needs is not security, it's a better (more 
>pythonic,
>if you like) interface to the old functionality.

No.  You've misunderstood the code you looked at.  FilePath serves exactly
the purpose being discussed here.  Take a closer look.

Jean-Paul
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

2006-11-01 Thread Fredrik Lundh

Jonathan Lange wrote:

> Then let us discuss that.

Glyph's references to bike sheds went right over your head, right?



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

2006-11-01 Thread Jonathan Lange

On 11/1/06, Georg Brandl <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > On 03:14 am, [EMAIL PROTECTED] wrote:
> >
> >  >One thing is sure -- we urgently need something better than os.path.
> >  >It functions well but it makes hard-to-read and unpythonic code.
> >
> > I'm not so sure.  The need is not any more "urgent" today than it was 5
> > years ago, when os.path was equally "unpythonic" and unreadable.  The
> > problem is real but there is absolutely no reason to hurry to a
> > premature solution.
> >
> > I've already recommended Twisted's twisted.python.filepath module as a
> > possible basis for the implementation of this feature.  I'm sorry I
> > don't have the time to pursue that.  I'm also sad that nobody else seems
> > to have noticed.  Twisted's implemenation has an advantage that it
> > doesn't seem that these new proposals do, an advantage I would really
> > like to see in whatever gets seriously considered for adoption:
>
> Looking at
> ,
> it seems as if FilePath was made to serve a different purpose than what we're
> trying to discuss here:
>
> """
> I am a path on the filesystem that only permits 'downwards' access.
>
> Instantiate me with a pathname (for example,
> FilePath('/home/myuser/public_html')) and I will attempt to only provide 
> access
> to files which reside inside that path. [...]
>
> The correct way to use me is to instantiate me, and then do ALL filesystem
> access through me.
> """
>
> What a successor to os.path needs is not security, it's a better (more 
> pythonic,
> if you like) interface to the old functionality.
>

Then let us discuss that. Is FilePath actually a better interface to
the old functionality? Even if it was designed to solve a security
problem, it might prove to be an extremely useful general interface.

jml
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

2006-11-01 Thread Georg Brandl

[EMAIL PROTECTED] wrote:
> On 03:14 am, [EMAIL PROTECTED] wrote:
> 
>  >One thing is sure -- we urgently need something better than os.path.
>  >It functions well but it makes hard-to-read and unpythonic code.
> 
> I'm not so sure.  The need is not any more "urgent" today than it was 5 
> years ago, when os.path was equally "unpythonic" and unreadable.  The 
> problem is real but there is absolutely no reason to hurry to a 
> premature solution.
> 
> I've already recommended Twisted's twisted.python.filepath module as a 
> possible basis for the implementation of this feature.  I'm sorry I 
> don't have the time to pursue that.  I'm also sad that nobody else seems 
> to have noticed.  Twisted's implemenation has an advantage that it 
> doesn't seem that these new proposals do, an advantage I would really 
> like to see in whatever gets seriously considered for adoption:

Looking at 
,
it seems as if FilePath was made to serve a different purpose than what we're
trying to discuss here:

"""
I am a path on the filesystem that only permits 'downwards' access.

Instantiate me with a pathname (for example, 
FilePath('/home/myuser/public_html')) and I will attempt to only provide access 
to files which reside inside that path. [...]

The correct way to use me is to instantiate me, and then do ALL filesystem 
access through me.
"""

What a successor to os.path needs is not security, it's a better (more pythonic,
if you like) interface to the old functionality.

Georg

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

2006-11-01 Thread Georg Brandl

Fredrik Lundh wrote:
> Talin wrote:
> 
>> I'm right in the middle of typing up a largish post to go on the 
>> Python-3000 mailing list about this issue. Maybe we should move it over 
>> there, since its likely that any path reform will have to be targeted at 
>> Py3K...?
> 
> I'd say that any proposal that cannot be fit into the current 2.X design 
> is simply too disruptive to go into 3.0.  So here's my proposal for 2.6 
> (reposted from the 3K list).
> 
> This is fully backwards compatible, can go right into 2.6 without 
> breaking anything, allows people to update their code as they go,
> and can be incrementally improved in future releases:
> 
>  1) Add a pathname wrapper to "os.path", which lets you do basic
> path "algebra".  This should probably be a subclass of unicode,
> and should *only* contain operations on names.
> 
>  2) Make selected "shutil" operations available via the "os" name-
> space; the old POSIX API vs. POSIX SHELL distinction is pretty
> irrelevant.  Also make the os.path predicates available via the
> "os" namespace.
> 
> This gives a very simple conceptual model for the user; to manipulate
> path *names*, use "os.path.(string)" functions or the ""
> wrapper.  To manipulate *objects* identified by a path, given either as
> a string or a path wrapper, use "os.(path)".  This can be taught in
> less than a minute.

+1. This is really straightforward and easy to learn.

I have been a supporter of the full-blown Path object in the past, but the
recent discussions have convinved me that it is just too big and too confusing,
and that you can't kill too many birds with one stone in this respect.
Most of the ugliness really lies in the path name manipulation functions, which
nicely map to methods on a path name object.

Georg

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design

2006-11-01 Thread Fredrik Lundh

[EMAIL PROTECTED] wrote:

> I am not addressing this message to the py3k list because its general 
> message of extreme conservatism on new features is more applicable to 
> python-dev.  However, py3k designers might also take note: if py3k is 
> going to do something in this area and drop support for the "legacy" 
> os.path, it would be good to choose something that is known to work and 
> have few gotchas, rather than just choosing the devil we don't know over 
> the devil we do.  The weaknesses of os.path are at least well-understood.

that's another reason why a new design might as well be defined in
terms of the old design -- especially if the main goal is call-site 
convenience, rather than fancy new algorithms.



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Path object design