Re: [Python-Dev] Path object design

2006-11-06 Thread Steve Holden
Fredrik Lundh wrote:
> Andrew Dalke wrote:
> 
> 
>>>as I said, today's urljoin doesn't guarantee that the output is
>>>the *shortest* possible way to represent the resulting URI.
>>
>>I didn't think anyone was making that claim.  The module claims
>>RFC 1808 compliance.  From the docstring:
>>
>>DESCRIPTION
>>See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding,
>>UC Irvine, June 1995.
>>
>>Now quoting from RFC 1808:
>>
>>   5.2.  Abnormal Examples
>>
>>   Although the following abnormal examples are unlikely to occur in
>>   normal practice, all URL parsers should be capable of resolving them
>>   consistently.
> 
> 
>>My claim is that "consistent" implies "in the spirit of the rest of the RFC"
>>and "to a human trying to make sense of the results" and not only
>>mean "does the same thing each time."  Else
>>
>>
>> >>> urljoin("http://blah.com/", "../../..")
>>
>>'http://blah.com/there/were/too/many/dot-dot/path/elements/in/the/relative/url'
>>
>>would be equally consistent.
> 
> 
> perhaps, but such an urljoin wouldn't pass the
> 
>  minimize(base + relative) == minimize(urljoin(base, relative))
> 
> test that today's urljoin passes (where "minimize" is defined as "create 
> the shortest possible URI that identifies the same target, according to 
> the relevant RFC").
> 
> isn't the real issue in this subthread whether urljoin should be 
> expected to pass the
> 
>  minimize(base + relative) == urljoin(base, relative)
> 
> test?
> 
I should hope that *is* the issue, and I should further hope that the 
general wish would be for it to pass that test. Of course web systems 
have been riddled with canonicalization errors in the past, so it'd be 
best if you and/or Andrew could provide a minimize() implementation :-)
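
A minimal sketch of what such a minimize() might look like, written for Python 3's urllib.parse (the successor of the urlparse module discussed here). It assumes that "shortest" only means collapsing "." and ".." segments in the spirit of RFC 3986 section 5.2.4; that narrow definition and the helper itself are assumptions for illustration, not something the thread agreed on.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def minimize(uri):
    """Collapse '.' and '..' path segments (hypothetical helper).

    Assumption: "shortest possible URI" only means dot-segment removal;
    case, percent-encoding and default ports are left untouched.
    """
    parts = urlsplit(uri)
    if not parts.path.startswith("/"):
        return uri  # only rooted paths are handled in this sketch
    segments = parts.path.split("/")[1:]
    stack = []
    for seg in segments:
        if seg == "..":
            if stack:
                stack.pop()  # an extra ".." at the root is dropped
        elif seg != ".":
            stack.append(seg)
    # A trailing "." or ".." names a directory, so keep the final slash.
    if segments[-1] in (".", ".."):
        stack.append("")
    return urlunsplit(parts._replace(path="/" + "/".join(stack)))


base, relative = "http://blah.com/", "../../.."
# The two tests from Fredrik's message, applied to this interpreter's urljoin.
weak = minimize(base + relative) == minimize(urljoin(base, relative))
strong = minimize(base + relative) == urljoin(base, relative)
print(weak, strong)
```

The weaker comparison holds whether or not urljoin leaves ".." in its result; the stronger one depends on the urljoin being tested.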

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Path object design

2006-11-06 Thread Martin v. Löwis
Fredrik Lundh wrote:
>> urlparse.urljoin("http://blah.com/", "../")
>>
>> should also give 'http://blah.com/'.
> 
> make that: could also give 'http://blah.com/'.

How so? If urljoin is meant to implement section 5 of RFC 3986,
only a single outcome is possible.

Regards,
Martin


Re: [Python-Dev] Path object design

2006-11-06 Thread Martin v. Löwis
Andrew Dalke wrote:
> Hence I would say to just grab their library.  And perhaps update the
> naming scheme.

Unfortunately, this is not an option. *You* can just grab their library;
the Python distribution can't. Doing so would mean forking it, and history
shows that forks cause problems in the long run. OTOH, if the 4Suite
people were to contribute the library, integrating it would be an option.

Regards,
Martin


Re: [Python-Dev] Path object design

2006-11-06 Thread Fredrik Lundh
Andrew Dalke wrote:

>> as I said, today's urljoin doesn't guarantee that the output is
>> the *shortest* possible way to represent the resulting URI.
> 
> I didn't think anyone was making that claim.  The module claims
> RFC 1808 compliance.  From the docstring:
> 
> DESCRIPTION
> See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding,
> UC Irvine, June 1995.
> 
> Now quoting from RFC 1808:
> 
>    5.2.  Abnormal Examples
>
>    Although the following abnormal examples are unlikely to occur in
>    normal practice, all URL parsers should be capable of resolving them
>    consistently.

> My claim is that "consistent" implies "in the spirit of the rest of the RFC"
> and "to a human trying to make sense of the results" and not only
> mean "does the same thing each time."  Else
> 
> >>> urljoin("http://blah.com/", "../../..")
> 'http://blah.com/there/were/too/many/dot-dot/path/elements/in/the/relative/url'
> 
> would be equally consistent.

perhaps, but such an urljoin wouldn't pass the

 minimize(base + relative) == minimize(urljoin(base, relative))

test that today's urljoin passes (where "minimize" is defined as "create 
the shortest possible URI that identifies the same target, according to 
the relevant RFC").

isn't the real issue in this subthread whether urljoin should be 
expected to pass the

 minimize(base + relative) == urljoin(base, relative)

test?





Re: [Python-Dev] Path object design

2006-11-06 Thread Andrew Dalke
Andrew:
> >>> urlparse.urljoin("http://blah.com/", "..")
> 'http://blah.com/'
> >>> urlparse.urljoin("http://blah.com/", "../")
> 'http://blah.com/../'
> >>> urlparse.urljoin("http://blah.com/", "../..")
> 'http://blah.com/'

/F:
> as I said, today's urljoin doesn't guarantee that the output is
> the *shortest* possible way to represent the resulting URI.

I didn't think anyone was making that claim.  The module claims
RFC 1808 compliance.  From the docstring:

DESCRIPTION
See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding,
UC Irvine, June 1995.

Now quoting from RFC 1808:

   5.2.  Abnormal Examples

   Although the following abnormal examples are unlikely to occur in
   normal practice, all URL parsers should be capable of resolving them
   consistently.  Each example uses the same base as above.

   An empty reference resolves to the complete base URL:

      <>            = <URL:http://a/b/c/d;p?q#f>

   Parsers must be careful in handling the case where there are more
   relative path ".." segments than there are hierarchical levels in the
   base URL's path.

My claim is that "consistent" implies "in the spirit of the rest of the RFC"
and "to a human trying to make sense of the results" and not only
mean "does the same thing each time."  Else

>>> urljoin("http://blah.com/", "../../..")
'http://blah.com/there/were/too/many/dot-dot/path/elements/in/the/relative/url'

would be equally consistent.

>>> for rel in ".. ../ ../.. ../../ ../../.. ../../../ ../../../..".split():
...   print repr(rel), repr(urlparse.urljoin("http://blah.com/", rel))
...
'..' 'http://blah.com/'
'../' 'http://blah.com/../'
'../..' 'http://blah.com/'
'../../' 'http://blah.com/../../'
'../../..' 'http://blah.com/../'
'../../../' 'http://blah.com/../../../'
'../../../..' 'http://blah.com/../../'

I grant there is a consistency there.  It's not one most would have
predicted beforehand.
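
For anyone who wants to re-run the experiment today, here is the same loop as a Python 3 sketch, where urlparse has become urllib.parse. The urljoin documentation notes that its behaviour was updated in Python 3.5 to match RFC 3986, so a current interpreter is not expected to reproduce the 2006 output above; the names BASE, RELATIVES and results are only for illustration.

```python
from urllib.parse import urljoin

BASE = "http://blah.com/"
RELATIVES = ".. ../ ../.. ../../ ../../.. ../../../ ../../../..".split()

# Map each relative reference to whatever this interpreter's urljoin returns.
results = {rel: urljoin(BASE, rel) for rel in RELATIVES}

for rel, joined in results.items():
    print(repr(rel), repr(joined))
```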

Then again, "should" is that wishy-washy "unless you've got a good
reason to do it a different way" sort of constraint.

Andrew
[EMAIL PROTECTED]


Re: [Python-Dev] Path object design

2006-11-06 Thread Fredrik Lundh
Martin v. Löwis wrote:

> Andrew Dalke wrote:
>> >>> urlparse.urljoin("http://blah.com/", "..")
>> 'http://blah.com/'
>> >>> urlparse.urljoin("http://blah.com/", "../")
>> 'http://blah.com/../'
>> >>> urlparse.urljoin("http://blah.com/", "../..")
>> 'http://blah.com/'
>>
>> Does the result make sense to you?  Does it make
>> sense that the last of these is shorter than the middle
>> one?  It sure doesn't to me.  I thought it was obvious
>> that there was an error;
> 
> That wasn't obvious at all to me. Now looking at the
> examples, I agree there is an error. The middle one
> is incorrect;
> 
> urlparse.urljoin("http://blah.com/", "../")
> 
> should also give 'http://blah.com/'.

make that: could also give 'http://blah.com/'.

as I said, today's urljoin doesn't guarantee that the output is
the *shortest* possible way to represent the resulting URI.





Re: [Python-Dev] Path object design

2006-11-06 Thread Andrew Dalke
Martin:
> It still should be possible to come up with examples for these as
> well, no? For example, if you pass a relative URI as the base
> URI, what would you like to see happen?

Until two days ago I didn't even realize that was an incorrect
use of urljoin.  I can't be the only one.  Hence, raise an
exception - just like 4Suite's Uri.py does.
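
A rough sketch of that "raise an exception" behaviour as a thin wrapper around Python 3's urllib.parse.urljoin. The wrapper name strict_urljoin and the choice of ValueError are assumptions made here; the thread does not show which exception 4Suite's Uri.py raises, and "absolute" is approximated as "has a scheme".

```python
from urllib.parse import urljoin, urlsplit


def strict_urljoin(base, relative):
    """urljoin that rejects a base without a scheme (hypothetical wrapper)."""
    if not urlsplit(base).scheme:
        raise ValueError("base URI must be absolute, got %r" % (base,))
    return urljoin(base, relative)


print(strict_urljoin("http://blah.com/a/b", "c"))
```

Calling strict_urljoin("a/b", "c") raises instead of silently joining two relative references.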

> That's true. Actually, it's probably not true; it will only get fixed
> if some volunteer contributes a fix.

And it's not I.  A true fix is a lot of work.  I would rather use Uri.py,
now that I see it handles everything I care about, and then some.
Eg, file name <-> URI conversion.

> So do you think this patch meets your requirements?

# new
>>> uriparse.urljoin("http://spam/", "foo/bar")
'http://spam//foo/bar'
>>>

# existing
>>> urlparse.urljoin("http://spam/", "foo/bar")
'http://spam/foo/bar'
>>>

No.  That was the first thing I tried.  Also found

>>> urlparse.urljoin("http://blah", "/spam/")
'http://blah/spam/'
>>> uriparse.urljoin("http://blah", "/spam/")
'http://blah/spam'
>>>

I reported these on the patch page.  Nothing else strange
came up, but I did only try http urls and not the others.

My "requirements", meaning my vague, spur-of-the-moment thoughts
without any research or experimentation to determine their validity,
are different from those for Python.

My real requirements are met by the existing code.

My imagined ones include support for edge cases, the idna
codec, unicode, and real-world use on a variety of OSes.

4Suite's Uri.py seems to have this.  Eg, lots of edge-case
code like

# On Windows, ensure that '|', not ':', is used in a drivespec.
if os.name == 'nt' and scheme == 'file':
    path = path.replace(':','|',1)

Hence the uriparse.py patch does not meet my hypothetical
requirements.

Python's requirements are probably to get closer to the spec.
In which case yes, it's at least as good as and likely generally
better than the existing module, modulo a few API naming debates
and perhaps some rough edges which will be found when put into use.

And perhaps various arguments about how bug compatible it should be
and if the old code should be available as well as the new one,
for those who depend on the existing 1808-allowed implementation
dependent behavior.

For those I have neither the experience to guide me nor any wish to push
the debate.  I've decided I'm going to experiment using 4Suite's Uri.py
for my code because it handles things I want which are outside of
the scope of uriparse.py.

> This topic (URL parsing) is not only inherently difficult to
> implement, it is just as tedious to review. Without anybody
> reviewing the contributed code, it's certain that it will never
> be incorporated.

I have a different opinion.

Python's url manipulation code is a mess.  urlparse, urllib, urllib2.
Why is "urlencode" part of urllib and not urllib2?  For that matter,
urllib is labeled 'Open an arbitrary URL' and not 'and also do
manipulations on parts of URLs.'

I don't want to start fixing code because doing it the way I want to
requires a new API and a much better understanding of the RFCs
than I care about, especially since 4Suite and others have already
done this.

Hence I would say to just grab their library.  And perhaps update the
naming scheme.

Also, urlgrabber and pycURL are better for downloading arbitrary
URIs.  For some definitions of "better".

Andrew
[EMAIL PROTECTED]


Re: [Python-Dev] Path object design

2006-11-05 Thread Martin v. Löwis
Andrew Dalke wrote:
>> I find there is a difference between "urllib behaves
>> non-intuitively" and "urllib gives result A for parameters B and C,
>> but should give result D instead". Can you please add specific examples
>> to your report that demonstrate the difference between implemented
>> and expected behavior?
> 
> No.
> 
> I consider the "../" cases to be unimportant edge cases and
> I would rather people fixed the other problems highlighted in the
> text I copied from 4Suite's Uri.py -- like improperly allowing a
> relative URL as the base url, which I incorrectly assumed was
> legit - and that others have reported on python-dev, easily found
> with Google.

It still should be possible to come up with examples for these as
well, no? For example, if you pass a relative URI as the base
URI, what would you like to see happen?

> If I only add test cases for "../" then I believe that that's all that
> will be fixed.

That's true. Actually, it's probably not true; it will only get fixed
if some volunteer contributes a fix.

> Finally, I see that my report is a dup.  SF search is poor.  As
> Nick Coghlan reported, Paul Jimenez has a replacement for urlparse.
> Summarized in
>  http://www.python.org/dev/summary/2006-04-01_2006-04-15/
> It was submitted in spring as a patch - SF# 1462525 at
>   
> http://sourceforge.net/tracker/index.php?func=detail&aid=1462525&group_id=5470&atid=305470
> which I didn't find in my earlier searching.

So do you think this patch meets your requirements?

This topic (URL parsing) is not only inherently difficult to
implement, it is just as tedious to review. Without anybody
reviewing the contributed code, it's certain that it will never
be incorporated.

Regards,
Martin


Re: [Python-Dev] Path object design

2006-11-05 Thread Andrew Dalke
Me [Andrew]:
> > As this is not a bug, I have added the feature request 1591035 to SF
> > titled "update urlparse to RFC 3986".  Nothing else appeared to exist
> > on that specific topic.

Martin:
> Thanks. It always helps to be more specific; being less specific often
> hurts.

So does being more specific.  I wasn't trying to report a bug in
urlparse.  I figured everyone knew the problems existed.  The code
comments say so and various back discussions on this list say so.

All I wanted to do was point out that two seemingly similar problems -
path traversal of hierarchical structures - had two different expected
behaviors.  Now I've spent entirely too much time on specifics I didn't
care about and didn't think were important.

I've also been known to do the full report and have people ignore what
I wrote because it was too long.

> I find there is a difference between "urllib behaves
> non-intuitively" and "urllib gives result A for parameters B and C,
> but should give result D instead". Can you please add specific examples
> to your report that demonstrate the difference between implemented
> and expected behavior?

No.

I consider the "../" cases to be unimportant edge cases and
I would rather people fixed the other problems highlighted in the
text I copied from 4Suite's Uri.py -- like improperly allowing a
relative URL as the base url, which I incorrectly assumed was
legit - and that others have reported on python-dev, easily found
with Google.

If I only add test cases for "../" then I believe that that's all that
will be fixed.

Given the back history of this problem and lack of followup I
also believe it won't be fixed unless someone develops a brand
new module, from scratch, which will be added to some future
Python version.  There's probably a compliance suite out there
to use for this sort of task.  I hadn't bothered to look as I am
no more proficient than others here at Google.

Finally, I see that my report is a dup.  SF search is poor.  As
Nick Coghlan reported, Paul Jimenez has a replacement for urlparse.
Summarized in
 http://www.python.org/dev/summary/2006-04-01_2006-04-15/
It was submitted in spring as a patch - SF# 1462525 at
  
http://sourceforge.net/tracker/index.php?func=detail&aid=1462525&group_id=5470&atid=305470
which I didn't find in my earlier searching.

Andrew
[EMAIL PROTECTED]


Re: [Python-Dev] Path object design

2006-11-05 Thread Martin v. Löwis
Andrew Dalke wrote:
> >>> urlparse.urljoin("http://blah.com/", "..")
> 'http://blah.com/'
> >>> urlparse.urljoin("http://blah.com/", "../")
> 'http://blah.com/../'
> >>> urlparse.urljoin("http://blah.com/", "../..")
> 'http://blah.com/'
> 
> Does the result make sense to you?  Does it make
> sense that the last of these is shorter than the middle
> one?  It sure doesn't to me.  I thought it was obvious
> that there was an error;

That wasn't obvious at all to me. Now looking at the
examples, I agree there is an error. The middle one
is incorrect;

urlparse.urljoin("http://blah.com/", "../")

should also give 'http://blah.com/'.

>> You shouldn't be giving more "../" sequences than are possible. I find
>> the current behavior acceptable.
> 
> (Apparently for RFC 1808 that's a valid answer; it was an implementation
> choice in how to handle that case.)

There is still some text to that effect in section 5.4.2 of RFC 3986.

> While not directly relevant, postings like John J Lee's
> http://mail.python.org/pipermail/python-bugs-list/2006-February/031875.html
>> The urlparse.urlparse() code should not be changed, for
>> backwards compatibility reasons.
> 
> strongly suggest a desire to not change that code.

This is John J Lee's opinion, of course. I don't see a reason not to fix
such bugs, or to update the implementation to the current RFCs.

> As this is not a bug, I have added the feature request 1591035 to SF
> titled "update urlparse to RFC 3986".  Nothing else appeared to exist
> on that specific topic.

Thanks. It always helps to be more specific; being less specific often
hurts. I find there is a difference between "urllib behaves
non-intuitively" and "urllib gives result A for parameters B and C,
but should give result D instead". Can you please add specific examples
to your report that demonstrate the difference between implemented
and expected behavior?

Regards,
Martin


Re: [Python-Dev] Path object design

2006-11-05 Thread Andrew Dalke
Martin:
> Unfortunately, you didn't say which of these you want explained.
> As it is tedious to write down even a single one, I restrict myself to
> the one with the What?! remark.
>
> > >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..")  # What?!
> > 'http://blah.com/'

The "What?!" is in context with the previous and next entries.  I've
reduced it to a simpler case

>>> urlparse.urljoin("http://blah.com/", "..")
'http://blah.com/'
>>> urlparse.urljoin("http://blah.com/", "../")
'http://blah.com/../'
>>> urlparse.urljoin("http://blah.com/", "../..")
'http://blah.com/'

Does the result make sense to you?  Does it make
sense that the last of these is shorter than the middle
one?  It sure doesn't to me.  I thought it was obvious
that there was an error; obvious enough that I didn't
bother to track down why - especially as my main point
was to argue there are different ways to deal with
hierarchical/path-like schemes, each correct for its
given domain.

> Please follow me through section 5 of
>
> http://www.ietf.org/rfc/rfc3986.txt

The core algorithm causing the "what?!" comes from
"remove_dot_segments", section 5.2.4.  Worked in parallel, my
3 cases should give:

5.2.4 Remove Dot Segments

 remove_dot_segments("/..")   r_d_s("/../")          r_d_s("/../..")

 1. I = "/.."                 I = "/../"             I = "/../.."
    O = ""                    O = ""                 O = ""
 2A. (does not apply)         2A. (does not apply)   2A. (does not apply)
 2B. (does not apply)         2B. (does not apply)   2B. (does not apply)
 2C. O="" I="/"               2C. O="" I="/"         2C. O="" I="/.."
 2A. (does not apply)         2A. (does not apply)    .. reduces to r_d_s("/..")
 2B. (does not apply)         2B. (does not apply)   3. Result "/"
 2C. (does not apply)         2C. (does not apply)
 2D. (does not apply)         2D. (does not apply)
 2E. O="/", I=""              2E. O="/", I=""
 3. Result: "/"               3. Result "/"

My reading of the RFC 3986 says all three examples should
produce the same result.  The fact that my "what?!" comment happens
to be correct according to that RFC is purely coincidental.
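
The hand trace above can be checked mechanically. Below is a sketch that transcribes steps 2A to 2E of RFC 3986 section 5.2.4 onto two string buffers; the variable names inp and out are mine, and the code is meant as a reading aid rather than a replacement for any library routine.

```python
def remove_dot_segments(path):
    """RFC 3986 section 5.2.4, steps 2A to 2E, on input/output buffers."""
    inp, out = path, ""
    while inp:
        if inp.startswith("../"):            # 2A: drop a leading "../"
            inp = inp[3:]
        elif inp.startswith("./"):           # 2A: drop a leading "./"
            inp = inp[2:]
        elif inp.startswith("/./"):          # 2B: "/./" becomes "/"
            inp = inp[2:]
        elif inp == "/.":                    # 2B: "/." becomes "/"
            inp = "/"
        elif inp.startswith("/../"):         # 2C: "/../" pops one segment
            inp = inp[3:]
            out = out[:out.rfind("/")] if "/" in out else ""
        elif inp == "/..":                   # 2C: "/.." pops one segment
            inp = "/"
            out = out[:out.rfind("/")] if "/" in out else ""
        elif inp in (".", ".."):             # 2D: a lone dot segment vanishes
            inp = ""
        else:                                # 2E: move one segment to output
            end = inp.find("/", 1)
            if end == -1:
                end = len(inp)
            out += inp[:end]
            inp = inp[end:]
    return out


for case in ("/..", "/../", "/../.."):
    print(repr(case), "->", repr(remove_dot_segments(case)))
```

All three of the cases worked by hand come out as "/", which agrees with the table.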

Then again, urlparse.py does *not* claim to be RFC 3986 compliant.
The module docstring is

"""Parse (absolute and relative) URLs.

See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding,
UC Irvine, June 1995.
"""

I tried the same code with 4Suite, which does claim compliance, and get

>>> import Ft
>>> from Ft.Lib import Uri
>>> Uri.Absolutize("..", "http://blah.com/")
'http://blah.com/'
>>> Uri.Absolutize("../", "http://blah.com/")
'http://blah.com/'
>>> Uri.Absolutize("../..", "http://blah.com/")
'http://blah.com/'
>>>

The text of its Uri.py says

This function is similar to urlparse.urljoin() and urllib.basejoin().
Those functions, however, are (as of Python 2.3) outdated, buggy, and/or
designed to produce results acceptable for use with other core Python
libraries, rather than being earnest implementations of the relevant
specs. Their problems are most noticeable in their handling of
same-document references and 'file:' URIs, both being situations that
come up far too often to consider the functions reliable enough for
general use.
"""
# Reasons to avoid using urllib.basejoin() and urlparse.urljoin():
# - Both are partial implementations of long-obsolete specs.
# - Both accept relative URLs as the base, which no spec allows.
# - urllib.basejoin() mishandles the '' and '..' references.
# - If the base URL uses a non-hierarchical or relative path,
#   or if the URL scheme is unrecognized, the result is not
#   always as expected (partly due to issues in RFC 1808).
# - If the authority component of a 'file' URI is empty,
#   the authority component is removed altogether. If it was
#   not present, an empty authority component is in the result.
# - '.' and '..' segments are not always collapsed as well as they
#   should be (partly due to issues in RFC 1808).
# - Effective Python 2.4, urllib.basejoin() *is* urlparse.urljoin(),
#   but urlparse.urljoin() is still based on RFC 1808.

In searching the archives
  http://mail.python.org/pipermail/python-dev/2005-September/056152.html

Fabien Schwob:
> I'm using the module urlparse and I think I've found a bug in the
> urlparse module. When you merge an url and a link
> like "../../../page.html" with urljoin, the new url created keeps some
> "../" in it. Here is an example :
>
>  >>> import urlparse
>  >>> begin = "http://www.example.com/folder/page.html"
>  >>> end = "../../../otherpage.html"
>  >>> urlparse.urljoin(begin, end)
> 'http://www.example.com/../../otherpage.html'

Guido:
> You shouldn't be giving more "../" sequences than are possible. I find
> the current behavior acceptable.

(Apparently for RFC 1808 that's a valid answer; it was an implementation
choice in how to handle that case.)

While not directly relevant, postings like John J Lee's
 http://mail.python.org/pipermail/python-bugs-lis

Re: [Python-Dev] Path object design

2006-11-05 Thread Martin v. Löwis
Andrew Dalke wrote:
> I have looked at the spec, and can't figure out how its explanation
> matches the observed urljoin results.  Steve's excerpt trimmed out
> the strangest example.

Unfortunately, you didn't say which of these you want explained.
As it is tedious to write down even a single one, I restrict myself to
the one with the What?! remark.

> >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..")  # What?!
> 'http://blah.com/'

Please follow me through section 5 of

http://www.ietf.org/rfc/rfc3986.txt

5.2.1: Pre-parse the Base URI
 B.scheme = "http"
 B.authority = "blah.com"
 B.path = "/a/b/c"
 B.query = undefined
 B.fragment = undefined

5.2.2: Transform References
 parse("../../../..")
 R.scheme = R.authority = R.query = R.fragment = undefined
 R.path = "../../../.."
 (strictness not relevant, R.scheme is already undefined)
 R.scheme is not defined
 R.authority is not defined
 R.path is not ""
 R.path does not start with /
 T.path = merge("/a/b/c", "../../../..")
 T.path = remove_dot_segments(T.path)
 T.authority = "blah.com"
 T.scheme = "http"
 T.fragment = undefined

5.2.3 Merge paths
 merge("/a/b/c", "../../../..") =
 (base URI does have path)
 "/a/b/../../../.."

5.2.4 Remove Dot Segments
 remove_dot_segments("/a/b/../../../..")
 1. I = "/a/b/../../../.."
    O = ""
 2. A (does not apply)
    B (does not apply)
    C (does not apply)
    D (does not apply)
    E O="/a" I="/b/../../../.."
 2. E O="/a/b" I="/../../../.."
 2. C O="/a" I="/../../.."
 2. C O="" I="/../.."
 2. C O="" I="/.."
 2. C O="" I="/"
 2. E O="/" I=""
 3. Result: "/"

5.3 Component Recomposition
 result = ""
 (scheme is defined)
 result = "http:"
 (authority is defined)
 result = "http://blah.com"
 (append path)
 result = "http://blah.com/"
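
The same walk through sections 5.2.2, 5.2.3 and 5.3 can be sketched in Python 3. This is an illustration under assumptions, not urljoin's actual implementation: it only handles references without a scheme or authority, the function names merge, drop_dots and resolve are mine, and drop_dots is a compact segment-stack stand-in for the buffer algorithm of section 5.2.4.

```python
from urllib.parse import urlsplit, urlunsplit


def merge(base, ref_path):
    """Section 5.2.3: replace everything after the base path's last '/'."""
    if base.netloc and not base.path:
        return "/" + ref_path
    return base.path[:base.path.rfind("/") + 1] + ref_path


def drop_dots(path):
    """Compact stand-in for section 5.2.4 (assumes path starts with '/')."""
    segments = path.split("/")[1:]
    stack = []
    for seg in segments:
        if seg == "..":
            del stack[-1:]          # no-op when already at the root
        elif seg != ".":
            stack.append(seg)
    if segments[-1] in (".", ".."):
        stack.append("")            # keep the trailing slash
    return "/" + "/".join(stack)


def resolve(base_uri, reference):
    """Section 5.2.2 for a reference with no scheme and no authority."""
    b, r = urlsplit(base_uri), urlsplit(reference)
    if not (b.scheme and b.netloc):
        raise ValueError("sketch needs an absolute base with an authority")
    if r.scheme or r.netloc:
        raise ValueError("sketch only handles relative-path references")
    if not r.path:
        path, query = b.path, r.query or b.query
    elif r.path.startswith("/"):
        path, query = drop_dots(r.path), r.query
    else:
        path, query = drop_dots(merge(b, r.path)), r.query
    # Section 5.3: recompose the target from its components.
    return urlunsplit((b.scheme, b.netloc, path, query, r.fragment))


print(resolve("http://blah.com/a/b/c", "../../../.."))
```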

HTH,
Martin


Re: [Python-Dev] Path object design

2006-11-05 Thread Mike Orr
On 11/5/06, Andrew Dalke <[EMAIL PROTECTED]> wrote:
>
>I agree that supporting non-filesystem directories (zip files,
>CSV/Subversion sandboxes, URLs) would be nice, but we already have a
>big enough project without that.  What constraints should a Path
>object keep in mind in order to be forward-compatible with this?
>
> Is the answer therefore that URLs and URI behaviour should not
> place constraints on a Path object because they are sufficiently
> dissimilar from file-system paths?  Do these other non-FS hierarchical
> structures have similar differences causing a semantic mismatch?

This discussion has reinforced my belief that os.path.join's behavior
is correct with non-initial absolute args:

os.path.join('/usr/bin', '/usr/local/bin/python')

I've used that in applications and haven't found it a burden.

Its behavior with '..' seems justifiable too, and Talin's trick of
wrapping everything in os.path.normpath is a great one.
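
A short illustration of the two behaviours just mentioned. It uses posixpath, the POSIX flavour of os.path, so the result does not depend on the platform running it; the example paths other than the one above are made up.

```python
import posixpath

# A later absolute argument discards everything before it.
joined = posixpath.join("/usr/bin", "/usr/local/bin/python")

# join() leaves ".." in place; normpath() collapses it afterwards.
raw = posixpath.join("/usr/local/lib", "../bin")
clean = posixpath.normpath(raw)

print(joined)  # /usr/local/bin/python
print(raw)     # /usr/local/lib/../bin
print(clean)   # /usr/local/bin
```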

I do think join should take more care to avoid multiple slashes
together in the middle of a path, although this is really the
responsibility of the platform library, not a generic function/method.
Join is true to its documentation of only adding separators and never
deleting them, but that seems like a bit of sloppiness.   On the
other hand, the filesystems don't care; I don't think anybody has
mentioned a case where it actually creates a path the filesystem can't
handle.

urljoin clearly has a different job.  When we talked about extending
path to URLs, I was thinking more in terms of opening files, fetching
resources, deleting, renaming, etc. rather than split-modify-rejoin.
A hypothetical urlpath module would clearly have to follow the URL
rules.  I don't see a contradiction in supporting both URL joining
rules and having a non-initial absolute argument, just to avoid
cross-"platform" surprises.  But urlpath would also need methods to
parse the scheme and host on demand, query strings, #fragments, a
class method for building a URL from the smallest parts, etc.

As for supporting path fragments and '..' in join arguments (for
filesystem paths), it's clearly too widely used to eliminate.  Users
can voluntarily refrain from passing arguments containing separators.
For cases involving a user-supplied -- possibly hostile -- path,
either a separate method (safe_join, child) could achieve this, or a
subclass implementation that allows only safe arguments.
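
One possible shape for such a safe_join, as a sketch only. The name comes from the paragraph above, but the signature and the ValueError are assumptions, and the check is purely lexical: it does not resolve symlinks, so it is not a complete defence against a hostile path.

```python
import posixpath


def safe_join(base, *parts):
    """Join parts under base, refusing results that escape it (hypothetical)."""
    base = posixpath.normpath(base)
    candidate = posixpath.normpath(posixpath.join(base, *parts))
    # Compare against "base/" so "/srv/database" is not mistaken for "/srv/data".
    prefix = base if base.endswith("/") else base + "/"
    if candidate != base and not candidate.startswith(prefix):
        raise ValueError("path escapes %r: %r" % (base, candidate))
    return candidate


print(safe_join("/srv/data", "reports", "q1.txt"))
```

Both safe_join("/srv/data", "../etc/passwd") and safe_join("/srv/data", "/etc/passwd") raise, because the normalized result no longer sits under the base.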

Regarding pathname-manipulation methods and filesystem-access methods,
I'm not sure how workable it is to have separate objects for them.

os.mkdir(   Path("/usr/local/lib/python/Cheetah/Template.py").parent   )
Path("/usr/local/lib/python/Cheetah/Template.py").parent.mkdir()
FileAccess(
Path("/usr/local/lib/python/Cheetah/Template.py").parent   ).mkdir()

The first two are reasonable.  The third... who would want to do this
for every path?  How often would you reuse the FileAccess object?  I
typically create Path objects from configuration values and keep them
around for the entire application; e.g., data_dir.  Then I create
derived paths as necessary. I suppose if the FileAccess object has a
.path attribute, it could do double-duty so you wouldn't have to store
the path separately.  Is this what the advocates of two classes have
in mind?  With usage like this?

my_file = FileAccess(   file_access_obj.path.joinpath("my_file")   )
my_file = FileAccess(   Path(file_access_obj.path, "my_file")   )
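
As a later point of comparison: the standard library's pathlib module, added in Python 3.4 and so well after this thread, offers both styles, with "pure" classes that only manipulate path strings and concrete classes that add filesystem methods. A brief sketch, not a claim about what anyone here proposed; the temporary directory and the names template, package_dir and data_dir are only for illustration.

```python
import tempfile
from pathlib import Path, PurePosixPath

# Pure paths only manipulate strings; they never touch the filesystem.
template = PurePosixPath("/usr/local/lib/python/Cheetah/Template.py")
package_dir = template.parent

# Concrete paths add filesystem access on the same kind of object.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "Cheetah"
    data_dir.mkdir()
    created = data_dir.is_dir()

print(package_dir, created)
```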

Working on my Path implementation.  (Yes it's necessary, Glyph, at
least to me.)  It's going slow because I just got a Macintosh laptop
and am still rounding up packages to install.

-- 
Mike Orr <[EMAIL PROTECTED]>


Re: [Python-Dev] Path object design

2006-11-05 Thread Andrew Dalke
Steve:
> > I'm darned if I know. I simply know that it isn't right for http resources.

/F:
> the URI specification disagrees; a URI that starts with "../" is
> perfectly legal, and the specification explicitly states how it should be
> interpreted.

I have looked at the spec, and can't figure out how its explanation
matches the observed urljoin results.  Steve's excerpt trimmed out
the strangest example.

>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../")
'http://blah.com/../'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..")  # What?!
'http://blah.com/'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../../")
'http://blah.com/../../'
>>>

> (it's important to realize that "urijoin" produces equivalent URI:s, not
> file names)

Both, though, are "paths".  The OP, Mike Orr, wrote:

   I agree that supporting non-filesystem directories (zip files,
   CSV/Subversion sandboxes, URLs) would be nice, but we already have a
   big enough project without that.  What constraints should a Path
   object keep in mind in order to be forward-compatible with this?

Is the answer therefore that URLs and URI behaviour should not
place constraints on a Path object because they are sufficiently
dissimilar from file-system paths?  Do these other non-FS hierarchical
structures have similar differences causing a semantic mismatch?

Andrew
[EMAIL PROTECTED]


Re: [Python-Dev] Path object design

2006-11-05 Thread stephen
Michael Urman writes:

 > Ah, but how do you know when that's wrong? At least under ftp:// your
 > root is often a mid-level directory until you change up out of it.
 > http:// will tend to treat the targets as roots, but I don't know that
 > there's any requirement for a /.. to be meaningless (even if it often
 > is).

ftp and http schemes both have authority ("host") components, so the
meaning of ".." path components is defined in the same way for both by
section 5 of RFC 3986.

Of course an FTP server is not bound to interpret the protocol so as
to mimic URL semantics.  But that's a different question.



Re: [Python-Dev] Path object design

2006-11-04 Thread Fredrik Lundh
Steve Holden wrote:

>> Ah, but how do you know when that's wrong? At least under ftp:// your
>> root is often a mid-level directory until you change up out of it.
>> http:// will tend to treat the targets as roots, but I don't know that
>> there's any requirement for a /.. to be meaningless (even if it often
>> is).
>>
> I'm darned if I know. I simply know that it isn't right for http resources.

the URI specification disagrees; a URI that starts with "../" is
perfectly legal, and the specification explicitly states how it should be
interpreted.

(it's important to realize that "urijoin" produces equivalent URI:s, not 
file names)





Re: [Python-Dev] Path object design

2006-11-04 Thread Steve Holden
Michael Urman wrote:
> On 11/3/06, Steve Holden <[EMAIL PROTECTED]> wrote:
> 
>> Having said this, Andrew *did* demonstrate quite convincingly that the
>> current urljoin has some fairly egregious directory traversal glitches.
>> Is it really right to punt obvious gotchas like
>>
>>  >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../../")
>>
>> 'http://blah.com/../../'
> 
> 
> Ah, but how do you know when that's wrong? At least under ftp:// your
> root is often a mid-level directory until you change up out of it.
> http:// will tend to treat the targets as roots, but I don't know that
> there's any requirement for a /.. to be meaningless (even if it often
> is).
> 
I'm darned if I know. I simply know that it isn't right for http resources.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden


Re: [Python-Dev] Path object design

2006-11-04 Thread Michael Urman
On 11/3/06, Steve Holden <[EMAIL PROTECTED]> wrote:
> Having said this, Andrew *did* demonstrate quite convincingly that the
> current urljoin has some fairly egregious directory traversal glitches.
> Is it really right to punt obvious gotchas like
>
>  >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../../")
>
> 'http://blah.com/../../'

Ah, but how do you know when that's wrong? At least under ftp:// your
root is often a mid-level directory until you change up out of it.
http:// will tend to treat the targets as roots, but I don't know that
there's any requirement for a /.. to be meaningless (even if it often
is).

-- 
Michael Urman  http://www.tortall.net/../mu/blog ;)


Re: [Python-Dev] Path object design

2006-11-03 Thread Nick Coghlan
Steve Holden wrote:
> Having said this, Andrew *did* demonstrate quite convincingly that the 
> current urljoin has some fairly egregious directory traversal glitches. 
> Is it really right to punt obvious gotchas like
> 
>  >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../../")
> 
> 'http://blah.com/../../'
> 
>  >>>
> 
> to the server?

See Paul Jimenez's thread about replacing urlparse with something better. The 
current module has some serious issues :)

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] Path object design

2006-11-03 Thread Steve Holden
Phillip J. Eby wrote:
> At 01:56 AM 11/4/2006 +0100, Andrew Dalke wrote:
> 
>>os.path.join assumes the base is a directory
>>name when used in a join: "inserting '/' as needed" while RFC
>>1808 says
>>
>>   The last segment of the base URL's path (anything
>>   following the rightmost slash "/", or the entire path if no
>>   slash is present) is removed
>>
>>Is my intuition wrong in thinking those should be the same?
> 
> 
> Yes.  :)
> 
> Path combining and URL absolutization(?) are inherently different 
> operations with only superficial similarities.  One reason for this is that 
> a trailing / on a URL has an actual meaning, whereas in filesystem paths a 
> trailing / is an aberration and likely an actual error.
> 
> The path combining operation says, "treat the following as a subpath of the 
> base path, unless it is absolute".  The URL normalization operation says, 
> "treat the following as a subpath of the location the base URL is 
> *contained in*".
> 
> Because of this, os.path.join assumes a path with a trailing separator is 
> equivalent to a path without one, since that is the only reasonable way to 
> interpret treating the joined path as a subpath of the base path.
> 
> But for a URL join, the path /foo and the path /foo/ are not only 
> *different paths* referring to distinct objects, but the operation wants to 
> refer to the *container* of the referenced object.  /foo might refer to a 
> directory, while /foo/ refers to some default content (e.g. 
> index.html).  This is actually why Apache normally redirects you from /foo 
> to /foo/ before it serves up the index.html; relative URLs based on a base 
> URL of /foo won't work right.
> 
> The URL approach is designed to make peer-to-peer linking in a given 
> directory convenient.  Instead of referring to './foo.html' (as one would 
> have to do with filenames), you can simply refer to 'foo.html'.  But the 
> cost of saving those characters in every link is that joining always takes 
> place on the parent, never the tail-end.  Thus directory URLs normally end 
> in a trailing /, and most tools tend to automatically redirect when 
> somebody leaves it off.  (Because otherwise the links would be wrong.)
> 
Having said this, Andrew *did* demonstrate quite convincingly that the 
current urljoin has some fairly egregious directory traversal glitches. 
Is it really right to punt obvious gotchas like

 >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../../")

'http://blah.com/../../'

 >>>

to the server?

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden



Re: [Python-Dev] Path object design

2006-11-03 Thread Phillip J. Eby
At 01:56 AM 11/4/2006 +0100, Andrew Dalke wrote:
>os.path.join assumes the base is a directory
>name when used in a join: "inserting '/' as needed" while RFC
>1808 says
>
>The last segment of the base URL's path (anything
>following the rightmost slash "/", or the entire path if no
>slash is present) is removed
>
>Is my intuition wrong in thinking those should be the same?

Yes.  :)

Path combining and URL absolutization(?) are inherently different 
operations with only superficial similarities.  One reason for this is that 
a trailing / on a URL has an actual meaning, whereas in filesystem paths a 
trailing / is an aberration and likely an actual error.

The path combining operation says, "treat the following as a subpath of the 
base path, unless it is absolute".  The URL normalization operation says, 
"treat the following as a subpath of the location the base URL is 
*contained in*".

Because of this, os.path.join assumes a path with a trailing separator is 
equivalent to a path without one, since that is the only reasonable way to 
interpret treating the joined path as a subpath of the base path.

But for a URL join, the path /foo and the path /foo/ are not only 
*different paths* referring to distinct objects, but the operation wants to 
refer to the *container* of the referenced object.  /foo might refer to a 
directory, while /foo/ refers to some default content (e.g. 
index.html).  This is actually why Apache normally redirects you from /foo 
to /foo/ before it serves up the index.html; relative URLs based on a base 
URL of /foo won't work right.

The URL approach is designed to make peer-to-peer linking in a given 
directory convenient.  Instead of referring to './foo.html' (as one would 
have to do with filenames), you can simply refer to 'foo.html'.  But the 
cost of saving those characters in every link is that joining always takes 
place on the parent, never the tail-end.  Thus directory URLs normally end 
in a trailing /, and most tools tend to automatically redirect when 
somebody leaves it off.  (Because otherwise the links would be wrong.)
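
The difference described above can be checked with a short sketch.  It assumes Python 3, where urljoin lives in urllib.parse rather than the urlparse module, and uses posixpath so the path results do not depend on the host platform; example.com is a placeholder host.

```python
import posixpath
from urllib.parse import urljoin

# Path combining: a trailing separator on the base makes no difference.
path_without_slash = posixpath.join("/foo", "bar")
path_with_slash = posixpath.join("/foo/", "bar")
print(path_without_slash)   # /foo/bar
print(path_with_slash)      # /foo/bar

# URL joining: the reference is resolved against the *container* of the
# base, so the last segment of "/foo" is replaced while "/foo/" is kept.
url_without_slash = urljoin("http://example.com/foo", "bar")
url_with_slash = urljoin("http://example.com/foo/", "bar")
print(url_without_slash)    # http://example.com/bar
print(url_with_slash)       # http://example.com/foo/bar
```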



Re: [Python-Dev] Path object design

2006-11-03 Thread Andrew Dalke
Martin:
> Just in case this isn't clear from Steve's and Fredrik's
> post: The behaviour of this function is (or should be)
> specified, by an IETF RFC. If somebody finds that non-intuitive,
> that's likely because their mental model of relative URIs
> deviates from the RFC's model.

I didn't realize that urljoin is only supposed to be used
with a base URL, where "base URL" (the term used in the docstring)
carries the specific requirement that it be absolute.

I instead saw the word "join" and figured it should do roughly
the same things as os.path.join.


>>> import urlparse
>>> urlparse.urljoin("file:///path/to/hello", "slash/world")
'file:///path/to/slash/world'
>>> urlparse.urljoin("file:///path/to/hello", "/slash/world")
'file:///slash/world'
>>> import os
>>> os.path.join("/path/to/hello", "slash/world")
'/path/to/hello/slash/world'
>>>

It does not.  My intuition, nowadays highly influenced by URLs, is that
with a couple of hypothetical functions for going between filenames and URLs:

os.path.join(absolute_filename, filename)
   ==
file_url_to_filename(urlparse.urljoin(
 filename_to_file_url(absolute_filename),
 filename_to_file_url(filename)))

which is not the case.  os.path.join assumes the base is a directory
name when used in a join: "inserting '/' as needed" while RFC
1808 says

   The last segment of the base URL's path (anything
   following the rightmost slash "/", or the entire path if no
   slash is present) is removed

Is my intuition wrong in thinking those should be the same?

I suspect it is. I've been very glad that when I ask for a directory
name that I don't need to check that it ends with a "/".  Urljoin's
behaviour is correct for what it's doing.  os.path.join is better for
what it's doing.  (And about once a year I manually verify the
difference because I get unsure.)
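
The comparison above can be tried out with a rough sketch.  The two conversion helpers are the hypothetical functions named earlier in this message, filled in here in the simplest way that works for plain POSIX-style names; they are assumptions for illustration, not existing library calls.  The sketch also assumes Python 3, where urljoin is found in urllib.parse.

```python
import posixpath
from urllib.parse import quote, unquote, urljoin


def filename_to_file_url(filename):
    # Hypothetical helper: absolute names become file:// URLs,
    # relative names become relative references.
    if posixpath.isabs(filename):
        return "file://" + quote(filename)
    return quote(filename)


def file_url_to_filename(url):
    # Hypothetical inverse; only understands the URLs produced above.
    prefix = "file://"
    if not url.startswith(prefix):
        raise ValueError("not a file URL: %r" % (url,))
    return unquote(url[len(prefix):])


def url_style_join(absolute_filename, filename):
    # Join two filenames by going through urljoin and back.
    return file_url_to_filename(
        urljoin(filename_to_file_url(absolute_filename),
                filename_to_file_url(filename)))


print(posixpath.join("/path/to/hello", "slash/world"))   # /path/to/hello/slash/world
print(url_style_join("/path/to/hello", "slash/world"))   # /path/to/slash/world
print(url_style_join("/path/to/hello/", "slash/world"))  # /path/to/hello/slash/world
```

The two joins agree only when the base already ends in a slash, which is the directory check that os.path.join makes unnecessary.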

I think these should not share the "join" in the name.

If urljoin is not meant for relative base URLs, should it
raise an exception when misused? Hmm, though the RFC
algorithm does not have a failure mode and the result may
be a relative URL.

Consider

>>> urlparse.urljoin("http://blah.com/a/b/c", "..")
'http://blah.com/a/'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../")
'http://blah.com/a/'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../..")
'http://blah.com/'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../")
'http://blah.com/'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../..")
'http://blah.com/'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../")
'http://blah.com/../'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..")  # What?!
'http://blah.com/'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../../")
'http://blah.com/../../'
>>>


> Of course, there is also the chance that the implementation
> deviates from the RFC; that would be a bug.

The comment in urlparse

# XXX The stuff below is bogus in various ways...

is ever so reassuring.  I suspect there's a bug given the
previous code.  Or I've a bad mental model.  ;)

Andrew
[EMAIL PROTECTED]


Re: [Python-Dev] Path object design

2006-11-03 Thread Martin v. Löwis
Andrew Dalke schrieb:
>  >>> import urlparse
>  >>> urlparse.urljoin("hello", "/world")
>  '/world'
>  >>> urlparse.urljoin("hello", "slash/world")
>  'slash/world'
>  >>> urlparse.urljoin("hello", "slash//world")
>  'slash//world'
>  >>>
> 
> It does not make sense to me that these should be different.

Just in case this isn't clear from Steve's and Fredrik's
post: The behaviour of this function is (or should be)
specified, by an IETF RFC. If somebody finds that non-intuitive,
that's likely because their mental model of relative URIs
deviates from the RFC's model.

Of course, there is also the chance that the implementation
deviates from the RFC; that would be a bug.

Regards,
Martin


Re: [Python-Dev] Path object design

2006-11-03 Thread Fredrik Lundh
Steve Holden wrote:

> Although the last two smell like bugs, the point of urljoin is to make 
> an absolute URL from an absolute ("current page") URL

also known as a base URL:

 http://www.w3.org/TR/html4/struct/links.html#h-12.4.1

(os.path.join's behaviour is also well-defined, btw; if any component is 
an absolute path, all preceding components are ignored.)





Re: [Python-Dev] Path object design

2006-11-03 Thread Steve Holden
Andrew Dalke wrote:
> glyph:
> 
>>Path manipulation:
>>
>> * This is confusing as heck:
>>   >>> os.path.join("hello", "/world")
>>   '/world'
>>   >>> os.path.join("hello", "slash/world")
>>   'hello/slash/world'
>>   >>> os.path.join("hello", "slash//world")
>>   'hello/slash//world'
>>   Trying to formulate a general rule for what the arguments to os.path.join
>>are supposed to be is really hard.  I can't really figure out what it would
>>be like on a non-POSIX/non-win32 platform.
> 
> 
> Made trickier by the similar yet different behaviour of urlparse.urljoin.
> 
>  >>> import urlparse
>  >>> urlparse.urljoin("hello", "/world")
>  '/world'
>  >>> urlparse.urljoin("hello", "slash/world")
>  'slash/world'
>  >>> urlparse.urljoin("hello", "slash//world")
>  'slash//world'
>  >>>
> 
> It does not make sense to me that these should be different.
> 
Although the last two smell like bugs, the point of urljoin is to make 
an absolute URL from an absolute ("current page") URL and a relative 
(link) one. As we see:

  >>> urljoin("/hello", "slash/world")
'/slash/world'

and

  >>> urljoin("http://localhost/hello", "slash/world")
'http://localhost/slash/world'

but

  >>> urljoin("http://localhost/hello/", "slash/world")
'http://localhost/hello/slash/world'
  >>> urljoin("http://localhost/hello/index.html", "slash/world")
'http://localhost/hello/slash/world'
  >>>

I think we can probably conclude that this is what's supposed to happen. 
In the case of urljoin the first argument is interpreted as referencing 
an existing resource and the second as a link such as might appear in 
that resource.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden



Re: [Python-Dev] Path object design

2006-11-03 Thread Andrew Dalke
glyph:
> Path manipulation:
>
>  * This is confusing as heck:
>    >>> os.path.join("hello", "/world")
>    '/world'
>    >>> os.path.join("hello", "slash/world")
>    'hello/slash/world'
>    >>> os.path.join("hello", "slash//world")
>    'hello/slash//world'
>    Trying to formulate a general rule for what the arguments to os.path.join
> are supposed to be is really hard.  I can't really figure out what it would
> be like on a non-POSIX/non-win32 platform.

Made trickier by the similar yet different behaviour of urlparse.urljoin.

 >>> import urlparse
 >>> urlparse.urljoin("hello", "/world")
 '/world'
 >>> urlparse.urljoin("hello", "slash/world")
 'slash/world'
 >>> urlparse.urljoin("hello", "slash//world")
 'slash//world'
 >>>

It does not make sense to me that these should be different.

   Andrew
   [EMAIL PROTECTED]

[Apologies to glyph for the dup; mixed up the reply-to.  Still getting
used to gmail.]


Re: [Python-Dev] Path object design

2006-11-02 Thread Talin
Steve Holden wrote:
> Greg Ewing wrote:
>> Mike Orr wrote:
>> Having said that, I can see there could be an
>> element of confusion in calling it "join".
>>
> Good point. "relativise" might be appropriate, though something shorter 
> would be better.
> 
> regards
>   Steve

The term used in many languages for this sort of operation is "combine". 
(See .Net System.IO.Path for an example.) I kind of like the term - it 
implies that you are mixing two paths together, but it doesn't imply 
that the combination will be additive.

- Talin


Re: [Python-Dev] Path object design

2006-11-02 Thread Greg Ewing
Steve Holden wrote:
> Greg Ewing wrote:
> 
>>Having said that, I can see there could be an
>>element of confusion in calling it "join".
>>
> 
> Good point. "relativise" might be appropriate,

Sounds like something to make my computer go at
warp speed, which would be nice, but I won't
be expecting a patch any time soon. :-)

--
Greg


Re: [Python-Dev] Path object design

2006-11-02 Thread Steve Holden
Greg Ewing wrote:
> Mike Orr wrote:
> 
> 
>>>* This is confusing as heck:
>>>  >>> os.path.join("hello", "/world")
>>>  '/world'
> 
> 
> It's only confusing if you're not thinking of
> pathnames as abstract entities.
> 
> There's a reason for this behaviour -- it's
> so you can do things like
> 
>    full_path = os.path.join(default_dir, filename_from_user)
> 
> where filename_from_user can be either a relative
> or absolute path at his discretion.
> 
> In other words, os.path.join doesn't just mean "join
> these two paths together", it means "interpret the
> second path in the context of the first".
> 
> Having said that, I can see there could be an
> element of confusion in calling it "join".
> 
Good point. "relativise" might be appropriate, though something shorter 
would be better.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden



Re: [Python-Dev] Path object design

2006-11-02 Thread glyph
On 01:04 am, [EMAIL PROTECTED] wrote:
>[EMAIL PROTECTED] wrote:

>If you're serious about writing platform-agnostic
>pathname code, you don't put slashes in the arguments
>at all. Instead you do
>
>   os.path.join("hello", "slash", "world")
>
>Many of the other things you mention are also a
>result of not treating pathnames as properly opaque
>objects.

Of course nobody who cares about these issues is going to put constant
forward slashes into pathnames.  The point is not that you'll forget
you're supposed to be dealing with pathnames; the point is that you're
going to get input from some source that you've got very little control
over, and *especially* if that source is untrusted (although sometimes
just due to mistakes) there are all kinds of ways it can trip you up.
Did you accidentally pass it through something that doubles or undoubles
all backslashes, etc.  Sometimes these will result in harmless errors
anyway, sometimes it's a critical error that will end up trying to
delete /usr instead of /home/user/installer-build/ROOT/usr.  If you have
the path library catching these problems for you then a far greater
percentage fall into the former category.

>If you're saying that the fact they're strings makes
>it easy to forget that you're supposed to be treating
>them opaquely,

That's exactly what I'm saying.

>>  * although individual operations are atomic, shutil.copytree and friends
>>aren't.  I've often seen python programs confused by partially-copied trees
>>of files.

>I can't see how this can be even remotely regarded
>as a pathname issue, or even a filesystem interface
>issue. It's no different to any other situation
>where a piece of code can fall over and leave a
>partial result behind.

It is a bit of a stretch, I'll admit, but I included it because it is a
weakness of the path library that it is difficult to do the kind of
parallel iteration required to implement tree-copying yourself.  If that
were trivial, then you could write your own file-copying loop and cope
with errors yourself.


Re: [Python-Dev] Path object design

2006-11-02 Thread Greg Ewing
[EMAIL PROTECTED] wrote:
> Relative 
> paths, if they should exist at all, should have to be explicitly linked 
> as relative to something *else* (e.g. made absolute) before they can be 
> used.

If paths were opaque objects, this could be enforced
by not having any way of constructing a path that
wasn't rooted in some existing absolute path.
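
A minimal sketch of what such an opaque type might look like, assuming POSIX-style paths.  The class name RootedPath and its methods are hypothetical, invented for this illustration; they are not an existing stdlib or Twisted API.

```python
import posixpath


class RootedPath:
    """Hypothetical opaque path that is always absolute."""

    def __init__(self, absolute):
        # The only way in is an absolute path, so no instance can ever be
        # relative to an implicit current directory.
        if not posixpath.isabs(absolute):
            raise ValueError("need an absolute path, got %r" % (absolute,))
        self._path = posixpath.normpath(absolute)

    def child(self, name):
        # Accept exactly one plain segment: no separators, no "." or "..".
        if name in ("", ".", "..") or "/" in name:
            raise ValueError("not a single path segment: %r" % (name,))
        return RootedPath(posixpath.join(self._path, name))

    def descend(self, *names):
        # Several levels in one step, each checked the same way as child().
        path = self
        for name in names:
            path = path.child(name)
        return path

    def __eq__(self, other):
        return isinstance(other, RootedPath) and self._path == other._path

    def __hash__(self):
        return hash(self._path)

    def __str__(self):
        return self._path


root = RootedPath("/home/user/installer-build/ROOT")
print(root.descend("usr", "lib"))   # /home/user/installer-build/ROOT/usr/lib
```

With this shape, untrusted input such as "../../usr" or "/usr" raises ValueError in child() instead of silently escaping the root.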

--
Greg


Re: [Python-Dev] Path object design

2006-11-02 Thread Greg Ewing
Mike Orr wrote:
> I have no idea why Microsoft thought it was a good idea to
> put the seven-odd device files in every directory. Why not force
> people to type the colon ("CON:").

Yes, this is a particularly stupid piece of braindamage
on the part of the designers of MS-DOS. As far as I
remember, even CP/M (which was itself a severely
warped and twisted version of RT11) had the good
sense to put colons on the end of such things.

But maybe "design" is too strong a word to apply
to MS-DOS...

Anyhow, I think I agree that there's really nothing
a path library can do about this. Whatever it tries
to do, the fact will remain that it's impossible to
have a regular file called "con", and users will
have to live with that.
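
A path library cannot make such a file creatable, but it could at least let callers detect the clash up front.  The sketch below is hand-rolled and assumes the classic reserved list (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9); the function name is made up for illustration, and the exact rules vary between Windows versions.

```python
import ntpath

# Classic DOS device names; assumed list, see the note above.
_DOS_DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % n for n in range(1, 10)}
    | {"LPT%d" % n for n in range(1, 10)}
)


def is_dos_device_name(filename):
    # The match ignores case, the directory part and any extension,
    # so "C:\\temp\\con.txt" still names the console device.
    stem = ntpath.basename(filename).split(".", 1)[0].rstrip(" ").upper()
    return stem in _DOS_DEVICE_NAMES


print(is_dos_device_name("con"))                # True
print(is_dos_device_name("C:\\temp\\con.txt"))  # True
print(is_dos_device_name("console.txt"))        # False
```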

--
Greg


Re: [Python-Dev] Path object design

2006-11-02 Thread Greg Ewing
[EMAIL PROTECTED] wrote:

>    >>> os.path.join("hello", "slash/world")
>    'hello/slash/world'
>    >>> os.path.join("hello", "slash//world")
>    'hello/slash//world'
>    Trying to formulate a general rule for what the arguments to 
> os.path.join are supposed to be is really hard.

If you're serious about writing platform-agnostic
pathname code, you don't put slashes in the arguments
at all. Instead you do

   os.path.join("hello", "slash", "world")

Many of the other things you mention are also a
result of not treating pathnames as properly opaque
objects.

If you're saying that the fact they're strings makes
it easy to forget that you're supposed to be treating
them opaquely, there may be merit in that view. It
would be an argument for making path objects a
truly opaque type instead of a subclass of string
or tuple.

>  * although individual operations are atomic, shutil.copytree and 
> friends aren't.  I've often seen python programs confused by 
> partially-copied trees of files.

I can't see how this can be even remotely regarded
as a pathname issue, or even a filesystem interface
issue. It's no different to any other situation
where a piece of code can fall over and leave a
partial result behind. As always, the cure is
defensive coding (clean up a partial result on error,
or be prepared to tolerate the presence of a previous
partial result when re-trying).

It could be argued that shutil.copytree should clean
up after itself if there is an error, but that might
not be what you want -- e.g. you might want to find
out how far it got, and maybe carry on from there
next time. It's probably better to leave things like
that to the caller.
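
For callers who do want the all-or-nothing behaviour, one possible shape for the "clean up a partial result" strategy is sketched below.  The function name and the staging-directory approach are assumptions made for illustration, not something shutil provides; the sketch also assumes the destination's parent directory exists and sits on the same filesystem as the staging directory.

```python
import os
import shutil
import tempfile


def copytree_or_nothing(src, dst):
    """Copy src to dst so that a failed copy leaves no partial dst behind."""
    parent = os.path.dirname(os.path.abspath(dst))
    # Copy into a private staging directory next to the destination ...
    staging = tempfile.mkdtemp(prefix=".copy-", dir=parent)
    try:
        target = os.path.join(staging, "tree")
        shutil.copytree(src, target)
        # ... and only move it into place once the whole copy succeeded.
        os.rename(target, dst)
    finally:
        # Removes the partial copy on failure, or the empty staging
        # directory on success.
        shutil.rmtree(staging, ignore_errors=True)
```

A caller who would rather resume from a partial copy would skip the staging directory and inspect what is already present in dst instead.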

--
Greg


Re: [Python-Dev] Path object design

2006-11-02 Thread Greg Ewing
Mike Orr wrote:

>> * This is confusing as heck:
>>   >>> os.path.join("hello", "/world")
>>   '/world'

It's only confusing if you're not thinking of
pathnames as abstract entities.

There's a reason for this behaviour -- it's
so you can do things like

   full_path = os.path.join(default_dir, filename_from_user)

where filename_from_user can be either a relative
or absolute path at his discretion.

In other words, os.path.join doesn't just mean "join
these two paths together", it means "interpret the
second path in the context of the first".
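
A small sketch of that reading, using posixpath so the results are the same on any platform; default_dir and the file names are made-up values.

```python
import posixpath

default_dir = "/home/user/docs"   # hypothetical default directory

# A relative name is interpreted inside the default directory ...
relative_result = posixpath.join(default_dir, "notes.txt")
print(relative_result)   # /home/user/docs/notes.txt

# ... while an absolute name overrides it completely.
absolute_result = posixpath.join(default_dir, "/etc/motd")
print(absolute_result)   # /etc/motd
```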

Having said that, I can see there could be an
element of confusion in calling it "join".

--
Greg


Re: [Python-Dev] Path object design

2006-11-01 Thread Mike Orr
On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On 01:46 am, [EMAIL PROTECTED] wrote:
> >On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> >This is ironic coming from one of Python's celebrity geniuses.  "We
> >made this class but we don't know how it works."  Actually, it's
> >downright alarming coming from someone who knows Twisted inside and
> >out yet still can't make sense of path platform oddities.
>
> Man, it is going to be hard being ironically self-deprecating if people keep
> going around calling me a "celebrity genius".  My ego doesn't need any help,
> you know? :)

I respect Twisted in the same way I respect a loaded gun.  It's
powerful, but approach with caution.

> If you ever think I'm suggesting breaking something in Python, you're
> misinterpreting me ;).  I am as cagey as they come about this.  No matter
> what else happens, the behavior of os.path should not really change.

The point is, what *should* a join-like method do in a future improved
path module?  os.path.join should not change because too many programs
depend on its current behavior, in ways we can't necessarily predict.
But a new function/method is not bound by these constraints, as long
as the boundary cases are well documented.  All the os.path and
file-related os/shutil functions need to be reexamined in this
context.  Maybe the existing behavior is best, maybe we'll keep it
even if it's sub-optimal, but we should document why we're making
these choices.

> >The user didn't call normpath, so should we normalize it anyway?
>
> That's really the main point here.
>
> What is a path that hasn't been "normalized"?  Is it a path at all, or is it
> some random garbage with slashes (or maybe other things) in it?  os.path
> performs correct path algebra on correct inputs, and it's correct (as far as
> one can be correct) on inputs that have weird junk in them.

I'm tempted to say Path("/a/b").join("c", "d") should do the same
thing your .child method does, but allow multiple levels in one step.

But on the other hand, there will always be people with prebuilt
"path/fragments" to join to other fragments, and I'm not sure we
should force them to split the fragment just to rejoin it again.
Maybe we need a .join_unsafe method for this, haha.

-- 
Mike Orr <[EMAIL PROTECTED]>


Re: [Python-Dev] Path object design

2006-11-01 Thread glyph
On 01:46 am, [EMAIL PROTECTED] wrote:
>On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

>This is ironic coming from one of Python's celebrity geniuses.  "We
>made this class but we don't know how it works."  Actually, it's
>downright alarming coming from someone who knows Twisted inside and
>out yet still can't make sense of path platform oddities.

Man, it is going to be hard being ironically self-deprecating if people
keep going around calling me a "celebrity genius".  My ego doesn't need
any help, you know? :)

In some sense I was being serious; part of the point of abstraction is
embedding some of your knowledge in your code so you don't have to keep
it around in your brain all the time.  I'm sure that my analysis of
path-based problems wasn't exhaustive because I don't really use os.path
for path manipulation.  I use static.File and it _works_, I only
remember these os.path flaws from the process of writing it, not daily
use.

>>  * This is confusing as heck:
>>    >>> os.path.join("hello", "/world")
>>    '/world'
>
>That's in the documentation.  I'm not sure it's "wrong".  What should
>it do in this situation?  Pretend the slash isn't there?

You can document anything.  That doesn't really make it a good idea.

The point I was trying to make wasn't really that os.path is *wrong*.
Far from it, in fact, it defines some useful operations and they are
basically always correct.  I didn't even say "wrong", I said
"confusing".  FilePath is implemented strictly in terms of os.path
because it _does_ do the right thing with its inputs.  The question is,
how hard is it to remember what its inputs should be?

>>    >>> os.path.join("hello", "slash/world")
>>    'hello/slash/world'
>
>That has always been a loophole in the function, and many programs
>depend on it.

If you ever think I'm suggesting breaking something in Python, you're
misinterpreting me ;).  I am as cagey as they come about this.  No
matter what else happens, the behavior of os.path should not really
change.

>The user didn't call normpath, so should we normalize it anyway?

That's really the main point here.

What is a path that hasn't been "normalized"?  Is it a path at all, or
is it some random garbage with slashes (or maybe other things) in it?
os.path performs correct path algebra on correct inputs, and it's
correct (as far as one can be correct) on inputs that have weird junk in
them.

In the strings-and-functions model of paths, this all makes perfect
sense, and there's no particular sensibility associated with defining
ideas like "equivalency" for paths, unless that's yet another function
you pass some strings to.  I definitely prefer this:

    path1 == path2

to this:

    os.path.abspath(pathstr1) == os.path.abspath(pathstr2)

though.

You'll notice I used abspath instead of normpath.  As a side note, I've
found interpreting relative paths as always relative to the current
directory is a bad idea.  You can see this when you have a daemon that
daemonizes and then opens files: the user thinks they're specifying
relative paths from wherever they were when they ran the program, the
program thinks they're relative paths from /var/run/whatever.  Relative
paths, if they should exist at all, should have to be explicitly linked
as relative to something *else* (e.g. made absolute) before they can be
used.  I think that sequences of strings might be sufficient though.

>Good point, but exactly what functionality do you want to see for zip
>files and URLs?  Just pathname manipulation?  Or the ability to see
>whether a file exists and extract it, copy it, etc?

The latter.  See
http://twistedmatrix.com/trac/browser/trunk/twisted/python/zippath.py

This is still _really_ raw functionality though.  I can't claim that it
has the same "it's been used in real code" endorsement as the rest of
the FilePath stuff I've been talking about.  I've never even tried to
hook this up to a Twisted webserver, and I've only used it in one
environment.

>>  * you have to care about unicode sometimes.
>
>This is a Python-wide problem.

I completely agree, and this isn't the thread to try to solve it.  The
absence of a path object, however, and the path module's reliance on
strings, exacerbates the problem.  The fact that FilePath doesn't deal
with this either, however, is a fairly good indication that the problem
is deeper than that.

>>  * the documentation really can't emphasize enough how bad using
>> 'os.path.exists/isfile/isdir', and then assuming the file continues to exist
>> when it is a contended resource, is.  It can be handy, but it is _always_ a
>> race condition.
>
>What else can you do?  It's either os.path.exists()/os.remove() or "do
>it anyway and catch the exception".  And sometimes you have to check
>the filetype in order to determine *what* to do.

You have to catch the exception anyway in many cases.  I probably
shouldn't have mentioned it though, it's starting to get a bit far
afield of even this ridiculously far-ranging discussion.  A more
accurate criticism might be that "the absence of a file locking syste

Re: [Python-Dev] Path object design

2006-11-01 Thread Mike Orr
On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On 08:14 pm, [EMAIL PROTECTED] wrote:

> >(...) people have had to spend five years putting hard-to-read
> >os.path functions in the code, or reinventing the wheel with their own
> >libraries that they're not sure they can trust.  I started to use
> >path.py last year when it looked like it was emerging as the basis of
> >a new standard, but yanked it out again when it was clear the API
> >would be different by the time it's accepted.  I've gone back to
> >os.path for now until something stable emerges but I really wish I
> >didn't have to.
>
> You *don't* have to.  This is a weird attitude I've encountered over and
> over again in the Python community, although sometimes it masquerades as
> resistance to Twisted or Zope or whatever.  It's OK to use libraries.  It's
> OK even to use libraries that Guido doesn't like!  I'm pretty sure the first
> person to tell you that would be Guido himself.  (Well, second, since I just
> told you.)  If you like path.py and it solves your problems, use path.py.
> You don't have to cram it into the standard library to do that.  It won't be
> any harder to migrate from an old path object to a new path object than from
> os.path to a new path object, and in fact it would likely be considerably
> easier.

Oh, I understand it's OK to use libraries.  It's just that a path
library needs to be widely tested and well supported so you know it
won't scramble your files.  A bug in a date library affects only
datetimes. A bug in a database library affects only that
database.  A bug in a template library affects only the page being
output.  But a bug in a path library could ruin your whole day.  "Um,
remember those important files in that other project directory you
weren't working in? They were just overwritten."

Also, I train several programmers new to Python at work. I want to
make them learn *one* path library that we'll be sure to stick with
for several years.  Every path library has subtle quirks, and
switching from one to another may not be just a matter of renaming
methods.

> >- the "secure" features may not be necessary.  If they are, this
> >should be a separate discussion, and perhaps implemented as a
> >subclass.
>
> The main "secure" feature is "child" and it is, in my opinion, the best part
> about the whole class.  Some of the other stuff (rummaging around for
> siblings with extensions, for example) is probably extraneous.  child,
> however, lets you take a string from arbitrary user input and map it into a
> path segment, both securely and quietly.  Here's a good example (and this
> actually happened, this is how I know about that crazy windows 'special
> files' thing I wrote in my other recent message): you have a decision-making
> program that makes two files to store information about a process: "pro" and
> "con".  It turns out that "con" is shorthand for "fall in a well and die" in
> win32-ese.  A "secure" path manipulation library would alert you to this
> problem with a traceback rather than having it inexplicably freeze.
> Obscure, sure, but less obscure would be getting deterministic errors from a
> user entering slashes into a text field that shouldn't accept them.

Perhaps you're right.  I'm not saying it *should not* be a basic
feature, just that unless the Python community as a whole is ready for
this, users should have a choice to use it or not.
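
The deterministic-error behaviour described for "child" can be sketched
roughly as below.  This is only an illustration under stated
assumptions, not Twisted's code: the function name `child` is borrowed
from the discussion, `InsecureSegment` is a hypothetical exception, and
the check is simplified to "one plain name, no separators".  posixpath
is used so the result is the same on every platform.

```python
import posixpath


class InsecureSegment(ValueError):
    """Hypothetical error: the segment would escape or re-root the base."""


def child(base, segment):
    """Join exactly one plain name onto base, or raise InsecureSegment.

    A rough sketch of the idea only; not Twisted's implementation.
    """
    if segment in ("", ".", ".."):
        raise InsecureSegment("not a plain name: %r" % (segment,))
    if "/" in segment or "\\" in segment:
        raise InsecureSegment("separator in segment: %r" % (segment,))
    return posixpath.join(base, segment)


print(child("/srv/uploads", "report.txt"))

# Hostile or accidental input fails loudly instead of walking elsewhere.
for hostile in ("../etc/passwd", "/etc/passwd", ".."):
    try:
        child("/srv/uploads", hostile)
    except InsecureSegment as exc:
        print("rejected:", exc)
```

Whether a check like this belongs in the base class or in a subclass is
exactly the open question here; the sketch only shows that the check
itself is small.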

I learned about DOS device files from the manuals back in the 80s.
But I had completely forgotten them when I made several "aux"
directories in a Subversion repository on Linux.  People tried to
check it out on Windows and... got some kind of error.  "CON" means
console: its input comes from the keyboard and its output goes to the
screen.  Since this is a device file, I'm not sure a path library has
any responsibility to treat it specially.  We don't treat
"/dev/stdout" specially unless the user specifically calls a device
function. I have no idea why Microsoft thought it was a good idea to
put the seven-odd device files in every directory. Why not force
people to type the colon ("CON:")?  If they've memorized what CON
means they should have no trouble with the colon, especially since
it's required with "A:" and "C:" anyway.

For trivia, these are the ones I remember:
CON          Console (keyboard input, screen output)
KBRD         Keyboard input
???          Screen output
LPT1/2/3     Parallel ports
COM1/2/3/4   Serial ports
PRN          Alias for default printer port (normally LPT1)
NUL          Bit bucket
AUX          Game port?

COPY CON FILENAME.TXT   # Unix: "cat >filename.txt".
COPY FILENAME.TXT PRN   # Unix: "lp filename.txt" or "cat filename.txt | lp".
TYPE FILENAME.TXT       # Unix: "cat filename.txt".
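
If a path library did want to flag these names, the test itself is
short.  A minimal sketch, assuming the set of names Microsoft documents
as reserved (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9) rather than the
from-memory table above; `is_reserved_dos_name` is a hypothetical
helper, not part of os.path.

```python
# Device names reserved in every directory on Windows (assumed list).
RESERVED_DOS_NAMES = frozenset(
    ["CON", "PRN", "AUX", "NUL"]
    + ["COM%d" % n for n in range(1, 10)]
    + ["LPT%d" % n for n in range(1, 10)]
)


def is_reserved_dos_name(segment):
    """Return True if a single path segment names a DOS device.

    The comparison ignores case and any extension, since "con.txt"
    is treated like "CON" as well.
    """
    stem = segment.split(".", 1)[0].strip().upper()
    return stem in RESERVED_DOS_NAMES


for name in ("pro", "con", "aux", "NUL.txt", "console"):
    print(name, is_reserved_dos_name(name))
```

This only inspects one segment at a time, which is why it pairs
naturally with an API that adds one segment at a time.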

> >Where have all the proponents of non-OO or limited-OO strategies been?
>
> This continuum doesn't make any sense to me.  Where would you

Re: [Python-Dev] Path object design

2006-11-01 Thread Mike Orr
On 11/1/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> On 06:14 pm, [EMAIL PROTECTED] wrote:
> >[EMAIL PROTECTED] wrote:
> >
> >> I assert that it needs a better[1] interface because the current
> >> interface can lead to a variety of bugs through idiomatic, apparently
> >> correct usage.  All the more because many of those bugs are related to
> >> critical errors such as security and data integrity.
>
> >instead of referring to some esoteric knowledge about file systems that
> >us non-twisted-using mere mortals may not be evolved enough to
> >understand,
>
> On the contrary, twisted users understand even less, because (A) we've been
> demonstrated to get it wrong on numerous occasions in highly public and
> embarrassing ways and (B) we already have this class that does it all for us
> and we can't remember how it works :-).

This is ironic coming from one of Python's celebrity geniuses.  "We
made this class but we don't know how it works."  Actually, it's
downright alarming coming from someone who knows Twisted inside and
out yet still can't make sense of path platform oddities.

>  * This is confusing as heck:
>    >>> os.path.join("hello", "/world")
>    '/world'

That's in the documentation.  I'm not sure it's "wrong".  What should
it do in this situation?  Pretend the slash isn't there?

This came up in the directory-tuple proposal.  I said there was no
reason to change the existing behavior of join.  Noam favored an
exception.
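
For comparison, the two behaviours can be put side by side.  A minimal
sketch, assuming POSIX rules (posixpath is used so it runs the same
everywhere); `strict_join` is a hypothetical name for the
raise-an-exception variant, not an existing function.

```python
import posixpath


def strict_join(base, *segments):
    """Like posixpath.join, but refuse absolute segments (hypothetical)."""
    for segment in segments:
        if posixpath.isabs(segment):
            raise ValueError("absolute segment not allowed: %r" % (segment,))
    return posixpath.join(base, *segments)


# Existing, documented behaviour: an absolute segment discards what came before.
print(posixpath.join("hello", "/world"))

# The stricter variant accepts ordinary segments unchanged...
print(strict_join("hello", "world"))

# ...and turns the surprising case into an error.
try:
    strict_join("hello", "/world")
except ValueError as exc:
    print("rejected:", exc)
```

Nothing in the sketch changes join itself, which fits the position that
the existing behavior should stay as it is.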

>    >>> os.path.join("hello", "slash/world")
>    'hello/slash/world'

That has always been a loophole in the function, and many programs
depend on it.  Again, is it "wrong"?  Should an embedded separator in
an argument be an error?  Obviously this depends on the user's
knowledge that the separator happens to be slash.

>    >>> os.path.join("hello", "slash//world")
>    'hello/slash//world'

Again a case of what "should" it do?  The filesystem treats it as a
single slash.  The user didn't call normpath, so should we normalize
it anyway?
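
The difference between joining and normalizing is easy to see in
isolation.  A small sketch using posixpath so the output does not depend
on the platform:

```python
import posixpath

joined = posixpath.join("hello", "slash//world")
normalized = posixpath.normpath(joined)

print(joined)        # join leaves the doubled separator alone
print(normalized)    # normpath collapses it

# normpath also resolves ".." purely textually, which can change the
# meaning of a path when symbolic links are involved.
print(posixpath.normpath("a/b/../c"))
```

So join is deliberately "dumb" about its inputs, and any cleanup is an
explicit second step the caller has to remember.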

>  * Sometimes a path isn't a path; the zip "paths" in sys.path are a good
> example.  This is why I'm a big fan of including a polymorphic interface of
> some kind: this information is *already* being persisted in an ad-hoc and
> broken way now, so it needs to be represented; it would be good if it were
> actually represented properly.  URL
> manipulation-as-path-manipulation is another; the recent
> perforce use-case mentioned here is a special case of that, I think.

Good point, but exactly what functionality do you want to see for zip
files and URLs?  Just pathname manipulation?  Or the ability to see
whether a file exists and extract it, copy it, etc?

>  * you have to care about unicode sometimes.  rarely enough that none of
> your tests will ever account for it, but often enough that _some_ users will
> notice breakage if your code is ever widely distributed.

This is a Python-wide problem.  The move to universal unicode will
lessen this, or at least move the problem to *one* place (creating the
unicode object), where every Python programmer will get bitten by it
and we'll develop a few standard strategies to deal with it.

(The problem is that if str and unicode are mixed in expressions,
Python will promote the str to unicode and you'll get a
UnicodeDecodeError if it contains non-ASCII characters.  Figuring out
all the ways such strings can slip into a program is difficult if
you're dealing with user strings from an unknown charset, or your
MySQL server is configured differently than you thought it was, or the
string contains Windows curly quotes et al which are undefined in
Latin-1.)
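
The curly-quote case can be reproduced in a few lines.  This sketch
assumes a Python 3 interpreter, where mixing bytes and text raises
TypeError instead of promoting implicitly, so the decoding step is
written out by hand; the sample bytes are made up for illustration.

```python
# Bytes as a Windows-1252 client might send them: curly quotes around a word.
raw = b"\x93quoted\x94"

# Decoding as ASCII is what an implicit promotion amounts to, and it fails.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the charset the data really uses gives the intended text.
text = raw.decode("cp1252")

# Latin-1 never raises, but maps these bytes to C1 control characters.
as_latin1 = raw.decode("latin-1")

print(ascii_failed, ascii(text), ascii(as_latin1))
```

The point carries over to pathnames: the failure only shows up once a
non-ASCII name actually arrives, which is why tests rarely catch it.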

>  * the documentation really can't emphasize enough how bad using
> 'os.path.exists/isfile/isdir', and then assuming the file continues to exist
> when it is a contended resource, is.  It can be handy, but it is _always_ a
> race condition.

What else can you do?  It's either os.path.exists()/os.remove() or "do
it anyway and catch the exception".  And sometimes you have to check
the filetype in order to determine *what* to do.
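
The "do it anyway and catch the exception" option looks like this in
practice.  A minimal sketch; `remove_if_present` is a hypothetical
helper name, and the temporary file exists only to make the example
self-contained.

```python
import errno
import os
import tempfile


def remove_if_present(path):
    """Delete path; return True if removed, False if it was already gone.

    No exists() check is made first, so there is no window in which
    another process can delete the file between the check and the remove.
    """
    try:
        os.remove(path)
    except OSError as exc:
        if exc.errno != errno.ENOENT:
            raise          # permission problems etc. are still errors
        return False
    return True


handle, scratch = tempfile.mkstemp()
os.close(handle)
print(remove_if_present(scratch))   # file existed
print(remove_if_present(scratch))   # file already gone
```

Checking the file type first is still sometimes needed to decide what to
do, but the operation itself has to be prepared to fail either way.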

-- 
Mike Orr <[EMAIL PROTECTED]>
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Path object design

2006-11-01 Thread glyph
On 08:14 pm, [EMAIL PROTECTED] wrote:
>Argh, it's difficult to respond to one topic that's now spiraling into
>two conversations on two lists.

>[EMAIL PROTECTED] wrote:

>(...) people have had to spend five years putting hard-to-read
>os.path functions in the code, or reinventing the wheel with their own
>libraries that they're not sure they can trust.  I started to use
>path.py last year when it looked like it was emerging as the basis of
>a new standard, but yanked it out again when it was clear the API
>would be different by the time it's accepted.  I've gone back to
>os.path for now until something stable emerges but I really wish I
>didn't have to.

You *don't* have to.  This is a weird attitude I've encountered over and
over again in the Python community, although sometimes it masquerades as
resistance to Twisted or Zope or whatever.  It's OK to use libraries.
It's OK even to use libraries that Guido doesn't like!  I'm pretty sure
the first person to tell you that would be Guido himself.  (Well,
second, since I just told you.)  If you like path.py and it solves your
problems, use path.py.  You don't have to cram it into the standard
library to do that.  It won't be any harder to migrate from an old path
object to a new path object than from os.path to a new path object, and
in fact it would likely be considerably easier.

>> *It is already used in a large body of real, working code, and
>> therefore its limitations are known.*
>
>This is an important consideration.  However, to me a clean API is more
>important.

It's not that I don't think a "clean" API is important.  It's that I
think that "clean" is a subjective assessment that is hard to back up,
and it helps to have some data saying "we think this is clean because
there are very few bugs in this 100,000 line program written using it".
Any code that is really easy to use right will tend to have *some*
aesthetic appeal.

>I took a quick look at filepath.  It looks similar in concept to PEP
>355.  Four concerns:
>    - unfamiliar method names (createDirectory vs mkdir, child vs join)

Fair enough, but "child" really means child, not join.  It is explicitly
for joining one additional segment, with no slashes in it.

>    - basename/dirname/parent are methods rather than properties:
>leads to () overproliferation in user code.

The () is there because every invocation returns a _new_ object.  I
think that this is correct behavior but I also would prefer that it
remain explicit.

>    - the "secure" features may not be necessary.  If they are, this
>should be a separate discussion, and perhaps implemented as a
>subclass.

The main "secure" feature is "child" and it is, in my opinion, the best
part about the whole class.  Some of the other stuff (rummaging around
for siblings with extensions, for example) is probably extraneous.
child, however, lets you take a string from arbitrary user input and map
it into a path segment, both securely and quietly.  Here's a good
example (and this actually happened, this is how I know about that crazy
windows 'special files' thing I wrote in my other recent message): you
have a decision-making program that makes two files to store information
about a process: "pro" and "con".  It turns out that "con" is shorthand
for "fall in a well and die" in win32-ese.  A "secure" path manipulation
library would alert you to this problem with a traceback rather than
having it inexplicably freeze.  Obscure, sure, but less obscure would be
getting deterministic errors from a user entering slashes into a text
field that shouldn't accept them.

>    - stylistic objection to verbose camelCase names like createDirectory

There is no accounting for taste, I suppose.  Obviously if it violates
the stdlib's naming conventions it would have to be adjusted.

>> Path representation is a bike shed.  Nobody would have proposed
>> writing an entirely new embedded database engine for Python: python
>> 2.5 simply included SQLite because its utility was already proven.
>
>There's a quantum level of difference between path/file manipulation
>-- which has long been considered a requirement for any full-featured
>programming language -- and a database engine which is much more
>complex.

"quantum" means "the smallest possible amount", although I don't think
you're using it like that, so I think I agree with you.  No, it's not as
hard as writing a database engine.  Nevertheless it is a non-trivial
problem, one worthy of having its own library and clearly capable of
generating a fair amount of its own discussion.

>Fredrik has convinced me that it's more urgent to OOize the pathname
>conversions than the filesystem operations.

I agree in the relative values.  I am still unconvinced that either is
"urgent" in the sense that it needs to be in the standard library.

>Where have all the proponents of non-OO or limited-OO strategies been?

This continuum doesn't make any sense to me.  Where would you place
Twisted's solution on it?

Re: [Python-Dev] Path object design

2006-11-01 Thread glyph
On 06:14 pm, [EMAIL PROTECTED] wrote:
>[EMAIL PROTECTED] wrote:
>
>> I assert that it needs a better[1] interface because the current
>> interface can lead to a variety of bugs through idiomatic, apparently
>> correct usage.  All the more because many of those bugs are related to
>> critical errors such as security and data integrity.

>instead of referring to some esoteric knowledge about file systems that
>us non-twisted-using mere mortals may not be evolved enough to
>understand,

On the contrary, twisted users understand even less, because (A) we've
been demonstrated to get it wrong on numerous occasions in highly public
and embarrassing ways and (B) we already have this class that does it
all for us and we can't remember how it works :-).

>maybe you could just make a list of common bugs that may arise
>due to idiomatic use of the existing primitives?

Here are some common gotchas that I can think of off the top of my head.
Not all of these are resolved by Twisted's path class:

Path manipulation:

 * This is confusing as heck:

   >>> os.path.join("hello", "/world")
   '/world'
   >>> os.path.join("hello", "slash/world")
   'hello/slash/world'
   >>> os.path.join("hello", "slash//world")
   'hello/slash//world'

   Trying to formulate a general rule for what the arguments to
   os.path.join are supposed to be is really hard.  I can't really
   figure out what it would be like on a non-POSIX/non-win32 platform.

 * it seems like slashes should be more aggressively converted to
   backslashes on windows, because it's near impossible to do anything
   with os.sep in the current situation.

 * "C:blah" does not mean what you think it means on Windows.
   Regardless of what you think it means, it is not that.  I thought I
   understood it once as the current process having a current directory
   on every mapped drive, but then I had to learn about UNC paths of
   network mapped drives and it stopped making sense again.

 * There are special files on windows such as "CON" and "NUL" which
   exist in _every_ directory.  Twisted does get around this, by looking
   at the result of abspath:

   >>> os.path.abspath("c:/foo/bar/nul")
   'nul'

 * Sometimes a path isn't a path; the zip "paths" in sys.path are a good
   example.  This is why I'm a big fan of including a polymorphic
   interface of some kind: this information is *already* being persisted
   in an ad-hoc and broken way now, so it needs to be represented; it
   would be good if it were actually represented properly.  URL
   manipulation-as-path-manipulation is another; the recent perforce
   use-case mentioned here is a special case of that, I think.

 * paths can have spaces in them and there's no convenient, correct way
   to quote them if you want to pass them to some gross function like
   os.system - and a lot of the code that manipulates paths is
   shell-script-replacement crud which wants to call gross functions
   like os.system.  Maybe this isn't really the path manipulation code's
   fault, but it's where people start looking when they want properly
   quoted path arguments.

 * you have to care about unicode sometimes.  rarely enough that none of
   your tests will ever account for it, but often enough that _some_
   users will notice breakage if your code is ever widely distributed.
   this is an even more obscure example, but pygtk always reports
   pathnames in utf8-encoded *byte* strings, regardless of your
   filesystem encoding.  If you forget to decode/encode it, hilarity
   ensues.  There's no consistent error reporting (as far as I can tell,
   I have encountered this rarely) and no real way to detect this until
   you have an actual insanely-configured system with an insanely-named
   file on it to test with.  (Polymorphic interfaces might help a *bit*
   here.  At worst, they would at least make it possible to develop a
   canonical "insanely encoded filesystem" test-case backend.  At best,
   you'd absolutely have to work in terms of unicode all the time, and
   no implicit encoding issues would leak through to application code.)
   Twisted's thing doesn't deal with this at all, and it really should.

 * also *sort* of an encoding issue, although basically only for
   webservers or other network-accessible paths: thanks to some of these
   earlier issues as well as %2e%2e, there are effectively multiple ways
   to spell "..".  Checking for all of them is impossible, you need to
   use the os.path APIs to determine if the paths you've got really
   relate in the ways you think they do.

 * os.pathsep can be, and actually sometimes is, embedded in a path.
   (again, more of a general path problem, not really python's fault)

 * relative path manipulation is difficult.  ever tried to write the
   function to iterate two separate trees of files in parallel?  shutil
   re-implements this twice completely differently via recursion, and
   it's harder to do with a generator (which is what you really want).
   you can't really split on os.sep and have it be correct due to the
   aforementioned windows-path issue, but that's what everybody does
   anyway.

 * os.path.split doesn't work anything like str.split.

FS ma

Re: [Python-Dev] Path object design

2006-11-01 Thread Mike Orr
Argh, it's difficult to respond to one topic that's now spiraling into
two conversations on two lists.

[EMAIL PROTECTED] wrote:
> On 03:14 am, [EMAIL PROTECTED] wrote:
>
> >One thing is sure -- we urgently need something better than os.path.
> >It functions well but it makes hard-to-read and unpythonic code.
>
> I'm not so sure.  The need is not any more "urgent" today than it was
> 5 years ago, when os.path was equally "unpythonic" and unreadable.
> The problem is real but there is absolutely no reason to hurry to a
> premature solution.

Except that people have had to spend five years putting hard-to-read
os.path functions in the code, or reinventing the wheel with their own
libraries that they're not sure they can trust.  I started to use
path.py last year when it looked like it was emerging as the basis of
a new standard, but yanked it out again when it was clear the API
would be different by the time it's accepted.  I've gone back to
os.path for now until something stable emerges but I really wish I
didn't have to.

> I've already recommended Twisted's twisted.python.filepath module as a
> possible basis for the implementation of this feature

> *It is already used in a large body of real, working code, and
> therefore its limitations are known.*

This is an important consideration.  However, to me a clean API is more
important.  Since we haven't agreed on an API there is no widely-used
module that implements it... it's a chicken-and-egg problem since it
takes significant time to write and test an implementation.  So I'd
like to start from the standpoint of an ideal API rather than just
taking the API of the most widely-used implementation.  os.path is
clearly the most widely-used implementation, but that doesn't mean
that OOizing it as-is would be my favorite choice.

I took a quick look at filepath.  It looks similar in concept to PEP
355.  Four concerns:
    - unfamiliar method names (createDirectory vs mkdir, child vs join)
    - basename/dirname/parent are methods rather than properties:
      leads to () overproliferation in user code.
    - the "secure" features may not be necessary.  If they are, this
      should be a separate discussion, and perhaps implemented as a
      subclass.
    - stylistic objection to verbose camelCase names like createDirectory


> Proposals for extending the language are contentious and it is very
> difficult to do experimentation with non-trivial projects because
> nobody wants to do that and then end up with a bunch of code written
> in a language that is no longer supported when the experiment fails.

True.

> Path representation is a bike shed.  Nobody would have proposed
> writing an entirely new embedded database engine for Python: python
> 2.5 simply included SQLite because its utility was already proven.

There's a quantum level of difference between path/file manipulation
-- which has long been considered a requirement for any full-featured
programming language -- and a database engine which is much more
complex.

Georg Brandl <[EMAIL PROTECTED]> wrote:
> I have been a supporter of the full-blown Path object in the past, but the
> recent discussions have convinced me that it is just too big and too
> confusing, and that you can't kill too many birds with one stone in this
> respect.  Most of the ugliness really lies in the path name manipulation
> functions, which nicely map to methods on a path name object.

Fredrik has convinced me that it's more urgent to OOize the pathname
conversions than the filesystem operations.  Pathname conversions are
the ones that frequently get nested or chained, whereas filesystem
operations are usually done at the top level of a program statement,
or return a different "kind" of value (stat, true/false, etc).
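
The nesting problem, and what a names-only wrapper would buy, can be
shown in a few lines.  This is a sketch under assumptions, not any of
the proposed APIs: `PathName`, `parent` and `joinpath` are hypothetical
names, the class subclasses str (standing in for unicode), and it never
touches the filesystem.

```python
import posixpath


class PathName(str):
    """Hypothetical names-only wrapper: pure string algebra, no I/O."""

    @property
    def parent(self):
        return PathName(posixpath.dirname(self))

    @property
    def name(self):
        return posixpath.basename(self)

    def joinpath(self, *segments):
        return PathName(posixpath.join(self, *segments))


source = "/home/user/project/src/module.py"

# Function style: conversions nest inside-out.
nested = posixpath.join(
    posixpath.dirname(posixpath.dirname(source)), "docs", "index.txt"
)

# Wrapper style: the same conversions chain left to right.
chained = PathName(source).parent.parent.joinpath("docs", "index.txt")

print(nested)
print(chained)
```

Because the wrapper is still a string, it can be passed unchanged to
open() or to the existing os functions, which is what makes this
approach incremental.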

However, it's interesting that all the proposals I've seen in the past
three years have been a "monolithic" OO class.  Clearly there are a
lot of people who prefer this way, or at least have never heard of
anything different.  Where have all the proponents of non-OO or
limited-OO strategies been?  The first proposal of that sort I've seen
was Nick Coghlan's October 1.  Have y'all just been ignoring the
monolithic OO efforts without offering any alternatives?


Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> > This is fully backwards compatible, can go right into 2.6 without
> > breaking anything, allows people to update their code as they go,
> > and can be incrementally improved in future releases:
> >
> >  1) Add a pathname wrapper to "os.path", which lets you do basic
> > path "algebra".  This should probably be a subclass of unicode,
> > and should *only* contain operations on names.
> >
> >  2) Make selected "shutil" operations available via the "os" name-
> > space; the old POSIX API vs. POSIX SHELL distinction is pretty
> > irrelevant.  Also make the os.path predicates available via the
> > "os" namespace.
> >
> > This gives a very simple conceptual model for the user; to manipu

[Python-Dev] Path object design

2006-11-01 Thread Jim Jewett
On 10:06 am, g.brandl at gmx.net wrote:
>> What a successor to os.path needs is not security, it's a better
>> (more pythonic, if you like) interface to the old functionality.

Glyph:

> Why?

> Rushing ... could exacerbate a very real problem, e.g.
> the security and data-integrity implications of idiomatic usage.

The proposed Path object (or new path module) is intended to replace
os.path.  If it can't do the equivalent of "cd ..", then it isn't a
replacement; it is just another similar alternative to confuse
beginners.

If you're saying that a webserver should use a more restricted
subclass (or even the existing FilePath alternative), then I agree.
I'll even agree that a restricted version would ideally be available
out of the box.  I don't think it should be the only option.

-jJ


Re: [Python-Dev] Path object design

2006-11-01 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

> I assert that it needs a better[1] interface because the current 
> interface can lead to a variety of bugs through idiomatic, apparently 
> correct usage.  All the more because many of those bugs are related to 
> critical errors such as security and data integrity.

instead of referring to some esoteric knowledge about file systems that
us non-twisted-using mere mortals may not be evolved enough to
understand, maybe you could just make a list of common bugs that may
arise due to idiomatic use of the existing primitives?

I promise to make a nice FAQ entry out of it, with proper attribution.





Re: [Python-Dev] Path object design

2006-11-01 Thread Georg Brandl
[EMAIL PROTECTED] wrote:
> On 10:06 am, [EMAIL PROTECTED] wrote:
>  >What a successor to os.path needs is not security, it's a better
>  >(more pythonic, if you like) interface to the old functionality.
> 
> Why?
> 
> I assert that it needs a better[1] interface because the current 
> interface can lead to a variety of bugs through idiomatic, apparently 
> correct usage.  All the more because many of those bugs are related to 
> critical errors such as security and data integrity.

AFAICS, people just want an interface that is easier to use and feels more...
err... (trying to avoid the p-word). I've never seen security arguments
being made in this discussion.

> If I felt the current interface did a good job at doing the right thing 
> in the right situation, but was cumbersome to use, I would strenuously 
> object to _any_ work taking place to change it.  This is a hard API to 
> get right.

Well, it's hard to change any running system with that attitude. It doesn't
have to be changed if nobody comes up with something that's agreed (*) to
be better.

(*) agreed in the c.l.py sense, of course

Georg



Re: [Python-Dev] Path object design

2006-11-01 Thread glyph
On 10:06 am, [EMAIL PROTECTED] wrote:
>What a successor to os.path needs is not security, it's a better (more pythonic,
>if you like) interface to the old functionality.

Why?

I assert that it needs a better[1] interface because the current
interface can lead to a variety of bugs through idiomatic, apparently
correct usage.  All the more because many of those bugs are related to
critical errors such as security and data integrity.

If I felt the current interface did a good job at doing the right thing
in the right situation, but was cumbersome to use, I would strenuously
object to _any_ work taking place to change it.  This is a hard API to
get right.

[1]: I am rather explicitly avoiding the word "pythonic" here.  It seems
to have grown into a shibboleth (and its counterpart, "unpythonic", into
an expletive).  I have the impression it used to mean something a bit
more specific, maybe adherence to Tim Peters' "Zen" (although that was
certainly vague enough by itself and not always as self-evidently true
as some seem to believe).  More and more, now, though, I hear it used to
mean 'stuff should be more betterer!' and then everyone nods sagely
because we know that no filthy *java* programmer wants things to be more
betterer; *we* know *they* want everything to be horrible.  Words like
this are a pet peeve of mine though, so perhaps I am overstating the
case.  Anyway, moving on... as long as I brought up the Zen, perhaps a
particular couplet is appropriate here:

  Now is better than never.
  Although never is often better than *right* now.

Rushing to a solution to a non-problem, e.g. the "pythonicness" of the
interface, could exacerbate a very real problem, e.g. the security and
data-integrity implications of idiomatic usage.  Granted, it would be
hard to do worse than os.path, but it is by no means impossible (just
look at any C program!), and I can think of a couple of kinds of API
which would initially appear more convenient but actually prove more
problematic over time.

That brings me back to my original point: the underlying issue here is
too important a problem to get wrong *again* on the basis of a
superficial "need" for an API that is "better" in some unspecified way.
os.path is at least possible to get right if you know what you're doing,
which is no mean feat; there are many path-manipulation libraries in
many languages which cannot make that claim (especially portably).  Its
replacement might not be.  Getting this wrong outside the standard
library might create problems for some people, but making it worse _in_
the standard library could create a total disaster for everyone.

I do believe that this wouldn't get past the dev team (least of all the
release manager) but it would waste a lot less of everyone's time if we
focused the inevitable continuing bike-shed discussion along the lines
of discussing the known merits of widely deployed alternative path
libraries, or at least an approach to *get* that data on some new code
if there is consensus that existing alternatives are in some way
inadequate.

If for some reason it _is_ deemed necessary to go with an untried
approach, I can appreciate the benefits that /F has proposed of trying
to base the new interface entirely and explicitly off the old one.  At
least that way it will still definitely be possible to get right.  There
are problems with that too, but they are less severe.


Re: [Python-Dev] Path object design

2006-11-01 Thread Jean-Paul Calderone
On Wed, 01 Nov 2006 11:06:14 +0100, Georg Brandl <[EMAIL PROTECTED]> wrote:
>[EMAIL PROTECTED] wrote:
>> On 03:14 am, [EMAIL PROTECTED] wrote:
>>
>>  >One thing is sure -- we urgently need something better than os.path.
>>  >It functions well but it makes hard-to-read and unpythonic code.
>>
>> I'm not so sure.  The need is not any more "urgent" today than it was 5
>> years ago, when os.path was equally "unpythonic" and unreadable.  The
>> problem is real but there is absolutely no reason to hurry to a
>> premature solution.
>>
>> I've already recommended Twisted's twisted.python.filepath module as a
>> possible basis for the implementation of this feature.  I'm sorry I
>> don't have the time to pursue that.  I'm also sad that nobody else seems
>> to have noticed.  Twisted's implementation has an advantage that it
>> doesn't seem that these new proposals do, an advantage I would really
>> like to see in whatever gets seriously considered for adoption:
>
>Looking at
>,
>it seems as if FilePath was made to serve a different purpose than what we're
>trying to discuss here:
>
>"""
>I am a path on the filesystem that only permits 'downwards' access.
>
>Instantiate me with a pathname (for example,
>FilePath('/home/myuser/public_html')) and I will attempt to only provide access
>to files which reside inside that path. [...]
>
>The correct way to use me is to instantiate me, and then do ALL filesystem
>access through me.
>"""
>
>What a successor to os.path needs is not security, it's a better (more pythonic,
>if you like) interface to the old functionality.

No.  You've misunderstood the code you looked at.  FilePath serves exactly
the purpose being discussed here.  Take a closer look.

Jean-Paul


Re: [Python-Dev] Path object design

2006-11-01 Thread Fredrik Lundh
Jonathan Lange wrote:

> Then let us discuss that.

Glyph's references to bike sheds went right over your head, right?





Re: [Python-Dev] Path object design

2006-11-01 Thread Jonathan Lange
On 11/1/06, Georg Brandl <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > On 03:14 am, [EMAIL PROTECTED] wrote:
> >
> >  >One thing is sure -- we urgently need something better than os.path.
> >  >It functions well but it makes hard-to-read and unpythonic code.
> >
> > I'm not so sure.  The need is not any more "urgent" today than it was 5
> > years ago, when os.path was equally "unpythonic" and unreadable.  The
> > problem is real but there is absolutely no reason to hurry to a
> > premature solution.
> >
> > I've already recommended Twisted's twisted.python.filepath module as a
> > possible basis for the implementation of this feature.  I'm sorry I
> > don't have the time to pursue that.  I'm also sad that nobody else seems
> > to have noticed.  Twisted's implementation has an advantage that it
> > doesn't seem that these new proposals do, an advantage I would really
> > like to see in whatever gets seriously considered for adoption:
>
> Looking at
> ,
> it seems as if FilePath was made to serve a different purpose than what we're
> trying to discuss here:
>
> """
> I am a path on the filesystem that only permits 'downwards' access.
>
> Instantiate me with a pathname (for example,
> FilePath('/home/myuser/public_html')) and I will attempt to only provide access
> to files which reside inside that path. [...]
>
> The correct way to use me is to instantiate me, and then do ALL filesystem
> access through me.
> """
>
> What a successor to os.path needs is not security, it's a better (more pythonic,
> if you like) interface to the old functionality.
>

Then let us discuss that. Is FilePath actually a better interface to
the old functionality? Even if it was designed to solve a security
problem, it might prove to be an extremely useful general interface.

jml


Re: [Python-Dev] Path object design

2006-11-01 Thread Georg Brandl
[EMAIL PROTECTED] wrote:
> On 03:14 am, [EMAIL PROTECTED] wrote:
> 
>  >One thing is sure -- we urgently need something better than os.path.
>  >It functions well but it makes hard-to-read and unpythonic code.
> 
> I'm not so sure.  The need is not any more "urgent" today than it was 5 
> years ago, when os.path was equally "unpythonic" and unreadable.  The 
> problem is real but there is absolutely no reason to hurry to a 
> premature solution.
> 
> I've already recommended Twisted's twisted.python.filepath module as a 
> possible basis for the implementation of this feature.  I'm sorry I 
> don't have the time to pursue that.  I'm also sad that nobody else seems 
> to have noticed.  Twisted's implementation has an advantage that it 
> doesn't seem that these new proposals do, an advantage I would really 
> like to see in whatever gets seriously considered for adoption:

Looking at 
,
it seems as if FilePath was made to serve a different purpose than what we're
trying to discuss here:

"""
I am a path on the filesystem that only permits 'downwards' access.

Instantiate me with a pathname (for example, 
FilePath('/home/myuser/public_html')) and I will attempt to only provide access 
to files which reside inside that path. [...]

The correct way to use me is to instantiate me, and then do ALL filesystem 
access through me.
"""

What a successor to os.path needs is not security, it's a better (more pythonic,
if you like) interface to the old functionality.
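
For readers following along, here is a minimal sketch of what "downwards only" access means when built on os.path alone. It is not Twisted's code: the names child and InsecurePathError are hypothetical, and the check assumes that comparing absolute paths is enough (it does not resolve symlinks).

```python
import os


class InsecurePathError(ValueError):
    """Raised when a requested child would escape the base directory."""


def child(base, name):
    """Join name onto base, refusing any result that leaves base."""
    base = os.path.abspath(base)
    candidate = os.path.abspath(os.path.join(base, name))
    # os.path.join silently discards base when name is absolute, and ".."
    # segments climb out of it, so verify the result is still under base.
    if os.path.commonpath([base, candidate]) != base:
        raise InsecurePathError(name)
    return candidate
```

The point of the sketch is that the plain os.path.join(base, name) idiom looks correct but accepts both "../secret" and an absolute name, which is the kind of bug an interface restricted to children is meant to rule out.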

Georg



Re: [Python-Dev] Path object design

2006-11-01 Thread Georg Brandl
Fredrik Lundh wrote:
> Talin wrote:
> 
>> I'm right in the middle of typing up a largish post to go on the 
>> Python-3000 mailing list about this issue. Maybe we should move it over 
>> there, since it's likely that any path reform will have to be targeted at 
>> Py3K...?
> 
> I'd say that any proposal that cannot be fit into the current 2.X design 
> is simply too disruptive to go into 3.0.  So here's my proposal for 2.6 
> (reposted from the 3K list).
> 
> This is fully backwards compatible, can go right into 2.6 without 
> breaking anything, allows people to update their code as they go,
> and can be incrementally improved in future releases:
> 
>  1) Add a pathname wrapper to "os.path", which lets you do basic
> path "algebra".  This should probably be a subclass of unicode,
> and should *only* contain operations on names.
> 
>  2) Make selected "shutil" operations available via the "os" name-
> space; the old POSIX API vs. POSIX SHELL distinction is pretty
> irrelevant.  Also make the os.path predicates available via the
> "os" namespace.
> 
> This gives a very simple conceptual model for the user; to manipulate
> path *names*, use "os.path.<op>(string)" functions or the "<path>"
> wrapper.  To manipulate *objects* identified by a path, given either as
> a string or a path wrapper, use "os.<op>(path)".  This can be taught in
> less than a minute.

+1. This is really straightforward and easy to learn.

I have been a supporter of the full-blown Path object in the past, but the
recent discussions have convinced me that it is just too big and too confusing,
and that you can't kill too many birds with one stone in this respect.
Most of the ugliness really lies in the path name manipulation functions, which
nicely map to methods on a path name object.

Georg



Re: [Python-Dev] Path object design

2006-11-01 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

> I am not addressing this message to the py3k list because its general 
> message of extreme conservatism on new features is more applicable to 
> python-dev.  However, py3k designers might also take note: if py3k is 
> going to do something in this area and drop support for the "legacy" 
> os.path, it would be good to choose something that is known to work and 
> have few gotchas, rather than just choosing the devil we don't know over 
> the devil we do.  The weaknesses of os.path are at least well-understood.

that's another reason why a new design might as well be defined in
terms of the old design -- especially if the main goal is call-site 
convenience, rather than fancy new algorithms.





Re: [Python-Dev] Path object design

2006-11-01 Thread glyph
On 03:14 am, [EMAIL PROTECTED] wrote:

>One thing is sure -- we urgently need something better than os.path.
>It functions well but it makes hard-to-read and unpythonic code.

I'm not so sure.  The need is not any more "urgent" today than it was 5
years ago, when os.path was equally "unpythonic" and unreadable.  The
problem is real but there is absolutely no reason to hurry to a premature
solution.

I've already recommended Twisted's twisted.python.filepath module as a
possible basis for the implementation of this feature.  I'm sorry I don't
have the time to pursue that.  I'm also sad that nobody else seems to have
noticed.  Twisted's implementation has an advantage that it doesn't seem
that these new proposals do, an advantage I would really like to see in
whatever gets seriously considered for adoption:

*It is already used in a large body of real, working code, and therefore
its limitations are known.*

If I'm wrong about this, and I can't claim to really know about the relative
levels of usage of all of these various projects when they're not mentioned,
please cite actual experiences using them vs. using os.path.

Proposals for extending the language are contentious and it is very
difficult to do experimentation with non-trivial projects because nobody
wants to do that and then end up with a bunch of code written in a language
that is no longer supported when the experiment fails.  I understand,
therefore, that language-change proposals are going to be very contentious
no matter what.

However, there is no reason that library changes need to follow this same
path.  It is perfectly feasible to write a library, develop some substantial
applications with it, tweak it based on that experience, and *THEN* propose
it for inclusion in the standard library.  Users of the library can happily
continue using the library, whether it is accepted or not, and users of the
language and standard library get a new feature for free.  For example, I
plan to continue using FilePath regardless of the outcome of this
discussion, although perhaps some conversion methods or adapters will be in
order if a new path object makes it into the standard library.

I specifically say "library" and not "recipe".  This is not a useful
exercise if every user of the library has a subtly incompatible and manually
tweaked version for their particular application.

Path representation is a bike shed.  Nobody would have proposed writing an
entirely new embedded database engine for Python: Python 2.5 simply included
SQLite because its utility was already proven.

I also believe it is important to get this issue right.  It might be a bike
shed, but it's a *very important* bike shed.  Google for "web server url
filesystem path vulnerability" and you'll see what I mean.  Getting it wrong
(or passing strings around everywhere) means potential security gotchas
lurking around every corner.  Even Twisted, with no C code at all, got its
only known arbitrary-code-execution vulnerability from a path manipulation
bug.  That was even after we'd switched to an OO path-manipulation layer
specifically to avoid bugs like this!

I am not addressing this message to the py3k list because its general
message of extreme conservatism on new features is more applicable to
python-dev.  However, py3k designers might also take note: if py3k is going
to do something in this area and drop support for the "legacy" os.path, it
would be good to choose something that is known to work and have few
gotchas, rather than just choosing the devil we don't know over the devil we
do.  The weaknesses of os.path are at least well-understood.


Re: [Python-Dev] Path object design

2006-10-31 Thread Fredrik Lundh
Talin wrote:

> I'm right in the middle of typing up a largish post to go on the 
> Python-3000 mailing list about this issue. Maybe we should move it over 
> there, since it's likely that any path reform will have to be targeted at 
> Py3K...?

I'd say that any proposal that cannot be fit into the current 2.X design 
is simply too disruptive to go into 3.0.  So here's my proposal for 2.6 
(reposted from the 3K list).

This is fully backwards compatible, can go right into 2.6 without 
breaking anything, allows people to update their code as they go,
and can be incrementally improved in future releases:

 1) Add a pathname wrapper to "os.path", which lets you do basic
path "algebra".  This should probably be a subclass of unicode,
and should *only* contain operations on names.

 2) Make selected "shutil" operations available via the "os" name-
space; the old POSIX API vs. POSIX SHELL distinction is pretty
irrelevant.  Also make the os.path predicates available via the
"os" namespace.

This gives a very simple conceptual model for the user; to manipulate
path *names*, use "os.path.<op>(string)" functions or the "<path>"
wrapper.  To manipulate *objects* identified by a path, given either as
a string or a path wrapper, use "os.<op>(path)".  This can be taught in
less than a minute.

With this in place in 2.6 and 2.7, all that needs to be done for 3.0 is 
to remove (some of) the old cruft.
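
A rough sketch of the name-only wrapper from point 1, assuming Python 3 (where str plays the role of unicode). The class name PathName and its methods are hypothetical illustrations, not part of any proposal text; every method delegates to os.path and none of them touches the filesystem.

```python
import os.path


class PathName(str):
    """Hypothetical name-only wrapper: pure path "algebra", no I/O."""

    def joinpath(self, *parts):
        # Named joinpath rather than join so that str.join is not shadowed.
        return PathName(os.path.join(self, *parts))

    @property
    def parent(self):
        return PathName(os.path.dirname(self))

    @property
    def name(self):
        return os.path.basename(self)

    def splitext(self):
        root, ext = os.path.splitext(self)
        return PathName(root), ext
```

Because the wrapper is still a string, functions that act on the object a path identifies, such as os.stat or os.remove, accept it unchanged, which is the split between names and objects described above.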





Re: [Python-Dev] Path object design

2006-10-31 Thread Talin
I'm right in the middle of typing up a largish post to go on the 
Python-3000 mailing list about this issue. Maybe we should move it over 
there, since it's likely that any path reform will have to be targeted at 
Py3K...?

Mike Orr wrote:
> I just saw the Path object thread ("PEP 355 status", Sept-Oct), saying
> that the first object-oriented proposal was rejected.  I'm in favor of
> the "directory tuple" approach which wasn't mentioned in the thread.
> This was proposed by Noam Raphael several months ago: a Path object
> that's a sequence of components (a la os.path.split) rather than a
> string.  The beauty of this approach is that slicing and joining are
> expressed naturally using the [] and + operators, eliminating several
> methods.
> 
> Introduction:  http://wiki.python.org/moin/AlternativePathClass
> Feature discussion:  http://wiki.python.org/moin/AlternativePathDiscussion
> Reference implementation:  http://wiki.python.org/moin/AlternativePathModule
> 
> (There's a link to the introduction at the end of PEP 355.)  Right now
> I'm working on a test suite, then I want to add the features marked
> "Mike" in the discussion -- in a way that people can compare the
> feature alternatives in real code -- and write a PEP.  But it's a big
> job for one person, and there are unresolved issues on the discussion
> page, not to mention things brought up in the "PEP 355 status" thread.
>  We had three people working on the discussion page but development
> seems to have ground to a halt.
> 
> One thing is sure -- we urgently need something better than os.path.
> It functions well but it makes hard-to-read and unpythonic code.  For
> instance, I have an application that has to add its libraries to the
> Python path, relative to the executable's location.
> 
> /toplevel
>     app1/
>         bin/
>             main_program.py
>             utility1.py
>             init_app.py
>         lib/
>             app_module.py
>     shared/
>         lib/
>             shared_module.py
> 
> The solution I've found is an init_app module in every application
> that sets up the paths.  Conceptually it needs "../lib" and
> "../../shared/lib", but I want the absolute paths without hardcoding
> them, in a platform-neutral way.  With os.path, "../lib" is:
> 
>     os.path.join(os.path.dirname(os.path.dirname(__file__)), "lib")
> 
> YUK!  Compare to PEP 355:
> 
>     Path(__file__).parent.parent.join("lib")
> 
> Much easier to read and debug.  Under Noam's proposal it would be:
> 
>     Path(__file__)[:-2] + "lib"
> 
> I'd also like to see the methods more intelligent: don't raise an
> error if an operation is already done (e.g., a directory exists or a
> file is already removed).  There's no reason to clutter one's code
> with extra if's when the methods can easily encapsulate this. This was
> considered a too radical departure from os.path for some, but I have
> in mind even more radical convenience methods which I'd put in a
> third-party subclass if they're not accepted into the standard
> library, the way 'datetime' has third-party subclasses.
> 
> In my application I started using Orendorff's path module, expecting
> the standard path object would be close to it.  When PEP 355 started
> getting more changes and the directory-based alternative took off, I
> took path.py out and rewrote my code for os.path until an alternative
> becomes more stable. Now it looks like it will be several months and
> possibly several third-party packages until one makes it into the
> standard library. This is unfortunate.  Not only does it mean ugly
> code in applications, but it means packages can't accept or return
> Path objects and expect them to be compatible with other packages.
> 
> The reasons PEP 355 was rejected also sound strange.  Nick Coghlan
> wrote (Oct 1):
> 
>> Things the PEP 355 path object lumps together:
>>   - string manipulation operations
>>   - abstract path manipulation operations (work for non-existent filesystems)
>>   - read-only traversal of a concrete filesystem (dir, stat, glob, etc)
>>   - addition & removal of files/directories/links within a concrete filesystem
> 
>> Dumping all of these into a single class is certainly practical from a utility
>> point of view, but it's about as far away from beautiful as you can get, which
>> creates problems from a learnability point of view, and from a
>> capability-based security point of view.
> 
> What about the convenience of the users and the beauty of users' code?
>  That's what matters to me.  And I consider one class *easier* to
> learn.  I'm tired of memorizing that 'split' is in os.path while
> 'remove' and 'stat' are in os.  This seems arbitrary: you're statting
> a path, aren't you?  Also, if you have four classes (abstract path,
> file, directory, symlink), *each* of those will have 3+
> platform-specific versions.  Then if you want to make an enhancement
> subclass you'll have to make 12 of them, one for each of the 3*4
> combinations of superclass

[Python-Dev] Path object design

2006-10-31 Thread Mike Orr
I just saw the Path object thread ("PEP 355 status", Sept-Oct), saying
that the first object-oriented proposal was rejected.  I'm in favor of
the "directory tuple" approach which wasn't mentioned in the thread.
This was proposed by Noam Raphael several months ago: a Path object
that's a sequence of components (a la os.path.split) rather than a
string.  The beauty of this approach is that slicing and joining are
expressed naturally using the [] and + operators, eliminating several
methods.
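
To make the idea concrete, here is a small sketch of a path held as a sequence of components. It assumes Python 3 and is not the wiki reference implementation; TuplePath and from_string are hypothetical names chosen for illustration.

```python
import os


class TuplePath(tuple):
    """Hypothetical path-as-sequence: one element per path component."""

    @classmethod
    def from_string(cls, text):
        # Keep a leading separator as its own component so that absolute
        # paths survive the round trip back to a string.
        parts = [part for part in text.split(os.sep) if part]
        if text.startswith(os.sep):
            parts.insert(0, os.sep)
        return cls(parts)

    def __getitem__(self, index):
        result = tuple.__getitem__(self, index)
        # Slices stay paths; a single index yields a plain string component.
        return type(self)(result) if isinstance(index, slice) else result

    def __add__(self, other):
        if isinstance(other, str):
            other = (other,)
        return type(self)(tuple(self) + tuple(other))

    def __str__(self):
        return os.path.join(*self) if self else ""
```

With this, dropping the last two components and appending a new one is just p[:-2] + "lib", with no dedicated parent or join methods.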

Introduction:  http://wiki.python.org/moin/AlternativePathClass
Feature discussion:  http://wiki.python.org/moin/AlternativePathDiscussion
Reference implementation:  http://wiki.python.org/moin/AlternativePathModule

(There's a link to the introduction at the end of PEP 355.)  Right now
I'm working on a test suite, then I want to add the features marked
"Mike" in the discussion -- in a way that people can compare the
feature alternatives in real code -- and write a PEP.  But it's a big
job for one person, and there are unresolved issues on the discussion
page, not to mention things brought up in the "PEP 355 status" thread.
 We had three people working on the discussion page but development
seems to have ground to a halt.

One thing is sure -- we urgently need something better than os.path.
It functions well but it makes hard-to-read and unpythonic code.  For
instance, I have an application that has to add its libraries to the
Python path, relative to the executable's location.

/toplevel
    app1/
        bin/
            main_program.py
            utility1.py
            init_app.py
        lib/
            app_module.py
    shared/
        lib/
            shared_module.py

The solution I've found is an init_app module in every application
that sets up the paths.  Conceptually it needs "../lib" and
"../../shared/lib", but I want the absolute paths without hardcoding
them, in a platform-neutral way.  With os.path, "../lib" is:

    os.path.join(os.path.dirname(os.path.dirname(__file__)), "lib")

YUK!  Compare to PEP 355:

    Path(__file__).parent.parent.join("lib")

Much easier to read and debug.  Under Noam's proposal it would be:

    Path(__file__)[:-2] + "lib"
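
For comparison, the whole init_app idea can be sketched with os.path alone. This is only an illustration under the directory layout shown above; app_lib_dirs and init_paths are hypothetical helper names, and the script path would normally be __file__.

```python
import os
import sys


def app_lib_dirs(script_path):
    """Return absolute paths for "../lib" and "../../shared/lib"."""
    bin_dir = os.path.dirname(os.path.abspath(script_path))  # .../app1/bin
    app_dir = os.path.dirname(bin_dir)                        # .../app1
    top_dir = os.path.dirname(app_dir)                        # .../toplevel
    return [
        os.path.join(app_dir, "lib"),
        os.path.join(top_dir, "shared", "lib"),
    ]


def init_paths(script_path):
    """Put the library directories at the front of sys.path, once each."""
    for directory in app_lib_dirs(script_path):
        if directory not in sys.path:
            sys.path.insert(0, directory)
```

Naming each intermediate directory keeps the nesting readable, but it also shows how much bookkeeping the one-line Path expressions would absorb.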

I'd also like to see the methods more intelligent: don't raise an
error if an operation is already done (e.g., a directory exists or a
file is already removed).  There's no reason to clutter one's code
with extra if's when the methods can easily encapsulate this. This was
considered a too radical departure from os.path for some, but I have
in mind even more radical convenience methods which I'd put in a
third-party subclass if they're not accepted into the standard
library, the way 'datetime' has third-party subclasses.
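
As a sketch of what such forgiving operations could look like today, written as plain functions over os rather than as methods of any proposed class (ensure_dir and remove_if_present are hypothetical names):

```python
import errno
import os


def ensure_dir(path):
    """Create a directory, treating "already exists" as success."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Re-raise unless a directory is already there.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def remove_if_present(path):
    """Remove a file, treating "already gone" as success."""
    try:
        os.remove(path)
    except OSError as exc:
        if exc.errno != errno.ENOENT:
            raise
```

Catching the specific error instead of testing with an if beforehand also avoids the race where the file or directory changes between the check and the operation.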

In my application I started using Orendorff's path module, expecting
the standard path object would be close to it.  When PEP 355 started
getting more changes and the directory-based alternative took off, I
took path.py out and rewrote my code for os.path until an alternative
becomes more stable. Now it looks like it will be several months and
possibly several third-party packages until one makes it into the
standard library. This is unfortunate.  Not only does it mean ugly
code in applications, but it means packages can't accept or return
Path objects and expect them to be compatible with other packages.

The reasons PEP 355 was rejected also sound strange.  Nick Coghlan
wrote (Oct 1):

> Things the PEP 355 path object lumps together:
>   - string manipulation operations
>   - abstract path manipulation operations (work for non-existent filesystems)
>   - read-only traversal of a concrete filesystem (dir, stat, glob, etc)
>   - addition & removal of files/directories/links within a concrete filesystem

> Dumping all of these into a single class is certainly practical from a utility
> point of view, but it's about as far away from beautiful as you can get, which
> creates problems from a learnability point of view, and from a
> capability-based security point of view.

What about the convenience of the users and the beauty of users' code?
 That's what matters to me.  And I consider one class *easier* to
learn.  I'm tired of memorizing that 'split' is in os.path while
'remove' and 'stat' are in os.  This seems arbitrary: you're statting
a path, aren't you?  Also, if you have four classes (abstract path,
file, directory, symlink), *each* of those will have 3+
platform-specific versions.  Then if you want to make an enhancement
subclass you'll have to make 12 of them, one for each of the 3*4
combinations of superclasses.  Encapsulation can help with this, but
it strays from the two-line convenience for the user:

    from path import Path
    p = Path("ABC")  # Works the same for files/directories on any platform.

Nevertheless, I'm open to seeing a multi-class API, though hopefully
less verbose than Talin's preliminary one (Oct 26).  Is it necessary
to support path.parent(), pathobj.parent(), io.dir.listdir(), *and*
io.dir.Directory().  T