Re: [Python-ideas] pathlib suggestions

2017-01-26 Thread Philipp A.
How about adding a new argument to with_suffix?

Path.with_suffix(suffix: str,
 stripped: Union[int, str, Iterable[str]]=1)

stripped would either receive an int (in which case it will greedily strip
up to that many suffixes), or an (optionally compound) suffix which would be
stripped if present verbatim, or an iterable of suffix strings, in which
case it would strip all suffixes in the iterable as many times as
available. Examples:

# current behavior
Path('flop.pkg.tar.gz').with_suffix('') → Path('flop.pkg.tar')
# you have to know what you’re doing; 3 would have stripped '.pkg' too
Path('flop.pkg.tar.gz').with_suffix('', 2) → Path('flop.pkg')
Path('flop.pkg.tar.gz').with_suffix('', '.tar.gz') → Path('flop.pkg')
# not stripped, the suffix doesn’t appear verbatim
Path('flop.pkg.tar.gz').with_suffix('', '.gz.tar') → Path('flop.pkg.tar.gz')
# all instances stripped; probably useless
Path('flop.pkg.tar.gz.tar').with_suffix('', ['.gz', '.tar']) → Path('flop.pkg')
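
The semantics described above can be sketched as a standalone helper. This is
only an illustration under assumptions: the name with_suffix_stripped is
hypothetical, nothing like it exists in pathlib, and the int / str / iterable
handling follows my reading of the proposal rather than any agreed design.

```python
from pathlib import PurePath
from typing import Iterable, Union


def with_suffix_stripped(
    path: PurePath,
    suffix: str,
    stripped: Union[int, str, Iterable[str]] = 1,
) -> PurePath:
    """Hypothetical helper (not a pathlib API): strip suffixes, then append `suffix`."""
    name = path.name
    if isinstance(stripped, int):
        # Greedily strip up to `stripped` trailing suffixes.
        for _ in range(stripped):
            ext = PurePath(name).suffix
            if not ext:
                break
            name = name[: -len(ext)]
    elif isinstance(stripped, str):
        # Strip a (possibly compound) suffix only if it appears verbatim.
        if stripped and name.endswith(stripped) and name != stripped:
            name = name[: -len(stripped)]
    else:
        # Strip any listed suffix for as long as one is found at the end.
        allowed = set(stripped) - {""}
        while PurePath(name).suffix in allowed:
            name = name[: -len(PurePath(name).suffix)]
    return path.with_name(name + suffix)
```

With this sketch, with_suffix_stripped(PurePath('flop.pkg.tar.gz'), '', 2)
gives a path named 'flop.pkg', matching the second example.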


Franklin? Lee wrote on Wed, 25 Jan 2017 at 21:44:

> > A ".tar.gz" is not the same as a ".svg.gz".  The fact that they are both
> > gzip-compressed is an implementation detail as far as most software I
> > deal with is concerned.  My unarchiver will extract a ".tar.gz" into a
> > directory as if it was just a ".tar", while my image viewer will view a
> > ".svg.gz" as a vector image as if it was just a ".svg".  From a
> > user-interaction standpoint, the ".gz" part is ignored.
>
> Just to be sure we're on the same page:
> - A .tar file is an uncompressed bundle of files.
> - A .gz file is a compressed version of a single file.
> - Technically, there's no such thing as a .tar.gz file. "x.tar.gz"
> means that if you unwrap it with gunzip, you'll get a file called
> "x.tar", which you can then unpack with tar.
>
> "x.tar.gz" is not a tar file using the gzip compression. It's a gz
> file which unpacks to a tar file. Conceptually, your unarchiver does
> it in two separate steps.
>
> Similarly, "x.svg.gz" is a gz file which unpacks to an svg file. Your
> viewer just knows to unzip it before use.
>
> I don't wanna appear as a naysayer, so here's an alternative
> suggestion: A parameter for a collection of "extension suffixes". The
> function will try to eat extensions from the end until it finds one
> NOT on the list (or it runs out). The docs can recommend `('gz', 'xz',
> 'bz', 'bz2', ...)`. Maybe a later Python version can use that
> recommendation as the default.
>
> IMO, ".part1" is not a part of the extension. You'd usually have
> "x.part1.rar" and "x.part2.rar" in the same folder, and it makes more
> sense that there are two files with base names "x.part1" and "x.part2"
> than to have two different files with the same base name and an
> extension which just keeps them ordered.
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Franklin? Lee
> A ".tar.gz" is not the same as a ".svg.gz".  The fact that they are both
> gzip-compressed is an implementation detail as far as most software I deal
> with is concerned.  My unarchiver will extract a ".tar.gz" into a directory
> as if it was just a ".tar", while my image viewer will view a ".svg.gz" as a
> vector image as if it was just a ".svg".  From a user-interaction
> standpoint, the ".gz" part is ignored.

Just to be sure we're on the same page:
- A .tar file is an uncompressed bundle of files.
- A .gz file is a compressed version of a single file.
- Technically, there's no such thing as a .tar.gz file. "x.tar.gz"
means that if you unwrap it with gunzip, you'll get a file called
"x.tar", which you can then unpack with tar.

"x.tar.gz" is not a tar file using the gzip compression. It's a gz
file which unpacks to a tar file. Conceptually, your unarchiver does
it in two separate steps.

Similarly, "x.svg.gz" is a gz file which unpacks to an svg file. Your
viewer just knows to unzip it before use.

I don't wanna appear as a naysayer, so here's an alternative
suggestion: A parameter for a collection of "extension suffixes". The
function will try to eat extensions from the end until it finds one
NOT on the list (or it runs out). The docs can recommend `('gz', 'xz',
'bz', 'bz2', ...)`. Maybe a later Python version can use that
recommendation as the default.
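
A rough sketch of that idea, to make the behaviour concrete. The function name
split_full_suffix and the default set are assumptions of mine, not an existing
or proposed API, and I spell the entries with their leading dots so they line
up with what pathlib's suffixes returns.

```python
from pathlib import PurePath

# Assumed default list of "wrapper" suffixes; purely illustrative.
WRAPPER_SUFFIXES = frozenset({".gz", ".xz", ".bz", ".bz2", ".Z", ".lzma"})


def split_full_suffix(filename, wrappers=WRAPPER_SUFFIXES):
    """Return (base name, full suffix) for the last path component.

    Suffixes are consumed from the end; the first one that is NOT in
    `wrappers` is still included, and then consumption stops.
    """
    name = PurePath(filename).name
    remaining = PurePath(filename).suffixes
    taken = []
    while remaining:
        ext = remaining.pop()
        taken.append(ext)
        if ext not in wrappers:
            break
    full = "".join(reversed(taken))
    return name[: len(name) - len(full)], full
```

For "pip-9.0.1.tar.gz" this returns ("pip-9.0.1", ".tar.gz"), and for
"x.part1.rar" it returns ("x.part1", ".rar").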

IMO, ".part1" is not a part of the extension. You'd usually have
"x.part1.rar" and "x.part2.rar" in the same folder, and it makes more
sense that there are two files with base names "x.part1" and "x.part2"
than to have two different files with the same base name and an
extension which just keeps them ordered.


Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Todd
On Wed, Jan 25, 2017 at 11:16 AM, Petr Viktorin  wrote:

> On 01/25/2017 04:33 PM, Todd wrote:
>
>> On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin wrote:
>>
>>> On 01/25/2017 04:04 PM, Todd wrote:
>>>
>>>> On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull wrote:
>>>>
>>>>> I'm just going to let fly with the +1s and -1s, don't take them too
>>>>> seriously, they're basically impressionistic (I'm not a huge user of
>>>>> pathlib yet).
>>>>>
>>>>> Todd writes:
>>>>>
>>>>>  > So although the names are tentative, perhaps there could be a
>>>>>  > "fullsuffix" property to return the extensions as a single string,
>>>>>
>>>>> -0  ''.join(p.suffixes) vs. p.fullsuffix?  TOOWTDI says no.  I
>>>>> also don't really see the use case.
>>>>
>>>> The whole point of pathlib is to provide convenience functions for
>>>> common path-related operations.  It is full of methods and properties
>>>> that could be implemented other ways.
>>>>
>>>> Dealing with multi-part extensions, at least for me, is extremely
>>>> common.  A ".tar.gz" file is not the same as a ".tar.bz2" or a
>>>> ".svg.gz".  When I want to find a ".tar.gz" file, having to deal with
>>>> the ".tar" and ".gz" parts separately is nothing but a nuisance.  If I
>>>> want to find and extract ".rar" files, I don't want ".part1.rar" files,
>>>> ".part2.rar" files, and so on.  So for me dealing with the extension as
>>>> a single unit, rather than individual parts, is the most common
>>>> approach.
>>>
>>> But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"?
>>> Existing tools like glob and endswith() can deal with the ".tar.gz"
>>> extension reliably, but "fullsuffix" would, arguably, not give the
>>> answers you want.
>>
>> I wouldn't use it in that situation.  The existing "suffix" and "stem"
>> properties also only work reliably under certain situations.
>
> Which situations do you mean? It works quite fine with multiple suffixes:
> The suffix of "pip-9.0.1.tar.gz" is ".gz", and sure enough, you can
> reasonably expect it's a gz-compressed file. If you uncompress it and strip
> the extension, you'll end up with a "pip-9.0.1.tar", where the suffix is
> ".tar" -- and humans would be surprised if it wasn't a tar archive.
>
>

A ".tar.gz" is not the same as a ".svg.gz".  The fact that they are both
gzip-compressed is an implementation detail as far as most software I deal
with is concerned.  My unarchiver will extract a ".tar.gz" into a directory
as if it was just a ".tar", while my image viewer will view a ".svg.gz" as
a vector image as if it was just a ".svg".  From a user-interaction
standpoint, the ".gz" part is ignored.

Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Steve Dower

On 25Jan2017 0816, Petr Viktorin wrote:
> On 01/25/2017 04:33 PM, Todd wrote:
>>> But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"?
>>> Existing tools like glob and endswith() can deal with the ".tar.gz"
>>> extension reliably, but "fullsuffix" would, arguably, not give the
>>> answers you want.
>>
>> I wouldn't use it in that situation.  The existing "suffix" and "stem"
>> properties also only work reliably under certain situations.
>
> Which situations do you mean? It works quite fine with multiple suffixes:
> The suffix of "pip-9.0.1.tar.gz" is ".gz", and sure enough, you can
> reasonably expect it's a gz-compressed file. If you uncompress it and
> strip the extension, you'll end up with a "pip-9.0.1.tar", where the
> suffix is ".tar" -- and humans would be surprised if it wasn't a tar
> archive.



It may be handy if suffixes was a reversed tuple of suffixes (or 
possibly a cumulative tuple):


>>> Path('pip-9.0.1.tar.gz').suffixes
('.gz', '.tar', '.1', '.0')

This has a nice benefit for comparisons:
>>> targzs = [f for f in all_files if f.suffixes[:2] == ('.gz', '.tar')]

It doesn't necessarily improve over .endswith(), but it has a slight 
convenience over .split() and arguably demonstrates intent more clearly. 
(Though my biggest issue with all of this is case-sensitivity, which 
probably means we need to add comparison functions to Path flavours in 
order to do this stuff properly.)



The "cumulative tuple" version would be like this:

>>> Path('pip-9.0.1.tar.gz').suffixes
('.gz', '.tar.gz', '.1.tar.gz', '.0.1.tar.gz')

This doesn't compare as nicely, since now we would use f.suffixes[1] 
which will raise if there is only one suffix (likely). But it does 
return a value which cannot be easily recreated using other functions.
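
Both variants can be built today from the existing suffixes property. A small
sketch, assuming helper names (reversed_suffixes, cumulative_suffixes) that I
made up for illustration; neither is part of pathlib:

```python
from itertools import accumulate
from pathlib import PurePath


def reversed_suffixes(path):
    """Suffixes of the final component, last suffix first."""
    return tuple(reversed(PurePath(path).suffixes))


def cumulative_suffixes(path):
    """Progressively longer tails: '.gz', '.tar.gz', and so on."""
    # Each step prepends the next suffix (moving leftwards) to the tail so far.
    return tuple(
        accumulate(reversed_suffixes(path), lambda tail, ext: ext + tail)
    )
```

With the cumulative form, a test such as
".tar.gz" in cumulative_suffixes(f) avoids the IndexError that indexing
[1] would raise for single-suffix names.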


Cheers,
Steve


Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Petr Viktorin

On 01/25/2017 04:33 PM, Todd wrote:
> On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin wrote:
>
>> On 01/25/2017 04:04 PM, Todd wrote:
>>
>>> On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull wrote:
>>>
>>>> I'm just going to let fly with the +1s and -1s, don't take them too
>>>> seriously, they're basically impressionistic (I'm not a huge user of
>>>> pathlib yet).
>>>>
>>>> Todd writes:
>>>>
>>>>  > So although the names are tentative, perhaps there could be a
>>>>  > "fullsuffix" property to return the extensions as a single string,
>>>>
>>>> -0  ''.join(p.suffixes) vs. p.fullsuffix?  TOOWTDI says no.  I
>>>> also don't really see the use case.
>>>
>>> The whole point of pathlib is to provide convenience functions for
>>> common path-related operations.  It is full of methods and properties
>>> that could be implemented other ways.
>>>
>>> Dealing with multi-part extensions, at least for me, is extremely
>>> common.  A ".tar.gz" file is not the same as a ".tar.bz2" or a
>>> ".svg.gz".  When I want to find a ".tar.gz" file, having to deal with
>>> the ".tar" and ".gz" parts separately is nothing but a nuisance.  If I
>>> want to find and extract ".rar" files, I don't want ".part1.rar" files,
>>> ".part2.rar" files, and so on.  So for me dealing with the extension as
>>> a single unit, rather than individual parts, is the most common
>>> approach.
>>
>> But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"?
>> Existing tools like glob and endswith() can deal with the ".tar.gz"
>> extension reliably, but "fullsuffix" would, arguably, not give the
>> answers you want.
>
> I wouldn't use it in that situation.  The existing "suffix" and "stem"
> properties also only work reliably under certain situations.


Which situations do you mean? It works quite fine with multiple suffixes:
The suffix of "pip-9.0.1.tar.gz" is ".gz", and sure enough, you can 
reasonably expect it's a gz-compressed file. If you uncompress it and 
strip the extension, you'll end up with a "pip-9.0.1.tar", where the 
suffix is ".tar" -- and humans would be surprised if it wasn't a tar 
archive.


The function can't determine what a particular human would think of as 
the full (or "real") suffix in a particular situation -- but I wouldn't 
call it unreliable.



>> Perhaps more specialized tools would be useful, though, for example:
>> repacked_path = original_path.replace_suffix(".tar.gz", ".zip")
>
> That is helpful if I want to rename, not if I want to (for example)
> uncompress a file.


Something like this?
uncompressed = original_path.replace_suffix(".tar.gz", "")
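
To be concrete about what I have in mind, here is a minimal sketch as a free
function. replace_suffix does not exist in pathlib; the name, the signature,
and the choice to raise ValueError when the old suffix is absent are all
assumptions for the sake of the example.

```python
from pathlib import PurePath


def replace_suffix(path, old, new):
    """Hypothetical helper: swap a verbatim (possibly compound) suffix."""
    path = PurePath(path)
    name = path.name
    # Refuse empty suffixes, non-matching names, and names that consist
    # of nothing but the suffix (which would leave an empty file name).
    if not old or not name.endswith(old) or name == old:
        raise ValueError(f"{name!r} does not end with suffix {old!r}")
    return path.with_name(name[: -len(old)] + new)
```

Here replace_suffix("dist/spam-4.2.5-final.tar.gz", ".tar.gz", "") yields
dist/spam-4.2.5-final, regardless of the dots in the version number.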





Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Todd
On Wed, Jan 25, 2017 at 11:04 AM, Thomas Kluyver wrote:

> On Wed, Jan 25, 2017, at 03:54 PM, Todd wrote:
>
>> Those [.tar.foo] are just examples that I encounter a lot, there can be
>> other cases where multiple extensions are used.
>
> The real issue is that there's no definition of what an extension is. You
> can have dots anywhere in a filename, and it's not at all unusual for them
> to be used before the bit we recognise as the extension. Almost every
> package on PyPI has files named like 'pip-9.0.1.tar.gz', but '.0.1.tar.gz'
> clearly doesn't make any sense as an extension. Without a good definition
> of what the 'full extension' is, we can't have code to find it.
>
> Thomas

Right, that is why we would have three properties

1. suffix: gets the part after the last period as a string, including the
period (already exists), so "spam.tar.gz" -> ".gz"
2. fullsuffix: gets the part after the first period as a string, including
the period (this is what I am proposing), so "spam.tar.gz" -> ".tar.gz"
3. suffixes: gets the part after the first period as a list of strings
split on the leading period, each including the leading period (already
exists), so "spam.tar.gz" -> [".tar", ".gz"]

"suffix" is only useful if you are sure only the part after the last period
is useful, "fullsuffix" is only useful if you are sure the entire part
after first period is useful, and "suffixes" is needed in more complicated
situations.  This is similar in principle to having "str.split",
"str.rsplit", "str.partition", and "str.rpartition".  pathlib currently has
the equivalent of "str.split" (suffixes) and "str.rpartition" (suffix), but
lacks the equivalent of "str.partition" (fullsuffix).
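
For reference, a short sketch of how the three relate. suffix and suffixes are
the existing pathlib properties; fullsuffix here is only a hypothetical
stand-in for the proposed property, written as a plain function.

```python
from pathlib import PurePath


def fullsuffix(path):
    """Hypothetical stand-in for the proposed property: all suffixes joined."""
    # The suffixes already carry their leading dots, so join on "".
    return "".join(PurePath(path).suffixes)


p = PurePath("spam.tar.gz")
print(p.suffix)                # .gz
print(p.suffixes)              # ['.tar', '.gz']
print(fullsuffix(p))           # .tar.gz
print(p.name.partition("."))   # ('spam', '.', 'tar.gz')

# The case raised elsewhere in the thread: dots in a version number.
print(fullsuffix("pip-9.0.1.tar.gz"))  # .0.1.tar.gz
```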

Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Thomas Kluyver
On Wed, Jan 25, 2017, at 03:58 PM, Todd wrote:

> On Wed, Jan 25, 2017 at 10:45 AM, Thomas Kluyver wrote:
>> You might not, but it seems like an attractive nuisance. You can't
>> reliably use it as a test for .tar.gz files, but it would be easy to
>> think that you can and write buggy code using it. And I can't
>> currently think of a general example where it would be useful.
>
> From my perspective at least, those arguments apply just as well to
> the existing "suffix" and "stem" properties.

To some extent they do. But the convention of looking at a single
extension is common enough that there's a stronger case for providing
easy access to that.

>> I thought about suggesting a 'hassuffix' method, but it doesn't pass
>> the 'one way to do it' test when you can do:
>>
>> p.name.endswith('.tar.gz')
>
> Then why is there a "match" method?  It doesn't seem like the "one
> way to do it test" is being used for pathlib, nor do I think it
> really applies for a module whose whole point is to provide
> convenience tools.


Everything is trade-offs: if you can justify why a new thing is useful
enough, that can override the 'one way to do it' consideration. That's
why we now have four kinds of string formatting. But I don't think 'X
got away with it so we should allow Y too' is a compelling argument.

Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Paul Moore
On 25 January 2017 at 16:04, Thomas Kluyver wrote:
> On Wed, Jan 25, 2017, at 03:54 PM, Todd wrote:
>
>> Those [.tar.foo] are just examples that I encounter a lot, there can be
>> other cases where multiple extensions are used.
>
> The real issue is that there's no definition of what an extension is. You
> can have dots anywhere in a filename, and it's not at all unusual for them
> to be used before the bit we recognise as the extension. Almost every
> package on PyPI has files named like 'pip-9.0.1.tar.gz', but '.0.1.tar.gz'
> clearly doesn't make any sense as an extension. Without a good definition of
> what the 'full extension' is, we can't have code to find it.

More precisely, we *can* have code to find it, but it's of necessity
application-specific, and so not a good fit for a general library like
the stdlib.

One of the design principles for code in the stdlib is "does it solve
a sufficiently general problem?" In this case, there's a general
problem, which is "give me back what I think of as the suffix in this
case" - but the proposed method doesn't solve that problem (because of
the cases already quoted). Conversely, the problem which the proposed
solution *does* solve ("give me the part of the filename after the
first dot") isn't general enough to warrant going into the stdlib,
because it's too often not what people actually want.

Paul


Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Thomas Kluyver
On Wed, Jan 25, 2017, at 03:54 PM, Todd wrote:

> Those [.tar.foo] are just examples that I encounter a lot, there can
> be other cases where multiple extensions are used.


The real issue is that there's no definition of what an extension is.
You can have dots anywhere in a filename, and it's not at all unusual
for them to be used before the bit we recognise as the extension.
Almost every package on PyPI has files named like 'pip-9.0.1.tar.gz',
but '.0.1.tar.gz' clearly doesn't make any sense as an extension.
Without a good definition of what the 'full extension' is, we can't
have code to find it.


Thomas

Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Todd
On Wed, Jan 25, 2017 at 10:45 AM, Stephan Houben wrote:

> Hi all,
>
> It seems to me that the correct algorithm to get the "full suffix" is not
> to take everything after the FIRST dot,
> but rather to:
> 1. Recognize that the last suffix is one of the UNIX-style compression
> tools .Z, .gz, .bz2, .xz, .lzma (at least)
> 2. Then add the next-to-last suffix.
>
> So we can then determine that the suffix of
>   order.for.tar.ps.gz
> is .ps.gz and the basename is order.for.tar .
>
> However, I am not sure if we want to hard-code a list of such suffixes in
> the standard library.
> (Even though it could be user-extensible.)
>
> Stephan
>


Those are just examples that I encounter a lot, there can be other cases
where multiple extensions are used.


>
> 2017-01-25 16:33 GMT+01:00 Todd:
>
>> On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin wrote:
>>
>>> On 01/25/2017 04:04 PM, Todd wrote:
>>>
>>>> On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull wrote:
>>>>
>>>>> I'm just going to let fly with the +1s and -1s, don't take them too
>>>>> seriously, they're basically impressionistic (I'm not a huge user of
>>>>> pathlib yet).
>>>>>
>>>>> Todd writes:
>>>>>
>>>>>  > So although the names are tentative, perhaps there could be a
>>>>>  > "fullsuffix" property to return the extensions as a single string,
>>>>>
>>>>> -0  ''.join(p.suffixes) vs. p.fullsuffix?  TOOWTDI says no.  I
>>>>> also don't really see the use case.
>>>>
>>>> The whole point of pathlib is to provide convenience functions for
>>>> common path-related operations.  It is full of methods and properties
>>>> that could be implemented other ways.
>>>>
>>>> Dealing with multi-part extensions, at least for me, is extremely
>>>> common.  A ".tar.gz" file is not the same as a ".tar.bz2" or a
>>>> ".svg.gz".  When I want to find a ".tar.gz" file, having to deal with
>>>> the ".tar" and ".gz" parts separately is nothing but a nuisance.  If I
>>>> want to find and extract ".rar" files, I don't want ".part1.rar" files,
>>>> ".part2.rar" files, and so on.  So for me dealing with the extension as
>>>> a single unit, rather than individual parts, is the most common
>>>> approach.
>>>
>>> But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"?
>>> Existing tools like glob and endswith() can deal with the ".tar.gz"
>>> extension reliably, but "fullsuffix" would, arguably, not give the
>>> answers you want.
>>
>> I wouldn't use it in that situation.  The existing "suffix" and "stem"
>> properties also only work reliably under certain situations.
>>
>>> Perhaps more specialized tools would be useful, though, for example:
>>> repacked_path = original_path.replace_suffix(".tar.gz", ".zip")
>>
>> That is helpful if I want to rename, not if I want to (for example)
>> uncompress a file.
>>

Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Todd
On Wed, Jan 25, 2017 at 10:45 AM, Thomas Kluyver wrote:

> On Wed, Jan 25, 2017, at 03:33 PM, Todd wrote:
>
>> On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin wrote:
>>
>>> But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"?
>>> Existing tools like glob and endswith() can deal with the ".tar.gz"
>>> extension reliably, but "fullsuffix" would, arguably, not give the
>>> answers you want.
>>
>> I wouldn't use it in that situation.
>
> You might not, but it seems like an attractive nuisance. You can't
> reliably use it as a test for .tar.gz files, but it would be easy to think
> that you can and write buggy code using it. And I can't currently think of
> a general example where it would be useful.
>


From my perspective at least, those arguments apply just as well to the
existing "suffix" and "stem" properties.


>
> I thought about suggesting a 'hassuffix' method, but it doesn't pass the
> 'one way to do it' test when you can do:
>
> p.name.endswith('.tar.gz')
>

Then why is there a "match" method?  It doesn't seem like the "one way to
do it test" is being used for pathlib, nor do I think it really applies for
a module whose whole point is to provide convenience tools.

Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Stephan Houben
Hi all,

It seems to me that the correct algorithm to get the "full suffix" is not
to take everything after the FIRST dot,
but rather to:
1. Recognize that the last suffix is one of the UNIX-style compression
tools .Z, .gz, .bz2, .xz, .lzma (at least)
2. Then add the next-to-last suffix.

So we can then determine that the suffix of
  order.for.tar.ps.gz
is .ps.gz and the basename is order.for.tar .

However, I am not sure if we want to hard-code a list of such suffixes in
the standard library.
(Even though it could be user-extensible.)

Stephan

2017-01-25 16:33 GMT+01:00 Todd:

> On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin wrote:
>
>> On 01/25/2017 04:04 PM, Todd wrote:
>>
>>> On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull wrote:
>>>
>>>> I'm just going to let fly with the +1s and -1s, don't take them too
>>>> seriously, they're basically impressionistic (I'm not a huge user of
>>>> pathlib yet).
>>>>
>>>> Todd writes:
>>>>
>>>>  > So although the names are tentative, perhaps there could be a
>>>>  > "fullsuffix" property to return the extensions as a single string,
>>>>
>>>> -0  ''.join(p.suffixes) vs. p.fullsuffix?  TOOWTDI says no.  I
>>>> also don't really see the use case.
>>>
>>> The whole point of pathlib is to provide convenience functions for
>>> common path-related operations.  It is full of methods and properties
>>> that could be implemented other ways.
>>>
>>> Dealing with multi-part extensions, at least for me, is extremely
>>> common.  A ".tar.gz" file is not the same as a ".tar.bz2" or a
>>> ".svg.gz".  When I want to find a ".tar.gz" file, having to deal with
>>> the ".tar" and ".gz" parts separately is nothing but a nuisance.  If I
>>> want to find and extract ".rar" files, I don't want ".part1.rar" files,
>>> ".part2.rar" files, and so on.  So for me dealing with the extension as
>>> a single unit, rather than individual parts, is the most common
>>> approach.
>>
>> But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"?
>> Existing tools like glob and endswith() can deal with the ".tar.gz"
>> extension reliably, but "fullsuffix" would, arguably, not give the
>> answers you want.
>
> I wouldn't use it in that situation.  The existing "suffix" and "stem"
> properties also only work reliably under certain situations.
>
>> Perhaps more specialized tools would be useful, though, for example:
>> repacked_path = original_path.replace_suffix(".tar.gz", ".zip")
>
> That is helpful if I want to rename, not if I want to (for example)
> uncompress a file.
>

Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Thomas Kluyver
On Wed, Jan 25, 2017, at 03:33 PM, Todd wrote:

> On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin wrote:
>> But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"?
>> Existing tools like glob and endswith() can deal with the ".tar.gz"
>> extension reliably, but "fullsuffix" would, arguably, not give the
>> answers you want.
>
> I wouldn't use it in that situation.

You might not, but it seems like an attractive nuisance. You can't
reliably use it as a test for .tar.gz files, but it would be easy to
think that you can and write buggy code using it. And I can't currently
think of a general example where it would be useful.


I thought about suggesting a 'hassuffix' method, but it doesn't pass the
'one way to do it' test when you can do:


p.name.endswith('.tar.gz')
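
A quick illustration of the difference, using only existing pathlib and str
methods. The file name is the one from Petr's example; the variable names are
just for the demo, and the joined-suffixes line stands in for what a
"fullsuffix" would return.

```python
from pathlib import PurePosixPath

p = PurePosixPath("dist/spam-4.2.5-final.tar.gz")

# Two existing, reliable tests for a ".tar.gz" name.
ends_with = p.name.endswith(".tar.gz")
matches = p.match("*.tar.gz")

# What joining all suffixes gives for the same name: the dots in the
# version number are swept up as well.
joined = "".join(p.suffixes)

print(ends_with, matches, joined)
```

Both tests are case-sensitive here; comparing against p.name.lower() is one
way around that if it matters.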



Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Todd
On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin wrote:

> On 01/25/2017 04:04 PM, Todd wrote:
>
>> On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull wrote:
>>
>>> I'm just going to let fly with the +1s and -1s, don't take them too
>>> seriously, they're basically impressionistic (I'm not a huge user of
>>> pathlib yet).
>>>
>>> Todd writes:
>>>
>>>  > So although the names are tentative, perhaps there could be a
>>>  > "fullsuffix" property to return the extensions as a single string,
>>>
>>> -0  ''.join(p.suffixes) vs. p.fullsuffix?  TOOWTDI says no.  I
>>> also don't really see the use case.
>>
>> The whole point of pathlib is to provide convenience functions for
>> common path-related operations.  It is full of methods and properties
>> that could be implemented other ways.
>>
>> Dealing with multi-part extensions, at least for me, is extremely
>> common.  A ".tar.gz" file is not the same as a ".tar.bz2" or a
>> ".svg.gz".  When I want to find a ".tar.gz" file, having to deal with
>> the ".tar" and ".gz" parts separately is nothing but a nuisance.  If I
>> want to find and extract ".rar" files, I don't want ".part1.rar" files,
>> ".part2.rar" files, and so on.  So for me dealing with the extension as
>> a single unit, rather than individual parts, is the most common approach.
>>
>
> But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"?
> Existing tools like glob and endswith() can deal with the ".tar.gz"
> extension reliably, but "fullsuffix" would, arguably, not give the answers
> you want.
>


I wouldn't use it in that situation.  The existing "suffix" and "stem"
properties also only work reliably under certain situations.


>
> Perhaps more specialized tools would be useful, though, for example:
> repacked_path = original_path.replace_suffix(".tar.gz", ".zip")
>
>
That is helpful if I want to rename, not if I want to (for example)
uncompress a file.

Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Petr Viktorin

On 01/25/2017 04:04 PM, Todd wrote:
> On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull wrote:
>
>> I'm just going to let fly with the +1s and -1s, don't take them too
>> seriously, they're basically impressionistic (I'm not a huge user of
>> pathlib yet).
>>
>> Todd writes:
>>
>>  > So although the names are tentative, perhaps there could be a
>>  > "fullsuffix" property to return the extensions as a single string,
>>
>> -0  ''.join(p.suffixes) vs. p.fullsuffix?  TOOWTDI says no.  I
>> also don't really see the use case.
>
> The whole point of pathlib is to provide convenience functions for
> common path-related operations.  It is full of methods and properties
> that could be implemented other ways.
>
> Dealing with multi-part extensions, at least for me, is extremely
> common.  A ".tar.gz" file is not the same as a ".tar.bz2" or a
> ".svg.gz".  When I want to find a ".tar.gz" file, having to deal with
> the ".tar" and ".gz" parts separately is nothing but a nuisance.  If I
> want to find and extract ".rar" files, I don't want ".part1.rar" files,
> ".part2.rar" files, and so on.  So for me dealing with the extension as
> a single unit, rather than individual parts, is the most common approach.


But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"?
Existing tools like glob and endswith() can deal with the ".tar.gz" 
extension reliably, but "fullsuffix" would, arguably, not give the 
answers you want.


Perhaps more specialized tools would be useful, though, for example:
repacked_path = original_path.replace_suffix(".tar.gz", ".zip")



Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Todd
On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull <
turnbull.stephen...@u.tsukuba.ac.jp> wrote:

> I'm just going to let fly with the +1s and -1s, don't take them too
> seriously, they're basically impressionistic (I'm not a huge user of
> pathlib yet).
>
> Todd writes:
>
>  > So although the names are tentative, perhaps there could be a
>  > "fullsuffix" property to return the extensions as a single string,
>
> -0  ''.join(p.suffixes) vs. p.fullsuffix?  TOOWTDI says no.  I
> also don't really see the use case.
>
>
The whole point of pathlib is to provide convenience functions for common
path-related operations.  It is full of methods and properties that could
be implemented other ways.

Dealing with multi-part extensions, at least for me, is extremely common.
A ".tar.gz" file is not the same as a ".tar.bz2" or a ".svg.gz".  When I
want to find a ".tar.gz" file, having to deal with the ".tar" and ".gz"
parts separately is nothing but a nuisance.  If I want to find and extract
".rar" files, I don't want ".part1.rar" files, ".part2.rar" files, and so
on.  So for me, dealing with the extension as a single unit, rather than
as individual parts, is the most common approach.
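
For reference, here is roughly how the "complete" extension can be spelled with the existing API. The function name fullsuffix is only a stand-in for the proposed property. Note that each entry of "suffixes" already carries its leading dot, so the join string has to be empty:

```python
from pathlib import PurePath


def fullsuffix(path):
    """Hypothetical stand-in for the proposed property."""
    # Each suffix already starts with ".", so join on the empty string.
    return "".join(path.suffixes)


print(PurePath("spam.tar.gz").suffix)            # .gz
print(fullsuffix(PurePath("spam.tar.gz")))       # .tar.gz
print(fullsuffix(PurePath("video.part1.rar")))   # .part1.rar
print(fullsuffix(PurePath("video.rar")))         # .rar
```

With this, the ".part1.rar" files can be told apart from plain ".rar" files with a single comparison.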


>  > a "nosuffix" extension to return the path without any extensions,
>
> +1  (subject to name bikeshedding) .suffixes itself is kinda
> useless without this, and you shouldn't have to roll your own
>
> Do you propose to return a Path or a str here?
>

I intend it to behave as much as possible like the existing "stem"
property, so a string.


>
>  > and a "with_suffixes" method that replaces all the suffix and can
>  > accept multiple arguments (which would then be joined to create the
>  > extensions).
>
> Do you propose to return a Path or a str here?  +1 for a Path, +0 for
> a str.
>

It is intended to behave as much as possible like the existing "with_suffix"
method, so a Path.
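
To make the intended return types concrete, a minimal sketch of both as free functions. The names come from the proposal; the bodies are only one possible reading of it:

```python
from pathlib import PurePath


def nosuffix(path):
    """Final path component with every suffix removed, as a str (like .stem)."""
    full = "".join(path.suffixes)
    return path.name[: -len(full)] if full else path.name


def with_suffixes(path, *suffixes):
    """Replace all existing suffixes with the given ones; returns a path."""
    return path.with_name(nosuffix(path) + "".join(suffixes))


p = PurePath("backup/spam.tar.gz")
print(p.stem)                             # spam.tar
print(nosuffix(p))                        # spam
print(with_suffixes(p, ".zip"))           # backup/spam.zip (on POSIX)
print(with_suffixes(p, ".tar", ".bz2"))   # backup/spam.tar.bz2 (on POSIX)
```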


>  > Second, for methods like "rename" and "replace", it would be nice if
>  > there was an "exist_ok" argument that defaults to "True" to allow for
>  > safe renaming.
>
> -1  I don't see how this is an improvement.  If it would raise if
> exist_ok == False, then
>
> try:
>     p.rename(another_p, exist_ok=False)
> except ExistNotOKError:
>     take_evasive_action(p)
>
> doesn't seem like a big improvement over
>
> if p.exists():
>     take_evasive_action(p)
> else:
>     p.rename(another_p)
>
> And if it doesn't raise, then the action just silently fails?
>
>

As Ed said, this can lead to race conditions.  Something could happen after
you check "exists".

Also, the "mkdir" method already has an "exist_ok" argument, and the "open"
function has the "x" flag to raise an exception if the file already
exists.  It seems like a major omission to me that there are safe ways to
make files and safe ways to make directories, but no safe way to move files
or directories.
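
As an illustration of what "safe" could mean here, one known workaround for regular files is to create the new name with a hard link, which fails if the target exists, and then drop the old name. This is only a sketch under assumptions: rename_noreplace is a made-up name, both paths must be on one filesystem that supports hard links, and it does not cover directories:

```python
import os
from pathlib import Path


def rename_noreplace(src, dst):
    """Move the file `src` to `dst`, raising FileExistsError if `dst` exists.

    Hypothetical helper. The existence check and the creation of the new
    name happen in a single os.link() call, so there is no window between
    "check" and "act" in which another process can create `dst`.
    """
    os.link(src, dst)   # raises FileExistsError if dst already exists
    os.unlink(src)      # remove the old name; the data stays reachable via dst
    return Path(dst)
```

Between the two calls the file is briefly visible under both names, so this is not a single atomic move, but it never overwrites an existing target.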


> Name bikeshedding: IIRC, if an argument is essentially always going to
> be one of a small number of literals, Guido strongly prefers a new
> method (eg, rename_safely).
>
>
File and directory handling is already full of flags like this.  This
argument was taken verbatim from the existing "mkdir" method for
consistency.


>
>  > Fourth, for the "is_*" methods, it would be nice if there was a "strict"
>  > argument that would raise an exception if the file or directory doesn't
>  > exist.
>
> -1 That seems weird in a library intended for the syntactic
> manipulation of uninterpreted paths (even though this is a semantic
> operation).  TOOWTDI and EIBTI, as well.  For backward compatibility,
> strict would have to default to False.
>
>
First, these methods only exist for "concrete" paths, which are explicitly
intended for use in I/O operations.

Second, as before, this argument is taken from another method.  In this
case, the "resolve" method has a "strict" argument.  Any other approach
suffers from the same race conditions as "rename" and "replace", and again
it seems weird that resolving a path can be done safely but testing it
can't be.

And yes, the argument would have to default to "False".  All of my
suggestions are intended to be completely backwards-compatible.  I don't
see that as a problem, though.
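
For comparison, this is approximately what a strict test looks like today when written by hand. The function name is hypothetical; the point is that a single stat() call both answers the question and raises if the path is missing, so there is no separate "exists" check to race against:

```python
import stat
from pathlib import Path


def is_file_strict(path):
    """Like Path.is_file(), but raise FileNotFoundError for a missing path."""
    mode = Path(path).stat().st_mode   # raises FileNotFoundError if absent
    return stat.S_ISREG(mode)
```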

Re: [Python-ideas] pathlib suggestions

2017-01-25 Thread Ed Kellett
On Wed, 25 Jan 2017 at 05:26 Stephen J. Turnbull <
turnbull.stephen...@u.tsukuba.ac.jp> wrote:

> -1  I don't see how this is an improvement.  If it would raise if
> exist_ok == False, then
>
> try:
>     p.rename(another_p, exist_ok=False)
> except ExistNotOKError:
>     take_evasive_action(p)
>
> doesn't seem like a big improvement over
>
> if p.exists():
>     take_evasive_action(p)
> else:
>     p.rename(another_p)
>
> And if it doesn't raise, then the action just silently fails?
>

The latter should be if another_p.exists(), and it can race with the
creation of another_p—this is a textbook motivating example for EAFP.

[Python-ideas] pathlib suggestions

2017-01-24 Thread Stephen J. Turnbull
I'm just going to let fly with the +1s and -1s, don't take them too
seriously, they're basically impressionistic (I'm not a huge user of
pathlib yet).

Todd writes:

 > So although the names are tentative, perhaps there could be a "fullsuffix"
 > property to return the extensions as a single string,

-0  '.'.join(p.suffixes) vs. p.fullsuffix?  TOOWTDI says no.  I
also don't really see the use case.  

 > a "nosuffix" extension to return the path without any extensions,

+1  (subject to name bikeshedding) .suffixes itself is kinda
useless without this, and you shouldn't have to roll your own

Do you propose to return a Path or a str here?

 > and a "with_suffixes" method that replaces all the suffix and can
 > accept multiple arguments (which would then be joined to create the
 > extensions).

Do you propose to return a Path or a str here?  +1 for a Path, +0 for
a str.

 > Second, for methods like "rename" and "replace", it would be nice if there
 > was an "exist_ok" argument that defaults to "True" to allow for safe
 > renaming.

-1  I don't see how this is an improvement.  If it would raise if
exist_ok == False, then

try:
    p.rename(another_p, exist_ok=False)
except ExistNotOKError:
    take_evasive_action(p)

doesn't seem like a big improvement over

if p.exists():
    take_evasive_action(p)
else:
    p.rename(another_p)

And if it doesn't raise, then the action just silently fails?

Name bikeshedding: IIRC, if an argument is essentially always going to
be one of a small number of literals, Guido strongly prefers a new
method (eg, rename_safely).

I will admit that the current API seems strange to me: on Unix,
.rename and .replace are apparently the same, and both unsafe?  I
would prefer

.rename         Unix semantics (deprecated)
.rename_safely  replacement for .rename, raises if exists
.replace        silently replace

Names to be bikeshedded per usual.

 > Third, it would be nice if there was a "uid" and "gid" method for getting
 > the numeric user and group IDs for a file,

+1

 > or alternatively a "numeric" argument for the "owner" and "group"
 > methods.

-1 (see "Guido prefers" above)

 > Fourth, for the "is_*" methods, it would be nice if there was a "strict"
 > argument that would raise an exception if the file or directory doesn't
 > exist.

-1 That seems weird in a library intended for the syntactic
manipulation of uninterpreted paths (even though this is a semantic
operation).  TOOWTDI and EIBTI, as well.  For backward compatibility,
strict would have to default to False.

 > the example for the "parts" property should probably show at least
 > one file with an extension,

+1

Steve




Re: [Python-ideas] pathlib suggestions

2017-01-24 Thread Vamsi Krishna Avula
(I have a small question, I hope it's not off-topic for this thread.)

What was the rationale behind an explicit `iterdir` method? Why not simply make 
the `Path` objects iterable?

From: Python-ideas on behalf of Todd
Sent: Wednesday, January 25, 2017 3:32:14 AM
To: python-ideas
Subject: Re: [Python-ideas] pathlib suggestions

On Tue, Jan 24, 2017 at 4:27 PM, Chris Angelico <ros...@gmail.com> wrote:
On Wed, Jan 25, 2017 at 7:30 AM, Todd <toddr...@gmail.com> wrote:
> First, for me, extensions are primarily useful as a single unit.  So,
> practically speaking, the extension of "spam.tar.gz" isn't ".gz", it is
> ".tar.gz".  So it would be nice to have some properties to make it easier to
> deal with the "complete" extension like this.  There is a "suffixes"
> property, but it returns a list, which you then have to recombine manually.
> And as far as I can tell there is no method to return the name without any
> extension.  And there is no method for replacing all the extensions at once.
>
> So although the names are tentative, perhaps there could be a "fullsuffix"
> property to return the extensions as a single string, a "nosuffix" extension
> to return the path without any extensions, and a "with_suffixes" method that
> replaces all the suffix and can accept multiple arguments (which would then
> be joined to create the extensions).

+0. Not all files with multiple dots in them are actually using them
to mean multiple file extensions. Every day I'm working with files
that use dots to separate words in a title, or have section numbers
("4.2.5 Yada Yada Yada.md" does not have a base name of "4"), etc.
Since there's no perfect way to pin these down, this needs to be a
completely separate feature, and it'd only really be useful for some
situations. So go ahead, if there's interest, but the current one
shouldn't be deprecated or anything.

ChrisA

Of course the current ones shouldn't be deprecated, I never suggested they 
should be.  The whole point of using new method and property names was to avoid 
any conflict with the existing methods.  And yes, it won't work in all 
situations.  Which method or property you would use depends on your specific 
needs.



Re: [Python-ideas] pathlib suggestions

2017-01-24 Thread Todd
On Tue, Jan 24, 2017 at 4:27 PM, Chris Angelico  wrote:

> On Wed, Jan 25, 2017 at 7:30 AM, Todd  wrote:
> > First, for me, extensions are primarily useful as a single unit.  So,
> > practically speaking, the extension of "spam.tar.gz" isn't ".gz", it is
> > ".tar.gz".  So it would be nice to have some properties to make it easier to
> > deal with the "complete" extension like this.  There is a "suffixes"
> > property, but it returns a list, which you then have to recombine manually.
> > And as far as I can tell there is no method to return the name without any
> > extension.  And there is no method for replacing all the extensions at once.
> >
> > So although the names are tentative, perhaps there could be a "fullsuffix"
> > property to return the extensions as a single string, a "nosuffix" extension
> > to return the path without any extensions, and a "with_suffixes" method that
> > replaces all the suffix and can accept multiple arguments (which would then
> > be joined to create the extensions).
>
> +0. Not all files with multiple dots in them are actually using them
> to mean multiple file extensions. Every day I'm working with files
> that use dots to separate words in a title, or have section numbers
> ("4.2.5 Yada Yada Yada.md" does not have a base name of "4"), etc.
> Since there's no perfect way to pin these down, this needs to be a
> completely separate feature, and it'd only really be useful for some
> situations. So go ahead, if there's interest, but the current one
> shouldn't be deprecated or anything.
>
> ChrisA
>

Of course the current ones shouldn't be deprecated, I never suggested they
should be.  The whole point of using new method and property names was to
avoid any conflict with the existing methods.  And yes, it won't work in
all situations.  Which method or property you would use depends on your
specific needs.

Re: [Python-ideas] pathlib suggestions

2017-01-24 Thread Chris Angelico
On Wed, Jan 25, 2017 at 7:30 AM, Todd  wrote:
> First, for me, extensions are primarily useful as a single unit.  So,
> practically speaking, the extension of "spam.tar.gz" isn't ".gz", it is
> ".tar.gz".  So it would be nice to have some properties to make it easier to
> deal with the "complete" extension like this.  There is a "suffixes"
> property, but it returns a list, which you then have to recombine manually.
> And as far as I can tell there is no method to return the name without any
> extension.  And there is no method for replacing all the extensions at once.
>
> So although the names are tentative, perhaps there could be a "fullsuffix"
> property to return the extensions as a single string, a "nosuffix" extension
> to return the path without any extensions, and a "with_suffixes" method that
> replaces all the suffix and can accept multiple arguments (which would then
> be joined to create the extensions).

+0. Not all files with multiple dots in them are actually using them
to mean multiple file extensions. Every day I'm working with files
that use dots to separate words in a title, or have section numbers
("4.2.5 Yada Yada Yada.md" does not have a base name of "4"), etc.
Since there's no perfect way to pin these down, this needs to be a
completely separate feature, and it'd only really be useful for some
situations. So go ahead, if there's interest, but the current one
shouldn't be deprecated or anything.
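
To illustrate with the existing API (the file name is the one from the paragraph above), the per-dot view and the last-dot view give very different answers for such a name:

```python
from pathlib import PurePath

p = PurePath("4.2.5 Yada Yada Yada.md")

print(p.suffix)     # .md
print(p.stem)       # 4.2.5 Yada Yada Yada
print(p.suffixes)   # ['.2', '.5 Yada Yada Yada', '.md']

# Joining every "suffix" treats the section number as part of the extension.
everything_after_first_dot = "".join(p.suffixes)
print(everything_after_first_dot)   # .2.5 Yada Yada Yada.md
```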

ChrisA


Re: [Python-ideas] pathlib suggestions

2017-01-24 Thread Ryan Gonzalez
As another suggestion, I'd love an rmtree method analogous to shutil.rmtree.
And maybe also a remove method, that basically does:

  

  

if path.is_dir():
    path.rmtree()
else:
    path.unlink()
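
Since Path has no rmtree method at present, a runnable approximation has to go through shutil. The following is only a sketch, with remove as a free function and one extra assumption: a symlink that points at a directory is unlinked rather than followed, because shutil.rmtree refuses to operate on symlinks.

```python
import shutil
from pathlib import Path


def remove(path):
    """Delete `path`, whether it is a file, a symlink or a directory tree."""
    path = Path(path)
    if path.is_dir() and not path.is_symlink():
        shutil.rmtree(path)   # real directory: remove it and its contents
    else:
        path.unlink()         # file or symlink: remove just this entry
```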

  

  
-- 
Ryan (ライアン)

Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else



  
On Jan 24 2017, at 2:32 pm, Todd wrote:

> I have been using pathlib, and I have come up with a few suggestions on what
> would make the module more useful for me (and hopefully others):
>
> First, for me, extensions are primarily useful as a single unit.  So,
> practically speaking, the extension of "spam.tar.gz" isn't ".gz", it is
> ".tar.gz".  So it would be nice to have some properties to make it easier to
> deal with the "complete" extension like this.  There is a "suffixes" property,
> but it returns a list, which you then have to recombine manually.  And as far
> as I can tell there is no method to return the name without any extension.
> And there is no method for replacing all the extensions at once.
>
> So although the names are tentative, perhaps there could be a "fullsuffix"
> property to return the extensions as a single string, a "nosuffix" extension
> to return the path without any extensions, and a "with_suffixes" method that
> replaces all the suffix and can accept multiple arguments (which would then be
> joined to create the extensions).
>
> Second, for methods like "rename" and "replace", it would be nice if there was
> an "exist_ok" argument that defaults to "True" to allow for safe renaming.
>
> Third, it would be nice if there was a "uid" and "gid" method for getting the
> numeric user and group IDs for a file, or alternatively a "numeric" argument
> for the "owner" and "group" methods.
>
> Fourth, for the "is_*" methods, it would be nice if there was a "strict"
> argument that would raise an exception if the file or directory doesn't exist.
>
> Finally, although not problem with the module per se, the example for the
> "parts" property should probably show at least one file with an extension, to
> make it clear how it deals with extensions (since the documentation is
> ambiguous in this regard).
>
> Thanks for your time.


[Python-ideas] pathlib suggestions

2017-01-24 Thread Todd
I have been using pathlib, and I have come up with a few suggestions on
what would make the module more useful for me (and hopefully others):

First, for me, extensions are primarily useful as a single unit.  So,
practically speaking, the extension of "spam.tar.gz" isn't ".gz", it is
".tar.gz".  So it would be nice to have some properties to make it easier
to deal with the "complete" extension like this.  There is a "suffixes"
property, but it returns a list, which you then have to recombine
manually.  And as far as I can tell there is no method to return the name
without any extension.  And there is no method for replacing all the
extensions at once.

So although the names are tentative, perhaps there could be a "fullsuffix"
property to return the extensions as a single string, a "nosuffix"
property to return the path without any extensions, and a "with_suffixes"
method that replaces all the suffixes and can accept multiple arguments
(which would then be joined to create the extensions).

Second, for methods like "rename" and "replace", it would be nice if there
was an "exist_ok" argument that defaults to "True" to allow for safe
renaming.

Third, it would be nice if there was a "uid" and "gid" method for getting
the numeric user and group IDs for a file, or alternatively a "numeric"
argument for the "owner" and "group" methods.
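
For context, the numeric IDs are already reachable through "stat", so the proposed methods would be thin wrappers. A minimal sketch, with uid and gid as hypothetical free functions:

```python
from pathlib import Path


def uid(path):
    """Numeric user ID of the file's owner (hypothetical helper)."""
    return Path(path).stat().st_uid


def gid(path):
    """Numeric group ID of the file's group (hypothetical helper)."""
    return Path(path).stat().st_gid
```

Unlike "owner" and "group", these need no lookup in the user or group database, so they also work when an ID has no name attached.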

Fourth, for the "is_*" methods, it would be nice if there was a "strict"
argument that would raise an exception if the file or directory doesn't
exist.

Finally, although not a problem with the module per se, the example for the
"parts" property should probably show at least one file with an extension,
to make it clear how it deals with extensions (since the documentation is
ambiguous in this regard).

Thanks for your time.