Re: [Python-Dev] PEP 3121, 384 Refactoring Issues

2014-07-15 Thread Nick Coghlan
On 14 Jul 2014 11:41, "Brett Cannon"  wrote:
>
>
> I agree for PEP 3121, which is the initialization/finalization work. The
> stable ABI is not necessary. So maybe we should re-examine the patches and
> accept the bits that clean up init/finalization and leave out any
> ABI-related changes.

Martin's right about improving the subinterpreter support - every type
declaration we move from a static struct to the dynamic type creation API
is one that isn't shared between subinterpreters any more.

That argument is potentially valid even for *builtin* modules and types,
not just those in extension modules.

Cheers,
Nick.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-15 Thread Nick Coghlan
On 14 Jul 2014 22:50, "Ben Hoyt"  wrote:
>
> In light of that, I propose I update the PEP to basically follow
> Victor's model of is_X() and stat() following symlinks by default, and
> allowing you to specify follow_symlinks=False if you want something
> other than that.
>
> Victor had one other question:
>
> > What happens to name and full_name with followlinks=True?
> > Do they contain the name in the directory (name of the symlink)
> > or name of the linked file?
>
> I would say they should contain the name and full path of the entry --
> the symlink, NOT the linked file. They kind of have to, right,
> otherwise they'd have to be method calls that potentially call the
> system.

It would be worth explicitly pointing out "os.readlink(entry.full_name)" in
the docs as the way to get the target of a symlink entry.
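
A minimal sketch of that suggestion, written against the os.scandir() that later shipped in Python 3.5, where the attribute discussed here as full_name ended up being spelled `path`. The helper name symlink_targets and the file names are made up for illustration, and os.symlink() may need extra privileges on Windows.

```python
import os
import tempfile

def symlink_targets(directory):
    """Map each symlink entry's name to the target it points at."""
    targets = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_symlink():
                # entry.path names the link itself, so readlink() is
                # needed to find out where it points.
                targets[entry.name] = os.readlink(entry.path)
    return targets

with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "real.txt")
    open(real, "w").close()
    os.symlink(real, os.path.join(tmp, "link.txt"))
    found = symlink_targets(tmp)
```

Only the symlink shows up in the result; the regular file is skipped.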

Alternatively, it may be worth including a readlink() method directly on
the entry objects. (That can easily be added later though, so no need for
it in the initial proposal).

>
> In any case, here's the modified proposal:
>
> scandir(path='.') -> generator of DirEntry objects, which have:
>
> * name: name as per listdir()
> * full_name: full path name (not necessarily absolute), equivalent of
> os.path.join(path, entry.name)
> * is_dir(follow_symlinks=True): like os.path.isdir(entry.full_name),
> but free in most cases; cached per entry
> * is_file(follow_symlinks=True): like os.path.isfile(entry.full_name),
> but free in most cases; cached per entry
> * is_symlink(): like os.path.islink(), but free in most cases; cached per
> entry
> * stat(follow_symlinks=True): like os.stat(entry.full_name,
> follow_symlinks=follow_symlinks); cached per entry
>
> The above may not be quite perfect, but it's good, and I think there's
> been enough bike-shedding on the API. :-)

+1, sounds good to me (and I like having the caching guarantees listed -
helps make it clear how DirEntry differs from pathlib.Path)
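
For illustration, here is roughly how the proposed methods read in use. This is a sketch against the API as released in Python 3.5 and later (the `with` form needs 3.6); the function name subdirs_and_sizes and the sample files are hypothetical.

```python
import os
import tempfile

def subdirs_and_sizes(directory):
    """Return (sorted subdirectory names, total size of regular files)."""
    dirs, total = [], 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # follow_symlinks=False inspects the entry itself.
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            elif entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return sorted(dirs), total

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "data.bin"), "wb") as f:
        f.write(b"x" * 10)
    result = subdirs_and_sizes(tmp)
```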

Cheers,
Nick.

>
> So please speak now or forever hold your peace. :-) I intend to update
> the PEP to reflect this and make a few other clarifications in the
> next few days.
>
> -Ben


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-15 Thread Ben Hoyt
> Looks doable.  Just make sure the cached entries reflect the
> 'follow_symlinks' setting -- so a symlink could end up with both an lstat
> cached entry and a stat cached entry.

Yes, good point -- basically the functions will use the _stat cache if
follow_symlinks=True, otherwise the _lstat cache. If the entry is not
a symlink (the usual case), they'll be the same value.
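
A rough pure-Python sketch of that two-cache idea. CachedEntry is a hypothetical stand-in, not the real DirEntry; the attribute names _stat and _lstat follow the wording above, and the real implementation may differ.

```python
import os
import tempfile
from stat import S_ISLNK

class CachedEntry:
    """Hypothetical stand-in for DirEntry with separate stat/lstat caches."""

    def __init__(self, directory, name):
        self.name = name
        self.full_name = os.path.join(directory, name)
        self._stat = None    # result with symlinks followed
        self._lstat = None   # result for the entry itself

    def stat(self, *, follow_symlinks=True):
        if self._lstat is None:
            self._lstat = os.lstat(self.full_name)
        if not follow_symlinks:
            return self._lstat
        if self._stat is None:
            if S_ISLNK(self._lstat.st_mode):
                self._stat = os.stat(self.full_name)
            else:
                # Not a symlink: both caches hold the same value.
                self._stat = self._lstat
        return self._stat

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "real.txt"), "w").close()
    os.symlink(os.path.join(tmp, "real.txt"), os.path.join(tmp, "link"))
    link = CachedEntry(tmp, "link")
    plain = CachedEntry(tmp, "real.txt")
    link_is_link = S_ISLNK(link.stat(follow_symlinks=False).st_mode)
    target_is_link = S_ISLNK(link.stat().st_mode)
    shared = plain.stat() is plain.stat(follow_symlinks=False)
```

For the symlink the two caches differ; for the plain file they are the same object.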

-Ben


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-15 Thread Ben Hoyt
> Sorry, I don't remember who but someone proposed to add the follow_symlinks
> parameter in scandir()  directly. If the parameter is added to methods,
> there is no such issue.

Yeah, I think having the DirEntry methods do different things
depending on how scandir() was called is a really bad idea. It seems
you're agreeing with this?

> Again: remove any guarantee about the cache in the definitions of methods,
> instead copy the doc from os.path and os. Add a global remark saying that
> most methods don't need any syscall in general, except for symlinks (with
> follow_symlinks=True).

I'm not sure I follow this -- surely it *has* to be documented that
the values of DirEntry.is_X() and DirEntry.stat() are cached per
entry, in contrast to os.path.isX()/os.stat()?

I don't mind a global remark about not needing syscalls, but I do
think it makes sense to make it explicit -- that is_X() almost never
needs a syscall, whereas stat() needs one on POSIX but is free on
Windows (except for symlinks).

-Ben


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-15 Thread Ben Hoyt
> I'd *keep DirEntry.lstat() method* regardless of existence of
> .stat(*, follow_symlinks=True) method (despite the slight violation of
> DRY principle) for readability. `dir_entry.lstat().st_mode` is more
> concise than `dir_entry.stat(follow_symlinks=False).st_mode` and the
> meaning of lstat is well-established -- get (symbolic link) status [2].

The meaning of lstat() is well-established, so I don't mind this. But
I don't think it's necessary, either. My thought would be that in new
code/functions we should kind of prescribe best-practices rather than
leave the options open. Yes, it's a few more characters, but
"follow_symlinks=True" is a lot clearer than "l" to describe this
behaviour, especially for non-Linux hackers.

> I suggest *renaming .full_name -> .path* due to reasons outlined in [1].
>
> [1]: https://mail.python.org/pipermail/python-dev/2014-July/135441.html

Hmmm, perhaps. You suggest .full_name implies it's the absolute path,
which isn't true. I don't mind .path, but it kind of sounds like "the
Path object associated with this entry". I think "full_name" is fine
-- it's not "abs_name".

> follow_symlinks (if added) should be *keyword-only parameter* because
> `dir_entry.is_dir(False)` is unreadable (it is not clear at a glance
> what `False` means in this case).

Agreed follow_symlinks should be a keyword-only parameter (as it is in
os.stat() in Python 3).
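
A small sketch of what keyword-only means in practice. The Entry class is hypothetical and exists only to show the signature; the bare `*` is the standard Python 3 syntax for this.

```python
class Entry:
    """Hypothetical entry type used only to illustrate the signature."""

    def __init__(self, is_directory):
        self._is_directory = is_directory

    def is_dir(self, *, follow_symlinks=True):
        # The bare * makes follow_symlinks keyword-only.
        return self._is_directory

entry = Entry(True)
explicit = entry.is_dir(follow_symlinks=False)   # clear at the call site

try:
    entry.is_dir(False)                          # positional use is rejected
except TypeError as exc:
    positional_error = exc
else:
    positional_error = None
```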

> Exceptions are part of the public API. pathlib is inconsistent with
> os.path here e.g., os.path.isdir() ignores all OS errors raised by
> the stat() call but the corresponding pathlib call ignores only broken
> symlinks (non-existent entries).
>
> The cherry-picking of which stat errors to silence (implicitly) seems
> worse than either silencing the errors (like os.path.isdir does) or
> allowing them to propagate.

Hmmm, you're right there's a subtle difference here. I think the
os.path.isdir() behaviour could mask real errors, and the pathlib
behaviour is more correct. pathlib's behaviour is not implicit though
-- it's clearly documented in the docs:
https://docs.python.org/3/library/pathlib.html#pathlib.Path.is_dir

> Returning False instead of raising OSError in is_dir() method simplifies
> the usage greatly without (much) negative consequences. It is a *rare*
> case when silencing errors could be more practical.

I think is_X() *should* fail if there are permissions errors or other
fatal errors. Whether or not they should fail if the file doesn't
exist (unlikely to happen anyway) or on a broken symlink is a
different question, but there's a good precedent with the existing
os/pathlib functions there.
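
To make the contrast concrete, a short sketch of the existing behaviour for a missing path. It only covers the "does not exist" case; permission errors are harder to reproduce portably and are left out. The path name is made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "no-such-entry")

    # os.path.isdir() swallows the OSError from stat() and reports False.
    quiet_result = os.path.isdir(missing)

    # Calling os.stat() directly lets the error propagate.
    try:
        os.stat(missing)
    except FileNotFoundError as exc:
        raised = exc
    else:
        raised = None
```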

-Ben


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-15 Thread Paul Moore
On 15 July 2014 13:19, Ben Hoyt  wrote:
> Hmmm, perhaps. You suggest .full_name implies it's the absolute path,
> which isn't true. I don't mind .path, but it kind of sounds like "the
> Path object associated with this entry". I think "full_name" is fine
> -- it's not "abs_name".

Interesting. I hadn't really thought about it, but I might have
assumed full_name was absolute. However, now I see that it's "only as
absolute as the directory argument to scandir is". Having said that, I
don't think that full_name *implies* that, just that it's a possible
mistake people could make. I agree that "path" could be seen as
implying a Path object.

My preference would be to retain the name full_name, but just make it
explicit in the documentation that it is based on the directory name
argument.
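
A quick sketch of "only as absolute as the directory argument", using the API as it was eventually released (where full_name is spelled `path`). The file name is illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()
    old_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        # Relative argument in, relative paths out.
        with os.scandir(".") as entries:
            relative = [e.path for e in entries]
        # Absolute argument in, absolute paths out.
        with os.scandir(os.path.abspath(".")) as entries:
            absolute = [e.path for e in entries]
    finally:
        os.chdir(old_cwd)
```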

Paul


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-15 Thread Ethan Furman

On 07/14/2014 11:25 PM, Victor Stinner wrote:
>
> Again: remove any guarantee about the cache in the definitions of methods,
> instead copy the doc from os.path and os. Add a global remark saying that
> most methods don't need any syscall in general, except for symlinks (with
> follow_symlinks=True).


I don't understand what you're saying here.  The fact that DirEntry.is_xxx will use cached values *must* be documented, 
or our users will waste huge amounts of time trying to figure out why an unknowingly cached value is no longer matching 
the current status.
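
A sketch of the kind of surprise meant here, using os.scandir() as released in Python 3.5 and later. The file name and sizes are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "log.txt")
    with open(target, "w") as f:
        f.write("a")

    with os.scandir(tmp) as entries:
        entry = next(entries)
        cached_before = entry.stat().st_size   # first call fills the cache

    with open(target, "a") as f:
        f.write("bcd")                         # file grows to 4 bytes

    cached_after = entry.stat().st_size        # still the cached value
    fresh = os.stat(target).st_size            # a new system call
```

The entry keeps reporting the old size until a fresh os.stat() (or a new scandir() pass) is made.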


~Ethan~


Re: [Python-Dev] Another case for frozendict

2014-07-15 Thread Russell E. Owen
In article 
,
 Chris Angelico  wrote:

> On Mon, Jul 14, 2014 at 12:04 AM, Jason R. Coombs  wrote:
> > I can achieve what I need by constructing a set on the ‘items’ of the 
> > dict.
> >
>  set(tuple(doc.items()) for doc in res)
> >
> > {(('n', 1), ('err', None), ('ok', 1.0))}
> 
> This is flawed; the tuple-of-tuples depends on iteration order, which
> may vary. It should be a frozenset of those tuples, not a tuple. Which
> strengthens your case; it's that easy to get it wrong in the absence
> of an actual frozendict.

I would love to see frozendict in python.
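
The iteration-order problem Chris describes can be seen in a short sketch. The two sample dicts are invented; they hold the same items, inserted in a different order.

```python
doc_a = {"n": 1, "err": None, "ok": 1.0}
doc_b = {"ok": 1.0, "n": 1, "err": None}   # same items, different order

# Tuples compare element by element, so order matters.
by_tuple = {tuple(doc.items()) for doc in (doc_a, doc_b)}

# Frozensets ignore order, so equal dicts collapse to one key.
by_frozenset = {frozenset(doc.items()) for doc in (doc_a, doc_b)}
```

The tuple version keeps two "different" documents; the frozenset version keeps one. Both approaches assume the dict values are hashable.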

I find myself using dicts for translation tables, usually tables that 
should not be modified. Documentation usually suffices to get that idea 
across, but it's not ideal.

frozendict would also be handy as a default value for function 
arguments. In that case documentation isn't enough and one has to resort 
to using a default value of None and then changing it in the function 
body.
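
For comparison, a sketch of the None workaround next to a read-only default built with types.MappingProxyType (in the standard library since Python 3.3). This is only an approximation of a frozendict: the proxy is not hashable, and whoever holds the underlying dict can still change it. The function names and table contents are invented.

```python
from types import MappingProxyType

def translate_with_none(word, table=None):
    # The usual workaround: None stands in for the real default.
    if table is None:
        table = {"hello": "bonjour"}
    return table.get(word, word)

# A read-only view can be used as the default directly.
_DEFAULT_TABLE = MappingProxyType({"hello": "bonjour"})

def translate(word, table=_DEFAULT_TABLE):
    return table.get(word, word)

try:
    _DEFAULT_TABLE["hello"] = "hola"   # writes through the proxy fail
except TypeError as exc:
    write_error = exc
else:
    write_error = None
```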

I like frozendict because I feel it is expressive and adds some safety. 

-- Russell



Re: [Python-Dev] Another case for frozendict

2014-07-15 Thread MRAB

On 2014-07-16 00:48, Russell E. Owen wrote:
> In article
> ,
>   Chris Angelico  wrote:
>
> > On Mon, Jul 14, 2014 at 12:04 AM, Jason R. Coombs  wrote:
> > > I can achieve what I need by constructing a set on the ‘items’ of the
> > > dict.
> > >
> >  set(tuple(doc.items()) for doc in res)
> > >
> > > {(('n', 1), ('err', None), ('ok', 1.0))}
> >
> > This is flawed; the tuple-of-tuples depends on iteration order, which
> > may vary. It should be a frozenset of those tuples, not a tuple. Which
> > strengthens your case; it's that easy to get it wrong in the absence
> > of an actual frozendict.
>
> I would love to see frozendict in python.
>
> I find myself using dicts for translation tables, usually tables that
> should not be modified. Documentation usually suffices to get that idea
> across, but it's not ideal.
>
> frozendict would also be handy as a default value for function
> arguments. In that case documentation isn't enough and one has to resort
> to using a default value of None and then changing it in the function
> body.
>
> I like frozendict because I feel it is expressive and adds some safety.


Here's another use-case.

Using the 're' module:

>>> import re
>>> # Make a regex.
... p = re.compile(r'(?P<first>\w+)\s+(?P<second>\w+)')
>>>
>>> # What are the named groups?
... p.groupindex
{'first': 1, 'second': 2}
>>>
>>> # Perform a match.
... m = p.match('FIRST SECOND')
>>> m.groupdict()
{'first': 'FIRST', 'second': 'SECOND'}
>>>
>>> # Try modifying the pattern object.
... p.groupindex['JUNK'] = 'foobar'
>>>
>>> # What are the named groups now?
... p.groupindex
{'first': 1, 'second': 2, 'JUNK': 'foobar'}
>>>
>>> # And the match object?
... m.groupdict()
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: no such group

It can't find a named group called 'JUNK'.

And with a bit more tinkering it's possible to crash Python. (I'll
leave that as an exercise for the reader! :-))

The 'regex' module, on the other hand, rebuilds the dict each time:

>>> import regex
>>> # Make a regex.
... p = regex.compile(r'(?P<first>\w+)\s+(?P<second>\w+)')
>>>
>>> # What are the named groups?
... p.groupindex
{'second': 2, 'first': 1}
>>>
>>> # Perform a match.
... m = p.match('FIRST SECOND')
>>> m.groupdict()
{'second': 'SECOND', 'first': 'FIRST'}
>>>
>>> # Try modifying the regex.
... p.groupindex['JUNK'] = 'foobar'
>>>
>>> # What are the named groups now?
... p.groupindex
{'second': 2, 'first': 1}
>>>
>>> # And the match object?
... m.groupdict()
{'second': 'SECOND', 'first': 'FIRST'}

Using a frozendict instead would be a nicer solution.
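
As a point of comparison, and to the best of my knowledge, CPython releases from 3.5 onward expose groupindex as a read-only mapping view, so the assignment shown above is rejected there. A sketch under that assumption:

```python
import re

p = re.compile(r'(?P<first>\w+)\s+(?P<second>\w+)')
m = p.match('FIRST SECOND')

try:
    p.groupindex['JUNK'] = 'foobar'
except TypeError as exc:
    # Read-only mappings reject item assignment.
    rejected = exc
else:
    rejected = None

# The pattern's group table and the match are unaffected.
names = dict(p.groupindex)
groups = m.groupdict()
```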



Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-15 Thread Cameron Simpson

I was going to stay out of this one...

On 14Jul2014 10:25, Victor Stinner  wrote:
> 2014-07-14 4:17 GMT+02:00 Nick Coghlan :
> > Or the ever popular symlink to "." (or a directory higher in the tree).
>
> "." and ".." are explicitly ignored by os.listdir() and os.scandir().
>
> > I think os.walk() is a good source of inspiration here: call the flag
> > "followlinks" and default it to False.

I also think followlinks should be spelt like os.walk, and also default to
False.

> IMO the specific function os.walk() is not a good example. It includes
> symlinks to directories in the dirs list and then it does not follow
> symlink,

I agree that is a bad mix.

> it is a recursive function and has a followlinks optional
> parameter (default: False).

Which I think is desirable.

> Moreover, in 92% of cases, functions using os.listdir() and
> os.path.isdir() *follow* symlinks:
> https://mail.python.org/pipermail/python-dev/2014-July/135435.html


Sigh.

This is a historic artifact, a convenience, and a side effect of bringing
symlinks into UNIX in the first place.


The objective was that symlinks should largely be transparent to users for 
naive operation. So the UNIX calls open/cd/listdir all follow symlinks so that 
things work transparently and a million C programs do not break. 

However, so do chmod/chgrp/chown, for the same reasons and with generally less 
desirable effects.


Conversely, the find command, for example, does not follow symlinks and this is 
generally a good thing. "ls" is the same. Like os.walk, they are for inspecting 
stuff, and shouldn't indirect unless asked.


I think following symlinks, especially for something like os.walk and 
os.scandir, should default to False. I DO NOT want to quietly wander to remote 
parts of the file space because someone has stuck a symlink somewhere 
unfortunate, lurking like a little bomb (or perhaps trapdoor, waiting to suck 
me down into an unexpected dark place).


It is also slower to follow symlinks by default.

I am also against flag parameters that default to True, on the whole; they are 
a failure of ergonomic design. Leaving off a flag should usually be like 
setting it to False. A missing flag is an "off" flag.


For these reasons (and others I have not yet thought through:-) I am voting for 
a:


  followlinks=False

optional parameter.

If you want to follow links, it is hardly difficult.
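
To show what the os.walk() default protects against, a sketch with a symlink pointing back up the tree. The directory names are invented, and os.symlink() may need extra privileges on Windows.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, "a")
    os.mkdir(sub)
    # A symlink pointing back up the tree, the "little bomb" case.
    os.symlink(tmp, os.path.join(sub, "loop"), target_is_directory=True)

    visited = []
    listed = {}
    for root, dirs, files in os.walk(tmp):   # followlinks defaults to False
        rel = os.path.relpath(root, tmp)
        visited.append(rel)
        listed[rel] = sorted(dirs)
```

The link is reported in the dirs list of "a" (the mix Victor objects to), but the walk never descends into it, so it terminates. Passing followlinks=True here would recurse through the loop until the OS refuses the path.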

Cheers,
Cameron Simpson 

Our job is to make the questions so painful that the only way to make the
pain go away is by thinking. - Fred Friendly