Re: [Distutils] Entry points: specifying and caching

2017-10-27 Thread Nathaniel Smith
On Fri, Oct 27, 2017 at 5:34 AM, Nick Coghlan  wrote:
> On 27 October 2017 at 18:10, Nathaniel Smith  wrote:
>>
>> On Thu, Oct 26, 2017 at 9:02 PM, Nick Coghlan  wrote:
>> > Option 2: temporary (or persistent) per-user-session cache
>> >
>> > * Pro: only the first query per path entry per user session incurs a
>> > linear DB read
>> > * Pro: given persistent cache dirs (e.g. XDG_CACHE_HOME, ~/.cache) even
>> > that overhead can be avoided
>> > * Pro: sys.path directory mtimes are sufficient for cache invalidation
>> > (subject to filesystem timestamp granularity)
>>
>> Timestamp granularity is a solvable problem. You just have to be
>> careful not to write out the cache unless the directory mtime is
>> sufficiently far in the past, like 10 seconds old, say. (This is an
>> old trick that VCSes use to make commands like 'git status'
>> fast-and-reliable.)
>
>
> Yeah, we just recently fixed a bug related to that in pyc file caching. (If
> you managed to modify and reload a source file multiple times in the same
> second, we could end up missing the later edits. The fix was to check that
> the source timestamp didn't match the current timestamp before actually
> updating the cached copy on the filesystem.)
>
>>
>> This does mean you can get in a weird state where if the directory
>> mtime somehow gets set to the future, then start time starts sucking
>> because caching goes away.
>
>
> For pyc files, we're able to avoid that by looking for cache *inconsistency*
> without making any assumptions about which direction time moves - as long as
> the source timestamp recorded in the pyc file doesn't match the source
> file's mtime, we'll refresh the cache.
>
> This is necessary to cope with things like version controlled directories,
> where directory mtimes can easily go backwards because you switched branches
> or reverted to an earlier version.

Yeah, this is a good idea, but it doesn't address the reason why some
systems refuse to update their caches when they see mtimes in the
future. The motivation there is that if the mtime is in the future,
then it's possible that at some point in the future, the mtime will
match the current time, and then if the directory is modified at that
moment, the cache will become silently invalid.

It's not clear how important this really is; you have to get somewhat
unlucky, and if you're seeing timestamps from the future then
timekeeping has obviously broken down somehow and nothing based on
mtimes can be reliable without reliable timekeeping. (For example,
even if the mtime seems to be in the past, the clock could get set
backwards and now the same mtime is in the future after all.) But
that's the reasoning I've seen.

> The os module has atomic write support on Windows in 3.x now:
> https://docs.python.org/3/library/os.html#os.replace
>
> So the only problematic case is 2.7 on Windows, and for that Christian
> Heimes backported pyosreplace here: https://pypi.org/project/pyosreplace/
>
> (The "may be non-atomic" case is the same situation where it will fail
> outright on POSIX systems: when you're attempting to do the rename across
> filesystems. If you stay within the same directory, which you want to do
> anyway for permissions inheritance and automatic file labeling, it's
> atomic).

I've never been able to tell whether this is trustworthy or not; MS
documents the rename-across-filesystems case as an *example* of a case
where it's non-atomic, and doesn't document any atomicity guarantees
either way. Is it really atomic on FAT filesystems? On network
filesystems? (Do all versions of CIFS even give a way to express file
replacement as a single operation?) But there's folklore saying it's
OK...

I guess in this case atomicity wouldn't be that crucial anyway though.

>> > Option 3: persistent per-path-entry cache
>> >
>> > * Pro: assuming cache freshness means zero runtime queries incur a
>> > linear DB read (cache creation becomes an install time cost)
>> > * Con: if you don't assume cache freshness, you need option 1 or 2
>> > anyway, and the install time cache just speeds up that first linear read
>> > * Con: filesystem access control requires either explicit cache refresh
>> > or implicit metadata caching support in installers
>> > * Con: sys.path directory mtimes are no longer sufficient for cache
>> > invalidation (due to potential for directory relocation)
>>
>> Not sure what problem you're thinking of here? In this model we
>> wouldn't be using mtimes for cache invalidation anyway, because it'd
>> be the responsibility of those modifying the directory to update the
>> cache. And if you rename a whole directory, that doesn't affect its
>> mtime anyway?
>
>
> Your second sentence is what I meant - whether the cache is still valid or
> not is less about the mtime, and more about what other actions have been
> performed. (It's much closer to the locate/updatedb model, where the runtime
> part just 

Re: [Distutils] Entry points: specifying and caching

2017-10-27 Thread Nick Coghlan
On 27 October 2017 at 18:10, Nathaniel Smith  wrote:

> On Thu, Oct 26, 2017 at 9:02 PM, Nick Coghlan  wrote:
> > Option 2: temporary (or persistent) per-user-session cache
> >
> > * Pro: only the first query per path entry per user session incurs a
> > linear DB read
> > * Pro: given persistent cache dirs (e.g. XDG_CACHE_HOME, ~/.cache) even
> > that overhead can be avoided
> > * Pro: sys.path directory mtimes are sufficient for cache invalidation
> > (subject to filesystem timestamp granularity)
>
> Timestamp granularity is a solvable problem. You just have to be
> careful not to write out the cache unless the directory mtime is
> sufficiently far in the past, like 10 seconds old, say. (This is an
> old trick that VCSes use to make commands like 'git status'
> fast-and-reliable.)
>

Yeah, we just recently fixed a bug related to that in pyc file caching. (If
you managed to modify and reload a source file multiple times in the same
second, we could end up missing the later edits. The fix was to check that
the source timestamp didn't match the current timestamp before actually
updating the cached copy on the filesystem.)


> This does mean you can get in a weird state where if the directory
> mtime somehow gets set to the future, then start time starts sucking
> because caching goes away.
>

For pyc files, we're able to avoid that by looking for cache
*inconsistency* without making any assumptions about which direction time
moves - as long as the source timestamp recorded in the pyc file doesn't
match the source file's mtime, we'll refresh the cache.

This is necessary to cope with things like version controlled directories,
where directory mtimes can easily go backwards because you switched
branches or reverted to an earlier version.
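
A minimal sketch of that inequality-based check, applied to a directory rather than a source file. The function name and the idea of storing the mtime in nanoseconds are assumptions made for illustration; this is not how importlib stores pyc metadata.

```python
import os


def cache_is_fresh(recorded_mtime_ns, directory):
    """Return True only if the recorded mtime exactly matches the current one.

    Any difference, earlier or later, counts as stale, so the check makes
    no assumption about which direction time moves.
    """
    try:
        current_mtime_ns = os.stat(directory).st_mtime_ns
    except OSError:
        # A missing or unreadable directory can never match the cache.
        return False
    return recorded_mtime_ns == current_mtime_ns
```

Because the comparison is `==` rather than `<=`, a directory whose mtime moved backwards after a branch switch is still treated as changed.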


>
> Note also that you'll want to explicitly write the observed directory
> mtime to the cache file, rather than comparing it to the cache file's
> mtime, to avoid the race condition where the directory gets modified
> just after we scan it but before we write out the cache.
>
> > * Pro: zero elevated privileges needed (cache would be stored in a
> > per-user directory tree)
> > * Con: interprocess locking likely needed to avoid the "thundering herd"
> > cache update problem [1]
>
> Interprocess filesystem locking is going to be far more painful than
> any problem it might solve. Seriously. At least on Unix, the right
> approach is to go ahead and regenerate the cache, and then atomically
> write it to the given place, and if someone else overwrites it a few
> milliseconds later then oh well.
>

Aye, limiting the handling for this to the use of atomic writes is likely
an entirely reasonable approach to take.


> I guess on Windows locking might be OK, given that it has no atomic
> writes and less gratuitously broken filesystem locking.


The os module has atomic write support on Windows in 3.x now:
https://docs.python.org/3/library/os.html#os.replace

So the only problematic case is 2.7 on Windows, and for that Christian
Heimes backported pyosreplace here: https://pypi.org/project/pyosreplace/

(The "may be non-atomic" case is the same situation where it will fail
outright on POSIX systems: when you're attempting to do the rename across
filesystems. If you stay within the same directory, which you want to do
anyway for permissions inheritance and automatic file labeling, it's
atomic).
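
A minimal sketch of the "write to a temporary file in the same directory, then os.replace()" pattern described above. The helper name and the temporary-file prefix are hypothetical, and the atomicity guarantee is whatever the underlying filesystem provides for a same-directory rename.

```python
import os
import tempfile


def atomic_write_bytes(path, data):
    """Write data to path so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-cache-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # os.replace overwrites an existing destination on POSIX and Windows.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

If two processes race, the last os.replace() wins and the earlier result is simply overwritten, which is the "oh well" outcome rather than a corrupted file.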

> But you'd
> still want to make sure you never block when acquiring the lock; if
> the lock is already taken because someone else is in the middle of
> updating the cache, then you need to fall back on doing a linear scan.
> This is explicitly *not* avoiding the thundering herd problem, because
> it's more important to avoid the "one process got stuck and now
> everyone else freezes on startup waiting for it" problem.
>

Fair point.


> > * Con: if a non-persistent storage location is used, zero benefit over an
> > in-memory cache for throwaway environments (e.g. container startup)
>
> You also have to be careful about whether you have a writeable storage
> location at all, and if so whether you have the right permissions. (It
> might be bad if 'sudo somescript.py' leaves me with root-owned cache
> files in /home/njs/.cache/.)
>
> Filesystems are just a barrel of fun.
>

C'mon, who doesn't enjoy debugging SELinux file labeling problems arising
from mounting symlinked host directories into Docker containers running as
root internally? :)


>
> > * Con: cost of the cache freshness check will still scale linearly with
> > the number of sys.path entries
> >
> > Option 3: persistent per-path-entry cache
> >
> > * Pro: assuming cache freshness means zero runtime queries incur a
> > linear DB read (cache creation becomes an install time cost)
> > * Con: if you don't assume cache freshness, you need option 1 or 2
> > anyway, and the install time cache just speeds up that first linear read
> > * Con: filesystem access control requires either explicit cache 

Re: [Distutils] Entry points: specifying and caching

2017-10-27 Thread Nathaniel Smith
On Thu, Oct 26, 2017 at 9:02 PM, Nick Coghlan  wrote:
> Option 2: temporary (or persistent) per-user-session cache
>
> * Pro: only the first query per path entry per user session incurs a linear
> DB read
> * Pro: given persistent cache dirs (e.g. XDG_CACHE_HOME, ~/.cache) even that
> overhead can be avoided
> * Pro: sys.path directory mtimes are sufficient for cache invalidation
> (subject to filesystem timestamp granularity)

Timestamp granularity is a solvable problem. You just have to be
careful not to write out the cache unless the directory mtime is
sufficiently far in the past, like 10 seconds old, say. (This is an
old trick that VCSes use to make commands like 'git status'
fast-and-reliable.)

This does mean you can get in a weird state where if the directory
mtime somehow gets set to the future, then start time starts sucking
because caching goes away.

Note also that you'll want to explicitly write the observed directory
mtime to the cache file, rather than comparing it to the cache file's
mtime, to avoid the race condition where the directory gets modified
just after we scan it but before we write out the cache.
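
A rough sketch of both points, under stated assumptions: the 10 second threshold is just the example figure above, `scan_entry_points` is a hypothetical stand-in for the real linear scan, and the JSON layout is invented for illustration.

```python
import json
import os
import time

SETTLE_SECONDS = 10  # assumed threshold, taken from the "10 seconds" example


def scan_entry_points(directory):
    """Hypothetical stand-in for the linear scan: just list the directory."""
    return sorted(os.listdir(directory))


def build_cache(directory, cache_path, now=None):
    """Scan directory and persist the result only if its mtime has settled.

    Returns (entries, wrote_cache).
    """
    # Observe the mtime *before* scanning, and store that exact value, so a
    # modification that lands after the scan makes the cache look stale.
    observed_mtime_ns = os.stat(directory).st_mtime_ns
    entries = scan_entry_points(directory)

    now = time.time() if now is None else now
    age = now - observed_mtime_ns / 1e9
    if age < SETTLE_SECONDS:
        # Too recent (or in the future): use the result but do not cache it.
        return entries, False

    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump({"dir_mtime_ns": observed_mtime_ns, "entries": entries}, f)
    return entries, True
```

A future mtime gives a negative age, so nothing is written; that is the "caching goes away" state described above.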

> * Pro: zero elevated privileges needed (cache would be stored in a per-user
> directory tree)
> * Con: interprocess locking likely needed to avoid the "thundering herd"
> cache update problem [1]

Interprocess filesystem locking is going to be far more painful than
any problem it might solve. Seriously. At least on Unix, the right
approach is to go ahead and regenerate the cache, and then atomically
write it to the given place, and if someone else overwrites it a few
milliseconds later then oh well.

I guess on Windows locking might be OK, given that it has no atomic
writes and less gratuitously broken filesystem locking. But you'd
still want to make sure you never block when acquiring the lock; if
the lock is already taken because someone else is in the middle of
updating the cache, then you need to fall back on doing a linear scan.
This is explicitly *not* avoiding the thundering herd problem, because
it's more important to avoid the "one process got stuck and now
everyone else freezes on startup waiting for it" problem.

> * Con: if a non-persistent storage location is used, zero benefit over an
> in-memory cache for throwaway environments (e.g. container startup)

You also have to be careful about whether you have a writeable storage
location at all, and if so whether you have the right permissions. (It
might be bad if 'sudo somescript.py' leaves me with root-owned cache
files in /home/njs/.cache/.)
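
One possible guard, sketched with a hypothetical helper name: only treat a cache directory as usable if it exists, is writable, and is owned by the user the process is running as. The ownership test relies on os.getuid(), which only exists on Unix-like platforms, so it is skipped elsewhere.

```python
import os


def usable_cache_dir(path):
    """Return True if path is an existing directory we own and can write to."""
    try:
        st = os.stat(path)
    except OSError:
        return False
    if not os.path.isdir(path):
        return False
    # Under 'sudo', the effective user is root but ~/.cache belongs to the
    # invoking user; refusing to write avoids leaving root-owned files there.
    if hasattr(os, "getuid") and st.st_uid != os.getuid():
        return False
    return os.access(path, os.W_OK | os.X_OK)
```

When this returns False, the caller would fall back to the uncached linear scan instead of failing.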

Filesystems are just a barrel of fun.

> * Con: cost of the cache freshness check will still scale linearly with the
> number of sys.path entries
>
> Option 3: persistent per-path-entry cache
>
> * Pro: assuming cache freshness means zero runtime queries incur a linear DB
> read (cache creation becomes an install time cost)
> * Con: if you don't assume cache freshness, you need option 1 or 2 anyway,
> and the install time cache just speeds up that first linear read
> * Con: filesystem access control requires either explicit cache refresh or
> implicit metadata caching support in installers
> * Con: sys.path directory mtimes are no longer sufficient for cache
> invalidation (due to potential for directory relocation)

Not sure what problem you're thinking of here? In this model we
wouldn't be using mtimes for cache invalidation anyway, because it'd
be the responsibility of those modifying the directory to update the
cache. And if you rename a whole directory, that doesn't affect its
mtime anyway?

> * Con: interprocess locking arguably still needed to avoid the "thundering
> herd" cache update problem (just between installers rather than runtime
> processes)

If two installers are trying to rearrange the same directory at the
same time then they can conflict in lots of ways. For the most part
people get away with it because doing multiple 'pip install' runs in
parallel is generally considered a Bad Idea and unlikely to happen by
accident; and if it is a problem then we should add locking anyway
(like dpkg and rpm already do), regardless of the cache update part.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Entry points: specifying and caching

2017-10-26 Thread Nick Coghlan
On 27 October 2017 at 01:45, Thomas Kluyver  wrote:

>
>
> On Thu, Oct 26, 2017, at 03:57 PM, Daniel Holth wrote:
>
> Would something as simple as a file per sys.path with the 'last modified
> by installer' date be helpful? You could check those to determine whether
> your cache was out of date.
>
>
> I wonder if we could use the directory mtime for this? It's only really
> useful if we can be confident that all installer tools will update it.
>

There are lots of options for this, and one thing worth keeping in mind is
compatibility with the "monolithic system package" model, where the entire
preconfigured virtual environment gets archived, and then dropped into
place on the target system. In such cases, filesystem level mtimes may
change *without* the entry point cache actually being out of date.

In that context, it's worth keeping in mind what the actual goals of the
cache will be:

1. The entry point cache should ideally reflect the state of installed
components in a given execution environment at the time of access. If this
is not true, installing a component may require explicit cache
invalidation/rebuilding to get things back to a consistent state (similar
to the way a call to importlib.invalidate_caches() is needed to reliably
see filesystem changes)
2. Checking for available entry points in a given group should be
consistently cheap (ideally O(1)), rather than scaling with the number of
packages installed or the number of sys.path entries

Given those goals, there are a number of different points in time where the
cache can be generated, each with different trade-offs between how reliably
fresh the cache is, and how frequently you have to rebuild the cache.

Option 1: in-memory cache

* Pro: consistent with the way importlib caches work
* Pro: automatically adjusts to sys.path changes
* Pro: will likely be needed regardless to handle per-path-entry caches
with other methods
* Con: every process incurs at least 1 linear DB read
* Con: zero pay-off if you only query one entry point group
* Con: requires explicit invalidation to pick up filesystem changes (but
can hook into importlib.invalidate_caches())
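
A minimal sketch of what option 1 could look like. The class name, the `scan` callback, and the "first path entry wins" rule for duplicate names are all assumptions for illustration; none of this is an existing API.

```python
import sys


class InMemoryEntryPointCache:
    """Per-process cache of entry points, keyed by sys.path entry."""

    def __init__(self, scan):
        # scan(path_entry) performs the linear read and returns
        # {group: {name: target}} for that one path entry.
        self._scan = scan
        self._by_path_entry = {}

    def entries_for(self, path_entry):
        if path_entry not in self._by_path_entry:
            self._by_path_entry[path_entry] = self._scan(path_entry)
        return self._by_path_entry[path_entry]

    def lookup(self, group, path=None):
        # Reading sys.path at call time means path changes are picked up
        # automatically; only new entries trigger a scan.
        results = {}
        for path_entry in (sys.path if path is None else path):
            for name, target in self.entries_for(path_entry).get(group, {}).items():
                results.setdefault(name, target)  # assumed: first entry wins
        return results

    def invalidate_caches(self):
        # Explicit invalidation is needed to notice filesystem changes.
        self._by_path_entry.clear()
```

Each path entry is scanned at most once per process until invalidate_caches() is called, which matches the "at least 1 linear DB read per process" cost listed above.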

Option 2: temporary (or persistent) per-user-session cache

* Pro: only the first query per path entry per user session incurs a linear
DB read
* Pro: given persistent cache dirs (e.g. XDG_CACHE_HOME, ~/.cache) even
that overhead can be avoided
* Pro: sys.path directory mtimes are sufficient for cache invalidation
(subject to filesystem timestamp granularity)
* Pro: zero elevated privileges needed (cache would be stored in a per-user
directory tree)
* Con: interprocess locking likely needed to avoid the "thundering herd"
cache update problem [1]
* Con: if a non-persistent storage location is used, zero benefit over an
in-memory cache for throwaway environments (e.g. container startup)
* Con: cost of the cache freshness check will still scale linearly with the
number of sys.path entries

Option 3: persistent per-path-entry cache

* Pro: assuming cache freshness means zero runtime queries incur a linear
DB read (cache creation becomes an install time cost)
* Con: if you don't assume cache freshness, you need option 1 or 2 anyway,
and the install time cache just speeds up that first linear read
* Con: filesystem access control requires either explicit cache refresh or
implicit metadata caching support in installers
* Con: sys.path directory mtimes are no longer sufficient for cache
invalidation (due to potential for directory relocation)
* Con: interprocess locking arguably still needed to avoid the "thundering
herd" cache update problem (just between installers rather than runtime
processes)

Given those trade-offs, I think it would probably make the most sense to
start out by exploring a combination of options 1 & 2, and then only
explore option 3 based on demonstrated performance problems with a
per-user-session caching model. My rationale for that is that even in an
image based "immutable infrastructure" deployment model, it's often
entirely feasible to preseed runtime caches as part of the build process,
and in cases where that *isn't* possible, you're likely also going to have
trouble generating per-path-entry caches.

Cheers,
Nick.

[1] https://en.wikipedia.org/wiki/Thundering_herd_problem

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-26 Thread Thomas Kluyver


On Thu, Oct 26, 2017, at 03:57 PM, Daniel Holth wrote:
> Would something as simple as a file per sys.path with the 'last
> modified by installer' date be helpful? You could check those to
> determine whether your cache was out of date.

I wonder if we could use the directory mtime for this? It's only really
useful if we can be confident that all installer tools will update it.


Re: [Distutils] Entry points: specifying and caching

2017-10-26 Thread Daniel Holth
I agree. The "malware" problem is really a "how do I understand which hooks
run in each environment" problem. The hooks could slow things down, or
confuse and frustrate people, in ways that are unrelated to any malicious
intent.

The caching could just be a more efficient, lossless representation of the
*.dist/egg-info data model.

Would something as simple as a file per sys.path with the 'last modified by
installer' date be helpful? You could check those to determine whether your
cache was out of date.

Another option would be to try to investigate whether the per-sys-path
operations that 'import x' has to do anyway can be cached and shared with
pkg_resources?

On Thu, Oct 26, 2017 at 8:21 AM Nick Coghlan  wrote:

> On 26 October 2017 at 18:33, Thomas Kluyver  wrote:
>
>> Nathaniel raises the point that it may be easier to convince other
>> package managers to regenerate an entry points cache than to call arbitrary
>> Python hooks on install.
>>
>
> At least for RPM, we have file triggers now, whereby system packages can
> register a hook to say "Any time another package touches a file under  of interest> I want to know about it".
>
> That means the exact semantics of any RPM integration would likely end up
> just living in a file trigger, so it wouldn't matter too much whether that
> trigger was "refresh these predefined caches" or "run any installed hooks
> based on the defined Python level metadata".
>
> However, I expect it would be much easier to define an "optionally export
> data for caching in a more efficient key value store" API than it would be
> to define an API for arbitrary pre-/post- [un]install hooks. In particular,
> a caching API is much easier to *repair*, since the "source of truth"
> remains the installation DB itself - the cache is just to speed up runtime
> lookups.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>


Re: [Distutils] Entry points: specifying and caching

2017-10-26 Thread Nick Coghlan
On 26 October 2017 at 18:33, Thomas Kluyver  wrote:

> Nathaniel raises the point that it may be easier to convince other package
> managers to regenerate an entry points cache than to call arbitrary Python
> hooks on install.
>

At least for RPM, we have file triggers now, whereby system packages can
register a hook to say "Any time another package touches a file under  I want to know about it".

That means the exact semantics of any RPM integration would likely end up
just living in a file trigger, so it wouldn't matter too much whether that
trigger was "refresh these predefined caches" or "run any installed hooks
based on the defined Python level metadata".

However, I expect it would be much easier to define an "optionally export
data for caching in a more efficient key value store" API than it would be
to define an API for arbitrary pre-/post- [un]install hooks. In particular,
a caching API is much easier to *repair*, since the "source of truth"
remains the installation DB itself - the cache is just to speed up runtime
lookups.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-26 Thread Thomas Kluyver
On Sat, Oct 21, 2017, at 07:59 AM, Nick Coghlan wrote:
> Yeah, here's the gist of what I had in mind regarding the malware
> problem (i.e. aiming to ensure we don't get all of setup.py's problems
> back again):
>
> - a package's own install hooks do *not* get called for that package
> - hooks only run by default inside a virtualenv as a regular user
> - outside a virtualenv, the default is "hooks don't get run at all"

This one would make caching much less useful for me, because I install a
lot of stuff with 'pip install --user'.

I'm not really sure how useful this protection is. A malicious
package can shadow common module names and command names, so once
it's installed, it has an excellent chance of getting to run code,
even without hooks. And virtualenvs are not a security boundary -
malware installed in a virtualenv is just as bad as malware installed
with --user.

Moving away from running 'setup.py' to install stuff protects us against
packages doing silly things like running pip in a subprocess, but it
provides very little protection against deliberately malicious packages.
If we're going to do package install hooks, let's not cripple them by
trying to introduce security that doesn't really achieve much.

Nathaniel raises the point that it may be easier to convince other
package managers to regenerate an entry points cache than to call
arbitrary Python hooks on install. I guess the key question here is: how
many other use cases can we see for package install/uninstall hooks, and
how would those work with other packaging systems?

Thomas


Re: [Distutils] Entry points: specifying and caching

2017-10-26 Thread Nathaniel Smith
On Fri, Oct 20, 2017 at 11:59 PM, Nick Coghlan  wrote:
> Yeah, here's the gist of what I had in mind regarding the malware problem
> (i.e. aiming to ensure we don't get all of setup.py's problems back again):
>
> - a package's own install hooks do *not* get called for that package

Doesn't that break the entry point caching use case that started this
whole discussion? When you first install the caching package, then it
has to immediately build the cache for the first time.

I don't really have the time or interest to dig into this (I know
there are legitimate use cases for entry points but I'm very wary of
any feature where package A starts doing something different because
package B was installed). But, I just wanted to throw out that I see
at least two reasons we might want to "bake in" the caching as part of
our PEPified metadata:

- if we do want to add "install hooks", then we need some way for a
package to declare it has an install hook and for pip-or-whoever to
find it. The natural way would be to use an entry point, which means
entry points are in some sense "more fundamental" than install hooks.

- in general, the only thing that can update an entry-point cache is
the package that's doing the install, at the time it runs. In
particular, consider an environment with some packages installed in
/usr, some in /usr/local, some in ~/.local/. Really you want one cache
in each location, and then to have dpkg/rpm responsible for updating
the /usr cache (this is something they're familiar with, it's
isomorphic to stuff like /etc/ld.so.cache), 'sudo pip' responsible for
updating the /usr/local cache, and 'pip --user' responsible for
updating the ~/.local/ cache. If we go the install hook route instead,
then when I do 'pip install --user entry_point_cacher' then there's no
way that it'll ever have the permissions to write to /usr, and maybe
not to /usr/local either depending on how you want to handle the
interaction between 'sudo pip' and ~/.local/ install hooks, so it
just... won't actually work as a caching tool. Similarly, it's
probably easier to convince conda to regenerate a single standard
entry point cache after installing a conda package, than it would be
to convince them to run generic wheel install hooks when not even
installing wheels.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Distutils] Entry points: specifying and caching

2017-10-25 Thread Nick Coghlan
On 21 October 2017 at 18:21, Nick Coghlan  wrote:

> On 21 October 2017 at 18:04, Wes Turner  wrote:
> > On Saturday, October 21, 2017, Nick Coghlan  wrote:
> >> I'm also going to file an issue on the setuptools issue tracker to make
> >> sure Jason is aware of what we're doing, and get his explicit OK with the
> >> idea of making the format a PyPA interoperability specification (if he
> >> isn't, we'll demote Thomas's document to being a guide for tool developers
> >> aiming for pkg_resources interoperability).
> >
> > What are the URIs for this PR and issue?
>
> New setuptools issue: https://github.com/pypa/setuptools/issues/1179 (I
> hadn't filed it yet when I wrote the previous comment)
> Thomas's PR: https://github.com/pypa/python-packaging-user-guide/pull/390
>

With Jason's +1 on the setuptools issue, I've gone ahead and hit the merge
button on Thomas's PR:
https://github.com/pypa/python-packaging-user-guide/commit/34c37f0e66821127a8cbe59fa1f33dca0cf20d97

The spec is now available here
https://packaging.python.org/specifications/entry-points/, and
clarifications and corrections can be submitted as follow-up PRs (as for
other PyPA specifications).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-21 Thread Nick Coghlan
On 21 October 2017 at 18:04, Wes Turner  wrote:
> On Saturday, October 21, 2017, Nick Coghlan  wrote:
>> I'm also going to file an issue on the setuptools issue tracker to make
>> sure Jason is aware of what we're doing, and get his explicit OK with the
>> idea of making the format a PyPA interoperability specification (if he
>> isn't, we'll demote Thomas's document to being a guide for tool developers
>> aiming for pkg_resources interoperability).
>
> What are the URIs for this PR and issue?

New setuptools issue: https://github.com/pypa/setuptools/issues/1179 (I
hadn't filed it yet when I wrote the previous comment)
Thomas's PR: https://github.com/pypa/python-packaging-user-guide/pull/390

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-21 Thread Wes Turner
On Saturday, October 21, 2017, Nick Coghlan  wrote:

> On 20 October 2017 at 23:42, Donald Stufft wrote:
>
>> On Oct 20, 2017, at 9:35 AM, Nick Coghlan wrote:
>> The interoperability spec is going to state that conflict resolution when
>> the same name within a group is declared by multiple packages is the
>> responsibility of the group consumer, so documenting the format should
>> actually improve this situation, since it allows for the development of
>> competing conflict resolution strategies in different runtime libraries.
>>
>> I think it makes it *worse*, because now the behavior isn’t just an
>> entrypoints weirdness, but now it changes based on which runtime library
>> you use (which isn’t something that end users are likely to have much
>> insight into) and it represents a footgun that package authors are unlikely
>> to be aware of. If mycoolentrypointslib comes out that is faster, but
>> changes some subtle behavior like this, it’ll break people, and that is
>> unlikely to be an effect that people expect to happen just because
>> they switched between two things both implementing the same standard.
>>
>> So effectively this means that not only is the fact you’re using
>> entrypoints part of your API, but now which entry point library you’re
>> using at runtime is now also part of your API.
>>
>
> The semantics of conflict resolution across different projects is a
> concern that mainly affects app developers with a large established plugin base,
> and even with pkg_resources the question of whether or not multiple
> projects re-using the same entrypoint name is a problem depends on how the
> application uses that information.
>
> With console_scripts and gui_scripts, name conflicts can definitely be a
> problem, since different projects will end up fighting over the same
> filename for their executable script wrapper.
>
> For other use cases (like some of the ones Doug described for stevedore),
> it's less of a concern, because the names never get collapsed into a single
> flat namespace the way script wrappers do.
>
> Cheers,
> Nick.
>
> P.S. Thanks for your comments on the PR - they're helping to make sure we
> accurately capture the status quo. I'm also going to file an issue on the
> setuptools issue tracker to make sure Jason is aware of what we're doing,
> and get his explicit OK with the idea of making the format a PyPA
> interoperability specification (if he isn't, we'll demote Thomas's document
> to being a guide for tool developers aiming for pkg_resources
> interoperability).
>

What are the URIs for this PR and issue?


>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Entry points: specifying and caching

2017-10-21 Thread Nick Coghlan
On 21 October 2017 at 05:26, Doug Hellmann  wrote:

> I would also like to compare the performance of a few approaches
> (1 file per sys.path hash using INI, JSON, and sqlite; one file per
> entry on sys.path using the same formats) using a significant number
> of plugins (~100?) before we decide.
>

If you can manage it, you'll want to run at least some of those tests with
the plugins and their metadata mounted via a network drive. When the import
system switched from multiple stat calls to cached os.listdir() lookups,
SSD and spinning disk imports received a minor speedup, but NFS imports
improved *dramatically* (folks reported order of magnitude improvements,
along the lines of startup times dropping from 2-3 seconds to 200-300 ms).

I'd expect to see a similar pattern here - inefficient file access patterns
can be tolerable with an SSD, and even spinning disks, but the higher
latency involved in accessing network drives will make you pay for it.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-21 Thread Nick Coghlan
On 20 October 2017 at 23:42, Donald Stufft  wrote:

> On Oct 20, 2017, at 9:35 AM, Nick Coghlan  wrote:
> The interoperability spec is going to state that conflict resolution when
> the same name within a group is declared by multiple packages is the
> responsibility of the group consumer, so documenting the format should
> actually improve this situation, since it allows for the development of
> competing conflict resolution strategies in different runtime libraries.
>
> I think it makes it *worse*, because now the behavior isn’t just an
> entrypoints weirdness, but now it changes based on which runtime library
> you use (which isn’t something that end users are likely to have much
> insight into) and it represents a footgun that package authors are unlikely
> to be aware of. If mycoolentrypointslib comes out that is faster, but
> changes some subtle behavior like this it’ll break people, but that is
> unlikely to be an effect that people expect to happen just because
> they switched between two things both implementing the same standard.
>
> So effectively this means that not only is the fact you’re using
> entrypoints part of your API, but now which entry point library you’re
> using at runtime is now also part of your API.
>

The semantics of conflict resolution across different projects is a concern
that mainly affects app developers with a large established plugin base, and
even with pkg_resources the question of whether or not multiple projects
re-using the same entrypoint name is a problem depends on how the
application uses that information.

With console_scripts and gui_scripts, name conflicts can definitely be a
problem, since different projects will end up fighting over the same
filename for their executable script wrapper.

For other use cases (like some of the ones Doug described for stevedore),
it's less of a concern, because the names never get collapsed into a single
flat namespace the way script wrappers do.
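
As a rough illustration of the distinction above, here is a minimal sketch of
how a consumer might detect names declared by more than one project within a
group. The `declared` mapping, the project names and the `find_name_conflicts`
helper are all hypothetical and invented for this example; they are not part
of pkg_resources or any other library.

```python
from collections import defaultdict


def find_name_conflicts(declared, group):
    """Return {name: [dist, ...]} for names in `group` declared by 2+ dists.

    `declared` maps distribution name -> {group: {name: object_reference}}.
    This layout is an assumption made for the example only.
    """
    owners = defaultdict(list)
    # Sort distributions so the reported order is deterministic.
    for dist, groups in sorted(declared.items()):
        for name in groups.get(group, {}):
            owners[name].append(dist)
    return {name: dists for name, dists in owners.items() if len(dists) > 1}


# Hypothetical metadata for two installed projects.
declared = {
    "project-a": {
        "console_scripts": {"frob": "a.cli:main"},
        "myapp.plugins": {"json": "a.plugins:JSON"},
    },
    "project-b": {
        "console_scripts": {"frob": "b.cli:main", "b-tool": "b.cli:tool"},
        "myapp.plugins": {"json": "b.plugins:JSON"},
    },
}

conflicts = find_name_conflicts(declared, "console_scripts")
```

For console_scripts a non-empty result means two projects want the same
wrapper filename. For a group like the made-up "myapp.plugins", the same
result is only a problem if the consuming application decides it is.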

Cheers,
Nick.

P.S. Thanks for your comments on the PR - they're helping to make sure we
accurately capture the status quo. I'm also going to file an issue on the
setuptools issue tracker to make sure Jason is aware of what we're doing,
and get his explicit OK with the idea of making the format a PyPA
interoperability specification (if he isn't, we'll demote Thomas's document
to being a guide for tool developers aiming for pkg_resources
interoperability).

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-21 Thread Nick Coghlan
On 21 October 2017 at 06:50, Daniel Holth  wrote:

> I like the idea of lifecycle hooks but I worry about the malware problem;
> would there need to be a blacklist / whitelist / disable system?
> (ignore-scripts=true is now a recommended part of anyone's npm
> configuration) That is why we have avoided any kind of (package specific)
> hooks to wheel. However hooks would be a very elegant way to avoid worrying
> about core pip functionality since it wouldn't be core functionality.
>

Yeah, here's the gist of what I had in mind regarding the malware problem
(i.e. aiming to ensure we don't get all of setup.py's problems back again):

- a package's own install hooks do *not* get called for that package
- hooks only run by default inside a virtualenv as a regular user
- outside a virtualenv, the default is "hooks don't get run at all"
- when running with elevated privileges, the default is "hooks don't get
run at all"

There are still some open questions with it (like what to do with hooks
defined in packages that get implicitly coinstalled as a dependency), and
having the default behaviour depend on both "venv or not" and "superuser or
not" may prove confusing, but it would avoid a number of the things we
dislike about install-time setup.py invocation.
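
To make the proposed defaults concrete, here is a small sketch of the decision
logic. It is only an illustration of the four rules listed above; the function
names, the `hooks_by_provider` mapping and the hook strings are hypothetical,
and no installer implements this interface.

```python
def hooks_enabled_by_default(in_virtualenv, elevated):
    """Apply the proposed defaults: hooks run only for a regular user in a venv."""
    if elevated:
        # Elevated privileges: hooks don't get run at all by default.
        return False
    # Outside a virtualenv: hooks don't get run at all by default.
    return in_virtualenv


def hooks_to_run(installing, hooks_by_provider, in_virtualenv, elevated):
    """Return the hooks to call when the package `installing` is installed.

    `hooks_by_provider` maps the name of the package that declared a hook to
    a list of hook references (an assumed layout for this example).
    """
    if not hooks_enabled_by_default(in_virtualenv, elevated):
        return []
    return [
        hook
        for provider, hooks in sorted(hooks_by_provider.items())
        if provider != installing  # a package's own hooks are not called for it
        for hook in hooks
    ]


# Hypothetical hook registrations.
registered = {
    "cache-tool": ["cache_tool.hooks:refresh"],
    "newpkg": ["newpkg.hooks:post_install"],
}
selected = hooks_to_run("newpkg", registered, in_virtualenv=True, elevated=False)
```

In a real installer the two booleans would have to be detected somehow (for
example by comparing sys.prefix with sys.base_prefix and checking the
effective user ID on POSIX); that detection is left out here on purpose.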

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Daniel Holth
I like the idea of lifecycle hooks but I worry about the malware problem;
would there need to be a blacklist / whitelist / disable system?
(ignore-scripts=true is now a recommended part of anyone's npm
configuration) That is why we have avoided any kind of (package specific)
hooks to wheel. However hooks would be a very elegant way to avoid worrying
about core pip functionality since it wouldn't be core functionality.

On Fri, Oct 20, 2017 at 4:41 PM Nathaniel Smith  wrote:

> On Oct 19, 2017 11:10, "Donald Stufft"  wrote:
>
>
> EXCEPT, for the fact that with the desire to cache things, it would be
> beneficial to “hook” into the lifecycle of a package install. However I
> know that there are other plugin systems out there that would like to also
> be able to do that (Twisted Plugins come to mind) and that I think outside
> of plugin systems, such a mechanism is likely to be useful in general for
> other cases.
>
> So here's a different idea that is a bit more ambitious but that I think is
> a better overall idea. Let entrypoints be a setuptools thing, and let's
> define some key lifecycle hooks during the installation of a package and
> some mechanism in the metadata to let other tools subscribe to those hooks.
> Then a caching layer could be written for setuptools entrypoints to make
> that faster without requiring standardization, but also a whole new, better
> plugin system could too, Twisted plugins could benefit, etc [1].
>
>
> In this hypothetical system, how do installers like pip find the list of
> hooks to call? By looking up an entrypoint? (Sorry if this was discussed
> downthread; I didn't see it but I admit I only skimmed.)
>
> -n
>


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Doug Hellmann
Excerpts from Nathaniel Smith's message of 2017-10-20 13:41:03 -0700:
> On Oct 19, 2017 11:10, "Donald Stufft"  wrote:
> 
> 
> EXCEPT, for the fact that with the desire to cache things, it would be
> beneficial to “hook” into the lifecycle of a package install. However I
> know that there are other plugin systems out there that would like to also
> be able to do that (Twisted Plugins come to mind) and that I think outside
> of plugin systems, such a mechanism is likely to be useful in general for
> other cases.
> 
> So here's a different idea that is a bit more ambitious but that I think is
> a better overall idea. Let entrypoints be a setuptools thing, and let's
> define some key lifecycle hooks during the installation of a package and
> some mechanism in the metadata to let other tools subscribe to those hooks.
> Then a caching layer could be written for setuptools entrypoints to make
> that faster without requiring standardization, but also a whole new, better
> plugin system could too, Twisted plugins could benefit, etc [1].

Having post-install and pre-uninstall hooks should be sufficient for
updating a cache, assuming the hook could be given enough information
about the thing being manipulated to probe for whatever data it
needs.

> In this hypothetical system, how do installers like pip find the list of
> hooks to call? By looking up an entrypoint? (Sorry if this was discussed
> downthread; I didn't see it but I admit I only skimmed.)

That's how I would expect it to work. Using setuptools most likely?
That would mean that other plugin systems would have to provide one
setuptools plugin to hook into the installer to build a lookup
cache, but the actual plugins wouldn't have to use setuptools for
anything.

Doug


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Nathaniel Smith
On Oct 19, 2017 11:10, "Donald Stufft"  wrote:


EXCEPT, for the fact that with the desire to cache things, it would be
beneficial to “hook” into the lifecycle of a package install. However I
know that there are other plugin systems out there that would like to also
be able to do that (Twisted Plugins come to mind) and that I think outside
of plugin systems, such a mechanism is likely to be useful in general for
other cases.

So here's a different idea that is a bit more ambitious but that I think is
a better overall idea. Let entrypoints be a setuptools thing, and let's
define some key lifecycle hooks during the installation of a package and
some mechanism in the metadata to let other tools subscribe to those hooks.
Then a caching layer could be written for setuptools entrypoints to make
that faster without requiring standardization, but also a whole new, better
plugin system could too, Twisted plugins could benefit, etc [1].


In this hypothetical system, how do installers like pip find the list of
hooks to call? By looking up an entrypoint? (Sorry if this was discussed
downthread; I didn't see it but I admit I only skimmed.)

-n


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Wes Turner
On Friday, October 20, 2017, Doug Hellmann  wrote:

> Excerpts from Wes Turner's message of 2017-10-20 10:41:02 -0400:
> > On Friday, October 20, 2017, Donald Stufft wrote:
> >
> > > On Oct 20, 2017, at 9:35 AM, Nick Coghlan wrote:
> > >
> > > On 20 October 2017 at 23:19, Donald Stufft wrote:
> > >
> > >> One that I was helping someone debug just the other day is that they’re
> > >> super non-debuggable and the behavior when you have two things providing
> > >> the same entry point is basically  (If I remember correctly, the
> > >> behavior is that the first thing found is the one that “wins”, which means
> > >> the ordering of sys.path and the names of the projects supplying it is what
> > >> determines it). This got exposed to the end user that they installed
> > >> something that they thought was going to add support for something, but
> > >> which silently did nothing because two different projects happened to pick
> > >> the same name for their entry point (not the group, it was two things
> > >> providing plugins for the same system).
> > >>
> > >
> > > While I agree with this, I think that's a combination of pkg_resources
> > > itself being hard to debug in general, and the fact that pkg_resources
> > > doesn't clearly define the semantics of how it resolves name conflicts
> > > within an entry point group - as far as I know, it's largely an accident of
> > > implementation.
> > >
> > > The interoperability spec is going to state that conflict resolution when
> > > the same name within a group is declared by multiple packages is the
> > > responsibility of the group consumer, so documenting the format should
> > > actually improve this situation, since it allows for the development of
> > > competing conflict resolution strategies in different runtime libraries.
> > >
> > >
> > > I think it makes it *worse*, because now the behavior isn’t just an
> > > entrypoints weirdness, but now it changes based on which runtime library
> > > you use (which isn’t something that end users are likely to have much
> > > insight into) and it represents a footgun that package authors are unlikely
> > > to be aware of. If mycoolentrypointslib comes out that is faster, but
> > > changes some subtle behavior like this it’ll break people, but that is
> > > unlikely to be an effect that people expect to happen just because
> > > they switched between two things both implementing the same standard.
> > >
> > > So effectively this means that not only is the fact you’re using
> > > entrypoints part of your API, but now which entry point library you’re
> > > using at runtime is now also part of your API.
> > >
> >
> > When should the check for duplicate entry points occur?
> >
> > - At on_install() time (+1)
> > - At runtime
> >
> > Is a sys.path-like OrderedDict preemptive strategy preferable or just as
> > dangerous as importlib?
>
> Having "duplicate" entry points is not necessarily an error. It's
> a different usage pattern.  The semantics of dropping a named plugin
> into a namespace are defined by the application and plugin-point.
> Please do not build assumptions about uniqueness into the underlying
> implementation.


I think that, at least with console_scripts, we already assume uniqueness:
if there's another package which provides a 'pip' console_script, for
example, there's not yet an error message?

Would it be helpful to at least spec that iterated entrypoints are in
sys.path order? And then what about entrypoints coming from the same path
in sys.path: alphabetical? Whatever hash randomization does with it?

Whenever I feel unsure about my data model, I tend to sometimes read the
OWL spec: here, the OWL spec has owl:cardinality OR owl:minCardinality OR
owl:maxCardinality. Some entrypoints may have 0, only one, or n "instances"?

We should throw an error if a given console_script entrypoint has more than
one "instance" (exceeds maxCardinality xsd:string = 1).


>
> The stevedore library wraps up pkg_resources with several such
> patterns. For example, it supports "give me all of the plugins in
> a namespace" (find all the extensions to your app), "give me all
> of the plugins named $name in a namespace" (find the hooks for a
> specific event defined by the app), and "give me *the* plugin named
> $name in a namespace" (load a driver for talking to a backend).
>
> https://docs.openstack.org/stevedore/latest/reference/index.html


https://github.com/openstack/stevedore/blob/master/stevedore/extension.py

https://github.com/openstack/stevedore/blob/master/stevedore/tests/test_extension.py

These tests mention saving discovered entry points in a cache?

Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Doug Hellmann
Excerpts from Thomas Kluyver's message of 2017-10-20 19:37:45 +0100:
> On Fri, Oct 20, 2017, at 07:24 PM, Doug Hellmann wrote:
> > I have been trying to find time to do something like that within
> > stevedore for a while to solve some client-side startup performance
> > issues with the OpenStack client. I would be happy to help add it
> > to entrypoints instead and use it from there.
> > 
> > Thomas, please let me know how I can help.
> 
> Thanks Doug! For starters, I'd be interested to hear any plans you have
> for how to tackle caching, or any thoughts you have on the rough plan I
> described before. If you're happy with the concepts, I'll have a go at
> implementing it. I'll probably consider it experimental until there's a
> hooks mechanism to trigger rebuilding the cache when packages are
> installed or uninstalled.
> 
> Thomas

I assumed that the user loading the plugins might not be able to
write to any of the directories on sys.path (aside from "." and we
don't want to put a cache file there), so my plan was to build the
cache the first time entry points were scanned and use appdirs [1]
to pick a cache location specific to the user.  I thought I would
use the value of sys.path as a string (joining the paths together
with a separator of some sort) to create a hash for the cache file
ID. Some of that may be obviated if we assume a setuptools hook that
lets us update the cache(s) when a package is installed.
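
A rough sketch of that cache-naming idea, using only the standard library. It
is an illustration, not code from stevedore or entrypoints: the function names,
the "entrypoints-demo" directory and the file naming scheme are made up, and
the XDG_CACHE_HOME fallback stands in for what appdirs would choose on each
platform.

```python
import hashlib
import os
import sys


def default_cache_root():
    """Pick a per-user cache directory (simplified stand-in for appdirs)."""
    base = os.environ.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache"
    )
    return os.path.join(base, "entrypoints-demo")  # hypothetical app name


def cache_file_for(path_entries, cache_root):
    """Derive a cache file name from the ordered list of sys.path entries."""
    # Join the entries with a separator so both content and order matter.
    joined = os.pathsep.join(str(entry) for entry in path_entries)
    digest = hashlib.sha256(joined.encode("utf-8", "surrogateescape")).hexdigest()
    return os.path.join(cache_root, "entry-points-" + digest[:16] + ".cache")


cache_path = cache_file_for(sys.path, default_cache_root())
```

Because the whole of sys.path feeds the hash, two environments with different
paths (or the same paths in a different order) get separate cache files, at
the cost of rebuilding the cache whenever sys.path changes at all.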

I also thought I'd provide a command line tool to generate the cache
just in case it became corrupted or if someone wanted to update it
by hand for some other reason, similar to Nick's locate/updatedb
parallel UX example (and re-reading your email, I see you mention this,
too).

I hadn't gone as far as deciding on a file format, but sqlite, JSON,
and INI (definitely something built-in) were all on my mind.  I
planned to see if we would actually gain enough of a boost just by
placing a separate file for each dist in a single cache directory,
rather than trying to merge everything into one file. In addition
to eliminating the concurrency issue, that approach might have the
additional benefit of simplifying operating system packages, because
they could just add a new file to the package instead of having to
run a command to update the cache when a package was installed (if
the file is the same format as entry_points.txt but with a different
name, that's even simpler since it's just a copy of a file that
will already be available during packaging).
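
For the "same format as entry_points.txt" option, a reader could be as small
as the sketch below, which parses the INI-style text with the standard
library's configparser. The sample content and the `parse_entry_points` name
are invented for the example, and the sketch does not try to validate names or
handle extras.

```python
import configparser

# Made-up file content in the entry_points.txt layout: one section per
# group, one "name = module:object" line per entry point.
SAMPLE = """\
[console_scripts]
frob = frobnicator.cli:main

[myapp.plugins]
JSON = frobnicator.plugins:JSONPlugin
"""


def parse_entry_points(text):
    """Return {group: {name: object_reference}} parsed from INI-style text."""
    # Only "=" separates name and value, so the ":" in object references
    # is left alone; interpolation is disabled to keep values verbatim.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    parser.read_string(text)
    return {group: dict(parser.items(group)) for group in parser.sections()}


parsed = parse_entry_points(SAMPLE)
```

With one such file per dist in a single cache directory, building the full
picture is a matter of listing that directory and merging the parsed results.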

Your idea of having a cache file per directory on sys.path is also
interesting, though I have to admit I'm not familiar enough with
the import machinery to know if it's easy to determine the containing
directory for a dist to find the right cache to update. I am
interested in hearing more details about what you planned there.

I would also like to compare the performance of a few approaches
(1 file per sys.path hash using INI, JSON, and sqlite; one file per
entry on sys.path using the same formats) using a significant number
of plugins (~100?) before we decide.

I agree with your statement in the original email that applications
should be able to disable the cache. I'm not sure it makes sense
to have a mode that only reads from a cache, but I may just not see
the use case for that.

What's our next step?

Doug

[1] https://pypi.python.org/pypi/appdirs/1.4.3


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Thomas Kluyver
On Fri, Oct 20, 2017, at 07:31 PM, Marius Gedminas wrote:
> Please do not forget about gui_scripts entry points!

I haven't forgotten about them in the draft spec:
https://github.com/pypa/python-packaging-user-guide/pull/390/files#diff-089b079de062f6fdb759bb719b79e6c8R121


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Marius Gedminas
On Fri, Oct 20, 2017 at 08:10:06AM -0400, Donald Stufft wrote:
> Packaging tools shouldn’t be expected to know anything about it other
> than the console_scripts feature

Please do not forget about gui_scripts entry points!

Marius Gedminas
-- 
What can I do with Python that I can't do with C#?  You can go home on time at
the end of the day.
-- Daniel Klein




Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Thomas Kluyver
On Fri, Oct 20, 2017, at 07:24 PM, Doug Hellmann wrote:
> I have been trying to find time to do something like that within
> stevedore for a while to solve some client-side startup performance
> issues with the OpenStack client. I would be happy to help add it
> to entrypoints instead and use it from there.
> 
> Thomas, please let me know how I can help.

Thanks Doug! For starters, I'd be interested to hear any plans you have
for how to tackle caching, or any thoughts you have on the rough plan I
described before. If you're happy with the concepts, I'll have a go at
implementing it. I'll probably consider it experimental until there's a
hooks mechanism to trigger rebuilding the cache when packages are
installed or uninstalled.

Thomas


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Doug Hellmann
Excerpts from Nick Coghlan's message of 2017-10-20 14:42:09 +1000:
> On 20 October 2017 at 02:14, Thomas Kluyver  wrote:
> 
> > On Thu, Oct 19, 2017, at 04:10 PM, Donald Stufft wrote:
> > > I’m in favor, although one question I guess is whether it should be a
> > > PEP or an ad hoc spec. Given (2) it should *probably* be a PEP (since
> > > without (2), its just another file in the .dist-info directory and that
> > > doesn’t actually need standardized at all). I don’t think that this will
> > > be a very controversial PEP though, and should be pretty easy.
> >
> > I have opened a PR to document what is already there, without adding any
> > new features. I think this is worth doing even if we don't change
> > anything, since it's a de-facto standard used for different tools to
> > interact.
> >
> > https://github.com/pypa/python-packaging-user-guide/pull/390
> >
> > We can still write a PEP for caching if necessary.
> >
> 
> +1 for that approach (PR for the status quo, PEP for a shared metadata
> caching design) from me
> 
> Making the status quo more discoverable is valuable in its own right, and
> the only decisions we'll need to make for that are terminology
> clarification ones, not interoperability ones (this isn't like PEP 440 or
> 508 where we actually thought some of the default setuptools behaviour was
> slightly incorrect and wanted to change it).
> 
> Figuring out a robust cross-platform network-file-system-tolerant metadata
> caching design on the other hand is going to be hard, and as Donald
> suggests, the right ecosystem level solution might be to define
> install-time hooks for package installation operations.
> 
> > > I’m also in favor of this. Although I would suggest SQLite rather than a
> > > JSON file for the primary reason being that a JSON file isn’t
> > > multiprocess safe without being careful (and possibly introducing
> > > locking) whereas SQLite has already solved that problem.
> >
> > SQLite was actually my first thought, but from experience in Jupyter &
> > IPython I'm wary of it - its built-in locking does not work well over
> > NFS, and it's easy to corrupt the database. I think careful use of
> > atomic writing can be more reliable (though that has given us some
> > problems too).
> >
> > That may be easier if there's one cache per user, though - we can
> > perhaps try to store it somewhere that's not NFS.
> >
> 
> I'm wondering if rather than jumping straight to a PEP, it may make sense
> to instead initially pursue this idea as a *non-*standard, implementation
> dependent thing specific to the "entrypoints" project. There are a *lot* of
> challenges to be taken into account for a truly universal metadata caching
> design, and it would be easy to fall into the trap of coming up with a
> design so complex that nobody can realistically implement it.
> 
> Specifically, I'm thinking of a usage model along the lines of the
> updatedb/locate pair on *nix systems: `locate` gives you access to very
> fast searches of your filesystem, but it *doesn't* try to automagically
> keep its indexes up to date. Instead, refreshing the indexes is handled by
> `updatedb`, and you can either rely on that being run automatically in a
> cron job, or else force an update with `sudo updatedb` when you want to use
> `locate`.
> 
> For a project like entrypoints, what that might look like is that at
> *runtime*, you may implement a reasonably fast "cache freshness check",
> where you scan the mtime of all the sys.path entries, and compare those
> to the mtime of the cache. If the cache looks up to date, then cool,
> otherwise emit a warning about the stale metadata cache, and then bypass it.
> 
> The entrypoints project itself could then expose a
> `refresh-entrypoints-cache` command that could start out only supporting
> virtual environments, and then extend to per-user caching, and then finally
> (maybe) consider whether or not it wanted to support installation-wide
> caches (with the extra permissions management and cross-process and
> cross-system coordination that may imply).
> 
> Such an approach would also tie in nicely with Donald's suggestion of
> reframing the ecosystem level question as "How should the entrypoints
> project request that 'refresh-entrypoints-cache' be run after every package
> installation or removal operation?", which in turn would integrate nicely
> with things like RPM file triggers (where the system `pip` package could
> set a file trigger that arranged for any properly registered Python package
> installation plugins to be run for every modification to site-packages
> while still appropriately managing the risk of running arbitrary code with
> elevated privileges)
> 
> Cheers,
> Nick.
> 

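The runtime "cache freshness check" described in the quoted message could look
roughly like the sketch below. It is a minimal illustration under stated
assumptions: `cache_is_fresh` is a made-up name, the cache is a single file,
and the filesystem timestamp granularity concerns raised elsewhere in this
thread are ignored.

```python
import os


def cache_is_fresh(cache_path, path_entries):
    """Return True if no sys.path entry was modified after the cache file."""
    try:
        cache_mtime = os.stat(cache_path).st_mtime
    except OSError:
        return False  # no readable cache at all, so it cannot be fresh
    for entry in path_entries:
        try:
            if os.stat(entry).st_mtime > cache_mtime:
                return False  # something changed after the cache was written
        except OSError:
            continue  # missing or unreadable entries contribute nothing
    return True
```

A caller following the updatedb/locate model would use the cache when this
returns True, and otherwise warn about the stale cache and fall back to
scanning the metadata directly rather than rewriting the cache itself.
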
I have been trying to find time to do something like that within
stevedore for a while to solve some client-side startup performance
issues with the OpenStack client. I would be happy to help add it
to entrypoints instead and use it from there.

Thomas, 

Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Doug Hellmann
Excerpts from Wes Turner's message of 2017-10-20 10:41:02 -0400:
> On Friday, October 20, 2017, Donald Stufft  wrote:
> 
> >
> >
> > On Oct 20, 2017, at 9:35 AM, Nick Coghlan wrote:
> >
> > On 20 October 2017 at 23:19, Donald Stufft wrote:
> >
> >> One that I was helping someone debug just the other day is that they’re
> >> super non-debuggable and the behavior when you have two things providing
> >> the same entry point is basically  (If I remember correctly, the
> >> behavior is that the first thing found is the one that “wins”, which means
> >> the ordering of sys.path and the names of the projects supplying it is what
> >> determines it). This got exposed to the end user that they installed
> >> something that they thought was going to add support for something, but
> >> which silently did nothing because two different projects happened to pick
> >> the same name for their entry point (not the group, it was two things
> >> providing plugins for the same system).
> >>
> >
> > While I agree with this, I think that's a combination of pkg_resources
> > itself being hard to debug in general, and the fact that pkg_resources
> > doesn't clearly define the semantics of how it resolves name conflicts
> > within an entry point group - as far as I know, it's largely an accident of
> > implementation.
> >
> > The interoperability spec is going to state that conflict resolution when
> > the same name within a group is declared by multiple packages is the
> > responsibility of the group consumer, so documenting the format should
> > actually improve this situation, since it allows for the development of
> > competing conflict resolution strategies in different runtime libraries.
> >
> >
> > I think it makes it *worse*, because now the behavior isn’t just an
> > entrypoints weirdness, but now it changes based on which runtime library
> > you use (which isn’t something that end users are likely to have much
> > insight into) and it represents a footgun that package authors are unlikely
> > to be aware of. If mycoolentrypointslib comes out that is faster, but
> > changes some subtle behavior like this it’ll break people, but that is
> > unlikely to be an effect that people expect to happen just because
> > they switched between two things both implementing the same standard.
> >
> > So effectively this means that not only is the fact you’re using
> > entrypoints part of your API, but now which entry point library you’re
> > using at runtime is now also part of your API.
> >
> 
> When should the check for duplicate entry points occur?
> 
> - At on_install() time (+1)
> - At runtime
> 
> Is a sys.path-like OrderedDict preemptive strategy preferable or just as
> dangerous as importlib?

Having "duplicate" entry points is not necessarily an error. It's
a different usage pattern.  The semantics of dropping a named plugin
into a namespace are defined by the application and plugin-point.
Please do not build assumptions about uniqueness into the underlying
implementation.

The stevedore library wraps up pkg_resources with several such
patterns. For example, it supports "give me all of the plugins in
a namespace" (find all the extensions to your app), "give me all
of the plugins named $name in a namespace" (find the hooks for a
specific event defined by the app), and "give me *the* plugin named
$name in a namespace" (load a driver for talking to a backend).

https://docs.openstack.org/stevedore/latest/reference/index.html
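
To show why uniqueness belongs to the consumer rather than the underlying
implementation, here is a plain-Python sketch of those three lookup patterns.
It does not use stevedore's or pkg_resources' actual API; the `Plugin` tuple,
the `REGISTRY` contents and the three function names are all hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one discovered entry point.
Plugin = namedtuple("Plugin", "name dist target")

# Made-up discovery results, keyed by namespace (entry point group).
REGISTRY = {
    "myapp.hooks": [
        Plugin("on_save", "project-a", "a.hooks:on_save"),
        Plugin("on_save", "project-b", "b.hooks:on_save"),
        Plugin("on_load", "project-a", "a.hooks:on_load"),
    ],
    "myapp.drivers": [
        Plugin("sqlite", "project-c", "c.drivers:SQLite"),
    ],
}


def all_plugins(registry, namespace):
    """Pattern 1: every plugin in a namespace (all extensions to the app)."""
    return list(registry.get(namespace, []))


def plugins_named(registry, namespace, name):
    """Pattern 2: every plugin with a given name (hooks for one event)."""
    return [p for p in registry.get(namespace, []) if p.name == name]


def the_plugin_named(registry, namespace, name):
    """Pattern 3: exactly one plugin with a given name (a driver)."""
    matches = plugins_named(registry, namespace, name)
    if len(matches) != 1:
        raise LookupError(
            "expected exactly one %r in %r, found %d" % (name, namespace, len(matches))
        )
    return matches[0]
```

Only the third function treats a duplicate name as an error. The same two
"on_save" entries that would break a driver lookup are exactly what the second
pattern is meant to return.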

Doug


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Brett Cannon
On Wed, 18 Oct 2017 at 17:54 Nick Coghlan  wrote:

> On 19 October 2017 at 04:18, Alex Grönholm 
> wrote:
>
>> Daniel Holth kirjoitti 18.10.2017 klo 21:06:
>>
>>
>> http://setuptools.readthedocs.io/en/latest/formats.html?highlight=entry_points.txt#entry-points-txt-entry-point-plugin-metadata
>>
>>
>> http://setuptools.readthedocs.io/en/latest/pkg_resources.html?highlight=pkg_resources#creating-and-parsing
>>
>> It is not very complicated. It looks like the characters are mostly
>> 'python identifier' rules with a little bit of 'package name' rules.
>>
>> I am also concerned about the amount of parsing on startup. A hard
>> problem for certain, since no one likes outdated cache problems either. It
>> is also unpleasant to have too much code with a runtime dependency on
>> 'packaging'.
>>
>> Wasn't someone working on implementing pkg_resources in the standard
>> library at some point?
>>
>
> The idea has been raised, but we've been hesitant for the same reason
> we're inclined to take distutils out: packaging APIs need to be free to
> evolve in line with packaging interoperability standards, rather than with
> the Python language definition.
>
> Barry Warsaw & Brett Cannon recently mentioned something to me about
> working on a potential runtime alternative to pkg_resources that could be
> installed without also installing setuptools, but I don't know any of the
> specifics (and I'm not sure either of them follows distutils-sig).
>

I've been following distutils-sig for a couple of years now. :)

And what Barry and I are working on is only a subset of pkg_resources,
specifically the reading of data files included in a package. We aren't
touching any other aspect of pkg_resources.

Heck, until this discussion, "entry points" == "console scripts" for me so
I don't really know what y'all are talking about standardizing when it
comes to plug-in systems and metadata. Having said that, I do understand
why Donald doesn't want to just go ahead and standardize something by
giving it the level of a spec on packaging.python.org just because it's out
there. But since entry points seem to be used widely enough, having them
written down appropriately also seems reasonable.

As a compromise, could entry points be documented as Thomas is suggesting,
but have a note at the top saying something along the lines of "entry
points are considered a setuptools-specific feature, but their widespread
use warrants a clear understanding of how they function, for other packaging
tools that choose on their own to also support them"? Basically acknowledge
that there are ad-hoc, folk standards in the community that a decent chunk of
people rely on, so docs would be helpful, but that they don't need to be
promoted to a full-on, everyone-implements standard.


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Wes Turner
On Friday, October 20, 2017, Donald Stufft  wrote:

>
>
> On Oct 20, 2017, at 9:35 AM, Nick Coghlan wrote:
>
> On 20 October 2017 at 23:19, Donald Stufft wrote:
>
>> One that I was helping someone debug just the other day is that they’re
>> super non-debuggable and the behavior when you have two things providing
>> the same entry point is basically  (If I remember correctly, the
>> behavior is that the first thing found is the one that “wins”, which means
>> the ordering of sys.path and the names of the projects supply it is what
>> determines it). This got exposed to the end user that they installed
>> something that they thought was going to add support for something, but
>> which silently did nothing because two different project happened to pick
>> the same name for their entry point (not the group, it was two things
>> providing plugins for the same system).
>>
>
> While I agree with this, I think that's a combination of pkg_resources
> itself being hard to debug in general, and the fact that pkg_resources
> doesn't clearly define the semantics of how it resolves name conflicts
> within an entry point group - as far as I know, it's largely an accident of
> implementation.
>
> The interoperability spec is going to state that conflict resolution when
> the same name within a group is declared by multiple packages is the
> responsibility of the group consumer, so documenting the format should
> actually improve this situation, since it allows for the development of
> competing conflict resolution strategies in different runtime libraries.
>
>
> I think it makes it *worse*, because now the behavior isn’t just a
> entrypoints weirdness, but now it changes based on which runtime library
> you use (which isn’t something that end users are likely to have much
> insight into) and it represents a footgun that package authors are unlikely
> to be aware of. If mycoolentrypointslib comes out that is faster, but
> changes some subtle behavior like this it’ll break people, but that is
> unlikely going to be an effect that people expect to happen just because
> they switched between two things both implementing the same standard.
>
> So effectively this means that not only is the fact you’re using
> entrypoints part of your API, but now which entry point library you’re
> using at runtime is now also part of your API.
>

When should the check for duplicate entry points occur?

- At on_install() time (+1)
- At runtime

Is a sys.path-like OrderedDict preemptive strategy preferable or just as
dangerous as importlib?


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Donald Stufft


> On Oct 20, 2017, at 9:35 AM, Nick Coghlan  wrote:
> 
> On 20 October 2017 at 23:19, Donald Stufft wrote:
> One that I was helping someone debug just the other day is that they’re super 
> non-debuggable and the behavior when you have two things providing the same 
> entry point is basically  (If I remember correctly, the behavior is that 
> the first thing found is the one that “wins”, which means the ordering of 
> sys.path and the names of the projects supply it is what determines it). This 
> got exposed to the end user that they installed something that they thought 
> was going to add support for something, but which silently did nothing 
> because two different project happened to pick the same name for their entry 
> point (not the group, it was two things providing plugins for the same 
> system).
> 
> While I agree with this, I think that's a combination of pkg_resources itself 
> being hard to debug in general, and the fact that pkg_resources doesn't 
> clearly define the semantics of how it resolves name conflicts within an 
> entry point group - as far as I know, it's largely an accident of 
> implementation.
> 
> The interoperability spec is going to state that conflict resolution when the 
> same name within a group is declared by multiple packages is the 
> responsibility of the group consumer, so documenting the format should 
> actually improve this situation, since it allows for the development of 
> competing conflict resolution strategies in different runtime libraries.

I think it makes it *worse*, because the behavior is no longer just an
entrypoints weirdness: it now changes based on which runtime library you use
(which isn’t something that end users are likely to have much insight into), and
it represents a footgun that package authors are unlikely to be aware of. If
mycoolentrypointslib comes out that is faster but changes some subtle behavior
like this, it’ll break people, and that is unlikely to be an effect that people
expect just because they switched between two things that both implement the
same standard.

So effectively this means that not only is the fact that you’re using
entrypoints part of your API, but which entry point library you’re using at
runtime is now also part of your API.

>  
> Of course there is the perennial entrypoints are super slow, which is 
> partially the fault of pkg_resources does a bunch of import time logic, but 
> also because scanning sys.path for all installed stuff is just slow.
> 
> Similar to the above, one of the goals of documenting the entry point file 
> format is to permit libraries to compete in the development of effective 
> entrypoint metadata caching strategies without needing to bless any 
> particular one a priori, and without trying to manage experimental cache 
> designs across the huge pkg_resources install base.

That goal can be achieved if it’s documented in setuptools.

>  
> They’re also somewhat fragile since they rely on the packaging metadata 
> system at runtime, and a number of tools exclude that information (often 
> times things that deploy stuff as a tarball/zipfile) which causes regular 
> issues to be opened up for these projects when they get used in those 
> environments.
> 
> This is true, and one of the main pragmatic benefits of adopting one of the 
> purely import based plugin management systems. However, this problem will 
> impact all packaging metadata based plugin management solutions, regardless 
> of whether they use an existing file format or a new one.
>  
> Those are the ones I remember because they come up regularly (and people 
> regularly come to me with issues with any project related to packaging in any 
> way even for non packaging related features in those projects). I’m pretty 
> sure there were more of them that I’ve encountered and seen projects 
> encounter, but I can’t remember them to be sure.
> 
> I’m more familiar with why console_scripts entry point is not great and why 
> we should stop using it since I regularly try to re-read all of pip’s issues 
> and a lot of it’s issues are documented there.
> 
> I'm sympathetic to that, but I think even in that case, clearly documenting 
> the format as an interoperability specification will help tease out which of 
> those are due to the file format itself, and which are due to 
> setuptools.setup specifically.

All of the ones I’m aware of are due to the file format itself, because they 
exist even without setuptools being involved at all.




Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Nick Coghlan
On 20 October 2017 at 23:19, Donald Stufft  wrote:

> One that I was helping someone debug just the other day is that they’re
> super non-debuggable and the behavior when you have two things providing
> the same entry point is basically  (If I remember correctly, the
> behavior is that the first thing found is the one that “wins”, which means
> the ordering of sys.path and the names of the projects supply it is what
> determines it). This got exposed to the end user that they installed
> something that they thought was going to add support for something, but
> which silently did nothing because two different project happened to pick
> the same name for their entry point (not the group, it was two things
> providing plugins for the same system).
>

While I agree with this, I think that's a combination of pkg_resources
itself being hard to debug in general, and the fact that pkg_resources
doesn't clearly define the semantics of how it resolves name conflicts
within an entry point group - as far as I know, it's largely an accident of
implementation.

The interoperability spec is going to state that conflict resolution when
the same name within a group is declared by multiple packages is the
responsibility of the group consumer, so documenting the format should
actually improve this situation, since it allows for the development of
competing conflict resolution strategies in different runtime libraries.
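
A rough sketch of what consumer-side conflict resolution could look like.
The declarations, project names and function names below are made up for
illustration. "first_wins" mirrors the behaviour Donald recalls from
pkg_resources (described there as "if I remember correctly"), while
"strict" is one alternative a different runtime library might choose.

```python
from collections import defaultdict

# Hypothetical declarations: (distribution, entry point name, target),
# listed in the order a scan of sys.path happened to find them.
DECLARED = [
    ("project-a", "yaml", "project_a.plugin:YamlSupport"),
    ("project-b", "yaml", "project_b.plugin:YamlSupport"),
    ("project-c", "toml", "project_c.plugin:TomlSupport"),
]


def first_wins(declared):
    """Keep the first declaration of each name; later ones are dropped silently."""
    chosen = {}
    for dist, name, target in declared:
        chosen.setdefault(name, (dist, target))
    return chosen


def find_conflicts(declared):
    """Return {name: [distributions]} for names declared more than once."""
    providers = defaultdict(list)
    for dist, name, _target in declared:
        providers[name].append(dist)
    return {name: dists for name, dists in providers.items() if len(dists) > 1}


def strict(declared):
    """Refuse to pick a winner: raise if any name is declared twice."""
    conflicts = find_conflicts(declared)
    if conflicts:
        raise ValueError("conflicting entry points: %r" % (conflicts,))
    return first_wins(declared)
```

With these inputs, first_wins() quietly ignores project-b, whereas
strict() reports the clash so the consumer can surface it to the user.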


> Of course there is the perennial entrypoints are super slow, which is
> partially the fault of pkg_resources does a bunch of import time logic, but
> also because scanning sys.path for all installed stuff is just slow.
>

Similar to the above, one of the goals of documenting the entry point file
format is to permit libraries to compete in the development of effective
entrypoint metadata caching strategies without needing to bless any
particular one a priori, and without trying to manage experimental cache
designs across the huge pkg_resources install base.
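
As one illustration of the kind of caching strategy a library could
experiment with, here is a small sketch that re-scans a directory only
when its recorded mtime no longer matches the current one. The cache
layout and the cached_scan name are assumptions for this example, not
something any existing library is known to implement this way, and the
sketch ignores filesystem timestamp granularity.

```python
import os

# Hypothetical in-process cache: directory path -> (mtime_ns, scan result).
_cache = {}


def cached_scan(directory, scan):
    """Return scan(directory), re-running it only when the directory's
    recorded mtime differs from the current one (in either direction)."""
    mtime = os.stat(directory).st_mtime_ns
    hit = _cache.get(directory)
    if hit is not None and hit[0] == mtime:
        return hit[1]
    result = scan(directory)
    _cache[directory] = (mtime, result)
    return result
```

Comparing for inequality rather than "newer than" means a directory whose
mtime moves backwards (for example after switching branches) still
invalidates the cached result.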


> They’re also somewhat fragile since they rely on the packaging metadata
> system at runtime, and a number of tools exclude that information (often
> times things that deploy stuff as a tarball/zipfile) which causes regular
> issues to be opened up for these projects when they get used in those
> environments.
>

This is true, and one of the main pragmatic benefits of adopting one of the
purely import based plugin management systems. However, this problem will
impact all packaging metadata based plugin management solutions, regardless
of whether they use an existing file format or a new one.


> Those are the ones I remember because they come up regularly (and people
> regularly come to me with issues with any project related to packaging in
> any way even for non packaging related features in those projects). I’m
> pretty sure there were more of them that I’ve encountered and seen projects
> encounter, but I can’t remember them to be sure.
>
> I’m more familiar with why console_scripts entry point is not great and
> why we should stop using it since I regularly try to re-read all of pip’s
> issues and a lot of it’s issues are documented there.
>

I'm sympathetic to that, but I think even in that case, clearly documenting
the format as an interoperability specification will help tease out which
of those are due to the file format itself, and which are due to
setuptools.setup specifically.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Donald Stufft
On Oct 20, 2017, at 8:41 AM, Thomas Kluyver  wrote:
> 
> On Fri, Oct 20, 2017, at 01:36 PM, Donald Stufft wrote:
>> Entry points have a lot of problems and I know of multiple systems that have 
>> either moved away from them, had to hack around how bad they are, have 
>> refused to implement them because of previous pain felt by them, are looking 
>> for ways to eliminate them, or which just regret ever supporting them.
> 
> The fate of the PR notwithstanding, I'd be interested in hearing more about 
> what problems projects have experienced with entry points, if you have time 
> to describe some examples. We're looking at using them in more places than we 
> already do, so it would be useful to hear about drawbacks we might not have 
> thought about, and about what other options projects have moved to.
> 

One that I was helping someone debug just the other day is that they’re super 
non-debuggable and the behavior when you have two things providing the same 
entry point is basically  (If I remember correctly, the behavior is that 
the first thing found is the one that “wins”, which means the ordering of
sys.path and the names of the projects supplying it are what determine it). This
showed up for the end user when they installed something that they thought was
going to add support for something, but which silently did nothing because two
different projects happened to pick the same name for their entry point (not the
group, it was two things providing plugins for the same system).

Of course there is the perennial “entrypoints are super slow” complaint, which
is partially because pkg_resources does a bunch of import-time logic, but also
because scanning sys.path for all installed stuff is just slow.

They’re also somewhat fragile since they rely on the packaging metadata system 
at runtime, and a number of tools exclude that information (often times things 
that deploy stuff as a tarball/zipfile) which causes regular issues to be 
opened up for these projects when they get used in those environments.

Those are the ones I remember because they come up regularly (and people 
regularly come to me with issues with any project related to packaging in any 
way even for non packaging related features in those projects). I’m pretty sure 
there were more of them that I’ve encountered and seen projects encounter, but 
I can’t remember them to be sure.

I’m more familiar with why the console_scripts entry point is not great and why
we should stop using it, since I regularly try to re-read all of pip’s issues
and a lot of its issues are documented there.



Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Thomas Kluyver
On Fri, Oct 20, 2017, at 01:58 PM, Wes Turner wrote:
> What were the issues with setuptools entry points here (in 2014, when
> you two were opposed to adding them to sensibly list ipython
> extensions)?

I'm impressed by your memory! The main issue then was that it implied
that extension authors would have to use setuptools.

Setuptools has got much better since then, we have better tools and
norms for dealing with its rough edges, and there are usable alternative
tools that can be used to distribute entrypoints. But the description
I've written up is still basically trying to solve the same problem: an
application should be able to use entry points without forcing all
plugins to use setuptools.

Thomas



Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Wes Turner
On Friday, October 20, 2017, Thomas Kluyver  wrote:

> On Fri, Oct 20, 2017, at 01:36 PM, Donald Stufft wrote:
>
> Entry points have a lot of problems and I know of multiple systems that
> have either moved away from them, had to hack around how bad they are, have
> refused to implement them because of previous pain felt by them, are
> looking for ways to eliminate them, or which just regret ever supporting
> them.
>
>
> The fate of the PR notwithstanding, I'd be interested in hearing more
> about what problems projects have experienced with entry points, if you
> have time to describe some examples. We're looking at using them in more
> places than we already do, so it would be useful to hear about drawbacks we
> might not have thought about, and about what other options projects have
> moved to.
>
> Thomas
>

What were the issues with setuptools entry points here (in 2014, when you
two were opposed to adding them to sensibly list ipython extensions)?

https://github.com/ipython/ipython/pull/4673

https://github.com/ipython/ipython/compare/master...westurner:setuptools_entry_points


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Thomas Kluyver
On Fri, Oct 20, 2017, at 01:36 PM, Donald Stufft wrote:
> Entry points have a lot of problems and I know of multiple systems
> that have either moved away from them, had to hack around how bad they
> are, have refused to implement them because of previous pain felt by
> them, are looking for ways to eliminate them, or which just regret
> ever supporting them.

The fate of the PR notwithstanding, I'd be interested in hearing more
about what problems projects have experienced with entry points, if you
have time to describe some examples. We're looking at using them in more
places than we already do, so it would be useful to hear about drawbacks
we might not have thought about, and about what other options projects
have moved to.

Thomas


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Donald Stufft


> On Oct 20, 2017, at 8:34 AM, Nick Coghlan  wrote:
> 
> You're acting like you believe you have veto power over this topic. You don't 
> - it's not a PyPI related concern, and it doesn't require any changes to pip 
> or warehouse.
> 
> I'd certainly be *happier* if you were only -0 rather than -1, but your 
> disapproval won't prevent me from accepting Thomas's PR either way.


I’m acting like I have an opinion. You’re obviously free to accept something
that I think is a bad idea, but that doesn’t mean I should just shut up and not
voice my concerns or objections, and I’d appreciate it if you didn’t imply that
I should.


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Donald Stufft

> On Oct 20, 2017, at 8:23 AM, Nick Coghlan  wrote:
> 
> On 20 October 2017 at 22:10, Donald Stufft wrote:
> If I could guess, I’d say it hasn’t changed in years because setuptools has 
> had bigger things to work on and not enough time to do it in.
> 
> Then you'd be wrong - it hasn't changed in years because it's a sensible, 
> simple solution to the problem of declaring integration points between 
> independently distributed pieces of software that allows the installed 
> integration points to be listed *without* importing the software providing 
> them (unlike most import based plugin systems).

I mean, no, I’m not.

Entry points have a lot of problems and I know of multiple systems that have 
either moved away from them, had to hack around how bad they are, have refused 
to implement them because of previous pain felt by them, are looking for ways 
to eliminate them, or which just regret ever supporting them.



Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Nick Coghlan
On 20 October 2017 at 22:18, Donald Stufft  wrote:

>
> The “existing installations” is horse shit, because existing
> implementations won’t support *any* new feature of anything so it can
> literally be used as a justification for doing nothing about anything
> except standardizing what already exists. I guess we shouldn’t have done
> PEP 517 or PEP 518 because, by your logic here, since it won’t be supported
> by existing tooling, there won’t be any incentive for people to use it ever.
>

No, because PEP 517 and 518 actually change the UX for *publishers* away
from setup.py to pyproject.toml + whatever build system they choose, while
allowing the definition of a *common* setup.py shim for compatibility with
older clients.

By contrast, it's relatively rare for people to edit entry_points.txt by
hand - it's typically a generated file, just like PKG-INFO.

For any *new* console_scripts replacement, you're also going to have to define
how to translate it back to entry_points.txt for compatibility with older
pip installations, and that means you're also going to have to define how
to do that without conflicting with any other pkg_resources entry points
already declared by a package.

Those two characteristics mean that entry_points.txt has a lot more in
common with PKG-INFO than it does with setup.py, and that similarity is
further enhanced by the fact that it's a pretty easy format to document.
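
To give a sense of how small the format is, here is a hedged sketch of
reading an entry_points.txt body with the standard library, assuming the
INI-style layout that setuptools writes (one section per group, one
"name = module:attr" line per entry point). The sample contents and the
parse_entry_points name are invented for illustration.

```python
import configparser

# A made-up entry_points.txt body; the project and module names are
# illustrative only.
SAMPLE = """\
[console_scripts]
mytool = mytool.cli:main

[myapp.plugins]
yaml = myapp_yaml.plugin:YamlSupport
"""


def parse_entry_points(text):
    """Return {group: {name: object_reference}} without importing anything."""
    # Only "=" separates names from values, so ":" stays part of the value.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    parser.read_string(text)
    return {group: dict(parser[group]) for group in parser.sections()}
```

Calling parse_entry_points(SAMPLE) gives one dictionary per group, with
the object references left as plain strings.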

> So if you want to say it is neither pip's nor PyPI's responsibility to say
> anything one way or the other about the entry points format (beyond whether
> or not they're used to declare console scripts in a way that pip
> understands), then I agree with you entirely. This spec isn't something you
> personally need to worry about, since it doesn't impact any of the tools
> you work on (aside from giving pip's existing console_scripts
> implementation a firmer foundation from an interoperability perspective).
>
>
> My objection has absolutely nothing to do with whether pip is the consumer
> or not. My objection is entirely based on the fact that a plugin system is
> not a packaging related feature and it doesn’t become one because a
> packaging tool once added a plugin system.
>

You're acting like you believe you have veto power over this topic. You
don't - it's not a PyPI related concern, and it doesn't require any changes
to pip or warehouse.

I'd certainly be *happier* if you were only -0 rather than -1, but your
disapproval won't prevent me from accepting Thomas's PR either way.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Thomas Kluyver
On Fri, Oct 20, 2017, at 01:18 PM, Donald Stufft wrote:
> I guess we shouldn’t have done PEP 517 or PEP 518 because, by your
> logic here, since it won’t be supported by existing tooling, there
> won’t be any incentive for people to use it ever.

I see this as having a similar purpose to those PEPs: reducing
dependence on setuptools. The difference is that for building packages,
pip explicitly uses setuptools, so the practical way forward was to
define an alternative to achieve the same ends. For this, the existing
mechanism does not directly rely on setuptools, so it's sufficient to
document it so that other tools can confidently produce and consume it.

I also get annoyed at times by arguments that it's not worth improving
something because it will be a long time before the change is useful.
But I don't think that's what Nick is saying here.

Thomas


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Nick Coghlan
On 20 October 2017 at 22:10, Donald Stufft  wrote:

> If I could guess, I’d say it hasn’t changed in years because setuptools
> has had bigger things to work on and not enough time to do it in.


Then you'd be wrong - it hasn't changed in years because it's a sensible,
simple solution to the problem of declaring integration points between
independently distributed pieces of software that allows the installed
integration points to be listed *without* importing the software providing
them (unlike most import based plugin systems).
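
A small sketch of why the declared integration points can be listed
without importing anything: an object reference is just a string until
someone asks to load it. The "module:attr" shape is the one setuptools
uses; the helper names split_reference and load are hypothetical.

```python
import importlib


def split_reference(reference):
    """Split 'package.module:attr.path' into its two parts as plain strings."""
    module_name, _, attr_path = reference.partition(":")
    return module_name.strip(), attr_path.strip()


def load(reference):
    """Import the module and resolve the attribute only when asked to."""
    module_name, attr_path = split_reference(reference)
    obj = importlib.import_module(module_name)
    for part in filter(None, attr_path.split(".")):
        obj = getattr(obj, part)
    return obj
```

Listing what is installed only needs split_reference(); the cost and side
effects of importing the plugin are deferred until load() is called.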

And yes, I know you're attempting to claim that "declaring integration
points between independently distributed pieces of software" isn't
something that's a packaging ecosystem level concern.

It is an ecosystem level concern, but we haven't had to worry about it
previously, because entry points haven't had problems to be fixed the way
that other aspects of setuptools have (lack of uninstall support in
easy_install, lack of filesystem layout metadata in eggs, ordering quirks
in the versioning scheme). For entry points, by contrast, the only missing
piece is explicit documentation of the file format used in distribution
archives and the installation database.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Donald Stufft


> On Oct 20, 2017, at 8:06 AM, Nick Coghlan  wrote:
> 
> On 20 October 2017 at 21:15, Donald Stufft wrote:
> Tell you what, I’ll drop everything today and write up a PEP that adds 
> metadata for console scripts to the packaging metadata where it belongs,
> 
> Donald, you're making the same mistake I did with PEP 426: interoperability 
> specifications are useless without a commitment from tooling developers to 
> actually provide implementations that know how to read and write them. And 
> since any new format you come up with won't be supported by existing pip and 
> pkg_resources installations, there won't be any incentive for publishers to 
> start using it, which means there's no incentives for runtime libraries to 
> learn how to read it, etc, etc.

Not particularly no.

I can promise you 100% that pip will support it in the next version once I 
write it. I can also promise you that setuptools will have a PR to support it 
(not pkg_resources, because console scripts are an install-time feature, not a
runtime feature), and I assume Jason would be happy to merge it.

So there’s commitment from at least one tool.

The “existing installations” is horse shit, because existing implementations 
won’t support *any* new feature of anything so it can literally be used as a 
justification for doing nothing about anything except standardizing what 
already exists. I guess we shouldn’t have done PEP 517 or PEP 518 because, by 
your logic here, since it won’t be supported by existing tooling, there won’t 
be any incentive for people to use it ever.

> 
> In this case, we already have a perfectly serviceable format 
> (entry_points.txt), a reference publisher (setuptools.setup) and a reference 
> consumer (pkg_resources). The fact that the reference consumer is 
> pkg_resources rather than pip doesn't suddenly take this outside the domain 
> of responsibility of distutils-sig as a whole - it only takes it outside the 
> domain of responsibility of PyPI.
> 
> So if you want to say it is neither pip's nor PyPI's responsibility to say 
> anything one way or the other about the entry points format (beyond whether 
> or not they're used to declare console scripts in a way that pip 
> understands), then I agree with you entirely. This spec isn't something you 
> personally need to worry about, since it doesn't impact any of the tools you 
> work on (aside from giving pip's existing console_scripts implementation a 
> firmer foundation from an interoperability perspective).

My objection has absolutely nothing to do with whether pip is the consumer or 
not. My objection is entirely based on the fact that a plugin system is not a
packaging related feature and it doesn’t become one because a packaging tool 
once added a plugin system.

> 
> So the core of our disagreement is whether or not interfaces involving pip 
> and PyPI represent the limits of distutils-sig's responsibility. They don't,
> and that's reflected in the fact we have a split standing delegation from 
> Guido (one initially to Richard Jones and later to you for changes that 
> affect PyPI, and one to me for packaging ecosystem interoperability 
> specifications in general)

No that’s not the core of our disagreement. The core of our disagreement is 
whether random runtime features suddenly become a packaging concern because 
they were implemented by one packaging tool once.

>  
> so we can move console_scripts entry point to a legacy footnote as far as 
> packaging systems go. Then we can discuss whether an arbitrary plugin system 
> is actually a packaging related spec (it’s not) on its own merits.
> 
> Instructing publishing system developers on how to publish pkg_resources 
> compatible entry points is indeed a Python packaging ecosystem level concern. 

No it’s really not.

> 
> Whether that capability survives into a hypothetical future metadata 
> specification (whether that's PEP 426+459 or something else entirely) would 
> then be a different question, but it isn't one we need to worry about right 
> now (since it purely affects internal interoperability file formats that only 
> automated scripts and folks maintaining those scripts need to care about, and 
> we'd expect entry_points.txt and PKG-INFO to coexist alongside any new format 
> for a *long* time).
> 
> Cheers,
> Nick.
> 
> -- 
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia



Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Nick Coghlan
On 20 October 2017 at 21:57, Thomas Kluyver  wrote:

> On Fri, Oct 20, 2017, at 12:50 PM, Donald Stufft wrote:
> > * We stifle innovation (hell just including it in setuptools at all does
> > this, but we can’t unopen that can of worms).
>
> I don't think that's true to any significant extent. Having a standard
> does not stop people coming up with something better.
>

entry_points.txt will be hard to change for similar reasons to why PKG-INFO
is hard to change, but that challenge exists regardless of whether we
consider it a setuptools/pkg_resources feature or an ecosystem level
standard, since it relates to coupling between metadata publishers and
consumers of that metadata.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Donald Stufft


> On Oct 20, 2017, at 7:57 AM, Thomas Kluyver  wrote:
> 
> On Fri, Oct 20, 2017, at 12:50 PM, Donald Stufft wrote:
>> * Since it is a packaging standard, then it is expected that all
>> packaging tools will be updated to work with it.
> 
> Where packaging tools need to know about it, they already have to.
> Where they don't, writing a standard doesn't imply that every tool has
> to implement it. Documenting it doesn't change either case, it just
> makes life easier for tools that do need to use it.

Packaging tools shouldn’t be expected to know anything about it other than the 
console_scripts feature (which shouldn’t exist as an entry point, but currently 
does for historical reasons).

Publishing tools should have a way for additional files that the publishing 
tool wasn’t even aware might exist someday to get added to the metadata 
directory, and installation tools should preserve those files when installing 
them. With those two generic features, entry points (and other things!) can be 
written on top of the ecosystem *without* needing to standardize on one 
solution for one particular non-packaging problem.

If a publishing tool doesn’t want to provide that mechanism, then that is fine, 
but that limits their audience (in the same way that not building C extensions 
limits their audience, people who need that capability won’t be able to use 
them).

> 
>> * We’re explicitly saying that this is the one true way of solving this
>> problem in the Python ecosystem.
> 
> I don't buy that at all. We're saying that it exists, and this is what
> it is.

It’s literally the way all of our packaging standards are written. Don’t use 
eggs, wheels are the one true way; don’t use YOLO versions, PEP 440 is the one 
true way; don’t add arbitrary extensions to the simple repo format, the PEP 503 
API is the one true way; etc etc etc.

> 
>> * We stifle innovation (hell just including it in setuptools at all does
>> this, but we can’t unopen that can of worms).
> 
> I don't think that's true to any significant extent. Having a standard
> does not stop people coming up with something better.

It doesn’t actively prevent someone from coming up with something better, no, 
but what it does do is add a pretty huge barrier to entry for someone who 
wants to come up with something better. It’s the same way that something being 
added to the stdlib stifles competition. When something is “the standard”, it 
discourages people from even trying to make something better, or, if they do, 
discourages other people from trying it, unless “the standard” is really bad.

> 
>> * We make it actively harder to improve the feature (since once it’s part
>> of the purview of packaging standards, all of distutils-sig gets to weigh
>> in on improvements).
> 
> It hasn't changed in years, as far as I know, and it's so widely used
> that any change is likely to break a load of stuff anyway. As we've
> already discussed for caching, we can improve by building *on top* of it
> relatively easily. And ultimately I think that bringing it out into
> daylight leads to a healthier future than leaving it under the stone
> marked 'setuptools'.
> 


If I could guess, I’d say it hasn’t changed in years because setuptools has had 
bigger things to work on and not enough time to do it in.


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Nick Coghlan
On 20 October 2017 at 21:15, Donald Stufft  wrote:

> Tell you what, I’ll drop everything today and write up a PEP that adds
> metadata for console scripts to the packaging metadata where it belongs,
>

Donald, you're making the same mistake I did with PEP 426: interoperability
specifications are useless without a commitment from tooling developers to
actually provide implementations that know how to read and write them. And
since any new format you come up with won't be supported by existing pip
and pkg_resources installations, there won't be any incentive for
publishers to start using it, which means there's no incentive for runtime
libraries to learn how to read it, etc, etc.

In this case, we already have a perfectly serviceable format
(entry_points.txt), a reference publisher (setuptools.setup) and a
reference consumer (pkg_resources). The fact that the reference consumer is
pkg_resources rather than pip doesn't suddenly take this outside the domain
of responsibility of distutils-sig as a whole - it only takes it outside
the domain of responsibility of PyPI.

So if you want to say it is neither pip's nor PyPI's responsibility to say
anything one way or the other about the entry points format (beyond whether
or not they're used to declare console scripts in a way that pip
understands), then I agree with you entirely. This spec isn't something you
personally need to worry about, since it doesn't impact any of the tools
you work on (aside from giving pip's existing console_scripts
implementation a firmer foundation from an interoperability perspective).

So the core of our disagreement is whether or not interfaces involving pip
and PyPI represent the limits of distutils-sig's responsibility. They don't,
and that's reflected in the fact we have a split standing delegation from
Guido (one initially to Richard Jones and later to you for changes that
affect PyPI, and one to me for packaging ecosystem interoperability
specifications in general).


> so we can move console_scripts entry point to a legacy footnote as far as
> packaging systems go. Then we can discuss whether an arbitrary plugin
> system is actually a packaging related spec (it’s not) on its own merits.
>

Instructing publishing system developers on how to publish pkg_resources
compatible entry points is indeed a Python packaging ecosystem level
concern.

Whether that capability survives into a hypothetical future metadata
specification (whether that's PEP 426+459 or something else entirely) would
then be a different question, but it isn't one we need to worry about right
now (since it purely affects internal interoperability file formats that
only automated scripts and folks maintaining those scripts need to care
about, and we'd expect entry_points.txt and PKG-INFO to coexist alongside
any new format for a *long* time).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Thomas Kluyver
On Fri, Oct 20, 2017, at 12:50 PM, Donald Stufft wrote:
> * Since it is a packaging standard, then it is expected that all
> packaging tools will be updated to work with it.

Where packaging tools need to know about it,  they already have to.
Where they don't, writing a standard doesn't imply that every tool has
to implement it. Documenting it doesn't change either case, it just
makes life easier for tools that do need to use it.

> * We’re explicitly saying that this is the one true way of solving this
> problem in the Python ecosystem.

I don't buy that at all. We're saying that it exists, and this is what
it is.

> * We stifle innovation (hell just including it in setuptools at all does
> this, but we can’t unopen that can of worms).

I don't think that's true to any significant extent. Having a standard
does not stop people coming up with something better.

> * We make it actively harder to improve the feature (since once it’s part
> of the purview of packaging standards, all of distutils-sig gets to weigh
> in on improvements).

It hasn't changed in years, as far as I know, and it's so widely used
that any change is likely to break a load of stuff anyway. As we've
already discussed for caching, we can improve by building *on top* of it
relatively easily. And ultimately I think that bringing it out into
daylight leads to a healthier future than leaving it under the stone
marked 'setuptools'.



Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Donald Stufft

> On Oct 20, 2017, at 7:31 AM, Thomas Kluyver  wrote:
> 
> On Fri, Oct 20, 2017, at 12:15 PM, Donald Stufft wrote:
>> Tell you what, I’ll drop everything today and write up a PEP...
> 
> Donald, why are you so determined that this spec should not be created? Your 
> time is enormously valuable, so why would you drop everything to write a PEP 
> which implies changes to tooling, simply so that we don't document the status 
> quo? Even if we do make that change, there are thousands of existing packages 
> using the existing de-facto standard, so it would still be valuable to 
> document it.
> 
> If it makes things easier, I'll host the spec on my own site and add a 'see 
> also' from the specs page of the packaging user guide (because I think people 
> would expect it to be there, even if it's not the 'right' place). But I don't 
> think anyone else has expressed any objection to putting the spec there.
> 
> Thomas


I mean, it’s a PEP I was already planning on writing at some point, because 
I’ve *never* liked the fact that our console script support was reliant on a 
setuptools feature so all I’d be doing is re-prioritizing work I was already 
planning on doing. I’m also completely happy with documenting the status quo, 
which from a packaging standpoint means documenting console_scripts; it 
doesn’t mean pulling in an entire setuptools feature. I’m not even against 
documenting the entire feature, *if* it’s done inside of setuptools where it 
belongs.

What I am against is moving the entire entry points feature from a setuptools 
feature to a packaging standard. It is, at best, tangential to packaging, since 
outside of console_scripts its only real relation is that it uses features of 
the packaging ecosystem and happened to come from setuptools (but it could have 
just as easily been written externally to setuptools). Making it a packaging 
standard comes with several implications:

* Since it is a packaging standard, then it is expected that all packaging 
tools will be updated to work with it.
* We’re explicitly saying that this is the one true way of solving this problem 
in the Python ecosystem.
* We stifle innovation (hell just including it in setuptools at all does this, 
but we can’t unopen that can of worms).
* We make it actively harder to improve the feature (since once it’s part of 
the purview of packaging standards, all of distutils-sig gets to weigh in on 
improvements).

I don’t get why anyone would want to saddle all of the extra implications and 
work that comes with being a packaging standard on a feature that isn’t one and 
doesn’t need to be one. We are at our best when our efforts are on generalized 
mechanisms that allow features such as entry points to be implemented on top of 
us, rather than trying to pull in every tangential feature under the sun into 
our domain.



Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Thomas Kluyver
On Fri, Oct 20, 2017, at 12:15 PM, Donald Stufft wrote:
> Tell you what, I’ll drop everything today and write up a PEP...

Donald, why are you so determined that this spec should not be created?
Your time is enormously valuable, so why would you drop everything to
write a PEP which implies changes to tooling, simply so that we don't
document the status quo? Even if we do make that change, there are
thousands of existing packages using the existing de-facto standard, so
it would still be valuable to document it.

If it makes things easier, I'll host the spec on my own site and add a
'see also' from the specs page of the packaging user guide (because I
think people would expect it to be there, even if it's not the 'right'
place). But I don't think anyone else has expressed any objection to
putting the spec there.

Thomas


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Donald Stufft

> On Oct 20, 2017, at 1:32 AM, Nick Coghlan  wrote:
> 
> 3. Unlike setup.cfg & pyproject.toml, actual humans never touch it - it's 
> written and read solely by software

This is wrong BTW, humans can and do effectively write entry_points.txt, it’s a 
supported feature of setuptools to do:

import setuptools

setuptools.setup(
    # The string is written by hand, in the same INI syntax as entry_points.txt
    entry_points="""
    [my_cool_entrypoint]
    athing = the.thing:bar
    """,
)


This is documented and I have run into a number of projects that do this.


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Donald Stufft


> On Oct 20, 2017, at 7:02 AM, Nick Coghlan  wrote:
> 
>  That's the one point where the "de facto standard" status of setuptools is 
> relevant to the question of whether the entry_points.txt format is a PyPA 
> interoperability standard: it is, because providing a functionally equivalent 
> capability is required for publishers to be able to transparently switch from 
> setuptools to something else without their end users noticing the difference.


Nope. Because this isn’t a packaging feature. It’s a runtime feature of 
setuptools, and we do everyone a disservice by trying to move this into the 
purview of distutils-sig just because setuptools included a feature once. Just 
because setuptools included a feature does *NOT* make it a packaging related 
feature.

Tell you what, I’ll drop everything today and write up a PEP that adds metadata 
for console scripts to the packaging metadata where it belongs, so we can move 
console_scripts entry point to a legacy footnote as far as packaging systems 
go. Then we can discuss whether an arbitrary plugin system is actually a 
packaging related spec (it’s not) on its own merits.


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Nick Coghlan
On 20 October 2017 at 20:48, Donald Stufft  wrote:

>
> On Oct 20, 2017, at 1:32 AM, Nick Coghlan  wrote:
>
> If we want to enable pytest plugin authors to use other build systems like
> flit, then those build systems need a defined interoperability format
> that's compatible with what pytest is expecting to see (i.e. entry point
> definitions that pkg_resources knows how to read).
>
>
> This is thinking about it wrong IMO.
>
> We could just as easily say if we want tools like flit to be able to
> package Twisted plugins then those build systems need a defined
> interoperability format that is compatible with what Twisted and that
> ecosystem is expecting.
>

Twisted already defines plugin discovery in an inherently
packaging-friendly way, since it's based on import names rather than
packaging metadata. Other plugin management systems like straight.plugins
are similar: they use Python's import system as their pub/sub channel to
advertise plugin availability, and accept the limitation that this means
all plugin APIs will be module level ones rather than being individual
classes or callables.
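
A minimal sketch of what import-name-based discovery can look like, using
only the standard library. This is an illustration of the general idea, not
Twisted's or straight.plugins' actual implementation; the function name
discover_plugins is made up for the example.

```python
import importlib
import pkgutil


def discover_plugins(package_name):
    """Import every direct submodule of `package_name`, keyed by full name.

    Illustrative only: the "plugin API" here is simply "be a module inside
    the agreed package", which is why such plugins are module-level.
    """
    package = importlib.import_module(package_name)
    plugins = {}
    # package.__path__ lists every directory that contributes to the package
    for info in pkgutil.iter_modules(package.__path__, prefix=package.__name__ + "."):
        plugins[info.name] = importlib.import_module(info.name)
    return plugins
```

Note that nothing here reads packaging metadata: any build tool that can put
a module in the right package can publish a plugin this way.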


> The *ONLY* reason we should care at all about defining entry points as a
> packaging feature is console scripts, so we should limit our
> standardization to that. PBR has a runtime feature too where it inserts
> metadata into the .dist-info directory at build time and then a runtime API
> that reads that.. should we standardize that too?
>

No, because PBR isn't the de facto default build system that pip injects
into setup.py execution by default. That's the one point where the "de
facto standard" status of setuptools is relevant to the question of whether
the entry_points.txt format is a PyPA interoperability standard: it is,
because providing a functionally equivalent capability is required for
publishers to be able to transparently switch from setuptools to something
else without their end users noticing the difference.

Sure we *could* say "We don't want to standardise on that one, we want to
define a different one", but I think entry points are good enough for our
purposes, so inventing something different wouldn't be a good use of
anyone's time (see also: the perpetually deferred status of PEP 426).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Nick Coghlan
On 20 October 2017 at 16:43, Thomas Kluyver  wrote:

> I would also be happy to add a section to the document describing the
> specific use of entry points for defining scripts to install.
>

Yeah, it would make sense to include that, as well as reserving the
"console_scripts" name on PyPI so we abide by our own "Only rely on a
category name if you or one of your dependencies controls it on PyPI"
guideline.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Donald Stufft

> On Oct 20, 2017, at 1:32 AM, Nick Coghlan  wrote:
> 
> If we want to enable pytest plugin authors to use other build systems like 
> flit, then those build systems need a defined interoperability format that's 
> compatible with what pytest is expecting to see (i.e. entry point definitions 
> that pkg_resources knows how to read).
> 

This is thinking about it wrong IMO.

We could just as easily say if we want tools like flit to be able to package 
Twisted plugins then those build systems need a defined interoperability format 
that is compatible with what Twisted and that ecosystem is expecting.

The *ONLY* reason we should care at all about defining entry points as a 
packaging feature is console scripts, so we should limit our standardization to 
that. PBR has a runtime feature too where it inserts metadata into the 
.dist-info directory at build time and then a runtime API that reads that... 
should we standardize that too?

I’m *not* saying that flit doesn’t need to know how to generate entry points if 
an entry-points-using project wants to use flit, but what I am saying is that 
entry points isn’t a packaging specification. It’s a setuptools feature that 
should live within setuptools.


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Thomas Kluyver
I would also be happy to add a section to the document describing the
specific use of entry points for defining scripts to install.


Re: [Distutils] Entry points: specifying and caching

2017-10-20 Thread Thomas Kluyver
On Fri, Oct 20, 2017, at 05:42 AM, Nick Coghlan wrote:
> I'm wondering if rather than jumping straight to a PEP, it may make
> sense to instead initially pursue this idea as a *non-*standard,
> implementation dependent thing specific to the "entrypoints" project.
> There are a *lot* of challenges to be taken into account for a truly
> universal metadata caching design, and it would be easy to fall into
> the trap of coming up with a design so complex that nobody can
> realistically implement it.

I'd be happy to tackle it like that. Donald's proposed hooks for package
installation and uninstallation would provide all the necessary
interoperation between different tools. As and when it's working, the
cache format can be documented for other consumers to use.

> Right now, the only documented publishing API for that pub/sub channel
> is setuptools.setup(), and the only documented subscription API is
> pkg_resources. Documenting the file format explicitly changes that
> dynamic, such that any publisher that produces a compliant
> `entry_points.txt` file will be supported by pkg_resources, and any
> consumer that can read a compliant `entry_points.txt` file will be
> supported by setuptools.setup()

Yup, this is very much what I'd like :-)

Thanks,
Thomas


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Nick Coghlan
On 20 October 2017 at 07:33, Donald Stufft  wrote:

>
> On Oct 19, 2017, at 5:26 PM, Tres Seaver  wrote:
>
> Having the packaging
> system register those services at installation time (even if it doesn't
> care otherwise about them) seems pretty reasonable to me.
>
>
> It does register them at installation time, using an entirely generic
> feature of “you can add any file you want to a dist-info directory and we
> will preserve it”. It doesn’t need to know anything else about them other
> than it’s a file that needs preserved.
>

That's all the *installer* needs to know. Publishing tools like flit need
to know the internal format in order to replicate the effect of
https://packaging.python.org/tutorials/distributing-packages/#console-scripts
and to interoperate with any other pkg_resources based plugin ecosystem.

I personally find it useful to think of entry points as a pub/sub
communications channel between package authors and other runtime components.

When you use the entry points syntax to declare a pytest plugin as a
publisher, your intended subscriber is pytest, and pytest defines the
possible messages. Ditto for any other entry points based plugin management
system.

Installers are mostly just a relay link in that pub/sub channel - they take
the entry point announcement messages in the sdist or wheel archive, and
pass them along to the installation database.

The one exception to the "installers as passive relay" behaviour is that
when you specify "console_scripts", your intended subscribers *are* package
installation tools, and your message is "I'd like an executable wrapper for
these entry points, please".
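
To make the "executable wrapper" request concrete, here is a rough sketch of
what such a wrapper has to do with a "module:attr" reference. It is not the
script that pip or setuptools actually generates; run_entry_point is a
made-up helper name.

```python
import importlib


def run_entry_point(spec):
    """Resolve a 'module:attr' reference and call it with no arguments."""
    module_name, _, attr_path = spec.partition(":")
    target = importlib.import_module(module_name)
    # The part after ':' may be dotted, e.g. "SomeClass.some_method"
    for part in attr_path.split("."):
        target = getattr(target, part)
    return target()


# A wrapper generated for the (hypothetical) declaration
#     mytool = mypkg.cli:main
# would then boil down to: sys.exit(run_entry_point("mypkg.cli:main"))
```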

Right now, the only documented publishing API for that pub/sub channel is
setuptools.setup(), and the only documented subscription API is
pkg_resources. Documenting the file format explicitly changes that dynamic,
such that any publisher that produces a compliant `entry_points.txt` file
will be supported by pkg_resources, and any consumer that can read a
compliant `entry_points.txt` file will be supported by setuptools.setup()
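
As an illustration of how small a non-pkg_resources consumer could be, the
sketch below reads the INI-style text with the standard library's
configparser. It assumes the "name = module:attr [extras]" value syntax that
setuptools writes; parse_entry_points is a hypothetical helper, not an API of
any existing project.

```python
import configparser


def parse_entry_points(text):
    """Parse entry_points.txt text into {group: {name: (module, attr, extras)}}."""
    # Only '=' separates name from value, and names stay case-sensitive.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str
    parser.read_string(text)
    groups = {}
    for group in parser.sections():
        entries = {}
        for name, value in parser.items(group):
            # value looks like "package.module:object.attr [extra1,extra2]"
            ref, _, extras_part = value.partition("[")
            module, _, attr = ref.strip().partition(":")
            extras = [e.strip() for e in extras_part.rstrip("] ").split(",") if e.strip()]
            entries[name] = (module.strip(), attr.strip() or None, extras)
        groups[group] = entries
    return groups
```

Each section is an entry point group (such as console_scripts), and each key
within it is one published entry point.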

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Nick Coghlan
On 20 October 2017 at 06:34, Donald Stufft  wrote:

>
> On Oct 19, 2017, at 4:04 PM, Donald Stufft  wrote:
>
> Like I said, I’m perfectly fine documenting that if you add an
> entry_points.txt to the .dist-info directory, that is an INI file that
> contains a section named “console_scripts” and define what is valid inside
> of the console_scripts section that it will generate script wrappers, then
> fine. But we should leave any other section in this entry_points.txt file
> as undefined in packaging terms, and point people towards setuptools for
> more information about it if they want to know anything more than what we
> need for packaging.
>
>
> To be more specific here, the hypothetical thing we would be
> documenting/standardizing here is console entry points and script wrappers,
> not a generic plugin system. So console scripts would be the focus of the
> documentation.
>

We've already effectively blessed console_scripts as a standard approach:
https://packaging.python.org/tutorials/distributing-packages/#entry-points

The specific problem that blessing creates is that we currently only define:

- a way for publishers to specify console_scripts via setuptools
- a way for installers to find console_scripts using pkg_resources

That's *very* similar to the problem we had with dependency declarations:
only setuptools knew how to write them, and only easy_install knew how to
read them.

Beyond the specific example of console_scripts, there are also multiple
subecosystems where both publishers and subscribers remain locked into the
setuptools/pkg_resources combination because they use entry points for
their plugin management. This means that if you want to write a pytest
plugin, for example, the only officially supported way to do so is to use
setuptools in order to publish the relevant entry point definitions:
https://docs.pytest.org/en/latest/writing_plugins.html#setuptools-entry-points

If we want to enable pytest plugin authors to use other build systems like
flit, then those build systems need a defined interoperability format
that's compatible with what pytest is expecting to see (i.e. entry point
definitions that pkg_resources knows how to read).

We ended up solving the previous tight publisher/installer coupling problem
for dependency management *not* by coming up with completely new metadata
formats, but rather by better specifying the ones that setuptools already
knew how to emit, such that most publishers didn't need to change anything,
and even when there were slight differences between the way setuptools
worked and the agreed interoperability standards, other tools could readily
translate setuptools output into the standardised form (e.g. egg_info ->
PEP 376 dist-info directories and wheel metadata).

The difference in this case is that:

1. entry_points.txt is already transported reliably through the whole
packaging toolchain
2. It is the existing interoperability format for `console_scripts`
definitions
3. Unlike setup.cfg & pyproject.toml, actual humans never touch it - it's
written and read solely by software

This means that the interoperability problems we actually care about
solving (allowing non-setuptools based publishing tools to specify
console_scripts and other pkg_resources entry points, and allowing
non-pkg_resources based consumers to read pkg_resources entry point
metadata, including console_scripts) can both be solved *just* by properly
specifying the existing de facto format.

So standardising on entry_points.txt isn't a matter of "because setuptools
does it", it's because formalising it is the least-effort solution to what
we actually want to enable: making setuptools optional on the publisher
side (even if you need to publish entry point metadata), and making
pkg_resources optional on the consumer side (even if you need to read entry
point metadata).

I do agree that the metadata caching problem is best tackled as a specific
motivating example for supporting packaging installation and uninstallation
hooks, but standardising the entry points format still helps us with that:
it means we can just define "python.install_hooks" as a new entry point
category, and spend our energy on defining the semantics and APIs of the
hooks themselves, rather than having to worry about defining a new format
for how publishers will declare how to run the hooks, or how installers
will find out which hooks have been installed locally.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Nick Coghlan
On 20 October 2017 at 02:14, Thomas Kluyver  wrote:

> On Thu, Oct 19, 2017, at 04:10 PM, Donald Stufft wrote:
> > I’m in favor, although one question I guess is whether it should be a
> > PEP or an ad hoc spec. Given (2) it should *probably* be a PEP (since
> > without (2), its just another file in the .dist-info directory and that
> > doesn’t actually need standardized at all). I don’t think that this will
> > be a very controversial PEP though, and should be pretty easy.
>
> I have opened a PR to document what is already there, without adding any
> new features. I think this is worth doing even if we don't change
> anything, since it's a de-facto standard used for different tools to
> interact.
>
> https://github.com/pypa/python-packaging-user-guide/pull/390
>
> We can still write a PEP for caching if necessary.
>

+1 for that approach (PR for the status quo, PEP for a shared metadata
caching design) from me

Making the status quo more discoverable is valuable in its own right, and
the only decisions we'll need to make for that are terminology
clarification ones, not interoperability ones (this isn't like PEP 440 or
508 where we actually thought some of the default setuptools behaviour was
slightly incorrect and wanted to change it).

Figuring out a robust cross-platform network-file-system-tolerant metadata
caching design on the other hand is going to be hard, and as Donald
suggests, the right ecosystem level solution might be to define
install-time hooks for package installation operations.


> > I’m also in favor of this. Although I would suggest SQLite rather than a
> > JSON file for the primary reason being that a JSON file isn’t
> > multiprocess safe without being careful (and possibly introducing
> > locking) whereas SQLite has already solved that problem.
>
> SQLite was actually my first thought, but from experience in Jupyter &
> IPython I'm wary of it - its built-in locking does not work well over
> NFS, and it's easy to corrupt the database. I think careful use of
> atomic writing can be more reliable (though that has given us some
> problems too).
>
> That may be easier if there's one cache per user, though - we can
> perhaps try to store it somewhere that's not NFS.
>

I'm wondering if rather than jumping straight to a PEP, it may make sense
to instead initially pursue this idea as a *non-*standard, implementation
dependent thing specific to the "entrypoints" project. There are a *lot* of
challenges to be taken into account for a truly universal metadata caching
design, and it would be easy to fall into the trap of coming up with a
design so complex that nobody can realistically implement it.

Specifically, I'm thinking of a usage model along the lines of the
updatedb/locate pair on *nix systems: `locate` gives you access to very
fast searches of your filesystem, but it *doesn't* try to automagically
keep its indexes up to date. Instead, refreshing the indexes is handled by
`updatedb`, and you can either rely on that being run automatically in a
cron job, or else force an update with `sudo updatedb` when you want to use
`locate`.

For a project like entrypoints, what that might look like is that at
*runtime*, you implement a reasonably fast "cache freshness check",
where you scan the mtime of all the sys.path entries, and compare those
to the mtime of the cache. If the cache looks up to date, then cool,
otherwise emit a warning about the stale metadata cache, and then bypass it.
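
A rough sketch of that freshness check, taking the description literally
(cache mtime compared against each sys.path entry's mtime). The function
names and the read_cache/scan callables are assumptions made for the example,
and it does nothing about filesystem timestamp granularity.

```python
import os
import sys
import warnings


def cache_is_fresh(cache_path, search_path=None):
    """True if the cache file is at least as new as every search path entry."""
    search_path = sys.path if search_path is None else search_path
    try:
        cache_mtime = os.stat(cache_path).st_mtime
    except OSError:
        return False  # no cache file at all
    for entry in search_path:
        try:
            # An empty sys.path entry means the current directory
            if os.stat(entry or ".").st_mtime > cache_mtime:
                return False
        except OSError:
            continue  # a missing entry cannot contribute any metadata
    return True


def load_entry_points(cache_path, read_cache, scan):
    """Use the cache when it looks fresh; otherwise warn and bypass it."""
    if cache_is_fresh(cache_path):
        return read_cache(cache_path)
    warnings.warn("entry points cache looks stale; falling back to a full scan")
    return scan()
```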

The entrypoints project itself could then expose a
`refresh-entrypoints-cache` command that could start out only supporting
virtual environments, and then extend to per-user caching, and then finally
(maybe) consider whether or not it wanted to support installation-wide
caches (with the extra permissions management and cross-process and
cross-system coordination that may imply).
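
For the simplest case (a single environment), a refresh command could be
sketched roughly as below. The JSON layout, the function name and the choice
to key by metadata directory name are all assumptions for illustration; the
write goes through a temporary file and os.replace so that readers never see
a half-written cache.

```python
import configparser
import glob
import json
import os
import tempfile


def refresh_entrypoints_cache(site_dirs, cache_path):
    """Collect entry_points.txt from *.dist-info/*.egg-info dirs into one JSON file."""
    cache = {}
    for site_dir in site_dirs:
        for suffix in ("*.dist-info", "*.egg-info"):
            pattern = os.path.join(glob.escape(site_dir), suffix, "entry_points.txt")
            for path in sorted(glob.glob(pattern)):
                parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
                parser.optionxform = str  # entry point names are case-sensitive
                parser.read(path, encoding="utf-8")
                dist_dir = os.path.basename(os.path.dirname(path))
                cache[dist_dir] = {g: dict(parser.items(g)) for g in parser.sections()}
    # Write next to the final location, then atomically swap the file in.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(cache_path) or ".")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    os.replace(tmp_path, cache_path)
    return cache
```

For a virtual environment, site_dirs would just be its site-packages
directory; per-user and installation-wide caches raise the permission and
coordination questions mentioned above.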

Such an approach would also tie in nicely with Donald's suggestion of
reframing the ecosystem level question as "How should the entrypoints
project request that 'refresh-entrypoints-cache' be run after every package
installation or removal operation?", which in turn would integrate nicely
with things like RPM file triggers (where the system `pip` package could
set a file trigger that arranged for any properly registered Python package
installation plugins to be run for every modification to site-packages
while still appropriately managing the risk of running arbitrary code with
elevated privileges)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Wes Turner
On Thursday, October 19, 2017, Donald Stufft  wrote:

>
> On Oct 19, 2017, at 5:26 PM, Tres Seaver wrote:
>
> Having the packaging
> system register those services at installation time (even if it doesn't
> care otherwise about them) seems pretty reasonable to me.
>
>
> It does register them at installation time, using an entirely generic
> feature of “you can add any file you want to a dist-info directory and we
> will preserve it”. It doesn’t need to know anything else about them other
> than it’s a file that needs to be preserved.
>

When I think of 'register at installation time', I think of adding them to
a single { locked JSON || SQLite DB || ...}; because that's the only way
there'd be a performance advantage?

Why would we write a .txt, transform it to {JSON || SQL INSERTS}, and then
write it to a central registrar?

(BTW, pipsi does console script entry points with isolated virtualenvs
linked into from ~/.local/bin (which is generally user-writable)).


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Donald Stufft

> On Oct 19, 2017, at 5:26 PM, Tres Seaver  wrote:
> 
> Having the packaging
> system register those services at installation time (even if it doesn't
> care otherwise about them) seems pretty reasonable to me.

It does register them at installation time, using an entirely generic feature 
of “you can add any file you want to a dist-info directory and we will preserve 
it”. It doesn’t need to know anything else about them other than it’s a file 
that needs to be preserved.


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Tres Seaver
On 10/19/2017 04:57 PM, Donald Stufft wrote:

> Because the feature is unrelated to packaging other than the fact we
> currently utilize it for console_scripts.
That seems like an odd perspective.  Console scripts may be the only bit of
entry points which is used *by the packaging system* at installation time,
but a system composed of separately-installable packages providing shared
services needs some way of querying those services at runtime, which is
what all the *other* uses of entry points represent.  Having the packaging
system register those services at installation time (even if it doesn't
care otherwise about them) seems pretty reasonable to me.


Tres.
-- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   "Excellence by Design"http://palladion.com



Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Donald Stufft


> On Oct 19, 2017, at 4:36 PM, Thomas Kluyver  wrote:
> 
> On Thu, Oct 19, 2017, at 09:04 PM, Donald Stufft wrote:
>> Like I said, I’m perfectly fine documenting that if you add an
>> entry_points.txt to the .dist-info directory, that is an INI file that
>> contains a section named “console_scripts” and define what is valid
>> inside of the console_scripts section that it will generate script
>> wrappers, then fine. But we should leave any other section in this
>> entry_points.txt file as undefined in packaging terms, and point people
>> towards setuptools for more information about it if they want to know
>> anything more than what we need for packaging.
> 
> I don't see any advantage in describing the file format but then
> pretending that there's only one section in it. We're not prescribing any
> particular meaning or use for other sections, but it seems bizarre to
> not describe the possibilities. console_scripts is just one use case.

Because the feature is unrelated to packaging other than the fact we currently 
utilize it for console_scripts. A spec to standardize console_scripts is a good 
thing, a spec to standardize an almost entirely unrelated feature for packaging 
is a bad thing. 

> 
> Also, entry points in general kind of are a packaging thing. You specify
> them in packaging metadata, both for setuptools and flit, and the
> packaging tools write entry_points.txt. It's not the only way to create
> a plugin system, but it's the way this one was created.

You can describe lots of things in the packaging metadata, because one of the 
features of the packaging metadata is you can add arbitrary files to the 
dist-info directory. Entrypoints are one such file that some projects add to 
that directory, but there are other examples, and just because it involves 
adding files to that directory does not mean it belongs to “packaging”.

> 
> I honestly don't get the resistance to documenting this as a whole. I'm
> not proposing something that will add a new maintenance burden; it's a
> description of something that's already there. Can't we save the energy
> for discussing a real change or new thing?
> 

I don’t get the resistance to documenting this where it belongs. It’s not any 
more difficult to document things in the setuptools repository than it is to 
document it in the packaging specs repository.



Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Thomas Kluyver
On Thu, Oct 19, 2017, at 09:04 PM, Donald Stufft wrote:
> Like I said, I’m perfectly fine documenting that if you add an
> entry_points.txt to the .dist-info directory, that is an INI file that
> contains a section named “console_scripts” and define what is valid
> inside of the console_scripts section that it will generate script
> wrappers, then fine. But we should leave any other section in this
> entry_points.txt file as undefined in packaging terms, and point people
> towards setuptools for more information about it if they want to know
> anything more than what we need for packaging.

I don't see any advantage in describing the file format but then
pretending that there's only one section in it. We're not prescribing any
particular meaning or use for other sections, but it seems bizarre to
not describe the possibilities. console_scripts is just one use case.

Also, entry points in general kind of are a packaging thing. You specify
them in packaging metadata, both for setuptools and flit, and the
packaging tools write entry_points.txt. It's not the only way to create
a plugin system, but it's the way this one was created.

I honestly don't get the resistance to documenting this as a whole. I'm
not proposing something that will add a new maintenance burden; it's a
description of something that's already there. Can't we save the energy
for discussing a real change or new thing?

Thomas


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Donald Stufft

> On Oct 19, 2017, at 4:04 PM, Donald Stufft  wrote:
> 
> Like I said, I’m perfectly fine documenting that if you add an 
> entry_points.txt to the .dist-info directory, that is an INI file that 
> contains a section named “console_scripts” and define what is valid inside of 
> the console_scripts section that it will generate script wrappers, then fine. 
> But we should leave any other section in this entry_points.txt file as 
> undefined in packaging terms, and point people towards setuptools for more 
> information about it if they want to know anything more than what we need for 
> packaging.

To be more specific here, the hypothetical thing we would be 
documenting/standardizing here is console entry points and script wrappers, not 
a generic plugin system. So console scripts would be the focus of the 
documentation.


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Donald Stufft


> On Oct 19, 2017, at 3:55 PM, Thomas Kluyver  wrote:
> 
> On Thu, Oct 19, 2017, at 08:29 PM, Donald Stufft wrote:
>> Because it is? A generic plugin mechanism is not a packaging feature any 
>> more than an HTTP client is a packaging feature, but setuptools contains one 
>> of those too. Since setuptools was in large part a packaging library, it 
>> will of course contain many packaging features that we’re going to 
>> standardize on, but something being in setuptools does not in fact make it a 
>> packaging feature in and of itself.
> 
> My argument is not that it's in setuptools, it's that
> 
> 1. It's already processed by multiple packaging tools
> 2. Any tool producing wheels which include command line tools basically has 
> to use entry points (or include a bunch of redundant complexity to make 
> command-line wrappers). It's a de-facto part of the wheel spec, at least 
> until a replacement is devised - and since it works, replacing for semantic 
> cleanliness is not a priority.
> 
> You're quite right that a plugin system doesn't need to be a packaging 
> standard. But that ship has sailed. It's already a standard format for 
> packaging, the only question is whether it's documented. Practicality beats 
> purity.


Like I said, I’m perfectly fine documenting that if you add an entry_points.txt 
to the .dist-info directory, that is an INI file that contains a section named 
“console_scripts” and define what is valid inside of the console_scripts 
section that it will generate script wrappers, then fine. But we should leave 
any other section in this entry_points.txt file as undefined in packaging 
terms, and point people towards setuptools for more information about it if 
they want to know anything more than what we need for packaging.
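
To make the file format concrete, here is a minimal sketch of reading an
entry_points.txt with the standard library's configparser. The sample
contents, the `mypkg` names and both helper functions are made up for
illustration; this is not how pip or setuptools implement it, and it
ignores the optional `[extras]` suffix on a target.

```python
import configparser

# Hypothetical file contents; only console_scripts has packaging meaning here.
SAMPLE = """\
[console_scripts]
mytool = mypkg.cli:main

[myapp.plugins]
json = mypkg.plugins:JsonPlugin
"""


def parse_entry_points(text):
    """Return {section: {name: target}} for an entry_points.txt string."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    parser.read_string(text)
    return {name: dict(parser.items(name)) for name in parser.sections()}


def console_scripts(text):
    """Return {script_name: (module, attribute)} from console_scripts only."""
    scripts = {}
    section = parse_entry_points(text).get("console_scripts", {})
    for name, target in section.items():
        module, _, attr = target.partition(":")
        scripts[name] = (module.strip(), attr.strip())
    return scripts
```

An installer following the narrower proposal would only call something
like `console_scripts()` and treat every other section as opaque.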

I am against fully speccing out or adding more features to entry points as part 
of a packaging standardization effort.


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Thomas Kluyver
On Thu, Oct 19, 2017, at 08:29 PM, Donald Stufft wrote:
> Because it is? A generic plugin mechanism is not a packaging feature
> any more than an HTTP client is a packaging feature, but setuptools
> contains one of those too. Since setuptools was in large part a
> packaging library, it will of course contain many packaging features
> that we’re going to standardize on, but something being in setuptools
> does not in fact make it a packaging feature in and of itself.
My argument is not that it's in setuptools, it's that

1. It's already processed by multiple packaging tools
2. Any tool producing wheels which include command line tools basically
   has to use entry points (or include a bunch of redundant complexity
   to make command-line wrappers). It's a de-facto part of the wheel
   spec, at least until a replacement is devised - and since it works,
   replacing for semantic cleanliness is not a priority.
You're quite right that a plugin system doesn't need to be a packaging
standard. But that ship has sailed. It's already a standard format for
packaging, the only question is whether it's documented. Practicality
beats purity.
Thomas


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Donald Stufft


> On Oct 19, 2017, at 3:15 PM, Thomas Kluyver  wrote:
> 
> On Thu, Oct 19, 2017, at 08:01 PM, Donald Stufft wrote:
>> 
>>> On Oct 19, 2017, at 2:54 PM, Thomas Kluyver wrote:
>>> 
>>> I don't think this needs to be controversial. They are a de-facto
>>> packaging standard, whether or not that's theoretically necessary.
>>> There's more than one tool that can create them (setuptools, flit), and
>>> more than one that can consume them (pkg_resources, entrypoints). Lots
>>> of packages use them, and they're not going anywhere soon. Describing
>>> the format properly seems like a clear win.
>> 
>> 
>> 
>> I disagree they are a packaging standard and I think it would be crummy to 
>> define it as one. I believe it is a setuptools feature; that flit and 
>> entrypoints want to integrate with a setuptools feature is fine, but that 
>> doesn’t make it a packaging standard just because it came from setuptools. I 
>> agree that describing the format properly is a clear win, but I believe it 
>> belongs in the setuptools documentation.
> 
> pip and distlib also independently read this format without going through 
> setuptools. It's a de-facto standard already.  Entry points are also the most 
> common way for packages to install command-line scripts, and the most 
> effective way to do so across different platforms. So it's essential that 
> install tools do understand this.

It’s only essential in that we support a very limited subset specifically for 
console scripts, which long term we should be extracting from entry points and 
using something dedicated to that. Generating script wrappers is a packaging 
concern, and if this proposal was about documenting the console_scripts key in 
an entry_points.txt file to trigger a console script being generated, then 
that’s fine with me.

> 
> Much of our packaging standards were built out of setuptools features anyway 
> - why pretend that this is different?

Because it is? A generic plugin mechanism is not a packaging feature any more 
than an HTTP client is a packaging feature, but setuptools contains one of those 
too. Since setuptools was in large part a packaging library, it will of course 
contain many packaging features that we’re going to standardize on, but 
something being in setuptools does not in fact make it a packaging feature in 
and of itself.

As an example of another setuptools feature that isn’t a packaging feature, I 
also would be against adding the resource APIs in a packaging standard because 
they’re not a packaging feature either, they’re a python import module feature 
(which is why Brett Cannon and Barry are adding them to importlib instead of 
trying to make a packaging PEP for them).



Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Thomas Kluyver
On Thu, Oct 19, 2017, at 08:01 PM, Donald Stufft wrote:
> 
>> On Oct 19, 2017, at 2:54 PM, Thomas Kluyver wrote:
>>
>> I don't think this needs to be controversial. They are a de-facto
>> packaging standard, whether or not that's theoretically necessary.
>> There's more than one tool that can create them (setuptools, flit), and
>> more than one that can consume them (pkg_resources, entrypoints). Lots
>> of packages use them, and they're not going anywhere soon. Describing
>> the format properly seems like a clear win.
> 
> 
> I disagree they are a packaging standard and I think it would be
> crummy to define it as one. I believe it is a setuptools feature; that
> flit and entrypoints want to integrate with a setuptools feature is
> fine, but that doesn’t make it a packaging standard just because it
> came from setuptools. I agree that describing the format properly is a
> clear win, but I believe it belongs in the setuptools documentation.
pip and distlib also independently read this format without going
through setuptools. It's a de-facto standard already.  Entry points are
also the most common way for packages to install command-line scripts,
and the most effective way to do so across different platforms. So it's
essential that install tools do understand this.
Much of our packaging standards were built out of setuptools features
anyway - why pretend that this is different?


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Donald Stufft

> On Oct 19, 2017, at 2:54 PM, Thomas Kluyver  wrote:
> 
> I don't think this needs to be controversial. They are a de-facto
> packaging standard, whether or not that's theoretically necessary.
> There's more than one tool that can create them (setuptools, flit), and
> more than one that can consume them (pkg_resources, entrypoints). Lots
> of packages use them, and they're not going anywhere soon. Describing
> the format properly seems like a clear win.


I disagree they are a packaging standard and I think it would be crummy to 
define it as one. I believe it is a setuptools feature; that flit and 
entrypoints want to integrate with a setuptools feature is fine, but that 
doesn’t make it a packaging standard just because it came from setuptools. I 
agree that describing the format properly is a clear win, but I believe it 
belongs in the setuptools documentation.


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Thomas Kluyver
On Thu, Oct 19, 2017, at 07:09 PM, Donald Stufft wrote:
> So here's a different idea that is a bit more ambitious but that I think
> is a better overall idea. Let entrypoints be a setuptools thing, and let's
> define some key lifecycle hooks during the installation of a package and
> some mechanism in the metadata to let other tools subscribe to those
> hooks.

I'd like to document the existing mechanism as previously suggested. Not
least because I've already written the PR ;-).

I don't think this needs to be controversial. They are a de-facto
packaging standard, whether or not that's theoretically necessary.
There's more than one tool that can create them (setuptools, flit), and
more than one that can consume them (pkg_resources, entrypoints). Lots
of packages use them, and they're not going anywhere soon. Describing
the format properly seems like a clear win.

For caching, I'm happy enough to work on a more general PEP to define
packaging hooks, so long as that isn't going to be as long a discussion
as PEP 517.

Daniel:
> How long does pkg_resources take to import for you folks?

About 0.5s on my laptop with an SSD, about 5s on a machine with a
spinning hard drive. This is simulating a cold start on both; it's much
quicker once the OS caches it in memory.

Thomas


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Donald Stufft

> On Oct 19, 2017, at 2:28 PM, Paul Moore  wrote:
> 
> While I agree with this, one thing I have noticed with recent work is
> that standardising existing things has typically been relatively
> painless and stress-free. But designing new mechanisms generally ends
> up with huge threads, heated debates, and people burning out on the
> whole thing. We've had a couple of cases of that recently, and in
> particular Thomas has endured the big PEP 517 debate, so I'm inclined
> to say we should take a rest from new designs for a while, and keep
> the scope here limited.

So I’m generally fine with keeping the scope limited, but for the same reason 
as I think the real solution is what I defined above, I think this 
isn’t/shouldn’t be a packaging standard and is a setuptools feature and should 
be documented/live there. If setuptools wants to enable people to directly 
manipulate those files they can document the standard of those files, if they 
want to treat it as internal and you’re expected to use their APIs then they 
can.

Essentially, I don’t think that a plugin system should be within the domain of 
distutils-sig or the PyPA and the only reason we’re even thinking of it as one 
is because (a) historically setuptools _had_ a plugin system and (b) we lack 
lifecycle hooks. I’m loath to move the documentation for a setuptools-specific 
feature out of their documentation because I think it muddies the water further.


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Daniel Holth
I prefer a single more generic mechanism that packaging happens to use
instead of making special mechanisms for scripts or other callables that
packaging might some day be interested in. One API, I can type
pkg_resources.iter_entry_points('console_scripts') to enumerate the scripts
and perhaps invoke them without the wrappers, or I can look up other plugins.

+1 on simply documenting what we have first.

How long does pkg_resources take to import for you folks?

On Thu, Oct 19, 2017 at 2:10 PM Donald Stufft  wrote:

>
>
> > On Oct 19, 2017, at 12:14 PM, Thomas Kluyver 
> wrote:
> >
> > On Thu, Oct 19, 2017, at 04:10 PM, Donald Stufft wrote:
> >> I’m in favor, although one question I guess is whether it should be a
> >> PEP or an ad hoc spec. Given (2) it should *probably* be a PEP (since
> >> without (2), it’s just another file in the .dist-info directory and that
> >> doesn’t actually need to be standardized at all). I don’t think that this will
> >> be a very controversial PEP though, and should be pretty easy.
> >
> > I have opened a PR to document what is already there, without adding any
> > new features. I think this is worth doing even if we don't change
> > anything, since it's a de-facto standard used for different tools to
> > interact.
> >
> > https://github.com/pypa/python-packaging-user-guide/pull/390
> >
> > We can still write a PEP for caching if necessary.
>
> I think documenting what’s there is a reasonable goal, but if we’re going
> to add caching we should just PEP the whole thing changing it from a de facto
> standard to an actual standard + caching. Generally we should only use
> non-PEP “specs” in places where we’re just trying to document what exists
> already, but where we’re not really happy with the current solution or we
> plan to alter it eventually.
>
> For this, I think the entry points solution is generally a good one with
> some alterations (namely, the addition of caching)…. Although now that I
> think about it, maybe this isn’t really a packaging problem at all and I’m
> not sure that it benefits from standardization at all.
>
> So stepping back a second, here’s what entrypoints provides today:
>
> 1. A way to implement an interface that some other package can provide
> implementations for.
> 2. A way to specify script wrappers that will be automatically generated.
> 3. A way to define extras that must be installed in order for a particular
> entry point to be available.
>
> Off the bat I’m going to say we don’t need to worry about (2) in this
> hypothetical system, because I think the fact it is implemented currently
> via this system is mostly a historic accident, and it’s not something we
> should be looking at in the future. Script wrappers should have some
> dedicated metadata, not piggybacking off of the plugin system.
>
> For (3) I don’t believe that what extras were installed is recorded
> anywhere, so I’m going to guess that this works by looking up what extras
> are *available* for a particular package and then seeing if all of the
> requirements of that distribution are satisfied. Assuming that’s the case
> then that’s not really something that requires deep integration with the
> packaging toolchain, it just needs the APIs to look those things up.
>
> Finally we come to (1), which is in my opinion the meat of what you’re
> hoping to achieve here (and what most people are using entry points for
> outside of console scripts). What I notice about (1) is that it really has
> absolutely nothing to do with packaging at all. It would likely use some of
> the APIs provided by the packaging toolchain (for instance, the ability to
> add custom files to a .dist-info directory, the ability to iterate over
> installed packages, etc) but as a whole pip, setuptools, twine, PyPI, etc
> none of these things need to know anything about it.
>
> EXCEPT, for the fact that with the desire to cache things, it would be
> beneficial to “hook” into the lifecycle of a package install. However I
> know that there are other plugin systems out there that would like to also
> be able to do that (Twisted Plugins come to mind) and that I think outside
> of plugin systems, such a mechanism is likely to be useful in general for
> other cases.
>
> So here’s a different idea that is a bit more ambitious but that I think is
> a better overall idea. Let entrypoints be a setuptools thing, and let’s
> define some key lifecycle hooks during the installation of a package and
> some mechanism in the metadata to let other tools subscribe to those hooks.
> Then  a caching layer could be written for setuptools entrypoints to make
> that faster without requiring standardization, but also a whole new, better
> plugin system could too, Twisted plugins could benefit, etc [1].
>
> One thing that I like about all of our work recently in packaging is a lot
> of it has been about making it so there isn’t just one standard set of
> tools, and I think that providing lifecycle hooks is 

Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Paul Moore
On 19 October 2017 at 19:09, Donald Stufft  wrote:
>
> So here’s a different idea that is a bit more ambitious but that I think is a 
> better overall idea. Let entrypoints be a setuptools thing, and let’s define 
> some key lifecycle hooks during the installation of a package and some 
> mechanism in the metadata to let other tools subscribe to those hooks. Then  
> a caching layer could be written for setuptools entrypoints to make that 
> faster without requiring standardization, but also a whole new, better plugin 
> system could too, Twisted plugins could benefit, etc [1].

I think this is a nice idea, and like you say could likely enable a
number of interesting use cases. However...

>
> One thing that I like about all of our work recently in packaging is a lot of 
> it has been about making it so there isn’t just one standard set of tools, 
> and I think that providing lifecycle hooks is another step along that path.

While I agree with this, one thing I have noticed with recent work is
that standardising existing things has typically been relatively
painless and stress-free. But designing new mechanisms generally ends
up with huge threads, heated debates, and people burning out on the
whole thing. We've had a couple of cases of that recently, and in
particular Thomas has endured the big PEP 517 debate, so I'm inclined
to say we should take a rest from new designs for a while, and keep
the scope here limited.

We can go back and hit packaging system hooks later, it's not like the
idea will go away. And the breathing space will also give people time
to actually implement the recent PEPs, and consolidate the gains we've
already made.

Paul


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Donald Stufft


> On Oct 19, 2017, at 12:14 PM, Thomas Kluyver  wrote:
> 
> On Thu, Oct 19, 2017, at 04:10 PM, Donald Stufft wrote:
>> I’m in favor, although one question I guess is whether it should be a
>> PEP or an ad hoc spec. Given (2) it should *probably* be a PEP (since
>> without (2), it’s just another file in the .dist-info directory and that
>> doesn’t actually need to be standardized at all). I don’t think that this will
>> be a very controversial PEP though, and should be pretty easy.
> 
> I have opened a PR to document what is already there, without adding any
> new features. I think this is worth doing even if we don't change
> anything, since it's a de-facto standard used for different tools to
> interact.
> 
> https://github.com/pypa/python-packaging-user-guide/pull/390
> 
> We can still write a PEP for caching if necessary.

I think documenting what’s there is a reasonable goal, but if we’re going to 
add caching we should just PEP the whole thing changing it from a de facto 
standard to an actual standard + caching. Generally we should only use non-PEP 
“specs” in places where we’re just trying to document what exists already, but 
where we’re not really happy with the current solution or we plan to alter it 
eventually.

For this, I think the entry points solution is generally a good one with some 
alterations (namely, the addition of caching)…. Although now that I think about 
it, maybe this isn’t really a packaging problem at all and I’m not sure that it 
benefits from standardization at all.

So stepping back a second, here’s what entrypoints provides today:

1. A way to implement an interface that some other package can provide 
implementations for.
2. A way to specify script wrappers that will be automatically generated.
3. A way to define extras that must be installed in order for a particular 
entry point to be available.

Off the bat I’m going to say we don’t need to worry about (2) in this 
hypothetical system, because I think the fact it is implemented currently via 
this system is mostly a historic accident, and it’s not something we should be 
looking at in the future. Script wrappers should have some dedicated metadata, 
not piggybacking off of the plugin system.

For (3) I don’t believe that what extras were installed is recorded anywhere, 
so I’m going to guess that this works by looking up what extras are *available* 
for a particular package and then seeing if all of the requirements of that 
distribution are satisfied. Assuming that’s the case then that’s not really 
something that requires deep integration with the packaging toolchain, it just 
needs the APIs to look those things up.

Finally we come to (1), which is in my opinion the meat of what you’re hoping 
to achieve here (and what most people are using entry points for outside of 
console scripts). What I notice about (1) is that it really has absolutely 
nothing to do with packaging at all. It would likely use some of the APIs 
provided by the packaging toolchain (for instance, the ability to add custom 
files to a .dist-info directory, the ability to iterate over installed 
packages, etc) but as a whole pip, setuptools, twine, PyPI, etc none of these 
things need to know anything about it.

EXCEPT, for the fact that with the desire to cache things, it would be 
beneficial to “hook” into the lifecycle of a package install. However I know 
that there are other plugin systems out there that would like to also be able 
to do that (Twisted Plugins come to mind) and that I think outside of plugin 
systems, such a mechanism is likely to be useful in general for other cases.

So here’s a different idea that is a bit more ambitious but that I think is a 
better overall idea. Let entrypoints be a setuptools thing, and let’s define 
some key lifecycle hooks during the installation of a package and some 
mechanism in the metadata to let other tools subscribe to those hooks. Then  a 
caching layer could be written for setuptools entrypoints to make that faster 
without requiring standardization, but also a whole new, better plugin system 
could too, Twisted plugins could benefit, etc [1].

One thing that I like about all of our work recently in packaging is a lot of 
it has been about making it so there isn’t just one standard set of tools, and 
I think that providing lifecycle hooks is another step along that path.

> 
>> I’m also in favor of this. Although I would suggest SQLite rather than a
>> JSON file for the primary reason being that a JSON file isn’t
>> multiprocess safe without being careful (and possibly introducing
>> locking) whereas SQLite has already solved that problem.
> 
> SQLite was actually my first thought, but from experience in Jupyter &
> IPython I'm wary of it - its built-in locking does not work well over
> NFS, and it's easy to corrupt the database. I think careful use of
> atomic writing can be more reliable (though that has given us some
> problems too).
> 
> That may be easier if there's one 

Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Thomas Kluyver
On Thu, Oct 19, 2017, at 04:10 PM, Donald Stufft wrote:
> I’m in favor, although one question I guess is whether it should be a
> PEP or an ad hoc spec. Given (2) it should *probably* be a PEP (since
> without (2), it’s just another file in the .dist-info directory and that
> doesn’t actually need standardized at all). I don’t think that this will
> be a very controversial PEP though, and should be pretty easy.

I have opened a PR to document what is already there, without adding any
new features. I think this is worth doing even if we don't change
anything, since it's a de-facto standard used for different tools to
interact.

https://github.com/pypa/python-packaging-user-guide/pull/390

We can still write a PEP for caching if necessary.

> I’m also in favor of this. Although I would suggest SQLite rather than a
> JSON file for the primary reason being that a JSON file isn’t
> multiprocess safe without being careful (and possibly introducing
> locking) whereas SQLite has already solved that problem.

SQLite was actually my first thought, but from experience in Jupyter &
IPython I'm wary of it - its built-in locking does not work well over
NFS, and it's easy to corrupt the database. I think careful use of
atomic writing can be more reliable (though that has given us some
problems too).

That may be easier if there's one cache per user, though - we can
perhaps try to store it somewhere that's not NFS.
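For reference, a minimal sketch of the kind of atomic write meant here: write to a temporary file in the same directory, then rename it over the target. The function name is made up for illustration, and this does nothing about the NFS caveats mentioned above.

```python
import json
import os
import tempfile


def write_cache_atomically(path, data):
    """Write `data` as JSON to `path` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # replaces any existing file in one step
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

Concurrent writers can still overwrite each other's results (last rename wins), but a reader gets either the old file or the new one, never a mixture.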

Thomas
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Donald Stufft

> On Oct 18, 2017, at 10:52 AM, Thomas Kluyver  wrote:
> 
> 
> 1. Specification
> 


I’m in favor, although one question I guess is whether it should be a PEP or
an ad hoc spec. Given (2) it should *probably* be a PEP (since without (2),
it’s just another file in the .dist-info directory and that doesn’t actually
need standardized at all). I don’t think that this will be a very controversial
PEP though, and should be pretty easy.


> 
> 2. Caching


I’m also in favor of this. Although I would suggest SQLite rather than a JSON 
file for the primary reason being that a JSON file isn’t multiprocess safe 
without being careful (and possibly introducing locking) whereas SQLite has 
already solved that problem.

One possible further enhancement to your proposal is to try and think of a way
to have a singular cache. Since we can include the sys.path entry as part of
the data inside the cache, having a singular cache means we can reduce the
number of files we have to open down to a single file. The biggest problem I
see with this, is it opens up questions about how we handle things like user 
installs… so maybe a cache DB per sys.path entry is the best way. I think we 
could use something like SQLite’s ATTACH DATABASE command to add multiple DBs 
to the same SQLite connection to be able to query across all of the entries 
with a single query. One downside to this is that SQLite is an optional module 
in Python so it may not exist, although we could implement that so that we just 
bypass the cache always in that case (and probably raise a warning?) so things 
continue to work, they will just be slower.
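A rough sketch of what the ATTACH DATABASE approach could look like with the standard library's sqlite3 module. The table layout (`entry_points` with `grp`, `name`, `target` columns) and the function names are assumptions for illustration only; nothing here is an agreed format. Note that SQLite limits the number of attached databases (10 by default), so a long sys.path would need batching.

```python
try:
    import sqlite3
except ImportError:  # sqlite3 is optional in some Python builds
    sqlite3 = None


def make_cache(path, rows):
    """Create a tiny per-directory cache DB (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE entry_points (grp TEXT, name TEXT, target TEXT)")
    conn.executemany("INSERT INTO entry_points VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()


def query_group(db_paths, group):
    """Return (name, target) rows for `group` across all cache DBs.

    Returns None when sqlite3 is unavailable, so the caller can fall
    back to scanning (and perhaps emit a warning).
    """
    if sqlite3 is None:
        return None
    conn = sqlite3.connect(":memory:")
    try:
        selects = []
        for i, path in enumerate(db_paths):
            # Attach each per-directory DB under its own schema name.
            conn.execute("ATTACH DATABASE ? AS db%d" % i, (path,))
            selects.append(
                "SELECT name, target FROM db%d.entry_points WHERE grp = ?" % i
            )
        if not selects:
            return []
        sql = " UNION ALL ".join(selects)
        return conn.execute(sql, [group] * len(selects)).fetchall()
    finally:
        conn.close()
```

With one cache DB per sys.path entry, `query_group(paths, "console_scripts")` answers the lookup with a single query over one connection.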

I know that Twisted has used a cache file for a while for plugins (so a similar
use case), so I wonder if they would have any opinions or insight into this as
well.




Re: [Distutils] Entry points: specifying and caching

2017-10-19 Thread Wes Turner
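import json
import os
import sys

ENV_JSON_FILENAME = "env.json"  # placeholder name; not settled anywhere

def get_env_json_path():
    # The fallback when $VIRTUAL_ENV is unset is the open "?" in this idea;
    # sys.prefix is only a guess.
    directory = os.environ.get("VIRTUAL_ENV") or sys.prefix
    return os.path.join(directory, ENV_JSON_FILENAME)

def on_install(pkgname, pkg_json):
    # Record one package's metadata in the per-environment JSON file.
    env_json_path = get_env_json_path()
    with open(env_json_path) as f:
        env_json = json.load(f)
    env_json['pkgs'][pkgname] = pkg_json
    with open(env_json_path, 'w') as f:
        json.dump(env_json, f)

def read_cached_entry_points():
    env_json_path = get_env_json_path()
    with open(env_json_path) as f:
        env_json = json.load(f)
    # Flatten the per-package entry point lists into one list
    # (assumes each package stores a list under 'entry_points').
    return [ep for pkg in env_json['pkgs'].values()
            for ep in pkg['entry_points']]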


Would this introduce a need for a new and confusing rescan_metadata()
(pkg.on_install() for pkg in pkgs)?

On Wednesday, October 18, 2017, Nick Coghlan  wrote:

> On 19 October 2017 at 12:16, Daniel Holth wrote:
>
>> We said "you won't have to install setuptools" but actually "you don't
>> have to use it" is good enough. If you had 2 pkg-resources implementations
>> running you might wind up scanning sys.path extra times...
>>
> True, but that's where Thomas's suggestion of attempting to define a
> standardised caching convention comes in: right now, there's no middle
> ground between "you must use pkg_resources" and "every helper library must
> scan for the raw entry-point metadata itself".
>
> If there's a defined common caching mechanism, and support for it is added
> to new versions of pkg_resources, then the design constraint becomes "If
> you end up using multiple entry-point scanners, you'll want a recent
> setuptools/pkg_resources, so you don't waste too much time on repeated
> metadata scans".
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>


Re: [Distutils] Entry points: specifying and caching

2017-10-18 Thread Nick Coghlan
On 19 October 2017 at 12:16, Daniel Holth  wrote:

> We said "you won't have to install setuptools" but actually "you don't
> have to use it" is good enough. If you had 2 pkg-resources implementations
> running you might wind up scanning sys.path extra times...
>
True, but that's where Thomas's suggestion of attempting to define a
standardised caching convention comes in: right now, there's no middle
ground between "you must use pkg_resources" and "every helper library must
scan for the raw entry-point metadata itself".

If there's a defined common caching mechanism, and support for it is added
to new versions of pkg_resources, then the design constraint becomes "If
you end up using multiple entry-point scanners, you'll want a recent
setuptools/pkg_resources, so you don't waste too much time on repeated
metadata scans".

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-18 Thread Daniel Holth
We said "you won't have to install setuptools" but actually "you don't have
to use it" is good enough. If you had 2 pkg-resources implementations
running you might wind up scanning sys.path extra times...

On Wed, Oct 18, 2017, 20:53 Nick Coghlan  wrote:

> On 19 October 2017 at 04:18, Alex Grönholm 
> wrote:
>
>> Daniel Holth wrote on 18.10.2017 at 21:06:
>>
>>
>> http://setuptools.readthedocs.io/en/latest/formats.html?highlight=entry_points.txt#entry-points-txt-entry-point-plugin-metadata
>>
>>
>> http://setuptools.readthedocs.io/en/latest/pkg_resources.html?highlight=pkg_resources#creating-and-parsing
>>
>> It is not very complicated. It looks like the characters are mostly
>> 'python identifier' rules with a little bit of 'package name' rules.
>>
>> I am also concerned about the amount of parsing on startup. A hard
>> problem for certain, since no one likes outdated cache problems either. It
>> is also unpleasant to have too much code with a runtime dependency on
>> 'packaging'.
>>
>> Wasn't someone working on implementing pkg_resources in the standard
>> library at some point?
>>
>
> The idea has been raised, but we've been hesitant for the same reason
> we're inclined to take distutils out: packaging APIs need to be free to
> evolve in line with packaging interoperability standards, rather than with
> the Python language definition.
>
> Barry Warsaw & Brett Cannon recently mentioned something to me about
> working on a potential runtime alternative to pkg_resources that could be
> installed without also installing setuptools, but I don't know any of the
> specifics (and I'm not sure either of them follows distutils-sig).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-18 Thread Nick Coghlan
On 19 October 2017 at 04:18, Alex Grönholm  wrote:

> Daniel Holth wrote on 18.10.2017 at 21:06:
>
> http://setuptools.readthedocs.io/en/latest/formats.html?highlight=entry_points.txt#entry-points-txt-entry-point-plugin-metadata
>
> http://setuptools.readthedocs.io/en/latest/pkg_resources.html?highlight=pkg_resources#creating-and-parsing
>
> It is not very complicated. It looks like the characters are mostly
> 'python identifier' rules with a little bit of 'package name' rules.
>
> I am also concerned about the amount of parsing on startup. A hard problem
> for certain, since no one likes outdated cache problems either. It is also
> unpleasant to have too much code with a runtime dependency on 'packaging'.
>
> Wasn't someone working on implementing pkg_resources in the standard
> library at some point?
>

The idea has been raised, but we've been hesitant for the same reason we're
inclined to take distutils out: packaging APIs need to be free to evolve in
line with packaging interoperability standards, rather than with the Python
language definition.

Barry Warsaw & Brett Cannon recently mentioned something to me about
working on a potential runtime alternative to pkg_resources that could be
installed without also installing setuptools, but I don't know any of the
specifics (and I'm not sure either of them follows distutils-sig).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] Entry points: specifying and caching

2017-10-18 Thread Daniel Holth
On Wed, Oct 18, 2017 at 2:57 PM Paul Moore  wrote:

> On 18 October 2017 at 19:42, Thomas Kluyver  wrote:
> > On Wed, Oct 18, 2017, at 05:59 PM, Paul Moore wrote:
> >> > I've always used the setuptools documentation as a reference. Are you
> >> > suggesting moving that information to a different location to
> >> > allow/encourage other tools to implement it as a standard?
> >>
> >> I've never used entry points myself (other than the console script
> >> entry points supported by packaging) but a quick Google search found
> >> http://setuptools.readthedocs.io/en/latest/setuptools.html#dynamic-discovery-of-services-and-plugins
> >> as the only obvious candidate for documentation (and a bit later I
> >> thought of looking under pkg_resources and found
> >> http://setuptools.readthedocs.io/en/latest/pkg_resources.html#entry-points).
> >> This doesn't really say how the entry point data is stored in the
> >> project metadata, so it's not clear how I'd read that data in my own
> >> code (the answer is of course to use pkg_resources, but the point of
> >> documenting it as a standard is to allow alternative implementations).
> >
> > I have in fact made an alternative implementation (PyPI package
> > entrypoints) by 'reverse engineering' the format. A simple text-based
> > format doesn't really justify the term 'reverse engineering', but for
> > instance it wasn't obvious to me that the names were case sensitive,
> > whereas Python's standard config parser treats keys as case-insensitive.
> >
> > Daniel:
> >> http://setuptools.readthedocs.io/en/latest/formats.html?highlight=entry_points.txt#entry-points-txt-entry-point-plugin-metadata
> >
> > Thanks, this link is closer than any I found to a specification. There
> > are docs on how to create entry points in setup.py and how to use them
> > with pkg_resources, but that's the only bit I've seen that describes the
> > interchange file format.
>
> Agreed, I hadn't found that, either.
>
> > I think we can probably expand on it a bit, though! I'll try to put
> > together something for packaging.python.org.
>
> One thing that immediately strikes me is that the encoding of the file
> is unspecified...
> Paul
>

Now that's an easy one to clear up, since there is only one worthwhile
encoding.


Re: [Distutils] Entry points: specifying and caching

2017-10-18 Thread Daniel Holth
On Wed, Oct 18, 2017 at 2:18 PM Alex Grönholm 
wrote:

> Daniel Holth wrote on 18.10.2017 at 21:06:
>
>
> http://setuptools.readthedocs.io/en/latest/formats.html?highlight=entry_points.txt#entry-points-txt-entry-point-plugin-metadata
>
>
> http://setuptools.readthedocs.io/en/latest/pkg_resources.html?highlight=pkg_resources#creating-and-parsing
>
> It is not very complicated. It looks like the characters are mostly
> 'python identifier' rules with a little bit of 'package name' rules.
>
> I am also concerned about the amount of parsing on startup. A hard problem
> for certain, since no one likes outdated cache problems either. It is also
> unpleasant to have too much code with a runtime dependency on 'packaging'.
>
> Wasn't someone working on implementing pkg_resources in the standard
> library at some point?
>

I'm just saying it is good to avoid importing it unless you really need to.
Same reason we removed it from entry point script wrappers.


Re: [Distutils] Entry points: specifying and caching

2017-10-18 Thread Paul Moore
On 18 October 2017 at 19:42, Thomas Kluyver  wrote:
> On Wed, Oct 18, 2017, at 05:59 PM, Paul Moore wrote:
>> > I've always used the setuptools documentation as a reference. Are you
>> > suggesting moving that information to a different location to
>> > allow/encourage other tools to implement it as a standard?
>>
>> I've never used entry points myself (other than the console script
>> entry points supported by packaging) but a quick Google search found
>> http://setuptools.readthedocs.io/en/latest/setuptools.html#dynamic-discovery-of-services-and-plugins
>> as the only obvious candidate for documentation (and a bit later I
>> thought of looking under pkg_resources and found
>> http://setuptools.readthedocs.io/en/latest/pkg_resources.html#entry-points).
>> This doesn't really say how the entry point data is stored in the
>> project metadata, so it's not clear how I'd read that data in my own
>> code (the answer is of course to use pkg_resources, but the point of
>> documenting it as a standard is to allow alternative implementations).
>
> I have in fact made an alternative implementation (PyPI package
> entrypoints) by 'reverse engineering' the format. A simple text-based
> format doesn't really justify the term 'reverse engineering', but for
> instance it wasn't obvious to me that the names were case sensitive,
> whereas Python's standard config parser treats keys as case-insensitive.
>
> Daniel:
>> http://setuptools.readthedocs.io/en/latest/formats.html?highlight=entry_points.txt#entry-points-txt-entry-point-plugin-metadata
>
> Thanks, this link is closer than any I found to a specification. There
> are docs on how to create entry points in setup.py and how to use them
> with pkg_resources, but that's the only bit I've seen that describes the
> interchange file format.

Agreed, I hadn't found that, either.

> I think we can probably expand on it a bit, though! I'll try to put
> together something for packaging.python.org.

One thing that immediately strikes me is that the encoding of the file
is unspecified...
Paul


Re: [Distutils] Entry points: specifying and caching

2017-10-18 Thread Thomas Kluyver
On Wed, Oct 18, 2017, at 05:59 PM, Paul Moore wrote:
> > I've always used the setuptools documentation as a reference. Are you
> > suggesting moving that information to a different location to
> > allow/encourage other tools to implement it as a standard?
> 
> I've never used entry points myself (other than the console script
> entry points supported by packaging) but a quick Google search found
> http://setuptools.readthedocs.io/en/latest/setuptools.html#dynamic-discovery-of-services-and-plugins
> as the only obvious candidate for documentation (and a bit later I
> thought of looking under pkg_resources and found
> http://setuptools.readthedocs.io/en/latest/pkg_resources.html#entry-points).
> This doesn't really say how the entry point data is stored in the
> project metadata, so it's not clear how I'd read that data in my own
> code (the answer is of course to use pkg_resources, but the point of
> documenting it as a standard is to allow alternative implementations).

I have in fact made an alternative implementation (PyPI package
entrypoints) by 'reverse engineering' the format. A simple text-based
format doesn't really justify the term 'reverse engineering', but for
instance it wasn't obvious to me that the names were case sensitive,
whereas Python's standard config parser treats keys as case-insensitive.
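As an illustration of that pitfall, here is a small sketch using the standard library's configparser. The sample file contents and the function name are invented for the example; the only point is that `optionxform` has to be overridden to keep names case sensitive.

```python
import configparser

# Invented sample in the entry_points.txt style: two names differing only in case.
SAMPLE = """\
[console_scripts]
Foo = pkg.cli:main
foo = pkg.cli:other
"""


def parse_entry_points(text):
    """Parse entry points text into {group: {name: target}}, keeping case."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    # By default configparser lower-cases option names; str keeps them as written.
    parser.optionxform = str
    parser.read_string(text)
    return {section: dict(parser.items(section)) for section in parser.sections()}
```

With a default `ConfigParser()`, the same sample is rejected with `DuplicateOptionError`, because `Foo` and `foo` are folded to the same key.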

Daniel:
> http://setuptools.readthedocs.io/en/latest/formats.html?highlight=entry_points.txt#entry-points-txt-entry-point-plugin-metadata

Thanks, this link is closer than any I found to a specification. There
are docs on how to create entry points in setup.py and how to use them
with pkg_resources, but that's the only bit I've seen that describes the
interchange file format.

I think we can probably expand on it a bit, though! I'll try to put
together something for packaging.python.org.

Thomas


Re: [Distutils] Entry points: specifying and caching

2017-10-18 Thread Alex Grönholm

Daniel Holth wrote on 18.10.2017 at 21:06:
> http://setuptools.readthedocs.io/en/latest/formats.html?highlight=entry_points.txt#entry-points-txt-entry-point-plugin-metadata
>
> http://setuptools.readthedocs.io/en/latest/pkg_resources.html?highlight=pkg_resources#creating-and-parsing
>
> It is not very complicated. It looks like the characters are mostly
> 'python identifier' rules with a little bit of 'package name' rules.
>
> I am also concerned about the amount of parsing on startup. A hard
> problem for certain, since no one likes outdated cache problems
> either. It is also unpleasant to have too much code with a runtime
> dependency on 'packaging'.

Wasn't someone working on implementing pkg_resources in the standard
library at some point?

> On Wed, Oct 18, 2017 at 1:00 PM Paul Moore wrote:
>
>> On 18 October 2017 at 17:48, Doug Hellmann wrote:
>> > Excerpts from Thomas Kluyver's message of 2017-10-18 15:52:00 +0100:
>> >> We're increasingly using entry points in Jupyter to help integrate
>> >> third-party components. This brings up a couple of things that I'd like
>> >> to do:
>> >>
>> >> 1. Specification
>> >>
>> >> As far as I know, there's no document describing the details of entry
>> >> points; it's a de-facto standard established by setuptools. It seems to
>> >> work quite well, but it's worth writing down what is unofficially
>> >> standardised. I would like to see a document on
>> >> https://packaging.python.org/specifications/ saying:
>> >>
>> >> - Where build tools should put entry points in wheels
>> >> - Where entry points live in installed distributions
>> >> - The file format (including allowed characters, case sensitivity...)
>> >>
>> >> I guess I'm volunteering to write this, although if someone else wants
>> >> to, don't let me stop you. ;-)
>> >>
>> >> I'd also be happy to hear that I'm wrong, that this specification
>> >> already exists somewhere. If it does, can we add a link from
>> >> https://packaging.python.org/specifications/ ?
>> >
>> > I've always used the setuptools documentation as a reference. Are you
>> > suggesting moving that information to a different location to
>> > allow/encourage other tools to implement it as a standard?
>>
>> I've never used entry points myself (other than the console script
>> entry points supported by packaging) but a quick Google search found
>> http://setuptools.readthedocs.io/en/latest/setuptools.html#dynamic-discovery-of-services-and-plugins
>> as the only obvious candidate for documentation (and a bit later I
>> thought of looking under pkg_resources and found
>> http://setuptools.readthedocs.io/en/latest/pkg_resources.html#entry-points).
>> This doesn't really say how the entry point data is stored in the
>> project metadata, so it's not clear how I'd read that data in my own
>> code (the answer is of course to use pkg_resources, but the point of
>> documenting it as a standard is to allow alternative implementations).
>> Also, it's not clear how a tool like flit might implement entry points
>> - again, because the specifications don't describe how the metadata is
>> stored.
>>
>> +1 from me on moving the entry point specification to
>> https://packaging.python.org/specifications/
>>
>> Paul


Re: [Distutils] Entry points: specifying and caching

2017-10-18 Thread Daniel Holth
http://setuptools.readthedocs.io/en/latest/formats.html?highlight=entry_points.txt#entry-points-txt-entry-point-plugin-metadata

http://setuptools.readthedocs.io/en/latest/pkg_resources.html?highlight=pkg_resources#creating-and-parsing

It is not very complicated. It looks like the characters are mostly 'python
identifier' rules with a little bit of 'package name' rules.

I am also concerned about the amount of parsing on startup. A hard problem
for certain, since no one likes outdated cache problems either. It is also
unpleasant to have too much code with a runtime dependency on 'packaging'.

On Wed, Oct 18, 2017 at 1:00 PM Paul Moore  wrote:

> On 18 October 2017 at 17:48, Doug Hellmann  wrote:
> > Excerpts from Thomas Kluyver's message of 2017-10-18 15:52:00 +0100:
> >> We're increasingly using entry points in Jupyter to help integrate
> >> third-party components. This brings up a couple of things that I'd like
> >> to do:
> >>
> >> 1. Specification
> >>
> >> As far as I know, there's no document describing the details of entry
> >> points; it's a de-facto standard established by setuptools. It seems to
> >> work quite well, but it's worth writing down what is unofficially
> >> standardised. I would like to see a document on
> >> https://packaging.python.org/specifications/ saying:
> >>
> >> - Where build tools should put entry points in wheels
> >> - Where entry points live in installed distributions
> >> - The file format (including allowed characters, case sensitivity...)
> >>
> >> I guess I'm volunteering to write this, although if someone else wants
> >> to, don't let me stop you. ;-)
> >>
> >> I'd also be happy to hear that I'm wrong, that this specification
> >> already exists somewhere. If it does, can we add a link from
> >> https://packaging.python.org/specifications/ ?
> >
> > I've always used the setuptools documentation as a reference. Are you
> > suggesting moving that information to a different location to
> > allow/encourage other tools to implement it as a standard?
>
> I've never used entry points myself (other than the console script
> entry points supported by packaging) but a quick Google search found
>
> http://setuptools.readthedocs.io/en/latest/setuptools.html#dynamic-discovery-of-services-and-plugins
> as the only obvious candidate for documentation (and a bit later I
> thought of looking under pkg_resources and found
> http://setuptools.readthedocs.io/en/latest/pkg_resources.html#entry-points).
> This doesn't really say how the entry point data is stored in the
> project metadata, so it's not clear how I'd read that data in my own
> code (the answer is of course to use pkg_resources, but the point of
> documenting it as a standard is to allow alternative implementations).
> Also, it's not clear how a tool like flit might implement entry points
> - again, because the specifications don't describe how the metadata is
> stored.
>
> +1 from me on moving the entry point specification to
> https://packaging.python.org/specifications/
>
> Paul


Re: [Distutils] Entry points: specifying and caching

2017-10-18 Thread Paul Moore
On 18 October 2017 at 17:48, Doug Hellmann  wrote:
> Excerpts from Thomas Kluyver's message of 2017-10-18 15:52:00 +0100:
>> We're increasingly using entry points in Jupyter to help integrate
>> third-party components. This brings up a couple of things that I'd like
>> to do:
>>
>> 1. Specification
>>
>> As far as I know, there's no document describing the details of entry
>> points; it's a de-facto standard established by setuptools. It seems to
>> work quite well, but it's worth writing down what is unofficially
>> standardised. I would like to see a document on
>> https://packaging.python.org/specifications/ saying:
>>
>> - Where build tools should put entry points in wheels
>> - Where entry points live in installed distributions
>> - The file format (including allowed characters, case sensitivity...)
>>
>> I guess I'm volunteering to write this, although if someone else wants
>> to, don't let me stop you. ;-)
>>
>> I'd also be happy to hear that I'm wrong, that this specification
>> already exists somewhere. If it does, can we add a link from
>> https://packaging.python.org/specifications/ ?
>
> I've always used the setuptools documentation as a reference. Are you
> suggesting moving that information to a different location to
> allow/encourage other tools to implement it as a standard?

I've never used entry points myself (other than the console script
entry points supported by packaging) but a quick Google search found
http://setuptools.readthedocs.io/en/latest/setuptools.html#dynamic-discovery-of-services-and-plugins
as the only obvious candidate for documentation (and a bit later I
thought of looking under pkg_resources and found
http://setuptools.readthedocs.io/en/latest/pkg_resources.html#entry-points).
This doesn't really say how the entry point data is stored in the
project metadata, so it's not clear how I'd read that data in my own
code (the answer is of course to use pkg_resources, but the point of
documenting it as a standard is to allow alternative implementations).
Also, it's not clear how a tool like flit might implement entry points
- again, because the specifications don't describe how the metadata is
stored.

+1 from me on moving the entry point specification to
https://packaging.python.org/specifications/

Paul


Re: [Distutils] Entry points: specifying and caching

2017-10-18 Thread Doug Hellmann
Excerpts from Thomas Kluyver's message of 2017-10-18 15:52:00 +0100:
> We're increasingly using entry points in Jupyter to help integrate
> third-party components. This brings up a couple of things that I'd like
> to do:
> 
> 1. Specification
> 
> As far as I know, there's no document describing the details of entry
> points; it's a de-facto standard established by setuptools. It seems to
> work quite well, but it's worth writing down what is unofficially
> standardised. I would like to see a document on
> https://packaging.python.org/specifications/ saying:
> 
> - Where build tools should put entry points in wheels
> - Where entry points live in installed distributions
> - The file format (including allowed characters, case sensitivity...)
> 
> I guess I'm volunteering to write this, although if someone else wants
> to, don't let me stop you. ;-)
> 
> I'd also be happy to hear that I'm wrong, that this specification
> already exists somewhere. If it does, can we add a link from
> https://packaging.python.org/specifications/ ?

I've always used the setuptools documentation as a reference. Are you
suggesting moving that information to a different location to
allow/encourage other tools to implement it as a standard?

> 2. Caching
> 
> "There are only two hard problems in computer science: cache
> invalidation, naming things, and off-by-one errors"
> 
> I know that caching is going to make things more complex, but at present
> a scan of available entry points requires a stat() for every installed
> package, plus open()+read()+parse for every installed package that
> provides entry points. This doesn't scale well, especially on spinning
> hard drives. By eliminating a call to pygments which caused an entry
> points scan, we cut the cold-start time of IPython almost in half on one
> HDD system (11s -> 6s; PR 10859).
> 
> As packaging improves, the trend is to break functionality into more,
> smaller packages, which is only going to make this worse (though I hope
> we never end up with a left-pad package ;-). Caching could allow entry
> points to be used in places where the current performance penalty is too
> much.
> 
> I envisage a cache working something like this:
> - Each directory on sys.path can have a cache file, e.g.
> 'entry-points.json'
> - I suggest JSON because Python can parse it efficiently, and it's not
> intended to be directly edited by humans. Other options? SQLite? Does
> someone want to do performance comparisons?
> - There is a command to scan all packages in a directory and build the
> cache file
> - After an install tool (e.g. pip) has added/removed packages from a
> directory, it should call that command to rebuild the cache.
> - A second command goes through all directories on sys.path and rebuilds
> their cache files - this lets the user rebuild caches if something has
> gone wrong.
> - Applications looking for entry points can choose from a range of
> behaviours depending on how important accuracy and performance are. E.g.
> ignore all caches, only use caches, use caches for directories where
> they exist, or try caches first and then scan packages if a key is
> missing.
> 
> In the best case, when the caches exist and you trust them, loading them
> would cost one set of filesystem operations per sys.path entry, rather
> than per package.
> 
> Thanks,
> Thomas

We've run into similar issues in some applications I work on. I had
intended to implement a caching layer within stevedore
(https://docs.openstack.org/stevedore/latest/) as a first step for
experimenting with approaches, but I would be happy to collaborate on
something further upstream if there's interest.
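As a starting point for that kind of experiment, here is a hedged sketch of three of the lookup behaviours listed in the quoted proposal (ignore caches, only use caches, use caches where they exist). The policy names and `load_directory` are invented for illustration, the `scan` argument stands in for whatever per-directory scanner is used, and the fourth behaviour (scan only when a key is missing) is left out.

```python
import json
import os

CACHE_NAME = "entry-points.json"  # example filename from the proposal
POLICIES = ("ignore-cache", "cache-only", "cache-if-present")  # hypothetical names


def load_directory(directory, scan, policy="cache-if-present"):
    """Return entry point data for one sys.path directory.

    `scan` is any callable taking a directory and returning the same
    structure that the cache file would hold.
    """
    if policy not in POLICIES:
        raise ValueError("unknown policy: %r" % (policy,))
    if policy != "ignore-cache":
        try:
            with open(os.path.join(directory, CACHE_NAME), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            if policy == "cache-only":
                return {}  # trust the cache: no cache file means nothing found
    # Either caches are ignored, or no cache file exists: do the slow scan.
    return scan(directory)
```

A corrupt cache file is deliberately not handled here; it raises, which at least makes the problem visible while experimenting.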

Doug


[Distutils] Entry points: specifying and caching

2017-10-18 Thread Thomas Kluyver
We're increasingly using entry points in Jupyter to help integrate
third-party components. This brings up a couple of things that I'd like
to do:

1. Specification

As far as I know, there's no document describing the details of entry
points; it's a de-facto standard established by setuptools. It seems to
work quite well, but it's worth writing down what is unofficially
standardised. I would like to see a document on
https://packaging.python.org/specifications/ saying:

- Where build tools should put entry points in wheels
- Where entry points live in installed distributions
- The file format (including allowed characters, case sensitivity...)

I guess I'm volunteering to write this, although if someone else wants
to, don't let me stop you. ;-)

I'd also be happy to hear that I'm wrong, that this specification
already exists somewhere. If it does, can we add a link from
https://packaging.python.org/specifications/ ?

2. Caching

"There are only two hard problems in computer science: cache
invalidation, naming things, and off-by-one errors"

I know that caching is going to make things more complex, but at present
a scan of available entry points requires a stat() for every installed
package, plus open()+read()+parse for every installed package that
provides entry points. This doesn't scale well, especially on spinning
hard drives. By eliminating a call to pygments that triggered an entry
points scan, we cut the cold-start time of IPython almost in half on one
HDD system (11s -> 6s; PR 10859).
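
For illustration, the uncached scan described above can be sketched
roughly as follows. This is a minimal sketch, not the setuptools
implementation: the function name scan_entry_points is hypothetical,
only *.dist-info directories are considered (not *.egg-info), and
treating names as case-sensitive is an assumption, since that is one of
the details the proposed specification would need to settle.

```python
import configparser
from pathlib import Path


def scan_entry_points(path_entry):
    """Collect entry points from every *.dist-info directory in path_entry.

    Returns a nested dict: {group: {name: object_reference}}.
    Hypothetical helper for illustration only.
    """
    found = {}
    root = Path(path_entry)
    if not root.is_dir():
        return found
    for dist_info in sorted(root.glob("*.dist-info")):
        ep_file = dist_info / "entry_points.txt"
        # One stat() per installed package, whether or not it has entry points.
        if not ep_file.is_file():
            continue
        # Only "=" separates name and value, so "module:attr" values stay intact.
        parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
        # Assumption: keep names case-sensitive (ConfigParser lowercases by default).
        parser.optionxform = str
        # open() + read() + parse for each package that provides entry points.
        parser.read(ep_file, encoding="utf-8")
        for group in parser.sections():
            found.setdefault(group, {}).update(parser.items(group))
    return found
```

Running this over each sys.path entry costs filesystem work proportional
to the number of installed packages, which is the overhead the caching
proposal below is meant to avoid.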

As packaging improves, the trend is to break functionality into more,
smaller packages, which is only going to make this worse (though I hope
we never end up with a left-pad package ;-). Caching could allow entry
points to be used in places where the current performance penalty is too
much.

I envisage a cache working something like this:
- Each directory on sys.path can have a cache file, e.g.
'entry-points.json'
- I suggest JSON because Python can parse it efficiently, and it's not
intended to be directly edited by humans. Other options? SQLite? Does
someone want to do performance comparisons?
- There is a command to scan all packages in a directory and build the
cache file
- After an install tool (e.g. pip) has added/removed packages from a
directory, it should call that command to rebuild the cache.
- A second command goes through all directories on sys.path and rebuilds
their cache files - this lets the user rebuild caches if something has
gone wrong.
- Applications looking for entry points can choose from a range of
behaviours depending on how important accuracy and performance are. E.g.
ignore all caches, only use caches, use caches for directories where
they exist, or try caches first and then scan packages if a key is
missing.

In the best case, when the caches exist and you trust them, loading them
would cost one set of filesystem operations per sys.path entry, rather
than per package.
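
A rough sketch of how such a per-directory cache might look, under
stated assumptions: the file name 'entry-points.json' comes from the
proposal above, but the JSON layout ({group: {name: reference}}), the
function names, and the mode strings are all hypothetical, and only
*.dist-info directories are scanned.

```python
import configparser
import json
from pathlib import Path

CACHE_NAME = "entry-points.json"  # file name suggested in the proposal
MODES = ("ignore-cache", "cache-only", "prefer-cache")  # hypothetical names


def scan_directory(path_entry):
    """Slow path: parse entry_points.txt in every *.dist-info directory."""
    found = {}
    root = Path(path_entry)
    if not root.is_dir():
        return found
    for ep_file in sorted(root.glob("*.dist-info/entry_points.txt")):
        parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
        parser.optionxform = str  # assumption: names are case-sensitive
        parser.read(ep_file, encoding="utf-8")
        for group in parser.sections():
            found.setdefault(group, {}).update(parser.items(group))
    return found


def build_cache(path_entry):
    """Rebuild the cache for one directory, e.g. after an install or uninstall."""
    data = scan_directory(path_entry)
    cache_file = Path(path_entry) / CACHE_NAME
    tmp_file = cache_file.with_name(CACHE_NAME + ".tmp")
    tmp_file.write_text(json.dumps(data, sort_keys=True), encoding="utf-8")
    # Rename into place so readers do not see a half-written cache file.
    tmp_file.replace(cache_file)
    return data


def load_directory(path_entry, mode="prefer-cache"):
    """Return {group: {name: reference}} for one directory.

    'ignore-cache' always scans, 'cache-only' never scans, and
    'prefer-cache' uses the cache where it exists and scans otherwise.
    """
    if mode not in MODES:
        raise ValueError("unknown mode: %r" % (mode,))
    if mode != "ignore-cache":
        try:
            text = (Path(path_entry) / CACHE_NAME).read_text(encoding="utf-8")
            return json.loads(text)
        except (OSError, ValueError):
            pass  # missing or unreadable cache
    if mode == "cache-only":
        return {}
    return scan_directory(path_entry)


def lookup(path_entry, group, name):
    """Try the cache first, then scan packages if the key is missing."""
    cached = load_directory(path_entry, mode="cache-only")
    if name in cached.get(group, {}):
        return cached[group][name]
    return scan_directory(path_entry).get(group, {}).get(name)
```

An installer would call build_cache() on the directory it just modified,
and the "rebuild everything" command would amount to calling it for each
entry on sys.path. With a trusted cache, load_directory() reads a single
file per directory instead of touching every installed package.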

Thanks,
Thomas