Excerpts from Nick Coghlan's message of 2017-10-20 14:42:09 +1000:

> On 20 October 2017 at 02:14, Thomas Kluyver <tho...@kluyver.me.uk> wrote:
>
> > On Thu, Oct 19, 2017, at 04:10 PM, Donald Stufft wrote:
> > > I’m in favor, although one question I guess is whether it should be a
> > > PEP or an ad hoc spec. Given (2) it should *probably* be a PEP (since
> > > without (2), it’s just another file in the .dist-info directory and
> > > that doesn’t actually need to be standardized at all). I don’t think
> > > that this will be a very controversial PEP, though, and it should be
> > > pretty easy.
> >
> > I have opened a PR to document what is already there, without adding any
> > new features. I think this is worth doing even if we don't change
> > anything, since it's a de-facto standard used for different tools to
> > interact.
> >
> > https://github.com/pypa/python-packaging-user-guide/pull/390
> >
> > We can still write a PEP for caching if necessary.
>
> +1 for that approach (PR for the status quo, PEP for a shared metadata
> caching design) from me.
>
> Making the status quo more discoverable is valuable in its own right, and
> the only decisions we'll need to make for that are terminology
> clarification ones, not interoperability ones (this isn't like PEP 440 or
> 508, where we actually thought some of the default setuptools behaviour
> was slightly incorrect and wanted to change it).
>
> Figuring out a robust, cross-platform, network-file-system-tolerant
> metadata caching design, on the other hand, is going to be hard, and as
> Donald suggests, the right ecosystem-level solution might be to define
> install-time hooks for package installation operations.
>
> > > I’m also in favor of this, although I would suggest SQLite rather
> > > than a JSON file, for the primary reason that a JSON file isn’t
> > > multiprocess-safe without being careful (and possibly introducing
> > > locking), whereas SQLite has already solved that problem.
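As a rough illustration of the SQLite approach Donald suggests, a cache keyed by distribution name could look like the sketch below. The schema and function names are hypothetical (nothing here is entrypoints' actual API); the multiprocess safety comes from SQLite's own file locking plus a busy timeout, with the NFS caveats Thomas raises still applying. `:memory:` is used only to keep the demo self-contained; a real cache would pass a file path.

```python
import json
import sqlite3

def open_cache(path):
    """Open (or create) a hypothetical entry-points cache database."""
    conn = sqlite3.connect(path, timeout=5.0)  # wait up to 5s on a locked db
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entrypoints ("
        " dist TEXT PRIMARY KEY,"
        " data TEXT NOT NULL)"   # entry-points mapping stored as JSON text
    )
    return conn

def store(conn, dist, entry_points):
    with conn:  # transaction: commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO entrypoints (dist, data) VALUES (?, ?)",
            (dist, json.dumps(entry_points)),
        )

def load(conn, dist):
    row = conn.execute(
        "SELECT data FROM entrypoints WHERE dist = ?", (dist,)
    ).fetchone()
    return json.loads(row[0]) if row else None

# Self-contained demo; a real cache would use a file path, not :memory:
conn = open_cache(":memory:")
store(conn, "example-dist", {"console_scripts": {"foo": "foo.cli:main"}})
assert load(conn, "example-dist") == {"console_scripts": {"foo": "foo.cli:main"}}
```

Concurrent writers against an on-disk database would simply block on SQLite's lock (up to the timeout) rather than corrupting each other's JSON, which is the property being argued for above.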
>
> > SQLite was actually my first thought, but from experience in Jupyter &
> > IPython I'm wary of it - its built-in locking does not work well over
> > NFS, and it's easy to corrupt the database. I think careful use of
> > atomic writing can be more reliable (though that has given us some
> > problems too).
> >
> > That may be easier if there's one cache per user, though - we can
> > perhaps try to store it somewhere that's not on NFS.
>
> I'm wondering if, rather than jumping straight to a PEP, it may make
> sense to instead initially pursue this idea as a *non*-standard,
> implementation-dependent thing specific to the "entrypoints" project.
> There are a *lot* of challenges to be taken into account for a truly
> universal metadata caching design, and it would be easy to fall into the
> trap of coming up with a design so complex that nobody can realistically
> implement it.
>
> Specifically, I'm thinking of a usage model along the lines of the
> updatedb/locate pair on *nix systems: `locate` gives you access to very
> fast searches of your filesystem, but it *doesn't* try to automagically
> keep its indexes up to date. Instead, refreshing the indexes is handled
> by `updatedb`, and you can either rely on that being run automatically in
> a cron job, or else force an update with `sudo updatedb` when you want to
> use `locate`.
>
> For a project like entrypoints, what that might look like is that at
> *runtime*, you may implement a reasonably fast "cache freshness check",
> where you scan the mtime of all the sys.path entries and compare those to
> the mtime of the cache. If the cache looks up to date, then cool;
> otherwise, emit a warning about the stale metadata cache, and then bypass
> it.
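The freshness check Nick describes can be sketched in a few lines: compare the cache file's mtime against the mtime of every directory on sys.path, since installing or removing a package touches its site-packages directory. The function name and layout below are illustrative, not entrypoints' actual API.

```python
import os
import sys

def cache_is_fresh(cache_path, search_path=None):
    """Cheap freshness check: the cache is considered fresh only if no
    sys.path entry has been modified after the cache file was written."""
    try:
        cache_mtime = os.stat(cache_path).st_mtime
    except OSError:
        return False  # no cache yet, so it is trivially stale
    for entry in (search_path or sys.path):
        try:
            if os.stat(entry).st_mtime > cache_mtime:
                return False  # something changed after the cache was built
        except OSError:
            continue  # vanished path or zip/egg entry; skip it
    return True
```

A caller would then warn and fall back to a full metadata scan when this returns False, exactly as the paragraph above suggests. Note the check is heuristic: it can miss in-place edits that don't touch a directory's mtime, which is part of why a universal design is hard.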
>
> The entrypoints project itself could then expose a
> `refresh-entrypoints-cache` command that could start out only supporting
> virtual environments, then extend to per-user caching, and then finally
> (maybe) consider whether or not it wanted to support installation-wide
> caches (with the extra permissions management and cross-process and
> cross-system coordination that may imply).
>
> Such an approach would also tie in nicely with Donald's suggestion of
> reframing the ecosystem-level question as "How should the entrypoints
> project request that 'refresh-entrypoints-cache' be run after every
> package installation or removal operation?", which in turn would
> integrate nicely with things like RPM file triggers (where the system
> `pip` package could set a file trigger that arranged for any properly
> registered Python package installation plugins to be run for every
> modification to site-packages, while still appropriately managing the
> risk of running arbitrary code with elevated privileges).
>
> Cheers,
> Nick.
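The "careful use of atomic writing" Thomas mentions is also what an updatedb-style refresh command would need when it rewrites a JSON cache: write to a temporary file in the same directory, flush to disk, then rename over the old file, so readers never see a half-written cache. The sketch below uses this standard pattern; the function name is hypothetical, but `os.replace` is an atomic rename on both POSIX and Windows when source and destination are on the same filesystem.

```python
import json
import os
import tempfile

def write_cache_atomically(cache_path, data):
    """Replace the cache file with new JSON content in one atomic step.
    Illustrative sketch; not entrypoints' actual implementation."""
    directory = os.path.dirname(os.path.abspath(cache_path))
    # The temp file must live in the same directory so the final rename
    # does not cross a filesystem boundary (which would not be atomic).
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # ensure the data hits disk before the rename
        os.replace(tmp_path, cache_path)  # atomic swap over any old cache
    except BaseException:
        os.unlink(tmp_path)  # don't leave temp debris behind on failure
        raise
```

This protects readers against torn writes from a crashed writer, though, as noted above, it does not by itself solve coordination between concurrent refreshers (last writer wins) or NFS rename semantics.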
I have been trying to find time to do something like that within stevedore
for a while, to solve some client-side startup performance issues with the
OpenStack client. I would be happy to help add it to entrypoints instead
and use it from there. Thomas, please let me know how I can help.

Doug

_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig