Re: [Distutils] PEP 426: proposed metadata caching convention
On Thu, Feb 28, 2013 at 12:54 AM, Nick Coghlan ncogh...@gmail.com wrote:

On Thu, Feb 28, 2013 at 7:59 AM, Daniel Holth dho...@gmail.com wrote:

My aim is to provide a hook mechanism that specifically does not say anything about the way the cache is stored or even whether the hook produces a cache at all. It will just run when pip is done.

How does the following idea sound?

New metadata field: Post-Install

Format: a *single* callable reference in entry-points format (i.e. module.name:callable.name)

Call signature:

    def post_install_hook(metadata, extras, previous_version=None):
        ...

extras would be a tuple indicating which extras were installed. For an upgrade, previous_version would be set to the version that was previously installed. For a clean installation, it would either be None or omitted entirely.

The metadata argument would be the PEP 426 metadata, reformatted as JSON-compatible structured metadata. I had planned to postpone defining the algorithm for that conversion until after PEP 426 acceptance, but if we're going to add a post-install hook mechanism to PEP 426, I think it makes more sense to define it up front:

1. The top level is a mapping, with lowercase versions of all PEP 426 fields as keys.
2. All multiple-use fields other than requires-python are pluralised (that one is only multiple use so you can depend on a different version of Python given different environment markers - for example, supporting Python 2.6 everywhere, but requiring Python 2.7 on Windows. Aside from those cases, you can collapse an arbitrarily complex version specifier down to a single line)
3. Every mandatory field is present, with a string value
4. If present, the keywords field references a list of keywords (created via str.split)
5. If present, the description is always stored under the description key, even if provided in the PEP 426 metadata payload
6. If any other optional field is present, it references a string value
7. If present, the project-urls key references a mapping of labels to URLs.
8. If present, the extensions key references a mapping of extension names to the extension's embedded JSON metadata. (Note: this is the key reason for my planned change to the extension format from arbitrary subfields to allowing only a single json subfield - it greatly simplifies this aspect of the translation to structured metadata, *and* makes it more flexible and powerful at the same time)
9. For any multi-use field that is present and supports environment markers, it is a reference to a mapping where each key is a whitespace-normalized (i.e. every sequence of whitespace converted to a single space) environment marker string that references a list of string values. The unqualified fields are referenced by the string "always". This breakdown allows each unique environment marker to be evaluated only once to determine whether or not it is applicable, regardless of how many times it was originally used.
10. If any other multi-use field is present, it references a list of string values.

For example:

    Metadata-Version: 2.0
    Name: BeagleVote
    Version: 1.0a2
    Summary: A module for collecting votes from beagles.
    Keywords: dog puppy voting election
    Project-URL: Bug, Issue Tracker, http://bitbucket.org/tarek/distribute/issues/
    Requires-Dist: pkginfo
    Requires-Dist: PasteDeploy
    Requires-Dist: zope.interface (3.5.0)
    Extension: Chili
    Chili/json: { "Type": "Poblano", "Heat": "Mild" }

    Apparently, these beagles like their chili. (This is not a helpful description)

Would become:

    {
      "metadata-version": "2.0",
      "name": "BeagleVote",
      "version": "1.0a2",
      "summary": "A module for collecting votes from beagles.",
      "description": "Apparently, these beagles like their chili. (This is not a helpful description)",
      "keywords": ["dog", "puppy", "voting", "election"],
      "project-urls": {
        "Bug, Issue Tracker": "http://bitbucket.org/tarek/distribute/issues/"
      },
      "requires-dists": {"always": ["pkginfo", "PasteDeploy", "zope.interface (3.5.0)"]},
      "extensions": {
        "Chili": { "Type": "Poblano", "Heat": "Mild" }
      }
    }

An apparently simpler alternative would be to rely on PEP 376 to retrieve the full metadata and only provide the distribution name and version to the hook:

    def post_install_hook(distname, current_version, previous_version=None):
        ...

The key disadvantage of that seemingly simpler approach is it *only* works for post install and pre uninstall hooks, *and* requires that the post-install hook have the tools needed to read the PEP 376 metadata. If we later want to add pre-install, build or archiving hooks, they would need the structured metadata format anyway, as relying on PEP 376 isn't an option for software that hasn't been installed yet. This simpler alternative
Re: [Distutils] PEP 426: proposed metadata caching convention
On Fri, Mar 1, 2013 at 12:00 AM, Daniel Holth dho...@gmail.com wrote:

We will probably wind up with some JSON very much like that. I like just exposing it as an ordered multidict with the same key names as mentioned in the PEP.

A multidict is not really JSON-compatible - making sure there's an unambiguous mapping to an ordinary dictionary is highly desirable. Also, it's handy to pre-split and group the entries conditioned on the environment markers.

IMO the environment marker for "always" is just "" (empty string).

I initially had that, but it looked weird in the case where there weren't any conditional entries, and it also looks weird when accessing the data structure. By contrast, "always" is a self-describing key.

My hook would be a literal Entry-Point. You would install a package twisted.plugins that would register its interest in installation changes by declaring the entry point [packaging.hooks] post_install=twisted.plugins:hook. Afterwards, every time you install or uninstall another package, twisted.plugins.hook() would be called. It would iterate over all installed distributions using some API like pkg_resources.working_set or distlib's database and do whatever it needed to do. It could be called once per pip invocation instead of once per individual package. The hook is not guaranteed to run. If you do not run the hook, you should expect Twisted's plugin discovery process to take longer just like it does today. In fact the packages available on sys.path are not guaranteed to have been installed at all.

This is *not* the same kind of hook at all. The proposed hook is only run when *Twisted* is installed to replace some current legitimate customisation of ./setup.py install behaviour, not when an arbitrary package is installed to let Twisted know about it. Your suggestion would indeed be more appropriately part of an installer-specific entry point (but one made much easier by the standard including an algorithm for conversion to structured metadata).

Cheers, Nick.
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
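The grouping by environment marker discussed in this exchange can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread or from any installer: the helper names `group_by_marker` and `select_applicable` are hypothetical, the `value; marker` separator is assumed, and `winhelper` is a made-up requirement used to show two entries sharing one marker.

```python
def group_by_marker(entries):
    """Group 'value; marker' strings under whitespace-normalized marker keys.

    Entries without a marker are stored under the self-describing key "always".
    """
    grouped = {}
    for entry in entries:
        value, _, marker = entry.partition(";")
        # Collapse every run of whitespace in the marker to a single space.
        key = " ".join(marker.split()) or "always"
        grouped.setdefault(key, []).append(value.strip())
    return grouped


def select_applicable(grouped, marker_is_true):
    """Return the values that apply, evaluating each unique marker only once."""
    selected = list(grouped.get("always", []))
    for marker, values in grouped.items():
        if marker != "always" and marker_is_true(marker):
            selected.extend(values)
    return selected


# Hypothetical input: two unconditional and two Windows-only requirements.
requires = [
    "pkginfo",
    "PasteDeploy",
    "pywin32 (>1.0);  sys.platform ==   'win32'",
    "winhelper; sys.platform == 'win32'",
]
grouped = group_by_marker(requires)
```

Because both conditional entries normalize to the same key, a consumer only has to evaluate that marker once, which is the benefit claimed for the "always" layout over a flat multidict.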
Re: [Distutils] PEP 426: proposed metadata caching convention
On Thu, Feb 28, 2013 at 10:04 AM, Nick Coghlan ncogh...@gmail.com wrote:

On Fri, Mar 1, 2013 at 12:00 AM, Daniel Holth dho...@gmail.com wrote:

We will probably wind up with some JSON very much like that. I like just exposing it as an ordered multidict with the same key names as mentioned in the PEP.

A multidict is not really JSON-compatible - making sure there's an unambiguous mapping to an ordinary dictionary is highly desirable. Also, it's handy to pre-split and group the entries conditioned on the environment markers.

Sure, nothing wrong with it. Just don't bother pluralizing the names. Goose: gander becomes geese : {} no thanks.

IMO the environment marker for "always" is just "" (empty string).

I initially had that, but it looked weird in the case where there weren't any conditional entries, and it also looks weird when accessing the data structure. By contrast, "always" is a self-describing key.

Or True, or an environment-marker tautology...

My hook would be a literal Entry-Point. You would install a package twisted.plugins that would register its interest in installation changes by declaring the entry point [packaging.hooks] post_install=twisted.plugins:hook. Afterwards, every time you install or uninstall another package, twisted.plugins.hook() would be called. It would iterate over all installed distributions using some API like pkg_resources.working_set or distlib's database and do whatever it needed to do. It could be called once per pip invocation instead of once per individual package. The hook is not guaranteed to run. If you do not run the hook, you should expect Twisted's plugin discovery process to take longer just like it does today. In fact the packages available on sys.path are not guaranteed to have been installed at all.

This is *not* the same kind of hook at all.

That is why this conversation has been so confusing :-)

The proposed hook is only run when *Twisted* is installed to replace some current legitimate customisation of ./setup.py install behaviour, not when an arbitrary package is installed to let Twisted know about it. Your suggestion would indeed be more appropriately part of an installer-specific entry point (but one made much easier by the standard including an algorithm for conversion to structured metadata).
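Both kinds of hook in this exchange are named with an entry-points style reference such as twisted.plugins:hook (module.name:callable.name). As a rough sketch of how an installer might turn such a string into a callable, assuming only the standard library: the function name `resolve_reference` is hypothetical, and stdlib targets stand in for a real hook so the example runs without Twisted installed.

```python
import importlib
from functools import reduce


def resolve_reference(reference):
    """Resolve 'module.name:callable.name' to the object it names."""
    module_name, _, attr_path = reference.partition(":")
    module = importlib.import_module(module_name)
    # Walk dotted attribute names, e.g. "OrderedDict.fromkeys".
    return reduce(getattr, attr_path.split("."), module)


# Stand-in targets; a real installer would read the reference from metadata.
join = resolve_reference("os.path:join")
fromkeys = resolve_reference("collections:OrderedDict.fromkeys")
```

An installer following Nick's proposal would then call the resolved object with (metadata, extras, previous_version), while Daniel's variant would call it with no package-specific arguments after each install or uninstall.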
Re: [Distutils] PEP 426: proposed metadata caching convention
Nick Coghlan ncoghlan at gmail.com writes: I'm not a fan of post-install hooks - that way lies setup.py. If people want to run arbitrary code at install time, they can publish a platform specific installer. *Maybe* we can go down that path in the Python 3.5 timeframe, but for now, no. I'm concerned that this might affect adoption: there are a lot of projects that have non-trivial custom code in setup.py - often doing mundane stuff like copying files around before the actual setup() call. Having hooks will enable easier migration for such projects (which include, for example, Twisted, Cython, NumPy). I don't believe it's realistic to expect them all to create platform-specific installers; they'll just carry on using setuptools/distribute. If we want to move things forward in packaging, surely we have to make migration easier? IMO this was one of the things that distutils2/packaging also did not address sufficiently. Just to clarify: when I say hooks, what I mean is setuptools-style entry points that the installer looks for, which are used to customise the installation process. I believe it is possible to provide limited extensibility using hooks without it leading to the complete ad-hocery that setup.py entails. Regards, Vinay Sajip ___ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
Re: [Distutils] PEP 426: proposed metadata caching convention
On Wed, Feb 27, 2013 at 8:52 PM, Vinay Sajip vinay_sa...@yahoo.co.uk wrote: Just to clarify: when I say hooks, what I mean is setuptools-style entry points that the installer looks for, which are used to customise the installation process. The command to create a wheel from a source archive is currently still ./setup.py bdist_wheel. This may be executed on an appropriate build system rather than the target system, but aside from that everything in setup.py should still execute normally. This is the major difference between the current attempt and distutils2: du2 made moving from setup.py to setup.cfg a requirement to generate the new metadata format. By contrast, I want at least distribute, as well as the Python 3.4 distutils, to be able to generate wheels (including the new metadata) from current setup.py files. I believe it is possible to provide limited extensibility using hooks without it leading to the complete ad-hocery that setup.py entails. For version 1.0, the only install-time modification that all wheel installers must implement is fanning files out to their target locations based on sysconfig directories and rewriting script shebang lines (they may also want to generate parallel Windows executables, but with the Windows launcher, that's less necessary). If a project needs more than that, they cannot ship wheels at this time, and will need to continue shipping source distributions that can execute arbitrary code at install time. Alternatively (and preferably), such a project could split out a support library that is wheel compatible, and have a separate component that must be installed from source and is able to make arbitrary changes to the target system. *Incremental* change, and explicitly leaving some use cases to source distribution and ./setup.py for the moment is the key to creating a distribution format that is as simple as we can make it while still supporting a wide variety of use cases. 
Will we eventually get pre-install and post-install hooks ala RPM and other platform specific systems? Quite possibly. But let's see how far we can get without them first - in particular, I want to focus people's initial efforts on putting the smarts into the wheel *creation* process rather than delaying decisions until install time. The initial problem I believe we need to solve is the one of arcane build systems for key dependencies, and the simple fact that most Windows users aren't equipped to build software written in C in the first place. Eggs tried to tackle that problem years ago, but ignored things like the Filesystem Hierarchy Standard and the interests of OS distributions and system administrators, limiting its adoption to those developers that were happy with the idea of storing *everything* inside a single directory (the various legitimate concerns with the default behaviour of easy_install also didn't help). Wheel is designed to integrate more cleanly with platform specific conventions, hopefully overcoming some of those past objections to the egg format. This preliminary approach also integrates well with centralised system management tools like Puppet, Chef and Salt - for those, the states and configurations of services and other components are handled through the management infrastructure, and the language specific package management tools are just a way to get the application code onto the target systems in a controlled fashion. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
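The two install-time steps Nick lists for version 1.0 (fanning files out to sysconfig directories and rewriting script shebang lines) could look roughly like the sketch below. It is not taken from any installer: `target_path` and `rewrite_shebang` are hypothetical helpers, and the rule "rewrite any first line that starts with #! and mentions python" is an assumption made for illustration.

```python
import os
import sys
import sysconfig


def target_path(category, relative_name, paths=None):
    """Map a file in a category such as 'purelib' or 'scripts' to its target."""
    # sysconfig.get_paths() returns the install directories for this interpreter.
    paths = paths or sysconfig.get_paths()
    return os.path.join(paths[category], relative_name)


def rewrite_shebang(script_text, interpreter=sys.executable):
    """Point a Python script's shebang line at the target interpreter."""
    first, sep, rest = script_text.partition("\n")
    if first.startswith("#!") and "python" in first:
        return "#!" + interpreter + sep + rest
    # Leave scripts without a Python shebang untouched.
    return script_text


fixed = rewrite_shebang("#!python\nprint('hello')\n", "/usr/bin/python3")
```

Anything beyond these two steps is what the message leaves to source distributions and ./setup.py for now.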
Re: [Distutils] PEP 426: proposed metadata caching convention
On Wed, Feb 27, 2013 at 6:45 AM, Nick Coghlan ncogh...@gmail.com wrote: On Wed, Feb 27, 2013 at 8:52 PM, Vinay Sajip vinay_sa...@yahoo.co.uk wrote: Just to clarify: when I say hooks, what I mean is setuptools-style entry points that the installer looks for, which are used to customise the installation process. The command to create a wheel from a source archive is currently still ./setup.py bdist_wheel. This may be executed on an appropriate build system rather than the target system, but aside from that everything in setup.py should still execute normally. This is the major difference between the current attempt and distutils2: du2 made moving from setup.py to setup.cfg a requirement to generate the new metadata format. By contrast, I want at least distribute, as well as the Python 3.4 distutils, to be able to generate wheels (including the new metadata) from current setup.py files. Vinay's distlib has taken the wheel spec at its word, runs an unmodified install command with all the various paths set to wheel-compatible distname-1.0.data/scripts etc., and converts the .egg-info directory to .dist-info the same as bdist_wheel's final step. All wheel does is it takes a basic assumption of distutils2 (avoid running setup.py), rearranges it slightly (avoid running setup.py at install time) and magically people seem to like it. I wanted lxml to compile faster and wound up with a distutils escape hatch. Now I think that avoiding running *distutils* at install time is much more important than avoiding setup.py. I believe it is possible to provide limited extensibility using hooks without it leading to the complete ad-hocery that setup.py entails. For version 1.0, the only install-time modification that all wheel installers must implement is fanning files out to their target locations based on sysconfig directories and rewriting script shebang lines (they may also want to generate parallel Windows executables, but with the Windows launcher, that's less necessary). 
If a project needs more than that, they cannot ship wheels at this time, and will need to continue shipping source distributions that can execute arbitrary code at install time. Alternatively (and preferably), such a project could split out a support library that is wheel compatible, and have a separate component that must be installed from source and is able to make arbitrary changes to the target system. *Incremental* change, and explicitly leaving some use cases to source distribution and ./setup.py for the moment is the key to creating a distribution format that is as simple as we can make it while still supporting a wide variety of use cases. Will we eventually get pre-install and post-install hooks ala RPM and other platform specific systems? Quite possibly. But let's see how far we can get without them first - in particular, I want to focus people's initial efforts on putting the smarts into the wheel *creation* process rather than delaying decisions until install time. It's just the 1.0 release. There's no hurry to write the document entitled PEP 376 is now the/a standard *interchange* format for distribution metadata; here's how you can experiment with caching runtime introspection. Other tasks such as create the simplest possible useful packaging system for the stdlib [by only including the install feature] and create an ecosystem of interoperable third-party products to do everything else are higher up on the Grand Python Packaging Plan or GP3 (tm) to-do list. The initial problem I believe we need to solve is the one of arcane build systems for key dependencies, and the simple fact that most Windows users aren't equipped to build software written in C in the first place. 
Eggs tried to tackle that problem years ago, but ignored things like the Filesystem Hierarchy Standard and the interests of OS distributions and system administrators, limiting its adoption to those developers that were happy with the idea of storing *everything* inside a single directory (the various legitimate concerns with the default behaviour of easy_install also didn't help). Wheel is designed to integrate more cleanly with platform specific conventions, hopefully overcoming some of those past objections to the egg format. It's designed to make binary packaging generally interesting, even if you don't have C extensions, or even if you do have a C compiler. This will hopefully be a benefit to our Windows community as well. This preliminary approach also integrates well with centralised system management tools like Puppet, Chef and Salt - for those, the states and configurations of services and other components are handled through the management infrastructure, and the language specific package management tools are just a way to get the application code onto the target systems in a controlled fashion. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Distutils] PEP 426: proposed metadata caching convention
Daniel Holth dholth at gmail.com writes:

Vinay's distlib has taken the wheel spec at its word, runs an unmodified install command with all the various paths set to wheel-compatible distname-1.0.data/scripts etc., and converts the .egg-info directory to .dist-info the same as bdist_wheel's final step.

Right, except there's no conversion of .egg-info to .dist-info in distlib itself. That's done by the separate wheeler.py demonstration script, which uses vanilla pip to install to a holding location, converts the .egg-info to .dist-info and then builds the wheel from that. At installation time, the wheel's .dist-info contents are moved to the installation site's site-packages, except for WHEEL, which is omitted, and RECORD which is recreated.

All wheel does is it takes a basic assumption of distutils2 (avoid running setup.py), rearranges it slightly (avoid running setup.py at install time) and magically people seem to like it. I wanted lxml to compile faster and wound up with a distutils escape hatch.

A happy accident, then!

Now I think that avoiding running *distutils* at install time is much more important than avoiding setup.py.

It's just the 1.0 release. There's no hurry to write the document entitled PEP 376 is now the/a standard *interchange* format for [snip] third-party products to do everything else are higher up on the Grand Python Packaging Plan or GP3 (tm) to-do list.

I suppose you're right, but I want to make as much progress as I can while I still have the time I can spend on this, and while the grey cells haven't succumbed to packaging fatigue ... :-)

Regards, Vinay Sajip
Re: [Distutils] PEP 426: proposed metadata caching convention
On Wed, Feb 27, 2013 at 10:08 AM, Vinay Sajip vinay_sa...@yahoo.co.uk wrote:

Daniel Holth dholth at gmail.com writes:

Vinay's distlib has taken the wheel spec at its word, runs an unmodified install command with all the various paths set to wheel-compatible distname-1.0.data/scripts etc., and converts the .egg-info directory to .dist-info the same as bdist_wheel's final step.

Right, except there's no conversion of .egg-info to .dist-info in distlib itself. That's done by the separate wheeler.py demonstration script, which uses vanilla pip to install to a holding location, converts the .egg-info to .dist-info and then builds the wheel from that. At installation time, the wheel's .dist-info contents are moved to the installation site's site-packages, except for WHEEL, which is omitted, and RECORD which is recreated.

All wheel does is it takes a basic assumption of distutils2 (avoid running setup.py), rearranges it slightly (avoid running setup.py at install time) and magically people seem to like it. I wanted lxml to compile faster and wound up with a distutils escape hatch.

A happy accident, then!

Now I think that avoiding running *distutils* at install time is much more important than avoiding setup.py.

It's just the 1.0 release. There's no hurry to write the document entitled PEP 376 is now the/a standard *interchange* format for [snip] third-party products to do everything else are higher up on the Grand Python Packaging Plan or GP3 (tm) to-do list.

I suppose you're right, but I want to make as much progress as I can while I still have the time I can spend on this, and while the grey cells haven't succumbed to packaging fatigue ... :-)

Luckily parts of your brain are red and black. I'm amazed at the effort you've put forth so far.
The idea isn't to limit the amount of progress but simply to have a good separation between the smaller number of things we need to agree on and probably put in the stdlib (for example, dependency declarations and a basic binary format) and the things we don't have to agree on, or are very unlikely to agree on, that will probably live outside the stdlib (for example, a not-likely-forthcoming universal build system, and perhaps the best way to cache .dist-info, assuming the feature is even beneficial at all).

Anyway, Nick has been describing a different thing (a numpy- or package-specific post-install hook) than the proposal (some way to run code that is intended to cache .dist-info directories at install time without patching every installer).
Re: [Distutils] PEP 426: proposed metadata caching convention
On Feb 27, 2013, at 2:52 AM, Vinay Sajip vinay_sa...@yahoo.co.uk wrote: Nick Coghlan ncoghlan at gmail.com writes: I'm not a fan of post-install hooks - that way lies setup.py. If people want to run arbitrary code at install time, they can publish a platform specific installer. *Maybe* we can go down that path in the Python 3.5 timeframe, but for now, no. I'm concerned that this might affect adoption: there are a lot of projects that have non-trivial custom code in setup.py - often doing mundane stuff like copying files around before the actual setup() call. Having hooks will enable easier migration for such projects (which include, for example, Twisted, Cython, NumPy). I don't believe it's realistic to expect them all to create platform-specific installers; they'll just carry on using setuptools/distribute. Quite so. Post-install hooks are a requirement for Twisted and for many projects which depend on Twisted. The hook is always the same on every platform, so it's not a platform-specific installer issue. Frankly, a big appeal of some next-generation package distribution system is the introduction of a proper set of events we can hook into, instead of assuming that by some accident of timing we can work out when the software is being installed and call some random function from the bottom of setup.py with a bunch of state scooped out of distutils' internals. The current situation is a total mess. -glyph ___ Distutils-SIG maillist - Distutils-SIG@python.org http://mail.python.org/mailman/listinfo/distutils-sig
Re: [Distutils] PEP 426: proposed metadata caching convention
On Wednesday, February 27, 2013 at 1:47 PM, Glyph wrote:

On Feb 27, 2013, at 2:52 AM, Vinay Sajip vinay_sa...@yahoo.co.uk wrote:

Nick Coghlan ncoghlan at gmail.com writes:

I'm not a fan of post-install hooks - that way lies setup.py. If people want to run arbitrary code at install time, they can publish a platform specific installer. *Maybe* we can go down that path in the Python 3.5 timeframe, but for now, no.

I'm concerned that this might affect adoption: there are a lot of projects that have non-trivial custom code in setup.py - often doing mundane stuff like copying files around before the actual setup() call. Having hooks will enable easier migration for such projects (which include, for example, Twisted, Cython, NumPy). I don't believe it's realistic to expect them all to create platform-specific installers; they'll just carry on using setuptools/distribute.

Quite so. Post-install hooks are a requirement for Twisted and for many projects which depend on Twisted. The hook is always the same on every platform, so it's not a platform-specific installer issue.

Frankly, a big appeal of some next-generation package distribution system is the introduction of a proper set of events we can hook into, instead of assuming that by some accident of timing we can work out when the software is being installed and call some random function from the bottom of setup.py with a bunch of state scooped out of distutils' internals. The current situation is a total mess.

-glyph

I'm generally +1 on hooks, the failure of setup.py isn't particularly that it's executable, it's that you can't access the metadata without executing it. In general hooks also allow people to easily disable them during install if they don't wish for that (of course packages have no reason to support that if they don't want to).
Re: [Distutils] PEP 426: proposed metadata caching convention
On Feb 27, 2013, at 11:04 AM, Daniel Holth dho...@gmail.com wrote:

What does it have to do in the hook?

This: https://twistedmatrix.com/documents/current/core/howto/plugin.html#auto3

While this is theoretically optional - Twisted will behave mostly correctly without it - it noticeably improves the start-up performance of Twisted-based command-line tools, like 'twistd' and 'trial'.

-glyph
Re: [Distutils] PEP 426: proposed metadata caching convention
On Feb 27, 2013, at 10:49 AM, Donald Stufft donald.stu...@gmail.com wrote:

I'm generally +1 on hooks, the failure of setup.py isn't particularly that it's executable, it's that you can't access the metadata without executing it. In general hooks also allow people to easily disable them during install if they don't wish for that (of course packages have no reason to support that if they don't want to).

I pretty much agree. I'd be happy - enthusiastic, even - for Twisted to update to some static metadata expression system.

-glyph
Re: [Distutils] PEP 426: proposed metadata caching convention
On Mon, Feb 25, 2013 at 9:39 AM, Nick Coghlan ncogh...@gmail.com wrote:

(This probably belongs in a successor to PEP 376, but I'll leave it under the PEP 426 umbrella for now)

One of the points raised regarding PEP 426's integrated metadata format is the potential for runtime issues with pkg_resources as it reads and processes the metadata during startup, particularly if it needs to process any environment markers.

While I acknowledge the suggestions I have received that we should really be moving away from the current filesystem based distributed installation information to a real database that properly handles import hooks, I'm looking for something simpler that will make it easier for setuptools and distribute to consume the new metadata format (and thus hopefully make them more amenable to generating it as well)

Assuming we add an Entry-Points field as I have proposed in another message, I'd like to propose that installers generate three additional cache files as part of the installation process:

    dist-info-dir/__cache__/version.txt
    dist-info-dir/__cache__/requires-dist.txt
    dist-info-dir/__cache__/entry-points.txt

version.txt would just be the version of the installed distribution (no need to parse the main metadata file just to read the version field)

requires-dist.txt would be similar to the pkg_resources requires.txt format, but use PEP 426 version specifiers. It would:

- only contain runtime requirements where the environment markers match the current system
- be split into sections based on the extras definition needed to get the environment marker to pass

entry-points.txt would be the same format as the pkg_resources entry_points.txt

Cheers, Nick.

Since this isn't going to be backwards-compatible anyway, may I suggest that:

1. The caching algorithm be fixed and defined as part of the extension machinery
2. The caching consists of simply copying the data to a file, whose name is programmatically based on the extension/field name.
3. Environment markers are not processed - that's up to the tool consuming the cached data

This way, if e.g. entry points are defined as an extension, then the Builder making a wheel doesn't need to understand entry points, it just has to copy fields to a file. It allows other resource types (like i18n/l10n resources) to be defined in the metadata and cached for runtime use, without needing a metadata version upgrade or any tool rewrites. And not processing environment markers means that pure-Python wheels can still be used by just placing them on sys.path.
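The "just copy fields to a file" suggestion above might be sketched as follows. Only the __cache__ directory and the .txt file names come from Nick's proposal; the name-derivation rule (lowercased field name plus .txt) and the helpers `cache_file_name` and `write_field_caches` are assumptions made for illustration. Environment markers are deliberately copied through unevaluated, as point 3 asks.

```python
import os
import tempfile


def cache_file_name(field_name):
    """Derive a cache file name from a field name (assumed rule)."""
    return field_name.lower() + ".txt"


def write_field_caches(dist_info_dir, fields):
    """Copy each field's raw values into dist_info_dir/__cache__/<name>.txt."""
    cache_dir = os.path.join(dist_info_dir, "__cache__")
    os.makedirs(cache_dir, exist_ok=True)
    written = []
    for name, values in fields.items():
        path = os.path.join(cache_dir, cache_file_name(name))
        with open(path, "w", encoding="utf-8") as cache_file:
            # Values are written verbatim; markers are left for the consumer.
            cache_file.write("\n".join(values) + "\n")
        written.append(path)
    return written


# A temporary directory stands in for a real .dist-info directory.
with tempfile.TemporaryDirectory() as dist_info:
    paths = write_field_caches(dist_info, {
        "Version": ["1.0a2"],
        "Requires-Dist": ["pkginfo", "pywin32 (>1.0); sys.platform == 'win32'"],
    })
    names = sorted(os.path.basename(p) for p in paths)
```

Because the writer never interprets the values, a builder using this approach would not need to understand entry points or any later extension in order to cache it.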
Re: [Distutils] PEP 426: proposed metadata caching convention
On Wed, Feb 27, 2013 at 4:48 PM, PJ Eby p...@telecommunity.com wrote:

On Mon, Feb 25, 2013 at 9:39 AM, Nick Coghlan ncogh...@gmail.com wrote:

(This probably belongs in a successor to PEP 376, but I'll leave it under the PEP 426 umbrella for now)

One of the points raised regarding PEP 426's integrated metadata format is the potential for runtime issues with pkg_resources as it reads and processes the metadata during startup, particularly if it needs to process any environment markers.

While I acknowledge the suggestions I have received that we should really be moving away from the current filesystem-based distributed installation information to a real database that properly handles import hooks, I'm looking for something simpler that will make it easier for setuptools and distribute to consume the new metadata format (and thus hopefully make them more amenable to generating it as well).

Assuming we add an Entry-Points field as I have proposed in another message, I'd like to propose that installers generate three additional cache files as part of the installation process:

    dist-info-dir/__cache__/version.txt
    dist-info-dir/__cache__/requires-dist.txt
    dist-info-dir/__cache__/entry-points.txt

version.txt would just be the version of the installed distribution (no need to parse the main metadata file just to read the version field).

requires-dist.txt would be similar to the pkg_resources requires.txt format, but use PEP 426 version specifiers. It would:

- only contain runtime requirements where the environment markers match the current system
- be split into sections based on the extras definition needed to get the environment marker to pass

entry-points.txt would be the same format as the pkg_resources entry_points.txt.

Cheers, Nick.

Since this isn't going to be backwards-compatible anyway, may I suggest that:

1. The caching algorithm be fixed and defined as part of the extension machinery
2. The caching consists of simply copying the data to a file, whose name is programmatically based on the extension/field name.
3. Environment markers are not processed - that's up to the tool consuming the cached data

This way, if e.g. entry points are defined as an extension, then the Builder making a wheel doesn't need to understand entry points, it just has to copy fields to a file. It allows other resource types (like i18n/l10n resources) to be defined in the metadata and cached for runtime use, without needing a metadata version upgrade or any tool rewrites. And not processing environment markers means that pure-Python wheels can still be used by just placing them on sys.path.

My aim is to provide a hook mechanism that specifically does not say anything about the way the cache is stored or even whether the hook produces a cache at all. It will just run when pip is done.
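A minimal sketch of the copy-only caching suggested in this message, under stated assumptions: the function names, the file-naming rule (lowercase the field name, replace unusual characters with "-", append ".txt") and the one-value-per-line layout are all hypothetical, since the thread only says the file name is "programmatically based on the extension/field name". Values are copied verbatim, so any environment marker is left for the consuming tool to evaluate.

```python
import re
from pathlib import Path


def cache_file_name(field_name):
    """Derive a cache file name from a field or extension name.

    Hypothetical rule: lowercase the name, replace every run of characters
    outside [a-z0-9.-] with '-', and append '.txt'.
    """
    slug = re.sub(r"[^a-z0-9.-]+", "-", field_name.lower()).strip("-")
    return slug + ".txt"


def write_field_caches(cache_dir, fields):
    """Copy each field's raw values into its own file, one value per line.

    `fields` maps a field name to a list of raw string values. Nothing is
    parsed or filtered: environment markers are written out untouched.
    Returns the list of paths written.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, values in fields.items():
        path = cache_dir / cache_file_name(name)
        path.write_text("".join(value + "\n" for value in values), encoding="utf-8")
        written.append(path)
    return written
```

With this naming rule, a "Requires-Dist" field would land in requires-dist.txt and an "Entry-Points" field in entry-points.txt, so a builder never needs to understand what either field means.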
Re: [Distutils] PEP 426: proposed metadata caching convention
On Thu, Feb 28, 2013 at 7:59 AM, Daniel Holth dho...@gmail.com wrote:

My aim is to provide a hook mechanism that specifically does not say anything about the way the cache is stored or even whether the hook produces a cache at all. It will just run when pip is done.

How does the following idea sound?

New metadata field: Post-Install

Format: a *single* callable reference in entry-points format (i.e. module.name:callable.name)

Call signature:

    def post_install_hook(metadata, extras, previous_version=None):
        ...

extras would be a tuple indicating which extras were installed.

For an upgrade, previous_version would be set to the version that was previously installed. For a clean installation, it would either be None or omitted entirely.

The metadata argument would be the PEP 426 metadata, reformatted as JSON-compatible structured metadata. I had planned to postpone defining the algorithm for that conversion until after PEP 426 acceptance, but if we're going to add a post-install hook mechanism to PEP 426, I think it makes more sense to define it up front:

1. The top level is a mapping, with lowercase versions of all PEP 426 fields as keys. All multiple-use fields other than requires-python are pluralised (that one is only multiple use so you can depend on a different version of Python given different environment markers - for example, supporting Python 2.6 everywhere, but requiring Python 2.7 on Windows. Aside from those cases, you can collapse an arbitrarily complex version specifier down to a single line)
3. Every mandatory field is present, with a string value
4. If present, the keywords field references a list of keywords (created via str.split)
5. If present, the description is always stored under the description key, even if provided in the PEP 426 metadata payload
6. If any other optional field is present, it references a string value
7. If present, the project-urls key references a mapping of labels to URLs.
8. If present, the extensions key references a mapping of extension names to the extension's embedded JSON metadata. (Note: this is the key reason for my planned change to the extension format from arbitrary subfields to allowing only a single json subfield - it greatly simplifies this aspect of the translation to structured metadata, *and* makes it more flexible and powerful at the same time)
9. For any multi-use field that is present and supports environment markers, it is a reference to a mapping where each key is a whitespace-normalized (i.e. every sequence of whitespace converted to a single space) environment marker string that references a list of string values. The unqualified fields are referenced by the string "always". This breakdown allows each unique environment marker to be evaluated only once to determine whether or not it is applicable, regardless of how many times it was originally used.
10. If any other multi-use field is present, it references a list of string values.

For example:

    Metadata-Version: 2.0
    Name: BeagleVote
    Version: 1.0a2
    Summary: A module for collecting votes from beagles.
    Keywords: dog puppy voting election
    Project-URL: Bug, Issue Tracker, http://bitbucket.org/tarek/distribute/issues/
    Requires-Dist: pkginfo
    Requires-Dist: PasteDeploy
    Requires-Dist: zope.interface (3.5.0)
    Extension: Chili
    Chili/json: { "Type": "Poblano", "Heat": "Mild" }

    Apparently, these beagles like their chili. (This is not a helpful description)

Would become:

    {
      "metadata-version": "2.0",
      "name": "BeagleVote",
      "version": "1.0a2",
      "summary": "A module for collecting votes from beagles.",
      "description": "Apparently, these beagles like their chili. (This is not a helpful description)",
      "keywords": ["dog", "puppy", "voting", "election"],
      "project-urls": {
        "Bug, Issue Tracker": "http://bitbucket.org/tarek/distribute/issues/"
      },
      "requires-dists": {"always": ["pkginfo", "PasteDeploy", "zope.interface (3.5.0)"]},
      "extensions": {
        "Chili": { "Type": "Poblano", "Heat": "Mild" }
      }
    }

An apparently simpler alternative would be to rely on PEP 376 to retrieve the full metadata and only provide the distribution name and version to the hook:

    def post_install_hook(distname, current_version, previous_version=None):
        ...

The key disadvantage of that seemingly simpler approach is that it *only* works for post-install and pre-uninstall hooks, *and* requires that the post-install hook have the tools needed to read the PEP 376 metadata. If we later want to add pre-install, build or archiving hooks, they would need the structured metadata format anyway, as relying on PEP 376 isn't an option for software that hasn't been installed yet.

This simpler alternative also won't work for eventually decoupling the installation database from a particular filesystem layout (e.g. adding metadata support to import hooks or tunnelling the
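The conversion rules listed in this message can be sketched roughly as follows. This is an illustration under assumptions, not the algorithm PEP 426 ended up with: the names `to_structured`, `normalize_marker`, `MARKER_FIELDS`, `LIST_FIELDS` and `NOT_PLURALISED` are hypothetical, the field sets are small illustrative subsets, the input is assumed to be already split into (field, value) pairs, and an environment marker is assumed to follow a ";" as in PEP 345-style requirement lines. The mandatory-field check (rule 3) is left out.

```python
import json

# Illustrative subsets only; the authoritative field lists belong to the PEP.
MARKER_FIELDS = {"requires-dist", "requires-python"}  # multi-use, markers allowed
LIST_FIELDS = {"classifier"}                          # multi-use, plain list here
NOT_PLURALISED = {"requires-python"}                  # rule 1 exception


def normalize_marker(marker):
    """Collapse every run of whitespace to a single space (rule 9)."""
    return " ".join(marker.split())


def to_structured(fields, description=None):
    """Convert (field, value) pairs into a JSON-compatible mapping."""
    result = {}
    for raw_name, value in fields:
        name = raw_name.lower()
        if name.endswith("/json"):
            # Rule 8: "<Extension>/json" carries the extension's JSON payload.
            ext = raw_name[:-len("/json")]
            result.setdefault("extensions", {})[ext] = json.loads(value)
        elif name == "extension":
            # Declares an extension; keep any payload that was already stored.
            result.setdefault("extensions", {}).setdefault(value, {})
        elif name == "keywords":
            result["keywords"] = value.split()            # rule 4
        elif name == "project-url":
            # Rule 7: the label may itself contain commas, so split on the last.
            label, url = value.rsplit(",", 1)
            result.setdefault("project-urls", {})[label.strip()] = url.strip()
        elif name in MARKER_FIELDS:
            # Rule 9: group values by normalized marker, "always" if unqualified.
            key = name if name in NOT_PLURALISED else name + "s"
            spec, _, marker = value.partition(";")
            marker = normalize_marker(marker) or "always"
            result.setdefault(key, {}).setdefault(marker, []).append(spec.strip())
        elif name in LIST_FIELDS:
            result.setdefault(name + "s", []).append(value)  # rule 10
        else:
            result[name] = value                          # rules 3 and 6
    if description is not None:
        result["description"] = description               # rule 5
    return result
```

Fed the BeagleVote fields from the example, this sketch yields the same mapping shape as the structured form shown in the message, and the result can be passed straight to `json.dumps`. Grouping by normalized marker is what lets a consumer evaluate each distinct marker once.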
Re: [Distutils] PEP 426: proposed metadata caching convention
On 25 February 2013 14:39, Nick Coghlan ncogh...@gmail.com wrote:

(This probably belongs in a successor to PEP 376, but I'll leave it under the PEP 426 umbrella for now)

One of the points raised regarding PEP 426's integrated metadata format is the potential for runtime issues with pkg_resources as it reads and processes the metadata during startup, particularly if it needs to process any environment markers.

While I acknowledge the suggestions I have received that we should really be moving away from the current filesystem-based distributed installation information to a real database that properly handles import hooks, I'm looking for something simpler that will make it easier for setuptools and distribute to consume the new metadata format (and thus hopefully make them more amenable to generating it as well).

Assuming we add an Entry-Points field as I have proposed in another message, I'd like to propose that installers generate three additional cache files as part of the installation process:

    dist-info-dir/__cache__/version.txt
    dist-info-dir/__cache__/requires-dist.txt
    dist-info-dir/__cache__/entry-points.txt

version.txt would just be the version of the installed distribution (no need to parse the main metadata file just to read the version field).

requires-dist.txt would be similar to the pkg_resources requires.txt format, but use PEP 426 version specifiers. It would:

- only contain runtime requirements where the environment markers match the current system
- be split into sections based on the extras definition needed to get the environment marker to pass

entry-points.txt would be the same format as the pkg_resources entry_points.txt.

Why a __cache__ subdirectory? Is this purely an easier-to-process copy of what's in the METADATA file? If so, I'd prefer to simply take the information out of the METADATA file and have it in a single separate file in the first place. IIUC, that's what Daniel is suggesting as well. We don't really need everything to be in a single file, surely?

Paul.
Re: [Distutils] PEP 426: proposed metadata caching convention
On Tue, Feb 26, 2013 at 12:45 AM, Paul Moore p.f.mo...@gmail.com wrote:

We don't really need everything to be in a single file, surely?

Yes, I want the metadata to map cleanly to a single data structure so it can be more easily managed through things that *aren't* file systems (such as finally getting the installation database to support import hooks and also for potential metadata publication through TUF). However, decomposing it for efficient runtime access and backwards compatibility reasons makes sense.

Cheers, Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Distutils] PEP 426: proposed metadata caching convention
Post-install hooks are different than setup.py because they are installed first and then run for all packages, and are not requested by the installed dist. They are more like rewriting a script's #!python shebang.

May I humbly suggest deleting things from this PEP until it is acceptable and not the other way around?

On Feb 25, 2013 11:54 AM, Paul Moore p.f.mo...@gmail.com wrote:

On 25 February 2013 15:10, Nick Coghlan ncogh...@gmail.com wrote:

entry-points.txt is pure backwards compatibility, though. The only reason I didn't suggest reusing the setuptools name for the file is because I want the __cache__ in the name to clearly identify the files the installer derives from METADATA rather than the ones defined in PEP 376 or installed as part of the distribution.

One thing I *would* like to suggest is that the cached versions of the data should be optional. My specific reason for this is that as things stand, many wheels are usable without installation, simply by putting them on sys.path. As wheels are a distribution format, they won't have the cached data, and I'd be unhappy if that fact broke the ability to use them as zips.

Paul.
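Treating the cache as optional, as suggested above, could look something like the sketch below. It is an assumption-laden illustration: `installed_version` is a hypothetical helper, the cache path follows the `__cache__/version.txt` layout proposed earlier in the thread, and the dist-info directory is assumed to be a real directory on disk (reading from inside a zipped wheel would need a different file-access layer). When no cache exists, the version is read from the header-style METADATA file instead.

```python
from email.parser import Parser
from pathlib import Path


def installed_version(dist_info_dir):
    """Return the distribution version, preferring the optional cache file."""
    dist_info = Path(dist_info_dir)
    cached = dist_info / "__cache__" / "version.txt"
    if cached.is_file():
        # Fast path: the installer generated the cache.
        return cached.read_text(encoding="utf-8").strip()
    # Fallback: no cache (for example, an uninstalled wheel's metadata),
    # so parse the Version header out of METADATA.
    text = (dist_info / "METADATA").read_text(encoding="utf-8")
    return Parser().parsestr(text, headersonly=True)["Version"]
```

A consumer written this way keeps working whether or not an installer ever produced the cache files, which is the property being asked for.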