On Thu, Feb 17, 2011 at 6:57 PM, P.J. Eby <[email protected]> wrote:
> At 12:01 AM 2/17/2011 +0000, Daniele Varrazzo wrote:
>> I'm sorry, it is obvious that I have not spent as much time on this
>> problem as the designer of this feature. But I still don't get the
>> rationale behind discarding available, non-ambiguous metadata in
>> favour of screen scraping.
>
> When easy_install was first written, PyPI didn't even support *uploading*.
> And the quality of available metadata on PyPI is still quite sketchy --
> many packages will have only one file uploaded for an outdated version, but
> still have good downloads on their home pages or download URLs.
>
>> the shortcomings of a package manager
>
> Well, technically, this'd be a feature. Granted, it's only a feature for
> users of projects whose maintainers are *not* keeping a well-groomed PyPI
> page. ;-) I guess it is a shortcoming in the sense that there ought to be
> a way to stop it from using this feature.

What I understand here is that setuptools/pip/distutils2 had to grow in intelligence because of the limitations of the data available on PyPI, in both quantity and quality. I can see that this may have been the situation in the infancy of PyPI, but I don't think it is the case anymore. PyPI is now the only package repository: people know it exists and are willing to put their modules there. Quantity is a solved problem.

But from the picture you describe, it seems to me that the "intelligence" of the package managers is now a hindrance to quality. Because setuptools is relatively good at screen scraping, people have little incentive to improve the quality of the metadata describing their packages. I don't think this will be a win in the long term: in my experience programs have a limited lifespan, whereas databases largely outlive them and end up being the real *resource*. Good data + OK program: win. Bad data + too-good program: random outcome.
I don't see the need for semantic tagging, nor for improved algorithms, to get a better packaging system: they only add complexity, maybe with some hack value, but no robustness. Because PyPI is now widely recognized as a good idea, I believe that a *stupid* package manager, one that only follows the directives made available by the packager on PyPI, would make people *run* to fix their bloody metadata. If they release foo-1.0.1 and a test shows that "easy_install foo" still installs 1.0, they will *spring* to move their fat fingers and type "python setup.py upload". An improvement would be a command "python setup.py test-upload" that downloads the package from PyPI and checks that its version matches the one in setup.py.

Of course the disclaimer holds: I haven't spent my hacking time following the reasoning that led to the birth of pip, zc.buildout and distribute2 after setuptools, so my position is probably naive. But I see a race to build the spiffiest setup program to work on a pile of trash, instead of using the privileged position you have to encourage a better environment.

If setuptools n+1 were released right now, with the *feature* of installing the current PyPI version of each package by default, then the fact that pip installs the wrong version instead would be a bug in pip. At the same time, people would see their programs no longer being installed by easy_install, so they would rush to update their metadata. It wouldn't even be difficult to build a list of the "problematic packages": the ones for which the version installed by easy_install differs from the current PyPI version, either because of a shortcoming in the search algorithm (as with psycopg2) or because the maintainer forgot to upload. Their maintainers could receive an email explaining the discrepancy and suggesting a fix, or the metadata could be fixed once by a PyPI maintainer. I don't expect these packages to be a large percentage of the current 13365.
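To make the two proposals concrete, here is a minimal Python sketch of the checks. It does not talk to PyPI: the versions uploaded to the index and the versions an installer actually picks are passed in as plain data, and every name in it (`check_upload`, `problematic_packages`, `parse_version`, `latest_version`, and the packages "foo" and "bar") is hypothetical, not an existing setuptools or PyPI API. It assumes purely numeric dotted version strings and the "stupid installer" rule described above: install the newest version uploaded to PyPI.

```python
from typing import Dict, Iterable, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a purely numeric dotted version ("1.0.1") into a comparable tuple.

    Assumption: plain dotted integers only; pre-releases such as "1.0b1"
    are not handled by this sketch.
    """
    return tuple(int(part) for part in version.split("."))


def latest_version(versions: Iterable[str]) -> str:
    """Return the highest version among those uploaded to the index."""
    return max(versions, key=parse_version)


def check_upload(local_version: str, index_versions: Iterable[str]) -> bool:
    """Hypothetical stand-in for the proposed "setup.py test-upload".

    True only if the version declared in setup.py has been uploaded and is
    the newest one on the index, i.e. what an installer that just follows
    the index metadata would install.
    """
    versions = list(index_versions)
    if not versions:
        return False
    return latest_version(versions) == local_version


def problematic_packages(
    index_current: Dict[str, str], installer_picks: Dict[str, str]
) -> List[str]:
    """List packages whose installer-chosen version differs from the index's."""
    return sorted(
        name
        for name, current in index_current.items()
        if installer_picks.get(name) != current
    )


# Usage with made-up data:
print(check_upload("1.0.1", ["0.9", "1.0"]))  # False: 1.0.1 was never uploaded
print(check_upload("1.0.1", ["0.9", "1.0", "1.0.1"]))  # True
print(problematic_packages({"foo": "1.0.1", "bar": "2.0"},
                           {"foo": "1.0", "bar": "2.0"}))  # ['foo']
```

Keeping the comparison logic free of network access means the same two functions could be fed either by a maintainer's local setup.py and the index listing, or by a bulk crawl of all packages to produce the list of discrepancies.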
End of rambling :) Thank you for the effort; don't think I don't appreciate it.

Have a nice day.

-- Daniele
_______________________________________________
Distutils-SIG maillist - [email protected]
http://mail.python.org/mailman/listinfo/distutils-sig
