On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote:
> On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
> > On 01.03.2013 11:19, holger krekel wrote:
> > > Hi Richard, all,
> > >
> > > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
> > > script which takes a project name as an argument and then goes to
> > > pypi.python.org and removes all homepage/download metadata entries for
> > > this project. This sanitizes/speeds up installation because
> > > pip/easy_install don't need to crawl them anymore. I just did this for
> > > three of my projects (pytest, tox and py) and it seems to work fine.
> > >
> >
> > Does it also clean up the links that PyPI adds to the /simple/ page by
> > parsing the project description for links?
> >
> > I think those are far nastier than the homepage and download links,
> > which can be put to some good use to limit the external lookups
> > (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
> >
> > See e.g. https://pypi.python.org/simple/zc.buildout/
> > for a good example of the mess this generates... even mailto links
> > get listed and "file:///" links open up the installers for all
> > kinds of nasty things (unless they explicitly protect against
> > following these).
> >
>
> pip at least, and I assume the other tools, don't spider those links, but
> they do consider them for download (e.g. if the link looks installable
> it will be a candidate for installing, but it won't fetch it and look for
> more links like it will for download_url/home_page).
>
> I believe that's the way it's structured atm.

That's right. Even though the long-description extracted links look ugly
on a simple/PKGNAME page, neither pip nor easy_install does anything with
them except if the "href" ends in "#egg=PKGNAME-" in which case they are
taken as pointing to a development tarball (e.g. at GitHub or Bitbucket).
AFAIK a link like "PKGNAME-VER.tar.gz" will not be treated as an
installation candidate, just the "#egg=PKGNAME" one.

best,
holger

> > > Now before i release this as a tool, i wonder: Is it a good idea to remove
> > > download/homepage entries? Is there any current machine use (other than
> > > the dreaded crawling) for the homepage/download_url per-release metadata
> > > fields?
> > >
> > > For humans the homepage link is nicely discoverable if the
> > > long-description doesn't mention it prominently. But i think there
> > > also is a "project url" or "bugtrack url" for a project so maybe those
> > > could be used to reference these important pages? (i am a bit confused
> > > on the exact meaning of those urls, btw).
> > >
> > > Should we maybe stop advertising "homepage" and "download_url"
> > > and instead see to extend project-url/bugtrackurl to be used
> > > and shown nicely? The latter are independent of releases which i think
> > > makes sense - what use are old, probably unreachable/borked homepages
> > > anyway. And it's also not too bad having to go once to pypi.python.org
> > > to set it, usually it seldom changes.
> > >
> >
> > I think it would be better to differentiate between showing the
> > fields on the project pages, where they provide useful resources
> > for people, and their use on the /simple/ index pages which are
> > meant for programs to parse.
> >
> > IMO, the homepage and download links on the project pages are
> > indeed very useful for people. On the /simple/ index a homepage
> > link is probably not all that useful (provided a download link
> > is set).
> > The download links serve the purpose of directing
> > tools to the right location, so those do belong on the /simple/
> > index listings. I'd completely remove the links parsed from
> > the descriptions, since those don't really provide a good
> > basis for crawling (the description is meant for humans to
> > parse, not programs).
> >
> > --
> > Marc-Andre Lemburg
> > eGenix.com
> >
> > _______________________________________________
> > Catalog-SIG mailing list
> > Catalog-SIG@python.org
> > http://mail.python.org/mailman/listinfo/catalog-sig
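
A minimal sketch of the "#egg=" rule described in the reply above, assuming it works the way that paragraph states. This is not the actual pip or easy_install code; the function names are hypothetical, the case-insensitive comparison is an assumption, and the example URLs are made up. It only illustrates which links on a simple/PKGNAME page would count as pointing to a development tarball under that rule.

```python
from urllib.parse import urlsplit


def egg_fragment(href):
    """Return the value of an '#egg=' fragment, or None if there is none.

    Hypothetical helper, not part of pip or setuptools.
    """
    fragment = urlsplit(href).fragment
    if fragment.startswith("egg="):
        return fragment[len("egg="):]
    return None


def is_dev_candidate(href, pkgname):
    """True if href carries an '#egg=PKGNAME' or '#egg=PKGNAME-...' fragment.

    Assumption: names are compared case-insensitively. Links without an
    egg fragment (plain tarball names, mailto:, file:///) are ignored.
    """
    egg = egg_fragment(href)
    if egg is None:
        return False
    egg = egg.lower()
    name = pkgname.lower()
    return egg == name or egg.startswith(name + "-")


if __name__ == "__main__":
    links = [
        "https://github.com/example/pkgname/tarball/master#egg=pkgname-dev",
        "http://example.com/downloads/pkgname-1.0.tar.gz",
        "mailto:someone@example.com",
    ]
    for href in links:
        print(href, is_dev_candidate(href, "pkgname"))
```

Under this sketch only the first link is accepted; the plain "pkgname-1.0.tar.gz" link and the mailto link are skipped, which matches the behaviour described above as far as it is stated there.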