On 01.03.2013 12:17, holger krekel wrote:
> On Fri, Mar 01, 2013 at 06:09 -0500, Donald Stufft wrote:
>> On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
>>> On 01.03.2013 11:19, holger krekel wrote:
>>>> Hi Richard, all,
>>>>
>>>> somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
>>>> script which takes a project name as an argument and then goes to
>>>> pypi.python.org (http://pypi.python.org) and removes all homepage/download
>>>> metadata entries for this project. This sanitizes/speeds up installation
>>>> because pip/easy_install don't need to crawl them anymore. I just did this
>>>> for three of my projects (pytest, tox and py) and it seems to work fine.
>>>
>>> Does it also clean up the links that PyPI adds to the /simple/ by
>>> parsing the project description for links?
>>>
>>> I think those are far nastier than the homepage and download links,
>>> which can be put to some good use to limit the external lookups
>>> (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
>>>
>>> See e.g. https://pypi.python.org/simple/zc.buildout/
>>> for a good example of the mess this generates... even mailto links
>>> get listed and "file:///" links open up the installers for all
>>> kinds of nasty things (unless they explicitly protect against
>>> following these).
>>
>> pip at least, and I assume the other tools don't spider those links, but
>> they do consider them for download (e.g. if the link looks installable
>> it will be a candidate for installing, but it won't fetch it and look for
>> more links like it will with download_url/home_page).
>>
>> I believe that's the way it's structured atm.
>
> That's right. Even though the long-description extracted links
> look ugly on a simple/PKGNAME page, neither pip nor easy_install do anything
> with them except if the "href" ends in "#egg=PKGNAME-" in which case they are
> taken as pointing to a development tarball (e.g. at github or bitbucket).
> AFAIK a link like "PKGNAME-VER.tar.gz" will not be treated as
> an installation candidate, just the "#egg=PKGNAME" one.

Hmm, then why not remove links that don't match the above from
the /simple/ index pages?

Note that it's easily possible to make e.g. file:/// links have a
fragment that matches what you described, so I guess the filters
would have to be more careful about what to allow (e.g. only
http/ftp schemes, perhaps even only https schemes) and what not.

BTW: Are those links also shown as-is on the description page?
People could do nasty stuff by adding "javascript:" links to the
descriptions which look like normal links.

--
Marc-Andre Lemburg
eGenix.com

_______________________________________________
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig
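
For illustration only, here is a rough sketch of the kind of filter
suggested above: keep a description link on the /simple/ page only if
its scheme is on an allowlist and its fragment starts with
"egg=PKGNAME-". This is not how PyPI, pip or easy_install actually
implement anything. The function name is_allowed_link, the
ALLOWED_SCHEMES set and the case-insensitive name comparison are all
assumptions made for the example.

```python
from urllib.parse import urlsplit

# Assumption: which schemes to trust is a policy choice. The thread
# mentions http/ftp, or perhaps only https; this sketch uses http/https.
ALLOWED_SCHEMES = frozenset({"http", "https"})


def is_allowed_link(href, pkgname, allowed_schemes=ALLOWED_SCHEMES):
    """Hypothetical filter for description links on /simple/PKGNAME/.

    Returns True only if the link uses an allowed scheme, names a host,
    and carries a fragment starting with "egg=PKGNAME-".
    """
    parts = urlsplit(href)
    # Rejects file:, mailto:, javascript: and anything else not listed.
    if parts.scheme not in allowed_schemes:
        return False
    # Require a host so that odd forms like "http:///path" are rejected.
    if not parts.netloc:
        return False
    # Assumption: project names are compared case-insensitively.
    wanted = "egg=%s-" % pkgname.lower()
    return parts.fragment.lower().startswith(wanted)
```

With this sketch, "https://github.com/user/foo/tarball/master#egg=foo-dev"
would be kept for project "foo", while "file:///tmp/foo#egg=foo-dev",
"javascript:alert(1)#egg=foo-dev" and a plain
"https://example.com/foo-1.0.tar.gz" would all be dropped.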