On Friday, March 1, 2013 at 6:04 AM, M.-A. Lemburg wrote:
> On 01.03.2013 11:19, holger krekel wrote:
> > Hi Richard, all,
> >
> > somewhere deep in the threads i mentioned i wrote a little "cleanpypi.py"
> > script which takes a project name as an argument and then goes to
> > pypi.python.org and removes all homepage/download metadata entries for
> > this project. This sanitizes/speeds up installation because
> > pip/easy_install don't need to crawl them anymore. I just did this for
> > three of my projects, (pytest, tox and py) and it seems to work fine.
>
> Does it also cleanup the links that PyPI adds to the /simple/ by
> parsing the project description for links ?
>
> I think those are far nastier than the homepage and download links,
> which can be put to some good use to limit the external lookups
> (see http://wiki.python.org/moin/PyPI/DownloadMetaDataProposal)
>
> See e.g. https://pypi.python.org/simple/zc.buildout/
> for a good example of the mess this generates... even mailto links
> get listed and "file:///" links open up the installers for all
> kinds of nasty things (unless they explicitly protect against
> following these).

pip at least (and, I assume, the other tools) doesn't spider those links, but
it does consider them for download: if a link looks installable it becomes a
candidate for installing, but pip won't fetch it and look for more links the
way it does for download_url/home_page. I believe that's the way it's
structured at the moment.

> > Now before i release this as a tool, i wonder: Is it a good idea to remove
> > download/homepage entries? Is there any current machine use (other than
> > the dreaded crawling) for the homepage/download_url per-release metadata
> > fields?
> >
> > For humans the homepage link is nicely discoverable if the long-description
> > doesn't mention it prominently. But i think there also is a "project url"
> > or "bugtrack url" for a project so maybe those could be used to reference
> > these important pages? (i am a bit confused on the exact meaning of those
> > urls, btw).
> >
> > Should we maybe stop advertising "homepage" and "download_url"
> > and instead see to extend project-url/bugtrackurl to be used
> > and shown nicely? The latter are independent of releases which i think
> > makes sense - what use are old probably unreachable/borked homepages
> > anyway. And it's also not too bad having to go once to pypi.python.org
> > to set it, usually it seldomly changes.
>
> I think it would be better to differentiate between showing the
> fields on the project pages, where they provide useful resources
> for people, and their use on the /simple/ index pages which are
> meant for programs to parse.
>
> IMO, the homepage and download links on the project pages are
> indeed very useful for people. On the /simple/ index a homepage
> link is probably not all that useful (provided a download link
> is set). The download links serve the purpose of directing
> tools to the right location, so those do belong on the /simple/
> index listings.
> I'd completely remove the links parsed from
> the descriptions, since those don't really provide a good
> basis for crawling (the description is meant for humans to
> parse, not programs).
>
> --
> Marc-Andre Lemburg
> eGenix.com

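To make the distinction above concrete, here is a minimal sketch of how a tool could sort the links on a /simple/ page without fetching any of them. It is not pip's or easy_install's actual code. The function name classify_links, the suffix list, and the assumption that homepage/download links carry rel="homepage" / rel="download" attributes are all illustrative choices made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Illustrative assumptions, not any installer's real rules.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")
SAFE_SCHEMES = {"http", "https"}


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href:
            self.links.append((href, attrs.get("rel") or ""))


def classify_links(html):
    """Sort links into four buckets; performs no network access."""
    collector = LinkCollector()
    collector.feed(html)
    result = {"rejected": [], "candidate": [], "crawl": [], "ignored": []}
    for href, rel in collector.links:
        parsed = urlparse(href)
        if parsed.scheme and parsed.scheme not in SAFE_SCHEMES:
            # mailto:, file:// and similar are never followed.
            result["rejected"].append(href)
        elif parsed.path.lower().endswith(ARCHIVE_SUFFIXES):
            # Looks installable: a download candidate, not a page to spider.
            result["candidate"].append(href)
        elif rel in ("homepage", "download"):
            # The only pages a tool would fetch to look for more links.
            result["crawl"].append(href)
        else:
            # E.g. links scraped from the description: neither fetched nor installed.
            result["ignored"].append(href)
    return result
```

Under these assumptions, a description link that happens to end in an archive suffix still becomes a candidate, which is the behaviour described in the reply above, while mailto and "file:///" links are dropped before anything is fetched.
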
_______________________________________________ Catalog-SIG mailing list [email protected] http://mail.python.org/mailman/listinfo/catalog-sig
