Hi there,

On Wed, Feb 8, 2012 at 1:21 AM, <[email protected]> wrote:
[snip]
> Why not? People certainly use it as an archive - and the same people
> suggesting that PyPI should host all files also insist that old releases
> should never be deleted from PyPI (making it an archive).

There are various meanings of the word "archive" involved.

One way to see an archive is as a repository of out-of-date historical information. Its only interest is historical. Broken links are acceptable in that.

Another way to see an archive is as a repository of information that is current. While some bits (releases) were added to the archive long ago, they still see active use every day.

(There's a grey area. Python 1.5.2 is historical to most of us, but is undoubtedly still in active use somewhere. If it were to disappear from python.org, the people complaining about it would have a legitimate complaint: I think removing it would hurt its users unnecessarily, for little gain to the python.org maintainers. But for most of us the 1.5.2 download page is of historical value only.)

PyPI is both. It is an archive of historical information, which is why links in its metadata and documentation should be allowed to remain even when the outside world has changed and they are broken. It is also a repository of current information, where we want the metadata and in particular the *releases* to remain available.

If an archive contained no links to the outside world at all, it would automatically be both historical and current (disregarding active action to modify the archive). But PyPI does contain links, and in particular links to releases.

The thing that brings tension between the two uses of PyPI is that releases are, in the "repository of current information" sense, more like metadata than like links. PyPI retains old metadata, but *links* to releases can break.

So if a release is uploaded to PyPI, the release will remain (unless active action is taken), and this permanence is under the control of PyPI, just like that of metadata. If a *link* indicating a release is uploaded to PyPI, PyPI can only maintain it in the "historical archive" sense; the link might be up to date or might be outdated, and PyPI can neither fix it nor control it.

If PyPI *only* contained links to releases and didn't contain releases itself, we would either not have the automatic download tools we have now, or we'd have cache or repository technology to reduce the points of failure and make sure that releases *can* be reliably accessed.

If PyPI *only* contained releases and no links to releases, we'd not be having this discussion (we'd only have the discussion about people actively removing old releases).

But PyPI does both, and that's what is creating complexity.

I think the best way out would be for caching technology for active releases to come into wide use (if the license information in the PyPI metadata allows such caching). This is a technical solution that can be worked on independently.

Regards,

Martijn
_______________________________________________
Catalog-SIG mailing list
[email protected]
http://mail.python.org/mailman/listinfo/catalog-sig
