Greetings Lennart,

On 12/24/2009 10:27 PM, Lennart Regebro wrote:
On Fri, Dec 25, 2009 at 05:39, Sridhar Ratnakumar
<[email protected]>  wrote:
Is it because of this benefit to package authors that we are withholding the
implementation of a simple archive that would: 1) simplify the tools so they
need not rely on ad-hoc web scraping

There are better ways to do that.

May I ask, what would they be?

2) reduce the downtime for users by rsync/ftp mirroring

This is true, but the idea to upload them by robots is preferable in
my opinion. Again it's a difference between trying to force other
people to behave to your expectations vs trying to make it easier for
others to behave to your expectations.

3) have package sources mirrored so project owners do not have to
worry about downtime of their servers.

That's *their* problem. If they don't want to upload, then they don't
want to upload.

The original proposal retains the existing behavior for already registered/uploaded package releases (such as Twisted), so existing systems will continue to work, and applies the suggested upload rules only to new requests (creation/register). The goal is to gradually improve the quality of PyPI, as with other packaging systems, by encouraging authors to generate a reasonably good sdist (setup.py + PKG-INFO) and upload it .. and consequently to enable the move towards a static archive that can easily be mirrored. Given that, I fail to see what good is achieved by retaining the status quo.

If I want to use a web service, I obviously have to adhere to their rules and policies. Nobody is forcing me to do so.

I assume in good faith that package authors will be happy to adapt to the new system .. for the benefit of everyone. I will be happy to be proven wrong. (Speculations are useless; how about we actually ask the package authors themselves?)

4) enable proliferation of third-party tools like CPAN?

That won't help.

Why not? Do you conceive of any reason apart from CPAN-like archives that would help the proliferation of mirror sites and third-party sites? I ask because I personally went through significant hurdles to set up a daily PyPI mirror-like area. I just don't see how someone merely interested in writing a third-party service, or setting up a mirror of PyPI, would push through similar hurdles instead of giving up. Because I went through these hurdles, I was able to appreciate CPAN's design while reading about it [cpan.org/misc/ZCAN.html].

Nope, it matters not whether the metadata can be retrieved via a simple HTTP
GET or via XML-RPC.

OK. Then you have two proposals: 1. Require uploading, which is a bad
idea and 2. Making it easier to mirror the metadata, which seems
reasonable, assuming it's currently hard. :)

Here's one idea (example only):

$ tar zxf foo-0.1.tar.gz
$ cp foo-0.1/PKG-INFO foo-0.1.tar.gz.PKG-INFO

Metadata is definitely needed. Otherwise, I'd have to extract the tarball of
each and every release of a particular package just to find its
version number (it is unreliable to parse the filename to get the version
number).

Yes, but it's not particularly unreliable to compare the filename to
see if it has been handled before. You don't even need to parse the
version number for most services that work on the tarballs.

It is indeed unreliable to rely on filenames to get package versions (unless that sdist is generated by the `setup.py sdist` command). As I've mentioned elsewhere, some packages have weird filenames (e.g. "latest.zip", "foo.py"); some others have a '.dev' suffix in the filename while setup.py:version (hence PKG-INFO) will not have the '.dev' suffix. And there are several other issues that I cannot recall right now.
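This is why reading the version from PKG-INFO beats guessing it from the filename: a file named "latest.zip" tells you nothing, but its PKG-INFO does. PKG-INFO uses RFC 822-style headers, so the stdlib email parser reads it directly (a sketch; the function name is illustrative):

```python
from email.parser import Parser

def version_from_pkg_info(pkg_info_text):
    """Read the Version field from a PKG-INFO file's contents.
    PKG-INFO is a set of RFC 822 headers (Metadata-Version, Name,
    Version, ...), so the stdlib email parser handles it as-is."""
    headers = Parser().parsestr(pkg_info_text)
    return headers["Version"]
```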

I am not speculating, as I have actually experimented with the PyPI index: mirroring it, handling package metadata, and building the packages.

As for the sdists, the following tools would need them: testing services,
quality ratings, third-party package managers (enstaller, PyPM) .. not to
mention the various mirror sites.

Yes, but since they have the source package, and will have to unpack
it and build the packages anyway, they also have the metadata.

It is not that simple. The PyPM backend, for instance, is not monolithic in the sense of doing only a sequential build of packages. It first loads the dependency graph (for which metadata - PKG-INFO/requires.txt - is required) from our internal mirror over the network. It would be expensive to extract each and every tarball .. from each build machine. After loading the dependency graph and comparing it with the existing repository, new builds happen every day.
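To make the point concrete, here is a toy sketch of why the metadata alone suffices for this step: given only the per-package requires lists, a build backend can compute an order in which dependencies build first, without touching a single tarball. (This is not PyPM's actual implementation, just an illustration.)

```python
def build_order(requires):
    """Topologically sort packages given a mapping
    {package: [dependencies]} taken from per-package metadata
    (PKG-INFO / requires.txt), so that every package is built
    after all of its dependencies."""
    order, seen = [], set()

    def visit(pkg):
        if pkg in seen:
            return
        seen.add(pkg)
        for dep in requires.get(pkg, []):
            visit(dep)  # depth-first: dependencies first
        order.append(pkg)

    for pkg in requires:
        visit(pkg)
    return order
```

With sidecar metadata files on a mirror, this graph can be computed from a handful of small HTTP fetches instead of downloading and unpacking every sdist on every build machine.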

Certain packages even lack metadata (eg: no PKG-INFO in Twisted's sdist) in their source distributions .. which is another issue altogether.

Further, I can imagine search.cpan.org (which is not hosted by cpan.org folks) using only the metadata without touching the source distributions.

-srid
_______________________________________________
Distutils-SIG maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/distutils-sig
