Greetings Lennart,

On 12/24/2009 10:27 PM, Lennart Regebro wrote:
On Fri, Dec 25, 2009 at 05:39, Sridhar Ratnakumar
<[email protected]>  wrote:
Is it because of this benefit to package authors that we are withholding the
implementation of a simple archive that would: 1) simplify the tools so they
need not rely on ad-hoc web scraping

There are better ways to do that.

May I ask, what would they be?

2) reduce the downtime for users by rsync/ftp mirroring

This is true, but the idea to upload them by robots is preferable in
my opinion. Again it's a difference between trying to force other
people to behave to your expectations vs trying to make it easier for
others to behave to your expectations.

3) have package sources mirrored so project owners do not have to
worry about downtime of their servers.

That's *their* problem. If they don't want to upload, then they don't
want to upload.

The original proposal retains the existing behavior for already registered/uploaded package releases (such as Twisted), so existing systems will continue to work, and applies the suggested upload rules only to new requests (creation/register). The goal is to gradually improve the quality of PyPI, as with other packaging systems, by encouraging authors to generate a reasonably good sdist (setup.py + PKG-INFO) and upload it .. and consequently to enable the move towards a static archive that can easily be mirrored. Given that, I fail to see what good is achieved by retaining the status quo.

If I want to use a web service, I obviously have to adhere to their rules and policies. Nobody is forcing me to do so.

I assume in good faith that package authors will be happy to adapt to the new system .. for the benefit of everyone. I will be happy to be proven wrong. (Speculations are useless; how about we actually ask the package authors themselves?)

4) enable proliferation of third-party tools like CPAN?

That won't help.

Why not? Do you conceive of any reason apart from CPAN-like archives that would help the proliferation of mirror sites and third-party sites? I ask because I personally went through significant hurdles to set up a daily PyPI mirror-like area. I just don't see how someone merely interested in writing a third-party service, or setting up a mirror of PyPI, would push through similar hurdles instead of giving up. Because I went through these hurdles, I was able to appreciate CPAN's design while reading about it [cpan.org/misc/ZCAN.html].

Nope, it matters not whether the metadata can be retrieved via a simple HTTP
GET or via XML-RPC.

OK. Then you have two proposals: 1. Require uploading, which is a bad
idea and 2. Making it easier to mirror the metadata, which seems
reasonable, assuming it's currently hard. :)

Here's one idea (example only):

$ tar zxf foo-0.1.tar.gz
$ cp foo-0.1/PKG-INFO foo-0.1.tar.gz.PKG-INFO

Metadata is definitely needed. Otherwise, I'd have to extract the tarball of
each and every release of a particular package just to find its
version number (it is unreliable to parse the filename to get the version
number).

Yes, but it's not particularly unreliable to compare the filename to
see if it has been handled before. You don't even need to parse the
version number for most services that work on the tarballs.

It is indeed unreliable to rely on filenames to get package versions (unless that sdist is generated by the `setup.py sdist` command). As I've mentioned elsewhere, some packages have weird filenames (e.g. "latest.zip", "foo.py"); some others have a '.dev' suffix in the filename while setup.py:version (hence PKG-INFO) will not have the '.dev' suffix. And there are several other issues that I cannot recall right now.
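This is why reading the version from PKG-INFO beats guessing it from the filename: a file named "latest.zip" tells you nothing, but its PKG-INFO does. PKG-INFO uses RFC 822-style headers, so the stdlib email parser reads it directly (a sketch; the function name is illustrative):

```python
from email.parser import Parser

def version_from_pkg_info(pkg_info_text):
    """Read the Version field from a PKG-INFO file's contents.
    PKG-INFO is a set of RFC 822 headers (Metadata-Version, Name,
    Version, ...), so the stdlib email parser handles it as-is."""
    headers = Parser().parsestr(pkg_info_text)
    return headers["Version"]
```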

I am not speculating, as I have actually experimented with the PyPI index: mirroring it, handling package metadata, and building the packages.

As for the sdists, the following tools would need them: testing services,
quality ratings, third-party package managers (enstaller, PyPM) .. not to
mention the various mirror sites.

Yes, but since they have the source package, and will have to unpack
it and build the packages anyway, they also have the metadata.

It is not that simple. The PyPM backend, for instance, is not monolithic in the sense of doing only a sequential build of packages. It first loads the dependency graph (for which metadata - PKG-INFO/requires.txt - is required) from our internal mirror over the network. It would be expensive to extract each and every tarball .. from each build machine. After loading the dependency graph and comparing it with the existing repository, new builds happen every day.
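To make the point concrete, here is a toy sketch of why the metadata alone suffices for this step: given only the per-package requires lists, a build backend can compute an order in which dependencies build first, without touching a single tarball. (This is not PyPM's actual implementation, just an illustration.)

```python
def build_order(requires):
    """Topologically sort packages given a mapping
    {package: [dependencies]} taken from per-package metadata
    (PKG-INFO / requires.txt), so that every package is built
    after all of its dependencies."""
    order, seen = [], set()

    def visit(pkg):
        if pkg in seen:
            return
        seen.add(pkg)
        for dep in requires.get(pkg, []):
            visit(dep)  # depth-first: dependencies first
        order.append(pkg)

    for pkg in requires:
        visit(pkg)
    return order
```

With sidecar metadata files on a mirror, this graph can be computed from a handful of small HTTP fetches instead of downloading and unpacking every sdist on every build machine.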

Certain packages even lack metadata (eg: no PKG-INFO in Twisted's sdist) in their source distributions .. which is another issue altogether.

Further, I can imagine search.cpan.org (which is not hosted by cpan.org folks) using only the metadata without touching the source distributions.

-srid
_______________________________________________
Distutils-SIG maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/distutils-sig
