The problem I have with the several proposals so far is that they seem
to assume things that may or may not be supported by CPAN's actual
metadata and file structure.

1) Distributions can't be uniquely identified without an author name.
For example:

  cpan://dist/Foo-Bar/1.23

There is no reliable way to identify where Foo-Bar-1.23 is to be
found.  There is no reason, as far as I know, why two authors can't
each have a Foo-Bar-1.23:

  authors/id/D/DA/DAGOLDEN/Foo-Bar-1.23.tar.gz
  authors/id/R/RJ/RJBS/Foo-Bar-1.23.tar.gz

They don't even need to contain the same modules (*.pm files) or
packages (package statement within a .pm file).  Both versions of
Foo-Bar-1.23 could appear in the 02packages file.

2) Dists may or may not even contain modules -- they could just
contain scripts.

3) Version numbers follow no simple standard, given what's out there
in the wild.  Consider for example:

  Sendmail_M4.0.26a.tar.gz
  Term-Gnuplot-0.90_38b_00.tar.gz
  Data-Dump-Streamer-2.08-40.tar.gz

I'm not even sure if CPAN::DistnameInfo really handles all the odd
cases well, but it's probably pretty close to a standard for what can
be done.
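
To make the ambiguity concrete, here is a minimal Python sketch of a
naive name/version splitter.  It is only an illustration: the rule it
uses (strip a known archive suffix, then split at the first "-" or "."
that is followed by a digit) is my own assumption and is not the
algorithm CPAN::DistnameInfo implements, and split_dist is a
hypothetical helper name.

```python
import re

# Archive suffixes this sketch knows about (an assumed, incomplete list).
SUFFIX = re.compile(r"\.(tar\.gz|tar\.bz2|tgz|zip)$")


def split_dist(filename):
    """Split a tarball name into (distname, version) with a naive rule.

    Hypothetical heuristic: the version starts at the first "-" or "."
    that is immediately followed by a digit.
    """
    stem = SUFFIX.sub("", filename)
    match = re.match(r"^(.*?)[-.](\d.*)$", stem)
    if match is None:
        return stem, None  # no recognizable version at all
    return match.group(1), match.group(2)


for name in (
    "Sendmail_M4.0.26a.tar.gz",
    "Term-Gnuplot-0.90_38b_00.tar.gz",
    "Data-Dump-Streamer-2.08-40.tar.gz",
):
    print(name, "->", split_dist(name))
```

Note that this rule decides Sendmail_M4.0.26a is "Sendmail_M4" at
version "0.26a"; a different but equally plausible rule would decide
it is "Sendmail_M" at version "4.0.26a".  That is exactly the problem.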

4) A package at an arbitrary version can't be mapped to a
distribution unless that version appears in 02packages.details.txt,
which lists only the latest non-developer version.  If the latest
Foo::Bar is 1.23, there's no way to tell what distribution tarball
contains Foo::Bar 1.22.
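
A rough Python sketch of that lookup, to show why older versions are
out of reach.  The index text below is made up for illustration (a
shortened header, a blank line, then "package version path" rows), and
parse_index and dist_for are hypothetical helper names, not part of
any existing module.

```python
# Made-up excerpt in the general shape of the 02packages index:
# header lines, a blank line, then one "package version path" row each.
SAMPLE_INDEX = """\
File:         02packages.details.txt
Description:  Package names found in directory $CPAN/authors/id/
Columns:      package name, version, path

Foo::Bar                 1.23  D/DA/DAGOLDEN/Foo-Bar-1.23.tar.gz
Foo::Bar::Util           undef D/DA/DAGOLDEN/Foo-Bar-1.23.tar.gz
"""


def parse_index(text):
    """Return {package: (version, path)} from index text."""
    _header, _, body = text.partition("\n\n")
    index = {}
    for line in body.splitlines():
        fields = line.split()
        if len(fields) == 3:
            package, version, path = fields
            index[package] = (version, path)
    return index


def dist_for(index, package, version=None):
    """Map a package (optionally at a given version) to a tarball path."""
    entry = index.get(package)
    if entry is None:
        return None
    indexed_version, path = entry
    if version is not None and version != indexed_version:
        return None  # only the indexed version can be resolved
    return path


index = parse_index(SAMPLE_INDEX)
print(dist_for(index, "Foo::Bar"))          # the indexed 1.23 tarball
print(dist_for(index, "Foo::Bar", "1.22"))  # None: 1.22 is not indexed
```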

My suggestion is to keep a cpan URI focused on just the two things
that can be done fairly reliably:

A) package names

B1) author/distname-version
B2) author/distname/version

Item (A) can be mapped to (B) via the 02packages file.  Item (B1)
corresponds to a specific tarball, sans its archive suffix.  Item (B2)
would be an alternative that offers a best attempt at parsing the
version, or without a version, references the highest version (to the
extent it can be determined) that isn't a developer version.

Moreover, for (B1) to have any useful external meaning, it really
needs to have the archive suffix attached.  For (B2), without a
version, it would take substantial heuristics to translate the URI
into a "latest version" and to find the right archive suffix to use.
That makes me skeptical of the utility of defining it as a URI.

Certainly, the implementation of the (B2) heuristics would need to be
included in a URI::cpan module, or else an "author/distname/" URI
might be interpreted differently in different places, which seems to
defeat the purpose of having a URI in the first place.  And since that
would require URI::cpan to have access to CPAN mirror indices, that's
pretty heavyweight.  I would suggest skipping (B2).

So what does that leave us with?  Much like the original, but without
package versions and with dist and author merged:

  (A) cpan://package/Foo::Bar

  (B) cpan://id/AUTHOR/Foo-Bar-1.23.tar.gz

Thus, a CPAN URI would be either a "package" that maps uniquely to a
CPAN path via 02packages.details.txt, or else an "id" that -- with
suitable A/AU/AUTHOR expansion -- reflects an actual path on a CPAN
mirror.
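
The "id" expansion is mechanical, which is part of its appeal.  A
minimal Python sketch, assuming the proposed cpan://id/... form is
adopted as written above (expand_id is a hypothetical name, and the
"package" form is left out because it needs an index lookup):

```python
from urllib.parse import urlsplit


def expand_id(uri):
    """Expand cpan://id/AUTHOR/file into a mirror-relative path."""
    parts = urlsplit(uri)
    if parts.scheme != "cpan" or parts.netloc != "id":
        raise ValueError("not a cpan://id/ URI: " + uri)
    author, _, filename = parts.path.lstrip("/").partition("/")
    if len(author) < 2 or not filename:
        raise ValueError("expected AUTHOR/filename in: " + uri)
    # A/AU/AUTHOR expansion: first letter, first two letters, full id.
    return "authors/id/{}/{}/{}/{}".format(
        author[0], author[:2], author, filename
    )


print(expand_id("cpan://id/DAGOLDEN/Foo-Bar-1.23.tar.gz"))
# authors/id/D/DA/DAGOLDEN/Foo-Bar-1.23.tar.gz
```

No mirror index is needed for this form; the URI alone determines the
path.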

Not surprisingly, these are pretty much exactly the forms that CPAN.pm
will accept as arguments to commands, and I find that symmetry
appealing as well.

Regards,
David
