On 20 April 2012 19:46, Corentin Chary <corentin.ch...@gmail.com> wrote:
> On Fri, Apr 20, 2012 at 9:37 AM, Kent Fredric <kentfred...@gmail.com> wrote:
>> On 20 April 2012 03:31, Corentin Chary <corentin.ch...@gmail.com> wrote:
>>> Add rubygems, github, gitorious, pecl, pear, bitbucket.
>>> All of them are handled by my remoteids.py script.
>>>
>>> ref: https://bugs.gentoo.org/show_bug.cgi?id=406287
>>> ref: https://github.com/iksaif/portage-janitor/blob/master/remoteids.py
>>>
>>> --- a/metadata/dtd/metadata.dtd 2010-03-02 18:52:11.000000000 +0100
>>> +++ b/metadata/dtd/metadata.dtd 2012-04-19 14:22:14.077954310 +0200
>>> @@ -61,7 +61,7 @@
>>>     <!ELEMENT bugs-to (#PCDATA)>
>>>     <!-- specify a type of package identification tracker -->
>>>     <!ELEMENT remote-id (#PCDATA)>
>>> -      <!ATTLIST remote-id type 
>>> (freshmeat|sourceforge|sourceforge-jp|cpan|vim|google-code|ctan|pypi|rubyforge|cran)
>>>  #REQUIRED>
>>> +      <!ATTLIST remote-id type 
>>> (freshmeat|sourceforge|sourceforge-jp|cpan|vim|google-code|ctan|pypi|rubyforge|cran|rubygems|github|gitorious|pecl|pear|bitbucket)
>>>  #REQUIRED>
>>>
>>>   <!-- category/package information for cross-linking in descriptions
>>>     and useflag descriptions -->
>>>
>>> --
>>> Corentin Chary
>>> http://xf.iksaif.net/
>>
>>
>> I suggested last week on #gentoo-perl that it might be nice to have
>> 'cpan' and 'cpan-module'  ( or something like that ) to disambiguate 2
>> queryable terms. ( where 'cpan'  => 'the package name on cpan' )
>>
>> For some purposes, its most convenient to use the distribution name,
>> and for other purposes, (ie: cpan clients) its more convenient to use
>> a Module name, and its not easy to translate between the two, as
>> Module names sometimes switch between packages  they're shipped in.
>>
>> For instance, a while ago, the BioPerl module was shipped in a
>> distribution 'bioperl' , which has only recently been changed to
>> BioPerl
>>
>>
>> http://api.metacpan.org/release/_search?q=distribution:bioperl&fields=archive,author,date,download_url
>>
>> http://api.metacpan.org/release/_search?q=distribution:BioPerl&fields=archive,author,date,download_url
>>
>> vs
>>
>>
>> http://api.metacpan.org/module/_search?q=module.name:Bio\:\:Perl&fields=distribution,author,release
>
> Looks sane since the goal of remote-id is being able to identify the
> package upstream.
> Do you think you could patch remotesid.py to generate tags for cpan /
> cpan-modules ? Or at least give me a pseudo-algo that does the trick.
> Thanks :)
>
> --
> Corentin Chary
> http://xf.iksaif.net
>


That is sadly not straightforward.  Extracting the package name is
easy if you have the URL, because the package name is literally the
same as the archive name in SRC_URI, sans version information.
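
A rough sketch of that step in Python, in case it helps. The function
name, the suffix list and the version pattern are my own assumptions,
not anything taken from remoteids.py:

```python
import re
from urllib.parse import urlparse

# Assumed archive suffixes; extend as needed.
_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip")

# Assumption: the version is the last hyphen-separated field and starts
# with an optional "v" followed by a digit, e.g. "-1.0" or "-v1.2.3".
_VERSION_RE = re.compile(r"-v?\d[\w.]*$")


def dist_name_from_uri(uri):
    """Return the distribution name for one SRC_URI entry, or None."""
    filename = urlparse(uri).path.rsplit("/", 1)[-1]
    for suffix in _SUFFIXES:
        if filename.endswith(suffix):
            stem = filename[: -len(suffix)]
            break
    else:
        return None  # not an archive type we recognise
    name, count = _VERSION_RE.subn("", stem)
    # Refuse to guess when no trailing version could be stripped.
    return name if count else None
```

It returns None rather than guessing whenever the file name does not
end in something that looks like a version.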

However, if you look at many perl ebuilds, you'll notice many don't
set SRC_URI themselves and we've got other things in place, so the
current parsing technique you use to detect uses of SRC_URI won't work
there ( I could be wrong, I don't fully grok your python code ).

Moreover, determining the value of 'cpan-module' may be impossible
without access to the tar.gz itself, or querying the MetaCPAN API.
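
For the API route, something like the following would build the same
kind of query URL as the module search link quoted above. The function
name is made up, and the endpoint, escaping and field list are simply
copied from that quoted URL; I have not checked them against any API
documentation:

```python
from urllib.parse import urlencode

# Endpoint as it appears in the URLs quoted earlier in this thread.
API_BASE = "http://api.metacpan.org"


def module_search_url(module_name):
    """Build a module -> distribution lookup URL for one module name."""
    # Colons are backslash-escaped inside the query, as in the quoted URL.
    escaped = module_name.replace(":", r"\:")
    query = urlencode({
        "q": "module.name:" + escaped,
        "fields": "distribution,author,release",
    })
    return API_BASE + "/module/_search?" + query
```

This only constructs the URL. Fetching it and picking the
distribution out of the response is left out, since the response
layout is not shown anywhere in this thread.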

Usually, upstream are sensible and have package names which closely
correspond with the module names, ie: "Dist::Zilla" is shipped in
'Dist-Zilla-$VERSION.tar.gz',  but there are many packages which don't
do this, such as this notable example:
https://metacpan.org/release/Scalar-List-Utils  , which has no modules
corresponding to the package name, and no way to divine the/a 'main'
module from the package itself. This is exacerbated by packages
changing names, package joins ( 2 packages becoming 1 by releasing
their modules together ), and package splits ( 1 package breaking into
2 sets of modules ).
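
To make the failure concrete, here is the obvious guess ( swap '-' for
'::' ) as a small Python sketch. The function names are hypothetical,
and the module sets passed in are something the caller has to supply,
e.g. by reading the tarball:

```python
def guess_main_module(dist_name):
    """Naive guess: 'Dist-Zilla' -> 'Dist::Zilla'."""
    return dist_name.replace("-", "::")


def has_main_module(dist_name, shipped_modules):
    """True if the naive guess names a module the package really ships."""
    return guess_main_module(dist_name) in shipped_modules


# Hand-written, partial module lists, for illustration only.
print(has_main_module("Dist-Zilla", {"Dist::Zilla"}))
print(has_main_module("Scalar-List-Utils", {"List::Util", "Scalar::Util"}))
```

The first check succeeds and the second fails, because there is no
Scalar::List::Utils module for the guess to land on.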

Essentially, using a cpan-module as an identifier is somewhat
"forwards only" , and even then, what it will resolve to is governed
by time.

This is fine for CPAN clients, which do the resolution hot, using the
whole of CPAN as their data: if a user asks for "Foo::Bar", their cpan
client will ask a cpan server ( or a regularly (hourly) updated list )
which package that module can be found in ( and this only returns
the most recent package, so name changes and so forth are invisible to
the user ).
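
A toy model of that "governed by time" behaviour, in Python. The
release history below is entirely invented ( Foo-Bar and Foo-Baz are
not real packages ) and real clients work from a full index, not a
three-line list:

```python
from datetime import date

# Hypothetical release history: (release date, package, modules shipped).
RELEASES = [
    (date(2010, 1, 1), "Foo-Bar", {"Foo::Bar", "Foo::Baz"}),
    (date(2011, 6, 1), "Foo-Baz", {"Foo::Baz"}),  # package split
]


def resolve(module, as_of, releases=RELEASES):
    """Return the newest package shipping `module` on or before `as_of`."""
    candidates = [
        (when, dist)
        for when, dist, modules in releases
        if when <= as_of and module in modules
    ]
    if not candidates:
        return None
    # The most recent release wins, which is all a client ever sees.
    return max(candidates)[1]
```

With this data, asking for Foo::Baz at the end of 2010 gives Foo-Bar,
while asking on the date of this mail gives Foo-Baz, and nothing in
the answer tells you the module ever lived anywhere else.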

And being helpful to CPAN clients is one of the reasons we want this
value as a specifiable option in the first place. For us, it's easier
to track the package name, and then when that has to change we can
manually resolve the issue.

-- 
Kent

perl -e  "print substr( \"edrgmaM  SPA NOcomil.ic\\@tfrken\", \$_ * 3,
3 ) for ( 9,8,0,7,1,6,5,4,3,2 );"

http://kent-fredric.fox.geek.nz
