Re: CPANTS: has_signature, has_pod_index

2005-11-07 Thread Ivan Tubert-Brohman

Adam Kennedy wrote:

>>>> * has_pod_index: The POD contains at least one X<> keyword that helps
>>>> POD indexers. Whether only one is useful is open for debate, because
>>>> at least the license (X), your CPAN ID under authors (x),
>>>> and some generic keyword for what your module (X) is about can
>>>> probably be added even for the most minimal module.
>>>
>>> Can you give an example of how this has any practical impact on
>>> anything?
>>
>> Here is the main page for the project.
>> http://pod-indexing.annocpan.org/wiki/index.cgi
>>
>> They talk only about the Perl core doc at this point, probably because
>> adding keywords there is already enough work. AFAIK the core docs are
>> now covered, so individual modules would be next.
>>
>> Yep, a google-like search engine could save the effort of manually
>> tagging with keywords, but I think this idea is more practical and
>> will improve perldoc greatly.
>
> I hate to say it, but this indexing thing has seemed to be ass-backwards
> to me from the beginning.
>
> Instead of having one person combine a Pod Parser and Plucene indexer or
> some other simple process, they expect the 3500 authors to ADD extra
> content to all their POD?


Well, indexing all of CPAN was never in my original goals. My goal is to 
make the core documentation more usable, and I haven't seen any 
automated search engine that does that.


For example, let's say you want to find the definition of "scalar". 
Sure, you can use grep and find that there are 77 documents where 
"scalar" appears a total of 738 times. But which is the good one? (And 
which section of the document?) You can try to come up with some clever 
ranking algorithm, but it is not trivial (and it's not so easy to define 
things like PageRank[tm] in this case). I'd rather have a human indexer 
label the place, or just a handful of places, that have the most 
relevant information for that keyword.
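
To make the grep comparison concrete, here is a minimal sketch in Python of what a plain text search can report. The function name and the sample documents are made up for illustration, and the matching rule (whole word, case-insensitive) is an assumption.

```python
import re

def keyword_stats(documents, word):
    """Count how many documents mention `word` and how often in total.

    `documents` maps a document name to its text. Matching is
    whole-word and case-insensitive (an assumption of this sketch).
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    counts = {name: len(pattern.findall(text)) for name, text in documents.items()}
    hits = {name: n for name, n in counts.items() if n > 0}
    return len(hits), sum(hits.values())

# Hypothetical stand-ins for documentation files.
sample_docs = {
    "a.pod": "A scalar is a single value. Every scalar has a context.",
    "b.pod": "Lists and hashes only.",
    "c.pod": "Scalar context is described elsewhere.",
}

doc_count, total = keyword_stats(sample_docs, "scalar")
```

The result is a pair of counts and nothing more: it does not say which of the matching documents, or which section, holds the definition. A hand-placed index entry would instead point at one chosen location.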


Cheers,
Ivan



Re: CPANTS: has_signature, has_pod_index

2005-11-07 Thread Adam Kennedy



>>> * has_pod_index: The POD contains at least one X<> keyword that helps
>>> POD indexers. Whether only one is useful is open for debate, because
>>> at least the license (X), your CPAN ID under authors (x),
>>> and some generic keyword for what your module (X) is about can
>>> probably be added even for the most minimal module.
>>
>> Can you give an example of how this has any practical impact on
>> anything?
>
> Here is the main page for the project.
>
> http://pod-indexing.annocpan.org/wiki/index.cgi
>
> They talk only about the Perl core doc at this point, probably because
> adding keywords there is already enough work. AFAIK the core docs are now
> covered, so individual modules would be next.
>
> Yep, a google-like search engine could save the effort of manually tagging
> with keywords, but I think this idea is more practical and will improve
> perldoc greatly.


I hate to say it, but this indexing thing has seemed to be ass-backwards 
to me from the beginning.


Instead of having one person combine a Pod Parser and Plucene indexer or 
some other simple process, they expect the 3500 authors to ADD extra 
content to all their POD?


It seems like an absolutely terrible case of CYJ... making the life of 
the search engine writer easier by making everyone else change.


Having the coverage kwalitee bit was bad enough, but supporting a 
project like this seems far far worse, as I'm not sure how this is 
supposed to be any better than a natural text search of CPAN would be.


In fact, it occurs to me I've just uploaded CPAN::Mini::Extract, and if 
you tied that to Plucene you could probably _have_ an indexer for such a 
Google-esque search up and running in a day or so.


The signature one I don't mind as much, signatures are at least 
supported in most places and make some kind of sense at some level :)


Adam K


Re: CPANTS: has_signature, has_pod_index

2005-11-06 Thread Tels

Moin Ivan,

On Sunday 06 November 2005 17:39, Ivan Tubert-Brohman wrote:
> Tels wrote:
> >>>* has_pod_index: The POD contains at least one X<> keyword that
> >>> helps POD indexers. Whether only one is useful is open for debate,
> >>> because at least the license (X), your CPAN ID under authors
> >>> (x), and some generic keyword for what your module (X) is
> >>> about can probably be added even for the most minimal module.
> >>
> >>Can you give an example of how this has any practical impact on
> >>anything?
> >
> > Here is the main page for the project.
> >
> > http://pod-indexing.annocpan.org/wiki/index.cgi
> >
> > They talk only about the Perl core doc at this point, probably
> > because adding keywords there is already enough work. AFAIK the core
> > docs are now covered, so individual modules would be next.
>
> We are not done with the core docs yet; the list of documents that are
> done is listed at
> http://pod-indexing.annocpan.org/wiki/index.cgi?IndexStats . The next
> stage in my plan would be to index the modules that come with the core
> distribution. Indexing CPAN modules is up to each individual author and
> I haven't really thought much about it yet.

Understood.

> Much as I love the POD indexing project, I'm reluctant to see this
> added as a kwalitee point. First, because there are already enough
> complaints that CPANTS is trying to "force" authors to do things in one
> specific way needlessly; and second, because it would be too early
> anyway, as pod indexing still needs to be tested in practice.

Fair enough.

> Getting off topic: I still have to figure out how a perldoc -k would
> handle indexing of CPAN modules. The problem is that having too many
> things indexed could be counterproductive. For example, doing "perldoc
> -k pop" will give you the pop function (
> http://pod-indexing.annocpan.org/perldoc-k.cgi?keyword=pop ), but what
> would happen if you index all of CPAN and there are dozens of modules
> that implement a "pop" method? I'm thinking that the best solution
> would be to have the option of doing a "core search" vs a "global
> search"...

I thought about this, too, and I think that the search result lists will 
ultimately be big - after all, there will be a lot of things having 
the same keyword. 

So, reducing the set of returned "hits" must be done. Adding too many 
keywords is not a good idea, but then, we have no experience of what is 
"too much" and "too little".

OTOH, I do think that adding a keyword with the name for each function is 
not a good idea, namely because you would get hundreds of hits for "new".

Hm, maybe like so for methods:

X
X

and for non OO:

X
X

Then you could search for "method and new" (I think having the ability to 
search for more than one keyword is absolutely necessary so that the 
results do not overwhelm the user :).
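
A search for more than one keyword could be sketched roughly like this in Python. The index layout (a location mapped to the set of keywords attached to it), the function name and all entries are hypothetical; nothing here reflects how an actual POD index tool stores its data.

```python
def search_all(index, *keywords):
    """Return the locations tagged with every one of the given keywords.

    `index` maps a location string to a set of keywords. The comparison
    is case-insensitive (an assumption of this sketch).
    """
    wanted = {k.lower() for k in keywords}
    return sorted(
        location
        for location, tags in index.items()
        if wanted <= {t.lower() for t in tags}
    )

# Hypothetical index entries.
example_index = {
    "Foo::Bar/new": {"method", "new"},
    "Baz/new": {"function", "new"},
    "Foo::Bar/pop": {"method", "pop"},
}

only_methods_named_new = search_all(example_index, "method", "new")
everything_named_new = search_all(example_index, "new")
```

Searching for "new" alone returns both entries, while requiring "method" as well narrows the result to one, which is the reduction in hits argued for above.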

Should this discussion be continued on another mailing list?

Best wishes,

Tels

-- 
 Signed on Sun Nov  6 17:50:48 2005 with key 0x93B84C15.
 Visit my photo gallery at http://bloodgate.com/photos/
 PGP key on http://bloodgate.com/tels.asc or per email.

 "My glasses, my glasses. I cannot see without my glasses." - "My
 glasses, my glasses. I cannot be seen without my glasses."



Re: CPANTS: has_signature, has_pod_index

2005-11-06 Thread Ivan Tubert-Brohman

Tels wrote:

>>> * has_pod_index: The POD contains at least one X<> keyword that helps
>>> POD indexers. Whether only one is useful is open for debate, because
>>> at least the license (X), your CPAN ID under authors (x),
>>> and some generic keyword for what your module (X) is about can
>>> probably be added even for the most minimal module.
>>
>> Can you give an example of how this has any practical impact on
>> anything?
>
> Here is the main page for the project.
>
> http://pod-indexing.annocpan.org/wiki/index.cgi
>
> They talk only about the Perl core doc at this point, probably because
> adding keywords there is already enough work. AFAIK the core docs are now
> covered, so individual modules would be next.


We are not done with the core docs yet; the list of documents that are 
done is listed at 
http://pod-indexing.annocpan.org/wiki/index.cgi?IndexStats . The next 
stage in my plan would be to index the modules that come with the core 
distribution. Indexing CPAN modules is up to each individual author and 
I haven't really thought much about it yet.


Much as I love the POD indexing project, I'm reluctant to see this added 
as a kwalitee point. First, because there are already enough complaints 
that CPANTS is trying to "force" authors to do things in one specific 
way needlessly; and second, because it would be too early anyway, as pod 
indexing still needs to be tested in practice.


Getting off topic: I still have to figure out how a perldoc -k would 
handle indexing of CPAN modules. The problem is that having too many 
things indexed could be counterproductive. For example, doing "perldoc 
-k pop" will give you the pop function ( 
http://pod-indexing.annocpan.org/perldoc-k.cgi?keyword=pop ), but what 
would happen if you index all of CPAN and there are dozens of modules 
that implement a "pop" method? I'm thinking that the best solution would 
be to have the option of doing a "core search" vs a "global search"...
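
The "core search" versus "global search" option could look roughly like the following Python sketch. The index layout, the function name, the scope names as parameter values and all entries are assumptions made for illustration; this is not how any existing perldoc option works.

```python
def keyword_lookup(index, keyword, scope="core"):
    """Return the locations indexed under `keyword`.

    scope="core" keeps only entries flagged as core documentation;
    scope="global" returns every entry.
    """
    if scope not in ("core", "global"):
        raise ValueError("scope must be 'core' or 'global'")
    entries = index.get(keyword, [])
    return [loc for loc, is_core in entries if scope == "global" or is_core]

# Hypothetical index: keyword -> list of (location, is_core) pairs.
example_index = {
    "pop": [
        ("perlfunc/pop", True),
        ("Some::Stack/pop", False),
        ("Other::Queue/pop", False),
    ],
}

core_hits = keyword_lookup(example_index, "pop")
all_hits = keyword_lookup(example_index, "pop", scope="global")
```

Defaulting to the core scope keeps the common case short, and the global scope stays available for anyone who wants the module hits as well.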


Cheers,
Ivan



Re: CPANTS: has_signature, has_pod_index

2005-11-06 Thread Tels

Moin,

On Sunday 06 November 2005 16:10, Ricardo SIGNES wrote:
> * Tels <[EMAIL PROTECTED]> [2005-11-06T09:44:14]
>
> > * has_signature: a SIGNATURE file exists, and is a valid signature.
>
> That seems reasonable, even though I dread signing all my dists.  I
> feel like it will be a big hassle, but maybe I'm just afraid of change.
>
> > * has_pod_index: The POD contains at least one X<> keyword that helps
> > POD indexers. Whether only one is useful is open for debate, because
> > at least the license (X), your CPAN ID under authors (x),
> > and some generic keyword for what your module (X) is about can
> > probably be added even for the most minimal module.
>
> Can you give an example of how this has any practical impact on
> anything?

Here is the main page for the project. 

http://pod-indexing.annocpan.org/wiki/index.cgi

They talk only about the Perl core doc at this point, probably because 
adding keywords there is already enough work. AFAIK the core docs are now 
covered, so individual modules would be next.

Yep, a google-like search engine could save the effort of manually tagging 
with keywords, but I think this idea is more practical and will improve 
perldoc greatly.

Best wishes,

Tels

-- 
 Signed on Sun Nov  6 16:17:01 2005 with key 0x93B84C15.
 Visit my photo gallery at http://bloodgate.com/photos/
 PGP key on http://bloodgate.com/tels.asc or per email.

 "Metaphorically speaking, the Trusted Computing project has so far been
 like a grandmother who wants to invite Little Red Riding Hood into her
 cottage and explains to her that the chains, handcuffs and cameras there
 are for protection against the big bad wolf and have nothing to do with
 her Belgian business friends." -- Peter Mühlbauer, 22.02.2004, in
 http://tinyurl.com/yv6j3



Re: CPANTS: has_signature, has_pod_index

2005-11-06 Thread Ricardo SIGNES
* Tels <[EMAIL PROTECTED]> [2005-11-06T09:44:14]
> * has_signature: a SIGNATURE file exists, and is a valid signature.

That seems reasonable, even though I dread signing all my dists.  I feel
like it will be a big hassle, but maybe I'm just afraid of change.

> * has_pod_index: The POD contains at least one X<> keyword that helps POD 
> indexers. Whether only one is useful is open for debate, because at 
> least the license (X), your CPAN ID under authors (x), and 
> some generic keyword for what your module (X) is about can probably 
> be added even for the most minimal module.

Can you give an example of how this has any practical impact on
anything?

-- 
rjbs




CPANTS: has_signature, has_pod_index

2005-11-06 Thread Tels

Moin,

if these already have been proposed, please ignore me :)

I think the following kwalitee checks should be added:

* has_signature: a SIGNATURE file exists, and is a valid signature.

Technically, you should get -1 points if the signature file is 
invalid/garbled/doesn't match. However, 0 points for an invalid SIGNATURE, 
1 for none, and 2 for a valid one would work w/o negative scores.

0 for none/invalid and 1 for valid, would work, too.

* has_pod_index: The POD contains at least one X<> keyword that helps POD 
indexers. Whether only one is useful is open for debate, because at 
least the license (X), your CPAN ID under authors (x), and 
some generic keyword for what your module (X) is about can probably 
be added even for the most minimal module.
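
As a rough illustration of the two proposed checks, here is a minimal Python sketch. It is not CPANTS code: the state labels, function names and sample text are hypothetical, actually verifying a SIGNATURE file is left out, and the check assumes the POD text has already been extracted from the file.

```python
import re

# Hypothetical state labels for a distribution's SIGNATURE file.
MISSING, INVALID, VALID = "missing", "invalid", "valid"

def signature_score_three_level(state):
    """0 for an invalid SIGNATURE, 1 for none, 2 for a valid one."""
    return {INVALID: 0, MISSING: 1, VALID: 2}[state]

def signature_score_binary(state):
    """0 for none/invalid, 1 for valid."""
    return 1 if state == VALID else 0

# Matches the POD index code X<keyword> and its doubled form X<< keyword >>.
INDEX_ENTRY = re.compile(r"X(?:<<\s+(.+?)\s+>>|<([^<>]+)>)")

def pod_index_keywords(pod_text):
    """Return the non-empty keywords of all X<> entries in the POD text."""
    keywords = []
    for match in INDEX_ENTRY.finditer(pod_text):
        keyword = (match.group(1) or match.group(2)).strip()
        if keyword:
            keywords.append(keyword)
    return keywords

def has_pod_index(pod_text):
    """True if the POD contains at least one X<> keyword."""
    return bool(pod_index_keywords(pod_text))

# Hypothetical POD fragment with two index entries.
sample_pod = (
    "=head1 NAME\n\n"
    "Foo::Bar - an example module X<example>\n\n"
    "=head1 LICENSE\n\n"
    "X<< license >>\n"
)
```

The three-level scheme distinguishes a broken signature from a missing one without negative scores; the binary scheme treats both the same.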

Best wishes,

Tels

-- 
 Signed on Sun Nov  6 15:40:11 2005 with key 0x93B84C15.
 Visit my photo gallery at http://bloodgate.com/photos/
 PGP key on http://bloodgate.com/tels.asc or per email.

 "The German censors - - - - - - - - - - - - - - - - - - - - - - - - - -
 - - - - - - - - - - - - - - - - - - - - - - - - - - - blockheads - - - -
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 - - - - - - - - - - - - - - -." Heinrich Heine
