Re: CPANTS: has_signature, has_pod_index
Adam Kennedy wrote:
>>>> * has_pod_index: The POD contains at least one X<> keyword that helps
>>>>   POD indexers. Whether only one is useful is open for debate, because
>>>>   at least the license (X), your CPAN ID under authors (x), and some
>>>>   generic keyword for what your module (X) is about can probably be
>>>>   added even for the most minimal module.
>>>
>>> Can you give an example of how this has any practical impact on
>>> anything?
>>
>> Here is the main page for the project.
>>
>> http://pod-indexing.annocpan.org/wiki/index.cgi
>>
>> They talk only about the Perl core doc at this point, probably because
>> adding keywords there is already enough work. AFAIK the core docs are
>> now covered, so individual modules would be next.
>>
>> Yep, a google-like search engine could save the effort of manually
>> tagging with keywords, but I think this idea is more practical and will
>> improve perldoc greatly.
>
> I hate to say it, but this indexing thing has seemed to be ass-backwards
> to me from the beginning. Instead of having one person combine a Pod
> Parser and Plucene indexer or some other simple process, they expect the
> 3500 authors to ADD extra content to all their POD?

Well, indexing all of CPAN was never in my original goals. My goal is to make the core documentation more usable, and I haven't seen any automated search engine that does that.

For example, let's say you want to find the definition of "scalar". Sure, you can use grep and find that there are 77 documents where "scalar" appears a total of 738 times. But which is the good one? (And which section of the document?) You can try to come up with some clever ranking algorithm, but it is not trivial (and it's not so easy to define things like PageRank[tm] in this case). I'd rather have a human indexer label the place, or just a handful of places, that have the most relevant information for that keyword.

Cheers,
Ivan
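[Editor's sketch] The contrast Ivan draws between raw grep counts and a hand-labeled index can be illustrated roughly as follows. This is only a toy under stated assumptions: the document names, their contents, and the `hand_index` structure are invented for illustration and are not taken from the POD indexing project.

```python
import re

# Toy stand-ins for documentation files; names and contents are invented.
docs = {
    "doc_a": "A scalar holds a single value. Scalar values are described here.",
    "doc_b": "scalar EXPR forces scalar context.",
    "doc_c": "The operator returns a list, not a scalar.",
}

# Hypothetical hand-made index: keyword -> list of (document, section).
hand_index = {"scalar": [("doc_a", "Scalar values")]}


def grep_counts(term, documents):
    """Count case-insensitive whole-word hits of term in each document."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = {name: len(pattern.findall(text)) for name, text in documents.items()}
    # Keep only documents that actually mention the term.
    return {name: n for name, n in counts.items() if n > 0}


counts = grep_counts("scalar", docs)
print(len(counts), "documents,", sum(counts.values()), "occurrences")
print("hand-picked:", hand_index["scalar"])
```

The counts say where the word occurs but not which place defines it; the hand-made entry names one location directly, which is the trade-off under discussion.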
Re: CPANTS: has_signature, has_pod_index
>>> * has_pod_index: The POD contains at least one X<> keyword that helps
>>>   POD indexers. Whether only one is useful is open for debate, because
>>>   at least the license (X), your CPAN ID under authors (x), and some
>>>   generic keyword for what your module (X) is about can probably be
>>>   added even for the most minimal module.
>>
>> Can you give an example of how this has any practical impact on
>> anything?
>
> Here is the main page for the project.
>
> http://pod-indexing.annocpan.org/wiki/index.cgi
>
> They talk only about the Perl core doc at this point, probably because
> adding keywords there is already enough work. AFAIK the core docs are
> now covered, so individual modules would be next.
>
> Yep, a google-like search engine could save the effort of manually
> tagging with keywords, but I think this idea is more practical and will
> improve perldoc greatly.

I hate to say it, but this indexing thing has seemed to be ass-backwards to me from the beginning. Instead of having one person combine a Pod Parser and Plucene indexer or some other simple process, they expect the 3500 authors to ADD extra content to all their POD?

It seems like an absolutely terrible case of CYJ... making the life of the search engine writer easier by making everyone else change.

Having the coverage kwalitee bit was bad enough, but supporting a project like this seems far, far worse, as I'm not sure how this is supposed to be any better than a natural text search of CPAN would be.

In fact, it occurs to me I've just uploaded CPAN::Mini::Extract, and if you tied that to Plucene you could probably _have_ an indexer for such a Google-esque search up and running in a day or so.

The signature one I don't mind as much; signatures are at least supported in most places and make some kind of sense at some level :)

Adam K
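[Editor's sketch] The automated alternative Adam describes, a full-text index built from extracted documentation, might look roughly like the toy inverted index below. It does not use Plucene or CPAN::Mini::Extract; the module names and texts are invented, and a real indexer would add stemming, ranking, and POD parsing.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9_]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9_]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Invented module documentation, for illustration only.
modules = {
    "Example::Stack": "push and pop items on a stack",
    "Example::Queue": "enqueue and dequeue items",
}
text_index = build_inverted_index(modules)
```

Here `search(text_index, "pop stack")` would return only the hypothetical `Example::Stack`, with no extra markup required from authors; what it cannot do is say which of many matching passages is the definitive one.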
Re: CPANTS: has_signature, has_pod_index
Moin Ivan,

On Sunday 06 November 2005 17:39, Ivan Tubert-Brohman wrote:
> Tels wrote:
> >>> * has_pod_index: The POD contains at least one X<> keyword that
> >>>   helps POD indexers. Whether only one is useful is open for debate,
> >>>   because at least the license (X), your CPAN ID under authors
> >>>   (x), and some generic keyword for what your module (X) is
> >>>   about can probably be added even for the most minimal module.
> >>
> >> Can you give an example of how this has any practical impact on
> >> anything?
> >
> > Here is the main page for the project.
> >
> > http://pod-indexing.annocpan.org/wiki/index.cgi
> >
> > They talk only about the Perl core doc at this point, probably
> > because adding keywords there is already enough work. AFAIK the core
> > docs are now covered, so individual modules would be next.
>
> We are not done with the core docs yet; the list of documents that are
> done is listed at
> http://pod-indexing.annocpan.org/wiki/index.cgi?IndexStats . The next
> stage in my plan would be to index the modules that come with the core
> distribution. Indexing CPAN modules is up to each individual author and
> I haven't really thought much about it yet.

Understood.

> Much as I love the POD indexing project, I'm reluctant to see this
> added as a kwalitee point. First, because there are already enough
> complaints that CPANTS is trying to "force" authors to do things in one
> specific way needlessly; and second, because it would be too early
> anyway, as pod indexing still needs to be tested in practice.

Fair enough.

> Getting off topic: I still have to figure out how a perldoc -k would
> handle indexing of CPAN modules. The problem is that having too many
> things indexed could be counterproductive. For example, doing "perldoc
> -k pop" will give you the pop function (
> http://pod-indexing.annocpan.org/perldoc-k.cgi?keyword=pop ), but what
> would happen if you index all of CPAN and there are dozens of modules
> that implement a "pop" method? I'm thinking that the best solution
> would be to have the option of doing a "core search" vs a "global
> search"...

I thought about this, too, and I think that the search result lists will ultimately be big. After all, there will be a lot of things having the same keyword. So the set of returned "hits" must be reduced.

Adding too many keywords is not a good idea, but then, we have no experience of what is "too much" and what is "too little". OTOH, I do think that adding a keyword with the name of each function is not a good idea, namely because you would get hundreds of hits for "new".

Hm, maybe like so for methods: X X and for non OO: X X

Then you could search for "method and new" (I think having the ability to search for more than one keyword is absolutely necessary so that the results do not overwhelm the user :).

Should this discussion be continued on another mailing list?

Best wishes,

Tels

--
Signed on Sun Nov 6 17:50:48 2005 with key 0x93B84C15.
Visit my photo gallery at http://bloodgate.com/photos/
PGP key on http://bloodgate.com/tels.asc or per email.

"My glasses, my glasses. I cannot see without my glasses." - "My glasses, my glasses. I cannot be seen without my glasses."
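[Editor's sketch] The multi-keyword search Tels asks for ("method and new") amounts to requiring that a location carry every requested keyword. A minimal illustration, assuming each indexed location is stored with the set of keywords its X<> entries contribute; the entries and the `search_all` helper are hypothetical.

```python
# Hypothetical index entries: (location, set of keywords attached to it).
# All names are invented for illustration.
entries = [
    ("Example::Stack/new", {"method", "new"}),
    ("Example::Stack/pop", {"method", "pop"}),
    ("Example::Util/new_id", {"function", "new"}),
]


def search_all(index_entries, *keywords):
    """Return the locations tagged with every one of the given keywords."""
    wanted = set(keywords)
    return [location for location, tags in index_entries if wanted <= tags]
```

Searching for "new" alone matches two of the invented entries, while `search_all(entries, "method", "new")` narrows that to one, which is the kind of reduction the message argues for.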
Re: CPANTS: has_signature, has_pod_index
Tels wrote:
>>> * has_pod_index: The POD contains at least one X<> keyword that helps
>>>   POD indexers. Whether only one is useful is open for debate, because
>>>   at least the license (X), your CPAN ID under authors (x), and some
>>>   generic keyword for what your module (X) is about can probably be
>>>   added even for the most minimal module.
>>
>> Can you give an example of how this has any practical impact on
>> anything?
>
> Here is the main page for the project.
>
> http://pod-indexing.annocpan.org/wiki/index.cgi
>
> They talk only about the Perl core doc at this point, probably because
> adding keywords there is already enough work. AFAIK the core docs are
> now covered, so individual modules would be next.

We are not done with the core docs yet; the list of documents that are done is listed at http://pod-indexing.annocpan.org/wiki/index.cgi?IndexStats . The next stage in my plan would be to index the modules that come with the core distribution. Indexing CPAN modules is up to each individual author and I haven't really thought much about it yet.

Much as I love the POD indexing project, I'm reluctant to see this added as a kwalitee point. First, because there are already enough complaints that CPANTS is trying to "force" authors to do things in one specific way needlessly; and second, because it would be too early anyway, as pod indexing still needs to be tested in practice.

Getting off topic: I still have to figure out how a perldoc -k would handle indexing of CPAN modules. The problem is that having too many things indexed could be counterproductive. For example, doing "perldoc -k pop" will give you the pop function ( http://pod-indexing.annocpan.org/perldoc-k.cgi?keyword=pop ), but what would happen if you index all of CPAN and there are dozens of modules that implement a "pop" method? I'm thinking that the best solution would be to have the option of doing a "core search" vs a "global search"...

Cheers,
Ivan
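[Editor's sketch] The "core search" vs "global search" option Ivan floats could be modeled by recording where each index entry comes from and filtering on that. This is an assumption-laden toy: the index layout, the `lookup` function, and the `Example::` modules are invented, and it says nothing about how a real perldoc -k would be implemented.

```python
# Hypothetical keyword index: keyword -> list of (source, location).
# "core" marks documentation shipped with Perl, "cpan" everything else.
keyword_index = {
    "pop": [
        ("core", "perlfunc/pop"),
        ("cpan", "Example::Stack/pop"),
        ("cpan", "Example::Queue/pop"),
    ],
}


def lookup(index, keyword, scope="core"):
    """Return locations for keyword; scope is either 'core' or 'global'."""
    if scope not in ("core", "global"):
        raise ValueError("scope must be 'core' or 'global'")
    hits = index.get(keyword, [])
    if scope == "core":
        # Core search: drop everything that does not ship with Perl.
        hits = [hit for hit in hits if hit[0] == "core"]
    return [location for _source, location in hits]
```

Defaulting to the core scope keeps a lookup of "pop" down to the built-in function, while the global scope also returns every module that happens to index the same keyword.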
Re: CPANTS: has_signature, has_pod_index
Moin,

On Sunday 06 November 2005 16:10, Ricardo SIGNES wrote:
> * Tels <[EMAIL PROTECTED]> [2005-11-06T09:44:14]
>
> > * has_signature: a SIGNATURE file exists, and is a valid signature.
>
> That seems reasonable, even though I dread signing all my dists. I
> feel like it will be a big hassle, but maybe I'm just afraid of change.
>
> > * has_pod_index: The POD contains at least one X<> keyword that helps
> >   POD indexers. Whether only one is useful is open for debate, because
> >   at least the license (X), your CPAN ID under authors (x),
> >   and some generic keyword for what your module (X) is about can
> >   probably be added even for the most minimal module.
>
> Can you give an example of how this has any practical impact on
> anything?

Here is the main page for the project.

http://pod-indexing.annocpan.org/wiki/index.cgi

They talk only about the Perl core doc at this point, probably because adding keywords there is already enough work. AFAIK the core docs are now covered, so individual modules would be next.

Yep, a google-like search engine could save the effort of manually tagging with keywords, but I think this idea is more practical and will improve perldoc greatly.

Best wishes,

Tels

--
Signed on Sun Nov 6 16:17:01 2005 with key 0x93B84C15.
Visit my photo gallery at http://bloodgate.com/photos/
PGP key on http://bloodgate.com/tels.asc or per email.

"Metaphorically speaking, the Trusted Computing project has so far been like a grandmother who wants to invite Little Red Riding Hood into her cottage and explains to her that the chains, handcuffs and cameras there serve as protection against the big bad wolf and have nothing to do with her Belgian business associates."
-- Peter Mühlbauer 22.02.2004 in http://tinyurl.com/yv6j3
Re: CPANTS: has_signature, has_pod_index
* Tels <[EMAIL PROTECTED]> [2005-11-06T09:44:14]
> * has_signature: a SIGNATURE file exists, and is a valid signature.

That seems reasonable, even though I dread signing all my dists. I feel like it will be a big hassle, but maybe I'm just afraid of change.

> * has_pod_index: The POD contains at least one X<> keyword that helps POD
>   indexers. Whether only one is useful is open for debate, because at
>   least the license (X), your CPAN ID under authors (x), and
>   some generic keyword for what your module (X) is about can probably
>   be added even for the most minimal module.

Can you give an example of how this has any practical impact on anything?

--
rjbs
CPANTS: has_signature, has_pod_index
Moin,

if these have already been proposed, please ignore me :)

I think the following kwalitee checks should be added:

* has_signature: a SIGNATURE file exists, and is a valid signature.

  Technically, you should get -1 points if the signature file is invalid, garbled, or doesn't match. However, 0 points for an invalid SIGNATURE, 1 for none, and 2 for a valid one would work without negative scores. 0 for none/invalid and 1 for valid would work, too.

* has_pod_index: The POD contains at least one X<> keyword that helps POD indexers.

  Whether only one is useful is open for debate, because at least the license (X), your CPAN ID under authors (x), and some generic keyword for what your module (X) is about can probably be added even for the most minimal module.

Best wishes,

Tels

--
Signed on Sun Nov 6 15:40:11 2005 with key 0x93B84C15.
Visit my photo gallery at http://bloodgate.com/photos/
PGP key on http://bloodgate.com/tels.asc or per email.

"The German censors - - - - - - - - - - - - - - - - - - - - - - - - - blockheads - - - - - - - - - - - - - - - - - - - - - - - - -." Heinrich Heine
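[Editor's sketch] The two proposed checks can be written down in a few lines to make the scoring concrete. This is not CPANTS code: the function names are invented, the regular expression only recognizes the simple X<...> form (not nested formatting codes or the X<< ... >> delimiter style), and actual signature verification is left out, so `signature_score` takes two booleans that some other tool would have to supply.

```python
import re

# Matches a non-empty POD index entry such as X<license>. Simplified:
# nested formatting codes and X<< ... >> delimiters are not handled.
POD_INDEX_ENTRY = re.compile(r"X<([^<>]+)>")


def has_pod_index(pod_text):
    """Return 1 if the POD contains at least one X<> entry, else 0."""
    return 1 if POD_INDEX_ENTRY.search(pod_text) else 0


def signature_score(signature_present, signature_valid):
    """Three-level scheme from the proposal: 0 invalid, 1 none, 2 valid."""
    if not signature_present:
        return 1
    return 2 if signature_valid else 0
```

The three-level variant avoids negative scores while still ranking a broken SIGNATURE below a missing one; the simpler variant mentioned in the message would collapse "none" and "invalid" to 0 and return 1 only for a valid file.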