Clustering Carrot2:
It is a search results clustering plugin -- it attempts to group documents that are similar and put them in a group with a reasonable label so that you can identify topics covered in the search results faster.
D.
Christophe Noel wrote:
Hello,
Nutch plugins provide functionality such as parsing PDF files or indexing "date modified" tags.
Please confirm this short summary and tutorial for plugins.
(1)
PARSING plugins : parse different MIME types -> html, text, pdf, msword, mp3, rtf
** parse-ext ** is a wrapper ... what can it do?
INDEXING plugins : index different fields of the fetched pages
** index-basic : basic indexing
** index-more : indexes the "last modified" and "content-type" tags; "file-length" is coming soon...
QUERY plugins : handle different kinds of queries (query-basic handles basic queries, of course)
** query-site : query handler for site restrictions such as "nutch site:www.nutch.org" (missing: a site-only search such as "site:www.nutch.org")
** query-url : query handler for url searches.
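To make the site: syntax above concrete, here is a minimal sketch of how a query string could be split into plain terms and site: clauses. It is only an illustration of the syntax, not Nutch's query parser; the class FieldQueryDemo, the holder Parsed, and the method parse are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldQueryDemo {
    // Hypothetical holder for the two kinds of clauses; not a Nutch class.
    static final class Parsed {
        final List<String> terms = new ArrayList<>();
        final List<String> sites = new ArrayList<>();
    }

    // Splits on whitespace; tokens of the form "site:<host>" become site clauses,
    // everything else is treated as a plain search term.
    static Parsed parse(String query) {
        Parsed parsed = new Parsed();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (token.startsWith("site:") && token.length() > "site:".length()) {
                parsed.sites.add(token.substring("site:".length()));
            } else {
                parsed.terms.add(token);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        Parsed mixed = parse("nutch site:www.nutch.org");
        System.out.println("terms=" + mixed.terms + " sites=" + mixed.sites);

        // A site-only query has no plain terms at all.
        Parsed siteOnly = parse("site:www.nutch.org");
        System.out.println("terms=" + siteOnly.terms + " sites=" + siteOnly.sites);
    }
}
```

In this sketch the site-only query ends up with an empty term list, which is the case reported as missing above.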
PROTOCOL plugins : handle different protocols such as file, http, and ftp
Unknown (or poorly known) to me : ONTOLOGY, CLUSTERING CARROT2, LANGUAGE-IDENTIFIER (please explain).
(2) USE PLUGINS
Use the following kind of tags in nutch-site.xml or nutch-default.xml:
<nutch-conf>
<property>
<name>plugin.includes</name>
<value>protocol-(http|ftp)|parse-(text|html|pdf|rtf|msword|ext)|index-basic|query-(basic|site|url)|language-identifier</value>
</property>
</nutch-conf>
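The plugin.includes value above reads as a regular expression over plugin names. A minimal sketch of how it selects plugins, assuming a plugin is enabled only when its whole name matches the expression (the class PluginIncludesDemo and the method isIncluded are hypothetical, and the name clustering-carrot2 is assumed for the Carrot2 plugin):

```java
import java.util.regex.Pattern;

public class PluginIncludesDemo {
    // The plugin.includes value from the configuration above, copied verbatim.
    static final Pattern INCLUDES = Pattern.compile(
        "protocol-(http|ftp)|parse-(text|html|pdf|rtf|msword|ext)"
        + "|index-basic|query-(basic|site|url)|language-identifier");

    // Assumption: the whole plugin name must match, not just a part of it.
    static boolean isIncluded(String pluginName) {
        return INCLUDES.matcher(pluginName).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "protocol-http", "parse-pdf", "query-site",
            "index-more", "protocol-file", "clustering-carrot2"
        };
        for (String name : names) {
            System.out.println(name + " -> " + isIncluded(name));
        }
    }
}
```

Under that assumption, index-more, protocol-file and clustering-carrot2 are not matched by this value, so they would have to be added to the expression to be enabled.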
(3) HOW TO MAKE A PLUGIN? What are the main difficulties in making a plugin?
Thanks for your help. It would be great to discuss this on the Wiki.
------------------------------------------------------- SF email is sponsored by - The IT Product Guide Read honest & candid reviews on hundreds of IT Products from real users. Discover which products truly live up to the hype. Start reading now. http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click _______________________________________________ Nutch-developers mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/nutch-developers
