[ http://issues.apache.org/jira/browse/NUTCH-237?page=all ]
Dawid Weiss updated NUTCH-237:
------------------------------
Attachment: NUTCH-237.DWEISS.patch.zip
Hi Andrzej. The ZIP file contains a patch with the improved code and the
corresponding svn stat output:
- The primary language for hits without an explicit langid, and the list of
languages enabled in the clustering component, can now be specified in the
configuration file (readme.txt gives the details).
- By default, all languages supported by Carrot2 (except Polish) are enabled,
and English is the default language.
- I removed the dependency on Neko in favor of the simpler routine we already
have in the Carrot2 codebase. The change shouldn't affect the results (I
checked on my local installation and it seems fine).
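The two settings above (an enabled-language list plus a primary language) can
be pictured with a minimal, hypothetical sketch. It assumes the enabled
languages arrive as a comma-separated list of language codes and that the
primary language must itself be enabled. The class name, the input format and
that validation rule are all illustrative assumptions; the plugin's actual
property names and format are described in readme.txt.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical holder for the clustering plugin's language settings.
// Assumption: enabled languages are given as a comma-separated list of codes;
// the real configuration keys and format are documented in readme.txt.
final class ClusteringLanguageConfig {
    private final Set<String> enabledLanguages;
    private final String defaultLanguage;

    ClusteringLanguageConfig(String enabledList, String defaultLanguage) {
        Set<String> enabled = new LinkedHashSet<>();
        // Normalize each code: trim whitespace, lower-case, skip empty entries.
        for (String code : enabledList.split(",")) {
            String trimmed = code.trim().toLowerCase(Locale.ROOT);
            if (!trimmed.isEmpty()) {
                enabled.add(trimmed);
            }
        }
        String def = defaultLanguage.trim().toLowerCase(Locale.ROOT);
        // Assumption: a primary language that is not enabled is rejected.
        if (!enabled.contains(def)) {
            throw new IllegalArgumentException(
                "Default language '" + def + "' is not among enabled languages " + enabled);
        }
        this.enabledLanguages = Collections.unmodifiableSet(enabled);
        this.defaultLanguage = def;
    }

    Set<String> getEnabledLanguages() {
        return enabledLanguages;
    }

    String getDefaultLanguage() {
        return defaultLanguage;
    }
}
```

For example, new ClusteringLanguageConfig(" EN, de ,fr", "en") yields the
enabled set [en, de, fr] with "en" as the primary language, while
new ClusteringLanguageConfig("de,fr", "en") throws IllegalArgumentException.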
I haven't tested the language identifier yet because I don't have a crawl
with documents containing langid codes. The code should work without problems,
though: the value of details.getValue("lang") is mapped to Carrot2's
RawDocument.PROPERTY_LANGUAGE property, which is taken into account when
clustering.
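The per-hit language choice described above can be sketched as follows. This
is a hypothetical illustration, not the patch's code: a plain Map stands in for
Nutch's hit details, and the resolver only decides which language code would
end up in RawDocument.PROPERTY_LANGUAGE. Hits without a "lang" value fall back
to the primary language, as the message states; falling back for languages that
are present but not enabled (e.g. Polish under the default settings) is an
assumption.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of choosing the clustering language for one hit.
// A Map<String, String> stands in for the hit details (details.getValue("lang")).
final class HitLanguageResolver {
    private final Set<String> enabledLanguages;
    private final String defaultLanguage;

    HitLanguageResolver(Set<String> enabledLanguages, String defaultLanguage) {
        this.enabledLanguages = enabledLanguages;
        this.defaultLanguage = defaultLanguage;
    }

    // Returns the hit's own language if present and enabled, otherwise the default.
    String resolve(Map<String, String> hitDetails) {
        String lang = hitDetails.get("lang");
        if (lang == null || lang.trim().isEmpty()) {
            // No explicit langid: use the configured primary language.
            return defaultLanguage;
        }
        String normalized = lang.trim().toLowerCase(Locale.ROOT);
        // Assumption: a language that is not enabled also falls back to the default.
        return enabledLanguages.contains(normalized) ? normalized : defaultLanguage;
    }
}
```

With enabled languages {en, de} and "en" as the primary language, a hit tagged
"DE" resolves to "de", while an untagged hit or one tagged "pl" resolves to "en".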
I couldn't delete the previously attached files. This ZIP file contains only
the patch and the svn stat output, so you'll have to remove a few JARs manually
and replace the others with their new counterparts from the ZIP file I attached
to this issue earlier (they haven't changed). Let me know if you need anything.
> Carrot2 clustering plugin upgrade.
> ----------------------------------
>
> Key: NUTCH-237
> URL: http://issues.apache.org/jira/browse/NUTCH-237
> Project: Nutch
> Type: Improvement
> Reporter: Dawid Weiss
> Priority: Trivial
> Attachments: NUTCH-237.DWEISS.patch.zip, c2.patch, libs.zip, svn-stat.txt
>
> This is an upgrade of the clustering plugin to the newest release (1.0.2).
--
This message is automatically generated by JIRA.
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers