[ http://issues.apache.org/jira/browse/NUTCH-237?page=all ]

Dawid Weiss updated NUTCH-237:
------------------------------

    Attachment: NUTCH-237.DWEISS.patch.zip

Hi Andrzej. The ZIP file contains a patch and the svn stat output for the improved code:

- The primary language for hits without explicit langid and a list of enabled 
languages in the clustering component can be specified in the configuration 
file (readme.txt gives the details).

- By default, all languages in Carrot2 (except for Polish) are enabled. English 
is the default primary language.

- I removed the dependency on Neko in favor of the simpler routine we already 
have in the Carrot2 codebase. The change shouldn't affect the results (I checked 
on my local installation and it seems to be fine).
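
To illustrate the first two points, here is a minimal sketch of how a primary 
language and a list of enabled languages might be read from configuration. It 
is not the code from the patch: the class name, the two property keys, and the 
use of java.util.Properties as a stand-in for the Nutch configuration are all 
assumptions made for illustration (readme.txt has the real property names). 
Always adding the default language to the enabled set is likewise a choice made 
for this sketch.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper, not part of Nutch or Carrot2.
class ClusteringLanguageConfig {
    // Assumed property keys; the real ones are documented in readme.txt.
    static final String DEFAULT_LANGUAGE_KEY = "clustering.default.language";
    static final String ENABLED_LANGUAGES_KEY = "clustering.enabled.languages";

    final String defaultLanguage;
    final Set<String> enabledLanguages = new LinkedHashSet<>();

    ClusteringLanguageConfig(Properties props) {
        // Fall back to English when no primary language is configured.
        defaultLanguage = normalize(props.getProperty(DEFAULT_LANGUAGE_KEY, "en"));

        // Comma-separated list of language codes; blanks are ignored.
        String raw = props.getProperty(ENABLED_LANGUAGES_KEY, "");
        for (String code : raw.split(",")) {
            String normalized = normalize(code);
            if (!normalized.isEmpty()) {
                enabledLanguages.add(normalized);
            }
        }

        // Sketch-level assumption: the default language is always enabled.
        enabledLanguages.add(defaultLanguage);
    }

    private static String normalize(String code) {
        return code.trim().toLowerCase(Locale.ROOT);
    }
}
```

With this sketch, a value such as "en, DE ,fr," for the enabled-languages key 
yields the set {en, de, fr}, and a missing default-language key yields "en".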

I haven't played with the language identifier yet because I don't have a crawl 
with documents containing langid codes. The code should work without problems 
though -- details.getValue("lang") is converted to Carrot2's property 
RawDocument.PROPERTY_LANGUAGE and this is taken into account when clustering.
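
The conversion described above can be sketched roughly as follows. This is an 
illustration, not the plugin code: a plain Map stands in for the hit details 
object (so hitDetails.get("lang") plays the role of details.getValue("lang")), 
the PROPERTY_LANGUAGE constant is a local stand-in for Carrot2's 
RawDocument.PROPERTY_LANGUAGE (whose actual value may differ), and the class 
and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not part of Nutch or Carrot2.
class HitLanguageSketch {
    // Local stand-in for RawDocument.PROPERTY_LANGUAGE; the value is assumed.
    static final String PROPERTY_LANGUAGE = "lang";

    // Builds the per-document properties handed to the clustering component.
    static Map<String, Object> toDocumentProperties(
            Map<String, String> hitDetails, String defaultLanguage) {
        Map<String, Object> properties = new HashMap<>();

        // Use the hit's langid when present, else the configured default.
        String lang = hitDetails.get("lang");
        if (lang == null || lang.trim().isEmpty()) {
            lang = defaultLanguage;
        }

        properties.put(PROPERTY_LANGUAGE, lang.trim().toLowerCase(Locale.ROOT));
        return properties;
    }
}
```

A hit whose details carry "lang" = "de" would be tagged as German here, while a 
hit without the field falls back to whatever default language is passed in.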

I couldn't delete the previously attached files. This ZIP file contains only the 
patch and the svn stat output -- you'll have to remove a few JARs manually and 
replace others with their new counterparts from the ZIP file I attached to this 
issue earlier (they haven't changed). Let me know if you need anything.


> Carrot2 clustering plugin upgrade.
> ----------------------------------
>
>          Key: NUTCH-237
>          URL: http://issues.apache.org/jira/browse/NUTCH-237
>      Project: Nutch
>         Type: Improvement

>     Reporter: Dawid Weiss
>     Priority: Trivial
>  Attachments: NUTCH-237.DWEISS.patch.zip, c2.patch, libs.zip, svn-stat.txt
>
> This is an upgrade of the clustering plugin to the newest release (1.0.2).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
