Hi,

I am trying to use the LanguageIdentifier plugin to detect the language of
crawled results and found the following page:
http://wiki.apache.org/nutch/LanguageIdentifier

This page mentions some open issues in the lab test benchmark. Since those
numbers were obtained by analyzing results from the earlier nutch-0.7
release, I am curious whether these issues have been fixed in newer
versions (nutch-0.9).
Is there a newer link or thread for the LanguageIdentifier plugin?

Also, the plugin API assumes that the given content is UTF-8 encoded.
Is the content of a nutch dump file in UTF-8 format?
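
One way to check this independently of Nutch is to strictly decode the
file's bytes as UTF-8 and see whether decoding fails. Below is a minimal
sketch, assuming the dump is a plain file on local disk. The class name
Utf8Check and the path argument are hypothetical and not part of Nutch;
only standard JDK classes are used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Nutch.
public class Utf8Check {

    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently
        // substituting U+FFFD for malformed input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <path-to-dump-file>");
            return;
        }
        // Reads the whole file into memory; fine for a small dump,
        // an assumption that may not hold for large crawls.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

Note that a pure ASCII file also passes this check, so a positive result
only shows that the bytes are consistent with UTF-8, not that the file was
written as UTF-8 on purpose.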

Thanks and Regards,
Neera
