Hi,

I am trying to use the LanguageIdentifier plugin to detect the language of crawled results, and I found the following page: http://wiki.apache.org/nutch/LanguageIdentifier

That page mentions some open issues in the lab test benchmark. Since those numbers were obtained by analyzing results from the earlier nutch-0.7 release, I am curious whether these issues have been fixed in newer versions (nutch-0.9). Is there a newer link or thread for the LanguageIdentifier plugin?

Also, the plugin API assumes that the given content is UTF-8 encoded. Are the contents of a Nutch dump file in UTF-8 format?

Thanks and regards,
Neera
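
P.S. One way to check the UTF-8 question empirically is to run the dump file through a strict decoder. Below is a minimal sketch in plain Java that uses no Nutch APIs. The class name Utf8Check and the method isValidUtf8 are hypothetical names chosen for illustration, and the sketch assumes the dump is an ordinary local file small enough to read into memory. It only reports whether the bytes decode as strict UTF-8; it cannot tell which encoding was actually used if they do not.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Nutch: checks whether bytes are well-formed UTF-8.
public class Utf8Check {

    // Returns true if the whole byte array decodes as UTF-8 without errors.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of silently
                    // substituting a replacement character for bad input.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <path-to-dump-file>");
            return;
        }
        // Assumption: the file fits in memory; a large dump would need streaming.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

Note that a file containing only ASCII will also pass this check, since ASCII is a subset of UTF-8, so a pass on an English-only crawl would not prove much.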
