> Any plan to implement this? I mean move the LanguageIdentifier class
> into Nutch core.

As I already suggested on this list, I would really like to move the
LanguageIdentifier class (and its profiles) to
an independent Lucene sub-project (and the MimeType repository too).
I don't remember why, but there were some objections to this...

Here is a short status of what I have in mind for the next improvements to
LanguageIdentifier / multi-language support:
* Enhance the LanguageIdentifier APIs by returning something like an ordered
LangDetail[] array when guessing the language (each LangDetail should contain
the language code and its score). I have a prototype version of this on my
disk, but I haven't taken the time to finalize it.
* I encountered identification problems with some specific sites (Blogger,
for instance), and I plan to investigate them.
* Another pending task: the analysis (and coding) of multilingual querying
support.
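
To make the first point more concrete, here is a minimal sketch of what an
ordered LangDetail[] result could look like. It is not the prototype
mentioned above: the fields of LangDetail, the LangRanking helper, its rank()
method and the "higher score is better" convention are all assumptions made
for illustration only.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;

// Hypothetical result type: one candidate language and its score.
final class LangDetail {
    final String code;   // language code, e.g. "en"
    final double score;  // assumed convention: higher means a better match

    LangDetail(String code, double score) {
        this.code = code;
        this.score = score;
    }

    @Override
    public String toString() {
        return code + "=" + score;
    }
}

// Hypothetical helper, not part of the existing LanguageIdentifier API.
final class LangRanking {
    private LangRanking() {}

    // Turns per-language scores into an array ordered from best to worst.
    static LangDetail[] rank(Map<String, Double> scores) {
        LangDetail[] details = scores.entrySet().stream()
            .map(e -> new LangDetail(e.getKey(), e.getValue()))
            .toArray(LangDetail[]::new);
        Arrays.sort(details,
            Comparator.comparingDouble((LangDetail d) -> d.score).reversed());
        return details;
    }
}
```

With such a result, a caller could take element 0 as the best guess, or
compare the first two scores to decide whether the guess is reliable enough.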

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
