Sami Siren wrote:
Doug Cutting wrote:
Language id can already be reasonably done by an indexing filter. I don't see any advantage to moving it here. Am I missing something?
And stemming and other word-based operations should be performed in Lucene Analyzers, while indexing. Nutch does not yet permit a plugin here, but this might eventually make sense.
When I was thinking about localized stemming, and actually implemented something, I
did it with a TokenFilter. But the point is that the language was known before
indexing (before creating the Document or adding the fields). Isn't adding fields the point at which analyzers/filters are applied?
But if the language is identified in an indexing filter, how can that information be used to
select a localized stemmer or other data for the fields already added?
This points out an important deficiency in the plugin system - currently it is not possible to be absolutely sure which plugin runs first. Many Unix systems solve this by naming (e.g. /etc/rc.d scripts are named like "S01xxx.sh", "S02xxx.sh") - we could do it that way... It would also be possible to put a "priority" number in the plugin definition, so that the plugins are ordered during execution according to their "priority".
This should solve the problem you describe, because then you could run the language detection plugin first, and then set the appropriate analyzer before adding the documents (which requires minor modifications to the IndexSegment).
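A rough sketch of the "priority" idea, in Java. Everything here is hypothetical and only for illustration: the PrioritizedFilter interface, its priority() method, the class names and the plain Map used as a document are assumptions, not the existing Nutch plugin API, and the language check is a placeholder rather than real language identification.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plugin interface; NOT the actual Nutch API.
interface PrioritizedFilter {
    int priority();                       // lower value runs earlier
    void filter(Map<String, String> doc); // reads and updates document metadata
}

// Runs first so that later plugins can rely on the "lang" entry.
class LanguageDetector implements PrioritizedFilter {
    public int priority() { return 1; }
    public void filter(Map<String, String> doc) {
        // Placeholder heuristic, for illustration only.
        String text = doc.getOrDefault("content", "");
        doc.put("lang", text.contains(" ja ") ? "fi" : "en");
    }
}

// Runs second and picks an analyzer name from the detected language.
class AnalyzerSelector implements PrioritizedFilter {
    public int priority() { return 2; }
    public void filter(Map<String, String> doc) {
        String lang = doc.getOrDefault("lang", "unknown");
        doc.put("analyzer", lang.equals("unknown") ? "default" : "stemming-" + lang);
    }
}

class FilterChain {
    // Returns a copy sorted by ascending priority; the sort is stable,
    // so plugins with equal priority keep their registration order.
    static List<PrioritizedFilter> ordered(List<PrioritizedFilter> plugins) {
        List<PrioritizedFilter> sorted = new ArrayList<>(plugins);
        sorted.sort(Comparator.comparingInt(PrioritizedFilter::priority));
        return sorted;
    }

    // Applies every plugin to a fresh document, in priority order.
    static Map<String, String> run(List<PrioritizedFilter> plugins, String content) {
        Map<String, String> doc = new HashMap<>();
        doc.put("content", content);
        for (PrioritizedFilter p : ordered(plugins)) {
            p.filter(doc);
        }
        return doc;
    }
}
```

With this ordering the registration order of the plugins no longer matters: even if the analyzer selector is registered before the language detector, the detector still runs first, so the language is known by the time the analyzer is chosen.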
--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
