The proposal sounds good. I'd love to see something like this.
A couple of comments:

1. It might be nice to change the wording so users know they can plug any analyzer into it, whether it's for another language, for stemming, or for something else.

2. Having the query analysis in a QueryFilter might be an easier way to go. I found it easier to let NutchAnalysis work on the query string first, and then pump the clause values through an analyzer. That way I didn't have to worry about parsing and stemming things like "site:" and doing the right thing; I only had to analyze the values of certain query fields, such as "title", "anchor", and "content". You can also easily skip analysis of the values for fields like "url". It's probably less efficient this way, since you're tokenizing twice, but query strings are usually short anyway, and it gives the user more control over which fields to stem/analyze.

Writing an abstract LanguageQueryFilter plugin could be a nice solution. You could subclass it to use LanguageIdentifier or other methods to select an analyzer. Others could simply subclass it with an analyzer of their choice.

Howie
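To make the "parse first, then analyze selected clause values" idea concrete, here is a minimal, self-contained sketch. It deliberately does not use the real Nutch QueryFilter interface or Lucene's Analyzer class: `SimpleAnalyzer`, `Clause`, `LanguageQueryFilter`, and `ToyStemmingQueryFilter` are all hypothetical names invented for illustration. The analyzed field list ("title", "anchor", "content") and the pass-through of "url" come from the message above. The "stemmer" is a toy suffix stripper, not a real stemming algorithm. The sketch requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical stand-in for an analyzer: turns a clause value into normalized terms. */
interface SimpleAnalyzer {
    List<String> analyze(String text);
}

/** Hypothetical query clause: a field name plus the raw value already parsed from the query. */
record Clause(String field, String value) {}

/** Hypothetical abstract filter that re-analyzes only the values of selected fields. */
abstract class LanguageQueryFilter {
    // Only these fields are analyzed; others (e.g. "url", "site") pass through unchanged.
    private static final Set<String> ANALYZED_FIELDS = Set.of("title", "anchor", "content");

    /** Subclasses choose the analyzer, e.g. via language identification or a fixed choice. */
    protected abstract SimpleAnalyzer selectAnalyzer(List<Clause> clauses);

    public List<Clause> filter(List<Clause> clauses) {
        SimpleAnalyzer analyzer = selectAnalyzer(clauses);
        List<Clause> out = new ArrayList<>();
        for (Clause c : clauses) {
            if (!ANALYZED_FIELDS.contains(c.field())) {
                out.add(c); // leave values of non-analyzed fields untouched
                continue;
            }
            // Second tokenization pass: emit one clause per analyzed term.
            for (String term : analyzer.analyze(c.value())) {
                out.add(new Clause(c.field(), term));
            }
        }
        return out;
    }
}

/** Hypothetical subclass using a toy "-ing" stripper in place of a real stemmer. */
class ToyStemmingQueryFilter extends LanguageQueryFilter {
    @Override
    protected SimpleAnalyzer selectAnalyzer(List<Clause> clauses) {
        return text -> Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\s+"))
                .filter(t -> !t.isEmpty())
                .map(t -> t.endsWith("ing") && t.length() > 5 ? t.substring(0, t.length() - 3) : t)
                .collect(Collectors.toList());
    }
}

public class LanguageQueryFilterDemo {
    public static void main(String[] args) {
        List<Clause> parsed = List.of(
                new Clause("content", "Searching engines"),
                new Clause("url", "Searching"));
        List<Clause> filtered = new ToyStemmingQueryFilter().filter(parsed);
        // prints [Clause[field=content, value=search], Clause[field=content, value=engines], Clause[field=url, value=Searching]]
        System.out.println(filtered);
    }
}
```

The design choice mirrors the trade-off described above: the value is tokenized a second time after query parsing, which costs a little efficiency but keeps syntax like "site:" out of the analyzer's way. Swapping languages only means overriding `selectAnalyzer`.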
I recently posted a proposal on the Nutch Wiki for multilingual support in Nutch (the ability to add language-specific analyzers for both querying and analyzing). The document is available at http://wiki.apache.org/nutch/MultiLingualSupport

It seems that your solution is very similar to mine (except that mine uses the plugin framework). Could you please review my proposal in light of your experience? I will begin implementation in a few days (and will keep in mind the piece of code you sent).

Regards

Jerome

--
http://motrech.free.fr/
http://frutch.free.fr/
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
