The proposal sounds good. I'd love to see something like this.

A couple of comments...

1. It might be nice to change the wording so users know that
they can plug any analyzer into it, whether it's for another
language or just for stemming or something else.

2. Having the query analysis in a QueryFilter might be an easier
way to go. I found it easier to let NutchAnalysis work on the query
string first, and then pump the clause values through an analyzer.
That way I didn't have to worry about parsing and stemming
things like "site:" and doing the right thing. I just worried about
analyzing certain fields' values from the query like "title", "anchor", and
"content". And you can easily skip analysis of the values for
fields like "url".

It's probably less efficient this way since you're tokenizing twice, but
query strings are usually short anyway, and it gives the user more
control over which fields to stem/analyze. Writing an abstract
LanguageQueryFilter plugin could be a nice solution. You could
subclass this to use LanguageIdentifier or other methods to select
an analyzer. Others could simply subclass it with an analyzer of their
choice.
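
To make the idea concrete, here is a rough sketch of what I mean. It is not real Nutch code: Clause, TermAnalyzer, LanguageQueryFilterSketch and ToyEnglishQueryFilter are all made-up names standing in for the parsed query clauses, a Lucene-style analyzer, and the plugin classes, and the "stemmer" is a toy that just strips a trailing "s". It assumes the query string has already been parsed into field/value clauses before the filter runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for a parsed query clause: a field name plus its raw value. */
final class Clause {
    final String field;
    final String value;
    Clause(String field, String value) { this.field = field; this.value = value; }
}

/** Hypothetical stand-in for an analyzer: turns a clause value into terms. */
interface TermAnalyzer {
    List<String> analyze(String text);
}

/** Sketch of the abstract filter: subclasses only decide which analyzer to use. */
abstract class LanguageQueryFilterSketch {
    // Only these fields get analyzed; anything else (e.g. "url", "site") passes through untouched.
    private static final Set<String> ANALYZED_FIELDS = Set.of("title", "anchor", "content");

    /** Subclasses pick an analyzer, e.g. via language identification or a fixed choice. */
    protected abstract TermAnalyzer selectAnalyzer(Clause clause);

    /** Runs on clauses that the query parser has already produced. */
    public List<Clause> filter(List<Clause> parsed) {
        List<Clause> out = new ArrayList<>();
        for (Clause c : parsed) {
            if (!ANALYZED_FIELDS.contains(c.field)) {
                out.add(c); // skip analysis for fields like "url"
                continue;
            }
            // Second tokenization pass: one output clause per analyzed term.
            for (String term : selectAnalyzer(c).analyze(c.value)) {
                out.add(new Clause(c.field, term));
            }
        }
        return out;
    }
}

/** Example subclass with a fixed toy analyzer: lowercase, split on whitespace, strip a trailing "s". */
class ToyEnglishQueryFilter extends LanguageQueryFilterSketch {
    @Override
    protected TermAnalyzer selectAnalyzer(Clause clause) {
        return text -> {
            List<String> terms = new ArrayList<>();
            for (String tok : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                if (tok.isEmpty()) continue;
                boolean plural = tok.endsWith("s") && tok.length() > 3;
                terms.add(plural ? tok.substring(0, tok.length() - 1) : tok);
            }
            return terms;
        };
    }
}
```

A subclass that wanted per-language behaviour would do its language detection inside selectAnalyzer() and return a different analyzer for each language. The base class never needs to know how that choice was made.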

Howie

I recently posted a proposal on the Nutch Wiki for Multi-Lingual support in
Nutch (adding the ability to plug in language-specific analyzers for both
querying and analyzing).
This document is available at
http://wiki.apache.org/nutch/MultiLingualSupport
It seems that your solution is very similar to mine (except mine uses the
plugin framework).
Could you please review my proposal in light of your experience?
I will begin implementation in a few days (and will keep in mind the piece
of code you sent).

Regards

Jerome

--
http://motrech.free.fr/
http://frutch.free.fr/




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general