: - Use an IndexEverythingAnalyzer for writing, : so "werk", "werkte", "gewerkt" and "en" is indexed as-is when they are : encountered. : : - And then use a DutchAnalyzer for reading, : which if I ask "werk" searches for "werk", "werkte" and "gewerkt", : and also ignores stop words like "en" in the query. : EnglishAnalyzer would search with "werk" for "werk", "werkes", "werked", ...
that sounds liek a fairly simplistic view of things ... sucessful analysis tends to require compatible efforts taken when indexing and querying ... when indexing you know context, and structure; when querying you have much more limited information to start with. Most Stemming algorithms i've seen end to rely more on work done when indexing then work done when searching -- even if you had an extremely well maintained dictionary based stemmer, you would probably wind up wanting to do stem normalation when indexing (as opposed to expansion when searching) to keep your search times fast. : - It might seem a bad idea to mix several languages in the same index, : but in reality few data comes with the meta-data which declares the : language of the data is written in. i'm told by the people who seem to know a lot about this that this is where langauge detection comes ito play .. when indexing documnts, even without metadata, you tend to have a large neough "chunk" of text to make assumptions about the language. -Hoss --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]