Re: Automatic analyzer resolving based on Locale

Geoffrey De Smet Wed, 09 May 2007 08:15:22 -0700

We'd use a different index for each configured locale's language; however, this might have an impact on performance.


Would this be attainable (maybe some day in lucene)?


- Use an IndexEverythingAnalyzer for writing,
so "werk", "werkte", "gewerkt" and "en" are indexed as-is when they are encountered.


- And then use a DutchAnalyzer for reading,
which if I ask "werk" searches for "werk", "werkte" and "gewerkt",
and also ignores stop words like "en" in the query.
An EnglishAnalyzer, given "werk", would search for "werk", "werkes", "werked", ...


- It might seem a bad idea to mix several languages in the same index,
but in reality little data comes with metadata declaring the language it is written in.
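The proposal above (store every token verbatim, apply language rules only when querying) can be illustrated with a minimal plain-Java sketch. It does not use Lucene at all: the tokenizer, the tiny Dutch stop-word list and the stem table are hypothetical stand-ins for what a DutchAnalyzer would do, chosen only to reproduce the "werk"/"werkte"/"gewerkt" example.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: index terms verbatim, apply language rules only at query time.
 * The "stemmer" is a toy lookup table, NOT Lucene's DutchAnalyzer.
 */
public class QueryTimeExpansion {

    // Toy Dutch stop-word list (illustrative assumption).
    static final Set<String> DUTCH_STOP_WORDS = Set.of("en", "de", "het", "een");

    // Toy stem table standing in for a real Dutch stemmer (illustrative assumption).
    static final Map<String, String> DUTCH_STEMS = Map.of(
            "werkte", "werk",
            "gewerkt", "werk");

    static String stem(String term) {
        return DUTCH_STEMS.getOrDefault(term, term);
    }

    /** "Index everything": lowercase and split on non-letters, keeping every token, stop words included. */
    public static Set<String> indexTerms(String text) {
        Set<String> terms = new TreeSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Query side: drop stop words, then expand each query term to every indexed term sharing its stem. */
    public static Set<String> expandQuery(String query, Set<String> indexedTerms) {
        Set<String> expanded = new TreeSet<>();
        for (String q : indexTerms(query)) {
            if (DUTCH_STOP_WORDS.contains(q)) {
                continue; // stop words are ignored in the query only
            }
            String qStem = stem(q);
            for (String t : indexedTerms) { // linear scan of the whole vocabulary
                if (stem(t).equals(qStem)) {
                    expanded.add(t);
                }
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        Set<String> index = indexTerms("Hij werkte en heeft gewerkt; ik werk.");
        System.out.println(index);                       // [en, gewerkt, heeft, hij, ik, werk, werkte]
        System.out.println(expandQuery("werk en", index)); // [gewerkt, werk, werkte]
    }
}
```

Note that the inner loop compares the query stem against every indexed term, so the cost of a query grows with the vocabulary; a real index would presumably need a smarter way to find all surface forms that share a stem.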



With kind regards,
Geoffrey De Smet

Chris Hostetter wrote:

: There is nothing canned that I know of. I'm also not sure how this
: would be used. If you're using a single index, how are you going
: to index, then search using these analyzers? Or is there some
: other magic going on?

I suspect the use case is a "shipped" software product, where you want to
have one jar that works anywhere, but you want the analyzer used to depend
on the Locale of the JVM the software is installed in.

Personally, I would advise against auto-selecting an Analyzer based on the
runtime Locale. It's a fine approach when dealing with purely transient
data (e.g. parsing dates input into a form), but it's a bad idea for
persistent data (e.g. formatting dates to write them to a file), because the
user could change their Locale, and then the index they built the last time
they ran your software no longer works.

Just make it an option configurable at install time.



-Hoss
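Hoss's advice to fix the choice at install time could look something like the following sketch: the language is recorded in a small properties file next to the index when it is created, and every later run reads it back instead of consulting Locale.getDefault(). The class, the file name "analysis.properties" and the key "analyzer.language" are hypothetical; mapping the stored code (e.g. "nl") to a concrete Lucene analyzer is left to the caller.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Properties;

/**
 * Hypothetical sketch: pin the analysis language when the index is created,
 * instead of re-deriving it from the JVM Locale on every run.
 */
public class IndexLanguageConfig {
    static final String FILE_NAME = "analysis.properties"; // hypothetical file name
    static final String KEY = "analyzer.language";          // hypothetical property key

    /** Called once at install/index-creation time; the JVM locale is only a fallback here. */
    public static String initialize(Path indexDir, String chosenLanguage) throws IOException {
        String lang = chosenLanguage != null ? chosenLanguage : Locale.getDefault().getLanguage();
        Properties props = new Properties();
        props.setProperty(KEY, lang);
        Files.createDirectories(indexDir);
        try (Writer w = Files.newBufferedWriter(indexDir.resolve(FILE_NAME))) {
            props.store(w, "Language used to analyze this index; changing it requires reindexing");
        }
        return lang;
    }

    /** Called on every later run: the stored value wins over the current JVM locale. */
    public static String load(Path indexDir) throws IOException {
        Properties props = new Properties();
        try (Reader r = Files.newBufferedReader(indexDir.resolve(FILE_NAME))) {
            props.load(r);
        }
        String lang = props.getProperty(KEY);
        if (lang == null) {
            throw new IllegalStateException("index has no recorded analyzer language");
        }
        return lang;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("idx");
        initialize(dir, "nl");
        Locale.setDefault(Locale.ENGLISH); // the user later switches locale...
        System.out.println(load(dir));     // ...but this still prints "nl"
    }
}
```

Keeping the setting beside the index, rather than in a global preference, means the index and the rules needed to query it travel together.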


