Hi,

Looking up the different terms that share a common stem can be useful in
several scenarios - so I don't want to judge whether someone needs it or not.

E.g., if you have multilingual documents in your index, it is straightforward
to determine the language of each document in order to choose the right
stemmer. At least this holds for documents written in a single language.

Although this works at indexing time, classifying the language of the user
query is not as trivial - and you have to do it in order to stem the query
terms for searching. One possibility would be to search for the stems
produced by all stemmers - but in this case you will get many wrong search
terms, and thus much noise in the result lists.

Another possibility is to offer all 'potential synonyms' of the query terms
to the user, who can then choose whether they are right or not. In this case
you need exactly the lookup 'queryTerm->stem->terms with same stem'. This can
be much more precise; the drawbacks are of course the required user
interaction and longer queries.

To realize this, one could write a specific Analyzer that additionally stores
this relationship, e.g. in a database. I personally don't know of any way to
read this directly out of the Lucene index.
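
A minimal sketch of such a lookup, only to illustrate the idea: it assumes an
in-memory map instead of a database, and a toy suffix stripper instead of a
real stemmer such as Snowball. ReverseStemIndex and toyStem are made-up names
for this example, not part of the Lucene API; in practice add() would be
called from the custom Analyzer for every token it emits.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

/** Records stem -> original-term relationships at indexing time (sketch). */
public class ReverseStemIndex {
    private final UnaryOperator<String> stemmer;
    private final Map<String, SortedSet<String>> termsByStem = new TreeMap<>();

    public ReverseStemIndex(UnaryOperator<String> stemmer) {
        this.stemmer = stemmer;
    }

    /** Call once for every token seen while indexing. */
    public void add(String term) {
        String normalized = term.toLowerCase(Locale.ROOT);
        termsByStem
            .computeIfAbsent(stemmer.apply(normalized), k -> new TreeSet<>())
            .add(normalized);
    }

    /** queryTerm -> stem -> all indexed terms with the same stem. */
    public SortedSet<String> lookup(String queryTerm) {
        String stem = stemmer.apply(queryTerm.toLowerCase(Locale.ROOT));
        SortedSet<String> terms = termsByStem.get(stem);
        return terms == null
            ? Collections.<String>emptySortedSet()
            : Collections.unmodifiableSortedSet(terms);
    }

    /** Toy suffix stripper (assumption), standing in for a real stemmer. */
    public static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (word.length() > suffix.length() + 2 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        ReverseStemIndex index = new ReverseStemIndex(ReverseStemIndex::toyStem);
        for (String token : new String[] {"walk", "walks", "walked", "walking", "talked"}) {
            index.add(token);
        }
        // prints [walk, walked, walking, walks]
        System.out.println(index.lookup("walking"));
    }
}
```

The candidates returned by lookup() are what would be offered to the user as
'potential synonyms'. With one such index per language, the stemmer for each
language is simply passed to the constructor.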


If someone has best practices or an idea of how multilingual indices can be
processed better, I would appreciate reading / hearing about it.



all best

Chris


On Tue, 6 Oct 2009 16:31:36 +0900
David Leangen <apa...@leangen.net> wrote:

> 
> Hello,
> 
> I've been using Lucene in a very basic way for some time now, and I'm  
> starting to take advantage of some of the linguistic capabilities only  
> now.
> 
> I am making use of the snowball analyzer for stemming, and it works  
> very well.
> 
> 
> Question: is there any such thing as a "reverse stemmer"? In other  
> words, given the stem of a word, is there any algorithm to find the  
> original word? Or is this just fantasy? ;-)
> 
> Now, I understand that there is a 1:n mapping of stems:words. I can  
> deal with that.
> 
> 
> Thanks!
> =David
> 
> 
> 
> 
