It would be useful to have more details about the query input and the expected 
highlights you want.

So given your 'zone-indeling' example document and the index-time tokenisation 
you described, which of the following queries would you expect to match and 
what would you want highlighted in each case?
1) zone
2) zone-indeling
3) "zone indeling"
4) zone-somethingElse


My assumption here is that you are using the standard Lucene Query parser and 
that query 3 will therefore be a phrase query. 

Cheers
Mark


----- Original Message ----
From: Matthijs Bierman <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 14 November, 2007 11:51:07 AM
Subject: Re: get original term for synonym

Hi Mark,

Your solution would be correct if the synonym would be a true 2-way
synonym. Unfortunately this is not the case. My analyzer takes care of
decomposition of specific Dutch words (where a "-" is used to create
compound words). For example: 'zone-indeling' would create synonyms for
'zone'-> 'zone-indeling' and 'indeling'->'zone-indeling'.
When analyzing 'zone' it will therefore not point back to
'zone-indeling' (this information is simply not available). Putting all
the results from the indexing process into a file or lucene document
(thus creating a 'lookup' index) would probably make the lookup process
rather slow, or make application startup too long (due to HashMap
generation).

Maybe you can do something with offsets?

Thanks,
Matthijs


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]






      ___________________________________________________________
Yahoo! Answers - Got a question? Someone out there knows the answer. Try it
now.
http://uk.answers.yahoo.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to