Hi Mark,

I have solved it in another way now. I've created my own implementation of StandardAnalyzer (which I've called AdvancedAnalyzer). This analyzer keeps the word "zone-indeling" together as a single token, so users can simply search for this term and it will be highlighted exactly as is. I presume these compound words occur only in Dutch.
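
For readers following along, here is a minimal sketch of the idea in plain Java. It is only an illustration of keeping hyphenated compounds as one token; it does not use the Lucene API and is not the actual AdvancedAnalyzer. The class name, method name and regular expression are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: NOT Lucene's StandardAnalyzer and NOT the
// AdvancedAnalyzer described in this message.
class CompoundTokenizerSketch {

    // A token is a run of letters/digits, optionally followed by more runs
    // joined by single hyphens, so "zone-indeling" stays in one piece.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+(?:-[\\p{L}\\p{N}]+)*");

    // Splits text into lower-cased tokens, keeping hyphenated compounds whole.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [de, zone-indeling, van, de, stad]
        System.out.println(tokenize("De zone-indeling van de stad"));
    }
}
```

With tokens like these, a search for "zone-indeling" matches the compound as one unit, so a highlighter has a single span to mark instead of two separate parts.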

The problem was that my highlighter works on a PDF document (through PDFBox). It would highlight 1, 2 and "indeling", which resulted in unwanted behaviour when the word was split into "zone" and "indeling".

Thanks for your help though.

Cheers,
Matthijs

mark harwood wrote:
It would be useful to have more details about the query input and the expected 
highlights you want.

So given your 'zone-indeling' example document and the index-time tokenisation 
you described, which of the following queries would you expect to match and 
what would you want highlighted in each case?
1) zone
2) zone-indeling
3) "zone indeling"
4) zone-somethingElse


My assumption here is that you are using the standard Lucene Query parser and that query 3 will therefore be a phrase query.
Cheers
Mark


----- Original Message ----
From: Matthijs Bierman <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 14 November, 2007 11:51:07 AM
Subject: Re: get original term for synonym

Hi Mark,

Your solution would be correct if the synonym were a true two-way
synonym. Unfortunately this is not the case. My analyzer takes care of
decomposition of specific Dutch words (where a "-" is used to create
compound words). For example: 'zone-indeling' would create synonyms for
'zone' -> 'zone-indeling' and 'indeling' -> 'zone-indeling'.
When analyzing 'zone' it will therefore not point back to
'zone-indeling' (this information is simply not available). Putting all
the results from the indexing process into a file or lucene document
(thus creating a 'lookup' index) would probably make the lookup process
rather slow, or make application startup too long (due to HashMap
generation).
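
To make the one-way nature of the mapping concrete, here is a rough sketch in plain Java (no Lucene classes). The names decompose and buildLookup are hypothetical, and the split-on-hyphen rule is a simplifying assumption rather than the real analyzer logic. It shows that analyzing 'zone' on its own yields nothing that points back to 'zone-indeling', and that getting back would need a lookup table built from all indexed terms, which is the HashMap cost mentioned above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; the real analyzer is not shown in this thread.
class OneWaySynonymSketch {

    // Index-time decomposition (assumed rule): emit the compound itself
    // followed by each hyphen-separated part.
    static List<String> decompose(String term) {
        List<String> out = new ArrayList<>();
        out.add(term);
        if (term.contains("-")) {
            for (String part : term.split("-")) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    // Reverse lookup (part -> compounds containing it). It has to be built
    // from every indexed term up front, which is what makes startup costly.
    static Map<String, Set<String>> buildLookup(Collection<String> indexedTerms) {
        Map<String, Set<String>> lookup = new HashMap<>();
        for (String term : indexedTerms) {
            for (String part : decompose(term)) {
                if (!part.equals(term)) {
                    lookup.computeIfAbsent(part, k -> new TreeSet<>()).add(term);
                }
            }
        }
        return lookup;
    }

    public static void main(String[] args) {
        System.out.println(decompose("zone-indeling")); // compound plus its parts
        System.out.println(decompose("zone"));          // only the word itself
    }
}
```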

Maybe you can do something with offsets?
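
One possible reading of the offsets idea, sketched in plain Java under loud assumptions: each part token carries the start and end offsets of the whole compound in the source text, so the original term can be recovered by taking that substring. The Token class, analyze and originalTerm are hypothetical names, and this does not use Lucene's own token or offset classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not based on Lucene's token classes.
class OffsetSketch {

    // A term together with the character span it was derived from.
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    private static final Pattern WORD =
            Pattern.compile("[\\p{L}\\p{N}]+(?:-[\\p{L}\\p{N}]+)*");

    // Emits each word, and for hyphenated compounds also each part,
    // where the parts reuse the offsets of the full compound.
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String whole = m.group();
            tokens.add(new Token(whole, m.start(), m.end()));
            if (whole.contains("-")) {
                for (String part : whole.split("-")) {
                    tokens.add(new Token(part, m.start(), m.end()));
                }
            }
        }
        return tokens;
    }

    // Recovers the original text a token came from via its offsets.
    static String originalTerm(String text, Token token) {
        return text.substring(token.start, token.end);
    }
}
```

Under these assumptions, a match on the part 'zone' still leads back to 'zone-indeling' without a separate lookup index, because the offsets point at the full compound in the text.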

Thanks,
Matthijs


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





